{"id":13353,"date":"2019-02-26T16:39:17","date_gmt":"2019-02-26T15:39:17","guid":{"rendered":"http:\/\/dbdmg.polito.it\/wordpress\/?page_id=13353"},"modified":"2020-02-09T11:50:16","modified_gmt":"2020-02-09T10:50:16","slug":"big-data-architecture-and-data-analytics-2018-2019","status":"publish","type":"page","link":"https:\/\/dbdmg.polito.it\/wordpress\/teaching\/big-data-architecture-and-data-analytics-2018-2019\/","title":{"rendered":"Big Data: Architectures and Data Analytics (2018\/2019)"},"content":{"rendered":"<h3 id=\"tinyTOC\">Table of contents<\/h3>\n<ul>\n<li><a href=\"#General-information-1\">General information<\/a><\/li>\n<li><a href=\"#Exam-rules-1\">Exam rules<\/a><\/li>\n<li><a href=\"#Announcements-1\">Announcements<\/a><\/li>\n<li><a href=\"#Materials-1\">Materials<\/a><\/li>\n<li><a href=\"#Exercises-1\">Exercises<\/a><\/li>\n<li><a href=\"#Exam-Examples-1\">Exam Examples<\/a><\/li>\n<li><a href=\"#Practices-1\">Practices<\/a><\/li>\n<li><a href=\"#Additional-materials-1\">Additional materials<\/a><\/li>\n<\/ul>\n<h3><span id=\"General-information-1\">General information<\/span><\/h3>\n<ul>\n<li>ECTS: 6<\/li>\n<li>Professor: <a href=\"https:\/\/dbdmg.polito.it\/wordpress\/people\/paolo-garza\/\">Paolo Garza<\/a><\/li>\n<li>Students from AA to GZ\n<ul>\n<li>Teaching assistants:\n<ul>\n<li>Alessandro Farasin<\/li>\n<li>Francesco Ventura<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Students from HA to ZZ\n<ul>\n<li>Teaching assistants:\n<ul>\n<li>Andrea Pasini<\/li>\n<li>Marilisa Montemurro<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Exam-rules-1\">Exam rules<\/span><\/h3>\n<ul>\n<li>Exam rules Academic Year 2018-2019 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/02\/ExamRulesBigData2017_18.pdf\">pdf<\/a>)<\/li>\n<\/ul>\n<h3><span id=\"Announcements-1\">Announcements<\/span><\/h3>\n<ul>\n<li>(21\/01\/2020)\n<ul>\n<li>The exam scheduled for January 24, 2020 will be held at 11:00 
in Classroom 3I<\/li>\n<li>Please remember to bring:\n<ul>\n<li>your student card and\/or an identity document<\/li>\n<li><strong>sheets of paper (&#8220;fogli protocollo&#8221;)<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<table style=\"width: 1182.68px; border-color: #000000;\" border=\"2\" cellspacing=\"2\" cellpadding=\"2\">\n<tbody>\n<tr>\n<td style=\"width: 602px;\">Students from AA to GZ<\/td>\n<td style=\"width: 602px;\">Students from HA to ZZ<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 602px;\">\n<ul>\n<li>(04\/06\/2019) The <strong>lectures<\/strong> scheduled for next week are <strong>cancelled<\/strong> <strong>(from Monday June 10 to Friday June 14)<\/strong>\n<ul>\n<li>Tuesday, June 11 \u2013 from 5.30pm to 7pm (Team 1 &#8211; Lab activity) &#8211; Cancelled<\/li>\n<li>Wednesday, June 12 \u2013 from 5.30pm to 7pm (Team 2 &#8211; Lab activity) &#8211; Cancelled<\/li>\n<li>Thursday, June 13 &#8211; from 1.00pm to 4.00pm (Lecture) &#8211; Cancelled<\/li>\n<li>Friday, June 14 &#8211; from 4.00pm to 5.30pm (Lecture) &#8211; Cancelled<\/li>\n<\/ul>\n<\/li>\n<li>(26\/02\/2019)\n<ul>\n<li>First lecture: <strong>Thursday, March 7, 2019 at 13:00 &#8211; Room 1B<\/strong>.<\/li>\n<\/ul>\n<\/li>\n<li>(26\/02\/2019)\n<ul>\n<li><strong>No lab activities during the first two weeks.<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/td>\n<td style=\"width: 602px;\">\n<ul>\n<li>(04\/06\/2019) The <strong>lectures<\/strong> scheduled for next week are <strong>cancelled<\/strong> <strong>(from Monday June 10 to Friday June 14)<\/strong>\n<ul>\n<li>Monday, June 10 &#8211; from 11.30 to 13.00 (Lecture) &#8211; Cancelled<\/li>\n<li>Tuesday, June 11 \u2013 from 10.00 to 11.30 (Team 1 &#8211; Lab activity) &#8211; Cancelled<\/li>\n<li>Wednesday, June 12 \u2013 from 13.00 to 14.30 (Team 2 &#8211; Lab activity) &#8211; Cancelled<\/li>\n<li>Friday, June 14 &#8211; from 8.30 to 11.30 pm 
(Lecture) &#8211; Cancelled<\/li>\n<\/ul>\n<\/li>\n<li>(26\/02\/2019)\n<ul>\n<li>First lecture: <strong>Monday, March 4, 2019 at 11:30 &#8211; Room 8C<\/strong>.<\/li>\n<\/ul>\n<\/li>\n<li>(26\/02\/2019)\n<ul>\n<li><strong>No lab activities during the first two weeks.<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3><span id=\"Materials-1\">Materials<\/span><\/h3>\n<ul>\n<li>Introduction to the course (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/02\/00_Intro_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/02\/00_Intro_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Introduction to Big Data (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/02\/01_Intro_BigData_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/02\/01_Intro_BigData_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Hadoop and MapReduce\n<ul>\n<li>Introduction to Apache Hadoop and the MapReduce programming paradigm (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/02_Intro_HadoopAndMapReduce_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/02_Intro_HadoopAndMapReduce_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Hadoop implementation of MapReduce &#8211; Basic structure of MapReduce programs in Hadoop (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/03_HadoopImplementationOfMapReduce_2x.pdf\">2 slides per page<\/a>, <a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/03_HadoopImplementationOfMapReduce_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Source code of the Word Count Eclipse project (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/WordCount.zip\">WordCount.zip<\/a>) &#8211; Use the import option to import it into Eclipse<\/li>\n<li>PDF version of the code (i.e., PDF version of the java files) (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/WordCountPDF.zip\">WordCountPDF.zip<\/a>)<\/li>\n<li>BigData@Polito environment (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/00_Cluster_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/00_Cluster_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Interaction with HDFS and Hadoop by means of the command line (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/04_HDFS_Hadoop_CommandLine_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/04_HDFS_Hadoop_CommandLine_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>MapReduce programs and Hadoop &#8211; Part 2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/05_MapReduce_Hadoop_Part2_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/05_MapReduce_Hadoop_Part2_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>MapReduce programs and Hadoop &#8211; Part 3 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/06_MapReduce_Hadoop_Part3_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/06_MapReduce_Hadoop_Part3_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>MapReduce &#8211; Design patterns &#8211; Part 1 (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/07_MapReduce_Patterns_Part1_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/07_MapReduce_Patterns_Part1_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>MapReduce &#8211; Multiple Inputs and Multiple Outputs (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/09_MultipleInputsAndOutputs_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/09_MultipleInputsAndOutputs_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>MapReduce &#8211; Distributed cache\n<ul>\n<li>New APIs (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/10_DistributedCache_BigDataNewAPIs_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/10_DistributedCache_BigDataNewAPIs_6x.pdf\">6 slides per page<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce &#8211; Design patterns &#8211; Part 2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/11_MapReduce_Patterns_Part2_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/11_MapReduce_Patterns_Part2_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>MapReduce &#8211; Relational Algebra\/SQL operators (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/12_SQLOperators_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/12_SQLOperators_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark\n<ul>\n<li>Introduction to Apache Spark (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/14_SparkIntroduction_BigData_2x.pdf\">2 slides per page<\/a>, <a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/14_SparkIntroduction_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Introduction to Apache Spark &#8211; Part 2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/15_SparkIntroductionPart2_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/15_SparkIntroductionPart2_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>RDD-based programs (RDDs creation and basic transformations) &#8211; Part 1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/16_SparkRDDBasedProgramming_Part1_BigData_Lambda_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/16_SparkRDDBasedProgramming_Part1_BigData_Lambda_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>RDD-based programs (RDDs basic actions) &#8211; Part 2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/17_SparkRDDBasedProgramming_Part2_BigData_Lambda_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/17_SparkRDDBasedProgramming_Part2_BigData_Lambda_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>How to submit a Spark application (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/18_SparkSubmit_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/18_SparkSubmit_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>RDD-based programs (key-value pair RDDs and transformations on PairRDDs) &#8211; Part 3 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/04\/19_SparkRDDBasedProgramming_Part3_BigData_Lambda_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/04\/19_SparkRDDBasedProgramming_Part3_BigData_Lambda_6x.pdf\">6 slides 
per page<\/a>) &#8211; <strong>Updated on April 10, 2018<\/strong> (five new slides on flatMapToPair)<\/li>\n<li>RDD-based programs (Set transformations and actions on PairRDDs) &#8211; Part 4 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/20_SparkRDDBasedProgramming_Part4_BigData_Lambda_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/20_SparkRDDBasedProgramming_Part4_BigData_Lambda_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>RDD-based programs (DoubleRDDs) &#8211; Part 5 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/21_SparkRDDBasedProgramming_Part5_BigData_Lambda_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/21_SparkRDDBasedProgramming_Part5_BigData_Lambda_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>RDD-based programs (Cache, accumulators, broadcast variables) &#8211; Part 6 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/22_SparkRDDBasedProgramming_Part6_BigData_Lambda_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/22_SparkRDDBasedProgramming_Part6_BigData_Lambda_6x.pdf\">6 slides per page<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Datasets, DataFrames and Spark SQL (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/23_SparkSQL_Datasets_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/23_SparkSQL_Datasets_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Spark SQL example &#8211; DataFrames vs Datasets vs SQL\n<ul>\n<li>Problem specification (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/ExampleSparkSQL_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/ExampleSparkSQL_6x.pdf\">6 slides per 
page<\/a>)<\/li>\n<li>Solution (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/ExampleSparkSQL.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark SQL and User Defined Functions (UDFs) (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/23b_SparkSQL_Datasets_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/23b_SparkSQL_Datasets_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Data mining and Machine learning algorithms with Spark MLlib\n<ul>\n<li>Data Mining &#8211; Recap\n<ul>\n<li>Introduction (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/24a-DMintro_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/24a-DMintro_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Data and Preprocessing (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/24b-DMPreProc_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/24b-DMPreProc_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Itemset mining and Association rules (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/24c-DMassrules_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/24c-DMassrules_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Classification (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/24d-DMClassification_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/24d-DMClassification_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Clustering (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/24e-DMClustering_2x.pdf\">2 slides per page<\/a>, <a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/24e-DMClustering_6x.pdf\">6 slides per page<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib\n<ul>\n<li>Spark MLlib &#8211; Introduction and Classification of structured data (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/25_SparkMLlib_Part1_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/25_SparkMLlib_Part1_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Logistic Regression example code (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineLogisticRegression.zip\">zip<\/a>)<\/li>\n<li>Decision Trees example code (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineDecisionTree.zip\">zip<\/a>)<\/li>\n<li>Decision Trees and Categorical class label example code (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineDecisionTreeCategoricalLabel.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib &#8211; Classification of textual data (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/26_SparkMLlib_Part2_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/26_SparkMLlib_Part2_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Textual data classification example code (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineText.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib &#8211; Parameter tuning (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/27_SparkMLlib_Part3_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/27_SparkMLlib_Part3_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Parameter tuning example code (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineLogisticRegressionCrossValidation.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib &#8211; Clustering of structured data (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/28_SparkMLlib_Part4_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/28_SparkMLlib_Part4_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Clustering example code (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineClustering.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib &#8211; Itemset and Association rule mining (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/29_SparkMLlib_Part5_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/29_SparkMLlib_Part5_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Itemset and Association rule mining example code (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibFPGrowth.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib &#8211; Linear regression (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/30_SparkMLlib_Part6_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/30_SparkMLlib_Part6_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>\u00a0Linear regression example code (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineLinearRegression.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark Streaming (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/31_SparkStreaming_BigData_2x.pdf\">2 slides per page<\/a>, <a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/31_SparkStreaming_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Word Count &#8211; Streaming version (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCount.zip\">zip<\/a>)<\/li>\n<li>Word Count and Window (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCountWindow.zip\">zip<\/a>)<\/li>\n<li>Word Count &#8211; Stateful version (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCountStateful.zip\">zip<\/a>)<\/li>\n<li>Word Count &#8211; Streaming version &#8211; Read data from HDFS folder (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCountFolder.zip\">zip<\/a>)<\/li>\n<li>Word Count &#8211; Output sort by key &#8211; Based on the transformPair() transformation (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCountSortByKey.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>DBMS for Big data\n<ul>\n<li>Relational and Non-relational databases for Big data (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/06\/32_Rel_vs_NoRelDatabases_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/06\/32_Rel_vs_NoRelDatabases_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Exercises-1\">Exercises<\/span><\/h3>\n<ul>\n<li>MapReduce Exercises &#8211; Part 1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/01_MapReduce_Exercises_Part1_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/01_MapReduce_Exercises_Part1_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>MapReduce Exercises &#8211; Part 2 (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/02_MapReduce_Exercises_Part2_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/02_MapReduce_Exercises_Part2_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 1 and 2\n<ul>\n<li>Source code\/Eclipse &#8211; maven projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/SolutionsExercisesPart1_Part2.zip\">SolutionsExercisesPart1_Part2.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce Exercises &#8211; Part 3 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/03_MapReduce_Exercises_Part3_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/03_MapReduce_Exercises_Part3_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 3\n<ul>\n<li>Source code\/Eclipse &#8211; maven projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/SolutionsExercisesPart3.zip\">SolutionsExercisesPart3.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce Exercises &#8211; Part 4 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/04_MapReduce_Exercises_Part4_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/04_MapReduce_Exercises_Part4_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 4\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/SolutionsExercisesPart4.zip\">SolutionsExercisesPart4.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce Exercises &#8211; Part 5 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/05_MapReduce_Exercises_Part5_BigData_2x.pdf\">2 slides per page<\/a>, <a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/05_MapReduce_Exercises_Part5_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 5\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/SolutionsExercisesPart5.zip\">SolutionsExercisesPart5.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce Exercises &#8211; Part 6 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/06_MapReduce_Exercises_Part6_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/06_MapReduce_Exercises_Part6_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 6\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/SolutionsExercisesPart6.zip\">SolutionsExercisesPart6.zip<\/a>) &#8211; An alternative solution for exercise 23 has been uploaded on March 25, 2019 (Exercise23TwoJobsV2)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce Exercises &#8211; Part 7 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/07_MapReduce_Exercises_Part7_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/07_MapReduce_Exercises_Part7_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 7\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/SolutionsExercisesPart7.zip\">SolutionsExercisesPart7.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li>Spark Exercises &#8211; Part 8 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/04\/08_Spark_Exercises_Part8_BigData_2x.pdf\">2 slides per page<\/a>, <a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/04\/08_Spark_Exercises_Part8_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Simulation &#8211; Exercise #31 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/04\/08_Spark_Exercises_Part8_BigData_Ex31Simulation_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/04\/08_Spark_Exercises_Part8_BigData_Ex31Simulation_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Solutions &#8211; Part 8\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/SolutionsExercisesPart8.zip\">SolutionsExercisesPart8.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Solutions of Exercises 32-36 based on Spark SQL (with Dataset, DataFrame, SQL-like language)\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SolutionsExercisesPart8SparkSQL.zip\">SolutionsExercisesPart8SparkSQL.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark Exercises &#8211; Part 9 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/09_Spark_Exercises_Part9_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/09_Spark_Exercises_Part9_BigData_6x.pdf\">6 slides per page<\/a>) &#8211; <strong>Updated on April 13, 2019 (Exercise 39 bis has been included)<\/strong>\n<ul>\n<li>Solutions &#8211; Part 9\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/SolutionsExercisesPart9.zip\">SolutionsExercisesPart9.zip<\/a>)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/04\/Exercise39Bis.zip\">SolutionsExercise39Bis.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Solutions of Exercises 37-38 based on Spark SQL (with Dataset, DataFrame, SQL-like 
language)\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SolutionsExercisesPart9SparkSQL.zip\">SolutionsExercisesPart9SparkSQL.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark Exercises &#8211; Part 10 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/10_Spark_Exercises_Part10_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/10_Spark_Exercises_Part10_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 10\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/SolutionsExercisesPart10.zip\">SolutionsExercisesPart10.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark Exercises &#8211; Part 11 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/11_Spark_Exercises_Part11_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/11_Spark_Exercises_Part11_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 11\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/SolutionsExercisesPart11.zip\">SolutionsExercisesPart11.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark Exercises &#8211; Part 12 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/12_Spark_Exercises_Part12_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/12_Spark_Exercises_Part12_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 12\n<ul>\n<li>Source code\/Eclipse projects (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SolutionsExercisesPart12.zip\">SolutionsExercisesPart12.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark UDFs Exercises &#8211; Part 14 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/14_Spark_Exercises_Part14_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/14_Spark_Exercises_Part14_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 14\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SolutionsExercisesPart14.zip\">SolutionsExercisesPart14.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark Streaming Exercises &#8211; Part 13 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/13_Spark_Exercises_Part13_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/13_Spark_Exercises_Part13_BigData_6x.pdf\">6 slides per page<\/a>)\n<ul>\n<li>Solutions &#8211; Part 13\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SolutionsExercisesPart13.zip\">SolutionsExercisesPart13.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark Streaming Exercises &#8211; Part 15 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/05\/15_Spark_Exercises_Part15_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/05\/15_Spark_Exercises_Part15_BigData_6x.pdf\">6 slides per page<\/a>) &#8211; <strong>Uploaded on May 29, 2019<\/strong>\n<ul>\n<li>Solutions &#8211; Part 15\n<ul>\n<li>Source code\/Eclipse projects (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/05\/SolutionsExercisesPart15.zip\">SolutionsExercisesPart15.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Exam-Examples-1\">Exam Examples<\/span><\/h3>\n<ul>\n<li><strong>At the exam, the following template will be provided for the exercise based on Hadoop for the Driver part (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/06\/TemplateHadoop.pdf\">Hadoop template<\/a>)<\/strong>\n<ul>\n<li>For the Spark exercises, no templates are provided<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li>Exam example #1\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/05\/ExamExample1.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Example1.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam example #2\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/06\/ExamExample2.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Example2.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 1, 2016\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/07\/Exam20160701.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Exam20160701.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 12, 2016\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/Exam20160712.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Question 1: (a)<\/li>\n<li>Question 2: (a)<\/li>\n<li>Source code\/Eclipse projects (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Exam20160712.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam September 19, 2016\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/06\/Exam20160919.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Question 1: (c)<\/li>\n<li>Question 2: (a)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Exam20160919.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam June 30, 2017\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/06\/Exam20170630_v1.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Exam20170630.zip\">zip<\/a>) &#8211; Updated on June 12, 2019<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 14, 2017\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/07\/Exam20170714_v1.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Exam20170714.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam September 14, 2017\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/09\/Exam20170914_v1.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Question 1: (a)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Exam20170914.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam January 22, 2018\n<ul>\n<li>Text (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/Exam20180122_v1_updated.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/Exam2018_01_22.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam June 26, 2018\n<ul>\n<li>Text Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/Exam20180626_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (c)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/DraftSolutionv1.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Text Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/Exam20180626_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/DraftSolutionv2.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 16, 2018\n<ul>\n<li>Text Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/Exam20180716_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (a)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/DraftSolutionv1_20180716.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Text Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/Exam20180716_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (d)<\/li>\n<li>Source code\/Eclipse projects (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/DraftSolutionv2_20180716.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam September 3, 2018\n<ul>\n<li>Text Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/09\/Exam20180903_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/06\/DraftSolutionv1_201809003.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Text Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/09\/Exam20180903_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (c) <\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam February 15, 2019\n<ul>\n<li>Text Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/03\/Exam20190215_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/06\/DraftSolutionv1_20190215.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Text Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/03\/Exam20190215_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (b)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 2, 2019\n<ul>\n<li>Text Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190702_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (a)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/BozzaSoluzionev1_20190702.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Text Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190702_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (a)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/BozzaSoluzionev2_20190702.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 18, 2019\n<ul>\n<li>Text Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190718_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/DraftSolutionExam20190718_v1.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Text Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190718_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (c)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/DraftSolutionExam20190718_v2.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam September 19, 2019\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/01\/Exam20190919_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (b)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam January 24, 2020\n<ul>\n<li>Text Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/02\/Exam20200124_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: 
(c)<\/li>\n<li>Question 2: (b)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Text Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/02\/Exam20200124_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (a)<\/li>\n<li>Question 2: (c)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Practices-1\">Practices<\/span><\/h3>\n<ul>\n<li><strong>No lab activities during the first two weeks<\/strong><\/li>\n<li style=\"text-align: left;\">Schedule of the lab activities\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>\n<table style=\"width: 1174px;\" border=\"2\" cellspacing=\"2\" cellpadding=\"2\">\n<tbody>\n<tr>\n<td style=\"width: 567.733px;\">Students from AA to GZ<\/td>\n<td style=\"width: 584.267px;\">Students from HA to ZZ<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 567.733px;\">\n<ul>\n<li>TEAM 1: Students from AA to CI &#8211; Tuesday from 5.30pm to 7pm<\/li>\n<li>TEAM 2: Students from CL to GZ &#8211; Wednesday from 5.30pm to 7pm<\/li>\n<li>\n<table style=\"width: 400px;\" border=\"2\" cellspacing=\"2\" cellpadding=\"2\">\n<tbody>\n<tr>\n<td style=\"width: 50.3833px;\"><\/td>\n<td style=\"width: 147.883px;\"><strong>Team 1<\/strong><\/td>\n<td style=\"width: 171.733px;\"><strong>Team 2<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #1<\/td>\n<td style=\"width: 147.883px;\">Tuesday, March 19 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, March 20 &#8211; from 5.30pm to 7pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #2<\/td>\n<td style=\"width: 147.883px;\">Tuesday, March 26 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, March 27 &#8211; from 5.30pm to 7pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #3<\/td>\n<td style=\"width: 147.883px;\">Tuesday, April 2 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, April 3 &#8211; from 5.30pm to 
7pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #4<\/td>\n<td style=\"width: 147.883px;\">Tuesday, April 9 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, April 10 &#8211; from 5.30pm to 7pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #5<\/td>\n<td style=\"width: 147.883px;\">Tuesday, April 16 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, April 17 &#8211; from 5.30pm to 7pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #6<\/td>\n<td style=\"width: 147.883px;\">Tuesday, May 7 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, May 8 &#8211; from 5.30pm to 7pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #7<\/td>\n<td style=\"width: 147.883px;\">Tuesday, May 14 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, May 15 &#8211; from 5.30pm to 7pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #8<\/td>\n<td style=\"width: 147.883px;\">Tuesday, May 21 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, May 22 &#8211; from 5.30pm to 7pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #9<\/td>\n<td style=\"width: 147.883px;\">Tuesday, May 28 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, May 29 &#8211; from 5.30pm to 7pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50.3833px;\">Lab #10<\/td>\n<td style=\"width: 147.883px;\">Tuesday, June 4 &#8211; from 5.30pm to 7pm<\/td>\n<td style=\"width: 171.733px;\">Wednesday, June 5 &#8211; from 5.30pm to 7pm<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/li>\n<\/ul>\n<\/td>\n<td style=\"width: 584.267px;\">\n<ul>\n<li>TEAM 1: Students from HA to QZ &#8211; Tuesday from 10am to 11.30am<\/li>\n<li>TEAM 2: Students from RA to ZZ &#8211; Wednesday from 1pm to 2.30pm<\/li>\n<li>\n<table style=\"width: 417px;\" border=\"2\" cellspacing=\"2\" cellpadding=\"2\">\n<tbody>\n<tr>\n<td style=\"width: 52.2833px;\"><\/td>\n<td 
style=\"width: 172.883px;\"><strong>Team 1<\/strong><\/td>\n<td style=\"width: 161.833px;\"><strong>Team 2<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 52.2833px;\">Lab #1<\/td>\n<td style=\"width: 172.883px;\">Tuesday, March 19 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, March 20 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 52.2833px;\">Lab #2<\/td>\n<td style=\"width: 172.883px;\">Tuesday, March 26 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, March 27 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 52.2833px;\">Lab #3<\/td>\n<td style=\"width: 172.883px;\">Tuesday, April 2 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, April 3 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 52.2833px;\">Lab #4<\/td>\n<td style=\"width: 172.883px;\">Tuesday, April 9 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, April 10 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 52.2833px;\">Lab #5<\/td>\n<td style=\"width: 172.883px;\">Tuesday, April 16 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, April 17 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 52.2833px;\">Lab #6<\/td>\n<td style=\"width: 172.883px;\">Tuesday, May 7 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, May 8 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 52.2833px;\">Lab #7<\/td>\n<td style=\"width: 172.883px;\">Tuesday, May 14 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, May 15 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 52.2833px;\">Lab #8<\/td>\n<td style=\"width: 172.883px;\">Tuesday, May 21 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, May 22 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<tr>\n<td 
style=\"width: 52.2833px;\">Lab #9<\/td>\n<td style=\"width: 172.883px;\">Tuesday, May 28 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, May 29 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 52.2833px;\">Lab #10<\/td>\n<td style=\"width: 172.883px;\">Tuesday, June 4 &#8211; from 10am to 11.30am<\/td>\n<td style=\"width: 161.833px;\">Wednesday, June 5 &#8211; from 1pm to 2.30pm<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab1 --><\/p>\n<ul>\n<li>Lab1: Hadoop and MapReduce\n<ul>\n<li>BigData@Polito environment (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/00_Cluster_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/00_Cluster_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/03\/Lab1_BigData.pdf\">pdf<\/a>)<\/li>\n<li>Project and data (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/Lab1_BigData_with_libraries.zip\">Lab1_BigData.zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Bonus track (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/Lab1_SolBonus.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab2 --><\/p>\n<ul>\n<li>Lab2: Filter with Hadoop MapReduce\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/Lab2.pdf\">pdf<\/a>)<\/li>\n<li>Skeleton Eclipse project Hadoop &#8211; MapReduce (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/03\/Lab_Skeleton-1.zip\">Lab_Skeleton.zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Part 1 (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab2_Sol.zip\">zip<\/a>)<\/li>\n<li>Bonus track (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab2_SolBonus.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab3 --><\/p>\n<ul>\n<li>Lab3: Frequently bought\/reviewed together application with Hadoop MapReduce\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab3_1718.pdf\">pdf<\/a>)<\/li>\n<li>Skeleton Eclipse project Hadoop &#8211; MapReduce (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab3_Skeleton_Java8.zip\">Lab3_Skeleton.zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Solution (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab3_Sol_Java8-1.zip\">zip<\/a>) &#8211; Three alternative solutions are provided<\/li>\n<li>Comments on the three uploaded possible solutions (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab3_DraftSolution_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab3_DraftSolution_BigData_6x.pdf\">6 slides per page<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab4 --><\/p>\n<ul>\n<li>Lab4: Normalized ratings for product recommendations with Hadoop MapReduce\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab4.pdf\">pdf<\/a>)<\/li>\n<li>Sample dataset (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/ReviewsSample.csv\">ReviewsSample.csv<\/a>)<\/li>\n<li>Skeleton Eclipse project Hadoop &#8211; MapReduce (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab4_Skeleton_Java8.zip\">Lab4_Skeleton.zip<\/a>)<\/li>\n<li>Solution (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/Lab4_Sol_Java8.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab5 --><\/p>\n<ul>\n<li>Lab5: Filter data and compute basic statistics with Apache Spark\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/04\/Lab5Bigdata-1.pdf\">pdf<\/a>)<\/li>\n<li>SampleLocalFile.csv (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/SampleLocalFile.csv\">SampleLocalFile.csv<\/a>)<\/li>\n<li>Skeleton Eclipse project &#8211; Spark (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/04\/Lab5_Template_Java8-6.zip\">Lab5_Template.zip<\/a>)<\/li>\n<li>Solution (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab5_Sol.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab6 --><\/p>\n<ul>\n<li>Lab6: Frequently bought\/reviewed together application with Apache Spark\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Lab6.pdf\">pdf<\/a>)<\/li>\n<li>Sample dataset (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/ReviewsSample.csv\">ReviewsSample.csv<\/a>)<\/li>\n<li>Skeleton Eclipse project &#8211; Spark (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab6_Template_Java8.zip\">Lab6_Template.zip<\/a>)<\/li>\n<li>Solution (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab6Sol.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab7 --><\/p>\n<ul>\n<li>Lab7: Bike sharing data analysis\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Lab7.pdf\">pdf<\/a>)<\/li>\n<li>Sample data (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/sampleData.zip\">zip<\/a>)<\/li>\n<li>Skeleton Eclipse project &#8211; Spark (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab7_Template_Java8.zip\">Lab7_Template.zip<\/a>)<\/li>\n<li>Example of KML file (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/example.zip\">zip<\/a>)<\/li>\n<li>Another KML visualizer that can be used to visualize on a map the result of your analysis: <a href=\"http:\/\/kmlviewer.nsspot.net\/\">http:\/\/kmlviewer.nsspot.net\/<\/a><\/li>\n<li>Solution (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab7_Sol_Java8.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab8 --><\/p>\n<ul>\n<li>Lab8: Bike sharing data analysis based on Spark SQL\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Lab8.pdf\">pdf<\/a>)<\/li>\n<li>Sample data (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/sampleData.zip\">zip<\/a>)<\/li>\n<li>Skeleton Eclipse project &#8211; Spark (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab8_Template_Java8.zip\">Lab8_Template.zip<\/a>)<\/li>\n<li>Solution (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab8Dataset.zip\">Dataset-based.zip<\/a>) (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab8SQL.zip\">SQL-based.zip<\/a>) (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab8DataFrame.zip\">DataFrame-based.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab9 --><\/p>\n<ul>\n<li>Lab9: A classification pipeline with MLlib + SparkSQL\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Lab9.pdf\">pdf<\/a>)<\/li>\n<li>Skeleton Eclipse project &#8211; Spark (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab9_Template.zip\">Lab9_Template.zip<\/a>)<\/li>\n<li>Sample file with 100 reviews (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/ReviewsSample.csv\">ReviewsSample.csv<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Logistic regression (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab9_SolutionLR.zip\">zip<\/a>)<\/li>\n<li>DecisionTree (<a 
href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab9_SolutionDT.zip\">zip<\/a>)<\/li>\n<li>Logistic regression based on text analysis (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab9_SolutionLRText.zip\">zip<\/a>)<\/li>\n<li>DecisionTree based on text analysis (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab9_SolutionDTText.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- Lab10 --><\/p>\n<ul>\n<li>Lab10: Tweet analysis &#8211; Spark streaming\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/Lab10.pdf\">pdf<\/a>)<\/li>\n<li>Skeleton Eclipse project &#8211; (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab10_TemplateSpark_Java1.8_2.zip\">Lab10_Template.zip<\/a>)<\/li>\n<li>Example files &#8211; tweets (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/exampledata_tweets.zip\">exampledata_tweets.zip<\/a>)<\/li>\n<li>Solution (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/Lab10_Solution.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><!-- end --><\/p>\n<h3><span id=\"Additional-materials-1\">Additional materials<\/span><\/h3>\n<ul>\n<li>Slides and screencasts about Java (kindly provided by prof. 
Torchiano) (<a href=\"http:\/\/dbdmg.polito.it\/~paolo\/JavaMaterials\/02JEY%20-%20Object%20Oriented%20Programming.html\">link<\/a>)\n<ul>\n<li>Suggested slides\/lectures for those students who do not know Java\n<ul>\n<li>OO Paradigm and UML (The UML part is not mandatory)<\/li>\n<li>The Java Environment<\/li>\n<li>Java Basic Features<\/li>\n<li>Java Inheritance<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li>Slides about the Scala language &#8211; These slides are not part of the course program (no questions or exercises on these slides at the exam)\n<ul>\n<li>Introduction (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/01-scala-Introduction.pdf\">pdf<\/a>)<\/li>\n<li>Data types, variables, expressions, loops, basic console operations (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/02-scala_part2.pdf\">pdf<\/a>)<\/li>\n<li>Scala and Functional programming (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/03-scala_part3.pdf\">pdf<\/a>)<\/li>\n<li>Collections (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/04-scala_part4.pdf\">pdf<\/a>)<\/li>\n<li>Scala and Object-oriented programming (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/05-scala_part5.pdf\">pdf<\/a>)<\/li>\n<li>Exercises\n<ul>\n<li>Text (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/PracticeScala.pdf\">pdf<\/a>)<\/li>\n<li>Solution (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/Solutions.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce &#8211; Hadoop internals (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/13_HadoopInternals_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/02\/13_HadoopInternals_BigData_6x.pdf\">6 slides per page<\/a>) &#8211; These slides are not 
part of the course program (no questions or exercises on these slides at the exam)<\/li>\n<li>Apache HIVE (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/06\/33_Hive_BigData_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/06\/33_Hive_BigData_6x.pdf\">6 slides per page<\/a>) &#8211; These slides are not part of the course program (no questions or exercises on these slides at the exam)<\/li>\n<li>Apache Storm &#8211; These slides are not part of the course program (no questions or exercises on these slides at the exam)\n<ul>\n<li>Introduction (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/01_Introduction_Storm_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/01_Introduction_Storm_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Storm Architecture (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/02_StormArchitecture_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/02_StormArchitecture_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Developing Storm applications (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/03_DevelopingStormApplictions_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/03_DevelopingStormApplictions_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Advanced topics (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/04_AdvancedDevelopment_2x.pdf\">2 slides per page<\/a>, <a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/04_AdvancedDevelopment_6x.pdf\">6 slides per page<\/a>)<\/li>\n<li>Trident (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/05_Trident_2x.pdf\">2 slides per page<\/a>, <a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/05_Trident_6x.pdf\">6 slides per page<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<br class=\"fixfloat\" \/>","protected":false},"excerpt":{"rendered":"<p>Table of content General information Exam rules Announcements Materials Exercises Exam Examples Practices Additional materials General information ECTS: 6 Professor: Students from AA to GZ Teaching assistants: Alessandro Farasin Francesco Ventura Students from HA to ZZ Teaching assistants: Andrea Pasini Marilisa Montemurro Exam rules Exam rules Academic Year 2018-2019 (pdf) Announcements (21\/01\/2020) The exam scheduled<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/teaching\/big-data-architecture-and-data-analytics-2018-2019\/\">[&#8230;]<\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"parent":96,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-13353","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/13353","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/comments?post=13353"}],"version-history":[{"count":100,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/13353\/revisions"}],"predecessor-version":[{"id":15321,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/13353\/revisions\/15321"}],"up":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/96"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/media?parent=13353"}
],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}