{"id":16570,"date":"2020-09-24T13:16:10","date_gmt":"2020-09-24T12:16:10","guid":{"rendered":"https:\/\/dbdmg.polito.it\/wordpress\/?page_id=16570"},"modified":"2021-10-26T14:26:10","modified_gmt":"2021-10-26T13:26:10","slug":"big-data-architectures-and-data-analytics-2020-2021","status":"publish","type":"page","link":"https:\/\/dbdmg.polito.it\/wordpress\/teaching\/big-data-architectures-and-data-analytics-2020-2021\/","title":{"rendered":"Big Data: Architectures and Data Analytics (2020\/2021)"},"content":{"rendered":"<h3 id=\"tinyTOC\">Table of contents<\/h3>\n<ul>\n<li><a href=\"#General-information-1\">General information<\/a><\/li>\n<li><a href=\"#Exam-rules-1\">Exam rules<\/a><\/li>\n<li><a href=\"#Announcements-1\">Announcements<\/a><\/li>\n<li><a href=\"#Slides-1\">Slides<\/a><\/li>\n<li><a href=\"#Exercises-1\">Exercises<\/a><\/li>\n<li><a href=\"#Practices-1\">Practices<\/a><\/li>\n<li><a href=\"#Exam-Examples-1\">Exam Examples<\/a><\/li>\n<li><a href=\"#Additional-material-1\">Additional material<\/a><\/li>\n<\/ul>\n<h1><span style=\"color: #ff0000;\">This is the old version of the web page of the Big Data course.<\/span><\/h1>\n<h1><span style=\"color: #ff0000;\">Web page of the academic year 2021\/22: <a style=\"color: #ff0000;\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/index.php\/2021\/09\/16\/big-data-architectures-and-data-analytics-2021-2022\/\">link<\/a><\/span><\/h1>\n<h3><span id=\"General-information-1\">General information<\/span><\/h3>\n<ul>\n<li>ECTS: 6<\/li>\n<li>Professor: <a href=\"https:\/\/dbdmg.polito.it\/wordpress\/people\/paolo-garza\/\">Paolo Garza<\/a><\/li>\n<li>Teaching assistants:\n<ul>\n<li>Luca Colomba<\/li>\n<\/ul>\n<ul>\n<li>Francesco Ventura<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Exam-rules-1\">Exam rules<\/span><\/h3>\n<ul>\n<li>Exam rules Academic Year 2020-2021 (<a 
href=\"https:\/\/didattica.polito.it\/pls\/portal30\/gap.pkg_guide.viewGap?p_cod_ins=01QYDPE&amp;p_a_acc=2021&amp;p_header=S&amp;p_lang=IT\">link<\/a>)<\/li>\n<\/ul>\n<h3><span id=\"Announcements-1\">Announcements<\/span><\/h3>\n<ul>\n<li>\u00a0(24\/09\/2020)\n<ul>\n<li><strong>First (online) lecture: Tuesday, September 29 at 13.00 &#8211; <\/strong><strong>Online virtual classroom<\/strong><\/li>\n<\/ul>\n<\/li>\n<li>(24\/09\/2020)\n<ul>\n<li><strong>No lab activities during the first two weeks.<\/strong><\/li>\n<li><strong>The lab activities scheduled for Monday, September 28 from 17:30 to 19:00 and Tuesday, September 29 from 8:30 to 10:00 are cancelled.<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Slides-1\">Slides<\/span><\/h3>\n<ul>\n<li>Introduction to the course content and exam rules (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/00_Intro_BigData_2021.pdf\">slides<\/a>)<\/li>\n<li>Introduction to Big Data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/01_Intro_BigData_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/01_Intro_BigData_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<li>Big Data Architectures (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/02_Architectures_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/02_Architectures_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<li>Hadoop and MapReduce\n<ul>\n<li>Introduction to Apache Hadoop and the MapReduce programming paradigm (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/03_Intro_HadoopAndMapReduce_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/03_Intro_HadoopAndMapReduce_BigData_NewStyle.pdf\">slides &#8211; no black 
background<\/a>)\n<ul>\n<li>Interaction with HDFS and Hadoop by means of the command line (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/03b_HDFS_Hadoop_CommandLine_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/03b_HDFS_Hadoop_CommandLine_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Hadoop implementation of MapReduce (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/04_HadoopImplementationOfMapReduce.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/04_HadoopImplementationOfMapReduce_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>Source code of the Word Count Eclipse project (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/WordCount.zip\">WordCount.zip<\/a>) &#8211; Use the import Maven project option to import it into Eclipse<\/li>\n<li>PDF version of the code (i.e., PDF version of the Java files) (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/WordCountPDF.zip\">WordCountPDF.zip<\/a>)<\/li>\n<li>BigData@Polito environment + Jupyter &#8211; How to submit MapReduce jobs on BigData@Polito (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/04b_ClusterJupyter_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/04b_ClusterJupyter_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce &#8211; Design patterns &#8211; Part 1 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/05_MapReduce_Patterns_Part1_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/05_MapReduce_Patterns_Part1_BigData_NewStyle.pdf\">slides &#8211; no 
black background<\/a>)<\/li>\n<li>MapReduce and Hadoop &#8211; Advanced Topics: Multiple inputs, Multiple outputs, Distributed cache (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/06_AdvancedTopicsMapReduce_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/06_AdvancedTopicsMapReduce_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<li>MapReduce &#8211; Design patterns &#8211; Part 2 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/07_MapReduce_Patterns_Part2_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/07_MapReduce_Patterns_Part2_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<li>MapReduce &#8211; Relational Algebra\/SQL operators (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/08_SQLOperators_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/08_SQLOperators_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark\n<ul>\n<li>Introduction to Apache Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/10_SparkIntroduction_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/10_SparkIntroduction_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>How to submit Spark applications (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/10b_SparkSubmit_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/10b_SparkSubmit_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>RDD-based programs\n<ul>\n<li>RDDs: creation, basic transformations and actions (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/11_SparkRDD_Basic_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/11_SparkRDD_Basic_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<li>Key-value pair RDDs: transformations and actions on PairRDDs (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/12_SparkRDD_PairRDD_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/12_SparkRDD_PairRDD_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<li>DoubleRDDs (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/13_SparkRDD_DoubleRDD_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/13_SparkRDD_DoubleRDD_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<li>Advanced Topics: Cache, accumulators, broadcast variables (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/14_SparkRDD_AdvancedTopics_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/14_SparkRDD_AdvancedTopics_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark SQL, Datasets and DataFrames (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/15_SparkSQL_Datasets_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/15_SparkSQL_Datasets_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<li>Data Mining \u2013 Recap\n<ul>\n<li>Introduction (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/24a-DMintro.pdf\">slides<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib\n<ul>\n<li>Spark MLlib \u2013 (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/16_SparkMLlib_Part1_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/16_SparkMLlib_Part1_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>Logistic Regression example code (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineLogisticRegression.zip\">zip<\/a>)<\/li>\n<li>Decision Trees example code (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/\/MLlibPipelineDecisionTree.zip\">zip<\/a>)<\/li>\n<li>Decision Trees and Categorical class label example code (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/\/MLlibPipelineDecisionTreeCategoricalLabel.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib \u2013 Classification of textual data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/17_SparkMLlib_Part2_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/17_SparkMLlib_Part2_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>Textual data classification example code (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/\/MLlibPipelineText.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib \u2013 Classification and Parameter tuning (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/18_SparkMLlib_Part3_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/18_SparkMLlib_Part3_BigData_NewStyle.pdf\">slides\u00a0 &#8211; no black background<\/a>)\n<ul>\n<li>Parameter tuning example code (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineLogisticRegressionCrossValidation.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib \u2013 Clustering of structured 
data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/19_SparkMLlib_Part4_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/19_SparkMLlib_Part4_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>Clustering example code (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineClustering.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib \u2013 Itemset and Association rule mining (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/20_SparkMLlib_Part5_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/20_SparkMLlib_Part5_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>Itemset and Association rule mining example code (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibFPGrowth.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib \u2013 Linear regression (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/21_SparkMLlib_Part6_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/21_SparkMLlib_Part6_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>Linear regression example code (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/MLlibPipelineLinearRegression.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark Streaming (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/22_SparkStreaming_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/22_SparkStreaming_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>) Last update &#8211; Dec 11, 2020 (Slides 64-69 are new. 
The other slides have not been changed.)\n<ul>\n<li>Word Count \u2013 Streaming version (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCount.zip\">zip<\/a>)<\/li>\n<li>Word Count and Window (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCountWindow.zip\">zip<\/a>)<\/li>\n<li>Word Count \u2013 Stateful version (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCountStateful.zip\">zip<\/a>)<\/li>\n<li>Word Count \u2013 Streaming version \u2013 Read data from HDFS folder (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCountFolder.zip\">zip<\/a>)<\/li>\n<li>Word Count \u2013 Output sort by key \u2013 Based on the transformPair() transformation (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/SparkStreamingWordCountSortByKey.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Relational and Non-relational databases for Big data\n<ul>\n<li>Introduction to relational and non-relational databases for Big data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/23_Rel_vs_NoRelDatabases_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/23_Rel_vs_NoRelDatabases_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Exercises-1\">Exercises<\/span><\/h3>\n<ul>\n<li>MapReduce\n<ul>\n<li>Basic project\n<ul>\n<li>Linux and macOS\n<ul>\n<li>Basic Eclipse project for MapReduce applications (based on Maven) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/MapReduceBasicProject.zip\">MapReduceBasicProject.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Windows\n<ul>\n<li>Setup instructions (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/ConfigureWindowsEnviroment.pdf\">ConfigureWindowsEnviroment.pdf<\/a>)\n<ul>\n<li>You must also install <strong>JDK 1.8<\/strong> and select it for the imported project in Eclipse. If you have already installed a JDK but its version is newer than 1.8, you must install JDK 1.8 as well.<\/li>\n<\/ul>\n<\/li>\n<li>Winutils executable (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/winutils.zip\">winutils.zip<\/a>)<\/li>\n<li>Basic Eclipse project for MapReduce applications (based on Maven) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/MapReduceBasicProjectWindows.zip\">MapReduceBasicProjectWindows.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce exercises (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/01_MapReduce_Exercises_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/01_MapReduce_Exercises_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>Solutions of Exercises 1-12 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Solutions1_12.zip\">Solutions1_12.zip<\/a>)<\/li>\n<li>Solutions of Exercises 13-22 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Solutions13_22.zip\">Solutions13_22.zip<\/a>)<\/li>\n<li>Solutions of Exercises 23-29 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Solutions23_29.zip\">Solutions23_29.zip<\/a>) &#8211; The solution of Exercise 23 Bis has been updated (October 29, 2020)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark\n<ul>\n<li>Spark RDD-, Dataset-, DataFrame-based exercises (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/02_Spark_Exercises_BigData.pdf\">slides<\/a>) (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/02_Spark_Exercises_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>Example data \u2013 One folder with a small data sample for each exercise (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/ExSparkData2021.zip\">ExSparkData.zip<\/a>)<\/li>\n<li>Solutions of Exercises 30-50 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/SolutionsExSpark2021.zip\">SolutionsExSpark.zip<\/a>)\n<ul>\n<li>Ex. 39 Bis &#8211; Comparison between two alternative solutions (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/02Bis_Spark_SolEx39Bis_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/02Bis_Spark_SolEx39Bis_BigData-_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark streaming exercises (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/12\/03_SparkStreaming_Exercises_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/12\/03_SparkStreaming_Exercises_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)\n<ul>\n<li>Solutions of Exercises 51-53 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/12\/SolutionsSparkStreaming.zip\">SolutionsSparkStreaming.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Practices-1\">Practices<\/span><\/h3>\n<ul>\n<li><strong>No lab activities during the first two weeks<\/strong><\/li>\n<\/ul>\n<ul>\n<li>TEAM 1: Students from A to H &#8211; Monday from 5.30 pm to 7 pm<\/li>\n<li>TEAM 2: Students from I to Z &#8211; Tuesday from 8.30 am to 10 am<\/li>\n<\/ul>\n<ul>\n<li>Lab1: Hadoop and MapReduce\n<ul>\n<li>Online virtual lab, for online questions and answers\n<ul>\n<li>Team 1: Monday, October 12 &#8211; 5.30 pm to 7 pm<\/li>\n<li>Team 
2: Tuesday, October 13 &#8211; 8.30 am to 10 am<\/li>\n<\/ul>\n<\/li>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab1_BigData.pdf\">pdf<\/a>)<\/li>\n<li>How to import a MapReduce program and run it locally on your PC using Eclipse + Maven (<a href=\"https:\/\/www.dropbox.com\/s\/niilmhv6k1130dt\/01_ImportProject_LocalRun.mp4?dl=0\">01_ImportProject_LocalRun.mp4<\/a>)<\/li>\n<li>How to create a jar file and execute your application on the remote cluster BigData@Polito (<a href=\"https:\/\/www.dropbox.com\/s\/65xy3hu9qvqp2oc\/02_Jar_ClusterExecution.mp4?dl=0\">02_Jar_ClusterExecution.mp4<\/a>)<\/li>\n<li>Basic project and small example data set\n<ul>\n<li>Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab1.zip\">Lab1.zip<\/a>)<\/li>\n<li>Windows (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab1Windows.zip\">Lab1Windows.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Bigger data set: finefoods_text.txt (<a href=\"https:\/\/www.dropbox.com\/s\/fswdiblx15mhmyo\/finefoods_text.zip?dl=0\">zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Bonus track: <a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab1_SolBonus_1920.zip\">Lab1_SolBonus_1920.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab2: Filter with Hadoop MapReduce\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab2_1920.pdf\">pdf<\/a>)<\/li>\n<li>Skeleton Eclipse project Hadoop \u2013 MapReduce\n<ul>\n<li>Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab2_Skeleton1920.zip\">Lab2_Skeleton1920.zip<\/a>)<\/li>\n<li>Windows (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab2Windows_Skeleton1920.zip\">Lab2Windows_Skeleton1920.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Outputs of the first lab (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/OutputFolderLab1.zip\">OutputFolderLab1.zip<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/OutputFolderLab1BonusTrack.zip\">OutputFolderLab1BonusTrack.zip<\/a>). You can use them to test your application locally on your PC<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab2_Sol1920.zip\">Lab2_Sol1920.zip<\/a><\/li>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/Lab2_SolBonus.zip\">Lab2_SolBonus1920.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab3: Frequently bought\/reviewed together application with Hadoop MapReduce\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab3_1920.pdf\">pdf<\/a>)<\/li>\n<li>Skeleton Eclipse project Hadoop \u2013 MapReduce\n<ul>\n<li>Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab3_Skeleton1920.zip\">Lab3_Skeleton1920.zip<\/a>)<\/li>\n<li>Windows (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab3Windows_Skeleton1920.zip\">Lab3Windows_Skeleton1920.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Input file (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/AmazonTransposedDataset_Sample.txt\">AmazonTransposedDataset_Sample.txt<\/a>)<\/li>\n<li>Expected output\/result (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/resultLab3.txt\">part-r-00000<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Lab3_Sol1920.zip\">Lab3_Sol1920.zip<\/a> \u2013 Three alternative solutions are provided (they differ in efficiency)<\/li>\n<li>Comments on the three uploaded solutions (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/Lab3_DraftSolution_BigData.pdf\">slides<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/Lab3_DraftSolution_BigData_NewStyle.pdf\">slides &#8211; no black background<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab4: Normalized ratings for product recommendations with Hadoop MapReduce\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/Lab4_1920.pdf\">pdf<\/a>)<\/li>\n<li>Sample dataset (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/ReviewsSample.csv\">ReviewsSample.csv<\/a>)<\/li>\n<li>Skeleton Eclipse project Hadoop \u2013 MapReduce\n<ul>\n<li>Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/Lab4_Skeleton1920.zip\">Lab4_Skeleton1920.zip<\/a>)<\/li>\n<li>Windows (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/Lab4Windows_Skeleton1920.zip\">Lab4Windows_Skeleton1920.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Expected output (the input is the large file Reviews.csv) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/risLab4.txt\">resLab4.txt<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/Lab4_Sol2021.zip\">Lab4_Sol.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab5: Filter data and compute basic statistics with Apache Spark\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/Lab5BigData_1920.pdf\">pdf<\/a>)<\/li>\n<li>Sample file (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/SampleLocalFile.csv\">SampleLocalFile.csv<\/a>)<\/li>\n<li>Skeleton Eclipse project Spark (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/Lab5BigData_Template1920.zip\">Lab5BigData_Template1920.zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/Lab5BigData_Sol1920.zip\">Lab5_Sol.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li>Lab6: Frequently bought\/reviewed together application with Apache Spark\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/Lab6BigData_1920.pdf\">pdf<\/a>)<\/li>\n<li>Sample file (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/ReviewsSample.csv\">ReviewsSample.csv<\/a>)<\/li>\n<li>Skeleton Eclipse project Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/Lab6BigData_Template1920.zip\">Lab6BigData_Template1920.zip<\/a>)<\/li>\n<li>Expected output &#8211; Task 1 (the input is the large file Reviews.csv) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/outputTask1Lab6.zip\">outputTask1Lab6.zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/Lab6BigData_Sol1920.zip\">Lab6BigData_Sol1920.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab7: Bike sharing data analysis\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab7BigData_1920.pdf\">pdf<\/a>)<\/li>\n<li>Sample data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/sampleData.zip\">zip<\/a>)<\/li>\n<li>Skeleton Eclipse project Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab7BigData_Template1920.zip\">Lab7BigData_Template1920.zip<\/a>)<\/li>\n<li>Example KML file (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/example.zip\">zip<\/a>)<\/li>\n<li>Expected 
output\n<ul>\n<li>Execution on sample data (sampleData\/registerSample.csv and sampleData\/stations.csv) and minimum criticality<br \/>\nthreshold = 0.4 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/resSampleData0.4.txt\">part-00000<\/a>)<\/li>\n<li>Execution on complete data (\/data\/students\/bigdata-01QYD\/Lab7\/register.csv and \/data\/students\/bigdata-01QYD\/Lab7\/stations.csv) and minimum criticality<br \/>\nthreshold = 0.6 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/resAllData0.6.txt\">part-00000<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Another KML visualizer that can be used to visualize on a map the result of your analysis: <a href=\"http:\/\/kmlviewer.nsspot.net\/\">http:\/\/kmlviewer.nsspot.net<\/a><\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab7BigData_Sol1920.zip\">Lab7BigData_Sol1920.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab8: Bike sharing data analysis based on Spark SQL\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/Lab8BigData_2021.pdf\">pdf<\/a>)<\/li>\n<li>Sample data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/sampleData.zip\">zip<\/a>)<\/li>\n<li>Skeleton Eclipse project Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab8BigData_Template1920.zip\">Lab8BigData_Template1920.zip<\/a>)<\/li>\n<li>Expected output\n<ul>\n<li>Execution on sample data (sampleData\/registerSample.csv and sampleData\/stations.csv) and minimum criticality<br \/>\nthreshold = 0.4 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/out_Lab8sample.zip\">out_Lab8sample.zip<\/a>)<\/li>\n<li>Execution on complete data (\/data\/students\/bigdata-01QYD\/Lab8\/register.csv and \/data\/students\/bigdata-01QYD\/Lab8\/stations.csv) and minimum criticality<br \/>\nthreshold = 
0.6 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/11\/out_Lab8All.zip\">out_Lab8.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab8BigData_Sol1920.zip\">Lab8BigData_Sol.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab9: A classification pipeline with MLlib + SparkSQL\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab9BigData_1920.pdf\">pdf<\/a>)<\/li>\n<li>Sample file with 100 reviews (<a href=\"http:\/\/dbdmg.polito.it\/template_labBigData\/ReviewsSample.csv\">ReviewsSample.csv<\/a>)<\/li>\n<li>Skeleton Eclipse project Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab9BigData_Template1920.zip\">Lab9BigData_Template1920.zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li>Logistic regression (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab9BigData_SolutionLR.zip\">zip<\/a>)<\/li>\n<li>DecisionTree (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab9BigData_SolutionDT.zip\">zip<\/a>)<\/li>\n<li>Logistic regression based on text analysis (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab9BigData_SolutionLR.zip\">zip<\/a>)<\/li>\n<li>DecisionTree based on text analysis (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/Lab9BigData_SolutionDTText.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab10: Tweet analysis \u2013 Spark streaming\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/Lab10BigData_1920.pdf\">pdf<\/a>)<\/li>\n<li>Example files \u2013 tweets (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/exampledata_tweets.zip\">exampledata_tweets.zip<\/a>)<\/li>\n<li>Skeleton Eclipse project Spark (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/Lab10BigData_Template1920.zip\">Lab10BigData_Template1920.zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/Lab10BigData_Sol1920.zip\">Lab10BigData_Sol1920.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Exam-Examples-1\">Exam Examples<\/span><\/h3>\n<p><strong>Please note that, starting from this academic year (2020\/21), <span style=\"color: #ff0000;\">the exam is closed book<\/span><\/strong><\/p>\n\n<ul>\n<li>Exam June 26, 2018\n<ul>\n<li>Exam &#8211; Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/Exam20180626_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (c)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/DraftSolutionv1.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam &#8211; Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/Exam20180626_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/DraftSolutionv2.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 16, 2018\n<ul>\n<li>Exam &#8211; Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/Exam20180716_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (a)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/DraftSolutionv1_20180716.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam &#8211; Version #2 (<a 
href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/Exam20180716_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (d)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/DraftSolutionv2_20180716.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam September 3, 2018\n<ul>\n<li>Exam &#8211; Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/09\/Exam20180903_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/06\/DraftSolutionv1_201809003.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam &#8211; Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/09\/Exam20180903_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/01\/DraftSolutionv2_201809003.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam February 15, 2019\n<ul>\n<li>Exam &#8211; Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/03\/Exam20190215_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/06\/DraftSolutionv1_20190215.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam &#8211; Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/03\/Exam20190215_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: 
(b)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 2, 2019\n<ul>\n<li>Exam &#8211; Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190702_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (a)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/BozzaSoluzionev1_20190702.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam &#8211; Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190702_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (a)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/BozzaSoluzionev2_20190702.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 18, 2019\n<ul>\n<li>Exam &#8211; Version #1 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190718_v1.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/DraftSolutionExam20190718_v1.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam &#8211; Version #2 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190718_v2.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (c)<\/li>\n<li>Question 2: (b)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/DraftSolutionExam20190718_v2.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li><a id=\"exam20200702\"><\/a>Exam July 2, 2020\n<ul>\n<li>Exam (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/BD_Exam20200702.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (a)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/DraftSolutionExam20120702.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li><a id=\"exam20200716\"><\/a>Exam July 16, 2020\n<ul>\n<li>Exam (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/BD_Exam20200716.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (b) &#8211; Note that there are two actions and hence the input file is read two times.<\/li>\n<li>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/DraftSolutionExam20120716.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam September 17, 2020\n<ul>\n<li>Exam (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/09\/BD_Exam20200917.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/09\/DraftSolutionExam20120917.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam February 5, 2021\n<ul>\n<li>Exam (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/BD_Exam20210205.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/DraftSolutionExam20210205.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam June 30, 2021\n<ul>\n<li>Exam (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/07\/BD_Exam20210630.pdf\">pdf<\/a>)\n<ul>\n<li>Draft of the solution\n<ul>\n<li>Question 1: (a)<\/li>\n<li>Question 2: (c)<\/li>\n<li>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/07\/DraftSolutionExam20210630.zip\">zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Additional-material-1\">Additional material<\/span><\/h3>\n<ul>\n<li>Slides and screencasts about Java (kindly provided by Prof. Torchiano) (<a href=\"http:\/\/dbdmg.polito.it\/~paolo\/JavaMaterials\/02JEY%20-%20Object%20Oriented%20Programming.html\">link<\/a>)\n<ul>\n<li>Suggested slides\/lectures for those students who have never used Java\n<ul>\n<li>OO Paradigm and UML (The UML part is not mandatory)<\/li>\n<li>The Java Environment<\/li>\n<li>Java Basic Features<\/li>\n<li>Java Inheritance<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Data mining &#8211; Centralized algorithms\n<ul>\n<li>Data and Preprocessing (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/24b-DMPreProc.pdf\">slides<\/a>)<\/li>\n<li>Itemset mining and Association rules (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/24c-DMassrules.pdf\">slides<\/a>)<\/li>\n<li>Classification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/\/24d-DMClassification.pdf\">slides<\/a>)<\/li>\n<li>Clustering (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/10\/\/24e-DMClustering.pdf\">slides<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n\n<br class=\"fixfloat\" \/>","protected":false},"excerpt":{"rendered":"<p>Table of content General information Exam rules Announcements Slides Exercises Practices Exam Examples Additional material This is the old version of the web page of the Big data course. 
Web page of the academic year 2021\/22: link General information ECTS: 6 Professor: Teaching assistants: Luca Colomba Francesco Ventura Exam rules Exam rules Academic Year 2020-2021<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/teaching\/big-data-architectures-and-data-analytics-2020-2021\/\">[&#8230;]<\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"parent":96,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-16570","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/16570","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/comments?post=16570"}],"version-history":[{"count":90,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/16570\/revisions"}],"predecessor-version":[{"id":18355,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/16570\/revisions\/18355"}],"up":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/96"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/media?parent=16570"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}