{"id":4549,"date":"2022-09-20T14:39:10","date_gmt":"2022-09-20T12:39:10","guid":{"rendered":"https:\/\/dbdmg.polito.it\/dbdmg_web\/?p=4549"},"modified":"2023-01-10T12:20:54","modified_gmt":"2023-01-10T11:20:54","slug":"big-data-architectures-and-data-analytics-2022-2023","status":"publish","type":"post","link":"https:\/\/dbdmg.polito.it\/dbdmg_web\/2022\/big-data-architectures-and-data-analytics-2022-2023\/","title":{"rendered":"Big Data: Architectures and Data Analytics (2022\/2023)"},"content":{"rendered":"\n<h2 class=\"eplus-wrapper wp-block-heading\" id=\"general-information\">General Information<\/h2>\n\n\n\n<p class=\" eplus-wrapper\"><strong>SSD<\/strong>: ING-INF\/05<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>CFU<\/strong>: 6<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Professor<\/strong>: Daniele Apiletti<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Teaching Assistant<\/strong>: Simone Monaco<\/p>\n\n\n\n<p class=\" eplus-wrapper\">Q&amp;A teaching&nbsp;<strong>assistance&nbsp;<\/strong>on Piazza:&nbsp;<a href=\"http:\/\/piazza.com\/polito.it\/fall2022\/01qydov\/\">piazza.com\/polito.it\/fall2022\/01qydov\/<\/a><\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n<h2 class=\"eplus-wrapper wp-block-heading\" id=\"announcements\">Announcements<\/h2>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-0f2521\">\n<li class=\" eplus-wrapper\"><strong>20-09-22:<\/strong> The first lecture is scheduled for September 29, 2022, at 13:00 in Classroom R2.<\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>26-09-22:<\/strong> There are no lab activities during the first weeks of the course; labs will start on <strong>October 11th, 2022<\/strong>.<\/li>\n\n\n\n<li class=\" eplus-wrapper\">We are using Piazza for class discussion, and we invite all students to\u00a0<a href=\"http:\/\/piazza.com\/polito.it\/fall2022\/01qydov\">join the course Piazza<\/a>. 
Piazza is designed to help students get answers quickly and efficiently from both classmates and teachers. Rather than emailing questions to the teaching staff, students are invited to post their questions on Piazza.<\/li>\n<\/ul>\n\n\n<hr class=\"wp-block-separator has-css-opacity eplus-wrapper\"\/>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n<h2 class=\"eplus-wrapper wp-block-heading\" id=\"teaching-material\">Teaching Material<\/h2>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-1cfcef\">\n<li class=\" eplus-wrapper\"><strong>Introduction to the course content and exam rules<\/strong> (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/09\/00_Intro_BigData_2223_Daniele.pdf\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>Introduction to Big Data<\/strong> (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/01_Intro_BigData_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>Big Data Architectures<\/strong> (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/02_Architectures_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>Hadoop and MapReduce <\/strong><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-96f456\">\n<li class=\" eplus-wrapper\">Introduction to Apache Hadoop and the MapReduce programming paradigm (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/03_Intro_HadoopAndMapReduce_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-254611\">\n<li class=\" eplus-wrapper\"><span style=\"background-color: rgba(0, 0, 0, 0.2);\"><mark style=\"background-color:#fff\" class=\"has-inline-color\">Interaction with HDFS and 
Hadoop by means of the command line (<\/mark><\/span><a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/03b_HDFS_Hadoop_CommandLine_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a><span style=\"background-color: rgba(0, 0, 0, 0.2);\"><mark style=\"background-color:#fff\" class=\"has-inline-color\">)<\/mark><\/span><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Hadoop implementation of MapReduce (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/04_HadoopImplementationOfMapReduce_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-859d31\">\n<li class=\" eplus-wrapper\">BigData@Polito environment + Jupyter \u2013 How to submit MapReduce jobs on BigData@Polito (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/04b_ClusterJupyter_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce and Hadoop \u2013 Advanced Topics: Multiple inputs, Multiple outputs, Distributed cache (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/06_AdvancedTopicsMapReduce_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Design patterns \u2013 Part 1 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/05_MapReduce_Patterns_Part1_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Design patterns \u2013 Part 2 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/07_MapReduce_Patterns_Part2_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Relational 
Algebra\/SQL operators (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/08_SQLOperators_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>Spark<\/strong><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-3ba9f7\">\n<li class=\" eplus-wrapper\">Introduction to Apache Spark (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/10_SparkIntroduction_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-33f699\">\n<li class=\" eplus-wrapper\">How to submit Spark applications (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/10b_SparkSubmit_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">RDD-based programs RDDs<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-6eba5a\">\n<li class=\" eplus-wrapper\">Creation, basic transformations and actions (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/11_SparkRDD_Basic_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Key-value pair RDDs: transformations and actions on PairRDDs (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/12_SparkRDD_PairRDD_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">DoubleRDDs (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/13_SparkRDD_DoubleRDD_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Advanced Topics: Cache, accumulators, broadcast variables (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/14_SparkRDD_AdvancedTopics_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark SQL, Datasets and DataFrames (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/15_SparkSQL_Datasets_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-7fce3c\">\n<li class=\" eplus-wrapper\"><span style=\"background-color: rgba(0, 0, 0, 0.2);\"><mark style=\"background-color:#fff\" class=\"has-inline-color\">Spark SQL &#8211; Join examples (<\/mark><\/span><a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/ExamplesSparkSQLJoins.zip\" target=\"_blank\">ExamplesSparkSQLJoins.zip<\/a><span style=\"background-color: rgba(0, 0, 0, 0.2);\"><mark style=\"background-color:#fff\" class=\"has-inline-color\">)<\/mark><\/span><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Data Mining<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-2ee53d\">\n<li class=\" eplus-wrapper\">Recap data mining tasks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/DMintro.pdf\" target=\"_blank\">slides<\/a>) &#8211; From the &#8220;Data Science And Database Technology&#8221; course<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark MLlib<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-78d206\">\n<li class=\" eplus-wrapper\">Introduction and Classification of structured data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/16_SparkMLlib_Part1_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-8f926c\">\n<li class=\" eplus-wrapper\">Logistic Regression example code (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineLogisticRegression.zip\" target=\"_blank\">zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Decision Trees example code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineDecisionTree.zip\" target=\"_blank\">zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Decision Trees and Categorical class label example code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineDecisionTreeCategoricalLabel.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Classification of textual data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/17_SparkMLlib_Part2_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>, <a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineText.zip\" target=\"_blank\">example code zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Classification and Parameter tuning (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/18_SparkMLlib_Part3_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>, <a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineLogisticRegressionCrossValidation.zip\" target=\"_blank\">example code zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Clustering of structured data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/19_SparkMLlib_Part4_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>, <a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineClustering.zip\" target=\"_blank\">example code 
zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Itemset and Association rule mining (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/20_SparkMLlib_Part5_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>, <a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibFPGrowth.zip\" target=\"_blank\">example code zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Linear regression (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/21_SparkMLlib_Part6_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>, <a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineLinearRegression.zip\" target=\"_blank\">example code zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark Streaming (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/22_SparkStreaming_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-ad998e\">\n<li class=\" eplus-wrapper\">Examples: Word Count \u2013 Streaming versions (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/ExamplesSparkStreaming.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<p class=\" eplus-wrapper\"><\/p>\n\n\n\n<h2 class=\"eplus-wrapper wp-block-heading\">Exercises<\/h2>\n\n\n\n<p class=\" eplus-wrapper\"><mark style=\"background-color:rgba(0, 0, 0, 0);color:#fc0303\" class=\"has-inline-color\">If you use your PC to write and run your code, import the projects based on Maven (those projects can be run locally).<br>If you use the PC available in the LAB, import the Eclipse projects with libraries (those projects cannot be run locally but only on the cluster exporting the jar file of the 
project).<\/mark><\/p>\n\n\n\n<h4 class=\"eplus-wrapper wp-block-heading\">MapReduce <\/h4>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-5ceb2f\">\n<li class=\" eplus-wrapper\"><strong>MapReduce exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/01_MapReduce_Exercises_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/strong><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-f5e83e\">\n<li class=\" eplus-wrapper\">Solutions of Exercises 1-29 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/SolutionsExMapReduce.zip\" target=\"_blank\">SolutionsExMapReduce.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>Basic project <\/strong><\/li>\n<\/ul>\n\n\n<figure class=\"is-style-regular wp-block-table eplus-wrapper\"><table><tbody><tr><td><strong>Linux and MacOS<\/strong><\/td><td><strong>Windows<\/strong><\/td><\/tr><tr><td>\u2022 Basic Eclipse project for MapReduce applications (with libraries) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/WordCountLibraries.zip\" target=\"_blank\">MapReduceBasicProjectWithLibraries.zip<\/a>) &#8211; Import using Import\/General\/Existing Projects into Workspace<br>\u2022 Basic Eclipse project for MapReduce applications (based on maven) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProject.zip\" target=\"_blank\">MapReduceBasicProject.zip<\/a>) &#8211; Import it using Import\/Maven\/Existing Maven Projects<\/td><td>\u2022 Basic Eclipse project for MapReduce applications (with libraries) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/WordCountLibraries.zip\" target=\"_blank\"><\/a><a 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/WordCountLibraries.zip\">MapReduceBasicProjectWithLibraries.zip<\/a>) &#8211; Import using Import\/General\/Existing Projects into Workspace<br><br>\u2022 Setup instructions for running MapReduce applications locally inside Eclipse (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/ConfigureWindowsEnviroment.pdf\" target=\"_blank\">ConfigureWindowsEnviroment.pdf<\/a>)<br><em>&#8211; You must also install <strong>JDK 1.8<\/strong> and select it for the imported project inside Eclipse. If you have already installed a JDK but its version is later than 1.8, you must install JDK 1.8 as well.<br>&#8211; Winutils executable (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/winutils.zip\" target=\"_blank\">winutils.zip<\/a>)<br>&#8211; Basic Eclipse project for MapReduce applications (based on Maven) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProjectWindows.zip\" target=\"_blank\">MapReduceBasicProjectWindows.zip<\/a>)<\/em><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\" eplus-wrapper\"><\/p>\n\n\n\n\n\n<h4 class=\"eplus-wrapper wp-block-heading\">Spark<\/h4>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-770a23\">\n<li class=\" eplus-wrapper\"><strong>Spark RDD-, Dataset-, DataFrame-based exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/02_Spark_Exercises_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/strong><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-408bfa\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with a small data sample for each exercise (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/ExampleDataSpark.zip\" target=\"_blank\">ExampleDataSpark.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises 30-50 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/SolutionsSpark30-50.zip\" target=\"_blank\">SolutionsExSpark30-50.zip<\/a>) &#8211; Updated on November 19, 2021 &#8211; Added a second possible solution for Exercise #44 (folder Exercise44 _v2)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises from 32 to 38 and 44 based on Spark SQL (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/SolSparkSQL32-38_44.zip\" target=\"_blank\">SolSparkSQL32-38_44.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>Spark streaming exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/03_SparkStreaming_Exercises_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/strong><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-35e07c\">\n<li class=\" eplus-wrapper\">Solutions of Exercises 51-53 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/SolutionsSparkStreaming51-53.zip\">SolutionsSparkStreaming51_53.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<hr class=\"wp-block-separator has-css-opacity eplus-wrapper\"\/>\n\n\n\n<h3 class=\"eplus-wrapper wp-block-heading\" id=\"laboratory-material\">Laboratory Material<\/h3>\n\n\n\n\n\n<figure class=\"wp-block-table eplus-wrapper\"><table><tbody><tr><td><strong>Student Group<\/strong><\/td><td><strong>Time<\/strong><\/td><td><strong>Room<\/strong><\/td><\/tr><tr><td>Team A: Students from A to L<\/td><td>Tue, 16:00-17:30<\/td><td>Laib1<\/td><\/tr><tr><td>Team B: Students from M to Z<\/td><td>Tue, 17:30-19:00<\/td><td>Laib1<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n<ul 
class=\"eplus-wrapper wp-block-list eplus-styles-uid-0d766a\">\n<li class=\" eplus-wrapper\"><strong>Lab1: Hadoop and MapReduce<\/strong><ul><li>Problem specification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab1_BigData.pdf\" target=\"_blank\">pdf<\/a>)<\/li><li>Basic project and small example data set (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab1_BigData_with_libraries.zip\" target=\"_blank\">Lab1_BigData_with_libraries.zip<\/a>)<\/li><li>Basic project based on Maven &#8211; Use this version of the project to run the MapReduce application locally on your own PC<ul><li>Import it using Import\/Maven\/Existing Maven Projects<ul><li>Linux and macOS (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab1.zip\" target=\"_blank\">Lab1.zip<\/a>)<\/li><li>Windows (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab1Windows.zip\" target=\"_blank\">Lab1Windows.zip<\/a>) <\/li><\/ul><\/li><li>Bigger data set: finefoods_text.txt (<a rel=\"noreferrer noopener\" href=\"https:\/\/www.dropbox.com\/s\/fswdiblx15mhmyo\/finefoods_text.zip?dl=0\" target=\"_blank\">zip<\/a>)<\/li><\/ul><\/li><\/ul><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-00909a\">\n<li class=\" eplus-wrapper\">Solution Bonus track<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-ca2f82\">\n<li class=\" eplus-wrapper\"><a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab1_SolBonusMvn.zip\" target=\"_blank\">Lab1_SolBonusMvn.zip<\/a> &#8211; The project is based on mvn <\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-8abbcd\">\n<li class=\" eplus-wrapper\"><strong>Lab2: Filter with Hadoop MapReduce<\/strong><ul><li>Problem specification (<a 
rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2_2021.pdf\" target=\"_blank\">pdf<\/a>)<\/li><li>Skeleton Eclipse project Hadoop \u2013 MapReduce<ul><li>Version with libraries: <a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2_Skeleton_with_libraries.zip\" target=\"_blank\">Lab2_Skeleton_with_libraries.zip<\/a><\/li><li>Maven project, Linux and macOS (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2_Skeleton.zip\" target=\"_blank\">Lab2_Skeleton.zip<\/a>)<\/li><li>Maven project, Windows (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2Windows_Skeleton.zip\" target=\"_blank\">Lab2Windows_Skeleton.zip<\/a>)<\/li><\/ul><\/li><\/ul><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-70301c\">\n<li class=\" eplus-wrapper\">Outputs of the first lab (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/OutputFolderLab1.zip\" target=\"_blank\">OutputFolderLab1.zip<\/a>) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/OutputFolderLab1BonusTrack.zip\" target=\"_blank\">OutputFolderLab1BonusTrack.zip<\/a>). 
You can use them to test your application locally on your own PC if you are using Maven<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-dd80db\">\n<li class=\" eplus-wrapper\">Solution: <a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2_Sol.zip\" target=\"_blank\">Lab2_Sol.zip<\/a> &#8211; The project is based on mvn<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution Bonus track: <a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2_SolBonus.zip\" target=\"_blank\">Lab2_SolBonus.zip<\/a> &#8211; The project is based on mvn <\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-1af49d\">\n<li class=\" eplus-wrapper\"><strong>Lab3: Frequently bought\/reviewed together application with Hadoop MapReduce <\/strong><ul><li>Problem specification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab3_2021.pdf\" target=\"_blank\">pdf<\/a>)<\/li><\/ul><ul><li>Skeleton Eclipse project Hadoop \u2013 MapReduce<ul><li>Version with libraries (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab3_Skeleton_with_libraries.zip\" target=\"_blank\">Lab3_Skeleton_with_libraries.zip<\/a>)<\/li><li>Maven project, Linux and macOS (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab3_Skeleton.zip\" target=\"_blank\">Lab3_Skeleton.zip<\/a>)<\/li><li>Maven project, Windows (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab3Windows_Skeleton.zip\" target=\"_blank\">Lab3Windows_Skeleton.zip<\/a>)<\/li><\/ul><\/li><li>Sample file (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/AmazonTransposedDataset_Sample.txt\" 
target=\"_blank\">AmazonTransposedDataset_Sample.txt<\/a>)<\/li><\/ul><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-2a1398\">\n<li class=\" eplus-wrapper\">Solution<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-e90816\">\n<li class=\" eplus-wrapper\"><a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab3_Sol.zip\" target=\"_blank\">Lab3_Sol.zip<\/a> &#8211; The project is based on mvn <\/li>\n\n\n\n<li class=\" eplus-wrapper\">Comments on the three uploaded solutions (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab3_DraftSolution_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-7d6878\">\n<li class=\" eplus-wrapper\"><strong>Lab4: Normalized ratings for product recommendations with Hadoop MapReduce<\/strong><ul><li>Problem specification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab4_2021.pdf\" target=\"_blank\">pdf<\/a>)<\/li><li>Skeleton Eclipse project Hadoop \u2013 MapReduce<ul><li>Version with libraries (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab4_Skeleton_with_libraries.zip\" target=\"_blank\">Lab4_Skeleton_with_libraries.zip<\/a>)<\/li><li>Maven project, Linux and macOS (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab4_Skeleton.zip\" target=\"_blank\">Lab4_Skeleton.zip<\/a>)<\/li><li>Maven project, Windows (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab4Windows_Skeleton.zip\" target=\"_blank\">Lab4Windows_Skeleton.zip<\/a>)<\/li><\/ul><\/li><\/ul><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-83059c\">\n<li class=\" eplus-wrapper\">Sample file (<a 
rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/ReviewsSample.csv\" target=\"_blank\">ReviewsSample.csv<\/a>) <\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab4_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">zip<\/a> &#8211; The project is based on mvn)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>Lab5: Filter data and compute basic statistics with Apache Spark<\/strong><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-13ca2e\">\n<li class=\" eplus-wrapper\">Problem specification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab5_2021.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Skeleton Eclipse project \u2013 Spark <ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-06c24e\">\n<li class=\" eplus-wrapper\">Version with libraries (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab5_Skeleton_with_libraries.zip\" target=\"_blank\">Lab5_Skeleton_with_libraries.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Maven project Linux, macOS, Windows (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab5BigData_Skeleton.zip\" target=\"_blank\">Lab5_Skeleton.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Sample file (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/SampleLocalFile.csv\" target=\"_blank\">SampleLocalFile.csv<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab5BigData_Sol.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list 
eplus-styles-uid-7acda7\">\n<li class=\" eplus-wrapper\"><strong>Lab6: Frequently bought\/reviewed together application with Apache Spark<\/strong>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-df9a06\">\n<li class=\" eplus-wrapper\">Problem specification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab6_2023.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Skeleton Eclipse project \u2013 Spark<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-ce0c67\">\n<li class=\" eplus-wrapper\">Version with libraries (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab6_Skeleton_with_libraries.zip\" target=\"_blank\">Lab6_Skeleton_with_libraries.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Maven project Linux, macOS, Windows (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab6BigData_Skeleton.zip\" target=\"_blank\">Lab6_Skeleton.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Sample file (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/ReviewsSample.csv\" target=\"_blank\">ReviewsSample.csv<\/a>)<\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-eb3c99\">\n<li class=\" eplus-wrapper\">Expected output \u2013 Task 1 (expected output if the input is the HDFS file Reviews.csv) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/outputTask1Lab6.zip\" target=\"_blank\">outputTask1Lab6.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab6BigData_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n<ul 
class=\"eplus-wrapper wp-block-list eplus-styles-uid-96c152\">\n<li class=\" eplus-wrapper\"><strong>Lab7: Bike sharing data analysis<\/strong>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-efcc35\">\n<li class=\" eplus-wrapper\">Problem specification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab7_2023.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Skeleton Eclipse project \u2013 Spark<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-94b1c1\">\n<li class=\" eplus-wrapper\">Version with libraries (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab7_Skeleton_with_libraries.zip\" target=\"_blank\">Lab7_Skeleton_with_libraries.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Maven project Linux, macOS, Windows (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab7_Skeleton.zip\" target=\"_blank\">Lab7_Skeleton.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Sample data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/sampleData.zip\" target=\"_blank\">sampleData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Example KML file (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/exampleKML.zip\" target=\"_blank\">exampleKML.zip<\/a>)<\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-c8b779\">\n<li class=\" eplus-wrapper\">Expected output<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-62fc03\">\n<li class=\" eplus-wrapper\">Execution on sample data (sampleData\/registerSample.csv and sampleData\/stations.csv) and minimum criticality threshold = 0.4 (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/resSampleData0.4-1.txt\" target=\"_blank\">part-00000<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Execution on complete data (\/data\/students\/bigdata-01QYD\/Lab7\/register.csv and \/data\/students\/bigdata-01QYD\/Lab7\/stations.csv) and minimum criticality threshold = 0.6 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/resAllData0.6-1.txt\" target=\"_blank\">part-00000<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab7BigData_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-23c63a\">\n<li class=\" eplus-wrapper\"><strong>Lab8: Bike sharing data analysis based on Spark SQL<\/strong>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-3c3052\">\n<li class=\" eplus-wrapper\">Problem specification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab8.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Skeleton Eclipse project \u2013 Spark<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-658db4\">\n<li class=\" eplus-wrapper\">Version with libraries (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab8_Skeleton_with_libraries.zip\" target=\"_blank\">Lab8_Skeleton_with_libraries.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Maven project Linux, macOS, Windows (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab8_Skeleton.zip\" target=\"_blank\">Lab8_Skeleton.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list 
eplus-styles-uid-6b1217\">\n<li class=\" eplus-wrapper\">Sample data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/sampleData-1.zip\" target=\"_blank\">sampleData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab8_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-0df3c9\">\n<li class=\" eplus-wrapper\"><strong>Lab9: A classification pipeline with MLlib + SparkSQL<\/strong>  \n\n\n\n\n\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-56904e\">\n<li class=\" eplus-wrapper\">Problem specification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab9.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Skeleton Eclipse project \u2013 Spark<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-b6e151\">\n<li class=\" eplus-wrapper\">Version with libraries (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab9_Skeleton_with_libraries.zip\" target=\"_blank\">Lab9_Skeleton_with_libraries.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Maven project (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab9_Skeleton.zip\" target=\"_blank\">Lab9_Skeleton.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-74b4c2\">\n<li class=\" eplus-wrapper\">Sample file with 100 reviews (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/ReviewsSample.csv\" target=\"_blank\">ReviewsSample.csv<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-403714\">\n<li 
class=\" eplus-wrapper\">Logistic regression (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab9_SolLR.zip\" target=\"_blank\">zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">DecisionTree (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab9_SolDT.zip\" target=\"_blank\">zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Logistic regression based on text analysis (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab9_SolLRText.zip\" target=\"_blank\">zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">DecisionTree based on text analysis (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab9_SolDTText.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>Lab10: Tweet analysis \u2013 Spark streaming<\/strong> <ul><li>Problem specification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab10.pdf\" target=\"_blank\">pdf<\/a>)<\/li><\/ul><ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-bed9f5\">\n<li class=\" eplus-wrapper\">Skeleton Eclipse project \u2013 Spark<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-2284a4\">\n<li class=\" eplus-wrapper\">Version with libraries (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab10_Skeleton_with_libraries.zip\" target=\"_blank\">Lab10_Skeleton_with_libraries.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Maven project (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab10_Skeleton.zip\" target=\"_blank\">Lab10_Skeleton.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Example files \u2013 tweets (<a rel=\"noreferrer 
noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/exampledata_tweets.zip\" target=\"_blank\">exampledata_tweets.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-70d93d\">\n<li class=\" eplus-wrapper\"><a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/Lab10_Sol.zip\" target=\"_blank\">Lab10_sol.zip<\/a><\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<hr class=\"wp-block-separator has-css-opacity eplus-wrapper\"\/>\n\n\n\n<h3 class=\"eplus-wrapper wp-block-heading\" id=\"exam-examples\">Exam examples<\/h3>\n\n\n\n\n\n<p class=\" eplus-wrapper\">Note that, starting from the academic year 2022\/23, the exam is <strong><mark style=\"background-color:rgba(0, 0, 0, 0);color:#f30a0a\" class=\"has-inline-color\">open book<\/mark><\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-table eplus-wrapper\"><table><tbody><tr><td><strong>Text<\/strong><\/td><td><strong>Solutions<\/strong><\/td><\/tr><tr><td>Spark Streaming &#8211; Examples of multiple choice questions (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/ExamplesMultipleChoiceQuestions.pdf\" target=\"_blank\">pdf<\/a>)<\/td><td>Question 1: (c)<br>Question 2: (d)<br>Question 3: (b)<\/td><\/tr><tr><td>Exam June 30, 2017 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/06\/Exam20170630_v1.pdf\">pdf<\/a>)<\/td><td>Question 1: (b)<br>Question 2: (c)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Exam20170630.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam July 14, 2017 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/07\/Exam20170714_v1.pdf\">pdf<\/a>)<\/td><td>Question 1: (d)<br>Question 2: (c)<br>Source code\/Eclipse projects (<a
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Exam20170714.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam September 14, 2017 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/09\/Exam20170914_v1.pdf\">pdf<\/a>)<\/td><td>Question 1: (a)<br>Question 2: (b)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/Exam20170914.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam June 26, 2018 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/Exam20180626_v1.pdf\">pdf<\/a>)<\/td><td>Question 1: (c)<br>Question 2: (c)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/06\/DraftSolutionv1.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam July 16, 2018 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/Exam20180716_v1.pdf\">pdf<\/a>)<\/td><td>Question 1: (d)<br>Question 2: (a)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/07\/DraftSolutionv1_20180716.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam September 3, 2018 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/09\/Exam20180903_v1.pdf\">pdf<\/a>)<\/td><td>Question 1: (d)<br>Question 2: (c)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/06\/DraftSolutionv1_201809003.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam February 15, 2019 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/03\/Exam20190215_v1.pdf\">pdf<\/a>)<\/td><td>Question 1: (d)<br>Question 2: (c)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/06\/DraftSolutionv1_20190215.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam July 2, 2019 (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190702_v1.pdf\">pdf<\/a>)<\/td><td>Question 1: (a)<br>Question 2: (b)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/BozzaSoluzionev1_20190702.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam July 18, 2019 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/Exam20190718_v1.pdf\">pdf<\/a>)<\/td><td>Question 1: (b)<br>Question 2: (b)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2019\/07\/DraftSolutionExam20190718_v1.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam July 2, 2020 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/BD_Exam20200702.pdf\">pdf<\/a>)<\/td><td>Question 1: (b)<br>Question 2: (a)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/DraftSolutionExam20120702.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam July 16, 2020 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/BD_Exam20200716.pdf\">pdf<\/a>)<\/td><td>Question 1: (b)<br>Question 2: (b) \u2013 Note that there are two actions and hence the input file is read two times.<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/DraftSolutionExam20120716.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam September 17, 2020 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/09\/BD_Exam20200917.pdf\">pdf<\/a>)<\/td><td>Question 1: (d)<br>Question 2: (c)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/09\/DraftSolutionExam20120917.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam February 5, 2021 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/BD_Exam20210205.pdf\">pdf<\/a>)<\/td><td>Question 1: (b)<br>Question 2: (c)<br>Source 
code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/DraftSolutionExam20210205.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam June 30, 2021 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/BD_Exam20210630.pdf\" target=\"_blank\">pdf<\/a>)<\/td><td>Question 1: (a)<br>Question 2: (c)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/07\/DraftSolutionExam20210630.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam February 2, 2022 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/BD_Exam20220202_v1.pdf\" target=\"_blank\">pdf<\/a>)<\/td><td>Question 1: (b)<br>Question 2: (d)<br>Source code\/Eclipse projects (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/Draft20200202.zip\" target=\"_blank\">zip<\/a>)<\/td><\/tr><tr><td>Exam February 21, 2022 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/BD_Exam20220221_v2.pdf\">pdf<\/a>)<\/td><td>Question 1: (d)<br>Question 2: (d)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/Draft20200221.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam July 4, 2022 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/07\/BD_Exam20220704.pdf\">pdf<\/a>)<\/td><td>Question 1: (c)<br>Question 2: (d)<br>Source code\/Eclipse projects (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/07\/Draft20220704.zip\">zip<\/a>)<\/td><\/tr><tr><td>Exam September 6, 2022 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/09\/BD_Exam20220906.pdf\" target=\"_blank\">pdf<\/a>)<\/td><td>Question 1: (c)<br>Question 2: (d)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity
eplus-wrapper\"\/>\n\n\n\n<h3 class=\"eplus-wrapper wp-block-heading\" id=\"additional-material\">Additional material<\/h3>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-8e3be5\">\n<li class=\" eplus-wrapper\">Slides and screencasts about Java (kindly provided by Prof. Torchiano) (<a href=\"http:\/\/dbdmg.polito.it\/~paolo\/JavaMaterials\/02JEY%20-%20Object%20Oriented%20Programming.html\">link<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-9f0a99\">\n<li class=\" eplus-wrapper\">Suggested slides\/lectures for those students who have never used Java<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-300e21\">\n<li class=\" eplus-wrapper\">OO Paradigm and UML (the UML part is not mandatory)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">The Java Environment<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Java Basic Features<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Java Inheritance<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<div class=\"wp-block-buttons eplus-wrapper is-layout-flex wp-block-buttons-is-layout-flex\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>General Information SSD: ING-INF\/05 CFU: 6 Professor: Daniele Apiletti Teaching Assistant: Simone Monaco Q&amp;A teaching&nbsp;assistance&nbsp;on Piazza:&nbsp;piazza.com\/polito.it\/fall2022\/01qydov\/ Announcements Teaching Material Exercises If you use your PC to write and run your code, import the projects based on Maven (those projects can be run locally). If you use the PC available in the
&hellip;<\/p>\n","protected":false},"author":11,"featured_media":1517,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"editor_plus_copied_stylings":"{}","footnotes":""},"categories":[37],"tags":[],"class_list":["post-4549","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-courses"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/4549","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/comments?post=4549"}],"version-history":[{"count":67,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/4549\/revisions"}],"predecessor-version":[{"id":7264,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/4549\/revisions\/7264"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/media\/1517"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/media?parent=4549"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/categories?post=4549"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/tags?post=4549"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}