{"id":17693,"date":"2021-02-20T18:25:37","date_gmt":"2021-02-20T17:25:37","guid":{"rendered":"https:\/\/dbdmg.polito.it\/wordpress\/?page_id=17693"},"modified":"2021-09-23T08:07:42","modified_gmt":"2021-09-23T07:07:42","slug":"distributed-architectures-for-big-data-processing-and-analytics-2020-2021","status":"publish","type":"page","link":"https:\/\/dbdmg.polito.it\/wordpress\/teaching\/distributed-architectures-for-big-data-processing-and-analytics-2020-2021\/","title":{"rendered":"Distributed architectures for big data processing and analytics (2020\/2021)"},"content":{"rendered":"<h3 id=\"tinyTOC\">Table of contents<\/h3>\n<ul>\n<li><a href=\"#General-information-1\">General information<\/a><\/li>\n<li><a href=\"#Exam-rules-1\">Exam rules<\/a><\/li>\n<li><a href=\"#Slides-1\">Slides<\/a><\/li>\n<li><a href=\"#Exercises-1\">Exercises<\/a><\/li>\n<li><a href=\"#Practices-1\">Practices<\/a><\/li>\n<li><a href=\"#Exam-Examples-1\">Exam Examples<\/a><\/li>\n<li><a href=\"#Additional-material-1\">Additional material<\/a><\/li>\n<\/ul>\n<h3><strong><span style=\"color: #ff0000;\">Please note that this is the web page for the academic year 2020\/2021<\/span><\/strong><\/h3>\n<h3><span id=\"General-information-1\">General information<\/span><\/h3>\n<ul>\n<li>ECTS: 8<\/li>\n<li>Professor: <a href=\"https:\/\/dbdmg.polito.it\/wordpress\/people\/paolo-garza\/\">Paolo Garza<\/a><\/li>\n<li>Teaching assistant: Luca Colomba<\/li>\n<\/ul>\n<h3><span id=\"Exam-rules-1\">Exam rules<\/span><\/h3>\n<ul>\n<li>Exam rules Academic Year 2020-2021 (<a href=\"https:\/\/didattica.polito.it\/pls\/portal30\/gap.pkg_guide.viewGap?p_cod_ins=01TUYSM&amp;p_a_acc=2021&amp;p_header=S&amp;p_lang=IT\">exam rules<\/a>)<\/li>\n<\/ul>\n<h3><span id=\"Slides-1\">Slides<\/span><\/h3>\n<ul>\n<li>Introduction to the course content and exam rules (<a
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/00_Intro_DistributedBigData_2021.pdf\">pdf<\/a>)<\/li>\n<li>Introduction to Big Data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/01_Intro_BigData_BigData.pdf\">pdf<\/a>)<\/li>\n<li>Big Data Architectures (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/02_Architectures_BigData.pdf\">pdf<\/a>)<\/li>\n<li>Hadoop and MapReduce\n<ul>\n<li>Introduction to Apache Hadoop and the MapReduce programming paradigm (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/03_Intro_HadoopAndMapReduce_BigData.pdf\">pdf<\/a>)\n<ul>\n<li>Interaction with HDFS and Hadoop by means of the command line (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/03b_HDFS_Hadoop_CommandLine_BigData.pdf\">pdf<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Hadoop implementation of MapReduce (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/04_HadoopImplementationOfMapReduce.pdf\">pdf<\/a>)\n<ul>\n<li>Source code of the Word Count Eclipse project (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/WordCount.zip\">WordCount.zip<\/a>) &#8211; Use the import Maven project option to import it into Eclipse<\/li>\n<li>PDF version of the code (i.e., PDF version of the Java files) (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/WordCountPDF.zip\">WordCountPDF.zip<\/a>)<\/li>\n<li>BigData@Polito environment + Jupyter &#8211; How to submit MapReduce jobs on BigData@Polito (<a
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/04b_ClusterJupyter_BigData.pdf\">pdf<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>MapReduce &#8211; Design patterns &#8211; Part 1 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/05_MapReduce_Patterns_Part1_BigData.pdf\">pdf<\/a>)<\/li>\n<li>MapReduce and Hadoop &#8211; Advanced Topics: Multiple inputs, Multiple outputs, Distributed cache (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/06_AdvancedTopicsMapReduce_BigData.pdf\">pdf<\/a>)<\/li>\n<li>MapReduce &#8211; Design patterns &#8211; Part 2 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/07_MapReduce_Patterns_Part2_BigData.pdf\">pdf<\/a>)<\/li>\n<li>MapReduce &#8211; Relational Algebra\/SQL operators (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/08_SQLOperators_BigData.pdf\">pdf<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark\n<ul>\n<li>Introduction to Apache Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/10_SparkIntroduction_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>How to submit Spark applications (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/10b_SparkSubmit_DistributedBigDataNB.pdf\">pdf<\/a>)<\/li>\n<li>How to use Jupyter notebooks for your Spark applications (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/10c_JupyterNotebooks_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>A useful online tutorial for those who want to install and run Spark locally on their PCs (tested on Linux) &#8211; &#8220;How to use PySpark on your computer&#8221; by Favio V\u00e1zquez (<a href=\"https:\/\/towardsdatascience.com\/how-to-use-pyspark-on-your-computer-9c7180075617\">link<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>RDD-based
programs\n<ul>\n<li>RDDs: creation, basic transformations and actions (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/11_SparkRDDBasedProgramming_DistributedBigDataNB.pdf\">pdf<\/a>)<\/li>\n<li>Key-value RDDs: transformations and actions on key-value RDDs (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/12_SparkPairRDD_DistributedBigDataNB.pdf\">pdf<\/a>)<\/li>\n<li>DoubleRDDs (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/13_SparkDoubleRDD_DistributedBigDataNB.pdf\">pdf<\/a>)<\/li>\n<li>Advanced Topics: Cache, accumulators, broadcast variables (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/14_SparkRDDBasedProgramming_AdvancedTopics_DistributedBigDataNB.pdf\">pdf<\/a>)<\/li>\n<li>Advanced Topics &#8211; Part II: Custom partitioners, broadcast join (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/15_SparkRDDBasedProgramming_AdvancedTopicsII_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>RDD partition examples (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/RDDPartitionsExamples.zip\">RDDPartitionsExamples.zip<\/a>)<\/li>\n<li>PageRank example (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/RDDPageRank.zip\">RDDPageRank.zip<\/a>)<br \/>\nIntroduction to PageRank (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/15b_SparkIntroPageRank.pdf\">pdf<\/a>) &#8211; Uploaded on April 19, 2021<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark SQL and DataFrames\n<ul>\n<li>Spark SQL (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/16_SparkSQL_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Simple examples &#8211; Jupyter notebook (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/SparkSQLSimpleExamples.zip\">SparkSQLSimpleExamples.zip<\/a>)<\/li>\n<li>Spark SQL join 
examples &#8211; Jupyter notebook (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/ExamplesSparkSQLJoins.zip\">ExamplesSparkSQLJoins.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark SQL &#8211; Part II (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/17_SparkSQL_PartII_DistributedBigDataNB.pdf\">pdf<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>Data mining and Machine learning algorithms with Spark\n<ul>\n<li>MLlib\n<ul>\n<li>Introduction and Preprocessing (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/18a_SparkMLlib_DistributedBigDataNB.pdf\">pdf<\/a>)<\/li>\n<li>Classification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/18b_SparkMLlib_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Classification examples &#8211; Jupyter notebooks and sample data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/ExampleClassificationMLlib.zip\">ExampleClassificationMLlib.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Clustering (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/18c_SparkMLlib_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Clustering example &#8211; Jupyter notebook and sample data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/ExampleClusteringMLlib.zip\">ExampleClusteringMLlib.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Regression (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/18d_SparkMLlib_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Regression example &#8211; Jupyter notebook and sample data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/ExampleRegressionMLlib.zip\">ExampleRegressionMLlib.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Itemset and Association rule mining (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/18e_SparkMLlib_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Itemset and Association rule mining example &#8211; Jupyter notebook and sample data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/ExampleItemsetMLlib.zip\">ExampleItemsetMLlib.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>GraphX\/GraphFrames\n<ul>\n<li>Introduction to GraphX and GraphFrames (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/19_SparkGraphFrame_PartI_DistributedBigDataNB.pdf\">pdf<\/a>)<\/li>\n<li>Graph Algorithms with GraphFrames (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/20_SparkGraphFrame_Algorithms_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Simple example &#8211; Jupyter notebook (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/GraphFrameExamples.zip\">GraphFrameExamples.zip<\/a>)\n<ul>\n<li>Select kernel GraphFrames (Yarn) to run it on jupyter.polito.it<\/li>\n<li>Run &#8220;pyspark &#8211;packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 &#8211;repositories https:\/\/repos.spark-packages.org&#8221; to run it locally on your PC\n<ul>\n<li>Use package graphframes:graphframes:0.8.0-spark2.4-s_2.11 if you locally installed Spark 2 instead of Spark 3<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Streaming data analytics\n<ul>\n<li>Spark Streaming\n<ul>\n<li>Spark Streaming (DStreams) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/21_SparkStreaming_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Simple examples &#8211; Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/SparkSteamingExamples.zip\">SparkSteamingExamples.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Structured Streaming (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/22_SparkStructuredStreaming_DistributedBigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Simple examples &#8211; Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/ExampleStructutedStreaming.zip\">SparkStructutedStreamingExamples.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Introduction to other big stream processing frameworks: Apache Storm, Apache Flink, ... (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/23_StreamingFrameworks_DistributedBigDataNB.pdf\">pdf<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n<h3><span id=\"Exercises-1\">Exercises<\/span><\/h3>\n<ul>\n<li>MapReduce\n<ul>\n<li>MapReduce exercises (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/02\/01_MapReduce_Exercises_BigData.pdf\">pdf<\/a>)\n<ul>\n<li>Solutions of Exercises 1-12 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Solutions1_12.zip\">Solutions1_12.zip<\/a>)<\/li>\n<li>Solutions of Exercises 13-22 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Solutions13_22.zip\">Solutions13_22.zip<\/a>)<\/li>\n<li>Solutions of Exercises 23-29 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/Solutions23_29.zip\">Solutions23_29.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Basic project\n<ul>\n<li>Linux and macOS\n<ul>\n<li>Basic Eclipse project for MapReduce applications (based on Maven) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/MapReduceBasicProject.zip\">MapReduceBasicProject.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Windows\n<ul>\n<li>Setup instructions (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/ConfigureWindowsEnviroment.pdf\">ConfigureWindowsEnviroment.pdf<\/a>)\n<ul>\n<li>You must also install <strong>JDK 1.8<\/strong> and select it for the imported project inside Eclipse.
If you have already installed a JDK but its version is later than 1.8, you must also install JDK 1.8.<\/li>\n<\/ul>\n<\/li>\n<li>Winutils executable (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/winutils.zip\">winutils.zip<\/a>)<\/li>\n<li>Basic Eclipse project for MapReduce applications (based on Maven) (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/MapReduceBasicProjectWindows.zip\">MapReduceBasicProjectWindows.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Spark\n<ul>\n<li>Spark RDD-, DataFrame-based exercises (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/02_Spark_Exercises_BigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Example data &#8211; One folder with (few) data for each exercise (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/ExSparkData30_46.zip\">ExSparkData.zip<\/a>)<\/li>\n<li>Solutions of Exercises 30-36 &#8211; Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/SparkNotebooksSol30_36.zip\">SparkNotebooksSol30_36.zip<\/a>)<\/li>\n<li>Solutions of Exercises 37-42 &#8211; Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/04\/SparkNotebooksSol37_42.zip\">SparkNotebooksSol37_42.zip<\/a>)\n<ul>\n<li>Exercises 37-38 \u2013 Spark SQL-based solutions \u2013 Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/SparkNotebooksSol37_38DataframeSQL.zip\">SparkNotebooksSol37_38DataframeSQL.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Solutions of Exercises 43-46 &#8211; Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/SparkNotebooksSol43_46.zip\">SparkNotebooksSol43_46.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark SQL exercises (<a
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/02_Spark_ExerciseSparkSQLNB.pdf\">pdf<\/a>)\n<ul>\n<li>Example data &#8211; One folder with (few) data for each exercise (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/ExSparkSQLData.zip\">ExSparkSQLData.zip<\/a>)<\/li>\n<li>Solutions of Exercises 47-50 &#8211;\u00a0 Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/SparkNotebooksSol47_50.zip\">SparkNotebooksSol47_50.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark MLlib exercises (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/03_MLlib_Exercises_BigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Example data &#8211; One folder with (few) data for each exercise (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/ExampleMLlibData.zip\">ExampleMLlibData.zip<\/a>)<\/li>\n<li>Solutions of Exercise 51 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/SparkNotebooksSol51.zip\">SparkNotebooksSol51.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>GraphFrame exercises (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/04_GraphFrame_Exercises_BigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Example data &#8211; One folder with (few) data for each exercise (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/ExampleGraphFrameData.zip\">ExampleGraphFrameData.zip<\/a>)<\/li>\n<li>Solutions of Exercises 52-57 &#8211; Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/SparkNotebooksSol52_57.zip\">SparkNotebooksSol52_57.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark streaming exercises (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/05_SparkStreaming_Exercises_BigDataNB.pdf\">pdf<\/a>)\n<ul>\n<li>Example data &#8211; One folder with (few) data for each exercise (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/ExampleSparkStreamingData-1.zip\">ExampleSparkStreamingData.zip<\/a>)<\/li>\n<li>Solutions of Exercises 58-65 &#8211; Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/05\/SparkNotebooksSol58_65.zip\">SparkNotebooksSol58_65.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Spark structured streaming and MLlib exercise (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/06_SparkStructuredStreamingAndMLlib_ExercisesNB.pdf\">pdf<\/a>)\n<ul>\n<li>Example data &#8211; One folder with (few) data for each exercise (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/ExampleSparkStructuredMLlibData.zip\">ExampleSparkStructuredMLlibData.zip<\/a>)<\/li>\n<li>Solution of Exercise 66 &#8211; Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/SparkNotebooksSol66.zip\">SparkNotebooksSol66.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li style=\"list-style-type: none;\"><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li style=\"list-style-type: none;\"><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Practices-1\">Practices<\/span><\/h3>\n<ul>\n<li><strong>No lab activities during the first week<\/strong><\/li>\n<li>TEAM 1: Students from A to L \u2013 Friday from 2.30 pm to 4 pm<\/li>\n<li>TEAM 2: Students from M to Z \u2013 Friday from 4 pm to 5.30 pm<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<ul>\n<li>Lab1: Hadoop and MapReduce (<strong>Friday, March 12<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab1_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>How to import and run locally on your PC a MapReduce program by using Eclipse + maven (<a 
href=\"https:\/\/www.dropbox.com\/s\/niilmhv6k1130dt\/01_ImportProject_LocalRun.mp4?dl=0\">01_ImportProject_LocalRun.mp4<\/a>)<\/li>\n<li>How to create a jar file and execute your application on the remote cluster BigData@Polito (<a href=\"https:\/\/www.dropbox.com\/s\/65xy3hu9qvqp2oc\/02_Jar_ClusterExecution.mp4?dl=0\">02_Jar_ClusterExecution.mp4<\/a>)<\/li>\n<li>Basic project and small example data set\n<ul>\n<li>Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab1.zip\">Lab1.zip<\/a>)<\/li>\n<li>Windows (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab1Windows.zip\">Lab1Windows.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Bonus task &#8211; Skeleton Eclipse project Hadoop \u2013 MapReduce\n<ul>\n<li>Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab1Bonus_Skeleton.zip\">Lab1Bonus_Skeleton.zip<\/a>)<\/li>\n<li>Windows (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab1WindowsBonus_Skeleton.zip\">Lab1WindowsBonus_Skeleton.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab1Bonus_Sol2021.zip\">Lab1Bonus_Sol2021.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab2: Frequently bought\/reviewed together application with Hadoop MapReduce (<strong>Friday, March 19<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab2_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Skeleton Eclipse project Hadoop \u2013 MapReduce\n<ul>\n<li>Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab2_Skeleton2021.zip\">Lab2_Skeleton2021.zip<\/a>)<\/li>\n<li>Windows (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab2Windows_Skeleton2021.zip\">Lab2Windows_Skeleton2021.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Sample
file (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/AmazonTransposedDataset_Sample.txt\">AmazonTransposedDataset_Sample.txt<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab2_Sol2021.zip\">Lab2_Sol2021.zip<\/a> \u2013 Three alternative solutions are provided (the solutions differ in efficiency)<\/li>\n<li>Comments on the three uploaded solutions (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab2_DraftSolution_BigData.pdf\">slides<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab3: Normalized ratings for product recommendations with Hadoop MapReduce (<strong>Friday, March 26<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab3_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Sample dataset (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/04\/ReviewsSample.csv\">ReviewsSample.csv<\/a>)<\/li>\n<li>Skeleton Eclipse project Hadoop \u2013 MapReduce\n<ul>\n<li>Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab3_Skeleton2021.zip\">Lab3_Skeleton2021.zip<\/a>)<\/li>\n<li>Windows (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab3Windows_Skeleton2021.zip\">Lab3Windows_Skeleton2021.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/03\/Lab3_Sol2021.zip\">Lab3_Sol2021.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab4: Filter data and compute basic statistics with Apache Spark (<strong>Friday, April 9<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/Lab4_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Sample file (<a
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/04\/SampleLocalFile.csv\">SampleLocalFile.csv<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/Lab4DBD_Sol2021.zip\">Lab4_Sol2021.zip<\/a> &#8211; Jupyter notebook (Lab4_Sol.ipynb) and Python script (Lab4_Sol.py)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab5: Frequently bought\/reviewed together application with Apache Spark (<strong>Friday, April 16)<\/strong>\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/Lab5_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Sample dataset (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2018\/05\/ReviewsSample.csv\">ReviewsSample.csv<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/Lab5DBD_Sol2021.zip\">Lab5_Sol2021.zip<\/a> &#8211; Jupyter notebook (Lab5_DBD2021Sol.ipynb) and Python script (Lab5_DBD2021Sol.py)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab6: Bike sharing data analysis (<strong>Friday, April 23<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/Lab6_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Sample data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/sampleData.zip\">zip<\/a>)<\/li>\n<li>Example KML file (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/example.zip\">zip<\/a>)<\/li>\n<li>Another KML visualizer that can be used to visualize on a map the result of your analysis: <a href=\"http:\/\/kmlviewer.nsspot.net\/\">http:\/\/kmlviewer.nsspot.net<\/a><\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/Lab6DBD_Sol2021-1.zip\">Lab6_Sol2021.zip<\/a> &#8211; Jupyter notebook (Lab6_DBD2021Sol.ipynb) and Python script 
(Lab6_DBD2021Sol.py)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab7: Bike sharing data analysis based on Spark SQL (<strong>Friday, April 30 &#8211; 14:30-16:00<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/Lab7_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Sample data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/sampleData.zip\">zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/04\/Lab7DBD_Sol2021.zip\">Lab7_Sol2021.zip<\/a> &#8211; Jupyter notebook and Python script<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab8: A classification pipeline with MLlib + SparkSQL (<strong>Friday, May 7 &#8211; 14:30-16:00<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab8_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Template (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/lab8_template2021.zip\">zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab8_Sol2021.zip\">Lab8_Sol2021.zip<\/a> &#8211; Jupyter notebooks<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab9: GraphFrame (<strong>Friday, May 14 &#8211; 14:30-16:00<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab9_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab9_Sol2021.zip\">Lab9_Sol2021.zip<\/a> &#8211; Jupyter notebook<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab10: Tweet analysis \u2013 Spark streaming (<strong>Friday, May 21 &#8211; 14:30-16:00<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab10_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Example files
\u2013 tweets (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2017\/05\/exampledata_tweets.zip\">exampledata_tweets.zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab10_Sol2021.zip\">Lab10_Sol2021.zip<\/a> &#8211; Jupyter notebook<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Lab11: Classification with MLlib + Spark streaming (<strong>Friday, May 28 &#8211; 14:30-16:00<\/strong>)\n<ul>\n<li>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab11_DBD2021.pdf\">pdf<\/a>)<\/li>\n<li>Template (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab11_DBD2021_templates.zip\">zip<\/a>)<\/li>\n<li>All data &#8211; train, test and streaming (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab11_DBD_all_data.zip\">all_data.zip<\/a>)<\/li>\n<li>Streaming only data (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab11_DBD_streaming_data.zip\">streaming.zip<\/a>)<\/li>\n<li>Solution\n<ul>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/05\/Lab11_DBD2021_Solution.zip\">Lab11_Sol2021.zip<\/a> &#8211; Jupyter notebook<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Exam-Examples-1\">Exam Examples<\/span><\/h3>\n<ul>\n<li>Exam Example #1 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/DistrBD_ExamExample1.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (c)<\/li>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/SolutionExamExample1.zip\">SolutionExamExample1.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam Example #2 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/03\/DistrBD_ExamExample2.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1:
(d)<\/li>\n<li>Question 2: (c)<\/li>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/SolutionExamExample2.zip\">SolutionExamExample2.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam Example #3 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/DistrBD_ExamExample3.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (c)<\/li>\n<li>Question 2: (c)<\/li>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/SolutionExamExample3.zip\">SolutionExamExample3.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam Example #4 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/DistrBD_ExamExample4.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (c)<\/li>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/SolutionExamExample4.zip\">SolutionExamExample4.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam Example #5 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/DistrBD_ExamExample5.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (b)<\/li>\n<li><a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/SolutionExamExample5.zip\">SolutionExamExample5.zip<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam June 27, 2020 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/DBD_Exam20200627.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (a)<\/li>\n<li>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/DBD_Exam20200627Sol.zip\">DBD_Exam20200627Sol.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam July 20, 2020 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/DBD_Exam20200720.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 
1: (d)<\/li>\n<li>Question 2: (b) \u2013 Note that there are three actions and hence the input file is read three times.<\/li>\n<li>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/07\/DBD_Exam20200720Sol.zip\">DBD_Exam20200720Sol.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam September 14, 2020 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/09\/DBD_Exam20200914.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (d)<\/li>\n<li>Question 2: (c)<\/li>\n<li>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/09\/DBD_Exam20200914Sol.zip\">DBD_Exam20200914Sol.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam January 22, 2021 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/06\/DBD_Exam20210122.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (c)<\/li>\n<li>Question 2: (c)<\/li>\n<li>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/06\/DBD_Exam20210122Sol.zip\">DBD_Exam20210122Sol.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Some more examples of multiple choice questions (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/06\/ExamplesMultipleChoiceQuestions.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (c)<\/li>\n<li>Question 2: (d)<\/li>\n<li>Question 3: (d)<\/li>\n<li>Question 4: (d)<\/li>\n<li>Question 5: (b)<\/li>\n<li>Question 6: (d)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam June 21, 2021 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/06\/DBD_Exam20210621.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (b)<\/li>\n<li>Question 2: (a)<\/li>\n<li>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/06\/DraftSolutionExam_20210621.zip\">DBD_Exam20210621Sol.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Exam 
July 5, 2021 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/07\/DBD_Exam20210705.pdf\">pdf<\/a>)\n<ul>\n<li>Solution\n<ul>\n<li>Question 1: (c)<\/li>\n<li>Question 2: (a)<\/li>\n<li>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/07\/DBD_Exam20210705Sol.zip\">DBD_Exam20210705Sol.zip<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><span id=\"Additional-material-1\">Additional material<\/span><\/h3>\n<ul>\n<li>Slides and screencasts about Java (kindly provided by Prof. Torchiano) (<a href=\"http:\/\/dbdmg.polito.it\/~paolo\/JavaMaterials\/02JEY%20-%20Object%20Oriented%20Programming.html\">link<\/a>)\n<ul>\n<li>Suggested slides\/lectures for students who have never used Java\n<ul>\n<li>OO Paradigm and UML (the UML part is not needed)<\/li>\n<li>The Java Environment<\/li>\n<li>Java Basic Features<\/li>\n<li>Java Inheritance<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Slides about the relational model and the SQL language (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/teaching\/databases\/\">link<\/a>)\n<ul>\n<li>Suggested parts\n<ul>\n<li>Relational data model<\/li>\n<li>SQL language:\n<ul>\n<li>Basics<\/li>\n<li>The SELECT statement: basics<\/li>\n<li>Nested queries<\/li>\n<li>Set operators<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<br class=\"fixfloat\" \/>","protected":false},"excerpt":{"rendered":"<p>Table of content General information Exam rules Slides Exercises Practices Exam Examples Additional material Please note that this is the web page for the academic year 2020\/2021 General information ECTS: 8 Professor: Paolo Garza Teaching assistant: Luca Colomba Exam rules Exam rules Academic Year 2020-2021 (exam rules) Slides Introduction to the course content and exam<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/teaching\/distributed-architectures-for-big-data-processing-and-analytics-2020-2021\/\">[&#8230;]<\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"parent":96,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-17693","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/17693","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/comments?post=17693"}],"version-history":[{"count":86,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/17693\/revisions"}],"predecessor-version":[{"id":18343,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/17693\/revisions\/18343"}],"up":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/96"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/media?parent=17693"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}