{"id":8339,"date":"2024-02-24T11:33:00","date_gmt":"2024-02-24T10:33:00","guid":{"rendered":"https:\/\/dbdmg.polito.it\/dbdmg_web\/?p=8339"},"modified":"2024-09-10T09:08:45","modified_gmt":"2024-09-10T07:08:45","slug":"distributed-architectures-for-big-data-processing-and-analytics-2023-2024","status":"publish","type":"post","link":"https:\/\/dbdmg.polito.it\/dbdmg_web\/2024\/distributed-architectures-for-big-data-processing-and-analytics-2023-2024\/","title":{"rendered":"Distributed architectures for big data processing and analytics (2023\/2024)"},"content":{"rendered":"\n<h2 class=\" wp-block-heading eplus-wrapper\">General Information<\/h2>\n\n\n\n<p class=\" eplus-wrapper\"><strong>SSD<\/strong>: ING-INF\/05<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>CFU<\/strong>: 8<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Professor<\/strong>: Paolo Garza<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Teaching Assistant<\/strong>: Simone Papicchio<\/p>\n\n\n\n<hr class=\" wp-block-separator has-css-opacity eplus-wrapper\"\/>\n\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">Teaching Material<\/h2>\n\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">Introduction<\/h5>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-cee3c5\">\n<li class=\" eplus-wrapper\">Introduction to the course content and exam rules (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/00_Intro_DistributedBigData_2324.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Introduction to Big Data (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/01_Intro_BigData_BigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Big Data Architectures (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/02_Architectures_BigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer 
noopener\">pdf<\/a>)<\/li>\n<\/ul>\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">Hadoop and MapReduce<\/h5>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-8c881f\">\n<li class=\" eplus-wrapper\">Introduction to Apache Hadoop and the MapReduce programming paradigm (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/03_Intro_HadoopAndMapReduce_BigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-3a28ce\">\n<li class=\" eplus-wrapper\">Interaction with HDFS and Hadoop using the command line (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/03b_HDFS_Hadoop_CommandLine_BigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Hadoop implementation of MapReduce (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/04_HadoopImplementationOfMapReduceNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-e05690\">\n<li class=\" eplus-wrapper\">Source code of the Word Count Eclipse project (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/MapReduceBasicProject.zip\" target=\"_blank\" rel=\"noreferrer noopener\">WordCount.zip<\/a>) \u2013 Use the \u201cImport Maven project\u201d option to import it into Visual Studio Code<\/li>\n\n\n\n<li class=\" eplus-wrapper\">BigData@Polito environment + Jupyter \u2013 How to submit MapReduce jobs on BigData@Polito (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/04b_ClusterJupyter_BigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Design patterns \u2013 Part 1 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/05_MapReduce_Patterns_Part1_BigDataNB.pdf\" 
target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce and Hadoop \u2013 Advanced Topics: Multiple inputs, Multiple outputs, Distributed cache (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/06_AdvancedTopicsMapReduce_BigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Design patterns \u2013 Part 2 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/07_MapReduce_Patterns_Part2_BigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Relational Algebra\/SQL operators (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/02\/08_SQLOperators_BigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/li>\n<\/ul>\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">Spark<\/h5>\n\n\n\n<div class=\"wp-block-group eplus-wrapper\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\"><ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-af4d4c\">\n<li class=\" eplus-wrapper\">Introduction to Apache Spark (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/10_SparkIntroduction_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-d23b8d\">\n<li class=\" eplus-wrapper\">How to submit Spark applications (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/10b_SparkSubmit_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">How to use Jupyter Notebooks for your Spark applications (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/10c_JupyterNotebooks_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">You can install PySpark and JupyterLab using\u00a0<strong>Conda\/Miniconda\/pip<\/strong>\u00a0(<a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/dbdmg\/pyspark-install\" target=\"_blank\">instructions here<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">RDD-based programs<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-5e2754\">\n<li class=\" eplus-wrapper\">RDDs: creation, basic transformations and actions (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/11_SparkRDDBasedProgramming_DistributedBigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-3932ed\">\n<li class=\" eplus-wrapper\">Some examples (partially selected from the slides): Examples &#8211; Notebook (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/04\/ExamplesSlides.zip\">ExamplesFromSlides.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Key-value RDDs: transformations and actions on key-value RDDs (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/12_SparkPairRDD_DistributedBigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-43f1de\">\n<li class=\" eplus-wrapper\">Inner join, left outer join, right outer join, full outer join, and &#8220;NOT IN&#8221; with PairRDDs: Examples &#8211; Notebook (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/04\/JoinsRDD.zip\" target=\"_blank\" rel=\"noreferrer noopener\">JoinsRDD.zip<\/a>) &#8211; Uploaded on April 21, 2024<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">DoubleRDDs (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/13_SparkDoubleRDD_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Advanced Topics: Cache, accumulators, broadcast variables, custom partitioners, broadcast join (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/14_SparkRDDBasedProgramming_AdvancedTopics_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-9f59a7\">\n<li class=\" eplus-wrapper\">RDD partition examples (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/RDDPartitionsExamples.zip\" target=\"_blank\">RDDPartitionsExamples.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Introduction to PageRank (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/15b_SparkIntroPageRankNB.pdf\" target=\"_blank\">pdf<\/a>) \u2013 Example: PageRank \u201cnaive\u201d implementation (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/RDDPageRank.zip\" target=\"_blank\">RDDPageRank.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark SQL and DataFrames<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-d9cbd0\">\n<li class=\" eplus-wrapper\">Spark SQL (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/16_SparkSQL_DistributedBigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>) &#8211; Slide 86 was updated on May 8, 2024<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-0cf39d\">\n<li class=\" eplus-wrapper\">Simple examples \u2013 Jupyter notebook (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkSQLSimpleExamples.zip\" 
target=\"_blank\">SparkSQLSimpleExamples.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark SQL join examples \u2013 Jupyter notebook (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExamplesSparkSQLJoins.zip\" target=\"_blank\">ExamplesSparkSQLJoins.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Data mining and Machine learning algorithms with Spark MLlib<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-134b96\">\n<li class=\" eplus-wrapper\">Introduction and Preprocessing (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18a_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>) &#8211; Slide 52: updated on May 15, 2024<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Classification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18b_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-423c66\">\n<li class=\" eplus-wrapper\">Classification examples \u2013 Jupyter notebooks and sample data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleClassificationMLlib.zip\" target=\"_blank\">ExampleClassificationMLlib.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Clustering (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18c_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-03549a\">\n<li class=\" eplus-wrapper\">Clustering example \u2013 Jupyter notebook and sample data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleClusteringMLlib.zip\" 
target=\"_blank\">ExampleClusteringMLlib.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Regression (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18d_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-49dbb5\">\n<li class=\" eplus-wrapper\">Regression example \u2013 Jupyter notebook and sample data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleRegressionMLlib.zip\" target=\"_blank\">ExampleRegressionMLlib.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Itemset and Association rule mining (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18e_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-0a25f2\">\n<li class=\" eplus-wrapper\">Itemset and Association rule mining example \u2013 Jupyter notebook and sample data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleItemsetMLlib.zip\" target=\"_blank\">ExampleItemsetMLlib.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">GraphX\/GraphFrames<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-332384\">\n<li class=\" eplus-wrapper\">Introduction to GraphX and GraphFrames (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/19_SparkGraphFrame_PartI_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Graph Algorithms with GraphFrames (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/20_SparkGraphFrame_Algorithms_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper 
eplus-styles-uid-77ed5a\">\n<li class=\" eplus-wrapper\">Simple example \u2013 Jupyter notebook (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/GraphFrameExamples.zip\" target=\"_blank\">GraphFrameExamples.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Select the GraphFrames (Yarn) kernel to run it on jupyter.polito.it<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Run \u201cpyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 --repositories https:\/\/repos.spark-packages.org\u201d to run it locally on your PC. Use package graphframes:graphframes:0.8.0-spark2.4-s_2.11 if you installed Spark 2 locally instead of Spark 3<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Streaming data analytics<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-f2e8d3\">\n<li class=\" eplus-wrapper\">Spark Streaming (DStreams) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/21_SparkStreaming_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-4ef085\">\n<li class=\" eplus-wrapper\">Simple examples \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkSteamingExamples.zip\" target=\"_blank\">SparkSteamingExamples.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Structured Streaming (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/22_SparkStructuredStreaming_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-ea2a6c\">\n<li class=\" eplus-wrapper\">Simple examples \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleStructutedStreaming.zip\" 
target=\"_blank\">SparkStructutedStreamingExamples.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Introduction to other big stream processing frameworks: Apache Storm, Apache Flink, .. (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/23_StreamingFrameworks_DistributedBigDataNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>) &#8211; <strong>Not covered this academic year<\/strong><\/li>\n<\/ul><\/li>\n<\/ul><\/div><\/div>\n\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">Exercises<\/h2>\n\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">MapReduce<\/h5>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-487970\">\n<li class=\" eplus-wrapper\">MapReduce Exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/01_MapReduce_Exercises_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-15a093\">\n<li class=\" eplus-wrapper\">Solutions of Exercises 1-29 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/SolutionsExMapReduce.zip\" target=\"_blank\" rel=\"noreferrer noopener\">SolutionsExMapReduce.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">How to configure Visual Studio Code on your personal laptop: \ud83d\udcd8<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/10\/BigData_labs-VSCode_guide.pdf\">guide<\/a>.<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-cf8095\">\n<li class=\" eplus-wrapper\">Note that <strong>you must also install<\/strong> <strong>JDK 1.8<\/strong> and select it for the imported project inside the IDE. 
If you already have a JDK installed but it is newer than JDK 1.8, you must install <strong>JDK 1.8<\/strong> as well.<\/li>\n\n\n\n<li class=\" eplus-wrapper\"><p class=\" eplus-wrapper\"><strong>Windows users only:<\/strong> You must configure the <strong>winutils<\/strong> (\ud83d\uddc3\ufe0f<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/winutils.zip\" target=\"_blank\" rel=\"noreferrer noopener\">winutils.zip<\/a>) and set up some environment variables. Follow this \ud83d\udcd8<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/ConfigureWindowsEnviroment.pdf\">extra guide<\/a> for the complete configuration.<\/p><ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-fb00c9\">\n<li class=\" eplus-wrapper\">Some students fixed problems specific to their Windows version by downloading winutils.exe and hadoop.dll from this alternative source: <a href=\"https:\/\/github.com\/steveloughran\/winutils\/tree\/master\/hadoop-2.7.1\/bin\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/steveloughran\/winutils\/tree\/master\/hadoop-2.7.1\/bin<\/a><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\"><p class=\" eplus-wrapper\">There are multiple versions of the basic projects. The version with libraries is <strong>the only one<\/strong> you can use on the LABINF computers. You can also use it on your laptop if you do not need to run the applications locally. All the other versions are Maven projects, so you can use them on your personal laptop to write the code and then run it either locally inside Visual Studio Code or on the BigData@Polito cluster. 
The legend is as follows: \ud83d\udcdalib: Project\/template with libraries, \ud83d\udc27mavU: Maven project for Linux\/MacOS, \ud83e\ude9fmavW: Maven project for Windows (Hadoop projects only).<\/p><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Basic project for <strong>MapReduce<\/strong> applications (\ud83d\udcda<strong><a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/03\/MapReduceBasicProjectWithLibraries.zip\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>lib<\/strong><\/a><\/strong>, \ud83d\udc27<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProject.zip\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>mavU<\/strong><\/a>, \ud83e\ude9f<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProjectWindows.zip\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>mavW<\/strong><\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">Spark<\/h5>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-01190c\">\n<li class=\" eplus-wrapper\">Spark exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/02_Spark_Exercises_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-8a0212\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/ExSparkData30_46.zip\" target=\"_blank\">ExSparkData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">RDD-based solutions of Exercises 30-46 \u2013 Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/SolutionsExSpark30_46.zip\" target=\"_blank\" rel=\"noreferrer noopener\">SparkNotebooksSol30_46.zip<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-bd0f16\">\n<li 
class=\" eplus-wrapper\">Solution of Exercise 44 based on Left Outer Join (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/04\/ex44LeftOuterJoin.zip\">ex44LeftOuterJoin.zip<\/a>) &#8211; Uploaded on April 24, 2024<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution of Exercise 46 based on Spark SQL APIs + RDD.groupByKey() &#8211; Example to show how to create and manage &#8220;static windows&#8221; with almost only Spark SQL APIs (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/06\/ex46_DF.zip\" target=\"_blank\" rel=\"noreferrer noopener\">ex46_DF.zip<\/a>) &#8211; Uploaded on June 13, 2024<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">PySpark Installation Guide<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-311ccf\">\n<li class=\" eplus-wrapper\">How to run PySpark applications on your PC or Google Colab: You can install PySpark and JupyterLab using\u00a0<strong>Conda\/Miniconda\/pip<\/strong>\u00a0(<a href=\"https:\/\/github.com\/dbdmg\/pyspark-install\" target=\"_blank\" rel=\"noreferrer noopener\">instructions here<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark SQL exercises (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/02_Spark_ExerciseSparkSQLNB.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-97337b\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExSparkSQLData.zip\" target=\"_blank\">ExSparkSQLData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises 47-50 \u2013 Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkNotebooksSol47_50.zip\" target=\"_blank\" rel=\"noreferrer 
noopener\">SparkNotebooksSol47_50.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark MLlib exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/03_MLlib_Exercises_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-e425d6\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleMLlibData.zip\" target=\"_blank\">ExampleMLlibData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercise 51 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkNotebooksSol51.zip\" target=\"_blank\">SparkNotebooksSol51.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">GraphFrame exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/04_GraphFrame_Exercises_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-363514\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleGraphFrameData.zip\" target=\"_blank\">ExampleGraphFrameData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises 52-57b \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/SparkNotebooksSol52_57b.zip\" target=\"_blank\">SparkNotebooksSol52_57b.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark streaming exercises (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/05_SparkStreaming_Exercises_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-e64f63\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleSparkStreamingData-1.zip\" target=\"_blank\">ExampleSparkStreamingData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises 58-65 \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkNotebooksSol58_65.zip\" target=\"_blank\">SparkNotebooksSol58_65.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark structured streaming and MLlib exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/06_SparkStructuredStreamingAndMLlib_ExercisesNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-70e2f1\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleSparkStructuredMLlibData.zip\" target=\"_blank\">ExampleSparkStructuredMLlibData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution of Exercise 66 \u2013 Jupyter notebooks (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkNotebooksSol66.zip\" target=\"_blank\" rel=\"noreferrer noopener\">SparkNotebooksSol66.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">Laboratory Material<\/h2>\n\n\n\n<p class=\" eplus-wrapper\">Team 1: Students from A to L \u2013 Tuesday from 11:30 to 13:00 (First lab activity \u2013 March 12, 2024) @ <a 
href=\"https:\/\/www.labinf.polito.it\/\">LABINF<\/a><br>Team 2: Students from M to Z \u2013 Friday from 11:30 to 13:00 (First lab activity \u2013 March 15, 2024) @ <a href=\"https:\/\/www.labinf.polito.it\/\" target=\"_blank\" rel=\"noreferrer noopener\">LABINF<\/a><\/p>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-af9fe2\">\n<li class=\" eplus-wrapper\"><strong>How to configure Visual Studio Code on your personal laptop:<\/strong> \ud83d\udcd8<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/10\/BigData_labs-VSCode_guide.pdf\">guide<\/a>.<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-940f60\">\n<li class=\" eplus-wrapper\">Note that <strong>you must also install JDK 1.8<\/strong> and select it for the imported project inside the IDE. If you already have a JDK installed but it is newer than JDK 1.8, you must install <strong>JDK 1.8<\/strong> as well.<\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>Windows users only:<\/strong> You must configure the <strong>winutils<\/strong> (\ud83d\uddc3\ufe0f<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/winutils.zip\" target=\"_blank\" rel=\"noreferrer noopener\">winutils.zip<\/a>) and set up some environment variables. 
Follow this \ud83d\udcd8<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/ConfigureWindowsEnviroment.pdf\">extra guide<\/a> for the complete configuration.<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-c98dee\">\n<li class=\" eplus-wrapper\">Some students fixed problems specific to their Windows version by downloading winutils.exe and hadoop.dll from this alternative source: <a href=\"https:\/\/github.com\/steveloughran\/winutils\/tree\/master\/hadoop-2.7.1\/bin\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/steveloughran\/winutils\/tree\/master\/hadoop-2.7.1\/bin<\/a><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Laboratory materials are available in multiple versions. The version with libraries is <strong>the only one<\/strong> you can use on the LABINF computers. You can also use it on your laptop if you do not need to run the applications locally. All the other versions are Maven projects, so you can use them on your personal laptop to write the code and then run it either locally inside Visual Studio Code or on the BigData@Polito cluster. 
The legend is as follows: \ud83d\udcdalib: Project\/template with libraries, \ud83d\udc27mavU: Maven project for Linux\/MacOS, \ud83e\ude9fmavW: Maven project for Windows (Hadoop projects only).<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Basic project for <strong>MapReduce<\/strong> applications (\ud83d\udcda<strong><a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/03\/MapReduceBasicProjectWithLibraries.zip\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>lib<\/strong><\/a><\/strong>, \ud83d\udc27<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProject.zip\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>mavU<\/strong><\/a>, \ud83e\ude9f<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProjectWindows.zip\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>mavW<\/strong><\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>How to configure JDK 1.8 on macOS if the standard procedure fails<\/strong>:<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-affdd4\">\n<li class=\" eplus-wrapper\">Follow the guide on <a href=\"https:\/\/docs.aws.amazon.com\/corretto\/latest\/corretto-8-ug\/downloads-list.html\">Downloads for Amazon Corretto 8 &#8211; Amazon Corretto 8<\/a><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\"><strong>PySpark Installation Guide<\/strong>: How to run PySpark applications on your PC or Google Colab: You can install PySpark and JupyterLab using\u00a0<strong>Conda\/Miniconda\/pip<\/strong>\u00a0(<a href=\"https:\/\/github.com\/dbdmg\/pyspark-install\" target=\"_blank\" rel=\"noreferrer noopener\">instructions here<\/a>)<\/li>\n<\/ul>\n\n\n<p class=\" eplus-wrapper\">Problem specifications\/Lab solutions<\/p>\n\n\n<ul id=\"block-531ce537-4416-400a-983d-d652b1ef93ab\" class=\" wp-block-list eplus-wrapper eplus-styles-uid-17e4de\">\n<li class=\" eplus-wrapper\">\n<\/ul>\n\n\n<figure class=\" 
wp-block-table eplus-wrapper\"><table><tbody><tr><td>Problem specification and input data<\/td><td>Solution (Maven-based)<\/td><\/tr><tr><td><strong>Lab 1<\/strong>: Hadoop and MapReduce<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/03\/Lab1_BD_vscode.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Basic project and small example dataset (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/04\/Lab1_BigData_with_libraries_vscode.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab1_BigData_with_libraries_vscode.zip<\/a>)<br>Basic project based on Maven \u2013 Use this version to run the MapReduce application locally on your own PC (<strong>DO NOT USE IT AT LABINF<\/strong>)<br>\u2014 Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab1.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab1.zip<\/a>)<br>\u2014 Windows (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab1Windows.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab1_Windows.zip<\/a>)<br>Bigger dataset: finefoods_text.txt (<a href=\"https:\/\/www.dropbox.com\/s\/fswdiblx15mhmyo\/finefoods_text.zip?dl=0\" target=\"_blank\" rel=\"noreferrer noopener\">zip<\/a>)<\/td><td>Solution: <a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab1_SolBonusMvn.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Bonus track Lab1_SolBonusMvn.zip<\/a><\/td><\/tr><tr><td><strong>Lab 2<\/strong>: Filter with Hadoop MapReduce<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/03\/Lab2_BD_vscode.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Skeleton project Hadoop \u2014 MapReduce (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/10\/Lab2_Skeleton_with_libraries_vscode.zip\" target=\"_blank\" rel=\"noreferrer 
noopener\">Lab2_Skeleton_with_libraries_vscode.zip<\/a>)<br>Basic project based on Maven \u2014 Use this version of the project to run the MapReduce application locally on your own PC (<strong>DO NOT USE IT AT LABINF<\/strong>)<br>\u2014 Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2_Skeleton.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab2_Skeleton.zip<\/a>)<br>\u2014 Windows (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2Windows_Skeleton.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab2Windows_Skeleton.zip<\/a>)<br>Outputs of the first lab (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/OutputFolderLab1.zip\" target=\"_blank\" rel=\"noreferrer noopener\">OutputFolderLab1.zip<\/a>) (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/OutputFolderLab1BonusTrack.zip\" target=\"_blank\" rel=\"noreferrer noopener\">OutputFolderLab1BonusTrack.zip<\/a>). 
You can use them to test your application locally on your own PC if you are using Maven<\/td><td>Solution: <a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab2_Sol.zip<\/a><br>Solution Bonus track: <a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab2_SolBonus.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab2_SolBonus.zip<\/a><\/td><\/tr><tr><td><strong>Lab 3<\/strong>: Frequently bought\/reviewed together with Hadoop and MapReduce<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/03\/Lab3_DBD_vscode.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Skeleton project Hadoop \u2014 MapReduce (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/10\/Lab3_Skeleton_with_libraries_vscode.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab3_Skeleton_with_libraries_vscode.zip<\/a>)<br>Sample data (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/AmazonTransposedDataset_Sample.txt\" target=\"_blank\" rel=\"noreferrer noopener\">AmazonTransposedDataset_Sample.txt<\/a>)<br>Basic project based on Maven \u2014 Use this version of the project to run the MapReduce application locally on your own PC (<strong>DO NOT USE IT AT LABINF<\/strong>)<br>\u2014 Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab3_Skeleton.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab3_Skeleton.zip<\/a>)<br>\u2014 Windows (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab3Windows_Skeleton.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab3Windows_Skeleton.zip<\/a>)<\/td><td>Solution:&nbsp;<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/Lab3_DBD_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab3_DBD_Sol.zip<\/a>&nbsp;\u2013 
This project is based on mvn<br>\u2014 Comments on the three uploaded solutions (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/Lab3_DraftSolution_BigData_NewStyle.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>\u2014 <strong>The second solution MUST NOT BE USED<\/strong> &#8211; It is highly inefficient<\/td><\/tr><tr><td><strong>Lab 4<\/strong>: Normalized ratings for product recommendations with Hadoop MapReduce<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/04\/Lab4_DBD_vscode.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Skeleton Eclipse project Hadoop \u2013 MapReduce (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/Lab4_Skeleton_with_libraries.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab4_DBD_with_libraries.zip<\/a>)<br>Basic project based on Maven \u2013 Use this version to run the MapReduce application locally on your own PC (<strong>DO NOT USE THIS ON LABINF PCs<\/strong>)<br>\u2014 Linux and macOS (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/Lab4_DBD_mvn.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab4_DBD_mvn.zip<\/a>)<br>\u2014 Windows (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/Lab4_DBD_Windows_mvn.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab4_DBD_Windows_mvn.zip<\/a>)<br>Sample file (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/ReviewsSample.csv\" target=\"_blank\" rel=\"noreferrer noopener\">ReviewsSample.csv<\/a>)<\/td><td>Solution: <a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/Lab4_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab4_Sol.zip<\/a><\/td><\/tr><tr><td><strong>Lab 5<\/strong>: Filter data and compute basic statistics with Apache Spark<br>Problem specification (<a 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/Lab5_DBD.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Sample file (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/SampleLocalFile.csv\" target=\"_blank\" rel=\"noreferrer noopener\">SampleLocalFile.csv<\/a>)<\/td><td>Solution:&nbsp;<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/Lab5_DBD_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab5_DBD_Sol.zip<\/a><br>\u2014 Jupyter notebook (Lab5_Sol.ipynb)<br>\u2014 Python script (Lab5_Sol.py)<\/td><\/tr><tr><td><strong>Lab 6<\/strong>: Frequently bought\/reviewed together application with Apache Spark<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/Lab6_DBD.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Sample dataset (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ReviewsSample.csv\" target=\"_blank\" rel=\"noreferrer noopener\">ReviewsSample.csv<\/a>)<\/td><td>Solution:&nbsp;<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/Lab6_DBD_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab6_DBD_Sol.zip<\/a><br>\u2014 Jupyter notebook (Lab6_Sol.ipynb) <br>\u2014 Python script (Lab6_Sol.py)<\/td><\/tr><tr><td><strong>Lab 7<\/strong>: Bike sharing data analysis<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/Lab7_DBD.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Sample data (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/sampleData.zip\">zip<\/a>)<br>Example KML file (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/example.zip\">zip<\/a>)<br>KML file containing the result of the analysis setting the threshold to 0.6 and running the program on the HDFS file (<a 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/resultTh0.6.zip\">zip<\/a>)<\/td><td>Solution:&nbsp;<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/Lab7_DBD_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab7_DBD_Sol.zip<\/a><br>\u2014 Jupyter notebook (Lab7_Sol.ipynb)<br>\u2014 Python script (Lab7_Sol.py)<\/td><\/tr><tr><td><strong>Lab 8<\/strong>: Bike sharing data analysis based on Spark SQL<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/Lab8_DBD.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Sample data (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/sampleData.zip\">zip<\/a>)<\/td><td>Solution <a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/Lab8_DBD_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab8_DBD_Sol.zip<\/a> <br>\u2014 Jupyter notebooks (Lab8_Sol.ipynb and Lab8_SolSQL.ipynb)<br>\u2014 Python scripts (Lab8_Sol.py and Lab8_SolSQL.py)<\/td><\/tr><tr><td><strong>Lab 9: <\/strong>A classification pipeline with MLlib + SparkSQL<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab9_DBD.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Sample data (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab9_template.zip\" target=\"_blank\" rel=\"noreferrer noopener\">zip<\/a>)<\/td><td>Solution <a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab9_DBD_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab9_DBD_Sol.zip<\/a> <br>\u2014 Jupyter notebooks<\/td><\/tr><tr><td><strong>Lab10<\/strong>: GraphFrame<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab10_DBD.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Data (<a 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab10Data.zip\" target=\"_blank\" rel=\"noreferrer noopener\">zip<\/a>)<\/td><td>Solution <a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab10_DBD_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab10_DBD_Sol.zip<\/a> <br>\u2014 Jupyter notebooks &#8211; Updated on May 30, 2024. distinct() has been added in Task 2.<\/td><\/tr><tr><td><strong>Lab11<\/strong>: Tweet analysis &#8211; Spark streaming<br>Problem specification (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab11_DBD.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<br>Example files &#8211; tweets (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab11Data.zip\" data-type=\"URL\" data-id=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab11Data.zip\" target=\"_blank\" rel=\"noreferrer noopener\">zip<\/a>)<\/td><td>Solution <a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/Lab11_DBD_Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Lab11_DBD_Sol.zip<\/a><br>\u2014 Jupyter notebooks<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">Previous exam examples<\/h2>\n\n\n\n<figure class=\" wp-block-table eplus-wrapper\"><table><tbody><tr><td>Exams<\/td><td>Solutions<\/td><\/tr><tr><td>Exam September 6, 2024 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/09\/DBD_Exam_2024_09_06.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (a) &#8211; The three codes are equivalent. They are based on commutative functions\/methods.<br>Question 2: (a) &#8211; There are 3 distinct keys emitted by the map phase. Hence, the reduce method is invoked 3 times. 
It follows that the sum of the values of the three instances of numCitiesD is 3.<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/09\/dbd_20240906.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20240906Sol.zip<\/a>)<\/td><\/tr><tr><td>Exam July 19, 2024 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/07\/DBD_Exam_2024_07_19.pdf\">pdf<\/a>)<\/td><td>Question 1: (b) &#8211; 2 times &#8211; Three actions are based on the content of the input file, but highTempRDD is cached. Hence, the input file is read once to compute the value of the count action applied to tempRDD and then one more time to compute the content of highTempRDD, which is then used to calculate the results of the actions count and reduce applied to highTempRDD. Globally, due to the cache of highTempRDD, the input file is read twice. <br>Question 2: (d) &#8211; 6 &#8211; There are 6 input lines => the map method is invoked, overall, 6 times.<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/07\/dbd_20240719.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20240719Sol.zip<\/a>)<\/td><\/tr><tr><td>Exam July 5, 2024 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/07\/DBD_Exam_2024_07_05.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (c) &#8211; Application B is not equivalent to A and C because .reduce(lambda v1,v2: min(v1, v2) ).filter(lambda value : value&gt;5) is not equivalent to .filter(lambda value : value&gt;5).reduce(lambda v1,v2: min(v1, v2) ). 
The two functions are not commutative.<br>Question 2: (a) &#8211; Considering all instances of the reducer class, the reduce method is invoked 3 times overall (2 + 1 + 0).<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/07\/dbd_20240705.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20240705Sol.zip<\/a>)<br>Sketch of a solution based on SQL (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/07\/DraftSQLBased.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">SQLBasedSolution.pdf<\/a>)<\/td><\/tr><tr><td>Exam February 20, 2024 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/06\/DBD_Exam_2024_02_20.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (a), Question 2: (b)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/06\/DraftSolution20240220DBD.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20240220Sol.zip<\/a>) &#8211; Uploaded on June 16, 2024<\/td><\/tr><tr><td>Exam September 18, 2023 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/06\/DBD_Exam20230918.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (c), Question 2: (c)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/07\/Draft_DBD_EXAM_20230918.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paper-based sketch of the solution &#8211; No code_ Exam20230918.pdf<\/a>) &#8211; Uploaded on July 4, 2024<\/td><\/tr><tr><td>Exam July 19, 2023 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/07\/DBD_Exam20230719.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (a), Question 2: (b)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/07\/dbd_20230719.zip\" target=\"_blank\" rel=\"noreferrer 
noopener\">DBD_Exam20230719Sol.zip<\/a>) &#8211; Updated on June 9, 2024, with an SQL-based solution and some example data<\/td><\/tr><tr><td>Exam June 26, 2023 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/07\/DBD_Exam20230626.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (b), Question 2: (c)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/07\/DBD_Exam20230626Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20230626Sol.zip<\/a>) &#8211; Updated on June 8, 2024, with an SQL-based solution and some example data<\/td><\/tr><tr><td>Exam September 1, 2022 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/09\/DBD_Exam20220901.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (b), Question 2: (d)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/09\/DBD_Exam20220901Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20220901Sol.zip<\/a>)<\/td><\/tr><tr><td>Exam July 18, 2022 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/07\/DBD_Exam20220718.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (b), Question 2: (b)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/07\/DraftSolution20220718.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20220718Sol.zip<\/a>) &#8211; Updated on June 9, 2024, with an SQL-based solution &#8211; Example related to &#8220;static windows&#8221; and how to manage them with either the RDD or Spark SQL APIs<\/td><\/tr><tr><td>Exam June 27, 2022 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/06\/DBD_Exam20220627.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (c), Question 2: (a)<br>MapReduce and Spark (<a 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/06\/DBD_Exam20220627Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20220627Sol.zip<\/a>) &#8211; Updated on June 8, 2024, with an SQL-based solution and some example data<\/td><\/tr><tr><td>Exam February 10, 2022 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/DBD_Exam20220210.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (a), Question 2: (b)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/06\/DBD_Exam20220210Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20220210Sol.zip<\/a>)<\/td><\/tr><tr><td>Exam September 17, 2021 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/DBD_Exam20210917.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (b), Question 2: (a)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/06\/DraftSolution20210917.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20210917.zip<\/a>)<\/td><\/tr><tr><td>Exam July 5, 2021 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/07\/DBD_Exam20210705.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (c), Question 2: (a)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/05\/DBD_Exam20210705Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20210705Sol.zip<\/a>) &#8211; Updated on May 7, 2024, with an SQL-based solution<\/td><\/tr><tr><td>Exam June 21, 2021 (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/06\/DBD_Exam20210621.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (b), Question 2: (a)<br>MapReduce and Spark (<a 
href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/06\/DraftSolutionExam_20210621.zip\">DBD_Exam20210621Sol.zip<\/a>)<\/td><\/tr><tr><td>Exam July 20, 2020 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/04\/DBD_Exam20200720.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (d), Question 2: (b)<br>Question 2 \u2013 Note that there are three actions. Hence, the input file is read three times.<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/05\/DBD_Exam20200720Sol.zip\" target=\"_blank\" rel=\"noreferrer noopener\">DBD_Exam20200720Sol.zip<\/a>)<\/td><\/tr><tr><td>Exam June 27, 2020 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2024\/04\/DBD_Exam20200627.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (b), Question 2: (a)<br>MapReduce and Spark (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2020\/06\/DBD_Exam20200627Sol.zip\">DBD_Exam20200627Sol.zip<\/a>)<\/td><\/tr><tr><td>More examples of multiple choice questions (<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2021\/06\/ExamplesMultipleChoiceQuestions.pdf\">pdf<\/a>)<br><\/td><td>Question 1: (c)<br>Question 2: (d)<br>Question 3: (d)<br>Question 4: (d)<br>Question 5: (b)<br>Question 6: (d)<\/td><\/tr><tr><td>GraphFrame \u2013 Examples of multiple choice questions (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/ExamplesMultipleChoiceQuestionsGraphFrame.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">pdf<\/a>)<\/td><td>Question 1: (d)<br>Question 2: (c)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">Additional material<\/h2>\n\n\n\n<p class=\" eplus-wrapper\">Slides and screencasts about Java (kindly provided by Prof. 
Torchiano) (<a href=\"http:\/\/dbdmg.polito.it\/~paolo\/JavaMaterials\/02JEY%20-%20Object%20Oriented%20Programming.html\">link<\/a>)<br>Focus on the following subset of slides\/lectures (for students who have never used Java):<br>&#8212; OO Paradigm and UML (The UML part is not mandatory)<br>&#8212; The Java Environment<br>&#8212;  Java Basic Features<br>&#8212; Java Inheritance<\/p>\n","protected":false},"excerpt":{"rendered":"<p>General Information SSD: ING-INF\/05 CFU: 8 Professor: Paolo Garza Teaching Assistant: Simone Papicchio Teaching Material Introduction Hadoop and MapReduce Spark Exercises MapReduce Spark Laboratory Material Team 1: Students from A to L \u2013 Tuesday from 11:30 to 13:00 (First lab activity \u2013 March 12, 2024) @ LABINFTeam 2: Students from &hellip;<\/p>\n","protected":false},"author":5,"featured_media":3290,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"editor_plus_copied_stylings":"{}","footnotes":""},"categories":[37],"tags":[],"class_list":["post-8339","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-courses"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/8339","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/comments?post=8339"}],"version-history":[{"count":95,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/8339\/revisions"}],"predecessor-version":[{"id":10680,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/8339\/revisions\/10680"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dbdmg.p
olito.it\/dbdmg_web\/wp-json\/wp\/v2\/media\/3290"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/media?parent=8339"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/categories?post=8339"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/tags?post=8339"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}