{"id":5461,"date":"2023-02-18T17:42:21","date_gmt":"2023-02-18T16:42:21","guid":{"rendered":"https:\/\/dbdmg.polito.it\/dbdmg_web\/?p=5461"},"modified":"2024-03-08T11:44:11","modified_gmt":"2024-03-08T10:44:11","slug":"distributed-architectures-for-big-data-processing-and-analytics-2022-2023","status":"publish","type":"post","link":"https:\/\/dbdmg.polito.it\/dbdmg_web\/2023\/distributed-architectures-for-big-data-processing-and-analytics-2022-2023\/","title":{"rendered":"Distributed architectures for big data processing and analytics (2022\/2023)"},"content":{"rendered":"<h1 class=\" wp-block-heading eplus-wrapper eplus-styles-uid-2689a9\"><strong>This web page is related to an old version of the course.<br>The web page of the current instance of the course is available at <a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/2024\/distributed-architectures-for-big-data-processing-and-analytics-2023-2024\/\">link<\/a>.<\/strong><\/h1>\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">General Information<\/h2>\n\n\n\n<p class=\" eplus-wrapper\"><strong>SSD<\/strong>: ING-INF\/05<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>CFU<\/strong>: 8<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Professor<\/strong>: Paolo Garza<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Teaching Assistant<\/strong>: Luca Colomba<\/p>\n\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">Announcements<\/h2>\n\n\n\n<p class=\" eplus-wrapper\">03-03-2023: Lab activities<br>&#8212; Team 1: Students from A to J &#8211; Tuesday from 11:30 to 13:00 (First lab activity &#8211; March 7, 2023) @ <a href=\"https:\/\/www.labinf.polito.it\/\">LABINF<\/a><br>&#8212; Team 2: Students from K to Z &#8211; Friday from 11:30 to 13:00 (First lab activity &#8211; March 10, 2023) @ <a href=\"https:\/\/www.labinf.polito.it\/\">LABINF<\/a><\/p>\n\n\n\n<p class=\" eplus-wrapper\">18-02-2023: The first lecture is scheduled for February 27, 2023, at 8:30 in Classroom 27<\/p>\n\n\n\n<hr class=\" wp-block-separator 
has-css-opacity eplus-wrapper\"\/>\n\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">Teaching Material<\/h2>\n\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">Introduction<\/h5>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-3d9fdb\">\n<li class=\" eplus-wrapper\">Introduction to the course content and exam rules (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/02\/00_Intro_DistributedBigData_2223.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Introduction to Big Data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/\/01_Intro_BigData_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Big Data Architectures (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/\/02_Architectures_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n<\/ul>\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">Hadoop and MapReduce<\/h5>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-b4793f\">\n<li class=\" eplus-wrapper\">Introduction to Apache Hadoop and the MapReduce programming paradigm (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/\/03_Intro_HadoopAndMapReduce_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-dc6928\">\n<li class=\" eplus-wrapper\">Interaction with HDFS and Hadoop by means of the command line (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/\/03b_HDFS_Hadoop_CommandLine_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Hadoop implementation of MapReduce (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/\/04_HadoopImplementationOfMapReduceNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-fa3679\">\n<li class=\" eplus-wrapper\">Source code of the Word Count Eclipse project (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/WordCount.zip\" target=\"_blank\">WordCount.zip<\/a>) \u2013 Use the import Maven project option to import it in Eclipse<\/li>\n\n\n\n<li class=\" eplus-wrapper\">PDF version of the code (i.e., PDF version of the Java files) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2016\/03\/WordCountPDF.zip\" target=\"_blank\">WordCountPDF.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">BigData@Polito environment + Jupyter \u2013 How to submit MapReduce jobs on BigData@Polito (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/\/04b_ClusterJupyter_BigDataNB.pdf\">pdf<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Design patterns \u2013 Part 1 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/\/05_MapReduce_Patterns_Part1_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce and Hadoop \u2013 Advanced Topics: Multiple inputs, Multiple outputs, Distributed cache (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/\/06_AdvancedTopicsMapReduce_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Design patterns \u2013 Part 2 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/\/07_MapReduce_Patterns_Part2_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Relational 
Algebra\/SQL operators (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/02\/08_SQLOperators_BigDataNB.pdf\">pdf<\/a>)<\/li>\n<\/ul>\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">Spark<\/h5>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-fea89f\">\n<li class=\" eplus-wrapper\">Introduction to Apache Spark (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/10_SparkIntroduction_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-11dbae\">\n<li class=\" eplus-wrapper\">How to submit Spark applications (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/10b_SparkSubmit_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">How to use Jupyter Notebooks for your Spark applications (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/10c_JupyterNotebooks_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">You can install PySpark and JupyterLab using <strong>Conda\/Miniconda\/pip<\/strong> (<a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/dbdmg\/pyspark-install\" target=\"_blank\">instructions here<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">RDD-based programs<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-864fe7\">\n<li class=\" eplus-wrapper\">RDDs: creation, basic transformations and actions (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/11_SparkRDDBasedProgramming_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Key-value RDDs: transformations and actions on key-value RDDs (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/12_SparkPairRDD_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">DoubleRDDs (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/13_SparkDoubleRDD_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Advanced Topics: Cache, accumulators, broadcast variables, custom partitioners, broadcast join (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/14_SparkRDDBasedProgramming_AdvancedTopics_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-0b83b1\">\n<li class=\" eplus-wrapper\">RDD partition examples (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/RDDPartitionsExamples.zip\" target=\"_blank\">RDDPartitionsExamples.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Introduction to PageRank (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/15b_SparkIntroPageRankNB.pdf\" target=\"_blank\">pdf<\/a>) &#8211; Example: PageRank \u201cnaive\u201d implementation (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/RDDPageRank.zip\" target=\"_blank\">RDDPageRank.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark SQL and DataFrames<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-e0513e\">\n<li class=\" eplus-wrapper\">Spark SQL (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/16_SparkSQL_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-937cb8\">\n<li class=\" eplus-wrapper\">Simple examples \u2013 Jupyter notebook 
(<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkSQLSimpleExamples.zip\" target=\"_blank\">SparkSQLSimpleExamples.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark SQL join examples \u2013 Jupyter notebook (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExamplesSparkSQLJoins.zip\" target=\"_blank\">ExamplesSparkSQLJoins.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Data mining and Machine learning algorithms with Spark MLlib<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-b1c3dd\">\n<li class=\" eplus-wrapper\">Introduction and Preprocessing (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18a_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Classification (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18b_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-734d97\">\n<li class=\" eplus-wrapper\">Classification examples \u2013 Jupyter notebooks and sample data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleClassificationMLlib.zip\" target=\"_blank\">ExampleClassificationMLlib.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Clustering (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18c_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-bb806b\">\n<li class=\" eplus-wrapper\">Clustering example \u2013 Jupyter notebook and sample data (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleClusteringMLlib.zip\" target=\"_blank\">ExampleClusteringMLlib.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Regression (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18d_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-a5e3ef\">\n<li class=\" eplus-wrapper\">Regression example \u2013 Jupyter notebook and sample data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleRegressionMLlib.zip\" target=\"_blank\">ExampleRegressionMLlib.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Itemset and Association rule mining (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/18e_SparkMLlib_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-8a5f34\">\n<li class=\" eplus-wrapper\">Itemset and Association rule mining example \u2013 Jupyter notebook and sample data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleItemsetMLlib.zip\" target=\"_blank\">ExampleItemsetMLlib.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">GraphX\/GraphFrames<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-17941f\">\n<li class=\" eplus-wrapper\">Introduction to GraphX and GraphFrames (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/19_SparkGraphFrame_PartI_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Graph Algorithms with GraphFrames (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/20_SparkGraphFrame_Algorithms_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-ec34b4\">\n<li class=\" eplus-wrapper\">Simple example \u2013 Jupyter notebook (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/GraphFrameExamples.zip\" target=\"_blank\">GraphFrameExamples.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Select the GraphFrames (Yarn) kernel to run it on jupyter.polito.it<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Run \u201cpyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 --repositories https:\/\/repos.spark-packages.org\u201d to run it locally on your PC &#8211; Use package graphframes:graphframes:0.8.0-spark2.4-s_2.11 if you locally installed Spark 2 instead of Spark 3<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Streaming data analytics<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-a037e1\">\n<li class=\" eplus-wrapper\">Spark Streaming (DStreams) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/21_SparkStreaming_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-7a95a0\">\n<li class=\" eplus-wrapper\">Simple examples \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkSteamingExamples.zip\" target=\"_blank\">SparkSteamingExamples.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Structured Streaming (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/22_SparkStructuredStreaming_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-574cc2\">\n<li class=\" 
eplus-wrapper\">Simple examples \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleStructutedStreaming.zip\" target=\"_blank\">SparkStructutedStreamingExamples.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Introduction to other big stream processing frameworks: Apache Storm, Apache Flink, ... (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/23_StreamingFrameworks_DistributedBigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">Exercises<\/h2>\n\n\n\n<p class=\" eplus-wrapper\"><strong>If you use your PC to write and run your code, import the projects based on Maven (those projects can be run locally).<br>If you use the PC available in the LAB, import the Eclipse projects with libraries (those projects cannot be run locally; they can be run only on the cluster, after exporting the project jar file).<\/strong><\/p>\n\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">MapReduce<\/h5>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-697f72\">\n<li class=\" eplus-wrapper\">MapReduce Exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/01_MapReduce_Exercises_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-ab35a4\">\n<li class=\" eplus-wrapper\">Solutions of Exercises 1-29 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/SolutionsExMapReduce.zip\" target=\"_blank\">SolutionsExMapReduce.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Basic MapReduce project with Linux and macOS<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-860fe0\">\n<li class=\" eplus-wrapper\">Basic Eclipse project for MapReduce applications (with libraries) (<a 
rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/WordCountLibraries.zip\" target=\"_blank\">MapReduceBasicProjectWithLibraries.zip<\/a>) \u2013 Import using Import\/General\/Existing Projects into Workspace<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Basic Eclipse project for MapReduce applications (based on maven) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProject.zip\" target=\"_blank\">MapReduceBasicProject.zip<\/a>) \u2013 Import this project using Import\/Maven\/Existing Maven Projects<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Basic MapReduce project with Windows<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-bef5c3\">\n<li class=\" eplus-wrapper\">Basic Eclipse project for MapReduce applications (with libraries) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/WordCountLibraries.zip\" target=\"_blank\">MapReduceBasicProjectWithLibraries.zip<\/a>) \u2013 Import using Import\/General\/Existing Projects into Workspace<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Setup instructions for running MapReduce applications locally inside Eclipse (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/ConfigureWindowsEnviroment.pdf\" target=\"_blank\">ConfigureWindowsEnviroment.pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-40f1e4\">\n<li class=\" eplus-wrapper\"><strong>You must<\/strong> also <strong>install JDK 1.8<\/strong> and select it for the imported project inside Eclipse. 
If you already installed the JDK environment, but the version is newer than JDK 1.8, you must also install JDK 1.8.<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Winutils executable (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/winutils.zip\" target=\"_blank\">winutils.zip<\/a>) &#8211; Some of you solved the problems with their Windows version by downloading winutils.exe and hadoop.dll from this alternative source: <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/steveloughran\/winutils\/tree\/master\/hadoop-2.7.1\/bin\" target=\"_blank\">https:\/\/github.com\/steveloughran\/winutils\/tree\/master\/hadoop-2.7.1\/bin<\/a><\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Basic Eclipse project for MapReduce applications (based on maven) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProjectWindows.zip\" target=\"_blank\">MapReduceBasicProjectWindows.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">Spark<\/h5>\n\n\n<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-b67180\">\n<li class=\" eplus-wrapper\">Spark exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/02_Spark_Exercises_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-16ba8b\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/ExSparkData30_46.zip\" target=\"_blank\">ExSparkData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">RDD-based solutions of Exercises 30-46 \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/03\/SolutionsExSpark30_46.zip\" 
target=\"_blank\">SparkNotebooksSol30_46.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark SQL exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/02_Spark_ExerciseSparkSQLNB.pdf\" target=\"_blank\">pdf<\/a>) &#8211; Spark SQL Exam exercise example 4 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/04\/DistrBD_Example4Spark.pdf\" target=\"_blank\">pdf<\/a>) &#8211; Uploaded on April 29<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-28c1a8\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExSparkSQLData.zip\" target=\"_blank\">ExSparkSQLData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises 47-50 \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkNotebooksSol47_50.zip\" target=\"_blank\">SparkNotebooksSol47_50.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution of Exercise Example 4 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/04\/ExerciseExample4Spark.zip\" target=\"_blank\">ExerciseExample4Spark.zip<\/a>)  &#8211; Uploaded on April 29<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark MLlib exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/03_MLlib_Exercises_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-104b51\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleMLlibData.zip\" 
target=\"_blank\">ExampleMLlibData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercise 51 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkNotebooksSol51.zip\" target=\"_blank\">SparkNotebooksSol51.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">GraphFrame exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/04_GraphFrame_Exercises_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-5c29ac\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleGraphFrameData.zip\" target=\"_blank\">ExampleGraphFrameData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises 52-57b \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/05\/SparkNotebooksSol52_57b.zip\" target=\"_blank\">SparkNotebooksSol52_57b.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark streaming exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/05_SparkStreaming_Exercises_BigDataNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-266518\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleSparkStreamingData-1.zip\" target=\"_blank\">ExampleSparkStreamingData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises 58-65 \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkNotebooksSol58_65.zip\" target=\"_blank\">SparkNotebooksSol58_65.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark structured streaming and MLlib exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/06_SparkStructuredStreamingAndMLlib_ExercisesNB.pdf\" target=\"_blank\">pdf<\/a>)<ul class=\"eplus-wrapper wp-block-list eplus-styles-uid-3381e1\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with (few) data for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/ExampleSparkStructuredMLlibData.zip\" target=\"_blank\">ExampleSparkStructuredMLlibData.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solution of Exercise 66 \u2013 Jupyter notebooks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/04\/SparkNotebooksSol66.zip\" target=\"_blank\">SparkNotebooksSol66.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n\n\n\n\n\n\n\n\n\n\n<h2 class=\" wp-block-heading eplus-wrapper\">Additional material<\/h2>\n\n\n\n<p class=\" eplus-wrapper\">Slides and screencasts about Java (kindly provided by Prof. Torchiano) (<a href=\"http:\/\/dbdmg.polito.it\/~paolo\/JavaMaterials\/02JEY%20-%20Object%20Oriented%20Programming.html\">link<\/a>)<br>Focus on the following subset of slides\/lectures (for students who have never used Java):<br>&#8212; OO Paradigm and UML (The UML part is not mandatory)<br>&#8212; The Java Environment<br>&#8212;  Java Basic Features<br>&#8212; Java Inheritance<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This web page is related to an old version of the course.The web page of the current instance of the course is available at link. 
General Information SSD: ING-INF\/05 CFU: 8 Professor: Paolo Garza Teaching Assistant: Luca Colomba Announcements 03-03-2023: Lab activities&#8212; Team 1: Students from A to J &#8211; &hellip;<\/p>\n","protected":false},"author":5,"featured_media":3290,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"editor_plus_copied_stylings":"{}","footnotes":""},"categories":[37],"tags":[],"class_list":["post-5461","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-courses"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/5461","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/comments?post=5461"}],"version-history":[{"count":133,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/5461\/revisions"}],"predecessor-version":[{"id":7324,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/5461\/revisions\/7324"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/media\/3290"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/media?parent=5461"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/categories?post=5461"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/tags?post=5461"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}