{"id":7097,"date":"2023-09-26T20:25:45","date_gmt":"2023-09-26T18:25:45","guid":{"rendered":"https:\/\/dbdmg.polito.it\/dbdmg_web\/?p=7097"},"modified":"2024-11-28T19:37:44","modified_gmt":"2024-11-28T18:37:44","slug":"big-data-processing-and-analytics-2023-24","status":"publish","type":"post","link":"https:\/\/dbdmg.polito.it\/dbdmg_web\/2023\/big-data-processing-and-analytics-2023-24\/","title":{"rendered":"Big Data Processing and Analytics (2023\/24)"},"content":{"rendered":"\n<h2 class=\" wp-block-heading eplus-wrapper\" id=\"general-information\">General Information<\/h2>\n\n\n\n<p class=\" eplus-wrapper\"><strong>SSD<\/strong>: ING-INF\/05<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>CFU<\/strong>: 6<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Professor<\/strong>: Paolo Garza<\/p>\n\n\n\n<p class=\" eplus-wrapper\"><strong>Teaching Assistant<\/strong>: Luca Colomba<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\" wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n<h3 class=\" wp-block-heading eplus-wrapper\" id=\"announcements\">Announcements<\/h3>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-b75b5b\">\n<li class=\" eplus-wrapper\">26-09-23: The first lecture is scheduled for October 2, 2023, at 16:00 in Classroom 4P<\/li>\n\n\n\n<li class=\" eplus-wrapper\">26-09-23: No lab activities during the first week of the course<\/li>\n<\/ul>\n\n\n<hr class=\" wp-block-separator has-css-opacity eplus-wrapper\"\/>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\" wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n<h3 class=\" wp-block-heading eplus-wrapper\" id=\"teaching-material\">Teaching Material<\/h3>\n\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">INTRODUCTION<\/h5>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-109aed\">\n<li class=\" eplus-wrapper\">Introduction to the course content and exam rules (<a 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/09\/00_Intro_BigDataProcessing_2324.pdf\" data-type=\"link\" data-id=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/09\/00_Intro_BigDataProcessing_2324.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Introduction to Big Data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/01_Intro_BigData_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Big Data Architectures (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/02_Architectures_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul>\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">HADOOP AND MAPREDUCE<\/h5>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-1c95be\">\n<li class=\" eplus-wrapper\">Introduction to Apache Hadoop and the MapReduce programming paradigm (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/03_Intro_HadoopAndMapReduce_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>) <ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-a81101\">\n<li class=\" eplus-wrapper\">Interaction with HDFS and Hadoop by means of the command line (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/03b_HDFS_Hadoop_CommandLine_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Hadoop implementation of MapReduce (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/04_HadoopImplementationOfMapReduce_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-07824c\">\n<li class=\" eplus-wrapper\">BigData@Polito 
environment + Jupyter \u2013 How to submit MapReduce jobs on BigData@Polito (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/04b_ClusterJupyter_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce and Hadoop \u2013 Advanced Topics: Multiple inputs, Multiple outputs, Distributed cache (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/06_AdvancedTopicsMapReduce_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Design patterns \u2013 Part 1 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/05_MapReduce_Patterns_Part1_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Design patterns \u2013 Part 2 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/07_MapReduce_Patterns_Part2_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">MapReduce \u2013 Relational Algebra\/SQL operators (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/08_SQLOperators_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul>\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">SPARK<\/h5>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-c3e960\">\n<li class=\" eplus-wrapper\">Introduction to Apache Spark (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/10_SparkIntroduction_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-4ac451\">\n<li class=\" eplus-wrapper\">How to submit Spark applications (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/10b_SparkSubmit_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">RDD-based programs<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-094eb9\">\n<li class=\" eplus-wrapper\">Creation, basic transformations, and actions (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/11_SparkRDD_Basic_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Key-value pair RDDs: transformations and actions on PairRDDs (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/12_SparkRDD_PairRDD_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">DoubleRDDs (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/13_SparkRDD_DoubleRDD_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Advanced Topics: Cache, accumulators, broadcast variables (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/14_SparkRDD_AdvancedTopics_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark SQL, Datasets, and DataFrames (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/15_SparkSQL_Datasets_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-5ec152\">\n<li class=\" eplus-wrapper\">Spark SQL \u2013 Join examples (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/ExamplesSparkSQLJoins.zip\" target=\"_blank\">ExamplesSparkSQLJoins.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" 
eplus-wrapper\">Data Mining<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-c0ae01\">\n<li class=\" eplus-wrapper\">Recap data mining tasks (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/DMintro.pdf\" target=\"_blank\">slides<\/a>) \u2013 From the \u201cData Science And Database Technology\u201d course<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark MLlib<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-8cd06b\">\n<li class=\" eplus-wrapper\">Introduction and Classification of structured data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/16_SparkMLlib_Part1_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-9a313d\">\n<li class=\" eplus-wrapper\">Logistic Regression example code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineLogisticRegression.zip\" target=\"_blank\">zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Decision Trees example code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineDecisionTree.zip\" target=\"_blank\">zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Decision Trees and Categorical class label example code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineDecisionTreeCategoricalLabel.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Classification of textual data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/17_SparkMLlib_Part2_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-d026a9\">\n<li class=\" eplus-wrapper\">Textual data classification example 
code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineText.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Classification and Parameter tuning (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/18_SparkMLlib_Part3_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-5fd4ab\">\n<li class=\" eplus-wrapper\">Parameter tuning example code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineLogisticRegressionCrossValidation.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Clustering of structured data (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/19_SparkMLlib_Part4_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-ef6f47\">\n<li class=\" eplus-wrapper\">Clustering example code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineClustering.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Itemset and Association rule mining (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/20_SparkMLlib_Part5_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-5da072\">\n<li class=\" eplus-wrapper\">Itemset and Association rule mining example code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibFPGrowth.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Linear regression (<a rel=\"noreferrer noopener\" 
href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/21_SparkMLlib_Part6_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-af89bd\">\n<li class=\" eplus-wrapper\">Linear regression example code (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/MLlibPipelineLinearRegression.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark Streaming (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/22_SparkStreaming_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-af09e8\">\n<li class=\" eplus-wrapper\">Examples: Word Count \u2013 Streaming versions (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/ExamplesSparkStreaming.zip\" target=\"_blank\">zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<h4 class=\" wp-block-heading eplus-wrapper\" id=\"exercise\">Exercises<\/h4>\n\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">MAP REDUCE<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-accent-2-color\"> <\/mark><\/h5>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-e1094c\">\n<li class=\" eplus-wrapper\">MapReduce exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/01_MapReduce_Exercises_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-494e7a\">\n<li class=\" eplus-wrapper\">Solutions of Exercises 1-29 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/SolutionsExMapReduce.zip\" target=\"_blank\">SolutionsExMapReduce.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">How to Write 
and Compile your Java Application using VSCode (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/10\/BigData_labs-VSCode_guide.pdf\" data-type=\"link\" data-id=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2023\/10\/BigData_labs-VSCode_guide.pdf\" target=\"_blank\">pdf<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Linux or Mac: Basic project for MapReduce applications (based on Maven) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProject.zip\" target=\"_blank\">MapReduceBasicProject.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Windows: Basic project for MapReduce applications (based on Maven) (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/MapReduceBasicProjectWindows.zip\" target=\"_blank\">MapReduceBasicProjectWindows.zip<\/a>)\n\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-db877e\">\n<li class=\" eplus-wrapper\">How to configure the Windows environment to run MapReduce applications locally on your PC (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/ConfigureWindowsEnviroment.pdf\" target=\"_blank\">ConfigureWindowsEnviroment.pdf<\/a>)<\/li>\n<\/ul>\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-fd4652\">\n<li class=\" eplus-wrapper\"><strong>You must also install<\/strong> <strong>JDK 1.8<\/strong> and select it for the imported project inside the IDE. 
If you already have a JDK installed but its version is newer than 1.8, you must still install <strong>JDK 1.8<\/strong>.<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Winutils executable (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/09\/winutils.zip\" target=\"_blank\">winutils.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n<li class=\" eplus-wrapper eplus-styles-uid-2689a9\">If you use your PC to write and run your code locally, use the projects based on Maven (those projects can be run locally).<\/li>\n\n<li class=\" eplus-wrapper eplus-styles-uid-2689a9\">If you use a PC available in the lab, import the projects with libraries as described in the first lab (those projects cannot be run locally; they run only on the cluster, after exporting the project jar file).<\/li><\/ul>\n\n\n<h5 class=\" wp-block-heading eplus-wrapper\">SPARK<\/h5>\n\n\n<ul class=\"eplus-ce5B3z wp-block-list eplus-wrapper eplus-styles-uid-6b2245\">\n<li class=\" eplus-wrapper\">Spark RDD-, Dataset-, DataFrame-based exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/02_Spark_Exercises_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-4c5a4e\">\n<li class=\" eplus-wrapper\">Example data \u2013 One folder with a small data sample for each exercise (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/ExampleDataSpark.zip\" target=\"_blank\">ExampleDataSpark.zip<\/a>)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises 30-50 (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/10\/SolutionsSpark30-50.zip\" target=\"_blank\" rel=\"noreferrer noopener\">SolutionsExSpark30-50.zip<\/a>) &#8211; <strong>Updated December 6, 2023<\/strong> &#8211; A new alternative solution for Exercise 44, based on a left outer join, has been uploaded 
(Exercise44v2).<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Solutions of Exercises 32-38 and 44 based on Spark SQL (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/11\/SolSparkSQL32-38_44.zip\" target=\"_blank\" rel=\"noreferrer noopener\">SolSparkSQL32-38_44.zip<\/a>) &#8211; <strong>Updated December 6, 2023<\/strong> &#8211; One more alternative solution for Exercise 44 has been uploaded.<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark MLlib (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/12\/03_MLlib_Exercises_BigData_NewStyle.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-da51b1\">\n<li class=\" eplus-wrapper\">Solution (<a href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2022\/12\/MLlibExampleExercise.zip\" target=\"_blank\" rel=\"noreferrer noopener\">MLlibExampleExercise.zip<\/a>)<\/li>\n<\/ul><\/li>\n\n\n\n<li class=\" eplus-wrapper\">Spark Streaming exercises (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/03_SparkStreaming_Exercises_BigData_NewStyle.pdf\" target=\"_blank\">slides<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-bac1ff\">\n<li class=\" eplus-wrapper\">Solutions of Exercises 51-53 (<a rel=\"noreferrer noopener\" href=\"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-content\/uploads\/2021\/12\/SolutionsSparkStreaming51-53.zip\" target=\"_blank\">SolutionsSparkStreaming51_53.zip<\/a>)<\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<hr class=\" wp-block-separator has-css-opacity eplus-wrapper\"\/>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\" wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n<h3 class=\" wp-block-heading eplus-wrapper\" id=\"laboratory-material\">Laboratory Material<\/h3>\n\n\n\n<hr class=\" 
wp-block-separator has-alpha-channel-opacity eplus-wrapper\"\/>\n\n\n\n<h3 class=\" wp-block-heading eplus-wrapper\" id=\"additional-material\">Additional material<\/h3>\n\n\n<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-f8aa42\">\n<li class=\" eplus-wrapper\">Slides and screencasts about Java (kindly provided by Prof. Torchiano) (<a href=\"http:\/\/dbdmg.polito.it\/~paolo\/JavaMaterials\/02JEY%20-%20Object%20Oriented%20Programming.html\">link<\/a>)<ul class=\" wp-block-list eplus-wrapper eplus-styles-uid-6541e1\">\n<li class=\" eplus-wrapper\">Suggested slides\/lectures for those students who have never used Java<\/li>\n\n\n\n<li class=\" eplus-wrapper\">OO Paradigm and UML (The UML part is not mandatory)<\/li>\n\n\n\n<li class=\" eplus-wrapper\">The Java Environment<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Java Basic Features<\/li>\n\n\n\n<li class=\" eplus-wrapper\">Java Inheritance<\/li>\n<\/ul><\/li>\n<\/ul>\n\n\n<div class=\"wp-block-buttons eplus-wrapper is-layout-flex wp-block-buttons-is-layout-flex\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>General Information SSD: ING-INF\/05 CFU: 6 Professor: Paolo Garza Teaching Assistant: Luca Colomba Announcements Teaching Material INTRODUCTION HADOOP AND MAPREDUCE SPARK Exercises MAP REDUCE SPARK Laboratory Material Additional 
material<\/p>\n","protected":false},"author":5,"featured_media":4585,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"editor_plus_copied_stylings":"{}","footnotes":""},"categories":[37],"tags":[],"class_list":["post-7097","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-courses"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/7097","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/comments?post=7097"}],"version-history":[{"count":94,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/7097\/revisions"}],"predecessor-version":[{"id":10805,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/posts\/7097\/revisions\/10805"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/media\/4585"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/media?parent=7097"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/categories?post=7097"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dbdmg.polito.it\/dbdmg_web\/wp-json\/wp\/v2\/tags?post=7097"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}