Big data: architectures and data analytics (2015/2016)
Table of contents
- General information
- Exam rules
- Announcements
- Materials
- Seminars
- Exercises
- Exam Examples
- Practices
- Additional materials
AY 2015/2016
New web page: https://dbdmg.polito.it/wordpress/teaching/big-data-architectures-and-data-analytics-20162017/
General information
- ECTS: 6
- Professor: Paolo Garza
- Assistant lecturer: Luca Venturini
Exam rules
- Exam rules Academic Year 2015-2016 (pdf)
Materials
- Introduction to the course (2 slides per page, 6 slides per page)
- Introduction to Big Data (2 slides per page, 6 slides per page)
- Hadoop and MapReduce
- Introduction to Apache Hadoop and the MapReduce programming paradigm (2 slides per page, 6 slides per page)
- Hadoop implementation of MapReduce – Basic structure of MapReduce programs in Hadoop (2 slides per page, 6 slides per page)
- Source code of the Word Count Eclipse project (WordCount.zip) – Use the import option to import it into Eclipse
- PDF version of the code (i.e., PDF version of the Java files) (WordCountPDF.zip)
- Interaction with HDFS and Hadoop by means of the command line (2 slides per page, 6 slides per page)
- MapReduce programs and Hadoop – Part 2 (2 slides per page, 6 slides per page)
- MapReduce programs and Hadoop – Part 3 (2 slides per page, 6 slides per page)
- MapReduce – Design patterns – Part 1 (2 slides per page, 6 slides per page)
- MapReduce – Multiple Inputs and Multiple Outputs (2 slides per page, 6 slides per page)
- MapReduce – Distributed cache (2 slides per page, 6 slides per page)
- MapReduce – Design patterns – Part 2 (2 slides per page, 6 slides per page)
- MapReduce – Relational Algebra/SQL operators (2 slides per page, 6 slides per page)
- MapReduce – Hadoop internals (2 slides per page, 6 slides per page)
- Spark
- Introduction to Apache Spark (2 slides per page, 6 slides per page)
- Introduction to Apache Spark – Part 2 (2 slides per page, 6 slides per page)
- Source code of the Spark Word Count Eclipse project (SparkWordCount.zip)
- PDF version of the code (i.e., PDF version of the Java files) (SparkWordCountPDF.pdf)
- RDD-based programs (RDDs creation and basic transformations) – Part 1 (2 slides per page, 6 slides per page)
- RDD-based programs (RDDs basic actions) – Part 2 (2 slides per page, 6 slides per page)
- How to submit a Spark application (2 slides per page, 6 slides per page)
- RDD-based programs (key-value pair RDDs) – Part 3 (2 slides per page, 6 slides per page)
- RDD-based programs (transformations on two PairRDDs and actions on PairRDDs) – Part 4 (2 slides per page, 6 slides per page)
- RDD-based programs (DoubleRDDs) – Part 5 (2 slides per page, 6 slides per page)
- RDD-based programs (Cache, accumulators, broadcast variables) – Part 6 (2 slides per page, 6 slides per page)
- Spark SQL
- Spark SQL (2 slides per page, 6 slides per page)
- Data Mining – Recap
- Introduction (2 slides per page, 6 slides per page)
- Data and Preprocessing (2 slides per page, 6 slides per page)
- Itemset mining and Association rules (2 slides per page, 6 slides per page)
- Classification (2 slides per page, 6 slides per page)
- Clustering (2 slides per page, 6 slides per page)
- Spark MLlib
- Spark MLlib – Part 1 (2 slides per page, 6 slides per page)
- Spark MLlib – Part 2 (2 slides per page, 6 slides per page)
- Spark MLlib – Part 3 (2 slides per page, 6 slides per page)
- Spark MLlib – Part 4 (2 slides per page, 6 slides per page)
- Spark MLlib – Part 5 (2 slides per page, 6 slides per page)
- Spark MLlib – Part 6 (2 slides per page, 6 slides per page)
Exercises
- MapReduce Exercises – Part 1 (2 slides per page, 6 slides per page)
- MapReduce Exercises – Part 2 (2 slides per page, 6 slides per page)
- Solutions – Part 1 and 2
- Source code/Eclipse projects (SolutionsExercisesPart1_Part2.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart1_Part2_pdfversion.zip)
- MapReduce Exercises – Part 3 (2 slides per page, 6 slides per page)
- Solutions – Part 3
- Source code/Eclipse projects (SolutionsExercisesPart3.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart3_pdfversion.zip)
- MapReduce Exercises – Part 4 (2 slides per page, 6 slides per page)
- Solutions – Part 4
- Source code/Eclipse projects (SolutionsExercisesPart4.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart4_pdfversion.zip)
- MapReduce Exercises – Part 5 (2 slides per page, 6 slides per page)
- Solutions – Part 5
- Source code/Eclipse projects (SolutionsExercisesPart5.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart5_pdfversion.zip)
- MapReduce Exercises – Part 6 (2 slides per page, 6 slides per page)
- Solutions – Part 6
- Source code/Eclipse projects (SolutionsExercisesPart6.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart6_pdfversion.zip)
- MapReduce Exercises – Part 7 (2 slides per page, 6 slides per page)
- Solutions – Part 7
- Source code/Eclipse projects (SolutionsExercisesPart7.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart7_pdfversion.zip)
- Spark Exercises – Part 8 (2 slides per page, 6 slides per page)
- Solutions – Part 8
- Source code/Eclipse projects (SolutionsExercisesPart8.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart8_pdfversion.zip)
- Spark Exercises – Part 9 (2 slides per page, 6 slides per page)
- Solutions – Part 9
- Source code/Eclipse projects (SolutionsExercisesPart9.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart9_pdfversion.zip)
- Spark Exercises – Part 10 (2 slides per page, 6 slides per page)
- Solutions – Part 10
- Source code/Eclipse projects (SolutionsExercisesPart10.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart10_pdfversion.zip)
- Spark Exercises – Part 11 (2 slides per page, 6 slides per page)
- Solutions – Part 11
- Source code/Eclipse projects (SolutionsExercisesPart11.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart11_pdfversion.zip)
- Spark Exercises – Part 12 (2 slides per page, 6 slides per page)
- Solutions – Part 12
- Source code/Eclipse projects (SolutionsExercisesPart12.zip)
- PDF version of the solutions (i.e., PDF version of the Java files) (SolutionsExercisesPart12_pdfversion.zip)