Distributed architectures for big data processing and analytics (2019/2020)




Please note that this is the web page for the academic year 2019/2020.

General information

  • ECTS: 8
  • Professor: Paolo Garza
  • Teaching assistant: Martino Trevisan

Exam rules

  • Exam rules Academic Year 2019-2020 – ONLINE EXAMINATION SESSION (pdf)
  • Exam rules Academic Year 2019-2020 (pdf)

Announcements

  • (24/02/2020)
    • No lab activities during the first week

Slides

  • Video lectures:
    • Teaching portal / Material / Virtual classroom
    • or
    • Teaching portal / Material / Link relativi al corso (course-related links) -> Links to the Dropbox copy of the video lectures


Exercises

Practices

  • Lab1: Hadoop and MapReduce (Wednesday, March 18 – 13:00-14:30)
  • Lab2: Filter with Hadoop MapReduce (Friday, March 20 – 13:00-14:30)
  • Lab3: Frequently bought/reviewed together application with Hadoop MapReduce (Friday, March 27 – 13:00-14:30)
  • Lab4: Normalized ratings for product recommendations with Hadoop MapReduce (Friday, April 3 – 13:00-14:30)
  • Lab5: Filter data and compute basic statistics with Apache Spark (Friday, April 17 – 13:00-14:30)
  • Lab6: Frequently bought/reviewed together application with Apache Spark (Friday, April 24 – 13:00-14:30)
  • Lab7: Bike sharing data analysis (Wednesday, April 29 – 13:00-14:30)
    • Problem specification (pdf)
    • Sample data (zip)
    • Example KML file (zip)
    • Another KML viewer that can be used to display the results of your analysis on a map: http://kmlviewer.nsspot.net
    • Solution
      • Lab7_Sol1920.zip – Jupyter notebook (Lab7_1920Sol.ipynb) and Python script (Lab7_1920Sol.py)
  • Lab8: Bike sharing data analysis based on Spark SQL (Friday, May 8 – 13:00-14:30)
    • Problem specification (pdf)
    • Sample data (zip)
    • Solution
  • Lab9: A classification pipeline with MLlib + SparkSQL (Friday, May 15 – 13:00-14:30)
  • Lab10: GraphFrame (Friday, May 22 – 13:00-14:30)
  • Lab11: Tweet analysis – Spark streaming (Friday, May 29 – 13:00-14:30)

Exam Examples

Additional material

  • Slides and screencasts about Java (kindly provided by Prof. Torchiano) (link)
    • Suggested slides/lectures for those students who have never used Java
      • OO Paradigm and UML (The UML part is not mandatory)
      • The Java Environment
      • Java Basic Features
      • Java Inheritance