Big Data: Architectures and Data Analytics (2018/2019)




General information

  • ECTS: 6
  • Professor: Paolo Garza
  • Students from AA to GZ
    • Teaching assistants:
      • Alessandro Farasin
      • Francesco Ventura
  • Students from HA to ZZ
    • Teaching assistants:
      • Andrea Pasini
      • Marilisa Montemurro

Exam rules

  • Exam rules Academic Year 2018-2019 (pdf)

Announcements

  • (21/01/2020)
    • The exam scheduled for January 24, 2020 will be held at 11:00 in Classroom 3I
    • Please remember to bring:
      • the student card and/or an identity document
      • sheets of paper (“fogli protocollo”)

 

Students from AA to GZ
  • (04/06/2019) The lectures scheduled for next week are cancelled (from Monday June 10 to Friday June 14)
    • Tuesday, June 11 – from 5.30pm to 7pm (Team 1 – Lab activity) – Cancelled
    • Wednesday, June 12 – from 5.30pm to 7pm (Team 2 – Lab activity) – Cancelled
    • Thursday, June 13 – from 1.00pm to 4.00pm (Lecture) – Cancelled
    • Friday, June 14 – from 4.00pm to 5.30pm (Lecture) – Cancelled
  • (26/02/2019)
    • First lecture: Thursday, March 7, 2019 at 13:00 – Room 1B.
  • (26/02/2019)
    • No lab activities during the first two weeks.

Students from HA to ZZ
  • (04/06/2019) The lectures scheduled for next week are cancelled (from Monday June 10 to Friday June 14)
    • Monday, June 10 – from 11.30 to 13.00 (Lecture) – Cancelled
    • Tuesday, June 11 – from 10.00 to 11.30 (Team 1 – Lab activity) – Cancelled
    • Wednesday, June 12 – from 13.00 to 14.30 (Team 2 – Lab activity) – Cancelled
    • Friday, June 14 – from 8.30 to 11.30 (Lecture) – Cancelled
  • (26/02/2019)
    • First lecture: Monday, March 4, 2019 at 11:30 – Room 8C.
  • (26/02/2019)
    • No lab activities during the first two weeks.

Materials

Exercises

Exam Examples

  • At the exam, a template for the Driver part of the Hadoop exercise will be provided (Hadoop template)
    • For the Spark exercises, no templates are provided
  • Exam example #1
    • Text (pdf)
    • Solution
      • Source code/Eclipse projects (zip)
  • Exam example #2
    • Text (pdf)
    • Solution
      • Source code/Eclipse projects (zip)
  • Exam July 1, 2016
    • Text (pdf)
    • Solution
      • Question 1: (d)
      • Question 2: (b)
      • Source code/Eclipse projects (zip)
  • Exam July 12, 2016
    • Text (pdf)
    • Solution
      • Question 1: (a)
      • Question 2: (a)
      • Source code/Eclipse projects (zip)
  • Exam September 19, 2016
    • Text (pdf)
    • Solution
      • Question 1: (c)
      • Question 2: (a)
      • Source code/Eclipse projects (zip)
  • Exam June 30, 2017
    • Text (pdf)
    • Solution
      • Question 1: (b)
      • Question 2: (c)
      • Source code/Eclipse projects (zip) – Updated on June 12, 2019
  • Exam July 14, 2017
    • Text (pdf)
    • Solution
      • Question 1: (d)
      • Question 2: (c)
      • Source code/Eclipse projects (zip)
  • Exam September 14, 2017
    • Text (pdf)
    • Solution
      • Question 1: (a)
      • Question 2: (b)
      • Source code/Eclipse projects (zip)
  • Exam January 22, 2018
    • Text (pdf)
    • Solution
      • Question 1: (b)
      • Question 2: (b)
      • Source code/Eclipse projects (zip)
  • Exam June 26, 2018
    • Text Version #1 (pdf)
      • Draft of the solution
        • Question 1: (c)
        • Question 2: (c)
        • Source code/Eclipse projects (zip)
    • Text Version #2 (pdf)
      • Draft of the solution
        • Question 1: (b)
        • Question 2: (c)
        • Source code/Eclipse projects (zip)
  • Exam July 16, 2018
    • Text Version #1 (pdf)
      • Draft of the solution
        • Question 1: (d)
        • Question 2: (a)
        • Source code/Eclipse projects (zip)
    • Text Version #2 (pdf)
      • Draft of the solution
        • Question 1: (b)
        • Question 2: (d)
        • Source code/Eclipse projects (zip)
  • Exam September 3, 2018
    • Text Version #1 (pdf)
      • Draft of the solution
        • Question 1: (d)
        • Question 2: (c)
        • Source code/Eclipse projects (zip)
    • Text Version #2 (pdf)
      • Draft of the solution
        • Question 1: (b)
        • Question 2: (c)
  • Exam February 15, 2019
    • Text Version #1 (pdf)
      • Draft of the solution
        • Question 1: (d)
        • Question 2: (c)
        • Source code/Eclipse projects (zip)
    • Text Version #2 (pdf)
      • Draft of the solution
        • Question 1: (d)
        • Question 2: (b)
  • Exam July 2, 2019
    • Text Version #1 (pdf)
      • Draft of the solution
        • Question 1: (a)
        • Question 2: (b)
        • Source code/Eclipse projects (zip)
    • Text Version #2 (pdf)
      • Draft of the solution
        • Question 1: (a)
        • Question 2: (b)
        • Source code/Eclipse projects (zip)
  • Exam July 18, 2019
    • Text Version #1 (pdf)
      • Draft of the solution
        • Question 1: (b)
        • Question 2: (b)
        • Source code/Eclipse projects (zip)
    • Text Version #2 (pdf)
      • Draft of the solution
        • Question 1: (c)
        • Question 2: (b)
        • Source code/Eclipse projects (zip)
  • Exam September 19, 2019
    • Text (pdf)
      • Draft of the solution
        • Question 1: (b)
        • Question 2: (b)
  • Exam September 19, 2019
    • Text Version #1 (pdf)
      • Draft of the solution
        • Question 1: (c)
        • Question 2: (b)
    • Text Version #2 (pdf)
      • Draft of the solution
        • Question 1: (a)
        • Question 2: (c)

Practices

  • No lab activities during the first two weeks
  • Schedule of the lab activities
      • Students from AA to GZ
        • TEAM 1: Students from AA to CI – Tuesday from 5.30pm to 7pm
        • TEAM 2: Students from CL to GZ – Wednesday from 5.30pm to 7pm
        • Lab | Team 1 | Team 2
          Lab #1 | Tuesday, March 19 – from 5.30pm to 7pm | Wednesday, March 20 – from 5.30pm to 7pm
          Lab #2 | Tuesday, March 26 – from 5.30pm to 7pm | Wednesday, March 27 – from 5.30pm to 7pm
          Lab #3 | Tuesday, April 2 – from 5.30pm to 7pm | Wednesday, April 3 – from 5.30pm to 7pm
          Lab #4 | Tuesday, April 9 – from 5.30pm to 7pm | Wednesday, April 10 – from 5.30pm to 7pm
          Lab #5 | Tuesday, April 16 – from 5.30pm to 7pm | Wednesday, April 17 – from 5.30pm to 7pm
          Lab #6 | Tuesday, May 7 – from 5.30pm to 7pm | Wednesday, May 8 – from 5.30pm to 7pm
          Lab #7 | Tuesday, May 14 – from 5.30pm to 7pm | Wednesday, May 15 – from 5.30pm to 7pm
          Lab #8 | Tuesday, May 21 – from 5.30pm to 7pm | Wednesday, May 22 – from 5.30pm to 7pm
          Lab #9 | Tuesday, May 28 – from 5.30pm to 7pm | Wednesday, May 29 – from 5.30pm to 7pm
          Lab #10 | Tuesday, June 4 – from 5.30pm to 7pm | Wednesday, June 5 – from 5.30pm to 7pm
      • Students from HA to ZZ
        • TEAM 1: Students from HA to QZ – Tuesday from 10am to 11.30am
        • TEAM 2: Students from RA to ZZ – Wednesday from 1pm to 2.30pm
        • Lab | Team 1 | Team 2
          Lab #1 | Tuesday, March 19 – from 10am to 11.30am | Wednesday, March 20 – from 1pm to 2.30pm
          Lab #2 | Tuesday, March 26 – from 10am to 11.30am | Wednesday, March 27 – from 1pm to 2.30pm
          Lab #3 | Tuesday, April 2 – from 10am to 11.30am | Wednesday, April 3 – from 1pm to 2.30pm
          Lab #4 | Tuesday, April 9 – from 10am to 11.30am | Wednesday, April 10 – from 1pm to 2.30pm
          Lab #5 | Tuesday, April 16 – from 10am to 11.30am | Wednesday, April 17 – from 1pm to 2.30pm
          Lab #6 | Tuesday, May 7 – from 10am to 11.30am | Wednesday, May 8 – from 1pm to 2.30pm
          Lab #7 | Tuesday, May 14 – from 10am to 11.30am | Wednesday, May 15 – from 1pm to 2.30pm
          Lab #8 | Tuesday, May 21 – from 10am to 11.30am | Wednesday, May 22 – from 1pm to 2.30pm
          Lab #9 | Tuesday, May 28 – from 10am to 11.30am | Wednesday, May 29 – from 1pm to 2.30pm
          Lab #10 | Tuesday, June 4 – from 10am to 11.30am | Wednesday, June 5 – from 1pm to 2.30pm

  • Lab2: Filter with Hadoop MapReduce

  • Lab3: Frequently bought/reviewed together application with Hadoop MapReduce

  • Lab4: Normalized ratings for product recommendations with Hadoop MapReduce

  • Lab9: A classification pipeline with MLlib + SparkSQL
    • Text (pdf)
    • Skeleton Eclipse project – Spark (Lab9_Template.zip)
    • Sample file with 100 reviews (ReviewsSample.csv)
    • Solution
      • Logistic regression (zip)
      • DecisionTree (zip)
      • Logistic regression based on text analysis (zip)
      • DecisionTree based on text analysis (zip)

Additional materials

  • Slides and screencasts about Java (kindly provided by Prof. Torchiano) (link)
    • Suggested slides/lectures for those students who do not know Java
      • OO Paradigm and UML (the UML part is not mandatory)
      • The Java Environment
      • Java Basic Features
      • Java Inheritance