Data Science Lab: process and methods (2021/22)

General Information


CFU: 8

Professor: Elena Baralis

Teaching Assistants: Tania Cerquitelli, Giuseppe Attanasio, Flavio Giobergia


Exam rules

The exam rules for A.Y. 2021/2022 are available here.

Written Exams

  • Winter call 1 (January 25, 2022)
  • Winter call 2 (February 14, 2022)
    • Results (pdf)
    • Final results (pdf)


Exam Session     | Assignment | Results            | Example Report *
Winter 2021-2022 | pdf        | 1st call, 2nd call | pdf
Summer 2021-2022 |            |                    |
Fall 2021-2022   |            |                    |

* Occasionally, we may ask students to publish particularly good reports here. They will serve as a reference for their colleagues.

Teaching Material

Data science

This section will contain the slides of the data science course.

  • Course introduction (slides)
  • Introduction to data science (slides)
  • Data preprocessing (slides)
  • Association rules (slides)
  • Data exploration, feature engineering and data visualization (slides)
  • Classification fundamentals (slides)
  • Clustering fundamentals (slides)
  • Regression analysis (slides)
  • Time series analysis (slides)



Other material

  • Scientific writing – how to write your report (slides)
  • Use case: Modelling energy efficiency of buildings based on open-data (slides)
  • ML in production: Automation of ML pipelines with Luigi (slides, link repository)

Laboratory Material

This section will contain all the material for carrying out the laboratories. Laboratories are not evaluated or marked, so they do not give additional points toward the final exam.


Introduction to laboratories – pdf

In-classroom notes (aka Giuseppe and Flavio’s quick and dirty Python snippets to show something live during laboratories): deepnote

Data Science Lab Environment: link

Laboratory #1: Python lists and dictionaries (html)
Laboratory #2: Tabular and textual data (html)
Laboratory #3: Itemset mining and Association Rules (html)
Laboratory #4: KNN implementation (NumPy) (html)
Laboratory #5: Pandas (html)
Laboratory #6: Tree-based models (html)
Laboratory #7: Classification * (pdf)
Laboratory #8: Modeling time series (html)
Laboratory #9: Regression * (pdf)
Laboratory #10: Clustering and K-Means (html)

* During this laboratory, we will set up the Data Science Lab Environment, the online evaluation platform we will use during the leaderboard part of the project.

Team organization

Students will be divided into two teams, Team Orange 🍊 and Team Lime 🍋. Team Orange will attend the laboratories on Monday from 10 to 13. Team Lime will instead attend the next day, Tuesday, from 16 to 19. Both lab sessions will be held in LAIB 3.

Sorting alphabetically by last name, students in the interval [ABBAS, GUZZETTA] will be in Team Orange, while students in the interval [HALEEM, ZU] will be in Team Lime.

Since we know that some of you will have a non-stop 9-hour lecture day, we can allow a few team changes. If you want to change teams, please enter your student ID in one of the two dedicated columns in the link you received via email.

However, we need to keep the two populations balanced: please do not request a team change unless it is strictly necessary.