Data science lab: process and methods (2020/2021)

General information

CFU: 8
Professor: Elena Baralis
Teaching assistants:
Tania Cerquitelli (Lessons), Andrea Pasini (Python classes)
Giuseppe Attanasio, Flavio Giobergia, Francesco Ventura (Laboratory sessions)


The following are the key dates for the 2021 summer session:

  • June 9, 2021 (by end of day): submission platform opens
  • June 14, 2021: written exam
  • June 23, 2021, 23:59 CEST: submission platform closes


Exam rules: pdf

The report template (adapted from the official IEEE template) is available in LaTeX (strongly encouraged) or Word format.

Make sure you test your software/hardware setup before taking the exam. A simulation of the exam (using Respondus + Lockdown Browser) is available on the Portale della didattica (Remote Exams -> Take the simulation test).

Submission platform: link

Second call of the winter session:

  • Assignment: pdf
  • Written scores (10/02/2021): pdf
  • Final scores: pdf

First call of the winter session:

  • Assignment: pdf
  • Written scores (25/01/2021): pdf
  • Final scores (25/01/2021): pdf
  • Some of the best-written reports: pdf, pdf, pdf, pdf


  • 06-10-2020. You can register on Piazza whether you are already enrolled in the course or still waiting for enrollment. If you do not have an educational address yet, send an email including your personal email address and your ID from the Polito website or the Apply procedure (in the format FXXXXX). Registration with your personal address is temporary: remember to add your educational address on Piazza as soon as you receive it.
  • 28-09-2020. The laboratories will not take place in the first week of the course.
  • 28-09-2020. During the semester, we will be using Piazza, a collaborative question-and-answer platform that allows students to post their questions to the teaching staff. Please sign up here with your educational email address and use your full name, i.e. Name Surname. We will also post useful notes, suggestions, and the code for the Python exercises on Piazza (under the “Resources” section).
  • 25-09-2020. The first Python lesson will be on 02 October 2020. We suggest that you bring your own PC with Python 3 and Jupyter installed.
    In the “Python” section you can find instructions for installing the necessary software.
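
To check your setup before the first Python lesson, a minimal sketch like the following may help. It is not part of the official course material: the function name `check_setup` and the choice of `jupyter_core` as the module to look for are assumptions made for illustration, and the installation tutorial in the “Python” section remains the reference.

```python
import importlib.util
import sys


def check_setup(modules=("jupyter_core",), min_version=(3, 0)):
    """Report whether Python is recent enough and the given modules are installed.

    `modules` lists top-level module names only; the default is an assumption
    about how a Jupyter installation shows up on your machine.
    """
    report = {"python_ok": sys.version_info >= min_version}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_setup().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

Running the script prints one line per check. A `MISSING` entry suggests going back to the installation instructions before the lesson.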

Learning material


Data science

This section will contain the slides of the data science course.

  • Course introduction (pdf)
  • Introduction to data science (pdf)
  • Data preprocessing (pdf)
  • Association rules (pdf)
  • Classification (pdf)
  • Regression analysis (pdf)
  • Time series analysis (pdf)
  • Data exploration, Feature Engineering, Data visualization (pdf)
  • Clustering (pdf) – NEW (07/12/2020)
    • Clustering (pdf) OLD

Other material

  • Writing your report (pdf)
  • Thesis proposals (pdf)


  • Use case: Modelling energy efficiency of buildings based on open-data (pdf)
  • Use case: Characterising Electricity Consumption Over Time for Residential Consumers through cluster analysis (pdf)



Python

    This section will contain the slides and material of the Python classes.

    • Exercises on Piazza. Here we will publish the text and solutions of the exercises solved during the Python lectures.
    • Python installation tutorial (pdf)
    • GitHub tutorial (pdf). GitHub is a useful resource for sharing your code online and managing version control.


    • Introduction to Python (pdf)
    • Python programming (pdf)
    • Overview of Python libraries and Matplotlib (pdf)
    • Structuring Python projects (pdf)
    • Numpy (pdf)
    • Pandas (pdf)
    • Scikit-learn: classification (pdf)
    • Scikit-learn: regression (pdf)
    • Scikit-learn: clustering (pdf)
    • Scikit-learn: preprocessing (pdf)

Exam exercises

Exercises for the written exam


Laboratory material

This section will contain all the material for carrying out the laboratories.

  • Laboratory 1 (7-8 October 2020): pdf – Solution: html
  • Laboratory 2 (14-15 October): pdf – Solution: html
  • Laboratory 3 (21-22 October): pdf (updated 22/10/2020) – Solution: html
  • Laboratory 4 (28-29 October): pdf (updated, EDIT on Equation 3) – Solution: html
  • Laboratory 5 (4-5 November): pdf – Solution: html
  • Laboratory 6 (11-12 November): pdf – Solution: html
  • Laboratory 7 (18-19 November): pdf – Solution: pdf
  • Laboratory 8 (25-26 November): pdf – Solution: pdf
  • Laboratory 9 (2-3 December): pdf – Solution: pdf
  • Laboratory 10 (9-10 December): pdf

Research Bites and Seminars

  • ML in production – Automation of ML pipelines with Luigi – 11/12/2020 – Eliana Pastor: material
  • Image understanding: Tasks and architectures – 14/12/2020 – Andrea Pasini: slides
  • Generative Adversarial Networks: Beyond discriminative models – 14/12/2020 – Moreno La Quatra: slides
  • A brief (and practical) introduction to word embeddings – 18/12/2020 – Flavio Giobergia: slides (demo)
  • From recurrent models to the advent of Attention: a recap – 18/12/2020 – Giuseppe Attanasio: slides
  • Explainable Artificial Intelligence: an introduction to current trends – 18/12/2020 – Francesco Ventura: slides
  • How to start a start-up – 16/10/2020 – Luca de Alfaro: slides



 © 2021 - DataBase and Data Mining Group