Data Management and Visualization (2023-24)

General Information

Lecturers: Daniele Apiletti, Diego Monti

Teaching staff: Alessandro Fiori, Simone Monaco

SSD: ING-INF/05 – CFU: 8 – course details from the official student guide

Q&A teaching assistance on Piazza:

📰 News

  • We are using Piazza for class discussion, and we invite all students to join the course Piazza. It is the recommended way to get help quickly and efficiently from classmates and teachers: rather than emailing questions to the teaching staff, please post them on Piazza.
  • Seminar announcement – Data science and retail: how Data Management and Machine Learning can improve customer experience
    • When: Monday, January 15th, 14:30-16:00, classroom 1P
    • Who: Marco Stella, Data Science Manager, Miroglio Group
    • What: Miroglio Group's ambition is to become a data- and customer-centric company. How can we propose the best personalized product for each customer? Can new technologies facilitate this goal? In this seminar we will explain how we are implementing a Cloud Data Platform to know and understand our customers better. We will go through a use case where Machine Learning algorithms are applied to customer segmentation, hence facilitating, pushing and harmonizing marketing product campaigns.
    • Material: slides
  • Seminar announcement – PowerBI: a free tool for Big Data Management and Advanced Visualization
    • When: Tuesday, January 16th, 11:30-13:00, classroom R3
    • Who: Victor Rivas, Evergrow BI
    • What: Unlock the full potential of Power BI, a premier yet free tool for Windows users. The seminar will guide you through the strategic use of Power BI for managing and visualizing Big Data using innovative features like Composite Models and Direct Lake in Fabric. Gain insights into the comparative advantages of tabular versus multidimensional models, and learn to craft visual narratives that make data speak volumes.
    • Material: intro by Adam Saxon, Program Manager, Microsoft.
  • Seminar announcement – The power of Data Modeling in Business Analytics
    • When: Tuesday, January 16th, 13:00-14:30, classroom R3
    • Who: Marco Russo, SQLBI
    • What: This seminar focuses on different data models used in business analytics. Using the example of “Tickit”, a fictional website for buying and selling event tickets, we’ll examine how data models work in real-world scenarios. We’ll look at typical raw data (like denormalized files and OLTP data sources), then compare different analytical models like Inmon (data warehouse) and Kimball (dimensional modeling). Finally, we’ll introduce the additional features of semantic models, which make it possible to create reports interactively and write shorter queries.
    • Material: slides and demo

📒 Teaching material

Course introduction (slides)

  • Introduction (slides)
  • Conceptual and logical design (slides)
  • Data analysis, OLAP, extended SQL (slides)
  • ETL process (slides)
  • Materialized view (slides)
  • Data warehousing in Oracle (slides)
  • Data warehousing: physical design (slides)
  • Conceptual schema: textual formalism (slides)
  • Non-relational databases for data management – introduction (slides)
  • Introduction to MongoDB, collections, create, delete, GUI (slides)
  • MongoDB, querying data, find operator, aggregation pipeline (slides)
  • MongoDB aggregation examples, indexes (slides)
  • Distributed Data Management, replication, and the CAP theorem (slides)
  • MongoDB replica set (slides, updated Nov 18)
  • Distributed transactions (slides)
  • Distributed data processing and Map Reduce (slides)
  • NoSQL design recipe (slides)
  • MongoDB design patterns part 1 (slides)
  • MongoDB design patterns part 2 (slides)

🗒️ Exercises

💻 Laboratory material

The first lab is scheduled for Thursday, October 19.

Student group          Time                       Room
TEAM A (FROM A TO K)   Thursday, 16:00 – 17:30    LAIB2B
TEAM B (FROM L TO Z)   Thursday, 17:30 – 19:00    LAIB2B

For Labs 1 and 2, you need to run Extended SQL queries on Oracle databases. SQL Developer is already available on the lab devices. If you want to practise at home (or on PoliTO devices as well), we suggest you use Oracle Live SQL, Oracle's online SQL environment. In particular:

  • You can add tables using SQL scripts
  • A short guide on how to import SQL scripts and query the DB in Oracle Live SQL is available (pdf)

Lab 1: Extended SQL

Lab 2: Extended SQL

  • Text – Additional queries (pdf)
  • Solution (pdf)

Lab 3: Looker Studio

Lab 4: MongoDB Compass

  • Text (pdf)
  • Data (link)
  • Draft solution (pdf) – updated 2023-11-20

Lab 5: MongoDB replica set

  • Text (pdf)
  • Script and Data (file)
  • Docker Compose (file)
  • Draft solution (pdf)

Lab 6: Visualization analysis

Lab 7: Redesign with Tableau

  • Solution (zip)

Lab 8: Visualization of a dataset

  • Solution (zip)

Lab 9: Intervals and dashboards

  • Solution (zip)

Lab 10: Geographic roles and maps

  • Solution (zip)

Lab 11: Dataviz exam simulation

  • Text (pdf)
  • Visualization (jpg)

📄 Exam

  • June 21st, 2023
    • Text + DW and NoSQL solutions (pdf)
  • February 22nd, 2023
    • Text + DW and NoSQL solutions (pdf)
    • Data visualization solutions (pdf)
    • Conceptual design (pdf)
  • February 7th, 2023
    • Text + DW and NoSQL solutions (pdf)
    • Data visualization solutions (pdf)
    • Conceptual design (pdf)
  • June 29th, 2022
    • Text + DW and NoSQL solutions (pdf)
  • February 17th, 2022
    • Text + DW and NoSQL solutions (pdf)
    • Data visualization solutions (pdf)
    • Conceptual design (pdf)
  • January 28th, 2022
    • Text + DW and NoSQL solutions (pdf)
    • Data visualization solutions (pdf)
  • September 1st, 2021
    • Text + DW and NoSQL solutions (pdf)
    • Data visualization solutions (pdf)
  • June 17th, 2021
    • Text + DW and NoSQL solutions (pdf)
    • Data visualization solution (pdf)
  • February 15th, 2021
    • Text + DW and NoSQL solutions (pdf)
    • Data visualization solutions (pdf)
  • February 1st, 2021
    • Text + DW and NoSQL solutions (pdf)
    • Data visualization solutions (pdf)