General Information
SSD: ING-INF/05
CFU: 8
Professor: Silvia Chiusano
Teaching Assistants: Alessandro Fiori, Eliana Pastor
Announcements [dd-mm-yyyy]
22-01-2023 – The list of students who will discuss their homework is available in the “Homework to be delivered” section.
19-12-2022 – The deadline for Homework 3 has been postponed to December 22, 2022.
21-11-2022 – The text of Homework 2 is available in the “Homework to be delivered” section. Deadline: December 5, 2022.
15-11-2022 – The text of Homework 1 is available in the “Homework to be delivered” section. Deadline: November 28, 2022.
12-10-2022 – The lectures on Thursday, October 13, 2022 and Friday, October 14, 2022 will be held online only (not in person) via virtual classroom. Both lectures will be recorded.
10-10-2022 – The lecture on Tuesday, October 11, 2022 will be held online only (not in-person) via virtual classroom. The lecture will be recorded.
03-10-2022 – The lesson on Tuesday, October 4, 2022 has been cancelled.
27-09-2022 – Laboratory exercises will begin in the fourth week.
26-09-2022 – The first lecture is on September 27th.
Teaching Material
- Course introduction (pdf)
Part I
- Introduction to Data Science (slides)
- Data warehouse: introduction (slides)
- Data warehouse: design (slides)
- Data warehouse: analysis (slides)
- Data warehouse: materialized views (slides)
- Data lakes (slides)
- Data mining process (slides)
- Data preparation (slides)
- Data mining: association rules (slides)
- Data mining: classification (slides)
- Data mining: clustering (slides)
Part II
- Introduction to DBMS (slides)
- Buffer Manager (slides)
- Physical access to data (slides)
- Query optimization (slides)
- Physical Design (slides)
- Concurrency Control (slides)
- Reliability management (slides)
- Distributed databases (slides)
- NoSQL: beyond relational databases (slides)
- Introduction to MongoDB (slides)
- Elasticsearch (slides)
Oracle
- Extended SQL (2 slides per page, 6 slides per page)
- Oracle optimizer (slides)
- Oracle Hints (slides)
Exercises
- Extended SQL
- Exercise 1 (text, draft solution)
- Data Warehouse
- Storehouses (text, draft solution)
- Italian wines (text)
- Remote heating (text)
- Scientific publications (text)
- Materialized views and triggers (text, draft solution)
- Supporting material: Introduction to triggers (slides)
- Optimizer
- Fine (text, draft solution)
- Students (text)
- Athletes (text)
Laboratory Material
Laboratory sessions will start in the fourth week.
Lab schedule:
Topic | Team A (Wednesday 11:30-13:00) | Team B (Thursday 8:30-10:00) |
Lab 1: Extended SQL | 19/10/2022 | 20/10/2022 |
Lab 2: Data Studio | 26/10/2022 | 27/10/2022 |
Lab 3: Materialized views | 09/11/2022 | 10/11/2022 |
Lab 4: Data mining with RapidMiner | 16/11/2022 | 17/11/2022 |
Lab for Homework #2 on data mining with RapidMiner (scholarship holder) | 23/11/2022 | 24/11/2022 |
Lab 5: Oracle optimizer | 30/11/2022 | 01/12/2022 |
Lab 6: MongoDB | 14/12/2022 | 15/12/2022 |
Students are divided into two teams by surname:
Team A (surnames A to J): Wednesday, 11:30 to 13:00
Team B (surnames K to Z): Thursday, 8:30 to 10:00
Lab 1: Extended SQL
- Text (pdf)
- Data warehouse tables in csv format (zip)
- SQL Developer is already available at LABINF. If you want to practice at home, you can follow these tutorials:
- Installing Oracle Database 18c Express Edition and SQL Developer
- Import Database and Tables: Tutorial
- If you want to practice at home but have problems using Oracle Database and SQL Developer, you can use Oracle Live SQL instead.
- You can add tables using SQL scripts (zip)
- A short guide on how to import SQL scripts and query the DB in Oracle Live SQL is available (pdf)
- If you have trouble importing the complete FACT table, you can use a “light” version of the table, which contains a sample of the rows (facts_sample.sql).
- Draft solution (star schema, queries)
Lab 2: Data warehouse analytics and reporting (Google Data Studio/Looker Studio)
Lab 3: Materialized views and triggers
- Text (pdf)
- Draft solution (pdf)
Lab 4: Data mining – RapidMiner
- Text (Practice 4)
- Dataset (Users.xls)
- Supporting material
- RapidMiner 5.0 Community Edition guide (rapidminer-5.0-manual-english_v1.0)
- RapidMiner Studio 10 download: https://rapidminer.com/platform/educational/
- You need to register an educational account with the student role, using your institutional email address.
- Introduction to RapidMiner (2 slides per page, 3 slides per page, 6 slides per page)
- Examples (download)
Lab 5: The Oracle Optimizer
- Text (pdf)
- Scripts for creating DBs (Lab5Database_OPT)
- Useful scripts
Lab 6: NoSQL in MongoDB
- Text (pdf)
- Collection “restaurants” (txt, zipped json)
Homework to be delivered
To obtain the points associated with the homework assignments, students must observe the following rules:
- Complete all parts of the exercises in the homework text.
- Prepare a single file in PDF, DOC or ODT format containing the homework solution.
- Name the file HomeworkN_Surname_Name_StudentId.XXX, where:
- Surname, Name and StudentId are replaced with the student's surname, first name and student ID
- N (the character following Homework) is replaced with the number of the submitted homework
- XXX is the file extension, which depends on the file type chosen for the submission (PDF, DOC or ODT).
- DOCX format is not supported.
- Since uploaded files are processed automatically, a wrongly named file causes the related homework submission to be cancelled.
- For example, for Homework 1 submitted as a PDF, the student Mario Rossi with ID s123456 uploads Homework1_Rossi_Mario_s123456.pdf.
- Upload the file to the didactic portal (Portale della didattica), in the Work Submission (Elaborati) section, before the deadline.
- Multiple uploads of the same homework by the same student are not allowed.
- The upload date shown on the didactic portal is the one considered for the evaluation.
- Since uploaded files are processed automatically, a file uploaded after the deadline causes the related homework submission to be cancelled.
- During the upload procedure, a description (“Descrizione”) field is requested. Enter the file name, following the rules described above.
- Only students without access to the course page on the didactic portal may submit the homework by email to the teaching assistant (eliana dot pastor at polito dot it), before the deadline.
- Discuss positively evaluated homework on the scheduled date (the date will be announced).
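Since uploaded files are processed automatically, it may help to check a file name before uploading it. The Python sketch below is a hypothetical helper, not the portal's actual checker. It assumes the pattern HomeworkN_Surname_Name_StudentId.ext with N from 1 to 4 (the four homeworks listed on this page), a student ID of the form "s" followed by six digits (as in the example s123456), names without spaces or underscores, and a lowercase extension pdf, doc or odt.

```python
import re

# Assumed naming pattern (hypothetical, not the portal's real rule set):
#   Homework<N>_<Surname>_<Name>_<StudentId>.<ext>
# - N is 1 to 4 (the four homeworks listed on this page)
# - Surname and Name contain no spaces or underscores
# - StudentId is "s" followed by six digits, as in the example s123456
# - ext is pdf, doc or odt (so .docx is rejected)
HOMEWORK_NAME = re.compile(r"Homework([1-4])_([^_\s]+)_([^_\s]+)_(s\d{6})\.(pdf|doc|odt)")


def is_valid_homework_name(filename: str) -> bool:
    """Return True if the whole file name matches the assumed pattern."""
    return HOMEWORK_NAME.fullmatch(filename) is not None


def homework_filename(n: int, surname: str, name: str, student_id: str, ext: str) -> str:
    """Build a submission file name and reject it if it breaks the assumed pattern."""
    filename = f"Homework{n}_{surname}_{name}_{student_id}.{ext.lower()}"
    if not is_valid_homework_name(filename):
        raise ValueError(f"invalid submission file name: {filename}")
    return filename


if __name__ == "__main__":
    # The example from the rules above: Mario Rossi, ID s123456, Homework 1 as PDF.
    print(homework_filename(1, "Rossi", "Mario", "s123456", "PDF"))  # Homework1_Rossi_Mario_s123456.pdf
    print(is_valid_homework_name("Homework1_Rossi_Mario_s123456.docx"))  # False
```

This sketch rejects surnames containing spaces (for example, double surnames); the rules above do not say how such names should be written, so that case is left open.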
Homework discussion: students attending the written exam must bring:
- a hard copy of the submitted reports for Homework #1 to #4
Homework deliveries:
Homework discussion. At the end of the exam, the following students must come to Lab 5 of the DAUIN department (second floor, entrance from Corso Castelfidardo 39) to discuss their homework:
- 318907
- 316607
- 313642
- 315962
Homework submissions: list of delivered submissions. In case of discrepancies or missing submissions, send an email to eliana.pastor@polito.it.
Homework | Material | Deadline |
Homework #1: Data warehouse and materialized views | HW text – draft solution | to be delivered by Monday, November 28th, 2022 at 11.59 PM (UTC/GMT+1) |
Homework #2: Data mining | HW text – breast dataset | to be delivered by Monday, December 5th, 2022 at 11.59 PM (UTC/GMT+1) |
Homework #3: The Optimizer | HW text | to be delivered by Thursday, December 22nd, 2022 |
Homework #4: MongoDB | HW text, bike_stations dataset | to be delivered by Sunday, January 15th, 2023 at 11.59 PM (UTC/GMT+1) |