Data Science and Database Technology (2020/2021)
Table of contents
- General information
- Exams
- Material
- Exercises
- Exercises from written exams
- Other exercises
- Tutoring sessions
- Practices
- Homework to be delivered
General information
- ECTS: 8
- Professor: Silvia Chiusano
- Assistant lecturer: Alessandro Fiori, Eliana Pastor
- Exam rules A.Y. 2020/2021 (Covid-19 Emergency) (pdf-rules)
Announcements
- [02-02-2021] We posted a recap of homework submissions.
- [21-11-2020] Homework 4 (deadline 7-01-2021) has been published in Section “Homework to be delivered”
- [27-11-2020] Homework 3 (deadline 7-12-2020) has been published in Section “Homework to be delivered”
- [13-11-2020] Homework 2 (deadline 23-11-2020) has been published in Section “Homework to be delivered”
- [29-10-2020] We posted an updated version of Homework 1 (Query b.ii )
- [29-10-2020] Calendar of tutoring sessions in November has been posted
- [29-10-2020] Calendar of laboratory practices has been posted
- [28-10-2020] Because of the problem with Virtual Classroom during the 1st lab (October 19th, 2020), we have recorded the description of Practice #1 in Virtual Classroom. You will soon find it as “Laboratory 1 – Lab description” (it has already been processed on VC). Please refer to this recording for Lab #1.
- [28-10-2020] Homework 1 (deadline 8-11-2020) has been published in Section “Homework to be delivered”
- [16-10-2020] The first laboratory will be held on Monday, October 19th, 2020. The tutorial for installing the software for Practice #1 (Extended SQL) is now available in Section “Practices”.
- LAB SCHEDULE.
TEAM A (FROM A TO G) on Monday from 2.30pm to 4pm
TEAM B (FROM H TO Z) on Monday from 4pm to 5.30pm
- [16-10-2020] Join us on Piazza. You can register on Piazza whether you are already enrolled in the course or are still waiting to be enrolled. If you do not have an @studenti.polito.it address yet, send an email to eliana.pastor@polito.it including your personal email address and your ID on the Polito website or in the Apply procedure (it is in the format FXXXXX). Registration with your personal address is temporary: remember to add your educational address on Piazza as soon as you receive it.
Exams
- Exam rules A.Y. 2020/2021 (Covid-19 Emergency) (pdf-rules)
Exam on January 26, 2021
- Draft solution (pdf)
Material
- Course introduction (slides)
Part I
- Introduction to Big Data (slides)
- Data warehouse: introduction (slides)
- Data warehouse: design (slides)
- Data warehouse: analysis (slides)
- Data mining: introduction (slides)
- Data mining: data preprocessing (slides)
- Data mining: association rules (slides)
- Data mining: classification (slides, slidesNew)
- Data mining: classification, neural networks (slides)
- Data mining: clustering (slides)
Oracle
- Extended SQL (2 slides per page, 6 slides per page)
Part II
- Triggers (slides)
- Introduction to DBMS (slides)
- Buffer Manager (slides)
- Physical access to data (slides)
- Query optimization (slides)
- Physical Design (slides)
- Concurrency Control (slides)
- Reliability Management (slides)
- Distributed databases (slides)
- Beyond relational databases (slides)
- Intro to MongoDB (part1, part2)
Oracle
- Oracle Optimizer
- Baseline version (2 slides per page, 6 slides per page)
- Extended version with examples (2 slides per page, 6 slides per page)
- Hints (2 slides per page, 6 slides per page)
- Documentation
- Oracle Database 10g documentation library
- Oracle Database Performance Tuning Guide
- The Query Optimizer
- Statistics about indices, meaning of the columns in the statistics table (e.g., CLUSTERING_FACTOR)
- Statistics about tables, meaning of the columns in the statistics table (e.g., EMPTY_BLOCKS)
Exercises
Data warehouse
Exercise | Text | Draft solution |
Extended SQL (Customer) | text | Draft solution |
Extended SQL (Rental) | text | Draft solution |
Data warehouse design (Italian household) | exercise | Draft solution |
Data warehouse design (SearchingYourHouse) | exercise | Draft solution |
Data warehouse design (Hotel chain) | exercise | Draft solution |
Data warehouse design (Parcels service) | exercise |
Triggers
Exercise | Text | Draft solution |
Exercise 1 (Athlete) | exercise | Draft solution |
Exercise 2 (Greenhouse) | exercise | Draft solution |
Exercise 3 (Student grant) | exercise | Draft solution |
Exercise 4 (Boat rental) | exercise | DraftSolBoatRental (timecondition) |
Optimizer
Exercise | Text | Draft solution |
Exercise 1 (Fine) | 2 slides per page, 6 slides per page | Draft Solution |
Exercise 2 (Students, Projects) | 2 slides per page, 6 slides per page | Draft Solution |
Exercise 3 (Discs) | exercise | Draft Solution |
Exercise 4 (Athletes, Members) | 2 slides per page, 6 slides per page | Draft Solution |
Exercises from written exams
AA 2015-2016
Exam | Draft solution |
Exam (2016-02-23) | optimizer, dw, trigger |
Exam (2016-01-27) | optimizer, dw, trigger |
AA 2011-2012
Exam | Draft solution |
Exam (2012-02-06) | optimizer and dw, trigger |
Exam (2012-02-28) | optimizer and dw |
Exam (2012-06-21) | optimizer and dw, trigger |
Exam (2012-09-07) | optimizer and dw |
AA 2010-2011
Exam | Draft solution |
Exam (2011-02-07) | optimizer-trigger-dw |
Exam (2011-02-22) | |
Exam (2011-07-08) | optimizer, trigger |
Exam (2011-09-21) | optimizer, trigger |
Tutoring sessions
- In tutoring sessions teachers are available to answer questions on homework, exercises and topics presented in lessons.
- The calendar of the tutoring sessions in November is the following:
- Monday 2/11 – 2:30-3pm and 4-4:30pm – Virtual classroom
- Monday 16/11 – 2:30-3pm and 4-4:30pm – Virtual classroom
- Monday 30/11 – 2:30-3pm and 4-4:30pm – Virtual classroom
- During the lab session (2:30-5:30pm) the scholarship holder will also be available to answer questions on the homework.
Practices
LAB SCHEDULE.
TEAM A (FROM A TO G) on Monday from 2.30pm to 4pm
TEAM B (FROM H TO Z) on Monday from 4pm to 5.30pm
Lab calendar
Topic | Team A (2:30-4pm), Team B (4-5:30pm) | Lab Assistance |
Practice #1: Extended SQL in Oracle | 19/10/2020 | assistant lecturer |
Practice #2: Data warehousing | 26/10/2020 | assistant lecturer |
Lab for Homework #1 – Data warehousing + Tutoring session | 2/11/2020 | scholarship holder |
Practice #3: Data mining with Rapidminer | 9/11/2020 | assistant lecturer |
Lab for Homework #2 on data mining with Rapidminer + Tutoring session | 16/11/2020 | scholarship holder |
Practice #4: Oracle triggers | 23/11/2020 | assistant lecturer |
Lab for Homework #3 on triggers in Oracle + Tutoring session | 30/11/2020 | scholarship holder |
Practice #5: Oracle optimizer | 14/12/2020 | assistant lecturer |
Practice #6: MongoDB | 11/01/2021 | assistant lecturer |
Lab 1: Extended SQL
- Installing Oracle Database 18c Express Edition and SQL Developer
Lab 2: Data-warehouse analytics and reporting with Google Data Studio
Lab 3: Data mining – Rapid Miner
- Text Practice 3
- Dataset (Users.xls)
- Supporting material
- Rapid Miner 5.0 Community Edition Guide (rapidminer-5.0-manual-english_v1.0)
- Rapid Miner download http://rapidminer.com/products/rapidminer-studio/
- Free Community Edition
- Introduction to RapidMiner (2 slides per page, 3 slides per page, 6 slides per page)
- Examples (download)
Lab 4: Triggers
- Text (pdf)
- Scripts for creating DBs (create_db scripts)
- Screenshots of the database after the trigger executions: Results
- Draft solution (pdf) [UPDATE (Ex. 3)]
Lab 5: The Oracle Optimizer
- Text (pdf)
- Scripts for creating DBs (Lab5Database_OPT)
- Useful scripts
- Documentation and description of the execution plan operations
- Draft solution (pdf)
Lab 6: NoSQL in MongoDB
Homework to be delivered
To obtain the points associated with the homework, students must comply with the following terms:
- Complete all the points of the exercises in the homework text.
- Prepare one file in PDF, DOC or ODT format with the solution of the homework.
- Name the file as: HomeworkN_Surname_Name_StudentId.XXX where
- StudentId, Surname and Name should be replaced with the student's information
- the N character following Homework should be replaced with the number of the submitted homework
- the filename extension XXX depends on the file type chosen for the submission (PDF, DOC or ODT).
- DOCX format is not supported.
- Since uploaded files are automatically processed, a wrongly named file causes the related homework submission to be cancelled.
- For example, for homework 1 and extension pdf, the student with name and surname Mario Rossi and id s123456 will upload Homework1_Rossi_Mario_s123456.pdf
- Load the file on the didactic portal (Portale della didattica) in the section Work Submission (Elaborati) before the deadline.
- Multiple uploads for the same student and/or for the same homework are not allowed.
- The upload date shown on the didactic portal is considered for the evaluation.
- Since uploaded files are automatically processed, an upload after the deadline causes the related homework submission to be cancelled.
- During the upload procedure a description (“Descrizione”) field is requested. Insert the file name, chosen according to the rules described above.
- Only students without access to the course page on the didactic portal may instead submit the homework before the deadline by sending an email to the assistant lecturer (eliana dot pastor at polito dot it)
- Discuss the homework with a positive evaluation on the fixed date (announcement will be published).
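Since a wrongly named file cancels the submission, it may help to check the name before uploading. The following is a minimal sketch of such a check in Python, not the portal's actual validation. It assumes (beyond what the rules state) that N is a digit from 1 to 4, that Surname and Name contain only letters, that the student id is one letter followed by digits, and that the extension is matched case-insensitively. The function name `check_homework_filename` is hypothetical.

```python
import re

# Pattern for HomeworkN_Surname_Name_StudentId.XXX as described in the rules above.
# Assumptions (not stated in the rules): N is 1-4, surname and name contain only
# letters, the student id is one letter followed by digits (e.g. s123456), and the
# extension may be upper or lower case.
FILENAME_RE = re.compile(
    r"Homework(?P<n>[1-4])"
    r"_(?P<surname>[A-Za-z]+)"
    r"_(?P<name>[A-Za-z]+)"
    r"_(?P<student_id>[A-Za-z]\d+)"
    r"\.(?P<ext>(?i:pdf|doc|odt))"
)


def check_homework_filename(filename: str) -> bool:
    """Return True if the whole filename follows the naming convention."""
    return FILENAME_RE.fullmatch(filename) is not None


if __name__ == "__main__":
    # The example from the rules is accepted; DOCX is rejected.
    print(check_homework_filename("Homework1_Rossi_Mario_s123456.pdf"))
    print(check_homework_filename("Homework1_Rossi_Mario_s123456.docx"))
```

Surnames or names with spaces, apostrophes or accented letters would be rejected by this sketch; how the portal treats them is not specified here.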
Homework discussion: Students attending the written exam must bring the following items:
- for Homeworks #1 – #4:
- a hard-copy of the submitted reports
Homework deliveries:
Homework submissions: list of delivered submissions. In case of any inconsistency or a missing delivery, send an email to eliana.pastor@polito.it.
Homework | Material | Deadline | Homework deliveries |
Homework #1: Data warehouse | HW text (pdf) [NEW] Updated (b) ii | to be delivered by Sunday, November 8th, 2020 at 11.59 PM (UTC/GMT+1) | |
Homework #2: Data Mining | HW text (pdf) – Dataset (breast.xls) | to be delivered by Monday, November 23rd, 2020 at 11.59 PM (UTC/GMT+1) | |
Homework #3: Triggers | HW text (pdf) – scripts. | to be delivered by Monday, December 7th, 2020 at 11.59 PM (UTC/GMT+1) | |
Homework #4: Query Optimization | HW text (pdf) [Updated having COUNT(*)>1] | to be delivered by Thursday, January 7th, 2021 at 11.59 PM (UTC/GMT+1) |
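The deadlines above are expressed in UTC+1, which matters for students uploading from another time zone. The sketch below shows one way to compare an upload time against them in Python. It is an illustration only: the names `DEADLINES` and `is_on_time` are hypothetical, and treating an upload at exactly 11.59 PM as on time is an assumption, since the portal's cut-off behaviour is not documented here.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset, as stated in the deadline column.
UTC_PLUS_1 = timezone(timedelta(hours=1))

# Deadlines copied from the table above (11.59 PM, UTC+1).
DEADLINES = {
    1: datetime(2020, 11, 8, 23, 59, tzinfo=UTC_PLUS_1),
    2: datetime(2020, 11, 23, 23, 59, tzinfo=UTC_PLUS_1),
    3: datetime(2020, 12, 7, 23, 59, tzinfo=UTC_PLUS_1),
    4: datetime(2021, 1, 7, 23, 59, tzinfo=UTC_PLUS_1),
}


def is_on_time(homework: int, uploaded_at: datetime) -> bool:
    """Return True if a timezone-aware upload time is not after the deadline.

    Assumption: an upload at exactly 11.59 PM counts as on time.
    """
    if uploaded_at.tzinfo is None:
        raise ValueError("uploaded_at must be timezone-aware")
    return uploaded_at <= DEADLINES[homework]


if __name__ == "__main__":
    # 22:58 UTC on November 8th is 23:58 in UTC+1, one minute before the deadline.
    print(is_on_time(1, datetime(2020, 11, 8, 22, 58, tzinfo=timezone.utc)))
```

Python compares timezone-aware datetimes by their absolute instant, so the upload time can be given in any time zone.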