Data Science and Database Technology
Table of content
- General information
- Weekly class schedule
- Exams
- Material
- Exercises
- Exercises from written exams
- Other exercises
- Tutoring sessions
- Practices
- Practices
- Homework to be delivered
General information
- ECTS: 8
- Professor: Silvia Chiusano
- Assistant lecturer: Alessandro Fiori, Eliana Pastor
Announcements
- [2020-06-04] Exam rules for exams on June-September 2020 are available (DSDBT-ExamRulesJune-September2020)
- [20-05-2020] Exam 09-05-2020: Results are available.
- [22-04-2020] Exam rules for the exam on May 9, 2020 are available (ExamRulesMay2020)
Exams
Exams on June-September 2020
- Exam rules for exams on June-September, 2020 (DSDBT-ExamRulesJune-September2020)
Exam on May 9, 2020
- Exam rules for exam on May 9, 2020 (ExamRulesMay2020 )
- Results are available in the didactic portal, section “Materiale”. All exam grades will be recorded. Students, who received a grade >= 18 in the written exam and want to reject it, have to send an e-mail to Silvia Chiusano (silvia dot chiusano at polito dot it) by June 11, 2020.
- Notes on Exam DSDBT09-05-2020
- DSDBT-May2020DraftSolution
Exam (in presence) on February 2020
- Exam rules (ExamRules)
Material
- Course introduction (slides)
Part I
-
- Introduction to Big Data (slides)
- Data warehouse: introduction (slides)
- Data warehouse: design (slides)
- Data warehouse: analysis (slides)
- Data mining: introduction (slides)
- Data mining: data preprocessing (slides)
- Data mining: Association rules (slides). New, updated on 17/10/2019
- Data mining: classification (slides, slidesNew)
- [NEW] Data mining: classification, neural networks (slides)
- Data mining: clustering (slides)
Oracle
- Extended SQL (2 slides per page, 6 slides per page)
Part II
- Triggers (slides)
- Introduction to DBMS (slides)
- Buffer Manager (slides)
- Physical access to data (slides)
- Query optimization (slides)
- Physical Design (slides)
- Concurrency Control (slides)
- Reliability Management (slides)
- Distributed databases (slides)
- Beyond relational databases (slides)
- Intro to MongoDB (part1, part2)
Oracle
- Oracle Optimizer
- Baseline version (2 slides per page, 6 slides per page)
- Extended version with examples (2 slides per page,6 slides per page)
- Hints (2 slides per page, 6 slides per page)
- Documentation
- Oracle Database 10g documentation library
- Oracle Database Performance Tuning Guide
- The Query Optimizer
- Statistics about indices, meaning of the columns in the statistics table (e.g.,
CLUSTERING_FACTOR
)
- Statistics about tables, meaning of the columns in the statistics table (es.
EMPTY_BLOCKS
)
- Introduction to Big Data (slides)
- Data warehouse: introduction (slides)
- Data warehouse: design (slides)
- Data warehouse: analysis (slides)
- Data mining: introduction (slides)
- Data mining: data preprocessing (slides)
- Data mining: Association rules (slides). New, updated on 17/10/2019
- Data mining: classification (slides, slidesNew)
- [NEW] Data mining: classification, neural networks (slides)
- Data mining: clustering (slides)
- Baseline version (2 slides per page, 6 slides per page)
- Extended version with examples (2 slides per page,6 slides per page)
- Oracle Database 10g documentation library
- Oracle Database Performance Tuning Guide
- The Query Optimizer
- Statistics about indices, meaning of the columns in the statistics table (e.g.,
CLUSTERING_FACTOR
) - Statistics about tables, meaning of the columns in the statistics table (es.
EMPTY_BLOCKS
)
Exercises
Data warehouse
Exercise | Text | Draft solution |
Extended SQL (Customer ) | text | Draft solution |
Extended SQL (Rental ) | text | Draft solution |
Data warehouse design (Italian household) | exercise | Draft solution |
Data warehouse design (SearchingYourHouse) | exercise | Draft solution |
Data warehouse design (Hotel chain) | exercise | Draft solution |
Data warehouse design (Parcels service) | exercise |
Triggers
Exercise | Text | Draft solution |
Exercise 1 (Athlete) | exercise | Draft solution |
Exercise 2 (Greenhouse) | exercise | Draft solution |
Exercise 3 (Student grant) | exercise | Draft solution |
Exercise 4 (Boat rental) | exercise | DraftSolBoatRental (timecondition) |
Optimizer
Exercise | Text | Draft solution |
Exercise 1 (Fine) | 2 slides per page 6 slides per page | Draft Solution |
Exercise 2 (Students, Projects) | 2 slides per page 6 slides per page | Draft Solution |
Exercise 3 (Discs) | exercise | Draft Solution |
Exercise 4 (Athletes, Members) | 2 slides per page 6 slides per page | Draft Solution |
Exercises from written exams
AA 2015-2016
Exam | Draft solution |
Exam (23-02-2016) | optimizer, dw, trigger |
Exam (2016-01-27) | optimizer, dw, trigger |
AA 2011-2012
Exam | Draft solution |
Exam (2012-02-06) | optimizer and dw, trigger |
Exam (2012-02-28) | optimizer and dw |
Exam (2012-06-21) | optimizer and dw, trigger |
Exam (2012-09-07) | optimizer and dw |
AA 2010-2011
Exam | Draft solution |
Exam (2011-02-07) | optimizer, trigger, dw |
Exam (2011-02-22) | |
Exam (2011-07-08) | optimizer, trigger |
Exam (2011-09-21) | optimizer, trigger |
Tutoring sessions
- In tutoring sessions teachers are available to answer questions on homework, exercises and topics presented in lessons.
- The calendar of the tutoring sessions in December is the following:
- Thursday 5/12 – 8:30-10:00 – room 8D
- Friday 6/12 – 17:30-18:00 – room 8D
- Thursday 12/12 – 8:30-10:00 – room 8D
- Thursday 19/12 – 8:30-10:00 – room 8D
- Friday 20/12 – 17:30-18:00 – room 8D
- The calendar of the tutoring sessions in January will be posted soon
Practices
Topic | Team A | Team B | Lab Assistance |
Practice #1: Extended SQL in Oracle | 21/10/2019 | 25/10/2019 | assistant lecturer |
Practice #2: Data warehousing | 28/10/2019 | 8/11/2019 | assistant lecturer |
Practice #3: Data mining with Rapidminer | 11/11/2019 | 15/11/2019 | assistant lecturer |
Lab for Homework #2 on data mining with Rapidminer | 18/11/2019 | 22/11/2019 | scholarship holder |
Practice #4: Oracle triggers | 25/11/2019 | 29/11/2019 | assistant lecturer |
Lab for Homework #3 on triggers in Oracle | 2/12/2019 | 6/12/2019 | scholarship holder |
Practice #5: Oracle optimizer | 16/12/2019 [NEW] | 13/12/2019 | assistant lecturer |
Practice #6: MongoDB | 13/01/2020 [NEW] | 10/01/2020 | assistant lecturer |
LAB SCHEDULE. To fit LabInf capacity students must attend all the course labs according to the following schedule (students are sorted in alphabetic order by surname):
- TEAM A (SURNAME STARTING WITH A TO K) on Monday from 8.30 pm to 10pm
- TEAM B (SURNAME STARTING WITH L TO Z) on Friday from 16pm to 17.30pm
The compliance to the schedule above is necessary to guarantee the feasibility of the lab.
LAB ACCOUNT. Please make sure to have an account on the LABINF PCs before the beginning of the lab practice (the accounts used to log in to the PCs of the other LAIBs are *not* valid). To register an account at LABINF, please visit the Labinf website for further information.
Lab 1: Extended SQL
-
- FOR STUDENTS WHO WANT TO PRACTICE AT HOME WITH EXTENDED SQL:
- to import tables using Oracle SQL Developer: Import data (right click on “Tables” of “Connection” Tab)
- (alternatively) to import tables from the Oracle XE Web interface: select Home>Utilities>Data Load/Unload>Load>Load Text Data->)
- FOR STUDENTS WHO WANT TO PRACTICE AT HOME WITH EXTENDED SQL:
- Installing Oracle 11g Express Edition at home
Lab 2: Data-warehouse analytics and reporting with Google Data Studio
Lab 3: Data mining – Rapid Miner
-
- Recommendations
-
- Rapid Miner is already installed on the LabInf PCs. Please follow the instructions reported in the practice text.
- Some processes may need a memory allocation higher than the default one (1024 MB). If so, you have to allocate a higher Java Heap memory space to Rapid Miner. Under Windows, run the command shell (cmd), go to the Rapid Miner lib folder (by default, c:\Program Files\Rapid-I\RapidMiner5\lib) and launch Rapid Miner form the shell using the following command: java -Xmx1500m -jar rapidminer.jar where 1500 is the maximum heap memory space expressed in MB.
-
- Recommendations
- Supporting material
-
- Dataset (Utenti.xls)
-
- Rapid Miner 5.0 Community Edition Guide (rapidminer-5.0-manual-english_v1.0)
- Introduction to RapidMiner (2 slides per page, 3 slides per page, 6 slides per page)
- Examples (download)
- For students who want to practice with Rapid Miner at home:
-
- Register to download the free Community Edition (follow the link at the bottom of the page).
- Install Rapid Miner on your PC
-
-
Lab 4: Triggers
- Text (pdf)
- scripts for creating DBs (create_db scripts)
- Screenshots of the database after the trigger executions: Results
- LABINF – Oracle SQL Developer English version: RUN, write %APPDATA%\sqldeveloper, select 19.1.0 version, product.conf, write addVMoption -Duser.language=en
- Note: If the following error occurs “ORA-00001: unique constraint (SYS.I_PLSCOPE_SIG_IDENTIFIER$) violated“, try to change PLSCOPE_SETTINGS to NONE from menu Tools->Preferences->Database->PL/SQL Compiler->PLScope Identifiers. Alternatively add “ALTER SESSION SET PLSCOPE_SETTINGS = ‘IDENTIFIERS:NONE’; ” before the creation/update of a trigger (CREATE OR REPLACE TRIGGER TriggerName).
Lab 5: The Oracle Optimizer
-
- Text (pdf)
-
- Useful scripts
-
- Description of the execution plan operations
- Script for database generation: database (home practice or if not available at LABINF PC)
Lab 6: NoSQL in MongoDB
-
-
- Text Practice 6
- Tutorial (pdf)
- Collection (txt)
-
Homework to be delivered
To obtain the points associated with the homeworks, students have to observe the following terms:
-
- Complete all the points of the exercises in the homework text.
-
- Prepare one file in PDF, DOC or ODT format with the solution of the homework.
-
- Name the file as: HomeworkN_Surname_Name_StudentId.XXX where
-
- StudentId, Surname and Name should be substituted with student information
-
- the N character following Homework should be substituted with the number of the submitted homework
-
- the filename extension XXX depends on the file type chosen for the submission (PDF, DOC or ODT).
-
- DOCX format is not supported.
- Since uploaded files are automatically processed, naming the file with a wrong name implies the cancellation of the related homework submission.
-
- Name the file as: HomeworkN_Surname_Name_StudentId.XXX where
-
- Load the file on the didactic portal (Portale della didattica) in the section Work Submission (Elaborati) before the deadline.
-
- Multiple loadings for the same student and/or for the same homework are not allowed.
-
- The upload date show on the didactic portal is considered for the evaluation.
- Since uploaded files are automatically processed, the upload after the deadline implies the cancellation of the related homework submission.
-
- Load the file on the didactic portal (Portale della didattica) in the section Work Submission (Elaborati) before the deadline.
-
- During the upload procedure a description (“Descrizione”) field is requested. Insert the same name of the file according to the rules described above.
-
- Only the students without the access to the course page on the didactic portal can submit the homework before the deadline by sending an email to the assistant lecturer (eliana dot pastor at polito dot it)
- Discuss the homework with a positive evaluation on the fixed date (announcement will be published).
Homework discussion: Students attending the written exam must bring the following items:
- for Homeworks #1 – #4:
- a hard-copy of the submitted reports
Homework | Material | Deadline | Homework deliveries |
Homework #1: Data warehouse | HW text (pdf) | to be delivered by Sunday, November 10th, 2019 at 11.59 PM (UTC/GMT+1) | |
Homework #2: Data Mining | HW text (pdf) – Dataset (breast.xls) | to be delivered by Sunday, November 24th, 2019 at 11.59 PM (UTC/GMT+1) | |
Homework #3: Triggers | HW text (pdf) – scripts. | to be delivered by Sunday, December 8th, 2019 at 11.59 PM (UTC/GMT+1) | |
Homework #4: Query Optimization | HW text (pdf) | to be delivered by Sunday, January 12th, 2020 at 11.59 PM (UTC/GMT+1) |