Data Science and Database Technology


General information

 

Announcements

  • [17-12-2018] Homework 4 (deadline 11/01/19) has been published in the section “Homework to be delivered”.
  • [6-12-2018] The 5th laboratory will be held on Wednesday, December 19, 2018 (Team A) and on Thursday, December 20, 2018 (Team B). The laboratory covers the Oracle optimizer.
  • [29-11-2018] Homework 3 (deadline 10/12/18) has been published in the section “Homework to be delivered”.
  • [28-11-2018] The calendar for homework checking and tutoring sessions has been posted in the section “Tutoring and homework verification”.
  • [26-11-2018] The 4th laboratory will be held on Wednesday, November 28, 2018 (Team A) and on Thursday, November 29, 2018 (Team B). The laboratory covers triggers in Oracle.
  • [15-11-2018] Homework 2 (deadline 26/11/18) has been published in the section “Homework to be delivered”.


Exams


Material

  • Course introduction (slides)

Part I

    • Introduction to Big Data (slides)
    • Data warehouse: introduction (slides)
    • Data warehouse: design (slides)
    • Data warehouse: analysis (slides)
    • Data mining: introduction (slides)
    • Data mining: data preprocessing (slides)
    • Data mining: association rules (slides)
    • Data mining: classification (slides, slidesNew)
    • [NEW] Data mining: classification, neural networks (slides)
    • Data mining: clustering (slides)

 

Oracle
  1. Extended SQL (2 slides per page, 6 slides per page)

Part II

 

Oracle
  1. Oracle Optimizer
    1. Baseline version (2 slides per page, 6 slides per page)
    2. Extended version with examples (2 slides per page, 6 slides per page)
  2. Hints (2 slides per page, 6 slides per page)
  3. Documentation

 


Exercises

Data warehouse

 

Exercise | Text | Draft solution
Extended SQL (Customer) | text | Draft solution
Extended SQL (Rental) | text | Draft solution
Data warehouse design (Italian household) | exercise | Draft solution (Remark: write all queries from (a) to (f))
Data warehouse design (SearchingYourHouse) | exercise | Draft solution
Data warehouse design (Hotel chain) | exercise | Draft solution
Data warehouse design (Parcels service) | exercise |

 

Triggers

Exercise | Text | Draft solution
Exercise 1 (Athlete) | exercise | Draft solution
Exercise 2 (Greenhouse) | exercise | Draft solution
Exercise 3 (Student grant) | exercise | Draft solution
Exercise 4 (Boat rental) | exercise |

 

Optimizer

 

Exercise  Text Draft solution
Exercise 1 (Fine) 2 slides per page 6 slides per page
Exercise 2 (Students, Projects) 2 slides per page 6 slides per page
Exercise 3 (Discs) exercise
Exercise 4 (Athletes, Members) 2 slides per page 6 slides per page
Exercise 5 (Actors) exercise

 


Tutoring and homework verification

  • The calendar of the sessions for the homework verification is the following:
    • Friday, November 30, 2018 – 16.00-17.30, room 17A
    • Friday, December 7, 2018 – 16.00-17.30, room 19A
    • Thursday, December 13, 2018 – 14.30-16.00, room 10I
    • Thursday, December 20, 2018 – 14.30-16.00, room 10I
    • January: To be defined

 

  • Link to the Doodle poll for booking a homework verification slot: Doodle
  • The sessions on Thursday, December 13, 2018 (14.30-16.00, room 10I) and on Thursday, December 20, 2018 (14.30-16.00, room 10I) are each divided into 3 half-hour slots: 14:30-15:00, 15:00-15:30, 15:30-16:00
    • To attend the homework verification, students must register for one of the slots.
    • When a slot reaches the maximum number of reservations, the system removes it from the available choices.
  • To keep the sessions fair and efficient, students who have booked a slot are asked to report any cancellation by email to eliana.pastor@polito.it.

 

  • The calendar of the tutoring sessions for doubts and questions is as follows.
    • Thursday, November 29, 2018 – 14.30-16.00, room 10I
    • Thursday, December 6, 2018 – 14.30-16.00, room 10I
    • January: To be defined

 

 


Practices

 

Topic | Team A | Team B | Lab Assistance
Practice #1: Extended SQL in Oracle | 24/10/2018 | 25/10/2018 | assistant lecturer + scholarship holder
Additional optional lab for students who have not completed Practice #1 | 31/10/2018 | 25/10/2018 | scholarship holder
Practice #2: Data warehousing | 7/11/2018 | 8/11/2018 | assistant lecturer + scholarship holder
Practice #3: Data mining with RapidMiner | 14/11/2018 | 15/11/2018 | assistant lecturer + scholarship holder
Lab for Homework #2 on data mining with RapidMiner | 21/11/2018 | 22/11/2018 | scholarship holder
Practice #4: Oracle triggers | 28/11/2018 | 29/11/2018 | assistant lecturer + scholarship holder
Lab for Homework #3 on triggers in Oracle | 5/12/2018 | 6/12/2018 | scholarship holder
Practice #5: Oracle optimizer | 19/12/2018 | 20/12/2018 | assistant lecturer + scholarship holder

 

 

LAB SCHEDULE. To fit the LabInf capacity, students must attend all the course labs according to the following schedule (students are sorted in alphabetical order by surname):

  • TEAM A (FROM AAA TO IARIA) on Wednesday from 2.30 pm to 4 pm
  • TEAM B (FROM KERBIZI TO ZOTO) on Thursday from 1 pm to 2.30 pm

Compliance with the schedule above is necessary for the labs to be feasible.

LAB ACCOUNT. Please make sure you have an account on the LABINF PCs before the beginning of the lab practice (the accounts used to log in to the PCs of the other LAIBs are *not* valid). To register an account at LABINF, please visit the Labinf website for further information.

 

Lab 1: Extended SQL

  • Text (pdf)
  • Data warehouse tables in csv format (zip)
  • FOR STUDENTS WHO WANT TO PRACTICE AT HOME WITH EXTENDED SQL:
    • to import tables using Oracle SQL Developer: Import data (right click on “Tables” in the “Connection” tab)
    • (alternatively) to import tables from the Oracle XE Web interface: select Home > Utilities > Data Load/Unload > Load > Load Text Data
  • Installing Oracle 11g Express Edition at home
  • Draft solution of queries 1-5 and materialized view (pdf) and DW design

 

Lab 2: Data-warehouse analytics and reporting with Google Data Studio

 

Lab 3: Data mining – RapidMiner

  • Text Practice 3
  • Recommendations
    • RapidMiner is already installed on the LabInf PCs. Please follow the instructions given in the practice text.
    • Some processes may need more memory than the default allocation (1024 MB). If so, you have to allocate more Java heap space to RapidMiner. Under Windows, open the command shell (cmd), go to the RapidMiner lib folder (by default, c:\Program Files\Rapid-I\RapidMiner5\lib) and launch RapidMiner from the shell using the following command: java -Xmx1500m -jar rapidminer.jar  where 1500 is the maximum heap size expressed in MB.
  • Supporting material

Lab 4: Triggers

  • Text (pdf)
  • Scripts for creating the DBs (create_db scripts)
  • Screenshots of the database after the trigger executions: Results
  • Draft solution (pdf)
  • Note: If the error “ORA-00001: unique constraint (SYS.I_PLSCOPE_SIG_IDENTIFIER$) violated” occurs, try changing PLSCOPE_SETTINGS to NONE from the menu Tools->Preferences->Database->PL/SQL Compiler->PLScope Identifiers. Alternatively, add “ALTER SESSION SET PLSCOPE_SETTINGS = 'IDENTIFIERS:NONE';” before the creation/update of a trigger (CREATE OR REPLACE TRIGGER TriggerName).

 


Homework to be delivered

 

To obtain the points associated with the homework assignments, students must meet the following requirements:

  • Complete all the points of the exercises in the homework text.
  • Prepare one file in PDF, DOC or ODT format with the solution of the homework.
  • Name the file as: HomeworkN_Surname_Name_StudentId.XXX where
    • StudentId, Surname and Name should be substituted with student information
    • the N character following Homework should be substituted with the number of the submitted homework
    • the filename extension XXX depends on the file type chosen for the submission (PDF, DOC or ODT).
    • DOCX format is not supported.
    • Since uploaded files are processed automatically, an incorrectly named file causes the related homework submission to be cancelled.
  • Upload the file to the didactic portal (Portale della didattica) in the section Work Submission (Elaborati) before the deadline.
    • Multiple uploads by the same student and/or for the same homework are not allowed.
    • The upload date shown on the didactic portal is the one considered for the evaluation.
    • Since uploaded files are processed automatically, an upload after the deadline causes the related homework submission to be cancelled.
  • During the upload procedure a description (“Descrizione”) field is requested. Enter the file name, built according to the rules described above.
  • Only students who do not have access to the course page on the didactic portal may submit the homework by email, before the deadline, to the assistant lecturer (eliana dot pastor at polito dot it).
  • Discuss the positively evaluated homework on the scheduled date (an announcement will be published).
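
Because a wrongly named file cancels the submission, it can help to check the name before uploading. The following is a minimal sketch of the HomeworkN_Surname_Name_StudentId.XXX rule above, not the portal's actual check, which is not published here. The function name check_homework_filename, the sample names, and the character classes (letters only for surname and name, digits only for the student ID, case-insensitive extension) are assumptions.

```python
import re

# Hypothetical checker: the page fixes only the overall shape
# HomeworkN_Surname_Name_StudentId.XXX; the character classes are guesses.
FILENAME_RE = re.compile(
    r"Homework(?P<number>[1-4])"   # homework number (this page lists #1 to #4)
    r"_(?P<surname>[^\W\d_]+)"     # surname: letters only (assumption)
    r"_(?P<name>[^\W\d_]+)"        # name: letters only (assumption)
    r"_(?P<student_id>\d+)"        # student ID: digits only (assumption)
    r"\.(?i:pdf|doc|odt)"          # allowed formats; DOCX is not supported
)


def check_homework_filename(filename: str) -> bool:
    """Return True if the whole filename matches the assumed naming pattern."""
    return FILENAME_RE.fullmatch(filename) is not None


if __name__ == "__main__":
    # Made-up example names, for illustration only.
    for candidate in (
        "Homework2_Rossi_Mario_123456.pdf",
        "Homework2_Rossi_Mario_123456.docx",
        "Homework_Rossi_Mario_123456.pdf",
    ):
        print(candidate, check_homework_filename(candidate))
```

Under these assumptions a surname or name containing spaces, apostrophes or hyphens is rejected, so the pattern may need loosening for real names.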

Homework discussion: Students attending the written exam must bring the following items:

  • for Homeworks #1 – #4:
    • a hard copy of the submitted reports

 

 

 

Homework | Material | Deadline | Homework deliveries
Homework #1: Data warehouse | HW text (pdf) | to be delivered by Monday, November 12th, 2018 at 11.59 PM (UTC/GMT+1) |
Homework #2: Data Mining | HW text (pdf) – Dataset (breast.xls) | to be delivered by Monday, November 26th, 2018 at 11.59 PM (UTC/GMT+1) |
Homework #3: Triggers | HW text (pdf) – scripts | to be delivered by Monday, December 10th, 2018 at 11.59 PM (UTC/GMT+1) |
Homework #4: Query Optimization | HW text (pdf) | to be delivered by Friday, January 11th, 2019 at 11.59 PM (UTC/GMT+1) |