Data Science and Database Technology


General information

Announcements

  • [15-2-2019] The list of students who delivered the homeworks is available at the following link: (pdf). Anyone who delivered the homeworks but does not appear in the respective lists is asked to contact Eliana Pastor (eliana.pastor@polito.it).
  • [9-1-2019] A tutoring session is scheduled on Thursday, January 10, 2019 (13.00-14.30, room 10). Teachers will be available to answer questions on course topics and Homework #4.
  • [17-12-2018] Homework 4 (deadline 11/01/19) has been published in the section “Homework to be delivered”.


Exams

  • 8th February exam:
    • Allocation of students to rooms:
      • Room 1P: from AKHIL ANAND to CIAUDANO
      • Room 2P: from CILLUFFO to FERRERO
      • Room 3P: from FILOCAMO to MANASSERI
      • Room 4P: from MANCO to PRENGA
      • Room 16: from QURESHI to ZHU
    • To attend the exam, students must be regularly booked, bring an identity card, and wait to be called in front of the room.
    • Homework verification: after the written exam, the following students have to discuss their homeworks. The discussion will take place in Lab5 of the “Automatica e Informatica” (DAUIN) department, entrance at Corso Castelfidardo 39, floor 2.
      • S256925
      • S265554
      • S256689
      • S265391
      • S265444
      • S265351
  • Exam rules (ExamRules)

Material

  • Course introduction (slides)

Part I

    • Introduction to Big Data (slides)
    • Data warehouse: introduction (slides)
    • Data warehouse: design (slides)
    • Data warehouse: analysis (slides)
    • Data mining: introduction (slides)
    • Data mining: data preprocessing (slides)
    • Data mining: association rules (slides)
    • Data mining: classification (slides, slidesNew)
    • [NEW] Data mining: classification, neural networks (slides)
    • Data mining: clustering (slides)
Oracle
  1. Extended SQL (2 slides per page, 6 slides per page)

Part II

Oracle
  1. Oracle Optimizer
    1. Baseline version (2 slides per page, 6 slides per page)
    2. Extended version with examples (2 slides per page, 6 slides per page)
  2. Hints (2 slides per page, 6 slides per page)
  3. Documentation


Exercises

Data warehouse

Exercise | Text | Draft solution
Extended SQL (Customer) | text | Draft solution
Extended SQL (Rental) | text | Draft solution
Data warehouse design (Italian household) | exercise | Draft solution (Remark: write all queries from (a) to (f))
Data warehouse design (SearchingYourHouse) | exercise | DraftSolutioSearchingYourHouse
Data warehouse design (Hotel chain) | exercise | Draft solution
Data warehouse design (Parcels service) | exercise |

Triggers

Exercise | Text | Draft solution
Exercise 1 (Athlete) | exercise | Draft solution
Exercise 2 (Greenhouse) | exercise | Draft solution
Exercise 3 (Student grant) | exercise | Draft solution
Exercise 4 (Boat rental) | exercise | DraftSolBoatRental (timecondition)

Optimizer

Exercise | Text | Draft solution
Exercise 1 (Fine) | 2 slides per page, 6 slides per page | Draft Solution
Exercise 2 (Students, Projects) | 2 slides per page, 6 slides per page | Draft Solution
Exercise 3 (Discs) | exercise | Draft Solution
Exercise 4 (Athletes, Members) | 2 slides per page, 6 slides per page | Draft Solution
Exercise 5 (Actors) | exercise | Draft solution

Exercises from written exams

AA 2015-2016

Exam | Draft solution
Exam (2016-02-23) | optimizer, dw, trigger
Exam (2016-01-27) | optimizer, dw, trigger

AA 2011-2012

Exam | Draft solution
Exam (2012-02-06) | optimizer and dw, trigger
Exam (2012-02-28) | optimizer and dw
Exam (2012-06-21) | optimizer and dw, trigger
Exam (2012-09-07) | optimizer and dw

AA 2010-2011

Exam | Draft solution
Exam (2011-02-07) | optimizer, trigger, dw
Exam (2011-02-22) |
Exam (2011-07-08) | optimizer, trigger
Exam (2011-09-21) | optimizer, trigger


Tutoring and homework verification

  • The calendar of the homework verification sessions is as follows:
    • Friday, November 30, 2018 – 16.00-17.30, room 17A
    • Friday, December 7, 2018 – 16.00-17.30, room 19A
    • Thursday, December 13, 2018 – 14.30-16.00, room 10I
    • Thursday, December 20, 2018 – 14.30-16.00, room 10I
    • January: to be defined
  • Reservation for homework verification: Doodle
    • To access the homework verification, students must register in one of the slots.
    • When a slot reaches the maximum number of reservations, the system removes it from the available choices.
  • To keep the sessions fair and efficient, booked students are asked to report any cancellation by email to eliana.pastor@polito.it.
  • The calendar of the tutoring sessions for doubts and questions is as follows:
    • Thursday, November 29, 2018 – 14.30-16.00, room 10I
    • Thursday, December 6, 2018 – 14.30-16.00, room 10I
    • Thursday, January 10, 2019 – 13.00-14.30, room 10

Practices

Topic | Team A | Team B | Lab Assistance
Practice #1: Extended SQL in Oracle | 24/10/2018 | 25/10/2018 | assistant lecturer + scholarship holder
Additional optional lab for students who have not completed Practice #1 | 31/10/2018 | 25/10/2018 | scholarship holder
Practice #2: Data warehousing | 7/11/2018 | 8/11/2018 | assistant lecturer + scholarship holder
Practice #3: Data mining with Rapidminer | 14/11/2018 | 15/11/2018 | assistant lecturer + scholarship holder
Lab for Homework #2 on data mining with Rapidminer | 21/11/2018 | 22/11/2018 | scholarship holder
Practice #4: Oracle triggers | 28/11/2018 | 29/11/2018 | assistant lecturer + scholarship holder
Lab for Homework #3 on triggers in Oracle | 5/12/2018 | 6/12/2018 | scholarship holder
Practice #5: Oracle optimizer | 19/12/2018 | 20/12/2018 | assistant lecturer + scholarship holder
Lab for Oracle optimizer (previous lab) and H4 | 9/01/2019 | 10/01/2019 | scholarship holder

LAB SCHEDULE. To fit the LabInf capacity, students must attend all the course labs according to the following schedule (students are sorted in alphabetical order by surname):

  • TEAM A (FROM AAA TO IARIA) on Wednesday from 2.30 pm to 4 pm
  • TEAM B (FROM KERBIZI TO ZOTO) on Thursday from 1 pm to 2.30 pm

Compliance with the schedule above is necessary to guarantee the feasibility of the lab.

LAB ACCOUNT. Please make sure you have an account on the LABINF PCs before the beginning of the lab practice (the accounts used to log in to the PCs of the other LAIBs are *not* valid). To register an account at LABINF, please visit the Labinf website.

Lab 1: Extended SQL

  • Data warehouse tables in csv format (zip)
  • For students who want to practice at home with extended SQL:
    • to import tables using Oracle SQL Developer: Import data (right click on “Tables” in the “Connection” tab)
    • (alternatively) to import tables from the Oracle XE Web interface: select Home > Utilities > Data Load/Unload > Load > Load Text Data
  • Installing Oracle 11g Express Edition at home
  • Draft solution of queries 1-5 and materialized view (pdf) and DW design

Lab 2: Data-warehouse analytics and reporting with Google Data Studio

Lab 3: Data mining – Rapid Miner  

  • Recommendations
    • Rapid Miner is already installed on the LabInf PCs. Please follow the instructions reported in the practice text.
    • Some processes may need more memory than the default allocation (1024 MB). If so, allocate a larger Java heap to Rapid Miner. Under Windows, open the command shell (cmd), go to the Rapid Miner lib folder (by default, c:\Program Files\Rapid-I\RapidMiner5\lib) and launch Rapid Miner from the shell using the following command: java -Xmx1500m -jar rapidminer.jar where 1500 is the maximum heap size expressed in MB.

Lab 4: Triggers

  • Text (pdf)
  • scripts for creating DBs (create_db scripts)
  • Screenshots of the database after the trigger executions: Results
  • Draft solution (pdf)
  • Note: If the error “ORA-00001: unique constraint (SYS.I_PLSCOPE_SIG_IDENTIFIER$) violated” occurs, try changing PLSCOPE_SETTINGS to NONE from the menu Tools->Preferences->Database->PL/SQL Compiler->PLScope Identifiers. Alternatively, add “ALTER SESSION SET PLSCOPE_SETTINGS = 'IDENTIFIERS:NONE';” before the creation/update of a trigger (CREATE OR REPLACE TRIGGER TriggerName).

Lab 5: The Oracle Optimizer

  • Installing Oracle 11g Express Edition at home
    • To download and install Oracle Express Edition
    • To import the database into Oracle on your personal computer
      • Download the archive empdb.zip
      • Extract the database file empdb.dump
      • Download the batch file Oracle-DB-import.bat (for Windows) or the shell script Oracle-DB-import.sh (for Linux) and save it in the same directory as the empdb.dump file
      • Modify the batch file or the shell script by replacing the keyword password with the password defined during the Oracle XE installation and (only for the shell script) check the Oracle directory path
      • Check that the tables EMP and DEPT are not already present. If they are, remove them with the DROP command (when a user or an Application Express workspace is created, Oracle automatically creates the EMP and DEPT example tables)
      • Execute the batch file or the shell script updated with the correct password
      • Alternatively: copy of the database

Homework to be delivered

To obtain the points associated with the homeworks, students must comply with the following rules:

  • Complete all the points of the exercises in the homework text.
  • Prepare one file in PDF, DOC or ODT format with the solution of the homework.
  • Name the file HomeworkN_Surname_Name_StudentId.XXX where
    • StudentId, Surname and Name are replaced with the student's information
    • the N character following Homework is replaced with the number of the submitted homework
    • the filename extension XXX depends on the file type chosen for the submission (PDF, DOC or ODT)
    • DOCX format is not supported.
    • Since uploaded files are processed automatically, a wrongly named file causes the related homework submission to be cancelled.
  • Upload the file to the didactic portal (Portale della didattica) in the section Work Submission (Elaborati) before the deadline.
    • Multiple uploads by the same student and/or for the same homework are not allowed.
    • The upload date shown on the didactic portal is the one considered for the evaluation.
    • Since uploaded files are processed automatically, an upload after the deadline causes the related homework submission to be cancelled.
  • During the upload procedure a description (“Descrizione”) field is requested. Enter the file name, built according to the rules described above.
  • Students without access to the course page on the didactic portal, and only those students, can submit the homework before the deadline by sending an email to the assistant lecturer (eliana dot pastor at polito dot it).
  • Discuss the homework with a positive evaluation on the fixed date (an announcement will be published).
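
Because a wrongly named file cancels the submission, it may help to check the name before uploading. The following is only a minimal sketch, not the check actually run by the portal. It assumes (none of this is stated on this page) that Surname and Name contain only unaccented letters, that StudentId is six digits with an optional leading "s", and that homework numbers run from 1 to 4. The function name is_valid_homework_filename is hypothetical.

```python
import re

# Extensions accepted by the rules above; DOCX is explicitly not supported.
ALLOWED_EXTENSIONS = {"pdf", "doc", "odt"}

# Assumed shape: HomeworkN_Surname_Name_StudentId.XXX
# - N is a single digit from 1 to 4 (assumption: four homeworks)
# - Surname and Name are plain letters (assumption)
# - StudentId is six digits with an optional leading "s" (assumption)
PATTERN = re.compile(
    r"Homework([1-4])_([A-Za-z]+)_([A-Za-z]+)_([sS]?\d{6})\.([A-Za-z]+)"
)


def is_valid_homework_filename(filename: str) -> bool:
    """Return True if the file name follows the assumed naming pattern."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return False
    # The extension is compared case-insensitively (PDF and pdf both pass).
    return match.group(5).lower() in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for candidate in (
        "Homework1_Rossi_Mario_s123456.pdf",
        "Homework1_Rossi_Mario_s123456.docx",
        "HW1_Rossi_Mario_s123456.pdf",
    ):
        print(candidate, is_valid_homework_filename(candidate))
```

Surnames with spaces, apostrophes or accented letters would need a looser pattern; in case of doubt, the naming rules above remain the reference.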

Homework discussion: Students attending the written exam must bring the following items:

  • for Homeworks #1 – #4:
    • a hard copy of the submitted reports

Homework | Material | Deadline | Homework deliveries
Homework #1: Data warehouse | HW text (pdf) | to be delivered by Monday, November 12th, 2018 at 11.59 PM (UTC/GMT+1) |
Homework #2: Data Mining | HW text (pdf) – Dataset (breast.xls) | to be delivered by Monday, November 26th, 2018 at 11.59 PM (UTC/GMT+1) |
Homework #3: Triggers | HW text (pdf) – scripts | to be delivered by Monday, December 10th, 2018 at 11.59 PM (UTC/GMT+1) |
Homework #4: Query Optimization | HW text (pdf)* | to be delivered by Friday, January 11th, 2019 at 11.59 PM (UTC/GMT+1) |

*Updated: typos corrected (19/12)
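
Since the deadlines are expressed in UTC+1 and late uploads are cancelled, a timestamp taken in another time zone can be compared against them as sketched below. This is an illustration only: the function name is_on_time is hypothetical, "11.59 PM" is read as 23:59:00, and treating an upload at exactly that instant as on time is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset, as stated next to each deadline.
UTC_PLUS_1 = timezone(timedelta(hours=1))

# Deadlines copied from the table above; 11.59 PM is taken as 23:59:00.
DEADLINES = {
    1: datetime(2018, 11, 12, 23, 59, tzinfo=UTC_PLUS_1),
    2: datetime(2018, 11, 26, 23, 59, tzinfo=UTC_PLUS_1),
    3: datetime(2018, 12, 10, 23, 59, tzinfo=UTC_PLUS_1),
    4: datetime(2019, 1, 11, 23, 59, tzinfo=UTC_PLUS_1),
}


def is_on_time(homework: int, uploaded_at: datetime) -> bool:
    """Return True if a timezone-aware upload time is not after the deadline.

    Assumption: an upload at exactly 23:59:00 counts as on time.
    """
    return uploaded_at <= DEADLINES[homework]


if __name__ == "__main__":
    # 22:00 UTC on November 12th is 23:00 in UTC+1, so still before the deadline.
    upload = datetime(2018, 11, 12, 22, 0, tzinfo=timezone.utc)
    print(is_on_time(1, upload))
```

The upload time must be timezone-aware; Python raises a TypeError when a naive datetime is compared with an aware one.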