Data Science and Database Technology


General information

Announcements

  • [17-10-2018] The first laboratory will be held in the week of October 21 – October 25, 2019. The text of Practice #1 (Extended SQL) is now available in the “Practices” section.
    • LAB SCHEDULE. To fit the LabInf capacity, students must attend all the course labs according to the following schedule (students are split alphabetically by surname):
      TEAM A (FROM A TO K) on Monday from 8.30 am to 10.00 am
      TEAM B (FROM L TO Z) on Friday from 4.00 pm to 5.30 pm
      Compliance with the schedule above is necessary to guarantee the feasibility of the lab.



Material

  • Course introduction (slides)

Part I

    • Introduction to Big Data (slides)
    • Data warehouse: introduction (slides)
    • Data warehouse: design (slides)
    • Data warehouse: analysis (slides)
    • Data mining: introduction (slides)
    • Data mining: data preprocessing (slides)
    • Data mining: Association rules (slides). New, updated on 17/10/2019
    • Data mining: classification (slides, slidesNew)
    • [NEW] Data mining: classification, neural networks (slides)
    • Data mining: clustering (slides)
Oracle
  1. Extended SQL (2 slides per page, 6 slides per page)

Part II

Oracle
  1. Oracle Optimizer
    1. Baseline version (2 slides per page, 6 slides per page)
    2. Extended version with examples (2 slides per page, 6 slides per page)
  2. Hints (2 slides per page, 6 slides per page)
  3. Documentation

 


Exercises

Data warehouse

Exercise | Text | Draft solution
Extended SQL (Customer) | text |
Extended SQL (Rental) | text |
Data warehouse design (Italian household) | exercise |
Data warehouse design (SearchingYourHouse) | exercise |
Data warehouse design (Hotel chain) | exercise |
Data warehouse design (Parcels service) | exercise |

Triggers

Exercise | Text | Draft solution
Exercise 1 (Athlete) | exercise |
Exercise 2 (Greenhouse) | exercise |
Exercise 3 (Student grant) | exercise |
Exercise 4 (Boat rental) | exercise |

 

Exercises from written exams

 

AA 2015-2016

Exam | Draft solution
Exam (2016-02-23) |
Exam (2016-01-27) |

AA 2011-2012

Exam Draft solution
Exam (2012-02-06)
Exam (2012-02-28)
Exam (2012-06-21)
Exam (2012-09-07)

AA 2010-2011

Exam Draft solution
Exam (2011-02-07)
Exam (2011-02-22)
Exam (2011-07-08)
Exam (2011-09-21)

 

 

 


 

Practices

Topic | Team A | Team B | Lab assistance
Practice #1: Extended SQL in Oracle | 21/10/2019 | 25/10/2019 | assistant lecturer
Practice #2: Data warehousing | 28/10/2019 | 8/11/2019 | assistant lecturer
Practice #3: Data mining with Rapidminer | 11/11/2019 | 15/11/2019 | assistant lecturer
Lab for Homework #2 on data mining with Rapidminer | 18/11/2019 | 22/11/2019 | scholarship holder
Practice #4: Oracle triggers | 25/11/2019 | 29/11/2019 | assistant lecturer
Lab for Homework #3 on triggers in Oracle | 2/12/2019 | 6/12/2019 | scholarship holder
Practice #5: Oracle optimizer | 9/12/2019 | 13/12/2019 | assistant lecturer
Practice #6: MongoDB | 16/12/2019 | 10/01/2020 | assistant lecturer

LAB SCHEDULE. To fit the LabInf capacity, students must attend all the course labs according to the following schedule (students are split alphabetically by surname):

  • TEAM A (SURNAME STARTING WITH A TO K) on Monday from 8.30 am to 10.00 am
  • TEAM B (SURNAME STARTING WITH L TO Z) on Friday from 4.00 pm to 5.30 pm

Compliance with the schedule above is necessary to guarantee the feasibility of the lab.

LAB ACCOUNT. Please make sure you have an account on the LabInf PCs before the lab practice begins (the accounts used to log in to the PCs of the other LAIBs are *not* valid). To register an account at LabInf, please visit the LabInf website for further information.

 

Lab 1: Extended SQL

    • Text (pdf)
    • Data warehouse tables in csv format (zip)
    • FOR STUDENTS WHO WANT TO PRACTICE AT HOME WITH EXTENDED SQL:
        • to import tables using Oracle SQL Developer: Import data (right click on “Tables” in the “Connection” tab)
        • (alternatively) to import tables from the Oracle XE Web interface: select Home > Utilities > Data Load/Unload > Load > Load Text Data
    • Installing Oracle 11g Express Edition at home
    • Draft solution of queries 1-5 and materialized view (pdf) and DW design

Lab 2: Data-warehouse analytics and reporting with Google Data Studio

Lab 3: Data mining – Rapid Miner  

    • Recommendations
        • Rapid Miner is already installed on the LabInf PCs. Please follow the instructions given in the practice text.
        • Some processes may need more memory than the default allocation (1024 MB). In that case, allocate a larger Java heap to Rapid Miner. Under Windows, open the command shell (cmd), go to the Rapid Miner lib folder (by default, c:\Program Files\Rapid-I\RapidMiner5\lib) and launch Rapid Miner from the shell with the following command: java -Xmx1500m -jar rapidminer.jar  where 1500 is the maximum heap size expressed in MB.

 

 

 


Homework to be delivered

To obtain the points associated with the homework, students must meet the following requirements:

    • Complete all the points of the exercises in the homework text.
    • Prepare one file in PDF, DOC or ODT format with the solution of the homework.
    • Name the file HomeworkN_Surname_Name_StudentId.XXX, where
        • StudentId, Surname and Name are replaced with the student's information
        • the N following Homework is replaced with the number of the submitted homework
        • the filename extension XXX depends on the file type chosen for the submission (PDF, DOC or ODT)
        • DOCX format is not supported.
        • Because uploaded files are processed automatically, a wrongly named file causes the related homework submission to be cancelled.
    • Upload the file to the didactic portal (Portale della didattica), in the section Work Submission (Elaborati), before the deadline.
        • Multiple uploads for the same student and/or for the same homework are not allowed.
        • The upload date shown on the didactic portal is the one considered for the evaluation.
        • Because uploaded files are processed automatically, an upload after the deadline causes the related homework submission to be cancelled.
    • During the upload procedure a description (“Descrizione”) field is requested. Enter the file name, built according to the rules described above.
    • Only students without access to the course page on the didactic portal may submit the homework by email, before the deadline, to the assistant lecturer (eliana dot pastor at polito dot it).
    • Discuss the homework that received a positive evaluation on the scheduled date (an announcement will be published).
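
Since a wrongly named file leads to cancellation, it may help to check the file name before uploading. The following Python sketch is only an illustration of the naming rule stated above, not the portal's actual processing. The function name `parse_homework_filename` is hypothetical, and the sketch assumes that the homework number and the student ID are made of digits, that surname and name contain no underscores or dots, and that the extension is accepted in any letter case.

```python
import re
from typing import Optional

# Pattern for HomeworkN_Surname_Name_StudentId.XXX.
# Assumptions (not stated by the course page): N and StudentId are digits,
# Surname and Name contain no "_" or ".", extension case is ignored.
FILENAME_PATTERN = re.compile(
    r"Homework(?P<number>[1-9]\d*)"
    r"_(?P<surname>[^_.]+)"
    r"_(?P<name>[^_.]+)"
    r"_(?P<student_id>\d+)"
    r"\.(?P<extension>(?i:pdf|doc|odt))"  # DOCX is not supported
)


def parse_homework_filename(filename: str) -> Optional[dict]:
    """Return the parts of a well-formed file name, or None if it is malformed."""
    match = FILENAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical student data, used only for illustration.
    for candidate in (
        "Homework1_Rossi_Mario_123456.pdf",
        "Homework1_Rossi_Mario_123456.docx",
        "Homework1_Mario_123456.pdf",
    ):
        result = parse_homework_filename(candidate)
        print(candidate, "->", "ok" if result else "rejected")
```

The same string can then be pasted into the “Descrizione” field, which must carry the file name.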

Homework discussion: Students attending the written exam must bring the following items:

  • for Homeworks #1 – #4:
    • a hard copy of the submitted reports

 

 

Homework | Material | Deadline | Homework deliveries
Homework #1: Data warehouse | HW text (pdf) | to be delivered by Sunday, November 10th, 2019 at 11.59 PM (UTC/GMT+1) |
Homework #2: Data Mining | HW text (pdf) – Dataset (breast.xls) | to be delivered by Sunday, November 24th, 2019 at 11.59 PM (UTC/GMT+1) |
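
Because the deadlines are expressed in UTC+1, an upload time recorded in another time zone has to be converted before it is compared with them. The sketch below shows one way to do that comparison in Python with timezone-aware values. The names `DEADLINES` and `is_on_time` are hypothetical, and the sketch assumes that an upload at exactly 11.59 PM still counts as on time; the didactic portal may apply a different rule.

```python
from datetime import datetime, timedelta, timezone

# UTC+1, the offset stated next to the deadlines.
DEADLINE_TZ = timezone(timedelta(hours=1))

# Deadlines of Homework #1 and #2 as listed above.
DEADLINES = {
    1: datetime(2019, 11, 10, 23, 59, tzinfo=DEADLINE_TZ),
    2: datetime(2019, 11, 24, 23, 59, tzinfo=DEADLINE_TZ),
}


def is_on_time(homework_number: int, uploaded_at: datetime) -> bool:
    """Return True if the timezone-aware upload time is not after the deadline.

    Assumption: the deadline instant itself (23:59:00) is still accepted.
    """
    if uploaded_at.tzinfo is None:
        raise ValueError("uploaded_at must be timezone-aware")
    return uploaded_at <= DEADLINES[homework_number]


if __name__ == "__main__":
    # 22:30 UTC is 23:30 in UTC+1, so it is before the Homework #1 deadline.
    upload = datetime(2019, 11, 10, 22, 30, tzinfo=timezone.utc)
    print(is_on_time(1, upload))
```

Comparing aware datetimes lets Python handle the offset, so no manual hour arithmetic is needed.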