Data Science and Database Technology (2020/2021)



General information

Announcements

  • [02-02-2021] We posted a recap of homework submissions.
  • [21-11-2020] Homework 4 (deadline 7-01-2021) has been published in Section “Homeworks to be delivered”
  • [27-11-2020] Homework 3 (deadline 7-12-2020) has been published in Section “Homeworks to be delivered”
  • [13-11-2020] Homework 2 (deadline 23-11-2020) has been published in Section “Homeworks to be delivered”
  • [29-10-2020] We posted an updated version of Homework 1 (Query b.ii)
  • [29-10-2020] Calendar of tutoring sessions in November has been posted
  • [29-10-2020] Calendar of laboratory practices has been posted
  • [28-10-2020] Because of the problem with Virtual Classroom during the 1st lab (October 19th, 2020), we have recorded the description of Practice #1 in Virtual Classroom. The recording has already been processed and will soon be available as “Laboratory 1 – Lab description”. Please refer to this recording for Lab #1.
  • [28-10-2020] Homework 1 (deadline 8-11-2020) has been published in Section “Homeworks to be delivered”
  • [16-10-2020] The first laboratory will be held on Monday, October 19th, 2020. The tutorial for the software installation for Practice #1 (Extended SQL) is now available in Section “Practices”.
    • LAB SCHEDULE.
      TEAM A (FROM A TO G) on Monday from 2.30pm to 4pm
      TEAM B (FROM H TO Z) on Monday from 4pm to 5.30pm
  • [16-10-2020]  Join us on Piazza. You can register at Piazza whether you are already enrolled in the course or you are waiting for it. If you do not have an @studenti.polito.it address yet, drop an email to eliana.pastor@polito.it including your personal email address and your ID on the Polito website or the Apply procedure – it is in the format FXXXXX. The registration with your personal address is temporary: remember to add your educational address on Piazza as soon as you receive it.

Exams

  • Exam rules A.Y. 2020/2021 (Covid-19 Emergency)  (pdf-rules)

Exam on January 26, 2021

  • Draft solution (pdf)

Material

  • Course introduction (slides)

Part I

  • Introduction to Big Data (slides)
  • Data warehouse: introduction (slides)
  • Data warehouse: design (slides)
  • Data warehouse: analysis (slides)
  • Data mining: introduction (slides)
  • Data mining: data preprocessing (slides)
  • Data mining: association rules (slides)
  • Data mining: classification (slides, slidesNew)
  • Data mining: classification, neural networks (slides)
  • Data mining: clustering (slides)

Oracle

  1. Extended SQL (2 slides per page, 6 slides per page)

 

Part II

Oracle

  1. Oracle Optimizer
    1. Baseline version (2 slides per page, 6 slides per page)
    2. Extended version with examples (2 slides per page, 6 slides per page)
  2. Hints (2 slides per page, 6 slides per page)
  3. Documentation


 Exercises

Data warehouse

Exercise | Text | Draft solution
Extended SQL (Customer) | text | Draft solution
Extended SQL (Rental) | text | Draft solution
Data warehouse design (Italian household) | exercise | Draft solution
Data warehouse design (SearchingYourHouse) | exercise | Draft solution
Data warehouse design (Hotel chain) | exercise | Draft solution
Data warehouse design (Parcels service) | exercise

 

Triggers

Exercise | Text | Draft solution
Exercise 1 (Athlete) | exercise | Draft solution
Exercise 2 (Greenhouse) | exercise | Draft solution
Exercise 3 (Student grant) | exercise | Draft solution
Exercise 4 (Boat rental) | exercise | DraftSolBoatRental (timecondition)

Optimizer

Exercise | Text | Draft solution
Exercise 1 (Fine) | 2 slides per page, 6 slides per page | Draft solution
Exercise 2 (Students, Projects) | 2 slides per page, 6 slides per page | Draft solution
Exercise 3 (Discs) | exercise | Draft solution
Exercise 4 (Athletes, Members) | 2 slides per page, 6 slides per page | Draft solution

Exercises from written exams

AA 2015-2016

Exam | Draft solution
Exam (2016-02-23) | optimizer, dw, trigger
Exam (2016-01-27) | optimizer, dw, trigger

AA 2011-2012

Exam | Draft solution
Exam (2012-02-06) | optimizer and dw, trigger
Exam (2012-02-28) | optimizer and dw
Exam (2012-06-21) | optimizer and dw, trigger
Exam (2012-09-07) | optimizer and dw

AA 2010-2011

Exam | Draft solution
Exam (2011-02-07) | optimizer-trigger-dw
Exam (2011-02-22)
Exam (2011-07-08) | optimizer, trigger
Exam (2011-09-21) | optimizer, trigger

 


Tutoring sessions

  • During tutoring sessions, teachers are available to answer questions on homework, exercises and topics presented in lessons.
  • The calendar of the tutoring sessions in November is the following:
    • Monday 2/11 – 2:30-3pm and 4-4:30pm – Virtual classroom
    • Monday 16/11 – 2:30-3pm and 4-4:30pm – Virtual classroom
    • Monday 30/11 – 2:30-3pm and 4-4:30pm – Virtual classroom
  • During the lab session (2:30-5:30pm) the scholarship holder will also be available to answer questions on homework.

 


Practices

LAB SCHEDULE. 
TEAM A (FROM A TO G) on Monday from 2.30pm to 4pm
TEAM B (FROM H TO Z) on Monday from 4pm to 5.30pm

Lab calendar

Topic | Date (Team A: 2:30-4pm, Team B: 4-5:30pm) | Lab assistance
Practice #1: Extended SQL in Oracle | 19/10/2020 | assistant lecturer
Practice #2: Data warehousing | 26/10/2020 | assistant lecturer
Lab for Homework #1 – Data warehousing + Tutoring session | 2/11/2020 | scholarship holder
Practice #3: Data mining with RapidMiner | 9/11/2020 | assistant lecturer
Lab for Homework #2 on data mining with RapidMiner + Tutoring session | 16/11/2020 | scholarship holder
Practice #4: Oracle triggers | 23/11/2020 | assistant lecturer
Lab for Homework #3 on triggers in Oracle + Tutoring session | 30/11/2020 | scholarship holder
Practice #5: Oracle optimizer | 14/12/2020 | assistant lecturer
Practice #6: MongoDB | 11/01/2021 | assistant lecturer

 

Lab 1: Extended SQL

  • Text (pdf)
  • Data warehouse tables in csv format (zip)
  • Import Database and Tables
  • Installing Oracle Database 18c Express Edition and SQL Developer
    • To download and install Oracle Express Edition: home page
    • To download and install SQL Developer: home page
    • Tutorial
    • If you have problems with the installation, please use Piazza (folder oraclexesql)
    • Draft solution of queries 1-5 and materialized view (pdf) and DW design

Lab 2: Data-warehouse analytics and reporting with Google Data Studio

Lab 3: Data mining – RapidMiner

Lab 4: Triggers

  • Text (pdf)
  • Scripts for creating the databases (create_db scripts)
  • Screenshots of the database after the trigger executions: Results
  • Draft solution (pdf) [UPDATE (Ex. 3)]

Lab 5: The Oracle Optimizer

Lab 6: NoSQL in MongoDB


Homework to be delivered

 

To obtain the points associated with the homework, students must comply with the following terms:

    • Complete all the points of the exercises in the homework text.
    • Prepare one file in PDF, DOC or ODT format with the solution of the homework.
    • Name the file HomeworkN_Surname_Name_StudentId.XXX, where
        • StudentId, Surname and Name are replaced with the student's information
        • the N following Homework is replaced with the number of the submitted homework
        • the filename extension XXX depends on the file type chosen for the submission (PDF, DOC or ODT)
        • DOCX format is not supported.
        • Since uploaded files are automatically processed, a wrongly named file results in the cancellation of the related homework submission.
        • For example, for homework 1 and extension pdf, the student Mario Rossi with id s123456 will upload Homework1_Rossi_Mario_s123456.pdf
    • Upload the file to the didactic portal (Portale della didattica) in the section Work Submission (Elaborati) before the deadline.
        • Multiple uploads for the same student and/or for the same homework are not allowed.
        • The upload date shown on the didactic portal is considered for the evaluation.
        • Since uploaded files are automatically processed, an upload after the deadline results in the cancellation of the related homework submission.
    • During the upload procedure a description (“Descrizione”) field is requested. Enter the file name, formed according to the rules described above.
    • Only students without access to the course page on the didactic portal may instead submit the homework, before the deadline, by sending an email to the assistant lecturer (eliana dot pastor at polito dot it).
    • Discuss the homework with a positive evaluation on the scheduled date (an announcement will be published).
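
The naming rule above can be checked before uploading. The following is a minimal Python sketch, not the portal's actual validation. The helper name parse_homework_filename, the restriction to homework numbers 1-4, the student ID shape ("s" followed by digits, taken from the example s123456) and the case-insensitive extension are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern derived from HomeworkN_Surname_Name_StudentId.XXX.
# Assumptions: N is 1-4, surname/name contain no underscore or dot,
# the student ID is "s" plus digits, and the extension may be upper or lower case.
FILENAME_RE = re.compile(
    r"Homework(?P<number>[1-4])"
    r"_(?P<surname>[^_.]+)"
    r"_(?P<name>[^_.]+)"
    r"_(?P<student_id>s\d+)"
    r"\.(?P<ext>(?i:pdf|doc|odt))"
)


def parse_homework_filename(filename: str) -> Optional[dict]:
    """Return the parts of a well-formed file name, or None if it breaks the rule."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        return None
    return {
        "number": int(match.group("number")),
        "surname": match.group("surname"),
        "name": match.group("name"),
        "student_id": match.group("student_id"),
        "ext": match.group("ext").lower(),
    }


if __name__ == "__main__":
    # The example from the rules is accepted; a DOCX file is rejected.
    print(parse_homework_filename("Homework1_Rossi_Mario_s123456.pdf"))
    print(parse_homework_filename("Homework1_Rossi_Mario_s123456.docx"))
```

Since the description (“Descrizione”) field must contain the same file name, the same check could be applied to that value as well.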

Homework discussion: Students attending the written exam must bring the following items:

  • for Homeworks #1 – #4:
    • a hard copy of the submitted reports

Homework deliveries:

Homework submissions: list of delivered submissions. In case of any discrepancies or missing deliveries, send an email to eliana.pastor@polito.it.

 

Homework | Material | Deadline | Homework deliveries
Homework #1: Data warehouse | HW text (pdf), [NEW] Updated (b) ii | to be delivered by Sunday, November 8th, 2020 at 11.59 PM (UTC/GMT+1)
Homework #2: Data Mining | HW text (pdf) – Dataset (breast.xls) | to be delivered by Monday, November 23rd, 2020 at 11.59 PM (UTC/GMT+1)
Homework #3: Triggers | HW text (pdf) – scripts | to be delivered by Monday, December 7th, 2020 at 11.59 PM (UTC/GMT+1)
Homework #4: Query Optimization | HW text (pdf) [Updated having COUNT(*)>1] | to be delivered by Thursday, January 7th, 2021 at 11.59 PM (UTC/GMT+1)
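
Because the deadlines above are given in UTC+1, an upload time recorded in another time zone has to be converted before it is compared. The sketch below shows one way to do that in Python with timezone-aware datetimes. The names DEADLINES and is_on_time are hypothetical, and treating an upload at exactly 11.59 PM as on time is an assumption; the date shown on the didactic portal remains the one that counts.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset, as stated next to each deadline.
UTC_PLUS_1 = timezone(timedelta(hours=1))

# Deadlines copied from the homework table (11.59 PM, UTC+1).
DEADLINES = {
    1: datetime(2020, 11, 8, 23, 59, tzinfo=UTC_PLUS_1),
    2: datetime(2020, 11, 23, 23, 59, tzinfo=UTC_PLUS_1),
    3: datetime(2020, 12, 7, 23, 59, tzinfo=UTC_PLUS_1),
    4: datetime(2021, 1, 7, 23, 59, tzinfo=UTC_PLUS_1),
}


def is_on_time(homework: int, uploaded_at: datetime) -> bool:
    """Compare a timezone-aware upload time with the homework deadline.

    Assumption: an upload at exactly 23:59:00 (UTC+1) still counts as on time.
    """
    if uploaded_at.tzinfo is None:
        raise ValueError("uploaded_at must be timezone-aware")
    # Aware datetimes compare by absolute instant, so no manual conversion is needed.
    return uploaded_at <= DEADLINES[homework]


if __name__ == "__main__":
    # 22:58 UTC is 23:58 in UTC+1, one minute before the Homework 1 deadline.
    print(is_on_time(1, datetime(2020, 11, 8, 22, 58, tzinfo=timezone.utc)))
    # 23:00 UTC is already 00:00 on November 9th in UTC+1.
    print(is_on_time(1, datetime(2020, 11, 8, 23, 0, tzinfo=timezone.utc)))
```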