Data Science And Database Technology (2021/2022)

General Information

SSD: ING-INF/05

CFU: 8

Professor: Silvia Chiusano

Teaching Assistants: Alessandro Fiori, Eliana Pastor

Announcements [dd-mm-yy]

23-01-2022 – The list of the submitted homeworks is available in the “Homework” section.

27-10-2021 – There will be no laboratory during the week of 1/11/2021-5/11/2021.

28-09-2021 – Slides and other material used in lessons and practices will be made available here during the semester.

07-11-2021 – The calendar of the laboratory practices has been updated.


Teaching Material

  • Course introduction: pdf

Part I

  • Introduction to Data Science (slides)
  • Data warehouse: introduction (slides)
  • Data warehouse: design (slides)
  • Data warehouse: analysis (slides)
  • Data warehouse: materialized views (slides)
  • Data lakes (slides)
  • Data mining process (slides)
  • Data preparation (slides)
  • Data mining: association rules (slides)
  • Data mining: classification (slides)
  • Data mining: clustering (slides)

Part II

Oracle

Exercise


Laboratory Material

The laboratory practices will start in the fourth week.

| Topic | Team A (5:30-7pm) | Team B (4-5:30pm) | Lab Assistance |
| Practice #1: Extended SQL in Oracle | 19/10/2021 | 22/10/2021 | assistant lecturer |
| Practice #2: Data warehousing | 26/10/2021 | 29/10/2021 | assistant lecturer |
| Practice #3: Materialized views | 9/11/2021 | 12/11/2021 | assistant lecturer |
| Practice #4: Data mining with RapidMiner | 16/11/2021 | 19/11/2021 | assistant lecturer |
| Lab for Homework #2 on data mining with RapidMiner | 23/11/2021 | 26/11/2021 | scholarship holder |
| Practice #5: Oracle optimizer | 30/11/2021 | 3/12/2021 | assistant lecturer |
| Practice #6: MongoDB | 14/12/2021 | 17/12/2021 | assistant lecturer |

LAB SCHEDULE. 
TEAM A (FROM A TO K) on Tuesday from 5.30pm to 7pm
TEAM B (FROM L TO Z) on Friday from 4pm to 5.30pm
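
The team split can be expressed as a small lookup. This is only an illustrative sketch: it assumes that the letter ranges refer to the first letter of the student's surname (the page does not say so explicitly), and the function name `lab_team` is hypothetical.

```python
def lab_team(surname):
    """Return "A" or "B" for a surname, following the A-K / L-Z split.

    Assumption: the ranges apply to the first letter of the surname.
    """
    initial = surname.strip()[:1].upper()
    if not ("A" <= initial <= "Z"):
        # Empty strings and non-ASCII initials are not covered by the schedule.
        raise ValueError(f"cannot determine a team for surname {surname!r}")
    return "A" if initial <= "K" else "B"


print(lab_team("Bianchi"))  # Team A: Tuesday, 5.30pm to 7pm
print(lab_team("Rossi"))    # Team B: Friday, 4pm to 5.30pm
```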

Lab 1: Extended SQL

  • Text (pdf)
  • Data warehouse tables in csv format (zip)
  • SQL Developer is already available at LABINF. If you want to practise at home, you can follow the tutorials below.
  • Installing Oracle Database 18c Express Edition and SQL Developer
    • To download and install Oracle Express Edition: home page
    • To download and install SQL Developer: home page
    • Tutorial
  • If you want to practise at home and have problems using Oracle Database and SQL Developer, you can use Oracle Live SQL instead.
    • You can add tables using SQL scripts (zip)
    • A short guide on how to import SQL scripts and query the DB in Oracle Live SQL is available (pdf)
    • If you have trouble importing the complete FACT table, you can use a “light” version of the table, which contains a sample of the rows (facts_sample.sql).

Draft solution (star schema, queries)

Lab 2: Data warehouse analytics and reporting (Google Data Studio)

Lab 3: Materialized views and triggers

  • Text (pdf)
  • Draft solution (pdf)

Lab 4: Data mining – RapidMiner

Lab 5: The Oracle Optimizer

Lab 6: NoSQL in MongoDB

Homework to be delivered

To obtain the points associated with the homework, students must follow these rules:

  • Complete all the points of the exercises in the homework text.
  • Prepare one file in PDF, DOC or ODT format with the solution of the homework.
  • Name the file as: HomeworkN_Surname_Name_StudentId.XXX where
    • StudentId, Surname and Name should be substituted with student information
    • the N character following Homework should be substituted with the number of the submitted homework
    • the filename extension XXX depends on the file type chosen for the submission (PDF, DOC or ODT).
    • DOCX format is not supported.
    • Since uploaded files are processed automatically, a file with an incorrect name will cause the related homework submission to be cancelled.
    • For example, for homework 1 and extension pdf, the student with name and surname Mario Rossi and id s123456 will upload Homework1_Rossi_Mario_s123456.pdf
  • Upload the file to the didactic portal (Portale della didattica), in the section Work Submission (Elaborati), before the deadline.
    • Multiple uploads by the same student and/or for the same homework are not allowed.
    • The upload date shown on the didactic portal is the one considered for the evaluation.
    • Since uploaded files are processed automatically, an upload after the deadline will cause the related homework submission to be cancelled.
  • During the upload procedure a description (“Descrizione”) field is requested. Enter the file name there, following the rules described above.
  • Only students without access to the course page on the didactic portal may submit the homework by email, before the deadline, to the assistant lecturer (eliana dot pastor at polito dot it).
  • Discuss the positively evaluated homework on the scheduled date (an announcement will be published).
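
Because a wrongly named file cancels the submission, it may help to check the name before uploading. The following is a minimal sketch, not the portal's actual check. It assumes that the homework number is a positive integer, that surname and name contain only unaccented letters, and that the student id is "s" followed by digits, as in the example above. The function names are hypothetical.

```python
import re

# Accepted file types; DOCX is explicitly not supported.
ALLOWED_EXTENSIONS = {"pdf", "doc", "odt"}

# HomeworkN_Surname_Name_StudentId.XXX
# Assumed character sets: letters only for surname and name, "s" + digits for the id.
FILENAME_RE = re.compile(
    r"Homework(?P<number>[1-9]\d*)_(?P<surname>[A-Za-z]+)_(?P<name>[A-Za-z]+)"
    r"_(?P<student_id>s\d+)\.(?P<ext>[A-Za-z]+)"
)


def build_homework_filename(number, surname, name, student_id, ext="pdf"):
    """Compose a file name in the required order: number, surname, name, id."""
    return f"Homework{number}_{surname}_{name}_{student_id}.{ext}"


def check_homework_filename(filename):
    """Return True if the file name matches the assumed pattern and extension."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        return False
    return match.group("ext").lower() in ALLOWED_EXTENSIONS


print(build_homework_filename(1, "Rossi", "Mario", "s123456"))
# prints Homework1_Rossi_Mario_s123456.pdf
print(check_homework_filename("Homework1_Rossi_Mario_s123456.docx"))
# prints False
```

Surnames with spaces, apostrophes or accented letters would be rejected by this sketch; the page does not say how such names should be written.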

Homework discussion: Students attending the written exam must bring the following items:

  • for Homeworks #1 – #4:
    • a hard-copy of the submitted reports

Homework deliveries:

Homework submissions: list of delivered submissions. In case of any inconsistency or missing delivery, send an email to eliana.pastor@polito.it.

| Homework | Material | Deadline | Homework deliveries |
| Homework #1: Data warehouse and materialized views | HW text | Thursday, November 25th, 2021 at 11.59 PM (UTC/GMT+1) | |
| Homework #2: Data mining | HW text, breast dataset | Thursday, December 2nd, 2021 at 11.59 PM (UTC/GMT+1) | |
| Homework #3: Optimizer | HW text | Thursday, December 16th, 2021 at 11.59 PM (UTC/GMT+1) | |
| Homework #4: MongoDB | HW text, bike_stations dataset (updated 18/12/21) | Tuesday, January 11th, 2022 at 11.59 PM (UTC/GMT+1) | |