Data Science And Database Technology (2022/2023)

General Information

SSD: ING-INF/05

CFU: 8

Professor: Silvia Chiusano

Teaching Assistants: Alessandro Fiori, Eliana Pastor

Announcements [dd-mm-yyyy]

22-01-2023 – The list of students who will discuss their homework is available in the Homework Section.

19-12-2022 – The deadline for Homework 3 is postponed to December 22, 2022.

21-11-2022 – The text of Homework 2 is available in the “Homework to be delivered” Section. Deadline: December 5, 2022.

15-11-2022 – The text of Homework 1 is available in the “Homework to be delivered” Section. Deadline: November 28, 2022.

12-10-2022 – The lectures on Thursday, October 13, 2022 and Friday, October 14, 2022 will be held online only (not in person) via virtual classroom. Both lectures will be recorded.

10-10-2022 – The lecture on Tuesday, October 11, 2022 will be held online only (not in-person) via virtual classroom. The lecture will be recorded.

03-10-2022 – The lesson on Tuesday, October 4, 2022 has been cancelled.

27-09-2022 – Laboratory exercises will begin in the fourth week.

26-09-2022 – The first lecture is on September 27th.


Teaching Material

  • Course introduction: pdf

Part I

  • Introduction to Data Science (slides)
  • Data warehouse: introduction (slides)
  • Data warehouse: design (slides)
  • Data warehouse: analysis (slides)
  • Data warehouse: materialized views (slides)
  • Data lakes (slides)
  • Data mining process (slides)
  • Data preparation (slides)
  • Data mining: association rules (slides)
  • Data mining: classification (slides)
  • Data mining: clustering (slides)

Part II

  • Introduction to DBMS (slides)
  • Buffer Manager (slides)
  • Physical access to data (slides)
  • Query optimization (slides)
  • Physical Design (slides)
  • Concurrency Control (slides)
  • Reliability management (slides)
  • Distributed databases (slides)
  • NoSQL, beyond relational databases (slides)
  • Introduction to MongoDB (slides)
  • ElasticSearch (slides)

Oracle

Exercise


Laboratory Material

Laboratory sessions will start in the fourth week.

Lab Schedule

Topic | Team A (Wednesday, 11:30-13:00) | Team B (Thursday, 8:30-10:00)
Lab 1: Extended SQL | 19/10/2022 | 20/10/2022
Lab 2: Data Studio | 26/10/2022 | 27/10/2022
Lab 3: Materialized views | 09/11/2022 | 10/11/2022
Lab 4: Data mining with RapidMiner | 16/11/2022 | 17/11/2022
Lab for Homework #2 on data mining with RapidMiner (scholarship holder) | 23/11/2022 | 24/11/2022
Lab 5: Oracle optimizer | 30/11/2022 | 01/12/2022
Lab 6: MongoDB | 14/12/2022 | 15/12/2022

Students are divided into two teams by surname:
TEAM A (surnames A to J): Wednesday, 11:30-13:00
TEAM B (surnames K to Z): Thursday, 8:30-10:00

Lab 1: Extended SQL

  • Text (pdf)
  • Data warehouse tables in csv format (zip)
  • SQL Developer is already available at LABINF. If you want to practise at home, you can follow these tutorials:
  • Installing Oracle Database 18c Express Edition and SQL Developer
    • To download and install Oracle Express Edition: home page
    • To download and install SQL Developer: home page
    • Tutorial
      • Installation Guide for Windows
      • Installation Guide for Ubuntu
      • Installation Guide for Mac OS
  • If you want to practise at home and have problems using Oracle Database and SQL Developer, you can use Oracle Live SQL instead.
    • You can create the tables using the SQL scripts (zip)
    • A short guide on how to import SQL scripts and query the DB in Oracle Live SQL is available (pdf)
    • If you experience issues importing the complete FACT table, you can use a “light” version of the table that contains a sample of the rows (facts_sample.sql).

Draft solution (star schema, queries)

Lab 2: Data warehouse analytics and reporting (Google Data Studio/Looker Studio)

Lab 3: Materialized views and triggers

  • Draft solution (pdf)

Lab 4: Data mining with RapidMiner

Lab 5: The Oracle Optimizer

Lab 6: NoSQL in MongoDB

Homework to be delivered

To obtain the points associated with the homework, students must follow these rules:

  • Complete all the points of the exercises in the homework text.
  • Prepare a single file in PDF, DOC or ODT format containing the homework solution.
  • Name the file HomeworkN_Surname_Name_StudentId.XXX, where:
    • Surname, Name and StudentId are replaced with the student's surname, name and student ID
    • N, following Homework, is replaced with the number of the submitted homework
    • the extension XXX depends on the file type chosen for the submission (PDF, DOC or ODT).
    • DOCX format is not supported.
    • Since uploaded files are processed automatically, a file with an incorrect name causes the related homework submission to be cancelled.
    • For example, for Homework 1 submitted as a PDF, the student Mario Rossi with student ID s123456 uploads Homework1_Rossi_Mario_s123456.pdf
  • Upload the file to the didactic portal (Portale della didattica), in the Work Submission (Elaborati) section, before the deadline.
    • Each student may upload each homework only once; multiple uploads are not allowed.
    • The upload date shown on the didactic portal is the one considered for the evaluation.
    • Since uploaded files are processed automatically, an upload after the deadline causes the related homework submission to be cancelled.
  • During the upload procedure, a description field (“Descrizione”) is requested: enter the file name, following the naming rules described above.
  • Only students without access to the course page on the didactic portal may submit the homework by email to the teaching assistant (eliana dot pastor at polito dot it), before the deadline.
  • Homework that receives a positive evaluation must be discussed on the scheduled date (an announcement will be published).
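Because submissions are processed automatically, a mistyped file name is costly. The Python snippet below is a minimal, unofficial sketch for checking a name against the HomeworkN_Surname_Name_StudentId.XXX pattern before uploading. The function name check_homework_filename and the regular expression are hypothetical, not a course tool; the pattern assumes homework numbers 1 to 4, a surname and name made only of letters (no spaces, digits or underscores), a student ID made of one lowercase letter followed by digits as in s123456, and lowercase extensions as in the example. It does not replace the rules above.

```python
import re

# Hypothetical helper (not an official course tool): checks the pattern
# HomeworkN_Surname_Name_StudentId.XXX described in the homework rules.
# Assumptions: N is 1-4 (the four homework of this edition), Surname and
# Name contain only letters (accented letters allowed; no spaces, digits or
# underscores), the student ID is one lowercase letter followed by digits
# (as in "s123456"), and the extension is lowercase pdf, doc or odt.
FILENAME_RE = re.compile(
    r"Homework(?P<n>[1-4])"
    r"_(?P<surname>[^\W\d_]+)"    # letters only
    r"_(?P<name>[^\W\d_]+)"       # letters only
    r"_(?P<student_id>[a-z]\d+)"  # e.g. s123456
    r"\.(?P<ext>pdf|doc|odt)"     # DOCX is not supported
)


def check_homework_filename(filename: str) -> bool:
    """Return True if the whole filename matches the assumed pattern."""
    return FILENAME_RE.fullmatch(filename) is not None


if __name__ == "__main__":
    for candidate in [
        "Homework1_Rossi_Mario_s123456.pdf",   # the example from the rules
        "Homework1_Rossi_Mario_s123456.docx",  # DOCX is not supported
        "Homework1_Mario_Rossi.pdf",           # student ID missing
    ]:
        print(candidate, check_homework_filename(candidate))
```

Run as a script, it should print True for the first name and False for the other two. Using fullmatch rather than search ensures that trailing characters (such as the "x" in ".docx") make the check fail.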

Homework discussion: Students attending the written exam must bring the following items:

  • for Homework #1 to #4:
    • a hard copy of the submitted reports

Homework deliveries:

Homework discussion. At the end of the exam, the following students must go to Lab 5 of the DAUIN department (second floor, entrance from Corso Castelfidardo 39) to discuss their homework:

  • 318907
  • 316607
  • 313642
  • 315962

Homework submissions: list of delivered submissions. In case of any discrepancies or missing submissions, send an email to eliana.pastor@polito.it.

Homework | Material | Deadline
Homework #1: Data warehouse and materialized views | HW text, draft solution | Monday, November 28th, 2022 at 11:59 PM (UTC/GMT+1)
Homework #2: Data mining | HW text, breast dataset | Monday, December 5th, 2022 at 11:59 PM (UTC/GMT+1)
Homework #3: The Optimizer | HW text | Thursday, December 22nd, 2022 at 11:59 PM (UTC/GMT+1), postponed from Monday, December 19th
Homework #4: MongoDB | HW text, bike_stations dataset | Sunday, January 15th, 2023 at 11:59 PM (UTC/GMT+1)