Data Science And Database Technology (2023/2024)

General Information

SSD: ING-INF/05

CFU: 8

Professor: Silvia Chiusano

Teaching Assistants: Alessandro Fiori, Davide Napolitano

Announcements [yyyy-mm-dd]

  • [2023-11-07] – Wednesday, November 8, 2023: Laboratory on Data Studio for TEAMS A and B
  • [2023-10-17] – The laboratory starts on Wednesday 25/10/2023. The organization of students into teams has been published (more details in the Laboratory Material section of this page)

Weekly schedule (from 4/12/2023 to 8/12/2023)

Time slots: 10:00-11:30, 13:00-14:30, 14:30-16:00, 17:30-19:00

Monday
  • [Room R4] DBMS Technology: Query optimization in Oracle and exercises
  • [Room R4] DBMS Technology: Query optimization in Oracle and exercises

Wednesday
  • [LAIB2B] Team A: Oracle optimizer
  • [LAIB2B] Team B: Oracle optimizer
  • [Room R3] DBMS Technology: Physical design; Concurrency Control

Thursday
  • [Room R2] DBMS Technology: Concurrency Control

Teaching Material

  • Course introduction: pdf

Part I

  • Introduction to Data Science (slides)
  • Data warehouse: introduction (slides)
  • Data warehouse: design (slides)
  • Data warehouse: analysis (slides)
  • Data warehouse: materialized view, physical design, ETL (slides)
  • Data lakes (slides)
  • Data mining process (slides)
  • Data preparation (slides)
  • Data mining: association rules (slides)
  • Data mining: classification (slides)
  • Data mining: clustering (slides)

Part II

  • Introduction to DBMS (slides)
  • Buffer Manager (slides)
  • Physical access to data (slides)
  • Query optimization (slides)
  • Physical Design (slides)
  • Concurrency Control (slides)
  • Reliability management (slides)
  • Distributed databases (slides)
  • NoSQL, beyond relational databases (slides)
  • Introduction to MongoDB (slides)
  • ElasticSearch (slides)


Exercise

Subject: Extended SQL, materialized view, triggers

  • Extended SQL and materialized view in Oracle (2 slides per page, 6 slides per page)
  • Exercise 1 on extended SQL (text, draft solution)
  • Materialized views and triggers (text, draft solution)
  • Supporting material: Introduction to triggers (slides)

Subject: Data Warehouse

  • Storehouses (text, draft solution)
  • Italian wines (text, draft solution)
  • Remote heating (text, draft solution)
  • Scientific publications (text)

Subject: Query optimization

  • Fine (text)
  • Students (text, draft solution)
  • Athletes (text)
  • Tourist village (text)

Laboratory Material

The laboratory sessions start in the fourth week.

LAB TEAMS (division into two teams by surname) | WHEN | WHERE
TEAM A: from AAA to LZZ | Wednesday 13:00-14:30 | LAIB2B
TEAM B: from MAA to ZZZ | Wednesday 14:30-16:30 | LAIB2B

NOTE: please respect the division into teams so that the laboratories can take place.


SUBJECT | LAB SCHEDULE | TEXT | SOLUTION | SOFTWARE
Lab 1: Extended SQL | Wednesday 25/10/2023 | Text | Sol_DW, Sol_SQL | Files
Lab 2: Data Studio | Wednesday 08/11/2023 | Text | - | -
Lab 3: Materialized views | Wednesday 15/11/2023 | Text | Sol | -
Lab 4: Data mining with RapidMiner | Wednesday 22/11/2023 | Text | Sol | Files
Lab 5: Oracle optimizer | Wednesday 06/12/2023 | Text | - | Files
Lab 6: MongoDB | Wednesday 20/12/2023 | - | - | -

Homework to be delivered

To obtain the points associated with the homework, students must follow these rules:

  • Complete all the points of the exercises in the homework text.
  • All exercises must be computer-written where possible (e.g. SQL queries, triggers, etc.). Handwritten solutions are accepted only for some exercises, such as the conceptual schema in DW design.
  • Prepare one file in PDF format with the solution of the homework.
  • Name the file HomeworkN_Surname_Name_StudentId.pdf, where
    • StudentId, Surname and Name are replaced with the student's information
    • the N following Homework is replaced with the number of the submitted homework
    • Since uploaded files are processed automatically, a file in the wrong format or with a wrong name causes the related homework submission to be cancelled.
    • For example, for homework 1, the student Mario Rossi with id s123456 uploads Homework1_Rossi_Mario_s123456.pdf
  • Upload the file to the didactic portal (Portale della didattica), in the section Work Submission (Elaborati), before the deadline.
    • Multiple uploads by the same student and/or for the same homework are not allowed.
    • The upload date shown on the didactic portal is the one considered for the evaluation.
    • Since uploaded files are processed automatically, an upload after the deadline causes the related homework submission to be cancelled.
  • During the upload procedure a description (“Descrizione”) field is requested. Enter the name of the file, following the rules described above.
  • Only students without access to the course page on the didactic portal may submit the homework by email, before the deadline, to the assistant lecturer (davide.napolitano@polito.it).
  • Homework with a positive evaluation must be discussed on the scheduled date (an announcement will be published).
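
Because a wrongly named file cancels the submission, it may help to check the name before uploading. The following is a minimal Python sketch of the naming rule above, not the checker actually used by the course. The function names are hypothetical, and two details are assumptions drawn only from the single example on this page: that the student id is the letter s followed by digits, and that homework numbers range from 1 to 4.

```python
import re

# Assumed pattern: the page shows only one example (s123456), so the
# "s + digits" id shape and the 1-4 homework range are assumptions.
FILENAME_RE = re.compile(
    r"Homework(?P<n>[1-4])_(?P<surname>[^_]+)_(?P<name>[^_]+)_(?P<student_id>s\d+)\.pdf"
)


def build_homework_filename(n: int, surname: str, name: str, student_id: str) -> str:
    """Build a file name of the form HomeworkN_Surname_Name_StudentId.pdf."""
    return f"Homework{n}_{surname}_{name}_{student_id}.pdf"


def is_valid_homework_filename(filename: str) -> bool:
    """Return True if the whole file name matches the assumed pattern."""
    return FILENAME_RE.fullmatch(filename) is not None


if __name__ == "__main__":
    filename = build_homework_filename(1, "Rossi", "Mario", "s123456")
    print(filename, is_valid_homework_filename(filename))
```

The page does not say how to write surnames or names that contain spaces or underscores, so the sketch leaves that case to the student's judgment.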

Homework to be delivered:

The solution of each homework will be uploaded after the corresponding deadline.

Homework discussion:

Homework | Text | Files | Solution | Upload | Deadline
Homework #1: Data warehouse and materialized views | Text | - | - | uploaded before the end of November 15th, 2023 | to be delivered by November 28th, 2023 at 11.59 PM (UTC/GMT+1)
Homework #2: Data mining | Text | Dataset | - | uploaded before the end of November 24th, 2023 | to be delivered by December 6th, 2023 at 11.59 PM (UTC/GMT+1)
Homework #3: The Optimizer | - | - | - | uploaded before the end of December 7th, 2023 | to be delivered by December 20th, 2023 at 11.59 PM (UTC/GMT+1)
Homework #4: MongoDB | - | - | - | uploaded before the end of December 21st, 2023 | to be delivered by January 11th, 2024 at 11.59 PM (UTC/GMT+1)
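
The deadlines above are given in UTC/GMT+1, which matters for students uploading from another time zone. The following minimal Python sketch compares a timezone-aware upload time with the listed deadlines. The names DEADLINES and is_on_time are hypothetical. Treating 23:59:00 as the last accepted instant is an assumption, since the page does not say whether the full minute is included. The date shown on the didactic portal remains the authoritative one.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset matching the "UTC/GMT+1" stated next to each deadline.
UTC_PLUS_1 = timezone(timedelta(hours=1))

# Deadlines copied from the homework list; 23:59:00 as the cut-off is an assumption.
DEADLINES = {
    1: datetime(2023, 11, 28, 23, 59, tzinfo=UTC_PLUS_1),
    2: datetime(2023, 12, 6, 23, 59, tzinfo=UTC_PLUS_1),
    3: datetime(2023, 12, 20, 23, 59, tzinfo=UTC_PLUS_1),
    4: datetime(2024, 1, 11, 23, 59, tzinfo=UTC_PLUS_1),
}


def is_on_time(homework: int, uploaded_at: datetime) -> bool:
    """Return True if the timezone-aware upload time is not after the deadline."""
    return uploaded_at <= DEADLINES[homework]


if __name__ == "__main__":
    # 22:58 UTC is 23:58 in UTC+1, one minute before the homework 1 deadline.
    upload = datetime(2023, 11, 28, 22, 58, tzinfo=timezone.utc)
    print(is_on_time(1, upload))
```

The upload time must carry a time zone. Python raises a TypeError when a naive datetime is compared with an aware one, which avoids silently comparing times from different zones.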