Big data and cloud-based data mining

Tutors

Paolo Garza, Tania Cerquitelli

Description

Large volumes of data are being produced by various next-generation applications. Internet traffic data, online social network data, and satellite data (e.g., the Copernicus EMS data) are only a few examples of Big data sources. The automatic extraction of knowledge from such huge data volumes is a challenging task: standard in-core or in-memory data mining algorithms cannot process that amount of data, so new approaches and programming paradigms are necessary.

Currently, cloud-based approaches are considered a valid solution for Big data processing.

Objectives

  • Study cloud-based approaches built on the Hadoop and Spark frameworks
  • Design and implement novel cloud-based data mining algorithms
    • Hadoop framework
    • Spark framework
    • MapReduce paradigm
  • Design and develop analysis modules based on the Hadoop and Spark ecosystems
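
As a rough illustration of the MapReduce paradigm named in the objectives, the sketch below runs a word count in plain Python on a single machine. It only mimics the three logical phases (map, shuffle, reduce) and does not use the Hadoop or Spark APIs; the function names and the toy input are hypothetical and chosen for this example only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit one (word, 1) pair for every word in an input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Chain the three phases over an iterable of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical toy input; real jobs read partitioned data from distributed storage.
    lines = ["big data mining", "cloud data mining", "big cloud"]
    print(word_count(lines))
```

In a real cluster the map and reduce functions run in parallel on different nodes and the shuffle step moves data across the network, which is why algorithms usually have to be redesigned rather than ported line by line.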

Required skills

  • Very good programming skills (especially Java and/or Python)
  • Basic knowledge of Data Mining and Machine Learning

Applications

  • Emergency management systems
    • European Project I-REACT: Improving Resilience to Emergencies through Advanced Cyber Technologies
  • Internet Traffic Data analysis
    • European Project ONTIC: Online Network TraffIc Characterization
