Large-scale itemset mining
Itemset mining focuses on extracting useful knowledge, such as sets of items that frequently occur together, from huge quantities of data. Many domains have to deal with ever-growing amounts of collected data (e.g., biological data, network traffic data, text data, streams of sensor network data, spatio-temporal data). Traditional in-core mining algorithms do not scale well to large data volumes and are hindered by critical issues such as main-memory exhaustion and long execution times. Scalable alternative approaches are therefore needed to perform large-scale data mining efficiently. This research activity investigates innovative approaches that exploit disk-based data structures and memory-efficient algorithms to extract frequent itemsets.
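To make the task concrete, here is a minimal sketch of frequent itemset extraction using the classic level-wise Apriori strategy. It is a generic in-core illustration, not the disk-based HY-Tree approach investigated in this activity. The function name `frequent_itemsets` and the toy transactions are hypothetical. The sketch keeps every candidate and its counter in main memory and rescans the whole dataset once per itemset length. These are exactly the costs that grow into main-memory exhaustion and long execution times on large datasets.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset (frozenset): support count} for every itemset that
    appears in at least min_support transactions (level-wise, Apriori-style).
    Hypothetical in-core sketch: all candidates and counters live in memory."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: one pass over the data to count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets and keep a union
        # only if all of its (k-1)-subsets are frequent (Apriori pruning).
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Support counting: another full pass over the data for this level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(frequent_itemsets(data, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

On the toy data with a minimum support of 3, every single item and every pair is frequent. The triple {a, b, c} occurs in only two transactions, so it is pruned. Disk-based structures aim to avoid both the repeated full scans and the in-memory candidate explosion shown here.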
Technical reports
- TR-2-2012: Large-scale itemset mining, co-authored by Elena Baralis, Tania Cerquitelli, Silvia Chiusano, and Alberto Grand
Datasets
Real datasets
- Wikipedia dataset (tar.gz archive)
Synthetic datasets
- Script to generate synthetic datasets by means of the IBM Data Generator (tar.gz archive)
Publications
Elena Baralis, Tania Cerquitelli, Silvia Chiusano: A persistent HY-Tree to efficiently support itemset mining on large datasets. SAC 2010: 1060-1064
Master thesis
Alberto Grand. Index support for itemset mining. Master's thesis, joint double-degree program between Politecnico di Torino and University of Illinois at Chicago, Master of Science in Electrical and Computer Engineering. November 2009 (pdf)