{"id":3700,"date":"2012-04-26T13:59:59","date_gmt":"2012-04-26T12:59:59","guid":{"rendered":"http:\/\/dbdmg.polito.it\/wordpress\/?page_id=3700"},"modified":"2014-11-13T10:25:54","modified_gmt":"2014-11-13T09:25:54","slug":"large-scale-itemset-mining","status":"publish","type":"page","link":"https:\/\/dbdmg.polito.it\/wordpress\/research\/large-scale-itemset-mining\/","title":{"rendered":"Large-scale itemset mining"},"content":{"rendered":"<p>Itemset mining focuses on the extraction of useful knowledge from huge quantities of data. A wide range of domains must deal with ever-growing amounts of gathered data (e.g., biological data, network traffic data, text mining, streams of sensor network data, spatio-temporal data). Traditional in-core mining algorithms do not scale well with large volumes of data and are hindered by critical issues such as main-memory exhaustion and long execution times. Scalable alternative approaches must be devised to perform large-scale data mining efficiently. 
This research activity investigates innovative approaches that exploit disk-based data structures and memory-efficient algorithms to extract frequent itemsets.<\/p>\n<h2>Technical reports<\/h2>\n<ul>\n<li>TR-2-2012: Large scale itemset mining, co-authored by Elena Baralis, Tania Cerquitelli, Silvia Chiusano, and Alberto Grand<\/li>\n<\/ul>\n<\/div>\n<h2>Datasets<\/h2>\n<h4>Real datasets<\/h4>\n<ul>\n\t<!--li>Wikipedia dataset (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2012\/04\/wiki.tar.gz\">tar.gz<\/a> archive)<\/li-->\n<li>Wikipedia dataset (<a href=\"http:\/\/dbdmg.polito.it\/~tania\/\/wiki.tar.gz\">tar.gz<\/a> archive)<\/li>\n<\/ul>\n<h4>Synthetic datasets<\/h4>\n<ul>\n<li>Script to generate synthetic datasets by means of the IBM Data Generator (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2012\/04\/run-gen.tar.gz\">tar.gz<\/a> archive)<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Publications<\/h2>\n<p>Elena Baralis, Tania Cerquitelli, Silvia Chiusano: A persistent HY-Tree to efficiently support itemset mining on large datasets. SAC 2010: 1060-1064<\/p>\n<h2>Master thesis<\/h2>\n<p><span style=\"font-size: small;\">Alberto Grand. Master Thesis. Index support for itemset mining.\u00a0<\/span>Joint double-degree program between Politecnico di Torino and University of Illinois at Chicago.\u00a0Master of Science in Electrical and Computer Engineering. November 2009 (<a href=\"http:\/\/dbdmg.polito.it\/wordpress\/wp-content\/uploads\/2012\/04\/Master-Thesis-Grand.pdf\">pdf<\/a>)<\/p>\n<br class=\"fixfloat\" \/>","protected":false},"excerpt":{"rendered":"<p>Itemset mining focuses on the extraction of useful knowledge from huge quantities of data. A wide range of domains must deal with ever-growing amounts of gathered data (e.g., biological data, network traffic data, text mining, streams of sensor network data, spatio-temporal data). 
Traditional in-core mining algorithms do not scale well with large<a href=\"https:\/\/dbdmg.polito.it\/wordpress\/research\/large-scale-itemset-mining\/\">[&#8230;]<\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"parent":98,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-3700","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/3700","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/comments?post=3700"}],"version-history":[{"count":25,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/3700\/revisions"}],"predecessor-version":[{"id":7550,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/3700\/revisions\/7550"}],"up":[{"embeddable":true,"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/pages\/98"}],"wp:attachment":[{"href":"https:\/\/dbdmg.polito.it\/wordpress\/wp-json\/wp\/v2\/media?parent=3700"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}