Associative classification

This page has hierarchy - Parent page: Research

L3 (Live and Let Live)


Classification aims to define an abstract model of a set of classes, called classifier, which is built from a set of labeled data, the training set. The classifier is then used to appropriately classify new data for which the class label is unknown. Different approaches have been proposed to build accurate classifiers, for example, naive Bayes classification, decision trees, and SVMs. Recently, association rules have become a valuable tool for classification purposes (for example, CAEP, CMAR, CBA, and ADT). In associative classification, the rule consequent is a class label, and the classifier is a set of association rules. The generation of an associative classifier consists of two steps. First, classification rules are extracted from the training data. Then, pruning techniques are applied to select a small subset of high-quality rules and build an accurate model of training data. Usually, a large rule set is mined to allow a wide selection of rules and the generation of accurate classifiers. However, in large or correlated data sets, rule mining may yield a huge number of classification rules. Rule extraction becomes difficult, and it becomes hard to optimally exploit the generated rules. Hence, pruning techniques, in particular, support-based pruning, are exploited to reduce the complexity of the extraction task.
To address both an excessive rule set size and overpruning, we have proposed a new associative classifiers that relies on a lazy pruning approach coupled with a compact representation of the rule set. We named our classifier L3, which stands for Live and Let Live (that is, pruning only takes place when strictly necessary). The lazy pruning technique performs a reduced amount of pruning by eliminating only “harmful rules,” that is, rules that only misclassify training data. During classification, L3 adopts a two-step approach in which high-quality rules (that is, rules used in the classification of training data) are considered first, and “unchecked” rules, that is, rules unused during the training phase, are used next to classify unlabeled data. Rules of the second type are only considered when unlabeled data cannot be classified by means of the first type of rules. L3 is based on a compact form that allows representing large rule sets. The compact form proposed in L3 represents a rule set without information loss and allows the regeneration of the complete rule set. This form also allows reaching very low support thresholds and mining large rule sets. Rule compression provides a space-effective representation of these large rule sets in the classifier.

Software – Download section

The following implementations of the L3 algorithm are available: