Data Mining: Mining, Associations, and Correlations
What is Market Basket Analysis? Market basket analysis may be performed on the retail data of customer transactions at a store. The results can then be used to plan marketing or advertising strategies, or to design a new catalog. Market basket analysis can also help retailers plan which items to put on sale at reduced prices. If customers tend to purchase computers and printers together, then a sale on printers may encourage the sale of printers as well as computers.
What is Association Rule Mining? Association rule mining can be viewed as a two-step process: 1. Find all frequent item-sets: by definition, each of these item-sets occurs at least as frequently as a predetermined minimum support count, min_sup. 2. Generate strong association rules from the frequent item-sets: by definition, these rules must satisfy minimum support and minimum confidence.
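The two steps can be sketched in a few lines of Python. This is a minimal illustration rather than a tuned implementation: the function names frequent_itemsets and strong_rules and the sample transactions are assumptions made for the example, and the level-wise search is a plain Apriori-style loop.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Step 1: level-wise search for item-sets with support count >= min_sup."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-item-sets, then keep only candidates whose
        # (k-1)-subsets are all frequent.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def strong_rules(frequent, min_conf):
    """Step 2: emit rules lhs => rhs whose confidence is at least min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = count / frequent[lhs]  # support(itemset) / support(lhs)
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

# Hypothetical store transactions.
baskets = [
    {"computer", "printer"},
    {"computer", "printer", "software"},
    {"computer"},
    {"printer", "paper"},
    {"computer", "printer", "paper"},
]
freq = frequent_itemsets(baskets, min_sup=3)
rules = strong_rules(freq, min_conf=0.7)
```

With these baskets, {computer, printer} has a support count of 3, and both computer => printer and printer => computer reach a confidence of 0.75.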
Basis for Pattern Mining: Frequent pattern mining can be classified according to: the completeness of patterns to be mined; the levels of abstraction involved in the rule set; the number of data dimensions involved in the rule; the types of values handled in the rule; the kinds of rules to be mined; the kinds of patterns to be mined.
Methods to improve the efficiency of the Apriori algorithm. Hash-based technique (hashing item-sets into corresponding buckets): A hash-based technique can be used to reduce the size of the candidate k-item-sets, C_k, for k > 1.
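A rough sketch of the idea for k = 2: while the database is scanned to count single items, every pair in each transaction is hashed into a bucket, and a candidate pair whose bucket count is below the minimum support cannot be frequent. The function name hash_pruned_c2, the integer item ids, and the hash function (x * 10 + y) mod 7 are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

def hash_pruned_c2(transactions, min_sup, n_buckets=7):
    """Return (C2 without hashing, C2 after bucket-count filtering)."""
    item_counts = Counter()
    bucket_counts = [0] * n_buckets

    def bucket(x, y):
        # Illustrative hash function on integer item ids (an assumption).
        return (x * 10 + y) % n_buckets

    # Single scan: count items and hash every 2-item-set into a bucket.
    for t in transactions:
        item_counts.update(t)
        for x, y in combinations(sorted(t), 2):
            bucket_counts[bucket(x, y)] += 1

    frequent_items = sorted(i for i, n in item_counts.items() if n >= min_sup)
    plain_c2 = list(combinations(frequent_items, 2))
    # A pair in a bucket with count < min_sup cannot be frequent.
    pruned_c2 = [(x, y) for x, y in plain_c2
                 if bucket_counts[bucket(x, y)] >= min_sup]
    return plain_c2, pruned_c2

data = [{1, 2}, {1, 2}, {1, 2, 3}, {3, 4}, {3, 4}, {3, 4}, {1, 4}]
plain, pruned = hash_pruned_c2(data, min_sup=3)
```

Here the six candidate pairs shrink to three. The pair (1, 3) survives only because it shares a bucket with the frequent pair (3, 4): the filter never discards a frequent pair, but collisions can let infrequent ones through, so the surviving candidates still need to be counted.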
Methods to improve the efficiency of the Apriori algorithm. Transaction reduction (reducing the number of transactions scanned in future iterations): A transaction that does not contain any frequent k-item-sets cannot contain any frequent (k + 1)-item-sets, so it can be left out of later scans.
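As a small sketch, the reduction is a filter applied after each level; the function name reduce_transactions and the sample data are hypothetical.

```python
def reduce_transactions(transactions, frequent_k):
    """Keep only transactions that contain at least one frequent k-item-set."""
    return [t for t in transactions if any(f <= t for f in frequent_k)]

transactions = [frozenset({1, 2, 3}), frozenset({3, 4}),
                frozenset({1, 2}), frozenset({2})]
frequent_2 = [frozenset({1, 2})]  # assumed result of the k = 2 pass

# Only the transactions containing {1, 2} need to be scanned for k = 3.
remaining = reduce_transactions(transactions, frequent_2)
```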
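Methods to improve the efficiency of the Apriori algorithm. Partitioning (partitioning the data to find candidate item-sets): A partitioning technique can be used that requires just two database scans to mine the frequent item-sets. It has two phases. In phase I, the transactions are divided into non-overlapping partitions and the item-sets that are frequent within each partition (local frequent item-sets) are found. In phase II, a second scan of the full database counts the actual support of these candidates to determine the global frequent item-sets.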
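A two-scan partitioning scheme can be sketched as follows. It relies on the fact that an item-set that is frequent in the whole database must be frequent, at the same relative support, in at least one partition. The function names and the brute-force local miner are assumptions made to keep the example short; a real system would mine each partition with a proper algorithm.

```python
import math
from collections import Counter
from itertools import combinations

def local_frequent(partition, min_sup_ratio):
    """Brute-force local miner: item-sets frequent within one partition."""
    threshold = math.ceil(min_sup_ratio * len(partition))
    counts = Counter()
    for t in partition:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            counts.update(combinations(items, k))
    return {s for s, n in counts.items() if n >= threshold}

def partition_mine(transactions, min_sup_ratio, n_partitions):
    size = math.ceil(len(transactions) / n_partitions)
    partitions = [transactions[i:i + size]
                  for i in range(0, len(transactions), size)]

    # Scan 1: the union of local frequent item-sets forms the candidates.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent(part, min_sup_ratio)

    # Scan 2: count the actual support of every candidate in the full data.
    threshold = math.ceil(min_sup_ratio * len(transactions))
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if t.issuperset(c):
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= threshold}

db = [{1, 2}, {1, 2, 3}, {3}, {3, 4}, {1, 2}, {2, 3}]
result = partition_mine(db, min_sup_ratio=0.5, n_partitions=2)
```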
Methods to improve the efficiency of the Apriori algorithm. Sampling (mining on a subset of the given data): The basic idea of the sampling approach is to pick a random sample S of the given data D, and then search for frequent item-sets in S instead of D. In this way, we trade off some degree of accuracy against efficiency.
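One way to sketch this: mine the sample S with a slightly lowered threshold to reduce the chance of missing item-sets, then verify the candidates with a single pass over D. The names mine_with_sampling and count_itemsets, the slack factor of 0.8, and the cap on item-set length are all assumptions for illustration. Item-sets that happen to be infrequent in S can still be missed, which is the accuracy being traded away.

```python
import random
from collections import Counter
from itertools import combinations

def count_itemsets(transactions, max_len=3):
    """Brute-force support counts for item-sets of up to max_len items."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, min(max_len, len(items)) + 1):
            counts.update(combinations(items, k))
    return counts

def mine_with_sampling(transactions, min_sup_ratio, sample_size,
                       slack=0.8, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)

    # Mine the sample S with a lowered threshold (slack < 1).
    lowered = slack * min_sup_ratio * len(sample)
    candidates = {s for s, n in count_itemsets(sample).items() if n >= lowered}

    # One pass over the full data D to verify the candidates.
    threshold = min_sup_ratio * len(transactions)
    full = Counter()
    for t in transactions:
        for c in candidates:
            if t.issuperset(c):
                full[c] += 1
    return {c: n for c, n in full.items() if n >= threshold}

D = [{1, 2}, {1, 2, 3}, {1, 2}, {3, 4}, {1, 2, 4}, {2, 3}] * 5
found = mine_with_sampling(D, min_sup_ratio=0.5, sample_size=10)
```

Because of the verification pass, everything in found is truly frequent in D; the result may be incomplete but never contains false positives.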
Methods to improve the efficiency of the Apriori algorithm. Dynamic item-set counting (adding candidate item-sets at different points during a scan): A dynamic item-set counting technique was proposed in which the database is partitioned into blocks marked by start points, and new candidate item-sets can be added at any start point rather than only before a complete scan.
Pruning strategies in data mining. Item merging: If every transaction containing a frequent item-set X also contains an item-set Y but not any proper superset of Y, then X ∪ Y forms a frequent closed item-set and there is no need to search for any item-set containing X but not Y. Sub-item-set pruning: If a frequent item-set X is a proper subset of an already found frequent closed item-set Y and support_count(X) = support_count(Y), then X and all of X's descendants in the set enumeration tree cannot be frequent closed item-sets and thus can be pruned.
Pruning strategies in data mining. Item skipping: In the depth-first mining of closed item-sets, at each level, there will be a prefix item-set X associated with a header table and a projected database. If a local frequent item p has the same support in several header tables at different levels, we can safely prune p from the header tables at higher levels.
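The condition behind sub-item-set pruning can be checked on a small table of support counts. Note that this sketch is a post-hoc filter over item-sets that have already been found, not the search-time pruning described above; the function name closed_itemsets and the support table are hypothetical.

```python
def closed_itemsets(support):
    """Keep item-sets that have no proper superset with the same support count."""
    return {
        x: n for x, n in support.items()
        if not any(x < y and n == m for y, m in support.items())
    }

# Hypothetical support counts of frequent item-sets.
support = {
    frozenset({1}): 3,
    frozenset({2}): 4,
    frozenset({1, 2}): 3,
}
closed = closed_itemsets(support)
```

Here {1} is not closed because its proper superset {1, 2} has the same support count of 3, so only {2} and {1, 2} remain.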
What is Constraint-Based Association Mining? The constraints can include the following: Knowledge type constraints: These specify the type of knowledge to be mined, such as association or correlation. Data constraints: These specify the set of task-relevant data. Dimension/level constraints: These specify the desired dimensions (or attributes) of the data, or levels of the concept hierarchies, to be used in mining. Interestingness constraints: These specify thresholds on statistical measures of rule interestingness, such as support, confidence, and correlation. Rule constraints: These specify the form of rules to be mined.
Metarule-Guided Mining of Association Rules: Metarules allow users to specify the syntactic form of rules that they are interested in mining. The rule forms can be used as constraints to help improve the efficiency of the mining process.
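As an illustration, a metarule such as P1(X, Y) ∧ P2(X, W) ⇒ buys(X, "office software") can be treated as a template that candidate rules either match or do not. The rule representation (tuples of predicate and value pairs), the function name matches_metarule, and the sample rules below are assumptions made for this sketch.

```python
def matches_metarule(rule, n_antecedents, consequent):
    """Check a rule against the template P1 ∧ ... ∧ Pn => consequent."""
    antecedent, rule_consequent = rule
    predicates = [name for name, _ in antecedent]
    return (
        len(antecedent) == n_antecedents
        and len(set(predicates)) == n_antecedents  # distinct predicates
        and rule_consequent == consequent
    )

target = ("buys", "office software")
candidate_rules = [
    ((("age", "30..39"), ("income", "41K..60K")), ("buys", "office software")),
    ((("age", "30..39"),), ("buys", "office software")),
    ((("age", "30..39"), ("income", "41K..60K")), ("buys", "printer")),
]
kept = [r for r in candidate_rules if matches_metarule(r, 2, target)]
```

Only the first rule has two antecedent predicates and the required consequent, so it is the only one kept. A mining system would apply the template during the search rather than afterward, so that non-matching rules are never generated.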
Constraint Pushing, or Mining Guided by Rule Constraints: Rule constraints specify expected set/subset relationships of the variables in the mined rules, constant initiation of variables, and aggregate functions.
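A sketch of pushing an aggregate constraint into the search, assuming the constraint is that the total price of an item-set must stay at or below a cap. With non-negative prices, any superset of a violating item-set also violates the cap, so such candidates can be dropped before they are counted. The function name mine_with_price_cap, the prices, and the transactions are hypothetical.

```python
def mine_with_price_cap(transactions, prices, min_sup, max_total):
    """Level-wise mining that discards candidates over the price cap early."""
    def within_cap(itemset):
        return sum(prices[i] for i in itemset) <= max_total

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if within_cap([i])]
    result = {}
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(frequent)
        k = len(level[0]) + 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Constraint check happens before the next counting pass.
        level = [c for c in joined if within_cap(c)]
    return result

prices = {"pen": 2, "ink": 5, "laptop": 900}
sales = [{"pen", "ink"}, {"pen", "ink", "laptop"},
         {"pen", "laptop"}, {"ink", "laptop"}]
cheap_sets = mine_with_price_cap(sales, prices, min_sup=2, max_total=100)
```

The laptop is excluded before any counting takes place, so no item-set containing it is ever generated.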