Frequent data sets algos

To: Sir Altaf Hussain
Topic
Analysis of Frequent Itemset
Mining on Variant Datasets
Summary By:
ISHTIAQ HUSSAIN BANGASH(15-S-06)
And
FARHAN AKRAM(15-S-27)
Class: BSIT-VI

Contents
• Introduction
• Association rule mining
• Frequent itemset mining and Algorithms for data model
• Algorithms:
• Apriori
• FP-Growth
• H-mine
• P-Hmine
• Conclusion

Introduction
• The paper describes the mushroom dataset, which consists of
hypothetical samples corresponding to different species of
mushrooms.
• The dataset consists of 8124 instances with 119 attributes, which are
derived from 24 species.
• Different frequent itemset mining algorithms are then evaluated on
this dataset.

Association rule mining
• The process of discovering
relationships among the data
items in a large database.
• It is one of the most important
problems in data mining.
• Finding frequent itemsets is one
of the most computationally
expensive tasks in association
rule mining.

Frequent itemset mining representations
The following are the methods of
representing databases:
1. Horizontal representation
2. Vertical representation
3. Bit-vector representation
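The three representations can be illustrated on a toy database. This is a minimal sketch: the transactions, the item names and the helper `support_vertical` are hypothetical and are not taken from the paper.

```python
# Hypothetical toy database: three transactions over items a, b, c, d.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
]
items = sorted(set().union(*transactions))

# 1. Horizontal: each transaction id maps to the items it contains.
horizontal = {tid: sorted(t) for tid, t in enumerate(transactions)}

# 2. Vertical: each item maps to the set of transaction ids containing it.
vertical = {
    item: {tid for tid, t in enumerate(transactions) if item in t}
    for item in items
}

# 3. Bit-vector: each item maps to one bit per transaction (1 = present).
bitvectors = {
    item: [1 if item in t else 0 for t in transactions]
    for item in items
}


def support_vertical(itemset):
    """Support count of an itemset, computed by intersecting tid sets."""
    return len(set.intersection(*(vertical[i] for i in itemset)))
```

In the vertical layout the support of an itemset is the size of an intersection of tid sets, so no rescan of the whole database is needed for each count.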

Algorithms:
• Apriori
• FP-Growth
• H-mine
• P-Hmine

Apriori
• In the preprocessing step of the Apriori algorithm, the database is
scanned to find the support count of each item; all items whose
support is below the minimum support are then removed from the database.
• Apriori follows a two-step method to find frequent itemsets, that
is:
• Join step
• Prune step
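The preprocessing scan and the join and prune steps can be sketched as below. This is a minimal illustration, not the implementation evaluated in the paper: the function name `apriori` and the parameter `min_support` (an absolute count) are assumptions, and infrequent items are simply ignored rather than physically deleted from the database.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Preprocessing scan: count the support of every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # One database scan to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Each level costs one full scan of the database, which is the overhead the pattern-growth algorithms on the following slides try to avoid.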

FP-Growth
• FP-Growth is known as one of the fastest algorithms for frequent
itemset mining.
• It uses a compact data structure called an FP-tree.
• The FP-growth approach first represents the frequent items in the
form of a frequent pattern tree (FP-tree), which is a compressed
structure.
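How the FP-tree compresses the database can be sketched as follows. This covers only tree construction, not the recursive mining of conditional FP-trees; the names `FPNode`, `build_fp_tree` and `count_nodes` are hypothetical, and ties in support are broken by item name as an assumption.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item plus the count of the shared prefix."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First scan: support count of each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Second scan: insert each transaction with its frequent items sorted
    # by descending support, so common prefixes share nodes.
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())
```

The more prefixes the transactions share, the fewer nodes the tree needs compared with the number of item occurrences in the database.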

H-mine
• H-mine is another pattern-growth method for frequent pattern
mining. On sparse data, H-mine performs better than FP-growth.
• H-mine uses a divide-and-conquer strategy to mine all the frequent
patterns.
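The divide-and-conquer idea can be sketched with plain projected databases. This is a simplification and an assumption on our part: it copies the projected transactions instead of building the H-struct with its hyperlinks, so it shows only how the search space is partitioned, not H-mine's memory layout. The name `pattern_growth` is hypothetical.

```python
from collections import Counter


def pattern_growth(transactions, min_support, prefix=()):
    """Yield (pattern, support) for every frequent pattern extending `prefix`."""
    counts = Counter(item for t in transactions for item in set(t))
    for item in sorted(counts):
        support = counts[item]
        if support < min_support:
            continue
        pattern = prefix + (item,)
        yield pattern, support
        # Sub-problem for `item`: only the transactions that contain it,
        # restricted to items that come later in the fixed item order.
        projected = [
            [i for i in t if i > item]
            for t in transactions
            if item in t
        ]
        yield from pattern_growth(projected, min_support, pattern)
```

Each frequent item defines an independent sub-problem, so no candidate itemsets are generated and tested as in Apriori.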

P-Hmine
• The general idea of P-Hmine is to represent the database in
the form of a new structure called P-Hstruct, which is similar to
H-struct.
• In the P-Hstruct, the database is represented as a set of queues.

Experimental Analysis and Results
• We analyze the running time of the algorithms on both
synthetic and real data; the synthetic dataset generator is taken
from the IBM Almaden website.

Datasets
• The mushroom dataset is a description of hypothetical samples
corresponding to different species of mushrooms.
• The dataset consists of 8124 instances with 119 attributes, which are
derived from 24 species.
• The chess dataset is also a dense dataset; it consists of 3196
instances and 74 items.

Conclusion
• The paper considers H-mine for uncertain data. Finally, we
have analyzed the performance of frequent pattern mining
algorithms on a few benchmark metrics.
• In the case of the binary dense data model, FP-growth performs better
than the other algorithms, because a dense dataset results in a very
compact FP-tree, which requires a smaller amount of data to be stored.

Conclusion (continued)
• In the case of sparse datasets, H-mine performs better than
FP-growth. The reason is that the FP-tree is bigger and FP-growth
spends a lot of time building and traversing the conditional FP-trees.
• H-mine and P-Hmine save a lot of scans of the database and
achieve better performance than Apriori on all tested datasets.
• P-Hmine is also scalable for both a large number of data items
and a large number of transactions.
