MINING FREQUENT ITEMSETS USING HIGH-SPEED
ALGORITHMS AND FP-TREES
A PROJECT REPORT
Submitted by
ANTONY JAYASEELAN.G
MUTHU KUMARAN.D
PRAVVEEN.G
RAKESH.R
in partial fulfillment for the award of the degree
of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
SAVEETHA ENGINEERING COLLEGE,
CHENNAI – 602 105
ANNA UNIVERSITY : CHENNAI 600 025
April 2009
ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report “MINING FREQUENT ITEMSETS USING
HIGH-SPEED ALGORITHMS AND FP-TREES” is the
bonafide work of “ANTONY JAYASEELAN.G (21605104003), MUTHU
KUMARAN.D (21605104027), PRAVVEEN.G (21605104035), RAKESH.R
(21605104039)”, who carried out the project work under my supervision.
SIGNATURE
Dr.P.Palaniswamy, M.Tech (IIT-M), Ph.D (IISc)
HEAD OF THE DEPARTMENT
Computer Science & Engineering
Saveetha Engineering College,
Saveetha Nagar,
Thandalam,
Chennai – 602 105

SIGNATURE
Mr.Mohana Prakash.T.A, B.E
SUPERVISOR
LECTURER
Computer Science & Engineering
Saveetha Engineering College,
Saveetha Nagar,
Thandalam,
Chennai – 602 105
INTERNAL EXAMINER EXTERNAL EXAMINER
ACKNOWLEDGMENT
We express our deepest gratitude to our President Dr.N.M.Veeraiyan, for
his invaluable guidance and blessings.
We are very grateful to our Principal Dr.Venkatasamy.R for providing us
with an environment to complete our project successfully.
We would like to thank Prof.R.Dheenadayalu, B.E, M.Sc (Engg.), Dean
(ICT), for his unwavering support during the entire course of this project work.
We are deeply indebted to our Head of the Department
Dr.P.Palaniswamy, M.Tech (IIT Madras), Ph.D (IISc), who molded us both
technically and morally for achieving greater success in life.
We express our sincere thanks to Senior Lecturer Mr.Saravanan.R, for his
constant encouragement and support throughout our course, especially for the
useful suggestions given during the course of the project period.
We are very grateful to our internal guide Mr.Mohana Prakash.T.A,
Lecturer, for being instrumental in the completion of our project with his complete
guidance.
We would also like to thank our Project Coordinator Mr.Sridharan.K for his
support during the entire course of this project work.
We also thank all the staff members of our college and technicians for their
help in making this project a successful one.
Finally, we take this opportunity to extend our deep appreciation to our
family and friends, for all that they meant to us during the crucial times of the
completion of our project.
ABSTRACT
Efficient algorithms for mining frequent itemsets are crucial for mining
association rules as well as for many other data mining tasks. Methods for mining
frequent itemsets have been implemented using a prefix-tree structure, known as an
FP-tree, for storing compressed information about frequent itemsets. Numerous
experimental results have demonstrated that these algorithms perform extremely
well.
In this report, we present a novel FP-array technique that greatly reduces the
need to traverse FP-trees, thus obtaining significantly improved performance for
FP-tree-based algorithms. Our technique works especially well for sparse data sets.
Furthermore, we present new algorithms for mining all, maximal, and closed
frequent itemsets. Our algorithms use the FP-tree data structure in combination with
the FP-array technique efficiently and incorporate various optimization techniques.
Even though the algorithms consume much memory when the data sets are sparse,
they are still the fastest ones when the minimum support is low. Moreover, they are
always among the fastest algorithms and consume less memory than other methods
when the data sets are dense.
These algorithms can be applied in domains such as banking, insurance, and
departmental stores. We implement them here specifically for a banking
application.
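
The prefix-tree idea behind the FP-tree can be illustrated with a minimal sketch. It is not the implementation developed in this project: it covers only the basic two-scan tree construction and omits the header table, the FP-array, and the mining step. The names FPNode, build_fp_tree, and count_nodes, and the sample banking transactions, are assumptions made for illustration.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a basic FP-tree; return (root, frequent item -> support count)."""
    # First scan: count the support of every single item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Second scan: insert each transaction's frequent items, ordered by
    # descending support (ties broken by item name), so that transactions
    # with a common prefix share nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of nodes below the given node (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical transactions: products held by five bank customers.
sample = [
    ["savings", "loan", "card"],
    ["savings", "card"],
    ["savings", "loan"],
    ["card", "insurance"],
    ["savings", "loan", "card"],
]
tree, supports = build_fp_tree(sample, min_support=2)
```

In this sample, "insurance" falls below the minimum support and is dropped, and the three transactions that begin with "card" share a single branch, which is the compression the FP-tree relies on.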
TABLE OF CONTENTS
CHAPTER NO. TITLE
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
2. LITERATURE REVIEW
2.1 EXISTING SYSTEM
2.2 PROPOSED SYSTEM
2.3 PROBLEM FORMULATION
3. SYSTEM REQUIREMENTS
3.2 PLATFORM
3.2.1 Software Requirements
3.2.2 Hardware Requirements
4. SYSTEM DESIGN
3.3 PROJECT DESCRIPTION
3.4 ALGORITHM
3.4.1 FP-growth
3.4.2 FP-max
3.4.3 CFI-tree & FP-close
5. IMPLEMENTATION
5.1 CODING
5.2 TESTING
APPENDICES
REFERENCES
LIST OF FIGURES
FIGURE NO. TITLE
2.a RELATION BETWEEN DIFFERENT ITEMSETS
2.3.a MODULE INTERFACE DIAGRAM
2.3.b DATA FLOW DIAGRAM
2.3.c CLASS DIAGRAM
2.3.d SEQUENCE DIAGRAM
2.3.d ER DIAGRAM
2.3.d FP GROWTH
LIST OF ABBREVIATIONS
FI Frequent Itemset
MFI Maximal Frequent Itemset
CFI Closed Frequent Itemset
FP Frequent Pattern
FP-MAX Frequent Pattern Maximum
FP-CLOSE Frequent Pattern Closed
J2EE Java 2 Platform, Enterprise Edition
AWT Abstract Window Toolkit
API Application Programming Interface
JDBC Java Database Connectivity
DSN Data Source Name