MINING FREQUENT ITEMSETS USING HIGH-SPEED
         ALGORITHMS AND FP-TREES

                      A PROJECT REPORT


                           Submitted by
                 ANTONY JAYASEELAN.G
                  MUTHU KUMARAN.D
                     PRAVVEEN.G
                      RAKESH.R

        in partial fulfillment for the award of the degree

                               of

           BACHELOR OF ENGINEERING
                               IN

       COMPUTER SCIENCE AND ENGINEERING


         SAVEETHA ENGINEERING COLLEGE,
                CHENNAI – 602 105




       ANNA UNIVERSITY : CHENNAI 600 025
                           April 2009
ANNA UNIVERSITY : CHENNAI 600 025

                          BONAFIDE CERTIFICATE


Certified that this project report “MINING FREQUENT ITEMSETS USING
HIGH-SPEED ALGORITHMS AND FP-TREES” is the bonafide work of
“ANTONY JAYASEELAN.G (21605104003), MUTHU KUMARAN.D (21605104027),
PRAVVEEN.G (21605104035), RAKESH.R (21605104039)”, who carried out the
project work under my supervision.




             SIGNATURE                                  SIGNATURE
Dr.P.Palaniswamy, M.Tech(IIT-M), Ph.D(IISc)     Mr.Mohana Prakash.T.A, B.E
HEAD OF THE DEPARTMENT                          SUPERVISOR
                                                LECTURER
Computer Science & Engineering                  Computer Science & Engineering
Saveetha Engineering College,                   Saveetha Engineering College,
Saveetha Nagar,                                 Saveetha Nagar,
Thandalam,                                      Thandalam,
Chennai – 602 105                               Chennai – 602 105
INTERNAL EXAMINER                                 EXTERNAL EXAMINER


                            ACKNOWLEDGMENT


       We express our deepest gratitude to our President Dr.N.M.Veeraiyan, for
his invaluable guidance and blessings.

      We are very grateful to our Principal Dr.Venkatasamy.R for providing us
with an environment to complete our project successfully.

      We would like to thank Prof.R.Dheenadayalu, B.E, M.Sc (Engg.), Dean
(ICT), for his unwavering support during the entire course of this project work.

      We are deeply indebted to our Head of the Department Dr.
P.Palaniswamy, M.Tech (IIT Madras), Ph.D (IISc), who shaped us both
technically and morally for achieving greater success in life.

       We express our sincere thanks to Senior Lecturer Mr.Saravanan.R, for his
constant encouragement and support throughout our course, especially for the
useful suggestions given during the course of the project period.

      We are very grateful to our internal guide Mr.Mohana Prakash.T.A,
Lecturer, for being instrumental in the completion of our project with his complete
guidance.

      We would also like to thank our Project Coordinator Mr.Sridharan.K for his
support during the entire course of this project work.

       We also thank all the staff members of our college and technicians for their
help in making this project a successful one.

      Finally, we take this opportunity to extend our deep appreciation to our
family and friends, for all that they meant to us during the crucial times of the
completion of our project.
ABSTRACT

        Efficient algorithms for mining frequent itemsets are crucial for mining
association rules as well as for many other data mining tasks. Methods for mining
frequent itemsets have been implemented using a prefix-tree structure, known as an
FP-tree, for storing compressed information about frequent itemsets. Numerous
experimental results have demonstrated that these algorithms perform extremely
well.
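
The prefix-tree idea mentioned above can be illustrated with a small sketch. It is written in Python purely for illustration and is not taken from this project's implementation; the names `FPNode`, `build_fp_tree`, and `count_nodes`, as well as the sample transactions, are assumptions made for this example. It shows how transactions that share a prefix of frequent items share nodes in the tree, which is where the compression comes from.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and child links."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree; return (root, header), header maps item -> its nodes."""
    # Pass 1: count item occurrences and keep only the frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode()
    header = {item: [] for item in frequent}

    # Pass 2: insert each transaction with its frequent items sorted by
    # descending support (ties broken alphabetically), so that common
    # prefixes end up sharing nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def count_nodes(node):
    """Number of item nodes below (and excluding) the given node."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical sample data: item "d" is infrequent and is dropped.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
root, header = build_fp_tree(transactions, min_support=2)
print(count_nodes(root))  # 12 frequent-item occurrences are stored in 6 nodes
```

The `header` table links all nodes that carry the same item, so the support of an item can be recovered by summing the counts of its nodes without rescanning the transactions.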

        In this paper, we present a novel FP-array technique that greatly reduces the
need to traverse FP-trees, thus obtaining significantly improved performance for
FP-tree-based algorithms. Our technique works especially well for sparse data sets.
Furthermore, we present new algorithms for mining all, maximal, and closed
frequent itemsets. Our algorithms use the FP-tree data structure in combination with
the FP-array technique efficiently and incorporate various optimization techniques.
Even though the algorithms consume a large amount of memory when the data sets
are sparse, they are still the fastest ones when the minimum support is low.
Moreover, they are always among the fastest algorithms and consume less memory
than other methods when the data sets are dense.
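
The three kinds of result named above (all, maximal, and closed frequent itemsets) can be made concrete with a short brute-force sketch. It is written in Python for illustration only, enumerates every candidate itemset, and is therefore not the FP-tree-based method described in this report. The function names and the sample transactions are assumptions made for this example. A frequent itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent at all.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (brute force)."""
    rows = [frozenset(t) for t in transactions]
    items = sorted(set().union(*rows))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for row in rows if candidate <= row)
            if support >= min_support:
                result[candidate] = support
    return result


def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {s: n for s, n in frequent.items()
            if not any(s < t and n == m for t, m in frequent.items())}


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s: n for s, n in frequent.items()
            if not any(s < t for t in frequent)}


# Hypothetical sample data with a minimum support of 2 transactions.
transactions = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
    ["b", "c"],
]
fi = frequent_itemsets(transactions, min_support=2)
cfi = closed_itemsets(fi)
mfi = maximal_itemsets(fi)
print(len(fi), len(cfi), len(mfi))  # 7 5 1
```

On this data every maximal itemset is also closed, and every closed itemset is frequent, so the maximal and closed sets are progressively smaller summaries of the full result.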

        This approach can be applied in domains such as banking, insurance, and
departmental stores. In this project, we implement it specifically for a banking
application.
TABLE OF CONTENTS



CHAPTER.NO        TITLE                         PAGE NO
             ABSTRACT                               i
             LIST OF FIGURES                        iii
             LIST OF ABBREVIATIONS                  iv


    1.       INTRODUCTION                           1
    2.       LITERATURE REVIEW                      3
             2.1 EXISTING SYSTEM                    6
             2.2 PROPOSED SYSTEM                    13
             2.3 PROBLEM FORMULATION
    3.       SYSTEM REQUIREMENTS                    15
             3.2 PLATFORM                           17
                  3.2.1 Software Requirements       17
                  3.2.2 Hardware Requirements       19
    4.       SYSTEM DESIGN                          22
             3.3 PROJECT DESCRIPTION                26
             3.4 ALGORITHM                          32
                  3.4.1 FP-growth                   32
                  3.4.2 FP-max                      34
                  3.4.3 CFI-tree & FP-close         36
    5.       IMPLEMENTATION                         39
             5.1 CODING                             39
             5.2 TESTING                            42


             APPENDICES                             52
             REFERENCES                             64




LIST OF FIGURES


FIGURE NO.   TITLE                                 PAGE NO.

2.a          RELATION BETWEEN DIFFERENT ITEMSETS   20
2.3.a        MODULE INTERFACE DIAGRAM              24
2.3.b        DATA FLOW DIAGRAM                     26
2.3.c        CLASS DIAGRAM                         36
2.3.d        SEQUENCE DIAGRAM                      37
2.3.e        ER DIAGRAM                            38
2.3.f        FP GROWTH                             39






                     LIST OF ABBREVIATIONS
FI         Frequent Itemset
MFI        Maximal Frequent Itemset
CFI        Closed Frequent Itemset
FP         Frequent Pattern
FP-MAX     Frequent Pattern Maximum
FP-CLOSE   Frequent Pattern Closed
J2EE       Java 2 Platform, Enterprise Edition
AWT        Abstract Window Toolkit
API        Application Programming Interface
JDBC       Java Database Connectivity
DSN        Data Source Name



