A Combined Approach for Clustering
  based on the GSA-KM and Genetic
              Algorithms


Divakar Raj.M (0901016)                       Under the guidance of
Dilip.M (0901015)                             Mr.P.Perumal
Kishore Kumar.C (0901036)                     Associate Professor
IV CSE - A                                    Department of Computer Science and
                                              Engineering (UG)


                            Data Mining / Clustering                      1/33
Introduction to Data Mining

• Data mining (knowledge discovery in databases):
   – Extraction of interesting (non-trivial, implicit, previously unknown and
       potentially useful) information or patterns from data in large databases


• Potential Applications
   –   Market analysis and management
   –   Risk analysis and management
   –   Fraud detection and management
   –   Text mining (newsgroups, email, documents) and Web analysis
   –   Intelligent query answering


Data Mining: A KDD Process


 – Data mining: the core of the knowledge discovery process.

 [Figure: the KDD process. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Architecture of a Typical Data Mining System

[Figure: layered architecture, top to bottom: Graphical user interface; Pattern evaluation; Data mining engine; Database or data warehouse server; Data cleaning, data integration, and filtering; Databases and Data Warehouse. A Knowledge-base sits to the side of these layers.]
Data Mining Functionalities
• Concept description: Characterization and discrimination
   – Generalize, summarize, and contrast data characteristics, e.g., dry
     vs. wet regions


• Association (correlation and causality)
   – Multi-dimensional vs. single-dimensional association
   – age(X, "20..29") ^ income(X, "20..29K") → buys(X, "PC")
   – contains(T, "computer") → contains(T, "software")
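
A rule such as contains(T, "computer") → contains(T, "software") is usually judged by how often it holds in the data. The slide does not name any measures, so the sketch below assumes the two standard ones, support and confidence, and computes them over a hypothetical list of transactions (the item names and data are illustrative only).

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

# Rule: contains(T, "computer") -> contains(T, "software")
rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
```

Here two of the four transactions contain both items, and two of the three transactions with "computer" also contain "software".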




Data Mining Functionalities
• Classification and Prediction
   – Finding models (functions) that describe and distinguish classes or
     concepts for future prediction
   – E.g., classify countries based on climate, or classify cars based on gas
     mileage
   – Presentation: decision-tree, classification rule, neural network
   – Prediction: Predict some unknown or missing numerical values
• Cluster analysis
   – Class label is unknown: Group data to form new classes, e.g., cluster
     houses to find distribution patterns
   – Clustering based on the principle: maximizing the intra-class similarity
     and minimizing the interclass similarity

Data Mining Functionalities
• Outlier analysis
   – Outlier: a data object that does not comply with the general behavior
     of the data
   – It can be considered as noise or exception but is quite useful in fraud
     detection, rare events analysis

• Trend and evolution analysis
   – Trend and deviation: regression analysis
   – Sequential pattern mining, periodicity analysis
   – Similarity-based analysis



Issues in Data mining
•   Individual Privacy
•   Data Integrity
•   Relational vs. multidimensional database structure
•   Issue of Cost
•   Mining methodology and user interaction issues
•   Performance issues
•   Issues relating to the diversity of database types




Applications
• Database analysis and decision support

   – Market analysis and management
       • Target Marketing, Customer Relation Management, Market
         Basket Analysis, Cross Selling, Market Segmentation


   – Risk analysis and management
       • Forecasting, Customer Retention, Improved Underwriting,
         Quality Control, Competitive Analysis




Applications

• Text mining (newsgroups, email, documents) and Web analysis
• Intelligent query answering
• Sports
• Astronomy
• Internet Web Surf-Aid




Clustering

• Clustering is a data mining (machine learning)
  technique used to place data elements into related
  groups without advance knowledge of the group
  definitions

• The result is a set of meaningful subclasses called clusters




Cluster Analysis
• Cluster: a collection of data objects
   – Similar to one another within the same cluster
   – Dissimilar to the objects in other clusters

• Cluster analysis
   – Grouping a set of data objects into clusters

• Clustering is unsupervised classification: no predefined classes

• Typical applications
   – As a stand-alone tool to get insight into data distribution
   – As a preprocessing step for other algorithms

What Is Good Clustering?
• A good clustering method will produce high quality clusters with
    – high intra-class similarity
    – low inter-class similarity


• The quality of a clustering result depends on both the similarity
  measure used by the method and its implementation.


• The quality of a clustering method is also measured by its ability to
  discover some or all of the hidden patterns
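
One way to make "high intra-class similarity, low inter-class similarity" concrete is to compare average distances within and between clusters. The sketch below is only an illustration: it assumes Euclidean distance as the (dis)similarity measure and uses hypothetical helper names and toy points that do not come from the slides. It also assumes every cluster has at least two points.

```python
import math
from itertools import combinations

def mean_intra_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    dists = [math.dist(a, b) for c in clusters for a, b in combinations(c, 2)]
    return sum(dists) / len(dists)

def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    dists = [
        math.dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    ]
    return sum(dists) / len(dists)

# Two tight, well-separated toy clusters (illustrative data).
clusters = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
```

A good clustering under this measure has a small `intra` value relative to `inter`.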


Requirements of Clustering in Data Mining

 • Scalability
 • Ability to deal with different types of attributes
 • Discovery of clusters with arbitrary shape
 • Minimal requirements for domain knowledge to determine input
   parameters
 • Ability to deal with noise and outliers
 • Insensitivity to the order of input records
 • Ability to handle high dimensionality
 • Incorporation of user-specified constraints
 • Interpretability and Usability

Major Clustering Approaches
• Partitioning algorithms: Construct various partitions and then
  evaluate them by some criterion
• Hierarchical algorithms: Create a hierarchical decomposition of the
  set of data (or objects) using some criterion
• Density-based: based on connectivity and density functions
• Grid-based: based on a multiple-level granularity structure
• Model-based: A model is hypothesized for each of the clusters
  and the idea is to find the best fit of the data to that model


Issues of Clustering
• Assessment of results

• Choice of appropriate number of clusters

• Data preparation

• Proximity measures

• Handling outliers


General Applications of Clustering

• Pattern Recognition

• Image Processing

• Economic Science (especially market research)

• WWW
   – Document classification
   – Cluster Weblog data to discover groups of similar access patterns



Examples of Clustering Applications
• Marketing: Help marketers discover distinct groups in their
  customer bases, and then use this knowledge to develop targeted
  marketing programs
• Land use: Identification of areas of similar land use in an earth
  observation database
• Insurance: Identifying groups of motor insurance policy holders
  with a high average claim cost
• City-planning: Identifying groups of houses according to their
  house type, value, and geographical location
• Earthquake studies: Observed earthquake epicenters should be
  clustered along continental faults
Literature Survey
[1] An Architecture for Component-Based Design of Representative-
    Based Clustering Algorithms
    Boris Delibašić, Milan Vukićević, Miloš Jovanović, Kathrin Kirchner,
    Johannes Ruhland, Milija Suknović (2012)

[2] The Research of Imbalanced Data Set of Sample Sampling Method
    Based on K-Means Cluster and Genetic Algorithm
    Yang Yong (2012)

[3] A Combined Approach for Clustering based on K-means and
    Gravitational Search Algorithms
    Abdolreza Hatamlou, Salwani Abdullah, Hossein Nezamabadi-pour (2012)
An Architecture for Component-Based Design of
      Representative-Based Clustering Algorithms


• Based on reusable components

• Components derived from K-Means like algorithms and their extensions

• The new algorithm is built by exchanging components from the original
  algorithm and their improvements

• Makes it possible to compare and evaluate representative-based
  clustering algorithms




The Research of Imbalanced Data Set of Sample
                    Sampling Method
       Based on K-Means Cluster and Genetic Algorithm


• K-Means clusters the data; within each cluster, GA validates the
  samples and generates new ones

• Improves classification performance on imbalanced datasets

• Generates new samples for the minority class of the imbalanced data set

• Focuses on the classification accuracy of minority classes




A Combined Approach for Clustering based on K-means and
          Gravitational Search Algorithms
• A hybrid data clustering algorithm based on GSA and k-means
  (GSA-KM) is presented
• It uses the advantages of both algorithms
• Comparison of the performance of GSA-KM with other well-known
  algorithms
   –   K-means
   –   Genetic Algorithm(GA)
   –   Simulated Annealing(SA)
   –   Ant Colony Optimization(ACO)
   –   Honey Bee Mating Optimization(HBMO)
   –   Particle Swarm Optimization(PSO)
   –   Gravitational Search Algorithm(GSA)
• Comparison based on real and standard datasets from the UCI
  repository
Existing System
K-Means
• One of the most efficient and famous clustering algorithms
• Starts with some random or heuristic-based centroids for the desired clusters
• Assigns every data object to the closest centroid
• Iteratively refines the current centroids to reach the (near) optimal ones by
  calculating the mean value of data objects within their respective clusters
• The algorithm will terminate when any one of the specified termination
  criteria is met (i.e., a predetermined maximum number of iterations is
  reached, a (near) optimal solution is found or the maximum search time is
  reached)
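
The steps listed above can be sketched in a few lines. This is a minimal illustration rather than the project's implementation: it assumes Euclidean distance, random initial centroids drawn from the data, and two of the termination criteria (an iteration cap and no further change in the centroids). The function name, parameters, and toy points are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):  # termination: iteration cap
        # Assignment step: each point goes to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # termination: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the two groups are far enough apart that the split does not depend on which points are drawn as initial centroids; on real data the result can vary with the seed.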



Existing System
Gravitational Search Algorithm

• Inspired by the physical phenomenon of Gravity
• Based on the interaction of masses in the universe via Newtonian
  gravity law
• Attraction depends on the size of the masses and the distance
  between them

• $F = G \dfrac{M_1 M_2}{R^2}$
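
As a quick numeric check of the formula above, the sketch below evaluates the force for two masses at two distances. The function name is hypothetical, and the gravitational constant is passed in as a parameter (set to 1.0 in the example purely to keep the arithmetic readable).

```python
def gravitational_force(m1, m2, r, g=6.674e-11):
    """Magnitude of Newtonian attraction between two point masses."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return g * m1 * m2 / r ** 2

# Doubling the distance divides the force by four.
near = gravitational_force(2.0, 3.0, 1.0, g=1.0)
far = gravitational_force(2.0, 3.0, 2.0, g=1.0)
```

Heavier masses attract more strongly and the attraction falls off with distance, which is the behaviour GSA borrows when candidate solutions pull on one another.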



Drawbacks of Existing System
K-Means

• Performance is highly dependent on the initial state of
  centroids

• May converge to a local optimum rather than the global optimum

• The number of clusters is needed as input to the algorithm, i.e.
  the number of clusters is assumed known


GSA-KM
• Built on three main steps

   1. GSA-KM applies the k-means algorithm to the selected dataset
      and tries to produce near-optimal centroids for the desired
      clusters
   2. The approach then produces an initial population
      of candidate solutions
   3. GSA is applied to that population




GSA-KM
Ways for production of an initial population

• One of the candidate solutions will be produced by the output of the
  k-means algorithm, which has been achieved in the previous step

• Three of them will be created based on the dataset itself and other
  solutions will be produced randomly

• GSA will be employed for determining an optimal solution for the
  clustering problem
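
The way the initial population is assembled can be sketched as follows. The slides do not say how the three dataset-based candidates are derived, so this sketch ASSUMES each is simply k points sampled from the dataset; the random candidates are assumed to be drawn uniformly inside the data's bounding box. The function name and parameters are hypothetical, and pop_size is assumed to be at least 4.

```python
import random

def initial_population(data, kmeans_centroids, pop_size, seed=0):
    """Build pop_size candidate solutions, each a list of k centroids."""
    rng = random.Random(seed)
    k = len(kmeans_centroids)
    dims = range(len(data[0]))
    lo = [min(p[d] for p in data) for d in dims]
    hi = [max(p[d] for p in data) for d in dims]

    # Candidate 1: the k-means output from the previous step.
    population = [list(kmeans_centroids)]
    # Candidates 2-4: ASSUMPTION, each is k points sampled from the dataset.
    for _ in range(3):
        population.append(rng.sample(data, k))
    # Remaining candidates: random centroids inside the data's bounding box.
    while len(population) < pop_size:
        population.append(
            [tuple(rng.uniform(lo[d], hi[d]) for d in dims) for _ in range(k)]
        )
    return population

data = [(0.0, 0.0), (1.0, 2.0), (4.0, 4.0), (5.0, 3.0)]
population = initial_population(data, [(0.5, 1.0), (4.5, 3.5)], pop_size=6)
```

GSA would then treat each candidate as an agent and search from this population, with the k-means candidate giving it a good starting point.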



Reasons for Efficiency
• Needs fewer iterations and function evaluations than the
  original GSA alone to find a near-global optimum

• With a good candidate solution already in the initial
  population, GSA can search for near-global optima in a
  promising region of the search space and, therefore, find a
  higher-quality solution than the original GSA alone




Proposed System

• On top of the existing GSA-KM, we intend to apply a
  Genetic Algorithm to further increase the efficiency and speed
  of the clustering

• The proposed system is expected to combine the advantages of
  these methods and to be faster and more efficient than both
  the traditional clustering algorithms and GSA-KM alone




Implementation Details

• Programming language: C#
• Database: MS Access

• The given repository is clustered using K-Means and GSA,
  together called GSA-KM, and a Genetic Algorithm is used to
  enhance the performance
• The performance is measured and compared with other
  clustering algorithms



References
[1]   C.L. Blake, C.J. Merz
      UCI repository of machine learning databases
      http://www.ics.uci.edu/-learn/MLRepository.html

[2]   S. Das, A. Abraham, A. Konar
      Meta heuristic pattern clustering —an overview
      Studies in Computational Intelligence (2009)

[3]   L. Kaufman, P.J. Rousseeuw
      Finding Groups in Data: An Introduction to Cluster Analysis
      John Wiley & Sons, New York, (1990)

[4]   M.B. Adil
      Modified global-means algorithm for minimum sum-of- squares clustering problems
      Pattern Recognition 41 (10) (2008)

[5]   E. Rashedi, H. Nezamabadi-pour, S. Saryazdi
      GSA: a gravitational search algorithm
      Information Sciences 179 (13) (2009)


                                    Data Mining / Clustering                            31/33
References
[6]    A. Likas, N. Vlassis, J.J. Verbeek
       The global k -means clustering algorithm
       Pattern Recognition 36 (2) (2003)

[7]    M. Mahdavi
       Novel meta-heuristic algorithms for clustering web documents
       Applied Mathematics and Computation (2008)

[8]    M. Moshtaghi
       Clustering ellipses for anomaly detection
       Pattern Recognition 44 (2008)

[9]    B. Saglam, et al.,
       A mixed-integer programming approach to the clustering problem with an application in customer
       segmentation
       European Journal of Operational Research 173 (3) (2006)

[10]   A.K. Jain
       Data clustering: 50 years beyond K –means
       Pattern Recognition Letters 31 (8) (2010)

                                     Data Mining / Clustering                                 32/33
Thank You !!!




  Data Mining / Clustering   33/33

More Related Content

What's hot

Flow oriented modeling
Flow oriented modelingFlow oriented modeling
Flow oriented modeling
ramyaaswin
 
Dynamic Itemset Counting
Dynamic Itemset CountingDynamic Itemset Counting
Dynamic Itemset Counting
Tarat Diloksawatdikul
 
Project presentation template
Project presentation templateProject presentation template
Project presentation template
Abhishek Bhardwaj
 
Random scan displays and raster scan displays
Random scan displays and raster scan displaysRandom scan displays and raster scan displays
Random scan displays and raster scan displays
Somya Bagai
 
OOAD - UML - Sequence and Communication Diagrams - Lab
OOAD - UML - Sequence and Communication Diagrams - LabOOAD - UML - Sequence and Communication Diagrams - Lab
OOAD - UML - Sequence and Communication Diagrams - Lab
Victer Paul
 
System Models in Software Engineering SE7
System Models in Software Engineering SE7System Models in Software Engineering SE7
System Models in Software Engineering SE7
koolkampus
 
sutherland- Hodgeman Polygon clipping
sutherland- Hodgeman Polygon clippingsutherland- Hodgeman Polygon clipping
sutherland- Hodgeman Polygon clipping
Arvind Kumar
 
Elaboration and domain model
Elaboration and domain modelElaboration and domain model
Elaboration and domain model
Vignesh Saravanan
 
Object Oriented Approach for Software Development
Object Oriented Approach for Software DevelopmentObject Oriented Approach for Software Development
Object Oriented Approach for Software Development
Rishabh Soni
 
Video display devices
Video display devicesVideo display devices
Video display devices
shalinikarunakaran1
 
An Approach For Predicting Road Accident Severity
An Approach For Predicting Road Accident SeverityAn Approach For Predicting Road Accident Severity
An Approach For Predicting Road Accident Severity
BilalSikander3
 
Character generation techniques
Character generation techniquesCharacter generation techniques
Character generation techniques
Mani Kanth
 
state modeling In UML
state modeling In UMLstate modeling In UML
state modeling In UML
Kumar
 
Unit 1( modelling concepts & class modeling)
Unit  1( modelling concepts & class modeling)Unit  1( modelling concepts & class modeling)
Unit 1( modelling concepts & class modeling)
Manoj Reddy
 
Depth Buffer Method
Depth Buffer MethodDepth Buffer Method
Depth Buffer Method
Ummiya Mohammedi
 
Parallel projection
Parallel projectionParallel projection
Parallel projection
Prince Shahu
 
Processes and threads
Processes and threadsProcesses and threads
Domain class model
Domain class modelDomain class model
Domain class model
shekharsj
 
Dynamic and Static Modeling
Dynamic and Static ModelingDynamic and Static Modeling
Dynamic and Static Modeling
Saurabh Kumar
 
Raster Scan display
Raster Scan displayRaster Scan display
Raster Scan display
Lokesh Singrol
 

What's hot (20)

Flow oriented modeling
Flow oriented modelingFlow oriented modeling
Flow oriented modeling
 
Dynamic Itemset Counting
Dynamic Itemset CountingDynamic Itemset Counting
Dynamic Itemset Counting
 
Project presentation template
Project presentation templateProject presentation template
Project presentation template
 
Random scan displays and raster scan displays
Random scan displays and raster scan displaysRandom scan displays and raster scan displays
Random scan displays and raster scan displays
 
OOAD - UML - Sequence and Communication Diagrams - Lab
OOAD - UML - Sequence and Communication Diagrams - LabOOAD - UML - Sequence and Communication Diagrams - Lab
OOAD - UML - Sequence and Communication Diagrams - Lab
 
System Models in Software Engineering SE7
System Models in Software Engineering SE7System Models in Software Engineering SE7
System Models in Software Engineering SE7
 
sutherland- Hodgeman Polygon clipping
sutherland- Hodgeman Polygon clippingsutherland- Hodgeman Polygon clipping
sutherland- Hodgeman Polygon clipping
 
Elaboration and domain model
Elaboration and domain modelElaboration and domain model
Elaboration and domain model
 
Object Oriented Approach for Software Development
Object Oriented Approach for Software DevelopmentObject Oriented Approach for Software Development
Object Oriented Approach for Software Development
 
Video display devices
Video display devicesVideo display devices
Video display devices
 
An Approach For Predicting Road Accident Severity
An Approach For Predicting Road Accident SeverityAn Approach For Predicting Road Accident Severity
An Approach For Predicting Road Accident Severity
 
Character generation techniques
Character generation techniquesCharacter generation techniques
Character generation techniques
 
state modeling In UML
state modeling In UMLstate modeling In UML
state modeling In UML
 
Unit 1( modelling concepts & class modeling)
Unit  1( modelling concepts & class modeling)Unit  1( modelling concepts & class modeling)
Unit 1( modelling concepts & class modeling)
 
Depth Buffer Method
Depth Buffer MethodDepth Buffer Method
Depth Buffer Method
 
Parallel projection
Parallel projectionParallel projection
Parallel projection
 
Processes and threads
Processes and threadsProcesses and threads
Processes and threads
 
Domain class model
Domain class modelDomain class model
Domain class model
 
Dynamic and Static Modeling
Dynamic and Static ModelingDynamic and Static Modeling
Dynamic and Static Modeling
 
Raster Scan display
Raster Scan displayRaster Scan display
Raster Scan display
 

Viewers also liked

First Review(Ppt)
First Review(Ppt)First Review(Ppt)
First Review(Ppt)
smjagadish
 
Sample PowerPoint for Project Review
Sample PowerPoint for Project ReviewSample PowerPoint for Project Review
Sample PowerPoint for Project Review
MissKarchin
 
Final Year Project Presentation
Final Year Project PresentationFinal Year Project Presentation
Final Year Project Presentation
Syed Absar
 
Presentation on project report
Presentation on project reportPresentation on project report
Presentation on project report
ramesh_x
 
Zeroth review presentation(3)
Zeroth review presentation(3)Zeroth review presentation(3)
Zeroth review presentation(3)
yash119
 
Final Year Project
Final Year ProjectFinal Year Project
Final Year Project
Muhammad Khan
 
First Review for B.Tech Mechanical VIII Sem
First Review for B.Tech Mechanical VIII SemFirst Review for B.Tech Mechanical VIII Sem
First Review for B.Tech Mechanical VIII Sem
VIT University
 
Final ppt of project
Final ppt of projectFinal ppt of project
Final ppt of project
Ruchi Gulati
 
Mini project ppt
Mini project pptMini project ppt
Mini project ppt
Manendra Shukla
 
Final Year Project Presentation
Final Year Project PresentationFinal Year Project Presentation
Final Year Project Presentation
LauraConroy
 
General Guidelines - First Review for B.Tech Mechanical VIII Sem
General Guidelines - First Review for B.Tech Mechanical VIII SemGeneral Guidelines - First Review for B.Tech Mechanical VIII Sem
General Guidelines - First Review for B.Tech Mechanical VIII Sem
VIT University
 
Canvas Based Presentation - Zeroth Review
Canvas Based Presentation - Zeroth ReviewCanvas Based Presentation - Zeroth Review
Canvas Based Presentation - Zeroth Review
Arvind Krishnaa
 
Project explation ppt
Project explation pptProject explation ppt
Project explation ppt
Pawan Kumar Shrivas
 
Final Year Project Presentation
Final Year Project PresentationFinal Year Project Presentation
Final Year Project Presentation
Ghulam Mustafa Vira
 
zeroth review
zeroth review zeroth review
zeroth review
dinesh raj
 
The second review
The second reviewThe second review
The second review
Joccy
 
History Of Coimbatore Seminar
History Of Coimbatore SeminarHistory Of Coimbatore Seminar
History Of Coimbatore Seminar
Divakar Raj M
 
Hybrid wireless network -0th review
Hybrid wireless network -0th review Hybrid wireless network -0th review
Hybrid wireless network -0th review
AAKASH S
 
Chemical Project Engineer
Chemical Project EngineerChemical Project Engineer
Chemical Project Engineer
laurenMadosky
 
Simulation and Design of SRF based Control Algorithm for Three Phase Shunt Ac...
Simulation and Design of SRF based Control Algorithm for Three Phase Shunt Ac...Simulation and Design of SRF based Control Algorithm for Three Phase Shunt Ac...
Simulation and Design of SRF based Control Algorithm for Three Phase Shunt Ac...
idescitation
 

Viewers also liked (20)

First Review(Ppt)
First Review(Ppt)First Review(Ppt)
First Review(Ppt)
 
Sample PowerPoint for Project Review
Sample PowerPoint for Project ReviewSample PowerPoint for Project Review
Sample PowerPoint for Project Review
 
Final Year Project Presentation
Final Year Project PresentationFinal Year Project Presentation
Final Year Project Presentation
 
Presentation on project report
Presentation on project reportPresentation on project report
Presentation on project report
 
Zeroth review presentation(3)
Zeroth review presentation(3)Zeroth review presentation(3)
Zeroth review presentation(3)
 
Final Year Project
Final Year ProjectFinal Year Project
Final Year Project
 
First Review for B.Tech Mechanical VIII Sem
First Review for B.Tech Mechanical VIII SemFirst Review for B.Tech Mechanical VIII Sem
First Review for B.Tech Mechanical VIII Sem
 
Final ppt of project
Final ppt of projectFinal ppt of project
Final ppt of project
 
Mini project ppt
Mini project pptMini project ppt
Mini project ppt
 
Final Year Project Presentation
Final Year Project PresentationFinal Year Project Presentation
Final Year Project Presentation
 
General Guidelines - First Review for B.Tech Mechanical VIII Sem
General Guidelines - First Review for B.Tech Mechanical VIII SemGeneral Guidelines - First Review for B.Tech Mechanical VIII Sem
General Guidelines - First Review for B.Tech Mechanical VIII Sem
 
Canvas Based Presentation - Zeroth Review
Canvas Based Presentation - Zeroth ReviewCanvas Based Presentation - Zeroth Review
Canvas Based Presentation - Zeroth Review
 
Project explation ppt
Project explation pptProject explation ppt
Project explation ppt
 
Final Year Project Presentation
Final Year Project PresentationFinal Year Project Presentation
Final Year Project Presentation
 
zeroth review
zeroth review zeroth review
zeroth review
 
The second review
The second reviewThe second review
The second review
 
History Of Coimbatore Seminar
History Of Coimbatore SeminarHistory Of Coimbatore Seminar
History Of Coimbatore Seminar
 
Hybrid wireless network -0th review
Hybrid wireless network -0th review Hybrid wireless network -0th review
Hybrid wireless network -0th review
 
Chemical Project Engineer
Chemical Project EngineerChemical Project Engineer
Chemical Project Engineer
 
Simulation and Design of SRF based Control Algorithm for Three Phase Shunt Ac...
Simulation and Design of SRF based Control Algorithm for Three Phase Shunt Ac...Simulation and Design of SRF based Control Algorithm for Three Phase Shunt Ac...
Simulation and Design of SRF based Control Algorithm for Three Phase Shunt Ac...
 

Similar to Project 0th Review

2 introductory slides
2 introductory slides2 introductory slides
2 introductory slides
tafosepsdfasg
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
DataminingTools Inc
 
Data Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysisData Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysis
Datamining Tools
 
Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
thamizh arasi
 
Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CS
Thanveen
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
bintis1
 
Data mining an introduction
Data mining an introductionData mining an introduction
Data mining an introduction
Dr-Dipali Meher
 
Data Mining- Unit-I PPT (1).ppt
Data Mining- Unit-I PPT (1).pptData Mining- Unit-I PPT (1).ppt
Data Mining- Unit-I PPT (1).ppt
AravindReddy565690
 
Data Mining Application and Trends
Data Mining Application and TrendsData Mining Application and Trends
Data Mining Application and Trends
VijayasankariS
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
Sulman Ahmed
 
Data mining
Data miningData mining
Data mining
pradeepa n
 
Data mininng trends
Data mininng trendsData mininng trends
Data mininng trends
VijayasankariS
 
Data mining
Data miningData mining
Data mining
Annies Minu
 
Dwdmunit1 a
Dwdmunit1 aDwdmunit1 a
Dwdmunit1 a
bhagathk
 
Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
Dhilsath Fathima
 
DM_clustering.ppt
DM_clustering.pptDM_clustering.ppt
DM_clustering.ppt
nandhini manoharan
 
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.pptChapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
Subrata Kumer Paul
 
G045033841
G045033841G045033841
G045033841
IJERA Editor
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
Cognizant Technology Solutions
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
Amr Abd El Latief
 

Similar to Project 0th Review (20)

2 introductory slides
2 introductory slides2 introductory slides
2 introductory slides
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
 
Data Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysisData Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysis
 
Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
 
Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CS
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
 
Data mining an introduction
Data mining an introductionData mining an introduction
Data mining an introduction
 
Data Mining- Unit-I PPT (1).ppt
Data Mining- Unit-I PPT (1).pptData Mining- Unit-I PPT (1).ppt
Data Mining- Unit-I PPT (1).ppt
 
Data Mining Application and Trends
Data Mining Application and TrendsData Mining Application and Trends
Data Mining Application and Trends
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
 
Data mining
Data miningData mining
Data mining
 
Data mininng trends
Data mininng trendsData mininng trends
Data mininng trends
 
Data mining
Data miningData mining
Data mining
 
Dwdmunit1 a
Dwdmunit1 aDwdmunit1 a
Dwdmunit1 a
 
Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
 
DM_clustering.ppt
DM_clustering.pptDM_clustering.ppt
DM_clustering.ppt
 
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.pptChapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
 
G045033841
G045033841G045033841
G045033841
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 


Project 0th Review

  • 1. A Combined Approach for Clustering based on the GSA-KM and Genetic Algorithms
      • Divakar Raj.M (0901016), Dilip.M (0901015), Kishore Kumar.C (0901036), IV CSE - A
      • Under the guidance of Mr.P.Perumal, Associate Professor, Department of Computer Science and Engineering (UG)
      • Data Mining / Clustering 1/33
  • 2. Introduction about Data Mining
      • Data mining (knowledge discovery in databases):
          – Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases
      • Potential Applications
          – Market analysis and management
          – Risk analysis and management
          – Fraud detection and management
          – Text mining (news group, email, documents) and Web analysis
          – Intelligent query answering
  • 3. Data Mining: A KDD Process
      • Data mining: the core of the knowledge discovery process.
      • (Figure: KDD process, from bottom to top: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation)
  • 4. Architecture of a Typical Data Mining System
      • Layers, from top to bottom:
          – Graphical user interface
          – Pattern evaluation
          – Data mining engine
          – Database or data warehouse server
          – Data cleaning & data integration, filtering
          – Databases and data warehouse
      • Knowledge-base (shown beside the layer stack)
  • 5. Data Mining Functionalities
      • Concept description: Characterization and discrimination
          – Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
      • Association (correlation and causality)
          – Multi-dimensional vs. single-dimensional association
          – age(X, "20..29") ^ income(X, "20..29K") ⇒ buys(X, "PC")
          – contains(T, "computer") ⇒ contains(T, "software")
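An association rule such as contains(T, "computer") ⇒ contains(T, "software") is usually judged by its support and confidence. The slide does not give these values, so the following is only a minimal Python sketch of how they could be computed; the function name and the basket data are hypothetical.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions that contain both sides of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical market-basket data (not taken from the slides).
baskets = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]
print(support_confidence(baskets, {"computer"}, {"software"}))
```

Support is the fraction of all transactions that satisfy the whole rule; confidence is the fraction of antecedent-matching transactions that also match the consequent.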
  • 6. Data Mining Functionalities
      • Classification and Prediction
          – Finding models (functions) that describe and distinguish classes or concepts for future prediction
          – E.g., classify countries based on climate, or classify cars based on gas mileage
          – Presentation: decision-tree, classification rule, neural network
          – Prediction: Predict some unknown or missing numerical values
      • Cluster analysis
          – Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns
          – Clustering based on the principle: maximizing the intra-class similarity and minimizing the inter-class similarity
  • 7. Data Mining Functionalities
      • Outlier analysis
          – Outlier: a data object that does not comply with the general behavior of the data
          – It can be considered as noise or exception but is quite useful in fraud detection and rare-event analysis
      • Trend and evolution analysis
          – Trend and deviation: regression analysis
          – Sequential pattern mining, periodicity analysis
          – Similarity-based analysis
  • 8. Issues in Data Mining
      • Individual privacy
      • Data integrity
      • Relational database structure vs. multidimensional structure
      • Issue of cost
      • Mining methodology and user interaction issues
      • Performance issues
      • Issues relating to the diversity of database types
  • 9. Applications
      • Database analysis and decision support
          – Market analysis and management: Target Marketing, Customer Relation Management, Market Basket Analysis, Cross Selling, Market Segmentation
          – Risk analysis and management: Forecasting, Customer Retention, Improved Underwriting, Quality Control, Competitive Analysis
  • 10. Applications
      • Text mining (news group, email, documents) and Web analysis
      • Intelligent query answering
      • Sports
      • Astronomy
      • Internet Web Surf-Aid
  • 11. Clustering
      • Clustering is a data mining (machine learning) technique used to place data elements into related groups without advance knowledge of the group definitions
      • The result is a set of meaningful subclasses called clusters
  • 12. Cluster Analysis
      • Cluster: a collection of data objects
          – Similar to one another within the same cluster
          – Dissimilar to the objects in other clusters
      • Cluster analysis
          – Grouping a set of data objects into clusters
      • Clustering is unsupervised classification: no predefined classes
      • Typical applications
          – As a stand-alone tool to get insight into data distribution
          – As a preprocessing step for other algorithms
  • 13. What Is Good Clustering?
      • A good clustering method will produce high quality clusters with
          – high intra-class similarity
          – low inter-class similarity
      • The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
      • The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
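One common way to quantify these two criteria is to measure how close points lie to their own centroid and how far apart the centroids are. The slides do not name a specific measure, so the Python sketch below assumes Euclidean distance; the function names and the sample data are illustrative only.

```python
import math


def intra_cluster_distance(points, labels, centroids):
    """Total distance from each point to its own centroid (lower = more compact)."""
    return sum(math.dist(p, centroids[label]) for p, label in zip(points, labels))


def min_centroid_separation(centroids):
    """Smallest distance between any two centroids (higher = better separated)."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(centroids)
        for b in centroids[i + 1:]
    )


# Hypothetical 2-D data with two obvious groups.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
centroids = [(0, 1), (10, 1)]
print(intra_cluster_distance(points, labels, centroids))
print(min_centroid_separation(centroids))
```

High intra-class similarity corresponds to a small first value, and low inter-class similarity to a large second value.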
  • 14. Requirements of Clustering in Data Mining
      • Scalability
      • Ability to deal with different types of attributes
      • Discovery of clusters with arbitrary shape
      • Minimal requirements for domain knowledge to determine input parameters
      • Able to deal with noise and outliers
      • Insensitive to order of input records
      • High dimensionality
      • Incorporation of user-specified constraints
      • Interpretability and usability
  • 15. Major Clustering Approaches
      • Partitioning algorithms: Construct various partitions and then evaluate them by some criterion
      • Hierarchical algorithms: Create a hierarchical decomposition of the set of data (or objects) using some criterion
      • Density-based: based on connectivity and density functions
      • Grid-based: based on a multiple-level granularity structure
      • Model-based: A model is hypothesized for each of the clusters and the idea is to find the best fit of the data to that model
  • 16. Issues of Clustering
      • Assessment of results
      • Choice of appropriate number of clusters
      • Data preparation
      • Proximity measures
      • Handling outliers
  • 17. General Applications of Clustering
      • Pattern Recognition
      • Image Processing
      • Economic Science (especially market research)
      • WWW
          – Document classification
          – Cluster Weblog data to discover groups of similar access patterns
  • 18. Examples of Clustering Applications
      • Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
      • Land use: Identification of areas of similar land use in an earth observation database
      • Insurance: Identifying groups of motor insurance policy holders with a high average claim cost
      • City-planning: Identifying groups of houses according to their house type, value, and geographical location
      • Earthquake studies: Observed earthquake epicenters should be clustered along continental faults
  • 19. Literature Survey
      • [1] An Architecture for Component-Based Design of Representative-Based Clustering Algorithms. Boris Delibašić, Milan Vukićević, Miloš Jovanović, Kathrin Kirchner, Johannes Ruhland, Milija Suknović (2012)
      • [2] The Research of Imbalanced Data Set of Sample Sampling Method Based on K-Means Cluster and Genetic Algorithm. Yang Yong (2012)
      • [3] A Combined Approach for Clustering based on K-means and Gravitational Search Algorithms. Abdolreza Hatamlou, Salwani Abdullah, Hossein Nezamabadi-pour (2012)
  • 20. An Architecture for Component-Based Design of Representative-Based Clustering Algorithms
      • Based on reusable components
      • Components derived from K-Means-like algorithms and their extensions
      • A new algorithm is built by exchanging components of the original algorithm with their improvements
      • Comparison and evaluation are possible using representative-based clustering algorithms
  • 21. The Research of Imbalanced Data Set of Sample Sampling Method Based on K-Means Cluster and Genetic Algorithm
      • K-Means is used to cluster the data; within each cluster, GA is used to validate and generate new samples
      • Enhances the classification performance on imbalanced data sets
      • Generates samples for the minority class of the imbalanced data set
      • Focuses on the classification accuracy of minority classes
  • 22. A Combined Approach for Clustering based on K-means and Gravitational Search Algorithms
      • A hybrid data clustering algorithm based on GSA and k-means (GSA-KM) is presented
      • It uses the advantages of both algorithms
      • Comparison of the performance of GSA-KM with other well-known algorithms
          – K-means
          – Genetic Algorithm (GA)
          – Simulated Annealing (SA)
          – Ant Colony Optimization (ACO)
          – Honey Bee Mating Optimization (HBMO)
          – Particle Swarm Optimization (PSO)
          – Gravitational Search Algorithm (GSA)
      • Comparison based on real and standard datasets from the UCI repository
  • 23. Existing System: K-Means
      • One of the most efficient and famous clustering algorithms
      • Starts with some random or heuristic-based centroids for the desired clusters
      • Assigns every data object to the closest centroid
      • Iteratively refines the current centroids to reach the (near) optimal ones by calculating the mean value of data objects within their respective clusters
      • The algorithm terminates when any one of the specified termination criteria is met (i.e., a predetermined maximum number of iterations is reached, a (near) optimal solution is found or the maximum search time is reached)
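The K-Means steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the project's C# implementation: it assumes Euclidean distance, picks the initial centroids at random from the data, and stops either after `max_iter` iterations or when the assignments no longer change (the time-limit criterion is omitted). The function and parameter names are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct data objects chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every object goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical data: two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```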
  • 24. Existing System: Gravitational Search Algorithm
      • Inspired by the physical phenomenon of gravity
      • Based on the interaction of masses in the universe via the Newtonian law of gravity
      • Attraction depends on the magnitude of the masses and the distance between them
      • $F = G \, \dfrac{M_1 M_2}{R^2}$, where $G$ is the gravitational constant, $M_1$ and $M_2$ are the two masses and $R$ is the distance between them
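As a small numeric illustration, the Python sketch below evaluates Newton's law exactly as written on the slide. It does not show how GSA adapts the law to search agents; the function name and the default value of `g` (the physical constant in SI units) are choices made for this example.

```python
def gravitational_force(m1, m2, r, g=6.674e-11):
    """Magnitude of the attraction between two masses: F = G * M1 * M2 / R^2."""
    if r <= 0:
        raise ValueError("distance r must be positive")
    return g * m1 * m2 / r ** 2


# With g = 1 the effect of mass and distance is easy to read off:
print(gravitational_force(2, 3, 2, g=1))   # heavier masses attract more strongly
print(gravitational_force(2, 3, 4, g=1))   # doubling the distance quarters the force
```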
  • 25. Drawbacks of Existing System: K-Means
      • Performance is highly dependent on the initial state of the centroids
      • May converge to a local optimum rather than the global optimum
      • The number of clusters is needed as input to the algorithm, i.e. the number of clusters is assumed to be known
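The first two drawbacks can be reproduced on a tiny example. The Python sketch below runs the K-Means refinement loop from two different, hand-picked starting centroid sets on the same four points and reports the sum of squared errors (SSE) each run ends with. The data, the starting centroids and the function name `lloyd` are hypothetical and chosen only to make the effect visible.

```python
import math


def lloyd(points, centroids, max_iter=100):
    """Refine the given centroids with K-Means; return (centroids, SSE)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        updated = [
            tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[j]
            for j, m in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    sse = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, sse


# Four corners of a 4 x 1 rectangle, k = 2.
corners = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, good = lloyd(corners, [(0, 0.5), (4, 0.5)])  # splits left / right
_, bad = lloyd(corners, [(2, 0), (2, 1)])       # stuck splitting bottom / top
print(good, bad)
```

Both runs converge, but the second one stays in a clearly worse partition, which is the kind of local optimum the slide refers to.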
  • 26. GSA-KM
      • Built on three main steps
          1. GSA-KM applies the k-means algorithm on the selected dataset and tries to produce near optimal centroids for the desired clusters
          2. The proposed approach produces an initial population of solutions
          3. Application of the GSA algorithm
  • 27. GSA-KM: Ways for production of an initial population
      • One of the candidate solutions will be produced by the output of the k-means algorithm, which has been achieved in the previous step
      • Three of them will be created based on the dataset itself and other solutions will be produced randomly
      • GSA will be employed for determining an optimal solution for the clustering problem
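The composition of the initial population described above could look roughly like the Python sketch below. The slide does not say how the three dataset-based candidates are derived, so this sketch simply assumes each one is a random selection of k data objects; the random candidates are drawn uniformly inside the bounding box of the data. All names are hypothetical.

```python
import random


def initial_population(points, kmeans_centroids, size, seed=0):
    """Build `size` candidate solutions, each a list of k centroids."""
    if size < 4:
        raise ValueError("size must be at least 4 (1 k-means + 3 dataset-based)")
    rng = random.Random(seed)
    k = len(kmeans_centroids)
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]

    # 1 candidate: the centroids produced by k-means in the previous step.
    population = [[tuple(c) for c in kmeans_centroids]]
    # 3 candidates based on the dataset (assumption: k sampled data objects).
    for _ in range(3):
        population.append([tuple(p) for p in rng.sample(points, k)])
    # Remaining candidates: random centroids inside the data's bounding box.
    while len(population) < size:
        population.append([
            tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            for _ in range(k)
        ])
    return population


data = [(0, 0), (0, 1), (4, 0), (4, 1)]
pop = initial_population(data, [(0, 0.5), (4, 0.5)], size=10)
print(len(pop), pop[0])
```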
  • 28. Reasons for Efficiency
      • Decreases the number of iterations and function evaluations needed to find a near global optimum compared to the original GSA alone
      • With a good candidate solution in the initial population, GSA can search for near global optima in a promising search space and, therefore, find a high quality solution in comparison with the original GSA alone
  • 29. Proposed System
      • Along with the given GSA-KM, we intend to implement a Genetic Algorithm to further increase the efficiency and speed of the clustering
      • The proposed system will combine the advantages of both and will be faster and more efficient than both the traditional clustering algorithms and GSA-KM
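The slides do not specify how the Genetic Algorithm is coupled with GSA-KM. Purely as an illustration of what GA operators on clustering solutions can look like, the Python sketch below treats each candidate as a list of centroids, uses the sum of squared errors as fitness, and applies one-point crossover, Gaussian mutation and elitist selection. Every name, operator choice and parameter value here is an assumption, not the authors' design.

```python
import math
import random


def fitness(centroids, points):
    """Sum of squared distances to the nearest centroid (lower is better)."""
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)


def crossover(parent_a, parent_b, rng):
    """One-point crossover on the list of centroids."""
    cut = rng.randint(1, len(parent_a) - 1) if len(parent_a) > 1 else 0
    return parent_a[:cut] + parent_b[cut:]


def mutate(centroids, rng, rate=0.1, sigma=0.5):
    """Perturb each coordinate with probability `rate` by Gaussian noise."""
    return [
        tuple(x + rng.gauss(0, sigma) if rng.random() < rate else x for x in c)
        for c in centroids
    ]


def evolve(population, points, generations=50, seed=0):
    """Elitist GA: the better half survives, the rest is replaced by children."""
    rng = random.Random(seed)
    population = list(population)
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, points))
        survivors = population[: max(2, len(population) // 2)]
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return min(population, key=lambda s: fitness(s, points))


data = [(0, 0), (0, 1), (4, 0), (4, 1)]
start = [[(2, 0), (2, 1)], [(0, 0), (4, 1)], [(1, 1), (3, 0)], [(1, 0), (3, 1)]]
best = evolve(start, data)
print(best, fitness(best, data))
```

Because the survivors are carried over unchanged, the best fitness in the population can never get worse from one generation to the next.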
  • 30. Implementation Details
      • Programming language: C#
      • Database: MS-Access
      • The given repository is clustered using K-Means and GSA, together called GSA-KM, and a Genetic Algorithm is used to enhance the performance
      • The performance is calculated and compared with other clustering algorithms
  • 31. References
      • [1] C.L. Blake, C.J. Merz. UCI repository of machine learning databases. http://www.ics.uci.edu/-learn/MLRepository.html
      • [2] S. Das, A. Abraham, A. Konar. Meta heuristic pattern clustering: an overview. Studies in Computational Intelligence (2009)
      • [3] L. Kaufman, P.J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York (1990)
      • [4] M.B. Adil. Modified global-means algorithm for minimum sum-of-squares clustering problems. Pattern Recognition 41 (10) (2008)
      • [5] E. Rashedi, H. Nezamabadi-pour, S. Saryazdi. GSA: a gravitational search algorithm. Information Sciences 179 (13) (2009)
  • 32. References
      • [6] A. Likas, N. Vlassis, J.J. Verbeek. The global k-means clustering algorithm. Pattern Recognition 36 (2) (2003)
      • [7] M. Mahdavi. Novel meta-heuristic algorithms for clustering web documents. Applied Mathematics and Computation (2008)
      • [8] M. Moshtaghi. Clustering ellipses for anomaly detection. Pattern Recognition 44 (2008)
      • [9] B. Saglam, et al. A mixed-integer programming approach to the clustering problem with an application in customer segmentation. European Journal of Operational Research 173 (3) (2006)
      • [10] A.K. Jain. Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31 (8) (2010)
  • 33. Thank You!