Clustering
Model based techniques and Handling
high dimensional data
Model-Based Clustering Methods
 Attempt to optimize the fit between the data and some mathematical
model
 Assumption: Data are generated by a mixture of underlying probability
distributions
 Techniques
 Expectation-Maximization
 Conceptual Clustering
 Neural Networks Approach
Expectation Maximization
 Each cluster is represented mathematically by a
parametric probability distribution
 Component distribution
 Data is a mixture of these distributions
 Mixture density model
 Problem: To estimate parameters of probability distributions
Expectation Maximization
 Iterative Refinement Algorithm – used to find parameter
estimates
 Extension of k-means
 Assigns an object to a cluster according to a weight representing
probability of membership
 Initial estimate of parameters
 Iteratively reassigns scores
Expectation Maximization
 Initial guess for parameters; randomly select k objects to
represent cluster means or centers
 Iteratively refine parameters / clusters
 Expectation Step
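 Assign each object xi to cluster Ck with probability
P(xi ∈ Ck) = p(Ck | xi) = p(Ck) p(xi | Ck) / p(xi),
where p(xi | Ck) follows a normal (Gaussian) distribution around the mean of cluster Ck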
 Maximization Step
 Re-estimate model parameters
 Simple and easy to implement
 Complexity depends on features, objects and iterations
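The two steps can be sketched for the simplest case, a mixture of one-dimensional Gaussians. This is only an illustrative sketch, not the slides' own implementation: the function names, the unit starting variance, the variance floor, and the optional `init_means` argument (a convenience for reproducible runs) are all assumptions.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_1d(data, k, iterations=50, seed=0, init_means=None):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    rng = random.Random(seed)
    # Initial guess: k randomly selected objects act as cluster means,
    # unless the caller supplies explicit starting means (hypothetical option).
    means = list(init_means) if init_means is not None else rng.sample(data, k)
    variances = [1.0] * k          # assumed starting variance
    weights = [1.0 / k] * k        # mixing proportions p(Ck)
    resp = []
    for _ in range(iterations):
        # Expectation step: P(xi in Ck) = p(Ck) p(xi | Ck) / p(xi)
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint) or 1e-300   # guard against underflow to zero
            resp.append([p / total for p in joint])
        # Maximization step: re-estimate parameters from the soft assignments
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue                   # empty component: leave it unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a degenerate variance
            weights[j] = nj / len(data)
    return means, variances, weights, resp
```

Each pass costs on the order of (objects × clusters) density evaluations, which is where the dependence on features, objects and iterations comes from.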
Conceptual Clustering
 Conceptual clustering
 A form of clustering in machine learning
 Produces a classification scheme for a set of unlabeled objects
 Finds characteristic description for each concept (class)
 COBWEB
 A popular and simple method of incremental conceptual learning
 Creates a hierarchical clustering in the form of a classification tree
 Each node refers to a concept and contains a probabilistic description
of that concept
COBWEB Clustering Method
[Figure: a classification tree]
COBWEB
 Classification tree
 Each node – Concept and its probabilistic distribution
(Summary of objects under that node)
 Description – Conditional probabilities P(Ai=vij|Ck)
 Sibling nodes at given level form a partition
 Category Utility
 Increase in the expected number of attribute values that can
be correctly guessed given a partition
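Category utility is commonly written as CU = (1/n) Σk P(Ck) [ Σi Σj P(Ai=vij|Ck)² − Σi Σj P(Ai=vij)² ], where n is the number of sibling nodes in the partition. The sketch below computes it for nominal attributes; the function names and the tuple-per-object data layout are assumptions made for illustration.

```python
from collections import Counter

def sum_sq_probs(objects):
    """Sum over attributes i and values j of P(Ai = vij)^2 within `objects`."""
    n = len(objects)
    total = 0.0
    for i in range(len(objects[0])):
        counts = Counter(obj[i] for obj in objects)
        total += sum((c / n) ** 2 for c in counts.values())
    return total

def category_utility(partition):
    """CU of a partition given as a list of clusters (lists of attribute tuples)."""
    all_objects = [obj for cluster in partition for obj in cluster]
    baseline = sum_sq_probs(all_objects)       # guesses without any partition
    gain = 0.0
    for cluster in partition:
        p_ck = len(cluster) / len(all_objects)  # P(Ck)
        gain += p_ck * (sum_sq_probs(cluster) - baseline)
    return gain / len(partition)
```

For four objects, two ("red", "round") and two ("blue", "square"), grouping identical objects together gives CU = 0.5, while mixing one of each in both clusters gives CU = 0.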
COBWEB
 Category Utility rewards:
 Intra-class similarity P(Ai=vij|Ck)
 High value indicates many class members share this attribute-value
pair
 Inter-class dissimilarity P(Ck|Ai=vij)
 High value indicates few objects in other classes share this attribute-value pair
 Placement of new objects
 Descend tree
 Identify best host
 Temporarily place object in each node and compute CU of resulting
partition
 Placement with highest CU is chosen
 COBWEB may also form a new node if the object does not fit into the
existing tree
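The placement step can be sketched for a single level of the tree. This is a deliberate simplification under stated assumptions: it does not descend a real hierarchy and it omits merging and splitting. `place` is a hypothetical helper name, and the compact `category_utility` assumes nominal attributes stored as tuples.

```python
from collections import Counter

def category_utility(partition):
    """CU of a partition given as a list of clusters (lists of attribute tuples)."""
    objects = [o for cluster in partition for o in cluster]

    def sum_sq(objs):
        return sum((cnt / len(objs)) ** 2
                   for i in range(len(objs[0]))
                   for cnt in Counter(o[i] for o in objs).values())

    base = sum_sq(objects)
    return sum(len(c) / len(objects) * (sum_sq(c) - base)
               for c in partition) / len(partition)

def place(partition, obj):
    """Try `obj` in every existing node and in a new node; keep the highest CU."""
    candidates = []
    for idx in range(len(partition)):
        # Temporarily place the object in node idx.
        trial = [c + [obj] if i == idx else list(c)
                 for i, c in enumerate(partition)]
        candidates.append((category_utility(trial), idx, trial))
    # Alternative: the object forms a new node of its own.
    as_new = [list(c) for c in partition] + [[obj]]
    candidates.append((category_utility(as_new), len(partition), as_new))
    best_cu, best_idx, best_partition = max(candidates, key=lambda t: t[0])
    return best_idx, best_partition
```

With nodes [("red", "round")] and [("blue", "square")], a new ("red", "round") object scores CU = 4/9 in the first node, 1/9 in the second, and 8/27 as a new node, so the first node is the best host.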
COBWEB
 COBWEB is sensitive to order of records
 Additional operations
 Merging and Splitting
 Two best hosts are considered for merging
 Best host is considered for splitting
 Limitations
 The assumption that the attributes are independent of each
other is often too strong because correlation may exist
 Not suitable for clustering large databases
 CLASSIT - an extension of COBWEB for incremental clustering of
continuous data
Neural Network Approach
 Represent each cluster as an exemplar, acting as a “prototype” of
the cluster
 New objects are distributed to the cluster whose exemplar is the
most similar according to some distance measure
 Self Organizing Map
 Competitive learning
 Involves a hierarchical architecture of several units (neurons)
 Neurons compete in a “winner-takes-all” fashion for the object currently
being presented
 Organization of units – forms a feature map
 Web Document Clustering
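Competitive learning can be sketched with units arranged along a line. The code below is a minimal illustration, not a full SOM implementation: the linear decay of the learning rate, the half-rate update for neighbours, and all function names are assumptions.

```python
import random

def best_unit(units, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(units)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(units[u], x)))

def som_step(units, x, rate, radius=0):
    """Present one object: the winner and its neighbours move toward x."""
    winner = best_unit(units, x)
    lo, hi = max(0, winner - radius), min(len(units), winner + radius + 1)
    for u in range(lo, hi):
        # Neighbours learn at half the winner's rate (an arbitrary choice here).
        step = rate if u == winner else rate / 2
        units[u] = [w + step * (xi - w) for w, xi in zip(units[u], x)]
    return winner

def train_som(data, n_units, epochs=50, rate=0.5, radius=1, seed=0):
    """Train a 1-D map of n_units exemplars on a list of numeric vectors."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]  # start on data points
    for epoch in range(epochs):
        decayed = rate * (1 - epoch / epochs)  # assumed linear decay schedule
        for x in data:
            som_step(units, x, decayed, radius)
    return units
```

After training, `best_unit(units, x)` assigns a new object to the cluster whose exemplar is most similar to it.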
Kohonen SOM
Clustering High-Dimensional data
 As dimensionality increases
 number of irrelevant dimensions may produce noise and mask real clusters
 data becomes sparse
 Distance measures become meaningless
 Feature transformation methods
 PCA, SVD – Summarize data by creating linear combinations of attributes
 They do not remove any attributes, and the transformed attributes are hard to
interpret
 Feature Selection methods
 Most relevant set of attributes with respect to class labels
 Entropy Analysis
 Subspace Clustering – searches for groups of clusters within different
subspaces of the same data set
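The loss of meaning in distance measures can be illustrated with a small experiment on uniformly random points: as the number of dimensions grows, the nearest and farthest neighbours of a query point end up at almost the same distance. The function name and the choice of uniform data are assumptions made for this sketch.

```python
import math
import random

def relative_contrast(n_points, dim, seed=0):
    """(max distance - min distance) / min distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(query, p))))
    return (max(dists) - min(dists)) / min(dists)

if __name__ == "__main__":
    for d in (2, 10, 100, 500):
        print(d, round(relative_contrast(200, d), 3))
```

The printed contrast typically shrinks sharply as the dimensionality rises, which is the effect that motivates feature transformation, feature selection, and subspace clustering.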
CLIQUE: CLustering In QUest
 Dimension growth subspace clustering
 Starts at 1-D and grows upwards to higher dimensions
 Partitions each dimension – grids – determines whether
cell is dense
 CLIQUE
 Determines sparse and crowded units
 Dense unit – fraction of data points > threshold
 Cluster – maximal set of connected dense units
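The first pass, finding dense one-dimensional units, might look like the sketch below. It assumes every attribute is scaled to a common range [lo, hi] and cut into equal-width intervals; the function name and parameters are illustrative, not taken from the slides.

```python
from collections import Counter

def dense_units_1d(points, n_intervals, threshold, lo=0.0, hi=1.0):
    """Return the set of (dimension, cell) pairs whose point fraction > threshold."""
    n = len(points)
    width = (hi - lo) / n_intervals
    dense = set()
    for dim in range(len(points[0])):
        # Map each coordinate to its grid cell; the top edge falls in the last cell.
        counts = Counter(min(int((p[dim] - lo) / width), n_intervals - 1)
                         for p in points)
        for cell, count in counts.items():
            if count / n > threshold:
                dense.add((dim, cell))
    return dense
```

For the points (0.1, 0.1), (0.15, 0.9), (0.12, 0.5), (0.9, 0.3) with two intervals per dimension and a threshold of 0.5, only cell 0 of dimension 0 is dense: it holds three of the four points, while each cell of dimension 1 holds exactly half.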
CLIQUE
 First partitions d-dimensional space into non-overlapping units
 Performed in 1-D
 Based on Apriori property: If a k-dimensional unit is dense so are its
projections in (k-1) dimensional space
 Search space size is reduced
 Determines the maximal dense regions and generates a minimal
description for each cluster
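The Apriori property lets candidate k-dimensional units be generated only from dense (k-1)-dimensional units. A minimal sketch of that candidate generation follows; representing a unit as a frozenset of (dimension, cell) pairs is an assumption of this example.

```python
from itertools import combinations

def candidate_units(dense_prev):
    """Build candidate k-dim units from a set of dense (k-1)-dim units.

    Each unit is a frozenset of (dimension, cell) pairs.
    """
    candidates = set()
    for a, b in combinations(dense_prev, 2):
        union = a | b
        k = len(a) + 1
        if len(union) != k:
            continue                      # the two units must differ in one pair
        if len({dim for dim, _ in union}) != k:
            continue                      # a unit spans k distinct dimensions
        # Apriori pruning: every (k-1)-dim projection must itself be dense.
        if all(frozenset(sub) in dense_prev
               for sub in combinations(union, k - 1)):
            candidates.add(frozenset(union))
    return candidates
```

Only the surviving candidates need to be counted against the data, which is how the search space is reduced.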
CLIQUE
 Finds subspace of highest dimension
 Insensitive to order of inputs
 Performance depends on grid size and density threshold
 Difficult to determine across all dimensions
 Several lower dimensional subspaces will have to be
processed
 Can use adaptive strategy
PROCLUS – PROjected CLUStering
 Dimension-reduction Subspace Clustering technique
 Finds initial approximation of clusters in high dimensional
space
 Avoids generation of large number of overlapped
clusters of lower dimensionality
 Finds best set of medoids by hill-climbing process
(Similar to CLARANS)
 Manhattan Segmental distance measure
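The Manhattan segmental distance is the Manhattan distance restricted to a chosen set of dimensions and divided by the number of those dimensions, so that distances measured in subspaces of different sizes remain comparable. A short sketch, with an assumed function name:

```python
def manhattan_segmental(x, y, dims):
    """Average absolute difference between x and y over the dimensions in `dims`."""
    return sum(abs(x[i] - y[i]) for i in dims) / len(dims)
```

For x = (1, 5, 9) and y = (4, 5, 1), the distance over dimensions {0, 1} is (3 + 0) / 2 = 1.5.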
PROCLUS
 Initialization phase
 Greedy algorithm to select a set of initial medoids that are far
apart
 Iteration Phase
 Selects a random set of k medoids
 Replaces bad medoids
 For each medoid a set of dimensions is chosen whose average
distances are small
 Refinement Phase
 Computes new dimensions for each medoid based on clusters
found, reassigns points to medoids and removes outliers
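The greedy choice of well-separated medoids in the initialization phase can be sketched as a farthest-first traversal. This is a simplified illustration: it works on the whole point list rather than a random sample, and the function names and the `first` starting index are assumptions.

```python
def manhattan(x, y):
    """Full-dimensional Manhattan distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def greedy_medoids(points, k, dist=manhattan, first=0):
    """Pick k indices greedily so each new medoid is far from those already chosen."""
    medoids = [first]
    # Distance from every point to its nearest chosen medoid so far.
    nearest = [dist(p, points[first]) for p in points]
    while len(medoids) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        medoids.append(nxt)
        nearest = [min(nearest[i], dist(points[i], points[nxt]))
                   for i in range(len(points))]
    return medoids
```

With points (0, 0), (1, 0), (10, 0), (10, 10) and k = 3, starting from index 0, the traversal picks index 3 and then index 2.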
Frequent Pattern based Clustering
 Frequent patterns may also form clusters
 Instead of growing clusters dimension by dimension, sets
of frequent itemsets are determined
 Two common techniques
 Frequent term-based text Clustering
 Clustering by Pattern similarity
Frequent-term based text clustering
 Text documents are clustered based on frequent terms
they contain
 Documents – terms
 Dimensionality is very high
 Frequent term based analysis
 A well-selected subset of the set of all frequent term sets must be
discovered
 Fi – Set of frequent term sets
 Cov(Fi) – set of documents covered by Fi
 cov(F1) ∪ … ∪ cov(Fk) = D (every document is covered), and the overlap
between cov(Fi) and cov(Fj) must be minimized
 Description of clusters – their frequent term sets
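The coverage and overlap conditions can be checked with a few set operations. In this sketch each document is a set of terms keyed by an id; the function names and the simple pairwise overlap count are assumptions chosen for illustration.

```python
def cover(term_set, documents):
    """Ids of the documents that contain every term in `term_set`."""
    return {doc_id for doc_id, terms in documents.items() if term_set <= terms}

def describe_clustering(term_sets, documents):
    """Return (all documents covered?, total pairwise overlap between covers)."""
    covers = [cover(ts, documents) for ts in term_sets]
    covered = set().union(*covers)
    overlap = sum(len(a & b)
                  for i, a in enumerate(covers)
                  for b in covers[i + 1:])
    return covered == set(documents), overlap
```

A candidate selection of frequent term sets is preferable when the first value is true and the second is as small as possible.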
Clustering by Pattern Similarity
 pCluster on micro-array data analysis
 DNA micro-array analysis – expression levels of two
genes may rise and fall synchronously in response to
stimuli
 Two objects are similar if they exhibit a coherent pattern
on a subset of dimensions
pCluster
 Shift Pattern discovery
 Euclidean distance – not suitable
 Derive new attributes
 Bi-Clustering based on mean squared residue score
 pCluster
 Objects – x, y; attributes – a, b
 A pair (O,T) forms a δ-pCluster if for any 2 x 2 matrix X in (O, T)
pScore(X) <= δ
 Each pair of objects and their features must satisfy threshold
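For objects x, y and attributes a, b, the pScore of the 2 x 2 matrix is usually defined as |(dxa − dxb) − (dya − dyb)|, where dxa is the value of object x on attribute a. The sketch below checks the δ-pCluster condition by brute force; the function names and the list-of-rows matrix layout are assumptions.

```python
from itertools import combinations

def p_score(dxa, dxb, dya, dyb):
    """pScore of the 2 x 2 matrix [[dxa, dxb], [dya, dyb]]."""
    return abs((dxa - dxb) - (dya - dyb))

def is_delta_pcluster(matrix, objects, attributes, delta):
    """True if every 2 x 2 submatrix of (objects, attributes) has pScore <= delta."""
    return all(
        p_score(matrix[x][a], matrix[x][b], matrix[y][a], matrix[y][b]) <= delta
        for x, y in combinations(objects, 2)
        for a, b in combinations(attributes, 2)
    )
```

Rows [1, 5, 3] and [11, 15, 13] differ by a constant shift of 10, so they form a δ-pCluster even with δ = 0, although their Euclidean distance is large.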
pCluster
 Scaling patterns
 pCluster can be used in other applications
also