SlideShare a Scribd company logo
1 of 50
Clustering on DSS
By:
Nora Alharbi
Enaam Alotaibi
Outlines:
• Importance of Cluster Analysis for Data
Mining
• What is Cluster Analysis
• Purposes of Cluster Analysis
• Main Types of Clustering
• Common Types of Clustering
• Cluster Algorithms
• Applications of clustering
• Conclusion
• References
Importance of Cluster Analysis for
Data Mining
• Used for automatic identification of
natural groupings of things
• Part of the machine-learning family
• Employ unsupervised learning
• Learns the clusters of things from past
data, then assigns new instances
• There is not an output variable
• Also known as segmentation
What is a Cluster Analysis?
Notion of a Cluster can be Ambiguous
QUESTION ??
How many clusters?
What is Cluster Analysis?
• Finding groups of objects such that the
objects in a group will be similar (or related)
to one another and different from (or
unrelated to) the objects in other groups
Puprses of Cluster Analysis
Understanding
Group related documents for
browsing, group genes and
proteins that have similar
functionality, or group stocks
with similar price fluctuations
Summarization
Reduce the size of large data
sets
Clustering precipitation in Australia
What is not Cluster Analysis?
Supervised classification
Have class label information
Simple segmentation
Dividing students into different registration groups
alphabetically, by last name
Results of a query
Groupings are a result of an external specification
Graph partitioning
Some mutual relevance and synergy, but areas are not
identical
Common Types of Clusters
• Well-separated clusters
• Center-based clusters
• Contiguous clusters
• Density-based clusters
• Property or Conceptual
• Described by an Objective Function
Types of Clusters: Well-Separated
• Well-Separated Clusters:
– A cluster is a set of points such that any point in a cluster is closer
(or more similar) to every other point in the cluster than to any
point not in the cluster.
3 well-separated clusters
Types of Clusters: Center-Based
• Center-based (prototype-based)
– A cluster is a set of objects such that an object in a cluster is
closer (more similar) to the “center” of a cluster, than to the
center of any other cluster
– The center of a cluster is often a centroid, the average of all the
points in the cluster, or a medoid, the most “representative”
point of a cluster
4 center-based clusters
Types of Clusters: Contiguity-Based
• Contiguous Cluster (Graph-based)
– A cluster is a set of points such that a point in a cluster is closer (or
more similar) to one or more other points in the cluster than to any
point not in the cluster. Cluster can be defined as a connected
component i.e. a group of objects that are connected to one
another, but that have no connection to objects outside the group.
8 contiguous clusters
Types of Clusters : DENSITY-BASED
Density-based
A cluster is a dense region of points, which is separated by low-density
regions, from other regions of high density.
Used when the clusters are irregular or intertwined, and when noise
and outliers are present.
6 density-based clusters
Types of Clusters: Conceptual Clusters
• Shared Property or Conceptual Clusters
– Finds clusters that share some common property or represent a
particular concept.
.
2 Overlapping Circles
Types of Clusters: Objective Function
• Clusters Defined by an Objective Function
– Finds clusters that minimize or maximize an objective function.
– Enumerate all possible ways of dividing the points into clusters and
evaluate the `goodness' of each potential set of clusters by using the
given objective function.
– Can have global or local objectives.
• Hierarchical clustering algorithms typically have local objectives
• Partitional algorithms typically have global objectives
– A variation of the global objective function approach is to fit the data to a
parameterized model.
• Parameters for the model are determined from the data.
• Mixture models assume that the data is a ‘mixture' of a number of statistical
distributions.
CONT.
-Map the clustering problem to a different domain
and solve a related problem in that domain
-Proximity matrix defines a weighted graph, where
the nodes are the points being clustered, and the
weighted edges represent the proximities between
points
Clustering is equivalent to breaking the graph into
connected components, one for each cluster.
Want to minimize the edge weight between clusters
and maximize the edge weight within clusters
• A clustering is a set of clusters
• Important distinction between
hierarchical and partitional sets of
clusters
What is Clustering?
Main type of Cluster Method
•Hierarchical clustering methods: can be either agglomerative or divisive.
• An agglomerative method starts with each point as a separate cluster,
and successively performs merging until a stopping criterion is met.
• A divisive method begins with all points in a single cluster and
performs splitting until a stopping criterion is met.
• The result of a hierarchical clustering method is a tree of clusters called
a dendogram.
• A tree like diagram that records the sequences of merges or splits
1 3 2 5 4 6
0
0.05
0.1
0.15
0.2
1
2
3
4
5
6
1
2
3 4
5
Hierarchical clustering
• Agglomerative (Bottom up)
Hierarchical clustering
• Agglomerative (Bottom up)
• 1st iteration1
Hierarchical clustering
• Agglomerative (Bottom up)
• 2nd iteration1 2
Hierarchical clustering
• Agglomerative (Bottom up)
• 3rd iteration
1 2
3
Hierarchical clustering
• Agglomerative (Bottom up)
• 4th iteration
1 2
3
4
Hierarchical clustering
• Agglomerative (Bottom up)
• 5th iteration
1 2
3
4
5
Hierarchical clustering
• Agglomerative (Bottom up)
• Finally k clusters left
1 2
3
4
6 9
5
7
8
Hierarchical clustering
• Divisive (Top-down)
Slide credit: Min Zhang
Hierarchical clustering
An example of hierarchical clustering methods is BIRCH (Balanced
Iterative Reducing and Clustering using Hierarchies)
Advantage: typically find a good clustering with a single scan of the data,
and improve the quality further with a few additional scans.
Disadvantages: it can hold only a limited number of entries due to its size,
it not always correspond to what a user may consider a natural cluster.
Type of Cluster Method CONT.
•Partitional clustering methods:
• determine a partition of the points into clusters, such that the
points in a cluster are more similar to each other than to points in
different clusters.
•They start with some arbitrary initial clusters and iteratively until
criterion is met.
Examples of partitional clustering algorithms include k-means, PAM,
CLARA, CLARANS and EM.
Disadvantages:
• it assumes that the points to be clustered are all stored in main
memory.
• the run time of the algorithm is prohibitive on large datasets.
Partitional Clustering
Original Points A Partitional Clustering
Type of Cluster Method CONT.
•Density-based clustering methods :try to find clusters based on the density of
points in regions. Dense regions that are reachable from each other are merged to
form clusters.
Examples of density-based clustering methods include DBSCAN , DBRS and DBRS+
.
Advantages: It gives extremely good results and is efficient in many datasets.
Disadvantages:
• if a dataset has clusters of widely varying densities, it can't able to handle with
it efficiently.
• it does not consider non-spatial attributes in the dataset.
• it not suitable for finding approximate clusters in very large datasets.
Type of Cluster Method CONT.
Grid-based clustering methods quantize the clustering space into a finite number of
cells and then perform the required operations on the quantized space. Cells
containing more than a certain number of points are considered to be dense.
Contiguous dense cells are connected to form clusters.
Examples of grid-based clustering methods include CLIQUE and STING .
Clustering Algorithms
K-means and its variants
Hierarchical clustering
Density-based clustering
K-Means
• Initial set of clusters randomly chosen.
• Iteratively, items are moved among sets of
clusters until the desired set is reached.
• High degree of similarity among elements
in a cluster is obtained.
• Given a cluster Ki={ti1,ti2,…,tim}, the cluster
mean is mi = (1/m)(ti1 + … + tim)
K-Means Example
• Given: {2,4,10,12,3,20,30,11,25}, k=2
1- Randomly assign means: m1=3,m2=4
2- K1={2,3}, K2={4,10,12,20,30,11,25}, m1=2.5,m2=16
3-K1={2,3,4},K2={10,12,20,30,11,25}, m1=3,m2=18
4-K1={2,3,4,10},K2={12,20,30,11,25}, m1=4.75,m2=19.6
5-K1={2,3,4,10,11,12},K2={20,30,25}, m1=7,m2=25
• Stop as the clusters with these means are the same.
K-Means Algorithm
Two different K-means Clustering
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Sub-optimal Clustering
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Optimal Clustering
Original Points
What are applications that
use Clustering?
Some applications of clustering
• Clustering in EBMT (Example-Based Machine
Translation)
• Clustering in Online shopping mall
• Clustering in Spatial database
Clustering in EBMT
• EBMT is a method of performing machine translation by imitating
translation examples of similar sentences .
• a large amount of bi/multi-lingual translation examples (Greek-
English) has been stored in a textual database .
• and input expressions are rendered in the target language by
retrieving from the database that example which is most similar to
the input.
But , how to make retrieval of the example that best matches the
input more efficient?
By using of
cluster
Clustering in EBMT
• It applied k-means algorithm .
• Its enable the application to limit the search space
and locate the best available match in a database.
Clustering in Online shopping mall
• Its combine the characteristics of Web 2.0 into Decision
Support System for online shopping mall.
• To facilitate customers to share profiles and comments.
• The goal is to show the customers the product which they
may have interest in, according to the information provided
by the them.
• the framework contains of the search engine as an external
layer and the recommendation system as an internal layer.
• It used K-means to clustering the products in internal layer.
Clustering in Online shopping mall
Fig2: DSS framework for online shopping mall
Clustering in Online shopping mall
Fig3: Products clustering
Clustering in Spatial database
• Spatial database is a large amounts of data are obtained from
satellite for applications such as geo-marketing, traffic control,
and environmental studies.
• There exist many physical obstacles (rivers, lakes , highways)
and their presence may affect the result of clustering in spatial
databases.
Clustering in Spatial database
Fig4: map of western population of Canada with Highways and Rivers
Clustering in Spatial database
• However, most of clustering algorithms cannot deal with obstacles like
DBRS, DBRS+( Density-Based Clustering with Random Sampling) .
• Because, it can't handle with widely varying densities clusters in
dataset efficiently.
• so new clustering algorithm MMO (mathematics morphology based
algorithm of obstacles clustering) is proposed for the problem of
clustering in the presence of obstacles.
Clustering in Spatial database
• The main contributions are new operators are more
accurate (Connector and Obstor)than the ordinary
operators: (open and close).
Clustering in Spatial database
• Result: The performance tests show that: MMO is
suitable and effective for large data set more than
other algorithms .
Fig5: runtime of MMO with DBRS+
Conclusion
The main purpose of Data mining is to provide decision support to
managers and business professionals through knowledge
discovery
-Analyzes vast store of historical business data
-Tries to discover patterns, trends, and correlations hidden in the
data that can help a company improve its business performance
-Use regression, decision tree, neural network, cluster analysis, or
market basket analysis
-Cluster Analysis is the common method of Data mining tools
-Cluster methods and It’s algorithms were the effective methods
used by many application these day , and good Research area For
DSS fields was Done in Cluster methods .
References
1. CRANIAS, L., PAPAGEORGIOU, H., & PIPERIDIS, S. (1994). CLUSTERING : A
TECHNIQUE FOR SEARCH SPACE REDUCTION IN EXAMPLEBASED MACHINE
TRANSLATION. IEEE.
2. Pattabiraman, V. (2012). Optimization of spatial join using constraints based-
clustering techniques. Engineering and Computer Innovations.
3. Wang, X., Rostoker, C., & Hamilton , H. J. (2004). Density-Based Spatial Clustering
in the Presence of Obstacles and Facilitators.
4. Wang, X., & Hamilton, H. (n.d.). Using Clustering Methods in Geospatial
Information Systems.
5. Wenxing, H., Weng, Y., Xie, L., & Maoqing, L. (2009). Design and Implementation
of Web-based DSS for Online Shopping Mall. IEEE.
6. Zhang, Q. (2008). A Mathematics Morphology Based Algorithm of Obstacles
Clustering. International Conference on Computer Science and Software
Engineering.

More Related Content

What's hot (20)

Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis
 
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
 
Dataa miining
Dataa miiningDataa miining
Dataa miining
 
Clustering
ClusteringClustering
Clustering
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Machine Learning - Clustering
Machine Learning - ClusteringMachine Learning - Clustering
Machine Learning - Clustering
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Clustering
ClusteringClustering
Clustering
 
Clustering in artificial intelligence
Clustering in artificial intelligence Clustering in artificial intelligence
Clustering in artificial intelligence
 
Clustering
ClusteringClustering
Clustering
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering
 
Hierachical clustering
Hierachical clusteringHierachical clustering
Hierachical clustering
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
08 clustering
08 clustering08 clustering
08 clustering
 
Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clustering
 

Viewers also liked

simulation modeling in DSS
 simulation modeling in DSS simulation modeling in DSS
simulation modeling in DSSEnaam Alotaibi
 
GDSS Group Decision Support System
GDSS Group Decision Support SystemGDSS Group Decision Support System
GDSS Group Decision Support SystemEnaam Alotaibi
 
TQM in small and big organizations
TQM in small and big organizationsTQM in small and big organizations
TQM in small and big organizationsEnaam Alotaibi
 
Different Approaches using Change Impact Analysis of UML Based Design for Sof...
Different Approaches using Change Impact Analysis of UML Based Design for Sof...Different Approaches using Change Impact Analysis of UML Based Design for Sof...
Different Approaches using Change Impact Analysis of UML Based Design for Sof...zillesubhan
 
возможности Flash
возможности Flashвозможности Flash
возможности Flashliza2209
 
TRATAMIENTO DBT 2013
TRATAMIENTO DBT 2013TRATAMIENTO DBT 2013
TRATAMIENTO DBT 2013Flor Weisburd
 
Mtk faiz bab 2
Mtk faiz bab 2Mtk faiz bab 2
Mtk faiz bab 2Zinoa
 
Phosphorus and potassium placement for corn and soybean managed w
Phosphorus and potassium placement for corn and soybean managed wPhosphorus and potassium placement for corn and soybean managed w
Phosphorus and potassium placement for corn and soybean managed wDaniel Barker, Ph.D.
 
Da Freelance a VA: la chiave per un 2017 con più clienti e più guadagni
Da Freelance a VA: la chiave per un 2017 con più clienti e più guadagniDa Freelance a VA: la chiave per un 2017 con più clienti e più guadagni
Da Freelance a VA: la chiave per un 2017 con più clienti e più guadagniPaola Devescovi
 
Presentatie Leerwerktraject Amersfoort 13-12-16
Presentatie Leerwerktraject Amersfoort 13-12-16Presentatie Leerwerktraject Amersfoort 13-12-16
Presentatie Leerwerktraject Amersfoort 13-12-16Marcel Palm
 
Arquitectura al servicio del hombre
Arquitectura al servicio del hombreArquitectura al servicio del hombre
Arquitectura al servicio del hombrembochaga
 
Falsa doctrina salvo siempre salvo
Falsa doctrina salvo siempre salvoFalsa doctrina salvo siempre salvo
Falsa doctrina salvo siempre salvoJose Marchena
 

Viewers also liked (16)

simulation modeling in DSS
 simulation modeling in DSS simulation modeling in DSS
simulation modeling in DSS
 
GDSS Group Decision Support System
GDSS Group Decision Support SystemGDSS Group Decision Support System
GDSS Group Decision Support System
 
TQM in small and big organizations
TQM in small and big organizationsTQM in small and big organizations
TQM in small and big organizations
 
Different Approaches using Change Impact Analysis of UML Based Design for Sof...
Different Approaches using Change Impact Analysis of UML Based Design for Sof...Different Approaches using Change Impact Analysis of UML Based Design for Sof...
Different Approaches using Change Impact Analysis of UML Based Design for Sof...
 
возможности Flash
возможности Flashвозможности Flash
возможности Flash
 
TRATAMIENTO DBT 2013
TRATAMIENTO DBT 2013TRATAMIENTO DBT 2013
TRATAMIENTO DBT 2013
 
Lenguaje de definición de datos (ddl)
Lenguaje de definición de datos (ddl)Lenguaje de definición de datos (ddl)
Lenguaje de definición de datos (ddl)
 
Homeopathy for Hypothyroidism Disorders
Homeopathy for Hypothyroidism DisordersHomeopathy for Hypothyroidism Disorders
Homeopathy for Hypothyroidism Disorders
 
Mtk faiz bab 2
Mtk faiz bab 2Mtk faiz bab 2
Mtk faiz bab 2
 
Phosphorus and potassium placement for corn and soybean managed w
Phosphorus and potassium placement for corn and soybean managed wPhosphorus and potassium placement for corn and soybean managed w
Phosphorus and potassium placement for corn and soybean managed w
 
Intrinsic healing
Intrinsic healingIntrinsic healing
Intrinsic healing
 
Da Freelance a VA: la chiave per un 2017 con più clienti e più guadagni
Da Freelance a VA: la chiave per un 2017 con più clienti e più guadagniDa Freelance a VA: la chiave per un 2017 con più clienti e più guadagni
Da Freelance a VA: la chiave per un 2017 con più clienti e più guadagni
 
Presentatie Leerwerktraject Amersfoort 13-12-16
Presentatie Leerwerktraject Amersfoort 13-12-16Presentatie Leerwerktraject Amersfoort 13-12-16
Presentatie Leerwerktraject Amersfoort 13-12-16
 
Arquitectura al servicio del hombre
Arquitectura al servicio del hombreArquitectura al servicio del hombre
Arquitectura al servicio del hombre
 
Falsa doctrina salvo siempre salvo
Falsa doctrina salvo siempre salvoFalsa doctrina salvo siempre salvo
Falsa doctrina salvo siempre salvo
 
8. modulo-de-liderazgo
8. modulo-de-liderazgo8. modulo-de-liderazgo
8. modulo-de-liderazgo
 

Similar to Clustering on DSS

Cluster_saumitra.ppt
Cluster_saumitra.pptCluster_saumitra.ppt
Cluster_saumitra.pptssuser6b3336
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)Pravinkumar Landge
 
Data mining Techniques
Data mining TechniquesData mining Techniques
Data mining TechniquesSulman Ahmed
 
Unsupervised Learning-Clustering Algorithms.pptx
Unsupervised Learning-Clustering Algorithms.pptxUnsupervised Learning-Clustering Algorithms.pptx
Unsupervised Learning-Clustering Algorithms.pptxjasontseng19
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
Hierarchical clustering machine learning by arpit_sharma
Hierarchical clustering  machine learning by arpit_sharmaHierarchical clustering  machine learning by arpit_sharma
Hierarchical clustering machine learning by arpit_sharmaEr. Arpit Sharma
 
Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit vmalathieswaran29
 
Clustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfClustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfigeabroad
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptxJK970901
 
UNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningUNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningNandakumar P
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfSowmyaJyothi3
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxnikshaikh786
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 

Similar to Clustering on DSS (20)

Clusteryanam
ClusteryanamClusteryanam
Clusteryanam
 
Cluster_saumitra.ppt
Cluster_saumitra.pptCluster_saumitra.ppt
Cluster_saumitra.ppt
 
PPT s10-machine vision-s2
PPT s10-machine vision-s2PPT s10-machine vision-s2
PPT s10-machine vision-s2
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
Data mining Techniques
Data mining TechniquesData mining Techniques
Data mining Techniques
 
Data mining
Data miningData mining
Data mining
 
Unsupervised Learning-Clustering Algorithms.pptx
Unsupervised Learning-Clustering Algorithms.pptxUnsupervised Learning-Clustering Algorithms.pptx
Unsupervised Learning-Clustering Algorithms.pptx
 
kmean clustering
kmean clusteringkmean clustering
kmean clustering
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
Hierarchical clustering machine learning by arpit_sharma
Hierarchical clustering  machine learning by arpit_sharmaHierarchical clustering  machine learning by arpit_sharma
Hierarchical clustering machine learning by arpit_sharma
 
Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit v
 
Clustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfClustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdf
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptx
 
Clustering
ClusteringClustering
Clustering
 
UNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningUNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data Mining
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptx
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

Clustering on DSS

  • 1. Clustering on DSS By: Nora Alharbi Enaam Alotaibi
  • 2. Outlines: • Importance of Cluster Analysis for Data Mining • What is Cluster Analysis • Purposes of Cluster Analysis • Main Types of Clustering • Common Types of Clustering • Cluster Algorithms • Applications of clustering • Conclusion • References
  • 3. Importance of Cluster Analysis for Data Mining • Used for automatic identification of natural groupings of things • Part of the machine-learning family • Employ unsupervised learning • Learns the clusters of things from past data, then assigns new instances • There is not an output variable • Also known as segmentation
  • 4. What is a Cluster Analysis?
  • 5. Notion of a Cluster can be Ambiguous QUESTION ?? How many clusters?
  • 6. What is Cluster Analysis? • Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
  • 7. Puprses of Cluster Analysis Understanding Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations Summarization Reduce the size of large data sets Clustering precipitation in Australia
  • 8. What is not Cluster Analysis? Supervised classification Have class label information Simple segmentation Dividing students into different registration groups alphabetically, by last name Results of a query Groupings are a result of an external specification Graph partitioning Some mutual relevance and synergy, but areas are not identical
  • 9. Common Types of Clusters • Well-separated clusters • Center-based clusters • Contiguous clusters • Density-based clusters • Property or Conceptual • Described by an Objective Function
  • 10. Types of Clusters: Well-Separated • Well-Separated Clusters: – A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster. 3 well-separated clusters
  • 11. Types of Clusters: Center-Based • Center-based (prototype-based) – A cluster is a set of objects such that an object in a cluster is closer (more similar) to the “center” of a cluster, than to the center of any other cluster – The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most “representative” point of a cluster 4 center-based clusters
  • 12. Types of Clusters: Contiguity-Based • Contiguous Cluster (Graph-based) – A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster. Cluster can be defined as a connected component i.e. a group of objects that are connected to one another, but that have no connection to objects outside the group. 8 contiguous clusters
  • 13. Types of Clusters : DENSITY-BASED Density-based A cluster is a dense region of points, which is separated by low-density regions, from other regions of high density. Used when the clusters are irregular or intertwined, and when noise and outliers are present. 6 density-based clusters
  • 14. Types of Clusters: Conceptual Clusters • Shared Property or Conceptual Clusters – Finds clusters that share some common property or represent a particular concept. . 2 Overlapping Circles
  • 15. Types of Clusters: Objective Function • Clusters Defined by an Objective Function – Finds clusters that minimize or maximize an objective function. – Enumerate all possible ways of dividing the points into clusters and evaluate the `goodness' of each potential set of clusters by using the given objective function. – Can have global or local objectives. • Hierarchical clustering algorithms typically have local objectives • Partitional algorithms typically have global objectives – A variation of the global objective function approach is to fit the data to a parameterized model. • Parameters for the model are determined from the data. • Mixture models assume that the data is a ‘mixture' of a number of statistical distributions.
  • 16. CONT. -Map the clustering problem to a different domain and solve a related problem in that domain -Proximity matrix defines a weighted graph, where the nodes are the points being clustered, and the weighted edges represent the proximities between points Clustering is equivalent to breaking the graph into connected components, one for each cluster. Want to minimize the edge weight between clusters and maximize the edge weight within clusters
  • 17. • A clustering is a set of clusters • Important distinction between hierarchical and partitional sets of clusters What is Clustering?
  • 18. Main type of Cluster Method •Hierarchical clustering methods: can be either agglomerative or divisive. • An agglomerative method starts with each point as a separate cluster, and successively performs merging until a stopping criterion is met. • A divisive method begins with all points in a single cluster and performs splitting until a stopping criterion is met. • The result of a hierarchical clustering method is a tree of clusters called a dendogram. • A tree like diagram that records the sequences of merges or splits 1 3 2 5 4 6 0 0.05 0.1 0.15 0.2 1 2 3 4 5 6 1 2 3 4 5
  • 20. Hierarchical clustering • Agglomerative (Bottom up) • 1st iteration1
  • 21. Hierarchical clustering • Agglomerative (Bottom up) • 2nd iteration1 2
  • 22. Hierarchical clustering • Agglomerative (Bottom up) • 3rd iteration 1 2 3
  • 23. Hierarchical clustering • Agglomerative (Bottom up) • 4th iteration 1 2 3 4
  • 24. Hierarchical clustering • Agglomerative (Bottom up) • 5th iteration 1 2 3 4 5
  • 25. Hierarchical clustering • Agglomerative (Bottom up) • Finally k clusters left 1 2 3 4 6 9 5 7 8
  • 26. Hierarchical clustering • Divisive (Top-down) Slide credit: Min Zhang
  • 27. Hierarchical clustering An example of hierarchical clustering methods is BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) Advantage: typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. Disadvantages: it can hold only a limited number of entries due to its size, it not always correspond to what a user may consider a natural cluster.
  • 28. Type of Cluster Method CONT. •Partitional clustering methods: • determine a partition of the points into clusters, such that the points in a cluster are more similar to each other than to points in different clusters. •They start with some arbitrary initial clusters and iteratively until criterion is met. Examples of partitional clustering algorithms include k-means, PAM, CLARA, CLARANS and EM. Disadvantages: • it assumes that the points to be clustered are all stored in main memory. • the run time of the algorithm is prohibitive on large datasets.
  • 29. Partitional Clustering Original Points A Partitional Clustering
  • 30. Type of Cluster Method CONT. •Density-based clustering methods :try to find clusters based on the density of points in regions. Dense regions that are reachable from each other are merged to form clusters. Examples of density-based clustering methods include DBSCAN , DBRS and DBRS+ . Advantages: It gives extremely good results and is efficient in many datasets. Disadvantages: • if a dataset has clusters of widely varying densities, it can't able to handle with it efficiently. • it does not consider non-spatial attributes in the dataset. • it not suitable for finding approximate clusters in very large datasets.
  • 31. Type of Cluster Method CONT. Grid-based clustering methods quantize the clustering space into a finite number of cells and then perform the required operations on the quantized space. Cells containing more than a certain number of points are considered to be dense. Contiguous dense cells are connected to form clusters. Examples of grid-based clustering methods include CLIQUE and STING .
  • 32. Clustering Algorithms K-means and its variants Hierarchical clustering Density-based clustering
  • 33. K-Means • Initial set of clusters randomly chosen. • Iteratively, items are moved among sets of clusters until the desired set is reached. • High degree of similarity among elements in a cluster is obtained. • Given a cluster Ki={ti1,ti2,…,tim}, the cluster mean is mi = (1/m)(ti1 + … + tim)
  • 34. K-Means Example • Given: {2,4,10,12,3,20,30,11,25}, k=2 1- Randomly assign means: m1=3,m2=4 2- K1={2,3}, K2={4,10,12,20,30,11,25}, m1=2.5,m2=16 3-K1={2,3,4},K2={10,12,20,30,11,25}, m1=3,m2=18 4-K1={2,3,4,10},K2={12,20,30,11,25}, m1=4.75,m2=19.6 5-K1={2,3,4,10,11,12},K2={20,30,25}, m1=7,m2=25 • Stop as the clusters with these means are the same.
  • 36. Two different K-means Clustering -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 0 0.5 1 1.5 2 2.5 3 x y -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 0 0.5 1 1.5 2 2.5 3 x y Sub-optimal Clustering -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 0 0.5 1 1.5 2 2.5 3 x y Optimal Clustering Original Points
  • 37. What are applications that use Clustering?
  • 38. Some applications of clustering • Clustering in EBMT (Example-Based Machine Translation) • Clustering in Online shopping mall • Clustering in Spatial database
  • 39. Clustering in EBMT • EBMT is a method of performing machine translation by imitating translation examples of similar sentences . • a large amount of bi/multi-lingual translation examples (Greek- English) has been stored in a textual database . • and input expressions are rendered in the target language by retrieving from the database that example which is most similar to the input. But , how to make retrieval of the example that best matches the input more efficient? By using of cluster
  • 40. Clustering in EBMT • It applied k-means algorithm . • Its enable the application to limit the search space and locate the best available match in a database.
  • 41. Clustering in Online shopping mall • Its combine the characteristics of Web 2.0 into Decision Support System for online shopping mall. • To facilitate customers to share profiles and comments. • The goal is to show the customers the product which they may have interest in, according to the information provided by the them. • the framework contains of the search engine as an external layer and the recommendation system as an internal layer. • It used K-means to clustering the products in internal layer.
  • 42. Clustering in Online shopping mall Fig2: DSS framework for online shopping mall
  • 43. Clustering in Online shopping mall Fig3: Products clustering
  • 44. Clustering in Spatial database • Spatial database is a large amounts of data are obtained from satellite for applications such as geo-marketing, traffic control, and environmental studies. • There exist many physical obstacles (rivers, lakes , highways) and their presence may affect the result of clustering in spatial databases.
  • 45. Clustering in Spatial database Fig4: map of western population of Canada with Highways and Rivers
  • 46. Clustering in Spatial database • However, most of clustering algorithms cannot deal with obstacles like DBRS, DBRS+( Density-Based Clustering with Random Sampling) . • Because, it can't handle with widely varying densities clusters in dataset efficiently. • so new clustering algorithm MMO (mathematics morphology based algorithm of obstacles clustering) is proposed for the problem of clustering in the presence of obstacles.
  • 47. Clustering in Spatial database • The main contributions are new operators are more accurate (Connector and Obstor)than the ordinary operators: (open and close).
  • 48. Clustering in Spatial database • Result: The performance tests show that: MMO is suitable and effective for large data set more than other algorithms . Fig5: runtime of MMO with DBRS+
  • 49. Conclusion The main purpose of Data mining is to provide decision support to managers and business professionals through knowledge discovery -Analyzes vast store of historical business data -Tries to discover patterns, trends, and correlations hidden in the data that can help a company improve its business performance -Use regression, decision tree, neural network, cluster analysis, or market basket analysis -Cluster Analysis is the common method of Data mining tools -Cluster methods and It’s algorithms were the effective methods used by many application these day , and good Research area For DSS fields was Done in Cluster methods .
  • 50. References 1. CRANIAS, L., PAPAGEORGIOU, H., & PIPERIDIS, S. (1994). CLUSTERING : A TECHNIQUE FOR SEARCH SPACE REDUCTION IN EXAMPLEBASED MACHINE TRANSLATION. IEEE. 2. Pattabiraman, V. (2012). Optimization of spatial join using constraints based- clustering techniques. Engineering and Computer Innovations. 3. Wang, X., Rostoker, C., & Hamilton , H. J. (2004). Density-Based Spatial Clustering in the Presence of Obstacles and Facilitators. 4. Wang, X., & Hamilton, H. (n.d.). Using Clustering Methods in Geospatial Information Systems. 5. Wenxing, H., Weng, Y., Xie, L., & Maoqing, L. (2009). Design and Implementation of Web-based DSS for Online Shopping Mall. IEEE. 6. Zhang, Q. (2008). A Mathematics Morphology Based Algorithm of Obstacles Clustering. International Conference on Computer Science and Software Engineering.