Unit 5: Cluster Analysis LH 7
Presented By : Tekendra Nath Yogi
Tekendranath@gmail.com
College Of Applied Business And Technology
Contd…
• Outline:
– 5.1. Basics and Algorithms
– 5.2. K-means Clustering
– 5.3. Hierarchical Clustering
– 5.4. Density-based spatial clustering of applications with noise (DBSCAN)
Clustering
27/5/2019 By: Tekendra Nath Yogi
Introduction
• A cluster is a group of similar objects.
• Clustering is the process of finding groups of objects such that the objects in a
group are similar (or related) to one another and different from (or
unrelated to) the objects in other groups.
Contd…
• Dissimilarities and similarities are assessed based on the attribute values
describing the objects and often involve distance measures (e.g., Euclidean
distance).
• Clustering falls under the category of unsupervised machine
learning because it uses unlabeled input data: the algorithm acts
on the data without guidance.
• Different clustering methods may generate different clusters on the same data
set.
• The partitioning is not performed by humans, but by the clustering algorithm.
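The Euclidean distance measure mentioned above is easy to compute directly; a minimal Python sketch (the function name is our own, for illustration):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # → 5.0
```

The same function works for points of any dimensionality, since it sums the squared differences coordinate by coordinate.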
Contd…
• A good clustering algorithm aims to create clusters whose:
– intra-cluster similarity is high (the objects inside a cluster are
similar to one another)
– inter-cluster similarity is low (each cluster holds objects that are not
similar to those in other clusters)
Some Applications of Clustering
• Cluster analysis has been widely used in numerous applications
such as:
– In business intelligence
– In image recognition
– In web search
– In outlier detection
– In biology
Contd..
• In business intelligence:
– Clustering can help marketers discover distinct groups in their customer
bases and characterize customer groups based on purchasing patterns so
that, for example, advertising can be appropriately targeted.
Contd..
• In image recognition:
– In image recognition, clustering can be used to discover clusters or
“subclasses” in handwritten character recognition systems.
– For example, consider handwritten samples of the digit “2”: some people
write it with a small circle at the bottom-left part, while others do not.
Clustering can be used to determine subclasses for “2,” each of which
represents a variation in the way 2 can be written.
Contd..
• In web search
– document grouping: Clustering can be used to organize the search results
into groups and present the results in a concise and easily accessible way.
– cluster Weblog data to discover groups of similar access patterns.
Contd..
• In Outlier detection
– Clustering can also be used for outlier detection, where outliers (values
that are “far away” from any cluster) may be more interesting than
common cases.
– Applications of outlier detection include the detection of credit card fraud
and the monitoring of criminal activities in electronic commerce.
Contd..
• In biology:
– In biology, it can be used to derive plant and animal taxonomies,
categorize genes with similar functionality, and gain insight into structures
inherent in populations.
Requirements of Clustering in Data Mining
• The following are typical requirements of clustering in data
mining.
– Scalability
– Ability to deal with different types of attributes
– Discovery of clusters with arbitrary shape
– Minimal requirements for domain knowledge to determine input parameters
– Ability to deal with noisy data
– Incremental clustering and insensitivity to input order
– Capability of clustering high-dimensionality data
– Constraint-based clustering
– Interpretability and usability
Contd..
• Scalability:
– Many clustering algorithms work well on small data sets containing fewer
than several hundred data objects; however, a large database may contain
millions of objects.
– Clustering on a sample of a given large data set may lead to biased results.
– Highly scalable clustering algorithms are needed.
Contd..
• Ability to deal with different types of attributes:
– Many algorithms are designed to cluster interval-based (numerical) data.
– However, applications may require clustering other types of data, such as
binary, categorical (nominal), and ordinal data, or mixtures of these data
types.
Contd..
• Discovery of clusters with arbitrary shape:
– Many clustering algorithms determine clusters based on Euclidean
distance measures.
– Algorithms based on such distance measures tend to find spherical
clusters with similar size and density.
– However, a cluster could be of any shape.
– It is important to develop algorithms that can detect clusters of arbitrary
shape.
Contd..
• Minimal requirements for domain knowledge to determine
input parameters:
– Many clustering algorithms require users to input certain parameters in
cluster analysis (such as the number of desired clusters).
– The clustering results can be quite sensitive to input parameters.
– Parameters are often difficult to determine, especially for data sets
containing high-dimensional objects.
– This not only burdens users, but it also makes the quality of clustering
difficult to control.
Contd..
• Ability to deal with noisy data:
– Most real-world databases contain outliers or missing, unknown, or
erroneous data.
– Some clustering algorithms are sensitive to such data and may lead to
clusters of poor quality.
Contd..
• Incremental clustering and insensitivity to the order of input
records:
– Some clustering algorithms cannot incorporate newly inserted data (i.e.,
database updates) into existing clustering structures and, instead, must
determine a new clustering from scratch.
– Some clustering algorithms are sensitive to the order of input data. That is,
given a set of data objects, such an algorithm may return dramatically
different clusterings depending on the order of presentation of the input
objects.
– It is important to develop incremental clustering algorithms and algorithms
that are insensitive to the order of input.
Contd..
• High dimensionality:
– A database or a data warehouse can contain several dimensions or
attributes.
– Many clustering algorithms are good at handling low-dimensional data,
involving only two to three dimensions.
– Human eyes are good at judging the quality of clustering for up to three
dimensions.
– Finding clusters of data objects in high dimensional space is challenging,
especially considering that such data can be sparse and highly skewed.
Contd..
• Constraint-based clustering:
– Real-world applications may need to perform clustering under various
kinds of constraints.
– Suppose that your job is to choose the locations for a given number of
new Automated Teller Machines (ATMs) in a city.
– To decide upon this, you may cluster households while considering
constraints such as the city’s rivers and highway networks, and the type
and number of customers per cluster.
– A challenging task is to find groups of data with good clustering behavior
that satisfy specified constraints.
Contd..
• Interpretability and usability:
– Users expect clustering results to be interpretable, comprehensible, and
usable.
– That is, clustering may need to be tied to specific semantic interpretations
and applications.
– It is important to study how an application goal may influence the
selection of clustering features and methods.
Major Clustering Methods:
• In general, the major fundamental clustering methods can be
classified into the following categories:
– Partitioning Methods
– Hierarchical Methods
– Density-Based Methods
– Grid-Based Methods
Contd..
• Partitioning Methods:
– Given a data set, D, of n objects, and k, the number of clusters to form, a
partitioning method constructs k partitions of the data, where each partition
represents a cluster and k <= n.
– That is, it classifies the data into k groups, which together satisfy the following
requirements:
• Each group must contain at least one object, and
• Each object must belong to exactly one group.
Contd…
– A partitioning method creates an initial partitioning. It then uses an iterative relocation
technique that attempts to improve the partitioning by moving objects from one group
to another.
– The general criterion of a good partitioning is that objects in the same cluster are close
or related to each other, whereas objects of different clusters are far apart or very
different.
k-Means: A Centroid-Based Technique
• A centroid-based partitioning technique uses the centroid of a cluster, ci, to
represent that cluster.
• The centroid of a cluster is its center point, such as the mean of the objects (or
points) assigned to the cluster.
• The distance between an object p and ci, the representative of the
cluster, is measured by dist(p, ci),
• where dist(p, ci) is the Euclidean distance between the two points.
Contd..
• The k-means algorithm defines the centroid of a cluster as the mean value of
the points within the cluster. It proceeds as follows:
– First, it randomly selects k of the objects in D, each of which initially
represents a cluster mean or center.
– For each of the remaining objects, an object is assigned to the cluster to
which it is the most similar, based on the Euclidean distance between the
object and the cluster mean.
– The k-means algorithm then iteratively reduces the within-cluster
variation. For each cluster, it computes the new mean using the objects
assigned to the cluster in the previous iteration. All the objects are then
reassigned using the updated means as the new cluster centers.
– The iterations continue until the assignment is stable, that is, the clusters
formed in the current round are the same as those formed in the previous
round.
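The four steps above can be sketched in Python. This is a minimal illustration of the assign/update loop, not an optimized implementation; names such as `kmeans` are our own:

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: pick k random objects as initial centers,
    assign each point to its nearest center, recompute means,
    and repeat until the centers stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its cluster's points.
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:   # assignments are stable: converged
            break
        centers = new_centers
    return centers, clusters

data = [(1, 1.5), (1, 4.5), (2, 1.5), (2, 3.5), (3, 2.5), (3, 4)]
centers, clusters = kmeans(data, k=2)
```

Because the initial centers are chosen randomly, different seeds can produce different final clusterings, which is one reason k-means results can vary between runs.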
Contd..
• Algorithm:
– The k-means algorithm for partitioning, where each cluster’s center is
represented by the mean value of the objects in the cluster.
Contd…
• Example 1: Cluster the following instances of the given data (2-
dimensional form) with the help of the k-means algorithm (take K
= 2):
Instance X Y
1 1 1.5
2 1 4.5
3 2 1.5
4 2 3.5
5 3 2.5
6 3 4
Contd…
• Solution:
– Given, number of clusters to be created (k) = 2.
– Initially choose two points randomly as the initial cluster centers; say objects
1 and 3 are chosen,
– i.e., c1 = (1, 1.5) and c2 = (2, 1.5)
Contd…
• Iteration 1:
– Now calculating similarity by using the Euclidean distance measure:
– dist(c1, 2) = √((1 − 1)² + (1.5 − 4.5)²) = 3
– dist(c2, 2) = √((2 − 1)² + (1.5 − 4.5)²) = 3.162
– Here, dist(c1, 2) < dist(c2, 2)
– So, data point 2 belongs to c1.
Contd…
– dist(c1, 4) = √((1 − 2)² + (1.5 − 3.5)²) = 2.236
– dist(c2, 4) = √((2 − 2)² + (1.5 − 3.5)²) = 2
– Here, dist(c2, 4) < dist(c1, 4)
– So, data point 4 belongs to c2.
– dist(c1, 5) = √((1 − 3)² + (1.5 − 2.5)²) = 2.236
– dist(c2, 5) = √((2 − 3)² + (1.5 − 2.5)²) = 1.414
– Here, dist(c2, 5) < dist(c1, 5)
– So, data point 5 belongs to c2.
Contd…
– dist(c1, 6) = √((1 − 3)² + (1.5 − 4)²) = 3.2
– dist(c2, 6) = √((2 − 3)² + (1.5 − 4)²) = 2.7
– Here, dist(c2, 6) < dist(c1, 6)
– So, data point 6 belongs to c2.
– The resulting clusters after the 1st iteration are:
– C1 = {1, 2} and C2 = {3, 4, 5, 6}
Contd…
• Iteration 2:
• Now calculating the centroid for each cluster:
– Centroid for c1 = ((1 + 1)/2, (1.5 + 4.5)/2) = (1, 3)
– Centroid for c2 = ((2 + 2 + 3 + 3)/4, (1.5 + 3.5 + 2.5 + 4)/4) = (2.5, 2.875)
– Now, again calculating similarity:
– dist(c1, 1) = √((1 − 1)² + (3 − 1.5)²) = 1.5
– dist(c2, 1) = √((2.5 − 1)² + (2.875 − 1.5)²) = 2.035
– Here, dist(c1, 1) < dist(c2, 1)
– So, data point 1 belongs to c1.
Contd…
– dist(c1, 2) = √((1 − 1)² + (3 − 4.5)²) = 1.5
– dist(c2, 2) = √((2.5 − 1)² + (2.875 − 4.5)²) = 2.21
– Here, dist(c1, 2) < dist(c2, 2)
– So, data point 2 belongs to c1.
– dist(c1, 3) = √((1 − 2)² + (3 − 1.5)²) = 1.8
– dist(c2, 3) = √((2.5 − 2)² + (2.875 − 1.5)²) = 1.463
– Here, dist(c2, 3) < dist(c1, 3)
– So, data point 3 belongs to c2.
Contd…
– dist(c1, 4) = √((1 − 2)² + (3 − 3.5)²) = 1.12
– dist(c2, 4) = √((2.5 − 2)² + (2.875 − 3.5)²) = 0.8
– Here, dist(c2, 4) < dist(c1, 4)
– So, data point 4 belongs to c2.
– dist(c1, 5) = √((1 − 3)² + (3 − 2.5)²) = 2.06
– dist(c2, 5) = √((2.5 − 3)² + (2.875 − 2.5)²) = 0.625
– Here, dist(c2, 5) < dist(c1, 5)
– So, data point 5 belongs to c2.
Contd…
– dist(c1, 6) = √((1 − 3)² + (3 − 4)²) = 2.236
– dist(c2, 6) = √((2.5 − 3)² + (2.875 − 4)²) = 1.231
– Here, dist(c2, 6) < dist(c1, 6)
– So, data point 6 belongs to c2.
– The resulting clusters after the 2nd iteration are:
– C1 = {1, 2} and C2 = {3, 4, 5, 6}
– Same as iteration 1, so terminate.
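The worked example can be checked mechanically. The sketch below repeats the assign/update loop with objects 1 and 3 fixed as the initial centers, exactly as in the solution above:

```python
import math

points = {1: (1, 1.5), 2: (1, 4.5), 3: (2, 1.5),
          4: (2, 3.5), 5: (3, 2.5), 6: (3, 4)}
centers = [points[1], points[3]]  # initial centers: objects 1 and 3

for _ in range(10):
    # Assignment step: ties go to c1, which never occurs in this data.
    clusters = [[], []]
    for pid, p in points.items():
        i = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
        clusters[i].append(pid)
    # Update step: recompute each center as the mean of its cluster.
    new_centers = [
        tuple(sum(points[pid][d] for pid in c) / len(c) for d in (0, 1))
        for c in clusters
    ]
    if new_centers == centers:  # stable assignment: terminate
        break
    centers = new_centers

print(clusters)  # → [[1, 2], [3, 4, 5, 6]]
```

The loop converges after the second iteration, matching the hand calculation: C1 = {1, 2}, C2 = {3, 4, 5, 6}.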
Contd..
• Example 3: Cluster the following instances of the given data (2-
dimensional form) with the help of the k-means algorithm (take K
= 2):
Instance X Y
1 1 2.5
2 1 4.5
3 2.5 3
4 2 1.5
5 4.5 1.5
6 4 5
Contd…
• Weaknesses of k-means:
– Applicable only when the mean is defined (e.g., numerical data).
– Need to specify k, the number of clusters, in advance.
– Unable to handle outliers.
Hierarchical clustering
• A hierarchical clustering method works by grouping data objects into a
hierarchy or “tree” of clusters.
• Representing data objects in the form of a hierarchy is useful for data
summarization and visualization.
Contd..
• Depending on whether the hierarchical decomposition is formed in a bottom-
up (merging) or top-down (splitting) fashion a hierarchical clustering method
can be classified into two categories:
– Agglomerative Hierarchical Clustering and
– Divisive Hierarchical Clustering
Contd..
• Agglomerative Hierarchical Clustering:
– uses a bottom-up strategy.
– starts by letting each object form its own cluster and iteratively merges
clusters into larger and larger clusters, until all the objects are in a single
cluster or certain termination conditions(desired number of clusters) are
satisfied.
– For the merging step, it finds the two clusters that are closest to each other
(according to some similarity measure), and combines the two to form one
cluster.
Contd..
• Example: a data set of five objects, {a, b, c, d, e}. Initially, AGNES
(AGglomerative NESting), the agglomerative method, places each object into
a cluster of its own. The clusters are then merged step-by-step according to
some criterion (e.g., minimum Euclidean distance).
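The AGNES-style bottom-up merging can be sketched in a few lines of Python using single linkage (closest pair of objects) as the merge criterion. The coordinates for objects a–e below are made up for illustration, since the slide's figure is not available:

```python
import math

def agnes(points, target_k=1):
    """Bottom-up (AGNES-style) clustering with single linkage:
    start with one cluster per object, then repeatedly merge the
    closest pair of clusters until target_k clusters remain."""
    clusters = [[name] for name in points]

    def link(a, b):  # single-link distance between clusters a and b
        return min(math.dist(points[x], points[y]) for x in a for y in b)

    merges = []
    while len(clusters) > target_k:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

# Hypothetical coordinates for the five objects a–e (not from the slides):
pts = {"a": (1, 1), "b": (1.5, 1), "c": (5, 5), "d": (5.5, 5), "e": (3, 4)}
clusters, merges = agnes(pts, target_k=2)
print(sorted(sorted(c) for c in clusters))  # → [['a', 'b'], ['c', 'd', 'e']]
```

Recording the merge order (`merges`) is what lets hierarchical methods draw a dendrogram of the clustering.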
Contd..
• Divisive hierarchical clustering :
– A divisive hierarchical clustering method employs a top-down strategy.
– It starts by placing all objects in one cluster, which is the hierarchy’s root.
– It then divides the root cluster into several smaller sub-clusters, and
recursively partitions those clusters into smaller ones.
– The partitioning process continues until each cluster at the lowest level
either contains only one object or the objects within a cluster are
sufficiently similar to each other.
Contd..
• Example: DIANA (DIvisive ANAlysis), a divisive hierarchical clustering
method:
– a data set of five objects, {a, b, c, d, e}. All the objects are used to form
one initial cluster. The cluster is split according to some principle such as
the maximum Euclidean distance between the closest neighboring objects
in the cluster. The cluster-splitting process repeats until, eventually, each
new cluster contains only a single object.
Contd..
• agglomerative versus divisive hierarchical clustering:
– Organize objects into a hierarchy using a bottom-up or top-down strategy,
respectively.
– Agglomerative methods start with individual objects as clusters, which are
iteratively merged to form larger clusters.
– Conversely, divisive methods initially let all the given objects form one
cluster, which they iteratively split into smaller clusters.
Contd..
• Hierarchical clustering methods can encounter difficulties
regarding the selection of merge or split points.
– Such a decision is critical, because once a group of objects is merged or
split, the process at the next step will operate on the newly generated
clusters. It will neither undo what was done previously, nor perform object
swapping between clusters.
– Thus, merge or split decisions, if not well chosen, may lead to low-quality
clusters.
• Moreover, the methods do not scale well because each decision of merge or
split needs to examine and evaluate many objects or clusters.
Density Based Methods
• Partitioning methods and hierarchical clustering are suitable for finding
spherical-shaped clusters.
• Moreover, they are severely affected by the presence of noise and
outliers in the data.
• Unfortunately, real-life data contain:
– Clusters of arbitrary shape, such as oval, linear, and S-shaped
– A lot of noise
• Solution: density-based methods
Contd..
• Basic idea behind density-based methods:
– Model clusters as dense regions in the data space, separated by sparse
regions.
• Major features:
– Discover clusters of arbitrary shape (e.g., oval, S-shaped, etc.)
– Handle noise
– Need density parameters as a termination condition
• E.g., DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
Density-Based Clustering: Background
• The ε-neighborhood of a point p is the set of all points within distance ε of p:
– N_ε(p) = { q | dist(p, q) ≤ ε }
• Two parameters:
– ε: maximum radius of the neighborhood
– MinPts: minimum number of points in an ε-neighborhood of that point
• If the number of points in the ε-neighborhood of p is at least
MinPts, then p is called a core object.
(Figure: a core point p whose ε-neighborhood, with ε = 1 cm, contains q and at least MinPts = 5 points.)
Contd..
• Directly density-reachable:
– A point p is directly density-reachable from a point q w.r.t. ε and MinPts if
• 1) p belongs to N_ε(q), and
• 2) the core point condition holds: |N_ε(q)| ≥ MinPts
(Figure: p lies inside the ε-neighborhood of core point q; MinPts = 5, ε = 1 cm.)
Contd..
• Density-reachable:
– A point p is density-reachable from a point q w.r.t. ε and MinPts if there is a
chain of points p1, …, pn with p1 = q and pn = p such that each pi+1 is
directly density-reachable from pi.
(Figure: q reaches p through an intermediate point p1.)
Contd..
• Density-connected:
– A point p is density-connected to a point q w.r.t. ε and MinPts if there is a
point o such that both p and q are density-reachable from o w.r.t. ε and
MinPts.
(Figure: p and q are both density-reachable from o.)
Contd..
• Density = the number of points within a specified radius ε.
• A point is a core point if it has at least a specified number of points (MinPts)
within ε.
– Core points lie in the interior of a cluster.
– The count includes the point itself.
• A border point is not a core point, but falls within the neighborhood of a core point.
• A noise point is any point that is neither a core point nor a border point.
• e.g., MinPts = 7
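The three definitions above translate directly into code; a small sketch (the function name and the point coordinates are made up for illustration):

```python
import math

def classify(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'.
    A point counts itself in its eps-neighborhood."""
    def neighbors(p):
        return [q for q in points if math.dist(p, q) <= eps]

    cores = [p for p in points if len(neighbors(p)) >= min_pts]
    labels = {}
    for p in points:
        if p in cores:
            labels[p] = "core"
        elif any(math.dist(p, c) <= eps for c in cores):
            labels[p] = "border"       # near a core point, but not core itself
        else:
            labels[p] = "noise"
    return labels

# A dense square of points, one fringe point, and one isolated point:
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (10, 10)]
print(classify(pts, eps=1.5, min_pts=4)[(2.2, 1)])  # → border
```

With these parameters the four corner points come out as core, (2.2, 1) as border, and (10, 10) as noise.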
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• DBSCAN grows a cluster from an arbitrary unvisited core object by adding
every object that is density-reachable from it.
• To find the next cluster, DBSCAN randomly selects an unvisited object
from the remaining ones. The clustering process continues until all
objects have been visited.
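A compact sketch of that process (our own minimal Python, not the textbook pseudocode; a label of −1 marks noise):

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: grow a cluster outward from each unvisited
    core point through its density-reachable neighbors."""
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)        # None = unvisited, -1 = noise
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:         # i is not a core point
            labels[i] = -1               # provisionally noise
            continue
        cluster_id += 1                  # start a new cluster at core point i
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:          # noise reachable from a core point
                labels[j] = cluster_id   # ...becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors(j)) >= min_pts:   # j is core: keep expanding
                queue.extend(neighbors(j))
    return labels

pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (20, 20)]
print(dbscan(pts, eps=1.5, min_pts=2))  # → [1, 1, 1, 2, 2, -1]
```

The queue-based expansion is what lets DBSCAN follow chains of density-reachable points, so clusters can take arbitrary shapes rather than only spherical ones.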
Contd..
• Example:
– If ε = 2 and MinPts = 2, what clusters would DBSCAN discover from the
following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4),
A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9)?
• Solution:
– d(a, b) denotes the Euclidean distance between a and b. It is obtained
directly from the distance matrix, calculated as follows:
– d(a, b) = √((x_b − x_a)² + (y_b − y_a)²)
Contd..
      A1    A2    A3    A4    A5    A6    A7    A8
A1     0   √25   √72   √13   √50   √52   √65   √5
A2           0   √37   √18   √25   √17   √10   √20
A3                 0   √25   √2    √4    √53   √41
A4                       0   √13   √17   √52   √2
A5                             0   √2    √45   √25
A6                                   0   √29   √29
A7                                         0   √58
A8                                               0
Contd..
• N2(A1)={};
• N2(A2)={};
• N2(A3)={A5, A6};
• N2(A4)={A8};
• N2(A5)={A3, A6};
• N2(A6)={A3, A5};
• N2(A7)={};
• N2(A8)={A4};
• So A1, A2, and A7 are outliers, while we have two clusters C1={A4,
A8} and C2={A3, A5, A6}
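The neighborhood sets above can be verified directly (here N excludes the point itself, matching the slide's N2 notation):

```python
import math

pts = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
       "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
eps = 2

# Points within distance eps of each point (excluding the point itself):
N = {a: sorted(b for b in pts if b != a and math.dist(pts[a], pts[b]) <= eps)
     for a in pts}
print(N["A3"])                              # → ['A5', 'A6']
print(sorted(a for a in pts if not N[a]))   # outliers → ['A1', 'A2', 'A7']
```

Points with an empty neighborhood cannot reach any other point, so A1, A2, and A7 end up as outliers, while {A4, A8} and {A3, A5, A6} form the two clusters.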
Advantages and Disadvantages of DBSCAN algorithm:
• Advantages:
– DBSCAN does not require one to specify the number of clusters in the
data a priori, as opposed to k-means.
– DBSCAN can find arbitrarily shaped clusters
– DBSCAN is robust to outliers.
– DBSCAN is mostly insensitive to the ordering of the points in the
database.
– The parameters minPts and ε can be set by a domain expert, if the data is
well understood.
Contd..
• Disadvantages:
– DBSCAN is not entirely deterministic: border points that are reachable
from more than one cluster can be assigned to either cluster, depending on
the order in which the data are processed. Fortunately, this situation arises
rarely and has little impact on the clustering result: on core points and
noise points, DBSCAN is deterministic.
– DBSCAN cannot cluster data sets well with large differences in densities,
since the minPts-ε combination cannot then be chosen appropriately for all
clusters.
– If the data and scale are not well understood, choosing a meaningful
distance threshold ε can be difficult.
Homework
• Explain the aims of cluster analysis.
• What is clustering? How does it differ from supervised classification?
In what situations can clustering be useful?
• List and explain the desired features of cluster analysis.
• Explain the different types of cluster analysis methods and discuss their
features.
• Describe the k-means algorithm and state its strengths and
weaknesses.
• Describe the features of hierarchical clustering methods. In what
situations are these methods useful?
Thank You !

Assessment of Cluster Tree Analysis based on Data Linkagesjournal ijrtem
 
UNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningUNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningNandakumar P
 
An Iterative Improved k-means Clustering
An Iterative Improved k-means ClusteringAn Iterative Improved k-means Clustering
An Iterative Improved k-means ClusteringIDES Editor
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataMPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataIRJET Journal
 
BIM Data Mining Unit1 by Tekendra Nath Yogi
 BIM Data Mining Unit1 by Tekendra Nath Yogi BIM Data Mining Unit1 by Tekendra Nath Yogi
BIM Data Mining Unit1 by Tekendra Nath YogiTekendra Nath Yogi
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...IJCSIS Research Publications
 
Clustering, application, methods u 1
Clustering, application, methods u 1Clustering, application, methods u 1
Clustering, application, methods u 1sakthyvel3
 
A Survey on the Clustering Algorithms in Sales Data Mining
A Survey on the Clustering Algorithms in Sales Data MiningA Survey on the Clustering Algorithms in Sales Data Mining
A Survey on the Clustering Algorithms in Sales Data MiningEditor IJCATR
 
Cluster analysis (2).docx
Cluster analysis (2).docxCluster analysis (2).docx
Cluster analysis (2).docxYaseenRashid4
 
Survey on Text Mining Based on Social Media Comments as Big Data Analysis Usi...
Survey on Text Mining Based on Social Media Comments as Big Data Analysis Usi...Survey on Text Mining Based on Social Media Comments as Big Data Analysis Usi...
Survey on Text Mining Based on Social Media Comments as Big Data Analysis Usi...IJMREMJournal
 

Similar to BIM Data Mining Unit5 by Tekendra Nath Yogi (20)

Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Descriptive m0deling
Descriptive m0delingDescriptive m0deling
Descriptive m0deling
 
For iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxFor iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptx
 
pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
Clustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfClustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdf
 
CLUSTER ANALYSIS.pptx
CLUSTER ANALYSIS.pptxCLUSTER ANALYSIS.pptx
CLUSTER ANALYSIS.pptx
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
Customer segmentation.pptx
Customer segmentation.pptxCustomer segmentation.pptx
Customer segmentation.pptx
 
Assessment of Cluster Tree Analysis based on Data Linkages
Assessment of Cluster Tree Analysis based on Data LinkagesAssessment of Cluster Tree Analysis based on Data Linkages
Assessment of Cluster Tree Analysis based on Data Linkages
 
UNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningUNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data Mining
 
An Iterative Improved k-means Clustering
An Iterative Improved k-means ClusteringAn Iterative Improved k-means Clustering
An Iterative Improved k-means Clustering
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataMPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
 
BIM Data Mining Unit1 by Tekendra Nath Yogi
 BIM Data Mining Unit1 by Tekendra Nath Yogi BIM Data Mining Unit1 by Tekendra Nath Yogi
BIM Data Mining Unit1 by Tekendra Nath Yogi
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
 
Clustering, application, methods u 1
Clustering, application, methods u 1Clustering, application, methods u 1
Clustering, application, methods u 1
 
A Survey on the Clustering Algorithms in Sales Data Mining
A Survey on the Clustering Algorithms in Sales Data MiningA Survey on the Clustering Algorithms in Sales Data Mining
A Survey on the Clustering Algorithms in Sales Data Mining
 
Cluster analysis (2).docx
Cluster analysis (2).docxCluster analysis (2).docx
Cluster analysis (2).docx
 
Survey on Text Mining Based on Social Media Comments as Big Data Analysis Usi...
Survey on Text Mining Based on Social Media Comments as Big Data Analysis Usi...Survey on Text Mining Based on Social Media Comments as Big Data Analysis Usi...
Survey on Text Mining Based on Social Media Comments as Big Data Analysis Usi...
 

More from Tekendra Nath Yogi

Unit4: Knowledge Representation
Unit4: Knowledge RepresentationUnit4: Knowledge Representation
Unit4: Knowledge RepresentationTekendra Nath Yogi
 
Unit3:Informed and Uninformed search
Unit3:Informed and Uninformed searchUnit3:Informed and Uninformed search
Unit3:Informed and Uninformed searchTekendra Nath Yogi
 
BIM Data Mining Unit4 by Tekendra Nath Yogi
 BIM Data Mining Unit4 by Tekendra Nath Yogi BIM Data Mining Unit4 by Tekendra Nath Yogi
BIM Data Mining Unit4 by Tekendra Nath YogiTekendra Nath Yogi
 
B. SC CSIT Computer Graphics Unit 5 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 5 By Tekendra Nath YogiB. SC CSIT Computer Graphics Unit 5 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 5 By Tekendra Nath YogiTekendra Nath Yogi
 
B. SC CSIT Computer Graphics Lab By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Lab By Tekendra Nath YogiB. SC CSIT Computer Graphics Lab By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Lab By Tekendra Nath YogiTekendra Nath Yogi
 
B. SC CSIT Computer Graphics Unit 4 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 4 By Tekendra Nath YogiB. SC CSIT Computer Graphics Unit 4 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 4 By Tekendra Nath YogiTekendra Nath Yogi
 
B. SC CSIT Computer Graphics Unit 3 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 3 By Tekendra Nath YogiB. SC CSIT Computer Graphics Unit 3 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 3 By Tekendra Nath YogiTekendra Nath Yogi
 
B. SC CSIT Computer Graphics Unit 2 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 2 By Tekendra Nath YogiB. SC CSIT Computer Graphics Unit 2 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 2 By Tekendra Nath YogiTekendra Nath Yogi
 

More from Tekendra Nath Yogi (20)

Unit9:Expert System
Unit9:Expert SystemUnit9:Expert System
Unit9:Expert System
 
Unit7: Production System
Unit7: Production SystemUnit7: Production System
Unit7: Production System
 
Unit8: Uncertainty in AI
Unit8: Uncertainty in AIUnit8: Uncertainty in AI
Unit8: Uncertainty in AI
 
Unit5: Learning
Unit5: LearningUnit5: Learning
Unit5: Learning
 
Unit4: Knowledge Representation
Unit4: Knowledge RepresentationUnit4: Knowledge Representation
Unit4: Knowledge Representation
 
Unit3:Informed and Uninformed search
Unit3:Informed and Uninformed searchUnit3:Informed and Uninformed search
Unit3:Informed and Uninformed search
 
Unit2: Agents and Environment
Unit2: Agents and EnvironmentUnit2: Agents and Environment
Unit2: Agents and Environment
 
Unit1: Introduction to AI
Unit1: Introduction to AIUnit1: Introduction to AI
Unit1: Introduction to AI
 
Unit 6: Application of AI
Unit 6: Application of AIUnit 6: Application of AI
Unit 6: Application of AI
 
Unit10
Unit10Unit10
Unit10
 
Unit9
Unit9Unit9
Unit9
 
Unit8
Unit8Unit8
Unit8
 
Unit7
Unit7Unit7
Unit7
 
BIM Data Mining Unit4 by Tekendra Nath Yogi
 BIM Data Mining Unit4 by Tekendra Nath Yogi BIM Data Mining Unit4 by Tekendra Nath Yogi
BIM Data Mining Unit4 by Tekendra Nath Yogi
 
Unit6
Unit6Unit6
Unit6
 
B. SC CSIT Computer Graphics Unit 5 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 5 By Tekendra Nath YogiB. SC CSIT Computer Graphics Unit 5 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 5 By Tekendra Nath Yogi
 
B. SC CSIT Computer Graphics Lab By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Lab By Tekendra Nath YogiB. SC CSIT Computer Graphics Lab By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Lab By Tekendra Nath Yogi
 
B. SC CSIT Computer Graphics Unit 4 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 4 By Tekendra Nath YogiB. SC CSIT Computer Graphics Unit 4 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 4 By Tekendra Nath Yogi
 
B. SC CSIT Computer Graphics Unit 3 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 3 By Tekendra Nath YogiB. SC CSIT Computer Graphics Unit 3 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 3 By Tekendra Nath Yogi
 
B. SC CSIT Computer Graphics Unit 2 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 2 By Tekendra Nath YogiB. SC CSIT Computer Graphics Unit 2 By Tekendra Nath Yogi
B. SC CSIT Computer Graphics Unit 2 By Tekendra Nath Yogi
 

Recently uploaded

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Recently uploaded (20)

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Contd…
• In business intelligence:
  – Clustering can help marketers discover distinct groups in their customer bases and characterize customer groups based on purchasing patterns so that, for example, advertising can be appropriately targeted.
Contd…
• In image recognition:
  – Clustering can be used to discover clusters or “subclasses” in handwritten character recognition systems.
  – For example, in recognizing the handwritten digit “2,” some people may write it with a small circle at the bottom left, while others may not. Clustering can determine subclasses for “2,” each of which represents a variation on the way in which 2 can be written.
Contd…
• In web search:
  – Document grouping: clustering can be used to organize search results into groups and present the results in a concise and easily accessible way.
  – Clustering Weblog data can discover groups of similar access patterns.

Contd…
• In outlier detection:
  – Clustering can also be used for outlier detection, where outliers (values that are “far away” from any cluster) may be more interesting than common cases.
  – Applications of outlier detection include the detection of credit card fraud and the monitoring of criminal activities in electronic commerce.

Contd…
• In biology:
  – Clustering can be used to derive plant and animal taxonomies, categorize genes with similar functionality, and gain insight into structures inherent in populations.
Requirements of Clustering in Data Mining
• The following are typical requirements of clustering in data mining:
  – Scalability
  – Ability to deal with different types of attributes
  – Discovery of clusters with arbitrary shape
  – Minimal requirements for domain knowledge to determine input parameters
  – Ability to deal with noisy data
  – Incremental clustering and insensitivity to input order
  – Capability of clustering high-dimensionality data
  – Constraint-based clustering
  – Interpretability and usability
Contd…
• Scalability:
  – Many clustering algorithms work well on small data sets containing fewer than several hundred data objects; however, a large database may contain millions of objects.
  – Clustering on a sample of a given large data set may lead to biased results.
  – Highly scalable clustering algorithms are needed.

Contd…
• Ability to deal with different types of attributes:
  – Many algorithms are designed to cluster interval-based (numerical) data.
  – However, applications may require clustering other types of data, such as binary, categorical (nominal), and ordinal data, or mixtures of these data types.

Contd…
• Discovery of clusters with arbitrary shape:
  – Many clustering algorithms determine clusters based on Euclidean distance measures. Algorithms based on such distance measures tend to find spherical clusters with similar size and density.
  – However, a cluster could be of any shape, so it is important to develop algorithms that can detect clusters of arbitrary shape.

Contd…
• Minimal requirements for domain knowledge to determine input parameters:
  – Many clustering algorithms require users to input certain parameters in cluster analysis (such as the number of desired clusters), and the clustering results can be quite sensitive to these parameters.
  – Parameters are often difficult to determine, especially for data sets containing high-dimensional objects.
  – This not only burdens users, but also makes the quality of clustering difficult to control.

Contd…
• Ability to deal with noisy data:
  – Most real-world databases contain outliers or missing, unknown, or erroneous data.
  – Some clustering algorithms are sensitive to such data and may lead to clusters of poor quality.

Contd…
• Incremental clustering and insensitivity to the order of input records:
  – Some clustering algorithms cannot incorporate newly inserted data (i.e., database updates) into existing clustering structures and, instead, must determine a new clustering from scratch.
  – Some clustering algorithms are sensitive to the order of input data: given a set of data objects, such an algorithm may return dramatically different clusterings depending on the order of presentation of the input objects.
  – It is important to develop incremental clustering algorithms and algorithms that are insensitive to the order of input.

Contd…
• High dimensionality:
  – A database or a data warehouse can contain several dimensions or attributes.
  – Many clustering algorithms are good at handling low-dimensional data, involving only two to three dimensions; human eyes are good at judging the quality of clustering for up to three dimensions.
  – Finding clusters of data objects in high-dimensional space is challenging, especially considering that such data can be sparse and highly skewed.

Contd…
• Constraint-based clustering:
  – Real-world applications may need to perform clustering under various kinds of constraints.
  – Suppose that your job is to choose the locations for a given number of new automated teller machines (ATMs) in a city. To decide upon this, you may cluster households while considering constraints such as the city’s rivers and highway networks, and the type and number of customers per cluster.
  – A challenging task is to find groups of data with good clustering behavior that satisfy specified constraints.

Contd…
• Interpretability and usability:
  – Users expect clustering results to be interpretable, comprehensible, and usable; that is, clustering may need to be tied to specific semantic interpretations and applications.
  – It is important to study how an application goal may influence the selection of clustering features and methods.
Major Clustering Methods
• In general, the major fundamental clustering methods can be classified into the following categories:
  – Partitioning methods
  – Hierarchical methods
  – Density-based methods
  – Grid-based methods

Contd…
• Partitioning methods:
  – Given a data set, D, of n objects, and k, the number of clusters to form, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n.
  – That is, it classifies the data into k groups, which together satisfy the following requirements:
    • Each group must contain at least one object, and
    • Each object must belong to exactly one group.

Contd…
• A partitioning method creates an initial partitioning. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another.
• The general criterion of a good partitioning is that objects in the same cluster are close or related to each other, whereas objects of different clusters are far apart or very different.
k-Means: A Centroid-Based Technique
• A centroid-based partitioning technique uses the centroid of a cluster, Ci, to represent that cluster.
• The centroid of a cluster is its center point, such as the mean of the objects (or points) assigned to the cluster.
• The distance between an object p and ci, the representative of the cluster, is measured by dist(p, ci), where dist(p, ci) is the Euclidean distance between the two points.
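The Euclidean distance used as the similarity measure here can be computed directly. A minimal sketch (the helper name euclidean is our own):

```python
import math

def euclidean(p, q):
    # dist(p, q) = square root of the sum of squared per-dimension differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# distance between (1, 1.5) and (1, 4.5)
print(euclidean((1, 1.5), (1, 4.5)))  # 3.0
```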
  • 26. July 5, 2019 By:Tekendra Nath Yogi 26 Contd.. • The k-means algorithm defines the centroid of a cluster as the mean value of the points within the cluster. It proceeds as follows: – First, it randomly selects k of the objects in D, each of which initially represents a cluster mean or center. – Each of the remaining objects is assigned to the cluster to which it is the most similar, based on the Euclidean distance between the object and the cluster mean. – The k-means algorithm then iteratively reduces the within-cluster variation. For each cluster, it computes the new mean using the objects assigned to the cluster in the previous iteration. All the objects are then reassigned using the updated means as the new cluster centers. – The iterations continue until the assignment is stable, that is, the clusters formed in the current round are the same as those formed in the previous round.
  • 27. July 5, 2019 By:Tekendra Nath Yogi 27 Contd.. • Algorithm: – The k-means algorithm for partitioning, where each cluster’s center is represented by the mean value of the objects in the cluster.
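The steps above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the slides' own pseudocode; the function name and the random initial-center selection are assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: pick k initial centers at random, assign each point
    to its nearest center, recompute the means, and repeat until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # Update step: each center becomes the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:   # assignments are stable: terminate
            break
        centers = new_centers
    return centers, clusters
```

The termination test compares the recomputed means with the previous ones, which matches the "same clusters as the previous round" criterion on the slide.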
  • 28. 28 Contd… • Example 1: Cluster the following instances of the given 2-dimensional data with the k-means algorithm (take k = 2):
Instance   X   Y
1          1   1.5
2          1   4.5
3          2   1.5
4          2   3.5
5          3   2.5
6          3   4
  • 29. July 5, 2019 By:Tekendra Nath Yogi 29 Contd… • Solution: – Given, number of clusters to be created (k) = 2. – Initially choose two points at random as the initial cluster centers; say objects 1 and 3 are chosen, – i.e., c1 = (1, 1.5) and c2 = (2, 1.5)
  • 30. July 5, 2019 By:Tekendra Nath Yogi 30 Contd… • Iteration 1: – Now calculating similarity by using the Euclidean distance measure: – dist(c1, 2) = √((1 - 1)² + (1.5 - 4.5)²) = 3 – dist(c2, 2) = √((2 - 1)² + (1.5 - 4.5)²) = 3.162 – Here, dist(c1, 2) < dist(c2, 2) – So, data point 2 belongs to c1.
  • 31. July 5, 2019 By:Tekendra Nath Yogi 31 Contd… – dist(c1, 4) = √((1 - 2)² + (1.5 - 3.5)²) = 2.236 – dist(c2, 4) = √((2 - 2)² + (1.5 - 3.5)²) = 2 – Here, dist(c2, 4) < dist(c1, 4) – So, data point 4 belongs to c2. – dist(c1, 5) = √((1 - 3)² + (1.5 - 2.5)²) = 2.236 – dist(c2, 5) = √((2 - 3)² + (1.5 - 2.5)²) = 1.414 – Here, dist(c2, 5) < dist(c1, 5) – So, data point 5 belongs to c2.
  • 32. July 5, 2019 By:Tekendra Nath Yogi 32 Contd… – dist(c1, 6) = √((1 - 3)² + (1.5 - 4)²) = 3.2 – dist(c2, 6) = √((2 - 3)² + (1.5 - 4)²) = 2.7 – Here, dist(c2, 6) < dist(c1, 6) – So, data point 6 belongs to c2. – The resulting clusters after the 1st iteration are: C1 = {1, 2}, C2 = {3, 4, 5, 6}
  • 33. July 5, 2019 By:Tekendra Nath Yogi 33 Contd… • Iteration 2: • Now calculating the centroid of each cluster: – Centroid for c1 = ((1+1)/2, (1.5+4.5)/2) = (1, 3) – Centroid for c2 = ((2+2+3+3)/4, (1.5+3.5+2.5+4)/4) = (2.5, 2.875) – Now, again calculating similarity: – dist(c1, 1) = √((1 - 1)² + (3 - 1.5)²) = 1.5 – dist(c2, 1) = √((2.5 - 1)² + (2.875 - 1.5)²) = 2.035 – Here, dist(c1, 1) < dist(c2, 1) – So, data point 1 belongs to c1.
  • 34. July 5, 2019 By:Tekendra Nath Yogi 34 Contd… – dist(c1, 2) = √((1 - 1)² + (3 - 4.5)²) = 1.5 – dist(c2, 2) = √((2.5 - 1)² + (2.875 - 4.5)²) = 2.21 – Here, dist(c1, 2) < dist(c2, 2) – So, data point 2 belongs to c1. – dist(c1, 3) = √((1 - 2)² + (3 - 1.5)²) = 1.8 – dist(c2, 3) = √((2.5 - 2)² + (2.875 - 1.5)²) = 1.463 – Here, dist(c2, 3) < dist(c1, 3) – So, data point 3 belongs to c2.
  • 35. July 5, 2019 By:Tekendra Nath Yogi 35 Contd… – dist(c1,4) = √(1 - 2)² + (3- 3.5)²=1.12 – dist(c2, 4) = √(2.5 - 2)² + (2.875 – 3.5)²=0.8 – Here, dist(c2, 4)< dist(c1,4) – So, data point 4 belongs to c2. – dist(c1,5) = √(1 - 3)² + (3- 2.5)²=2.06 – dist(c2, 5) = √(2.5 - 3)² + (2.875 – 2.5)²=0.625 – Here, dist(c2, 5)< dist(c1,5) – So, data point 5 belongs to c2.
  • 36. July 5, 2019 By:Tekendra Nath Yogi 36 Contd… – dist(c1, 6) = √((1 - 3)² + (3 - 4)²) = 2.236 – dist(c2, 6) = √((2.5 - 3)² + (2.875 - 4)²) = 1.231 – Here, dist(c2, 6) < dist(c1, 6) – So, data point 6 belongs to c2. The resulting clusters after the 2nd iteration are: C1 = {1, 2}, C2 = {3, 4, 5, 6}. Same as iteration 1, so terminate.
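The two iterations of Example 1 can be replayed in a few lines of Python. This is a sketch specific to k = 2; the point numbering follows the table in Example 1, and equidistant points are broken toward c1.

```python
import math

points = {1: (1, 1.5), 2: (1, 4.5), 3: (2, 1.5),
          4: (2, 3.5), 5: (3, 2.5), 6: (3, 4)}
c1, c2 = (1, 1.5), (2, 1.5)          # initial centers: objects 1 and 3

for it in range(1, 10):
    # Assignment step: each point joins the nearer of the two centers.
    g1 = [i for i, p in points.items() if math.dist(p, c1) <= math.dist(p, c2)]
    g2 = [i for i in points if i not in g1]
    # Update step: recompute each center as its cluster's mean.
    mean = lambda g: tuple(sum(points[i][d] for i in g) / len(g) for d in (0, 1))
    n1, n2 = mean(g1), mean(g2)
    if (n1, n2) == (c1, c2):          # stable assignment: terminate
        break
    c1, c2 = n1, n2

print(g1, g2)   # → [1, 2] [3, 4, 5, 6]
```

After iteration 1 the centers move to (1, 3) and (2.5, 2.875); iteration 2 leaves both the groups and the means unchanged, so the loop terminates, matching the worked solution above.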
  • 41. 41 Contd.. • Example 3: Cluster the following instances of the given 2-dimensional data with the k-means algorithm (take k = 2):
Instance   X     Y
1          1     2.5
2          1     4.5
3          2.5   3
4          2     1.5
5          4.5   1.5
6          4     5
  • 42. July 5, 2019 By:Tekendra Nath Yogi 42 Contd… • Weaknesses of k-means: – Applicable only when the mean is defined. – Need to specify k, the number of clusters, in advance. – Unable to handle outliers.
  • 43. July 5, 2019 By:Tekendra Nath Yogi 43 Hierarchical clustering • A hierarchical clustering method works by grouping data objects into a hierarchy or “tree” of clusters. • Representing data objects in the form of a hierarchy is useful for data summarization and visualization.
  • 44. July 5, 2019 By:Tekendra Nath Yogi 44 Contd.. • Depending on whether the hierarchical decomposition is formed in a bottom-up (merging) or top-down (splitting) fashion, a hierarchical clustering method can be classified into two categories: – Agglomerative Hierarchical Clustering and – Divisive Hierarchical Clustering
  • 45. July 5, 2019 By:Tekendra Nath Yogi 45 Contd.. • Agglomerative Hierarchical Clustering: – uses a bottom-up strategy. – starts by letting each object form its own cluster and iteratively merges clusters into larger and larger clusters, until all the objects are in a single cluster or certain termination conditions(desired number of clusters) are satisfied. – For the merging step, it finds the two clusters that are closest to each other (according to some similarity measure), and combines the two to form one cluster.
  • 46. July 5, 2019 By:Tekendra Nath Yogi 46 Contd.. • Example: a data set of five objects, {a, b, c, d, e}. Initially, AGNES (AGglomerative NESting), the agglomerative method, places each object into a cluster of its own. The clusters are then merged step-by-step according to some criterion (e.g., minimum Euclidean distance).
  • 47. July 5, 2019 By:Tekendra Nath Yogi 47 Contd.. • Divisive hierarchical clustering: – A divisive hierarchical clustering method employs a top-down strategy. – It starts by placing all objects in one cluster, which is the hierarchy’s root. – It then divides the root cluster into several smaller sub-clusters, and recursively partitions those clusters into smaller ones. – The partitioning process continues until each cluster at the lowest level either contains only one object, or the objects within it are sufficiently similar to each other.
  • 48. July 5, 2019 By:Tekendra Nath Yogi 48 Contd.. • Example: DIANA (DIvisive ANAlysis), a divisive hierarchical clustering method: – a data set of five objects, {a, b, c, d, e}. All the objects are used to form one initial cluster. The cluster is split according to some principle such as the maximum Euclidean distance between the closest neighboring objects in the cluster. The cluster-splitting process repeats until, eventually, each new cluster contains only a single object.
  • 49. July 5, 2019 By:Tekendra Nath Yogi 49 Contd.. • agglomerative versus divisive hierarchical clustering: – Organize objects into a hierarchy using a bottom-up or top-down strategy, respectively. – Agglomerative methods start with individual objects as clusters, which are iteratively merged to form larger clusters. – Conversely, divisive methods initially let all the given objects form one cluster, which they iteratively split into smaller clusters.
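The bottom-up (AGNES-style) strategy described above can be sketched compactly. This is an illustrative implementation, not AGNES itself; it assumes single-link (minimum) distance as the merge criterion, one of the options the slides mention.

```python
import math

def agglomerative(points, num_clusters):
    """Bottom-up clustering with single-link distance: start with one
    cluster per point and repeatedly merge the closest pair of clusters
    until the desired number of clusters remains."""
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest single-link distance,
        # i.e. the minimum distance over all cross-cluster point pairs.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)   # merge cluster j into cluster i
    return clusters
```

A divisive method would run the same hierarchy in reverse: start from one cluster holding all points and recursively split it.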
  • 50. July 5, 2019 By:Tekendra Nath Yogi 50 Contd.. • Hierarchical clustering methods can encounter difficulties regarding the selection of merge or split points. – Such a decision is critical, because once a group of objects is merged or split, the process at the next step will operate on the newly generated clusters. It will neither undo what was done previously, nor perform object swapping between clusters. – Thus, merge or split decisions, if not well chosen, may lead to low-quality clusters. • Moreover, the methods do not scale well, because each merge or split decision needs to examine and evaluate many objects or clusters.
  • 51. 7/5/2019 By:Tekendra Nath Yogi 51 Density Based Methods • Partitioning methods and hierarchical clustering are suitable for finding spherical-shaped clusters. • Moreover, they are severely affected by the presence of noise and outliers in the data. • Unfortunately, real-life data contain: – Clusters of arbitrary shape, such as oval, linear, s-shaped, etc. – Much noise • Solution: density-based methods
  • 52. 7/5/2019 By:Tekendra Nath Yogi 52 Contd.. • Basic idea behind density-based methods: – Model clusters as dense regions in the data space, separated by sparse regions. • Major features: – Discover clusters of arbitrary shape (e.g., oval, s-shaped, etc.) – Handle noise – Need density parameters as a termination condition • E.g.: DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
  • 53. Density-Based Clustering: Background • Eps-neighborhood of a point p = all points within distance Eps of p: – NEps(p) = {q | dist(p, q) ≤ Eps} • Two parameters: – Eps: maximum radius of the neighborhood – MinPts: minimum number of points in an Eps-neighborhood of that point • If the number of points in the Eps-neighborhood of p is at least MinPts, then p is called a core object. (Figure example: MinPts = 5, Eps = 1 cm)
  • 54. Contd.. • Directly density-reachable: – A point p is directly density-reachable from a point q w.r.t. Eps, MinPts if • 1) p belongs to NEps(q), and • 2) q satisfies the core point condition: |NEps(q)| ≥ MinPts. (Figure example: MinPts = 5, Eps = 1 cm)
  • 55. Contd.. • Density-reachable: – A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, …, pn with p1 = q and pn = p such that each pi+1 is directly density-reachable from pi.
  • 56. Contd.. • Density-connected: – A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts.
  • 57. 7/5/2019 By:Tekendra Nath Yogi 57 Contd.. • Density = the number of points within a specified radius (Eps). • A point is a core point if it has at least a specified number of points (MinPts) within Eps; the count includes the point itself. Core points lie in the interior of a cluster. • A border point is not a core point, but lies in the neighborhood of a core point. • A noise point is any point that is neither a core point nor a border point. E.g., MinPts = 7.
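The three point types above can be computed directly from the Eps-neighborhoods. A minimal sketch (function name and point representation are assumptions; the neighborhood counts the point itself, as on the slide):

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' following the
    DBSCAN definitions: core = at least min_pts points within eps
    (counting itself); border = non-core in a core point's neighborhood;
    noise = everything else."""
    n = len(points)
    neigh = [[q for q in range(n) if math.dist(points[p], points[q]) <= eps]
             for p in range(n)]
    labels = []
    for p in range(n):
        if len(neigh[p]) >= min_pts:
            labels.append("core")
        elif any(len(neigh[q]) >= min_pts for q in neigh[p]):
            labels.append("border")   # not core, but near a core point
        else:
            labels.append("noise")
    return labels
```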
  • 58. 7/5/2019 By:Tekendra Nath Yogi 58 DBSCAN (Density-Based Spatial Clustering of Applications with Noise) • DBSCAN marks all objects as unvisited and randomly selects an unvisited object p. If the Eps-neighborhood of p contains at least MinPts objects, a new cluster is created for p and all objects density-reachable from p are added to it; otherwise, p is tentatively marked as noise. • To find the next cluster, DBSCAN randomly selects an unvisited object from the remaining ones. The clustering process continues until all objects are visited.
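The DBSCAN procedure can be sketched as follows. This is an illustrative implementation, not the original pseudocode; it visits points in index order rather than randomly, and MinPts counts the point itself.

```python
import math

def dbscan(points, eps, min_pts):
    """A compact DBSCAN sketch. Returns a label per point:
    -1 = noise, 0..k = cluster id."""
    n = len(points)
    neigh = [[q for q in range(n) if math.dist(points[p], points[q]) <= eps]
             for p in range(n)]
    labels = [None] * n              # None = unvisited
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        if len(neigh[p]) < min_pts:
            labels[p] = -1           # tentatively noise (may become border)
            continue
        cluster += 1                 # p is a core point: start a new cluster
        labels[p] = cluster
        seeds = list(neigh[p])
        while seeds:                 # expand the cluster through core points
            q = seeds.pop()
            if labels[q] == -1:      # former noise becomes a border point
                labels[q] = cluster
            if labels[q] is not None:
                continue
            labels[q] = cluster
            if len(neigh[q]) >= min_pts:
                seeds.extend(neigh[q])
    return labels
```

On the 8-point example of the next slides (Eps = 2, MinPts = 2), this sketch reproduces the two clusters {A3, A5, A6} and {A4, A8} with A1, A2, A7 as noise.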
  • 60. 7/5/2019 By:Tekendra Nath Yogi 60 Contd.. • Example: – If Eps is 2 and MinPts is 2, what are the clusters that DBSCAN would discover in the following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9)? • Solution: – d(a, b) denotes the Euclidean distance between a and b. It is obtained directly from the distance matrix, calculated as: – d(a, b) = √((xb − xa)² + (yb − ya)²)
  • 61. 7/5/2019 By:Tekendra Nath Yogi 61 Contd..
       A1    A2    A3    A4    A5    A6    A7    A8
  A1   0     √25   √72   √13   √50   √52   √65   √5
  A2         0     √37   √18   √25   √17   √10   √20
  A3               0     √25   √2    √4    √53   √41
  A4                     0     √13   √17   √52   √2
  A5                           0     √2    √45   √25
  A6                                 0     √29   √29
  A7                                       0     √58
  A8                                             0
  • 62. 7/5/2019 By:Tekendra Nath Yogi 62 Contd.. • N2(A1)={}; • N2(A2)={}; • N2(A3)={A5, A6}; • N2(A4)={A8}; • N2(A5)={A3, A6}; • N2(A6)={A3, A5}; • N2(A7)={}; • N2(A8)={A4}; • So A1, A2, and A7 are outliers, while we have two clusters C1={A4, A8} and C2={A3, A5, A6}
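The N2 sets above can be checked in a few lines of Python. A sketch mirroring the slide's convention that the neighborhood excludes the point itself:

```python
import math

pts = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
       "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}

# Eps-neighborhoods (excluding the point itself), Eps = 2
eps = 2
neigh = {a: [b for b in pts if b != a and math.dist(pts[a], pts[b]) <= eps]
         for a in pts}

print(neigh["A3"])              # → ['A5', 'A6']
# Points with an empty neighborhood are the outliers.
outliers = sorted(a for a in pts if not neigh[a])
print(outliers)                 # → ['A1', 'A2', 'A7']
```

The remaining points split into the two clusters found above: A4 and A8 are mutual neighbors, and A3, A5, A6 are connected through each other.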
  • 64. 7/5/2019 By:Tekendra Nath Yogi 64 Advantages and Disadvantages of the DBSCAN algorithm: • Advantages: – DBSCAN does not require one to specify the number of clusters in the data a priori, as opposed to k-means. – DBSCAN can find arbitrarily shaped clusters. – DBSCAN is robust to outliers. – DBSCAN is mostly insensitive to the ordering of the points in the database. – The parameters MinPts and ε can be set by a domain expert, if the data is well understood.
  • 65. 7/5/2019 By:Tekendra Nath Yogi 65 Contd.. • Disadvantages: – DBSCAN is not entirely deterministic: border points that are reachable from more than one cluster can be part of either cluster, depending on the order in which the data is processed. Fortunately, this situation does not arise often and has little impact on the clustering result: on both core points and noise points, DBSCAN is deterministic. – DBSCAN cannot cluster data sets well with large differences in densities, since the MinPts-ε combination cannot then be chosen appropriately for all clusters. – If the data and scale are not well understood, choosing a meaningful distance threshold ε can be difficult.
  • 66. 7/5/2019 By:Tekendra Nath Yogi 66 Homework • Explain the aims of cluster analysis. • What is clustering? How is it different from supervised classification? In what situations can clustering be useful? • List and explain the desired features of cluster analysis. • Explain the different types of cluster analysis methods and discuss their features. • Describe the k-means algorithm and write its strengths and weaknesses. • Describe the features of hierarchical clustering methods. In what situations are these methods useful?
  • 67. Thank You! 67 By: Tekendra Nath Yogi 7/5/2019