2. Clustering
• Clustering is the process of grouping similar
objects together.
• Clustering can be used to group items in a
supermarket. For example, butter, cheese, and
milk can be placed in the "dairy products"
group.
• Clustering algorithms are mainly used to find
natural groupings in data.
3. • Clustering is an example of unsupervised
learning.
• What is unsupervised learning?
• Machine learning (ML) is commonly divided into
two main fields:
• Supervised ML is a set of tools used for
prediction (linear models, logistic regression, linear
discriminant analysis, classification trees, support
vector machines, and more).
• Unsupervised ML is used for exploratory data
analysis when there is no outcome to predict.
Clustering, its best-known technique, identifies
groups (i.e., clusters) in the data set of
interest. Each group contains observations with a
similar profile according to a specific criterion.
4. • Huge amounts of multidimensional data have
been collected in fields such as marketing,
biomedicine, and geospatial analysis.
Mining knowledge from this data has become a
highly demanding task, because the volume far
exceeds the human ability to analyze it.
Unsupervised machine learning, and clustering
in particular, is one of the important data
mining methods for discovering knowledge in
multidimensional data.
5. • Different Categories of Clustering
• 1. Hierarchical: Hierarchical clustering identifies
clusters within clusters. A news article group can
contain further groups such as business, politics,
and sports, and each of these groups can have
subgroups of its own. For example, sports news
can contain baseball, hockey, and basketball
news.
• 2. Partitional: Partitional clustering divides the
data into a fixed number of clusters.
• Example: the K Means algorithm
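• As a rough illustration of "clusters within clusters", the sketch below builds a hierarchy bottom-up by repeatedly merging the two closest groups. This is only one possible way to build a hierarchy (agglomerative clustering with single linkage), chosen here as an assumption. The function names leaves, linkage_distance, and agglomerate and the sample numbers are hypothetical.

```python
def leaves(tree):
    """Return every raw value stored under a (possibly nested) cluster."""
    if isinstance(tree, tuple):
        return [value for child in tree for value in leaves(child)]
    return [tree]


def linkage_distance(a, b):
    """Single linkage (assumption): distance between the closest pair of members."""
    return min(abs(x - y) for x in leaves(a) for y in leaves(b))


def agglomerate(values):
    """Merge the two closest clusters until a single nested cluster remains."""
    clusters = list(values)
    while len(clusters) > 1:
        # Find the indices of the two clusters that are closest to each other.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = (clusters[i], clusters[j])
        # Replace the two merged clusters with their combined parent cluster.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical one-dimensional sample data.
tree = agglomerate([1.0, 2.0, 10.0, 11.5, 30.0])
print(tree)
```

• The result is a nested tuple in which small groups sit inside larger ones. A partitional method would instead return one flat set of groups.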
6. • What is K Means Clustering?
• K Means Clustering is an unsupervised learning
algorithm that tries to cluster data points based on
their similarity. Unsupervised learning means that
there is no outcome to be predicted, and the algorithm
just tries to find patterns in the data. In K Means
clustering, we have to specify the number of clusters
(k) we want the data to be grouped into. The algorithm
starts by randomly assigning each observation to a
cluster and finding the centroid (mean) of each cluster.
• Then, the algorithm iterates through two steps
until the cluster assignments stop changing:
• Reassign each data point to the cluster whose
centroid is closest.
• Calculate the new centroid of each cluster.
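• The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (k_means, update_centroids, centroid, squared_distance), the seed and max_iter parameters, the handling of empty clusters, and the sample points are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def update_centroids(points, labels, previous):
    """Recompute each centroid; an empty cluster keeps its previous one (assumption)."""
    centroids = []
    for c, old in enumerate(previous):
        members = [p for p, label in zip(points, labels) if label == c]
        centroids.append(centroid(members) if members else old)
    return centroids


def k_means(points, k, max_iter=100, seed=0):
    """Return (labels, centroids) for k clusters; requires k <= len(points)."""
    rng = random.Random(seed)
    # Randomly assign each observation to one of the k clusters.
    labels = [rng.randrange(k) for _ in points]
    # Find the centroid of each cluster; randomly chosen data points serve as
    # fallback centroids for clusters that start out empty (assumption).
    centroids = update_centroids(points, labels, rng.sample(points, k))
    for _ in range(max_iter):
        # Step 1: reassign every point to the cluster whose centroid is closest.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Step 2: calculate the new centroid of each cluster.
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids


# Hypothetical sample data: two well-separated groups of points.
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (10.0, 10.0), (11.0, 11.0), (13.5, 13.5)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

• For this sample data, the first three points end up in one cluster and the last three in the other. Which cluster is numbered 0 or 1 depends on the random start.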