2. What is Clustering Analysis in Data Mining
• Clustering analysis is the process of grouping similar objects into
clusters; that is, it sorts the objects in unlabeled data into categories
based on their similarities.
• Clustering is an unsupervised machine learning technique.
• How does clustering form groups from an unlabeled dataset? It looks for
similarities or patterns in the data, such as color or shape. It then puts
objects that share the same features into one group and the remaining
objects into other groups.
• Consider the image below for a pictorial view of the clustering concept.
3. Clustering Properties & Methods:
Properties:
• Scalability
• Usability of the algorithm with multiple types of data
• Ability to deal with unstructured data
• Interpretability
Methods:
• Partitioning Clustering
• Density-Based Clustering
• Distribution Model-Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering
4. Applications of Clustering
Below are some commonly known applications of the clustering technique in
Machine Learning:
• Identification of cancer cells: Clustering algorithms are widely used to
identify cancerous cells. They divide cancerous and non-cancerous data into
different groups.
• Customer segmentation: It is used in market research to segment customers
based on their choices and preferences.
• Biology: It is used to classify different species of plants and animals
using image recognition techniques.
• Land use: Clustering is used to identify areas of similar land use in a GIS
database. This helps determine the purpose for which a particular piece of
land is most suitable.
5. K-means Clustering Algorithm
• K-means clustering is an example of partitioning clustering: it divides the
data items into k partitions, each of which represents a cluster.
• K-means clustering is an unsupervised learning algorithm that groups an
unlabeled dataset into different clusters. Here K is the number of clusters
to be created, chosen in advance: if K=2 there will be two clusters, if K=3
there will be three clusters, and so on.
• It is an iterative algorithm that divides the unlabeled dataset into k
different clusters in such a way that each data point belongs to only one
group of points with similar properties.
• The algorithm takes the unlabeled dataset as input, divides it into k
clusters, and repeats the process until the clusters stop improving. The
value of k must be chosen before the algorithm runs.
6. Partitions (that is, clusters) should satisfy two rules:
  ◦ Each partition should contain at least one object.
  ◦ Each data item should belong to exactly one partition (cluster).
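The two rules above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `is_valid_partition` and the sample items are made up for illustration, and the items are assumed to be distinct and hashable.

```python
from collections import Counter


def is_valid_partition(items, clusters):
    """Return True if `clusters` satisfies both partition rules for `items`.

    items    -- list of distinct, hashable data items (hypothetical data)
    clusters -- list of lists, one inner list per cluster
    """
    # Rule 1: every cluster contains at least one object.
    if any(len(cluster) == 0 for cluster in clusters):
        return False
    # Rule 2: every item appears in exactly one cluster, and nothing extra does.
    assigned = [item for cluster in clusters for item in cluster]
    return Counter(assigned) == Counter(items)


items = ["a", "b", "c", "d"]
print(is_valid_partition(items, [["a", "b"], ["c", "d"]]))       # valid split
print(is_valid_partition(items, [["a", "b", "c", "d"], []]))     # empty cluster
print(is_valid_partition(items, [["a", "b"], ["b", "c", "d"]]))  # "b" appears twice
```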
• The k-means clustering algorithm mainly performs two tasks:
  ◦ Determines the best positions for the K center points (centroids)
  through an iterative process.
  ◦ Assigns each data point to its closest centroid. The data points nearest
  to a particular centroid form a cluster.
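The two tasks above alternate until the assignments stop changing. Here is a minimal pure-Python sketch of that loop, assuming Euclidean distance and random starting centroids drawn from the data. The function name `kmeans`, its parameters, and the sample points are illustrative assumptions rather than anything from the original slides.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: start from k data points picked at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment task: each point goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged, so stop
            break
        labels = new_labels
        # Centroid task: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Made-up 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On this toy data the first three points should end up sharing one label and the last three the other. Which group is labeled 0 and which 1 depends on the random start.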
7. The diagram below explains how the K-means
Clustering Algorithm works:
8. Advantages & Disadvantages
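• Advantages:
  ◦ Relatively simple to implement
  ◦ Scales to large data sets
  ◦ Guarantees convergence (to a local optimum, not necessarily the best
  possible clustering)
  ◦ Can warm-start the positions of centroids
  ◦ Easily adapts to new examples
  ◦ Can be generalized to clusters of different shapes and sizes, such as
  elliptical clusters
• Disadvantages:
  ◦ k must be chosen manually
  ◦ Results depend on the initial centroid positions
  ◦ Outliers can drag centroids away from the bulk of the data or end up in
  clusters of their own
  ◦ Distance-based similarity becomes less informative as the number of
  dimensions grows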
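Two of the disadvantages above, choosing k by hand and dependence on the initial values, can be explored on toy data. The sketch below runs 1-D k-means from every possible set of starting centroids drawn from the data and records the best and worst within-cluster sum of squares (WCSS) for each k. The helper name `lloyd_1d`, the data values, and the exhaustive-start strategy are assumptions made for illustration; trying every start is only feasible for very small datasets.

```python
from itertools import combinations


def lloyd_1d(data, centroids, max_iters=100):
    """Run k-means on 1-D data from the given starting centroids; return WCSS."""
    centroids = list(centroids)
    for _ in range(max_iters):
        # Assign each value to its nearest centroid.
        groups = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
            groups[nearest].append(x)
        # Recompute centroids; an empty group keeps its old centroid.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    # Within-cluster sum of squared distances (lower means tighter clusters).
    return sum(min((x - c) ** 2 for c in centroids) for x in data)


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]  # made-up values

scores = {}
for k in range(1, 5):
    # Every k-subset of the data is tried as a starting point.
    results = [lloyd_1d(data, start) for start in combinations(data, k)]
    scores[k] = (min(results), max(results))
    print(k, "best:", round(scores[k][0], 2), "worst:", round(scores[k][1], 2))
```

A sharp drop in the best score that then levels off is the usual "elbow" hint for picking k. A wide gap between the best and worst score for the same k shows how much the result can depend on where the centroids start.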
9. Customer segmentation
• Customer segmentation simply means grouping your customers
according to various characteristics (for example, grouping customers
by age).
• It's a way for organizations to understand their customers. Knowing
the differences between customer groups makes it easier to make
strategic decisions regarding product growth and marketing.
• There are different methodologies for customer segmentation, and
they depend on four types of parameters:
  ◦ geographic,
  ◦ demographic,
  ◦ behavioral,
  ◦ psychological (psychographic).
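The age-based grouping mentioned above can be sketched in a few lines. This is a rule-based illustration with hand-picked age bands, not a clustering algorithm; the customer records, band boundaries, and the name `age_band` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical customers as (name, age) pairs.
customers = [("Ana", 23), ("Ben", 37), ("Cleo", 45), ("Dev", 19), ("Eli", 68)]

# Hand-picked, inclusive age bands (an assumption for this example).
bands = [(0, 24, "under 25"), (25, 44, "25-44"), (45, 64, "45-64"), (65, 200, "65+")]


def age_band(age):
    """Return the label of the band that contains `age`."""
    for low, high, label in bands:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


# Group customer names by their age band.
segments = defaultdict(list)
for name, age in customers:
    segments[age_band(age)].append(name)

for label, names in segments.items():
    print(label, names)
```

With a clustering algorithm such as k-means, the group boundaries would be learned from the data instead of being fixed in advance.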
11. Advantages of customer segmentation
• Implementing customer segmentation opens up plenty of new business
opportunities. It leaves a lot of room for optimization in:
  ◦ budgeting,
  ◦ product design,
  ◦ promotion,
  ◦ marketing,
  ◦ customer satisfaction.
12. Subjects That Involve Clustering Analysis
• Data Warehouse and Data Mining
• Clustering Techniques
• Artificial Intelligence
• Image Processing
• Unsupervised Learning
13. Conclusion
• Customers have different needs. A one-size-fits-all approach to business
will generally result in less engagement, lower click-through rates, and
ultimately fewer sales.
• Customer segmentation is the cure for this problem.
• Finding an optimal number of unique customer groups will help you
understand how your customers differ, and help you give them exactly what
they want.
• Customer segmentation improves customer experience and boosts
company revenue.
14. Future Work
• Use different distance metrics: K-means clustering uses Euclidean
distance by default. Other distance metrics such as Manhattan, cosine, and
Minkowski can be tried and may give better clusters on some data.
• Use different initialization methods: K-means clustering uses random
initialization by default. Other initialization methods such as k-means++
can be used to improve the clustering, and related algorithms such as
k-medoids, which uses actual data points as cluster centers, can also be
tried.
• Use different cluster sizes: K-means clustering uses a fixed
number of clusters.
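The distance metrics named in the first bullet above can be written out as follows. This is a minimal sketch with illustrative function names, not a drop-in replacement for any library. Note that averaging points to get a centroid is tied to squared Euclidean distance, so swapping the metric inside k-means usually calls for a different center update as well (for example, medians for Manhattan distance).

```python
import math


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Manhattan distance: Minkowski with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Euclidean distance: Minkowski with p = 2."""
    return minkowski(a, b, 2)


def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


a, b = (0.0, 0.0), (3.0, 4.0)  # made-up points
print(manhattan(a, b), euclidean(a, b), minkowski(a, b, 3))
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))  # perpendicular vectors
```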