The document describes an unsupervised k-means clustering algorithm project. It includes an introduction to k-means clustering, a literature review of previous related work, and the problem statement, aim, and objectives of proposing a novel unsupervised k-means algorithm. The methodology section outlines the u-k-means clustering algorithm and flowchart. Software implementations in Python are provided using scikit-learn and matplotlib to visualize the clusters. The algorithm is tested on the iris dataset to find the best model parameters.
1. AIML IE2 Project
Gr.No.-10 (P-10)
Unsupervised k-means
clustering
Group members – TYETB078 Vidit Agarwal
TYETB079 Randhirsinh Bhosale
TYETB080 Hetal Gupta
TYETB081 Isha Shetty
TYETB082 Vishwatej Katkar
Guide name - Dr. Rajani P.K.
ELECTRONICS & TELECOMMUNICATION ENGINEERING
Pimpri Chinchwad College of Engineering, Pune
2. OUTLINE OF PROJECT
• Introduction
• Literature review
• Need of the Project
• Problem Statement, Aim, Objectives
• Methodology
• Block Diagram
• Algorithm and Flowchart
• Software Implementations
• Advantages & Applications
• Work Done till date
• Results & Analysis
• Conclusion
• References
• Project Outcome
2
17-10-2023 Unsupervised K-means Clustering Algorithm
3. Introduction
• Clustering is a useful tool in data science. It is a method for finding cluster structure in a data set, characterized by the greatest similarity within the same cluster and the greatest dissimilarity between different clusters.
• The k-means algorithm is the oldest and most popular partitional method. K-means clustering has been widely studied, with various extensions in the literature, and applied in a variety of substantive areas. However, these k-means clustering algorithms are usually affected by initializations and need to be given a number of clusters a priori. In general, the cluster number is unknown. In this case, validity indices can be used to find a cluster number, where they are supposed to be independent of the clustering algorithm.
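The validity-index idea above can be sketched with scikit-learn's silhouette score, one common index for choosing a cluster number. This is a minimal illustration of the general idea, not the selection mechanism of the U-k-means paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X = load_iris().data

# Run k-means for several candidate cluster numbers and score each
# partition with the silhouette index (higher = better separated).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

Note that an index chosen this way still requires running the clustering once per candidate k, which is exactly the cost the U-k-means schema tries to avoid.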
4. Literature Review
Title: Unsupervised K-Means Clustering Algorithm
Author: Kristina P. Sinaga and Miin-Shen Yang
Objective: The k-means algorithm is generally the best-known and most widely used clustering method, and various extensions of it have been proposed in the literature, although it is an unsupervised approach to clustering in pattern recognition and machine learning.
Algorithm: K-means clustering algorithm
Methodology: Clustering is a useful tool in data science. It is a method for finding cluster structure in a data set characterized by the greatest similarity within the same cluster and the greatest dissimilarity between different clusters.
Conclusion: The paper proposes a new schema with a learning framework for the k-means clustering algorithm, adopting the merit of entropy-type penalty terms to construct a competition schema.
5. Literature Review (continued)
Title: Unsupervised K-Means Clustering Algorithm
Author: Yogiraj Singh Kushawah, Ashish Mohan Yadav
Objective: Data-analysis-supported unsupervised clustering is one of the fastest growing research areas, owing to the availability of huge quantities of data to analyze and the need to extract useful information and improve the performance of clustering algorithms.
Algorithm: K-means clustering (with PCA)
Methodology: Data mining is the process of extracting useful and hidden information from large datasets; the information so extracted can be used to improve unsupervised clustering. Clustering is an unsupervised technique that groups similar objects with minimum cluster distance into a cluster while eliminating inappropriate data objects.
Conclusion: Data mining divides clusters from a large data set and transforms them into understandable extracted information. Clustering plays a very important role in data mining.
6. Literature Review (continued)
Title: Unsupervised K-Means Clustering Algorithm
Author: Salim Dridi
Objective: Many unsupervised learning approaches and algorithms have been introduced over the last decade and are well known and widely used. The growing interest in applying unsupervised learning techniques has led to great success in many fields.
Algorithm: K-means clustering
Methodology: Unsupervised learning refers to algorithms that identify patterns in data sets containing data points that are neither classified nor labelled. The algorithms classify, label, and group the data points within the data sets without any external guidance; the user does not need to supervise the model.
Conclusion: K-Means clustering is an unsupervised learning algorithm. It arranges the unlabeled dataset into several clusters, where "K" denotes the number of clusters.
7. Need of the project
• Unsupervised learning methods do not need labeled data, which reduces the requirements on the collected data. Unsupervised learning relies on its own training procedure and classifies the original data with little or no already-known label information.
• K-Means is a type of partitioning clustering and hence one of the simplest yet powerful machine learning algorithms. As it is an unsupervised algorithm, K-Means makes its inferences from datasets using only input vectors, without referring to labeled outcomes.
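That last point can be seen in a few lines with scikit-learn's KMeans, fitting on input vectors alone; the toy points below are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

# Six 2-D points forming two well-separated groups; no labels supplied.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# K-Means infers the grouping from the input vectors alone.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

The resulting `labels` array puts the three left-hand points in one cluster and the three right-hand points in the other, even though no class information was ever given.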
8. Problem Statement
• There are various extensions of k-means proposed in the literature. Although k-means is an unsupervised approach to clustering in pattern recognition and machine learning, the algorithm and its extensions are always influenced by initializations and require a number of clusters a priori; that is, the k-means algorithm is not exactly an unsupervised clustering method. In the reference paper, the authors construct an unsupervised learning schema for the k-means algorithm so that it is free of initializations, requires no parameter selection, and can simultaneously find an optimal number of clusters. That is, they propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without any initialization or parameter selection. The computational complexity of the proposed U-k-means clustering algorithm is also analyzed.
9. Aim, Objective
• To construct an unsupervised learning schema for the k-means algorithm so that it is free of initializations, requires no parameter selection, and can simultaneously find an optimal number of clusters.
• To propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without any initialization or parameter selection.
• To analyze the computational complexity of the proposed U-k-means clustering algorithm.
11. U-k-means clustering Algorithm
1. Fix ε > 0. Give initial c(0) = n, α_k(0) = 1/n, a_k(0) = x_i, and initial learning rates γ(0) = β(0) = 1. Set t = 0.
2. Compute z_ik(t+1) using a_k(t), α_k(t), c(t), γ(t), β(t).
3. Compute γ(t+1).
4. Update α_k(t+1) with z_ik(t+1) and α_k(t).
5. Compute β(t+1) with α(t+1) and α(t).
6. Update c(t) to c(t+1) by discarding those clusters with α_k(t+1) ≤ 1/n, then adjust α_k(t+1) and z_ik(t+1) accordingly.
7. IF t ≥ 60 and c(t−60) − c(t) = 0, THEN let β(t+1) = 0.
8. Update a_k(t+1) with c(t+1) and z_ik(t+1).
9. Compare a_k(t+1) and a_k(t):
   IF max_{1≤k≤c(t)} ‖a_k(t+1) − a_k(t)‖ < ε, THEN stop;
   ELSE set t = t+1 and return to Step 2.
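Step 6 is what lets the method find the cluster number on its own: clusters whose mixing proportion α_k falls to 1/n or below are removed, and the remaining proportions and memberships are renormalized. A minimal sketch of that discard-and-renormalize step, with function name, variable names, and toy numbers of our own choosing rather than from the paper:

```python
import numpy as np

def discard_small_clusters(alpha, Z, n):
    """Illustrative sketch of Step 6: drop clusters with alpha_k <= 1/n,
    then renormalize the proportions and the per-point memberships."""
    keep = alpha > 1.0 / n
    alpha = alpha[keep] / alpha[keep].sum()   # proportions sum to 1 again
    Z = Z[:, keep]
    Z = Z / Z.sum(axis=1, keepdims=True)      # each row sums to 1 again
    return alpha, Z

# Hypothetical example: 3 clusters, one starved of support (alpha_3 <= 1/n).
alpha = np.array([0.60, 0.395, 0.005])
Z = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
alpha2, Z2 = discard_small_clusters(alpha, Z, n=100)
```

After the call, only two clusters remain, and both `alpha2` and each row of `Z2` again sum to one, so the next iteration proceeds with a smaller c(t+1).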
13. Software implementations
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.cluster import KMeans

# Define the U-k-means algorithm
class UKMeans:
    def __init__(self, n_clusters=3, fuzziness=2, max_iter=100):
        self.n_clusters = n_clusters
        self.fuzziness = fuzziness
        self.max_iter = max_iter
14. def fit(self, X):
        n = X.shape[0]
        # Initialize centroids from distinct random points, memberships at random
        C = X[np.random.choice(n, self.n_clusters, replace=False), :]
        U = np.random.rand(n, self.n_clusters)
        U = U / np.sum(U, axis=1, keepdims=True)
        for i in range(self.max_iter):
            U_old = U.copy()
            # Each centroid is the membership-weighted mean of all points
            for j in range(self.n_clusters):
                C[j, :] = np.sum(U[:, j].reshape(-1, 1) * X, axis=0) / np.sum(U[:, j])
            # Distances of every point to every centroid, shape (n, n_clusters)
            dist = np.linalg.norm(X[:, :, np.newaxis] - C.T[np.newaxis, :, :], axis=1)
            dist = np.fmax(dist, np.finfo(float).eps)  # avoid division by zero
            # Fuzzy membership update
            U = 1 / (dist ** (2 / (self.fuzziness - 1)))
            U = U / np.sum(U, axis=1, keepdims=True)
            if np.allclose(U, U_old):
                break
15.     self.centroids_ = C   # continues the fit method from the previous slide
        self.labels_ = np.argmax(U, axis=1)

# Load the iris dataset
iris = load_iris()

# Preprocess the data
X = iris.data
y = iris.target
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Search over hyperparameters for the best clustering (continued on the next slide)
best_accuracy = 0
best_model = None
for n_clusters in range(2, 6):
16.     for fuzziness in [1.5, 2, 2.5]:   # nested inside the n_clusters loop
        for max_iter in [50, 100, 200]:
            model = UKMeans(n_clusters=n_clusters, fuzziness=fuzziness,
                            max_iter=max_iter)
            model.fit(X)
            # Cluster indices are arbitrary, so this direct comparison with y
            # only approximates the true agreement
            accuracy = np.mean(model.labels_ == y)
            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_model = model

# Print the best accuracy and corresponding parameters
print(f"Best accuracy: {best_accuracy}")
print(f"Number of clusters: {best_model.n_clusters}")
print(f"Fuzziness: {best_model.fuzziness}")
print(f"Maximum number of iterations: {best_model.max_iter}")
17. # Visualize the clusters using a scatter plot
sns.set_style("whitegrid")
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=best_model.labels_, palette="Set2")
sns.scatterplot(x=best_model.centroids_[:, 0], y=best_model.centroids_[:, 1],
                color="black", marker="X", s=200, legend=False)
plt.title("Clustering using U-k-means algorithm")
plt.xlabel("Sepal length (cm)")
plt.ylabel("Sepal width (cm)")
plt.show()

# Visualize the distribution of samples across clusters using a bar graph
sns.set_style("whitegrid")
sns.countplot(x=best_model.labels_, palette="Set2")
plt.title("Distribution of samples across clusters")
plt.xlabel("Cluster")
plt.ylabel("Number of samples")
plt.show()
18. # Compute the confusion matrix and classification report
cm = confusion_matrix(y, best_model.labels_)
cr = classification_report(y, best_model.labels_)

# Print the confusion matrix and classification report
print("Confusion Matrix:")
print(cm)
print("Classification Report:")
print(cr)

# Plot the confusion matrix
sns.set_style("white")
sns.heatmap(cm, annot=True, cmap="Blues", fmt="d",
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.title("Confusion Matrix")
plt.xlabel("Predicted Class")
plt.ylabel("True Class")
plt.show()
19. Advantages
1. High Performance - The K-Means algorithm has linear time complexity and can be used with large datasets conveniently. With unlabeled big data, K-Means offers many insights and benefits as an unsupervised clustering algorithm.
2. Unlabeled Data - This is a general property of unsupervised machine learning that also applies to K-Means: if your data has no labels (class values or targets) or even column headers, K-Means will still successfully cluster it.
3. Easy to Use - K-Means is also easy to use. It can be initialized with default parameters in the Scikit-Learn implementation.
4. Result Interpretation - K-Means returns clusters that can be easily interpreted and even visualized. This simplicity makes it highly useful when you need a quick overview of the data segments.
20. Applications
K-Means clustering is used in a variety of examples or business cases in
real life, like:
• Academic performance - Based on the scores, students are categorized
into grades like A, B, or C.
• Diagnostic systems - The medical profession uses k-means in creating
smarter medical decision support systems, especially in the treatment of
liver ailments.
• Search engines - Clustering forms a backbone of search engines. When a
search is performed, the search results need to be grouped, and the search
engines very often use clustering to do this.
• Wireless sensor networks - The clustering algorithm plays the role of
finding the cluster heads, each of which collects all the data in its
respective cluster.
21. Results and Analysis
The U-k-means algorithm is a variant of the k-means algorithm that assigns
each data point a fuzzy membership to each cluster, instead of assigning it to
only one cluster. The implementation of the U-k-means algorithm was done
using the Python programming language and the scikit-learn library.
The best accuracy achieved with the U-k-means algorithm was 0.83,
obtained with the following hyperparameters: 3 clusters, a fuzziness value
of 1.5, and a maximum of 50 iterations. The scatter plot showed the
clusters of the iris dataset, and the bar graph showed the distribution of
samples across the clusters.
The confusion matrix and classification report were computed to evaluate the
performance of the U-k-means algorithm. The confusion matrix showed the
number of true positives, true negatives, false positives, and false negatives,
while the classification report showed the precision, recall, F1-score, and
support for each class. The performance of the U-k-means algorithm was not
as good as that of supervised learning algorithms, but it is still a useful tool
for unsupervised learning tasks.
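The fuzzy-membership idea described above can be sketched as follows. This is a hypothetical illustration in the style of fuzzy c-means, not the exact U-k-means update: each point receives a membership in every cluster, and the fuzziness value m (1.5 in the reported experiments) controls how soft those memberships are. The points and centers are toy values invented for illustration.

```python
# Sketch of a fuzzy membership step: instead of one hard assignment,
# each point gets a membership in every cluster, with rows summing to 1.
import numpy as np

def fuzzy_memberships(X, centers, m=1.5):
    # Distance from each point to each center: shape (n_points, n_clusters)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                   # guard against division by zero
    ratio = d[:, :, None] / d[:, None, :]   # ratio[i, k, j] = d_ik / d_ij
    # Standard fuzzy c-means style update: u_ik = 1 / sum_j (d_ik/d_ij)^(2/(m-1))
    return 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])   # toy points
centers = np.array([[0.0, 0.0], [5.0, 5.0]])         # toy centers
u = fuzzy_memberships(X, centers, m=1.5)
print(u)   # point 0 belongs almost fully to cluster 0, point 2 to cluster 1
```

A smaller m pushes the memberships toward hard 0/1 assignments; a larger m spreads each point across more clusters.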
23. Results and Analysis
Confusion matrix for the U-k-means algorithm, from which the
different performance measures are derived
24. Conclusion
• In this paper a new scheme is proposed with a learning
framework for the k-means clustering algorithm. During
iterations, the U-k-means algorithm discards extra clusters,
so that an optimal number of clusters can be found
automatically from the structure of the data. The advantages
of U-k-means are that it is free of initialization and
parameter selection, and that it is robust to different
cluster volumes and shapes while automatically finding the
number of clusters. The proposed U-k-means algorithm was
run on several synthetic and real data sets and compared
with existing algorithms such as R-EM, C-FS, k-means with
the true number of clusters c, k-means + gap, and X-means.
The results demonstrate the superiority of the U-k-means
clustering algorithm.
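The cluster-discarding idea can be sketched as a single hypothetical pruning step (the real U-k-means update is more involved): start with too many candidate centers and drop any cluster whose share of points falls below a threshold, so the number of clusters shrinks automatically. The data, centers, and `min_share` threshold below are invented for illustration.

```python
# Sketch of one "discard extra clusters" step: assign points to centers,
# prune clusters that capture too small a fraction of the data, then
# re-estimate the surviving centers.
import numpy as np

def prune_step(X, centers, min_share=0.2):
    # Hard-assign each point to its nearest center
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Fraction of the data captured by each cluster
    shares = np.bincount(labels, minlength=len(centers)) / len(X)
    keep = shares >= min_share            # discard near-empty clusters
    # Surviving centers become the means of their assigned points
    return np.array([X[labels == k].mean(axis=0)
                     for k in range(len(centers)) if keep[k]])

base = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
X = np.vstack([base, base + 4.0])              # two tight toy blobs
centers = np.array([[0.0, 0.0], [4.0, 4.0],    # two good candidates...
                    [2.0, 2.0], [10.0, 10.0]]) # ...and two spurious ones
centers = prune_step(X, centers)
print(centers)   # only the two centers that capture points survive
```

Iterating such a step from a deliberately large initial number of clusters is one simple way the cluster count can be reduced automatically rather than fixed a priori.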
26. Project Outcome
• The k-means algorithm is the most widely known and used clustering
method, and various extensions of it have been proposed in the
literature. Although it is an unsupervised approach to clustering in
pattern recognition and machine learning, the k-means algorithm and its
extensions are always influenced by their initialization and by the
need to fix the number of clusters a priori.
• The outcome of this project was an unsupervised learning scheme for the
k-means algorithm that is free of initialization, requires no parameter
selection, and can simultaneously find an optimal number of clusters.
• Also, we analyzed the computational complexity of the proposed U-k-
means clustering algorithm.