This presentation was given by our project group at the Lead College competition at Shivaji University, where our project won the 1st Prize. We focused mainly on Rough K-Means and built a social-network recommender system based on Rough K-Means.
The members of the project group were:
Mansi Kulkarni,
Nikhil Ingole,
Prasad Mohite,
Varad Meru,
Vishal Bhavsar.
A wonderful experience!
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
1. K-Means, its Variants and its Applications
Group 9
-------------------------------
Varad Meru, Nikhil Ingole, Mansi Kulkarni, Vishal Bhavsar, Prasad Mohite
-------------------------------
Guided By: Mrs. V. S. Rupnar
-------------------------------
Department of Computer Science and Engineering
D. Y. Patil College of Engineering and Technology
Kolhapur
Monday, 29 July 13
2. Work Completed in the Previous Semester
✓ Selection of Topic and Preliminary Understanding of Clustering.
✓ Implementation of K-Means algorithm with Synthetic Data.
✓ Development of Graphical Representation of Clusters.
✓ Understanding and Implementation of Rough Set Clustering.
✓ Real World Data : Data Collection based on Surveys.
✓ Implementation of Conventional Clustering on Input Survey Details for Cluster Generation and Recommender System.
✓ Implementation of Rough-Set Clustering on Input Survey Details for Cluster Generation and Recommender System.
3. Work Completed in this Semester
✓ Study of Genetic Algorithms and their Implementation Issues.
✓ Adaptation of JavaGALib for K-Means Clustering.
✓ Verification and Validation of Cluster Quality for all the following Processes:
➡ K-Means, Rough K-Means, GA Rough K-Means.
✓ Recommender System Design and Initial Prototype Evaluation based on K-Means Algorithm.
✓ Verification and Validation of Recommendations, and Application of Heuristics to the Results of the Recommendations for Precision.
✓ Recommender System Design and Initial Prototype Evaluation based on Rough K-Means Algorithm.
4. Introduction to Clustering
• Organizing data into clusters such that there is
• high intra-cluster similarity
• low inter-cluster similarity
• Informally, finding natural groupings among objects.
• Applications of clustering span various fields
• Data Compression, Data Modeling, Expression Analysis and other fields of application.
5. Introduction to K-Means Algorithm
• It was proposed in the year 1956 by Hugo Steinhaus.
• It finds partitions such that the squared error between the empirical mean of a cluster and the points in that cluster is minimized.
• The squared error of a cluster $c_k$ with empirical mean $\mu_k$ is defined as: $J(c_k) = \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$
• The goal of K-Means is to minimize the sum of the squared error over all K clusters: $J(C) = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2$
• Minimizing this objective function is known to be an NP-hard problem (even for K=2).
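The objective described on this slide can be sketched in a few lines of Java. This is only an illustration: the class and method names (`SquaredErrorDemo`, `squaredError`, `objective`) are hypothetical and do not come from the project code, and Euclidean distance is assumed.

```java
// Illustrative sketch; class and method names are hypothetical, not from the slides.
class SquaredErrorDemo {

    /** Empirical mean of the points in one cluster. */
    static double[] mean(double[][] cluster) {
        double[] mu = new double[cluster[0].length];
        for (double[] x : cluster)
            for (int j = 0; j < mu.length; j++)
                mu[j] += x[j];
        for (int j = 0; j < mu.length; j++)
            mu[j] /= cluster.length;
        return mu;
    }

    /** Squared error of one cluster: sum of squared Euclidean distances to its mean. */
    static double squaredError(double[][] cluster) {
        double[] mu = mean(cluster);
        double sum = 0;
        for (double[] x : cluster)
            for (int j = 0; j < mu.length; j++)
                sum += (x[j] - mu[j]) * (x[j] - mu[j]);
        return sum;
    }

    /** K-Means objective: squared error summed over all clusters. */
    static double objective(double[][][] clusters) {
        double total = 0;
        for (double[][] cluster : clusters)
            total += squaredError(cluster);
        return total;
    }

    public static void main(String[] args) {
        double[][][] clusters = {
            {{0, 0}, {2, 0}},          // mean (1, 0): error 1 + 1 = 2
            {{5, 5}, {5, 7}, {5, 9}}   // mean (5, 7): error 4 + 0 + 4 = 8
        };
        System.out.println(objective(clusters)); // prints 10.0
    }
}
```

K-Means does not evaluate every partition against this objective (that is the NP-hard part); it only lowers the value step by step from an initial guess.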
6. K-Means Clustering Algorithm
[Flowchart]
Start
→ Input: K, the number of clusters to be formed
→ Centroid initialization
→ Find distance of objects to centroids
→ Partition based on minimum distance
→ New additions in group? Yes: iterate again. No: Stop.
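The flow on this slide can be sketched as a short Java routine. The sketch is not the project's implementation: `KMeansSketch` is a hypothetical name, the first K points are used as initial centroids (an assumption made only to keep the example deterministic), and Euclidean distance is assumed.

```java
import java.util.Arrays;

// Illustrative sketch; names and the initialization rule are assumptions.
class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double d = 0;
        for (int j = 0; j < a.length; j++)
            d += (a[j] - b[j]) * (a[j] - b[j]);
        return d;
    }

    /** Returns, for every point, the index of the cluster it ends up in. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int n = points.length, m = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++)
            centroids[c] = points[c].clone();   // assumption: first k points as centroids
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Partition based on minimum distance to the centroids.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (squaredDistance(points[i], centroids[c])
                            < squaredDistance(points[i], centroids[best]))
                        best = c;
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed)
                break;                          // no new additions in any group: stop

            // Recompute each centroid as the mean of its members.
            double[][] sums = new double[k][m];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < m; j++)
                    sums[assignment[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)
                    for (int j = 0; j < m; j++)
                        centroids[c][j] = sums[c][j] / counts[c];
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {8, 8}, {1, 2}, {2, 1}, {9, 8}, {8, 9}};
        System.out.println(Arrays.toString(cluster(points, 2, 100))); // prints [0, 1, 0, 0, 1, 1]
    }
}
```

The `maxIterations` cap is a safety net added for the sketch; the loop normally ends when an assignment pass moves no object.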
7. Graph of Clusters in Synthetic Data: Result of K-Means Algorithm
[Figure: Fig. 2. Synthetic data (from Lingras)]
10. Introduction to Rough Sets
• Rough sets were proposed by Zdzisław I. Pawlak in 1982 and developed further in his 1991 monograph.
• A rough set is a formal approximation of a crisp set in terms of a pair of sets.
• The pair gives the lower and upper approximation of the original set.
• Rough sets are based on equivalence-class partitioning.
• The pair A = (U, R) is called an approximation space, where U is the universe of objects and R is an equivalence relation on U.
• The lower bound $\underline{A}(X)$ is the union of all the elementary sets which are subsets of X.
• The upper bound $\overline{A}(X)$ is the union of all elementary sets which have a non-empty intersection with X.
• The pair $(\underline{A}(X), \overline{A}(X))$ is the formal representation of the regular set X.
• It is not possible to differentiate the elements within the same equivalence class.
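The two approximations described on this slide can be computed directly from the equivalence classes (the elementary sets). The sketch below is illustrative only: `RoughSetDemo` and its helper `setOf` are hypothetical names, and objects are represented as plain integers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch; names are hypothetical, not from the slides.
class RoughSetDemo {

    static Set<Integer> setOf(Integer... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }

    /** Lower approximation: union of the elementary sets fully contained in x. */
    static Set<Integer> lower(List<Set<Integer>> elementarySets, Set<Integer> x) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> e : elementarySets)
            if (x.containsAll(e))
                result.addAll(e);
        return result;
    }

    /** Upper approximation: union of the elementary sets that intersect x. */
    static Set<Integer> upper(List<Set<Integer>> elementarySets, Set<Integer> x) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> e : elementarySets)
            if (!Collections.disjoint(e, x))
                result.addAll(e);
        return result;
    }

    public static void main(String[] args) {
        // U = {1..6}, partitioned by R into three equivalence classes.
        List<Set<Integer>> classes = Arrays.asList(setOf(1, 2), setOf(3, 4), setOf(5, 6));
        Set<Integer> x = setOf(1, 2, 3);
        System.out.println(lower(classes, x)); // prints [1, 2]
        System.out.println(upper(classes, x)); // prints [1, 2, 3, 4]
    }
}
```

Objects 3 and 4 fall in the boundary (upper minus lower): they share an equivalence class, so the approximation space cannot tell whether one belongs to X without the other.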
11. Adaptation of Rough Sets into K-Means Clustering
• We consider the upper and lower bounds for only a few subsets of U.
• It is therefore not possible to verify all the properties of rough sets (Pawlak, 1982, 1991).
• Lingras et al. identified these compulsory rules for rough set clustering:
• An object v can be part of at most one lower bound.
• An object v that is part of a lower bound is also part of the corresponding upper bound.
• If an object v is not part of any lower bound, then v belongs to two or more upper bounds.
12. Result of Rough Set Clustering: Graph of Clusters in Synthetic Data
[Figure: Fig. 3. Rough clusters for the synthetic data (from Lingras, "Evolutionary Rough K-means")]
13. Lingras’s Absolute Distance Formula
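• Let the distance from the point X to its nearest centroid $c_i$ be given by: $d(X, c_i) = \min_{1 \le j \le k} d(X, c_j)$
• Consider the set T: $T = \{\, j : d(X, c_j) - d(X, c_i) \le \text{threshold},\ j \ne i \,\}$
• T ≠ Ø: the point X is associated with the upper bounds of 2 or more clusters.
• T = Ø: X exists in the lower bound of only one cluster.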
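A minimal sketch of the absolute-distance test, under the assumption that T collects every other cluster whose centroid is at most `threshold` farther from X than the nearest centroid. The class name `RoughAssignmentSketch` and the method `upperBoundClusters` are hypothetical; Euclidean distance is assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; names and the exact form of the test are assumptions.
class RoughAssignmentSketch {

    static double distance(double[] a, double[] b) {
        double d = 0;
        for (int j = 0; j < a.length; j++)
            d += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(d);
    }

    /**
     * Returns the clusters whose upper bound contains x, nearest cluster first.
     * A single entry means T is empty, so x lies in that cluster's lower bound.
     */
    static List<Integer> upperBoundClusters(double[] x, double[][] centroids, double threshold) {
        int nearest = 0;
        for (int c = 1; c < centroids.length; c++)
            if (distance(x, centroids[c]) < distance(x, centroids[nearest]))
                nearest = c;
        double dMin = distance(x, centroids[nearest]);

        List<Integer> result = new ArrayList<>();
        result.add(nearest);
        for (int j = 0; j < centroids.length; j++)
            if (j != nearest && distance(x, centroids[j]) - dMin <= threshold)
                result.add(j);                  // j belongs to T
        return result;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0, 0}, {10, 0}};
        // Distances 1 and 9: difference 8 > 2, so T is empty.
        System.out.println(upperBoundClusters(new double[]{1, 0}, centroids, 2.0));   // prints [0]
        // Distances 5.5 and 4.5: difference 1 <= 2, so T is not empty.
        System.out.println(upperBoundClusters(new double[]{5.5, 0}, centroids, 2.0)); // prints [1, 0]
    }
}
```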
[Figure: Fig. 1. Lower approximation, upper approximation and boundary area (from G. Peters)]
14. Peters’s Refinements on Lingras’s Absolute Distance Formula
• Limitations of Lingras's method:
• Outlier in an inline position: b = az.
• Outlier in a rectangular position.
15. Modified Rough K-Means
• Centroid calculation in rough clustering: [equation not preserved]
• Membership assignment on the basis of: [equation not preserved]
• Let [definition not preserved]; the ratios are used to determine the membership of X.
• Let [definitions not preserved].
• T ≠ Ø: the point X is associated with the upper bounds of 2 or more clusters.
• T = Ø: X exists in the lower bound of only one cluster.
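The slide's own centroid equation is not available in this transcript, so the following is only a sketch of one common form of the rough centroid: a weighted combination of the mean of the lower bound and the mean of the boundary area, using weights in the spirit of the w_lower and w_upper inputs listed later in the deck. The class name `RoughCentroidSketch`, the fallback rules for empty regions, and the example weights are all assumptions; Peters's refined version may differ in detail.

```java
import java.util.Arrays;

// Illustrative sketch; the formula is an assumed weighted form, not taken from the slide.
class RoughCentroidSketch {

    static double[] mean(double[][] points) {
        double[] mu = new double[points[0].length];
        for (double[] p : points)
            for (int j = 0; j < mu.length; j++)
                mu[j] += p[j];
        for (int j = 0; j < mu.length; j++)
            mu[j] /= points.length;
        return mu;
    }

    /** Centroid from the lower-bound members and the boundary-area members. */
    static double[] centroid(double[][] lower, double[][] boundary,
                             double wLower, double wUpper) {
        if (lower.length == 0 && boundary.length == 0)
            throw new IllegalArgumentException("cluster has no members");
        if (boundary.length == 0)
            return mean(lower);       // crisp cluster: plain K-Means centroid
        if (lower.length == 0)
            return mean(boundary);    // only boundary objects are available
        double[] l = mean(lower), b = mean(boundary);
        double[] c = new double[l.length];
        for (int j = 0; j < c.length; j++)
            c[j] = wLower * l[j] + wUpper * b[j];
        return c;
    }

    public static void main(String[] args) {
        double[][] lower = {{0, 0}, {2, 0}};   // mean (1, 0)
        double[][] boundary = {{5, 0}};        // mean (5, 0)
        // 0.75 * 1 + 0.25 * 5 = 2.0 on the first axis
        System.out.println(Arrays.toString(centroid(lower, boundary, 0.75, 0.25))); // prints [2.0, 0.0]
    }
}
```

Giving the lower bound the larger weight keeps uncertain boundary objects from pulling the centroid as strongly as objects that definitely belong to the cluster.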
19. Genetic Algorithm based Rough Set Clustering
• Genetic Algorithms - Introduction
• A search process that follows the principles of evolution through natural selection.
• Important terms : Genes, Genome, Chromosomes, Populations, Generations, Fitness, Selection, Crossover, Mutation.
• This paradigm has the following steps
• generate initial population, G(0);
  evaluate G(0);
  for (t = 1; solution is not found; t++) {
      generate G(t) using G(t-1);
      evaluate G(t);
  }
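The generational loop above can be made concrete with a small, self-contained Java sketch. It is not JavaGALib and not the project's code: the class `GeneticLoopSketch`, the toy fitness function, the selection rule (parents drawn from the better half), the 0.1 mutation chance and the fixed generation count used as the stopping rule are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch of a generational GA; every parameter choice is an assumption.
class GeneticLoopSketch {

    /** Hypothetical fitness: higher is better, optimum when every gene equals 3. */
    static double fitness(double[] chromosome) {
        double f = 0;
        for (double g : chromosome)
            f -= (g - 3) * (g - 3);
        return f;
    }

    /** Evaluates a population by sorting it so the fittest chromosome comes first. */
    static void rank(double[][] population) {
        Arrays.sort(population, (a, b) -> Double.compare(fitness(b), fitness(a)));
    }

    /** populationSize must be at least 2. Returns the fittest chromosome found. */
    static double[] evolve(int populationSize, int genes, int generations, long seed) {
        Random rnd = new Random(seed);
        double[][] population = new double[populationSize][genes];
        for (double[] c : population)                       // generate G(0)
            for (int g = 0; g < genes; g++)
                c[g] = rnd.nextDouble() * 10;
        rank(population);                                   // evaluate G(0)

        for (int t = 1; t <= generations; t++) {
            double[][] next = new double[populationSize][];
            next[0] = population[0].clone();                // elitism: keep the best
            for (int i = 1; i < populationSize; i++) {
                // Selection: both parents come from the better half of G(t-1).
                double[] a = population[rnd.nextInt(populationSize / 2)];
                double[] b = population[rnd.nextInt(populationSize / 2)];
                // One-point crossover.
                int cut = rnd.nextInt(genes + 1);
                double[] child = new double[genes];
                for (int g = 0; g < genes; g++)
                    child[g] = g < cut ? a[g] : b[g];
                // Mutation: occasionally perturb one gene.
                if (rnd.nextDouble() < 0.1)
                    child[rnd.nextInt(genes)] += rnd.nextGaussian();
                next[i] = child;
            }
            population = next;                              // G(t)
            rank(population);                               // evaluate G(t)
        }
        return population[0];
    }

    public static void main(String[] args) {
        double[] best = evolve(20, 2, 50, 42L);
        System.out.println(Arrays.toString(best) + " fitness " + fitness(best));
    }
}
```

Because the best chromosome is always copied into the next generation, the best fitness never decreases from one generation to the next.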
20. Genetic Algorithm based Rough Set Clustering
• Genetic Algorithms for Rough set Clustering
• JavaGALib: a Java library built by Jeff Smith of SoftTechDesign to support GA operations
• Input fields:
p - Threshold
D(n,m) - A dataset with n objects of m dimensions
k - The number of clusters
w_lower, w_upper
population - The number of chromosomes to be generated
generations - The number of successive populations to be generated
• Output:
A set of clusters. Each cluster is represented by the objects in the lower region and boundary region (upper bound).
• Data structures used for Genetic Algorithms for Rough set Clustering
Chromosomes: Centroid1* Centroid2* Centroid3* ...
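One way to read the chromosome layout is as the k centroids written one after another, giving k × m genes for m-dimensional data. The sketch below illustrates that reading; the class `ChromosomeCodec` and its methods are hypothetical and are not part of JavaGALib.

```java
import java.util.Arrays;

// Illustrative sketch; the flat centroid-by-centroid layout is an assumption.
class ChromosomeCodec {

    /** Flattens k centroids of m dimensions into one chromosome of k * m genes. */
    static double[] encode(double[][] centroids) {
        int k = centroids.length, m = centroids[0].length;
        double[] genes = new double[k * m];
        for (int c = 0; c < k; c++)
            System.arraycopy(centroids[c], 0, genes, c * m, m);
        return genes;
    }

    /** Rebuilds the k centroids from a chromosome of k * m genes. */
    static double[][] decode(double[] genes, int k, int m) {
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++)
            centroids[c] = Arrays.copyOfRange(genes, c * m, (c + 1) * m);
        return centroids;
    }

    public static void main(String[] args) {
        double[][] centroids = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
        double[] genes = encode(centroids);
        System.out.println(Arrays.toString(genes));                    // prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
        System.out.println(Arrays.deepToString(decode(genes, 3, 2)));  // prints [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    }
}
```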
21. Genetic Algorithm based Rough Set Clustering
• Constructor Description for Genetic Algorithm
• super(numOfClusters * numOfDimensions, // no. of genes in a chromosome
        100,                  // population of chromosomes
        0.7,                  // crossover probability
        6,                    // random selection chance
        50,                   // stop after this many generations
        10,                   // no. of preliminary runs to build good breeding stock for the final (full) run
        20,                   // max preliminary generations
        0.1,                  // chromosome mutation probability
        Crossover.ctTwoPoint, // crossover type
        2,                    // number of decimal points of precision
        false                 // considers only float numbers
  );
  } // end constructor
• Evolve Function
  computeFitnessRankings();
  doGeneticMating();
  copyNextGenToThisGen();
23. Rough Set Clustering based on Kohonen SOM Paradigm
• Kohonen network Architecture is used as an Artificial Neural Network Paradigm.
• The Single level, One-Dimensional case can be seen in fig. 1.
• The weight vector x for a group that is closest to the pattern v is modified using
• void update(int winner, int objectID) {
      // Move the winner's weight vector towards the object by the learning rate alpha.
      for (int j = 0; j < weights[winner].length; j++)
          weights[winner][j] = (1 - alpha) * weights[winner][j] + alpha * objects[objectID][j];
  }
• The Updates are carried over the previous weights.
[Figure: Fig. 1. Kohonen Neural Network, with Input Layer and Output Layer]
24. Rough Set Clustering based on Kohonen SOM Paradigm
• The distance metric is calculated by the following code fragment
• double dist(int objectID, int weightID) {
      // With no dimensions there is nothing to measure.
      if (weights[0].length == 0)
          return 0;
      double d = 0;
      for (int j = 0; j < weights[0].length; j++) {
          double o = objects[objectID][j];
          double c = weights[weightID][j];
          d += (c - o) * (c - o);
      }
      // Euclidean distance scaled by the number of dimensions.
      return Math.sqrt(d) / weights[0].length;
  }
• The Flow of the Kohonen K-Means Implementation is as follows
• Kohonen m = new Kohonen(numOfRows, numOfCols, numOfClusters, 0.01);
m.readObjects(args[0]);
m.makeClusters(numOfIterations);
m.writeClusters();
m.writeCentroids();
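To show how the `update` and `dist` fragments fit together, here is a self-contained sketch of a winner-take-all training loop. It is not the project's `Kohonen` class: `KohonenSketch`, its constructor, `winner` and `winners` are hypothetical, file input and output are left out, and the weights are initialized from the first few objects purely to keep the example deterministic.

```java
import java.util.Arrays;

// Illustrative sketch; not the project's Kohonen class.
class KohonenSketch {
    final double[][] objects;
    final double[][] weights;
    final double alpha;       // learning rate

    KohonenSketch(double[][] objects, int numOfClusters, double alpha) {
        this.objects = objects;
        this.alpha = alpha;
        this.weights = new double[numOfClusters][];
        for (int c = 0; c < numOfClusters; c++)
            weights[c] = objects[c].clone();   // assumption: seed weights from the data
    }

    /** Euclidean distance scaled by the number of dimensions, as in the slide's fragment. */
    double dist(int objectID, int weightID) {
        if (weights[0].length == 0)
            return 0;
        double d = 0;
        for (int j = 0; j < weights[0].length; j++) {
            double diff = weights[weightID][j] - objects[objectID][j];
            d += diff * diff;
        }
        return Math.sqrt(d) / weights[0].length;
    }

    /** Index of the weight vector closest to the given object. */
    int winner(int objectID) {
        int best = 0;
        for (int c = 1; c < weights.length; c++)
            if (dist(objectID, c) < dist(objectID, best))
                best = c;
        return best;
    }

    /** Moves the winning weight vector towards the object. */
    void update(int winner, int objectID) {
        for (int j = 0; j < weights[winner].length; j++)
            weights[winner][j] = (1 - alpha) * weights[winner][j] + alpha * objects[objectID][j];
    }

    void makeClusters(int numOfIterations) {
        for (int it = 0; it < numOfIterations; it++)
            for (int i = 0; i < objects.length; i++)
                update(winner(i), i);
    }

    /** Cluster index of every object after training. */
    int[] winners() {
        int[] result = new int[objects.length];
        for (int i = 0; i < objects.length; i++)
            result[i] = winner(i);
        return result;
    }

    public static void main(String[] args) {
        double[][] objects = {{0, 0}, {10, 10}, {0, 1}, {1, 0}, {10, 9}, {9, 10}};
        KohonenSketch m = new KohonenSketch(objects, 2, 0.1);
        m.makeClusters(20);
        System.out.println(Arrays.toString(m.winners())); // prints [0, 1, 0, 0, 1, 1]
    }
}
```

Each weight vector ends up as a running weighted average of the objects it has won, which is why it can play the role of a cluster centroid.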
26. Recommender System based on Clustering
• A Recommender System is a system based on Information Filtering techniques.
• It applies Knowledge Discovery techniques such as Clustering, Classification, and Filtering to find recommendations.
• Exposing the most interesting items to the user saves time and energy.
• Techniques used to give recommendations include K-Nearest Neighbor and Collaborative Filtering.
• Why Clustering?
• The basic feature of a clustering algorithm is natural grouping.
• Challenges in the above two techniques are overcome.
• Each iteration of K-Means runs in polynomial time, and the result is a set of crisp clusters.
27. Recommender System based on Clustering
• Recommendations for K-Means Algorithm:
• All the members of the cluster in which the user lies are recommended.
• Recommendations for Rough K-Means Algorithm:
• If the user lies in the lower bound of a cluster, all the members lying in the lower bound of that cluster are recommended.
• If the user lies in the upper bound of two or more clusters, all the members in those upper bounds are recommended.
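The two Rough K-Means rules can be sketched as a lookup over the lower and upper bounds of each cluster. This is an illustration only: `RoughRecommenderSketch`, its methods and the user names are hypothetical, and the user is removed from their own recommendation list as an added assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch; names and sample data are hypothetical.
class RoughRecommenderSketch {

    static Set<String> setOf(String... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }

    /** lower.get(c) and upper.get(c) hold the members of cluster c's lower and upper bound. */
    static Set<String> recommend(String user, List<Set<String>> lower, List<Set<String>> upper) {
        Set<String> result = new TreeSet<>();
        // Rule 1: the user is in a lower bound, so recommend that lower bound.
        for (Set<String> members : lower) {
            if (members.contains(user)) {
                result.addAll(members);
                result.remove(user);
                return result;
            }
        }
        // Rule 2: the user is only in upper bounds, so recommend all of those upper bounds.
        for (Set<String> members : upper)
            if (members.contains(user))
                result.addAll(members);
        result.remove(user);   // assumption: never recommend the user to themselves
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> lower = Arrays.asList(setOf("ann", "bob"), setOf("dan"));
        List<Set<String>> upper = Arrays.asList(setOf("ann", "bob", "eve"), setOf("dan", "eve"));
        System.out.println(recommend("ann", lower, upper)); // prints [bob]
        System.out.println(recommend("eve", lower, upper)); // prints [ann, bob, dan]
    }
}
```

A user in the boundary of several clusters receives a broader list, reflecting that the clustering is uncertain about where that user belongs.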
28. Recommender System based on Clustering
[Figures: System Architecture; User Perspective]
30. References
• Completed:
✓ K-Means Algorithm
• “Data Clustering: 50 Years Beyond K-Means”, Anil K. Jain, 2010.
✓ Rough Set based K-Means Algorithm
• “Precision of Rough Set Clustering”, Pawan Lingras, Min Chen, Duoqian Miao, 2008
• “Some Refinements of Rough K-means Clustering”, Georg Peters, 2006.
• “Interval Set Clustering of Web Users with Rough K-Means”, Pawan Lingras, Chad West, 2003
✓ Rough K-Means based on Genetic Algorithm and Kohonen Self-Organizing Maps Paradigm
• “Applications of Rough Set Based K-Means, Kohonen SOM, GA Clustering”, Pawan Lingras, 2006.
• “Evolutionary Rough K-Means Clustering”, Pawan Lingras, 2009.
31. References (Contd.)
• Recommender System
• “Enhanced K-means-Based Mobile Recommender System”, Gamal Hussein, International Journal of Information Studies, April 2010.
• “Clustering Social Networks”, Nina Mishra, Robert Schreiber, Isabelle Stanton, and Robert E. Tarjan, 2006.
• K-Means based on Genetic Algorithms
• “Genetic K-Means Algorithm”, K. Krishna and M. Narasimha Murty, IEEE Transactions on Systems, Man and Cybernetics, 1999.
• “Initializing K-Means using Genetic Algorithms”, Bashar Al-Shboul and Sung-Hyon Myaeng, World Academy of Science, Engineering and Technology, 2009.
• Advanced Topics
• “FGKA- A Fast Genetic K-means Clustering Algorithm”, Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, Susan J. Brown, 2004.
• “Incremental genetic K-means algorithm and its application in gene expression data analysis”, Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, Susan J. Brown, 2004.
• “A Genetic Algorithm for Clustering on Image Data”, Qin Ding and Jim Gasvoda, International Journal of Computational Intelligence, 2004.