SlideShare a Scribd company logo
1 of 20
Download to read offline
1
Clustering
Class Algorithmic Methods of Data Mining
Program M. Sc. Data Science
University Sapienza University of Rome
Semester Fall 2015
Lecturer Carlos Castillo http://chato.cl/
Sources:
● Mohammed J. Zaki, Wagner Meira, Jr., Data Mining and Analysis:
Fundamental Concepts and Algorithms, Cambridge University
Press, May 2014. Part 3. [download]
● Evimaria Terzi: Data Mining course at Boston University
http://www.cs.bu.edu/~evimaria/cs565-13.html
2
Why 3 groups instead of 2?
Why these sizes?
3
Clustering
● Given a set of elements (e.g. documents)
● Group similar elements together
● So that:
– Inside a group, elements are similar
– Across groups, elements are different
4
What is clustering?
Inter-cluster
distances are
maximized
Intra-cluster
distances are
minimized
5
Outliers
• Outliers are objects that do not belong to any cluster or
form clusters of very small cardinality
• In some applications we are interested in discovering outliers,
not clusters (outlier analysis)
cluster
outliers
6
Why do we cluster?
●
Clustering results are used:
– As a stand-alone tool to get insight into data
distribution
●
Visualization of clusters may unveil important information
– As a preprocessing step for other algorithms
●
Efficient indexing or compression often relies on clustering
7
Applications
• Image Processing
– cluster images based on their visual content
• Web
– Cluster groups of users based on their access
patterns on webpages
– Cluster webpages based on their content
• Bioinformatics
– Cluster similar proteins together (similarity wrt
chemical structure and/or functionality etc)
• Many more…
8
http://musicmachinery.com/2013/09/22/5025/
9http://www.nature.com/articles/srep00196/figures/2
10
Clustering questions
● How many clusters?
– Given as input or determined by algorithm
● How good is a clustering?
– Intra similarity, inter similarity, number of clusters
● Can an element belong to > 1 cluster?
– Hard clustering vs Soft clustering
11
Boston University Slideshow Title Goes Here
How many clusters?
12
Boston University Slideshow Title Goes Here
Types of clusterings
• Partitional
• each object belongs in exactly one cluster
• Hierarchical
• a set of nested clusters organized in a tree
13
Boston University Slideshow Title Goes Here
Partitional algorithms
• partition the n objects into k clusters
• each object belongs to exactly one cluster
• the number of clusters k is given in advance
14
Boston University Slideshow Title Goes Here
Partitional clustering
Original points Partitional clustering
15
Example: 1-dimensional clustering
Communism Socialism Liberalism Conservatism Monarchism Fascism
16
Parenthesis: 2D political spectrum
http://www.termometropolitico.it/119350_dai-modelli-collocazione-nello-spazio-politico-test-per-elezioni-europee-2014.html
17
1 dimensional clustering
5 11 13 16 25 36 38 39 42 60 62 64 67
How would you cluster this data? Why?
18
1 dimensional clustering
5 11 13 16 25 36 38 39 42 60 62 64 67
What about now, how would you cluster?
19
Two very important metrics
● Minimum inter-cluster distance
(should be large)
● Maximum intra-cluster distance
(should be small)
20
1 dimensional clustering
5 11 13 16 25 36 38 39 42 60 62 64 67
Exercise:
For each of these 3 clusterings:
● Compute minimum inter-cluster distance.
● Compute maximum intra-cluster distance.
5 11 13 16 25 36 38 39 42 60 62 64 67
5 11 13 16 25 36 38 39 42 60 62 64 67
http://chato.cl/2015/data_analysis/exercise-answers/clustering_exercise_01_answer.txt

More Related Content

What's hot

Doing data science with F#
Doing data science with F#Doing data science with F#
Doing data science with F#Tomas Petricek
 
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...IMPACT Centre of Competence
 
Doing data science with F# (BuildStuff)
Doing data science with F# (BuildStuff)Doing data science with F# (BuildStuff)
Doing data science with F# (BuildStuff)Tomas Petricek
 
Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with ...
Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with ...Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with ...
Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with ...AIST
 
Automated interpretability of linked data ontologies: an evaluation within th...
Automated interpretability of linked data ontologies: an evaluation within th...Automated interpretability of linked data ontologies: an evaluation within th...
Automated interpretability of linked data ontologies: an evaluation within th...Nuno Freire
 
Information Extraction in the TalkOfEurope Creative Camp
Information Extraction in the TalkOfEurope Creative CampInformation Extraction in the TalkOfEurope Creative Camp
Information Extraction in the TalkOfEurope Creative CampWim Peters
 

What's hot (8)

Contract Law A: Assessment 1 Help
Contract Law A: Assessment 1 HelpContract Law A: Assessment 1 Help
Contract Law A: Assessment 1 Help
 
Doing data science with F#
Doing data science with F#Doing data science with F#
Doing data science with F#
 
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
 
Doing data science with F# (BuildStuff)
Doing data science with F# (BuildStuff)Doing data science with F# (BuildStuff)
Doing data science with F# (BuildStuff)
 
Prez(2)
Prez(2)Prez(2)
Prez(2)
 
Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with ...
Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with ...Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with ...
Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with ...
 
Automated interpretability of linked data ontologies: an evaluation within th...
Automated interpretability of linked data ontologies: an evaluation within th...Automated interpretability of linked data ontologies: an evaluation within th...
Automated interpretability of linked data ontologies: an evaluation within th...
 
Information Extraction in the TalkOfEurope Creative Camp
Information Extraction in the TalkOfEurope Creative CampInformation Extraction in the TalkOfEurope Creative Camp
Information Extraction in the TalkOfEurope Creative Camp
 

Similar to Clustering

Open Web Data for Education - Linked Data technologies for connecting open ed...
Open Web Data for Education - Linked Data technologies for connecting open ed...Open Web Data for Education - Linked Data technologies for connecting open ed...
Open Web Data for Education - Linked Data technologies for connecting open ed...Mathieu d'Aquin
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesYuchen Zhao
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown BagDataTactics
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Rich Heimann
 
Clustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfClustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfigeabroad
 
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...ABINASHPADHY6
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdfbintis1
 
For iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxFor iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxSureshPolisetty2
 
BIM Data Mining Unit5 by Tekendra Nath Yogi
 BIM Data Mining Unit5 by Tekendra Nath Yogi BIM Data Mining Unit5 by Tekendra Nath Yogi
BIM Data Mining Unit5 by Tekendra Nath YogiTekendra Nath Yogi
 
TS4-3: Takumi Sato from Nagoya Institute of Technology
TS4-3: Takumi Sato from Nagoya Institute of TechnologyTS4-3: Takumi Sato from Nagoya Institute of Technology
TS4-3: Takumi Sato from Nagoya Institute of TechnologyJawad Haqbeen
 
Data Mining In Social Networks Using K-Means Clustering Algorithm
Data Mining In Social Networks Using K-Means Clustering AlgorithmData Mining In Social Networks Using K-Means Clustering Algorithm
Data Mining In Social Networks Using K-Means Clustering Algorithmnishant24894
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISrathnaarul
 
Data mining Techniques
Data mining TechniquesData mining Techniques
Data mining TechniquesSulman Ahmed
 

Similar to Clustering (20)

Unit 5-1.pdf
Unit 5-1.pdfUnit 5-1.pdf
Unit 5-1.pdf
 
Open Web Data for Education - Linked Data technologies for connecting open ed...
Open Web Data for Education - Linked Data technologies for connecting open ed...Open Web Data for Education - Linked Data technologies for connecting open ed...
Open Web Data for Education - Linked Data technologies for connecting open ed...
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world Challenges
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
Clustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfClustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdf
 
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
 
ase-social-informatics (6)
ase-social-informatics (6)ase-social-informatics (6)
ase-social-informatics (6)
 
Social Network Analysis Applications and Approach
Social Network Analysis Applications and ApproachSocial Network Analysis Applications and Approach
Social Network Analysis Applications and Approach
 
Clustering - K-Means, DBSCAN
Clustering - K-Means, DBSCANClustering - K-Means, DBSCAN
Clustering - K-Means, DBSCAN
 
For iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxFor iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptx
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
BIM Data Mining Unit5 by Tekendra Nath Yogi
 BIM Data Mining Unit5 by Tekendra Nath Yogi BIM Data Mining Unit5 by Tekendra Nath Yogi
BIM Data Mining Unit5 by Tekendra Nath Yogi
 
TS4-3: Takumi Sato from Nagoya Institute of Technology
TS4-3: Takumi Sato from Nagoya Institute of TechnologyTS4-3: Takumi Sato from Nagoya Institute of Technology
TS4-3: Takumi Sato from Nagoya Institute of Technology
 
Data Mining In Social Networks Using K-Means Clustering Algorithm
Data Mining In Social Networks Using K-Means Clustering AlgorithmData Mining In Social Networks Using K-Means Clustering Algorithm
Data Mining In Social Networks Using K-Means Clustering Algorithm
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
 
Data mining Techniques
Data mining TechniquesData mining Techniques
Data mining Techniques
 
Clustering
ClusteringClustering
Clustering
 

More from Carlos Castillo (ChaTo)

Finding High Quality Content in Social Media
Finding High Quality Content in Social MediaFinding High Quality Content in Social Media
Finding High Quality Content in Social MediaCarlos Castillo (ChaTo)
 
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Carlos Castillo (ChaTo)
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Carlos Castillo (ChaTo)
 

More from Carlos Castillo (ChaTo) (20)

Finding High Quality Content in Social Media
Finding High Quality Content in Social MediaFinding High Quality Content in Social Media
Finding High Quality Content in Social Media
 
When no clicks are good news
When no clicks are good newsWhen no clicks are good news
When no clicks are good news
 
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)
 
Discrimination Discovery
Discrimination DiscoveryDiscrimination Discovery
Discrimination Discovery
 
Fairness-Aware Data Mining
Fairness-Aware Data MiningFairness-Aware Data Mining
Fairness-Aware Data Mining
 
Big Crisis Data for ISPC
Big Crisis Data for ISPCBig Crisis Data for ISPC
Big Crisis Data for ISPC
 
Databeers: Big Crisis Data
Databeers: Big Crisis DataDatabeers: Big Crisis Data
Databeers: Big Crisis Data
 
Observational studies in social media
Observational studies in social mediaObservational studies in social media
Observational studies in social media
 
Natural experiments
Natural experimentsNatural experiments
Natural experiments
 
Content-based link prediction
Content-based link predictionContent-based link prediction
Content-based link prediction
 
Link prediction
Link predictionLink prediction
Link prediction
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Graph Partitioning and Spectral Methods
Graph Partitioning and Spectral MethodsGraph Partitioning and Spectral Methods
Graph Partitioning and Spectral Methods
 
Finding Dense Subgraphs
Finding Dense SubgraphsFinding Dense Subgraphs
Finding Dense Subgraphs
 
Graph Evolution Models
Graph Evolution ModelsGraph Evolution Models
Graph Evolution Models
 
Link-Based Ranking
Link-Based RankingLink-Based Ranking
Link-Based Ranking
 
Text Indexing / Inverted Indices
Text Indexing / Inverted IndicesText Indexing / Inverted Indices
Text Indexing / Inverted Indices
 
Indexing
IndexingIndexing
Indexing
 
Text Summarization
Text SummarizationText Summarization
Text Summarization
 

Recently uploaded

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Recently uploaded (20)

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Clustering

  • 1. 1 Clustering Class Algorithmic Methods of Data Mining Program M. Sc. Data Science University Sapienza University of Rome Semester Fall 2015 Lecturer Carlos Castillo http://chato.cl/ Sources: ● Mohammed J. Zaki, Wagner Meira, Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press, May 2014. Part 3. [download] ● Evimaria Terzi: Data Mining course at Boston University http://www.cs.bu.edu/~evimaria/cs565-13.html
  • 2. 2 Why 3 groups instead of 2? Why these sizes?
  • 3. 3 Clustering ● Given a set of elements (e.g. documents) ● Group similar elements together ● So that: – Inside a group, elements are similar – Across groups, elements are different
  • 4. 4 What is clustering? Inter-cluster distances are maximized Intra-cluster distances are minimized
  • 5. 5 Outliers • Outliers are objects that do not belong to any cluster or form clusters of very small cardinality • In some applications we are interested in discovering outliers, not clusters (outlier analysis) cluster outliers
  • 6. 6 Why do we cluster? ● Clustering results are used: – As a stand-alone tool to get insight into data distribution ● Visualization of clusters may unveil important information – As a preprocessing step for other algorithms ● Efficient indexing or compression often relies on clustering
  • 7. 7 Applications • Image Processing – cluster images based on their visual content • Web – Cluster groups of users based on their access patterns on webpages – Cluster webpages based on their content • Bioinformatics – Cluster similar proteins together (similarity wrt chemical structure and/or functionality etc) • Many more…
  • 10. 10 Clustering questions ● How many clusters? – Given as input or determined by algorithm ● How good is a clustering? – Intra similarity, inter similarity, number of clusters ● Can an element belong to > 1 cluster? – Hard clustering vs Soft clustering
  • 11. 11 Boston University Slideshow Title Goes Here How many clusters?
  • 12. 12 Boston University Slideshow Title Goes Here Types of clusterings • Partitional • each object belongs in exactly one cluster • Hierarchical • a set of nested clusters organized in a tree
  • 13. 13 Boston University Slideshow Title Goes Here Partitional algorithms • partition the n objects into k clusters • each object belongs to exactly one cluster • the number of clusters k is given in advance
  • 14. 14 Boston University Slideshow Title Goes Here Partitional clustering Original points Partitional clustering
  • 15. 15 Example: 1-dimensional clustering Communism Socialism Liberalism Conservatism Monarchism Fascism
  • 16. 16 Parenthesis: 2D political spectrum http://www.termometropolitico.it/119350_dai-modelli-collocazione-nello-spazio-politico-test-per-elezioni-europee-2014.html
  • 17. 17 1 dimensional clustering 5 11 13 16 25 36 38 39 42 60 62 64 67 How would you cluster this data? Why?
  • 18. 18 1 dimensional clustering 5 11 13 16 25 36 38 39 42 60 62 64 67 What about now, how would you cluster?
  • 19. 19 Two very important metrics ● Minimum inter-cluster distance (should be large) ● Maximum intra-cluster distance (should be small)
  • 20. 20 1 dimensional clustering 5 11 13 16 25 36 38 39 42 60 62 64 67 Exercise: For each of these 3 clusterings: ● Compute minimum inter-cluster distance. ● Compute maximum intra-cluster distance. 5 11 13 16 25 36 38 39 42 60 62 64 67 5 11 13 16 25 36 38 39 42 60 62 64 67 http://chato.cl/2015/data_analysis/exercise-answers/clustering_exercise_01_answer.txt