1st edition
November 4-5, 2018
Machine Learning School in Doha
Unsupervised Learning
Sanjay Chawla
Qatar Computing Research Institute (QCRI)
Let's start with flowers…
Iris Setosa, Iris Virginica, Iris Versicolor
Can we write a computer program to distinguish between the three species?
Some Features…
Supervised vs. Unsupervised Learning
• Supervised Learning:
– “Learn” a relationship from:
• [SL,SW,PL,PW] → Species(S,Vi,Ve)
• Unsupervised Learning
– “Learn” something from:
• [SL,SW,PL,PW]
– Why?
How to compare?
              Sepal Length   Sepal Width   Petal Length   Petal Width
f1            5.1            3.4           1.4            0.2
f2            7.2            3.6           6.1            2.5
f1 - f2       -2.1           -0.2          -4.7           -2.3
(f1 - f2)^2   4.41           0.04          22.09          5.29
distance(f1,f2) = (4.41 + 0.04 + 22.09 + 5.29)^0.5 = 5.64
This is called the Euclidean distance.
Bottom line: We have a quantitative way of comparing two flowers (or any two objects).
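The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `euclidean_distance` and the feature order (sepal length, sepal width, petal length, petal width) are assumptions made here for illustration.

```python
import math

# Feature order assumed: sepal length, sepal width, petal length, petal width.
f1 = [5.1, 3.4, 1.4, 0.2]
f2 = [7.2, 3.6, 6.1, 2.5]

def euclidean_distance(a, b):
    """Square root of the sum of squared per-feature differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d = euclidean_distance(f1, f2)
print(round(d, 2))
```

The same function works for vectors of any length, which is why it applies to "any two objects" described by numeric features.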
The “average” flower
      Sepal Length   Sepal Width   Petal Length   Petal Width
f1    5.1            3.4           1.4            0.2
f2    7.2            3.6           6.1            2.5
Average of two numbers X and Y is (X+Y)/2
Average of two flower vectors (rows) f1 and f2:
\[
\left( \frac{5.1+7.2}{2},\ \frac{3.4+3.6}{2},\ \frac{1.4+6.1}{2},\ \frac{0.2+2.5}{2} \right)
\]
Does the average flower exist?
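As a small illustration, the component-wise average can be computed as below. The helper name `mean_vector` is hypothetical; it simply averages each column of the rows it is given.

```python
def mean_vector(vectors):
    """Component-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

f1 = [5.1, 3.4, 1.4, 0.2]
f2 = [7.2, 3.6, 6.1, 2.5]

avg = mean_vector([f1, f2])
print([round(v, 2) for v in avg])

# The average is a new vector; it need not be one of the rows we started with.
print(avg in (f1, f2))
```

For f1 and f2 the result is approximately (6.15, 3.5, 3.75, 1.35), which is neither of the two input flowers.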
Unsupervised Learning
• We want to find “three” (or K) averages in our data set.
• If we could use the species information, this would be easy.
• Chicken-and-egg problem: we need the groups to compute the averages, and the averages to form the groups!
Revisiting chicken-egg problem
Avg Setosa    4.9    3.4    1.5    0.3
However, we cannot use the labels!
Breaking the chicken-egg problem
• In Machine Learning we break the chicken-and-egg problem using a “random guess”
• Randomly select three (K) vectors: m1,m2,m3
• Assignment:
– Let C1 ={all data points nearest to m1}
– Let C2 = {all data points nearest to m2}
– Let C3 = {all data points nearest to m3}
• Update:
– m1 is average of C1
– m2 is average of C2
– m3 is average of C3
• Repeat
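In symbols, one round of the assignment and update steps might be written as follows. The notation is introduced here for illustration and is not from the slides: x is a data point, d is the Euclidean distance, |C_j| is the number of points in cluster C_j, and ties are assumed to be broken arbitrarily.

```latex
\[
C_j = \{\, x : d(x, m_j) \le d(x, m_i) \ \text{for all } i \,\},
\qquad
m_j \leftarrow \frac{1}{|C_j|} \sum_{x \in C_j} x,
\qquad j = 1, 2, 3 .
\]
```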
Bad vs. Good
• Randomization can be tricky!
K-means algorithm
Let C = initial k cluster centroids (often selected randomly)
Mark C as unstable
While <C is unstable>
    Assign all data points to their nearest centroid in C.
    Compute the centroids of the points assigned to each element of C.
    Update C as the set of new centroids.
    Mark C as stable or unstable by comparing with the previous set of centroids.
End While
Complexity: O(nkdI)
n: number of points; k: number of clusters; d: dimension; I: number of iterations
Takeaway: complexity is linear in n.
From W3-S14
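The pseudocode above could be sketched in plain Python roughly as follows. This is not the lecture's code: the names `nearest`, `mean_vector`, and `kmeans`, the `seed` and `max_iter` parameters, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for this illustration.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def mean_vector(points):
    """Component-wise average of a non-empty list of points."""
    return tuple(sum(column) / len(points) for column in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    """Return (centroids, labels) after the centroids stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # random initial guess
    labels = []
    for _ in range(max_iter):                      # at most I iterations
        # Assignment step: n points x k centroids x d dimensions.
        labels = [nearest(p, centroids) for p in points]
        # Update step: recompute each centroid from its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_vector(members) if members else centroids[j])
        if new_centroids == centroids:             # C is stable
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The assignment step is where the O(nkd) cost per iteration comes from: every point is compared with every centroid across all d coordinates.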
Example: 2 Clusters
[Figure: points A, B, C, D at the corners of a rectangle centered at (0,0), 2 wide and 4 high]
A(-1,2) B(1,2)
C(-1,-2) D(1,-2)
K-means Problem: The solution is centroids (0,2) and (0,-2), with clusters {A,B} and {C,D}.
K-means Algorithm: Suppose the initial centroids are (-1,0) and (1,0); then {A,C} and {B,D} end up as the two clusters.
From W3-S16
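The gap between the K-means problem and the K-means algorithm in this example can be reproduced with a short script. It is a sketch under stated assumptions: the function names `lloyd` and `sse` are hypothetical, the quality measure is taken to be the sum of squared distances to the assigned centroid, and the second starting point, (0,1) and (0,-1), is chosen here only for comparison.

```python
import math

points = {"A": (-1.0, 2.0), "B": (1.0, 2.0), "C": (-1.0, -2.0), "D": (1.0, -2.0)}

def lloyd(points, centroids, max_iter=100):
    """Repeat assignment and update from the given initial centroids."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for name, p in points.items():
            j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(name)
        updated = [
            tuple(sum(points[n][axis] for n in c) / len(c) for axis in range(2)) if c else m
            for c, m in zip(clusters, centroids)
        ]
        if updated == centroids:   # centroids are stable
            break
        centroids = updated
    return centroids, clusters

def sse(points, centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(points[n], m) ** 2
               for m, c in zip(centroids, clusters) for n in c)

# Initial centroids from the slide: the algorithm stops immediately.
stuck = lloyd(points, [(-1.0, 0.0), (1.0, 0.0)])
# Hypothetical alternative start that reaches the better solution.
best = lloyd(points, [(0.0, 1.0), (0.0, -1.0)])

print(stuck, sse(points, *stuck))
print(best, sse(points, *best))
```

From the slide's starting point the centroids never move, so the algorithm reports {A,C} and {B,D} with a squared error of 16, while the clusters {A,B} and {C,D} give a squared error of 4.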
Clustering with Outlier Detection
In general, clustering algorithms are extremely sensitive to outliers.
K-means-- algorithm
Input: Data Set, k (number of clusters), L (number of outliers)
Let C = initial k cluster means (centroids) (often selected randomly)
Mark C as unstable
While <C is unstable>
    Assign all data points to their nearest centroid in C.
    Sort all points in descending order based on their distance to their nearest centroid.
    Remove the top L points.
    Compute the centroids of the remaining points assigned to each element of C.
    Update C as the set of new centroids.
    Mark C as stable or unstable by comparing with the previous set of centroids.
End While
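A possible Python rendering of the pseudocode above is sketched below. Several details are assumptions made for this illustration rather than part of the slide: the function name `kmeans_minus_minus`, passing the initial centroids explicitly instead of drawing them at random (so the run is reproducible), recomputing the set of L removed points in every iteration, and keeping the old centroid when a cluster has no remaining points.

```python
import math

def kmeans_minus_minus(points, init_centroids, num_outliers, max_iter=100):
    """Return (centroids, labels, outlier_indices)."""
    centroids = list(init_centroids)
    k = len(centroids)
    labels, outliers = [], set()
    for _ in range(max_iter):
        # Assign every point to its nearest centroid and record that distance.
        labels, dists = [], []
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            labels.append(j)
            dists.append(math.dist(p, centroids[j]))
        # Sort by distance, descending, and set aside the top L points.
        order = sorted(range(len(points)), key=lambda i: dists[i], reverse=True)
        outliers = set(order[:num_outliers])
        # Recompute each centroid from the remaining points only.
        new_centroids = []
        for j in range(k):
            members = [points[i] for i in range(len(points))
                       if labels[i] == j and i not in outliers]
            if members:
                new_centroids.append(
                    tuple(sum(column) / len(members) for column in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # assumption: keep old centroid
        if new_centroids == centroids:              # C is stable
            break
        centroids = new_centroids
    return centroids, labels, sorted(outliers)

# Two tight groups plus one far-away point at index 6.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
centroids, labels, outliers = kmeans_minus_minus(
    points, init_centroids=[points[0], points[3]], num_outliers=1)
print(centroids, outliers)
```

Calling the same function with `num_outliers=0` removes nothing and reduces to the plain K-means loop, which makes it easy to compare the two behaviours on the same data.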
Application of K-means--
Association Discovery
• Motivation:
TID   Transaction
1     phone, adapter
2     phone, adapter, headphones, USB
3     adapter, charger, USB
4     phone, charger, USB
Definition: An itemset is a set of items.
Definition: An itemset is frequent if the number of transactions in which it appears is greater than a pre-defined threshold T.
Objective: Find all frequent itemsets.
Example
TID   Transaction
1     phone, adapter
2     phone, adapter, headphones, USB
3     adapter, charger, USB
4     phone, charger, USB
Support of {phone} = 3/4
Support of {phone, adapter} = 2/4
Support of {phone, adapter, USB} = 1/4
Support of {charger, USB} = 2/4
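The support values on this slide can be recomputed with a short sketch. The function name `support` and the use of `Fraction` (so that 2/4 prints in lowest terms as 1/2) are choices made here for illustration.

```python
from fractions import Fraction

transactions = {
    1: {"phone", "adapter"},
    2: {"phone", "adapter", "headphones", "USB"},
    3: {"adapter", "charger", "USB"},
    4: {"phone", "charger", "USB"},
}

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return Fraction(hits, len(transactions))

for itemset in [{"phone"}, {"phone", "adapter"},
                {"phone", "adapter", "USB"}, {"charger", "USB"}]:
    print(sorted(itemset), support(itemset, transactions))
```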
Association Discovery
• Brute Force Approach
– Let I be the set of items
– Then the number of possible non-empty subsets is 2^|I| - 1
– For each possible subset, check the percentage of transactions which contain it
• Not practical
– 1000 items: 2^1000 - 1 > number of atoms in the universe
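To make the brute-force approach concrete, here is a sketch that enumerates every non-empty subset of the five items in the example transactions. The function name `brute_force_frequent` and the threshold T = 1 (so an itemset is frequent when it appears in more than one transaction) are assumptions made for this illustration.

```python
from itertools import combinations

transactions = [
    {"phone", "adapter"},
    {"phone", "adapter", "headphones", "USB"},
    {"adapter", "charger", "USB"},
    {"phone", "charger", "USB"},
]

def brute_force_frequent(transactions, threshold):
    """Check all 2^|I| - 1 non-empty subsets of the item set I."""
    items = sorted(set().union(*transactions))
    checked, frequent = 0, {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            checked += 1
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count > threshold:          # "greater than threshold T"
                frequent[frozenset(candidate)] = count
    return checked, frequent

checked, frequent = brute_force_frequent(transactions, threshold=1)
print(checked, len(frequent))

# Size of the search space for 1000 items, as a number of decimal digits.
digits = len(str(2 ** 1000 - 1))
print(digits)
```

With 5 items the loop checks 31 subsets, which is harmless. With 1000 items the count has 302 decimal digits, far beyond the commonly quoted rough estimate of about 10^80 atoms in the observable universe.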
Association Discovery
• Efficient Algorithm
• Key Observation:
– If Itemset1 is a subset of Itemset2 then
support(Itemset1) >= support(Itemset2)
– Example:
• support({phone}) >= support({phone, adapter})
• How can we use this observation to design an
algorithm ?
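One well-known way to exploit this observation is a level-wise search in the style of the Apriori algorithm: if an itemset is not frequent, none of its supersets can be, so larger candidates are built only from frequent smaller ones. The slides do not name an algorithm, so the sketch below is offered as one possible answer. The function name `apriori`, the returned counter, and the threshold T = 1 are assumptions made here.

```python
from itertools import combinations

transactions = [
    {"phone", "adapter"},
    {"phone", "adapter", "headphones", "USB"},
    {"adapter", "charger", "USB"},
    {"phone", "charger", "USB"},
]

def apriori(transactions, threshold):
    """Return (frequent itemsets with counts, number of candidates counted)."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent, counted = {}, 0
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates:
        # Keep only the candidates that appear in more than `threshold` transactions.
        level = {}
        for c in candidates:
            counted += 1
            n = count(c)
            if n > threshold:
                level[c] = n
        frequent.update(level)
        # Join: merge pairs of frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size + 1}
        # Prune: drop any candidate that has an infrequent subset of the current size.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, size))}
        size += 1
    return frequent, counted

frequent, counted = apriori(transactions, threshold=1)
print(len(frequent), counted)
```

On the four example transactions this counts support for 12 candidate itemsets instead of all 31 non-empty subsets, and it finds the same 8 frequent itemsets that an exhaustive check would.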
Latent Dirichlet Allocation
• Suppose you have the following sentences:
1. Technology companies include Amazon and Google, Facebook
2. Google applications shine
3. I bought pizza online from Amazon
4. Fresh pasta and pizza is delicious
• LDA makes it possible to automatically discover
topics from sentences
– Sentences 1 & 2 – 100% Topic A
– Sentence 3 – 50% Topic A; 50% Topic B
– Sentence 4 – 100% Topic B
• Topics:
– Topic A – 50% Google, 25% Amazon…
– Topic B – 50% pizza, 25% pasta…
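The percentages on this slide can be read as two families of probability distributions. The notation below is the usual one for LDA but is introduced here, not taken from the slides: θ_d is the topic mixture of sentence d, φ_k is the word distribution of topic k, and the probability of seeing word w in sentence d combines the two. Only values stated on the slide are filled in.

```latex
\[
p(w \mid d) = \sum_{k \in \{A,\, B\}} \theta_{d,k}\, \phi_{k,w},
\qquad
\theta_{3} = (0.5,\ 0.5), \quad
\phi_{A,\mathrm{Google}} = 0.5, \quad
\phi_{B,\mathrm{pizza}} = 0.5 .
\]
```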
