Spatial Analysis of Geo-tagged Tweets
Intern: Ali Akbari
Supervisors: Mohamed Kafsi, Vincent Etter
Professors: Matthias Grossglauser, Patrick Thiran
Summer internship @ EPFL
GOAL
• Finding a spatial pattern for geo-tagged tweets
– Finding the hot spots of a city
– Studying their temporal evolution
– Understanding people's behavior
• Empirical approach using real world data
Data
How can we model the spatial density of tweets?
Mixture of Gaussians
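• Linear superposition of Gaussians
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
• Normalization and positivity of the mixing coefficients require
$$\sum_{k=1}^{K} \pi_k = 1, \qquad 0 \le \pi_k \le 1$$
• Mixture of distributions, where $\pi_k = p(k)$
$$p(x) = \sum_{k=1}^{K} \pi_k \, p(x \mid k)$$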
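To make the model concrete, here is a minimal Python sketch that evaluates the mixture density for 2-D points such as tweet coordinates. The function names, the nested-list covariance format and the example parameters are illustrative assumptions, not taken from the slides; the 2×2 inverse and determinant are written out by hand so the sketch needs only the standard library.

```python
import math


def gaussian_pdf_2d(x, mean, cov):
    """Bivariate normal density N(x | mean, cov); cov is a 2x2 nested list."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Sylvester's criterion for a symmetric 2x2 matrix: a > 0 and det > 0.
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) via the closed-form 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def gmm_pdf(x, weights, means, covs):
    """Mixture density p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    # Normalization and positivity of the mixing coefficients.
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing coefficients must lie in [0, 1] and sum to 1")
    return sum(w * gaussian_pdf_2d(x, m, s)
               for w, m, s in zip(weights, means, covs))


# Hypothetical two-component mixture (illustrative parameters only).
weights = [0.3, 0.7]
means = [(0.0, 0.0), (3.0, 3.0)]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.5], [0.5, 1.0]]]
print(gmm_pdf((1.0, 1.0), weights, means, covs))
```

Checking the constraints inside `gmm_pdf` mirrors the normalization and positivity conditions above; a real fit would enforce them through the estimation procedure instead.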
Maximum Likelihood
• Likelihood function
– One Gaussian
– Independent and identically distributed (i.i.d.) data set $D = \{x_1, x_2, x_3, \dots, x_N\}$
$$p(D \mid \mu, \Sigma) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \Sigma)$$
• Likelihood function for the GMM
$$\ln p(D \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)$$
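Because the data are i.i.d., the product over points becomes a sum of logarithms; the logarithm of the inner sum over components is what prevents a closed-form maximizer. The sketch below evaluates this GMM log-likelihood for 2-D points. Computing the inner sum with the log-sum-exp trick is a design choice of this sketch (not something the slides specify): it avoids underflow for points far from every component. Function names are hypothetical.

```python
import math


def log_gaussian_2d(x, mean, cov):
    """log N(x | mean, cov) for a 2-D point; cov is a 2x2 positive-definite nested list."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * q - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def log_sum_exp(values):
    """Stable log(sum(exp(v))): factor out the maximum before exponentiating."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_log_likelihood(data, weights, means, covs):
    """ln p(D | pi, mu, Sigma) = sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)."""
    total = 0.0
    for x in data:  # i.i.d. assumption: the product over points becomes a sum of logs
        terms = [(math.log(w) if w > 0 else -math.inf) + log_gaussian_2d(x, m, s)
                 for w, m, s in zip(weights, means, covs)]
        total += log_sum_exp(terms)
    return total
```

For a point 50 standard deviations from every component, a naive `math.log(sum(...))` would take the log of an underflowed zero, whereas this version stays finite.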
Expectation Maximization
1. Initialize the parameters $\{\pi_k, \mu_k, \Sigma_k\}$
2. E-Step
– Evaluate the responsibilities $\gamma(z_{nk})$
3. M-Step
– Re-estimate the parameters using the current responsibilities
4. Evaluate the log-likelihood and check for convergence; if not converged, return to step 2
• E-Step
• M-Step
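For reference, the standard GMM updates are: E-Step $\gamma(z_{nk}) = \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) / \sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)$; M-Step $N_k = \sum_n \gamma(z_{nk})$, $\mu_k = \frac{1}{N_k}\sum_n \gamma(z_{nk}) x_n$, $\Sigma_k = \frac{1}{N_k}\sum_n \gamma(z_{nk})(x_n - \mu_k)(x_n - \mu_k)^T$, $\pi_k = N_k / N$. Below is a minimal pure-Python sketch of the four steps for 2-D data. The farthest-point initialization and the small diagonal ridge `reg` are assumptions of this sketch, not the internship's documented setup. The log-likelihood of the parameters from the previous M-Step is accumulated during the E-Step, which is why the step-4 check appears before step 3 inside the loop.

```python
import math
import random


def log_gauss(x, mu, cov):
    """log N(x | mu, cov) for 2-D x with a 2x2 covariance given as nested lists."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * q - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def em_gmm(data, k, n_iter=200, tol=1e-6, reg=1e-6, seed=0):
    """Fit a k-component 2-D GMM with EM; returns (weights, means, covs, log_likelihood)."""
    n = len(data)
    rng = random.Random(seed)
    # 1. Initialize: farthest-point seeding for the means (an assumption; the slides
    #    do not say how EM was initialized), unit covariances, equal weights.
    means = [data[rng.randrange(n)]]
    while len(means) < k:
        means.append(max(data, key=lambda p: min(
            (p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2 for m in means)))
    covs = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(k)]
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    ll = -math.inf
    for _ in range(n_iter):
        # 2. E-Step: responsibilities gamma(z_nk), normalized in log space.
        gamma, ll = [], 0.0
        for x in data:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], covs[j])
                    for j in range(k)]
            top = max(logs)
            norm = sum(math.exp(v - top) for v in logs)
            ll += top + math.log(norm)
            gamma.append([math.exp(v - top) / norm for v in logs])
        # 4. Convergence check on the log-likelihood of the current parameters.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # 3. M-Step: re-estimate parameters from the current responsibilities.
        for j in range(k):
            nk = sum(g[j] for g in gamma) + 1e-12  # floor avoids division by zero
            mx = sum(g[j] * x[0] for g, x in zip(gamma, data)) / nk
            my = sum(g[j] * x[1] for g, x in zip(gamma, data)) / nk
            sxx = sum(g[j] * (x[0] - mx) ** 2 for g, x in zip(gamma, data)) / nk
            syy = sum(g[j] * (x[1] - my) ** 2 for g, x in zip(gamma, data)) / nk
            sxy = sum(g[j] * (x[0] - mx) * (x[1] - my) for g, x in zip(gamma, data)) / nk
            means[j] = (mx, my)
            # Small ridge on the diagonal guards against collapsing components.
            covs[j] = [[sxx + reg, sxy], [sxy, syy + reg]]
            weights[j] = nk / n
    return weights, means, covs, ll
```

Farthest-point seeding is sensitive to outliers, which is one reason the choice of initialization matters on real tweet data.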
EM Evolution
EM on Synthetic Data
[Figure: synthetic data (10,000 points) and the EM result]
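The slides do not give the parameters of the synthetic set, so the following is only a sketch of how a 10,000-point test set can be drawn from a known GMM: pick a component with probability $\pi_k$, then map a standard normal vector $z$ to $\mu_k + L_k z$, where $L_k$ is the Cholesky factor of $\Sigma_k$. The function name and example parameters are hypothetical.

```python
import math
import random


def sample_gmm(n, weights, means, covs, seed=0):
    """Draw n 2-D points from a GMM; returns (points, component_labels)."""
    rng = random.Random(seed)
    # Cholesky factor L = [[l11, 0], [l21, l22]] of each 2x2 covariance, so L L^T = Sigma.
    chol = []
    for (a, b), (_, d) in covs:
        l11 = math.sqrt(a)
        l21 = b / l11
        l22 = math.sqrt(d - l21 * l21)
        chol.append((l11, l21, l22))
    # Choose a component for each point with probability pi_k.
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    points = []
    for k in labels:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        l11, l21, l22 = chol[k]
        points.append((means[k][0] + l11 * z1, means[k][1] + l21 * z1 + l22 * z2))
    return points, labels


# Hypothetical parameters, for illustration only.
points, labels = sample_gmm(10_000, [0.3, 0.7], [(0.0, 0.0), (4.0, 4.0)],
                            [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.5], [0.5, 1.0]]])
```

Keeping the true labels makes it easy to check how well an EM fit recovers the generating components.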
Number of Clusters
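• What if the number of clusters K is unknown?
• Naive approach: maximize the likelihood over K; this overfits, because the maximum likelihood keeps increasing as K grows
• Bayesian Information Criterion (BIC): penalize the log-likelihood by the number of free parameters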
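BIC trades fit against complexity: $\mathrm{BIC} = p \ln N - 2 \ln \hat{L}$, where $\hat{L}$ is the maximized likelihood, $p$ the number of free parameters and $N$ the number of points; lower is better (the same convention as scikit-learn's `GaussianMixture.bic`). For a 2-D GMM with full covariances, each component has 2 mean and 3 covariance parameters, and only $K-1$ mixing coefficients are free, so $p = 6K - 1$. A minimal sketch with hypothetical function names:

```python
import math


def gmm_num_params(k, dim=2):
    """Free parameters of a full-covariance GMM: means, covariances, mixing weights."""
    means = k * dim
    covariances = k * dim * (dim + 1) // 2  # symmetric matrices
    weights = k - 1                         # the pi_k must sum to 1
    return means + covariances + weights


def bic(log_likelihood, k, n_points, dim=2):
    """BIC = p ln N - 2 ln L_hat (lower is better)."""
    return gmm_num_params(k, dim) * math.log(n_points) - 2.0 * log_likelihood
```

In practice one runs EM for each candidate K, records the final log-likelihood, and compares `bic(ll_K, K, N)` across K.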
EM and real data
• EM works well with synthetic data
• Real data behave differently
• EM does not always converge:
– Collapsing: a component shrinks onto a single point (variance → 0, likelihood → ∞)
– Duplicated points (repetitions)
– Outlier points
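Why collapsing happens: a 2-D isotropic component centred exactly on a data point has density $1/(2\pi\sigma^2)$ there, which grows without bound as $\sigma^2 \to 0$, so the likelihood has no finite maximum. Repeated coordinates make this easy to hit, since their sample covariance is exactly zero. The sketch below illustrates both effects and one common remedy, adding a small constant to the covariance diagonal (similar in spirit to scikit-learn's `reg_covar` parameter). Whether the internship used this remedy is not stated; the names and data here are hypothetical.

```python
import math


def log_density_at_mean(variance):
    """log N(mu | mu, variance * I) for a 2-D isotropic Gaussian: -log(2*pi*variance)."""
    return -math.log(2.0 * math.pi * variance)


def sample_covariance(points, reg=0.0):
    """Maximum-likelihood 2x2 covariance of 2-D points, plus `reg` on the diagonal."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n + reg
    syy = sum((p[1] - my) ** 2 for p in points) / n + reg
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return [[sxx, sxy], [sxy, syy]]


def det2(m):
    """Determinant of a 2x2 nested list."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# The log-density at the mean increases without bound as the variance shrinks.
for var in (1e-1, 1e-3, 1e-6):
    print(var, log_density_at_mean(var))

# Hypothetical repeated coordinates (many tweets sharing one location).
duplicates = [(2.0, 3.0)] * 50
print(det2(sample_covariance(duplicates)))            # singular covariance
print(det2(sample_covariance(duplicates, reg=1e-6)))  # ridge keeps it invertible
```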
Duplicated Point
Removing Places
• More flexible hot spots
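The slide does not say how places were removed. Assuming "places" means coordinates shared by many tweets (an interpretation, not stated in the deck), one simple preprocessing step is to drop every coordinate that repeats more than a chosen number of times. The function name, the `max_repeats` threshold and the coordinates below are hypothetical.

```python
from collections import Counter


def remove_repeated_locations(points, max_repeats=1):
    """Drop every point whose exact coordinates occur more than `max_repeats` times."""
    counts = Counter(points)  # points must be hashable, e.g. (lat, lon) tuples
    return [p for p in points if counts[p] <= max_repeats]


# Hypothetical data: five tweets share one venue's coordinates.
tweets = [(46.5191, 6.6323)] * 5 + [(46.5200, 6.6300), (46.5210, 6.6400)]
print(remove_repeated_locations(tweets))
```

Exact matching is deliberate here: coordinates snapped to a venue repeat bit-for-bit, whereas genuine GPS fixes rarely do.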
Heat Map of Revised Data (per day)
Heat Map of Revised Data (per hour)
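A heat map like these is essentially a 2-D histogram of tweet locations. As a sketch, assuming each tweet is available as a `(lat, lon, period)` record (the record layout, bounding box and grid size are assumptions), the counts per grid cell can be computed separately for each hour or day:

```python
from collections import Counter


def heat_maps(tweets, bbox, n_bins):
    """Bin (lat, lon, period) records into an n_bins x n_bins grid per time period.

    bbox = (lat_min, lat_max, lon_min, lon_max); points outside it are skipped.
    Returns {period: Counter({(row, col): count})}.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    maps = {}
    for lat, lon, period in tweets:
        if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
            continue
        # Clamp so floating-point rounding near the upper edge stays in range.
        row = min(int((lat - lat_min) / (lat_max - lat_min) * n_bins), n_bins - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * n_bins), n_bins - 1)
        maps.setdefault(period, Counter())[(row, col)] += 1
    return maps
```

Passing the hour of day as `period` gives per-hour maps; passing a day index gives per-day maps.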
BIC Result
EM on Real Data
• First local minimum of the BIC, close to the extremum
• Number of clusters K = 25
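The selection rule reported here, taking the first local minimum of the BIC curve rather than a global minimum, can be sketched as follows. The function name is hypothetical, and the fallback to the overall minimum when no interior local minimum exists is an assumption of this sketch.

```python
def first_local_minimum(ks, scores):
    """Return the K of the first interior local minimum of `scores`.

    Falls back to the K with the smallest score when the curve has no
    interior local minimum (e.g. when it decreases monotonically).
    """
    for i in range(1, len(scores) - 1):
        if scores[i] < scores[i - 1] and scores[i] <= scores[i + 1]:
            return ks[i]
    return ks[min(range(len(scores)), key=lambda i: scores[i])]
```

Because each EM run only reaches a local optimum, BIC curves from real data can be noisy; averaging over several initializations per K may make the first local minimum more stable.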
Conclusion
• Finding a spatial pattern for geo-tagged tweets
– Mixture of Gaussians
– EM
– Given the number of clusters
– Collapsing issue
– BIC
– No global minimum!
Future Work
• Vary the number of clusters over time
• Understand in which cases BIC works
• Focus on the city center to avoid outliers
Thank you
