Model-based clustering for BSS usage mining, a case study with the Velib’ system of Paris. Etienne Côme, 15/10/2012
Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset: "Work→Hous...
Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset: "Work→Hous...
Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset: "Evening" ...
Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset: "Evening",...
Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset: "Evening",...
Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset: "Spare tim...
Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset: "Spare tim...
Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset: "Spare tim...
Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset: Conclusion...
Thanks for your attention. @comeetie, etienne.come@ifsttar.fr. Ifsttar, Centre de Marne-la-Vallée...
  1. 1. Model-based clustering for BSS usage mining, a case study with the Velib’ system of Paris. Etienne Côme, 15/10/2012
  2. 2. Bike Sharing Systems (BSS). What is fun with BSS? Relatively new systems; rapidly diffusing (EU and US nowadays, Hangzhou, ...); important successes; abundant usage data, in interesting and original forms: origins/destinations + timestamps, real-time station balances; interesting and new research problems. Etienne Côme (IFSTTAR), Model-based clustering for BSS usage mining, 15/10/2012, slide 2 / 75.
  3. 3. Outline.
        1. Introduction (Problematics; Usage data: trips records; Velib’ in few numbers and pictures; Tools and approach)
        2. Stations clustering using temporal usage profiles (Data representation: count time series; Generative model: naive Poisson mixture; Analysis of the results on the Velib’ dataset)
        3. Latent Dirichlet Allocation (LDA) for trips activity recognition (Data representation: dynamical O/D matrices; Generative model under LDA; Analysis of the results on the Velib’ dataset)
  4. 4. Problematics. Operational objectives: planning new systems (position and size of the stations); quality of service (bike re-dispatch, ...); ... Mining objectives: building predictive models of usage; finding spatio-temporal patterns; better understanding of the usages; ...
  5. 5. Usage data: trips records. Raw data, one record per trip: departure timestamp, departure station, arrival timestamp, arrival station, type of subscription. The records will be converted into contingency tables (i.e. tensors of counts). Data sources: Velib’, 2 months; open data: Barclays (London), Boston, ...
  6. 6. Velib’ in few numbers. BSS size: 1200 stations; ≈ 40 000 slots; ≈ 16 000 bikes; ≈ 100 000 trips/day; 27% of trips made with a day subscription; 73% of trips made with a year subscription.
  7. 7. Global behavior. [Figure 1: Histograms of trip lengths (distance in km) and durations (min); y-axis: trips; the free-use limits of the day and year subscriptions are marked.]
  8. 8. Temporal effects. [Figure 2: Number of trips per hour, Monday to Sunday, for short and long subscriptions.]
  9. 9. Temporal effects. [Figure 3: Average number of trips per hour of the day, on week days and at the week-end.]
  10. 10. Spatial effects. [Figure 4: Map of incoming trips in [6h, 7h] for week days.]
  11. 11. Spatial effects. [Figure 5: Mean station activity per hour against distance from the center ("Les Halles"), in km.]
  12. 12. Approach, exploratory data analysis. General methodology: use clustering algorithms to find interesting patterns in the data; compare the clusters found with the geography and sociology of the city ⇒ extract the important factors that influence BSS behavior. Two developments: (1) find clusters of stations with similar temporal usage patterns; (2) find latent activities that govern the BSS dynamics.
  13. 13. Tools, model-based clustering. General methodology: imagine a data generation process ⇒ which includes non-observed, or latent, variables. Latent variables can be discrete or continuous. Examples of latent variables: species for flowers; topics for texts; communities for graph vertices; ...
  14. 14. Generative approach, clustering. Model-based clustering: (1) draw the cluster of sample $i$; (2) depending on the cluster, draw the observed values of sample $i$. [Figure 6: Example of a 1D Gaussian mixture model, density $f(x)$ against $x$.]
  15. 15. Data generation process, graphical model representation. Step 1: draw the cluster of sample $i$: $Z_i \sim \mathcal{M}(1, \pi)$ ⇒ $\pi$: prior proportions of the clusters.
  16. 16. Data generation process, graphical model representation. Step 2: depending on the cluster, draw the observed values of sample $i$: $p(x \mid Z_{ik} = 1) = f(x; \theta_k)$, $\forall k \in \{1, \dots, K\}$ ⇒ $f$ can be tuned to exploit specificities of the problem.
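The two-step generation process of slides 15 and 16 can be sketched in a few lines of Python. This is a minimal illustration, assuming a 1D Gaussian for $f$ (as in Figure 6); the function name `sample_mixture` and all parameter values are hypothetical and not taken from the slides.

```python
import random

def sample_mixture(n, pi, means, stds, seed=0):
    """Draw n samples from a 1D Gaussian mixture with the two-step process."""
    rng = random.Random(seed)
    clusters = list(range(len(pi)))
    samples = []
    for _ in range(n):
        # Step 1: draw the cluster, Z_i ~ M(1, pi).
        k = rng.choices(clusters, weights=pi)[0]
        # Step 2: draw the observation from f(x; theta_k), assumed Gaussian here.
        x = rng.gauss(means[k], stds[k])
        samples.append((k, x))
    return samples

# Hypothetical parameters, chosen only for illustration.
draws = sample_mixture(1000, pi=[0.3, 0.7], means=[-40.0, 10.0], stds=[10.0, 8.0])
```

In clustering, only the values `x` are observed; the labels `k` are the latent variables that the inference step has to recover.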
  17. 17. Model-based clustering framework, task and tools. Inferring the parameters ⇒ EM algorithm, or variational EM for complex models.
  18. 18. Model-based clustering framework, task and tools. Inferring the parameters ⇒ EM algorithm, or variational EM for complex models. Finding the clustering ⇒ by-product of EM.
  19. 19. Model-based clustering framework, task and tools. Inferring the parameters ⇒ EM algorithm, or variational EM for complex models. Finding the clustering ⇒ by-product of EM. Fixing the number of clusters ⇒ model selection criterion: BIC, AIC, ICL, perplexity.
  20. 20. Stations clustering using temporal usage profiles.
  21. 21. Stations clustering using temporal usage profiles. Objectives: find groups of stations with similar temporal usage profiles; temporal usage profile = incoming and outgoing activity per hour; take into account the week-day / week-end discrepancy; use a model for count data; cross the results with possible explanatory variables: population, employment, amenities, ...
  22. 22. Data representation: count time series. Observed data: $X^{out}_{sdt}$: number of bikes taken at station $s$ during day $d$ at hour $t$; $X^{in}_{sdt}$: number of bikes returned at station $s$ during day $d$ at hour $t$; $X_{sd} = (X^{in}_{sd1}, \dots, X^{in}_{sd24}, X^{out}_{sd1}, \dots, X^{out}_{sd24})$ ⇒ $X$ is a tensor of size $N \times D \times T$ ⇒ temporal behavior per station. Variables: $X_{sd}$ (observed): numbers of bikes leaving/arriving; $Z_s$ (latent): cluster of station $s$; $W_d$ (observed): cluster of day $d$ (week / week-end).
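As an illustration of this representation, the sketch below turns a list of trip records into the count tensor. It assumes $T = 48$ (24 hourly arrival counts followed by 24 hourly departure counts, following the definition of $X_{sd}$) and assumes stations, days and hours are already encoded as integer indices; the `Trip` record and its field names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record layout; the real trip files may be organised differently.
Trip = namedtuple("Trip", "dep_station dep_day dep_hour arr_station arr_day arr_hour")

def count_tensor(trips, n_stations, n_days):
    """Return X[s][d] as 48 counts: 24 hourly arrivals, then 24 hourly departures."""
    X = [[[0] * 48 for _ in range(n_days)] for _ in range(n_stations)]
    for tr in trips:
        X[tr.arr_station][tr.arr_day][tr.arr_hour] += 1        # X^in
        X[tr.dep_station][tr.dep_day][24 + tr.dep_hour] += 1   # X^out
    return X

trips = [Trip(0, 0, 8, 1, 0, 8), Trip(0, 0, 8, 2, 0, 9), Trip(1, 1, 18, 0, 1, 18)]
X = count_tensor(trips, n_stations=3, n_days=2)
```

Each trip contributes one departure count and one arrival count, so the tensor total is twice the number of trips.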
  23. 23. Generative model: naive Poisson mixture. [Figure 7: Graphical model representation.] Parameters, $\Theta$: $\alpha_s$ = station attractivity effects; $\pi = (\pi_1, \dots, \pi_K)$ = cluster proportions; $\lambda = (\lambda_{klt})$ = temporal profiles of the clusters.
  24. 24. Generative model, naive Poisson mixture: $Z_s \sim \mathcal{M}(1, \pi)$; $X_{sd1} \perp\!\!\!\perp \dots \perp\!\!\!\perp X_{sdT} \mid \{Z_{sk} = 1, W_{dl} = 1\}$; $X_{sdt} \mid \{Z_{sk} = 1, W_{dl} = 1\} \sim \mathcal{P}(\alpha_s \lambda_{klt})$. Constraints: $\sum_{l,t} D_l \lambda_{klt} = DT$, $\forall k \in \{1, \dots, K\}$, with $D_l$ the number of days in day cluster $l$.
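To make the naive Poisson mixture concrete, here is a small simulation sketch for one station. The toy sizes (two clusters, two day types, three time slots), the parameter values and the helper names `poisson` and `sample_station` are assumptions made for illustration; the Poisson sampler uses Knuth's multiplication method, which is only a convenient standard-library substitute.

```python
import math
import random

def poisson(rng, rate):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_station(rng, pi, alpha_s, lam, day_types):
    """Draw one station: its cluster, then one count per (day, time slot)."""
    k = rng.choices(range(len(pi)), weights=pi)[0]        # Z_s ~ M(1, pi)
    counts = [[poisson(rng, alpha_s * rate)               # X_sdt ~ P(alpha_s * lambda_klt)
               for rate in lam[k][l]]
              for l in day_types]
    return k, counts

# Hypothetical toy setting: K = 2 clusters, 2 day types, T = 3 time slots.
day_types = [0, 0, 0, 0, 0, 1, 1]            # W_d: five week days, two week-end days
lam = [[[1.5, 0.5, 1.0], [1.0, 1.0, 1.0]],   # lam[k][l][t]; each row sums to T,
       [[0.5, 1.0, 1.5], [2.0, 0.5, 0.5]]]   # hence sum_{l,t} D_l * lambda_klt = D * T
k, counts = sample_station(random.Random(0), [0.4, 0.6], 4.0, lam, day_types)
```

With the constraint satisfied, $\alpha_s$ reads as the mean count per time slot of station $s$, and $\lambda$ only carries the shape of the temporal profile.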
  25. 25. Parameters estimation, likelihood. Marginal log-likelihood: $L(\Theta; X) = \sum_s \log \left( \sum_k \pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}} \right)$ (1). Completed log-likelihood: $L_c(\Theta; X, Z) = \sum_{s,k} Z_{sk} \log \left( \pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}} \right)$ (2), where $Z$ is unknown.
  26. 26. EM algorithm ⇒ straightforward solution for parameter estimation. E step: conditional expectation of $L_c$ given the current parameters: $E[L_c(\Theta; x, Z) \mid x, \Theta^{(q)}] = \sum_{s,k} t_{sk} \log \left( \pi_k \prod_{d,t,l} p(x_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}} \right)$ (3), with $t_{sk}$ the posterior probabilities: $t_{sk} = \dfrac{\pi_k^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{klt}^{(q)})^{W_{dl}}}{\sum_{k'} \pi_{k'}^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{k'lt}^{(q)})^{W_{dl}}}$ (4).
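The posterior probabilities of equation (4) can be computed as in the sketch below. The array layout (`X[s][d][t]`, `W[d]` as a day-type index, `lam[k][l][t]`), the function names and the toy values are assumptions; the products are evaluated in log space and normalised with the log-sum-exp trick, a numerical choice that the slides do not discuss.

```python
import math

def log_poisson(x, rate):
    """Log of the Poisson probability mass p(x; rate)."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)

def e_step(X, W, pi, alpha, lam):
    """Return post[s][k] = t_sk, the posterior probability of cluster k for station s."""
    post = []
    for s, days in enumerate(X):
        logs = []
        for k, pk in enumerate(pi):
            lj = math.log(pk)
            for d, row in enumerate(days):
                for t, x in enumerate(row):
                    # W[d] selects the only day type l with W_dl = 1.
                    lj += log_poisson(x, alpha[s] * lam[k][W[d]][t])
            logs.append(lj)
        m = max(logs)                            # log-sum-exp normalisation
        w = [math.exp(v - m) for v in logs]
        z = sum(w)
        post.append([v / z for v in w])
    return post

# Hypothetical toy data: 2 stations, 2 days of a single day type, 2 time slots.
X = [[[9, 1], [8, 2]], [[1, 9], [0, 10]]]
W = [0, 0]
lam = [[[1.8, 0.2]], [[0.2, 1.8]]]
post = e_step(X, W, pi=[0.5, 0.5], alpha=[5.0, 5.0], lam=lam)
```

In this toy case station 0 is busy in the first slot and station 1 in the second, so each is assigned almost entirely to the cluster whose profile matches.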
  27. 27. EM algorithm ⇒ straightforward solution for parameter estimation. M step: maximization of the lower bound with respect to the parameters. $\alpha_s$: mean station activity, $\hat{\alpha}_s = \frac{1}{DT} \sum_{d,t} X_{sdt}$; $\pi_k$: proportion of cluster $k$, $\hat{\pi}_k = \frac{1}{N} \sum_s t_{sk}$; $\lambda_{klt}$: activity of time frame $t$ for cluster $k$, on week days or during the week-end (day cluster $l$), $\hat{\lambda}_{klt} = \dfrac{1}{\sum_s t_{sk} \alpha_s \sum_d W_{dl}} \sum_{s,d} t_{sk} W_{dl} X_{sdt}$ (5).
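The M-step updates, including equation (5), translate directly into code. The sketch below assumes the same hypothetical layout as before (`X[s][d][t]` counts, `W[d]` day-type index, `post[s][k]` for $t_{sk}$) and uses made-up toy values; it is one possible reading of the update formulas, not the author's implementation.

```python
def m_step(X, W, post, n_day_types):
    """Closed-form updates of alpha, pi and lambda given the posteriors t_sk."""
    N, D, T = len(X), len(X[0]), len(X[0][0])
    K = len(post[0])
    # alpha_s: mean count per (day, time slot) of station s.
    alpha = [sum(sum(row) for row in X[s]) / (D * T) for s in range(N)]
    # pi_k: average posterior weight of cluster k.
    pi = [sum(post[s][k] for s in range(N)) / N for k in range(K)]
    # D_l: number of days of type l.
    n_days = [sum(1 for d in range(D) if W[d] == l) for l in range(n_day_types)]
    lam = []
    for k in range(K):
        weight = sum(post[s][k] * alpha[s] for s in range(N))
        lam.append([[sum(post[s][k] * X[s][d][t]
                         for s in range(N) for d in range(D) if W[d] == l)
                     / (weight * n_days[l])
                     for t in range(T)]
                    for l in range(n_day_types)])
    return alpha, pi, lam

# Hypothetical toy data: 2 stations, 3 days (two week days, one week-end day), 2 slots.
X = [[[9, 1], [8, 2], [3, 3]], [[1, 9], [0, 10], [2, 2]]]
W = [0, 0, 1]
post = [[0.9, 0.1], [0.2, 0.8]]
alpha, pi, lam = m_step(X, W, post, n_day_types=2)
```

One property worth checking is that the updated profiles satisfy the constraint of slide 24, $\sum_{l,t} D_l \lambda_{klt} = DT$ for every cluster, which follows from the definition of $\hat{\alpha}_s$.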
  28. 28. Analysis of the results on the Velib’ dataset. Setting: one month of data (September); number of clusters ($K = 8$) set manually ⇒ good trade-off between interpretability and fit of the clustering. Outputs: $Z_s$: cluster of station $s$; $\lambda_k$: temporal profile of cluster $k$; $\alpha_s$: attractivity of station $s$.
  29. 29. Railway stations. [Figure: temporal profile of the cluster; panels Week and Week-end, rows Departures and Arrivals; x-axis: hours, y-axis: activity.]
  30. 30. Railway stations.
  31. 31. Parks. [Figure: temporal profile of the cluster; panels Week and Week-end, rows Departures and Arrivals; x-axis: hours, y-axis: activity.]
  32. 32. Parks.
  33. 33. Spare time, night. [Figure: temporal profile of the cluster; panels Week and Week-end, rows Departures and Arrivals; x-axis: hours, y-axis: activity.]
  34. 34. Spare time, night.
  35. 35. Spare time, night and week-end. [Figure: temporal profile of the cluster; panels Week and Week-end, rows Departures and Arrivals; x-axis: hours, y-axis: activity.]
  36. 36. Spare time, night and week-end.
  37. 37. Housing. [Figure: temporal profile of the cluster; panels Week and Week-end, rows Departures and Arrivals; x-axis: hours, y-axis: activity.]
  38. 38. Housing. [Figure legend: inhabitants / ha, scale from 0 to 1 200.]
  39. 39. Employment (1). [Figure: temporal profile of the cluster; panels Week and Week-end, rows Departures and Arrivals; x-axis: hours, y-axis: activity.]
  40. 40. Employment (2). [Figure: temporal profile of the cluster; panels Week and Week-end, rows Departures and Arrivals; x-axis: hours, y-axis: activity.]
  41. 41. Employment (1 and 2). [Figure legend: jobs / ha, scale from 0 to 2 000.]
  42. 42. Mixed usage. [Figure: temporal profile of the cluster; panels Week and Week-end, rows Departures and Arrivals; x-axis: hours, y-axis: activity.]
  43. 43. Mixed usage.
  44. 44. Crossing with population/employment/services rates.
        Table 1: Mean of each cluster with respect to population (hab/ha), employment (emp/ha), services (serv/ha) and shops (com/ha) densities. Sources: "Recensement 2008", "Base permanente des équipements", Insee.
        Cluster                 hab/ha   emp/ha   serv/ha   com/ha
        (no label in source)       162      237       4.2      3.7
        Spare time (1)             367      189       6.3      4.4
        Spare time (2)             261      322       7.7      6.9
        Parks                      172       90       2        1.7
        Railway stations           209      206       2.4      1.8
        Housing                    375      108       3.8      2.7
        Employment (1)             138      409       4.5      2.8
        Employment (2)             157      456       5.7      5.6
        Mixed usage                301      163       3.8      2.8
  45. 45. Conclusion on stations clustering. Discussion on the model: model adapted to counts; scaling factors for stations are important; stations are described by incoming and outgoing flow dynamics; week-day / week-end differences are taken into account. Discussion on the results: clusters are interpretable; population, employment and amenities densities are highly explanatory for the clusters; temporal profiles are also interpretable and informative.
  46. 46. Latent Dirichlet Allocation (LDA) for trips activity recognition.
  47. 47. Objectives. Decompose the trips into interpretable clusters ⇒ look for stationarities and change points in the O/D dynamics; LDA with documents = small bags of successive trips. Analyse the clusters found with respect to their: temporal positions and cycles; spatial distribution of flows; spatial distribution of incoming / outgoing flows per station.
  48. 48. Data representation: dynamical O/D matrices. Observed data: $X_{ijt}$: number of bikes that were (1) taken at station $i$, (2) returned at station $j$, (3) at time $t$; $t \in \{1, \dots, N_t\}$: time index; $i, j \in \{1, \dots, N_s\}$: set of stations ⇒ $X_{ijt}$ is a tensor of dimension $N_s \times N_s \times N_t$ ⇒ takes into account the spatial and temporal behavior of the BSS.
  49. 49. LDA, background. LDA = Latent Dirichlet Allocation: Bayesian mixture for discrete data ⇒ originally designed to find topics in text corpora; each document (bag of words) is a mixture of topics; each topic has its own vector of word probabilities. [Figure 8: Graphical model representation of LDA.]
  50. 50. LDA for dynamical O/D matrices analysis. Hypotheses: local stationarity of the BSS behaviour / O/D; cyclostationarity: week, day. Small bags of successive trips ≈ stationarity of the O/D ⇒ documents (bags of words) = bags of successive trips (5000), with: words = origin/destination couples; topics = latent activities.
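The mapping from trips to LDA documents can be sketched as follows: sort the trips by departure time, cut them into consecutive bags, and count the O/D couples in each bag. The tuple layout `(departure_time, origin, destination)` and the function name `bags_of_trips` are assumptions; the slides use bags of 5000 trips, while the toy call below uses 2 to stay readable.

```python
from collections import Counter

def bags_of_trips(trips, bag_size):
    """trips: iterable of (departure_time, origin, destination).
    Returns one Counter of O/D couples ("words") per bag ("document")."""
    ordered = sorted(trips)  # tuples sort by departure time first
    bags = []
    for start in range(0, len(ordered), bag_size):
        chunk = ordered[start:start + bag_size]
        bags.append(Counter((o, d) for _, o, d in chunk))
    return bags

# Hypothetical toy trips; station names are placeholders.
trips = [(3, "A", "B"), (1, "A", "B"), (2, "B", "C"), (4, "C", "A"), (5, "A", "B")]
bags = bags_of_trips(trips, bag_size=2)
```

In this sketch the last bag may be smaller than `bag_size`; whether such a remainder is kept or dropped is a choice the slides do not specify.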
  51. 51. LDA for dynamical O/D matrices analysis, generative model. For each activity $a$, draw an O/D matrix generator: $\Lambda^a \sim \mathcal{D}(\beta)$. For each "bag of trips" $t \in \{1, \dots, N_t\}$: (1) draw the activity proportions: $\pi_t \sim \mathcal{D}(\alpha)$; (2) for each trip of the bag $t$: draw its activity $A \sim \mathcal{M}(1, \pi_t)$, then draw an O/D couple $D$ using the generator of activity $A$: $D \sim \mathcal{M}(1, \Lambda^A)$.
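The generative process of slide 51 can be simulated with the standard library alone. The sketch below assumes symmetric Dirichlet priors (a single scalar for $\alpha$ and for $\beta$) and obtains Dirichlet draws by normalising Gamma variates; the function names and the toy sizes are hypothetical.

```python
import random

def dirichlet(rng, concentration, size):
    """Symmetric Dirichlet draw obtained by normalising Gamma variates."""
    g = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(g)
    return [v / total for v in g]

def generate_bags(n_activities, n_stations, n_bags, bag_size, alpha, beta, seed=0):
    rng = random.Random(seed)
    od_pairs = [(i, j) for i in range(n_stations) for j in range(n_stations)]
    # Lambda^a ~ D(beta): one distribution over O/D couples per activity.
    generators = [dirichlet(rng, beta, len(od_pairs)) for _ in range(n_activities)]
    bags = []
    for _ in range(n_bags):
        pi_t = dirichlet(rng, alpha, n_activities)                 # pi_t ~ D(alpha)
        bag = []
        for _ in range(bag_size):
            a = rng.choices(range(n_activities), weights=pi_t)[0]  # A ~ M(1, pi_t)
            od = rng.choices(od_pairs, weights=generators[a])[0]   # D ~ M(1, Lambda^A)
            bag.append((a, od))
        bags.append(bag)
    return generators, bags

# Hypothetical toy sizes: 3 activities, 4 stations, 5 bags of 20 trips.
generators, bags = generate_bags(3, 4, 5, 20, alpha=0.5, beta=0.5)
```

Inference runs this process backwards: only the O/D couples are observed, and the activity labels, the proportions $\pi_t$ and the generators $\Lambda^a$ have to be estimated.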
  52. 52. Fixing the number of activities, perplexity analysis. Perplexity = f(likelihood of test data). Clear drop-off at $K = 5$. [Figure 9: Perplexity on the September dataset with respect to the number of latent activities $K$.]
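The slide only states that perplexity is a function of the test likelihood. A common definition, assumed here and not confirmed by the source, is the exponential of minus the average log-likelihood per word. The sketch also assumes that the activity proportions of the held-out bags have already been inferred; all names and values are hypothetical.

```python
import math

def perplexity(bags, proportions, generators):
    """bags[t]: {word: count}; proportions[t][a]: pi_t; generators[a][word]: Lambda^a."""
    log_lik, n_words = 0.0, 0
    for bag, pi_t in zip(bags, proportions):
        for word, count in bag.items():
            # Probability of the word in bag t: mixture over the latent activities.
            p = sum(pi_a * gen[word] for pi_a, gen in zip(pi_t, generators))
            log_lik += count * math.log(p)
            n_words += count
    return math.exp(-log_lik / n_words)

# Sanity check on a hypothetical case: uniform generators over 4 O/D couples.
words = ["AB", "BA", "AC", "CA"]
uniform = {w: 0.25 for w in words}
value = perplexity([{"AB": 3, "BA": 1}], [[0.5, 0.5]], [uniform, uniform])
```

With uniform generators the perplexity equals the vocabulary size, which gives a quick check that the formula is wired correctly; lower values on held-out bags indicate a better fit.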
  53. 53. Temporal results: $\pi_t$. [Figure 10: Temporal evolution of $\pi_t$; y-axis: trips / hour; x-axis ticks at April 11, April 18 and April 25.] Remarks: cyclostationarity clearly visible (even holidays); low mixture between the latent activities; interpretable temporal clusters: Home ↔ Work, Lunch, ...
  54. 54. Spatial results: $\Lambda^a$ as flows. [Figure 11: Latent activity "House→Work commute", flows (blue for f = 10/10 000).]
  55. 55. Spatial results: $\Lambda^a$ as flows. [Figure 12: Latent activity "Lunch", flows (blue for f = 10/10 000).]
  56. 56. Spatial results: $\Lambda^a$ as flows. [Figure 13: Latent activity "Work→House commute", flows (blue for f = 10/10 000).]
  57. 57. Spatial results: $\Lambda^a$ as flows. [Figure 14: Latent activity "Evening", flows (blue for f = 10/10 000).]
  58. 58. Spatial results: $\Lambda^a$ as flows. [Figure 15: Latent activity "Spare time", flows (blue for f = 10/10 000).]
  59. 59. Incoming / outgoing specificities. Question: which stations have an increased in/out-degree for a latent activity $a$? Introduce the stations' incoming specificities $IS^a_s$ and outgoing specificities $OS^a_s$: $IS^a_s = \log(p^a_{in_s} / p^g_{in_s})$, $OS^a_s = \log(p^a_{out_s} / p^g_{out_s})$ (6), with $p^a_{in_s}$, $p^a_{out_s}$ the probabilities that a trip ends/starts in station $s$ for activity $a$: $p^a_{in_s} = \sum_j \Lambda^a_{js}$, $p^a_{out_s} = \sum_j \Lambda^a_{sj}$, and $p^g_{in_s}$, $p^g_{out_s}$ the global probabilities that a trip ends/starts in station $s$: $p^g_{in_s} = \dfrac{\sum_{j,t} X_{jst}}{\sum_{i,j,t} X_{ijt}}$, $p^g_{out_s} = \dfrac{\sum_{j,t} X_{sjt}}{\sum_{i,j,t} X_{ijt}}$.
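Equation (6) can be evaluated as in the following sketch, for one activity. It assumes that `Lam[i][j]` holds $\Lambda^a_{ij}$ and that `X[i][j]` holds the trip counts already summed over time; the function name and the two-station example are hypothetical.

```python
import math

def specificities(Lam, X):
    """Return (IS, OS): log-ratios of activity-specific to global in/out probabilities."""
    n = len(Lam)
    total = sum(sum(row) for row in X)
    IS, OS = [], []
    for s in range(n):
        p_in_a = sum(Lam[j][s] for j in range(n))     # trips of activity a ending at s
        p_out_a = sum(Lam[s][j] for j in range(n))    # trips of activity a starting at s
        p_in_g = sum(X[j][s] for j in range(n)) / total
        p_out_g = sum(X[s][j] for j in range(n)) / total
        IS.append(math.log(p_in_a / p_in_g))
        OS.append(math.log(p_out_a / p_out_g))
    return IS, OS

# Hypothetical example: globally balanced flows, activity biased from station 0 to 1.
Lam = [[0.0, 0.6], [0.4, 0.0]]
X = [[0, 50], [50, 0]]
IS, OS = specificities(Lam, X)
```

A positive value means the station receives (or emits) a larger share of trips under activity $a$ than it does overall; stations with no trips at all would need special handling, since the logarithm is undefined there.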
  60. 60. Spatial results: incoming specificities. [Figure 16: Latent activity "House→Work commute", stations incoming specificity.]
  61. 61. Spatial results: outgoing specificities. [Figure 17: Latent activity "House→Work commute", stations outgoing specificity.]
  62. 62. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. Expected bike balance. Question: do stations have a positive or negative bike balance for a latent activity $a$? The O/D matrix $D$ follows a multinomial distribution with parameters $N_{dep}$ (number of trips) and $\Lambda^a$:
  $$D \sim \mathcal{M}(N_{dep}, \Lambda^a).$$
  The bike balance $B_s$ of a station $s$ is thus given by the incoming bikes minus the outgoing bikes:
  $$B_s = \sum_j D_{js} - \sum_j D_{sj},$$
  and the expectation of the balance vector $B$ is therefore:
  $$E[B] = N_{dep}\left((\Lambda^a)^t - \Lambda^a\right) v, \tag{7}$$
  with $v = (1, \ldots, 1)^t$.
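Equation (7) amounts to taking, for each station, the column sum of $\Lambda^a$ minus its row sum and scaling by $N_{dep}$. A minimal sketch follows, assuming `lam[i][j]` holds $\Lambda^a_{ij}$ with origin `i` and destination `j`; the function name `expected_balance` and the two-station matrix are hypothetical and chosen only for illustration.

```python
def expected_balance(lam, n_dep):
    """Expected balance E[B] = N_dep * ((Lambda^a)^t - Lambda^a) v, with v a vector of ones.

    lam[i][j] -- assumed layout: probability of a trip i -> j under activity a
    n_dep     -- number of trips drawn from the multinomial distribution
    """
    n = len(lam)
    balance = []
    for s in range(n):
        incoming = sum(lam[j][s] for j in range(n))  # s-th entry of (Lambda^a)^t v
        outgoing = sum(lam[s][j] for j in range(n))  # s-th entry of Lambda^a v
        balance.append(n_dep * (incoming - outgoing))
    return balance

# Toy O/D probabilities (made up): most trips go from station 0 to station 1.
lam_toy = [[0.1, 0.6],
           [0.2, 0.1]]
balance_toy = expected_balance(lam_toy, 10_000)
```

Because every trip leaves one station and arrives at another (or the same one), the expected balances always sum to zero; here station 0 is expected to lose about 4 000 bikes and station 1 to gain as many.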
  63. 63. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. Spatial results: expected bike balance. Fig. 18: Latent activity "House→Work commute", stations expected balances with $N_{dep}$ = 10 000 (color scale: balance from -30 to 30).
  64. 64. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Lunch", incoming specificity. Fig. 19: Stations incoming specificity.
  65. 65. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Lunch", outgoing specificity. Fig. 20: Stations outgoing specificity.
  66. 66. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Lunch", balance. Fig. 21: Stations expected balances with $N_{dep}$ = 10 000 (color scale: balance from -30 to 30).
  67. 67. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Work→House commute", incoming specificity. Fig. 22: Stations incoming specificity.
  68. 68. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Work→House commute", outgoing specificity. Fig. 23: Stations outgoing specificity.
  69. 69. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Work→House commute", balance. Fig. 24: Stations expected balances with $N_{dep}$ = 10 000 (color scale: balance from -30 to 30).
  70. 70. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Evening", incoming specificity. Fig. 25: Stations incoming specificity.
  71. 71. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Evening", outgoing specificity. Fig. 26: Stations outgoing specificity.
  72. 72. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Evening", balance. Fig. 27: Stations expected balances with $N_{dep}$ = 10 000 (color scale: balance from -30 to 30).
  73. 73. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Spare time", incoming specificity. Fig. 28: Stations incoming specificity.
  74. 74. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Spare time", outgoing specificity. Fig. 29: Stations outgoing specificity.
  75. 75. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. "Spare time", balance. Fig. 30: Stations expected balances with $N_{dep}$ = 10 000 (color scale: balance from -30 to 30).
  76. 76. Latent Dirichlet Allocation (LDA) for trips activity recognition: Analysis of the results on the Velib’ dataset. Conclusion on LDA for activity recognition: interpretable latent activities; gives a good picture of the city's "pulse" and geography; better understanding of the system's behaviour; strong evidence of cyclostationarity; weekday / weekend pattern.
  77. 77. Thanks for your attention. @comeetie, etienne.come@ifsttar.fr. Ifsttar, Centre de Marne-la-Vallée, Batiment le “Descartes 2”, 2, rue de la Butte Verte, F-93166 Noisy le Grand cedex. Email: etienne.come@ifsttar.fr. Tel.: +33 (0)1 45 92 56 57. Website: www.ifsttar.fr
