Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Mobility Insights at Swisscom :
Understanding Collective Mobility
in Switzerland
Spark Summit, October 2016
francois.garil...
Agenda
Intro
Smart-Data
Big Data Architecture
Trajectory Classification
Streaming
Data challenges
Introduction : Positioning
Positioning users in a modern
network
no triangulation at scale
positioning based on cell attachement history, prec ~200m
...
Trajectory data
mining
time series reconstruction
trajectory segmentation
map matching, clustering
mode of transport detec...
How to create value with
positioning at Swisscom ?
with competitive analytics & data sources,
and by making sure it embodi...
Smart Data
On (not) tracking (any users)
"Swisscom strictly complies with all applicable legislations, in
particular with the telecom...
Smart Data : Big Data without Big Brother
Privacy preservation is an asset
It makes sense to care as much about your custo...
Swisscom mobile subscribers
source: xavierstuder.com, MD&A reports
Our choices
public good applications: making Switzerland run better,
understanding places, not individuals,
anonymized agg...
A first product : City
"It's a dream for civil engineers" -- Alexandre Machu, Urban
systems engineer, Pully
Demo time
Usages
New roads to divert transit traffic out of downtown (informs a 50M$
project)
Parking lot expansion and transformati...
Big Data architecture
In the backend
Spark configuration essentials for enterprise
jobs
spark.executor.memory="not the default 1g"
spark.kryo.registrator="some...
Scala (1/2)
type ChronoHistory = List[UEupdate] @@ Chronological
type AnteChronoHistory = List[UEupdate] @@ AnteChronologi...
Scala (2/2)
implicit def reverseChrono(l: ChronoHistory): AnteChronoHistory = l.reve
implicit def reverseAnteChrono(l: Ant...
>
Trajectory Classification
What is the proportion of trips
associated with trains?
Mode of Transport Detection
Input: Sequence of network events
Output: Mode of transport (train vs. other)
Network events a...
Bursty Cell
Number of devices vs. minute of day
Burstiness
Random process with mean and variance , the relative variance isμ σ
2
.D =
σ
2
μ
Machine Learning with Spark
Periodic Spark job to compute cell features
Supervised training on labeled data (train vs. oth...
Spark (1/2)
val labeledPoints: RDD[LabeledPoint] = data.map {
case (transportMode, tripFeatures) =>
LabeledPoint(
labelOf(...
Spark (2/2)
// train a model for performance evaluation
val model = trainNewModel(trainingData)
// Evaluate model on test ...
Streaming Analytics
Road conditions on highways
Selecting users on a path of Interest
Graph matching
Locality-sensitive hashing :
A family H of hashing functions is
-sensitive if:
if then
if then
More:
Locali...
Computing speeds: Solving graph
constraints
given a history of cells, where was the user, exactly ? (twice)
what's the pat...
Checkpointing: Set the checkpoint
interval
are you checkpointing too often ?
every batches, you'll need batches to recover...
Data Challenges
Crucial elements
Quality, reliability of data sources
Automated ground truth checking
sensors
TEMS fleet
What's the ground...
Questions ?
Upcoming SlideShare
Loading in …5
×

Mobility insights at Swisscom - Understanding collective mobility in Switzerland

587 views

Published on

Swisscom is the leading mobile-service provider in Switzerland, with a market share high enough to enable us to model and understand the collective mobility in every area of the country. To accomplish that, we built an urban planning tool that helps cities better manage their infrastructure based on data-based insights, produced with Apache Spark, YARN, Kafka and a good dose of machine learning. In this talk, we will explain how building such a tool involves mining a massive amount of raw data (1.5E9 records/day) to extract fine-grained mobility features from raw network traces. These features are obtained using different machine learning algorithms. For example, we built an algorithm that segments a trajectory into mobile and static periods and trained classifiers that enable us to distinguish between different means of transport. As we sketch the different algorithmic components, we will present our approach to continuously run and test them, which involves complex pipelines managed with Oozie and fuelled with ground truth data. Finally, we will delve into the streaming part of our analytics and see how network events allow Swisscom to understand the characteristics of the flow of people on roads and paths of interest. This requires making a link between network coverage information and geographical positioning in the space of milliseconds and using Spark streaming with libraries that were originally designed for batch processing. We will conclude on the advantages and pitfalls of Spark involved in running this kind of pipeline on a multi-tenant cluster. Audiences should come back from this talk with an overall picture of the use of Apache Spark and related components of its ecosystem in the field of trajectory mining.

Published in: Technology
  • Be the first to comment

Mobility insights at Swisscom - Understanding collective mobility in Switzerland

  1. 1. Mobility Insights at Swisscom : Understanding Collective Mobility in Switzerland Spark Summit, October 2016 francois.garillot@swisscom.com @huitseeker mohamed.kafsi@swisscom.com @mou7
  2. 2. Agenda Intro Smart-Data Big Data Architecture Trajectory Classification Streaming Data challenges
  3. 3. Introduction : Positioning
  4. 4. Positioning users in a modern network no triangulation at scale positioning based on cell attachement history, prec ~200m cell-to-cell handover, prec ~50m around limit Timing Advance (roundtrip) : better results on good data sources
  5. 5. Trajectory data mining time series reconstruction trajectory segmentation map matching, clustering mode of transport detection ...
  6. 6. How to create value with positioning at Swisscom ? with competitive analytics & data sources, and by making sure it embodies the right values.
  7. 7. Smart Data
  8. 8. On (not) tracking (any users) "Swisscom strictly complies with all applicable legislations, in particular with the telecommunications law and the data protection initiative." Jürg Studerus, Swisscom Senior Manager, Corporate Responsibility
  9. 9. Smart Data : Big Data without Big Brother Privacy preservation is an asset It makes sense to care as much about your customer as they do about you. We technically enforce this answering only synoptic questions, no individual ones, with data flow control : we neutralize quasi-identifiers at every stage
  10. 10. Swisscom mobile subscribers source: xavierstuder.com, MD&A reports
  11. 11. Our choices public good applications: making Switzerland run better, understanding places, not individuals, anonymized aggregations
  12. 12. A first product : City "It's a dream for civil engineers" -- Alexandre Machu, Urban systems engineer, Pully
  13. 13. Demo time
  14. 14. Usages New roads to divert transit traffic out of downtown (informs a 50M$ project) Parking lot expansion and transformation (informs a 10M$ project) Electric car charging station deployment
  15. 15. Big Data architecture
  16. 16. In the backend
  17. 17. Spark configuration essentials for enterprise jobs spark.executor.memory="not the default 1g" spark.kryo.registrator="something custom" // among others spark.shuffle.service.enabled="true" spark.dynamicAllocation.enabled="true" spark.deploy.recoveryMode="ZOOKEEPER" spark.deploy.recoveryDirectory="/path/to/state" spark.deploy.zookeeper.url="quorumMachine1:2181, ..." NOT the only valuable settings, see https://techsuppdiva.github.io
  18. 18. Scala (1/2) type ChronoHistory = List[UEupdate] @@ Chronological type AnteChronoHistory = List[UEupdate] @@ AnteChronological implicit class Chrono(l: List[UEupdate]) { def asChrono: ChronoHistory = { chronoCheck(l) l.asInstanceOf[ChronoHistory] } def asAnteChrono: AnteChronoHistory = { anteChronoCheck(l) l.asInstanceOf[AnteChronoHistory] } }
  19. 19. Scala (2/2) implicit def reverseChrono(l: ChronoHistory): AnteChronoHistory = l.reve implicit def reverseAnteChrono(l: AnteChronoHistory): ChronoHistory = l.
  20. 20. > Trajectory Classification
  21. 21. What is the proportion of trips associated with trains?
  22. 22. Mode of Transport Detection Input: Sequence of network events Output: Mode of transport (train vs. other) Network events associated with cells Create fingerprints of cells Intuition: cells with intermittent increases in the number of connections are associated with collective mode of transports
  23. 23. Bursty Cell Number of devices vs. minute of day
  24. 24. Burstiness Random process with mean and variance , the relative variance isμ σ 2 .D = σ 2 μ
  25. 25. Machine Learning with Spark Periodic Spark job to compute cell features Supervised training on labeled data (train vs. others) Training and test with Spark ML
  26. 26. Spark (1/2) val labeledPoints: RDD[LabeledPoint] = data.map { case (transportMode, tripFeatures) => LabeledPoint( labelOf(transportMode).toDouble, featuresToFeatureVector(tripFeatures) ) } // generate labeled data labeledPoints.cache() def trainNewModel = // Fix the used model new LogisticRegressionWithLBFGS() .setIntercept(true) .setNumClasses(numberOfClasses) .run(_: RDD[LabeledPoint])
  27. 27. Spark (2/2) // train a model for performance evaluation val model = trainNewModel(trainingData) // Evaluate model on test instances and compute test error val labelAndPreds = testData.map { point => val prediction = model.predict(point.features) (point.label, prediction) } val testErr = (labelAndPreds .filter(r => r._1 != r._2) .count().toDouble) / testData.count() // train final model on the whole dataset val finalModel = trainNewModel(labeledPoints)
  28. 28. Streaming Analytics
  29. 29. Road conditions on highways
  30. 30. Selecting users on a path of Interest
  31. 31. Graph matching Locality-sensitive hashing : A family H of hashing functions is -sensitive if: if then if then More: LocalitySensitiveHashingBySpark, Uber, Spark Summit 2016 AGentleIntroductiontoLocality-SensitiveHashingwith ApacheSpark, Scala By The Bay 2015 (r, cr, , )p1 p2 p– q ≤ r P [h(q) = h(p)] ≥rH p1 p– q ≥ cr P [h(q) = h(p)] ≤rH p2
  32. 32. Computing speeds: Solving graph constraints given a history of cells, where was the user, exactly ? (twice) what's the path between 2 positions ? linear query per user
  33. 33. Checkpointing: Set the checkpoint interval are you checkpointing too often ? every batches, you'll need batches to recover from checkpointing time loss make sure k p k ≥ p
  34. 34. Data Challenges
  35. 35. Crucial elements Quality, reliability of data sources Automated ground truth checking sensors TEMS fleet What's the ground truth for mode of transport, domicile, etc ? Colleagues and friends volunteers
  36. 36. Questions ?

×