On (not) tracking (any users)
"Swisscom strictly complies with all applicable legislations, in
particular with the telecommunications law and the data
Jürg Studerus, Swisscom Senior Manager, Corporate Responsibility
Smart Data : Big Data without Big Brother
Privacy preservation is an asset
It makes sense to care as much about your customer as they do about you.
We technically enforce this
answering only synoptic questions, no individual ones,
with data flow control : we neutralize quasi-identifiers at every stage
Swisscom mobile subscribers
source: xavierstuder.com, MD&A reports
public good applications: making Switzerland run better,
understanding places, not individuals,
all results presented aggregated, anonymized.
Spark configuration essentials for enterprise
spark.executor.memory="not the default 1g"
spark.kryo.registrator="something custom"// and companions
NOT the only valuable settings, see https://techsuppdiva.github.io
Locality-sensitive hashing short histories
A family H of hashing functions is -sensitive if:(r, cr, , )p1 p2
p–q ≤ r P [h(q) = h(p)] ≥rH p1
p–q ≥ cr P [h(q) = h(p)] ≤rH p2
Locality Sensitive Hashing By Spark, Uber, Spark Summit
A Gentle Introduction to Locality-Sensitive Hashing with Apache Spark,
Scala by The Bay
Computing speeds: Solving graph
a speed comes from a user well-positioned, twice
plus route knowledge
given a history of cells, where was the user, exactly ?
Solving graph constraints
just a few users left in computation at this stage
so a lot invested in > linear complexity algorithms
Quality, reliability of data sources
Automated ground truth checking
What's the ground truth for mode of transport, domicile, etc ?
Colleagues and friends volunteers
In the works
More features (see you Spark Summit EU!)
Streaming for city