Swiss Transport in Real Time:
Tribulations in the Big Data Stack
Alexandre Masselot
Dev. Wednesday
March 2017
@alex_mass
OCTO Suisse, Avenue du Théâtre 7, 1005 Lausanne, Switzerland, www.octo.ch
OCTO Suisse is hiring: 5 consultants in 2017
rejoins.octo.com
• Architect
• Software Craftsman
• DataGeek
• Methodology Coach
• DevOps Expert
• Strategy Consultant
Is it possible to build
a simple scalable infrastructure, to
dispatch, store, transform and visualize
“near real time” data
and achieve a posteriori analysis?
This is only
a POC!!!
Finding a dataset
• social media
• finance
• sport
• energy
• transport
• log analysis
• meteorology
• bioinformatics
• personalized health
• monitoring
• security
• IOT
www.voev.ch
AAGL Autobus AG Liestal
AAGR Auto AG Rothenburg
AAGS Auto AG Schwyz
AAGU AUTO AG URI
AB Appenzeller Bahnen AG
ABl Autolinee Bleniesi SA
ABF Autobusbetrieb Freienbach
AFA Automobilverkehr Frutigen Adelboden AG
AMSA Autolinea Mendrisiense SA
AOT Autokurse Oberthurgau AG
ARAG Rottal Auto AG
ARBAG Aletsch Riederalp Bahnen AG
ARL Autolinee Regionali Luganesi
AS Autobetrieb Sernftal AG
ASGS Autotransports Sion-Grône-Sierre
ASm Aare Seeland mobil AG
AVG Autoverkehr Grindelwald AG
AVJ Autotransports de la Vallée de Joux
AWA Autobetrieb Weesen-Amden
AZZK Autobus Zürich-Zollikon-Küsnacht
BB Bürgenstock Bahnen
BBA Busbetrieb Aarau AAR bus+bahn
BBBW Bus-Betrieb Binggeli
BDWM BDWM Transport AG
BGU BGU Busbetrieb Grenchen und Umgebung AG
BLAG Busland AG
BLM Bergbahn Lauterbrunnen-Mürren AG
BLS BLS AG
BLT BLT Baselland Transport AG
BLWE Busbetrieb Lichtensteig-Wattwil-Ebnat-Kappel
BOB Berner Oberland-Bahnen AG
BOGG Busbetrieb Olten Gösgen Gäu AG
BOS BUS Ostschweiz AG
BOS-M BOS Management AG
BRB Brienz Rothorn Bahn AG
BRER Busbetrieb Rapperswil-Eschenbach-Rüti
BRSB Braunwald-Standseilbahn AG
BSU Busbetrieb Solothurn und Umgebung AG
BVB Basler Verkehrs-Betriebe
CGN CGN SA
CJ Compagnie des chemins de fer du Jura (C.J.) SA
CROS Crossrail AG
DBSCH DB Schenker Rail Schweiz GmbH
DBZ Dolderbahn Zürich
ETB Emmentalbahn, Huttwil
FART Ferrovie Autolinee Regionali Ticinesi
FB Forchbahn AG
FC FUNICAR Kursbetriebe AG
FLP Ferrovie Luganesi SA
FW Frauenfeld-Wil-Bahn AG
GGB Gornergrat Bahn AG
HBSAG Hafenbahn Schweiz AG
JB Jungfraubahn AG
LEB Chemin de fer Lausanne-Echallens-Bercher
LLB AG für Verkehrsbetriebe Leuk-Leukerbad und Umgebung
LSMS Schilthornbahn AG
MBC Transports de la région Morges-Bière-Cossonay SA
MG Ferrovia Monte Generoso SA
MGB Matterhorn Gotthard Bahn
MIB Kraftwerke Oberhasli AG Meiringen-Innertkirchen-Bahn
MOB Chemin de fer Montreux-Oberland Bernois
MVR Transports Montreux-Vevey-Riviera SA
NHB Niederhornbahn
NB Niesenbahn AG
NStCM Chemin de fer Nyon-St. Cergue-Morez
OeBB Oensingen-Balsthal-Bahn
PAG PostAuto Schweiz AG
PB PILATUS-BAHNEN AG
RA RegionAlps SA
RAILG Railgate AG
RB RIGI BAHNEN AG
RBL Regionalbus Lenzburg AG
RBS Regionalverkehr Bern-Solothurn AG
REGO Regiobus Gossau AG
RhB Rhätische Bahn AG
RNCH DB Schenker Rail Schweiz GmbH
RLC railCare
RVBW Regionale Verkehrsbetriebe Baden-Wettingen AG
RVSH SchaffhausenBus, Regionale Verkehrsbetriebe SH AG
SBB SBB AG
SBB-D SBB GmbH
SBC Stadtbus Chur AG
SBF Stadtbus Frauenfeld
SBW Stadtbus Winterthur
SMC Cie de Chemin de Fer+d'Autobus Sierre-Montana-Crans (SMC) SA
SMGN Société des Mouettes Genevoises Navigation SA
SMtS Funiculaire St-Imier - Mont-Soleil SA
SOB Schweizerische Südostbahn AG
SRTAG Swiss Rail Traffic AG
SSIF Società Subalpina di Imprese Ferroviarie S.p.A.
ST Sursee-Triengen-Bahn
STB Sensetalbahn AG
STI Verkehrsbetriebe STI AG
SVB BERNMOBIL Städt. Verkehrsbetriebe Bern
SWAG Seilbahn Weissenstein AG
SZU Sihltal Zürich Uetliberg Bahn SZU AG
THURBO Thurbo AG
TL Transports publics de la région lausannoise SA
TMR TRANSPORTS DE MARTIGNY ET REGIONS SA
TPC Transports Publics du Chablais SA
TPF Transports publics fribourgeois SA
TPG Transports publics genevois
TPL Trasporti Pubblici Luganesi SA
TPN Transports Publics de la Région Nyonnaise SA
TRN Transports Publics Neuchâtelois SA
TRAVYS TRAVYS SA Transports Vallée de Joux-Yverdon-Sainte-Croix
TSD Theytaz Excursions Sion
VB Verkehrsbetriebe Biel
VBD Verkehrsbetrieb der Landschaft Davos
VBG VBG Verkehrsbetriebe Glattal AG
VBH Verkehrsbetriebe Herisau
VBL Verkehrsbetriebe Luzern AG
VBSG Verkehrsbetriebe St.Gallen
VBSH Verkehrsbetriebe Schaffhausen
VBZ Verkehrsbetriebe Zürich
VMCV Transports publics Vevey-Montreux-Chillon-Villeneuve
VSSU Verband Schweizerischer Schifffahrtsunternehmen
VZO Verkehrsbetriebe Zürichsee und Oberland AG
WAB Wengernalpbahn AG
WB Waldenburgerbahn AG
WRS Widmer Rail Services Personal AG
WSB Wynental- und Suhrentalbahn AAR bus+bahn
ZB zb Zentralbahn AG
ZVB Zugerland Verkehrsbetriebe AG
ZVV Zürcher Verkehrsverbund ZVV
AES Ägerisee Schifffahrt AG
BLS BLS AG Schifffahrt Berner Oberland Thuner- und Brienzersee
BPG Basler Personenschifffahrt AG
BSG Bielersee-Schifffahrts-Gesellschaft AG
CGN CGN SA
FHM Zürichsee-Fähre Horgen-Meilen AG
LNM Société de Navigation Lacs de Neuchâtel et Morat SA
NLM Navigazione Lago Maggiore
SBS SBS Schifffahrt AG
SGG Schifffahrts-Genossenschaft Greifensee
SGH Schifffahrtsgesellschaft Hallwilersee AG
SGV Schifffahrtsgesellschaft des Vierwaldstättersees
SGZ Schifffahrtsgesellschaft für den Zugersee AG / Ägerisee
SNL Società Navigazione del Lago di Lugano SA
SW Schiffsbetrieb Walensee AG
URh Schweiz. Schifffahrtsgesellschaft Untersee und Rhein AG
ZSG Zürichsee-Schifffahrtsgesellschaft AG
What do we propose?
https://github.com/alexmasselot/swiss-transport-realtime
Is it possible to build
a simple scalable infrastructure, to
dispatch, transform and visualize
“near real time” massive data
and achieve a posteriori analysis?
[Figure: architecture diagram. Inputs: vehicle positions, station boards. Stages: dispatch, transform, format, storage, expose, analysis, visualization. Paths: real time, offline. Consumers: users, data analysts.]
Acquire
• vehicle positions: SBB REST API
• station boards: OpenData transport API
Vehicle positions
{
  id: 12345xyz,
  category: IR,
  name: IR 72928,
  destination: Alpnach,
  position: {
    lat: 46.940582,
    lon: 8.275442
  }
}
Station boards
{
  station: {
    name: Lausanne,
    location: {lat, long}
  },
  departures: [
    {
      to: Domodossola,
      time: 20:13,
      delayed: 4,
      prognosis: {
        capacity2nd: 3,
        capacity1st: 1
      }
    },
    {…}
  ]
}
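As a minimal sketch, a vehicle position event like the one above could be parsed into a typed record before it is dispatched. The class name VehiclePosition and the function parse_vehicle_position are hypothetical, and the sketch assumes the event arrives as strict JSON (the slide uses a looser notation).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VehiclePosition:
    """Hypothetical typed view of one vehicle position event."""
    id: str
    category: str
    name: str
    destination: str
    lat: float
    lon: float


def parse_vehicle_position(raw: str) -> VehiclePosition:
    """Parse one JSON event and flatten its nested position."""
    doc = json.loads(raw)
    pos = doc["position"]
    return VehiclePosition(
        id=doc["id"],
        category=doc["category"],
        name=doc["name"],
        destination=doc["destination"],
        lat=float(pos["lat"]),
        lon=float(pos["lon"]),
    )


# Sample values taken from the slide; the strict JSON encoding is an assumption.
raw_event = json.dumps({
    "id": "12345xyz",
    "category": "IR",
    "name": "IR 72928",
    "destination": "Alpnach",
    "position": {"lat": 46.940582, "lon": 8.275442},
})
event = parse_vehicle_position(raw_event)
print(event.name, event.destination, event.lat, event.lon)
```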
Dispatch
Events are streamed to Kafka
“Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.”
kafka.apache.org
Kafka, RabbitMQ, ZeroMQ…
TIMTOWTDI (there is more than one way to do it)
Store
• format: Logstash
• storage: Elasticsearch, flat files
Logstash, Flume, Filebeat…
TIMTOWTDI
Elasticsearch, HBase, Cassandra…
TIMTOWTDI
Stream transformation
• We have an input flow of events and want to:
• know if a train is stopped at a station;
• know if a train has exited the network;
• expose an aggregated station board.
• We need to:
• digest the input flow;
• process with temporary state persistence;
• be able to expose snapshots.
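The three needs above (digest the flow, keep temporary state, expose snapshots) can be sketched as a small in-memory tracker. This is only an illustration under stated assumptions: the thresholds STOP_RADIUS_M, STOP_AFTER_S and EXIT_AFTER_S and the class TrainTracker are hypothetical, and "stopped" here means "stationary for a while", without matching the position against station coordinates.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds, not taken from the talk.
STOP_RADIUS_M = 50.0   # movement below this counts as "not moving"
STOP_AFTER_S = 30.0    # stationary for this long means "stopped"
EXIT_AFTER_S = 300.0   # silent for this long means "exited the network"


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


@dataclass
class TrainState:
    lat: float          # anchor position of the current stationary run
    lon: float
    still_since: float  # timestamp of the first event at the anchor
    last_seen: float    # timestamp of the latest event


class TrainTracker:
    def __init__(self) -> None:
        self.states: Dict[str, TrainState] = {}

    def on_position(self, train_id: str, lat: float, lon: float, ts: float) -> None:
        """Digest one event and update the temporary per-train state."""
        prev = self.states.get(train_id)
        if prev is not None and haversine_m(prev.lat, prev.lon, lat, lon) <= STOP_RADIUS_M:
            prev.last_seen = ts  # still near the anchor
        else:
            self.states[train_id] = TrainState(lat, lon, ts, ts)

    def is_stopped(self, train_id: str) -> bool:
        s = self.states.get(train_id)
        return s is not None and (s.last_seen - s.still_since) >= STOP_AFTER_S

    def evict_exited(self, now: float) -> List[str]:
        """Drop trains that have been silent for too long."""
        gone = [tid for tid, s in self.states.items() if now - s.last_seen > EXIT_AFTER_S]
        for tid in gone:
            del self.states[tid]
        return gone

    def snapshot(self) -> Dict[str, bool]:
        """Expose the current state: train id -> stopped or not."""
        return {tid: self.is_stopped(tid) for tid in self.states}


tracker = TrainTracker()
tracker.on_position("IR 72928", 46.940582, 8.275442, 0.0)
tracker.on_position("IR 72928", 46.940582, 8.275442, 40.0)
stopped_after_40s = tracker.is_stopped("IR 72928")
tracker.on_position("IR 72928", 46.95, 8.28, 50.0)  # moved about 1 km
stopped_after_move = tracker.is_stopped("IR 72928")
exited = tracker.evict_exited(1000.0)
print(stopped_after_40s, stopped_after_move, exited)
```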
Stream transformation
• Scala is The language for Big Data (functional & OO)

• Akka (actors):
• lightweight entities (one per train, per station);
• easy asynchronous communications;
• the perfect use case.
• Play framework for REST service, configuration etc.
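The "one lightweight entity per train" idea can be illustrated without Akka. The sketch below is not Akka and not the project's code: it mimics the pattern with asyncio, where each hypothetical TrainActor owns a mailbox and processes its messages one at a time, so no locking is needed on its state.

```python
import asyncio
from typing import Dict, List, Tuple


class TrainActor:
    """Hypothetical actor: private state plus a mailbox, processed sequentially."""

    def __init__(self, train_id: str) -> None:
        self.train_id = train_id
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.positions: List[Tuple[float, float]] = []

    async def run(self) -> None:
        while True:
            msg = await self.mailbox.get()
            if msg is None:  # "poison pill": stop the actor
                break
            self.positions.append(msg)


async def dispatch(events: List[Tuple[str, float, float]]) -> Dict[str, List[Tuple[float, float]]]:
    """Route each event to the actor of its train, creating actors on demand."""
    actors: Dict[str, TrainActor] = {}
    tasks = []
    for train_id, lat, lon in events:
        actor = actors.get(train_id)
        if actor is None:
            actor = TrainActor(train_id)
            actors[train_id] = actor
            tasks.append(asyncio.create_task(actor.run()))
        await actor.mailbox.put((lat, lon))
    for actor in actors.values():
        await actor.mailbox.put(None)
    await asyncio.gather(*tasks)
    return {tid: a.positions for tid, a in actors.items()}


# Illustrative events: (train id, lat, lon).
events = [
    ("IR 72928", 46.94, 8.27),
    ("S1", 47.0, 8.3),
    ("IR 72928", 46.95, 8.28),
]
result = asyncio.run(dispatch(events))
print(result)
```

Akka adds what this sketch lacks: supervision, failure strategies and distribution across a cluster.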
Spark Streaming, Storm, Flink…
TIMTOWTDI
DevOps: putting everything together
• The “simple” infrastructure is not so light;
• A developer should have everything on his/her laptop without polluting the machine;
• Docker comes to the rescue:
• lightweight containers,
• pre-existing images,
• docker-compose to describe the infrastructure,
• deploy directly to a cloud.
Performance: 2 numbers
• 15% CPU: Node.js + Kafka + Akka + Play
• 15x faster AJAX queries (vs. SBB REST) to gather 30 times more trains
A scalable infrastructure
• Kafka: partitioning and ZooKeeper
• Logstash: ? (but recovers naturally on failure)
• Elasticsearch: partitioning
• Spark Streaming: distributed by essence & write-ahead logs
• Akka: Akka cluster, supervisors & failure strategy
• Docker: Kubernetes; AWS, GCE, Exoscale, Hidora
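Key-based partitioning, as used for Kafka topics, can be sketched in a few lines: every event with the same key lands in the same partition, so per-train ordering is preserved while partitions are spread over brokers. The hash below (CRC32) is a simplified stand-in, not Kafka's actual partitioner, and NUM_PARTITIONS is an arbitrary assumption.

```python
import zlib
from typing import Dict, List, Tuple

NUM_PARTITIONS = 4  # arbitrary value for the illustration


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Illustrative events: (train id, sequence number).
events = [("IR 72928", 1), ("S1 1234", 1), ("IR 72928", 2), ("S1 1234", 2)]

partitions: Dict[int, List[Tuple[str, int]]] = {p: [] for p in range(NUM_PARTITIONS)}
for train_id, seq in events:
    partitions[partition_for(train_id, NUM_PARTITIONS)].append((train_id, seq))

for p, content in partitions.items():
    print(p, content)
```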
React JS for large data sets
• Only a rendering library (but fast);
• Use a Flux architecture;
• Built by Facebook.
[Figure: Flux architecture with Action, Dispatcher, Store and View]
JavaScript for big data viz
• React can handle viz with >100k elements (don’t show them individually!);
• Beware of performance issues;
• Testing is not optional.
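"Don’t show them individually" usually means aggregating before rendering. As a rough sketch (in Python rather than JavaScript, with a hypothetical bin_positions helper and an arbitrary cell size), positions can be counted per grid cell so that the view draws one mark per cell instead of one per vehicle:

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def bin_positions(positions: Iterable[Tuple[float, float]], cell_deg: float = 0.1) -> Counter:
    """Count positions per (lat, lon) grid cell of cell_deg degrees."""
    counts: Counter = Counter()
    for lat, lon in positions:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Illustrative coordinates only.
positions = [(46.94, 8.27), (46.95, 8.28), (47.38, 8.54)]
counts = bin_positions(positions)
print(counts)
```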
Angular 2 + RxJS + d3.js + pixi.js (GPU)
http://blog.octo.com/en/visualizing-massive-data-streams-a-public-transport-use-case/
http://blog.octo.com/en/d3-js-transitions-killed-my-cpu-a-d3-js-pixi-js-comparison/
4.5 months of data
A. What is the train occupancy during weekdays between Lausanne and Geneva?
B. When are the trains most delayed?
C. Where are the trains most delayed?
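Questions B and C boil down to grouping stored departures by time or by place and averaging the delay. A minimal sketch follows, assuming records that reuse the station board fields time and delayed; the sample values are invented for the illustration and are not results from the talk.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Hashable, List

# Invented sample records, shaped after the station board fields.
departures = [
    {"station": "Lausanne", "time": "07:13", "delayed": 4},
    {"station": "Lausanne", "time": "07:45", "delayed": 0},
    {"station": "Genève", "time": "18:02", "delayed": 6},
    {"station": "Genève", "time": "18:30", "delayed": 2},
]


def mean_delay_by(records: List[dict], key_fn: Callable[[dict], Hashable]) -> Dict[Hashable, float]:
    """Average the delay (in minutes) of the records sharing the same key."""
    groups = defaultdict(list)
    for r in records:
        groups[key_fn(r)].append(r["delayed"])
    return {k: mean(v) for k, v in groups.items()}


# B. When: group by hour of day. C. Where: group by station.
by_hour = mean_delay_by(departures, lambda r: int(r["time"][:2]))
by_station = mean_delay_by(departures, lambda r: r["station"])
print(by_hour)
print(by_station)
```

On 4.5 months of data the same grouping would typically run in the storage engine or in a notebook rather than in plain Python.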
A. Lausanne-Genève: when to have a seat?
[Figure: Lausanne-Genève train occupancy, annotated “or pay…”, “Good luck in finding a spot!” and “Wake up earlier!”]
B. When are the trains most delayed?
C. Where are the trains most delayed?
[Figure legend: trains expected, trains delayed]
Data analysis tooling…
…or “reproducible science”
A data science notebook
• Web application
• Interactively edit and run pieces of code (analysis steps)
• Inclined towards Python (although other languages are available)
• Beware of performance with large datasets (sample the data or use Spark mode)
Jupyter, Zeppelin, RStudio…
TIMTOWTDI
http://bit.ly/2eukFex
Swiss transport in real time,
is that only the beginning?
• Buses & trains dispatch their actual positions in real time
• High availability & scalability
• Performance in the browser
• Better long term storage
• More data analysis questions (what’s yours?)
• Don’t forget to have fun!
https://github.com/alexmasselot/swiss-transport-realtime
@alex_mass

More Related Content

Viewers also liked

Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Andrii Vozniuk
 

Viewers also liked (8)

Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
 
#PortraitDeCDO - Guénaëlle Gault - Kantar
#PortraitDeCDO - Guénaëlle Gault - Kantar#PortraitDeCDO - Guénaëlle Gault - Kantar
#PortraitDeCDO - Guénaëlle Gault - Kantar
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
 
#PortraitDeCDO - Thierry Picard - Pierre Fabre
#PortraitDeCDO - Thierry Picard - Pierre Fabre#PortraitDeCDO - Thierry Picard - Pierre Fabre
#PortraitDeCDO - Thierry Picard - Pierre Fabre
 
#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon
#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon
#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon
 
Solution de transfert mobile - Formats d'échange
Solution de transfert mobile - Formats d'échangeSolution de transfert mobile - Formats d'échange
Solution de transfert mobile - Formats d'échange
 
Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...
Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...
Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...
 
#PortraitDeCDO - Juliette De Maupeou - Total
#PortraitDeCDO - Juliette De Maupeou - Total#PortraitDeCDO - Juliette De Maupeou - Total
#PortraitDeCDO - Juliette De Maupeou - Total
 

Similar to Dev wednesday-swiss-transport-realtime

Icinga Camp Berlin 2017 - Train IT Platform Monitoring
Icinga Camp Berlin 2017 - Train IT Platform MonitoringIcinga Camp Berlin 2017 - Train IT Platform Monitoring
Icinga Camp Berlin 2017 - Train IT Platform Monitoring
Icinga
 
CVIS Live!
CVIS Live!CVIS Live!
CVIS Live!
zjeftic
 

Similar to Dev wednesday-swiss-transport-realtime (7)

Icinga Camp Berlin 2017 - Train IT Platform Monitoring
Icinga Camp Berlin 2017 - Train IT Platform MonitoringIcinga Camp Berlin 2017 - Train IT Platform Monitoring
Icinga Camp Berlin 2017 - Train IT Platform Monitoring
 
CVIS Live! at ITS WC 2009
CVIS Live! at ITS WC 2009CVIS Live! at ITS WC 2009
CVIS Live! at ITS WC 2009
 
CVIS Live!
CVIS Live!CVIS Live!
CVIS Live!
 
How to use the maps of geo.admin.ch ? 2012
How to use the maps of geo.admin.ch ? 2012How to use the maps of geo.admin.ch ? 2012
How to use the maps of geo.admin.ch ? 2012
 
Swissconnect ag -A network Solution
Swissconnect ag -A network SolutionSwissconnect ag -A network Solution
Swissconnect ag -A network Solution
 
Transport-as-a-Service (TaaS) - How we build next generation plug-and-play IT...
Transport-as-a-Service (TaaS) - How we build next generation plug-and-play IT...Transport-as-a-Service (TaaS) - How we build next generation plug-and-play IT...
Transport-as-a-Service (TaaS) - How we build next generation plug-and-play IT...
 
Christian Leysen, Ahlers on Economy: Get Ready for the Rebound'
Christian Leysen, Ahlers on Economy: Get Ready for the Rebound'Christian Leysen, Ahlers on Economy: Get Ready for the Rebound'
Christian Leysen, Ahlers on Economy: Get Ready for the Rebound'
 

More from OCTO Technology Suisse

Afterwork Devops : vision et pratiques
Afterwork Devops : vision et pratiquesAfterwork Devops : vision et pratiques
Afterwork Devops : vision et pratiques
OCTO Technology Suisse
 
Afterwork "Décollez vers le Cloud"
Afterwork "Décollez vers le Cloud"Afterwork "Décollez vers le Cloud"
Afterwork "Décollez vers le Cloud"
OCTO Technology Suisse
 
big data et data viz - du lac à votre écran - afterwork
big data et data viz - du lac à votre écran - afterwork big data et data viz - du lac à votre écran - afterwork
big data et data viz - du lac à votre écran - afterwork
OCTO Technology Suisse
 
Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?
OCTO Technology Suisse
 
De la pensée projet à la pensée produit
De la pensée projet à la pensée produitDe la pensée projet à la pensée produit
De la pensée projet à la pensée produit
OCTO Technology Suisse
 
Les Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à releverLes Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à relever
OCTO Technology Suisse
 

More from OCTO Technology Suisse (20)

An afterwork on Microservices by @OCTO Technology Switzerland
An afterwork on Microservices  by @OCTO Technology SwitzerlandAn afterwork on Microservices  by @OCTO Technology Switzerland
An afterwork on Microservices by @OCTO Technology Switzerland
 
Afterwork Devops : vision et pratiques
Afterwork Devops : vision et pratiquesAfterwork Devops : vision et pratiques
Afterwork Devops : vision et pratiques
 
Êtes-vous API dans votre organisation ?
Êtes-vous API dans votre organisation ?Êtes-vous API dans votre organisation ?
Êtes-vous API dans votre organisation ?
 
Afterwork "Décollez vers le Cloud"
Afterwork "Décollez vers le Cloud"Afterwork "Décollez vers le Cloud"
Afterwork "Décollez vers le Cloud"
 
big data et data viz - du lac à votre écran - afterwork
big data et data viz - du lac à votre écran - afterwork big data et data viz - du lac à votre écran - afterwork
big data et data viz - du lac à votre écran - afterwork
 
Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?
 
Afterwork hadoop
Afterwork hadoopAfterwork hadoop
Afterwork hadoop
 
Réussissez le développement de votre prochaine application web ou mobile
Réussissez le développement de votre prochaine application web ou mobileRéussissez le développement de votre prochaine application web ou mobile
Réussissez le développement de votre prochaine application web ou mobile
 
L'ADN d'un développement produit réussi
L'ADN d'un développement produit réussiL'ADN d'un développement produit réussi
L'ADN d'un développement produit réussi
 
Fintech : concurrents ou partenaires ?
Fintech : concurrents ou partenaires ?Fintech : concurrents ou partenaires ?
Fintech : concurrents ou partenaires ?
 
Fintech demain comment travailler ensemble
Fintech   demain comment travailler ensembleFintech   demain comment travailler ensemble
Fintech demain comment travailler ensemble
 
Softshake 2015 - Des small data aux big data - Méthodes et Technologies
Softshake 2015 - Des small data aux big data - Méthodes et TechnologiesSoftshake 2015 - Des small data aux big data - Méthodes et Technologies
Softshake 2015 - Des small data aux big data - Méthodes et Technologies
 
Démystifions l'API-culture!
Démystifions l'API-culture!Démystifions l'API-culture!
Démystifions l'API-culture!
 
Qu'est qu'une Data Driven Company à l'heure de la digitalisation ?
Qu'est qu'une Data Driven Company à l'heure de la digitalisation ?Qu'est qu'une Data Driven Company à l'heure de la digitalisation ?
Qu'est qu'une Data Driven Company à l'heure de la digitalisation ?
 
OCTO Technology - Data Driven Company - SITB15
OCTO Technology - Data Driven Company - SITB15OCTO Technology - Data Driven Company - SITB15
OCTO Technology - Data Driven Company - SITB15
 
Afterwork - La Révolution Digitale
Afterwork - La Révolution DigitaleAfterwork - La Révolution Digitale
Afterwork - La Révolution Digitale
 
Brochure Vers l'entreprise Agile
Brochure Vers l'entreprise AgileBrochure Vers l'entreprise Agile
Brochure Vers l'entreprise Agile
 
De la pensée projet à la pensée produit
De la pensée projet à la pensée produitDe la pensée projet à la pensée produit
De la pensée projet à la pensée produit
 
Agile & Top Management
Agile & Top ManagementAgile & Top Management
Agile & Top Management
 
Les Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à releverLes Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à relever
 

Recently uploaded

Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 

Recently uploaded (20)

WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
 
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & InnovationWSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdfAzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
Novo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNovo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMs
 
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital Businesses
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of TransformationWSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AI
 
From Theory to Practice: Utilizing SpiraPlan's REST API
From Theory to Practice: Utilizing SpiraPlan's REST APIFrom Theory to Practice: Utilizing SpiraPlan's REST API
From Theory to Practice: Utilizing SpiraPlan's REST API
 

Dev wednesday-swiss-transport-realtime

  • 1. Swiss Transport in Real Time: Tribulations in the Big Data Stack Alexandre Masselot Dev. Wednesday March 2017 @alex_mass
  • 2. Swiss Transport in Real Time: Tribulations in the Big Data Stack Alexandre Masselot Dev. Wednesday March 2017 @alex_mass
  • 3. AVENUE DU THÉÂTRE, 7 – 1005 LAUSANNE > SUISSE > WWW.OCTO.CH OCTO Suisse RECRUTE 5 consultants en 2017 rejoins.octo.com Architecte Software Craftsman DataGeek Coach Méthodo Expert DevOps Consultant en Stratégie
  • 4. Is it possible to build a simple scalable infrastructure, to dispatch, store, transform
 and visualize “near real time” data and achieve a posteriori analysis? This is only a POC!!!
  • 5. Finding a dataset • social media • finance • sport • energy • transport • log analysis • meteorology • bioinformatics • personalized health • monitoring • security • IOT
  • 6. Finding a dataset • social media • finance • sport • energy • transport • log analysis • meteorology • bioinformatics • personalized health • monitoring • security • IOT
  • 11. AAGL Autobus AG Liestal AAGR Auto AG Rothenburg AAGS Auto AG Schwyz AAGU AUTO AG URI AB Appenzeller Bahnen AG ABl Autolinee Bleniesi SA ABF Autobusbetrieb Freienbach AFA Automobilverkehr Frutigen Adelboden AG AMSA Autolinea Mendrisiense SA AOT Autokurse Oberthurgau AG ARAG Rottal Auto AG ARBAG Aletsch Riederalp Bahnen AG ARL Autolinee Regionali Luganesi AS Autobetrieb Sernftal AG ASGS Autotransports Sion-Grône-Sierre ASm Aare Seeland mobil AG AVG Autoverkehr Grindelwald AG AVJ Autotransports de la Vallée de Joux AWA Autobetrieb Weesen-Amden AZZK Autobus Zürich-Zollikon-Küsnacht BB Bürgenstock Bahnen BBA Busbetrieb Aarau AAR bus+bahn BBBW Bus-Betrieb Binggeli BDWM BDWM Transport AG BGU BGU Busbetrieb Grenchen und Umgebung AG BLAG Busland AG BLM Bergbahn Lauterbrunnen-Mürren AG BLS BLS AG BLT BLT Baselland Transport AG BLWE Busbetrieb Lichtensteig-Wattwil-Ebnat-Kappel BOB Berner Oberland-Bahnen AG BOGG Busbetrieb Olten Gösgen Gäu AG BOS BUS Ostschweiz AG BOS-M BOS Management AG BRB Brienz Rothorn Bahn AG BRER Busbetrieb Rapperswil-Eschenbach-Rüti BRSB Braunwald-Standseilbahn AG BSU Busbetrieb Solothurn und Umgebung AG BVB Basler Verkehrs-Betriebe CGN CGN SA CJ Compagnie des chemins de fer du Jura (C.J.) SA CROS Crossrail AG DBSCH DB Schenker Rail Schweiz GmbH DBZ Dolderbahn Zürich ETB Emmentalbahn, Huttwil FART Ferrovie Autolinee Regionali Ticinesi FB Forchbahn AG FC FUNICAR Kursbetriebe AG FLP Ferrovie Luganesi SA FW Frauenfeld-Wil-Bahn AG GGB Gornergrat Bahn AG HBSAG Hafenbahn Schweiz AG JB Jungfraubahn AG LEB Chemin de fer Lausanne-Echallens-Bercher LLB AG für Verkehrsbetriebe Leuk-Leukerbad und Umgebung LSMS Schilthornbahn AG MBC Transports de la région Morges-Bière-Cossonay SA MG Ferrovia Monte Generoso SA MGB Matterhorn Gotthard Bahn MIB Kraftwerke Oberhasli AG Meiringen-Innertkirchen-Bahn MOB Chemin de fer Montreux-Oberland Bernois MVR Transports Montreux-Vevey-Riviera SA NHB Niederhornbahn NB Niesenbahn AG NStCM Chemin de fer Nyon-St. 
Cergue-Morez OeBB Oensingen-Balsthal-Bahn PAG PostAuto Schweiz AG PB PILATUS-BAHNEN AG RA RegionAlps SA RAILG Railgate AG RB RIGI BAHNEN AG RBL Regionalbus Lenzburg AG RBS Regionalverkehr Bern-Solothurn AG REGO Regiobus Gossau AG RhB Rhätische Bahn AG RNCH DB Schenker Rail Schweiz GmbH RLC railCare RVBW Regionale Verkehrsbetriebe Baden-Wettingen AG RVSH SchaffhausenBus, Regionale Verkehrsbetriebe SH AG SBB SBB AG SBB-D SBB GmbH SBC Stadtbus Chur AG SBF Stadtbus Frauenfeld SBW Stadtbus Winterthur SMC Cie de Chemin de Fer+d'Autobus Sierre-Montana-Crans (SMC) SA SMGN Société des Mouettes Genevoises Navigation SA SMtS Funiculaire St-Imier - Mont-Soleil SA SOB Schweizerische Südostbahn AG SRTAG Swiss Rail Traffic AG SSIF Società Subalpina di Imprese Ferroviarie S.p.A. ST Sursee-Triengen-Bahn STB Sensetalbahn AG STI Verkehrsbetriebe STI AG SVB BERNMOBIL Städt. Verkehrsbetriebe Bern SWAG Seilbahn Weissenstein AG SZU Sihltal Zürich Uetliberg Bahn SZU AG THURBO Thurbo AG TL Transports publics de la région lausannoise SA TMR TRANSPORTS DE MARTIGNY ET REGIONS SA TPC Transports Publics du Chablais SA TPF Transports publics fribourgeois SA TPG Transports publics genevois TPL Trasporti Pubblici Luganesi SA TPN Transports Publics de la Région Nyonnaise SA TRN Transports Publics Neuchâtelois SA TRAVYS TRAVYS SA Transports Vallée de Joux-Yverdon-Sainte-Croix TSD Theytaz Excursions Sion VB Verkehrsbetriebe Biel VBD Verkehrsbetrieb der Landschaft Davos VBG VBG Verkehrsbetriebe Glattal AG VBH Verkehrsbetriebe Herisau VBL Verkehrsbetriebe Luzern AG VBSG Verkehrsbetriebe St.Gallen VBSH Verkehrsbetriebe Schaffhausen VBZ Verkehrsbetriebe Zürich VMCV Transports publics Vevey-Montreux-Chillon-Villeneuve VSSU Verband Schweizerischer Schifffahrtsunternehmen VZO Verkehrsbetriebe Zürichsee und Oberland AG WAB Wengernalpbahn AG WB Waldenburgerbahn AG WRS Widmer Rail Services Personal AG WSB Wynental- und Suhrentalbahn AAR bus+bahn ZB zb Zentralbahn AG ZVB Zugerland Verkehrsbetriebe AG ZVV 
Zürcher Verkehrsverbund ZVV AES Ägerisee Schifffahrt AG BLS BLS AG Schifffahrt Berner Oberland Thuner- und Brienzersee BPG Basler Personenschifffahrt AG BSG Bielersee-Schifffahrts-Gesellschaft AG CGN CGN SA FHM Zürichsee-Fähre Horgen-Meilen AG LNM Société de Navigation Lacs de Neuchâtel et Morat SA NLM Navigazione Lago Maggiore SBS SBS Schifffahrt AG SGG Schifffahrts-Genossenschaft Greifensee SGH Schifffahrtsgesellschaft Hallwilersee AG SGV Schifffahrtsgesellschaft des Vierwaldstättersees SGZ Schifffahrtsgesellschaft für den Zugersee AG / Ägerisee SNL Società Navigazione del Lago di Lugano SA SW Schiffsbetrieb Walensee AG URh Schweiz. Schifffahrtsgesellschaft Untersee und Rhein AG ZSG Zürichsee-Schifffahrtsgesellschaft AG
  • 12.
  • 13. What do we propose? https://github.com/alexmasselot/swiss-transport-realtime
  • 14.
  • 15. Is it possible to build a simple scalable infrastructure, to dispatch, transform and visualize
 “near real time” massive data and achieve a posteriori analysis?
  • 19.
  • 21. Is it possible to build a simple scalable infrastructure, to dispatch, transform and visualize
 “near real time” massive data and achieve a posteriori analysis?
  • 24. positions: { id: 12345xyz, category: IR, name: IR 72928, destination: Alpnach, position: { lat: 46.940582, lon: 8.275442 } } station boards: { station: { name: Lausanne, location: {lat, long} }, departures: [ { to: Domodossola, time: 20:13, delayed: 4, prognosis: { capacity2nd: 3, capacity1st: 1 } }, {…} ] }
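The position message on this slide can be modelled as a small typed record. This is a minimal Python sketch that assumes only the field names visible on the slide; the class names and the `parse_position` helper are hypothetical and are not taken from the project repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPoint:
    lat: float
    lon: float


@dataclass(frozen=True)
class TrainPosition:
    """One vehicle position event (field names follow the slide)."""
    id: str
    category: str
    name: str
    destination: str
    position: GeoPoint


def parse_position(raw: dict) -> TrainPosition:
    """Build a TrainPosition from a decoded JSON object (hypothetical helper)."""
    pos = raw["position"]
    return TrainPosition(
        id=raw["id"],
        category=raw["category"],
        name=raw["name"],
        destination=raw["destination"],
        position=GeoPoint(lat=float(pos["lat"]), lon=float(pos["lon"])),
    )


# The example values are the ones shown on the slide.
event = parse_position({
    "id": "12345xyz",
    "category": "IR",
    "name": "IR 72928",
    "destination": "Alpnach",
    "position": {"lat": 46.940582, "lon": 8.275442},
})
```

A station board could be modelled the same way, with a `departures` list of records; it is left out here because the slide elides part of its structure.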
  • 26. Events are streamed to Kafka: “Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.” (kafka.apache.org) • real time • offline
  • 28. Store • format • dispatch • storage • Logstash • Elasticsearch • flat files
  • 32. Is it possible to build a simple scalable infrastructure, to dispatch, transform and visualize
 “near real time” massive data and achieve a posteriori analysis?
  • 33. Stream transformation • We have an input flow of events and want to: • know if a train is stopped at a station; • know if a train has exited the network; • expose an aggregated station board. • We need to: • digest the input flow; • process with temporary state persistence; • be able to expose snapshots.
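The three needs listed on this slide (digest the flow, keep temporary state, expose snapshots) can be sketched in a few lines. The Python below is only an illustration of the idea, not the Scala/Akka implementation used in the project: the `StopDetector` class, its thresholds, and the rule "a train is stopped when its position is unchanged for a few consecutive events" are all assumptions, and checking that the stop happens at a station is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrainState:
    """Temporary state kept for one train between events."""
    last_lat: Optional[float] = None
    last_lon: Optional[float] = None
    still_count: int = 0  # consecutive events without movement


@dataclass
class StopDetector:
    epsilon_deg: float = 1e-5    # assumed tolerance on lat/lon changes
    min_still_events: int = 2    # assumed number of unchanged events
    states: Dict[str, TrainState] = field(default_factory=dict)

    def digest(self, train_id: str, lat: float, lon: float) -> bool:
        """Consume one position event; return True if the train looks stopped."""
        state = self.states.setdefault(train_id, TrainState())
        moved = (
            state.last_lat is None
            or abs(lat - state.last_lat) > self.epsilon_deg
            or abs(lon - state.last_lon) > self.epsilon_deg
        )
        state.still_count = 0 if moved else state.still_count + 1
        state.last_lat, state.last_lon = lat, lon
        return state.still_count >= self.min_still_events

    def snapshot(self) -> List[str]:
        """Expose the ids of all trains currently considered stopped."""
        return sorted(
            train_id
            for train_id, state in self.states.items()
            if state.still_count >= self.min_still_events
        )


detector = StopDetector()
flags = [detector.digest("t1", 46.5, 6.6) for _ in range(3)]
detector.digest("t2", 46.9, 8.2)
```

Keeping one small state object per train mirrors the "one lightweight entity per train" idea: each event only touches the state of the train it belongs to.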
  • 34. Stream transformation • Scala is The language for Big Data (functional & OO)
 • Akka (actors): • lightweight entities (one per train, per station); • easy asynchronous communications; • the perfect use case. • Play framework for REST service, configuration etc.
  • 35. Spark Streaming, Storm, Flink… TIMTOWTDI
  • 37. Putting everything together • The “simple” infrastructure is not so light; • A developer should have everything on his/her laptop without polluting the machine; • Docker comes to the rescue: • lightweight containers, • pre-existing images, • docker-compose to describe the infrastructure, • deploy directly to a cloud.
  • 39. Performance: 2 numbers • 15% CPU: Node.js + Kafka + Akka + Play • 15x faster AJAX queries (vs. SBB REST) to gather 30 times more trains
  • 40. Is it possible to build a simple scalable infrastructure, to dispatch, transform and visualize
 “near real time” massive data and achieve a posteriori analysis?
  • 41. A scalable infrastructure • Kafka: partitioning and ZooKeeper • Logstash: ? (but naturally recovers on failure) • Elasticsearch: partitioning • Spark Streaming: distributed by essence & write-ahead logs • Akka: Akka Cluster, supervisors & failure strategy • Docker: Kubernetes • AWS, GCE, Exoscale, Hidora
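Partitioning is what lets several consumers share the event flow: events with the same key always land in the same partition, so the state of one train stays in one place. The Python sketch below illustrates that principle only. It assumes the train id is used as the key, and it uses CRC32 as a stand-in hash; Kafka's own default partitioner uses a different hash function (murmur2). The `partition_for` helper is hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index (CRC32 is a stand-in, not Kafka's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Illustrative keys only: each event is keyed by a made-up train id.
train_ids = ["IR 72928", "IC 711", "IR 72928", "S1 12345", "IC 711"]
by_partition = defaultdict(list)
for train_id in train_ids:
    by_partition[partition_for(train_id, 4)].append(train_id)
```

Because the mapping depends only on the key and the partition count, adding consumers scales the processing without splitting one train's events across workers.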
  • 43. Is it possible to build a simple scalable infrastructure, to dispatch, transform and visualize
 “near real time” massive data and achieve a posteriori analysis?
  • 44.
  • 45.
  • 46. React: JS for large data sets • Only a rendering library (but fast); • Use a Flux architecture; • Built by Facebook. Flux cycle: Action → Dispatcher → Store → View → Action
  • 47. JavaScript for big data viz • React can handle viz >100k elements (don’t show them individually!) • Beware of performance issues; • Testing is not optional.
  • 48. ng(2) + rx/js +d3.js + pixi.js (GPU) http://blog.octo.com/en/visualizing-massive-data-streams-a-public-transport-use-case/ http://blog.octo.com/en/d3-js-transitions-killed-my-cpu-a-d3-js-pixi-js-comparison/
  • 49. Is it possible to build a simple scalable infrastructure, to dispatch, transform and visualize
 “near real time” massive data and achieve a posteriori analysis?
  • 50. 4.5 months of data A. What is the train occupancy during weekdays, between Lausanne and Geneva? B. When are the trains most delayed? C. Where are the trains most delayed?
  • 53. or pay… Lausanne-Genève: when to have a seat? Good luck
 in finding a spot! Wake up earlier!
  • 55. B. When are the trains most delayed?
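Question B boils down to grouping departures by hour and averaging the delay. This is a minimal Python sketch, assuming station board departures carry the `time` and `delayed` fields shown earlier in the deck and that `delayed` is expressed in minutes (an assumption). The `mean_delay_by_hour` function and the sample rows are made up for illustration; they are not results from the 4.5 months of collected data.

```python
from collections import defaultdict
from statistics import mean


def mean_delay_by_hour(departures):
    """Average the 'delayed' value per hour of day (hypothetical helper)."""
    by_hour = defaultdict(list)
    for dep in departures:
        hour = int(dep["time"].split(":")[0])
        # A departure without a 'delayed' field is counted as on time.
        by_hour[hour].append(dep.get("delayed", 0))
    return {hour: mean(delays) for hour, delays in sorted(by_hour.items())}


# Made-up sample rows, not real measurements.
sample = [
    {"time": "20:13", "delayed": 4},
    {"time": "20:45", "delayed": 0},
    {"time": "07:02", "delayed": 2},
]
result = mean_delay_by_hour(sample)
```

The same grouping keyed on station name instead of hour would address question C.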
  • 56.
  • 57. C. Where are the trains most delayed?
  • 58.
  • 63. a data science notebook
  • 64. A data science notebook • Web application • Interactively edit and run pieces of code (analysis steps) • Inclined towards Python (although other languages are available) • Beware of performance with large datasets (sample data or use Spark mode)
  • 66. Overview: vehicle positions, station boards • dispatch • format • storage • transform • expose • visualization (users) • analysis (data analysts) This is only a POC!!! https://github.com/alexmasselot/swiss-transport-realtime http://bit.ly/2eukFex
  • 68. Swiss transport in real time, is that only the beginning? • Buses & trains dispatch their actual positions in real time • High availability & scalability • Performance in the browser • Better long term storage • More data analysis questions (what’s yours?) • Don’t forget to have fun! https://github.com/alexmasselot/swiss-transport-realtime @alex_mass This is only a POC!!!