‡
Data Science at Trainline
for Smarter Journeys
London, 22/11/2016
@DataScienceFest
@TrainlineTalent
‡
Outline
• A bit about Trainline.
• Cloud-based serverless architecture for Big Data.
• Case Study: BusyBot
• Other Case Studies
2
John Telford, Head of Data Architecture.
Leading the adoption of Big Data technology at Trainline. Manages a team of Data Engineers and
Database Administrators. Previously worked on Data Warehousing and Big Data at Channel 4.
Computer Science degree from Brunel University.
Twitter: @jtelford1
Marco Rossetti, Senior Data Scientist.
Leading personalisation initiatives, like providing context-aware personalised services, journey
recommendations, and tailored travel options. Previously worked on recommender systems for
researchers at Mendeley. He has a PhD in Computer Science from the University of Milan-Bicocca.
Twitter: @ross85
‡
Trainline - Smarter Journeys
Help our customers save:
• Time (no more queuing for tickets at the station)
• Money (book early, find cheap tickets)
• Energy (remove complexity)
Headlines...
• We process more than £2.3 billion in ticket sales annually.
• 100,000 smarter journeys every single day.
• 44 train companies, across 24 European countries.
• ~400 employees (London, Edinburgh, Paris).
• More than 30m visits per month
• 1 ticket sold every three seconds
3
Trainline takeover of Kings X, Oct 2016.
‡ 4
‡ 5
‡ 6
‡
Bob's cloud laws
It’s cloud if…
1. It offers self provisioning.
2. It offers pay-as-you-go pricing.
3. It is, for all intents and purposes, infinitely scalable.
Thus, no need for support from the provider for set-up, no upfront payments for
licences or minimum term agreements, and no constraints on what I can do!
• Hosting is not cloud.
• BYO licensing is not cloud.
7
‡
From servers... to serverless
8
Servers = Pets
Virtual Machines = Cattle
Containers & Serverless = Herds
Trainline policy:
Use PaaS wherever possible,
Use Serverless wherever possible,
... so long as they are good enough.
‡
Data Gateway
9
‡
Data Platform
10
‡
Lessons: Lambda
• Effortless scaling; we often have > 100 λs running at once.
• Warm-up time.
• Choose language / framework carefully.
• Consequences of 'freeze'.
• Monitoring: single thread.
Google "Trainline Engineering Lambda"
11
[Chart: service time distribution; axis: execution time (ms)]
‡
Lessons: Kinesis Streams
• TCO is generally low.
• But... understand the costs, which depend on stream capacity (number & size of messages), time-to-live, etc.
• Monitoring / alerting... CloudWatch is (probably) not enough.
• Compress & encrypt?
Google "AWS Overview of Security Processes"
12
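A rough sizing sketch for the cost and compression points above, in Python. The per-shard write limits (about 1 MiB/s and 1,000 records/s) and the 25 KB PUT payload unit are assumptions based on published Kinesis Data Streams limits and should be checked against current AWS documentation. The function names are hypothetical.

```python
import gzip
import json
import math

# Assumed per-shard write limits and billing unit; verify before relying on them.
SHARD_BYTES_PER_SEC = 1024 * 1024
SHARD_RECORDS_PER_SEC = 1000
PUT_PAYLOAD_UNIT_BYTES = 25 * 1024


def shards_needed(records_per_sec, avg_record_bytes):
    """Shards required to absorb a write load, by record count and by volume."""
    by_count = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    return max(by_count, by_bytes, 1)


def put_payload_units(record_bytes):
    """Billing units consumed by one record (rounded up, at least one)."""
    return max(1, math.ceil(record_bytes / PUT_PAYLOAD_UNIT_BYTES))


def compress_record(record):
    """Serialise a dict to compact JSON and gzip it before sending."""
    raw = json.dumps(record, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)
```

Compression tends to pay off for large or batched messages; for very small records the gzip header can make the payload bigger, so it is worth measuring on real data.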
‡ 13
BusyBot
‡
Unhappy customers
[Bar chart, axis 0% to 70%. Categories: Delays, Overcrowding, Value for money, Toilet Facilities, Luggage Space, Availability of staff, Car Parking]
Source: National Rail Passenger Survey (NRPS) 2015
14
‡ 15
‡ 16
Google "BusyBot overcrowding"
‡
BusyBot Discovery
Data from March to May: approx. 100k feedback responses from our Android users.
17
‡
Infrastructure – Data Gateway
Feedback collection
Daily enrichment
{
  "train_destination": "RDG",
  "retail_train_number": "GW2980",
  "train_origin": "NRC",
  "train_date": "2016-08-08T07:38:00.000Z",
  "customer_longitude": 0,
  "train_hashid": "NRC:RDG:08/08/2016 08:38:00:GW2980",
  "customer_location_on_train": "Back",
  "customer_hashid": "…",
  "customer_got_seat": 1,
  "customer_feedback": "Yes",
  "feedback_type": 1,
  "customer_latitude": 0,
  "feedbackid": "…",
  "device_id": "…",
  "timestamp": "2016-08-08T07:41:39.390Z",
  "customer_id": "…"
}
18
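As an illustration of how a record like the one above might be consumed, here is a minimal Python sketch that parses the non-elided fields into a typed object. It assumes that `train_date` is the scheduled departure in UTC and that `timestamp` is when the feedback was submitted; the `Feedback` class and `parse_feedback` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches e.g. 2016-08-08T07:41:39.390Z


def parse_utc(value):
    """Parse the record's timestamp format into a timezone-aware datetime."""
    return datetime.strptime(value, ISO_FORMAT).replace(tzinfo=timezone.utc)


@dataclass(frozen=True)
class Feedback:
    origin: str
    destination: str
    train_number: str
    location_on_train: str
    got_seat: bool
    departure: datetime  # assumption: train_date is the departure time
    submitted: datetime

    @property
    def seconds_after_departure(self):
        return (self.submitted - self.departure).total_seconds()


def parse_feedback(raw):
    """Build a Feedback from a raw dict, normalising case and types."""
    return Feedback(
        origin=raw["train_origin"],
        destination=raw["train_destination"],
        train_number=raw["retail_train_number"],
        location_on_train=raw["customer_location_on_train"].lower(),
        got_seat=bool(raw["customer_got_seat"]),
        departure=parse_utc(raw["train_date"]),
        submitted=parse_utc(raw["timestamp"]),
    )
```

Lower-casing the location keeps "Back" in the raw feedback consistent with values such as "middle" used in aggregated output.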
‡ 19
‡ 20
‡
[Figure: three panels (all feedbacks, ≥100 feedbacks, ≥1000 feedbacks); scale from 0% with a seat to 100% with a seat; callout: City Thameslink: 50%]
21
‡
Infrastructure – Data Platform
Model Building and Validation
Service

route-origin  route-destination  stop  customer-location-train  percentage-who-got-seat  feedback-count
EUS           MAN                EUS   middle                   0.738059701              4020
EUS           BHM                EUS   middle                   0.63788222               3532
KGX           LDS                KGX   middle                   0.704984154              3471
BHM           EUS                BHM   middle                   0.679082241              3356
KGX           EDB                KGX   middle                   0.5589236                3233
EUS           GLC                EUS   middle                   0.676663543              3201
MAN           EUS                MAN   middle                   0.769495772              3193
PAD           SWA                PAD   middle                   0.608086078              3067
EUS           BHM                EUS   front                    0.672365666              2866
EUS           MAN                EUS   front                    0.790479625              2773
{
  "retailTrainIdentifier": "VT7280",
  "isBusy": false,
  "callingPoints": [
    {
      "stationCode": "EUS",
      "coaches": [
        {"position": "Back", "recommend": true},
        {"position": "Front", "recommend": false},
        {"position": "Middle", "recommend": false}
      ]
    },
    {
      "stationCode": "MKC",
      "coaches": [
        {"position": "Back", "recommend": false},
        …
22
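The step from raw feedback to the aggregated table and the per-coach recommendation on this slide could look roughly like the Python sketch below. The slides do not say how `recommend` is decided; picking the location with the highest seat rate at a stop is an assumption made here for illustration. `seat_rates` and `recommend_coaches` are hypothetical names.

```python
from collections import defaultdict


def seat_rates(rows):
    """Aggregate feedback per (origin, destination, stop, location).

    Each row is a dict with keys origin, destination, stop, location, got_seat.
    Returns {key: (share_who_got_a_seat, feedback_count)}.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [seated, count]
    for row in rows:
        key = (row["origin"], row["destination"], row["stop"], row["location"].lower())
        totals[key][0] += int(bool(row["got_seat"]))
        totals[key][1] += 1
    return {key: (seated / count, count) for key, (seated, count) in totals.items()}


def recommend_coaches(rates, origin, destination, stop):
    """Flag the location(s) with the best seat rate at a stop (assumed rule)."""
    here = {
        key[3]: rate
        for key, (rate, _count) in rates.items()
        if key[:3] == (origin, destination, stop)
    }
    if not here:
        return []
    best = max(here.values())
    return [
        {"position": location.capitalize(), "recommend": rate == best}
        for location, rate in sorted(here.items())
    ]
```

A production version would presumably only use groups that pass the validation criteria before making a recommendation.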
‡
Data Validation
• At least N feedbacks
• Feedback on at least D days
• CI on the percentage who got a seat <= p
23
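The three validation criteria can be sketched in Python as follows. The slide does not specify the interval type or what exactly is compared with p; this sketch assumes a 95% Wilson score interval and compares its width with the threshold. The function names and the threshold values in the usage note are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def is_valid(seated, count, days_with_feedback, min_count, min_days, max_ci_width):
    """Apply the three checks: enough feedback, enough days, tight enough CI."""
    if count < min_count or days_with_feedback < min_days:
        return False
    low, high = wilson_interval(seated, count)
    return (high - low) <= max_ci_width
```

For example, a group with 2,967 seated out of 4,020 feedbacks passes with thresholds such as N=100, D=30 and a maximum width of 0.05, whereas 70 out of 100 fails the width check because the interval is much wider.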
‡
Journey Results
Live Tracker
BusyBot V1
Sep 2016
24
‡ 25
Coming soon…
‡
Hotels
26
‡
Journey Recommendations
27
‡
Search Prediction
28
‡
Summary
BusyBot
Hotels
Journey Recommendations
Search Prediction
Delays
Prices
Real Time Information
Personalisation
…
29
‡
Any Questions?
(we are hiring!)
Data Scientist positions: mali.mehmood@thetrainline.com
Data Engineer positions: david.smith@thetrainline.com
30
