Duration: 15 minutes
Good morning!
My name is Aaron Schildkrout.
I run Data and Marketing at Uber.
I’m here today to talk to you about our Realtime journey at Uber - and particularly the critical and hugely empowering role Kafka (including Confluent and the whole Kafka community) has played in this journey.
Uber is realtime transit infrastructure for the globe.
We’ve stated many times that we want this infrastructure to be as reliable as running water.
A utility. A right even.
A project that started out as a cool app to get you black cars on demand - is quickly becoming among the largest global infrastructure inventions of all time.
And - like the cars moving on the streets outside right now - it is all taking place now and now and now. It is real time.
We’re not the only ones.
The internet is quite literally penetrating our lives.
-our cities
-our relationships
-our bodies
This is a known story.
But it’s getting more radical by the day. And as this penetration increases - in volume, in immediacy, in depth - there is an unbelievable increase in the need for systems that facilitate the flow of information, in real time, between our lives and our machines and back again.
That’s why we’re all here.
Compressing time and space - is...a non-trivial technical problem.
Uber for instance - has always sought to provide this kind of truly responsive, realtime infrastructure.
But in the beginning we were...just starting. This is surge circa 2012/3 in our driver app.
Our first version of surge, v1, used data it queried directly from our dispatch service.
There was only one Node.js process per city.
The geofences were very big and not granular at all (causing a lot of problems and huge inefficiency).
This is surge today - with the addition of much more granular geo-temporal surge targeting.
We are updating - in real-time - our understanding of supply and demand in highly specific geographies to allow us to calculate surge in the hexagons shown in this screen.
This system now runs on Kafka - as opposed to our janky node query - and while it took us a bit of time to make this truly work at our exponentially exploding global scale...we’ve gotten...at least closer.
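To make the idea concrete, here is a minimal, hypothetical sketch of the kind of per-hexagon computation involved: surge as a clamped demand/supply ratio over the realtime counts streaming in per geography. The function names, the linear multiplier curve, and the cap are illustrative assumptions, not Uber's actual pricing model.

```python
# Hypothetical sketch: per-hexagon surge from a demand/supply ratio.
# The multiplier curve and cap are illustrative, not Uber's real model.

def surge_multiplier(open_requests: int, available_drivers: int,
                     cap: float = 3.0) -> float:
    """Map a demand/supply ratio to a surge multiplier, clamped to [1.0, cap]."""
    if available_drivers == 0:
        return cap
    ratio = open_requests / available_drivers
    # No surge until demand exceeds supply; then scale linearly up to the cap.
    return min(cap, max(1.0, ratio))

def surge_by_hex(demand: dict, supply: dict) -> dict:
    """demand/supply: hex_id -> realtime counts; returns hex_id -> multiplier."""
    return {h: surge_multiplier(demand.get(h, 0), supply.get(h, 0))
            for h in set(demand) | set(supply)}

multipliers = surge_by_hex({"hex_a": 12, "hex_b": 3}, {"hex_a": 4, "hex_b": 6})
print(multipliers["hex_a"], multipliers["hex_b"])  # 3.0 1.0
```

The point of the granularity is visible even in this toy: two adjacent hexagons can carry very different multipliers because their supply/demand counts are tracked independently.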
That’s the story I’ll tell today.
To get the obvious architectural diagram out of the way - here’s how Kafka 8 is currently used @ Uber.
The real-time infrastructure ecosystem at Uber - which includes Kafka - powers many key pieces of our business. I think of it in this topology...
Surge - as noted earlier..
FRAUD MODELS
ETA - real-time system
City teams use real-time operational analytics to actively manage their cities - making adjustments in dispatch, messaging, etc. - to optimize city functioning.
Much of Uber’s success has to do with the amazing speed and agility of our on-the-ground global city teams - and much of this comes from empowering them with realtime tools.
We’ve recently applied this same type of infrastructure to our Uber Eats business, which is rapidly scaling now and involves significant operational complexity.
Internally, analytics on our experimentation pipeline - which now powers the creation of hundreds of new experiments weekly, and on which our teams act daily based on rapid data feedback loops - is a real-time system.
Pretty awesome. But it took a long journey to get there.
2013 - we first launched Kafka 7; each application essentially ran its own Kafka cluster.
2014 - we began the transition to Kafka 8 (K8), moving all our K7 data to K8 through the K7 migrator.
2015 to today - we deployed a fully functional K8 pipeline - stable, with scalable producers and consumers and multi-DC, multi-language support.
Along the way we ran into some significant limitations…and we did a bunch of work - which I’ll walk through now - to complete our migration to Kafka 8 and, more fundamentally, to make Kafka work at our scale.
We implemented REST proxy improvements, adding a new binary protocol for high throughput.
By building REST client libraries, we facilitated multi-language support (which was important given our four-language environment).
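One pattern these client libraries enable is batching: buffering records client-side and flushing them to the proxy in a single call, which is where much of the throughput win comes from. Below is a hypothetical sketch of that pattern; the class name, batch size, and the injected `send` callable (which in practice would be an HTTP POST to the proxy) are all illustrative assumptions.

```python
# Hypothetical sketch of a batching REST-proxy client: buffer records and
# flush them in one call once the batch fills. The endpoint shape and batch
# size are illustrative, not Uber's actual client library.

class BatchingProducer:
    def __init__(self, send, batch_size=100):
        self.send = send          # in practice: one HTTP POST per batch
        self.batch_size = batch_size
        self.buffer = []

    def produce(self, record):
        """Buffer a record; flush automatically when the batch is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send any buffered records as one batch and clear the buffer."""
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []

sent = []  # stand-in for the network: collect each flushed batch
producer = BatchingProducer(sent.append, batch_size=2)
for r in ["a", "b", "c"]:
    producer.produce(r)
producer.flush()
print(sent)  # [['a', 'b'], ['c']]
```

Amortizing one request over many records is the same trade Kafka's own producers make; the REST layer just moves that batching into the client library.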
We automated schema and topic management. In a world with many thousands of topics and hundreds of engineers and teams producing data, the absence of strong tools around schema inference, enforcement and management was a huge pain point.
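The core of schema enforcement can be sketched very simply: check each record against the schema registered for its topic before it is produced, and reject it with a concrete reason if it doesn't conform. The registry layout, topic name, and field names below are illustrative assumptions, not Uber's actual tooling.

```python
# Hypothetical sketch of produce-time schema enforcement. The in-memory
# "registry" and the trip_events schema are illustrative examples.

SCHEMA_REGISTRY = {
    "trip_events": {"trip_id": str, "city_id": int, "ts": float},
}

def validate(topic: str, record: dict) -> list:
    """Return a list of violations; an empty list means the record conforms."""
    schema = SCHEMA_REGISTRY.get(topic)
    if schema is None:
        return [f"no schema registered for topic {topic!r}"]
    errors = []
    for field, ftype in schema.items():
        if field not in record:
            errors.append(f"missing field {field!r}")
        elif not isinstance(record[field], ftype):
            errors.append(f"{field!r} should be {ftype.__name__}")
    return errors

print(validate("trip_events", {"trip_id": "t1", "city_id": 5}))
# ["missing field 'ts'"]
```

With hundreds of teams producing, the value is less the check itself than that it runs uniformly on every topic, so malformed data is rejected at the edge instead of discovered downstream.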
We built Mirrormaker 2.0, which we’ll soon be open sourcing…
It’s more robust // easier to operate // and allows for dynamic topic addition
And…
We built a series of Data auditing tools - allowing us to track data loss and latency spikes at different points in the Kafka pipeline, which at scale became critical for triaging and solving problems at a rapid pace
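A common shape for this kind of auditing is count-based: each stage of the pipeline summarizes the records it has seen into per-time-bucket counts, and comparing adjacent stages surfaces loss. The sketch below illustrates that idea under assumed names; it is not Uber's actual auditing implementation.

```python
# Hypothetical sketch of count-based pipeline auditing: each stage emits
# (time bucket -> count) summaries; diffing adjacent stages reveals loss.
from collections import Counter

def bucket_counts(timestamps, bucket_secs=60):
    """Summarize event timestamps into per-bucket counts (the audit summary
    a producer or consumer stage would periodically emit)."""
    return Counter(int(ts // bucket_secs) for ts in timestamps)

def find_loss(producer_counts, consumer_counts):
    """Return {bucket: missing} for buckets where the consumer stage saw
    fewer records than the producer stage."""
    return {b: producer_counts[b] - consumer_counts.get(b, 0)
            for b in producer_counts
            if consumer_counts.get(b, 0) < producer_counts[b]}

produced = bucket_counts([0, 10, 70, 71, 130])   # 2 + 2 + 1 across 3 minutes
consumed = bucket_counts([0, 70, 130])           # one record lost in each of the first two
print(find_loss(produced, consumed))  # {0: 1, 1: 1}
```

Because only compact count summaries cross the wire, this check stays cheap at scale, and it localizes loss to a specific stage and time window, which is what makes rapid triage possible.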
All Kafka data producers at Uber are now running Kafka 8. The project has been a huge success and is now powering much of Uber’s data infrastructure. It is...mission critical.
The goal is to shrink the barrier between real time Infra and analytical usage.
We’re currently capturing accelerometer data from the driver’s / rider’s phone via Kafka. This data is then used for:
Detecting traffic / road conditions? (need to confirm)
1) We use our motionstash data to generate safety models and safety scores for all our drivers (supervised machine learning and classification algorithms).
2) We do per-trip ad-hoc analysis for safety by computing safety scores per driver.
3) We use the models generated in 1) to predict in real time and alert a driver about their unsafe driving.
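To give a feel for what a per-trip safety score over accelerometer data might look like, here is a deliberately simple sketch: count abrupt jumps in acceleration magnitude (a crude proxy for harsh braking/acceleration) and deduct from a baseline score. The threshold and scoring rule are illustrative assumptions, not Uber's actual models.

```python
# Hypothetical sketch: score a trip's smoothness from a stream of
# accelerometer magnitude samples (m/s^2). Threshold and deduction
# are illustrative, not Uber's real safety models.

def harsh_events(accel_samples, threshold=3.0):
    """Count sample-to-sample jumps in acceleration magnitude that
    exceed the threshold - a crude harsh-driving proxy."""
    return sum(1 for a, b in zip(accel_samples, accel_samples[1:])
               if abs(b - a) > threshold)

def safety_score(accel_samples, threshold=3.0):
    """1.0 = perfectly smooth; each harsh event deducts 0.1, floored at 0."""
    return max(0.0, 1.0 - 0.1 * harsh_events(accel_samples, threshold))

smooth_trip = [0.0, 0.5, 1.0, 1.2, 1.0]
harsh_trip = [0.0, 4.0, 0.0, 5.0, 0.5]
print(safety_score(smooth_trip), safety_score(harsh_trip))
# smooth trip scores 1.0; harsh trip scores 0.6
```

The production versions replace this heuristic with trained classifiers, but the pipeline shape is the same: phone sensor data flows through Kafka, features are computed per trip, and a score comes back fast enough to alert the driver.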