Duration: 15 minutes
Good morning!
My name is Aaron Schildkrout.
I run Data and Marketing at Uber.
I’m here today to talk to you about our Realtime journey at Uber - and particularly the critical and hugely empowering role Kafka (including Confluent and the whole Kafka community) has played in this journey.
Uber is realtime transit infrastructure for the globe.
We’ve stated many times that we want this infrastructure to be as reliable as running water.
A utility. A right even.
A project that started out as a cool app to get you black cars on demand - is quickly becoming among the largest global infrastructure inventions of all time.
And - like the cars moving on the streets outside right now - it is all taking place now and now and now. It is real time.
We’re not the only ones.
The internet is quite literally penetrating our lives.
-our cities
-our relationships
-our bodies
This is a known story.
But it’s getting more radical by the day. And as this penetration increases - in volume, in immediacy, in depth - there is an unbelievable increase in the need for systems that facilitate the flow of information, in real time, between our lives and our machines and back again.
That’s why we’re all here.
Compressing time and space - is...a non-trivial technical problem.
Uber for instance - has always sought to provide this kind of truly responsive, realtime infrastructure.
But in the beginning we were...just starting. This is surge circa 2012/3 in our driver app.
Our first version of surge, v1, used data it queried directly from our dispatch service.
There was only one Node.js process per city.
The geofences were very big and not granular at all (causing a lot of problems and huge inefficiency).
This is surge today - with the addition of much more granular geo-temporal surge targeting.
We are updating - in real-time - our understanding of supply and demand in highly specific geographies to allow us to calculate surge in the hexagons shown in this screen.
This system now runs on Kafka - as opposed to our janky node query - and while it took us a bit of time to make this truly work at our exponentially exploding global scale...we’ve gotten...at least closer.
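To make the idea concrete, here is a minimal, hypothetical sketch of the kind of per-hexagon computation involved: surge as a clamped demand/supply ratio over the realtime counts streaming in per geography. The function names, the linear multiplier curve, and the cap are illustrative assumptions, not Uber's actual pricing model.

```python
# Hypothetical sketch: per-hexagon surge from a demand/supply ratio.
# The multiplier curve and cap are illustrative, not Uber's real model.

def surge_multiplier(open_requests: int, available_drivers: int,
                     cap: float = 3.0) -> float:
    """Map a demand/supply ratio to a surge multiplier, clamped to [1.0, cap]."""
    if available_drivers == 0:
        return cap
    ratio = open_requests / available_drivers
    # No surge until demand exceeds supply; then scale linearly up to the cap.
    return min(cap, max(1.0, ratio))

def surge_by_hex(demand: dict, supply: dict) -> dict:
    """demand/supply: hex_id -> realtime counts; returns hex_id -> multiplier."""
    return {h: surge_multiplier(demand.get(h, 0), supply.get(h, 0))
            for h in set(demand) | set(supply)}

multipliers = surge_by_hex({"hex_a": 12, "hex_b": 3}, {"hex_a": 4, "hex_b": 6})
print(multipliers["hex_a"], multipliers["hex_b"])  # 3.0 1.0
```

The point of the granularity is visible even in this toy: two adjacent hexagons can carry very different multipliers because their supply/demand counts are tracked independently.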
That’s the story I’ll tell today.
To get the obvious architectural diagram out of the way - here’s how Kafka 8 is currently used @ Uber.
The real-time infrastructure ecosystem at Uber - which includes Kafka - powers many key pieces of our business. I think of it in this topology...
Surge - as noted earlier..
FRAUD MODELS
ETA - real-time system
City teams use real-time operational analytics to actively manage their cities - making adjustments in dispatch, messaging, etc. - to optimize city functioning.
Much of Uber’s success has to do with the amazing speed and agility of our on-the-ground global city teams - and much of this comes from empowering them with realtime tools.
We’ve recently applied this same type of infrastructure to our Uber Eats business, which is rapidly scaling now and involves significant operational complexity.
Internally, analytics on our experimentation pipeline - which now powers the creation of hundreds of new experiments weekly, and on which our teams act daily based on rapid data feedback loops - is a real-time system.
Pretty awesome. But it took a long journey to get there.
2013 - we first launched Kafka 7; each application essentially ran its own Kafka cluster.
2014 - we began the transition to Kafka 8 (K8), moving all our K7 data to K8 through the K7 migrator.
2015 to today - we deployed a fully functional K8 pipeline - stable, with scalable producers and consumers and multi-DC, multi-language support.
Along the way we ran into some significant limitations…and we did a bunch of work - which I’ll walk through now - to complete our migration to Kafka 8 and, more fundamentally, to make Kafka work at our scale.
We implemented REST proxy improvements, adding a new binary protocol for high throughput.
By building REST client libraries, we facilitated multi-language support (which was important given our four-language environment).
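One pattern these client libraries enable is batching: buffering records client-side and flushing them to the proxy in a single call, which is where much of the throughput win comes from. Below is a hypothetical sketch of that pattern; the class name, batch size, and the injected `send` callable (which in practice would be an HTTP POST to the proxy) are all illustrative assumptions.

```python
# Hypothetical sketch of a batching REST-proxy client: buffer records and
# flush them in one call once the batch fills. The endpoint shape and batch
# size are illustrative, not Uber's actual client library.

class BatchingProducer:
    def __init__(self, send, batch_size=100):
        self.send = send          # in practice: one HTTP POST per batch
        self.batch_size = batch_size
        self.buffer = []

    def produce(self, record):
        """Buffer a record; flush automatically when the batch is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send any buffered records as one batch and clear the buffer."""
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []

sent = []  # stand-in for the network: collect each flushed batch
producer = BatchingProducer(sent.append, batch_size=2)
for r in ["a", "b", "c"]:
    producer.produce(r)
producer.flush()
print(sent)  # [['a', 'b'], ['c']]
```

Amortizing one request over many records is the same trade Kafka's own producers make; the REST layer just moves that batching into the client library.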
We automated schema and topic management. In a world with many thousands of topics and hundreds of engineers and teams producing data, the absence of strong tools around schema inference, enforcement and management was a huge pain point.
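The core of schema enforcement can be sketched very simply: check each record against the schema registered for its topic before it is produced, and reject it with a concrete reason if it doesn't conform. The registry layout, topic name, and field names below are illustrative assumptions, not Uber's actual tooling.

```python
# Hypothetical sketch of produce-time schema enforcement. The in-memory
# "registry" and the trip_events schema are illustrative examples.

SCHEMA_REGISTRY = {
    "trip_events": {"trip_id": str, "city_id": int, "ts": float},
}

def validate(topic: str, record: dict) -> list:
    """Return a list of violations; an empty list means the record conforms."""
    schema = SCHEMA_REGISTRY.get(topic)
    if schema is None:
        return [f"no schema registered for topic {topic!r}"]
    errors = []
    for field, ftype in schema.items():
        if field not in record:
            errors.append(f"missing field {field!r}")
        elif not isinstance(record[field], ftype):
            errors.append(f"{field!r} should be {ftype.__name__}")
    return errors

print(validate("trip_events", {"trip_id": "t1", "city_id": 5}))
# ["missing field 'ts'"]
```

With hundreds of teams producing, the value is less the check itself than that it runs uniformly on every topic, so malformed data is rejected at the edge instead of discovered downstream.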
We built Mirrormaker 2.0, which we’ll soon be open sourcing…
It’s more robust // easier to operate // and allows for dynamic topic addition
And…
We built a series of Data auditing tools - allowing us to track data loss and latency spikes at different points in the Kafka pipeline, which at scale became critical for triaging and solving problems at a rapid pace
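A common shape for this kind of auditing is count-based: each stage of the pipeline summarizes the records it has seen into per-time-bucket counts, and comparing adjacent stages surfaces loss. The sketch below illustrates that idea under assumed names; it is not Uber's actual auditing implementation.

```python
# Hypothetical sketch of count-based pipeline auditing: each stage emits
# (time bucket -> count) summaries; diffing adjacent stages reveals loss.
from collections import Counter

def bucket_counts(timestamps, bucket_secs=60):
    """Summarize event timestamps into per-bucket counts (the audit summary
    a producer or consumer stage would periodically emit)."""
    return Counter(int(ts // bucket_secs) for ts in timestamps)

def find_loss(producer_counts, consumer_counts):
    """Return {bucket: missing} for buckets where the consumer stage saw
    fewer records than the producer stage."""
    return {b: producer_counts[b] - consumer_counts.get(b, 0)
            for b in producer_counts
            if consumer_counts.get(b, 0) < producer_counts[b]}

produced = bucket_counts([0, 10, 70, 71, 130])   # 2 + 2 + 1 across 3 minutes
consumed = bucket_counts([0, 70, 130])           # one record lost in each of the first two
print(find_loss(produced, consumed))  # {0: 1, 1: 1}
```

Because only compact count summaries cross the wire, this check stays cheap at scale, and it localizes loss to a specific stage and time window, which is what makes rapid triage possible.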
All Kafka data producers at Uber are now running Kafka 8. The project has been a huge success and is now powering much of Uber’s data infrastructure. It is...mission critical.
The goal is to shrink the barrier between real time Infra and analytical usage.
We’re currently capturing accelerometer data from the driver’s / rider’s phone via Kafka. This data is then used for:
Detecting traffic / road conditions? (need to confirm)
1) We use our motionstash data to generate safety models and safety scores for all our drivers (supervised machine learning and classification algorithms).
2) We do per-trip ad-hoc analysis for safety by computing safety scores per driver.
3) We use the models generated in 1) to predict in real time and alert a driver about their unsafe driving.
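To give a feel for what a per-trip safety score over accelerometer data might look like, here is a deliberately simple sketch: count abrupt jumps in acceleration magnitude (a crude proxy for harsh braking/acceleration) and deduct from a baseline score. The threshold and scoring rule are illustrative assumptions, not Uber's actual models.

```python
# Hypothetical sketch: score a trip's smoothness from a stream of
# accelerometer magnitude samples (m/s^2). Threshold and deduction
# are illustrative, not Uber's real safety models.

def harsh_events(accel_samples, threshold=3.0):
    """Count sample-to-sample jumps in acceleration magnitude that
    exceed the threshold - a crude harsh-driving proxy."""
    return sum(1 for a, b in zip(accel_samples, accel_samples[1:])
               if abs(b - a) > threshold)

def safety_score(accel_samples, threshold=3.0):
    """1.0 = perfectly smooth; each harsh event deducts 0.1, floored at 0."""
    return max(0.0, 1.0 - 0.1 * harsh_events(accel_samples, threshold))

smooth_trip = [0.0, 0.5, 1.0, 1.2, 1.0]
harsh_trip = [0.0, 4.0, 0.0, 5.0, 0.5]
print(safety_score(smooth_trip), safety_score(harsh_trip))
# smooth trip scores 1.0; harsh trip scores 0.6
```

The production versions replace this heuristic with trained classifiers, but the pipeline shape is the same: phone sensor data flows through Kafka, features are computed per trip, and a score comes back fast enough to alert the driver.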