Real-Time Analytics at Uber Scale
Apollo
James Burkhart
Uber - Staff Engineer
Agenda
- Motivation
- Ingest
- Storage
- Query
Motivation
- Business Intelligence
- Real-time
- Time series aggregates
- Geospatial
What is Apollo?
- Real-time analytics platform focused on:
- Recent data (~7 weeks)
- Immediate visibility (1.5s to 3min p99 ingest latency)
- Ad-hoc queryability
- Arbitrary drilldown
- Geospatial functionality
- Data correctness/deduplication (exactly-once)
- Extremely low latency query (<100ms p95, <1s p99)
- Powering internal data tools at Uber
Real-time operational analytics dashboarding
- Used by the majority of Operations weekly
Apollo Query Builder
- Web UI for the Apollo Query Language
- Fully interactive
NYE 2016-2017
Motivation, Functionality Requirements
- Index based on data timestamp, not arrival timestamp
- Out of order and late (up to days later) arrival
- Mutability
- Sub-linear performance impact of scaling QPS
Apollo architecture
Environment Management (MemSQL Cluster Sizes)

Datacenter 1                   Datacenter 2
Production Prime   33x 256GB   Production Prime 2   43x 256GB
Production Minor    5x 256GB   Production Minor 2    7x 256GB
Staging/Preprod    25x 256GB   (mirrored)
Ingestion
● Simple transformations
○ (e.g. string UUID to binary representation)
■ "123e4567-e89b-12d3-a456-426655440000" (36 bytes)
■ 0x123E4567E89B12D3A456426655440000 (16 bytes)
● Filters
● Each job is one input stream to (>=1) output tables
● Independent job instance per environment
val inputStream = KafkaInputStream(topic)
job.outputTables.foreach { outputTable =>
  inputStream
    .filter( ... )                      // drop rows this table does not need
    .map( ... )                         // transformations -> SQL row
    .grouped(outputTable.batchSize)     // batch rows per write
    .foreach(writeBatchToDatabase)
}
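The string-to-binary UUID transformation above is a one-liner with the JDK; a minimal sketch in Scala (the helper name uuidToBytes is illustrative, not Apollo's API):

import java.nio.ByteBuffer
import java.util.UUID

// "123e4567-e89b-12d3-a456-426655440000" (36 bytes as text)
// becomes a 16-byte value suitable for a BINARY(16) column.
def uuidToBytes(s: String): Array[Byte] = {
  val uuid = UUID.fromString(s)
  val buf  = ByteBuffer.allocate(16)
  buf.putLong(uuid.getMostSignificantBits)
  buf.putLong(uuid.getLeastSignificantBits)
  buf.array()
}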
Ingestion
● Upserts - No double counting!
● Async RF=2 MemSQL replication
○ Can lose recent writes during hardware failure
● Solution: every 6 hours, upsert the last 72h of data in batch from Hive
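A minimal sketch of the upsert idea, assuming a MySQL-compatible JDBC driver (MemSQL speaks the MySQL protocol); the table and column names are hypothetical. Because re-inserting an existing key updates the row instead of adding one, replaying the last 72h from Hive through the same statement cannot double count:

import java.sql.Connection

// Upsert a batch of rows; trip_uuid is assumed to be the primary key.
def upsertBatch(conn: Connection, rows: Seq[(Array[Byte], Long, String)]): Unit = {
  val sql =
    """INSERT INTO trips (trip_uuid, request_at, status)
      |VALUES (?, ?, ?)
      |ON DUPLICATE KEY UPDATE request_at = VALUES(request_at),
      |                        status     = VALUES(status)""".stripMargin
  val stmt = conn.prepareStatement(sql)
  try {
    rows.foreach { case (uuid, requestAt, status) =>
      stmt.setBytes(1, uuid)
      stmt.setLong(2, requestAt)
      stmt.setString(3, status)
      stmt.addBatch()
    }
    stmt.executeBatch()                 // one round trip per batch
  } finally stmt.close()
}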
Storage
● In-memory rowstore - mutable/recent
● Columnstore - immutable/older
Caching
● Partial, recomposable results
● Sharded MySQLs
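A sketch of "partial, recomposable results" under illustrative assumptions (the keying and types here are not Apollo's actual design): cache aggregates per (query, time bucket), so a repeated query reuses cached buckets and computes only the missing ones:

import scala.collection.mutable

object PartialResultCache {
  // (canonical query hash, bucket start time) -> aggregate value
  private val cache = mutable.Map.empty[(String, Long), Double]

  // Return one value per bucket, computing and caching only cache misses.
  def get(queryHash: String, buckets: Seq[Long],
          compute: Long => Double): Map[Long, Double] =
    buckets.map { b =>
      b -> cache.getOrElseUpdate((queryHash, b), compute(b))
    }.toMap
}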
Apollo Query Language (AQL)
● Custom Analytical Time-Series Query Language
● Goals:
○ Flexibility like SQL
○ Minimal Learning Curve
○ Ease-of-Use
● Features:
○ Canonicalization
○ Ease-of-parsing
○ Error detection
○ Automatic optimization
Example
{
  "table": "trips",
  "joins": [
    {
      "alias": "g",
      "table": "geofences",
      "conditions": [
        "geography_intersects(request_point, g.shape)"
      ]
    }
  ],
  "dimensions": [
    {
      "sqlExpression": "request_at",
      "timeBucketizer": "day",
      "timeUnit": "millisecond"
    }
  ],
  "measures": [
    {
      "sqlExpression": "count(*)",
      "rowFilters": [
        "status='completed'"
      ]
    }
  ],
  "rowFilters": [
    "city_id=1",
    "g.uuid=0x0A"
  ],
  "timeFilter": {
    "column": "request_at",
    "from": "yesterday",
    "to": "yesterday"
  },
  "timezone": "America/Los_Angeles"
}
Apollo Query Builder
- Web UI for the Apollo Query Language
- Fully interactive
Why SQL is hard for time series OLAP
Field                     Value
Dimension.SQLExpression   request_at
Dimension.TimeBucketizer  day
Dimension.TimeUnit        millisecond
Timezone                  America/Los_Angeles
Why SQL is hard for time series OLAP
● Date/time functions:
○ ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) %
900000) / 1000), 'GMT', 'America/Los_Angeles'), '%Y-%m-%d'), 'America/Los_Angeles', 'UTC')) / 0.001, 0)
○ Cheap timestamp snapping to 15m
○ Conversion from milliseconds to seconds
○ Conversion from Unix timestamp to SQL time
○ Adding timezone to Unix time
○ Date/time formatting/truncation
○ Timezone conversion
○ Conversion from SQL time to Unix timestamp
○ Conversion from seconds to milliseconds
Field                     Value
Dimension.SQLExpression   request_at
Dimension.TimeBucketizer  day
Dimension.TimeUnit        millisecond
Timezone                  America/Los_Angeles
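For contrast, a sketch of what the nested expression computes, written with java.time (this is the logic the SQL must spell out by hand): truncate a Unix millisecond timestamp to its day boundary in the query timezone and return that boundary in milliseconds:

import java.time.{Instant, ZoneId}

def dayBucketMillis(requestAtMs: Long, tz: ZoneId): Long =
  Instant.ofEpochMilli(requestAtMs)
    .atZone(tz)            // interpret the instant in the query timezone
    .toLocalDate           // truncate to the calendar day
    .atStartOfDay(tz)      // back to the day's first instant in that zone
    .toInstant
    .toEpochMilli          // report in the requested time unit (ms)

// dayBucketMillis(ts, ZoneId.of("America/Los_Angeles"))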
Why SQL is hard for time series OLAP
● City/Region/Country based timezone
○ ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) %
900000) / 1000), 'GMT', __tz__.sub_region_timezone), '%Y-%m-%d'), __tz__.sub_region_timezone, 'UTC')) / 0.001, 0) FROM trips
JOIN api_cities as __tz__ ON trips.city_id = __tz__.id
○ Join with api_cities (which has timezone info for each geographic level) on city_id
○ Use the corresponding timezone column from api_cities
Field                     Value
Dimension.SQLExpression   request_at
Dimension.TimeBucketizer  day
Dimension.TimeUnit        millisecond
Timezone                  sub_region_timezone(city_id)
Why SQL is hard for time series OLAP
● #completed_trips / #requested_trips
○ SUM(CASE WHEN trips.status='completed' THEN 1 ELSE 0 END) / SUM(CASE WHEN trips.status!='ignored' THEN 1 ELSE 0 END)
○ SELECT …, _1.completed / _2.requested FROM (SELECT …, COUNT(*) as completed FROM trips WHERE status='completed' GROUP BY
...) AS _1 JOIN (SELECT …, COUNT(*) as requested FROM trips WHERE status!='ignored' GROUP BY ...) AS _2 ON ...
○ Filters make measures complex
Field                      Value
Measure[0].SQLExpression   count(*)
Measure[0].Filters         status='completed'
Measure[0].Alias           completed
Measure[1].SQLExpression   count(*)
Measure[1].Filters         status!='ignored'
Measure[1].Alias           requested
Measure[2].SQLExpression   completed / requested
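The filtered measures are just conditional aggregation; a sketch over an in-memory collection (the Trip type is illustrative):

case class Trip(status: String)

def completionRate(trips: Seq[Trip]): Double = {
  val completed = trips.count(_.status == "completed")  // Measure[0]
  val requested = trips.count(_.status != "ignored")    // Measure[1]
  completed.toDouble / requested                        // Measure[2]
}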
Why SQL is hard for time series OLAP
● #Trips by geofence for geofences A, B and C
○ SELECT count(*), geofences.uuid FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE
geofences.uuid IN (A, B, C) GROUP BY geofences.uuid
● Total #Trips for geofences A, B and C
○ SELECT count(*) FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE geofences.uuid IN
(A, B, C)
● Overlapping is OK, overcounting is not!
○ SELECT count(*) FROM trips WHERE EXISTS (SELECT * FROM geofences WHERE geography_intersects(trips.request_point,
geofences.shape) AND geofences.uuid IN (A, B, C))
Bad SQL queries
● SELECT count(*), request_at FROM trips GROUP BY request_at;
○ Time needs to be bucketized! Grouping by milliseconds makes no sense!
● SELECT count(*), fare_total FROM trips GROUP BY fare_total;
○ Some numeric values, such as fares, need to be bucketized (reported as histograms)!
● SELECT sum(fare_total) FROM trips, other_table WHERE trips.fare_total>1.0 AND other_table.foo='BAR';
○ The join condition is missing; a Cartesian product is bad!
AQL Query Optimization
Date/time function performance issue
● CONCAT(DATE_FORMAT(FROM_UNIXTIME((__d0__) / 1000), '%Y-%m-%d '), LPAD(3 *
FLOOR(HOUR(FROM_UNIXTIME((__d0__) / 1000)) / 3), 2, '0'), ':00')
● Run for every row (trip)!
Two-stage aggregation
● Single stage (naive): run the date/time bucketization function on request_at for every row, then count(*)
● Stage 1: GROUP BY t - t % 15m (cheap per-row integer snapping), count(*) AS c
● Stage 2: run the date/time bucketization function once per 15m bucket, SUM(c)
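A minimal in-memory sketch of the two stages (function names are illustrative; in Apollo both stages live inside the generated SQL):

val fifteenMinutes = 15 * 60 * 1000L

// Stage 1: per-row work is only integer arithmetic.
def stage1(requestAts: Seq[Long]): Map[Long, Long] =
  requestAts
    .groupBy(t => t - t % fifteenMinutes)        // snap each row to its 15m bucket
    .view.mapValues(_.size.toLong).toMap         // count(*) as c

// Stage 2: the expensive date/time function runs once per bucket, not per row.
def stage2(partials: Map[Long, Long], bucketize: Long => String): Map[String, Long] =
  partials.toSeq
    .groupBy { case (t, _) => bucketize(t) }     // date/time bucketization
    .view.mapValues(_.map(_._2).sum).toMap       // sum(c)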
Time Series Bucket Splitting
From: this week  To: now  (Now: 2016-03-22 13:17)
● Split: the partial week of 2016-03-21 splits into 2016-03-21 (day), hours 2016-03-22 00:00 through 12:00, the 15m bucket 2016-03-22 13:00, and minute buckets 2016-03-22 13:15 and 13:16
● Rollup: once the interval completes, the minute buckets 13:15 and 13:16 roll up into the 15m bucket 2016-03-22 13:15
Time Series Bucket Splitting
From: -20d  To: -12h  BucketSize: week  (Now: 2016-03-22 13:17)
● Whole buckets: the weeks of 2016-03-07 and 2016-03-14 are queried as-is
● Split: the partial week of 2016-03-02 splits into days 2016-03-02 through 2016-03-06; the partial week of 2016-03-21 splits into 2016-03-21 (day) plus hours 2016-03-22 00:00 and 01:00
● Rollup: the split pieces roll back up into week-sized results to honor BucketSize: week
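A greedy sketch of the splitting idea, assuming minute-aligned endpoints and fixed-width buckets (calendar-aligned weeks and timezones make the real logic more involved):

val minute = 60 * 1000L
// Largest to smallest: 7d, 1d, 1h, 15m, 1m.
val bucketSizes = Seq(7 * 24 * 60 * minute, 24 * 60 * minute, 60 * minute, 15 * minute, minute)

// Cover [from, to) with the largest bucket that starts at the cursor
// and still fits entirely inside the range.
def split(from: Long, to: Long): Seq[(Long, Long)] = {
  var cursor = from
  val out = Seq.newBuilder[(Long, Long)]   // (bucketStart, bucketSize)
  while (cursor < to) {
    val size = bucketSizes
      .find(s => cursor % s == 0 && cursor + s <= to)
      .get                                 // the minute bucket always qualifies for aligned inputs
    out += ((cursor, size))
    cursor += size
  }
  out.result()
}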
AQL Query Optimization
Aggregate rollups
avg(x) = sum(x) / count(*)
Original function   Stage 1    Stage 2 (rollup)
count               count      sum
sum                 sum        sum
min                 min        min
max                 max        max
count distinct      distinct   count distinct
HyperLogLog (approximate alternative for count distinct)
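Why avg rolls up: keep (sum, count) partials per bucket and divide only after merging; a sketch (the Partial type is illustrative):

case class Partial(sum: Double, count: Long) {
  def merge(that: Partial): Partial = Partial(sum + that.sum, count + that.count)
  def avg: Double = sum / count
}

// Two 15m partials roll up into one 30m bucket with the exact average.
val rolledUp = Partial(120.0, 10).merge(Partial(80.0, 6))
// rolledUp.avg == 200.0 / 16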
Contracts
SELECT AVG(fare), ts_15m FROM trips WHERE time >= (now() - 1h)
(where city=x)
group by 15m(, city);

p95 / total latency by query shape and time range:

                          1h      24h     (21d, group by 24h)
(where city=x)  p95:      50ms    60ms    70ms
for x in cities, sum:     ~9s     ~10s    ~12s
group by city   p95:      200ms   ~1s     ~7s
Contracts
SELECT COUNT(1), AVG(fare), SUM(fare), AVG(eta) FROM trips WHERE ...
SELECT COUNT(1), AVG(fare), SUM(fare), SUM(eta) FROM trips WHERE ...
Contracts
SELECT COUNT(1) FROM trips WHERE
City = 'San Francisco'
State = 'completed'
Product = 'Uber-X'
Contracts precompute a GROUP BY for every subset of the filtered dimensions (sketched below):
(City,State,Product),(City,State),(City,Product),(City),
(State),(State,Product),
(Product),
(∅)
Geographical Breakdowns:
World > North America > United States > US West > California > BayArea > SF
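A sketch of the enumeration above; every subset of the filtered dimensions is a candidate GROUP BY for a contract to precompute:

// All 2^n dimension subsets, from (City,State,Product) down to (∅).
def groupings(dims: Set[String]): Set[Set[String]] =
  dims.subsets().toSet

// groupings(Set("City", "State", "Product")).size == 8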
Stats
● p80 <= 10ms
● p90 <= 50ms
● p95 <= 100ms
● p99 <= 1000ms
● p99.5 <= 5000ms
● Millions of queries/day
● ~250k distinct queries
● Billions of MySQL writes/day
Future Plans (next 3-6 months)
● Product
○ Self-service onboarding and schema management
○ Schema change management and automation
● Technology
○ Cost Accounting
○ Contract automation
○ Query cost estimation
Challenges and Learnings
Schema Challenges
● Many Schemas:
○ Ingestion transformations
■ Hive
■ Avro-encoded Kafka
○ MemSQL Schema
○ Query layer schema
Ingestion
Performance differences for the largest job:

Metric        Spark   Golang
Containers    32      4
CPU Cores     160     8
Memory (GB)   226     16
Throughput    36k/s   60k/s
Questions?
(PS: We’re hiring)
Uber Engineering Blog
eng.uber.com
Uber Open Source
uber.github.io
Uber Eng Twitter
twitter.com/ubereng
These slides
https://tinyurl.com/apollostrata
msql.co/uberscale
Check out 'Hoodie: Incremental processing on Hadoop at Uber', Thursday 1:50-2:30, for the
next Uber Strata presentation.
