Real-Time Analytics at Uber Scale
Apollo
James Burkhart
Uber - Staff Engineer
Agenda
- Motivation
- Ingest
- Storage
- Query
Motivation
- Business Intelligence
- Real-time
- Time series aggregates
- Geospatial
What is Apollo?
- Real-time analytics platform focused on:
- Recent data (~7 weeks)
- Immediate visibility (1.5s to 3min p99 ingest latency)
- Ad-hoc queryability
- Arbitrary drilldown
- Geospatial functionality
- Data correctness/deduplication (exactly-once)
- Extremely low latency query (<100ms p95, <1s p99)
- Powering internal data tools at Uber
Real-time operational analytics dashboarding
- Used by the majority of Operations weekly
Apollo Query Builder
- Web UI for the Apollo Query Language
- Fully interactive
NYE 2016-2017
Motivation, Functionality Requirements
- Index based on data timestamp, not arrival timestamp
- Out of order and late (up to days later) arrival
- Mutability
- Sub-linear performance impact of scaling QPS
Apollo architecture
Environment Management (MemSQL Cluster Sizes)

Datacenter 1                   Datacenter 2
Production Prime   33x 256GB   Production Prime 2   43x 256GB
Production Minor    5x 256GB   Production Minor 2    7x 256GB
Staging/Preprod    25x 256GB   (mirrored)
Ingestion
● Simple transformations
○ (e.g. string UUID to binary representation)
■ "123e4567-e89b-12d3-a456-426655440000" (36 bytes)
■ 0x123E4567E89B12D3A456426655440000 (16 bytes)
● Filters
● Each job is one input stream to (>=1) output tables
● Independent job instance per environment
val inputStream = KafkaInputStream(topic)
job.outputTables.foreach { outputTable =>
  inputStream
    .filter( ... )                      // drop rows this table does not need
    .map( ... )                         // transformations -> SQL row
    .grouped(outputTable.batchSize)     // batch rows per write
    .foreach(writeBatchToDatabase)
}
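The string-to-binary UUID transformation above is a one-liner with the JDK; a minimal sketch in Scala (the helper name uuidToBytes is illustrative, not Apollo's API):

import java.nio.ByteBuffer
import java.util.UUID

// "123e4567-e89b-12d3-a456-426655440000" (36 bytes as text)
// becomes a 16-byte value suitable for a BINARY(16) column.
def uuidToBytes(s: String): Array[Byte] = {
  val uuid = UUID.fromString(s)
  val buf  = ByteBuffer.allocate(16)
  buf.putLong(uuid.getMostSignificantBits)
  buf.putLong(uuid.getLeastSignificantBits)
  buf.array()
}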
Ingestion
● Upserts - No double counting!
● Async RF=2 MemSQL replication
○ Can lose recent writes during hardware failure
● Solution: every 6 hours, upsert the last 72h of data in batch from Hive
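A minimal sketch of the upsert idea, assuming a MySQL-compatible JDBC driver (MemSQL speaks the MySQL protocol); the table and column names are hypothetical. Because re-inserting an existing key updates the row instead of adding one, replaying the last 72h from Hive through the same statement cannot double count:

import java.sql.Connection

// Upsert a batch of rows; trip_uuid is assumed to be the primary key.
def upsertBatch(conn: Connection, rows: Seq[(Array[Byte], Long, String)]): Unit = {
  val sql =
    """INSERT INTO trips (trip_uuid, request_at, status)
      |VALUES (?, ?, ?)
      |ON DUPLICATE KEY UPDATE request_at = VALUES(request_at),
      |                        status     = VALUES(status)""".stripMargin
  val stmt = conn.prepareStatement(sql)
  try {
    rows.foreach { case (uuid, requestAt, status) =>
      stmt.setBytes(1, uuid)
      stmt.setLong(2, requestAt)
      stmt.setString(3, status)
      stmt.addBatch()
    }
    stmt.executeBatch()                 // one round trip per batch
  } finally stmt.close()
}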
Storage
● In-memory rowstore - mutable/recent
● Columnstore - immutable/older
Caching
● Partial, recomposable results
● Sharded MySQLs
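A sketch of "partial, recomposable results" under illustrative assumptions (the keying and types here are not Apollo's actual design): cache aggregates per (query, time bucket), so a repeated query reuses cached buckets and computes only the missing ones:

import scala.collection.mutable

object PartialResultCache {
  // (canonical query hash, bucket start time) -> aggregate value
  private val cache = mutable.Map.empty[(String, Long), Double]

  // Return one value per bucket, computing and caching only cache misses.
  def get(queryHash: String, buckets: Seq[Long],
          compute: Long => Double): Map[Long, Double] =
    buckets.map { b =>
      b -> cache.getOrElseUpdate((queryHash, b), compute(b))
    }.toMap
}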
Apollo Query Language (AQL)
● Custom Analytical Time-Series Query Language
● Goals:
○ Flexibility like SQL
○ Minimal Learning Curve
○ Ease-of-Use
● Features:
○ Canonicalization
○ Ease-of-parsing
○ Error detection
○ Automatic optimization
Example
{
  "table": "trips",
  "joins": [
    {
      "alias": "g",
      "table": "geofences",
      "conditions": [
        "geography_intersects(request_point, g.shape)"
      ]
    }
  ],
  "dimensions": [
    {
      "sqlExpression": "request_at",
      "timeBucketizer": "day",
      "timeUnit": "millisecond"
    }
  ],
  "measures": [
    {
      "sqlExpression": "count(*)",
      "rowFilters": [
        "status='completed'"
      ]
    }
  ],
  "rowFilters": [
    "city_id=1",
    "g.uuid=0x0A"
  ],
  "timeFilter": {
    "column": "request_at",
    "from": "yesterday",
    "to": "yesterday"
  },
  "timezone": "America/Los_Angeles"
}
Apollo Query Builder
- Web UI for the Apollo Query Language
- Fully interactive
Why SQL is hard for time series OLAP
Field                     Value
Dimension.SQLExpression   request_at
Dimension.TimeBucketizer  day
Dimension.TimeUnit        millisecond
Timezone                  America/Los_Angeles
Why SQL is hard for time series OLAP
● Date/time functions:
○ ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) %
900000) / 1000), 'GMT', 'America/Los_Angeles'), '%Y-%m-%d'), 'America/Los_Angeles', 'UTC')) / 0.001, 0)
○ Cheap timestamp snapping to 15m
○ Conversion from milliseconds to seconds
○ Conversion from Unix timestamp to SQL time
○ Adding timezone to Unix time
○ Date/time formatting/truncation
○ Timezone conversion
○ Conversion from SQL time to Unix timestamp
○ Conversion from seconds to milliseconds
Field                     Value
Dimension.SQLExpression   request_at
Dimension.TimeBucketizer  day
Dimension.TimeUnit        millisecond
Timezone                  America/Los_Angeles
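For contrast, a sketch of what the nested expression computes, written with java.time (this is the logic the SQL must spell out by hand): truncate a Unix millisecond timestamp to its day boundary in the query timezone and return that boundary in milliseconds:

import java.time.{Instant, ZoneId}

def dayBucketMillis(requestAtMs: Long, tz: ZoneId): Long =
  Instant.ofEpochMilli(requestAtMs)
    .atZone(tz)            // interpret the instant in the query timezone
    .toLocalDate           // truncate to the calendar day
    .atStartOfDay(tz)      // back to the day's first instant in that zone
    .toInstant
    .toEpochMilli          // report in the requested time unit (ms)

// dayBucketMillis(ts, ZoneId.of("America/Los_Angeles"))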
Why SQL is hard for time series OLAP
● City/Region/Country based timezone
○ ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) %
900000) / 1000), 'GMT', __tz__.sub_region_timezone), '%Y-%m-%d'), __tz__.sub_region_timezone, 'UTC')) / 0.001, 0) FROM trips
JOIN api_cities as __tz__ ON trips.city_id = __tz__.id
○ Join with api_cities (which has timezone info for each geographic level) on city_id
○ Use the corresponding timezone column from api_cities
Field                     Value
Dimension.SQLExpression   request_at
Dimension.TimeBucketizer  day
Dimension.TimeUnit        millisecond
Timezone                  sub_region_timezone(city_id)
Why SQL is hard for time series OLAP
● #completed_trips / #requested_trips
○ SUM(CASE WHEN trips.status='completed' THEN 1 ELSE 0 END) / SUM(CASE WHEN trips.status!='ignored' THEN 1 ELSE 0 END)
○ SELECT …, _1.completed / _2.requested FROM (SELECT …, COUNT(*) as completed FROM trips WHERE status='completed' GROUP BY
...) AS _1 JOIN (SELECT …, COUNT(*) as requested FROM trips WHERE status!='ignored' GROUP BY ...) AS _2 ON ...
○ Filters make measures complex
Field                      Value
Measure[0].SQLExpression   count(*)
Measure[0].Filters         status='completed'
Measure[0].Alias           completed
Measure[1].SQLExpression   count(*)
Measure[1].Filters         status!='ignored'
Measure[1].Alias           requested
Measure[2].SQLExpression   completed / requested
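The filtered measures are just conditional aggregation; a sketch over an in-memory collection (the Trip type is illustrative):

case class Trip(status: String)

def completionRate(trips: Seq[Trip]): Double = {
  val completed = trips.count(_.status == "completed")  // Measure[0]
  val requested = trips.count(_.status != "ignored")    // Measure[1]
  completed.toDouble / requested                        // Measure[2]
}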
Why SQL is hard for time series OLAP
● #Trips by geofence for geofences A, B and C
○ SELECT count(*), geofences.uuid FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE
geofences.uuid IN (A, B, C) GROUP BY geofences.uuid
● Total #Trips for geofences A, B and C
○ SELECT count(*) FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE geofences.uuid IN
(A, B, C)
● Overlapping is OK, overcounting is not!
○ SELECT count(*) FROM trips WHERE EXISTS (SELECT * FROM geofences WHERE geography_intersects(trips.request_point,
geofences.shape) AND geofences.uuid IN (A, B, C))
Bad SQL queries
● SELECT count(*), request_at FROM trips GROUP BY request_at;
○ Time needs to be bucketized! Grouping by milliseconds makes no sense!
● SELECT count(*), fare_total FROM trips GROUP BY fare_total;
○ Some numeric values, such as fares, need to be bucketized (reported as histograms)!
● SELECT sum(fare_total) FROM trips, other_table WHERE trips.fare_total>1.0 AND other_table.foo='BAR';
○ The join condition is missing; a Cartesian product is bad!
AQL Query Optimization
Date/time function performance issue
● CONCAT(DATE_FORMAT(FROM_UNIXTIME((__d0__) / 1000), '%Y-%m-%d '), LPAD(3 *
FLOOR(HOUR(FROM_UNIXTIME((__d0__) / 1000)) / 3), 2, '0'), ':00')
● Run for every row (trip)!
Two-stage aggregation
● Single stage (naive): run the date/time bucketization function on request_at for every row, then count(*)
● Stage 1: GROUP BY t - t % 15m (cheap per-row integer snapping), count(*) AS c
● Stage 2: run the date/time bucketization function once per 15m bucket, SUM(c)
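A minimal in-memory sketch of the two stages (function names are illustrative; in Apollo both stages live inside the generated SQL):

val fifteenMinutes = 15 * 60 * 1000L

// Stage 1: per-row work is only integer arithmetic.
def stage1(requestAts: Seq[Long]): Map[Long, Long] =
  requestAts
    .groupBy(t => t - t % fifteenMinutes)        // snap each row to its 15m bucket
    .view.mapValues(_.size.toLong).toMap         // count(*) as c

// Stage 2: the expensive date/time function runs once per bucket, not per row.
def stage2(partials: Map[Long, Long], bucketize: Long => String): Map[String, Long] =
  partials.toSeq
    .groupBy { case (t, _) => bucketize(t) }     // date/time bucketization
    .view.mapValues(_.map(_._2).sum).toMap       // sum(c)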
Time Series Bucket Splitting
From: this week  To: now  (Now: 2016-03-22 13:17)
● Split: the partial week of 2016-03-21 splits into 2016-03-21 (day), hours 2016-03-22 00:00 through 12:00, the 15m bucket 2016-03-22 13:00, and minute buckets 2016-03-22 13:15 and 13:16
● Rollup: once the interval completes, the minute buckets 13:15 and 13:16 roll up into the 15m bucket 2016-03-22 13:15
Time Series Bucket Splitting
From: -20d  To: -12h  BucketSize: week  (Now: 2016-03-22 13:17)
● Whole buckets: the weeks of 2016-03-07 and 2016-03-14 are queried as-is
● Split: the partial week of 2016-03-02 splits into days 2016-03-02 through 2016-03-06; the partial week of 2016-03-21 splits into 2016-03-21 (day) plus hours 2016-03-22 00:00 and 01:00
● Rollup: the split pieces roll back up into week-sized results to honor BucketSize: week
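A greedy sketch of the splitting idea, assuming minute-aligned endpoints and fixed-width buckets (calendar-aligned weeks and timezones make the real logic more involved):

val minute = 60 * 1000L
// Largest to smallest: 7d, 1d, 1h, 15m, 1m.
val bucketSizes = Seq(7 * 24 * 60 * minute, 24 * 60 * minute, 60 * minute, 15 * minute, minute)

// Cover [from, to) with the largest bucket that starts at the cursor
// and still fits entirely inside the range.
def split(from: Long, to: Long): Seq[(Long, Long)] = {
  var cursor = from
  val out = Seq.newBuilder[(Long, Long)]   // (bucketStart, bucketSize)
  while (cursor < to) {
    val size = bucketSizes
      .find(s => cursor % s == 0 && cursor + s <= to)
      .get                                 // the minute bucket always qualifies for aligned inputs
    out += ((cursor, size))
    cursor += size
  }
  out.result()
}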
AQL Query Optimization
Aggregate rollups
avg(x) = sum(x) / count(*)
Original function   Stage 1    Stage 2 (rollup)
count               count      sum
sum                 sum        sum
min                 min        min
max                 max        max
count distinct      distinct   count distinct
HyperLogLog (approximate alternative for count distinct)
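Why avg rolls up: keep (sum, count) partials per bucket and divide only after merging; a sketch (the Partial type is illustrative):

case class Partial(sum: Double, count: Long) {
  def merge(that: Partial): Partial = Partial(sum + that.sum, count + that.count)
  def avg: Double = sum / count
}

// Two 15m partials roll up into one 30m bucket with the exact average.
val rolledUp = Partial(120.0, 10).merge(Partial(80.0, 6))
// rolledUp.avg == 200.0 / 16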
Contracts
SELECT AVG(fare), ts_15m FROM trips WHERE time >= (now() - 1h)
(where city=x)
group by 15m(, city);

p95 / total latency by query shape and time range:

                          1h      24h     (21d, group by 24h)
(where city=x)  p95:      50ms    60ms    70ms
for x in cities, sum:     ~9s     ~10s    ~12s
group by city   p95:      200ms   ~1s     ~7s
Contracts
SELECT COUNT(1), AVG(fare), SUM(fare), AVG(eta) FROM trips WHERE ...
SELECT COUNT(1), AVG(fare), SUM(fare), SUM(eta) FROM trips WHERE ...
Contracts
SELECT COUNT(1) FROM trips WHERE
City = 'San Francisco'
State = 'completed'
Product = 'Uber-X'
Contracts precompute a GROUP BY for every subset of the filtered dimensions (sketched below):
(City,State,Product),(City,State),(City,Product),(City),
(State),(State,Product),
(Product),
(∅)
Geographical Breakdowns:
World > North America > United States > US West > California > BayArea > SF
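A sketch of the enumeration above; every subset of the filtered dimensions is a candidate GROUP BY for a contract to precompute:

// All 2^n dimension subsets, from (City,State,Product) down to (∅).
def groupings(dims: Set[String]): Set[Set[String]] =
  dims.subsets().toSet

// groupings(Set("City", "State", "Product")).size == 8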
Stats
● p80 <= 10ms
● p90 <= 50ms
● p95 <= 100ms
● p99 <= 1000ms
● p99.5 <= 5000ms
● Millions of queries/day
● ~250k distinct queries
● Billions of MySQL writes/day
Future Plans (next 3-6 months)
● Product
○ Self-service onboarding and schema management
○ Schema change management and automation
● Technology
○ Cost Accounting
○ Contract automation
○ Query cost estimation
Challenges and Learnings
Schema Challenges
● Many Schemas:
○ Ingestion transformations
■ Hive
■ Avro-encoded Kafka
○ MemSQL Schema
○ Query layer schema
Ingestion
Performance differences for the largest job:

Metric        Spark   Golang
Containers    32      4
CPU Cores     160     8
Memory (GB)   226     16
Throughput    36k/s   60k/s
Questions?
(PS: We’re hiring)
Uber Engineering Blog
eng.uber.com
Uber Open Source
uber.github.io
Uber Eng Twitter
twitter.com/ubereng
These slides
https://tinyurl.com/apollostrata
msql.co/uberscale
Check out 'Hoodie: Incremental processing on Hadoop at Uber', Thursday 1:50-2:30, for the
next Uber Strata presentation.
