-
1.
Low Latency “OLAP” with HBase
Cosmin Lehene | Adobe
© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
-
2.
What we needed … and built
OLAP Semantics
Low Latency Ingestion
High Throughput
Real-time Query API
Not hardcoded to web analytics or x-, y-, z-analytics, but extensible
-
3.
Building Blocks
Dimensions, Metrics
Aggregations
Roll-up, drill-down, slicing and dicing, sorting
-
4.
OLAP 101 – Queries example
Date        Country  City     OS       Browser  Sale
2012-05-21  USA      NY       Windows  FF        0.0
2012-05-21  USA      NY       Windows  FF       10.0
2012-05-22  USA      SF       OSX      Chrome   25.0
2012-05-22  Canada   Ontario  Linux    Chrome    0.0
2012-05-23  USA      Chicago  OSX      Safari   15.0
Totals: 5 visits over 3 days; 3 sales totaling 50.0
2 countries: USA: 4, Canada: 1
4 cities: NY: 2, SF: 1
3 OS-es: Win: 2, OSX: 2
3 browsers: FF: 2, Chrome: 2
-
5.
OLAP 101 – Queries example
Rolling up to country level:
SELECT COUNT(visits), SUM(sales)
GROUP BY country
Country  visits  sales
USA      4       $50
Canada   1       $0
“Slicing” by browser:
SELECT COUNT(visits), SUM(sales)
GROUP BY country
HAVING browser = 'FF'
Country  visits  sales
USA      2       $10
Canada   0       $0
Top browsers by sales:
SELECT SUM(sales), COUNT(visits)
GROUP BY browser
ORDER BY sales DESC
Browser  sales  visits
Chrome   $25    2
Safari   $15    1
FF       $10    2
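The three queries can be reproduced over the five sample visits with a few lines of Python. This is only an illustrative sketch: the helper name group_by and the tuple layout are assumptions, not part of SaasBase.

```python
from collections import defaultdict

# The five sample visits: (date, country, city, os, browser, sale)
VISITS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]
COLUMNS = ("date", "country", "city", "os", "browser", "sale")


def group_by(rows, dimension, where=None):
    """Return {dimension value: (visits, sales)} for rows passing the filter."""
    idx = COLUMNS.index(dimension)
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        if where is not None and not where(row):
            continue
        totals[row[idx]][0] += 1        # COUNT(visits)
        totals[row[idx]][1] += row[5]   # SUM(sales)
    return {key: tuple(value) for key, value in totals.items()}


# Roll-up to country level
rollup = group_by(VISITS, "country")
# Slice: only Firefox visits, grouped by country
ff_slice = group_by(VISITS, "country", where=lambda r: r[4] == "FF")
# Top browsers by sales, highest first
top_browsers = sorted(group_by(VISITS, "browser").items(),
                      key=lambda kv: kv[1][1], reverse=True)
```

Note that the filter drops countries with no matching visits, so Canada is absent from ff_slice rather than reported with zeros.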
-
6.
OLAP – Runtime Aggregation vs. Pre-aggregation
Aggregate at runtime
Most flexible
Fast – scatter gather
Space efficient
But:
I/O, CPU intensive
Slow for larger data
Low throughput
Pre-aggregate
Fast
Efficient – O(1)
High throughput
But:
More effort to process (latency)
Combinatorial explosion (space)
No flexibility
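A toy contrast between the two approaches, assuming a tiny in-memory event list (the data and function names are made up for illustration): the runtime version rescans raw events on every query, the pre-aggregated version pays once at processing time and then answers with a lookup.

```python
from collections import Counter

# Hypothetical raw events: (date, country)
EVENTS = [
    ("2012-05-21", "USA"),
    ("2012-05-21", "USA"),
    ("2012-05-22", "USA"),
    ("2012-05-22", "Canada"),
    ("2012-05-23", "USA"),
]


def visits_runtime(events, country):
    """Runtime aggregation: flexible, but scans every raw event per query."""
    return sum(1 for _, c in events if c == country)


def build_preaggregate(events):
    """Pre-aggregation: one pass at processing time, one counter per country."""
    return Counter(c for _, c in events)


PRE = build_preaggregate(EVENTS)


def visits_preaggregated(country):
    """Query is a single lookup, but only for the grouping that was built."""
    return PRE[country]
```

Both paths return the same count for a country; the difference is where the work happens and which questions can still be asked afterwards.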
-
7.
Pre-aggregation
Data needs to be summarized
Can’t visualize 1B data points (no, not even with Retina display)
Difficult to comprehend correlations among more than 3 dimensions
Not all dimension groups are relevant
Index on an as-needed basis (view selection problem)
Runtime aggregation == TeraSort for every query?
Pre-aggregate to reduce cardinality
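To see why only some dimension groups get indexed, it helps to count them. The sketch below enumerates every non-empty subset of the five example dimensions; the SELECTED list is a hypothetical choice of groups that reports might need, not the actual SaasBase configuration.

```python
from itertools import combinations


def dimension_groups(dimensions):
    """Yield every non-empty subset of dimensions that could be pre-aggregated."""
    for size in range(1, len(dimensions) + 1):
        yield from combinations(dimensions, size)


DIMENSIONS = ["date", "country", "city", "os", "browser"]
ALL_GROUPS = list(dimension_groups(DIMENSIONS))

# Hypothetical selection: only the groups some report actually queries.
SELECTED = [
    ("date",),
    ("date", "country"),
    ("date", "country", "city"),
    ("date", "browser"),
]
```

Five dimensions already give 2^5 - 1 = 31 possible groups, and each added dimension doubles that, which is the combinatorial explosion the view selection problem tries to avoid.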
-
8.
SaasBase
We tune both
pre-aggregation level vs. runtime post-aggregation
(ingestion speed + space) vs. (query speed)
Think materialized views from RDBMS
-
9.
SaasBase Domain Model Mapping
-
10.
SaasBase - Domain Model Mapping
-
11.
SaasBase - Ingestion, Processing, Indexing, Querying
-
12.
SaasBase - Ingestion, Processing, Indexing, Querying
-
13.
Ingestion
-
14.
Ingestion throughput vs. latency
Historical data (large batches)
Optimize for throughput
Increments (latest data, smaller)
Optimize for latency
-
15.
Large, granular input strategies
Slow listing in HDFS
Archive processed files
Filtering input
FileDateFilter (log name patterns: log-YYYY-MM-dd-HH.log)
TableInputFormat start/stop row
File Index in HBase (track processed/new files)
Map tasks overhead - stitching input splits
400K files => 400K map tasks => overhead, slow reduce copy
CombineFileInputFormat – 2GB-splits => 500 splits for 1TB
FixedMappersTableInputFormat (e.g. 5-region splits)
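The two ideas above, filtering input by the date encoded in the file name and stitching many small files into fewer large splits, can be sketched in plain Python. These functions are hypothetical stand-ins for illustration only; they are not the FileDateFilter or CombineFileInputFormat classes, and the greedy packing rule is an assumption.

```python
import re
from datetime import datetime

# File names follow the pattern from the slide: log-YYYY-MM-dd-HH.log
LOG_NAME = re.compile(r"^log-(\d{4}-\d{2}-\d{2}-\d{2})\.log$")


def filter_by_date(names, start, end):
    """Keep log files whose name-encoded hour falls in [start, end)."""
    kept = []
    for name in names:
        match = LOG_NAME.match(name)
        if not match:
            continue  # not a log file, skip it
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H")
        if start <= stamp < end:
            kept.append(name)
    return kept


def combine_splits(files, max_split_bytes):
    """Greedily pack (name, size) pairs into splits of at most max_split_bytes."""
    splits, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits
```

With 2GB splits, ten 500MB files end up in three splits instead of ten map tasks; the same packing is what turns 400K files into a few hundred splits.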
-
16.
Ingestion – Bulk Import
HFileOutputFormat (HFOF)
100s of times faster than the HBase API
No need to recover from failed jobs
No unnecessary load on machines
* No shuffle - global reduce order required!
e.g. the first reduce key needs to be in the first region, the last one in the last region
Watch for uneven partitions
-
17.
HFOF – FileSizeDatePartitioner
1 partition (reduce) per day for initial import
Uneven reduce (partitions) due to data growth over time
Reduce k: 2010-12-04 = 500MB
Reduce n: 2012-05-22 = 5GB => slow and will result in a 5GB region
Balance reduce buckets based on input file sizes and the reduce key
Generate sub-partitions based on predefined size (e.g. 1GB)
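A minimal sketch of size-based sub-partitioning, assuming the per-day input sizes are known up front. The function names and the "fraction within the day" argument are assumptions made for this example, not the actual FileSizeDatePartitioner code; the point is that partition numbers still follow row key order while large days get several reducers.

```python
import math


def plan_partitions(bytes_per_day, target_bytes):
    """Give each day a contiguous range of reduce partitions of ~target_bytes.

    Days are visited in sorted order so partition numbers follow key order.
    """
    plan, next_partition = {}, 0
    for day in sorted(bytes_per_day):
        count = max(1, math.ceil(bytes_per_day[day] / target_bytes))
        plan[day] = range(next_partition, next_partition + count)
        next_partition += count
    return plan


def partition_for(plan, day, fraction):
    """Pick the sub-partition for a key at relative position fraction in [0, 1).

    fraction is a hypothetical ordering hint, e.g. time of day / 24h, so that
    keys within a day are split by range rather than by hash.
    """
    buckets = plan[day]
    return buckets[min(int(fraction * len(buckets)), len(buckets) - 1)]
```

With the sizes from the slide and a 1GB target, the 500MB day keeps a single reducer while the 5GB day is spread over five.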
-
18.
Processing
-
19.
Processing
Processing involves reading the input (files, tables, events), pre-aggregating it (reducing cardinality) and generating tables that can be queried in real time
1 year: 1B events => 100B data points indexed
Query => scan 365 data points (e.g. daily page views)
Processing could be either MR or real-time (e.g. Storm)
-
20.
Processing for OLAP semantics
GROUP BY (process, query)
COUNT, SUM, AVG, etc. (process, query)
SORT (process, query)
HAVING (mostly query, can define pre-process constraints)
-
21.
SaasBase vs. SQL Views Comparison
-
22.
reports.json entities definition
-
23.
Processing Performance
read, map, partition, combine, copy, sort, reduce, write
Read:
Scan.setCaching() (I/O ~ buffer)
Scan.setBatch() (avoid timeouts for abnormal input, e.g. 1M hits/visit)
Even region distribution across cluster (distributes CPU, I/O)
Map:
No unnecessary transformations: Bytes.toString(bytes) + Bytes.toBytes(string) (CPU)
Avoid GC: new X() (CPU, memory)
Avoid system calls (context switching)
Stripping unnecessary data (I/O)
-
24.
Processing Performance
Hot (in memory) vs. Cold (on disk, on network) data
Minimize I/O from disk/network
Single shot MR job: SuperProcessor
Emit all groups from one map() call
Incremental processing
Date-prefixed rowkey: YYYY-MM-DD (add HH:mm for more granularity)
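The single-shot and incremental ideas can be sketched together. In this toy version, one map call emits a row key for every report group, keys are prefixed with the date, and a later run only adds (and scans) the new day's range. The report names, key layout, and helper functions are assumptions for illustration; this is not the SuperProcessor implementation.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical report definitions: report name -> dimensions to group by.
REPORTS = {
    "by_country": ("country",),
    "by_country_browser": ("country", "browser"),
}


def map_event(event):
    """Emit one (rowkey, 1) pair per report from a single pass over the event."""
    for report, dims in REPORTS.items():
        parts = [event["date"], report] + [event[d] for d in dims]
        yield "/".join(parts), 1


def process(events, table=None):
    """Aggregate emitted pairs into rowkey -> visits (the reduce step)."""
    table = Counter() if table is None else table
    for event in events:
        for rowkey, count in map_event(event):
            table[rowkey] += count
    return table


def scan(table, start_row, stop_row):
    """Range scan over sorted row keys in [start_row, stop_row)."""
    keys = sorted(table)
    lo, hi = bisect_left(keys, start_row), bisect_left(keys, stop_row)
    return [(key, table[key]) for key in keys[lo:hi]]
```

Because the date leads the key, an increment for a new day lands in its own contiguous key range and can be read back with a start/stop row pair without touching older data.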
-
25.
Indexing
-
26.
HBase natural order: hierarchical representation
-
27.
Indexing - Why
Example: top 10 cities
~50K [country, city] combinations per day
Top 10 cities for 1 year =>
365 (days) x 50K ~= 18M data points scanned
If you add gender => ~36M
If you add Device, OS, Browser …
Might compress well, but think about the environment
How much energy would you spend for just top 10 cities?
* Image from: http://my.neutralexistence.com/images/Green-Earth.jpg
-
28.
Indexing with HBase “10” < “2”
GROUP BY year, month, country, city ORDER BY visits DESC LIMIT 10
Lexicographic sorting
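2012/05/USA/0000000000/
2012/05/USA/9999998999/San Francisco = 1000 visits*
2012/05/USA/9999999099/New York = 900 visits*
. . .
2012/05/USA/9999999999/
scan 't', {STARTROW => '2012/05/USA/', LIMIT => 10}
* Padding numbers for lexicographic sorting: store MAX – visits, zero-padded to a fixed width, so the largest counts sort first
With a 10-digit MAX: 1000 -> 9999999999 – 1000 = 9999998999 (Long.MAX_VALUE works the same way with 19 digits)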
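A small sketch of the inverted, zero-padded key trick. The 10-digit width and the helper names are assumptions chosen for this example; any fixed maximum works as long as every count fits below it.

```python
WIDTH = 10
MAX_VALUE = 10 ** WIDTH - 1  # assumption: counts never exceed 9999999999


def index_key(prefix, visits, name):
    """Build a row key whose lexicographic order is descending by visits."""
    inverted = MAX_VALUE - visits
    return f"{prefix}/{inverted:0{WIDTH}d}/{name}"


def top_n(keys, prefix, n):
    """Simulate a scan with startrow=prefix and limit=n over sorted row keys."""
    matching = sorted(key for key in keys if key.startswith(prefix))
    return [key.rsplit("/", 1)[1] for key in matching[:n]]


KEYS = [
    index_key("2012/05/USA", 900, "New York"),
    index_key("2012/05/USA", 1000, "San Francisco"),
    index_key("2012/05/USA", 25, "Chicago"),
]
```

Plain string comparison puts "10" before "2", which is why the counts are padded to a fixed width; subtracting from the maximum then flips the order so the first rows of the scan are the top entries.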
-
29.
Query Engine
Always reads indexed, compact data
Query parsing
Scan strategy
Single vs. multiple scans
Start/stop rows (prefixes, index positions, etc.)
Index selection (volatile indexes with incremental processing)
Deserialization
Post-aggregation, sorting, fuzzy-sorting etc.
Paging
Custom dimension/metric class loading
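The query-side steps (scan a key range, deserialize, post-aggregate, sort, page) might look roughly like this over a toy table of daily pre-aggregated rows. The table contents, key layout, and the query function are hypothetical and only meant to show the flow, not the actual Query Engine.

```python
from collections import defaultdict

# Hypothetical pre-aggregated daily rows: rowkey "date/country" -> visits.
DAILY = {
    "2012-05-21/USA": 2,
    "2012-05-22/Canada": 1,
    "2012-05-22/USA": 1,
    "2012-05-23/USA": 1,
    "2012-06-01/USA": 7,
}


def query(table, start_row, stop_row, page, page_size):
    """Scan [start_row, stop_row), post-aggregate by country, sort, then page."""
    totals = defaultdict(int)
    for rowkey in sorted(table):
        if start_row <= rowkey < stop_row:
            country = rowkey.split("/", 1)[1]   # deserialization step
            totals[country] += table[rowkey]    # post-aggregation across days
    # Sort by visits descending, then by name for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    first = page * page_size
    return ranked[first:first + page_size]
```

The query never touches raw events: it reads a handful of compact daily rows for May and does the remaining roll-up to country totals at query time.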
-
30.
Conclusions
OLAP semantics on a simple data model
Data as first class citizen
Domain Specific “Language” for Dimensions, Metrics, Aggregations
Tunable performance, resource allocation
Framework for vertical analytics systems
-
31.
Thank you!
Cosmin Lehene @clehene
http://hstack.org
Credits:
Andrei Dragomir
Adrian Muraru
Andrei Dulvac
Raluca Podiuc
Tudor Scurtu
Bogdan Dragu
Bogdan Drutu
-
32.
-
33.
OLAP 101 - Rollup
Country  Visits  Sale
USA      4       $50
Canada   1       $0
Rollup: SELECT COUNT(visits), SUM(sales) GROUP BY country
-
34.
OLAP 101 - Slicing
Date        Country  City     OS       Browser  Sale
2012-03-02  USA      NY       Windows  FF        0.0
2012-03-02  USA      NY       Windows  FF       10.0
2012-03-03  USA      SF       OSX      Chrome   25.0
2012-03-03  Canada   Ontario  Linux    Chrome    0.0
2012-03-04  USA      Chicago  OSX      Safari   15.0
Totals: 5 visits over 3 days; 3 sales totaling 50.0
2 countries: USA: 4, Canada: 1
4 cities: NY: 2, SF: 1
3 OS-es: Win: 2, OSX: 2
3 browsers: FF: 2, Chrome: 2
Filter or Segment or Slice (WHERE or HAVING)
-
35.
OLAP 101 – Sorting, TOP n
Browser  Sale
Chrome   $25
Safari   $15
Firefox  $10
SELECT SUM(sales) as total GROUP BY browser ORDER BY total DESC
How many HBase users?
Data as first class citizen
Check contrast on projector
Just like speed vs. space in general CS/algo
Queries always hit indexes
Dimensions – read/transform/serialize/deserialize data attributes
Metrics – read/transform/aggregate/serialize
Constraints: ingestion filtering
Report: instrument dimension groups + metrics with aggregations, sorting
QUERY ENGINE -> INDEX (always realtime)
Initial import/process and NEW reports (not covered) on historical data
18K regions, upgrade to 0.92
Diagram: HARD TO DIGEST (TOO MUCH INFO, TOO CONDENSED)
Process = aggregate, generate indexes (natural)
Query = uses indexes, can do extra aggregation
LEFT: report definition, NOT a QUERY
LIKE A VIEW - CREATED - THEN QUERIED
Inconsistent
Rowkey = dimensions group -> metrics (right)
GO BACK to EXPLAIN
>100K/sec/thread
REALTIME
Data analysts work with familiar concepts