Twitter uses HBase as a mutable data store for batch processing and for operational intelligence. It runs HBase 0.94 on Hadoop 2.0 across multiple clusters managed by Puppet. HBase stores operational audit logs and supports Python apps. hRaven stores and analyzes MapReduce job metrics to optimize Pig reducers, plan cluster capacity, and troubleshoot problems. It tracks 12.6M jobs across flows, clusters, users, and versions.
5. Major Use Cases
● Mutable data store for batch processing
● Operational Intelligence
● Monitoring/Metrics
6. Mutable data store for batch processing
● Tables copied from MySQL
○ Allowing for incremental loads
● MapReduce jobs over data in HBase
● Snapshot of data copied into HDFS for processing
○ HBASE-8369 will optimize this
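The incremental-load bullet above can be illustrated with a small sketch. This is not Twitter's pipeline: a plain dict stands in for an HBase table, and the `updated_at` column, the `id` column, and the `apply_incremental_load` helper are all hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def apply_incremental_load(
    table: Dict[str, Row],
    changed_rows: Iterable[Row],
    high_water_mark: int,
) -> Tuple[int, int]:
    """Upsert rows modified after high_water_mark.

    Returns (number of rows applied, new high-water mark).
    """
    applied = 0
    new_mark = high_water_mark
    for row in changed_rows:
        ts = int(row["updated_at"])  # hypothetical last-modified column
        if ts <= high_water_mark:
            continue  # already covered by an earlier load
        # Overwrite in place: this is what a mutable store makes cheap.
        table[str(row["id"])] = dict(row)
        applied += 1
        new_mark = max(new_mark, ts)
    return applied, new_mark
```

Only rows newer than the previous load are written, so each run touches a small part of the table instead of recopying it.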
9. hRaven: Why?
● Stores stats, configuration and timing for every MapReduce job on every cluster
● Structured around the full DAG of jobs from a Pig or Scalding application
● Easily queryable for historical trending
● Allows for Pig reducer optimization based on historical run stats
● Keeps data online forever (12.6M jobs, 4.5B tasks + attempts)
10. hRaven: Key Concepts
● cluster - each cluster has a unique name mapping to the JobTracker
● user - MapReduce jobs are run as a given user
● application - a Pig or Scalding script (or plain MapReduce job)
● flow - the combined DAG of jobs executed from a single run of an application
● version - changes impacting the DAG are recorded as a new version of the same application
13. hRaven: Flow Storage
● All jobs in a flow are ordered together
14. hRaven: Flow Storage
● Most recent flow is ordered first
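Both ordering properties (jobs of a flow stored together, most recent flow first) can be obtained from a composite row key with an inverted run timestamp, since HBase sorts rows by the raw bytes of the key. The sketch below is an assumption about how such a key could be laid out, not a statement of hRaven's exact schema; the `JobKey` class, the `!` separator, and the use of epoch milliseconds for `run_id` are hypothetical.

```python
import struct
from dataclasses import dataclass

LONG_MAX = 2**63 - 1  # largest signed 64-bit value
SEP = b"!"            # hypothetical field separator


@dataclass(frozen=True)
class JobKey:
    cluster: str
    user: str
    application: str
    run_id: int   # flow start time in epoch milliseconds (assumed)
    job_id: str

    def row_key(self) -> bytes:
        # Inverting the timestamp makes newer flows sort before older ones.
        inverted = LONG_MAX - self.run_id
        prefix = SEP.join(
            part.encode("utf-8")
            for part in (self.cluster, self.user, self.application)
        )
        # Big-endian encoding keeps byte order equal to numeric order
        # for non-negative values.
        return prefix + SEP + struct.pack(">q", inverted) + SEP + self.job_id.encode("utf-8")


keys = [
    JobKey("cluster1", "alice", "wordcount", 1000, "job_2"),
    JobKey("cluster1", "alice", "wordcount", 1000, "job_1"),
    JobKey("cluster1", "alice", "wordcount", 2000, "job_3"),
]
# Sorting by raw bytes mimics the order a table scan would return.
ordered = sorted(keys, key=lambda k: k.row_key())
```

In `ordered`, the job from the newer run (`run_id` 2000) comes first, and the two jobs that share `run_id` 1000 sit next to each other, so a scan over one application prefix returns whole flows, newest first.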
15. hRaven: Key Features
● All jobs in a flow are ordered together
● Per-job metrics stored
○ Total map and reduce tasks
○ HDFS bytes read / written
○ File bytes read / written
○ Total map and reduce slot milliseconds
● Easy to aggregate stats for an entire flow
● Easy to scan the timeseries of each application’s flows
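Because the jobs of a flow are stored next to each other, flow-level totals amount to a sum over the per-job metrics listed above. A minimal sketch follows, assuming each job is represented as a dict; the metric field names and the `aggregate_flow` helper are hypothetical, not hRaven's column names.

```python
from typing import Dict, Iterable

# Hypothetical field names mirroring the per-job metrics listed above.
METRICS = (
    "total_maps",
    "total_reduces",
    "hdfs_bytes_read",
    "hdfs_bytes_written",
    "file_bytes_read",
    "file_bytes_written",
    "map_slot_millis",
    "reduce_slot_millis",
)


def aggregate_flow(jobs: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Sum each per-job metric across all jobs of one flow."""
    totals = {metric: 0 for metric in METRICS}
    for job in jobs:
        for metric in METRICS:
            # A missing metric counts as zero.
            totals[metric] += job.get(metric, 0)
    return totals
```

Calling `aggregate_flow` once per flow of an application, in scan order, yields the time series that the last bullet refers to.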
16. hRaven: Current Uses
● Pig reducer optimizations
● Cluster utilization / capacity planning
● Application performance trending over time
● Identifying common job anti-patterns
● Ad-hoc analysis for troubleshooting cluster problems
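The slides do not describe how historical run stats feed the Pig reducer optimization. One hypothetical heuristic is to size the reducer count from the average data volume of past runs of the same application. The `estimate_reducers` function, the bytes-per-reducer target, and the cap below are illustrative assumptions, not the method Twitter used.

```python
from typing import Sequence


def estimate_reducers(
    historical_reduce_input_bytes: Sequence[int],
    bytes_per_reducer: int = 1_000_000_000,  # assumed target load per reducer
    max_reducers: int = 999,                 # assumed upper bound
) -> int:
    """Suggest a reducer count from the average input size of past runs."""
    runs = len(historical_reduce_input_bytes)
    if runs == 0:
        return 1  # no history: fall back to a single reducer
    total = sum(historical_reduce_input_bytes)
    # Integer ceiling of (average bytes per run) / bytes_per_reducer.
    needed = -(-total // (runs * bytes_per_reducer))
    return max(1, min(max_reducers, needed))
```

For example, two past runs that fed 2.5 GB and 3.5 GB to their reducers average 3 GB, so the sketch suggests 3 reducers at the assumed 1 GB target.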