Druid @ Branch

My recent talk at the Druid meetup about how we built our petabyte-scale real-time analytics infrastructure.

Published in: Data & Analytics

  1. Druid @ Branch
     Enhancing the Data Platform for better Business Decisions
     ● Sub-second aggregate queries
     ● Real-time analytics dashboard
     ● Live queries for uniques
     ● Instant exploratory analytics
     Technology powering the Data Platform
     Performance & Scale Considerations
     Opportunity for new Apps
     Monitoring
     Provisioning & deployment
     Future Plan
     Demo
     Biswajit Das, Data Team, @biswajit @branch.io
     Muwon Lum, Infra Team, @muwon @branch.io
  2. Agenda
     The Business Problem
     Technology Gap
     Data Platform Features
     Performance and Scale
     Opportunity for new Apps
     Monitoring
     Provisioning & deployment
     Future Plan
  3. The Business Problem
     ● Cannot perform live complex queries
     ● Lack of instant access to aggregate data
     ● Gathering unique impressions is time consuming
     ● No single pane of glass to view all data
     ● Ad hoc queries require pre-aggregation
     Instant access to information at scale was a problem
  4. Technology Gap
     Key/Value Store (Aerospike)
     ● Pre-compute all permutations of possible user queries.
     ● Range scans on event data.
     ● Pre-computing all permutations of all ad hoc queries can lead to result sets that grow exponentially with the number of columns in a data set and can require hours of pre-processing time.
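
The exponential growth described on the Technology Gap slide can be made concrete with a small sketch. It counts only which subsets of columns a query might group by, which is one simple reading of "all permutations"; the column names and helper functions below are hypothetical and do not come from the talk.

```python
from itertools import combinations


def count_groupings(num_columns: int) -> int:
    """Number of distinct column subsets (including the empty one)."""
    return 2 ** num_columns


def enumerate_groupings(columns):
    """Yield every subset of columns that a query could group by."""
    for size in range(len(columns) + 1):
        yield from combinations(columns, size)


# Hypothetical dimension names, for illustration only.
cols = ["country", "platform", "campaign"]
groupings = list(enumerate_groupings(cols))
print(len(groupings))        # 8 subsets for 3 columns
print(count_groupings(30))   # 1073741824 subsets for 30 columns
```

Each subset would still have to be materialized for every combination of values in those columns, so the real number of precomputed rows is larger than the subset count alone.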
  5. Druid to the rescue...
  6. High Level Data Pipeline flow
     (Diagram; labeled components: SECOR, Tranquility, Parquet)
  7. Batch System
  8. Query path
  9. Performance and Scale
     ● 25-node production cluster (Druid only)
     ● Several hundred terabytes of raw data indexed
     ● Typical complex datasource with 30 dimensions and 2 metrics
     ● Real-time indexer at ~30k events per second, peaking at 50k
     ● Hourly bucketed data to support different timezones
     ● Sustained 2B+ events per day
     ● Thousands of queries per second for online dashboard applications
     ● Serving 11 million queries every day
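
As a rough cross-check of the daily figures on the Performance and Scale slide, they can be converted to per-second averages. This is plain arithmetic on the slide's numbers; the helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


events_per_sec = average_rate(2_000_000_000)   # 2B events per day
queries_per_sec = average_rate(11_000_000)     # 11 million queries per day

print(round(events_per_sec))    # 23148
print(round(queries_per_sec))   # 127
```

The ingest average of about 23k events per second is in line with the quoted ~30k typical and 50k peak rates. The query average of about 127 per second is well below "thousands of queries per second"; the slide does not explain the gap, and one possible reading is that the larger figure describes bursts rather than a daily average.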
  10. Opportunity for new Apps
      ● Druid helped us to support new analytics easily.
      ● Ad hoc reporting.
      ● Visualizing data.
      ● Exploratory analytics.
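
To give a sense of what an ad hoc report against Druid might look like, here is a minimal sketch that builds a native Druid timeseries query as JSON. The datasource, metric, and column names are hypothetical and not from the talk, and the `hyperUnique` aggregator assumes the column was ingested as a hyperUnique metric.

```python
import json


def hourly_uniques_query(datasource, start, end, time_zone):
    """Build a native Druid timeseries query with hourly, timezone-aware buckets."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        # Period granularity lets hourly buckets align to a chosen timezone.
        "granularity": {"type": "period", "period": "PT1H", "timeZone": time_zone},
        "intervals": [f"{start}/{end}"],
        "aggregations": [
            # Hypothetical metric names; adjust to the actual ingestion spec.
            {"type": "longSum", "name": "events", "fieldName": "count"},
            {"type": "hyperUnique", "name": "unique_users", "fieldName": "user_unique"},
        ],
    }


query = hourly_uniques_query(
    "events", "2017-01-01", "2017-01-02", "America/Los_Angeles"
)
# The JSON body would typically be POSTed to a broker at /druid/v2/.
print(json.dumps(query, indent=2))
```

Because the aggregation happens at query time, changing the interval, timezone, or aggregators needs only a different request body rather than a new pre-aggregation job.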
  11. Provisioning & Deployment
      SaltStack
  12. Rolling Updates
      (Diagram with steps 1, 2, 3)
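
The three steps on the Rolling Updates slide are not recoverable from the transcript. As a generic illustration of the idea only, and not the team's actual procedure, the sketch below updates one node at a time and stops at the first node that fails a health check. The node names and both callbacks are hypothetical.

```python
from typing import Callable, Iterable, List


def rolling_update(
    nodes: Iterable[str],
    update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update nodes one at a time, halting on the first unhealthy node."""
    updated = []
    for node in nodes:
        update(node)
        if not is_healthy(node):
            # Stop here so the remaining nodes keep serving the old version.
            raise RuntimeError(f"{node} unhealthy after update; stopping rollout")
        updated.append(node)
    return updated


# Hypothetical node names and stand-in callbacks for a dry run.
log = []
done = rolling_update(
    ["historical-1", "historical-2", "broker-1"],
    update=log.append,
    is_healthy=lambda node: True,
)
print(done)
```

Updating serially keeps most of the cluster serving queries at any moment, at the cost of a slower rollout.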
  13. Future Plan
      ● More robust Query Service.
      ● Migrate the Hadoop indexer to Spark.
      ● Actively working to migrate the streaming pipeline to Flink.
      ● Evaluating a move of the whole Druid stack to Mesos/Docker.
  14. Thank you
      We are hiring: https://branch.io/careers
