Druid @ Branch
Enhancing the Data Platform for Better Business Decisions
● Sub-second aggregate queries
● Real-time analytics dashboard
● Live queries for uniques
● Instant exploratory analytics
Technology powering the Data Platform
Performance & Scale Considerations
Opportunity for new Apps
Monitoring
Provisioning & deployment
Future Plan
Demo
Biswajit Das
Data Team
@biswajit @branch.io
Muwon Lum
Infra Team
@muwon @branch.io
Agenda
The Business Problem
Technology Gap
Data Platform Features
Performance and Scale
Opportunity for new Apps
Monitoring
Provisioning & deployment
Future Plan
The Business Problem
● Cannot perform live complex queries
● Lack of instant access to aggregate data
● Gathering unique impressions is time-consuming
● No single pane of glass to view all data
● Ad hoc queries require pre-aggregation
Instant access to information at scale was a problem
Technology Gap
Key/Value Store (Aerospike)
● Pre-compute all permutations of possible user queries.
● Range scans on event data.
● Pre-computing all permutations of all ad hoc queries can lead to result sets that grow exponentially
with the number of columns in a data set and can require hours of pre-processing time.
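The exponential growth mentioned above can be illustrated with a short count. This is a minimal sketch, not the pipeline Branch ran: it assumes each pre-computed rollup corresponds to one subset of dimensions to group by, and the dimension names are hypothetical. It counts only the grouping sets; the number of stored rows would be larger still, since each grouping set holds one row per combination of dimension values.

```python
from itertools import combinations
from math import comb


def count_rollups(num_dimensions: int) -> int:
    """Count the dimension subsets (grouping sets) that would need pre-computing."""
    return sum(comb(num_dimensions, k) for k in range(num_dimensions + 1))


def enumerate_rollups(dimensions):
    """Yield every subset of the given dimensions, from the empty set to all of them."""
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)


# Hypothetical dimension names, for illustration only.
dims = ["country", "platform", "campaign"]
rollups = list(enumerate_rollups(dims))

print(len(rollups))        # grouping sets for 3 dimensions
print(count_rollups(30))   # grouping sets for a 30-dimension datasource
```

With 30 dimensions the count is 2^30, a little over one billion grouping sets, which is why pre-computing every combination in a key/value store stops being practical.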
Druid to the rescue…
High-Level Data Pipeline Flow
[Diagram labels: SECOR, Tranquility, Parquet, Batch System, Query path]
Performance And Scale
● 25-node production cluster (Druid only)
● Several hundred terabytes of raw data indexed
● Typical complex datasource with 30 dimensions and 2 metrics
● Real-time indexer handling ~30k events per second, peaking at 50k
● Hourly bucketed data to support different time zones
● Sustained 2B+ events per day
● Thousands of queries per second for online dashboard applications
● Serving 11 million queries every day
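To make the hourly bucketing and time zone point concrete, here is a hedged sketch of what a dashboard query against such a datasource might look like. It builds a Druid native timeseries query as a plain dictionary and then derives a variant that rolls the hourly segments up into local-time days using a period granularity. The datasource name, metric names, interval, and time zone are all assumptions made for illustration; the slides do not show the actual schema or queries.

```python
import json


def hourly_timeseries_query(datasource, interval, count_metric, unique_metric):
    """Build a native timeseries query that returns one row per hour."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        "aggregations": [
            # Sum of the ingested count metric gives total events per bucket.
            {"type": "longSum", "name": "events", "fieldName": count_metric},
            # Approximate distinct count over a hyperUnique metric column.
            {"type": "hyperUnique", "name": "uniques", "fieldName": unique_metric},
        ],
    }


def daily_in_timezone(query, tz):
    """Return a copy of the query bucketed into days of the given time zone."""
    localized = dict(query)
    localized["granularity"] = {"type": "period", "period": "P1D", "timeZone": tz}
    return localized


# All names and dates below are hypothetical.
query = hourly_timeseries_query(
    datasource="events",
    interval="2016-01-01T00:00:00Z/2016-01-08T00:00:00Z",
    count_metric="count",
    unique_metric="unique_users",
)
payload = json.dumps(daily_in_timezone(query, "America/Los_Angeles"), indent=2)
print(payload)
```

A payload like this would typically be sent as an HTTP POST to a broker's `/druid/v2/` endpoint. Because the stored segments are hourly, the same data can be regrouped into days for any time zone whose offset is a whole number of hours.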
Opportunity for new Apps
● Druid helped us support new analytics easily.
● Ad hoc reporting.
● Visualizing data.
● Exploratory analytics.
Provisioning & Deployment
SaltStack
Rolling Updates
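The general idea of a rolling update can be sketched as follows: restart a small batch of nodes, confirm they are healthy, and only then move on to the next batch. This is an illustrative sketch, not the SaltStack setup from the talk; the node names and the `restart` and `is_healthy` callbacks are hypothetical stand-ins for whatever the real tooling does.

```python
from typing import Callable, Iterator, List, Sequence


def batches(nodes: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split the node list into consecutive batches of at most `size` nodes."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(nodes), size):
        yield list(nodes[start:start + size])


def rolling_update(
    nodes: Sequence[str],
    batch_size: int,
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Restart nodes batch by batch, halting as soon as a node fails its check."""
    updated: List[str] = []
    for batch in batches(nodes, batch_size):
        for node in batch:
            restart(node)
        for node in batch:
            if not is_healthy(node):
                raise RuntimeError(f"{node} failed health check; halting rollout")
        updated.extend(batch)
    return updated


# Hypothetical node names; the callbacks here only record what would happen.
restarted: List[str] = []
done = rolling_update(
    nodes=[f"historical-{i}" for i in range(1, 6)],
    batch_size=2,
    restart=restarted.append,
    is_healthy=lambda node: True,
)
print(done)
```

Keeping the batch small means most of the cluster keeps serving queries while a few nodes restart. Salt's command line offers a similar batching mechanism through its `--batch-size` option.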
Future Plan
● More robust query service.
● Migrate the Hadoop indexer to Spark.
● Actively working to migrate the streaming pipeline to Flink.
● Evaluating a move of the whole Druid stack to Mesos/Docker.
Thank you
We are hiring: https://branch.io/careers
