Data Analytics and Processing at Snap - Druid Meetup LA - September 2018

2,821 views

Charles Allen covers data processing, analytics, and insights systems at Snap. The talk calls out where Druid is strongest and how some of the processing systems in use differ.

These are the slides from the second talk at:
https://www.meetup.com/druidio-la/events/254080924/

Published in: Data & Analytics

  1. Snapchat 2018. Analytics at Snap: Big Data processing, slicing, and dicing. Charles Allen, charles.allen@snap.com, https://www.linkedin.com/in/charles-allen-255bab2a/
  2. Overview (09.20.18): Who we are; Snap growth; Wrangling Data / Data tool chest; Druid’s powerhouse
  3. Who we are
  4. Snap Inc. is a camera company
  5. Express yourself!
  6. Live in the moment
  7. Snap growth
  8. 57 Million DAU, Q2 2014. 188 Million DAU, Q2 2018. User base up. Advertiser value up. Source: 10-K; 10-Q; earnings call transcripts
  9. Trillions of interactions per week.
  10. Wrangling data
  11. Getting insights into data: natural pipeline development. Need: Lack of data causes pain. Source: Find data signal, and data processing SME. Develop: Work with development team for pipeline. Deploy: To production! Maintain: Fire and forget, or keep it live?
  12. Common data consumption formats. Scripting: High level of expertise. Extremely dynamic. Usually either one-off for a specific human, or scripted for machine consumption. Dashboards, Reports: Small qty of KPIs. Big tables or worksheets. “Executive” summarization. Multiple KPIs. Curated by expert. Some flexibility. Often operational in nature or usage.
  13. Data tool chest
  14. Key architecture components for data flow control. Stream buffer: Kafka. Stream buffer: Pubsub. Batch processing orchestration: Airflow. Bundle storage: Storage.
  15. Key architecture components for business logic. Stream and batch processing: Dataflow. Pipeline business logic: Beam. Popular language: Python. Popular language: Java. Stream and batch processing: Spark.
  16. Key architecture components for data consumption. Bulk data warehousing: BigQuery. Exploratory data storage: Druid. Druid centric dashboarding: Superset. General dashboarding: Looker.
  17. Core event log workflows. GDPR. SOX. ● Bundle lands in GCS ● Airflow churns data between BigQuery and GCS ● Over 20k DAG runs a week ● Lots of access control
  18. Internal use cases for Druid vs BigQuery. Druid: Multi-cloud compatible. Higher friction data load. Lower friction data maintenance. Gets more affordable with more usage. You will track who has the most data. Very fast. Slice and dice. BigQuery: Fully managed and hosted, GCP-only. Low friction data load. High friction data maintenance. Price punishment for using too much. You will track who is causing cost spikes. Often slow, but faster than Hadoop. Joins.
  19. Druid’s powerhouse
  20. Key Druid stats. Large compute capacity: >10k cores. Flowing into Druid: >100B events per day. Answered: >100k queries per day.
  21. Druid ingestion and consumption. Reports / Dashboards. SME Dashboards. Drill Down.
  22. Data Storage & Querying Platform. Platform GKE Cluster. ZooKeeper: Coordination & configuration. Druid: Indexed datastore (Java, Druid). Druid Broker, Druid Historicals*, Druid Coordinator (Java, CoreOS, Druid, GCE). Mesos: Cluster Management (GCE). Marathon: Orchestration (GCE). GCS: Deep Storage. CloudSQL: Druid Metadata. MongoDB: Query Time Lookup Cache. ● GCP Deployment Manager ● Helm
  23. Druid retention tunings. Recent data FAST: NVMe-SSD, 1 Week, 2 Hot. Recent data HA: 1 Week, 1 Cold. Keep older data available: Older Data, HA.
  24. We Are Hiring! charles.allen@snap.com https://www.snap.com/jobs/
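
The retention tunings on slide 23 could plausibly be expressed as Druid coordinator load rules. The sketch below is not Snap's actual configuration. It assumes that "2 Hot" and "1 Cold" mean replica counts on two historical tiers, and that those tiers are named `hot` and `cold`; both the reading and the tier names are assumptions. The helper `build_retention_rules` is hypothetical. Only the rule types (`loadByPeriod`, `loadForever`) and the `tieredReplicants` field come from Druid's documented rule format.

```python
import json


def build_retention_rules(recent_period="P1W", hot_replicas=2, cold_replicas=1):
    """Return a list of Druid-style load rules for a hot/cold tier layout.

    Assumption: tier names "hot" and "cold" are illustrative, and the
    replica counts are one reading of the "2 Hot" / "1 Cold" slide labels.
    """
    return [
        {
            # Recent data: served from the fast tier, with an extra
            # copy on the cold tier for availability.
            "type": "loadByPeriod",
            "period": recent_period,  # ISO 8601 period; P1W = one week
            "tieredReplicants": {"hot": hot_replicas, "cold": cold_replicas},
        },
        {
            # Older data: kept queryable on the cold tier only.
            "type": "loadForever",
            "tieredReplicants": {"cold": cold_replicas},
        },
    ]


if __name__ == "__main__":
    # Print the rules as JSON, the format the Druid coordinator accepts.
    print(json.dumps(build_retention_rules(), indent=2))
```

Druid evaluates rules in order and applies the first match, so the one-week rule has to come before the catch-all `loadForever` rule.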