Presentation for Big Data for Business Users
January 27th, 2016
Qubole Service Growth: Rapid Uptake
Qubole History
First Paying Customer
GCP GA
Series B
100 PB/mo
294 PB/mo
Azure GA
Spark GA
Series A
Presto GA
AWS GA
Company Founding
Qubole founders built the Facebook data platform.
The Facebook model changed the role for data in an enterprise.
• Turned data assets into a “utility”
– Collaborative: over 30% of employees accessed the data
– Accessible: developers, analysts, business analysts
– Scalable: Exabytes of data moving very fast
Work at Facebook Inspired the Founding of Qubole
Operations Analyst
Marketing Ops Analyst
Data Architect
Business Users
Product Support
Customer Support
Developer
Sales Ops
Product Managers
Data Infrastructure
Impediments for an Aspiring Data Driven Enterprise
Where Big Data Tech falls short:
• 6-18 month implementation time
• Only 27% of Big Data initiatives were classified as “Successful” in 2014
Rigid and inflexible infrastructure
Non-adaptive software services
Highly specialized systems
Difficult to build and operate
• Only 13% of organizations achieve full-scale production
• 57% of organizations cite skills gap as a major inhibitor
Impediments for an Aspiring Data Driven Enterprise
What You Need to Work in the Cloud:
Central Governance & Security
Internet Scale
Instant Deployment
Isolated Multitenancy
Elastic
Object Store Underpinnings
Lessons Learned
5 Best Practices for Successful Big Data Projects:
1. Start small
2. Begin with clear objectives
3. Get buy-in from all business stakeholders
4. Make sure data is available to those who need it
5. Build a skilled and knowledgeable team
https://www.qubole.com/blog/big-data/big-data-project-management/
Case Study: Pinterest
Data at Pinterest
• 60 Billion Pins
• 1 Billion boards
• 100M MAU
• 60 PB of data on S3
• 3 PB processed every day
• 2000 node Hadoop cluster
• 250 engineers
Case Study: Pinterest
Pinterest Data Architecture
App
events
Kafka
Secor
Skyline
Pinball
Redshift
Pinalytics
Features
Qubole (Hadoop)
Singer
Case Study: Pinterest
Hadoop Platform Requirements
• Ephemeral clusters
• Access control layer
• Shared data store
• Easy deployment
• Isolated multi-tenancy
• Elasticity
• Support multiple clusters
https://engineering.pinterest.com/blog/powering-big-data-pinterest
Case Study: Pinterest
Decoupling Storage & Compute
Hadoop Cluster 1: Transient HDFS
Hadoop Cluster 2: Transient HDFS
S3 Persistent Store
Case Study: Pinterest
Centralized Hive Metastore
Hive Metastore
Pig
Cascading
Hive
HDFS/S3
Data
Metadata
Case Study: Pinterest
Executor Abstraction Layer
Hive Metastore
HDFS/S3
Qubole Managed Hadoop
EMR
Executor
Pinball
Dev Server
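The executor layer on this slide can be pictured as a single interface that callers such as Pinball or a dev server code against, with the actual cluster backend chosen behind it. The following is only a minimal sketch under stated assumptions: the class names, the `submit` helper, and the `JobResult` type are hypothetical illustrations, not Pinterest's code or any vendor API, and the backends return canned results instead of contacting a real service.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class JobResult:
    """Outcome of a submitted command (hypothetical type for this sketch)."""
    backend: str
    command: str
    succeeded: bool


class Executor(ABC):
    """Common interface that workflow tools code against."""

    @abstractmethod
    def run(self, command: str) -> JobResult:
        ...


class QuboleExecutor(Executor):
    """Stand-in backend; a real one would call the managed Hadoop service."""

    def run(self, command: str) -> JobResult:
        return JobResult(backend="qubole", command=command, succeeded=True)


class EmrExecutor(Executor):
    """Stand-in backend; a real one would submit the command to EMR."""

    def run(self, command: str) -> JobResult:
        return JobResult(backend="emr", command=command, succeeded=True)


# Registry of available backends; callers pick one by name.
EXECUTORS = {"qubole": QuboleExecutor, "emr": EmrExecutor}


def submit(command: str, backend: str = "qubole") -> JobResult:
    """Run a command on the named backend without the caller knowing its details."""
    executor = EXECUTORS[backend]()
    return executor.run(command)


if __name__ == "__main__":
    print(submit("SELECT COUNT(*) FROM pins"))
    print(submit("SELECT COUNT(*) FROM pins", backend="emr"))
```

Because callers only see `submit`, switching a workflow from one backend to another is a one-argument change, which is the point of the abstraction shown in the diagram.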
Case Study: Pinterest
Optimizing Your Utilization
On Demand Instance
$$
$
Market of Instances
Spot Instance
$
Or
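The on-demand versus spot trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the prices, and the node count are made-up assumptions, not figures from the presentation or from any cloud provider's price list.

```python
def hourly_cost(nodes: int, on_demand_price: float, spot_price: float,
                spot_fraction: float) -> float:
    """Hourly cost of a cluster that mixes on-demand and spot nodes.

    spot_fraction is the share of nodes bought on the spot market (0.0 to 1.0).
    All prices are hypothetical per-node hourly rates.
    """
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    spot_nodes = round(nodes * spot_fraction)
    on_demand_nodes = nodes - spot_nodes
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


if __name__ == "__main__":
    # Hypothetical numbers: 100 nodes, $1.00/h on demand, $0.25/h spot.
    all_on_demand = hourly_cost(100, 1.00, 0.25, spot_fraction=0.0)
    half_spot = hourly_cost(100, 1.00, 0.25, spot_fraction=0.5)
    print(f"all on-demand: ${all_on_demand:.2f}/h, half spot: ${half_spot:.2f}/h")
```

The sketch ignores the fact that spot instances can be reclaimed at any time, which is why a mixed cluster usually keeps some on-demand nodes for stability.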
Case Study: Pinterest
Why Qubole?
• API for simplified executor abstraction
• Advanced support for spot instances
• Baked AMI customization
• Hadoop & Spark as managed services
• Tight integration with Hive
• Graceful cluster scaling
QDS Virtualizes Out the Complexity of Managing Infrastructure
Designed to Drive Organizational Leverage
• QDS can run admin/analyst ratios of 1:unlimited
• Compare this to alternatives ~ 1:1.5
• Current average admin/analyst ratio is 1:21*
*Ratio is only limited by company size
“We have over 100 regular MapReduce users running over 2,000
jobs each day through Qubole’s web interface, ad-hoc jobs and
scheduled workflows -- all managed by a single administrator.”
Krishna Gade
Engineering Manager
Note: Pinterest is currently running at ~1:300 as of Dec 2015
Introducing Qubole Data Services (QDS)
• Monitoring • Full support • Persistent logging • Persistent debugging
Service Management
Workbench designed for Data Scientists and Data Analysts
Ad-hoc workloads using a simple SQL query composer or SmartQuery™ builder tool
Language SDKs designed for Application Developers
Build apps that leverage QDS as a data platform without human intervention
BI Tool connector designed for Lines of Business Analysts
Drive QDS workload right through your front-end analytic tool of choice
Three Ways To Process Data Through QDS
The QDS Core Platform
Growing Base of Connectors
Connect to Persistent Storage and Databases
Persistent Storage Database Connectors
Google Cloud Storage
http://www.qubole.com/features/
Thank You!

Big Data at Pinterest - Presented by Qubole
