SlideShare a Scribd company logo
1 of 22
Presentation for Big Data for Business Users
January 27th, 2016
Qubole Service Growth: Rapid Uptake
Qubole History
First Paying
Customer
GCP GA
Series B
100 PB/mo
294 PB/mo
Azure GA
Spark GA
Series A
Presto GAAWS GA
Company Founding
Qubole founders built the Facebook data platform.
The Facebook model changed the role for data
in an enterprise.
• Turned data assets into a “utility”
– Collaborative: over 30% of employees
accessed
– Accessible: developers, analysts, business
analysts
– Scalable: Exabyte's of data moving very fast
Work at Facebook Inspired the Founding of Qubole
Operations
Analyst
Marketing Ops
Analyst
Data
Architect
Business
Users
Product
Support
Customer
Support
Developer
Sales Ops
Product
Managers
Data
Infrastructure
Impediments for an Aspiring Data Driven Enterprise
Where Big
Data Tech
falls short:
• 6-18 month implementation time
• Only 27% of Big Data initiatives are
classified as “Successful” in 2014
Rigid and
inflexible
infrastructure
Non adaptive
software
services
Highly
specialized
systems
Difficult to
build and
operate
• Only 13% of organizations achieve full-scale production
• 57% of organizations cite skills gap as a major inhibitor
Impediments for an Aspiring Data Driven Enterprise
What You Need to Work in the Cloud:
Central
Governance &
Security
Internet
Scale
Instant
Deployment
Isolated
Multitenancy
Elastic
Object Store
Underpinnings
1. Start small
2. Begin with clear objectives
3. Get buy-in from all business stakeholders
4. Make sure data is available to those who need it
5. Build a skilled and knowledgeable team
5 Best Practices for successful Big Data projects:
Lessons Learned
https://www.qubole.com/blog/big-data/big-data-project-management/
Case Study: Pinterest
Data at Pinterest
• 60 Billion Pins
• 1 Billion boards
• 100M MAU
• 60 PB of data on S3
• 3 PB processed every day
• 2000 node Hadoop cluster
• 250 engineers
Case Study: Pinterest
Pinterest Data Architecture
App
Case Study: Pinterest
Pinterest Data Architecture
App
events
Kafka
Secor
Singer
Case Study: Pinterest
Pinterest Data Architecture
App
events
Kafka
Secor
Singer
Case Study: Pinterest
Pinterest Data Architecture
App
events
Kafka
Secor
Skyline
Pinball
Redshift
Pinalytics
Features
Qubole
(Hadoop)
Singer
Case Study: Pinterest
Hadoop Platform Requirements
• Ephemeral clusters
• Access control layer
• Shared data store
• Easy deployment
• Isolated multi-tenancy
• Elasticity
• Support multiple
clusters
https://engineering.pinterest.com/blog/powering-big-data-pinterest
Case Study: Pinterest
Decoupling Storage & Compute
Hadoop Cluster 1
Transient
HDFS
Hadoop Cluster 2
Transient
HDFS
S3 Persistent
Store
Case Study: Pinterest
Centralized Hive Metastore
Hive
Metastore
Pig
Cascading
Hive
HDFS/S3
DataMetadata
Case Study: Pinterest
Executor Abstraction Layer
Hive
Metastore
HDFS/S3
Qubole
Managed
Hadoop
EMR
Executor
Pinball
Dev Server
Case Study: Pinterest
Optimizing Your Utilization
On Demand Instance
$$
$
Market of Instances
Spot Instance
$
Or
Case Study: Pinterest
Why Qubole?
• API for simplified executor
abstraction
• Advanced support for spot
instances
• Baked AMI customization
• Hadoop & Spark as
managed services
• Tight integration with
Hive
• Graceful cluster scaling
QDS Virtualizes Out the Complexity of Managing Infrastructure
18
Designed to Drive Organizational Leverage
• QDS can run admin/analyst ratios of 1:unlimited
• Compare this to alternatives ~ 1:1.5
• Current average admin/analyst ratio is 1:21*
*Ratio is only limited by company size
“We have over 100 regular MapReduce users running over 2,000
jobs each day through Qubole’s web interface, ad-hoc jobs and
scheduled workflows -- all managed by a single administrator.”
Krishna Gade
Engineering Manager
Note: Pinterest is currently running at ~1:300 as of Dec 2015
Introducing Qubole Data Services (QDS)
• Monitoring • Full support • Persistent logging • Persistent debugging
Service Management
Workbench designed for Data Scientist and Data Analysts
Ad-hoc workloads using a simple SQL query composer or SmartQuery™ builder tool
Language SDKs designed for Application Developers
Build apps that leverage QDS as a data platform without human intervention
BI Tool connector designed for Lines of Business Analysts
Drive QDS workload right through your front-end analytic tool of choice
Three Ways To Process Data Through QDS
The QDS Core Platform
Growing Base of Connectors
Connect to Persistent Storage and Databases
Persistent Storage Database Connectors
Google Cloud
Storage
http://www.qubole.com/features/
Thank You!

More Related Content

What's hot

Data Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation SlidesData Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation SlidesSlideTeam
 
AWS Customer Presentation - WeoGeo
AWS Customer Presentation - WeoGeo AWS Customer Presentation - WeoGeo
AWS Customer Presentation - WeoGeo Amazon Web Services
 
Build, Train & Deploy Machine Learning Models at Scale
Build, Train & Deploy Machine Learning Models at ScaleBuild, Train & Deploy Machine Learning Models at Scale
Build, Train & Deploy Machine Learning Models at ScaleAmazon Web Services
 
Introduction to Azure SQL DB
Introduction to Azure SQL DBIntroduction to Azure SQL DB
Introduction to Azure SQL DBChristopher Foot
 
Introducing Azure SQL Database
Introducing Azure SQL DatabaseIntroducing Azure SQL Database
Introducing Azure SQL DatabaseJames Serra
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopCCG
 
Telecom Churn Prediction Presentation
Telecom Churn Prediction PresentationTelecom Churn Prediction Presentation
Telecom Churn Prediction PresentationPinintiHarishReddy
 
Azure Database Services for MySQL PostgreSQL and MariaDB
Azure Database Services for MySQL PostgreSQL and MariaDBAzure Database Services for MySQL PostgreSQL and MariaDB
Azure Database Services for MySQL PostgreSQL and MariaDBNicholas Vossburg
 
Cloud Computing - An Introduction
Cloud Computing - An IntroductionCloud Computing - An Introduction
Cloud Computing - An IntroductionRavindra Dastikop
 
Wix's ML Platform
Wix's ML PlatformWix's ML Platform
Wix's ML PlatformRan Romano
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solidLars Albertsson
 

What's hot (20)

Data Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation SlidesData Analytics PowerPoint Presentation Slides
Data Analytics PowerPoint Presentation Slides
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
AWS Customer Presentation - WeoGeo
AWS Customer Presentation - WeoGeo AWS Customer Presentation - WeoGeo
AWS Customer Presentation - WeoGeo
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
Demystifying identity on AWS
Demystifying identity on AWSDemystifying identity on AWS
Demystifying identity on AWS
 
Build, Train & Deploy Machine Learning Models at Scale
Build, Train & Deploy Machine Learning Models at ScaleBuild, Train & Deploy Machine Learning Models at Scale
Build, Train & Deploy Machine Learning Models at Scale
 
Introduction to Azure SQL DB
Introduction to Azure SQL DBIntroduction to Azure SQL DB
Introduction to Azure SQL DB
 
Introducing Azure SQL Database
Introducing Azure SQL DatabaseIntroducing Azure SQL Database
Introducing Azure SQL Database
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Telecom Churn Prediction Presentation
Telecom Churn Prediction PresentationTelecom Churn Prediction Presentation
Telecom Churn Prediction Presentation
 
Aws
AwsAws
Aws
 
Chapter 1 big data
Chapter 1 big dataChapter 1 big data
Chapter 1 big data
 
Amazon RDS Deep Dive
Amazon RDS Deep DiveAmazon RDS Deep Dive
Amazon RDS Deep Dive
 
Azure Database Services for MySQL PostgreSQL and MariaDB
Azure Database Services for MySQL PostgreSQL and MariaDBAzure Database Services for MySQL PostgreSQL and MariaDB
Azure Database Services for MySQL PostgreSQL and MariaDB
 
Cloud Computing - An Introduction
Cloud Computing - An IntroductionCloud Computing - An Introduction
Cloud Computing - An Introduction
 
Wix's ML Platform
Wix's ML PlatformWix's ML Platform
Wix's ML Platform
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
 
Big data
Big dataBig data
Big data
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 

Similar to Big Data at Pinterest - Presented by Qubole

Qubole presentation for the Cleveland Big Data and Hadoop Meetup
Qubole presentation for the Cleveland Big Data and Hadoop Meetup   Qubole presentation for the Cleveland Big Data and Hadoop Meetup
Qubole presentation for the Cleveland Big Data and Hadoop Meetup Qubole
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesQubole
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15MLconf
 
Atlanta MLConf
Atlanta MLConfAtlanta MLConf
Atlanta MLConfQubole
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Cloudera, Inc.
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresKangaroot
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketDremio Corporation
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experienceVitaliy Bashun
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...Tyler Wishnoff
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiFelicia Haggarty
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Making the Case for Hadoop in a Large Enterprise-British Airways
Making the Case for Hadoop in a Large Enterprise-British AirwaysMaking the Case for Hadoop in a Large Enterprise-British Airways
Making the Case for Hadoop in a Large Enterprise-British AirwaysDataWorks Summit
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 

Similar to Big Data at Pinterest - Presented by Qubole (20)

Qubole presentation for the Cleveland Big Data and Hadoop Meetup
Qubole presentation for the Cleveland Big Data and Hadoop Meetup   Qubole presentation for the Cleveland Big Data and Hadoop Meetup
Qubole presentation for the Cleveland Big Data and Hadoop Meetup
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slides
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
 
Atlanta MLConf
Atlanta MLConfAtlanta MLConf
Atlanta MLConf
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
Retail & CPG
Retail & CPGRetail & CPG
Retail & CPG
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experience
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Making the Case for Hadoop in a Large Enterprise-British Airways
Making the Case for Hadoop in a Large Enterprise-British AirwaysMaking the Case for Hadoop in a Large Enterprise-British Airways
Making the Case for Hadoop in a Large Enterprise-British Airways
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 

More from Qubole

Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Qubole
 
7 Big Data Challenges and How to Overcome Them
7 Big Data Challenges and How to Overcome Them7 Big Data Challenges and How to Overcome Them
7 Big Data Challenges and How to Overcome ThemQubole
 
State of Big Data Adoption
State of Big Data AdoptionState of Big Data Adoption
State of Big Data AdoptionQubole
 
5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance 5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance Qubole
 
Spark on Yarn
Spark on YarnSpark on Yarn
Spark on YarnQubole
 
Running Spark on Cloud
Running Spark on CloudRunning Spark on Cloud
Running Spark on CloudQubole
 
Qubole State of the Big Data Industry
Qubole State of the Big Data IndustryQubole State of the Big Data Industry
Qubole State of the Big Data IndustryQubole
 
Big Data Platform at Pinterest
Big Data Platform at PinterestBig Data Platform at Pinterest
Big Data Platform at PinterestQubole
 
BIPD Tech Tuesday Presentation - Qubole
BIPD Tech Tuesday Presentation - QuboleBIPD Tech Tuesday Presentation - Qubole
BIPD Tech Tuesday Presentation - QuboleQubole
 
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache HiveHarnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache HiveQubole
 
Optimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudOptimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudQubole
 
Getting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataGetting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataQubole
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data TipsQubole
 
Big dataproposal
Big dataproposalBig dataproposal
Big dataproposalQubole
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloudQubole
 
Basic Sentiment Analysis using Hive
Basic Sentiment Analysis using HiveBasic Sentiment Analysis using Hive
Basic Sentiment Analysis using HiveQubole
 
Effective Hive Queries
Effective Hive QueriesEffective Hive Queries
Effective Hive QueriesQubole
 

More from Qubole (17)

Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
 
7 Big Data Challenges and How to Overcome Them
7 Big Data Challenges and How to Overcome Them7 Big Data Challenges and How to Overcome Them
7 Big Data Challenges and How to Overcome Them
 
State of Big Data Adoption
State of Big Data AdoptionState of Big Data Adoption
State of Big Data Adoption
 
5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance 5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance
 
Spark on Yarn
Spark on YarnSpark on Yarn
Spark on Yarn
 
Running Spark on Cloud
Running Spark on CloudRunning Spark on Cloud
Running Spark on Cloud
 
Qubole State of the Big Data Industry
Qubole State of the Big Data IndustryQubole State of the Big Data Industry
Qubole State of the Big Data Industry
 
Big Data Platform at Pinterest
Big Data Platform at PinterestBig Data Platform at Pinterest
Big Data Platform at Pinterest
 
BIPD Tech Tuesday Presentation - Qubole
BIPD Tech Tuesday Presentation - QuboleBIPD Tech Tuesday Presentation - Qubole
BIPD Tech Tuesday Presentation - Qubole
 
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache HiveHarnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
 
Optimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudOptimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public Cloud
 
Getting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataGetting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big Data
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data Tips
 
Big dataproposal
Big dataproposalBig dataproposal
Big dataproposal
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloud
 
Basic Sentiment Analysis using Hive
Basic Sentiment Analysis using HiveBasic Sentiment Analysis using Hive
Basic Sentiment Analysis using Hive
 
Effective Hive Queries
Effective Hive QueriesEffective Hive Queries
Effective Hive Queries
 

Recently uploaded

Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksSensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksBoston Institute of Analytics
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfgreat91
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理pyhepag
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.pptRachmaGhifari
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunksgmuir1066
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证ppy8zfkfm
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理pyhepag
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancingmohamed Elzalabany
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsBrainSell Technologies
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证ju0dztxtn
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxStephen266013
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证zifhagzkk
 

Recently uploaded (20)

Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksSensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
123.docx. .
123.docx.                                 .123.docx.                                 .
123.docx. .
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 

Big Data at Pinterest - Presented by Qubole

  • 1. Presentation for Big Data for Business Users January 27th, 2016
  • 2. Qubole Service Growth: Rapid Uptake Qubole History First Paying Customer GCP GA Series B 100 PB/mo 294 PB/mo Azure GA Spark GA Series A Presto GAAWS GA
  • 3. Company Founding Qubole founders built the Facebook data platform. The Facebook model changed the role for data in an enterprise. • Turned data assets into a “utility” – Collaborative: over 30% of employees accessed – Accessible: developers, analysts, business analysts – Scalable: Exabyte's of data moving very fast Work at Facebook Inspired the Founding of Qubole Operations Analyst Marketing Ops Analyst Data Architect Business Users Product Support Customer Support Developer Sales Ops Product Managers Data Infrastructure
  • 4. Impediments for an Aspiring Data Driven Enterprise Where Big Data Tech falls short: • 6-18 month implementation time • Only 27% of Big Data initiatives are classified as “Successful” in 2014 Rigid and inflexible infrastructure Non adaptive software services Highly specialized systems Difficult to build and operate • Only 13% of organizations achieve full-scale production • 57% of organizations cite skills gap as a major inhibitor
  • 5. Impediments for an Aspiring Data Driven Enterprise What You Need to Work in the Cloud: Central Governance & Security Internet Scale Instant Deployment Isolated Multitenancy Elastic Object Store Underpinnings
  • 6. 1. Start small 2. Begin with clear objectives 3. Get buy-in from all business stakeholders 4. Make sure data is available to those who need it 5. Build a skilled and knowledgeable team 5 Best Practices for successful Big Data projects: Lessons Learned https://www.qubole.com/blog/big-data/big-data-project-management/
  • 7. Case Study: Pinterest Data at Pinterest • 60 Billion Pins • 1 Billion boards • 100M MAU • 60 PB of data on S3 • 3 PB processed every day • 2000 node Hadoop cluster • 250 engineers
  • 8. Case Study: Pinterest Pinterest Data Architecture App
  • 9. Case Study: Pinterest Pinterest Data Architecture App events Kafka Secor Singer
  • 10. Case Study: Pinterest Pinterest Data Architecture App events Kafka Secor Singer
  • 11. Case Study: Pinterest Pinterest Data Architecture App events Kafka Secor Skyline Pinball Redshift Pinalytics Features Qubole (Hadoop) Singer
  • 12. Case Study: Pinterest Hadoop Platform Requirements • Ephemeral clusters • Access control layer • Shared data store • Easy deployment • Isolated multi-tenancy • Elasticity • Support multiple clusters https://engineering.pinterest.com/blog/powering-big-data-pinterest
  • 13. Case Study: Pinterest Decoupling Storage & Compute Hadoop Cluster 1 Transient HDFS Hadoop Cluster 2 Transient HDFS S3 Persistent Store
  • 14. Case Study: Pinterest Centralized Hive Metastore Hive Metastore Pig Cascading Hive HDFS/S3 DataMetadata
  • 15. Case Study: Pinterest Executor Abstraction Layer Hive Metastore HDFS/S3 Qubole Managed Hadoop EMR Executor Pinball Dev Server
  • 16. Case Study: Pinterest Optimizing Your Utilization On Demand Instance $$ $ Market of Instances Spot Instance $ Or
  • 17. Case Study: Pinterest Why Qubole? • API for simplified executor abstraction • Advanced support for spot instances • Baked AMI customization • Hadoop & Spark as managed services • Tight integration with Hive • Graceful cluster scaling
  • 18. QDS Virtualizes Out the Complexity of Managing Infrastructure 18 Designed to Drive Organizational Leverage • QDS can run admin/analyst ratios of 1:unlimited • Compare this to alternatives ~ 1:1.5 • Current average admin/analyst ratio is 1:21* *Ratio is only limited by company size “We have over 100 regular MapReduce users running over 2,000 jobs each day through Qubole’s web interface, ad-hoc jobs and scheduled workflows -- all managed by a single administrator.” Krishna Gade Engineering Manager Note: Pinterest is currently running at ~1:300 as of Dec 2015
  • 19. Introducing Qubole Data Services (QDS) • Monitoring • Full support • Persistent logging • Persistent debugging Service Management Workbench designed for Data Scientist and Data Analysts Ad-hoc workloads using a simple SQL query composer or SmartQuery™ builder tool Language SDKs designed for Application Developers Build apps that leverage QDS as a data platform without human intervention BI Tool connector designed for Lines of Business Analysts Drive QDS workload right through your front-end analytic tool of choice Three Ways To Process Data Through QDS
  • 20. The QDS Core Platform
  • 21. Growing Base of Connectors Connect to Persistent Storage and Databases Persistent Storage Database Connectors Google Cloud Storage http://www.qubole.com/features/