Big Data at Tube
(Events → Insights → Actions)
27th April 2016
@John Trenkle (Chief Scientist)
@Murtaza Doctor (Director of Engineering, RTB)
©2016 TubeMogul Inc. All rights reserved.
Outline
• Where do we fit?
• What do we do?
• Life of a video Ad
• RTB Architecture
• Events Architecture
• ML Perspective: Transactional -> User-Oriented
• Data -> Models
• Models -> Action
Busy Ad-Tech Landscape
Where does TubeMogul fit?
Scale:
An enterprise software company for digital branding
● Processed over 12.6 Trillion Ad Auctions in 2015
● Serve over 55 billion auctions per day
● Served over 3 Billion Ad Impressions on linear TV via our PTV solution
● Process bids in < 50 ms
● Serve bid responses in < 80 ms (includes network round-trip)
● Serve 5 PB of monthly video traffic
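To put these volumes in perspective, the short sketch below converts the figures quoted above into average per-second rates. It assumes traffic is spread evenly over the day, which real auction traffic is not, so peak rates would be higher than these averages.

```python
# Back-of-envelope rates derived from the figures on the slide.
# Assumption: traffic is spread evenly over time (real traffic has peaks).
SECONDS_PER_DAY = 24 * 60 * 60

auctions_2015 = 12.6e12        # "over 12.6 trillion ad auctions in 2015"
auctions_per_day = 55e9        # "over 55 billion auctions per day"

avg_per_day_2015 = auctions_2015 / 365
avg_per_second_2015 = avg_per_day_2015 / SECONDS_PER_DAY
avg_per_second_now = auctions_per_day / SECONDS_PER_DAY

print(f"2015 average: {avg_per_day_2015 / 1e9:.1f} billion auctions/day")
print(f"2015 average: {avg_per_second_2015:,.0f} auctions/second")
print(f"At 55 billion/day: {avg_per_second_now:,.0f} auctions/second")
```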
Ex: Life of a video Ad:
Technical Overview
Bidding Layer
Ad Serving
- High Volumes
- Low Latency
- Small Packets
- Large Data Sets
- Low Latency
- Fast Processing
- Large Caches
Low Latency User Database for User Targeting and Frequency Capping
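Frequency capping, as named above, limits how often one user is shown the same ad. The following is a minimal sketch of the idea only, not TubeMogul's implementation: the class name, the (user, campaign) key, and the sliding-window policy are all assumptions, and an in-memory dictionary stands in for the low-latency user database.

```python
import time
from collections import defaultdict, deque


class FrequencyCapper:
    """Toy sliding-window frequency cap (hypothetical; in-memory stand-in
    for a low-latency user database)."""

    def __init__(self, max_impressions, window_seconds):
        self.max_impressions = max_impressions
        self.window_seconds = window_seconds
        # (user_id, campaign_id) -> timestamps of recent impressions
        self._seen = defaultdict(deque)

    def allow(self, user_id, campaign_id, now=None):
        """Return True if another impression is still under the cap."""
        now = time.time() if now is None else now
        stamps = self._seen[(user_id, campaign_id)]
        # Drop impressions that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        return len(stamps) < self.max_impressions

    def record(self, user_id, campaign_id, now=None):
        """Record a served impression (for example, on a win event)."""
        now = time.time() if now is None else now
        self._seen[(user_id, campaign_id)].append(now)


capper = FrequencyCapper(max_impressions=2, window_seconds=3600)
if capper.allow("user-1", "campaign-9", now=0):
    capper.record("user-1", "campaign-9", now=0)
```

In a bidder, the `allow` check would sit on the bid path and `record` on the impression path, which is why the lookup has to fit inside the latency budget quoted earlier.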
Events Architecture:
● Auctions (Bids + Non Bids)
● Win Events (Impressions)
● Columnar format (ORC)
● Data Pipeline?
● Bad data?
● Scaling challenges
● Multiple downstream consumers
Events Architecture
Events Architecture: Takeaways
● Simplify and Unify
● Focus on Data Validation at each step
● Automated recovery
● Leverage the messaging system for status or completion
● Metrics & Measurement for SLA
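As an illustration of validating data at each step and measuring the outcome, here is a minimal sketch of one pipeline stage. The field names, the allowed event types, and the quarantine approach are hypothetical and chosen only for the example; the deck does not describe the actual schema or checks.

```python
from collections import Counter

# Hypothetical event schema: field name -> expected Python type.
REQUIRED_FIELDS = {"event_type": str, "auction_id": str, "timestamp_ms": int}
VALID_EVENT_TYPES = {"bid", "no_bid", "win"}


def validate_event(event):
    """Return a list of problems; an empty list means the event is valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in event:
            problems.append("missing:" + name)
        elif not isinstance(event[name], expected):
            problems.append("bad_type:" + name)
    event_type = event.get("event_type")
    if isinstance(event_type, str) and event_type not in VALID_EVENT_TYPES:
        problems.append("unknown_event_type")
    return problems


def validate_batch(events):
    """Split a batch into valid and quarantined events and count outcomes."""
    good, quarantined, metrics = [], [], Counter()
    for event in events:
        problems = validate_event(event)
        if problems:
            quarantined.append((event, problems))
            metrics.update(problems)
        else:
            good.append(event)
    metrics["valid"] = len(good)
    metrics["invalid"] = len(quarantined)
    return good, quarantined, metrics


batch = [
    {"event_type": "win", "auction_id": "a1", "timestamp_ms": 1},
    {"event_type": "click", "auction_id": "a2", "timestamp_ms": 2},
    {"event_type": "bid", "timestamp_ms": "3"},
]
good, quarantined, metrics = validate_batch(batch)
print(dict(metrics))
```

Keeping rejected events with their reasons, rather than dropping them, is one way to support later recovery, and the per-reason counters can feed SLA dashboards.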
Machine-Learning as a Consumer
• Audience Modeling begets user-oriented data
• Pivot RTB / Analytics sources for model-building
• Many sources of Truth that need to be integrated
• Ad Interaction
• Characterize Users with robust signature (UU-Code) rather than just an item list
• Facilitate rapid prototyping and model-building
• Maintain enriched information for exploratory analysis and visualization
• Insights
• Actionable Intel
Ad Calls to User-Traces in Hive (on path to NoSQL)
[Diagram; labels as extracted:]
• Hive: RTB Ad Calls, RTB Digest, User Activity
• NoSQL: RTB Ad Calls, User Activity
• Elastic Search
Token Embedding Models and Spark
http://deepdist.com/
Ref: http://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
Cascading for Signatures
[Flow diagram; node labels as extracted:]
1. JOIN on tm_client
2. Filter average weight per verticals < 0.5
• Daily Users Activities Prefixed
• Daily UUCode Creation Process
• Daily UUCodes
• TM Client Daily Activity3
• Get Truth Users By LAL Segment
• Daily Truth Users for all LAL segment
• Centroid Creation Process
• LAL Landmarks
• Segment Creation Process
• User Membership Unfiltered
• UUCode Model
• TM Daily Converters
• Convs LAL segments from Mario
• User Membership
• Attach SourceID Process
• Daily UUCodes with Source ID
• TMClientID SourceID Lookup
• Aggregated UUCode Creation Process
• UU Code
• TM Client Digest3
• Create SourceID Lookup Process
• Wormhole Process
• Segment Filter Process
• ~650GB
• UDB Team Persistent Users Table
Large-Scale Predictive Model Building
[Flow diagram; node labels as extracted:]
• Get Truth Users, signature of truth users
• Data Warehouse
• Training Data Creation
• Training Data for segments
• Ground Truth
• For each segment, perform training
• Check performance, log in mysql for tracking purposes.
• Model/weights file for each segment
• Aggregate and Convert to UUCode
• UU Code Model
• 3 months aggregation
• Segment Information
• Dashboard UI
Partners that have Contributed to Our Ecosystem
• Qubole
• Long-time partners
• Great for Ad Hoc queries and scheduled ETL
• Dynamic Scaling
• Snowflake
• Data Warehouse – facilitates Fraud Analysis
• SpotInst
• Cost effective Spot Instances in EMR
• Robust provisioning
• Dynamic Scaling
• Driven
• Monitor, optimize and debug Hadoop flows
Since Hive has been our primary datastore for a while…
• Tips and tricks
• ORC
• MAPJOIN
• Sorted, Bucketed JOINs
• TRANSFORM
• HAVING
• Hadoop Streaming
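TRANSFORM and Hadoop Streaming both hand rows to an external script over standard input and read results back from standard output. The sketch below shows the general shape of such a script. The two-column input (user ID, URL) and the hostname extraction are hypothetical examples, and it assumes Hive's default tab-separated row format with `\N` marking NULL.

```python
import io
from urllib.parse import urlparse

NULL = "\\N"  # Hive's default text encoding of NULL in TRANSFORM input/output


def transform_rows(lines, out):
    """Read tab-separated (user_id, url) rows and emit (user_id, hostname)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed rows instead of failing the whole task
        user_id, url = fields
        host = None if url == NULL else urlparse(url).hostname
        out.write(user_id + "\t" + (host or NULL) + "\n")


# Small in-memory demonstration; a real script would instead call
# transform_rows(sys.stdin, sys.stdout).
demo_in = io.StringIO("u1\thttp://Example.com/a\nu2\t\\N\nbad\n")
demo_out = io.StringIO()
transform_rows(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because the same stdin/stdout contract applies to Hadoop Streaming mappers, a script written this way can usually be reused in both settings.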
Models → Action
• Optimization
• Surrogate measures of engagement: Clicks, Completions, Conversions
• Audience Building for Targeting
• Demographic
• Behavioral
• Fraud Detection
• Cross Device Synching
• Profiling / Data Mining / Actionable Intel
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Big Data at Tube: Events to Insights to Action

  • 1. Big Data at Tube (Events → Insights → Actions) 27th April 2016 @John Trenkle (Chief Scientist) @Murtaza Doctor (Director of Engineering, RTB)
  • 2. Outline • Where do we fit? • What do we do? • Life of a video Ad • RTB Architecture • Events Architecture • ML Perspective: Transactional -> User-Oriented • Data -> Models • Models -> Action
  • 4. Where does TubeMogul fit?
  • 5. Scale: An enterprise software company for digital branding ● Processed over 12.6 Trillion Ad Auctions in 2015 ● Serve over 55 billion auctions per day ● Served over 3 Billion Ad Impressions on linear TV via our PTV solution ● Process bids in < 50 ms ● Serve bid responses in < 80 ms (includes network round-trip) ● Serve 5 PB of monthly video traffic
  • 6. Ex: Life of a video Ad:
  • 7. Technical Overview Bidding Layer Ad Serving - High Volumes - Low Latency - Small Packets - Large Data Sets - Low Latency - Fast Processing - Large Caches Low Latency User Database for User Targeting and Frequency Capping
  • 8. Events Architecture: ● Auctions (Bids + Non Bids) ● Win Events (Impressions) ● Columnar format (ORC) ● Data Pipeline? ● Bad data? ● Scaling challenges ● Multiple downstream consumers
  • 9. Events Architecture
  • 10. Events Architecture: Takeaways ● Simplify and Unify ● Focus on Data Validation at each step ● Automated recovery ● Leverage the messaging system for status or completion ● Metrics & Measurement for SLA
  • 11. Machine-Learning as a Consumer • Audience Modeling begets user-oriented data • Pivot RTB / Analytics sources for model-building • Many sources of Truth that need to be integrated • Ad Interaction • Characterize Users with robust signature (UU-Code) rather than just an item list • Facilitate rapid prototyping and model-building • Maintain enriched information for exploratory analysis and visualization • Insights • Actionable Intel
  • 12. Ad Calls to User-Traces in Hive (on path to NoSQL) Hive RTB Ad Calls RTB Digest User Activity NoSQL RTB Ad Calls User Activity Elastic Search
  • 13. Token Embedding Models and Spark http://deepdist.com/ Ref: http://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
  • 14. Cascading for Signatures 1. JOIN on tm_client 2. Filter average weight per verticals < 0.5 Daily Users Activities Prefixed Daily UUCode Creation Process Daily UUCodes TM Client Daily Activity3 Get Truth Users By LAL Segment Daily Truth Users for all LAL segment Centroid Creation Process LAL Landmarks Segment Creation Process User Membership Unfiltered UUCode Model TM Daily Converters Convs LAL segments from Mario User Membership Attach SourceID Process Daily UUCodes with Source ID TMClientID SourceID Lookup Aggregated UUCode Creation Process UU Code TM Client Digest3 Create SourceID Lookup Process Wormhole Process Segment Filter Process ~650GB UDB Team Persistent Users Table
  • 15. Large-Scale Predictive Model Building Get Truth Users, signature Data Warehouse Of truth users Training Data Creation Training Data for segments Ground Truth For each segment, perform training Check performance, log in MySQL for tracking purposes. Model/ weights file for each segment Aggregate and Convert to UUCode UU Code Model 3 months aggregation Segment Information Dashboard UI
  • 16. Partners that have Contributed to Our Ecosystem • Qubole • Long-time partners • Great for Ad Hoc queries and scheduled ETL • Dynamic Scaling • Snowflake • Data Warehouse – facilitates Fraud Analysis • SpotInst • Cost effective Spot Instances in EMR • Robust provisioning • Dynamic Scaling • Driven • Monitor, optimize and debug Hadoop flows
  • 17. Since Hive has been our primary datastore for a while… • Tips and tricks • ORC • MAPJOIN • Sorted, Bucketed JOINs • TRANSFORM • HAVING • Hadoop Streaming
  • 18. Models → Action • Optimization • Surrogate measures of engagement: Clicks, Completions, Conversions • Audience Building for Targeting • Demographic • Behavioral • Fraud Detection • Cross Device Synching • Profiling / Data Mining / Actionable Intel
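
Slides 12 and 17 mention pivoting RTB ad calls into per-user traces and using Hive TRANSFORM or Hadoop Streaming. The deck does not show the scripts involved, so the following is only a minimal Python sketch of what such a streaming step could look like. The three-column, tab-separated layout (`user_id`, `timestamp`, `site`), the function names, and the sample rows are all assumptions made for illustration, not TubeMogul's actual schema or code.

```python
import sys
from itertools import groupby


def parse(line):
    """Split one tab-separated ad-call record into (user_id, timestamp, site).

    The column layout is hypothetical; the deck does not give the real schema.
    """
    user_id, timestamp, site = line.rstrip("\n").split("\t")
    return user_id, int(timestamp), site


def user_traces(lines):
    """Yield one (user_id, trace) pair per user.

    Assumes rows for the same user arrive together, as they would after a
    shuffle keyed on user_id (Hadoop Streaming) or after DISTRIBUTE BY /
    SORT BY user_id ahead of a Hive TRANSFORM.
    """
    records = (parse(line) for line in lines if line.strip())
    for user_id, group in groupby(records, key=lambda r: r[0]):
        # Order each user's events by time so the trace reads chronologically.
        events = sorted((ts, site) for _, ts, site in group)
        yield user_id, [site for _, site in events]


def run(stream=sys.stdin, out=sys.stdout):
    """Streaming entry point: writes one tab-separated row per user."""
    for user_id, trace in user_traces(stream):
        out.write(f"{user_id}\t{len(trace)}\t{','.join(trace)}\n")


# Hypothetical sample rows, grouped by user_id but not sorted by time.
sample = [
    "u1\t1461715200\tnews.example\n",
    "u1\t1461715100\tvideo.example\n",
    "u2\t1461715300\tsports.example\n",
]
traces = dict(user_traces(sample))
print(traces)
```

In a real job the script would call `run()` so that rows are read from standard input and written to standard output, which is the contract both Hive TRANSFORM and Hadoop Streaming expect from an external script. The in-memory `sample` list is there only to make the sketch easy to try locally.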