Enabling Cross-Screen Advertising with Machine Learning and Spark
Deb Ray
Chief Data Officer
Big Data Day LA 2016
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising with Machine Learning and Spark - Debajyoti (Deb) Ray, CDO - VideoAmp

With content now viewed seamlessly across multiple screens, this shift in consumer behavior has collided with the way advertising is sold (separately, in TV and online silos), creating an opportunity to make advertising more effective using data and machine learning. This talk explores technological developments at VideoAmp that bring together data from disparate media and create cross-screen audience models: ML methods for cross-screen bid optimization, and graph-based audience models for 150 million users across over a billion unique device IDs, as well as behavioral insights gleaned from observing such a large variety of data.

Published in: Technology

  1. Enabling Cross-Screen Advertising with Machine Learning and Spark
     Deb Ray, Chief Data Officer
     Big Data Day LA 2016
  2. How Media is Consumed
     Consumers don’t differentiate between screens.
     Same Game of Thrones on Tablet, TV, Desktop, Xbox, Roku, Apple TV.
  3. How Media is Sold
     But Advertising is sold in Silos.
  4. Creates a Gap
     2 Sides of the Advertising Market: Selling and Buying of Ad Inventory.
     Diagram labels: Publishers, Exchanges, TV Providers, DSP, Advertisers, Brands, Agencies, Trading Desks, Websites, Mobile apps, TV Programs, OTT, DMP.
  5. Bridging the Gap
     VideoAmp’s goal is to enable Advertisers and Content Creators to transact seamlessly across all media types:
     • Frequency capping for target consumers.
     • TV media extension to desktop / mobile campaigns.
     • Competitive conquesting.
  6. Consumer Graph
     Diagram labels (in order): idfa, In-App, Phone, Uid 1, Safari, Phone, Uid 2, Firefox, Home, Uid 3, Chrome, Home, Uid 4, Firefox, Work, Location, Login.
     How Big is the Graph?
     • 1.5B+ unique cookie IDs and device IDs.
     • 150M+ nodes.
     • Behavioral data from each ID (several TBs / day).
  7. Video Ads: from Request to Delivery
     Figure 1 labels: 1. User Visits; 2. Calls Ad Exchange; 3. Bid Request; 4. User ID, IP; 5. User ID; 6. User Data; 7. Bid Price; 8. Bid CPM, Ad Tag; 9. Auction winner’s Ad Tag, 2nd price CPM; 10. Calls Winner’s Ad Tag; 11. Serves Ad; 12. Displays Ad.
     Components: Ad Server, Ad Exchange, Bid Listener, Decision Engine, User Data Storage.
     Timing: 20 ms to calculate. Whole process takes ~100 ms.
     Step 2: The publisher Yahoo! passes the information to the ad exchange, say, Google DoubleClick AdX, including the URL where the ad slot is located, vertical of the web page content such as sports, and user cookie id.
     Step 3: The ad exchange AdX composes a bid request and sends the bid requests to several DSPs. Let’s assume the DSP iPinYou is one of them.
     Step 4: When the iPinYou DSP server receives the bid request from the ad exchange AdX, it passes the information ...
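
Step 9 of Figure 1 refers to the auction winner's ad tag and a "2nd price CPM". As a rough illustration of that pricing rule only, here is a minimal sketch of a sealed-bid second-price auction. The bidder names and CPM values are hypothetical, and the single-bidder fallback (winner pays its own bid) is an assumption; real exchanges typically also apply price floors and fees, which are ignored here.

```python
from typing import Dict, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Return (winner, clearing CPM) for a sealed-bid second-price auction.

    The highest bidder wins and pays the second-highest bid.
    Assumption: with a single bidder, the winner pays its own bid.
    """
    if not bids:
        raise ValueError("no bids received")
    # Rank bidders from highest to lowest bid CPM.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, top_bid = ranked[0]
    clearing_cpm = ranked[1][1] if len(ranked) > 1 else top_bid
    return winner, clearing_cpm


# Hypothetical bid CPMs returned by three DSPs for one bid request.
bids = {"dsp_a": 4.20, "dsp_b": 3.75, "dsp_c": 1.10}
winner, price = second_price_auction(bids)
print(winner, price)
```

Here `dsp_a` submits the highest bid, so it wins, but it is charged the CPM bid by the runner-up `dsp_b`.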
  8. The Right Tool: Apache Spark
     Apache Spark is a distributed computing framework that came out of AMPLab at UC Berkeley.
     Key innovation is the Resilient Distributed Dataset (RDD): a logical collection of data partitioned across machines.
     Figure 2: Spark runtime. The user’s driver program launches multiple workers, which read data blocks from a distributed file system and can persist computed RDD partitions in memory.
     Diagram labels: Driver, Workers (RAM, Input Data), tasks, results, Distributed File System (e.g. HDFS).
     API in Scala and Python.
     In our stack, Spark runs on Hadoop. Data stored in HDFS / Parquet.
     In some apps involving iterative calls, Spark is up to 100X faster than MapReduce.
  9. Spark: GraphFrames
     GraphFrames is a graph processing library (similar to GraphX).
     - Scala, Python, Java APIs.
     - Query on graphs (like SparkSQL):
       > g.vertices.filter("age > 25")
       > g.inDegrees.filter("inDegree > 2")
     - Supports all algorithms in GraphX, and also:
       • Breadth first search (BFS): shortest path between 2 vertices.
       • (Strongly) connected components.
       • Label Propagation algorithm.
  10. VideoAmp Flint
      We open-sourced Flint: creating push-button Spark clusters for Machine Learning and Data Science in the cloud.
      Designed for rapid deployment while providing native access to data in a pre-existing HDFS / Hive cluster.
      - Flint: a Spark Cluster Launcher (on AWS).
      - Self-contained Spark Docker images.
      - Jupyter Docker image preloaded with Python, R, Scala kernels.
      Users can expand or contract the cluster on the fly.
  11. DEMO
  12. Data from Devices
      • Data from TVs (ACR): TV ID generates TV program viewership. 10M Smart TVs / STBs. Data in 15 min chunks.
      • Mobile Devices: Device ID generates Sites, Video content, Segments. 50K QPS over 300M Device IDs.
      • Desktop: Cookie ID generates Sites, Video content, Segments. 100K QPS over 1B cookie IDs.
  13. Sparse Representation
      For each class of consumption data, create a Dictionary with an enumeration of all content (e.g. TMS ID), or types.
      e.g. demographic segments: Income = [ <30K, 30K to 60K, 60K to 90K, 90K to 120K, 120K+ ]
      e.g. TV programs watched: TV_Programs = [“Walking Dead”, “Game of Thrones”, …, “Silicon Valley”]
      Then the user data is sparse:
      Income (User ABC123) = [0,0,0,1,0]
      TV_Programs (User ABC123) = [0,1,…,1]
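
The dictionary-plus-indicator-vector encoding on this slide can be sketched as follows. This is a minimal illustration, not the production pipeline: the three-title program dictionary is a shortened stand-in for the elided list, and the helper names `encode` and `to_sparse` are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence

# Dictionaries: one fixed enumeration per class of consumption data.
INCOME = ["<30K", "30K to 60K", "60K to 90K", "90K to 120K", "120K+"]
# Shortened, illustrative program list (the slide elides the full one).
TV_PROGRAMS = ["Walking Dead", "Game of Thrones", "Silicon Valley"]


def encode(observed: Iterable[str], dictionary: Sequence[str]) -> List[int]:
    """Return a 0/1 vector with a 1 at the position of every observed item."""
    index: Dict[str, int] = {item: i for i, item in enumerate(dictionary)}
    vector = [0] * len(dictionary)
    for item in observed:
        vector[index[item]] = 1
    return vector


def to_sparse(vector: Sequence[int]) -> List[int]:
    """Keep only the positions of the non-zero entries."""
    return [i for i, value in enumerate(vector) if value]


income_vec = encode(["90K to 120K"], INCOME)
tv_vec = encode(["Game of Thrones", "Silicon Valley"], TV_PROGRAMS)
print(income_vec, tv_vec, to_sparse(tv_vec))
```

Storing only the non-zero positions is what keeps such vectors cheap when the dictionary has millions of entries and each user touches only a handful of them.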
  14. Graph Construction
  15. Connected Components
      Subgraphs in the graph s.t. there is a path between any two vertices.
      Start with a node s, and do BFS. This gives a component of the graph.
      At each stage, pick an unexplored node n, and do BFS. This finds another component.
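
The repeated-BFS procedure described on this slide can be sketched on a single machine as below. The node names and edges are hypothetical, and at the scale discussed in the talk a distributed implementation (such as the connected-components routine in GraphFrames) would be used instead.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def connected_components(
    nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]
) -> List[List[Hashable]]:
    """Find connected components of an undirected graph by repeated BFS."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    explored: Set[Hashable] = set()
    components: List[List[Hashable]] = []
    for start in nodes:
        if start in explored:
            continue  # already part of an earlier component
        # BFS from an unexplored node discovers exactly one component.
        explored.add(start)
        queue = deque([start])
        component = []
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in adjacency[u]:
                if v not in explored:
                    explored.add(v)
                    queue.append(v)
        components.append(sorted(component))
    return components


# Hypothetical device IDs; an edge means two IDs were linked to each other.
nodes = ["idfa", "uid1", "uid2", "uid3", "uid4"]
edges = [("idfa", "uid1"), ("uid2", "uid3")]
components = connected_components(nodes, edges)
print(components)
```

Each resulting component is a candidate set of IDs belonging to one consumer; an ID with no edges, like `uid4` here, forms a component on its own.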
  16. Clustering
      Example with only Location (Lat / Long attributes).
      We utilize Location, IP address, Types (segments), Behaviors (websites visited, TV programs viewed).
      Clustering in a very high dimensional space with sparse vectors.
  17. Graph Inference
      Find all Users similar to User A.
      Fill in Missing Attributes. What is User B’s income level?
      Which users will like BrainDead (new show)?
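
The slide does not say how similarity or attribute inference is computed. One simple way to make the two questions concrete is nearest-neighbor voting over behavior sets, sketched below with Jaccard similarity. The technique, the user IDs, the behaviors, and the function names are all illustrative assumptions, not the method used in the talk.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two behavior sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target: str, behaviors: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Return the k users whose behavior sets are closest to the target's."""
    scored = [
        (jaccard(behaviors[target], items), user)
        for user, items in behaviors.items()
        if user != target
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by name
    return [user for _, user in scored[:k]]


def infer_attribute(
    target: str, behaviors: Dict[str, Set[str]], known: Dict[str, str], k: int = 2
) -> Optional[str]:
    """Fill a missing attribute with the majority value among the k nearest users."""
    votes = Counter(known[u] for u in most_similar(target, behaviors, k) if u in known)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical users and behaviors.
behaviors = {
    "user_a": {"Game of Thrones", "Silicon Valley", "espn.com"},
    "user_b": {"Game of Thrones", "Silicon Valley"},
    "user_c": {"Walking Dead"},
    "user_d": {"Game of Thrones", "Silicon Valley", "espn.com", "Walking Dead"},
}
known_income = {"user_a": "90K to 120K", "user_c": "<30K", "user_d": "90K to 120K"}

neighbors = most_similar("user_b", behaviors)
income_b = infer_attribute("user_b", behaviors, known_income)
print(neighbors, income_b)
```

The same neighbor lookup answers "find all users similar to User A", and a vote over neighbors who watched a given show gives a crude score for whether a user is likely to watch it.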
  18. Validation
      Ground Truth from Login Data, e.g. Login to LinkedIn from Mobile, Tablet, Desktop at Work, Laptop at Home.
      Validation data is used for hold-out cross-validation, to learn the parameters (e.g. edge distance threshold) for Machine Learning.
  19. Precision / Recall
      High Precision -> Devices assigned to a consumer belong to the consumer.
      High Recall -> All devices belonging to the consumer are correctly assigned.
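
For a single consumer, these two definitions reduce to set arithmetic between the devices the graph assigned and the devices known from login ground truth. A minimal sketch, with hypothetical device names:

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a device assignment for one consumer.

    predicted: devices the graph assigned to the consumer.
    actual:    devices that truly belong to the consumer (e.g. from login data).
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical assignment: two correct devices, one wrong, two missed.
predicted = {"phone", "tablet", "neighbor_tv"}
actual = {"phone", "tablet", "work_desktop", "home_laptop"}
precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

Two of the three assigned devices are correct (precision 2/3), while only two of the consumer's four real devices were found (recall 1/2). Tightening a parameter such as the edge distance threshold usually raises one of these numbers at the expense of the other.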
  20. TV Viewership Classification
      Data from TVs (ACR). TV ID generates: TV program viewership.
      Dictionary is an enumeration of ~10M Users.
      Sparse vector of Video Content (0 / 1 if they saw it).
      Learning embedding: (TV programs, Users) -> Lookalike Programs.
      How do we learn embeddings? Learn an underlying manifold -> like word2vec, where a document is the set of users viewing the content.
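
Training a word2vec-style embedding is beyond a short example, but the "lookalike programs" idea can be shown with a much simpler stand-in: cosine similarity computed directly on the 0/1 viewer vectors, without any learned embedding. The sketch below is that baseline only, with hypothetical TV IDs and viewership; it is not the embedding method from the talk.

```python
import math
from typing import Dict, List, Set


def cosine(viewers_a: Set[str], viewers_b: Set[str]) -> float:
    """Cosine similarity of two 0/1 viewer vectors, stored as sets of viewer IDs."""
    if not viewers_a or not viewers_b:
        return 0.0
    # For binary vectors the dot product is the size of the intersection.
    return len(viewers_a & viewers_b) / math.sqrt(len(viewers_a) * len(viewers_b))


def lookalikes(program: str, viewers: Dict[str, Set[str]], top_n: int = 2) -> List[str]:
    """Rank other programs by how similar their audience is to the given program's."""
    scored = [
        (cosine(viewers[program], audience), other)
        for other, audience in viewers.items()
        if other != program
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


# Hypothetical viewership: program -> set of TV IDs that watched it.
viewers = {
    "Game of Thrones": {"tv1", "tv2", "tv3", "tv4"},
    "Walking Dead": {"tv1", "tv2", "tv3"},
    "Silicon Valley": {"tv4", "tv5"},
    "Evening News": {"tv6"},
}
similar = lookalikes("Game of Thrones", viewers)
print(similar)
```

A learned embedding serves the same purpose but compresses each program into a short dense vector, which can also place programs near each other when their audiences are related only indirectly.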
  21. Visualizing Embeddings
      https://www.youtube.com/watch?v=RJVL80Gg3lA
      Visualizing Data Using t-SNE by van der Maaten
      t-distributed Stochastic Neighbor Embedding (t-SNE)
      a) IsoMap
      b) Locally Linear Embedding
      Implementations in R, Python: R package “tsne”
  22. Visualizing Title Embeddings
  23. Visualizing Title Embeddings
  24. Questions?
  25. Bandit Optimization
      Metrics to Optimize: Viewability, Conversions.
      Continue with same campaign parameters that have worked well, OR explore new parameter combinations?
      How to solve the Exploration-Exploitation Problem? Multi-Armed Bandits.
      Parameters coded in our Bidders (Actor-model in Scala). Run simultaneously and determine probability of reward.
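
The slide names multi-armed bandits but not a specific algorithm. As one common choice, here is a minimal sketch of Thompson sampling with Bernoulli rewards, where each arm stands for one campaign parameter combination and a reward is a conversion. The reward rates, the round count, and the function name are hypothetical, and the bidders described in the talk are written in Scala, not Python.

```python
import random
from typing import List, Sequence


def thompson_sampling(true_rates: Sequence[float], rounds: int, seed: int = 0) -> List[int]:
    """Simulate Thompson sampling and return how often each arm was played.

    true_rates are the hidden reward probabilities, used only to simulate
    feedback; the algorithm itself sees only observed successes and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Sample a plausible reward rate for each arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then play the arm with the best sample.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


# Three hypothetical parameter combinations with different conversion rates.
pulls = thompson_sampling([0.05, 0.20, 0.50], rounds=2000)
print(pulls)
```

Arms with little data have wide posteriors and still get sampled occasionally (exploration), while arms that keep paying off are played most of the time (exploitation), so no separate exploration schedule is needed.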
  26. Bandit Optimization
