Machine Learning (ML) applications in online advertising

Are you looking for ways to improve accuracy of your contextual ad targeting?

Machine Learning applications using Apache Spark help in improving online campaigns by precisely predicting ad Click-Through Rate (CTR).

Machine Learning analyzes data on users' online behaviour and uses it to make predictions for online advertising. For instance, Machine Learning could be used in:
- Market segmentation
- Personalized messaging
- Display advertisements and lookalike targeting
- Customer Lifetime Value (CLV)
- Data Management Platforms (DMP) to enhance user data for better decisions

To learn more about the benefits and applications of Machine Learning in online marketing, visit: https://www.imaginea.com/ad-click-prediction-using-apache-spark

You could also write to us at connect@imaginea.com.

Published in: Marketing

Machine Learning (ML) applications in online advertising

  1. AD CLICK PREDICTION USING APACHE SPARK: MACHINE LEARNING. Private and confidential. Copyright (C) 2017, Imaginea Technologies Inc. All rights reserved.
  2. Introduction: Trends in Machine Learning
     ▪ Advanced Machine Learning
     ▪ Democratization of Machine Learning
     ▪ Future of Machine Learning
     ▪ Data Flywheels
     ▪ The Algorithm Economy
     ▪ Cloud hosted intelligence
     ▪ Machine Learning Platforms
     ▪ ML/AI at Center Stage
     ML Application Development: systems that understand, learn, predict, adapt and potentially operate autonomously
     Industries: Preventive Healthcare, Banking, Finance, Media, Supply Chain
  4. TABLE OF CONTENTS: Context, Business problem, Challenges, Solution, Summary
  5. Context: Industry Challenge
     “Predicting ad click–through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry” ~ Google
     ▪ Ad platforms collect huge amounts of data to help them predict ad clicks
     ▪ A good predictive model is essential to serve ads efficiently and optimize overall economic value
     ▪ Sponsored search advertising, contextual advertising, display advertising, and real-time bidding auctions have all relied heavily on the ability of learned models to predict ad click–through rates accurately, quickly, and reliably
  6. Context: Motivation

     | Publisher | Advertiser | Bid ($) | Predicted CTR | Expected Bid ($) |
     |-----------|------------|---------|---------------|------------------|
     | ESPN      | Nike       | 1       | 0.6           | 1 x 0.6 = 0.6    |
     | ESPN      | Gucci      | 2       | 0.1           | 2 x 0.1 = 0.2    |

     Pay-per-click policy: an advertiser pays only to the extent that their ads are clicked by users
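The arithmetic in the slide 6 table can be sketched in a few lines of Python. The slide only shows the expected-bid column; choosing the ad with the highest expected bid is an assumption made here for illustration, and the `Candidate` class and `pick_winner` function are hypothetical names, not part of the deck.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    publisher: str
    advertiser: str
    bid: float            # dollars the advertiser pays per click
    predicted_ctr: float  # model's estimated click probability


def expected_bid(c: Candidate) -> float:
    # Expected revenue per impression under pay-per-click: bid x predicted CTR.
    return c.bid * c.predicted_ctr


def pick_winner(candidates):
    # Assumption: the publisher shows the ad with the highest expected bid.
    return max(candidates, key=expected_bid)


candidates = [
    Candidate("ESPN", "Nike", 1.0, 0.6),
    Candidate("ESPN", "Gucci", 2.0, 0.1),
]
winner = pick_winner(candidates)
print(winner.advertiser, expected_bid(winner))
```

Under that assumption the lower bid wins because its predicted CTR is much higher, which is why the accuracy of the CTR model matters economically.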
  8. Problem: Ad-click prediction challenges
     How to build a predictive model that …
     ▪ Can deal with huge data volume
     ▪ Has high predictive power
     ▪ Is conducive to incremental learning
     ▪ Deals with high-dimensional sparse data
  10. Challenges: Big Data
      Need for a distributed data processing engine to handle the volume, variety and velocity of data
      ▪ Billions of ad impressions served per day
      ▪ Millions of users and their history
      ▪ For any decent ad exchange, this data will be on the order of hundreds of GBs
      ▪ Almost impossible to crunch that much data on a single-machine, shared-memory model
      The volume, variety and velocity of the incoming data make distributed data processing essential
  11. Challenges: Data sparsity
      How to efficiently handle sparse datasets?
      ▪ Millions of categorical features
      ▪ Huge number (in millions) of potential features for each input vector
      ▪ But only a limited set of actual features per vector
      ▪ Generally stored in a dense format, but algorithms expect categorical data to be encoded
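A minimal sketch of the sparsity point on slide 11: each raw row has only a handful of categorical values, so after one-hot encoding it is enough to store the indices of the non-zero columns. The sample rows and the helper names `build_vocabulary` and `one_hot_sparse` are illustrative assumptions, not taken from the deck.

```python
def build_vocabulary(rows):
    """Assign one column index to every distinct (field, value) pair."""
    vocab = {}
    for row in rows:
        for field, value in row.items():
            vocab.setdefault((field, value), len(vocab))
    return vocab


def one_hot_sparse(row, vocab):
    """Return the sorted indices of the columns that are 1 for this row."""
    return sorted(vocab[(f, v)] for f, v in row.items() if (f, v) in vocab)


# Hypothetical ad impressions, stored densely as one column per field.
rows = [
    {"publisher": "ESPN", "advertiser": "Nike", "device": "mobile"},
    {"publisher": "ESPN", "advertiser": "Gucci", "device": "desktop"},
]
vocab = build_vocabulary(rows)
encoded = [one_hot_sparse(r, vocab) for r in rows]
print(len(vocab), encoded)
```

With millions of distinct values the vocabulary grows to millions of columns, but each encoded row still holds only as many indices as it has fields.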
  12. Challenges: Predictive models
      Scalable and effective predictive models
      ▪ Logistic regression models have historically been the workhorse for such tasks
      ▪ However, a number of studies in the last few years have noted that the effects of feature conjunctions are important
      ▪ So we need scalable non-linear models that can take feature interactions into account
      Empirically, what models work best for this particular domain?
  13. Challenges: Online learning
      Piecemeal learning
      ▪ Update model weights by looking at one input vector at a time
      ▪ Ideally, avoid loading the whole input dataset into memory
      Can the predictive model be trained on streaming data?
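The "one input vector at a time" idea on slide 13 can be illustrated with a small online logistic regression trained by stochastic gradient descent. This is a sketch under stated assumptions, not the deck's implementation: the class name, the learning rate, and the synthetic `stream` generator are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # holds weights only for features seen so far

    def predict(self, active):
        # `active` lists the indices of the one-hot features that are 1.
        return sigmoid(sum(self.weights.get(i, 0.0) for i in active))

    def update(self, active, label):
        # Gradient of the log loss with respect to the score is (p - y).
        error = self.predict(active) - label
        for i in active:
            self.weights[i] = self.weights.get(i, 0.0) - self.learning_rate * error


def stream():
    # Hypothetical stream: feature 1 co-occurs with clicks, feature 2 does not.
    for _ in range(200):
        yield [0, 1], 1
        yield [0, 2], 0


model = OnlineLogisticRegression()
for active, label in stream():
    model.update(active, label)  # one example at a time, nothing is buffered
```

Because the examples come from a generator and each one is discarded after its update, the full dataset never has to fit in memory.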
  15. Solution: Handle big & sparse data using Apache Spark
      ▪ Distributed data processing
      ▪ Use the Apache Spark “dataframes” API to build the data processing pipeline
      ▪ The Dataframes API is fully SQL compliant and highly optimized under the hood
      ▪ Flexibility to write your own custom transformers and UDFs
      ▪ Tip: Use feature hashing to constrain the model size
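The feature hashing tip on slide 15 can be sketched in plain Python. Instead of keeping a vocabulary that grows with every new category, each (field, value) pair is hashed into a fixed number of buckets, so the model never has more than `num_buckets` weights. The helper names, the use of MD5, and the bucket counts are assumptions for illustration only.

```python
import hashlib


def hashed_index(field, value, num_buckets):
    # A stable hash is used because Python's built-in hash() of strings
    # changes between interpreter runs.
    key = f"{field}={value}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hash_features(row, num_buckets=2 ** 18):
    """Map a row of categorical values to a sparse {index: count} vector."""
    counts = {}
    for field, value in row.items():
        idx = hashed_index(field, value, num_buckets)
        counts[idx] = counts.get(idx, 0.0) + 1.0  # collisions simply add up
    return counts


row = {"publisher": "ESPN", "advertiser": "Nike", "device": "mobile"}
vec = hash_features(row, num_buckets=1024)
print(vec)
```

The trade-off is that unrelated features can collide in the same bucket; a larger `num_buckets` reduces collisions at the cost of a larger model. Spark ML also provides hashing transformers such as `HashingTF` for doing this inside a pipeline.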
  16. Solution: Online predictive models
      What we tried and what worked?

      |                      | Logistic Regression     | XGBoost on Spark | FFM                           |
      |----------------------|-------------------------|------------------|-------------------------------|
      | Availability         | Spark ML                | Spark package    | Separate C++ library - libFFM |
      | Online updates       | Possible                | No               | Yes                           |
      | Features             | One hot encoded vectors | Counts based     | Custom encoded vectors        |
      | Distributed learning | Yes                     | Yes              | No                            |
      | Outbrain Score       | 0.63                    | 0.64             | 0.68                          |
  17. Solution: Field-aware Factorization Machine (FFM)
      ▪ For each unique feature, it learns a field-aware vector representation
      ▪ Needs to see an input vector only once: weight updates happen one instance at a time
      ▪ Learns feature interactions very effectively
      ▪ Uses AdaGrad for matrix factorization
      ▪ Hyperparameters: k (weight vector length), η (learning rate), λ (regularization parameter)
      ▪ But: shared-memory algorithm, hard to implement a distributed version
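To make the bullets on slide 17 concrete, here is a compact single-machine FFM sketch in Python. It follows the standard FFM formulation (one latent vector per feature and opposing field, pairwise dot products, logistic loss with labels in {-1, +1}, per-coordinate AdaGrad steps). It is not libFFM's code; the class name, default hyperparameter values, and toy data are assumptions.

```python
import math
import random


class FFM:
    def __init__(self, k=4, eta=0.2, lam=2e-5, seed=0):
        self.k, self.eta, self.lam = k, eta, lam
        self.rng = random.Random(seed)
        self.w = {}   # (feature, other_field) -> latent vector of length k
        self.g2 = {}  # accumulated squared gradients for AdaGrad

    def _vec(self, feature, field):
        key = (feature, field)
        if key not in self.w:
            scale = 1.0 / math.sqrt(self.k)
            self.w[key] = [self.rng.uniform(0.0, scale) for _ in range(self.k)]
            self.g2[key] = [1.0] * self.k
        return self.w[key]

    def score(self, x):
        # x is a list of (field, feature, value) triples.
        total = 0.0
        for a in range(len(x)):
            f1, j1, v1 = x[a]
            for b in range(a + 1, len(x)):
                f2, j2, v2 = x[b]
                w1, w2 = self._vec(j1, f2), self._vec(j2, f1)
                total += sum(p * q for p, q in zip(w1, w2)) * v1 * v2
        return total

    def predict(self, x):
        return 1.0 / (1.0 + math.exp(-self.score(x)))

    def update(self, x, y):
        # One AdaGrad step on a single instance; y must be -1 or +1.
        kappa = -y / (1.0 + math.exp(y * self.score(x)))
        for a in range(len(x)):
            f1, j1, v1 = x[a]
            for b in range(a + 1, len(x)):
                f2, j2, v2 = x[b]
                w1, w2 = self._vec(j1, f2), self._vec(j2, f1)
                acc1, acc2 = self.g2[(j1, f2)], self.g2[(j2, f1)]
                for d in range(self.k):
                    g1 = self.lam * w1[d] + kappa * w2[d] * v1 * v2
                    g2 = self.lam * w2[d] + kappa * w1[d] * v1 * v2
                    acc1[d] += g1 * g1
                    acc2[d] += g2 * g2
                    w1[d] -= self.eta / math.sqrt(acc1[d]) * g1
                    w2[d] -= self.eta / math.sqrt(acc2[d]) * g2


# Toy data: the same publisher with two advertisers, one clicked, one not.
nike = [("publisher", "pub=ESPN", 1.0), ("advertiser", "adv=Nike", 1.0)]
gucci = [("publisher", "pub=ESPN", 1.0), ("advertiser", "adv=Gucci", 1.0)]
model = FFM()
for _ in range(200):
    model.update(nike, +1)
    model.update(gucci, -1)
```

The weight dictionaries live in one process's memory and every update reads and writes them in place, which is the shared-memory property the last bullet refers to.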
  18. Solution: Apache Spark + FFM
      Best of both worlds
      ▪ Use Apache Spark for data joining, cleaning, transformation and featurization: the Dataframes API is fast and easy to use for the task
      ▪ Use the transformed data to train FFMs on a single machine
      ▪ Alternatively, build a streaming pipeline to transform each incoming input into a feature vector and send it to FFM for model updates
      ▪ Use the trained FFM model for real-time ad-click probability prediction
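The hand-off between the two systems on slide 18 usually comes down to writing the transformed rows in a format the FFM trainer can read. The sketch below assumes libFFM's text format of `label field:feature:value` tokens separated by spaces. The in-memory `rows` list stands in for the output of a Spark job, and `to_libffm_line` is a hypothetical helper, not part of either library.

```python
def to_libffm_line(label, row, field_ids, feature_ids):
    """Format one labelled row as '<label> <field>:<feature>:<value> ...'."""
    parts = [str(label)]
    for field, value in row.items():
        # Integer ids are assigned on first sight and reused afterwards.
        field_id = field_ids.setdefault(field, len(field_ids))
        feature_id = feature_ids.setdefault((field, value), len(feature_ids))
        parts.append(f"{field_id}:{feature_id}:1")  # categorical features get value 1
    return " ".join(parts)


# Stand-in for rows produced by the Spark transformation stage.
rows = [
    (1, {"publisher": "ESPN", "advertiser": "Nike"}),
    (0, {"publisher": "ESPN", "advertiser": "Gucci"}),
]
field_ids, feature_ids = {}, {}
lines = [to_libffm_line(label, row, field_ids, feature_ids) for label, row in rows]
print("\n".join(lines))
```

On a real cluster the id dictionaries would have to be built consistently across partitions, for example by hashing the feature instead of assigning ids incrementally.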
  20. Solution: Summary
      ▪ Apache Spark is ideal for handling and exploring large datasets
      ▪ But for specific cases like ad-click prediction, where the data is very high dimensional (millions of features) and sparse (each instance has only tens of features), the current algorithms in Spark ML/MLlib may not be up to the mark
      ▪ Some external Apache Spark packages such as XGBoost are available and can be used when needed
      ▪ Some highly effective algorithms like FFM are not yet available on Apache Spark
      ▪ But they can be easily integrated into the overall Apache Spark workflow to take advantage of cluster resources, e.g. for parameter tuning
  21. IMAGINEA TECHNOLOGIES: CORPORATE OVERVIEW
  22. Imaginea: Agile Engineering Culture
      ▪ Pramati’s M&A’s of Leading products
      ▪ Serving from 5 Global Locations
      ▪ Innovation Enablement
      ▪ Over 200 Product Companies
      ▪ Unique Products & Services
      ▪ Agile Methodology
      ▪ User-centric Design
      ▪ Open Source Contributions
      ▪ Products built from conception-code-cash
  23. Our credentials
      ▪ Building product on Spark since 2014
      ▪ Contribution to Spark code: Spark Scala, Packaging Spark, Compilation for Scala, API
      ▪ Part of Spark team since 2013 while it was a Berkeley project, worked commercially with Databricks for developing it
      ▪ Over 20 patches to Apache Hadoop big data platform, worked commercially with TubeMogul on video analytics
      ▪ Contribution to Zeppelin code: JDBC Interpreter, Wildcard Parsing, Integration
  24. Our expertise
      ▪ Data management: Data augmentation, Data ingestion optimization, Data filtering
      ▪ Operations: Cluster management, Storage optimization
      ▪ Solutions: Predictive search, Predictive Analytics, Interactive data exploration
  25. FOLLOW US ON
      ▪ https://www.linkedin.com/company/imaginea
      ▪ https://twitter.com/ImagineaTech
      ▪ https://www.slideshare.net/Imaginea
      HAVE ANY QUESTIONS? Just tweet your question with the hashtag #AskImaginea
      For more details about Imaginea, visit www.imaginea.com or write to connect@imaginea.com
  26. Disclaimer
      This document may contain forward-looking statements concerning products and strategies. These statements are based on management's current expectations and actual results may differ materially from those projected, as a result of certain risks, uncertainties and assumptions, including but not limited to: the growth of the markets addressed by our products and our customers' products, the demand for and market acceptance of our products; our ability to successfully compete in the markets in which we do business; our ability to successfully address the cost structure of our offerings; the ability to develop and implement new technologies and to obtain protection for the related intellectual property; and our ability to realize financial and strategic benefits of past and future transactions. These forward-looking statements are made only as of the date indicated, and the company disclaims any obligation to update or revise the information contained in any forward-looking statements, whether as a result of new information, future events or otherwise.
      All trademarks and other registered marks belong to their respective owners. Copyright © 2017, Imaginea Technologies, Inc. and/or its affiliates. All rights reserved.
      Credits: Images under Creative Commons Zero license.
