Big data, predictive modeling and analytics in online marketing


Invited talk at the Predictive Analytics Innovation Summit. Case studies covering retention, lead scoring, conversion models, a wine recommender, and others.



  1. Big Data in Online Marketing
     Predictive Analytics Innovation Summit
     Daqing Zhao, PhD, Director of SEM Analytics
     2/24/2012, San Diego
     © Daqing Zhao. All rights reserved.
  2. Agenda
     • Overview of big data analytics
     • Insights from big data modeling
     • A case for preference profiles
       – Recommender for a wine seller
     • Cases of behavioral profiles for predictive models
       – Yahoo mail retention
       – Tribal Fusion display ads impression optimization
       – University of Phoenix student retention
       – University of Phoenix lead optimization
     • A case of SEM algorithms
  3. Daqing Zhao, PhD
     • Big Data scientist with deep domain knowledge
     • Academic training
       – Analyzed molecular spectra on Cray supercomputers
       – Determined, modeled, and simulated molecular motions in 3D space
     • Enjoys working with large data sets and large-scale computing
     • At Bank of America, led development of a risk management system for the global portfolio
     • Has worked on computational Internet marketing since 1999
  4. Big Data, Big Opportunities
     • Thanks to Moore’s law for CPUs, storage, and network connections
     • Too much data, too little knowledge
     • Data and analytics have changed every field
     • From science and government to commerce
  5. Things computers are good at
     • Computers have perfect memory
       – Every page view, click, transaction, every event, …
     • Good at finding a needle in a haystack
       – Identify clickers on any particular web page at a given time
       – E.g., target abandoned shopping carts with promotions
     • Good at trade-offs among a large number of factors
       – Female, 25–34, with child under 5, Asian, earning $30K, renting, divorced, lives in California, some college, Walmart visits, drives a Camry, …
       – Buyer of X or not?
  6. Computers make it possible
     • Given data, optimize models and parameters
       – Identify reproducible patterns in the data
       – Provide a simple picture; predict future events
     • Simulations generate future events, given assumptions and the current state
       – Given a set of models, what future scenarios look like under a given set of conditions: “what ifs”
       – Like a flight simulator
     • Crowd-sourcing from big data and big data modeling
       – Define similarity, translations, quality, relevance
  7. Computers can’t do everything
     • Data often have issues before being well analyzed
     • Data often have no taxonomy or context
     • In free-format data, relevant information must be extracted
     • Computers don’t define targets or construct predictors
     • They don’t know if critical predictive factors are missing
     • Computers don’t have common sense
     • Computers don’t have goals to achieve
  8. Modeling needs to scale
     • Traditional predictive models take a long time to build
       – Small data sets; samples expensive to collect
     • Now data are cheap, and models may degrade in weeks
       – The dimension of the predictor space is very large
       – The number of categories is large
     • Human interactive model building is not scalable
     • Reasons for target events are complex
     • Without detailed analysis, it is unclear what drives an event
     • We need to rely on out-of-sample testing and “off the shelf” modeling
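The out-of-sample testing the slide relies on can be sketched in a few lines. This is a minimal illustration, not code from the talk; the toy data and the majority-label baseline are mine.

```python
import random

def train_test_split(rows, test_frac=0.3, seed=42):
    """Hold out a random fraction so models are always scored on unseen data."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

# toy rows of (feature, label)
data = [(x, int(x > 5)) for x in range(10)]
train, test = train_test_split(data)

# an "off the shelf" baseline: predict the majority label, fit on training only
majority = round(sum(y for _, y in train) / len(train))
accuracy = sum(int(majority == y) for _, y in test) / len(test)
```

Because the holdout rows never touch the fitting step, the accuracy number is an honest estimate even when no one has time to inspect the model by hand.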
  9. The Big Data problem
     • Data sizes larger than what databases can handle
     • Terabytes of data may take hours just to scan
     • A solution requires a cloud of servers with local storage
       – Read, process, and write intermediate results in parallel
       – Aggregate at the end
     • Cloud computing builds models at scale
     • Clouds often scale linearly with the number of servers
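The read-in-parallel, aggregate-at-the-end pattern described above is essentially map-reduce. A minimal single-machine sketch (the function names and the toy log are mine):

```python
from functools import reduce

def count_chunk(chunk):
    """'Map' step: each server counts events in its local slice of the log."""
    counts = {}
    for page in chunk:
        counts[page] = counts.get(page, 0) + 1
    return counts

def merge(total, partial):
    """'Reduce' step: aggregate the partial counts from every server."""
    for page, n in partial.items():
        total[page] = total.get(page, 0) + n
    return total

log = ["home", "cart", "home", "help", "cart", "home"]
chunks = [log[i:i + 2] for i in range(0, len(log), 2)]  # one slice per "server"
totals = reduce(merge, map(count_chunk, chunks), {})
# totals == {"home": 3, "cart": 2, "help": 1}
```

On a real cluster each chunk lives on a different machine's local disk, which is why throughput grows roughly linearly with the number of servers.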
  10. Cloud computing
     • We built a SAS cloud at University of Phoenix
       – I have an invited SAS talk available on the SAS web site
       – We can process billions of impressions in minutes
     • Hadoop clouds are used widely
       – Open-source software
       – Commodity servers and storage
     • Clouds may have hundreds of thousands of servers
       – Find a needle in a haystack in milliseconds
       – Model computations that once took years now finish in minutes
  11. Example: Google data centers
      An estimated 500K commodity servers; data centers near the Columbia River at The Dalles, Oregon
  12. Customer Preference Profiles (in use from 1999 to 2001)
  13. 1:1 email case
     • Weekly emails recommending 6 wines
     • Inventory of 20K+ wines
     • Had clean data
       – Purchase, time, product, spend
       – Wine color, varietal, body, acidity, oak, tannin, sweetness, complexity, price, producer, region
       – Email response
       – Self-reported preferences and demographics
       – Web behavior clusters
       – No explicit customer rating data, unlike Netflix
     • Most customers have only one or two data points
  14. Dynamic newsletter
      A dynamic XML template assembles each email from text blurb slots: Dear “First name”; welcome to our newsletter; celebrate the holidays with a bottle of some wine; history of the wine; tips on wine tasting; recipes using wine; health benefits of wine; wine drinking as socially fashionable and culturally sophisticated; signature. Clicks are tracked and linked to purchases.
  15. Wine direct marketing
     • Goal: lift purchase revenue
     • Present wines customers are more likely to buy
     • A/B test against weekly selections by merchandisers
     • Concentrate on long-term performance
       – Over many email campaigns
     • Focus on the most important predictor: the behavior profile
  16. Build similarity across all wines
     • Decompose purchases into product attributes
       – Even one click can generate a taste profile
     • When a wine goes out of stock, its profile info is still usable
     • New inventory is immediately mapped to existing profiles
     • Build implicit and explicit profiles of each customer
     • Add association rules: “customers who bought these also bought …”
     • For new customers, augment the profile with nearest neighbors who have more purchases, as “mentors”
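Decomposing purchases into attributes might look like the following sketch; the attribute set, wine names, and numbers are hypothetical, not from the talk.

```python
# hypothetical attribute vectors per wine: (body, acidity, sweetness)
WINES = {
    "cabernet_a":   (0.9, 0.4, 0.1),
    "chardonnay_b": (0.5, 0.7, 0.3),
    "riesling_c":   (0.2, 0.8, 0.8),
}

def taste_profile(purchases):
    """Average the attribute vectors of purchased wines into a customer
    profile; even a single purchase or click yields a usable profile."""
    vecs = [WINES[w] for w in purchases]
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(3))

profile = taste_profile(["cabernet_a", "chardonnay_b"])
# roughly (0.7, 0.55, 0.2): full-bodied, moderate acidity, dry
```

Because the profile lives in attribute space rather than product space, it survives out-of-stock wines and applies immediately to new inventory.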
  17. Customer experience is key
     • Recommend similar wines
       – Based on cosine distance to the taste profile, price, and text mining on producer name, region, and country
       – Shuffle among higher-scored wines
       – Repeated campaigns correct prediction errors
       – Dedupe recent recommendations and purchases
       – Use a decaying memory function and factor in seasonality
     • Reinforcement learning
     • Use simulations to ensure quality
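The cosine scoring, dedupe, and shuffle steps above can be sketched as follows. This is a minimal illustration under assumed names and toy data, not the production system.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def recommend(profile, inventory, exclude, k=2, seed=0):
    """Score wines by cosine similarity to the taste profile, drop recent
    recommendations and purchases, then shuffle among the top scorers."""
    scored = [(name, cosine(profile, vec))
              for name, vec in inventory.items() if name not in exclude]
    scored.sort(key=lambda t: t[1], reverse=True)
    pool = scored[:k + 2]              # shuffle only within a small top pool
    random.Random(seed).shuffle(pool)
    return [name for name, _ in pool[:k]]

inventory = {
    "cab_a": (0.9, 0.4, 0.1), "cab_b": (0.8, 0.5, 0.2),
    "ries_c": (0.2, 0.8, 0.8), "port_d": (0.9, 0.3, 0.9),
}
picks = recommend((0.85, 0.45, 0.15), inventory, exclude={"cab_a"})
```

Shuffling within the top pool keeps repeated campaigns from showing the same list every week, while the `exclude` set dedupes recent recommendations and purchases.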
  18. Learnings and insights
     • Our 1:1 emails increased revenue by up to 300%
     • Outperformed by 40% over a 2+ year period
     • Purchase data are most important
       – Putting money where your mouth (or mouse) is
     • Email response data are also predictive
     • Self-reported preferences differ from actions
       – Talking the talk versus walking the walk
     • Aggregated web segments were least useful
  20. Email retention models
     • 40% of new email subscribers never return
       – High “infant” mortality rate
       – Activity immediately after sign-up correlates with normal retention
       – Frequent page views of certain pages, such as the Help and Junk folders, are predictive
       – Find actionable retention drivers, such as sending welcome emails and improving customer service and user experience
  21. Online education retention models
     • Students have low persistence rates until after several courses
       – Depends on major, credits finished, demographics, socio-economic status, first-generation status
       – Also on lead source, lead form entries, etc.
     • We set up tracking across search, display, landing page, home site, call center, enrollment, class completions, …, a 360-degree view
     • Billions of events per month
  22. Lead conversion models
     • From impression to sign-up as a lead is just one third of the student life cycle
     • Leads have very low enrollment rates
       – It takes 3 to 6 months to enroll
       – Leads that are easy to convert may also be easier to lose to drop-out
     • Need student performance data over a long period to assess
       – Trade-off between statistics and relevance
     • Use lifetime value, brand value, and cost of service to determine media allocation
  23. Display ad conversion models
     • Advertisers have different conversion drivers
       – Publisher, channel, geo, behavior, demographic data, data appends, session depth, etc.
       – Requires an array of predictive conversion models working together with an auction engine
     • Billions of display ads
       – Individual and event information
     • Too many models, too little time for humans to build them
  24. Unexpected data challenges
     • Task: predict future enrollment and revenue
     • Problem: more than one definition of the metrics
       – Made by past business analysts using reasonable business rules
     • Some rules are built into a BI reporting product
       – FP&A watches them every month as the “truth” they monitor and use to guide the street
     • With IT/BI turnover, rules change over time
       – Few current people knew, or could articulate, the rules
  25. A solution to the data issues
     • Without the rules, we could not calculate their version of enrollment and revenue from student financial transaction data
     • After several meetings, still no correct rules
     • We then modeled the time series of the reported data
       – One-time data errors are diluted
       – Rule changes long ago are also weighted less
     • We were able to predict customers and revenue 3 to 6 months out
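Modeling the reported series directly can be as simple as exponential smoothing, which naturally down-weights one-off errors and long-ago rule regimes. The numbers below are made up for illustration.

```python
def exp_smooth(series, alpha=0.5):
    """Exponentially weighted level: recent reports dominate, so a one-time
    data error or a long-ago rule change carries little weight."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

reported = [100, 102, 250, 104, 106]   # 250 is a one-time reporting error
forecast = exp_smooth(reported)        # the spike is diluted, not propagated
```

The production forecast would use a richer time-series model with seasonality, but the principle is the same: fit the reported numbers themselves rather than trying to reverse-engineer undocumented business rules.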
  26. Data matter most
     • In modeling, finding the key data is most important
       – Identify the smoking gun
     • Data transformations
       – PageRank is a game-changing data transformation; in our case, “wineRank”
       – The social graph is a key data transformation for credit card fraud detection
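PageRank itself is a short power iteration. The graph below is a toy; a "wineRank" would run the same computation over a wine co-purchase graph rather than web links.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over {node: [out-links]} (no dangling nodes).
    Each node splits its rank among out-links; damping adds a uniform jump."""
    n = len(links)
    rank = {u: 1.0 / n for u in links}
    for _ in range(iters):
        new = {u: (1 - damping) / n for u in links}
        for u, outs in links.items():
            share = damping * rank[u] / len(outs)
            for v in outs:
                new[v] += share
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
# "c" is linked by both "a" and "b", so it ranks highest
```

The transformation turns raw link (or co-purchase) events into a single importance score per node, which is what makes it such a powerful derived feature.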
  27. Modeling can go wrong
     • Leakage in a lead scoring model
       – For example, using lead source to predict conversion when certain values of the field were populated only for converters
     • Display ads conversion model
       – Construct the data set by taking all converters and a sample of non-converters
       – Predict conversion using page view profiles, etc.
       – Problem: the sample of non-converters included customers who had never seen the ad
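The display-ads pitfall amounts to sampling negatives from the wrong population. A minimal illustration with invented user IDs:

```python
def build_dataset(converters, impressions, users):
    """Label only users who actually received an impression; sampling
    non-converters from everyone leaks 'never exposed' users into the
    negative class and inflates the model's apparent lift."""
    return [(u, int(u in converters))
            for u in users if impressions.get(u, 0) > 0]

users = ["u1", "u2", "u3", "u4"]
impressions = {"u1": 3, "u2": 1}    # u3 and u4 never saw the ad
converters = {"u1"}
dataset = build_dataset(converters, impressions, users)
# u3 and u4 are excluded rather than counted as non-converters
```

The same discipline applies to the lead-source leak: every predictor must be something that was knowable, for both classes, at prediction time.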
  28. Modeling lessons
     • Yahoo DSL subscribers sign one-year contracts
     • If you model month-to-month retention, you find a high retention rate
       – Due to contracts and penalties
     • The correct way is to model retention at contract expiry, on only 1/12 of the customers
     • For Yahoo email, if you look at quarter-by-quarter retention, those acquired early in the first quarter have lower retention rates
       – Because those customers have had more time to churn
     • A correct way is to use survival analysis
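A survival-analysis view treats still-active (or still-in-contract) customers as censored rather than retained, which removes the observation-window bias described above. A bare-bones Kaplan-Meier estimator with toy data:

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve: censored customers shrink the risk set
    without counting as churn events."""
    churn_times = sorted({t for t, o in zip(durations, observed) if o})
    curve, s = {}, 1.0
    for t in churn_times:
        at_risk = sum(1 for d in durations if d >= t)
        churned = sum(1 for d, o in zip(durations, observed) if d == t and o)
        s *= 1.0 - churned / at_risk
        curve[t] = s        # P(customer survives past month t)
    return curve

# months subscribed; observed=1 means churned, 0 means still active (censored)
durations = [2, 3, 3, 5, 6, 6]
observed  = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(durations, observed)
```

Customers acquired late in a quarter simply contribute shorter censored durations instead of artificially boosting the retention rate.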
  29. SEM Analytics
  30. Background
     • Founded ~16 years ago
     • Attracts 100 million global users
       – Biggest Q&A site on the web
     • Over the last 2 years we’ve revamped our approach to Q&A with a product that
       – Combines search technology with answers from real people
     • Instead of 10 blue links, we deliver
       – Real answers to people’s questions, both from already published data sources
       – And our growing community of users, on the web and across mobile
  31. Ask SEM Analytics systems
     • Select quality keywords at Big Data scale
     • Determine bids using search engine and internal data
     • Keyword segmentation and clustering at big data scale
       – Text mining, behavioral association, historical performance
       – Use data from organic traffic
       – Map similarity of keywords
     • Optimize landing pages and custom creatives
     • Reinforcement learning; testing hypotheses
     • Optimize algorithms and parameters via A/B tests
  32. SEM bid algorithms
     • Build models for revenue at the keyword level; predictive modeling using data including
       – User search streams
       – Ad depth, landing page CTR
       – Quality Score and minCPC
       – Effective CPC
       – Keyword categories
       – Natural language clusters
       – Search behavioral clusters
     • Use Hadoop/Hive/Mahout to process the data
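A keyword-level bid rule built on such revenue predictions could be as simple as revenue-per-click divided by a target return on ad spend. The function, parameter names, and numbers are mine, sketched for illustration; the talk's actual algorithms are not specified.

```python
def keyword_bid(pred_clicks, pred_revenue, target_roas, max_cpc):
    """Bid = predicted revenue per click / target ROAS, capped for safety."""
    if pred_clicks <= 0:
        return 0.0
    rpc = pred_revenue / pred_clicks       # predicted revenue per click
    return min(rpc / target_roas, max_cpc)

bid = keyword_bid(pred_clicks=200, pred_revenue=150.0,
                  target_roas=3.0, max_cpc=1.00)
# rpc = $0.75, so at a 3x ROAS target the bid is $0.25
```

The cap guards against a mispredicted keyword burning budget before the next model refresh.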
  33. Benefits of SEM algorithms
     • Predict keyword performance
     • Bid the right keyword at the right price, at the right time
     • Improve ROI; maximize profitable traffic volume
     • Shift traffic to keywords with higher quality scores
     • Optimize user experience
     • Find similar keywords for management and expansion
  34. Segmenting keywords
     • To manage a large portfolio, we group keywords together based on
       – Customer behavior
       – Text mining
       – Keyword performance metrics
     • Generate keyword groups for content and bid management
     • Similar keywords have similar performance
     • Leverage learnings across keywords
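Grouping keywords by performance metrics is a clustering problem. A tiny k-means sketch over hypothetical (CTR, conversion-rate) features; the data, feature choice, and initial centers are mine:

```python
import math

def kmeans(points, centers, iters=20):
    """Plain k-means: assign each keyword to its nearest center, then
    recompute centers, so similar keywords end up sharing a group."""
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda j: math.dist(p, centers[j]))
            groups[i].append(p)
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# hypothetical (CTR, conversion rate) per keyword
keywords = [(0.010, 0.02), (0.012, 0.025), (0.090, 0.20), (0.110, 0.22)]
centers, groups = kmeans(keywords, centers=[(0.0, 0.0), (0.1, 0.2)])
```

Each resulting group can then share a bid strategy and creatives, so a learning on one keyword transfers to its neighbors.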
  35. Conclusions
     • For optimal modeling, dive deep into domain knowledge
     • Identify key data and transformations
     • May require Big Data solutions to scale
     • Data are not reliable until they have been seriously analyzed
     • Test hypotheses and optimize in the real market
     • Use simulations to see if changes are reasonable
     • Focus on customer experience, not data mining tools, model complexity, or predictive accuracy
     • Use a lot of common sense
     • “The best way to have a good idea is to have a lot of ideas.” – Linus Pauling