Big data market research


Big data challenges for Market Research.
Presented at BIG 2014 (part of WWW2014).

Published in: Software, Business, Technology


  1. The Big Data Challenges of Computational Market Research
     Frank Smadja (@FrankieMbaye), EVP Engineering, Toluna
     April 2014
  2. Table of Contents
     1. What is a Market Research study?
     2. The main challenge: Targeting
     3. Machine Learning Problem and Model
     4. Some Experiments
     5. Current and Future Work
  3. What is a Market Research Study?
  4. Market Research Goal: Answering Questions for Brands
     Customer/Employee Satisfaction:
     • Are my customers happy?
     • What can I do better for them?
     • Am I getting better or worse?
     Concept testing:
     • Would dog owners buy my organic dog food?
     • What should be my target market?
     Ad testing:
     • Is my advertising campaign effective?
     Brand positioning:
     • How is my brand doing compared to the competition?
     • What are my perceived strong features?
     • Where should I invest more?
     And many more types of questions.
  5. Output Example: ‘Positioning survey’ for Hilton Garden Inn.
  6. Output Example: ‘Positioning survey’ for Hilton Garden Inn.
  7. Example: Positioning survey for Beyonce
  8. Example: Positioning survey for Beyonce
  9. Market Research Main Challenge: Targeting
     Select a segment of respondents (a sample) that is:
     • Relevant to the question (dog owners who have one big dog and one small dog, smokers who are trying to stop, etc.)
     • Representative and balanced (not biased).
     The tougher or more restrictive the targeting, the more expensive the study.
  10. The Targeting Pipeline and Incidence Rate
      Targeting process, from Start to Complete: Demographics, then Behavioral, then Study.
      • Demographics: select the right population based on simple demographic attributes (age, gender, region, ethnicity, income, etc.). This is a fixed set of attributes, known beforehand.
      • Behavioral: further select based on behavioral and custom attributes (flies more than 5 times a year, uses aspirin on a daily basis, etc.). These are free-style attributes, usually unknown.
      Incidence Rate: IR = Completes / Starts
      Cost grows as IR falls: the lower the incidence rate, the more respondents must start the study for each complete.
  11. Why is targeting hard?
      Looking for 1,000 people in the UK who “smoke,” “tried to stop in the past,” “live around London,” and are “age 24-50.”
      Data on the UK population:
      • 18% of UK adults smoke
      • 40% of smokers tried to stop
      • 15% of the population is in the London area
      • 30% is between 24 and 50
      Incidence rate: 0.18 * 0.4 * 0.15 * 0.3 ≈ 0.3%
      Sample size: 333,333
      Diagram labels: UK, London, Adults, Smokers, Tried to stop
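
The arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the slide implicitly does) that the four filters are statistically independent so their rates multiply; the function names `incidence_rate` and `required_starts` are hypothetical. Note that the unrounded product is 0.324%, which gives about 308,642 starts; the slide's 333,333 follows from rounding the rate to 0.3% before dividing.

```python
from math import ceil, prod


def incidence_rate(rates):
    """Joint incidence of independent filters: the product of their rates."""
    return prod(rates)


def required_starts(target_completes, ir):
    """Respondents who must start so that about target_completes qualify."""
    return ceil(target_completes / ir)


# Rates from the slide: smokers, tried to stop, London area, age 24-50.
uk_rates = [0.18, 0.40, 0.15, 0.30]
ir = incidence_rate(uk_rates)
starts = required_starts(1000, ir)
print(f"IR = {ir:.3%}, starts needed = {starts:,}")
```

Dropping a filter from `uk_rates` (because the attribute is already known for every panelist) raises the incidence rate and shrinks the number of starts, which is the lever the following slides pull.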
  12. State of the Art: Use Known Demographic Features
      • Basic demographics are known: 100% incidence.
        o Age and London
      • Smokers: 18%
      • Tried to stop: 40%
      Incidence rate: 1 * 0.18 * 0.4 = 7.2%
      Sample size: about 13,900
      Diagram labels: Adults in the London area, Smokers, Tried to stop
  13. New Direction: Use Known Features and Predict Unknown Features
      • Basic demographics are known: 100% incidence.
        o Age and London
      • Suppose we could predict smokers with 85% accuracy.
      • Tried to stop is still unknown: 40%
      Incidence rate: 1 * 0.85 * 0.4 = 34%
      Sample size: 2,900
      Diagram labels: Adults in the London area who are predicted to be smokers, Smokers, Tried to stop
  14. How to Predict Features? The Space Model
      A sparse matrix containing all the attributes (integer answers to questions) we have ever asked.
      • Users (User 1, User 2, User 3, User 4, ...): 10^9 users
      • Features: 10^7 features
        o Demographic attributes: Sex, Age, Region, etc.
        o Behavioral attributes: Shirt color (Red, Blue), Smokes? (Yes, No)
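
A dense 10^9 by 10^7 matrix would have 10^16 cells, so only the answers actually given can be stored. The sketch below is one possible in-memory layout, not the system described in the talk: the class name `SparseAnswers` and its methods are hypothetical, and the sample answers are made up. It keeps both a per-user and a per-feature index so that either direction can be scanned.

```python
from collections import defaultdict


class SparseAnswers:
    """Users x features matrix that stores only the answers that were given."""

    def __init__(self):
        self.rows = defaultdict(dict)     # user -> {feature: integer answer}
        self.columns = defaultdict(dict)  # feature -> {user: integer answer}

    def set(self, user, feature, answer):
        self.rows[user][feature] = answer
        self.columns[feature][user] = answer

    def get(self, user, feature):
        # None means "never asked or never answered", which differs from "No".
        return self.rows.get(user, {}).get(feature)

    def density(self):
        """Fraction of the user x feature grid that holds an answer."""
        filled = sum(len(answers) for answers in self.rows.values())
        cells = len(self.rows) * len(self.columns)
        return filled / cells if cells else 0.0


# Made-up answers: 1 = yes / red, 0 = no / blue.
m = SparseAnswers()
m.set("user1", "smokes", 1)
m.set("user1", "shirt_color", 0)
m.set("user2", "smokes", 0)
m.set("user3", "shirt_color", 1)
print(m.get("user2", "shirt_color"), round(m.density(), 3))
```

Keeping "unknown" distinct from "No" matters here, because the unknown cells are exactly what the learning task tries to predict.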
  15. The Learning Task - The Model
      Try to predict the answer to the “Smokes?” attribute based on other attributes: Dog owner? Jogger? Overweight?
  16. The Learning Task - Collaborative Filtering
      User correlation or feature correlation.
      User correlation: high-level features [William Cohen]
      • If Josie and Bob both have the X feature, then if Josie has the Y feature, Bob is more likely to have the Y feature as well.
      • Dog owners
      • Political inclination, Taste, Lifestyle
      Feature correlation:
      • If Josie has the X feature, Josie is more likely to also have the Y feature.
      • Joggers (y) and Smokers (n)
      • Favorite sports and Race/Ethnicity
      • Income level and Education level
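
Feature correlation can be illustrated by comparing the base rate of a feature with its rate among users who have another feature. The sketch below is a simple frequency estimate, not the model used in the talk; the helper names `base_rate` and `conditional_rate` are hypothetical and the respondents are made-up toy data. Missing answers are simply absent from a user's dictionary.

```python
def base_rate(users, y):
    """Share of users with feature y among those who answered y."""
    answered = [u for u in users if y in u]
    if not answered:
        return None
    return sum(1 for u in answered if u[y]) / len(answered)


def conditional_rate(users, x, y):
    """Estimate P(y is true | x is true) from users who answered both."""
    with_x = [u for u in users if x in u and y in u and u[x]]
    if not with_x:
        return None
    return sum(1 for u in with_x if u[y]) / len(with_x)


# Made-up toy data: joggers smoke less often than the overall population.
users = [
    {"jogger": True, "smoker": False},
    {"jogger": True, "smoker": False},
    {"jogger": True, "smoker": True},
    {"jogger": False, "smoker": True},
    {"jogger": False, "smoker": True},
    {"jogger": False, "smoker": False},
    {"smoker": True},  # jogger status unknown
]
print(base_rate(users, "smoker"), conditional_rate(users, "jogger", "smoker"))
```

In this toy sample the smoker rate drops from 4/7 overall to 1/3 among joggers, which is the kind of signal the "Joggers (y) and Smokers (n)" bullet points at.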
  17. First Experiments with Multiple Imputation
      Smaller task: complete missing data on a single survey for a single customer.
      Example: on a specific survey, some respondents skip some of the questions on income, while others skip the income level question. Use the answers provided by other respondents to impute the missing data.
      Imputation: complete missing data with substituted values, with more or less sophistication: mean, nearest neighbor, multiple imputation, etc. [Andridge & Little 2011], [Rubin 1987], ...
      Implementation: IBM SPSS Missing Values module. It uses iterative Markov Chain Monte Carlo (MCMC) and multiple imputation. Used by the US Census Bureau.
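
To make the idea of imputation concrete, here is a minimal sketch of the nearest-neighbor variant mentioned on the slide. It is a single-imputation stand-in, not the MCMC-based multiple imputation of the SPSS module; the function names, the distance measure (share of commonly answered questions that differ) and the survey rows are all assumptions made for illustration.

```python
def distance(a, b, skip):
    """Share of commonly answered questions (other than skip) that differ."""
    shared = [q for q in a
              if q != skip and a[q] is not None and b.get(q) is not None]
    if not shared:
        return float("inf")
    return sum(a[q] != b[q] for q in shared) / len(shared)


def impute_nearest(respondents, question):
    """Fill missing answers to question from the most similar respondent."""
    donors = [r for r in respondents if r.get(question) is not None]
    completed = []
    for r in respondents:
        filled = dict(r)  # copy, so the original survey data is untouched
        if r.get(question) is None and donors:
            nearest = min(donors, key=lambda d: distance(r, d, question))
            filled[question] = nearest[question]
        completed.append(filled)
    return completed


# Made-up survey: None marks a skipped question.
survey = [
    {"age_band": 1, "owns_dog": 1, "income": 2},
    {"age_band": 3, "owns_dog": 0, "income": 4},
    {"age_band": 3, "owns_dog": 0, "income": None},
    {"age_band": 1, "owns_dog": 1, "income": None},
]
completed = impute_nearest(survey, "income")
print([r["income"] for r in completed])
```

Multiple imputation differs in that it draws several plausible values per missing cell instead of copying one, so the uncertainty of the fill-in is carried into later analysis.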
  18. First Experiments with Multiple Imputation: Some Results
      Where it does not work:
      • Too much missing data (over 10%)
      • Too many possible answers (what are the names of your children? what is your home city? etc.)
      • Not enough data overall (less than 1,000)
      Examples of features that work well: Dog owners, Smokers, Income level, Age (3 bands), etc.
      Accuracy: 85% using blind tests.
  19. Current Work
      Currently working on the storage component in AWS using HBase, Elasticsearch and Hadoop.
      Some queries:
      • Find people who smoke, have a red shirt and are between 22 and 34.
      • Compute and store the similarity or correlation between any pair of users.
      • Compute and store the similarity between features.
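
The first two queries can be sketched in memory as follows. This is only an illustration of what the queries compute, not of the HBase, Elasticsearch or Hadoop implementation; the helper names, the choice of Jaccard similarity over (attribute, value) pairs, and the profiles are assumptions.

```python
from itertools import combinations


def find_users(profiles, **criteria):
    """Return ids of users whose profile satisfies every criterion.

    A criterion is an exact value or a (low, high) inclusive range.
    Users with a missing attribute never match.
    """
    matches = []
    for user_id, attrs in profiles.items():
        ok = True
        for name, wanted in criteria.items():
            value = attrs.get(name)
            if value is None:
                ok = False
            elif isinstance(wanted, tuple):
                ok = wanted[0] <= value <= wanted[1]
            else:
                ok = value == wanted
            if not ok:
                break
        if ok:
            matches.append(user_id)
    return matches


def jaccard(a, b):
    """Shared (attribute, value) pairs divided by all distinct pairs."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def pairwise_similarity(profiles):
    """Similarity for every unordered pair of users."""
    return {(u, v): jaccard(profiles[u], profiles[v])
            for u, v in combinations(sorted(profiles), 2)}


# Made-up profiles.
profiles = {
    "u1": {"smokes": True, "shirt_color": "red", "age": 29},
    "u2": {"smokes": True, "shirt_color": "red", "age": 41},
    "u3": {"smokes": False, "shirt_color": "red", "age": 29},
    "u4": {"smokes": True, "shirt_color": "blue"},
}
hits = find_users(profiles, smokes=True, shirt_color="red", age=(22, 34))
similarity = pairwise_similarity(profiles)
print(hits, similarity[("u1", "u2")])
```

The number of user pairs grows quadratically (n(n-1)/2), so an all-pairs table like `pairwise_similarity` is only workable on small samples; at the scale of the full panel some form of pruning or approximation would be needed.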
  20. Future Work
      • Define the model: binary features (smokes), integers (number of children, income), strings (city, diseases, etc.).
      • Experiment on a large scale with collaborative filtering algorithms and others.
      • Experiment with user-based and feature-based filtering (blend? Slope One?)
      • Integrate this into the targeting methodology.
  21. Q&A: Suggestions? Ideas? Comments? Questions?