
Audience Projection of Target Consumers over Multiple Domains: a NER and Bayesian Approach, Gianmario Spacagna, Alberto Pirovano



Traditional market research is generally conducted through questionnaires or other forms of explicit feedback, posed directly to an ad hoc panel of individuals who, in aggregate, are representative of a larger group of people. Unfortunately, these traditional approaches are often invasive, hard to scale, and biased. Indirect approaches based on sparse and implicit consumer feedback (e.g., social network interactions, web browsing, or online purchases) are more scalable, more authentic, and better suited to real-time consumer insights.

Although those sources of implicit consumer feedback provide relevant and detailed pictures of the population, they individually provide only a limited set of observable behaviors.
The Holy Grail of market research is the ability to merge different sources of consumer interests into an augmented view that connects all the dots across multiple domains.

Unfortunately, user-centric "fusion" algorithms show many limitations when the datasets are heterogeneous, differ strongly in size and density, or when the number of sources to merge increases.

We propose a novel approach, Audience Projection, which defines a target audience as a subset of the population in a source domain and projects this target onto a set of users in a destination dataset.
We will show how libraries such as spaCy provide Deep Learning implementations of Named Entity Recognition (NER) to match related brands, and we will use Bayesian Inference to transfer knowledge from the source domain. This way, we can estimate the probability that a user belongs to the target, using the source distribution of the volume of interests on common entities as model evidence and the source target size as prior probability.


Gianmario Spacagna is the chief scientist and head of AI at Helixa. His team’s mission is building the next generation of behavior algorithms and models of human decision making with careful attention to their potential and effects on society. His experience covers a diverse portfolio of machine learning algorithms and data products across different industries. Previously, he worked as a data scientist in IoT automotive (Pirelli Cyber Technology), retail and business banking (Barclays Analytics Centre of Excellence), threat intelligence (Cisco Talos), predictive marketing (AgilOne), plus some occasional freelancing. He’s a co-author of the book Python Deep Learning, contributor to the “Professional Manifesto for Data Science,” and founder of the Data Science Milan community. Gianmario holds a master’s degree in telematics (Polytechnic of Turin) and software engineering of distributed systems (KTH of Stockholm). After having spent half of his career abroad, he now lives in Milan. His favorite hobbies include home cooking, hiking, and exploring the surrounding nature on his motorcycle.


  1. 1. Helixa Audience Projection of Target Consumers over Multiple Domains: a NER and Bayesian approach Gianmario Spacagna Chief Scientist @ Helixa O’Reilly AI Conference London, 16th October 2019
  2. 2. About Me: 7+ years of experience in Data Science and Machine Learning; currently leading a team of ML Scientists and ML Engineers; background in Telematics and Software Engineering of Distributed Systems; ongoing MBA student; co-author of Python Deep Learning; contributor to the Professional Data Science Manifesto; blogger at Data Science Vademecum; founder of the Data Science Milan community (1.4k members); Stockholm, London, Milan. Gianmario Spacagna, Chief Scientist, Helixa
  3. 3. Helixa is a Market Research platform that uses AI to integrate disparate data sources into an enriched view of the consumers who matter to your business. DEMOGRAPHICS: HHI < 40K, Female, 18 - 24. INFLUENCERS: ODESZA, Cardi B, Shane Dawson, James Charles. INTERESTS: Listen to Podcasts, Kylie Cosmetics Fan, Starbucks, Chipotle. PSYCHOGRAPHICS: Fast Food Fans, Fashion Enthusiasts, Entertainment Junkies.
  4. 4. In the next 40 minutes... OUR GOAL: Discuss some of the current challenges of traditional market research and propose a novel solution based on Named Entity Recognition (NER) and Bayesian Inference.
  5. 5. Challenges in Market Research
  6. 6. Applied Social Science What is Market Research? Gain Insights for Strategic Decisions Information about individuals and organizations Statistical Inference
  7. 7. Why Does Market Research Matter? Brand Perceptions, Consumer Preferences and Behaviors, Buyer Personas, Market Segmentation, Identify Opportunities, Market Trends
  8. 8. Approaches to Market Research. Qualitative: Opinions and individual experiences, In-depth interviews, Smaller sample. Quantitative: Numbers and Data, Statistics, Larger sample.
  9. 9. Quantitative Market Research is conducted with Surveys: Define, Analyze, Distribute, Collect, Design
  10. 10. Limitations of Surveys Expensive Invasive Response Bias Predefined questions Narrow coverage
  11. 11. Market Research using “Implicit Consumer Feedback”: Define, Analyze, Distribute, Collect, Design vs. e.g. Social Listening
  12. 12. Twitter Interactions Inferring Interests from Twitter Interactions
  13. 13. Advantages of Implicit Consumer Feedback Approaches: Flexible costs, Wide view, Opportunities for Big Data and AI, Mass coverage, Spontaneous
  14. 14. What about other information? Twitter Interactions Amazon Purchases ? Beer Consumption Brand ?
  15. 15. The Universe of Consumers Datasets Social Media Financial and Properties Behaviors First Party (CSM) Consumer Research Surveys
  16. 16. Individual Consumer Datasets are Far From Being Exhaustive: SCATTERED, PARTIAL, SKEWED (chart axes: MALE / FEMALE by age group 18-30, 31-43, 44-56, 57-70)
  17. 17. The Holy Grail of Market Research: ALL IN ONE, COMPLETE, REPRESENTATIVE (chart axes: MALE / FEMALE by age group 18-30, 31-43, 44-56, 57-70)
  18. 18. What is the baseline algorithm for “completing” datasets?
  19. 19. Look-alike Fusion
  20. 20. What is look-alike fusion? Left: Social Network Panel Right: Consumptions Survey Panel
  21. 21. Assignment Optimization Problem. Well-known solutions: Hungarian method, Simplex, Auction algorithm
  22. 22. Datasets Fusion (figure: matrix of Left Users and Right Users over left-only entities and right-only entities, with the Target Audience highlighted)
  23. 23. Look-alike Fusions Require a Main Panel Centrality
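To make the assignment formulation concrete, here is a minimal sketch that pairs each left user with a distinct right user so that the total distance is as small as possible. It uses exhaustive search over permutations rather than the Hungarian, Simplex, or Auction methods named on the slide, so it only works on toy inputs. The user vectors, the Hamming distance, and all function names are hypothetical.

```python
from itertools import permutations

def hamming(a, b):
    """Number of entity flags on which two users disagree."""
    return sum(x != y for x, y in zip(a, b))

def best_assignment(left, right):
    """Exhaustively find the one-to-one pairing with the lowest total distance."""
    best_cost, best_perm = None, None
    for perm in permutations(range(len(right)), len(left)):
        cost = sum(hamming(left[i], right[j]) for i, j in enumerate(perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost

# Hypothetical users described by three binary entity flags.
left_users = [(1, 0, 1), (0, 1, 0), (1, 1, 0)]
right_users = [(0, 1, 0), (1, 1, 1), (1, 0, 1)]
pairing, total_cost = best_assignment(left_users, right_users)
```

Here `pairing[i]` is the index of the right user matched to left user `i`. The factorial cost of the search is the reason polynomial-time solvers are used in practice.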
  24. 24. Look-alike Fusions Don’t Scale Well Differences in feature space Craftsmanship required at each change of data Universal objective function to optimize
  25. 25. Is there a more scalable way to “fuse” datasets?
  26. 26. The Audience Projection
  27. 27. Audience Projection defined as “User Binary Classification” Source: Social Network Panel Destination: Consumptions Survey Panel 70M Social accounts 200M U.S. consumers 1.6M / 26M / TRUE FALSE TRUE FALSE Target Audience = PROJECTION Ben & Jerry’s: bought in last 6 months? Affinity: 1.80x Venmo: paid in last 30 days? Affinity: 1.6x Angry Orchard: drunk in last 6 months? Affinity: 1.50x
  28. 28. Solution = Named Entity Recognition (NER) + Bayesian Model Social Pages Consumption Questions NER NER BAYESIAN MODEL ENTITY LINKING (NEL) Destination: Consumptions Survey Panel Source: Social Network Panel Projected Users Probabilities Target Audience
  29. 29. Entities Represent a Universal Feature Space: Social Pages (NER), Consumption Questions (NER), Listed Products (NER)
  30. 30. Named Entity Recognition (NER) in each Domain: Social Pages, Consumption Questions, Listed Products. Examples: “The Coca-Cola Company is a total beverage company, offering over 500 brands in more than 200 countries and territories.”; “Adidas Originals Men's Relaxed Strapback Cap”; “Coca-Cola KWC-4 6-Can Personal Mini 12V DC Car and 110V AC Cooler, Red”
  31. 31. NLP Libraries with NER capability: Polyglot, DeepPavlov
  32. 32. Why spaCy for Production? Fast, Accurate, Industry-grade maturity
  33. 33. spaCy: example of NER usage
  34. 34. Same Entity May Exist with Different Spellings Interacted with Coca-Cola Company on Social Networks “Have you consumed Coca-Cola last week?”
  35. 35. Linking and Normalizing Entities via Entity Relationship
  36. 36. Normalized Entities Mean a Common Feature Space
  37. 37. Stacked Heterogeneous Feature Space (figure: Source Users and Destination Users stacked over source-only entities, common entities, and destination-only entities; cells marked X or ?; labels: Latent interests, Target Audience)
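As an illustration of how normalized entities produce a common feature space, the sketch below maps different spellings of the same brand to one canonical name and then intersects the entities of two domains. The slides link entities via entity relationships; the hand-written alias table used here is a simplified stand-in, and every name in it is hypothetical.

```python
import re

# Hypothetical alias table: cleaned surface form -> canonical entity.
ALIASES = {
    "coca cola": "Coca-Cola",
    "coca cola company": "Coca-Cola",
    "the coca cola company": "Coca-Cola",
}

def normalize(mention):
    """Lowercase, collapse punctuation to spaces, then look up the canonical name."""
    key = re.sub(r"[^a-z0-9]+", " ", mention.lower()).strip()
    return ALIASES.get(key, mention)

# Entities recognized in each domain (illustrative values).
social = {normalize(m) for m in ["Coca-Cola Company", "Adidas Originals"]}
survey = {normalize(m) for m in ["Coca-Cola", "Venmo"]}
common_entities = social & survey
```

Only the entities that survive the intersection can carry information from the source to the destination.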
  38. 38. Common Entities translate Source to Destination Source: Social Network Panel Destination: Consumptions Survey Panel Target Audience = Common Entities ?Bayesian Model Source Target Size 1.6M / 70M = 2.3% Share of Interests
  39. 39. “Share of interests” encodes the DNA of the Target Audience. Global share of interests on each common entity: 100%. Target audience share of interests (the Target Audience slice): 50%, 17%, 50%.
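One possible reading of the share of interests is, for each common entity, the fraction of interested source users who belong to the target audience. The sketch below computes it under that assumption; the slides do not spell out the exact definition, and the user IDs and interests are made up.

```python
def share_of_interests(interests, target_ids):
    """For each entity, the fraction of interested users who belong to the target."""
    totals, in_target = {}, {}
    for user, entities in interests.items():
        for entity in entities:
            totals[entity] = totals.get(entity, 0) + 1
            if user in target_ids:
                in_target[entity] = in_target.get(entity, 0) + 1
    return {e: in_target.get(e, 0) / n for e, n in totals.items()}

# Hypothetical source panel: user -> entities the user interacted with.
source_interests = {
    "u1": {"McDonald's", "Venmo"},
    "u2": {"McDonald's"},
    "u3": {"Venmo"},
    "u4": {"McDonald's", "Venmo"},
}
shares = share_of_interests(source_interests, target_ids={"u1", "u3"})
```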
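  40. 40. Bayesian Model: Posterior Probability of the user belonging to the projected target given the Share of Interests on common entities. $P(u \in T \mid S) = \frac{P(S \mid u \in T) \cdot P(u \in T)}{P(S)}$, where $u$ is the user, $T$ the target audience, and $S$ the share of interests on common entities. Likelihood: $P(S \mid u \in T)$. Prior: $P(u \in T)$ = Source Target Size = 2.3%. Evidence: $P(S)$.
  41. 41. Evidence Decomposition: $P(S) = P(S \mid u \in T) \cdot P(u \in T) + P(S \mid u \notin T) \cdot P(u \notin T)$
  42. 42. Marginal Positive Likelihood: $P(s_e \mid u \in T) \approx$ Binomial distribution with $p = 17\%$, where $s_e$ is the share of interests on a single common entity $e$.
  43. 43. Joint Likelihood under Naive Assumption: $P(s_1, s_2, s_3 \mid u \in T) = P(s_1 \mid u \in T) \cdot P(s_2 \mid u \in T) \cdot P(s_3 \mid u \in T)$, illustrated with shares of 50%, 17%, and 50% on three common entities.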
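Bayes' rule, with the evidence split into a target branch and a non-target branch and with per-entity likelihoods multiplied under the naive independence assumption, can be sketched as follows. The per-entity likelihood values are hypothetical placeholders; only the prior (1.6M / 70M, about 2.3%) comes from the slides. The computation runs in log space because products of many small likelihoods underflow quickly.

```python
import math

def posterior(prior, pos_likelihoods, neg_likelihoods):
    """P(user in target | evidence), multiplying per-entity likelihoods naively."""
    log_pos = math.log(prior) + sum(math.log(p) for p in pos_likelihoods)
    log_neg = math.log(1 - prior) + sum(math.log(p) for p in neg_likelihoods)
    # Evidence = target branch + non-target branch; subtract the max for stability.
    m = max(log_pos, log_neg)
    evidence = math.exp(log_pos - m) + math.exp(log_neg - m)
    return math.exp(log_pos - m) / evidence

prior = 1.6e6 / 70e6  # source target size from the slides, about 2.3%
# Hypothetical likelihoods of one user's interests on two common entities.
prob = posterior(prior, pos_likelihoods=[0.6, 0.3], neg_likelihoods=[0.2, 0.3])
```

With these placeholder values the evidence favors the target branch, so the posterior ends up above the prior.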
  44. 44. Predicted Probabilities provide Insights on the Projected Users. Target Audience, PROJECTION, Projected Users Probabilities, Insights on Destination Variables. Destination variables and Affinity: TeenNick 8.9x, Robot Chicken 7.27x, Bob’s Burgers 2.36x, Ben & Jerry’s 1.80x, Venmo 1.62x, Angry Orchard 1.55x, Nintendo DSi XL 1.47x, Video Games 1.45x, Audio or Video Chat 1.23x
  45. 45. Audience Projection In a Nutshell: Social Panel, Common Entities, Consumptions Survey Panel, Bayesian Model, Target Audience. Affinity: 1.80x, Affinity: 1.55x, Affinity: 1.62x
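The slides do not define how affinity is computed. One plausible reading, assumed here, is a lift: the probability-weighted rate of a yes/no destination variable among projected users divided by its rate over the whole destination panel. The probabilities and answers below are hypothetical.

```python
def affinity(probabilities, answers):
    """Lift of a yes/no destination variable among projected users vs. the whole panel."""
    weighted_rate = sum(p * a for p, a in zip(probabilities, answers)) / sum(probabilities)
    base_rate = sum(answers) / len(answers)
    return weighted_rate / base_rate

# Hypothetical destination users: projected probability and survey answer (1 = yes).
probs = [0.9, 0.6, 0.1, 0.4]
bought = [1, 1, 0, 0]
lift = affinity(probs, bought)
```

A value above 1 means the projected audience over-indexes on the variable compared with the average panel member.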
  46. 46. Cool! How do you know this is accurate?
  47. 47. Evaluation Techniques
  48. 48. Binary Classifier Evaluation Bayesian Model Projected Users Probabilities Ground Truth Evaluation techniques ?
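Once a ground truth is available, the projected probabilities can be scored like any binary classifier. The slides do not name a specific metric; ROC AUC is one common choice and is assumed here purely for illustration, with made-up scores and labels.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

predicted = [0.9, 0.7, 0.4, 0.2]   # hypothetical projected probabilities
ground_truth = [1, 0, 1, 0]        # hypothetical target membership
auc = roc_auc(predicted, ground_truth)
```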
  49. 49. Validate via Common Entities (figure: Source Users and Destination Users over common entities; labels: Target Audience = OR query, Projected Audience = OR query, Exact Query Replica, Ground Truth)
  50. 50. Validate via Self Reconstruction Within the Same Domain (figure: Source Users and Destination Users over source-only entities, common entities, and destination-only entities; labels: Target Audience, Ground Truth)
  51. 51. Validate via Double-step Reconstruction PROJECTION PROJECTION Predicted probabilities Ground Truth
  52. 52. Repeat Test Cases Stratifying by Category
  53. 53. Demographics Skewness PROJECTION
  54. 54. Golden Benchmarks Comparison on Aggregated Insights
  55. 55. Opportunities
  56. 56. Many Linked Views of the Same Global Population Audience Projection
  57. 57. Multiple Perspectives Reinforce Reliability Social Panel Target Audience = Interacted with Game Informer social page Affinity: 2.17x Have you read any Game Informer issue? Affinity: 1.73x Game Informer Single Issue Magazine purchased online Affinity: 2.51x
  58. 58. Generalize Audience Projection as a Domain Adaptation Problem
  59. 59. Final Remarks
  60. 60. Many Datasets but only Partial Views
  61. 61. Look-alike fusions don’t scale well
  62. 62. Audience Projection adapts to any “entity domain” Bayesian Model
  63. 63. Accuracy and Biases can be quantified
  64. 64. Strategists now have a complete view of their Target Audience
  65. 65. Gianmario Spacagna, Chief Scientist at Helixa, @gm_spacagna
  66. 66. Appendix A: The spaCy NER Model
  67. 67. Natural Language Processing (NLP) Pipeline "Mark Watney visited Mars"
  68. 68. The spaCy NER Model Overview EMBED ENCODE ATTEND PREDICT
  69. 69. Embedding Word Features. Columns: token | lower | prefix | suffix | shape. Rows: Apple | apple | app | ple | Wwwww; U.K. | uk | uk | uk | W.W.; Fahrenheit 451 | fahrenheit 451 | fah | 451 | Wwwwwwwwww ddd. Each word (token) is represented by concatenating the embeddings of all 4 features in order to generalize the context for unknown words.
  70. 70. Efficiently Embedding Words: Hash Embedding reduces the dimensionality and makes it possible to handle large vocabularies
  71. 71. Encoding Sequences of Words: Residual Convolutional Neural Networks encode context-independent word vectors into a context-sensitive sentence matrix. Raw tri-gram chunk “Mark Watney visited” from “Mark Watney visited Mars”, Enriched tri-gram matrix
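The four per-token features can be sketched in a few lines. This follows the notation of the slide's table (W for uppercase, w for lowercase, d for digits, three-character prefix and suffix) and reproduces the “Apple” and “Fahrenheit 451” rows; the punctuation handling of the “U.K.” row is not reproduced. The function names are hypothetical and this is not spaCy's own feature extraction code.

```python
def shape(token):
    """Map uppercase to W, lowercase to w, digits to d; keep everything else."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("W")
        elif ch.islower():
            out.append("w")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)

def word_features(token):
    """Return the four features that get embedded and concatenated."""
    lower = token.lower()
    return {"lower": lower, "prefix": lower[:3], "suffix": lower[-3:], "shape": shape(token)}

features = word_features("Apple")
```

Because prefix, suffix, and shape are shared across many words, an unseen token still lands near tokens with a similar surface form.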
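A hash embedding replaces a one-row-per-word table with a small fixed table indexed by hashes of the feature string. The sketch below sums the rows selected by several seeds, so two features that collide on one seed are unlikely to collide on all of them. The use of MD5, the number of seeds, and the tiny table are illustrative assumptions, not spaCy's actual implementation.

```python
import hashlib

def bucket(feature, seed, n_rows):
    """Deterministically map a string feature to a table row for a given seed."""
    digest = hashlib.md5(f"{seed}:{feature}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_rows

def hash_embed(feature, table, n_seeds=4):
    """Sum the rows selected by several hash seeds into one vector."""
    vector = [0.0] * len(table[0])
    for seed in range(n_seeds):
        row = table[bucket(feature, seed, len(table))]
        vector = [v + r for v, r in zip(vector, row)]
    return vector

# Tiny hypothetical table: 8 rows of width 3, far fewer rows than a real vocabulary.
table = [[float(i), float(i) / 2, 1.0] for i in range(8)]
vec = hash_embed("apple", table)
```

The table size is fixed up front, so memory does not grow with the vocabulary, and any string, including one never seen in training, maps to some vector.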
  72. 72. Crafting the Attention Vector The attention vector of the trigram includes information on the encountered entities. “Mark Watney visited Mars” Attention vector Tri-gram matrix Enriched tri-gram vector
  73. 73. Predicting the Recognized Entities. Available actions: SHIFT, OUT, REDUCE (Entity Tagging). Example on “Mark Watney visited Mars”: 1. SHIFT 2. SHIFT 3. REDUCE (PER) 4. OUT 5. SHIFT 6. REDUCE (LOC), yielding the entities “Mark Watney” and “Mars”. (Figure labels: Stack, Buffer, Segment, Enriched tri-gram vector, Update attention, Attention vector, Tri-gram matrix)
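The action sequence on this slide can be replayed with a small stack-and-buffer simulator. This is a simplified sketch of the transition scheme as the slide presents it; in the real model a neural network predicts each action, whereas here the actions are given. The function name and data layout are hypothetical.

```python
def run_actions(tokens, actions):
    """Replay SHIFT / OUT / REDUCE actions and collect labelled entity spans."""
    buffer = list(tokens)
    stack, entities = [], []
    for action in actions:
        if action == "SHIFT":      # move the next token onto the stack
            stack.append(buffer.pop(0))
        elif action == "OUT":      # the next token is outside any entity
            buffer.pop(0)
        else:                      # ("REDUCE", label): close the current entity
            _, label = action
            entities.append((" ".join(stack), label))
            stack = []
    return entities

tokens = ["Mark", "Watney", "visited", "Mars"]
actions = ["SHIFT", "SHIFT", ("REDUCE", "PER"), "OUT", "SHIFT", ("REDUCE", "LOC")]
entities = run_actions(tokens, actions)
```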
  74. 74. Official Explanation of spaCy NER Model
  75. 75. Appendix B: The Bayesian Model
  76. 76. Projecting the Share of Interests on Common Entities Target Audience Projection 50% 17% 50% Share of Interests: SIZE: 60M SIZE: 200M SIZE: ? SIZE: 40M Global Audience (average American) = Target Audience evidence prior
  77. 77. Evidence Statistics on Share of Interests: N = 180M users in U.S. population; sampling rate = 1 : 10k; n = 18k users in sample panel; p = 17% of market penetration; x = 3k expected projected users. SIZE: 200M, SIZE: 40M. statistics: evidence
  78. 78. Binomial Positive Likelihood, P(share of interests | user ∈ target): the probability of selecting 3000 / 18000 McDonald’s panel users given that the user IS part of the target. With p = 17%, the case n = 17999, x = 2999 gives a log-probability of -5.56323, which is smaller than the -5.54342 obtained for n = 18000, x = 3000.
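  79. 79. Binomial Negative Likelihood, P(share of interests | user ∉ target): the probability of selecting 3000 / 18000 McDonald’s panel users given that the user IS NOT part of the target. With p = 17%, the case n = 17999, x = 3000 (all 3000 selected users come from the remaining 17999) gives a log-probability of -5.53942, which is greater than the -5.54342 obtained for n = 18000, x = 3000.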
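The log-probabilities quoted on the last two slides can be checked with the binomial probability mass function evaluated in natural-log space. The sketch below uses only the standard library. The negative case is evaluated at n = 17999, x = 3000, which is the combination that reproduces the quoted value of -5.53942. The function name is hypothetical.

```python
import math

def binom_logpmf(x, n, p):
    """Natural log of the Binomial(n, p) probability of exactly x successes."""
    log_comb = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_comb + x * math.log(p) + (n - x) * math.log(1 - p)

p = 0.17
baseline = binom_logpmf(3000, 18000, p)  # nothing known about the user
positive = binom_logpmf(2999, 17999, p)  # user is in the target: 2999 left to select
negative = binom_logpmf(3000, 17999, p)  # user is not: all 3000 come from the others
```

The ordering positive < baseline < negative reflects that 3000 / 18000 is slightly below 17%: knowing that a McDonald’s panel user is in the target makes the observed count marginally less likely than knowing the user is not.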