Helixa
Audience Projection of Target Consumers over
Multiple Domains: a NER and Bayesian approach
Gianmario Spacagna
Chief Scientist @ Helixa
O’Reilly AI Conference
London, 16th October 2019
About Me
7+ years experience in Data Science and Machine Learning
Currently leading a team of ML Scientists and ML Engineers
Background in Telematics and Software Engineering of Distributed Systems
Ongoing MBA Student
Co-author of Python Deep Learning
Contributor of the Professional Data Science Manifesto
Blogger of Data Science Vademecum
Founder of the Data Science Milan community (1.4k members)
Stockholm, London, Milan
Gianmario Spacagna
Chief Scientist, Helixa
gspacagna@helixa.ai
DEMOGRAPHICS
HHI < 40K
Female
18 - 24
INFLUENCERS
ODESZA, Cardi B
Shane Dawson, James Charles
Helixa is a market research platform that uses AI to integrate disparate
data sources into an enriched view of the consumers who matter to your
business.
INTERESTS
Listen to Podcasts, Kylie Cosmetics Fan, Starbucks, Chipotle
PSYCHOGRAPHICS
Fast Food Fans, Fashion Enthusiasts, Entertainment Junkies
In the next 40 minutes...
OUR GOAL:
Discuss some of the current challenges of traditional market
research and propose a novel solution based on Named Entity
Recognition (NER) and Bayesian Inference.
Challenges in Market Research
Applied Social Science
What is Market Research?
Gain Insights for Strategic Decisions
Information about
individuals and organizations Statistical Inference
Why Does Market Research Matter?
Brand Perceptions
Consumer Preferences and Behaviors
Buyer Personas
Market Segmentation
Identify Opportunities
Market Trends
Approaches to Market Research
Qualitative
Opinions and individual experiences
In-depth interviews
Smaller sample
Quantitative
Numbers and Data
Statistics
Larger sample
Quantitative Market Research is conducted with Surveys
Define
Analyze
Distribute
Collect
Design
Limitations of Surveys
Expensive
Invasive
Response Bias
Predefined questions
Narrow coverage
Market Research using “Implicit Consumer Feedback”
Define
Analyze
Distribute
Collect
Design
vs.
e.g. Social Listening
Twitter Interactions
Inferring Interests from Twitter Interactions
Advantages of Implicit Consumer Feedback Approaches
Flexible costs
Wide view
Opportunities for Big Data and AI
Mass coverage
Spontaneous
What about other information?
Twitter Interactions
Amazon Purchases
?
Beer Consumption Brand
?
The Universe of Consumer Datasets
Social Media
Financial and
Properties
Behaviors
First Party
(CSM)
Consumer
Research
Surveys
Individual Consumer Datasets are Far From Being Exhaustive
SCATTERED, PARTIAL, SKEWED
[Figure: population breakdown by gender (MALE, FEMALE) and age band (18-30, 31-43, 44-56, 57-70)]
The Holy Grail of Market Research
ALL IN ONE, COMPLETE, REPRESENTATIVE
[Figure: population breakdown by gender (MALE, FEMALE) and age band (18-30, 31-43, 44-56, 57-70)]
What is the baseline
algorithm for
“completing”
datasets?
Look-alike Fusion
What is look-alike fusion?
Left:
Social Network Panel
Right:
Consumptions Survey Panel
Assignment Optimization Problem
Well-known solutions:
● Hungarian method
● Simplex
● Auction algorithm
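To make the assignment formulation concrete, here is a minimal sketch that matches each left user to exactly one right user so that the total dissimilarity is minimal. It uses exhaustive search, which is only a stand-in for the Hungarian, simplex, or auction methods named on the slide and is feasible only for tiny inputs. The cost matrix is hypothetical.

```python
from itertools import permutations


def best_assignment(cost):
    """Return the one-to-one matching (row i -> column perm[i]) with minimum total cost.

    Exhaustive search over all permutations: O(n!) and therefore only for toy inputs.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


# Hypothetical dissimilarities between 3 left users (rows) and 3 right users (columns).
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = best_assignment(cost)
print(assignment, total)
```

In practice a polynomial-time solver would replace the brute-force loop; the point here is only the shape of the problem: a cost matrix in, a one-to-one matching out.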
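Datasets Fusion
[Figure: user-by-entity matrix pairing a Left User with a Right User; columns are grouped into left-only entities and right-only entities, X marks an observed interaction, and the Target Audience is marked on the matrix]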
Look-alike Fusion Requires a Central Main Panel
Look-alike Fusions Don’t Scale Well
Differences in feature
space
Craftsmanship required
at each change of data
Universal objective
function to optimize
Is there a more
scalable way to
“fuse” datasets?
The Audience Projection
Audience Projection defined as “User Binary Classification”
Source:
Social Network Panel
Destination:
Consumptions Survey Panel
70M
Social accounts
200M
U.S. consumers
1.6M / 26M /
TRUE
FALSE
TRUE
FALSE
Target
Audience
=
PROJECTION
Ben & Jerry’s: bought in
last 6 months?
Affinity: 1.80x
Venmo: paid in last 30 days?
Affinity: 1.6x
Angry Orchard: drunk in
last 6 months?
Affinity: 1.50x
Solution = Named Entity Recognition (NER) + Bayesian Model
Social
Pages
Consumption
Questions
NER NER
BAYESIAN MODEL
ENTITY LINKING (NEL)
Destination:
Consumptions Survey Panel
Source:
Social Network Panel
Projected Users
Probabilities
Target
Audience
Entities Represent a Universal Feature Space
Social
Pages
Consumption
Questions
Listed
Products
NER NER NER
The Coca-Cola Company is a total beverage
company, offering over 500 brands in more
than 200 countries and territories.
Named Entity Recognition (NER) in each Domain
Social
Pages
Consumption
Questions
Listed
Products
Adidas Originals Men's Relaxed Strapback Cap
Coca-Cola KWC-4 6-Can Personal Mini 12V DC Car and 110V
AC Cooler, Red
NLP Libraries with NER capability
Polyglot
DeepPavlov
Why spaCy for Production?
Fast
Accurate
Industry-grade maturity
example of NER usage
Same Entity May Exist with Different Spellings
Interacted with
Coca-Cola Company on
Social Networks
“Have you consumed
Coca-Cola last week?”
Linking and Normalizing Entities via
en.wikipedia.org/wiki/Coca-Cola
en.wikipedia.org/wiki/The_Coca-Cola_Company
Entity
Relationship
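The linking and normalization step can be sketched as two lookups: a mention is first linked to a Wikipedia page, then a page-to-page relationship collapses related pages onto one normalized entity. Both tables below are hypothetical and hand-written, and the direction of normalization (company onto product) is an assumption made only for this illustration.

```python
# Hypothetical entity-linking table: lower-cased mention -> Wikipedia page.
LINKS = {
    "coca-cola company": "en.wikipedia.org/wiki/The_Coca-Cola_Company",
    "the coca-cola company": "en.wikipedia.org/wiki/The_Coca-Cola_Company",
    "coca-cola": "en.wikipedia.org/wiki/Coca-Cola",
}

# Hypothetical entity relationships: page -> page it is normalized to.
RELATED = {
    "en.wikipedia.org/wiki/The_Coca-Cola_Company": "en.wikipedia.org/wiki/Coca-Cola",
}


def normalize(mention):
    """Link a mention to a page, then follow the relationship table if one applies."""
    page = LINKS.get(mention.strip().lower())
    if page is None:
        return None  # unknown mention: no link available
    return RELATED.get(page, page)


social_entity = normalize("Coca-Cola Company")  # from a social page
survey_entity = normalize("Coca-Cola")          # from a survey question
print(social_entity == survey_entity)
```

Once both domains resolve to the same identifier, the social interaction and the survey answer can sit in the same column of a shared feature space.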
Normalized Entities means a Common Feature Space
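Stacked Heterogeneous Feature Space
[Figure: user-by-entity matrix stacking Source Users on top of Destination Users; columns are grouped into source-only entities, common entities, and destination-only entities; X marks an observed interaction and ? marks an unobserved (latent) interest; the Target Audience is marked on the matrix]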
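A minimal sketch of how such a stacked matrix could be assembled from two panels of normalized entities. The panels, user ids, and the assignment of entities to each panel are hypothetical. `None` plays the role of the "?" cells: the entity is not observable in that user's domain.

```python
# Hypothetical toy panels: user id -> set of normalized entities.
source = {
    "s1": {"TeenNick", "Coca-Cola"},
    "s2": {"Robot Chicken", "Starbucks"},
}
destination = {
    "d1": {"Coca-Cola", "Venmo"},
    "d2": {"Starbucks", "Angry Orchard"},
}


def entity_universe(panel):
    """All entities observed for at least one user of the panel."""
    return set().union(*panel.values())


src_entities = entity_universe(source)
dst_entities = entity_universe(destination)
common = src_entities & dst_entities

# Column order: source-only, then common, then destination-only entities.
columns = sorted(src_entities - common) + sorted(common) + sorted(dst_entities - common)


def stacked_rows(panel, observable):
    """1/0 for entities observable in this domain, None for unobservable ones."""
    rows = {}
    for user, entities in panel.items():
        rows[user] = [
            (1 if c in entities else 0) if c in observable else None
            for c in columns
        ]
    return rows


matrix = {**stacked_rows(source, src_entities), **stacked_rows(destination, dst_entities)}
for user, row in matrix.items():
    print(user, row)
```

Only the common columns are filled for every row, which is why they are the bridge used to translate an audience from source to destination.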
Common Entities translate Source to Destination
Source:
Social Network Panel
Destination:
Consumptions Survey Panel
Target
Audience
=
Common Entities
?
Bayesian Model
Source Target Size: 1.6M / 70M = 2.3%
Share of Interests
“Share of Interests” Encodes the DNA of the Target Audience
Global share of interests on Common Entities: 100%
Target audience share of interests on Common Entities: 50%, 17%, 50%
[Figure: for each common entity, the Target Audience slice of the global audience]
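The slide does not spell out how the share of interests is computed. The sketch below assumes one plausible reading: for each common entity, the share is the fraction of that entity's global audience that falls inside the target audience slice. The user ids and entity names are hypothetical and chosen to reproduce the 50%, 17%, 50% pattern.

```python
def share_of_interests(entity_fans, target_users):
    """Fraction of each entity's fans that belong to the target audience.

    Assumed definition for illustration; entities without fans are skipped.
    """
    return {
        entity: len(fans & target_users) / len(fans)
        for entity, fans in entity_fans.items()
        if fans
    }


# Hypothetical source panel: the target audience and the fans of three common entities.
target_users = {"u1", "u2", "u3"}
entity_fans = {
    "entity_a": {"u1", "u2", "u7", "u8"},              # 2 of 4 fans in target
    "entity_b": {"u3", "u4", "u5", "u6", "u7", "u8"},  # 1 of 6 fans in target
    "entity_c": {"u1", "u9"},                          # 1 of 2 fans in target
}
shares = share_of_interests(entity_fans, target_users)
print({entity: f"{share:.0%}" for entity, share in shares.items()})
```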
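Bayesian Model
Posterior: probability of a user $u$ belonging to the projected target $T$ given the Share of Interests $s$ on common entities
$P(u \in T \mid s) = \frac{P(s \mid u \in T) \cdot P(u \in T)}{P(s)}$
Likelihood: $P(s \mid u \in T)$
Prior: $P(u \in T)$ = Source Target Size = 2.3%
Evidence: $P(s)$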
Evidence Decomposition
Evidence: $P(s) = P(s \mid u \in T) \cdot P(u \in T) + P(s \mid u \notin T) \cdot P(u \notin T)$
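Substituting the decomposed evidence into Bayes' rule gives the posterior in a form that needs only the two likelihoods and the prior. The symbols are notation introduced here for the sketch: $u$ is a destination user, $T$ the projected target, $s$ the share of interests on common entities, $\pi$ the prior, and $\Lambda$ the likelihood ratio.

```latex
\[
P(u \in T \mid s)
  = \frac{P(s \mid u \in T)\,\pi}
         {P(s \mid u \in T)\,\pi + P(s \mid u \notin T)\,(1 - \pi)},
\qquad \pi = P(u \in T) = 0.023 .
\]
\[
P(u \in T \mid s) = \frac{\Lambda\,\pi}{\Lambda\,\pi + (1 - \pi)},
\qquad \Lambda = \frac{P(s \mid u \in T)}{P(s \mid u \notin T)} .
\]
```

When $\Lambda = 1$ the data are uninformative and the posterior equals the prior of 2.3%; values of $\Lambda$ above or below 1 move the posterior up or down accordingly.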
Marginal Positive Likelihood
$P(s_i \mid u \in T) \approx$ Binomial distribution with p = 17%, where $s_i$ is the share of interests on a single common entity $i$
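Joint Likelihood under Naive Assumption
$P(s_1, s_2, s_3 \mid u \in T) = P(s_1 \mid u \in T) \cdot P(s_2 \mid u \in T) \cdot P(s_3 \mid u \in T)$
with shares of interests $s_1$ = 50%, $s_2$ = 17%, $s_3$ = 50% on the three common entities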
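Under the naive assumption the joint likelihood is a product of per-entity likelihoods, so the posterior can be accumulated in log space. The sketch below assumes the per-entity likelihoods under both hypotheses (user in target, user not in target) are already available; the function name and the numeric inputs are hypothetical.

```python
import math


def posterior(prior, pos_likelihoods, neg_likelihoods):
    """P(user in target | shares) assuming conditionally independent entities.

    pos_likelihoods[i] = P(s_i | user in target)
    neg_likelihoods[i] = P(s_i | user not in target)
    """
    log_pos = math.log(prior) + sum(math.log(p) for p in pos_likelihoods)
    log_neg = math.log(1.0 - prior) + sum(math.log(p) for p in neg_likelihoods)
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_pos, log_neg)
    num = math.exp(log_pos - m)
    return num / (num + math.exp(log_neg - m))


prior = 0.023  # source target size from the slides
uninformative = posterior(prior, [0.2, 0.5], [0.2, 0.5])  # hypothetical likelihoods
informative = posterior(prior, [0.4], [0.1])              # hypothetical likelihoods
print(round(uninformative, 3), round(informative, 3))
```

With equal likelihoods under both hypotheses the posterior collapses to the prior; a likelihood ratio of 4 in favor of membership lifts it well above 2.3%.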
Predicted Probabilities Provide Insights on the Projected Users
Destination variable    Affinity
TeenNick                8.9x
Robot Chicken           7.27x
Bob’s Burgers           2.36x
Ben & Jerry’s           1.80x
Venmo                   1.62x
Angry Orchard           1.55x
Nintendo DSi XL         1.47x
Video Games             1.45x
Audio or Video Chat     1.23x
PROJECTION: Target Audience = Projected Users Probabilities
Insights on Destination Variables: $P(v \mid u \in T)$, the probability of destination variable $v$ given that user $u$ belongs to the projected target $T$
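The slides report affinities without defining them. A common reading, assumed here, is a lift: the rate of a destination variable within the projected audience divided by its rate across the whole destination panel, with each user weighted by the predicted membership probability. The function and the toy numbers are hypothetical.

```python
def affinity(probabilities, answers):
    """Probability-weighted lift of a yes/no destination variable.

    probabilities[i] = predicted P(user i in target)
    answers[i]       = True if user i answered yes to the variable
    """
    weighted_yes = sum(p for p, a in zip(probabilities, answers) if a)
    audience_rate = weighted_yes / sum(probabilities)  # rate inside the projected audience
    baseline_rate = sum(answers) / len(answers)        # rate in the whole panel
    return audience_rate / baseline_rate


# Hypothetical panel of five destination users.
probabilities = [0.9, 0.6, 0.1, 0.1, 0.3]
answers = [True, True, False, False, False]
lift = affinity(probabilities, answers)
print(f"Affinity: {lift:.2f}x")
```

An affinity above 1x means the variable is over-represented in the projected audience relative to the average panel member.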
Audience Projection In a Nutshell
Social Panel
Consumptions Survey Panel
Common Entities
Bayesian Model
Target
Audience
=
Affinity: 1.80x
Affinity: 1.55x
Affinity: 1.62x
Cool! How do you
know this is
accurate?
Evaluation Techniques
Binary Classifier Evaluation
Bayesian Model
Projected Users Probabilities
Ground Truth
Evaluation
techniques
?
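Since the projection is framed as a binary classifier, its predicted probabilities can be scored against any available ground truth with standard ranking metrics. The slides do not name a metric; ROC AUC is used below purely as an assumed example, computed directly from its pairwise definition. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Quadratic in the number of users, fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and ground-truth membership.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc)
```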
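Validate via Common Entities
[Figure: matrix of Source Users and Destination Users over common entities. The Target Audience is defined in the source as an OR over common entities; the Projected Audience is compared with a Ground Truth obtained by an Exact Query Replica in the destination.]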
Validate via Self Reconstruction Within the Same Domain
[Figure: matrix of Source Users and Destination Users over source-only entities, common entities, and destination-only entities, with the Target Audience and its Ground Truth marked.]
Validate via Double-step Reconstruction
PROJECTION PROJECTION
Predicted
probabilities
Ground
Truth
Repeat Test Cases Stratifying by Category
Demographics Skewness
PROJECTION
Golden Benchmarks Comparison on Aggregated Insights
Opportunities
Many Linked Views of the Same Global Population
Audience
Projection
Multiple Perspectives Reinforce Reliability
Social Panel
Target
Audience
=
Interacted with Game
Informer social page
Affinity: 2.17x
Have you read any Game
Informer issue?
Affinity: 1.73x
Game Informer Single Issue
Magazine purchased online
Affinity: 2.51x
Generalize Audience Projection as a Domain Adaptation Problem
Final Remarks
Many Datasets
but
only Partial Views
Look-alike
fusions don’t
scale well
Audience Projection
adapts to any
“entity domain”
Bayesian Model
Accuracy and
Biases can be
quantified
Strategists now
have a complete
view of their
Target Audience
Gianmario Spacagna
Chief Scientist at Helixa.ai
gspacagna@helixa.ai
@gm_spacagna
Appendix A:
The spaCy NER Model
Natural Language Processing (NLP) Pipeline
"Mark Watney visited Mars"
The spaCy NER Model Overview
EMBED
ENCODE
ATTEND
PREDICT
Embedding Words
Features
token            lower            prefix   suffix   shape
Apple            apple            app      ple      Wwwww
U.K.             uk               uk       uk       W.W.
Fahrenheit 451   fahrenheit 451   fah      451      Wwwwwwwwww ddd
Each word (token) is represented by concatenating
the embeddings of all 4 features, so that the model
can generalize to unknown words.
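The four features in the table can be derived from the raw token with a few string operations. This sketch follows the slide's notation (W for uppercase, w for lowercase, d for digit, three-character affixes), which is an assumption about presentation and not necessarily how spaCy computes these attributes internally.

```python
def shape(token):
    """Map each character to W (upper), w (lower), d (digit); keep anything else."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("W")
        elif ch.islower():
            out.append("w")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def features(token, affix_len=3):
    """Lexical features of a token, in the notation used by the slide."""
    lower = token.lower()
    return {
        "lower": lower,
        "prefix": lower[:affix_len],
        "suffix": lower[-affix_len:],
        "shape": shape(token),
    }


apple = features("Apple")
fahrenheit = features("Fahrenheit 451")
print(apple)
print(fahrenheit)
```

Two unseen words with the same shape and suffix share most of their representation, which is what lets the model make a reasonable guess about them.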
Efficiently Embedding Words
Hash Embedding reduces the dimensionality and
makes it possible to handle large vocabularies
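The idea can be illustrated with the hashing trick: instead of one row per vocabulary entry, each key is hashed several times into a small fixed table and the selected rows are summed. This is a minimal sketch with randomly initialized, untrained weights; the class name, table size, and number of hashes are hypothetical, and a real model would learn the table during training.

```python
import hashlib
import random


class HashEmbedding:
    """Fixed-size embedding table addressed by several hashes of the key."""

    def __init__(self, rows=1000, dim=8, num_hashes=4, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.dim = dim
        self.num_hashes = num_hashes
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

    def _indices(self, key):
        # One table index per hash function; the salt k makes the hashes differ.
        indices = []
        for k in range(self.num_hashes):
            digest = hashlib.sha256(f"{k}:{key}".encode("utf-8")).digest()
            indices.append(int.from_bytes(digest[:8], "big") % self.rows)
        return indices

    def __call__(self, key):
        vector = [0.0] * self.dim
        for i in self._indices(key):
            for d in range(self.dim):
                vector[d] += self.table[i][d]
        return vector


embed = HashEmbedding()
known = embed("apple")
unseen = embed("a-token-never-seen-before")
print(len(embed.table), len(known), len(unseen))
```

The table size stays constant no matter how many distinct tokens appear, and using several hashes makes it unlikely that two tokens collide on every row at once.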
Encoding Sequences of Words
Residual Convolutional Neural Networks encode
context-independent word vectors into a
context-sensitive sentence matrix.
Raw tri-gram chunk Enriched tri-gram matrix
Mark
Watney
visited
“Mark Watney visited Mars”
Crafting the Attention Vector
The attention vector of the trigram includes
information on the encountered entities.
“Mark Watney visited Mars”
Attention vector
Tri-gram matrix
Enriched
tri-gram vector
Predicting the Recognized Entities
Actions:
SHIFT
OUT
REDUCE (Entity Tagging)
Stack Buffer Segment
“Mark Watney visited Mars”
Actions:
1.SHIFT
2.SHIFT
3.REDUCE (PER)
4.OUT
5.SHIFT
6.REDUCE (LOC)
Mark
Watney
Mars
Mark
Watney
visited
Mars
Enriched
tri-gram vector
Update
attention
Attention vector
Tri-gram matrix
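The action sequence on the slide can be replayed with a small state machine over a stack and a buffer. The sketch below only executes a given list of actions using the slide's action names; it does not predict them, and the `REDUCE:LABEL` string encoding is a hypothetical convention chosen for this example.

```python
def apply_actions(tokens, actions):
    """Replay SHIFT / OUT / REDUCE actions and return the tagged entity segments."""
    buffer = list(tokens)
    stack = []
    entities = []
    for action in actions:
        name, _, label = action.partition(":")
        if name == "SHIFT":
            # Move the next token from the buffer onto the stack.
            stack.append(buffer.pop(0))
        elif name == "OUT":
            # The next token is outside any entity: discard it.
            buffer.pop(0)
        elif name == "REDUCE":
            # Close the stack as one entity segment with the given label.
            entities.append((" ".join(stack), label))
            stack.clear()
        else:
            raise ValueError(f"unknown action: {action}")
    return entities


tokens = "Mark Watney visited Mars".split()
actions = ["SHIFT", "SHIFT", "REDUCE:PER", "OUT", "SHIFT", "REDUCE:LOC"]
entities = apply_actions(tokens, actions)
print(entities)
```

In the real model the next action is chosen by the PREDICT step from the current state; here the sequence is supplied by hand to show how six actions yield two entities.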
Official Explanation of spaCy NER Model
https://www.youtube.com/watch?v=sqDHBH9IjRU
Appendix B:
The Bayesian Model
Projecting the Share of Interests on Common Entities
Target
Audience
Projection
50%
17%
50%
Share of Interests:
SIZE: 60M
SIZE: 200M
SIZE: ?
SIZE: 40M
Global Audience
(average American)
=
Target
Audience evidence
prior
Evidence Statistics on Share of Interests
N = 180M users in U.S. population
sampling rate = 1 : 10k
n = 18k users in sample panel
p = 17% of market penetration
x = 3k expected projected users
SIZE: 200M
SIZE: 40M
statistics:
evidence
Binomial Positive Likelihood
$P(s_i \mid u \in T)$ = probability of selecting 3000 / 18000 McDonald’s panel
users given that the user IS part of the target ($u \in T$), with p = 17%
n = 17999, x = 2999: log(p) = -5.56323
is smaller than
n = 18000, x = 3000: log(p) = -5.54342
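The log-probabilities on this slide can be reproduced with the binomial probability mass function evaluated in log space. The sketch assumes natural logarithms and p = 0.17 exactly; `math.lgamma` gives the log-factorials without overflow.

```python
import math


def binom_log_pmf(n, x, p):
    """Natural log of the binomial probability of x successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + x * math.log(p) + (n - x) * math.log(1.0 - p)


p = 0.17
baseline = binom_log_pmf(18000, 3000, p)         # whole sample panel
user_in_target = binom_log_pmf(17999, 2999, p)   # one selected user set aside
print(round(baseline, 5), round(user_in_target, 5))
```

The gap between the two values is exactly log(n·p / x) = log(18000 · 0.17 / 3000) = log(1.02), which is why the first log-probability on the slide is the smaller one.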
Binomial Negative Likelihood
$P(s_i \mid u \notin T)$ = probability of selecting 3000 / 18000 McDonald’s panel
users given that the user IS NOT part of the target ($u \notin T$), with p = 17%
n = 17999, x = 3000: log(p) = -5.53942
is greater than
n = 18000, x = 3000: log(p) = -5.54342

Audience projection of target consumers over multiple domains a ner and bayesian approach, Gianmario Spacagna, Alberto Pirovano
