Building a Data Science Product
with PyMC3
Korbinian Kuusisto, Data Scientist @ Latana
Outline
Building a next generation
brand tracker
Why bother with Bayesian
frameworks
Bayes in production
There are hundreds of
consumer startups and
scaleups, and everyone is
investing in brand.
Is this money well
spent?
Market research provides
tools to track brand
perception with surveys.
And we wanted in.
But we discovered traditional
brand tracking wasn’t working
very well.
Marketeers are interested in
niche audiences! Most brands
do not actively market to
everyone. Rather they have
smaller target groups they
focus on, be it in terms of age,
gender, location, interest or
other demographic or
psychographic criteria.
And with these traditional brand trackers, if one wants
to boil it down to one of these target groups, the
sample sizes become even smaller, the margin of
error skyrockets and the insights become
entirely unactionable.
Not good.
We concluded that we needed to
fundamentally rethink brand tracking if we
wanted to truly solve brands' problems and
help them grow.
Their problem is straightforward: they
need reliable insights to understand how
their brand is performing in the real world,
for their various audiences, and how this
is changing over time.
So we chose to use MRP (multilevel regression and poststratification).
To explain how MRP works, we need to first compare it with a more traditional way of
doing things. Let’s take the example of measuring opinion in a very specific, small target
audience.
Imagine a brand that wants to run a campaign targeting young females who use Twitter
and also like American football. It wants to find out what this specific group of people
thinks of its brand.
The traditional brand tracker creates a sample of 1,000+ respondents and then zooms in on
young females who use Twitter and like American football. In the end, there are 20
respondents who fit the target audience. The brand tracker takes the average opinion of
this group, but because the number of respondents is so small, the margin of error is large.
Traditional Quota
Sampling
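To put a rough number on that margin of error, here is a minimal sketch using the textbook normal-approximation formula for a proportion, z * sqrt(p * (1 - p) / n), at 95% confidence. The worst-case proportion p = 0.5 and the helper name margin_of_error are assumptions for illustration, not figures from the talk.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Full sample of 1,000 respondents versus the 20 who match the niche audience.
moe_full = margin_of_error(0.5, 1000)
moe_niche = margin_of_error(0.5, 20)

print(f"n=1000: +/- {moe_full:.1%}")   # +/- 3.1%
print(f"n=20:   +/- {moe_niche:.1%}")  # +/- 21.9%
```

Under these assumptions the interval widens from roughly 3 percentage points to roughly 22 once the sample is cut down to the 20 matching respondents.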
Latana is able to fix this problem.
Instead of narrowing the sample to just 20 respondents, MRP makes an estimate for
the target audience by using ALL the information available in the 1,000+ respondent
sample. This means it looks at ALL the young females, ALL the people who
use Twitter, and ALL the people who like American football. Because we use all the
information from the sample, the estimate for a small group is much more reliable.
Therefore, the magic potion isn’t really magic at all. It’s as simple as this: instead of
focusing on a tiny group in a target audience, MRP builds a model. This model is used to
calculate the opinion of a brand by looking at the respondents’ individual characteristics
and how they relate to the brand.
MRP
So, essentially, MRP can be used as a model-driven
approach to brand tracking.
Although the method was originally designed around a
hierarchical Bayesian model, one is free to choose
any binary classifier that returns an estimate of
the probability that a person knows a brand.
So if you use Python, you could pick your
favourite library, scikit-learn, and try all kinds of
classifiers.
We did that in the beginning: we just used a simple
logistic regression and were good to go!
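The two MRP steps with a plain classifier could look roughly like the sketch below. Everything in it is assumed for illustration: the survey is simulated, the three binary characteristics and their coefficients are invented, and the population counts per cell are made up rather than taken from a census. It is not Latana's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Simulated survey. Columns (hypothetical): young_female, uses_twitter, likes_football.
X = rng.integers(0, 2, size=(n, 3))
true_logit = -2.0 + 1.0 * X[:, 0] + 0.8 * X[:, 1] + 0.6 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)  # 1 = knows the brand

# Step 1 (regression): fit one model on ALL respondents.
model = LogisticRegression().fit(X, y)

# Step 2 (poststratification): predict every demographic cell, then weight
# each cell by how many people it contains in the population.
cells = np.array([(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)])
cell_counts = np.array([3000, 2500, 2000, 1500, 2800, 1200, 900, 100])  # invented
p_cell = model.predict_proba(cells)[:, 1]

def poststratify(mask):
    """Population-weighted awareness over the cells selected by mask."""
    return float(np.sum(p_cell[mask] * cell_counts[mask]) / np.sum(cell_counts[mask]))

# Niche audience: young females who use Twitter (with or without football).
niche_mask = (cells[:, 0] == 1) & (cells[:, 1] == 1)
niche_estimate = poststratify(niche_mask)
overall_estimate = poststratify(np.ones(len(cells), dtype=bool))

print(f"niche: {niche_estimate:.3f}, overall: {overall_estimate:.3f}")
```

The niche estimate comes from a model trained on all 1,000 simulated respondents, not only on the handful who fall into the niche cells.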
Introducing Latana
The first brand tracking tool to use data
science to ensure reliable and accurate
brand insights.
Sounds Cool, But Why Bayesian
Methods?
#1: Learning from prior
information
#2: Uncertainty quantification
#1: Learning from prior
information
Blinkist is an up-and-coming startup that has
built a reading app that condenses non-fiction
books into 15-minute audio summaries. Latana
monitored Blinkist’s levels of brand awareness
in Germany before, during and after Blinkist’s
TV campaign by surveying 2000 people. It
then used the MRP model to predict brand
awareness levels for hundreds of niche target
audiences.
So, how does using a Bayesian model with prior information
help us?
What we soon discovered is that the real world isn’t always as rosy
as it seems, and sometimes even single characteristics are hard to reach.
One may end up collecting a sample of 2000 people, but only 200 of those
fall into a certain category.
With the Blinkist test, this was the case for people aged 56-65,
who are on average less tech-savvy and thus less
likely to fill out our mobile surveys.
To estimate brand awareness for the small group of respondents
aged 56-65 (approximately 11% of the sample / 220 people),
using prior information from past surveys is crucial. In the
graph below, it can be seen that if prior information is not used,
the brand awareness estimate for this group is essentially the
same as the overall brand awareness of 7.5%.
This happens because the MRP model doesn’t have enough
information from respondents aged 56-65 in the sample to find
any differences between them and the rest of the sample.
However, if the MRP model is allowed to use information from the
past (i.e. the survey data collected before and during the
campaign), then this helps the model find a stronger signal. With
prior information, the result is different: the MRP model
estimates that brand awareness for 56-65-year-olds is 5.5%.
Therefore, without using prior information, MRP would not be able to
detect a difference between the general population and
56-65-year-olds and would simply assign the niche audience
the overall average of 7.5%, even if the full sample of 2000
respondents was used.
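The idea of carrying information over from earlier surveys can be sketched with a much simpler stand-in than the hierarchical MRP model: a Beta-Binomial update for a single group, where the posterior from past waves becomes the prior for the current wave. All counts below are invented for illustration, and the helper beta_summary is hypothetical.

```python
import math

def beta_summary(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

# Invented counts for one hard-to-reach group (aware, not aware).
past_aware, past_not = 40, 760   # earlier survey waves
now_aware, now_not = 12, 208     # current wave, 220 respondents

# Without prior information: start from a flat Beta(1, 1) prior.
flat_mean, flat_sd = beta_summary(1 + now_aware, 1 + now_not)

# With prior information: the posterior from past waves is the new prior.
prior_a, prior_b = 1 + past_aware, 1 + past_not
informed_mean, informed_sd = beta_summary(prior_a + now_aware, prior_b + now_not)

print(f"flat prior:     {flat_mean:.3f} +/- {flat_sd:.3f}")
print(f"informed prior: {informed_mean:.3f} +/- {informed_sd:.3f}")
```

With the informed prior, the estimate rests on far more data and its uncertainty shrinks accordingly. A production model would probably also need to decide how strongly to discount older waves, which this sketch ignores.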
In this case, the “low education” niche audience is defined as
people who don’t have higher education. Again we see a similar
pattern to the previous example.
The model that uses prior information helps detect a lower level of
brand awareness, even with small sample sizes. On the flip side, the
model that doesn’t use prior information only starts to detect the
lower awareness at a sample size of 800 respondents or more.
#2: Uncertainty quantification
Let’s assume one of our clients runs a marketing campaign between October
and December.
Then in December they look at the Latana dashboard and see that the brand
awareness increased in some niche audience from 5% to 8%.
Now the question is: how likely is it that this increase is real?
Well, in a frequentist world one would just run a t-test or compute
bootstrap confidence bounds and then give a YES or NO: ‘YES’, this
change happened and isn’t just random noise, or ‘NO’, it did not.
Well, we figured out that marketeers don’t really like showing their boss
that their campaign actually had no effect.
So is there a better way to frame that?
Well with a Bayesian model one always
gets the full posterior distribution of
estimates. This is nice since then one can
just compare the probability masses.
So if you, for example, have two estimates, one before
and one after the campaign, just look at the overlap of
their posteriors and you will be able to say:
“With a probability of 80%, our campaign had a positive
effect on the awareness of our brand.”
This also means that if they mess up, they would still
get a weak change probability of, say, 30-60%,
which is better than a definite NO.
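A change probability of this kind can be computed directly from posterior draws. The sketch below assumes a simple Beta posterior under a flat prior for each period and invented survey counts (10 of 200 aware before, 16 of 200 after). The talk's model is an MRP model, so this only illustrates the comparison step.

```python
import random

random.seed(42)

def posterior_draws(aware, total, n_draws=20000):
    """Draws from the Beta posterior of awareness under a flat Beta(1, 1) prior."""
    return [random.betavariate(1 + aware, 1 + total - aware) for _ in range(n_draws)]

before = posterior_draws(aware=10, total=200)  # invented: 5% aware
after = posterior_draws(aware=16, total=200)   # invented: 8% aware

# Share of paired draws in which awareness after the campaign is higher.
change_probability = sum(a > b for a, b in zip(after, before)) / len(before)

print(f"P(awareness increased) = {change_probability:.2f}")
```

The result is a single number between 0 and 1 that can be shown next to the two estimates, instead of a binary significance verdict.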
So what does this look like in our dashboard?
So basically whenever you want to compare two
estimates, the dashboard also shows you the change
probability with a color coding. This is something really
helpful for our clients.
Using Bayesian models in
production
The results looked really good, but now to the hands-on part.
For coding the Bayesian model we used PyMC3 and started off with full
Bayesian inference; the most advanced general-purpose algorithm currently is Hamiltonian MCMC (NUTS).
The advantage is that, in principle, it can capture complex posterior distributions, even when they are
multimodal and so on.
However, for us this sampler was highly unstable and took several hours, so it is just not practical in
production. There is also a much lighter approach, so-called approximate
Bayesian inference (variational inference).
This algorithm basically assumes a simple, smooth family of distributions and then just finds the member that best
approximates the posterior. It is stable and fast, but the disadvantage is that it does not capture complex
distributions.
We ran some tests comparing the two and chose the second one because it gave us
much better results.
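The difference between the two families of algorithms can be illustrated without PyMC3 on a one-parameter toy problem. In the sketch below, random-walk Metropolis stands in for NUTS and a hand-written Gaussian fit that maximises the ELBO stands in for PyMC3's variational inference. The data (15 of 220 aware), the prior and all function names are assumptions made for this example.

```python
import math
import random
from statistics import NormalDist, stdev

random.seed(0)

K, N = 15, 220    # invented data: 15 of 220 respondents know the brand
PRIOR_SD = 1.5    # assumed Normal(0, 1.5) prior on theta = logit(awareness)

def log_post(theta):
    """Unnormalised log posterior of theta."""
    return K * theta - N * math.log1p(math.exp(theta)) - theta ** 2 / (2 * PRIOR_SD ** 2)

def grad_log_post(theta):
    """Derivative of log_post with respect to theta."""
    return K - N / (1.0 + math.exp(-theta)) - theta / PRIOR_SD ** 2

def metropolis(n_draws=20000, step=0.5, start=-2.0):
    """Sampling approach: random-walk Metropolis over theta."""
    theta, lp, draws = start, log_post(start), []
    for _ in range(n_draws):
        proposal = theta + random.gauss(0.0, step)
        lp_prop = log_post(proposal)
        if random.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop
        draws.append(theta)
    return draws[n_draws // 4:]  # discard the first quarter as burn-in

def gaussian_vi(n_steps=2000, lr=0.01, n_eps=101):
    """Variational approach: fit q = Normal(m, s) by gradient ascent on the ELBO."""
    # Fixed standard-normal quantiles play the role of the noise draws.
    eps = [NormalDist().inv_cdf((i + 0.5) / n_eps) for i in range(n_eps)]
    m, log_s = -2.0, math.log(0.5)
    for _ in range(n_steps):
        s = math.exp(log_s)
        grads = [grad_log_post(m + s * e) for e in eps]
        grad_m = sum(grads) / n_eps
        # The +1.0 is the gradient of the Gaussian entropy with respect to log s.
        grad_log_s = s * sum(g * e for g, e in zip(grads, eps)) / n_eps + 1.0
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, math.exp(log_s)

mcmc = metropolis()
mcmc_mean, mcmc_sd = sum(mcmc) / len(mcmc), stdev(mcmc)
vi_mean, vi_sd = gaussian_vi()

print(f"MCMC: mean={mcmc_mean:.2f}, sd={mcmc_sd:.2f}")
print(f"VI:   mean={vi_mean:.2f}, sd={vi_sd:.2f}")
```

On a smooth, single-peaked posterior like this one the two answers should be close, which is the situation in which the cheaper approximation is a reasonable trade. In PyMC3 itself the corresponding entry points are pm.sample() for NUTS and pm.fit(method="advi") for variational inference.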
What may be interesting for people here is what that looks like in production.
Well, it is actually not so different from using other machine learning libraries in production.
We wrote our model in PyMC3, packed it into a Django web service and deployed the web service
on AWS.
Now our survey engine generates survey responses in real time and writes them to our database. Our
web service picks them up, calculates the results in a reasonable time and writes them back to the
database, and the Latana dashboard updates from there.
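The read, compute, write-back loop described above can be sketched in a few lines. This is a hypothetical stand-in only: an in-memory SQLite database replaces the production database, the table and column names are invented, and a one-line Beta posterior mean replaces the PyMC3 model call. The Django and AWS parts are not shown.

```python
import sqlite3

def recompute_awareness(conn):
    """Read stored responses, estimate awareness per brand, write results back."""
    rows = conn.execute(
        "SELECT brand, SUM(aware), COUNT(*) FROM responses GROUP BY brand"
    ).fetchall()
    for brand, aware, total in rows:
        # Placeholder for the model call: posterior mean under a flat Beta(1, 1) prior.
        estimate = (aware + 1) / (total + 2)
        conn.execute(
            "INSERT OR REPLACE INTO results (brand, awareness) VALUES (?, ?)",
            (brand, estimate),
        )
    conn.commit()

# Hypothetical schema and a handful of invented responses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (brand TEXT, aware INTEGER)")
conn.execute("CREATE TABLE results (brand TEXT PRIMARY KEY, awareness REAL)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [("example_brand", 1), ("example_brand", 0), ("example_brand", 0), ("example_brand", 0)],
)

recompute_awareness(conn)
result = conn.execute(
    "SELECT awareness FROM results WHERE brand = 'example_brand'"
).fetchone()[0]
print(f"stored estimate: {result:.3f}")
```

A dashboard would then only need to read the results table, so the model code stays isolated inside the service.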
Summary
Bayesian methods added a
whole new layer of value to
our product.
Quantify probability of change
in brand KPIs
Use prior information to
uncover hard to reach
audiences
Bayesian methods in
production are no magic

VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptx
 
Forklift Operations: Safety through Cartoons
Forklift Operations: Safety through CartoonsForklift Operations: Safety through Cartoons
Forklift Operations: Safety through Cartoons
 
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
 

Build a Data Science Product with PyMC3 and Bayesian Methods

  • 1. Building a Data Science Product with PyMC3 Korbinian Kuusisto, Data Scientist @ Latana
  • 2. Outline Building a next generation brand tracker Why bother with Bayesian frameworks Bayes in production
  • 3. There are hundreds of consumer startups and scaleups, and everyone is investing in brand. Is this money well spent?
  • 4. Market research provides tools to track brand perception with surveys. And we wanted in.
  • 5. But we discovered traditional brand tracking wasn’t working very well. Marketeers are interested in niche audiences! Most brands do not actively market to everyone. Rather they have smaller target groups they focus on, be it in terms of age, gender, location, interest or other demographic or psychographic criteria.
  • 6. And with these traditional brand trackers, if one wants to boil it down to one of these target groups, the sample sizes become even smaller, the margin of error skyrockets out of control and the insights become entirely unactionable. Not good.
  • 7. We concluded that we needed to fundamentally rethink brand tracking if we wanted to truly solve brands’ problems and help them grow. Their problem is straightforward: they need reliable insights to understand how their brand is performing in the real world, for their various audiences, and how this is changing over time. So we chose to use MRP.
  • 8. To explain how MRP works, we need to first compare it with a more traditional way of doing things. Let’s take the example of measuring opinion in a very specific, small target audience. Imagine a brand that wants to run a campaign targeting young females who use Twitter and also like American football. They want to find out what this specific group of people thinks of their brand. The traditional brand tracker creates a sample of 1,000+ respondents and then zooms in on young females who use Twitter and like American football. In the end, there are 20 respondents who fit the target audience. The brand tracker takes the average opinion of this group, but because the number of respondents is so small, the margin of error is large.
  • 10. Latana is able to fix this problem. Instead of narrowing the sample to just 20 respondents, MRP makes an estimate for the target audience by using ALL the information available in the 1,000+ respondent sample. This means it looks at ALL the young people, ALL the females, ALL the people who use Twitter, and ALL the people who like American football. Because we use all the information from the sample, the estimate for a small group is much more reliable. Therefore, the magic potion isn’t really magic at all. It’s as simple as this: instead of focusing on a tiny group in a target audience, MRP builds a model. This model is used to calculate the opinion of a brand by looking at the respondents’ individual characteristics and how they relate to the brand.
  • 11. MRP
  • 12. So, essentially, MRP can be used as a model-driven approach to brand tracking. Although the method was originally designed around a hierarchical Bayesian model, one is free to choose any binary classifier that returns an estimate of the probability that a person knows a brand. So if you use Python, you could pick your favourite library, scikit-learn, and try all kinds of classifiers. We did that in the beginning, used a simple logistic regression and were good to go!
  • 13. Introducing Latana The first brand tracking tool to use data science to ensure reliable and accurate brand insights.
  • 14. Sounds Cool, But Why Bayesian Methods?
  • 15. #1: Learning from prior information #2: Uncertainty quantification
  • 16. #1: Learning from prior information Blinkist is an up-and-coming startup that has built a reading app that condenses non-fiction books into 15-minute audio summaries. Latana monitored Blinkist’s levels of brand awareness in Germany before, during and after Blinkist’s TV campaign by surveying 2000 people. They then used the MRP model to predict brand awareness levels for hundreds of niche target audiences.
  • 17. #1: Learning from prior information So, how does using a Bayesian model with prior information help us? What we soon discovered is that the real world isn’t always as rosy as it seems, and sometimes even single characteristics are hard to reach. One may end up collecting a sample of 2000 people, but only 200 of those fall into a certain category.
  • 18. #1: Learning from prior information In the Blinkist test, this was the case for people between 56 and 65 years old, who are on average less tech savvy and thus less likely to fill out our mobile surveys. To estimate brand awareness for the small group of respondents aged 56-65 (approximately 11% of the sample / 220 people), using prior information from past surveys is crucial. In the graph below, it can be seen that if prior information is not used, the brand awareness estimate for this group is essentially the same as the overall brand awareness of 7.5%.
  • 19. #1: Learning from prior information This happens because the MRP model doesn’t have enough information from respondents aged 56-65 in the sample to find any differences between them and the rest of the sample. However, if the MRP model is allowed to use information from the past (i.e. the survey data that occurred before and during the campaign), then this helps the model find a stronger signal. By using prior information, there comes a different result: the MRP model estimates that brand awareness for 56-65-year-olds is 5.5%. Therefore, without using prior information, MRP would not be able to detect a difference between the general population and 56-65-year-olds and would simply assign the niche audience the overall average of 7.5%, even if the full sample of 2000 respondents was used.
  • 20. #1: Learning from prior information In this case, the “low education” niche audience is defined as people who don’t have higher education. Again we see a similar pattern to the previous example. The model that uses prior information helps detect a lower level of brand awareness, even with small sample sizes. On the flip side, the model that doesn’t use prior information only starts to detect the lower awareness at a sample size of 800 respondents or more.
  • 21. #2: Uncertainty quantification Let’s assume one of our clients runs a marketing campaign between October and December. Then in December they look at the Latana dashboard and see that brand awareness in some niche audience increased from 5% to 8%. Now the question is: how likely is it that this increase is real? In a frequentist world one would come up with a t-test or bootstrap confidence bounds and then give a YES or NO: ‘YES’, this change happened and isn’t just random noise, or ‘NO’, it did not. We figured out that marketeers don’t really like showing their boss that their campaign actually had no effect. So is there a better way to frame that?
  • 22. #2: Uncertainty quantification Well with a Bayesian model one always gets the full posterior distribution of estimates. This is nice since then one can just compare the probability masses.
  • 23. #2: Uncertainty quantification So if you, for example, have two estimates, one before and one after the campaign, just look at the overlap of their posteriors and you will be able to say: “With a probability of 80% our campaign had a positive effect on the awareness of our brand.” This also means that if they mess up, they would still get some weak change probability of, say, 30-60%, which is better than a definite NO.
  • 24. #2: Uncertainty quantification So what does this look like in our dashboard? Whenever you want to compare two estimates, the dashboard also shows you the change probability with a color coding. This is really helpful for our clients.
  • 25. Using Bayesian models in production The results looked really good, but now to the hands-on part. For coding the Bayesian model we used PyMC3 and started off with general full Bayesian inference, where the most advanced algorithm is currently Hamiltonian MCMC (NUTS). The advantage is that it covers complex posterior distributions, even when they are multimodal and so on. However, for us this sampler was highly unstable, took several hours and was just not practical in production. There is also a much lighter approach, so-called approximate Bayesian inference (variational inference). This algorithm basically assumes a smooth distribution and then finds the one that best fits the data. It is stable and fast, but the disadvantage is that it does not cover complex distributions. We ran some tests comparing the two and chose the second one because it gave us much better results.
  • 26. Using Bayesian models in production
  • 27. Using Bayesian models in production What may be interesting for people here is what that looks like in production. It is actually not so different from using other machine learning libraries in production. We wrote our model in PyMC3, packed it into a Django web service and deployed the web service on AWS. Now our survey engine generates survey responses in real time and writes them to our database; our web service picks them up, calculates the results in a reasonable time and writes them back to the database, and the Latana dashboard updates from there.
  • 28. Summary Bayesian methods added a whole new layer of value to our product: quantify the probability of change in brand KPIs; use prior information to uncover hard-to-reach audiences; Bayesian methods in production are no magic.
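
The poststratification half of MRP described in slides 10-12 can be illustrated with a minimal sketch. It assumes some model (a logistic regression, a hierarchical Bayesian model, or any other classifier) has already produced a predicted awareness probability per demographic cell. The cell definitions, probabilities, population counts and the helper name `poststratify` are all hypothetical and not taken from the talk.

```python
# Poststratification sketch (stdlib only). All numbers are invented for illustration.

# Hypothetical model output: predicted probability that a person in each
# demographic cell (age group, gender, uses Twitter) knows the brand.
# In MRP these predictions come from a model fitted on ALL respondents.
predicted_awareness = {
    ("18-25", "female", True): 0.12,
    ("18-25", "female", False): 0.08,
    ("18-25", "male", True): 0.10,
    ("18-25", "male", False): 0.06,
    ("26-35", "female", True): 0.09,
    ("26-35", "female", False): 0.05,
}

# Hypothetical census-style counts of people in each cell of the population.
population_counts = {
    ("18-25", "female", True): 30_000,
    ("18-25", "female", False): 70_000,
    ("18-25", "male", True): 40_000,
    ("18-25", "male", False): 60_000,
    ("26-35", "female", True): 50_000,
    ("26-35", "female", False): 150_000,
}


def poststratify(predictions, counts, in_audience):
    """Population-weighted average of cell predictions over the audience's cells."""
    cells = [cell for cell in predictions if in_audience(cell)]
    total = sum(counts[cell] for cell in cells)
    if total == 0:
        raise ValueError("target audience matches no population cell")
    return sum(predictions[cell] * counts[cell] for cell in cells) / total


# Target audience: young females, regardless of Twitter use.
young_females = poststratify(
    predicted_awareness,
    population_counts,
    lambda cell: cell[0] == "18-25" and cell[1] == "female",
)
print(f"Estimated awareness among young females: {young_females:.3f}")
```

The point of the design is that the audience estimate is weighted by how common each cell is in the population, not by how many respondents happened to land in it, which is why a handful of matching respondents no longer dominates the result.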

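The "change probability" from slides 22-24 can be sketched as the share of posterior draws in which awareness after the campaign exceeds awareness before it. The example below is a simplification: it assumes independent Beta posteriors under a flat prior for each survey wave, not the MRP posterior used in the talk, and the respondent counts and the function name `change_probability` are hypothetical.

```python
import random


def change_probability(before_samples, after_samples):
    """Share of paired posterior draws in which the 'after' value is higher."""
    pairs = list(zip(before_samples, after_samples))
    return sum(after > before for before, after in pairs) / len(pairs)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n_draws = 20_000

# Hypothetical survey waves: 10 of 200 respondents aware before the campaign
# (5%), 16 of 200 aware after it (8%). With a flat Beta(1, 1) prior, the
# posterior for each wave is Beta(1 + aware, 1 + not_aware).
before = [rng.betavariate(1 + 10, 1 + 190) for _ in range(n_draws)]
after = [rng.betavariate(1 + 16, 1 + 184) for _ in range(n_draws)]

p_up = change_probability(before, after)
print(f"Probability that awareness increased: {p_up:.2f}")
```

A dashboard could map `p_up` to a colour band, so a weak signal shows up as a moderate probability instead of a flat "not significant".
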
Editor's Notes

  1. As a young company we also wanted to enter this market and launched a product called BrandTracker in 2017. With BrandTracker, we offered a leaner, lower-cost version that focused on a set of standardised KPIs (around 5-10) and smaller sample sizes (usually 500). We delivered insights to our clients on a regular basis, usually monthly or quarterly, through an easy-to-use dashboard. Some aspects of BrandTracker were received really well by our clients. The dashboard was intuitive and a big improvement over the industry-typical PowerPoint presentations or PDF documents. Also, the speed of BrandTracker was a big plus. We were able to deliver results within 1-2 weeks, in an industry where clients often wait months to get the first insights. Lastly, our low-touch approach allowed us to keep the prices low. Our clients were surprised how much value they could get for their money, especially those that had previous experience with brand tracking.
  2. Most brands we work with do not actively market to everyone. Rather they have smaller target groups they focus on, be it in terms of age, gender, location, interest or other demographic or psychographic criteria.
  3. If one wants to boil it down to one of these target groups, the sample sizes become even smaller, the margin of error skyrockets out of control and the insights become entirely unactionable.
  4. After countless conversations with our clients, we concluded that we needed to fundamentally rethink our approach if we wanted to truly solve their problems and help them build a thriving brand. Their problem is straightforward: they need reliable insights to understand how their brand is performing in the real world, for their various audiences, and how this is changing over time. After months of conceptualising and prototyping, we concluded that a recent innovation in data science, Multilevel Regression and Poststratification (MRP), could be a tool to solve this problem. It recently gained popularity in election prediction, with great success, so we decided to adapt and further develop it for the benefit of consumer brands.
  6. To explain how MRP works, we compare it with a more traditional way of doing things. Let’s take the example of measuring opinion in a very specific, small target audience. Imagine a brand that wants to run a campaign targeting young females who use Twitter. They want to find out what this specific group of people thinks of their brand. The traditional brand tracker creates a sample of 1,000+ respondents and then zooms in on young females who use Twitter. In the end, there are 20 respondents who fit the target audience. The brand tracker takes the average opinion of this group, but because the number of respondents is so small, the margin of error is large.
  8. This large margin of error is a problem Latana is able to fix. Instead of narrowing the sample size to just 20 respondents, MRP makes an estimate of the target audience group by using ALL the information available in the 1,000+ respondent sample size. This means it looks at ALL the young people, ALL the females, and ALL the people who use twitter. Because we use all the information from the sample, the estimate for a small group is much more reliable. The magic potion isn’t really magic at all. It’s as simple as this: instead of focusing on a tiny group in a target audience, MRP builds a model. This model is used to calculate the opinion of a brand by looking at the respondents’ individual characteristics and how they relate to the brand.
  9. So, essentially, MRP can be used as a model-driven approach to brand tracking. Although the method was originally designed around a hierarchical Bayesian model, one is free to choose any binary classifier that returns an estimate of the probability that a person knows a brand. So if you use Python, you could pick your favourite library, scikit-learn, and try all kinds of classifiers. We did that in the beginning, used a simple logistic regression and were good to go!
  10. Our engineers developed an even fancier and more intuitive dashboard. This time we used MRP in the backend.
  11. However, back to the premise of the talk: Why would we want to switch a working product that uses an easy to understand model to a way more complicated Bayesian framework?
  12. When we talk about Bayesian methods in an academic context, usually two big advantages are mentioned: (1) using prior information in your model, and (2) having a probabilistic way to quantify uncertainty in our estimates. But how does this add value to our product?
  13. Let’s focus on the first one for now and take one of our clients as a showcase: Blinkist is an up-and-coming startup that has built a reading app that condenses non-fiction books into 15-minute audio summaries. Latana monitored Blinkist’s levels of brand awareness in Germany before, during and after Blinkist’s TV campaign by surveying 2000 people. They then used the MRP model to predict brand awareness levels for hundreds of niche target audiences.
  14. So, as we tested with our client: how does using a Bayesian model with prior information help us? What we soon discovered is that the real world isn’t always as rosy as it seems, and sometimes even single characteristics are hard to reach. One may end up collecting a sample of 2000 people, but only 200 of those fall into a certain category.
  15. In our Blinkist test this was the case for people between 56 and 65 years old, who are on average less tech savvy and thus less likely to fill out our mobile surveys. To estimate brand awareness for the small group of respondents aged 56-65 (approximately 11% of the sample / 220 people), using prior information from past surveys is crucial. In the graph below, it can be seen that if prior information is not used, the brand awareness estimate for this group is essentially the same as the overall brand awareness of 7.5%. This happens because the MRP model doesn’t have enough information from respondents aged 56-65 in the sample to find any differences between them and the rest of the sample. However, if the MRP model is allowed to use information from the past (i.e. the survey data collected before and during the campaign), then this helps the model find a stronger signal. Using prior information gives a different result: the MRP model estimates that brand awareness for 56-65-year-olds is 5.5%. Therefore, without using prior information, MRP would not be able to detect a difference between the general population and 56-65-year-olds and would simply assign the niche audience the overall average of 7.5%, even if the full sample of 2000 respondents was used.
  17. In this case, the “low education” niche audience is defined as people who don’t have higher education. Again we see a similar pattern to the previous example. The model that uses prior information helps detect a lower level of brand awareness, even with small sample sizes. On the flip side, the model that doesn’t use prior information only starts to detect the lower awareness at a sample size of 800 respondents or more.
  18. Let’s assume one of our clients runs a marketing campaign between October and December. Then in December they look at the Latana dashboard and see that the awareness of their brand in some niche audience increased from 5% to 8%. Now the question is: how likely is it that this increase is real? In a frequentist world one would come up with a t-test or bootstrap confidence bounds and then give a YES or NO: YES, this change happened and isn’t just random noise, or NO, it did not. We figured out that marketeers don’t really like showing their boss that their campaign actually had no effect ;) So is there a better way to frame that?
  19. Well with a Bayesian model one always gets the full posterior distribution of estimates. This is nice since then one can just compare the probability masses.
  20. So if you, for example, have two estimates, one before and one after the campaign, just look at the overlap of their posteriors and you will be able to say: “With a probability of 80% our campaign had a positive effect on the awareness of our brand.” This also means that if they mess up, they would still get some weak change probability of, say, 30-60%, which is better than a definite NO.
  21. So what does this look like in our dashboard? Whenever you want to compare two estimates, the dashboard also shows you the change probability with a color coding. This is really helpful for our clients.
  22. The results looked really good, but now to the hands-on part. For coding the Bayesian model we used PyMC3 and started off with general full Bayesian inference, where the most advanced algorithm is currently Hamiltonian MCMC (NUTS). The advantage is that it covers complex posterior distributions, even when they are multimodal and so on. However, for us this sampler was highly unstable, took several hours and was just not practical in production. There is also a much lighter approach, so-called approximate Bayesian inference (variational inference). This algorithm basically assumes a smooth distribution and then finds the one that best fits the data. It is stable and fast, but the disadvantage is that it does not cover complex distributions. We ran some tests comparing the two and chose the second one because it gave us much better results.
  23. What may be interesting for people here is what that looks like in production. It is actually not so different from using other machine learning libraries in production. We wrote our model in PyMC3, packed it into a Django web service and deployed the web service on AWS. Now our survey engine generates survey responses in real time and writes them to our database; our web service picks them up, calculates the results in a reasonable time and writes them back to the database, and the Latana dashboard updates from there.
  24. So, as we tested with our client: how does using a Bayesian model with prior information help us? What we soon discovered is that the real world isn’t always as rosy as it seems, and sometimes even single characteristics are hard to reach. One may end up collecting a sample of 2000 people, but only 200 of those fall into a certain category.
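
The effect of prior information on a hard-to-reach group, as discussed in the notes above, can be illustrated with a conjugate Beta-Binomial update. This is only a toy sketch under stated assumptions: the talk does not specify how its MRP model encodes past surveys, so treating past responses as Beta pseudo-counts is an assumption, and all counts and the helper name `beta_posterior` are hypothetical.

```python
from math import sqrt


def beta_posterior(prior_a, prior_b, aware, total):
    """Posterior mean and standard deviation of awareness after a Binomial sample.

    The prior is Beta(prior_a, prior_b); the posterior is
    Beta(prior_a + aware, prior_b + total - aware).
    """
    a = prior_a + aware
    b = prior_b + (total - aware)
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Hypothetical current wave: 220 respondents aged 56-65, 14 of them aware.
aware_now, total_now = 14, 220

# Flat prior Beta(1, 1): only the current wave informs the estimate.
flat_mean, flat_sd = beta_posterior(1, 1, aware_now, total_now)

# Informative prior built from hypothetical past waves for the same group:
# 20 aware out of 400 respondents, added as pseudo-counts to the flat prior.
informed_mean, informed_sd = beta_posterior(1 + 20, 1 + 380, aware_now, total_now)

print(f"flat prior:        mean={flat_mean:.3f}, sd={flat_sd:.3f}")
print(f"informative prior: mean={informed_mean:.3f}, sd={informed_sd:.3f}")
```

With the same current-wave data, the informative prior pulls the estimate towards what earlier surveys showed for this group and narrows the posterior, which is the mechanism that lets a small subgroup be distinguished from the overall average.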