This document discusses building a brand tracking product using Bayesian methods. It explains how the product uses multilevel regression and poststratification (MRP) to more accurately estimate brand awareness for small target audiences by using all available survey data rather than just a small subsample. It also describes how Bayesian models allow the product to quantify the probability of changes in brand metrics, learn from prior survey information, and run in a production environment using PyMC3 with variational inference for fast, stable results.
5. But we discovered traditional brand tracking wasn't working very well.
Marketeers are interested in niche audiences! Most brands do not actively market to everyone. Rather, they have smaller target groups they focus on, be it in terms of age, gender, location, interest, or other demographic or psychographic criteria.
6. And with these traditional brand trackers, if one wants to boil it down to one of these target groups, the sample sizes become even smaller, the margin of error skyrockets out of control, and the insights become entirely unactionable.
Not good.
7. We concluded that we needed to fundamentally rethink brand tracking if we want to truly solve brands' problems and help them grow.
Their problem is straightforward: they need reliable insights to understand how their brand is performing in the real world, for their various audiences, and how this is changing over time.
So we chose to use MRP (multilevel regression and poststratification).
8. To explain how MRP works, we first need to compare it with a more traditional way of doing things. Let's take the example of measuring opinion in a very specific, small target audience.
Imagine a brand that wants to run a campaign targeting young females who use Twitter and also like American football. They want to find out what this specific group of people thinks of their brand.
The traditional brand tracker creates a sample of 1,000+ respondents and then zooms in on young females who use Twitter and like American football. In the end, there are only 20 respondents who fit the target audience. The brand tracker takes the average opinion of this group, but because the number of respondents is so small, the margin of error is large.
10. Latana is able to fix this problem.
Instead of narrowing the sample down to just 20 respondents, MRP estimates the target audience's opinion by using ALL the information available in the 1,000+ respondent sample. This means it looks at ALL the young people, ALL the females, ALL the people who use Twitter, and ALL the people who like American football. Because we use all the information from the sample, the estimate for a small group is much more reliable.
Therefore, the magic potion isn't really magic at all. It's as simple as this: instead of focusing on a tiny group in a target audience, MRP builds a model. This model is used to calculate the opinion of a brand by looking at the respondents' individual characteristics and how they relate to the brand.
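The second half of MRP, poststratification, can be sketched in a few lines: the model's per-cell predictions are weighted by each cell's share of the target population. The lookup tables below are hypothetical illustrations, not Latana's actual numbers.

```python
# Toy poststratification sketch (the "P" in MRP). Assumes we already have a
# model that returns P(person knows the brand) from their characteristics;
# here a hypothetical table of fitted cell probabilities stands in for it.
cell_probability = {
    # (age_group, uses_twitter, likes_football): modelled awareness
    ("18-25", True,  True):  0.12,
    ("18-25", True,  False): 0.09,
    ("18-25", False, True):  0.07,
    ("18-25", False, False): 0.05,
}

# Share of each cell in the target audience, e.g. taken from census data
cell_share = {
    ("18-25", True,  True):  0.10,
    ("18-25", True,  False): 0.30,
    ("18-25", False, True):  0.20,
    ("18-25", False, False): 0.40,
}

# Poststratified estimate: cell predictions weighted by population shares
estimate = sum(cell_probability[c] * cell_share[c] for c in cell_share)
print(f"estimated awareness: {estimate:.3f}")
```

Because the per-cell probabilities are fitted on the whole sample, even a cell with only a handful of respondents gets a stable prediction.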
12. So, essentially, MRP can be used as a model-driven approach to brand tracking.
While the method was originally designed around a hierarchical Bayesian model, one is free to choose any binary classifier that returns some estimate of the probability that a person knows a brand.
So if you use Python, you could pick your favourite library, scikit-learn, and try all kinds of classifiers.
We did that in the beginning, just used a simple logistic regression, and were good to go!
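A minimal sketch of that "any binary classifier" idea with scikit-learn; the feature names and the simulated data are made up for illustration, not our actual survey schema.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical respondent features: age (scaled), uses_twitter, likes_football
X = rng.random((500, 3))
# Hypothetical label: 1 if the respondent knows the brand
y = (X @ np.array([0.5, 1.0, 0.8]) + rng.normal(0, 0.3, 500) > 1.2).astype(int)

clf = LogisticRegression().fit(X, y)

# What MRP needs from the classifier: an estimated probability
# that a given person knows the brand
p_knows = clf.predict_proba(X[:5])[:, 1]
print(p_knows)
```

Any model with a `predict_proba`-style output would slot in the same way.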
13. Introducing Latana
The first brand tracking tool to use data
science to ensure reliable and accurate
brand insights.
16. #1: Learning from prior information
Blinkist is an up-and-coming startup that has built a reading app that condenses non-fiction books into 15-minute audio summaries. Latana monitored Blinkist's levels of brand awareness in Germany before, during and after Blinkist's TV campaign by surveying 2,000 people. They then used the MRP model to predict brand awareness levels for hundreds of niche target audiences.
17. #1: Learning from prior information
So, how does using a Bayesian model with prior information help us?
What we soon discovered is that the real world isn't always as rosy as it seems, and sometimes even single characteristics are hard to reach. One may end up collecting a sample of 2,000 people, but only 200 of those fall into a certain category.
18. #1: Learning from prior information
In the Blinkist test, this was the case for people between 56 and 65 years old, who are on average less tech-savvy and thus less likely to fill out our mobile surveys.
To estimate brand awareness for the small group of respondents aged 56-65 (approximately 11% of the sample, or 220 people), using prior information from past surveys is crucial. In the graph below, it can be seen that if prior information is not used, the brand awareness estimate for this group is essentially the same as the overall brand awareness of 7.5%.
19. #1: Learning from prior information
This happens because the MRP model doesn't have enough information from respondents aged 56-65 in the sample to find any differences between them and the rest of the sample.
However, if the MRP model is allowed to use information from the past (i.e. the survey data collected before and during the campaign), this helps the model find a stronger signal. With prior information, the result changes: the MRP model estimates that brand awareness for 56-65-year-olds is 5.5%.
Therefore, without prior information, MRP would not be able to detect a difference between the general population and 56-65-year-olds and would simply assign the niche audience the overall average of 7.5%, even if the full sample of 2,000 respondents was used.
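The stabilising effect of prior information can be illustrated with a tiny conjugate Beta-Binomial sketch. The counts below are hypothetical and the real model is MRP, not a plain Beta-Binomial, but the shrinkage mechanism is the same: past waves act like extra pseudo-observations for the small group.

```python
def posterior_mean(successes, n, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a Beta(prior_a, prior_b) prior after observing
    `successes` out of `n` Bernoulli trials (aware / not aware)."""
    return (prior_a + successes) / (prior_a + prior_b + n)

n = 220            # respondents aged 56-65 in the current wave
successes = 15     # hypothetical number who know the brand

# Flat prior: the estimate leans entirely on the noisy small sample
flat = posterior_mean(successes, n)

# Informative prior built from past waves: equivalent to having already
# seen ~55 aware respondents out of ~1,000 (5.5% awareness)
informed = posterior_mean(successes, n, prior_a=55.0, prior_b=945.0)

print(f"flat prior:     {flat:.3f}")
print(f"informed prior: {informed:.3f}")
```

The informed estimate is pulled toward the 5.5% level seen in past waves instead of floating with the noise of 220 respondents.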
20. #1: Learning from prior information
In this case, the "low education" niche audience is defined as people who don't have higher education. Again, we see a similar pattern to the previous example.
The model that uses prior information detects a lower level of brand awareness even with small sample sizes. On the flip side, the model that doesn't use prior information only starts to detect the lower awareness level at a sample size of 800 respondents or more.
21. #2: Uncertainty quantification
Let's assume one of our clients runs a marketing campaign between October and December.
Then in December they look at the Latana dashboard and see that brand awareness increased in some niche audience from 5% to 8%.
Now the question is: how likely is that increase to be real?
Well, in a frequentist world one would just come up with some t-test or bootstrap confidence bounds and then give a YES or NO. So 'YES', this change happened and isn't just random noise, or 'NO', it did not.
Well, we figured out that marketeers don't really like showing their boss that their campaign actually had no effect.
So is there a better way to frame that?
22. #2: Uncertainty quantification
Well, with a Bayesian model one always gets the full posterior distribution of estimates. This is nice, since then one can just compare the probability masses.
23. #2: Uncertainty quantification
So if you, for example, have two estimates, one before and one after the campaign, just look at the overlap of their posteriors and you will be able to say:
"With a probability of 80%, our campaign had a positive effect on the awareness of our brand."
Which also means that if they mess up, they would still get some weak change probability of whatever 30-60%, which is better than a definite NO.
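Computing such a change probability from two sets of posterior draws is a one-liner. The Beta posteriors below are stand-ins for illustration (e.g. roughly 5% and 8% awareness out of 1,000 respondents each); in practice the draws would come from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for awareness before and after the campaign
before = rng.beta(1 + 50, 1 + 950, size=10_000)
after = rng.beta(1 + 80, 1 + 920, size=10_000)

# "Change probability": the posterior probability that awareness increased,
# i.e. the fraction of paired draws where the after-estimate is higher
p_increase = (after > before).mean()
print(f"P(awareness increased) = {p_increase:.2f}")
```

This is the number the dashboard can report instead of a binary significant / not-significant verdict.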
24. #2: Uncertainty quantification
So what does this look like in our dashboard? Whenever you want to compare two estimates, the dashboard also shows you the change probability with colour coding. This is something our clients find really helpful.
25. Using Bayesian models in production
The results looked really good, but now to the hands-on part.
For coding the Bayesian model we used PyMC3 and started off with full Bayesian inference; the most advanced algorithm currently is Hamiltonian Monte Carlo (NUTS). The advantage is that it covers all complex posterior distributions, even when they are multimodal and so on.
However, this solver was highly unstable for our models, took several hours, and was just not practicable in production. There is also another, much lighter approach: approximate Bayesian inference (variational inference).
This algorithm basically assumes a smooth family of distributions and then just finds the member that best fits the data. It is stable and fast, but the disadvantage is that it does not cover complex distributions.
We ran tests comparing the two and chose the second one because it gave us much better results.
27. Using Bayesian models in production
What may be interesting for people here is how that looks in production.
Well, it is actually not so different from using other machine learning libraries in production. We wrote our model in PyMC3, packed it into a Django web service, and deployed the web service on AWS.
Now our survey engine generates survey responses in real time and writes them to our database; our web service picks them up, calculates the results in a reasonable time, and writes them back to the database, and the Latana dashboard updates from there.
28. Summary
Bayesian methods added a whole new layer of value to our product:
Quantify the probability of change in brand KPIs
Use prior information to uncover hard-to-reach audiences
Bayesian methods in production are no magic
Editor's Notes
As a young company we also wanted to enter this market and launched a product called BrandTracker in 2017.
With BrandTracker, we offered a leaner, lower-cost version that focused on a set of standardised KPIs (around 5-10) and smaller sample sizes (usually 500). We delivered insights to our clients on a regular basis, usually monthly or quarterly, through an easy-to-use dashboard.
Some aspects of BrandTracker were received really well by our clients. The dashboard was intuitive and a big improvement over the industry-typical PowerPoint presentations or PDF documents.
Also, the speed of BrandTracker was a big plus. We were able to deliver results within 1-2 weeks, in an industry where clients often wait months to get the first insights. Lastly, our low-touch approach allowed us to keep the prices low. Our clients were surprised how much value they could get for their money, especially those that had previous experience with brand tracking.
After countless conversations with our clients, we concluded that we needed to fundamentally rethink our approach if we want to truly solve their problems and help them build a thriving brand. Their problem is straightforward: they need reliable insights to understand how their brand is performing in the real world, for their various audiences, and how this is changing over time.
After months of conceptualising and prototyping, we concluded that a recent innovation in data science, multilevel regression and poststratification (MRP), could be a tool to solve this problem. It recently gained popularity in election prediction with great success, so we decided to adapt and further develop it for the benefit of consumer brands.
Our engineers developed an even fancier and more intuitive dashboard. This time we used MRP in the backend.
However, back to the premise of the talk: why would we want to switch from a working product that uses an easy-to-understand model to a far more complicated Bayesian framework?
When we talk about Bayesian methods in an academic context, two big advantages are usually mentioned:
Using prior information in your model
Having a probabilistic way to quantify uncertainty in our estimates
But how does this add value to our product?
Let's focus on the first one for now and take one of our clients, Blinkist, as a showcase.