Big data: The balance between insight and noise
With so many combinations inherent in big data, it’s inevitable that many patterns will begin to
emerge showing strong correlations. In reality, however, the vast majority of these patterns will just
be noise. They are worse than redundant when looking for insight, because they actively lead you into
false conclusions: what statisticians call ‘spurious correlations’. Statistically speaking, your chances of
finding randomness masquerading as correlation increase faster than the number of valid
correlations, creating a significant gap between what we actually know and what we think we know.
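To see how quickly the noise piles up, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is pure random noise, so every "strong" correlation it finds is spurious by construction, and the sample size, threshold and function names are hypothetical choices rather than recommendations.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def count_spurious(n_vars, n_obs=20, threshold=0.5, seed=0):
    """Return (pairs checked, pairs with |r| above threshold) for pure noise."""
    rng = random.Random(seed)
    # Each variable is independent random noise: no real relationships exist.
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(itertools.combinations(range(n_vars), 2))
    hits = sum(
        1 for i, j in pairs if abs(pearson(data[i], data[j])) > threshold
    )
    return len(pairs), hits


for n_vars in (10, 50, 200):
    pairs, hits = count_spurious(n_vars)
    print(f"{n_vars} variables -> {pairs} pairs checked, {hits} look correlated")
```

The number of pairs to check grows with the square of the number of variables, so the count of chance "findings" grows with it even though nothing real is there to be found.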
What compounds the issue is that as humans we are evolutionarily programmed to spot patterns in
noise, so we find ourselves naturally seeing things that don’t exist. Add to this our tendency towards
confirmation bias, paying attention only to information which supports our own ideas, and we
realise that the combination of big data and human inference generally leads to identifying spurious
correlations more often than true insight.
So how do we overcome this? There are statistical processes which can dramatically reduce the
chance of spotting a spurious correlation, but the number of measurable data-sets is increasing
much faster than the processing power available to analyse them, so it’s just not practical to rely on
algorithms alone.
Assuming you need the insight this year rather than next, you’re going to need to take shortcuts.
Although human inference brings errors to the table, it can also provide great context which cannot
be quickly computed. Here are our seven steps to ensuring big data gives you the insight you need
without compromising on accuracy:
1. Create a culture of statistical numeracy
Not long ago you’d be laughed out of the room for suggesting that a strong grasp of statistical
techniques is a must for a Strategist or Planner, but with the increasing amount of insight being
derived from data it’s becoming essential.
Back at your agency or brand you should endeavour to create a culture where a good understanding
of statistical concepts is as important as basic numeracy. Look for academic qualifications in
mathematical subjects in your new hires and encourage everyone to get down and dirty with the
analysis.
Not everyone needs to be able to tell their Monte Carlo technique from their t-tests, but encourage
your fellow Planners and Strategists to get their heads around the basics. Principles like the
correlation coefficient and z-scores will allow you to interpret insights in a robust way. If in doubt,
grab yourself a guide on stats; the basics don’t take long to get your head around, and knowing them
massively increases the chance of spotting human errors.
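For anyone meeting these two ideas for the first time, the sketch below shows how little is involved. It is an illustration only: the figures are made up, the function names are hypothetical, and it uses the sample standard deviation, which is one common convention.

```python
import statistics


def z_scores(values):
    """How many standard deviations each value sits from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]


def correlation_coefficient(xs, ys):
    """Pearson's r, computed as the average product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical weekly figures, invented for this example.
emails_sent = [10, 20, 30, 40, 50]
orders = [12, 25, 31, 38, 55]

print(z_scores(emails_sent))
print(correlation_coefficient(emails_sent, orders))
```

A z-score tells you how unusual a single value is; the correlation coefficient tells you how closely two sets of values move together, on a scale from -1 to 1.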
2. Recognise the benefits of collecting everything (but don’t keep it all)
With data storage so cheap it really does pay to collect everything you can. Work with your
Technology teams and partners to ensure as much as possible is tracked and linked back to central
systems – there’s very little benefit in siloed data.
It’s critical to collect good quality data. As with focus groups, survey data and any other traditional
insight source, an error in the original data, such as a misbehaving member or a leading
question, will produce errors in the outcome. When tracking data, time is a great
healer; data sets with longer timescales tend to be more accurate as the true average begins to
outweigh fluctuations caused by temporary influences on behaviour, such as national events.
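The effect of a longer timescale can be sketched with simulated data. All of the numbers here are assumptions chosen for illustration: a hypothetical daily metric with a true average of 100, some random day-to-day noise, and a one-week lift caused by a national event at the start of tracking.

```python
import random
import statistics


def simulate_daily_metric(days, true_mean=100.0, noise_sd=10.0,
                          event_days=7, event_lift=50.0, seed=1):
    """Simulated daily values with a temporary lift over the first week."""
    rng = random.Random(seed)
    series = []
    for day in range(days):
        value = true_mean + rng.gauss(0, noise_sd)
        if day < event_days:
            value += event_lift  # temporary behaviour change
        series.append(value)
    return series


series = simulate_daily_metric(365)
short_mean = statistics.mean(series[:14])  # two weeks of tracking
long_mean = statistics.mean(series)        # a full year of tracking

print(f"Two-week average: {short_mean:.1f}")
print(f"One-year average: {long_mean:.1f}")
```

Over two weeks the event dominates the average; over a year it is diluted and the average sits much closer to the underlying figure.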
It sounds obvious, but it’s amazing how many brands collect the same information in different areas
but in slightly different formats. Maintain a holistic view of how all data sources will fit together and
make sure every piece is able to be integrated into the whole.
3. Set clear objectives
It’s easy to get carried away with the almost limitless possibilities inherent in large data-sets, finding
yourself pursuing a tangential line of enquiry far removed from your initial task. Whilst these
meanderings can sometimes provide serendipitous results, more often than not they produce
little more than filler. Like everything we do, looking for insight in big data should begin with strict
objectives.
Clear objectives mean that you can focus only on the correlations which make a difference to the
specific outcomes you’re after. For example, are you looking for key steps in a purchase journey? It
might be interesting to look into post-purchase data, but when data overload is such a problem it’s
worth saving that for a separate project.
4. Let the system spot the (apparent) correlations
Analytical systems are fantastic at spotting apparent correlations between differing data sets.
Essentially it’s a case of holding two pieces of data next to each other, checking to see if they match
up and then moving onto the next. Processing power is now so massive that huge data-sets can be
searched for these apparent correlations in a relatively short amount of time. This really is a pure
numbers game, so let the system take the strain and request a list of all correlating factors and their
constituent parts. Just be careful not to believe everything it tells you…
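In practice, "let the system take the strain" can be as simple as the sketch below: check every pair of data sets and list the ones that match up strongly. The data set names and figures are hypothetical, and the 0.8 cut-off is an arbitrary assumption rather than a recommended value.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def apparent_correlations(datasets, threshold=0.8):
    """List (name_a, name_b, r) for every pair with |r| >= threshold,
    strongest first. These are candidates only, not confirmed insights."""
    found = []
    for name_a, name_b in itertools.combinations(datasets, 2):
        r = pearson(datasets[name_a], datasets[name_b])
        if abs(r) >= threshold:
            found.append((name_a, name_b, r))
    return sorted(found, key=lambda item: abs(item[2]), reverse=True)


# Hypothetical data sets, invented for this example.
datasets = {
    "ad_spend": [1, 2, 3, 4, 5, 6],
    "site_visits": [2, 4, 5, 8, 10, 12],
    "returns": [5, 3, 6, 2, 7, 1],
}

for name_a, name_b, r in apparent_correlations(datasets):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

The output is a list of candidates for a human to challenge, which is exactly where the next step picks up.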
5. Let human inference sort the spurious from the real
It’s more difficult for a system to understand the subtleties and shades of grey which can indicate a
spurious correlation. Once the system has output every apparent correlation, work through them and apply
your understanding of the context surrounding the data to spot aberrations. Always challenge the
correlation: is there a hidden third factor which could be driving both variables?
The oft-quoted story of the relationship between ice cream sales and swimming pool deaths serves
as a telling, if a little morbid, example. Obviously we can all spot that the real causative factor is
hot weather, but a system would lack this contextual knowledge.
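Once a human has named the suspected third factor, it can be tested. One standard way is a partial correlation, which asks how strongly two variables are related once the third is held constant. The sketch below uses simulated data in which, by assumption, temperature drives both ice cream sales and pool deaths and the two have no direct link; all figures and names are invented for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated data: temperature is the hidden third factor behind both series.
rng = random.Random(7)
temperature = [rng.uniform(10, 35) for _ in range(1000)]
ice_cream_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
pool_deaths = [0.2 * t + rng.gauss(0, 1) for t in temperature]

raw = pearson(ice_cream_sales, pool_deaths)
controlled = partial_correlation(ice_cream_sales, pool_deaths, temperature)

print(f"Raw correlation: {raw:.2f}")
print(f"Controlling for temperature: {controlled:.2f}")
```

The raw correlation looks strong, but it largely disappears once temperature is accounted for. The system can run the calculation; it still takes a person to suggest that temperature is the factor worth checking.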
Team up with your analytics department and work through all the correlations to find the ones that feel
right, using our inbuilt ability to spot patterns. Do they fit the context? Do they make sense in
your gut? Could you repeat them? It’s not foolproof, but it takes a long time for a computer to
replicate these human decisions.
But beware of your inherent biases. Like it or not, you will naturally look for supporting evidence –
overcome this by introducing other strategists or analysts with no interest in the
outcome to the process and see if they find the same correlations.
6. Test everything!
The most important rule of all: don’t use insight derived from big data to replace the certainty
gained by testing your hypothesis. Even the most stringent processes make mistakes from time to
time, so it’s important to use testing to repeat the situation and check it still stacks up.
Once you’ve narrowed down the number of correlations to a handful you think aren’t
spurious, don’t base your next annual strategy on them; test them! Whether through simple A/B
tests or complex multivariate testing, digital technology means you can generally test hypotheses
pretty quickly and apply them soon after.
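As a rough guide to reading a simple A/B test, the sketch below uses a two-proportion z-test, one common way to judge whether a difference in conversion rates is likely to be more than chance. The visitor and conversion numbers are made up, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rate
    between variant A and variant B."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both variants perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # Normal cumulative distribution via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical test: 5,000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be noise alone. Decide what counts as small before you run the test, not after you have seen the result.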
If it’s not possible to test in a live environment then it’s worth seeing if you can replicate your
findings on similar sets of data. In statistical modelling it’s common to save some of the data as a
separate validation set to ensure the model works. This is a great principle to follow for deriving
insights from big data – if the hypothesis doesn’t work on both sets of data then it’s likely to be
spurious.
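A minimal sketch of that validation-set principle follows: split the data at random into a discovery half and a validation half, and only keep a correlation that shows up in both. The data is simulated, and the 0.3 threshold and function names are assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def holds_on_both_halves(xs, ys, threshold=0.3, seed=0):
    """Return (r on discovery half, r on validation half, verdict)."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    half = len(indices) // 2
    discovery, validation = indices[:half], indices[half:]
    r_disc = pearson([xs[i] for i in discovery], [ys[i] for i in discovery])
    r_val = pearson([xs[i] for i in validation], [ys[i] for i in validation])
    # The finding must point the same way, and be strong enough, in both halves.
    same_direction = r_disc * r_val > 0
    holds = same_direction and min(abs(r_disc), abs(r_val)) >= threshold
    return r_disc, r_val, holds


# Simulated data: one genuine relationship and one pair of unrelated series.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
related = [value + rng.gauss(0, 0.5) for value in x]
unrelated = [rng.gauss(0, 1) for _ in range(200)]

print(holds_on_both_halves(x, related))
print(holds_on_both_halves(x, unrelated))
```

The split must be made before you go looking for patterns. If the validation half has already influenced which correlations you picked, it can no longer act as an independent check.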
7. Don’t rest on your laurels
Finally, you should never rest on your insights for too long. Collect more data and ensure your
theories are still correct. Remember, the actions you’ve taken since first finding the insight (e.g.
changing an ad, modifying a consumer journey) will have had an impact on subsequent behaviour.
It’s almost an equivalent of the physics-based ‘uncertainty principle’. By measuring certain factors
and utilising them in your strategies you are likely to change the foundations on which those insights
are based.
Ensure you regularly check the data to make sure your original hypothesis still holds. It’s important to
understand if your actions have changed the principles the original insight was based on.
Nick Barthram, Principal Planner, Indicia