Heavy, Messy, Misleading
Why Big Data is a human problem, not a technology one
Francesco D’Orazio, @abc3d 
VP Product, PulsarPlatform.com
Every talk about Big Data 
should start with Twin Peaks
There’s more to big data than the technology behind it. And the best way to find out what that is? Start from the metaphors of Big Data.
stream of data
ocean of data
river of data
a data leak
data firehose
data flood
data tsunami
data “is” fluid
data “is” huge
data “is” powerful
data “is” unpredictable
data “is” uncontrollable
Data is the new oil(?!)
We are not going to war for it 
(yet)
Data is not a scarce resource
The abundance of data is the result of the instrumentation of the natural, industrial and social worlds
The Large Hadron Collider can record up to 
40 million particle interactions per second
The Square Kilometre Array will collect data from deep space dating back more than 13 billion years
Wolfram Data Science on Facebook Data: how our 
topics of discussion change by age and gender
Carna Botnet: in just 60 seconds nearly 640 terabytes of IP data is transferred across the globe via the Internet
Machine Sensing
The sensors on the new Airbus A380 generate 10 terabytes of data every 30 minutes. That’s 120 terabytes every LDN-NYC flight
And yet, another reason why data is not the new oil is that we are not actually using much of it…
99.5% 
Percentage of newly created digital data 
that’s never analysed
But that’s not strictly true 
either…
0.5% 
Percentage of newly created digital data 
that’s actually being used
A higher % of teenagers are having sex than the % of new data that’s being analysed
Credit Scores have replaced the handshake 
with the bank manager
Fair, Isaac and Company came along in 1956. Today they crunch around 10 billion scores each year
Buying advertising used to be about smiles 
and jokes over Martini lunches
Now it looks more like this…
11 seconds of trading in FB shares. Already in 2006, one third of all transactions in the EU and US were algorithmic
Walmart handles more than 1M customer transactions 
per hour, all affected by price elasticity
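Price elasticity here is just a ratio: the % change in units sold per % change in price. A minimal sketch of the standard midpoint formula, with hypothetical numbers:

```python
def price_elasticity(q1, q2, p1, p2):
    """Arc elasticity of demand (midpoint formula):
    % change in quantity sold per % change in price."""
    dq = (q2 - q1) / ((q1 + q2) / 2)
    dp = (p2 - p1) / ((p1 + p2) / 2)
    return dq / dp

# Hypothetical example: a 10% price cut lifts weekly units from 1,000 to 1,180
e = price_elasticity(q1=1000, q2=1180, p1=10.00, p2=9.00)
print(f"elasticity ≈ {e:.2f}")  # below -1 means demand is price-elastic
```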
Price discrimination based on login info, browser history and device, plus A/B testing, is common practice for most online retailers
75% of the content Netflix serves is chosen 
based on a Netflix recommendation
At BuzzFeed every item of content has its own dashboard showing how it spreads from ‘seed views’ to ‘social views’ and by what ‘viral lift’
Upworthy 
Systematic experimentation: 15% of the top 10,000 websites use A/B testing
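The arithmetic behind an A/B test is simple. A minimal sketch of a standard two-proportion z-test, with hypothetical headline numbers:

```python
import math

def ab_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-proportion z-test: is variant B's conversion rate
    significantly different from variant A's?"""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis (no real difference)
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    return p_a, p_b, (p_b - p_a) / se

# Hypothetical headline test: |z| > 1.96 means significant at the 95% level
p_a, p_b, z = ab_test(conversions_a=120, visitors_a=5000,
                      conversions_b=165, visitors_b=5000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}")
```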
Crowdpac matches candidates and funders based on analysis of public speeches, contributions and other sources of public data on the candidate
The LAPD ran a pilot to predict where a crime is going to happen next (‘crime aftershocks’) based on 13 million crimes over 80 years
Dubai police are equipping officers with Google Glass enabled with face recognition to spot wanted criminals
LinkedIn had a student problem: so they re-arranged the data they already had for a student audience
99.5% 
Why then are we throwing away this 
much data?
We are still learning to 
recognize problems as 
data-problems
Big Data changes the very 
definition of how we 
produce knowledge
Less → More
Exact → Messy
Causation → Correlation
Significant correlation requires 
scale. And scale is hard to handle.
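One way to see why: the weaker a correlation, the more data points you need before it stops looking like noise. A minimal sketch using the standard Fisher z approximation for the significance of a correlation coefficient:

```python
import math

def min_n_for_significance(r, z_crit=1.96):
    """Approximate sample size needed for a correlation r to be
    significant at the 95% level, via the Fisher z-transformation:
    the standard error of atanh(r) is roughly 1 / sqrt(n - 3)."""
    return math.ceil((z_crit / math.atanh(r)) ** 2 + 3)

# Weak signals need big samples -- hence the need for scale
for r in (0.5, 0.2, 0.05, 0.01):
    print(f"r = {r:>4}: need n ≈ {min_n_for_significance(r):,}")
```

A correlation of 0.5 is detectable with a few dozen observations; a correlation of 0.01 needs tens of thousands.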
DNA research is a case in point: DNA data is hard to manipulate and there’s not 
enough sequenced DNA available to establish significant patterns
Big Data comes with Big Errors
Data is rarely normalised.
Data is siloed and not verifiable.
Big does not equal whole.
Big does not equal representative.
Data doesn’t speak for 
itself. We speak for it.
Big Data is still biased and the 
result of interpretation.
Correlation doesn’t imply 
causality.
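And mining at scale makes the trap worse: test enough variables and strong correlations appear in pure noise. A minimal sketch, hypothetical throughout:

```python
import math
import random
import statistics

random.seed(7)

def corr(x, y):
    # Pearson correlation of two equal-length sequences
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# One target series and 10,000 candidate "predictors" -- all pure noise
n = 50
target = [random.gauss(0, 1) for _ in range(n)]
best = max(
    abs(corr([random.gauss(0, 1) for _ in range(n)], target))
    for _ in range(10_000)
)
print(f"Strongest correlation found in pure noise: {best:.2f}")
```

Despite every series being random, the “best predictor” typically correlates with the target at around r ≈ 0.5.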
Models are often too simple and 
not peer-reviewed.
Context is hard to interpret at 
scale. Traditional Qual & Quant 
have to work with big data.
Google Flu Trends:
3 billion queries/day
50 million top keywords identified
5 years of data on flu spread matched
Overestimated flu levels by 50%
Didn’t predict pandemics
Big Data also means a big 
new digital divide.
Accessible doesn’t mean ethical.
The problems slowing down 
the adoption of Big Data 
are human problems
And that’s because the 
biggest innovation in Big 
Data is a human innovation
An innovation in 
decision-making: framing, 
solving and actioning a 
problem
“Data is just like crude. It’s 
valuable, but if unrefined it 
cannot really be used. It has to 
be changed into gas, plastic, 
chemicals, etc., to create a 
valuable entity that drives 
profitable activity; so must data 
be broken down, analyzed for it 
to have value” Michael Palmer
The opportunity in Big Data 
is data middleware: turning 
crude into gas, plastic, 
chemicals
But until we invent the new plastic, the new gas, the new chemicals, we 
are stuck with the smokescreen. Or even the smoke monster.
Thank You 
Francesco D’Orazio, @abc3d 
VP Product, PulsarPlatform.com
