Heavy, messy, misleading. Why Big Data is a human problem, not a technology one.

"Big data" has been around for a few years now, but for every hundred people talking about it there’s probably only one actually doing it. As a result, Big Data has become the preferred vehicle for inflated expectations and misguided strategy.

As always, language holds the key, and the seed of the issue is reflected in the expression itself. "Big Data" is not so much about a quality of the data or the tools to mine it; it’s about a new approach to product, policy or business strategy design. And that’s far harder and trickier to implement than any new technology stack.

In this talk I look at where Big Data is going, what the real opportunities, limitations and dangers are, and what we can do to stop talking about it and start doing it today.

  1. Heavy, Messy, Misleading: Why Big Data is a human problem, not a technology one. Francesco D’Orazio, @abc3d, VP Product, PulsarPlatform.com
  2. Every talk about Big Data should start with Twin Peaks
  3. There’s more to big data than the technology behind it. And the best way to find out what it is, is to start from the metaphors of Big Data
  4. stream of data
  5. ocean of data
  6. river of data
  7. a data leak
  8. data firehose
  9. data flood
  10. data tsunami
  11. data “is” fluid
  12. data “is” huge
  13. data “is” powerful
  14. data “is” unpredictable
  15. data “is” uncontrollable
  16. Data is the new oil(?!)
  17. We are not going to war for it (yet)
  18. Data is not a scarce resource
  19. The abundance of data is the result of the instrumentalisation of the natural, industrial and social worlds
  20. The Large Hadron Collider can record up to 40 million particle interactions per second
  21. The Square Kilometer Array will collect data from deep space dating back to more than 13 billion years ago
  22. Wolfram Data Science on Facebook Data: how our topics of discussion change by age and gender
  23. Carna Botnet: in just 60 seconds nearly 640 Terabytes of IP data is transferred across the globe via the Internet
  24. Machine Sensing
  25. The sensors on the new Airbus A380 generate 10 terabytes of data every 30 minutes. That’s 120 TB every LDN-NYC flight
  26. And yet, another reason why data is not the new oil is that we are not actually doing much with it…
  27. 99.5% Percentage of newly created digital data that’s never analysed
  28. But that’s not strictly true either…
  29. 0.5% Percentage of newly created digital data that’s actually being used
  30. higher % of teenagers having sex vs % of new data being analysed
  31. Credit Scores have replaced the handshake with the bank manager
  32. Fair and Isaac came along in 1956. Today they crunch around 10 billion scores each year
  33. Buying advertising used to be about smiles and jokes over Martini lunches
  34. Now it looks more like this…
  35. 11 seconds of trading for the FB shares. Already in 2006 one third of all transactions in the EU and US were algorithmic
  36. Walmart handles more than 1M customer transactions per hour, all affected by price elasticity
  37. Price discrimination based on login info, browser history, device and A/B testing is common practice for most online retailers
  38. 75% of the content Netflix serves is chosen based on a Netflix recommendation
  39. At Buzzfeed every item of content has its own dashboard showing how it spreads from ‘seed views’ to ‘social views’ and by what ‘viral lift’
  40. Upworthy. Systematic experimentation: 15% of the top 10,000 websites use A/B testing
  41. Crowdpac matches candidates and funders based on analysis of public speeches, contributions and other sources of public data on the candidate
  42. The LAPD ran a pilot to predict where a crime is going to happen next (‘crime aftershocks’) based on 13 million crimes over 80 years
  43. The Dubai police are equipping officers with Google Glass with face recognition enabled, to identify potentially wanted criminals
  44. Target Data can predict with 75% accuracy the likelihood that a home will sell in the next 30, 60 or 90 days
  45. LinkedIn had a student problem: so they re-arranged the data they already have for a student audience
  46. 99.5% Why then are we throwing away this much data?
  47. We are still learning to recognize problems as data-problems
  48. Big Data changes the very definition of how we produce knowledge
  49. Less > More; Exact > Messy; Causation > Correlation
  50. Significant correlation requires scale. And scale is hard to handle.
  51. DNA research is a case in point: DNA data is hard to manipulate and there’s not enough sequenced DNA available to establish significant patterns
  52. Big Data comes with Big Errors
  53. Data is rarely normalised.
  54. Data is siloed and not verifiable.
  55. Big does not equal whole.
  56. Big does not equal representative.
  57. Data doesn’t speak for itself. We speak for it.
  58. Big Data is still biased and the result of interpretation.
  59. Correlation doesn’t imply causality.
  60. Models are often too simple and not peer-reviewed.
  61. Context is hard to interpret at scale. Traditional Qual & Quant have to work with big data.
  62. 3 billion queries/day; 50 million top keywords identified; 5 years of data on flu spread matched; overestimates by 50%; didn’t predict pandemics
  63. Big Data also means a big new digital divide.
  64. Accessible doesn’t mean ethical.
  65. The problems slowing down the adoption of Big Data are human problems
  66. And that’s because the biggest innovation in Big Data is a human innovation
  67. An innovation in decision-making: framing, solving and actioning a problem
  68. “Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc., to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value” (Michael Palmer)
  69. The opportunity in Big Data is data middleware: turning crude into gas, plastic, chemicals
  70. But until we invent the new plastic, the new gas, the new chemicals, we are stuck with the smokescreen. Or even the smoke monster.
  71. Thank You. Francesco D’Orazio, @abc3d, VP Product, PulsarPlatform.com
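
The slide "Correlation doesn’t imply causality" can be made concrete with a small simulation. The sketch below is not from the talk: the variable names, the noise levels and the sample size are all assumptions chosen for illustration. It generates two series that never influence each other but share a hidden common cause, and shows that they still correlate strongly until that cause is removed.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=5000, seed=0):
    """Return (raw, adjusted) correlations for two confounded series.

    All parameters are illustrative assumptions, not figures from the talk.
    """
    rng = random.Random(seed)
    # Hidden common cause (hypothetical).
    z = [rng.gauss(0, 1) for _ in range(n)]
    # Two outcomes that each depend on z but not on each other.
    x = [zi + rng.gauss(0, 0.5) for zi in z]
    y = [zi + rng.gauss(0, 0.5) for zi in z]
    raw = pearson(x, y)
    # Subtract the common cause; only independent noise is left.
    adjusted = pearson(
        [xi - zi for xi, zi in zip(x, z)],
        [yi - zi for yi, zi in zip(y, z)],
    )
    return raw, adjusted


raw, adjusted = simulate()
print(f"correlation of x and y:          {raw:.2f}")
print(f"after removing the common cause: {adjusted:.2f}")
```

With these settings the raw correlation should come out high and the adjusted one close to zero. In real data the common cause is rarely known or measured, which is why interpreting a correlation found at scale remains a human judgment.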
