THE BIG DATA CON
Why Big Data is a Problem, Not a Solution



           #TheBigDataCon
IAN PLOSKER
Director, Technical Operations, EMEA
          Basho Technologies
           @dstroyallmodels
WHO IS BASHO?
WE MAKE
DISCLAIMERS
ALL OPINIONS EXPRESSED HEREIN
ARE MY OWN AND NOT THOSE OF
 MY EMPLOYER OR ANYONE ELSE
I’M NOT TROLLING
I’M NOT FUD RAKING
WELL, MAYBE JUST A LITTLE
          BIT
LET’S GET STARTED
WHAT IS BIG DATA?
HOW BIG IS BIG?
GIGABYTES, TERABYTES,
PETABYTES, EXABYTES?
IF THERE’S BIG DATA,
SHOULDN’T THERE ALSO BE
MEDIUM AND SMALL DATA?
BIG DATA IS MORE DATA
THAN YOU KNOW WHAT TO
        DO WITH
BIG DATA IS THE DATA THAT
YOU DON’T KNOW WHAT TO
         DO WITH
THE PROMISE OF BIG DATA
IS TO STORE THE REAMS
YOU DON’T KNOW HOW TO
          USE
SO YOU CAN EXTRACT
VALUE FROM IT IN THE
      FUTURE
LET’S BE HONEST
THE PEOPLE WHO ARE
MAKING MONEY OFF OF BIG
         DATA
ARE THE PEOPLE
EXTRACTING VALUE FROM IT
         TODAY
I.E. THE PEOPLE SELLING BIG
      DATA SOLUTIONS
EVERYONE AND THEIR
GRANDMA ARE TRYING TO
 GET IN ON THE ACTION
VENDORS ARE REPACKAGING
   THE SAME OLD THING
AND TRYING TO TRICK US
INTO THINKING IT’S THE NEW
         HOTNESS
LET’S NOT BE FOOLED BY
       MARKETING
WORDS ARE IMPORTANT
MARKETING WORKS TO
SEPARATE WORDS FROM
    THEIR MEANING
FROM THEIR ORIGINAL
      INTENT
TO GET YOU TO ASSOCIATE
        WORDS
WITH PARTICULAR
PRODUCTS
BRANDS
AND VENDORS
</RANT>
LET’S TRY TO IMPROVE THE
  STATE OF DISCOURSE
SO LET’S BRING SOME NEW
CATCHPHRASES TO THE TABLE
SO WE DON’T HAVE TO TALK
ABOUT BIG DATA ANYMORE
INSTEAD LET’S TALK ABOUT
     CRITICAL DATA
WHAT IS CRITICAL DATA?
Is your data really
    that critical,
       dude?
IT’S MISSION CRITICAL DATA
IT’S DATA WHOSE
 UNAVAILABILITY
COSTS YOU MONEY
CRED
OR LIVES
IT IS THE DATA NEEDED NOW
NOT AT SOME DISTANT
POINT IN THE FUTURE
IT IS THE DATA THAT YOU
YOUR CUSTOMERS
OR SOCIETY
CAN CAPITALIZE ON TODAY
ITS VALUE IS CAPTURED BY
THE OWNERS OF THE DATA
RATHER THAN A THIRD PARTY
HOW DO YOU IDENTIFY
 YOUR CRITICAL DATA?
LATENCY AT SOME MARGIN
    APPEARS SIMPLY AS
     UNAVAILABILITY
Who cares about latency?




Sometimes high latency looks like an outage to the end user.
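
One way to make this concrete is to count any response slower than the client's patience as a failure. The sketch below is illustrative only: the function name, the timeout values, and the sample latencies are assumptions, not measurements from any real system.

```python
def effective_availability(latencies_ms, timeout_ms):
    """Fraction of requests answered within the client's timeout.

    A latency of None means the request failed outright; a latency above
    timeout_ms is counted the same way, because the user has given up.
    """
    if not latencies_ms:
        raise ValueError("need at least one request")
    served = sum(
        1 for ms in latencies_ms if ms is not None and ms <= timeout_ms
    )
    return served / len(latencies_ms)


# Hypothetical sample: no request failed, but two responses were very slow.
sample = [12, 15, 9, 2500, 14, 11, 3100, 13, 10, 16]

patient = effective_availability(sample, timeout_ms=5000)
impatient = effective_availability(sample, timeout_ms=1000)
print(f"5000 ms timeout: {patient:.0%} available")
print(f"1000 ms timeout: {impatient:.0%} available")
```

With a generous timeout the service looks fully available; with a tighter one, the same traffic shows up as partial unavailability even though nothing actually failed.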
FOR AMAZON:
       EVERY 100MS OF ADDED LATENCY
    DECREASES SALES BY 1%


Source: http://sites.google.com/site/glinden/Home/StanfordDataMining.2006-11-28.ppt
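
To get a feel for what a figure like this means in money, here is a rough back-of-the-envelope sketch. It assumes the 1% per 100 ms relationship scales linearly, which the source does not claim, and the revenue number is purely hypothetical.

```python
def estimated_sales_loss(annual_sales, added_latency_ms, loss_per_100ms=0.01):
    """Estimate sales lost to added latency.

    Assumption: the loss grows linearly with latency at loss_per_100ms
    (1% by default) for every 100 ms. That is an extrapolation made for
    illustration, not a measured result.
    """
    if annual_sales < 0 or added_latency_ms < 0:
        raise ValueError("sales and latency must be non-negative")
    return annual_sales * loss_per_100ms * (added_latency_ms / 100)


# Hypothetical business with 1,000,000 in annual sales and 250 ms of extra latency.
loss = estimated_sales_loss(1_000_000, 250)
print(f"Estimated annual loss: {loss:,.0f}")
```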
FOR GOOGLE:
    A 500MS INCREASE IN LATENCY
          REDUCED TRAFFIC
           BY 20 PERCENT



Source: http://sites.google.com/site/glinden/Home/StanfordDataMining.2006-11-28.ppt
WHAT DOES A SYSTEM FOR
CRITICAL DATA LOOK LIKE?
STREAMING PROCESSING
STORM
RIAK_PIPE
EXAMPLES OF DYNAMO
      SYSTEMS
VOLDEMORT
THESE SYSTEMS SACRIFICE
 CONSISTENCY FOR HIGH
AVAILABILITY/LOW LATENCY
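
The trade-off can be sketched with the N/R/W settings that Dynamo-style systems expose: N replicas per key, R replicas that must answer a read, and W replicas that must acknowledge a write. The helpers below are a minimal model of strict quorums only; the function names are made up for illustration, and real systems add refinements such as sloppy quorums and hinted handoff that this sketch ignores.

```python
def check_settings(n, r, w):
    """Reject settings where a quorum is larger than the replica count."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= r <= n and 1 <= w <= n")


def quorums_overlap(n, r, w):
    """True when every read quorum shares a replica with every write quorum.

    With overlap (r + w > n), a read contacts at least one replica that
    took part in the latest successful write. Without it, reads may be stale.
    """
    check_settings(n, r, w)
    return r + w > n


def tolerated_failures(n, r, w):
    """Replicas that can be down while reads and writes both still reach quorum."""
    check_settings(n, r, w)
    return n - max(r, w)


for n, r, w in [(3, 2, 2), (3, 1, 1)]:
    print(
        f"N={n} R={r} W={w}: "
        f"overlap={quorums_overlap(n, r, w)}, "
        f"tolerates {tolerated_failures(n, r, w)} replica failure(s)"
    )
```

Lowering R and W means fewer replicas to wait for and more failures survived, at the price of reads that may not see the latest write.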
I'M NOT MAKING A SALES
         PITCH
DON'T USE MY DATABASE
SERIOUSLY, DON'T
UNLESS
YOU ARE WILLING TO
    SACRIFICE
FAMILIAR DATA AND QUERY
         MODELS
FAMILIAR HIRING PATTERNS
KNOWN OPERATIONAL
     ISSUES
FOR
PREDICTABLE LATENCY
AVAILABILITY
PREDICTABLE OPERATIONS
OR IF DATA UNAVAILABILITY
 COSTS YOU $$$ OR MORE
THANKS
ian@basho.com

  @dstroyallmodels

github.com/ian-plosker
