Data Scientists: Myths &
Mathemagical Powers
James Kobielus

James Kobielus shoots down
10 myths about Data Scientists

“Data Scientists: Myths and Mathemagical Powers,”
James Kobielus, Thinking Inside the Box, June 29, 2012

Myth #1

Data scientists are mythical
beings, like the unicorns.

IBMbigdatahub.com

Myth #2

Data scientists are an elite
bunch of precious eggheads.

Reality #2

Data scientists get their fingernails
dirty dumping piles of data into
analytical sandboxes, cleansing,
and sifting through it for useful
patterns that may or may not exist.
Then, they do it all over again.

It’s often mind-numbingly detailed
grunt work, not the sport of
armchair data philosophers.

Myth #3

Data scientists are a nouveau
fad that will soon fade.

Reality #3

The term “data scientist” has been
around for years, and the various
advanced analytics specialties
that fall under it are even older.
Recently, the term has been used
in the convergence of disciplines
that have become super-hot.

Steady growth in job listings and
the academic curricula is undeniable.
This is no fad.

Myth #4

Data scientists are all just
PhD statisticians who
failed to make tenure.

Reality #4

Many data scientists acquired
their quantitative and statistical
modeling skills in college, but
pursued degrees in business
administration, economics and
engineering. They actually know
about business problems.

Many data scientists you’ll encounter
in the working world are business
domain specialists!

Myth #5

Data scientists are just BI
specialists with fancier titles.

Reality #5

Many longtime BI power users
are, in fact, data scientists of a
sort. They are business domain
specialists whose jobs involve
multivariate analysis, forecasting,
what-if modeling, and simulation.

Career development may stall out
if they don’t stay up to speed
on topics like Hadoop and
predictive modeling.

Myth #6

Data scientists aren’t really
scientists in any meaningful
sense of the word.

Reality #6

Statistical controls are the
bedrock of true science and the core
responsibility of the data scientist. If
data scientists are confirming their
findings through statistical controls
and real-world experiments, they’re
scientists, plain and simple.

True science is nothing without
observational data.

Myth #7

Data scientists need fancy,
expensive statistical power
tools to get their work done.

Reality #7

The job of the data scientist is to
look for hidden patterns. They can
accomplish this through user-friendly
visualization tools, search-driven
BI tools and other approaches that
don’t require a deep mastery of
statistical analysis.

The market for cost-effective
exploratory BI tools has many
vendors, including IBM Cognos.

Myth #8

Data scientists simply pour
data into Hadoop and pull
out mind-blowing insights.

Reality #8

The data scientist will be the
first to tell you that Hadoop is
just another platform for deep
exploration into data.

There isn’t a magic Ouija board
through which the big data spirits
speak to us mere mortals.

Myth #9

Data scientists are analytics
junkies who couldn’t care less
about business applications.

Reality #9

If you spend time with any real-world
data scientist, they’ll bend
your ear discussing how they
tackled a specific business problem,
such as reducing customer churn,
targeting offers across channels,
and mitigating financial risks.

Most data scientists aren’t nerds.
They know people regard all this
big data lingo as confusing jargon.

Myth #10

Data scientists don’t have any
responsibilities that force them
out of their ivory towers.

Reality #10

That used to be the case. However,
as next best action and real-world
experiments become ubiquitous, the
data scientist is evolving into the
role that stokes, tweaks and fuels
the operational engine.

Data scientists test the
analytic-centric models at the heart
of agile business processes.

For more from James Kobielus and
  other big data thought leaders,
     visit The Big Data Hub at
       IBMbigdatahub.com
