What is Data Science

Looking for an objective, complete, inclusive, accurate and succinct definition of this emerging field

  1. What is Data Science: Looking for an objective, complete, inclusive, accurate and succinct definition of this emerging field. Ioannis Kourouklides, www.kourouklides.com
  2. Contents • Introduction • History • Related terms • Definitions by various individuals • Domain expertise • Data Science in the job market • How Data Scientists are self-defined • Summary • Conclusion • References & Bibliography
  3. Introduction • In a Forbes article, Gil Press (2013), among others, acknowledges that Data Science (DS) is a buzzword without a clear definition • A quick search of online and print resources confirms this lack of a clear definition • Several people and companies have expressed their own opinions on the matter • Nonetheless, most definitions overlap with each other • Data Science is not concerned with everything that has to do with data • A brief look at the recent history can give more insight • The proper (concrete) definition of this science would have to come from industry rather than academia, and it may keep evolving over time
  4. History • The term “Data Science” has been around for more than 30 years • Its meaning has not always been the same, but the term has gained traction since then • Gil Press (2013) wrote an article about the evolution of the term • 1966: Peter Naur used the term “Science of Data” interchangeably with “Datalogy” as a synonym for Computer Science in his courses (Naur, 1968) • 1974: Naur published the book ‘Concise Survey of Computer Methods’, a survey of contemporary data processing methods • 1989: Gregory Piatetsky-Shapiro organized and chaired the first Knowledge Discovery in Databases workshop. In 1995, it became the annual ACM Conference on Knowledge Discovery and Data Mining (KDD).
  5. History • 1996: The International Federation of Classification Societies (IFCS) included the term “Data Science” in the title of its conference for the first time (“Data science, classification, and related methods”) • 1997: C.F. Jeff Wu gave his inaugural lecture, entitled ‘Statistics = Data Science?’ (“Identity of statistics in science examined,” 1997) • 2001: William S. Cleveland published ‘Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics’ • 2002: Launch of the ‘Data Science Journal’ by CODATA of the ICSU • 2003: Launch of the ‘Journal of Data Science’ by Columbia University • 2005: The National Science Board defined what a Data Scientist is • 2009: Nathan Yau wrote about the “Rise of the Data Scientist”
  6. Related terms • But let’s look at some related (possibly overlapping) terms: • Machine Learning • Data Mining • Predictive Analytics • Statistics • Big Data • Data Analysis • Business Intelligence • Data Engineering • Business Analytics • Knowledge Discovery in Databases • For a comparison of these terms with Data Science: http://goo.gl/uW15El
  7. Definition by M. Loukides • Loukides (2010) wrote an article titled ‘What is data science?’ • “Data science requires skills ranging from traditional computer science to mathematics to art.” • “Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions.” • This is not a very precise definition, but it is still insightful • He also highlighted the industry’s perspective and the rising job trends
  8. Definition by D. Conway • Conway (2010) gave a less vague definition: “…one needs to learn a lot as they aspire to become a fully competent data scientist. Unfortunately, simply enumerating texts and tutorials does not untangle the knots. Therefore, in an effort to simplify the discussion, and add my own thoughts to what is already a crowded market of ideas, I present the Data Science Venn Diagram… hacking skills, math and stats knowledge, and substantive expertise.”
  9. Definition by P. Warden • Another description of DS (Warden, 2011) is the following: • “There is no widely accepted boundary for what’s inside and outside of data science’s scope. Is it just a faddish rebranding of statistics? I don’t think so, but I also don’t have a full definition. I believe that the recent abundance of data has sparked something new in the world, and when I look around I see people with shared characteristics who don’t fit into traditional categories. These people tend to work beyond the narrow specialties that dominate the corporate and institutional world, handling everything from finding the data, processing it at scale, visualizing it and writing it up as a story. They also seem to start by looking at what the data can tell them, and then picking interesting threads to follow, rather than the traditional scientist’s approach of choosing the problem first and then finding data to shed light on it.”
  10. Definition by J. Wills
  11. Definition by B. Tierney
  12. Definition by F. Lo • When searching online for the phrase ‘define data science’, Google shows an excellent article (Lo, 2013) as its suggested answer • “Data science is multidisciplinary; the skill set of a data scientist lies at the intersection of 3 main competencies.” • “Also, a big misconception is that data science is all about statistics. While statistics are important, it is not the only type of mathematics that should be well-understood by a data scientist.” • “A defining personality trait of data scientists is they are deep thinkers with intense intellectual curiosity.”
  13. Definition by M. Mut • Mut (2013) went a step further and classified Data Scientists into 3 distinct specialties with very little overlap: • “Advanced Analysis: Math, Stats, Pattern Recognition/Learning, Uncertainty, Visualization, Data Mining” – let’s call them Data Researchers • “Computer Systems - Advanced Computing, High Performance Computing, Visualization, Data Mining” – let’s call them Data Hackers • “Databases - Data Engineering, Data Warehousing” – let’s call them Data Developers • He argued that DS is defined to include all these specialties, which makes life confusing for employers and applicants • As a solution, he proposed educating HR and employers that they need to break DS into specialties
  14. Definition by V. Granville • However, Granville (2014) and others disagreed with Mut. They maintained that combining these different areas is possible, and they predicted that individuals will have more overlapping skills in the future • In his book ‘Developing Analytic Talent: Becoming a Data Scientist’, he provides what seems to be the most convincing and consistent definition: • “Data Science is the intersection of computer science, business engineering, statistics, data mining, machine learning, operations research, Six Sigma, automation and domain expertise.” • “… people interested in a data science career don’t need to learn […] everything from these domains.”
  15. Domain expertise • Domain expertise and business acumen are essential for DS • The expertise required depends on the kind of data and its source, for example: • Bioinformatics & Genomics • Information Security • Computer Vision & Image Processing • Finance & Econometrics • Insurance • Marketing • Medicine, Health & Biomedical applications • Particle Physics • Social Networks • Telecoms & Utilities • Web & Text Mining
  16. Data Science in the job market • Data Scientist roles go by various names, depending on the seniority level, the specific skillset and the area of expertise • Frequently required skills are: • Hadoop/MapReduce/MongoDB/Hive (not always necessary, sometimes a plus) • SQL (though less popular than NoSQL) • Perl/Java/PHP/.NET/Ruby/C++ • Machine Learning techniques • Python/R/MATLAB/Octave/SPSS/SAS/Stata/Mathematica • Advanced degree: MSc or PhD • Work experience (typically 1-3 years or more) • Communication skills
  17. Data Science in the job market
  18. How Data Scientists are self-defined • Harris et al. (2013) identified four clusters (latent factors) of Data Scientists in their book, using Non-negative Matrix Factorization: • Three of these clusters overlap with the specializations described by Mut (2013) • The fourth one refers mostly to CDOs (Chief Data Officers), who self-identify as Leaders, Businesspersons, or Entrepreneurs • Self-identifications within the three overlapping clusters: Data Researcher: Researcher, Scientist, Statistician; Data Hacker: Hacker, Artist, Jack of All Trades; Data Developer: Developer, Engineer
  19. How Data Scientists are self-defined • The three specializations have started to emerge as three job positions: Data Researcher → Data Scientist; Data Hacker → Machine Learning Engineer; Data Developer → Data Engineer • Nothing stops a person who studied Science from becoming a Data Developer or Data Hacker, and nothing stops a person who studied Engineering from becoming a Data Researcher • Thus, the author believes that the terms ‘Scientist’ and ‘Engineer’ should not have been used in these job titles, as they are misleading
  20. Summary • In brief, one can break down the skills defining DS into three groups (the items in one group do not correspond to those in the others): • Soft skills: Communication, Business knowledge, Domain expertise • Knowledge & Research skills: Machine Learning/Data Mining, Statistics & other Maths, Relational Databases, High Performance Computing, Data Visualization • Coding skills: Perl/Java/C#/PHP/Ruby/C++, Python/R/MATLAB/Octave, SPSS/SAS/Stata/Mathematica, Hadoop/MongoDB/Hive, SQL/JSON/XML/HTML/CSS
  21. Conclusion • To sum up, DS is an interdisciplinary science, but one without a clear definition • It can be defined as a set of skills from Computer Science, Statistics, … • It definitely requires research qualities, but also domain expertise • Machine Learning is at the epicentre of this newly coined term • Individual Data Scientists used to focus on, or specialize in, one area of expertise • The author believes that future Data Professionals will be divided into three distinct specializations, similar to Quantitative Professionals (Quant Researchers, Quant Traders and Quant Developers), corresponding to Data Scientists, Machine Learning Engineers and Data Engineers respectively • More resources can be found on the following slides
  22. References & Bibliography 1. Press, G. (2013). “A Very Short History Of Data Science.” Forbes. http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/ 2. Naur, P. (1968). “'Datalogy', the science of data and data processes.” IFIP Congress 2, pp. 1383-1387. 3. “Identity of statistics in science examined.” The University Record, 9 November 1997, The University of Michigan. http://ur.umich.edu/9899/Nov09_98/4.htm (retrieved 8 August 2014). 4. Cleveland, W. S. (2001). “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” International Statistical Review / Revue Internationale de Statistique 69 (1). 5. Loukides, M. (2010). “What is data science?” O'Reilly Radar. http://radar.oreilly.com/2010/06/what-is-data-science.html
  23. References & Bibliography • http://www.theguardian.com/news/datablog/2012/mar/02/data-scientist#_ • http://www.indeed.com/trendgraph/jobgraph.png?q=%22data+scientist%22 • http://www.indeed.com/trendgraph/jobgraph.png?q=%22machine+learning%22 • http://blogs.nature.com/naturejobs/2013/03/18/so-you-want-to-be-a-data-scientist • http://www.forbes.com/sites/rawnshah/2014/01/16/revealing-data-sciences-job-potential/ • http://www.kdnuggets.com/2014/03/data-scientist-right-career-path-candid-advice.html • http://www.oreilly.com/data/free/analyzing-the-analyzers.csp?goback=%2Egde_2013423_member_254847898 • http://www.kddi-ri.jp/download/report/RA2014006