Data Tools and the Data Scientist Shortage

  1. Data Tools and the Data Scientist Shortage. Wes McKinney @wesmckinn. Data Summit @ Web Summit, 2015-11-04. © Cloudera, Inc. All rights reserved.
  2. Me
  3. Career theme: Serial creator of data tools
  4. https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
  5. http://www.bloomberg.com/news/articles/2015-06-04/help-wanted-black-belts-in-data
  6. “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.” McKinsey & Co, http://www.mckinsey.com/features/big_data
  7. Traditional view of Data Science. Source: Drew Conway, “The Data Science Venn Diagram”
  8. Many Kinds of “Data People”. Source: Analyzing the Analyzers, Harris, Murphy, Vaisman
  9. Many Kinds of “Data People”. Source: Analyzing the Analyzers, Harris, Murphy, Vaisman
  10. Addressing the analytical shortage: Education, Culture, Tools
  11. Data process
  12. The “Great Decoupling” for Industry Analytics: UI, Compute, Storage
  13. The “Great Decoupling” for Industry Analytics: UI, Compute, Storage. Accumulation of user time. Legacy technology: vertically-integrated solutions
  14. Ubiquitous Real-Time Storage and Compute: A view from 2040
  15. Data analysis hierarchy of needs: Data Storage / Access; Clean Data; Analysis and Visualization; Productivity tools / UI
  16. Some data tooling UI innovations
  17. Rejecting the “Highlander Fallacy”
  18. SQL Programming: the “mainframe punch cards” of our time
  19. Many SQL engines … and more
  20. Executing data science languages in the compute layer. UI: Ibis, SQL, Spark API, …; Compute: Analytic SQL, Spark, MapReduce; Storage: HDFS, Kudu, HBase. Python, R, Julia, …?
  21. (No text on this slide)
  22. Thank you. Wes McKinney @wesmckinn. Views are my own
