1
© Cloudera, Inc. All rights reserved.
Data Tools and the Data Scientist Shortage
Wes McKinney @wesmckinn
Data Summit @ Web Summit 2015-11-04
  
2
Me
  
3
Career theme: Serial creator of data tools
  
4
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
  
5
http://www.bloomberg.com/news/articles/2015-06-04/help-wanted-black-belts-in-data
  
6
“The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.”
McKinsey & Co
http://www.mckinsey.com/features/big_data
  
7
Source: Drew Conway, “The Data Science Venn Diagram”
Traditional view of Data Science
  
8
Analyzing the Analyzers, Harris, Murphy, Vaisman
Many Kinds of “Data People”
  
9
Analyzing the Analyzers, Harris, Murphy, Vaisman
Many Kinds of “Data People”
  
10
Addressing the analytical shortage
Education, Culture, Tools
  
11
Data process
  
12
The “Great Decoupling” for Industry Analytics
UI
Compute
Storage

13
The “Great Decoupling” for Industry Analytics
UI
Compute
Storage
Accumulation of user time
Legacy technology: vertically-integrated solutions
  
14
Ubiquitous Real-Time Storage and Compute: A view from 2040
  
15
Data analysis hierarchy of needs
  
Data Storage / Access
Clean Data
Analysis and Visualization
Productivity tools / UI

16
Some data tooling UI innovations
  
17
Rejecting the “Highlander Fallacy”
  
18
SQL Programming: the “mainframe punch cards” of our time
  
19
Many SQL engines
… and more
  
20
Executing data science languages in the compute layer
UI: Ibis, SQL, Spark API, …
Compute: Analytic SQL, Spark, MapReduce
Storage: HDFS, Kudu, HBase
Python, R, Julia, …?
  
22
Thank you
Wes McKinney @wesmckinn
Views are my own
  
