Data Tools and the Data Scientist Shortage
Wes McKinney
Director of Ursa Labs, Open Source Developer at Ursa Labs
From the Data Stage at Web Summit 2015, November 4, 2015
1.
Data Tools and the Data Scientist Shortage. Wes McKinney, @wesmckinn. Data Summit @ Web Summit, 2015-11-04. © Cloudera, Inc. All rights reserved.
2.
Me
3.
Career theme: Serial creator of data tools
4.
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
5.
http://www.bloomberg.com/news/articles/2015-06-04/help-wanted-black-belts-in-data
6.
“The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.” McKinsey & Co, http://www.mckinsey.com/features/big_data
7.
Traditional view of Data Science. Source: Drew Conway, “The Data Science Venn Diagram”
8.
Many Kinds of “Data People”. Analyzing the Analyzers, Harris, Murphy, Vaisman
9.
Many Kinds of “Data People”. Analyzing the Analyzers, Harris, Murphy, Vaisman
10.
Addressing the analytical shortage: Education, Culture, Tools
11.
Data process
12.
The “Great Decoupling” for Industry Analytics: UI, Compute, Storage
13.
The “Great Decoupling” for Industry Analytics: UI, Compute, Storage. Accumulation of user time. Legacy technology: vertically-integrated solutions
14.
Ubiquitous Real-Time Storage and Compute: A view from 2040
15.
Data analysis hierarchy of needs: Data Storage / Access; Clean Data; Analysis and Visualization; Productivity tools / UI
16.
Some data tooling UI innovations
17.
Rejecting the “Highlander Fallacy”
18.
SQL Programming: the “mainframe punch cards” of our time
19.
Many SQL engines … and more
20.
Executing data science languages in the compute layer. UI: Ibis, SQL, Spark API, … Compute: Analytic SQL, Spark, MapReduce. Storage: HDFS, Kudu, HBase. Python, R, Julia, …?
22.
Thank you. Wes McKinney, @wesmckinn. Views are my own