Investigating data scientists
2. Copyright © 2015 Business Over Broadway and AnalyticsWeek
Who I am
Scientist /
Blogger
Author
Consultant
3.
What I do
Business Over Broadway (Founder)
Solve problems, primarily business problems, through the use of the scientific method.
Using data and analytics to help make decisions that are based on fact, not hyperbole.
AnalyticsWeek (Chief Research Officer)
Help businesses optimize their data/analytics.
Improving talent/technology recruitment, facilitating deeper community engagement with the power of online/offline channels and advancing knowledge through research.
4.
What is the role of a data scientist?
Quotes from the article: What is the role of a data scientist in the insight economy?
5.
Data Science Study
• Invited data professionals via:
– AnalyticsWeek Newsletter
– Blog post
– Analytics.Club
• 600+ completed surveys
– Self-assessment rating of proficiency of 25 skills across
five skill areas:
• Business, Technology, Programming, Math & Modeling,
Statistics
– 9 additional questions
– Overall satisfaction with work outcome
6.
Data Science Skills Assessed
Area / Skills*
Business:
1. Product design and development
2. Project management
3. Business development
4. Budgeting
5. Governance & Compliance (e.g., security)
Technology:
6. Managing unstructured data (e.g., NoSQL)
7. Managing structured data (e.g., SQL, JSON, XML)
8. Natural Language Processing (NLP) and text mining
9. Machine Learning (e.g., decision trees, neural nets, Support Vector Machine, clustering)
10. Big and Distributed Data (e.g., Hadoop, Map/Reduce, Spark)
Math & Modeling:
11. Optimization (e.g., linear, integer, convex, global)
12. Math (e.g., linear algebra, real analysis, calculus)
13. Graphical Models (e.g., social networks)
14. Algorithms (e.g., computational complexity, Computer Science theory) and Simulations (e.g., discrete, agent-based, continuous)
15. Bayesian Statistics (e.g., Markov Chain Monte Carlo)
Programming:
16. Systems Administration (e.g., UNIX) and Design
17. Database Administration (e.g., MySQL, NoSQL)
18. Cloud Management
19. Back-End Programming (e.g., Java/Rails/Objective-C)
20. Front-End Programming (e.g., JavaScript, HTML, CSS)
Statistics:
21. Data Management (e.g., recoding, de-duplicating, integrating disparate data sources, web scraping)
22. Data Mining (e.g., R, Python, SPSS, SAS) and Visualization (e.g., graphics, mapping, web-based data visualization) tools
23. Statistics and statistical modeling (e.g., general linear model, ANOVA, MANOVA, Spatio-temporal, Geographical Information System (GIS))
24. Science/Scientific Method (e.g., experimental design, research design)
25. Communication (e.g., sharing results, writing/publishing, presentations, blogging)
* List of skills adapted from Analyzing the Analyzers by Harlan D. Harris, Sean Patrick Murphy and Marck Vaisman
7.
Proficiency Ratings*
Proficiency Level (Scale Value): Description
Don't know (0): You possess no knowledge.
Fundamental Awareness (20): You have a common knowledge or an understanding of basic techniques and concepts.
Novice (40): You have the level of experience gained in a classroom and/or experimental scenarios or as a trainee on-the-job. You are expected to need help when performing this skill.
Intermediate (60): You are able to successfully complete tasks in this competency as requested. Help from an expert may be required from time to time, but you can usually perform the skill independently.
Advanced (80): You can perform the actions associated with this skill without assistance. You are certainly recognized within your immediate organization as "a person to ask" when difficult questions arise regarding this skill.
Expert (100): You are known as an expert in this area. You can provide guidance, troubleshoot and answer questions related to this area of expertise and the field where the skill is used.
* Rating scale is based on a proficiency rating scale used by NIH. Respondent instructions: You will be asked about your
proficiency for a variety of skills. Please use the following scale when indicating your level of proficiency for each skill.
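As a minimal sketch of how the scale above could be applied, the snippet below maps each proficiency label to its scale value and averages one respondent's ratings. The respondent's answers and the helper name `mean_proficiency` are hypothetical and are not taken from the study.

```python
from statistics import mean

# Scale values taken from the proficiency rating scale above.
PROFICIENCY_SCALE = {
    "Don't know": 0,
    "Fundamental Awareness": 20,
    "Novice": 40,
    "Intermediate": 60,
    "Advanced": 80,
    "Expert": 100,
}


def mean_proficiency(ratings):
    """Average the scale values for a dict of {skill: proficiency label}."""
    return mean(PROFICIENCY_SCALE[label] for label in ratings.values())


# Hypothetical answers from one respondent (illustration only).
example_ratings = {
    "Project management": "Advanced",
    "Machine Learning": "Intermediate",
    "Bayesian Statistics": "Novice",
}

print(mean_proficiency(example_ratings))  # (80 + 60 + 40) / 3 = 60
```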
9.
Proficiency varies across skills
10. Copyright © 2016 Business Over Broadway and AnalyticsWeek
Job Roles in Data Science
*Researcher (e.g., researcher, scientist, statistician); Business Management (e.g., leader, business person,
entrepreneur); Creative (e.g., jack of all trades, artist, hacker); Developer (e.g., developer, engineer)
11.
Proficiency varies by job role – 1st Look
12.
Proficiency varies by job role – 2nd Look
13.
Structure of Data Science Skills
* Factor analysis is based on proficiency ratings of 621 data professionals. Reliability estimates (Cronbach’s alpha) for the three skill areas (based on items that loaded on the respective factors) were .87 (Business), .92 (Tech / Prog), and .92 (Math / Stats).
Factor Analysis of Data Skills
• Data reduction technique
• Examines the statistical relationships (e.g.,
correlations) among a large set of
variables and tries to explain these
correlations using a smaller number of
variables (factors)
• Elements (or factor loadings) of the factor
pattern matrix represent the strength of
relationship between the variables and
each of the underlying factors
• Tells us two things:
1. number of underlying factors that
describe the initial set of variables
2. which variables are best
represented by each factor
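The idea in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings are made up, the function names are hypothetical, and because the slides do not say which extraction method was used, the sketch substitutes the principal-component method (dominant eigenvector of the correlation matrix, found by power iteration) for a full factor analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlate every variable (one list per variable) with every other."""
    return [[pearson(a, b) for b in columns] for a in columns]


def first_factor_loadings(corr, iterations=200):
    """Loadings of each variable on the largest component of `corr`.

    Power iteration finds the dominant eigenvector; scaling it by the
    square root of its eigenvalue turns its entries into loadings.
    """
    k = len(corr)
    vec = [1.0] * k
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)
    )
    return [v * sqrt(eigenvalue) for v in vec]


# Hypothetical proficiency ratings (0-100) from six respondents.
statistics_skill = [20, 40, 60, 60, 80, 100]
machine_learning = [20, 20, 60, 80, 80, 100]
math_skill = [40, 20, 40, 80, 60, 100]

corr = correlation_matrix([statistics_skill, machine_learning, math_skill])
loadings = first_factor_loadings(corr)
print([round(value, 2) for value in loadings])
```

Because the three made-up skills are all positively correlated, a single factor accounts for most of their shared variance and each skill loads strongly on it.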
14.
Structure of Data Science Skills
* Factor analysis is based on proficiency ratings of 621 data professionals. Reliability estimates (Cronbach’s alpha) for the three skill areas (based on items that loaded on the respective factors) were .87 (Business), .92 (Tech / Prog), and .92 (Math / Stats).
Plot the factor loadings
for the 25 data skills into
a 3-dimensional space
Three Distinct Skill Sets
• Business
• Technology / Programming
• Math / Statistics
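The footnote reports Cronbach's alpha for each of the three skill sets. As a rough sketch of how such a reliability estimate is computed, the snippet below applies the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), to made-up ratings. The data and the function name `cronbach_alpha` are hypothetical and do not reproduce the study's figures.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list per item)."""
    k = len(items)
    item_variance_sum = sum(variance(column) for column in items)
    # Total score per respondent: sum across items, row by row.
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings (0-100) on three business skills, five respondents.
project_management = [20, 40, 60, 80, 80]
budgeting = [20, 20, 60, 60, 100]
business_development = [40, 40, 40, 80, 100]

alpha = cronbach_alpha([project_management, budgeting, business_development])
print(round(alpha, 2))
```

Values close to 1 indicate that the items in a skill set move together, which is the sense in which the slide treats each skill set as a coherent scale.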
16.
The Structure of Data Science Skills
17.
Proficiency varies by job role – 3rd Look
18.
Proficiency varies by job role – 4th Look
*Researcher (e.g., researcher, scientist, statistician) n = 133; Business Management (e.g., leader, business person,
entrepreneur) n = 86; Creative (e.g., jack of all trades, artist, hacker) n = 30; Developer (e.g., developer, engineer) n = 54
19.
Proficiency varies by job role – 4th Look
*Researcher (e.g., researcher, scientist, statistician) n = 133; Business Management (e.g., leader, business person,
entrepreneur) n = 86; Creative (e.g., jack of all trades, artist, hacker) n = 30; Developer (e.g., developer, engineer) n = 54
20.
Top Data Science Skills by Job Role
21.
Top Data Science Skills by Job Role
22.
Satisfaction with Work Outcome
*Researcher (e.g., researcher, scientist, statistician); Business Management (e.g., leader, business person,
entrepreneur); Creative (e.g., jack of all trades, artist, hacker); Developer (e.g., developer, engineer)
23.
In Search of the Data Scientist Unicorn
24.
Data Science as a Team Sport
Impact of Business Expert
25.
Data Science as a Team Sport
Impact of Technology / Programming Expert
26.
Data Science as a Team Sport
Impact of Math & Modeling / Statistics Expert
27.
Getting Insight from Data: The Scientific Method
28.
Scientific Method and Data Science Skills
29.
Skill Proficiency Drives Satisfaction with Work Outcome
30.
What skills are linked to project success?
31.
Importance of Data Science Skills by Job Role
32.
Impact of Data Mining and Viz Tools
33.
Skill-Based Approach to Improve the Practice of Data Science
[Figure: 2x2 matrix. One axis runs from no proficiency to expert proficiency; the other runs from skill not essential to project outcome to skill essential to project outcome. Quadrant labels: Improve / Invest, Divest, Don't Overinvest, Stay the Course. Copyright 2015 Business Over Broadway.]
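One way to read this matrix is as a simple decision rule. The sketch below is an interpretation, not the author's method: the assignment of labels to quadrants is inferred from the labels themselves (the transcript does not preserve the layout), "impact" is assumed to be a correlation-like number linking proficiency in a skill to satisfaction with work outcome, and both cutoffs are hypothetical.

```python
def classify_skill(proficiency, impact, proficiency_cutoff=60, impact_cutoff=0.30):
    """Place a skill in one quadrant of the proficiency-by-importance matrix.

    proficiency: average self-rated proficiency on the 0-100 scale.
    impact: assumed correlation-like link between the skill and outcome.
    Both cutoffs are hypothetical; the slides do not state any thresholds.
    """
    essential = impact >= impact_cutoff
    proficient = proficiency >= proficiency_cutoff
    if essential and not proficient:
        return "Improve / Invest"
    if essential and proficient:
        return "Stay the course"
    if proficient:
        return "Don't overinvest"
    return "Divest"


# Hypothetical (proficiency, impact) pairs, for illustration only.
examples = {
    "Skill A": (35, 0.45),
    "Skill B": (75, 0.40),
    "Skill C": (80, 0.05),
    "Skill D": (25, 0.02),
}

for skill, (proficiency, impact) in examples.items():
    print(skill, "->", classify_skill(proficiency, impact))
```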
34.
Skill-Based Method to Improve Data Science: Business Management
[Figure: skills plotted by proficiency (horizontal axis, 0 to 100, none to expert) against a vertical axis from -.10 to .60 (not essential to essential). Labeled skills: Data Mining and Viz Tools, Big and distributed data, Science / Scientific Method, Statistics / Statistical Modeling, Machine Learning, Bayesian Statistics, Unstructured data, Optimization, Algorithms. Copyright 2015 Business Over Broadway.]
35.
Skill-Based Method to Improve Data Science: Developer
[Figure: skills plotted by proficiency (none to expert) against importance (not essential to essential). Labeled skills: Systems Administration, Data Mining and Viz Tools. Copyright 2015 Business Over Broadway.]
36.
Skill-Based Method to Improve Data Science: Creative
[Figure: skills plotted by proficiency (none to expert) against importance (not essential to essential). Labeled skills: Business Development, Optimization, Math, Data Mining and Viz Tools, Graphical Models. Copyright 2015 Business Over Broadway.]
37.
Skill-Based Method to Improve Data Science: Researcher
[Figure: skills plotted by Proficiency (horizontal axis, none to expert) against Impact on Satisfaction with Outcome (vertical axis, not essential to essential). Labeled skills: Big and distributed data, Bayesian Statistics, Data Management, Machine Learning, Algorithms, Product Design. Copyright 2015 Business Over Broadway.]
39.
Lack of Gender Diversity – Other Science Roles
40.
Job Roles in Data Science by Gender
41.
Highest Level of Education Attained
42.
Gender Comparison of Proficiency across Skills
43.
Advice for Data Scientists
• Take the Data Science Skills Survey at: http://pxl.me/awrds3
• Be specific when talking about “data scientists”
• There are different types – defined by what they do and the skills
they possess
• Work with other data professionals who have
complementary skills. Teamwork is key to success.
• Learn to use data mining and visualization tools
• R, Python, SPSS, SAS, graphics, mapping, web-based data
visualization
• Be an advocate for women in the field of data science
44.
Bob E. Hayes, Ph.D.
Email: bob@analyticsweek.com and bob@businessoverbroadway.com
Web: www.analyticsweek.com and www.businessoverbroadway.com
Blog: www.businessoverbroadway.com/blog
Twitter: www.twitter.com/bobehayes
Editor's Notes
A factor analysis is a data reduction technique. It is used when you have a large number of variables in your data set and would like to reduce them to a manageable number. In general, factor analysis examines the statistical relationships (e.g., correlations) among a large set of variables and tries to explain these correlations using a smaller number of variables (factors).
The results of the factor analysis are presented in a table called the factor pattern matrix. The factor pattern matrix is an N x M table (N = number of original variables, M = number of underlying factors). Its elements are regression coefficients (similar to correlation coefficients) between each of the variables and the factors. These elements (or factor loadings) represent the strength of the relationship between the variables and each of the underlying factors. The results of the factor analysis tell us two things:
1. the number of underlying factors that describe the initial set of variables
2. which variables are best represented by each factor
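To make the second point concrete, the sketch below reads an N x M factor pattern matrix and assigns each variable to the factor on which it loads most strongly. The loadings are invented for illustration and are not the study's results; the factor names follow the three skill sets named in the slides, and `assign_to_factors` is a hypothetical helper.

```python
FACTOR_NAMES = ["Business", "Technology / Programming", "Math / Statistics"]

# Hypothetical pattern matrix: one row per variable, one loading per factor.
PATTERN_MATRIX = {
    "Project management": [0.78, 0.10, 0.05],
    "Budgeting": [0.71, 0.02, 0.08],
    "Cloud Management": [0.12, 0.80, 0.09],
    "Back-End Programming": [0.04, 0.75, 0.15],
    "Bayesian Statistics": [0.03, 0.18, 0.82],
    "Optimization": [0.09, 0.22, 0.74],
}


def assign_to_factors(pattern_matrix, factor_names):
    """Map each variable to the factor with its largest absolute loading."""
    assignments = {}
    for variable, loadings in pattern_matrix.items():
        strongest = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
        assignments[variable] = factor_names[strongest]
    return assignments


assignments = assign_to_factors(PATTERN_MATRIX, FACTOR_NAMES)
for variable, factor in assignments.items():
    print(f"{variable}: {factor}")
```

Absolute values are used because a strong negative loading ties a variable to a factor just as much as a strong positive one.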