A presentation based on a published paper "The Ambiguity of Data Science Team Roles and the Need for a Data Science Workforce Framework" by Jeffrey S. Saltz and Nancy W. Grady.
1. The Ambiguity of Data Science Team Roles and the Need for a Data Science Workforce Framework
Authors: Jeffrey S. Saltz, Nancy W. Grady
Presented by: K.K. Tripathi
(Course: CS795, Intro to Data Science)
October 18th, 2018
2. Introduction
Aim
Enable organizations to staff their data science teams more accurately with the desired skillsets.
Provide job titles and job descriptions that more clearly identify the tasks, knowledge, skills,
and abilities involved, which can benefit the data science community.
Remove the overloading of the term “data scientist”.
Objective
To address this challenge, this paper frames and provides data science workforce definitions
with examples.
3. Background
Issue
Overgeneralization of the term “data science”.
Problems
It is difficult to ascertain which skills are needed to perform the specific tasks required to build and deploy
big data analytics (BDA) systems.
This lack of vocabulary creates many issues (e.g., identifying the appropriate person to
hire for a specific role within a data science team).
There is no agreed-upon process model for data science.
Skillsets overlap (e.g., with software development lifecycles).
4. Role-based model by NICE (US DOD CWF)
Given lists of task, knowledge, skill, and ability descriptions, a workforce framework maps them to work
roles.
Domain benefits:
Employers
• Track staff skills, training, and qualifications
• Improve position descriptions
• Develop career paths
• Analyze proficiency
Educators
• Develop curriculum and conduct training for programs, courses, and seminars for specific roles
Technology providers
• Identify work roles, tasks, knowledge, skills, and abilities associated with products
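The idea on this slide, that a workforce framework maps task, knowledge, skill, and ability descriptions to work roles, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the role names, the descriptors, the WorkRole class, and the rank_roles scoring rule are hypothetical and are not taken from NICE or from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkRole:
    """A work role described by tasks, knowledge, skills, and abilities."""
    name: str
    tasks: set[str] = field(default_factory=set)
    knowledge: set[str] = field(default_factory=set)
    skills: set[str] = field(default_factory=set)
    abilities: set[str] = field(default_factory=set)

    def descriptors(self) -> set[str]:
        """All descriptors attached to this role, regardless of category."""
        return self.tasks | self.knowledge | self.skills | self.abilities


def rank_roles(roles: list[WorkRole], observed: set[str]) -> list[tuple[str, float]]:
    """Rank roles by the share of each role's descriptors found in `observed`."""
    scores = []
    for role in roles:
        wanted = role.descriptors()
        share = len(wanted & observed) / len(wanted) if wanted else 0.0
        scores.append((role.name, round(share, 2)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical framework entries (illustrative only).
FRAMEWORK = [
    WorkRole(
        "Data Engineer",
        tasks={"build data pipelines"},
        knowledge={"distributed storage"},
        skills={"sql", "python"},
        abilities={"debug production systems"},
    ),
    WorkRole(
        "Data Analyst",
        tasks={"produce reports"},
        knowledge={"descriptive statistics"},
        skills={"sql", "visualization"},
        abilities={"explain insights"},
    ),
]

# Descriptors extracted from a hypothetical position description.
posting = {"sql", "python", "build data pipelines"}
ranked = rank_roles(FRAMEWORK, posting)
print(ranked)
```

An employer could use a lookup like this to check which role a position description most resembles; a real framework would carry far more descriptors per role and a more careful matching rule.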
6. NIST
Develops a big data reference architecture (RA) that categorizes the components of big data systems.
The RA consists of 5 components, surrounded by 2 fabrics, and identifies their respective roles.
Components:
System Orchestrator: integrates the data application
Data Provider: introduces new data into the big data system (BDS)
Big Data Application Provider
Big Data Framework Provider
Data Consumer
Fabrics:
Security and Privacy: interacts with the System Orchestrator
Management: manages the big data life cycle,
e.g., package, software, and backup management
7. EDISON
A European Union-funded project to build the data science profession
The EDSF (EDISON Data Science Framework) comprises several documents, including the Data Science Professional
Profiles and the Model Curriculum
Data Scientist: merges, manages, and interprets large datasets
Data Science Researcher: applies the scientific discovery research process, including hypothesis testing
Data Science Architect: creates relevant data models and process workflows
Data Science Programmer: designs, develops, and codes large data (science) analytics applications
Data/Business Analyst: extracts information about system, service, or organization performance
8. SAIC
A systems integrator that works primarily for the federal government,
including civilian, defense, and intelligence customers
- Developed Data Science Edge (an internal process model)
- Extends the CRISP-DM process to align with big data
Information Architect: develops data models for optimal performance in databases.
Data Scientist: works in cross-functional teams at all stages of the analysis lifecycle.
- Follows a scientific approach to generate value from data
Metrics and Data: develops, inspects, mines, transforms, and models data to improve productivity
Knowledge and Collaboration Engineer: designs and implements tools
Big Data Engineer: works with the full open-source Hadoop stack, from cluster management to repository
9. Springboard
An online data science education startup. Defines the following roles:
Data Engineer: typically knows a variety of programming languages; focuses on coding and
cleaning up data sets; takes the predictive model from the data scientist and implements it in
code
Data Scientist: bridges the gap between programming and the implementation of data science,
the theory of data science, and the business implications of data
Data Analyst: provides visualizations and reports, explains insights
Data Architect: focuses on structuring the technology that manages the data models.
10. Gartner
A research and advisory firm that primarily advises upper-level decision makers.
Suggested roles:
Data Scientists: extract various types of knowledge from data; cover the end-to-end process
Data Engineers: make the data accessible and available for data scientists
Business Experts: business domain experts
Source System Experts: have knowledge of the data at the business application level
Software Engineers: handle custom coding requirements
Quant Geeks: a “nice-to-have” in certain situations, but a “must-have” in rare ones
Unicorns: well-versed data scientists
12. Data Scientist vs. Data Engineer
Most frequent key phrases used in job descriptions:
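One simple way to find frequent key phrases in job descriptions is to count word n-grams across the descriptions for each title. The sketch below assumes that approach. The slides do not say how the phrases were actually extracted, so the method, the function names key_phrases and top_phrases, and the sample descriptions are all hypothetical.

```python
import re
from collections import Counter


def key_phrases(text, n=2):
    """Return the lowercase word n-grams of `text`, in order of appearance."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def top_phrases(descriptions, n=2, k=3):
    """Return the k n-grams that appear in the most descriptions."""
    counts = Counter()
    for text in descriptions:
        # Use a set so each phrase counts once per description.
        counts.update(set(key_phrases(text, n)))
    return counts.most_common(k)


# Made-up job description snippets (illustrative only).
SCIENTIST_ADS = [
    "Apply machine learning to customer data",
    "Experience with machine learning and statistics",
    "Build statistical models using machine learning",
]
ENGINEER_ADS = [
    "Build data pipelines on Hadoop",
    "Maintain data pipelines and ETL jobs",
]

print(top_phrases(SCIENTIST_ADS, k=1))
print(top_phrases(ENGINEER_ADS, k=1))
```

Counting each phrase once per description keeps a single long posting from dominating the ranking; a fuller analysis would also filter stop words and normalize word forms.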
13. Future & Conclusion
Future:
Further changes are expected in areas such as:
Blending of data-intensive and compute-intensive applications,
e.g., the rise of High Performance Data Analytics (HPDA)
Conclusion:
The analysis of role usage in industry should be rerun periodically (every 6 months)
to identify trends over time
NIST (National Institute of Standards and Technology) developed a cybersecurity workforce framework under NICE (National Initiative for Cybersecurity Education).
The parent company of Science Applications International Corporation (SAIC) changed its name to Leidos.