Investigating data scientists

This study examines data science through the eyes of the people who practice it. We surveyed more than 600 data professionals to understand their data skills, team makeup and more.

Published in: Data & Analytics

  1. Investigating Data Scientists
     Bob E. Hayes, PhD (bob@businessoverbroadway.com, @bobehayes)
  2. Who I am: Scientist / Blogger, Author, Consultant
     Copyright © 2015 Business Over Broadway and AnalyticsWeek
  3. What I do
     Business Over Broadway (Founder): Solve problems, primarily business problems, through the use of the scientific method, using data and analytics to help make decisions that are based on fact, not hyperbole.
     AnalyticsWeek (Chief Research Officer): Help businesses optimize their data/analytics by improving talent/technology recruitment, facilitating deeper community engagement through online and offline channels, and advancing knowledge through research.
  4. What is the role of a data scientist?
     Quotes from the article "What is the role of a data scientist in the insight economy?"
  5. Data Science Study
     • Invited data professionals via:
       – AnalyticsWeek Newsletter
       – Blog post
       – Analytics.Club
     • 600+ completed surveys
       – Self-assessment rating of proficiency in 25 skills across five skill areas: Business, Technology, Programming, Math & Modeling, Statistics
       – 9 additional questions
       – Overall satisfaction with work outcome
  6. Data Science Skills Assessed (area: skills*)
     Business: (1) Product design and development; (2) Project management; (3) Business development; (4) Budgeting; (5) Governance & Compliance (e.g., security)
     Technology: (6) Managing unstructured data (e.g., NoSQL); (7) Managing structured data (e.g., SQL, JSON, XML); (8) Natural Language Processing (NLP) and text mining; (9) Machine Learning (e.g., decision trees, neural nets, support vector machines, clustering); (10) Big and Distributed Data (e.g., Hadoop, MapReduce, Spark)
     Math & Modeling: (11) Optimization (e.g., linear, integer, convex, global); (12) Math (e.g., linear algebra, real analysis, calculus); (13) Graphical Models (e.g., social networks); (14) Algorithms (e.g., computational complexity, computer science theory) and Simulations (e.g., discrete, agent-based, continuous); (15) Bayesian Statistics (e.g., Markov chain Monte Carlo)
     Programming: (16) Systems Administration (e.g., UNIX) and Design; (17) Database Administration (e.g., MySQL, NoSQL); (18) Cloud Management; (19) Back-End Programming (e.g., Java/Rails/Objective-C); (20) Front-End Programming (e.g., JavaScript, HTML, CSS)
     Statistics: (21) Data Management (e.g., recoding, de-duplicating, integrating disparate data sources, web scraping); (22) Data Mining (e.g., R, Python, SPSS, SAS) and Visualization (e.g., graphics, mapping, web-based data visualization) tools; (23) Statistics and statistical modeling (e.g., general linear model, ANOVA, MANOVA, spatio-temporal, Geographical Information System (GIS)); (24) Science/Scientific Method (e.g., experimental design, research design); (25) Communication (e.g., sharing results, writing/publishing, presentations, blogging)
     * List of skills adapted from Analyzing the Analyzers by Harlan D. Harris, Sean Patrick Murphy and Marck Vaisman.
  7. Proficiency Ratings*
     Respondent instructions: You will be asked about your proficiency for a variety of skills. Please use the following scale when indicating your level of proficiency for each skill.
     Proficiency level (scale value): description
     Don't know (0): You possess no knowledge.
     Fundamental Awareness (20): You have common knowledge or an understanding of basic techniques and concepts.
     Novice (40): You have the level of experience gained in a classroom and/or experimental scenarios or as a trainee on the job. You are expected to need help when performing this skill.
     Intermediate (60): You are able to successfully complete tasks in this competency as requested. Help from an expert may be required from time to time, but you can usually perform the skill independently.
     Advanced (80): You can perform the actions associated with this skill without assistance. You are certainly recognized within your immediate organization as "a person to ask" when difficult questions arise regarding this skill.
     Expert (100): You are known as an expert in this area. You can provide guidance, troubleshoot and answer questions related to this area of expertise and the field where the skill is used.
     * Rating scale is based on a proficiency rating scale used by the NIH.
  8. Sample
  9. Proficiency varies across skills
  10. Job Roles in Data Science
      *Researcher (e.g., researcher, scientist, statistician); Business Management (e.g., leader, business person, entrepreneur); Creative (e.g., jack of all trades, artist, hacker); Developer (e.g., developer, engineer)
  11. Proficiency varies by job role – 1st Look
  12. Proficiency varies by job role – 2nd Look
  13. Structure of Data Science Skills
      Factor Analysis of Data Skills
      • Data reduction technique
      • Examines the statistical relationships (e.g., correlations) among a large set of variables and tries to explain these correlations using a smaller number of variables (factors)
      • Elements of the factor pattern matrix (the factor loadings) represent the strength of the relationship between each variable and each of the underlying factors
      • Tells us two things:
        1. the number of underlying factors that describe the initial set of variables
        2. which variables are best represented by each factor
      * Factor analysis is based on proficiency ratings of 621 data professionals. Reliability (Cronbach’s alpha) for each of the three skill areas, based on the items that loaded on the respective factors, was .87 (Business), .92 (Tech / Prog) and .92 (Math / Stats).
  14. Structure of Data Science Skills
      Plot of the factor loadings for the 25 data skills in a three-dimensional space
      Three Distinct Skill Sets
      • Business
      • Technology / Programming
      • Math / Statistics
  15.
  16. The Structure of Data Science Skills
  17. Proficiency varies by job role – 3rd Look
  18. Proficiency varies by job role – 4th Look
      *Researcher (e.g., researcher, scientist, statistician) n = 133; Business Management (e.g., leader, business person, entrepreneur) n = 86; Creative (e.g., jack of all trades, artist, hacker) n = 30; Developer (e.g., developer, engineer) n = 54
  19. Proficiency varies by job role – 4th Look
  20. Top Data Science Skills by Job Role
  21. Top Data Science Skills by Job Role
  22. Satisfaction with Work Outcome
  23. In Search of the Data Scientist Unicorn
  24. Data Science as a Team Sport: Impact of Business Expert
  25. Data Science as a Team Sport: Impact of Technology / Programming Expert
  26. Data Science as a Team Sport: Impact of Math & Modeling / Statistics Expert
  27. Getting Insight from Data: The Scientific Method
  28. Scientific Method and Data Science Skills
  29. Skill Proficiency Drives Satisfaction with Work Outcome
  30. What skills are linked to project success?
  31. Importance of Data Science Skills by Job Role
  32. Impact of Data Mining and Viz Tools
  33. Skill-Based Approach to Improve the Practice of Data Science
      [Figure: 2×2 grid crossing proficiency (no proficiency to expert proficiency) with whether a skill is essential or not essential to project outcome. Quadrant labels: Improve / Invest; Divest; Don’t Overinvest; Stay the Course.]
  34. Skill-Based Method to Improve Data Science: Business Management
      [Figure: skills plotted by proficiency (0 to 100, from none to expert) against a scale running from -.10 to .60, divided into "not essential" and "essential" regions. Labeled skills: Data Mining and Viz Tools; Big and distributed data; Science / Scientific Method; Statistics / Statistical Modeling; Machine Learning; Bayesian Statistics; Unstructured data; Optimization; Algorithms.]
  35. Skill-Based Method to Improve Data Science: Developer
      [Figure: skills plotted by proficiency (from none to expert), divided into "not essential" and "essential" regions. Labeled skills: Systems Administration; Data Mining and Viz Tools.]
  36. Skill-Based Method to Improve Data Science: Creative
      [Figure: skills plotted by proficiency (from none to expert), divided into "not essential" and "essential" regions. Labeled skills: Business Development; Optimization; Math; Data Mining and Viz Tools; Graphical Models.]
  37. Skill-Based Method to Improve Data Science: Researcher
      [Figure: impact on satisfaction with outcome plotted against proficiency (from none to expert), divided into "not essential" and "essential" regions. Labeled skills: Big and distributed data; Bayesian Statistics; Data Management; Machine Learning; Algorithms; Product Design.]
  38. Lack of Gender Diversity
  39. Lack of Gender Diversity – Other Science Roles
  40. Job Roles in Data Science by Gender
  41. Highest Level of Education Attained
  42. Gender Comparison of Proficiency across Skills
  43. Advice for Data Scientists
      • Take the Data Science Skills Survey at: http://pxl.me/awrds3
      • Be specific when talking about “data scientists”
        – There are different types, defined by what they do and the skills they possess
      • Work with other data professionals who have complementary skills. Teamwork is key to success.
      • Learn to use data mining and visualization tools
        – R, Python, SPSS, SAS, graphics, mapping, web-based data visualization
      • Be an advocate for women in the field of data science
  44. Bob E. Hayes, Ph.D.
      Email: bob@analyticsweek.com and bob@businessoverbroadway.com
      Web: www.analyticsweek.com and www.businessoverbroadway.com
      Blog: www.businessoverbroadway.com/blog
      Twitter: www.twitter.com/bobehayes
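
Slide 7 (Proficiency Ratings) maps each proficiency label to a value from 0 to 100, and slide 6 numbers the 25 skills within five areas. The sketch below shows one way to turn raw survey answers into those numbers, assuming that a skill-area score is simply the mean of a respondent's ratings in that area; the deck does not state how area scores were computed. `area_scores` and the example answers are hypothetical.

```python
from statistics import mean

# Label-to-value mapping from the Proficiency Ratings slide.
PROFICIENCY_SCALE = {
    "Don't know": 0,
    "Fundamental Awareness": 20,
    "Novice": 40,
    "Intermediate": 60,
    "Advanced": 80,
    "Expert": 100,
}

# Skill numbers 1-25 grouped by area, as numbered on the skills slide.
SKILL_AREAS = {
    "Business": range(1, 6),
    "Technology": range(6, 11),
    "Math & Modeling": range(11, 16),
    "Programming": range(16, 21),
    "Statistics": range(21, 26),
}


def area_scores(responses: dict[int, str]) -> dict[str, float]:
    """Hypothetical helper: mean 0-100 rating per skill area for one respondent.

    `responses` maps a skill number (1-25) to a proficiency label.
    Unanswered skills are skipped and areas with no answers are omitted
    (both are assumptions, not rules stated in the study).
    """
    scores = {}
    for area, skills in SKILL_AREAS.items():
        values = [PROFICIENCY_SCALE[responses[s]] for s in skills if s in responses]
        if values:
            scores[area] = mean(values)
    return scores


answers = {1: "Novice", 2: "Advanced", 6: "Expert", 7: "Intermediate"}
print(area_scores(answers))  # {'Business': 60, 'Technology': 80}
```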
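
Slide 13 lists the two things factor analysis tells us: how many factors underlie a set of variables, and which variables each factor best represents. The deck does not say which extraction or rotation method was used, so the sketch below swaps in a simple principal-component approximation. It keeps the factors whose correlation-matrix eigenvalues exceed 1 (the Kaiser criterion) and assigns each item to the factor with its largest absolute loading. The ratings are synthetic with three built-in factors, `count_factors` and `loadings` are hypothetical helpers, and a real analysis would normally add a rotation such as varimax.

```python
import numpy as np


def count_factors(data: np.ndarray) -> int:
    """Kaiser criterion: count eigenvalues of the item correlation matrix above 1."""
    corr = np.corrcoef(data, rowvar=False)  # columns are items
    return int(np.sum(np.linalg.eigvalsh(corr) > 1.0))


def loadings(data: np.ndarray, n_factors: int) -> np.ndarray:
    """Unrotated principal-component loadings, shape (items, factors)."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending eigenvalues
    top = np.argsort(eigenvalues)[::-1][:n_factors]   # largest first
    return eigenvectors[:, top] * np.sqrt(eigenvalues[top])


# Synthetic data: 600 respondents, 9 standardized items, 3 latent factors.
# Items 0-2, 3-5 and 6-8 load on factors 0, 1 and 2 (illustrative strengths).
rng = np.random.default_rng(0)
n_respondents = 600
latent = rng.standard_normal((n_respondents, 3))
factor_of_item = np.repeat([0, 1, 2], 3)
strength = np.repeat([0.9, 0.75, 0.6], 3)
noise = rng.standard_normal((n_respondents, 9))
data = latent[:, factor_of_item] * strength + noise * np.sqrt(1 - strength**2)

k = count_factors(data)
best_factor = np.abs(loadings(data, k)).argmax(axis=1)
print(k)            # number of factors retained
print(best_factor)  # factor that best represents each item
```

With these settings the sketch retains three factors and groups items 0-2, 3-5 and 6-8 together; the numbering of the factors themselves is arbitrary.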
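
The footnote on slide 13 reports reliability as Cronbach's alpha (.87 for Business, .92 for Tech / Prog and .92 for Math / Stats). Alpha compares the variances of the individual items with the variance of their sum: α = k / (k − 1) × (1 − Σσᵢ² / σ_X²), where k is the number of items, σᵢ² is the variance of item i and σ_X² is the variance of each respondent's total score. Below is a minimal standard-library sketch; `cronbach_alpha` and the toy ratings are illustrative only, not the study's data or code.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; `items` holds one list of scores per item, same respondents in each."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variances / variance(totals))


# Two toy items rated by four respondents.
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 2))  # 0.75
```

Because the same variance estimator appears in the numerator and the denominator, using sample or population variance gives the same alpha.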
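
Slides 29 and 30 link skill proficiency to satisfaction with work outcome, and the later skill-based-method slides plot an "impact on satisfaction with outcome" on a scale running from -.10 to .60. The deck does not define that measure. Assuming it is the Pearson correlation between respondents' proficiency in a skill and their satisfaction rating, ranking skills by impact could look like the sketch below. The ratings are made up, and `skill_impact` is a hypothetical helper.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def skill_impact(proficiency: dict[str, list[float]],
                 satisfaction: list[float]) -> list[tuple[str, float]]:
    """Rank skills by the correlation of proficiency with satisfaction, highest first."""
    impacts = [(skill, pearson_r(ratings, satisfaction))
               for skill, ratings in proficiency.items()]
    return sorted(impacts, key=lambda pair: pair[1], reverse=True)


# Made-up ratings for five respondents (proficiency on the 0-100 scale).
proficiency = {
    "Budgeting": [60, 20, 80, 40, 60],
    "Data Mining and Viz Tools": [20, 40, 60, 80, 100],
}
satisfaction = [2, 3, 3, 4, 5]
print([(skill, round(r, 2)) for skill, r in skill_impact(proficiency, satisfaction)])
# [('Data Mining and Viz Tools', 0.97), ('Budgeting', -0.04)]
```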
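
Slides 33 to 37 sort skills by proficiency (none to expert) and by whether they are essential to project outcome, with four actions: Improve / Invest, Divest, Don't Overinvest and Stay the Course. The flattened slide text does not show which label sits in which quadrant, so the mapping below is an assumption: essential with low proficiency means improve / invest, essential with high proficiency means stay the course, not essential with high proficiency means don't overinvest, and not essential with low proficiency means divest. The cutoffs (50 on the proficiency scale, 0.20 on the impact scale) are also hypothetical, as is the `quadrant` helper.

```python
def quadrant(proficiency: float, impact: float,
             proficiency_cutoff: float = 50.0, impact_cutoff: float = 0.20) -> str:
    """Assign a skill to an action quadrant (hypothetical helper).

    proficiency: mean rating on the 0 (none) to 100 (expert) scale.
    impact: the skill's association with satisfaction with work outcome.
    The cutoffs and the label-to-quadrant mapping are assumptions.
    """
    essential = impact >= impact_cutoff
    proficient = proficiency >= proficiency_cutoff
    if essential:
        return "Stay the Course" if proficient else "Improve / Invest"
    return "Don't Overinvest" if proficient else "Divest"


# Made-up (proficiency, impact) pairs for three skills.
skills = {
    "Data Mining and Viz Tools": (70, 0.45),
    "Bayesian Statistics": (30, 0.35),
    "Budgeting": (25, 0.05),
}
for name, (prof, imp) in skills.items():
    print(f"{name}: {quadrant(prof, imp)}")
```

Keeping the cutoffs as parameters separates the quadrant logic from the choice of where "essential" and "proficient" begin, which the deck does not specify.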
