
KDD 2019 IADSS Workshop - Research Updates from Usama Fayyad & Hamit Hamutcu

  1. 1. Licensed to Analyze? Who Can Claim to be a Data Scientist? Defining Roles, Standards and Assessing Skills in Data Science Initiative for Analytics and Data Science Standards (IADSS) August 5th | 2019 www.iadss.org Usama Fayyad | IADSS co-founder, Open Insights - Chairman & CEO Hamit Hamutcu | IADSS co-founder Analytics Center & Smartcon, Co-founder & CEO
  2. 2. www.iadss.org Number of analytics professionals is increasing at a high rate. The number of Kaggle members gives insight into the rapid increase in the number of analytics professionals. [Chart: Kaggle users by year, axis labelled 2009-2018; plotted values 4,466; 24,313; 70,980; 137,873; 240,933; 437,442; 589,552; 1,400,000; 2,500,000; CAGR 120% (1.55 M logged-in)] Source: http://blog.kaggle.com/2019/01/18/reviewing-2018-and-previewing-2019/
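The growth rate quoted on slide 2 can be checked with a few lines of arithmetic. The sketch below is not from the deck; it assumes the nine member counts are one year apart, so they span eight annual periods (the slide's axis shows ten year labels, so the exact year mapping is uncertain). Under that assumption the result lands close to the 120% on the slide.

```python
# Kaggle member counts in the order they appear on slide 2.
kaggle_users = [
    4_466, 24_313, 70_980, 137_873, 240_933,
    437_442, 589_552, 1_400_000, 2_500_000,
]


def cagr(first: float, last: float, periods: int) -> float:
    """Compound annual growth rate as a fraction (1.0 means +100% per year)."""
    return (last / first) ** (1 / periods) - 1


# Assumption: consecutive values are one year apart,
# so nine values cover eight growth periods.
periods = len(kaggle_users) - 1
growth = cagr(kaggle_users[0], kaggle_users[-1], periods)
print(f"CAGR over {periods} periods: {growth:.1%}")
```

If the values instead spanned nine periods, the same formula would give a noticeably lower rate, which is why the period count is called out as an assumption.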
  3. 3. www.iadss.org Do you believe a pilot needs a license? Special training? Certification? Who would you trust to pilot a plane you are riding on? Would you rather ride with him? Or with them at the controls?
  4. 4. www.iadss.org How about a surgeon performing surgery on you? How do you know he or she is qualified to operate? Bad things happen when we cannot define the necessary skills and knowledge…
  5. 5. www.iadss.org Who is analyzing your data? Are they qualified? Are you extracting the right value from your data assets? What happens when the wrong outcomes are provided? Do you think bad things can happen when people who are not qualified are analyzing your data? In the new world, failure to use data properly likely means the failure of your business…
  6. 6. www.iadss.org A quick keyword search on LinkedIn shows a large number of professionals defining themselves in analytics-related spaces. Estimated audience according to LinkedIn: 12,000,000+ global LinkedIn members (capability targeting); 1,600,000+ global LinkedIn members (title targeting).
  7. 7. www.iadss.org ... and a multitude of groups with growing memberships. Top 100 LinkedIn groups' member base: 2,300,000. Group name and members: Big Data and Analytics 322,677; Data Science Central 270,944; Big Data, Analytics, Business Intelligence & Visualization Experts Community 223,728; Big Data | Analytics | Strategy | Finance | Innovation 221,666; Business Intelligence Professionals (BI, Big Data, Analytics, IoT) 209,659; Business Analytics, Big Data, and Artificial Intelligence 157,740; Data Mining, Statistics, Big Data, Data Visualization, and Data Science 155,737; Python Community 129,044; Microsoft Business Intelligence 123,292; Change Consulting | Digital Transformation Data Analytics Security 100,398; Big Data 86,404; Machine Learning and Data Science 75,363; Hadoop Users 70,821; Business Analyst forum [BA forum] 69,802; Big Data, Analytics, Hadoop, NoSQL & Cloud Computing 69,761; TDWI: Analytics and Data Management Discussion Group 68,326; Analytics and Artificial Intelligence (AI) in Marketing and Retail 66,896; Python Professionals 59,743; Data Warehouse - Big Data - Hadoop - Cloud - Data Science - ETL 59,711; Data Scientists 53,295; Big Data & Hadoop Professionals 51,887; Data Warehousing (Business Intelligence, ETL) Professional's Group 51,331; Business Intelligence 47,370
  8. 8. www.iadss.org How many Data Scientists are there in the world? What do you think? “There are between 1.5-3 million data scientists in the world” (2016) – Anthony Goldbloom, Co-founder & CEO @Kaggle • 200K - 700K new grads join the job market annually • The number of jobs for all US data professionals will increase to 2,720,000 openings by 2020 [IBM]. • Annual demand for the fast-growing new roles of data scientists, developers, and engineers in the US will reach nearly 700,000 openings. Really? Sources: https://www.huffpost.com/entry/where-will-data-science-b_b_12375864 https://www.forbes.com/sites/louiscolumbus/2017/05/13/ibm-predicts-demand-for-data-scientists-will-soar-28-by-2020/ https://www.pwc.com/us/en/library/data-science-and-analytics.html
  9. 9. www.iadss.org Despite the increasingly large numbers, there is still a data science skills shortage in the US, which was not unexpected. In 2011, McKinsey forecast that the US could face a shortage of 150-190K people with deep analytical skills by 2018, as verified by the LinkedIn 2018 Workforce Report: there is a shortage of 151K people with “data science skills”. [Figure: Top Emerging Jobs (2012-2017)] Sources: https://expandedramblings.com/index.php/linkedin-job-statistics/ https://economicgraph.linkedin.com/research/LinkedIns-2017-US-Emerging-Jobs-Report https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation https://economicgraph.linkedin.com/resources/linkedin-workforce-report-august-2018
  10. 10. www.iadss.org The PwC “Data Science & Analytics” report does not mention a shortage, but rather a lack of skills in the existing workforce. An alternative view of domain knowledge for DS roles, offered by PwC: https://www.pwc.com/us/en/library/data-science-and-analytics.html
  11. 11. www.iadss.org A similar shortage also exists in the Artificial Intelligence & Deep Learning domains. According to a new report by Tencent: there are about 300,000 “AI practitioners and researchers” worldwide, but millions of roles available for people with these AI qualifications. Sources: https://jfgagne.ai/talent/ https://www.oreilly.com/data/free/how-companies-are-putting-aI-to-work-through-deep-learning.csp
  12. 12. www.iadss.org In addition to the shortage, we see different backgrounds & skills for data scientists even at the same company
  13. 13. www.iadss.org It gets even more complicated when you look across different companies and sectors
  14. 14. www.iadss.org In actual job postings, you see a wide variety of role definitions and expectations for the same job title
  15. 15. www.iadss.org There are over 250 programs in the US that offer graduate degrees in Analytics or Data Science https://analytics.ncsu.edu/?page_id=4184
  16. 16. www.iadss.org Despite having similar names and objectives, the course offerings and approach of these programs vary widely. Brigham Young University, Data Science (http://www.byui.edu/catalog/#/programs/41PwqJ9RZ): CIT111 - Introduction to Databases; CS101 - Introduction to Programming; CS241 - Survey Object-Oriented Programming/Data Struct.; CS450 - Machine Learning and Data Mining; MATH325 - Intermediate Statistics; MATH425 - Applied Linear Regression; MATH488 - Statistical Consulting; plus CS335 - Data Wrangling, Exploration, and Visualization; MATH221A - Business Statistics; Electives + Project + Internship. Winona State University, Data Science (https://catalog.winona.edu/preview_program.php?catoid=21&poid=4333): DSCI 210 - Data Science; DSCI 310 - Data Summary and Visualization; DSCI 325 - Management of Structured Data; STAT 210 - Statistics; STAT 310 - Intermediate Statistics; STAT 360 - Regression Analysis; plus MATH 140 - Applied Calculus; CS 234-250 - Algorithms and Problem-Solving I-II; CS 385 - Applied Database Management Systems; DSCI 395-495 - Professional Skill Development & Communication; Electives + Project OR Internship
  17. 17. www.iadss.org … which also holds for graduate programs: Carnegie Mellon, Master of Computational Data Science; Columbia University, Master of Science in Data Science
  18. 18. www.iadss.org … as well as online courses. Udacity | Intro to Data Science (udacity.com/course/intro-to-data-science--ud359), course content: LESSON 1 Introduction to Data Science • Pi-Chaun (Data Scientist @ Google): What is Data Science? • Gabor (Data Scientist @ Twitter): What is Data Science? • Problems solved by data science. LESSON 2 Data Wrangling • What is Data Wrangling? • Acquiring data. • Common data formats. LESSON 3 Data Analysis • Statistical rigor. • Kurt (Data Scientist @ Twitter) - Why is Stats Useful? • Introduction to normal distribution. LESSON 4 Data Visualization • Effective information visualization. • An analysis of Napoleon's invasion of Russia! • Don (Principal Data Scientist @ AT&T): Communicating Findings. LESSON 5 MapReduce • Introduction to Big Data and MapReduce. • Learn the basics of MapReduce. • Mapper. Coursera | Intro to Data Science (coursera.org/specializations/data-analysis), course content: LESSON 1 Data Management and Visualization • Managing Data • Visualizing Data LESSON 2 Data Analysis Tools • Hypothesis Testing and ANOVA • Chi Square Test of Independence • Pearson Correlation • Exploring Statistical Interactions LESSON 3 Regression Modeling in Practice • Basics of Linear Regression • Multiple Regression • Logistic Regression LESSON 4 Machine Learning for Data Analysis • Decision Trees • Random Forests • Lasso Regression • K-Means Cluster Analysis
  19. 19. www.iadss.org The same confusion also exists in the recruitment process. “Most interviewers had me write pseudocode, in something like Python. Most asked me some product-specific questions, such as, "How would you use data to improve X feature on our website?" Some interviewers asked me to write SQL, in addition to or instead of pseudocode. Another question I was often asked was how to set up some kind of experiment, such as, "How would we design an experiment to see whether our new homepage is better?" or "How can we use data to improve search results?". One or two interviewers asked me algorithms questions (quicksort, etc) but not in very much depth. Beyond that, there was little in common. The formats varied a lot. Some interviews were all-day affairs - back-to-back meetings with programmers all day - and others were just a quick meeting with a CTO. Some interviews had me filling whiteboards with code, while others just consisted of a face-to-face conversation. A few of the interviews involved some sort of social/culture component, ranging from formal interviews with non-technical people to happy hours.” – A job searcher on Quora
  20. 20. www.iadss.org There is a real and significant cost to employers in recruitment and in matching the right person to the right job. Average cost of recruitment (fees only, internal costs not included): $35,000. ❑ According to IBM, Data Science and Analytics jobs remain open an average of 45 days, 5 days longer than the average for all positions. ❑ More than 2.7 million data science and analytics job openings by 2020 ❑ Hiring a data scientist involves multiple rounds of interviews, often carried out by already scarce existing talent within an organization ❑ Average turnover ratio is higher for growing roles. New hire rate (3 months): Data Scientist 10-11%; ML Eng/Spec 12%; Data Analysts 6%; Statisticians 3-4%; Software Eng 4%; Sales Rep 4%; Accountant 2%. Hundreds of millions of USD wasted in an increasingly inefficient recruitment process. Sources: https://www.forbes.com/sites/louiscolumbus/2017/05/13/ibm-predicts-demand-for-data-scientists-will-soar-28-by-2020/ https://www.shrm.org/about-shrm/press-room/press-releases/pages/human-capital-benchmarking-report.aspx https://www.burtchworks.com/2019/03/11/how-long-do-data-scientists-analytics-pros-stay-at-their-jobs/
  21. 21. www.iadss.org Setting industry standards would support the healthy growth of the analytics market. Background: ▪ As the role of data and analytics expands very rapidly in creating new business models or changing existing ones, demand for analytics professionals is growing at an increasing rate ▪ Every company has a unique way of defining roles related to data analytics and big data technology. Challenges: ▪ Wide variety of role definitions, expected hard/soft skills, and experience/career development plans ▪ Lack of standards creates inefficiencies and difficulties for companies in position matching, leveraging analytics skills effectively and retaining talent ▪ It also makes it hard for professionals to understand what a position requires and to follow a career path. Need: ▪ A framework to understand the analytics profession landscape, how companies structure their analytics teams, most common job titles, roles and corresponding skill-set requirements
  22. 22. www.iadss.org IADSS aims to support the data analytics ecosystem by defining professional standards and suggesting ways to measure and assess relevant knowledge and skills. [Diagram: Job Titles, Roles; Knowledge and Skills Requirements; Assessment and Measurement; Industry Standards]
  23. 23. www.iadss.org … with involvement of all related parties • Research initiative will focus on existing and emerging data analytics related roles within organizations, job requirements for such roles and corresponding skill-sets. • IADSS will then analyze and group findings to create a standardized list of data analytics roles along with career paths and profiles/skills of professionals to fulfill the requirements of these roles • IADSS will rely on the expertise of academicians, industry experts and professionals to ensure defined standards are academically sound and rooted in industry realities • This will be an ongoing initiative, continuously updated according to industry dynamics and emerging topics in data science • To create awareness, IADSS will work towards promoting standards globally
  24. 24. www.iadss.org Insights are generated via interviews, surveys and other sources, with participation of key profiles from industry & academia: 1-to-1 interviews; survey research; 3rd party research and social media analysis; conferences, meet-ups & workshops ● Under the guidance of the IADSS Advisory Board & with the support of Community Partners ● IADSS aims to engage with the analytics community through conferences and events (KDD2018, ICDM-2018, ODSC-2019, Metis Demystifying DS 07-2019)
  25. 25. www.iadss.org Insight from our 1-1 interviews with data executives “We've seen folks create a bunch of beautiful dashboards and cost of tools has gone down precipitously in the last 20 years but that doesn't mean that you know what you’re looking at or ensure it won’t be misused and misrepresented. Same thing on the data science front. The most important thing is not being able to use an algorithm that you picked off a tool but to know how and why you're using it.” “I just hired a data scientist and we started with about 60 applicants. The role was fairly well described and as a result I immediately eliminated 40 without a screening call… So I got down to about 20 for screening calls… down to 5 interviews onsite. At the end none of them were acceptable except for one. So from 60 to 1, this is a huge effort. After the screening, the ones that actually made it to interviews almost all of them failed on the math questions. There are many of them strong in engineering but the math is rare.”
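The hiring funnel described in the second quote on slide 25 (about 60 applicants, 20 screening calls, 5 onsite interviews, 1 acceptable candidate) can be turned into stage-by-stage pass rates. This is only an illustrative sketch: the counts are the approximate figures from the quote, and the stage labels and function name are made up here.

```python
# Approximate counts taken from the interview quote on slide 25.
funnel = [
    ("applicants", 60),
    ("screening calls", 20),
    ("onsite interviews", 5),
    ("acceptable candidates", 1),
]


def stage_rates(stages):
    """Return (from_stage, to_stage, pass_rate) for each consecutive pair."""
    return [
        (prev_name, next_name, next_count / prev_count)
        for (prev_name, prev_count), (next_name, next_count)
        in zip(stages, stages[1:])
    ]


rates = stage_rates(funnel)
overall = funnel[-1][1] / funnel[0][1]

for prev_name, next_name, rate in rates:
    print(f"{prev_name} -> {next_name}: {rate:.0%}")
print(f"overall yield: {overall:.1%}")
```

Read this way, the quote describes a process in which fewer than one applicant in fifty ends up being acceptable, which is the effort the interviewee is pointing at.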
  26. 26. www.iadss.org Insight from our 1-1 interviews with data executives “You want to curb attrition and that ends up affecting your decisions on recognizing and promoting people between levels which might be inconsistent with actual skill sets and how they're progressing in their roles. But I don't see a solution for either because the market is so hot and they're getting bombarded with job offers. And that leads to a lot of frustration and cultural impact on the organization” “I take courses and certifications on platforms like Coursera as an expression of interest rather than expertise. It shows commitment to lifelong learning and which I think is really important for the Data Science community and participants.” “I've been on a panel where the panelist next to me who took a statistics course some time when she was at university and doesn't think math is important for Data Science.”
  27. 27. www.iadss.org IADSS aims for the development & adoption of standards in the industry • Publishing a report with proposed standards and an assessment/measurement framework • Creating awareness and engagement within the Analytics Community • Driving adoption of standardized skill-sets and roles by the industry through tools for employers, driving awareness, and education • Collaboration with academia to provide input into program and curriculum development • Updating standards regularly to keep up with changes in technology and the data analytics field • Developing methodologies and tools to support organizations and professionals to create a more efficient job market and more effective career development. Long-Term Objectives: Ubiquity of standardized roles, definitions, and skill requirements for employers, educators, and practitioners.
  28. 28. www.iadss.org Some Concluding Thoughts There is a real problem in the industry – We will lose trust and credibility if we do not define what a Data Scientist is and what to expect from the role… • We need to come together as a community and think this through – standards can help a lot • Everyone is confused… o Employers don’t know who to hire and who is qualified o Educators don’t know how to train properly for the real roles out there o Candidates are confused and don’t know what is expected of them o Bad experiences lead to the steady decline of what will become the essential job of the 21st century • Much effort and money are wasted filling ill-defined roles with unqualified people • IADSS was created as an industry-wide initiative to address the confusion and de-mystify the role of a Data Scientist, Analyst, and Data Engineers/Professionals • Join us – take the survey, volunteer, participate, contribute in shaping this new field…
  29. 29. www.iadss.org Some Questions to Answer in this Workshop • Sharing early results from surveys • How do we define Data Science? • How many types of jobs do we end up outlining? o Data Scientist (different types) o Data Analyst o Business Analyst o Data Engineer o ML Scientist o ML Engineer o BI Professional • How do we help create assessments? • How do we set industry standards that are accepted by the majority?
  30. 30. www.iadss.org Some results from survey by Hamit Hamutcu
  31. 31. www.iadss.org Research for Standards on Definitions of Analytics Roles, Skill-sets and Career. The survey looks into: insights about analytics/data science team(s); training, development & hiring; job titles; expected knowledge. Job Titles: • Analytics Director • Analytics Manager • BI Analyst / Specialist • BI Director • Big Data Engineer • Chief Data/Analytics Officer • Data Analyst • Data Architect • Data Engineer • Data Miner • Data Modeler • Data Science Director • Data Scientist • Machine Learning Engineer • Machine Learning Scientist/Expert/Specialist • Scientist / Researcher. Expected knowledge: • Education • Data mining basics • Science skills • Engineering skills • Business / soft skills • Leadership related skills • Business domain skills • Tool skills. Join -> bit.ly/IADSSsurvey
  32. 32. www.iadss.org Research participants come from a wide spectrum of industries and geographies • More than 700 survey responses collected so far from professionals and data science/analytics/BI executives. • We received insight from hundreds of organizations globally.
  33. 33. www.iadss.org Survey participant profile
  34. 34. www.iadss.org Structure and function of analytics teams
  35. 35. www.iadss.org Structure and function of analytics teams
  36. 36. www.iadss.org Training and management practices
  37. 37. www.iadss.org Organizations have a multitude of titles in data science and analytics teams
  38. 38. www.iadss.org Quick Quiz Which of these skills would you consider “must-have” for being a Data Scientist? Databases Statistics BI & Advanced Analytics Cloud Computing General Computing Big Data Visualization Data Transformation Optimization Programming Domain Expertise
  39. 39. www.iadss.org Initial insights for “Data Scientist” role. Rating scale: Not Relevant, Nice to Have, Should Have, Must Have. Domains rated: Data mining basics (generating & interpreting basic statistical descriptions, basic visual descriptions of data, data cleaning, transformation, etc.); Science skills * (statistics, optimization, predictive modelling, ML, NLP, etc.); Engineering skills * (SW engineering, big data development and maintenance, DB and DWH development, etc.); Business / soft skills (building a data driven business narrative, cross functional collaboration, etc.); Leadership related skills; Business domain skills; Tool skills * (scripting languages, statistical programming languages, libraries for machine learning, NoSQL, etc.). * Unrelated individual skills decrease the average for the domain
  40. 40. www.iadss.org Science Skills Tool Families Engineering Skills Initial insights for “Data Scientist” role Some domains in detail
  41. 41. www.iadss.org Initial insights for “Data Scientist” role, some domains in detail: Tool Skills, by tool family. Experience on a specific tool 3.14; Scripting languages for data science 3.28; Statistical programming languages 3.33; General purpose programming languages 2.05; Libraries for machine learning / numerical computing 2.95; Developer tools 2.40; SQL, understanding of relational databases 3.14; Enterprise relational DBMS 2.40; NoSQL 2.05; Open-source relational DBMS 2.30; Integration, ETL, Automation, Design tools 1.86; General knowledge on distributed storage/computing frameworks 2.16; Knowledge of specific 1.74; Distributed databases, data warehousing, and storage 1.95; Cluster resource managers, other big data technologies 1.50; Enterprise advanced analytics/data mining software 2.23; Reporting and visualization software 2.42; Office productivity, spreadsheet tools 2.95; Experience with cloud computing platforms 2.28; Cloud-based data warehousing 1.98; Other specific cloud-computing services 1.68; Specific knowledge of *NIX systems, shell scripting 1.90; Virtualization, containerization, etc. 1.95
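Sorting the tool-skill averages from slide 41 makes the spread easier to read. The sketch below simply ranks the values as transcribed from the slide; the rating scale is not stated there, so the numbers are treated as relative scores only, and the helper name is an assumption of this example.

```python
# Average "Data Scientist" tool-skill ratings as transcribed from slide 41.
# One label is truncated in the source ("Knowledge of specific") and is kept as-is.
tool_skill_scores = {
    "Experience on a specific tool": 3.14,
    "Scripting languages for data science": 3.28,
    "Statistical programming languages": 3.33,
    "General purpose programming languages": 2.05,
    "Libraries for machine learning / numerical computing": 2.95,
    "Developer tools": 2.40,
    "SQL, understanding of relational databases": 3.14,
    "Enterprise relational DBMS": 2.40,
    "NoSQL": 2.05,
    "Open-source relational DBMS": 2.30,
    "Integration, ETL, Automation, Design tools": 1.86,
    "General knowledge on distributed storage/computing frameworks": 2.16,
    "Knowledge of specific": 1.74,
    "Distributed databases, data warehousing, and storage": 1.95,
    "Cluster resource managers, other big data technologies": 1.50,
    "Enterprise advanced analytics/data mining software": 2.23,
    "Reporting and visualization software": 2.42,
    "Office productivity, spreadsheet tools": 2.95,
    "Experience with cloud computing platforms": 2.28,
    "Cloud-based data warehousing": 1.98,
    "Other specific cloud-computing services": 1.68,
    "Specific knowledge of *NIX systems, shell scripting": 1.90,
    "Virtualization, containerization, etc.": 1.95,
}


def rank_skills(scores, highest_first=True):
    """Return (skill, score) pairs ordered by score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=highest_first)


highest = rank_skills(tool_skill_scores)[:4]
lowest = rank_skills(tool_skill_scores, highest_first=False)[:3]

for skill, score in highest:
    print(f"high: {score:.2f}  {skill}")
for skill, score in lowest:
    print(f"low:  {score:.2f}  {skill}")
```

On these numbers, statistical and scripting languages and SQL sit at the top, while big data cluster tooling and specific cloud services sit at the bottom.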
  42. 42. www.iadss.org An alternative must-have analysis: automatically extracted prototypical skill-sets from professionals’ responses. Identified prototypes and their keywords/highlights: Database: Enterprise/Open-Source RDBMS, SQL, NoSQL, Integration and ETL Tools. Basic Analysis & Data Prep.: Data Cleaning/Summarization, Generating and Interpreting Basic Visual/Statistical Descriptions from Data. Cloud & Big Data: AWS/Azure/Google Cloud, Hadoop/Spark, Flume/Storm/Flink, HBase/Cassandra, Hive/Parquet, Docker/Kubernetes, *NIX, General Purpose Programming Languages. Statistical Analysis: Statistics, Data Transformation, Statistical Programming Languages, Predictive Modeling. Leadership & Management: Executive/Peer Leadership, Project Management, Task Management. Reporting & Cross-team Work: Insight Generation/Presentation, Cross-functional Collaboration, Office Productivity Tools, Reporting & Visualization Tools. Programming (General): CS Fundamentals, Software Engineering Skills, DB & DWH Development. Subject-Matter Expertise: Understanding and Experience of Applying Analytics in Spec. Domain/Industry, Knowledge of KPIs. ML (Engineering): Scripting Languages for Data Science, ML Libraries, Jupyter/Zeppelin Environments. ML (Theory): Neural Networks/Deep Learning, Predictive Modeling, Optimization, NLP/Time Series/Image & Audio & Signal Processing/Bioinformatics, Research Background. * Top skills of the prototypical skill-sets extracted with Latent Dirichlet Allocation on self-declared “must-have” skills
  43. 43. www.iadss.org Average role profiles as compositions of prototypical skill-sets. [Chart: profiles for Data Scientist, Data Analyst, Data Engineer and BI Analyst across DB; Basic Analysis & Data Prep.; Cloud & Big Data; Statistical Analysis; Leadership & Management; Reporting & Cross-team Work; Programming (General); Subject-Matter Expertise; ML (Engineering); ML (Theory)]
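One plausible way to read "average role profiles as compositions of prototypical skill-sets" on slide 43 is that each respondent gets a mixture of weights over the prototypes and the mixtures are averaged per job title. The sketch below shows only that averaging step. The respondent weights are invented for illustration, only three of the ten prototypes are used, and the deck does not confirm that this is how the chart was produced.

```python
from collections import defaultdict

# Three of the ten prototype names from slide 42, used here for brevity.
PROTOTYPES = ["Database", "Statistical Analysis", "ML (Engineering)"]

# Hypothetical respondents: (job title, weights over PROTOTYPES summing to 1).
# These numbers are made up and are not survey results.
respondents = [
    ("Data Scientist", [0.1, 0.5, 0.4]),
    ("Data Scientist", [0.2, 0.4, 0.4]),
    ("Data Engineer", [0.7, 0.1, 0.2]),
    ("Data Engineer", [0.5, 0.1, 0.4]),
]


def average_profiles(rows):
    """Average the prototype weights of all respondents sharing a job title."""
    grouped = defaultdict(list)
    for title, weights in rows:
        grouped[title].append(weights)
    return {
        title: [sum(column) / len(column) for column in zip(*mixtures)]
        for title, mixtures in grouped.items()
    }


profiles = average_profiles(respondents)
for title, profile in profiles.items():
    parts = ", ".join(f"{name}: {w:.2f}" for name, w in zip(PROTOTYPES, profile))
    print(f"{title} -> {parts}")
```

Because every respondent's weights sum to 1, each averaged profile also sums to 1, so the profiles can be compared across roles as compositions.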
  44. 44. www.iadss.org A deeper look into variance: 3 types of Data Scientists. The composition of the skill-sets varies greatly even across the respondents holding the same job title. [Chart: DS-1, DS-2, DS-3] Estimated 41% of Data Scientists (mostly Stats & ML Engineering); estimated 37% of Data Scientists (mostly Stats & ML Engineering, lower expertise); estimated 22% of Data Scientists
  45. 45. www.iadss.org IADSS Blog – Research Updates & Stories & Analysis & Career Development Check our Blog & Be a Guest Writer or Subscribe IADSS.org/blog
  46. 46. www.iadss.org Please follow and engage with the initiative through our online channels (LinkedIn, Twitter, YouTube, and Website) iadss.org bit.ly/IADSStwitter bit.ly/IADSSlinkedin bit.ly/IADSSyoutube Twitter.com/IADSSglobal YouTube IADSS Take the survey bit.ly/IADSSsurvey
  47. 47. www.iadss.org Thank you! Usama Fayyad | Co-founder, IADSS, Chairman & CEO - Open Insights Hamit Hamutcu | Co-founder, IADSS iadss.org bit.ly/IADSStwitter bit.ly/IADSSlinkedin bit.ly/IADSSyoutube
