Building Data Science Teams
Upcoming SlideShare
Loading in...5
×
 

Building Data Science Teams

on

  • 704 views

This session describes the roles and skill sets required when building a Data Science team, and starting a data science initiative, including how to develop Data Science capabilities, select suitable ...

This session describes the roles and skill sets required when building a Data Science team, and starting a data science initiative, including how to develop Data Science capabilities, select suitable organizational models for Data Science teams, and understand the role of executive engagement for enhancing analytical maturity at an organization.


Objective 1: Understand the knowledge and skills needed for a Data Science team and how to acquire them.
After this session you will be able to:
Objective 2: Learn about the different organizational models for forming a Data Science team and how to choose the best for your organization.
Objective 3: Understand the importance of Executive support for Data Science initiatives and role it plays in their successful deployment.

Statistics

Views

Total Views
704
Views on SlideShare
704
Embed Views
0

Actions

Likes
3
Downloads
23
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Building Data Science Teams Building Data Science Teams Presentation Transcript

  • Building Data Science Teams David Dietrich Advisory Technical Education Consultant EMC Education Services @imdaviddietrich © Copyright 2013 EMC Corporation. All rights reserved. 1
  • Roadmap Information Disclaimer  EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).  Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.  Roadmap information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC NonDisclosure Agreement in place with your organization. © Copyright 2013 EMC Corporation. All rights reserved. 2
  • Do You Need A Data Science Team For This? Creating Reports, Dashboards, and Databases… © Copyright 2013 EMC Corporation. All rights reserved. 3
  • How About For This? Creating a Map of the Internet © Copyright 2013 EMC Corporation. All rights reserved. 4
  • Example: Output From a Data Science Team Mapping The Spread of Innovation Ideas Using Social Graphs © Copyright 2013 EMC Corporation. All rights reserved. 5
  • Big Data Requires New Approaches to Analytics Business Intelligence Versus Data Science Predictive Analytics and Data Mining (Data Science) Typical Techniques and Data Types Common Questions Exploratory Data Science Analytical Approach • • • • • Typical Techniques and Data Types Business Intelligence Past © Copyright 2013 EMC Corporation. All rights reserved. What if…..? What’s the optimal scenario for our business? What will happen next? What if these trends continue? Why is this happening? Business Intelligence TIME • Common Questions Explanatory Optimization, predictive modeling, forecasting, statistical analysis Structured/unstructured data, many types of sources, very large data sets • • • • Standard and ad hoc reporting, dashboards, alerts, queries, details on demand Structured data, traditional sources, manageable data sets What happened last quarter? How many did we sell? Where is the problem? In which situations? Future 6
  • “Companies Are Always Looking To Reinvent Themselves….But It’s A Mistake To Treat Data Science Teams Like Any Old Product Group. To Build Teams That Create Great Data Products, You Have To Find People With The Skills And The Curiosity To Ask The Big Questions.” - DJ Patil, Data Scientist in Residence at Greylock Partners © Copyright 2013 EMC Corporation. All rights reserved. 7
  • Framework for Developing Data Science Teams Data Science Team Data Scientist © Copyright 2013 EMC Corporation. All rights reserved. BI Analyst Project Sponsor Project Manager Business User Data Engineer DBA 8
  • Data Science Teams © Copyright 2013 EMC Corporation. All rights reserved. 9
  • Data Scientist: An Emerging Career SPOTLIGHT ON BIG DATA by Thomas H. Davenport and D.J. Patil © Copyright 2013 EMC Corporation. All rights reserved. 10
  • Comparing Two Data Analysts Traditional BI Analyst Data Scientist ACME Healthcare ACME Healthcare John Janet Sample Tasks Sample Tasks • Report Regional Sales For Last Quarter • Perform Customer Feedback Surveys • Identify Average Cost Per Supplier • Predict Regional Sales For Next Quarter • Discover Customer Opinions Via Social Media • Identify Ways to Maximize Sales Campaign ROI © Copyright 2013 EMC Corporation. All rights reserved. 11
  • Skills Matrix, Based on Recent Students Quantitative Skills Quantitative Analysts, Statisticians, Business and data analysts Recent STEM Grads Data Scientists Business Intelligence Professionals, IT Technical Ability © Copyright 2013 EMC Corporation. All rights reserved. 12
  • Profile of a Data Scientist Quantitative Technical Skeptical © Copyright 2013 EMC Corporation. All rights reserved. Curious & Creative Communicative & Collaborative 13
  • Interpreting the Resume of a Senior Data Scientist Data Scientist Job Description Responsibilities: • Work with business owners to map business requirements into technical solutions • Analyze and extract relevant information from large amounts of data to help identify key revenue-driven features • Perform ad-hoc statistical and data mining analyses • Design and implement scalable and repeatable solutions, and establish scalable, efficient, automated processes for large scale data analyses • Work closely with the software engineering team to drive new feature creation • Design multi-factor experiments and validate hypothesis Qualifications: • A proven passion for generating insights from data, with a strong familiarity with the higher-level trends in data growth, open-source platforms, and public data sets • Experience with statistical languages and packages, including R, S-Plus, SAS and Matlab, and/or Mahout • Experience working with relational databases and/or distributed computing platforms, and their query interfaces, such as SQL, MapReduce, Hadoop, Cassandra, PIG, and Hive • Strong communication skills, with ability to communicate at all levels of the organization • Masters/PhD degree in mathematics, statistics, computer science or a similar quantitative field • Experience in designing and implementing scalable data mining solutions • Preferably experience with additional programming languages, including Python, Java, and C/C++ • Ability to travel as-needed to meet with customers Statistics Programming Data Mining Advanced STEM Degrees © Copyright 2013 EMC Corporation. All rights reserved. Sample Data Scientist Resume John Smith john.smith@email.com Skills R, SAS, Java, data mining, statistics, ontology, bioinformatics, human-computer interaction, research Experience 2009—Present, Senior Data Scientist, ABC Analytics 2007—2009, Founder&CEO, Genome Genome specializes in consumer health information. The main product is InherithHealth, a tool for acquisition of family medical histories that provides familial disease risk assessment. 2005—2007, Knowledge Engineer, ScienceExperts.com Managed technical outsourcing efforts. Developed criterion and evaluated engineering outsourcing agencies and individuals … 2004—2006, Research Scientist, University of Washington Developed rigorous statistical and computational models for addressing primary shortcomings of observational data analysis in the context of disease risk and drug response. 2000—2004, Research Developer, Nat’l Inst. of Standards and Technology Designed and implemented prototypes. Evaluated tools for representing rules of autonomous on-road navigation. Education Ph.D, Biomedical Informatics, University of Washington, 2011 Dissertation: Detection of Protein–protein Interaction in Living Cells by Flow Cytometry BS, Computer Science, University of Texas at Austin, 2004 14
  • Successful Analytic Projects Require Breadth of Roles Business User Project Sponsor Database Administrator (DBA) © Copyright 2013 EMC Corporation. All rights reserved. Project Manager Data Engineer Business Intelligence Analyst Data Scientist 15
  • Framework for Developing Data Science Teams Developing Data Science Capabilities Data Science Team Transforming Data Scientist © Copyright 2013 EMC Corporation. All rights reserved. BI Analyst Creating Project Sponsor As-a-Service Project Manager Business User Crowdsourcing Data Engineer DBA 16
  • Developing Data Science Capabilities © Copyright 2013 EMC Corporation. All rights reserved. 17
  • Four Approaches to Developing Data Science Capabilities Transforming © Copyright 2013 EMC Corporation. All rights reserved. Creating As a Service Crowdsourcing 18
  • Approaches to Developing Data Science Capabilities: Transforming Teams Transforming And Realignment With Minimal Change To The Current Organizational Structure • Industries Requiring Deep Domain Knowledge (Such As Genetics And DNA Sequencing) • Established Companies Who Wish To Introduce Data Science Into Their Business • Companies Who Wish To Enrich The In-house Skill Sets © Copyright 2013 EMC Corporation. All rights reserved. 19
  • Approaches to Developing Data Science Capabilities: Transforming Teams Developing A New Team From Scratch • Start-up Companies • Companies Who Wish To … – Increase Their Focus On Data Analytics – Start New Data Science Projects • Companies Where Data Is The Product • Deep Domain Knowledge Is Less Critical For The Analytics © Copyright 2013 EMC Corporation. All rights reserved. 20
  • Approaches to Developing Data Science Capabilities: Data Science as a Service Engaging Data Science as a Service (DSaaS) • When To Engage DSaaS Providers – Prefer Not To Change Existing Organizational Structure – When Creating Or Transforming Are Not Viable Options • Consider Service-level Agreements (SLAs) When Determining Whether To Engage Internal Resources Or External Providers © Copyright 2013 EMC Corporation. All rights reserved. 21
  • Approaches to Developing Data Science Capabilities: Crowdsourcing Data Science Outsource Data Science Project To Distributed Groups Of People • When To Crowdsource – The Problem Is “Open” In Nature – Willing To Accept Opinions From Distributed And Diverse Groups Of People – There’s A Back-up Plan In Case Of “Crowd Failures” • Examples: Wikipedia, Netflix’s $1,000,000 Prize © Copyright 2013 EMC Corporation. All rights reserved. 22
  • Approaches to Developing Data Science Capabilities: Crowdsourcing Data Science (Cont’d) Outsource Data Science Projects To Distributed Groups Of People • Different Crowdsourcing Models – Wisdom Of Crowds – Swarm Creativity (Collective Intelligence) • Crowdsourcing Platforms – Kaggle.com, Innocentive.com – Amazon Mechanical Turk • Crowd Failures: When The Turnout Of Crowdsourcing Is Unsatisfactory © Copyright 2013 EMC Corporation. All rights reserved. 23
  • Benefits and Drawbacks of the Four Approaches Transforming Creating DSaaS Crowdsourcing • Strong Domain • Control Over Skill- • Able to Scale on • Leverage Wisdom • Risk of homogeneous • Hiring and • Provider May Not • No SLA; value not Knowledge • Knowledge of Business Processes • New Talent Raises Level of Team Performance • Gradually Increases the Quality of Service thinking • May Struggle With Quality of Service • Some Team Members May Resist Change © Copyright 2013 EMC Corporation. All rights reserved. sets • More Flexibility • High Quality of Service Knowledge Transfer Are Timeconsuming • Time Required to Find and Hire Right Team Members Demand • May Get Better Service Levels Than In-house • Learn From Outside Experts Understand Company’s Unique Processes • Difficult to Bring Expertise Back Inhouse • Decreasing Quality of Service Over Time of the Crowds • Diverse Perspectives • Lower Cost • Fast Results guaranteed • Difficult to design the “Open” Problem • Difficult For Domain Intensive Tasks • Crowd Failure May Happen (Adds Cost) 24
  • Framework for Developing Data Science Teams Organizational Model Developing Data Science Capabilities Data Science Team Centralized Transforming Data Scientist © Copyright 2013 EMC Corporation. All rights reserved. BI Analyst Decentralized Creating Project Sponsor Hybrid As-a-Service Project Manager Business User Crowdsourcing Data Engineer DBA 25
  • Organizational Model © Copyright 2013 EMC Corporation. All rights reserved. 26
  • Organizational Models for Data Science Teams Centralized BU BU BU BU DS Team Hybrid DS Team BU The Data Science Team Functions As A Hub And Spoke Model, In Which They Are A Central Provider Of Analytics To Multiple Business Units Decentralized BU BU BU BU Each Business Unit Has Its Own Data Science Capabilities © Copyright 2013 EMC Corporation. All rights reserved. There Is A Centralized Data Science Team, But Business Units Also Have Data Science Capabilities Regardless Of Which Approach, They All Need Executive Sponsorship To Succeed 27
  • Framework for Developing Data Science Teams Executive Engagement Data-driven CEO Organizational Model Developing Data Science Capabilities Data Science Team Centralized Transforming Data Scientist © Copyright 2013 EMC Corporation. All rights reserved. BI Analyst Chief Data Officer Decentralized Creating Project Sponsor Hybrid As-a-Service Project Manager Business User Crowdsourcing Data Engineer DBA 28
  • Executive Engagement © Copyright 2013 EMC Corporation. All rights reserved. 29
  • Analytics Requires Executive Level Engagement “Executive Sponsorship Is So Vital To Analytical Competition…” -- Tom Davenport (Competing on Analytics) Chief Strategy Officer Simulate Outcomes for Acquiring Our Top 3 Competitors CEO Chief Finance Officer Use Time Series Analysis Over Historical Data to Predict KPIs to Project Earnings Chief Product Officer Conduct Social Media Analyses to Identify Customer Opinions Chief Security Officer Collect and Mine Log Data Within and Outside of the Company to Detect Unknown Threats Chief Marketing Officer Conduct Behavior Analyses to Predict If Customers Are Going to Churn Chief Operating Officer Mine Customer Opinions and Competitor Behaviors to Predict Inventory Demands © Copyright 2013 EMC Corporation. All rights reserved. Executive Boardroom 30
  • Executive Engagement: Data-Driven CEO “… If Your Organization Can Arrange It … Have Someone In A Key Operational Role -Business Unit Head, Chief Operations Officer, Even CEO -- To Be An Enthusiastic Advocate Of Matters Quantitative.” -- Tom Davenport (HBR Blog Network) Key Focus Areas of a Data-driven CEO: • Strategic Data Planning • Analytic Understanding • Technology Awareness © Copyright 2013 EMC Corporation. All rights reserved. Procter & Gamble Business Sphere 31
  • Executive Engagement: Chief Data Officer (CDO) “… It's Time For Corporations To Embrace A New Functional Member Of The C-suite: The Chief Data Officer (CDO).” -- Anthony Goldbloom and Merav Bloch, Kaggle • Promote Data-driven Decision Making To Support Company’s Key Initiatives • Ensure The Company Collects The Right Data • Oversee And Drive Analytics Company-wide 25% of organizations will have a Chief Data Officer by 2015. Executive-level Advisor On Data Analytics Executive Boardroom -- Gartner Blog Network © Copyright 2013 EMC Corporation. All rights reserved. 32
  • Two New EMC Data Science Courses for Business Transformation Business Leaders New 90 min Introducing Data Science and Big Data Analytics for Business Transformation Heads of Data Science Teams New 1 day Data Science and Big Data Analytics for Business Transformation Aspiring Data Scientists 5 days Data Science and Big Data Analytics © Copyright 2013 EMC Corporation. All rights reserved. 33
  • Closing Thoughts…. SPOTLIGHT ON BIG DATA Now You Know How To Develop Data Science Teams…What Next? • • • • • Determine How You Would Like To Develop Data Science Capabilities Hire People To Fill Out Your Data Science Team Consider Which Organizational Model Will Work Best For Your Situation Assess How Much Executive Engagement You Have Or Need Map Out Potential Projects -- Balance Quick Wins With Longer-term Wins © Copyright 2013 EMC Corporation. All rights reserved. 34
  • Questions? Visit the EMC Global Services Booth, #110 Additional Resources: 1. EMC Education Services curriculum on Data Science and Big Data Analytics for Business Transformation: http://education.emc.com/guest/campaign/data_science.aspx 2. My Blog on Data Science & Big Data Analytics: http://infocus.emc.com/author/david_dietrich/ 3. Blog on applying Data Analytics Lifecycle to measuring innovation data: http://stevetodd.typepad.com/my_weblog/data-science-and-big-data-curriculum/ © Copyright 2013 EMC Corporation. All rights reserved. 35