Building Data Science Teams, Abbreviated
 


Q: Can I simply hire one rockstar data scientist to cover all this kind of work?

A: No, interdisciplinary work requires teams

A: Hire leads who can speak the lingo of each required discipline

A: Hire individual contributors who cover 2+ roles, when possible

Statistical Thinking – Solve the Whole Problem

BONUS: Meta Organization – Integration with Adjacent Teams

Co-authors Allen Day @allenday and Paco Nathan @pacoid


Building Data Science Teams, Abbreviated Presentation Transcript

  • 1. © 2014 MapR Technologies. Q: Can I simply hire one rockstar data scientist to cover all this kind of work?
  • 2. A: No, interdisciplinary work requires teams
       A: Hire leads who can speak the lingo of each required discipline
       A: Hire individual contributors who cover 2+ roles, when possible
  • 4. Statistical Thinking – Solve the Whole Problem
       • Use both logical AND analytical reasoning. Understand not only problems and solutions, but also processes and variances
       • Uncommon mindset in IT industry
         – Programmers typically don’t think this way. Systems Engineers and Data Scientists must.
       • Common mindset in physical sciences
         – Particularly useful in BigData. Most of my peers are trained as Physical Scientists and Engineers.
  • 5. Aggressively Proactive Learning
       • Disrupts old learning and management models
         – one size fits all
         – Specialists
       • Hire people who learn and re-learn efficiently
       • “Throw Your Life a Curve”, Whitney Johnson, blogs.hbr.org/johnson/2012/09/throw-your-life-a-curve.html
  • 6. Team Process = Needs
       Diagram labels: apps; discovery; modeling; systems
       • help people ask the right questions
       • allow automation to place informed bets
       • deliver products at scale to customers
       • build smarts into product features
       • keep infrastructure running, cost-effective
  • 7. Team Matrix
       Diagram labels: business process, stakeholder; data prep, discovery, modeling, etc.; software engineering, automation; systems engineering, access
       • Conceptual tool for building and managing Data Science teams
       • Overlay your project requirements (needs) with your team’s strengths (roles)
       • That will show very quickly where to focus
       • Bring in individuals who cover 2-3 needs, particularly for Team Leads
  • 8. Value Development Process = Needs
       Diagram labels: business process, stakeholder; data prep, discovery, modeling, etc.; software engineering, automation; systems engineering, access
       • One dimension is “needs”: discovery, modeling, integration, apps, systems
       • These are the primary phases of leveraging BigData
       • Analysts drive from discovery. Engineers drive from systems. Both meet at integration.
       • Effective management of Data Science lives at integration and doesn’t delegate it
  • 9. Team Composition = Roles
       Diagram labels: business process, stakeholder; data prep, discovery, modeling, etc.; software engineering, automation; systems engineering, access
       • The other dimension is “roles”: stakeholder, data scientist, app developer, ops
       • Each role brings different disciplines, opportunities, and risks.
       • There’s great power in pairing people with complementary skills.
       • Blurring roles is very effective with great people, e.g. DevOps.
       • There is danger in blurring boundaries: pushing down / overloading stresses teams
  • 10. Team Matrix = Needs x Roles
        Diagram labels: business process, stakeholder; data prep, discovery, modeling, etc.; software engineering, automation; systems engineering, access
  • 11. Allen’s Overlay
        Diagram labels: business process, stakeholder; data prep, discovery, modeling, etc.; software engineering, automation; systems engineering, access
  • 12. Lambda Architecture
        Diagram labels: NEW DATA STREAM; BATCH LAYER; SERVING LAYER; SPEED LAYER; MERGE ALL DATA (HDFS); PRECOMPUTE VIEWS (MAP REDUCE); HADOOP BATCH RECOMPUTE; BATCH VIEWS; PROCESS STREAM; INCREMENT VIEWS; STORM REAL-TIME INCREMENT; REAL-TIME DATA; REAL-TIME VIEWS; MERGED VIEW (HBASE); Partial aggregate (x3)
  • 13. Use Cases on Lambda Architecture
        Same Lambda Architecture diagram as slide 12, annotated with use cases: Log Analysis; Data Lake; Realtime Processing
  • 14. Use Cases on Needs x Roles
        Diagram label: Data Lake
  • 15. MapR Data Platform Supports Complete Data Science Lifecycle
        Diagram labels: Filesystem; POSIX; NFS; HBase; HDFS; MapReduce; SAN Storage
  • 16. MapR Data Platform Architecture in a Nutshell
        Diagram nodes: FILESYSTEM; POSIX; NFS; HBASE NOSQL TABLES API; HADOOP HDFS API; APACHE™ HADOOP® HDFS; APACHE HBASE
        Diagram edges: IMPLEMENTS (x5); DEPENDS (x2)
  • 17. MapR Data Platform Architecture in a Nutshell
        Vertical Integration = High Performance
        Diagram nodes: HADOOP HDFS API; HBASE NOSQL TABLES API; FILESYSTEM; POSIX; NFS; APACHE™ HADOOP® HDFS; APACHE HBASE
        Diagram edges: IMPLEMENTS (x5); DEPENDS (x2)
  • 18. Organization
        • How Do Committees Invent? Melvin Conway, 1968, melconway.com/research/committees.html
        • Manu Cornet, bonkersworld.net
        • “Any organization that designs a system (defined more broadly here than just information systems) will inevitably produce a design whose structure is a copy of the organization’s communication structure.”
        • Q: does this fit with software process?
        • Q: does this fit with distributed apps?
        • see also: haacked.com/archive/2013/05/13/applying-conways-law.aspx
  • 19. WSJ: Five Ways to Organize Your Data Scientists
  • 20. Meta Organization – Integration with Adjacent Teams
        • Central analytics and data science organization, based in a Strategy function [Facebook]
        • Same type of central organization, reporting to IT or Finance or maybe R&D [LinkedIn, GE, P&G]
        • Center of Excellence, located in one of the above-mentioned functions
        • Analysts and data scientists in one function, e.g., Marketing [American Express]
        • Fully decentralized analysts with no coordination [Twitter]
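
The Team Matrix slides suggest overlaying project needs with the team's strengths to see where to focus. A minimal Python sketch of that overlay might look like the following. The need and role names come from the slides; the team members, their assignments, and the helper functions `build_coverage`, `find_gaps`, and `lead_candidates` are hypothetical and only illustrate the idea, not the authors' actual tool.

```python
# Needs and roles as named on the slides.
NEEDS = ["discovery", "modeling", "integration", "apps", "systems"]
ROLES = ["stakeholder", "data scientist", "app developer", "ops"]


def build_coverage(team):
    """Map each need to the list of people who cover it."""
    coverage = {need: [] for need in NEEDS}
    for name, profile in team.items():
        if profile["role"] not in ROLES:
            raise ValueError(f"unknown role: {profile['role']}")
        for need in profile["needs"]:
            if need not in coverage:
                raise ValueError(f"unknown need: {need}")
            coverage[need].append(name)
    return coverage


def find_gaps(coverage):
    """Needs that nobody on the team covers: where to focus hiring."""
    return [need for need in NEEDS if not coverage[need]]


def lead_candidates(team, minimum=2):
    """People covering at least `minimum` needs (the slides suggest 2-3 for leads)."""
    return [name for name, p in team.items() if len(p["needs"]) >= minimum]


# Hypothetical team, invented for illustration only.
team = {
    "Ana": {"role": "data scientist", "needs": ["discovery", "modeling"]},
    "Ben": {"role": "app developer", "needs": ["apps"]},
    "Chi": {"role": "ops", "needs": ["systems"]},
}

coverage = build_coverage(team)
print("uncovered needs:", find_gaps(coverage))
print("possible leads:", lead_candidates(team))
```

In this made-up team nobody covers integration, which is the need the deck says management should own rather than delegate.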
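
The Lambda Architecture slides name a batch recompute path, a real-time increment path, and a merged view. A rough in-memory Python sketch of how those three pieces fit together is shown below. It assumes a simple event-count view and uses plain Python objects as stand-ins; it does not use any HDFS, MapReduce, Storm, or HBase API, and the event shape and the names `precompute_batch_view`, `SpeedLayer`, and `merge_views` are hypothetical.

```python
from collections import Counter


def precompute_batch_view(all_data):
    """Batch layer stand-in: recompute counts from scratch over all stored data."""
    return Counter(event["key"] for event in all_data)


class SpeedLayer:
    """Speed layer stand-in: increment a view for events the batch has not seen yet."""

    def __init__(self):
        self.realtime_view = Counter()

    def process(self, event):
        self.realtime_view[event["key"]] += 1

    def reset(self):
        # Called once a batch recompute has absorbed the recent events.
        self.realtime_view.clear()


def merge_views(batch_view, realtime_view):
    """Serving layer stand-in: merged view = batch counts + real-time counts."""
    return batch_view + realtime_view


# Hypothetical events, invented for illustration only.
all_data = [{"key": "login"}, {"key": "login"}, {"key": "purchase"}]
recent = [{"key": "login"}, {"key": "search"}]

batch_view = precompute_batch_view(all_data)
speed = SpeedLayer()
for event in recent:
    speed.process(event)
view = merge_views(batch_view, speed.realtime_view)

# After the next batch recompute includes the recent events, the speed
# layer can be reset and the merged view stays the same.
rebuilt = precompute_batch_view(all_data + recent)
speed.reset()
view_after_recompute = merge_views(rebuilt, speed.realtime_view)
```

The design point this illustrates is that the real-time view only has to be correct for the window since the last batch run, because each recompute replaces it with results derived from the full dataset.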