Building Data Science Teams, Abbreviated

Q: Can I simply hire one rockstar data scientist to cover all this kind of work?

A: No, interdisciplinary work requires teams

A: Hire leads who can speak the lingo of each required discipline

A: Hire individual contributors who cover 2+ roles, when possible

Statistical Thinking – Solve the Whole Problem

BONUS: Meta Organization – Integration with Adjacent Teams

Co-authors Allen Day @allenday and Paco Nathan @pacoid

Transcript

  • 1. © 2014 MapR Technologies 1 Q: Can I simply hire one rockstar data scientist to cover all this kind of work?
  • 2. A: No, interdisciplinary work requires teams. A: Hire leads who can speak the lingo of each required discipline. A: Hire individual contributors who cover 2+ roles, when possible.
  • 4. Statistical Thinking – Solve the Whole Problem
      • Use both logical AND analytical reasoning. Understand not only problems and solutions, but also processes and variances.
      • An uncommon mindset in the IT industry: programmers typically don’t think this way. Systems engineers and data scientists must.
      • A common mindset in the physical sciences, and particularly useful in Big Data. Most of my peers are trained as physical scientists and engineers.
  • 5. Aggressively Proactive Learning
      • Disrupts old learning and management models: one size fits all, specialists.
      • Hire people who learn and re-learn efficiently.
      • See: “Throw Your Life a Curve”, Whitney Johnson, blogs.hbr.org/johnson/2012/09/throw-your-life-a-curve.html
  • 6. Team Process = Needs
      • discovery: help people ask the right questions
      • modeling: allow automation to place informed bets
      • integration: deliver products at scale to customers
      • apps: build smarts into product features
      • systems: keep infrastructure running, cost-effective
  • 7. Team Matrix
      • Roles: business process, stakeholder; data prep, discovery, modeling, etc.; software engineering, automation; systems engineering, access
      • A conceptual tool for building and managing data science teams.
      • Overlay your project requirements (needs) on your team’s strengths (roles). That shows very quickly where to focus.
      • Bring in individuals who cover 2-3 needs, particularly for team leads.
  • 8. Value Development Process = Needs
      • One dimension is “needs”: discovery, modeling, integration, apps, systems. These are the primary phases of leveraging Big Data.
      • Analysts drive from discovery. Engineers drive from systems. Both meet at integration.
      • Effective management of data science lives at integration and doesn’t delegate it.
  • 9. Team Composition = Roles
      • The other dimension is “roles”: stakeholder, data scientist, app developer, ops.
      • Each role brings different disciplines, opportunities, and risks. There’s great power in pairing people with complementary skills.
      • Blurring roles is very effective with great people, e.g. DevOps. But blurring boundaries is also dangerous: pushing work down or overloading people stresses teams.
  • 10. Team Matrix = Needs x Roles [matrix diagram: needs against roles]
  • 11. Allen’s Overlay [diagram on the Needs x Roles matrix]
  • 12. Lambda Architecture [diagram]
      • A new data stream feeds both the batch layer and the speed layer.
      • Batch layer (Hadoop): all data in HDFS; MapReduce precomputes batch views (batch recompute).
      • Speed layer (Storm): processes the stream and increments real-time views (real-time increment).
      • Serving layer: batch views and real-time views, each a partial aggregate, are merged into a merged view (HBase).
  • 13. Use Cases on Lambda Architecture [same diagram, annotated with the use cases Log Analysis, Data Lake, and Realtime Processing]
  • 14. Use Cases on Needs x Roles [diagram annotated with the Data Lake use case]
  • 15. MapR Data Platform Supports Complete Data Science Lifecycle [diagram: Filesystem, POSIX NFS, HBase, HDFS, MapReduce, SAN Storage]
  • 16. MapR Data Platform Architecture in a Nutshell [diagram: Filesystem, POSIX NFS, HBase NoSQL tables API, Hadoop HDFS API, Apache™ Hadoop® HDFS, and Apache HBase, linked by “implements” and “depends” relationships]
  • 17. MapR Data Platform Architecture in a Nutshell: Vertical Integration = High Performance [same diagram: Filesystem, POSIX NFS, HBase NoSQL tables API, Hadoop HDFS API, Apache™ Hadoop® HDFS, Apache HBase]
  • 18. Organization
      • “How Do Committees Invent?”, Melvin Conway, 1968: melconway.com/research/committees.html (image credit: Manu Cornet, bonkersworld.net)
      • “Any organization that designs a system (defined more broadly here than just information systems) will inevitably produce a design whose structure is a copy of the organization’s communication structure.”
      • Q: Does this fit with software process? Does this fit with distributed apps?
      • See also: haacked.com/archive/2013/05/13/applying-conways-law.aspx
  • 19. WSJ: Five Ways to Organize Your Data Scientists
  • 20. Meta Organization – Integration with Adjacent Teams
      • Central analytics and data science organization, based in a Strategy function [Facebook]
      • Same type of central organization, reporting to IT or Finance or maybe R&D [LinkedIn, GE, P&G]
      • Center of Excellence, located in one of the above-mentioned functions
      • Analysts and data scientists in one function, e.g., Marketing [American Express]
      • Fully decentralized analysts with no coordination [Twitter]
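The Team Matrix slides describe a conceptual tool: overlay project needs on team strengths to see quickly where to focus. A minimal Python sketch of that overlay might look like the following. Only the five needs (discovery, modeling, integration, apps, systems) come from the slides; the team members, their skill sets, and the helper names `coverage`, `gaps`, `thin`, and `lead_candidates` are hypothetical illustrations, not anything the authors published.

```python
# Hypothetical sketch of the Team Matrix overlay: project needs vs. team strengths.
# Only the five needs come from the slides; people and skills are illustrative.
NEEDS = ("discovery", "modeling", "integration", "apps", "systems")


def coverage(team: dict[str, set[str]]) -> dict[str, int]:
    """Count how many team members cover each need."""
    counts = {need: 0 for need in NEEDS}
    for skills in team.values():
        for need in skills:
            if need in counts:
                counts[need] += 1
    return counts


def gaps(project_needs: list[str], team: dict[str, set[str]]) -> list[str]:
    """Needs the project requires that nobody on the team covers."""
    counts = coverage(team)
    return [need for need in project_needs if counts.get(need, 0) == 0]


def thin(project_needs: list[str], team: dict[str, set[str]]) -> list[str]:
    """Needs covered by exactly one person (a single point of failure)."""
    counts = coverage(team)
    return [need for need in project_needs if counts.get(need, 0) == 1]


def lead_candidates(team: dict[str, set[str]], min_needs: int = 2) -> list[str]:
    """People covering at least `min_needs` needs (the deck suggests 2-3 for leads)."""
    return sorted(
        name for name, skills in team.items()
        if len(skills & set(NEEDS)) >= min_needs
    )


# Hypothetical example team.
team = {
    "alice": {"discovery", "modeling"},
    "bob": {"modeling", "apps"},
    "carol": {"systems"},
}
project = list(NEEDS)
print(gaps(project, team))     # ['integration']
print(thin(project, team))     # ['discovery', 'apps', 'systems']
print(lead_candidates(team))   # ['alice', 'bob']
```

In this made-up team the uncovered need is integration, which the deck singles out as the place where analysts and engineers meet and where management should not delegate. Counting single-person coverage as well as zero coverage is one possible design choice for spotting where to focus; a real assessment would weigh depth of skill, not just presence.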
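The Lambda Architecture slide combines a batch layer that recomputes views from all data with a speed layer that increments real-time views, then merges the two partial aggregates. The toy Python sketch below shows that merge for a simple count. In-memory lists and `Counter` objects stand in for HDFS, MapReduce, Storm, and HBase; the class and method names are hypothetical, and the sketch assumes the batch recompute is instantaneous, which a real deployment cannot assume.

```python
from collections import Counter


class ToyLambdaCounter:
    """Toy, in-memory sketch of the Lambda Architecture query path for counts.

    Stand-ins (assumptions): a list for the HDFS master dataset, a full
    recount for the MapReduce batch recompute, a Counter increment for the
    Storm speed layer, and a sum at query time for the merged view.
    """

    def __init__(self) -> None:
        self.master: list[str] = []              # all data, append-only
        self.batch_view: Counter = Counter()     # precomputed by the batch layer
        self.realtime_view: Counter = Counter()  # incremented by the speed layer

    def ingest(self, key: str) -> None:
        # New data goes to the master dataset and to the speed layer.
        self.master.append(key)
        self.realtime_view[key] += 1

    def recompute_batch(self) -> None:
        # Batch layer recomputes its view from ALL data. Assumes the recompute
        # is instantaneous, so the real-time view can simply be discarded.
        self.batch_view = Counter(self.master)
        self.realtime_view = Counter()

    def query(self, key: str) -> int:
        # Merge the two partial aggregates (missing keys count as 0).
        return self.batch_view[key] + self.realtime_view[key]


lc = ToyLambdaCounter()
for event in ["a", "b", "a"]:
    lc.ingest(event)
print(lc.query("a"))  # 2, all from the real-time view
lc.recompute_batch()
lc.ingest("a")
print(lc.query("a"))  # 3: batch view 2 + real-time view 1
print(lc.query("b"))  # 1
```

Resetting the real-time view after each recompute is the simplification that keeps this toy correct. In a real system, the speed layer would have to retain increments for any data the batch run has not yet absorbed.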