
Building Data Science Teams, Abbreviated

Q: Can I simply hire one rockstar data scientist to cover all this kind of work?

A: No, interdisciplinary work requires teams

A: Hire leads who can speak the lingo of each required discipline

A: Hire individual contributors who cover 2+ roles, when possible
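
The advice above about covering 2+ roles can be made concrete with a small coverage check, in the spirit of the Team Matrix (needs x roles) on slides 7 to 10 of the transcript. The following is a minimal Python sketch, not the authors' tool: the need names come from slide 8, while the team members, their skills, and the function names are hypothetical and chosen only for illustration.

```python
# Needs as listed on slide 8 of the transcript.
NEEDS = ["discovery", "modeling", "integration", "apps", "systems"]

# Hypothetical team: each (made-up) person maps to the needs they can cover.
team = {
    "lead": {"discovery", "modeling", "integration"},
    "app_dev": {"integration", "apps"},
    "analyst": {"discovery"},
}


def coverage(team, needs):
    """For each need, return the sorted names of the people who cover it."""
    return {
        need: sorted(name for name, skills in team.items() if need in skills)
        for need in needs
    }


def gaps(team, needs):
    """Return the needs that nobody on the team covers."""
    return [need for need, people in coverage(team, needs).items() if not people]


def multi_need_people(team, minimum=2):
    """Return the sorted names of people who cover at least `minimum` needs."""
    return sorted(name for name, skills in team.items() if len(skills) >= minimum)


uncovered = gaps(team, NEEDS)
generalists = multi_need_people(team)
print("uncovered needs:", uncovered)
print("people covering 2+ needs:", generalists)
```

With this made-up team, the check reports "systems" as an uncovered need and flags the lead and the app developer as the people who span several needs.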

Statistical Thinking – Solve the Whole Problem

BONUS: Meta Organization – Integration with Adjacent Teams

Co-authors Allen Day @allenday and Paco Nathan @pacoid

Published in: Science, Technology

Transcript of "Building Data Science Teams, Abbreviated"

  1. © 2014 MapR Technologies
     Q: Can I simply hire one rockstar data scientist to cover all this kind of work?
  2. A: No, interdisciplinary work requires teams
     A: Hire leads who can speak the lingo of each required discipline
     A: Hire individual contributors who cover 2+ roles, when possible
  3. © 2014 MapR Technologies
  4. Statistical Thinking – Solve the Whole Problem
     • Use both logical AND analytical reasoning. Understand not only problems and solutions, but also processes and variances.
     • Uncommon mindset in the IT industry: programmers typically don’t think this way. Systems Engineers and Data Scientists must.
     • Common mindset in the physical sciences: particularly useful in BigData. Most of my peers are trained as Physical Scientists and Engineers.
  5. Aggressively Proactive Learning
     • Disrupts old learning and management models
       – one size fits all
       – Specialists
     Hire people who learn and re-learn efficiently
     Reference: "Throw Your Life a Curve", Whitney Johnson, blogs.hbr.org/johnson/2012/09/throw-your-life-a-curve.html
  6. Team Process = Needs
     Diagram labels: apps, discovery, modeling, systems
     • help people ask the right questions
     • allow automation to place informed bets
     • deliver products at scale to customers
     • build smarts into product features
     • keep infrastructure running, cost-effective
  7. Team Matrix
     Diagram labels: business process, stakeholder; data prep, discovery, modeling, etc.; software engineering, automation; systems engineering, access
     • Conceptual tool for building and managing Data Science teams.
     • Overlay your project requirements (needs) with your team’s strengths (roles). That will show very quickly where to focus.
     • Bring in individuals who cover 2-3 needs, particularly for Team Leads.
  8. Value Development Process = Needs
     Diagram labels: same as slide 7
     • One dimension is “needs”: discovery, modeling, integration, apps, systems.
     • These are the primary phases of leveraging BigData.
     • Analysts drive from discovery. Engineers drive from systems. Both meet at integration.
     • Effective management of Data Science lives at integration and doesn’t delegate it.
  9. Team Composition = Roles
     Diagram labels: same as slide 7
     • The other dimension is “roles”: stakeholder, data scientist, app developer, ops.
     • Each role brings different disciplines, opportunities, and risks.
     • There’s great power in pairing people with complementary skills.
     • Blurring roles is very effective with great people, e.g. DevOps.
     • There is danger in blurring boundaries: pushing down or overloading stresses teams.
  10. Team Matrix = Needs x Roles
      Diagram labels: same as slide 7
  11. Allen’s Overlay
      Diagram labels: same as slide 7
  12. Lambda Architecture
      Diagram labels:
      • Input: new data stream
      • Batch layer (Hadoop, batch recompute): merge all data (HDFS), precompute views (MapReduce), batch views
      • Speed layer (Storm, real-time increment): real-time data, process stream, increment views, real-time views
      • Serving layer: merged view (HBase)
      • Annotations: partial aggregate (three times)
  13. Use Cases on Lambda Architecture
      Same diagram as slide 12, annotated with use cases: Log Analysis, Data Lake, Realtime Processing
  14. Use Cases on Needs x Roles: Data Lake
  15. MapR Data Platform Supports Complete Data Science Lifecycle
      Diagram labels: Filesystem, POSIX NFS, HBase, HDFS, MapReduce, SAN Storage
  16. MapR Data Platform Architecture in a Nutshell
      Diagram labels: Filesystem (POSIX NFS); HBase (NoSQL Tables API); Hadoop (HDFS API); Apache™ Hadoop® HDFS; Apache HBase; five "implements" arrows; two "depends" arrows
  17. MapR Data Platform Architecture in a Nutshell
      Same diagram labels as slide 16, plus: Vertical Integration = High Performance
  18. Organization
      How Do Committees Invent? Melvin Conway, 1968, melconway.com/research/committees.html
      Manu Cornet, bonkersworld.net
      “Any organization that designs a system (defined more broadly here than just information systems) will inevitably produce a design whose structure is a copy of the organization’s communication structure.”
      Q:
      • does this fit with software process?
      • does this fit with distributed apps?
      see also: haacked.com/archive/2013/05/13/applying-conways-law.aspx
  19. WSJ: Five Ways to Organize Your Data Scientists
  20. Meta Organization – Integration with Adjacent Teams
      • Central analytics and data science organization, based in a Strategy function [Facebook]
      • Same type of central organization, reporting to IT or Finance or maybe R&D [LinkedIn, GE, P&G]
      • Center of Excellence, located in one of the above-mentioned functions
      • Analysts and data scientists in one function, e.g., Marketing [American Express]
      • Fully decentralized analysts with no coordination [Twitter]
  21. © 2014 MapR Technologies
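
The Lambda Architecture on slide 12 combines a batch view, recomputed from all data, with a real-time view that is incremented as events arrive, and serves a merged view of the two. A minimal Python sketch of that merge follows. It uses in-memory counters in place of HDFS, MapReduce, Storm, and HBase, and the event fields, sample events, and function names are assumptions made for illustration, not part of the slides.

```python
from collections import Counter


def batch_view(all_events):
    """Batch layer: recompute per-key counts from the full event history."""
    return Counter(event["key"] for event in all_events)


def increment_realtime_view(view, event):
    """Speed layer: update the real-time view in place, one event at a time."""
    view[event["key"]] += 1


def merged_view(batch, realtime):
    """Serving layer: add the two partial aggregates key by key."""
    return batch + realtime


# Hypothetical events: `history` was already absorbed by the last batch run,
# `recent` arrived afterwards and is only visible to the speed layer.
history = [{"key": "login"}, {"key": "login"}, {"key": "purchase"}]
recent = [{"key": "login"}, {"key": "search"}]

batch = batch_view(history)
realtime = Counter()
for event in recent:
    increment_realtime_view(realtime, event)

merged = merged_view(batch, realtime)
print(dict(merged))
```

Counts are a convenient example because partial aggregates from the two layers can simply be added. A view that is not additive would need its own merge rule in `merged_view`.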
