OCTOBER, 2018
Cloud-Native Enterprise
Data Science Teams
Copyright © 2017 by The Boston Consulting Group, Inc. All rights reserved.
Boston Consulting Group has been using quantitative
methods to transform global companies for 55 years
Advanced Degrees
• Machine Learning
• Deep Learning, AI
• Statistics
• Operations Research
• Optimization
Significant experience
• 200+ advanced
analytics and BigData
cases/year
• Top-10 academia
Industry
Specialized in Analytics
• Domain experience across
industries and use cases
• Operators and
entrepreneurs
• Experienced consultants
Value realization focus
• Operationalize analytics
• Business transformation
ALGORITHMS, TOOLS, PROPRIETARY DATA
TECHNOLOGY
BUSINESS INTEGRATION
Data
scientists
+ Tech
Business
Domain
Experts
Data Science
• Descriptive
• Predictive
• Prescriptive
Topic/industry expertise
• Customer relation
• Marketing
• Networks
• Operations
• Risk
On shore/Off shore teams
• Data scientists
• Data engineers
• Developers (UI, tools)
• Trainers
BCG Gamma: Worldwide 550+ analytics practitioners
East Coast
Boston/NYC
London
Germany
West Coast
L.A./S.F
New Delhi
Sydney
Paris
Chicago
Singapore
Moscow
Warsaw
Nordics
Brazil
China
Madrid
Japan
Milan
Casablanca
Toronto
Bogota
Zurich
>1600
BCG consultants worked
on Gamma cases since '16
BCG Gamma, Principal Analytics Engineer
New York
Ian Stokes-Rees
• 20 years professional software leadership
• 5 years advising on open source data
science strategy for Fortune 500
• International expert on Python data
science
• Past lecturer in Harvard’s Data Science
program
• PhD in Particle Physics from Oxford
Profile summary
Ian is part of Gamma X, a division within Boston Consulting Group that contributes
professional systems and software engineering experience to data science teams. He has
spent decades developing large scale computational software and systems in commercial
and research domains.
Ian has deep experience with enterprise-oriented advanced analytics. Prior to BCG he
spent 5 years as a core part of the team that created the Anaconda data science platform.
Anaconda is in use today by millions of individuals and thousands of companies as the one-
stop-shop for an integrated and flexible open data science tool box.
Education
Ph.D. in Particle Physics from Oxford University: global-scale computing platform for
physics
M.ASc. in Electrical and Computer Engineering, University of Waterloo: statistical
automatic speech recognition
B.ASc. in Electrical Engineering, University of Waterloo
Experience
• Developed advanced analytics strategy built around Open Source data science tools for
Fortune 500 companies
• Product Manager for Anaconda Enterprise, a commercial analytics platform
• Evangelist for Anaconda, promoting adoption of Python-centric data science
• Past faculty member in Harvard’s Computational Science and Engineering graduate
program, teaching courses in Internet-scale computational and data science
• Postdoctoral researcher at Harvard Medical School; developed big data protein
discovery pipeline
• Graduate researcher at CERN during Oxford-based PhD in particle physics; developed
global computing platform for 250,000 networked computers and petabytes of data
http://bit.ly/bcg-data-science-teams
BCG Gamma, Principal Analytics Engineer
New York
Ian Stokes-Rees
StokesRees.Ian@bcg.com
@ijstokes
Algorithms are the tip of the iceberg when it comes to
business impact from analytics
10% Algorithms
20% Technology / IT
70% Business Transformation
People
Platform
Process
Pillars of
transformative
analytics
People
Once upon a time, successfully creating software was
more art than science; more luck than predictable
engineering process
Can double the number of software engineers
complete a program in half the time?
Should software engineering teams be staffed
entirely with computer scientists?
Does the unique nature of each program make it
impervious to reliable planning?
Should the first release be thrown away since it
will be reimplemented based on experience gained
during implementation?
Once upon a time, successful data analytics was more
art than science; more luck than predictable
engineering process
Can double the number of data scientists create a
model in half the time?
Should analytics teams be staffed entirely with
data scientists?
Does the creative process of exploratory data
analytics make it impervious to planning, testing,
collaboration and revision control?
Should the first model be thrown away since it will
be reimplemented by a DevOps team anyway?
Fred Brooks and Harlan Mills proposed “Surgical Teams” for software
projects in 1971, consisting of a mix of roles
• Surgeon: Team leader; Coordination of tasks; Execution on most critical activities
• Co-pilot: Deputy to Surgeon; Responsible for secondary tasks; Team-external coordination
• Administrator: Manages people and resources used by the team
• Editor: Software documentation
• Program Clerk: Manages project documentation, plans, meeting agendas & minutes
• Secretary: Assistants to Administrator and Editor
• Toolsmith: Automation tasks; Supporting code infrastructure development
• Tester: Develop test plans; Run various classes of tests on a regular basis; Report on testing
• Language lawyer: Specializing in code review and optimization
BCG Gamma Case Teams in 2018 have a similar
structure for key leadership roles
Security
Master
Code
Master
Data
Master
Product
Master
Gamma “Ionization” approach ensures quality standards
across code, data, security and the final product
Code master
• Sets up and manages
overall code structure
• Owns codebase
• Supervises GitHub tree
• Professional software
development experience
• Regularly reviews code
• Supervises login process
• Defines testing protocols
• Makes sure team agrees
on code review
Data master
• Creates and updates
overall data structure
• Understands data
sources, provenance and
governance
• Main liaison for data
• Establishes data quality
checks
• Ensures no Personally
Identifiable Information
unless necessary
• Manages data tracker
and escalation process
• Ensures data versioning
Security master
• Enforces data security
agreement
• Systematically tracks
local copies of client
data
• Supervises security of
team infrastructure
• Tracks/limits access to
data
• Escalates any security
related issue to X-ray if
needed
Product master
• Responsible for sprints in
each phase of product
development
• Defines effort involved
in reaching milestones
or finishing a sprint
• Ideally a shared role
with BCG to ensure
common view on
priorities
• Defines roadmap,
timeline and
specifications for each
sprint
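As an illustration of the data master's responsibilities listed above (data quality checks and keeping Personally Identifiable Information out unless necessary), here is a minimal sketch in Python. It is not Gamma's tooling: the function name `check_rows`, the field names, and the two regular expressions are hypothetical, and a real PII scan would need a far broader rule set.

```python
import re
from typing import Dict, List

# Hypothetical patterns for illustration only; real PII detection needs many more rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def check_rows(rows: List[Dict[str, object]], required: List[str]) -> List[str]:
    """Return human-readable findings for missing values and PII-like strings."""
    findings = []
    for i, row in enumerate(rows):
        # Quality check: every required field must be present and non-empty.
        for col in required:
            if row.get(col) in (None, ""):
                findings.append(f"row {i}: missing required field '{col}'")
        # PII check: flag string values that look like an email or phone number.
        for col, value in row.items():
            if isinstance(value, str) and (EMAIL_RE.search(value) or PHONE_RE.search(value)):
                findings.append(f"row {i}: possible PII in field '{col}'")
    return findings


# Made-up sample records (assumed field names).
sample = [
    {"customer_id": "c-001", "region": "EMEA", "note": "renewal due"},
    {"customer_id": "", "region": "APAC", "note": "contact jane.doe@example.com"},
]
findings = check_rows(sample, required=["customer_id", "region"])
for finding in findings:
    print(finding)
```

A check like this could run whenever new data arrives, with findings fed into the data tracker and escalation process the slide mentions.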
Gamma organizes data science teams with a mix of
roles and advises our clients to do likewise (I/II)
Lead Data Scientist: Execution of project plan; Manage team; Pair with and
mentor individuals; Manage milestones and task flow; code review; Project
level documentation
Data Scientist: Explore problem space; Own project modules (self-contained
aspects of project); Implement models and algorithms; Implement tests; code
level documentation
Principal Architect: Solution framework; create project plan; High level team
guidance; Expert oversight and review; Accountable to product or project
owner (budget holder/authorizer)
Gamma organizes data science teams with a mix of roles and advises
our clients to do likewise (II/II)
Software Engineer: Set and monitor code standards; Optimize model and
algorithm implementations; Develop and manage automation; Support use of
collaborative software engineering tools
Machine Learning Engineer: Infrastructure expert; Design for production
deployment; Asset hardening; manage deployment and deployment process;
Packaging; Configuration; Model performance monitoring and tuning
Data Engineer: Data management; ETL layer; Feature engineering; data
security/access control; Data quality monitoring
Platform
The complexity of enterprise data science projects can
easily lead to fragmentation across the value chain
Data ingested from disparate
sources using duplicative,
manual processes
Ad-hoc model training,
dependent on capacity of
individual machines
Manual process to set-up
and monitor DevOps
infrastructure
Difficult for business end
users to understand and
leverage without dashboards
Duplication of effort across
team due to lack of
version control
Inconsistent quality and
lack of standard protocols
and environments
Models designed for individual
use and not structured to be
deployed at scale
Value chain stages:
1. Data integration, wrangling, and management
2. Data science and model development
3. Model training and evaluation
4. Model deployment and activation
5. Visualization, enablement and monitoring
Integrated systems and processes streamline
advanced analytics
Value chain stages:
1. Data integration, wrangling, and management
2. Data science and model development
3. Model training and evaluation
4. Model deployment and activation
5. Visualization, enablement and monitoring
Consideration of entire analytics project lifecycle is
critical for impact and long term success
Proof of Concept: Small team performing Exploratory Data Analysis (EDA)
utilizing a mix of laptops, servers, and Data Lab capabilities (e.g., R&D
Hadoop/Spark cluster); Data offline and sampled subset
Evaluation: One or two data scientists working offline on laptops
Pilot: Expanded PoC including partners/vendors, other LoBs, and live data
sources; Production deployment involves IT and Security Operations interaction;
Not critical to business continuity
Production Deployment: Global roll-out; may be incorporated into
critical operations
Teams, analytics, and platform all need to support this
evolution
Deployment: Models will require a systematic approach to configuration and
deployment to allow orchestration and model interaction
Maintenance: Model performance must be monitored and successful models will
have lifespans that exceed the involvement of the original team that
created them
Interactive exploratory to automated production analytics: Models that begin
life in the hands of a single person developing them intimately and interactively
will end up in high performance and fully automated production environments
Laptop to server to cluster: Progressive scaling of analytics models to support
increased data volumes, users, and parallelization
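The maintenance point above, that model performance must be monitored long after the original team has moved on, can be made concrete with a small sketch. This is one simple approach under stated assumptions, not a description of Gamma's monitoring: the function names, the accuracy metric, and the 0.05 tolerance are all hypothetical choices for illustration.

```python
from statistics import mean
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the observed labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def needs_review(baseline: float, recent: float, tolerance: float = 0.05) -> bool:
    """Flag the model when recent accuracy falls more than `tolerance` below baseline."""
    return (baseline - recent) > tolerance


# Made-up labels and predictions from a recent scoring window.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
recent = accuracy(y_true, y_pred)

# Baseline accuracy is assumed to have been recorded at deployment time.
if needs_review(baseline=0.90, recent=recent):
    print("Model performance degraded; schedule a review")
```

Keeping the threshold and baseline in configuration rather than in someone's notebook is what lets a different team pick up the monitoring later.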
A successful analytics platform strategy consists of several key
dimensions
Exploratory analytics: What is required to equip individuals and teams to
perform ad hoc analytics for new insights or project prototypes?
Data, Storage, Networking: What data is available? Where is it stored? How is
it accessed?
Infrastructure hosting: What advantages or motivators are there for
organization-managed data and compute centers? What advantages or motivators
are there for adoption of cloud services? Do containers play a part?
Data Lab: What role can a “Data Lab” play in mimicking production systems
and data?
Analytics tools: What tools are in use today? How impactful are they? Where
will the demand and opportunity be in the future?
BCG Gamma has a perspective on what successful
enterprise analytics platforms look like:
Collaborative, cloud centric, production ready,
and built on the open data science software stack
23
Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved.
In practice, that analytics platform perspective translates to (I/II)
Hadoop: This mostly boils down to Spark for a fast and reliable distributed in-memory data
structure with a powerful API. Interestingly PySpark is generally preferred over Scala, and
our clients have the same experience.
Storage: Interesting data doesn’t easily exist on laptops, pushing us towards server and
cloud-hosted analytics environments which are “close” to the data
Exploration: Gamma data scientists love Jupyter Notebooks, even when we’d prefer they
use them less. They’re in good company: millions of people rely on Jupyter for exploratory
analytics and we don’t expect that to change any time soon.
Containerization: Gamma increasingly uses Docker containerization for model
encapsulation and Kubernetes for container coordination and deployment.
Languages: Python and R form the foundation of our work, so Anaconda and RStudio are
leveraged heavily. We see this with our clients as well.
In practice, that analytics platform perspective translates to (II/II)
Collaboration: Server-based tools that centralize data and code are essential, as is
revision control for coordination, provenance, and reproducibility. Adoption can be
challenging for solo data scientists who are used to Jupyter running on their laptop
Cloud computing: We find that security, performance, scaling, and maintenance are
generally better and more cost-effective through cloud providers than via enterprise
owned and managed data centers. Each of the major cloud providers has its own
competitive advantages and, in putting our clients’ success first, we variously use Amazon,
Microsoft or Google. Transitioning large organizations to the cloud is challenging, however
Proprietary analytics tools: Despite a foundation of open data science tools and libraries,
Gamma makes heavy use of proprietary tools such as Tableau, Alteryx and Dataiku. These
can often simplify and streamline analytics tasks
Automation: Gamma leverages the HashiStack for infrastructure automation, and
CircleCI for a CI/CD system
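The collaboration point above ties revision control to provenance and reproducibility. One minimal way to sketch that idea is to store a fingerprint of the input data next to the code revision that produced a model. The function names, record fields, revision string, and source path below are hypothetical, and the code revision is assumed to be supplied by the caller (for example, from the team's revision control system).

```python
import hashlib
import json


def data_fingerprint(payload: bytes) -> str:
    """SHA-256 hex digest of the raw data bytes; changes whenever the data changes."""
    return hashlib.sha256(payload).hexdigest()


def provenance_record(payload: bytes, code_revision: str, source: str) -> str:
    """Serialize which data and which code revision went into a model run."""
    record = {
        "data_sha256": data_fingerprint(payload),
        "code_revision": code_revision,  # assumed to come from revision control
        "source": source,                # hypothetical description of the data origin
    }
    return json.dumps(record, sort_keys=True)


# Made-up data extract and identifiers for illustration.
extract = b"customer_id,region\nc-001,EMEA\n"
record_json = provenance_record(extract, code_revision="abc1234", source="crm/extract.csv")
print(record_json)
```

Because the fingerprint depends only on the bytes, two team members can confirm they trained on the same extract without exchanging the data itself.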
Process
The Gamma approach is to think big, start small,
grow fast
what if...
Start with the business
opportunity
Build, test, iterate
Scale to solution
Transform organization
• Business first
• Value focus
• Lean technology
• Right design
• Practical application of AI and Big Data
• Well defined use cases
• Iterative technology scale up
• Purpose fit tools from existing technologies
• New ways of working
• Analytics and business strategy in lock-step
• Right organization and processes
• Advanced analytics becomes BAU
Unlocking value of data requires advanced analytics and
business transformation capabilities side by side
Analytics team enabled to
build and maintain AI
solutions
Scalable data layer
structured and unstructured
complex data
Dynamic triggers and
alerts
Agile development from
business idea to AI
solution
Continuous
learning and
improvement
Enterprise culture of
testing and
experimentation
Advanced pattern
recognition
and detection
AI solutions integrated with
business processes and
decisions
Advanced
analytics and
technology
Business
transformation
Unlocking
the value
of data
Gamma guiding principles of analytics transformations
Organizing for
analytics
everywhere
Industrializing
platform
Creating
momentum
Strategic design
Business- and strategy-led
approach
Data and Analytics
Rapid value through agile
development
Technology and
deployment
Flexible and scalable
technology stack
Ways of working
Holistic business
transformation
Demonstrate value through
pilots before scaling IT and
resources
Ground project roadmap in
most important strategic
priorities
Build capabilities
internally and create data
centric culture across
business and functions
Accretive in months,
significant P&L impact
within a year
Sequence PoC, pilots,
incubation, and
industrialization
Access to data across silos
and outside company
Leverage existing
architecture in pilots to
generate insight quickly
Scale data infrastructure
as projects come online
and adapt to business
needs
Expand access to data via
partnerships
Agile projects, deploying
MVP mindset in cross-
functional teams
Success is 70% business
transformation, 20% IT,
and 10% algorithms
Strong focus on execution,
change management, and
enablement
Thank you and Questions
BCG Gamma, Principal Analytics Engineer
New York
Ian Stokes-Rees
http://bit.ly/bcg-data-science-teams
StokesRees.Ian@bcg.com
@ijstokes
The services and materials provided by The Boston Consulting Group (BCG) are subject to BCG's Standard Terms
(a copy of which is available upon request) or such other agreement as may have been previously executed by BCG.
BCG does not provide legal, accounting, or tax advice. The Client is responsible for obtaining independent advice
concerning these matters. This advice may affect the guidance given by BCG. Further, BCG has made no undertaking
to update these materials after the date hereof, notwithstanding that such information may become outdated
or inaccurate.
The materials contained in this presentation are designed for the sole use by the board of directors or senior
management of the Client and solely for the limited purposes described in the presentation. The materials shall not be
copied or given to any person or entity other than the Client (“Third Party”) without the prior written consent of BCG.
These materials serve only as the focus for discussion; they are incomplete without the accompanying oral commentary
and may not be relied on as a stand-alone document. Further, Third Parties may not, and it is unreasonable for any
Third Party to, rely on these materials for any purpose whatsoever. To the fullest extent permitted by law (and except
to the extent otherwise agreed in a signed writing by BCG), BCG shall have no liability whatsoever to any Third Party,
and any Third Party hereby waives any rights and claims it may have at any time against BCG with regard to the
services, this presentation, or other materials, including the accuracy or completeness thereof. Receipt and review of
this document shall be deemed agreement with and consideration for the foregoing.
BCG does not provide fairness opinions or valuations of market transactions, and these materials should not be relied on
or construed as such. Further, the financial evaluations, projected market and financial information, and conclusions
contained in these materials are based upon standard valuation methodologies, are not definitive forecasts, and are not
guaranteed by BCG. BCG has used public and/or confidential data and assumptions provided to BCG by the Client.
BCG has not independently verified the data and assumptions used in these analyses. Changes in the underlying data or
operating assumptions will clearly impact the analyses and conclusions.
bcg.com

Cloud-native Enterprise Data Science Teams

  • 1.
  • 2.
    1 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Boston Consulting Grouphas been using quantitative methods to transform global companies for 55 years 1 Advanced Degrees • Machine Learning • Deep Learning, AI • Statistics • Operations Research • Optimization Significant experience • 200+ advanced analytics and BigData cases/year • Top-10 academia Industry Specialized in Analytics • Domain experience across industries and use cases • Operators and entrepreneurs • Experienced consultants Value realization focus • Operationalize analytics • Business transformation ALGORITHMS, TOOLS, PROPRIETARY DATA TECHNOLOGY BUSINESS INTEGRATION Data scientists + Tech Business Domain Experts
  • 3.
    2 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Data Science • Descriptive •Predictive • Prescriptive Topic/industry expertise • Customer relation • Marketing • Networks • Operations • Risk On shore/Off shore teams • Data scientists • Data engineers • Developers (UI, tools) • Trainers BCG Gamma: Worldwide 550+ analytics practitioners East Coast Boston/NYC London Germany West Coast L.A./S.F New Delhi Sydney Paris Chicago Singapore Moscow Warsaw Nordics Brazil China Madrid Japan Milan Casablanca Toronto Bogota Zurich >1600 BCG consultants worked on Gamma cases since '16
  • 4.
    3 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. BCG Gamma, PrincipalAnalytics Engineer New York Ian Stokes-Rees • 20 years professional software leadership • 5 years advising on open source data science strategy for Fortune 500 • International expert on Python data science • Past lecturer in Harvard’s Data Science program • PhD in Particle Physics from Oxford Profile summary Ian is part of Gamma X, a division within Boston Consulting Group that contributes professional systems and software engineer experience to data science teams. He has spent decades developing large scale computational software and systems in commercial and research domains. Ian has deep experience with enterprise-oriented advanced analytics. Prior to BCG he spent 5 years as a core part of the team that created the Anaconda data science platform. Anaconda is in use today by millions of individuals and thousands of companies as the one- stop-shop for an integrated and flexible open data science tool box. Experience Education Ph.D. in Particle Physics from Oxford University: global-scale computing platform for physics M.ASc. in Electrical and Computer Engineering, University of Waterloo: statistical automatic speech recognition B.ASc. in Electrical Engineering, University of Waterloo • Developed advanced analytics strategy built around Open Source data science tools for Fortune 500 companies • Product Manager for Anaconda Enterprise, a commercial analytics platform • Evangelist for Anaconda, promoting adoption of Python-centric data science • Past faculty member in Harvard’s Computational Science and Engineering graduate program, teaching courses in Internet-scale computational and data science • Postdoctoral researcher at Harvard Medical School; developed big data protein discovery pipeline • Graduate researcher at CERN during Oxford-based PhD in particle physics; developed global computing platform for 250,000 networked computers and petabytes of data
  • 5.
    4 http://bit.ly/bcg-data-science-teams BCG Gamma, PrincipalAnalytics Engineer New York Ian Stokes-Rees StokesRees.Ian@bcg.com @ijstokes
  • 6.
    5 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Algorithms are thetip of the iceberg when it comes to business impact from analytics 10% Algorithms 20% Technology / IT 70% Business Transformation 70%
  • 7.
  • 8.
  • 9.
    8 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Once upon atime, successfully creating software was more art than science; more luck than predictable engineering process Can double the number of software engineers complete a program in half the time? Should software engineering teams be staffed entirely with computer scientists? Does the unique nature of each program make it impervious to reliable planning? Should the first release be thrown away since it will be reimplemented based on experience gained during implementation?
  • 10.
    9 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Once upon atime, successful data analytics was more art than science; more luck than predictable engineering process Can double the number of data scientists create a model in half the time? Should analytics teams be staffed entirely with data scientists? Does the creative process of exploratory data analytics make it impervious to planning, testing, collaboration and revision control? Should the first model be thrown away since it will be reimplemented by a DevOps team anyway?
  • 11.
    10 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Fred Brooks andHarlan Mills proposed “Surgical Teams” for software projects in 1971, consisting of a mix of roles • Surgeon: Team leader; Coordination of tasks; Execution on most critical activities • Co-pilot: Deputy to Surgeon; Responsible for secondary tasks; Team-external coordination • Administrator: Manages people and resources used by the team • Editor: Software documentation • Program Clerk: Manages project documentation, plans, meeting agendas & minutes • Secretary: Assistants to Administrator and Editor • Toolsmith: Automation tasks; Supporting code infrastructure development • Tester: Develop test planbs; Run various classes of tests on a regular basis; Report on testing • Language lawyer: Specializing in code review and optimization
  • 12.
    11 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Fred Brooks andHarlan Mills proposed “Surgical Teams” for software projects in 1971, consisting of a mix of roles Surgeon: Team leader; Coordination of tasks; Execution on most critical activities Co-pilot: deputy to Surgeon; Responsible for secondary tasks; Team-external coordination Administrator: Manages people and resources used by the team Editor: Software documentation Program Clerk: Manages project documentation, plans, meeting agendas and minutes Secretary: Assistants to Administrator and Editor Toolsmith: Automation tasks; Supporting code infrastructure development Tester: Develop test planbs; Run various classes of tests on a regular basis; Report on testing Language lawyer: Specializing in code review and optimization Option
  • 13.
    12 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. BCG Gamma CaseTeams in 2018 have a similar structure for key leadership roles Security Master Code Master Data Master Product Master
  • 14.
    13 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Gamma “Ionization” approachensures quality standards across code, data, security and the final product Code master • Sets up and manages overall code structure • Owns codebase • Supervises GitHub tree • Professional software development experience • Review regularly code • Supervise login process • Defines testing protocols • Makes sure team agrees on code review Data master • Creates and updates overall data structure • Understands data sources, provenance and governance • Main liaison for data • Establishes data quality checks • Ensures no Personally Identifiable Information unless necessary • Manages data tracker and escalation process • Ensures data versioning Security master • Enforces data security agreement • Systematically tracks local copies of client data • Supervises security of team infrastructure • Tracks/limits access to data • Escalate any security related issue to X-ray if needed Product master • Responsible for sprints in each phase of product development • Defines effort involved in reaching milestones or finish a sprint • Ideally a shared role with BCG to ensure common view on priorities • Define roadmap, timeline and specifications for each sprint
  • 15.
    14 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Gamma organizes datascience teams with a mix of roles and advises our clients to do likewise (I/II) Lead Data Scientist: Execution of project plan; Manage team; Pair with and mentor individuals; Manage milestones and task flow; code review; Project level documentation Data Scientist: Explore problem space; Own project modules (self-contained aspects of project); Implement models and algorithms; Implement tests; code level documentation Principal Architect: Solution framework; create project plan; High level team guidance; Expert oversight and review; Accountable to product or project owner (budget holder/authorizer)
  • 16.
    15 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Gamma organizes datascience teams with a mix of roles and advises our clients to do likewise (II/II) Software Engineer: Set and monitor code standards; Optimize model and algorithm implementations; Develop and manage automation; Support use of collaborative software engineer tools Machine Learning Engineer: Infrastructure expert; Design for production deployment; Asset hardening; manage deployment and deployment process; Packaging; Configuration; Model performance monitoring and tuning Data Engineer: Data management; ETL layer; Feature engineering; data security/access control; Data quality monitoring
  • 17.
  • 18.
    17 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. The complexity ofenterprise data science projects can easily lead to fragmentation across the value chain Data ingested from disparate sources using duplicative, manual processes Ad-hoc model training, dependent on capacity of individual machines Manual process to set-up and monitor DevOps infrastructure Difficult for business end users to understand and leverage without dashboards Duplication of effort across team due to lack of version control Inconsistent quality and lack of standard protocols adn environments Models designed for individual use and not structured to be deployed at scale Visualization, enablement and monitoring Model deployment and activation Model training and evaluation Data science and model development Data integration, wrangling, and management 1 2 3 4 5
  • 19.
    18 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Integrated systems andprocesses streamline advanced analytics Visualization, enablement and monitoring Model deployment and activation Model training and evaluation Data science and model development Data integration, wrangling, and management 1 2 3 4 5
  • 20.
    19 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Consideration of entireanalytics project lifecycle is critical for impact and long term success Proof of Concept: Small team performing Exploratory Data Analysis (EDA) utilizing a mix of laptops, servers, and Data Lab capabilities (e.g., RandD Hadoop/Spark cluster); Data offline and sampled subset Evaluation: One or two data scientists working offline on laptops Pilot: Expanded PoC including partners/vendors, other LoBs, and live data sources; Production deployment involves IT and Security Operations interaction; Not critical to business continuity Production Deployment: Global roll-out; may beb incorporated into critical operations
  • 21.
    20 Copyright©2017byTheBostonConsultingGroup,Inc.Allrightsreserved. Teams, analytics, andplatform all need to support this evolution Deployment: Models will require a systematic approach to configuration and deployment to allow orchestration and model interaction Maintenance: Model performance must be monitored and successful models will have lifespans that exceed the involvement of the original team that created them Interactive exploratory to automated production analytics: Models that begin life in the hands of a single person developing them intimately and interactively will end up in high performance and fully automated production environments Laptop to server to cluster: Progressive scaling of analytics models to support increased data volumes, users, and parallelization
  • 22.
    A successful analytics platform strategy consists of several key dimensions
      • Exploratory analytics: What is required to equip individuals and teams to perform ad hoc analytics for new insights or project prototypes?
      • Data, Storage, Networking: What data is available? Where is it stored? How is it accessed?
      • Infrastructure hosting: What advantages or motivators are there for organization-managed data and compute centers? What advantages or motivators are there for adoption of cloud services? Do containers play a part?
      • Data Lab: What role can a “Data Lab” play in mimicking production systems and data?
      • Analytics tools: What tools are in use today? How impactful are they? Where will the demand and opportunity be in the future?
  • 23.
    BCG Gamma has a perspective on what successful enterprise analytics platforms look like: collaborative, cloud centric, production ready, and built on the open data science software stack
  • 24.
    In practice, that analytics platform perspective translates to (I/II)
      • Hadoop: This mostly boils down to Spark for a fast and reliable distributed in-memory data structure with a powerful API. Interestingly, PySpark is generally preferred over Scala, and our clients have the same experience.
      • Storage: Interesting data doesn’t easily exist on laptops, pushing us towards server and cloud-hosted analytics environments which are “close” to the data.
      • Exploration: Gamma data scientists love Jupyter Notebooks, even when we’d prefer they use them less. They’re in good company: millions of people rely on Jupyter for exploratory analytics and we don’t expect that to change any time soon.
      • Containerization: Gamma increasingly uses Docker containerization for model encapsulation and Kubernetes for container coordination and deployment.
      • Languages: Python and R form the foundation of our work, so Anaconda and RStudio are leveraged heavily. We see this with our clients as well.
  • 25.
    In practice, that analytics platform perspective translates to (II/II)
      • Collaboration: Server-based tools that centralize data and code are essential, as is revision control for coordination, provenance, and reproducibility. Adoption can be challenging for solo data scientists who are used to Jupyter running on their laptop.
      • Cloud computing: We find that security, performance, scaling, and maintenance are generally better and more cost-effective through cloud providers than via enterprise owned and managed data centers. Each of the major cloud providers has its own competitive advantages, and in putting our clients’ success first we variously use Amazon, Microsoft, or Google. Transitioning large organizations to the cloud is challenging, however.
      • Proprietary analytics tools: Despite a foundation of open data science tools and libraries, Gamma makes heavy use of proprietary tools such as Tableau, Alteryx, and Dataiku. These can often simplify and streamline analytics tasks.
      • Automation: Gamma leverages the HashiStack for infrastructure automation, and CircleCI for a CI/CD system.
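    The collaboration point above ties revision control to provenance and reproducibility. One way to picture that link is a small record that binds a model run to its data, parameters, and code revision. The sketch below uses only the Python standard library; the function names `provenance_record` and `record_id` and the record fields are hypothetical, not a description of any tool named in this deck.

```python
import hashlib
import json
import platform


def provenance_record(data_bytes, params, code_version):
    """Hypothetical helper: describe a model run by its inputs.

    data_bytes:   raw bytes of the training data (e.g. a file's contents)
    params:       dict of JSON-serializable model parameters
    code_version: revision identifier supplied by the caller,
                  for example a commit hash from revision control
    """
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "params": dict(sorted(params.items())),
        "code_version": code_version,
        "python_version": platform.python_version(),
    }


def record_id(record):
    """Short, deterministic identifier derived from the record contents."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]
```

    Two runs with identical data, parameters, code revision, and interpreter version produce the same identifier, so a differing identifier is a quick signal that something in the inputs has changed.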
  • 26.
  • 27.
    The Gamma approach is to think big, start small, grow fast
    what if...
    Phases:
      • Start with the business opportunity
      • Build, test, iterate
      • Scale to solution
      • Transform organization
    Characteristics:
      • Business first
      • Value focus
      • Lean technology
      • Right design
      • Practical application of AI and Big Data
      • Well-defined use cases
      • Iterative technology scale-up
      • Purpose-fit tools from existing technologies
      • New ways of working
      • Analytics and business strategy in lock-step
      • Right organization and processes
      • Advanced analytics becomes BAU
  • 28.
    Unlocking the value of data requires advanced analytics and business transformation capabilities side by side
    Capability areas: Advanced analytics and technology; Business transformation
      • Analytics team enabled to build and maintain AI solutions
      • Scalable data layer: structured and unstructured complex data
      • Dynamic triggers and alerts
      • Agile development from business idea to AI solution
      • Continuous learning and improvement
      • Enterprise culture of testing and experimentation
      • Advanced pattern recognition and detection
      • AI solutions integrated with business processes and decisions
  • 29.
    Gamma guiding principles of analytics transformations
    Themes:
      • Organizing for analytics everywhere
      • Industrializing platform
      • Creating momentum
    Dimensions:
      • Strategic design: Business- and strategy-led approach
      • Data and Analytics: Rapid value through agile development
      • Technology and deployment: Flexible and scalable technology stack
      • Ways of working: Holistic business transformation
    Principles:
      • Demonstrate value through pilots before scaling IT and resources
      • Ground project roadmap in most important strategic priorities
      • Build capabilities internally and create data-centric culture across business and functions
      • Accretive in months, significant P&L impact within a year
      • Sequence PoC, pilots, incubation, and industrialization
      • Access to data across silos and outside company
      • Leverage existing architecture in pilots to generate insight quickly
      • Scale data infrastructure as projects come online and adapt to business needs
      • Expand access to data via partnerships
      • Agile projects, deploying MVP mindset in cross-functional teams
      • Success is 70% business transformation, 20% IT, and 10% algorithms
      • Strong focus on execution, change management, and enablement
  • 30.
    Thank you and Questions
    Ian Stokes-Rees
    BCG Gamma, Principal Analytics Engineer
    New York
    http://bit.ly/bcg-data-science-teams
    StokesRees.Ian@bcg.com
    @ijstokes
  • 31.
    The services and materials provided by The Boston Consulting Group (BCG) are subject to BCG's Standard Terms (a copy of which is available upon request) or such other agreement as may have been previously executed by BCG. BCG does not provide legal, accounting, or tax advice. The Client is responsible for obtaining independent advice concerning these matters. This advice may affect the guidance given by BCG. Further, BCG has made no undertaking to update these materials after the date hereof, notwithstanding that such information may become outdated or inaccurate. The materials contained in this presentation are designed for the sole use by the board of directors or senior management of the Client and solely for the limited purposes described in the presentation. The materials shall not be copied or given to any person or entity other than the Client (“Third Party”) without the prior written consent of BCG. These materials serve only as the focus for discussion; they are incomplete without the accompanying oral commentary and may not be relied on as a stand-alone document. Further, Third Parties may not, and it is unreasonable for any Third Party to, rely on these materials for any purpose whatsoever. To the fullest extent permitted by law (and except to the extent otherwise agreed in a signed writing by BCG), BCG shall have no liability whatsoever to any Third Party, and any Third Party hereby waives any rights and claims it may have at any time against BCG with regard to the services, this presentation, or other materials, including the accuracy or completeness thereof. Receipt and review of this document shall be deemed agreement with and consideration for the foregoing. BCG does not provide fairness opinions or valuations of market transactions, and these materials should not be relied on or construed as such.
    Further, the financial evaluations, projected market and financial information, and conclusions contained in these materials are based upon standard valuation methodologies, are not definitive forecasts, and are not guaranteed by BCG. BCG has used public and/or confidential data and assumptions provided to BCG by the Client. BCG has not independently verified the data and assumptions used in these analyses. Changes in the underlying data or operating assumptions will clearly impact the analyses and conclusions.
  • 32.