Rsqrd AI: From R&D to ROI of AI


In this talk, Rsqrd AI welcomes Diego Oppenheimer, CEO and co-founder of Algorithmia! Diego goes in depth on why machine learning projects fail and why we don’t see machine learning in production despite how powerful the technology can be. He shares his experiences on the problems surrounding pushing ML into production.

**These slides are from a talk given at Rsqrd AI. Learn more at rsqrdai.org**



  1. FROM R&D TO ROI: REALIZE VALUE BY OPERATIONALIZING MACHINE LEARNING
     Diego Oppenheimer, CEO
  2. MACHINE LEARNING != PRODUCTION MACHINE LEARNING
     ● Cluster Orchestration
     ● Container Image Management
     ● Load Balancing
     ● Utilizing GPUs
     ● Model Versioning
     ● API Management
     ● Distributed Parallel Processing
     ● Cloud Infrastructure Decisions
  3. 75% of Time Spent on Infrastructure
     Survey: Teams are capable of much more
     Key Challenges
     ● 30%: supporting different languages and frameworks
     ● 30%: model management tasks such as versioning and reproducibility
     ● 38%: deploying models at necessary scale
     (*) Survey of > 500 practitioners & management in summer of 2018
     Algorithmia Proprietary and Confidential
  4. Gartner: Productionization’s biggest barrier
     The Main Barrier to Delivering Business Value Is Lack of Successful Productizing Projects
     Base: n = 45 Gartner Research Circle members/external sample. Excludes “not sure.” Asked if selected “getting data and analytics projects into production” at DA05.
     DA5b. Thinking about why you selected “getting data and analytics projects into production” as a challenge, please identify your organization’s specific barriers to moving projects into production. Multiple responses allowed.
     ID: 333499 ©2018 Gartner, Inc.
  5. The Demand for AI
     Source: 1) N = 11,400 organizations in North America, Europe, and Asia; 2) International Institute of Analytics; 3) Forbes, 2019; Author’s Analysis.
  6. Traditional vs. ML life cycle
     ● Traditional DevOps
     ● ML Life Cycle DevOps
  7. “Hidden Technical Debt in Machine Learning Systems,” D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
     Google: “Developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive.”
  8. ML is in a huge growth phase, but it is difficult and expensive for DevOps to keep up
     Initially
     ● A few models, a couple of frameworks, 1-2 languages
     ● Dedicated hardware or VM hosting
     ● Self-managed DevOps or IT team
     ● High time-to-deploy, manual discoverability
     ● Few end users, heterogeneous APIs (if any)
     Pretty soon...
     ● > 9,500 algorithms (95k versions) on many runtimes / frameworks
     ● > 100k algorithm developers: heterogeneous, largely unpredictable
     ● Each algorithm: 1 to 1,000 calls/second, a lot of variance
     ● Need auto-deploy, discoverability, low (10-15 ms) latency
     ● Common API, composability, fine-grained security
  9. Iteration speed separates ML from app dev
     ● The ML development lifecycle is an evolving ecosystem
     ● ML moves faster than traditional app development
     ● ML can introduce breaking changes to apps that consume model output
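
The last bullet on slide 9 can be illustrated with a contract check between a model and the app that consumes its output. This is a minimal sketch, not something shown in the talk: the schema, the field names (`label`, `score`), and the two example outputs are all hypothetical.

```python
from typing import Any

# Hypothetical contract the consuming app relies on: field name -> required type.
EXPECTED_OUTPUT_SCHEMA: dict[str, type] = {"label": str, "score": float}


def contract_violations(output: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return human-readable violations; an empty list means the output is compatible."""
    problems = []
    for field, expected_type in schema.items():
        if field not in output:
            problems.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(output[field]).__name__}"
            )
    return problems


# Output of a hypothetical v1 model, and of a retrained v2 that renamed a field.
v1_output = {"label": "spam", "score": 0.93}
v2_output = {"label": "spam", "confidence": 0.93}

print(contract_violations(v1_output, EXPECTED_OUTPUT_SCHEMA))
print(contract_violations(v2_output, EXPECTED_OUTPUT_SCHEMA))
```

Running a check like this in CI before a new model version is published is one way to catch a breaking change before a consuming app does.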
  10. Let’s get tactical
  11. Deploying ML today is economically challenging
      ● Due to a lack of process
      ● Due to the wrong incentives
      ● Due to the wrong teams
      ● Due to the wrong technology
      ● Due to lack of proper champions
  12. Lack of Process
      ● Easy to get a POC funded and experiments running
      ● Once results are shown… then what?
      ● Who funds production, who needs to be involved, how does production work at my enterprise?
      Must be able to answer: How do we go from POC to production?
      Solution:
      ● Plan and fund deployment upfront
      ● Set clear deployment criteria
      ● Bring in stakeholders from IT and DevOps early
      ● Build for repeatability in process
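
“Set clear deployment criteria” on slide 12 can be written down as a gate that a POC must pass before it moves to production. The sketch below is only an illustration under stated assumptions: the `DeploymentCriteria` type, its thresholds, and the sign-off names are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentCriteria:
    # Hypothetical thresholds a team might agree on before the POC starts.
    min_accuracy: float
    max_p95_latency_ms: float
    required_signoffs: frozenset[str]


def unmet_criteria(
    criteria: DeploymentCriteria,
    accuracy: float,
    p95_latency_ms: float,
    signoffs: set[str],
) -> list[str]:
    """List every criterion the candidate fails; an empty list means ready to deploy."""
    unmet = []
    if accuracy < criteria.min_accuracy:
        unmet.append("accuracy")
    if p95_latency_ms > criteria.max_p95_latency_ms:
        unmet.append("latency")
    missing = sorted(criteria.required_signoffs - signoffs)
    if missing:
        unmet.append("signoffs: " + ", ".join(missing))
    return unmet


criteria = DeploymentCriteria(
    min_accuracy=0.90,
    max_p95_latency_ms=50.0,
    required_signoffs=frozenset({"it", "devops"}),
)
print(unmet_criteria(criteria, accuracy=0.92, p95_latency_ms=40.0, signoffs={"it"}))
```

Agreeing on the criteria object before any experiment runs answers “then what?” in advance: the POC either clears the gate or it does not.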
  13. Wrong Incentives
      ● ML efforts as part of innovation mandates - designed to be “out there”
      ● Setting goals to learn, innovate, experiment - instead of deploy, affect company metrics, align with business
      ● Demo-ware vs. integratable and usable
      Must be able to answer: What is the minimal justifiable improvement?
      Solution:
      ● Consider using the Minimal Justifiable Improvement Tree (MJIT) by Ian Xiao
  14. Minimal Justifiable Improvement Tree
      Source: ML is Boring - Ian Xiao
  15. Wrong Teams
      ● Asking data scientists who lack engineering experience to build infrastructure
      ● Teams that lack DevOps experience
      ● Not partnering with the right skill sets inside the organization
      Must be able to answer: Does my team have the right skill set to make my solution deployable in the organization?
      Solution:
      ● Create hybrid teams of engineers, data scientists, and DevOps engineers
      ● Stop chasing unicorns (data scientists with DevOps and engineering experience)
      ● Use software and platforms that enhance the data science and ML team
  16. Lack of proper champions
      ● As with any deployment of new technology, a lack of champions can be the kiss of death
      ● ML projects without executive sponsorship rarely see the light of day
      ● “Like any introduction of new ideas, tools, or processes, it creates a level of uncertainty due to skepticism, unfamiliarity, or misunderstanding. Fear of failure gets into the way of important and rational decisions.”
      Must be able to answer: How do we get buy-in from stakeholders?
      Solution:
      ● Align values and interests
      ● Involve stakeholders up and down the command chain early
      ● Collaborate to achieve goals rather than dictate
  17. Wrong Technology
      ● Lack of defined technology stack or best practices
      ● Not building for repeatability, measurability and auditability
      ● Proprietary lock-in to tooling
      ● Not thinking about access to data
      ● Differences between Prod and Dev
      Must be able to answer: What is the best ML architecture for my organization?
      Solution:
      ● Design to execute at scale, repeatedly and efficiently
      ● “Tightly integrated but loosely coupled”
      ● Replace or upgrade components as technologies, data sources and needs evolve
      ● Anticipate and allow a variety of tools and technologies to be used concurrently, at every step of the life cycle
      ● Remain open to integration with the variety of in-house technologies
  18. Let’s get technical
  19. ML Lifecycle: Data > Train > Deploy > Manage
      ● Connect to your data management system
      ● Publish from the training platform of your choice via API, Git, or CI/CD pipeline
      ● Deploy models and manage model serving, inference, and compute infrastructure
      ● Integrate with your other models and consuming production applications
  20. Training and production are very different
      Training
      ● Long compute cycle
      ● Fixed load
      ● Stateful
      ● Single user
      Production
      ● Short compute bursts
      ● Elastic
      ● Stateless
      ● Many users
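
The “stateless” property on the production side of slide 20 can be shown with a tiny inference function whose output depends only on its input and on parameters frozen at load time. This is a sketch under assumptions: the two-feature logistic model, `WEIGHTS`, and `BIAS` are hypothetical stand-ins for whatever a training run actually produces.

```python
import math

# Hypothetical frozen parameters from a training run. In production they are
# loaded once at start-up and never mutated by requests.
WEIGHTS = (0.8, -0.4)
BIAS = 0.1


def predict(features: tuple[float, float]) -> float:
    """Stateless inference: the result depends only on the input and frozen parameters."""
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


# Any replica can serve any request, and repeating a request gives the same answer.
first = predict((1.0, 2.0))
second = predict((1.0, 2.0))
print(first == second)
```

Because no request leaves anything behind, replicas can be added or removed freely, which is what makes the elastic, many-user side of the slide workable.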
  21. Heterogeneous tooling and dependencies
      ● Dozens of language / framework combinations
      ● Hardware dependencies (e.g. CUDA) require substantial architecture investment
      ● New frameworks emerge every year
      ● Frameworks and languages evolve constantly, requiring ongoing maintenance and testing
  22. Composability compounds the challenge
      ● Multiple frameworks
      ● Multiple languages
      ● Multiple teams
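
One common way to make models from different frameworks, languages, and teams composable is to put each behind the same call signature. The sketch below assumes a deliberately simple interface (text in, text out); `compose`, `normalize`, and `redact_digits` are hypothetical stand-ins, not components from the talk.

```python
from typing import Callable

# Assumed common interface: every stage takes a string and returns a string,
# regardless of which framework or team produced it.
Model = Callable[[str], str]


def compose(*stages: Model) -> Model:
    """Chain stages left to right into a single callable with the same interface."""
    def pipeline(payload: str) -> str:
        for stage in stages:
            payload = stage(payload)
        return payload
    return pipeline


def normalize(text: str) -> str:
    # Stand-in for a preprocessing model owned by one team.
    return " ".join(text.lower().split())


def redact_digits(text: str) -> str:
    # Stand-in for a second model owned by another team.
    return "".join("#" if ch.isdigit() else ch for ch in text)


clean = compose(normalize, redact_digits)
print(clean("  Call  555 0100 NOW "))
```

The composed pipeline has the same interface as its parts, so it can itself be reused as a stage in a larger pipeline.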
  23. Diversity complicates auditability and governance
      ● Internal model usage is difficult to track across multi-model pipelines
      ● Auditability and access are major security and compliance concerns
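
Tracking model usage across a multi-model pipeline, as slide 23 describes, comes down to recording which model version was called by whom at every hop. The following is a minimal in-memory sketch; the `audited` decorator, the model names and versions, and the `billing-service` caller are hypothetical, and a real system would write to durable storage instead of a list.

```python
import functools
from typing import Callable

# In-memory audit trail for illustration only.
AUDIT_LOG: list[dict[str, str]] = []


def audited(name: str, version: str) -> Callable:
    """Wrap a model so every call records model name, version, and caller."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(payload, *, caller: str):
            AUDIT_LOG.append({"model": name, "version": version, "caller": caller})
            return fn(payload)
        return wrapper
    return decorator


@audited("tokenizer", "1.2.0")
def tokenize(text: str) -> list[str]:
    return text.split()


@audited("counter", "0.3.1")
def count_tokens(tokens: list[str]) -> int:
    return len(tokens)


def pipeline(text: str, caller: str) -> int:
    # The caller identity is passed through every hop so inner calls stay attributable.
    return count_tokens(tokenize(text, caller=caller), caller=caller)


result = pipeline("models calling models", caller="billing-service")
print(result, AUDIT_LOG)
```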
  24. Lack of reusability slows growth
      ● Teams constantly reinventing the wheel
      ● Models and other assets exist only on laptops or local servers
      ● Multiple languages and frameworks introduce incompatibility
  25. Measuring Model Performance
      ● Success & performance are very context-sensitive
      ● Multiple success factors
      ● No one model is right for every job
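
The point of slide 25, that the right model depends on context, can be sketched as a weighted score over several success factors. Everything here is an assumption made for illustration: the candidate names, the metric values (normalized so that higher is better), and the two weighting profiles.

```python
# Hypothetical candidates; each metric is normalized to 0..1 with higher = better.
CANDIDATES = {
    "large_model": {"accuracy": 0.95, "speed": 0.40, "cost_efficiency": 0.30},
    "small_model": {"accuracy": 0.88, "speed": 0.90, "cost_efficiency": 0.85},
}


def best_model(candidates: dict[str, dict[str, float]], weights: dict[str, float]) -> str:
    """Pick the candidate with the highest weighted sum for one usage context."""
    def score(metrics: dict[str, float]) -> float:
        return sum(weights[factor] * metrics[factor] for factor in weights)
    return max(candidates, key=lambda name: score(candidates[name]))


# Two contexts that weigh the same success factors differently.
offline_audit = {"accuracy": 1.0, "speed": 0.0, "cost_efficiency": 0.0}
realtime_api = {"accuracy": 0.4, "speed": 0.4, "cost_efficiency": 0.2}

print(best_model(CANDIDATES, offline_audit))
print(best_model(CANDIDATES, realtime_api))
```

The same two candidates win in different contexts, which is why a single leaderboard metric rarely settles which model to deploy.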
  26. Considerations for operationalizing ML in the Enterprise
      ● Infrastructure-agnostic deployment
      ● Collaboration & pipelining
      ● Performance SLAs
      ● Regulatory compliance
      ● Governance
      ● Accounting / chargeback tracking
      ● Security / authentication
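
Of the considerations on slide 26, accounting / chargeback tracking is the easiest to show in a few lines: aggregate metered usage per team and price it. This is a sketch with made-up data; the usage records, team and model names, and the per-second rate are all hypothetical.

```python
from collections import defaultdict

# Hypothetical metered usage: (team, model, compute_seconds).
USAGE = [
    ("fraud", "scorer", 12.5),
    ("search", "ranker", 30.0),
    ("fraud", "embedder", 7.5),
]
RATE_PER_COMPUTE_SECOND = 0.002  # assumed internal price, in arbitrary currency units


def chargeback(usage: list[tuple[str, str, float]], rate: float) -> dict[str, float]:
    """Sum compute seconds per team and convert the total to a charge."""
    totals: dict[str, float] = defaultdict(float)
    for team, _model, seconds in usage:
        totals[team] += seconds
    return {team: round(seconds * rate, 6) for team, seconds in totals.items()}


print(chargeback(USAGE, RATE_PER_COMPUTE_SECOND))
```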
  27. Navigate Common Pitfalls
      ● Don’t reinvent the wheel
      ● Outcomes, not process
      ● Don’t try to be perfect
      ● Say no to lock-in
      ● Tools aren’t solutions
      ● Audit honestly, revise constantly
  28. MACHINE LEARNING != PRODUCTION MACHINE LEARNING
      ● Cluster Orchestration
      ● Container Image Management
      ● Load Balancing
      ● Utilizing GPUs
      ● Model Versioning
      ● API Management
      ● Distributed Parallel Processing
      ● Cloud Infrastructure Decisions
  29. Q&A
      Learn More
      ● Request a demo at https://algorithmia.com/demo
      ● Download a whitepaper: https://bit.ly/2HaA9Bg
      ● Contact us for more info: info@algorithmia.com
  30. Further reading & credits:
      ● The Last Defense Against Another AI Winter - Ian Xiao (https://towardsdatascience.com/the-last-defense-against-another-ai-winter-c589b48c561)
      ● Foundations for ML at Scale - Peter Skomoroch
      ● Hidden Technical Debt in Machine Learning Systems - Google (https://pdfs.semanticscholar.org/1eb1/31a34fbb508a9dd8b646950c65901d6f1a5b.pdf?_ga=2.43290021.1000937634.1572418719-1606180446.1572418719)
      ● The Roadmap to Machine Learning Maturity - Algorithmia (https://pdfs.semanticscholar.org/1eb1/31a34fbb508a9dd8b646950c65901d6f1a5b.pdf?_ga=2.43290021.1000937634.1572418719-1606180446.1572418719)

