Helixa
Tech Leader’s Guide to
Effective Building of
Machine Learning
Products
Gianmario Spacagna
Chief Scientist @ Helixa
ML for Enterprises Conference
Rome, 28th October 2019
About Me
7+ years of experience in building Machine Learning products
Currently leading a team of ML Scientists and ML Engineers
Background in Software Engineering of Distributed Systems
MBA Candidate
Co-author of Python Deep Learning
Contributor to the Professional Data Science Manifesto
Blogger at Data Science Vademecum
Founder of the DataScienceMilan.org community
Stockholm, London, Milan
Gianmario Spacagna
Chief Scientist, Helixa
gspacagna@helixa.ai
Agenda
Manager’s guide (40 minutes)
1. Introducing ML in the Enterprises
2. Defining the ML Product Specifications
3. Planning Under Uncertainty
4. Building a balanced ML Team
Tech Leaders’ guide (20 minutes)
5. ML Product Lifecycle
6. Serverless architectures
5. ML Product Lifecycle
Cloud Providers Disclaimer
The following examples focus on the AWS stack, but consider
that other cloud providers offer similar services.
It is not part of this talk to compare different cloud solutions.
* from “The Start-Up Trap” by Robert C. Martin (Uncle Bob)
“The only way to go fast is to go well” cit. Uncle Bob
Overview of a real-world ML production system
Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems
Only a small fraction of real-world ML systems is
composed of the ML Code. The required surrounding
infrastructure is vast and complex.
Overview of an ML component lifecycle
Picture source: https://medium.com/microsoftazure/how-to-accelerate-devops-with-machine-learning-lifecycle-management-2ca4c86387a0
Manage the lifecycle with dedicated platforms
Picture source: www.mlflow.org
Predictions
Model
Serving
Training
Other Machine Learning lifecycle platforms and tools
Picture source: www.mlflow.org
TensorFlow Extended (TFX)
Data Version Control
HopsWorks
Native Cloud Object (Data) Storage
Benefits:
● Cheaper
● Elastic
● Highly available
● Performant
“The benefit of HDFS is minimal and not
worth the operational complexity”
Source: DataBricks
Keep Your Datasets Registered in a Catalog
Manual solution
Production solution: AWS Glue Data Catalog
Manage training labels using Snorkel
www.snorkel.org
Dev tool stack and workflow
Pull
Notebooks and data
stored in S3 in shared
folders
S3 buckets mounted locally
via Alluxio cache for fast and
cheap access to data
Commit
and
push
Dev Unix
Machine
Notebook name matching
branch ID
Header cell:
1. Pull the latest version of the
code and install locally
2. Print the git status and
dependency versions
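A header cell along these lines could record the state of the code and its dependencies at the top of each notebook. This is a minimal sketch under assumptions: the names `git_status` and `environment_report` and the package list are hypothetical, and the pull-and-install step is only described in a comment because the repository layout is not given in the slides.

```python
import subprocess
import sys
from importlib import metadata


def git_status(repo_dir="."):
    """Return the short `git status` output, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "status", "--short", "--branch"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed or repo_dir does not exist
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def environment_report(packages, repo_dir="."):
    """Collect the Python version, git status and installed versions of `packages`."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "git": git_status(repo_dir),
        "packages": versions,
    }


# Step 1 of the header cell (pull the latest code and install it locally) would
# typically be shell commands run before this report; they are omitted here.
report = environment_report(["numpy", "pandas"])  # hypothetical package list
print(report)
```

Printing the report in the first cell keeps the notebook output tied to a specific branch state and set of library versions.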
Develop code on the laptop
using professional IDEs
Feature branches
matching Jira key
Branching models:
● GitFlow
● Trunk-based
Processing large datasets with Elastic MapReduce (EMR)
Picture source: https://dimensionless.in/different-ways-to-manage-apache-spark-applications-on-amazon-emr/
Ephemeral clusters on spot instances can dramatically
reduce the cost of operations compared to long running
ones.
Processing large datasets from notebooks using
and EMR within the same workflow
Picture source:
Port analysis findings into production-quality modules with a
task-oriented design and entry points declared in makefiles
Picture source: https://medium.com/@davidstevens_16424/make-my-day-ta-science-easier-e16bc50e719c
Task:
1. Read
2. Transform
3. Write
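The Read, Transform, Write structure can be sketched as a small task object. This is an illustrative sketch, not the design from the talk: the `Task` class, the helper functions and the CSV columns are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class Task:
    """A task is three pluggable steps executed in order."""
    read: Callable[[], Iterable[Row]]
    transform: Callable[[Iterable[Row]], Iterable[Row]]
    write: Callable[[Iterable[Row]], None]

    def run(self) -> None:
        self.write(self.transform(self.read()))


def read_csv_text(text: str) -> Callable[[], List[Row]]:
    """Build a reader that parses CSV text (stand-in for reading from S3)."""
    return lambda: list(csv.DictReader(io.StringIO(text)))


def add_total(rows: Iterable[Row]) -> List[Row]:
    """Add a `total` column computed from the hypothetical price and qty columns."""
    return [{**row, "total": float(row["price"]) * int(row["qty"])} for row in rows]


sink: List[Row] = []  # stand-in for the output location
task = Task(
    read=read_csv_text("price,qty\n2.5,4\n1.0,3\n"),
    transform=add_total,
    write=sink.extend,
)
task.run()
print(sink)
```

Keeping the three steps separate means the transform can be unit tested without any storage, and a makefile target only needs to call the task's entry point.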
Deliver jobs inside containers whenever possible
Advantages:
● Isolated environment
● Different library requirements
● Different resources (memory, CPUs, GPUs)
● Simplified load balancing
● Scalable model serving
Processing chunks of data in parallel batch jobs
Source: https://spotinst.com/blog/cost-efficient-batch-computing-on-spot-instances-aws-batch-integration/
Containerized job logic
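One way a containerized job could pick its own chunk is by index. The sketch below assumes the job runs as an AWS Batch array job, where each child job receives its index in the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable; the function names and the even-split policy are hypothetical.

```python
import os


def chunk_bounds(n_items, n_chunks, index):
    """Half-open range [start, stop) of chunk `index` when n_items are split as evenly as possible."""
    if not 0 <= index < n_chunks:
        raise ValueError("index must be in [0, n_chunks)")
    base, extra = divmod(n_items, n_chunks)
    # The first `extra` chunks get one additional item each.
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def select_chunk(items, n_chunks, index=None):
    """Return the slice of `items` this job should process."""
    if index is None:
        # Set by AWS Batch for array jobs; defaults to 0 for local runs.
        index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    start, stop = chunk_bounds(len(items), n_chunks, index)
    return items[start:stop]


files = [f"part-{i:03d}.parquet" for i in range(10)]  # hypothetical input files
print(select_chunk(files, n_chunks=3, index=1))
```

Because every job derives its slice from the same deterministic rule, no coordinator is needed to hand out work.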
Orchestrate pools of containers using Kubernetes (K8s) for
inference services
Automated code testing pyramid
Unit tests
● Single methods of data
processing utils and major
components.
● Replace “assertEqual” with
uncertainty ranges on
predictions
70%
Integration tests
● Test the training, model
selection and tuning.
● Subset of component
integrations (e.g.
transformers followed by
model predictions)
20%
End-to-end tests
● Static and small dataset.
● Dry runs of the execution
plan.
● Check APIs work seamlessly
through every stage of the
pipeline.
10%
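Replacing “assertEqual” with uncertainty ranges could look like the following sketch. The toy model `predict_click_rate` and the tolerance of 0.005 are hypothetical; the point is only that the assertion checks a range rather than an exact value.

```python
import unittest


def predict_click_rate(impressions, clicks, prior_clicks=1.0, prior_impressions=20.0):
    """Toy model: click rate smoothed with a small prior (illustrative only)."""
    return (clicks + prior_clicks) / (impressions + prior_impressions)


class PredictionRangeTest(unittest.TestCase):
    def test_prediction_within_tolerance(self):
        # Assert on a range around the expected value instead of exact equality.
        prediction = predict_click_rate(impressions=1000, clicks=50)
        self.assertAlmostEqual(prediction, 0.05, delta=0.005)

    def test_prediction_is_a_probability(self):
        for impressions, clicks in [(0, 0), (10, 10), (1000, 3)]:
            prediction = predict_click_rate(impressions, clicks)
            self.assertTrue(0.0 <= prediction <= 1.0)


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PredictionRangeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` (or running the file through a test runner) executes both checks; the tolerance should reflect how much variation is acceptable after a retrain.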
Bonus: Metamorphic testing makes it possible to test ML algorithms by
generating complex, deep tests without the use of an oracle
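A metamorphic test checks relations between outputs instead of comparing against known answers. The sketch below uses a hypothetical 1-nearest-neighbour classifier and two relations chosen for illustration: shuffling the training set, and translating all points by the same offset, should both leave the prediction unchanged.

```python
import random


def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query`.

    `train` is a list of ((x, y), label); ties are broken by label so the
    result does not depend on the order of the training set.
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(train, key=lambda item: (sq_dist(item[0], query), item[1]))[1]


def check_metamorphic_relations(n_trials=100, seed=0):
    """Return True if both relations hold on randomly generated cases."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        # Integer coordinates keep distances exact, so no float tolerance is needed.
        train = [((rng.randint(-50, 50), rng.randint(-50, 50)), rng.choice("ab"))
                 for _ in range(20)]
        query = (rng.randint(-50, 50), rng.randint(-50, 50))
        baseline = nearest_neighbor_predict(train, query)

        # Relation 1: permuting the training set must not change the prediction.
        shuffled = train[:]
        rng.shuffle(shuffled)
        if nearest_neighbor_predict(shuffled, query) != baseline:
            return False

        # Relation 2: translating training points and query together must not change it.
        dx, dy = rng.randint(-10, 10), rng.randint(-10, 10)
        shifted = [((x + dx, y + dy), label) for (x, y), label in train]
        if nearest_neighbor_predict(shifted, (query[0] + dx, query[1] + dy)) != baseline:
            return False
    return True


print(check_metamorphic_relations())
```

No expected label is ever supplied, which is what makes the approach usable when the correct output is unknown.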
Infrastructure-as-Code (IaC) is fundamental in order to have
fully-portable and consistent replicas of environments
Benefits:
● Reduced labor cost
● Speed of provisioning
● Minimizes errors and security violations
Automate tasks using Continuous Integration
On commit
Deployment tasks:
● Re-training of models
● Model selection
● Hyper-parameters tuning
● Update pipeline components
● Update microservices
● Publish builds and Docker containers
On release
Picture source: https://deploybot.com/blog/the-expert-guide-to-continuous-integration
Release without pain
Source: Spotify Engineering Culture — Part 1
Validate hypothesis and releases with A/B testing
Source: https://www.optimizely.com/optimization-glossary/ab-testing/
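A release comparison can be checked with a two-proportion z-test. This is a minimal sketch assuming a simple conversion-rate experiment; the visitor and conversion counts are made-up numbers for illustration, not results from the talk.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 4000, variant B 260 of 4000.
z, p_value = two_proportion_z_test(200, 4000, 260, 4000)
print(f"z={z:.2f}, p={p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise; the significance threshold and sample size should be fixed before the experiment starts.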
Centralized logging with the ELK stack
1. Generate Logs
2. Aggregation & Transformation
3. Storage & Indexing
4. Visualization & Analysis
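On the log-generation side, emitting one JSON object per line tends to make aggregation and indexing simpler. The sketch below uses only the standard `logging` module; the field names and the logger name `training-job` are hypothetical choices, not a format required by the ELK stack.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream):
    """Create a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()  # stand-in for stdout or a log file picked up by a shipper
log = build_logger("training-job", buffer)
log.info("model trained, auc=%.3f", 0.912)
print(buffer.getvalue())
```

Structured fields such as `level` and `logger` can then be filtered directly in the visualization layer without regular expressions.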
Infrastructure Monitoring and Alerting
Basic Monitoring:
AWS resources and
custom metrics
generated by your
applications and
services
Focus on IT Monitoring:
Cloud-scale monitoring of
logs, metrics and traces
from distributed, dynamic
and hybrid infrastructure.
Focus on App Monitoring:
All-in-one performance
management tool from the
end user experience,
through servers, down to
the line of application
code.
Governance and Auditability
Audit changes in the
configuration of resources.
Track account activity by
recording AWS console actions
and API calls.
Respect the Responsible AI principles
Source: https://ethical.institute/principles.html
Adopt the eXplainableAI Framework
Source: https://ethical.institute/xai.html
The 43 Rules of ML Engineering
Martin Zinkevich
Google Research Scientist
https://developers.google.com/machine-learning/guides/rules-of-ml/
6. Serverless
Serverless, or how to
build and run
applications without
thinking about
servers
In serverless, the cloud provider is responsible for executing a
piece of code by dynamically allocating the resources
Traditional Serverful Way:
Serverless Way:
Source: https://serverless-stack.com/chapters/what-is-serverless.html
Philosophy behind Serverless
"If a tree falls in a forest and no one is
around to hear it, does it make a sound?"
“If a server runs in the cloud and no one
is around to use it, does it need to incur
any costs?”
WinterClouds
Reasons to migrate to Serverless
● Secure
● Scalable
● Cheap
● Always available
● Worry free
● Low maintenance
An overview of Serverless services available in AWS
Docker container
execution.
Script execution in
response to events.
Full list available at https://aws.amazon.com/serverless/
Orchestration of
components and
microservices
Queuing +
publisher/subscriber
message services.
NoSQL Key-Value
database.
REST API
management
service.
Query service to
analyze data at scale
using standard SQL
(like PrestoDB).
ETL service to crawl and
process large datasets on
a fully managed Spark
environment.
Lambda function: listing files in a specified S3 directory
Event object
Result object
Python script
Lambda cost: $1.04 / million requests
S3 LIST request cost: $5 / million requests
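The script on the slide is not reproduced in this transcript, so the following is only a sketch of what such a function might look like. The event fields `bucket` and `prefix` and the shape of the result object are assumptions; the S3 calls use the boto3 `list_objects_v2` paginator.

```python
def list_keys(s3_client, bucket, prefix):
    """Return all object keys under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


def lambda_handler(event, context, s3_client=None):
    """Entry point: expects an event like {"bucket": "...", "prefix": "..."} (assumed shape)."""
    if s3_client is None:
        import boto3  # imported lazily so the module can be tested without AWS access
        s3_client = boto3.client("s3")
    keys = list_keys(s3_client, event["bucket"], event["prefix"])
    return {
        "bucket": event["bucket"],
        "prefix": event["prefix"],
        "count": len(keys),
        "keys": keys,
    }
```

Each page returned by the paginator corresponds to one LIST request of up to 1000 keys, which is what the S3 request cost above applies to.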
Serverless.com application framework
Hybrid solution for:
Orchestrating functions using state machines via Step Functions
Serverless scientific computing and Map/Reduce with PyWren
Pictures source: https://www.slideshare.net/AmazonWebServices/massively-parallel-data-processing-with-pywren-and-aws-lambda-srv424-reinvent-2017
Final Remarks
* from “The Start-Up Trap” by Robert C. Martin (Uncle Bob)
“The only way to go fast is to go well” cit. Uncle Bob
Overview of a real-world ML production system
Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems
Embrace the serverless movement
Read the
Manager’s Guide
(first part)
Gianmario Spacagna
Chief Scientist at Helixa.ai
gspacagna@helixa.ai
@gm_spacagna
Appendix A:
Summary Steps
Steps to Managing the ML Product Lifecycle
1. Familiarize yourself with the whole lifecycle and the most popular tools and libraries.
2. Adopt a platform such as MLflow to track and version models and experiments.
3. Notebooks are good for explorations but the implementation should be in a codebase.
4. Make analysis, code and infrastructure reproducible and avoid manual operations.
5. Communicate analysis results effectively summarizing only what is relevant.
6. Invest in automated tests at different integration levels.
7. Exploit Continuous Integration (CI) for automating builds and releases.
8. Deliver models and components inside Docker containers, when possible.
9. Centralize the logs collection for debugging and troubleshooting.
10. Monitor the infrastructure health using specific tools.
11. Consider a strategy for implementing Governance and Auditability.
Steps to migrate to Serverless architectures
1. Reverse Conway’s law: “Organizations produce software that resembles their
organizational communication structures”.
2. Divide your architecture in separate and simple services.
3. Adopt the serverless.com framework to make it easier to develop lambda functions.
4. Pick the most suitable serverless MapReduce architecture for your needs.
5. Enjoy your team having fun with simplified and scalable deployments.
6. Make a report to your boss showing the consistent amount of saved costs.
Appendix B:
Serverless MapReduce
How can I process
large datasets using
serverless?
Serverless MapReduce with PyWren serializes and runs local
Python code and returns results back to the driver
Pictures source: https://www.slideshare.net/AmazonWebServices/massively-parallel-data-processing-with-pywren-and-aws-lambda-srv424-reinvent-2017
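The driver-side pattern (map a local function over inputs, collect the results, reduce them) can be tried locally before moving to Lambda. The sketch below is a local stand-in using a thread pool, not PyWren itself; PyWren offers a similar executor-style `map`, but the remote execution and serialization are not shown here, and the function names and documents are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Map step: count the words in one document."""
    return len(text.split())


def map_reduce(map_fn, reduce_fn, inputs, max_workers=4):
    """Run map_fn over inputs in parallel, then reduce the collected results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = list(pool.map(map_fn, inputs))  # results return to the driver
    return reduce_fn(mapped)


documents = [
    "serverless map reduce",
    "the driver collects results",
    "lambda",
]
total = map_reduce(word_count, sum, documents)
print(total)
```

Keeping `map_fn` a plain function with no shared state is what makes it a candidate for being shipped to remote workers later.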
Serverless MapReduce with events sourced from S3
Picture source: https://aws.amazon.com/it/blogs/compute/ad-hoc-big-data-processing-made-simple-with-serverless-mapreduce/
Serverless MapReduce with Parallel tasks invoking
synchronously up to 10 concurrent lambdas
* A single Lambda function only supports up to 10 concurrent executions when invoked synchronously
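A Parallel state with one branch per mapper might be generated as below. This is a sketch in the Amazon States Language: the ARNs are placeholders, the state names and the `chunk_index` parameter are hypothetical, and the cap of 10 branches simply follows the note above.

```python
import json


def parallel_mapreduce_definition(mapper_arn, reducer_arn, n_mappers):
    """Build a state machine: a Parallel map phase followed by a single reduce Task."""
    if not 1 <= n_mappers <= 10:
        raise ValueError("n_mappers must be between 1 and 10")
    branches = [
        {
            "StartAt": f"Mapper{i}",
            "States": {
                f"Mapper{i}": {
                    "Type": "Task",
                    "Resource": mapper_arn,
                    "Parameters": {"chunk_index": i},  # tells the mapper which chunk to read
                    "End": True,
                }
            },
        }
        for i in range(n_mappers)
    ]
    return {
        "Comment": "Illustrative MapReduce with a Parallel state",
        "StartAt": "MapPhase",
        "States": {
            "MapPhase": {"Type": "Parallel", "Branches": branches, "Next": "Reduce"},
            "Reduce": {"Type": "Task", "Resource": reducer_arn, "End": True},
        },
    }


definition = parallel_mapreduce_definition(
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:mapper",   # placeholder ARN
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:reducer",  # placeholder ARN
    n_mappers=3,
)
print(json.dumps(definition, indent=2))
```

The Parallel state waits for all branches and passes their outputs as a list to the reduce task.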
Serverless MapReduce with queue polling invoking
asynchronously many concurrent lambdas within AWS limits*
[Diagram: Driver, SQS queue (“Poll the queue”), Mapper 1, Mapper 2, ..., Mapper n]
* StepFunctions has a limit of 1000 transitions/second and a max execution history size of 25k events.
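The queue-polling pattern can be simulated locally. The sketch below is one reading of the diagram, under assumptions: the driver starts every mapper asynchronously, each mapper reports its result to a queue, and the driver polls the queue until all mappers have reported. Threads and `queue.Queue` stand in for Lambda invocations and SQS; all names are hypothetical.

```python
import queue
import threading


def run_with_queue_polling(chunks, map_fn, reduce_fn, poll_timeout=5.0):
    """Start one mapper per chunk, then poll a queue until every mapper has reported."""
    done = queue.Queue()  # stand-in for the SQS queue

    def mapper(index, chunk):
        done.put((index, map_fn(chunk)))  # report completion with the result

    # Fire-and-forget start of all mappers (stand-in for asynchronous invocation).
    for index, chunk in enumerate(chunks):
        threading.Thread(target=mapper, args=(index, chunk), daemon=True).start()

    results = {}
    while len(results) < len(chunks):
        # Raises queue.Empty if no mapper reports within poll_timeout seconds.
        index, value = done.get(timeout=poll_timeout)
        results[index] = value
    return reduce_fn(results[i] for i in range(len(chunks)))


total = run_with_queue_polling([[1, 2, 3], [4, 5], [6]], map_fn=sum, reduce_fn=sum)
print(total)
```

Because the driver only counts completion messages, the number of concurrent mappers is bounded by account limits rather than by the orchestration step.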
Serverless MapReduce with activity callbacks invoking unlimited
parallel executions
Source: https://semantive.com/part-2-asynchronous-actions-within-aws-step-functions-without-servers/
[Diagram: driver, mapper 1, ..., mapper n]
● Get activity token and wait for mapper activity to complete
● Start mapper activity asynchronously with the corresponding token
● Send activity task success

More Related Content

What's hot

Google Cloud GenAI Overview_071223.pptx
Google Cloud GenAI Overview_071223.pptxGoogle Cloud GenAI Overview_071223.pptx
Google Cloud GenAI Overview_071223.pptxVishPothapu
 
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬VINCI Digital - Industrial IoT (IIoT) Strategic Advisory
 
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...DataScienceConferenc1
 
ChatGPT and not only: how can you use the power of Generative AI at scale
ChatGPT and not only: how can you use the power of Generative AI at scaleChatGPT and not only: how can you use the power of Generative AI at scale
ChatGPT and not only: how can you use the power of Generative AI at scaleMaxim Salnikov
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfPremNaraindas1
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersIvo Andreev
 
Generative AI: Past, Present, and Future – A Practitioner's Perspective
Generative AI: Past, Present, and Future – A Practitioner's PerspectiveGenerative AI: Past, Present, and Future – A Practitioner's Perspective
Generative AI: Past, Present, and Future – A Practitioner's PerspectiveHuahai Yang
 
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures
 
apidays Paris 2022 - The next five years of the API Economy, Paolo Malinverno...
apidays Paris 2022 - The next five years of the API Economy, Paolo Malinverno...apidays Paris 2022 - The next five years of the API Economy, Paolo Malinverno...
apidays Paris 2022 - The next five years of the API Economy, Paolo Malinverno...apidays
 
Automated Testing with Logic Apps and Specflow
Automated Testing with Logic Apps and SpecflowAutomated Testing with Logic Apps and Specflow
Automated Testing with Logic Apps and SpecflowBizTalk360
 
Conversational AI and Chatbot Integrations
Conversational AI and Chatbot IntegrationsConversational AI and Chatbot Integrations
Conversational AI and Chatbot IntegrationsCristina Vidu
 
Building AI Product using AI Product Thinking
Building AI Product using AI Product Thinking Building AI Product using AI Product Thinking
Building AI Product using AI Product Thinking Saurabh Kaushik
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesDianaGray10
 
Torry Harris API and Application Integration Governance Framework
Torry Harris API and Application Integration Governance FrameworkTorry Harris API and Application Integration Governance Framework
Torry Harris API and Application Integration Governance FrameworkShubaS4
 
What is a Citizen Developer? How Can You Harness the Power of Citizen Develop...
What is a Citizen Developer? How Can You Harness the Power of Citizen Develop...What is a Citizen Developer? How Can You Harness the Power of Citizen Develop...
What is a Citizen Developer? How Can You Harness the Power of Citizen Develop...Maruti Techlabs
 
apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...
apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...
apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...apidays
 
Responsible Generative AI
Responsible Generative AIResponsible Generative AI
Responsible Generative AICMassociates
 
Global Azure Bootcamp Pune 2023 - Lead the AI era with Microsoft Azure.pdf
Global Azure Bootcamp Pune 2023 -  Lead the AI era with Microsoft Azure.pdfGlobal Azure Bootcamp Pune 2023 -  Lead the AI era with Microsoft Azure.pdf
Global Azure Bootcamp Pune 2023 - Lead the AI era with Microsoft Azure.pdfAroh Shukla
 
"Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A...
"Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A..."Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A...
"Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A...ISPMAIndia
 

What's hot (20)

Google Cloud GenAI Overview_071223.pptx
Google Cloud GenAI Overview_071223.pptxGoogle Cloud GenAI Overview_071223.pptx
Google Cloud GenAI Overview_071223.pptx
 
Charles Caldwell - Improve Your Life with AI.pdf
Charles Caldwell - Improve Your Life with AI.pdfCharles Caldwell - Improve Your Life with AI.pdf
Charles Caldwell - Improve Your Life with AI.pdf
 
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
 
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
 
ChatGPT and not only: how can you use the power of Generative AI at scale
ChatGPT and not only: how can you use the power of Generative AI at scaleChatGPT and not only: how can you use the power of Generative AI at scale
ChatGPT and not only: how can you use the power of Generative AI at scale
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
 
Generative AI: Past, Present, and Future – A Practitioner's Perspective
Generative AI: Past, Present, and Future – A Practitioner's PerspectiveGenerative AI: Past, Present, and Future – A Practitioner's Perspective
Generative AI: Past, Present, and Future – A Practitioner's Perspective
 
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AI
 
apidays Paris 2022 - The next five years of the API Economy, Paolo Malinverno...
apidays Paris 2022 - The next five years of the API Economy, Paolo Malinverno...apidays Paris 2022 - The next five years of the API Economy, Paolo Malinverno...
apidays Paris 2022 - The next five years of the API Economy, Paolo Malinverno...
 
Automated Testing with Logic Apps and Specflow
Automated Testing with Logic Apps and SpecflowAutomated Testing with Logic Apps and Specflow
Automated Testing with Logic Apps and Specflow
 
Conversational AI and Chatbot Integrations
Conversational AI and Chatbot IntegrationsConversational AI and Chatbot Integrations
Conversational AI and Chatbot Integrations
 
Building AI Product using AI Product Thinking
Building AI Product using AI Product Thinking Building AI Product using AI Product Thinking
Building AI Product using AI Product Thinking
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practices
 
Torry Harris API and Application Integration Governance Framework
Torry Harris API and Application Integration Governance FrameworkTorry Harris API and Application Integration Governance Framework
Torry Harris API and Application Integration Governance Framework
 
What is a Citizen Developer? How Can You Harness the Power of Citizen Develop...
What is a Citizen Developer? How Can You Harness the Power of Citizen Develop...What is a Citizen Developer? How Can You Harness the Power of Citizen Develop...
What is a Citizen Developer? How Can You Harness the Power of Citizen Develop...
 
apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...
apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...
apidays Paris 2022 - Agile API delivery with Feature Toggles, Rafik Ferroukh,...
 
Responsible Generative AI
Responsible Generative AIResponsible Generative AI
Responsible Generative AI
 
Global Azure Bootcamp Pune 2023 - Lead the AI era with Microsoft Azure.pdf
Global Azure Bootcamp Pune 2023 -  Lead the AI era with Microsoft Azure.pdfGlobal Azure Bootcamp Pune 2023 -  Lead the AI era with Microsoft Azure.pdf
Global Azure Bootcamp Pune 2023 - Lead the AI era with Microsoft Azure.pdf
 
"Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A...
"Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A..."Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A...
"Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A...
 

Similar to Tech leaders guide to effective building of machine learning products

Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsMárton Kodok
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsMárton Kodok
 
Onion Architecture with S#arp
Onion Architecture with S#arpOnion Architecture with S#arp
Onion Architecture with S#arpGary Pedretti
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Tushar Katarki
 
Legion - AI Runtime Platform
Legion -  AI Runtime PlatformLegion -  AI Runtime Platform
Legion - AI Runtime PlatformAlexey Kharlamov
 
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdfcNguyn506241
 
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...CARLOS III UNIVERSITY OF MADRID
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with DatabricksLiangjun Jiang
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricksLiangjun Jiang
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
Lunch and learn as3_frameworks
Lunch and learn as3_frameworksLunch and learn as3_frameworks
Lunch and learn as3_frameworksYuri Visser
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionFabian Hadiji
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixBill Liu
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowKaxil Naik
 
Object oriented software_engg
Object oriented software_enggObject oriented software_engg
Object oriented software_enggAnnie Thomas
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...Abhinav Joshi
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesDatabricks
 

Similar to Tech leaders guide to effective building of machine learning products (20)

Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
 
Onion Architecture with S#arp
Onion Architecture with S#arpOnion Architecture with S#arp
Onion Architecture with S#arp
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes
 
Legion - AI Runtime Platform
Legion -  AI Runtime PlatformLegion -  AI Runtime Platform
Legion - AI Runtime Platform
 
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
 
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
Lunch and learn as3_frameworks
Lunch and learn as3_frameworksLunch and learn as3_frameworks
Lunch and learn as3_frameworks
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to production
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at Netflix
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 
Object oriented software_engg
Object oriented software_enggObject oriented software_engg
Object oriented software_engg
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 

More from Gianmario Spacagna

Latent Panelists Affinities: a Helixa case study
Latent Panelists Affinities: a Helixa case studyLatent Panelists Affinities: a Helixa case study
Latent Panelists Affinities: a Helixa case studyGianmario Spacagna
 
Anomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-EncodersAnomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-EncodersGianmario Spacagna
 
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...Gianmario Spacagna
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupGianmario Spacagna
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Gianmario Spacagna
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalGianmario Spacagna
 

Tech leaders guide to effective building of machine learning products

  • 1. Helixa Tech Leader’s Guide to Effective Building of Machine Learning Products Gianmario Spacagna Chief Scientist @ Helixa ML for Enterprises Conference Rome, 28th October 2019
  • 2. About Me 7+ years experience in building Machine Learning products Currently leading a team of ML Scientists and ML Engineers Background in Software Engineering of Distributed Systems MBA Candidate Co-author of Python Deep Learning Contributor of the Professional Data Science Manifesto Blogger of Data Science Vademecum Founder of the DataScienceMilan.org community Stockholm, London, Milan Gianmario Spacagna Chief Scientist, Helixa gspacagna@helixa.ai
  • 3. Agenda Manager’s guide (40 minutes) 1. Introducing ML in the Enterprises 2. Defining the ML Product Specifications 3. Planning Under Uncertainty 4. Building a balanced ML Team Tech Leaders’ guide (20 minutes) 5. ML Product Lifecycle 6. Serverless architectures
  • 4. 5. ML Product Lifecycle
  • 5. Cloud Providers Disclaimer The following examples focus on the AWS stack, but other cloud providers offer similar services. Comparing different cloud solutions is outside the scope of this talk.
  • 6. * from “The Start-Up Trap” by Robert C. Martin (Uncle Bob) “The only way to go fast is to go well” cit. Uncle Bob
  • 7. Overview of a real-world ML production system Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems Only a small fraction of real-world ML systems is composed of the ML Code. The required surrounding infrastructure is vast and complex.
  • 8. Overview of an ML component lifecycle Picture source: https://medium.com/microsoftazure/how-to-accelerate-devops-with-machine-learning-lifecycle-management-2ca4c86387a0
  • 9. Manage the lifecycle with dedicated platforms Picture source: www.mlflow.org Predictions Model Serving Training
  • 10. Other Machine Learning lifecycle platforms and tools Picture source: www.mlflow.org TensorFlow Extended (TFX) Data Version Control HopsWorks
  • 11. Native Cloud Object (Data) Storage Benefits: ● Cheaper ● Elastic ● Highly available ● Performant “The benefit of HDFS is minimal and not worth the operational complexity” Source: DataBricks
  • 12. Keep Your Datasets Registered in a Catalog Production solution: AWS Glue Data Catalog Manual solution
  • 13. Manage training labels using Snorkel www.snorkel.org
  • 14. Dev tool stack and workflow Pull Notebooks and data stored in S3 in shared folders S3 buckets mounted locally via Alluxio cache for fast and cheap access to data Commit and push Dev Unix Machine Notebook name matching branch ID Header cell: 1. Pull the latest version of the code and install locally 2. Print the git status and dependency versions Develop code on the laptop using professional IDEs Feature branches matching Jira key Branching models: ● GitFlow ● Trunk-based
  • 15. Processing large datasets with Elastic MapReduce (EMR) Picture source: https://dimensionless.in/different-ways-to-manage-apache-spark-applications-on-amazon-emr/ Ephemeral clusters on spot instances can dramatically reduce the cost of operations compared to long running ones.
  • 16. Processing large datasets from notebooks using and EMR within the same workflow Picture source:
  • 17. Port analysis findings into production-quality modules with a task-oriented design and entry points declared in makefiles Picture source: https://medium.com/@davidstevens_16424/make-my-day-ta-science-easier-e16bc50e719c Task: 1. Read 2. Transform 3. Write
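The Read, Transform, Write task shape from slide 17 could look roughly like the sketch below. It is only an illustration under stated assumptions: the CSV format, the `id` and `value` column names, and the function names are hypothetical and not taken from the talk. A makefile target could then call a small command-line wrapper around `run_task`.

```python
import csv


def read(source):
    """Read step: parse CSV rows from a file-like object."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform step (hypothetical rule): drop rows with an empty
    'value' field and cast the remaining values to float."""
    return [
        {"id": row["id"], "value": float(row["value"])}
        for row in rows
        if row["value"] != ""
    ]


def write(rows, sink):
    """Write step: serialize the rows as CSV to a file-like object."""
    writer = csv.DictWriter(sink, fieldnames=["id", "value"])
    writer.writeheader()
    writer.writerows(rows)


def run_task(source, sink):
    """Single entry point that chains the three steps of the task."""
    write(transform(read(source)), sink)
```

Keeping each step as a plain function makes it possible to unit test the transform logic on in-memory data, independently of where the data is stored.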
  • 18. Deliver jobs inside containers whenever possible Advantages: ● Isolated environment ● Different library requirements ● Different resources (memory, CPUs, GPUs) ● Simplified load balancing ● Scalable model serving
  • 19. Processing chunks of data in parallel batch jobs Source: https://spotinst.com/blog/cost-efficient-batch-computing-on-spot-instances-aws-batch-integration/ Containerized job logic
  • 20. Orchestrate pools of containers using Kubernetes (K8s) for inference services
  • 21. Automated code testing pyramid Unit tests (70%) ● Single methods of data processing utils and major components. ● Replace “assertEqual” with uncertainty ranges on predictions. Integration tests (20%) ● Test the training, model selection and tuning. ● Subset of component integrations (e.g. transformers followed by model predictions). End-to-end tests (10%) ● Static and small dataset. ● Dry runs of the execution plan. ● Check APIs work seamlessly through every stage of the pipeline.
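The unit-test advice on slide 21 (uncertainty ranges instead of exact equality) might be sketched as below. This is a minimal illustration, not the speaker's code: `predict_price` is a hypothetical stand-in for a trained model, and the 5% tolerance and the sanity bounds are arbitrary assumptions.

```python
import math
import unittest


def predict_price(square_meters: float) -> float:
    """Hypothetical stand-in for a trained model's predict function."""
    return 1200.0 * square_meters + 15000.0


class PredictionRangeTest(unittest.TestCase):
    def test_prediction_within_tolerance(self):
        prediction = predict_price(80.0)
        expected = 110000.0
        # Accept any prediction within 5% of the reference value
        # instead of requiring exact equality.
        self.assertTrue(math.isclose(prediction, expected, rel_tol=0.05))

    def test_prediction_in_plausible_range(self):
        # A coarse sanity range also catches sign or unit errors.
        self.assertGreater(predict_price(80.0), 0.0)
        self.assertLess(predict_price(80.0), 1_000_000.0)
```

A relative tolerance keeps the test stable when the model is re-trained and its outputs shift slightly, while still failing on large regressions.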
  • 22. Bonus: Metamorphic testing makes it possible to test ML algorithms by generating complex, deep tests without the use of an oracle
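As an illustration of slide 22, the sketch below checks two metamorphic relations on a toy 1-nearest-neighbour classifier: predictions should not change when the training rows are shuffled, nor when every feature is scaled by the same positive factor. The classifier, the data and the chosen relations are assumptions made for this example, and the checks assume the data contains no exact distance ties.

```python
import random


def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))

    _, label = min(train, key=lambda row: squared_distance(row[0]))
    return label


def check_metamorphic_relations(train, queries, seed=0):
    """Check two relations that must hold without knowing the true labels."""
    baseline = [nearest_neighbor_predict(train, q) for q in queries]

    # Relation 1: shuffling the training rows must not change predictions.
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    permuted = [nearest_neighbor_predict(shuffled, q) for q in queries]

    # Relation 2: scaling every feature by the same positive factor
    # preserves the distance ordering, so predictions must not change.
    scale = 3.0
    scaled_train = [([x * scale for x in f], y) for f, y in train]
    scaled_queries = [[x * scale for x in q] for q in queries]
    scaled = [nearest_neighbor_predict(scaled_train, q) for q in scaled_queries]

    return baseline == permuted and baseline == scaled


# Hypothetical toy data: two well-separated clusters.
TRAIN = [([0.0, 0.0], "a"), ([1.0, 1.0], "a"), ([5.0, 5.0], "b"), ([6.0, 5.0], "b")]
QUERIES = [[0.5, 0.2], [5.4, 5.1], [2.0, 2.0]]
```

Neither relation needs the correct label for any query, which is the sense in which no oracle is required.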
  • 23. Infrastructure-as-Code (IaC) is fundamental in order to have fully-portable and consistent replicas of environments Benefits: ● Reduced labor cost ● Speed of provisioning ● Minimizes errors and security violations
  • 24. Automate tasks using Continuous Integration On commit Deployment tasks: ● Re-training of models ● Model selection ● Hyper-parameters tuning ● Update pipeline components ● Update microservices ● Publish builds and Docker containers On release Picture source: https://deploybot.com/blog/the-expert-guide-to-continuous-integration
  • 25. Release without pain Source: Spotify Engineering Culture — Part 1
  • 26. Validate hypotheses and releases with A/B testing Source: https://www.optimizely.com/optimization-glossary/ab-testing/
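One common way to evaluate an A/B test such as the one on slide 26 is a two-proportion z-test on conversion rates. The sketch below is an assumption about how the comparison could be done; the talk does not prescribe a specific statistical test, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates
    between variant A (conv_a successes out of n_a) and variant B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value suggests the observed difference between the variants is unlikely under equal conversion rates. The significance threshold and the sample sizes should be fixed before the experiment starts.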
  • 27. Centralized logging with the ELK stack Generate Logs Aggregation & Transformation Storage & Indexing Visualization & Analysis
  • 28. Infrastructure Monitoring and Alerting Basic Monitoring: AWS resources and custom metrics generated by your applications and services Focus on IT Monitoring: Cloud-scale monitoring of logs, metrics and traces from distributed, dynamic and hybrid infrastructure. Focus on App Monitoring: All-in-one performance management tool from the end user experience, through servers, down to the line of application code.
  • 29. Governance and Auditability Audit changes in the configuration of resources. Track account activity by recording AWS console actions and API calls.
  • 30. Respect the Responsible AI principles Source: https://ethical.institute/principles.html
  • 31. Adopt the eXplainableAI Framework Source: https://ethical.institute/xai.html
  • 32. The 43 Rules of ML Engineering Martin Zinkevich Google Research Scientist https://developers.google.com/machine-learning/guides/rules-of-ml/
  • 34. Serverless, or how to build and run applications without thinking about servers
  • 35. In serverless, the cloud provider is responsible for executing a piece of code by dynamically allocating the resources Traditional Serverful Way: Serverless Way: Source: https://serverless-stack.com/chapters/what-is-serverless.html
  • 36. Philosophy behind Serverless "If a tree falls in a forest and no one is around to hear it, does it make a sound?" “If a server runs in the cloud and no one is around to use it, does it need to incur any costs?” WinterClouds
  • 37. Reasons to migrate to Serverless Secure Scalable Cheap Always available Worry free Low maintenance
  • 38. An overview of Serverless services available in AWS Docker container execution. Script execution in response to events. Full list available at https://aws.amazon.com/serverless/ Orchestration of components and microservices. Queuing + publisher/subscriber message services. NoSQL Key-Value database. REST API management service. Query service to analyze data at scale using standard SQL (like PrestoDB). ETL service to crawl and process large datasets on a fully managed Spark environment.
  • 39. Lambda function: listing files in a specified S3 directory Event object Result object Python script Lambda cost: $1.04 / million requests S3 LIST request cost: $5 / million requests
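The script shown on slide 39 is not included in the transcript, so the following is only a sketch of what such a Lambda function could look like. The event fields `bucket` and `prefix`, the shape of the result object, and the optional `s3_client` parameter (added so the handler can be tested without AWS access) are all assumptions. The boto3 calls used are `get_paginator("list_objects_v2")` and `paginate(Bucket=..., Prefix=...)`.

```python
def list_s3_keys(s3_client, bucket, prefix):
    """Collect all object keys under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


def lambda_handler(event, context, s3_client=None):
    """Lambda entry point; `event` is assumed to carry 'bucket' and 'prefix'."""
    if s3_client is None:
        import boto3  # imported lazily so the function is testable offline

        s3_client = boto3.client("s3")
    keys = list_s3_keys(s3_client, event["bucket"], event["prefix"])
    return {"count": len(keys), "keys": keys}
```

Each page returned by the paginator corresponds to one LIST request of up to 1,000 keys, so the S3 part of the cost grows with the number of pages rather than with the number of Lambda invocations.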
  • 41. Orchestrating functions using state machines via Step Functions
  • 42. Serverless scientific computing and Map/Reduce with PyWren Pictures source: https://www.slideshare.net/AmazonWebServices/massively-parallel-data-processing-with-pywren-and-aws-lambda-srv424-reinvent-2017
  • 44. * from “The Start-Up Trap” by Robert C. Martin (Uncle Bob) “The only way to go fast is to go well” cit. Uncle Bob
  • 45. Overview of a real-world ML production system Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems
  • 48. Gianmario Spacagna Chief Scientist at Helixa.ai gspacagna@helixa.ai @gm_spacagna
  • 50. Steps to Managing the ML Product Lifecycle 1. Familiarize yourself with the whole lifecycle and the most popular tools and libraries. 2. Adopt a platform such as MLflow to track and version models and experiments. 3. Notebooks are good for exploration, but the implementation should live in a codebase. 4. Make analysis, code and infrastructure reproducible, and avoid manual operations. 5. Communicate analysis results effectively, summarizing only what is relevant. 6. Invest in automated tests at different integration levels. 7. Exploit Continuous Integration (CI) for automating builds and releases. 8. Deliver models and components inside Docker containers, when possible. 9. Centralize the logs collection for debugging and troubleshooting. 10. Monitor the infrastructure health using specific tools. 11. Consider a strategy for implementing Governance and Auditability.
  • 51. Steps to migrate to Serverless architectures 1. Reverse Conway’s law: “Organizations produce software that resembles their organizational communication structures”. 2. Divide your architecture into separate and simple services. 3. Adopt the serverless.com framework to make it easier to develop Lambda functions. 4. Pick the most suitable serverless MapReduce architecture for your needs. 5. Enjoy your team having fun with simplified and scalable deployments. 6. Make a report to your boss showing the substantial cost savings.
  • 53. How can I process large datasets using serverless?
  • 54. Serverless MapReduce with PyWren serializes and runs local Python code and returns the results back to the driver Pictures source: https://www.slideshare.net/AmazonWebServices/massively-parallel-data-processing-with-pywren-and-aws-lambda-srv424-reinvent-2017
  • 55. Serverless MapReduce with events sourced from S3 Picture source: https://aws.amazon.com/it/blogs/compute/ad-hoc-big-data-processing-made-simple-with-serverless-mapreduce/
  • 56. Serverless MapReduce with Parallel tasks invoking synchronously up to 10 concurrent lambdas * A single Lambda function only supports up to 10 concurrent executions when invoked synchronously
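The bounded fan-out described on slide 56 can be imitated locally to show the control flow. The sketch below is a stand-in only: it uses threads instead of Lambda invocations and a plain function instead of a Step Functions state machine, and the word-count mapper, the reducer and the constant name are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Mirrors the concurrency limit quoted on the slide (assumption for this sketch).
MAX_CONCURRENCY = 10


def mapper(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.split())


def reducer(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right


def map_reduce(chunks):
    """Run the mappers with at most MAX_CONCURRENCY in flight, then reduce."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        partials = list(pool.map(mapper, chunks))
    return reduce(reducer, partials, Counter())
```

The driver blocks until every mapper has returned, which matches the synchronous invocation style of this variant. The queue-polling and activity-callback variants on the following slides trade that simplicity for higher parallelism.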
  • 57. Serverless MapReduce with queue polling invoking asynchronously many concurrent lambdas within AWS limits* ... Mapper2 Mapper1 Mapper n SQS queue Poll the queue Driver * StepFunctions has a limit of 1000 transitions/second and a max execution history size of 25k events.
  • 58. Serverless MapReduce with activity callbacks invoking unlimited parallel executions Source: https://semantive.com/part-2-asynchronous-actions-within-aws-step-functions-without-servers/ ... ... mapper1 mapper n Get activity token and wait for mapper activity to complete Start mapper activity asynchronously with the corresponding token Send activity task success driver