SlideShare a Scribd company logo
1 of 24
Download to read offline
End-to-End Machine Learning
Pipeline with Docker Enterprise and
Kubeflow
Categorizing Docker
Hub Public Images
Roberto Hashioka / @rhashioka
Software Engineer, Docker
Amn Rahman / @amnrahman
Data Engineer, Docker
●Problem Statement

●Architecture and Workflow

●Docker Enterprise features 

●Demo

●Challenges and Next Steps
Agenda
● Current state:

○ More than 4.6M unlabeled public repositories on Hub 

○ Inability to search Hub images based on categories

○ Limited user experience due to lack of context (e.g. backend-
frameworks->databases) 

● Proposed Solution:

○ Automate the Docker Hub image categorization (send
description -> return suggested categories/topics)

Use Case - Docker Hub
● How to reliably deploy models created by data scientists to
production? 

● How to facilitate the creation and maintenance of end-to-end ML
pipelines?

● How to automate the ML pipeline workflows?

Problem Definition
Demonstrate the use of Kubeflow on Docker
Enterprise to train and deploy an ML model.
Project / Solution Scope
Personas
DEVOPS
ENGINEERS
• Provide and maintain
infrastructure services for
other teams

• Monitoring & Configuration
management

• Deployment Automation &
Security
DATA
ENGINEERS
• Develop, test and maintain
data pipelines

• Improve data reliability,
efficiency and quality

• Discover opportunities for
data acquisition

DATA
SCIENTISTS
• Answer industry and business
questions

• Prepare data for use in
predictive modeling

• Building ML models to
improve existing product
workflows and UX

DATAOPS ENGINEERS
The Kubeflow project is dedicated to making deployments of
machine learning (ML) workflows on Kubernetes simple, portable
and scalable. 

Source: github.com/kubeflow/kubeflow
What is Kubeflow?
Architecture
Kubeflow Prometheus
TFJob
Grafana Jupyter Hub
Tensorflow
Hub
Kubernetes +
Docker Engine
ENTERPRISE PLATFORM
Cloud VM
Bare
Metal
Docker for
Desktops
Notebook 1
Notebook 2
….
Model 1
Model 2
….
Seldon
ArgoAmbassadorKatib
Seldon-Core Architecture
Kubernetes API
Docker Trusted Registry
ENTERPRISE PLATFORM
Ambassador
(API Gateway)
Argo Job (CI/CD)
Data scientists
and engineers Operator
Service
Orchestrator
1.N deployment
graphs
Service
Orchestrator
Service
Orchestrator
Model
1..N
REST or gRPC
Business
Applications
Production Environments
Docker Trusted Registry
Docker EE
Production Environments
Version Control
Non-Production EnvironmentsDeveloper Machine
Development CI/CD Customers
Datacenter 1
Datacenter 2
Docker for
Development Process
Docker EE
Docker Content Trust
$ docker trust sign dev/dockerhubclassifier:v3
Signing and pushing trust metadata for dev/dockerhubclassifier:v3

The push refers to a repository [docker.io/dev/dockerhubclassifier:v3]
...

v1: digest: sha256:74d4bfa917d55d53c7df3d2ab20a8d926874d61c3da5ef6de15dd2654fc467c4 size: 1357

Signing and pushing trust metadata

Enter passphrase for delegation key with ID 27d42a8:

Successfully signed dev/dockerhubclassifier:v3
Image Scanning
Image Promotion
Image Promotion
Data Collection Exploratory Data Analysis
Data Cleaning and
Transformation
Model Building
Deploying Model to
Production
21 3
4 5
Collected data from other sources on
the internet with tagged content and
joined with Hub data
Pandas data frames, distribution of
categories, observing correlations etc.
using several Python packages 

Splitting, training, predicting and tuning.
Reducing categories, stop word
removal, encoding labels, vectorization
and general transformations to feed into
classifiers.

Serving, monitoring and logging at scale
Model Selection /
Walkthrough of Jupyter
Notebook
DEMO
Kubeflow: Composability, Portability and Scalability
Kubeflow: Composability, Portability and Scalability
Argo Workflow on Docker for Desktop
and Docker Enterprise

Model performance Monitoring
DEMO
Repo owner Mike
Mike decides to be a good person
and labels his repo
With ML, Mike can see the most
relevant categories according the
repo’s content
and proceeds to choose the
appropriate labels
- Simpler workflow with Kubeflow Pipelines 

- Stepping stone for future ML projects 

- Canary deployments and A/B Testing with Seldon +
Istio
Challenges / Next Steps
● End-to-End Machine Learning Pipeline with Docker for Desktop and Kubeflow
○ https://github.com/dockersamples/docker-hub-ml-project
Contributions to the Community
Docker for Desktop Kubeflow
- https://github.com/kubeflow/kubeflow

- https://www.docker.com/products/docker-desktop

- https://github.com/dockersamples/docker-hub-ml-project
Resources
THANK YOU!

More Related Content

What's hot

Architecting for Continuous Delivery
Architecting for Continuous DeliveryArchitecting for Continuous Delivery
Architecting for Continuous DeliveryMohammad Bilal Wahla
 
17 Things Developers Should Know About Databases
17 Things Developers Should Know About Databases17 Things Developers Should Know About Databases
17 Things Developers Should Know About DatabasesAll Things Open
 
What's new in python 3.8? | Python 3.8 New Features | Edureka
What's new in python 3.8? | Python 3.8 New Features | EdurekaWhat's new in python 3.8? | Python 3.8 New Features | Edureka
What's new in python 3.8? | Python 3.8 New Features | EdurekaEdureka!
 
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Animesh Singh
 
Red Hat OpenShift on Bare Metal and Containerized Storage
Red Hat OpenShift on Bare Metal and Containerized StorageRed Hat OpenShift on Bare Metal and Containerized Storage
Red Hat OpenShift on Bare Metal and Containerized StorageGreg Hoelzer
 
Kubernetes for java developers
Kubernetes for java developersKubernetes for java developers
Kubernetes for java developersSandro Giacomozzi
 
Introduction to openshift
Introduction to openshiftIntroduction to openshift
Introduction to openshiftMamathaBusi
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introductionkanedafromparis
 
Open shift and docker - october,2014
Open shift and docker - october,2014Open shift and docker - october,2014
Open shift and docker - october,2014Hojoong Kim
 
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...NETWAYS
 
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...Yong Feng
 
CNCF Live Webinar: Kubernetes 1.23
CNCF Live Webinar: Kubernetes 1.23CNCF Live Webinar: Kubernetes 1.23
CNCF Live Webinar: Kubernetes 1.23LibbySchulze
 
Using Containers to More Effectively Manage DevOps Continuous Integration
Using Containers to More Effectively Manage DevOps Continuous IntegrationUsing Containers to More Effectively Manage DevOps Continuous Integration
Using Containers to More Effectively Manage DevOps Continuous IntegrationCognizant
 
Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 reviewManageIQ
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherNETWAYS
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataBlueData, Inc.
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Animesh Singh
 
OpenShift Overview - Red Hat Open House 2017
OpenShift Overview - Red Hat Open House 2017OpenShift Overview - Red Hat Open House 2017
OpenShift Overview - Red Hat Open House 2017Rodolfo Carvalho
 
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4Robert Bohne
 

What's hot (20)

Architecting for Continuous Delivery
Architecting for Continuous DeliveryArchitecting for Continuous Delivery
Architecting for Continuous Delivery
 
17 Things Developers Should Know About Databases
17 Things Developers Should Know About Databases17 Things Developers Should Know About Databases
17 Things Developers Should Know About Databases
 
What's new in python 3.8? | Python 3.8 New Features | Edureka
What's new in python 3.8? | Python 3.8 New Features | EdurekaWhat's new in python 3.8? | Python 3.8 New Features | Edureka
What's new in python 3.8? | Python 3.8 New Features | Edureka
 
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
 
Red Hat OpenShift on Bare Metal and Containerized Storage
Red Hat OpenShift on Bare Metal and Containerized StorageRed Hat OpenShift on Bare Metal and Containerized Storage
Red Hat OpenShift on Bare Metal and Containerized Storage
 
Kubernetes for java developers
Kubernetes for java developersKubernetes for java developers
Kubernetes for java developers
 
Introduction to openshift
Introduction to openshiftIntroduction to openshift
Introduction to openshift
 
FluentD vs. Logstash
FluentD vs. LogstashFluentD vs. Logstash
FluentD vs. Logstash
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
Open shift and docker - october,2014
Open shift and docker - october,2014Open shift and docker - october,2014
Open shift and docker - october,2014
 
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
 
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
 
CNCF Live Webinar: Kubernetes 1.23
CNCF Live Webinar: Kubernetes 1.23CNCF Live Webinar: Kubernetes 1.23
CNCF Live Webinar: Kubernetes 1.23
 
Using Containers to More Effectively Manage DevOps Continuous Integration
Using Containers to More Effectively Manage DevOps Continuous IntegrationUsing Containers to More Effectively Manage DevOps Continuous Integration
Using Containers to More Effectively Manage DevOps Continuous Integration
 
Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 review
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big Data
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)
 
OpenShift Overview - Red Hat Open House 2017
OpenShift Overview - Red Hat Open House 2017OpenShift Overview - Red Hat Open House 2017
OpenShift Overview - Red Hat Open House 2017
 
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
OpenShift Meetup 8th july 2019 at ConSol - OpenShift v4
 

Similar to Categorizing Docker Hub Public Images

Docker Birthday #3 - Intro to Docker Slides
Docker Birthday #3 - Intro to Docker SlidesDocker Birthday #3 - Intro to Docker Slides
Docker Birthday #3 - Intro to Docker SlidesDocker, Inc.
 
Docker Birthday #3 Slides - Overview
Docker Birthday #3 Slides - OverviewDocker Birthday #3 Slides - Overview
Docker Birthday #3 Slides - OverviewChris Ciborowski
 
[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson Lin[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson LinHanLing Shen
 
HPC Cloud Burst Using Docker
HPC Cloud Burst Using DockerHPC Cloud Burst Using Docker
HPC Cloud Burst Using DockerIRJET Journal
 
Docker Roadshow 2016
Docker Roadshow 2016Docker Roadshow 2016
Docker Roadshow 2016Docker, Inc.
 
Container on azure
Container on azureContainer on azure
Container on azureVishwas N
 
Tampere Docker meetup - Happy 5th Birthday Docker
Tampere Docker meetup - Happy 5th Birthday DockerTampere Docker meetup - Happy 5th Birthday Docker
Tampere Docker meetup - Happy 5th Birthday DockerSakari Hoisko
 
Docker Application to Scientific Computing
Docker Application to Scientific ComputingDocker Application to Scientific Computing
Docker Application to Scientific ComputingPeter Bryzgalov
 
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...Patrick Chanezon
 
Azure ai on premises with docker
Azure ai on premises with  dockerAzure ai on premises with  docker
Azure ai on premises with dockerVishwas N
 
Docker dev ops for cd meetup 12-14
Docker dev ops for cd meetup 12-14Docker dev ops for cd meetup 12-14
Docker dev ops for cd meetup 12-14Simon Storm
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017Patrick Chanezon
 
What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...
What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...
What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...Edureka!
 
Docker Timisoara: Dockercon19 recap slides, 23 may 2019
Docker Timisoara: Dockercon19 recap slides, 23 may 2019Docker Timisoara: Dockercon19 recap slides, 23 may 2019
Docker Timisoara: Dockercon19 recap slides, 23 may 2019Radulescu Adina-Valentina
 
How (and why) to roll your own Docker SaaS
How (and why) to roll your own Docker SaaSHow (and why) to roll your own Docker SaaS
How (and why) to roll your own Docker SaaSRyan Crawford
 
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelUsing Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelDocker, Inc.
 

Similar to Categorizing Docker Hub Public Images (20)

Docker Birthday #3 - Intro to Docker Slides
Docker Birthday #3 - Intro to Docker SlidesDocker Birthday #3 - Intro to Docker Slides
Docker Birthday #3 - Intro to Docker Slides
 
Docker Birthday #3 Slides - Overview
Docker Birthday #3 Slides - OverviewDocker Birthday #3 Slides - Overview
Docker Birthday #3 Slides - Overview
 
[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson Lin[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson Lin
 
HPC Cloud Burst Using Docker
HPC Cloud Burst Using DockerHPC Cloud Burst Using Docker
HPC Cloud Burst Using Docker
 
Docker Roadshow 2016
Docker Roadshow 2016Docker Roadshow 2016
Docker Roadshow 2016
 
Container on azure
Container on azureContainer on azure
Container on azure
 
Containers Demystified
Containers DemystifiedContainers Demystified
Containers Demystified
 
Tampere Docker meetup - Happy 5th Birthday Docker
Tampere Docker meetup - Happy 5th Birthday DockerTampere Docker meetup - Happy 5th Birthday Docker
Tampere Docker meetup - Happy 5th Birthday Docker
 
Webinar : Docker in Production
Webinar : Docker in ProductionWebinar : Docker in Production
Webinar : Docker in Production
 
Docker Application to Scientific Computing
Docker Application to Scientific ComputingDocker Application to Scientific Computing
Docker Application to Scientific Computing
 
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
 
Azure ai on premises with docker
Azure ai on premises with  dockerAzure ai on premises with  docker
Azure ai on premises with docker
 
Docker dev ops for cd meetup 12-14
Docker dev ops for cd meetup 12-14Docker dev ops for cd meetup 12-14
Docker dev ops for cd meetup 12-14
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
Axigen on docker
Axigen on dockerAxigen on docker
Axigen on docker
 
What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...
What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...
What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...
 
Docker Timisoara: Dockercon19 recap slides, 23 may 2019
Docker Timisoara: Dockercon19 recap slides, 23 may 2019Docker Timisoara: Dockercon19 recap slides, 23 may 2019
Docker Timisoara: Dockercon19 recap slides, 23 may 2019
 
How (and why) to roll your own Docker SaaS
How (and why) to roll your own Docker SaaSHow (and why) to roll your own Docker SaaS
How (and why) to roll your own Docker SaaS
 
Power of Azure Devops
Power of Azure DevopsPower of Azure Devops
Power of Azure Devops
 
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelUsing Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment Model
 

Recently uploaded

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Categorizing Docker Hub Public Images

  • 1. End-to-End Machine Learning Pipeline with Docker Enterprise and Kubeflow Categorizing Docker Hub Public Images
  • 2. Roberto Hashioka / @rhashioka Software Engineer, Docker Amn Rahman / @amnrahman Data Engineer, Docker
  • 3. ●Problem Statement ●Architecture and Workflow ●Docker Enterprise features ●Demo ●Challenges and Next Steps Agenda
  • 4. ● Current state: ○ More than 4.6M unlabeled public repositories on Hub ○ Inability to search Hub images based on categories ○ Limited user experience due to lack of context (e.g. backend- frameworks->databases) ● Proposed Solution: ○ Automate the Docker Hub image categorization (send description -> return suggested categories/topics) Use Case - Docker Hub
  • 5. ● How to reliably deploy models created by data scientists to production? ● How to facilitate the creation and maintenance of end-to-end ML pipelines? ● How to automate the ML pipeline workflows? Problem Definition
  • 6. Demonstrate the use of Kubeflow on Docker Enterprise to train and deploy an ML model. Project / Solution Scope
  • 7. Personas DEVOPS ENGINEERS • Provide and maintain infrastructure services for other teams • Monitoring & Configuration management • Deployment Automation & Security DATA ENGINEERS • Develop, test and maintain data pipelines • Improve data reliability, efficiency and quality • Discover opportunities for data acquisition DATA SCIENTISTS • Answer industry and business questions • Prepare data for use in predictive modeling • Building ML models to improve existing product workflows and UX DATAOPS ENGINEERS
  • 8. The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Source: github.com/kubeflow/kubeflow What is Kubeflow?
  • 9. Architecture Kubeflow Prometheus TFJob Grafana Jupyter Hub Tensorflow Hub Kubernetes + Docker Engine ENTERPRISE PLATFORM Cloud VM Bare Metal Docker for Desktops Notebook 1 Notebook 2 …. Model 1 Model 2 …. Seldon ArgoAmbassadorKatib
  • 10. Seldon-Core Architecture Kubernetes API Docker Trusted Registry ENTERPRISE PLATFORM Ambassador (API Gateway) Argo Job (CI/CD) Data scientists and engineers Operator Service Orchestrator 1.N deployment graphs Service Orchestrator Service Orchestrator Model 1..N REST or gRPC Business Applications
  • 11. Production Environments Docker Trusted Registry Docker EE Production Environments Version Control Non-Production EnvironmentsDeveloper Machine Development CI/CD Customers Datacenter 1 Datacenter 2 Docker for Development Process Docker EE
  • 12. Docker Content Trust $ docker trust sign dev/dockerhubclassifier:v3 Signing and pushing trust metadata for dev/dockerhubclassifier:v3
 The push refers to a repository [docker.io/dev/dockerhubclassifier:v3] ...
 v1: digest: sha256:74d4bfa917d55d53c7df3d2ab20a8d926874d61c3da5ef6de15dd2654fc467c4 size: 1357
 Signing and pushing trust metadata
 Enter passphrase for delegation key with ID 27d42a8:
 Successfully signed dev/dockerhubclassifier:v3
  • 16. Data Collection Exploratory Data Analysis Data Cleaning and Transformation Model Building Deploying Model to Production 21 3 4 5 Collected data from other sources on the internet with tagged content and joined with Hub data Pandas data frames, distribution of categories, observing correlations etc. using several Python packages Splitting, training, predicting and tuning. Reducing categories, stop word removal, encoding labels, vectorization and general transformations to feed into classifiers. Serving, monitoring and logging at scale
  • 17. Model Selection / Walkthrough of Jupyter Notebook DEMO
  • 20. Argo Workflow on Docker for Desktop and Docker Enterprise Model performance Monitoring DEMO
  • 21. Repo owner Mike Mike decides to be a good person and labels his repo With ML, Mike can see the most relevant categories according the repo’s content and proceeds to choose the appropriate labels
  • 22. - Simpler workflow with Kubeflow Pipelines - Stepping stone for future ML projects - Canary deployments and A/B Testing with Seldon + Istio Challenges / Next Steps
  • 23. ● End-to-End Machine Learning Pipeline with Docker for Desktop and Kubeflow ○ https://github.com/dockersamples/docker-hub-ml-project Contributions to the Community Docker for Desktop Kubeflow
  • 24. - https://github.com/kubeflow/kubeflow - https://www.docker.com/products/docker-desktop - https://github.com/dockersamples/docker-hub-ml-project Resources THANK YOU!