Hybrid Cloud, Kubeflow and TensorFlow Extended [TFX]
Session by:
Animesh Singh
Chief Architect
Data and AI Platform
CODAIT
Tommy Li
Software Engineer
CODAIT
Pete MacKinnon
Principal Software Engineer,
Red Hat AI CoE
Center for Open Source Data and AI Technologies (CODAIT)
Code – Build and improve practical frameworks to
enable more developers to realize immediate
value.
Content – Showcase solutions for complex and
real-world AI problems.
Community – Bring developers and data
scientists to engage with IBM
Improving Enterprise AI lifecycle in
Open Source
•  Team contributes to over 10 open source projects
•  Committers in Kubeflow, Spark, TensorFlow, PyTorch, ONNX…
•  17 committers and many contributors in Apache projects
•  Speakers at over 100 conferences, meetups, unconferences and more
CODAIT
codait.org
The Machine Learning Workflow
Perception
And the ML workflow spans teams …
…and is much more complex….
(Diagram: data ingestion, cleansing, analysis, transformation, validation, splitting and prep; model building, model validation, training at scale and training optimization; then deploying, serving, monitoring & logging, fine-tuning & improvements, and rollout, spanning edge and cloud.)
Hybrid Clouds, TFX and Kubeflow Pipelines
Hybrid Cloud
Hybrid Cloud
Definition:
A cloud that is:
•  Inclusive of on-prem and public
•  Multicloud
•  Open
•  Secure
•  Managed
To scale enterprise workloads across the globe
(Diagram, built up incrementally: the full stack on Cloud / On-Prem.
{Infrastructure: Compute, Network, Storage
{Platform: Security, Kubernetes, DevOps, CF App Services
{Data: Databases, Analytics, Governance
{AI: Watson, TensorFlow, Machine learning
The final view spans Public Cloud and Private Cloud, each with its own Infrastructure and Platform layers.)
Hybrid Clouds, TFX and Kubeflow Pipelines
Private Cloud
OpenShift is our Private Cloud for ML Workloads
(Diagram: OpenShift architecture. Existing automation toolsets, SCM (Git) and CI/CD feed a service layer with persistent storage and a registry. Containers run on RHEL nodes over Red Hat Enterprise Linux, coordinated by a master providing API/authentication, data store, scheduler and health/scaling, on physical, virtual, private, public or hybrid infrastructure. For the data scientist this means ML deployed across clouds, data centers and the edge; ML services load balanced and scaled; and ML microservices scheduled and orchestrated on shared resources: the best of the SDLC for ML in production.)
(Diagram: Open Data Hub, an AI platform powered by open source. Red Hat OpenShift Container Platform and OpenShift Container Storage run on bare metal or Red Hat OpenStack Platform, with Ceph and a multi-cloud gateway underneath. Workloads include data ingest, data processing, ML training, ML inference, analytics, RDBMS, discovery, enterprise apps and CI/CD pipelines; personas span data engineer, data scientist, infra engineer and app developer.)
We are working across the Data and AI Lifecycle
The Open Data Hub Project
●  OpenDataHub.io
●  Meta-operator that integrates the best open source AI/ML/Data projects
●  Blueprint architecture for AI/ML on OpenShift
https://opendatahub.io/docs/architecture.html
Data Acquisition & Preparation → ML Model Selection, Training, Testing → ML Model Deployment in the App. Dev. Process
Open Data Hub Blueprint
Hybrid Clouds, TFX and Kubeflow Pipelines
Public Cloud
IBM Cloud Architecture
Infrastructure
x86 and POWER CPU/GPU compute sleds, programmable mesh network, flash & spinning storage
Container Orchestration & Networking: Kubernetes
Red Hat OpenShift on IBM Cloud
(Diagram, built up across several slides: the Armada API fronts a carrier whose master pairs manage clusters of workers. Community Kubernetes carriers and workers run Ubuntu Linux with containerd, the community kubelet and the Calico agent; Red Hat OpenShift carriers and workers run RHEL with CRI-O, the OpenShift kubelet and the Calico agent.)
Hybrid Clouds, TFX and Kubeflow Pipelines
Kubeflow Pipelines
Distributed Model Training and HPO (Katib, TFJob, PyTorch Job…)
●  Addresses one of the key goals for the model builder persona: distributed model training and hyperparameter optimization for TensorFlow, PyTorch, etc.
●  Common problems in HP optimization
○  Overfitting
○  Wrong metrics
○  Too few hyperparameters
●  Katib: a fully open source, Kubernetes-native
hyperparameter tuning service
○  Inspired by Google Vizier
○  Framework agnostic
○  Extensible algorithms
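Conceptually, the search Katib automates can be sketched as a plain random search over a hyperparameter space. Everything below (the search space, the toy objective) is illustrative and not Katib's API; Katib declares the same idea in an Experiment spec and runs each trial as a Kubernetes job:

```python
import random

# Illustrative search space, mimicking what a Katib experiment would declare.
SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "hidden_units": lambda: random.choice([32, 64, 128]),
}

def train_and_evaluate(learning_rate, hidden_units):
    """Stand-in for a real training job; returns a validation metric."""
    # A toy objective with a known optimum, so the search has something to find.
    return -(learning_rate - 0.01) ** 2 - (hidden_units - 64) ** 2 / 1e4

def random_search(trials=50, seed=0):
    """Sample the space `trials` times and keep the best configuration."""
    random.seed(seed)
    best_params, best_metric = None, float("-inf")
    for _ in range(trials):
        params = {name: sample() for name, sample in SEARCH_SPACE.items()}
        metric = train_and_evaluate(**params)
        if metric > best_metric:
            best_params, best_metric = params, metric
    return best_params, best_metric
```

Katib wraps this loop with pluggable suggestion algorithms (grid, random, Bayesian and others) behind the same experiment abstraction, which is what "extensible algorithms" refers to.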
Serving and Management: KFServing
Bringing the power of Knative and Istio for serverless model deployments.
(Diagram: KFServing sits on Knative and Istio over a Kubernetes compute cluster (GPU, TPU, CPU), with model assets in cloud object storage; a served model exposes pre-process, predict, post-process and explain steps.)
KFServing: Default and Canary Configurations
Manages the hosting aspects of your models:
•  KFService – manages the lifecycle of models
•  Configuration – manages the history of model deployments; there are two configurations, for default and canary
•  Revision – a snapshot of your model version (config and image)
•  Route – endpoint and network traffic management
(Diagram: a KFService routes 90% of traffic to Revision M of the default configuration and 10% to Revision N of the canary configuration.)
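The default/canary split can be expressed in a KFService manifest. The sketch below builds such a manifest as a plain Python dict; the field names follow the early KFServing v1alpha API as described here and should be checked against the version you actually deploy:

```python
def kfservice_manifest(name, default_uri, canary_uri=None, canary_percent=0):
    """Build a KFService manifest routing traffic between default and canary."""
    spec = {
        "default": {"tensorflow": {"modelUri": default_uri}},
    }
    if canary_uri is not None:
        spec["canary"] = {"tensorflow": {"modelUri": canary_uri}}
        spec["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kubeflow.org/v1alpha1",  # assumed API group/version
        "kind": "KFService",
        "metadata": {"name": name},
        "spec": spec,
    }

# 90/10 split between the default and canary revisions, as in the diagram.
# Bucket paths and the service name are illustrative.
svc = kfservice_manifest(
    "flowers", "gs://bucket/model/v1",
    canary_uri="gs://bucket/model/v2", canary_percent=10)
```

Serialized to YAML and applied to the cluster, such a manifest is what drives the Route/Configuration/Revision objects listed above.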
Kubeflow Pipelines
-  Released to Kubeflow in Nov 2018; integrated into the KF deployment CLI and 1-click deploy app
-  Aimed to bring:
   -  Orchestration for complex ML workflows
   -  Reproducible and reliable experimentation
   -  A bridge between experimentation and operationalization
   -  Composition of reusable ML components and pipelines
Experiment Tracking
•  Kubeflow offers an easy way to compare different runs of a pipeline.
•  You can create a pipeline with model training, run it multiple times with different parameter values, and compare the accuracy and ROC AUC scores of every run.
•  There is a lot more under the “Compare runs” view.
What constitutes a Kubeflow ML Pipeline
§  Containerized implementations of ML Tasks
§  Pre-built components: Just provide params or code
snippets (e.g. training code)
§  Create your own components from code or libraries
§  Use any runtime, framework, data types
§  Attach k8s objects - volumes, secrets
§  Specification of the sequence of steps
§  Specified via Python DSL
§  Inferred from data dependencies on input/output
§  Input Parameters
§  A “Run” = Pipeline invoked w/ specific parameters
§  Can be cloned with different parameters
§  Schedules
§  Invoke a single run or create a recurring scheduled
pipeline
Define Pipeline with Python SDK
@dsl.pipeline(name='Taxi Cab Classification Pipeline Example')
def taxi_cab_classification(
    output_dir,
    project,
    train_data        = 'gs://bucket/train.csv',
    evaluation_data   = 'gs://bucket/eval.csv',
    target            = 'tips',
    learning_rate     = 0.1,
    hidden_layer_size = '100,50',
    steps             = 3000):

    tfdv       = TfdvOp(train_data, evaluation_data, project, output_dir)
    preprocess = PreprocessOp(train_data, evaluation_data, tfdv.output['schema'], project, output_dir)
    training   = DnnTrainerOp(preprocess.output, tfdv.schema, learning_rate, hidden_layer_size, steps,
                              target, output_dir)
    tfma       = TfmaOp(training.output, evaluation_data, tfdv.schema, project, output_dir)
    deploy     = TfServingDeployerOp(training.output)
Compile and Submit Pipeline Run
dsl.compile(taxi_cab_classification, 'tfx.tar.gz')
run = client.run_pipeline(
    'tfx_run', 'tfx.tar.gz', params={'output': 'gs://dpa22', 'project': 'my-project-33'})
Creating your own components
-  Ways to build reusable components for pipelines
-  Create a container with your code and write either a ContainerOp() or shareable component descriptor
-  Turn your Python code into a component directly in the notebook (with or without building a container)
-  These components can be exported into a shareable format
(Diagram: two paths to a component. Either package execution code into a container image and wrap it in a ContainerOp with an I/O schema, or start from code, run a container build, and produce a shareable component descriptor that references the built container image.)
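A shareable component descriptor is just metadata plus a container invocation. The sketch below builds one as a plain dict, following the usual fields of KFP's component.yaml schema (name, inputs/outputs, implementation.container); the image name and component are hypothetical:

```python
def component_descriptor(name, image, command, inputs=(), outputs=()):
    """Describe a reusable pipeline component: its I/O schema and container."""
    return {
        "name": name,
        "inputs": [{"name": n, "type": t} for n, t in inputs],
        "outputs": [{"name": n, "type": t} for n, t in outputs],
        "implementation": {
            "container": {"image": image, "command": list(command)},
        },
    }

# A hypothetical preprocessing component.
preprocess = component_descriptor(
    "Preprocess",
    image="registry.example.com/preprocess:latest",  # illustrative image name
    command=["python", "preprocess.py"],
    inputs=[("train_data", "GCSPath")],
    outputs=[("transformed_data", "GCSPath")],
)
```

Serialized as component.yaml, such a descriptor can be loaded back into a pipeline with `kfp.components.load_component_from_file`, which is what makes components shareable across teams.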
Watson AI Operations: Kubeflow Pipelines
e.g. Watson Speech to Text Operations Kubeflow pipeline
e.g. Watson Machine Learning and Watson OpenScale Pipeline
Hybrid Clouds, TFX and Kubeflow Pipelines
TFX
What is TFX?
TL;DR
●  TFX is a platform for deploying TensorFlow models in production
●  TFX pipelines consist of a set of integrated components
●  TFX pipelines are configured using Python
●  TFX consists of components, executors, and libraries
●  TFX components are optional (and repeatable)
●  TFX can be configured to run in many different ways
TFX has existed externally as open source
libraries.
Open sourced TFX libraries (circa 2018)
TensorFlow Data Validation · TensorFlow Transform · TensorFlow Model Analysis · TensorFlow Serving
In 2019, the horizontal layers that integrate
TFX libraries as one platform were open
sourced
Open sourced TFX platform (2019)
(Diagram: Data Ingestion → TensorFlow Data Validation → TensorFlow Transform → Estimator or Keras Model → TensorFlow Model Analysis → TensorFlow Serving, with Logging throughout. Horizontal layers: shared utilities for garbage collection and data access controls; pipeline storage; a shared configuration framework and job orchestration; and an integrated frontend for job management, monitoring, debugging, and data/model/evaluation visualization.)
Anatomy of a Component
TFX components consist of
three main pieces:
●  Driver
●  Executor
●  Publisher
45
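Schematically (these are not TFX's actual classes), the three pieces cooperate like this: the driver consults the metadata store to decide whether work is needed at all, the executor does the work, and the publisher records the result so later runs can reuse it:

```python
class Component:
    """Toy driver/executor/publisher flow around a dict-backed metadata store."""

    def __init__(self, name, executor, metadata_store):
        self.name = name
        self.executor = executor       # a function: inputs -> artifact
        self.store = metadata_store    # maps (name, inputs) -> artifact

    def run(self, **inputs):
        key = (self.name, tuple(sorted(inputs.items())))
        # Driver: skip execution if an identical run is already recorded.
        if key in self.store:
            return self.store[key]
        # Executor: perform the actual work.
        artifact = self.executor(**inputs)
        # Publisher: record the output artifact in the metadata store.
        self.store[key] = artifact
        return artifact
```

Running the same component twice with the same inputs hits the store on the second call, which is the essence of the caching behavior the driver/publisher split enables.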
Anatomy of a Component
TFX includes both libraries and pipeline components; the diagram illustrates the relationships between them. TFX provides several Python packages, which are the libraries used to create pipeline components.
TFX (inside the box)
(Diagram: a TFX pipeline driven by TFX Config and backed by a Metadata Store. Training + eval data flows through ExampleGen, StatisticsGen, SchemaGen, ExampleValidator, Transform, Trainer, Evaluator, ModelValidator and Pusher, then out to TensorFlow Serving, TensorFlow Hub, TensorFlow Lite, TensorFlow JS or other runtimes.)
TFX uses ml-metadata for artifact
management.
(Diagram: task-aware pipelines pass input data through Transform and Trainer to transformed data, trained models and a serving system; task- and data-aware pipelines add pipeline + metadata storage, so every Transform and Trainer run is recorded against its data.)
Snapshot of a component
(Diagram: the Trainer reads its config from the Metadata Store and produces a new (candidate) model. The Model Validator compares the new candidate against the last validated model and records a validation outcome; the Pusher consumes the candidate model and the validation outcome.)
What’s in the Metadata Store?
•  Type definitions of artifacts and their properties (e.g., models, data, evaluation metrics)
•  Execution records (runs) of components (e.g., runtime configuration, inputs + outputs)
•  Lineage tracking across all executions (e.g., to recurse back to all inputs of a specific artifact)
Examples of Metadata-Powered Functionality
•  Find out which data a model was trained on
•  Compare previous model runs
•  Carry over state from previous models
•  Re-use previously computed outputs
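The lineage query above ("recurse back to all inputs of a specific artifact") reduces to a graph walk over execution records. A minimal sketch with a hand-rolled store rather than the ml-metadata API; artifact names are illustrative:

```python
def upstream_artifacts(artifact, executions):
    """Return every artifact that transitively fed into `artifact`.

    `executions` is a list of (inputs, outputs) pairs, one per recorded run.
    """
    # Invert the execution records: which inputs produced each output?
    produced_by = {}
    for inputs, outputs in executions:
        for out in outputs:
            produced_by[out] = inputs
    # Walk backwards from the artifact, collecting ancestors.
    lineage, frontier = set(), [artifact]
    while frontier:
        current = frontier.pop()
        for parent in produced_by.get(current, []):
            if parent not in lineage:
                lineage.add(parent)
                frontier.append(parent)
    return lineage

# Recorded runs: raw data was transformed, then the model was trained.
executions = [
    (["raw_data"], ["transformed_data"]),
    (["transformed_data", "schema"], ["model"]),
]
```

Asking for the model's lineage here walks back through the transform run to the raw data, which is exactly the "which data was this model trained on" query.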
Hybrid Clouds, TFX and Kubeflow Pipelines
demo
https://github.com/kubeflow/kfp-tekton/tree/master/samples/kfp-tfx
Center for Open-Source Data & AI Technologies (CODAIT) / June 28, 2019 / © 2019 IBM Corporation
Model Goal: Will the customer tip more or less than 20%?
TFX Taxi Pipeline
(Diagram: Kubeflow Pipelines run on the private cloud, on-prem, with PVC storage and Istio + Kiali; the pipeline triggers model deployment to Kubeflow Serving on the public cloud, backed by object storage.)
Hybrid Clouds, TFX and Kubeflow Pipelines
Lessons Learnt
Recommendations for TFX and KFP
•  TFX pipelines should be executable without any dependency on a public cloud service, e.g. GCS.
•  Apache Beam is a hard dependency of TFX and doesn’t support S3 natively.
•  The TFX DSL should support dynamically creating Persistent Volume Claims.
•  Support mixing and matching KFP ContainerOp components with TFX ones through the DSL.
•  IBM IKS runs Kubernetes with containerd; OpenShift uses CRI-O. The underlying pipeline platforms (Argo, Airflow, Beam, etc.) should support them as first-class citizens.
•  Visualizing artifacts in the Kubeflow Pipelines UI should not be limited to GCS by default.
•  Don’t assume root privileges on OpenShift and Kubernetes, or on the underlying storage file system.
TFX and KFP DSL
DEMO CODE:
https://github.com/kubeflow/kfp-tekton/tree/master/samples/kfp-tfx
RFC for TFX and KFP DSL Merge
https://docs.google.com/document/d/1_n3q0mNOr7gUSM04yaA0e5BO9RrS0Vkh1cNCyrB07WM/edit#
Please reach out at @AnimeshSingh for any follow-on discussions
Thank You!