
Training And Serving ML Model Using Kubeflow by Jayesh Sharma


We will walk through the exploration, training and serving of a machine learning model by leveraging Kubeflow's main components. We will use Jupyter notebooks on the cluster to train the model and then introduce Kubeflow Pipelines to chain all the steps together, to automate the entire process.


  1. Training and Serving ML models using Kubeflow • Jayesh Sharma
  2. Machine Learning Stages (@aronchick)
  3. Make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere
  4. Why Kubeflow?
     ● Composability
       ○ Choose from existing popular tools
     ● Portability
       ○ Build using cloud-native, portable Kubernetes APIs
     ● Scalability
       ○ TF already supports CPU/GPU/distributed training
       ○ Kubernetes scales to 5,000 nodes with the same stack
  5. What’s in the Box?
     ● JupyterHub for collaborative & interactive training
     ● A TensorFlow training controller
     ● A TensorFlow Serving deployment
     ● Argo for workflows
     ● Much more
  6. What’s in the Box? (diagram)
  7. Kubeflow today
  8. Kubeflow is composable
     ● Training
       ○ Perform distributed training with TF-Jobs
       ○ Run pipelines with regular containers as steps
       ○ Run pipelines with TF-Jobs and other CRDs as steps
     ● Serving
       ○ KFServing, Seldon Core
       ○ Azure ML Service and other frameworks
  9. Kubeflow Architecture
  10. TF-Job: Distributed Training
      A distributed TensorFlow job typically contains 0 or more of the following processes:
      ● Chief: responsible for orchestrating training and performing tasks like checkpointing the model.
      ● PS: the parameter servers; these provide a distributed data store for the model parameters.
      ● Worker: the workers do the actual work of training the model. In some cases, worker 0 might also act as the chief.
      ● Evaluator: the evaluators can be used to compute evaluation metrics as the model is trained.
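These roles map onto the TF_CONFIG environment variable that distributed TensorFlow reads, and which the TF-Job operator injects into each pod. A minimal sketch of what such a cluster spec looks like for one of the worker processes (host names are illustrative placeholders, not from the talk):

```python
import json
import os

# Hypothetical cluster: one chief, one parameter server, two workers.
# The TF-Job operator generates a spec of this shape for every pod it creates.
cluster = {
    "chief": ["trainer-chief-0:2222"],
    "ps": ["trainer-ps-0:2222"],
    "worker": ["trainer-worker-0:2222", "trainer-worker-1:2222"],
}

# Each process additionally gets its own role and index, e.g. worker 0:
tf_config = {"cluster": cluster, "task": {"type": "worker", "index": 0}}
os.environ["TF_CONFIG"] = json.dumps(tf_config)

print(os.environ["TF_CONFIG"])
```

TensorFlow's distribution strategies read this variable at startup to decide which role the current process plays.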
  11. An example TF-Job YAML (annotated figure; callouts: parameter-server option, worker specification, image with your code, command to begin training)
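The annotated manifest on this slide is not reproduced in the export. A minimal TFJob manifest along the lines the callouts describe might look like the sketch below (API version, names, image, and command are illustrative assumptions):

```yaml
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train                # placeholder name
spec:
  tfReplicaSpecs:
    PS:                            # parameter-server option
      replicas: 1
      template:
        spec:
          containers:
            - name: tensorflow
              image: myregistry/mnist-train:latest   # image with your code
              command: ["python", "/opt/train.py"]   # command to begin training
    Worker:                        # worker specification
      replicas: 2
      template:
        spec:
          containers:
            - name: tensorflow
              image: myregistry/mnist-train:latest
              command: ["python", "/opt/train.py"]
```

Applying a manifest like this with kubectl causes the TF-Job operator to create the PS and Worker pods and inject the matching TF_CONFIG into each.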
  12. Kubeflow Pipelines
      ● A user interface (UI) for managing and tracking experiments, jobs, and runs
      ● An engine for scheduling multi-step ML workflows
      ● An SDK for defining and manipulating pipelines and components
      ● Notebooks for interacting with the system using the SDK
  13. Anatomy of a pipeline
      ● Containerized implementations of ML tasks
        ○ Pre-built components: just provide params or code snippets; or create your own components from code or libraries
        ○ Use any runtime, framework, or data types
        ○ Attach k8s objects: volumes, secrets
      ● Specification of the sequence of steps
        ○ Specified via a Python DSL
        ○ Inferred from data dependencies on input/output
      ● Input parameters
        ○ A “run” = a pipeline invoked with specific parameters
      ● Schedules
        ○ Invoke a single run or create a recurring scheduled pipeline
