Chris Fregly, Founder @ PipelineAI, will walk you through a real-world, complete, end-to-end pipeline-optimization example. We highlight hyper-parameters - and model pipeline phases - that have not been exposed until now.
While most hyperparameter optimizers stop at the training phase (i.e., learning rate, tree depth, EC2 instance type, etc.), we extend model validation and tuning into a new post-training optimization phase that includes 8-bit reduced-precision weight quantization and neural-network layer fusing - among many other framework- and hardware-specific optimizations.
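A minimal sketch (not PipelineAI's implementation) of this kind of post-training step, using TensorFlow 1.x's Graph Transform Tool - the same tool referenced later on this page; the frozen-graph path and input/output node names are hypothetical placeholders:

```python
# Post-training 8-bit weight quantization and layer fusing with the
# TF 1.x Graph Transform Tool. Paths and node names are hypothetical.
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_model.pb", "rb") as f:  # hypothetical frozen graph
    graph_def.ParseFromString(f.read())

optimized_graph_def = TransformGraph(
    graph_def,
    ["input"],    # hypothetical input node name
    ["softmax"],  # hypothetical output node name
    [
        "strip_unused_nodes",
        "fold_constants(ignore_errors=true)",
        "fold_batch_norms",  # fuses batch norm into the preceding conv weights
        "quantize_weights",  # 8-bit reduced-precision weight quantization
    ],
)

with tf.gfile.GFile("optimized_model.pb", "wb") as f:
    f.write(optimized_graph_def.SerializeToString())
```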
Next, we introduce hyperparameters at the prediction phase, including request-batch sizing and chipset selection (CPU vs. GPU vs. TPU).
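As a hedged illustration of treating request-batch size as a prediction-phase hyperparameter, the sketch below sweeps batch sizes against TensorFlow Serving's standard REST API and records median latency; the endpoint URL, model name, input shape, and batch sizes are all assumptions:

```python
# Sweep request-batch sizes against a (hypothetical) TF Serving endpoint.
import time
import requests

ENDPOINT = "http://localhost:8501/v1/models/mnist:predict"  # hypothetical

def median_latency(batch_size, trials=20):
    instances = [[0.0] * 784] * batch_size  # dummy, MNIST-shaped inputs
    latencies = []
    for _ in range(trials):
        start = time.perf_counter()
        response = requests.post(ENDPOINT, json={"instances": instances})
        response.raise_for_status()
        latencies.append(time.perf_counter() - start)
    return sorted(latencies)[len(latencies) // 2]

for batch_size in (1, 8, 32, 128):
    print(batch_size, median_latency(batch_size))
```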
Lastly, we determine a PipelineAI Efficiency Score for our overall pipeline, combining Cost, Accuracy, and Time. We show techniques to maximize this PipelineAI Efficiency Score using our massive PipelineDB along with the pipeline-wide hyper-parameter tuning techniques described in this talk.
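The actual PipelineAI Efficiency Score formula is not public; the following is a purely illustrative stand-in showing how cost, accuracy, and time could be folded into a single maximizable score, with arbitrary assumed weights:

```python
# Illustrative scoring only - not PipelineAI's formula.
def efficiency_score(accuracy, cost_per_1k_predictions, p95_latency_ms,
                     w_acc=1.0, w_cost=0.5, w_time=0.5):
    # Higher accuracy raises the score; higher cost and latency lower it.
    penalty = 1.0 + w_cost * cost_per_1k_predictions \
                  + w_time * (p95_latency_ms / 1000.0)
    return (w_acc * accuracy) / penalty

print(efficiency_score(accuracy=0.97,
                       cost_per_1k_predictions=0.12,
                       p95_latency_ms=35.0))
```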
Bio
Chris Fregly is Founder and Applied AI Engineer at PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco.
He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series titled "High Performance TensorFlow in Production with Kubernetes and GPUs."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
Building Google Cloud ML Engine From Scratch on AWS with PipelineAI - ODSC Lo... - Chris Fregly
http://pipeline.ai
Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements from Kubernetes, Istio, and TensorFlow.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment.
This is the holy grail of data science - rapid and safe experimentation with ML / AI models directly in production.
Following the Successful Netflix Culture that I lived and breathed (https://www.slideshare.net/reed2001/culture-1798664/2-Netflix_CultureFreedom_Responsibility2), I give Data Scientists the Freedom and Responsibility to extend their ML / AI pipelines and experiments safely into production.
Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong.
Learn to be fast and strong by attending this talk.
http://pipeline.ai
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to... - Chris Fregly
https://pipeline.ai
With PipelineAI, You Can…
* Generate Hardware-Specific Model Optimizations
* Deploy and Compare Models in Live Production
* Optimize Complete AI Pipeline Across Many Models
* Hyper-Parameter Tune Both Training & Predicting Phases
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,... - Chris Fregly
Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements from Kubernetes, Istio, and TensorFlow.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment.
This is the holy grail of data science - rapid and safe experimentation with ML / AI models directly in production.
Following the Successful Netflix Culture that I lived and breathed (https://www.slideshare.net/reed2001/culture-1798664/2-Netflix_CultureFreedom_Responsibility2), I give Data Scientists the Freedom and Responsibility to extend their ML / AI pipelines and experiments safely into production.
Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong.
Learn to be fast and strong by attending this talk.
Bio:
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series titled "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
http://pipeline.ai
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G... - Chris Fregly
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs - Advanced Spark and TensorFlow Meetup May 23 2017 @ Hotels.com London
We'll discuss how to deploy TensorFlow, Spark, and Scikit-learn models on GPUs with Kubernetes across multiple cloud providers including AWS, Google, and Azure - as well as on-premise.
In addition, we'll discuss how to optimize TensorFlow models for high-performance inference using the latest TensorFlow XLA (Accelerated Linear Algebra) framework including the JIT and AOT Compilers.
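A minimal sketch, assuming TensorFlow 1.x, of switching on the XLA JIT compiler for a session; the tiny graph is a stand-in, and the AOT path (the separate tfcompile build tool) is not shown:

```python
# Enable XLA JIT compilation for a TF 1.x session.
import tensorflow as tf

config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = (
    tf.OptimizerOptions.ON_1)  # turn on XLA JIT globally

x = tf.placeholder(tf.float32, shape=[None, 1024])
y = tf.layers.dense(x, 256, activation=tf.nn.relu)

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(y, feed_dict={x: [[0.0] * 1024]})
    print(result.shape)  # (1, 256)
```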
Github Repo (100% Open Source!)
https://github.com/fluxcapacitor/pipeline
http://pipeline.io
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment.
This talk contains many demos based on open source tools. You can completely reproduce all demos through Docker on your own GPU cluster.
See http://pipeline.ai for links to the GitHub Repo.
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra... - Chris Fregly
http://pipeline.ai
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow Models - and the TensorFlow Runtime - in a GPU-based production environment. This talk is 100% demo-based, built on open source tools, and completely reproducible through Docker on your own GPU cluster.
Bio
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series titled "High-Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
http://pipeline.ai
High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ... - Chris Fregly
http://pipeline.ai
Title
PipelineAI Distributed Spark ML + Tensorflow AI + GPU Workshop
*A GPU-based cloud instance will be provided to each attendee as part of this event
Highlights
We will each build an end-to-end, continuous Tensorflow AI model training and deployment pipeline on our own GPU-based cloud instance.
At the end, we will combine our cloud instances to create the LARGEST Distributed Tensorflow AI Training and Serving Cluster in the WORLD!
Agenda
Spark ML
TensorFlow AI
Storing and Serving Models with HDFS
Trade-offs of CPU vs. GPU, Scale Up vs. Scale Out
CUDA + cuDNN GPU Development Overview
TensorFlow Model Checkpointing, Saving, Exporting, and Importing (see the export sketch after this agenda)
Distributed TensorFlow AI Model Training (Distributed TensorFlow)
Centralized Logging and Visualizing of Distributed TensorFlow Training (TensorBoard)
Distributed TensorFlow AI Model Serving/Predicting (TensorFlow Serving)
Centralized Logging and Metrics Collection (Prometheus, Grafana)
Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
High-Performance and Fault-Tolerant Microservices using Request Batching and Circuit Breakers (NetflixOSS)
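A minimal sketch of the saving/exporting agenda item, assuming a TF 1.x session-based model: write a SavedModel in the versioned directory layout TensorFlow Serving expects. Tensor names and the export path are placeholders.

```python
# Export a TF 1.x model as a SavedModel for TensorFlow Serving.
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
logits = tf.layers.dense(x, 10, name="logits")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.saved_model.simple_save(
        sess,
        export_dir="export/mnist/1",  # "<model>/<version>" for TF Serving
        inputs={"x": x},
        outputs={"logits": logits},
    )
```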
Github Repo
https://github.com/fluxcapacitor/pipeline
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup... - Chris Fregly
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment.
This talk contains many Spark ML and TensorFlow AI demos using PipelineIO's 100% Open Source Community Edition. All code and Docker images are available to reproduce on your own CPU or GPU-based cluster.
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series titled "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
https://www.meetup.com/TensorFlow-Chicago/events/240267321/
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/240587698/
http://pipeline.io
https://github.com/fluxcapacitor/pipeline
High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 -... - Chris Fregly
Online Workshop
Note: A GPU-based cloud instance will be provided to each attendee for the duration of this event!!
At 8am PT on the morning of this workshop, we will email the Webinar details to your email address registered with Eventbrite.
If this email address is not up to date - or you do not get the email by 8am PT - please email your Eventbrite confirmation to help@pipeline.ai and we'll send you the details.
http://pipeline.ai
Title
PipelineAI Distributed Spark ML + Tensorflow AI + GPU Workshop
Time
Start: 9am PT
End: 1pm PT
Highlights
We will each build an end-to-end, continuous Tensorflow AI model training and deployment pipeline on our own GPU-based cloud instance.
At the end, we will combine our cloud instances to create the LARGEST Distributed Tensorflow AI Training and Serving Cluster in the WORLD!
Pre-requisites
Just a modern browser, internet connection, and a good night's sleep! We'll provide the rest.
Agenda
Spark ML
TensorFlow AI
Storing and Serving Models with HDFS
Trade-offs of CPU vs. GPU, Scale Up vs. Scale Out
CUDA + cuDNN GPU Development Overview
TensorFlow Model Checkpointing, Saving, Exporting, and Importing
Distributed TensorFlow AI Model Training (Distributed TensorFlow)
TensorFlow's Accelerated Linear Algebra Framework (XLA)
TensorFlow's Just-in-Time (JIT) Compiler, Ahead of Time (AOT) Compiler
Centralized Logging and Visualizing of Distributed TensorFlow Training (TensorBoard)
Distributed TensorFlow AI Model Serving/Predicting (TensorFlow Serving)
Centralized Logging and Metrics Collection (Prometheus, Grafana)
Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
High-Performance and Fault-Tolerant Micro-services (NetflixOSS)
More Info including GitHub and Docker Repos
http://pipeline.ai
Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ... - Chris Fregly
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compilers, and Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment.
This talk is 100% demo-based with open source tools and completely reproducible through Docker on your own GPU cluster.
In addition, I spin up a GPU cloud instance for every attendee in the audience. We go through the notebooks together as I demonstrate the process of continuously training, optimizing, deploying, and serving a TensorFlow model on a large, distributed cluster of Nvidia GPUs managed by the attendees.
http://pipeline.ai
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs - Chris Fregly
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs @ Strata London, May 24 2017
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs - Advanced Spark and TensorFlow Meetup May 23 2017 @ Hotels.com London
We'll discuss how to deploy TensorFlow, Spark, and Scikit-learn models on GPUs with Kubernetes across multiple cloud providers including AWS, Google, and Azure - as well as on-premise.
In addition, we'll discuss how to optimize TensorFlow models for high-performance inference using the latest TensorFlow XLA (Accelerated Linear Algebra) framework including the JIT and AOT Compilers.
Github Repo (100% Open Source!)
https://github.com/fluxcapacitor/pipeline
http://pipeline.io
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017 - Chris Fregly
http://pipeline.io
Title
PipelineAI Distributed Spark ML + Tensorflow AI + GPU Workshop
*A GPU-based cloud instance will be provided to each attendee as part of this event
Highlights
We will each build an end-to-end, continuous Tensorflow AI model training and deployment pipeline on our own GPU-based cloud instance.
At the end, we will combine our cloud instances to create the LARGEST Distributed Tensorflow AI Training and Serving Cluster in the WORLD!
Pre-requisites
Just a modern browser, internet connection, and a good night's sleep! We'll provide the rest.
Agenda
Spark ML
TensorFlow AI
Storing and Serving Models with HDFS
Trade-offs of CPU vs. GPU, Scale Up vs. Scale Out
CUDA + cuDNN GPU Development Overview
TensorFlow Model Checkpointing, Saving, Exporting, and Importing
Distributed TensorFlow AI Model Training (Distributed TensorFlow)
TensorFlow's Accelerated Linear Algebra Framework (XLA)
TensorFlow's Just-in-Time (JIT) Compiler, Ahead of Time (AOT) Compiler
Centralized Logging and Visualizing of Distributed TensorFlow Training (TensorBoard)
Distributed TensorFlow AI Model Serving/Predicting (TensorFlow Serving)
Centralized Logging and Metrics Collection (Prometheus, Grafana)
Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
High-Performance and Fault-Tolerant Micro-services (NetflixOSS)
Bio
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series titled "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
Github Repo
https://github.com/fluxcapacitor/pipeline
Video
https://youtu.be/oNf3I1fVmg8
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc... - Chris Fregly
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, Chris will demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment. This talk is 100% demo-based with open source tools and completely reproducible through Docker on your own GPU cluster.
https://github.com/fluxcapacitor/pipeline/gpu.ml
http://pipeline.io
Speaker: Umayah Abdennabi
Agenda
* Intro Grammarly (Umayah Abdennabi, 5 mins)
* Meetup Updates and Announcements (Chris, 5 mins)
* Custom Functions in Spark SQL (30 mins)
Speaker: Umayah Abdennabi
Spark comes with a rich Expression library that can be extended to make custom expressions. We will look into custom expressions and why you would want to use them.
* TF 2.0 + Keras (30 mins)
Speaker: Francesco Mosconi
TensorFlow 2.0 was announced at the March TF Dev Summit, and it brings many changes and upgrades. The most significant change is the inclusion of Keras as the default model-building API. In this talk, we'll review the main changes introduced in TF 2.0 and highlight the differences between open source Keras and tf.keras.
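A small sketch of that tf.keras-as-default point: in TF 2.0, Keras models are built and trained through tf.keras directly. Layer sizes and the data here are placeholders.

```python
# Build and train a tiny tf.keras model (assumes TensorFlow 2.x).
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

x = np.random.rand(128, 20).astype("float32")
y = np.random.randint(0, 2, size=(128, 1))
model.fit(x, y, epochs=2, batch_size=32)
```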
* SQuAD Deep-Dive: Question & Answer with Context (45 mins)
Speaker: Brett Koonce (https://quarkworks.co)
SQuAD (Stanford Question Answering Dataset) is an NLP challenge based around answering questions by reading Wikipedia articles, designed to be a real-world machine learning benchmark. We will look at several different ways to tackle the SQuAD problem, building up to state-of-the-art approaches in terms of time, complexity, and accuracy.
https://rajpurkar.github.io/SQuAD-explorer/
https://dawn.cs.stanford.edu/benchmark/#squad
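A minimal look at the SQuAD v1.1 JSON layout behind the benchmark: articles contain paragraphs ("context"), each with question/answer pairs. The local filename is an assumption; the file comes from the SQuAD explorer site linked above.

```python
# Inspect one question/answer pair from the SQuAD v1.1 training file.
import json

with open("train-v1.1.json") as f:
    squad = json.load(f)

article = squad["data"][0]
paragraph = article["paragraphs"][0]
qa = paragraph["qas"][0]

print("Title:   ", article["title"])
print("Context: ", paragraph["context"][:80], "...")
print("Question:", qa["question"])
print("Answer:  ", qa["answers"][0]["text"])
```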
Food and drinks will be provided. The event will be held at Grammarly's office at One Embarcadero Center on the 9th floor. When you arrive at One Embarcadero, take the escalator to the second floor where you will find the lobby and elevators to the office suites. Come on up to the 9th floor (no need to check in at security), and ring the Grammarly doorbell.
High Performance Distributed TensorFlow with GPUs and Kubernetes - inside-BigData.com
In this deck from the Stanford HPC Conference, Chris Fregly from PipelineAI presents: High Performance Distributed TensorFlow with GPUs and Kubernetes.
"Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements with TensorFlow, Kubernetes, OpenFaaS, GPUs, and PipelineAI.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment. This is the holy grail of data science - rapid and safe experiments of ML / AI models directly in production. Following the famous Netflix Culture that encourages "Freedom and Responsibility", I use this talk to demonstrate how Data Scientists can use PipelineAI to safely deploy their ML / AI pipelines into production using live data. Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong. Learn to be fast and strong by attending this talk!"
Watch the video: https://youtu.be/k4qAKQHakNg
Learn more: https://pipeline.ai/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Now that you have your apps running on K8s, are you wondering how to get the response time that you need? Tuning applications to get the performance you need can be challenging. When you have to tune a number of microservices in Kubernetes to fix a response-time or throughput issue, it can get really overwhelming. This talk looks at some common performance issues, ways to solve them, and - more importantly - the tools that can help you. We will also look specifically at Kruize, which helps you not only right-size your containers but also optimize the runtimes.
This is an introduction to Polyaxon and why I use it.
Polyaxon enables me to leverage Kubernetes to achieve the following objectives:
- Make the lead time of experiments as short as possible.
- Make the cost of training models as low as possible.
- Make the experiments reproducible.
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D... - Chris Fregly
Empowering the Data Scientist with "1-Click" Production Deployment and Canary Testing of High-Performance and Highly-Scalable Spark ML and TensorFlow Models directly from Jupyter/iPython Notebooks using Docker, Kubernetes, Netflix OSS, Microservices, and Spinnaker.
With proper tooling and metrics, Data Scientists can directly deploy, analyze, A/B test, rollback, and scale out their Spark ML and TensorFlow model into live production serving with zero friction.
We will show you the open source tools that we've built based on Docker, Kubernetes, Netflix Open Source, Microservices, Spinnaker - and even Chaos Monkey!
Speaker: Chris Fregly @ PipelineIO, formerly Databricks and Netflix
Scaling Puppet Enterprise with Compile Masters requires you to provision new machines and manually configure them, as well as your Puppet Master server.
Learn how you can automatically provision and configure new Compile Master nodes for your AWS OpsWorks for Puppet Enterprise server by leveraging AWS Systems Manager.
Migrating to a Bazel-based CI System: 6 Learnings - Or Shachar - Wix Engineering
Two years ago, we were given a big challenge - transform Wix's build system, then based on Maven and TeamCity, into a new system that would support our exponentially growing scale. Naturally, we chose Bazel.
But how could we move to a system so different in so many ways from the existing one? Furthermore, we were required not to break the current build system as we migrated to the new one.
Fast forward to today: the Wix backend CI system is fully migrated to Bazel! The system builds in a fraction of the time - even with our largest codebases. In this talk, Or Shachar will describe how we achieved this, why it took us so long, what tools we had to build on the way (and what we already have, and will, open source!), and share the principles that helped us.
You can watch it here:
https://www.wix.engineering/post/bazelcon-2019-lessons-learned-from-migrating-our-build-system-to-bazel
TensorFlow meetup: Keras - PyTorch - TensorFlow.js - Stijn Decubber
Slides from the TensorFlow meetup hosted on October 9th at the ML6 offices in Ghent. Join our Meetup group for updates and future sessions: https://www.meetup.com/TensorFlow-Belgium/
Migrating to a Bazel-based CI system: 6 learnings - Or Shachar
Two years ago, we were given a big challenge - transform Wix's build system, then based on Maven and TeamCity, into a new system that would support our exponentially growing scale.
But how could we move to a system so different in so many ways from the existing one? Furthermore, we were required not to break the current build system as we migrated to the new one.
Fast forward to today: the Wix backend CI system is fully migrated to Bazel! The system builds in a fraction of the time - even with our largest codebases. In this talk, we will describe how we achieved this, why it took us so long, what tools we had to build on the way (and what we already have, and will, open source!), and share the principles that helped us.
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI - Data Con LA
Abstract:
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow Models - and the TensorFlow Runtime - in a GPU-based production environment.
This talk is 100% demo-based with open source tools and completely reproducible through Docker on your own GPU cluster.
Bio:
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series titled "High Performance TensorFlow in Production."
Pipeline.AI was also the recent winner of the O'Reilly Media AI Startup Showcase at the AI Conference.
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra... - SQUADEX
The right setup of local development and cloud infrastructure is a requirement for reproducible and reliable Machine Learning products. These products also require a well-polished process behind the management of the data science life cycle, from research to production. ML stimulates the need for a more advanced type of software development process and requires a more sophisticated ecosystem of services than a classic IDE.
This SlideShare provides ML engineers with insightful tips on how to use specific AWS & open-source tools as well as DevOps best practices to complete routine tasks like data ingestion, data preprocessing, feature engineering, labeling, training, parameter tuning, testing, deployment, monitoring, and retraining.
On top of that, you will learn what can and cannot be automated when it comes to using both AWS products and tools like Kubernetes, Kubeflow, Jupyter notebooks, TensorFlow, and TPOT.
The keynote was originally delivered to Stanford academia (University IT, students, and staff) on the campus of Stanford University.
Speakers:
-- Stepan Pushkarev, CTO at Squadex (https://www.linkedin.com/in/stepanpushkarev/)
-- Rinat Gareev, Machine Learning Engineer at Squadex (https://www.linkedin.com/in/gareev/)
-- Iskandar Sitdikov, Machine Learning Engineer at Squadex (https://www.linkedin.com/in/icekhan/)
DevoxxUK: Optimizing Application Performance on Kubernetes - Dinakar Guniguntala
Now that you have your apps running on K8s, are you wondering how to get the response time that you need? Tuning a polyglot set of microservices to get the performance you need can be challenging in Kubernetes. The key to overcoming this is observability. Luckily, there are a number of tools, such as Prometheus, that can provide all the metrics you need - but here is the catch: there is so much data that it is difficult to make sense of it all. This is where hyperparameter tuning can come to the rescue to help build the right models.
This talk covers best practices that will help attendees:
1. Understand and avoid common performance-related problems.
2. Learn about observability tools and how they can help identify perf issues.
3. Look closer at Kruize Autotune, an open source autonomous performance tuning tool for Kubernetes, and where it can help.
KFServing - Serverless Model Inferencing - Animesh Singh
Deep dive into KFServing, a serverless model inferencing platform built on top of Knative and Istio. It is part of the Kubeflow project and is deployed in production across organizations.
Scaling AI in production using PyTorch - geetachauhan
Slides from my talk at MLOps World '21.
Deploying AI models in production and scaling the ML services is still a big challenge. In this talk we will cover details of how to deploy your AI models, best practices for the deployment scenarios, and techniques for performance optimization and scaling of the ML services. Come join us to learn how you can jumpstart the journey of taking your PyTorch models from research to production.
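One common research-to-production step for PyTorch, offered as a hedged sketch rather than the talk's exact techniques: trace the model to TorchScript so it can be served without the Python training code.

```python
# Trace a pretrained model to TorchScript and reload the artifact.
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
example_input = torch.rand(1, 3, 224, 224)

traced = torch.jit.trace(model, example_input)  # record the graph by tracing
traced.save("resnet18_traced.pt")

# The serving side reloads the self-contained artifact:
loaded = torch.jit.load("resnet18_traced.pt")
with torch.no_grad():
    print(loaded(example_input).shape)  # torch.Size([1, 1000])
```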
Building machine learning service in your business - Eric Chen (Uber) @ PAPIs... - PAPIs.io
When building machine learning applications at Uber, we identified a sequence of common practices and painful procedures, and thus built a machine learning platform as a service. Here we present the key components needed to build such a scalable and reliable machine learning service, which serves both our online and offline data processing needs.
In this deck from the 2018 Swiss HPC Conference, Axel Koehler from NVIDIA presents: The Convergence of HPC and Deep Learning.
"The intersection of AI and HPC is extending the reach of science and accelerating the pace of scientific innovation like never before. The technology originally developed for HPC has enabled deep learning, and deep learning is enabling many usages in science. Deep learning is also helping deliver real-time results with models that used to take days or months to simulate. The presentation will give an overview about the latest hard- and software developments for HPC and Deep Learning from NVIDIA and will show some examples that Deep Learning can be combined with traditional large scale simulations."
Watch the video: https://wp.me/p3RLHQ-ijM
Learn more: http://nvidia.com
and
http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017 - Amazon Web Services
Join Facebook’s Pieter Noordhuis to learn about Caffe2, a lightweight and scalable framework for deep learning. You’ll learn about its features, the way Facebook applies it in production, and how to use Caffe2 to create and train your own deep learning models on Amazon EC2 P3 instances, which use the latest NVIDIA Volta architecture for GPU acceleration. This session will also discuss the cost tradeoffs and time-to-model measurements for deep learning.
Stress Testing at Twitter: a tale of New Year Eves - Herval Freire
Failure testing is a fundamental piece of Twitter’s reliability engineering. Over the years, we developed a rich toolchain that allows us to detect and fix scalability problems long before they happen. In this talk, we’ll cover some of the strategies we employ and discuss our always evolving approach to API stress testing and its “unit test” equivalent, redline testing.
My talk at the Data Science Labs conference in Odessa.
Training a model in Apache Spark while having it automatically available for real-time serving is an essential feature for end-to-end solutions.
There is an option to export the model into PMML and then import it into a separate scoring engine. The idea of interoperability is great, but it has multiple challenges, such as code duplication, limited extensibility, inconsistency, and extra moving parts. In this talk we discuss an alternative solution that does not introduce custom model formats or new standards, is not based on an export/import workflow, and shares the Apache Spark API.
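A minimal sketch of that native-Spark-API idea, assuming PySpark: persist a fitted Spark ML pipeline with its built-in save/load instead of exporting to PMML. Data and paths are placeholders.

```python
# Save and reload a fitted Spark ML pipeline in its native format.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("native-model-persistence").getOrCreate()
df = spark.createDataFrame(
    [(0.0, 1.0, 0), (1.0, 0.0, 1), (0.5, 0.5, 1)], ["f1", "f2", "label"])

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(df)
model.write().overwrite().save("/tmp/lr_pipeline")  # native Spark ML format

# The scoring side reloads the same artifact through the same API:
reloaded = PipelineModel.load("/tmp/lr_pipeline")
reloaded.transform(df).select("prediction").show()
```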
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl... - Amazon Web Services
Every day, the computing power of high-performance computing (HPC) clusters helps scientists make breakthroughs, such as proving the existence of gravitational waves and screening new compounds for new drugs. Yet building HPC clusters is out of reach for most organizations, due to the upfront hardware costs and ongoing operational expenses. Now the speed of innovation is only bound by your imagination, not your budget. Researchers can run one cluster for 10,000 hours or 10,000 clusters for one hour anytime, from anywhere, and both cost the same in the cloud. And with the availability of Public Data Sets in Amazon S3, petabyte scale data is instantly accessible in the cloud. Attend and learn how to build HPC clusters on the fly, leverage Amazon’s Spot market pricing to minimize the cost of HPC jobs, and scale HPC jobs on a small budget, using all the same tools you use today, and a few new ones too.
Similar to "Hyper-Parameter Tuning Across the Entire AI Pipeline" (GPU Tech Conference, San Jose, March 2018)
Pandas on AWS - Let me count the ways.pdf - Chris Fregly
Chris Fregly (Principal Solution Architect, AI and machine learning at AWS) will give a brief presentation on the various ways to run scalable Pandas workloads on AWS, including Modin and Ray. He will then answer questions from the audience and moderator Alejandro Herrera at Ponder.
Chris Fregly is a Principal Solution Architect for AI and Machine Learning at Amazon Web Services (AWS) based in San Francisco, California. He is the organizer of the Global Data Science on AWS meetup. He is co-author of the O'Reilly Book, "Data Science on AWS."
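A minimal sketch of one of those "scalable Pandas" options: Modin parallelizes the pandas API on a Ray backend. The CSV path and column name are placeholders.

```python
# Run pandas-style code in parallel with Modin on Ray.
import ray
ray.init()  # start a local Ray runtime (or connect to a cluster)

import modin.pandas as pd  # drop-in replacement for the pandas API

df = pd.read_csv("large_dataset.csv")    # read and partitioned in parallel
print(df.groupby("some_column").size())  # the same pandas calls, distributed
```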
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup - Chris Fregly
RSVP Webinar: https://www.eventbrite.com/e/webinarkubeflow-tensorflow-tfx-pytorch-gpu-spark-ml-amazonsagemaker-tickets-45852865154
Talk #0: Introductions and Meetup Announcements By Chris Fregly and Antje Barth
Talk #1: Ray Overview, Ray AI Runtime on AWS using Amazon SageMaker, EC2, EMR, EKS by Chris Fregly, Principal Specialist Solution Architect, AI and Machine Learning @ AWS
Talk #2: Deep-dive Blueprints for Amazon Elastic Kubernetes Service (EKS) including Ray and Spark by Apoorva Kulkarni, Sr. Specialist Solution Architect, Containers and Kubernetes @ AWS
RSVP Webinar: https://www.eventbrite.com/e/webinarkubeflow-tensorflow-tfx-pytorch-gpu-spark-ml-amazonsagemaker-tickets-45852865154
Zoom link: https://us02web.zoom.us/j/82308186562
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Amazon re:Invent 2020 Recap: AI and Machine Learning - Chris Fregly
Amazon re:Invent 2020 Recap: AI and Machine Learning
Video here: https://youtu.be/YSXe02Y5pHM
NEW RELEASE! Build, Automate, Manage, and Scale ML Workflows with the NEW Amazon SageMaker Pipelines by Hallie Crosby Weishahn.
Description of Talk and Demo
AWS recently announced Amazon SageMaker Pipelines (https://aws.amazon.com/sagemaker/pipelines/), the first purpose-built, easy-to-use Continuous Integration and Continuous Delivery (CI/CD) service for machine learning.
SageMaker Pipelines has three main components which improve the operational resilience and reproducibility of your workflows: 1) pipelines, 2) model registry, and 3) projects.
In this talk and demo, Hallie will walk us through the new Amazon SageMaker Pipelines feature including MLOps support.
Date/Time
9-10am US Pacific Time (Third Monday of Every Month)
RSVP: https://www.eventbrite.com/e/1-hr-free-workshop-pipelineai-gpu-tpu-spark-ml-tensorflow-ai-kubernetes-kafka-scikit-tickets-45852865154
Meetup:
https://www.meetup.com/Data-Science-on-AWS/
Zoom:
https://zoom.us/j/690414331
Webinar ID: 690 414 331
Phone:
+1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
Related Links
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
O'Reilly Book: https://datascienceonaws.com
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Support: https://support.pipeline.ai
Monthly Workshop: https://www.eventbrite.com/e/full-day-workshop-kubeflow-gpu-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-tickets-63362929227
RSVP: https://www.eventbrite.com/e/1-hr-free-workshop-pipelineai-gpu-tpu-spark-ml-tensorflow-ai-kubernetes-kafka-scikit-tickets-45852865154
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...Chris Fregly
Waking the Data Scientist at 2am:
Detect Model Degradation on Production Models with Amazon SageMaker Endpoints & Model Monitor
In this talk, I describe how to deploy a model into production and monitor its performance using SageMaker Model Monitor. With Model Monitor, I can detect if a model's predictive performance has degraded - and alert an on-call data scientist to take action and improve the model at 2am while the DevOps folks sleep soundly through the night.
Topics: AI and Machine Learning, Model Deployment, Anomaly Detection, Amazon SageMaker Endpoints, and Model Monitor
Quantum Computing with Amazon Braket
In this talk, I describe some fundamental principles of quantum computing including qu-bits, superposition, and entanglement. I will demonstrate how to perform secure quantum computing tasks across many Quantum Processing Units (QPUs) using Amazon Braket, IAM, and S3.
Topics: AI and Machine Learning, Quantum Computing, Amazon Braket, QPU
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-PersonChris Fregly
In this talk, we present tips and best practices for scaling a large workshop to thousands of simultaneous attendees - both online and in-person. While our workshop is focused on AI and machine learning on AWS, we generalize our learnings for any domain or specialization.
Video: https://youtu.be/T0L0JxDaPkc
RSVP Here: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, Airflow, and MLflow.
First described in a 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most widely used pipeline orchestration framework in machine learning and data engineering.
MLflow is a lightweight experiment-tracking system recently open-sourced by Databricks, the creators of Apache Spark. MLflow supports Python, Java/Scala, and R - and offers native support for TensorFlow, Keras, and Scikit-Learn.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
The link will be sent a few hours before the start of the workshop.
Only registered users will receive the link.
If you do not receive the link a few hours before the start of the workshop, please send your Eventbrite registration confirmation to support@pipeline.ai for help.
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Setup ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Run Multiple Experiments with MLflow Experiment Tracking
12. Reproduce Model Training with TFX Metadata Store
13. Deploy the Model to Production with TensorFlow Serving and Istio
14. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
RSVP Here: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227
https://youtu.be/T0L0JxDaPkc
Title
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU
Video
https://youtu.be/vaB4IM6ySD0
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow.
First described in a 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most widely used pipeline orchestration framework in machine learning.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Setup ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Reproduce Model Training with TFX Metadata Store and Pachyderm
12. Deploy the Model to Production with TensorFlow Serving and Istio
13. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
Related Links
1. PipelineAI Home: https://pipeline.ai
2. PipelineAI Community Edition: http://community.pipeline.ai
3. PipelineAI GitHub: https://github.com/PipelineAI/pipeline
4. Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
5. YouTube Videos: https://youtube.pipeline.ai
6. SlideShare Presentations: https://slideshare.pipeline.ai
7. Slack Support: https://joinslack.pipeline.ai
8. Web Support and Knowledge Base: https://support.pipeline.ai
9. Email Support: support@pipeline.ai
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...Chris Fregly
Traditional machine learning pipelines end with lifeless models sitting on disk in the research lab. These traditional models are typically trained on stale, offline, historical batch data. Static models and stale data are not sufficient to power today's modern, AI-first Enterprises that require continuous model training, continuous model optimizations, and lightning-fast model experiments directly in production. Through a series of open source, hands-on demos and exercises, we will use PipelineAI to breathe life into these models using 4 new techniques that we’ve pioneered:
* Continuous Validation (V)
* Continuous Optimizing (O)
* Continuous Training (T)
* Continuous Explainability (E).
The Continuous "VOTE" techniques has proven to maximize pipeline efficiency, minimize pipeline costs, and increase pipeline insight at every stage from continuous model training (offline) to live model serving (online.)
Attendees will learn to create continuous machine learning pipelines in production with PipelineAI, TensorFlow, and Kafka.
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...Chris Fregly
Perform Online Predictions using Slack
A/B and multi-armed bandit model compare
Train Online Models with Kafka Streams
Create new models quickly
Deploy to production safely
Mirror traffic to validate online performance
Any Framework, Any Hardware, Any Cloud
Dashboard to manage the lifecycle of models from local development to live production
Generates optimized runtimes for the models
Custom targeting rules, shadow mode, and percentage-based rollouts to safely test features in live production
Continuous model training, model validation, and pipeline optimization
https://youtu.be/zpkH9oiIovU
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/258276286/
Related Links
PipelineAI Home: https://pipeline.ai
PipelineAI Community Edition: https://community.pipeline.ai
PipelineAI GitHub: https://github.com/PipelineAI/pipeline
PipelineAI Quick Start: https://quickstart.pipeline.ai
Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
YouTube Videos: https://youtube.pipeline.ai
SlideShare Presentations: https://slideshare.pipeline.ai
Slack Support:
https://joinslack.pipeline.ai
Web Support and Knowledge Base: https://support.pipeline.ai
Email Support: help@pipeline.ai
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Chris Fregly
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/244971261/
Based on this blog post: https://mengdong.github.io/2017/07/15/distributed-tensorflow-with-gpu-on-kubernetes-and-mapr/
youtube video:
https://www.youtube.com/watch?v=3phz1_B-rR4
http://pipeline.ai
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San Jose March 2018
1. HYPER-PARAMETER TUNING ACROSS THE ENTIRE
AI PIPELINE: MODEL TRAINING TO PREDICTING
GPU TECH CONFERENCE -- SAN JOSE, MARCH 2018
CHRIS FREGLY
FOUNDER @ PIPELINEAI
2. KEY TAKE-AWAYS
With PipelineAI, You Can…
§ Hyper-Parameter Tuning From Training to Inference
§ Generate Hardware-Specific Pipeline Optimizations
§ Deploy & Compare Optimizations in Live Production
§ Perform Continuous Model Training & Data Labeling
3. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Routing
4. INTRODUCTIONS: ME
§ Chris Fregly, Founder & Engineer @ PipelineAI
§ Formerly Netflix, Databricks, IBM Spark Tech
§ Founder @ Advanced Spark TensorFlow Meetup
§ Please Join Our 75,000+ Global Members!!
Contact Me
chris@pipeline.ai
@cfregly
Global Locations
* San Francisco
* Chicago
* Austin
* Washington DC
* Dusseldorf
* London
5. INTRODUCTIONS: YOU
You Want To …
§ Perform Hyper-Parameter Tuning Across *Entire* Pipeline
§ Measure Results of Tuning Both Offline *and* Online
§ Deploy Models Rapidly, Safely, *Directly* in Production
§ Trace and Explain *Live* Model Predictions
6. PIPELINEAI IS OPEN SOURCE
§ https://github.com/PipelineAI/pipeline/
§ Please Star this GitHub Repo!
§ “Each Star is Worth $1,500 in Seed Money”
- A Prominent Venture Capitalist in Silicon Valley
http://jrvis.com/red-dwarf/
9. PIPELINEAI TERMINOLOGY
§ “Flask-App Fallacy”: Flask is Not Enough for Production-izing ML/AI Models
§ “Pipeline”: All Phases Including Train, Validate, Optimize, Deploy, and Predict
§ “Experiment”: Across All Environments from Research Lab to Live Production
§ “Turning Knobs”: Hyper-Parameter Tuning Across All Phases of the Pipeline
§ “Model Serving”: Models Serving Predictions in Live Production
§ “Runtime”: Execution Environment for Any Phase of Pipeline (TensorRT, Caffe)
§ “Train-to-Serve”: Training with Intent to Serve Predictions
§ “Train-Serving Skew”: Model Performs Poorly on Live Data
§ “Post-Training Optimization”: Prepare Model and Runtime for Fast Inference
http://NoFlaskApp.com
10. WHOLE-PIPELINE HYPER-PARAMETER TUNING
§ Any Model, Any Language, Any Framework, Any Hyper-Parameter
§ Any Runtime, Any Device (CPU, GPU, TPU, IoT)
§ Any Network and System Configuration
§ Any Cloud and On-Premise Environment
§ 1,000,000’s of Model + Runtime Pipeline Combinations
§ We Find the Best Combinations For Your Model and Workload!
12. WHOLE-PIPELINE HYPER-PARAMETERS
Training: Hyperparameters
pipelinedb.add("learning_rate", 0.025)
pipelinedb.add("batch_size", 8192)
pipelinedb.add("num_epochs", 100)
^^ THIS IS WHERE MOST DATA SCIENTISTS END BECAUSE ^^
^^ THEY HAVE NO WAY OF COLLECTING ANYTHING MORE ^^
^^ UNTIL NOW! ^^
pipelinedb.add("ec2_instance_type", "g3.4xlarge")
pipelinedb.add("utilized_memory_gigabyte", 20)
pipelinedb.add("network_speed_gigabit", 10)
pipelinedb.add("training_precision_bits", 16)
pipelinedb.add("accelerator_type", "nvidia_gpu_v100") # google_tpu
pipelinedb.add("cpu_to_accelerator_network_type", "pcie") # nvlink
pipelinedb.add("cpu_to_accelerator_network_bandwidth_gigabit", 100)
Training: Results
pipelinedb.add("training_accuracy_percent", 95)
pipelinedb.add("validation_accuracy_percent", 94)
pipelinedb.add("training_auc", 0.70)
pipelinedb.add("validation_auc", 0.69)
pipelinedb.add("time_to_train_seconds", 0.69)
Optimization: Hyperparameters
pipelinedb.add("batch_norm_fusing", True)
pipelinedb.add("weight_quantization_bits", 8) # 2-bit, 7-bit
Optimization: Results (Collected At End of Optimization)
pipelinedb.add("weight_quantization_reduction_percent", 50)
Inference: Hyperparameters
pipelinedb.add("runtime_type", "tfserving") # python, tensorrt
pipelinedb.add("runtime_chip", "gpu")
pipelinedb.add("model_type", "tensorflow") # caffe, scikit
pipelinedb.add("request_batch_window_ms", 10)
pipelinedb.add("request_batch_size", 1000)
Inference: Results (Every ~15 Mins Inside PipelineAI Runtime)
pipelinedb.add("latency_99_percentile_ms", 5)
pipelinedb.add("cost_per_prediction_usd", 0.000001)
pipelinedb.add("24_hr_auc", 0.70)
pipelinedb.add("48_hr_auc", 0.30)
13. WHY EMPHASIS ON MODEL INFERENCE?
Model Training:
§ Batch & Boring
§ Offline in Research Lab
§ Pipeline Ends at Training
§ No Insight into Live Production
§ Small Number of Data Scientists
§ Optimizations Are Very Well-Known
§ 100’s of Training Jobs per Day
Model Inference:
§ Real-Time & Exciting!!
§ Online in Live Production
§ No Ability To Turn Inference Knobs (Yet)
§ Extend Model Validation Into Production
§ Huuuuuuge Number of Application Users
§ Inference Optimizations Not Yet Explored
§ 1,000,000’s of Predictions per Sec
14. GROWTH IN ML/AI MODELS
§ Market Size: $39 Billion in 2017 -> $2 Trillion by 2026
§ Data Scientists: 44,000 (2017) -> 11,500,000 (2026)
§ Models Trained: 200,000 (2017) -> 50,000,000 (2026)
§ Model Predictions: 4,000,000 (2017) -> 250,000,000,000 (2026)
15. MODEL DEPLOYMENT OPTIONS
§ AWS SageMaker
§ Released Nov 2017 @ re:Invent
§ Custom Docker Images for Training/Serving (ie. PipelineAI Images)
§ Distributed TensorFlow Training through Estimator API
§ Traffic Splitting for A/B Model Testing
§ Google Cloud ML Engine
§ Mostly Command-Line Based
§ Driving TensorFlow Open Source API (ie. Estimator API)
§ Azure ML
§ On-Premise Docker, Docker Swarm, Kubernetes, Mesos
PipelineAI Supports All
Hybrid-Cloud, On-Prem,
and Air-Gap Deployments!
16. WHOLE-PIPELINE OPTIMIZATION OPTIONS
§ Model Training Optimizations
§ Model Hyper-Parameters (ie. Learning Rate)
§ Reduced Precision (ie. FP16 Half Precision)
§ Model Optimizations to Prepare for Inference
§ Quantize Model Weights + Activations From 32-bit to 8-bit
§ Fuse Neural Network Layers Together
§ Model Inference Runtime Optimizations
§ Runtime Config: Request Batch Size, etc
§ Different Runtime: TensorFlow Serving CPU/GPU, Nvidia TensorRT
17. NVIDIA TENSOR-RT RUNTIME
§ Post-Training Model Optimizations
§ Specific to Nvidia GPUs
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
18. TENSORFLOW LITE OPTIMIZING CONVERTER
§ Post-Training Model Optimizations
§ Currently Supports iOS and Android
§ On-Device Prediction Runtime
§ Low-Latency, Fast Startup
§ Selective Operator Loading
§ 70KB Min - 300KB Max Runtime Footprint
§ Supports Accelerators (GPU, TPU)
§ Falls Back to CPU without Accelerator
§ Java and C++ APIs
bazel build tensorflow/contrib/lite/toco:toco &&
./bazel-bin/tensorflow/contrib/lite/toco/toco
--input_file=frozen_eval_graph.pb
--output_file=tflite_model.tflite
--input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE
--inference_type=QUANTIZED_UINT8
--input_shape="1,224, 224,3"
--input_array=input
--output_array=outputs
--std_value=127.5 --mean_value=127.5
19. PIPELINEAI QUICK START
§ http://quickstart.pipeline.ai
§ Any Model, Any Training Runtime, Any Prediction Runtime
§ Support for Docker, Docker Swarm, Kubernetes, Mesos
§ Package Model+Runtime into a Docker Image
§ Emphasizes Immutable Deployment and Infrastructure
§ Same Image Across All Environments
§ No Library or Dependency Surprises from Laptop to Production
§ Allows Tuning Offline and Online Model+Runtime Together
20. STEP 1: BUILD MODEL+TRAINING SERVER
§ Train Model with Specific Hyper-Parameters
§ Monitor and Compare Validation Accuracy
§ Tune Hyper-Parameters to Improve Accuracy
pipeline train-server-build --model-name=mnist
--model-tag=A
--model-type=tensorflow
--model-path=./tensorflow/mnist/0.025/model
Build Model
Training Server A
(Learning Rate 0.025)
pipeline train-server-build --model-name=mnist
--model-tag=B
--model-type=tensorflow
--model-path=./tensorflow/mnist/0.050/model
Build Model
Training Server B
(Learning Rate 0.050)
21. STEP 2: TRAIN, MEASURE, TUNE
§ Train Model with Specific Hyper-Parameters
§ Monitor and Compare Validation Accuracy
§ Tune Hyper-Parameters to Improve Accuracy
pipeline train-server-start --model-name=mnist
--model-tag=A
--input-host-path=./tensorflow/mnist/input
--output-host-path=./tensorflow/mnist/output
--train-args= "--learning-rate=0.025 --batch-size=128"
Train
Model A
(Learning Rate 0.025)
pipeline train-server-start --model-name=mnist
--model-tag=B
--input-host-path=./tensorflow/mnist/input
--output-host-path=./tensorflow/mnist/output
--train-args= "--learning-rate=0.025 --batch-size=128"
Train
Model B
(Learning Rate 0.050)
23. STEP 4: BUILD MODEL+PREDICTION SERVER
pipeline predict-server-build --model-name=mnist
--model-tag=C
--model-type=tensorflow
--model-runtime=tensorrt
--model-chip=gpu
--model-path=./tensorflow/mnist/
Build Local
Model Server C
TensorRT GPU
pipeline predict-server-build --model-name=mnist
--model-tag=A
--model-type=tensorflow
--model-runtime=tfserving
--model-chip=cpu
--model-path=./tensorflow/mnist/
Build Local
Model Server A
TF Serving CPU
pipeline predict-server-build --model-name=mnist
--model-tag=B
--model-type=tensorflow
--model-runtime=tfserving
--model-chip=gpu
--model-path=./tensorflow/mnist/
Build Local
Model Server B
TF Serving GPU
Same Model,
3 Different
Prediction
Runtimes
24. STEP 5: PREDICT, MEASURE, TUNE (LOCAL)
§ Perform Mini-Load Test on Local Model Server
§ Immediate Feedback on Prediction Performance
§ Compare to Previous Model+Runtime Variations
§ Gain Intuition Before Pushing to Prod
pipeline predict-server-start --model-name=mnist
--model-tag=A
--memory-limit=2G
Start Local
Model Server
pipeline predict-http-test --model-endpoint-url=http://localhost:8080
--test-request-path=test_request.json
--test-request-concurrency=1000
Start Local
Predict Load Test
25. STEP 6: DEPLOY, MEASURE, TUNE (IN PROD)
§ Deploy from CLI or Jupyter Notebook
§ Tear-Down and Rollback Models Quickly
§ Shadow Canary: Deploy to 20% Live Traffic
§ Split Canary: Deploy to 97-2-1% Live Traffic
pipeline predict-kube-start --model-name=mnist
--model-tag=B
Start Cluster B
pipeline predict-kube-start --model-name=mnist
--model-tag=C
Start Cluster C
pipeline predict-kube-start --model-name=mnist
--model-tag=A
Start Cluster A
pipeline predict-kube-route --model-name=mnist
--model-split-tag-and-weight-dict='{"A":97, "B":2, "C":1}'
--model-shadow-tag-list='[]'
Route Live Traffic
26. STEP 7: OPTIMIZE, MEASURE, RE-DEPLOY
§ Prepare Model for Predicting
§ Simplify Network, Reduce Size
§ Reduce Precision -> Fast Math
§ Some Tools
§ Graph Transform Tool (GTT)
§ tfcompile
pipeline optimize --optimization-list=['quantize_weights','tfcompile']
--model-name=mnist
--model-tag=A
--model-path=./tensorflow/mnist/model
--model-inputs=['x']
--model-outputs=['add']
--output-path=./tensorflow/mnist/optimized_model
Example: a Linear Regression Model Shrinks from 70MB (After Training) to 70K (After Optimizing!)
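Under the hood, quantize_weights is a standard Graph Transform Tool pass. A minimal Python sketch for TF 1.x follows (the frozen-graph paths are illustrative; the input/output names mirror the command above):
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load a frozen GraphDef (path is illustrative)
graph_def = tf.GraphDef()
with tf.gfile.GFile('./tensorflow/mnist/model/frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Quantize 32-bit float weights down to 8 bits to shrink the model
optimized_graph_def = TransformGraph(graph_def, ['x'], ['add'], ['quantize_weights'])

with tf.gfile.GFile('./tensorflow/mnist/optimized_model/graph.pb', 'wb') as f:
    f.write(optimized_graph_def.SerializeToString())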
27. STEP 8: EVALUATE MODEL+RUNTIME VARIANT
§ Offline, Batch Metrics
§ Validation + Training Accuracy
§ CPU + GPU Utilization
§ Online, Live Prediction Values
§ Compare Relative Precision
§ Newly-Seen, Streaming Data
§ Online, Real-Time Metrics
§ Response Time, Throughput
§ Cost ($) Per Prediction
29. STEP 10: SHIFT TRAFFIC TO BEST VARIANT
§ A/B Tests
§ Inflexible and Boring
§ Multi-Armed Bandits
§ Adaptive and Exciting!
pipeline predict-kube-route --model-name=mnist
--model-split-tag-and-weight-dict='{"A":1, "B":2, "C”:97}’
--model-shadow-tag-list='[]'
Dynamically Route
Traffic to Winning
Model+Runtime
30. PIPELINE PROFILING AND TUNING
§ Instrument Code to Generate “Timelines” for Any Metric
§ Analyze with Google Web
Tracing Framework (WTF)
§ Can Also Monitor CPU with top, GPU with nvidia-smi
http://google.github.io/tracing-framework/
from tensorflow.python.client import timeline

# Collect step stats while running the graph
# (assumes an existing `sess` and a fetch such as `train_op` -- illustrative)
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(train_op, options=run_options, run_metadata=run_metadata)

trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as trace_file:
    trace_file.write(trace.generate_chrome_trace_format(show_memory=True))
31. MODEL AND ENSEMBLE TRACING/AUDITING
§ Necessary for Model Explainability
§ Fine-Grained Request Tracing
§ Used for Model Ensembles
32. VIEW REAL-TIME PREDICTION STREAMS
§ Visually Compare Real-time Predictions
(Dashboard: Features and Inputs -> Predictions and Confidences, side-by-side for Model A, Model B, and Model C)
33. CONTINUOUS DATA LABELING AND FIXING
§ Identify and Fix Borderline (Unconfident) Predictions
§ Fix Predictions Along Class Boundaries
§ Facilitate “Human in the Loop”
§ Path to Crowd-Sourced Labeling
§ Retrain with Newly-Labeled Data
§ Game-ify the Labeling Process
34. CONTINUOUS MODEL TRAINING
§ The Holy Grail of Machine Learning
§ Kafka, Kinesis, Spark Streaming, Flink, Storm, Heron
PipelineAI Supports
Continuous Model Training
35. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Traffic Routing
36. AGENDA
Part 1: Optimize TensorFlow Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Cluster Model Training
§ Optimize Training with JIT XLA Compiler
37. SETTING UP TENSORFLOW WITH GPUS
§ Very Painful!
§ Especially inside Docker
§ Use nvidia-docker
§ Especially on Kubernetes!
§ Use the Latest Kubernetes (with Init Script Support)
§ http://pipeline.ai for GitHub + DockerHub Links
39. VOLTA V100 AND TENSOR CORES
§ 84 Streaming Multiprocessors (SM’s)
§ 5,376 GPU Cores
§ 640 Tensor Cores (ie. Google TPU)
§ Can Perform 640 FP16 4x4 Matrix Multiplies
§ 120 TFLOPS = 4x FP32 and 10x FP64
§ Allows Mixed FP16/FP32 Precision Operations
§ Matrix Dims Should be Multiples of 8
§ More Shared Memory
§ New L0 Instruction Cache
§ Faster L1 Data Cache
40. GPU HALF-PRECISION SUPPORT
§ FP32: “Full Precision”, FP16: “Half Precision”
§ Two(2) FP16’s in 1 FP32 GPU Core
§ 2x Throughput!
§ Lower Precision is OK
§ Deep learning is approximate
§ The Network Matters Most
§ Not individual neuron accuracy
41. MORE ON HALF-PRECISION
§ 1997: Related Work by SGI
§ Commercial Request from ILM in 2002
§ Implemented in Silicon by Nvidia in 2002
§ Supported by Pascal P100 and Volta V100
42. MORE ON REDUCED-PRECISION
§ Less Precision => Less Memory & Bandwidth
=> Faster Math & Less Energy
§ Fits into Smaller Places Close to ALU’s
§ 4-bit, 2-bit, 1-bit (?!) Quantization
§ More Layers Help Maintain Accuracy at Reduced Precision
§ Tip: Scale and Center Dynamic Range at Each Layer
§ Otherwise, FP16’s become 0 - model may not converge!
43. GPU: 4-WAY DOT PRODUCT OF 8-BIT INTS
§ GPU Hardware and CUDA Support
§ Compute Capability (CC) >= 6.1
44. FP16 VS. INT8
§ FP16 Has Larger Dynamic Range Than INT8
§ Larger Dynamic Range Allows Higher Precision
§ Truncated FP32 Dynamic Range Higher Than FP16
§ Not IEEE 754 Standard, But Worth Exploring
45. ENABLING FP16 IN TENSORFLOW
§ Harder Than You Think!
§ TPUs are 16-bit Native
GPU’s With CC 5.3+ (Only), Set the Following:
TF_FP16_MATMUL_USE_FP32_COMPUTE=0
TF_FP16_CONV_USE_FP32_COMPUTE=0
TF_XLA_FLAGS=--xla_enable_fast_math=1
Pascal P100 Volta V100
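As a minimal illustration (the shapes and variable names are made up), half precision is mostly a dtype choice once the flags above are set:
# With the env vars above on a CC 5.3+ GPU, this matmul runs in true FP16
x = tf.placeholder(tf.float16, shape=[None, 1024])
W = tf.Variable(tf.truncated_normal([1024, 1024], stddev=0.1, dtype=tf.float16))
y = tf.matmul(x, W)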
46. FP32 VS. FP16 ON AWS GPU INSTANCES
FP16 Half Precision
87.2 T ops/second for p3 Volta V100
4.1 T ops/second for g3 Tesla M60
1.6 T ops/second for p2 Tesla K80
FP32 Full Precision
15.4 T ops/second for p3 Volta V100
4.0 T ops/second for g3 Tesla M60
3.3 T ops/second for p2 Tesla K80
47. § Tesla K80
§ Pascal P100
§ Volta V100 (Beta)
§ TPU (Beta, Google Cloud Only)
GOOGLE CLOUD GPU + TPU
48. GOOGLE CLOUD TPUS
§ Attach/Detach As Needed
§ Scale In/Out As Needed
§ 180 TFlops per Device
§ TPU Pod = 64 TPUs
= 11.5 PetaFlops
§ $6.50 per TPU Hour
§ Supports 16-bit TensorFlow
49. V100 AND CUDA 9
§ Independent Thread Scheduling - Finally!!
§ Similar to CPU fine-grained thread synchronization semantics
§ Allows GPU to yield execution of any thread
§ Still Optimized for SIMT (Same Instruction Multi-Thread)
§ SIMT units automatically scheduled together
§ Explicit Synchronization
P100 V100
New CUDA
Thread Cooperative Groups
https://devblogs.nvidia.com/cooperative-groups/
50. GPU CUDA PROGRAMMING
§ Barbaric, But Fun
§ Must Know Hardware Very Well
§ Hardware Changes are Painful
§ Use the Profilers & Debuggers
51. CUDA STREAMS
§ Asynchronous I/O Transfer
§ Overlap Compute and I/O
§ Keep GPUs Saturated!
§ Used Heavily by TensorFlow
53. NUMBA AND PYCUDA
§ Numba is a JIT Compiler for NumPy-Style Python Code
§ PyCuda is Python Binding for CUDA
54. AGENDA
Part 1: Optimize TensorFlow Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Cluster Model Training
§ Optimize Training with JIT XLA Compiler
55. TRAINING TERMINOLOGY
§ Tensors: N-Dimensional Arrays
§ ie. Scalar, Vector, Matrix
§ Operations: MatMul, Add, SummaryLog,…
§ Graph: Graph of Operations (DAG)
§ Session: Contains Graph(s)
§ Feeds: Feed Inputs into Placeholder
§ Fetches: Fetch Output from Operation
§ Variables: What We Learn Through Training
§ aka “Weights”, “Parameters”
§ Devices: Hardware Device (GPU, CPU, TPU, ...)
(Flow: the User feeds Inputs, TensorFlow performs Operations that flow Tensors and train Variables, and the User fetches Outputs)
with tf.device("/cpu:0,/gpu:15"):
57. TENSORFLOW GRAPH EXECUTION
§ Lazy Execution by Default
§ Similar to Spark
§ Eager Execution
§ Similar to PyTorch
§ "Linearize” Execution Minimizes RAM
§ Useful on Single GPU with Limited RAM
§ May Need to Re-Compute (CPU/GPU) vs Store (RAM)
58. OPERATION PARALLELISM
§ Inter-Op (Between-Op) Parallelism
§ By default, TensorFlow runs multiple ops in parallel
§ Useful for low core and small memory/cache envs
§ Set to one (1)
§ Intra-Op (Within-Op) Parallelism
§ Different threads can use same set of data in RAM
§ Useful for compute-bound workloads (CNNs)
§ Set to # of cores (>=2)
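These two knobs map to the session configuration; a minimal sketch (the thread counts are illustrative):
config = tf.ConfigProto(
    inter_op_parallelism_threads=1,  # run one op at a time (low-core / small-cache envs)
    intra_op_parallelism_threads=8)  # threads inside an op; set to # of cores (>=2)
sess = tf.Session(config=config)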
59. TENSORFLOW MODEL
§ MetaGraph
§ Combines GraphDef and Metadata
§ GraphDef
§ Architecture of your model (nodes, edges)
§ Metadata
§ Asset: Accompanying assets to your model
§ SignatureDef: Maps external to internal tensors
§ Variables
§ Stored separately during training (checkpoint)
§ Allows training to continue from any checkpoint
§ Variables are “frozen” into Constants when preparing for inference
(Diagram: a MetaGraph wraps a GraphDef -- nodes x, W, b, mul, add -- plus Metadata (Assets, SignatureDef, Tags, Version); Variables such as “W”: 0.328 and “b”: -1.407 are stored separately as checkpoints)
60. STOCHASTIC GRADIENT DESCENT (SGD)
§ Or “Simply Go Down” :)
§ Small Batch Sizes Are Ideal
§ But not too small!
§ Parallel, Distributed Training Across Devices
§ Each device calculates gradients on small batch
§ Gradients averaged across all devices
§ Training is Fast, Batches are Small
63. TENSORFLOW + SPARK OPTIONS
§ TensorFlow on Spark (Yahoo!)
§ TensorFrames <-Dead Project->
§ Separate Clusters for Spark and TensorFlow
§ Spark: Boring Batch ETL
§ TensorFlow: Exciting AI Model Training and Serving
§ Hand-Off Point is S3, HDFS, Google Cloud Storage
64. TENSORFLOW + KAFKA
§ TensorFlow Dataset API Now Supports Kafka!!
from tensorflow.contrib.kafka.python.ops import kafka_dataset_ops

repeat_dataset = kafka_dataset_ops.KafkaDataset(topics, group="test", eof=True) \
                                  .repeat(num_epochs)
batch_dataset = repeat_dataset.batch(batch_size)
…
65. TENSORFLOW I/O
§ TFRecord File Format
§ TensorFlow Python and C++ Dataset API
§ Python Module and Packaging
§ Comfort with Python’s Lack of Strong Typing
§ C++ Concurrency Constructs
§ Protocol Buffers
§ Old Queue API
§ GPU/CUDA Memory Tricks And a Lot of Coffee!
66. FEED TENSORFLOW TRAINING PIPELINE
§ Training is Limited by the Ingestion Pipeline
§ Number One Problem We See Today
§ Scaling GPUs Up / Out Doesn’t Help
§ GPUs are Heavily Under-Utilized
§ Use tf.dataset API for best perf
§ Efficient parallel async I/O (C++)
67. DON’T USE FEED_DICT!!
§ feed_dict Requires Python <-> C++ Serialization
§ Not Optimized for Production Ingestion Pipelines
§ Retrieves Next Batch After Current Batch is Done
§ Single-Threaded, Synchronous
§ CPUs/GPUs Not Fully Utilized!
§ Use Queue or Dataset APIs
§ Queues are old & complex
sess.run(train_step, feed_dict={…})
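For comparison, a minimal Dataset-based sketch of the same training step (the filename and parse_fn are illustrative; parse_fn is assumed to return a (features, labels) pair):
# Asynchronous, multi-threaded C++ ingestion -- no per-batch Python <-> C++ copy
dataset = tf.data.TFRecordDataset(['train.tfrecord'])
dataset = dataset.map(parse_fn, num_parallel_calls=4)
dataset = dataset.repeat().batch(128).prefetch(1)
features, labels = dataset.make_one_shot_iterator().get_next()

# The graph pulls its own input; train_step consumes features/labels directly
sess.run(train_step)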
68. DETECT UNDERUTILIZED CPUS, GPUS
§ Instrument Code to Generate “Timelines”
§ Analyze with Google Web
Tracing Framework (WTF)
§ Monitor CPU with top, GPU with nvidia-smi
http://google.github.io/tracing-framework/
from tensorflow.python.client import timeline

# Collect step stats while running the graph
# (assumes an existing `sess` and a fetch such as `train_op` -- illustrative)
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(train_op, options=run_options, run_metadata=run_metadata)

trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as trace_file:
    trace_file.write(trace.generate_chrome_trace_format(show_memory=True))
69. QUEUES
§ More than Traditional Queue
§ Uses CUDA Streams
§ Perform I/O, Pre-processing, Cropping, Shuffling, …
§ Pull from HDFS, S3, Google Storage, Kafka, ...
§ Combine Many Small Files into Large TFRecord Files
§ Use CPUs to Free GPUs for Compute
§ Helps Saturate CPUs and GPUs
70. QUEUE CAPACITY PLANNING
§ batch_size
§ # examples / batch (ie. 64 jpg)
§ Limited by GPU RAM
§ num_processing_threads
§ CPU threads pull and pre-process batches of data
§ Limited by CPU Cores
§ queue_capacity
§ Limited by CPU RAM (ie. 5 * batch_size)
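Putting the three knobs together with the (old) Queue API; a minimal sketch, with illustrative values, where `image`/`label` are assumed to come from a reader + pre-processing stage (remember to start the queue runners):
batch_size = 64                    # limited by GPU RAM
num_processing_threads = 4         # limited by CPU cores
queue_capacity = 5 * batch_size    # limited by CPU RAM

images, labels = tf.train.shuffle_batch(
    [image, label],
    batch_size=batch_size,
    num_threads=num_processing_threads,
    capacity=queue_capacity,
    min_after_dequeue=2 * batch_size)  # buffer enough examples for good shuffling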
71. TF.DTYPE
§ tf.float32, tf.int32, tf.string, etc
§ Default is usually tf.float32
§ Most TF operations support numpy natively
# Tuple of (tf.float32 scalar, tf.int32 array of 100 elements)
(tf.random_uniform([1]), tf.random_uniform([1, 100], maxval=100, dtype=tf.int32))
72. TF.TRAIN.FEATURE
§ Three(3) Feature Types
§ Bytes
§ Float
§ Int64
§ Actually, They Are Lists of 0..* Values of 3 Types Above
§ BytesList
§ FloatList
§ Int64List
73. TF.TRAIN.FEATURES
§ Map of {String -> Feature}
§ Better Name is “FeatureMap”
§ Organize Feature into Categories
§ Access Feature Using Features['feature_name']
75. TF.TRAIN.FEATURELISTS
§ Map of {String -> FeatureList}
§ Better Name is “FeatureListMap”
§ Organize FeatureList into Categories
§ Access FeatureList Using FeatureLists['feature_list_name']
76. TF.TRAIN.EXAMPLE
§ Key-Value Dictionary
§ String -> tf.train.Feature
§ Not a Self-Describing Format (?!)
§ Must Establish Schema Upfront by Writers and Readers
§ Must Obey the Following Conventions
§ Feature K must be of Type T in all Examples
§ Feature K can be omitted, default can be configured
§ If Feature K exists as empty, no default is applied
77. TF.TFRECORD
§ Contains many tf.train.Example’s
=> tf.train.Example contains many tf.train.Feature’s
=> tf.train.Feature contains BytesList, FloatList, Int64List
§ Record-Oriented Format of Binary Strings (ProtoBuffer)
§ Must Convert tf.train.Example to Serialized String
§ Use tf.train.Example.SerializeToString()
§ Used for Large Scale ML/AI Training
§ Not Meant for Random or Non-Sequential Access
§ Compression: GZIP, ZLIB
uint64 length
uint32 masked_crc32_of_length
byte data[length]
uint32 masked_crc32_of_data
78. EMBRACE BINARY FORMATS!
§ Unreadable and Scary, But Much More Efficient
§ Better Use of Memory and Disk Cache
§ Faster Copying and Moving
§ Smaller on the Wire
79. CONVERTING MNIST DATA TO TFRECORD
def convert_to_tfrecord(data, name):
    images = data.images
    labels = data.labels
    num_examples = data.num_examples
    rows = images.shape[1]
    cols = images.shape[2]
    depth = images.shape[3]
    filename = os.path.join(FLAGS.directory, name + '.tfrecords')
    with tf.python_io.TFRecordWriter(filename) as writer:
        for index in range(num_examples):
            image_raw = images[index].tostring()
            example = tf.train.Example(
                features = tf.train.Features(
                    feature = {'height': tf.train.Feature(int64_list=tf.train.Int64List(value=[rows])),
                               'width': tf.train.Feature(int64_list=tf.train.Int64List(value=[cols])),
                               'depth': tf.train.Feature(int64_list=tf.train.Int64List(value=[depth])),
                               'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(labels[index])])),
                               'image_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_raw]))
                              }))
            writer.write(example.SerializeToString())
tf.python_io.TFRecordWriter
80. READING TF.TFRECORD’S
§ tf.data.TFRecordDataset <-- Preferred (Dataset API)
§ tf.TFRecordReader() <-- Not Preferred (Queue API)
§ tf.python_io.tf_record_iterator <-- Preferred
§ Used as Python Generator
for serialized_example in tf.python_io.tf_record_iterator(filename):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)
    image_raw = example.features.feature['image_raw'].bytes_list.value
    height = example.features.feature['height'].int64_list.value[0]
    …
81. DE-SERIALIZING TF.TFRECORD’S
feature_map = {'height': tf.FixedLenFeature([], tf.int64),
               'width': tf.FixedLenFeature([], tf.int64),
               'depth': tf.FixedLenFeature([], tf.int64),
               'label': tf.FixedLenFeature([], tf.int64),
               'image_raw': tf.FixedLenFeature([], tf.string)}
deserialized_features = tf.parse_single_example(serialized_example, features=feature_map)
# Cast height from int64 to int32
height = tf.cast(deserialized_features['height'], tf.int32)
…
# Convert raw image from string to float32
image_raw = tf.decode_raw(deserialized_features['image_raw'], tf.float32)
82. MORE TF.TRAIN.FEATURE CONSTRUCTS
§ tf.VarLenFeature
§ tf.FixedLenFeature, tf.FixedLenSequenceFeature
§ tf.SparseFeature
feature_map = {'height': tf.FixedLenFeature((), tf.int64, …),
               …
               'image_raw': tf.VarLenFeature(tf.string)}
deserialized_features = tf.parse_single_example(serialized_example, features=feature_map)
# Cast height from int64 to int32
height = tf.cast(deserialized_features['height'], tf.int32)
…
# Convert raw image from string to float32
image_raw = tf.decode_raw(deserialized_features['image_raw'], tf.float32)
83. TF.DATA.DATASET
tf.Tensor => tf.data.Dataset:
Dataset.from_tensors((features, labels))
Dataset.from_tensor_slices((features, labels))
TextLineDataset(filenames)
Functional Transformations:
dataset.map(lambda x: tf.decode_jpeg(x))
dataset.repeat(NUM_EPOCHS)
dataset.batch(BATCH_SIZE)
Python Generator => tf.data.Dataset:
def generator():
    while True:
        yield ...
dataset.from_generator(generator, tf.int32)
Dataset => One-Shot Iterator:
iter = dataset.make_one_shot_iterator()
next_element = iter.get_next()
while …:
    sess.run(next_element)
Dataset => Initializable Iterator:
iter = dataset.make_initializable_iterator()
sess.run(iter.initializer, feed_dict=PARAMS)
next_element = iter.get_next()
while …:
    sess.run(next_element)
TIP: Use Dataset.prefetch() and the parallel version of Dataset.map()
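A minimal sketch of that tip (filenames, parse_fn, and BATCH_SIZE are illustrative):
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(parse_fn, num_parallel_calls=4)  # parallel version of map()
dataset = dataset.batch(BATCH_SIZE)
dataset = dataset.prefetch(buffer_size=1)  # stage the next batch while the GPU trains on the current one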
86. CUSTOM TF.PY_FUNC() TRANSFORMATION
§ Custom Python Function
§ Similar to Spark Python UDF (Eek!)
§ You Will Suffer a Big Performance Penalty
§ Try to Use TensorFlow-Native Operations
§ Remember, you can build your own in C++!
87. TF.DATA.ITERATOR TYPES
§ One Shot: Iterates Once Through the Dataset
§ Currently, best Iterator to use with Estimator API
§ Initializable: Runs iterator.initializer() Once
§ Re-Initializable: Runs iterator.initializer() Many
§ Ie. Random shuffling between iterations (epochs) of training
§ Feedable: Switch Between Different Dataset
§ Uses Feed and Placeholder to explicitly feed the iterator
§ Doesn’t require initialization when switching
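A minimal sketch of a Feedable iterator (assumes existing training_dataset/validation_dataset objects and an open session):
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, training_dataset.output_types, training_dataset.output_shapes)
next_element = iterator.get_next()

training_iterator = training_dataset.make_one_shot_iterator()
validation_iterator = validation_dataset.make_initializable_iterator()

training_handle = sess.run(training_iterator.string_handle())
validation_handle = sess.run(validation_iterator.string_handle())

# Switch datasets by feeding a different handle -- no re-initialization needed
sess.run(next_element, feed_dict={handle: training_handle})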
88. TF.DATA.ITERATOR SIMPLE EXAMPLE
dataset = tf.data.Dataset.range(5)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
# Typically `result` will be the output of a model, or an optimizer's
# training operation.
result = tf.add(next_element, next_element)
sess.run(iterator.initializer)
while True:
    try:
        sess.run(result)  # => 0, 2, 4, 6, 8
    except tf.errors.OutOfRangeError:
        print('End of dataset…')
        break
89. TF.DATA.ITERATOR TEXT EXAMPLE
filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.TextLineDataset(filenames)
filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.flat_map(
lambda filename: (
tf.data.TextLineDataset(filename)
.skip(1)
.filter(lambda line: tf.not_equal(tf.substr(line, 0, 1), "#"))))
§ Skip 1st Header Line and Comment Lines Starting with `#`
90. TF.DATA.ITERATOR NUMPY EXAMPLE
# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
features = data["features"]
labels = data["labels"]
# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]
features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)
dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# …Your Dataset Transformations…
iterator = dataset.make_initializable_iterator()
sess.run(iterator.initializer, feed_dict={features_placeholder: features,
labels_placeholder: labels})
91. TF.DATA.ITERATOR TFRECORD EXAMPLE
filenames = tf.placeholder(tf.string, shape=[None])
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...) # Parse the record into tensors.
dataset = dataset.repeat() # Repeat the input indefinitely.
dataset = dataset.batch(32) # Batches of size 32
iterator = dataset.make_initializable_iterator()
# You can feed the initializer with the appropriate filenames for the current
# phase of execution, e.g. training vs. validation.
# Initialize `iterator` with training data.
training_filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
sess.run(iterator.initializer, feed_dict={filenames: training_filenames})
# Initialize `iterator` with validation data.
validation_filenames = ["/var/data/validation1.tfrecord", ...]
sess.run(iterator.initializer, feed_dict={filenames: validation_filenames})
92. FUTURE OF DATASET API
§ Replaces Queue API
§ More Functional Operators
§ Automatic GPU Data Staging and Pre-Fetching
§ Under-utilized GPUs Assisting with Data Ingestion
§ More Profiling and Recommendations for Ingestion
93. TF.ESTIMATOR.ESTIMATOR (1/2)
§ Supports Keras!
§ Unified API for Local + Distributed
§ Provide Clear Path to Production
§ Enable Rapid Model Experiments
§ Provide Flexible Parameter Tuning
§ Enable Downstream Optimizing & Serving Infrastructure
§ Nudge Users to Best Practices Through Opinions
§ Provide Hooks/Callbacks to Override Opinions
94. TF.ESTIMATOR.ESTIMATOR (2/2)
§ “Train-to-Serve” Design
§ Create Custom Estimator or Re-Use Canned Estimator
§ Hides Session, Graph, Layers, Iterative Loops (Train, Eval, Predict)
§ Hooks for All Phases of Model Training and Evaluation
§ Load Input: input_fn()
§ Train: model_fn() and train()
§ Evaluate: eval_fn() and evaluate()
§ Performance Metrics: Loss, Accuracy, …
§ Save and Export: export_savedmodel()
§ Predict: predict() Uses the slow sess.run()
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/customestimator/
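A minimal custom-Estimator sketch showing where model_fn(), train(), and the EstimatorSpec fit (the linear model and input_fn are illustrative):
def model_fn(features, labels, mode):
    logits = tf.layers.dense(features['x'], units=10)  # trivial model -- illustrative
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=tf.argmax(logits, axis=1))
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn)
estimator.train(input_fn=input_fn, steps=1000)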
95. TF.CONTRIB.LEARN.EXPERIMENT
§ Easier-to-Use Distributed TensorFlow
§ Same API for Local and Distributed
§ Combines Estimator with input_fn()
§ Used for Training, Evaluation, & Hyper-Parameter Tuning
§ Distributed Training Defaults to Data-Parallel & Async
§ Cluster Configuration is Fixed at Start of Training Job
§ No Auto-Scaling Allowed, but That’s OK for Training
Note: The Experiment API Will Likely Be Deprecated Soon
96. ESTIMATOR + EXPERIMENT CONFIGS
§ TF_CONFIG
§ Special environment variable for config
§ Defines ClusterSpec in JSON incl. master, workers, PS’s
§ Distributed mode: '{"environment":"cloud"}'
§ Local mode: '{"environment":"local", "task":{"type":"worker"}}'
§ RunConfig: Defines checkpoint interval, output directory, etc.
§ HParams: Hyper-parameter tuning parameters and ranges
§ learn_runner creates RunConfig before calling run() & tune()
§ schedule is set based on {”task”:{”type”:…}}
TF_CONFIG=
'{
"environment": "cloud",
"cluster":
{
"master":["worker0:2222”],
"worker":["worker1:2222"],
"ps": ["ps0:2222"]
},
"task": {"type": "ps",
"index": "0"}
}'
97. ESTIMATOR + KERAS
§ Distributed TensorFlow (Estimator) + Easy to Use (Keras)
§ tf.keras.estimator.model_to_estimator()
# Instantiate a Keras inception v3 model.
keras_inception_v3 = tf.keras.applications.inception_v3.InceptionV3(weights=None)
# Compile model with the optimizer, loss, and metrics you'd like to train with.
keras_inception_v3.compile(optimizer=tf.keras.optimizers.SGD(lr=0.0001, momentum=0.9),
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])
# Create an Estimator from the compiled Keras model.
est_inception_v3 = tf.keras.estimator.model_to_estimator(keras_model=keras_inception_v3)
# Treat the derived Estimator as you would any other Estimator. For example,
# the following derived Estimator calls the train method:
est_inception_v3.train(input_fn=my_training_set, steps=2000)
98. “CANNED” ESTIMATORS
§ Commonly-Used Estimators
§ Pre-Tested and Pre-Tuned
§ DNNClassifer, TensorForestEstimator
§ Always Use Canned Estimators If Possible
§ Reduce Lines of Code, Complexity, and Bugs
§ Use FeatureColumn to Define & Create Features
Custom vs. Canned
@ Google, August 2017
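A minimal canned-Estimator sketch (the feature shape, layer sizes, and train_input_fn are illustrative):
feature_columns = [tf.feature_column.numeric_column('x', shape=[784])]
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[256, 64],
    n_classes=10)
estimator.train(input_fn=train_input_fn, steps=2000)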
99. ESTIMATOR + DATASET API
def input_fn():
def generator():
while True:
yield ...
my_dataset = tf.data.Dataset.from_generator(generator, tf.int32)
# A one-shot iterator automatically initializes itself on first use.
iter = my_dataset.make_one_shot_iterator()
# The return value of get_next() matches the dataset element type.
images, labels = iter.get_next()
return images, labels
# The input_fn can be used as a regular Estimator input function.
estimator = tf.estimator.Estimator(…)
estimator.train(input_fn=input_fn, …)
106. TF.CONTRIB.LEARN.HEAD (OBJECTIVES)
§ Single-Objective Estimator
§ Single classification prediction
§ Multi-Objective Estimator
§ One (1) classification prediction
§ One (1) final layer to feed into next model
§ Multiple Heads Used to Ensemble Models
§ Treats neural network as a feature engineering step
§ Supported by TensorFlow Serving
107. TF.LAYERS
§ Standalone Layer or Entire Sub-Graphs
§ Functions of Tensor Inputs & Outputs
§ Mix and Match with Operations
§ Assumes 1st Dimension is Batch Size
§ Handles One (1) to Many (*) Inputs
§ Metrics are Layers
§ Loss Metric (Per Mini-Batch)
§ Accuracy and MSE (Across Mini-Batches)
108. TF.FEATURE_COLUMN
§ Used by Canned Estimator
§ Declaratively Specify Training Inputs
§ Converts Sparse to Dense Tensors
§ Sparse Features: Query Keyword, ProductID
§ Dense Features: One-Hot, Multi-Hot
§ Wide/Linear: Use Feature-Crossing
§ Deep: Use Embeddings
109. TF.FEATURE_COLUMN EXAMPLE
§ Continuous + One-Hot + Embedding
deep_columns = [
age,
education_num,
capital_gain,
capital_loss,
hours_per_week,
tf.feature_column.indicator_column(workclass),
tf.feature_column.indicator_column(education),
tf.feature_column.indicator_column(marital_status),
tf.feature_column.indicator_column(relationship),
# To show an example of embedding
tf.feature_column.embedding_column(occupation, dimension=8),
]
110. FEATURE CROSSING
§ Create New Features by Combining Existing Features
§ Limitation: Combinations Must Exist in Training Dataset
base_columns = [
education, marital_status, relationship, workclass, occupation, age_buckets
]
crossed_columns = [
tf.feature_column.crossed_column(
['education', 'occupation'], hash_bucket_size=1000),
tf.feature_column.crossed_column(
['age_buckets', 'education', 'occupation'], hash_bucket_size=1000)
]
111. SEPARATE TRAINING + EVALUATION
§ Separate Training and Evaluation Clusters
§ Evaluate Upon Checkpoint
§ Avoid Resource Contention
§ Training Continues in Parallel with Evaluation
[Diagram: separate Training, Evaluation, and Parameter Server clusters]
112. BATCH (RE-)NORMALIZATION (2015, 2017)
§ Each Mini-Batch May Have Wildly Different Distributions
§ Normalize per Batch (and Layer)
§ Faster Training, Learns Quicker
§ Final Model is More Accurate
§ TensorFlow is Already on its 2nd-Generation Batch Norm Algorithm (Batch Renormalization, 2017)
§ First-Class Support for Fusing Batch Norm Layers
§ Final mean + variance Are Folded Into Graph Later
-- (Almost) Always Use Batch (Re-)Normalization! --
z = tf.matmul(a_prev, W)
a = tf.nn.relu(z)
# Per-channel mean and variance across the batch dimension (axis 0).
a_mean, a_var = tf.nn.moments(a, [0])
scale = tf.Variable(tf.ones([depth]))   # one scale (gamma) per channel
beta = tf.Variable(tf.zeros([depth]))   # one offset (beta) per channel
bn = tf.nn.batch_normalization(a, a_mean, a_var, beta, scale, 0.001)
113. DROPOUT (2014)
§ Training Technique
§ Prevents Overfitting
§ Helps Avoid Local Minima
§ Inherent Ensembling Technique
§ Creates and Combines Different Neural Architectures
§ Expressed as Probability Percentage (ie. 50%)
§ Boost Other Weights During Validation & Prediction
[Diagram: Perform Dropout at 50% (Training Phase) vs. Boost for Dropout at 0% (Validation & Prediction Phase)]
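A minimal graph-mode sketch; the keep probability is fed at runtime (e.g. 0.5 for training, 1.0 for validation/prediction). Note tf.nn.dropout implements "inverted dropout": it scales the kept activations by 1/keep_prob at training time, so serving with keep_prob=1.0 needs no extra boosting:
keep_prob = tf.placeholder(tf.float32)
dropped = tf.nn.dropout(hidden, keep_prob=keep_prob)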
114. BATCH NORM, DROPOUT + ESTIMATOR API
§ Must Specify Evaluation or Training Mode
§ These Will Behave Differently Depending on Mode
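A minimal sketch of mode handling inside a custom model_fn (the feature key and dropout rate are illustrative):
def model_fn(features, labels, mode, params):
    training = (mode == tf.estimator.ModeKeys.TRAIN)
    net = tf.layers.batch_normalization(features['x'], training=training)
    net = tf.layers.dropout(net, rate=0.5, training=training)
    ...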
115. SAVED MODEL FORMAT
§ Different Format than Traditional Exporter
§ Contains Checkpoints, 1..* MetaGraphs, and Assets
§ Export Manually with SavedModelBuilder
§ Estimator.export_savedmodel()
§ Hooks to Generate SignatureDef
§ Use saved_model_cli to Verify
§ Used by TensorFlow Serving
§ New Standard Export Format? (Catching on Slowly…)
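A minimal export sketch (the feature spec and export path are illustrative):
feature_spec = {'x': tf.FixedLenFeature([784], tf.float32)}
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    feature_spec)
estimator.export_savedmodel('/tmp/exported_model', serving_input_fn)
# Verify: saved_model_cli show --dir /tmp/exported_model/<version> --all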
116. TENSORFLOW DEBUGGER
§ Step through Operations
§ Inspect Inputs and Outputs
§ Wrap Session in Debug Session
from tensorflow.python import debug as tf_debug

sess = tf.Session(config=config)
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
https://www.tensorflow.org/programmers_guide/debugger
117. AGENDA
Part 1: Optimize TensorFlow Training
§ GPUs and TensorFlow
§ Train, Inspect, and Debug TensorFlow Models
§ TensorFlow Distributed Cluster Model Training
§ Optimize Training with JIT XLA Compiler
118. SINGLE NODE, MULTI-GPU TRAINING
§ cpu:0
§ By default, all CPUs
§ Requires extra config to target a CPU
§ gpu:0..n
§ Each GPU has a unique id
§ TF usually prefers a single GPU
§ xla_cpu:0, xla_gpu:0..n
§ “JIT Compiler Device”
§ Hints TensorFlow to attempt JIT Compile
with tf.device("/cpu:0"): …
with tf.device("/gpu:0"): …
with tf.device("/gpu:1"): …
[Diagram: ops placed across GPU 0 and GPU 1]
119. DISTRIBUTED, MULTI-NODE TRAINING
§ TensorFlow Automatically Inserts Send and Receive Ops into Graph
§ Parameter Server Synchronously Aggregates Updates to Variables
§ Nodes with Multiple GPUs will Pre-Aggregate Before Sending to PS
[Diagram: Single Node (Worker0 with gpu0..gpu3) vs. Multiple Nodes (Worker0, Worker1, Worker2, each with multiple GPUs)]
120. DATA PARALLEL VS. MODEL PARALLEL
§ Data Parallel (“Between-Graph Replication”)
§ Send exact same model to each device
§ Each device operates on partition of data
§ ie. Spark sends same function to many workers
§ Each worker operates on their partition of data
§ Model Parallel (“In-Graph Replication”)
§ Send different partition of model to each device
§ Each device operates on all data
§ Difficult, but required for larger models with lower-memory GPUs
121. SYNCHRONOUS VS. ASYNCHRONOUS
§ Synchronous
§ Nodes compute gradients
§ Nodes update Parameter Server (PS)
§ Nodes sync on PS for latest gradients
§ Asynchronous
§ Nodes compute gradients at their own pace
§ Nodes update PS independently, without syncing on each other
§ Nodes may read stale gradients from PS
§ May not converge due to stale reads!
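A hedged sketch of the synchronous style with tf.train.SyncReplicasOptimizer (the replica counts are illustrative):
opt = tf.train.SyncReplicasOptimizer(
    tf.train.GradientDescentOptimizer(0.01),
    replicas_to_aggregate=3,   # PS waits for 3 worker updates per step
    total_num_replicas=3)
train_op = opt.minimize(loss, global_step=tf.train.get_global_step())
# Pass this hook to MonitoredTrainingSession so workers synchronize:
sync_hook = opt.make_session_run_hook(is_chief=True)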
122. CHIEF WORKER
§ Chief Defaults to Worker Task 0
§ Task 0 is guaranteed to exist
§ Performs Maintenance Tasks
§ Writes log summaries
§ Instructs PS to checkpoint vars
§ Performs PS health checks
§ (Re-)Initialize variables at (re-)start of training
123. NODE AND PROCESS FAILURES
§ Checkpoint to Persistent Storage (HDFS, S3)
§ Use MonitoredTrainingSession and Hooks
§ Use a Good Cluster Orchestrator (ie. Kubernetes, Mesos)
§ Understand Failure Modes and Recovery States
§ Stateless, Not Bad: Training Continues
§ Stateful, Bad: Training Must Stop
§ Dios Mio! Long Night Ahead…
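A minimal sketch of fault-tolerant training (the checkpoint path, step limit, and task_index are illustrative); on restart, the session restores from the last checkpoint automatically:
hooks = [tf.train.StopAtStepHook(last_step=100000)]
with tf.train.MonitoredTrainingSession(
        master=server.target,
        is_chief=(task_index == 0),
        checkpoint_dir='s3://my-bucket/checkpoints',
        hooks=hooks) as sess:
    while not sess.should_stop():
        sess.run(train_op)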
125. AGENDA
Part 1: Optimize TensorFlow Training
§ GPUs and TensorFlow
§ Train, Inspect, and Debug TensorFlow Models
§ TensorFlow Distributed Cluster Model Training
§ Optimize Training with JIT XLA Compiler
126. XLA FRAMEWORK
§ XLA: “Accelerated Linear Algebra”
§ Reduce Reliance on Custom Operators
§ Intermediate Representation used by Hardware Vendors
§ Improve Portability
§ Increase Execution Speed
§ Decrease Memory Usage
§ Decrease Mobile Footprint
Helps TensorFlow Be Flexible AND Performant!!
127. XLA HIGH LEVEL OPTIMIZER (HLO)
§ HLO: “High Level Optimizer”
§ Compiler Intermediate Representation (IR)
§ Independent of source and target language
§ XLA Step 1 Emits Target-Independent HLO
§ XLA Step 2 Emits Target-Dependent LLVM
§ LLVM Emits Native Code Specific to Target
§ Supports x86-64, ARM64 (CPU), and NVPTX (GPU)
128. XLA IS DESIGNED FOR RE-USE
§ Pluggable Backends
§ HLO “Toolkit”
§ Call BLAS or cuDNN
§ Use LLVM or BYO Low-Level-Optimizer
135. XLA PERFORMANCE OPTIMIZATIONS
§ JIT Training
§ MNIST: 30% Speed Up
§ Inception: 20% Speed Up
§ Basic LSTM: 80% Speed Up
§ Translation Model BNMT: 20% Speed Up
§ AOT Inference (Next Section)
§ LSTM Model Size: 1 MB => 10 KB
136. JIT COMPILER
§ JIT: “Just-In-Time” Compiler
§ Built on XLA Framework
§ Reduce Memory Movement – Especially with GPUs
§ Reduce Overhead of Multiple Function Calls
§ Similar to Spark Operator Fusing in Spark 2.0
§ Unroll Loops, Fuse Operators, Fold Constants, …
§ Scopes: session, device, with jit_scope():
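A minimal sketch of both enablement styles; the scoped API lives in contrib and is experimental:
# Session-wide JIT:
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = (
    tf.OptimizerOptions.ON_1)
sess = tf.Session(config=config)

# Scoped JIT:
jit_scope = tf.contrib.compiler.jit.experimental_jit_scope
with jit_scope():
    y = tf.matmul(x, w)   # ops in this scope are hinted for XLA compilation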
138. VISUALIZING JIT COMPILER IN ACTION
[Traces: Before JIT vs. After JIT]
Google Web Tracing Framework:
http://google.github.io/tracing-framework/
from tensorflow.python.client import timeline

# Enable tracing for this step.
run_options = tf.RunOptions(trace_level=tf.RunOptions.SOFTWARE_TRACE)
run_metadata = tf.RunMetadata()
sess.run(…, options=run_options, run_metadata=run_metadata)

# Write the collected step stats in Chrome trace format for visualization.
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as trace_file:
    trace_file.write(
        trace.generate_chrome_trace_format(show_memory=True))
140. XLA COMPILATION SUMMARY
§ Generates Code and Libraries for Your Computation
§ Packages Only the Libraries Needed by Your Computation
§ Eliminates Dispatch Overhead of Operations
§ Fuses Operations to Avoid Memory Round Trip
§ Analyzes Buffers to Reuse Memory
§ Updates Memory In-Place
§ Unrolls Loops with Your Data Dimensions (ie. Batch Size)
§ Vectorizes Operations Specific to Your Data Dimensions
141. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Traffic Routing
143. AGENDA
Part 2: Optimize TensorFlow Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
144. AOT COMPILER
§ Standalone, Ahead-Of-Time (AOT) Compiler
§ Built on XLA framework
§ tfcompile
§ Creates executable with minimal TensorFlow Runtime needed
§ Includes only dependencies needed by subgraph computation
§ Creates functions with feeds (inputs) and fetches (outputs)
§ Packaged as cc_library header and object files to link into your app
§ Commonly used for mobile device inference graph
§ Currently, only CPU x86-64 and ARM are supported - no GPU
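A hedged sketch of a tfcompile Bazel target (the file names and C++ class name are illustrative); the config file declares the feeds and fetches of the subgraph:
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

tf_library(
    name = "my_graph_compiled",
    graph = "my_graph.pb",
    config = "my_graph.config.pbtxt",  # feeds (inputs) and fetches (outputs)
    cpp_class = "mynamespace::MyGraph",
)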
145. GRAPH TRANSFORM TOOL (GTT)
§ Post-Training Optimization to Prepare for Inference
§ Remove Training-only Ops (checkpoint, dropout, logs)
§ Remove Unreachable Nodes between Given feed -> fetch
§ Fuse Adjacent Operators to Improve Memory Bandwidth
§ Fold Final Batch Norm mean and variance into Variables
§ Round Weights/Variables to improve compression (ie. 70%)
§ Quantize (FP32 -> INT8) to Speed Up Math Operations
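A minimal sketch using the tool's Python wrapper (the graph path and input/output node names are illustrative); the transform names match the slides that follow:
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_model.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

optimized_graph_def = TransformGraph(
    graph_def,
    ['inputs'],    # feeds
    ['softmax'],   # fetches
    ['strip_unused_nodes',
     'remove_nodes(op=Identity)',
     'fold_constants(ignore_errors=true)',
     'fold_batch_norms',
     'quantize_weights'])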
148. AFTER STRIPPING UNUSED NODES
§ Optimizations
§ strip_unused_nodes
§ Results
§ Graph much simpler
§ File size much smaller
149. AFTER REMOVING UNUSED NODES
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ Results
§ Pesky nodes removed
§ File size a bit smaller
150. AFTER FOLDING CONSTANTS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ Results
§ Placeholders (feeds) -> Variables*
(*Why Variables and not Constants?)
151. AFTER FOLDING BATCH NORMS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ Results
§ Graph remains the same
§ File size approximately the same
152. AFTER QUANTIZING WEIGHTS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ quantize_weights
§ Results
§ Graph is same, file size is smaller, compute is faster
153. WEIGHT (VARIABLE) QUANTIZATION
§ FP16 or INT8: Smaller & Computationally Faster than FP32
§ Easy to “Linearly Quantize” (Re-Encode) FP32 -> INT8
Easy Breezy!
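An illustrative symmetric linear quantization in NumPy, just to show the arithmetic:
import numpy as np

w = np.random.randn(1024).astype(np.float32)
scale = np.abs(w).max() / 127.0                    # map [-max, max] -> [-127, 127]
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale     # dequantize for comparison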
154. BENEFITS OF 32-BIT TO 8-BIT QUANTIZE
§ First Class Hardware and CUDA Support
§ One 32-Bit GPU Core: 4-Way Dot Product of 8-Bit Ints
§ GPU Compute Capability (CC) >= 6.1 Only
155. ACTIVATION QUANTIZATION
§ Activations Not Known Ahead of Time
§ Depends on input, not easy to quantize
§ Requires Additional Calibration Step
§ Use representative, diverse validation dataset
§ ~1000 samples, ~10 minutes, cheap hardware
§ Run 32-Bit Inference with Calibration Data
§ Collect histogram of activation values at each layer
§ Generate many quantized distributions at diff saturation thresholds
§ Choose Saturation Threshold That Minimizes Accuracy Loss
156. CHOOSING SATURATION THRESHOLD
§ Trade-off Between Range & Precision
§ INT8 Should Encode Same Information As Original FP32
§ Minimize Loss of Information Across Encoding/Distributions
§ Use KL_Divergence(32bit_dist, 8bit_dist)
§ Compares 2 distributions
§ Similar to Cross-Entropy
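A toy sketch of the threshold search (not TensorRT's exact algorithm; the bin count and epsilon are illustrative):
import numpy as np
from scipy.stats import entropy   # entropy(p, q) computes KL(p || q)

def best_threshold(activations, candidate_thresholds, bins=2048):
    ref_hist, edges = np.histogram(activations, bins=bins, density=True)
    best_t, best_kl = None, np.inf
    for t in candidate_thresholds:
        clipped = np.clip(activations, -t, t)              # saturate at threshold t
        q_hist, _ = np.histogram(clipped, bins=edges, density=True)
        kl = entropy(ref_hist + 1e-10, q_hist + 1e-10)     # avoid zero bins
        if kl < best_kl:
            best_t, best_kl = t, kl
    return best_t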
157. SATURATE TO MINIMIZE ACCURACY LOSS
§ Helps Preserve Accuracy After Activation Quantization
§ Goal: Find Threshold (T) That Minimizes Accuracy Loss
[Plots: No Saturation vs. Saturation at threshold T]
158. AUTO-CALIBRATE: PIPELINEAI + TENSOR-RT
Pre-Requisites
§ 32-Bit Trained Model (TensorFlow, Caffe)
§ Small Calibration Dataset (Validation)
PipelineAI + TensorRT Optimizations
§ Run 32-Bit Inference on Calibration Dataset
§ Collect Required Statistics
§ Use KL_Divergence to Determine Saturation Thresholds
§ Perform 32-Bit Float -> 8-Bit Int Quantization
§ Generate Calibration Table and INT8 Execution Engine
159. 32-BIT TO 8-BIT QUANTIZATION RESULTS
Accuracy of INT8 Models Comparable to FP32
163. AGENDA
Part 2: Optimize TensorFlow Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
164. MODEL SERVING TERMINOLOGY
§ Inference
§ Only Forward Propagation through Network
§ Predict, Classify, Regress, …
§ Bundle
§ GraphDef, Variables, Metadata, …
§ Assets
§ ie. Map of ClassificationID -> String
§ {9283: “penguin”, 9284: “bridge”}
§ Version
§ Every Model Has a Version Number (Integer)
§ Version Policy
§ ie. Serve Only Latest (Highest), Serve Both Latest and Previous, …
165. TENSORFLOW SERVING FEATURES
§ Supports Auto-Scaling
§ Custom Loaders beyond File-based
§ Tune for Low-latency or High-throughput
§ Serve Diff Models/Versions in Same Process
§ Customize Models Types beyond HashMap and TensorFlow
§ Customize Version Policies for A/B and Bandit Tests
§ Support Request Draining for Graceful Model Updates
§ Enable Request Batching for Diff Use Cases and HW
§ Supports Optimized Transport with GRPC and Protocol Buffers
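For reference, the standard server starts with a single command (the port, model name, and path are illustrative):
tensorflow_model_server --port=9000 --model_name=mnist \
  --model_base_path=/models/mnist --enable_batching=true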
167. PREDICTION SERVICE
§ Predict (Original, Generic)
§ Input: List of Tensor
§ Output: List of Tensor
§ Classify
§ Input: List of tf.Example (key, value) pairs
§ Output: List of (class_label: String, score: float)
§ Regress
§ Input: List of tf.Example (key, value) pairs
§ Output: List of (label: String, score: float)
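A hedged sketch of a Python Predict client (the host, model name, and input shape are illustrative; the _pb2_grpc stubs ship with newer tensorflow-serving-api releases):
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:9000')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'mnist'
image = np.zeros(784, dtype=np.float32)   # dummy input
request.inputs['x'].CopyFrom(
    tf.contrib.util.make_tensor_proto(image, shape=[1, 784]))
response = stub.Predict(request, timeout=5.0)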
169. MULTI-HEADED INFERENCE
§ Inputs Pass Through Model One Time
§ Model Returns Multiple Predictions:
1. Human-readable prediction (ie. “penguin”, “church”,…)
2. Final layer of scores (float vector)
§ Final Layer of floats Pass to the Next Model in Ensemble
§ Optimizes Bandwidth, CPU/GPU, Latency, Memory
§ Enables Complex Model Composing and Ensembling
170. BUILD YOUR OWN MODEL SERVER
§ Adapt GRPC (Google) <-> HTTP (REST of the World)
§ Perform Batch Inference vs. Request/Response
§ Handle Requests Asynchronously
§ Support Mobile, Embedded Inference
§ Customize Request Batching
§ Add Circuit Breakers, Fallbacks
§ Control Latency Requirements
§ Reduce Number of Moving Parts
#include <memory>
#include "tensorflow_serving/model_servers/server_core.h"

using tensorflow::serving::ServerCore;

int main() {
  ServerCore::Options options;
  // set options (model name, path, etc)
  std::unique_ptr<ServerCore> core;
  TF_CHECK_OK(ServerCore::Create(std::move(options), &core));
  // core now loads and serves the configured models
}
Compile and Link with libtensorflow.so
171. RUNTIME OPTION: NVIDIA TENSOR-RT
§ Post-Training Model Optimizations
§ Specific to Nvidia GPU
§ Similar to TF Graph Transform Tool
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
172. AGENDA
Part 2: Optimize TensorFlow Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
174. REQUEST BATCH TUNING
§ max_batch_size
§ Enables throughput/latency tradeoff
§ Bounded by RAM
§ batch_timeout_micros
§ Defines batch time window, latency upper-bound
§ Bounded by RAM
§ num_batch_threads
§ Defines parallelism
§ Bounded by CPU cores
§ max_enqueued_batches
§ Defines queue upper bound, throttling
§ Bounded by RAM
Reaching either threshold (size or timeout) will trigger a batch.
[Diagram: separate, non-batched requests vs. combined, batched requests]
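A hedged sketch of a batching parameters file, passed to the server via --batching_parameters_file (all values are illustrative starting points):
max_batch_size { value: 128 }
batch_timeout_micros { value: 10000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 1000000 }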
175. ADVANCED BATCHING & SERVING TIPS
§ Batch Just the GPU/TPU Portions of the Computation Graph
§ Batch Arbitrary Sub-Graphs using Batch / Unbatch Graph Ops
§ Distribute Large Models Into Shards Across TensorFlow Model Servers
§ Batch RNNs Used for Sequential and Time-Series Data
§ Find Best Batching Strategy For Your Data Through Experimentation
§ BasicBatchScheduler: Homogeneous requests (ie. Regress or Classify)
§ SharedBatchScheduler: Mixed requests, multi-step, ensemble predict
§ StreamingBatchScheduler: Mixed CPU/GPU/IO-bound Workloads
§ Serve Only One (1) Model Inside One (1) TensorFlow Serving Process
§ Much Easier to Debug, Tune, Scale, and Manage Models in Production.
177. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Traffic Routing
178. AGENDA
Part 3: Advanced Model Serving + Traffic
Routing
§ Kubernetes Ingress, Egress, Networking
§ Istio and Envoy Architecture
§ Intelligent Traffic Routing and Scaling
§ Metrics, Chaos Monkey, Production Readiness
179. KUBERNETES PRIORITY SCHEDULING
Workloads can …
§ Access the entire cluster, up to the autoscaler max size
§ Trigger autoscaling until a higher-priority workload arrives
§ "Fill the cracks" of resource usage left by higher-priority work
(i.e., wait to run until resources are freed)
180. KUBERNETES INGRESS
§ Single Service
§ Can also use Service (LoadBalancer or NodePort)
§ Fan Out & Name-Based Virtual Hosting
§ Route Traffic Using Path or Host Header
§ Reduces # of load balancers needed
§ 404 Implemented as default backend
§ Federation / Hybrid-Cloud
§ Creates Ingress objects in every cluster
§ Monitors health and capacity of pods within each cluster
§ Routes clients to appropriate backend anywhere in federation
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: gateway-fanout
  annotations:
    kubernetes.io/ingress.class: istio
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: s1
          servicePort: 80
      - path: /bar
        backend:
          serviceName: s2
          servicePort: 80
Fan Out (Path)
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: gateway-virtualhost
  annotations:
    kubernetes.io/ingress.class: istio
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - backend:
          serviceName: s1
          servicePort: 80
  - host: bar.foo.com
    http:
      paths:
      - backend:
          serviceName: s2
          servicePort: 80
Virtual Hosting
181. KUBERNETES INGRESS CONTROLLER
§ Ingress Controller Types
§ Google Cloud: kubernetes.io/ingress.class: gce
§ Nginx: kubernetes.io/ingress.class: nginx
§ Istio: kubernetes.io/ingress.class: istio
§ Must Start Ingress Controller Manually
§ Just deploying Ingress is not enough
§ Not started by kube-controller-manager
§ Start Istio Ingress Controller
kubectl apply -f $ISTIO_INSTALL_PATH/install/kubernetes/istio.yaml
193. ISTIO AUTO-SCALING
§ Traffic Routing and Auto-Scaling Occur Independently
§ Istio Continues to Obey Traffic Splits After Auto-Scaling
§ Auto-Scaling May Occur In Response to New Traffic Route
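A hedged sketch of pairing a routed deployment with a standard HorizontalPodAutoscaler (the names and thresholds are illustrative); Istio keeps honoring the traffic split as replicas scale:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: predict-mnist-b
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: predict-mnist-b
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80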
194. A/B & BANDIT MODEL TESTING
§ Perform Live Experiments in Production
§ Compare Existing Model A with Model B, Model C
§ Safe Split-Canary Deployment
§ Pro Tip: Keep Ingress Simple – Use Route Rules Instead!
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: predict-mnist-20-5-75
spec:
  destination:
    name: predict-mnist
  precedence: 2    # Greater than global deny-all
  route:
  - labels:
      version: A
    weight: 20     # 20% still routes to model A
  - labels:
      version: B
    weight: 5      # 5% routes to new model B
  - labels:
      version: C
    weight: 75     # 75% routes to new model C
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: predict-mnist-1-2-97
spec:
  destination:
    name: predict-mnist
  precedence: 2    # Greater than global deny-all
  route:
  - labels:
      version: A
    weight: 1      # 1% routes to model A
  - labels:
      version: B
    weight: 2      # 2% routes to new model B
  - labels:
      version: C
    weight: 97     # 97% routes to new model C
---
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: predict-mnist-97-2-1
spec:
  destination:
    name: predict-mnist
  precedence: 2    # Greater than global deny-all
  route:
  - labels:
      version: A
    weight: 97     # 97% still routes to model A
  - labels:
      version: B
    weight: 2      # 2% routes to new model B
  - labels:
      version: C
    weight: 1      # 1% routes to new model C
195. AGENDA
Part 3: Advanced Model Serving + Traffic
Routing
§ Kubernetes Ingress, Egress, Networking
§ Istio and Envoy Architecture
§ Intelligent Traffic Routing and Scaling
§ Metrics, Chaos Monkey, Production Readiness
198. SPECIAL THANKS TO CHRISTIAN POSTA
§ http://blog.christianposta.com/istio-workshop
199. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Traffic Routing
202. THANK YOU!!
§ Please Star this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline
Contact Me
chris@pipeline.ai
@cfregly