PipelineAI + TensorFlow AI + Spark ML + Kubernetes + Istio + AWS SageMaker + Google Cloud ML + Azure ML

Advanced Spark and TensorFlow Meetup - Dec 12, 2017

https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/244971261/

http://pipeline.ai

Published in: Software

  1. 1. HIGH PERFORMANCE MODEL SERVING WITH KUBERNETES AND ISTIO… …AND AWS SAGEMAKER, GOOGLE CLOUD ML, AZURE ML! CHRIS FREGLY FOUNDER @ PIPELINE.AI
  2. 2. RECENT PIPELINE.AI NEWS Sept 2017 Dec 2017
  3. 3. INTRODUCTIONS: ME § Chris Fregly, Founder & Engineer @PipelineAI § Formerly Netflix, Databricks, IBM Spark Tech § Advanced Spark and TensorFlow Meetup § Please Join Our 60,000+ Global Members!! Contact Me chris@pipeline.ai @cfregly Global Locations * San Francisco * Chicago * Austin * Washington DC * Dusseldorf * London
  4. 4. INTRODUCTIONS: YOU § Software Engineer, DevOps Engineer, Data {Scientist, Engineer} § Interested in Optimizing and Deploying TF Models to Production § Nice to Have a Working Knowledge of TensorFlow (Not Required)
  5. 5. PIPELINE.AI IS 100% OPEN SOURCE § https://github.com/PipelineAI/pipeline/ § Please Star 🌟 this GitHub Repo! § Some VCs Value GitHub Stars @ $1,500 Each (?!)
  6. 6. PIPELINE.AI OVERVIEW 450,000 Docker Downloads 60,000 Users Registered for GA 60,000 Meetup Members 40,000 LinkedIn Followers 2,200 GitHub Stars 12 Enterprise Beta Users
  7. 7. WHY HEAVY FOCUS ON MODEL SERVING? Model Training: § Batch & Boring § Offline in Research Lab § Pipeline Ends at Training § No Insight into Live Production § Small Number of Data Scientists § Optimizations Very Well-Known § 100s of Training Jobs per Day <<< Model Serving: § Real-Time & Exciting!! § Online in Live Production § Pipeline Extends into Production § Continuous Insight into Live Production § Huuuuuuge Number of Application Users § **Many Optimizations Not Yet Utilized § 1,000,000s of Predictions per Sec
  8. 8. AGENDA Part 0: Latest PipelineAI Research Part 1: PipelineAI + Kubernetes + Istio
  9. 9. AGENDA Part 0: Latest PipelineAI Research § Deploy, Tune Models + Runtimes Safely in Prod § Compare Models Both Offline and Online § Auto-Shift Traffic to Winning Model or Cloud § Live, Continuous Model Training in Production
  10. 10. PACKAGE MODEL + RUNTIME AS ONE § Build Model with Runtime into Immutable Docker Image § Emphasize Immutable Deployment and Infrastructure § Same Runtime Dependencies in All Environments § Local, Development, Staging, Production § No Library or Dependency Surprises § Deploy and Tune Model + Runtime Together pipeline predict-server-build --model-type=tensorflow --model-name=mnist --model-tag=A --model-path=./models/tensorflow/mnist/ Build Local Model Server A
  11. 11. LOAD TEST LOCAL MODEL + RUNTIME § Perform Mini-Load Test on Local Model Server § Immediate, Local Prediction Performance Metrics § Compare to Previous Model + Runtime Variations pipeline predict-server-start --model-type=tensorflow --model-name=mnist --model-tag=A pipeline predict --model-endpoint-url=http://localhost:8080 --test-request-path=test_request.json --test-request-concurrency=1000 Load Test Local Model Server A Start Local Model Server A
  12. 12. PUSH IMAGE TO DOCKER REGISTRY § Supports All Public + Private Docker Registries § DockerHub, Artifactory, Quay, AWS, Google, … § Or Self-Hosted, Private Docker Registry pipeline predict-server-push --image-registry-url=<your-registry> --image-registry-repo=<your-repo> --model-type=tensorflow --model-name=mnist --model-tag=A Push Image To Docker Registry
  13. 13. CLOUD-BASED OPTIONS § AWS SageMaker § Released Nov 2017 @ re:Invent § Custom Docker Images for Training/Serving (e.g. PipelineAI Images) § Distributed TensorFlow Training through Estimator API § Traffic Splitting for A/B Model Testing § Google Cloud ML Engine § Mostly Command-Line Based § Driving TensorFlow Open Source API (e.g. Experiment API) § Azure ML
  14. 14. TUNE MODEL + RUNTIME AS SINGLE UNIT § Model Training Optimizations § Model Hyper-Parameters (e.g. Learning Rate) § Reduced Precision (e.g. FP16 Half Precision) § Post-Training Model Optimizations § Quantize Model Weights + Activations From 32-bit to 8-bit § Fuse Neural Network Layers Together § Model Runtime Optimizations § Runtime Configs (e.g. Request Batch Size) § Different Runtimes (e.g. TensorFlow Lite, Nvidia TensorRT)
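The 32-bit to 8-bit weight quantization named on this slide can be sketched as a simple min/max (affine) mapping. This is a minimal plain-Python illustration under that assumption, not the Graph Transform Tool's actual algorithm; the function names `quantize_weights` and `dequantize_weights` and the sample weights are hypothetical.

```python
def quantize_weights(weights):
    """Map float weights to integer codes in [0, 255] using min/max scaling."""
    lo, hi = min(weights), max(weights)
    # Guard against a constant tensor, where hi == lo would give a zero scale.
    scale = (hi - lo) / 255.0 or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize_weights(codes, lo, scale):
    """Recover approximate float weights from their 8-bit codes."""
    return [lo + c * scale for c in codes]


# Made-up weights for the demo.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize_weights(weights)
restored = dequantize_weights(codes, lo, scale)

# The round trip loses at most about half a quantization step per weight.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now needs one byte instead of four, at the price of the rounding error captured in `max_error`.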
  15. 15. POST-TRAINING OPTIMIZATIONS § Prepare Model for Serving § Simplify Network § Reduce Model Size § Lower Precision for Fast Math § Some Tools § Graph Transform Tool (GTT) § tfcompile After Training After Optimizing! pipeline optimize --optimization-list=[quantize_weights, tfcompile] --model-type=tensorflow --model-name=mnist --model-tag=A --model-path=./tensorflow/mnist/model --output-path=./tensorflow/mnist/optimized_model Linear Regression
  16. 16. RUNTIME OPTION: TENSORFLOW LITE § Post-Training Model Optimizations § Currently Supports iOS and Android § On-Device Prediction Runtime § Low-Latency, Fast Startup § Selective Operator Loading § 70KB Min - 300KB Max Runtime Footprint § Supports Accelerators (GPU, TPU) § Falls Back to CPU without Accelerator § Java and C++ APIs
  17. 17. RUNTIME OPTION: NVIDIA TENSORRT § Post-Training Model Optimizations § Specific to Nvidia GPU § GPU-Optimized Prediction Runtime § Alternative to TensorFlow Serving § PipelineAI Supports TensorRT!
  18. 18. DEPLOY MODELS SAFELY TO PROD § Deploy from CLI or Jupyter Notebook § Tear-Down or Rollback Models Quickly § Shadow Canary Deploy: e.g. 20% Live Traffic § Split Canary Deploy: e.g. 97-2-1% Live Traffic pipeline predict-cluster-start --model-runtime=tflite --model-type=tensorflow --model-name=mnist --model-tag=B --traffic-split=2 Start Production Model Cluster B pipeline predict-cluster-start --model-runtime=tensorrt --model-type=tensorflow --model-name=mnist --model-tag=C --traffic-split=1 Start Production Model Cluster C pipeline predict-cluster-start --model-runtime=tfserving_gpu --model-type=tensorflow --model-name=mnist --model-tag=A --traffic-split=97 Start Production Model Cluster A
  19. 19. AGENDA Part 0: Latest PipelineAI Research § Deploy, Tune Models + Runtimes Safely in Prod § Compare Models Both Offline and Online § Auto-Shift Traffic to Winning Model or Cloud § Live, Continuous Model Training in Production
  20. 20. COMPARE MODELS OFFLINE & ONLINE § Offline, Batch Metrics § Validation + Training Accuracy § CPU + GPU Utilization § Live Prediction Values § Compare Relative Precision § Newly-Seen, Streaming Data § Online, Real-Time Metrics § Response Time, Throughput § Cost ($) Per Prediction
  21. 21. VIEW REAL-TIME PREDICTION STREAM § Visually Compare Real-Time Predictions § Prediction Inputs § Prediction Results & Confidences § Model A, Model B, Model C
  22. 22. PREDICTION PROFILING AND TUNING § Pinpoint Performance Bottlenecks § Fine-Grained Prediction Metrics § 3 Steps in Real-Time Prediction 1. transform_request() 2. predict() 3. transform_response()
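The three prediction steps listed on this slide can be timed individually to see which one dominates latency. The sketch below reuses the step names from the slide, but the function bodies, the JSON request shape, and the `timed_predict` helper are placeholders assumed for illustration, not PipelineAI's implementation.

```python
import json
import time


def transform_request(raw):
    """Step 1: decode the JSON request body into model inputs."""
    return json.loads(raw)["inputs"]


def predict(inputs):
    """Step 2: stand-in model; a real server would invoke the runtime here."""
    return [2.0 * x for x in inputs]


def transform_response(outputs):
    """Step 3: encode model outputs as a JSON response body."""
    return json.dumps({"outputs": outputs})


def timed_predict(raw):
    """Run all three steps and record wall-clock seconds spent in each."""
    timings = {}
    value = raw
    for step in (transform_request, predict, transform_response):
        start = time.perf_counter()
        value = step(value)
        timings[step.__name__] = time.perf_counter() - start
    return value, timings


response, timings = timed_predict('{"inputs": [1.0, 2.5]}')
slowest_step = max(timings, key=timings.get)
```

Keeping one timer per step is what makes a bottleneck attributable to request parsing, the model itself, or response encoding.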
  23. 23. AGENDA Part 0: Latest PipelineAI Research § Deploy, Tune Models + Runtimes Safely in Prod § Compare Models Both Offline and Online § Auto-Shift Traffic to Winning Model or Cloud § Live, Continuous Model Training in Production
  24. 24. LIVE, ADAPTIVE TRAFFIC ROUTING § A/B Tests § Inflexible and Boring § Multi-Armed Bandits § Adaptive and Exciting! pipeline traffic-router-split --model-type=tensorflow --model-name=mnist --model-tag-list=[A,B,C] --model-weight-list=[1,2,97] Adjust Traffic Routing Dynamically
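To show how a multi-armed bandit differs from a fixed A/B split, here is a minimal epsilon-greedy router. Epsilon-greedy is chosen only for illustration (the slides do not say which bandit algorithm PipelineAI uses), and the class name, reward definition, and success rates are all made up.

```python
import random


class EpsilonGreedyRouter:
    """Route each request to a model tag, favoring the best observed reward."""

    def __init__(self, tags, epsilon=0.1, seed=None):
        self.tags = list(tags)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {tag: 0 for tag in self.tags}
        self.mean_reward = {tag: 0.0 for tag in self.tags}

    def choose(self):
        # Explore a random model with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.tags)
        return max(self.tags, key=self.mean_reward.get)

    def update(self, tag, reward):
        # Incremental running mean of the reward seen for this model.
        self.counts[tag] += 1
        self.mean_reward[tag] += (reward - self.mean_reward[tag]) / self.counts[tag]


# Simulated per-model success rates; these numbers are invented for the demo.
true_rates = {"A": 0.2, "B": 0.5, "C": 0.8}
router = EpsilonGreedyRouter(true_rates, epsilon=0.1, seed=42)
env = random.Random(7)
for _ in range(5000):
    tag = router.choose()
    router.update(tag, 1.0 if env.random() < true_rates[tag] else 0.0)
```

Unlike a static A/B test, the traffic share in `router.counts` drifts toward the model with the highest observed reward while a small fraction of requests keeps exploring the others.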
  25. 25. SHIFT TRAFFIC TO MAX(REVENUE) § Shift Traffic to Winning Model using AI Bandit Algos
  26. 26. SHIFT TRAFFIC TO MIN(CLOUD CO$T) § Based on Cost ($) Per Prediction § Cost Changes Throughout Day § Lose AWS Spot Instances § Google Cloud Becomes Cheaper § Shift Across Clouds & On-Prem
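Choosing a deployment target by cost per prediction can be sketched as follows. The target names, hourly prices, and throughputs are illustrative assumptions, not measurements, and `cost_per_prediction` and `cheapest_target` are hypothetical helpers.

```python
def cost_per_prediction(hourly_cost_usd, predictions_per_sec):
    """Dollars spent per prediction at a sustained throughput."""
    return hourly_cost_usd / (predictions_per_sec * 3600)


def cheapest_target(targets):
    """Return the name of the deployment target with the lowest unit cost."""
    return min(targets, key=lambda name: cost_per_prediction(*targets[name]))


# (hourly cost in USD, predictions per second); illustrative numbers only.
targets = {
    "aws-spot": (0.30, 900),
    "google-cloud": (0.45, 1000),
    "on-prem": (0.60, 1200),
}
before = cheapest_target(targets)

# Spot capacity is lost, so the AWS price rises to an on-demand rate.
targets["aws-spot"] = (0.90, 900)
after = cheapest_target(targets)
```

Re-evaluating this choice whenever prices or throughput change is what lets traffic follow the cheapest option through the day.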
  27. 27. AGENDA Part 0: Latest PipelineAI Research § Deploy, Tune Models + Runtimes Safely in Prod § Compare Models Both Offline and Online § Auto-Shift Traffic to Winning Model or Cloud § Live, Continuous Model Training in Production
  28. 28. LIVE, CONTINUOUS MODEL TRAINING § The Holy Grail of Machine Learning § Q1 2018: PipelineAI Supports Continuous Model Training! § Kafka, Kinesis § Spark Streaming
  29. 29. PSEUDO-CONTINUOUS TRAINING § Identify and Fix Borderline Predictions (~50-50% Confidence) § Fix Along Class Boundaries § Retrain Newly-Labeled Data § Game-ify Labeling Process § Enable Crowd Sourcing
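Selecting borderline predictions for relabeling can be sketched as a filter on confidence. This assumes a binary classifier whose confidence is the positive-class probability; the `margin` parameter, the `borderline` helper, and the sample values are hypothetical.

```python
def borderline(predictions, margin=0.1):
    """Return ids whose positive-class confidence is within `margin` of 0.5."""
    return [
        item_id
        for item_id, confidence in predictions
        if abs(confidence - 0.5) <= margin
    ]


# (example id, positive-class confidence); made-up values.
predictions = [("img-1", 0.98), ("img-2", 0.52), ("img-3", 0.41), ("img-4", 0.75)]
to_label = borderline(predictions)
```

Only the examples in `to_label` would be sent to human or crowd-sourced labeling and then fed back into retraining.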
  30. 30. DEMOS!! § https://github.com/PipelineAI/pipeline/ § Please Star 🌟 this GitHub Repo!
  31. 31. AGENDA Part 0: Latest PipelineAI Research Part 1: PipelineAI + Kubernetes + Istio
  32. 32. SPECIAL THANKS TO CHRISTIAN POSTA § http://blog.christianposta.com/istio-workshop/slides/
  33. 33. KUBERNETES INGRESS § Single Service § Can also use Service (LoadBalancer or NodePort) § Fan Out & Name-Based Virtual Hosting § Route Traffic Using Path or Host Header § Reduces # of load balancers needed § 404 Implemented as default backend § Federation / Hybrid-Cloud § Creates Ingress objects in every cluster § Monitors health and capacity of pods within each cluster § Routes clients to appropriate backend anywhere in federation apiVersion: extensions/v1beta1 kind: Ingress metadata: name: gateway-fanout annotations: kubernetes.io/ingress.class: istio spec: rules: - host: foo.bar.com http: paths: - path: /foo backend: serviceName: s1 servicePort: 80 - path: /bar backend: serviceName: s2 servicePort: 80 Fan Out (Path) apiVersion: extensions/v1beta1 kind: Ingress metadata: name: gateway-virtualhost annotations: kubernetes.io/ingress.class: istio spec: rules: - host: foo.bar.com http: paths: - backend: serviceName: s1 servicePort: 80 - host: bar.foo.com http: paths: - backend: serviceName: s2 servicePort: 80 Virtual Hosting
  34. 34. KUBERNETES INGRESS CONTROLLER § Ingress Controller Types § Google Cloud: kubernetes.io/ingress.class: gce § Nginx: kubernetes.io/ingress.class: nginx § Istio: kubernetes.io/ingress.class: istio § Must Start Ingress Controller Manually § Just deploying Ingress is not enough § Not started by kube-controller-manager § Start Istio Ingress Controller kubectl apply -f $ISTIO_INSTALL_PATH/install/kubernetes/istio.yaml
  35. 35. ISTIO ARCHITECTURE: ENVOY § Lyft Project § High-perf Proxy (C++) § Lots of Metrics § Zone-Aware § Service Discovery § Load Balancing § Fault Injection, Circuits § %-based Traffic Split, Shadow § Sidecar Pattern § Rate Limiting, Retries, Outlier Detection, Timeout with Budget, …
  36. 36. ISTIO ARCHITECTURE: MIXER § Enforce Access Control § Evaluate Request-Attrs § Collect Metrics § Platform-Independent § Extensible Plugin Model
  37. 37. ISTIO ARCHITECTURE: PILOT § Envoy service discovery § Intelligent routing § A/B Tests § Canary deployments § RouteRule->Envoy conf § Propagates to sidecars § Supports Kube, Consul, ...
  38. 38. ISTIO ARCHITECTURE: ISTIO-AUTH § Mutual TLS Auth § Credential management § Uses Service-identity § Canary deployments § Fine-grained ACLs § Attribute & role-based § Auditing & monitoring
  39. 39. ISTIO ROUTE RULES § Kubernetes Custom Resource Definition (CRD) kind: CustomResourceDefinition metadata: name: routerules.config.istio.io spec: group: config.istio.io names: kind: RouteRule listKind: RouteRuleList plural: routerules singular: routerule scope: Namespaced version: v1alpha2
  40. 40. A/B & BANDIT MODEL TESTING § Live Experiments in Production § Compare Existing Model A with Model B, Model C § Safe Split-Canary Deployment § Tip: Keep Ingress Simple – Use Route Rules Instead! apiVersion: config.istio.io/v1alpha2 kind: RouteRule metadata: name: live-experiment-20-5-75 spec: destination: name: predict-mnist precedence: 2 # Greater than global deny-all route: - labels: version: A weight: 20 # 20% still routes to model A - labels: version: B # 5% routes to new model B weight: 5 - labels: version: C # 75% routes to new model C weight: 75 apiVersion: config.istio.io/v1alpha2 kind: RouteRule metadata: name: live-experiment-1-2-97 spec: destination: name: predict-mnist precedence: 2 # Greater than global deny-all route: - labels: version: A weight: 1 # 1% routes to model A - labels: version: B # 2% routes to new model B weight: 2 - labels: version: C # 97% routes to new model C weight: 97 apiVersion: config.istio.io/v1alpha2 kind: RouteRule metadata: name: live-experiment-97-2-1 spec: destination: name: predict-mnist precedence: 2 # Greater than global deny-all route: - labels: version: A weight: 97 # 97% still routes to model A - labels: version: B # 2% routes to new model B weight: 2 - labels: version: C # 1% routes to new model C weight: 1
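The effect of a weighted route rule such as `live-experiment-97-2-1` can be simulated with a weighted random choice per request. This sketch only imitates the observable behavior and assumes the weights are percentages that must sum to 100; it does not reflect how Envoy implements the split, and both helper names are hypothetical.

```python
import random
from collections import Counter


def validate_weights(route):
    """Reject a split whose weights do not add up to 100 percent."""
    total = sum(route.values())
    if total != 100:
        raise ValueError(f"weights sum to {total}, expected 100")


def pick_version(route, rng):
    """Choose a model version with probability proportional to its weight."""
    versions = list(route)
    return rng.choices(versions, weights=[route[v] for v in versions], k=1)[0]


route = {"A": 97, "B": 2, "C": 1}  # mirrors live-experiment-97-2-1
validate_weights(route)
rng = random.Random(0)
observed = Counter(pick_version(route, rng) for _ in range(10_000))
```

Comparing `observed` with the configured weights is the same check the deck later describes as verifying traffic splits through metrics.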
  41. 41. ISTIO AUTO-SCALING § Traffic Routing and Auto-Scaling Occur Independently § Istio Continues to Obey Traffic Splits After Auto-Scaling § Auto-Scaling May Occur In Response to New Traffic Route
  42. 42. ADVANCED ROUTING RULES § Content-based Routing § Uses headers, username, payload, … § Cross-Environment Routing § Shadow traffic prod => staging
  43. 43. ISTIO DESTINATION POLICIES § Load Balancing § ROUND_ROBIN (default) § LEAST_CONN (between 2 randomly-selected hosts) § RANDOM § Circuit Breaker § Max connections § Max requests per conn § Consecutive errors § Penalty timer (15 mins) § Scan windows (5 mins) circuitBreaker: simpleCb: maxConnections: 100 httpMaxRequests: 1000 httpMaxRequestsPerConnection: 10 httpConsecutiveErrors: 7 sleepWindow: 15m httpDetectionInterval: 5m
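The consecutive-errors and penalty-timer settings on this slide can be illustrated with a small state machine. This is a simplified sketch that mirrors `httpConsecutiveErrors: 7` and `sleepWindow: 15m` only; it ignores the detection interval and connection limits, and the `CircuitBreaker` class is hypothetical rather than Envoy's implementation.

```python
class CircuitBreaker:
    """Eject a host after N consecutive errors, for a fixed sleep window."""

    def __init__(self, consecutive_errors=7, sleep_window_sec=15 * 60):
        self.consecutive_errors = consecutive_errors
        self.sleep_window_sec = sleep_window_sec
        self.error_streak = 0
        self.ejected_until = 0.0

    def allow(self, now):
        """True if requests may be sent to the host at time `now` (seconds)."""
        return now >= self.ejected_until

    def record(self, ok, now):
        """Update the error streak and eject the host when it hits the limit."""
        if ok:
            self.error_streak = 0
            return
        self.error_streak += 1
        if self.error_streak >= self.consecutive_errors:
            self.ejected_until = now + self.sleep_window_sec
            self.error_streak = 0


breaker = CircuitBreaker()
# Seven failures in a row, one per second, trip the breaker at t = 6 s.
for second in range(7):
    breaker.record(ok=False, now=float(second))
```

After the seventh failure the host is skipped until the 15-minute window elapses, which keeps a misbehaving model server from receiving live prediction traffic.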
  44. 44. ISTIO EGRESS § Whitelisted Domains Accessible Within Service Mesh § Apply RouteRules and DestinationPolicies § Supports TLS, HTTP, GRPC kind: EgressRule metadata: name: foo-egress-rule spec: destination: service: api.pipeline.ai ports: - port: 80 protocol: http - port: 443 protocol: https
  45. 45. ISTIO & CHAOS + LATENCY MONKEYS § Fault Injection § Delay § Abort kind: RouteRule metadata: name: predict-mnist spec: destination: name: predict-mnist httpFault: abort: httpStatus: 420 percent: 100 kind: RouteRule metadata: name: predict-mnist spec: destination: name: predict-mnist httpFault: delay: fixedDelay: 7.000s percent: 100
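The abort and delay faults above can be imitated in-process with a wrapper around a request handler. This is only a sketch of the idea, since Istio injects faults in the Envoy sidecar rather than in application code; the `with_faults` wrapper and the stand-in `predict_mnist` handler are hypothetical.

```python
import random
import time


def with_faults(handler, abort_percent=0, abort_status=420,
                delay_percent=0, fixed_delay_sec=0.0, rng=None):
    """Wrap `handler` so a percentage of calls are delayed or aborted."""
    rng = rng or random.Random()

    def wrapped(request):
        # Delay first, then decide whether to abort, each with its own odds.
        if rng.random() < delay_percent / 100:
            time.sleep(fixed_delay_sec)
        if rng.random() < abort_percent / 100:
            return abort_status, ""
        return handler(request)

    return wrapped


def predict_mnist(request):
    """Stand-in for the real prediction service."""
    return 200, "ok"


always_abort = with_faults(predict_mnist, abort_percent=100)
never_abort = with_faults(predict_mnist, abort_percent=0)
```

Calling `always_abort` returns status 420 for every request, matching the `percent: 100` abort rule, while `never_abort` passes requests through untouched.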
  46. 46. ISTIO METRICS AND MONITORING § Verify Traffic Splits § Fine-Grained Request Tracing
  47. 47. ISTIO SECURITY § Istio Certificate Authority § Mutual TLS
  48. 48. AGENDA Part 0: Latest PipelineAI Research Part 1: PipelineAI + Kubernetes + Istio
  49. 49. THANK YOU!! § https://github.com/PipelineAI/pipeline/ § Please Star 🌟 this GitHub Repo! § Reminder: VCs Value GitHub Stars @ $1,500 Each (!!) Contact Me chris@pipeline.ai @cfregly
