
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala and Vinay Sridhar

Jun. 14, 2018


  1. Operationalizing Edge Machine Learning with Apache Spark ParallelM
  2. 2 Growing AI Investments; Few Deployed at Scale — Survey of 3,073 AI-aware C-level executives (source: “Artificial Intelligence: The Next Digital Frontier?”, McKinsey Global Institute, June 2017). Of 160 reviewed AI use cases, 88% did not progress beyond the experimental stage; 20% have AI in production, while 80% are developing, experimenting, or contemplating. Successful early AI adopters report profit margins 3–15% higher than the industry average.
  3. 3 Challenges of Deploying & Managing ML in Production • Diverse focus and expertise of Data Science & Ops teams • Increased risk from the non-deterministic nature of ML • Current operations solutions do not address the uniqueness of ML applications
  4. 4 Challenges of Edge/Distributed Topologies • IoT is driving explosive growth in data volume across “Things”, the Edge and Network, and the Data Center/Cloud (Data Lake) • Varied resources at each level • Scale, heterogeneity, disconnected operation
  5. 5 What We Need For Operational ML • Accelerate deployment & facilitate collaboration between Data & Ops teams • Monitor validity of ML predictions, diagnose data and ML performance issues • Orchestrate training, update, and configuration of ML pipelines across distributed, heterogeneous infrastructure with tracking
  6. 6 What We Need For Edge Operational ML • Distribute analytics processing to the optimal point for each use case • A flexible management framework enables: secure centralized and/or local learning, prediction, or combined learning/prediction; granular monitoring and control of model update policies • Support multi-layer topologies to achieve maximum scale while accommodating low-bandwidth or unreliable connectivity (Diagram: sources → edge intelligence via streams → data lake → central/cloud intelligence via batches, coordinated by edge/cloud orchestration)
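The placement decision behind multi-layer topologies can be sketched as a simple routing rule. This is an illustrative sketch only (the `LinkState` fields and bandwidth threshold are hypothetical, not ParallelM code): process locally at the edge when the uplink is down or too slow, otherwise forward to the cloud.

```python
# Hypothetical sketch of edge-vs-cloud placement under unreliable
# connectivity: nothing here comes from MCenter's actual API.
from dataclasses import dataclass

@dataclass
class LinkState:
    connected: bool          # is the uplink to the cloud alive?
    bandwidth_kbps: float    # measured uplink bandwidth (assumed metric)

def route(link: LinkState, min_kbps: float = 64.0) -> str:
    """Return where to process a record: 'edge' when the uplink is
    down or below the usable threshold, otherwise 'cloud'."""
    if not link.connected:
        return "edge"
    if link.bandwidth_kbps < min_kbps:
        return "edge"
    return "cloud"

print(route(LinkState(True, 500.0)))   # cloud
print(route(LinkState(False, 0.0)))    # edge
print(route(LinkState(True, 10.0)))    # edge: connected but too slow
```

A real orchestrator would fold in policy and workload cost, but the core decision is this kind of per-layer routing.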
  7. 7 MLOps – Managing the Full Production ML Lifecycle: ML orchestration, ML health, business impact, model governance, and continuous integration/deployment — connecting machine learning models, databases, and business value
  8. 8 Our Approach: MCenter — an MCenter Server coordinates MCenter Agents deployed alongside each analytic engine (and more), plus MCenter Developer Connectors. Inputs: data streams, data lakes, and data science platforms; outputs: models, retraining control, statistics, events, and alerts
  9. 9 Operational Abstraction • Link pipelines (training and inference) via an ION (Intelligence Overlay Network) • Essentially a directed-graph representation that allows cycles • Pipelines are DAGs within each engine • Distributed execution over heterogeneous engines, programming languages, and geographies (Example: policy-based update — KMeans batch training plus streaming-inference anomaly detection)
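The ION abstraction above can be sketched as a plain adjacency list. The node names below are illustrative, not MCenter identifiers; the point is that the overlay, unlike the per-engine DAGs, is allowed to contain a feedback cycle (inference statistics drive retraining, which pushes an updated model back to inference).

```python
# Minimal sketch of an Intelligence Overlay Network (ION): a directed
# graph of pipelines that may contain cycles. Node names are made up.
ion = {
    "inference":    ["stats"],
    "stats":        ["retraining"],
    "retraining":   ["model_update"],
    "model_update": ["inference"],   # feedback edge closes the cycle
}

def has_cycle(graph):
    """Standard DFS three-color cycle check: True if any back edge exists."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    def visit(n):
        color[n] = GRAY
        for m in graph.get(n, []):
            c = color.get(m, WHITE)
            if c == GRAY:
                return True              # back edge: cycle found
            if c == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False
    return any(visit(n) for n in graph if color[n] == WHITE)

print(has_cycle(ion))                    # True: the overlay tolerates loops
print(has_cycle({"a": ["b"], "b": []}))  # False: a per-engine pipeline is a DAG
```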
  10. 10 An Example ION-to-Resource Mapping • Every 5 min – Edge • Every Tuesday at 10 AM – Cloud • Human-approved model update (Diagram: sources → edge intelligence via streams → data lake → central/cloud intelligence via batches, with models flowing back)
  11. 11 Pipeline Examples Training Pipeline (SparkML) Inference Pipeline (SparkML)
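The training/inference split shown on this slide can be sketched in plain Python standing in for the SparkML pipelines: a batch "training pipeline" fits centroids with a toy 1-D k-means, and a streaming "inference pipeline" flags points far from every centroid as anomalies. The data, threshold, and algorithm details are illustrative assumptions, not the deck's actual code.

```python
# Toy stand-in for the SparkML training + inference pipelines:
# batch training produces a model (centroids); inference scores
# each incoming point against it (anomaly detection).
import statistics

def train(batch, k=2, iters=10):
    """Toy 1-D k-means: returns k centroids (the 'model')."""
    centroids = sorted(batch)[:: max(1, len(batch) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in batch:
            i = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[i].append(x)
        centroids = [statistics.mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def infer(x, centroids, threshold=5.0):
    """Streaming-side check: anomaly if x is far from all centroids."""
    return min(abs(x - c) for c in centroids) > threshold

model = train([1.0, 1.2, 0.8, 10.0, 10.5, 9.9])
print(infer(1.1, model))    # False: near a centroid
print(infer(50.0, model))   # True: anomaly
```

In the real system the two pipelines run as separate Spark jobs, with the model handed from one to the other under orchestration, rather than sharing a process like this sketch.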
  12. 12 Instrument, Upload, Orchestrate, Monitor
  13. 13 Integrating with Analytics Engines (Spark) Job Management • Via SparkLauncher, a library for launching, monitoring, and terminating Spark jobs • The PM Agent communicates with Spark through this library for job management (it also uses the Java API to launch child processes) Statistics • Via SparkListener, a Spark-driver callback service • The SparkListener taps into accumulators, one of the most popular ways to expose statistics • The PM Agent communicates with the Spark driver and exposes statistics via a REST endpoint ML Health / Model Collection and Updates • The PM Agent delivers and receives health events, health objects, and models via sockets from custom PM components in the ML pipeline
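The "expose statistics via a REST endpoint" step can be sketched with the Python standard library. This is a hedged illustration of the pattern only: the route name, counter names, and wiring are hypothetical, and the real agent gathers its numbers from a JVM-side SparkListener rather than a Python dict.

```python
# Illustrative sketch (names hypothetical, not MCenter code): an agent
# process holds statistics it collected from the driver and serves
# them as JSON for the management server to poll.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

stats = {"records_processed": 0, "model_version": 1}  # stand-in counters

class StatsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/stats":
            body = json.dumps(stats).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the sketch quiet
        pass

server = HTTPServer(("127.0.0.1", 0), StatsHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

stats["records_processed"] = 42  # listener callbacks would update this
url = f"http://127.0.0.1:{server.server_port}/stats"
print(json.loads(urlopen(url).read()))
```

Polling a per-agent endpoint like this keeps the management server decoupled from each engine's internal metrics plumbing.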
  14. 14 Demo Description Training Inference
  15. Thank You! Nisha Talagala nisha.talagala@parallelm.com Vinay Sridhar vinay.sridhar@parallelm.com
  16. 16 What We Need For Edge Operational ML • Distribute analytics processing to the optimal point for each use case • A flexible management framework enables: secure centralized and/or local learning, prediction, or combined learning/prediction; granular monitoring and control of model update policies • Support multi-layer topologies to achieve maximum scale while accommodating low-bandwidth or unreliable connectivity (Diagram: sources → edge intelligence via streams → data lake → central/cloud intelligence via batches, coordinated by edge/cloud orchestration)
  17. 17 Integrating with Analytics Engines (TensorFlow) Job Management • TensorFlow Python programs run as standalone applications • Standard OS process-control mechanisms are used to monitor and control TensorFlow programs Statistics Collection • The PM Agent parses TensorBoard log files to extract meaningful statistics and events that data scientists added ML Health / Model Collection • Generation of models and health objects is recorded on a shared medium
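The OS-level process control described here can be sketched with `subprocess`: the agent launches the standalone training program as a child, waits on it, then parses its log for statistics. Note the real agent reads TensorBoard event files; a plain-text log and a trivial `step=… loss=…` format stand in here as an assumption.

```python
# Hedged sketch of the mechanism on this slide: launch a standalone
# "training" script as a child process, monitor it via standard OS
# process control, then parse its log. The log format is made up;
# the actual agent parses TensorBoard log files.
import os
import subprocess
import sys
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "train.log")
script = (
    f"f = open({log_path!r}, 'w')\n"
    "f.write('step=1 loss=0.90\\nstep=2 loss=0.45\\n')\n"
    "f.close()\n"
)

proc = subprocess.Popen([sys.executable, "-c", script])
proc.wait()                      # standard OS process control
assert proc.returncode == 0      # child exited cleanly

def parse_stats(path):
    """Extract {step: loss} pairs the data scientist logged."""
    out = {}
    with open(path) as f:
        for line in f:
            fields = dict(kv.split("=") for kv in line.split())
            out[int(fields["step"])] = float(fields["loss"])
    return out

print(parse_stats(log_path))  # {1: 0.9, 2: 0.45}
```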
  18. 18 An Example ION • Node 1: inference pipeline — every 5 min on Spark cluster 1 • Node 2: (re)training pipeline — every Tuesday at 10 AM on Spark cluster 2 • Node 3: policy — a human approves or rejects the model • When: any time there is a new model
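The policy node in this example ION can be sketched as a small gate: retraining may propose a model at any time, but inference only adopts it after explicit human approval. Class and model names below are illustrative assumptions, not MCenter API.

```python
# Sketch of the human-approval update policy (hypothetical names):
# proposed models wait in 'pending' until approved or rejected.
class ModelUpdatePolicy:
    def __init__(self, initial_model="model-v1"):
        self.active_model = initial_model  # what inference currently uses
        self.pending = None                # candidate awaiting a decision

    def propose(self, model_id):
        """Called by the (re)training pipeline when a new model lands."""
        self.pending = model_id

    def approve(self):
        """Human decision: promote the pending model to inference."""
        if self.pending is not None:
            self.active_model, self.pending = self.pending, None

    def reject(self):
        """Human decision: discard the pending model."""
        self.pending = None

policy = ModelUpdatePolicy()
policy.propose("model-v2")
print(policy.active_model)  # model-v1: not yet approved
policy.approve()
print(policy.active_model)  # model-v2: human-approved update
```

Swapping `approve()` for an automatic check (e.g. a validation-metric threshold) would turn the same gate into the scheduled, unattended update path from slide 10.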