Making Spark ML Models Portable - Know Your Options

After successfully training an ML model with Apache Spark, the next question is how to serve it. One option is to keep using Spark for serving as well, but sometimes that is not desired or possible, for instance when the model should be exposed as an HTTP service, run in a Docker container, or be used on a mobile device. This talk explores various approaches to making models portable outside Spark.

Published in: Software

  1. LEARNING: Data, ETL, Feature Extraction, Feature Engineering, Feature Selection, Model Selection, Model Evaluation, Score Calibration & Insights, Best Model
  2. INFERENCE: Data, ETL, Feature Extraction, Feature Engineering, Feature Selection, Model Application, Score Calibration & Insights, Scores & Insights, Best Model
  3. INFERENCE: Data, ETL, Feature Extraction, Feature Engineering, Feature Selection, Model Application, Score Calibration & Insights, Scores & Insights, Best Model
  4. INFERENCE (Spark Application): Data, ETL, Feature Extraction, Feature Engineering, Feature Selection, Model Application, Score Calibration & Insights, Scores & Insights, Best Model
  5. LEARNING on Dataset / RDD: ETL, Feature Extraction, Feature Engineering, Feature Selection, Model Selection, Model Evaluation, Score Calibration, Best Model. Operations: tokenize, pivot, impute, tf-idf, bucketize, combine, distribs, logreg, randforest, xgboost, auroc, fmeasure, aupr, calibrate, loco
  6. INFERENCE on Dataset / RDD: ETL, Feature Extraction, Feature Engineering, Feature Selection, Model Application, Score Calibration, Scores & Insights. Operations: tokenize, impute, tf-idf, bucketize, combine, xgboost, calibrate, loco. Fitted state: buckets, mean, freqs, booster, langs, distribs
  7. Dataset. Train: tokenize, pivot, impute, tf-idf, bucketize, combine, distribs, logreg, randforest, xgboost, auroc, fmeasure, aupr, calibrate, loco. Score: tokenize, impute, tf-idf, bucketize, combine, xgboost, calibrate, loco. Fitted state: buckets, mean, freqs, booster, langs, distribs. Train(Dataset[T]) => Score(Dataset[T]) => Dataset[S]
  8-10. Python/R/JavaScript; Spark Runtime; JVM + ?; Spark Runtime (load); Spark Dependencies
  11. Decision tree (layout not recoverable from the transcript). Questions: Do you mind having Spark Runtime? (Yes / No); Do you mind having JVM runtime?; Do you need sub-ms latency? (Yes / No); Deep Learning? (Yes / No). Outcomes: Use Spark. EOF; MLeap; PFA; ONNX; Custom runtime
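
The signature on slide 7, Train(Dataset[T]) => Score(Dataset[T]) => Dataset[S], can be read as: training returns a scoring function that carries the fitted state (mean, buckets, booster, and so on) with it. Below is a minimal Python sketch of that reading, not the talk's implementation. Plain lists of dicts stand in for Dataset[T], and the names `train` and `score`, the field `x`, and the threshold "model" are all hypothetical, chosen only for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]      # stand-in for an input record of type T
Scored = Dict[str, float]             # stand-in for a scored record of type S
ScoreFn = Callable[[List[Row]], List[Scored]]


def train(rows: List[Row]) -> ScoreFn:
    """Fit on training rows and return a scoring function.

    The returned function closes over the fitted state, so it can be
    applied later without access to the training data.
    """
    # "impute" step: learn the mean of the observed values of x
    observed = [r["x"] for r in rows if r["x"] is not None]
    fitted_mean = mean(observed)

    def score(new_rows: List[Row]) -> List[Scored]:
        scored = []
        for r in new_rows:
            # apply the fitted imputation, then a toy threshold "model"
            # (a hypothetical stand-in for logreg / xgboost)
            x = r["x"] if r["x"] is not None else fitted_mean
            scored.append({"x": x, "score": 1.0 if x > fitted_mean else 0.0})
        return scored

    return score


# Usage: train once, then score new rows with only the returned function.
score = train([{"x": 1.0}, {"x": None}, {"x": 3.0}])
result = score([{"x": None}, {"x": 5.0}])
```

Only the fitted state captured by `score` is needed at inference time, which is why that state is the piece that has to be exported when the model leaves Spark.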
