KFServing, Model Monitoring with Apache Spark and a Feature Store


In recent years, MLOps has emerged to bring DevOps processes to machine learning (ML) development, aiming at more automation of repetitive tasks and smoother interoperability between tools. Among the stages in the ML lifecycle, model monitoring is the supervision of model performance over time, combining techniques from four categories: outlier detection, data drift detection, explainability, and adversarial attack detection. Most existing model monitoring tools follow a scheduled batch-processing approach or analyse model performance using isolated subsets of the inference data. For the continuous monitoring of models, however, stream processing platforms offer several advantages: support for continuous data analytics, scalable processing of large amounts of data, and first-class support for the window-based aggregations useful for concept drift detection.

In this talk, we present an open-source platform for serving and monitoring models at scale, based on Kubeflow's model serving framework (KFServing), the Hopsworks Online Feature Store for enriching feature vectors with a Transformer in KFServing, and Spark and Spark Streaming as general-purpose frameworks for monitoring models in production.

We also show how Spark Streaming can use the Hopsworks Feature Store to implement continuous data drift detection: the Feature Store provides statistics on the distribution of feature values in the training data, Spark Streaming computes the same statistics on live traffic to the model, and an alert is raised if the live traffic differs significantly from the training data. We include a live demonstration of the platform in action.
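
To make the comparison concrete, the core check might look like the following sketch. The per-feature statistics object (here td.stats, with mean and stddev fields) and the z-score test with its threshold are illustrative assumptions, not the platform's actual API; the hsfs examples later in the deck only show min and max statistics.

    import hsfs

    # Connect to the Hopsworks Feature Store and fetch the training dataset
    # whose baseline statistics the live traffic is compared against.
    fs = hsfs.connection().get_feature_store()
    td = fs.get_training_dataset("card_fraud_model", version=1)

    def detect_drift(live_mean, live_count, feature, threshold=3.0):
        """Flag drift when the live mean of a feature deviates from the
        training mean by more than `threshold` standard errors.
        (Illustrative test; real monitoring combines several detectors.)"""
        baseline = td.stats[feature]  # assumed per-feature statistics object
        std_err = baseline.stddev / (live_count ** 0.5)
        return abs(live_mean - baseline.mean) / std_err > threshold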


KFServing, Model Monitoring with Apache Spark and a Feature Store

  1. KFServing, Model Monitoring with Spark and a Feature Store. Jim Dowling and Javier de la Rúa Martínez, Logical Clocks AB.
  2. Machine Learning (ML) with Information-Dense Input.
     ● Jellyfish AI: complex behaviour from information-dense signals, but no brain to enrich them with history or context; behaviour is autonomic, based only on the input (the input image is information-dense). [Image from https://dl.acm.org/doi/fullHtml/10.1145/3329784]
     ● NLP: the input text is information-dense. A stack of encoders (Encoder 1 … Encoder 24) in BERT Large classifies “Python can do everything that PySpark can do and more” as 99% spam.
  3. ML with Information-Light Input: Enrich with History/Context.
     ● Web search. Input signal: characters. Enrich with: your history, profile, context, location.
     ● Fraud (transfer money). Input signal: customer/bank ID, amount. Enrich with: your credit, historical transfers, location, the bank's ranking.
     ● 5G edge security*. Input signal: IP packets. Enrich with: your device history, traffic-flow characteristics.
     *Image from https://www.ericsson.com/en/blog/2021/3/5g-edge-computing-gaming
  4. The Feature Store Enables AI-Enabled Products. [Diagram: pipelines continually update features from enterprise data (0); the AI-enabled product requests a prediction (1), which is enriched with context/history from the Feature Store (2) before reaching the model.]
  5. Hopsworks Online Feature Store: RonDB. RonDB is an open-source LATS database (low Latency, high Availability, high Throughput, scalable Storage) that out-performs Redis on a 32-core server. https://www.logicalclocks.com/blog/ai-ml-needs-a-key-value-store-and-redis-is-not-up-to-it https://github.com/logicalclocks/rondb
  6. Feature Pipelines and the Online/Offline Feature Store. [Diagram: real-time event data (user clicks, DB updates, user-profile updates, weblogs) flows through Kafka into feature groups (RTFeatureGroup, ClickFeatureGroup, TableFeatureGroup, UserFeatureGroup, LogsFeatureGroup); SQL sources (DW, S3, HDFS) supply high-latency features to the offline store, while low-latency features land in the online store; a DataFrame API serves training and batch applications from the offline store and model serving from the online store.]
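     As a sketch of the write path, a feature pipeline might materialize a DataFrame into an online-enabled feature group as follows; the group name, keys, and source DataFrame are illustrative, based on the hsfs 2.x API:

       import hsfs

       fs = hsfs.connection().get_feature_store()

       # Online-enabled feature groups are written to both the offline store
       # (for training) and the online store (for low-latency serving).
       click_fg = fs.create_feature_group(
           name="click_feature_group",
           version=1,
           primary_key=["user_id"],
           online_enabled=True,
       )
       click_fg.save(clicks_df)  # clicks_df: Spark DataFrame from the pipeline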
  7. Representing Models in the Feature Store with Training Datasets.
     ● Join (reuse) features and materialize them as training datasets.
     ● File formats: TFRecord, NPY, CSV, Petastorm, etc.
     [Diagram: the feature groups transactions_fg (transaction_type, transaction_amount) and users_fg (user_id, user_nationality, user_gender) are joined on a primary key into the training dataset transactions_2020_td; its descriptive statistics, feature correlations, and histograms serve as baseline statistics for data drift detection, and the fraud_classifier model is trained from it.]
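     A sketch of this join-and-materialize step with the hsfs library, using the names from the diagram; the exact parameters (notably statistics_config and the data-format string) are assumptions about the hsfs 2.x API:

       import hsfs

       fs = hsfs.connection().get_feature_store()

       transactions_fg = fs.get_feature_group("transactions_fg", version=1)
       users_fg = fs.get_feature_group("users_fg", version=1)

       # Join (reuse) features from two feature groups; hsfs joins on the
       # shared primary key by default.
       query = transactions_fg.select(["transaction_type", "transaction_amount"]) \
                              .join(users_fg.select(["user_nationality", "user_gender"]))

       # Materialize the query as a training dataset with baseline statistics.
       td = fs.create_training_dataset(
           name="transactions_2020_td",
           version=1,
           data_format="tfrecord",
           statistics_config={"histograms": True, "correlations": True},
       )
       td.save(query)  # statistics are computed when the dataset is written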
  8. Kubeflow Model Serving (KFServing) with a Feature Store.
  9. KFServing with an Online Feature Store: the application enriches. 1. The AI-enabled product requests features from the Online Feature Store; 2. the Feature Store returns the enriched feature vector; 3. the product sends the prediction request to KFServing; 4. KFServing makes the prediction and returns the result.
     td = fs.get_training_dataset("card_fraud_model", 1)
     input_keys = {"cc_num": ...}
     fv = td.get_serving_vector(input_keys)  # 1. request features
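     End to end, this client-side pattern could look like the following sketch; the KFServing hostname and the card number are placeholders, and the :predict path follows KFServing's v1 data-plane convention:

       import hsfs
       import requests

       fs = hsfs.connection().get_feature_store()
       td = fs.get_training_dataset("card_fraud_model", 1)

       # Steps 1-2: fetch the enriched feature vector from the Online Feature Store.
       fv = td.get_serving_vector({"cc_num": 4567030519999999})

       # Steps 3-4: send the prediction request to KFServing and read the result.
       resp = requests.post(
           "http://card-fraud.example.com/v1/models/card-fraud:predict",
           json={"instances": [fv]},
       )
       print(resp.json())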
  10. KFServing with an Online Feature Store: the Transformer enriches. 1. The product sends the prediction request; 2. the KFServing Transformer requests features from the Online Feature Store; 3. the Feature Store returns the enriched feature vector; 4. KFServing makes the prediction and returns the result.
     class Transformer:
         def __init__(self):
             self.fs = ...  # connect to feature store
             self.td = self.fs.get_training_dataset("card_fraud_model")

         def preprocess(self, inputs):
             # 2. request features from inside the KFServing Transformer
             return self.td.get_serving_vector(inputs["cc_num"])
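     For context, here is how such a Transformer could be wired into the KFServing Python SDK; this is a sketch, and the model name, predictor host, and key field are assumptions:

       import hsfs
       import kfserving

       class FeatureStoreTransformer(kfserving.KFModel):
           """Enriches prediction requests with features from the
           Hopsworks Online Feature Store before they reach the predictor."""

           def __init__(self, name: str, predictor_host: str):
               super().__init__(name)
               self.predictor_host = predictor_host
               fs = hsfs.connection().get_feature_store()
               self.td = fs.get_training_dataset("card_fraud_model", version=1)

           def preprocess(self, request: dict) -> dict:
               # Replace each primary key with its enriched feature vector.
               instances = [
                   self.td.get_serving_vector({"cc_num": i["cc_num"]})
                   for i in request["instances"]
               ]
               return {"instances": instances}

       if __name__ == "__main__":
           model = FeatureStoreTransformer("card-fraud", predictor_host="predictor:8080")
           kfserving.KFServer().start(models=[model])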
  11. KFServing Internals.
     ● KFServing supports complex inference pipelines: Transformer, Explainer, multi-model serving; the Transformer can call the Online Feature Store.
     [Image from https://www.kubeflow.org/docs/components/kfserving/kfserving/]
  12. [Image from https://www.kubeflow.org/docs/components/kfserving/kfserving/]
  13. Model Monitoring with KFServing and Hopsworks.
  14. AI Data Lifecycle: Model Serving to Feature Store. [Diagram: feature data feeds training; the model registry stores training artifacts (logs, experiments); model serving consumes training data and feature vectors, and returns inference data, statistics, and new training data to the Feature Store, feeding the AI data flywheel.]
  15. AI Data Lifecycle: KFServing to Hopsworks. [Diagram: the Hopsworks Feature Store supplies training data and feature vectors; the model registry stores training artifacts (logs, experiments); KFServing streams inference data back to the Feature Store via Kafka.]
  16. Add Support for Kafka/Spark Logging to KFServing.
     ● Enable automated ingestion into the Feature Store: Hopsworks can automatically create an Avro schema for the target training dataset.
     ● Enable live monitoring of inference data with Spark Streaming (see the logging sketch below).
     [Diagram: the KFServing Transformer and Predictor log requests (cc_num: long; num_trans_12h, avg_trans_1h, std_trans_10m: long, double, float) and responses (fraud: bool) to Kafka.]
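     A minimal sketch of such prediction logging from inside a Transformer, using kafka-python and JSON encoding for brevity (the deck's pipeline serializes with the Avro schema that Hopsworks generates; the broker address and topic name are placeholders):

       import json
       from kafka import KafkaProducer

       producer = KafkaProducer(
           bootstrap_servers="broker:9092",
           value_serializer=lambda v: json.dumps(v).encode("utf-8"),
       )

       def log_prediction(request: dict, response: dict) -> None:
           """Log each enriched request with the model's response so that
           Spark Streaming can monitor the live inference data."""
           producer.send("card_fraud_model_inference", {
               "request": request,    # enriched feature vector
               "response": response,  # e.g. {"fraud": True}
           })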
  17. Online Model Monitoring with Spark Streaming. [Diagram: the AI-enabled product sends requests to KFServing, which enriches them with feature vectors and context from the Online Feature Store and logs the inference data to Kafka; Spark Streaming evaluates the inference data against baseline statistics from the Offline Feature Store to detect data drift and outliers.]
  18. Challenges in Online Model Monitoring with Spark Streaming.
     ● Live inference data is an unbounded data stream.
     ● Stateful, global window-based monitoring on inference data (sketched below).
     ● Use Feature Store APIs to access descriptive statistics of the training set, to help identify data drift and outliers in the live inference data.
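     The window-based part of the monitoring could be expressed in Spark Structured Streaming roughly as follows, assuming the JSON-encoded inference topic from the logging sketch above; the schema, window sizes, and monitored feature are illustrative:

       from pyspark.sql import SparkSession
       from pyspark.sql.functions import avg, col, from_json, stddev, window
       from pyspark.sql.types import DoubleType, StructField, StructType, TimestampType

       spark = SparkSession.builder.appName("model-monitoring").getOrCreate()

       schema = StructType([
           StructField("avg_trans_1h", DoubleType()),
           StructField("event_time", TimestampType()),
       ])

       # Read the logged inference data from Kafka as an unbounded stream.
       inference = (
           spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "card_fraud_model_inference")
           .load()
           .select(from_json(col("value").cast("string"), schema).alias("r"))
           .select("r.*")
       )

       # Stateful, window-based statistics over the live traffic.
       live_stats = (
           inference.withWatermark("event_time", "1 minute")
           .groupBy(window(col("event_time"), "10 minutes"))
           .agg(avg("avg_trans_1h").alias("live_mean"),
                stddev("avg_trans_1h").alias("live_std"))
       )

       # Each window's statistics would then be compared against the training
       # baseline from the Feature Store (e.g. with a check like detect_drift above).
       query = live_stats.writeStream.outputMode("update").format("console").start()
       query.awaitTermination()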
  19. Spark Streaming for Online Model Monitoring: Usage Example. [Diagram: inference data flows through a monitor pipe, a window pipe, and a stats pipe into windowed/statistical outlier and drift pipes, then into a sink pipe that produces alerts, reports, and insights.] From: Scalable Architecture for Automating Machine Learning Model Monitoring, http://kth.diva-portal.org/smash/get/diva2:1464577/FULLTEXT01.pdf
  20. Model Monitoring with an Evaluation Store and a Feature Store. [Diagram: Kafka feeds inference data into an Evaluation Store and the Feature Store, used by the ML engineer and the data scientist.]
     ● Interactive queries to debug the model and the inference data.
     ● Inspect model KPI charts and model-serving performance charts.
     ● Identify model/data drift; interactive queries over audit logs.
     ● Understand live model performance; use new training data.
  21. Unified Feature and Data Drift Detection. [Diagram: feature pipelines pass through Deequ data validation (feature drift) on their way into the Hopsworks Feature Store; training data and feature vectors flow via the model registry (training artifacts: logs, experiments) to KFServing; inference data and outcomes flow back and are checked for data drift with the same validation rules.]
  22. Reuse Deequ Data Validation Rules in Hopsworks*
     # Insert and validate feature data using the following expectation
     expect = fs.create_expectation(..., rules=[
         Rule(name="HAS_MIN", level="WARNING", min=0),
         Rule(name="HAS_MAX", level="ERROR", max=1000000)])
     pipeline_fg = fs.create_feature_group(..., expectations=[expect])
     pipeline_df = ...  # dataframe from the feature pipeline
     pipeline_fg.insert(pipeline_df)  # expectations are validated on ingestion

     # Insert inference data and validate it with the same rules,
     # parameterized by the training dataset's baseline statistics
     td = fs.get_training_dataset("model", version=1)
     log_expect = fs.create_expectation(..., rules=[
         Rule(name="HAS_MIN", level="WARNING", min=td.stats["feature"].min),
         Rule(name="HAS_MAX", level="ERROR", max=td.stats["feature"].max)])
     logging_fg = fs.create_feature_group(..., expectations=[log_expect])
     logging_df = ...  # dataframe from prediction logging
     logging_fg.insert(logging_df)  # rules are evaluated on ingestion
     *https://examples.hopsworks.ai/featurestore/hsfs/data_validation/feature_validation_python/
  23. DEMO
  24. github.com/logicalclocks · www.hopsworks.ai · @logicalclocks. This work was part-funded by the Aniara Project (led by Ericsson), EU Celtic-Next.
  25. Feedback. Your feedback is important to us; don't forget to rate and review the sessions.
