
Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline

Developer Marketing and Relations at MuleSoft
Nov. 24, 2020

  1. Apache Liminal (Incubating) Orchestrate the Machine Learning Pipeline
  2. Who are we?
     Aviem Zur
     ● Data tech lead @ Natural Intelligence
     ● Data frameworks and platforms specialist
     ● Open source specialist
     ● PPMC Member, Apache Liminal
     ● PMC Member, Apache Beam
     ● Magic: The Gathering player
     Lior Schachter
     ● Product & Cyber @ 2000 | IL Intelligence corps
     ● DSLs @ 2007 | Marketing Automation & Academia
     ● Internet & BigData @ 2009 | AdTech
     ● ML/AI @ 2015 | Marketing Automation
     ● CTO @ 2018-.. | Natural Intelligence
     ● PPMC Member @ 2020-.. | Apache Liminal
  3. Motivation
  4. ML/AI tech & processes have rapidly evolved in recent years
  5. ... and are still taking shape: Monitoring, AutoML, MLOps, etc.
  6. Natural Intelligence
     - A global leader in multi-vertical online comparison marketplaces
     - Our matching technology enables consumers to make confident purchasing decisions while helping brands grow their business
  7. NI started the journey to automate its core business using ML/AI 1.5 years ago, with a main focus on:
     - Website personalization
     - AdWords bidding
  8. We decided to continue working with proven solutions that we already utilized in our data platform
  9. The Orchestration Barrier
     - Diversity in infra (e.g. GCP, AWS)
     - Numerous platforms and libraries
     - Diversified skill-set
     - Complex workflows
  10. Impact
      - Data Scientists can't focus on algorithms & business logic
      - The time-to-market (TTM) of ML features & solutions is often too long and unpredictable.
  11. The Liminal Approach
  12. Let data scientists focus on data science… from research to production
  13. - Plugin Architecture
      - Minimalistic
      - Scalable
      - Orchestration DSL
      - Extensible in Python
      - Open Source
  14. The Liminal Way
  15. The Problem
      [architecture diagram: the ML lifecycle — Build & Train (Fetch, Clean, Prepare, Train, Evaluate), Deploy & Manage (Batch Inference, Realtime Inference, Validate, Deploy), Monitor — layered on data science infra (Algorithms, Frameworks, Auto Tuning, ...) and data lake & infra (Data Stores, Meta Store, Workflow and Scheduling, Processing Engines, Feature Store, Model Store)]
  16. name: MyDataScienceApp
      owner: Bosco Albert Baracus
  17. name: MyDataScienceApp
      owner: Bosco Albert Baracus
      services:
        - service:
            name: my_datascience_server
            type: python_server
            description: my ds server
            image: myorg/mydatascienceapp
            source: .
            endpoints:
              - endpoint: /predict
                module: serving
                function: predict
  18. name: MyDataScienceApp
      owner: Bosco Albert Baracus
      pipelines:
        - pipeline: my_datascience_pipeline
          schedule: 0 9 * * *
          metrics:
            namespace: DataScience
            backends: ['cloudwatch']
          tasks:
            - task: train
              type: python
              description: train model
              image: myorg/mydatascienceapp
              cmd: python -u training.py train
            - task: validate
              type: python
              description: validate model and deploy
              image: myorg/mydatascienceapp
              cmd: python -u training.py validate
      services:
        - service:
            name: my_datascience_server
            type: python_server
            description: my ds server
            image: myorg/mydatascienceapp
            source: .
            endpoints:
              - endpoint: /predict
                module: serving
                function: predict
  19. # model_store.py
      import pickle
      import time

      import boto3

      BUCKET = 'ni-ml-ab-dev'
      PRODUCTION = 'production'
      CANDIDATE = 'candidate'
      _ONE_HOUR = 60 * 60


      class ModelStore:
          def __init__(self, env):
              self.env = env
              self._latest_model = None
              self._latest_version = None
              self._last_check = time.time()
              self._s3 = boto3.client('s3')

          def _download_latest_model(self):
              file_path = '/tmp/downloaded_model.p'
              s3_objects = self._s3.list_objects(Bucket=BUCKET, Prefix=self.env)['Contents']
              models = list(
                  reversed(sorted([obj['Key'] for obj in s3_objects if obj['Key'].endswith('.p')]))
              )
              latest_s3_key = models[0]
              version = latest_s3_key.split('/')[1]
              print(f'Loading model version {version}')
              self._s3.download_file(BUCKET, latest_s3_key, file_path)
              return pickle.load(open(file_path, 'rb')), version

          def load_latest_model(self, force=False):
              if not self._latest_model or time.time() - self._last_check > _ONE_HOUR or force:
                  self._latest_model, self._latest_version = self._download_latest_model()
                  self._last_check = time.time()  # remember when we last refreshed
              return self._latest_model, self._latest_version

          def save_model(self, model, version):
              pickle.dump(model, open('/tmp/model.p', 'wb'))
              model_pkl = open('/tmp/model.p', 'rb').read()
              s3_key = f'{self.env}/{version}/model.p'
              self._s3.put_object(Bucket=BUCKET, Key=s3_key, Body=model_pkl)
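A note on the selection logic above: _download_latest_model picks the newest model by reverse-sorting S3 keys lexicographically, which matches numeric order here because the epoch-second versions all have the same digit count. A minimal stdlib sketch of that idea (the key names are illustrative, not from a real bucket):

```python
def latest_model_key(keys, env):
    """Pick the newest model key under an env prefix, mirroring the
    reversed-sort in ModelStore._download_latest_model. Lexicographic
    order equals numeric order because all versions have equal width."""
    models = sorted(
        (k for k in keys if k.startswith(f'{env}/') and k.endswith('.p')),
        reverse=True,
    )
    return models[0]

# illustrative bucket listing
keys = [
    'production/1605000000/model.p',
    'production/1605100000/model.p',
    'candidate/1605200000/model.p',
]
print(latest_model_key(keys, 'production'))  # production/1605100000/model.p
```

The same convention is why save_model writes keys as `{env}/{version}/model.p`: newer epoch seconds always sort last.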
  20. # requirements.txt
      boto3==1.15.18
      scikit-learn==0.23.2
      liminal
  21. # serving.py
      import json

      import model_store
      from model_store import ModelStore

      _MODEL_STORE = ModelStore(model_store.PRODUCTION)
      _PETAL_WIDTH = 'petal_width'


      def predict(input_json):
          """Predicts the probability of the input flower being Iris Virginica."""
          print(f'input_json={input_json}')
          input_dict = json.loads(input_json)
          model, version = _MODEL_STORE.load_latest_model()
          result = str(model.predict_proba([[input_dict[_PETAL_WIDTH]]])[0][1])
          print(f'result={result}')
          return result
  22. # training.py
      import sys
      import time

      import numpy as np
      from sklearn import datasets
      from sklearn.linear_model import LogisticRegression

      import model_store
      from model_store import ModelStore

      _CANDIDATE_MODEL_STORE = ModelStore(model_store.CANDIDATE)
      _PRODUCTION_MODEL_STORE = ModelStore(model_store.PRODUCTION)


      def train_model():
          iris = datasets.load_iris()
          X = iris['data'][:, 3:]  # petal width
          y = (iris['target'] == 2).astype(int)  # 1 if Iris Virginica
          model = LogisticRegression()
          model.fit(X, y)
          version = round(time.time())
          print(f'Saving model with version {version} to candidate model store.')
          _CANDIDATE_MODEL_STORE.save_model(model, version)


      def validate_model():
          model, version = _CANDIDATE_MODEL_STORE.load_latest_model()
          print(f'Validating model with version {version} from candidate model store.')
          if not isinstance(model.predict([[1]]), np.ndarray):
              raise ValueError('Invalid model')
          print(f'Deploying model with version {version} to production model store.')
          _PRODUCTION_MODEL_STORE.save_model(model, version)


      if __name__ == '__main__':
          cmd = sys.argv[1]
          if cmd == 'train':
              train_model()
          elif cmd == 'validate':
              validate_model()
          else:
              raise ValueError(f'Unknown command {cmd}')
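The validate-then-promote gate above is the piece the pipeline's `validate` task relies on. It can be isolated as a small sketch, with a dict standing in for the S3-backed production ModelStore and a looser sequence check replacing the np.ndarray isinstance test (both stand-ins are hypothetical):

```python
class _StubModel:
    """Stand-in (hypothetical) for the trained scikit-learn model."""

    def predict(self, X):
        # the real model returns a numpy.ndarray; a list stands in here
        return [1 for _ in X]


def validate_and_promote(model, version, production_store):
    """Smoke-test the candidate model, then copy it to production,
    mirroring the control flow of validate_model."""
    if not hasattr(model.predict([[1]]), '__len__'):
        raise ValueError('Invalid model')
    production_store[version] = model


store = {}
validate_and_promote(_StubModel(), 1605100000, store)
print(sorted(store))  # [1605100000]
```

A failing smoke test raises before anything reaches the production store, so a broken candidate can never be promoted.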
  23. liminal cli:
      ⇒ # build images from user code
      ⇒ liminal build ...
      ⇒ # deploy liminal.yml file
      ⇒ liminal deploy ...
      ⇒ # start server
      ⇒ liminal start ...
  24. Pipeline run
  25. Request
  26. Server Logs
  27. What’s next?
      - Model Store
      - Cloud Support
      - CI Integrations
      - Open Source Community
      - Experiment Tracking
      - User Interface
      - ML Integrations (Kubeflow, MLflow, Feature stores, ..)
  28. Join the effort @: http://liminal.apache.org/ https://github.com/apache/incubator-liminal Apache JIRA