Multi-Runtime Serving Pipelines
Stepan Pushkarev
CTO of Hydrosphere.io
About
Mission: Accelerate Machine Learning to Production
Open-source products:
- Mist: Serverless proxy for Spark
- ML Lambda: ML Function as a Service
- Sonar: Data and ML Monitoring
Business model: Subscription services and hands-on consulting
Deployment | Serving | Scoring | Inference
Image: Nvidia, https://www.nvidia.com/en-us/deep-learning-ai/solutions/
From Single Model to Meta Pipelines
Product Matching

Item 1
  Title: Authentic HERMES Bijouterie Fantaisie Selle Clip-On Earrings Silvertone #S1742 E
  Specs: Brand: HERMES; Size (cm): W1.8 x H1.8 cm (Approx); Color: Silver; Size (inch): W0.7 x H0.7" (Approx); Style: Earrings; Rank: B
  Description: ...

Item 2
  Title: Auth HERMES Earrings Sellier Clip-on Silver Tone Round $0 Ship 25130490900 S06B
  Specs: Brand: Hermes; Fastening: Clip-On; Style: Clip on; Country/Region of Manufacture: Unknown; Metal: Silver Plated; Main Color: Silver; Color: Silver
  Description: ...

Does this pair describe the same thing?
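To make the "meta pipeline" idea concrete, here is a minimal, hypothetical sketch of product matching as three independent stages (rule-based normalization, feature extraction, a learned matcher). The stage implementations are placeholders, not the actual models behind this example.

```python
# Hypothetical three-stage product-matching pipeline; each stage could live in its
# own serving runtime and be versioned and deployed independently.
from difflib import SequenceMatcher

def normalize(item: dict) -> dict:
    """Stage 1: rule-based cleanup (plain Python, no ML runtime needed)."""
    return {k: str(v).lower().strip() for k, v in item.items()}

def similarity_features(a: dict, b: dict) -> list:
    """Stage 2: per-field string-similarity features for fields both items share."""
    keys = sorted(set(a) & set(b))
    return [SequenceMatcher(None, a[k], b[k]).ratio() for k in keys]

def match_score(features: list) -> float:
    """Stage 3: the learned matcher; a plain average stands in for a real model."""
    return sum(features) / len(features) if features else 0.0

item1 = {"brand": "HERMES", "color": "Silver", "style": "Earrings"}
item2 = {"brand": "Hermes", "color": "Silver", "style": "Clip on"}

score = match_score(similarity_features(normalize(item1), normalize(item2)))
print(f"same product: {score > 0.8} (score={score:.2f})")
```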
Model Artifact: Ops Perspective
API & Logistics
- HTTP/1.1, HTTP/2, gRPC (see the gRPC sketch after this list)
- Kafka, Flink, Kinesis
- Protobuf, Avro
- Service Discovery
- Pipelining
- Tracing
- Monitoring
- Autoscaling
- Versioning
- A/B, Canary
- Testing
- CPU, GPU
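As one concrete illustration of the gRPC + Protobuf surface, a minimal Python client against TensorFlow Serving's public prediction API could look like the sketch below; the endpoint, model name, and input tensor are assumptions.

```python
# Minimal gRPC Predict call (sketch). Requires the tensorflow and
# tensorflow-serving-api packages; host, model name and inputs are assumed.
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")          # assumed serving endpoint
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "product_matcher"                # hypothetical model name
request.model_spec.signature_name = "serving_default"
request.inputs["features"].CopyFrom(
    tf.make_tensor_proto([[0.91, 0.87, 0.13]], dtype=tf.float32)
)

response = stub.Predict(request, timeout=5.0)
print(response.outputs)
```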
Monitoring
Shifting experimentation to production
Sidecar Architecture
- Functions registry: responsible for the model life cycle and all the business logic required to configure models for serving.
- Mesh of serving runtimes: the actual serving cluster.
- Infrastructure integration: ECS for AWS, Kubernetes for GCE and on-premise.
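A minimal sketch of the sidecar idea, assuming the model runtime listens on localhost:9090 and the sidecar owns the public port plus cross-cutting concerns (trace IDs, latency measurement). This is illustrative only, not Hydrosphere's actual sidecar.

```python
# Toy sidecar: accepts public traffic, forwards it to the local model runtime,
# and records latency plus a trace id for every request.
import time
import urllib.request
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

RUNTIME_URL = "http://127.0.0.1:9090"  # assumed local model runtime

class SidecarHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        trace_id = self.headers.get("X-Trace-Id", str(uuid.uuid4()))
        started = time.time()
        forwarded = urllib.request.Request(
            RUNTIME_URL + self.path,
            data=body,
            headers={"Content-Type": "application/json", "X-Trace-Id": trace_id},
        )
        with urllib.request.urlopen(forwarded, timeout=5) as resp:
            payload = resp.read()
        # A real mesh would ship this to a metrics/tracing backend (e.g. Sonar).
        latency_ms = (time.time() - started) * 1000
        print(f"trace={trace_id} path={self.path} latency_ms={latency_ms:.1f}")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), SidecarHandler).serve_forever()
```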
UX: Models and Applications
Applications provide public virtual endpoints for the models and compositions of the models.
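A hypothetical sketch of what an application could look like: one named virtual endpoint backed by a pipeline of independently deployed model versions. Field names and runtimes are illustrative, not ML Lambda's actual manifest format.

```python
# Illustrative application definition and request path; `clients` maps a
# (model, version) pair to a client for that stage's serving runtime.
application = {
    "name": "product-matching",  # exposed as a single public virtual endpoint
    "pipeline": [
        {"model": "text-preprocessor", "version": 3, "runtime": "python:3.6"},
        {"model": "siamese-matcher", "version": 7, "runtime": "tensorflow:1.8"},
    ],
}

def handle(request, clients):
    """Pass the payload through each pipeline stage in order."""
    payload = request
    for stage in application["pipeline"]:
        payload = clients[(stage["model"], stage["version"])].predict(payload)
    return payload
```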
Why Not Just One Big Neural Network?
● Not always possible
● Stages could be independent
● Ad-hoc rule-based models
● Physics models (e.g. LIDAR)
● Big end-to-end DL requires black-magic skills
Why Not Just One Python Script?
● Modularity: stages could be developed by different teams
● Traceability and monitoring
● Versioning
● Independent deployment, A/B testing and canary releases
● Request shadowing and other cool stuff
● Could require different ML runtimes (TF, Scikit-learn, Spark ML, etc.)
● We need more microservices :)
Why Not Just TF Serving?
● Other ML runtimes (DL4J, Scikit-learn, Spark ML); Servables are overkill
● Need better versioning and immutability: one Docker image per version (see the sketch below)
● Don’t want to deal with state (model loaded, offloaded, etc.)
● Want to re-use the microservices stack (tracing, logging, metrics)
● Need better scalability
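For instance, "one immutable Docker image per model version" could look roughly like the following with the Docker SDK for Python; the registry, build path and version tag are assumptions for the sketch.

```python
# Build and push an immutable image per model version (sketch).
# Assumes ./model_v7 contains a Dockerfile that bakes the artifact into a runtime.
import docker

client = docker.from_env()

MODEL, VERSION = "siamese-matcher", "7"        # hypothetical model and version
REPO = f"registry.example.com/models/{MODEL}"  # hypothetical registry

image, _build_logs = client.images.build(path="./model_v7", tag=f"{REPO}:{VERSION}")
client.images.push(REPO, tag=VERSION)  # the serving cluster pulls this exact version
```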
Demo
Thank you
- @hydrospheredata
- https://github.com/Hydrospheredata
- https://hydrosphere.io/
- spushkarev@hydrosphere.io
