Production machine learning_infrastructure

Presented by David Taieb, Architect, IBM Cloud Data Services. Along with Spark Streaming, Spark SQL and GraphX, MLlib is one of the four key architectural components of Spark. It provides easy-to-use (even for beginners), powerful machine learning APIs that are designed to work in parallel using Spark RDDs. In this session, we’ll introduce the different algorithms available in MLlib, e.g. supervised learning with classification (binary and multiclass) and regression, but also unsupervised learning with clustering (K-means) and recommendation systems. We’ll conclude the presentation with a deep dive into a sample machine learning application built with Spark MLlib that predicts whether a scheduled flight will be delayed or not. This application trains a model using real flight data. The labeled flight data is combined with weather data from the “Insight for Weather” service available on the IBM Bluemix cloud platform to form the training, test and blind data sets. Even if you are not a black belt in machine learning, you will learn in this session how to leverage the powerful machine learning algorithms available in Spark to build interesting predictive and prescriptive applications.

About the Speaker: For the last 4 years, David has been the lead architect for the Watson Core UI & Tooling team based in Littleton, Massachusetts. During that time, he led the design and development of a Unified Tooling Platform to support all the Watson Tools, including accuracy analysis, test experiments, corpus ingestion, and training data generation. Before that, he was the lead architect for the Domino Server OSGi team, responsible for integrating the eXpeditor J2EE Web Container in Domino and building first-class APIs for the developer community. He started with IBM in 1996, working on various globalization technologies and products, including Domino Global Workbench (used to develop multilingual Notes/Domino NSF applications) and a multilingual content management system for WebSphere Application Server. David enjoys sharing his experience by speaking at conferences; you’ll find him at events like the Unicode Conference, EclipseCon, and Lotusphere. He’s also passionate about building tools that improve developer productivity and overall experience.
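The MLlib abstract mentions unsupervised clustering with K-means. As a rough, single-machine illustration of what that algorithm does (this is not MLlib's API, which distributes the work across a cluster), here is a minimal sketch of Lloyd's algorithm. The function name `kmeans` and the farthest-first seeding are illustrative assumptions chosen for determinism; MLlib uses its own initialization.

```python
import math


def kmeans(points, k, max_iter=50):
    """Lloyd's algorithm on a list of equal-length numeric tuples; returns (centroids, labels)."""
    # Farthest-first seeding (an illustrative choice): start from the first point, then
    # repeatedly add the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # an empty cluster keeps its previous centroid
        if updated == centroids:  # converged: unchanged centroids give unchanged assignments
            break
        centroids = updated
    return centroids, labels


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
```

Each iteration only needs per-cluster sums and counts, which is what makes the algorithm straightforward to parallelize over data partitions.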

Machine Learning with Apache Spark

IBM Cloud Data Services

Machine Learning with GraphLab Create

With data as a valuable currency and the architecture of reliable, scalable Data Lakes and Lakehouses continuing to mature, it is crucial that machine learning training and deployment techniques keep up to realize value. Reproducibility, efficiency, and governance in training and production environments rest on the shoulders of both point-in-time snapshots of the data and a governing mechanism to regulate, track, and make best use of associated metadata. This talk will outline the challenges and importance of building and maintaining reproducible, efficient, and governed machine learning solutions, and will propose solutions built on open source technologies: namely Delta Lake for data versioning and MLflow for efficiency and governance.
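The point-in-time snapshot idea can be illustrated with a toy, in-memory versioned table. Every commit creates a new immutable version, a reader can ask for any earlier version, and a training run records the version it used. This is a hypothetical sketch of the concept only. In Delta Lake itself a past version is read with the `versionAsOf` read option, and with MLflow the version number could be logged as a run parameter (for example via `mlflow.log_param`).

```python
import copy


class VersionedTable:
    """Toy append-only table that keeps every committed version (hypothetical, not Delta Lake's API)."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    def append(self, rows):
        # Each commit builds a new snapshot; earlier snapshots are never modified.
        snapshot = copy.deepcopy(self._versions[-1]) + [dict(r) for r in rows]
        self._versions.append(snapshot)
        return len(self._versions) - 1  # number of the version just committed

    def read(self, version=None):
        # "Time travel": return the latest snapshot, or a specific earlier version.
        if version is None:
            version = len(self._versions) - 1
        return copy.deepcopy(self._versions[version])


# Usage: pin the data version a model was trained on next to its parameters.
table = VersionedTable()
v1 = table.append([{"x": 1.0, "y": 0}, {"x": 2.0, "y": 1}])
run_record = {"data_version": v1, "params": {"max_depth": 3}}  # what a tracking server would store
table.append([{"x": 3.0, "y": 1}])  # new data arrives later
training_rows = table.read(run_record["data_version"])  # still exactly the two original rows
```

Because the run record stores a version number rather than a copy of the data, retraining on exactly the same inputs later only requires reading that version back.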

Importance of ML Reproducibility & Applications with MLflow

Data ops: Machine Learning in production

Cloud Native Night July 2019, Munich: Talk by Jörg Schad (@joerg_schad, Head of Engineering & ML at ArangoDB). Abstract: With the rapid and recent rise of data science, the machine learning platforms being built are becoming more complex. For example, consider the various Kubeflow components: distributed training, Jupyter notebooks, CI/CD, hyperparameter optimization, feature store, and more. Each of these components produces metadata: different datasets (and dataset versions), different versions of Jupyter notebooks, different training parameters, test/training accuracy, different features, model serving statistics, and many more. For production use it is critical to have a common view across all this metadata, as we have to answer questions such as: Which Jupyter notebook was used to build model xyz, currently running in production? If there is new data for a given dataset, which models (currently serving in production) have to be updated? In this talk, we look at existing implementations, in particular MLMD as part of the TensorFlow ecosystem. Further, we propose a first draft of a (MLMD-compatible) universal metadata API and demo the first implementation of this API using ArangoDB.
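Both production questions in this abstract are graph traversals over lineage edges. Walking upstream from a model finds the notebook that produced it, and walking downstream from a dataset finds the models that depend on it. The sketch below is a hypothetical in-memory illustration of that idea. It is not MLMD's API or the ArangoDB-based API described in the talk, and the `kind:name` artifact naming is an assumption.

```python
from collections import defaultdict, deque


class Lineage:
    """Hypothetical in-memory lineage graph; edges point from an input artifact to what it produced."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add(self, source, target):
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _reachable(self, start, edges):
        # Breadth-first search collecting every artifact reachable from `start`.
        seen, queue = set(), deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def ancestors(self, artifact):
        return self._reachable(artifact, self.upstream)

    def descendants(self, artifact):
        return self._reachable(artifact, self.downstream)


def of_kind(artifacts, kind):
    return {a for a in artifacts if a.startswith(kind + ":")}


g = Lineage()
g.add("dataset:flights@v2", "notebook:train.ipynb@rev7")
g.add("dataset:weather@v1", "notebook:train.ipynb@rev7")
g.add("notebook:train.ipynb@rev7", "model:xyz")
g.add("dataset:weather@v1", "notebook:baseline.ipynb@rev1")
g.add("notebook:baseline.ipynb@rev1", "model:abc")

built_by = of_kind(g.ancestors("model:xyz"), "notebook")  # which notebook built model xyz?
to_retrain = of_kind(g.descendants("dataset:weather@v1"), "model")  # models affected by new data
```

A graph database makes the same traversals cheap at scale, which is presumably part of the motivation for building on ArangoDB.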

Knowledge Discovery

André Karpištšenko

The Quest for an Open Source Data Science Platform

QAware GmbH

The machine learning libraries in Apache Spark are an impressive piece of software engineering, and are maturing rapidly. What advantages does Spark.ml offer over scikit-learn? At Data Science Retreat we've taken a real-world dataset and worked through the stages of building a predictive model (exploration, data cleaning, feature engineering, and model fitting) in several different frameworks. Which would you use in production? We'll show what it's like to work with native Spark.ml, and compare it to scikit-learn along several dimensions: ease of use, productivity, feature set, and performance. In some ways Spark.ml is still rather immature, but it also confers new superpowers on those who know how to use it.

A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...

Jose Quesada

Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure. For most organizations, developing multiple machine learning models, deploying them, and maintaining them in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).

Agenda:
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
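Data testing of this kind often comes down to declarative checks that run on every batch before it reaches training or serving. Here is a minimal sketch, assuming rows arrive as Python dicts. The column names and the `validate` helper are hypothetical and not part of any Provectus tooling.

```python
def validate(rows, required, ranges, unique_key):
    """Run simple data-quality checks; returns a list of failures (empty means the batch passes)."""
    failures = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Completeness: required columns must be present and non-null.
        for col in required:
            if row.get(col) is None:
                failures.append(f"row {i}: missing {col}")
        # Validity: numeric columns must fall inside their allowed range.
        for col, (lo, hi) in ranges.items():
            value = row.get(col)
            if value is not None and not (lo <= value <= hi):
                failures.append(f"row {i}: {col}={value} outside [{lo}, {hi}]")
        # Uniqueness: the key column must not repeat within the batch.
        key = row.get(unique_key)
        if key in seen_keys:
            failures.append(f"row {i}: duplicate {unique_key}={key}")
        seen_keys.add(key)
    return failures


batch = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": -5, "income": None},
    {"id": 2, "age": 40, "income": 61000},
]
problems = validate(batch, required=["id", "income"], ranges={"age": (0, 120)}, unique_key="id")
```

In a pipeline, a validation step would fail and stop the batch from moving on whenever `validate` returns a non-empty list.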

MLOps and Data Quality: Deploying Reliable ML Models in Production

Provectus

Description: We present a supervised anomaly detection approach that is scalable and interpretable. It works with tabular data and searches over all decision rules for the anomaly class involving one or two features. It creates a classifier out of all rules meeting user-specified precision and recall constraints, classifying a test example as an anomaly if any of the rules fire. Overlapping decision rules can be pruned to reduce model complexity, leaving a small number of simple rules that a user can easily understand. Our system operates on Pandas DataFrames and has a high-performance C++ backend with experimental GPU and FPGA acceleration available. It is available open-source at https://github.com/jjthomas/rule_engine
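The core search can be sketched in a few lines. This simplified version is an illustration built on stated assumptions, not the project's C++ implementation. It only considers single-feature threshold rules, whereas the described system also searches two-feature rules. It keeps every rule whose training-set precision and recall meet the user's constraints, prunes rules whose firing set is covered by a broader kept rule, and flags an example as anomalous if any remaining rule fires. All function names are hypothetical.

```python
def fires(rule, row):
    feature, op, threshold = rule
    return row[feature] > threshold if op == ">" else row[feature] <= threshold


def learn_rules(rows, labels, min_precision, min_recall):
    """Keep every single-feature threshold rule meeting both constraints on the training data."""
    positives = sum(labels)
    rules = []
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            for op in (">", "<="):
                rule = (feature, op, threshold)
                hits = [fires(rule, row) for row in rows]
                n_fired = sum(hits)
                true_pos = sum(1 for hit, y in zip(hits, labels) if hit and y)
                if n_fired == 0 or positives == 0:
                    continue
                if true_pos / n_fired >= min_precision and true_pos / positives >= min_recall:
                    rules.append(rule)
    return rules


def prune(rules, rows):
    """Drop rules whose training-set firing set is contained in that of a broader kept rule."""
    fired = {r: frozenset(i for i, row in enumerate(rows) if fires(r, row)) for r in rules}
    kept = []
    for rule in sorted(rules, key=lambda r: len(fired[r]), reverse=True):
        if not any(fired[rule] <= fired[k] for k in kept):
            kept.append(rule)
    return kept


def is_anomaly(rules, row):
    # The classifier flags an example as anomalous if any rule fires.
    return any(fires(rule, row) for rule in rules)


rows = [
    {"latency": 10, "errors": 0},
    {"latency": 12, "errors": 1},
    {"latency": 11, "errors": 0},
    {"latency": 95, "errors": 0},
    {"latency": 13, "errors": 9},
]
labels = [0, 0, 0, 1, 1]
rules = prune(learn_rules(rows, labels, min_precision=1.0, min_recall=0.5), rows)
```

The brute-force scan costs roughly (distinct values × rows) per feature, and far more once feature pairs are included, which is where a compiled or accelerated backend pays off.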

A Fast Decision Rule Engine for Anomaly Detection

A/B testing, i.e., measuring the impact of proposed variants of, e.g., e-commerce websites, is fundamental for increasing conversion rates and other key business metrics. We have developed a solution that makes it possible to run dozens of simultaneous A/B tests, obtain conclusive results sooner, and get results that are more interpretable than statistical significance alone: the probability that the change has a positive effect, how much revenue is at risk, etc. To compute those metrics, we need to estimate the posterior distributions of the metrics, which are computed using Generalized Linear Models (GLMs). Since we process gigabytes of data, we use a PySpark implementation, which, however, does not provide standard errors of coefficients. We therefore use bootstrapping to estimate the distributions. In this talk, I’ll describe how we’ve parallelized an already parallelized GLM computation so that it scales horizontally over a large cluster in Databricks, and describe various tweaks and how they’ve improved performance.
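The bootstrapping idea can be sketched without Spark. This simplified example does not refit a GLM on each resample as the talk describes. It bootstraps a plain difference in conversion rates between two variants instead, then uses the resulting distribution the same way: to estimate the probability of a positive effect and an expected-loss style risk figure. Each replicate gets its own seeded RNG, so replicates are independent and could be distributed across workers. The function names are hypothetical.

```python
import random
import statistics


def bootstrap_diffs(control, treatment, n_boot=1000, seed=0):
    """Bootstrap distribution of mean(treatment) - mean(control) for 0/1 conversion data."""
    diffs = []
    for b in range(n_boot):
        # An independent, seeded RNG per replicate: replicates could run on separate workers.
        rng = random.Random(seed + b)
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    return diffs


def summarize(diffs):
    ordered = sorted(diffs)
    n = len(ordered)
    return {
        # Share of replicates in which the treatment beat the control.
        "p_positive": sum(d > 0 for d in ordered) / n,
        # Average shortfall in conversion rate when the treatment is worse (0 when it is better).
        "expected_loss": statistics.fmean([max(0.0, -d) for d in ordered]),
        # Simple percentile interval.
        "interval_95": (ordered[int(0.025 * (n - 1))], ordered[int(0.975 * (n - 1))]),
    }


control = [1] * 50 + [0] * 950     # 5% conversion
treatment = [1] * 80 + [0] * 920   # 8% conversion
summary = summarize(bootstrap_diffs(control, treatment))
```

Multiplying the expected loss by traffic and order value would turn it into a revenue-at-risk figure, although that mapping is an assumption here rather than the talk's definition.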

AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...

Robert Grossman

Bootstrapping of PySpark Models for Factorial A/B Tests

Building Personalized Data Products with Dato

At Schiphol Airport we run many mission-critical machine learning models in production, ranging from models that predict passenger flow to computer vision models that analyze what is happening around the aircraft. Especially now, during Covid, it is paramount for us to be able to iterate quickly on these models: implementing new features, retraining them to match the new dynamics and, above all, monitoring them actively to see whether they still fit the current state of affairs. To meet those needs we rely on MLflow, which we have also integrated with many of our other systems: we have written Airflow operators for MLflow to ease the retraining of our models, integrated MLflow deeply with our CI pipelines, and integrated it with our model monitoring tooling. In this talk we will take you through the way we rely on MLflow and how that enables us to release (sometimes) multiple versions of a model per week in a controlled fashion. With this setup we achieve the same benefits and speed as a traditional software CI pipeline.

Consolidating MLOps at One of Europe’s Biggest Airports

Rest microservice ml_deployment_ntalagala_ai_conf_2019

Nisha Talagala

AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...

Production ready big ml workflows from zero to hero daniel marcous @ waze

Ml infra at an early stage

Machine Learning system architecture – Microsoft Translator, a Case Study : ...

Making Data Science Scalable - 5 Lessons Learned

Getting Started With Dato - August 2015

Square's Machine Learning Infrastructure and Applications - Rong Yan

Hakka Labs

Production and Beyond: Deploying and Managing Machine Learning Models

Learn how to use PySpark for processing massive amounts of data. Combined with the GitHub repo - https://github.com/rdempsey/pyspark-for-data-processing - this presentation will help you gain familiarity with processing data using Python and Spark. If you're thinking about machine learning and not sure if it can help improve your business, but want to find out, set up a free 20-minute consultation with us: https://calendly.com/robertwdempsey/free-consultation

Using PySpark to Process Boat Loads of Data

Robert Dempsey

Multi runtime serving pipelines for machine learning

With so many options to choose from how do you select the right technologies to use for your machine learning pipeline? Do you purchase bare metal and hire a devops team, install Spark on EC2 instances, use EMR and other AWS services, combine Spark and Elasticsearch?! View this talk to get a first-hand experience of building ML pipelines: what options were looked at, how the final solution was selected, the tradeoffs made and the final results.

Building A Production-Level Machine Learning Pipeline

Robert Dempsey

Over the course of three years, we've built Stripe from scratch and scaled it to process billions of dollars of transaction volume a year by making it easy and painless for merchants to get set up and start accepting payments. While the vast majority of transactions facilitated by Stripe are honest, we do need to protect our merchants from rogue individuals and groups seeking to "test" or "cash" stolen credit cards. To combat this sort of activity, Stripe uses Python (together with Scala and Ruby) as part of its production machine learning pipeline to detect and block fraud in real time. In this talk, I'll go through the scikit-based modeling process for a sample data set that is derived from production data to illustrate how we train and validate our models. We'll also walk through how we deploy the models and monitor them in our production environment and how Python has allowed us to do this at scale.

Python as part of a production machine learning stack by Michael Manapat PyDa...

PyData

Any startup has to have a clear go-to-market strategy from the beginning. Similarly, any data science project has to have a go-to-production strategy from its first days, so that it can go beyond proof-of-concept. Machine learning and artificial intelligence in production result in hundreds of training pipelines and machine learning models that are continuously revised by teams of data scientists and seamlessly connected with web applications for tenants and users. In this demo-based talk we will walk through best practices for simplifying machine learning operations across the enterprise and providing a serverless abstraction for data scientists and data engineers, so they can train, deploy and monitor machine learning models faster and with better quality.

Serverless machine learning operations

PostgreSQL is an open source relational database. Kafka is an open source log-based messaging system. Because both systems are powerful and flexible, they’re devouring whole categories of infrastructure. And they’re even better together. In this talk, you’ll learn about commit logs and how that fundamental data structure underlies both PostgreSQL and Kafka. We’ll use that basis to understand what Kafka is, what advantages it has over traditional messaging systems, and why it’s perfect for modeling database tables as streams. From there, we’ll introduce the concept of change data capture (CDC) and run a live demo of Bottled Water, an open source CDC pipeline, watching INSERT, UPDATE, and DELETE operations in PostgreSQL stream into Kafka. We’ll wrap up with a discussion of use cases for this pipeline: messaging between systems with transactional guarantees, transmitting database changes to a data warehouse, and stream processing.
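The stream/table duality behind change data capture can be shown with a tiny replay function. Applying an ordered log of INSERT, UPDATE and DELETE events to an empty table rebuilds the table, and replaying only a prefix of the log yields the table as it was at that offset. The event format here (`op`, `key`, `row`) is a hypothetical simplification, not Bottled Water's or Kafka's actual message format.

```python
def apply_changes(events, table=None):
    """Replay an ordered change log onto a key -> row mapping and return the resulting table."""
    table = {} if table is None else dict(table)  # never mutate the caller's snapshot
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("INSERT", "UPDATE"):
            table[key] = event["row"]  # last write for a key wins
        elif op == "DELETE":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


change_log = [
    {"op": "INSERT", "key": 1, "row": {"name": "alice"}},
    {"op": "INSERT", "key": 2, "row": {"name": "bob"}},
    {"op": "UPDATE", "key": 1, "row": {"name": "alicia"}},
    {"op": "DELETE", "key": 2},
]
as_of_offset_2 = apply_changes(change_log[:2])          # table state after the first two events
resumed = apply_changes(change_log[2:], as_of_offset_2)  # a consumer continuing from offset 2
```

Because the table is a pure function of the log, a downstream consumer such as a data warehouse loader can stop, remember its offset, and resume later without losing changes.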

PostgreSQL + Kafka: The Delight of Change Data Capture

Jeff Klukas

Machine learning in production

Presented at PyOhio 2017: https://pyohio.org/schedule/presentation/284/ The Python data ecosystem provides amazing tools to quickly get up and running with machine learning models, but the path to stably serving them in production is not so clear. We'll discuss details of wrapping a minimal REST API around scikit-learn, training and persisting models in batch, and logging decisions, then compare to some other common approaches to productionizing models.
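Here is a minimal sketch of the pattern this abstract describes, using only the Python standard library: train and persist in batch, load the model once, serve predictions over HTTP, and log every decision. `ThresholdModel` is a hypothetical stand-in for a fitted scikit-learn estimator; any picklable object with a `predict` method would slot in the same way. The `{"instances": ...}` request shape and the helper names are illustrative assumptions.

```python
import io
import json
import logging
import os
import pickle
import tempfile

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model-service")


class ThresholdModel:
    """Hypothetical stand-in for a fitted scikit-learn estimator: anything picklable with .predict()."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        return [int(row[0] > self.threshold) for row in rows]


def train_and_persist(path=None):
    """Batch step: fit (here, simply construct) a model and pickle it to disk; returns the path."""
    path = path or os.path.join(tempfile.mkdtemp(), "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(ThresholdModel(threshold=0.5), f)
    return path


def make_app(path):
    """Build a WSGI app that loads the persisted model once and serves predictions as JSON."""
    with open(path, "rb") as f:
        model = pickle.load(f)  # only unpickle files you produced yourself

    def app(environ, start_response):
        size = int(environ.get("CONTENT_LENGTH") or 0)
        instances = json.loads(environ["wsgi.input"].read(size))["instances"]
        predictions = model.predict(instances)
        # Log every decision with its inputs so it can be audited or reused for retraining.
        log.info("inputs=%s predictions=%s", instances, predictions)
        body = json.dumps({"predictions": predictions}).encode()
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


def call(app, instances):
    """Invoke the app in-process (no network), e.g. from a test; returns (status, parsed body)."""
    body = json.dumps({"instances": instances}).encode()
    environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body)),
               "wsgi.input": io.BytesIO(body)}
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    payload = b"".join(app(environ, start_response))
    return seen["status"], json.loads(payload)
```

To serve it locally, the app returned by `make_app` can be passed to `wsgiref.simple_server.make_server` from the standard library. A production deployment would more likely sit behind a dedicated WSGI server.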

Managing and Versioning Machine Learning Models in Python

Simon Frid

Machine learning in production with scikit-learn

Jeff Klukas

Machine Learning Pipelines

jeykottalam

Spark and machine learning in microservices architecture

What is machine learning? Is UX relevant in the age of artificial intelligence (AI)? How can I take advantage of cognitive computing? Get answers to these questions and learn about the implications for your work in this session. Carol will help you understand at a basic level how these systems are built and what is required to get insights from them. Carol will present examples of how machine learning is already being used and explore the ethical challenges inherent in creating AI. You will walk away with an awareness of the weaknesses of AI and the knowledge of how these systems work.

AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017

Carol Smith

Cloudera User Group - From the Lab to the Factory

ClouderaUserGroups

Josh Wills, MLconf 2013

MLconf

MLconf NYC Josh Wills

MLconf

As data science workloads grow, so does their need for infrastructure. But, is it fair to ask data scientists to also become infrastructure experts? If not the data scientists, then, who is responsible for spinning up and managing data science infrastructure? This talk will address the context in which ML infrastructure is emerging, walk through two examples of ML infrastructure tools for launching hyperparameter optimization jobs, and end with some thoughts for building better tools in the future. Originally given as a talk at the PyData Ann Arbor meetup (https://www.meetup.com/PyData-Ann-Arbor/events/260380989/)

Machine Learning Infrastructure

SigOpt

Continuum Analytics and Python

Travis Oliphant

Data Discovery and Metadata

markgrover

Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware

DaveEdwards12

In the U.S., pharmaceutical firms and medical device manufacturers must meet electronic record-keeping regulations set by the Food and Drug Administration (FDA). The regulation is Title 21 CFR Part 11, commonly known as Part 11. Part 11 requires regulated firms to implement controls for software and systems involved in processing many forms of data as part of business operations and product development. Enterprise data warehouses are used by the pharmaceutical and medical device industries for storing data covered by Part 11 (for example, Safety Data and Clinical Study project data). QuerySurge, the only test tool designed specifically for automating the testing of data warehouses and the ETL process, has been effective in testing data warehouses used by Part 11-governed companies. The purpose of QuerySurge is to assure that your warehouse is not populated with bad data. In industry surveys, bad data has been found in every database and data warehouse studied and is estimated to cost firms on average $8.2 million annually, according to analyst firm Gartner. Most firms test far less than 10% of their data, leaving at risk the rest of the data they are using for critical audits and compliance reporting. QuerySurge can test up to 100% of your data and help assure your organization that this critical information is accurate. QuerySurge not only helps in eliminating bad data, but is also designed to support Part 11 compliance. Learn more at www.QuerySurge.com
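Testing "up to 100%" of the data, rather than a sample, amounts to comparing every source row with its loaded counterpart. Below is a minimal sketch of that source-to-target reconciliation, assuming both sides fit in memory and share a primary key. It is illustrative only and not how QuerySurge is implemented. The optional `transform` stands in for whatever mapping the ETL is expected to apply, and the column names are hypothetical.

```python
def reconcile(source_rows, target_rows, key, transform=lambda row: row):
    """Compare every source row (after the expected ETL transform) with the loaded target row."""
    src = {r[key]: transform(r) for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    return {
        # Rows that the load dropped.
        "missing": sorted(src.keys() - tgt.keys()),
        # Rows in the warehouse with no source counterpart.
        "unexpected": sorted(tgt.keys() - src.keys()),
        # Rows present on both sides whose values disagree.
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


source = [{"id": 1, "dose_mg": 10}, {"id": 2, "dose_mg": 20}, {"id": 3, "dose_mg": 5}]
target = [{"id": 1, "dose_mg": 10}, {"id": 2, "dose_mg": 200}, {"id": 4, "dose_mg": 7}]
report = reconcile(source, target, key="id")
```

At warehouse scale the same comparison is usually pushed into SQL queries on both systems rather than done in application memory.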

Data Warehouse Testing in the Pharmaceutical Industry

RTTS

Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware

Priyanka Aash

Building an Experimentation Platform in Clojure

Srihari Sriraman

Ds for finance day 4

QuantUniversity

Transferring Software Testing Tools to Practice

Tao Xie

Discover the Internet of Things (IoT) and how it is shaping our world, with a hands-on approach. Affordable, internet-connected devices are becoming ubiquitous: with the rise of Arduino, Raspberry Pi, and the Particle Photon, it's now possible to quickly prototype and design an internet-ready device that monitors weather patterns, responds to movement, or collects and transmits data to the cloud for under $100. In this full-day workshop, we'll begin with a hands-on introduction to IoT and build IoT devices. With a Raspberry Pi 2 kit running Windows 10 IoT Core, we’ll build a simple temperature sensor, collect ambient temperature readings, and stream the data to an Azure IoT Hub. Once the data is in Azure, we’ll analyze it with Azure Stream Analytics and ship it to an Azure SQL Database. Finally, we’ll report on the data and build dashboards of our temperature readings using Power BI.
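A typical analysis step between ingestion and a dashboard is aggregating sensor readings over fixed, non-overlapping (tumbling) time windows, which Azure Stream Analytics expresses in its query language. The sketch below shows the same idea in plain Python as an illustration only. The `(timestamp_seconds, celsius)` reading format and the function name are assumptions, not the workshop's actual data or code.

```python
from collections import defaultdict


def tumbling_window_avg(readings, window_seconds):
    """Average (timestamp_seconds, value) readings over fixed, non-overlapping windows."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Every reading falls into exactly one window, identified by the window's start time.
        window_start = ts - ts % window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


readings = [(0, 20.0), (15, 21.0), (59, 22.0), (60, 23.0), (90, 25.0)]
averages = tumbling_window_avg(readings, window_seconds=60)
```

Each window's average would then be written as one row to the SQL database, giving the dashboard one point per minute instead of one per raw reading.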

Code PaLOUsa Azure IoT Workshop

Mike Branstein

At Lennox International, we have thousands of IoT-connected devices streaming data into the Azure platform at a one-minute polling interval. The challenge was to use these data sets, combine them with external data sources such as weather, and predict equipment failure with high accuracy, along with the patterns and parameters that influence it. Previously the team was using a combination of on-premise and desktop tools to run algorithms on a sample set of devices. The result was low accuracy (around 65%) from a process that took more than 6 hours. The team had to work through several data orchestration challenges and identify a machine learning platform that enabled collaboration between our engineering SMEs, data engineers and data scientists. The team decided to use Azure Databricks to build the data engineering pipelines and the appropriate machine learning models, and to extract predictions using PySpark. To enhance the sophistication of the learning, the team worked on a variety of Spark ML models such as Gradient Boosted Trees and Random Forest. The team also implemented stacking and other ensemble methods using H2O Driverless AI and Sparkling Water on Azure Databricks clusters, which can scale up to 1000 cores. Join us in this session and see how this resulted in models that run in 40 minutes with minimal tuning and predict failures with about 90% accuracy.

How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara...