Deep learning systems model serving



This document discusses model serving for deep learning. It begins with a brief introduction to machine learning, deep learning, and neural networks. It then explains that deep learning has a growing impact and can outperform other machine learning techniques, and even humans, on some tasks. The document focuses on model serving: what a deployed model looks like, the key concerns of a model serving system (performance, availability, and monitoring), and examples of existing model serving systems. It describes the MXNet Model Server (MMS) and its features: model archives, REST APIs, containerization, operational metrics, and ONNX support. It closes with the challenges and opportunities ahead in model serving.


© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Hagay Lupesko
01.25.2018
Model Serving for Deep Learning
Amazon AI

Brief Intro to Deep Learning
AI
Machine Learning
Deep Learning

Brief Intro to Deep Learning – Neural Networks
Input Layer
Hidden Layers
Output Layer
Many More…
• Non-linear
• Hierarchical feature learning
• Scalable architecture
• Computationally intensive

Deep Learning is a Big Deal
It has a growing impact on our lives
• Personalization
• Logistics
• Voice
• Autonomous Vehicles

Deep Learning is a Big Deal
It can outperform other ML techniques, and even humans, on some tasks

So what does a deployed model look like?
Model
Model Server
Mobile
Desktop
IoT
Internet

The Undifferentiated Heavy Lifting of Model Serving
• Performance
• Availability
• Networking
• Monitoring
• Model Decoupling
• Cross Framework
• Cross Platform

Model Serving Systems for Deep Learning
• TensorFlow Serving
• Model Server for MXNet
• UC Berkeley Clipper

It’s Demo Time!

• Model Archive
• REST and OpenAPI
• Containerized
• ONNX Support
• Operational Metrics

Model Archive
• Trained Network
• Model Signature
• Custom Code
• Auxiliary Assets
Model Export CLI → Model Archive
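The archive idea above can be sketched as bundling the four parts into a single file. This is a minimal illustration that assumes a plain zip container; the file names, the signature contents, and the `build_archive` helper are hypothetical and are not the format produced by the MMS export CLI.

```python
import io
import json
import zipfile


def build_archive(network: bytes, signature: dict, custom_code: str, assets: dict) -> bytes:
    """Bundle the four parts of a model archive into one zip (illustrative layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("network.params", network)                 # trained network weights
        zf.writestr("signature.json", json.dumps(signature))  # model signature (inputs/outputs)
        zf.writestr("service.py", custom_code)                 # custom pre/post-processing code
        for name, data in assets.items():                      # auxiliary assets, e.g. class labels
            zf.writestr(f"assets/{name}", data)
    return buf.getvalue()


def list_archive(archive: bytes) -> list:
    """Return the sorted member names of an archive built by build_archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return sorted(zf.namelist())
```

Keeping everything the server needs in one file is what lets the model be shipped and versioned independently of the server itself.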

REST and OpenAPI
• REST-like endpoint: <model-name>/predict
• Endpoint auto-generated from the model’s signature.json
• JSON encoding by default
• Binary input via request payload
• OpenAPI support – client code-gen and tooling
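A client call against the `<model-name>/predict` endpoint might look like the sketch below. Only the endpoint shape and the JSON default come from the slide; the host and port, the use of POST, the `data` input field, and both helper functions are assumptions made for illustration.

```python
import json
import urllib.request


def build_predict_request(base_url: str, model_name: str, inputs: dict) -> urllib.request.Request:
    """Build a JSON request for <model-name>/predict without sending it."""
    url = f"{base_url.rstrip('/')}/{model_name}/predict"
    body = json.dumps(inputs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: predictions are submitted with POST
    )


def predict(base_url: str, model_name: str, inputs: dict, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    req = build_predict_request(base_url, model_name, inputs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server listening locally, a call would be along the lines of `predict("http://localhost:8080", "squeezenet", {"data": [...]})`, where the model name and input field depend on the deployed model's signature.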

Containerization
MMS Dockerfile: Build, Push, Launch
Container Cluster: MMS Container, MMS Container, MMS Container
MMS Container: MXNet, NGINX, MXNet Model Server
Lightweight virtualization, isolation, runs anywhere

Operational Metrics
Metrics
• Requests
• Latencies
• Resources
Dimensions
• Model Name
• Host Name
Target
• Log / CSV
• AWS CloudWatch
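The metric, dimension, and target split above can be sketched as follows. This is not the MMS implementation: the `Metric` record, the metric names and units, and the CSV column layout are all hypothetical, chosen only to show request and latency metrics tagged with model-name and host-name dimensions and written to a CSV target.

```python
import csv
import io
import socket
import time
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str          # e.g. "Requests" or "Latency"
    value: float
    unit: str
    model_name: str    # dimension: which model served the request
    host_name: str = field(default_factory=socket.gethostname)  # dimension: which host
    timestamp: float = field(default_factory=time.time)


def timed_request(model_name, handler, payload, host_name="host-1"):
    """Run handler(payload) and emit one request-count and one latency metric."""
    start = time.perf_counter()
    result = handler(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, [
        Metric("Requests", 1, "Count", model_name, host_name),
        Metric("Latency", elapsed_ms, "Milliseconds", model_name, host_name),
    ]


def to_csv(metrics) -> str:
    """Serialize metrics to CSV text, one row per metric (the Log / CSV target)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["timestamp", "name", "value", "unit", "model_name", "host_name"])
    for m in metrics:
        writer.writerow([m.timestamp, m.name, m.value, m.unit, m.model_name, m.host_name])
    return out.getvalue()
```

Carrying the dimensions on every record is what allows a backend to aggregate latency per model or per host after the fact.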

ONNX Support
Many Frameworks: MXNet, Caffe2, PyTorch, TF, CNTK
Many Platforms: CoreML, TensorRT, NGraph, SNPE
O(n²) Pairs
ONNX: Common IR
Supported in MMS v0.2
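The argument for a common IR is simple counting, worked through below with the names from the slide. The split into five frameworks and four platforms is read from the slide layout, and counting one converter per framework-platform pair is an assumption made for the arithmetic.

```python
FRAMEWORKS = ["MXNet", "Caffe2", "PyTorch", "TF", "CNTK"]
PLATFORMS = ["CoreML", "TensorRT", "NGraph", "SNPE"]


def pairwise_converters(frameworks, platforms) -> int:
    """Without a common IR: one converter per (framework, platform) pair."""
    return len(frameworks) * len(platforms)


def converters_with_common_ir(frameworks, platforms) -> int:
    """With a common IR: one exporter per framework plus one importer per platform."""
    return len(frameworks) + len(platforms)


if __name__ == "__main__":
    print(pairwise_converters(FRAMEWORKS, PLATFORMS))        # 5 * 4 = 20
    print(converters_with_common_ir(FRAMEWORKS, PLATFORMS))  # 5 + 4 = 9
```

The gap widens as the ecosystem grows: adding one framework costs one new exporter with a common IR, but one new converter per platform without it.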

Challenges Ahead
Performance
• Batching
• Caching
• JIT Compilation
• Custom code
• Quantization
Platform
• New players
• ONNX
• Plugins
Adoption
• Ease of use
• Internal Amazon dev tools
• Industry partners

Open source – try it out and file issues
github.com/awslabs/mxnet-model-server
mxnet-sdk-team@amazon.com

Deep learning systems model serving

3. Brief Intro to Deep Learning – Neural Networks
Input Layer, Hidden Layers, Output Layer, Many More…
• Non-linear
• Hierarchical feature learning
• Scalable architecture
• Computationally intensive

7. The Undifferentiated Heavy Lifting of Model Serving
• Performance
• Availability
• Networking
• Monitoring
• Model Decoupling
• Cross Framework
• Cross Platform

12. REST and OpenAPI
• REST-like endpoint: <model-name>/predict
• Endpoint auto-generated from the model’s signature.json
• JSON encoding by default
• Binary input via request payload
• OpenAPI support – client code-gen and tooling
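As a rough illustration of the REST-like endpoint described on this slide, the sketch below builds (but does not send) a JSON prediction request. The server address, the model name `my-model`, and the `data` input key are hypothetical placeholders; in practice the input names would come from the model's signature.json.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, inputs):
    """Build a POST request for the <model-name>/predict endpoint.

    base_url, model_name and the keys of `inputs` are illustrative
    assumptions, not values taken from the slides.
    """
    url = f"{base_url.rstrip('/')}/{model_name}/predict"
    body = json.dumps(inputs).encode("utf-8")  # JSON encoding by default
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address and model name; nothing is sent over the network.
request = build_predict_request(
    "http://localhost:8080", "my-model", {"data": [1.0, 2.0]}
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server; that step is left out so the sketch runs without one.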

13. Containerization
• MMS Dockerfile: Build, Push, Launch
• Container Cluster running multiple MMS Containers
• MMS Container: NGINX, MXNet Model Server, MXNet
• Lightweight virtualization, isolation, runs anywhere

14. Operational Metrics
• Metrics: Requests, Latencies, Resources
• Dimensions: Model Name, Host Name
• Target: Log / CSV, AWS CloudWatch
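To make the metric/dimension/target split concrete, here is a minimal sketch of a recorder that tags each request latency with model-name and host-name dimensions and renders the result as CSV. The class name, column names, and sample values are all hypothetical; this is not the MMS metrics format.

```python
import csv
import io
import time


class MetricsRecorder:
    """Collects per-request latencies tagged with model and host dimensions."""

    def __init__(self, model_name, host_name):
        self.model_name = model_name
        self.host_name = host_name
        self.rows = []

    def record(self, latency_ms, timestamp=None):
        """Store one request's latency together with its dimensions."""
        ts = time.time() if timestamp is None else timestamp
        self.rows.append((ts, self.model_name, self.host_name, latency_ms))

    def to_csv(self):
        """Render the recorded requests as CSV text, one row per request."""
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["timestamp", "model_name", "host_name", "latency_ms"])
        writer.writerows(self.rows)
        return buffer.getvalue()


# Hypothetical model and host names; fixed timestamps keep the output stable.
recorder = MetricsRecorder(model_name="my-model", host_name="host-1")
recorder.record(latency_ms=12.5, timestamp=0)
recorder.record(latency_ms=20.0, timestamp=1)
print(recorder.to_csv())
```

The request count is simply the number of rows, so a separate counter is not needed in this sketch.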

16. ONNX Support
• Many Frameworks: MXNet, Caffe2, PyTorch, TF, CNTK
• Many Platforms: CoreML, TensorRT, NGraph, SNPE
• O(n²) Pairs
• ONNX: Common IR
• Supported in MMS v0.2
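The "O(n²) pairs" point can be checked with simple arithmetic: without a common intermediate representation, every framework needs its own converter to every platform, whereas with a shared IR each side needs only one exporter or importer. The sketch below assumes the names on the slide split into five frameworks and four platforms; that grouping is a reading of the slide, not something it states explicitly.

```python
def converter_counts(num_frameworks, num_platforms):
    """Return (direct, via_ir) converter counts.

    direct: one converter per framework/platform pair.
    via_ir: one exporter per framework plus one importer per platform.
    """
    direct = num_frameworks * num_platforms
    via_ir = num_frameworks + num_platforms
    return direct, via_ir


# Grouping assumed from the slide's labels.
frameworks = ["MXNet", "Caffe2", "PyTorch", "TF", "CNTK"]
platforms = ["CoreML", "TensorRT", "NGraph", "SNPE"]

direct, via_ir = converter_counts(len(frameworks), len(platforms))
print(f"direct converters: {direct}, via common IR: {via_ir}")
```

With these counts the direct approach needs 20 converters and the common-IR approach needs 9, and the gap widens as either list grows.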

17. Challenges Ahead
• Performance: Batching, Caching, JIT Compilation, Custom code, Quantization
• Platform: New players, ONNX, Plugins
• Adoption: Ease of use, Internal Amazon dev tools, Industry partners
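The slide only names batching as an open performance challenge. As a generic illustration of the idea (not a description of how MMS handles it), the sketch below groups pending requests into fixed-size batches so that a model could process several inputs per forward pass. The function name and the batch size are hypothetical.

```python
def make_batches(requests, max_batch_size):
    """Split a list of pending requests into batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        requests[start:start + max_batch_size]
        for start in range(0, len(requests), max_batch_size)
    ]


# Five hypothetical request payloads grouped two at a time.
pending = ["req-0", "req-1", "req-2", "req-3", "req-4"]
batches = make_batches(pending, max_batch_size=2)
print(batches)
```

A real serving system would also bound how long a request waits for its batch to fill, trading a little latency for throughput; that timing logic is omitted here.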
