SlideShare a Scribd company logo
1 of 25
Real-Time AI:
Designing for Low Latency
and High Throughput
Sergei Izrailev
Chief Data Scientist
Beeswax
LinkedIn, Twitter: @sizrailev
• What is “real time”?
• Designing for real time
• Real-time machine learning applications at Beeswax
Topics
What is “Real Time”?
• One prediction at a time: request - response
• Limited time to retrieve the prediction (latency)
• Minimum number of prediction per unit time (throughput)
Defining “Real Time”
• Loan officer of financial advisor
• ideally within a few seconds, but waiting longer is OK
• Online travel site
• a couple of seconds overall response time
• thousands of predictions; sub-millisecond time per prediction
• Ad buying platform
• 5-10 millisecond response time
• High-frequency trading
• sub-microsecond response time
Scale of Real Time: End-User Experience
Designing for Real Time
Model Building and Model Scoring
Serialized
Model
Build
Train
Transform
Score
Predict
Transform
Real Time: The Biggest Design Decision
Look-up
Predictions
Computed
in Batch
Compute
Predictions
in Real Time
?
Batch Scoring
ConsumerScore
Cache
Predictions
PredictionKey
Default
New data
Trans-
form
Predict
New data
Predictions
Real-Time Scoring
ConsumerScore
New data
Prediction
New data
Trans-
form
Predict
Hybrid Scoring
ScoreScore
Cache
Features
FeaturesKey
Features
Predict
New data Trans-
form New data
Consumer
Prediction
Trade-Offs of Batch and Real-Time Scoring
Consideration Batch Real-Time Hybrid
ML packages and languages Any Some Both
ML algorithms and feature transformations Any Some Both
Using complex features Yes No Yes
Predictions for every request No Yes Yes
Combinatorial input dimensions No Yes Yes
System complexity Medium Medium High
Accuracy (depends) Medium Medium High
• Data only - linear coefficients, PMML, etc
• scoring code is independent
• real-time: scoring code can be in C++ or Java for speed
• batch: any scoring engine can be used (even SQL)
• Code + Data: Serialized objects - Pickle, Spark, R
• reuse generated code in the same framework
• primarily good for batch
• May work in a real-time service, latency and throughput permitting
Model Deployment Options
• Code + Data: H2O's POJO and MOJO
• Generated code is used in a different environment
• Batch: load a generated jar in Spark as a UDF
• Real-time: load a generated jar in the real-time system (or wrap it into
a REST service)
• MOJO 2: includes feature transformations
Model Deployment Options (continued)
Real-Time ML Applications
at Beeswax
Bid request
Ad Exchanges
Bidder
Advertiser
Machine
Learning
Bid price Predictions
Data
2M requests
per second
10K-200K
requests per
second
under 50 ms
response time
5 ms
ML Use Case: Campaign Optimization
Event
Probability
Value of
Event
Real-Time
Metrics
Optimization
Algorithm
Bid Price
Example: spend the whole budget
evenly over a given time period,
while maximizing the number of events
• Inputs: billions of records
• Latency: 5 ms
• Throughput: 100K requests per second
• Production stack: Python, C++, and Java
Real-Time Constraints
• Training pipelines: pyspark
• Scoring type: batch predictions + real-time cache
• ML training engine: H2O Driverless AI
• Infrastructure on AWS: “h2ostart”, “h2ostop”
• Transformations and ML scoring engine: pysparkling + mojo 2
Machine Learning Setup
• Manual feature engineering is very time consuming
• And there are higher value activities for data scientists
• Mojo 2: both feature transformations and prediction engines
• Option to switch to real-time scoring later
Driverless AI vs Other Options
• Provides an auto-pilot
• Someone still has to fuel, service, take off, and land the plane
• Still need to experiment, but setting it up is easy
• Needed a reasonably large machine
• p2.8xlarge: 8 GPUs, 32 vCPUs, 488 GB RAM
• Other constraints: mojo2 is limited to XGBoost and GLM only
Practicalities of Driverless AI
• Accuracy levels off, while complexity continues to increase
• Complexity leads to larger size of the mojos (increased
memory requirements) and reduces the speed of scoring
(increased CPU requirements).
• On AWS, literally, time IS money, so complexity = higher costs
Trade-Off: Accuracy vs Complexity
Trade-Off: Accuracy vs Complexity
• Define what "real time" means for your application
• The most important choice: batch or real-time predictions
• Driverless AI helps solve the feature engineering problem
• Mojo 2 includes feature transformation code ready for real-
time applications
Takeaways
Yes, we are hiring…
sergei@beeswax.com
Questions?

More Related Content

What's hot

Embedded Systems Power Management
Embedded Systems Power ManagementEmbedded Systems Power Management
Embedded Systems Power ManagementPatrick Bellasi
 
A New Golden Age for Computer Architecture
A New Golden Age for Computer ArchitectureA New Golden Age for Computer Architecture
A New Golden Age for Computer ArchitectureYanbin Kong
 
Orchestration vs Choreography - A Guide To Composing Your Monolith
Orchestration vs Choreography - A Guide To Composing Your MonolithOrchestration vs Choreography - A Guide To Composing Your Monolith
Orchestration vs Choreography - A Guide To Composing Your MonolithIan Thomas
 
Managing by Missing
Managing by MissingManaging by Missing
Managing by MissingIan Nowland
 
Vectorwise database training
Vectorwise database trainingVectorwise database training
Vectorwise database trainingZahid Quadri
 
Unreal Open Day 2017 UE4 for Mobile: The Future of High Quality Mobile Games
Unreal Open Day 2017 UE4 for Mobile: The Future of High Quality Mobile GamesUnreal Open Day 2017 UE4 for Mobile: The Future of High Quality Mobile Games
Unreal Open Day 2017 UE4 for Mobile: The Future of High Quality Mobile GamesEpic Games China
 

What's hot (8)

Embedded Systems Power Management
Embedded Systems Power ManagementEmbedded Systems Power Management
Embedded Systems Power Management
 
A New Golden Age for Computer Architecture
A New Golden Age for Computer ArchitectureA New Golden Age for Computer Architecture
A New Golden Age for Computer Architecture
 
Orchestration vs Choreography - A Guide To Composing Your Monolith
Orchestration vs Choreography - A Guide To Composing Your MonolithOrchestration vs Choreography - A Guide To Composing Your Monolith
Orchestration vs Choreography - A Guide To Composing Your Monolith
 
Managing by Missing
Managing by MissingManaging by Missing
Managing by Missing
 
Relic's FX System
Relic's FX SystemRelic's FX System
Relic's FX System
 
Vectorwise database training
Vectorwise database trainingVectorwise database training
Vectorwise database training
 
Unreal Open Day 2017 UE4 for Mobile: The Future of High Quality Mobile Games
Unreal Open Day 2017 UE4 for Mobile: The Future of High Quality Mobile GamesUnreal Open Day 2017 UE4 for Mobile: The Future of High Quality Mobile Games
Unreal Open Day 2017 UE4 for Mobile: The Future of High Quality Mobile Games
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 

Similar to Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izrailev - H2O AI World London 2018

MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...Dataconomy Media
 
Big Data Berlin - Criteo
Big Data Berlin - CriteoBig Data Berlin - Criteo
Big Data Berlin - CriteoSofian Djamaa
 
AWS for Java Developers workshop
AWS for Java Developers workshopAWS for Java Developers workshop
AWS for Java Developers workshopRory Preddy
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeDatabricks
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0Matt Lucas
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastDatabricks
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridYahoo Developer Network
 
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAmazon Web Services
 
High Performance Computing with AWS
High Performance Computing with AWSHigh Performance Computing with AWS
High Performance Computing with AWSAmazon Web Services
 
Micro Service Architecture
Micro Service ArchitectureMicro Service Architecture
Micro Service ArchitectureEduards Sizovs
 
AWS for the Java Developer
AWS for the Java DeveloperAWS for the Java Developer
AWS for the Java DeveloperRory Preddy
 
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...Gary Arora
 
Eric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler
 
CQRS and Event Sourcing for IoT applications
CQRS and Event Sourcing for IoT applicationsCQRS and Event Sourcing for IoT applications
CQRS and Event Sourcing for IoT applicationsMichael Blackstock
 
How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...PerformanceVision (previously SecurActive)
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Chun-Yu Tseng
 

Similar to Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izrailev - H2O AI World London 2018 (20)

MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
 
Big Data Berlin - Criteo
Big Data Berlin - CriteoBig Data Berlin - Criteo
Big Data Berlin - Criteo
 
AWS for Java Developers workshop
AWS for Java Developers workshopAWS for Java Developers workshop
AWS for Java Developers workshop
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks Runtime
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory grid
 
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
 
High Performance Computing with AWS
High Performance Computing with AWSHigh Performance Computing with AWS
High Performance Computing with AWS
 
Micro Service Architecture
Micro Service ArchitectureMicro Service Architecture
Micro Service Architecture
 
Robotics technical Presentation
Robotics technical PresentationRobotics technical Presentation
Robotics technical Presentation
 
AWS for the Java Developer
AWS for the Java DeveloperAWS for the Java Developer
AWS for the Java Developer
 
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
 
Eric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New Contexts
 
CQRS and Event Sourcing for IoT applications
CQRS and Event Sourcing for IoT applicationsCQRS and Event Sourcing for IoT applications
CQRS and Event Sourcing for IoT applications
 
How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...How to create custom dashboards in Elastic Search / Kibana with Performance V...
How to create custom dashboards in Elastic Search / Kibana with Performance V...
 
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019 Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
Build and Host Real-world Machine Learning Services from Scratch @ pycontw2019
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izrailev - H2O AI World London 2018

  • 1. Real-Time AI: Designing for Low Latency and High Throughput Sergei Izrailev Chief Data Scientist Beeswax LinkedIn, Twitter: @sizrailev
  • 2. • What is “real time”? • Designing for real time • Real-time machine learning applications at Beeswax Topics
  • 3. What is “Real Time”?
  • 4. • One prediction at a time: request - response • Limited time to retrieve the prediction (latency) • Minimum number of prediction per unit time (throughput) Defining “Real Time”
  • 5. • Loan officer of financial advisor • ideally within a few seconds, but waiting longer is OK • Online travel site • a couple of seconds overall response time • thousands of predictions; sub-millisecond time per prediction • Ad buying platform • 5-10 millisecond response time • High-frequency trading • sub-microsecond response time Scale of Real Time: End-User Experience
  • 7. Model Building and Model Scoring Serialized Model Build Train Transform Score Predict Transform
  • 8. Real Time: The Biggest Design Decision Look-up Predictions Computed in Batch Compute Predictions in Real Time ?
  • 12. Trade-Offs of Batch and Real-Time Scoring Consideration Batch Real-Time Hybrid ML packages and languages Any Some Both ML algorithms and feature transformations Any Some Both Using complex features Yes No Yes Predictions for every request No Yes Yes Combinatorial input dimensions No Yes Yes System complexity Medium Medium High Accuracy (depends) Medium Medium High
  • 13. • Data only - linear coefficients, PMML, etc • scoring code is independent • real-time: scoring code can be in C++ or Java for speed • batch: any scoring engine can be used (even SQL) • Code + Data: Serialized objects - Pickle, Spark, R • reuse generated code in the same framework • primarily good for batch • May work in a real-time service, latency and throughput permitting Model Deployment Options
  • 14. • Code + Data: H2O's POJO and MOJO • Generated code is used in a different environment • Batch: load a generated jar in Spark as a UDF • Real-time: load a generated jar in the real-time system (or wrap it into a REST service) • MOJO 2: includes feature transformations Model Deployment Options (continued)
  • 16. Bid request Ad Exchanges Bidder Advertiser Machine Learning Bid price Predictions Data 2M requests per second 10K-200K requests per second under 50 ms response time 5 ms
  • 17. ML Use Case: Campaign Optimization Event Probability Value of Event Real-Time Metrics Optimization Algorithm Bid Price Example: spend the whole budget evenly over a given time period, while maximizing the number of events
  • 18. • Inputs: billions of records • Latency: 5 ms • Throughput: 100K requests per second • Production stack: Python, C++, and Java Real-Time Constraints
  • 19. • Training pipelines: pyspark • Scoring type: batch predictions + real-time cache • ML training engine: H2O Driverless AI • Infrastructure on AWS: “h2ostart”, “h2ostop” • Transformations and ML scoring engine: pysparkling + mojo 2 Machine Learning Setup
  • 20. • Manual feature engineering is very time consuming • And there are higher value activities for data scientists • Mojo 2: both feature transformations and prediction engines • Option to switch to real-time scoring later Driverless AI vs Other Options
  • 21. • Provides an auto-pilot • Someone still has to fuel, service, take off, and land the plane • Still need to experiment, but setting it up is easy • Needed a reasonably large machine • p2.8xlarge: 8 GPUs, 32 vCPUs, 488 GB RAM • Other constraints: mojo2 is limited to XGBoost and GLM only Practicalities of Driverless AI
  • 22. • Accuracy levels off, while complexity continues to increase • Complexity leads to larger size of the mojos (increased memory requirements) and reduces the speed of scoring (increased CPU requirements). • On AWS, literally, time IS money, so complexity = higher costs Trade-Off: Accuracy vs Complexity
  • 24. • Define what "real time" means for your application • The most important choice: batch or real-time predictions • Driverless AI helps solve the feature engineering problem • Mojo 2 includes feature transformation code ready for real- time applications Takeaways
  • 25. Yes, we are hiring… sergei@beeswax.com Questions?