This talk was recorded in London on October 30th, 2018 and can be viewed here: https://youtu.be/CeOJFynB6BE
Real-Time AI: Designing for Low Latency and High Throughput
Bio: Dr. Sergei Izrailev is Chief Data Scientist at Beeswax, where he is responsible for data strategy and building AI applications powering the next generation of real-time bidding technology. Before Beeswax, Sergei led data science teams at Integral Ad Science and Collective, where he focused on architecture, development, and scaling of data science-based advertising technology products. Prior to advertising, Sergei was a quant/trader and developed trading strategies and portfolio optimization methodologies. Previously, he worked as a senior scientist at Johnson & Johnson, where he developed intelligent tools for structure-based drug discovery.
4. • One prediction at a time: request - response
• Limited time to retrieve the prediction (latency)
• Minimum number of predictions per unit time (throughput)
Defining “Real Time”
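The request-response pattern above can be sketched as a tiny scoring handler that times itself against a latency budget. This is an illustrative stand-in, not the speaker's system: the `predict` function and the 5 ms budget are placeholders.

```python
import time

# Hypothetical stand-in for a trained model's predict function.
def predict(features):
    return 0.5 * features["x"] + 0.1

def handle_request(features, latency_budget_ms=5.0):
    """Request-response scoring: one prediction per request,
    checked against a per-request latency budget."""
    start = time.perf_counter()
    score = predict(features)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "score": score,
        "latency_ms": elapsed_ms,
        "within_budget": elapsed_ms <= latency_budget_ms,
    }

response = handle_request({"x": 1.0})
```

Throughput then follows from how many such requests the system can serve per second across all workers, which is why both numbers are part of the "real time" definition.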
5. • Loan officer or financial advisor
• ideally within a few seconds, but waiting longer is OK
• Online travel site
• a couple of seconds overall response time
• thousands of predictions; sub-millisecond time per prediction
• Ad buying platform
• 5-10 millisecond response time
• High-frequency trading
• sub-microsecond response time
Scale of Real Time: End-User Experience
12. Trade-Offs of Batch and Real-Time Scoring
Consideration                             | Batch  | Real-Time | Hybrid
ML packages and languages                 | Any    | Some      | Both
ML algorithms and feature transformations | Any    | Some      | Both
Using complex features                    | Yes    | No        | Yes
Predictions for every request             | No     | Yes       | Yes
Combinatorial input dimensions            | No     | Yes       | Yes
System complexity                         | Medium | Medium    | High
Accuracy (depends)                        | Medium | Medium    | High
13. • Data only: linear coefficients, PMML, etc.
• scoring code is independent
• real-time: scoring code can be in C++ or Java for speed
• batch: any scoring engine can be used (even SQL)
• Code + Data: Serialized objects - Pickle, Spark, R
• reuse generated code in the same framework
• primarily good for batch
• May work in a real-time service, latency and throughput permitting
Model Deployment Options
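The "data only" option above can be illustrated with a minimal sketch: the model artifact is nothing but coefficients, and the scoring code is written independently of the training framework. The coefficient values and feature names here are hypothetical.

```python
import math

# Hypothetical coefficients exported from a trained logistic regression.
# In a data-only deployment, this dictionary is the entire model artifact.
MODEL = {
    "intercept": -1.2,
    "coefficients": {"age": 0.03, "income": 0.00001},
}

def score(features, model=MODEL):
    """Framework-independent scoring: a dot product plus a sigmoid.
    The same logic is easy to port to C++ or Java for speed, or even SQL."""
    z = model["intercept"] + sum(
        coef * features.get(name, 0.0)
        for name, coef in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

p = score({"age": 40, "income": 50000})
```

Because only data crosses the boundary, the batch and real-time paths can use completely different scoring engines while producing identical predictions.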
14. • Code + Data: H2O's POJO and MOJO
• Generated code is used in a different environment
• Batch: load a generated jar in Spark as a UDF
• Real-time: load a generated jar in the real-time system (or wrap it into a REST service)
• MOJO 2: includes feature transformations
Model Deployment Options (continued)
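The "wrap it into a REST service" option can be sketched with the standard library alone. The `predict` function below is a hypothetical stand-in for a loaded MOJO/POJO model; a production service would add batching, monitoring, and a proper web framework.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from threading import Thread

# Hypothetical stand-in for a loaded model's scoring call.
def predict(features):
    return {"probability": 0.5 * features.get("x", 0.0)}

class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass

# Bind an ephemeral port and serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), ScoringHandler)
Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Client side: POST a feature vector, get a prediction back.
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/score",
    data=json.dumps({"x": 2.0}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
server.shutdown()
```

The trade-off versus loading the jar in-process is an extra network hop, which matters when the latency budget is single-digit milliseconds.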
17. ML Use Case: Campaign Optimization
[Diagram: Event Probability, Value of Event, and Real-Time Metrics feed into an Optimization Algorithm, which outputs a Bid Price]
Example: spend the whole budget evenly over a given time period, while maximizing the number of events
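The even-pacing example can be sketched as a simple heuristic: compare actual spend with the spend expected under even delivery, and scale the bid accordingly. This is a toy illustration, not Beeswax's actual optimization algorithm; all names and clamp values are assumptions.

```python
def pacing_multiplier(spent, budget, elapsed, duration):
    """Even-pacing heuristic: compare actual spend with the spend
    expected if the budget were delivered evenly over the period."""
    target_spent = budget * (elapsed / duration)
    if target_spent <= 0:
        return 1.0
    # Ahead of schedule -> bid less; behind schedule -> bid more.
    ratio = target_spent / max(spent, 1e-9)
    return min(max(ratio, 0.1), 10.0)  # clamp to keep bids sane

def bid_price(event_probability, event_value,
              spent, budget, elapsed, duration):
    """Base bid = expected value of the event (probability x value),
    then adjusted by the pacing multiplier."""
    base_bid = event_probability * event_value
    return base_bid * pacing_multiplier(spent, budget, elapsed, duration)
```

For instance, halfway through the period with 40% of the budget spent, the multiplier is 1.25, nudging bids up to catch up with the even-delivery schedule.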
18. • Inputs: billions of records
• Latency: 5 ms
• Throughput: 100K requests per second
• Production stack: Python, C++, and Java
Real-Time Constraints
19. • Training pipelines: pyspark
• Scoring type: batch predictions + real-time cache
• ML training engine: H2O Driverless AI
• Infrastructure on AWS: “h2ostart”, “h2ostop”
• Transformations and ML scoring engine: pysparkling + mojo 2
Machine Learning Setup
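The "batch predictions + real-time cache" scoring type above can be sketched as: a batch job scores records and writes the results into a key-value store, and the real-time path is a fast lookup with a fallback default. A plain dict stands in for the cache; the model and keys are hypothetical.

```python
# Hypothetical trained model: maps a record to a score.
model = lambda record: 0.1 * record["clicks"]

def batch_score(records, model):
    """Batch job: precompute a prediction for every known key."""
    return {r["key"]: model(r) for r in records}

def realtime_lookup(cache, key, default=0.0):
    """Real-time path: no model evaluation, just a cache read,
    with a default for keys the batch job has not seen."""
    return cache.get(key, default)

cache = batch_score(
    [{"key": "campaign-1", "clicks": 3}, {"key": "campaign-2", "clicks": 7}],
    model,
)
```

The lookup easily meets a millisecond-scale budget, at the cost of predictions being only as fresh as the last batch run and only covering precomputed key combinations.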
20. • Manual feature engineering is very time consuming
• And there are higher value activities for data scientists
• Mojo 2: both feature transformations and prediction engines
• Option to switch to real-time scoring later
Driverless AI vs Other Options
21. • Provides an auto-pilot
• Someone still has to fuel, service, take off, and land the plane
• Still need to experiment, but setting it up is easy
• Needed a reasonably large machine
• p2.8xlarge: 8 GPUs, 32 vCPUs, 488 GB RAM
• Other constraints: mojo2 supports only XGBoost and GLM
Practicalities of Driverless AI
22. • Accuracy levels off, while complexity continues to increase
• Complexity leads to larger mojos (increased memory requirements) and slower scoring (increased CPU requirements)
• On AWS, literally, time IS money, so complexity = higher costs
Trade-Off: Accuracy vs Complexity
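The "time is money" point reduces to simple arithmetic: slower scoring means more instance-hours for the same workload. The instance price and throughput figures below are hypothetical, purely to show the shape of the calculation.

```python
def cost_per_million(instance_price_per_hour, predictions_per_second):
    """Back-of-the-envelope serving cost: the same workload on a
    slower (more complex) model needs proportionally more instance-hours."""
    predictions_per_hour = predictions_per_second * 3600
    return instance_price_per_hour / predictions_per_hour * 1_000_000

# Hypothetical numbers: same instance price, 5x slower scoring
# for the more complex model => 5x the cost per million predictions.
simple_model = cost_per_million(7.2, 100_000)
complex_model = cost_per_million(7.2, 20_000)
```

This is why the knee of the accuracy-vs-complexity curve matters: past the point where accuracy levels off, extra complexity buys only cost.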
24. • Define what "real time" means for your application
• The most important choice: batch or real-time predictions
• Driverless AI helps solve the feature engineering problem
• Mojo 2 includes feature transformation code ready for real-time applications
Takeaways
25. Yes, we are hiring…
sergei@beeswax.com
Questions?