Copyright © Myrtle.ai 2020
Solving Core Recommendation
Model Challenges in Data Centers
Giles Peckham, Myrtle.ai
Myrtle.ai accelerates Machine Learning inference
• Accelerates Recommendation Models, RNNs and other DNNs with sparse structures
• Achieves maximum throughput in applications with strict latency constraints
• Addresses hyper-scale inference
• Data Centers (Cloud & On-Premise) and Embedded applications
[Partner logos: MLCommons Founding Member · Alliance Member · Gold Partner · AI Keynote 2019 · Joint White Paper]
[Application areas: Recommendation Systems · Speech Synthesis · Speech Transcription · Machine Translation]
MAU Accelerator
Low latency inference accelerator for data center ML workloads
Optimized for highest latency-bounded throughput
[Diagram: DNN Model → FPGA accelerator card in a cloud or enterprise data center server]
MAU Accelerator Benefits
Optimized for highest latency-bounded throughput
Reduced data center infrastructure required
• Lower CapEx
• Mitigates against rack space limitations
Reduced energy consumption
• Lower OpEx
• Smaller carbon footprint
• Mitigates against power constraints
Deterministic low tail-latency enables the use of higher quality models
• Improved customer experience
• Better services
Uses readily-available data center accelerator cards compatible with typical server installations
• Rapid deployment at scale
Development flow based on industry standards
• Easy to compile from popular open-source frameworks
Flexible & reprogrammable solution
• Future proof
Applications
Target Applications
• Speech transcription
• Natural language processing
• Speech synthesis
• Time series cleansing & analysis
• Payment & trading fraud detection
• Anomaly detection
• Network security
Target Model Architectures
• Fully connected linear layers
• RNN, including LSTM and GRU
• Time delay neural network (TDNN)
Target Sectors
• Finance (trading, compliance, service)
• Search, Social Media & other Ad Servers
• HPC (very large ML)
• Life science (genomics, data analytics)
• Defense, Aerospace, Security
• Telcos & Conferencing Providers
An Accelerator for Recommendation Systems
Recommendation Models
• One of the most common data center workloads
• Used for search, adverts, feeds and personalization
Demands
• Throughput / Capacity
• Need to ramp up capacity quickly to meet demand
• Months/years to commission new data center floor space
• Cost
• Data center rack server investment >$50B /yr1
• Latency / Model Accuracy / Revenue
• 5 ms latency is challenging for typical server systems
• 100 ms delay in load time can cost e-commerce companies many $B /yr2
• Energy Consumption / Carbon Footprint
• Global data center energy costs >$10B /yr3
• Global data center emissions ~100M tonnes CO2 /yr4
1. https://www.marketsandmarkets.com/Market-Reports/data-center-rack-server-market-53332315.html
2. https://www.akamai.com/uk/en/about/news/press/2017-press/akamai-releases-spring-2017-state-of-online-retail-performance-report.jsp
3. https://www.sciencedaily.com/releases/2020/02/200227144313.htm
4. https://www.comsoc.org/publications/tcn/2019-nov/energy-efficiency-data-centers
Design Challenges
• A typical Recommendation Model:
Input → Dense Features (Compute-Bound) → Sparse Features (Memory-Bound) → Dense Features (Compute-Bound) → Output
• Traditional approach:
• Put the whole model on one chip
• Myrtle.ai approach:
• Offload different features of the model to different hardware accelerators
• Make it equally practical to adopt
In the Sparse Features (Memory-Bound) stage:
• Up to 80% of time can be spent here
• Memory architecture in typical data center infrastructure is inefficient here
• Existing accelerators give a poor return here
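Why is the sparse stage memory-bound? A minimal, pure-Python sketch of a sum-pooled embedding lookup makes it concrete: each multi-hot feature gathers a handful of rows from a huge table at effectively random addresses, then does almost no arithmetic on them. Table size, dimension and IDs below are illustrative only, far smaller than production tables.

```python
import random

NUM_ROWS, DIM = 10_000, 8  # illustrative; real tables hold millions of rows
random.seed(0)
table = [[random.random() for _ in range(DIM)] for _ in range(NUM_ROWS)]

def embed_multi_hot(ids):
    """Gather rows at random addresses and sum-pool them into one dense
    vector. Memory-bound: one add per element fetched, no cache locality."""
    pooled = [0.0] * DIM
    for i in ids:
        row = table[i]              # random access into the big table
        for d in range(DIM):
            pooled[d] += row[d]     # trivial arithmetic per byte moved
    return pooled

dense = embed_multi_hot([12, 9_876, 3_141])
print(len(dense))  # one dense vector, regardless of how many IDs were pooled
```

The ratio of bytes fetched to arithmetic performed is what makes this stage a poor fit for compute-oriented accelerators: throughput is set by random-access memory bandwidth, not FLOPs.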
SEAL: An Accelerator for Recommendation Systems
• Accelerates the memory-bound sparse operations in all recommendation models
• Delivers large gains in latency-bounded throughput
• Fully preserves existing model accuracy
• Is complementary to existing compute accelerators
• Is integrated into the PyTorch Deep Learning Framework
The “Virtuous Circle”
Add SEAL modules → Offload sparse operations to SEAL → CPU freed up; latency reduced → Increase CPU batch size → Throughput increased → (repeat)
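The circle's payoff can be sketched with back-of-envelope arithmetic. All timings below are made-up illustrations, not measured SEAL figures: offloading the sparse stage cuts CPU time per item, so the batch size can grow within the same latency budget, multiplying throughput.

```python
LATENCY_BUDGET_MS = 5.0  # per-request latency bound from the deck

def throughput(batch, per_item_ms, fixed_ms):
    """Items/second for a batch served within the latency budget."""
    latency_ms = fixed_ms + batch * per_item_ms
    assert latency_ms <= LATENCY_BUDGET_MS, "batch too large for budget"
    return batch / (latency_ms / 1000.0)

# Before: CPU does dense + sparse work per item (illustrative timings).
base = throughput(batch=8, per_item_ms=0.5, fixed_ms=1.0)
# After: sparse work offloaded; per-item CPU cost drops, batch grows.
offloaded = throughput(batch=64, per_item_ms=0.06, fixed_ms=1.0)
print(f"{offloaded / base:.1f}x")  # -> 8.3x
```

With these assumed numbers the gain lands near the 8x claimed later in the deck, but the mechanism, not the constants, is the point: lower per-item CPU cost buys batch-size headroom under a fixed latency bound.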
Performance
• Vector Processing Bandwidth is the bandwidth achievable when transforming random multi-hot vectors into real-valued dense vectors
• Carrier is Glacier Point v2
Vector Processing Bandwidth:
• 16 GB version: 18 GB/s (219 GB/s per carrier)
• 32 GB version: 16 GB/s (195 GB/s per carrier)
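A bandwidth figure translates into lookups per second once you fix the bytes moved per multi-hot lookup. SEAL's internal element width and pooling factor are not stated here, so the parameters below are assumptions chosen purely to show the arithmetic.

```python
EMB_DIM = 64          # assumed embedding dimension
BYTES_PER_ELEM = 4    # assumed fp32 table entries
IDS_PER_LOOKUP = 80   # assumed pooling factor (IDs per multi-hot vector)

# Bytes of table data fetched per pooled lookup.
bytes_per_lookup = IDS_PER_LOOKUP * EMB_DIM * BYTES_PER_ELEM

bandwidth_gb_s = 18   # 16 GB version figure from the table above
lookups_per_s = bandwidth_gb_s * 1e9 / bytes_per_lookup
print(f"{lookups_per_s:,.0f} lookups/s")  # -> 878,906 lookups/s
```

Under these assumptions an 18 GB/s module sustains roughly 880k pooled lookups per second; halve the pooling factor or dimension and the rate doubles, which is why bandwidth, not compute, is the headline metric here.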
Key Benefits
Based on benchmarking using a weighted average of the mlperf.org benchmark recommendation models (Dec. 2019):
• Rapid 8x increase in latency-bounded throughput using existing infrastructure1
• Enables more recommendations to be made
• Enables better quality recommendations to be made
• Higher CTRs
• Increased revenue
• Greater consumer satisfaction
• Up to 50% CapEx savings on further capacity expansion1,2
• Up to 80% reduction in energy consumption1,2
• OpEx savings
• Smaller carbon footprint
1 Comparisons are between a Xeon D-2100 performing inference on its own and the same CPU leveraging SEAL acceleration. Performance and benefits will vary, depending on individual system configuration and model usage.
2 Based on servers + SEAL only. Excludes buildings, HVAC etc.
Highly Complementary to Existing Infrastructure
• Accelerates existing servers; easy to install
• Complementary to other accelerators
• Scalable
• Does not require any change to the recommendation model. No model retraining. No degradation in accuracy
• Supports co-location of models with no performance penalty
• Supports concurrent deployment of different versions of a model, and loading/unloading models on the fly to facilitate A/B testing
SEAL is the
• lowest power
• smallest form factor
• easiest-to-deploy
method of optimizing memory-bound recommendation models in existing infrastructure.
Contact seal@myrtle.ai to evaluate what SEAL can do for your business
For more information visit myrtle.ai/seal
Thank You
www.myrtle.ai
Giles Peckham
07785 278478
giles@myrtle.ai

Implementing AI: High Performance Architectures: Solving Core Recommendation Model Challenges in Data Centers
