Build Your Own Recommendation Engine
(During a Weekend)
Michal Malohlava
@mmalohlava && @h2oai
presents
MusicService
Activities: clicks/swipes/likes
Clients: iOS/Android/…
Next N recommendations
? REBB*?
*REBB = Recommendation Engine Black Box
Requirements
• Activities can be >100/s
• REBB should be accessible via REST API
• Recommendations need to be served in <500ms and should keep users exploring
• AWS infrastructure
Need to be ready in 2 days!
Requirements
• Recommendations should be served in <500ms
• ML part should allow quick prototyping & experimentation
• Storage (online/offline): user stats, histories, recommendations
• Scalable
  • frontend receiving requests
  • backend solving ML
  • storage
Need to be ready in 2 days!
Engine Architecture
Variation of λ-architecture…
… with pluggable ML backend
Engine Architecture
Regular EC2 nodes
API Router
REST API via Spray

Akka Actor accepting and filtering:
• user activities
• recommendation requests
Scalable via HAProxy
API Router
Akka Actor handles
• POST of user activity
  • publish activity to Redis
  • update stats in Redis (quick updates)
  • trigger recommendation computation
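
The POST path above can be sketched in a few lines. This is only an illustration in Python, not the Akka/Spray code from the talk: `InMemoryStore` is a hypothetical stand-in for Redis, and the channel names `activities` and `recommendation-requests` are assumptions.

```python
from collections import Counter, defaultdict


class InMemoryStore:
    """Hypothetical stand-in for Redis: per-user counters plus a message log."""

    def __init__(self):
        self.stats = defaultdict(Counter)  # user_id -> activity counts
        self.published = []                # (channel, message) pairs, in order

    def publish(self, channel, message):
        self.published.append((channel, message))

    def incr(self, user_id, field):
        self.stats[user_id][field] += 1


def handle_activity(store, user_id, kind, song_id):
    """Handle one POSTed activity: publish it, update stats, trigger recompute."""
    event = {"user": user_id, "kind": kind, "song": song_id}
    store.publish("activities", event)                            # inform subscribers
    store.incr(user_id, kind)                                     # quick stats update
    store.publish("recommendation-requests", {"user": user_id})   # ask for new recs
    return event


store = InMemoryStore()
handle_activity(store, "u1", "like", "s42")
```

All three steps are cheap writes, which fits the requirement that the router only accepts and forwards work while the ML backend does the heavy computation elsewhere.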
API Router
Akka Actor handles
• GET recommendation request
  • fetch pre-computed recommendation from Redis if it exists
  • OR make a best-effort attempt to provide a “cold-start” recommendation based on the history of user activities
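
The GET path is a lookup with a fallback. The sketch below assumes plain dictionaries in place of Redis, and the cold-start rule (return the songs that appear most often in the user's own history) is an illustrative assumption; the slides do not say which heuristic was used.

```python
from collections import Counter


def get_recommendations(precomputed, history, user_id, n=3):
    """Return pre-computed recommendations, or a best-effort cold-start list."""
    ready = precomputed.get(user_id)
    if ready:
        return ready[:n]
    # Cold start (assumed heuristic): most frequent songs in the user's history.
    counts = Counter(history.get(user_id, []))
    return [song for song, _ in counts.most_common(n)]


precomputed = {"u1": ["s7", "s9", "s3", "s1"]}
history = {"u2": ["a", "b", "a"]}

known_user = get_recommendations(precomputed, history, "u1")
new_user = get_recommendations(precomputed, history, "u2", n=2)
```

Because neither branch runs a model, the request can be answered from stored data alone, which is what makes the <500ms target realistic.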

Redis Store
Redis is used as
• events bus
  • inform subscribers about user activities
  • requests to provide a new recommendation for a user
• data storage
  • old/new recommendations
  • statistics (likes/swipes per user)
  • simple persistence model
• computation engine
  • keep top-N artists, top-N songs per user
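
A minimal sketch of the per-user top-N bookkeeping, kept in memory so that it runs without a server. In Redis, one plausible way to do this is a sorted set per user (increment a member's score, then read the highest-scored members), but the slides do not say which data structure was used. The `TopN` class and its method names are hypothetical.

```python
import heapq
from collections import defaultdict


class TopN:
    """Per-user score table; stands in for one ranked set per user."""

    def __init__(self):
        self.scores = defaultdict(lambda: defaultdict(int))

    def record(self, user_id, item, weight=1):
        """Add `weight` to the item's score for this user."""
        self.scores[user_id][item] += weight

    def top(self, user_id, n):
        """Return the n highest-scored (item, score) pairs, ties broken by name."""
        items = self.scores[user_id].items()
        return heapq.nlargest(n, items, key=lambda kv: (kv[1], kv[0]))


artists = TopN()
for artist in ["Muse", "Queen", "Muse", "Adele", "Queen", "Muse"]:
    artists.record("u1", artist)

favourites = artists.top("u1", 2)
```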
ML Backend
Language/technology agnostic
• Needs to be flexible enough to prototype different strategies
“Runners” for
• generating recommendations with H2O and Python
• collecting/generating statistics
• clustering users with H2O JVM
“Runners” are subscribed to Redis and process Redis data
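
The runner model can be illustrated with a tiny publish/subscribe dispatcher. `EventBus` is a hypothetical in-process stand-in for Redis pub/sub, and `make_stats_runner` is an invented example of a statistics runner. In the real system each runner would be a separate process, possibly in a different language.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for a pub/sub bus: channel -> subscribed runners."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, runner):
        self.subscribers[channel].append(runner)

    def publish(self, channel, message):
        """Deliver the message to every runner; return how many received it."""
        runners = self.subscribers.get(channel, [])
        for runner in runners:
            runner(message)
        return len(runners)


def make_stats_runner(stats):
    """Build a runner that counts activities per user in the given dict."""
    def runner(event):
        stats[event["user"]] = stats.get(event["user"], 0) + 1
    return runner


bus = EventBus()
stats = {}
bus.subscribe("activities", make_stats_runner(stats))
bus.publish("activities", {"user": "u1", "kind": "like"})
```

Since runners only share the bus and the stored data, a new strategy can be added or swapped by subscribing another runner, without touching the API Router.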
ML Backend
Final strategy
• identify user cluster based on user activities (aka music styles)
• apply different recommendation strategies inside each cluster
• identify “weird” users (~outliers)
  • adapt recommendations for them
  • needs manual intervention/algorithm tuning
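
The strategy above amounts to "assign a cluster, flag outliers, dispatch". The sketch below assumes that a user is a small numeric profile, that cluster centroids are already known (in the talk, clustering was done with H2O), and that an outlier is any user farther than a hand-tuned threshold from every centroid. The centroid values, the threshold, and the strategy names are all invented for illustration.

```python
import math


def assign_cluster(profile, centroids, outlier_distance):
    """Return (cluster name or 'outlier', distance to the nearest centroid)."""
    nearest = min(centroids, key=lambda name: math.dist(profile, centroids[name]))
    distance = math.dist(profile, centroids[nearest])
    label = "outlier" if distance > outlier_distance else nearest
    return label, distance


# Hypothetical per-cluster strategies; outliers get their own fallback.
STRATEGIES = {
    "rock": lambda: "recommend from rock top list",
    "pop": lambda: "recommend from pop top list",
    "outlier": lambda: "recommend from hand-tuned fallback",
}

centroids = {"rock": (1.0, 0.0), "pop": (0.0, 1.0)}

label, _ = assign_cluster((0.9, 0.1), centroids, outlier_distance=1.0)
recommendation = STRATEGIES[label]()
```

The threshold is exactly the kind of knob that the "manual intervention/algorithm tuning" bullet refers to: too low and ordinary users are treated as outliers, too high and unusual users receive a cluster strategy that does not fit them.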
Results
• Single machine for API Router and Redis
  • peaks of 50 activities/sec, average of 10 activities/sec
  • small memory footprint
• ML Runners spread over EC2 machines
• even simple but different strategies for each user segment and for selected individual users provide surprisingly good results
Learn more at h2o.ai
Follow us at @h2oai
Thank you!
