Sonya Liberman leads the Personalization team in Outbrain's Recommendations group, developing large-scale machine learning algorithms for Outbrain's content recommendations platform, which serves tens of billions of real-time recommendations a day. She specializes in machine learning, information retrieval and computational linguistics. Before joining Outbrain, she led the Algorithms team at ConvertMedia (acquired by Taboola). She holds an MSc in Computer Science and a BSc in Computer Science and Computational Biology.
This invited talk was given at the Inspiring Big Data Science meetup, January 2018.
Abstract: Sonya will share how Outbrain, a world-leading content recommendations service, uses machine learning to deliver 200 billion personalized content recommendations a month to hundreds of millions of unique monthly users. She will cover the layers of its algorithmic architecture, including its Spark-based offline layer and its Elasticsearch-based serving layer, which makes it possible to run complex models under difficult scale constraints and shortens the cycle between research and production.
From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation
1. Content-Based Personalization
From Spark to Elasticsearch and Back: Learning Large-Scale Models for Content Recommendation
Sonya Liberman
Personalization Team Lead
Outbrain Recommendations Group
15. What We Read vs. What We Share
Do our social shares reflect our reading patterns?
200 publishers
More than 1 billion user interactions
47 million Facebook shares
For Your Eyes Only: Consuming vs. Sharing Content. Roy Sasson, Ram Meshulam. 3rd SNOW Workshop on Social News on the Web, WWW 2016, Montreal, Canada.
17. Know Your Reader
Can we know our users better than their Facebook friends do?
32. Data Processing
3 data centers
300 machines in each cluster
7 petabytes of data
5 terabytes of compressed new data daily
Distributed Machine Learning Framework
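For a rough sense of scale, the daily data volume above (and the 200 billion monthly recommendations from the abstract) can be turned into average per-second rates. This back-of-envelope Python sketch assumes decimal units (1 TB = 10**12 bytes) and a 30-day month; it is illustrative arithmetic, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
DAYS_PER_MONTH = 30               # assumption for this estimate


def average_rate(total: float, seconds: float) -> float:
    """Mean rate per second if `total` units arrive evenly over `seconds`."""
    return total / seconds


# 5 TB of compressed new data per day, with 1 TB = 10**12 bytes (assumption).
ingest_bytes_per_sec = average_rate(5 * 10**12, SECONDS_PER_DAY)

# 200 billion recommendations per month, from the abstract.
recs_per_sec = average_rate(200 * 10**9, DAYS_PER_MONTH * SECONDS_PER_DAY)

print(f"ingest: about {ingest_bytes_per_sec / 10**6:.1f} MB/s")
print(f"recommendations: about {recs_per_sec:,.0f} per second")
```

That works out to roughly 58 MB/s of compressed ingest and about 77,000 recommendations per second on average; these are means only, and peak load is higher.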
33. Distributed Machine Learning Framework
(1) Data Collection
(2) Feature Engineering
(3) Model Training
(4) Offline Evaluation & Simulation
(5) Model Deployment
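The five stages can be illustrated with a toy, single-process Python sketch. Everything in it is hypothetical: the hard-coded click log, the function names and the per-topic click-through-rate "model" stand in for the distributed Spark jobs and real models the talk describes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str, int]  # (user_id, topic, clicked: 0 or 1)


def collect_data() -> List[Event]:
    """(1) Data collection: hard-coded stand-in for raw interaction logs."""
    return [("u1", "tech", 1), ("u1", "music", 0), ("u2", "tech", 1),
            ("u2", "sports", 0), ("u3", "tech", 0), ("u3", "music", 1)]


def engineer_features(events: List[Event]) -> List[Tuple[str, int]]:
    """(2) Feature engineering: reduce each event to (feature, label)."""
    return [(topic, clicked) for _user, topic, clicked in events]


def train(examples: List[Tuple[str, int]]) -> Dict[str, float]:
    """(3) Model training: add-one smoothed click-through rate per topic."""
    clicks: Dict[str, int] = defaultdict(int)
    views: Dict[str, int] = defaultdict(int)
    for feature, label in examples:
        clicks[feature] += label
        views[feature] += 1
    return {f: (clicks[f] + 1) / (views[f] + 2) for f in views}


def evaluate(model: Dict[str, float], examples: List[Tuple[str, int]]) -> float:
    """(4) Offline evaluation: mean squared error of the predicted CTR.
    Scored in-sample for brevity; a real flow would use held-out data."""
    return sum((model[f] - y) ** 2 for f, y in examples) / len(examples)


def deploy(model: Dict[str, float], store: dict) -> None:
    """(5) Model deployment: publish the model where serving can read it."""
    store["model"] = model


serving_store: dict = {}
examples = engineer_features(collect_data())
model = train(examples)
print("offline MSE:", round(evaluate(model, examples), 3))
deploy(model, serving_store)
```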
34. Distributed Machine Learning Framework
Used for Production
Daily production flow
Automatic model evaluation and decision-making
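One way to picture automatic model evaluation and decision-making is a gate that promotes the day's candidate model only if it beats the current production model offline. This Python sketch rests on assumptions not taken from the talk: a single higher-is-better offline metric (AUC, for instance), a fixed minimum absolute gain, and hypothetical model IDs and scores.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalResult:
    model_id: str
    metric: float  # offline metric, higher is better (assumed, e.g. AUC)


def should_deploy(candidate: EvalResult, production: EvalResult,
                  min_gain: float = 0.002) -> bool:
    """Promote only if the candidate beats production by at least min_gain."""
    if math.isnan(candidate.metric):
        return False  # a broken evaluation must never trigger a deployment
    return candidate.metric >= production.metric + min_gain


def daily_decision(candidate: EvalResult, production: EvalResult) -> EvalResult:
    """Return the model that should serve traffic after today's run."""
    return candidate if should_deploy(candidate, production) else production


production = EvalResult("prod-v1", 0.741)  # hypothetical IDs and scores
candidate = EvalResult("cand-v2", 0.746)
print(daily_decision(candidate, production).model_id)
```

Requiring a margin rather than any improvement keeps evaluation noise from flipping the production model back and forth between runs.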
35. Distributed Machine Learning Framework
Used for Research
(2) Feature Engineering
(3) Model Training
Agile development of new models
42. Why a Search Engine?
what the day brings
The Inverted Index
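The inverted index is what makes a search engine fast: instead of scanning every document, it maps each term to the documents that contain it, so a query only touches the postings of its own terms. A minimal Python sketch with naive lowercase whitespace tokenization (real engines such as Elasticsearch add analyzers, relevance scoring and compressed postings lists):

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased whitespace token to the IDs of docs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index: Dict[str, Set[str]], terms: Iterable[str]) -> Set[str]:
    """Boolean AND: documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {
    "d1": "new phone review tech",
    "d2": "concert tour music",
    "d3": "music streaming tech startup",
}
index = build_index(docs)
print(sorted(search_all(index, ["music", "tech"])))  # ['d3']
```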
45. Why a Search Engine?
Scalable
Distributed
Open source
Real-time search
RESTful
46. Replace Bag of Words with Semantic Features
Reducing Content-Based Recommendations to Search
47. Replace Bag of Words with Semantic Features
Index the semantic features of each document (potential recommendation)
Reducing Content-Based Recommendations to Search
[Figure: a document represented by semantic features such as Tech, Music, Sports and Celebrities]
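Replacing words with semantic features keeps the same machinery: the index maps each feature to the documents that carry it, now with a weight. The Python sketch below assumes each document already has a sparse {feature: weight} vector (the feature names and weights are made up) and ranks documents by their dot product with a weighted query, visiting only the postings of features that appear in the query.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Postings = Dict[str, List[Tuple[str, float]]]


def index_semantic(docs: Dict[str, Dict[str, float]]) -> Postings:
    """Invert {doc: {feature: weight}} into {feature: [(doc, weight), ...]}."""
    postings: Postings = defaultdict(list)
    for doc_id, features in docs.items():
        for feature, weight in features.items():
            postings[feature].append((doc_id, weight))
    return postings


def score(postings: Postings, query: Dict[str, float],
          k: int = 2) -> List[Tuple[str, float]]:
    """Top-k docs by dot product of query and document feature weights."""
    scores: Dict[str, float] = defaultdict(float)
    for feature, q_weight in query.items():
        for doc_id, d_weight in postings.get(feature, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


docs = {
    "a1": {"tech": 0.9, "music": 0.1},
    "a2": {"sports": 0.8, "celebrities": 0.2},
    "a3": {"music": 0.7, "celebrities": 0.3},
}
postings = index_semantic(docs)
top = score(postings, {"music": 0.7, "tech": 0.3})
print(top)
```

Document a2 is never scored because it shares no feature with the query, which is exactly the work the inverted index saves.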
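48. Generate a Query from User Interests
Reducing Content-Based Recommendations to Search
[Figure: a user's interests (Music, Tech, Travel) are turned into a query against documents indexed by semantic features (Tech, Music, Sports, Celebrities): "Get me relevant recommendations"]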
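Generating a query from user interests could look like the following Python sketch, which builds an Elasticsearch search body. Assumptions: documents have a keyword field named topics (a hypothetical name), and each positive interest weight becomes the boost of a constant_score clause inside bool.should, so a document's score is the sum of the weights of the interests it matches. This is one possible encoding, not necessarily the one Outbrain uses.

```python
import json
from typing import Dict


def interests_to_query(interests: Dict[str, float], size: int = 10,
                       field: str = "topics") -> dict:
    """Build an Elasticsearch search body from {topic: weight} interests.

    `field` is a hypothetical keyword field holding each document's topics.
    Each positive interest becomes a constant_score clause whose boost is the
    interest weight; bool.should sums the boosts of the clauses a doc matches.
    """
    ranked = sorted(interests.items(), key=lambda kv: kv[1], reverse=True)
    should = [
        {"constant_score": {"filter": {"term": {field: topic}}, "boost": weight}}
        for topic, weight in ranked
        if weight > 0
    ]
    return {
        "size": size,
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
    }


body = interests_to_query({"music": 0.5, "tech": 0.3, "travel": 0.2})
print(json.dumps(body, indent=2))
```

The resulting body would be sent to an index's _search endpoint. Because constant_score ignores term statistics such as IDF, the ranking follows the user's interest weights directly rather than Elasticsearch's default text relevance.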
49. Custom Models: Elasticsearch Plugins
Writing custom scoring functions in native Java
Passing model parameters via the query
Implementing efficient data storage for feature vectors
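The plugins themselves are native Java; the Python sketch below only illustrates the logic such a custom scoring function might implement. Assumptions: each document stores a dense feature vector packed as little-endian float32 bytes (4 bytes per value), and the model is a logistic regression whose weights and bias arrive as query parameters. All names and formats are hypothetical.

```python
import math
import struct
from typing import Dict, List, Sequence


def pack_vector(values: Sequence[float]) -> bytes:
    """Store a feature vector compactly: 4 bytes per little-endian float32."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(blob: bytes) -> List[float]:
    """Decode a packed float32 vector back into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def custom_score(doc_blob: bytes, params: Dict) -> float:
    """Logistic-regression score; the model arrives with the query in params."""
    features = unpack_vector(doc_blob)
    weights = params["weights"]
    if len(weights) != len(features):
        raise ValueError("weight and feature dimensions differ")
    z = params["bias"] + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical stored documents and query-time model parameters.
stored = {"d1": pack_vector([1.0, 0.0, 0.5]),
          "d2": pack_vector([0.0, 1.0, 0.25])}
query_params = {"weights": [2.0, -1.0, 0.5], "bias": -1.0}
ranked = sorted(stored, key=lambda d: custom_score(stored[d], query_params),
                reverse=True)
print(ranked)
```

One appeal of passing parameters with the query is that updated model weights can be used immediately, without reindexing the stored vectors.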
50. The Power of Data
Getting to Know Our Readers
Personalization Models
Distributed Machine Learning Framework
Search-Based Serving Layer