CONTENT-BASED PERSONALIZATION
From Spark to Elasticsearch and Back
Learning Large Scale Models for Content Recommendation
Sonya Liberman
Personalization Team Lead
Outbrain Recommendations Group
?
What Is
3
The Lighthouse
Help people discover content they can trust to
be interesting, relevant and timely for them
275 BILLION RECOMMENDATIONS SERVED PER MONTH
554 MILLION MONTHLY UNIQUE USERS GLOBALLY
10
Know Your Reader
11
Outbrain’s NLP Engine
Crawling articles where
widget is displayed
Crawling articles
recommended in
widget
Over 3 million new
articles a week
12
What is a Document About?
13
User Semantic Profile
14
User Semantic Profile
15
What we Read vs. what we Share
Do our social shares reflect our reading patterns?
200 publishers
> 1 billion user interactions
47 million Facebook shares
For Your Eyes Only: Consuming vs. Sharing Content |
Roy Sasson, Ram Meshulam
3rd SNOW Workshop on Social News on the Web, WWW’2016, Montreal,
Canada
16
What People Read vs. What They Share
17
Know Your Reader
Know our Users better than their Facebook Friends?
18
Predictive Models
1. Content Based Models
Recommends content based on semantic similarity with
user interests
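A minimal sketch of the content-based idea, assuming the user profile and each document are represented as sparse topic-weight vectors and that "semantic similarity" is measured with cosine similarity. The slides do not state which similarity measure is used, and the topic names, weights, and document ids below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(profile, docs):
    """Order candidate documents by similarity to the user profile."""
    scored = [(doc_id, cosine_similarity(profile, features))
              for doc_id, features in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical user interest profile and candidate documents.
user_profile = {"music": 0.6, "tech": 0.3, "travel": 0.1}
documents = {
    "doc_tech": {"tech": 0.9, "music": 0.1},
    "doc_sports": {"sports": 1.0},
    "doc_music": {"music": 0.8, "travel": 0.2},
}

ranking = rank_documents(user_profile, documents)
```

The music document ranks first because it overlaps most with the profile's heaviest topic; the sports document shares no topic with the profile and scores zero.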
19
Predictive Models
Music
Tech
Travel
1. Content Based Models
2. Behavioural Models
Finding behavioural patterns beyond a semantic connection
20
Predictive Models
Retirement
Investing
Heart Disease
3. Collaborative Models
Use wisdom of the crowd
Predict new content using users that have similar
reading patterns
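One simple way to illustrate "users that have similar reading patterns" is user-based collaborative filtering. The sketch below assumes reading histories are sets of article ids and uses Jaccard overlap as the similarity; both are assumptions for illustration, not the approach described in the talk, and the users and articles are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of read articles, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, histories, top_n=3):
    """Score unseen articles by the similarity of the users who read them."""
    seen = histories[user]
    scores = {}
    for other, items in histories.items():
        if other == user:
            continue
        similarity = jaccard(seen, items)
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Hypothetical reading histories.
reads = {
    "alice": {"retirement", "investing", "heart_disease"},
    "bob": {"retirement", "investing"},
    "carol": {"music", "tech"},
}

recommendations = recommend("bob", reads)
```

Here bob is offered the one article alice read that he has not, because alice is the only user whose history overlaps with his.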
21
Predictive Models
22
Collaborative Models
* Taken from Wikipedia
23
Collaborative Models
24
Collaborative Models
25
Collaborative Models
26
Collaborative Models
27
Collaborative Models
28
Collaborative Models
29
Collaborative Models
3. Collaborative Models
Matrix Factorization
Factorization Machines
Feature Embedding with Deep Learning
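As a rough illustration of the first technique in this list, the sketch below factorizes a tiny user-by-item interaction matrix with plain stochastic gradient descent. It is a toy under stated assumptions: the interaction triples, rank, learning rate, and regularization are made up, and a production system at the scale described in the talk would train on a distributed framework rather than in a Python loop.

```python
import random


def predict(users, items, u, i):
    """Predicted affinity is the dot product of user and item factors."""
    return sum(p * q for p, q in zip(users[u], items[i]))


def factorize(interactions, n_users, n_items, rank=2,
              epochs=300, lr=0.05, reg=0.01, seed=0):
    """Learn user and item factors with SGD on regularized squared error."""
    rng = random.Random(seed)
    users = [[rng.random() * 0.5 for _ in range(rank)] for _ in range(n_users)]
    items = [[rng.random() * 0.5 for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, value in interactions:
            error = value - predict(users, items, u, i)
            for k in range(rank):
                pu, qi = users[u][k], items[i][k]
                users[u][k] += lr * (error * qi - reg * pu)
                items[i][k] += lr * (error * pu - reg * qi)
    return users, items


def mean_squared_error(users, items, interactions):
    total = sum((v - predict(users, items, u, i)) ** 2 for u, i, v in interactions)
    return total / len(interactions)


# Hypothetical (user, item, value) triples: 1.0 = clicked, 0.0 = shown but ignored.
interactions = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0),
    (1, 0, 1.0), (1, 1, 1.0), (1, 3, 0.0),
    (2, 0, 0.0), (2, 2, 1.0), (2, 3, 1.0),
]

untrained = factorize(interactions, n_users=3, n_items=4, epochs=0)
trained = factorize(interactions, n_users=3, n_items=4)
error_before = mean_squared_error(*untrained, interactions)
error_after = mean_squared_error(*trained, interactions)
```

Once trained, `predict(users, items, u, i)` can be evaluated for user-item pairs that were never observed, which is how the factors are used to score new content.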
30
Predictive Models
31
Data Processing and Distributed Machine Learning Framework
32
Data Processing
3 Data Centers
300 Machines in each cluster
7 petabytes of data
5 terabytes of compressed
new data daily
Distributed Machine
Learning Framework
33
Distributed Machine Learning Framework
1. Data Collection
2. Feature Engineering
3. Model Training
4. Offline Evaluation & Simulation
5. Model Deployment
34
Distributed Machine Learning Framework
Used for Production
Daily production flow
Automatic model evaluation and decision making
35
Distributed Machine Learning Framework
2. Feature Engineering
3. Model Training
Used for Research
Agile Development of New Models
36
The Serving Layer
37
The Serving Layer
Relevance
Content
Inventory
Machine Learned
Model
Request for Content
Recommendations
38
Challenges of The Serving Layer
35K req/sec, 50ms latency
Millions of potential content recommendations
39
Using Search Technology for
Recommender System Serving Layer
40
Why Search?
1. Score documents by relevance to query
2. Scalable in terms of inventory size
3. Scalable in terms of number of requests
41
Why a Search Engine?
The Inverted Index
42
Why a Search Engine?
what the
day brings
The Inverted Index
43
Why a Search Engine?
what the
day brings
The Inverted Index
44
Why a Search Engine?
what the
day brings
The Inverted Index
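The inverted index on these slides can be sketched in a few lines: each term maps to the set of documents that contain it, so a query only touches the postings of its own terms instead of scanning the whole inventory. The example reuses the phrase shown on the slide; the other documents and the match-count scoring are assumptions for illustration (a real search engine uses a relevance formula such as TF-IDF or BM25).

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Score documents by how many distinct query terms they match."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical documents; the first is the phrase from the slide.
docs = {
    1: "what the day brings",
    2: "the day after",
    3: "what brings us here",
}

index = build_inverted_index(docs)
results = search(index, "what the day brings")
```

Document 1 matches all four query terms and comes first; documents 2 and 3 each match two.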
45
Why a Search Engine?
Scalable
Distributed
Open Source
Real-time Search
RESTful
Content Based Recommendations to
Search Reduction
Replace Bag of Words with Semantic Features
Index the semantic features of
each document
(potential recommendation)
47
Content Based Recommendations to
Search Reduction
Tech
Music
Sports
Celebrities
Generate a query from User Interests
48
Content Based Recommendations to
Search Reduction
Music
Tech
Travel
Tech
Music
Sports
Celebrities
Get me relevant
recommendations
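The reduction to search can be sketched as building a query body from the user's interest weights: each interest becomes a weighted clause, and documents whose indexed semantic features match are scored higher. The sketch below only constructs an Elasticsearch-style `bool` query as a Python dict; the field name `semantic_features`, the interest weights, and the choice of `term` clauses with `boost` are assumptions, not the query used in the talk.

```python
import json


def build_interest_query(user_interests, field="semantic_features", size=10):
    """Turn {topic: weight} into a bool/should query with boosted term clauses."""
    ordered = sorted(user_interests.items(), key=lambda kv: -kv[1])
    should = [
        {"term": {field: {"value": topic, "boost": weight}}}
        for topic, weight in ordered
        if weight > 0
    ]
    return {
        "size": size,
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
    }


# Hypothetical user interest profile.
user_interests = {"music": 0.6, "tech": 0.3, "travel": 0.1}
query = build_interest_query(user_interests)
print(json.dumps(query, indent=2))
```

The resulting body would be sent to the index that stores each potential recommendation's semantic features; the engine then returns the top `size` documents by score.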
49
Custom Models - Elasticsearch Plugins
Writing custom scoring functions with native Java
Passing model parameters via the query
Implementing efficient data storage for feature vectors
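The three points above can be illustrated with a Python stand-in: it is not the Java plugin code, only a sketch of the same idea. Feature vectors are packed into compact float arrays, model parameters arrive with each request, and a custom function scores every candidate. The linear model, parameter names, and document vectors are hypothetical.

```python
from array import array

# Hypothetical per-document feature vectors, packed as 32-bit floats
# to keep the in-memory footprint small.
doc_vectors = {
    "doc_a": array("f", [1.0, 0.0, 0.5]),
    "doc_b": array("f", [0.0, 1.0, 0.5]),
}


def score(doc_vector, model_params):
    """Linear model: bias plus dot product of weights and document features."""
    weights = model_params["weights"]
    return model_params["bias"] + sum(w * x for w, x in zip(weights, doc_vector))


def top_k(vectors, model_params, k=1):
    """Score every candidate with the request's parameters and keep the best k."""
    scored = [(doc_id, score(vec, model_params)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Model parameters passed along with the request (hypothetical values).
request_params = {"weights": [2.0, 1.0, 0.0], "bias": 0.5}
best = top_k(doc_vectors, request_params)
```

Passing the parameters with the request means a new model can be served without reindexing the documents, since only the stored feature vectors live in the index.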
50
The Power of Data
Getting to Know Our Readers
Personalization Models
Distributed Machine Learning framework
Search based Serving Layer
Thank You

From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation