From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation

Sonya Liberman leads the Personalization team @ Outbrain's Recommendations group, developing large-scale machine learning algorithms for Outbrain's content recommendations platform, which serves tens of billions of real-time recommendations a day. She specializes in Machine Learning, Information Retrieval and Computational Linguistics. Before joining Outbrain, she led the Algorithms team @ ConvertMedia (acquired by Taboola). She holds an MSc in Computer Science and a BSc in Computer Science and Computational Biology. This invited talk was given at the Inspiring Big Data Science meetup, January 2018. Abstract: Sonya will share how Outbrain, a world-leading content recommendations service, uses machine learning to deliver 200 billion personalized content recommendations each month to hundreds of millions of unique monthly users. She will cover the layers of their algorithmic architecture, including its Spark-based offline layer and its Elasticsearch-based serving layer, which enables running complex models under difficult scale constraints and shortens the cycle between research and production.

CONTENT-BASED PERSONALIZATION
From Spark to Elasticsearch and Back
Learning Large Scale Models for Content Recommendation
Sonya Liberman
Personalization Team Lead
Outbrain Recommendations Group

3
The Lighthouse
Help people discover content they can trust to
be interesting, relevant and timely for them

275
BILLION
RECOMMENDATIONS
SERVED PER MONTH

554
MILLION
MONTHLY
UNIQUE USERS
GLOBALLY

11
Outbrain’s NLP Engine
Crawling articles where
widget is displayed
Crawling articles
recommended in
widget
Over 3 million new
articles a week

15
What we Read vs. what we Share
Do our social shares reflect our reading patterns?
200 publishers
> 1 billion user interactions
47 million Facebook shares
For Your Eyes Only: Consuming vs. Sharing Content |
Roy Sasson, Ram Meshulam
3rd SNOW Workshop on Social News on the Web, WWW’2016, Montreal,
Canada

17
Know Your Reader
Know our Users better than their Facebook Friends?
For Your Eyes Only: Consuming vs. Sharing Content |
Roy Sasson, Ram Meshulam
3rd SNOW Workshop on Social News on the Web, WWW’2016, Montreal,
Canada

1. Content Based Models
Recommends content based on semantic similarity with
user interests
19
Predictive Models
Music
Tech
Travel
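
One common way to measure semantic similarity between a user's interests and a document is cosine similarity over weighted category vectors. The sketch below is only an illustration of that idea, not Outbrain's model: the weights, document IDs, and the "Politics" category are hypothetical, and only the Music / Tech / Travel interests come from the slide.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {feature: weight} dicts."""
    shared = set(a) & set(b)
    dot = sum(a[f] * b[f] for f in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(user, docs):
    """Return document IDs ordered from most to least similar to the user profile."""
    scored = [(cosine_similarity(user, feats), doc_id) for doc_id, feats in docs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

# Hypothetical user profile (interest names from the slide, weights invented).
user_interests = {"Music": 0.6, "Tech": 0.3, "Travel": 0.1}

# Hypothetical documents with invented semantic features.
documents = {
    "concert-review": {"Music": 1.0},
    "phone-launch": {"Tech": 0.8, "Music": 0.2},
    "election-recap": {"Politics": 1.0},
}

ranking = rank(user_interests, documents)
```

Documents that share no category with the user score zero, which is the limitation the behavioural and collaborative models on the following slides are meant to address.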

1. Content Based Models
2. Behavioural Models
Finding behavioural patterns beyond a semantic connection
20
Predictive Models
Retirement
Investing
Heart
Disease

3. Collaborative Models
Use wisdom of the crowd
Predict new content using users that have similar
reading patterns
21
Predictive Models

22
Collaborative Models
* Taken from Wikipedia

3. Collaborative Models
Matrix Factorization
Factorization Machines
Feature Embedding with Deep Learning
30
Predictive Models
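
As a rough illustration of the first technique listed, matrix factorization learns a low-dimensional vector per user and per item so that their dot product approximates observed interactions. The sketch below uses plain stochastic gradient descent on a toy click matrix. The data, the hyperparameters, and the function names are all assumptions made for the example; the talk's production models are trained on Spark and are not shown in the slides.

```python
import random

def init_factors(n_rows, k, rng):
    """Small random starting vectors, one per row."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]

def predict(P, Q, u, i):
    """Predicted interaction strength: dot product of user and item factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def mse(P, Q, ratings):
    """Mean squared error over the observed (user, item, value) triples."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)

def train(P, Q, ratings, epochs=200, lr=0.05, reg=0.01):
    """SGD on squared error with L2 regularization; updates P and Q in place."""
    k = len(P[0])
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)

# Toy data (hypothetical): users 0-1 read items 0-1, users 2-3 read items 2-3.
ratings = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0), (0, 3, 0.0),
    (1, 0, 1.0), (1, 2, 0.0), (1, 3, 0.0),
    (2, 0, 0.0), (2, 1, 0.0), (2, 2, 1.0), (2, 3, 1.0),
    (3, 0, 0.0), (3, 1, 0.0), (3, 2, 1.0),
]

rng = random.Random(0)
P = init_factors(4, 2, rng)   # user factors
Q = init_factors(4, 2, rng)   # item factors
initial_error = mse(P, Q, ratings)
train(P, Q, ratings)
trained_error = mse(P, Q, ratings)
```

After training, `predict(P, Q, 1, 1)` estimates an interaction that was never observed (user 1 with item 1), which is how the factors generalize from users with similar reading patterns.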

31
Data Processing
and
Distributed Machine Learning Framework

32
Data Processing
3 Data Centers
300 Machines in each cluster
7 petabytes of data
5 terabytes of compressed
new data daily
Distributed Machine
Learning Framework

33
Distributed Machine Learning Framework
1. Data Collection
2. Feature Engineering
3. Model Training
4. Offline Evaluation & Simulation
5. Model Deployment

34
Distributed Machine Learning Framework
Used for Production
Daily production flow
Automatic model evaluation and decision making

35
Distributed Machine Learning Framework
2. Feature Engineering
3. Model Training
Used for Research
Agile Development of New Models

37
The Serving Layer
Relevance
Content
Inventory
Machine Learned
Model
Request for Content
Recommendations

38
Challenges of The Serving Layer
35K req/sec, 50ms latency
Millions of potential content recommendations

39
Using Search Technology for
Recommender System Serving Layer

40
Why Search?
1. Score documents by relevance to query
2. Scalable in terms of inventory size
3. Scalable in terms of number of requests

41
Why a Search Engine?
The Inverted Index

42
Why a Search Engine?
what the
day brings
The Inverted Index

43
Why a Search Engine?
what the
day brings
The Inverted Index

44
Why a Search Engine?
what the
day brings
The Inverted Index
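
An inverted index maps each term to the set of documents containing it, so a query only touches documents that share at least one term with it. The following is a minimal sketch of that data structure, not how Elasticsearch or Lucene implements it. The query "what the day brings" is taken from the slide; the three documents and the term-count scoring are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Score each document by the number of query terms it contains."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical document collection.
docs = {
    1: "what the day brings",
    2: "the day after",
    3: "a night out",
}

index = build_inverted_index(docs)
results = search(index, "what the day brings")
```

Document 3 never appears in `results` because none of its terms are looked up, which is why lookup cost depends on the posting lists touched rather than on the total inventory size.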

45
Why a Search Engine?
Scalable
Distributed
Open
Source
Real-time
Search
RESTful

Replace Bag of Words with Semantic Features
Content Based Recommendations to
Search Reduction

Replace Bag of Words with Semantic Features
Index the semantic features of
each document
(potential recommendation)
47
Content Based Recommendations to
Search Reduction
Tech
Music
Sports
Celebrities

Generate a query from User Interests
48
Content Based Recommendations to
Search Reduction
Music
Tech
Travel
Tech
Music
Sports
Celebrities
Get me relevant
recommendations
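
The reduction described on these slides can be sketched as follows: index each document's semantic features instead of its words, turn the user's interests into a weighted query, and score candidates through the posting lists. This is a simplified illustration under assumed names and weights (`index_documents`, `build_query`, `recommend`, and all numeric values are hypothetical); only the category names come from the slides.

```python
from collections import defaultdict

def index_documents(doc_features):
    """Posting lists keyed by semantic category: category -> [(doc_id, weight), ...]."""
    postings = defaultdict(list)
    for doc_id, features in doc_features.items():
        for category, weight in features.items():
            postings[category].append((doc_id, weight))
    return postings

def build_query(user_interests, max_terms=3):
    """Use the user's strongest interests as weighted query terms."""
    top = sorted(user_interests.items(), key=lambda kv: -kv[1])[:max_terms]
    return dict(top)

def recommend(postings, query, k=2):
    """Sum (query weight * document weight) over matching categories; return top k IDs."""
    scores = defaultdict(float)
    for category, boost in query.items():
        for doc_id, weight in postings.get(category, ()):
            scores[doc_id] += boost * weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical inventory of potential recommendations.
doc_features = {
    "doc-a": {"Tech": 0.9, "Music": 0.1},
    "doc-b": {"Sports": 1.0},
    "doc-c": {"Music": 0.7, "Celebrities": 0.3},
}

# Hypothetical user profile.
user_interests = {"Music": 0.6, "Tech": 0.3, "Travel": 0.1}

postings = index_documents(doc_features)
query = build_query(user_interests)
top = recommend(postings, query)
```

In a real deployment the posting lists and scoring would live in the search engine, and the application would only build the query from the user profile.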

49
Custom Models - Elasticsearch Plugins
Writing custom scoring functions with native Java
Passing model parameters via the query
Implementing efficient data storage
for feature vectors
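
The three points above can be illustrated conceptually: feature vectors are stored compactly alongside each document, the model parameters travel with the query, and a scoring function combines the two. The sketch below is written in Python for illustration only and does not use the Elasticsearch plugin API (the talk's plugins are native Java). The query layout, the `custom_score` and `execute` functions, and all values are hypothetical.

```python
from array import array

# Compact per-document storage: packed 32-bit floats (hypothetical layout).
doc_vectors = {
    "doc-a": array("f", [1.0, 0.0, 0.5]),
    "doc-b": array("f", [0.0, 1.0, 0.0]),
}

def custom_score(doc_vector, params):
    """Linear model: bias + dot(user_vector, doc_vector); parameters come from the query."""
    dot = sum(u * d for u, d in zip(params["user_vector"], doc_vector))
    return params["bias"] + dot

def execute(query, vectors):
    """Score every stored vector with the query's parameters and return the top IDs."""
    params = query["params"]
    scored = [(custom_score(vec, params), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored[: query["size"]]]

# Hypothetical request: the model parameters are part of the query itself.
query = {"size": 1, "params": {"user_vector": [2.0, 0.0, 1.0], "bias": 0.1}}
best = execute(query, doc_vectors)
```

Passing parameters in the query means a new model version can be tried by changing the request, without reindexing the documents.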

50
The Power of Data
Getting to Know Our Readers
Personalization Models
Distributed Machine Learning framework
Search based Serving Layer

From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation

1. CONTENT-BASED PERSONALIZATION From Spark to Elasticsearch and Back Learning Large Scale Models for Content Recommendation Sonya Liberman Personalization Team Lead Outbrain Recommendations Group

2. ? What Is

3. 3 The Lighthouse Help people discover content they can trust to be interesting, relevant and timely for them

4. 275 BILLION RECOMMENDATIONS SERVED PER MONTH

5. 554 MILLION MONTHLY UNIQUE USERS GLOBALLY

10. 10 Know Your Reader

11. 11 Outbrain’s NLP Engine Crawling articles where widget is displayed Crawling articles recommended in widget Over 3 million new articles a week

12. 12 What is a Document About?

13. 13 User Semantic Profile

14. 14 User Semantic Profile

15. 15 What we Read vs. what we Share Do our social shares reflect our reading patterns? 200 publishers > 1 billion user interactions 47 million Facebook shares For Your Eyes Only: Consuming vs. Sharing Content | Roy Sasson, Ram Meshulam 3rd SNOW Workshop on Social News on the Web, WWW’2016, Montreal, Canada

16. 16 What People Read vs. What They Share

17. 17 Know Your Reader Know our Users better than their Facebook Friends? For Your Eyes Only: Consuming vs. Sharing Content | Roy Sasson, Ram Meshulam 3rd SNOW Workshop on Social News on the Web, WWW’2016, Montreal, Canada

18. 18 Predictive Models

19. 1. Content Based Models Recommends content based on semantic similarity with user interests 19 Predictive Models Music Tech Travel

20. 1. Content Based Models 2. Behavioural Models Finding behavioural patterns beyond a semantic connection 20 Predictive Models Retirement Investing Heart Disease

21. 3. Collaborative Models Use wisdom of the crowd Predict new content using users that have similar reading patterns 21 Predictive Models

22. 22 Collaborative Models * Taken from Wikipedia

23. 23 Collaborative Models

24. 24 Collaborative Models

25. 25 Collaborative Models

26. 26 Collaborative Models

27. 27 Collaborative Models

28. 28 Collaborative Models

29. 29 Collaborative Models

30. 3. Collaborative Models Matrix Factorization Factorization Machines Feature Embedding with Deep Learning 30 Predictive Models

31. 31 Data Processing and Distributed Machine Learning Framework

32. 32 Data Processing 3 Data Centers 300 Machines in each cluster 7 petabytes of data 5 terabytes of compressed new data daily Distributed Machine Learning Framework

33. 33 Distributed Machine Learning Framework 1. Data Collection 2. Feature Engineering 3. Model Training 4. Offline Evaluation & Simulation 5. Model Deployment

34. 34 Distributed Machine Learning Framework Used for Production Daily production flow Automatic model evaluation and decision making

35. 35 Distributed Machine Learning Framework 2. Feature Engineering 3. Model Training Used for Research Agile Development of New Models

36. 36 The Serving Layer

37. 37 The Serving Layer Relevance Content Inventory Machine Learned Model Request for Content Recommendations

38. 38 Challenges of The Serving Layer 35K req/sec, 50ms latency Millions of potential content recommendations

39. 39 Using Search Technology for Recommender System Serving Layer

40. 40 Why Search? 1. Score documents by relevance to query 2. Scalable in terms of inventory size 3. Scalable in terms of number of requests

41. 41 Why a Search Engine? The Inverted Index

42. 42 Why a Search Engine? what the day brings The Inverted Index

43. 43 Why a Search Engine? what the day brings The Inverted Index

44. 44 Why a Search Engine? what the day brings The Inverted Index

45. 45 Why a Search Engine? Scalable Distributed Open Source Real-time Search RESTful

46. Replace Bag of Words with Semantic Features Content Based Recommendations to Search Reduction

47. Replace Bag of Words with Semantic Features Index the semantic features of each document (potential recommendation) 47 Content Based Recommendations to Search Reduction Tech Music Sports Celebrities

48. Generate a query from User Interests 48 Content Based Recommendations to Search Reduction Music Tech Travel Tech Music Sports Celebrities Get me relevant recommendations

49. 49 Custom Models - Elasticsearch Plugins Writing custom scoring functions with native Java Passing model parameters via the query Implementing efficient data storage for feature vectors

50. 50 The Power of Data Getting to Know Our Readers Personalization Models Distributed Machine Learning framework Search based Serving Layer

51. Thank You
