Real-world News Recommender Systems
Real-Time Processing of Data Streams
Tobias Heintz¹, Benjamin Kille²
¹plista GmbH
²Technische Universität Berlin
September 26, 2014
Table of Contents 
Introduction 
Recommender Systems 
Unpersonalised Recommendation 
Collaborative Filtering 
Content-based Filtering 
Evaluation 
News Recommendation 
Big Data Issues
Who are we?
▪ Tobias Heintz, plista GmbH
▪ Benjamin Kille, Technische Universität Berlin
plista GmbH
Pioneers in targeted advertising and content distribution.
▪ founded 31 July 2008
▪ part of the WPP Group since 1 January 2014
▪ headquarters in Berlin, Germany
▪ 120 employees (30 % R&D)
Technische Universität Berlin
▪ >30 000 enrolled students
▪ 331 professors
▪ >2600 researchers
What problems do we address?
Recommender Systems
We will introduce recommender systems, discuss a variety of algorithms, and explore how to evaluate recommender systems.
News
We will talk about specific challenges in recommending news. We will illustrate the issues that arise when systems fail to build comprehensive user profiles. We will also show how news that evolves over time affects recommender systems.
Big Data
We will show in what way news represents a source of big data. We will introduce a system that grants researchers access to big data, and we will show how you can compete with your own approaches.
Why are these problems important?
Users increasingly face information overload as they interact with item collections. For instance:
▪ 43 000 000 songs on Apple's iTunes
▪ 100 h of video are uploaded to YouTube every minute
▪ 3 000 000 movies on IMDb
▪ ...
Collections continue to grow, which makes information overload even more severe. The same holds for news articles.
Problem definition
Users have insufficient time and cognitive capacity to iterate over the full collection. Recommender systems support users as they filter collections. They differ in the method they use to filter. More formally, a general-purpose recommender system is a triple $(U, I, \phi)$:
▪ $U$ → set of users $\{u_1, u_2, \ldots, u_M\}$
▪ $I$ → set of items $\{i_1, i_2, \ldots, i_N\}$
▪ $\phi$ → a filter function
The performance of different recommendation algorithms typically depends on $\phi$.
Filter Functions
A filter function $\phi$ takes a user $u$, the entire item collection $I$, and a model $M$. It returns the subset of items to be recommended, $\hat{I} \subseteq I$:
$$\phi(u, I, M) = \hat{I}$$
A recommender system's success or failure depends strongly on the model $M$, and in particular on how accurately the model reflects actual user preferences. $M$ may take various kinds of input, as we will discuss for a selection of recommendation algorithms.
Random Recommendation 
M takes the item collection and selects items randomly.
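A random filter can be sketched in a few lines. This minimal sketch assumes items are identified by their titles. The name `random_recommend` and its `seed` parameter are illustrative, not from the slides. The user argument exists only to mirror $\phi(u, I, M)$ and is deliberately ignored.

```python
import random
from typing import List, Optional, Sequence


def random_recommend(user: str, items: Sequence[str], n: int,
                     seed: Optional[int] = None) -> List[str]:
    """Random filter function: ignores the user and samples n distinct items."""
    rng = random.Random(seed)  # seeded generator so the sketch is reproducible
    # random.sample draws without replacement, so no item is suggested twice
    return rng.sample(list(items), k=min(n, len(items)))


catalogue = ["Aviator", "Bad Boys", "Cars", "District 9", "Elektra"]
print(random_recommend("Anna", catalogue, n=3, seed=42))
```

With the same seed, every user receives the same list. This is exactly the "disregards personal taste" weakness listed in the summary.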
Most-Popular Recommendation
$M$ orders the item collection according to the number of interactions.
(Figure: items with $K \le L \le M \le N$ interactions; the most popular item has $N$ interactions.)
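The most-popular model only needs an interaction count per item. The sketch below assumes the interaction log is a list of hypothetical `(user, item)` pairs. The function name `most_popular` is illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_popular(interactions: Iterable[Tuple[str, str]], n: int) -> List[str]:
    """Rank items by their number of (user, item) interactions and return the top n."""
    counts = Counter(item for _user, item in interactions)
    # most_common sorts by count; equal counts keep first-seen order
    return [item for item, _count in counts.most_common(n)]


log = [("Anna", "Bad Boys"), ("Bob", "Bad Boys"), ("Clara", "Elektra"),
       ("Anna", "Elektra"), ("Dan", "Bad Boys"), ("Bob", "Cars")]
print(most_popular(log, n=2))  # ['Bad Boys', 'Elektra']
```

Updating the model is a single counter increment per new interaction. This is why the summary lists "easy to update M" as an advantage.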
Summary: Unpersonalised Recommenders
Advantages
▪ low computational complexity
▪ easy to update M
▪ domain-independent
Disadvantages
▪ disregard personal taste
▪ disregard context
▪ high chance of recommending already known or unpopular items
Collaborative Filtering
Basic Assumptions
▪ systems have access to users' preferences
▪ users who had similar tastes in the past will continue to like similar items
▪ systems have means to compare users' tastes
Distinctions
▪ model-based vs memory-based
▪ item-based vs user-based
Example
(Figure: users Anna, Bob, Clara and Dan and the movies Aviator, Bad Boys, Cars, District 9 and Elektra; each user's profile lists the movies they liked, e.g., Anna's profile is [Bad Boys, District 9, Elektra].)
Preference Elicitation
Explicit Preferences
▪ Likes
▪ Thumbs Up/Down
▪ Ratings
▪ Comments
▪ Purchase
Implicit Preferences
▪ Click
▪ Dwell Time
▪ Returns
How can we measure whether users like items, and how much?
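Implicit signals have to be turned into preferences before a collaborative filter can use them. The sketch below shows one possible mapping, assuming a hypothetical `Interaction` record and an illustrative 30-second dwell-time threshold. Neither comes from the slides.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One logged event; the field names are hypothetical."""
    user: str
    item: str
    clicked: bool
    dwell_seconds: float


def implicit_preference(event: Interaction, min_dwell: float = 30.0) -> int:
    """Binary preference: 1 if the user clicked and stayed long enough, else 0.

    The 30-second default is an illustrative assumption, not a value from the slides.
    """
    return int(event.clicked and event.dwell_seconds >= min_dwell)


events = [
    Interaction("Anna", "Elektra", clicked=True, dwell_seconds=95.0),
    Interaction("Bob", "Cars", clicked=True, dwell_seconds=4.0),
    Interaction("Clara", "Aviator", clicked=False, dwell_seconds=0.0),
]
print([implicit_preference(e) for e in events])  # [1, 0, 0]
```

A short dwell time after a click is treated as "not liked". That is a judgement call: in practice the threshold would need tuning per domain.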
Collaborative Filtering Algorithms with Ratings
Memory-based
The algorithm uses the complete data set in the recommendation process; M contains the full rating matrix.
▪ user-based k-nearest neighbour
▪ item-based k-nearest neighbour
Model-based
The algorithm learns a model M and uses it to recommend items.
▪ matrix factorisation with alternating least squares (ALS)
▪ matrix factorisation with stochastic gradient descent (SGD)
User-based k-nearest Neighbour
Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(u, v)$
(Figure: the preferences of Anna, Bob, Clara and Dan for Aviator, Bad Boys, Cars, District 9 and Elektra, encoded as a rating matrix with entries 1 and 0.)
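Similarity Measures
Number of items in common
$$\sigma(u, v) = \sum_{i \in I} \mathbb{1}(i), \qquad \mathbb{1}(i) = \begin{cases} 1 & \text{if both } u \text{ and } v \text{ liked } i \\ 0 & \text{otherwise} \end{cases}$$
Cosine similarity
$$\sigma(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$
Pearson's correlation coefficient
$$\sigma(u, v) = \frac{\operatorname{cov}(u, v)}{\operatorname{std}(u) \, \operatorname{std}(v)}$$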
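The three measures can be written directly over rating vectors. This is a sketch under two assumptions: unknown ratings are stored as 0, and any rating above 0 counts as "liked". Anna's vector follows her profile from the example. Bob's vector is hypothetical.

```python
import math
from typing import Sequence

# Assumption: unknown ratings are stored as 0 and a rating > 0 means "liked".


def common_items(u: Sequence[float], v: Sequence[float]) -> int:
    """Number of items both users liked."""
    return sum(1 for a, b in zip(u, v) if a > 0 and b > 0)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """cov(u, v) / (std(u) std(v)) with population statistics; 0.0 if a std is zero."""
    n = len(u)
    mu_u, mu_v = sum(u) / n, sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / n
    std_u = math.sqrt(sum((a - mu_u) ** 2 for a in u) / n)
    std_v = math.sqrt(sum((b - mu_v) ** 2 for b in v) / n)
    return cov / (std_u * std_v) if std_u and std_v else 0.0


# Columns: Aviator, Bad Boys, Cars, District 9, Elektra
anna = [0, 1, 0, 1, 1]  # Anna's profile from the example
bob = [1, 1, 0, 1, 1]   # hypothetical
print(common_items(anna, bob), round(cosine(anna, bob), 3), round(pearson(anna, bob), 3))  # 3 0.866 0.612
```

Returning 0.0 when a norm or standard deviation is zero is a convention chosen here. Mathematically, both measures are undefined in that case.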
User-based k-nearest Neighbour
Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(u, v)$
(Figure: the symmetric user-user similarity matrix over Anna, Bob, Clara and Dan, with 1 on the diagonal and sim(Anna, Bob) = sim(Bob, Anna); Anna's row is her similarity vector [1, s_Bob, s_Clara, s_Dan].)
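User-based k-nearest Neighbour
Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(u, v)$
(Figure: the rating matrix of Anna, Bob, Clara and Dan over Aviator, Bad Boys, Cars, District 9 and Elektra, with an unknown preference marked "?" that the algorithm has to predict.)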
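User-based k-nearest Neighbour
Recommendation procedure
user profile:
$$u = (r(i_1), r(i_2), \ldots, r(i_N))$$
similarity vector:
$$\sigma(u, \cdot) = (\sigma(u, v_1), \sigma(u, v_2), \ldots, \sigma(u, u), \ldots, \sigma(u, v_M))$$
preference prediction, weighted by the similarities of the $k$ users other than $u$ that are most similar to $u$ (denoted $N_k(u)$):
$$\hat{r}_u(j) = \sum_{v \in N_k(u)} \sigma(u, v) \, r_v(j)$$
Result
We obtain a predicted preference for each item and can rank the items accordingly. The algorithm returns as many items as requested, starting from the top rank.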
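The user-based procedure can be sketched end to end. This sketch assumes cosine similarity and a 0/1 rating matrix in which 0 means "unknown". It uses an unnormalised similarity-weighted sum over the $k$ nearest neighbours, which is one common choice. Only Anna's row comes from the example; the other rows and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_user_based(R: Matrix, u: int, k: int) -> List[float]:
    """Predicted preference of user u for every item, from u's k most similar users."""
    sims = [(cosine(R[u], R[v]), v) for v in range(len(R)) if v != u]
    # stable sort: users with equal similarity keep their row order
    neighbours = sorted(sims, key=lambda sv: sv[0], reverse=True)[:k]
    scores = [0.0] * len(R[u])
    for s, v in neighbours:
        for j, r in enumerate(R[v]):
            scores[j] += s * r  # similarity-weighted sum of neighbour ratings
    return scores


def recommend_user_based(R: Matrix, u: int, k: int, n: int) -> List[int]:
    """Indices of the n highest-scoring items that user u has not rated yet."""
    scores = predict_user_based(R, u, k)
    unseen = [j for j, r in enumerate(R[u]) if r == 0]
    return sorted(unseen, key=lambda j: scores[j], reverse=True)[:n]


# Rows: Anna, Bob, Clara, Dan; columns: Aviator, Bad Boys, Cars, District 9, Elektra
R = [
    [0, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
print(recommend_user_based(R, u=0, k=2, n=2))  # [0, 2]: Aviator, then Cars
```

Anna's nearest neighbour Bob liked Aviator, so Aviator ranks first. Unlike the similarity vector on the slide, the sketch skips $\sigma(u, u)$, so a user is never their own neighbour.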
Item-based k-nearest Neighbour
Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(i, j)$
(Figure: the rating matrix of Anna, Bob, Clara and Dan over Aviator, Bad Boys, Cars, District 9 and Elektra; each item is represented by its column of 1 and 0 entries, e.g., the column for Bad Boys.)
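Similarity Measures
Number of users in common
$$\sigma(i, j) = \sum_{u \in U} \mathbb{1}(u), \qquad \mathbb{1}(u) = \begin{cases} 1 & \text{if both } i \text{ and } j \text{ are liked by } u \\ 0 & \text{otherwise} \end{cases}$$
Cosine similarity
$$\sigma(i, j) = \frac{i \cdot j}{\lVert i \rVert \, \lVert j \rVert}$$
Pearson's correlation coefficient
$$\sigma(i, j) = \frac{\operatorname{cov}(i, j)}{\operatorname{std}(i) \, \operatorname{std}(j)}$$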
Item-based k-nearest Neighbour
Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(i, j)$
(Figure: the symmetric item-item similarity matrix over Aviator, Bad Boys, Cars, District 9 and Elektra, with 1 on the diagonal and sim(Aviator, Bad Boys) = sim(Bad Boys, Aviator).)
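Item-based k-nearest Neighbour
Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(i, j)$
(Figure: the rating matrix of Anna, Bob, Clara and Dan over Aviator, Bad Boys, Cars, District 9 and Elektra, with an unknown preference marked "?" that the algorithm has to predict.)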
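Item-based k-nearest Neighbour
Recommendation procedure
item profile:
$$i = (r(u_1), r(u_2), \ldots, r(u_M))$$
similarity vector:
$$\sigma(i, \cdot) = (\sigma(i, j_1), \sigma(i, j_2), \ldots, \sigma(i, i), \ldots, \sigma(i, j_N))$$
preference prediction for user $u$, weighted by the similarities of the $k$ items other than $i$ that are most similar to $i$ (denoted $N_k(i)$):
$$\hat{r}_u(i) = \sum_{j \in N_k(i)} \sigma(i, j) \, r_u(j)$$
Result
We obtain a predicted preference for each item and can rank the items accordingly. The algorithm returns as many items as requested, starting from the top rank.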
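The item-based variant compares columns instead of rows. This sketch makes the same assumptions as a user-based version would: cosine similarity, 0 for unknown ratings, and an unnormalised weighted sum over the $k$ most similar items. The matrix and function names are hypothetical, apart from Anna's row.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_item_based(R: Matrix, u: int, k: int) -> List[float]:
    """Predicted preference of user u for each item i from u's ratings of i's k nearest items."""
    n_items = len(R[0])
    cols = [[row[j] for row in R] for j in range(n_items)]  # item profiles
    scores = []
    for i in range(n_items):
        sims = [(cosine(cols[i], cols[j]), j) for j in range(n_items) if j != i]
        top = sorted(sims, key=lambda sj: sj[0], reverse=True)[:k]
        scores.append(sum(s * R[u][j] for s, j in top))
    return scores


def recommend_item_based(R: Matrix, u: int, k: int, n: int) -> List[int]:
    """Indices of the n highest-scoring items that user u has not rated yet."""
    scores = predict_item_based(R, u, k)
    unseen = [j for j, r in enumerate(R[u]) if r == 0]
    return sorted(unseen, key=lambda j: scores[j], reverse=True)[:n]


# Rows: Anna, Bob, Clara, Dan; columns: Aviator, Bad Boys, Cars, District 9, Elektra
R = [
    [0, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
print(recommend_item_based(R, u=0, k=2, n=2))  # [0, 2]: Aviator, then Cars
```

Item-item similarities change more slowly than user profiles. This is a common argument for precomputing them offline.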
Matrix Factorisation
Input: $M \times N$ rating matrix $R$
(Figure: an example matrix $R$ with four rows in which only some entries are known (1) and the remaining entries are missing.)
Goal
Fill in the missing preferences.
Matrix Factorisation
Idea
Project preferences into a low-dimensional space to detect latent structures.
$$[R]_{M \times N} \approx [P]_{M \times K} \, [Q]_{N \times K}^{\top}, \qquad K \ll M, N$$
Problem
How do we determine $P$ and $Q$?
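Matrix Factorisation
Learning $P$ and $Q$
Input: error metric
$$E(P, Q; R) = \sum_{(u,i) \in R} \left( r(u, i) - P_u Q_i^{\top} \right)^2 \quad \text{(quadratic error)}$$
$$E(P, Q; R) = \sum_{(u,i) \in R} \left| r(u, i) - P_u Q_i^{\top} \right| \quad \text{(absolute error)}$$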
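Both error metrics sum only over the observed entries $(u, i) \in R$. The sketch below computes them for tiny, hypothetical factor matrices, with the observed ratings stored as a dictionary from `(user, item)` to rating.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Observed = Dict[Tuple[int, int], float]  # (user, item) -> known rating


def predict(P: Matrix, Q: Matrix, u: int, i: int) -> float:
    """P_u Q_i^T: dot product of user u's and item i's latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def quadratic_error(P: Matrix, Q: Matrix, R: Observed) -> float:
    return sum((r - predict(P, Q, u, i)) ** 2 for (u, i), r in R.items())


def absolute_error(P: Matrix, Q: Matrix, R: Observed) -> float:
    return sum(abs(r - predict(P, Q, u, i)) for (u, i), r in R.items())


P = [[1.0, 0.0], [0.5, 0.5]]  # 2 users x K=2 latent factors (hypothetical)
Q = [[1.0, 0.0], [0.0, 1.0]]  # 2 items x K=2 latent factors (hypothetical)
R = {(0, 0): 1.0, (1, 0): 0.0, (1, 1): 1.0}  # missing entries are simply absent
print(quadratic_error(P, Q, R), absolute_error(P, Q, R))  # 0.5 1.0
```

The quadratic error penalises large deviations more heavily. Its smooth gradient also makes it the usual target for gradient descent.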
Matrix Factorisation
Stochastic Gradient Descent
Optimise the error metric by selecting data points at random.
▪ initialise $P, Q$ with small random values
▪ pick a preference $(u, i)$ at random
▪ determine the gradient at that point
▪ adjust $P, Q$ accordingly
▪ continue
Alternating Least Squares
Optimise either $P$ or $Q$ while keeping the other fixed.
▪ initialise $P, Q$ with small random values
▪ optimise the error metric with respect to $P$ ($Q$ fixed)
▪ optimise the error metric with respect to $Q$ ($P$ fixed)
▪ continue
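The SGD steps above can be sketched as follows for the quadratic error. Three details are assumptions added for illustration rather than taken from the slides: the L2 penalty `reg`, the learning rate `lr`, and the number of epochs. The factor of 2 from the derivative is folded into `lr`. ALS is not shown. The observed ratings are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Observed = Dict[Tuple[int, int], float]


def squared_error(P: Matrix, Q: Matrix, observed: Observed) -> float:
    return sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
               for (u, i), r in observed.items())


def sgd_factorise(observed: Observed, n_users: int, n_items: int, k: int = 2,
                  lr: float = 0.05, reg: float = 0.01, epochs: int = 500,
                  seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Learn P (n_users x k) and Q (n_items x k) so that P[u] . Q[i] approximates r(u, i)."""
    rng = random.Random(seed)
    # initialise P, Q with small random values
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)  # visit the observed preferences in random order
        for (u, i), r in entries:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                p, q = P[u][f], Q[i][f]
                # gradient step on (r - P_u Q_i^T)^2 plus an L2 penalty (assumption)
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q


observed = {(0, 1): 1, (0, 3): 1, (0, 4): 1, (1, 0): 1, (1, 1): 1, (1, 2): 0,
            (2, 1): 1, (2, 2): 1, (2, 4): 0, (3, 0): 0, (3, 2): 1, (3, 3): 1}
P0, Q0 = sgd_factorise(observed, 4, 5, epochs=0)  # same seed: the initial factors
P, Q = sgd_factorise(observed, 4, 5)
print(squared_error(P0, Q0, observed), squared_error(P, Q, observed))
```

After training, the dot product `P[u] . Q[i]` also yields scores for unobserved pairs such as (0, 0). This is how the factorisation fills in the missing preferences.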
Summary: Collaborative Filtering
Advantages
▪ takes personal taste into account
▪ successful in the Netflix Prize competition
▪ domain-independent
Disadvantages
▪ cold-start problem
▪ sparsity
▪ grey sheep
Cold-Start Problem
▪ users without known preferences
▪ items without preferences
▪ similarity measures fail
▪ latent factors are inconclusive
Grey Sheep
▪ users rate all their items as average
▪ user profile: [3, 3, 3, 3, …, 3]
▪ collaborative systems cannot distinguish good from bad items
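The cold-start and grey-sheep problems both show up as undefined similarities. In this sketch the functions return `None` instead of a number in those cases. The three profiles are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> Optional[float]:
    """Cosine similarity, or None when a profile has no known preferences."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return None  # cold start: an empty profile has no direction
    return sum(a * b for a, b in zip(u, v)) / norm


def pearson(u: Sequence[float], v: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None when a profile has zero variance."""
    su, sv = pstdev(u), pstdev(v)
    if su == 0 or sv == 0:
        return None  # grey sheep: a constant profile has zero variance
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (su * sv)


new_user = [0, 0, 0, 0, 0]    # cold start: no known preferences
grey_sheep = [3, 3, 3, 3, 3]  # every item rated average
other = [5, 1, 4, 2, 3]       # hypothetical reference user
print(cosine(new_user, other), pearson(grey_sheep, other))  # None None
```

Cosine similarity still returns a fairly high value for the grey-sheep profile, because all its ratings are positive. So a defined similarity does not guarantee an informative one.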
Content-based Filtering
Idea
Suggest items that are similar to items users have liked.
Similarity
▪ based on content → features
▪ depends on the domain
Content-based Filtering
Input: user profile, item collection, item features, and similarity measure
Features
▪ Name/ID
▪ Meta data
▪ Content
  ▪ audio stream: songs
  ▪ video stream: movies
  ▪ text: books, news articles
(Figure: a content-based filter (CBF) that compares items via sim(i, j).)
Content-based Filtering
Similarity: Examples
▪ keyword overlap → text
▪ average colour match → images/video
▪ maximum amplitude → audio/sound
▪ common actors → movies
▪ common interests → friends/partnership
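For text such as news articles, keyword overlap can be sketched as the Jaccard coefficient of two keyword sets. Jaccard is one possible way to measure overlap, chosen here for illustration. The headlines and function names are hypothetical, and tokenisation is deliberately naive (no stemming or stop-word removal).

```python
import re
from typing import Dict, List, Set


def keywords(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens; no stemming or stop-word removal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap(a: str, b: str) -> float:
    """Jaccard coefficient of the two keyword sets."""
    ka, kb = keywords(a), keywords(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def recommend_content_based(profile: List[str], candidates: Dict[str, str],
                            n: int) -> List[str]:
    """Rank candidate items by their best keyword overlap with any liked item."""
    if not profile:
        return []  # user cold start: nothing to compare against
    scores = {item_id: max(keyword_overlap(text, liked) for liked in profile)
              for item_id, text in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


liked = ["Berlin opens new airport terminal"]
candidates = {
    "a1": "New airport terminal delayed again in Berlin",
    "a2": "Football club wins cup final",
}
print(recommend_content_based(liked, candidates, n=1))  # ['a1']
```

The highest-ranked article is almost a restatement of the liked one. This illustrates the "low serendipity" drawback listed in the summary.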
Summary: Content-based Filtering
Advantages
▪ considers personal taste
▪ highly expectable recommendations
Disadvantages
▪ costly for high-volume content, e.g., video
▪ low serendipity
▪ user cold-start
Evaluation
Important aspects
▪ how well does the system predict preferences?
▪ how often do users receive useful suggestions?
▪ how long does it take the system to provide suggestions?
▪ how many requests cannot be answered?
▪ how often do users return to the site?
▪ how often do users purchase/rent/consume items that the system recommended?
▪ how do users perceive the system?
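The first two questions are commonly quantified offline. Prediction accuracy is often measured with the root mean squared error (RMSE), and the usefulness of a recommendation list with precision at k. These are two standard choices, not metrics the slides prescribe. The numbers below are made up.

```python
import math
from typing import Sequence, Set


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and observed ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that the user found relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


print(round(rmse([3.5, 4.0, 2.0], [4, 4, 1]), 4))                  # 0.6455
print(precision_at_k(["a", "b", "c", "d"], {"b", "d", "e"}, k=2))  # 0.5
```

The remaining questions (latency, unanswered requests, returning users, purchases, perception) need logs or user studies rather than a held-out rating set.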

More Related Content

Similar to Real-world News Recommender Systems

Recommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshRecommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshBigDataCloud
 
Download
DownloadDownload
Downloadbutest
 
Download
DownloadDownload
Downloadbutest
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix DatasetBen Mabey
 
CS583-recommender-systems.ppt
CS583-recommender-systems.pptCS583-recommender-systems.ppt
CS583-recommender-systems.pptArfatAhmadKhan1
 
Collaborative Filtering Survey
Collaborative Filtering SurveyCollaborative Filtering Survey
Collaborative Filtering Surveymobilizer1000
 
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...NAVER Engineering
 
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM RecommendersYONG ZHENG
 
CSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 AugCSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 Augcstalks
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithmsnextlib
 
Building Understanding Out of Incomplete and Biased Datasets using Machine Le...
Building Understanding Out of Incomplete and Biased Datasets using Machine Le...Building Understanding Out of Incomplete and Biased Datasets using Machine Le...
Building Understanding Out of Incomplete and Biased Datasets using Machine Le...Databricks
 
Setting up an A/B-testing framework
Setting up an A/B-testing frameworkSetting up an A/B-testing framework
Setting up an A/B-testing frameworkAgnes van Belle
 
FairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment FrameworkFairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment Frameworkmaniopas
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Dr Sulaimon Afolabi
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation SystemsRobin Reni
 
LCBM: Statistics-Based Parallel Collaborative Filtering
LCBM: Statistics-Based Parallel Collaborative FilteringLCBM: Statistics-Based Parallel Collaborative Filtering
LCBM: Statistics-Based Parallel Collaborative FilteringFabio Petroni, PhD
 
Collaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFCollaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFYusuke Yamamoto
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Daniel Valcarce
 
Artificial intelligence and IoT
Artificial intelligence and IoTArtificial intelligence and IoT
Artificial intelligence and IoTVeselin Pizurica
 

Similar to Real-world News Recommender Systems (20)

Cs583 recommender-systems
Cs583 recommender-systemsCs583 recommender-systems
Cs583 recommender-systems
 
Recommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshRecommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab Ghosh
 
Download
DownloadDownload
Download
 
Download
DownloadDownload
Download
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix Dataset
 
CS583-recommender-systems.ppt
CS583-recommender-systems.pptCS583-recommender-systems.ppt
CS583-recommender-systems.ppt
 
Collaborative Filtering Survey
Collaborative Filtering SurveyCollaborative Filtering Survey
Collaborative Filtering Survey
 
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
 
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
 
CSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 AugCSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 Aug
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
Building Understanding Out of Incomplete and Biased Datasets using Machine Le...
Building Understanding Out of Incomplete and Biased Datasets using Machine Le...Building Understanding Out of Incomplete and Biased Datasets using Machine Le...
Building Understanding Out of Incomplete and Biased Datasets using Machine Le...
 
Setting up an A/B-testing framework
Setting up an A/B-testing frameworkSetting up an A/B-testing framework
Setting up an A/B-testing framework
 
FairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment FrameworkFairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment Framework
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
LCBM: Statistics-Based Parallel Collaborative Filtering
LCBM: Statistics-Based Parallel Collaborative FilteringLCBM: Statistics-Based Parallel Collaborative Filtering
LCBM: Statistics-Based Parallel Collaborative Filtering
 
Collaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFCollaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CF
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
 
Artificial intelligence and IoT
Artificial intelligence and IoTArtificial intelligence and IoT
Artificial intelligence and IoT
 

Recently uploaded

Application of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxApplication of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxRahulVishwakarma71547
 
Applied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxApplied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxmarwaahmad357
 
KeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data scienceKeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data scienceLayne Sadler
 
Bureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptxBureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptxkastureyashashree
 
Controlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform EnvironmentControlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform EnvironmentRahulVishwakarma71547
 
RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024suelcarter1
 
Krishi Vigyan Kendras - कृषि विज्ञान केंद्र
Krishi Vigyan Kendras - कृषि विज्ञान केंद्रKrishi Vigyan Kendras - कृषि विज्ञान केंद्र
Krishi Vigyan Kendras - कृषि विज्ञान केंद्रKrashi Coaching
 
Excavation Methods in Archaeological Research & Studies
Excavation Methods in Archaeological Research &  StudiesExcavation Methods in Archaeological Research &  Studies
Excavation Methods in Archaeological Research & StudiesPrachya Adhyayan
 
Alternative system of medicine herbal drug technology syllabus
Alternative system of medicine herbal drug technology syllabusAlternative system of medicine herbal drug technology syllabus
Alternative system of medicine herbal drug technology syllabusPradnya Wadekar
 
Genomics and Bioinformatics basics from genome to phenome
Genomics and Bioinformatics basics from genome to phenomeGenomics and Bioinformatics basics from genome to phenome
Genomics and Bioinformatics basics from genome to phenomeAjay Kumar Mahato
 
Physics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and EngineersPhysics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and EngineersAndreaLucarelli
 
World Water Day 22 March 2024 - kiyorndlab
World Water Day 22 March 2024 - kiyorndlabWorld Water Day 22 March 2024 - kiyorndlab
World Water Day 22 March 2024 - kiyorndlabkiyorndlab
 
Role of Herbs in Cosmetics in Cosmetic Science.
Role of Herbs in Cosmetics in Cosmetic Science.Role of Herbs in Cosmetics in Cosmetic Science.
Role of Herbs in Cosmetics in Cosmetic Science.ShwetaHattimare
 
Gene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdfGene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdfNetHelix
 
Main Exam Applied biochemistry final year
Main Exam Applied biochemistry final yearMain Exam Applied biochemistry final year
Main Exam Applied biochemistry final yearmarwaahmad357
 
Substances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening TestSubstances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening TestAkashDTejwani
 
geometric quantization on coadjoint orbits
geometric quantization on coadjoint orbitsgeometric quantization on coadjoint orbits
geometric quantization on coadjoint orbitsHassan Jolany
 
An intro to explainable AI for polar climate science
An intro to  explainable AI for  polar climate scienceAn intro to  explainable AI for  polar climate science
An intro to explainable AI for polar climate scienceZachary Labe
 
TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)chatterjeesoumili50
 
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Sérgio Sacani
 

Recently uploaded (20)

Application of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxApplication of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptx
 
Applied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxApplied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docx
 
KeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data scienceKeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data science
 
Bureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptxBureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptx
 
Controlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform EnvironmentControlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform Environment
 
RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024
 
Krishi Vigyan Kendras - कृषि विज्ञान केंद्र
Krishi Vigyan Kendras - कृषि विज्ञान केंद्रKrishi Vigyan Kendras - कृषि विज्ञान केंद्र
Krishi Vigyan Kendras - कृषि विज्ञान केंद्र
 
Excavation Methods in Archaeological Research & Studies
Excavation Methods in Archaeological Research &  StudiesExcavation Methods in Archaeological Research &  Studies
Excavation Methods in Archaeological Research & Studies
 
Alternative system of medicine herbal drug technology syllabus
Alternative system of medicine herbal drug technology syllabusAlternative system of medicine herbal drug technology syllabus
Alternative system of medicine herbal drug technology syllabus
 
Genomics and Bioinformatics basics from genome to phenome
Genomics and Bioinformatics basics from genome to phenomeGenomics and Bioinformatics basics from genome to phenome
Genomics and Bioinformatics basics from genome to phenome
 
Physics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and EngineersPhysics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and Engineers
 
World Water Day 22 March 2024 - kiyorndlab
World Water Day 22 March 2024 - kiyorndlabWorld Water Day 22 March 2024 - kiyorndlab
World Water Day 22 March 2024 - kiyorndlab
 
Role of Herbs in Cosmetics in Cosmetic Science.
Role of Herbs in Cosmetics in Cosmetic Science.Role of Herbs in Cosmetics in Cosmetic Science.
Role of Herbs in Cosmetics in Cosmetic Science.
 
Gene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdfGene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdf
 
Main Exam Applied biochemistry final year
Main Exam Applied biochemistry final yearMain Exam Applied biochemistry final year
Main Exam Applied biochemistry final year
 
Substances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening TestSubstances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening Test
 
geometric quantization on coadjoint orbits
geometric quantization on coadjoint orbitsgeometric quantization on coadjoint orbits
geometric quantization on coadjoint orbits
 
An intro to explainable AI for polar climate science
An intro to  explainable AI for  polar climate scienceAn intro to  explainable AI for  polar climate science
An intro to explainable AI for polar climate science
 
TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)
 
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
 

Real-world News Recommender Systems

  • 1. Verarbeitung von Datenstromen in Echtzeit Tobias Heintz1 Benjamin Kille2 1plista GmbH 2Technische Universitat Berlin September 26, 2014
  • 2. Table of Contents Introduction Recommender Systems Unpersonalised Recommendation Collaborative Filtering Content-based Filtering Evaluation News Recommendation Big Data Issues
  • 3. Who are we? I Tobias Heintz, plista GmbH I Benjamin Kille, Technische Universitat Berlin plista GmbH Pioneers for targeted advertisement and content distribution. I founded 31 July, 2008 I incorporated in the WPP Group as of 1 January, 2014 I headquaters in Berlin, Germany I 120 employees (30 % R&D) Technische Universitat Berlin I >30 000 enrolled students I 331 professors I >2600 researchers
  • 4. What problems do we address? Recommender Systems We will introduce recommender systems; we will discuss a variety of algorithms; we will explore how to evaluate recommender systems. News We will talk about speci
  • 5. c challenges when recommending news; we will illustrate issues arising as system fail to build comprehensive user pro
  • 6. les; we will depict how news evolving over time aect recommender systems. Big Data We will examplify in what way news represent a source of big data; we will introduce a system which grants researchers access to big data; we will show you, how you can compete with your own approaches.
  • 7. Why are these problem important? Users increasingly face information overload as they interact with item collections. For instance: I 43 000 000 songs on Apple's iTunes I 100 h of video are uploaded on Youtube every minute I 3 000 000 movies on IMDb I ... Collection continue to grow causing even more severe information overload. The same yields for news articles.
  • 8. Table of Contents Introduction Recommender Systems Unpersonalised Recommendation Collaborative Filtering Content-based Filtering Evaluation News Recommendation Big Data Issues
  • 10. nition Users have insucient time and cognitive capacity to iterate the full collection. Recommender Systems support users as they
  • 11. lter collections. Recommender Systems dier with respect to the method they use to
  • 12. lter. More formally, a general-purpose recommender system is a triple (U; I; ). U ! set of users fu1; u2; : : : ; uMg I ! set of items fi1; i2; : : : ; iNg ! a
  • 13. lter function The performance of dierent recommendation algorithms typically depends on .
  • 14. Filter Functions Filter functions take a user u, the entire item collection I, and a model M. They return a subset of items to be recommended I. (u; I;M) = I Recommender systems' success or failure strongly depends on the model M. In particular, how accurately the model re ects actual user preferences. M may take various kinds of input, as we will discuss for a selection of recommendation algorithms.
  • 15. Random Recommendation M takes the item collection and selects items randomly.
  • 16. Random Recommendation M takes the item collection and selects items randomly. random
  • 17. Most-Popular Recommendation M orders the item collection according to the number of interactions, K L M N. K interactions L interactions M interactions most N interactions popular
  • 18. Summary: Unpersonalised Recommenders Advantages I low computational complexity I easy to update M I domain independent Disadvantages I disregard personal taste I disregard context I high chance to recommend known or unpopular items
  • 19. Collaborative Filtering Basic Assumptions I systems have access to users' preferences I users with similar tastes in the past will continue to like similar items I systems have means to compare users tastes Distinctions I model-based vs memory-based I item-based vs user-based
  • 20. Example Anna Aviator Bob Clara Dan Bad Boys Cars District 9 Elektra
  • 21. Example Anna Aviator Bob Clara Dan Bad Boys Cars District 9 Elektra
  • 22. Example Anna Aviator Bob Clara Dan Bad Boys Cars District 9 Elektra user profile: Anna Bad Boys District 9 Elektra [ , , ]
  • 23. Example Anna Bob Clara Dan [ , , ] Bad Boys District 9 Elektra [ , , , ] Aviator Bad Boys District 9 Elektra [ , , ] Cars District 9 Elektra [ ] Aviator
  • 24. Example Anna Aviator Bob Clara Dan Bad Boys Cars District 9 Elektra
  • 25. Example Anna Aviator Bob Clara Dan Bad Boys Cars District 9 Elektra
  • 26. Example Anna Aviator Bob Clara Dan Bad Boys Cars District 9 Elektra
  • 27. Example Anna Aviator Bob Clara Dan Bad Boys Cars District 9 Elektra
  • 28. Preference Elicitation Explicit preferences: ▪ likes ▪ thumbs up/down ▪ ratings ▪ comments ▪ purchases Implicit preferences: ▪ clicks ▪ dwell time ▪ returns How can we measure whether users like items, and how much they do?
  • 29. Collaborative Filtering Algorithms with Ratings Memory-based: the algorithm uses the complete data set in the recommendation process; $M$ contains the full rating matrix. ▪ user-based k-nearest neighbour ▪ item-based k-nearest neighbour Model-based: the algorithm learns a model $M$ and uses it to recommend items. ▪ matrix factorisation with alternating least squares (ALS) ▪ matrix factorisation with stochastic gradient descent (SGD)
  • 30. User-based k-nearest Neighbour Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(u, v)$ (figure: users Anna, Bob, Clara, and Dan and the movies Aviator, Bad Boys, Cars, District 9, and Elektra)
  • 32. User-based k-nearest Neighbour Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(u, v)$. Binary rating vectors over (Aviator, Bad Boys, Cars, District 9, Elektra): Anna = (0, 1, 0, 1, 1), Bob = (1, 1, 0, 1, 1)
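  • 33. Similarity Measures Number of items in common: $\sigma(u, v) = \sum_{i \in I} \mathbb{I}(i)$ with $\mathbb{I}(i) = 1$ if both $u$ and $v$ liked $i$ and $0$ otherwise Cosine similarity: $\sigma(u, v) = \frac{u \cdot v}{\lVert u \rVert \lVert v \rVert}$ Pearson's correlation coefficient: $\sigma(u, v) = \frac{\operatorname{cov}(u, v)}{\operatorname{std}(u)\operatorname{std}(v)}$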
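The three similarity measures can be sketched in plain Python. This is an illustration under the assumption that a missing preference is encoded as 0, as in the binary vectors of Anna and Bob; the function names are hypothetical. Returning 0.0 when a vector has zero norm or zero variance is a design choice of this sketch, since the formulas are undefined there.

```python
import math


def common_items(u, v):
    """Number of items that both users liked (binary rating vectors)."""
    return sum(1 for a, b in zip(u, v) if a and b)


def cosine(u, v):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def pearson(u, v):
    """Pearson correlation; the 1/n factors of cov and std cancel, so plain sums suffice."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (std_u * std_v) if std_u and std_v else 0.0


# Anna and Bob over (Aviator, Bad Boys, Cars, District 9, Elektra).
anna = [0, 1, 0, 1, 1]
bob = [1, 1, 0, 1, 1]
print(common_items(anna, bob), cosine(anna, bob), pearson(anna, bob))
```

For Anna and Bob this gives 3 common items, a cosine similarity of about 0.87, and a Pearson correlation of about 0.61.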
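  • 35. User-based k-nearest Neighbour Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(u, v)$ (figure: user similarity matrix over Anna, Bob, Clara, and Dan with ones on the diagonal; it is symmetric, $\sigma(\text{Anna}, \text{Bob}) = \sigma(\text{Bob}, \text{Anna})$; Anna's row is the similarity vector $[1, s_{\text{Bob}}, s_{\text{Clara}}, s_{\text{Dan}}]$)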
  • 36. User-based k-nearest Neighbour Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(u, v)$ (figure: the example users and movies; a question mark marks the unknown preference to be predicted)
  • 37. User-based k-nearest Neighbour Recommendation procedure: user profile: $u = (r(i_1), r(i_2), \dots, r(i_N))$; similarity vector: $\sigma(u, \cdot) = (\sigma(u, v_1), \sigma(u, v_2), \dots, \sigma(u, u), \dots, \sigma(u, v_M))$; preference prediction: $\hat{r}(u, j) = \sum_{v} \sigma(u, v)\, r(v, j)$, where the k-nearest-neighbour variant restricts the sum to the $k$ users most similar to $u$. Result: we obtain a predicted preference for each item and can rank the items accordingly. The algorithm returns as many items as requested, starting from the top rank.
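A minimal sketch of the user-based procedure on the running example, assuming cosine similarity, 0 for unknown preferences, and unnormalised similarity-weighted sums (enough for ranking). Unlike the similarity vector on the slide, this sketch excludes $u$ from its own neighbourhood and skips items the user already liked; both are assumptions, as are the names `user_knn_scores` and `top_n`.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def user_knn_scores(R, u, k, sim=cosine):
    """Predict user u's preference for every item from the k most similar users.

    R is a list of rating rows (users x items) with 0 for unknown preferences.
    User u itself is excluded from its neighbourhood.
    """
    others = [v for v in range(len(R)) if v != u]
    sims = {v: sim(R[u], R[v]) for v in others}
    neighbours = sorted(others, key=lambda v: sims[v], reverse=True)[:k]
    # r_hat(u, j) = sum over neighbours v of sigma(u, v) * r(v, j)
    return [sum(sims[v] * R[v][j] for v in neighbours) for j in range(len(R[u]))]


def top_n(scores, seen, n):
    """Rank the items the user has not interacted with yet and return the first n."""
    candidates = [j for j in range(len(scores)) if j not in seen]
    return sorted(candidates, key=lambda j: scores[j], reverse=True)[:n]


movies = ["Aviator", "Bad Boys", "Cars", "District 9", "Elektra"]
R = [
    [0, 1, 0, 1, 1],  # Anna
    [1, 1, 0, 1, 1],  # Bob
    [0, 0, 1, 1, 1],  # Clara
    [1, 0, 0, 0, 0],  # Dan
]
scores = user_knn_scores(R, u=0, k=2)
seen = {j for j, r in enumerate(R[0]) if r}
print([movies[j] for j in top_n(scores, seen, 2)])
```

With $k = 2$, Anna's neighbours are Bob and Clara, so Aviator (liked by Bob, the closer neighbour) ranks ahead of Cars.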
  • 39. Item-based k-nearest Neighbour Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(i, j)$ (figure: users Anna, Bob, Clara, and Dan and the movies Aviator, Bad Boys, Cars, District 9, and Elektra)
  • 41. Item-based k-nearest Neighbour Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(i, j)$. Binary rating vectors over (Anna, Bob, Clara, Dan): Aviator = (0, 1, 0, 1), Bad Boys = (1, 1, 0, 0)
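  • 42. Similarity Measures Number of users in common: $\sigma(i, j) = \sum_{u \in U} \mathbb{I}(u)$ with $\mathbb{I}(u) = 1$ if both $i$ and $j$ are liked by $u$ and $0$ otherwise Cosine similarity: $\sigma(i, j) = \frac{i \cdot j}{\lVert i \rVert \lVert j \rVert}$ Pearson's correlation coefficient: $\sigma(i, j) = \frac{\operatorname{cov}(i, j)}{\operatorname{std}(i)\operatorname{std}(j)}$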
  • 43. Item-based k-nearest Neighbour Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(i, j)$ (figure: item similarity matrix over Aviator, Bad Boys, Cars, District 9, and Elektra with ones on the diagonal; it is symmetric, $\sigma(\text{Aviator}, \text{Bad Boys}) = \sigma(\text{Bad Boys}, \text{Aviator})$)
  • 44. Item-based k-nearest Neighbour Input: $M \times N$ rating matrix $R$, similarity measure $\sigma(i, j)$ (figure: the example users and movies; a question mark marks the unknown preference to be predicted)
  • 45. Item-based k-nearest Neighbour Recommendation procedure: item profile: $i = (r(u_1), r(u_2), \dots, r(u_M))$; similarity vector: $\sigma(i, \cdot) = (\sigma(i, j_1), \sigma(i, j_2), \dots, \sigma(i, i), \dots, \sigma(i, j_N))$; preference prediction: $\hat{r}(u, i) = \sum_{j} \sigma(i, j)\, r(u, j)$, where the k-nearest-neighbour variant restricts the sum to the $k$ items most similar to $i$. Result: we obtain a predicted preference for each item and can rank the items accordingly. The algorithm returns as many items as requested, starting from the top rank.
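The item-based variant compares columns of $R$ instead of rows. The sketch below assumes cosine similarity, 0 for unknown preferences, and unnormalised sums; ties between equally similar items are broken by item index, and the names `item_knn_scores` and `top_n` are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def item_knn_scores(R, u, k, sim=cosine):
    """Predict user u's preference for every item i from the k items most similar to i.

    Items are compared through their columns of R (one entry per user, 0 = unknown).
    """
    n_items = len(R[0])
    cols = [[row[i] for row in R] for i in range(n_items)]
    scores = []
    for i in range(n_items):
        ranked = sorted(
            ((sim(cols[i], cols[j]), j) for j in range(n_items) if j != i),
            reverse=True,
        )[:k]
        # r_hat(u, i) = sum over neighbours j of sigma(i, j) * r(u, j)
        scores.append(sum(s * R[u][j] for s, j in ranked))
    return scores


def top_n(scores, seen, n):
    candidates = [j for j in range(len(scores)) if j not in seen]
    return sorted(candidates, key=lambda j: scores[j], reverse=True)[:n]


movies = ["Aviator", "Bad Boys", "Cars", "District 9", "Elektra"]
R = [
    [0, 1, 0, 1, 1],  # Anna
    [1, 1, 0, 1, 1],  # Bob
    [0, 0, 1, 1, 1],  # Clara
    [1, 0, 0, 0, 0],  # Dan
]
scores = item_knn_scores(R, u=0, k=2)
seen = {j for j, r in enumerate(R[0]) if r}
print([movies[j] for j in top_n(scores, seen, 2)])
```

For Anna with $k = 2$, Cars scores about 1.15 (its neighbours District 9 and Elektra are both in her profile) and Aviator about 0.91, so Cars is ranked first.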
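  • 47. Matrix Factorisation Input: $M \times N$ rating matrix $R$; for the running example (rows Anna, Bob, Clara, Dan; columns Aviator, Bad Boys, Cars, District 9, Elektra; $\cdot$ marks a missing preference): $R = \begin{pmatrix} \cdot & 1 & \cdot & 1 & 1 \\ 1 & 1 & \cdot & 1 & 1 \\ \cdot & \cdot & 1 & 1 & 1 \\ 1 & \cdot & \cdot & \cdot & \cdot \end{pmatrix}$ Goal: fill the gaps of missing preferences.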
  • 48. Matrix Factorisation Idea: project preferences into a low-dimensional space to detect latent structures: $R_{M \times N} \approx P_{M \times K}\, Q_{N \times K}^\top$ with $K \ll M, N$. Problem: how do we determine $P$ and $Q$?
  • 49. Matrix Factorisation Learning $P$ and $Q$ Input: error metric $E(P, Q, R) = \sum_{(u,i) \in R} \left(r(u,i) - P_u Q_i^\top\right)^2$ (quadratic error) or $E(P, Q, R) = \sum_{(u,i) \in R} \left|r(u,i) - P_u Q_i^\top\right|$ (absolute error)
  • 50. Matrix Factorisation Stochastic Gradient Descent: optimise the error metric by selecting data points at random. ▪ initialise $P, Q$ with small random values ▪ pick a known preference $(u, i)$ at random ▪ determine the gradient of the error at that point ▪ adjust $P, Q$ accordingly ▪ repeat Alternating Least Squares: optimise either $P$ or $Q$ while keeping the other fixed. ▪ initialise $P, Q$ with small random values ▪ optimise the error metric with respect to $P$ ▪ optimise the error metric with respect to $Q$ ▪ repeat
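The SGD procedure can be sketched as follows for the quadratic error, with $r(u, i) \approx P_u Q_i^\top$. This is an illustration, not the tutorial's code: the latent dimension `k`, learning rate `lr`, epoch count, and initial value range are assumed values, there is no regularisation term, and ALS is not shown.

```python
import random


def sgd_mf(ratings, n_users, n_items, k=2, lr=0.05, epochs=500, seed=0):
    """Learn P (users x k) and Q (items x k) so that r(u, i) ~ P_u . Q_i.

    ratings: list of known (u, i, r) triples. Plain SGD on the quadratic error,
    without regularisation; the factor 2 of the gradient is folded into lr.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the known preferences in random order
        for u, i, r in data:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                p_uf, q_if = P[u][f], Q[i][f]
                # step against the gradient of (r - P_u . Q_i)^2 w.r.t. P_uf and Q_if
                P[u][f] += lr * err * q_if
                Q[i][f] += lr * err * p_uf
    return P, Q


def predict(P, Q, u, i):
    return sum(p * q for p, q in zip(P[u], Q[i]))


def quadratic_error(P, Q, ratings):
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)


# Known preferences of the running example as (user, item, rating) triples;
# users: 0 Anna, 1 Bob, 2 Clara, 3 Dan; items: 0 Aviator, ..., 4 Elektra.
known = [(0, 1, 1), (0, 3, 1), (0, 4, 1), (1, 0, 1), (1, 1, 1), (1, 3, 1),
         (1, 4, 1), (2, 2, 1), (2, 3, 1), (2, 4, 1), (3, 0, 1)]
P, Q = sgd_mf(known, n_users=4, n_items=5)
print(quadratic_error(P, Q, known), predict(P, Q, 0, 0))  # fitted error, Anna x Aviator
```

Because every known entry in this toy matrix equals 1, the fitted error approaches zero. On real data, a regularisation term is typically added to avoid overfitting the known entries.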
  • 52. Summary: Collaborative Filtering Advantages: ▪ takes personal taste into account ▪ successful in the Netflix Prize competition ▪ domain-independent Disadvantages: ▪ cold-start problem ▪ sparsity ▪ grey sheep
  • 53. Cold-Start Problem ▪ users without known preferences ▪ items without known preferences ▪ similarity measures fail ▪ latent factors are inconclusive
  • 54. Grey Sheep ▪ users rate all their items as average ▪ user profile: $[3, 3, 3, 3, \dots, 3]$ ▪ collaborative systems cannot distinguish good from bad items for such users
  • 56. Content-based Filtering Idea: suggest items which are similar to items users have liked. Similarity: ▪ based on content → features ▪ depends on the domain
  • 62. User profile, item collection, item features, and similarity measure Features: ▪ name/ID ▪ metadata ▪ content: ▪ audio stream (songs) ▪ video stream (movies) ▪ text (books, news articles)
  • 64. User profile, item collection, item features, and similarity measure (figure: the content-based filter (CBF) compares items via sim(i, j))
  • 65. Content-based Filtering Similarity examples: ▪ keyword overlap → text ▪ average colour match → images/video ▪ maximum amplitude → audio/sound ▪ common actors → movies ▪ common interests → friends/partnership
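For text items such as news articles, keyword overlap can be illustrated with the Jaccard coefficient. Jaccard is one possible choice, not necessarily the one the slides intend; the tokeniser, the "best match against any liked item" scoring, and the function names are assumptions of this sketch.

```python
import re


def keywords(text):
    """Lower-cased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap(a, b):
    """Jaccard overlap |A & B| / |A | B| of the keyword sets of two texts."""
    ka, kb = keywords(a), keywords(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def content_based(liked_texts, candidates, n):
    """Rank candidate items by their best keyword overlap with any liked item.

    liked_texts: texts of the items in the user profile.
    candidates: dict mapping item id -> text (hypothetical data layout).
    """
    def score(item_id):
        return max((keyword_overlap(candidates[item_id], t) for t in liked_texts), default=0.0)

    return sorted(candidates, key=score, reverse=True)[:n]


liked = ["Berlin startup raises funding"]
articles = {
    "a1": "Football results from the weekend",
    "a2": "Another Berlin startup raises money",
}
print(content_based(liked, articles, 1))
```

Only the item texts are needed, so a brand-new article can be recommended immediately. However, the ranking never leaves the vocabulary of the profile, which reflects the low serendipity of content-based filtering.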
  • 66. Summary: Content-based Filtering Advantages: ▪ considers personal taste ▪ high expectability Disadvantages: ▪ expensive for high-volume content, e.g., video ▪ low serendipity ▪ user cold-start
  • 67. Evaluation Important aspects: ▪ how well does the system predict preferences? ▪ how often do users receive useful suggestions? ▪ how long does it take for the system to provide suggestions? ▪ how many requests cannot be answered? ▪ how often do users return to the site? ▪ how often do users purchase/rent/consume items that the system recommended? ▪ how do users perceive the system?
  • 68. Evaluation: Rating Prediction Goal: the evaluation ought to show how well the system estimates preferences. Assumptions: ▪ the system can access recorded explicit numerical preferences ▪ tastes remain stable over time ▪ the more accurately the system estimates preferences, the better suited its suggestions Metrics: ▪ root mean squared error $\mathrm{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(u,i) \in R} \left(r(u,i) - \hat{r}(u,i)\right)^2}$ ▪ mean absolute error $\mathrm{MAE} = \frac{1}{|R|} \sum_{(u,i) \in R} \left|r(u,i) - \hat{r}(u,i)\right|$, where $|R|$ is the number of recorded preferences
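Both rating-prediction metrics translate directly into code. In this sketch, recorded and predicted ratings are stored as dictionaries keyed by `(user, item)`; that layout and the sample ratings are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over the (user, item) pairs in `actual`."""
    return math.sqrt(sum((r - predicted[key]) ** 2 for key, r in actual.items()) / len(actual))


def mae(actual, predicted):
    """Mean absolute error over the (user, item) pairs in `actual`."""
    return sum(abs(r - predicted[key]) for key, r in actual.items()) / len(actual)


# Hypothetical ratings on a 1-5 scale.
actual = {("Anna", "Cars"): 4, ("Anna", "Aviator"): 2, ("Bob", "Cars"): 5}
predicted = {("Anna", "Cars"): 3, ("Anna", "Aviator"): 2, ("Bob", "Cars"): 3}
print(rmse(actual, predicted), mae(actual, predicted))
```

The absolute errors are 1, 0, and 2, so MAE is 1.0 while RMSE is $\sqrt{5/3} \approx 1.29$. RMSE penalises the single large miss more heavily.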
  • 69. Evaluation: Ranking Goal: the evaluation ought to show how well the system ranks items according to users' preferences. Assumptions: ▪ the system can access preference relations between items ▪ tastes remain stable over time ▪ the better the system ranks items, the better suited its suggestions Metrics: ▪ normalised discounted cumulative gain $\mathrm{nDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}$ ▪ mean reciprocal rank $\mathrm{MRR} = \frac{1}{|U|} \sum_{u \in U} \frac{1}{\mathrm{rank}_i}$, where $\mathrm{rank}_i$ is the position of the first relevant item $i$ in user $u$'s list
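A small sketch of both ranking metrics. It assumes the common discount $\log_2(p + 1)$ for an item at position $p$, computes IDCG from the gains of the same list sorted in decreasing order, and lets users with no relevant item contribute 0 to MRR. Other DCG variants exist, and the data layout is hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain; the item at 1-based position p is discounted by log2(p + 1)."""
    return sum(g / math.log2(p + 1) for p, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


def mrr(rankings, relevant):
    """Mean over users of 1 / (position of the first relevant item); 0 if none appears."""
    total = 0.0
    for user, ranked in rankings.items():
        for pos, item in enumerate(ranked, start=1):
            if item in relevant[user]:
                total += 1.0 / pos
                break
    return total / len(rankings)


print(ndcg([1, 0, 1]))  # relevant items at positions 1 and 3
print(mrr({"Anna": ["Cars", "Aviator"], "Bob": ["Cars", "Elektra"]},
          {"Anna": {"Aviator"}, "Bob": {"Cars"}}))
```

In the MRR example, Anna's first relevant item is at position 2 and Bob's at position 1, so MRR $= (1/2 + 1)/2 = 0.75$.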
  • 70. Evaluation: Top-N Goal: the evaluation ought to show how well the system selects the top suggestions. Assumptions: ▪ the system can access preference relations between items ▪ tastes remain stable over time ▪ the better the system selects the top suggestions, the better suited they are Metrics: ▪ precision@N $= \frac{TP}{TP + FP}$ ▪ recall@N $= \frac{TP}{TP + FN}$
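For a single user, precision@N and recall@N follow from counting true positives in the top $N$. The sketch assumes the recommended list contains no duplicates and that the set of relevant items is known; the function name is hypothetical.

```python
def precision_recall_at_n(recommended, relevant, n):
    """precision@N = TP / (TP + FP), recall@N = TP / (TP + FN) for one user's list."""
    top = recommended[:n]
    tp = sum(1 for item in top if item in relevant)  # relevant items that were recommended
    fp = len(top) - tp                               # recommended but not relevant
    fn = len(relevant) - tp                          # relevant but not recommended
    precision = tp / (tp + fp) if top else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


print(precision_recall_at_n(["a", "b", "c", "d"], {"a", "c", "e"}, 3))
```

Here two of the three recommended items are relevant and two of the three relevant items were found, so both values are 2/3. Averaging over users gives the system-level figures.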
  • 71. Evaluation: Problems ▪ explicit preferences may not be available ▪ tastes change over time ▪ recorded data does not fully reflect the current situation Solution: access real systems with current user interactions to see whether a method performs better than the existing one → second part of the tutorial
  • 72. Summary: Recommender Systems ▪ support users by suggesting interesting items ▪ counteract information overload ▪ unpersonalised recommenders ▪ collaborative filtering ▪ user-based k-nearest neighbour ▪ item-based k-nearest neighbour ▪ matrix factorisation ▪ content-based filtering ▪ evaluation is still difficult
  • 76. News Recommendation: Special Characteristics Collection dynamics: ▪ thousands of new articles are published daily ▪ the relevance of older articles decays Contextual differences: ▪ users perceive recommendations differently ▪ devices render recommendations differently ▪ dependence on time of day and weekday Popularity bias: ▪ few items receive a lot of attention ▪ most items receive hardly any attention
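  • 77. News Recommendation: Collection Dynamics (figure: two panels, entry and exit, showing the number of articles entering and leaving the collection over time, with October and January marked on the time axis)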
  • 78. News Recommendation: Contextual Differences (figure: activity by weekday, Monday to Sunday, and hour of day in three panels: desktop, phone, and tablet)
  • 79. News Recommendation: Popularity Bias (figure: two log-log panels, News and Movies, plotting frequency against the number of interactions)
  • 81. Big Data Goal: intelligent real-time processing of huge amounts of data. Recommender systems → personalisation ▪ volume → the amount of data to be stored increases ▪ variety → heterogeneous data ▪ velocity → data streams in (near) real time ▪ veracity → noisy data
  • 82. Big Data Do news recommendations fulfil the requirements of big data? Volume ✓: hundreds of GB every day Variety: news articles contain textual data and images, inducing some variety Velocity ✓: news articles arrive continuously → second part of the tutorial Veracity: news articles have some consistent attributes (headline, text), but also comprise some features that are missing or wrong (date, location, image)
  • 84. Questions? Thank you for your attention! We hope you enjoyed the first part of the tutorial! There is more (practical) to come in the second part!