The plista Dataset

ACM RecSys 2013, Hong Kong

Authors:
Kille, Benjamin
and Hopfgartner, Frank
and Brodt, Torben
and Heintz, Tobias
Speaker:
Brodt, Torben
International News Recommender
Systems Workshop and Challenge
October 13th, 2013
Introduction and Motivation
● Context: News Article Recommendation
Introduction and Motivation
● Do we need another recommendation data set?
we have
...
● What features are those data sets missing?
● What requirements do news articles impose on recommendation?
Introduction and Motivation
● Features that had not been available in existing data sets:
○ contextual features: device, operating system, browser, etc.
○ cross-domain features: 13 different news providers included
○ different interaction types: interactions with recommendations (clicks) as well as with news items (impressions)
○ content features: headline, URL, images, text snippets, etc.
Introduction and Motivation
● Additional requirements for recommending news articles:
○ real-time → recommendations must be provided within a short time interval (< 200 ms)
○ changing relevancy → items' relevancy decreases with time
○ dynamics → new news items are continuously added
● Requirements inherent to existing recommender systems:
○ sparsity → users typically read only a few news articles
○ cold start → systems refrain from asking users to create profiles; as a result, most user profiles are small
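The "changing relevancy" requirement can be illustrated with a recency-weighted popularity score. This is only a sketch: exponential decay, the six-hour half-life, and the name `decayed_score` are assumptions chosen for illustration and do not come from the slides.

```python
def decayed_score(clicks: int, age_hours: float, half_life_hours: float = 6.0) -> float:
    """Halve an article's click count for every half-life that has passed.

    Hypothetical scoring rule; the slides only state that relevancy
    decreases with time, not how.
    """
    return clicks * 0.5 ** (age_hours / half_life_hours)


# Made-up articles: (name, clicks so far, age in hours).
articles = [("old", 100, 24.0), ("fresh", 40, 1.0)]

# Rank by decayed score, highest first.
ranked = sorted(
    ((name, decayed_score(clicks, age)) for name, clicks, age in articles),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Under these assumptions the newer article outranks the older one despite having fewer clicks, which is the behaviour the requirement asks for.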
Dataset characteristics
{ // json
"type": "impression",
"context": {
"simple": {
"27": 418, // publisher
"14": 31721, // widget
...
},
"lists": {
"10": [100, 101] // channel
}
...
}
api specs hosted at http://orp.plista.com
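A record of this shape could be read roughly as follows. This is a minimal sketch: the sample string contains only the fields shown on the slide (the elided parts are left out), and the helper name `describe_context` is hypothetical. Only the attribute IDs 27 (publisher), 14 (widget), and 10 (channel) are taken from the slide.

```python
import json

# Sample record with only the fields visible on the slide;
# real records carry further attributes that the slide elides.
RAW = """
{
  "type": "impression",
  "context": {
    "simple": {"27": 418, "14": 31721},
    "lists": {"10": [100, 101]}
  }
}
"""

# Attribute IDs as annotated in the slide's comments.
PUBLISHER, WIDGET, CHANNEL = "27", "14", "10"


def describe_context(record: dict) -> dict:
    """Extract the three annotated attributes (hypothetical helper)."""
    context = record.get("context", {})
    simple = context.get("simple", {})
    lists = context.get("lists", {})
    return {
        "type": record.get("type"),
        "publisher": simple.get(PUBLISHER),
        "widget": simple.get(WIDGET),
        "channels": lists.get(CHANNEL, []),
    }


summary = describe_context(json.loads(RAW))
print(summary)
```

Using `.get` with defaults keeps the helper tolerant of records that lack one of the attributes, which seems prudent given that the slide shows only part of the schema.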
Dataset characteristics
● object types
○ impressions → users reading news articles
○ clicks → users clicking recommendations
○ creates → news articles being created
○ updates → news articles being updated
api specs hosted at http://orp.plista.com
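The four object types fall into two groups: user interactions and content dynamics. The sketch below tallies a made-up event stream by type. Only the string "impression" appears verbatim in the slides; the spellings "click", "create", and "update" are assumptions.

```python
from collections import Counter

# Hypothetical event stream; type strings other than "impression" are assumed.
events = [
    {"type": "impression"},
    {"type": "impression"},
    {"type": "click"},
    {"type": "create"},
    {"type": "update"},
    {"type": "impression"},
]

INTERACTIONS = {"impression", "click"}   # what users do
CONTENT_DYNAMICS = {"create", "update"}  # what happens to articles

counts = Counter(event["type"] for event in events)
n_interactions = sum(counts[t] for t in INTERACTIONS)
n_content = sum(counts[t] for t in CONTENT_DYNAMICS)
print(counts, n_interactions, n_content)
```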
Dataset usage
● Evaluation based on Click-Through-Rate (CTR)
● ~ 84 million impressions
● ~ 1 million clicks
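As a rough worked example, the approximate totals above give an overall ratio of clicks to impressions. The sketch assumes CTR is computed as clicks divided by impressions; the slides do not state the exact denominator the authors use, and the function name is hypothetical.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Clicks divided by impressions (assumed definition of CTR)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Approximate totals from the slide: ~1 million clicks, ~84 million impressions.
ctr = click_through_rate(1_000_000, 84_000_000)
print(f"{ctr:.2%}")  # prints 1.19%
```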
Dataset usage
● evaluating recommenders across news portals
● 10-36 % user overlap between different news portals
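One way to quantify user overlap between portals is the share of one portal's users who also appear on another. The slides do not say how the 10-36 % figure was measured, so this definition, the portal names, and the user IDs below are all assumptions for illustration.

```python
from itertools import permutations

# Made-up user ID sets per portal.
users_by_portal = {
    "portal_a": {1, 2, 3, 4, 5},
    "portal_b": {4, 5, 6, 7},
    "portal_c": {1, 7, 8, 9, 10, 11, 12, 13, 14, 15},
}


def overlap_share(source: set, target: set) -> float:
    """Fraction of `source` users also seen on `target` (assumed measure)."""
    if not source:
        return 0.0
    return len(source & target) / len(source)


# The measure is asymmetric, so compute it for every ordered pair.
shares = {
    (a, b): overlap_share(users_by_portal[a], users_by_portal[b])
    for a, b in permutations(users_by_portal, 2)
}
for pair, share in shares.items():
    print(pair, f"{share:.0%}")
```

Because users are identified by session IDs, any such figure is likely to understate the true overlap between readers.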
Dataset usage
● news portal comparisons
● do we observe similar user behaviour on news portals offering similar content?
Dataset usage
● evaluating contextual recommendation algorithms
● sensitive to
○ weekday
○ hour of day
○ ...
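Sensitivity to weekday and hour of day can be explored by grouping events on those two context features. The sketch below assumes each event carries a Unix timestamp in seconds and is interpreted in UTC; the field layout and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def ts(hour: int, day: int = 13) -> float:
    """Unix timestamp for a given hour on a day in October 2013 (UTC)."""
    return datetime(2013, 10, day, hour, tzinfo=timezone.utc).timestamp()


# Made-up (event type, timestamp) pairs.
events = [
    ("impression", ts(8)), ("impression", ts(8)), ("click", ts(8)),
    ("impression", ts(20)), ("impression", ts(20)),
    ("impression", ts(20)), ("impression", ts(20)), ("click", ts(20)),
    ("impression", ts(8, day=14)),
]

# Tally impressions and clicks per (weekday, hour); Monday is weekday 0.
tallies = defaultdict(lambda: {"impression": 0, "click": 0})
for kind, stamp in events:
    moment = datetime.fromtimestamp(stamp, tz=timezone.utc)
    tallies[(moment.weekday(), moment.hour)][kind] += 1

# CTR per context bucket, skipping buckets without impressions.
ctr_by_context = {
    key: c["click"] / c["impression"]
    for key, c in tallies.items()
    if c["impression"]
}
print(ctr_by_context)
```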
Dataset usage
When using the data set, keep in mind that …
● … we identify users by session IDs
○ individual users may have several IDs
○ users sharing a device might be mapped to one ID
● … interactions (clicks, impressions) and content dynamics (creates, updates) differ between news portals
● … contents are restricted to German
● … preferences are represented on a binary scale (user read article, user clicked recommendation)
● … clicking on a recommendation might not reveal the actual relevancy of an item
Conclusions
● we introduced a new data set intended to support recommender systems research
● we outlined novel features that existing data sets lacked
● we presented scenarios that can be evaluated using the data set
● we pointed to critical aspects that ought to be considered when working with the data set
Summary
● news articles
○ of ~13 publishers

● transactional data
○ Impressions
○ Clicks

● contextual data
○ of ~50 attributes

● cross domain application
The plista Dataset
@inproceedings{Kille:2013,
title = {The plista Dataset},
author = {
Kille, Benjamin
and Hopfgartner, Frank
and Brodt, Torben
and Heintz, Tobias
},
booktitle = {
NRS'13: Proceedings of
the International Workshop and
Challenge on News Recommender Systems
},
year = {2013},
month = {10},
location = {Hong Kong, China},
publisher = {ACM},
pages={14--21}
}
