Mendeley
Building recommender systems for scholarly information
Maya Hristakeva, Daniel Kershaw, Marco Rossetti*, Petr Knoth^,
Benjamin Pettit, Saùl Vargas, Kris Jack
Presented by: Daniel Kershaw
Date: 10th February 2017
* Currently working at Trainline
^ Currently working at the Open University

Mendeley / Mendeley Suggest
• Make it easier for users to discover relevant content
• Utilize collective intelligence for article discovery
• Citations are slow to propagate
• Citations lag behind user reading patterns

Challenges
• For the user, the recommendations need to be:
  • Novel
  • Relevant
  • Familiar
  • Serendipitous
  • Well explained
• How to deal with cold and warm users
• How to deal with large data sets

Types of Recommendations
Most personalized
• Implicit – serves recommendations based on user libraries
• Recent Activity – based on recent additions to a user's library
• Research Interests – based on user-generated tags
• Discipline – based on the user's self-identified discipline
Least personalized
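
Implicit – user-based nearest neighbor collaborative filtering
Assumption: users who have read the same articles in the past will read the same articles in the future.
Identify similar users using cosine similarity between their library vectors $L_1$ and $L_2$:
$$\cos(u_1, u_2) = \frac{L_1 \cdot L_2}{\lVert L_1 \rVert \times \lVert L_2 \rVert}$$
The score of document $d$ for user $u$ is then a sum across the inverted neighborhood:
$$r_d^u = \sum_{u' \in sim(U, u)} \begin{cases} \cos(u, u'), & \text{if } d \in lib(u') \setminus lib(u) \\ 0, & \text{otherwise} \end{cases}$$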
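
The user-based scoring on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each library is a set of document IDs (so the cosine reduces to the overlap divided by the square root of the product of the library sizes), the neighborhood is the `k` most similar users, and a document only counts if it is in the neighbor's library but not yet in the user's own. The names `cosine`, `score_documents`, and `k` are hypothetical and not taken from the Mendeley implementation.

```python
import math
from collections import defaultdict


def cosine(lib_a, lib_b):
    """Cosine similarity between two libraries treated as binary vectors."""
    if not lib_a or not lib_b:
        return 0.0
    return len(lib_a & lib_b) / math.sqrt(len(lib_a) * len(lib_b))


def score_documents(user, libraries, k=2):
    """Score unseen documents for `user` by summing neighbor similarities.

    `k` (the neighborhood size) is an illustrative assumption.
    """
    own = libraries[user]
    sims = [(cosine(own, lib), other)
            for other, lib in libraries.items() if other != user]
    # Keep the k most similar users that share at least one document.
    neighbours = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        for doc in libraries[other] - own:  # d in lib(u') but not in lib(u)
            scores[doc] += sim
    return dict(scores)


libraries = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"a", "e"},
    "u4": {"x", "y"},
}
scores = score_documents("u1", libraries)
```

In this toy data, `u4` shares nothing with `u1`, so its documents never receive a score, while `d` outranks `e` because `u2` overlaps with `u1` more than `u3` does.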

Recent Activity
• Uses the last article added to a user's library or the last article read
• Fundamentally item-to-item recommendations
• Performed by comparing the content of articles through TF-IDF vectors:
$$ra(q, y) = sim(q, y) \times \left(1 + \log\left(popularity(y, \text{`global'})\right)\right)$$
• The score is modified by the log of the global popularity, as a proxy for the quality of the article
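
A rough Python sketch of this item-to-item score follows. It assumes a simple smoothed TF-IDF weighting, the natural logarithm, and a popularity value of at least 1 (for example a reader count); none of these details are stated on the slide. The helper names `tfidf_vectors`, `cosine`, and `recent_activity_scores` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors (term -> weight) from tokenized documents."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        # Smoothed IDF; the exact weighting scheme is an assumption.
        vectors[doc_id] = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                           for t, c in tf.items()}
    return vectors


def cosine(v1, v2):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm = (math.sqrt(sum(w * w for w in v1.values()))
            * math.sqrt(sum(w * w for w in v2.values())))
    return dot / norm if norm else 0.0


def recent_activity_scores(query_id, docs, popularity):
    """Rank candidates by sim(q, y) * (1 + log(popularity(y)))."""
    vectors = tfidf_vectors(docs)
    q = vectors[query_id]
    scores = {doc_id: cosine(q, vec) * (1 + math.log(popularity[doc_id]))
              for doc_id, vec in vectors.items() if doc_id != query_id}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "q": ["graph", "neural", "network"],   # last article added by the user
    "y1": ["graph", "neural", "model"],
    "y2": ["graph", "neural", "model"],
    "y3": ["protein", "folding"],
}
popularity = {"y1": 10, "y2": 1000, "y3": 5000}  # made-up global reader counts
ranked = recent_activity_scores("q", docs, popularity)
```

Here `y1` and `y2` have identical content, so the popularity term alone puts `y2` first, while `y3` scores zero despite being the most popular because it shares no terms with the query article.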

Research Interests
• Uses user-defined tags to form a search query
• Queries articles stored in Elasticsearch, limited to globally popular documents
• The top N documents are served as recommendations
• More tailored to users
• Not all users have filled in their interests
• Sometimes research interests are mini abstracts
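
One way such a query could look is sketched below as a plain Python dictionary in Elasticsearch's query DSL. The field names (`title`, `abstract`, `reader_count`), the popularity threshold, and the function `build_interest_query` are all hypothetical; the slides do not describe the actual index mapping or query.

```python
def build_interest_query(tags, min_readers=50, top_n=10):
    """Build a search body from a user's research-interest tags.

    Field names and the `min_readers` threshold are illustrative assumptions.
    """
    return {
        "size": top_n,  # only the top N hits are served as recommendations
        "query": {
            "bool": {
                # Full-text match of the tags against article text fields.
                "must": [{
                    "multi_match": {
                        "query": " ".join(tags),
                        "fields": ["title", "abstract"],
                    }
                }],
                # Restrict the candidates to globally popular documents.
                "filter": [{
                    "range": {"reader_count": {"gte": min_readers}}
                }],
            }
        },
    }


body = build_interest_query(
    ["machine learning", "recommender systems"], min_readers=100, top_n=5
)
```

Putting the popularity restriction in the `filter` clause keeps it from influencing the relevance score, so the ranking depends only on how well the tags match the article text.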

Discipline
• Users choose a discipline from a list of 30 categories (e.g. engineering, arts & humanities)
• Popularity – rank each document in our catalogue according to the number of unique users from that discipline who have it in their libraries
$$pop(d, U_g) = \left|\{u : u \in U_g,\ d \in lib(u)\}\right|$$
• Trending – rank each document in a discipline based on the rate of growth in popularity across consecutive weeks
$$T_d^g = \{pop(d, U_g, \tau) - pop(d, U_g, \tau - 1) : \tau = 0 \ldots n\}$$
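
The two discipline-level rankings can be illustrated with a short Python sketch. It assumes weekly snapshots of user libraries are available and that the week-on-week differences start from the second snapshot, since the first week has no predecessor. The names `pop`, `trending`, and `weekly_libraries` are hypothetical.

```python
def pop(doc, users, libraries):
    """Number of unique users in `users` who have `doc` in their library."""
    return sum(1 for u in users if doc in libraries.get(u, set()))


def trending(doc, users, weekly_libraries):
    """Week-on-week growth in popularity of `doc` within one discipline.

    `weekly_libraries` is a list of library snapshots, oldest first
    (an assumed data layout, not described on the slide).
    """
    counts = [pop(doc, users, libs) for libs in weekly_libraries]
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


physics_users = {"u1", "u2", "u3"}
weekly_libraries = [
    {"u1": {"d"}},
    {"u1": {"d"}, "u2": {"d"}},
    {"u1": {"d"}, "u2": {"d"}, "u3": {"d"}, "u9": {"d"}},  # u9: other discipline
]
latest_pop = pop("d", physics_users, weekly_libraries[-1])
growth = trending("d", physics_users, weekly_libraries)
```

User `u9` is outside the discipline and so is ignored by both measures, which is what makes the ranking discipline-specific.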

Evaluation
Task: predicting what users are going to add to their library.
Split Mendeley library additions on a time boundary (T).
Warm users appear in both the training and test sets (≈ 200,000 users).
Cold users appear only in the test data (≈ 50,000 users).
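
The time-based split can be sketched as follows, assuming library additions are available as `(user, document, timestamp)` tuples. That layout and the function name `split_on_time` are assumptions made for illustration.

```python
def split_on_time(events, boundary):
    """Split library additions at `boundary` and label warm and cold users.

    `events` is an iterable of (user, document, timestamp) tuples.
    """
    train = [e for e in events if e[2] < boundary]
    test = [e for e in events if e[2] >= boundary]
    train_users = {user for user, _, _ in train}
    test_users = {user for user, _, _ in test}
    warm = train_users & test_users   # seen before and after the boundary
    cold = test_users - train_users   # first seen after the boundary
    return train, test, warm, cold


events = [
    ("u1", "d1", 1), ("u1", "d2", 5),
    ("u2", "d3", 2),
    ("u3", "d4", 6),
]
train, test, warm, cold = split_on_time(events, boundary=4)
```

User `u2` has no additions after the boundary, so there is nothing to predict for them and they fall into neither evaluation group.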

Metrics
$$precision@n = \frac{tp}{tp + fp}$$
$$recall@n = \frac{tp}{tp + fn}$$
$$F_1@n = 2 \times \frac{p@n \times r@n}{p@n + r@n}$$
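
A small worked example of these metrics in Python follows. It assumes a ranked recommendation list without duplicates and treats the documents a user actually added after the time boundary as the relevant set. The function name `precision_recall_f1_at_n` is hypothetical.

```python
def precision_recall_f1_at_n(recommended, relevant, n):
    """Compute precision@n, recall@n and F1@n for one user."""
    top_n = recommended[:n]
    relevant = set(relevant)
    tp = len(set(top_n) & relevant)   # recommended and actually added
    fp = len(top_n) - tp              # recommended but not added
    fn = len(relevant) - tp           # added but not recommended
    precision = tp / (tp + fp) if top_n else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "z"}
p, r, f1 = precision_recall_f1_at_n(recommended, relevant, n=3)
```

With `n=3` only `b` is a hit, so one of three recommendations is correct and one of three relevant documents is recovered, giving precision, recall, and F1 of one third each.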

Cold Recommendations

Warm Recommendations

User Segmentation
• Unpublished – undergraduates and new postgraduates
• Postgraduate – has published 1 or 2 articles
• Postdoc – published during their PhD and postdoc
• Lecturer – extensively published across a number of fields
• Professor – prolific author with many collaborations

User Segmentation Results

Practicalities
Technical implementation
• Spark, Hadoop, Mahout, Elasticsearch
Freshness of Content
• Dithering is applied to give the appearance of fresh content to the end user
$$newscore = \log(rank) + N(0, \log \varepsilon), \quad \varepsilon = \frac{\Delta rank}{rank}$$
Content Quality
• Users can add anything to their library
• Pre-filtering removes articles with titles containing 'content' or 'TOC'
• Completeness of metadata is checked
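
The dithering step can be sketched in Python as below. The sketch assumes that the second argument of N is used as the standard deviation of the Gaussian noise, that ranks start at 1, and that items are re-sorted by ascending new score; the slide does not spell out these details. The function name `dither` and the default `epsilon` are hypothetical.

```python
import math
import random


def dither(ranked_items, epsilon=1.5, rng=None):
    """Re-order a ranked list by adding Gaussian noise to the log of the rank.

    Treating log(epsilon) as the standard deviation is an assumption.
    """
    if rng is None:
        rng = random.Random()
    sigma = math.log(epsilon)
    scored = [(math.log(rank) + rng.gauss(0.0, sigma), item)
              for rank, item in enumerate(ranked_items, start=1)]
    scored.sort(key=lambda pair: pair[0])  # a lower score means a better position
    return [item for _, item in scored]


items = ["a", "b", "c", "d", "e"]
unchanged = dither(items, epsilon=1.0)               # log(1) = 0, so no noise
shuffled = dither(items, epsilon=2.0, rng=random.Random(0))
```

Because the noise is added to the log of the rank, neighboring ranks are closer together further down the list, so items near the top tend to stay put while items lower down are reshuffled more freely between page loads.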

Future Directions – Learning to Rank
By mining user interactions with the implicit feedback recommender, learn an optimal ranking based on a comparison of item features and user features (e.g. content vectors).
Aggregate the different recommender systems into one list, with the mixture of recommenders personalized to each user.
http://bit.ly/MendeleyDataScienceJob
WE ARE HIRING DATA SCIENTISTS & ENGINEERS!


Editor's Notes

  1. Note that this does not take into account the different publication patterns across disciplines; it applies only a generic classification. Each metric is applied to warm users in each of the five persona classes.
  2. Postdocs and lecturers have a higher recall for recency. This could be due to more senior researchers exploring a focused topic and adding a succession of related papers, whereas less experienced researchers may be exploring the field and require a broader range of recommendations, as delivered by the CF system.