Metrics14 - ASIS&T SIGMET Workshop, Seattle, 5th November, 2014 
Exploring data quality and retrieval 
strategies for Mendeley reader counts 
Zohreh Zahedi1, Stefanie Haustein2 & Timothy D. Bowman2 
z.zahedi.2@cwts.leidenuniv.nl stefanie.haustein@umontreal.ca tim.bowman@gmail.com 
@zohrehzahedi @stefhaustein @timothydbowman 
1Leiden University, The Netherlands 
2Université de Montréal, Canada
• online reference management tool 
• usage statistics, available via open API
• 2.8 million users, 275,860 groups, 
535 million user documents (02/2014) 
• 68 million unique publications (08/2012; 
281 million user documents) 
Mendeley statistics based on monthly user counts from 10/2010 to 02/2014 on the Mendeley website accessed through the Internet Archive
Research Objectives 
• metadata quality and its effect on retrieval
Research Objectives 
• fluctuation in Mendeley coverage and readership 
counts over time and through different retrieval 
strategies (Bar-Ilan, 2014) 
• altmetric studies and tools use different retrieval 
strategies 
• DOI API search 
• title search (e.g., Webometric Analyst) 
→ lack of systematic study to determine effect of 
retrieval strategy
Research Objectives 
• analyzing metadata quality of Mendeley entries 
systematically 
• testing completeness and accuracy of relevant 
metadata fields 
• identify and quantify error types 
• analyze difference between retrieval strategies 
determine best retrieval strategy for collecting 
Mendeley reader counts
Research Questions 
• How accurate is the metadata on Mendeley for a 
random sample of publications? 
• To what extent do results differ between: 
• manual title search in online catalog 
• API search via DOI 
• What are the most frequent error types in the 
bibliographic data on Mendeley? 
• What retrieval strategy provides the most 
accurate and complete results for the sampled 
publications?
Data set and Method 
• random sample of 2012 WoS publications: 
384 of 1,873,759 documents 
• manual title search via Mendeley online catalog 
n=384 
• DOI search via Mendeley API simultaneously 
n=264 (-31%; documents without a DOI excluded) 
• comparison of all relevant metadata 
• Author 
• DOI 
• ISSN 
• Issue 
• Pages 
• Source 
• Title 
• Volume 
• Year
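The slides do not say how the sample size of 384 was chosen. It is, however, the size that Cochran's formula gives for a 95% confidence level and a 5% margin of error at maximum variability (p = 0.5), so the sketch below reproduces the figure under that assumption. The function name and parameter defaults are illustrative, not taken from the study.

```python
def cochran_sample_size(population, z=1.96, p=0.5, margin=0.05):
    """Cochran's sample size with finite population correction.

    Assumed parameters (not stated in the slides): 95% confidence
    (z = 1.96), maximum variability (p = 0.5), 5% margin of error.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return n0 / (1 + (n0 - 1) / population)


# 2012 WoS population reported on the slide
n = cochran_sample_size(1_873_759)
print(round(n))  # 384

# Share of the sample that could be searched by DOI (264 of 384)
doi_share = 264 / 384
print(f"{doi_share:.1%} of sampled documents have a DOI")
```

The remaining 120 documents (31%) have no DOI and can only be searched by title.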
Found by manual title search
Found by API DOI search
Results: overview 
• API DOI search (n=264): 91.3% of searched documents found; 2 false positives 
• manual title search (n=384): 47.4% of searched documents found
Results: overview 
                                    documents        reader counts 
                                    N       %        N       %       + 
identical reader counts           103    36.4      975    41.1       0 
  identical                       102    36.0      975    41.1       0 
  identical, both 0                 1     0.4        0     0         0 
API higher                        111    39.2      752    31.7     718 
  API higher                       10     3.5      204     8.6     170 
  API higher, manual not found     80    28.3      548    23.1     548 
  API 0, manual not found          21     7.4        0     0         0 
manual higher                      69    24.4      644    27.2     563 
  manual higher                    21     7.4      379    16.0     298 
  manual higher, API not found     40    14.1      242    10.2     242 
  manual higher, API 0              6     2.1       23     1.0      23 
  manual 0, API not found           2     0.7        0     0         0 
all documents                     283   100.0    2,371   100.0   1,281
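The per-strategy totals can be derived from the sub-rows of this table. The sketch below assumes that the reader-count column holds the higher of the two counts for each document and that the "+" column holds the surplus of the higher count over the lower one; the slides do not define these columns explicitly, but the assumption reproduces the totals the deck reports elsewhere.

```python
# (category, documents, higher reader count, surplus over the lower count)
rows = [
    ("identical",                    102, 975,   0),
    ("identical, both 0",              1,   0,   0),
    ("API higher",                    10, 204, 170),
    ("API higher, manual not found",  80, 548, 548),
    ("API 0, manual not found",       21,   0,   0),
    ("manual higher",                 21, 379, 298),
    ("manual higher, API not found",  40, 242, 242),
    ("manual higher, API 0",           6,  23,  23),
    ("manual 0, API not found",        2,   0,   0),
]

api_docs = manual_docs = api_readers = manual_readers = 0
for label, docs, readers, surplus in rows:
    lower = readers - surplus
    # A document counts as found unless the label says otherwise.
    if "API not found" not in label:
        api_docs += docs
    if "manual not found" not in label:
        manual_docs += docs
    # Assign the higher count to the strategy named first in the label.
    if label.startswith("API"):
        api_readers += readers
        manual_readers += lower
    elif label.startswith("manual"):
        manual_readers += readers
        api_readers += lower
    else:  # identical counts
        api_readers += readers
        manual_readers += readers

print(manual_docs, manual_readers)  # 182 1653
print(api_docs, api_readers)        # 241 1808
```

Both pairs agree with the figures given for the manual title search and the API DOI search in the conclusions.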
Results: incorrect metadata 
            Title search (n=182)      DOI search (n=241) 
Field       correct   incorrect       correct   incorrect 
Author        93%        7%             94%        6% 
DOI           92%        4%            100%*       0%* 
ISSN          87%       13%             32%       68% 
Issue         90%        6%             83%       10% 
Pages         80%       14%             83%       10% 
Source        73%       27%             76%       24% 
Title         85%       15%             82%       18% 
Volume        94%        6%             91%        7% 
Year          99%        1%             99%        1% 
*the API DOI search retrieved two false positives which are not included in this analysis
Results: error types 
Title search DOI search
Conclusions 
• errors in fields commonly used for matching (title search / DOI search): 
• Title: 15/18% 
• First author: 7/6% 
• Year: 1/1% 
• source (27/24%), ISSN (13/68%), volume (6/7%), 
issue (6/10%), page number (14/10%) should not 
be used for matching 
• special characters produce most errors, removing 
them would resolve large share of errors: 
• Title: 81/84% 
• First author: 67/73%
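To illustrate what removing special characters could look like in practice, here is a minimal sketch. The normalization rules (lowercasing, stripping diacritics, replacing everything that is not a Latin letter or digit with a space) and the example strings are assumptions for illustration; the slides do not specify which cleaning steps the authors had in mind.

```python
import re
import unicodedata


def normalize(text):
    """Return a comparison key without case, diacritics or punctuation.

    Note: this simple rule also drops non-Latin letters entirely,
    which may be too aggressive for some titles.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9]+", " ", no_marks.lower()).split())


# Hypothetical pair: the same title as indexed in two databases
wos_title = "Data quality – a “systematic” study?"
mendeley_title = "Data quality - a systematic study"

print(wos_title == mendeley_title)                        # False
print(normalize(wos_title) == normalize(mendeley_title))  # True
print(normalize("Université de Montréal"))                # universite de montreal
```

An exact string comparison treats the two titles as different, while the normalized keys match.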
Conclusions 
• results of retrieval strategies: 
• manual title: 182 (64%) documents & 1,653 readers 
• API DOI: 241 (85%) & 1,808 
• combined: 283 & 2,371 (max) / 2,486 (sum) 
• DOI search found 101 (36%) additional documents, 
but: 
• could not be applied to 120 (31%) documents w/out DOI 
• did not retrieve 42 (15%) documents found by title 
search 
• led to 2 (1%) false positives 
→ combination of DOI and title search w/out special 
characters
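The recommended combination can be sketched as follows. This is an illustration only: it works on a small in-memory list instead of calling the Mendeley API, and the record layout (`doi`, `title`, `readers`), the function names and the example DOI are hypothetical. Taking the maximum over all matches mirrors the "combined (max)" figure above.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip diacritics, replace special characters with spaces."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9]+", " ", no_marks.lower()).split())


def combined_reader_count(record, catalogue):
    """Highest reader count among entries matched by DOI or by cleaned title.

    Returns None when neither strategy finds the document.
    """
    doi = (record.get("doi") or "").lower()  # DOIs are case-insensitive
    title = normalize_title(record["title"])
    counts = [
        entry["readers"]
        for entry in catalogue
        if (doi and (entry.get("doi") or "").lower() == doi)
        or normalize_title(entry["title"]) == title
    ]
    return max(counts) if counts else None


# Hypothetical catalogue entries
catalogue = [
    {"doi": "10.1234/EXAMPLE", "title": "Some study: results", "readers": 5},
    {"doi": None, "title": "Some Study – Results", "readers": 8},
    {"doi": None, "title": "Unrelated paper", "readers": 3},
]

with_doi = {"doi": "10.1234/example", "title": "Some study: results"}
without_doi = {"doi": None, "title": "Unrelated Paper!"}
missing = {"doi": "10.1234/other", "title": "Not in the catalogue"}

print(combined_reader_count(with_doi, catalogue))     # 8
print(combined_reader_count(without_doi, catalogue))  # 3
print(combined_reader_count(missing, catalogue))      # None
```

Documents without a DOI still get a result through the title route, and duplicate entries that lack a DOI are picked up alongside the DOI match.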
Thank you for your attention!
