
Exploring data quality and retrieval strategies for Mendeley reader counts

Zohreh Zahedi, Stefanie Haustein & Tim Bowman (2014). Exploring data quality and retrieval strategies for Mendeley reader counts. Presentation at SIGMET Metrics 2014 workshop, 5 November 2014, Seattle, WA (USA)


  1. Metrics14 - ASIS&T SIGMET Workshop, Seattle, 5th November, 2014
     Exploring data quality and retrieval strategies for Mendeley reader counts
     Zohreh Zahedi¹, Stefanie Haustein² & Timothy D. Bowman²
     z.zahedi.2@cwts.leidenuniv.nl, stefanie.haustein@umontreal.ca, tim.bowman@gmail.com
     @zohrehzahedi, @stefhaustein, @timothydbowman
     ¹ Leiden University, The Netherlands; ² Université de Montréal, Canada
  2. • online reference management tool
     • usage statistics, available via open API
  3. • 2.8 million users, 275,860 groups, 535 million user documents (02/2014)
     • 68 million unique publications (08/2012; 281 million user documents)
     Mendeley statistics are based on monthly user counts from 10/2010 to 02/2014 on the Mendeley website, accessed through the Internet Archive.
  4. Research Objectives
     • metadata quality and its effect on retrieval
  5. Research Objectives
     • fluctuation in Mendeley coverage and readership counts over time and through different retrieval strategies (Bar-Ilan, 2014)
     • altmetric studies and tools use different retrieval strategies
       • DOI API search
       • title search (e.g., Webometric Analyst)
     → lack of systematic study to determine effect of retrieval strategy
  6. Research Objectives
     • analyzing metadata quality of Mendeley entries systematically
     • testing completeness and accuracy of relevant metadata fields
     • identify and quantify error types
     • analyze difference between retrieval strategies
     → determine best retrieval strategy for collecting Mendeley reader counts
  7. Research Questions
     • How accurate is the metadata on Mendeley for a random sample of publications?
     • To what extent do results differ between:
       • manual title search in online catalog
       • API search via DOI
     • What are the most frequent error types in the bibliographic data on Mendeley?
     • What retrieval strategy provides the most accurate and complete results for the sampled publications?
  8. Data set and Method
     • random sample of 2012 WoS publications: 384 of 1,873,759 documents
     • manual title search via Mendeley online catalog: n=384
     • simultaneous DOI search via Mendeley API: n=264 (-31%)
     • comparison of all relevant metadata fields: Author, DOI, ISSN, Issue, Pages, Source, Title, Volume, Year
  9. Found by manual title search
  10. Found by API DOI search
  11. Results: overview
      • API DOI search (n=264): found 91.3% of searched documents; 2 false positives
      • manual title search (n=384): found 47.4% of searched documents
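  12. Results: overview

      | Category                         | Documents N | Documents % | Reader counts N | Reader counts % | +     |
      |----------------------------------|-------------|-------------|-----------------|-----------------|-------|
      | identical reader counts          | 103         | 36.4        | 975             | 41.1            | 0     |
      |   identical                      | 102         | 36.0        | 975             | 41.1            | 0     |
      |   identical, both 0              | 1           | 0.4         | 0               | 0               | 0     |
      | API higher                       | 111         | 39.2        | 752             | 31.7            | 718   |
      |   API higher                     | 10          | 3.5         | 204             | 8.6             | 170   |
      |   API higher, manual not found   | 80          | 28.3        | 548             | 23.1            | 548   |
      |   API 0, manual not found        | 21          | 7.4         | 0               | 0               | 0     |
      | manual higher                    | 69          | 24.4        | 644             | 27.2            | 563   |
      |   manual higher                  | 21          | 7.4         | 379             | 16.0            | 298   |
      |   manual higher, API not found   | 40          | 14.1        | 242             | 10.2            | 242   |
      |   manual higher, API 0           | 6           | 2.1         | 23              | 1.0             | 23    |
      |   manual 0, API not found        | 2           | 0.7         | 0               | 0               | 0     |
      | all documents                    | 283         | 100.0       | 2,371           | 100.0           | 1,281 |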
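  13. Results: incorrect metadata

      | Field  | Title search (n=182): correct | Title search: incorrect | DOI search (n=241): correct | DOI search: incorrect |
      |--------|-------------------------------|-------------------------|-----------------------------|-----------------------|
      | Author | 93%                           | 7%                      | 94%                         | 6%                    |
      | DOI    | 92%                           | 4%                      | 100%*                       | 0%*                   |
      | ISSN   | 87%                           | 13%                     | 32%                         | 68%                   |
      | Issue  | 90%                           | 6%                      | 83%                         | 10%                   |
      | Pages  | 80%                           | 14%                     | 83%                         | 10%                   |
      | Source | 73%                           | 27%                     | 76%                         | 24%                   |
      | Title  | 85%                           | 15%                     | 82%                         | 18%                   |
      | Volume | 94%                           | 6%                      | 91%                         | 7%                    |
      | Year   | 99%                           | 1%                      | 99%                         | 1%                    |

      *The API DOI search retrieved two false positives, which are not included in this analysis.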
  14. Results: error types
      [Figure: error types for title search and DOI search]
  15. Conclusions
      • errors in fields commonly used for matching (title search / DOI search):
        • Title: 15/18%
        • First author: 7/6%
        • Year: 1/1%
      • source (27/24%), ISSN (13/68%), volume (6/7%), issue (6/10%), and page number (14/10%) should not be used for matching
      • special characters produce most errors; removing them would resolve a large share of errors:
        • Title: 81/84%
        • First author: 67/73%
  16. Conclusions
      • results of retrieval strategies:
        • manual title: 182 (64%) documents & 1,653 readers
        • API DOI: 241 (85%) documents & 1,808 readers
        • combined: 283 documents & 2,371 (max) / 2,486 (sum) readers
      • DOI search found 101 (36%) additional documents, but:
        • could not be applied to 120 (31%) documents without DOI
        • did not retrieve 42 (15%) documents found by title search
        • led to 2 (1%) false positives
      → combination of DOI and title search without special characters
  17. Thank you for your attention!
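
The conclusion that removing special characters would resolve most title and first-author errors can be illustrated with a small normalization step applied before comparing two records. The sketch below is only one possible approach, not the procedure used in the study: the function names `normalize_for_matching` and `same_record` are hypothetical, and the rule of reducing text to lowercase ASCII letters and digits is an assumption.

```python
import re
import unicodedata


def normalize_for_matching(text: str) -> str:
    """Reduce a title or author name to a plain lowercase ASCII key (hypothetical helper)."""
    # Decompose accented characters so diacritics become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every run of characters other than a-z and 0-9 with a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", stripped.lower())
    return cleaned.strip()


def same_record(a: str, b: str) -> bool:
    """Compare two strings after normalization (hypothetical helper)."""
    return normalize_for_matching(a) == normalize_for_matching(b)


if __name__ == "__main__":
    print(normalize_for_matching("Université de Montréal"))
    print(same_record("Mendeley: reader counts & data quality?",
                      "Mendeley reader counts data quality"))
```

This rule is deliberately aggressive: letters without an ASCII decomposition (for example "ß" or non-Latin scripts) are dropped, so a production matcher would likely need transliteration on top of it.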

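
The recommended combination of DOI and title search can be sketched as a merge of the two result sets, keeping the higher reader count per document (the "max" figure in the conclusions). This is a minimal sketch under stated assumptions: the function name `combine_reader_counts`, the dictionary layout, and the sample identifiers and counts are all hypothetical and do not come from the study data. How the "sum" figure was derived is not stated in the slides, so it is not modeled here.

```python
from typing import Dict


def combine_reader_counts(title_hits: Dict[str, int],
                          doi_hits: Dict[str, int]) -> Dict[str, int]:
    """Merge two retrieval results, keeping the higher reader count per document.

    A document missing from one dictionary was not found by that strategy.
    """
    combined: Dict[str, int] = {}
    for doc_id in sorted(title_hits.keys() | doi_hits.keys()):
        # Collect the counts from whichever strategies found the document.
        counts = [hits[doc_id] for hits in (title_hits, doi_hits) if doc_id in hits]
        combined[doc_id] = max(counts)
    return combined


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the presentation.
    title_hits = {"doc-a": 5, "doc-b": 0, "doc-c": 12}
    doi_hits = {"doc-a": 5, "doc-b": 3, "doc-d": 7}
    merged = combine_reader_counts(title_hits, doi_hits)
    print(len(merged), "documents,", sum(merged.values()), "readers")
```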