
Exploring Coverage and Distribution of Scholarly Identifiers on the Web


Presentation at the 14th International Symposium of Information Science; Zadar, 21 May 2015

Published in: Education


  1. Exploring Coverage and Distribution of Scholarly Identifiers on the Web. 14th International Symposium of Information Science, Zadar, 21 May 2015. Peter Kraker, Asura Enkhbayar & Elisabeth Lex. www.know-center.at
  2. The Emerging Eco-system of Scholarly Services on the Web. Know-Center GmbH • Research Center for Data-Driven Business and Big Data Analytics
  3. The Emerging Eco-system of Scholarly Services on the Web
  4. (Slide without text.)
  5. Research Questions
      • How are scholarly identifiers distributed in crowd-sourced systems on the Web?
      • Does the provision of different identifiers have an influence on the findability of scientific publications in other bibliographic and bibliometric sources?
      • Who are the top providers of identifiers?
      • Which identifier combinations are the most common?
  6. Identifiers on the Scholarly Web: article level, publication level, author level
  7. Data & Method
      • Data sources: arXiv, CrossRef and Mendeley
      • arXiv discipline of quantitative biology (q-bio), 1992-2014 (n=14,195 metadata records); data collection: 17/11/2014
      • arXiv discipline of astrophysics (astro-ph), 2009-2014 (n=81,814 metadata records); data collection: 04/03/2015
      • Identifier labels on the slide: arXiv-ID, DOI; DOI; arXiv-ID, DOI, Scopus-ID, Pubmed-ID, ISSN
  8. Distribution of arXiv Articles Over Time: q-bio (chart)
  9. Distribution of arXiv Articles Over Time: astro-ph (chart)
  10. Results of the Data Collection Process (bar chart)
      • docs with DOI on arXiv: q-bio 36.1%, astro-ph 29.8%
      • docs with DOI: q-bio 50.0%, astro-ph 76.8%
      • docs found on Mendeley: q-bio 81.5%, astro-ph 86.1%
      • Series: q-bio (1992-2014; n=14,195), astro-ph (2009-2014; n=81,814)
      • Chart annotation: Change in the metadata lookup
  11. Findability on Mendeley: q-bio (1992-2014)
      • Total documents: 14,195
      • Documents with DOI & arXiv-ID: 7,100 (50.02%)
      • Documents with arXiv-ID only: 7,095 (49.98%)
      • Documents with DOI & arXiv-ID retrieved on Mendeley with DOI: 6,492 (91.44%)
      • Documents with DOI & arXiv-ID retrieved on Mendeley with arXiv-ID: 5,414 (76.25%)
      • Documents with arXiv-ID only retrieved on Mendeley with arXiv-ID: 4,896 (69.01%)
  12. Findability on Mendeley: astro-ph (2009-2014)
      • Total documents: 81,814
      • Documents with DOI & arXiv-ID: 62,800 (76.8%)
      • Documents with arXiv-ID only: 19,014 (23.2%)
      • Documents with DOI & arXiv-ID retrieved on Mendeley with DOI: 54,093 (86.1%)
      • Documents with DOI & arXiv-ID retrieved on Mendeley with arXiv-ID: 45,586 (72.6%)
      • Documents with arXiv-ID only retrieved on Mendeley with arXiv-ID: 12,812 (67.4%)
  13. Findability on Mendeley & DOI Coverage: q-bio (chart)
  14. Findability on Mendeley & DOI Coverage: astro-ph (chart: findability and DOI availability per year, 2009-2014)
  15. Identifier Frequency and Mean Readership on Mendeley
      Columns: arxiv | doi | scopus | pmid | issn
      q-bio (n=11,570)
      • frequency: 10,351 (89.5%) | 8,321 (71.9%) | 8,409 (72.7%) | 5,477 (47.3%) | 8,119 (70.2%)
      • mean readership: 20.4 | 25.4 | 25.4 | 32.4 | 25.9
      • median readership: 9 | 13 | 13 | 19 | 14
      astro-ph (n=70,459)
      • frequency: 65,719 (93.3%) | 61,667 (87.5%) | 58,780 (83.4%) | 1,454 (2.1%) | 60,985 (86.6%)
      • mean readership: 7.2 | 7.3 | 7.5 | 20.6 | 7.3
      • median readership: 5 | 5 | 13 | 5 | 6
      Slide annotation: StdDev: 33.4
  16. Distribution of Disciplines on Mendeley (March 2011) [Kraker et al. 2012]
  17. Identifier Combination Frequency on Mendeley (bar chart; q-bio 1992-2014, n=14,195; astro-ph 2009-2014, n=81,814)
      • arxiv-doi-issn-pmid-scopus: q-bio 39.2%, astro-ph 1.3%
      • arxiv: q-bio 22.8%, astro-ph 10.2%
      • arxiv-doi-issn-scopus: q-bio 18.2%, astro-ph 74.5%
      • doi-issn-pmid-scopus: q-bio 4.4%, astro-ph 0.0%
      • arxiv-scopus: q-bio 4.1%, astro-ph 0.0%
      • doi-issn-scopus: q-bio 2.7%, astro-ph 4.5%
      • doi-arxiv-issn: q-bio 0.0%, astro-ph 2.6%
      • doi-issn: q-bio 0.0%, astro-ph 1.7%
      • doi-arxiv: q-bio 0.0%, astro-ph 1.5%
      • Other: q-bio 8.5%, astro-ph 3.7%
  18. Conclusions
      • As expected, crowd-sourced systems show big differences in identifier coverage and distribution → crowd-sourced data needs to be amended automatically
      • When retrieving arXiv articles from Mendeley, we were able to obtain more articles using the DOI than the arXiv-ID
      • BUT: a single arXiv-ID is the second most popular identifier combination on Mendeley. This suggests that pre-prints are being read, if at a lower level
      • There is, however, a certain time lag concerning the adoption of articles from arXiv on Mendeley
  19. Conclusions
      • The distribution of identifiers in a collection of papers gives hints at the nature of papers in this collection
      • As with citations, field normalization will be essential for cross-comparison of altmetrics scores
  20. Limitations & Future Work
      • Choice of arXiv as primary data source introduces a bias
      • We only looked at two disciplines from natural sciences
      • Using a random sample of 381 articles from Web of Science, Zahedi et al. (2014) report that they were able to retrieve only 47.7% of articles on Mendeley using the DOI or the title.
      → Expand this study to further disciplines and fields using more data sources
      → Look more deeply into the reasons for disciplinary and tool-related differences
  21. Open Source Crawling Framework: https://github.com/Bubblbu/crawling-framework
  22. SAVE THE DATE: i-KNOW Conference, Special Track on Science 2.0 & Open Science, 21-23 October 2015, Graz, Austria. Submission Deadline (extended): 22 June 2015 (Abstracts due: June 8)
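
The findability percentages on slides 11 and 12 can be recomputed from the reported counts. The following is a minimal sketch, not the authors' code: the function name `retrieval_rate` and the dictionary layout are illustrative, and the pairing of each retrieved count with its denominator is an assumption inferred from the arithmetic (the slides do not state the base of each percentage).

```python
# Counts transcribed from slides 11 (q-bio) and 12 (astro-ph).
# Each entry is (documents retrieved on Mendeley, documents queried).
# ASSUMPTION: the denominators are inferred, not stated on the slides.
FINDABILITY = {
    "q-bio": {
        "DOI lookup, docs with DOI & arXiv-ID": (6492, 7100),
        "arXiv-ID lookup, docs with DOI & arXiv-ID": (5414, 7100),
        "arXiv-ID lookup, docs with arXiv-ID only": (4896, 7095),
    },
    "astro-ph": {
        "DOI lookup, docs with DOI & arXiv-ID": (54093, 62800),
        "arXiv-ID lookup, docs with DOI & arXiv-ID": (45586, 62800),
        "arXiv-ID lookup, docs with arXiv-ID only": (12812, 19014),
    },
}


def retrieval_rate(found: int, queried: int, digits: int = 1) -> float:
    """Return the share of queried documents that were found, in percent."""
    if queried <= 0:
        raise ValueError("queried must be positive")
    return round(100 * found / queried, digits)


if __name__ == "__main__":
    for discipline, lookups in FINDABILITY.items():
        for label, (found, queried) in lookups.items():
            rate = retrieval_rate(found, queried, digits=2)
            print(f"{discipline:9s} {label}: {rate}%")
```

Under these assumed pairings the recomputed rates agree with the percentages printed on the slides, which is why the DOI lookup is read as the more successful retrieval route in both disciplines.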

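
Slide 19 argues that field normalization will be essential for comparing altmetrics scores across disciplines. The slides do not say which normalization the authors have in mind; the sketch below assumes the simplest variant, dividing a paper's reader count by the mean readership of its field. The field means are the arxiv-column values from slide 15; the function name and the example reader counts are hypothetical.

```python
# Mean Mendeley readership per field, taken from the "arxiv" column of slide 15.
FIELD_MEAN_READERSHIP = {
    "q-bio": 20.4,
    "astro-ph": 7.2,
}


def normalized_readership(readers: float, field: str) -> float:
    """Mean-normalized readership: 1.0 means 'average for this field'.

    ASSUMPTION: dividing by the field mean is one common normalization;
    it is not a method described in the slides.
    """
    if field not in FIELD_MEAN_READERSHIP:
        raise KeyError(f"no field mean available for {field!r}")
    return readers / FIELD_MEAN_READERSHIP[field]


if __name__ == "__main__":
    # Hypothetical papers with the same raw reader count in both fields.
    for field in FIELD_MEAN_READERSHIP:
        score = normalized_readership(20, field)
        print(f"{field}: 20 readers -> normalized score {score:.2f}")
```

With these means, the same raw count of 20 readers is roughly average for q-bio but well above average for astro-ph, which illustrates why raw readership should not be compared across fields.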