UKSG 2018 Breakout - Setting your cites to open I4OC - MacCallum



Citations are the way that researchers communicate how
their work builds on and relates to the work of others and
they can be used to trace how a discovery spreads and is
used by researchers in different disciplines and countries.
Creating a truly comprehensive map of scholarship,
however, relies on having a curated machine-readable
database of citation information, where the provenance of
every citation is clear and reusable. The Initiative for Open
Citations (I4OC), a campaign launched on 6 April 2017,
sought to make publisher members of Crossref aware that
they could open up the citation metadata they already deposit
with Crossref simply by asking Crossref to release it. With the support of
major publishers and the endorsement of funders and other
organisations, more than 50% of citation data in Crossref
is now freely available, up from less than 1% before the
campaign. This provides the foundation of a well-structured,
open database of literally millions of datapoints that anyone
can query, mine, consume and explore. The presenter will
discuss the aims of the campaign, the new innovative
services that are already using the data, what more still
needs to be done and how you can support the initiative.
Catriona J MacCallum, Hindawi

Published in: Education
  1. 1. Setting your cites on open The Initiative for Open Citations (I4OC) what it is, why it matters and how you can get involved Catriona J. MacCallum (Hindawi) Mark Patterson (eLife) • Dario Taraborelli (Wikimedia Foundation) UKSG• Glasgow, April 2018
  2. 2. Open Access since 2007 ~18,000 peer-reviewed articles a year Science, Technology & Medicine A founding member of OASPA • Free access – no charge to access • No embargos – immediately available • Reuse – Creative Commons Attribution License (CC BY) - use with proper attribution
  3. 3. November 2016 September 2017
  4. 4. Welcome to… … #OpenCitationsMonth at #UKSG18 - personal use only
  5. 5. The Initiative for Open Citations What it is Why it matters Knowledge Discovery Evaluation Beyond the article… How open citations are being re-used How you can get involved
  6. 6. Paper A Paper B
  7. 7. References are Data
  8. 8. The Initiative for Open Citations (
  9. 9. Aim of I4OC To promote the availability of data on citations that are structured, separable, and open. Structured - the data representing each publication and each citation instance are expressed in common, machine-readable formats. Separable - citations can be accessed and analyzed without the need to access the source bibliographic products (such as journal articles and books). Open - the data are freely accessible and reusable.
  10. 10. Why? Establish a global public web of linked scholarly citation data to enhance the discoverability of published content, both subscription access and open access. This will particularly benefit individuals who are not members of academic institutions with subscriptions to commercial citation databases. Build new services for the benefit of publishers, researchers, funding agencies, academic institutions and the general public And enhance existing services. Create a public citation graph to explore connections between knowledge fields, and to follow the evolution of ideas and scholarly disciplines.
  11. 11. Image: Andy Lamb, CC BY
  12. 12. The impetus - COASP
  13. 13. How it came together The starting point Most publishers already deposit their citation data with Crossref The default state for the data is not open The challenge Could we persuade a group of influential publishers to release their data all at once?
  14. 14. Making the case It’s easy and doesn’t cost anything All you need to do is to send an email to Publishers also benefit Better discovery tools mean that content will be found and used more The goal cannot be achieved alone A comprehensive network of all scholarship can only be achieved if data is pooled
  15. 15. Making it happen Agree a deadline Everyone has time to prepare their comms and to be part of a big splash Focus on publishers depositing the most data Contacted the top-20 publishers asking for agreement in principle and permission to share their decision Leverage the early adopters As soon as we had a few publishers on board, others quickly followed
  16. 16. Why are publishers joining I4OC? “If you’re not looking to monetize references in some way, why wouldn’t you?” “We believe there is great benefit in supporting sustainable and standardized infrastructure. Opening up our reference metadata cost us no more than the time required to write one simple email.” Liz Ferguson, Wiley “At Taylor & Francis we are working to make it as easy as possible for the communities we serve to achieve their open aims. I4OC sits well with this, and was a very quick and easy process to implement.” “Although we charge for metadata feeds, those are service- rather than content-based charges. We didn’t identify any commercial downside of supporting I4OC as we are highly unlikely to develop significant revenue streams from just our own references.”
  17. 17. “References have long been a path to serendipitous discovery. Making citation data open and machine readable will only accelerate that discovery process for researchers.” Why are publishers joining I4OC? “One of the key purposes of a publisher is to assist in the development of networks of scholarship to aid the cross fertilization of research. Freeing up the reference data is an extremely powerful way of doing that.” “One of the most exciting benefits is the potential to expose networks of research that might otherwise take years to discover.” “It will make our customers’ lives easier by helping data scientists to mine a large body of references in one go. Currently we see little threat to our business as this aligns perfectly with our aims to go beyond open access to research, by using open approaches and utilizing our own data to advance discovery.” Steven Inchcoombe, Springer Nature
  18. 18. The Initiative for Open Citations We built a coalition of major funders, technology platforms, open data organizations and publishers supporting the unrestricted availability of scholarly citation data. STAKEHOLDERS OF THE INITIATIVE FOR OPEN CITATIONS •
  19. 19. Where we started ~1% of the Crossref citation data is open
  20. 20. Where we are now >50% of the citation data is open
  21. 21. What we can do now We can start to use the data
  22. 22. ACKNOWLEDGEMENT: SUSAN KENWRICK FOR HELP WITH THE JIGSAW Where we’d like to get to A public map of scholarship
  23. 23. LONDON UNDERGROUND MAP FROM 1908 (Public Domain) • Can also explore how the map of scholarship has evolved
  24. 24. One year on… The fraction of open citation data has surpassed 50% The number of participating publishers has risen to 490. There are over 500 million references now openly available. There are almost 50 stakeholder organisations who have joined I4OC to help advocate and promote reuse of open citations. The initiative has attracted commentary and media coverage across the world.
  25. 25. Of the top-20 biggest publishers with citation data, all but five now make these data open via Crossref. Three represent Scholarly Societies…
  26. 26. Crossref was founded to enable a shared reciprocal linking and metadata exchange, removing the need for bilateral agreements between publishers and other service providers.
  27. 27. Extracting data via the Crossref API ~41% Crossref records have citation data ~47% of those have public citation data ACKNOWLEDGEMENT: DANIEL ECER, DATA SCIENTIST, ELIFE • Exploring the data from Crossref
  28. 28. >1billion citations 49% are open 53% have DOIs (and can be linked to another record) Some cleanup required Exploring the data from Crossref ACKNOWLEDGEMENT: DANIEL ECER, DATA SCIENTIST, ELIFE •
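The coverage figures on the two slides above come from exploring Crossref records. As a rough illustration, here is a minimal Python sketch of how such counts might be pulled from the Crossref REST API. It assumes the public `/works` endpoint, the `rows=0` parameter to return totals only, and the `has-references` filter; the helper names (`count_url`, `total_results`, `fetch_total`, `share`) are hypothetical and not taken from the talk.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Crossref REST API endpoint for works (assumed here, not named in the slides).
API = "https://api.crossref.org/works"


def count_url(filter_expr=None):
    """Build a URL that asks only for the number of matching records (rows=0)."""
    params = {"rows": 0}
    if filter_expr:
        params["filter"] = filter_expr
    return API + "?" + urlencode(params)


def total_results(payload):
    """Read the record count out of a decoded Crossref JSON response."""
    return payload["message"]["total-results"]


def fetch_total(filter_expr=None):
    """Perform the HTTP request and return the count (needs network access)."""
    with urlopen(count_url(filter_expr)) as resp:
        return total_results(json.load(resp))


def share(part, whole):
    """Fraction of `whole` represented by `part`; 0.0 if `whole` is zero."""
    return part / whole if whole else 0.0
```

With network access, `share(fetch_total("has-references:true"), fetch_total())` would give the fraction of records that carry reference lists, which is the kind of figure quoted on the slide. The exact values will differ from those reported in 2018.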
  29. 29. Why do we need open citations? The ability to undertake large-scale and generalizable bibliometric research … is limited to a few well-funded centers that can afford to pay for full access to the raw data of Web of Science or Scopus. …scientometricians need a data source that is freely available and comprehensive. This is a matter of scientific integrity, scientific progress, and equity. Scientometrics is widely used to support science policy and research evaluation, with consequences for the entire scientific community. There is a need for specialized organizations, both commercial and non-commercial, that offer scientometric services. To guarantee full transparency and reproducibility of scientometric analyses, these analyses need to be based on open data sources. …advocating for open references is critical to ensure replicable and equitable research practices. We should use our relationships with journals—as authors, reviewers, and editorial board members—to advocate for openness and should expect scientometric journals to be leaders in this respect. “References are a product of scholarly work and represent the backbone of science—demonstrating the origin and advancement of knowledge—and provide essential information for studying science and making decisions about the future of research. References are generated by the academic community and should be freely available to this community.” Dec 2017
  30. 30. Who cares about measuring research impact? Researchers (authors and readers) Publishers Funders The public Policy Makers Institutions
  31. 31. Impact factors mask huge variation in citations - if you use it you are dishonest and statistically illiterate @Stephen_Curry #COASP 2015 COASP7 ‘Research and researcher evaluation’ (2015), Stephen Curry (Imperial College London) – available soon from OASPA website
  32. 32. The Acta Crystallographica Section A effect. The plot shows that this journal had a JIF of 2.051 in 2008 which jumped to 49.926 in 2009 due to a single highly-cited paper. Did every other paper in this journal suddenly get amazingly awesome and highly-cited for this period? Of course not. Steve Royle. “Wrong Number: A Closer Look at Impact Factors.” Quantixed, May 2015.
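The jump described on the slide above follows from the impact factor being, in essence, an average: one extreme value can move a mean a long way while leaving the typical paper untouched. The Python sketch below illustrates this with invented citation counts; the numbers are hypothetical and are not the journal's real data.

```python
from statistics import mean, median

# Hypothetical citation counts for 100 ordinary papers (not real journal data).
ordinary = [1, 2, 2, 3, 4] * 20

# The same set after adding one exceptionally highly cited paper.
with_outlier = ordinary + [5000]


def summarize(counts):
    """Return (mean, median) citations per paper."""
    return mean(counts), median(counts)


if __name__ == "__main__":
    for label, counts in [("without outlier", ordinary),
                          ("with outlier", with_outlier)]:
        avg, mid = summarize(counts)
        print(f"{label}: mean={avg:.2f}, median={mid}")
```

In this toy data the mean rises sharply once the outlier is added, while the median does not move. That is the sense in which an average-based indicator says little about any individual paper.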
  33. 33. Imperfect Impact Stuart Cantrill January 23, 2016 Imperfect impact Chemical connections
  34. 34. Citation Bias CC BY NC Steven A Greenberg BMJ 2009;339:bmj.b2680 How citation distortions create unfounded authority: analysis of a citation network • Citations to papers supporting rationale for overproduction of β amyloid precursor protein mRNA as a valid model of inclusion body myositis. • The supportive papers received 94% of the 214 citations to these primary data, whereas the six papers containing data that weakened or refuted the claim received only 6% of these citations
  35. 35. Fig 1. Citation distributions of 11 different science journals. Citations are to ‘citable documents’ as classified by Thomson Reuters, which include standard research articles and reviews. The distributions contain citations accumulated in 2015 to citable documents published in 2013 and 2014 in order to be comparable to the 2015 JIFs published by Thomson Reuters. To facilitate direct comparison, distributions are plotted with the same range of citations (0-100) in each plot; articles with more than 100 citations are shown as a single bar at the right of each plot. [Figure: 11 histogram panels, one per journal (eLife, EMBO J., J. Informetrics, Nature, Nature Comm., PLOS Biol., PLOS Genet., PLOS ONE, Proc. R. Soc. B, Science, Sci. Rep.); x-axis: number of citations (0 to 100+); y-axis: number of papers.] A simple proposal for the publication of journal citation distributions. Vincent Larivière, Véronique Kiermer, Catriona J. MacCallum, Marcia McNutt, Mark Patterson, Bernd Pulverer, Sowmya Swaminathan, Stuart Taylor, Stephen Curry. Published in bioRxiv, 2016: CC BY
  36. 36. Can Scientists Assess Merit or Predict Impact? Analysed subjective rankings of papers from two different data sets over five years • Faculty of 1000 • Wellcome Trust (data from Allen et al. of 2 assessor rankings within 6 months of publication) • In relation to citations and impact factor Eyre-Walker A, Stoletzki N (2013) The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations. PLoS Biol 11(10): e1001675. doi:10.1371/journal.pbio.1001675
  37. 37. Subjective assessments of science are poor: Very weak correlation between assessors Strongly biased by the journal in which the paper was published Number of citations or the impact factor exaggerates differences between papers Scientists are also poor at predicting the future impact: Because they are not good at assessing merit Similar articles accumulate citations essentially by chance. “What this paper shows is that whatever merit might be, scientists can't be doing a good job of evaluating it when they rank the importance or quality of papers. From the (lack of) correlation among assessor scores, most of the variation in ranking has to be due to ‘error’ rather than actual quality differences.” Carl Bergstrom , 2013 Eisen JA, MacCallum CJ, Neylon C (2013) Expert Failure: Re-evaluating Research Assessment. PLoS Biol 11(10): e1001677. doi:10.1371/journal.pbio.1001677
  38. 38. What is Quality? Context dependent Discipline Stage of your career Different levels Individual Project Institutional (rankings…) National and International Cannot be distilled into a single number or proxy Multi-variate Metrics need to be qualitative as well as quantitative
  39. 39. Nicolas Raymond CC BY ‘qualities…’
  40. 40. not ‘quality’
  41. 41. DORA
  42. 42. References are data… Data about the network of information Between scholars, fields and science & society A source with which to validate a scholarly work Data sharing is on the agenda… OECD EU Open Science AGU Enabling Fair Data Project Belmont Forum NIH, NSF RDA, CoData, FORCE11 & many others Data citation is a prerequisite as a first class research object e.g. DataCite DOIs in the reference list… References are data One of the most expertly curated sources of scholarly recommendations…
  43. 43. We need to apply the scientific method to the process of scholarly communication itself
  44. 44. Open Science? Open Science = Open Infrastructure+Open Outputs Culture (change) X Access, reuse & discoverability Evaluation & Researcher behaviour How Jeff Rouder @JeffRouder What is Open Science? It is endeavoring to preserve the rights of others to reach independent conclusions about your data and work. 8:47 PM - 5 Dec 2017 Why
  45. 45. most of the data needed to support Open Science is controlled by commercial companies, both big and small. This growing reliance on a handful of companies to provide proprietary analytics and decision tools for research funders and universities poses serious risks for the future Open Source • prevents monopolistic control • requires an active community of users and service providers to develop and maintain infrastructure Open Data • metadata about the research process itself, such as funding data, publication and citation data, and “altmetrics” data Open Integrations • standard metadata formats and open APIs Open Contracts • completely open (public) and no lock-in (e.g. Non- Disclosure Agreements, multi-year contract terms, and privately negotiated prices)
  46. 46. PARTIAL CITATION GRAPH FOR ULRICH K. LAEMMLI (1970) • How data from the I4OC is being reused? The Wikidata Citation Graph 36 million citation links using the cites (P2860) property in Wikidata
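The citation links mentioned on the slide above are stored in Wikidata as statements using the cites (P2860) property, so they can be explored through a SPARQL query. The Python sketch below builds such a query for the Wikidata Query Service. The endpoint URL and the `wdt:`/`wd:` prefixes are standard for that service; the function names and the item identifier in the usage note are placeholders chosen for illustration only.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def citing_works_query(qid, limit=10):
    """Return a SPARQL query listing items that cite the work `qid` via P2860."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError("expected a Wikidata item identifier such as 'Q1'")
    return (
        f"SELECT ?citing WHERE {{ ?citing wdt:P2860 wd:{qid} . }} "
        f"LIMIT {int(limit)}"
    )


def query_url(qid, limit=10):
    """Return a GET URL that would run the query and ask for JSON results."""
    params = {"query": citing_works_query(qid, limit), "format": "json"}
    return ENDPOINT + "?" + urlencode(params)
```

For example, `query_url("Q1", limit=5)` yields a URL that could be fetched with any HTTP client. `Q1` is only a placeholder here; to reproduce a graph like the one on the slide you would substitute the identifier of the paper of interest.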
  47. 47. How data from the I4OC is being reused? Tools to create profiles Scholia uses data from Wikidata sourced from Crossref and other Metadata providers PROFILE INFORMATION FOR EGON WILLIGHAGEN •
  48. 48. How data from the I4OC is being reused? Integration of cited by data by ScienceOpen SEARCH RESULTS FROM SCIENCEOPEN SHOWING CITED-BY DATA •
  49. 49. How data from the I4OC is being reused? The Open Citations Corpus A broad and open collection of citation information from many sources David Shotton and Silvio Peroni PROGRESS OF THE INITIATIVE FOR OPEN CITATIONS •
  50. 50. Towards a fully open scholarly graph “The visualization shows a structure of science that is well known from earlier large-scale bibliometric visualizations, which were based on Web of Science or Scopus data.” VISUALIZING FREELY AVAILABLE CITATION DATA USING VOSVIEWER •
  51. 51.
  52. 52. The Initiative for Open Citations • I4OC Making tens of millions of machine-readable citation metadata openly available to everyone, with no copyright restriction. PROGRESS OF THE INITIATIVE FOR OPEN CITATIONS •
  53. 53. The road to 100% CROSSREF MEMBERS WITH OPEN REFERENCES • open-references/ A list of all Crossref members with open references and statistics on their open reference coverage
  54. 54. Getting involved If you are a publisher and deposit references, email A CALL TO ACTION TO THE I4OC STAKEHOLDERS •
  55. 55. I am a scholarly publisher already depositing references to Crossref. How do I publicly release them? If you are already submitting article metadata to Crossref as a participant in their Cited-by service: 1. Contact Crossref support directly by e-mail, asking them to turn on reference distribution for all of the relevant DOI prefixes, OR 2. Set the <reference_distribution_opt> metadata element to "any" for each DOI deposit for which you want to make references openly available.
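Option 2 on the slide above can be scripted as part of a deposit workflow. The Python sketch below shows one way this might look, assuming that `reference_distribution_opt` is carried as an attribute on the content item (here `journal_article`) in the deposit XML. The sample fragment is deliberately simplified, has no namespace, and is not a schema-valid deposit; check the Crossref deposit schema before relying on it. The function name `open_references` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, non-namespaced fragment for illustration only (not a valid deposit).
SAMPLE = """<journal>
  <journal_article publication_type="full_text">
    <titles><title>Example article</title></titles>
  </journal_article>
</journal>"""


def open_references(xml_text, tag="journal_article"):
    """Set reference_distribution_opt="any" on every `tag` element.

    Assumes the option is an attribute of the content item; real deposits
    use a namespaced schema, so the tag lookup would need adjusting.
    """
    root = ET.fromstring(xml_text)
    for item in root.iter(tag):
        item.set("reference_distribution_opt", "any")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(open_references(SAMPLE))
```

For most publishers the e-mail route in option 1 is simpler, since it covers all relevant DOI prefixes at once without touching individual deposits.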
  56. 56. How you can help (1) • Publishers who aren't making their references public yet - send an email to Crossref before the end of the month requesting them to make your references open. It's that simple! • Publishers who don't yet deposit references with Crossref - contact Crossref to find out how to do this. • Editors and editorial board members – if references in your journal are not yet made public - contact your publisher and request this. Use this list to see whether your publisher is already making references open. • Funders, institutions, companies, researchers, and all other users of open citation data - write a short piece about your work and the benefits of open citation data for the I4OC website. Please contact
  57. 57. How you can help (2) • If you have a story about open citation data and why they matter to your organization and community, share a link, tag it as #OpenCitationsMonth. We’ll retweet it to our followers. • Please keep an eye on the #OpenCitationsMonth tag, and help us to amplify the message.
  58. 58. Thank you C.J. MacCallum, M. Patterson, D. Taraborelli (2017) Setting your cites on open: what it is, why it matters and how you can help. UKSG 2018 [CC BY 4.0]* Acknowledgments The I4OC founders: OpenCitations, Wikimedia Foundation, PLOS, eLife, DataCite, the Center for Culture and Technology at Curtin University. The I4OC instigators: Jonathan Dugan, Martin Fenner, Jan Gerlach, Catriona MacCallum, Daniel Mietchen, Cameron Neylon, Mark Patterson, Michelle Paulson, Silvio Peroni, David Shotton, Dario Taraborelli The I4OC stakeholders ( and participating publishers (