Oxford e-Research Centre
University of Oxford, UK
9th Conference on
Open Access
Scholarly Publishing
Lisbon, Portugal
20 Sept 2017
© David Shotton 2017 Published under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence
david.shotton@opencitations.net
David Shotton
The Initiative for Open Citations
and the OpenCitations Corpus
2013 “Free scholarly citation data!”
Donatello’s
John the Baptist
Fifth Conference on
Open Access
Scholarly Publishing
Riga, Latvia
20 September 2013
. . . the voice of one
crying in the wilderness
2016 “Release open citation data!”
Eighth Conference on
Open Access
Scholarly Publishing
Virginia, USA
20 September 2016
Dario Taraborelli
Head of Research,
Wikimedia Foundation
2017 The year of success - citation data is freed!
n  Two fantastic success stories
§  The Initiative for Open Citations https://i4oc.org/
§  The OpenCitations Corpus http://opencitations.net
n  While related, these initiatives are separate and distinct
n  Two Italian heroes: Dario Taraborelli and Silvio Peroni
Crossref - providing the fundamental infrastructure
https://www.crossref.org/
n  Crossref is the registration agency of Digital Object Identifiers (DOIs) for
scholarly publications (journal articles). Most publishers are members
n  Crossref holds metadata about articles, which it makes available via its REST API
https://www.crossref.org/services/metadata-delivery/rest-api/
n  Crossref has its own heroes:
Ed Pentz Executive Director Geoff Bilder Director of Strategic Initiatives
The Initiative for Open Citations
n  The Initiative for Open Citations is a collaboration between scholarly publishers,
researchers, and other interested parties to promote the unrestricted availability
of scholarly citation data. It does not itself host citation data!
n  Launched 6 April 2017. Website: https://i4oc.org
n  Spearheaded by Dario Taraborelli of the Wikimedia Foundation
§  with help from Jonathan Dugan, Martin Fenner, Jan Gerlach,
Catriona MacCallum, Daniel Mietchen, Cameron Neylon,
Mark Patterson, Michelle Paulson, Silvio Peroni and myself
n  Six founding organizations:
§  The Wikimedia Foundation, PLOS, eLife, DataCite, OpenCitations,
and the Centre for Culture and Technology at Curtin University
n  Within a short space of time, I4OC has persuaded most of the major scholarly
publishers to make their reference lists open, so that the proportion of all
references submitted to Crossref that are now open has risen from 1% to
over 45%!
Publishers supporting I4OC and opening their references
n  49 scholarly publishers have opened their references, including the following
major ones:
n  Commercial publishers
§  Association for Computing Machinery, BMJ, De Gruyter, eLife, EMBO
Press, Hindawi, IOS Press, PeerJ, Pensoft Publishers, Portland Press,
Public Library of Science, Springer Nature, Taylor & Francis, Wiley
n  University and scholarly presses
§  Cambridge University Press, Cold Spring Harbor Laboratory Press,
Company of Biologists, Edinburgh University Press, MIT Press,
Rockefeller University Press
n  Learned societies
§  American Association for the Advancement of Science (AAAS),
American Physical Society, American Society for Cell Biology,
International Union of Crystallography, Proceedings of the
National Academy of Sciences (PNAS), Royal Society of Chemistry,
The Royal Society
Organizations and institutions who have endorsed I4OC
n  Funders
§  Sloan Foundation, Bill and Melinda Gates Foundation, Jisc, Simons
Foundation's Science Sandbox, Wellcome Trust
n  Research organizations
§  Allen Institute for Artificial Intelligence, Microsoft Research
n  Libraries
§  Association of Research Libraries, British Library, California Digital
Library, Harvard Library Office for Scholarly Communication, LIBER,
Max Planck Digital Library
n  Bibliographic / bibliometric organizations
§  Altmetric, CiteSeerX, DBLP Computer Science Bibliography,
ImpactStory, Zotero
n  Other organizations
§  Dryad Data Repository, Figshare, Internet Archive, Mozilla, OASPA,
Open Knowledge International, OpenAIRE, ScienceOpen, Wiki Education
Foundation, Wikimedia Deutschland, Wikimedia UK
I4OC – what’s left to do
n  Almost 50% of Crossref-deposited references, from ~16 million articles, are
now open, leaving about half still closed
n  Crossref has over 7000 members, and it’s the long tail of smaller
publisher-members that are not presently opening their references
n  This includes a large number of Open Access publishers!
§  Publishing an article as Open Access, with its references available on
the publisher’s web site, is not sufficient for the bulk harvesting and
analysis of citation data
§  Imagine the effort of going to each site in turn and scraping reference lists
presented in a wide variety of differing formats and DTD markups!
n  Many small scholarly publishers are not even members of Crossref
n  But help is at hand:
§  OASPA has a sponsored agreement with Crossref whereby its smaller
members can join Crossref via OASPA, with OASPA covering the cost of
a proportion of their DOIs
How to open references using the Crossref Cited-by service
n  The Crossref Cited-by service is a free service that helps publishers find out who
is citing their articles
n  Publishers submit article reference lists to Crossref along with other metadata
n  However, the Crossref default is that these reference lists are closed, not OPEN!
n  To open their article reference lists, a publisher needs to do one of two things:
§  Either contact support@crossref.org and ask them to turn on reference
distribution for all the DOI prefixes they manage
§  Or, in the article metadata they submit to Crossref, set the
<reference_distribution_opt> element to “any” for each DOI deposit
where they want to make references openly available
n  It’s that easy!!!
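Once a publisher has done this, anyone can check whether a given article's references have become open. The Python sketch below is a minimal illustration using only the standard library. It assumes the public Crossref REST API route `https://api.crossref.org/works/{doi}`, and assumes, as was the case at the time of this talk, that the `reference` array of a work record is returned only when its references are open. The helper names `fetch_work`, `references_are_open` and `cited_dois` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str) -> dict:
    """Fetch the Crossref metadata record (the "message" object) for one DOI."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]


def references_are_open(work: dict) -> bool:
    """True if the record exposes its reference list.

    Assumption: closed reference lists are omitted from the public API
    response, even when "reference-count" is non-zero.
    """
    return bool(work.get("reference"))


def cited_dois(work: dict) -> list:
    """DOIs of the cited works, for references Crossref has matched to a DOI."""
    return [ref["DOI"] for ref in work.get("reference", []) if "DOI" in ref]
```

For example, `references_are_open(fetch_work("10.1007/s11892-016-0752-4"))` queries the live API; the two inspection helpers can also be run offline on a saved record.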
ZooKeys use of Crossref open citation data
The OpenCitations Corpus
n  OpenCitations (http://opencitations.net) is a small infrastructure organization
directed by myself and Silvio Peroni
n  Its primary purpose is to host and develop the OpenCitations Corpus (OCC),
a Linked Open Data repository of scholarly bibliographic citation data
n  A founding member of I4OC, it is distinct and separate from that initiative
n  The first OCC prototype was created at Oxford in 2011 with Jisc funding – see
my 2013 COASP talk in Riga (http://zeeba.tv/the-open-citations-corpus/)
n  A new instance of the OCC, based on our revised metadata schema, was
created by Silvio Peroni and is now running at the University of Bologna
n  It has been ingesting scholarly references continuously since early July 2016
n  OCC now provides the largest RDF collection of open citation data on the Web
§  Currently holds references from ~240,000 citing bibliographic resources
§  Provides >10 million citation links to over 5.5 million cited resources
§  These data are freely available under a CC0 public domain waiver
Source data - reference lists from PubMed Central
n  At present, the ingested reference lists are obtained by processing the XML
sources of papers in the Open Access subset of PubMed Central
n  These are parsed to yield authors, titles, journal names, etc.
§  We ask for the most recent papers first
§  Thus, as citing papers, the OCC mainly includes articles published in
2016 and 2017
n  The identifiers of all the citing papers already processed are stored locally, so
as not to request the same XML source twice
n  We then call several external APIs, including Crossref and ORCID, to obtain
additional metadata describing the citing and cited papers and their authors
n  There are almost 1.7 million articles in the Open Access Subset of PubMed Central
§  So far we have harvested about 14% of them
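The harvesting order and the local record of already-processed papers described above can be sketched in Python as follows. This is a hypothetical illustration, not the OpenCitations code: the JSON state file, the `Candidate` record and the `fetch_xml` callable are assumptions standing in for the real PubMed Central requests, and XML parsing is omitted.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    pmcid: str
    pub_date: str  # ISO date "YYYY-MM-DD", so string order equals date order


def load_processed(state_file: Path) -> set:
    """Identifiers of citing papers already processed (empty on the first run)."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()


def harvest(candidates: Iterable[Candidate],
            fetch_xml: Callable[[str], object],
            state_file: Path) -> List[str]:
    """Fetch unseen papers, most recent first; return the PMCIDs fetched."""
    processed = load_processed(state_file)
    fetched = []
    for cand in sorted(candidates, key=lambda c: c.pub_date, reverse=True):
        if cand.pmcid in processed:
            continue  # never request the same XML source twice
        fetch_xml(cand.pmcid)  # parsing of the returned XML is omitted here
        processed.add(cand.pmcid)
        fetched.append(cand.pmcid)
    # Persist the updated identifier set so the next run skips these papers
    state_file.write_text(json.dumps(sorted(processed)))
    return fetched
```

Keeping the processed identifiers in persistent storage, rather than in memory only, is what lets a restarted harvester resume without re-requesting papers.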
The raw reference list data
n  The reference lists extracted from citing papers are made available in JSON:
{
  "doi": "10.1007/s11892-016-0752-4",
  "pmid": "27168063",
  "pmcid": "PMC4863913",
  "localid": "MED-27168063",
  "curator": "BEE EuropeanPubMedCentralProcessor",
  "source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4863913/fullTextXML",
  "source_provider": "Europe PubMed Central",
  "references": [
    ...
    {
      "bibentry": "Chang, KY, Unanue, ER. Prediction of HLA-DQ8beta cell peptidome using a computational program and its relationship to autoreactive T cells, Int Immunol, 2009, 21, 6, 705, 13, DOI: 10.1093/intimm/dxp039, PMID: 19461125",
      "pmid": "19461125",
      "doi": "10.1093/intimm/dxp039",
      "pmcid": "PMC2686615",
      "process_entry": "True"
    },
    ...
  ]
}
The top-level fields give the citing paper's metadata and identifiers; each object in the "references" array is one reference in the citing paper's reference list, with its own identifiers.
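A minimal Python sketch of how a consumer might turn one of these raw records into citing-to-cited links, assuming the field names shown above. Preferring DOIs, then PMIDs, then PMCIDs as identifiers is an assumption for illustration, not documented OpenCitations behaviour. The sample record is a trimmed copy of the example, with the `...` elisions removed so that it parses.

```python
import json
from typing import List, Optional, Tuple

ID_FIELDS = ("doi", "pmid", "pmcid")  # assumed order of preference


def best_id(entry: dict) -> Optional[str]:
    """Return the first available identifier, prefixed with its scheme."""
    for field in ID_FIELDS:
        if entry.get(field):
            return f"{field}:{entry[field]}"
    return None


def citation_links(record: dict) -> List[Tuple[str, str]]:
    """(citing, cited) pairs for the references that carry an identifier."""
    citing = best_id(record)
    links = []
    for ref in record.get("references", []):
        cited = best_id(ref)
        if cited:  # references with only a bibentry string are skipped here
            links.append((citing, cited))
    return links


raw = """{
  "doi": "10.1007/s11892-016-0752-4",
  "pmid": "27168063",
  "pmcid": "PMC4863913",
  "references": [
    {"bibentry": "Chang, KY, Unanue, ER. Prediction of HLA-DQ8beta cell peptidome using a computational program and its relationship to autoreactive T cells, Int Immunol, 2009, 21, 6, 705, 13, DOI: 10.1093/intimm/dxp039, PMID: 19461125",
     "pmid": "19461125",
     "doi": "10.1093/intimm/dxp039",
     "pmcid": "PMC2686615",
     "process_entry": "True"}
  ]
}"""

print(citation_links(json.loads(raw)))
# prints [('doi:10.1007/s11892-016-0752-4', 'doi:10.1093/intimm/dxp039')]
```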
The SPAR (Semantic Publishing and Referencing) Ontologies
n  OCC data are then stored in RDF (JSON-LD) using the SPAR (Semantic
Publishing and Referencing) ontologies and other standard vocabularies
n  These SPAR ontologies include:
§  FaBiO, the FRBR-aligned Bibliographic Ontology - an ontology for
describing bibliographic entities (books, articles, etc.)
§  CiTO, the Citation Typing Ontology - enables the characterization of
citations, both factually and rhetorically
§  BiRO, the Bibliographic Reference Ontology - an ontology to define
bibliographic records and references, and their compilation into
bibliographic collections and reference lists, respectively
http://www.sparontologies.net/
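To illustrate how the FaBiO and CiTO vocabularies express a citation, here is a hedged Python sketch that builds a small JSON-LD document stating that one journal article cites another, using the FaBiO class `fabio:JournalArticle` and the CiTO property `cito:cites`. The DOI-based `@id` URIs and the overall shape are assumptions for illustration; the actual OCC data model, with its own corpus resource URIs and identifier entities, is richer.

```python
import json

CONTEXT = {
    "fabio": "http://purl.org/spar/fabio/",
    "cito": "http://purl.org/spar/cito/",
}


def citation_jsonld(citing_doi: str, cited_dois: list) -> dict:
    """A JSON-LD node for the citing article, with one cito:cites link per cited work."""
    return {
        "@context": CONTEXT,
        "@id": f"https://doi.org/{citing_doi}",
        "@type": "fabio:JournalArticle",
        "cito:cites": [{"@id": f"https://doi.org/{d}"} for d in cited_dois],
    }


doc = citation_jsonld("10.1007/s11892-016-0752-4", ["10.1093/intimm/dxp039"])
print(json.dumps(doc, indent=2))
```

Because `cito:cites` links two resource nodes, each citation becomes one RDF triple, which is what makes the data straightforward to query with SPARQL.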
Availability of the OpenCitations Corpus data
n  All the OpenCitations software is available on GitHub under an open license
n  The data in the OpenCitations Corpus are available in three different ways:
§  Direct access to bibliographic resources by means of their HTTP URIs
(via content negotiation), e.g. https://w3id.org/oc/corpus/br/1
§  Queries to our SPARQL endpoint: https://w3id.org/oc/sparql
§  Monthly dumps stored in Figshare: http://opencitations.net/download
n  Currently the OCC uses a good graph-based triplestore – Blazegraph
n  However, the virtual machine that hosts it is very limited in resources,
causing performance problems for demanding SPARQL queries
n  We plan shortly to commission a powerful new physical server that should
provide a better user experience, and to develop additional user-friendly
interfaces for accessing the OCC data, including graphical visualizations of
citation networks
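A Python sketch of the first two access routes, using only the standard library. It assumes that the URLs listed above still resolve (they may have moved since 2017), that corpus resources honour an `Accept: application/ld+json` header for content negotiation, and that the SPARQL endpoint accepts a GET request with a `query` parameter and can return `application/sparql-results+json`, as the SPARQL 1.1 Protocol specifies. The example query, which counts `cito:cites` links, is illustrative and may be slow on the current hardware; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://w3id.org/oc/sparql"

COUNT_CITATIONS = """
PREFIX cito: <http://purl.org/spar/cito/>
SELECT (COUNT(*) AS ?links) WHERE { ?citing cito:cites ?cited }
"""


def resource_request(uri: str) -> urllib.request.Request:
    """Ask for a corpus resource as JSON-LD via HTTP content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": "application/ld+json"})


def sparql_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that expects JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

For example, `run(resource_request("https://w3id.org/oc/corpus/br/1"))` would retrieve the first bibliographic resource as JSON-LD, and `run(sparql_request(COUNT_CITATIONS))["results"]["bindings"]` would return the count. Building the requests separately from sending them keeps the code testable without network access.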
Use of the OpenCitations web site
n  Accesses to the OpenCitations web site and services:
The “corpus” and “sparql” pages together account for 89% of all accesses, showing that
people mainly visit the site to explore and use the data in the OpenCitations Corpus
Use of OpenCitations data stored on Figshare
What happened this summer?
n  Use of the OpenCitations social accounts increased markedly following the
launch of the Initiative for Open Citations:
§  Twitter - https://twitter.com/opencitations
§  WordPress Blog – https://opencitations.wordpress.com/
Who is using OpenCitations, and for what?
n  Organizations and projects that we know use OpenCitations resources include:
§  Wikidata - pulling citation data to enrich their pages
§  OpenAIRE – using bibliographic resource information from the OCC in OpenAIRE
§  LOC-DB - has adopted the OpenCitations data model for its database
§  Tomas Petricek of the Alan Turing Institute - extending his Gamma Project
visualization software to handle OpenCitations’ RDF data
§  Ontotext.com - combining Springer's SciGraph data with OpenCitations
data using SPARQL federation
§  Anna Kamińska of the Polish Librarians Association - undertaking citation
network analysis of PLoS One research papers using data in the OCC
n  We can’t know who else is using OpenCitations resources unless they tell us!
§  Please let us know if you are!
n  On 10th September, Crossref blogged about our use of their REST API
§  https://www.crossref.org/blog/using-the-crossref-rest-api.-part-5-with-opencitations/
Present status of OpenCitations
n  We have recently received a small
grant from the Sloan Foundation for the
OpenCitations Enhancement Project
§  This provides one year’s salary
for a postdoc to develop new user
interfaces, and new hardware to
enhance the OCC performance
n  We have just appointed Ivan Heibi to
work on the OCC with Silvio in Bologna
n  Silvio and Ivan will be commissioning
the new hardware next month
§  This will use parallel processing
to increase the ingest rate 30-fold
n  We are in the process of appointing an
International Advisory Board to guide
the growth of OpenCitations
Enhancing the OpenCitations ingestion rate
n  OpenCitations currently ingests ~8 million new citations per year
n  With 30 Raspberry Pis working in parallel as ingest machines, we anticipate
that this rate will increase to ~240 million new citations per year
n  By the end of 2018, OpenCitations should hold ~250 million citations,
compared to Web of Knowledge’s ~1.25 billion
n  Even this partial coverage will include citations of all important papers.
Such critical papers are easily recognized because they are highly cited:
they form nodes in the citation graph with a large number of inward citation links
n  A further five-fold increase in ingest rate - significant but achievable with
additional hardware (and funding!) - will enable us to reach parity by 2020
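The projections above follow from simple arithmetic on the figures given on this slide; the short Python check below reproduces them. It treats each ingest rate as constant and ignores growth in Web of Knowledge itself, both simplifying assumptions.

```python
CURRENT_RATE = 8_000_000        # new citations per year at present (from the slide)
PARALLEL_SPEEDUP = 30           # 30 Raspberry Pis ingesting in parallel
FURTHER_SPEEDUP = 5             # proposed further five-fold increase
OCC_END_2018 = 250_000_000      # projected OCC holdings at the end of 2018
WOK_TOTAL = 1_250_000_000       # approximate Web of Knowledge holdings

enhanced_rate = CURRENT_RATE * PARALLEL_SPEEDUP   # 240 million per year
further_rate = enhanced_rate * FURTHER_SPEEDUP    # 1.2 billion per year

# Years needed after end-2018 to close the remaining gap at the further rate
years_to_parity = (WOK_TOTAL - OCC_END_2018) / further_rate

print(f"{enhanced_rate:,} -> {further_rate:,} citations/year; "
      f"parity gap closed in {years_to_parity:.2f} years")
# prints 240,000,000 -> 1,200,000,000 citations/year; parity gap closed in 0.83 years
```

On these assumptions, the remaining gap of about one billion citations would take under a year to close at 1.2 billion citations per year.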
Where will the references come from?
n  With the enhanced ingest rate, we will quickly consume all 1.7 million articles
in the Open Access Subset of PubMed Central
n  We will then start harvesting the references from the ~16 million articles
already made open at Crossref in response to the Initiative for Open Citations,
and the additional articles that I4OC now encourages other publishers to open
n  Possible additional significant sources of open citation data include
§  arXiv (1.3 million preprints)
§  CiteSeerX (>120 million references from >6 million documents)
§  CitEc (11 million references from a million Economics papers)
n  References from pre-digital publications extracted by text mining, e.g.
§  In the Social Sciences, from the LOC-DB at the University of Mannheim
§  In Biological Taxonomy, mined into BioStor by Rod Page from the
Biodiversity Heritage Library, e.g. http://biostor.org/reference/105357
We are winning the battle for open scholarship!
david.shotton@opencitations.net
David Shotton
Silvio Peroni
silvio.peroni@opencitations.net
Website: http://opencitations.net
Email: contact@opencitations.net
Twitter: @opencitations
Blog: https://opencitations.wordpress.com
Website: https://i4oc.org/
Email: info@i4oc.org
Twitter: @i4oc_org
dtaraborelli@wikimedia.org
Dario Taraborelli
Mark Patterson
m.patterson@elifesciences.org
Catriona MacCallum
catriona.maccallum@hindawi.com
A pattern-based ontology for describing publishing workflowsA pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflows
 
Semantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing togetherSemantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing together
 
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
 
Characterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experimentCharacterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experiment
 
Bringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersBringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointers
 
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
 
Towards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsTowards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citations
 
The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...
 
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
 
Embedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approachEmbedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approach
 
Dealing with Markup Semantics
Dealing with Markup SemanticsDealing with Markup Semantics
Dealing with Markup Semantics
 
Handling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWLHandling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWL
 

Recently uploaded

Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 

Recently uploaded (20)

Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 

The Initiative for Open Citations and the OpenCitations Corpus

  • 1. Oxford e-Research Centre University of Oxford, UK 9th Conference on Open Access Scholarly Publishing Lisbon, Portugal 20 Sept 2017 © David Shotton 2017 Published under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence david.shotton@opencitations.net David Shotton The Initiative for Open Citations and the OpenCitations Corpus
  • 2. 2013 “Free scholarly citation data!” Donatello’s John the Baptist Fifth Conference on Open Access Scholarly Publishing Riga, Latvia 20 September 2013 . . . the voice of one crying in the wilderness
  • 3. 2016 “Release open citation data!” Eighth Conference on Open Access Scholarly Publishing Virginia, USA 20 September 2016 Dario Taraborelli Head of Research, Wikimedia Foundation
  • 4. 2017 The year of success - citation data is freed! n  Two fantastic success stories §  The Initiative for Open Citations https://i4oc.org/ §  The OpenCitations Corpus http://opencitations.net n  While related, these initiatives are separate and distinct n  Two Italian heroes: Dario Taraborelli and Silvio Peroni
  • 5. Crossref - providing the fundamental infrastructure https://www.crossref.org/ n  Crossref is the registration agency of Digital Object Identifiers (DOIs) for scholarly publications (journal articles). Most publishers are members n  Crossref holds metadata about these articles, made available via its REST API https://www.crossref.org/services/metadata-delivery/rest-api/ n  Crossref has its own heroes: Ed Pentz (Executive Director) and Geoff Bilder (Director of Strategic Initiatives)
  • 6. The Initiative for Open Citations n  The Initiative for Open Citations is a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data. It does not host citation data! n  Launched April 6, 2017. Web site: https://i4oc.org n  Spearheaded by Dario Taraborelli of the Wikimedia Foundation §  with help from Jonathan Dugan, Martin Fenner, Jan Gerlach, Catriona MacCallum, Daniel Mietchen, Cameron Neylon, Mark Patterson, Michelle Paulson, Silvio Peroni and myself n  Six founding organizations: §  The Wikimedia Foundation, PLOS, eLife, DataCite, OpenCitations, and the Centre for Culture and Technology at Curtin University n  Within a short space of time, I4OC has persuaded most of the major scholarly publishers to make their reference lists open, so that the proportion of references deposited with Crossref that are openly available has risen from 1% to over 45%!
  • 7. Publishers supporting I4OC and opening their references n  49 scholarly publishers have opened their references, including the following major ones: n  Commercial publishers §  Association for Computing Machinery, BMJ, De Gruyter, eLife, EMBO Press, Hindawi, IOS Press, PeerJ, Pensoft Publishers, Portland Press, Public Library of Science, Springer Nature, Taylor & Francis, Wiley n  University and scholarly presses §  Cambridge University Press, Cold Spring Harbor Laboratory Press, Company of Biologists, Edinburgh University Press, MIT Press, Rockefeller University Press n  Learned societies §  American Association for the Advancement of Science (AAAS), American Physical Society, American Society for Cell Biology, International Union of Crystallography, Proceedings of the National Academy of Sciences (PNAS), Royal Society of Chemistry, The Royal Society
  • 8. Organizations and institutions that have endorsed I4OC n  Funders §  Sloan Foundation, Bill and Melinda Gates Foundation, Jisc, Simons Foundation's Science Sandbox, Wellcome Trust n  Research organizations §  Allen Institute for Artificial Intelligence, Microsoft Research n  Libraries §  Association of Research Libraries, British Library, California Digital Library, Harvard Library Office for Scholarly Communication, LIBER, Max Planck Digital Library n  Bibliographic / bibliometric organizations §  Altmetrics, CiteSeerX, DBLP Computer Science Bibliography, ImpactStory, Zotero n  Other organizations §  Dryad Data Repository, Figshare, Internet Archive, Mozilla, OASPA, Open Knowledge International, OpenAIRE, ScienceOpen, Wiki Education Foundation, Wikimedia Deutschland, Wikimedia UK
  • 9. I4OC – what’s left to do n  Almost 50% of Crossref-deposited references, from ~16 million articles, are now open, leaving about half that are still closed n  Crossref has over 7000 members, and it’s the long tail of smaller publisher-members that are not presently opening their references n  This includes a large number of Open Access publishers! §  Just because an article is published as Open Access and its references are available on the publisher’s web site, this is not sufficient for the bulk harvesting and analysis of citation data §  Imagine the effort of going to each site in turn and scraping reference lists presented in a wide variety of differing formats and DTD markups! n  Many small scholarly publishers are not even members of Crossref n  But help is at hand: §  OASPA has a sponsored agreement with Crossref whereby its smaller members can join Crossref via OASPA, with OASPA covering the cost of a proportion of their DOIs
  • 10. How to open references using the Crossref Cited-by service n  The Crossref Cited-by service is a free service that helps publishers find out who is citing their articles n  Publishers submit article reference lists to Crossref along with other metadata n  However, by default Crossref keeps these reference lists closed, not OPEN! n  To open their article reference lists, a publisher needs to do one of two things: §  Either contact support@crossref.org and ask them to turn on reference distribution for all the DOI prefixes they manage §  Or, in the article metadata they submit to Crossref, set the <reference_distribution_opt> option to “any” for each DOI deposit whose references they want to make openly available n  It’s that easy!!!
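Once a publisher has opened its references, anyone can check the result in the Crossref REST API. The sketch below is a minimal, hedged illustration: it assumes that an openly distributed reference list appears as a `reference` array in the `/works/{doi}` record, while a closed one shows only a `reference-count`. Check this against the current Crossref documentation. The function names and the placeholder `mailto` address are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str, mailto: str = "you@example.org") -> dict:
    """Return the 'message' part of the Crossref REST API record for a DOI."""
    # mailto identifies the caller, as Crossref requests for its "polite" pool;
    # the default address is a placeholder, so replace it with your own.
    query = urllib.parse.urlencode({"mailto": mailto})
    url = CROSSREF_WORKS + urllib.parse.quote(doi) + "?" + query
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]


def references_are_open(message: dict) -> bool:
    """True if the record exposes a non-empty 'reference' list."""
    # Assumption: when references are not distributed openly, the record
    # carries only a 'reference-count', with no 'reference' array.
    return bool(message.get("reference"))


def cited_dois(message: dict) -> list:
    """DOIs of cited works, for those references that carry one."""
    return [ref["DOI"] for ref in message.get("reference", []) if "DOI" in ref]
```

For example, `references_are_open(fetch_work("10.1007/s11892-016-0752-4"))` asks whether that article's reference list is currently open. This needs network access, and the answer depends on the publisher's present settings.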
  • 11. ZooKeys use of Crossref open citation data
  • 12. The OpenCitations Corpus n  OpenCitations (http://opencitations.net) is a small infrastructure organization directed by myself and Silvio Peroni n  Its primary purpose is to host and develop the OpenCitations Corpus (OCC), a Linked Open Data repository of scholarly bibliographic citation data n  A founding member of I4OC, it is distinct and separate from that initiative n  The first OCC prototype was created at Oxford in 2011 with Jisc funding – see my 2013 COASP talk in Riga (http://zeeba.tv/the-open-citations-corpus/) n  A new instance of the OCC, based on our revised metadata schema, was created by Silvio Peroni and is now running at the University of Bologna n  It has been ingesting scholarly references continuously since early July 2016 n  OCC now provides the largest RDF collection of open citation data on the Web §  Currently holds references from ~240,000 citing bibliographic resources §  Provides >10 million citation links to over 5.5 million cited resources §  These data are freely available under a CC0 public domain waiver
  • 13. Source data - reference lists from PubMed Central n  At present, the ingested reference lists are obtained by processing the XML sources of papers in the Open Access subset of PubMed Central n  These are parsed to yield authors, titles, journal names, etc. §  We request the most recent papers first §  Thus the citing papers in the OCC are mainly articles published in 2016 and 2017 n  The identifiers of all citing papers already processed are stored locally, so that the same XML source is never requested twice n  We then call several external APIs, including Crossref and ORCID, to obtain additional metadata describing the citing and cited papers and their authors n  There are almost 1.7 million articles in the Open Access subset of PubMed Central §  So far we have harvested 14% . . .
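The harvesting strategy described above has two parts: request the newest papers first, and skip any paper whose identifier is already recorded locally. The following is a minimal sketch of that strategy. It is not the OpenCitations code, which is on GitHub. The JSON-file store, the `(PMCID, date)` candidate shape and all names are hypothetical, and `process` stands in for fetching the XML, parsing references and calling Crossref/ORCID.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Tuple

Candidate = Tuple[str, str]  # hypothetical shape: (PMCID, ISO-8601 publication date)


class ProcessedStore:
    """Persists the PMCIDs of citing papers that have already been processed."""

    def __init__(self, path: Path):
        self.path = path
        # Load previously processed identifiers, if the file exists.
        self.seen = set(json.loads(path.read_text())) if path.exists() else set()

    def add(self, pmcid: str) -> None:
        self.seen.add(pmcid)
        # Rewrites the whole file: adequate for a sketch, not for millions of IDs.
        self.path.write_text(json.dumps(sorted(self.seen)))


def select_new_papers(candidates: Iterable[Candidate], store: ProcessedStore) -> list:
    """Unprocessed candidates, most recent first (ISO dates sort lexically)."""
    fresh = [c for c in candidates if c[0] not in store.seen]
    return sorted(fresh, key=lambda c: c[1], reverse=True)


def harvest(candidates: Iterable[Candidate], store: ProcessedStore,
            process: Callable[[str], None]) -> None:
    """Process each new paper once, recording it only after processing succeeds."""
    for pmcid, _date in select_new_papers(candidates, store):
        process(pmcid)  # e.g. fetch XML, extract references, enrich via external APIs
        store.add(pmcid)
```

The paper is recorded only after `process` returns. If processing fails partway, that paper is retried on the next run rather than being silently skipped.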
  • 14. The raw reference list data n  The reference lists extracted from citing papers are made available in JSON:
  {
    "doi": "10.1007/s11892-016-0752-4",
    "pmid": "27168063",
    "pmcid": "PMC4863913",
    "localid": "MED-27168063",
    "curator": "BEE EuropeanPubMedCentralProcessor",
    "source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4863913/fullTextXML",
    "source_provider": "Europe PubMed Central",
    "references": [
      ...
      {
        "bibentry": "Chang, KY, Unanue, ER. Prediction of HLA-DQ8beta cell peptidome using a computational program and its relationship to autoreactive T cells, Int Immunol, 2009, 21, 6, 705, 13, DOI: 10.1093/intimm/dxp039, PMID: 19461125",
        "pmid": "19461125",
        "doi": "10.1093/intimm/dxp039",
        "pmcid": "PMC2686615",
        "process_entry": "True"
      },
      ...
    ]
  }
  The top-level fields hold the citing paper's metadata and identifiers; each object in "references" is one entry in the citing paper's reference list, with its own identifiers.
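Turning a record like this into citation links means pairing the citing paper's identifier with each reference's identifier. The sketch below is a minimal illustration using the slide's example with the elided `...` entries left out. The DOI-then-PMID preference and the `doi:`/`pmid:` prefixes are our own assumptions, not the OCC's conventions.

```python
import json

# The slide's example record, with the elided entries ("...") omitted so it parses.
RAW = """
{
  "doi": "10.1007/s11892-016-0752-4",
  "pmid": "27168063",
  "pmcid": "PMC4863913",
  "references": [
    {"pmid": "19461125", "doi": "10.1093/intimm/dxp039", "pmcid": "PMC2686615"}
  ]
}
"""


def best_id(entry: dict):
    """Preferred identifier for a paper: DOI if present, else PMID, else None."""
    if entry.get("doi"):
        return "doi:" + entry["doi"]
    if entry.get("pmid"):
        return "pmid:" + entry["pmid"]
    return None  # e.g. a reference known only by its bibentry text


def citation_links(record: dict) -> list:
    """(citing, cited) identifier pairs for every reference with an identifier."""
    citing = best_id(record)
    links = []
    for ref in record.get("references", []):
        cited = best_id(ref)
        if citing and cited:
            links.append((citing, cited))
    return links


links = citation_links(json.loads(RAW))
```

References with no identifier at all are skipped here. A real pipeline would instead try to match their `bibentry` text, for example against Crossref.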
  • 15. The SPAR (Semantic Publishing and Referencing) Ontologies n  OCC data are then stored in RDF (JSON-LD) using the SPAR (Semantic Publishing and Referencing) ontologies (http://www.sparontologies.net/) and other standard vocabularies n  These SPAR ontologies include: §  FaBiO, the FRBR-aligned Bibliographic Ontology - an ontology for describing bibliographic entities (books, articles, etc.) §  CiTO, the Citation Typing Ontology - enables the characterization of citations, both factually and rhetorically §  BiRO, the Bibliographic Reference Ontology - an ontology to define bibliographic records and references, and their compilation into bibliographic collections and reference lists, respectively
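To show how FaBiO and CiTO fit together, here is a simplified JSON-LD description of one citation: a FaBiO journal article linked to a cited work with CiTO's `cites` property. This is an illustrative sketch, not the OCC data model. The OCC uses its own resource IRIs (such as https://w3id.org/oc/corpus/br/1) and richer structures, including BiRO for references. The function name and the use of doi.org IRIs are assumptions.

```python
import json

# Namespace IRIs for FaBiO and CiTO, two of the SPAR ontologies.
CONTEXT = {
    "fabio": "http://purl.org/spar/fabio/",
    "cito": "http://purl.org/spar/cito/",
}


def citation_jsonld(citing_iri: str, cited_iris: list) -> dict:
    """JSON-LD node for a journal article (FaBiO) and the works it cites (CiTO)."""
    return {
        "@context": CONTEXT,
        "@id": citing_iri,
        "@type": "fabio:JournalArticle",
        # cito:cites is CiTO's generic citation relation; CiTO also offers more
        # specific properties that characterise why a work is cited.
        "cito:cites": [{"@id": iri} for iri in cited_iris],
    }


doc = citation_jsonld(
    "https://doi.org/10.1007/s11892-016-0752-4",
    ["https://doi.org/10.1093/intimm/dxp039"],
)
print(json.dumps(doc, indent=2))
```

Wrapping each cited IRI as `{"@id": ...}` tells a JSON-LD processor that the value is a resource reference rather than a plain string.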
  • 16. Availability of the OpenCitations Corpus data n  All the OpenCitations software is available on GitHub under an open license n  The data in the OpenCitations Corpus are available in three different ways: §  Direct access to bibliographic resources by means of their HTTP URIs (via content negotiation), e.g. https://w3id.org/oc/corpus/br/1 §  Queries to our SPARQL endpoint: https://w3id.org/oc/sparql §  Monthly dumps stored in Figshare: http://opencitations.net/download n  The OCC currently uses Blazegraph, a capable graph-based triplestore n  However, the virtual machine that hosts it has very limited resources, causing performance problems for demanding SPARQL queries n  We plan soon to commission a powerful new physical server that should provide a better user experience, and to develop additional user-friendly interfaces for accessing the OCC data, including graphical visualizations of citation networks
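A query to the SPARQL endpoint is an ordinary HTTP GET with a `query` parameter. The sketch below asks for the works citing a given resource and parses the standard W3C SPARQL JSON results format. It makes two assumptions. First, citations are modelled with `cito:cites` directly between bibliographic resources. Second, the 2017 endpoint URL is still valid, although it may since have moved. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://w3id.org/oc/sparql"  # endpoint as given on the slide (2017)

QUERY_TEMPLATE = """PREFIX cito: <http://purl.org/spar/cito/>
SELECT ?citing WHERE {{ ?citing cito:cites <{cited}> }}"""


def build_request(cited_iri: str) -> urllib.request.Request:
    """GET request for works citing cited_iri, asking for SPARQL JSON results."""
    query = QUERY_TEMPLATE.format(cited=cited_iri)
    url = SPARQL_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def bindings(results: dict, var: str) -> list:
    """Values bound to one variable in a SPARQL JSON results document."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


def fetch_citing(cited_iri: str) -> list:
    """Run the query against the live endpoint (requires network access)."""
    with urllib.request.urlopen(build_request(cited_iri), timeout=60) as resp:
        return bindings(json.load(resp), "citing")
```

A simple single-pattern lookup like `fetch_citing("https://w3id.org/oc/corpus/br/1")` should be gentle on a resource-limited triplestore, unlike the demanding queries mentioned above.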
  • 17. Use of the OpenCitations web site n  Accesses to the OpenCitations web site and services: the “corpus” and “sparql” pages together account for 89% of all accesses, showing that people mainly visit OpenCitations to explore and use the data in the Corpus
  • 18. Use of OpenCitations data stored on Figshare
  • 19. What happened this summer? n  Use of the OpenCitations social accounts (Twitter: https://twitter.com/opencitations; WordPress blog: https://opencitations.wordpress.com/) increased markedly following the launch of the Initiative for Open Citations
  • 20. Who is using OpenCitations, and for what? n  Organizations and projects that we know use OpenCitations resources include: §  Wikidata - pulling citation data to enrich their pages §  OpenAIRE - using OCC bibliographic resource information in OpenAIRE §  LOC-DB - has adopted the OpenCitations data model for its database §  Tomas Petricek of the Alan Turing Institute - extending his Gamma Project visualization software to handle OpenCitations’ RDF data §  Ontotext.com - combining Springer's SciGraph data with OpenCitations data using SPARQL federation §  Anna Kamińska of the Polish Librarians Association - undertaking citation network analysis of PLOS ONE research papers using data in the OCC n  We can’t know who else is using OpenCitations resources unless they tell us! §  Please let us know if you are! n  On 10th September, Crossref blogged about our use of their REST API §  https://www.crossref.org/blog/using-the-crossref-rest-api.-part-5-with-opencitations/
  • 21. Present status of OpenCitations n  We have recently received a small grant from the Sloan Foundation for the OpenCitations Enhancement Project §  This funds one year’s salary for a postdoc to develop new user interfaces, and new hardware to improve the OCC’s performance n  We have just appointed Ivan Heibi to work on the OCC with Silvio in Bologna n  Silvio and Ivan will be commissioning the new hardware next month §  This will use parallel processing to increase the ingest rate 30-fold n  We are in the process of appointing an International Advisory Board to guide the growth of OpenCitations
  • 22. Enhancing the OpenCitations ingestion rate n  OpenCitations currently ingests ~8 million new citations per year n  With 30 Raspberry Pis working in parallel as ingest machines, we anticipate that this rate will increase to ~240 million new citations per year n  By the end of 2018, OpenCitations should hold ~250 million citations, compared with Web of Knowledge’s ~1.25 billion n  Even this partial coverage will include citations of all important papers, since these critical papers are easily recognized: they are highly cited, forming nodes in the citation graph with a large number of inward citation links n  A further five-fold increase in ingest rate - significant but achievable with additional hardware (and funding!) - will enable us to reach parity by 2020
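The projected rates on this slide follow from simple multiplication. This assumes throughput scales linearly with the number of ingest machines, which the slide implies but does not demonstrate:

```latex
\begin{aligned}
R_{\text{now}}     &\approx 8\times 10^{6}\ \text{citations/year} \\
R_{\text{30 Pis}}  &= 30 \times R_{\text{now}} \approx 2.4\times 10^{8}\ \text{citations/year} \\
R_{\text{further}} &= 5 \times R_{\text{30 Pis}} \approx 1.2\times 10^{9}\ \text{citations/year}
\end{aligned}
```

At the last of these rates, about one year of ingest is comparable in size to the ~1.25 billion citations quoted for Web of Knowledge. That is what makes parity by 2020 plausible.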
  • 23. Where will the references come from? n  With the enhanced ingest rate, we will quickly consume all 1.7 million articles in the Open Access Subset of PubMed Central n  We will then start harvesting the references from the ~16 million articles already made open at Crossref in response to the Initiative for Open Citations, and the additional articles that I4OC now encourages other publishers to open n  Possible additional significant sources of open citation data include §  arXiv (1.3 million preprints) §  CiteSeerX (>120 million references from >6 million documents) §  CitEc (11 million references from a million Economics papers) n  References from pre-digital publications extracted by text mining, e.g. §  In the Social Sciences, from the LOC-DB at the University of Mannheim §  In Biological Taxonomy, mined into BioStor by Rod Page from the Biodiversity Heritage Library, e.g. http://biostor.org/reference/105357
  • 24. We are winning the battle for open scholarship! david.shotton@opencitations.net David Shotton Silvio Peroni silvio.peroni@opencitations.net Website: http://opencitations.net Email: contact@opencitations.net Twitter: @opencitations Blog: https://opencitations.wordpress.com Website: https://i4oc.org/ Email: info@i4oc.org Twitter: @i4oc_org dtaraborelli@wikimedia.org Dario Taraborelli Mark Patterson m.patterson@elifesciences.org Catriona MacCallum catriona.maccallum@hindawi.com