The Initiative for Open Citations and the OpenCitations Corpus

University of Bologna

Slides of David Shotton's presentation at OASPA 2017, 20 September 2017, Lisbon, Portugal. These slides describe the Initiative for Open Citations (I4OC), a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data, and OpenCitations, a small infrastructure organization that hosts and develops the OpenCitations Corpus (OCC), a Linked Open Data repository of scholarly bibliographic citation data.

Oxford e-Research Centre
University of Oxford, UK
9th Conference on
Open Access
Scholarly Publishing
Lisbon, Portugal
20 Sept 2017
© David Shotton 2017. Published under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence
david.shotton@opencitations.net
David Shotton
The Initiative for Open Citations
and the OpenCitations Corpus
2013 “Free scholarly citation data!”
Donatello’s
John the Baptist
Fifth Conference on
Open Access
Scholarly Publishing
Riga, Latvia
20 September 2013
. . . the voice of one
crying in the wilderness
2016 “Release open citation data!”
Eighth Conference on
Open Access
Scholarly Publishing
Virginia, USA
20 September 2016
Dario Taraborelli
Head of Research,
Wikimedia Foundation
2017 The year of success - citation data is freed!
■  Two fantastic success stories
   ▪  The Initiative for Open Citations https://i4oc.org/
   ▪  The OpenCitations Corpus http://opencitations.net
■  While related, these initiatives are separate and distinct
■  Two Italian heroes: Dario Taraborelli and Silvio Peroni
Crossref - providing the fundamental infrastructure
https://www.crossref.org/
■  Crossref is the registration agency of Digital Object Identifiers (DOIs) for
   scholarly publications (journal articles). Most publishers are members
■  Crossref holds metadata about articles, made available via its REST API
   https://www.crossref.org/services/metadata-delivery/rest-api/
■  Crossref has its own heroes:
   Ed Pentz, Executive Director; Geoff Bilder, Director of Strategic Initiatives
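As a rough illustration of what "metadata made available via the REST API" can look like in practice, the sketch below builds the URL for a single work and pulls the cited DOIs out of its reference list. It assumes that a work record is fetched from `https://api.crossref.org/works/{DOI}` and that open reference lists appear under a `reference` field whose entries may carry a `DOI`; the helper names and the sample record (including its DOIs) are made up for this example, and no network call is made.

```python
from urllib.parse import quote

# Assumed base URL of the public Crossref REST API "works" route.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def work_url(doi):
    """Build the API URL for one DOI (the slash inside the DOI is kept)."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def cited_dois(work_message):
    """Return the lower-cased DOIs found in a work record's reference list.

    References without a DOI are skipped. A record whose references are
    not open has no 'reference' field here, so the result is empty.
    """
    refs = work_message.get("reference", [])
    return [ref["DOI"].lower() for ref in refs if "DOI" in ref]


# Hypothetical record shaped like the "message" object of an API response.
sample = {
    "DOI": "10.1234/example.1",
    "reference": [
        {"key": "ref1", "DOI": "10.1234/EXAMPLE.2"},
        {"key": "ref2", "unstructured": "A reference without a DOI."},
        {"key": "ref3", "DOI": "10.1234/example.3"},
    ],
}

print(work_url(sample["DOI"]))
print(cited_dois(sample))
```

Keeping the parsing separate from the HTTP request means the same function can be applied to records obtained by any route, for example a bulk download rather than individual API calls.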
The Initiative for Open Citations
■  The Initiative for Open Citations is a collaboration between scholarly publishers,
   researchers, and other interested parties to promote the unrestricted availability
   of scholarly citation data. It does not host citation data!
■  Launched April 6, 2017. Web site: https://i4oc.org
■  Spearheaded by Dario Taraborelli of the Wikimedia Foundation
   ▪  with help from Jonathan Dugan, Martin Fenner, Jan Gerlach,
      Catriona MacCallum, Daniel Mietchen, Cameron Neylon,
      Mark Patterson, Michelle Paulson, Silvio Peroni and myself
■  Six founding organizations:
   ▪  The Wikimedia Foundation, PLOS, eLife, DataCite, OpenCitations,
      and the Centre for Culture and Technology at Curtin University
■  Within a short space of time, I4OC has persuaded most of the major scholarly
   publishers to make their reference lists open, so that the proportion of all
   references submitted to Crossref that are now open has risen from 1% to
   over 45%!

 
PROSTHETIC FEET description and its types
PROSTHETIC FEET description and its typesPROSTHETIC FEET description and its types
PROSTHETIC FEET description and its types
 
A review of volcanic electrification of the atmosphere and volcanic lightning
A review of volcanic electrification of the atmosphere and volcanic lightningA review of volcanic electrification of the atmosphere and volcanic lightning
A review of volcanic electrification of the atmosphere and volcanic lightning
 
ALL the evidence webinar: Appraising and using evidence about community conte...
ALL the evidence webinar: Appraising and using evidence about community conte...ALL the evidence webinar: Appraising and using evidence about community conte...
ALL the evidence webinar: Appraising and using evidence about community conte...
 
Advancing CAM Assay Image Analysis Using Deep Learning Software
Advancing CAM Assay Image Analysis Using Deep Learning SoftwareAdvancing CAM Assay Image Analysis Using Deep Learning Software
Advancing CAM Assay Image Analysis Using Deep Learning Software
 
An Introduction to Quantum Programming Languages
An Introduction to Quantum Programming LanguagesAn Introduction to Quantum Programming Languages
An Introduction to Quantum Programming Languages
 
Agroecology as an approach to design sustainable Food Systems
Agroecology as an approach to design sustainable Food SystemsAgroecology as an approach to design sustainable Food Systems
Agroecology as an approach to design sustainable Food Systems
 
transgenics_17b.pptx
transgenics_17b.pptxtransgenics_17b.pptx
transgenics_17b.pptx
 

The Initiative for Open Citations and the OpenCitations Corpus

  • 1. Oxford e-Research Centre University of Oxford, UK 9th Conference on Open Access Scholarly Publishing Lisbon, Portugal 20 Sept 2017 © David Shotton 2017 Published under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence david.shotton@opencitations.net David Shotton The Initiative for Open Citations and the OpenCitations Corpus
  • 2. 2013 “Free scholarly citation data!” Donatello’s John the Baptist Fifth Conference on Open Access Scholarly Publishing Riga, Latvia 20 September 2013 . . . the voice of one crying in the wilderness
  • 3. 2016 “Release open citation data!” Eighth Conference on Open Access Scholarly Publishing Virginia, USA 20 September 2016 Dario Taraborelli Head of Research, Wikimedia Foundation
  • 4. 2017: The year of success - citation data is freed!
      • Two fantastic success stories
          • The Initiative for Open Citations https://i4oc.org/
          • The OpenCitations Corpus http://opencitations.net
      • While related, these initiatives are separate and distinct
      • Two Italian heroes: Dario Taraborelli and Silvio Peroni
  • 5. Crossref - providing the fundamental infrastructure https://www.crossref.org/
      • Crossref is the registration agency for Digital Object Identifiers (DOIs) for scholarly publications (journal articles). Most publishers are members
      • Crossref holds metadata about articles, made available via its REST API https://www.crossref.org/services/metadata-delivery/rest-api/
      • Crossref has its own heroes: Ed Pentz (Executive Director) and Geoff Bilder (Director of Strategic Initiatives)
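To illustrate the kind of metadata the Crossref REST API exposes, here is a minimal sketch that builds the request URL for one work and extracts the cited DOIs from a response. The `https://api.crossref.org/works/{doi}` route and the `message` / `reference` / `DOI` layout are assumptions based on the public API rather than anything shown on the slide, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed public endpoint of the Crossref REST API (not given on the slide).
API_ROOT = "https://api.crossref.org/works/"


def work_url(doi):
    """Return the API URL for one DOI, escaping the slash inside the DOI."""
    return API_ROOT + quote(doi, safe="")


def cited_dois(work_json):
    """List the DOIs in a work's reference list.

    The list is empty when the record carries no references, which is
    what a client would see if the publisher has not opened them.
    """
    refs = work_json.get("message", {}).get("reference", [])
    return [ref["DOI"].lower() for ref in refs if "DOI" in ref]


def fetch_work(doi):
    """Fetch one work record over the network (not exercised here)."""
    with urlopen(work_url(doi)) as resp:
        return json.load(resp)
```

Calling `cited_dois(fetch_work("10.1007/s11892-016-0752-4"))` would return whatever cited DOIs Crossref currently holds for that article. No output is shown because it depends on the live service.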
  • 6. The Initiative for Open Citations
      • The Initiative for Open Citations (I4OC) is a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data. It does not host citation data!
      • Launched April 6, 2017. Web site: https://i4oc.org
      • Spearheaded by Dario Taraborelli of the Wikimedia Foundation
          • with help from Jonathan Dugan, Martin Fenner, Jan Gerlach, Catriona MacCallum, Daniel Mietchen, Cameron Neylon, Mark Patterson, Michelle Paulson, Silvio Peroni and myself
      • Six founding organizations:
          • The Wikimedia Foundation, PLOS, eLife, DataCite, OpenCitations, and the Centre for Culture and Technology at Curtin University
      • Within a short space of time, I4OC has persuaded most of the major scholarly publishers to make their reference lists open, so that the proportion of all references submitted to Crossref that are now open has risen from 1% to over 45%!
  • 7. Publishers supporting I4OC and opening their references
      • 49 scholarly publishers have opened their references, including the following major ones:
      • Commercial publishers
          • Association for Computing Machinery, BMJ, De Gruyter, eLife, EMBO Press, Hindawi, IOS Press, PeerJ, Pensoft Publishers, Portland Press, Public Library of Science, Springer Nature, Taylor & Francis, Wiley
      • University and scholarly presses
          • Cambridge University Press, Cold Spring Harbor Laboratory Press, Company of Biologists, Edinburgh University Press, MIT Press, Rockefeller University Press
      • Learned societies
          • American Association for the Advancement of Science (AAAS), American Physical Society, American Society for Cell Biology, International Union of Crystallography, Proceedings of the National Academy of Sciences (PNAS), Royal Society of Chemistry, The Royal Society
  • 8. Organizations and institutions who have endorsed I4OC
      • Funders
          • Sloan Foundation, Bill and Melinda Gates Foundation, Jisc, Simons Foundations Science Sandbox, Wellcome Trust
      • Research organizations
          • Allen Institute for Artificial Intelligence, Microsoft Research
      • Libraries
          • Association of Research Libraries, British Library, California Digital Library, Harvard Library Office for Scholarly Communication, LIBER, Max Planck Digital Library
      • Bibliographic / bibliometric organizations
          • Altmetrics, CiteSeerX, DBLP Computer Science Bibliography, ImpactStory, Zotero
      • Other organizations
          • Dryad Data Repository, Figshare, Internet Archive, Mozilla, OASPA, Open Knowledge International, OpenAIRE, ScienceOPEN, Wiki Education Foundation, Wikimedia Deutschland, Wikimedia UK
  • 9. I4OC – what’s left to do
      • Almost 50% of Crossref-deposited references, from ~16 million articles, are now open, leaving about half that are still closed
      • Crossref has over 7000 members, and it’s the long tail of smaller publisher-members that are not presently opening their references
      • This includes a large number of Open Access publishers!
          • Just because an article is published as Open Access and its references are available on the publisher’s web site, this is not sufficient for the bulk harvesting and analysis of citation data
          • Imagine the effort of going to each site in turn and scraping reference lists presented in a wide variety of differing formats and DTD markups!
      • Many small scholarly publishers are not even members of Crossref
      • But help is at hand:
          • OASPA has a sponsored agreement with Crossref whereby its smaller members can join Crossref via OASPA, with OASPA covering the cost of a proportion of their DOIs
  • 10. How to open references using the Crossref Cited-by service
      • The Crossref Cited-by service is a free service that helps publishers find out who is citing their articles
      • Publishers submit article reference lists to Crossref along with other metadata
      • However, the Crossref default is that these reference lists are closed, not OPEN!
      • To open their article reference lists, a publisher needs to do one of two things:
          • Either contact support@crossref.org and ask them to turn on reference distribution for all the DOI prefixes they manage
          • Or, in the article metadata they submit to Crossref, set the reference_distribution_opts attribute to “any” for each DOI deposit where they want to make references openly available
      • It’s that easy!!!
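The second option above can be sketched as a small script that marks every article element in a deposit as open. This assumes the flag is an attribute named `reference_distribution_opts` on the article element, as in Crossref's deposit schema. The input fragment is a hypothetical, heavily simplified deposit, and the sketch ignores the XML namespaces a real deposit file carries.

```python
import xml.etree.ElementTree as ET


def open_references(deposit_xml, tag="journal_article"):
    """Set reference_distribution_opts="any" on every matching element.

    `tag` is the (un-namespaced) element that describes one DOI deposit;
    a real Crossref deposit would need namespace-aware matching.
    """
    root = ET.fromstring(deposit_xml)
    for element in root.iter(tag):
        element.set("reference_distribution_opts", "any")
    return ET.tostring(root, encoding="unicode")


# Hypothetical, simplified deposit fragment for illustration only.
FRAGMENT = '<journal><journal_article publication_type="full_text" /></journal>'
print(open_references(FRAGMENT))
```

A publisher with many prefixes would more likely use the first option and ask Crossref support to switch on distribution in bulk.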
  • 11. ZooKeys use of Crossref open citation data
  • 12. The OpenCitations Corpus
      • OpenCitations (http://opencitations.net) is a small infrastructure organization directed by myself and Silvio Peroni
      • Its primary purpose is to host and develop the OpenCitations Corpus (OCC), a Linked Open Data repository of scholarly bibliographic citation data
      • A founding member of I4OC, it is distinct and separate from that initiative
      • The first OCC prototype was created at Oxford in 2011 with Jisc funding – see my 2013 COASP talk in Riga (http://zeeba.tv/the-open-citations-corpus/)
      • A new instance of the OCC, based on our revised metadata schema, was created by Silvio Peroni and is now running at the University of Bologna
      • It has been ingesting scholarly references continuously since early July 2016
      • OCC now provides the largest RDF collection of open citation data on the Web
          • Currently holds references from ~240,000 citing bibliographic resources
          • Provides >10 million citation links to over 5.5 million cited resources
          • These data are freely available under a CC0 public domain waiver
  • 13. Source data - reference lists from PubMed Central
      • At present, the ingested reference lists are obtained by processing the XML sources of papers in the Open Access subset of PubMed Central
      • These are parsed to yield authors, titles, journal names, etc.
          • We ask for the most recent papers first
          • Thus, as citing papers, the OCC mainly includes articles published in 2016 and 2017
      • The identifiers of all the citing papers already processed are stored locally, so as not to request the same XML source twice
      • We then call several external APIs, including Crossref and ORCID, to obtain additional metadata describing the citing and cited papers and their authors
      • There are almost 1.7 million OA articles available in PubMed Central
          • So far we have harvested 14% . . .
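The "do not request the same XML source twice" step could be sketched as a set of already-processed identifiers that is persisted between runs. This is only an illustration: the function names and the JSON store file are hypothetical, and the actual OpenCitations ingestion code on GitHub may work quite differently.

```python
import json
from pathlib import Path


def load_seen(store):
    """Load the identifiers processed in earlier runs (empty on first run)."""
    store = Path(store)
    return set(json.loads(store.read_text())) if store.exists() else set()


def select_new(candidates, seen):
    """Return the candidates not yet processed, preserving their order.

    `candidates` is assumed to arrive most-recent-first, as on the slide.
    Each selected id is added to `seen` straight away, so duplicates
    within the same batch are skipped too.
    """
    fresh = []
    for pmcid in candidates:
        if pmcid not in seen:
            seen.add(pmcid)
            fresh.append(pmcid)
    return fresh


def save_seen(store, seen):
    """Persist the processed identifiers for the next run."""
    Path(store).write_text(json.dumps(sorted(seen)))
```

As a rough consistency check on the figures, 14% of 1.7 million articles is about 238,000, which is in line with the ~240,000 citing resources reported for the corpus.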
  • 14. The raw reference list data
      • The reference lists extracted from citing papers are made available in JSON. The top-level fields hold the citing paper's metadata and identifiers; each entry under "references" is a reference in the citing paper's reference list, with its own ids:
{
  "doi": "10.1007/s11892-016-0752-4",
  "pmid": "27168063",
  "pmcid": "PMC4863913",
  "localid": "MED-27168063",
  "curator": "BEE EuropeanPubMedCentralProcessor",
  "source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4863913/fullTextXML",
  "source_provider": "Europe PubMed Central",
  "references": [
    ...
    {
      "bibentry": "Chang, KY, Unanue, ER. Prediction of HLA-DQ8beta cell peptidome using a computational program and its relationship to autoreactive T cells, Int Immunol, 2009, 21, 6, 705, 13, DOI: 10.1093/intimm/dxp039, PMID: 19461125",
      "pmid": "19461125",
      "doi": "10.1093/intimm/dxp039",
      "pmcid": "PMC2686615",
      "process_entry": "True"
    },
    ...
  ]
}
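A record of this shape can be turned into citation links (citing id, cited id) with a few lines of code. The sketch below uses the field names from the slide, but the record is shortened, the second reference is a made-up entry with no identifiers, and the helper names are hypothetical.

```python
import json

# Shortened copy of the slide's record; the bibentry text is abbreviated and
# the second reference is invented to show the "no identifier" case.
RECORD = json.loads("""
{
  "doi": "10.1007/s11892-016-0752-4",
  "pmid": "27168063",
  "pmcid": "PMC4863913",
  "references": [
    {"bibentry": "Chang, KY, Unanue, ER. (abbreviated)",
     "pmid": "19461125",
     "doi": "10.1093/intimm/dxp039",
     "pmcid": "PMC2686615",
     "process_entry": "True"},
    {"bibentry": "A hypothetical reference without identifiers",
     "process_entry": "True"}
  ]
}
""")

# Order of preference when an entry carries several identifiers.
ID_FIELDS = ("doi", "pmid", "pmcid")


def best_id(entry):
    """Return the preferred identifier as 'scheme:value', or None."""
    for field in ID_FIELDS:
        if entry.get(field):
            return f"{field}:{entry[field]}"
    return None


def citation_links(record):
    """Yield one (citing, cited) pair per reference that has an identifier."""
    citing = best_id(record)
    links = []
    for ref in record.get("references", []):
        cited = best_id(ref)
        if cited is not None:
            links.append((citing, cited))
    return links


print(citation_links(RECORD))
```

References without any identifier are skipped here. A fuller pipeline would instead try to resolve them from the `bibentry` text via an external lookup.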
  • 15. The SPAR (Semantic Publishing and Referencing) Ontologies http://www.sparontologies.net/
      • OCC data are then stored in RDF (JSON-LD) using the SPAR ontologies and other standard vocabularies
      • These SPAR ontologies include:
          • FaBiO, the FRBR-aligned Bibliographic Ontology - an ontology for describing bibliographic entities (books, articles, etc.)
          • CiTO, the Citation Typing Ontology - enables the characterization of citations, both factually and rhetorically
          • BiRO, the Bibliographic Reference Ontology - an ontology to define bibliographic records and references, and their compilation into bibliographic collections and reference lists, respectively
  • 16. Availability of the OpenCitations Corpus data
      • All the OpenCitations software is available on GitHub under an open license
      • The data in the OpenCitations Corpus are available in three different ways:
          • Direct access to bibliographic resources by means of their HTTP URIs (via content negotiation), e.g. https://w3id.org/oc/corpus/br/1
          • Queries to our SPARQL endpoint: https://w3id.org/oc/sparql
          • Monthly dumps stored in Figshare: http://opencitations.net/download
      • Currently the OCC uses a good graph-based triplestore, Blazegraph
      • However, the virtual machine that hosts it is very limited in resources, causing performance problems for demanding SPARQL queries
      • We plan soon to commission a new powerful physical server that should provide a better user experience, and to develop additional user-friendly interfaces for accessing the OCC data, including graphic visualizations of citation networks
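The first two access routes can be sketched as plain HTTP requests. The endpoint and resource URI are the ones quoted on the slide (they may have changed since 2017), and the `cito:` prefix is the published CiTO namespace. The choice of `cito:cites` as the linking property, the JSON-LD media type, and the helper names are assumptions made for illustration. The sketch only builds the requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "https://w3id.org/oc/sparql"  # as quoted on the slide


def cites_query(resource_uri, limit=10):
    """Build a SPARQL query listing resources cited by `resource_uri`."""
    return (
        "PREFIX cito: <http://purl.org/spar/cito/>\n"
        f"SELECT ?cited WHERE {{ <{resource_uri}> cito:cites ?cited }} "
        f"LIMIT {int(limit)}"
    )


def sparql_request(query):
    """Prepare a SPARQL-protocol GET request asking for JSON results."""
    url = SPARQL_ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def resource_request(resource_uri, mime="application/ld+json"):
    """Prepare a content-negotiated request for one bibliographic resource."""
    return Request(resource_uri, headers={"Accept": mime})


request = sparql_request(cites_query("https://w3id.org/oc/corpus/br/1"))
print(request.full_url)
```

Passing either request object to `urllib.request.urlopen` would perform the call. Given the performance limits noted above, small `LIMIT` values are the safer choice for exploratory queries.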
  • 17. Use of the OpenCitations web site
      • Accesses to the OpenCitations web site and services: the “corpus” and “sparql” pages together account for 89% of all accesses, showing that people mainly visit the site to explore and use the data in the OpenCitations Corpus
  • 18. Use of OpenCitations data stored on Figshare
  • 19. What happened this summer?
      • Use of the OpenCitations social accounts increased markedly following the launch of the Initiative for Open Citations
          • Twitter - https://twitter.com/opencitations
          • Wordpress Blog – https://opencitations.wordpress.com/
  • 20. Who is using OpenCitations, and for what?
      • Organizations and projects that we know use OpenCitations resources include:
          • Wikidata - pulling citation data to enrich their pages
          • OpenAIRE – using OCC bibliographic resources info in OpenAIRE
          • LOC-DB - have adopted the OpenCitations data model for their database
          • Tomas Petricek of the Turing Institute - extending his Gamma Project visualization software to handle OpenCitations’ RDF data
          • Ontotext.com - combining Springer's SciGraph data with OpenCitations data using SPARQL federation
          • Anna Kamińska of the Polish Librarians Association - undertaking citation network analysis of PLoS One research papers using data in the OCC
      • We can’t know who else is using OpenCitations resources unless they tell us!
          • Please let us know if you are!
      • On 10th September, Crossref blogged about our use of their REST API
          • https://www.crossref.org/blog/using-the-crossref-rest-api.-part-5-with-opencitations/
  • 21. Present status of OpenCitations
      • We have recently received a small grant from the Sloan Foundation for the OpenCitations Enhancement Project
          • This provides one year’s salary for a postdoc to develop new user interfaces, and new hardware to enhance the OCC performance
      • We have just appointed Ivan Heibi to work on the OCC with Silvio in Bologna
      • Silvio and Ivan will be commissioning the new hardware next month
          • This will use parallel processing to increase ingest rate 30-fold
      • We are in the process of appointing an International Advisory Board to guide the growth of OpenCitations
  • 22. Enhancing the OpenCitations ingestion rate
      • OpenCitations currently ingests ~8 million new citations per year
      • With 30 Raspberry Pis working in parallel as ingest machines, we anticipate that this rate will increase to ~240 million new citations per year
      • By the end of 2018, OpenCitations should hold ~250 million citations, compared to Web of Knowledge’s ~1.25 billion
      • Even this partial coverage will include citations of all important papers, these critical papers being easily recognized because they are highly cited, forming nodes in the citation graph with a large number of inward citation links
      • A further five-fold increase in ingest rate - significant but achievable with additional hardware (and funding!) - will enable us to reach parity by 2020
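The arithmetic behind these projections can be checked with a short calculation. It is a back-of-the-envelope sketch that assumes ingest scales linearly with the number of machines and that the Web of Knowledge total stays fixed at ~1.25 billion; neither assumption is stated on the slide.

```python
CURRENT_RATE = 8_000_000          # citations ingested per year today
PARALLEL_MACHINES = 30            # Raspberry Pis working in parallel
FURTHER_SPEEDUP = 5               # the additional five-fold increase
HOLDINGS_END_2018 = 250_000_000   # projected OCC holdings, end of 2018
WOK_CITATIONS = 1_250_000_000     # Web of Knowledge, treated as static

# Linear-scaling assumption: 30 machines give 30 times the rate.
enhanced_rate = CURRENT_RATE * PARALLEL_MACHINES
boosted_rate = enhanced_rate * FURTHER_SPEEDUP


def years_to_reach(target, holdings, rate):
    """Years needed to grow from `holdings` to `target` at a constant rate."""
    return max(target - holdings, 0) / rate


print(f"enhanced rate: {enhanced_rate:,} citations/year")
print(f"boosted rate:  {boosted_rate:,} citations/year")
print(f"years to parity at enhanced rate: "
      f"{years_to_reach(WOK_CITATIONS, HOLDINGS_END_2018, enhanced_rate):.1f}")
print(f"years to parity at boosted rate:  "
      f"{years_to_reach(WOK_CITATIONS, HOLDINGS_END_2018, boosted_rate):.1f}")
```

At 240 million citations per year the remaining gap of one billion would take a little over four years to close. At 1.2 billion per year it takes under one year from the end of 2018, which is where the 2020 parity estimate comes from.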
  • 23. Where will the references come from?
      • With the enhanced ingest rate, we will quickly consume all 1.7 million articles in the Open Access Subset of PubMed Central
      • We will then start harvesting the references from the ~16 million articles already made open at Crossref in response to the Initiative for Open Citations, and the additional articles that I4OC now encourages other publishers to open
      • Possible additional significant sources of open citation data include
          • ArXiv (1.3 million preprints)
          • CiteSeerX (>120 million references from >6 million documents)
          • CitEc (11 million references from a million Economics papers)
      • References from pre-digital publications extracted by text mining, e.g.
          • In the Social Sciences, from the LOC-DB at the University of Mannheim
          • In Biological Taxonomy, mined into BioStor by Rod Page from the Biodiversity Heritage Library, e.g. http://biostor.org/reference/105357
  • 24. We are winning the battle for open scholarship!
      • OpenCitations - Website: http://opencitations.net Email: contact@opencitations.net Twitter: @opencitations Blog: https://opencitations.wordpress.com
          • David Shotton david.shotton@opencitations.net
          • Silvio Peroni silvio.peroni@opencitations.net
      • I4OC - Website: https://i4oc.org/ Email: info@i4oc.org Twitter: @i4oc_org
          • Dario Taraborelli dtaraborelli@wikimedia.org
          • Mark Patterson m.patterson@elifesciences.org
          • Catriona MacCallum catriona.maccallum@hindawi.com