The nature.com
ontologies portal
nature.com/ontologies
Tony Hammond, Michele Pasin
Who we are
We are both part of Macmillan Science and Education*
- Macmillan S&E is a global STM publisher
- Tony Hammond is Data Architect, Technology
@tonyhammond
- Michele Pasin is Information Architect, Product Office
@lambdaman
* We merged earlier this year (May 2015) with Springer Science+Business Media
to become Springer Nature. We are currently integrating our businesses.
Macmillan: science and education brands
We publish a lot of science! (1845-2015)
http://www.nature.com/developers/hacks/articles/by-year
1.2 million articles in total
Why we’re here today: to ask some questions
We have been making semantic data available in RDF models for a number of
years through our data.nature.com portal (2012–2015)
Big questions:
- Is this data of any use to the Linked Science community?
- Should Springer Nature continue to invest in LOD sharing?
More specifically:
- Does the data contain enough items of interest? [Content]
- Are the vocabularies understandable and useful? [Structure]
- Are the data easy to get and to reuse? [Accessibility]
- Is dereference / download / query the preferred option?
Our goals and rationale
- Semantic technologies are a promising way to do enterprise metadata
management at web scale
- Initially used primarily for data publishing / sharing (data.nature.com, 2011)
- Since 2013, a core component of our digital publishing workflow (see ISWC14 paper)
- Contributing to an emerging web of linked science data
- As a major publisher since 1845, ideally positioned to bootstrap a science ‘publications hub’
- Building on the fundamental ties that exist between the actual research works and the
publications that tell the story about them
The vision of a science graph
Zooming into the science graph
Implementing this vision
- Step 1: Linked Data Platform (2012–2014)
- datasets
- downloads + SPARQL endpoints (streaming, non-streaming)
- linked data dereference
- Step 2: Ontologies Portal (2015–)
- datasets + models (core, domain)
- downloads
- extensive documentation
The Ontologies Portal
www.nature.com/ontologies
Architecture
The core ontology
- Language: OWL 2, Profile: ALCHI(D)
- Entities: ~50 classes, ~140 properties
- Principles: Incremental Formation / Enterprise Integration / Model Coherence
http://www.nature.com/ontologies/core/
The core ontology: mappings
Core classes: :Thing, :Asset, :Publication, :Concept, :Event, :Subject, :Type, :Agent, :ArticleType, :PublishingEvent, :AggregationEvent, :Component, :Document, :Serial
Mapped external classes:
- cidoc-crm:Information_Carrier, cidoc-crm:Conceptual_Object
- dbpedia:Agent, dc:Agent, dcterms:Agent, cidoc-crm:Agent, vcard:Agent, foaf:Agent
- event:Event, bibo:Event, schema:Event, cidoc-crm:TemporalEntity
- cidoc-crm:Type, vcard:Type
- fabio:SubjectTerm
- bibo:Document, cidoc-crm:Document, foaf:Document
- bibo:Periodical, fabio:Periodical, schema:Periodical
- bibo:DocumentPart, fabio:Expression, cidoc-crm:InformationObject
Legend: = owl:equivalentClass
http://www.nature.com/ontologies/linksets/core/
Domain models
Domain models: subjects
- Structure: SKOS, multi-hierarchical tree, 6 branches, 7 levels of depth
- Entities: ~2500 concepts
- Mappings: 100% of terms, using skos:broadMatch or skos:closeMatch
www.nature.com/ontologies/models/subjects/
http://www.nature.com/developers/hacks/#1
Subject ontology visualizations
Domain models: mappings
Article Types
Subjects
Journals
Relations
http://www.nature.com/ontologies/linksets
Datasets
- Articles: 25m records (for 1.2m articles) with metadata such as title and publication, but not authors
- Contributors: 11m records (for 2.7m contributors) i.e. the article’s authors, structured and ordered
but not disambiguated
- Citations: 218m records (for 9.3m citations) – from an earlier release
Datasets: articles-wikipedia links
How: data extracted using the Wikipedia search API; 51,309 links over 145 years
Quality: only ~900 links point to nature.com without a DOI; the rest use DOIs correctly
Encoding: cito:isCitedBy => wiki URL, foaf:topic => DBpedia URI
http://www.nature.com/developers/hacks/wikilinks
Data publishing: sources
Sources:
Ontologies (small scale; RDF native)
- mastered as RDF data (Turtle)
- managed in GitHub
- in-memory RDF models built using Apache Jena
- models augmented at build time using SPIN rules
- deployed to MarkLogic as RDF/XML for query
- exported as RDF dataset (Turtle) and as CSV
Documents (large scale; XML native)
- mastered as XML data
- managed in MarkLogic XML database
- data mined from XML documents (1.2m articles) using Scala
- in-memory RDF models built using Apache Jena
- injected as RDF/XML sections into XML documents for query
- exported as RDF dataset (N-Quads)
Organization:
Named graphs – one graph per class
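As a rough illustration of the "one graph per class" organization combined with the N-Quads export, here is a minimal Python sketch. The graph naming scheme, the example.org IRIs and both function names are assumptions made for illustration; they are not taken from the portal itself.

```python
def nquad(subject, predicate, obj, graph, literal=False):
    """Format one N-Quads statement; `obj` is an IRI unless `literal` is true."""
    if literal:
        # Escape the characters that cannot appear raw in an N-Quads literal.
        escaped = (
            obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} <{graph}> ."


def graph_for_class(class_name):
    """Map a class name to a graph IRI (hypothetical naming scheme)."""
    return f"http://example.org/graphs/{class_name.lower()}s"


if __name__ == "__main__":
    rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    print(
        nquad(
            "http://example.org/journals/1",
            rdf_type,
            "http://example.org/Journal",
            graph_for_class("Journal"),
        )
    )
```

Keeping the class-to-graph mapping in one function means every statement about an instance of a class lands in the same graph, which is what makes per-class downloads straightforward.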
Data publishing: workflows
Data publishing: rules (basic inference)
# Derive typed start/end date values for each journal (excerpt)
construct {
  ?s npg:publicationStartYear ?xds1 .
  ?s npg:publicationStartYearMonth ?xds2 .
  ?s npg:publicationStartDate ?xds3 .
  ?s npg:publicationEndYear ?xde1 .
  ?s npg:publicationEndYearMonth ?xde2 .
  ?s npg:publicationEndDate ?xde3 .
}
where {
  ?s a npg:Journal .
  optional { ?s npg:dateStart ?dateStart }
  optional { ?s npg:dateEnd ?dateEnd }
  {
    # Year only, e.g. "1869"
    bind (if(regex(?dateStart, "^\\d{4}"), substr(?dateStart,1,4), "") as ?ds1)
    bind (xsd:gYear(?ds1) as ?xds1)
  } union {
    # Year and month, e.g. "1869-11"
    bind (if(regex(?dateStart, "^\\d{4}-\\d{2}"), substr(?dateStart,1,7), "") as ?ds2)
    bind (xsd:gYearMonth(?ds2) as ?xds2)
  } union {
    # Full date, e.g. "1869-11-04"
    bind (if(regex(?dateStart, "^\\d{4}-\\d{2}-\\d{2}$"), substr(?dateStart,1,10), "") as ?ds3)
    bind (xsd:date(?ds3) as ?xds3)
  } union {
    …
  }
  filter (?xds1 != "" || ?xds2 != "" || ?xds3 != "" || ?xde1 != "" || ?xde2 != "" || ?xde3 != "")
}
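The date-splitting logic in the rule above can be tried outside a triple store. The following is a minimal Python sketch of the same idea: test a date string against three patterns and keep the year, year-month and full-date prefixes that match. The function name, the dictionary keys and the sample values are assumptions for illustration only and do not come from the production rules.

```python
import re

# Patterns mirror the three date granularities tested in the rule above.
_YEAR = re.compile(r"^\d{4}")
_YEAR_MONTH = re.compile(r"^\d{4}-\d{2}")
_FULL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def derive_date_parts(date_value):
    """Return the year, year-month and full-date values derivable from a string.

    The keys are hypothetical labels, not property names from the ontology.
    """
    parts = {}
    if _YEAR.match(date_value):
        parts["year"] = date_value[:4]
    if _YEAR_MONTH.match(date_value):
        parts["year_month"] = date_value[:7]
    if _FULL_DATE.match(date_value):
        parts["date"] = date_value[:10]
    return parts


if __name__ == "__main__":
    # Sample values only; each coarser granularity is derived when available.
    for sample in ("1869-11-04", "1869-11", "1869", "n/a"):
        print(sample, derive_date_parts(sample))
```

A full date yields all three values, while a bare year yields only the year, so downstream queries can filter at whichever granularity the source data supports.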
Data publishing: rules (validation)
construct {
  npgg:journals npg:hasConstraintViolation [
    a spin:ConstraintViolation ;
    npg:severityLevel "Warning" ;
    rdfs:label ?message ;
    spin:rule [ a sp:Construct ; sp:text ?query ; ] ;
  ] .
}
where {
  { select (count(?s) as ?count)
    where {
      ?s a npg:Journal .
      filter ( not exists { ?s bibo:shortTitle ?h . } ) }
  }
  bind (concat("! Found ", str(?count), " journals with no short title") as ?message)
  bind ("""
construct {
  npgg:journals npg:hasConstraintViolation [
    a spin:ConstraintViolation ;
    spin:violationRoot ?s ; … ] .
} where { … }
""" as ?query)
}
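The validation rule above counts journals that lack a short title and attaches a warning message. A minimal Python sketch of that check follows, assuming journals are held as a plain dictionary keyed by identifier; the function name, the "shortTitle" key (a stand-in for bibo:shortTitle) and the sample data are illustrative assumptions. Unlike the SPARQL rule, this sketch returns nothing when no journal is affected.

```python
def check_short_titles(journals):
    """Return a warning dict if any journal lacks a short title, else None.

    `journals` maps a journal identifier to a dict of its properties.
    """
    missing = [jid for jid, props in journals.items() if not props.get("shortTitle")]
    if not missing:
        return None
    return {
        "severity": "Warning",
        "message": f"! Found {len(missing)} journals with no short title",
        # Offending identifiers, comparable to spin:violationRoot in the rule.
        "roots": sorted(missing),
    }


if __name__ == "__main__":
    sample = {
        "journal-a": {"shortTitle": "J. A"},
        "journal-b": {},
    }
    print(check_short_titles(sample))
```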
Data publishing: rules (contracts)
knowledge-bases:public
...
npg:hasContract [
rdfs:comment "Contract for ArticleTypes Ontology" ;
npg:graph npgg:article-types ;
npg:hasBinding [
npg:onOntology article-types: ;
npg:allowsPredicate
dc:creator , dc:date , dc:publisher , dc:rights , dcterms:license ,
npg:webpage , owl:imports , owl:versionInfo , rdf:type , rdfs:comment ,
skos:definition , skos:prefLabel , skos:note ,
vann:preferredNamespacePrefix , vann:preferredNamespaceUri
;
] , [
npg:onInstanceOf npg:ArticleType ;
npg:allowsPredicate
npg:hasRoot , npg:isPrimaryArticleType ,
npg:id , npg:isLeaf , npg:isRoot , npg:treeDepth ,
rdf:type , rdfs:isDefinedBy , rdfs:seeAlso ,
skos:broadMatch , skos:broader , skos:closeMatch ,
skos:definition , skos:exactMatch , skos:inScheme , skos:narrower ,
skos:prefLabel , skos:relatedMatch , skos:topConceptOf
;
] ;
] ;
...
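One way to read the contract above is as a whitelist: for instances of a given class, only the listed predicates may appear in the published graph. The following is a minimal Python sketch of such a check, assuming triples are plain tuples of prefixed names. The function name, the sample subject `article-types:example` and the checking approach are hypothetical and are not taken from the actual build pipeline; the allowed predicates are a subset of those listed in the contract.

```python
def contract_violations(triples, allowed_predicates):
    """Return the triples whose predicate is not permitted by the contract."""
    return [t for t in triples if t[1] not in allowed_predicates]


if __name__ == "__main__":
    # Subset of the predicates allowed for npg:ArticleType in the contract above.
    allowed = {"rdf:type", "skos:prefLabel", "skos:broader", "npg:id"}
    triples = [
        ("article-types:example", "rdf:type", "npg:ArticleType"),
        ("article-types:example", "skos:prefLabel", "Example"),
        ("article-types:example", "npg:internalNote", "not for release"),
    ]
    for violation in contract_violations(triples, allowed):
        print("Disallowed predicate:", violation)
```

Declaring the permitted predicates as data rather than code means the same check can be reused for every graph, with only the binding changing.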
Data publishing: contracts workflow
Next steps
More features:
- Linked data dereference
- Richer dataset descriptions (VoID, PROV, HCLS Profile, etc.)
- SPARQL endpoint?
- JSON-LD API?
More data:
- Adding extra data points (funding info, abstracts, …)
- Revamp citations dataset
- Longer term: extending archive to include Springer content
More feedback:
- User testing around data accessibility
- Surveying communities/users for this data
Looking ahead: how can a publisher make linked
science happen?
From a business perspective:
- Finding adequate licensing solutions
- Justifying the effort to publishers
- Who uses this data? What’s the ROI?
From a communities perspective:
- Do we actually know who the users are?
- How do we get more feedback/uptake?
- Should we work more with non-linked-data communities?

 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 

The nature.com ontologies portal: nature.com/ontologies

  • 2. Who we are We are both part of Macmillan Science and Education* - Macmillan S&E is a global STM publisher - Tony Hammond is Data Architect, Technology @tonyhammond - Michele Pasin is Information Architect, Product Office @lambdaman * We merged earlier this year (May 2015) with Springer Science+Business Media to become Springer Nature. We are currently actively engaged in integrating our businesses.
  • 3. Macmillan: science and education brands
  • 4. We publish a lot of science! (1845-2015) http://www.nature.com/developers/hacks/articles/by-year 1.2 million articles in total
  • 5. Why we’re here today: to ask some questions We have been making semantic data available in RDF models for a number of years through our data.nature.com portal (2012–2015) Big questions: - Is this data of any use to the Linked Science community? - Should Springer Nature continue to invest in LOD sharing? More specifically: - Does the data contain enough items of interest? [Content] - Are the vocabularies understandable and useful? [Structure] - Are the data easy to get and to reuse? [Accessibility] - Is dereference / download / query the preferred option?
  • 6. Our goals and rationale - Semantic technologies are a promising way to do enterprise metadata management at web scale - Initially used primarily for data publishing / sharing (data.nature.com, 2011) - Since 2013, a core component of our digital publishing workflow (see ISWC14 paper) - Contributing to an emerging web of linked science data - As a major publisher since 1845, ideally positioned to bootstrap a science ‘publications hub’ - Building on the fundamental ties that exist between the actual research works and the publications that tell the story about it
  • 7. The vision of a science graph
  • 8. Zooming into the science graph
  • 9. Implementing this vision - Step 1: Linked Data Platform (2012–2014) - datasets - downloads + SPARQL endpoints (streaming, non-streaming) - linked data dereference - Step 2: Ontologies Portal (2015–) - datasets + models (core, domain) - downloads - extensive documentation
  • 12. The core ontology - Language: OWL 2, Profile: ALCHI(D) - Entities: ~50 classes, ~140 properties - Principles: Incremental Formation / Enterprise Integration / Model Coherence http://www.nature.com/ontologies/core/
  • 13. The core ontology: mappings [diagram] Core classes: :Asset :Thing :Publication :Concept :Event :Subject :Type :Agent :ArticleType :PublishingEvent :AggregationEvent :Component :Document :Serial Mapped external classes: cidoc-crm:Information_Carrier cidoc-crm:Conceptual_Object dbpedia:Agent dc:Agent dcterms:Agent cidoc-crm:Agent vcard:Agent foaf:Agent event:Event bibo:Event schema:Event cidoc-crm:TemporalEntity cidoc-crm:Type vcard:Type fabio:SubjectTerm bibo:Document cidoc-crm:Document foaf:Document bibo:Periodical fabio:Periodical schema:Periodical bibo:DocumentPart fabio:Expression cidoc-crm:InformationObject Legend: = owl:equivalentClass http://www.nature.com/ontologies/linksets/core/
  • 15. Domain models: subjects - Structure: SKOS, multi hierarchical tree, 6 branches, 7 levels of depth - Entities: ~2500 concepts - Mappings: 100% of terms, using skos:broadMatch or skos:closeMatch www.nature.com/ontologies/models/subjects/
  • 17. Domain models: mappings Article Types Subjects Journals Relations http://www.nature.com/ontologies/linksets
  • 18. Datasets - Articles: 25m records (for 1.2m articles) with metadata such as title and publication, but not authors - Contributors: 11m records (for 2.7m contributors), i.e. the articles' authors, structured and ordered but not disambiguated - Citations: 218m records (for 9.3m citations), from an earlier release
  • 19. Datasets: articles-Wikipedia links How: data extracted using the Wikipedia search API, 51,309 links over 145 years Quality: only ~900 were links to nature.com without a DOI, the rest all use DOIs correctly Encoding: cito:isCitedBy => wiki URL, foaf:topic => DBpedia URI http://www.nature.com/developers/hacks/wikilinks
  • 20. Data publishing: sources Sources: Ontologies (small scale; RDF native) - mastered as RDF data (Turtle) - managed in GitHub - in-memory RDF models built using Apache Jena - models augmented at build time using SPIN rules - deployed to MarkLogic as RDF/XML for query - exported as RDF dataset (Turtle) and as CSV Documents (large scale; XML native) - mastered as XML data - managed in MarkLogic XML database - data mined from XML documents (1.2m articles) using Scala - in-memory RDF models built using Apache Jena - injected as RDF/XML sections into XML documents for query - exported as RDF dataset (N-Quads) Organization: Named graphs – one graph per class
  • 22. Data publishing: rules (basic inference) construct { ?s npg:publicationStartYear ?xds1 . ?s npg:publicationStartYearMonth ?xds2 . ?s npg:publicationStartDate ?xds3 . ?s npg:publicationEndYear ?xde1 . ?s npg:publicationEndYearMonth ?xde2 . ?s npg:publicationEndDate ?xde3 . } where { ?s a npg:Journal . optional { ?s npg:dateStart ?dateStart } optional { ?s npg:dateEnd ?dateEnd } { bind (if(regex(?dateStart, "^\\d{4}"), substr(?dateStart,1,4), "") as ?ds1) bind (xsd:gYear(?ds1) as ?xds1) } union { bind (if(regex(?dateStart, "^\\d{4}-\\d{2}"), substr(?dateStart,1,7), "") as ?ds2) bind (xsd:gYearMonth(?ds2) as ?xds2) } union { bind (if(regex(?dateStart, "^\\d{4}-\\d{2}-\\d{2}$"), substr(?dateStart,1,10), "") as ?ds3) bind (xsd:date(?ds3) as ?xds3) } union { … } filter (?xds1 != "" || ?xds2 != "" || ?xds3 != "" || ?xde1 != "" || ?xde2 != "" || ?xde3 != "") }
  • 23. Data publishing: rules (validation) construct { npgg:journals npg:hasConstraintViolation [ a spin:ConstraintViolation ; npg:severityLevel "Warning" ; rdfs:label ?message ; spin:rule [ a sp:Construct ; sp:text ?query ; ] ; ] . } where { { select (count(?s) as ?count) where { ?s a npg:Journal . filter ( not exists { ?s bibo:shortTitle ?h . } ) } } bind (concat("! Found ", str(?count), " journals with no short title") as ?message) bind (""" construct { npgg:journals npg:hasConstraintViolation [ a spin:ConstraintViolation ; spin:violationRoot ?s ; … ] . } where { … } """ as ?query) }
  • 24. Data publishing: rules (contracts) knowledge-bases:public ... npg:hasContract [ rdfs:comment "Contract for ArticleTypes Ontology" ; npg:graph npgg:article-types ; npg:hasBinding [ npg:onOntology article-types: ; npg:allowsPredicate dc:creator , dc:date , dc:publisher , dc:rights , dcterms:license , npg:webpage , owl:imports , owl:versionInfo , rdf:type , rdfs:comment , skos:definition , skos:prefLabel , skos:note , vann:preferredNamespacePrefix , vann:preferredNamespaceUri ; ] , [ npg:onInstanceOf npg:ArticleType ; npg:allowsPredicate npg:hasRoot , npg:isPrimaryArticleType , npg:id , npg:isLeaf , npg:isRoot , npg:treeDepth , rdf:type , rdfs:isDefinedBy , rdfs:seeAlso , skos:broadMatch , skos:broader , skos:closeMatch , skos:definition , skos:exactMatch , skos:inScheme , skos:narrower , skos:prefLabel , skos:relatedMatch , skos:topConceptOf ; ] ; ] ; ...
  • 26. Next steps More features: - Linked data dereference - Richer dataset descriptions (VoID, PROV, HCLS Profile, etc.) - SPARQL endpoint? - JSON-LD API? More data: - Adding extra data points (funding info, abstracts, …) - Revamp citations dataset - Longer term: extending archive to include Springer content More feedback: - User testing around data accessibility - Surveying communities/users for this data
  • 27. Looking ahead: how can a publisher make linked science happen? From a business perspective: - Finding adequate licensing solutions - Justifying the effort to publishers - Who uses this data? What’s the ROI? From a communities perspective: - Do we actually know who are the users? - How do we get more feedback/uptake? - Should we work more with non-linked-data communities?
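
The basic inference rule on slide 22 derives year, year-month and full-date values from a journal's date string, depending on how precise that string is. As a rough illustration only, the same idea can be sketched in plain Python. This is not the SPIN rule itself: the function name and the dictionary keys are hypothetical, and the patterns assume the rule's regular expressions test for digits.

```python
import re

# Precision patterns mirroring the three regex tests in the slide-22 rule
# (assumption: the rule's patterns are meant to match digits).
_YEAR = re.compile(r"^\d{4}")
_YEAR_MONTH = re.compile(r"^\d{4}-\d{2}")
_FULL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def derive_date_parts(date_string):
    """Return the values that can be derived from a journal date string.

    The keys ("year", "year_month", "date") are hypothetical labels; a key
    is present only when the input is precise enough to support it.
    """
    parts = {}
    if not date_string:
        return parts
    if _YEAR.match(date_string):
        parts["year"] = date_string[:4]
    if _YEAR_MONTH.match(date_string):
        parts["year_month"] = date_string[:7]
    if _FULL_DATE.match(date_string):
        parts["date"] = date_string[:10]
    return parts
```

A year-only string yields just a year, while a full date yields all three values, which corresponds to the union branches of the rule each contributing a binding only when its pattern matches.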

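The contract on slide 24 lists, per class, the predicates that instances are allowed to carry. A minimal sketch of how such a contract might be checked is given here in plain Python over (subject, predicate, object) tuples. The function name, the data layout, and the sample subject are assumptions for illustration; the allowed-predicate set is a shortened subset of the list on the slide, and the real pipeline is described as using SPIN rules and Apache Jena rather than this code.

```python
RDF_TYPE = "rdf:type"

# Hypothetical, shortened binding: class -> predicates allowed on its instances.
ARTICLE_TYPE_CONTRACT = {
    "npg:ArticleType": {
        "npg:id",
        "rdf:type",
        "skos:prefLabel",
        "skos:broader",
        "skos:narrower",
    },
}


def contract_violations(triples, bindings):
    """Return the triples whose predicate no binding for the subject allows.

    Subjects with no bound class are ignored, since the contract says
    nothing about them.
    """
    # Collect the declared classes of every subject.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    violations = []
    for s, p, o in triples:
        bound = [bindings[c] for c in types.get(s, ()) if c in bindings]
        if bound and not any(p in allowed for allowed in bound):
            violations.append((s, p, o))
    return violations
```

In this sketch a triple is reported only when its subject is an instance of a class named in the contract, so data outside the contract's scope passes through unchecked.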
Editor's Notes

  1. main questions for the presentation > structure and mappings; accessible enough? > content: big enough? > accessibility: need more? > overall: is this useful? should NPG stop releasing these data and keep using it only for internal purposes?
  2. ideally link to online representation
  3. main questions for the presentation > structure and mappings; accessible enough? > content: big enough? > accessibility: need more? > overall: is this useful? should NPG stop releasing these data and keep using it only for internal purposes? > data torrents?
  4. slide about vision [1]
  5. slide about vision [2]
  6. The core model is a formal model that defines the key concepts we use for content publishing. It includes branches that describe the things we publish (publications), the things we use to categorise the things we publish (types) and more abstract concepts to document details of the publication workflow (events). In designing the Core Ontology, we adhered to three main principles:
     Incremental formalization: We started out with a relatively flat model and tested it against our use cases and system architecture, adding additional structure as more precise requirements were made available. The choice of names for classes and properties has also been tested and validated against our target audience and the enterprise use cases.
     Cohesiveness: Although we do make some use of public vocabularies such as BIBO and FOAF, in general we decided to follow a minimal commitment to external vocabularies, as that would let us retain more control over our model and also create a much more cohesive ontology. This is mainly because currently our main driver is to support internal applications. In order to facilitate web-scale data integration we have, whenever possible, added mappings to other commonly used vocabularies, e.g. BIBO, FABIO and FOAF, via owl:equivalentClass and owl:equivalentProperty relationships.
     Focus on integration: We have primarily focused on building a shared enterprise model, e.g. by getting the core classes and properties right and thus achieving some simple yet fundamental level of data integration. So even though we make use of SPIN rules and some basic inference in the data enrichment phase, we have not yet really taken advantage of the various inference mechanisms that can be built on top of OWL.
     Overall, the Core Ontology represents a measured balance between supporting legacy practices (some stretching back over many years) and enabling new requirements (which may only be revealed incrementally).
     It has been developed and grown within a cross-functional software delivery team. Some of the modelling clearly reflects immediate pragmatic concerns and the 'operational semantics' originating from our specific system architecture, but is included here to show how we are using this ontology to drive forward our content publishing and discovery processes.
  7. The Core Ontology is mapped to a number of external ontologies. We use owl:equivalentClass and owl:equivalentProperty properties to map our classes (>70 mappings) and properties (>30 mappings), respectively. This is a work in progress as we are constantly trying to improve the precision and variety of our mappings. We would encourage any interested party to give us feedback and suggestions about other models we should link to.
  8. The Subjects Ontology is mapped to the DBpedia and Wikidata datasets and also to the Bio2RDF and MeSH datasets. We use a skos:broadMatch or skos:closeMatch property to map our subject instances.
  9. Most mature
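
Note 8 and slide 15 describe every subject term as carrying a skos:broadMatch or skos:closeMatch mapping. One way such a coverage claim could be checked is sketched here in plain Python over (subject, predicate, object) tuples. The function name and the input layout are hypothetical and are not taken from the published datasets.

```python
MAPPING_PREDICATES = {"skos:broadMatch", "skos:closeMatch"}


def mapping_coverage(concepts, triples):
    """Return (coverage, unmapped) for a collection of concept identifiers.

    coverage is the fraction of concepts with at least one mapping triple;
    unmapped is the sorted list of concepts that have none.
    """
    concept_set = set(concepts)
    mapped = {
        s for s, p, _ in triples
        if p in MAPPING_PREDICATES and s in concept_set
    }
    unmapped = sorted(concept_set - mapped)
    # An empty vocabulary is treated as fully covered (design assumption).
    coverage = len(mapped) / len(concept_set) if concept_set else 1.0
    return coverage, unmapped
```

A coverage of 1.0 with an empty unmapped list would correspond to the "100% of terms" figure quoted on slide 15.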