The document discusses the use of linked open data for academia. It introduces linked data and its key principles: using URIs to identify objects and including links between data from different sources. This allows data to be interconnected in a web of data rather than kept in separate silos. Examples are given of applying these principles through projects like Bio2RDF, which links life-sciences data, and LODAC, which links academic data about species, museums and locations. Benefits include decentralized data sharing and integration across domains. Requirements for research data are that it be accessible, reusable and sustainable.
Profiling systems have achieved notable adoption by research institutions [1]. Multi-site search of research profiling systems has evolved substantially since the first deployment of systems such as DIRECT2Experts [2]. CTSAsearch is a federated search engine over VIVO-compliant Linked Open Data (LOD) published by members of the NIH-funded Clinical and Translational Science Award (CTSA) consortium and other interested parties. Sixty-four institutions are currently included, spanning six distinct platforms and three continents (North America, Europe and Australia). In aggregate, CTSAsearch has data on between 150,000 and 300,000 unique researchers and their 10 million publications. The public interface is available at http://research.icts.uiowa.edu/polyglot.
Interlinking educational data to the Web of Data (Thesis presentation), by Enayat Rajabi
This is a thesis presentation about interlinking educational data with the Web of Data. I explain how I used the Linked Data approach to expose and interlink educational data to the Linked Open Data cloud.
In this talk we describe how the Fourth Paradigm for Data-Intensive Research is providing a framework for us to develop tools, technologies and platforms to support actionable science. We discuss applications that take advantage of cloud computing, particularly Microsoft Azure, to realise the potential for turning data into decisions, knowledge and understanding. http://www.fourthpardigm.org and http://www.azure4research.com
Reproducible and citable data and models: an introduction, by FAIRDOM
Prepared and presented by Carole Goble (University of Manchester), Wolfgang Mueller (HITS), Dagmar Waltermath (University of Rostock), at the Reproducible and Citable Data and Models Workshop, Warnemünde, Germany. September 14th - 16th 2015.
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ..., by Stuart Chalk
An electronic laboratory notebook (ELN) can be characterized as a system that allows scientists to capture the data and resources used in performing scientific experiments. This allows users to easily organize and find their data; however, little information about the scientific process itself is recorded.
In this paper we highlight the current status of progress toward semantic representation of science in ELNs.
Research Data Sharing: A Basic Framework, by Paul Groth
Some thoughts on thinking about data sharing. Prepared for the 2016 LERU Doctoral Summer School - Data Stewardship for Scientific Discovery and Innovation.
http://www.dtls.nl/fair-data/fair-data-training/leru-summer-school/
Scientific Units in the Electronic Age, by Stuart Chalk
Scientists have standardized their units since the metric system of the late 1700s, the precursor of today's SI system. While much work has been done over the years to refine and redefine the system, little has formally been done to standardize the representation of SI units in electronic systems.
This paper will present a summary of current efforts toward electronic representation of scientific units in text, XML, and RDF, an analysis of needs for current computer/network systems, and an outline of future work.
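As a minimal illustration of what an electronic unit representation can buy, the sketch below (a toy model, not any of the standards surveyed in the paper) encodes each SI unit as a map from base-unit symbols to exponents, so derived units can be composed and checked mechanically:

```python
# A unit expressed as a map from SI base symbols to integer exponents;
# the representation and helper names are a toy sketch, not a standard.
def multiply(u, v):
    """Unit multiplication: add exponents, dropping zeroed dimensions."""
    out = dict(u)
    for base, exp in v.items():
        out[base] = out.get(base, 0) + exp
        if out[base] == 0:
            del out[base]
    return out

def invert(u):
    """Unit inversion, for building quotients such as J/s."""
    return {base: -exp for base, exp in u.items()}

newton = {"kg": 1, "m": 1, "s": -2}       # kg*m/s^2
joule = multiply(newton, {"m": 1})        # N*m
watt = multiply(joule, invert({"s": 1}))  # J/s

print(joule)  # {'kg': 1, 'm': 2, 's': -2}
```

Once units are data rather than text, dimensional-consistency checks like these fall out for free.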
Short talk on Research Objects and their use for reproducibility and publishing in the Systems Biology Commons Platform FAIRDOMHub, and the underlying software SEEK.
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using..., by Stuart Chalk
Recently, the US government has mandated that publicly funded scientific research data be made freely available in a usable form, allowing integration of the data into other systems. While this mandate has been articulated, existing publications and new papers (PDF) still do not provide accessible data, meaning that their usefulness is limited without human intervention.
This presentation outlines our efforts to extract scientific data from PDF files, using the PDFToText software and regular expressions (regex), and process it into a form that structures the data and its context (metadata). Extracted data is processed (cleaned, normalized), organized, and inserted into a contextually developed MySQL database. The data and metadata can then be output using a generic JSON-LD based scientific data model (SDM) under development in our laboratory.
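The extraction step can be pictured with a minimal sketch: pdftotext yields plain lines, and a rule captures a value together with its context. The input line, the rule, and the provenance field below are invented for illustration and are not the project's actual rule set:

```python
import re

# Hypothetical line as it might appear in pdftotext output.
line = "Melting point: 327.5 C (lit.)"

# One illustrative rule: capture property name, numeric value, and unit.
rule = re.compile(r"(?P<name>[A-Za-z ]+):\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>\S+)")

m = rule.search(line)
record = {
    "property": m.group("name").strip().lower(),
    "value": float(m.group("value")),
    "unit": m.group("unit"),
    "source": "example.pdf",  # provenance metadata kept alongside the value
}
print(record)
```

A production system would carry a library of such rules per document layout and normalize the extracted values before database insertion.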
A Generic Scientific Data Model and Ontology for Representation of Chemical Data, by Stuart Chalk
The current movement toward openness and sharing of data is likely to have a profound effect on the speed of scientific research and the complexity of questions we can answer. However, a fundamental problem with currently available datasets (and their metadata) is heterogeneity in terms of implementation, organization, and representation.
To address this issue we have developed a generic scientific data model (SDM) to organize and annotate raw and processed data, and the associated metadata. This paper will present the current status of the SDM, the implementation of the SDM in JSON-LD, and the associated scientific data model ontology (SDMO). Example usage of the SDM to store data from a variety of sources will be discussed, along with future plans for the work.
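A hypothetical record in the spirit of a JSON-LD data model is sketched below; the sdm: property names and the namespace URL are placeholders for illustration, not the published SDM/SDMO vocabulary:

```python
import json

# Illustrative JSON-LD measurement record; the "sdm:" terms and the
# example.org namespace are invented placeholders, not the real SDM.
record = {
    "@context": {"sdm": "https://example.org/sdm#"},
    "@id": "sdm:dataset/1",
    "@type": "sdm:Measurement",
    "sdm:property": "density",
    "sdm:value": 0.9970,
    "sdm:unit": "g/cm3",
    "sdm:conditions": {"sdm:temperature": {"sdm:value": 25, "sdm:unit": "degC"}},
}
doc = json.dumps(record, indent=2)
print(doc)
```

The point of the @context block is that the same record is simultaneously plain JSON for applications and RDF for semantic tooling.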
re3data.org – Registry of Research Data Repositories, by Heinz Pampel
Heinz Pampel | GFZ German Research Centre for Geosciences, LIS
Maxi Kindling | Humboldt-Universität zu Berlin, Berlin School of Library and Information Science
Frank Scholze | Karlsruhe Institute of Technology, KIT Library
RDA-Deutschland-Treffen 2015 | Potsdam, November 26, 2015
Sources of Change in Modern Knowledge Organization Systems, by Paul Groth
Talk covering how knowledge graphs are making us rethink how change occurs in Knowledge Organization Systems. Based on https://arxiv.org/abs/1611.00217
Written and presented by Carole Goble (University of Manchester) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Data integration is intrinsic to how modern research is undertaken in areas such as genomics, drug development and personalised medicine. To better enable this integration a large number of biomedical ontologies have been developed to provide standard semantics for describing metadata. There are now several hundred biomedical ontologies in widespread use that describe concepts such as genes, molecules, drugs and diseases. This amounts to millions of terms that are interconnected via relationships that naturally form a graph of biomedical terminology.
The Ontology Lookup Service (OLS) (http://www.ebi.ac.uk/ols) integrates over 160 ontologies and provides a central point for the biomedical community to query and visualise ontologies. OLS also provides a RESTful API over the ontologies that is used in high-throughput data annotation pipelines. OLS is built on top of a Neo4j database that provides efficient indexes for extracting ontological relationships. We have developed generic tools for loading RDF/OWL ontologies into Neo4j, where the indexes are optimised for serving common ontology queries. We are now moving to adopt graph databases more widely in applications relating to ontology mapping prediction and recommendation systems for data annotation.
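The kind of query such indexes serve can be sketched with a toy subclass graph; the terms below are illustrative, and the traversal is a plain transitive-closure walk, not the OLS/Neo4j implementation:

```python
# Toy ontology as subclass edges (child -> parents), the kind of graph
# a service like OLS stores; ancestor retrieval is a typical indexed query.
subclass_of = {
    "type 2 diabetes": ["diabetes mellitus"],
    "diabetes mellitus": ["metabolic disease"],
    "metabolic disease": ["disease"],
}

def ancestors(term):
    """Transitive closure over subclass edges (iterative depth-first walk)."""
    seen = []
    stack = list(subclass_of.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(subclass_of.get(parent, []))
    return seen

print(ancestors("type 2 diabetes"))
```

In a graph database the same query becomes a single variable-length path expression, which is why the native graph model fits ontologies so well.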
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria,
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of “Research Objects” of different types, as packages of all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects, as we release software, then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers, and Linked Data provides the metadata framework for the container manifest construction and profiles. It is not just theory but also practice, with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, "Why linked data is not enough for scientists", Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
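The packaging idea can be illustrated with a rough sketch of a manifest that aggregates the components of an investigation and records provenance. The field names below are simplified placeholders, not the normative researchobject.org vocabulary:

```python
import json

# Illustrative manifest for a packaged Research Object; structure and
# field names are invented placeholders, not the real RO specification.
manifest = {
    "id": "ro/example-study",
    "aggregates": [
        {"file": "data/results.csv", "type": "Dataset"},
        {"file": "code/analysis.py", "type": "Software"},
        {"file": "workflow/pipeline.cwl", "type": "Workflow"},
    ],
    "provenance": {"createdBy": "A. Researcher", "createdOn": "2017-10-21"},
}
print(json.dumps(manifest, indent=2))
```

The manifest is what turns a bag of files into an exchangeable research object: machines can discover what is aggregated, of what type, and where it came from.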
Slides from the classes on Linked Data in the Bioinformatics Master's programme at the Universidad de Murcia. For best effect, see http://biordf.org:8080/UM_LSLD/Clases/UM_Bioinformatics_LD.html
One-hour SPARQL tutorial given at the "Practical Cross-Dataset Queries on the Web of Data" tutorial at WWW2012. Supported by the LATC FP7 Project. http://latc-project.eu/
Semantic web technologies offer a potential mechanism for the representation and integration of thousands of biomedical databases. Many of these databases offer cross-references to other data sources, but these are generally incomplete and prone to error. In this paper, we conduct an empirical analysis of the link structure of life science Linked Data, obtained from the Bio2RDF project. Three different link graphs for datasets, entities and terms are characterized by degree, connectivity, and clustering metrics, and their correlation is measured as well. Furthermore, we utilize the symmetry and transitivity of entity links to build a benchmark and evaluate several popular entity matching approaches. Our findings indicate that the life science data network can help find hidden links, can be used to validate links, and may offer a mechanism to integrate a wider set of resources to support biomedical knowledge discovery.
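The symmetry test used for the benchmark can be sketched on a toy link set: a cross-reference is considered validated when both endpoints declare it. The identifiers below are illustrative cross-references, not data from the study:

```python
# Toy cross-reference graph between dataset entities (illustrative IDs).
# A link is "symmetric" when both directions are asserted, which serves
# as evidence that the link is valid; one-way links are repair candidates.
links = {
    ("drugbank:DB00316", "chebi:46195"),
    ("chebi:46195", "drugbank:DB00316"),
    ("drugbank:DB01050", "chebi:5855"),   # one-directional
}

symmetric = {(a, b) for (a, b) in links if (b, a) in links}
asymmetric = links - symmetric
print(len(symmetric) // 2, "validated pair(s);", len(asymmetric), "one-way link(s)")
```

The same idea extends to transitivity: if A links to B and B links to C, a missing A-C link is a "hidden link" candidate.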
Providing open data is of interest for its societal and commercial value, for transparency, and because more people can do fun things with data. There is a growing number of initiatives to provide open data, from, for example, the UK government and the World Bank. However, much of this data is provided in formats such as Excel files, or even PDF files. This raises several questions:
- How best to provide access to data so it can be most easily reused?
- How to enable the discovery of relevant data within the multitude of available data sets?
- How to enable applications to integrate data from large numbers of formerly unknown data sources?
One way to address these issues is to use the design principles of linked data (http://www.w3.org/DesignIssues/LinkedData.html), which suggest best practices for how to publish and connect structured data on the Web. This presentation gives an overview of linked data technologies (such as RDF and SPARQL), examples of how they can be used, as well as some starting points for people who want to provide and use linked data.
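The triple model behind RDF, and the basic graph patterns that SPARQL generalizes, can be sketched in a few lines; the data and predicate names here are invented for illustration:

```python
# RDF-style triples (subject, predicate, object) and a tiny pattern
# matcher: None plays the role of a SPARQL variable.
triples = [
    ("uk:London", "is_capital_of", "uk:England"),
    ("uk:London", "population", 8900000),
    ("fr:Paris", "is_capital_of", "fr:France"),
]

def match(s=None, p=None, o=None):
    """Return all triples consistent with the given pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Analogous to: SELECT ?s ?o WHERE { ?s :is_capital_of ?o }
capitals = match(p="is_capital_of")
print(capitals)
```

Because every statement has the same shape, data from formerly unknown sources can be merged by simply concatenating triple sets, which is the integration property linked data exploits.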
The presentation was given on August 8 at the Hacknight event (http://hacknight.se/) of Forskningsavdelningen (http://forskningsavd.se/) (Swedish: “Research Department”), a hackerspace in Malmö.
Presentation delivered in the context of the Agricultural Data Interoperability WG meeting, during the RDA 3rd Plenary Meeting in Dublin, Ireland, 26/3/2014.
The presentation is mostly focused on the work done by the agINFRA project towards proposing a methodology for the definition of Germplasm descriptors as RDF, based on the existing work of experts in the field and making use of the existing effort in this direction.
This presentation covers the whole spectrum of Linked Data production and exposure. After a grounding in the Linked Data principles and best practices, with special emphasis on the VoID vocabulary, we cover R2RML, operating on relational databases, Open Refine, operating on spreadsheets, and GATECloud, operating on natural language. Finally we describe the means to increase interlinkage between datasets, especially the use of tools like Silk.
FAIRy stories: the FAIR Data principles in theory and in practice, by Carole Goble
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey toward wider accessibility and reusability of data and readiness for automation (I am one of the army of authors). Over the past five years FAIR has become a movement, a mantra and a methodology for scientific research, and increasingly in the commercial and public sectors. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean, and how to implement them, has proved more challenging than one might have guessed. To quote the novelist Rick Riordan: “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR-by-Design methodologies and platforms in the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
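One of the industry-led approaches mentioned above, Schema.org markup for datasets, can be sketched as follows. The Dataset type and its properties are real Schema.org vocabulary, but the dataset described is invented for illustration:

```python
import json

# Minimal Schema.org Dataset description in JSON-LD, the web-friendly
# markup that makes datasets findable by search engines. The dataset,
# organization, and contentUrl below are invented placeholders.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example metabolomics reference measurements",
    "description": "Illustrative dataset record (placeholder content).",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "Example Research Infrastructure"},
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/measurements.csv",
    },
}
markup = json.dumps(dataset, indent=2)
print(markup)
```

Embedded in a repository's landing page, such a block addresses the F (findable) and A (accessible) of FAIR with off-the-shelf web technology.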
Data Publishing at Harvard's Research Data Access Symposium, by Mercè Crosas
Data Publishing: The research community needs reliable, standard ways to make the data produced by scientific research available to the community, while giving credit to data authors. As a result, a new form of scholarly publication is emerging: data publishing. Data publishing - or making data reusable, citable, and accessible for long periods - is more than simply providing a link to a data file or posting the data to the researcher’s web site. We will discuss best practices, including the use of persistent identifiers and full data citations, the importance of metadata, the choice between public data and restricted data with terms of use, the workflows for collaboration and review before data release, and the role of trusted archival repositories. The Harvard Dataverse repository (and the Dataverse open-source software) provides a solution for data publishing, making it easy for researchers to follow these best practices, while satisfying data management requirements and incentivizing the sharing of research data.
Data Publishing Workflows with Dataverse, by Micah Altman
By: Mercè Crosas, Director of Data Science at the Institute for Quantitative Social Science (IQSS) at Harvard University
The Dataverse software provides multiple workflows for data publishing to support a wide range of data policies and practices established by journals, as well as data sharing needs from various research communities. This talk will describe these workflows from the user experience and from the system's technical implementation.
This talk was presented as part of the Information Science Brown Bag talks, hosted by the Program on Information Science. (See http://drmaltman.wordpress.com)
Scholars and researchers are being asked by an increasing number of research sponsors and journals to outline how they will manage and share their research data. This is an introduction to data management and sharing practices with some specific information for Columbia University researchers.
Similar to Introduction of Linked Data for Science
Presented at Journal Paper Track, The Web Conference, Lyon, France, April 15, 2018
https://doi.org/10.1145/3184558.3186234
Abstract: Linked Open Data (LOD) technology enables a web of data and exchangeable knowledge graphs over the Internet. However, knowledge changes everywhere and all the time, and linking data precisely becomes challenging because terms and concepts may be interpreted differently under different temporal contexts and in different communities. To address this issue, we introduce an approach to the preservation of knowledge graphs, and we select the biodiversity domain as our case study because knowledge in this domain changes frequently and all changes are clearly documented. Our work produces an ontology, transformation rules, and an application to demonstrate that it is feasible to present and preserve knowledge graphs and to provide open and accurate access to linked data. It covers changes in names and their relationships across time and communities, as seen in cases of taxonomic knowledge.
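A toy sketch of the time-dependence the paper addresses: a taxon whose accepted name changes, and a resolver that answers relative to a date. The names, dates, and structure below are invented for illustration, not the paper's ontology:

```python
# Illustrative record of a taxon whose accepted name changed over time;
# resolving a name relative to a date is the kind of query a knowledge
# graph preservation scheme must support. All values are invented.
name_history = [
    ("Genus oldus", 1900, 1980),   # accepted from 1900 up to 1980
    ("Genus novus", 1980, 9999),   # accepted from 1980 onward
]

def accepted_name(year):
    """Return the name accepted in the given year, or None."""
    for name, start, end in name_history:
        if start <= year < end:
            return name
    return None

print(accepted_name(1950), "/", accepted_name(2018))
```

Attaching validity intervals to assertions is what lets a preserved graph answer both "what was this called then?" and "what is it called now?" without discarding either answer.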
We propose the Crop Vocabulary (CVO) as the core vocabulary of crop names, serving as a guideline for data interoperability between agricultural ICT systems along the food chain. Since a single species is treated in different ways, there are many different types of crop names, so we organize crop names discriminated by properties such as scientific name, planting method, edible part and registered-cultivar information. The Crop Vocabulary is also linked to existing vocabularies issued by Japanese government agencies and international organizations such as AGROVOC. It is expected to be used as the data format in agricultural ICT systems.
Presented in 45th Asia Pacific Advanced Network (APAN45) Meeting, Singapore (2018)
Presented as the invited talk at the International Workshop on kNowledge eXplication for Industry (kNeXI2017). In this talk, I explain the experience and lessons learnt in building ontologies. I am currently building the agriculture activity ontology (AAO), which describes the classification and properties of various activities in the agriculture domain. It is formalized with Description Logics.
Presented at the Interest Group on Agricultural Data (IGAD) ,3 April, 2017, Barcelona, Spain
Abstract: In this talk, we present the current status of our agriculture ontologies, which are developed to accelerate data use in agriculture.
The agriculture activity ontology formalizes the activities in agriculture. We have been developing it for three years and are now building its applications. One application is to exchange formats between different farm management systems. Another ontology is the crop ontology, which standardizes the names of crops. Its structure is simple, but it has links to many other standards in the distribution industry, the food industry and so on.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Introduction of Linked Data for Science
1. Linked Open Data for ACademia
Introduction of Linked Data
for Science
Hideaki Takeda
takeda@nii.ac.jp / ORCID:0000-0002-2909-7163
Professor, National Institute of Informatics
2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November, 2013
2. Linked Open Data for ACademia
Researchers in 1983
Survey, Research, and Writing
[Diagram: researchers survey printed articles and real-world objects, collect data, and write articles]
3. Linked Open Data for ACademia
Researchers in 2013: digital distribution of articles
More articles than ever!
Sharing and re-use of data
[Diagram: survey and article writing now draw on digital articles, printed articles, and real and digital objects as targets; data is acquired from and published as digital information]
4. Linked Open Data for ACademia
Trends of Research and Data
• Rapid Growth
– Increase of article publications
– Big data and many (small) databases
• Open and Share
– Open access
– Data sharing
• Integration
– Among different types of data
– Across domains
5. Linked Open Data for ACademia
Key Requirements
• Accessibility
– Research results must be shared
• Reusability
– Research results are expected to be re-used by
other research
• Sustainability
– Research results must be preserved
7. Linked Open Data for ACademia
Open Data
• Open Data is not just "data which is open", rather …
• "A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike." (http://opendefinition.org/)
• Use, re-use, redistribute
• Open license
8. Linked Open Data for ACademia
5 ★ Open Data
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of an image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
http://5stardata.info/
9. Linked Open Data for ACademia
Linked Data/Linked Open Data (LOD)
- use URIs to denote things, so that people can point at your stuff
- link your data to other data to provide context
11. Linked Open Data for ACademia
Web of Data
[Diagram: data items linked across sources: "another data item about the observation", "data identical to this", "what's the meaning of the data?"]
Inter-connection between data in different data sources is enabled
12. Linked Open Data for ACademia
Linked Data Principles
• The four rules for Linked Data
– Use URIs as names for things
• Give a URI to every object in the world!
– Use HTTP URIs so that people can look up those names.
• Don’t use URN
– When someone looks up a URI, provide useful information, using the
standards (RDF, SPARQL)
• Provide machine-readable data for URI
– Include links to other URIs, so that they can discover more things.
• Make data linked together just like Web
Linked Data, TBL, http://www.w3.org/DesignIssues/LinkedData.html
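The second and third rules can be exercised directly over HTTP. A minimal Python sketch of looking up a URI and asking for machine-readable data via content negotiation (whether a given server actually answers with RDF is an assumption about that server):

```python
from urllib.request import Request

# Rule 3 in practice: look up an HTTP URI and ask for machine-readable
# data (RDF) instead of HTML, using an Accept header (content negotiation).
req = Request(
    "http://www-kasm.nii.ac.jp/~takeda",  # example URI from the next slide
    headers={"Accept": "application/rdf+xml"},
)
# urllib.request.urlopen(req) would return an RDF/XML description of the
# resource if the server supports content negotiation.
print(req.get_header("Accept"))  # -> application/rdf+xml
```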
13. Linked Open Data for ACademia
How to express data in Linked Data
• Use RDF(+RDFS, OWL)
– Very simple: <subject> <predicate> <object> .
<http://www-kasm.nii.ac.jp/~takeda#me> rdf:type foaf:Person .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:name "Hideaki Takeda" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:gender "male" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:knows
<http://southampton.rkbexplorer.com/id/person07113> .
[Graph: the four triples above drawn as a node-and-edge diagram]
14. Linked Open Data for ACademia
Describing Linked Data (the graph, extended)
[Graph: the previous diagram, with http://southampton.rkbexplorer.com/id/person-07113 linked by owl:sameAs to <http://dbpedia.org/resource/Tim_Berners-Lee>, which carries dbpprop:name "Sir Tim Berners-Lee", dbpprop:occupation dbpedia:Computer_scientist, dbpprop:birthPlace "London, England", and dbpprop:birthDate "1955-06-08"]
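The two graph slides above correspond to the following Turtle (a sketch: the prefix declarations are added here, and the dbpprop/dbpedia namespace URIs are assumptions; the triples are exactly those shown in the diagrams):

```turtle
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix dbpprop: <http://dbpedia.org/property/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

<http://www-kasm.nii.ac.jp/~takeda#me>
    rdf:type    foaf:Person ;
    foaf:name   "Hideaki Takeda" ;
    foaf:gender "male" ;
    foaf:knows  <http://southampton.rkbexplorer.com/id/person-07113> .

# The Southampton URI is declared identical to the DBpedia resource,
# so the DBpedia properties describe the same person.
<http://southampton.rkbexplorer.com/id/person-07113>
    owl:sameAs dbpedia:Tim_Berners-Lee .

dbpedia:Tim_Berners-Lee
    dbpprop:name       "Sir Tim Berners-Lee" ;
    dbpprop:occupation dbpedia:Computer_scientist ;
    dbpprop:birthPlace "London, England" ;
    dbpprop:birthDate  "1955-06-08" .
```

This is how links across data sources are expressed: one owl:sameAs triple is enough to connect a local description to the wider Web of Data.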
15. Linked Open Data for ACademia
Linking Open Data (LOD)
• The project to collect published Linked Data
• Major Linked Data
(Translated from the original resources)
– DBpedia (Wikipedia): 270 million triples
– Geonames: geographic names and their latitudes and longitudes, 93 million triples
– MusicBrainz: music
– WordNet: dictionary
– DBLP bibliography: bibliography for technical papers, 28 million triples
– US Census Data: 1 billion triples
(Crawling)
– FOAF (Friend Of A Friend)
(Wrapper)
– Flickr Wrapper
19. Linked Open Data for ACademia
Benefits of LOD for Science
• Truly de-centralized database
– No need for central database
– Everyone can create one and join the cloud!
• Truly open and sharable data and schemata
– Easy for re-use and mash-up
– Easy for cross-domain/discipline use and connection
• A single format for all kinds of data
– Easy for data processing
20. Linked Open Data for ACademia
Bio2RDF
At the heart of Linked Data for the Life Sciences
• Bio2RDF is an open source framework to produce and provide biological linked data that uses simple conventions on the emerging semantic web
• Bio2RDF reduces the time and effort involved in data integration so that you can get to doing science
• 19 datasets; 1,010,758,291 triples
http://bio2rdf.org/
21. Linked Open Data for ACademia
Alison Callahan, José Cruz-Toledo, Peter Ansell, Michel Dumontier: Bio2RDF Release 2: Improved Coverage, Interoperability
and Provenance of Life Science Linked Data, The Semantic Web: Semantics and Big Data, Lecture Notes in Computer Science
Volume 7882, 2013, pp 200-212
23. Linked Open Data for ACademia
LODAC Project - connecting academic data
• LODAC SPECIES: Connecting species data by name
[Diagram: a Taxon Name DB linking Museum Specimen DB, Species Info. DB, Research DB, BioSci. DB, GBIF, and DBpedia Japanese, with an app for query expansion]
No. of names: 113,118 / No. of triples: 14,532,449
• LODAC Museum: LOD of data in museums
[Diagram: raw data for entities from Source A and Source B, reduced to minimum data to identify entities (Work, Creator, Museum) and merged into integrated data, linked by dc:creator, dc:references, and crm:P55_has_current_location]
• LODAC Location: Integration of location information
• CKAN Japanese: Catalog for Open Data
24. Linked Open Data for ACademia
LODAC SPECIES: Linking species information with names
[Diagram: Taxon Name LOD at the center, linking Museum Specimen DB, Species Info. DB, Research DB, BioSci. DB, and GBIF]
No. of species names: 113,118
No. of triples: 14,532,449
25. Linked Open Data for ACademia
Data model for integration
[Diagram: TaxonName (owl:Class) with subclasses ScientificName and CommonName; a taxon carries hasScientificName, hasCommonName, hasTaxonRank (e.g., species), and hasSuperTaxon links; Specimen instances (e.g., butterfly and bryophyte specimens) carry collectedDate, collectionLocality, institutionName, dcterms:source, dcterms:publisher, and crm:has_current_location, organized in named graphs such as BDLS]
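As a sketch of how this model supports query expansion, a hypothetical SPARQL query resolving a common name to its scientific name (the lodac: prefix URI and the exact property spellings are assumptions, following the names in the diagram):

```sparql
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX lodac: <http://lod.ac/ns/lodac#>

# Resolve a common (vernacular) name to its taxon, then fetch the
# scientific name: the basis for expanding a user query across
# the specimen and species databases.
SELECT ?taxon ?scientificName
WHERE {
  ?taxon lodac:hasCommonName     ?common ;
         lodac:hasScientificName ?sci .
  ?common rdfs:label "Butterfly"@en .
  ?sci    rdfs:label ?scientificName .
}
```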
26. Linked Open Data for ACademia
Search application
with LODAC SPECIES
http://lod.ac/apps/lsdcs
27. Linked Open Data for ACademia
LODAC Museum
• Integrated database for information on museums in Japan
– Data
• No. of museums: 114
• No. of triples: 40,059,131

Type of Information                          RDF type                       No. of items
Collections (total)                          lodac:Specimen + lodac:Work    ca. 1,770,000
Collections (specimen)                       lodac:Specimen                 ca. 1,690,000
Collections (creative and historical work)   lodac:Work                     ca. 130,000
Creators                                     foaf:Person                    ca. 8,800
Institutes                                   foaf:Organization              ca. 200,000

• Integration by creator, work and institute
• Data publication by RDF
• Some applications using the data
28. Linked Open Data for ACademia
Integrated data processing by RDF
Collect → Refine → Integrate → Publish → Use
(each step processed by RDF)
• Collect: RDF by converting RDBs / by scraping the Web
• Refine: Define schema and convert data by schema
• Integrate: Schema mapping, ID mapping
• Publish: Dump data / SPARQL endpoint
• Use: Mash-up applications
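The first step, "RDF by converting RDBs", can be sketched in a few lines of Python. This is a minimal illustration, not LODAC's actual converter; the example.org namespaces and column names are made up:

```python
import csv
import io

BASE = "http://example.org/id/"  # hypothetical ID namespace
PROP = "http://example.org/ns/"  # hypothetical property namespace

def rows_to_ntriples(csv_text):
    """Convert each CSV row into N-Triples: one triple per non-ID column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id" or not value:
                continue
            triples.append(f'{subject} <{PROP}{column}> "{value}" .')
    return triples

table = "id,title,creator\n359,Example Work,Shimomura Kanzan\n"
for triple in rows_to_ntriples(table):
    print(triple)
```

The same pattern scales to a real RDB export: each table row becomes a subject URI, and each column becomes a predicate.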
29. Linked Open Data for ACademia
Collect
Extracting collection data from museum websites
[Diagram: property–value pairs extracted from museum web pages]
30. Linked Open Data for ACademia
Dataset (Collect)

Type                       No.            Data source
Art work (lodac:Work)      ca. 80,000     Catalogs of the collections of 3 museums: National Art Museum (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums; Database for National Treasure & Important Cultural Property of National Designated (915); The Japanese Art Thesaurus (266)
Specimen (lodac:Specimen)  ca. 1,690,000  100+ museum collections; Science Net (National Science Museum)
Person (foaf:Person)       ca. 8,800      The Japanese Art Thesaurus
Facilities (icls.Museum)   ca. 200,000    The Japanese Art Thesaurus; Cultural Heritage Online; GIS data, National and Regional Planning Bureau
31. Linked Open Data for ACademia
Refine
Standardization of data: re-organized common metadata
[Diagram: raw data properties mapped to re-organized metadata such as dc:title, crm:P45_consists_of, skos:prefLabel, and lodac:era]
Current organizing policies:
・Use existing metadata
・Define own metadata
32. Linked Open Data for ACademia
Refine
Metadata schema for works (lodac:Work)

Item                        Property
Genre                       lodac:genre
Type of cultural assets     lodac:culturalAssets
Creator                     dc:creator / dc11:creator
Nationality                 crm:P7_took_place_at
Title                       dc:title / skos:prefLabel
Title pronunciation (yomi)  dc:title @ja-hrkt / skos:altLabel
Title in English            dc:title @en / skos:altLabel
Inscription                 crm:P62I_is_depicted_by
Seal                        crm:P65_shows_visual_item
No. of parts                crm:P57_has_number_of_parts
Collection                  dc:isPartOf
Created year                dc:created
Estimated starting year     lodac:estimatedStartYear
Material                    dc:medium / crm:P45_consists_of
33. Linked Open Data for ACademia
Integrate
Integrating Data
[Diagram: raw data for entities from Source A and Source B, reduced to the minimum data needed to identify entities (Work, Creator, Museum), then merged into integrated data; entities are linked by dc:creator, dc:references, and crm:P55_has_current_location]
34. Linked Open Data for ACademia
Integrate
Integrating Data

Integrate item                           Source                                        Amount   Integrated
Facilities                               A. Japanese Art Thesaurus                     648      74
                                         B. Cultural Heritage Online                   77
Title of important cultural properties   A. Japanese Art Thesaurus (art work)          915      3,800
                                         B. DB for National Treasure (art work)        10,115
Creator information and work title       A. Japanese Art Thesaurus (creator)           1,332    15,020
                                         B. All art works (work title string)          61,861
Creator name                             A. Japanese Art Thesaurus (creator)           1,332    615
                                         B. All art work titles (using creator name)   61,861
35. Linked Open Data for ACademia
Publish
Publishing data as RDF

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
    <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
    <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
    ……
    <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <rdfs:label xml:lang="ja">下村観山</rdfs:label>
    <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>

Annotations from the slide:
• ID-resource URI (own address): http://lod.ac/id/359
• Links to her/his work URIs: lodac:creates
• External link to DBpedia Japanese: dc:references
• Ref-resource URI: http://lod.ac/ref/359
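The same record in Turtle, for comparison (a hand-translated but equivalent serialization of the RDF/XML above, using the prefixes declared there):

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix dc:    <http://purl.org/dc/terms/> .
@prefix lodac: <http://lod.ac/ns/lodac#> .

<http://lod.ac/id/359> a foaf:Person ;
    lodac:creates <http://lod.ac/id/20029> , <http://lod.ac/id/20128> ,
                  <http://lod.ac/id/20755> , <http://lod.ac/id/24768> ,
                  <http://lod.ac/id/26732> ;
    dc:references <http://ja.dbpedia.org/resource/下村観山> ,
                  <http://lod.ac/ref/359> ;
    rdfs:label     "下村観山"@ja ;
    skos:prefLabel "下村観山"@ja ;
    foaf:name      "下村観山"@ja .
```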
36. Linked Open Data for ACademia
Use
Yokohama Art Spot
LODAC Museum × Yokohama Art LOD × PinQA
– Application using museum and local data
– Data related to art in Yokohama
• Collections
• Events
• Q&A
http://lod.ac/apps/yas/
37. Linked Open Data for ACademia
System Architecture
‣ Python + SPARQLWrapper
‣ Geolocation
[Diagram: the user poses a question to Yokohama Art Spot, which issues SPARQL queries to LODAC Museum and Yokohama Art LOD (works, events, artists, institutions) and exchanges questions and answers with PinQA as JSON]
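The "SPARQL / JSON" arrows in the diagram boil down to an HTTP request against a SPARQL endpoint. A stdlib-only sketch of building such a request (the slide's application uses the SPARQLWrapper library instead, and the endpoint URL here is hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the slide does not give the actual URL.
ENDPOINT = "http://lod.ac/sparql"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?p ?o WHERE { <http://lod.ac/id/359> ?p ?o . } LIMIT 5
"""

def sparql_request_url(endpoint, query):
    """Build the GET URL for a SPARQL query, asking for JSON results.
    (The 'format' parameter is honored by many endpoints; strictly,
    an Accept header is the standard way to request a result format.)"""
    params = urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"

url = sparql_request_url(ENDPOINT, QUERY)
# Fetching `url` with urllib.request.urlopen(url) would return the result
# bindings as JSON, which the web app renders on the map.
```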
38. Linked Open Data for ACademia
Conclusion
• Data and Web
– Great Potential!
• Linked Data - Exploit the power of Web –
– Simple Structure: URI and RDF
– Truly distributed data management
– Easy to link to each other
– Suitable for inter-disciplinary areas
• Remaining issues
– Scalability
– Sustainability
• DOI: DataCite
• ORCID