Introduction of Linked Data for Science
Presented at 2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November, 2013

Published in: Technology, Education
Transcript of "Introduction of Linked Data for Science"

  1. Linked Open Data for ACademia
     Introduction of Linked Data for Science
     Hideaki Takeda, takeda@nii.ac.jp / ORCID: 0000-0002-2909-7163
     Professor, National Institute of Informatics
     2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November 2013
  2. Researchers in 1983: Survey, Research, and Writing
     [Diagram labels: Printed Articles; Survey; Article Writing; Data; Real World Object]
  3. Researchers in 2013
     • Digital distribution of articles
     • More articles than ever!
     • Sharing and re-use of data
     • Real and digital objects as target
     [Diagram labels: Digital Articles; Printed Articles; Survey; Article Writing; Digital Information; Acquiring Data; Publishing Data; Data; Real World Object]
  4. Trends of Research and Data
     • Rapid Growth
       – Increase of article publications
       – Big data and many (small) databases
     • Open and Share
       – Open access
       – Data sharing
     • Integration
       – Among different types of data
       – Across domains
  5. Key Requirements
     • Accessibility
       – Research results must be shared
     • Reusability
       – Research results are expected to be re-used by other research
     • Sustainability
       – Research results must be preserved
  6. Key Requirements
     • Accessibility
       – Research results must be shared
     • Reusability
       – Research results are expected to be re-used by other research
     • Sustainability
       – Research results must be preserved
  7. Open Data
     • Open Data is not just “data which is open”, rather …
     • “A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.” http://opendefinition.org/
     • Use, re-use, redistribute
     • Open license
  8. 5 ★ Open Data
     ★★★★★ link your data to other data to provide context
     ★★★★ use URIs to denote things, so that people can point at your stuff
     ★★★ use non-proprietary formats (e.g., CSV instead of Excel)
     ★★ make it available as structured data (e.g., Excel instead of image scan of a table)
     ★ make your stuff available on the Web (whatever format) under an open license
     http://5stardata.info/
  9. Linked Data/Linked Open Data (LOD)
     - link your data to other data to provide context
     - use URIs to denote things, so that people can point at your stuff
  10. Web of Documents
  11. Web of Data
     [Diagram callouts: other data related to the observation; data identical to this; what is the meaning of the data?]
     Inter-connection between data in different data sources is enabled
  12. Linked Data Principles
     • The four rules for Linked Data
       – Use URIs as names for things
         • Give a URI to every object in the world!
       – Use HTTP URIs so that people can look up those names
         • Don’t use URNs
       – When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
         • Provide machine-readable data for the URI
       – Include links to other URIs, so that they can discover more things
         • Make data linked together just like the Web
     Source: Tim Berners-Lee, “Linked Data”, http://www.w3.org/DesignIssues/LinkedData.html
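The second and third rules can be illustrated with a minimal sketch: a client looks up an HTTP URI and asks for machine-readable RDF through the `Accept` header. The sketch only builds the request and does not send it; whether the server at this address actually returns RDF is an assumption, and the helper name `build_rdf_lookup` is hypothetical.

```python
from urllib.request import Request


def build_rdf_lookup(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # The fragment part (#me) is resolved by the client and is not sent
    # to the server, so it is stripped before the lookup.
    document_uri = uri.split("#", 1)[0]
    # Prefer RDF/XML, accept Turtle as a fallback.
    headers = {"Accept": "application/rdf+xml, text/turtle;q=0.9"}
    return Request(document_uri, headers=headers)


req = build_rdf_lookup("http://www-kasm.nii.ac.jp/~takeda#me")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup.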
  13. How to express data in Linked Data
     • Use RDF (+ RDFS, OWL)
       – Very simple: <subject> <predicate> <object> .
         <http://www-kasm.nii.ac.jp/~takeda#me> rdf:type foaf:Person .
         <http://www-kasm.nii.ac.jp/~takeda#me> foaf:name "Hideaki Takeda" .
         <http://www-kasm.nii.ac.jp/~takeda#me> foaf:gender "male" .
         <http://www-kasm.nii.ac.jp/~takeda#me> foaf:knows <http://southampton.rkbexplorer.com/id/person07113> .
     [Graph: the node http://www-kasm.nii.ac.jp/~takeda#me with arcs rdf:type → foaf:Person, foaf:name → "Hideaki Takeda", foaf:gender → "male", foaf:knows → http://southampton.rkbexplorer.com/id/person07113]
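As a minimal sketch of the subject-predicate-object model, the four statements above can be held as plain Python tuples and written out in N-Triples syntax. It uses only the standard library, not an RDF toolkit; the `Literal` marker class and the `to_ntriples` helper are hypothetical names introduced for this illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ME = "http://www-kasm.nii.ac.jp/~takeda#me"


class Literal(str):
    """Marks an object as a plain literal rather than a URI."""


# Each statement is a (subject, predicate, object) tuple.
triples = [
    (ME, RDF_TYPE, FOAF + "Person"),
    (ME, FOAF + "name", Literal("Hideaki Takeda")),
    (ME, FOAF + "gender", Literal("male")),
    (ME, FOAF + "knows", "http://southampton.rkbexplorer.com/id/person07113"),
]


def to_ntriples(statements):
    """Serialize triples as N-Triples: URIs in <>, literals in quotes."""
    lines = []
    for subject, predicate, obj in statements:
        if isinstance(obj, Literal):
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Because every statement has the same three-part shape, data from unrelated sources can be merged by simply concatenating their triple lists.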
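  14. Describing Linked Data
     [Graph, extending the previous slide with a link into DBpedia:]
       <http://www-kasm.nii.ac.jp/~takeda#me> rdf:type foaf:Person .
       <http://www-kasm.nii.ac.jp/~takeda#me> foaf:name "Hideaki Takeda" .
       <http://www-kasm.nii.ac.jp/~takeda#me> foaf:gender "male" .
       <http://www-kasm.nii.ac.jp/~takeda#me> foaf:knows <http://southampton.rkbexplorer.com/id/person-07113> .
       <http://southampton.rkbexplorer.com/id/person-07113> owl:sameAs <http://dbpedia.org/resource/Tim_Berners-Lee> .
       <http://dbpedia.org/resource/Tim_Berners-Lee> dbpprop:occupation dbpedia:Computer_scientist .
       <http://dbpedia.org/resource/Tim_Berners-Lee> dbpprop:name "Sir Tim Berners-Lee" .
       <http://dbpedia.org/resource/Tim_Berners-Lee> dbpprop:birthPlace "London, England" .
       <http://dbpedia.org/resource/Tim_Berners-Lee> dbpprop:birthDate "1955-06-08" .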
  15. Linking Open Data (LOD)
     • The project to collect published Linked Data
     • Major Linked Data
       (Translated from the original resources)
       – DBpedia (Wikipedia): 270 Million Triples
       – Geonames: Geo names and their latitudes and longitudes, 93 Million Triples
       – MusicBrainz: Music
       – WordNet: Dictionary
       – DBLP bibliography: Bibliography for technical papers, 28 Million Triples
       – US Census Data: 1 Billion Triples
       (Crawling)
       – FOAF (Friend Of A Friend)
       (Wrapper)
       – Flickr Wrapper
  18. LOD Cloud (Linking Open Data)
  19. Benefits of LOD for Science
     • Truly de-centralized database
       – No need for a central database
       – Everyone can create one and join the cloud!
     • Truly open and sharable data and schemata
       – Easy for re-use and mash-up
       – Easy for cross-domain/discipline use and connection
     • A single format for all kinds of data
       – Easy for data processing
  20. Bio2RDF: At the heart of Linked Data for the Life Sciences
     • Bio2RDF is an open source framework to produce and provide biological linked data that uses simple conventions on the emerging semantic web
     • Bio2RDF reduces the time and effort involved in data integration so that you can get to doing science
     • 19 datasets; 1,010,758,291 triples
     http://bio2rdf.org/
  21. Alison Callahan, José Cruz-Toledo, Peter Ansell, Michel Dumontier: Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data. The Semantic Web: Semantics and Big Data, Lecture Notes in Computer Science, Volume 7882, 2013, pp. 200-212
  22. Bio2RDF
  23. LODAC Project - connecting academic data
     • LODAC Location: Integration of location information
     • LODAC SPECIES: Connecting species data by name
       – No. of Names: 113,118
       – No. of Triples: 14,532,449
       [Diagram labels: Specimen DB; Species Info. DB; App. for query expansion; DBpedia Japanese; Research DB; GBIF; Taxon Name DB; BioSci. DB]
     • LODAC Museum: LOD of data in museums
       [Diagram labels: Raw Data for entities; Minimum Data to identify entities; Integrated data; Data from Source A; Data from Source B; Work; Museum; Creator; dc:references; dc:creator; crm:P55_has_current_location]
     • CKAN Japanese: Catalog for Open Data
  24. LODAC SPECIES: Linking Species Information with names
     • No. of Species Names: 113,118
     • No. of Triples: 14,532,449
     [Diagram labels: Museum Specimen DB; Species Info. DB; Research DB; GBIF; Taxon Name LOD; BioSci. DB]
  25. Data model for integration
     [Diagram of the data model]
     • Classes (owl:Class): TaxonName, with subclasses (rdfs:subClassOf) CommonName and ScientificName; TaxonRank; Specimen
     • Properties between names: hasTaxonRank, hasCommonName, hasScientificName, hasSuperTaxon
     • Properties of specimens: collectedDate, collectionLocality, institutionName, dcterms:source, dcterms:publisher, crm:has_current_location
     • Instances are connected to their classes with rdf:type (e.g., the rank “species”)
     • Named Graphs: Butterfly, BDLS, Bryophytes
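The hasSuperTaxon property in the model above makes it possible to walk from a name up to its higher taxa, which is one way an application could expand a query by name. The following is a minimal sketch over an in-memory list of triples; the `ex:` identifiers and the sample data are hypothetical and are not taken from LODAC SPECIES.

```python
# Hypothetical sample data using property names from the data model.
triples = [
    ("ex:Papilio_xuthus", "hasTaxonRank", "ex:species"),
    ("ex:Papilio_xuthus", "hasCommonName", "ex:Asian_swallowtail"),
    ("ex:Papilio_xuthus", "hasSuperTaxon", "ex:Papilio"),
    ("ex:Papilio", "hasSuperTaxon", "ex:Papilionidae"),
]


def super_taxa(taxon, statements):
    """Follow hasSuperTaxon links upward and return the chain of ancestors."""
    parent = {s: o for s, p, o in statements if p == "hasSuperTaxon"}
    chain = []
    seen = {taxon}
    while taxon in parent:
        taxon = parent[taxon]
        if taxon in seen:  # guard against cycles in dirty data
            break
        seen.add(taxon)
        chain.append(taxon)
    return chain


print(super_taxa("ex:Papilio_xuthus", triples))
```

Against a real SPARQL endpoint the same walk could be expressed with a property path, but that depends on the store and is not shown here.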
  26. Search application with LODAC SPECIES
     http://lod.ac/apps/lsdcs
  27. LODAC Museum
     • Integrated database for information on museums in Japan
       – Data
         • No. of museums: 114
         • No. of triples: 40,059,131

       Type of Information | RDF type | No. of items
       Collections (total) | lodac:Specimen + lodac:Work | ca. 1,770,000
       Collections (specimen) | lodac:Specimen | ca. 1,690,000
       Collections (creative and historical work) | lodac:Work | ca. 130,000
       Creators | foaf:Person | ca. 8,800
       Institutes | foaf:Organization | ca. 200,000

     • Integration by creator, work and institute
     • Data publication by RDF
     • Some applications using the data
  28. Integrated data processing by RDF
     Collect → Refine → Integrate → Publish → Use (all processed by RDF)
     • Collect: RDF by converting RDB / by scraping the Web
     • Refine: Define schema and convert data by schema
     • Integrate: Schema mapping, ID mapping
     • Publish: Dump data / SPARQL Endpoint
     • Use: Mash-up applications
  29. Collect: Extracting collection data from museum websites
     [Diagram: Extract, producing Property / Value pairs]
  30. Collect: Dataset
     • Art work (lodac:Work): ca. 80,000
       – Catalog of the collections of 3 National Art Museum (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums
       – Database for National Treasure & Important Cultural Property of National Designated (915)
       – The Japanese Art Thesaurus (266)
     • Specimen (lodac:Specimen): ca. 1,690,000
       – (100+ Museum collections) Science Net (National Science Museum)
     • Person (foaf:Person): ca. 8,800
       – The Japanese Art Thesaurus
     • Facilities (incl. Museum): ca. 200,000
       – The Japanese Art Thesaurus
       – Cultural Heritage Online
       – GIS data, National and Regional Planning Bureau
  31. Refine: Standardization of data
     • Raw data is re-organized into common metadata (e.g., dc:title, crm:P45_consists_of, skos:prefLabel, …, lodac:era)
     • Current organizing policies
       ・Use existing metadata
       ・Define own metadata
  32. Refine: Metadata schema for works (lodac:Work)

       Item | Property
       Genre | lodac:genre
       Type of cultural assets | lodac:culturalAssets
       Creator | dc:creator / dc11:creator
       Nationality | crm:P7_took_place_at
       Title | dc:title / skos:prefLabel
       Title Pronunciation (yomi) | dc:title @ja-hrkt / skos:altLabel
       Title in English | dc:title @en / skos:altLabel
       Inscription | crm:P62I_is_depicted_by
       Seal | crm:P65_shows_visual_item
       No. of parts | crm:P57_has_number_of_parts
       Collection | dc:isPartOf
       Created year | dc:created
       Estimated starting year | lodac:estimatedStartYear
       Material | dc:medium / crm:P45_consists_of
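The Refine step can be sketched as a lookup from raw field names to the schema properties listed above. This is only an illustration under stated assumptions: the raw record, the field labels used as keys, and the `refine` helper are hypothetical, and the real LODAC conversion may work quite differently.

```python
# Raw field label -> (schema property, language tag or None).
# Properties are taken from the schema table; the keys are assumed labels.
SCHEMA = {
    "Genre": ("lodac:genre", None),
    "Creator": ("dc:creator", None),
    "Title": ("dc:title", None),
    "Title Pronunciation (yomi)": ("dc:title", "ja-hrkt"),
    "Title in English": ("dc:title", "en"),
    "Created year": ("dc:created", None),
    "Material": ("dc:medium", None),
}


def refine(raw_record):
    """Convert one scraped record into (property, value, lang) statements.

    Fields with no schema entry are returned separately so they can be
    reviewed instead of being silently dropped.
    """
    statements, unmapped = [], []
    for field, value in raw_record.items():
        if field in SCHEMA:
            prop, lang = SCHEMA[field]
            statements.append((prop, value.strip(), lang))
        else:
            unmapped.append(field)
    return statements, unmapped


# Hypothetical scraped record.
raw = {
    "Title": " Example Title ",
    "Title in English": "Example Title",
    "Creator": "Example Artist",
    "Shelf": "A-1",
}
statements, unmapped = refine(raw)
print(statements)
print("unmapped:", unmapped)
```

Keeping the mapping in one table means a new museum source only needs its field labels added, not new conversion code.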
  33. Integrate: Integrating Data
     [Diagram: Raw data for entities from Source A and Source B are connected through minimum data to identify entities, giving integrated data]
     • Integrated entities: Work, Creator, Museum
     • Work is linked to Creator by dc:creator and to Museum by crm:P55_has_current_location
     • Each integrated entity is linked to its source data by dc:references
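One way to read the diagram is that each integrated entity holds only the minimum data needed to identify it, plus references back to the raw records in each source. The sketch below assumes that a normalized name is enough to identify a creator, which is a simplification; the record layout, the source labels "A" and "B", and the function names are hypothetical.

```python
import unicodedata


def identity_key(record):
    """Minimum identifying data: the name, normalized and without spaces."""
    name = unicodedata.normalize("NFKC", record["name"])
    return "".join(name.split()).casefold()


def integrate(source_a, source_b):
    """Merge records that share an identity key into one entity each."""
    integrated = {}
    for source_name, records in (("A", source_a), ("B", source_b)):
        for record in records:
            entity = integrated.setdefault(
                identity_key(record),
                {"name": record["name"], "references": []},
            )
            # Keep a pointer to the raw record, like dc:references.
            entity["references"].append(f"{source_name}:{record['id']}")
    return integrated


# Hypothetical records from two sources.
source_a = [{"id": "a1", "name": "下村 観山"}]
source_b = [
    {"id": "b7", "name": "下村観山"},
    {"id": "b8", "name": "Example Artist"},
]
entities = integrate(source_a, source_b)
for key, entity in entities.items():
    print(key, entity["references"])
```

Because the raw records stay in their sources and are only referenced, a wrong match can be undone without losing any original data.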
  34. Integrate: Integrating Data
     [Table: Integrate Item; Integrating Data Source; Amount of Data; Integration Data] A.Japanese Art Thesaurus 648 Facilities 77 B.Cultural Heritage Online Title of important cultural properties Creator information and Work Title A.Japanese Art Thesaurus (Art work) 915 3,800 74 B.DB for National Treasure (Art work) 10,115 A.Japanese Art Thesaurus (Creator) 1,332 15,020 B.All of art work (Work title string) 61,861 A.Japanese Art Thesaurus (Creator) 1,332 Creator name 615 B.All of art work title(using creator name) 61,861
  35. Publish: Publishing data as RDF
       <?xml version="1.0" encoding="UTF-8"?>
       <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                xmlns:foaf="http://xmlns.com/foaf/0.1/"
                xmlns:lodac="http://lod.ac/ns/lodac#"
                xmlns:dc="http://purl.org/dc/terms/"
                xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
                xmlns:skos="http://www.w3.org/2004/02/skos/core#">
         <foaf:Person rdf:about="http://lod.ac/id/359">
           <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
           <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
           <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
           <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
           <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
           ……
           <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
           <dc:references rdf:resource="http://lod.ac/ref/359"/>
           <rdfs:label xml:lang="ja">下村観山</rdfs:label>
           <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
           <foaf:name xml:lang="ja">下村観山</foaf:name>
         </foaf:Person>
       </rdf:RDF>
     Annotations on the slide:
     – ID-resource URI (own address): http://lod.ac/id/359
     – Links to her/his work URIs (lodac:creates)
     – External link: DBpedia Japanese
     – Ref-resource URI: http://lod.ac/ref/359
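A consumer of data published this way can read it with any XML or RDF tool. The sketch below parses a shortened copy of the RDF/XML above with the Python standard library and pulls out the creator's URI, works, and label. It treats the file as plain XML, which only works for this particular flat layout; a general RDF/XML document would need a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
LODAC = "http://lod.ac/ns/lodac#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Shortened copy of the published record (two works instead of the full list).
RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <rdfs:label xml:lang="ja">下村観山</rdfs:label>
  </foaf:Person>
</rdf:RDF>
"""

# Parse from bytes so the encoding declaration is honoured.
root = ET.fromstring(RDF_XML.encode("utf-8"))
person = root.find(f"{{{FOAF}}}Person")

creator_uri = person.get(f"{{{RDF}}}about")
works = [
    element.get(f"{{{RDF}}}resource")
    for element in person.findall(f"{{{LODAC}}}creates")
]
label = person.findtext(f"{{{RDFS}}}label")

print(creator_uri, label)
print(works)
```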
  36. Use: Yokohama Art Spot
     LODAC Museum × Yokohama Art LOD × PinQA
     – Application using museum and local data
     – Data related to art in Yokohama
       • Collections
       • Events
       • Q&A
     http://lod.ac/apps/yas/
  37. Use: System Architecture
     ‣ Python + SPARQLWrapper
     ‣ Geolocation
     [Diagram: the User works with Yokohama Art Spot, which queries Yokohama Art LOD and LODAC Museum with SPARQL and exchanges JSON with PinQA (Question / Answer)]
     [Data labels: Work; Event; Artist; Institution]
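The slide names Python + SPARQLWrapper for the SPARQL part of this architecture. As a dependency-free sketch of the same idea, the code below builds a SPARQL Protocol GET URL with the standard library and reads a result in the SPARQL JSON results format. The endpoint address is a placeholder, not the real LODAC endpoint; the query shape and the sample response are assumptions made for illustration, and no network request is sent.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real endpoint


def works_query(creator_uri):
    """SPARQL query for works linked from a creator via lodac:creates."""
    return (
        "PREFIX lodac: <http://lod.ac/ns/lodac#>\n"
        f"SELECT ?work WHERE {{ <{creator_uri}> lodac:creates ?work }} LIMIT 10"
    )


def request_url(endpoint, query):
    """SPARQL Protocol: the query text goes in the 'query' URL parameter."""
    return endpoint + "?" + urlencode({"query": query})


def bindings(result_json, variable):
    """Extract the values bound to one variable from SPARQL JSON results."""
    data = json.loads(result_json)
    return [
        row[variable]["value"]
        for row in data["results"]["bindings"]
        if variable in row
    ]


query = works_query("http://lod.ac/id/359")
url = request_url(ENDPOINT, query)
# Round-trip check: the encoded URL carries the query unchanged.
print(parse_qs(urlsplit(url).query)["query"][0] == query)

# Assumed sample response in the SPARQL 1.1 JSON results format.
SAMPLE = (
    '{"head": {"vars": ["work"]}, "results": {"bindings": '
    '[{"work": {"type": "uri", "value": "http://lod.ac/id/20029"}}]}}'
)
print(bindings(SAMPLE, "work"))
```

With SPARQLWrapper the URL building and JSON decoding are handled by the library, so application code would mostly consist of the query text and the loop over bindings.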
  38. Conclusion
     • Data and Web
       – Great Potential!
     • Linked Data: Exploit the power of the Web
       – Simple Structure: URI and RDF
       – Truly distributed data management
       – Easy to link to each other
       – Suitable for inter-disciplinary areas
     • Remaining Issues
       – Scalability
       – Sustainability
         • DOI: DataCite
         • ORCID