Introduction of Linked Data for Science
Presented at 2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November, 2013

Usage Rights

CC Attribution License

Presentation Transcript

  • Linked Open Data for ACademia
    Introduction of Linked Data for Science
    Hideaki Takeda (takeda@nii.ac.jp / ORCID: 0000-0002-2909-7163)
    Professor, National Institute of Informatics
    2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November, 2013
  • Researchers in 1983: Survey, Research, and Writing
    [Diagram labels: Printed Articles, Survey, Article Writing, Data, Real World Object]
  • Researchers in 2013
    • Digital distribution of articles: more articles than ever!
    • Sharing and re-use of data
    • Real and digital objects as target
    [Diagram labels: Digital Articles, Printed Articles, Survey, Article Writing, Digital Information, Data Acquiring, Data Publishing, Data, Real World Object]
  • Trends of Research and Data
    • Rapid Growth
      – Increase of article publications
      – Big data and many (small) databases
    • Open and Share
      – Open access
      – Data sharing
    • Integration
      – Among different types of data
      – Across domains
  • Key Requirements
    • Accessibility
      – Research results must be shared
    • Reusability
      – Research results are expected to be re-used by other research
    • Sustainability
      – Research results must be preserved
  • Open Data
    • Open Data is not just "data which is open", rather …
    • "A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike." http://opendefinition.org/
    • Use, re-use, redistribute
    • Open license
  • 5 ★ Open Data
    ★★★★★ link your data to other data to provide context
    ★★★★ use URIs to denote things, so that people can point at your stuff
    ★★★ use non-proprietary formats (e.g., CSV instead of Excel)
    ★★ make it available as structured data (e.g., Excel instead of image scan of a table)
    ★ make your stuff available on the Web (whatever format) under an open license
    http://5stardata.info/
  • Linked Data / Linked Open Data (LOD)
    – link your data to other data to provide context
    – use URIs to denote things, so that people can point at your stuff
  • Web of Documents
  • Web of Data
    [Diagram callouts: "Another data to the observation", "Data identical to this", "What's the meaning of the data?"]
    Inter-connection between data in different data sources is enabled
  • Linked Data Principles
    • The four rules for Linked Data
      – Use URIs as names for things.
        • Give a URI to every object in the world!
      – Use HTTP URIs so that people can look up those names.
        • Don't use URNs
      – When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
        • Provide machine-readable data for the URI
      – Include links to other URIs, so that they can discover more things.
        • Make data linked together just like the Web
    Source: Linked Data, Tim Berners-Lee (TBL), http://www.w3.org/DesignIssues/LinkedData.html
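The second and third rules (look up an HTTP URI, get machine-readable data back) are usually met through HTTP content negotiation. The following is a minimal sketch, not taken from the slides: it only prepares a request and does not send it, and the helper name `build_lookup_request` and the particular `Accept` value are assumptions chosen for illustration.

```python
from urllib.request import Request


def build_lookup_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF about `uri`."""
    # A fragment such as "#me" is resolved by the client and never sent to
    # the server, so the request targets the document part of the URI.
    document_uri = uri.split("#", 1)[0]
    # Ask for Turtle first and RDF/XML as a fallback (illustrative choice).
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(document_uri, headers={"Accept": accept})


req = build_lookup_request("http://dbpedia.org/resource/Tim_Berners-Lee")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; whether a given server honours the `Accept` header depends on that server.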
  • How to express data in Linked Data
    • Use RDF (+ RDFS, OWL)
      – Very simple: <Subject> <predicate> <object> .
        <http://www-kasm.nii.ac.jp/~takeda#me> rdf:type foaf:Person .
        <http://www-kasm.nii.ac.jp/~takeda#me> foaf:name "Hideaki Takeda" .
        <http://www-kasm.nii.ac.jp/~takeda#me> foaf:gender "male" .
        <http://www-kasm.nii.ac.jp/~takeda#me> foaf:knows <http://southampton.rkbexplorer.com/id/person07113> .
    [Diagram: the same four triples drawn as a graph around http://www-kasm.nii.ac.jp/~takeda#me]
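The subject-predicate-object structure on this slide can be sketched with plain tuples and serialized as N-Triples. This is an illustrative sketch using only the standard library; the `Literal` marker class and the helper names `expand` and `to_ntriples` are assumptions made for this example, not part of the talk or of any RDF library.

```python
# Namespace IRIs for the two vocabularies used on the slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

ME = "http://www-kasm.nii.ac.jp/~takeda#me"


class Literal(str):
    """Marks a plain string value, as opposed to an IRI."""


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'foaf:name' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


triples = [
    (ME, expand("rdf:type"), expand("foaf:Person")),
    (ME, expand("foaf:name"), Literal("Hideaki Takeda")),
    (ME, expand("foaf:gender"), Literal("male")),
    (ME, expand("foaf:knows"), "http://southampton.rkbexplorer.com/id/person07113"),
]
print(to_ntriples(triples))
```

In practice an RDF library would handle parsing, datatypes and language tags; the tuples here only show how little structure a triple needs.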
  • Describing Linked Data
    [Diagram: the graph around http://www-kasm.nii.ac.jp/~takeda#me (rdf:type foaf:Person, foaf:name "Hideaki Takeda", foaf:gender "male", foaf:knows http://southampton.rkbexplorer.com/id/person-07113), extended with an owl:sameAs link between http://southampton.rkbexplorer.com/id/person-07113 and <http://dbpedia.org/resource/Tim_Berners-Lee>, which carries dbpprop:occupation dbpedia:Computer_scientist, dbpprop:name "Sir Tim Berners-Lee", dbpprop:birthPlace "London, England", dbpprop:birthDate "1955-06-08"]
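The point of the owl:sameAs link in this diagram is that descriptions published by different sources can be read as one. A minimal in-memory sketch follows; the reading that the Southampton identifier is the one linked to the DBpedia resource is an assumption drawn from the diagram layout, and the function name `describe` is hypothetical.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"
SOTON = "http://southampton.rkbexplorer.com/id/person-07113"
DBPEDIA = "http://dbpedia.org/resource/Tim_Berners-Lee"

triples = [
    ("http://www-kasm.nii.ac.jp/~takeda#me", "foaf:knows", SOTON),
    (SOTON, SAME_AS, DBPEDIA),  # assumed direction, see lead-in
    (DBPEDIA, "dbpprop:name", "Sir Tim Berners-Lee"),
    (DBPEDIA, "dbpprop:birthPlace", "London, England"),
    (DBPEDIA, "dbpprop:birthDate", "1955-06-08"),
    (DBPEDIA, "dbpprop:occupation", "dbpedia:Computer_scientist"),
]


def describe(node, triples):
    """Collect property values for `node` and every alias reachable via owl:sameAs."""
    outgoing = defaultdict(list)
    aliases = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            # Treat owl:sameAs as symmetric.
            aliases[s].add(o)
            aliases[o].add(s)
        else:
            outgoing[s].append((p, o))

    description = defaultdict(set)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(aliases[current])
        for p, o in outgoing[current]:
            description[p].add(o)
    return dict(description)


merged = describe(SOTON, triples)
print(merged["dbpprop:name"])
```

Starting from the Southampton identifier, the lookup returns the DBpedia properties as well, which is the "discover more things" effect of the fourth Linked Data rule.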
  • Linking Open Data (LOD)
    • The project to collect published Linked Data
    • Major Linked Data
      (Translated from the original resources)
      – DBpedia (Wikipedia): 270 million triples
      – Geonames: geographic names and their latitudes and longitudes, 93 million triples
      – MusicBrainz: music
      – WordNet: dictionary
      – DBLP bibliography: bibliography for technical papers, 28 million triples
      – US Census Data: 1 billion triples
      (Crawling)
      – FOAF (Friend Of A Friend)
      (Wrapper)
      – Flickr Wrapper
  • LOD Cloud (Linking Open Data)
  • Benefits of LOD for Science
    • Truly de-centralized database
      – No need for a central database
      – Everyone can create one and join the cloud!
    • Truly open and sharable data and schemata
      – Easy for re-use and mash-up
      – Easy for cross-domain/discipline use and connection
    • A single format for all kinds of data
      – Easy for data processing
  • Bio2RDF: At the heart of Linked Data for the Life Sciences
    • Bio2RDF is an open source framework to produce and provide biological linked data that uses simple conventions on the emerging semantic web
    • Bio2RDF reduces the time and effort involved in data integration so that you can get to doing science
    • 19 datasets; 1,010,758,291 triples
    http://bio2rdf.org/
  • Alison Callahan, José Cruz-Toledo, Peter Ansell, Michel Dumontier: Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data. The Semantic Web: Semantics and Big Data, Lecture Notes in Computer Science, Volume 7882, 2013, pp. 200-212
  • Bio2RDF
  • LODAC Project: connecting academic data
    • LODAC Location: Integration of location information
    • LODAC SPECIES: Connecting species data by name (No. of Names: 113,118; No. of Triples: 14,532,449)
    • LODAC Museum: LOD of data in museums
    • CKAN Japanese: Catalog for Open Data
    [Diagram labels, LODAC SPECIES: Specimen DB, Species Info. DB, Research DB, BioSci. DB, GBIF, Taxon Name DB, DBpedia Japanese, App. for query expansion]
    [Diagram labels, LODAC Museum: Raw Data for entities, Minimum Data to identify entities, Integrated data, Data from Source A, Data from Source B; Work, Museum, Creator; dc:references, dc:creator, crm:P55_has_current_location]
  • LODAC SPECIES: Linking Species Information with names
    • No. of Species Names: 113,118
    • No. of Triples: 14,532,449
    [Diagram labels: Museum Specimen DB, Species Info. DB, Research DB, BioSci. DB, GBIF, Taxon Name LOD]
  • Data model for integration
    [Diagram labels: TaxonName, CommonName, ScientificName, TaxonRank, Specimen; rdfs:subClassOf, rdf:type, hasTaxonRank, hasCommonName, hasScientificName, hasSuperTaxon, collectedDate, collectionLocality, institutionName, dcterms:source, dcterms:publisher, crm:has_current_location; species, Butterfly, BDLS, Bryophytes; legend: owl:Class, Named Graph]
  • Search application with LODAC SPECIES
    http://lod.ac/apps/lsdcs
  • LODAC Museum
    • Integrated database for information on museums in Japan
    • Data
      – No. of museums: 114
      – No. of triples: 40,059,131

      Type of Information                          RDF type                       No. of items
      Collections (total)                          lodac:Specimen + lodac:Work    ca. 1,770,000
      Collections (specimen)                       lodac:Specimen                 ca. 1,690,000
      Collections (creative and historical work)   lodac:Work                     ca. 130,000
      Creators                                     foaf:Person                    ca. 8,800
      Institutes                                   foaf:Organization              ca. 200,000

    • Integration by creator, work and institute
    • Data publication by RDF
    • Some applications using the data
  • Integrated data processing by RDF
    Collect → Refine → Integrate → Publish → Use (processed by RDF)
    • Collect: RDF by converting RDB / by scraping the Web
    • Refine: Define schema and convert data by schema
    • Integrate: Schema mapping, ID mapping
    • Publish: Dump data / SPARQL endpoint
    • Use: Mash-up applications
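The Collect step ("RDF by converting RDB") can be sketched as turning each relational row into a handful of triples. This is an illustration under stated assumptions only: the `works` table, its columns, the sample rows and the `http://example.org/id/` base URI are all hypothetical and do not describe the LODAC databases.

```python
import sqlite3

BASE = "http://example.org/id/"  # hypothetical base URI for minted identifiers


def rows_to_triples(conn):
    """Convert each row of the (hypothetical) `works` table into triples."""
    triples = []
    query = "SELECT id, title, creator FROM works ORDER BY id"
    for work_id, title, creator in conn.execute(query):
        subject = f"{BASE}{work_id}"
        triples.append((subject, "rdf:type", "lodac:Work"))
        triples.append((subject, "dc:title", title))
        if creator is not None:  # NULL columns produce no triple
            triples.append((subject, "dc:creator", creator))
    return triples


# Build a tiny in-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (id INTEGER PRIMARY KEY, title TEXT, creator TEXT)")
conn.executemany(
    "INSERT INTO works VALUES (?, ?, ?)",
    [(1, "Untitled landscape", "Artist A"), (2, "Untitled portrait", None)],
)

collected = rows_to_triples(conn)
for triple in collected:
    print(triple)
```

One row becomes one subject, one column becomes one predicate; missing values simply yield no triple, which is one reason RDF suits sparse collection records.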
  • Collect: Extracting collection data from museum websites
    [Diagram: web pages → Extract → Property / Value pairs]
  • Collect: Dataset
    • Art work (lodac:Work): ca. 80,000
      – Catalog of the collections of 3 National Art Museum (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums
      – Database for National Treasure & Important Cultural Property of National Designated (915)
      – The Japanese Art Thesaurus (266)
    • Specimen (lodac:Specimen): ca. 1,690,000
      – Science Net (National Science Museum) (100+ museum collections)
    • Person (foaf:Person): ca. 8,800
      – The Japanese Art Thesaurus
    • Facilities (incl. Museum): ca. 200,000
      – The Japanese Art Thesaurus
      – Cultural Heritage Online
      – GIS data, National and Regional Planning Bureau
  • Refine: Standardization of data
    • Raw data is re-organized into common metadata
    • Current organizing policies
      ・Use existing metadata
      ・Define own metadata
    [Diagram: Raw Data → Re-organized Metadata (dc:title, crm:P45_consists_of, skos:prefLabel, ...., lodac:era)]
  • Refine: Metadata schema for works (lodac:Work)

      Item                          Property
      Genre                         lodac:genre
      Type of cultural assets       lodac:culturalAssets
      Creator                       dc:creator / dc11:creator
      Nationality                   crm:P7_took_place_at
      Title                         dc:title / skos:prefLabel
      Title Pronunciation (yomi)    dc:title @ja-hrkt / skos:altLabel
      Title in English              dc:title @en / skos:altLabel
      Inscription                   crm:P62I_is_depicted_by
      Seal                          crm:P65_shows_visual_item
      No. of parts                  crm:P57_has_number_of_parts
      Collection                    dc:isPartOf
      Created year                  dc:created
      Estimated starting year       lodac:estimatedStartYear
      Material                      dc:medium / crm:P45_consists_of
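The Refine step ("convert data by schema") amounts to a lookup from each source field to a schema property. A minimal sketch follows, using a subset of the properties listed on this slide; the raw field names on the left-hand side, the sample record, the subject URI and the function name `refine` are assumptions for illustration, since the slides do not show the museums' source formats.

```python
# Hypothetical source field name -> property from the slide's schema.
WORK_SCHEMA = {
    "genre": "lodac:genre",
    "cultural_assets_type": "lodac:culturalAssets",
    "creator": "dc:creator",
    "title": "dc:title",
    "collection": "dc:isPartOf",
    "created_year": "dc:created",
    "material": "dc:medium",
}


def refine(subject, raw_record):
    """Map one raw record to triples; return (triples, unmapped_fields)."""
    triples = [(subject, "rdf:type", "lodac:Work")]
    unmapped = {}
    for field, value in raw_record.items():
        prop = WORK_SCHEMA.get(field)
        if prop is None:
            unmapped[field] = value  # keep for later schema extension
        elif value not in (None, ""):
            triples.append((subject, prop, value))
    return triples, unmapped


raw = {  # hypothetical scraped record
    "title": "Untitled landscape",
    "creator": "Artist A",
    "material": "",
    "shelf_code": "B-12",
}
refined, leftover = refine("http://example.org/id/1", raw)
print(refined)
print(leftover)
```

Returning the unmapped fields separately reflects the two policies on the previous slide: reuse existing metadata where a property exists, and define new properties only for what is left over.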
  • Integrate: Integrating Data
    [Diagram labels: Raw Data for entities, Minimum Data to identify entities, Integrated data, Data from Source A, Data from Source B; Work, Museum, Creator; dc:references, dc:creator, crm:P55_has_current_location]
  • Integrate: Integrating Data
    [Table: Integrate Item | Integrating Data Source | Amount of Data | Integration Data; cells in extracted order]
    A. Japanese Art Thesaurus; 648; Facilities; 77; B. Cultural Heritage Online; Title of important cultural properties; Creator information and Work Title; A. Japanese Art Thesaurus (Art work); 915; 3,800; 74; B. DB for National Treasure (Art work); 10,115; A. Japanese Art Thesaurus (Creator); 1,332; 15,020; B. All of art work (Work title string); 61,861; A. Japanese Art Thesaurus (Creator); 1,332; Creator name; 615; B. All of art work title (using creator name); 61,861
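Integration by creator name, as listed above, is a form of ID mapping: identifiers from two sources are linked when their names agree. The slides do not say how names were compared, so the following is only a sketch of one plausible approach (Unicode NFKC normalization, whitespace removal, case folding); the identifiers, sample names and function names are hypothetical.

```python
import unicodedata


def normalize(name: str) -> str:
    """Fold width variants, drop all whitespace, and ignore case."""
    folded = unicodedata.normalize("NFKC", name)
    return "".join(folded.split()).casefold()


def link_by_name(source_a, source_b):
    """Return (id_a, id_b) pairs whose names match after normalization."""
    index = {}
    for id_b, name in source_b.items():
        index.setdefault(normalize(name), []).append(id_b)
    return [
        (id_a, id_b)
        for id_a, name in source_a.items()
        for id_b in index.get(normalize(name), [])
    ]


# Hypothetical identifiers; the names only illustrate spacing differences.
thesaurus = {"a:1": "下村 観山", "a:2": "Artist A"}
catalog = {"b:9": "下村観山", "b:10": "artist  a", "b:11": "Someone Else"}

links = link_by_name(thesaurus, catalog)
print(links)
```

Each resulting pair could then be published as a link between the two resources. Exact-match linking of this kind misses spelling variants and can conflate namesakes, so real integration usually needs additional evidence such as dates or work titles.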
  • Publish: Publishing data as RDF
    • ID-resource URI (own address): http://lod.ac/id/359
    • Ref-resource URI: http://lod.ac/ref/359
    • lodac:creates: links to her/his work URIs
    • dc:references: external link to DBpedia Japanese

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns:lodac="http://lod.ac/ns/lodac#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:skos="http://www.w3.org/2004/02/skos/core#">
      <foaf:Person rdf:about="http://lod.ac/id/359">
        <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
        <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
        <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
        <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
        <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
        ……
        <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
        <dc:references rdf:resource="http://lod.ac/ref/359"/>
        <rdfs:label xml:lang="ja">下村観山</rdfs:label>
        <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
        <foaf:name xml:lang="ja">下村観山</foaf:name>
      </foaf:Person>
    </rdf:RDF>
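A consumer of such a published record can read it with any XML or RDF parser. The sketch below uses only the standard library and an abbreviated copy of the slide's record (two of the work links, no elided part); treating RDF/XML as plain XML works for this simple shape but is not a general RDF parser, and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "foaf": "http://xmlns.com/foaf/0.1/",
    "lodac": "http://lod.ac/ns/lodac#",
    "dc": "http://purl.org/dc/terms/",
}

# Abbreviated from the slide so that the snippet is well-formed on its own.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#"
         xmlns:dc="http://purl.org/dc/terms/">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

root = ET.fromstring(RDF_XML)
person = root.find("foaf:Person", NS)

# Attributes in a namespace are addressed as "{namespace}localname".
person_uri = person.get(f"{{{RDF}}}about")
work_uris = [el.get(f"{{{RDF}}}resource") for el in person.findall("lodac:creates", NS)]
name = person.findtext("foaf:name", namespaces=NS)

print(person_uri, name)
print(work_uris)
```

Each extracted work URI is itself an HTTP URI, so a client can look it up in turn to retrieve the description of the work.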
  • Use: Yokohama Art Spot
    • LODAC Museum × Yokohama Art LOD × PinQA
    • Application using museum and local data
    • Data related to art in Yokohama
      – Collections
      – Events
      – Q&A
    http://lod.ac/apps/yas/
  • Use: System Architecture
    ‣ Python + SPARQLWrapper
    ‣ Geolocation
    [Diagram labels: User, Yokohama Art Spot, Question, Answer, JSON, SPARQL, Yokohama Art LOD, PinQA, LODAC Museum, Work, Event, Artist, Institution]
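The SPARQL and JSON arrows in this architecture can be sketched as building a query URL and reading a result document. The slide names SPARQLWrapper; this sketch deliberately swaps in the standard library so that it runs without extra packages and without network access. The endpoint URL `http://example.org/sparql`, the query itself and the sample response are assumptions for illustration; only the `lodac:creates` property and its namespace come from the published record shown earlier in the deck.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # placeholder, not the real endpoint


def works_by_creator_query(creator_uri: str) -> str:
    """SPARQL query listing up to ten works linked from a creator."""
    return (
        "PREFIX lodac: <http://lod.ac/ns/lodac#>\n"
        f"SELECT ?work WHERE {{ <{creator_uri}> lodac:creates ?work }} LIMIT 10"
    )


def build_query_url(endpoint: str, query: str) -> str:
    """SPARQL protocol: the query travels in the `query` URL parameter."""
    return endpoint + "?" + urlencode({"query": query})


def values_of(result_json: str, variable: str):
    """Pull one variable out of a SPARQL JSON results document."""
    bindings = json.loads(result_json)["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


query = works_by_creator_query("http://lod.ac/id/359")
url = build_query_url(ENDPOINT, query)

# Hypothetical response in the W3C SPARQL JSON results layout.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["work"]},
    "results": {"bindings": [
        {"work": {"type": "uri", "value": "http://lod.ac/id/20029"}},
    ]},
})

print(url)
print(values_of(SAMPLE_RESPONSE, "work"))
```

A real client would send the URL with an `Accept: application/sparql-results+json` header, or let a library such as SPARQLWrapper handle the request and decoding.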
  • Conclusion
    • Data and Web
      – Great potential!
    • Linked Data: exploit the power of the Web
      – Simple structure: URI and RDF
      – Truly distributed data management
      – Easy to link to each other
      – Suitable for inter-disciplinary areas
    • Remaining Issues
      – Scalability
      – Sustainability
        • DOI: DataCite
        • ORCID