
Linking Scientific Metadata (presented at DC2010)

Linked entity data in metadata records builds a foundation for the semantic web. Even though metadata records contain rich entity data, there is no linking between associated entities such as persons, datasets, projects, publications, or organizations. We conducted a small experiment using the dataset collection from the Hubbard Brook Ecosystem Study (HBES), in which we converted the entities and their relationships into RDF triples and linked the URIs contained in the RDF triples to the corresponding entities in the Ecological Metadata Language (EML) records. Through a transformation program written in Extensible Stylesheet Language (XSL), we turned a plain EML record display into an interlinked semantic web of ecological datasets. The experiment suggests the methodological feasibility of incorporating linked entity data into metadata records. The paper also argues for the need to change the scientific as well as the general metadata paradigm.

Published in: Technology, Education

Transcript

  • 1. Linking Entities in Scientific Metadata. Jian Qin, Miao Chen, Xiaozhong Liu, & Andrea Wiggins. School of Information Studies, Syracuse University
  • 2. The context: Islands of research information (Data, Projects, Publications, Research interest, Researchers). 1/30/2015 Linking Entities in Scientific Metadata -- DC2010
  • 3. Unlinked entities: Same entity!
  • 4. Duplication of entity data entry. Example record: Seamless Daily Precipitation for the Conterminous United States. Metadata: Identification_Information, Data_Quality_Information, Spatial_Data_Organization_Information, Spatial_Reference_Information, Entity_and_Attribute_Information, Distribution_Information, Metadata_Reference_Information
  • 5. What’s lacking in scientific metadata?
      • Standards focus on describing datasets, not entities
      • No mechanism is provided for linking entities
          – It is considered an implementation issue
      • Islands of entities → duplication of data entry for the same entity
          – Increased costs and time in creating metadata
          – Impact on resource discovery and browsing
  • 6. Defining the research problem
      • How can we build an interlinked network of entities for a scientific domain?
      • How can we associate the linked entities with their corresponding metadata records?
  • 7. Linked Data: A solution
      • Relational database containing entities and relationships. Problem: not related to metadata records
      • Metadata records in XML format. Problem: lack relationships between entities
      • Convert the database to RDF triples (Resource, PropertyType, Value)
      • Embed RDF triples into the metadata records
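The Resource / PropertyType / Value structure named on this slide can be illustrated with a minimal Python sketch. The base URI, property names, and dataset identifier below are hypothetical placeholders, not values from the HBES data; only the person path echoes the example shown on slide 20.

```python
from collections import namedtuple

# One RDF-style statement: resource (subject), property type (predicate), value (object).
Triple = namedtuple("Triple", ["resource", "property_type", "value"])

# Hypothetical base URI and identifiers, for illustration only.
BASE = "http://example.org/"

triples = [
    Triple(BASE + "page/people/tsiccama", "name", "Thomas G. Siccama"),
    Triple(BASE + "page/people/tsiccama", "creatorOf", BASE + "page/datasets/ds1"),
    Triple(BASE + "page/datasets/ds1", "title", "Example dataset"),
]


def values_of(resource, property_type):
    """Return every value recorded for one resource and property type."""
    return [t.value for t in triples
            if t.resource == resource and t.property_type == property_type]


def linked_resources(resource):
    """Return the values of a resource that are themselves resources (links)."""
    known = {t.resource for t in triples}
    return [t.value for t in triples
            if t.resource == resource and t.value in known]
```

A value that is itself a resource URI is what turns a flat record into a link: `linked_resources` follows such values from a person to the datasets associated with that person.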
  • 8. Linked data: How it works
  • 9. Linked data is “…a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” --Wikipedia, http://en.wikipedia.org/wiki/Linked_Data
  • 10. A case study: Dataset collection search interface at HBES (http://hubbardbrook.org/data/dataset_search.php)
  • 11. Hubbard Brook Ecosystem Study (HBES)
      • Long-term ecological research site since the 1960s
      • 3,160-hectare reserve
      • Six principal organizations & 10 other participants:
          – USDA Forest Service
          – Cornell
          – Dartmouth
          – Syracuse
          – Yale
          – the Institute of Ecosystem Studies (IES)
          – the U.S. Geological Survey
      • Over 300 datasets and 2,000 publications available
  • 12. HBES Data Collection
      • Focused on entities on the HBES site:
          – Projects
          – Persons
          – Publications
          – Subject interests
          – Datasets
          – Events
      • Verified Person and Project information against the Long-Term Ecological Research (LTER) directory where necessary
      • Stored the entities in a relational database
      • Metadata records in EML format
  • 13. Ecological Metadata Language (EML) Structure and Modules
  • 14. Conditions required for interlinking
      • URI-identified entities
      • Relationships between these entities
      • Relationships between the entities and metadata records
  • 15. Experiment stage 1: Data prep
      • Two sets of data:
          – Entities and their relationships
              • Person, subject interest, project, dataset, and paper
              • Many-to-many relations between the entities
          – Sample EML records in XML format
              • Downloaded from HBES website
              • Entity URIs added to the corresponding XML files to be used as semantic identifiers and hyperlinks to the entities
              • 126 XML files in total
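The many-to-many relations mentioned on this slide are commonly stored in a relational database through a junction table. The sketch below shows that pattern with Python's built-in `sqlite3` module; the table names, column names, and sample rows are hypothetical and do not reflect the actual HBES schema.

```python
import sqlite3

# In-memory database with a hypothetical person/dataset schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE dataset (id TEXT PRIMARY KEY, title TEXT);
    -- Junction table: one row per (person, dataset) association.
    CREATE TABLE person_dataset (
        person_id  TEXT REFERENCES person(id),
        dataset_id TEXT REFERENCES dataset(id),
        PRIMARY KEY (person_id, dataset_id)
    );
    INSERT INTO person  VALUES ('p1', 'Person One'), ('p2', 'Person Two');
    INSERT INTO dataset VALUES ('d1', 'Dataset One'), ('d2', 'Dataset Two');
    INSERT INTO person_dataset VALUES ('p1', 'd1'), ('p1', 'd2'), ('p2', 'd1');
""")


def datasets_for(person_id):
    """Titles of all datasets associated with one person."""
    rows = conn.execute(
        "SELECT d.title FROM dataset d "
        "JOIN person_dataset pd ON pd.dataset_id = d.id "
        "WHERE pd.person_id = ? ORDER BY d.id",
        (person_id,),
    )
    return [title for (title,) in rows]


def people_for(dataset_id):
    """Names of all people associated with one dataset."""
    rows = conn.execute(
        "SELECT p.name FROM person p "
        "JOIN person_dataset pd ON pd.person_id = p.id "
        "WHERE pd.dataset_id = ? ORDER BY p.id",
        (dataset_id,),
    )
    return [name for (name,) in rows]
```

The same junction-table pattern would extend to the other entity pairs listed on the slide, such as person and project or dataset and paper.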
  • 16. Entity relationships
  • 17. Experiment stage 2: Converting to RDF
      • Toolkit: D2R, a service for converting relational databases into RDF triples and publishing them on the web
          – Turn each table into a class
          – Turn each column into a class property
          – Make each value in a column an instance
          – Assign a URI to each class, property, and instance
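The table-to-class and column-to-property mapping can be sketched in plain Python. This is not D2R itself and does not use its mapping language: it is a simplified illustration that assigns an instance URI to each row and keeps column values as literals. The base URI, the URI patterns, and the `people` table are assumptions made for the example.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI


def table_to_triples(conn, table, key_column):
    """Map one table to (subject, predicate, object) triples.

    table  -> class URI
    column -> property URI
    row    -> instance URI built from the key column (a simplification)
    """
    class_uri = f"{BASE}vocab/{table}"
    # The table name is trusted here; this sketch is not safe for untrusted input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        instance_uri = f"{BASE}page/{table}/{record[key_column]}"
        triples.append((instance_uri, "rdf:type", class_uri))
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((instance_uri, f"{class_uri}_{column}", value))
    return triples


# Hypothetical one-row table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id TEXT PRIMARY KEY, given_name TEXT, sur_name TEXT)"
)
conn.execute("INSERT INTO people VALUES ('tsiccama', 'Thomas G.', 'Siccama')")
people_triples = table_to_triples(conn, "people", "id")
```

Each row yields one type statement plus one statement per non-key column, so the single row above produces three triples.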
  • 19. Experiment stage 3: Incorporating URIs into XML records
      • Add the URIs generated by the D2R software to their corresponding entities in EML records by using an XSL program
      • Transform the EML records with inserted URIs into HTML format for display in a browser
  • 20. Example of name with URI inserted
      Original EML record without URI:
        <individualName>
          <givenName>Thomas G</givenName>
          <surName>Siccama</surName>
        </individualName>
      URI added to individual name element:
        <individualName>
          <givenName>Thomas G.</givenName>
          <surName>Siccama</surName>
          <personURI>page/people/tsiccama</personURI>
        </individualName>
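The slides describe this insertion as done by an XSL program. As a rough equivalent, the sketch below performs the same step with Python's standard `xml.etree.ElementTree` instead. Matching people by surname, the `PERSON_URIS` lookup table, and the `<creator>` wrapper element are assumptions made for illustration, not details confirmed by the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from surname to the URI path produced in stage 2.
PERSON_URIS = {"Siccama": "page/people/tsiccama"}


def add_person_uris(xml_text, uri_by_surname):
    """Append a <personURI> child to each <individualName> with a known surname."""
    root = ET.fromstring(xml_text)
    for name in root.iter("individualName"):
        surname = name.findtext("surName", default="").strip()
        uri = uri_by_surname.get(surname)
        # Skip unknown people and names that already carry a URI.
        if uri and name.find("personURI") is None:
            ET.SubElement(name, "personURI").text = uri
    return ET.tostring(root, encoding="unicode")


record = (
    "<creator><individualName>"
    "<givenName>Thomas G</givenName><surName>Siccama</surName>"
    "</individualName></creator>"
)
linked = add_person_uris(record, PERSON_URIS)
```

Matching on surname alone would mislink people who share a surname; a real pipeline would need a more reliable key, which the slides do not specify.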
  • 21. Original display of EML record vs. RDF-enabled display of EML record
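The difference between the two displays can be approximated as follows: a name with a `personURI` becomes a hyperlink, while a name without one stays plain text. This Python sketch stands in for the XSL transformation the authors describe; `BASE_URL` and the exact HTML output are assumptions.

```python
import html
import xml.etree.ElementTree as ET

BASE_URL = "http://example.org/"  # hypothetical host for the entity pages


def render_person(name_element):
    """Render one <individualName> as HTML, hyperlinked when a personURI exists."""
    given = name_element.findtext("givenName", default="").strip()
    sur = name_element.findtext("surName", default="").strip()
    label = html.escape(f"{given} {sur}".strip())
    uri = name_element.findtext("personURI", default="").strip()
    if not uri:
        return f"<span>{label}</span>"
    href = html.escape(BASE_URL + uri, quote=True)
    return f'<a href="{href}">{label}</a>'


element = ET.fromstring(
    "<individualName><givenName>Thomas G. </givenName>"
    "<surName>Siccama</surName>"
    "<personURI>page/people/tsiccama </personURI></individualName>"
)
link_html = render_person(element)
```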
  • 22. Discussion
      • Methodology for transforming islands of entities into linked scientific metadata
      • A larger-scale data set is needed to test its scalability
      • Potential:
          – Reducing duplicate entity data entry
          – Applicable to legacy metadata generated using older data models
          – Linking semantic data already published on the web
          – Facilitating data/metadata visualization?
  • 23. DEMO http://sdl.syr.edu/eml/
