Successfully reported this slideshow.
Your SlideShare is downloading. ×

Facilitating Data Curation: a Solution Developed in the Toxicology Domain

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 18 Ad

Facilitating Data Curation: a Solution Developed in the Toxicology Domain

Download to read offline

Christophe Debruyne, Jonathan Riggio, Emma Gustafson, Declan O'Sullivan, Mathieu Vinken, Tamara Vanhaecke, Olga De Troyer.

Presented at the 2020 IEEE 14th International Conference on Semantic Computing, San Diego, California, 3-5 February 2020

Toxicology aims to understand the adverse effects of
chemical compounds or physical agents on living organisms. For
chemicals, much information regarding safety testing of cosmetic
ingredients is now scattered in a plethora of safety evaluation
reports. Toxicologists in our university intend to collect this
information into a single repository. Their current approach uses
spreadsheets, does not scale well, and makes data curation and
querying cumbersome. Semantic technologies (e.g., RDF, OWL,
and Linked Data principles) would be more appropriate for
this purpose. However, this technology is not very accessible to
toxicologists without extensive training. In this paper, we report
on a tool that supports subject matter experts in the construction
of an RDF–based knowledge base for the toxicology domain. The
tool is using the jigsaw metaphor for guiding the subject matter
experts. We demonstrate that the jigsaw metaphor is a viable
option for generating RDF. Future work includes investigating
appropriate methods and tools for knowledge evolution and data
analysis.

Christophe Debruyne, Jonathan Riggio, Emma Gustafson, Declan O'Sullivan, Mathieu Vinken, Tamara Vanhaecke, Olga De Troyer.

Presented at the 2020 IEEE 14th International Conference on Semantic Computing, San Diego, California, 3-5 February 2020

Toxicology aims to understand the adverse effects of
chemical compounds or physical agents on living organisms. For
chemicals, much information regarding safety testing of cosmetic
ingredients is now scattered in a plethora of safety evaluation
reports. Toxicologists in our university intend to collect this
information into a single repository. Their current approach uses
spreadsheets, does not scale well, and makes data curation and
querying cumbersome. Semantic technologies (e.g., RDF, OWL,
and Linked Data principles) would be more appropriate for
this purpose. However, this technology is not very accessible to
toxicologists without extensive training. In this paper, we report
on a tool that supports subject matter experts in the construction
of an RDF–based knowledge base for the toxicology domain. The
tool is using the jigsaw metaphor for guiding the subject matter
experts. We demonstrate that the jigsaw metaphor is a viable
option for generating RDF. Future work includes investigating
appropriate methods and tools for knowledge evolution and data
analysis.

Advertisement
Advertisement

More Related Content

Similar to Facilitating Data Curation: a Solution Developed in the Toxicology Domain (20)

More from Christophe Debruyne (20)

Advertisement

Recently uploaded (20)

Facilitating Data Curation: a Solution Developed in the Toxicology Domain

  1. 1. Facilitating Data Curation: a Solution Developed in the Toxicology Domain Christophe Debruyne1,3, Jonathan Riggio1, Emma Gustafson2, Declan O'Sullivan3, Mathieu Vinken2, Tamara Vanhaecke2, and Olga De Troyer1 1 WISE – Vrije Universiteit Brussel, Belgium 2 IVTD – Vrije Universiteit Brussel, Belgium 3 ADAPT – Trinity College Dublin, Ireland 2020-02-05 @ ICSC
  2. 2. Context • The goal of toxicology is to understand the adverse effects of chemical compounds or physical agents on living organisms. • The IVTD Group in the VUB aims to collect safety testing data of cosmetic ingredients available in publicly available safety evaluation reports with the goal of creating a knowledge base. 2/5/20 ICSC 2020 2
  3. 3. Problem • Subject matter experts currently rely on spreadsheets, which the experts consider a “computational database” – a term that experts in this domain use for any technology and representation allowing for some data manipulation and analysis, ranging from spreadsheets to more complex (domain-specific) database technologies. • Spreadsheets, however, causes problems for: data curation, data analysis, and data exploration. 2/5/20 ICSC 2020 3
  4. 4. Problem • Semantic Technologies can overcome the challenges that come with spreadsheets, and furthermore allows data to be linked with external datasets. • But these technologies are not easily accessible to these subject matter experts [1] • How can we facilitate subject matter experts in the domain of toxicology in the creation of a knowledge base for the available safety evaluation reports? 2/5/20 ICSC 2020 4
  5. 5. Structure of a Spreadsheet 2/5/20 ICSC 2020 5
  6. 6. Structure of a Spreadsheet 2/5/20 ICSC 2020 6
  7. 7. Proposed Solution: Using the Jigsaw Metaphor to Guide Experts in Creating Resources • An interface metaphor [2] is drawing upon the knowledge of familiar concepts to facilitate learning and using a system. • The Jigsaw Metaphor is drawing upon one’s familiarity with jigsaw puzzle pieces, and has been successful for other tasks: programming, querying, data integration,… • In our proposed solution, the Jigsaw puzzle pieces guide subject matter experts in creating valid data entries. 2/5/20 ICSC 2020 7
  8. 8. The Prototype 2/5/20 ICSC 2020 8 The prototype is built on top of Google Blockly for the metaphor and Apache Jena for the knowledge base.
  9. 9. The Prototype 2/5/20 ICSC 2020 9
  10. 10. Knowledge Organization 2/5/20 ICSC 2020 10 ns:Test a owl:Class . ns:SkinAbsorptionInVitroNonOECD a owl:Class . ns:year a owl:DatatypeProperty ; owl:range xsd:string . ... Ontology Stored as an OWL file on the Web with the namespace as the location (URL)
  11. 11. Knowledge Organization 2/5/20 ICSC 2020 11 ... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECD... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECDns:Test a owl:Class . ns:SkinAbsorptionInVitroNonOECD a owl:Class . ns:year a owl:DatatypeProperty ; owl:range xsd:string . ... Ontology Representation: attributes and attribute groups ... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECD Stored as an OWL file on the Web with the namespace as the location (URL) Each graph stored as a named graph in triplestore 1
  12. 12. Knowledge Organization 2/5/20 ICSC 2020 12 <http://.../dossier/7> a ns:Report ; ns:contains [ a ns:SkinAbsorptionInVitroNonOECD ; ns:year "ND"^^xsd:string ; ... ] ; ... . Graph <http://.../dossier/7> <http://.../dossier/7> a ns:Report ; ns:contains [ a ns:SkinAbsorptionInVitroNonOECD ; ns:year "ND"^^xsd:string ; ... ] ; ... . Graph <http://.../dossier/7> <http://.../dossier/7> a ns:Report ; ns:contains [ a ns:SkinAbsorptionInVitroNonOECD ; ns:year "ND"^^xsd:string ; ... ] ; ... . Graph <http://.../dossier/7> ... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECD... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECDns:Test a owl:Class . ns:SkinAbsorptionInVitroNonOECD a owl:Class . ns:year a owl:DatatypeProperty ; owl:range xsd:string . ... Ontology Representation: attributes and attribute groups Instances of reports and studies ... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECD Stored as an OWL file on the Web with the namespace as the location (URL) Each graph stored as a named graph in triplestore 1 Each graph stored as a named graph in triplestore 2
  13. 13. Knowledge Organization 2/5/20 ICSC 2020 13 <http://.../dossier/7> a ns:Report ; ns:contains [ a ns:SkinAbsorptionInVitroNonOECD ; ns:year "ND"^^xsd:string ; ... ] ; ... . Graph <http://.../dossier/7> <http://.../dossier/7> a ns:Report ; ns:contains [ a ns:SkinAbsorptionInVitroNonOECD ; ns:year "ND"^^xsd:string ; ... ] ; ... . Graph <http://.../dossier/7> <http://.../dossier/7> a ns:Report ; ns:contains [ a ns:SkinAbsorptionInVitroNonOECD ; ns:year "ND"^^xsd:string ; ... ] ; ... . Graph <http://.../dossier/7> ... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECD... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECDns:Test a owl:Class . ns:SkinAbsorptionInVitroNonOECD a owl:Class . ns:year a owl:DatatypeProperty ; owl:range xsd:string . ... Ontology Representation: attributes and attribute groups Instances of reports and studies ... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECD Stored as an OWL file on the Web with the namespace as the location (URL) Each graph stored as a named graph in triplestore 1 Each graph stored as a named graph in triplestore 2
  14. 14. Knowledge Organization 2/5/20 ICSC 2020 14 <http://.../dossier/7> a ns:Report ; ns:contains [ a ns:SkinAbsorptionInVitroNonOECD ; ns:year "ND"^^xsd:string ; ... ] ; ... . Graph <http://.../dossier/7> <http://.../dossier/7> a ns:Report ; ns:contains [ a ns:SkinAbsorptionInVitroNonOECD ; ns:year "ND"^^xsd:string ; ... ] ; ... . Graph <http://.../dossier/7> <http://.../dossier/7> a ns:Report ; ns:contains [ a ns:SkinAbsorptionInVitroNonOECD ; ns:year "ND"^^xsd:string ; ... ] ; ... . Graph <http://.../dossier/7> ... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECD... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECDns:Test a owl:Class . ns:SkinAbsorptionInVitroNonOECD a owl:Class . ns:year a owl:DatatypeProperty ; owl:range xsd:string . ... Ontology Representation: attributes and attribute groups Instances of reports and studies ... ns:attribute [ ns:predicate ns:year ; ns:order "C" ] ; ... Graph ns:SkinAbsorptionInVitroNonOECD Stored as an OWL file on the Web with the namespace as the location (URL) Each graph stored as a named graph in triplestore 1 Used for generating jigsaw pieces Composed reports used for generating RDF Experts ”assemble” reports Each graph stored as a named graph in triplestore 2
  15. 15. Representing the Structure of Puzzle Pieces • ns:attributeGroup relating instances of ns:Study or ns:AttributeGroup to instances of ns:AttributeGroup; • ns:attribute relating instances of ns:Study and ns:AttributeGroup to instances of ns:Attribute; • ns:predicate relates an ns:Attribute to a predicate from the ontology; • ns:order and ns:color. 2/5/20 ICSC 2020 15
  16. 16. Generating RDF Blockly.Blocks['...#SkinAbsorptionInVitroNonOECD-Endpoints'] = { init: function() { this.jsonInit({ "type": "...#SkinAbsorptionInVitroNonOECD-Endpoints", "message0": 'Endpoints', "message1": 'methodOfAnalysis %1', "args1": [{ "type": "field_input", "name": "http://ontologies.vub.be/oecd#methodOfAnalysis" }], // OMMITTED FOR BREVITY "output": "...#SkinAbsorptionInVitroNonOECD-Endpoints", "colour": 202 }); } }; 2/5/20 ICSC 2020 16 IRIs of the ontology are encoded in the block definitions. Blockly XML is then transformed into RDF
  17. 17. Conclusions and Future Work • We proposed the adoption of a jigsaw metaphor to facilitate data curation by subject matter experts. • Future work is twofold: – Use of the tool by subject matter experts for foreseen in 2020 – Adoption of the jigsaw metaphor defining and maintaining the various blocks 2/5/20 ICSC 2020 17
  18. 18. References 1. L. McKenna, C. Debruyne, and D. O'Sullivan, "Understanding the position of information professionals with regards to linked data: A survey of libraries, archives and museums,'' in Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, JCDL 2018, Fort Worth, TX, USA, June 03-07, 2018, J. Chen et al., Eds. ACM, 2018, pp. 7—16. 2. T. D. Erickson, "Working with interface metaphors,'' in Readings in Human- Computer Interaction. Elsevier, 1995, pp. 147—151. 2/5/20 ICSC 2020 18

×