Olga Giraldo presented work on formalizing laboratory protocols using ontologies and natural language processing tools to extract information. The goals are to develop a recommendation system to match protocols to users' situations and enable content-based retrieval of protocol information. An ontology was created by analyzing 175 protocols to represent protocols as documents and workflows. Future work involves analyzing protocols to identify English keywords and constructs to write rules for information extraction. The purpose is intelligent extraction of protocol information.
SMART Protocols in LISC-2014
1. SMART Protocols: SeMAntic RepresenTation for Experimental Protocols
Olga Giraldo
ogiraldo@fi.upm.es
Ontology engineering group (OEG)
Universidad Politécnica de Madrid
2. Agenda
• What is a lab protocol
• Motivation
• Our general research question
• Our assumption
• Our proposal
• Preliminary results
• Future work
3. What is a lab protocol
• Laboratory protocols are like cooking recipes
• They have ingredients: reagents and sample,
• They have appliances: equipment,
• They have a total time,
• They have a list of instructions,
• They have critical steps.
• Laboratory protocols are the “how to do” of an experiment.
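The recipe analogy above can be sketched as a small data structure. This is only an illustration: the class names, field names, and the sample protocol are hypothetical and are not part of the SMART Protocols ontology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instruction:
    """One line of the protocol's list of instructions."""
    text: str
    critical: bool = False  # True when the step is a critical step


@dataclass
class LabProtocol:
    """Hypothetical container mirroring the recipe-like parts of a protocol."""
    title: str
    reagents: List[str] = field(default_factory=list)    # "ingredients"
    samples: List[str] = field(default_factory=list)     # "ingredients"
    equipment: List[str] = field(default_factory=list)   # "appliances"
    total_time_min: Optional[int] = None                  # total time, if stated
    instructions: List[Instruction] = field(default_factory=list)

    def critical_steps(self) -> List[str]:
        """Return the text of every instruction flagged as critical."""
        return [step.text for step in self.instructions if step.critical]


# Made-up example record, for illustration only.
protocol = LabProtocol(
    title="Hypothetical DNA extraction",
    reagents=["wash buffer"],
    samples=["leaf tissue"],
    equipment=["water bath", "centrifuge"],
    total_time_min=90,
    instructions=[
        Instruction("Incubate the samples for 5 min with gentle shaking."),
        Instruction("Rinse DNA briefly in 1-2 ml of wash.", critical=True),
    ],
)

print(protocol.critical_steps())
```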
4. Some problems in lab protocols
Some protocols present insufficient granularity,
and the instructions can be
imprecise or ambiguous due to
the use of natural language.
• Incubate the
centrifuge tubes in a
water bath.
• Incubate the samples
for 5 min with gentle
shaking.
• Rinse DNA briefly in
1-2 ml of wash.
• Incubate at -20C
overnight.
5. Why do we need to formalize and extract information from
lab protocols?
Because we want a recommendation system…
• That matches protocols according to my situation, for
instance
• samples I have,
• availability of equipment, reagents, lab conditions
• expertise
We also want content-based information retrieval
• Meaningful sentences, sample used, purpose of the
protocol, applicability, critical steps, etc. Also,
identification of instructions
• Find all protocols for DNA extraction that have been used in
Oryza sativa that are suitable for processing a large number of
samples with a low execution time.
Motivation
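The example query above (DNA extraction, Oryza sativa, many samples, low execution time) could look like the following filter once the relevant facts have been extracted from each protocol. This is a sketch under assumptions: the record fields, the thresholds, and the four records are invented for illustration and do not come from the 175 analyzed protocols.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProtocolRecord:
    """Hypothetical metadata extracted from one protocol."""
    title: str
    purpose: str
    organism: str
    max_samples_per_run: int
    execution_time_min: int


def find_protocols(records: List[ProtocolRecord], purpose: str, organism: str,
                   min_samples: int, max_time_min: int) -> List[ProtocolRecord]:
    """Keep records matching purpose and organism that handle at least
    min_samples per run within max_time_min; fastest first."""
    hits = [
        r for r in records
        if r.purpose == purpose
        and r.organism == organism
        and r.max_samples_per_run >= min_samples
        and r.execution_time_min <= max_time_min
    ]
    return sorted(hits, key=lambda r: r.execution_time_min)


# Invented records, for illustration only.
RECORDS = [
    ProtocolRecord("Protocol A", "DNA extraction", "Oryza sativa", 96, 60),
    ProtocolRecord("Protocol B", "DNA extraction", "Oryza sativa", 8, 240),
    ProtocolRecord("Protocol C", "DNA extraction", "Zea mays", 96, 45),
    ProtocolRecord("Protocol D", "RNA extraction", "Oryza sativa", 96, 30),
]

matches = find_protocols(RECORDS, "DNA extraction", "Oryza sativa",
                         min_samples=48, max_time_min=120)
print([r.title for r in matches])
```

What counts as "a large number of samples" or "a low execution time" is a choice the user would have to make; the thresholds here are placeholders.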
7. Our assumption
“Experimental protocols
are fundamental
information structures that
should support the
description of the
processes by means of
which results are
generated in experimental
research”
9. Methods to represent and extract information
• Gazetteer-based method: use existing lists of named
entities
Lists of proper nouns, which refer to real-life entities
• Rule-based approaches: write manual extraction rules
• Combination of the above
• Ontology model representing lab protocols
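A minimal sketch of how the gazetteer-based and rule-based methods listed above can be combined, using two of the example instructions from the earlier slide. The gazetteer entries and the two regular-expression rules are assumptions made for this illustration; they are not the rules written for SMART Protocols.

```python
import re

# Hypothetical gazetteer: label -> known terms.
GAZETTEER = {
    "equipment": ["water bath", "centrifuge tubes"],
    "sample": ["DNA", "samples"],
}

# Hand-written rules (assumed for this sketch).
ACTION_RULE = re.compile(r"^(?P<action>[A-Z][a-z]+)\b")  # leading verb
DURATION_RULE = re.compile(r"\bfor (?P<value>\d+(?:\.\d+)?) ?(?P<unit>min|h|s)\b")


def gazetteer_matches(sentence):
    """Return (label, term) pairs for gazetteer terms found in the sentence."""
    lowered = sentence.lower()
    found = []
    for label, terms in GAZETTEER.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered):
                found.append((label, term))
    return found


def extract(sentence):
    """Combine the rules and the gazetteer lookup for one instruction."""
    action = ACTION_RULE.match(sentence)
    duration = DURATION_RULE.search(sentence)
    return {
        "action": action.group("action").lower() if action else None,
        "duration": (float(duration.group("value")), duration.group("unit"))
        if duration else None,
        "entities": gazetteer_matches(sentence),
    }


first = extract("Incubate the samples for 5 min with gentle shaking.")
second = extract("Incubate the centrifuge tubes in a water bath.")
print(first)
print(second)
```

The second instruction yields no duration, which makes the kind of missing detail described in the problems slide visible to a program.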
12. Methodology used to develop SMART Protocols
Kick-off
• Gathering use cases.
• Gathering competency questions.
Conceptualization
&
Formalization
• DAKA - Domain Analysis and Knowledge Acquisition
Analysis of 175 experimental protocols.1
• LISA - Linguistic and Semantic Analysis
Identification of key metadata for reporting protocols,2
Determination of workflow aspects in protocols (implicit
order in the instructions, following the input-output
structure).
Extraction of elements pertaining to domain knowledge
(e.g. classification of protocols into groups according to their
purpose; within each group, basic steps (or
common patterns) were identified according to the type of protocol).
• IO - Iterative Ontology building
Design of conceptual maps and draft ontologies. The
ontology modules were gathered from the DAKA and LISA
activities and exchanged with domain experts.
Evaluation
&
Evolution
• OWL
• Correction of syntactic inconsistencies by using OWLViz3
and OOPS4
• The ontology model evolves as new knowledge goes
through the whole cycle.
1 http://goo.gl/MC4mR9
2 goo.gl/gAVnn
3 http://protegewiki.stanford.edu/wiki/OWLViz
4 http://oeg-lia3.dia.fi.upm.es/oops/index-content.jsp
13. SMART Protocols - document
The Protocol as a document
sp:application of the protocol
sp:advantage of the protocol
sp:limitation of the protocol
sp:provenance of the protocol
sp:purpose of the protocol
sp:introduction section
sp:buffer list
sp:equipment and supplies list
sp:kit list
sp:primer list
sp:reagent list
sp:software list
sp:solution list
sp:materials section
exact:caution
sp:critical step
sp:hint
sp:pause point
sp:storage condition
sp:timing
sp:troubleshooting
sp:methods section
sp:experimental protocol
Other classes in the diagram: iao:document, iao:document part, iao:textual entity, iao:data set, exact:alert message
Relations in the diagram: owl:subClassOf, ro:hasPart, ro:partOf
It is an extension of the IAO ontology.
It supports rhetorical and structural components (e.g. introduction, materials, and methods).
It supports information such as the application of the protocol, advantages and limitations, list of
reagents, and critical steps.
SMART Protocols ontology is available here:
http://vocab.linkeddata.es/SMARTProtocols/
14. SMART Protocols - wf
Basic Steps of DNA Extraction
Classes in the diagram: sp:basic step of DNA extraction, p-plan:Step, p-plan:Variable, sp:cell disruption, sp:digestion reaction, sp:DNA purification, sp:plant tissue, sp:powdered tissue, sp:digested contaminant, obi:DNA extract
Relations in the diagram: owl:subClassOf, p-plan:hasInputVariable, p-plan:hasOutputVariable, bfo:isPrecededBy
• It is an extension of the P-Plan Ontology.
• It represents the workflow aspects of protocols:
the implicit order in the instructions, following the input-output structure.
SMART Protocols ontology is available here:
http://vocab.linkeddata.es/SMARTProtocols/
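The idea that step order follows from the input-output structure can be sketched in plain Python. The step and variable names are taken from the diagram, but which variable is the input or output of which step is an assumption made for this example, since the extracted slide does not preserve the arrows. The helper function names are hypothetical.

```python
# Assumed input/output assignment (not confirmed by the slide text).
# Steps are deliberately listed out of order.
STEPS = {
    "DNA purification": {"inputs": {"digested contaminant"}, "outputs": {"DNA extract"}},
    "cell disruption": {"inputs": {"plant tissue"}, "outputs": {"powdered tissue"}},
    "digestion reaction": {"inputs": {"powdered tissue"}, "outputs": {"digested contaminant"}},
}


def preceded_by(steps):
    """For each step, list the steps whose outputs it consumes
    (the role bfo:isPrecededBy plays in the diagram)."""
    return {
        name: sorted(
            other for other, other_step in steps.items()
            if other != name and other_step["outputs"] & step["inputs"]
        )
        for name, step in steps.items()
    }


def execution_order(steps):
    """Order steps so that every step comes after the steps it depends on."""
    remaining = preceded_by(steps)
    ordered = []
    while remaining:
        ready = sorted(
            name for name, preds in remaining.items()
            if all(p in ordered for p in preds)
        )
        if not ready:
            raise ValueError("cyclic input/output dependencies")
        ordered.extend(ready)
        for name in ready:
            del remaining[name]
    return ordered


print(preceded_by(STEPS))
print(execution_order(STEPS))
```

Under the assumed assignment, the order is recovered from the variables alone, without any explicit step numbering.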
15. New and reused terms
Resource        No. of terms
OBI             15
NCIthesaurus    9
CHEBI           7
IAO             7
MGEDOntology    3
P-Plan          3
NPO             3
EXACT           2
SO              2
MeSH            1
• Reused classes = 52
• Reused properties = 4
Property            Origin    Reused in
isManufacturedBy    OBI       SMART Protocols-Document
hasInputVar         P-Plan    SMART Protocols-Workflow
hasOutputVar        P-Plan    SMART Protocols-Workflow
isStepOfPlan        P-Plan    SMART Protocols-Workflow
• New terms
Ontology                    No. of classes    No. of properties
SMART Protocols-Document    60                7
SMART Protocols-Workflow    44                1
Total                       104               8
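The counts on this slide can be cross-checked with a few lines of arithmetic. The numbers are copied from the tables above; the variable names are made up, and reading "No. of terms" as reused classes is an assumption based on the stated total of 52.

```python
# Reused terms per resource, as listed on the slide.
REUSED_TERMS = {
    "OBI": 15, "NCIthesaurus": 9, "CHEBI": 7, "IAO": 7, "MGEDOntology": 3,
    "P-Plan": 3, "NPO": 3, "EXACT": 2, "SO": 2, "MeSH": 1,
}

# New terms per module: (classes, properties).
NEW_TERMS = {
    "SMART Protocols-Document": (60, 7),
    "SMART Protocols-Workflow": (44, 1),
}

total_reused = sum(REUSED_TERMS.values())
new_classes = sum(classes for classes, _ in NEW_TERMS.values())
new_properties = sum(props for _, props in NEW_TERMS.values())

print(len(REUSED_TERMS), total_reused, new_classes, new_properties)
```

The sums agree with the figures reported on the slide: 52 reused classes from 10 resources, and 104 new classes and 8 new properties.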
17. Work in progress
• Analysis of the protocols, focusing on the
identification of keywords and/or constructs in
English, e.g. instructions, actions.
• Writing rules.
• Executing, testing and debugging the rules.
18. Summarizing…
Our purpose is the
formalization of lab
protocols by using
ontologies and NLP
tools to intelligently
extract information.
19. Special thanks…
Supervisors
Oscar Corcho Alexander Garcia
OEG’s colleagues
Daniel Garijo María Poveda Pablo Calleja Nandana
Mihindukulasooriya
Olga Giraldo
ogiraldo@fi.upm.es
oxgiraldo@gmail.com
Ontology engineering group (OEG)
Universidad Politécnica de Madrid
Editor's Notes
As I mentioned before, an experimental protocol is the "how to do" of an experiment. For this reason, our assumption is that experimental protocols are…
What do we propose?
This set of methods to represent and intelligently extract information from laboratory protocols: the first one is an ontology model…
The use of a gazetteer-based method, that is, a list of entities or objects from lab protocols that we would like to retrieve.
The manual creation of rules.
And a combination of all of these methods.
Which results have we obtained?
The development of two ontology modules: one represents the metadata for reporting a laboratory protocol, and the other represents the protocol as an executable element.
Currently, our ontologies reuse 52 classes from 10 ontologies. They also reuse 4 properties from two ontologies, and 104 new classes and 8 new properties were proposed. Both modules were designed in OWL, reuse the BFO ontology and in general follow the good practices recommended by the OBO consortium for the design of ontologies.