SlideShare a Scribd company logo
1 of 38
The electroni
org/10.1098/
uk.
*Author for c
Received 19 A
Accepted 9 M
An ontology of scientific experiments
Larisa N. Soldatova* and Ross D. King
The University of Wales, Aberystwyth, Penglais, Ceredigion
SY23 3DB, UK
The formal description of experiments for efficient analysis,
annotation and sharing of results
is a fundamental part of the practice of science. Ontologies are
required to achieve this
objective. A few subject-specific ontologies of experiments
currently exist. However, despite
the unity of scientific experimentation, no general ontology of
experiments exists. We
propose the ontology EXPO to meet this need. EXPO links the
SUMO (the Suggested Upper
Merged Ontology) with subject-specific ontologies of
experiments by formalizing the generic
concepts of experimental design, methodology and results
representation. EXPO is expressed
in the W3C standard ontology language OWL-DL. We
demonstrate the utility of EXPO and
its ability to describe different experimental domains, by
applying it to two experiments: one
in high-energy physics and the other in phylogenetics. The use
of EXPO made the goals and
structure of these experiments more explicit, revealed
ambiguities, and highlighted an
unexpected similarity. We conclude that, EXPO is of general
value in describing experiments
and a step towards the formalization of science.
Keywords: ontology; formalization; annotation; artificial
intelligence; metadata
1. INTRODUCTION
A fundamental part of scientific practice is to increase
our knowledge of the world through the performance of
experiments. This knowledge should, ideally, be
expressed in a formal logical language. To quote the
Encyclopaedia Britannica, ‘most analytical philoso-
phers of science have explicitly based their program on
a presupposition inherited from Descartes and Plato,
viz. that the intellectual content of any natural science
can be expressed in a formal propositional system,
having a definite, essential logical structure’ (Toulmin
2004). It is possible to quibble with the restriction to
propositional systems, but the desirability of the use of
formal languages is rarely disputed in the philosophy of
science. Formal languages promote semantic clarity,
which in turn supports the free exchange of scientific
knowledge and simplifies scientific reasoning (Curd &
Cover 1988).
Now, at the beginning of the twenty-first century,
the formalization of scientific knowledge is no longer
just a philosophical desirable, it is becoming a
technological necessity. In all areas of science there is
ever more information to assimilate and, in some fields,
this increase in information has become a ‘deluge’
(Hey & Trefethon 2003). The result is that science
increasingly depends on computers to store, integrate
and analyse data. The full power of computers—which
originated as a spin-off from the formalization of
mathematics (Turing 1936)—can only be efficiently
c supplementary material is available at http://dx.doi.
rsif.2006.0134 or via http://www.journals.royalsoc.ac.
orrespondence ([email protected]).
pril 2006
ay 2006 795
exploited when the knowledge they work with is
formalized. This line of reasoning is the motivation
for the development of e-Science, with its vision of
linking papers, data, metadata and analysis methods
together (www.nesc.ac.uk). It is also the driving force
behind the development of the Semantic Web (www.
w3.org/2001/sw/).
The first step in formalizing knowledge is to define an
explicit ontology, i.e. to describe what exists. As the
most characteristic feature of science is experi-
mentation, it follows that the development of ontology
of experiments is a fundamental step in the formaliza-
tion of science. It is therefore surprising that no general-
purpose ontology of scientific experiments currently
exists. In this paper we propose the most general
elements of a common ontology of scientific experi-
ments (EXPO). We aim to formalize generic knowledge
about scientific experimental design, methodology and
results representation. Such a common ontology is both
feasible and desirable because all the sciences follow the
same experimental principles. Despite their different
subject matter, all the sciences organize, execute and
analyse experiments in similar ways; they use related
instruments and materials; they describe experimental
results in identical formats, dimensional units, etc. The
aim of EXPO is to abstract out the fundamental
concepts in formalizing experiments that are domain
independent (figure 1). The advantage of this is that
generic knowledge about experiments is held in only
one place; ensuring consistency, clean updating and
non-redundancy. The practical benefit is that if in an
experiment, multiple sciences are involved (e.g. meta-
bolomics and organic chemistry or radio astronomy and
physical chemistry), then common experimental meta-
data will only need to be recorded once rather than
J. R. Soc. Interface (2006) 3, 795–803
doi:10.1098/rsif.2006.0134
Published online 6 June 2006
q 2006 The Royal Society
http://www.nesc.ac.uk
http://www.w3.org/2001/sw/
http://www.w3.org/2001/sw/
http://dx.doi.org/doi:10.1098/rsif.2006.0134
http://dx.doi.org/doi:10.1098/rsif.2006.0134
http://www.journals.royalsoc.ac.uk
http://www.journals.royalsoc.ac.uk
SUMO
ontology of science
FuGO
upper level
intermediate level
domain level
EXPO
SubjectOfExp. ObjectOfExp.
domain model
MSI
PSIMO
Figure 1. The position of EXPO. EXPO as a part of ontology of
science is an extension of the upper ontology SUMO. EXPO can
be further extended via the classes DomainOfExperiment,
SubjectOfExperiment, ObjectOfExperiment, etc. to domain
specific
ontologies of experiments such as MO, MSI, PSI, etc.
796 An ontology of scientific experiments L. N. Soldatova and
R. D. King
multiple times. The utilization of a common standard
ontology for the annotation of scientific experiments
will make scientific knowledge more explicit, help
detect errors, promote the interchange and reliability
of experimental methods and conclusions, and remove
redundancies in domain-specific ontologies. More
generally, we envisage EXPO as a part of a general
ontology of science that would include other scientific
methods as observational, theoretical, description of
technologies, resources, etc.
Although no ontology exists that formalizes general
experimental information, several ontologies exist for
specialized experimental areas in biology (mged.source
forge.net/;psidev.sourceforge.net/) and metadata
standards are appearing in many other sciences, e.g. in
chemistry (ftp://ftp.ebi.ac.uk/pub/databases/chebi/
ontology/) and physics (www.ph.ed.ac.uk/ukqcd/com
munity/the_grid/QCDml1.1/ConfigDoc/ConfigDoc.
html). Probably the best-known attempt to formalize
the description of experiments is that developed by the
Microarray Gene Expression Society (MGED) (mged.
sourceforge.net/). The MGED Ontology (MO) was
designed to formalize the descriptors required by
minimum information about a microarray experiment
(MIAME) standard for capturing core information
about microarray experiments. MO aims to provide a
conceptual structure for microarray experiment descrip-
tions and annotation. A number of ontological develop-
ments related to MO also exist. The HUPO PSI General
J. R. Soc. Interface (2006)
Proteomics Standards and Mass Spectrometry working
groups are building an ontology that will support
proteomic experiments (psidev.sourceforge.net/). The
metabolomics standards initiative (MSI) ontology
working group is seeking to facilitate the consistent
annotation of metabolomics experiments by developing
an ontology to help enable the scientific community to
understand, interpret and integrate metabolomic
experiments (msi-ontology.sourceforge.net/index.
htm). More generally, the Functional Genomics Inves-
tigation Ontology (FuGO) is developing an integrated
ontology that provides both a set of ‘universal’ terms, i.e.
terms applicable across functional genomics and
domain-specific extensions to terms (fugo.sourceforge.
net/). Although these ontologies are making significant
contributions to the formalization of experiments in
areas of biology, they are unsuitable as a template for a
general ontology of experiments, as they are primarily
oriented to specialized biomedical domains.
2. EXPO: AN ONTOLOGY OF SCIENTIFIC
EXPERIMENTS
We follow Schulze-Kremer’s description of an ontology
as ‘a concise and unambiguous description of what
principle entities are relevant to an application domain
and the relationship between them’. EXPO is based
on ideas from the philosophy of science (logical,
probabilistic, methodological, epistemological, etc.)
http://www.mged.sourceforge.net/
http://www.mged.sourceforge.net/
http://www.psidev.sourceforge.net/
http://ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/
http://ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/
http://www.ph.ed.ac.uk/ukqcd/community/the_grid/QCDml1.1/C
onfigDoc/ConfigDoc.html
http://www.ph.ed.ac.uk/ukqcd/community/the_grid/QCDml1.1/C
onfigDoc/ConfigDoc.html
http://www.ph.ed.ac.uk/ukqcd/community/the_grid/QCDml1.1/C
onfigDoc/ConfigDoc.html
http://mged.sourceforge.net/
http://mged.sourceforge.net/
http://www.psidev.sourceforge.net/
http://www.msi-ontology.sourceforge.net/index.htm
http://www.msi-ontology.sourceforge.net/index.htm
http://www.fugo.sourceforge.net/
http://www.fugo.sourceforge.net/
ScientificExperiment
ExperimentalGoal
ExperimentalResults
ExperimentalHypothesis
ExperimentalDesign
ClassificationOfExperiments
ExplainGoal
InvestigateGoal
ConfirmGoal
ComputeGoal
ClassificationByDomain
ClassificationByModel
NLMClassification
LibraryOfCongressCl.
DDC(Dewey)Classification
One-factorExperiment
Two-factorExperiment
Multi-factorExperiment
FactReject H1>
FactSupport H1
ResulEerror
HypothesisAcceptanceM.
FalsePositive
FalseNegative
ObjectOfExperiment
SubjectOfExperiment ExperimentalModel
ExperimentalTechnology
ExperimentalEquipment
ExperimentalDesignStrategy
PlanExperimentalActions
ObjectOfAction
SubjectOfAction
ActionGoal
ActionMethod
PhysicalQuantity
TimePoint
PublicAcademicStatus PublicStatus
NormalizationStrategy
QualityControlStrategy
ComparisonControl /TargetGroups
ResearchHypothesis H1
NullHypothesis H0
AlternativeHypothesis
SubjectEffect
ObjectEffectTimeEffect
ExperimentalDesignEffect
ExperimenterBias
TestingEffect
p/o a/o
p/o p/o
p/o
p/o
p/o
p/o
p/o
p/o p/o
TargetVariable
Factor
FactorLevel
p/o
p/o
p/o
p/o
p/op/o
p/o
p/o
p/o
p/o
PhysicalExp.
GalileanExp.
Hypothesis-drivenExp.
Hypothesis-formingExp.
ComputationalExp.
BaconianExp.
ExperimentalConclusionp/o
ErrorOfConclusion
FaultyComparison
IncompleteDataError
RepresentationStyle
Program
Text
LinguisticExpression
NaturalLanguage
ArtificialLanguage
AdminInfoAboutExperiment
TitleOfExperiment
IDExperiment
Organization
Author
StatusOfExp.DocumentRegion
TimeInterval
p/o
p/o
ParedComparison
PairedComparisonSingleSampleGroups
MethodComparison SubjectComparison
BiblioReference
p/o
p/o
p/o
p/o
p/o
p/o
p/o
p/o
p/o
Figure 2. The ontology of scientific experiments (a fragment),
where p/o is a part-of relation, a/o is an attribute-of relation and
an
arrow with an empty label corresponds to is-a relation. For each
type of experiments (e.g. Galilean hypothesis-driven or
computational experiment), there is a corresponding
experimental goal: to confirm, to explain, to investigate or to
compute. At
the design stage, experimental object, equipment, experimental
actions are specified in order to achieve the experimental goal.
Experimental hypotheses are used to verify and evaluate the
experimental results (for detailed description see EXPO).
An ontology of scientific experiments L. N. Soldatova and R. D.
King 797
(Curd & Cover 1988; Toulmin 2004), the theory
of knowledge representation (Sowa 2000), the analysis
of existing ontologies (suo.ieee.org/) including bio-
ontologies (obo.sourceforge.net) and the theory of
experiment design (Fisher 1956; Boniface 1995). The
division of ontological knowledge into appropriate
levels of abstraction is a fundamental part of our
EXPO proposal (see figure 1). The upper ontology
SUMO (suggested upper merged ontology) includes a
formalization of such top-level classes as physical
process, physical and abstract objects including dimen-
sional units, measures, time intervals, etc. As described
above, lower specialized experimental domain ontolo-
gies are also starting to appear that aim to formalize
knowledge about specific experimental techniques such
as for microarrays (MO). What is currently missing is
the intermediate layer of a general ontology of scientific
experiments to formalize the ontological knowledge
that is common between different scientific areas.
EXPO provides a structure to describe such common
concepts as experimental goals, experimental methods
and actions, types of experiments, rules for experi-
mental design, etc. (see figure 2). We see EXPO as
a part of a general ontology of science that should
formalize scientific tasks, methods, techniques,
infrastructure of science (such knowledge about aca-
demic staff, projects, scientific documents has
J. R. Soc. Interface (2006)
already partially been formalized in the KA2 ontology
(protege.stanford.edu/ontologies/ontologyOfScience/
ontology_of_science.htm)).
2.1. The design principles of EXPO
To form EXPO we have used a combined top-down
(designing EXPO with reference to an upper ontology)
bottom-up methodology (validating EXPO by appli-
cations in different domains). The first step in the top-
down approach was to anchor EXPO to a standard
upper ontology, which describes general knowledge
about the world. A standard upper ontology provides:
template structures, terms, and relations, along with
key definitions and axioms; a principled way of
determining the top-level concepts of our ontology (as
an extension of the upper ontology) and connections to
other ontologies (enabling cross ontology use and
inference). The Standard Upper Ontology Working
Group IEEE P1600.1 has proposed SUMO as a general
standard (Niles & Pease 2001) to support computer
applications such as data interoperability, information
search and retrieval, automated interfacing and natural
language processing (suo.ieee.org/). We therefore
selected SUMO as our upper ontology. Use of SUMO
ensures compatibility with other compliant SUMO
ontologies and enables EXPO to have wide reusability
http://www.suo.ieee.org/
http://www.obo.sourceforge.net
http://www.protege.stanford.edu/ontologies/ontologyOfScience/
ontology_of_science.htm
http://www.protege.stanford.edu/ontologies/ontologyOfScience/
ontology_of_science.htm
http://www.suo.ieee.org/
798 An ontology of scientific experiments L. N. Soldatova and
R. D. King
and functionality. The top-down development process
of EXPO ensures inclusion of the key concepts in
general scientific experiments which would be difficult
to ensure if EXPO was based on bottom-up general-
ization of experiments from a particular scientific
domain.
2.2. EXPO as an extension of SUMO
Below we describe the elements (terms) of SUMO used
to build EXPO. For each term we give both the SUMO
definition in quotes (suo.ieee.org/SUO/SUMO/index.
html) and an example of use from EXPO (sourceforge.
net/projects/expo). Note, the meaning of the terms
used in SUMO do not necessarily correspond to those in
mathematics, philosophy or computer science; SUMO
terms start with capitals; and terms in ontologies are by
convention presented in singular form.
Class. ‘Class differs from set in three important
respects. First, Class is not assumed to be extensional.
Second, Class typically has an associated ‘condition’
that determines the instances of the Class. Third, the
instances of a class may occur only once within the
class, i.e. a class cannot contain duplicate instances’.
For example in EXPO, the condition ‘being a statement
about cause-effect relations between known and
unknown variables of the domain of the experiment’
determines the class ExperimentalHypothesis. Each
class in EXPO has both a natural language definition,
and a computational definition (as a list of associated
relations).
Individual (also called Instance). An entity ‘is an
instance of a Class if it is included in that Class’. For
example in EXPO, the particular experiment ‘a
precision measurement of the mass of the top quark’
is an instance of the class ScientificExperiment. N.B.
EXPO provides a conceptual description and does not
contain individuals. However, as a reference model for
description of experiments, EXPO assumes extensions
by the adding of instances to represent particular
experiments. The concept of an instance is also
essential in EXPO in the definition of is-a relations.
Relations: Subclass (is-a). ‘(subclass ?CLASS1
?CLASS2) means that ?CLASS1 is a subclass of
?CLASS2, i.e. every instance of ?CLASS1 is also an
instance of ?CLASS2.’ For example in EXPO, the class
HypothesisAcceptanceMistake (‘the incorrect accep-
tance or rejecting of the research hypothesis’) is a
subclass of the class ResultError (‘an incorrectly
inferred conclusion about a research hypothesis or
about the phenomena involved in the experiment’).
Instance of. ‘is a BinaryPredicate (instance ?INDI-
VIDUAL ?CLASS) that means ?INDIVIDUAL has an
associated condition that determines the instances of
the ?CLASS’. For example in EXPO, the individual a
precision measurement of the mass of the top quark
experiment is associated to the class ScientificExperi-
ment with the conditions: ‘being an investigation of
J. R. Soc. Interface (2006)
cause-effect relations between known and unknown
variables of the field of study’.
Part (p/o). ‘The basic mereological relation. All
other mereological relations are defined in terms of this
one. (part ?PART ?WHOLE) simply means that the
Entity?PART is part of the Entity?WHOLE. Note that
since part is a ReflexiveRelation, every Entity is a part
of itself’. For example in EXPO, ExperimentalDesign is
a part of ScientificExperiment.
Attribute (a/o). ‘(attribute ?ENTITY ?PROP-
ERTY) means that ?PROPERTY is an Attribute of
?ENTITY’. For example in EXPO, Controllability of
an experimental variable is an attribute which charac-
terizes whether a subject of the experiment can
control/vary a variable.
Role. As an addition to SUMO EXPO defines the
Role predicate: ‘(role ?OBJECT ?ENTITY) means
that ?OBJECT-a Role holder, plays a Role ?ENTITY
in some context’ (Sunagawa et al. 2005). In EXPO, we
always consider the context of an experiment. For
example in EXPO, in the context of an experiment a
human object can either play the role SubjectOfExperi-
ment if the human is ‘one who executes the experi-
ment’, or the role ObjectOfExperiment if the human is
one ‘on whom an experiment is made’ (OED 1989).
In designing EXPO we have endeavoured to use as
few relations as possible. This helps to ensure that the
ontology is both comprehensible and extendable, while
being expressive enough to represent of all the required
relations between classes of the domain (SUMO
provides many other well defined relations (i.e.
contains, located, precondition, etc.) that may be
useful in extending EXPO).
We selected a subset of 46 SUMO classes that are
most relevant to describing scientific experiments, e.g.
PhysicalQuantity, TimeMeasure, ArtificialLanguage,
Experimenting. We then added 172 other classes that
we judged necessary to represent scientific experiment,
e.g. ExecutionOfExperiment, MeasurementError,
ExperimentalResults. A number of EXPO classes we
judge to be outside of the domain of experiments; these
belong more properly in a full ontology of science (or
SUMO), for example: Variable, Robustness, Reference.
We aimed to employ what we consider to be the best
practice in ontological development. For example, we
follow what we believe to be one of the best constructed
ontologies in science, the Foundational Model of
Anatomy (FMA) (Rosse & Mejino 2003), in disallowing
multiple inheritance; this we believe results in EXPO
being simpler to comprehend and makes it easier to
avoid the inference errors that can occur with multiple
inheritance. In defining EXPO concepts, we also follow
the FMA desiderata of relying on Aristotelian
definitions (Aristotle 350 BC). ‘In dictionaries the
unit of the information is a term., in an ontology.the
unit of information is a concept and the purpose of
definitions is to align all concepts in the ontology’s
domain in a coherent inheritance type hierarchy..
Definitions should state the essence of . entities in
terms of their characteristics consistent with the
ontology’s context. Paraphrasing Aristotle, the essence
of an entity is constituted by two sets of defining
http://www.suo.ieee.org/SUO/SUMO/index.html
http://www.suo.ieee.org/SUO/SUMO/index.html
http://www.sourceforge.net/projects/expo
http://www.sourceforge.net/projects/expo
An ontology of scientific experiments L. N. Soldatova and R. D.
King 799
attributes; one set, the genus, necessary to assign to an
entity to a class and the other set, the differentiae,
necessary to distinguish the entity from other entities
also assigned to the class. A collection of entities that
share the same set of essential characteristics constitu-
tes a class of the ontology’ (Rosse & Mejino 2003).
EXPO follows to SUMO naming conventions, such as:
NameOfTerm.
2.3. Bottom-up design of EXPO
We tested the validity of the design of EXPO by
applying it to a number of different scientific domains
(metabolomic, microbiological, computer science,
particle physics). This range of application areas
enabled us to have confidence that the classes in
EXPO cover the essential concepts of scientific
experiments.
One motivation for the use of a minimum set of
relations in EXPO is that we hope to provide compliance
not only with the upper ontology SUMO, but also with
the existing domain ontologies of experiments (or at
least a mapping to them). We have therefore also
engaged with the bio-ontology community to try to
ensure that EXPO is compatible with the development
of ontologies for experiments in biology. Although
SUMO and the upper bio-ontologies employ different
sets of relations, EXPO tries to avoid contradictions
between them. Currently there are no compositional
contradictions with the open biomedical ontologies
(OBO) relations ontology (http://obo.sourceforge.net)
(the latter has is-a, instance-of, part-of and other
relations that are not used in EXPO, additionally
OBO allows defining relations). To verify this approach
we plan to incorporate domain ontologies of experiments
that are already exist (MO, PSI) or are at the
development stage like FuGO.
2.4. The EXPO domain
We consider an experiment to have three levels: the
physical level (the real world FieldOfStudy about
which an experiment should discover new knowledge);
the model level (our knowledge about the experimental
domain ExperimentalModel); and a design level
Experimental Design (where parameters, target vari-
ables of an experiment, and a sequence of experimental
actions are determined). Our definition of an experi-
ment is ‘a scientific experiment is a research method
which permits the investigation of cause-effect relations
between known and unknown (target) variables of the
domain.’
EXPO describes PhysicalExperiment where an
experimenter manipulates by the real-world (physical)
domain and ComputationalExperiment where an
experimenter investigates cause-effect relations by
manipulating a computational (non-physical) domain
adequate to the real-world domain. EXPO is able to
represent experiments with both explicit experimental
hypothesis (GalileanExperiment) and without explicit
hypothesis (BaconianExperiment) (Medawar 1981).
Experiments are classified by the FieldOfStudy, i.e.
MicroarrayExperiment, or MetabolomicExperiment.
J. R. Soc. Interface (2006)
EXPO supports several classification systems of
domains: library classification DeweyDecimalClassifi-
cation, LibraryOfCongressClassification and Research-
CouncilsUKClassification. Our initial version of EXPO
contains 218 well-defined concepts about experimental
methods. We have automatically translated EXPO into
the W3C standard ontology language OWL-DL using
the Hozo ontology editor (Kozaki et al. 2002).
2.5. The future of EXPO
The development of a general ontology for scientific
experiments is an ambitious goal, and we have so far
only proposed the key top-level concepts. Although we
have sought to design EXPO to be uncontroversial and
consistent with the generally accepted view of experi-
ments in science, it is inevitable in work related to
philosophy, that some of our design decisions are
debateable. We have therefore opened up EXPO to
modification by placing EXPO in SourceForge (source
forge.net/projects/expo). We stress that EXPO is still
at an initial stage in its development, and we invite
scientist, researchers and practitioners to contribute to
its improvement.
3. APPLICATIONS OF EXPO
We argue that utilization of a common standard
ontology for the annotation of scientific experiments
will make scientific knowledge more explicit, help
detect errors, promote the interchange and reliability
of experimental methods and conclusions and remove
redundancies in domain-specific ontologies. To test
these claims we employed EXPO to annotate two
real-world examples: one from physics (high-energy/
particle physics) and the other from biology (phyloge-
netics). We selected these examples as extrema from an
arbitrarily selected issue of Nature (10 June 2004). Full
details of these annotations are in the electronic
supplementary material. In both papers annotation
with EXPO enabled the scientific knowledge presented
in the paper (encoded as natural language free-text) to
be made more explicit, and for problems in the papers
to be found. In addition, EXPO served as a basis for
logical inference about consistency and validity of the
conclusions stated in the articles. Annotation with
EXPO also suggested an unexpected similarity between
these two experiments.
3.1. High-energy physics application
The first example concerned a new estimate of the
mass of the top quark (Mtop) authored by the ‘D0
Collaboration’ (approx. 350 scientists) (D0 Collabor-
ation 2004). The D0 Collaboration in 1995 were joint
discovers of the top quark; a landmark event in
physics. The value of Mtop is of particular scientific
importance as it is a key constraint on the mass of
the hypothetical Higgs boson. The Higgs boson is
believed to provide the mechanism by which
particles acquire mass.
The application of EXPO to annotating this experi-
ment is shown in figure 3 (and more fully in the
http://obo.sourceforge.net
http://sourceforge.net/projects/expo
http://sourceforge.net/projects/expo
ScientificExperiment: ComputationalExperiment: Simulation
AdminInfoAboutExperiment: title: a precision measurement of
the mass of the top quark
ClassificationByDomain:
DomainOfExperiment: HighEnergyPhysics /ParticlePhysics
DDC(Dewey): 539.7 atomic and nuclear physics
LibraryOfCongress: QC 770 –798 atomic, nuclear, particle
physics
RelatedDomain:
ComputationalStatistics DDC (Dewey): 519 probabilities and
applied mathematics
LibraryOfCongress: QA 273 –274 probabilities
ResearchHypothesis: RepresentationStyle: text
LinguisticExpression: NaturalLanguage:
given the same observed data, use of the new statistical method
M1 will produce a
more accurate estimate of Mtop than the original method M0
LinguisticExpression: ArtificialLanguage
AlternativeHypothesis: SubjectEffect: ExperimenterBias:
LinguisticExpression: ArtificialLanguage:
P (The Standard Model) is high
∴ Mtop = 173.3
DomainModel: a statistical branching model of particles
ExperimentalModel:
Factor: M – a method of estimating Mtop
FactorLevel: M0 – the old method
M1 – the new method
TargetVariable: Mtop
ModelAssumption1: it is possible to compare the accuracy of
estimates of M0 and M1 without knowing
the true value of Mtop.
ModelAssumption2: 'the specific value of the Pbkg cut-off
based on Mtop = 175 GeV/c
2 [8] has no effects
on the experimental results'
ExperimentalConclusion: representation style: text
LinguisticExpression: NaturalLanguage
the probability of the research hypothesis H1 has been
increased:
method M1 'yields a greatly improved precision' of Mtop [8]
ResultError: ErrorOfConclusion: hypothesis acceptance
mistake: false positive ≠ 0
ErrorOfConclusion: FaultyComparison: M0 and M1
estimates
Higgs boson mass best fit mass estimate is
experimentally excluded
the standard model Higgs boson
M
M
M
>=
Figure 3. EXPO formalization of the particle physics
experiment (a fragment).
800 An ontology of scientific experiments L. N. Soldatova and
R. D. King
electronic supplementary material). This annotation
makes it explicit that the experiment was somewhat
unusual in not generating any new observational data.
Instead, it presents the results of applying a new
statistical analysis method to existing data (a set of
putative top quark pair decays events involving eCjets
and mCjets). No explicit hypothesis was put forward in
the paper. However, we argue that the paper’s implicit
experimental hypothesis was given the same observed
data, use of the new statistical method will produce a
more accurate estimate of Mtop than the original method.
This is based on the authors’ statement ‘here we report a
technique that extracts more information from each
top-quark event and yields a greatly improved precision
when compared to previous measurements’. We
consider that the paper’s hypothesis does not concern
J. R. Soc. Interface (2006)
the value of Mtop directly, as this is deductively inferred
from the hypothesis. We prefer the term ‘accuracy’ to
‘precision’ (which also occurs in the title) as its meaning
is more generally associated with the relationship
between the closeness of agreement between a measured
value and a true value (www.physics.unc.edu/wdear
dorf/uncertainty/definitions.html); which presumably
is what is meant. The use of EXPO may have alerted the
authors and the Nature editor to the unsuitability of use
of the term precision. Annotation with EXPO high-
lighted that little evidence is presented for or against the
experimental hypothesis, only one sentence in the
Methods section refers to simulation studies. Instead,
the authors assume that the new method is more
accurate and focused on application of the new
statistical method to estimating Mtop. The estimate of
http://www.physics.unc.edu/~deardorf/uncertainty/definitions.ht
ml
http://www.physics.unc.edu/~deardorf/uncertainty/definitions.ht
ml
ScientificExperiment: PhysicalExperiment:
Hypothesis-forming; Hypothesis-driven
AdminInfoAboutExperiment: Title: mesozoic origin of West
Indian insectivores
ClassificationByDomain:
DomainOfExperiment:
Zoology DDC(Dewey): 599: mammalology
LibraryOfCongress: QL351–QL352 Zoology-Classif.
Taxonomy DDC(Dewey): 575 evolution and genetics
LibraryOfCongress: QH 367.5 molecular phylogenetics
LibraryOfCongress: QH83 Biosystematics
RelatedDomain:
ComputationalStatistics DDC(Dewey): 519 probabilities and
applied mathematics
LibraryOfCongress: QA 273–274 probabilities
ResearchHypothesis: implicit
ExperimentalGoal: to investigate the phylogeny of the species:
Solenodon paradoxus and Solenodon
cubanus
NullHypothesisH01: RepresentationStyle: text
LinguisticExpression: NaturalLanguage:
'some have suggested a closerelationship to soricids (shrews)
but not to talpids'
LinguisticExpression: ArtificialLanguage
So, Sh, T, An mammalia
So . Sh . T . An . solenodon (So) soricoidea (Sh) talpoidea (T)
ancestor (An, So) ancestor (An, Sh) ancestor (An, T)
DomainModel: statistical branching model of evolution
ExperimentalModel:
Factor: T– phylogenetic tree relating Solenodon paradoxus and
Solenodon cubanus to other
mammal species
FactorLevel: 'sampling trees every 20 generations' (spl. in [9])
TargetVariable: PS (T)–position and BL (T)–branch length of
Solenodon paradoxus and Solenodon
cubanus in tree T
ModelAssumption1: the use of the selected DNA sequences
will produce results that generalize to the
complete genomes
ModelAssumption2: molecular clock assumptions
ExperimentalAction 1.1.1: extraction and purification
Object: sample of DNA
ParentGroup: DNA from Solenodon paradoxus
Sampling: random sampling
Instrument: Qiagen DNA cleanup kit
………………………………………………….
ExperimentalConclusionC2: comment: formed hypothesis
LinguisticExpression: ArtificialLanguage
So, Sh, T, E, An, De mammalia
So . Sh . T . E . X . An . solenodon (So) soricoidea (Sh)
talpoidea (T) erinaceidea (E) ancestor (An, So) ancestor (An,
De)
ancestor (De, So) ancestor (De, Sh) ancestor (De, T) ancestor
(De, E)
Figure 4. EXPO formalization of the phylogenetic experiment (a
fragment).
An ontology of scientific experiments L. N. Soldatova and R. D.
King 801
Mtop from the old method was 173.3G5.6 (stat) G5.5
(sys) GeV/c2 and from the new estimate 180.1G2.0
(stat) G2.6 (sys) GeV/c2. The current (April
2006) best estimate for Mtop is 174.2G3.4 GeV/c
2
(Tevatron Electroweak Working Group. CDF Collab-
oration, D0 Collaboration 2006); therefore, it would
appear that the original estimate was actually more
accurate! Of course, it is possible that stochastic factors
meant that the new statistical method was unlucky in
its prediction. However, it would seem at least as likely
that some form of methodological difficulty in the
experiment was involved.
One such problem was revealed when annotating the
experiment with EXPO. The authors state that a
‘critical difference’ between the old and new method
J. R. Soc. Interface (2006)
was: ‘the assignment of more weight to events that are
well measured or more likely to correspond to a top/anti-
top signal’. This is an application of the Carnap
principle: that you must take into account all of the
evidence relevant to a question (Curd & Cover 1988;
omega.math.albany.edu:8008/JaynesBook.html). In
this case, the information that some events are better
measured needs to be taken into account. However,
annotating the new method with EXPO makes it
explicit that 91 candidate events were used to calculate
the old value, but only 22 of these were used for the new
value. Therefore, the only weights used were 0 and 1.
The Carnap principle would only justify these extreme
weights if the 69 excluded events contained no infor-
mation on Mtop. As this was not demonstrated and
http://www.omega.math.albany.edu:8008/JaynesBook.html
802 An ontology of scientific experiments L. N. Soldatova and
R. D. King
would seem unlikely, it therefore appears that a source of
statistical inefficiency was introduced which counter-
acted the improved signal/noise ratio from choosing
well-measured events.
Another point highlighted by formalization in EXPO
is the relationship between estimating Mtop and the
existence of the Higgs boson. The paper concluded that
Mtop is higher than previously estimated, which deduc-
tively implies a higher mass for the Higgs boson. As the
Higgs boson has not yet been observed, even at energies
above its previously predicted maximum-likelihood
mass, the inferred higher Mtop lent support to the
existence of the Higgs boson. However, it would have
been possible to argue validly the other way: that the
Higgs boson is thought highly likely to exist therefore its
non-observation makes more probable a higher value of
Mtop. This argument was not explicit in the paper, but it
might well have existed implicitly as a motivation. The
paper would have benefited from making this argu-
ment/motivation explicit.
3.2. Phylogenetics application
The second example investigated the phylogenetic
status of the mammalian species Solenodon cubanus
and Solenodon paradoxus (Roca et al. 2004). The main
experimental approach used was the comparison of
nuclear and mitochondrial DNA sequences derived
from Solenodons with homologous sequences in other
mammals. Solenodons are endangered insectivores that
inhabit the forests of Hispaniola and Cuba and their
phylogenetic relationship with other mammals has long
been a matter of controversy (Symonds 2005).
Generally, this paper was clear and straightforward
and a model of its kind. Figure 4 shows part of the
EXPO instantiation of the experiment (for full details
of the annotation, along with formalization of a part of
the background knowledge written as logic program,
see the electronic supplementary material).
The use of EXPO to annotate the paper makes
explicit the different hypotheses described in the paper.
What we have identified as the ResearchHypothesis are
the main conclusions of the paper. However, these
conclusions are not mentioned as possible hypotheses in
the text. This contrasts with what we identify as seven
NullHypotheses, which are mentioned explicitly in the
main text. This highlights an interesting point: the
main experimental technique used in the paper,
molecular phylogeny programs, does not typically
employ explicit hypotheses. They aim to uncover the
most probable evolutionary trees based on a model of
evolution rather than to answer an explicit question of
relatedness. However, when explicit hypotheses are
available, as in the Solenodon case, this approach may
well be sub-optimal. For example, we estimate that at
least as much computer time was used to determine the
sub-tree phylogeny of bats as the sub-tree phylogeny of
Solenodons.
Considering the null hypotheses in detail, it is
worthy of note that no conclusions concerning
hypotheses H03 and H04 are mentioned in the main
text and the interested reader has to search the further
information. The use of EXPO would have made this
J. R. Soc. Interface (2006)
omission clear. Another aspect of the research which
use of EXPO would have highlighted, was that the
DNA sequences produced during the experiment were
stored in the EMBL database using the taxonomic term
Insectivora. This taxon is now generally recognized to
be polyphyletic (Symonds 2005), and its use contradicts
the actual conclusions of the paper.
We formalized using the logic programming
language Prolog, the biological knowledge and logical
inferences behind the authors’ argument that ‘Cuban
Solenodons should be classified in a distinct genus,
Atopogale’ (see the electronic supplementary material).
Our analysis indicates that it would be more internally
consistent for the authors to have classified Cuban
Solenodons as a distinct family. The authors’ hesitation
to name a new family probably owes more to the
sociology of science than logic—naming a new family is
a much more radical step. It is, of course, a serious
shortcoming in biology that ‘genus’ and ‘family’ are not
well-defined terms.
It is significant that using EXPO to annotate these
two very different experiments revealed an important
similarity between them. This is that the natural
phenomena studied are both modelled using stochastic
branching processes—although at vastly different time
scales (approx. 10
K24
s versus millions of years). This
means that related mathematical techniques can be,
and are, applied in both domains. As it is probable that
there is little communication between these two
domains, it is possible that one domain may have
invented techniques of relevance to the other domain,
which they are currently unaware of. Identification of
such overlaps illustrates one of the benefits of a unified
ontology for scientific experiments.
4. DISCUSSION
4.1. Dissemination of scientific knowledge
It is probable that publication of scientific papers will
soon happen almost exclusively online. It is also to be
expected that most scientific data will also be published
online. The vision of ‘e-Science’ is to publish online
both papers and all of the data and metadata from a
scientific experiment for posterity; so that all results
can be repeated and compared with other related
experiments (www.nesc.ac.uk). We believe that these
developments in the online availability of scientific
knowledge and data will greatly support the drive to
the formalization of science: as the only possible way to
effectively exploit this new data is to use computers,
and for computers to work best requires formalization.
We argue that the use of EXPO is an important step
towards this.
The traditional way of presenting scientific knowl-
edge in scientific papers has many limitations. The
most important and obvious of these is the use of
natural language to describe knowledge—albeit aug-
mented by various formalisms and mathematics. This is
problematic because natural language is notorious for
its imprecision and ambiguity. Its use is also a great
barrier in using computers to store and analyse
data—hence the growing importance of text mining.
http://www.nesc.ac.uk
An ontology of scientific experiments L. N. Soldatova and R. D.
King 803
We argue that the content of scientific papers should
increasingly be expressed in formal languages—is
writing a scientific paper closer to writing poetry or a
computer program?
For the application of EXPO to become widespread,
and a critical-mass of annotations made available,
convenient tools will need to be developed to enable
practicing scientists to annotate their own experiments.
We envisage such tools will, for example, ask the user to
describe the domain of the experiment, if the experiment
involved any hypotheses, what experimental results
support or reject hypotheses, etc. Such a tool could be
incorporated into an electronic lab-book or done as a
separate procedure at the same time as writing-up (the
traditional paper). With the rise in laboratory auto-
mation, and the increasing use of artificial intelligence to
aid scientific experimentation (King et al. 2004), many
parts of EXPO may be able to be automatically input. It
is possible to envisage that some journals may enforce
submission of such annotations, along with papers—just
as submission of data to repositories is often compulsory.
4.2. Ontologies of science
We view the development of EXPO as part of a general
drive to formalize science. The current level of
formalization varies greatly between the sciences. In
some fields, such as particle physics, the background
theories are highly formalized (in mathematical nota-
tion). Yet even in this case, the actual process of
experimental testing these theories, despite the great
technical sophistication of the experiments (witness the
new Large Hardron collider at CERN) is not yet
formalized. The situation of biology presents an
interesting contrast to particle physics. Very little
work has been done on formalizing the background
theories in biology, yet biology is leading the way in
ontology development, both generally and for
experiments.
It is important to note that the general lack of
explicit ontologies in science does not mean that
ontologies are not being employed. It means that we
are currently condemned to using implicit, non-
standardized and possibly naive ones.
5. CONCLUSIONS
The unity of scientific experimentation implies that an
accepted general ontology of experiments is both
possible and desirable. Such an ontology would
promote the sharing of results within and between
subjects, reducing both the duplication and loss of
knowledge. It is also an essential step in formalizing
science and so fully exploiting computer reasoning in
science (King et al. 2004). To quote Francis Bacon
‘Therefore, from a closer and purer league between
these two faculties, the experimental and the rational
(such as never yet been made), much may be hoped’.
J. R. Soc. Interface (2006)
REFERENCES
Aristotle, 350 BC Categories. Translated by E. M. Edghill.
See http://evans-experientialism.freewebspace.com/aris
totle_categories.htm.
Boniface, D. R. 1995 Experiment design and statistical
methods for behavioural and social research. London,
UK: Chapman & Hall.
Curd, M. & Cover, J. A. 1988 Philosophy of science. New
York, NY: Norton.
D0 Collaboration 2004 A precision measurement of the mass
of the top quark. Nature 429, 639–642. (doi:10.1038/
431639a)
Fisher, R. A. 1956 The design of experiments. Edinburgh, UK:
Oliver & Boyd.
Hey, T. & Trefethon, A. 2003 The data deluge: an e-science
perspective. In Grid computing-making the global infra-
structure a reality, vol. 36 (ed. F. Berman, G. C. Fox &
A. J. G. Hey), pp. 809–824. London, UK: Wiley.
King, R. D., Whelan, K. E., Jones, M. F., Reiser, P. G. K. &
Bryant, C. H. 2004 Functional genomics hypothesis
generation by a robot scientist. Nature 427, 247–252.
(doi:10.1038/nature02236)
Kozaki, K., Kitamura, Y., Ikeda, M. & Mizoguchi, R. 2002
Hozo: an environment for building/using ontologies based
on a fundamental consideration of “Role” and “Relation-
ship”. Knowl. Eng. Knowl. Manag., pp. 213–218.
Medawar, P. B. 1981 Advice to a young scientist. London, UK:
Pan Books Ltd.
Niles, I. & Pease, A. 2001 Towards a standard upper ontology.
In Proc. 2nd Int. Conf. on Formal Ontology in Information
Systems (FOIS-2001) (ed. C. Welty & B. Smith).
Roca, A. L., Bar-Gal, G. K., Eizirik, E., Helgen, M. K., Maria,
R., Springer, M. S., O’Brien, S. J. & Murphy, W. J. 2004
Mesozoic origin for West Indian insectivores. Nature 429,
649–651. (doi:10.1038/nature02597)
Rosse, C. & Mejino Jr, J. L. 2003 A reference ontology for
biological informatics: the foundational model of anatomy.
Biomed. Inform. 36, 478–500. (doi:10.1016/j.jbi.2003.11.
007)
Sowa, J. F. 2000 Knowledge representation: logical, philoso-
phical, and computational foundations. Pacific Grove, CA:
Brooks Cole Publishing.
Sunagawa, E., Kozaki, K., Kitamura, Y. & Mizoguchi, R.
2005 A framework for organizing role concepts in ontology
development tool: Hozo. Roles, an interdisciplinary
perspective: ontologies, programming languages, and mul-
tiagent systems AAAI Symp. FS-05-08, pp. 136–143.
Arlington, VA: AAAI.
Symonds, M. R. 2005 Phylogeny and life histories of the
‘insectivora’: controversies and consequences. Biol. Rev.
80, 93–128. (doi:10.1017/S1464793104006566)
The Oxford English Dictionary. 1989 2nd edn. Oxford:
Oxford University Press.
Tevatron Electroweak Working Group. CDF Collaboration,
D0 Collaboration, 2006 Combination of CDF and D0
results on the mass of the top quark. See http://arXiv.
org/abs/hep-ex/0604053.
Toulmin, S. 2004 Philosophy of science. Encyclopaedia
Britannica, Deluxe CD. Chicago, IL: Encyclopaedia
Britannica.
Turing, A. M. 1936 On computable numbers, with an
application to the Entscheidungsproblem. Proc. Lond.
Math. Soc. 2, 230–265.
http://evans-
experientialism.freewebspace.com/aristotle_categories.htm
http://evans-
experientialism.freewebspace.com/aristotle_categories.htm
http://dx.doi.org/doi:10.1038/431639a
http://dx.doi.org/doi:10.1038/431639a
http://dx.doi.org/doi:10.1038/nature02236
http://dx.doi.org/doi:10.1038/nature02597
http://dx.doi.org/doi:10.1016/j.jbi.2003.11.007
http://dx.doi.org/doi:10.1016/j.jbi.2003.11.007
http://dx.doi.org/doi:10.1017/S1464793104006566
http://arXiv.org/abs/hep-ex/0604053
http://arXiv.org/abs/hep-ex/0604053An ontology of scientific
experimentsIntroductionEXPO: an ontology of scientific
experimentsThe design principles of EXPOEXPO as an
extension of SUMOClassIndividual (also called
Instance)RelationsBottom-up design of EXPOThe EXPO
domainThe future of EXPOApplications of EXPOHigh-energy
physics applicationPhylogenetics
applicationDiscussionDissemination of scientific
knowledgeOntologies of scienceConclusionsReferences

More Related Content

Similar to The electroniorg10.1098uk.Author for cReceived .docx

A PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTS
A PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTSA PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTS
A PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTSijsc
 
A Proposed Multi-Domain Approach for Automatic Classification of Text Documents
A Proposed Multi-Domain Approach for Automatic Classification of Text Documents A Proposed Multi-Domain Approach for Automatic Classification of Text Documents
A Proposed Multi-Domain Approach for Automatic Classification of Text Documents ijsc
 
Machine Learning for Chemical Sciences
Machine Learning for Chemical SciencesMachine Learning for Chemical Sciences
Machine Learning for Chemical SciencesIchigaku Takigawa
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Ross Mounce
 
(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習Ichigaku Takigawa
 
Bibliography (Microsoft Word, 61k)
Bibliography (Microsoft Word, 61k)Bibliography (Microsoft Word, 61k)
Bibliography (Microsoft Word, 61k)butest
 
247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)Stuart Chalk
 
Evolution of e-Research
Evolution of e-ResearchEvolution of e-Research
Evolution of e-ResearchDavid De Roure
 
(2019.9) 不均一系触媒研究のための機械学習と最適実験計画
(2019.9) 不均一系触媒研究のための機械学習と最適実験計画(2019.9) 不均一系触媒研究のための機械学習と最適実験計画
(2019.9) 不均一系触媒研究のための機械学習と最適実験計画Ichigaku Takigawa
 
The physics behind systems biology
The physics behind systems biologyThe physics behind systems biology
The physics behind systems biologyImam Rosadi
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONIJwest
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION dannyijwest
 
Poster session abs new trends 2015
Poster session abs new trends 2015Poster session abs new trends 2015
Poster session abs new trends 2015xrqtchemistry
 
E bank uk_linking_research_data_scholarly
E bank uk_linking_research_data_scholarlyE bank uk_linking_research_data_scholarly
E bank uk_linking_research_data_scholarlyLuisa Francisco
 
The Role of Ontology in the Era of Big Military Data
The Role of Ontology in the Era of Big Military DataThe Role of Ontology in the Era of Big Military Data
The Role of Ontology in the Era of Big Military DataBarry Smith
 
Annotated Bibliography On Research Methodologies
Annotated Bibliography On Research MethodologiesAnnotated Bibliography On Research Methodologies
Annotated Bibliography On Research MethodologiesJeff Nelson
 

Similar to The electroniorg10.1098uk.Author for cReceived .docx (20)

A PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTS
A PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTSA PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTS
A PROPOSED MULTI-DOMAIN APPROACH FOR AUTOMATIC CLASSIFICATION OF TEXT DOCUMENTS
 
A Proposed Multi-Domain Approach for Automatic Classification of Text Documents
A Proposed Multi-Domain Approach for Automatic Classification of Text Documents A Proposed Multi-Domain Approach for Automatic Classification of Text Documents
A Proposed Multi-Domain Approach for Automatic Classification of Text Documents
 
Machine Learning for Chemical Sciences
Machine Learning for Chemical SciencesMachine Learning for Chemical Sciences
Machine Learning for Chemical Sciences
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
 
(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習
 
Bibliography (Microsoft Word, 61k)
Bibliography (Microsoft Word, 61k)Bibliography (Microsoft Word, 61k)
Bibliography (Microsoft Word, 61k)
 
FAIR data management in biomedicine
FAIR data management  in biomedicineFAIR data management  in biomedicine
FAIR data management in biomedicine
 
247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)247th ACS Meeting: Experiment Markup Language (ExptML)
247th ACS Meeting: Experiment Markup Language (ExptML)
 
Evolution of e-Research
Evolution of e-ResearchEvolution of e-Research
Evolution of e-Research
 
(2019.9) 不均一系触媒研究のための機械学習と最適実験計画
(2019.9) 不均一系触媒研究のための機械学習と最適実験計画(2019.9) 不均一系触媒研究のための機械学習と最適実験計画
(2019.9) 不均一系触媒研究のための機械学習と最適実験計画
 
The physics behind systems biology
The physics behind systems biologyThe physics behind systems biology
The physics behind systems biology
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
 
Poster session abs new trends 2015
Poster session abs new trends 2015Poster session abs new trends 2015
Poster session abs new trends 2015
 
gky1131.pdf
gky1131.pdfgky1131.pdf
gky1131.pdf
 
E bank uk_linking_research_data_scholarly
E bank uk_linking_research_data_scholarlyE bank uk_linking_research_data_scholarly
E bank uk_linking_research_data_scholarly
 
The Role of Ontology in the Era of Big Military Data
The Role of Ontology in the Era of Big Military DataThe Role of Ontology in the Era of Big Military Data
The Role of Ontology in the Era of Big Military Data
 
list_of_publications
list_of_publicationslist_of_publications
list_of_publications
 
list_of_publications
list_of_publicationslist_of_publications
list_of_publications
 
Annotated Bibliography On Research Methodologies
Annotated Bibliography On Research MethodologiesAnnotated Bibliography On Research Methodologies
Annotated Bibliography On Research Methodologies
 

More from cherry686017

Please provide answer, write program in Prolog for the following.docx
Please provide answer, write program in Prolog for the following.docxPlease provide answer, write program in Prolog for the following.docx
Please provide answer, write program in Prolog for the following.docxcherry686017
 
Please provide references for your original postings in APA form.docx
Please provide references for your original postings in APA form.docxPlease provide references for your original postings in APA form.docx
Please provide references for your original postings in APA form.docxcherry686017
 
Please provide reference in APARequired FormatTitle Page AP.docx
Please provide reference in APARequired FormatTitle Page AP.docxPlease provide reference in APARequired FormatTitle Page AP.docx
Please provide reference in APARequired FormatTitle Page AP.docxcherry686017
 
Please post here your chosen topic and information about why y.docx
Please post here your chosen topic and information about why y.docxPlease post here your chosen topic and information about why y.docx
Please post here your chosen topic and information about why y.docxcherry686017
 
Please pick your favorite article from Ms Magazine  and do a one.docx
Please pick your favorite article from Ms Magazine  and do a one.docxPlease pick your favorite article from Ms Magazine  and do a one.docx
Please pick your favorite article from Ms Magazine  and do a one.docxcherry686017
 
Please provide discussion of the following1. Weyerhaeuser made .docx
Please provide discussion of the following1. Weyerhaeuser made .docxPlease provide discussion of the following1. Weyerhaeuser made .docx
Please provide discussion of the following1. Weyerhaeuser made .docxcherry686017
 
Please provide a summary of the key learning from the chapter.  The .docx
Please provide a summary of the key learning from the chapter.  The .docxPlease provide a summary of the key learning from the chapter.  The .docx
Please provide a summary of the key learning from the chapter.  The .docxcherry686017
 
Please pay close attention to the highlighted areas Please answe.docx
Please pay close attention to the highlighted areas Please answe.docxPlease pay close attention to the highlighted areas Please answe.docx
Please pay close attention to the highlighted areas Please answe.docxcherry686017
 
Please read ALL directions below before starting your final assignme.docx
Please read ALL directions below before starting your final assignme.docxPlease read ALL directions below before starting your final assignme.docx
Please read ALL directions below before starting your final assignme.docxcherry686017
 
Please pay attention to the topicZero Plagiarisfive referenc.docx
Please pay attention to the topicZero Plagiarisfive referenc.docxPlease pay attention to the topicZero Plagiarisfive referenc.docx
Please pay attention to the topicZero Plagiarisfive referenc.docxcherry686017
 
PLEASE OPEN THE ATTACH MENTWhen a dietary supplement is consid.docx
PLEASE OPEN THE ATTACH MENTWhen a dietary supplement is consid.docxPLEASE OPEN THE ATTACH MENTWhen a dietary supplement is consid.docx
PLEASE OPEN THE ATTACH MENTWhen a dietary supplement is consid.docxcherry686017
 
Please make sure that it is your own work and not copy and paste. Wa.docx
Please make sure that it is your own work and not copy and paste. Wa.docxPlease make sure that it is your own work and not copy and paste. Wa.docx
Please make sure that it is your own work and not copy and paste. Wa.docxcherry686017
 
please no plagiarism, 5 pages and fallow the rubic Quantitat.docx
please no plagiarism, 5 pages and fallow the rubic Quantitat.docxplease no plagiarism, 5 pages and fallow the rubic Quantitat.docx
please no plagiarism, 5 pages and fallow the rubic Quantitat.docxcherry686017
 
Please make sure to follow the below.Please note that this is .docx
Please make sure to follow the below.Please note that this is .docxPlease make sure to follow the below.Please note that this is .docx
Please make sure to follow the below.Please note that this is .docxcherry686017
 
Please make revision in the prospectus checklist assignment base.docx
Please make revision in the prospectus checklist assignment base.docxPlease make revision in the prospectus checklist assignment base.docx
Please make revision in the prospectus checklist assignment base.docxcherry686017
 
Please note research can NOT be on organization related to minors, i.docx
Please note research can NOT be on organization related to minors, i.docxPlease note research can NOT be on organization related to minors, i.docx
Please note research can NOT be on organization related to minors, i.docxcherry686017
 
please no plagiarism our class uses Turnitin  You are expected to pr.docx
please no plagiarism our class uses Turnitin  You are expected to pr.docxplease no plagiarism our class uses Turnitin  You are expected to pr.docx
please no plagiarism our class uses Turnitin  You are expected to pr.docxcherry686017
 
Please know that the score is just a ball-park and d.docx
Please know that the score is just a ball-park and d.docxPlease know that the score is just a ball-park and d.docx
Please know that the score is just a ball-park and d.docxcherry686017
 
Please note that the Reflections must have 1. MLA format-.docx
Please note that the Reflections must have 1. MLA format-.docxPlease note that the Reflections must have 1. MLA format-.docx
Please note that the Reflections must have 1. MLA format-.docxcherry686017
 
Please make sure you talk about the following  (IMO)internati.docx
Please make sure you talk about the following  (IMO)internati.docxPlease make sure you talk about the following  (IMO)internati.docx
Please make sure you talk about the following  (IMO)internati.docxcherry686017
 

More from cherry686017 (20)

Please provide answer, write program in Prolog for the following.docx
Please provide answer, write program in Prolog for the following.docxPlease provide answer, write program in Prolog for the following.docx
Please provide answer, write program in Prolog for the following.docx
 
Please provide references for your original postings in APA form.docx
Please provide references for your original postings in APA form.docxPlease provide references for your original postings in APA form.docx
Please provide references for your original postings in APA form.docx
 
Please provide reference in APARequired FormatTitle Page AP.docx
Please provide reference in APARequired FormatTitle Page AP.docxPlease provide reference in APARequired FormatTitle Page AP.docx
Please provide reference in APARequired FormatTitle Page AP.docx
 
Please post here your chosen topic and information about why y.docx
Please post here your chosen topic and information about why y.docxPlease post here your chosen topic and information about why y.docx
Please post here your chosen topic and information about why y.docx
 
Please pick your favorite article from Ms Magazine  and do a one.docx
Please pick your favorite article from Ms Magazine  and do a one.docxPlease pick your favorite article from Ms Magazine  and do a one.docx
Please pick your favorite article from Ms Magazine  and do a one.docx
 
Please provide discussion of the following1. Weyerhaeuser made .docx
Please provide discussion of the following1. Weyerhaeuser made .docxPlease provide discussion of the following1. Weyerhaeuser made .docx
Please provide discussion of the following1. Weyerhaeuser made .docx
 
Please provide a summary of the key learning from the chapter.  The .docx
Please provide a summary of the key learning from the chapter.  The .docxPlease provide a summary of the key learning from the chapter.  The .docx
Please provide a summary of the key learning from the chapter.  The .docx
 
Please pay close attention to the highlighted areas Please answe.docx
Please pay close attention to the highlighted areas Please answe.docxPlease pay close attention to the highlighted areas Please answe.docx
Please pay close attention to the highlighted areas Please answe.docx
 
Please read ALL directions below before starting your final assignme.docx
Please read ALL directions below before starting your final assignme.docxPlease read ALL directions below before starting your final assignme.docx
Please read ALL directions below before starting your final assignme.docx
 
Please pay attention to the topicZero Plagiarisfive referenc.docx
Please pay attention to the topicZero Plagiarisfive referenc.docxPlease pay attention to the topicZero Plagiarisfive referenc.docx
Please pay attention to the topicZero Plagiarisfive referenc.docx
 
PLEASE OPEN THE ATTACH MENTWhen a dietary supplement is consid.docx
PLEASE OPEN THE ATTACH MENTWhen a dietary supplement is consid.docxPLEASE OPEN THE ATTACH MENTWhen a dietary supplement is consid.docx
PLEASE OPEN THE ATTACH MENTWhen a dietary supplement is consid.docx
 
Please make sure that it is your own work and not copy and paste. Wa.docx
Please make sure that it is your own work and not copy and paste. Wa.docxPlease make sure that it is your own work and not copy and paste. Wa.docx
Please make sure that it is your own work and not copy and paste. Wa.docx
 
please no plagiarism, 5 pages and fallow the rubic Quantitat.docx
please no plagiarism, 5 pages and fallow the rubic Quantitat.docxplease no plagiarism, 5 pages and fallow the rubic Quantitat.docx
please no plagiarism, 5 pages and fallow the rubic Quantitat.docx
 
Please make sure to follow the below.Please note that this is .docx
Please make sure to follow the below.Please note that this is .docxPlease make sure to follow the below.Please note that this is .docx
Please make sure to follow the below.Please note that this is .docx
 
Please make revision in the prospectus checklist assignment base.docx
Please make revision in the prospectus checklist assignment base.docxPlease make revision in the prospectus checklist assignment base.docx
Please make revision in the prospectus checklist assignment base.docx
 
Please note research can NOT be on organization related to minors, i.docx
Please note research can NOT be on organization related to minors, i.docxPlease note research can NOT be on organization related to minors, i.docx
Please note research can NOT be on organization related to minors, i.docx
 
please no plagiarism our class uses Turnitin  You are expected to pr.docx
please no plagiarism our class uses Turnitin  You are expected to pr.docxplease no plagiarism our class uses Turnitin  You are expected to pr.docx
please no plagiarism our class uses Turnitin  You are expected to pr.docx
 
Please know that the score is just a ball-park and d.docx
Please know that the score is just a ball-park and d.docxPlease know that the score is just a ball-park and d.docx
Please know that the score is just a ball-park and d.docx
 
Please note that the Reflections must have 1. MLA format-.docx
Please note that the Reflections must have 1. MLA format-.docxPlease note that the Reflections must have 1. MLA format-.docx
Please note that the Reflections must have 1. MLA format-.docx
 
Please make sure you talk about the following  (IMO)internati.docx
Please make sure you talk about the following  (IMO)internati.docxPlease make sure you talk about the following  (IMO)internati.docx
Please make sure you talk about the following  (IMO)internati.docx
 

Recently uploaded

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 

Recently uploaded (20)

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 

The electroniorg10.1098uk.Author for cReceived .docx

  • 1. The electroni org/10.1098/ uk. *Author for c Received 19 A Accepted 9 M An ontology of scientific experiments Larisa N. Soldatova* and Ross D. King The University of Wales, Aberystwyth, Penglais, Ceredigion SY23 3DB, UK The formal description of experiments for efficient analysis, annotation and sharing of results is a fundamental part of the practice of science. Ontologies are required to achieve this objective. A few subject-specific ontologies of experiments currently exist. However, despite the unity of scientific experimentation, no general ontology of experiments exists. We propose the ontology EXPO to meet this need. EXPO links the SUMO (the Suggested Upper Merged Ontology) with subject-specific ontologies of experiments by formalizing the generic concepts of experimental design, methodology and results representation. EXPO is expressed in the W3C standard ontology language OWL-DL. We demonstrate the utility of EXPO and its ability to describe different experimental domains, by
  • 2. applying it to two experiments: one in high-energy physics and the other in phylogenetics. The use of EXPO made the goals and structure of these experiments more explicit, revealed ambiguities, and highlighted an unexpected similarity. We conclude that, EXPO is of general value in describing experiments and a step towards the formalization of science. Keywords: ontology; formalization; annotation; artificial intelligence; metadata 1. INTRODUCTION A fundamental part of scientific practice is to increase our knowledge of the world through the performance of experiments. This knowledge should, ideally, be expressed in a formal logical language. To quote the Encyclopaedia Britannica, ‘most analytical philoso- phers of science have explicitly based their program on a presupposition inherited from Descartes and Plato, viz. that the intellectual content of any natural science can be expressed in a formal propositional system, having a definite, essential logical structure’ (Toulmin 2004). It is possible to quibble with the restriction to propositional systems, but the desirability of the use of formal languages is rarely disputed in the philosophy of science. Formal languages promote semantic clarity, which in turn supports the free exchange of scientific knowledge and simplifies scientific reasoning (Curd & Cover 1988). Now, at the beginning of the twenty-first century, the formalization of scientific knowledge is no longer just a philosophical desirable, it is becoming a technological necessity. In all areas of science there is ever more information to assimilate and, in some fields,
  • 3. this increase in information has become a ‘deluge’ (Hey & Trefethon 2003). The result is that science increasingly depends on computers to store, integrate and analyse data. The full power of computers—which originated as a spin-off from the formalization of mathematics (Turing 1936)—can only be efficiently c supplementary material is available at http://dx.doi. rsif.2006.0134 or via http://www.journals.royalsoc.ac. orrespondence ([email protected]). pril 2006 ay 2006 795 exploited when the knowledge they work with is formalized. This line of reasoning is the motivation for the development of e-Science, with its vision of linking papers, data, metadata and analysis methods together (www.nesc.ac.uk). It is also the driving force behind the development of the Semantic Web (www. w3.org/2001/sw/). The first step in formalizing knowledge is to define an explicit ontology, i.e. to describe what exists. As the most characteristic feature of science is experi- mentation, it follows that the development of ontology of experiments is a fundamental step in the formaliza- tion of science. It is therefore surprising that no general- purpose ontology of scientific experiments currently exists. In this paper we propose the most general elements of a common ontology of scientific experi- ments (EXPO). We aim to formalize generic knowledge about scientific experimental design, methodology and results representation. Such a common ontology is both feasible and desirable because all the sciences follow the same experimental principles. Despite their different subject matter, all the sciences organize, execute and
  • 4. analyse experiments in similar ways; they use related instruments and materials; they describe experimental results in identical formats, dimensional units, etc. The aim of EXPO is to abstract out the fundamental concepts in formalizing experiments that are domain independent (figure 1). The advantage of this is that generic knowledge about experiments is held in only one place; ensuring consistency, clean updating and non-redundancy. The practical benefit is that if in an experiment, multiple sciences are involved (e.g. meta- bolomics and organic chemistry or radio astronomy and physical chemistry), then common experimental meta- data will only need to be recorded once rather than J. R. Soc. Interface (2006) 3, 795–803 doi:10.1098/rsif.2006.0134 Published online 6 June 2006 q 2006 The Royal Society http://www.nesc.ac.uk http://www.w3.org/2001/sw/ http://www.w3.org/2001/sw/ http://dx.doi.org/doi:10.1098/rsif.2006.0134 http://dx.doi.org/doi:10.1098/rsif.2006.0134 http://www.journals.royalsoc.ac.uk http://www.journals.royalsoc.ac.uk SUMO ontology of science FuGO upper level
  • 5. intermediate level domain level EXPO SubjectOfExp. ObjectOfExp. domain model MSI PSIMO Figure 1. The position of EXPO. EXPO as a part of ontology of science is an extension of the upper ontology SUMO. EXPO can be further extended via the classes DomainOfExperiment, SubjectOfExperiment, ObjectOfExperiment, etc. to domain specific ontologies of experiments such as MO, MSI, PSI, etc. 796 An ontology of scientific experiments L. N. Soldatova and R. D. King multiple times. The utilization of a common standard ontology for the annotation of scientific experiments will make scientific knowledge more explicit, help detect errors, promote the interchange and reliability of experimental methods and conclusions, and remove redundancies in domain-specific ontologies. More generally, we envisage EXPO as a part of a general ontology of science that would include other scientific methods as observational, theoretical, description of technologies, resources, etc. Although no ontology exists that formalizes general experimental information, several ontologies exist for
  • 6. specialized experimental areas in biology (mged.source forge.net/;psidev.sourceforge.net/) and metadata standards are appearing in many other sciences, e.g. in chemistry (ftp://ftp.ebi.ac.uk/pub/databases/chebi/ ontology/) and physics (www.ph.ed.ac.uk/ukqcd/com munity/the_grid/QCDml1.1/ConfigDoc/ConfigDoc. html). Probably the best-known attempt to formalize the description of experiments is that developed by the Microarray Gene Expression Society (MGED) (mged. sourceforge.net/). The MGED Ontology (MO) was designed to formalize the descriptors required by minimum information about a microarray experiment (MIAME) standard for capturing core information about microarray experiments. MO aims to provide a conceptual structure for microarray experiment descrip- tions and annotation. A number of ontological develop- ments related to MO also exist. The HUPO PSI General J. R. Soc. Interface (2006) Proteomics Standards and Mass Spectrometry working groups are building an ontology that will support proteomic experiments (psidev.sourceforge.net/). The metabolomics standards initiative (MSI) ontology working group is seeking to facilitate the consistent annotation of metabolomics experiments by developing an ontology to help enable the scientific community to understand, interpret and integrate metabolomic experiments (msi-ontology.sourceforge.net/index. htm). More generally, the Functional Genomics Inves- tigation Ontology (FuGO) is developing an integrated ontology that provides both a set of ‘universal’ terms, i.e. terms applicable across functional genomics and domain-specific extensions to terms (fugo.sourceforge. net/). Although these ontologies are making significant contributions to the formalization of experiments in areas of biology, they are unsuitable as a template for a general ontology of experiments, as they are primarily
  • 7. oriented to specialized biomedical domains. 2. EXPO: AN ONTOLOGY OF SCIENTIFIC EXPERIMENTS We follow Schulze-Kremer’s description of an ontology as ‘a concise and unambiguous description of what principle entities are relevant to an application domain and the relationship between them’. EXPO is based on ideas from the philosophy of science (logical, probabilistic, methodological, epistemological, etc.) http://www.mged.sourceforge.net/ http://www.mged.sourceforge.net/ http://www.psidev.sourceforge.net/ http://ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/ http://ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/ http://www.ph.ed.ac.uk/ukqcd/community/the_grid/QCDml1.1/C onfigDoc/ConfigDoc.html http://www.ph.ed.ac.uk/ukqcd/community/the_grid/QCDml1.1/C onfigDoc/ConfigDoc.html http://www.ph.ed.ac.uk/ukqcd/community/the_grid/QCDml1.1/C onfigDoc/ConfigDoc.html http://mged.sourceforge.net/ http://mged.sourceforge.net/ http://www.psidev.sourceforge.net/ http://www.msi-ontology.sourceforge.net/index.htm http://www.msi-ontology.sourceforge.net/index.htm http://www.fugo.sourceforge.net/ http://www.fugo.sourceforge.net/ ScientificExperiment ExperimentalGoal ExperimentalResults
  • 13. p/o p/o p/o p/o p/o p/o p/o p/o p/o Figure 2. The ontology of scientific experiments (a fragment), where p/o is a part-of relation, a/o is an attribute-of relation and an arrow with an empty label corresponds to is-a relation. For each type of experiments (e.g. Galilean hypothesis-driven or computational experiment), there is a corresponding experimental goal: to confirm, to explain, to investigate or to compute. At the design stage, experimental object, equipment, experimental actions are specified in order to achieve the experimental goal. Experimental hypotheses are used to verify and evaluate the experimental results (for detailed description see EXPO). An ontology of scientific experiments L. N. Soldatova and R. D. King 797 (Curd & Cover 1988; Toulmin 2004), the theory of knowledge representation (Sowa 2000), the analysis of existing ontologies (suo.ieee.org/) including bio- ontologies (obo.sourceforge.net) and the theory of
  • 14. experiment design (Fisher 1956; Boniface 1995). The division of ontological knowledge into appropriate levels of abstraction is a fundamental part of our EXPO proposal (see figure 1). The upper ontology SUMO (suggested upper merged ontology) includes a formalization of such top-level classes as physical process, physical and abstract objects including dimen- sional units, measures, time intervals, etc. As described above, lower specialized experimental domain ontolo- gies are also starting to appear that aim to formalize knowledge about specific experimental techniques such as for microarrays (MO). What is currently missing is the intermediate layer of a general ontology of scientific experiments to formalize the ontological knowledge that is common between different scientific areas. EXPO provides a structure to describe such common concepts as experimental goals, experimental methods and actions, types of experiments, rules for experi- mental design, etc. (see figure 2). We see EXPO as a part of a general ontology of science that should formalize scientific tasks, methods, techniques, infrastructure of science (such knowledge about aca- demic staff, projects, scientific documents has J. R. Soc. Interface (2006) already partially been formalized in the KA2 ontology (protege.stanford.edu/ontologies/ontologyOfScience/ ontology_of_science.htm)). 2.1. The design principles of EXPO To form EXPO we have used a combined top-down (designing EXPO with reference to an upper ontology) bottom-up methodology (validating EXPO by appli- cations in different domains). The first step in the top- down approach was to anchor EXPO to a standard upper ontology, which describes general knowledge about the world. A standard upper ontology provides:
  • 15. template structures, terms, and relations, along with key definitions and axioms; a principled way of determining the top-level concepts of our ontology (as an extension of the upper ontology) and connections to other ontologies (enabling cross ontology use and inference). The Standard Upper Ontology Working Group IEEE P1600.1 has proposed SUMO as a general standard (Niles & Pease 2001) to support computer applications such as data interoperability, information search and retrieval, automated interfacing and natural language processing (suo.ieee.org/). We therefore selected SUMO as our upper ontology. Use of SUMO ensures compatibility with other compliant SUMO ontologies and enables EXPO to have wide reusability http://www.suo.ieee.org/ http://www.obo.sourceforge.net http://www.protege.stanford.edu/ontologies/ontologyOfScience/ ontology_of_science.htm http://www.protege.stanford.edu/ontologies/ontologyOfScience/ ontology_of_science.htm http://www.suo.ieee.org/ 798 An ontology of scientific experiments L. N. Soldatova and R. D. King and functionality. The top-down development process of EXPO ensures inclusion of the key concepts in general scientific experiments which would be difficult to ensure if EXPO was based on bottom-up general- ization of experiments from a particular scientific domain. 2.2. EXPO as an extension of SUMO Below we describe the elements (terms) of SUMO used to build EXPO. For each term we give both the SUMO
  • 16. definition in quotes (suo.ieee.org/SUO/SUMO/index. html) and an example of use from EXPO (sourceforge. net/projects/expo). Note, the meaning of the terms used in SUMO do not necessarily correspond to those in mathematics, philosophy or computer science; SUMO terms start with capitals; and terms in ontologies are by convention presented in singular form. Class. ‘Class differs from set in three important respects. First, Class is not assumed to be extensional. Second, Class typically has an associated ‘condition’ that determines the instances of the Class. Third, the instances of a class may occur only once within the class, i.e. a class cannot contain duplicate instances’. For example in EXPO, the condition ‘being a statement about cause-effect relations between known and unknown variables of the domain of the experiment’ determines the class ExperimentalHypothesis. Each class in EXPO has both a natural language definition, and a computational definition (as a list of associated relations). Individual (also called Instance). An entity ‘is an instance of a Class if it is included in that Class’. For example in EXPO, the particular experiment ‘a precision measurement of the mass of the top quark’ is an instance of the class ScientificExperiment. N.B. EXPO provides a conceptual description and does not contain individuals. However, as a reference model for description of experiments, EXPO assumes extensions by the adding of instances to represent particular experiments. The concept of an instance is also essential in EXPO in the definition of is-a relations. Relations: Subclass (is-a). ‘(subclass ?CLASS1 ?CLASS2) means that ?CLASS1 is a subclass of ?CLASS2, i.e. every instance of ?CLASS1 is also an instance of ?CLASS2.’ For example in EXPO, the class HypothesisAcceptanceMistake (‘the incorrect accep-
  • 17. tance or rejecting of the research hypothesis’) is a subclass of the class ResultError (‘an incorrectly inferred conclusion about a research hypothesis or about the phenomena involved in the experiment’). Instance of. ‘is a BinaryPredicate (instance ?INDI- VIDUAL ?CLASS) that means ?INDIVIDUAL has an associated condition that determines the instances of the ?CLASS’. For example in EXPO, the individual a precision measurement of the mass of the top quark experiment is associated to the class ScientificExperi- ment with the conditions: ‘being an investigation of J. R. Soc. Interface (2006) cause-effect relations between known and unknown variables of the field of study’. Part (p/o). ‘The basic mereological relation. All other mereological relations are defined in terms of this one. (part ?PART ?WHOLE) simply means that the Entity?PART is part of the Entity?WHOLE. Note that since part is a ReflexiveRelation, every Entity is a part of itself’. For example in EXPO, ExperimentalDesign is a part of ScientificExperiment. Attribute (a/o). ‘(attribute ?ENTITY ?PROP- ERTY) means that ?PROPERTY is an Attribute of ?ENTITY’. For example in EXPO, Controllability of an experimental variable is an attribute which charac- terizes whether a subject of the experiment can control/vary a variable. Role. As an addition to SUMO EXPO defines the Role predicate: ‘(role ?OBJECT ?ENTITY) means that ?OBJECT-a Role holder, plays a Role ?ENTITY in some context’ (Sunagawa et al. 2005). In EXPO, we always consider the context of an experiment. For
  • 18. example in EXPO, in the context of an experiment a human object can either play the role SubjectOfExperi- ment if the human is ‘one who executes the experi- ment’, or the role ObjectOfExperiment if the human is one ‘on whom an experiment is made’ (OED 1989). In designing EXPO we have endeavoured to use as few relations as possible. This helps to ensure that the ontology is both comprehensible and extendable, while being expressive enough to represent of all the required relations between classes of the domain (SUMO provides many other well defined relations (i.e. contains, located, precondition, etc.) that may be useful in extending EXPO). We selected a subset of 46 SUMO classes that are most relevant to describing scientific experiments, e.g. PhysicalQuantity, TimeMeasure, ArtificialLanguage, Experimenting. We then added 172 other classes that we judged necessary to represent scientific experiment, e.g. ExecutionOfExperiment, MeasurementError, ExperimentalResults. A number of EXPO classes we judge to be outside of the domain of experiments; these belong more properly in a full ontology of science (or SUMO), for example: Variable, Robustness, Reference. We aimed to employ what we consider to be the best practice in ontological development. For example, we follow what we believe to be one of the best constructed ontologies in science, the Foundational Model of Anatomy (FMA) (Rosse & Mejino 2003), in disallowing multiple inheritance; this we believe results in EXPO being simpler to comprehend and makes it easier to avoid the inference errors that can occur with multiple inheritance. In defining EXPO concepts, we also follow the FMA desiderata of relying on Aristotelian
  • 19. definitions (Aristotle 350 BC). ‘In dictionaries the unit of the information is a term., in an ontology.the unit of information is a concept and the purpose of definitions is to align all concepts in the ontology’s domain in a coherent inheritance type hierarchy.. Definitions should state the essence of . entities in terms of their characteristics consistent with the ontology’s context. Paraphrasing Aristotle, the essence of an entity is constituted by two sets of defining http://www.suo.ieee.org/SUO/SUMO/index.html http://www.suo.ieee.org/SUO/SUMO/index.html http://www.sourceforge.net/projects/expo http://www.sourceforge.net/projects/expo An ontology of scientific experiments L. N. Soldatova and R. D. King 799 attributes; one set, the genus, necessary to assign to an entity to a class and the other set, the differentiae, necessary to distinguish the entity from other entities also assigned to the class. A collection of entities that share the same set of essential characteristics constitu- tes a class of the ontology’ (Rosse & Mejino 2003). EXPO follows to SUMO naming conventions, such as: NameOfTerm. 2.3. Bottom-up design of EXPO We tested the validity of the design of EXPO by applying it to a number of different scientific domains (metabolomic, microbiological, computer science, particle physics). This range of application areas enabled us to have confidence that the classes in EXPO cover the essential concepts of scientific experiments.
  • 20. One motivation for the use of a minimum set of relations in EXPO is that we hope to provide compliance not only with the upper ontology SUMO, but also with the existing domain ontologies of experiments (or at least a mapping to them). We have therefore also engaged with the bio-ontology community to try to ensure that EXPO is compatible with the development of ontologies for experiments in biology. Although SUMO and the upper bio-ontologies employ different sets of relations, EXPO tries to avoid contradictions between them. Currently there are no compositional contradictions with the open biomedical ontologies (OBO) relations ontology (http://obo.sourceforge.net) (the latter has is-a, instance-of, part-of and other relations that are not used in EXPO, additionally OBO allows defining relations). To verify this approach we plan to incorporate domain ontologies of experiments that are already exist (MO, PSI) or are at the development stage like FuGO. 2.4. The EXPO domain We consider an experiment to have three levels: the physical level (the real world FieldOfStudy about which an experiment should discover new knowledge); the model level (our knowledge about the experimental domain ExperimentalModel); and a design level Experimental Design (where parameters, target vari- ables of an experiment, and a sequence of experimental actions are determined). Our definition of an experi- ment is ‘a scientific experiment is a research method which permits the investigation of cause-effect relations between known and unknown (target) variables of the domain.’ EXPO describes PhysicalExperiment where an experimenter manipulates by the real-world (physical)
  • 21. domain and ComputationalExperiment where an experimenter investigates cause-effect relations by manipulating a computational (non-physical) domain adequate to the real-world domain. EXPO is able to represent experiments with both explicit experimental hypothesis (GalileanExperiment) and without explicit hypothesis (BaconianExperiment) (Medawar 1981). Experiments are classified by the FieldOfStudy, i.e. MicroarrayExperiment, or MetabolomicExperiment. J. R. Soc. Interface (2006) EXPO supports several classification systems of domains: library classification DeweyDecimalClassifi- cation, LibraryOfCongressClassification and Research- CouncilsUKClassification. Our initial version of EXPO contains 218 well-defined concepts about experimental methods. We have automatically translated EXPO into the W3C standard ontology language OWL-DL using the Hozo ontology editor (Kozaki et al. 2002). 2.5. The future of EXPO The development of a general ontology for scientific experiments is an ambitious goal, and we have so far only proposed the key top-level concepts. Although we have sought to design EXPO to be uncontroversial and consistent with the generally accepted view of experi- ments in science, it is inevitable in work related to philosophy, that some of our design decisions are debateable. We have therefore opened up EXPO to modification by placing EXPO in SourceForge (source forge.net/projects/expo). We stress that EXPO is still at an initial stage in its development, and we invite scientist, researchers and practitioners to contribute to its improvement. 3. APPLICATIONS OF EXPO We argue that utilization of a common standard
  • 22. ontology for the annotation of scientific experiments will make scientific knowledge more explicit, help detect errors, promote the interchange and reliability of experimental methods and conclusions and remove redundancies in domain-specific ontologies. To test these claims we employed EXPO to annotate two real-world examples: one from physics (high-energy/ particle physics) and the other from biology (phyloge- netics). We selected these examples as extrema from an arbitrarily selected issue of Nature (10 June 2004). Full details of these annotations are in the electronic supplementary material. In both papers annotation with EXPO enabled the scientific knowledge presented in the paper (encoded as natural language free-text) to be made more explicit, and for problems in the papers to be found. In addition, EXPO served as a basis for logical inference about consistency and validity of the conclusions stated in the articles. Annotation with EXPO also suggested an unexpected similarity between these two experiments. 3.1. High-energy physics application The first example concerned a new estimate of the mass of the top quark (Mtop) authored by the ‘D0 Collaboration’ (approx. 350 scientists) (D0 Collabor- ation 2004). The D0 Collaboration in 1995 were joint discovers of the top quark; a landmark event in physics. The value of Mtop is of particular scientific importance as it is a key constraint on the mass of the hypothetical Higgs boson. The Higgs boson is believed to provide the mechanism by which particles acquire mass. The application of EXPO to annotating this experi- ment is shown in figure 3 (and more fully in the
  • 23. http://obo.sourceforge.net http://sourceforge.net/projects/expo http://sourceforge.net/projects/expo ScientificExperiment: ComputationalExperiment: Simulation AdminInfoAboutExperiment: title: a precision measurement of the mass of the top quark ClassificationByDomain: DomainOfExperiment: HighEnergyPhysics /ParticlePhysics DDC(Dewey): 539.7 atomic and nuclear physics LibraryOfCongress: QC 770 –798 atomic, nuclear, particle physics RelatedDomain: ComputationalStatistics DDC (Dewey): 519 probabilities and applied mathematics LibraryOfCongress: QA 273 –274 probabilities ResearchHypothesis: RepresentationStyle: text LinguisticExpression: NaturalLanguage: given the same observed data, use of the new statistical method M1 will produce a more accurate estimate of Mtop than the original method M0 LinguisticExpression: ArtificialLanguage AlternativeHypothesis: SubjectEffect: ExperimenterBias: LinguisticExpression: ArtificialLanguage: P (The Standard Model) is high
  • 24. ∴ Mtop = 173.3 DomainModel: a statistical branching model of particles ExperimentalModel: Factor: M – a method of estimating Mtop FactorLevel: M0 – the old method M1 – the new method TargetVariable: Mtop ModelAssumption1: it is possible to compare the accuracy of estimates of M0 and M1 without knowing the true value of Mtop. ModelAssumption2: 'the specific value of the Pbkg cut-off based on Mtop = 175 GeV/c 2 [8] has no effects on the experimental results' ExperimentalConclusion: representation style: text LinguisticExpression: NaturalLanguage the probability of the research hypothesis H1 has been increased: method M1 'yields a greatly improved precision' of Mtop [8] ResultError: ErrorOfConclusion: hypothesis acceptance mistake: false positive ≠ 0 ErrorOfConclusion: FaultyComparison: M0 and M1 estimates Higgs boson mass best fit mass estimate is experimentally excluded the standard model Higgs boson
  • 25. M M M >= Figure 3. EXPO formalization of the particle physics experiment (a fragment). 800 An ontology of scientific experiments L. N. Soldatova and R. D. King electronic supplementary material). This annotation makes it explicit that the experiment was somewhat unusual in not generating any new observational data. Instead, it presents the results of applying a new statistical analysis method to existing data (a set of putative top quark pair decays events involving eCjets and mCjets). No explicit hypothesis was put forward in the paper. However, we argue that the paper’s implicit experimental hypothesis was given the same observed data, use of the new statistical method will produce a more accurate estimate of Mtop than the original method. This is based on the authors’ statement ‘here we report a technique that extracts more information from each top-quark event and yields a greatly improved precision when compared to previous measurements’. We consider that the paper’s hypothesis does not concern J. R. Soc. Interface (2006) the value of Mtop directly, as this is deductively inferred from the hypothesis. We prefer the term ‘accuracy’ to ‘precision’ (which also occurs in the title) as its meaning is more generally associated with the relationship between the closeness of agreement between a measured value and a true value (www.physics.unc.edu/wdear
  • 26. dorf/uncertainty/definitions.html); which presumably is what is meant. The use of EXPO may have alerted the authors and the Nature editor to the unsuitability of use of the term precision. Annotation with EXPO high- lighted that little evidence is presented for or against the experimental hypothesis, only one sentence in the Methods section refers to simulation studies. Instead, the authors assume that the new method is more accurate and focused on application of the new statistical method to estimating Mtop. The estimate of http://www.physics.unc.edu/~deardorf/uncertainty/definitions.ht ml http://www.physics.unc.edu/~deardorf/uncertainty/definitions.ht ml ScientificExperiment: PhysicalExperiment: Hypothesis-forming; Hypothesis-driven AdminInfoAboutExperiment: Title: mesozoic origin of West Indian insectivores ClassificationByDomain: DomainOfExperiment: Zoology DDC(Dewey): 599: mammalology LibraryOfCongress: QL351–QL352 Zoology-Classif. Taxonomy DDC(Dewey): 575 evolution and genetics LibraryOfCongress: QH 367.5 molecular phylogenetics LibraryOfCongress: QH83 Biosystematics RelatedDomain: ComputationalStatistics DDC(Dewey): 519 probabilities and
  • 27. applied mathematics LibraryOfCongress: QA 273–274 probabilities ResearchHypothesis: implicit ExperimentalGoal: to investigate the phylogeny of the species: Solenodon paradoxus and Solenodon cubanus NullHypothesisH01: RepresentationStyle: text LinguisticExpression: NaturalLanguage: 'some have suggested a closerelationship to soricids (shrews) but not to talpids' LinguisticExpression: ArtificialLanguage So, Sh, T, An mammalia So . Sh . T . An . solenodon (So) soricoidea (Sh) talpoidea (T) ancestor (An, So) ancestor (An, Sh) ancestor (An, T) DomainModel: statistical branching model of evolution ExperimentalModel: Factor: T– phylogenetic tree relating Solenodon paradoxus and Solenodon cubanus to other mammal species FactorLevel: 'sampling trees every 20 generations' (spl. in [9]) TargetVariable: PS (T)–position and BL (T)–branch length of Solenodon paradoxus and Solenodon cubanus in tree T ModelAssumption1: the use of the selected DNA sequences will produce results that generalize to the
  • 28. complete genomes ModelAssumption2: molecular clock assumptions ExperimentalAction 1.1.1: extraction and purification Object: sample of DNA ParentGroup: DNA from Solenodon paradoxus Sampling: random sampling Instrument: Qiagen DNA cleanup kit …………………………………………………. ExperimentalConclusionC2: comment: formed hypothesis LinguisticExpression: ArtificialLanguage So, Sh, T, E, An, De mammalia So . Sh . T . E . X . An . solenodon (So) soricoidea (Sh) talpoidea (T) erinaceidea (E) ancestor (An, So) ancestor (An, De) ancestor (De, So) ancestor (De, Sh) ancestor (De, T) ancestor (De, E) Figure 4. EXPO formalization of the phylogenetic experiment (a fragment). An ontology of scientific experiments L. N. Soldatova and R. D. King 801 Mtop from the old method was 173.3G5.6 (stat) G5.5 (sys) GeV/c2 and from the new estimate 180.1G2.0 (stat) G2.6 (sys) GeV/c2. The current (April 2006) best estimate for Mtop is 174.2G3.4 GeV/c
  • 29. 2 (Tevatron Electroweak Working Group. CDF Collab- oration, D0 Collaboration 2006); therefore, it would appear that the original estimate was actually more accurate! Of course, it is possible that stochastic factors meant that the new statistical method was unlucky in its prediction. However, it would seem at least as likely that some form of methodological difficulty in the experiment was involved. One such problem was revealed when annotating the experiment with EXPO. The authors state that a ‘critical difference’ between the old and new method J. R. Soc. Interface (2006) was: ‘the assignment of more weight to events that are well measured or more likely to correspond to a top/anti- top signal’. This is an application of the Carnap principle: that you must take into account all of the evidence relevant to a question (Curd & Cover 1988; omega.math.albany.edu:8008/JaynesBook.html). In this case, the information that some events are better measured needs to be taken into account. However, annotating the new method with EXPO makes it explicit that 91 candidate events were used to calculate the old value, but only 22 of these were used for the new value. Therefore, the only weights used were 0 and 1. The Carnap principle would only justify these extreme weights if the 69 excluded events contained no infor- mation on Mtop. As this was not demonstrated and http://www.omega.math.albany.edu:8008/JaynesBook.html 802 An ontology of scientific experiments L. N. Soldatova and R. D. King
  • 30. would seem unlikely, it therefore appears that a source of statistical inefficiency was introduced which counter- acted the improved signal/noise ratio from choosing well-measured events. Another point highlighted by formalization in EXPO is the relationship between estimating Mtop and the existence of the Higgs boson. The paper concluded that Mtop is higher than previously estimated, which deduc- tively implies a higher mass for the Higgs boson. As the Higgs boson has not yet been observed, even at energies above its previously predicted maximum-likelihood mass, the inferred higher Mtop lent support to the existence of the Higgs boson. However, it would have been possible to argue validly the other way: that the Higgs boson is thought highly likely to exist therefore its non-observation makes more probable a higher value of Mtop. This argument was not explicit in the paper, but it might well have existed implicitly as a motivation. The paper would have benefited from making this argu- ment/motivation explicit. 3.2. Phylogenetics application The second example investigated the phylogenetic status of the mammalian species Solenodon cubanus and Solenodon paradoxus (Roca et al. 2004). The main experimental approach used was the comparison of nuclear and mitochondrial DNA sequences derived from Solenodons with homologous sequences in other mammals. Solenodons are endangered insectivores that inhabit the forests of Hispaniola and Cuba and their phylogenetic relationship with other mammals has long been a matter of controversy (Symonds 2005). Generally, this paper was clear and straightforward and a model of its kind. Figure 4 shows part of the EXPO instantiation of the experiment (for full details
  • 31. of the annotation, along with formalization of a part of the background knowledge written as logic program, see the electronic supplementary material). The use of EXPO to annotate the paper makes explicit the different hypotheses described in the paper. What we have identified as the ResearchHypothesis are the main conclusions of the paper. However, these conclusions are not mentioned as possible hypotheses in the text. This contrasts with what we identify as seven NullHypotheses, which are mentioned explicitly in the main text. This highlights an interesting point: the main experimental technique used in the paper, molecular phylogeny programs, does not typically employ explicit hypotheses. They aim to uncover the most probable evolutionary trees based on a model of evolution rather than to answer an explicit question of relatedness. However, when explicit hypotheses are available, as in the Solenodon case, this approach may well be sub-optimal. For example, we estimate that at least as much computer time was used to determine the sub-tree phylogeny of bats as the sub-tree phylogeny of Solenodons. Considering the null hypotheses in detail, it is worthy of note that no conclusions concerning hypotheses H03 and H04 are mentioned in the main text and the interested reader has to search the further information. The use of EXPO would have made this J. R. Soc. Interface (2006) omission clear. Another aspect of the research which use of EXPO would have highlighted, was that the DNA sequences produced during the experiment were stored in the EMBL database using the taxonomic term Insectivora. This taxon is now generally recognized to be polyphyletic (Symonds 2005), and its use contradicts
  • 32. the actual conclusions of the paper. We formalized using the logic programming language Prolog, the biological knowledge and logical inferences behind the authors’ argument that ‘Cuban Solenodons should be classified in a distinct genus, Atopogale’ (see the electronic supplementary material). Our analysis indicates that it would be more internally consistent for the authors to have classified Cuban Solenodons as a distinct family. The authors’ hesitation to name a new family probably owes more to the sociology of science than logic—naming a new family is a much more radical step. It is, of course, a serious shortcoming in biology that ‘genus’ and ‘family’ are not well-defined terms. It is significant that using EXPO to annotate these two very different experiments revealed an important similarity between them. This is that the natural phenomena studied are both modelled using stochastic branching processes—although at vastly different time scales (approx. 10 K24 s versus millions of years). This means that related mathematical techniques can be, and are, applied in both domains. As it is probable that there is little communication between these two domains, it is possible that one domain may have invented techniques of relevance to the other domain, which they are currently unaware of. Identification of such overlaps illustrates one of the benefits of a unified ontology for scientific experiments. 4. DISCUSSION
  • 33. 4.1. Dissemination of scientific knowledge It is probable that publication of scientific papers will soon happen almost exclusively online. It is also to be expected that most scientific data will also be published online. The vision of ‘e-Science’ is to publish online both papers and all of the data and metadata from a scientific experiment for posterity; so that all results can be repeated and compared with other related experiments (www.nesc.ac.uk). We believe that these developments in the online availability of scientific knowledge and data will greatly support the drive to the formalization of science: as the only possible way to effectively exploit this new data is to use computers, and for computers to work best requires formalization. We argue that the use of EXPO is an important step towards this. The traditional way of presenting scientific knowl- edge in scientific papers has many limitations. The most important and obvious of these is the use of natural language to describe knowledge—albeit aug- mented by various formalisms and mathematics. This is problematic because natural language is notorious for its imprecision and ambiguity. Its use is also a great barrier in using computers to store and analyse data—hence the growing importance of text mining. http://www.nesc.ac.uk An ontology of scientific experiments L. N. Soldatova and R. D. King 803 We argue that the content of scientific papers should increasingly be expressed in formal languages—is writing a scientific paper closer to writing poetry or a
  • 34. computer program? For the application of EXPO to become widespread, and a critical-mass of annotations made available, convenient tools will need to be developed to enable practicing scientists to annotate their own experiments. We envisage such tools will, for example, ask the user to describe the domain of the experiment, if the experiment involved any hypotheses, what experimental results support or reject hypotheses, etc. Such a tool could be incorporated into an electronic lab-book or done as a separate procedure at the same time as writing-up (the traditional paper). With the rise in laboratory auto- mation, and the increasing use of artificial intelligence to aid scientific experimentation (King et al. 2004), many parts of EXPO may be able to be automatically input. It is possible to envisage that some journals may enforce submission of such annotations, along with papers—just as submission of data to repositories is often compulsory. 4.2. Ontologies of science We view the development of EXPO as part of a general drive to formalize science. The current level of formalization varies greatly between the sciences. In some fields, such as particle physics, the background theories are highly formalized (in mathematical nota- tion). Yet even in this case, the actual process of experimental testing these theories, despite the great technical sophistication of the experiments (witness the new Large Hardron collider at CERN) is not yet formalized. The situation of biology presents an interesting contrast to particle physics. Very little work has been done on formalizing the background theories in biology, yet biology is leading the way in ontology development, both generally and for experiments.
  • 35. It is important to note that the general lack of explicit ontologies in science does not mean that ontologies are not being employed. It means that we are currently condemned to using implicit, non- standardized and possibly naive ones. 5. CONCLUSIONS The unity of scientific experimentation implies that an accepted general ontology of experiments is both possible and desirable. Such an ontology would promote the sharing of results within and between subjects, reducing both the duplication and loss of knowledge. It is also an essential step in formalizing science and so fully exploiting computer reasoning in science (King et al. 2004). To quote Francis Bacon ‘Therefore, from a closer and purer league between these two faculties, the experimental and the rational (such as never yet been made), much may be hoped’. J. R. Soc. Interface (2006) REFERENCES Aristotle, 350 BC Categories. Translated by E. M. Edghill. See http://evans-experientialism.freewebspace.com/aris totle_categories.htm. Boniface, D. R. 1995 Experiment design and statistical methods for behavioural and social research. London, UK: Chapman & Hall. Curd, M. & Cover, J. A. 1988 Philosophy of science. New York, NY: Norton. D0 Collaboration 2004 A precision measurement of the mass of the top quark. Nature 429, 639–642. (doi:10.1038/ 431639a)
  • 36. Fisher, R. A. 1956 The design of experiments. Edinburgh, UK: Oliver & Boyd. Hey, T. & Trefethon, A. 2003 The data deluge: an e-science perspective. In Grid computing-making the global infra- structure a reality, vol. 36 (ed. F. Berman, G. C. Fox & A. J. G. Hey), pp. 809–824. London, UK: Wiley. King, R. D., Whelan, K. E., Jones, M. F., Reiser, P. G. K. & Bryant, C. H. 2004 Functional genomics hypothesis generation by a robot scientist. Nature 427, 247–252. (doi:10.1038/nature02236) Kozaki, K., Kitamura, Y., Ikeda, M. & Mizoguchi, R. 2002 Hozo: an environment for building/using ontologies based on a fundamental consideration of “Role” and “Relation- ship”. Knowl. Eng. Knowl. Manag., pp. 213–218. Medawar, P. B. 1981 Advice to a young scientist. London, UK: Pan Books Ltd. Niles, I. & Pease, A. 2001 Towards a standard upper ontology. In Proc. 2nd Int. Conf. on Formal Ontology in Information Systems (FOIS-2001) (ed. C. Welty & B. Smith). Roca, A. L., Bar-Gal, G. K., Eizirik, E., Helgen, M. K., Maria, R., Springer, M. S., O’Brien, S. J. & Murphy, W. J. 2004 Mesozoic origin for West Indian insectivores. Nature 429, 649–651. (doi:10.1038/nature02597) Rosse, C. & Mejino Jr, J. L. 2003 A reference ontology for biological informatics: the foundational model of anatomy. Biomed. Inform. 36, 478–500. (doi:10.1016/j.jbi.2003.11. 007)
  • 37. Sowa, J. F. 2000 Knowledge representation: logical, philoso- phical, and computational foundations. Pacific Grove, CA: Brooks Cole Publishing. Sunagawa, E., Kozaki, K., Kitamura, Y. & Mizoguchi, R. 2005 A framework for organizing role concepts in ontology development tool: Hozo. Roles, an interdisciplinary perspective: ontologies, programming languages, and mul- tiagent systems AAAI Symp. FS-05-08, pp. 136–143. Arlington, VA: AAAI. Symonds, M. R. 2005 Phylogeny and life histories of the ‘insectivora’: controversies and consequences. Biol. Rev. 80, 93–128. (doi:10.1017/S1464793104006566) The Oxford English Dictionary. 1989 2nd edn. Oxford: Oxford University Press. Tevatron Electroweak Working Group. CDF Collaboration, D0 Collaboration, 2006 Combination of CDF and D0 results on the mass of the top quark. See http://arXiv. org/abs/hep-ex/0604053. Toulmin, S. 2004 Philosophy of science. Encyclopaedia Britannica, Deluxe CD. Chicago, IL: Encyclopaedia Britannica. Turing, A. M. 1936 On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc. 2, 230–265. http://evans- experientialism.freewebspace.com/aristotle_categories.htm http://evans- experientialism.freewebspace.com/aristotle_categories.htm http://dx.doi.org/doi:10.1038/431639a
  • 38. http://dx.doi.org/doi:10.1038/431639a http://dx.doi.org/doi:10.1038/nature02236 http://dx.doi.org/doi:10.1038/nature02597 http://dx.doi.org/doi:10.1016/j.jbi.2003.11.007 http://dx.doi.org/doi:10.1016/j.jbi.2003.11.007 http://dx.doi.org/doi:10.1017/S1464793104006566 http://arXiv.org/abs/hep-ex/0604053 http://arXiv.org/abs/hep-ex/0604053An ontology of scientific experimentsIntroductionEXPO: an ontology of scientific experimentsThe design principles of EXPOEXPO as an extension of SUMOClassIndividual (also called Instance)RelationsBottom-up design of EXPOThe EXPO domainThe future of EXPOApplications of EXPOHigh-energy physics applicationPhylogenetics applicationDiscussionDissemination of scientific knowledgeOntologies of scienceConclusionsReferences