1. Standards and Ontologies for
Functional Genomics
Conference
October 23-26, 2004
University of Pennsylvania School
of Medicine
EXPERIENCES IN BUILDING AN ONTOLOGY-
DRIVEN IMAGE DATABASE FOR BIOLOGISTS
Chris Catton
Image BioInformatics Laboratory
Department of Zoology
University of Oxford, UK
e-mail: chris.catton@zoo.ox.ac.uk
2. Outline
• Why are images important?
• What is the BioImage database?
• Why use a semantic web architecture?
• Lessons and research questions
3. Why are biological images important in the
post-genomic age?
• Images are semantic instruments for capturing aspects of the real
world, and form a vital part of the scientific record, for which words are
no substitute
• In the post-genomic world, attention is now focused on the organization
and integration of information within cells, for functional analyses of
gene products
• In a month a single active cell biology lab may generate between 10
and 100 Gbytes of multidimensional image data
4. Images are complex …
• An image database must be
able to store original images in
any digital format currently
available or yet to be invented,
including multi-channel 3D
images, multi-channel videos,
etc.
5. The need for image databases
• The value of digital image information depends upon how easily it
can be located, searched for relevance, and retrieved
• Detailed descriptive metadata about the images are essential
• Without them, digital image repositories become little more than
meaningless and costly data graveyards
• Despite the growth of on-line journals that permit the inclusion of
media objects, few of these resources are freely available, and those
that are free are difficult to locate and cannot be cross-searched
• There is thus a need for a free publicly available image database with
rich well-structured searchable metadata
• The BioImage Database seeks to fulfil that need
7. What metadata?
• Image acquisition (who took the original micrograph, where,
when, under what conditions, for what purpose, etc.)
• The media object itself (source and derivation, image type,
dynamic range, resolution, format, codec, etc.)
• The denotation of the referent (e.g. the name, age and condition
of the subject)
• The connotation of the referent (the image’s interpretation, meaning,
purpose or significance, its relevance to its creator and others,
and its semantic relationship to other images)
• Field aspects of the real world that cannot conveniently be
attached to any particular object (e.g. variations of illumination
intensity or chemo-attractant concentration across the field of
view of a light microscope image)
• Sequences of change where there is a need to preserve the
concept of object identity in the face of radical spatio-temporal
changes in appearance
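As an illustration only, the six metadata categories listed above could be grouped into a single record per image. The sketch below is not the BioImage schema: the class name, every field name and the sample values are hypothetical and chosen purely to mirror the bullet points.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageMetadata:
    """One record per stored image; all field names are hypothetical."""
    acquisition: Dict[str, str] = field(default_factory=dict)    # who, where, when, why
    media_object: Dict[str, str] = field(default_factory=dict)   # format, resolution, codec
    denotation: Dict[str, str] = field(default_factory=dict)     # what the image shows
    connotation: Dict[str, str] = field(default_factory=dict)    # interpretation, significance
    field_aspects: List[str] = field(default_factory=list)       # e.g. illumination gradients
    change_sequence: List[str] = field(default_factory=list)     # ids of earlier/later states

    def missing_categories(self) -> List[str]:
        """Return the names of the categories that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Illustrative values only, not taken from any real record.
record = ImageMetadata(
    acquisition={"creator": "A. N. Other", "purpose": "teaching"},
    media_object={"format": "TIFF", "image_type": "light micrograph"},
    denotation={"subject": "mouse brain section", "age": "adult"},
)

print(record.missing_categories())
```

A check such as missing_categories() is one way a submission form might prompt for the categories that are hardest to collect.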
8. Why use a semantic web architecture?
• Traditional relational databases don’t meet our needs
• Image data is complex, layered, and difficult to model
• Images are searched primarily through their metadata
• Metadata is time-consuming and difficult to obtain
• Ontologies offer the promise of better retrieval accuracy through
linking to instances in an ontology, rather than attempting to
process free text
• Ontologies offer the promise of easy interoperability with other
systems
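The retrieval argument above can be sketched in a few lines. This is a toy comparison, assuming a hypothetical four-term hierarchy and three invented image records; it is not how BioImage is implemented. Images annotated with a term identifier are found by a query on any ancestor term, whereas a substring match on the caption finds only the wording that happens to be present.

```python
# Hypothetical mini-ontology: term id -> (label, parent id or None).
TERMS = {
    "T1": ("cell", None),
    "T2": ("stem cell", "T1"),
    "T3": ("haemopoietic stem cell", "T2"),
    "T4": ("neuron", "T1"),
}

# Invented image records, each linked to ontology terms and given a free-text caption.
IMAGES = [
    {"id": "img1", "caption": "HSC colony in culture", "terms": {"T3"}},
    {"id": "img2", "caption": "Stem cells, day 3", "terms": {"T2"}},
    {"id": "img3", "caption": "Purkinje neuron", "terms": {"T4"}},
]


def descendants(term_id):
    """Return term_id together with every term below it in the hierarchy."""
    found = {term_id}
    changed = True
    while changed:
        changed = False
        for tid, (_, parent) in TERMS.items():
            if parent in found and tid not in found:
                found.add(tid)
                changed = True
    return found


def search_by_term(term_id):
    """Find images annotated with the term or any of its descendants."""
    wanted = descendants(term_id)
    return [img["id"] for img in IMAGES if img["terms"] & wanted]


def search_free_text(phrase):
    """Find images whose caption contains the phrase (case-insensitive)."""
    return [img["id"] for img in IMAGES if phrase.lower() in img["caption"].lower()]


print(search_by_term("T2"))
print(search_free_text("stem cell"))
```

In this toy data the term query returns both stem cell images, while the text query misses the one whose caption uses the abbreviation "HSC".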
10. Lessons learned:
Performance, scalability …
• Retrieval is slower than it would be from a traditional relational
database
• Scalability remains to be tested (true for all semantic web
software)
• Query languages (RDQL) are immature when compared to SQL
• Parsing RDF is hard and slow (RDF-ABBREV output of the
Jena parser is unreliable and the unstriped format requires
multiple passes to create XML that can easily be transformed to
HTML)
11. A problem with ontologies?
• The volume of data generated in the Life Sciences is now
estimated to be doubling every month
• Already people look less and less at raw scientific data
(unless the data are their own results)
• As this volume of data accumulates, few if any of us will have
the time or the mental capacity to assimilate new data, structure
them in a meaningful way and extract information, without first
processing the data through an ontology or some other similar
machine-based organisational aid
• THE ONTOLOGY WILL BE WRONG! (or we should all pack up
and go home)
12. Paradigm shifts
• Our human understanding of an area of science is never static,
but is constantly being revised by new research
• Such revisions in understanding are either evolutionary
(incremental), following the progressive discovery of more and
more detail, interpreted according to the prevailing paradigm, or
revolutionary, when the prevailing paradigm is overthrown by
another
• How do paradigm revolutions succeed?
"A new scientific truth does not triumph by convincing its
opponents and making them see the light, but rather
because its opponents eventually die, and a new generation
grows up that is familiar with it"
(Max Planck, 1949)
13. Factors preventing evolution
• Ontology builders are ‘monks’ (and nuns), led by an ‘abbot’: a
relatively senior domain expert likely to be committed to encapsulating
the dominant paradigm
• Substantial problems confront any newcomers wishing to contribute,
since ontology building is time-consuming and expensive
• Since an ontology expresses the community consensus, there will be
massive social pressures against change
• If large volumes of data have already been encoded using an existing
ontology, this will make it difficult to introduce change
• The first ontology in a domain may assume a monopolistic position that
becomes unassailable, even if it has universally acknowledged
weaknesses
• Ontologies are unlikely to evolve in response to the same market
forces that drive the development of applications software
14. Encapsulating the dominant paradigm
• Imagine a section of an ontology describing the development of adult
mammalian bone marrow and brain, constructed according to the pre-1980
dominant paradigm that bone marrow develops from mesoderm, while
brain develops from ectoderm
15. An example of paradigm evolution
• Subsequently, adult mouse brain was found to contain haemopoietic stem cells
• Bartlett (1982) hypothesised that these cells developed from foetal haemopoietic cells that
entered the brain tissue before the blood-brain barrier was established
• This challenge to the dominant paradigm that brain tissues are derived exclusively from
ectoderm can be accommodated by extending the graph
16. An example of paradigm revolution
• More recently, Brazelton et al. (2000) claimed that haemopoietic stem cells from adult
bone marrow can develop into neural cells in adult mouse brain
• If true, this result overthrows the paradigm that neuronal cells can only develop from
embryonic ectoderm, requiring a new ontology incompatible with the old
• This new ontology is no longer an extension of the previous one, since neural cells no
longer develop only from foetal neuroepithelium
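The difference between paradigm evolution and paradigm revolution can be made concrete with a small sketch. It assumes a deliberately simplified model that is not part of the talk: an ontology is a set of "develops from" edges plus a set of terms whose listed origins are claimed to be exhaustive. All term names are hypothetical labels for the tissues and cells mentioned on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    edges: frozenset   # (child, parent) pairs meaning "child develops from parent"
    closed: frozenset  # terms whose listed origins are claimed to be exhaustive

    def origins(self, term):
        """Return every parent recorded for the given term."""
        return {parent for (child, parent) in self.edges if child == term}


def is_extension(old, new):
    """True if new keeps every old edge and respects every old 'only' claim."""
    if not old.edges <= new.edges:
        return False
    return all(old.origins(term) == new.origins(term) for term in old.closed)


# Pre-1980 paradigm (simplified): neural cells develop only from neuroepithelium.
pre_1980 = Ontology(
    edges=frozenset({
        ("bone marrow", "mesoderm"),
        ("brain", "ectoderm"),
        ("neuroepithelium", "ectoderm"),
        ("neural cell", "neuroepithelium"),
    }),
    closed=frozenset({"neural cell"}),
)

# Evolution: haemopoietic stem cells in brain are added as new nodes and edges.
evolved = Ontology(
    edges=pre_1980.edges | {
        ("brain haemopoietic stem cell", "foetal haemopoietic cell"),
        ("foetal haemopoietic cell", "mesoderm"),
    },
    closed=pre_1980.closed,
)

# Revolution: neural cells gain a second origin, contradicting the 'only' claim.
revolutionary = Ontology(
    edges=pre_1980.edges | {("neural cell", "bone marrow haemopoietic stem cell")},
    closed=frozenset(),
)

print(is_extension(pre_1980, evolved))
print(is_extension(pre_1980, revolutionary))
```

Under this model the Bartlett result only adds to the graph, whereas the Brazelton result changes the origins of a term the old ontology treated as closed, so the new ontology is not an extension of the old one.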
17. A way forward – using Named Graphs in
RDF (and OWL?)
• In response to considerable frustration and confusion within the RDF
community about the best method of reifying RDF statements, Jeremy
Carroll et al. proposed an extension to RDF
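The core idea of Named Graphs is that a set of RDF statements is given a name, so that further statements can be made about the set as a whole. The following plain-Python sketch illustrates that idea with four-part tuples; it does not use any RDF library, and the graph names, the "ex:" prefix, the "ex:status" property and the helper functions are all hypothetical.

```python
# Each statement is (graph name, subject, predicate, object). All names are hypothetical.
QUADS = [
    ("ex:pre1980", "ex:NeuralCell", "ex:developsFrom", "ex:Neuroepithelium"),
    ("ex:brazelton2000", "ex:NeuralCell", "ex:developsFrom", "ex:BoneMarrowStemCell"),
    # Statements about the graphs themselves live in a separate provenance graph.
    ("ex:provenance", "ex:pre1980", "ex:status", "ex:DominantParadigm"),
    ("ex:provenance", "ex:brazelton2000", "ex:status", "ex:Contested"),
]


def triples_in(graph_name):
    """Return the (subject, predicate, object) triples held in one named graph."""
    return {(s, p, o) for (g, s, p, o) in QUADS if g == graph_name}


def graphs_with_status(status):
    """Return the names of graphs that the provenance graph marks with a status."""
    return {
        s for (g, s, p, o) in QUADS
        if g == "ex:provenance" and p == "ex:status" and o == status
    }


def origins(term, accepted_graphs):
    """Return what a term develops from, using only the graphs the user accepts."""
    return {
        o for (g, s, p, o) in QUADS
        if g in accepted_graphs and s == term and p == "ex:developsFrom"
    }


dominant = graphs_with_status("ex:DominantParadigm")
print(origins("ex:NeuralCell", dominant))
print(origins("ex:NeuralCell", dominant | graphs_with_status("ex:Contested")))
```

Keeping competing claims in separately named graphs lets a user choose which paradigm to query against, rather than forcing one ontology to replace the other.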
18. Thanks and acknowledgements
• David Shotton and Simon Sparks
for BioImage developments
(http://www.bioimage.org)
• John Pybus, our computer systems
manager, for keeping us running in
spite of the problems
• Liz Mellings for unbounded
patience inputting data and testing
• The European Commission for
funding the BioImage Project (EC
IST 5th Framework Contract 2001-
32688: ORIEL – Online Research
Information Environment for the Life
Sciences; http://www.oriel.org)