Knowledge Management in a Knowledge Based Discipline

Robert Stevens
BioHealth Informatics Group
University of Manchester
Robert.Stevens@manchester.ac.uk

Introduction
• How do we do (molecular) biology?
• Managing stamp albums
• A knowledge based discipline
• Representing knowledge computationally
• Ontologies that define what entities are in the
domain
• Describing biological knowledge ontologically
• Using ontologies, and is it enough?

Ernest Rutherford
“All science is either physics or stamp collecting”
Image: http://en.wikipedia.org/wiki/File:Ernest_Rutherford2.jpg

Laws in Biology
Charles Darwin
Image: http://en.wikipedia.org/wiki/File:Charles_Darwin_01.jpg
On the Origin of Species (1859)

Classic and Modern Biology
Figure: genotype and phenotype, as approached by classic biology and by modern biology

Central Dogma
Image: http://cellbio.utmb.edu/CELLBIO/DNA-RNA.jpg

Speed of sequencing
• First human genome
– 10+ years to produce
– Cost $500 million
– Huge international effort
• Now done in 10 weeks
– (for $399)
– http://tinyurl.com/genomecost
– http://www.23andme.com

1000+ databases
• according to Nucleic Acids Research

PubMed: 2 papers per minute
• ~700,000 individual papers
• Grows at 2 papers per minute (see http://blogs.bbsrc.ac.uk for details)

UniProt: a protein database?
ID   PRIO_HUMAN STANDARD; PRT; 253 AA.
AC   P04156;
DT   01-NOV-1986 (Rel. 03, Created)
DT   01-NOV-1986 (Rel. 03, Last sequence update)
DT   20-AUG-2001 (Rel. 40, Last annotation update)
DE   Major prion protein precursor (PrP) (PrP27-30) (PrP33-35C) (ASCR).
GN   PRNP.
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=86300093; PubMed=3755672;
RA   Kretzschmar H.A., Stowring L.E., Westaway D., Stubblebine W.H.,
RA   Prusiner S.B., Dearmond S.J.;
RT   "Molecular cloning of a human prion protein cDNA.";
RL   DNA 5:315-324(1986).
RN   [2]
RP   SEQUENCE OF 8-253 FROM N.A.
RX   MEDLINE=86261778; PubMed=3014653;
RA   Liao Y.-C.J., Lebo R.V., Clawson G.A., Smuckler E.A.;
RT   "Human prion protein cDNA: molecular cloning, chromosomal mapping,
RT   and biological implications.";
RL   Science 233:364-367(1986).
RN   [3]
RP   SEQUENCE OF 58-85 AND 111-150 (VARIANT AMYLOID GSS).
RX   MEDLINE=91160504; PubMed=1672107;
RA   Tagliavini F., Prelli F., Ghiso J., Bugiani O., Serban D.,
RA   Prusiner S.B., Farlow M.R., Ghetti B., Frangione B.;
RT   "Amyloid protein of Gerstmann-Straussler-Scheinker disease (Indiana
RT   kindred) is an 11 kd fragment of prion protein with an N-terminal
RT   glycine at codon 58.";
RL   EMBO J. 10:513-519(1991).
RN   [4]
RP   STRUCTURE BY NMR OF 118-221.
RX   MEDLINE=20359708; PubMed=10900000;
RA   Calzolai L., Lysek D.A., Guntert P., von Schroetter C., Riek R.,
RA   Zahn R., Wuethrich K.;
RT   "NMR structures of three single-residue variants of the human prion
RT   protein.";
RL   Proc. Natl. Acad. Sci. U.S.A. 97:8340-8345(2000).
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE
CC       HOST GENOME AND IS EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC   -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED
CC       "RODS".
CC   -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC   -!- POLYMORPHISM: THE FIVE TANDEM OCTAPEPTIDE REPEATS REGION IS HIGHLY
CC       UNSTABLE. INSERTIONS OR DELETIONS OF OCTAPEPTIDE REPEAT UNITS ARE
CC       ASSOCIATED TO PRION DISEASE.
CC   -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE
CC       BRAIN OF HUMANS AND ANIMALS INFECTED
CC       WITH NEURODEGENERATIVE DISEASES KNOWN AS
CC       TRANSMISSIBLE SPONGIFORM ENCEPHALOPATHIES OR PRION
CC       DISEASES, LIKE: CREUTZFELDT-JAKOB DISEASE (CJD),
CC       GERSTMANN-STRAUSSLER SYNDROME (GSS), FATAL
CC       FAMILIAL INSOMNIA (FFI) AND KURU IN HUMANS;
CC       SCRAPIE IN SHEEP AND GOAT; BOVINE SPONGIFORM
CC       ENCEPHALOPATHY (BSE) IN CATTLE; TRANSMISSIBLE
CC       MINK ENCEPHALOPATHY (TME); CHRONIC WASTING
CC       DISEASE (CWD) OF MULE DEER AND ELK; FELINE
CC       SPONGIFORM ENCEPHALOPATHY (FSE) IN CATS AND
CC       EXOTIC UNGULATE ENCEPHALOPATHY (EUE) IN
CC       NYALA AND GREATER KUDU. THE PRION DISEASES
CC       ILLUSTRATE THREE MANIFESTATIONS OF CNS
CC       DEGENERATION: (1) INFECTIOUS (2)
CC       SPORADIC AND (3) DOMINANTLY INHERITED FORMS.
CC       TME, CWD, BSE, FSE, EUE ARE ALL THOUGHT TO
CC       OCCUR AFTER CONSUMPTION OF PRION-INFECTED
CC       FOODSTUFFS.
DR   EMBL; M13667; AAA19664.1; -.
DR   EMBL; M13899; AAA60182.1; -.
DR   EMBL; D00015; BAA00011.1; -.
DR   PIR; A05017; A05017.
DR   PIR; A24173; A24173.
DR   PIR; S14078; S14078.
DR   PDB; 1E1G; 20-JUL-00.
DR   PDB; 1E1J; 20-JUL-00.
DR   PDB; 1E1P; 20-JUL-00.
DR   PDB; 1E1S; 21-JUL-00.
DR   PDB; 1E1U; 20-JUL-00.
DR   PDB; 1E1W; 20-JUL-00.
DR   MIM; 176640; -.
DR   MIM; 123400; -.
DR   MIM; 137440; -.
DR   MIM; 245300; -.
DR   MIM; 600072; -.
DR   MIM; 604920; -.
DR   InterPro; IPR000817; Prion.
DR   Pfam; PF00377; prion; 1.
DR   PRINTS; PR00341; PRION.
DR   SMART; SM00157; PRP; 1.
DR   PROSITE; PS00291; PRION_1; 1.
DR   PROSITE; PS00706; PRION_2; 1.
KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal;
KW   3D-structure; Polymorphism; Disease mutation.
FT   SIGNAL 1 22
FT   CHAIN 23 230 MAJOR PRION PROTEIN.
FT   PROPEP 231 253 REMOVED IN MATURE FORM (BY SIMILARITY).
FT   LIPID 230 230 GPI-ANCHOR (BY SIMILARITY).
FT   CARBOHYD 181 181 N-LINKED (GLCNAC...) (PROBABLE).
FT   DISULFID 179 214 BY SIMILARITY.
FT   DOMAIN 51 91 5 X 8 AA TANDEM REPEATS OF P-H-G-G-G-W-G-
FT        Q.
FT   REPEAT 51 59 1.
FT        IN PATIENTS WHO HAVE A PRP MUTATION AT
FT        CODON 178: PATIENTS WITH MET DEVELOP FFI,
FT        THOSE WITH VAL DEVELOP CJD).
FT        /FTId=VAR_006467.
FT   VARIANT 171 171 N -> S (IN SCHIZOAFFECTIVE DISORDER).
FT   VARIANT 178 178 D -> N (IN FFI AND CJD).
FT   VARIANT 180 180 V -> I (IN CJD).
FT   VARIANT 183 183 T -> A (IN FAMILIAL SPONGIFORM
FT        ENCEPHALOPATHY).
FT   VARIANT 187 187 H -> R (IN GSS).
FT   VARIANT 188 188 T -> K (IN EOAD; DEMENTIA ASSOCIATED TO
FT        PRION DISEASES).
FT   VARIANT 188 188 T -> R.
FT   VARIANT 196 196 E -> K (IN CJD).
SQ   SEQUENCE 253 AA; 27661 MW; 43DB596BAAA66484 CRC64;
     MANLGCWMLV LFVATWSDLG LCKKRPKPGG WNTGGSRYPG QGSPGGNRYP PQGGGGWGQP
     HGGGWGQPHG GGWGQPHGGG WGQPHGGGWG QGGGTHSQWN KPSKPKTNMK HMAGAAAAGA
     VVGGLGGYML GSAMSRPIIH FGSDYEDRYY RENMHRYPNQ VYYRPMDEYS NQNNFVHDCV
     NITIKQHTVT TTTKGENFTE TDVKMMERVV EQMCITQYER ESQAYYQRGS SMVLFSSPPV
     ILLISFLIFL IVG
//
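In an entry like this, the meaning is carried by two-letter line codes. The following is a minimal sketch that assumes the classic Swiss-Prot flat-file layout:

- each line starts with a two-letter code;
- sequence lines are indented;
- `//` ends the entry.

Under those assumptions, a few lines of Python can group the lines by code and recover the sequence. The function `parse_flat_entry` is a hypothetical helper and ignores the internal structure of each line type. For real files, a maintained parser such as Biopython's `Bio.SwissProt` module is the safer choice.

```python
from collections import defaultdict


def parse_flat_entry(text):
    """Group the lines of one Swiss-Prot-style entry by their two-letter code.

    Assumes: data lines start with a two-letter code, sequence lines start
    with whitespace, and '//' terminates the entry.
    Returns (fields, sequence), where fields maps code -> list of payloads.
    """
    fields = defaultdict(list)
    sequence_parts = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("//"):
            break  # end of entry
        if line[0].isspace():
            # Sequence data: residues in blocks of ten separated by spaces.
            sequence_parts.append("".join(line.split()))
            continue
        code, payload = line[:2], line[2:].strip()
        fields[code].append(payload)
    return dict(fields), "".join(sequence_parts)


# An abbreviated copy of the entry above (only the first sequence line).
entry = """\
ID   PRIO_HUMAN     STANDARD;      PRT;   253 AA.
AC   P04156;
GN   PRNP.
OS   Homo sapiens (Human).
DR   InterPro; IPR000817; Prion.
DR   Pfam; PF00377; prion; 1.
KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal;
SQ   SEQUENCE   253 AA;  27661 MW;  43DB596BAAA66484 CRC64;
     MANLGCWMLV LFVATWSDLG LCKKRPKPGG WNTGGSRYPG QGSPGGNRYP PQGGGGWGQP
//
"""

fields, seq = parse_flat_entry(entry)
print(fields["AC"])       # ['P04156;']
print(len(fields["DR"]))  # 2
print(seq[:10])           # MANLGCWMLV
```

Even this simple grouping shows the point of the slide: a program only "understands" the entry to the extent that someone has written down what each code means.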

What is Knowledge?
• Knowledge – all information
and an understanding to
carry out tasks and to infer
new information
• Information -- data
equipped with meaning
• Data -- un-interpreted
signals that reach our
senses
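• Example (figure):
– Data: Michael Ashburner; Professor; University of Cambridge; UK
– Information: the same values labelled Name, Job, Institution, Country
– Knowledge: man; academic, senior; ancient university, 5 rated; European; important figure in biology
– Other figure labels: ISMB Conf; BIOLOGY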
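The data to information to knowledge progression on this slide can be mimicked in a few lines:

- the raw strings are data;
- pairing them with field labels yields information;
- a few rules then infer new statements, which is the knowledge.

This is a sketch only. The rule sets below (`SENIOR_ACADEMIC_TITLES`, `ANCIENT_UNIVERSITIES`, `EUROPEAN_COUNTRIES`) are hypothetical, chosen to reproduce some of the slide's inferences. They are not a real knowledge base.

```python
# Data: uninterpreted values, as they might arrive in a record.
data = ["Michael Ashburner", "Professor", "University of Cambridge", "UK"]

# Information: the same values equipped with meaning via field labels.
labels = ["Name", "Job", "Institution", "Country"]
information = dict(zip(labels, data))

# Knowledge: hypothetical background facts used to infer new statements.
SENIOR_ACADEMIC_TITLES = {"Professor"}                               # assumption
ANCIENT_UNIVERSITIES = {"University of Cambridge", "University of Oxford"}  # assumption
EUROPEAN_COUNTRIES = {"UK", "France", "Germany"}                     # assumption, incomplete


def infer(info):
    """Return statements that follow from the labelled record plus the rules."""
    facts = []
    if info.get("Job") in SENIOR_ACADEMIC_TITLES:
        facts.append("academic, senior")
    if info.get("Institution") in ANCIENT_UNIVERSITIES:
        facts.append("ancient university")
    if info.get("Country") in EUROPEAN_COUNTRIES:
        facts.append("European")
    return facts


print(information["Job"])  # Professor
print(infer(information))  # ['academic, senior', 'ancient university', 'European']
```

Without the labels, the rules have nothing to attach to. Without the rules, the labelled record yields no new statements.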

A Knowledge Based Discipline
• Rather than laws captured in mathematics….
• We have lots of facts: the discipline’s knowledge
• Rather than “calculating” what a protein does, we
investigate and write it down
• Equivalent to writing down the trajectories of all
thrown objects and not doing ballistics!
• To do biology one needs “the knowledge”

Heterogeneity
• 28 ways to format the representations of a biological
sequence
• …though only one way to represent the bases or
amino acids themselves
• Different words same concept
• Different concepts same words
• Different and implicit data schema
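To make the first two points concrete: the residues stay the same whatever the wrapper. The following is a minimal sketch that handles just two layouts:

- FASTA;
- a numbered-block layout in the style of a GenBank `ORIGIN` section.

Both are normalised to one upper-case residue string. The function names are hypothetical, and real format converters handle far more cases.

```python
def from_fasta(text):
    """Residues from a single-record FASTA string ('>' header, then sequence lines)."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA record must start with '>'")
    return "".join("".join(line.split()) for line in lines[1:]).upper()


def from_numbered_blocks(text):
    """Residues from ORIGIN-style lines: a position number, then blocks of residues."""
    residues = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] in ("ORIGIN", "//"):
            continue  # skip blank lines and the section delimiters
        # Drop the leading position number; keep the residue blocks.
        residues.extend(parts[1:] if parts[0].isdigit() else parts)
    return "".join(residues).upper()


# Two wrappers around the first 30 residues of PRIO_HUMAN.
fasta = ">sp|P04156|PRIO_HUMAN fragment\nMANLGCWMLVLFVATWSDLG\nLCKKRPKPGG\n"
blocks = "ORIGIN\n        1 manlgcwmlv lfvatwsdlg lckkrpkpgg\n//\n"

print(from_fasta(fasta) == from_numbered_blocks(blocks))  # True
```

The hard part is not the residues but everything around them: headers, annotations and implicit schemas, which no such one-line normalisation can reconcile.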

Categories and Category Labels
GO:0000368
U2-type nuclear mRNA 5' splice site recognition
spliceosomal E complex formation
spliceosomal E complex biosynthesis
spliceosomal CC complex formation
U2-type nuclear mRNA 5'-splice site recognition
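One identifier, several labels: an index keyed on normalised labels lets every variant resolve to the same category. This is a minimal sketch:

- the mapping simply restates the labels on this slide;
- the normalisation rule (lower-casing, treating hyphens as spaces) is an assumption, not how GO tools actually match synonyms.

```python
import re

GO_ID = "GO:0000368"
LABELS = [
    "U2-type nuclear mRNA 5' splice site recognition",
    "spliceosomal E complex formation",
    "spliceosomal E complex biosynthesis",
    "spliceosomal CC complex formation",
    "U2-type nuclear mRNA 5'-splice site recognition",
]


def normalise(label):
    """Lower-case, treat hyphens as spaces, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", label.lower().replace("-", " ")).strip()


# Build the label -> identifier index once.
index = {normalise(label): GO_ID for label in LABELS}


def lookup(label):
    """Return the identifier for a known label, or None otherwise."""
    return index.get(normalise(label))


print(len(index))                                  # 4
print(lookup("Spliceosomal E complex formation"))  # GO:0000368
print(lookup("spliceosome assembly"))              # None
```

The two "U2-type" labels collapse to one key, so five labels give four index entries. The identifier, not any one label, is what different databases can agree on.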

An Identity Crisis
• Database entries have identifiers unique within their
database
• The type of entity described in an entry doesn’t have
an identifier
• Different entries about the same type talk about it
differently
• How do we know when an entry in one DB talks
about the same thing as another entry in another
DB?
• That’s the skill of a bioinformatician
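One common heuristic is to treat two entries as candidates for describing the same thing when they share a cross-reference to a third resource. This sketch makes the following assumptions:

- each entry exposes its cross-references as `(database, accession)` pairs;
- the UniProt cross-references come from the DR lines of the PRIO_HUMAN entry;
- the second entry and its database name are hypothetical.

A shared cross-reference is evidence, not proof.

```python
def shared_xrefs(entry_a, entry_b):
    """Return the (database, accession) pairs that two entries both cite."""
    return set(entry_a["xrefs"]) & set(entry_b["xrefs"])


# Cross-references taken from the PRIO_HUMAN entry's DR lines.
uniprot_entry = {
    "db": "UniProt",
    "id": "P04156",
    "xrefs": [("EMBL", "M13667"), ("MIM", "176640"), ("Pfam", "PF00377")],
}

# A hypothetical entry in another database.
other_entry = {
    "db": "ExampleGeneDB",  # hypothetical database name
    "id": "gene-0001",      # hypothetical identifier
    "xrefs": [("MIM", "176640"), ("EMBL", "M13667")],
}

overlap = shared_xrefs(uniprot_entry, other_entry)
print(sorted(overlap))  # [('EMBL', 'M13667'), ('MIM', '176640')]
```

The overlap cannot say whether the two entries describe the same type of thing or, say, a gene and its protein product. That judgement is still the bioinformatician's.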

Why: Society of Biologists
• Particle physics necessarily has central
organisation
• One central place to generate data
• A communitarian attitude
• It is still possible to do biology in the “garden shed”
• Historically, less need to organise
• Hence…

Navigating the Web of Knowledge in Bioinformatics

Biology is Special
• Large quantities of data: no, that doesn’t make it special
• Complex data: yes, that does
• Volatile data: the types of data and what is recorded
change rapidly
• Nothing that special about biology
• …except that it has all the problems, often to a
large degree

Lots of catalogues
Genome
Proteome
Transcriptome
Interactome
Metabolome
PHENOME

Creating Woods, not Trees
Figure: Genes, Proteins, Pathways, Interactions, Literature, Complex machines, Virtual organism
…from biological facts, we make a system that is a model of a real organism

Networks of Chemicals
Image: http://genome-www.stanford.edu/rap_sir/images/Web_FigF_RAP1_glycolysis.gif

Systems within Systems
Image: http://www.ehponline.org/members/2007/10373/fig1.jpg

A Biologist’s Skills
• By the time a biologist has finished a Ph.D. he/she is
about ready for action
• They have a comprehensive knowledge of the facts
of a (narrow) domain
• He/she also knows how to do experimentation in that
domain
• There are so many facts, it is difficult to move outside
one’s sub-discipline
• Yet in a systems view such movement is mandatory

The Role of Knowledge
• A lot of facts
• Perhaps organised into a system
• No equivalent of “laws of mechanics” – we
can’t do this kind of biology with mathematics
• Or at least not without knowing what the
numbers mean...
• This is why we’ve been using ontologies!

What is an Ontology?
• A description of that which exists (in our data)
• What it means to be a member of a category
• What categories of things exist and how do I
recognise that a particular object is a member of a
given category
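The last point, recognising membership, is what separates an ontology from a list of names. This is a minimal sketch under these assumptions:

- each category is defined by the property values its members must have (a simplified, description-logic-flavoured view);
- the category names and property values are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A category defined by property values its members must have (assumed model)."""
    name: str
    conditions: dict = field(default_factory=dict)


CATEGORIES = [
    Category("Protein", {"molecule_type": "polypeptide"}),
    Category("Enzyme", {"molecule_type": "polypeptide", "catalytic": True}),
    Category("Membrane protein", {"molecule_type": "polypeptide", "location": "membrane"}),
]


def is_member(obj, category):
    """True if the object satisfies every condition in the category's definition."""
    return all(obj.get(prop) == value for prop, value in category.conditions.items())


def classify(obj):
    """Names of all categories whose definitions the object satisfies."""
    return [c.name for c in CATEGORIES if is_member(obj, c)]


# Hypothetical property values; the entry above says PrP is attached to the membrane.
prion_protein = {"molecule_type": "polypeptide", "location": "membrane", "catalytic": False}
print(classify(prion_protein))  # ['Protein', 'Membrane protein']
```

Membership here follows from the definitions rather than from someone attaching a label by hand, which is the sense in which an ontology says "what it means to be a member of a category".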

Uses of Ontology in Bioinformatics

Why develop an ontology?
• To make domain assumptions explicit
– Easier to change domain assumptions
– Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge
– Re-use domain and operational knowledge
separately
• A community reference for applications
• To share a consistent understanding of what information means.

History of Bio-ontologies
Figure: timeline (1992, 1996, 1998, 2002, 2005, 2006) marking TAMBIS, MGED, the 1st Bio-ontologies meeting and the start of the Gene Ontology

Controlled Vocabulary
• An Ontology isn’t a controlled vocabulary, but can be
used to deliver one
• By agreeing upon the categories in a domain and
agreeing upon their labels we are controlling
vocabulary
• Addresses one major problem in biology: inconsistent naming
• Also forces examination of definitions
• Makes domain assumptions explicit

Transferring Characteristics
Figure: where an uncharacterised protein has high similarity to characterised proteins, transfer their characteristics (figure labels: Tra1, La2, La3)
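The figure describes homology-based annotation transfer. This is a minimal sketch under these assumptions:

- the sequences are already aligned to equal length (real pipelines use BLAST or another aligner);
- the 40% identity cut-off is arbitrary and hypothetical;
- the fragments and annotation labels are illustrative.

```python
def percent_identity(a, b):
    """Percentage of aligned positions with identical residues (gaps '-' never match)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def transfer_annotations(query, characterised, threshold=40.0):
    """Collect annotations from characterised proteins at or above the identity threshold."""
    transferred = set()
    for seq, annotations in characterised:
        if percent_identity(query, seq) >= threshold:
            transferred.update(annotations)
    return transferred


# Hypothetical aligned fragments and annotation labels, for illustration only.
characterised = [
    ("MANLGCWMLV", {"GPI-anchor", "Glycoprotein"}),
    ("MKTAYIAKQR", {"Kinase"}),
]
uncharacterised = "MANLGCWLLV"

print(sorted(transfer_annotations(uncharacterised, characterised)))
# ['GPI-anchor', 'Glycoprotein']
```

The transfer is only as good as the source annotations: if those are free text in different vocabularies, the transferred "characteristics" inherit the same ambiguity.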

Post-Genomic Biology
• Fly, mouse, yeast, worm all have their own
terminologies
• I want to compare genomes
• How?
• The genomic sequence is easily dealt with
computationally and comparisons are easy
• This is not true of the annotations or knowledge of
those sequences
• Need a common understanding

Annotation of Data
• Big effort to create controlled vocabularies using
ontologies
• A huge annotation effort – describe the entities in DBs
with terms from ontologies
• The Gene Ontology (http://www.geneontology.org)
• The Open Biomedical Ontologies Consortium
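GO and the other OBO ontologies are distributed as OBO flat files, where each term is a `[Term]` stanza of `tag: value` lines. The sketch below keeps only four tags:

- `id` and `name`;
- `synonym`, reduced to its quoted text;
- `is_a`, reduced to the parent identifier.

Everything else is ignored. The example stanzas are abbreviated: the synonym scopes and the `EX:` terms are illustrative, not copied from GO. Consult the OBO format specification and a maintained library for real files.

```python
import re


def parse_obo_terms(text):
    """Parse [Term] stanzas from OBO-format text into dicts.

    Keeps only id, name, synonym (the quoted text) and is_a (the parent id);
    all other tags and stanza types are ignored.
    """
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"synonyms": [], "is_a": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag in ("id", "name"):
            current[tag] = value
        elif tag == "synonym":
            match = re.match(r'"((?:[^"\\]|\\.)*)"', value)
            if match:
                current["synonyms"].append(match.group(1))
        elif tag == "is_a":
            # Drop any trailing "! comment" and keep the parent identifier.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


# Abbreviated stanzas; synonym scopes and the EX: terms are illustrative.
example = """\
format-version: 1.2

[Term]
id: GO:0000368
name: U2-type nuclear mRNA 5'-splice site recognition
synonym: "spliceosomal E complex formation" EXACT []
synonym: "spliceosomal CC complex formation" RELATED []

[Term]
id: EX:0000002
name: hypothetical child term
is_a: EX:0000001 ! hypothetical parent term

[Typedef]
id: part_of
"""

terms = parse_obo_terms(example)
print(len(terms))               # 2
print(terms[0]["synonyms"][0])  # spliceosomal E complex formation
print(terms[1]["is_a"])         # ['EX:0000001']
```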

Figure: ontologies spanning genotype to phenotype
Categories shown: Sequence; Proteins; Gene products; Transcript; Pathways; Cell type; BRENDA tissue / enzyme source; Development; Anatomy; Phenotype; Plasmodium life cycle
- Sequence types and features
- Genetic Context
- Molecule role
- Molecular Function
- Biological process
- Cellular component
- Protein covalent bond
- Protein domain
- UniProt taxonomy
- Pathway ontology
- Event (INOH pathway ontology)
- Systems Biology
- Protein-protein interaction
- Arabidopsis development
- Cereal plant development
- Plant growth and developmental stage
- C. elegans development
- Drosophila development (FBdv)
- Human developmental anatomy, abstract version
- Human developmental anatomy, timed version
- Mosquito gross anatomy
- Mouse adult gross anatomy
- Mouse gross anatomy and development
- C. elegans gross anatomy
- Arabidopsis gross anatomy
- Cereal plant gross anatomy
- Drosophila gross anatomy
- Dictyostelium discoideum anatomy
- Fungal gross anatomy (FAO)
- Plant structure
- Maize gross anatomy
- Medaka fish anatomy and development
- Zebrafish anatomy and development
- NCI Thesaurus
- Mouse pathology
- Human disease
- Cereal plant trait
- PATO (attribute and value)
- Mammalian phenotype
- Habronattus courtship
- Loggerhead nesting
- Animal natural history and life history
- eVOC (Expressed Sequence Annotation for Humans)

The Sequence Ontology (http://obo.sf.net)

GO in Analysis
• Microarray analysis one of the original visions for GO
• Clusters of co-modulated genes tend to group around
the functional attributes of their proteins
• GO is also used in, for example, semantic similarity
and text analysis
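A common way to ask whether a cluster of modulated genes "clusters about" a GO term is an over-representation test. This is a minimal sketch under these assumptions:

- it uses a one-sided hypergeometric test;
- the counts are made up;
- it tests a single term, whereas real analyses test many terms and must correct for multiple testing.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: genes measured; annotated: of those, genes carrying the GO term;
    selected: genes in the cluster; hits: cluster genes carrying the term.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    return sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    ) / total


# Hypothetical numbers: 10,000 genes, 200 annotated with the term,
# and a cluster of 50 genes of which 8 carry it (about 1 expected by chance).
p = enrichment_p_value(10_000, 200, 50, 8)
print(f"{p:.2e}")
```

A small p-value only says the term is over-represented relative to the chosen background. What that means biologically still depends on knowing what the term means.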

Fact Management
• When “stamp collecting” we’re collecting facts
• Biology is a fact management activity
• Knowing what these facts mean is very important
• Science is performed on data, and the semantics of
data enable us to do science
• Semantic e-Science

Summary
• The nature of modern biology gives it interesting
knowledge (fact) management issues
• It is a knowledge based discipline
• Not unique, but often extreme
• Ontologies seen as one component in management
(but not a panacea)

Acknowledgements
• All these people provided slides and input:
• Duncan Hull
• Simon Jupp
• Phil Lord
• Carole Goble

Genotype to Pathway
Created by Paul Fisher

Pathway to Phenotype
Created by Paul Fisher

Ontology Space
Figure: axes labelled (Axiomatic) richness, Usage and Representation

Metadata toilet
• Everyone wants to use good metadata but few people want to
spend time curating and cleaning metadata
– Like a clean toilet

Biologists Wake up to Standards
