The Changing Nature of Biomedical Research: Semantic e-Science

Robert Stevens
BioHealth Informatics Group
University of Manchester
Robert.Stevens@manchester.ac.uk

Introduction
• (Modern bio-molecular) Science
• E-Science
• Semantics and science
• Semantic e-Science

Ernest Rutherford
“All science is either physics or stamp collecting”
Image: http://en.wikipedia.org/wiki/File:Ernest_Rutherford2.jpg

Laws in Biology
Charles Darwin
Image: http://en.wikipedia.org/wiki/File:Charles_Darwin_01.jpg
On The Origin of Species - 1859

Central Dogma
Image: http://cellbio.utmb.edu/CELLBIO/DNA-RNA.jpg

Classic and Modern Biology
[Figure: genotype and phenotype, contrasting classic biology with modern biology]

Speed of sequencing
• First human genome
– 10+ years to produce
– Cost $500 million
– Huge international effort
• Now done in 10 weeks
– (for $399)
– http://tinyurl.com/genomecost
– http://www.23andme.com

1000+ databases
• according to Nucleic Acids Research

PubMed: 2 papers per minute
• ~700,000 individual papers
• Grows at 2 papers per minute
(see http://blogs.bbsrc.ac.uk for details)
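To give a feel for that growth rate, the arithmetic below turns the quoted 2 papers per minute into a yearly figure. It is a rough sketch that assumes a constant rate and a 365-day year.

```python
# Rough conversion of the quoted PubMed growth rate into papers per year.
# Assumes a constant rate and a 365-day year (leap days ignored).

MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes


def papers_per_year(rate_per_minute: float) -> int:
    """Approximate yearly total for a constant per-minute publication rate."""
    return round(rate_per_minute * MINUTES_PER_YEAR)


if __name__ == "__main__":
    print(f"{papers_per_year(2):,} papers per year")  # 1,051,200 papers per year
```

At that rate the literature grows by roughly a million papers a year.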

Lots of catalogues
Genome
Proteome
Transcriptome
Interactome
Metabolome
PHENOME

Creating Woods, not Trees
[Figure: genes, proteins, pathways, interactions, literature, complex machines, virtual organism]
… from biological facts, we build a system that is a model of a real organism

Networks of Chemicals
Image: http://genome-www.stanford.edu/rap_sir/images/Web_FigF_RAP1_glycolysis.gif

Systems within Systems
Image: http://www.ehponline.org/members/2007/10373/fig1.jpg

UniProt: A protein database?
ID PRIO_HUMAN STANDARD; PRT; 253 AA.
AC P04156;
DT 01-NOV-1986 (Rel. 03, Created)
DT 01-NOV-1986 (Rel. 03, Last sequence update)
DT 20-AUG-2001 (Rel. 40, Last annotation update)
DE Major prion protein precursor (PrP) (PrP27-30) (PrP33-35C) (ASCR).
GN PRNP.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE=86300093; PubMed=3755672;
RA Kretzschmar H.A., Stowring L.E., Westaway D., Stubblebine W.H.,
RA Prusiner S.B., Dearmond S.J.;
RT "Molecular cloning of a human prion protein cDNA.";
RL DNA 5:315-324(1986).
RN [2]
RP SEQUENCE OF 8-253 FROM N.A.
RX MEDLINE=86261778; PubMed=3014653;
RA Liao Y.-C.J., Lebo R.V., Clawson G.A., Smuckler E.A.;
RT "Human prion protein cDNA: molecular cloning, chromosomal mapping,
RT and biological implications.";
RL Science 233:364-367(1986).
RN [3]
RP SEQUENCE OF 58-85 AND 111-150 (VARIANT AMYLOID GSS).
RX MEDLINE=91160504; PubMed=1672107;
RA Tagliavini F., Prelli F., Ghiso J., Bugiani O., Serban D.,
RA Prusiner S.B., Farlow M.R., Ghetti B., Frangione B.;
RT "Amyloid protein of Gerstmann-Straussler-Scheinker disease (Indiana
RT kindred) is an 11 kd fragment of prion protein with an N-terminal
RT glycine at codon 58.";
RL EMBO J. 10:513-519(1991).
RN [4]
RP STRUCTURE BY NMR OF 118-221.
RX MEDLINE=20359708; PubMed=10900000;
RA Calzolai L., Lysek D.A., Guntert P., von Schroetter C., Riek R.,
RA Zahn R., Wuethrich K.;
RT "NMR structures of three single-residue variants of the human prion
RT protein.";
RL Proc. Natl. Acad. Sci. U.S.A. 97:8340-8345(2000).
CC -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE
CC HOST GENOME AND IS EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED
CC "RODS".
CC -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC -!- POLYMORPHISM: THE FIVE TANDEM OCTAPEPTIDE REPEATS REGION IS HIGHLY
CC UNSTABLE. INSERTIONS OR DELETIONS OF OCTAPEPTIDE REPEAT UNITS ARE
CC ASSOCIATED TO PRION DISEASE.
FT SIGNAL 1 22
FT CHAIN 23 230 MAJOR PRION PROTEIN.
FT PROPEP 231 253 REMOVED IN MATURE FORM (BY SIMILARITY).
FT LIPID 230 230 GPI-ANCHOR (BY SIMILARITY).
FT CARBOHYD 181 181 N-LINKED (GLCNAC...) (PROBABLE).
FT DISULFID 179 214 BY SIMILARITY.
FT DOMAIN 51 91 5 X 8 AA TANDEM REPEATS OF P-H-G-G-G-W-G-
FT Q.
FT REPEAT 51 59 1.
FT IN PATIENTS WHO HAVE A PRP MUTATION AT
FT CODON 178: PATIENTS WITH MET DEVELOP FFI,
FT THOSE WITH VAL DEVELOP CJD).
FT /FTId=VAR_006467.
FT VARIANT 171 171 N -> S (IN SCHIZOAFFECTIVE DISORDER).
FT VARIANT 178 178 D -> N (IN FFI AND CJD).
FT VARIANT 180 180 V -> I (IN CJD).
FT VARIANT 183 183 T -> A (IN FAMILIAL SPONGIFORM
FT ENCEPHALOPATHY).
FT VARIANT 187 187 H -> R (IN GSS).
FT VARIANT 188 188 T -> K (IN EOAD; DEMENTIA ASSOCIATED TO
FT PRION DISEASES).
FT VARIANT 188 188 T -> R.
FT VARIANT 196 196 E -> K (IN CJD).
SQ SEQUENCE 253 AA; 27661 MW; 43DB596BAAA66484 CRC64;
MANLGCWMLV LFVATWSDLG LCKKRPKPGG WNTGGSRYPG QGSPGGNRYP PQGGGGWGQP
HGGGWGQPHG GGWGQPHGGG WGQPHGGGWG QGGGTHSQWN KPSKPKTNMK HMAGAAAAGA
VVGGLGGYML GSAMSRPIIH FGSDYEDRYY RENMHRYPNQ VYYRPMDEYS NQNNFVHDCV
NITIKQHTVT TTTKGENFTE TDVKMMERVV EQMCITQYER ESQAYYQRGS SMVLFSSPPV
ILLISFLIFL IVG
//
CC -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE
CC BRAIN OF HUMANS AND ANIMALS INFECTED
CC WITH NEURODEGENERATIVE DISEASES KNOWN AS
CC TRANSMISSIBLE SPONGIFORM ENCEPHALOPATHIES OR PRION
CC DISEASES, LIKE: CREUTZFELDT-JAKOB DISEASE (CJD),
CC GERSTMANN-STRAUSSLER SYNDROME (GSS), FATAL
CC FAMILIAL INSOMNIA (FFI) AND KURU IN HUMANS;
CC SCRAPIE IN SHEEP AND GOAT; BOVINE SPONGIFORM
CC ENCEPHALOPATHY (BSE) IN CATTLE; TRANSMISSIBLE
CC MINK ENCEPHALOPATHY (TME); CHRONIC WASTING
CC DISEASE (CWD) OF MULE DEER AND ELK; FELINE
CC SPONGIFORM ENCEPHALOPATHY (FSE) IN CATS AND
CC EXOTIC UNGULATE ENCEPHALOPATHY (EUE) IN
CC NYALA AND GREATER KUDU. THE PRION DISEASES
CC ILLUSTRATE THREE MANIFESTATIONS OF CNS
CC DEGENERATION: (1) INFECTIOUS (2)
CC SPORADIC AND (3) DOMINANTLY INHERITED FORMS.
CC TME, CWD, BSE, FSE, EUE ARE ALL THOUGHT TO
CC OCCUR AFTER CONSUMPTION OF PRION-INFECTED
CC FOODSTUFFS.
DR EMBL; M13667; AAA19664.1; -.
DR EMBL; M13899; AAA60182.1; -.
DR EMBL; D00015; BAA00011.1; -.
DR PIR; A05017; A05017.
DR PIR; A24173; A24173.
DR PIR; S14078; S14078.
DR PDB; 1E1G; 20-JUL-00.
DR PDB; 1E1J; 20-JUL-00.
DR PDB; 1E1P; 20-JUL-00.
DR PDB; 1E1S; 21-JUL-00.
DR PDB; 1E1U; 20-JUL-00.
DR PDB; 1E1W; 20-JUL-00.
DR MIM; 176640; -.
DR MIM; 123400; -.
DR MIM; 137440; -.
DR MIM; 245300; -.
DR MIM; 600072; -.
DR MIM; 604920; -.
DR InterPro; IPR000817; Prion.
DR Pfam; PF00377; prion; 1.
DR PRINTS; PR00341; PRION.
DR SMART; SM00157; PRP; 1.
DR PROSITE; PS00291; PRION_1; 1.
DR PROSITE; PS00706; PRION_2; 1.
KW Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal;
KW 3D-structure; Polymorphism; Disease mutation.
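The entry is semi-structured text: every line begins with a two-letter code (ID, AC, DR, FT, ...), and the DR lines are the links out to other databases. A minimal sketch of pulling out the accession and those cross-references, assuming the classic Swiss-Prot flat-file layout; it is a toy for illustration, not a substitute for a maintained parser.

```python
# Toy extraction of the accession and DR cross-references from a
# Swiss-Prot style flat-file entry (layout assumed from the example above).

ENTRY = """\
AC   P04156;
DR   EMBL; M13667; AAA19664.1; -.
DR   PDB; 1E1G; 20-JUL-00.
DR   MIM; 176640; -.
DR   InterPro; IPR000817; Prion.
DR   Pfam; PF00377; prion; 1.
//
"""


def accession(entry: str) -> str:
    """Return the primary accession: the first identifier on the first AC line."""
    for line in entry.splitlines():
        if line.startswith("AC"):
            return line[2:].strip().split(";")[0]
    raise ValueError("entry has no AC line")


def cross_references(entry: str) -> list[tuple[str, str]]:
    """Return (database, identifier) pairs from the DR lines of one entry."""
    refs = []
    for line in entry.splitlines():
        code, _, rest = line.partition(" ")  # two-letter line code, then the data
        if code != "DR":
            continue
        fields = [field.strip() for field in rest.rstrip(".").split(";")]
        refs.append((fields[0], fields[1]))
    return refs


if __name__ == "__main__":
    print(accession(ENTRY))
    for database, identifier in cross_references(ENTRY):
        print(database, identifier)
```

Because the code is split off at the first space and fields are stripped, the parser accepts both the padded layout used here and the single-space form shown on the slide.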

Navigating the Web of Knowledge in Bioinformatics

Bioinformatics Experiments are Data Pipelines
Resources/Services
Investigate the evolutionary relationships between proteins
[Figure (Peter Li): query; protein sequences; multiple sequence alignment; my data; my tool]

Linking together data resources
Hypo Science – the routine for the many
Hyper Science – big projects, big science

The In Silico Experiment
• We can mine these data for possible hypotheses
• “what are the genes that are involved in some disease
phenotype?”
• Correlate genes in QTL with differentially regulated genes in
microarray via pathways; query the literature base with these
genes, pathways and phenotype; …
• The resulting facts form a hypothesis: a co-ordinated set of
SNPs increases cholesterol biosynthesis in macrophages, while
delaying apoptosis of these cells; increased superoxide
production aids tolerance to trypanosomiasis in cattle
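The "correlate genes in QTL with differentially regulated genes via pathways" step can be pictured as set operations: map each gene list onto pathways and keep the pathways hit from both sides. A hedged sketch; the gene and pathway names are hypothetical placeholders, and a real analysis adds statistics and literature queries on top.

```python
from collections import defaultdict


def group_by_pathway(genes: set[str], gene_pathways: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each pathway to the subset of `genes` that participate in it."""
    grouped = defaultdict(set)
    for gene in genes:
        for pathway in gene_pathways.get(gene, set()):
            grouped[pathway].add(gene)
    return dict(grouped)


def candidate_pathways(qtl_genes: set[str], de_genes: set[str],
                       gene_pathways: dict[str, set[str]]) -> dict[str, tuple[set[str], set[str]]]:
    """Pathways containing at least one QTL gene and one differentially expressed gene."""
    from_qtl = group_by_pathway(qtl_genes, gene_pathways)
    from_array = group_by_pathway(de_genes, gene_pathways)
    shared = sorted(from_qtl.keys() & from_array.keys())
    return {p: (from_qtl[p], from_array[p]) for p in shared}


# Hypothetical example data: placeholders, not real genes or pathways.
GENE_PATHWAYS = {
    "gene_a": {"pathway_1"},
    "gene_b": {"pathway_2"},
    "gene_c": {"pathway_1", "pathway_3"},
    "gene_d": {"pathway_4"},
}

if __name__ == "__main__":
    hits = candidate_pathways({"gene_a", "gene_b"}, {"gene_c", "gene_d"}, GENE_PATHWAYS)
    print(hits)  # {'pathway_1': ({'gene_a'}, {'gene_c'})}
```

Working at the pathway level lets a QTL gene and an expression change be connected even when they are different genes.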

How bioinformatics was done: integrating data sets
• Slave labour
• Collections of Scripts
• Warehouses
• Applications
– Galaxy
– Gaggle
– Integr8
– Ensembl
– …..
• Workflows!
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta
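The sequence above appears to be in a GenBank-style ORIGIN layout (an assumption from its shape): each line starts with the position of its first base, followed by blocks of ten bases. A typical "collection of scripts" job was to strip that layout to recover the raw bases; a hedged sketch of such a script:

```python
ORIGIN_LINES = [
    "12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt",
    "12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt",
    "12301 gaccatccta",
]


def origin_to_sequence(lines: list[str]) -> tuple[int, str]:
    """Return (first position, upper-case sequence) from ORIGIN-style lines.

    Each line's leading number is the 1-based position of its first base;
    the numbering is checked for continuity as the lines are joined.
    """
    start = int(lines[0].split()[0])
    bases = ""
    for line in lines:
        position, *blocks = line.split()
        if int(position) != start + len(bases):
            raise ValueError(f"line numbered {position} does not follow on")
        bases += "".join(blocks)
    return start, bases.upper()


if __name__ == "__main__":
    start, seq = origin_to_sequence(ORIGIN_LINES)
    print(start, len(seq), seq[:20])  # 12181 130 ACATTTCTACCAACAGTGGA
```

Every such script encodes one format's quirks, which is part of why this style of integration did not scale.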

Workflows: E. Science laboris
• Data preparation and analysis pipelines.
• Data preparation pipelines
• Data integration pipelines
• Data analysis pipelines
• Data annotation pipelines
• Warehouse population refreshing
• Data and text mining
• Knowledge extraction.
• Parameter sweeps over
simulations/computations
• Model building and verification
• Knowledge management and model
population
• Hypothesis generation and modelling

Workflows: E. Science laboris
• A workflow is a specification.
• A workflow management system (WfMS) is the machinery for
coordinating the execution of (scientific) services and linking
together (scientific) resources.
• Handles cross-cutting concerns like:
error handling, service invocation,
data movement, data streaming, data
provenance tracking, process
auditing, execution monitoring,
security access, blah blah…
• Agile software development
[Figure: enactment engine; my data; my tool]
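The enactment idea, running each step once its inputs exist while recording what ran on what, can be sketched in a few lines. This is a toy under stated assumptions (steps are plain functions, wiring is by name, provenance is a text log) and is not how Taverna or any particular WfMS is implemented; it uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

Step = tuple[Callable[..., Any], list[str]]


def enact(steps: dict[str, Step], inputs: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Run every step after the steps or inputs it depends on.

    `steps` maps a step name to (function, names of its inputs); an input name
    is either another step or a key of `inputs`. Returns every computed value
    plus a simple provenance log with one entry per executed step.
    """
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    values = dict(inputs)
    provenance = []
    for name in TopologicalSorter(graph).static_order():
        if name not in steps:  # a workflow input, already in `values`
            continue
        func, deps = steps[name]
        values[name] = func(*(values[d] for d in deps))
        provenance.append(f"{name} <- {', '.join(deps)}")
    return values, provenance


# Hypothetical stand-ins for remote services (fetch sequences, align them, summarise).
STEPS: dict[str, Step] = {
    "sequences": (lambda ids: [i.upper() for i in ids], ["accessions"]),
    "alignment": (lambda seqs: sorted(seqs), ["sequences"]),
    "report": (lambda seqs, aln: f"{len(seqs)} sequences, first {aln[0]}", ["sequences", "alignment"]),
}

if __name__ == "__main__":
    values, log = enact(STEPS, {"accessions": ["protein_a", "protein_b"]})
    print(values["report"])  # 2 sequences, first PROTEIN_A
    print(log)
```

Keeping the wiring separate from the functions is what lets the same specification be re-run, inspected and audited.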

Workflow Execution Engine
• Local desktop and remote server
• Implicit iteration over large data collections
• Nested workflows
• Automated data flow
• Event history log and data provenance tracking
• Within-workflow programming
• Extensibility points for plug-ins
Graphical workbench
• For professionals
• Plug-in architecture
• Incorporate new services without coding: services are used as they are
• Access to local and remote resources and analysis tools
[Figure labels: Re-Design; Rewritten]
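Implicit iteration means a service written for one item can be handed a whole list, with the engine doing the looping. The sketch below assumes two common ways of combining several list inputs, a cross product (every combination) and a dot product (position by position); it does not claim which one a given engine uses by default, and `search` is a hypothetical single-item service.

```python
from itertools import product
from typing import Any, Callable


def implicit_iterate(service: Callable[..., Any], *args: Any, strategy: str = "cross") -> list[Any]:
    """Apply a single-item service across list-valued inputs.

    Arguments that are not lists are treated as one-element lists.
    "cross" calls the service on every combination of items;
    "dot" pairs items position by position and needs equal-length lists.
    """
    lists = [a if isinstance(a, list) else [a] for a in args]
    if strategy == "cross":
        combos = product(*lists)
    elif strategy == "dot":
        if len({len(xs) for xs in lists}) > 1:
            raise ValueError("dot strategy needs lists of equal length")
        combos = zip(*lists)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [service(*combo) for combo in combos]


def search(sequence: str, database: str) -> str:
    """Hypothetical service that handles exactly one sequence and one database."""
    return f"{sequence}@{database}"


if __name__ == "__main__":
    print(implicit_iterate(search, ["s1", "s2"], ["db1", "db2"]))
    # ['s1@db1', 's1@db2', 's2@db1', 's2@db2']
    print(implicit_iterate(search, ["s1", "s2"], ["db1", "db2"], strategy="dot"))
    # ['s1@db1', 's2@db2']
```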

Trypanosomiasis Study
• Comparing resistant vs. susceptible strains – microarrays
• Mapping quantitative traits – classical genetics QTL
• Integrated microarray data, genomic sequences, pathway data,
literature mining
Paul Fisher et al., Nucleic Acids Research, 2007, 35(16)

Genotype to Pathway
Created by Paul Fisher

Pathway to Phenotype
Created by Paul Fisher

• Eliminated user bias and premature filtering
• The scale and complexity of data and
literature.
• Systematic data analysis
• Data analysis provenance
• Manageable amount of output data for
biologists to interpret and verify
• Data driven science
“Looking where others hadn’t”
“make sense of this data” -> “does this make sense?”
http://www.youtube.com/watch?v=Y6_Kz5L010g

Transferring Characteristics
[Figure: an uncharacterised protein compared with others; labels Tra1, La2, La3]
High similarity → transfer characteristics
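Transfer by similarity can be sketched as: score the uncharacterised protein against characterised ones and copy the annotation of the best hit only if it clears a similarity threshold. The sketch assumes the sequences are already aligned (with `-` for gaps) and uses simple percent identity; the 40% threshold, protein names and annotations are illustrative assumptions, not recommended values.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def transfer_annotation(alignments: dict[str, tuple[str, str]],
                        annotations: dict[str, str],
                        threshold: float = 40.0) -> Optional[str]:
    """Return the annotation of the most similar characterised protein, or None.

    `alignments` maps a characterised protein's name to (aligned query, aligned hit).
    """
    best_name, best_score = None, -1.0
    for name, (query, hit) in alignments.items():
        score = percent_identity(query, hit)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return annotations[best_name]


if __name__ == "__main__":
    # Hypothetical aligned fragments and annotations.
    alignments = {
        "protein_x": ("MANLG-CW", "MANLGACW"),
        "protein_y": ("MANLG-CW", "MKTLG-AW"),
    }
    annotations = {"protein_x": "annotation of X", "protein_y": "annotation of Y"}
    print(transfer_annotation(alignments, annotations))  # annotation of X
```

The threshold is the crux: set it too low and errors propagate from entry to entry.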

… A Fact-Based Discipline
• Rather than laws captured in mathematics….
• We have lots of facts: the discipline’s knowledge
• Rather than “calculating” what a protein does, we
investigate and write it down
• Equivalent to writing down the trajectories of all
thrown objects and not doing ballistics!
• To do biology one needs “the knowledge”

Heterogeneity
• 28 ways to format the representations of a biological
sequence
• Though one way to represent the bases or amino
acids…
• Different words same concept
• Different concepts same words
• Different and implicit data schema

An Identity Crisis
• Database entries have identifiers unique within their
database
• The type of entity described in an entry doesn’t have
an identifier
• Different entries about the same type talk about it
differently
• How do we know when an entry in one DB talks
about the same thing as another entry in another
DB?
• That’s the skill of a bioinformatician
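Once someone has judged which entries talk about the same thing, those judgements can be recorded and chained so that equivalence follows transitively. A sketch using a union-find structure; it assumes every link is a trusted "same entity" assertion, which is exactly the judgement the slide says needs skill (a database cross-reference does not always mean identity).

```python
Identifier = tuple[str, str]  # (database name, identifier within that database)


class IdentifierMap:
    """Union-find over database identifiers asserted to be about the same thing."""

    def __init__(self) -> None:
        self.parent: dict[Identifier, Identifier] = {}

    def _root(self, x: Identifier) -> Identifier:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a: Identifier, b: Identifier) -> None:
        """Record that entries a and b describe the same entity."""
        ra, rb = self._root(a), self._root(b)
        if ra != rb:
            self.parent[ra] = rb

    def same_entity(self, a: Identifier, b: Identifier) -> bool:
        return self._root(a) == self._root(b)


if __name__ == "__main__":
    ids = IdentifierMap()
    # Links taken from DR lines of the UniProt entry P04156 above.
    ids.link(("UniProt", "P04156"), ("EMBL", "M13667"))
    ids.link(("UniProt", "P04156"), ("PDB", "1E1G"))
    print(ids.same_entity(("EMBL", "M13667"), ("PDB", "1E1G")))  # True
    print(ids.same_entity(("EMBL", "M13667"), ("Pfam", "PF00377")))  # False
```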

Categories and Category Labels
GO:0000368
U2-type nuclear mRNA 5' splice site recognition
spliceosomal E complex formation
spliceosomal E complex biosynthesis
spliceosomal CC complex formation
U2-type nuclear mRNA 5'-splice site recognition
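One crude way to cope with several labels for a single category is to index every label under the category's identifier after normalising it. The normalisation rules below (lower-case, hyphens as spaces, collapsed whitespace) are assumptions for illustration, and the label list is simply the one shown above.

```python
import re
from typing import Optional

# Labels listed above for a single GO identifier.
LABELS = {
    "GO:0000368": [
        "U2-type nuclear mRNA 5' splice site recognition",
        "spliceosomal E complex formation",
        "spliceosomal E complex biosynthesis",
        "spliceosomal CC complex formation",
        "U2-type nuclear mRNA 5'-splice site recognition",
    ],
}


def normalise(label: str) -> str:
    """Crude normalisation: lower-case, hyphens as spaces, single spaces."""
    return re.sub(r"\s+", " ", label.lower().replace("-", " ")).strip()


def build_index(labels: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalised label to its category identifier."""
    return {normalise(name): term_id for term_id, names in labels.items() for name in names}


def lookup(index: dict[str, str], label: str) -> Optional[str]:
    """Find the category for a label, or None if it is not known."""
    return index.get(normalise(label))


if __name__ == "__main__":
    index = build_index(LABELS)
    print(len(index))  # 4: the two "5' splice site" spellings collapse together
    print(lookup(index, "Spliceosomal E-complex formation"))  # GO:0000368
```

String tricks only go so far; the identifier, not the label, is what should be matched.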

The Role of Knowledge
• A lot of facts
• Perhaps organised into a system
• No equivalent of “laws of mechanics” – we
can’t do this biology with mathematics
• Or at least not without knowing what the
numbers mean...
• This is why we’ve been using ontologies!

Uses of Ontology in Bioinformatics

Post-Genomic Biology
• Fly, mouse, yeast, worm all have their own
terminologies
• I want to compare genomes
• How?
• The genomic sequence is easily dealt with
computationally and comparisons are easy
• This is not true of the annotations or knowledge of
those sequences
• Need a common understanding

Annotation of Data
• Big effort to create controlled vocabularies using
ontologies
• A huge annotation effort – describe the entities in databases
with terms from ontologies
• The Gene Ontology (http://www.geneontology.org)
• The Open Biomedical Ontologies Consortium

GO in Analysis
• Microarray analysis one of the original visions for GO
• Modulated genes cluster according to the
functional attributes of their proteins
• GO also used in, for example, semantic similarity;
text analysis; etc.
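A simple illustration of ontology-based semantic similarity: propagate each gene's annotations up the hierarchy and compare the resulting term sets by their overlap (intersection over union, a measure sometimes called simUI). The tiny hierarchy and term names are hypothetical and not taken from GO.

```python
# Toy is_a hierarchy: child -> parents. Term names are hypothetical.
PARENTS = {
    "kinase_activity": ["catalytic_activity"],
    "phosphatase_activity": ["catalytic_activity"],
    "catalytic_activity": ["molecular_function"],
    "transporter_activity": ["molecular_function"],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """The term itself plus everything reachable through is_a links."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def term_overlap(terms_a: set[str], terms_b: set[str], parents: dict[str, list[str]]) -> float:
    """Intersection over union of two annotation sets after propagating to ancestors."""
    closure_a = set().union(*(ancestors(t, parents) for t in terms_a))
    closure_b = set().union(*(ancestors(t, parents) for t in terms_b))
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(term_overlap({"kinase_activity"}, {"phosphatase_activity"}, PARENTS))  # 0.5
    print(term_overlap({"kinase_activity"}, {"transporter_activity"}, PARENTS))  # 0.25
```

Propagating to ancestors is what lets two different but related terms count as partly similar.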

[Screenshot: BioCatalogue content]

Shield users and applications from service interoperability and incompatibility plumbing.
Turn your app into a service
[Figure callouts: service providers; not only web services; how a bioinformatician assumes stuff should work]

Pettifer, University of Manchester
A collection of interactive tools for analysing protein sequence and structure
http://utopia.cs.manchester.ac.uk/

Semantic Descriptions of All
• Not just bio-entities in data
• The laboratory experiments by which they were
generated
• The protocols for their analysis
• The services for their analysis

Semantic Integration
• Shared identifiers mean integration and interoperation
• Most workflows are hobbled by syntactic and semantic
heterogeneity
• Syntactic integration (Bio2RDF)
• Semantic integration via ontologies and naming
schemes
• Enables better e-Science through semantic science
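Why shared identifiers matter can be shown with plain subject, predicate, object triples: facts from different sources merge for free when they name the protein identically, and silently fail to join when they do not. The identifier style (`uniprot:P04156`, `pdb:1E1G`) is an assumption loosely in the spirit of Bio2RDF-like naming, and the predicates are hypothetical.

```python
Triple = tuple[str, str, str]

# Two hypothetical sources that use the same identifier for the protein.
SOURCE_A: list[Triple] = [
    ("uniprot:P04156", "hasName", "Major prion protein"),
    ("uniprot:P04156", "encodedBy", "gene:PRNP"),
]
SOURCE_B: list[Triple] = [
    ("uniprot:P04156", "hasStructure", "pdb:1E1G"),
    # Same protein under a different (hypothetical) naming scheme: it will not join.
    ("UniProtKB/P04156", "hasStructure", "pdb:1E1J"),
]


def merge(*sources: list[Triple]) -> set[Triple]:
    """Union of triples; identical subjects line up with no mapping step."""
    return {triple for source in sources for triple in source}


def describe(triples: set[Triple], subject: str) -> list[tuple[str, str]]:
    """All (predicate, object) pairs recorded for one subject, sorted."""
    return sorted((p, o) for s, p, o in triples if s == subject)


if __name__ == "__main__":
    for predicate, obj in describe(merge(SOURCE_A, SOURCE_B), "uniprot:P04156"):
        print(predicate, obj)
```

The `pdb:1E1J` fact is lost only because of a naming mismatch, which is the syntactic heterogeneity problem in miniature.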

Fact Management
• When “stamp collecting” we’re collecting facts
• Biology is a fact management activity
• Knowing what these facts mean is very important
• Science is performed on data and the semantics of data
enable us to do science
• Semantic e-Science

Summary
• The nature of modern biology gives it interesting
knowledge (fact) management issues
• It is a knowledge based discipline
• Not unique, but often extreme
• Ontologies seen as one component in management
(but not a panacea)
• E-Science gives infrastructure for management;
semantics enable analysis
• Actually, very light use of semantics

More Acknowledgements
• Phil Lord
• Simon Jupp
• Carole Goble
