The Changing Nature of Biomedical
Research: Semantic e-Science
Robert Stevens
BioHealth Informatics Group
University of Manchester
Robert.Stevens@manchester.ac.uk
Introduction
• (Modern bio-molecular) Science
• E-Science
• Semantics and science
• Semantic e-Science
Ernest Rutherford
“All science is either physics or stamp collecting”
Image: http://en.wikipedia.org/wiki/File:Ernest_Rutherford2.jpg
Mathematical Sciences
Laws in Biology
Charles Darwin
Image: http://en.wikipedia.org/wiki/File:Charles_Darwin_01.jpg
On The Origin of Species - 1859
Central Dogma
Image: http://cellbio.utmb.edu/CELLBIO/DNA-RNA.jpg
Classic and Modern Biology
Genotype Phenotype
Modern biology
Classic biology
Speed of sequencing
• First human genome
– 10+ years to produce
– Cost $500 million
– Huge international effort
• Now done in 10 weeks
– (for $399)
– http://tinyurl.com/genomecost
– http://www.23andme.com
1000+ databases
• according to Nucleic Acids Research
PubMed: 2 papers per minute
• ~700,000 individual papers
• Grows at 2 papers per minute
(see http://blogs.bbsrc.ac.uk for details)
Biology now has lots of facts
Lots of catalogues
Genome
Proteome
Transcriptome
Interactome
Metabolome
PHENOME
Creating Woods, not Trees
Genes
Proteins
Pathways
Interactions
Literature
Complex
Machines
Virtual
Organism
…. from biological facts, we make a system that is some model of a real organism
Networks of Chemicals
Image: http://genome-www.stanford.edu/rap_sir/images/Web_FigF_RAP1_glycolysis.gif
Systems within Systems
Image: http://www.ehponline.org/members/2007/10373/fig1.jpg
UniProt: a protein database?
ID PRIO_HUMAN STANDARD; PRT; 253 AA.
AC P04156;
DT 01-NOV-1986 (Rel. 03, Created)
DT 01-NOV-1986 (Rel. 03, Last sequence update)
DT 20-AUG-2001 (Rel. 40, Last annotation update)
DE Major prion protein precursor (PrP) (PrP27-30) (PrP33-35C) (ASCR).
GN PRNP.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE=86300093; PubMed=3755672;
RA Kretzschmar H.A., Stowring L.E., Westaway D., Stubblebine W.H.,
RA Prusiner S.B., Dearmond S.J.;
RT "Molecular cloning of a human prion protein cDNA.";
RL DNA 5:315-324(1986).
RN [2]
RP SEQUENCE OF 8-253 FROM N.A.
RX MEDLINE=86261778; PubMed=3014653;
RA Liao Y.-C.J., Lebo R.V., Clawson G.A., Smuckler E.A.;
RT "Human prion protein cDNA: molecular cloning, chromosomal mapping,
RT and biological implications.";
RL Science 233:364-367(1986).
RN [3]
RP SEQUENCE OF 58-85 AND 111-150 (VARIANT AMYLOID GSS).
RX MEDLINE=91160504; PubMed=1672107;
RA Tagliavini F., Prelli F., Ghiso J., Bugiani O., Serban D.,
RA Prusiner S.B., Farlow M.R., Ghetti B., Frangione B.;
RT "Amyloid protein of Gerstmann-Straussler-Scheinker disease (Indiana
RT kindred) is an 11 kd fragment of prion protein with an N-terminal
RT glycine at codon 58.";
RL EMBO J. 10:513-519(1991).
RN [4]
RP STRUCTURE BY NMR OF 118-221.
RX MEDLINE=20359708; PubMed=10900000;
RA Calzolai L., Lysek D.A., Guntert P., von Schroetter C., Riek R.,
RA Zahn R., Wuethrich K.;
RT "NMR structures of three single-residue variants of the human prion
RT protein.";
RL Proc. Natl. Acad. Sci. U.S.A. 97:8340-8345(2000).
CC -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE
CC HOST GENOME AND IS EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED
CC "RODS".
CC -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC -!- POLYMORPHISM: THE FIVE TANDEM OCTAPEPTIDE REPEATS REGION IS HIGHLY
CC UNSTABLE. INSERTIONS OR DELETIONS OF OCTAPEPTIDE REPEAT UNITS ARE
CC ASSOCIATED TO PRION DISEASE.
FT SIGNAL 1 22
FT CHAIN 23 230 MAJOR PRION PROTEIN.
FT PROPEP 231 253 REMOVED IN MATURE FORM (BY SIMILARITY).
FT LIPID 230 230 GPI-ANCHOR (BY SIMILARITY).
FT CARBOHYD 181 181 N-LINKED (GLCNAC...) (PROBABLE).
FT DISULFID 179 214 BY SIMILARITY.
FT DOMAIN 51 91 5 X 8 AA TANDEM REPEATS OF P-H-G-G-G-W-G-
FT Q.
FT REPEAT 51 59 1.
FT REPEAT 60 67 2.
FT REPEAT 68 75 3.
FT REPEAT 76 83 4.
FT REPEAT 84 91 5.
FT IN PATIENTS WHO HAVE A PRP MUTATION AT
FT CODON 178: PATIENTS WITH MET DEVELOP FFI,
FT THOSE WITH VAL DEVELOP CJD).
FT /FTId=VAR_006467.
FT VARIANT 171 171 N -> S (IN SCHIZOAFFECTIVE DISORDER).
FT /FTId=VAR_006468.
FT VARIANT 178 178 D -> N (IN FFI AND CJD).
FT /FTId=VAR_006469.
FT VARIANT 180 180 V -> I (IN CJD).
FT /FTId=VAR_006470.
FT VARIANT 183 183 T -> A (IN FAMILIAL SPONGIFORM
FT ENCEPHALOPATHY).
FT /FTId=VAR_006471.
FT VARIANT 187 187 H -> R (IN GSS).
FT /FTId=VAR_008746.
FT VARIANT 188 188 T -> K (IN EOAD; DEMENTIA ASSOCIATED TO
FT PRION DISEASES).
FT /FTId=VAR_008748.
FT VARIANT 188 188 T -> R.
FT /FTId=VAR_008747.
FT VARIANT 196 196 E -> K (IN CJD).
FT /FTId=VAR_008749.
FT /FTId=VAR_006472.
SQ SEQUENCE 253 AA; 27661 MW; 43DB596BAAA66484 CRC64;
MANLGCWMLV LFVATWSDLG LCKKRPKPGG WNTGGSRYPG QGSPGGNRYP PQGGGGWGQP
HGGGWGQPHG GGWGQPHGGG WGQPHGGGWG QGGGTHSQWN KPSKPKTNMK HMAGAAAAGA
VVGGLGGYML GSAMSRPIIH FGSDYEDRYY RENMHRYPNQ VYYRPMDEYS NQNNFVHDCV
NITIKQHTVT TTTKGENFTE TDVKMMERVV EQMCITQYER ESQAYYQRGS SMVLFSSPPV
ILLISFLIFL IVG
//
CC -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE
CC BRAIN OF HUMANS AND ANIMALS INFECTED
CC WITH NEURODEGENERATIVE DISEASES KNOWN AS
CC TRANSMISSIBLE SPONGIFORM ENCEPHALOPATHIES OR PRION
CC DISEASES, LIKE: CREUTZFELDT-JAKOB DISEASE (CJD),
CC GERSTMANN-STRAUSSLER SYNDROME (GSS), FATAL
CC FAMILIAL INSOMNIA (FFI) AND KURU IN HUMANS;
CC SCRAPIE IN SHEEP AND GOAT; BOVINE SPONGIFORM
CC ENCEPHALOPATHY (BSE) IN CATTLE; TRANSMISSIBLE
CC MINK ENCEPHALOPATHY (TME); CHRONIC WASTING
CC DISEASE (CWD) OF MULE DEER AND ELK; FELINE
CC SPONGIFORM ENCEPHALOPATHY (FSE) IN CATS AND
CC EXOTIC UNGULATE ENCEPHALOPATHY (EUE) IN
CC NYALA AND GREATER KUDU. THE PRION DISEASES
CC ILLUSTRATE THREE MANIFESTATIONS OF CNS
CC DEGENERATION: (1) INFECTIOUS (2)
CC SPORADIC AND (3) DOMINANTLY INHERITED FORMS.
CC TME, CWD, BSE, FSE, EUE ARE ALL THOUGHT TO
CC OCCUR AFTER CONSUMPTION OF PRION-INFECTED
CC FOODSTUFFS.
DR EMBL; M13667; AAA19664.1; -.
DR EMBL; M13899; AAA60182.1; -.
DR EMBL; D00015; BAA00011.1; -.
DR PIR; A05017; A05017.
DR PIR; A24173; A24173.
DR PIR; S14078; S14078.
DR PDB; 1E1G; 20-JUL-00.
DR PDB; 1E1J; 20-JUL-00.
DR PDB; 1E1P; 20-JUL-00.
DR PDB; 1E1S; 21-JUL-00.
DR PDB; 1E1U; 20-JUL-00.
DR PDB; 1E1W; 20-JUL-00.
DR MIM; 176640; -.
DR MIM; 123400; -.
DR MIM; 137440; -.
DR MIM; 245300; -.
DR MIM; 600072; -.
DR MIM; 604920; -.
DR InterPro; IPR000817; Prion.
DR Pfam; PF00377; prion; 1.
DR PRINTS; PR00341; PRION.
DR SMART; SM00157; PRP; 1.
DR PROSITE; PS00291; PRION_1; 1.
DR PROSITE; PS00706; PRION_2; 1.
KW Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal;
KW 3D-structure; Polymorphism; Disease mutation.
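A flat-file entry of this kind tags each line with a two-letter code (ID, AC, DR, KW and so on), so a first step towards treating it as a database record is to group lines by code. The sketch below is only an illustration under that assumption: the function names `parse_flat_entry` and `cross_references` are hypothetical, the sample is a handful of lines from the PRIO_HUMAN entry, and a real parser would need to handle continuation lines, feature tables and the sequence block.

```python
from collections import defaultdict


def parse_flat_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        code, _, value = line.partition(" ")
        # Lines without a two-letter code (e.g. sequence data) are skipped.
        if len(code) == 2 and value.strip():
            fields[code].append(value.strip())
    return dict(fields)


def cross_references(fields):
    """Turn DR lines into (database, primary identifier) pairs."""
    pairs = []
    for value in fields.get("DR", []):
        parts = [part.strip() for part in value.rstrip(".").split(";")]
        if len(parts) >= 2:
            pairs.append((parts[0], parts[1]))
    return pairs


# A few lines taken from the PRIO_HUMAN entry, shortened for illustration.
SAMPLE = """ID PRIO_HUMAN STANDARD; PRT; 253 AA.
AC P04156;
GN PRNP.
DR EMBL; M13667; AAA19664.1; -.
DR MIM; 176640; -.
KW Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal;
//"""

entry = parse_flat_entry(SAMPLE)
print(entry["AC"])
print(cross_references(entry))
```

Even this small step shows where the semantics are missing: the parser recovers that `MIM; 176640` is a cross-reference, but nothing in the file says what kind of thing it refers to.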
Navigating the Web of
Knowledge in Bioinformatics
Bioinformatics Experiments are Data
pipelines
Resources/Services
Investigate the evolutionary relationships between proteins
Protein
sequences
Multiple
sequence
alignment
Query
[Peter Li]
My
data
My
tool
Linking together data resources
Hypo Science – the routine for the many
Hyper Science – big projects, big science
The In Silico Experiment
• We can mine these data for possible hypotheses
• “what are the genes that are involved in some disease
phenotype?”
• Correlate genes in QTL with differentially regulated genes in
microarray via pathways; query the literature base with these
genes, pathways and phenotype; …
• Resulting facts form some hypothesis: A co-ordinated set of
SNPs increase cholesterol biosynthesis in macrophage, while
delaying apoptosis of these cells; increased super-oxide
production aids tolerance to trypanosomiasis in cattle
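The correlation step can be pictured as a set intersection followed by a pathway lookup. The sketch below assumes the three inputs have already been fetched; the function name `candidate_pathways` and every gene and pathway identifier are hypothetical placeholders, not data from the study.

```python
def candidate_pathways(qtl_genes, expressed_genes, gene_to_pathways):
    """Return pathways hit by genes that lie in the QTL and are differentially expressed."""
    shared = set(qtl_genes) & set(expressed_genes)
    hits = {}
    for gene in sorted(shared):
        for pathway in gene_to_pathways.get(gene, []):
            hits.setdefault(pathway, []).append(gene)
    return hits


# Hypothetical identifiers, for illustration only.
qtl_genes = ["geneA", "geneB", "geneC"]
expressed_genes = ["geneB", "geneC", "geneD"]
gene_to_pathways = {
    "geneB": ["pathway1"],
    "geneC": ["pathway1", "pathway2"],
    "geneD": ["pathway3"],
}

hits = candidate_pathways(qtl_genes, expressed_genes, gene_to_pathways)
print(hits)
```

The resulting pathway names, genes and phenotype terms would then be the query terms for the literature search.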
How bioinformatics was done
Integrating data sets
• Slave labour
• Collections of Scripts
• Warehouses
• Applications
– Galaxy
– Gaggle
– Integr8
– Ensembl
– …..
• Workflows!
12181 acatttctac caacagtgga
tgaggttgtt ggtctatgtt
ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc
tttagagaag agtcatacag
tcaatagcct tttttagctt
12301 gaccatccta
Workflows: E. Science laboris
• Data preparation and analysis pipelines.
• Data preparation pipelines
• Data integration pipelines
• Data analysis pipelines
• Data annotation pipelines
• Warehouse population refreshing
• Data and text mining
• Knowledge extraction.
• Parameter sweeps over
simulations/computations
• Model building and verification
• Knowledge management and model
population
• Hypothesis generation and modelling
• A workflow is a specification.
• WFmS is the machinery for
coordinating the execution of
(scientific) services and linking
together (scientific) resources.
• Handles cross cutting concerns like:
error handling, service invocation,
data movement, data streaming, data
provenance tracking, process
auditing, execution monitoring,
security access, blah blah…..
• Agile software development
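To make the split between specification and machinery concrete, the sketch below treats a workflow as a plain list of named steps and gives the "engine" one job: run the steps in order and record provenance. It is a toy under stated assumptions, not how any particular workflow management system is implemented; `run_workflow`, `fetch_sequences` and `sequence_lengths` are hypothetical names, and the two step functions stand in for remote services.

```python
def run_workflow(steps, data):
    """Enact a linear workflow given as a list of (name, function) pairs.

    Returns the final value and a provenance log with one record per step.
    """
    provenance = []
    for name, func in steps:
        try:
            result = func(data)
        except Exception as exc:
            # Record the failure before passing the error on to the caller.
            provenance.append({"step": name, "input": data, "error": repr(exc)})
            raise
        provenance.append({"step": name, "input": data, "output": result})
        data = result
    return data, provenance


# Hypothetical stand-ins for remote services.
def fetch_sequences(identifiers):
    toy_database = {"p1": "MANLG", "p2": "MANIG"}
    return [toy_database[identifier] for identifier in identifiers]


def sequence_lengths(sequences):
    return [len(sequence) for sequence in sequences]


# The workflow itself is only a specification: which steps, in which order.
workflow = [("fetch", fetch_sequences), ("measure", sequence_lengths)]

result, log = run_workflow(workflow, ["p1", "p2"])
print(result)
for record in log:
    print(record["step"], record["input"], record.get("output"))
```

Because the specification is data, the same list can be rerun, shared or inspected without touching the engine.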
Workflows: E. Science laboris
Enactment
Engine
My
data
My
tool
Workflow Execution Engine
Local desktop and remote server
Implicit iteration over large data collections
Nested workflows
Automated data flow
Event history log and data provenance tracking
Within-workflow programming
Extensibility points for plug-ins
Graphical workbench
For Professionals
Plug-in architecture
Incorporate new service without
coding. Services as they are.
Access to local and remote
resources and analysis tools
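Implicit iteration, one of the engine features listed above, can be pictured as a wrapper that lets a service written for one item accept a collection of any nesting depth. The sketch below is an assumption about the idea rather than the engine's actual mechanism; `implicit_iteration` and `gc_content` are hypothetical names.

```python
def implicit_iteration(service):
    """Wrap a single-item service so that it also accepts (nested) lists."""
    def wrapped(value):
        if isinstance(value, list):
            return [wrapped(item) for item in value]
        return service(value)
    return wrapped


def gc_content(sequence):
    """Fraction of G and C bases in one nucleotide sequence."""
    return sum(base in "gc" for base in sequence.lower()) / len(sequence)


gc_for_all = implicit_iteration(gc_content)
fractions = gc_for_all(["acgt", ["gggc", "atat"]])
print(fractions)
```

The service author writes the single-item case once; the iteration over collections comes from the wrapper.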
Re-Design
Rewritten
• Comparing resistant vs. susceptible
strains – Microarrays
• Mapping quantitative traits –
Classical genetics QTL
• Integrated Microarray data,
genomic sequences, pathway data,
literature mining.
Trypanosomiasis Study
Paul Fisher et al., Nucleic Acids Research,
2007, 35(16)
Genotype to Pathway
Created by Paul Fisher
Pathway to Phenotype
Created by Paul Fisher
• Eliminated user bias and premature filtering
• The scale and complexity of data and
literature.
• Systematic data analysis
• Data analysis provenance
• Manageable amount of output data for
biologists to interpret and verify
• Data driven science
“Looking where others hadn’t”
“make sense of this data” -> “does this make sense?”
http://www.youtube.com/watch?v=Y6_Kz5L010g
Transferring Characteristics
Uncharacterised protein
Tra1 La2 La3
High similarity transfer characteristics
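Transfer by similarity can be sketched as: score the uncharacterised sequence against characterised ones and copy the annotation of the best match if it clears a threshold. Everything here is an illustrative assumption: the sequences are taken to be pre-aligned and of equal length, the 80% threshold is arbitrary, and the names `percent_identity` and `transfer_annotation` and the annotations are hypothetical. Real practice uses alignment tools and far more careful criteria.

```python
def percent_identity(a, b):
    """Percentage of matching positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def transfer_annotation(query, characterised, threshold=80.0):
    """Return (annotation, identity) for the best match at or above the threshold, else None."""
    best = None
    for sequence, annotation in characterised:
        identity = percent_identity(query, sequence)
        if identity >= threshold and (best is None or identity > best[1]):
            best = (annotation, identity)
    return best


# Toy data: short made-up sequences with placeholder annotations.
characterised = [
    ("MANLGCWMLI", "toy annotation A"),
    ("QQQQQQQQQQ", "toy annotation B"),
]

match = transfer_annotation("MANLGCWMLV", characterised)
print(match)
```

The transferred label is only as good as the original annotation, which is one reason knowing what the annotation means matters.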
… A Fact Based Discipline
• Rather than laws captured in mathematics….
• We have lots of facts: the discipline’s knowledge
• Rather than “calculating” what a protein does, we
investigate and write it down
• Equivalent to writing down the trajectories of all
thrown objects and not doing ballistics!
• To do biology one needs “the knowledge”
Heterogeneity
• 28 ways to format the representations of a biological
sequence
• Though one way to represent the bases or amino
acids…
• Different words same concept
• Different concepts same words
• Different and implicit data schema
An Identity Crisis
• Database entries have identifiers unique within their
database
• The type of entity described in an entry doesn’t have
an identifier
• Different entries about the same type talk about it
differently
• How do we know when an entry in one DB talks
about the same thing as another entry in another
DB?
• That’s the skill of a bioinformatician
Categories and Category Labels
GO:0000368
U2-type nuclear mRNA 5' splice site recognition
spliceosomal E complex formation
spliceosomal E complex biosynthesis
spliceosomal CC complex formation
U2-type nuclear mRNA 5'-splice site recognition
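The point of an identifier with several labels can be shown with a small lookup: many strings, one category. The sketch below uses the labels listed for GO:0000368 on this slide; the normalisation rule (lower-casing, treating hyphens as spaces) and the names `normalise` and `build_index` are assumptions made for illustration.

```python
import re


def normalise(label):
    """Lower-case a label and collapse hyphens and whitespace runs to single spaces."""
    return re.sub(r"[\s\-]+", " ", label.strip().lower())


# Labels as listed on the slide for one category identifier.
LABELS = {
    "GO:0000368": [
        "U2-type nuclear mRNA 5' splice site recognition",
        "spliceosomal E complex formation",
        "spliceosomal E complex biosynthesis",
        "spliceosomal CC complex formation",
        "U2-type nuclear mRNA 5'-splice site recognition",
    ],
}


def build_index(labels_by_id):
    """Map every normalised label to its category identifier."""
    index = {}
    for identifier, labels in labels_by_id.items():
        for label in labels:
            index[normalise(label)] = identifier
    return index


index = build_index(LABELS)
print(index.get(normalise("Spliceosomal E complex  formation")))
print(len(index))
```

Two of the five labels differ only by a hyphen and collapse to the same key, while the other three are genuinely different words for the same category.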
The Role of Knowledge
• A lot of facts
• Perhaps organised into a system
• No equivalent of “laws of mechanics” – we
can’t do this biology with mathematics
• Or at least not without knowing what the
numbers mean...
• This is why we’ve been using ontologies!
Uses of Ontology in Bioinformatics
Post-Genomic Biology
• Fly, mouse, yeast, worm all have their own
terminologies
• I want to compare genomes
• How?
• The genomic sequence is easily dealt with
computationally and comparisons are easy
• This is not true of the annotations or knowledge of
those sequences
• Need a common understanding
Annotation of Data
• Big effort to create controlled vocabularies using
ontologies
• A huge annotation effort – describe the entities in DB
with terms from ontologies
• The Gene Ontology (http://www.geneontology.org)
• The Open Biomedical Ontologies Consortium
GO in Analysis
• Microarray analysis was one of the original visions for GO
• Modulated genes cluster about the
functional attributes of their proteins
• GO also used in, for example, semantic similarity;
text analysis; etc.
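One simple reading of semantic similarity is the overlap between two annotation sets once each term's ancestors are included. The sketch below uses a Jaccard measure, which is only one of many measures in use and is chosen here for brevity; the toy hierarchy, its term names and the function names are hypothetical and are not taken from GO.

```python
def with_ancestors(term, parents):
    """Return the term plus every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def annotation_similarity(terms_a, terms_b, parents):
    """Jaccard overlap of two annotation sets after adding ancestor terms."""
    closure_a = set().union(*(with_ancestors(term, parents) for term in terms_a))
    closure_b = set().union(*(with_ancestors(term, parents) for term in terms_b))
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


# Toy hierarchy (child -> parents); the term names are made up.
parents = {
    "splicing": ["rna_processing"],
    "rna_processing": ["metabolism"],
    "translation": ["metabolism"],
}

score = annotation_similarity(["splicing"], ["translation"], parents)
print(score)
```

Here the two genes share only the general ancestor, so the score is low; two genes annotated to sibling splicing terms would score higher.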
Biocatalogue content screenshot
Shield users and applications
from service interoperability and incompatibility plumbing.
Turn your app into a service
Service
providers
Not only web
services
How a
bioinformatician
assumes stuff
should work
Pettifer, University of Manchester
inside
A collection
of
interactive
tools for
analysing
protein
sequence
and
structure
http://utopia.cs.manchester.ac.uk/
Semantic Descriptions of All
• Not just bio-entities in data
• The laboratory experiments by which they were
generated
• The protocols for their analysis
• The services for their analysis
Semantic Integration
• Shared identifiers mean integration and interoperation
• Most workflows are hobbled by syntactic and semantic
heterogeneity
• Syntactic integration (Bio2RDF)
• Semantic integration via ontologies and naming
schemes
• Enables better e-Science through semantic science
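The first bullet can be illustrated with a merge keyed on identifiers: once two sources agree on a name for the same thing, their facts combine with no further work. The sketch below assumes a hand-made `same_as` table that maps one naming scheme onto another; the function name `integrate` and the table are hypothetical, while the accession, entry name, gene and keyword come from the PRIO_HUMAN entry shown earlier.

```python
def integrate(sources, same_as):
    """Merge records from several sources under one canonical identifier."""
    merged = {}
    for source in sources:
        for identifier, properties in source.items():
            canonical = same_as.get(identifier, identifier)
            merged.setdefault(canonical, {}).update(properties)
    return merged


# Two sources describing the same protein under different names.
source_a = {"P04156": {"gene": "PRNP"}}
source_b = {"PRIO_HUMAN": {"keyword": "Prion"}}

# Assumed mapping between the two naming schemes.
same_as = {"PRIO_HUMAN": "P04156"}

merged = integrate([source_a, source_b], same_as)
print(merged)
```

Without the `same_as` table the two records stay separate, which is the identity problem in miniature.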
Fact Management
• When “stamp collecting” we’re collecting facts
• Biology is a fact management activity
• Knowing what these facts mean is very important
• Science is performed on data and the semantics of data
enable us to do science
• Semantic e-Science
Summary
• The nature of modern biology gives it interesting
knowledge (fact) management issues
• It is a knowledge based discipline
• Not unique, but often extreme
• Ontologies seen as one component in management
(but not a panacea)
• E-Science gives infrastructure for management;
semantics enable analysis
• Actually, very light use of semantics
More Acknowledgements
• Phil Lord
• Simon Jupp
• Carole Goble

 

Recently uploaded (20)

Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 

The Changing Nature of Biomedical Research: Semantic e-Science

  • 1. The Changing Nature of Biomedical Research: Semantic e-Science Robert Stevens BioHealth Informatics Group University of Manchester Robert.Stevens@manchester.ac.uk
  • 2. Introduction • (Modern bio-molecular) Science • E-Science • Semantics and science • Semantic e-Science
  • 3. Ernest Rutherford “All science is either physics or stamp collecting” Image: http://en.wikipedia.org/wiki/File:Ernest_Rutherford2.jpg
  • 5. Laws in Biology Charles Darwin Image: http://en.wikipedia.org/wiki/File:Charles_Darwin_01.jpg On The Origin of Species - 1859
  • 7. Classic and Modern Biology Genotype Phenotype Modern biology Classic biology
  • 8. Speed of sequencing • First human genome – 10+ years to produce – Cost $500 million – Huge international effort • Now done in 10 weeks – (for $399) – http://tinyurl.com/genomecost – http://www.23andme.com
  • 9. 1000+ databases • according to Nucleic Acids Research
  • 10. PubMed: 2 papers per minute • ~700,000 individual papers • Grows at 2 papers per minute (see http://blogs.bbsrc.ac.uk for details)
  • 11. Biology now has lots of facts
  • 13. Creating Woods, not Trees Genes Proteins Pathways Interactions Literature Complex Machines Virtual Organism …. from biological facts, we make a system that is some model of a real organism
  • 14. Networks of Chemicals Image: http://genome-www.stanford.edu/rap_sir/images/Web_FigF_RAP1_glycolysis.gif
  • 15. Systems within Systems Image: http://www.ehponline.org/members/2007/10373/fig1.jpg
  • 16. Uniprot:- A protein database? [The slide shows a UniProt flat-file entry printed in Greek (Symbol-font) letters. Transliterated, it begins: ID PRIO_HUMAN STANDARD; PRT; 253 AA. AC P04156; DE Major prion protein precursor (PrP) (PrP27-30) (PrP33-35C) (ASCR). GN PRNP. OS Homo sapiens (Human). The reference (RN/RP/RX/RA/RT/RL), comment (CC), feature (FT), cross-reference (DR), keyword (KW) and sequence (SQ) lines follow in the same lettering.]
  • 17. Navigating the Web of Knowledge in Bioinformatics
  • 18. Bioinformatics Experiments are Data pipelines Resources/Services Investigate the evolutionary relationships between proteins Protein sequences Multiple sequence alignment Query [Peter Li] My data My tool
  • 19. Linking together data resources Hypo Science – the routine for the many Hyper Science – big projects, big science
  • 20. The In Silico Experiment • We can mine these data for possible hypotheses • “what are the genes that are involved in some disease phenotype?” • Correlate genes in QTL with differentially regulated genes in microarray via pathways; query the literature base with these genes, pathways and phenotype; … • Resulting facts form some hypothesis: A co-ordinated set of SNPs increase cholesterol biosynthesis in macrophage, while delaying apoptosis of these cells; increased super-oxide production aids tolerance to trypanosomiasis in cattle
  • 21. How bioinformatics was done: Integrating data sets • Slave labour • Collections of Scripts • Warehouses • Applications – Galaxy – Gaggle – Integr8 – Ensembl – ….. • Workflows! 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta
  • 22. Workflows: E. Science laboris • Data preparation and analysis pipelines. • Data preparation pipelines • Data integration pipelines • Data analysis pipelines • Data annotation pipelines • Warehouse population refreshing • Data and text mining • Knowledge extraction. • Parameter sweeps over simulations/computations • Model building and verification • Knowledge management and model population • Hypothesis generation and modelling
  • 23. • A workflow is a specification. • WFmS is the machinery for coordinating the execution of (scientific) services and linking together (scientific) resources. • Handles cross cutting concerns like: error handling, service invocation, data movement, data streaming, data provenance tracking, process auditing, execution monitoring, security access, blah blah….. • Agile software development Workflows: E. Science laboris Enactment Engine My data My tool
  • 24. Workflow Execution Engine Workflow execution engine Local desktop and remote server Implicit iteration over large data collections Nested workflows Automated data flow Event history log and data provenance tracking Within-workflow programming Extensibility points for plug-ins Graphical workbench For Professionals Plug-in architecture Incorporate new service without coding. Services as they are. Access to local and remote resources and analysis tools Re-Design Rewritten
  • 25. • Comparing resistant vs. susceptible strains – Microarrays • Mapping quantitative traits – Classical genetics QTL • Integrated Microarray data, genomic sequences, pathway data, literature mining. Trypanosomiasis Study Paul Fisher, et al Nucleic Acids Research, 2007, 35(16)
  • 26. Genotype to Pathway Created by Paul Fisher
  • 28. • Eliminated user bias and premature filtering • The scale and complexity of data and literature. • Systematic data analysis • Data analysis provenance • Manageable amount of output data for biologists to interpret and verify • Data driven science “Looking where others hadn’t” “make sense of this data” -> “does this make sense?” http://www.youtube.com/watch?v=Y6_Kz5L010g
  • 29. Transferring Characteristics Uncharacterised protein Tra1 La2 La3 High similarity transfer characteristics
  • 30. … A Fact Based Discipline • Rather than laws captured in mathematics…. • We have lots of facts: the discipline’s knowledge • Rather than “calculating” what a protein does, we investigate and write it down • Equivalent to writing down the trajectories of all thrown objects and not doing ballistics! • To do biology one needs “the knowledge”
  • 31. Heterogeneity • 28 ways to format the representations of a biological sequence • Though one way to represent the bases or amino acids… • Different words same concept • Different concepts same words • Different and implicit data schema
  • 32. An Identity Crisis • Database entries have identifiers unique within their database • The type of entity described in an entry doesn’t have an identifier • Different entries about the same type talk about it differently • How do we know when an entry in one DB talks about the same thing as another entry in another DB? • That’s the skill of a bioinformatician
  • 33. Categories and Category Labels GO:0000368 U2-type nuclear mRNA 5' splice site recognition spliceosomal E complex formation spliceosomal E complex biosynthesis spliceosomal CC complex formation U2-type nuclear mRNA 5'-splice site recognition
  • 34. The Role of Knowledge • A lot of facts • Perhaps organised into a system • No equivalent of “laws of mechanics” – we can’t do this biology with mathematics • Or at least not without knowing what the numbers mean... • This is why we’ve been using ontologies!
  • 35. Uses of Ontology in Bioinformatics
  • 36. Post-Genomic Biology • Fly, mouse, yeast, worm all have their own terminologies • I want to compare genomes • How? • The genomic sequence is easily dealt with computationally and comparisons are easy • This is not true of the annotations or knowledge of those sequences • Need a common understanding
  • 37. Annotation of Data • Big effort to create controlled vocabularies using ontologies • A huge annotation effort – describe the entities in DB with terms from ontologies • The Gene Ontology (http://www.geneontology.org) • The Open Biomedical Ontologies Consortium
  • 38.
  • 39. GO in Analysis • Microarray analysis one of the original visions for GO • Clustering of modulated genes cluster about functional attributes of their proteins • GO also used in, for example, semantic similarity; text analysis; etc.
  • 41. Shield users and applications from service interoperability and incompatibility plumbing. Turn your app into a service Service providers Not only web services How a bioinformatician assumes stuff should work
  • 42. Pettifer, University of Manchester inside A collection of interactive tools for analysing protein sequence and structure http://utopia.cs.manchester.ac.uk/
  • 43. Semantic Descriptions of All • Not just bio-entities in data • The laboratory experiments by which they were generated • The protocols for their analysis • The services for their analysis
  • 44. Semantic Integration • Same identifiers means integration and interoperation • Most workflow hobbled by syntactic and semantic heterogeneity • Syntactic integration (Bio2RDF) • Semantic integration via ontologies and naming schemes • Enables better e-Science through semantic science
  • 45. Fact Management • When “stamp collecting” we’re collecting facts • Biology is a fact management activity • Knowing what these facts mean is very important • Science is performed on data and the semantics of data enable us to do science • Semantic e-Science
  • 46. Summary • The nature of modern biology gives it interesting knowledge (fact) management issues • It is a knowledge based discipline • Not unique, but often extreme • Ontologies seen as one component in management (but not a panacea) • E-Science gives infra-structure for management; semantics enable analysis • Actually, very light use of semantics
  • 47.
  • 48. More Acknowledgements • Phil Lord • Simon Jupp • Carole Goble
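
Slides 32 and 33 make the point that one category can travel under several labels. A minimal Python sketch of that idea follows. It assumes the label text on slide 33 splits into the five labels listed below, and the names GO_LABELS, normalise, LABELS_TO_ID and same_category are hypothetical helpers, not part of any Gene Ontology tool.

```python
# Illustrative only: the labels are copied from slide 33 (their boundaries
# are an assumption); all helper names here are hypothetical.

GO_LABELS = {
    "GO:0000368": [
        "U2-type nuclear mRNA 5' splice site recognition",
        "spliceosomal E complex formation",
        "spliceosomal E complex biosynthesis",
        "spliceosomal CC complex formation",
        "U2-type nuclear mRNA 5'-splice site recognition",
    ],
}


def normalise(label: str) -> str:
    """Lower-case the label and treat hyphens as spaces so trivial variants match."""
    return " ".join(label.lower().replace("-", " ").split())


# Invert the table: every normalised label points at its category identifier.
LABELS_TO_ID = {
    normalise(label): go_id
    for go_id, labels in GO_LABELS.items()
    for label in labels
}


def same_category(label_a: str, label_b: str) -> bool:
    """True when both labels resolve to the same known identifier."""
    id_a = LABELS_TO_ID.get(normalise(label_a))
    id_b = LABELS_TO_ID.get(normalise(label_b))
    return id_a is not None and id_a == id_b
```

With this table, same_category("spliceosomal E complex formation", "U2-type nuclear mRNA 5'-splice site recognition") is true because both labels resolve to GO:0000368, whereas a label missing from the table never matches anything. Comparing identifiers rather than strings is the design choice the slides argue for.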

Editor's Notes

  1. Title Slide The Changing Nature of Biomedical Research: Semantic e-Science
  2. Introduction (Modern bio-molecular) Science E-Science Semantics and science Semantic e-Science
  3. Ernest Rutherford Slide All science is either physics or stamp collecting
  4. Mathematical Sciences Lists of formulae
  5. Laws in Biology Charles Darwin and Origin of Species
  6. Central Dogma
  7. Classic and Modern Biology Slide contains two semicircles labelled Genotype and Phenotype Text says: Classic Biology; Modern Biology
  8. Speed of Sequencing First human genome 10+ years to produce Cost $500 million Huge international effort Now done in 10 weeks (for $399) http://tinyurl.com/genomecost http://www.23andme.com
  9. 1000+ Databases according to Nucleic Acids Research - Contains a graph of database growth
  10. PubMed: 2 Papers per minute ~700,000 individual papers Grows at 2 papers per minute (see http://blogs.bbsrc.ac.uk for details)
  11. Literature Lots of books in a library
  12. Catalogues Stack of books listing: Genome Transcriptome Proteome Interactome Metabolome Phenome
  13. Creating Woods not trees Slide contains: Book on the left with a plus sign Black and white image, man sat at an old valve-style computer (i.e. manchester baby) Text saying: genes, proteins, interactions, pathways Mouse on the right Text below images says: (left) Literature (middle) complex machines (right) Organism (bottom) “…. from biological facts, we make a system that is some model of a real thing” - Robert Stevens – 2008
  14. Network of chemicals Shows a pathway of chemical interactions and compounds
  15. Systems within systems Shows lots of organs and tissues associated with a person (in centre)
  16. UniProt a database? Slide seems to contain a database entry with Greek characters on it
  17. Navigating the Web of Knowledge in Bioinformatics Shows lots of diagrams with numerous bits of bioinformatics on them.
  18. Data pipelines PL: In bioinformatics, we have services/resources which biologists use in their bioinformatics analyses. Services can be repositories, such as the EMBL database, which contains gene sequences, or analysis programs such as ClustalW and BLAST, an algorithm which measures the similarity between nucleotide or protein sequences. These services are often combined into an ‘in silico’ bioinformatics experiment such as the one shown here. The Swiss-Prot database and the ClustalW analysis service can be integrated into an in silico experiment to investigate the evolutionary relationships between proteins.
  19. ** NO TITLE FOR SLIDE Linking together data resources Hypo Science – the routine for the many Hyper Science – big projects, big science
  20. The in silico Experiment We can mine these data for possible hypotheses “what are the genes that are involved in some disease phenotype?” Correlate genes in QTL with differentially regulated genes in microarray via pathways; query the literature base with these genes, pathways and phenotype; … Resulting facts form some hypothesis: A co-ordinated set of SNPs increase cholesterol biosynthesis in macrophage, while delaying apoptosis of these cells; increased super-oxide production aids tolerance to trypanosomiasis in cattle
  21. Interoperating data services / integrating datasets Slide shows Hannah web page process (mish-mash) and written protocol Text says: Slave labour Collections of Scripts Warehouses Applications Galaxy Gaggle Integr8 Ensembl ….. Workflows!
  22. Workflows: E. Science laboris Slide shows Taverna workflow Text says: Data preparation and analysis pipelines. Data preparation pipelines Data integration pipelines Data analysis pipelines Data annotation pipelines Warehouse population refreshing Data and text mining Knowledge extraction. Parameter sweeps over simulations/computations Model building and verification Knowledge management and model population Hypothesis generation and modelling
  23. Workflows: E. Science laboris A workflow is a specification. WFmS is the machinery for coordinating the execution of (scientific) services and linking together (scientific) resources. Handles cross cutting concerns like: error handling, service invocation, data movement, data streaming, data provenance tracking, process auditing, execution monitoring, security access, blah blah….. Agile software development
  24. Taverna 2 Re-design Usability of professional workbench Seamless integration with myExp and BioCat Workflows in Production Taverna Lite Author Workbench. Workflow Player. Application Port. Results browser. Component maker. Workflow template maker. “myGrid-in-a-Box” Virtualised Taverna server deployment and distribution, bundle of myExperiment, BioCatalogue and database/tools components. Vertical Markets Taverna4Chemistry, Taverna4Plants, Taverna4Mouse “Taverna Inside” Platform, plug-in, integration Beanshell scripting and XML processing support inside the workflows Taverna 2: long running workflows, data reference handling, data streaming and staging, multiple extensibility points. Complete the Taverna 2 properties New data reference handling, security management, provenance management, asynchronous processor and data streaming, explicit monitoring and steering support, new dispatch layer better, supports dynamic service binding and service invocation through a resource broker, improved concurrency handling at the workflow level Taverna Remote Execution Service (T-REX) Running workflows on a server Running workflows inside other applications Taverna is for informatics people (bioinformaticians, cheminformaticians etc). We need other interfaces for uptake by laboratory scientists and health workers
  25. Trypanosomiasis Study Identified a pathway for which its correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance A form of Sleeping sickness in cattle – Known as n’gana Caused by Trypanosoma brucei Some cattle breeds more resistant than others What are the differences between resistant and susceptible cattle? Can we breed cattle resistant to n’gana infection References Paul Fisher NAR paper
  26. Genotype to Pathway QTL to Pathway workflow This workflow: Identifies all the genes, and their Ensembl ids, in a QTL region using BioMart Cross-references the gene ids to Entrez and Uniprot ids Entrez and Uniprot ids then map onto KEGG gene ids The KEGG gene ids are then used to identify KEGG pathways, including a description and an ID These lists of descriptions and IDs are then returned back to the user
  27. Pathway to Phenotype Pathways to PubMed abstracts workflow This workflow: Takes in a list of KEGG pathway descriptions Appends a search string to the end of each description Searches through PubMed using the NCBI eUtils Web Services For each article found in PubMed, as a PubMed id, an abstract is returned along with the date of publication These abstracts are then returned to the user as a single file Those abstracts, coupled with abstracts from the phenotype, provide evidence linking those pathways to the phenotype
  28. Looking where others hadn’t Includes link to youtube video Text says: Eliminated user bias and premature filtering The scale and complexity of data and literature. Systematic data analysis Data analysis provenance Manageable amount of output data for biologists to interpret and verify Data driven science “make sense of this data” -> “does this make sense?”
  29. Transferring Characteristics Lots of wiggly lines with protein names
  30. A fact based discipline Rather than laws captured in mathematics…. We have lots of facts: the discipline’s knowledge Rather than “calculating” what a protein does, we investigate and write it down Equivalent to writing down the trajectories of all thrown objects and not doing ballistics! To do biology one needs “the knowledge”
  31. Heterogeneity 28 ways to format the representations of a biological sequence Though one way to represent the bases or amino acids… Different words same concept Different concepts same words Different and implicit data schema
  32. An identity crisis Database entries have identifiers unique within their database The type of entity described in an entry doesn’t have an identifier Different entries about the same type talk about it differently How do we know when an entry in one DB talks about the same thing as another entry in another DB? That’s the skill of a bioinformatician
  33. Categories and Category Labels Shows a GO category and the various labels associated with it
  34. The role of knowledge A lot of facts Perhaps organised into a system No equivalent of “laws of mechanics” – we can’t do this biology with mathematics Or at least not without knowing what the numbers mean... This is why we’ve been using ontologies!
  35. Uses of Ontology in Bioinformatics Shows a spider diagram with “description” in centre and “knowledge acquisition” at top (one of nodes)
  36. Post-genomic biology Fly, mouse, yeast, worm all have their own terminologies I want to compare genomes How? The genomic sequence is easily dealt with computationally and comparisons are easy This is not true of the annotations or knowledge of those sequences Need a common understanding
  37. Annotation of data Big effort to create controlled vocabularies using ontologies A huge annotation effort – describe the entities in DB with terms from ontologies The Gene Ontology (http://www.geneontology.org) The Open Biomedical Ontologies Consortium
  38. ** NO SLIDE TITLE Lots of lines leading back to metabolism from acetylcholine biosynthesis at bottom Looks similar to tree diagram with lots of nodes, with “is a” in it, e.g. biosynthesis – is a - metabolism
  39. The Sequence Ontology
  40. GO in Analysis Microarray analysis one of the original visions for GO Clustering of modulated genes cluster about functional attributes of their proteins GO also used in, for example, semantic similarity; text analysis; etc.
  41. BioCatalogue screen shots
  42. Shield Users and applications Slide shows Taverna on top of arrows to services (Jun diagram).
  43. Utopia slide Utopia is a collection of interactive tools for analysing protein sequence and structure. Up front are user-friendly and responsive visualisation applications, behind the scenes a sophisticated model that allows these to work together and hides much of the tedious work of dealing with file formats and web services. Workflows under the hood e-Laboratories (portals) Systems Biology, e-Health Web based execution Running workflows over the web through myExperiment Visualisation clients that call workflows in the background
  44. Fact Management When “stamp collecting” we’re collecting facts Biology is a fact management activity Knowing what these facts mean is very important Science is performed on data and the semantics of data enable us to do science Semantic e-Science
  45. Summary The nature of modern biology gives it interesting knowledge (fact) management issues It is a knowledge based discipline Not unique, but often extreme Ontologies seen as one component in management (but not a panacea)
  46. Acknowledgments With all the people who work on myGrid, myExperiment, Taverna … etc etc.
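
The steps listed in note 26 (Genotype to Pathway) can be sketched as a chain of lookups. The Python below is a rough illustration, not the Taverna workflow itself. The dictionaries stand in for BioMart, the cross-referencing step and KEGG, every identifier in them is an invented placeholder, and the sketch cross-references through Entrez only, whereas the note uses both Entrez and UniProt ids. The provenance list is a simple stand-in for the provenance tracking mentioned in note 23.

```python
from typing import Dict, List, Tuple

# Toy lookup tables standing in for remote services.
# Every identifier below is an invented placeholder, not real data.
QTL_TO_ENSEMBL: Dict[str, List[str]] = {"qtl_1": ["ENS_A", "ENS_B", "ENS_C"]}
ENSEMBL_TO_ENTREZ: Dict[str, str] = {"ENS_A": "entrez_1", "ENS_B": "entrez_2"}
ENTREZ_TO_KEGG_GENE: Dict[str, str] = {
    "entrez_1": "kegg_gene_1",
    "entrez_2": "kegg_gene_2",
}
KEGG_GENE_TO_PATHWAYS: Dict[str, List[Tuple[str, str]]] = {
    "kegg_gene_1": [("path_1", "pathway one")],
    "kegg_gene_2": [("path_1", "pathway one"), ("path_2", "pathway two")],
}


def qtl_to_pathways(qtl_id: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (pathway id, description) pairs for a QTL plus a provenance log."""
    provenance: List[str] = []

    # Step 1: genes (Ensembl ids) inside the QTL region.
    genes = QTL_TO_ENSEMBL.get(qtl_id, [])
    provenance.append(f"{qtl_id}: {len(genes)} Ensembl genes")

    # Step 2: cross-reference to Entrez ids; genes without a mapping drop out.
    entrez = [ENSEMBL_TO_ENTREZ[g] for g in genes if g in ENSEMBL_TO_ENTREZ]
    provenance.append(f"{len(entrez)} genes mapped to Entrez")

    # Step 3: map Entrez ids onto KEGG gene ids.
    kegg_genes = [ENTREZ_TO_KEGG_GENE[e] for e in entrez if e in ENTREZ_TO_KEGG_GENE]
    provenance.append(f"{len(kegg_genes)} genes mapped to KEGG")

    # Step 4: collect the distinct pathways those KEGG genes take part in.
    pathways = sorted(
        {p for k in kegg_genes for p in KEGG_GENE_TO_PATHWAYS.get(k, [])}
    )
    provenance.append(f"{len(pathways)} distinct pathways")
    return pathways, provenance
```

Calling qtl_to_pathways("qtl_1") on the toy tables yields two pathways and a four-line log showing how many genes survived each mapping step. Keeping such a log beside the result is one simple way to make visible where genes drop out between databases.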