Biocuration 2013 - Fiona Brinkman - From genes, to genomes to networks, with community aided curation
1. From genes, to genomes to networks, with community aided curation
Fiona Brinkman
Simon Fraser University
Biocuration conference, April 2013
2. From genes, to genomes to networks, with community aided curation
with a little help from my friends …
3. My Primary Research Interest
Developing more sustainable approaches for infectious disease control
…using novel computational tools, integrated data and interdisciplinary approaches
Targeting major players resulting in infectious disease:
• Pathogen virulence
  - ID anti-infectives (don’t kill the pathogen, disarm them)
• Host immune system failure/over-activity
  - Immune modulators that dampen damaging inflammation and boost “good” immune response
• Changes in environment/social factors
  - Integrating pathogen genome data with environment, microbiome, and social network data
  - Better identify source/cause of disease outbreaks
4. Some of our lab's tools…
• Pathogen virulence
  - PSORTb – Protein localization analysis (ID cell surface/secreted drug targets)
  - IslandViewer – Genomic island analysis, pathogen-associated genes
  - Ortholuge DB – Precomputed assessments of bacterial orthologs
  - Genera-specific DBs like Pseudomonas Genome Database
• Host immune system failure/over-activity
  - InnateDB – Human/Mouse interactome + curated innate immunity-associated interactions
• Changes in environment/social factors
  - Metagenomics projects
  - Integrated Rapid Infectious Disease Analysis Pipeline (IRIDA)
6. Research Philosophy
High quality analyses are only as good as the robust data, effective data organization and accurate analysis methods used.
[Diagram: “The Nexus” of robust data, data organization, and accurate analysis methods]
Want high accuracy – usually erring on the side of high precision at the expense of recall.
To attain high accuracy, biocuration is often KEY
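The precision-versus-recall trade-off on this slide can be made concrete with a small sketch. The function name and all counts below are hypothetical, chosen only for illustration; they are not results from the talk.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) computed from raw counts."""
    predicted = true_pos + false_pos   # everything the annotator called
    actual = true_pos + false_neg      # everything that was really there
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical conservative annotator: few calls, almost all correct.
conservative = precision_recall(true_pos=45, false_pos=5, false_neg=55)
# Hypothetical permissive annotator: many calls, more of them wrong.
permissive = precision_recall(true_pos=90, false_pos=60, false_neg=10)

print("conservative (precision, recall):", conservative)
print("permissive   (precision, recall):", permissive)
```

With these made-up counts the conservative annotator has the higher precision and the lower recall, which is the direction of error the slide says is usually preferred.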
7. Overview
• Community-based to community-aided gene/genome annotation
  • 1997 – present: Pseudomonas Genome Project and PseudoCAP (Pseudomonas Community Annotation Project)
• Community-aided to multiple community-aided contextual curation of molecular interactions
  • 2006 – present: InnateDB project
• What we’re doing next…
• Funding it all!
8. Pseudomonas Community Annotation Project
Goals
- Critical and conservative genome annotation
- Minimize project costs
- Capitalize on large Pseudomonas aeruginosa research community
Solution
- Community-based, Internet-based approach for (continually updated) genome annotation
“Crowdsourcing” in the 90’s!
9. Pseudomonas Community Annotation Project
Initial PseudoCAP leading to genome publication (1997 – 2000)
- 61 researchers from 13 countries, 1741 annotations
- Focus on conservative annotation
- Need to capture researchers’ excellent, diverse biological knowledge, NOT their diverse ways of annotating!
10. Pseudomonas Community Annotation Project
Initial PseudoCAP leading to genome publication (1997 – 2000)
- Initial 1741 community-based annotations…
- Annotations incorporated by 3 annotators through web-based tool
- 1st fully internet-based community annotation effort
11. Pseudomonas Community Annotation Project
Current PseudoCAP – continually updated annotation (2000 – present)
- 151 researchers, 2356 curated gene annotations (not incl. computational analyses)
- Movement from gene-based to genes plus other genome features (2,590 other genome features added in the last year alone)
- Found we needed to further modify our community-based approach…
Winsor et al 2011 PMID: 20929876
Winsor et al 2005 PMID: 15608211
12. Pseudomonas Community Annotation Project
Current PseudoCAP – continually updated annotation (2000 – present)
- Annotations incorporated by one part-time project coordinator
- Subject to review process (peer reviewed paper or other peer review)
Increasing movement from Community-based to Community-aided
- Coordinator contacts researchers more to get input
- Capitalize on expertise most efficiently
- Coordinator ensures consistency
Coordinator and community collectively ensure quality
13. Pseudomonas Community Annotation Project
Challenges and Solutions
- Disputes between researchers regarding an annotation
  - Go with first published and have alternate annotations
- Researchers are busy!
  - Keep submission system/input process simple!
  - We now contact them more than they contact us
  - Have rounds of major annotation pushes
Future: Will try again the “paper carrot” for another annotation push – authorship on a NAR update paper (as a consortium) to encourage participation
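The “go with first published and have alternate annotations” rule can be sketched in a few lines. This is an illustration only: the `Annotation` class, its fields, and the example gene names and dates are hypothetical, and the actual PseudoCAP data model is not described in the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    locus: str        # hypothetical locus identifier
    name: str         # proposed gene name
    published: date   # publication date of the supporting paper


def resolve(annotations: list[Annotation]) -> tuple[Annotation, list[Annotation]]:
    """Pick the earliest-published annotation as primary; keep the rest as alternates."""
    if not annotations:
        raise ValueError("need at least one annotation")
    ordered = sorted(annotations, key=lambda a: a.published)
    return ordered[0], ordered[1:]


# Two hypothetical competing names for the same locus.
competing = [
    Annotation("locus_0001", "nameB", date(2004, 6, 1)),
    Annotation("locus_0001", "nameA", date(2001, 3, 15)),
]
primary, alternates = resolve(competing)
print("primary:", primary.name, "| alternates:", [a.name for a in alternates])
```

Keeping the later names as alternates rather than discarding them means no contributor's annotation is lost when a dispute is settled.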
15. InnateDB Developed to Aid Two Large International Systems Biology Projects
Mouse Model Datasets:
Cerebral Malaria mouse model (IMR, Australia)
Tuberculosis mouse model (AECM)
Shigella xenograft model (Pasteur)
Human Clinical Datasets:
Typhoid & Malaria Vietnam (OUCRU/Stanford/Sanger)
Non Typhoidal Salmonella Malawi (Sanger)
+ Chronic/Acute Helminth Ecuador (USF de Quito/Sanger)
Dengue (OUCRU)
Modulating innate immune response via
Host Defense peptides (Hancock lab, UBC)
Mouse KOs (Sanger)
Novel insight into host response and mechanism of peptides.
Common Pathways, networks and transcriptional regulation.
Thompson et al PNAS December 2009
16. Systems Biology & The Innate Immune Response:
Many layers of complexity.
Layers of regulation: transcriptional; post-transcriptional (miRNAs); post-translational (ubiquitination, phosphorylation)
Host-pathogen interactions
100s – 1000s DE genes
Not simple pathways - networks of molecular interactions
Gardy*, Lynn*, Brinkman, Hancock (2009). Enabling a systems biology approach to immunology: focus on innate immunity. Trends in Immunology PMID: 19428301
17. Breuer et al., 2013 InnateDB: systems biology of innate immunity and beyond… NAR (DB issue) PMID: 23180781
18. Manual Curation of Interaction Data From Literature to Database Greatly Enhances Coverage of Innate Immunity Interactome
[Figure: InnateDB curated interactome, distinguishing interactions only curated by InnateDB from interactions also curated by the top 5 other interaction databases (BIND, INTACT, DIP, BIOGRID & MINT)]
Lynn et al., Curating the Innate Immunity Interactome. BMC Systems Biology 2010 PMID: 20727158
19. Manual Curation of Interaction Data From Literature to Database – Enhancing coverage of Innate Immunity Interactome
The InnateDB curated interactome in July 2012. Red edges represent interactions that have been added in 2011 and 2012.
Breuer et al., 2013 InnateDB: systems biology of innate immunity and beyond… Nucleic Acids Research (Database issue)
20. Contextually Curating Innate Immunity-Relevant Interactions
Annotated fields include: molecule type; organism; biological role; interaction detection method; the host system (in vitro, in vivo, ex vivo); host organism; interaction type; cell, cell-line and tissue types; cell status (primary/cell line); experimental role; participant identification method and sub-cellular localization, plus a variety of additional curator comments.
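One way to picture a contextually curated record is as a typed structure whose restricted fields are checked against allowed values. The sketch below is hypothetical throughout: the class, field names, and example values are invented for illustration and are not InnateDB's schema. Only the host system values and the primary/cell line distinction come from the slide.

```python
from dataclasses import dataclass, field

HOST_SYSTEMS = {"in vitro", "in vivo", "ex vivo"}   # the three values named on the slide
CELL_STATUSES = {"primary", "cell line"}            # from "cell status (primary/cell line)"


@dataclass
class InteractionAnnotation:
    molecule_type: str
    organism: str
    biological_role: str
    detection_method: str
    host_system: str
    host_organism: str
    interaction_type: str
    cell_status: str
    comments: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return the names of restricted fields whose values are not allowed."""
        issues = []
        if self.host_system not in HOST_SYSTEMS:
            issues.append("host_system")
        if self.cell_status not in CELL_STATUSES:
            issues.append("cell_status")
        return issues


# Hypothetical record; every value here is made up for the example.
record = InteractionAnnotation(
    molecule_type="protein",
    organism="human",
    biological_role="unspecified role",
    detection_method="hypothetical assay",
    host_system="in vitro",
    host_organism="human",
    interaction_type="physical association",
    cell_status="cell line",
)
print(record.problems())
```

Restricting free-text fields to controlled values is what lets curators with different habits produce records that can be compared and searched consistently.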
21. Curating Innate Immunity-Relevant Interactions
71% human, 22% mouse, 7% human-mouse
~80% interactions in innate immunity interactome not annotated by other major databases
Protein (69%), DNA and RNA interactions
Developed InnateDB submission system software to allow submission of interaction annotation in an OBO ontology-controlled and MIMIx & PSI-MI 2.5 compliant manner.
Lynn et al., Curating the Innate Immunity Interactome. BMC Systems Biology 2010 PMID: 20727158
22. Which journals are curated?
>4,400 journal articles curated to date
Don’t focus on specific journals - relevant articles are curated if they meet appropriate quality standards for the interaction evidence.
Indeed, at least one protein has been curated from >200 different journals.
More than 70% of curated articles have come from 20 journals.
Note many journals in top 20 are not “immunology journals”, underscoring importance of not limiting curation efforts to journals perceived as “relevant”.
23. Curating Innate Immunity-Relevant Interactions – 4-pronged approach
Curation primarily pathway-centric: systematically review all literature describing interactions for a particular innate immunity pathway.
Curate all other interactors regardless of whether the interacting molecule is a member of the pathway or has any known role in innate immunity; this expands the network outside of known innate immunity players.
Systematically curated pathways are scheduled for frequent re-curation as the field is moving quickly.
Also, new publications on innate immunity are assessed on a daily basis to identify novel interactions of interest.
Priority is given to the most recent publications, which incorporates new information on the most current research.
Immunology Community-aided:
Curators consult with researchers to confirm unclear literature data
Most common issue: Unclear what species the protein/DNA/RNA interactors come from
Curation Community-aided:
InnateDB curators review each other’s curations as an error check
IMEx consortium!
http://www.innatedb.com/doc/InnateDB_2010_curation_guide.pdf
24. • InnateDB is a member of IMEx – an international consortium of interaction databases involved in curation
• Goal: Develop common standards, avoid too much redundancy in data collection/curation, central registry, single search interface
• Orchard et al Nature Methods 9:345-350 PMID: 22453911
• Stay tuned for Sandra Orchard's talk!
25. Going Beyond Innate Immunity – An Integrative Biology Resource
>196,000 human and mouse interactions extracted & loaded from BIND, INTACT, DIP, BIOGRID & MINT DBs
Cross-referenced genes to >3,000 pathways from KEGG, PID, BIOCARTA, INOH, NetPath & Reactome DBs
Visualize/analyze interactions associated with specific pathway
Pathway over-representation analysis
Ensembl annotation provides details of all human & mouse genes/transcripts/proteins. UniProt, Entrez, Gene Ontology, etc.: rich protein & gene annotation
Transcription factor–DNA interactions experimentally confirmed from Transfac, TransCompel
Robust orthology & gene synteny analysis facilitate human-mouse comparisons
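Pathway over-representation analysis asks whether a pathway holds more differentially expressed (DE) genes than chance would predict. A common way to score this is the upper tail of the hypergeometric distribution; the sketch below assumes that test, which the slide does not specify. The function name and all numbers are hypothetical.

```python
from math import comb


def ora_p_value(universe: int, pathway_size: int, de_genes: int, overlap: int) -> float:
    """Hypergeometric upper tail P(X >= overlap).

    universe:     number of genes considered in total
    pathway_size: number of those genes annotated to the pathway
    de_genes:     number of DE genes drawn from the universe
    overlap:      number of DE genes that fall in the pathway
    """
    upper = min(pathway_size, de_genes)
    total = comb(universe, de_genes)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, de_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, a 100-gene pathway, 500 DE genes, 10 in the pathway.
p = ora_p_value(universe=20000, pathway_size=100, de_genes=500, overlap=10)
print(f"p = {p:.3g}")
```

When many pathways are tested at once, the resulting p-values would normally also need a multiple-testing correction, which is left out of this sketch.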
27. InnateDB – Facilitating Systems-Level Analyses of Gene Expression Data
Upload Your Own Gene Expression Data - Up to 10 conditions/timepoints at 1 time.
Overlay Gene Expression Data from Multiple Conditions on Networks/Pathways
Pathway, Gene Ontology & TF ORA tools: Find – DE Pathways/Functionally Related Genes/TFs
Go Beyond Pathway Analysis – Differentially Expressed Sub-networks – New Pathways? How Are DE Genes Actually Inter-connected? Central Regulators (Network Hubs)
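The idea of finding central regulators in a differentially expressed sub-network can be sketched as a degree count over the interactions whose two partners are both DE. This is a minimal illustration using degree as the hub measure, which is an assumption; the gene labels and interactions are hypothetical.

```python
from collections import Counter


def hub_ranking(interactions, de_genes):
    """Rank DE genes by degree within the sub-network induced by the DE genes."""
    de = set(de_genes)
    degree = Counter()
    seen = set()
    for a, b in interactions:
        edge = frozenset((a, b))
        # Skip self-loops, duplicate edges, and edges leaving the DE set.
        if a == b or edge in seen or a not in de or b not in de:
            continue
        seen.add(edge)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical interactions; G5 is not DE, and ("G2", "G1") repeats an edge.
interactions = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G4", "G5"), ("G2", "G1"),
]
de_genes = ["G1", "G2", "G3", "G4"]
ranking = hub_ranking(interactions, de_genes)
print(ranking)
```

In this toy network G1 has the most DE partners, so it ranks first as the candidate hub.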
28. InnateDB and curated data aided study of an immune modulator –
host-directed adjunctive therapy coupled with anti-malarial
29. What we’re doing next…
- Need to develop more ontologies and data standards to integrate microbial genomic data from a disease outbreak with epidemiological data.
- Curating pathogen status for complete microbial genomes
- Will try the “paper carrot” again for next Pseudomonas Genome Database curation project
- InnateDB – expanding to Allergy and Asthma
30. Identify genes unique to/shared between strains, species, genera, any selected bacteria….
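Once genes from different genomes have been mapped to common ortholog groups, finding shared and unique genes reduces to set operations. The sketch below assumes that mapping already exists; the strain names and group identifiers are hypothetical.

```python
def shared_and_unique(genomes: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Return (groups present in every genome, groups found only in each genome)."""
    if not genomes:
        return set(), {}
    shared = set.intersection(*genomes.values())
    unique = {}
    for name, genes in genomes.items():
        # Union of every other genome's groups; empty if this is the only genome.
        others = set().union(*(g for n, g in genomes.items() if n != name))
        unique[name] = genes - others
    return shared, unique


# Hypothetical ortholog-group content of three strains.
genomes = {
    "strainA": {"og1", "og2", "og3"},
    "strainB": {"og1", "og2", "og4"},
    "strainC": {"og1", "og5"},
}
shared, unique = shared_and_unique(genomes)
print("shared:", shared)
print("unique:", unique)
```

The same function works at any level of grouping (strains, species, genera) as long as each entry in the dictionary is the gene set of one selected group.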
31. Funding! Grants!
One of the biggest challenges is to secure long term, reliable funding.
We've found:
- Need to target curation to specific bio projects (i.e., innate immunity, then to allergy and asthma; aiding a specific Pseudomonas analysis)
- Limits what we can do, but good in the sense that curation benefits are more quickly felt as they are needed/used by others
32. Concluding comments
- Using community-aided, expert curator-centered, approach for balancing consistency, reliability and maximizing knowledge. Degree of community involvement depends on nature of data.
- Capitalize on both bio community and curation community – keep linked
- Researchers are busy! Make it super easy for them to provide input. A little contribution can go a long way
- Paper carrots!
- Link curation to bio research to secure funding
- Indoctrinate young minds! Get biocuration and its challenges into undergrad curriculums
33. Acknowledgements - InnateDB
www.innatedb.com
InnateDB Principal Investigators:
Fiona Brinkman (SFU)
Bob Hancock (UBC)
David Lynn (Teagasc)
InnateDB Curation:
Raymond Lo
Anastasia Sribnaia
Carol Chan
Misbah Naseer
Melissa Yau
Giselle Ring
Kathleen Wee
Jaimmie Que
InnateDB Development:
Karin Breuer
Geoff Winsor
Matthew Laird
Calvin Chan
Amir Foroushani
Brian Meredith
Cerebral network visualizer:
Nathan Lawless
Nicolas Richard
Avinash Chikatamarla
Aaron Barsky
Jennifer Gardy
Fiona Roche
Tamara Munzner
Timothy Chan
Naisha Shah
Michael Acab
FNIH/GCGH Collaborators:
Gordon Dougan (Sanger)
Fernanda Schreiber (Sanger)
Melita Gordon (U. Liverpool)
Bill Jacobs (AECM)
Dee Dao (AECM)
Philip Cooper (St. Georges)
Louis Schofield (WEHI)
Sandra Pilat (WEHI)
Sarah Dunstan (OUCRU)
Brett Finlay (UBC)
34. Acknowledgements – PseudoCAP
Geoff Winsor
Ray Lo
Matt Laird
Bhav Dhillon
Matthew Whiteside
151 PseudoCAP participants
www.pseudomonas.com