Precise elucidation of the many different biological features encoded in a genome requires a careful curation process that involves reviewing all available evidence to allow researchers to resolve discrepancies and validate automated gene models, protein alignments, and other biological elements. Genome annotation is an inherently collaborative task; researchers only rarely work in isolation, turning to colleagues for second opinions and insights from those with expertise in particular domains and gene families.
The i5K initiative seeks to sequence the genomes of 5,000 insect and related arthropod species. The selected species include those important to worldwide agriculture, food safety, medicine, and energy production; many used as models in biology; those most abundant in world ecosystems; and representatives of every branch of the insect phylogeny, all chosen in an effort to better understand arthropod evolution and phylogeny. Because computational genome analysis remains an imperfect art, each of these newly sequenced genomes will require visualization and curation.
Apollo is an instantaneous, collaborative genome annotation editor. The new JavaScript-based version gives researchers real-time interactivity, breaking large amounts of data into manageable portions to mobilize groups of researchers with shared interests. The i5K is a broad and inclusive effort that seeks to involve scientists from around the world in its genome curation process, and Apollo serves as the platform that empowers this community. Here we offer details about this collaboration.
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
1. APOLLO + i5K
Collaborative Curation and Interactive Analysis of Genomes
Monica Munoz-Torres, PhD | @monimunozto
Nathan Dunn, Monica Poelchau, Ian Holmes, Colin Diesh, Deepak Unni, Christine Elsik, and Suzanna Lewis.
Berkeley Bioinformatics Open-Source Projects (BBOP)
Genomics Division, Lawrence Berkeley National Laboratory
XXIII Plant and Animal Genome Conference. San Diego, CA. January 14, 2015
2. OUTLINE
• CURATING GENOMES: steps involved
• MANUAL ANNOTATION: is necessary, but does not always scale
• WEB APOLLO: empowering curators
• i5K: pursuing common goals
3. CURATING GENOMES
steps involved
1. Creation of gene models: calling ORFs, one or more rounds of gene prediction, etc.
2. Annotation of gene models: describing function, expression patterns, and metabolic network memberships.
3. Manual annotation
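The first step listed above, calling ORFs, can be illustrated with a minimal Python sketch. This is not the i5K or Apollo pipeline: `find_orfs` and `min_codons` are hypothetical names, and the sketch assumes a plain DNA string, scans only the three forward reading frames, and treats ATG as the only start codon.

```python
# Hypothetical minimal ORF caller, for illustration only.
# Scans the three forward frames; real pipelines also scan the reverse
# strand and consider alternative start codons.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) tuples: 0-based, end exclusive, stop codon included.

    `min_codons` counts the start and stop codons as part of the ORF length.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step codon by codon; stop before a trailing partial codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAACCCTAGGG"))  # [(2, 2, 14)]
```

Gene predictors used in practice go far beyond this, but the sketch shows why ORF calls alone are only a starting point for curation.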
4. AUTOMATED ANNOTATION
remains an imperfect art
Unlike the more highly polished genomes of earlier projects, today's genomes have:
a. lower coverage;
b. more frequent assembly errors, with genes annotated across multiple scaffolds;
c. automated genome annotations that must be curated to resolve discrepancies, providing clarity and validation.
Image: www.BroadInstitute.org
5. ACCURACY OF ANNOTATION
… it depends
EXAMPLE
• Eight methods for differential alternative splicing detection in plants, using RNAseq.
• Conclusion: NO single method performs the best in all situations.
“The accuracy of annotation has a major impact on which method should be chosen for analysis.”
Liu et al. BMC Bioinformatics 2014, 15:364
6. MANUAL ANNOTATION
objectives
1. Identifies elements that best represent the underlying biology (including missing genes) and eliminates elements that reflect systematic errors of automated analyses.
2. Assigns function through comparative analysis of similar genome elements from closely related species, using literature, databases, and researchers' lab data.
http://GeneOntology.org
7. BUT, MANUAL CURATION
does not always scale
1. Museum: a small group of highly trained experts; e.g., GO.
2. Jamboree: a few very good biologists and a few very good bioinformaticians camp together during intense but short periods of time.
3. Cottage: researchers work by themselves, then may or may not publicize results; this may be a dead end, with very few people ever aware of these results.
Elsik et al. 2006. Genome Res. 16(11):1329-33.
Too many sequences and not enough hands to approach curation.
8. POWER TO THE CURATORS
augment existing tools
Fill in the gap for everything that won't be easy to cover with these approaches, and allow researchers to better contribute their efforts.
Give more people the power to curate!
“Big data are not a substitute for, but a supplement to, traditional data collection and analysis.”
The Parable of Google Flu. Lazer et al. 2014. Science 343(6176): 1203-1205.
• Enable more curators to work
• Enable better scientific publishing
• Credit curators for their work
9. GENOME ANNOTATION
an inherently collaborative task
Researchers often turn to colleagues for second opinions and insight from those with expertise in particular areas (e.g., domains, families). To facilitate and encourage this, we continue to improve Apollo.
The new JavaScript-based Apollo:
• Web-based for easy access.
• Concurrent access supports real-time collaboration.
• Built-in support for standards (transparently compliant).
• Automatic generation of ready-made computable data.
• Client-side application relieves server bottleneck and supports privacy.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
10. COLLABORATIONS
also crowdsourcing development
• New avenues for landing on Apollo and customization of additional applications.
• Web services for alignment and functional annotation tools.
• RNAseq datasets are being used to re-annotate the bovine genome, finding genes that neither RefSeq nor Ensembl predicted; we are also creating a track of disagreement between the sets.
• The bovine genome consortium is making previous iterations of manual annotation efforts (from 3 assemblies ago) available for integration of curated models.
University of Missouri
National Agricultural Library
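A track of disagreement between two gene sets, as in the bovine work above, can be sketched as the models in each set that overlap nothing in the other. This is a hypothetical illustration, not the bovine re-annotation code: `GeneModel`, `overlaps`, `unmatched`, and `disagreement_track` are invented names, coordinates are assumed 0-based and half-open, strand is ignored, and the pairwise scan is quadratic for clarity.

```python
from typing import NamedTuple


class GeneModel(NamedTuple):
    """Hypothetical minimal gene model; coordinates are 0-based, half-open."""
    model_id: str
    seqid: str  # scaffold or chromosome name
    start: int
    end: int


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    # Same sequence and the intervals intersect (strand ignored in this sketch).
    return a.seqid == b.seqid and a.start < b.end and b.start < a.end


def unmatched(models: list[GeneModel], others: list[GeneModel]) -> list[GeneModel]:
    """Models that overlap nothing in `others` (pairwise scan, for clarity)."""
    return [m for m in models if not any(overlaps(m, o) for o in others)]


def disagreement_track(set_a: list[GeneModel], set_b: list[GeneModel]) -> list[GeneModel]:
    """Models predicted by only one of the two sets."""
    return unmatched(set_a, set_b) + unmatched(set_b, set_a)


if __name__ == "__main__":
    refseq = [GeneModel("r1", "chr1", 100, 500), GeneModel("r2", "chr1", 900, 1200)]
    ensembl = [GeneModel("e1", "chr1", 450, 800)]
    rnaseq = [GeneModel("t1", "chr1", 2000, 2500), GeneModel("t2", "chr1", 120, 300)]
    print([g.model_id for g in disagreement_track(refseq, ensembl)])  # ['r2']
    print([g.model_id for g in unmatched(rnaseq, refseq + ensembl)])  # ['t1']
```

The second call shows how the same helper could flag RNAseq-supported models that neither reference set predicted; at genome scale an interval index would replace the pairwise scan.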
11. i5K
5,000 insects and related Arthropod species
• Species are selected in an effort to better understand arthropod evolution and phylogeny. They include:
  - species important to worldwide agriculture, food safety, medicine, and energy production
  - models in biology
  - the species most abundant in world ecosystems
  - representatives of every branch of the insect phylogeny
• Each new genome requires visualization and curation!
http://arthropodgenomes.org/wiki/i5K
12. i5K
who can join?
• All arthropods are welcome!
• Pilot project: 39 species
  - 3 with completed manual annotation
  - 25 undergoing manual annotation
• We offer a platform for collaborative genome analysis.
• We do not offer funding for sequencing projects.
Wasmannia auropunctata
Phlebotomus papatasi
13. i5K
current workflow: pilot project
Workflow diagram:
Sequencing, assembly, & annotation → Research plan → Select genes of interest → Calling all collaborators → Manual annotation → Merge automated & manual annotations → Update gene set for computational analysis → Publication
• Set time frame • Training • Q&A
• Gatekeeping • More curation
Stages: Collaborative, Computational
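The merge step in this workflow could, under simple assumptions, keep every manually curated model and drop the automated predictions it replaces. The sketch below assumes that a manual model replaces any automated model overlapping it on the same scaffold and strand; this rule and the names `GeneModel`, `overlaps`, and `merge_gene_sets` are hypothetical, not the i5K procedure, and real gatekeeping involves further checks.

```python
from typing import NamedTuple


class GeneModel(NamedTuple):
    """Hypothetical minimal gene model; coordinates are 0-based, half-open."""
    model_id: str
    seqid: str   # scaffold name
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    # Same scaffold, same strand, and the intervals intersect.
    return (a.seqid == b.seqid and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def merge_gene_sets(automated: list[GeneModel],
                    manual: list[GeneModel]) -> list[GeneModel]:
    """Keep all manual models plus automated models that no manual model overlaps."""
    kept = [a for a in automated if not any(overlaps(a, m) for m in manual)]
    # Sort by position so the merged set comes out in genome order.
    return sorted(kept + manual, key=lambda g: (g.seqid, g.start, g.end))


if __name__ == "__main__":
    automated = [
        GeneModel("auto1", "scf1", 0, 1000, "+"),
        GeneModel("auto2", "scf1", 2000, 3000, "+"),
        GeneModel("auto3", "scf1", 2500, 2800, "-"),
    ]
    manual = [GeneModel("man1", "scf1", 1800, 3100, "+")]
    print([g.model_id for g in merge_gene_sets(automated, manual)])
    # ['auto1', 'man1', 'auto3']
```

Here `auto2` is replaced by the curated `man1`, while `auto3` survives because it lies on the opposite strand; whether that is the desired behavior is exactly the kind of decision a coordinator makes during gatekeeping.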
14. i5K
tools at workspace@NAL
• Web Apollo
  - Registration module
  - Differential user permissions
• Django BLAST
  - Queries multiple species at once
  - Links directly to Apollo
• Species pages & Gene pages
  - Project details, metrics, statistics
  - Widget to track all Web Apollo (WA) annotations
Tripal, Chado, JBrowse, Apollo
15. i5K
what we have learned
• Enabling collaboration has been very useful to communities.
• Data hosting and administration at NAL facilitates the process for many groups.
• You must enforce strict rules and formats.
• Metadata capture is a must; standards must be generated and enforced.
• Users prefer small bits of help information at a time instead of lengthy manuals.
• The ideal assembly is of high quality and remains stable.
• Investing time and effort in a high-quality set of automated gene predictions will pay off.
• The quality of the manually annotated set will depend on the coordinator's “whip”.
16. i5K
how to join
• Visit http://arthropodgenomes.org/wiki/i5K to sign up.
• Contact us! Please tell us about your research interests and comment on the status and quality of sequencing / assembly / automated annotation for your genome of interest.
  @monimunozto | mcmunozt@lbl.gov
• Check out the i5K Workspace@NAL at https://i5k.nal.usda.gov/
17. FUTURE PLANS
educational tools
We are working with educators to make Web Apollo part of their curricula.
Lecture series. In the classroom. At the lab.
Classroom exercises: from genome sequence to hypothesis.
Curation group dedicated to producing educational materials for non-model organism communities.
Our team provides online documentation, hands-on training, and rapid response to users.
18. ALL ARE WELCOME
call or email to join the Apollo community
Open Call for Developers on the first Thursday of each month at 9:00 AM (Pacific Time). Message @monimunozto for details.
Join the conversation by submitting your email at https://lists.lbl.gov/sympa/subscribe/apollo
http://GenomeArchitect.org
http://ArthropodGenomes.org/wiki/i5K
19. THANK YOU
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (CSIRO), Monica Poelchau, Christopher Childers (USDA/NAL), fringy Richards, Dan Hughes, Kim Worley (HGSC-BCM), BGI, Oliver Niehuis (1KITE http://www.1kite.org/), and the Honey Bee Genome Sequencing Consortium.
• Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission: http://AlexanderWild.com
• For your attention, thank you!
BBOP
  Web Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
  Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
Web Apollo: http://GenomeArchitect.org
i5K: http://arthropodgenomes.org/wiki/i5K
GO: http://GeneOntology.org
Thanks!
NAL at USDA: Monica Poelchau, Christopher Childers, NAL team
HGSC at BCM: fringy Richards, Dan Hughes, Kim Worley