Talk at the 8th International Biocuration Conference. Beijing, China. April 23-26, 2015.
Obtaining meaningful results from genome analyses requires high-quality annotations of all genomic elements. Today's sequencing projects face challenges such as lower coverage, more frequent assembly errors, and the lack of closely related species with well-annotated genomes. Apollo is a web-based application that supports collaborative genome curation in real time, analogous to Google Docs, allowing curators to improve on existing automated gene models through an intuitive interface. Apollo's extensible architecture is built on top of JBrowse; its components are a web-based client, an annotation-editing engine, and a server-side data service. It allows users to visualize automated gene models, protein alignments, expression and variant data, and to conduct structural and/or functional annotations.
Apollo is actively used within a variety of projects, including the initiative to sequence the genomes of 5,000 Arthropod species (i5K), and will become essential to the thousands of genomes now being sequenced and analyzed. Researchers from nearly 100 institutions worldwide are currently using Apollo in distributed curation efforts for over sixty genome projects across the tree of life: from plants to echinoderms, fungi, fish, and other vertebrates including human, cattle, and dog. We are training the next generation of researchers by reaching out to educators to make these tools available as part of curricula, by offering workshops and webinars to the scientific community, and by working through widely used systems such as iPlant and DNA Subway. We are currently integrating Apollo into an annotation environment that combines gene structural and functional annotation with transcriptomic, proteomic, and phenotypic annotation. In this presentation we will describe in detail its utility to users, introduce the architecture to developers interested in expanding on this open-source project, and offer details of our future plans.
Authors:
Monica Munoz-Torres(1), Nathan Dunn(1), Colin Diesh(2), Deepak Unni(2), Seth Carbon(1), Heiko Dietze(1), Christopher Mungall(1), Nicole Washington(1), Ian Holmes(3), Christine Elsik(2), and Suzanna E. Lewis(1)
(1) Lawrence Berkeley National Laboratory, Genomics Division, Berkeley, CA
(2) Divisions of Animal and Plant Sciences, University of Missouri, Columbia, MO
(3) University of California Berkeley, Bioengineering, Berkeley, CA
Apollo: Scalable & collaborative curation of genomes - Biocuration 2015
1. APOLLO: Scalable and collaborative genome curation
Monica Munoz-Torres, PhD | @monimunozto
Nathan Dunn, Colin Diesh*, Deepak Unni*, Seth Carbon, Heiko Dietze, Christopher
Mungall, Nicole Washington, Ian Holmes*, Christine Elsik*, and Suzanna E. Lewis
Berkeley Bioinformatics Open-Source Projects
Genomics Division, Lawrence Berkeley National Laboratory
8th International Biocuration Conference. Beijing, China. 24 April, 2015
2. OUTLINE
• LAST TIME: where we left off last year
• IMPROVEMENTS: architecture, scalability, features
• COLLABORATIONS: JBrowse & GenSAS
• FUTURE PLANS: what lies on the horizon
3. APOLLO
genome annotation editing tool
• Web-based, integrated with JBrowse.
• Supports real-time collaboration!
• Automatic generation of ready-made computable data.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
• Intuitive annotation, gestures, and pull-down menus to create and edit transcripts and exon structures, insert comments (CV, freeform text), GO terms, etc.
5. LESSONS WE HAVE LEARNED
What we have learned:
• Collaborative work distills invaluable knowledge
• We must enforce strict rules and formats
• We must evolve with the data
• A little training goes a long way
• NGS poses additional challenges
6. HIGHLIGHTED IMPROVEMENTS
scalability
• Easier deployment, more detailed documentation
• Supports multiple organisms per server, improved comparative tools
• Easier to query the data and build extensions
• More flexible user interface via removable side-dock with customizable tabs; better search functionality, validation checks, and editing capability
• Allows larger set of sequence annotations based on the Sequence Ontology
• Offers fine-grained user and group level permissions
7. NEW APOLLO ARCHITECTURE
simpler, more flexible
Web-based client + annotation-editing engine + server-side data service
[Architecture diagram, Apollo v2.0. Labels:]
• Annotators (client): Google Web Toolkit (GWT) / Bootstrap; JBrowse (Dojo / jQuery)
• Client-server communication: REST / JSON, WebSockets
• Annotation Engine (Server): Shiro, LDAP, OAuth
• Single Data Store (PostgreSQL, MySQL, MongoDB, ElasticSearch): Annotations, Security, Preferences, Organisms, Tracks
• JBrowse Data (Organism 1, Organism 2): BAM, BED, VCF, GFF3, BigWig
• Load genomic evidence for selected organism
8. NEW APOLLO ARCHITECTURE
simpler, more flexible
[Same architecture diagram as slide 7 (Apollo v2.0), with a new callout:]
NEW! Grails controllers (J2EE servlet) route requests to the appropriate JBrowse data directory for a given organism.
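The routing idea on this slide can be illustrated with a small sketch. This is not Apollo's Grails code: it is a language-neutral illustration written in Python, and the registry contents, directory paths, and the function name `resolve_track_path` are all hypothetical. It only shows the general pattern of mapping an organism to its own JBrowse data directory and resolving a requested file inside it.

```python
from pathlib import PurePosixPath

# Hypothetical registry: one JBrowse data directory per organism.
# In a real deployment this mapping would live in the datastore.
ORGANISM_DATA_DIRS = {
    "Apis mellifera": "/data/jbrowse/apis_mellifera",
    "Bos taurus": "/data/jbrowse/bos_taurus",
}


def resolve_track_path(organism, relative_path, registry=ORGANISM_DATA_DIRS):
    """Return the file path for a request, scoped to the organism's directory.

    Raises KeyError for an unknown organism and ValueError for a path
    that tries to leave the organism's data directory.
    """
    base = PurePosixPath(registry[organism])
    candidate = PurePosixPath(relative_path)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"unsafe path: {relative_path}")
    return str(base / candidate)


if __name__ == "__main__":
    print(resolve_track_path("Apis mellifera", "tracks/genes/trackData.json"))
```

Keeping the lookup in one place is what lets a single server hold several organisms: the client names the organism, and the server decides which data directory serves the request.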
9. NEW APOLLO ARCHITECTURE
simpler, more flexible
[Same architecture diagram as slide 7 (Apollo v2.0), with a new callout:]
NEW! A single, queryable datastore houses annotations.
Single Data Store: PostgreSQL, MySQL, MongoDB, ElasticSearch
10. HIGHLIGHTED IMPROVEMENTS
scalability
• Improvements to architecture: easier deployment, better documentation
• Supports multiple organisms per server, improved comparative tools
• Easier to query the data and build extensions
• More flexible user interface via removable side-dock with customizable tabs; better search functionality, validation checks, and editing capability
• Allows larger set of sequence annotations based on the Sequence Ontology
• Offers fine-grained user and group level permissions
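As a rough illustration of what user and group level permissions per organism could look like, here is a minimal sketch. The permission levels, the data layout, and the function names are assumptions made for this example; the slides do not describe Apollo's actual permission model.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Hypothetical, ordered permission levels (not taken from the slides)."""
    NONE = 0
    READ = 1
    WRITE = 2
    ADMIN = 3


def effective_permission(user, organism, user_grants, group_grants, memberships):
    """Highest level granted to the user directly or via any of their groups."""
    levels = [user_grants.get((user, organism), Permission.NONE)]
    for group in memberships.get(user, ()):
        levels.append(group_grants.get((group, organism), Permission.NONE))
    return max(levels)


def is_allowed(user, organism, needed, user_grants, group_grants, memberships):
    """True if the user's effective level meets the level an action needs."""
    level = effective_permission(user, organism, user_grants, group_grants, memberships)
    return level >= needed


if __name__ == "__main__":
    user_grants = {("ana", "Apis mellifera"): Permission.READ}
    group_grants = {("curators", "Apis mellifera"): Permission.WRITE}
    memberships = {"ana": ["curators"]}
    print(is_allowed("ana", "Apis mellifera", Permission.WRITE,
                     user_grants, group_grants, memberships))
```

Grants are keyed by organism, so the same person can be a curator on one genome and a read-only viewer on another hosted by the same server.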
11. HIGHLIGHTED IMPROVEMENTS
removable side dock with customizable tabs
[Screenshot of the side dock. Tabs shown: Tracks, Organism, Users, Groups, Preferences, Annotations, Reference Sequence]
13. HIGHLIGHTED IMPROVEMENTS
visible in the Apollo window
Automatically calculates upstream and downstream acceptor and donor sites.
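The slide does not say how Apollo computes these sites. As a rough illustration only, the sketch below scans a forward-strand sequence for the canonical GT (donor) and AG (acceptor) dinucleotides and reports the nearest occurrence on each side of a position. The function name, the 0-based coordinates, and the restriction to canonical motifs on one strand are all assumptions of this example.

```python
def nearest_splice_sites(seq, pos, donor="GT", acceptor="AG"):
    """Find the nearest candidate splice motifs around a 0-based position.

    Returns, for each motif, the start index of the closest occurrence
    strictly upstream (index < pos) and strictly downstream (index > pos),
    or None when there is no occurrence on that side.
    """
    seq = seq.upper()

    def motif_starts(motif):
        last_start = len(seq) - len(motif)
        return [i for i in range(last_start + 1) if seq.startswith(motif, i)]

    def nearest(starts):
        upstream = max((i for i in starts if i < pos), default=None)
        downstream = min((i for i in starts if i > pos), default=None)
        return {"upstream": upstream, "downstream": downstream}

    return {
        "donor": nearest(motif_starts(donor)),
        "acceptor": nearest(motif_starts(acceptor)),
    }


if __name__ == "__main__":
    print(nearest_splice_sites("AAGTCCAGTTGTAAGC", 8))
```

An editor can use positions like these to let a curator snap an exon boundary to the next plausible splice site instead of typing coordinates by hand.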
16. COLLABORATIONS
Apollo is open-source and extensible
Examples:
• GenSAS, the Genome Sequence Annotation Server: whole-genome structural annotation pipeline.
• i5K Workspace@NAL: space to display and share genome assemblies & gene models, and conduct manual annotation efforts.
Apollo users can add software to support their own workflow.
18. JOIN US
http://GenomeArchitect.org/
Please bring your suggestions, requests, and contributions to:
• Nathan Dunn, Apollo Technical Lead
Special Thanks to:
• Stephen Ficklin: GenSAS, Washington State University
• Deepak Unni, Colin Diesh: Apollo Developers, University of Missouri
• Suzi Lewis: Principal Investigator, BBOP
• Eric Yao: JBrowse, UC Berkeley
19. Thank you.
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), BGI, Oliver Niehuis (1KITE http://www.1kite.org/), and the Honey Bee Genome Sequencing Consortium.
• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI; by Contract No. 60-8260-4-005 from the National Agricultural Library (NAL) at the United States Department of Agriculture (USDA); and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission: http://AlexanderWild.com
• For your attention, thank you!
Thanks!
• Web Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
• Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
• NAL at USDA: Monica Poelchau, Christopher Childers, Gary Moore
• HGSC at BCM: fringy Richards, Dan Hughes, Kim Worley
• JBrowse: Eric Yao *
BBOP links:
• Web Apollo: http://GenomeArchitect.org
• i5K: http://arthropodgenomes.org/wiki/i5K
• GO: http://GeneOntology.org