An introduction to Web Apollo.
A webinar for the i5K Pilot Species Projects - Hemiptera
Monica Munoz-Torres, PhD
Introduction to Web Apollo for the i5K Pilot Species Project. Web Apollo is a genome annotation editor; it provides a web-based environment that allows multiple distributed users to review, edit, and share manual annotations. This presentation includes information specific to the projects of i5K, the global initiative to sequence the genomes of 5,000 species of arthropods. Let's get started!

Published in: Science, Technology
    1. An introduction to Web Apollo. A webinar for the i5K Pilot Species Projects - Hemiptera. Monica Munoz-Torres, PhD. Biocurator & Bioinformatics Analyst | @monimunozto. Genomics Division, Lawrence Berkeley National Laboratory. 12+1 May, 2014. University of California.
    2. Outline. 1. What is Web Apollo? Definition & working concept. 2. Community-based curation from our experience: lessons learned. 3. Manual annotation at i5K: how do we get there? 4. Becoming acquainted with Web Apollo.
    3. What is Web Apollo? • Web Apollo is a web-based, collaborative genomic annotation editing platform. We need annotation editing tools to modify and refine the precise location and structure of the genome elements that predictive algorithms cannot yet resolve automatically. Find out more about Web Apollo at http://GenomeArchitect.org and in Genome Biol 14:R93 (2013).
    4. Brief history of Apollo: a. Desktop: one person at a time edited a specific region, and annotations were saved in local files, which slowed down collaboration. b. Java Web Start: users saved annotations directly to a centralized database; potential issues with stale annotation data remained. Biologists could finally visualize computational analyses and experimental evidence from genomic features and build manually curated consensus gene structures. Apollo became a very popular open-source tool (insects, fish, mammals, birds, etc.).
    5. Web Apollo. • Browser-based; plugin for JBrowse. • Allows for intuitive annotation creation and editing, with gestures and pull-down menus to create transcripts, add/delete/resize exons, merge/split exons or transcripts, insert comments (CV, freeform text), etc. • Customizable rules and appearance. • Edits in one client are instantly pushed to all other clients: collaborative!
    6. Working Concept. In the context of manual gene annotation, curation tries to find the best examples and/or eliminate (most) errors. To conduct manual annotation efforts: gather and evaluate all available evidence using quality-control metrics to corroborate or modify automated annotation predictions. Perform sequence similarity searches (phylogenetic framework) and use literature and public databases to: • Predict functional assignments from experimental data. • Distinguish orthologs from paralogs, and classify gene membership in families and networks. Figure labels: Automated gene models; Evidence: cDNAs, HMM domain searches, alignments with assemblies or genes from other species; Manual annotation & curation.
    7. Dispersed, community-based manual gene annotation efforts. Using Web Apollo, we* have trained geographically dispersed scientific communities to perform biologically supported manual annotations, and monitored their findings: ~80 institutions, 14 countries, hundreds of scientists, and gatekeepers. – Training workshops and geneborees. – Tutorials with detailed instructions. – Personalized user support. *Collaboration with Elsik Lab, Hymenoptera Genome Database.
    8. What have we learned? By harvesting expertise from dispersed researchers who assigned functions to predicted and curated peptides, we have developed more interactive and responsive tools, as well as better visualization, editing, and analysis capabilities.
    9. It is helpful to work together. Scientific community efforts bring together domain-specific and natural history expertise that would otherwise have remained disconnected.
    10. Improved Automated Annotations. In many cases, automated annotations have been improved (e.g., Apis mellifera; Elsik et al. BMC Genomics 2014, 15:86). We also learned of the challenges of newer sequencing technologies, e.g.: – Frameshifts and indel errors. – Split genes across scaffolds. – Highly repetitive sequences. To face these challenges, we train annotators in recovering coding sequences in agreement with all available biological evidence.
    11. Understanding the evolution of sociality. … groups of communities have taught us a lot! Comparison of the genomes of 7 species of ants contributed to a better understanding of the evolution and organization of insect societies at the molecular level. Insights were drawn mainly from six core aspects of ant biology: 1. Alternative morphological castes. 2. Division of labor. 3. Chemical communication. 4. Alternative social organization. 5. Social immunity. 6. Mutualism. Libbrecht et al. Genome Biology 2013, 14:212.
    12. A little training goes a long way! With the right tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.
    13. Manual annotation at i5K: how do we get there? In a genome sequencing project… Figure labels: Assembly; Automated Annotation; Manual annotation; Experimental validation.
    14. Gene Prediction: Identification of protein-coding genes, tRNAs, rRNAs, regulatory motifs, repetitive elements (masked), etc. Ab initio or homology-based. E.g., fgenesh, Augustus, geneid, SGP2. Nucleic Acids Res. 2003, vol. 31, no. 13, 3738-3741.
    15. Gene Annotation: Integration of data from prediction tools to generate a consensus set of predictions (gene models). • Models may be organized by: - automatic integration of predicted sets, e.g., GLEAN; - packaging the necessary tools into a pipeline, e.g., MAKER. • Transcriptomes are used to further inform the annotation process.
    16. The Collaborative Curation Process at i5K. 1) A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g., CLEC_v0.5.3-Models. 2) i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS): » If it’s not on either track, it won’t make the OGS! » If it’s there and it shouldn’t be, it will still make the OGS!
    17. Consensus set: reference and starting point. • In some cases, the algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation; in that case, use another model (e.g., the Augustus model) to create a new annotation. • Isoforms: drag the original and the alternatively spliced form to the ‘User-created Annotations’ area. • If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label it as ‘Delete’ in the Information Editor. • Overlapping interests? Collaborate to reach agreement. • Follow the guidelines for i5K Pilot Species Projects as shown at http://goo.gl/LRu1VY
    18. Web Apollo Graphical User Interface (GUI) for editing annotations. Navigation tools: pan and zoom. Search box: go to a scaffold or a gene model. Grey bar of coordinates: indicates location; you can also select here in order to zoom to a sub-region. ‘View’: change color by CDS, toggle strands, set highlight. ‘File’: upload your own evidence: GFF3, BAM, BigWig, VCF*. Add combination and sequence search tracks. ‘Tools’: use BLAT to query the genome with a protein or DNA sequence. Other labeled areas: Login; Available Tracks; Evidence Tracks Area; ‘User-created Annotations’ Track.
    19. Web Apollo. Screenshot labels: flags non-canonical splice sites; selection of features and sub-features; edge-matching; Evidence Tracks Area; ‘User-created Annotations’ Track. • The editing logic (server): • selects longest ORF as CDS; • flags non-canonical splice sites.
    20. Web Apollo. Screenshot labels: DNA Track; ‘User-created Annotations’ Track. • Two new kinds of tracks: • annotation editing; • sequence alteration editing.
    21. Web Apollo. • Annotations, annotation edits, and History: stored in a centralized database.
    22. Web Apollo. • Annotation Information Editor.
    23. Web Apollo. • Annotation Information Editor.
    24. [Some of the] Functionality: • Protein-coding gene annotation (that you know and love). • Sequence alterations (less coverage = more fragmentation). • Visualization of stage and cell-type specific transcription data as coverage plots, heat maps, and alignments.
    25. Example: ORCO. Live demonstration using the Cimex lectularius genome.
    26. Arthropodcentric Thanks! AgriPest Base, FlyBase, Hymenoptera Genome Database, VectorBase. Apis mellifera, Tribolium castaneum, Pogonomyrmex barbatus, Manduca sexta, Bombus terrestris, Helicoverpa armigera, Nasonia vitripennis, Acyrthosiphon pisum, Mayetiola destructor, Atta cephalotes, Linepithema humile, Camponotus floridanus, Solenopsis invicta, Acromyrmex echinatior.
    27. Thanks! • Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI). • Elsik Lab. § University of Missouri. Christine G. Elsik (PI). • Ian Holmes (PI). * University of California Berkeley. • Arthropod genomics community, i5K http://www.arthropodgenomes.org/wiki/i5K Steering Committee, USDA/NAL, HGSC-BCM, BGI, and 1KITE http://www.1kite.org/. • Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. • Insect images used with permission: http://AlexanderWild.com • Thank you for your attention! Web Apollo: Ed Lee, Gregg Helt, Colin Diesh §, Deepak Unni §, Rob Buels *. Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze. BBOP Web Apollo: http://GenomeArchitect.org GO: http://GeneOntology.org i5K: http://arthropodgenomes.org/wiki/i5K
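
Slide 19 states that the server-side editing logic selects the longest ORF as the CDS and flags non-canonical splice sites. The sketch below illustrates those two ideas only; it is not Web Apollo's code. It assumes forward-strand features, 0-based half-open coordinates, an ORF defined as ATG through the first in-frame stop codon, and GT/AG as the only canonical donor/acceptor pair. The function names `longest_orf` and `noncanonical_splice_sites` are hypothetical.

```python
# Illustrative sketch only: NOT Web Apollo's implementation.
# Assumptions: forward strand, 0-based half-open coordinates,
# ORF = ATG ... first in-frame stop, canonical intron = GT ... AG.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return (start, end) of the longest ATG..stop ORF in a spliced
    transcript sequence, or None if no complete ORF exists."""
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon in the ORF
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for another ORF downstream
    return best


def noncanonical_splice_sites(genome, exons):
    """Return a list of (site_type, position, dinucleotide) for every
    intron boundary that is not GT (donor) or AG (acceptor).
    `exons` is a sorted list of (start, end) tuples."""
    flags = []
    for (_, donor_end), (acceptor_start, _) in zip(exons, exons[1:]):
        intron = genome[donor_end:acceptor_start].upper()
        donor, acceptor = intron[:2], intron[-2:]
        if donor != "GT":
            flags.append(("donor", donor_end, donor))
        if acceptor != "AG":
            flags.append(("acceptor", acceptor_start - 2, acceptor))
    return flags


if __name__ == "__main__":
    # Toy gene: three exons, the second intron starts with GC instead of GT.
    genome = "ATGAAA" + "GTCCCCAG" + "CCC" + "GCTTTTAG" + "TAG"
    exons = [(0, 6), (14, 17), (25, 28)]
    transcript = "".join(genome[s:e] for s, e in exons)
    print("CDS (transcript coordinates):", longest_orf(transcript))
    print("Flagged splice sites:", noncanonical_splice_sites(genome, exons))
```

A real annotation editor would also handle reverse-strand features, partial ORFs, and any additional splice-site pairs it chooses to accept; those are left out here to keep the two rules visible.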
