Web Apollo: A Web-based Genomics Annotation Editing Platform (13ArthGen)


This is the talk I gave at the 7th Arthropod Genomics Symposium, hosted by the Eck Institute for Global Health at the University of Notre Dame in South Bend, Indiana, USA.

More efficient sequencing technologies mean a dramatic increase in our access to whole-genome sequences, and annotation efforts must adapt to keep pace in converting these sequence data into knowledge. The growing number of genome sequencing projects also means a greater reliance on contributions from domain specialists. This reflects a curation environment that is shifting from a traditional centralized model to a geographically dispersed community annotation model, which requires new tools to support collaborative annotation. Web Apollo is a successor to the Apollo annotation editor: it provides a web-based environment that allows multiple distributed users to review, edit, and share manual annotations. The Web Apollo client is designed as an extension to JBrowse, a genome browser that provides a fast, highly interactive interface for visualizing genomic data. Web Apollo allows users to create and modify transcript and exon structures through intuitive gestures, and it flags potential problems within these manual annotations.

  1. Web Apollo: A Web-based Genomics Annotation Editing Platform. Eduardo Lee, Gregg Helt, Justin Reese, Monica Munoz-Torres*, Christopher Childers, Rob Buels, Lincoln Stein, Ian Holmes, Christine Elsik, Suzanna Lewis. Arthropod Genomics Symposium 2013 | South Bend, IN | * @monimunozto. Lawrence Berkeley National Laboratory, Joint Genome Institute, for the US Department of Energy at UCB.
  2. Web Apollo is: the first real-time, collaborative genomics annotation editor on the Web; an easy-to-use environment for multiple, distributed users to review, update, and share genome feature markups.
  3. Working concept. 'Gene models': automated predictions, assisted by some evidence. 'Evidence': cDNA, HMM searches for protein domains, alignments of assemblies, curated genes of other species. 'Manual annotation': correct the coordinates of genes of interest.
  4. The need for annotation tools. Stages shown: assembly, manual annotation, experimental validation, automated annotation; each requires optimized genome visualization and editing tools. The need for genome visualization and editing tools prompted the development of the genome browsers we commonly use. Annotation editing tools then became necessary.
  5. Manual curation is necessary! Gather and evaluate all available evidence using quality-control metrics, to corroborate or modify automated predictions. Use literature and public databases to infer gene function from experimental data. Run sequence-similarity searches within a phylogenetic framework (e.g. alignment trees) to predict protein functional assignments. Distinguish orthologs from paralogs, and classify genes as members of a family. Otherwise, incorrect and incomplete genome annotations will poison every experiment that uses them.
  6. Apollo: Desktop and Java Web Start*. Access to computational analysis and experimental evidence. Manual annotation and curation. Compatibility with GMOD. Annotations saved directly to the database (not via email)*. Widely used (initially designed for centralized, resource-rich projects).
  7. 7.  BUT… Must load all data for a region (range) at once No [automated]* support for sharing Possible update conflicts due to stale annotation data One annotator at a time Edits from other users not visible without reloading Require Apollo Download, Chado Install, Java Installation*Apollo: Desktop and Java Web Start*
  8. The need for updated tools. The democratization of genome-scale sequencing calls for a new kind of annotation editing tool: • more assembly errors • lack of gold-standard gene structure training data
  9. Apollo on the Web. No installation required (for users): the user interface is a browser-based JavaScript client communicating with an annotation editing server. It is a plug-in for JBrowse, a successor to the GBrowse genome browser (GMOD). The plug-in offers a 'User-created Annotations' track. Real-time annotation updates; annotations are saved to a centralized database. Uses dynamic (lazy) data loading: only the region of interest is loaded. Customizable: rules, appearance. Supports user authentication with permissions: read, edit, review, complete, publish (export). Automatically promotes tracks (script).
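Slide 9 notes that Web Apollo loads data lazily, fetching only the region of interest. The Python sketch below illustrates the general idea under stated assumptions: features are grouped into fixed-width chunks (the 10 kb `CHUNK_SIZE` is a hypothetical value), a chunk is requested only when a view overlaps it, and fetched chunks are cached. `LazyTrack` and `fetch_chunk` are illustrative names, not part of the JBrowse or Web Apollo API, and JBrowse's own storage layout differs in detail.

```python
from dataclasses import dataclass

CHUNK_SIZE = 10_000  # hypothetical chunk width in base pairs


@dataclass(frozen=True)
class Feature:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class LazyTrack:
    """Loads feature chunks on demand and caches them (illustrative only)."""

    def __init__(self, fetch_chunk):
        # fetch_chunk(index) -> list[Feature]; stands in for a server request
        self._fetch_chunk = fetch_chunk
        self._cache = {}
        self.fetch_count = 0

    def _chunk(self, index):
        # Request a chunk only the first time it is needed.
        if index not in self._cache:
            self._cache[index] = self._fetch_chunk(index)
            self.fetch_count += 1
        return self._cache[index]

    def features_in(self, start, end):
        """Return features overlapping [start, end), loading only the chunks that view touches."""
        first = start // CHUNK_SIZE
        last = (end - 1) // CHUNK_SIZE
        seen = {}  # dict keeps insertion order and drops features that span two chunks
        for i in range(first, last + 1):
            for f in self._chunk(i):
                if f.start < end and f.end > start:
                    seen[f] = None
        return list(seen)
```

Panning back over an already-viewed region triggers no new requests, and features far from the view are never fetched at all; that is what keeps a browser-based client responsive on large genomes.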
  10. Web Apollo graphical user interface (GUI) for editing annotations. Navigation tools: pan and zoom. Search box: go to a scaffold or a gene model. The grey bar of coordinates indicates location; you can also select here in order to zoom to a sub-region. 'Options': change color by CDS, toggle strands. 'File': upload your own evidence: GFF3, BAM, BigWig, VCF*. 'Tools': use BLAT to query the genome with a protein or DNA sequence. Other areas shown: Available Tracks, Evidence Tracks Area, 'User-created Annotations' track, Login. 'Share': a stable link shares your view and shows exactly what you are seeing (keeps a record of your annotation process).
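The 'Share' button on slide 10 produces a stable link that reproduces a view. One way such a link can work is to encode the visible location and tracks in the URL query string, as sketched below. The base URL, the `loc` and `tracks` parameter names, and the `seq:start..end` location syntax are assumptions modeled on JBrowse-style URLs, not a documented Web Apollo link format.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def share_link(base_url, seq_id, start, end, tracks):
    """Encode a view (hypothetical loc/tracks parameters) as a shareable URL."""
    loc = f"{seq_id}:{start}..{end}"
    query = urlencode({"loc": loc, "tracks": ",".join(tracks)})
    return f"{base_url}?{query}"


def parse_share_link(url):
    """Recover (seq_id, start, end, tracks) from a link built by share_link."""
    params = parse_qs(urlparse(url).query)
    # rpartition so sequence names that themselves contain ':' still parse
    seq_id, _, span = params["loc"][0].rpartition(":")
    start, end = (int(x) for x in span.split(".."))
    return seq_id, start, end, params["tracks"][0].split(",")
```

Because the whole view lives in the URL, the link needs no server-side session to be reopened later, which is what makes it usable as a record of the annotation process.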
  11. Web Apollo. Selection of features and sub-features; edge-matching; flags non-canonical splice sites. Areas shown: Evidence Tracks Area, 'User-created Annotations' track. The editing logic is on the server: it selects the longest ORF as the CDS and flags non-canonical splice sites.
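Slide 11 says the server-side editing logic selects the longest ORF as the CDS and flags non-canonical splice sites. The Python sketch below shows both checks in their simplest form; it is not Web Apollo's Java implementation. Assumptions: sequences are forward-strand strings, coordinates are 0-based half-open, an ORF runs from an ATG to the first in-frame stop codon (unterminated ORFs are ignored), and only GT...AG introns count as canonical, although real genomes also contain minor classes such as GC-AG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based half-open and include the stop codon; None if no ORF.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # keep the first ATG: it gives the longest ORF for this stop
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def is_canonical_intron(genome, intron_start, intron_end):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    donor = genome[intron_start:intron_start + 2].upper()
    acceptor = genome[intron_end - 2:intron_end].upper()
    return donor == "GT" and acceptor == "AG"


def flag_noncanonical(genome, exons):
    """exons: sorted (start, end) pairs; return the introns failing the GT-AG check."""
    flagged = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if not is_canonical_intron(genome, prev_end, next_start):
            flagged.append((prev_end, next_start))
    return flagged
```

Rerunning both functions after every edit is cheap, which is one reason it is practical to keep this logic on the server and re-evaluate a transcript each time an annotator drags an exon boundary.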
  12. Web Apollo. Two new kinds of tracks: annotation editing and sequence alteration editing. Tracks shown: DNA track, 'User-created Annotations' track.
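Slide 12 introduces sequence alteration editing, for example to record a correction where the assembly is wrong. The sketch below applies such alterations to a reference string, assuming three hypothetical alteration kinds (substitution, insertion, deletion) that do not overlap. The `Alteration` type and its fields are illustrative, not Web Apollo's data model. Edits are applied from right to left so that each one's reference coordinate stays valid after earlier edits change the sequence length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alteration:
    kind: str        # "substitution", "insertion" or "deletion" (hypothetical kinds)
    position: int    # 0-based position on the reference sequence
    bases: str = ""  # new bases for a substitution or insertion
    length: int = 0  # number of reference bases removed by a deletion


def apply_alterations(reference, alterations):
    """Apply non-overlapping alterations, rightmost first, to a reference string."""
    seq = reference
    for alt in sorted(alterations, key=lambda a: a.position, reverse=True):
        p = alt.position
        if alt.kind == "substitution":
            seq = seq[:p] + alt.bases + seq[p + len(alt.bases):]
        elif alt.kind == "insertion":
            seq = seq[:p] + alt.bases + seq[p:]  # inserted before reference base p
        elif alt.kind == "deletion":
            seq = seq[:p] + seq[p + alt.length:]
        else:
            raise ValueError(f"unknown alteration kind: {alt.kind}")
    return seq
```

Keeping alterations as separate records on top of an unchanged reference, rather than rewriting the assembly, lets annotations and the original coordinates coexist.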
  13. 13. Web Apollo Annotations, annotation edits and History are stored in centralizeddatabase
  14. 14. Web Apollo Annotation Information Editor
  15. Web Apollo architecture. Data sources: BAM, BigWig, GFF3, VCF*; Trellis Data Broker (Java); static JSON generation pipeline (Perl); analysis pipelines (BAM, BED, BigWig, GFF3, MAKER output); data repositories (Chado, MySQL, DAS servers, e.g. Ensembl). Server-side data service: Annotation Editing Engine (Java), Berkeley DB real-time store, user management. Annotation exports: local DB (e.g. Chado), GFF3, FASTA. User interface (JavaScript): annotators work in Web Apollo within JBrowse, which exchanges JSON with the server for Apollo edit operations and user management.
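GFF3 appears on slide 15 among the annotation export formats. Below is a minimal sketch of writing a gene, one mRNA, and its exons as GFF3, following the format's nine tab-separated columns, its 1-based inclusive coordinates, and its percent-encoding of reserved characters in attribute values. The ID scheme (`-RA`, `-exonN`), the `source` value, and the 0-based half-open input coordinates are assumptions for illustration; this is not Web Apollo's exporter.

```python
# Characters GFF3 reserves in column 9; '%' must be escaped too.
RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
            "\t": "%09", "\n": "%0A", "\r": "%0D"}


def escape(value):
    """Percent-encode reserved characters one at a time (avoids double-encoding)."""
    return "".join(RESERVED.get(ch, ch) for ch in str(value))


def gff3_line(seqid, source, ftype, start0, end0, strand, attributes, score=".", phase="."):
    """Build one GFF3 row; start0/end0 are 0-based half-open, output is 1-based inclusive."""
    attrs = ";".join(f"{escape(k)}={escape(v)}" for k, v in attributes.items())
    cols = [seqid, source, ftype, str(start0 + 1), str(end0), str(score), strand, str(phase), attrs]
    return "\t".join(cols)


def export_gene(seqid, gene_id, exons, strand="+", source="example"):
    """Emit gene -> mRNA -> exon rows for one transcript (hypothetical ID scheme)."""
    start, end = exons[0][0], exons[-1][1]
    mrna_id = f"{gene_id}-RA"
    lines = [
        "##gff-version 3",
        gff3_line(seqid, source, "gene", start, end, strand, {"ID": gene_id}),
        gff3_line(seqid, source, "mRNA", start, end, strand, {"ID": mrna_id, "Parent": gene_id}),
    ]
    for i, (s, e) in enumerate(exons, 1):
        lines.append(gff3_line(seqid, source, "exon", s, e, strand,
                               {"ID": f"{mrna_id}-exon{i}", "Parent": mrna_id}))
    return "\n".join(lines) + "\n"
```

The `Parent` attributes carry the gene, transcript, and exon hierarchy, which is what lets downstream tools such as Chado loaders rebuild the model from a flat file.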
  16. DEMO
  17. [Near] future enhancements: ability to annotate regulatory regions and features; collapsing and expanding tracks; sticky 'User Annotations' track; genome slicing (annotating across contigs); folding of intronic space; Web Apollo at GMOD in the cloud.
  18. 18.  Release http://genomearchitect.org/webapollo/releases Demo Site http://genomearchitect.org/WebApolloDemo User Guide http://genomearchitect.org/webapollo/docs/webapollo_user_guide.pdf At GMOD http://gmod.org/wiki/WebApolloReleases & Demo
  19. Thanks to all our users and contributors! Especially: Code: Mitch Skinner, Nomi Harris, Thomas Down, Carson Holt. Feedback: Sue Brown, Sanjay Chellapilla, Daniel Ence, Juergen Gadau, Nicolae Herndon, Elisabeth Huguet, Carolyn Lawrence, Sasha Mikheyev, Barry Moore, Jan Oettler, Xiang Qin, Lukas Schrader, Kim Worley, Mark Yandell, Jing-Jiang Zhou. Formatting: Anna Bennett. To our funding agencies: NIH (NIGMS and NHGRI); DOE (Office of the Director, Office of Science, Office of Basic Energy Sciences).
  20. 20. Thanks VectorBase AgriPest Base FlyBase Hymenoptera Genome Database Apis mellifera Tribolium castaneum Pogonomyrmex barbatus Manduca sexta Bombus terrestris Helicoverpa armigera Nasonia vitripennis Acyrthosiphon pisum Mayetiola destructor Atta cephalotes Linepithema humile Camponotus floridanus Solenopsis invicta Acromyrmex echinatior