Apollo: Scalable & collaborative curation of genomes - Biocuration 2015

Talk at the 8th International Biocuration Conference. Beijing, China. April 23-26, 2015.

Obtaining meaningful results from genome analyses requires high-quality annotations of all genomic elements. Today’s sequencing projects face challenges such as lower coverage, more frequent assembly errors, and the lack of closely related species with well-annotated genomes. Apollo is a web-based application that enables collaborative genome curation in real time, analogous to Google Docs, allowing curators to improve on existing automated gene models through an intuitive interface. Apollo’s extensible architecture is built on top of JBrowse; its components are a web-based client, an annotation-editing engine, and a server-side data service. It allows users to visualize automated gene models, protein alignments, and expression and variant data, and to perform structural and functional annotation.

Apollo is actively used within a variety of projects, including the initiative to sequence the genomes of 5,000 arthropod species (i5K), and will become essential for the thousands of genomes now being sequenced and analyzed. Researchers from nearly 100 institutions worldwide are currently using Apollo in distributed curation efforts for over sixty genome projects across the tree of life: from plants to echinoderms, to fungi, to species of fish and other vertebrates including human, cattle (bovine), and dog. We are training the next generation of researchers by reaching out to educators to make these tools part of their curricula, by offering workshops and webinars to the scientific community, and by making Apollo available through widely used systems such as iPlant and DNA Subway. We are currently integrating Apollo into an annotation environment that combines structural and functional gene annotation with transcriptomic, proteomic, and phenotypic annotation. In this presentation we will describe its utility to users in detail, introduce the architecture to developers interested in expanding on this open-source project, and outline our future plans.

Authors:
Monica Munoz-Torres(1), Nathan Dunn(1), Colin Diesh(2), Deepak Unni(2), Seth Carbon(1), Heiko Dietze(1), Christopher Mungall(1), Nicole Washington(1), Ian Holmes(3), Christine Elsik(2), and Suzanna E. Lewis(1)

(1) Lawrence Berkeley National Laboratory, Genomics Division, Berkeley, CA
(2) Divisions of Animal and Plant Sciences, University of Missouri, Columbia, MO
(3) University of California Berkeley, Bioengineering, Berkeley, CA

  1. APOLLO: Scalable and collaborative genome curation
 Monica Munoz-Torres, PhD | @monimunozto
 
 Nathan Dunn, Colin Diesh*, Deepak Unni*, Seth Carbon, Heiko Dietze, Christopher Mungall, Nicole Washington, Ian Holmes*, Christine Elsik*, and Suzanna E. Lewis
 
 Berkeley Bioinformatics Open-Source Projects
 Genomics Division, Lawrence Berkeley National Laboratory
 
 8th International Biocuration Conference. Beijing, China. 24 April, 2015
  2. OUTLINE
   • LAST TIME: where we left off last year
   • IMPROVEMENTS: architecture, scalability, features
   • COLLABORATIONS: JBrowse & GenSAS
   • FUTURE PLANS: what lies on the horizon
  3. APOLLO
 genome annotation editing tool
   • Web based, integrated with JBrowse.
   • Supports real-time collaboration!
   • Automatic generation of ready-made computable data.
   • Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, transposable elements (TEs), and repeats.
   • Intuitive annotation, gestures, and pull-down menus to create and edit transcript and exon structures, and to insert comments (controlled vocabulary (CV) or free-form text), GO terms, etc.
  4. DETAILS FROM OUR LAST UPDATE
 previously we learned
   • ~100 institutions worldwide
   • >60 genomes across the tree of life: from plants to arthropods, to fungi, to fish and other vertebrates including human, bovine cattle, and dog
 [Images: ©BroadInstitute.org; Nature Rev Gen 2009; ©alexanderwild.com; ©outdooralabama.com; National Agricultural Library]
  5. LESSONS WE HAVE LEARNED
 What we have learned:
   • Collaborative work distills invaluable knowledge
   • We must enforce strict rules and formats
   • We must evolve with the data
   • A little training goes a long way
   • NGS poses additional challenges
  6. HIGHLIGHTED IMPROVEMENTS
 scalability
   • Easier deployment, more detailed documentation
   • Supports multiple organisms per server; improved comparative tools
   • Easier to query the data and build extensions
   • More flexible user interface via a removable side dock with customizable tabs; better search functionality, validation checks, and editing capability
   • Supports a larger set of sequence annotation types, based on the Sequence Ontology
   • Offers fine-grained user- and group-level permissions
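Slide 6 lists fine-grained user- and group-level permissions but does not say how the two are combined. The following is only a minimal sketch under stated assumptions: permission levels are ordered (the names `NONE`, `READ`, `EXPORT`, `WRITE`, and `ADMINISTRATE` are hypothetical here), grants are stored per organism, and a user's effective permission is the highest level granted either directly or through any group the user belongs to.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Hypothetical ordered permission levels; a higher level implies the lower ones."""
    NONE = 0
    READ = 1
    EXPORT = 2
    WRITE = 3
    ADMINISTRATE = 4


def effective_permission(user, organism, user_grants, group_grants, memberships):
    """Return the highest permission `user` holds on `organism` (assumed model).

    user_grants:  {(user, organism): Permission}   direct grants to a user
    group_grants: {(group, organism): Permission}  grants to a group
    memberships:  {user: [group, ...]}             which groups a user is in
    """
    # Start with the user's direct grant, defaulting to no access.
    levels = [user_grants.get((user, organism), Permission.NONE)]
    # Add whatever each of the user's groups has been granted on this organism.
    for group in memberships.get(user, ()):
        levels.append(group_grants.get((group, organism), Permission.NONE))
    # Because the levels are ordered, the strongest grant wins.
    return max(levels)


if __name__ == "__main__":
    user_grants = {("alice", "Apis mellifera"): Permission.READ}
    group_grants = {("bee-curators", "Apis mellifera"): Permission.WRITE}
    memberships = {"alice": ["bee-curators"]}
    print(effective_permission("alice", "Apis mellifera",
                               user_grants, group_grants, memberships).name)
```

Here `alice` has only `READ` directly but inherits `WRITE` through the hypothetical `bee-curators` group, so the sketch prints `WRITE`. Taking the maximum only makes sense if levels form a strict hierarchy; a real system with independent rights (for example, export without write) would need a set of flags instead.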
  7. NEW APOLLO ARCHITECTURE
 simpler, more flexible
 Web-based client + annotation-editing engine + server-side data service
 [Architecture diagram, Apollo v2.0: annotators work in a client built on JBrowse (Dojo/jQuery) and Google Web Toolkit (GWT)/Bootstrap, which communicates with the Annotation Engine (Server) over REST/JSON and Websockets; security via Shiro, LDAP, and OAuth; a single data store (PostgreSQL, MySQL, MongoDB, ElasticSearch) holds annotations, security, preferences, organisms, and tracks; genomic evidence (BAM, BED, VCF, GFF3, BigWig) for the selected organism is loaded from per-organism JBrowse data directories (Organism 1, Organism 2).]
  8. NEW APOLLO ARCHITECTURE
 simpler, more flexible
 [Same architecture diagram as slide 7, with callout:] NEW! Grails controllers (J2EE servlet) route requests to the appropriate JBrowse data directory for a given organism, so genomic evidence is loaded for the selected organism.
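The callout on slide 8 says Grails controllers route each request to the JBrowse data directory of the selected organism. Apollo implements this in Grails (Groovy); the Python sketch below only illustrates the idea, and the organism names, directory paths, and the `resolve_data_path` helper are hypothetical. It also refuses relative paths that would escape the organism's directory, a check that a router of this kind would plausibly need.

```python
from pathlib import Path

# Hypothetical mapping from organism name to its JBrowse data directory.
ORGANISM_DIRS = {
    "Organism 1": Path("/data/jbrowse/organism1"),
    "Organism 2": Path("/data/jbrowse/organism2"),
}


def resolve_data_path(organism: str, relative_path: str) -> Path:
    """Route a request for a JBrowse data file to the selected organism's directory."""
    if organism not in ORGANISM_DIRS:
        raise LookupError(f"unknown organism: {organism!r}")
    root = ORGANISM_DIRS[organism].resolve()
    # Normalize ".." and symlinks before checking where the request really points.
    target = (root / relative_path).resolve()
    # Reject requests such as "../organism2/..." that would leave this organism's data.
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"path escapes data directory: {relative_path!r}")
    return target


if __name__ == "__main__":
    print(resolve_data_path("Organism 1", "tracks/reads.bam"))
```

Keeping one directory per organism means that adding an organism to a server only adds an entry to the mapping; the client code that requests track files does not change.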
  9. NEW APOLLO ARCHITECTURE
 simpler, more flexible
 [Same architecture diagram as slide 7, with callout:] NEW! A single, queryable data store (PostgreSQL, MySQL, MongoDB, ElasticSearch) houses annotations.
  10. HIGHLIGHTED IMPROVEMENTS
 scalability
   • Improvements to architecture: easier deployment, better documentation
   • Supports multiple organisms per server; improved comparative tools
   • Easier to query the data and build extensions
   • More flexible user interface via a removable side dock with customizable tabs; better search functionality, validation checks, and editing capability
   • Supports a larger set of sequence annotation types, based on the Sequence Ontology
   • Offers fine-grained user- and group-level permissions
  11. HIGHLIGHTED IMPROVEMENTS
 removable side dock with customizable tabs
 [Screenshot of side-dock tabs: Annotations, Tracks, Reference Sequence, Organism, Users, Groups, Preferences]
  12. HIGHLIGHTED IMPROVEMENTS
 annotation details, exon boundaries, data export
 [Screenshots of the Annotations and Reference Sequences tabs, with numbered callouts 1, 2, 3]
  13. HIGHLIGHTED IMPROVEMENTS
 visible in the Apollo window
 Automatically calculates upstream and downstream acceptor and donor sites.
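Slide 13 says Apollo automatically calculates upstream and downstream acceptor and donor sites, but it does not give the method. A minimal sketch, assuming only the canonical GT-AG rule on the plus strand (real splice-site handling must also cover the reverse strand and non-canonical sites such as GC-AG), scans for the dinucleotides and reports the nearest candidate on each side of a chosen exon boundary. All function names are hypothetical.

```python
from __future__ import annotations


def donor_sites(seq: str) -> list[int]:
    """0-based positions p where an intron could begin (seq[p:p+2] == "GT").

    An exon ending at this donor would have its exclusive end at p.
    """
    s = seq.upper()
    return [p for p in range(len(s) - 1) if s[p:p + 2] == "GT"]


def acceptor_sites(seq: str) -> list[int]:
    """0-based positions p where an exon could begin after an intron ending in "AG".

    That is, seq[p-2:p] == "AG" and the exon starts at p.
    """
    s = seq.upper()
    return [p for p in range(2, len(s) + 1) if s[p - 2:p] == "AG"]


def nearest_sites(sites: list[int], pos: int) -> tuple[int | None, int | None]:
    """Closest candidate strictly upstream and strictly downstream of pos (None if absent)."""
    upstream = max((p for p in sites if p < pos), default=None)
    downstream = min((p for p in sites if p > pos), default=None)
    return upstream, downstream
```

For the toy sequence `"AAGTCCAGTTAGGG"`, `donor_sites` returns `[2, 7]` and `acceptor_sites` returns `[3, 8, 12]`; if a curator's exon currently starts at position 8, `nearest_sites(acceptor_sites(seq), 8)` returns `(3, 12)`, the alternative acceptors one step upstream and downstream.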
  14. OTHER IMPROVEMENTS
 behind the scenes
 https://github.com/GMOD/Apollo
  15. APOLLO
 demonstration
 See the Apollo demonstration video at: https://youtu.be/VgPtAP_fvxY
  16. COLLABORATIONS
 Apollo is open-source and extensible
 Apollo users can add software to support their own workflow. Examples:
   • The Genome Sequence Annotation Server (GenSAS): whole-genome structural annotation pipeline.
   • i5K Workspace@NAL: a space to display and share genome assemblies and gene models, and to conduct manual annotation efforts.
  17. FUTURE PLANS
 currently working on
  18. JOIN US
 Please bring your suggestions, requests, and contributions to:
 http://GenomeArchitect.org/
 Nathan Dunn, Apollo Technical Lead
 Special thanks to:
   • Stephen Ficklin, GenSAS, Washington State University
   • Deepak Unni and Colin Diesh, Apollo developers, University of Missouri
   • Suzi Lewis, Principal Investigator, BBOP
   • Eric Yao, JBrowse, UC Berkeley
  19. Thank you.
   • Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
   • § Christine G. Elsik (PI). University of Missouri.
   • * Ian Holmes (PI). University of California Berkeley.
   • Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), BGI, Oliver Niehuis (1KITE http://www.1kite.org/), and the Honey Bee Genome Sequencing Consortium.
   • Apollo is supported by NIH grants 5R01GM080203 from NIGMS and 5R01HG004483 from NHGRI; by Contract No. 60-8260-4-005 from the National Agricultural Library (NAL) at the United States Department of Agriculture (USDA); and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
   • Insect images used with permission: http://AlexanderWild.com
   • For your attention, thank you!
 BBOP:
   • Web Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
   • Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
 NAL at USDA: Monica Poelchau, Christopher Childers, Gary Moore
 HGSC at BCM: Fringy Richards, Dan Hughes, Kim Worley
 JBrowse: Eric Yao *
 Web Apollo: http://GenomeArchitect.org
 i5K: http://arthropodgenomes.org/wiki/i5K
 GO: http://GeneOntology.org
 Thanks!
