Introduction to Apollo - i5k Research Community – Calanoida (copepod)

Apollo is a web-based application that supports and enables collaborative genome curation in real time, allowing teams of curators to improve on existing automated gene models through an intuitive interface. Apollo allows researchers to break down large amounts of data into manageable portions to mobilize groups of researchers with shared interests.

The i5k, an initiative to sequence the genomes of 5,000 insect and related arthropod species, is a broad and inclusive effort that seeks to involve scientists from around the world in their genome curation process, and Apollo is serving as the platform to empower this community.

This presentation is an introduction to Apollo for the members of the i5K Pilot Project working on species of the order Calanoida (copepod).

Published in: Science

  1. 1. Introduction to Apollo Collaborative genome annotation editing A webinar for the i5K Research Community – Calanoida (copepod) Monica Munoz-Torres | @monimunozto Berkeley Bioinformatics Open-Source Projects (BBOP) Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory i5k Pilot Project Species Calls | 17 October, 2016 http://GenomeArchitect.org
  2. 2. Outline • Today you will discover effective ways to extract valuable information about a genome through curation efforts.
  3. 3. After this talk you will... • Better understand ‘curation’ in the context of genome annotation: assembled genome → automated annotation → manual annotation • Become familiar with Apollo’s environment and functionality. • Learn to identify homologs of known genes of interest in your newly sequenced genome. • Learn how to corroborate and modify automatically annotated gene models using all available evidence in Apollo.
  4. 4. Experimental design, sampling. Comparative analyses Official / Merged Gene Set Manual Annotation Automated Annotation Sequencing Assembly Synthesis & dissemination. This is our focus.
  5. 5. We must care about curation Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild The gene set of an organism informs a variety of studies: • Characterization: Gene number, GC%, TEs, repeats. • Functional assignments. • Molecular evolution, sequence conservation. • Gene families. • Metabolic pathways. • What makes an organism what it is? What makes a bee a “bee”?
  6. 6. Genome Curation Identifies elements that best represent the underlying biology and eliminates elements that reflect systemic errors of automated analyses. Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and experimental data. Apollo Gene Ontology Resources
  7. 7. A few things to remember when conducting manual annotation BIO-REFRESHER • KEEP A GLOSSARY HANDY from contig to splice site • WHAT IS A GENE? defining your goal • TRANSCRIPTION mRNA in detail • TRANSLATION reading frames, etc. • GENOME CURATION steps involved
  8. 8. The gene: a “moving target” “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.” Gerstein et al., 2007. Genome Res
  9. 9. "Gene structure" by Daycd, Wikimedia Commons BIO-REFRESHER mRNA • Although mRNAs are short-lived, understanding them is crucial, as they will become the center of your work.
  10. 10. BIO-REFRESHER Reading frames • In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF). • ORF = Start signal + coding sequence (divisible by 3) + Stop signal
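The ORF definition on this slide (Start signal + coding sequence divisible by 3 + Stop signal) can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code with ATG as the only start codon; the function name `is_complete_orf` is hypothetical and is not part of Apollo.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(cds: str) -> bool:
    """Return True if cds is ATG + whole codons + one terminal stop codon."""
    cds = cds.upper()
    # At least a start and a stop codon, and a length divisible by 3.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    has_start = codons[0] == "ATG"
    has_stop = codons[-1] in STOP_CODONS
    # A stop codon before the end would terminate translation early.
    no_internal_stop = not any(c in STOP_CODONS for c in codons[:-1])
    return has_start and has_stop and no_internal_stop


print(is_complete_orf("ATGGCCTAA"))  # start + one codon + stop
print(is_complete_orf("ATGGCCTA"))   # length not divisible by 3
```

A check like this is only a first filter: a sequence can satisfy it and still be the wrong reading frame for the gene, which is why the evidence tracks matter.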
  11. 11. BIO-REFRESHER Splice sites • The spliceosome catalyzes the removal of introns and the ligation of flanking exons. • Splicing signals (from the point of view of an intron): • One splice signal (site) on the 5’ end: usually GT (less common: GC) • And a 3’ end splice site: usually AG • Canonical splice sites look like this: …]5’-GT/AG-3’[…
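The splice signals above can be read directly off an intron's first and last two bases. This is a small sketch for a forward-strand intron, assuming 0-based, half-open coordinates; `classify_intron` and its return strings are hypothetical names chosen for illustration.

```python
def classify_intron(genome: str, start: int, end: int) -> str:
    """Classify a forward-strand intron by its terminal dinucleotides.

    start and end are 0-based, half-open coordinates of the intron.
    """
    intron = genome[start:end].upper()
    donor, acceptor = intron[:2], intron[-2:]  # 5' site and 3' site
    if donor == "GT" and acceptor == "AG":
        return "canonical (GT/AG)"
    if donor == "GC" and acceptor == "AG":
        return "less common (GC/AG)"
    return f"non-canonical ({donor}/{acceptor})"


# Toy sequence: exon "ATGGCC", intron "GTAAGTTTTCAG", exon "GGATAA".
genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(classify_intron(genome, 6, 18))
```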
  12. 12. BIO-REFRESHER Exons and Introns • Introns can interrupt the reading frame of a gene by inserting a sequence: • between two consecutive codons, • between the first and second nucleotide of a codon, • or between the second and third nucleotide of a codon. “Exon and Intron classes”. Licensed under Fair use via Wikipedia
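The three positions listed on this slide are commonly called intron phases 0, 1 and 2. The phase of an intron is the number of coding bases upstream of it, modulo 3. The sketch below assumes you already know the coding length of each exon in 5' to 3' order; `intron_phases` is a hypothetical helper.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase of each intron, given coding lengths of exons 5' to 3'.

    Phase 0: intron lies between two codons.
    Phase 1: intron lies after the first nucleotide of a codon.
    Phase 2: intron lies after the second nucleotide of a codon.
    """
    phases = []
    coding_so_far = 0
    # An intron follows every exon except the last one.
    for length in cds_exon_lengths[:-1]:
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


print(intron_phases([9, 4, 8]))
```

Knowing the phase helps when moving an exon boundary: shifting a boundary by a number of bases that is not a multiple of 3 changes the frame of everything downstream.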
  13. 13. Prediction & Annotation
  14. 14. GENE PREDICTION & ANNOTATION PREDICTION & ANNOTATION • Identification and annotation of genome features: • primarily focuses on protein-coding genes. • also identifies RNAs (tRNA, rRNA, long and small non-coding RNAs (ncRNA)), regulatory motifs, repetitive elements, etc. • happens in 2 phases: 1. Computation phase 2. Annotation phase
  15. 15. GENE PREDICTION & ANNOTATION COMPUTATION PHASE a. Experimental data are aligned to the genome: expressed sequence tags, RNA-sequencing reads, proteins (also from other species). b. Gene predictions are generated: - ab initio: based on nucleotide sequence and composition e.g. Augustus, GENSCAN, geneid, fgenesh, etc. - evidence-driven: also identifying domains and motifs e.g. SGP2, JAMg, fgenesh++, etc. Result: the single most likely coding sequence, no UTRs, no isoforms. Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
  16. 16. GENE PREDICTION & ANNOTATION ANNOTATION PHASE Experimental data (evidence) and predictions are synthesized into gene annotations. Result: gene models that generally include UTRs, isoforms, evidence trails. Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174 5’ UTR 3’ UTR
  17. 17. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation. CONSENSUS GENE SETS Gene models may be organized into sets using: • combiners for automatic integration of predicted sets e.g. GLEAN, EvidenceModeler or • tools packaged into pipelines e.g. MAKER, PASA, Gnomon, Ensembl, etc. GENE PREDICTION & ANNOTATION
  18. 18. ANNOTATION needs some refinement No one is perfect, least of all automated annotation. New technologies bring new challenges: • Assembly errors can cause fragmented annotations • Limited coverage makes precise identification a difficult task
  19. 19. MANUAL ANNOTATION improving predictions Precise elucidation of biological features encoded in the genome requires careful examination and review. Schiex et al. Nucleic Acids 2003 (31) 13: 3738-3741 Automated Predictions Experimental Evidence Manual Annotation – to the rescue. cDNAs, HMM domain searches, RNAseq, genes from other species.
  20. 20. GENOME CURATION an inherently collaborative task GENE PREDICTION & ANNOTATION So many sequences, not enough hands. Apis mellifera | Alexander Wild | www.alexanderwild.com
  21. 21. We have provided continuous training and support for hundreds of geographically dispersed scientists to conduct manual annotation efforts in order to recover coding sequences in agreement with all available biological evidence. Collaboration is key! APOLLO • Collaborative work distills invaluable knowledge. • A little training goes a long way! Wet lab scientists can easily learn to maximize the generation of accurate, biologically supported gene models.
  22. 22. Apollo
  23. 23. APOLLO: versatile genome annotation editing • Apollo is a web-based genome annotation editor, integrated with JBrowse • Supports real time collaboration & generates analysis-ready data USER-CREATED ANNOTATIONS EVIDENCE TRACKS ANNOTATOR PANEL
  24. 24. BECOMING ACQUAINTED WITH APOLLO General process of curation 1. Select or find a region of interest, e.g. scaffold. 2. Select appropriate evidence tracks to review the gene model. 3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working. 4. If necessary, adjust the gene model. 5. Check your edited gene model for integrity and accuracy by comparing it with available homologs. 6. Comment and finish.
  25. 25. Apollo- version at i5K Workspace@NAL 4. Becoming Acquainted with Web Apollo. The Sequence Selection Window
  26. 26. Sort Apollo- version at i5K Workspace@NAL “Old Track Select Page” 4. Becoming Acquainted with Web Apollo.
  27. 27. APOLLO annotation editing environment BECOMING ACQUAINTED WITH APOLLO Color by CDS frame, toggle strands, set color scheme and highlights. - Upload evidence files (GFF3, BAM, BigWig), - combination track - sequence search track Query the genome using BLAT. Navigation and zoom. Search for a gene model or a scaffold. Get coordinates and “rubber band” selection for zooming. Login User-created annotations. New annotator panel. Evidence Tracks Stage and cell-type specific transcription data. http://genomearchitect.org/web_apollo_user_guide
  28. 28. BECOMING ACQUAINTED WITH APOLLO USER NAVIGATION Annotator panel. • Choose appropriate evidence from list of “Tracks” on annotator panel. • Select & drag elements from evidence track into the ‘User-created Annotations’ area. • Hovering over annotation in progress brings up an information pop-up. • Creating a new annotation
  29. 29. Adding a gene model
  30. 30. Adding a gene model
  31. 31. Adding a gene model
  32. 32. Editing functionality
  33. 33. Editing functionality Example: Adding an exon supported by experimental data • RNAseq reads show evidence in support of a transcribed product that was not predicted. • Add exon by dragging up one of the RNAseq reads.
  34. 34. Editing functionality Example: Adjusting exon boundaries supported by experimental data
  35. 35. Curating with Apollo
  36. 36. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • ‘Zoom to base level’ reveals the DNA Track.
  37. 37. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • Color exons by CDS from the ‘View’ menu.
  38. 38. Zoom in/out with keyboard: shift + arrow keys up/down USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.
  39. 39. annotating simple cases
  40. 40. “Simple case”: - the predicted gene model is correct or nearly correct, and - this model is supported by evidence that completely or mostly agrees with the prediction. - evidence that extends beyond the predicted model is assumed to be non-coding sequence. The following are simple modifications. ANNOTATING SIMPLE CASES BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
  41. 41. • A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated. • Check ‘Start’ and ‘Stop’ signals after each edit. ADDING EXONS BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
  42. 42. If transcript alignment data are available & extend beyond your original annotation, you may extend or add UTRs. 1. Right click at the exon edge and ‘Zoom to base level’. 2. Place the cursor over the edge of the exon until it becomes a black arrow then click and drag the edge of the exon to the new coordinate position that includes the UTR. ADDING UTRs To add a new spliced UTR to an existing annotation also follow the procedure for adding an exon. BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
  43. 43. To modify an exon boundary and match data in the evidence tracks: select both the [offending] exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate. In some cases all the data may disagree with the annotation, in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data. MATCHING EXON BOUNDARY TO EVIDENCE BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
  44. 44. 1. Two exons from different tracks sharing the same start/end coordinates display a red bar to indicate matching edges. 2. Selecting the whole annotation or one exon at a time, use this edge-matching function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square [ ] brackets to scroll from exon to exon. Use curly { } brackets to scroll from annotation to annotation. 3. Check if cDNA / RNAseq reads lack one or more of the annotated exons or include additional exons. CHECKING EXON INTEGRITY BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
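The idea behind edge matching, as described in step 1, is simply a comparison of exon start and end coordinates between tracks. The sketch below illustrates that comparison on plain coordinate pairs; it is not Apollo's implementation, and `matching_edges` and its output keys are hypothetical.

```python
def matching_edges(annotation_exons, evidence_exons):
    """Report, for each annotated exon, which edges coincide with evidence.

    Both arguments are lists of (start, end) pairs on the same scaffold.
    """
    evidence_starts = {start for start, _ in evidence_exons}
    evidence_ends = {end for _, end in evidence_exons}
    return [
        {
            "exon": (start, end),
            "start_matches": start in evidence_starts,
            "end_matches": end in evidence_ends,
        }
        for start, end in annotation_exons
    ]


annotation = [(100, 200), (300, 420)]
rnaseq = [(100, 200), (300, 400)]
for row in matching_edges(annotation, rnaseq):
    print(row)
```

In this toy case the second exon shares its start with the evidence but not its end, which is the kind of boundary worth inspecting at base level.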
  45. 45. Non-canonical splice sites flags. Double click: selection of feature and sub-features Evidence Tracks Area ‘User-created Annotations’ Track Edge-matching Apollo’s editing logic (brain): § selects longest ORF as CDS § flags non-canonical splice sites ORFs AND SPLICE SITES BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
  46. 46. SPLICE SITES Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon. Canonical splice sites: forward strand: 5’-…exon]GT / AG[exon…-3’; reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’. Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. GC donor), they should be flagged with a comment. Exon/intron splice site error warning Curated model BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
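The reverse-strand notation on this slide is easy to misread, so a short sketch may help. For a gene on the reverse strand, the intron must be reverse-complemented before looking for GT/AG; read left to right without reversing, the same strand shows GA … TG. The helper names below are hypothetical and coordinates are assumed to be 0-based, half-open.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def intron_ends(genome_fwd: str, start: int, end: int, strand: str):
    """Return (donor, acceptor) dinucleotides read 5'->3' on the gene's strand."""
    intron = genome_fwd[start:end]
    if strand == "-":
        intron = reverse_complement(intron)
    intron = intron.upper()
    return intron[:2], intron[-2:]


fwd = "CTGAAAACTTAC"  # forward-strand bases spanning a reverse-strand intron
print(intron_ends(fwd, 0, 12, "-"))  # read on the gene's own strand
print(fwd.translate(COMPLEMENT))     # reverse strand, not reverse-complemented
```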
  47. 47. Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons. If ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further upstream or downstream, depending on evidence (proteins, RNAseq). It may be present outside the predicted gene model, within a region supported by another evidence track. In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG). ‘Start’ AND ‘Stop’ SITES BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
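To make the "longest ORF" idea concrete, here is a simplified sketch that scans the three forward frames of an already spliced transcript for the longest ATG-to-stop stretch. It is an illustration only, not Apollo's actual algorithm; it assumes the standard stop codons and ignores non-ATG starts and partial ORFs. `longest_orf` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str):
    """Return (start, end) of the longest ATG...stop ORF, or None if absent.

    Coordinates are 0-based, half-open, and include the stop codon.
    """
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame since the last stop
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


print(longest_orf("ATGTAACATGAAACCCTGA"))
```

The example transcript contains a short ORF at the very start and a longer one in a different frame; the sketch reports the longer one, which is why the ‘Start’ should always be checked against protein and RNAseq evidence.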
  48. 48. annotating complex cases
  49. 49. Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions! 1. In ‘User-created Annotations’ area shift-click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu. 2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models. 3. Check the resulting translation by querying a protein database e.g. UniProt, NCBI nr. Add comments to record that this annotation is the result of a merge. Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level. COMPLEX CASES merge two gene predictions on the same scaffold BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
  50. 50. One or more splits may be recommended when: - different segments of the predicted protein align to two or more different gene families - predicted protein doesn’t align to known proteins over its entire length - Transcript data may support a split, but first verify whether they are alternative transcripts. COMPLEX CASES split a gene prediction BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
  51. 51. DNA Track ‘User-created Annotations’ Track COMPLEX CASES annotate frameshifts and correct single-base errors Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself. BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
  52. 52. COMPLEX CASES correcting selenocysteine containing proteins BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
  53. 53. COMPLEX CASES correcting selenocysteine containing proteins BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
  54. 54. 1. Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence. 2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions. 3. The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track. 4. Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits. 5. In special cases such as selenocysteine containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu. COMPLEX CASES annotating frameshifts and correcting single-base errors & selenocysteines BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
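The behavior described in steps 1 to 4 can be pictured as edits layered on top of an unchanged reference. The sketch below mimics the three operations on a plain string: an insertion goes to the right of the chosen position, a deletion starts at the chosen position, and a substitution replaces bases starting there. It is only an illustration under those assumptions, with 0-based positions and non-overlapping edits; `apply_alterations` and the tuple format are hypothetical, not Apollo's API.

```python
def apply_alterations(reference: str, alterations):
    """Return an edited copy of reference; the reference is left untouched.

    Each alteration is one of:
      ("insertion", pos, bases)     bases inserted to the right of pos
      ("deletion", pos, length)     length bases removed, starting at pos
      ("substitution", pos, bases)  bases replace the reference from pos
    Positions are 0-based on the reference; edits must not overlap.
    """
    seq = list(reference)
    # Apply right to left so earlier edits do not shift later positions.
    for kind, pos, value in sorted(alterations, key=lambda a: a[1], reverse=True):
        if kind == "insertion":
            seq[pos + 1:pos + 1] = value
        elif kind == "deletion":
            del seq[pos:pos + value]
        elif kind == "substitution":
            seq[pos:pos + len(value)] = value
        else:
            raise ValueError(f"unknown alteration: {kind}")
    return "".join(seq)


reference = "ATGAAATAA"
edited = apply_alterations(reference, [("insertion", 2, "CC")])
print(reference, edited)  # the reference string is unchanged
```

Keeping the reference fixed and deriving the edited sequence on demand mirrors the point made on the slide: the assembly is frozen, and only the transcripts overlapping the edit reflect it.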
  55. 55. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • Information Editor
  56. 56. The Annotation Information Editor USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO
  57. 57. The Annotation Information Editor • Add PubMed IDs • Include GO terms as appropriate from any of the three ontologies • Write comments stating how you have validated each model. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO
  58. 58. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • Keeping track of each edit
  59. 59. Annotations, annotation edits, and History: stored in a centralized database. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO
  60. 60. Follow the checklist until you are happy with the annotation! And remember to… – comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence. – or add a comment to inform the community of unresolved issues you think this model may have. Always Remember: Apollo curation is a community effort so please use comments to communicate the reasons for your annotation. Your comments will be visible to everyone. COMPLETING THE ANNOTATION BECOMING ACQUAINTED WITH APOLLO
  61. 61. Checklist
  62. 62. • Check ‘Start’ and ‘Stop’ sites. • Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[… • Check if you can annotate UTRs, for example using RNA-Seq data: – align it against relevant genes/gene family – blastp against NCBI’s RefSeq or nr • Check for gaps in the genome. • Additional functionality may be necessary: – merging 2 gene predictions - same scaffold – ‘merging’ 2 gene predictions - different scaffolds – splitting a gene prediction – annotating frameshifts – annotating selenocysteines, correcting single-base and other assembly errors, etc. • Add: – Important project information in the form of comments – IDs from public databases e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert. – Comments about the kinds of changes you made to the gene model of interest, if any. – Any appropriate functional assignments, e.g. via BLAST, RNA-Seq data, literature searches, etc. CHECKLIST for accuracy and integrity MANUAL ANNOTATION CHECKLIST
  63. 63. Genome curation with i5k
  64. 64. i5K Workspace@NAL The collaborative curation process at i5k 1. A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. HVIT_v0.5.3-Models 2. i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS): Warning! • If it’s not on either track, it won’t make the OGS! • If it’s there and it shouldn’t be, it will still make the OGS!
  65. 65. The ‘Replace Models’ rules BECOMING ACQUAINTED WITH APOLLO http://tinyurl.com/apollo-i5k-replace
  66. 66. i5K Workspace@NAL 3. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation. Use your judgment, try choosing a different model to begin the annotation. 4. Isoforms: drag original and alternatively spliced form to ‘User-created Annotations’ area. 5. If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label as ‘Delete’ on the Information Editor. 6. Overlapping interests? Collaborate to reach agreement. 7. Follow guidelines for i5K Pilot Species Projects, at http://goo.gl/LRu1VY The collaborative curation process at i5k
  67. 67. Example
  68. 68. What’s new?... finding inspiration in PubMed. Example “Molecular analysis of bed bug populations from across the USA and Europe found that >80% and >95% of the respective populations contained V419L and/or L925I mutations in the voltage-gated sodium channel gene, indicating widespread distribution of target-site-based pyrethroid resistance.” Homalodisca vitripennis | Alexander Wild | www.alexanderwild.com Halyomorpha halys | Fondazione Edmund Mach - Italy Now for our species of interest. . .
  69. 69. Example Example 69 Curation example using the Hyalella azteca genome (amphipod crustacean).
  70. 70. What do we know about this genome? • Currently publicly available data at NCBI: • >37,000 nucleotide seqs → scaffolds, mitochondrial genes • 344 amino acid seqs → mitochondrion • 47 ESTs • 0 conserved domains identified • 0 “gene” entries submitted • Data at i5K Workspace@NAL (annotation hosted at USDA) - 10,832 scaffolds: 23,288 transcripts: 12,906 proteins Example
  71. 71. PubMed Search: what’s new? Example 71
  72. 72. PubMed Search: what’s new? Example 72 “Ten populations differed by at least 550-fold in sensitivity to pyrethroids.” “Sequencing the primary pyrethroid target site, the voltage- gated sodium channel (vgsc), shows that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.” “The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”
  73. 73. How many sequences are there, publicly available, for our gene of interest? Example 73 • Para, (voltage-gated sodium channel alpha subunit; Nasonia vitripennis). • NaCP60E (Sodium channel protein 60 E; D. melanogaster). – MF: voltage-gated cation channel activity (IDA, GO:0022843). – BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS,GO:0035725). – CC: voltage-gated sodium channel complex (IEA, GO:0001518). And what do we know about them?
  74. 74. Retrieving sequences for a sequence similarity search. Example 74 >vgsc-Segment3-DomainII RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQD GQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  75. 75. BLAT search input Example 75 >vgsc-Segment3-DomainII RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQD GQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  76. 76. BLAT search results Example 76 • High-scoring segment pairs (hsp) are listed in tabulated format. • Clicking on one line of results sends you to those coordinates.
  77. 77. BLAST at i5K https://i5k.nal.usda.gov/blast Example 77 >vgsc-Segment3-DomainII RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQD GQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  78. 78. BLAST at i5K https://i5k.nal.usda.gov/blast Example 78
  79. 79. BLAST at i5K: hsps in “BLAST+ Results” track Example 79
  80. 80. Creating a new gene model: drag and drop Example 80 • Apollo automatically calculates longest ORF. • In this case, ORF includes the high-scoring segment pairs (hsp), marked here in blue. • Note that gene is transcribed from reverse strand.
  81. 81. Available Tracks Example 81
  82. 82. Get Sequence Example 82 http://blast.ncbi.nlm.nih.gov/Blast.cgi
  83. 83. Also, flanking sequences (other gene models) vs. NCBI nr Example 83 In this case, two gene models upstream, at 5’ end. BLAST hsps
  84. 84. Review alignments Example 84 HaztTmpM006234 HaztTmpM006233 HaztTmpM006232
  85. 85. Hypothesis for vgsc gene model Example 85
  86. 86. Editing: merge the three models Example 86 Merge by dropping an exon or gene model onto another. Merge by selecting two exons (holding down “Shift”) and using the right click menu. or…
  87. 87. Result of merging the gene models: Example 87
  88. 88. Editing: correct offending splice site Example 88 Modify exon / intron boundary: - Drag the end of the exon to the nearest canonical splice site. or - Use right-click menu.
  89. 89. Editing: set translation start Example 89
  90. 90. Editing: delete exon not supported by evidence Example 90 Delete first exon from HaztTmpM006233
  91. 91. Editing: add an exon supported by RNAseq Example 91 • RNAseq reads show evidence in support of transcribed product, which was not predicted. • Add exon at coordinates 97946-98012 by dragging up one of the RNAseq reads.
  92. 92. Editing: adjust offending splice site using evidence Example 92
  93. 93. Editing: adjust other boundaries supported by evidence Example 93
  94. 94. Finished model Example 94 Corroborate integrity and accuracy of the model: - Start and Stop - Exon structure and splice sites …]5’-GT/AG-3’[… - Check the predicted protein product vs. NCBI nr, UniProt, etc.
  95. 95. Information Editor • DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq • PubMed identifier: PMID: 24065824 • Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518. • Comments • Name, Symbol • Approve / Delete radio button Example 95 Comments (if applicable)
  96. 96. Go play!
  97. 97. PUBLIC DEMO 97 | APOLLO ON THE WEB instructions At i5K 1. Register for access to Apollo at the i5K Workspace@NAL at https://i5k.nal.usda.gov/web-apollo-registration 2. Contact the coordinator for each species community to receive more information about how to contribute. Contact info is available on each organism’s page.
  98. 98. PUBLIC DEMO 98 | APOLLO ON THE WEB instructions Public Honey bee demo available at: http://GenomeArchitect.org/WebApolloDemo Username: demo@demo.com Password: demo
  99. 99. APOLLO demonstration PUBLIC DEMO 99 Demonstration video is available at https://youtu.be/VgPtAP_fvxY
  100. 100. OUTLINE • BIO-REFRESHER biological concepts for curation • ANNOTATION automatic predictions • MANUAL ANNOTATION necessary, collaborative • APOLLO advancing collaborative curation • EXAMPLE demos
  101. 101. Apollo Development Nathan Dunn Technical Lead Eric Yao Christine Elsik’s Lab, University of Missouri Suzi Lewis Principal Investigator BBOP Moni Munoz-Torres Project Manager Deepak Unni JBrowse. Ian Holmes’ Lab University of California, Berkeley
  102. 102. • Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI). • § Christine G. Elsik (PI). University of Missouri. • * Ian Holmes (PI). University of California Berkeley. • Arthropod genomics community & i5K Steering Committee. • Stephen Ficklin, GenSAS, Washington State University • Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 • For your attention, thank you! Apollo Nathan Dunn Deepak Unni § Gene Ontology Chris Mungall Seth Carbon Heiko Dietze BBOP Learn more about Apollo at http://GenomeArchitect.org Thank you! NAL at USDA Monica Poelchau Mei-Ju Chen Christopher Childers Gary Moore HGSC at BCM fringy Richards Kim Worley JBrowse Eric Yao *
  103. 103. Interface Updates Annotator Panel
  104. 104. Interface Updates gene mRNA
  105. 105. Update: Transforming coordinates Bringing exons closer together to facilitate annotation of gene models with long introns. 1,275 bp Concept for Apollo v2.1 – Northern Spring 2016
  106. 106. Transforming coordinates Assembly artifacts may cause gene models to be split across two or more scaffolds. To facilitate annotation, Apollo allows the generation of an artificial space where the annotation can be completed. Scaffold 2Scaffold 1 Genome Assembly . . . . . . Scaffold n
