Genome annotation
This presentation elaborates on genome annotation. It was presented at PMAS-UAAR, Pakistan.
Delivered by: Muhammad Tajammal Khan

  1. Genome Annotation. Delivered by Muhammad Tajammal Khan, M.Phil (Botany), 11-arid-3759
  2. Definition: Genome annotation is the process of interpreting raw sequence data into useful biological information. Annotations describe the genome and transform raw genome sequences into biological information by integrating computational analyses, other biological data and biological expertise.
  3. Unannotated DNA (5' to 3') compared with annotated DNA. Legend: exon (protein coding), intron, intergenic sequence.
  4. Annotation may be:
     Structural annotation
       ◦ ORFs and their localisation (http://www.ncbi.nlm.nih.gov/gorf/gorf.html)
       ◦ gene structure
       ◦ coding regions
       ◦ location of regulatory motifs
     Functional annotation
       ◦ biochemical function
       ◦ biological function
       ◦ involved regulation and interactions
       ◦ expression
  5. Things we are looking to annotate: CDS, mRNA, promoter and poly-A signal, pseudogenes, ncRNA.
  6. Tools
     ORF detectors
       ◦ NCBI: http://www.ncbi.nih.gov/gorf/gorf.html
     Promoter predictors
       ◦ CSHL: http://rulai.cshl.org/software/index1.htm
       ◦ BDGP: fruitfly.org/seq_tools/promoter.html
       ◦ ICG: TATA-Box predictor
     PolyA signal predictors
       ◦ CSHL: argon.cshl.org/tabaska/polyadq_form.html
     Splice site predictors
       ◦ BDGP: http://www.fruitfly.org/seq_tools/splice.html
     Start-/stop-codon identifiers
       ◦ DNALC: Translator/ORF-Finder
       ◦ BCM: Searchlauncher
  7. Overview of genome analysis
  8. Two approaches to genome sequencing: Whole Genome Shotgun. An approach used to decode an organism's genome by shredding it into smaller fragments of DNA which can be sequenced individually. The sequences of these fragments are then ordered, based on overlaps between them, and finally reassembled into the complete sequence. The whole genome shotgun (WGS) method is applied to the entire genome all at once, while the hierarchical shotgun method is applied to large, overlapping DNA fragments of known location in the genome.
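The idea of ordering fragments by their overlaps and merging them can be illustrated with a small greedy sketch in Python. This is a toy under stated assumptions (error-free reads, one strand, no fragment contained inside another); `overlap` and `greedy_assemble` are hypothetical names, and production assemblers use far more elaborate methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more: several contigs remain
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

If the loop stops with more than one sequence left, each remaining sequence corresponds to a separate contig.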
  9. Two approaches to genome sequencing: Hierarchical shotgun method. Assemble contigs from various chromosomes, then sequence and assemble them. A contig is a set of overlapping clones or sequences from which a sequence can be obtained. A contig is thus a chromosome map showing the locations of those regions of a chromosome where contiguous DNA segments overlap. Contig maps are important because they provide the ability to study a complete, and often large, segment of the genome by examining a series of overlapping clones which then provide an unbroken succession of information about that region.
  10. Hierarchical vs. Whole Genome
  11. Sequencing technology in 12 steps
  12. Steps 1 to 6: (1) Prepare genomic DNA; (2) Attach DNA to surface; (3) Bridge amplification; (4) Fragments become double stranded; (5) Denature the double-stranded molecules; (6) Complete amplification. Step 1, prepare genomic DNA: randomly fragment genomic DNA and ligate adapters to both ends of the fragments.
  13. Step 2, attach DNA to surface: bind single-stranded fragments randomly to the inside surface of the flow cell channels. (Figure labels: adapter, DNA fragment, dense lawn of primers.)
  14. Step 3, bridge amplification: add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.
  15. Step 4, fragments become double stranded: the enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate. (Figure labels: attached terminus, free terminus.)
  16. Step 5, denature the double-stranded molecules: denaturation leaves single-stranded templates anchored to the substrate.
  17. Step 6, complete amplification: several million dense clusters of double-stranded DNA are generated in each channel of the flow cell. (Figure label: clusters.)
  18. Steps 7 to 12: (7) Determine first base; (8) Image first base; (9) Determine second base; (10) Image second chemistry cycle; (11) Sequencing over multiple chemistry cycles; (12) Align data. Step 7, determine first base: the first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase. (Figure label: laser.)
  19. Step 8, image first base: after laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified.
  20. Step 9, determine second base: the next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase. (Figure label: laser.)
  21. Step 10, image second chemistry cycle: after laser excitation the image is captured as before, and the identity of the second base is recorded.
  22. Step 11, sequencing over multiple chemistry cycles: the sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time.
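The cycle-by-cycle readout described in steps 7 to 11 can be pictured with a toy base caller in Python. Everything here is an assumption made for illustration: the channel order `ACGT`, the intensity values, and the rule "call the brightest channel" are a simplification and not the instrument's actual signal processing.

```python
CHANNELS = "ACGT"  # assumed order of the four fluorescence channels

def call_bases(cycles):
    """Call one base per chemistry cycle for a single cluster.

    cycles: list of 4-tuples, the (hypothetical) intensity seen in each
    channel after laser excitation in that cycle.
    """
    read = []
    for intensities in cycles:
        brightest = max(range(4), key=lambda k: intensities[k])
        read.append(CHANNELS[brightest])
    return "".join(read)

# Three cycles of made-up intensities for one cluster.
example = [(0.9, 0.1, 0.05, 0.02), (0.1, 0.2, 0.8, 0.1), (0.0, 0.1, 0.1, 0.7)]
print(call_bases(example))
```

Each additional cycle appends one base, which is why read length is set by the number of chemistry cycles run.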
  23. Step 12, align data: the data are aligned and compared to a reference, and sequencing differences are identified. (Figure label: reference sequence.)
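As a rough picture of step 12, the Python sketch below places a read on a reference and lists the positions where they differ. It is a deliberately naive stand-in: it assumes an ungapped alignment on the forward strand and tries every offset, whereas real read aligners use indexed and gapped methods. The names `best_ungapped_hit` and `differences` are hypothetical.

```python
def best_ungapped_hit(read, reference):
    """Slide the read along the reference; return (offset, mismatches) for the best placement."""
    best_offset, best_mm = None, None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mm = sum(1 for r, w in zip(read, window) if r != w)
        if best_mm is None or mm < best_mm:
            best_offset, best_mm = offset, mm
    return best_offset, best_mm

def differences(read, reference, offset):
    """List (reference_position, reference_base, read_base) for every mismatch."""
    return [(offset + i, reference[offset + i], base)
            for i, base in enumerate(read)
            if reference[offset + i] != base]

reference = "ACGTTGCAAGGCT"
read = "TGCTAG"
offset, mismatches = best_ungapped_hit(read, reference)
print(offset, mismatches, differences(read, reference, offset))
```

A single mismatch like the one reported here could be a true variant or a sequencing error; telling them apart needs many overlapping reads.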
  24. The generic structure of an automatic genome annotation pipeline and delivery system
  25. The Annotation Process (diagram labels): DNA sequence, analysis software, annotator, useful information.
  26. Annotation Process (diagram labels): DNA sequence; RepeatMasker, Blastn, gene finders, Blastx, Halfwise, tRNA scan; repeats, promoters, rRNA, pseudogenes, tRNA, genes; Fasta, BlastP, Pfam, Prosite, Psort, SignalP, TMHMM.
  27. Genome Browsers
       ◦ Generic Genome Browser (CSHL): www.wormbase.org/db/seq/gbrowse
       ◦ NCBI Map Viewer: www.ncbi.nlm.nih.gov/mapview/
       ◦ Ensembl Genome Browser: www.ensembl.org/
       ◦ UCSC Genome Browser: genome.ucsc.edu/cgi-bin/hgGateway?org=human
       ◦ Apollo Genome Browser: www.bdgp.org/annot/apollo/
  28. What is gene prediction? Detecting meaningful signals in uncharacterised DNA sequences. Knowledge of the interesting information in DNA. Example sequence: GATCGGTCGAGCGTAAGCTAGCTAG ATCGATGATCGATCGGCCATATATC ACTAGAGCTAGAATCGATAATCGAT CGATATAGCTATAGCTATAGCCTAT. Gene prediction is 'recognising protein-coding regions in genomic sequence'.
  29. Basic Gene Prediction Flow Chart
      Obtain new genomic DNA sequence, then:
      1. Translate in all six reading frames and compare to protein sequence databases.
      2. Perform a database similarity search against the expressed sequence tag (EST) database of the same organism, or against cDNA sequences if available.
      Use a gene prediction program to locate genes.
      Analyze regulatory sequences in the gene.
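Step 1 of the flow chart, translation in all six reading frames, can be sketched in a few lines of Python. The sketch assumes the standard genetic code and an input made only of A, C, G and T (any other codon is shown as `X`); the function names are hypothetical. Comparing the resulting peptides to protein databases is a separate step that is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons only; '*' marks a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptide strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = translate(seq[k:])
        frames[f"-{k + 1}"] = translate(rc[k:])
    return frames

for frame, peptide in six_frame_translation("ATGGCCTAA").items():
    print(frame, peptide)
```

Frames +1 to +3 read the given strand at the three possible offsets, and frames -1 to -3 do the same on the reverse complement.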
  30. Approaches to gene prediction
     Ab initio gene finding
       ◦ http://exon.gatech.edu/GenMark/eukhmm.cgi
       ◦ http://sun1.softberry.com/berry.phtml=fgenesh&group=programs&subgroup=gfind
     Repeat masking
       ◦ http://www.repeatmasker.org
       ◦ http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
     Transcript based prediction
       ◦ http://plantta.tigr.org
       ◦ http://harvest.ucr.edu/
     Gene function (cDNA)
       ◦ http://au.expasy.org/sprot/
       ◦ http://www.pir.uniprot.org/
     Gene Ontologies
       ◦ http://www.geneontology.org
  31. Visualization Tools
       ◦ http://www.gmod.org/?q=node/4
       ◦ http://www.gmod.org/?q=node/71
  32. Prediction of Secondary Structure and Folding Classes
       • nnpredict: http://www.cmpharm.ucsf.edu/_nomi/nnpredict.html
       • PredictProtein: http://www.embl-heidelberg.de/predictprotein/
       • SOPMA: http://pbil.ibcp.fr/
       • Jpred: http://jura.ebi.ac.uk:8888/
       • PSIPRED: http://insulin.brunel.ac.uk/psipred
       • PREDATOR: http://www.embl-heidelberg.de/predator.html
     Prediction of Specialized Structures or Features
       • COILS: http://www.ch.embnet.org/software/COILSform.html
       • MacStripe: www.york.ac.uk/depts/biol/units/coils/mstr2.html
       • PHDtopology: http://www.embl-heidelberg.de/predictprotein
       • SignalP: http://www.cbs.dtu.dk/services/SignalP/
       • TMpred: http://www.isrec.isb-sib.ch/ftp-erver/tmpred www/TMPREDform.html
     Structure Prediction
       • DALI: http://www2.ebi.ac.uk/dali/
       • Bryant-Lawrence: ftp://ncbi.nlm.nih.gov/pub/pkb/
       • FSSP: http://www2.ebi.ac.uk/dali/fssp/
       • UCLA-DOE: http://fold.doe-mbi.ucla.edu/Home
       • SWISS-MODEL: http://www.expasy.ch/swissmod/SWISS-MODEL
  33. Search using the gene name
  34. Click the name for transcript info
  35. Click the contig to see detailed information
  36. Click the contig to see detailed information. Click here to export the contig
  37. Select output format, then click Continue
  38. Save or copy the data for further analysis
  39. Genomic region
  40. Coding sequence
  41. Click here for pairwise BLAST
  42. Paste in genomic sequence. Paste in CDS sequence. Click 'Align' to proceed
  43. This is a protein BLAST (BLASTP)
  44. Some concluding remarks
       • Trust but verify.
       • Beware of gene prediction tools! Always use more than one gene prediction tool and more than one genome when possible.
       • This is an active area of bioinformatics research, so be mindful of the new literature in this area.
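One simple way to act on the advice to use more than one gene prediction tool is to check which predictions from one tool are supported by another. The Python sketch below assumes each tool's output has already been reduced to half-open `(start, end)` coordinates on the same sequence and strand; the function name, the `min_overlap` parameter and the coordinates are made up for illustration.

```python
def supported_by_both(pred_a, pred_b, min_overlap=1):
    """Return the intervals from pred_a that overlap some interval in pred_b.

    pred_a, pred_b: lists of (start, end) half-open intervals.
    min_overlap: minimum number of shared bases to count as support.
    """
    confirmed = []
    for a_start, a_end in pred_a:
        for b_start, b_end in pred_b:
            shared = min(a_end, b_end) - max(a_start, b_start)
            if shared >= min_overlap:
                confirmed.append((a_start, a_end))
                break  # one supporting prediction is enough
    return confirmed

tool_a = [(100, 500), (900, 1200), (2000, 2300)]   # hypothetical predictions
tool_b = [(120, 480), (1500, 1800), (2250, 2600)]  # hypothetical predictions
print(supported_by_both(tool_a, tool_b))
```

Predictions made by only one tool are not necessarily wrong, but they are the ones most worth verifying against transcript or protein evidence.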
