Genome annotation
This presentation explains genome annotation. It was presented at PMAS-UAAR, Pakistan.

Delivered by: MUHAMMAD TAJAMMAL KHAN

    Genome annotation: Presentation Transcript

    • Genome Annotation. Delivered by Muhammad Tajammal Khan, M.Phil (Botany), 11-arid-3759
    • Definition: Genome annotation is the process of interpreting raw sequence data into useful biological information. Annotations describe the genome and transform raw genome sequences into biological information by integrating computational analyses, other biological data, and biological expertise.
    • [Figure: unannotated DNA (5' to 3') compared with annotated DNA. Legend: exon (protein coding), intron, intergenic sequence]
    • Annotation may be:
      ◦ Structural annotation: ORFs and their localisation (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), gene structure, coding regions, location of regulatory motifs
      ◦ Functional annotation: biochemical function, biological function, regulation and interactions involved, expression
    • Things we are looking to annotate:
      ◦ CDS
      ◦ mRNA
      ◦ Promoter and poly-A signal
      ◦ Pseudogenes
      ◦ ncRNA
    • Tools ORF detectors ◦ NCBI: http://www.ncbi.nih.gov/gorf/gorf.html Promoter predictors ◦ CSHL: http://rulai.cshl.org/software/index1.htm ◦ BDGP: fruitfly.org/seq_tools/promoter.html ◦ ICG: TATA-Box predictor PolyA signal predictors ◦ CSHL: argon.cshl.org/tabaska/polyadq_form.html Splice site predictors ◦ BDGP: http://www.fruitfly.org/seq_tools/splice.html Start-/stop-codon identifiers ◦ DNALC: Translator/ORF-Finder ◦ BCM: Searchlauncher
    • Overview of genome analysis
    • Two approaches to genome sequencing: Whole Genome Shotgun. An approach used to decode an organism's genome by shredding it into smaller fragments of DNA which can be sequenced individually. The sequences of these fragments are then ordered, based on overlaps between their sequences, and finally reassembled into the complete sequence. The whole genome shotgun (WGS) method is applied to the entire genome all at once, while the hierarchical shotgun method is applied to large, overlapping DNA fragments of known location in the genome.
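
The "order by overlap, then reassemble" step can be sketched with a toy greedy assembler in Python. This is a minimal illustration under simplifying assumptions (error-free reads, a single strand, no repeats); production assemblers use far more elaborate methods. The names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))
```

Reads that share no overlap of at least `min_len` bases are returned as separate sequences, which mirrors how gaps in coverage leave an assembly in several pieces.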
    • Two approaches to genome sequencing: Hierarchical shotgun method. Assemble contigs from various chromosomes, then sequence and assemble them. A contig is a set of overlapping clones or sequences from which a sequence can be obtained. A contig map is thus a chromosome map showing the locations of those regions of a chromosome where contiguous DNA segments overlap. Contig maps are important because they make it possible to study a complete, and often large, segment of the genome by examining a series of overlapping clones, which together provide an unbroken succession of information about that region.
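
As a rough illustration of how overlapping clones of known location form contigs, the Python sketch below groups clones whose chromosome coordinates overlap. The clone names, the coordinates, and the function `build_contigs` are invented for the example and are not taken from the presentation.

```python
def build_contigs(clones):
    """Group clones into contigs by coordinate overlap.

    clones: list of (name, start, end) tuples on one chromosome.
    Returns a list of (contig_start, contig_end, [clone names]).
    Hypothetical helper for illustration only.
    """
    contigs = []
    for name, start, end in sorted(clones, key=lambda c: c[1]):
        if contigs and start <= contigs[-1][1]:
            # Clone overlaps the current contig: extend it.
            s, e, names = contigs[-1]
            contigs[-1] = (s, max(e, end), names + [name])
        else:
            # Gap before this clone: start a new contig.
            contigs.append((start, end, [name]))
    return contigs

if __name__ == "__main__":
    clones = [("B", 80, 200), ("A", 0, 100), ("C", 150, 260), ("D", 400, 500)]
    for contig in build_contigs(clones):
        print(contig)
```

In this example clones A, B and C form one unbroken contig, while D stands alone because no clone bridges the gap before it.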
    • Hierarchical vs. Whole Genome
    • Sequencing technology in 12 steps
    • 1. Prepare genomic DNA: Randomly fragment genomic DNA and ligate adapters to both ends of the fragments.
    • 2. Attach DNA to surface: Bind single-stranded fragments randomly to the inside surface of the flow cell channels. (Figure labels: adapter, DNA fragment, dense lawn of primers.)
    • 3. Bridge amplification: Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.
    • 4. Fragments become double stranded: The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate. (Figure labels: attached terminus, free terminus.)
    • 5. Denature the double-stranded molecules: Denaturation leaves single-stranded templates anchored to the substrate.
    • 6. Complete amplification: Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell.
    • 7. Determine first base: The first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase.
    • 8. Image first base: After laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified.
    • 9. Determine second base: The next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase.
    • 10. Image second chemistry cycle: After laser excitation the image is captured as before, and the identity of the second base is recorded.
    • 11. Sequencing over multiple chemistry cycles: The sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time.
    • 12. Align data: The data are aligned and compared to a reference sequence, and sequencing differences are identified.
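
The final step, comparing aligned reads to a reference and reporting differences, can be sketched as a simple pileup in Python. The sketch assumes each read already comes with an ungapped, 0-based alignment position; finding those positions is the job of a read aligner and is not shown. The function `call_differences` and its `min_depth` parameter are hypothetical.

```python
from collections import Counter, defaultdict

def call_differences(reference, aligned_reads, min_depth=2):
    """Report positions where the most common read base differs from the reference.

    aligned_reads: list of (position, read_sequence) pairs, where position
    is the 0-based reference coordinate of the first base of the read.
    Returns a list of (position, reference_base, observed_base).
    """
    pileup = defaultdict(Counter)
    for pos, read in aligned_reads:
        for offset, base in enumerate(read):
            if pos + offset < len(reference):
                pileup[pos + offset][base] += 1

    differences = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        base, _ = counts.most_common(1)[0]
        depth = sum(counts.values())
        # Require a minimum number of reads before trusting a difference.
        if depth >= min_depth and base != reference[pos]:
            differences.append((pos, reference[pos], base))
    return differences

if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = [(0, "ACGTTC"), (2, "GTTCGT"), (4, "TCGTAC")]
    print(call_differences(reference, reads))
```

Requiring several reads to agree before reporting a difference is a crude guard against individual sequencing errors; real variant callers also weigh base quality and alignment quality.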
    • The generic structure of an automatic genome annotation pipeline and delivery system
    • The Annotation Process [Figure labels: DNA sequence, analysis software, annotator, useful information]
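    • Annotation Process [Figure labels]
      ◦ Input: DNA sequence
      ◦ DNA-level analyses: RepeatMasker, Blastn, gene finders, Blastx, Halfwise, tRNA scan
      ◦ Features: repeats, promoters, rRNA, pseudogenes, tRNA, genes
      ◦ Protein-level analyses: Fasta, BlastP, Pfam, Prosite, Psort, SignalP, TMHMM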
    • Genome Browsers
      ◦ Generic Genome Browser (CSHL): www.wormbase.org/db/seq/gbrowse
      ◦ NCBI Map Viewer: www.ncbi.nlm.nih.gov/mapview/
      ◦ Ensembl Genome Browser: www.ensembl.org/
      ◦ UCSC Genome Browser: genome.ucsc.edu/cgi-bin/hgGateway?org=human
      ◦ Apollo Genome Browser: www.bdgp.org/annot/apollo/
    • What is gene prediction? Detecting meaningful signals in uncharacterised DNA sequences. This requires knowledge of the interesting information in DNA. Example sequence: GATCGGTCGAGCGTAAGCTAGCTAG ATCGATGATCGATCGGCCATATATC ACTAGAGCTAGAATCGATAATCGAT CGATATAGCTATAGCTATAGCCTAT. Gene prediction is ‘recognising protein-coding regions in genomic sequence’.
    • Basic Gene Prediction Flow Chart
      Obtain new genomic DNA sequence
        1. Translate in all six reading frames and compare to protein sequence databases
        2. Perform a database similarity search against the expressed sequence tag (EST) database of the same organism, or cDNA sequences if available
      Use a gene prediction program to locate genes
      Analyze regulatory sequences in the gene
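
Step 1 of the flow chart, translating in all six reading frames, can be sketched in plain Python. The sketch uses the standard genetic code and marks stop codons with `*`; the function names and the `+1` to `-3` frame labels are choices made for this illustration. Comparing the resulting protein strings against a database would be a separate step.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames

if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Translating both strands matters because a gene can lie on either strand, and the correct frame is unknown for a new sequence.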
    • Approaches to gene prediction
      Ab initio gene finding
        ◦ http://exon.gatech.edu/GenMark/eukhmm.cgi
        ◦ http://sun1.softberry.com/berry.phtml=fgenesh&group=programs&subgroup=gfind
      Repeat masking
        ◦ http://www.repeatmasker.org
        ◦ http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
      Transcript-based prediction
        ◦ http://plantta.tigr.org
        ◦ http://harvest.ucr.edu/
      Gene function, cDNA
        ◦ http://au.expasy.org/sprot/
        ◦ http://www.pir.uniprot.org/
      Gene ontologies
        ◦ http://www.geneontology.org
    • Visualization Tools
      ◦ http://www.gmod.org/?q=node/4
      ◦ http://www.gmod.org/?q=node/71
    • Prediction of Secondary Structure and Folding Classes
      ◦ nnpredict: http://www.cmpharm.ucsf.edu/_nomi/nnpredict.html
      ◦ PredictProtein: http://www.embl-heidelberg.de/predictprotein/
      ◦ SOPMA: http://pbil.ibcp.fr/
      ◦ Jpred: http://jura.ebi.ac.uk:8888/
      ◦ PSIPRED: http://insulin.brunel.ac.uk/psipred
      ◦ PREDATOR: http://www.embl-heidelberg.de/predator.html
      Prediction of Specialized Structures or Features
      ◦ COILS: http://www.ch.embnet.org/software/COILSform.html
      ◦ MacStripe: www.york.ac.uk/depts/biol/units/coils/mstr2.html
      ◦ PHDtopology: http://www.embl-heidelberg.de/predictprotein
      ◦ SignalP: http://www.cbs.dtu.dk/services/SignalP/
      ◦ TMpred: http://www.isrec.isb-sib.ch/ftp-erver/tmpred www/TMPREDform.html
      Structure Prediction
      ◦ DALI: http://www2.ebi.ac.uk/dali/
      ◦ Bryant-Lawrence: ftp://ncbi.nlm.nih.gov/pub/pkb/
      ◦ FSSP: http://www2.ebi.ac.uk/dali/fssp/
      ◦ UCLA-DOE: http://fold.doe-mbi.ucla.edu/Home
      ◦ SWISS-MODEL: http://www.expasy.ch/swissmod/SWISS-MODEL
    • Search using the gene name
    • Click the name for transcript info
    • Click the contig to see detailed information
    • Click here to export the contig
    • Select output format, then click Continue
    • Save or copy the data for further analysis
    • Genomic region
    • Coding sequence
    • Click here for pairwise BLAST
    • Paste in the genomic sequence and the CDS sequence, then click ‘Align’ to proceed
    • This is a protein BLAST (BLASTP)
    • Some concluding remarks: Trust but verify. Beware of gene prediction tools! Always use more than one gene prediction tool and more than one genome when possible. This is an active area of bioinformatics research, so be mindful of the new literature in this field.
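
One simple way to act on the advice to use more than one gene prediction tool is to keep only the predictions that several tools agree on. The Python sketch below counts, for each predicted interval, how many tools report an overlapping gene. The tool names, the coordinates, and the function `supported_predictions` are hypothetical, and plain coordinate overlap is only a rough stand-in for a real comparison of gene models.

```python
def supported_predictions(predictions, min_tools=2):
    """Keep predicted genes that overlap predictions from other tools.

    predictions: dict mapping tool name to a list of (start, end) intervals,
    with end exclusive.
    Returns a list of (tool, start, end, n_supporting_tools), where the
    count includes the tool that made the prediction.
    """
    supported = []
    for tool, genes in predictions.items():
        for start, end in genes:
            support = 1 + sum(
                any(s < end and start < e for s, e in other_genes)
                for other, other_genes in predictions.items()
                if other != tool
            )
            if support >= min_tools:
                supported.append((tool, start, end, support))
    return supported

if __name__ == "__main__":
    # Made-up intervals from three hypothetical tools.
    predictions = {
        "toolA": [(100, 500), (900, 1200)],
        "toolB": [(120, 480)],
        "toolC": [(110, 510), (2000, 2500)],
    }
    for row in supported_predictions(predictions):
        print(row)
```

Predictions made by a single tool are not necessarily wrong, but they are the ones most worth checking by hand.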