Genome Annotation
      Delivered by
  Muhammad Tajammal Khan
      M.Phil (Botany)
      11-arid-3759
Definition: Genome annotation is the process of
interpreting raw sequence data into useful biological
information. Annotations describe the genome and
transform raw genome sequences into biological
information by integrating computational analyses,
other biological data, and biological expertise.
[Figure: an unannotated DNA strand (5' to 3') shown above the same strand annotated with its features.]
Legend: exon (protein coding), intron, intergenic sequence
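
The three legend categories can be derived from exon coordinates alone. The sketch below is a minimal illustration, assuming hypothetical 1-based inclusive exon coordinates and non-overlapping genes on a single strand; the function name classify_regions and the example coordinates are invented for illustration.

```python
def classify_regions(seq_length, genes):
    """Label every position 1..seq_length as exon, intron or intergenic.

    genes: list of genes, each a list of (start, end) exons, 1-based inclusive.
    Assumes genes do not overlap one another (a simplification).
    """
    regions = []
    cursor = 1  # first position that has not been labelled yet
    for exons in sorted(sorted(gene) for gene in genes):
        gene_start = exons[0][0]
        if gene_start > cursor:
            regions.append((cursor, gene_start - 1, "intergenic"))
        for i, (start, end) in enumerate(exons):
            regions.append((start, end, "exon"))
            if i + 1 < len(exons):
                next_start = exons[i + 1][0]
                if next_start > end + 1:
                    # The gap between two exons of the same gene is an intron.
                    regions.append((end + 1, next_start - 1, "intron"))
        cursor = exons[-1][1] + 1
    if cursor <= seq_length:
        regions.append((cursor, seq_length, "intergenic"))
    return regions


# Hypothetical example: a two-exon gene and a single-exon gene on a 100 bp sequence.
for region in classify_regions(100, [[(11, 20), (31, 40)], [(61, 70)]]):
    print(region)
```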
Annotation may be structural or functional
Structural annotation
ORFs and their localisation (http://www.ncbi.nlm.nih.gov/gorf/gorf.html)
gene structure
coding regions
location of regulatory motifs


Functional annotation
biochemical function
biological function
regulation and interactions involved
expression
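
Locating ORFs, the first structural annotation task listed above, can be sketched as follows. This is a minimal illustration and not the algorithm of the NCBI ORF Finder: it assumes ATG as the only start codon and TAA, TAG and TGA as stop codons, and it reports coordinates on whichever strand was scanned. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end) for each ATG...stop ORF.

    Coordinates are 0-based, end-exclusive, and refer to the scanned strand
    (for "-" they index into the reverse complement). min_codons counts the
    start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first in-frame ATG
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATAGCC"))
```

ORFs that run off the end of the sequence without a stop codon are not reported here; a real ORF detector would usually offer that as an option.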
What are we looking to annotate?

   CDS
   mRNA
   Promoter and Poly-A Signal
   Pseudogenes
   ncRNA
Tools
   ORF detectors
    ◦ NCBI: http://www.ncbi.nih.gov/gorf/gorf.html
   Promoter predictors
    ◦ CSHL: http://rulai.cshl.org/software/index1.htm
    ◦ BDGP: fruitfly.org/seq_tools/promoter.html
    ◦ ICG: TATA-Box predictor
   PolyA signal predictors
    ◦ CSHL: argon.cshl.org/tabaska/polyadq_form.html
   Splice site predictors
    ◦ BDGP:
      http://www.fruitfly.org/seq_tools/splice.html
   Start-/stop-codon identifiers
    ◦ DNALC: Translator/ORF-Finder
    ◦ BCM: Searchlauncher
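
The predictors listed above rely on trained statistical models. As a much simpler illustration of what a signal scan does, the sketch below searches for two consensus motifs with regular expressions. The patterns (TATA[AT]A[AT] for a TATA box and AATAAA for the canonical poly-A signal) are simplified textbook consensus sequences chosen for this example, and scan_signals is a hypothetical helper, not part of any tool named above.

```python
import re

# Simplified consensus motifs (an assumption for illustration only).
SIGNALS = {
    "TATA_box": "TATA[AT]A[AT]",
    "polyA_signal": "AATAAA",
}


def scan_signals(seq):
    """Return sorted (position, signal_name, matched_text) hits, 0-based."""
    seq = seq.upper()
    hits = []
    for name, pattern in SIGNALS.items():
        # The lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)


print(scan_signals("ccTATAAAAggAATAAAcc"))
```

A consensus match on its own is weak evidence; many such hexamers occur by chance, which is why dedicated predictors also score the surrounding sequence.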
Overview of genome analysis
Two approaches to genome sequencing
Whole Genome Shotgun
An approach used to decode an organism's genome
by shredding it into smaller fragments of DNA that
can be sequenced individually. The sequences of these
fragments are then ordered, based on overlaps between
their sequences, and finally reassembled into the complete
sequence. The 'whole genome shotgun' (WGS) method is
applied to the entire genome all at once, while the
'hierarchical shotgun' method is applied to large,
overlapping DNA fragments of known location in
the genome.
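
The idea of ordering fragments by their overlaps and merging them can be sketched with a toy greedy assembler. This is a minimal illustration that assumes error-free fragments from one strand and no repeats; production assemblers use far more robust methods. The function names and the example fragments are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more: the rest stay separate
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping fragments of the toy sequence ATGCGTACGTTAGC.
print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
```

Fragments that share no overlap are returned as separate sequences, which is the toy equivalent of an assembly left in several pieces.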

Hierarchical shotgun method
Build contigs of large overlapping clones for each chromosome, then sequence
each clone and assemble the results. A contig is a set of overlapping clones or
sequences from which a sequence can be obtained.

A contig map is thus a chromosome map showing the locations of those regions of
a chromosome where contiguous DNA segments overlap. Contig maps are
important because they provide the ability to study a complete, and often
large, segment of the genome by examining a series of overlapping clones,
which then provide an unbroken succession of information about
that region.
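
Grouping clones of known location into contigs amounts to merging overlapping intervals. The sketch below is a minimal illustration, assuming each clone has already been mapped to hypothetical start and end positions on a chromosome; build_contigs and the clone names are invented for this example.

```python
def build_contigs(clones):
    """Group mapped clones into contigs.

    clones: list of (name, start, end) positions on one chromosome.
    Returns a list of (contig_start, contig_end, [clone names]).
    """
    contigs = []
    for name, start, end in sorted(clones, key=lambda c: (c[1], c[2])):
        if contigs and start <= contigs[-1][1]:
            # The clone overlaps the current contig: extend it.
            c_start, c_end, names = contigs[-1]
            contigs[-1] = (c_start, max(c_end, end), names + [name])
        else:
            # A gap in coverage starts a new contig.
            contigs.append((start, end, [name]))
    return contigs


clones = [("A", 0, 100), ("B", 80, 180), ("C", 300, 400), ("D", 150, 250)]
print(build_contigs(clones))
```

Here clones A, B and D form one unbroken contig, while clone C stands alone because nothing bridges the gap before it.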
Hierarchical vs. Whole Genome
Sequencing technology in 12 steps
1. Prepare genomic DNA: randomly fragment genomic DNA and ligate adapters to both ends of the fragments.
2. Attach DNA to surface: bind single-stranded fragments randomly to the inside surface of the flow cell channels, which carries a dense lawn of primers.
3. Bridge amplification: add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.
4. Fragments become double stranded: the enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate.
5. Denature the double-stranded molecules: denaturation leaves single-stranded templates anchored to the substrate.
6. Complete amplification: several million dense clusters of double-stranded DNA are generated in each channel of the flow cell.
7. Determine first base: the first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase.
8. Image first base: after laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified.
9. Determine second base: the next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase.
10. Image second chemistry cycle: after laser excitation the image is captured as before, and the identity of the second base is recorded.
11. Sequencing over multiple chemistry cycles: the sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time.
12. Align data: the data are aligned and compared to a reference sequence, and sequencing differences are identified.
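
The final step, aligning a read to a reference and reporting differences, can be sketched in a few lines. This is a minimal illustration that assumes ungapped placement on the forward strand and simply picks the position with the fewest mismatches; real read aligners use indexed, gap-aware algorithms. The function name and sequences are hypothetical.

```python
def align_read(reference, read):
    """Place a read on the reference without gaps.

    Returns (offset, differences), where differences is a list of
    (reference_position, reference_base, read_base), all 0-based.
    Returns None if the read is longer than the reference.
    """
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        diffs = [
            (offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        # Keep the first placement with the fewest mismatches.
        if best is None or len(diffs) < len(best[1]):
            best = (offset, diffs)
    return best


# The read matches reference[4:10] except for one base.
print(align_read("ACGTTGCAAGCT", "TGCTAG"))
```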
The generic structure of an automatic genome annotation pipeline and
delivery system
The Annotation Process
[Figure: DNA sequence, analysis software and the annotator together yield useful information.]
Annotation Process
Input: DNA sequence
Analyses run on the DNA sequence: RepeatMasker, Blastn, gene finders, Blastx, Halfwise, tRNA scan
Features identified: repeats, promoters, rRNA, pseudogenes, genes, tRNA
Follow-up analyses: Fasta, BlastP, Pfam, Prosite, Psort, SignalP, TMHMM
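
Features identified by the different analyses are commonly exchanged as GFF3, a nine-column tab-separated text format with 1-based inclusive coordinates. The sketch below is a minimal illustration of collecting features from several analyses into one GFF3 listing. The Feature class, the source labels, the IDs and the coordinates are all hypothetical, and no real tool output is parsed here.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    seqid: str       # sequence the feature lies on
    source: str      # analysis that produced the evidence (hypothetical labels)
    type: str        # feature type, e.g. gene, tRNA, repeat_region
    start: int       # 1-based, inclusive
    end: int
    strand: str      # "+", "-" or "."
    attributes: str  # e.g. "ID=gene1"

    def to_gff3(self):
        # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
        # "." marks the score and phase as undefined in this sketch.
        return "\t".join([
            self.seqid, self.source, self.type,
            str(self.start), str(self.end),
            ".", self.strand, ".", self.attributes,
        ])


def write_gff3(features):
    """Return a GFF3 document with features sorted by position."""
    lines = ["##gff-version 3"]
    for feature in sorted(features, key=lambda f: (f.seqid, f.start, f.end)):
        lines.append(feature.to_gff3())
    return "\n".join(lines)


features = [
    Feature("chr1", "gene_finder", "gene", 500, 1500, "+", "ID=gene1"),
    Feature("chr1", "repeat_scan", "repeat_region", 10, 200, "+", "ID=rep1"),
    Feature("chr1", "trna_scan", "tRNA", 2000, 2072, "-", "ID=trna1"),
]
print(write_gff3(features))
```

Keeping the source column filled in preserves which analysis supported each feature, which helps an annotator decide how much to trust it.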
Genome Browsers
Generic Genome Browser (CSHL): www.wormbase.org/db/seq/gbrowse
NCBI Map Viewer: www.ncbi.nlm.nih.gov/mapview/
Ensembl Genome Browser: www.ensembl.org/
UCSC Genome Browser: genome.ucsc.edu/cgi-bin/hgGateway?org=human
Apollo Genome Browser: www.bdgp.org/annot/apollo/
What is gene prediction?

Detecting meaningful signals in uncharacterised DNA sequences.
It draws on knowledge of which information in DNA is of interest.

 GATCGGTCGAGCGTAAGCTAGCTAG
 ATCGATGATCGATCGGCCATATATC
 ACTAGAGCTAGAATCGATAATCGAT
 CGATATAGCTATAGCTATAGCCTAT

Gene prediction is 'recognising protein-coding regions in genomic sequence'.
Basic Gene Prediction Flow Chart
Obtain new genomic DNA sequence
1. Translate in all six reading frames and compare to protein sequence databases
2. Perform database similarity search of the expressed sequence tag (EST) database of the same organism, or cDNA sequences if available
Use gene prediction program to locate genes
Analyze regulatory sequences in the gene
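
Step 1 of the flow chart, translation in all six reading frames, can be sketched as follows. This is a minimal illustration that assumes the standard genetic code, built here from the conventional TCAG ordering, and marks codons containing ambiguous bases as X. The function names are hypothetical, and the database comparison itself is not shown.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become X, stops become *."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


for label, protein in six_frame_translation("ATGGCCTAA").items():
    print(label, protein)
```

Each of the six protein strings would then be compared against a protein database, which is what a translated search such as Blastx does in one step.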
Approaches to gene prediction

Ab Initio Gene Finding
        http://exon.gatech.edu/GenMark/eukhmm.cgi
        http://sun1.softberry.com/berry.phtml=fgenesh&group=programs&subgroup=gfind
Repeat Masking
      http://www.repeatmasker.org
      http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
Transcript based prediction
       http://plantta.tigr.org
       http://harvest.ucr.edu/
Gene function CDNA
       http://au.expasy.org/sprot/
       http://www.pir.uniprot.org/
Gene Ontologies
       http://www.geneontology.org
Visualization Tools
         http://www.gmod.org/?q=node/4
         http://www.gmod.org/?q=node/71
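
Repeat masking, listed above as a step before gene finding, can be illustrated with soft masking, where repeat regions are written in lower case so that later tools can ignore them. The sketch below is a minimal illustration that assumes the repeat coordinates are already known (here as hypothetical 0-based, end-exclusive intervals); it does not detect repeats itself, and soft_mask is an invented helper.

```python
def soft_mask(seq, repeats):
    """Lower-case the given repeat intervals and upper-case everything else.

    repeats: list of (start, end) intervals, 0-based, end-exclusive.
    """
    chars = list(seq.upper())
    for start, end in repeats:
        chars[start:end] = [c.lower() for c in chars[start:end]]
    return "".join(chars)


def hard_mask(seq, repeats):
    """Replace the given repeat intervals with N instead of lower-casing them."""
    chars = list(seq.upper())
    for start, end in repeats:
        chars[start:end] = "N" * len(chars[start:end])
    return "".join(chars)


print(soft_mask("ACGTACGTAC", [(2, 5)]))
print(hard_mask("ACGTACGTAC", [(2, 5)]))
```

Soft masking keeps the underlying bases available, whereas hard masking discards them; which is appropriate depends on the downstream gene finder.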
Prediction of Secondary Structure and Folding Classes
• nnpredict      http://www.cmpharm.ucsf.edu/_nomi/nnpredict.html
• PredictProtein    http://www.embl-heidelberg.de/predictprotein/
• SOPMA              http://pbil.ibcp.fr/
• Jpred              http://jura.ebi.ac.uk:8888/
• PSIPRED            http://insulin.brunel.ac.uk/psipred
• PREDATOR           http://www.embl-heidelberg.de/predator.html
Prediction of Specialized Structures or Features
• COILS          http://www.ch.embnet.org/software/COILSform.html
• MacStripe          www.york.ac.uk/depts/biol/units/coils/mstr2.html
• PHDtopology        http://www.embl-heidelberg.de/predictprotein
• SignalP            http://www.cbs.dtu.dk/services/SignalP/
• TMpred             http://www.isrec.isb-sib.ch/ftp-erver/tmpred
                        www/TMPREDform.html
Structure Prediction
• DALI               http://www2.ebi.ac.uk/dali/
• Bryant-Lawrence ftp://ncbi.nlm.nih.gov/pub/pkb/
• FSSP                http://www2.ebi.ac.uk/dali/fssp/
• UCLA-DOE           http://fold.doe-mbi.ucla.edu/Home
• SWISS-MODEL         http://www.expasy.ch/swissmod/SWISS-MODEL
Search using the gene name
Click the name for transcript info
Click the contig to see detailed information
Click here to export the contig
Select output format, then click Continue
Save or copy the data for further analysis
Genomic region
Coding sequence
Click here for pairwise BLAST
Paste in genomic sequence
Paste in CDS sequence
Click 'Align' to proceed
This is a protein BLAST (BLASTP)
Some concluding remarks

• Trust but verify.
• Beware of gene prediction tools!
• Always use more than one gene prediction tool and more than one
  genome when possible.
• This is an active area of bioinformatics research,
  so be mindful of the new literature in
  this area.
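
The advice to use more than one gene prediction tool can be made concrete by checking which predictions from one tool are supported by another. The sketch below is a minimal illustration, assuming each tool's output has been reduced to hypothetical (start, end) gene intervals with 1-based inclusive coordinates; the 50% overlap threshold and the function names are arbitrary choices for this example.

```python
def overlap_length(a, b):
    """Number of positions shared by two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def split_by_support(primary, other, min_fraction=0.5):
    """Split primary predictions by whether a single prediction in `other`
    covers at least min_fraction of their length.

    Returns (supported, unsupported).
    """
    supported, unsupported = [], []
    for gene in primary:
        length = gene[1] - gene[0] + 1
        covered = max((overlap_length(gene, o) for o in other), default=0)
        if covered / length >= min_fraction:
            supported.append(gene)
        else:
            unsupported.append(gene)
    return supported, unsupported


tool_a = [(100, 500), (800, 1200), (2000, 2300)]
tool_b = [(120, 480), (1150, 1400)]
supported, unsupported = split_by_support(tool_a, tool_b)
print("supported by both tools:", supported)
print("needs verification:", unsupported)
```

Predictions that only one tool reports are not necessarily wrong, but they are the ones most worth checking against transcript or protein evidence.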
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

Genome annotation

  • 1. Genome Annotation. Delivered by Muhammad Tajammal Khan, M.Phil (Botany), 11-arid-3759
  • 2. Definition: Genome annotation is the process of interpreting raw sequence data into useful biological information. Annotations describe the genome and transform raw genome sequences into biological information by integrating computational analyses, other biological data and biological expertise.
  • 3. Figure: unannotated DNA (5' to 3') compared with annotated DNA. Legend: exon (protein coding), intron, intergenic sequence.
  • 4. Annotation may be:
      ◦ Structural annotation: ORFs and their localisation (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), gene structure, coding regions, location of regulatory motifs
      ◦ Functional annotation: biochemical function, biological function, involved regulation and interactions, expression
  • 5. Things we are looking to annotate: CDS, mRNA, promoter and poly-A signal, pseudogenes, ncRNA
  • 6. Tools
      ◦ ORF detectors: NCBI (http://www.ncbi.nih.gov/gorf/gorf.html)
      ◦ Promoter predictors: CSHL (http://rulai.cshl.org/software/index1.htm); BDGP (fruitfly.org/seq_tools/promoter.html); ICG (TATA-Box predictor)
      ◦ PolyA signal predictors: CSHL (argon.cshl.org/tabaska/polyadq_form.html)
      ◦ Splice site predictors: BDGP (http://www.fruitfly.org/seq_tools/splice.html)
      ◦ Start-/stop-codon identifiers: DNALC (Translator/ORF-Finder); BCM (Searchlauncher)
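To make the idea of an ORF detector concrete, here is a minimal sketch in Python. It is not how the NCBI tool works internally; it simply assumes the standard start codon ATG and the three standard stop codons, scans only the forward strand, and uses a hypothetical function name `find_orfs` and a hypothetical `min_codons` cutoff.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG, end is exclusive and includes
    the stop codon, frame is 0, 1 or 2. min_codons counts the codons
    before the stop codon (assumed cutoff, not from the slides).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

print(find_orfs("CCATGAAATTTGGGTAACC"))
```

A real ORF finder would also scan the reverse complement and usually applies a much larger minimum length.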
  • 8. Two approaches to genome sequencing: Whole Genome Shotgun. An approach used to decode an organism's genome by shredding it into smaller fragments of DNA that can be sequenced individually. The sequences of these fragments are then ordered, based on overlaps between their sequences, and finally reassembled into the complete sequence. The 'whole genome shotgun' (WGS) method is applied to the entire genome all at once, while the 'hierarchical shotgun' method is applied to large, overlapping DNA fragments of known location in the genome.
  • 9. Two approaches to genome sequencing: Hierarchical shotgun method. Assemble contigs from various chromosomes, then sequence and assemble them. A contig is a set of overlapping clones or sequences from which a sequence can be obtained. A contig map thus shows the locations of those regions of a chromosome where contiguous DNA segments overlap. Contig maps are important because they make it possible to study a complete, and often large, segment of the genome by examining a series of overlapping clones, which together provide an unbroken succession of information about that region.
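The "order by overlaps, then reassemble" step can be illustrated with a toy greedy assembler. This is only a sketch under strong assumptions (error-free reads, one strand, no repeats); real assemblers are far more elaborate. The names `overlap`, `greedy_assemble` and `min_len` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: reads stay as separate pieces
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

fragments = ["GCGTAAGCTAG", "GATCGGTCGA", "GTCGAGCGTA"]
print(greedy_assemble(fragments))
```

If some reads share no overlap, the function returns more than one sequence, which mirrors the gaps left in a real shotgun assembly.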
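As a rough illustration of how overlapping clones of known location form contigs, the sketch below merges clone intervals on a chromosome map. The coordinates and the function name `build_contigs` are made up for the example; it assumes each clone is given as a (start, end) pair on the same coordinate system.

```python
def build_contigs(clones):
    """Merge overlapping (start, end) clone intervals into contigs."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1]:
            # Clone overlaps the current contig: extend it if needed.
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            # No overlap: this clone starts a new contig.
            contigs.append([start, end])
    return [tuple(c) for c in contigs]

clones = [(170, 250), (0, 100), (400, 500), (80, 180)]
print(build_contigs(clones))
```

Each returned interval is one unbroken stretch covered by overlapping clones; the space between intervals is a gap in the map.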
  • 12. Steps: 1. Prepare genomic DNA; 2. Attach DNA to surface; 3. Bridge amplification; 4. Fragments become double stranded; 5. Denature the double-stranded molecules; 6. Complete amplification. Step 1: Randomly fragment genomic DNA and ligate adapters to both ends of the fragments.
  • 13. Step 2: Bind single-stranded fragments randomly to the inside surface of the flow cell channels. (Figure labels: adapter, DNA fragment, dense lawn of primers.)
  • 14. Step 3: Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification.
  • 15. Step 4: The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate. (Figure labels: attached terminus, free terminus.)
  • 16. Step 5: Denaturation leaves single-stranded templates anchored to the substrate.
  • 17. Step 6: Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell.
  • 18. Steps: 7. Determine first base; 8. Image first base; 9. Determine second base; 10. Image second chemistry cycle; 11. Sequencing over multiple chemistry cycles; 12. Align data. Step 7: The first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase.
  • 19. Step 8: After laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified.
  • 20. Step 9: The next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase.
  • 21. Step 10: After laser excitation the image is captured as before, and the identity of the second base is recorded.
  • 22. Step 11: The sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time.
  • 23. Step 12: The data are aligned and compared to a reference, and sequencing differences are identified.
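The final "align data" step can be sketched very simply: place a read where it mismatches the reference least, then report the positions that differ. This is a toy, gap-free illustration and not the algorithm used by any real read aligner; `best_offset` and `find_differences` are hypothetical names, and the read is assumed to be no longer than the reference.

```python
def best_offset(reference, read):
    """Return the ungapped offset where the read has the fewest mismatches."""
    def mismatches(off):
        window = reference[off:off + len(read)]
        return sum(1 for a, b in zip(window, read) if a != b)
    return min(range(len(reference) - len(read) + 1), key=mismatches)

def find_differences(reference, read, offset):
    """List (position, reference_base, read_base) for each mismatch."""
    diffs = []
    for i, base in enumerate(read):
        ref_base = reference[offset + i]
        if base != ref_base:
            diffs.append((offset + i, ref_base, base))
    return diffs

reference = "GATCGGTCGAGCGTAAGCTAG"
read = "GTCGAGCTTA"  # matches the reference from position 5 with one change
offset = best_offset(reference, read)
print(offset, find_differences(reference, read, offset))
```

Positions are 0-based indexes into the reference. Insertions and deletions are ignored here, which a real aligner would have to handle.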
  • 24. The generic structure of an automatic genome annotation pipeline and delivery system
  • 25. The Annotation Process (figure labels: DNA sequence, analysis software, annotator, useful information)
  • 26. Annotation Process (figure). Input: DNA sequence. Analyses run on the sequence: RepeatMasker, Blastn, gene finders, Blastx, Halfwise, tRNA scan. Features identified: repeats, promoters, rRNA, pseudogenes, tRNA, genes. Further analyses: Fasta, BlastP, Pfam, Prosite, Psort, SignalP, TMHMM.
  • 27. Genome Browsers
      ◦ Generic Genome Browser (CSHL): www.wormbase.org/db/seq/gbrowse
      ◦ NCBI Map Viewer: www.ncbi.nlm.nih.gov/mapview/
      ◦ Ensembl Genome Browser: www.ensembl.org/
      ◦ UCSC Genome Browser: genome.ucsc.edu/cgi-bin/hgGateway?org=human
      ◦ Apollo Genome Browser: www.bdgp.org/annot/apollo/
  • 28. What is gene prediction? Detecting meaningful signals in uncharacterised DNA sequences. Knowledge of the interesting information in DNA. Example sequence: GATCGGTCGAGCGTAAGCTAGCTAG ATCGATGATCGATCGGCCATATATC ACTAGAGCTAGAATCGATAATCGAT CGATATAGCTATAGCTATAGCCTAT. Gene prediction is ‘recognising protein-coding regions in genomic sequence’.
  • 29. Basic Gene Prediction Flow Chart: Obtain new genomic DNA sequence, then (1) translate in all six reading frames and compare to protein sequence databases, and (2) perform a database similarity search of the expressed sequence tag (EST) database of the same organism, or cDNA sequences if available. Next, use a gene prediction program to locate genes. Finally, analyze regulatory sequences in the gene.
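The first step in the flow chart, translating in all six reading frames, can be sketched as follows. The sketch assumes the standard genetic code and an input containing only A, C, G and T; the function names and the frame labels ("+1" to "-3") are illustrative choices, and unknown codons are written as "X".

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons only; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translation(seq):
    """Return a dict mapping frame label to protein string."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames

for label, protein in six_frame_translation("ATGAAATTTGGGTAA").items():
    print(label, protein)
```

Each of the six protein strings could then be compared against a protein database, which is what a translated search such as Blastx does in one step.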
  • 30. Approaches to gene prediction
      ◦ Ab Initio Gene Finding: http://exon.gatech.edu/GenMark/eukhmm.cgi ; http://sun1.softberry.com/berry.phtml=fgenesh&group=programs&subgroup=gfind
      ◦ Repeat Masking: http://www.repeatmasker.org ; http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
      ◦ Transcript based prediction: http://plantta.tigr.org ; http://harvest.ucr.edu/
      ◦ Gene function CDNA: http://au.expasy.org/sprot/ ; http://www.pir.uniprot.org/
      ◦ Gene Ontologies: http://www.geneontology.org
  • 31. Visualization Tools http://www.gmod.org/?q=node/4 http://www.gmod.org/?q=node/71
  • 32. Prediction of Secondary Structure and Folding Classes
      ◦ nnpredict: http://www.cmpharm.ucsf.edu/_nomi/nnpredict.html
      ◦ PredictProtein: http://www.embl-heidelberg.de/predictprotein/
      ◦ SOPMA: http://pbil.ibcp.fr/
      ◦ Jpred: http://jura.ebi.ac.uk:8888/
      ◦ PSIPRED: http://insulin.brunel.ac.uk/psipred
      ◦ PREDATOR: http://www.embl-heidelberg.de/predator.html
    Prediction of Specialized Structures or Features
      ◦ COILS: http://www.ch.embnet.org/software/COILSform.html
      ◦ MacStripe: www.york.ac.uk/depts/biol/units/coils/mstr2.html
      ◦ PHDtopology: http://www.embl-heidelberg.de/predictprotein
      ◦ SignalP: http://www.cbs.dtu.dk/services/SignalP/
      ◦ TMpred: http://www.isrec.isb-sib.ch/ftp-erver/tmpred www/TMPREDform.html
    Structure Prediction
      ◦ DALI: http://www2.ebi.ac.uk/dali/
      ◦ Bryant-Lawrence: ftp://ncbi.nlm.nih.gov/pub/pkb/
      ◦ FSSP: http://www2.ebi.ac.uk/dali/fssp/
      ◦ UCLA-DOE: http://fold.doe-mbi.ucla.edu/Home
      ◦ SWISS-MODEL: http://www.expasy.ch/swissmod/SWISS-MODEL
  • 33. Search using the gene name
  • 34. Click the name for transcript info
  • 35. Click the contig to see detailed information
  • 36. Click the contig to see detailed information. Click here to export the contig.
  • 38. Select output format, then click Continue
  • 39. Save or copy the data for further analysis
  • 42. Click here for pairwise BLAST
  • 43. Paste in the genomic sequence, paste in the CDS sequence, then click ‘Align’ to proceed.
  • 47. This is a protein BLAST (BLASTP)
  • 49. Some concluding remarks
      ◦ Trust but verify.
      ◦ Beware of gene prediction tools!
      ◦ Always use more than one gene prediction tool and more than one genome when possible.
      ◦ This is an active area of bioinformatics research, so be mindful of the new literature in this area.
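One simple way to act on the advice to use more than one gene prediction tool is to keep only the features that several tools agree on. The sketch below is an illustration, not an established method: the tool names and coordinates are invented, exons are compared by exact (start, end) match, and `supported_exons` and `min_tools` are hypothetical names.

```python
from collections import Counter

def supported_exons(predictions, min_tools=2):
    """Return exons predicted identically by at least min_tools tools.

    predictions maps a tool name to a collection of (start, end) exons.
    """
    counts = Counter(exon
                     for exons in predictions.values()
                     for exon in set(exons))
    return sorted(exon for exon, n in counts.items() if n >= min_tools)

# Invented example predictions from three hypothetical tools.
predictions = {
    "tool_a": [(100, 200), (300, 400)],
    "tool_b": [(100, 200), (310, 400)],
    "tool_c": [(100, 200), (300, 400), (900, 950)],
}
print(supported_exons(predictions))
```

Exons reported by only one tool, such as (310, 400) and (900, 950) here, are the ones most worth checking by hand against EST or protein evidence.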