Genome annotation 2013
Genome annotation, NGS sequence data, decoding sequence information. The genome contains all the biological information required to build and maintain any given living organism.

Published in: Education, Technology
  • Try to describe Genome annotation as a process
    Emphasize the ongoing nature of annotation.
    There is no real end point to the annotation process (only artificially defined ones)
    Best to think of this as a ‘best guess’ annotation
  • Softmasking
  • Transcript

    • 1. Genome Annotation. Karan Veer Singh, Scientist, NBAGR, Karnal, India
    • 2. The Genome
      • The genome contains all the biological information required to build and maintain any given living organism
      • The genome contains the organism's molecular history
      • Decoding the biological information encoded in these molecules will have an enormous impact on our understanding of biology
    • 3. Genomics
      1. Structural genomics: genetic and physical mapping of genomes.
      2. Functional genomics: analysis of gene function (and non-genes).
      3. Comparative genomics: comparison of genomes across species.
         • Includes structural and functional genomics.
         • Evolutionary genomics.
    • 4. Human Genome Project
      • The Human Genome Project promised to revolutionise medicine and explain every base of our DNA.
      • Large medical genetics focus:
        • Identify variation in the genome that is disease-causing
        • Determine how individual genes play a role in health and disease
    • 5. Human Genome Project & Functional Genome
      • It cost about 3 billion dollars and took 13 years to complete (1990 to 2003), 2 fewer than the 15 initially planned.
      • Approx. 200 Mb still in progress:
        • Heterochromatin
        • Repetitive sequence
    • 6. Genomics & Genome annotation
      • The first genome annotation software system was designed in 1995 by Dr. Owen White with The Institute for Genomic Research, which sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae
      • The workflow involves assembling the reads into contigs, then either assembling them against a reference genome (reference assembly) or performing de novo assembly to obtain the complete genome
      • Variations such as mutations, SNPs and InDels can then be identified
      • The genome is then annotated by structural and functional annotation
      • Finally, the whole genome is visualised as a map in an easily understandable form
    • 7. Sequence to Annotation
    • 8. Input 1 to Genome Viewer: Variant Annotation
    • 9. Input 2 to Genome Viewer: Structural Annotation (AUGUSTUS, version 2.5.5)
    • 10. Input 3 to Genome Viewer: Functional Annotation
    • 11. Genome Annotation
      • The process of identifying the locations of genes and coding regions in a genome and determining what those genes do
      • Finding the structural elements and attaching their related functions to each genome location
    • 12. Genome Annotation
      • Gene structure prediction: identifying elements (introns/exons, CDS, start, stop) in the genome
      • Gene function prediction: attaching biological information to these elements, e.g. which protein an exon codes for
    • 13. Structural annotation: identification of genomic elements
      • Open reading frames and their localisation
      • Gene structure
      • Coding regions
      • Location of regulatory motifs
    • 14. Functional annotation: attaching biological information to genomic elements
      • Biochemical function
      • Biological function
      • Involved regulation
    • 15. Genome annotation - workflow
      • Genome sequence
      • Repeats
      • Masked or un-masked genome sequence
      • Structural annotation (gene finding): nc-RNAs (tRNA, rRNA), introns, protein-coding genes
      • Functional annotation
      • View in Genome viewer
    • 16. Genome Repeats & features
      • Polymorphic between individuals/populations
      • Percentage of repetitive sequences in different organisms (genome: genome size in Mb, % repeat):
        • Aedes aegypti: 1,300 Mb, ~70%
        • Anopheles gambiae: 260 Mb, ~30%
        • Culex pipiens: 540 Mb, ~50%
      • Repeat terminology: microsatellite, minisatellite, tandem repeat, short tandem repeat, SSR
    • 17. Finding repeats as a preliminary to gene prediction
      • Repeat discovery: homology-based approaches
      • Use RepeatMasker to search the genome and mask the sequence
    • 18. Masked sequence
      • A repeat-masked sequence is an artificial construction in which regions thought to be repetitive are marked with X's
      • Widely used to reduce the overhead of subsequent computational analyses and to reduce the impact of TEs in the final annotation set
      • Positions/locations are not affected by masking

      >my sequence
      atgagcttcgatagcgatcagctagcgatcaggct
      actattggcttctctagactcgtctatctctatta
      gctatcatctcgatagcgatcagctagcgatcagg
      ctactattggcttcgatagcgatcagctagcgatc
      aggctactattggcttcgatagcgatcagctagcg
      atcaggctactattggctgatcttaggtcttctga
      tcttct

      >my sequence (repeatmasked)
      atgagcttcgatagcgatcagctagcgatcaggct
      actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxatctcgatagcgatcagctagcgatcagg
      ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
      aggctactattggcttcgatagcgatcagctagcg
      atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
      tcttct
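Hard masking of this kind can be sketched in a few lines of Python. This is a minimal illustration and not how RepeatMasker works internally: the `hard_mask` function and the repeat coordinates are hypothetical, and intervals are assumed to be 0-based and end-exclusive.

```python
def hard_mask(sequence, repeats, mask_char="x"):
    """Replace every base inside a repeat interval with mask_char.

    repeats is a list of (start, end) pairs, 0-based and end-exclusive.
    """
    masked = list(sequence)
    for start, end in repeats:
        for i in range(start, min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


seq = "atgagcttcgatagcgatcagctagcgatcaggct"
repeats = [(6, 15)]  # hypothetical repeat coordinates, for illustration only
masked = hard_mask(seq, repeats)
print(masked)
```

Because each masked base is replaced one-for-one, the sequence length and every downstream coordinate stay the same, which is the point the slide makes about positions not being affected.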
    • 19. Types of Masking: Hard or Soft?
      • Sometimes we want to mark up repetitive sequence but not exclude it from downstream analyses. This is achieved using a format known as soft-masked, in which repeats are written in lower case

      >my sequence
      ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
      TACTATTGGCTTCTCTAGACTCGTCTATCTCTATT
      AGTATCATCTCGATAGCGATCAGCTAGCGATCAGG
      CTACTATTGGCTTCGATAGCGATCAGCTAGCGATC
      AGGCTACTATTGGCTTCGATAGCGATCAGCTAGCG
      ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
      TCTTCT

      >my sequence (softmasked)
      ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
      TACTATTggcttctctagactcgtctatctctatt
      agtatcATCTCGATAGCGATCAGCTAGCGATCAGG
      CTACTATTggcttcgatagcgatcagcTAGCGATC
      AGGCTACTATTggcttcgatagcgatcagcTAGCG
      ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
      TCTTCT

      >my sequence (hardmasked)
      atgagcttcgatagcgatcagctagcgatcaggct
      actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxatctcgatagcgatcagctagcgatcagg
      ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
      aggctactattggcttcgatagcgatcagctagcg
      atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
      tcttct
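The difference between the two formats can be illustrated with a short Python sketch. The function names and the example interval are assumptions made for illustration; a real pipeline would take repeat coordinates from a repeat finder's output.

```python
def soft_mask(sequence, repeats):
    """Uppercase the sequence, then lowercase each repeat interval
    (0-based, end-exclusive). No information is lost."""
    chars = list(sequence.upper())
    for start, end in repeats:
        chars[start:end] = [c.lower() for c in chars[start:end]]
    return "".join(chars)


def soft_to_hard(softmasked, mask_char="x"):
    """Convert soft-masked to hard-masked. The reverse is impossible,
    because hard masking discards the original bases."""
    return "".join(mask_char if c.islower() else c for c in softmasked)


def repeat_fraction(softmasked):
    """Fraction of bases flagged as repetitive (lowercase)."""
    return sum(c.islower() for c in softmasked) / len(softmasked)


soft = soft_mask("ACGTACGTAC", [(2, 5)])  # hypothetical repeat at positions 2..4
hard = soft_to_hard(soft)
print(soft, hard, repeat_fraction(soft))
```

Soft masking is the safer default when a downstream tool can choose whether to honour the lowercase regions, since a hard-masked file can always be derived from it later.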
    • 20. Genome annotation - workflow: Genome sequence → Map repeats → Masked or un-masked → Gene finding (structural annotation: nc-RNAs, introns, protein-coding genes) → Functional annotation → View in Genome viewer
    • 21. Structural annotation: identification of genomic elements
      • Open reading frames and their localization
      • Coding regions
      • Location of regulatory motifs
      • Start/Stop
      • Splice sites
      • Non-coding regions/RNAs
      • Introns
    • 22. Methods
      • Similarity: similarity between sequences, which does not necessarily imply any evolutionary linkage
      • Ab initio prediction: prediction of gene structure from first principles using only the genome sequence
    • 23. Genefinding: ab initio and similarity
    • 24. ab initio prediction
      • Signals read from the genome: coding potential, ATG & stop codons, splice sites
      • Examples: Genefinder, Augustus, Glimmer, SNAP, fgenesh
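The simplest of these signals, ATG and stop codons, is enough for a naive open reading frame scan. The sketch below is only that: it ignores coding potential and splice sites, so it is far cruder than the gene finders named on the slide. The function names and the `min_codons` threshold are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf) tuples for ATG...stop ORFs in all
    six reading frames. Coordinates are 0-based, end-exclusive, and are
    relative to the strand that was scanned."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


orfs = find_orfs("CCATGAAATTTTAGGG")
print(orfs)
```

An ORF scan like this works tolerably for prokaryotes; in eukaryotes introns interrupt the reading frame, which is why splice-site models and coding-potential statistics are needed.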
    • 25. Genefinding - similarity
      • Use known coding sequence to define coding regions:
        • EST sequences
        • Peptide sequences
      • Problem: handling fuzzy alignment regions around splice sites
      • Examples: EST2Genome, exonerate, genewise, Augustus, Prodigal
      Gene-finding - comparative
      • Use two or more genomic sequences to predict genes based on conservation of exon sequences
      • Examples: Twinscan and SLAM
    • 26. Genome annotation - workflow: Genome sequence → Map repeats → Masked or un-masked → Gene finding (structural annotation: nc-RNAs, introns, protein-coding genes) → Functional annotation → View in Genome viewer
    • 27. Genefinding - non-coding RNA genes
      • Non-coding RNA genes can be predicted using knowledge of their structure or by similarity with known examples
      • tRNAscan: uses an HMM and co-variance model for prediction of tRNA genes
      • Rfam: a suite of HMMs trained against a large number of different RNA genes
    • 28. Gene-finding omissions
      • Alternative isoforms
        • Currently there is no good method for predicting alternative isoforms
        • Only created where supporting transcript evidence is present
      • Pseudogenes
        • Each genome project has a fuzzy definition of pseudogenes
        • Badly curated/described across the board
      • Promoters
        • Rarely a priority for a genome project
        • Some algorithms exist but are usually not integrated into an annotation set
    • 29. Practical - structural annotation
      • Eukaryotes: AUGUSTUS (gene model)
        ~/Programs/augustus.2.5.5/bin/augustus --strand=both --genemodel=partial --singlestrand=true --alternatives-from-evidence=true --alternatives-from-sampling=tr --progress=true --gff3=on --uniqueGeneId=true --species=magnaporthe_grisea our_genome.fasta > structural_annotation.gff
      • Prokaryotes: PRODIGAL (codon usage table)
        ~/Programs/prodigal.v2_60.linux -a protein_file.fa -g 11 -d nucleotide_exon_seq.fa -f gff -i contigs.fa -o genes_quality.txt -s genes_score.txt -t genome_training_file.txt
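Both commands write their predictions as GFF, so a quick sanity check on the output is to count feature types and sum coding lengths. The following Python sketch assumes standard 9-column, tab-separated GFF3; the `parse_gff3` helper and the example records are hypothetical and are not real AUGUSTUS results.

```python
from collections import Counter


def parse_gff3(lines):
    """Yield one dict per feature line of a GFF3 file (9 tab-separated columns)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        seqid, source, ftype, start, end, score, strand, phase, attributes = fields
        yield {
            "seqid": seqid,
            "source": source,
            "type": ftype,
            "start": int(start),  # GFF coordinates are 1-based, inclusive
            "end": int(end),
            "strand": strand,
            "attributes": dict(
                kv.split("=", 1) for kv in attributes.split(";") if "=" in kv
            ),
        }


# Hypothetical records shaped like gene-finder output (illustration only).
example = [
    "##gff-version 3",
    "contig1\tAUGUSTUS\tgene\t101\t700\t0.95\t+\t.\tID=g1",
    "contig1\tAUGUSTUS\tCDS\t101\t400\t0.95\t+\t0\tID=g1.t1.cds;Parent=g1.t1",
    "contig1\tAUGUSTUS\tCDS\t501\t700\t0.95\t+\t0\tID=g1.t1.cds;Parent=g1.t1",
]
features = list(parse_gff3(example))
counts = Counter(f["type"] for f in features)
cds_length = sum(f["end"] - f["start"] + 1 for f in features if f["type"] == "CDS")
print(counts, cds_length)
```

With a real file, the list would be replaced by an open file handle, for example `parse_gff3(open("structural_annotation.gff"))`.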
    • 30. Structural Annotation - output: structural annotation conducted using AUGUSTUS (version 2.5.5), with Magnaporthe_grisea as the genome model
    • 31. Functional annotation
    • 32. Genome annotation - workflow: Genome sequence → Map repeats → Masked or un-masked → Gene finding (structural annotation: nc-RNAs, introns, protein-coding genes) → Functional annotation → View in Genome viewer
    • 33. Functional annotation
      • Genome → (transcription) → primary transcript → (RNA processing) → processed mRNA (m7G cap, ATG ... STOP, AAAn tail) → (translation) → polypeptide → (protein folding) → folded protein
      • Find function: enzyme activity, functional activity
    • 34. Functional annotation: attaching biological information to genomic elements
      • Biochemical function
      • Biological function
      • Involved regulation and interactions
      • Expression
      • Use the known structural annotation to predict the protein sequence
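Going from a structurally annotated coding sequence to a predicted protein is a codon-by-codon lookup. The sketch below assumes the standard genetic code (NCBI translation table 1) and a CDS that is already spliced and in frame; `translate` is a hypothetical helper, not part of any tool named in the slides.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon.
    Codons containing ambiguous bases are reported as 'X'."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAATTTTAG"))
```

Prokaryotic and organellar genomes may need a different table (the Prodigal example in this deck uses table 11), so the table would have to be swapped accordingly.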
    • 35. Functional annotation - Homology Based
      • Predicted exons/CDS/ORFs are searched against the non-redundant protein database (NCBI, SwissProt) to look for similarities
      • Visually assess the top 5-10 hits to identify whether these have been assigned a function
      • Functions are assigned
    • 36. Functional annotation - Other features
      • Other features which can be determined:
        • Signal peptides
        • Transmembrane domains
        • Low complexity regions
        • Various binding sites, glycosylation sites etc.
        • Protein domains
        • Secretome
      • See http://expasy.org/tools/ for a good list of possible prediction algorithms
    • 37. Functional annotation - Other features (Ontologies)
      • Use of ontologies to annotate gene products
      • Gene Ontology (GO):
        • Cellular component
        • Molecular function
        • Biological process
    • 38. Practical - Functional annotation (homology-based method)
      • Set up a BLAST database for nucleotide/protein sequences
      • BLAST genome.fasta against it for annotations (nucleotide/protein)
      • Sort the hits by E-value and keep those at or below the cutoff (0.01) for nucleotide/protein
      • Assign functions
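The filtering step can be sketched in Python, assuming the BLAST results were saved in the standard 12-column tabular format (`-outfmt 6`), where column 11 is the E-value and column 12 the bit score. The `best_hits` function and the example rows are hypothetical; the 0.01 cutoff is the value given on the slide.

```python
def best_hits(blast_lines, max_evalue=0.01):
    """Return {query: (subject, evalue, bitscore)}, keeping for each query
    the hit with the lowest E-value at or below max_evalue."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # too weak to support a functional assignment
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical tabular rows (illustration only, not real search results).
example = [
    "g1\tprotA\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t380",
    "g1\tprotB\t45.0\t180\t90\t5\t10\t190\t3\t182\t2e-20\t95.1",
    "g2\tprotC\t30.0\t50\t35\t0\t1\t50\t7\t56\t0.5\t22.3",
]
hits = best_hits(example)
print(hits)
```

Queries with no hit under the cutoff, like `g2` here, are left unassigned; in practice these are the genes typically labelled as hypothetical proteins.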
    • 39. Functional annotation - output (slide footer: August 2008, Bioinformatics tools for Comparative Genomics of Vectors)
    • 40. Conclusion
      • Annotation accuracy depends on the supporting data available at the time of annotation, so annotations need to be updated
      • Gene predictions will change over time as new data become available (e.g. at NCBI) that are more similar than the previously available sequences
      • Functional assignments will change over time as new data become available (e.g. characterization of hypothetical proteins)
    • 41. Genome annotation - workflow: Genome sequence → Map repeats → Masked or un-masked → Gene finding (structural annotation: nc-RNAs, introns, protein-coding genes) → Functional annotation → View in Genome viewer
    • 42. Genome Viewer
      • Files that can be visualised: annotation files, indel files, consensus sequence, comparative genomics
    • 43. Genome View
    • 44 to 46. [image-only slides]
    • 47. Short Read track
    • 48. Thank You
