STRUCTURAL ANNOTATION
• Structural annotation of genome by computer analysis of sequence data and experimental
techniques
INTRODUCTION
• The scope of genome annotation has expanded since the first complete annotation
of the Haemophilus influenzae genome in 1995 (Fleischmann et al., 1995).
• Once a DNA sequence has been obtained, whether it is the sequence of a single
cloned fragment or of an entire chromosome, then various methods can be
employed to locate the genes that are present.
• These methods can be divided into those that involve simply inspecting the
sequence, by eye or more frequently by computer, to look for the special sequence
features associated with genes, and those methods that locate genes by
experimental analysis of the DNA sequence. The computer methods form part of the
methodology called bioinformatics.
• The first software used to analyze sequencing reads was the ‘Staden Package’,
created by Rodger Staden in 1977 (Staden, 1977).
STRUCTURAL ANNOTATION
• Finding features of DNA—exons, introns, promoters, transposons, etc.—is known as
structural annotation. Structural annotation attempts to find genes in a genomic sequence.
• A gene can be defined as "a sequence region necessary for generating functional
products". Functional products of genes are proteins and RNAs. Genes that lead to the
production of proteins are called protein-coding genes.
• Other genes that do not code for proteins, but instead for functional RNA molecules, are called
noncoding genes. Noncoding RNA genes include genes for ribosomal RNA (rRNA),
transfer RNA (tRNA), microRNA (miRNA), small nuclear RNA and small nucleolar RNA (snRNA
and snoRNA, respectively) and long noncoding RNA (lncRNA).
• Structural annotations also identify pseudogenes. They were initially considered to be
functionless and evolutionary dead-ends. We now know that they sometimes participate in
gene regulation. Hence, their prediction improves our understanding of genomes.
• Sequence inspection can be used to locate genes because genes are not random series of nucleotides but
instead have distinctive features.
• At present we do not fully understand the nature of all of these specific features, and sequence inspection is
therefore not a foolproof way of locating genes, but it is still a powerful tool and is usually the first method that
is applied to analysis of a new genome sequence.
GENE LOCATION BY SEQUENCE INSPECTION
The coding regions of genes are open reading frames
• Genes that code for proteins comprise open reading frames
(ORFs) consisting of a series of codons that specify the amino
acid sequence of the protein that the gene codes for.
• The ORF begins with an initiation codon, usually (but not
always) ATG, and ends with a termination codon: TAA, TAG, or
TGA. Searching a DNA sequence for ORFs that begin with an
ATG and end with a termination triplet is therefore one way of
looking for genes.
• The analysis is complicated by the fact that each DNA
sequence has six reading frames, three in one direction and
three in the reverse direction on the complementary strand, but
computers are quite capable of scanning all six reading frames
for ORFs.
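The six-frame ORF scan described above can be sketched in a few lines of Python. This is a minimal illustration, not a production gene finder: the function names `reverse_complement` and `find_orfs` and the `min_codons` cut-off are assumptions made for this example, and only ATG is treated as an initiation codon.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[str, int, str]]:
    """Scan all six reading frames for ATG...stop ORFs.

    Returns (strand, frame, orf_sequence) tuples. The ORF sequence
    includes the stop codon, and min_codons counts it as well.
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG of the current ORF
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = strand_seq[start:i + 3]
                    if len(orf) // 3 >= min_codons:
                        orfs.append((strand, frame, orf))
                    start = None
    return orfs


if __name__ == "__main__":
    for strand, frame, orf in find_orfs("CCATGAAATAGCC"):
        print(strand, frame, orf)
```

In practice a much larger `min_codons` value would be used, because short ORFs arise frequently by chance.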
A PROTEIN-CODING GENE IS AN OPEN READING FRAME
OF TRIPLET CODONS
• With bacterial genomes, simple ORF scanning is an effective way of locating most of the genes in a
DNA sequence.
• With bacteria the analysis is further simplified by the fact that the genes are very closely spaced and
hence there is relatively little intergenic DNA in the genome (only 11% for E. coli).
• If we assume that the real genes do not overlap, which is true for most bacterial genes, then it is only
in the intergenic regions that there is a possibility of mistaking a short, spurious ORF for a real gene.
• So if the intergenic component of a genome is small, then there is a reduced chance of making
mistakes in interpreting the results of a simple ORF scan.
Repeats
• The first step in structural annotation involves repeat masking. DNA repeats occur in both
prokaryotic and eukaryotic organisms.
• The repeats account for 0% to over 42% of the prokaryotic genome. Similarly, eukaryotic
genomes can harbor millions of repeats.
• For instance, repeats account for two-thirds of the human genome. Repeat sequences can be
localized in tandem, i.e., adjacent to one another, and are typically found in the centromere.
Alternatively, they can be interspersed in different forms of transposable elements, e.g., in long
and short interspersed nuclear elements (LINEs and SINEs), DNA transposons, etc.
• Repeat masking tools rely on databases with lists of already identified repeats. RepeatMasker is
a good example of such a tool.
• Aligning transcript and protein evidence after masking is the second step of structural annotation
before gene identification, although it is not mandatory. BLAST or BLAT can be used to align the
transcript and protein evidence.
• Further, RNA-seq evidence can be aligned using TopHat or HISAT.
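To make the idea of masking concrete, the sketch below applies soft masking (lowercase) or hard masking (N) to a sequence, given repeat coordinates that are assumed to have been identified already. It does not reproduce how RepeatMasker finds repeats; the function name `mask_repeats` and the 0-based, end-exclusive coordinate convention are assumptions for this illustration.

```python
from typing import Iterable, Tuple


def mask_repeats(seq: str, repeats: Iterable[Tuple[int, int]], hard: bool = False) -> str:
    """Mask known repeat intervals in a DNA sequence.

    repeats holds (start, end) pairs, 0-based and end-exclusive.
    Soft masking lowercases the repeat; hard masking replaces it with N.
    """
    chars = list(seq.upper())
    for start, end in repeats:
        # Clip each interval to the sequence so bad coordinates cannot raise.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    print(mask_repeats("ACGTACGTAC", [(2, 6)]))
    print(mask_repeats("ACGTACGTAC", [(2, 6)], hard=True))
```

Soft masking keeps the underlying bases available to downstream aligners, which is one reason it is often preferred over hard masking.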
Predictions of Gene and Different Features
• Identifying protein-coding genes and other regulatory elements takes center stage in gene annotation. Gene
prediction is a complex process, especially for eukaryotic DNA.
• The varying sizes of introns (noncoding sequences) in between exons and alternative splice variants make
gene structure prediction difficult.
• Many gene prediction programs exist. They can be categorized into three groups: ab initio methods,
homology-based methods, and combined methods.
• Approaches for gene prediction based on nucleotide sequence are called ab initio methods. Ab initio
approaches rely on statistical models, such as the hidden Markov model (HMM), to identify promoters, coding
or noncoding regions, and intron–exon junctions in the genome sequence.
• The second approach aligns the sequence with expressed sequence tags (EST), complementary DNA (cDNA),
or protein evidence, and uses detected similarities for gene prediction.
• The other group comprises programs that combine ab initio and evidence- or homology-based
approaches for gene prediction.
• In addition, gene prediction programs should be able to predict alternative splicing sites because
alternative splicing plays a major role in the regulation of gene expression and in transcriptome and proteome
diversity.
• Accordingly, gene prediction programs use various models to predict splice sites. Since approximately
99% of the introns in sequenced genomes begin with GT and end with AG, these features are denoted as
mandatory by most gene prediction systems for splice site detection.
• In addition, incorporation of a strong splice donor consensus, such as the GC–AG splice site, improves
the accuracy of gene prediction programs.
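The GT–AG rule and the GC–AG variant mentioned above can be checked directly on a candidate intron. The sketch below is only an illustration of that dinucleotide test; real gene predictors score longer splice-site consensus sequences statistically. The function name `classify_intron` and the 0-based, end-exclusive coordinates are assumptions for this example.

```python
def classify_intron(genome: str, intron_start: int, intron_end: int) -> str:
    """Classify a candidate intron by its first and last two nucleotides.

    Coordinates are 0-based and end-exclusive on the coding strand.
    """
    intron = genome[intron_start:intron_end].upper()
    if len(intron) < 4:
        return "too short"
    donor, acceptor = intron[:2], intron[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical GT-AG"
    if (donor, acceptor) == ("GC", "AG"):
        return "non-canonical GC-AG"
    return "other"


if __name__ == "__main__":
    genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"  # exon + intron + exon
    print(classify_intron(genome, 6, 18))
```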
Commonly used gene prediction programs and their classification, based on the above discussion (program URLs):
https://services.healthtech.dtu.dk/service.php?EasyGene-1.2
http://www.softberry.com/berry.phtml?topic=fgenes&group=programs&subgroup=gfind
http://opal.biology.gatech.edu/GeneMark/
http://www.genezilla.org/
http://hollywood.mit.edu/GENSCAN.html
http://ccb.jhu.edu/software/glimmerhmm/
https://services.healthtech.dtu.dk/service.php?HMMgene-1.1
https://galaxy.inf.ethz.ch/tool_runner?tool_id=mgenepredict
https://services.healthtech.dtu.dk/service.php?NetGene2-2.42
http://www.cbs.dtu.dk/services/RNAmmer/
https://github.com/KorfLab/SNAP
http://lowelab.ucsc.edu/tRNAscan-SE/
http://galaxy.informatik.uni-halle.de/
http://genomethreader.org/
https://mblab.wustl.edu/software.html
http://www.pseudogene.org/pseudopipe/
http://bioinf.uni-greifswald.de/augustus/
http://www.cbcb.umd.edu/software/jigsaw/
Databases for Structural Annotation
• Annotations require supporting data that can be used or presented as evidence of predicted
assignments. Currently, homology-based methods play a central role in genome annotation because
of the huge amount of EST and cDNA sequences available.
• Homology-based methods depend on DNA, RNA, or protein sequence alignment data, which can
easily be retrieved from biological databases. Ab initio annotations, on the other hand, identify genes
and their structures using mathematical models.
• Nonetheless, the ab initio gene predictors have to be trained using high-quality gene models or
organism-specific genome traits, such as codon frequency and intron–exon length distribution.
Further, ab initio models require ESTs, RNA-seq data, and proteins to improve prediction accuracy.
• Databases readily provide such data. Nucleotide and protein sequences or structures can easily be
found in comprehensive public-domain databases, e.g., GenBank, the European Nucleotide Archive
(ENA), and the DNA Data Bank of Japan (DDBJ). UniProt, which is a protein sequence database that
combines UniProtKB/Swiss-Prot (over 560,000 manually curated sequences) and
UniProtKB/TrEMBL (180 million automatically annotated sequences), provides the scientific
community with high-quality and freely accessible protein sequences with the associated functional
information.
Comparative Annotation Methods
• Genome annotation achieved by comparison of genes and genomes across species can be a reliable
information source for understanding genome evolution. Comparative annotation allows annotations of a
well-studied genome to be projected onto an evolutionarily close species. It often focuses on the coding
genes.
• Valuable information for comparative annotation can be found from genome alignment. A well-aligned
genome will yield sound data for comparative annotation.
• Approaches to comparative annotation of genomes can be categorized into ab initio methods and
homology-based methods, considering the input information used for annotation, i.e., either a statistical
model of genes, or protein sequence, EST, and cDNA, accordingly.
• Ab initio approaches are preferred for genes that are weakly represented or absent in RNA-seq libraries,
have insufficient similarity to any known protein, and lack other evidence.
• Related species have genomes that share similarities inherited from their common ancestor,
overlaid with species-specific differences that have arisen since the species began to evolve
independently. Because of natural selection, the sequence similarities between related genomes
are greatest within the genes and lowest in the intergenic regions.
• Therefore, when related genomes are compared, homologous genes are easily identified
because they have high sequence similarity, and any ORF that does not have a clear homolog in
the second genome can be discounted as almost certainly being a chance sequence and not a
genuine gene. This type of analysis is called comparative genomics.
Homology-Based Annotation
• Homology-based annotation predicts and annotates genes by identifying significant matches to a
well-annotated genome sequence, employing alignment tools such as BLAST.
• Homology-based annotations use the coding sequences (CDS), usually protein sequences and
sometimes transcripts in the form of mRNA, cDNA, or EST to predict genes, assuming similar sequence
regions encode homologous proteins.
• Tools like Exonerate and DIALIGN can be used for sequence alignment; GenomeThreader and
AGenDA are used for gene predictions. Increased evolutionary distance between the input protein and
the target protein reduces the accuracy of homology-based gene finding. This happens because of
heavy reliance on the alignment and information derived from the already known genes, which creates a
challenge in identifying genes whose properties are different from those of referenced genes.
• However, newer comparative approaches solve this issue by relying to a greater degree on sequence
conservation, which enables them to identify genes with new features and different statistical
composition.
• TWINSCAN and SGP2 are examples of tools in which gene prediction uses the analysis of sequence
conservation patterns between genomic sequences of evolutionarily related organisms.
Ab Initio Annotation
• Ab initio annotation relies on ab initio gene predictors, which in turn rely on training data to construct an
algorithm or model.
• Prediction is done based on the genomic sequence in question, using statistical analysis and other gene
signals such as k-mer statistics and frame length.
• Some popular ab initio gene predictors are discussed below. AUGUSTUS defines the probability distributions
for eukaryotic genome sequences based on a generalized hidden Markov model (GHMM). AUGUSTUS is
re-trainable and can predict alternative splicing, the 5' UTR and 3' UTR, and introns. AUGUSTUS is one of the
most accurate ab initio gene prediction programs for the species it has been trained for.
• FGENESH is an HMM-based, very fast, and accurate ab initio gene structure prediction program for
humans, Drosophila, plants, yeasts, and nematodes.
• Its speed makes it the fastest tool among HMM-based gene finding programs. GENSCAN is another
HMM-based ab initio tool for predicting locations and exon–intron structures of genes in genomic sequences of a
variety of organisms.
• Vertebrate and invertebrate versions of GENSCAN are available. The accuracy of the latter is lower
because the original tool was primarily designed for the detection of genes in human and vertebrate
genomic sequences.
• It is becoming common practice to use ab initio annotation methods in combination with
transcriptome sequence information such as that provided by RNA-seq.
• This can be viewed as an evidence-based or extrinsic approach. For example, a newer version of
AUGUSTUS can incorporate information from EST and protein alignments.
• In addition, a variant of FGENESH called FGENESH-C uses HMM and cDNA for predictions, while
GenomeScan (an extension of GENSCAN) uses extrinsic information of protein BLAST alignments for
gene structure prediction.
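As a toy illustration of how an ab initio predictor can be trained on organism-specific codon frequencies, the sketch below builds a codon-usage model from known coding sequences and scores a candidate region by its log-odds against a uniform background. This is far simpler than the HMM and GHMM models used by AUGUSTUS, FGENESH, or GENSCAN and is not taken from any of them; the function names, the pseudocount, and the uniform background are assumptions for this example.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def codons(seq: str) -> List[str]:
    """Split a sequence into complete frame-0 codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def train_codon_model(training_cds: Iterable[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Estimate codon probabilities from trusted coding sequences."""
    counts: Counter = Counter()
    for cds in training_cds:
        counts.update(codons(cds))
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def coding_log_odds(seq: str, model: Dict[str, float]) -> float:
    """Sum of log2(P(codon | coding) / (1/64)); positive values favor coding."""
    return sum(math.log2(model[c] * 64) for c in codons(seq) if c in model)


if __name__ == "__main__":
    model = train_codon_model(["ATGGCTGCTGCTTAA"])
    print(coding_log_odds("GCTGCT", model), coding_log_odds("AAAAAA", model))
```

A real predictor would be trained on many high-quality gene models and would combine such content statistics with signal models for start codons, stop codons, and splice sites.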
Ab initio and Homology based annotation tools summary.
Annotation Pipelines
• Analysis of the large amounts of data generated by sequencing requires multiple computationally intensive
steps. Sets of algorithms that process sequence data and are executed in a predefined order are called
bioinformatic pipelines.
• Pipelines process massive amounts of sequence data and the associated metadata using multiple
software components, databases, and environments.
• They are comprehensive, holistic packages that try to exploit relevant information provided by both ab
initio and similarity-based gene predictors.
Structural Pipelines
• MAKER2 is a multi-threaded, parallelized genome annotation and data management application, which
builds upon MAKER.
• Ab initio gene prediction tools SNAP, AUGUSTUS, and GeneMark-ES are integrated in MAKER2. Novel
genomes with limited training data available can be annotated with MAKER2. The tool can also be used
to improve annotation quality by integrating mRNA-seq data.
• NCBI Eukaryotic Annotation Pipeline is an automated pipeline for eukaryotes, in which coding and
noncoding genes, transcripts, and proteins in both finished and draft genomes can be annotated. This
pipeline uses Splign and ProSplign for alignment. It also has its own gene prediction tool called
GNOMON which combines HMM-based ab initio models and homology search information extracted
from experimental evidence.
• Comparative Annotation Toolkit (CAT) is a fully open-source software toolkit for end-to-end annotation.
CAT uses Progressive Cactus for multiple alignments. Its output, together with previously annotated
genomes, is used to project annotations using transMap.
• CAT uses AUGUSTUS for gene prediction both from transMap projections and for ab initio gene
prediction.
• CAT was developed by GENCODE, and was utilized for the annotation of genomes of laboratory
mouse strains and great apes.
• BRAKER1 is a fully automated and highly accurate unsupervised RNA-seq–based genome
annotation pipeline for eukaryotic genomes.
Annotation Visualization
• File Formats
Most bioinformatic tools use the FASTA format as a standard for sequence data sharing. The FASTA
format is used for searching sequence databases, evaluating similarity scores, and identification of
periodic similarity scores.
Other standard file formats exist. Such a format can accommodate additional information, can be used by
different programs, and can be interpreted by human users. It stores genomic features in a standard text file
format.
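Because FASTA is the common exchange format mentioned above, a minimal reader is a useful reference point. The sketch below assumes the usual layout (a header line beginning with ">" followed by one or more sequence lines) and keeps only the first word of each header as the record name; the function name `parse_fasta` is an assumption for this example, and established libraries should be preferred for real work.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_name: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    text = ">seq1 example record\nACGT\nACGT\n\n>seq2\nTTGA\n"
    print(parse_fasta(text.splitlines()))
```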
• Genome Browsers
• Researchers and users utilize genome browsers to integrate various types of information, as well as
analyze and visualize data related to annotation.
• Genome browsers are usually used to efficiently and conveniently browse, search, retrieve, and
examine genomic sequence and annotation data, via a graphical interface. The UCSC Genome
Browser is the most commonly used genome browser; many visualization tools are modeled based on
this tool.
• The Ensembl genome browser is another widely used genome browser for vertebrate genomes, which
supports comparative genomics, sequence variation analysis, and transcriptional regulation analysis.
• Generic Model Organism Database (GMOD) is a collection of interconnected open-source software
tools and databases for managing, visualizing, storing, and sharing genetic and genomic information.
Re-Annotation
• We have seen that as a result of the increasing volume of data from genome sequencing projects,
computational analysis methods have become a considerable element of genome annotations. However, this
has led to high levels of misannotation in public databases.
• Re-annotation benefits the end-user by providing the latest resources. Updating a previously annotated
genome can be seen as re-annotation. Automated annotations save time and resources, but manual
annotations, although time-consuming, are generally more reliable than automated annotations.
• Re-annotation can be used to create large complete genomes, and indeed, there are tools that can be used
for this purpose. Restauro-G is rapid bacterial genome re-annotation software that utilizes a BLAST-like
alignment tool for re-annotation.
• MAKER2 incorporates an external annotation pass-through mechanism that accepts pre-existing genome
annotations.
• Wiki-based sites have been proven successful in providing accurate, useful, and updated information,
despite the fear of being filled with unreliable and inaccurate data. Currently, new information emerges from
different corners of bioinformatic fields, which impacts gene annotation, rendering re-annotation a never-
ending process, to some degree.
EXPERIMENTAL TECHNIQUES FOR GENE LOCATION
• Most experimental methods for gene location are not based on direct examination of
DNA molecules but instead rely on detection of the RNA molecules that are
transcribed from genes.
• All genes are transcribed into RNA, and if the gene is discontinuous then the primary
transcript is subsequently processed to remove the introns and link up the exons.
• Techniques that map the positions of transcribed sequences in a DNA fragment can
therefore be used to locate exons and entire genes. The only problem to be kept in
mind is that the transcript is usually longer than the coding part of the gene because it
begins several tens of nucleotides upstream of the initiation codon and continues
several tens or hundreds of nucleotides downstream of the termination codon.
Hybridization tests can determine if a fragment contains transcribed sequences
• The simplest procedures for studying transcribed sequences are based on hybridization analysis.
• RNA molecules can be separated by specialized forms of agarose gel electrophoresis, transferred to a
nitrocellulose or nylon membrane, and examined by the process called northern hybridization.
• This differs from Southern hybridization only in the precise conditions under which the transfer is carried
out, and the fact that it was not invented by a Dr Northern and so does not have a capital "N."
• If a northern blot of cellular RNA is probed with a labeled fragment of the genome, then RNAs
transcribed from genes within that fragment will be detected. Northern hybridization is therefore,
theoretically, a means of determining the number of genes present in a DNA fragment and the size of
each coding region.
• An RNA extract is electrophoresed under denaturing
conditions in an agarose gel.
• After ethidium bromide staining, two bands are seen.
These are the two largest rRNA molecules, which are
abundant in most cells.
• The smaller rRNAs, which are also abundant, are not
seen because they are so short that they run out the
bottom of the gel and, in most cells, none of the
mRNAs are abundant enough to form a band visible after
ethidium bromide staining.
• The gel is blotted onto a nylon membrane and, in this
example, probed with a radioactively labeled DNA
fragment.
• A single band is visible on the autoradiograph, showing
that the DNA fragment used as the probe contains part
or all of one transcribed sequence.
Northern hybridization.
zoo-blotting
• A second type of hybridization analysis avoids the problems with
poorly expressed and tissue-specific genes by searching not for
RNAs but for related sequences in the DNAs of other organisms.
• This approach, like homology searching, is based on the fact that
homologous genes in related organisms have similar sequences,
whereas the intergenic DNA is usually quite different. If a DNA
fragment from one species is used to probe a Southern transfer of DNAs
from related species, and one or more hybridization signals are
obtained, then it is likely that the probe contains one or more
genes. This is called zoo-blotting.
zoo-blotting.
cDNA sequencing enables genes to be mapped within DNA fragments
• Northern hybridization and zoo-blotting enable the presence or absence of genes in a DNA fragment to be
determined, but give no positional information relating to the location of those genes in the DNA sequence.
The easiest way to obtain this information is to sequence the relevant cDNAs.
• A cDNA is a copy of an mRNA and so corresponds to the coding region of a gene, plus any leader or trailer
sequences that are also transcribed. Comparing a cDNA sequence with a genomic DNA sequence
therefore delineates the position of the relevant gene and reveals the exon-intron boundaries.
• In order to obtain an individual cDNA, a cDNA library must first be prepared from all of the mRNA in the
tissue being studied. Once the library has been prepared, the success of cDNA sequencing as a means of
gene location depends on two factors.
• The first concerns the frequency of the desired cDNAs in the library. As with northern hybridization,
the problem relates to the different expression levels of different genes. If the DNA fragment being
studied contains one or more poorly expressed genes, then the relevant cDNAs will be rare in the
library and it might be necessary to screen many clones before the desired one is identified.
• To get around this problem, various methods of cDNA capture or cDNA selection have been
devised, in which the DNA fragment being studied is repeatedly hybridized to the pool of cDNAs in
order to enrich the pool for the desired clones.
• Because the cDNA pool contains so many different sequences, it is generally not possible to
discard all the irrelevant clones by these repeated hybridizations, but it is possible to increase
significantly the frequency of those clones that specifically hybridize to the DNA fragment. This
reduces the size of the library that must subsequently be screened under stringent conditions to
identify the desired clones.
• A second factor that determines success or failure is the completeness of the individual cDNA
molecules. Usually, cDNAs are made by copying RNA molecules into single-stranded DNA with
reverse transcriptase and then converting the single-stranded DNA into double-stranded DNA with a
DNA polymerase.
• There is always a chance that one or other of the strand synthesis reactions will not proceed to
completion, resulting in a truncated cDNA. The presence of intramolecular base pairs in the RNA can
also lead to incomplete copying. Truncated cDNAs may lack some of the information needed to locate
the start and end points of a gene and all its exon-intron boundaries.
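The comparison of a cDNA with its genomic sequence, described earlier in this section, can be sketched as a greedy exact-match search: each stretch of the cDNA is located in the genome and extended until the two sequences diverge, which marks an exon-intron boundary. This is a simplified illustration that assumes an error-free, full-length cDNA on the same strand as the genomic sequence; the function name `map_cdna_to_genome` and the `min_exon` seed length are assumptions, and real spliced aligners handle mismatches and boundary ambiguity far more carefully.

```python
from typing import List, Tuple


def map_cdna_to_genome(cdna: str, genome: str, min_exon: int = 8) -> List[Tuple[int, int]]:
    """Return exon intervals (0-based, end-exclusive) in genomic coordinates."""
    cdna, genome = cdna.upper(), genome.upper()
    exons = []
    c_pos = 0  # next unmatched position in the cDNA
    g_pos = 0  # genomic position from which to search for the next exon
    while c_pos < len(cdna):
        seed = cdna[c_pos:c_pos + min_exon]
        hit = genome.find(seed, g_pos)
        if len(seed) < min_exon or hit == -1:
            raise ValueError(f"cDNA position {c_pos} could not be placed")
        length = len(seed)
        # Extend the match base by base until the sequences diverge.
        while (c_pos + length < len(cdna) and hit + length < len(genome)
               and cdna[c_pos + length] == genome[hit + length]):
            length += 1
        exons.append((hit, hit + length))
        c_pos += length
        g_pos = hit + length
    return exons


if __name__ == "__main__":
    exon1, intron, exon2 = "ATGGCCATTGTA", "GTAAGTCCCCCCTTTCAG", "ACTGGAAAGTAA"
    print(map_cdna_to_genome(exon1 + exon2, exon1 + intron + exon2))
```

When the first base of an intron happens to match the first base of the next exon, the greedy extension shifts the reported boundary by that base, so boundaries found this way should be checked against the GT–AG rule.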
Methods are available for precise mapping of the ends of transcripts – (RACE)
• The problems with incomplete cDNAs mean that more robust methods are needed for locating the
precise start and end points of gene transcripts.
• One possibility is a special type of PCR that uses RNA rather than DNA as the starting material. The
first step in this type of PCR is to convert the RNA into cDNA with reverse transcriptase, after which
the cDNA is amplified with Taq polymerase in the same way as in a normal PCR. These methods go
under the collective name of reverse transcriptase PCR (RT-PCR) but the particular version that
interests us at present is rapid amplification of cDNA ends (RACE).
• In the simplest form of this method, one of the primers is specific for an internal region close to the
beginning of the gene being studied. This primer attaches to the mRNA for the gene and directs the
first reverse transcriptase-catalyzed stage of the process, during which a cDNA corresponding to the
start of the mRNA is made.
• Because only a small segment of the mRNA is being copied, the expectation is that the cDNA
synthesis will not terminate prematurely, so one end of the cDNA will correspond exactly with the start
of the mRNA.
• Once the cDNA has been made, a short poly(A) tail is attached to its 3' end. The second primer
anneals to this poly(A) sequence and, during the first round of the normal PCR, converts the single-
stranded cDNA into a double-stranded molecule, which is subsequently amplified as the PCR
proceeds. The sequence of this amplified molecule will reveal the precise position of the start of the
transcript.
RACE – Rapid Amplification of cDNA Ends.
• The RNA being studied is converted into a partial cDNA by
extension of a DNA primer that anneals at an internal
position not too distant from the 5' end of the molecule.
• The 3' end of the cDNA is further extended by treatment with
terminal deoxynucleotidyl transferase in the presence of
dATP, which results in a series of As being added to the
cDNA.
• This series of As acts as the annealing site for the anchor
primer. Extension of the anchor primer leads to a double-
stranded DNA molecule, which can now be amplified by a
standard PCR.
• This is 5'-RACE, so-called because it results in amplification
of the 5' end of the starting RNA. A similar method, 3'-RACE,
can be used if the 3'-end sequence is desired.
• Other methods for precise transcript mapping involve heteroduplex analysis. If the DNA region being
studied is cloned as a restriction fragment in an M13 vector then it can be obtained as single-stranded
DNA. When mixed with an appropriate RNA preparation, the transcribed sequence in the cloned DNA
hybridizes with the equivalent mRNA, forming a double-stranded heteroduplex.
• The start of this mRNA lies within the cloned restriction fragment, so some of the cloned fragment
participates in the heteroduplex, but the rest does not. The single-stranded regions can be digested
by treatment with a single-strand-specific nuclease such as S1.
• The size of the heteroduplex is determined by degrading the RNA component with alkali and
electrophoresing the resulting single-stranded DNA in an agarose gel.
• This size measurement is then used to position the start of the transcript relative to the restriction site
at the end of the cloned fragment. Heteroduplex analysis can also be used to locate exon-intron
boundaries.
Heteroduplex analysis.
Exon-intron boundaries can also be located with precision
• A second method for finding exons in a genome sequence is called exon trapping.
• This requires a special type of vector that contains a minigene consisting of two exons flanking an
intron sequence, the first exon being preceded by the sequence signals needed to initiate transcription
in a eukaryotic cell.
• To use the vector, the piece of DNA to be studied is inserted into a restriction site located within the
vector's intron region.
• The vector is then introduced into a suitable eukaryotic cell line, where it is transcribed and the RNA
produced from it is spliced.
• The result is that any exon contained in the genomic fragment becomes attached between the
upstream and downstream exons from the minigene.
• RT-PCR with primers annealing within the two minigene exons is now used to amplify a DNA fragment,
which is sequenced. As the minigene sequence is already known, the nucleotide positions at which
the inserted exon starts and ends can be determined, precisely delineating this exon.
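The final step of exon trapping, reading the trapped exon out of the sequenced RT-PCR product, amounts to removing the known minigene sequence from both sides. A minimal sketch follows, assuming the product has been sequenced without errors; the function name `extract_trapped_exon` and the short flanking sequences in the example are hypothetical and do not correspond to any real vector.

```python
def extract_trapped_exon(product: str, upstream_exon_end: str, downstream_exon_start: str) -> str:
    """Return the sequence lying between the two minigene exons in an RT-PCR product.

    upstream_exon_end is the last stretch of the minigene's first exon;
    downstream_exon_start is the first stretch of its second exon.
    An empty string means no exon was trapped.
    """
    product = product.upper()
    up = product.find(upstream_exon_end.upper())
    if up == -1:
        raise ValueError("upstream minigene exon not found in product")
    insert_start = up + len(upstream_exon_end)
    down = product.find(downstream_exon_start.upper(), insert_start)
    if down == -1:
        raise ValueError("downstream minigene exon not found in product")
    return product[insert_start:down]


if __name__ == "__main__":
    product = "GGATCCAAGCTT" + "ATGCATGC" + "CTCGAGTTTAAA"
    print(extract_trapped_exon(product, "AAGCTT", "CTCGAG"))
```

Searching the original genomic fragment for the returned sequence then gives the start and end positions of the exon.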
THANK YOU……

More Related Content

Similar to Structural annotation................pptx

Apollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityApollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research Community
Monica Munoz-Torres
 
Comparative and functional genomics
Comparative and functional genomicsComparative and functional genomics
Comparative and functional genomics
Jalormi Parekh
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research Community
Monica Munoz-Torres
 
Apollo : A workshop for the Manakin Research Coordination Network
Apollo: A workshop for the Manakin Research Coordination NetworkApollo: A workshop for the Manakin Research Coordination Network
Apollo : A workshop for the Manakin Research Coordination Network
Monica Munoz-Torres
 
prediction methods for ORF
prediction methods for ORFprediction methods for ORF
prediction methods for ORF
karamveer prajapat
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
Nawfal Aldujaily
 
Bioinformatics.Practical Notebook
Bioinformatics.Practical NotebookBioinformatics.Practical Notebook
Bioinformatics.Practical Notebook
Naima Tahsin
 
Structural annotation................pptx

  • 1. STRUCTURAL ANNOTATION • Structural annotation of genome by computer analysis of sequence data and experimental techniques
  • 2. INTRODUCTION • The scope of genome annotation has expanded since the first complete annotation of the Haemophilus influenzae genome in 1995 (Fleischmann et al., 1995). • Once a DNA sequence has been obtained, whether it is the sequence of a single cloned fragment or of an entire chromosome, various methods can be employed to locate the genes that are present. • These methods can be divided into those that involve simply inspecting the sequence, by eye or more frequently by computer, to look for the special sequence features associated with genes, and those that locate genes by experimental analysis of the DNA sequence. The computer methods form part of the methodology called bioinformatics. • The first software used to analyze sequencing reads was the ‘Staden Package’, created by Rodger Staden in 1977 (Staden, 1977).
  • 3. STRUCTURAL ANNOTATION • Finding features of DNA (exons, introns, promoters, transposons, etc.) is known as structural annotation. Structural annotation attempts to find genes in a genomic sequence. • A gene can be defined as "a sequence region necessary for generating functional products". Functional products of genes are proteins and RNAs. Genes that lead to the production of proteins are called protein-coding genes. • Other genes that do not code for proteins, but instead for functional RNA molecules, are called noncoding genes. Noncoding RNA genes include genes for ribosomal RNA (rRNA), transfer RNA (tRNA), microRNA (miRNA), small nuclear RNA and small nucleolar RNA (snRNA and snoRNA, respectively), and long noncoding RNA (lncRNA). • Structural annotations also identify pseudogenes. They were initially considered to be functionless evolutionary dead-ends. We now know that they sometimes participate in gene regulation. Hence, their prediction improves our understanding of genomes.
  • 4. GENE LOCATION BY SEQUENCE INSPECTION • Sequence inspection can be used to locate genes because genes are not random series of nucleotides but instead have distinctive features. • At present we do not fully understand the nature of all of these specific features, and sequence inspection is therefore not a foolproof way of locating genes, but it is still a powerful tool and is usually the first method that is applied to analysis of a new genome sequence.
  • 5. The coding regions of genes are open reading frames • Genes that code for proteins comprise open reading frames (ORFs) consisting of a series of codons that specify the amino acid sequence of the protein that the gene codes for. • The ORF begins with an initiation codon, usually (but not always) ATG, and ends with a termination codon: TAA, TAG, or TGA. Searching a DNA sequence for ORFs that begin with an ATG and end with a termination triplet is therefore one way of looking for genes. • The analysis is complicated by the fact that each DNA sequence has six reading frames, three in one direction and three in the reverse direction on the complementary strand, but computers are quite capable of scanning all six reading frames for ORFs. A PROTEIN-CODING GENE IS AN OPEN READING FRAME OF TRIPLET CODONS
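The six-frame ORF scan described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular gene finder: the function names, the `min_codons` threshold, and the choice to accept only ATG as the start codon are assumptions made for the example.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: only the most common initiation codon is accepted
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[str, int, str]]:
    """Scan all six reading frames and return (strand, frame, orf_sequence).

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon, stop included. min_codons is a hypothetical toy threshold.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]
                    if len(orf) // 3 >= min_codons:
                        orfs.append((strand, frame, orf))
                    start = None  # close it and look for the next ATG
    return orfs
```

For example, `find_orfs("CCATGAAATAGCC")` reports a single ORF, `ATGAAATAG`, in frame 2 of the forward strand. A real scan would use a much larger minimum length to suppress short, spurious ORFs.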
  • 6. • With bacterial genomes, simple ORF scanning is an effective way of locating most of the genes in a DNA sequence. • With bacteria the analysis is further simplified by the fact that the genes are very closely spaced and hence there is relatively little intergenic DNA in the genome (only 11% for E. coli). • If we assume that the real genes do not overlap, which is true for most bacterial genes, then it is only in the intergenic regions that there is a possibility of mistaking a short, spurious ORF for a real gene. • So if the intergenic component of a genome is small, then there is a reduced chance of making mistakes in interpreting the results of a simple ORF scan.
  • 7. Repeats • The first step in structural annotation involves repeat masking. DNA repeats occur in both prokaryotic and eukaryotic organisms. • Repeats account for 0% to over 42% of the prokaryotic genome. Similarly, eukaryotic genomes can harbor millions of repeats. • For instance, repeats account for two-thirds of the human genome. Repeat sequences can be arranged in tandem, i.e., adjacent to one another, and are typically found in the centromere. Alternatively, they can be interspersed in different forms of transposable elements, e.g., in long and short interspersed nuclear elements (LINEs and SINEs), DNA transposons, etc. • Repeat masking tools rely on databases with lists of already identified repeats. RepeatMasker is a good example of such a tool. • Aligning transcript and protein evidence after masking is the second step of structural annotation before gene identification, although it is not mandatory. BLAST or BLAT can be used to align the transcript and protein evidence. • Further, RNA-seq evidence can be aligned using TopHat or HISAT.
  • 8. Predictions of Gene and Different Features • Identifying protein-coding genes and other regulatory elements takes center stage in gene annotation. Gene prediction is a complex process, especially for eukaryotic DNA. • The varying sizes of introns (noncoding sequences) between exons and alternative splice variants make gene structure prediction difficult. • Many gene prediction programs exist. They can be categorized into three groups: ab initio methods, homology-based methods, and combined methods. • Approaches for gene prediction based on nucleotide sequence alone are called ab initio methods. Ab initio approaches rely on statistical models, such as the hidden Markov model (HMM), to identify promoters, coding or noncoding regions, and intron–exon junctions in the genome sequence. • The second approach aligns the sequence with expressed sequence tags (EST), complementary DNA (cDNA), or protein evidence, and uses detected similarities for gene prediction.
  • 9. • The other group comprises programs that combine ab initio and evidence- or homology-based approaches for gene prediction. • In addition, gene prediction programs should be able to predict alternative splicing sites because alternative splicing is a major factor in the regulation of gene expression and in transcriptome and proteome diversity. • Accordingly, gene prediction programs use various models to predict splice sites. Since approximately 99% of the introns in sequenced genomes begin with GT and end with AG, these features are denoted as mandatory by most gene prediction systems for splice site detection. • In addition, incorporation of a strong splice donor consensus, such as the GC–AG splice site, improves the accuracy of gene prediction programs.
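To make the idea of masking concrete, the sketch below applies repeat coordinates to a sequence. It assumes the repeats have already been located by some tool and are supplied as 0-based, half-open `(start, end)` pairs; that coordinate convention and both function names are hypothetical choices for this example and do not reproduce the output format of RepeatMasker.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # assumed convention: 0-based start, exclusive end


def soft_mask(seq: str, repeats: Iterable[Interval]) -> str:
    """Lower-case the repeat intervals and leave the rest upper-case."""
    chars = list(seq.upper())
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def hard_mask(seq: str, repeats: Iterable[Interval]) -> str:
    """Replace every base inside a repeat interval with N."""
    chars = list(seq.upper())
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)
```

Soft masking keeps the underlying bases available to downstream tools, whereas hard masking hides them entirely; which one is appropriate depends on the gene predictor that consumes the masked sequence.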
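The GT...AG rule mentioned above can be illustrated with a naive enumeration of candidate introns. This is only a sketch of the dinucleotide constraint: real predictors score the surrounding consensus with statistical models rather than listing every pair. The function name, the length limits, and the `donors` parameter are assumptions for the example.

```python
from typing import List, Sequence, Tuple


def candidate_introns(
    seq: str,
    min_len: int = 4,          # toy value; real introns are far longer
    max_len: int = 10_000,
    donors: Sequence[str] = ("GT",),  # pass ("GT", "GC") to also allow GC-AG introns
) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 0-based and end-exclusive, for every span
    that begins with a donor dinucleotide and ends with AG."""
    seq = seq.upper()
    donor_pos = [i for i in range(len(seq) - 1) if seq[i:i + 2] in donors]
    acceptor_pos = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    introns = []
    for d in donor_pos:
        for a in acceptor_pos:
            end = a + 2  # include the AG itself
            if min_len <= end - d <= max_len:
                introns.append((d, end))
    return introns
```

On the toy sequence `AAGTCCCAGTT` the only candidate is the span `GTCCCAG`. On genomic sequence this enumeration produces a very large number of false candidates, which is why the dinucleotides are treated as a necessary condition and not as sufficient evidence.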
  • 13. Databases for Structural Annotation • Annotations require supporting data that can be used or presented as evidence of predicted assignments. Currently, homology-based methods play a central role in genome annotation because of the huge number of EST and cDNA sequences available. • Homology-based methods depend on DNA, RNA, or protein sequence alignment data, which can easily be retrieved from biological databases. Ab initio annotations, on the other hand, identify genes and their structures using mathematical models. • Nonetheless, ab initio gene predictors have to be trained using high-quality gene models or organism-specific genome traits, such as codon frequency and intron–exon length distribution. Further, ab initio models require ESTs, RNA-seq data, and proteins to improve prediction accuracy. • Databases readily provide such data. Nucleotide and protein sequences or structures can easily be found in comprehensive public-domain databases, e.g., GenBank, the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ). UniProt, a protein sequence database that combines UniProtKB/Swiss-Prot (over 560,000 manually curated sequences) and UniProtKB/TrEMBL (180 million automatically annotated sequences), provides the scientific community with high-quality and freely accessible protein sequences with the associated functional information.
  • 14. Comparative Annotation Methods • Genome annotation achieved by comparison of genes and genomes across species can be a reliable information source for understanding genome evolution. Comparative annotation allows annotations of a well-studied genome to be projected onto an evolutionarily close species. It often focuses on the coding genes. • Valuable information for comparative annotation can be obtained from genome alignment. A well-aligned genome will yield sound data for comparative annotation. • Approaches to comparative annotation of genomes can be categorized into ab initio methods and homology-based methods according to the input information used for annotation, i.e., either a statistical model of genes, or protein sequence, EST, and cDNA, respectively. • Ab initio approaches are preferred for genes that are weakly or not at all represented in RNA-seq libraries, have insufficient similarity to any known protein, and lack other evidence.
  • 15. • Related species have genomes that share similarities inherited from their common ancestor, overlaid with species-specific differences that have arisen since the species began to evolve independently. Because of natural selection, the sequence similarities between related genomes are greatest within the genes and lowest in the intergenic regions. • Therefore, when related genomes are compared, homologous genes are easily identified because they have high sequence similarity, and any ORF that does not have a clear homolog in the second genome can be discounted as almost certainly being a chance sequence and not a genuine gene. This type of analysis is called comparative genomics.
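The filtering logic on this slide (keep an ORF only if it has a clear homolog in a related genome) can be sketched as follows. The similarity measure here is Python's `difflib.SequenceMatcher` ratio, used purely as a crude stand-in for a proper alignment tool such as BLAST; the function names and the `min_similarity` cutoff are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a placeholder for a real alignment score."""
    return SequenceMatcher(None, a.upper(), b.upper(), autojunk=False).ratio()


def filter_orfs_by_homolog(
    orfs: Iterable[str],
    related_genes: List[str],
    min_similarity: float = 0.8,  # hypothetical cutoff
) -> List[str]:
    """Keep only ORFs whose best match among related_genes passes the cutoff."""
    kept = []
    for orf in orfs:
        best = max((similarity(orf, gene) for gene in related_genes), default=0.0)
        if best >= min_similarity:
            kept.append(orf)
    return kept
```

An ORF with no sufficiently similar counterpart is dropped as a probable chance sequence, mirroring the reasoning above. In practice the comparison would be made with an alignment program and a statistically grounded threshold instead of a fixed ratio.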
  • 16. Homology-Based Annotation • For predict and annotate genes by identifying significant matches from a well annotated genome sequence by employing alignment tools such as BLAST. • Homology-based annotations use the coding sequences (CDS), usually protein sequences and sometimes transcripts in the form of mRNA, cDNA, or EST to predict genes, assuming similar sequence regions encode homologous proteins. • Tools like Exonerate and DIALIGN can be used for sequence alignment; GenomeThreader and AGenDA are used for gene predictions. Increased evolutionary distance between the input protein and the target protein reduces the accuracy of homology-based gene finding. This happens because of heavy reliance on the alignment and information derived from the already known genes, which creates a challenge in identifying genes whose properties are different from those of referenced genes. • However, newer comparative approaches solve this issue by relying to a greater degree on sequence conservation, which enables them to identify genes with new features and different statistical composition. • TWINSCAN and SGP2 are examples of tools in which gene prediction uses the analysis of sequence conservation patterns between genomic sequences of evolutionarily related organisms.
  • 17. Ab Initio Annotation • Ab initio annotation relies on ab initio gene predictors, which in turn rely on training data to construct an algorithm or model. • Prediction is based on the genomic sequence in question, using statistical analysis and other gene signals such as k-mer statistics and frame length. • Some popular ab initio gene predictors are discussed below. AUGUSTUS defines the probability distributions for eukaryotic genome sequences based on a generalized hidden Markov model (GHMM). AUGUSTUS is re-trainable and can predict alternative splicing and the 5' UTR and 3' UTR, including introns. AUGUSTUS is one of the most accurate ab initio gene prediction programs for the species it has been trained for. • FGENESH is an HMM-based, very fast, and accurate ab initio gene structure prediction program for humans, Drosophila, plants, yeasts, and nematodes. • It is reported to be the fastest tool among HMM-based gene finding programs. GENSCAN is another HMM-based ab initio tool for predicting locations and exon–intron structures of genes in genomic sequences of a variety of organisms.
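Two of the signals mentioned on this slide, k-mer statistics and frame length, can be sketched in a few lines of Python. This is only a toy illustration: it scans the forward strand for open reading frames and counts k-mers, whereas predictors such as AUGUSTUS, FGENESH, and GENSCAN use trained HMM-based models. The function names and the `min_codons` default are assumptions made for this example.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list:
    """Return (start, end, frame) for forward-strand ORFs.

    start is the index of the ATG, end is exclusive and includes the stop codon.
    Only ORFs with at least min_codons codons (stop included) are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(find_orfs("CCATGAAATTTTAGGG"))
print(kmer_counts("ATAT", 2))
```

A real predictor would compare k-mer frequencies inside candidate ORFs with those learned from known coding and noncoding sequence; here the counts are simply reported.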
  • 18. • Vertebrate and invertebrate versions of GENSCAN are available. The accuracy of the latter is lower because the original tool was primarily designed for the detection of genes in human and vertebrate genomic sequences. • It is becoming common practice to use ab initio annotation methods in combination with transcriptome information such as that provided by RNA-seq. • This can be viewed as an evidence-based or extrinsic approach. For example, a newer version of AUGUSTUS can incorporate information from EST and protein alignments. • In addition, a variant of FGENESH called FGENESH-C uses HMM and cDNA for predictions, while GenomeScan (an extension of GENSCAN) uses extrinsic information from protein BLAST alignments for gene structure prediction.
  • 19. Ab initio and Homology based annotation tools summary.
  • 20. Annotation Pipelines • Analysis of the large amounts of data generated by sequencing requires multiple computationally intensive steps. Sets of algorithms that process sequence data and are executed in a predefined order are called bioinformatic pipelines. • Pipelines process massive amounts of sequence data and the associated metadata using multiple software components, databases, and environments. • They are comprehensive, holistic packages that try to exploit relevant information provided by both ab initio and similarity-based gene predictors.
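The notion of steps executed in a predefined order, with ab initio and similarity-based evidence combined at the end, can be sketched as follows. Everything here is hypothetical and heavily simplified: the step functions are placeholders (the "ab initio" step merely lists ATG positions, the "similarity" step does exact string matching), and the dictionary keys are invented for the example. It does not reflect how any real pipeline is implemented.

```python
def run_pipeline(data: dict, steps: list) -> dict:
    """Apply each step in the given order, passing the growing result along."""
    for step in steps:
        data = step(data)
    return data


def clean_sequence(data: dict) -> dict:
    # Normalise case and remove spaces from the raw sequence.
    return {**data, "sequence": data["sequence"].upper().replace(" ", "")}


def ab_initio_calls(data: dict) -> dict:
    # Placeholder for an ab initio predictor: report every ATG position.
    seq = data["sequence"]
    starts = [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]
    return {**data, "ab_initio": starts}


def similarity_calls(data: dict) -> dict:
    # Placeholder for a similarity search: exact matches to known sequences.
    seq = data["sequence"]
    hits = [seq.find(known) for known in data.get("known", [])]
    return {**data, "similarity": [h for h in hits if h != -1]}


def combine_evidence(data: dict) -> dict:
    # Keep only positions supported by both kinds of evidence.
    both = set(data["ab_initio"]) & set(data["similarity"])
    return {**data, "supported": sorted(both)}


steps = [clean_sequence, ab_initio_calls, similarity_calls, combine_evidence]
result = run_pipeline({"sequence": "cc atg aaa tag atg c", "known": ["ATGAAA"]}, steps)
print(result["supported"])
```

Keeping the steps as an ordered list makes the execution order explicit and lets individual steps be swapped without touching the runner.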
  • 21. Structural Pipelines • MAKER2 is a multi-threaded, parallelized genome annotation and data management application, which builds on MAKER. • The ab initio gene prediction tools SNAP, AUGUSTUS, and GeneMark-ES are integrated in MAKER2. Novel genomes with limited training data available can be annotated with MAKER2. The tool can also be used to improve annotation quality by integrating mRNA-seq data. • NCBI Eukaryotic Annotation Pipeline is an automated pipeline for eukaryotes, in which coding and noncoding genes, transcripts, and proteins in both finished and draft genomes can be annotated. This pipeline uses Splign and ProSplign for alignment. It also has its own gene prediction tool called GNOMON, which combines HMM-based ab initio models and homology search information extracted from experimental evidence. • Comparative Annotation Toolkit (CAT) is a fully open-source software toolkit for end-to-end annotation. CAT uses Progressive Cactus for multiple alignments. Its output, together with previously annotated genomes, is used to project annotations using transMap.
  • 22. • CAT uses AUGUSTUS for gene prediction, both from transMap projections and for ab initio gene prediction. • CAT was developed by the GENCODE project and was used for the annotation of genomes of laboratory mouse strains and great apes. • BRAKER1 is a fully automated and highly accurate unsupervised RNA-seq–based genome annotation pipeline for eukaryotic genomes.
  • 23. Annotation Visualization • File Formats • Most bioinformatic tools use the FASTA format as a standard for sequence data sharing. The FASTA format is used for searching sequence databases, evaluating similarity scores, and identification of periodic similarity scores. Other standard file formats exist. An annotation format can accommodate additional information, can be used by different programs, and can be interpreted by human users; it describes genomic features in a standard text file format. • Genome Browsers • Researchers and users utilize genome browsers to integrate various types of information, as well as to analyze and visualize data related to annotation. • Genome browsers are usually used to efficiently and conveniently browse, search, retrieve, and examine genomic sequence and annotation data via a graphical interface. The UCSC Genome Browser is the most commonly used genome browser; many visualization tools are modeled on it. • The Ensembl genome browser is another widely used genome browser for vertebrate genomes, which supports comparative genomics, sequence variation analysis, and transcriptional regulation analysis. • Generic Model Organism Database (GMOD) is a collection of interconnected open-source software tools and databases for managing, visualizing, storing, and sharing genetic and genomic information.
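Since FASTA is the sequence format named on this slide, a minimal Python reader is sketched below. It assumes the common convention of a `>` header line followed by one or more sequence lines, and it keeps only the first word of each header as the record name; the function name is hypothetical. Established libraries handle many more edge cases.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {record_name: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = ">seq1 example record\nATGC\nGGTA\n>seq2\nTTAA\n"
print(parse_fasta(example))
```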
  • 24. Re-Annotation • We have seen that as a result of the increasing volume of data from genome sequencing projects, computational analysis methods have become a considerable element of genome annotations. However, this has led to high levels of misannotation in public databases. • Re-annotation benefits the end-user by providing the latest resources. Updating a previously annotated genome can be seen as re-annotation. Automated annotations save time and resources, but manual annotations, although time-consuming, are better than automated annotations. • Re-annotation can be used to create large complete genomes, and indeed, there are tools that can be used for this purpose. Restauro-G is rapid bacterial genome re-annotation software that utilizes a BLAST-like alignment tool for re-annotation. • MAKER2 incorporates an external annotation pass-through mechanism that accepts pre-existing genome annotations. • Wiki-based sites have been proven successful in providing accurate, useful, and updated information, despite the fear of being filled with unreliable and inaccurate data. Currently, new information emerges from different corners of bioinformatic fields, which impacts gene annotation, rendering re-annotation a never-ending process, to some degree.
  • 25. EXPERIMENTAL TECHNIQUES FOR GENE LOCATION • Most experimental methods for gene location are not based on direct examination of DNA molecules but instead rely on detection of the RNA molecules that are transcribed from genes. • All genes are transcribed into RNA, and if the gene is discontinuous then the primary transcript is subsequently processed to remove the introns and link up the exons. • Techniques that map the positions of transcribed sequences in a DNA fragment can therefore be used to locate exons and entire genes. The only problem to be kept in mind is that the transcript is usually longer than the coding part of the gene because it begins several tens of nucleotides upstream of the initiation codon and continues several tens or hundreds of nucleotides downstream of the termination codon.
  • 26. Hybridization tests can determine if a fragment contains transcribed sequences • The simplest procedures for studying transcribed sequences are based on hybridization analysis. • RNA molecules can be separated by specialized forms of agarose gel electrophoresis, transferred to a nitrocellulose or nylon membrane, and examined by the process called northern hybridization. • This differs from Southern hybridization only in the precise conditions under which the transfer is carried out, and the fact that it was not invented by a Dr Northern and so does not have a capital "N." • If a northern blot of cellular RNA is probed with a labeled fragment of the genome, then RNAs transcribed from genes within that fragment will be detected. Northern hybridization is therefore, theoretically, a means of determining the number of genes present in a DNA fragment and the size of each coding region.
  • 27. • An RNA extract is electrophoresed under denaturing conditions in an agarose gel. • After ethidium bromide staining, two bands are seen. These are the two largest rRNA molecules, which are abundant in most cells. • The smaller rRNAs, which are also abundant, are not seen because they are so short that they run off the bottom of the gel and, in most cells, none of the mRNAs is abundant enough to form a band visible after ethidium bromide staining. • The gel is blotted onto a nylon membrane and, in this example, probed with a radioactively labeled DNA fragment. • A single band is visible on the autoradiograph, showing that the DNA fragment used as the probe contains part or all of one transcribed sequence. Northern hybridization.
  • 28. Zoo-blotting • A second type of hybridization analysis avoids the problems with poorly expressed and tissue-specific genes by searching not for RNAs but for related sequences in the DNAs of other organisms. • This approach, like homology searching, is based on the fact that homologous genes in related organisms have similar sequences, whereas the intergenic DNA is usually quite different. • If a DNA fragment from one species is used to probe a Southern transfer of DNAs from related species, and one or more hybridization signals are obtained, then it is likely that the probe contains one or more genes. This is called zoo-blotting.
  • 29. cDNA sequencing enables genes to be mapped within DNA fragments • Northern hybridization and zoo-blotting enable the presence or absence of genes in a DNA fragment to be determined, but give no positional information relating to the location of those genes in the DNA sequence. The easiest way to obtain this information is to sequence the relevant cDNAs. • A cDNA is a copy of an mRNA and so corresponds to the coding region of a gene, plus any leader or trailer sequences that are also transcribed. Comparing a cDNA sequence with a genomic DNA sequence therefore delineates the position of the relevant gene and reveals the exon-intron boundaries. • In order to obtain an individual cDNA, a cDNA library must first be prepared from all of the mRNA in the tissue being studied. Once the library has been prepared, the success of cDNA sequencing as a means of gene location depends on two factors.
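The comparison of a cDNA with the genomic sequence can be sketched in Python. This is a toy, assuming a perfect (error-free) cDNA on the same strand as the genomic sequence: each exon is anchored by an exact seed match and extended base by base until the sequences diverge. The function names and the `seed` length are assumptions for this example. Because it ignores splice-site signals and sequencing errors, a boundary can be misplaced by a base or two when the intron happens to begin with the same bases as the next exon; dedicated spliced aligners handle such cases.

```python
def map_cdna_to_genome(cdna: str, genome: str, seed: int = 6) -> list:
    """Return exon intervals (start, end) on the genome, end exclusive."""
    exons = []
    c = 0  # current position in the cDNA
    g = 0  # genome position after the previous exon
    while c < len(cdna):
        probe = cdna[c:c + seed]
        hit = genome.find(probe, g)
        if hit == -1:
            raise ValueError(f"cDNA position {c} not found in genomic sequence")
        # Extend the match until the cDNA and the genome diverge.
        length = 0
        while (c + length < len(cdna)
               and hit + length < len(genome)
               and cdna[c + length] == genome[hit + length]):
            length += 1
        exons.append((hit, hit + length))
        c += length
        g = hit + length
    return exons


def introns_between(exons: list) -> list:
    """Intervals lying between consecutive exons."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


# Hypothetical two-exon gene with one intron.
exon1, intron, exon2 = "ATGGCCAAG", "GTAAGTTTTCAG", "CTGGAGTGA"
genome = "CCCC" + exon1 + intron + exon2 + "TTTT"
exons = map_cdna_to_genome(exon1 + exon2, genome)
print(exons, introns_between(exons))
```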
  • 30. • The first concerns the frequency of the desired cDNAs in the library. As with northern hybridization, the problem relates to the different expression levels of different genes. If the DNA fragment being studied contains one or more poorly expressed genes, then the relevant cDNAs will be rare in the library and it might be necessary to screen many clones before the desired one is identified. • To get around this problem, various methods of cDNA capture or cDNA selection have been devised, in which the DNA fragment being studied is repeatedly hybridized to the pool of cDNAs in order to enrich the pool for the desired clones. • Because the cDNA pool contains so many different sequences, it is generally not possible to discard all the irrelevant clones by these repeated hybridizations, but it is possible to increase significantly the frequency of those clones that specifically hybridize to the DNA fragment. This reduces the size of the library that must subsequently be screened under stringent conditions to identify the desired clones.
  • 31. • A second factor that determines success or failure is the completeness of the individual cDNA molecules. Usually, cDNAs are made by copying RNA molecules into single-stranded DNA with reverse transcriptase and then converting the single-stranded DNA into double-stranded DNA with a DNA polymerase. • There is always a chance that one or other of the strand synthesis reactions will not proceed to completion, resulting in a truncated cDNA. The presence of intramolecular base pairs in the RNA can also lead to incomplete copying. Truncated cDNAs may lack some of the information needed to locate the start and end points of a gene and all its exon-intron boundaries.
  • 32. Methods are available for precise mapping of the ends of transcripts – (RACE) • The problems with incomplete cDNAs mean that more robust methods are needed for locating the precise start and end points of gene transcripts. • One possibility is a special type of PCR that uses RNA rather than DNA as the starting material. The first step in this type of PCR is to convert the RNA into cDNA with reverse transcriptase, after which the cDNA is amplified with Taq polymerase in the same way as in a normal PCR. These methods go under the collective name of reverse transcriptase PCR (RT-PCR) but the particular version that interests us at present is rapid amplification of cDNA ends (RACE).
  • 33. • In the simplest form of this method, one of the primers is specific for an internal region close to the beginning of the gene being studied. This primer attaches to the mRNA for the gene and directs the first reverse transcriptase-catalyzed stage of the process, during which a cDNA corresponding to the start of the mRNA is made. • Because only a small segment of the mRNA is being copied, the expectation is that the cDNA synthesis will not terminate prematurely, so one end of the cDNA will correspond exactly with the start of the mRNA. • Once the cDNA has been made, a short poly(A) tail is attached to its 3' end. The second primer anneals to this poly(A) sequence and, during the first round of the normal PCR, converts the single-stranded cDNA into a double-stranded molecule, which is subsequently amplified as the PCR proceeds. The sequence of this amplified molecule will reveal the precise position of the start of the transcript.
  • 34. RACE – Rapid Amplification of cDNA Ends. • The RNA being studied is converted into a partial cDNA by extension of a DNA primer that anneals at an internal position not too distant from the 5' end of the molecule. • The 3' end of the cDNA is further extended by treatment with terminal deoxynucleotidyl transferase in the presence of dATP, which results in a series of As being added to the cDNA. • This series of As acts as the annealing site for the anchor primer. Extension of the anchor primer leads to a double-stranded DNA molecule, which can now be amplified by a standard PCR. • This is 5'-RACE, so called because it results in amplification of the 5' end of the starting RNA. A similar method, 3'-RACE, can be used if the 3' end sequence is desired.
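The strand logic of 5'-RACE described on these slides can be followed in a short in-silico Python sketch. It is a simplification under stated assumptions: reactions go to completion, the tail has a fixed known length (`tail_len`, a hypothetical parameter), and the anchor primer is a plain oligo(dT). In practice the tail length varies, and a transcript that itself begins with U makes the tail boundary ambiguous.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def five_prime_race(mrna: str, primer_end: int, tail_len: int = 10) -> str:
    """Return the sense strand of the 5'-RACE product.

    primer_end is the mRNA position (exclusive) where the internal primer's
    3'-most template base lies; the cDNA covers mrna[:primer_end].
    """
    template = mrna[:primer_end].upper().replace("U", "T")
    first_strand = revcomp(template)            # cDNA made by reverse transcriptase
    tailed = first_strand + "A" * tail_len      # As added to the cDNA 3' end
    return revcomp(tailed)                      # strand made from the anchor primer


def transcript_start(product: str, genome: str, tail_len: int = 10) -> int:
    """Genome index of the first transcribed base, or -1 if not found."""
    return genome.find(product[tail_len:])


# Hypothetical transcript and genomic context.
mrna = "GGCAUGGCCAAGCUGGAGUGA"
genome = "TTACGTAC" + mrna.replace("U", "T") + "CCGG"
product = five_prime_race(mrna, primer_end=12)
print(product, transcript_start(product, genome))
```

The product begins with the run of Ts complementary to the added As, followed immediately by the first base of the transcript, which is why sequencing it reveals the transcript start.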
  • 35. • Other methods for precise transcript mapping involve heteroduplex analysis. If the DNA region being studied is cloned as a restriction fragment in an M13 vector then it can be obtained as single-stranded DNA. When mixed with an appropriate RNA preparation, the transcribed sequence in the cloned DNA hybridizes with the equivalent mRNA, forming a double-stranded heteroduplex. • The start of this mRNA lies within the cloned restriction fragment, so some of the cloned fragment participates in the heteroduplex, but the rest does not. The single-stranded regions can be digested by treatment with a single-strand-specific nuclease such as S1. • The size of the heteroduplex is determined by degrading the RNA component with alkali and electrophoresing the resulting single-stranded DNA in an agarose gel. • This size measurement is then used to position the start of the transcript relative to the restriction site at the end of the cloned fragment. Heteroduplex analysis can also be used to locate exon-intron boundaries. Heteroduplex analysis.
  • 36. Exon-intron boundaries can also be located with precision • A second method for finding exons in a genome sequence is called exon trapping. • This requires a special type of vector that contains a minigene consisting of two exons flanking an intron sequence, the first exon being preceded by the sequence signals needed to initiate transcription in a eukaryotic cell. • To use the vector, the piece of DNA to be studied is inserted into a restriction site located within the vector's intron region. • The vector is then introduced into a suitable eukaryotic cell line, where it is transcribed and the RNA produced from it is spliced. • The result is that any exon contained in the genomic fragment becomes attached between the upstream and downstream exons from the minigene. • RT-PCR with primers annealing within the two minigene exons is now used to amplify a DNA fragment, which is sequenced. As the minigene sequence is already known, the nucleotide positions at which the inserted exon starts and ends can be determined, precisely delineating this exon.
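The final step of exon trapping, reading the inserted exon out of the sequenced RT-PCR product, can be sketched in Python. The sketch assumes an error-free product on the sense strand and uses exact string matching; the function names and all sequences are hypothetical. An empty result means that no exon was trapped, since the two minigene exons were joined directly.

```python
from typing import Optional, Tuple


def trapped_exon(product: str, upstream_exon: str, downstream_exon: str) -> Optional[str]:
    """Return the sequence lying between the two known minigene exons.

    Returns None if either minigene exon cannot be found in the product.
    """
    start = product.find(upstream_exon)
    if start == -1:
        return None
    inner_start = start + len(upstream_exon)
    inner_end = product.find(downstream_exon, inner_start)
    if inner_end == -1:
        return None
    return product[inner_start:inner_end]


def exon_coordinates(genomic_insert: str, exon: str) -> Optional[Tuple[int, int]]:
    """Locate the trapped exon in the cloned genomic fragment (end exclusive)."""
    if not exon:
        return None
    start = genomic_insert.find(exon)
    if start == -1:
        return None
    return (start, start + len(exon))


# Hypothetical minigene exon ends and a genomic insert containing one exon.
upstream, downstream = "GACCTG", "TTCGAA"
exon = "ATCGGCTAAC"
genomic_insert = "CCTTTTCAG" + exon + "GTAAGTCC"
rt_pcr_product = "AA" + upstream + exon + downstream + "GG"

found = trapped_exon(rt_pcr_product, upstream, downstream)
print(found, exon_coordinates(genomic_insert, found))
```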