Expressed Sequence Tags (ESTs) are small pieces of DNA sequence (usually 100 to 800 nucleotides long) generated by sequencing randomly selected cDNA clones from a library.
ESTs are short, single-pass sequence reads from cDNA copies of mRNA.
ESTs are bits of DNA sequence that represent genes expressed in certain cells, tissues, or organs from different organisms.
Researchers use these "tags" to fish a gene out of a portion of chromosomal DNA by matching base pairs.
ESTs help to identify unknown genes and to map their positions within a genome.
ESTs provide researchers with a quick and inexpensive route for discovering new genes,
for obtaining data on gene expression and regulation,
and for constructing genome maps.
ESTs are an economical approach to identifying and characterizing expressed genes.
ESTs represent a snapshot of genes expressed in a given tissue and/or at a given developmental stage.
mRNA is very unstable outside of a cell; therefore it is converted to complementary DNA (cDNA).
cDNA is a much more stable compound and, importantly, because it is generated from an mRNA in which the introns have been removed,
cDNA represents only expressed DNA sequence.
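The relationship between an mRNA and its cDNA can be illustrated with a minimal Python sketch. It is a toy string model, not a description of the laboratory protocol, and the function names are hypothetical, chosen only for this example.

```python
# Toy model of mRNA -> cDNA conversion (illustrative only).
# Function names are hypothetical and not taken from any library.

# Base pairing used when copying RNA into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA, written 5'->3'.

    The first strand is complementary to the mRNA, so it is the
    reverse complement of the mRNA expressed in the DNA alphabet.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def sense_strand_cdna(mrna: str) -> str:
    """Return the cDNA strand with the same sequence as the mRNA (U replaced by T)."""
    return mrna.upper().replace("U", "T")


if __name__ == "__main__":
    mrna = "AUGGCC"
    print(first_strand_cdna(mrna))  # complementary strand
    print(sense_strand_cdna(mrna))  # same sense as the mRNA
```

Because the input is a spliced mRNA, neither strand contains intron sequence, which is the point made above.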
Sequencing only the beginning portion of the cDNA produces a 5' EST.
A 5' EST is obtained from the portion of a transcript that usually codes for a protein. These regions tend to be conserved across species and do not change much within a gene family.
Sequencing the ending portion of the cDNA molecule produces a 3' EST.
3' ESTs are likely to fall within non-coding, or untranslated, regions (UTRs) and therefore tend to exhibit less cross-species conservation than do coding sequences.
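As a rough illustration of the two kinds of read, the sketch below takes a cDNA sequence and returns its first and last stretch of bases. The function name and the default read length of 300 are assumptions made for this example. Both reads are kept in the orientation of the input string, although real 3' reads may be reported on the opposite strand.

```python
# Toy simulation of single-pass 5' and 3' reads from one cDNA clone.
# `simulate_ests` and its default read length are hypothetical choices.
from typing import Tuple


def simulate_ests(cdna: str, read_length: int = 300) -> Tuple[str, str]:
    """Return (five_prime_est, three_prime_est) for a cDNA given 5'->3'.

    Each read covers at most `read_length` bases from one end of the clone.
    If the cDNA is shorter than `read_length`, both reads span the whole clone.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    five_prime_est = cdna[:read_length]    # start of the clone, often coding
    three_prime_est = cdna[-read_length:]  # end of the clone, often 3' UTR
    return five_prime_est, three_prime_est


if __name__ == "__main__":
    five, three = simulate_ests("AAAACCCCGGGGTTTT", read_length=4)
    print(five, three)
```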
ESTs as Genome Landmarks
The 3' ESTs serve as a common source of STSs because of their likelihood of being unique to a particular species.
A sequence-tagged site (STS) is a short DNA sequence that is easily recognizable and occurs only once in a genome (or chromosome).
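The "occurs only once" requirement can be checked on a toy example with plain string matching. This is a sketch under simplifying assumptions: the genome is a single in-memory string, both strands are searched, and only exact matches count. The function names are hypothetical.

```python
# Toy uniqueness check for a candidate STS (exact matching only).
# All function names here are hypothetical, chosen for illustration.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_occurrences(genome: str, tag: str) -> int:
    """Count occurrences of `tag` in `genome`, including overlapping ones."""
    count = 0
    start = genome.find(tag)
    while start != -1:
        count += 1
        start = genome.find(tag, start + 1)
    return count


def is_candidate_sts(genome: str, tag: str) -> bool:
    """Return True if `tag` matches exactly one site, counting both strands."""
    total = count_occurrences(genome, tag)
    rc = reverse_complement(tag)
    if rc != tag:  # avoid double-counting palindromic tags
        total += count_occurrences(genome, rc)
    return total == 1


if __name__ == "__main__":
    genome = "ACGTTTGACCAGTACGTT"
    print(is_candidate_sts(genome, "GACCAG"))  # occurs once
    print(is_candidate_sts(genome, "ACGTT"))   # occurs twice
```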
ESTs as Gene Discovery Resources
ESTs are tools in the hunt for genes involved in hereditary diseases.
ESTs are generated rapidly and inexpensively: only one sequencing experiment is needed for each cDNA, and the reads do not have to be checked for sequencing errors, because occasional errors do not prevent identification of the gene an EST came from.
Using ESTs, scientists have rapidly isolated some of the genes involved in Alzheimer's disease and colon cancer.
In 1992, scientists at NCBI developed a new database designed to serve as a collection point for ESTs.
Once an EST that was submitted to GenBank had been screened and annotated, it was then deposited in this new database, called dbEST.
Scientists at NCBI created dbEST to organize, store, and provide access to the great mass of public EST data that has already accumulated and that continues to grow daily.
Using dbEST, a scientist can access not only data on human ESTs but information on ESTs from over 300 other organisms as well.
EST sequences are included in the EST division of GenBank, available from NCBI by anonymous ftp and through Entrez.
The nucleotide sequences may be searched using the BLAST electronic mail server.
The TBLASTN program, which takes an amino acid query sequence and compares it with six-frame translations of dbEST DNA sequences, is particularly useful.
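A six-frame translation can be sketched in a few lines of Python. The code below uses the standard genetic code and translates a DNA sequence in three forward and three reverse-complement frames. The function names are hypothetical, and the exact substring search at the end is only a stand-in for the scored alignment that TBLASTN actually performs.

```python
# Six-frame translation sketch (standard genetic code).
# Function names are hypothetical; the exact-match search is a deliberate
# simplification and is NOT how TBLASTN compares sequences.
from itertools import product
from typing import Dict, List

_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of `seq`; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna: str) -> Dict[str, str]:
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def frames_containing(peptide: str, dna: str) -> List[str]:
    """List the frames whose translation contains `peptide` exactly."""
    return [
        name
        for name, protein in six_frame_translations(dna).items()
        if peptide in protein
    ]


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCCTAA").items():
        print(name, protein)
```

Searching all six frames matters for ESTs because a single-pass read may come from either strand and may start at any codon position.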
EST sequences are also available as a flat file in the FASTA format by anonymous ftp in the /repository/dbEST directory at ftp.ncbi.nih.gov.
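A FASTA flat file is a series of records, each with a header line starting with ">" followed by one or more sequence lines. The sketch below reads such records from any iterable of text lines. The function name and the sample records are invented for illustration and do not reproduce the real dbEST header layout.

```python
# Minimal FASTA reader sketch. `read_fasta` is a hypothetical helper, and the
# sample headers below are made up; they are not real dbEST records.
import io
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    sample = io.StringIO(">example_est_1 demo record\nACGT\nTTGA\n>example_est_2\nGGCC\n")
    for header, sequence in read_fasta(sample):
        print(header, len(sequence))
```

The same function works on an open file handle, since iterating over a text file also yields lines.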
"Trace data" for sequences derived from I.M.A.G.E clones is available from the Washington University Genome Sequencing Center.
Traces are retrieved using the identifier labelled "EST name" in the dbEST record (e.g. yb01a01.s1, corresponding to GenBank accession number T48601). Note that this identifier may also be found in the DEFINITION line of the GenBank record.
Clones may be ordered using the identifier labelled "Clone Id" in the dbEST record (e.g. 69864, corresponding to GenBank accession number T48601).
This identifier is also available in the DEFINITION line of the corresponding GenBank record.
First, it is very difficult to isolate mRNA from some tissues and cell types.
Second, important gene regulatory sequences may be found within an intron; because ESTs derive from mRNA in which the introns have been removed, they do not capture these sequences.
For most ESTs, there is no indication of the gene from which they are derived.
Because a gene can be expressed as mRNA many, many times, ESTs ultimately derived from this mRNA may be redundant.
There may be many identical, or similar, copies of the same EST.
To resolve the redundancy and overlap problem, NCBI investigators developed the UniGene database.
UniGene clusters EST sequences with traditional gene sequences.
It automatically partitions GenBank sequences into a non-redundant set of gene-oriented clusters.
For each cluster, additional information is included.
Clustering: associate individual EST sequences with unique transcripts or genes
Assembling: derive consensus sequences from overlapping ESTs belonging to the same cluster
Mapping: associate ESTs (or EST contigs) with exons in genomic sequences
Interpreting: find and correct coding regions
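The clustering step listed above can be sketched with a toy example. The code groups ESTs that share at least one exact k-mer, using a union-find structure so that overlaps chain together (A overlaps B, B overlaps C, so all three join one cluster). This is an illustrative assumption, not UniGene's actual procedure. The function names, the k-mer length of 8, and the sample sequences are all hypothetical, and reverse-complement matches and sequencing errors are ignored.

```python
# Toy EST clustering by shared k-mers (illustration only; not UniGene's method).
# Names, parameters, and sample data are hypothetical.
from itertools import combinations
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: Dict[str, str], k: int = 8, min_shared: int = 1) -> List[List[str]]:
    """Group EST names whose sequences share at least `min_shared` k-mers."""
    parent = {name: name for name in ests}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    profiles = {name: kmers(seq.upper(), k) for name, seq in ests.items()}
    for a, b in combinations(ests, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for name in ests:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    demo = {
        "est_a": "ACGTACGGTCAAGT",
        "est_b": "GGTCAAGTTTGCA",   # overlaps the end of est_a
        "est_c": "TTTTTTTTCCCCCCCC",  # unrelated
    }
    print(cluster_ests(demo))
```

Comparing every pair of sequences is quadratic in the number of ESTs, so a collection of realistic size would need an index of k-mers rather than this pairwise loop.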
Incyte has created the LifeSeq databases.
SANBI in South Africa produces the STACK collection of human EST contigs.
MIPS in Munich and the SIB produce BLAST-searchable contigs from UniGene.
TIGEM in Italy has a nice collection of EST search and assembly tools, both local and remote.
The CBIL at the University of Pennsylvania has assembled the DOTS database.