EST database


Published in: Education, Technology

  2. 2. <ul><li>Expressed Sequence Tags (ESTs) are short, single-pass sequence reads from mRNA (cDNA).</li></ul><ul><li>ESTs are small pieces of DNA sequence (usually 100 to 800 nucleotides long) generated by sequencing randomly selected cDNA clones from a library.</li></ul><ul><li>ESTs are bits of DNA sequence that represent genes expressed in certain cells, tissues, or organs from different organisms.</li></ul><ul><li>Researchers use these &quot;tags&quot; to fish a gene out of a portion of chromosomal DNA by matching base pairs.</li></ul>
  3. 3. <ul><li>ESTs are used to identify unknown genes and to map their positions within a genome.</li></ul><ul><li>ESTs provide researchers with a quick and inexpensive route for discovering new genes, for obtaining data on gene expression and regulation, and for constructing genome maps.</li></ul><ul><li>They are an economical approach to identifying and characterizing expressed genes.</li></ul><ul><li>ESTs represent a snapshot of the genes expressed in a given tissue and/or at a given developmental stage.</li></ul>
  4. 4. <ul><li>mRNA is very unstable outside of a cell; therefore it is converted to complementary DNA (cDNA).</li></ul><ul><li>cDNA is a much more stable compound and, importantly, because it is generated from an mRNA in which the introns have been removed, cDNA represents only expressed DNA sequence.</li></ul>
  5. 5. <ul><li>Sequencing only the beginning portion of the cDNA produces a 5' EST.</li></ul><ul><li>A 5' EST is obtained from the portion of a transcript that usually codes for a protein. These regions tend to be conserved across species and do not change much within a gene family.</li></ul><ul><li>Sequencing the ending portion of the cDNA molecule produces a 3' EST.</li></ul><ul><li>3' ESTs are likely to fall within non-coding, or untranslated, regions (UTRs), and therefore tend to exhibit less cross-species conservation than coding sequences do.</li></ul>
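The difference between the two read types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cDNA is a plain string written 5' to 3' in the mRNA sense, the function names and the default read length of 300 are hypothetical, and the 3' read is assumed to be sequenced from the opposite strand, so it is reported as a reverse complement.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def est_reads(cdna: str, read_len: int = 300) -> tuple[str, str]:
    """Simulate single-pass reads from both ends of a cDNA clone.

    read_len is a hypothetical value chosen for illustration; real reads
    vary in length and contain sequencing errors, which are ignored here.
    """
    # 5' EST: the first read_len bases, usually within the coding region.
    five_prime = cdna.upper()[:read_len]
    # 3' EST: read from the other end on the opposite strand (assumption),
    # so it often covers the untranslated region.
    three_prime = reverse_complement(cdna)[:read_len]
    return five_prime, three_prime


if __name__ == "__main__":
    five, three = est_reads("ATGGCCAAATTTGGGCCC", read_len=6)
    print("5' EST:", five)
    print("3' EST:", three)
```

With a real clone the two reads usually do not overlap, which is one reason a 5' EST and a 3' EST from the same gene can end up looking unrelated.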
  6. 6. <ul><li>ESTs as Genome Landmarks </li></ul><ul><ul><li>The 3' ESTs serve as a common source of sequence-tagged sites (STSs) because of their likelihood of being unique to a particular </li></ul></ul><ul><ul><li>An STS is a short DNA sequence that is easily recognizable and occurs only once in a genome (or chromosome).</li></ul></ul><ul><li>ESTs as Gene Discovery Resources </li></ul><ul><ul><li>ESTs are tools in the hunt for genes involved in hereditary diseases.</li></ul></ul><ul><ul><li>ESTs are generated rapidly and inexpensively: only one sequencing experiment is needed per cDNA, and they do not have to be checked for sequencing errors.</li></ul></ul><ul><ul><li>Using ESTs, scientists have rapidly isolated some of the genes involved in Alzheimer's disease and colon cancer.</li></ul></ul>
  7. 7. <ul><li>In 1992, scientists at NCBI developed a new database designed to serve as a collection point for ESTs.</li></ul><ul><li>Once an EST that was submitted to GenBank had been screened and annotated, it was then deposited in this new database, called dbEST.</li></ul>
  8. 9. <ul><li>Scientists at NCBI created dbEST to organize, store, and provide access to the great mass of public EST data that has already accumulated and that continues to grow daily. </li></ul><ul><li>Using dbEST, a scientist can access not only data on human ESTs but information on ESTs from over 300 other organisms as well. </li></ul>
  9. 10. <ul><li>EST sequences are included in the EST division of GenBank, available from NCBI by anonymous ftp and through Entrez.</li></ul><ul><li>The nucleotide sequences may be searched using the BLAST electronic mail server.</li></ul><ul><li>The TBLASTN program, which takes an amino acid query sequence and compares it with six-frame translations of dbEST DNA sequences, is particularly useful.</li></ul><ul><li>EST sequences are also available as a flat file in the FASTA format by anonymous ftp in the /repository/dbEST directory at </li></ul>
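The six-frame translation mentioned above can be illustrated with a short Python sketch. It uses the standard genetic code and, as a simplifying assumption, looks for an exact occurrence of the protein query in each frame; the real TBLASTN program performs scored local alignment, so this is not how it is implemented. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict[str, str]:
    """Translate a DNA sequence in three forward and three reverse frames."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
    for offset in range(3):
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def matching_frames(protein_query: str, dna: str) -> list[str]:
    """List frames whose translation contains the query exactly (a
    simplification; TBLASTN scores alignments rather than exact matches)."""
    return [
        name
        for name, peptide in six_frame_translations(dna).items()
        if protein_query.upper() in peptide
    ]


if __name__ == "__main__":
    est = "ATGGCCAAATTTGGGCCC"
    for name, peptide in six_frame_translations(est).items():
        print(name, peptide)
    print(matching_frames("MAK", est))
```

Translating in all six frames matters for ESTs because the strand and reading frame of a single-pass read are generally not known in advance.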
  10. 11. <ul><li>&quot;Trace data&quot; for sequences derived from I.M.A.G.E. clones is available from the Washington University Genome Sequencing Center.</li></ul><ul><li>Traces are retrieved using the identifier labelled &quot;EST name&quot; in the dbEST record (e.g. yb01a01.s1, corresponding to GenBank accession number T48601). Note that this identifier may also be found in the DEFINITION line of the GenBank record.</li></ul><ul><li>Clones may be ordered using the identifier labelled &quot;Clone Id&quot; in the dbEST record (e.g. 69864, corresponding to GenBank accession number T48601).</li></ul><ul><li>This identifier is also available in the DEFINITION line of the corresponding GenBank record.</li></ul>
  11. 14. <ul><li>It is very difficult to isolate mRNA from some tissues and cell types.</li></ul><ul><li>Important gene regulatory sequences may be found within an intron, and ESTs, which derive from spliced mRNA, do not capture them.</li></ul><ul><li>For most ESTs, there is no indication as to the gene from which they are derived.</li></ul><ul><li>Because a gene can be expressed as mRNA many, many times, ESTs ultimately derived from this mRNA may be redundant.</li></ul><ul><li>There may be many identical, or similar, copies of the same EST.</li></ul>
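The redundancy problem can be made concrete with a small Python sketch that counts identical copies of an EST. It assumes, for illustration only, that two reads are "the same" when they are identical or exact reverse complements of each other; near-identical copies with sequencing errors would need alignment-based comparison, which is not shown. The function names are hypothetical.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def canonical(seq: str) -> str:
    """Pick one representative for a read and its reverse complement."""
    forward = seq.upper()
    return min(forward, reverse_complement(forward))


def count_redundant_ests(ests: list[str]) -> Counter:
    """Count how many reads collapse onto each canonical sequence."""
    return Counter(canonical(seq) for seq in ests)


if __name__ == "__main__":
    reads = ["ATGGCC", "GGCCAT", "atggcc", "TTTAAACC"]
    for sequence, copies in count_redundant_ests(reads).items():
        print(sequence, copies)
```

A highly expressed gene would show up here as one sequence with a large copy count, which is the behaviour the last two points describe.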
  12. 15. <ul><li>To resolve the redundancy and overlap problem, NCBI investigators developed the UniGene database.</li></ul><ul><li>UniGene clusters EST sequences with traditional gene sequences.</li></ul><ul><li>It automatically partitions GenBank sequences into a non-redundant set of gene-oriented clusters.</li></ul><ul><li>For each cluster, additional information is included.</li></ul>
  13. 17. <ul><li>Clustering: associate individual EST sequences with unique transcripts or genes.</li></ul><ul><li>Assembling: derive consensus sequences from overlapping ESTs belonging to the same cluster.</li></ul><ul><li>Mapping: associate ESTs (or EST contigs) with exons in genomic sequences.</li></ul><ul><li>Interpreting: find and correct coding regions.</li></ul>
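The clustering step can be sketched as follows. This is a toy single-linkage scheme, not the method of any particular database: two ESTs are placed in the same cluster if they share at least one exact k-mer, and clusters are merged with a union-find structure. The k-mer size of 8, the function names, and the decision to ignore strand and sequencing errors are all assumptions made for illustration; production pipelines rely on alignment-based overlap detection.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: list[str], k: int = 8) -> list[list[int]]:
    """Group EST indices so that reads sharing any k-mer end up together."""
    parent = list(range(len(ests)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[max(root_i, root_j)] = min(root_i, root_j)

    first_seen: dict[str, int] = {}  # k-mer -> index of first EST containing it
    for index, seq in enumerate(ests):
        for kmer in kmers(seq.upper(), k):
            if kmer in first_seen:
                union(index, first_seen[kmer])
            else:
                first_seen[kmer] = index

    clusters: dict[int, list[int]] = {}
    for index in range(len(ests)):
        clusters.setdefault(find(index), []).append(index)
    return sorted(clusters.values())


if __name__ == "__main__":
    reads = ["ATGGCCAAATTTGGG", "AAATTTGGGCCCTTT", "CGCGCGTATATACGC"]
    print(cluster_ests(reads))
```

Each resulting cluster would then be passed to the assembling step to derive a consensus sequence from the overlapping reads.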
  14. 18. <ul><li>Incyte has created the LifeSeq databases.</li></ul><ul><li>SANBI in South Africa produces the STACK collection of human EST contigs.</li></ul><ul><li>MIPS in Munich and the SIB produce BLAST-searchable contigs from UniGene.</li></ul><ul><li>TIGEM in Italy has a collection of local and remote EST search and assembly tools.</li></ul><ul><li>The CBIL at the University of Pennsylvania has assembled the DOTS database.</li></ul>