2. The genome is the entirety of an organism's
hereditary information.
A gene is a protein-coding region of the genome.
Question??
3. The human genome contains 3164.7 million chemical
nucleotide bases (A, C, T, and G).
The average gene consists of 3000 bases, but sizes vary
greatly, with the largest known human gene containing
2.4 million bases.
The total number of genes is estimated at 30,000 to
35,000.
Less than 2% of the genome is used in protein coding.
At least 50% of the genome consists of repetitive
sequences that do not code for proteins.
8. Gene to protein
DNA: CCTGAGCCAACTATTGATGAA
transcription
mRNA: CCUGAGCCAACUAUUGAUGAA
translation
Protein: PEPTIDE
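The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a full bioinformatics tool: it assumes the DNA string is the coding strand (so transcription is just T to U), it translates in reading frame 0 only, and it uses the standard genetic code. The names `transcribe`, `translate`, and `CODON_TABLE` are hypothetical helpers introduced here.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA (assumption: input is the coding strand)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA in frame 0, one letter per codon; '*' marks a stop."""
    dna = mrna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```

The example sequence on the slide is chosen so that its one-letter amino acid codes (Pro, Glu, Pro, Thr, Ile, Asp, Glu) spell the word PEPTIDE.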
9. Protein-coding regions of DNA have been found to have a
peak at frequency 2π/3 in their Fourier spectra. This is
called the period-3 property.
The period-3 property is related to the different statistical
distributions of codons between protein-coding and
noncoding DNA sections.
The period-3 property can be used as a basis for
identifying the coding and non-coding regions in a DNA
sequence.
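To make the period-3 property concrete, the sketch below computes a power spectrum for a DNA string. It assumes one common choice of numeric representation, four 0/1 indicator sequences (one per base), and sums the squared DFT magnitudes over the four bases. For a sequence of length N, frequency 2π/3 corresponds to DFT index k = N/3. The function name `period3_spectrum` is a hypothetical helper, and the direct O(N²) DFT is used only for clarity.

```python
import cmath


def period3_spectrum(seq):
    """Return S[k] = sum over bases b of |U_b[k]|^2 for k = 0..N-1,
    where U_b is the DFT of the 0/1 indicator sequence of base b."""
    seq = seq.upper()
    n = len(seq)
    spectrum = []
    for k in range(n):
        total = 0.0
        for base in "ACGT":
            # Only positions where the indicator is 1 contribute to the DFT sum.
            u = sum(cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, s in enumerate(seq) if s == base)
            total += abs(u) ** 2
        spectrum.append(total)
    return spectrum


# Toy input with perfect codon-like periodicity (not a real gene).
seq = "ATG" * 30
spec = period3_spectrum(seq)
k3 = len(seq) // 3          # index corresponding to frequency 2*pi/3
print(k3, spec[k3], spec[7])
```

For this artificial, perfectly periodic input the power sits entirely at k = 0, N/3, and 2N/3. Real coding regions show a much weaker but still visible peak at N/3 above a noisy background.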
10. Identification of protein coding regions
Prediction of the proper reading frame
Compared with traditional methods, signal
processing methods are much quicker and
can be even more accurate in some cases.
11. By mapping the chemical bases of DNA to a set of
numbers, we obtain an effective “DNA signal”.
A properly defined Fourier transform is a powerful
predictor of both the existence and the reading frame of
protein coding regions in DNA sequences.
The associated color mapping schemes can help in
visually identifying protein coding regions.
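One way to turn the "DNA signal" idea into a coding-region detector is to slide a window along the sequence and measure the power at frequency 2π/3 inside each window. The sketch below does this under stated assumptions: the number mapping is the 0/1 indicator representation (the slides do not specify which mapping is used), the window width (a multiple of 3) and the step are illustrative parameters, and `window_power` and `period3_profile` are hypothetical names. It does not attempt reading-frame prediction.

```python
import cmath


def window_power(seq, start, width):
    """Power at frequency 2*pi/3 of the four indicator sequences
    restricted to seq[start:start + width]."""
    total = 0.0
    for base in "ACGT":
        u = sum(cmath.exp(-2j * cmath.pi * i / 3)
                for i in range(width) if seq[start + i] == base)
        total += abs(u) ** 2
    return total


def period3_profile(seq, width=351, step=3):
    """Return (window start, power) pairs along the sequence.
    width and step are illustrative defaults, not values from the slides."""
    seq = seq.upper()
    return [(s, window_power(seq, s, width))
            for s in range(0, len(seq) - width + 1, step)]


# Synthetic test input: a period-4 stretch followed by a period-3 stretch.
seq = "ACGT" * 30 + "ATG" * 40
for start, power in period3_profile(seq, width=36, step=12):
    print(start, round(power, 3))
```

On this synthetic input the profile is near zero over the period-4 stretch and rises once the window enters the period-3 stretch. On real data, windows whose power exceeds some threshold would be candidate coding regions, and that threshold has to be chosen empirically.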
20. Challenges and Future Work
• Genomic signal processing opens a new signal
processing frontier
• Sequence analysis: DNA is a symbolic (categorical) signal,
so classical signal processing methods are not directly
applicable
• Increasingly high dimensionality of genetic data sets
and the complexity involved call for fast,
high-throughput implementations of genomic signal
processing algorithms
• Future work: spectral analysis of DNA sequences and
clustering of microarray data; modify classical
signal processing methods and develop new ones.