A New Multiple Classifiers Soft Decisions Fusion Approach for Exons Prediction in DNA Sequences


  1. 2013 IEEE International Conference on Signal and Image Processing Applications (ICSIPA 2013). A New Multiple Classifiers Soft Decisions Fusion Approach for Exons Prediction in DNA Sequences. Ismail M. El-Badawy, Ashraf M. Aziz, Senior Member, IEEE, Safa Gasser and Mohamed E. Khedr. Department of Electronics & Communications Engineering, Arab Academy for Science, Technology and Maritime Transport, Egypt. Presented by Ismail M. El-Badawy.
  2. Outline
     • Introduction
     • DNA Structure
     • Predicting Exons Locations
     • Exons Prediction using DFT
     • Proposed Soft Decisions Fusion Approach
     • Performance Evaluation
     • Conclusion
  3. Introduction
     • Digital signal processing (DSP) has proved successful in many fields, and bioinformatics is one of them.
     • Identifying protein-coding regions in DNA sequences is an important topic in biosignal processing and bioinformatics.
     • With the significant growth of sequenced genomic data, computerized methods for predicting these protein-coding regions (exons) in DNA sequences have become important.
  4. DNA Structure
     • DNA, or deoxyribonucleic acid, is the hereditary material in humans and almost all other organisms.
  5. DNA Structure
     • Organisms can be categorized into prokaryotes (e.g., bacteria) and eukaryotes (e.g., humans).
     • In both categories, DNA consists of genes separated by intergenic regions.
     • In eukaryotes, genes are further divided into protein-coding regions (exons) and noncoding regions (introns).
  6. DNA Structure
     • DNA is made up of nucleotides.
     • Nucleotides are identified by the four nitrogenous bases: adenine (A), thymine (T), cytosine (C) and guanine (G).
     • The bases pair up with each other (A with T, C with G), forming a double helix.
     • The two DNA strands are complementary to each other.
  7. DNA Structure
     • DNA = chain of nucleotides {A, C, G and T}.
     • This DNA chain (exons and introns) can be represented symbolically by a character string over a four-letter alphabet:
       …TCCGATCGATCGATCTCTCTAGCGTCTACGCTAT CATCGCTCTCTATTATCGCGCGATCGTCGATCGCGCG AGAGTATGCTACGTCGATCGAATTG…
  8. DNA Structure
     • Protein-coding regions (exons) are the portions of DNA that contain the information for producing proteins.
  9. Predicting Exons Locations
     • Accurate prediction of exon locations in DNA sequences is important to biologists, since exons are the information-bearing parts of the sequence.
     [Figure: DNA sequence (TATTCCGATCGATCGATCT…) → exons finder → exons locations]
  10. Predicting Exons Locations
     • The order of the nucleotides in the exons spells out a code for protein synthesis.
     • Triplets of nucleotides (codons) in the exonic segments of DNA specify each type of amino acid according to the genetic code.
     • Each amino acid is encoded by one or more codons (a many-to-one mapping from codons to amino acids).
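The codon structure described on this slide can be illustrated with a short sketch. The function names and the tiny codon table below are hypothetical and cover only six of the 64 codons; they are meant to show the triplet reading and the many-to-one mapping, not to serve as a translation tool.

```python
# Partial codon table for illustration only; the full genetic code has 64 codons.
# Four different codons map to alanine, showing the many-to-one mapping.
CODON_TO_AMINO_ACID = {
    "ATG": "Met",
    "TGG": "Trp",
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
}


def split_codons(exon, frame=0):
    """Split a coding sequence into non-overlapping triplets, starting at `frame`."""
    usable = len(exon) - (len(exon) - frame) % 3  # drop an incomplete trailing codon
    return [exon[i:i + 3] for i in range(frame, usable, 3)]


def translate(exon):
    """Map each codon to an amino acid; '?' marks codons outside the partial table."""
    return [CODON_TO_AMINO_ACID.get(codon, "?") for codon in split_codons(exon)]
```

Because every amino acid occupies exactly three positions, any bias in which nucleotides appear at each codon position repeats with period 3, which is the property the following slides exploit.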
  11. Predicting Exons Locations
     • Previous publications have shown that exonic parts exhibit a period-3 property, due to the codon structure and the nonuniform usage of codons in exonic regions.
     • This periodicity is absent outside the exonic segments.
     [Figure: example sequence fragments … ACGTATTCCGATCGA … GACTCTAGCGTCTAC …]
  12. Predicting Exons Locations
     • Exon locations are predicted with a digital signal processing (DSP) tool in three main steps:
       1) symbolic-to-numeric mapping of the DNA sequence;
       2) tracking the strength of the period-3 component using the DSP tool;
       3) decision making, which yields the exon locations.
  13. Exons Prediction using DFT
     • The sliding window DFT is one of several DSP methods previously proposed in the field of exons prediction based on DNA spectral analysis.
     [Figure: DNA sequence → numerical mapping → X[n] → sliding window DFT → S[L/3] → exons locations]
  14. Exons Prediction using DFT
     • Calculating the power spectrum of a windowed DNA numerical sequence at k = L/3 only (L being the window length) is sufficient, as its value is expected to be large in exonic regions and small elsewhere.
     [Figure: DNA sequence → numerical mapping → X[n] → sliding window DFT → S[L/3]]
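A minimal sketch of the single-bin computation described on this slide, using only the Python standard library. The function name `period3_power` and its signature are hypothetical, and the direct per-window summation is an assumption chosen for clarity; the slides do not say how the authors implemented the sliding window DFT.

```python
import cmath
import math


def period3_power(x, L, window=None):
    """Power spectrum at k = L/3 for every length-L window of the numeric sequence x.

    Returns one value per window position (len(x) - L + 1 values in total).
    """
    if L % 3 != 0:
        raise ValueError("L must be a multiple of 3 so that k = L/3 is an integer")
    if window is None:
        window = [1.0] * L  # rectangular window
    # With k = L/3 the DFT kernel exp(-j*2*pi*k*n/L) reduces to exp(-j*2*pi*n/3).
    kernel = [cmath.exp(-2j * math.pi * n / 3) for n in range(L)]
    powers = []
    for start in range(len(x) - L + 1):
        acc = sum(window[n] * x[start + n] * kernel[n] for n in range(L))
        powers.append(abs(acc) ** 2)
    return powers
```

A sequence that repeats every three samples concentrates its energy in this bin, while a constant sequence contributes nothing to it. Assigning each output value to the nucleotide at the centre of its window is a common convention and is assumed here rather than taken from the slides.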
  15. Exons Prediction using DFT
     • A hard decision is made for each nucleotide (exonic or intronic) according to whether the corresponding S[L/3] value is above or below a decision threshold.
     [Figure: S[L/3] → exons locations]
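The thresholding step can be sketched as follows. Both function names are hypothetical, and treating a score exactly equal to the threshold as intronic is an assumption, since the slide only distinguishes "above" from "below".

```python
def hard_decisions(scores, threshold):
    """Return 1 (exonic) where the score exceeds the threshold, else 0 (intronic)."""
    return [1 if s > threshold else 0 for s in scores]


def exon_segments(decisions):
    """Collapse runs of 1s into (start, end) index pairs, with end inclusive."""
    segments, start = [], None
    for i, d in enumerate(decisions):
        if d == 1 and start is None:
            start = i  # a new predicted exon begins here
        elif d == 0 and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:  # sequence ended inside a predicted exon
        segments.append((start, len(decisions) - 1))
    return segments
```

The second helper turns the per-nucleotide labels into predicted exon locations, which is the output the block diagram refers to.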
  16. Exons Prediction using DFT
     • In our work, we selected two symbolic-to-numeric mapping schemes, EIIP and CIS, from among the schemes that previously showed reasonable performance.

     Nucleotide      EIIP      CIS
     Adenine (A)     0.1260     1
     Cytosine (C)    0.1340    -j
     Guanine (G)     0.0806    -1
     Thymine (T)     0.1335     j
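The two mapping schemes on this slide translate directly into lookup tables. The values are the ones listed on the slide; the dictionary and function names are hypothetical.

```python
# Numeric values as listed on the slide; CIS uses the imaginary unit j.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}
CIS = {"A": 1 + 0j, "C": -1j, "G": -1 + 0j, "T": 1j}


def to_numeric(sequence, mapping):
    """Map a DNA string to the numeric sequence X[n] under the given scheme."""
    return [mapping[base] for base in sequence.upper()]
```

EIIP yields a real-valued sequence and CIS a complex-valued one, so any downstream DFT code has to accept complex input.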
  17. Exons Prediction using DFT
     [Figure: gene F56F11.4, which contains five exons. Two panels, one for EIIP mapping and one for CIS mapping, each obtained as X[n] → sliding window DFT → S[L/3], plot S[L/3] (vertical axis, 0 to 1) against nucleotide position (horizontal axis, 0 to 8000).]
  18. Exons Prediction using DFT
     • Gene F56F11.4 contains five exons.
     • Each mapping scheme pronounces the peaks in some exonic segments better than the other scheme does.
     • The peaks in the exonic segments are not consistently large, and the values in the intronic segments are not consistently low.
     [Figure: the two S[L/3] plots (vertical axis 0 to 1) against nucleotide position (0 to 8000)]
  19. Proposed Soft Decisions Fusion Approach
     [Figure: the DNA sequence feeds two parallel branches, EIIP mapping → X[n] → sliding window DFT → S[L/3] and CIS mapping → X[n] → sliding window DFT → S[L/3]; each branch outputs soft decisions.]
  20. Proposed Soft Decisions Fusion Approach
     • Hard decision: 0 or 1.
     • Soft decision: any value from 0 to 1.
  21. Proposed Soft Decisions Fusion Approach
     • Each nucleotide belongs to the exonic regions with a partial membership value given by S[L/3] (i.e., the possibility of being an exonic nucleotide).
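For S[L/3] to act as a membership value it has to lie between 0 and 1. The slides do not state how the values are scaled, so the sketch below assumes division by the maximum over the sequence; both the function name and this normalization are assumptions.

```python
def soft_decisions(powers):
    """Scale non-negative S[L/3] values into [0, 1] by dividing by their maximum.

    Assumption: max-normalization; the slides do not specify the scaling used.
    """
    peak = max(powers)
    if peak == 0:
        return [0.0] * len(powers)  # no period-3 energy anywhere in the sequence
    return [p / peak for p in powers]
```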
  22. Proposed Soft Decisions Fusion Approach
     [Figure: the DNA sequence feeds two parallel branches, EIIP mapping → X[n] → sliding window DFT → S[L/3] and CIS mapping → X[n] → sliding window DFT → S[L/3]; both outputs enter the DFC, which outputs the exons locations.]
  23. Proposed Soft Decisions Fusion Approach
     • The DFC averages the two local soft decisions.
     • If the average exceeds 0.5 (i.e., the average possibility of being an exonic nucleotide exceeds 50%), the final decision is '1'; otherwise it is '0'.
     • The combined decision is more reliable than a hard decision that depends on only one classifier.
     [Figure: soft decisions → DFC → exons locations]
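The fusion rule on this slide, averaging the soft decisions and comparing the average with 0.5, can be sketched as below. The function name is hypothetical, and accepting a list of score lists (rather than exactly two) is a convenience of this sketch, not something shown on the slide.

```python
def fuse(soft_lists, threshold=0.5):
    """Average per-nucleotide soft decisions across classifiers, then threshold.

    soft_lists: one list of soft decisions (values in [0, 1]) per classifier.
    """
    lengths = {len(s) for s in soft_lists}
    if len(lengths) != 1:
        raise ValueError("all classifiers must score the same positions")
    averaged = [sum(values) / len(values) for values in zip(*soft_lists)]
    return [1 if v > threshold else 0 for v in averaged]
```

With two classifiers, a nucleotide scored 0.9 by one and 0.3 by the other averages to 0.6 and is labelled exonic, even though the second classifier alone would have rejected it at a 0.5 threshold.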
  24. Performance Evaluation Metrics
     [Figure: the true class is compared with the prediction decision, giving true/false positive and true/false negative outcomes.]
  25. Performance Evaluation Metrics
     [Figure: true class versus prediction decision (continued)]
  26. Performance Evaluation Metrics
  27. Performance Evaluation Metrics
  28. Performance Evaluation Metrics
     • The area under the ROC curve (AUC) is a good indicator.
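As a sketch of how an AUC figure can be obtained from per-nucleotide scores and ground-truth labels, the code below sweeps the decision threshold over all observed scores and integrates the resulting ROC curve with the trapezoidal rule. The function names are hypothetical and this is the textbook construction, not necessarily the procedure used in the authors' simulation.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists from sweeping the threshold over all scores.

    labels: 1 for exonic, 0 for non-exonic; both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume every item tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of exonic from non-exonic nucleotides.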
  29. Performance Evaluation Metrics
     • The F_measure versus decision threshold curve is also a good indicator.
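The metric formulas did not survive in this transcript, so the sketch below assumes the standard definition of the F-measure as the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The function names and the threshold-sweep helper are hypothetical.

```python
def f_measure(predicted, actual):
    """F-measure = 2*TP / (2*TP + FP + FN), assuming the standard definition."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(scores, actual, thresholds):
    """Return (threshold, F-measure) for the candidate with the highest F-measure."""
    def score_at(t):
        return f_measure([1 if s > t else 0 for s in scores], actual)

    best = max(thresholds, key=score_at)  # first candidate wins on ties
    return best, score_at(best)
```

Plotting `score_at(t)` over a grid of thresholds gives the F_measure versus decision threshold curve that the slide mentions.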
  30. Performance Evaluation
     • A MATLAB simulation is conducted on real data (the HMR195 dataset), which is available online.
     • The dataset contains 195 mammalian sequences, consisting of 43 single-exon and 152 multi-exon genes.
     • The traditional and proposed approaches are simulated using different window shapes with a constant length (L = 351), as reported in previous publications.
  31. Performance Evaluation
     • AUC values for the HMR195 dataset; the ROC curves are plotted for the Bartlett window.

     Window shape    Single (EIIP)   Single (CIS)   Multiple
     Rectangular     0.7280          0.7398         0.7862
     Nuttall         0.7264          0.7439         0.7972
     Parzen          0.7281          0.7457         0.7989
     Bohman          0.7314          0.7490         0.8021
     Blackman        0.7331          0.7504         0.8035
     Hanning         0.7387          0.7553         0.8079
     Hamming         0.7425          0.7580         0.8106
     Bartlett        0.7438          0.7589         0.8115
  32. Performance Evaluation
     • Percentage of exonic nucleotides detected as true positives at fixed false positive rates (FPR):

     Numerical scheme   Number of classifiers   10% FPR   20% FPR   30% FPR
     EIIP               1                       43.5      56.9      66.4
     CIS                1                       46.8      59.9      68.7
     Both               2                       54.1      67.3      76.0

     • At 10% FPR, the proposed approach improves detection by 24.4% over the single classifier using EIIP and by 15.6% over the single classifier using CIS.
  33. Performance Evaluation
     • Maximum F_measures achieved and the corresponding decision thresholds for the HMR195 dataset:

                          Single (EIIP)   Single (CIS)   Multiple
     Maximum F_measure    0.4287          0.4562         0.5086
     Decision threshold   0.029           0.048          0.037

     • The maximum F_measure improves by 18.6% over the single classifier using EIIP and by 11.5% over the single classifier using CIS.
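The percentages quoted on the last two results slides are consistent with relative gains over each single classifier, computed as 100 × (fused − single) / single. The short check below reproduces them from the tabulated values; the function name is hypothetical.

```python
def relative_gain_percent(fused, single):
    """Relative improvement of the fused result over a single classifier, in percent."""
    return 100.0 * (fused - single) / single


# True-positive percentages at 10% FPR (fused 54.1, EIIP 43.5, CIS 46.8).
tpr_gain_eiip = round(relative_gain_percent(54.1, 43.5), 1)
tpr_gain_cis = round(relative_gain_percent(54.1, 46.8), 1)

# Maximum F_measures (fused 0.5086, EIIP 0.4287, CIS 0.4562).
f_gain_eiip = round(relative_gain_percent(0.5086, 0.4287), 1)
f_gain_cis = round(relative_gain_percent(0.5086, 0.4562), 1)
```

These are relative gains, not differences in percentage points: at 10% FPR the absolute gain over EIIP is 54.1 − 43.5 = 10.6 points.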
  34. Conclusion
     • In our work, a new approach to exons prediction based on multiple DFT-based classifiers has been proposed.
     • Making soft decisions instead of hard decisions, and depending on two classifiers instead of one, helps in making more reliable decisions.
     • The prediction accuracy is enhanced at the expense of increased computational time and complexity.
     • Although the proposed approach has been analyzed for only two classifiers for simplicity, it can easily be extended to more than two classifiers.
  35. Thank You
