Transcription Factor DNA Binding Prediction

  1. Transcription Factor-DNA Binding Prediction
     Tahmina Ahmed, Prosunjit Biswas, Iffat Sharmin Chowdhury, Badri Sampath
  2. Motivation
     • Build a model by examining labeled DNA sequences, use it to label unlabeled DNA sequences, and in doing so gain experience with a real-world machine learning problem.
  3. Approaches
     • K-mer based: fixed-length k-mers, k-mers with mismatches, and regular expressions
     • PWM based: MEME and MAST
     • Combined model: unites both models
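The fixed-length and mismatch k-mer attributes can be illustrated with a minimal sketch. It assumes each attribute is the count of one k-mer in a sequence, and that in the mismatch variant every window also counts toward each k-mer within a Hamming distance of `max_mismatches`. The function names and the use of raw counts (rather than presence/absence) are assumptions for illustration, not the team's actual implementation.

```python
from itertools import product

ALPHABET = "ACGT"


def all_kmers(k):
    """Enumerate all 4**k DNA k-mers."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmer_counts(seq, k, max_mismatches=0):
    """Count every k-mer in seq, allowing up to max_mismatches per window.

    With max_mismatches=0 this is the plain fixed-length k-mer count.
    Brute force over all 4**k k-mers, so only practical for small k.
    """
    seq = seq.upper()
    counts = {kmer: 0 for kmer in all_kmers(k)}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for kmer in counts:
            if hamming(window, kmer) <= max_mismatches:
                counts[kmer] += 1
    return counts
```

For example, `kmer_counts("ACGTAC", 2)` counts `AC` twice, while `kmer_counts("ACGT", 2, max_mismatches=1)` lets each window contribute to seven 2-mers (itself plus six one-mismatch neighbours).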
  4. K-mer Approach Based on Regular Expressions
     Motivation: 2-mers appear most often in the sequences, so we focus mainly on 2-mers.
     Strategy:
     - For any two 2-mers X and Y, generate the regular expressions X(.*)Y and Y(.*)X.
     - Use these regular expressions as candidate attributes.
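The regular-expression strategy can be sketched with Python's `re` module. This assumes X and Y are distinct 2-mers (so iterating over ordered pairs yields both X(.*)Y and Y(.*)X) and that each attribute is binary: 1 if the pattern occurs anywhere in the sequence. Both choices, and the helper names, are assumptions; the slide does not say whether X = Y is included or how matches were encoded.

```python
import re
from itertools import permutations, product


def two_mer_gap_patterns():
    """Build an X(.*)Y pattern for every ordered pair of distinct 2-mers.

    Ordered pairs cover both X(.*)Y and Y(.*)X: 16 * 15 = 240 patterns.
    """
    two_mers = ["".join(p) for p in product("ACGT", repeat=2)]
    return [f"{x}(.*){y}" for x, y in permutations(two_mers, 2)]


def regex_features(seq, patterns):
    """Binary attribute vector: 1 if the pattern matches anywhere in seq."""
    return [1 if re.search(p, seq) else 0 for p in patterns]
```

Because `re.search` consumes X before looking for Y, overlapping occurrences do not match: in `ACGT`, the pattern `AC(.*)GT` matches but `AC(.*)CG` does not.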
  5. Classifier Selection
     Fig: Nine classifiers applied to the TF data sets
     Algorithms are numbered as follows: (1) Logistic (2) SMO (3) NaiveBayes (4) BayesianLogisticRegression (5) KStar (6) Bagging (7) LogitBoost (8) RandomForest (9) J48
     Summary:
     * 9 classifiers were applied to 10 data sets; 3 of them are shown
     * choosing a single best classifier is not a trivial task
     * the same classifier behaves differently on different data sets
  6. Change in Accuracy due to Different Classifiers
     Fig: Performance of different classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_3 data set
     Fig: Performance of the same classifiers on the TF_5 data set
     Summary:
     * the choice of classifier has a large effect on accuracy
     * one has to be careful when choosing a classifier
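Comparing classifiers as in the two slides above amounts to running each one through the same cross-validation on each data set. The sketch below is a stdlib-only k-fold loop; the two toy classifiers (majority class and 1-nearest-neighbour by Hamming distance) are hypothetical stand-ins for the Weka classifiers, which the team ran in Weka rather than in Python. The `train_fn(X, y) -> predict` interface is also an assumption.

```python
from collections import Counter


def k_fold_accuracy(X, y, train_fn, k=5):
    """Mean accuracy of train_fn over k contiguous folds.

    train_fn(X_train, y_train) must return a predict(x) function.
    Folds are contiguous, so shuffle the data beforehand; requires k <= len(X).
    """
    n = len(X)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    accuracies, start = [], 0
    for size in fold_sizes:
        test_idx = set(range(start, start + size))
        start += size
        X_tr = [x for i, x in enumerate(X) if i not in test_idx]
        y_tr = [t for i, t in enumerate(y) if i not in test_idx]
        predict = train_fn(X_tr, y_tr)
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / size)
    return sum(accuracies) / k


def majority_class(X_tr, y_tr):
    """Toy baseline: always predict the most frequent training label."""
    label = Counter(y_tr).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor(X_tr, y_tr):
    """Toy 1-NN on binary attribute vectors using Hamming distance."""
    def predict(x):
        dists = [sum(a != b for a, b in zip(x, xt)) for xt in X_tr]
        return y_tr[dists.index(min(dists))]
    return predict
```

On the same data the two toy classifiers can score anywhere from 0.0 to 1.0, which mirrors the slides' point that the choice of classifier matters and has to be checked per data set.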
  7. Change in Accuracy due to Different K-mer Length
     Fig: Performance of different k-mer lengths (4-mer, 5-mer, 6-mer) on the TF_3 data set
     Summary:
     * k-mer length also affects accuracy
     * finding the best length is not trivial
  8. Attribute Space Selection
     Fig: Performance with different numbers of selected k-mers on the TF_4 data set
     Summary:
     * the number of attributes considered also affects accuracy
     * accuracy increases as more attributes are considered, but beyond a saturation point it decreases
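The slide does not say how the k-mers were ranked before keeping the top N. One common choice is information gain; the sketch below assumes that criterion and discrete (for example binary presence) attributes. `top_n_attributes` and its parameters are hypothetical names. Sweeping `n_keep` and cross-validating each subset would produce an "accuracy vs. number of attributes" curve like the one described.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on a discrete attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def top_n_attributes(X, labels, n_keep):
    """Indices of the n_keep attributes with the highest information gain."""
    n_attrs = len(X[0])
    gains = [information_gain([row[j] for row in X], labels)
             for j in range(n_attrs)]
    # sorted() is stable, so ties keep their original attribute order
    ranked = sorted(range(n_attrs), key=lambda j: gains[j], reverse=True)
    return ranked[:n_keep]
```

An attribute that perfectly separates the classes gets the maximum gain (1 bit for two balanced classes); one independent of the label gets 0.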
  9. PWM-based Analysis of Accuracy (TF_1 data set)
     Fig: J48, minW 6 to maxW 15, no. of sites 10
     Fig: J48, minW 6 to maxW 15, no. of motifs 5
     Summary:
     * accuracy increases with more motifs when the number of sites is fixed
     * accuracy increases with more sites when the number of motifs is fixed
     * what happens when we increase both?
  10. PWM-based Analysis
     Fig: Accuracy as a function of no. of motifs and no. of sites
     * 1st bar: no. of sites
     * 2nd bar: no. of motifs
     * 3rd bar: accuracy
     * accuracy decreases when we increase both the no. of motifs and the no. of sites
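To make "sites" and "motifs" concrete: each motif is summarized by a position weight matrix (PWM) built from its aligned sites, and a sequence can be scored by its best-matching window. The sketch below is a simplified stand-in, not MEME's EM-based motif discovery or MAST's p-value scoring. It assumes equal-length sites over A/C/G/T, a pseudocount of 1 and a uniform 0.25 background; using each motif's best score as one attribute is likewise an assumption.

```python
from math import log2

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites.

    Returns one dict per motif position: base -> log2(p(base) / background).
    """
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        pwm.append({b: log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_match_score(seq, pwm):
    """Highest log-odds score over all windows of seq (forward strand only).

    Requires len(seq) >= motif width and only A/C/G/T characters.
    """
    w = len(pwm)
    return max(sum(pwm[i][seq[j + i]] for i in range(w))
               for j in range(len(seq) - w + 1))
```

With four identical sites `ACG`, a matching base scores log2(2.5) and a mismatching one scores -1, so `TTACGTT` scores 3·log2(2.5) and `TTTTT` scores -3.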
  11. Extra Work for TF_20
     Fig: Flow diagram of building the new model for TF_20 (elements: K-mer model, PWM model, sequences identified by both models, sequences labeled differently, biased 2-mer model, newly identified sequences, new model for TF_20)
     Summary:
     * we did some extra work for TF_20
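The flow diagram is only partly recoverable, so the following is one possible reading of it, not the team's confirmed procedure: keep the label wherever the K-mer and PWM models agree, and let a third model (for example the biased 2-mer model) decide the sequences they label differently. `combine_predictions` and the `tiebreak` callable are hypothetical.

```python
def combine_predictions(kmer_labels, pwm_labels, tiebreak):
    """Merge two models' labels, deferring to tiebreak(i) on disagreement.

    Returns the combined labels and the indices the two models disputed.
    """
    combined, disputed = [], []
    for i, (k, p) in enumerate(zip(kmer_labels, pwm_labels)):
        if k == p:
            combined.append(k)          # identified the same way by both models
        else:
            disputed.append(i)          # labeled differently: ask the third model
            combined.append(tiebreak(i))
    return combined, disputed
```

Passing the tie-break as a callable means the third model only has to score the disputed sequences.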
  12. AUC Based on the Feedback (Bonus Model)
     Fig: AUC of the 10 data sets based on the last submission
     * accuracy improved over the first submission
     * PWM did not give good results
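The slides do not describe how the submissions were scored. As a reference point, AUC is commonly computed as the probability that a randomly chosen positive sequence gets a higher score than a randomly chosen negative one, with ties counted as one half. The sketch assumes labels 1 (bound) and 0 (unbound) and at least one of each.

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney statistic).

    Ties count as half a win. O(P * N), fine for small data sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For scores `[0.9, 0.4, 0.6, 0.2]` with labels `[1, 1, 0, 0]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75.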
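  13. Participation
     • Badri Sampath - Background study: DNA, RNA, protein, motif; Working with tools: AlignAce, MEME, MAST; Working with models: PWM; Parameter tuning: K-mer; Automation: ARFF writer, MAST output writer
     • Iffat Sharmin Chowdhury - Background study: protein, motif, transcription; Working with tools: Weka, AlignAce, ScanAce; Working with models: K-mer; Parameter tuning: PWM; Automation: script for FASTA, Weka
     • Prosunjit Biswas - Background study: DNA, transcription; Working with tools: MEME, MAST; Working with models: K-mer; Parameter tuning: PWM; Automation: script for RE, for new K-mer model
     • Tahmina Ahmed - Background study: MEME, MAST, PWM; Working with tools: MEME, MAST, Weka; Working with models: PWM; Parameter tuning: K-mer; Automation: script for MEME, MAST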
  14. Acknowledgment
  15. Questions?