Final Project – CS6243
Transcription Factor DNA Binding Prediction




                    Team Members:
                    Badri Sampath α

                    Iffat Sharmin Chowdhury α

                    Prosunjit Biswas α

                    Tahmina Ahmed α
            α
                Department of Computer Science

            University of Texas at San Antonio.
1. Defining the Scope of the Project:

In this project, we are given a set of labeled DNA sequences (labeled p or n) and a set of
unlabeled DNA sequences, which we must label using a model built from the labeled
sequences. The problem is therefore to build a binary classifier from the given training DNA
sequences and apply it to label the unlabeled DNA sequences.

        1.1 Challenges of the Project:

In a conventional classification problem, a number of attributes are readily available for
building the classifier. In this project, we are given only the sequences and their labels, so part of the
work is to find a way to generate meaningful attributes from the sequences.




                                 Fig. 1 : Overall scope of the project.

    2. K-mer Based Approach:

In the K-mer approach, we generated all possible combinations of DNA characters for a
specified length K. The K-mer approach is shown in detail in Figure 2. The important steps of the
K-mer approach are discussed in the following paragraphs.
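
As an illustration of the enumeration step, the following is a minimal Python sketch, assuming the DNA alphabet is the four characters A, C, G and T (so there are 4^K K-mers). The function name all_kmers is hypothetical and is not taken from the project code.

```python
from itertools import product

# Assumption: sequences use only the four standard nucleotide characters.
DNA_ALPHABET = "ACGT"


def all_kmers(k):
    """Return every length-k string over the DNA alphabet, in lexicographic order."""
    return ["".join(chars) for chars in product(DNA_ALPHABET, repeat=k)]


kmers = all_kmers(5)
print(len(kmers))  # 4**5 = 1024 candidate attributes for K = 5
```

Each K-mer then serves as one candidate attribute, so the attribute count grows by a factor of four with every increase in K.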




                                 Fig 2: Overall K-mer based process.

After generating the K-mers, we counted their frequencies using three different approaches:
i) strict matching, ii) matching with mismatches, and iii) matching based on regular
expressions.
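
The three counting schemes can be sketched as below. This is only an illustrative sketch: the report does not state the mismatch threshold or the regular expressions that were used, so the max_mismatches default and the example pattern are assumptions, as are the function names. Occurrences are counted with overlap.

```python
import re


def count_strict(sequence, kmer):
    """Count overlapping exact occurrences of kmer in sequence."""
    k = len(kmer)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == kmer)


def count_with_mismatch(sequence, kmer, max_mismatches=1):
    """Count windows that differ from kmer in at most max_mismatches positions.

    The threshold of 1 is an assumed default, not a value from the report.
    """
    k = len(kmer)
    count = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        mismatches = sum(1 for a, b in zip(window, kmer) if a != b)
        if mismatches <= max_mismatches:
            count += 1
    return count


def count_regex(sequence, pattern):
    """Count overlapping matches of a regular expression via a lookahead."""
    return len(re.findall(f"(?=(?:{pattern}))", sequence))


seq = "ACGTACTT"
print(count_strict(seq, "ACG"))         # 1
print(count_with_mismatch(seq, "ACG"))  # 2 (ACG exactly, ACT with one mismatch)
print(count_regex(seq, "AC[GT]"))       # 2
```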

To build an optimal model, we tuned different parameters of the model. Some of the
parameters and their impact on the classifier are shown in Table I.

    3. PWM Based Approach:

We used the motif-finding tool MEME [1] to generate a specified number of motifs of
specified minimum and maximum length, and the Motif Alignment and Search Tool MAST [2] to
obtain an E-value (bounded at 100) for each sequence. We derived a score from each E-value by
subtracting it from 100, so that sequences can be ordered by E-value, with a higher score
corresponding to a lower E-value. We used these scores, one per motif, as attributes of the
sequences and fed them to different classifiers. Table II summarizes the parameters and their
impact on the model.
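
The score derivation can be sketched in Python as follows. This is a hedged illustration, not the project's code: the E-values are passed in as a plain dictionary rather than parsed from MAST output, the function names are hypothetical, and treating a motif with no reported E-value as having the bounded value of 100 is an assumption.

```python
# The report bounds E-values at 100 and scores each sequence as 100 - E-value.
E_VALUE_CAP = 100.0


def evalue_to_score(e_value, cap=E_VALUE_CAP):
    """Map an E-value to a score where a larger score means a better match."""
    bounded = min(e_value, cap)
    return cap - bounded


def motif_features(e_values_by_motif, motif_ids):
    """Build one attribute per motif for a single sequence.

    Assumption: a motif missing from the dictionary is treated as E-value = cap,
    which yields a score of 0.
    """
    return [evalue_to_score(e_values_by_motif.get(m, E_VALUE_CAP)) for m in motif_ids]


# Hypothetical E-values for one sequence against two motifs.
print(motif_features({"motif_1": 2.0}, ["motif_1", "motif_2"]))  # [98.0, 0.0]
```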

Table I: Synopsis of the parameters and their effect on the K-mer model-building process.

  K-mer Value              Classifier Selection        String Match                                    Mismatch                                    Regular Expression
  5 (best)                 Logistic (best)             When applied (performs best)                    When not applied (performs best)            Not significant
  4 (reasonably good)      SMO (good)                  When not applied (performs relatively worse)    When applied (performs relatively worse)
  6 (comparatively bad)    J48 (comparatively weak)



Table II: Synopsis of the parameters for the PWM approach and their effect on the model.

  No. of Motifs    No. of Sites per Motif    Min-Max Length of Motif    Classifier
  10               18                        6-15                       J48 (best)
  8                20                        5-16                       Logistic (moderate)
  5                10                        6-15                       Naïve Bayes (comparatively bad)



   4. Combining K-mer & PWM approach:

To obtain a better model, we combined the K-mer and PWM approaches, using the best
parameters found for each. The combined approach showed a reasonable improvement when
applied to the training data.
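
One straightforward way to combine the two attribute sets is to concatenate them into a single attribute row per sequence. The sketch below assumes this is how the combination was done, which the report does not state; the function names and the example values are hypothetical.

```python
def kmer_count_vector(sequence, kmers):
    """Count overlapping exact occurrences of each K-mer (all of the same length)."""
    k = len(kmers[0])
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[m] for m in kmers]


def combined_row(sequence, kmers, pwm_scores):
    """Concatenate K-mer counts and per-motif PWM scores into one attribute row."""
    return kmer_count_vector(sequence, kmers) + list(pwm_scores)


# Hypothetical example: four 2-mers plus scores for two motifs.
print(combined_row("ACGA", ["AC", "CG", "GA", "TT"], [98.0, 0.0]))
# [1, 1, 1, 0, 98.0, 0.0]
```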

   5. Some Difficulties and Limitations of our Work:

Tuning the parameters of the classifier was the most challenging part of the project. We believe
we ran a reasonable set of experiments for choosing the parameters given the limited timeline.

   6. Acknowledgement:

We would like to thank Dr. Ruan for assigning us such a challenging project. It gave us good
working knowledge of practical machine learning and data mining. Working in a group was also
a valuable experience and an opportunity to share knowledge.

References:

[1-2] “MEME Suite”, available at: http://meme.sdsc.edu/meme/meme-download.html
[3] “Weka”, available at: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html
