SlideShare a Scribd company logo
1 of 3
Download to read offline
Final Project –CS6243
Transcription Factor DNA Binding Prediction




                    Team Members:
                    Badri Sampath α

                    Iffat Sharmin Chowdhury α

                    Prosunjit Biswas α

                    Tahmina Ahmed α
            α
                Department of Computer Science

            University of Texas at San Antonio.
1. Defining the Scope of the Project:

In this project, we have given a number of labeled (which are p & n) DNA sequence and a number of
unlabeled DNA sequence which we have to label based on a model built from the given labeled
sequences. Eventually, the scope of the problem is to build a binary classifier model based on the given
training DNA sequence and apply the model to label the unlabeled DNA sequence.

        1.1 Challenges of the Projects:

In conventional classification problem, there are a number of different attributes that we can readily use to
build the classifier. In this project, we are only given sequences and label. So, part of the work for this
project, is to find a way for generating meaningful attribute.




                                 Fig. 1 : Overall scope of the project.

    2. K-mer Based Approach:

        In the K-mer approach, we have generated all possible combination of DNA characters for a
specified length of K. The K-mer Approach is shown in details in figure 2. The important steps of the k-
mer approach are discussed in the following paragraphs.




                                 Fig 2: Overall K-mer based process.

After we have generated the K-mers, we have followed different kind of approaches to count the
their frequencies which are i)Strict matching , ii) matching with mismatch and iii) matching based
on Regular Expression.

In order to build an optimum model, we have tuned different parameters of the model. Some of
parameters and their impact on the classifier is shown in table I.

    3. PWM Based Approach:

We have used a motif finding tool named MEME [1] to generate specified number of motifs of
specific minimum and maximum length and motif Alignment and search tool MAST [2] to get the
E-value (bounded to 100)for each sequence. We have derived scores from these E-values by
subtracting the E-value from 100 for ordering the sequences according to their E-value. We
have used these scores specific to each motif as attributes of the sequences and feed them to
different classifiers. Table II gives the synopsis of parameters and their impact on the model.

Table I: Synopsis of the parameters and their effect in the K-mer model building process.

  K-mer Value        Classifier Selection    String Match            MisMatch               Regular
                                                                                           Expression
     5( Best)           Logistic (Best)      When applied         When not applied      Not significant
                                             (perform best)        (perform best)
  4(reasonably           SMO (Good)         When not applied    When applied (perform
      good)                                     (perform          relatively worse)
                                            relatively worse)
 6 (Comparatively     J48 (Comparatively
      bad)                  weak)



Table II: Synopsis of the parameters for PWM approach and their effect in the model

 No. of Motif    No.of Sites a      Min / Max Length of Motif                 Classifier
                 Motif appear
     10                18                     6-15                            J48(Best)
      8                20                     5-16                        Logistic(Moderate)
      5                10                     6-15               Naïve Bayes(comparatively Bad)



   4. Combining K-mer & PWM approach:

In order to obtain a better model, we have combined both K-mer and PWM approaches with
known best parameters. We found reasonable improvement for the combined approach when
applying it in the training data.

   5. Some Difficulties and Limitation of our Work:

Tuning the parameters for the classifier was the most challenging part of the project. We think,
we have done reasonable experiment for choosing the parameters given the limited timeline.

   6. Acknowledgement:

At the end of the project, we would like to thank Dr. Ruan for assigning us such a challenging
project. It offered us good working knowledge of practical Machine Learning and data mining
stuffs. Working in the group was also a nice experience and knowledge sharing scope for us.

References:

[1-2] “MEME Suite“, available at http://meme.sdsc.edu/meme/meme-download.html
[3] “Weka”, available at: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html

More Related Content

What's hot

Speaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained DataSpeaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained Datasipij
 
Accurate global localization using visual odometry and digital (1)
Accurate global localization using visual odometry and digital (1)Accurate global localization using visual odometry and digital (1)
Accurate global localization using visual odometry and digital (1)Naveen Gouda
 
IRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET Journal
 
352735350 rsh-qam11-tif-15-doc
352735350 rsh-qam11-tif-15-doc352735350 rsh-qam11-tif-15-doc
352735350 rsh-qam11-tif-15-docFiras Husseini
 
Frontier in reinforcement learning
Frontier in reinforcement learningFrontier in reinforcement learning
Frontier in reinforcement learningJie-Han Chen
 
Reciprocal Ranking Fusion in Consumer Health Search - IMS UNIPD @ CLEF eHealt...
Reciprocal Ranking Fusion in Consumer Health Search - IMS UNIPD @ CLEF eHealt...Reciprocal Ranking Fusion in Consumer Health Search - IMS UNIPD @ CLEF eHealt...
Reciprocal Ranking Fusion in Consumer Health Search - IMS UNIPD @ CLEF eHealt...Giorgio Di Nunzio
 
Study on Some Key Issues of Synergetic Neural Network
Study on Some Key Issues of Synergetic Neural Network  Study on Some Key Issues of Synergetic Neural Network
Study on Some Key Issues of Synergetic Neural Network Jie Bao
 

What's hot (10)

Speaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained DataSpeaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained Data
 
Accurate global localization using visual odometry and digital (1)
Accurate global localization using visual odometry and digital (1)Accurate global localization using visual odometry and digital (1)
Accurate global localization using visual odometry and digital (1)
 
IRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET- American Sign Language Classification
IRJET- American Sign Language Classification
 
Matlab course syllabus
Matlab course syllabusMatlab course syllabus
Matlab course syllabus
 
352735350 rsh-qam11-tif-15-doc
352735350 rsh-qam11-tif-15-doc352735350 rsh-qam11-tif-15-doc
352735350 rsh-qam11-tif-15-doc
 
Frontier in reinforcement learning
Frontier in reinforcement learningFrontier in reinforcement learning
Frontier in reinforcement learning
 
D111823
D111823D111823
D111823
 
Reciprocal Ranking Fusion in Consumer Health Search - IMS UNIPD @ CLEF eHealt...
Reciprocal Ranking Fusion in Consumer Health Search - IMS UNIPD @ CLEF eHealt...Reciprocal Ranking Fusion in Consumer Health Search - IMS UNIPD @ CLEF eHealt...
Reciprocal Ranking Fusion in Consumer Health Search - IMS UNIPD @ CLEF eHealt...
 
Analog Communication Apr 2013
Analog Communication Apr 2013Analog Communication Apr 2013
Analog Communication Apr 2013
 
Study on Some Key Issues of Synergetic Neural Network
Study on Some Key Issues of Synergetic Neural Network  Study on Some Key Issues of Synergetic Neural Network
Study on Some Key Issues of Synergetic Neural Network
 

Viewers also liked

An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationCSCJournals
 
Branch prediction contest_report
Branch prediction contest_reportBranch prediction contest_report
Branch prediction contest_reportUT, San Antonio
 
تصنيع البروتينات في الخلية
تصنيع البروتينات في الخليةتصنيع البروتينات في الخلية
تصنيع البروتينات في الخليةUniv. of Tripoli
 
Attribute Based Encryption
Attribute Based EncryptionAttribute Based Encryption
Attribute Based EncryptionUT, San Antonio
 
Sample graduation project presentation
Sample graduation project presentationSample graduation project presentation
Sample graduation project presentationburnsr
 

Viewers also liked (10)

An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif Identification
 
Branch prediction contest_report
Branch prediction contest_reportBranch prediction contest_report
Branch prediction contest_report
 
Cyber Security Exam 2
Cyber Security Exam 2Cyber Security Exam 2
Cyber Security Exam 2
 
Recitation
RecitationRecitation
Recitation
 
Recitation
RecitationRecitation
Recitation
 
Ksi
KsiKsi
Ksi
 
تصنيع البروتينات في الخلية
تصنيع البروتينات في الخليةتصنيع البروتينات في الخلية
تصنيع البروتينات في الخلية
 
DNA Motif Finding 2010
DNA Motif Finding 2010DNA Motif Finding 2010
DNA Motif Finding 2010
 
Attribute Based Encryption
Attribute Based EncryptionAttribute Based Encryption
Attribute Based Encryption
 
Sample graduation project presentation
Sample graduation project presentationSample graduation project presentation
Sample graduation project presentation
 

Similar to Transcription Factor DNA Binding Prediction

IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET Journal
 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET Journal
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071CS, NcState
 
SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.bhavinecindus
 
IRJET - Cognitive based Emotion Analysis of a Child Reading a Book
IRJET -  	  Cognitive based Emotion Analysis of a Child Reading a BookIRJET -  	  Cognitive based Emotion Analysis of a Child Reading a Book
IRJET - Cognitive based Emotion Analysis of a Child Reading a BookIRJET Journal
 
Developing Tools for “What if…” Testing of Large-scale Software Systems
Developing Tools for “What if…” Testing of Large-scale Software SystemsDeveloping Tools for “What if…” Testing of Large-scale Software Systems
Developing Tools for “What if…” Testing of Large-scale Software Systems James Hill
 
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...IRJET Journal
 
Software Product Measurement and Analysis in a Continuous Integration Environ...
Software Product Measurement and Analysis in a Continuous Integration Environ...Software Product Measurement and Analysis in a Continuous Integration Environ...
Software Product Measurement and Analysis in a Continuous Integration Environ...Gabriel Moreira
 
Quality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint CompressionQuality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint CompressionIJTET Journal
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
Image Features Matching and Classification Using Machine Learning
Image Features Matching and Classification Using Machine LearningImage Features Matching and Classification Using Machine Learning
Image Features Matching and Classification Using Machine LearningIRJET Journal
 
Archana kalapgar 19210184_ca684
Archana kalapgar 19210184_ca684Archana kalapgar 19210184_ca684
Archana kalapgar 19210184_ca684ArchanaKalapgar
 
Principles of effort estimation
Principles of effort estimationPrinciples of effort estimation
Principles of effort estimationCS, NcState
 
KnowledgeFromDataAtScaleProject
KnowledgeFromDataAtScaleProjectKnowledgeFromDataAtScaleProject
KnowledgeFromDataAtScaleProjectMarciano Moreno
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedIRJET Journal
 
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGESA DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGESPNandaSai
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...IOSR Journals
 

Similar to Transcription Factor DNA Binding Prediction (20)

InternshipReport
InternshipReportInternshipReport
InternshipReport
 
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware Performance
 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071
 
SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.
 
IRJET - Cognitive based Emotion Analysis of a Child Reading a Book
IRJET -  	  Cognitive based Emotion Analysis of a Child Reading a BookIRJET -  	  Cognitive based Emotion Analysis of a Child Reading a Book
IRJET - Cognitive based Emotion Analysis of a Child Reading a Book
 
Test for AI model
Test for AI modelTest for AI model
Test for AI model
 
Developing Tools for “What if…” Testing of Large-scale Software Systems
Developing Tools for “What if…” Testing of Large-scale Software SystemsDeveloping Tools for “What if…” Testing of Large-scale Software Systems
Developing Tools for “What if…” Testing of Large-scale Software Systems
 
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
 
Software Product Measurement and Analysis in a Continuous Integration Environ...
Software Product Measurement and Analysis in a Continuous Integration Environ...Software Product Measurement and Analysis in a Continuous Integration Environ...
Software Product Measurement and Analysis in a Continuous Integration Environ...
 
Quality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint CompressionQuality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint Compression
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Image Features Matching and Classification Using Machine Learning
Image Features Matching and Classification Using Machine LearningImage Features Matching and Classification Using Machine Learning
Image Features Matching and Classification Using Machine Learning
 
Archana kalapgar 19210184_ca684
Archana kalapgar 19210184_ca684Archana kalapgar 19210184_ca684
Archana kalapgar 19210184_ca684
 
Principles of effort estimation
Principles of effort estimationPrinciples of effort estimation
Principles of effort estimation
 
KnowledgeFromDataAtScaleProject
KnowledgeFromDataAtScaleProjectKnowledgeFromDataAtScaleProject
KnowledgeFromDataAtScaleProject
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing Impaired
 
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGESA DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...
 

More from UT, San Antonio

digital certificate - types and formats
digital certificate - types and formatsdigital certificate - types and formats
digital certificate - types and formatsUT, San Antonio
 
Static Analysis with Sonarlint
Static Analysis with SonarlintStatic Analysis with Sonarlint
Static Analysis with SonarlintUT, San Antonio
 
Shellshock- from bug towards vulnerability
Shellshock- from bug towards vulnerabilityShellshock- from bug towards vulnerability
Shellshock- from bug towards vulnerabilityUT, San Antonio
 
Big Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationBig Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationUT, San Antonio
 
Enumerated authorization policy ABAC (EP-ABAC) model
Enumerated authorization policy ABAC (EP-ABAC) modelEnumerated authorization policy ABAC (EP-ABAC) model
Enumerated authorization policy ABAC (EP-ABAC) modelUT, San Antonio
 
Where is my Privacy presentation slideshow (one page only)
Where is my Privacy presentation slideshow (one page only)Where is my Privacy presentation slideshow (one page only)
Where is my Privacy presentation slideshow (one page only)UT, San Antonio
 
Security_of_openstack_keystone
Security_of_openstack_keystoneSecurity_of_openstack_keystone
Security_of_openstack_keystoneUT, San Antonio
 
Research seminar group_1_prosunjit
Research seminar group_1_prosunjitResearch seminar group_1_prosunjit
Research seminar group_1_prosunjitUT, San Antonio
 
Final Project Transciption Factor DNA binding Prediction
Final Project Transciption Factor DNA binding Prediction Final Project Transciption Factor DNA binding Prediction
Final Project Transciption Factor DNA binding Prediction UT, San Antonio
 
Transcription Factor DNA Binding Prediction
Transcription Factor DNA Binding PredictionTranscription Factor DNA Binding Prediction
Transcription Factor DNA Binding PredictionUT, San Antonio
 
On the incoherencies in web browser access control
On the incoherencies in web browser access controlOn the incoherencies in web browser access control
On the incoherencies in web browser access controlUT, San Antonio
 

More from UT, San Antonio (20)

digital certificate - types and formats
digital certificate - types and formatsdigital certificate - types and formats
digital certificate - types and formats
 
Saml metadata
Saml metadataSaml metadata
Saml metadata
 
Static Analysis with Sonarlint
Static Analysis with SonarlintStatic Analysis with Sonarlint
Static Analysis with Sonarlint
 
Shellshock- from bug towards vulnerability
Shellshock- from bug towards vulnerabilityShellshock- from bug towards vulnerability
Shellshock- from bug towards vulnerability
 
Abac17 prosun-slides
Abac17 prosun-slidesAbac17 prosun-slides
Abac17 prosun-slides
 
Abac17 prosun-slides
Abac17 prosun-slidesAbac17 prosun-slides
Abac17 prosun-slides
 
Big Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationBig Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory Computation
 
Enumerated authorization policy ABAC (EP-ABAC) model
Enumerated authorization policy ABAC (EP-ABAC) modelEnumerated authorization policy ABAC (EP-ABAC) model
Enumerated authorization policy ABAC (EP-ABAC) model
 
Where is my Privacy presentation slideshow (one page only)
Where is my Privacy presentation slideshow (one page only)Where is my Privacy presentation slideshow (one page only)
Where is my Privacy presentation slideshow (one page only)
 
Three month course
Three month courseThree month course
Three month course
 
One month-syllabus
One month-syllabusOne month-syllabus
One month-syllabus
 
Zerovm backgroud
Zerovm backgroudZerovm backgroud
Zerovm backgroud
 
Security_of_openstack_keystone
Security_of_openstack_keystoneSecurity_of_openstack_keystone
Security_of_openstack_keystone
 
Research seminar group_1_prosunjit
Research seminar group_1_prosunjitResearch seminar group_1_prosunjit
Research seminar group_1_prosunjit
 
Final Project Transciption Factor DNA binding Prediction
Final Project Transciption Factor DNA binding Prediction Final Project Transciption Factor DNA binding Prediction
Final Project Transciption Factor DNA binding Prediction
 
Transcription Factor DNA Binding Prediction
Transcription Factor DNA Binding PredictionTranscription Factor DNA Binding Prediction
Transcription Factor DNA Binding Prediction
 
Secure webbrowsing 1
Secure webbrowsing 1Secure webbrowsing 1
Secure webbrowsing 1
 
On the incoherencies in web browser access control
On the incoherencies in web browser access controlOn the incoherencies in web browser access control
On the incoherencies in web browser access control
 
Cultural conflict
Cultural conflictCultural conflict
Cultural conflict
 
Pair programming
Pair programmingPair programming
Pair programming
 

Recently uploaded

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Transcription Factor DNA Binding Prediction

  • 1. Final Project –CS6243 Transcription Factor DNA Binding Prediction Team Members: Badri Sampath α Iffat Sharmin Chowdhury α Prosunjit Biswas α Tahmina Ahmed α α Department of Computer Science University of Texas at San Antonio.
  • 2. 1. Defining the Scope of the Project: In this project, we have given a number of labeled (which are p & n) DNA sequence and a number of unlabeled DNA sequence which we have to label based on a model built from the given labeled sequences. Eventually, the scope of the problem is to build a binary classifier model based on the given training DNA sequence and apply the model to label the unlabeled DNA sequence. 1.1 Challenges of the Projects: In conventional classification problem, there are a number of different attributes that we can readily use to build the classifier. In this project, we are only given sequences and label. So, part of the work for this project, is to find a way for generating meaningful attribute. Fig. 1 : Overall scope of the project. 2. K-mer Based Approach: In the K-mer approach, we have generated all possible combination of DNA characters for a specified length of K. The K-mer Approach is shown in details in figure 2. The important steps of the k- mer approach are discussed in the following paragraphs. Fig 2: Overall K-mer based process. After we have generated the K-mers, we have followed different kind of approaches to count the their frequencies which are i)Strict matching , ii) matching with mismatch and iii) matching based on Regular Expression. In order to build an optimum model, we have tuned different parameters of the model. Some of parameters and their impact on the classifier is shown in table I. 3. PWM Based Approach: We have used a motif finding tool named MEME [1] to generate specified number of motifs of specific minimum and maximum length and motif Alignment and search tool MAST [2] to get the E-value (bounded to 100)for each sequence. We have derived scores from these E-values by subtracting the E-value from 100 for ordering the sequences according to their E-value. We
  • 3. have used these scores specific to each motif as attributes of the sequences and feed them to different classifiers. Table II gives the synopsis of parameters and their impact on the model. Table I: Synopsis of the parameters and their effect in the K-mer model building process. K-mer Value Classifier Selection String Match MisMatch Regular Expression 5( Best) Logistic (Best) When applied When not applied Not significant (perform best) (perform best) 4(reasonably SMO (Good) When not applied When applied (perform good) (perform relatively worse) relatively worse) 6 (Comparatively J48 (Comparatively bad) weak) Table II: Synopsis of the parameters for PWM approach and their effect in the model No. of Motif No.of Sites a Min / Max Length of Motif Classifier Motif appear 10 18 6-15 J48(Best) 8 20 5-16 Logistic(Moderate) 5 10 6-15 Naïve Bayes(comparatively Bad) 4. Combining K-mer & PWM approach: In order to obtain a better model, we have combined both K-mer and PWM approaches with known best parameters. We found reasonable improvement for the combined approach when applying it in the training data. 5. Some Difficulties and Limitation of our Work: Tuning the parameters for the classifier was the most challenging part of the project. We think, we have done reasonable experiment for choosing the parameters given the limited timeline. 6. Acknowledgement: At the end of the project, we would like to thank Dr. Ruan for assigning us such a challenging project. It offered us good working knowledge of practical Machine Learning and data mining stuffs. Working in the group was also a nice experience and knowledge sharing scope for us. References: [1-2] “MEME Suite“, available at http://meme.sdsc.edu/meme/meme-download.html [3] “Weka”, available at: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html