Computational discovery of
composite motifs in DNA

Geir Kjetil Sandve, Osman Abul and Finn Drabløs


                                     Finn Drabløs [tare.medisin.ntnu.no]
Introduction                                                  2



   Basic gene regulation
 • Proteins (transcription
   factors, TFs)
   recognise binding
   sites (sequence
   motifs) in gene
   regulatory regions
 • The transcription
   factors stabilise the
   transcription complex
   (Figure: Michael Lones)
 • Distal promoters
   (enhancers) interact
   through DNA looping

Motivation                                                                                     3



 De novo prediction of binding sites
 • Make a set of co-regulated genes
     – E.g. from microarray experiments, normally imperfect sets
 • Extract assumed regulatory regions
     – Normally a fixed region upstream from TSS of each gene
 • Search for overrepresented patterns in these regions
     – Use a model for what a motif should look like
         • Consensus sequence with mismatches
         • Position Weight Matrix (PWM) based on log odds scores for occurrences
     – Use a strategy to find (local) optima for this model
         • E.g. Gibbs sampling, expectation maximisation …

 • Problem: More than 100 different methods
     – Which methods are reliable?
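
The PWM model mentioned above can be illustrated with a minimal sketch. It assumes a uniform background (0.25 per base), a pseudocount of 1, and a handful of made-up aligned sites; the function names are hypothetical and not taken from any of the benchmarked tools.

```python
import math

BASES = "ACGT"

def build_pwm(sites, background=None, pseudocount=1.0):
    """Return one {base: log2-odds score} dict per motif position."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the log-odds scores of the bases in one window."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start position) of the best-scoring window."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Made-up example sites, for illustration only
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(sites)
best_score, best_pos = best_hit(pwm, "CCGTTGACTCAGGA")
print(best_pos, round(best_score, 2))
```

A real tool would also have to learn the sites themselves (e.g. by Gibbs sampling or expectation maximisation); this sketch only covers the scoring model.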



Motivation                                                                            4



   Benchmarking of de novo tools
   • Tompa et al, Nature Biotech 23, 137-144 (2005)
   • Tested 14 different tools for motif discovery
   • Used 52 data sets from fly (6), human (26), mouse (12)
     and yeast (8)
   • Used data sets with real (Transfac) binding sites in
     different sequence contexts
       – “real” – The actual promoter sequences
       – “generic” – Randomly chosen promoter sequences from same genome
       – “markov” – Sequences generated by Markov chain of order 3
   • Measured performance at nucleotide level




Motivation                                                                                  5




 Average benchmark performance
   Method           TP      FP      FN       TN
   AlignAce        477    3789    8186   436048
   ANN-Spec        754    7799    7909   432038
   Consensus       178    1394    8485   438443
   GLAM            223    5619    8440   434218
   Improbizer      594    7942    8069   431895
   MEME            581    4836    8082   435001
   MEME3           673    6726    7990   433111
   MITRA           272    4092    8391   435745
   MotifSampler    520    4344    8143   435493
   Oligo/dyad      345    1891    8318   437946
   QuickScore      151    4856    8512   434981
   SeSiMCMC        530   13813    8133   426024
   Weeder          748    1748    7915   438089
   YMF             554    3492    8109   436345

   Average over all methods (layout: TP FN / FP TN)
                Pred_P     Pred_N
   Real_P          471       8192
   Real_N         5167     434670

   nCC = 0.053
   Performance is close to random! Too many FP, FN
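
The nCC value on this slide can be reproduced from the averaged confusion matrix. The sketch below assumes that nCC is the nucleotide-level correlation coefficient (the Matthews/Pearson form computed from TP, FP, FN and TN); the function name is hypothetical.

```python
import math

def ncc(tp, fp, fn, tn):
    """Nucleotide-level correlation coefficient from a 2x2 confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0 when a marginal is empty and the coefficient is undefined
    return numerator / denominator if denominator else 0.0

# Averaged counts taken from the slide
avg_ncc = ncc(tp=471, fp=5167, fn=8192, tn=434670)
print(round(avg_ncc, 3))
```

A value of 1 would mean perfect prediction and 0 no better than chance, which is why the slide describes the average as close to random.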




Motivation                                                                              6



   Can we improve performance?
 • Use better motif representations
     – Hidden Markov Models
 • Use better algorithms
     – More exhaustive searching (TODAY!)
     – Discriminative motif discovery
 • Use better background models
     – Real sequences (not Markov models) (TODAY!)
 • Filter out false positives
     – Identify “motif-like” solutions
     – Identify regulatory regions
     – Use co-occurrence of motifs (TODAY!)
         • Modules, composite motifs

Approach                                                               7



 Composite motif discovery




• TFs act together as modules
• Modules are not completely unique

Algorithm                                                                                           8



 Basic definitions
 • Frequent modules
     – Modules (and motifs) can be ranked by support
            • Fraction of sequences where the module (or motif) is found
     – Support is monotone (non-increasing)
            • Adding a motif to a module can never increase module support

 • Specific modules
     – Modules can be ranked by hit probability
            • Probability that a sequence supports the module
     – Hit probability is monotone (as for support)
     – Specific modules have low hit probability in background sequences
 • Significant modules
     – Modules can be ranked by significance
            • Probability that support in the sequence set differs from support in background
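
The support definition and its monotonicity can be sketched as follows. This assumes that single-motif hits are already available as a mapping from motif name to the set of sequences containing it, and it ignores any distance constraint between motifs; all names and data are made up for illustration.

```python
def support_set(module, hits, all_sequences):
    """Sequences that contain every motif in the module."""
    supporting = set(all_sequences)
    for motif in module:
        supporting &= hits[motif]
    return supporting

def support(module, hits, all_sequences):
    """Fraction of sequences where the module is found."""
    return len(support_set(module, hits, all_sequences)) / len(all_sequences)

# Hypothetical hit data: motif -> sequences with at least one hit
all_sequences = {"s1", "s2", "s3", "s4"}
hits = {
    "A": {"s1", "s2", "s3"},
    "B": {"s2", "s3", "s4"},
    "C": {"s3"},
}

s_a = support(["A"], hits, all_sequences)
s_ab = support(["A", "B"], hits, all_sequences)
s_abc = support(["A", "B", "C"], hits, all_sequences)
print(s_a, s_ab, s_abc)
```

Because the support set is an intersection, each added motif can only shrink it, which is the monotonicity property used for pruning.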



Algorithm                                                                      9



 Search tree
 • Discretized single motifs
   {1, 2, 3, …} organised as an
   implicit search tree
 • Support set H and hit
   probability P are computed
   iteratively (monotonicity)
     – Initially H is the full sequence set
       and P is 1
 • Search tree is efficiently
   pruned (indicated with X)
   based on H and P
 • Final output can be ranked
   by module significance
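
A minimal sketch of such a search is given below, under stated assumptions: motifs are integer labels, `hits` maps each motif to the set of sequence indices where it occurs, and `bg_prob` gives each motif's background hit probability. The slides do not specify exactly how P is used for pruning, so here only support is used to prune, and P is used as an output filter. All names and thresholds are hypothetical.

```python
def search_modules(hits, bg_prob, n_sequences, min_support, max_hit_prob):
    """Depth-first enumeration of motif combinations with support pruning."""
    motifs = sorted(hits)
    results = []

    def expand(module, H, P, start):
        for i in range(start, len(motifs)):
            motif = motifs[i]
            new_H = H & hits[motif]        # support set can only shrink
            new_P = P * bg_prob[motif]     # hit probability can only shrink
            new_support = len(new_H) / n_sequences
            if new_support < min_support:
                continue  # prune: no extension can regain support
            new_module = module + [motif]
            if new_P <= max_hit_prob:
                results.append((tuple(new_module), new_support, new_P))
            expand(new_module, new_H, new_P, i + 1)

    # Initially H is the full sequence set and P is 1
    expand([], set(range(n_sequences)), 1.0, 0)
    return results

# Hypothetical input data
hits = {1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2}}
bg_prob = {1: 0.3, 2: 0.4, 3: 0.1}
results = search_modules(hits, bg_prob, n_sequences=4,
                         min_support=0.5, max_hit_prob=0.2)
print(results)
```

In this toy input every branch containing motif 3 is cut because its support drops to 0.25, so only the module (1, 2) is reported.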
Implementation                                                                                   10



 Module significance
 • Position-level probability in background
     – Probability of single motif at specific location
     – Estimated from real DNA background sequences
 • Sequence-level probability in background
     – Probability of single motif at least once in given background sequence
     – Estimated as union of position-level probabilities
 • Hit-probability in background
     – Probability of composite motif at least once in background sequence
     – Estimated as product of individual motif components
 • Significance p-value of observed support
     – Probability of seeing at least observed support in background set
     – Estimated as right tail of binomial distribution
         • At least k out of n successes given hit-probability
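
The chain of estimates above can be sketched in a few lines. The sketch assumes that positions within a sequence and motifs within a module are independent, which the slide's "union" and "product" wording suggests but does not state; the function names and the numbers in the example are hypothetical.

```python
import math

def sequence_level_prob(position_probs):
    """P(motif occurs at least once), as a union over positions (independence assumed)."""
    miss = 1.0
    for p in position_probs:
        miss *= 1.0 - p
    return 1.0 - miss

def hit_probability(motif_probs):
    """P(all component motifs occur in one background sequence), as a product."""
    return math.prod(motif_probs)

def binomial_right_tail(k, n, p):
    """P(at least k successes out of n), each with success probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical example: two motifs, 500 positions per sequence, 10 sequences
p_motif_a = sequence_level_prob([0.001] * 500)
p_motif_b = sequence_level_prob([0.0005] * 500)
p_module = hit_probability([p_motif_a, p_motif_b])
p_value = binomial_right_tail(k=8, n=10, p=p_module)
print(round(p_module, 3), p_value)
```

Here `k` is the number of sequences that support the module and `n` is the size of the sequence set, so a small `p_value` means the observed support is unlikely in background.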


Implementation                                                                        11



 Problem specification
 • Frequent and specific modules
     – Use thresholds on support and
       specificity
     – Complete solutions but multi-
       objective optimization
 • Top-ranking modules
     – Combine objectives into single
       measure, e.g. p-value
 • Pareto-optimal modules
     – Each objective is a separate
       dimension of optimality
     – Return Pareto front of composite
       motifs
     – See http://en.wikipedia.org/wiki/Pareto_efficiency
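
The Pareto-optimal formulation can be illustrated with a small dominance filter. The sketch assumes two objectives only, support (higher is better) and background hit probability (lower is better); the module names and values are made up.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a["support"] >= b["support"] and a["hit_prob"] <= b["hit_prob"]
    better = a["support"] > b["support"] or a["hit_prob"] < b["hit_prob"]
    return no_worse and better

def pareto_front(modules):
    """Keep the modules that no other module dominates."""
    return [m for m in modules if not any(dominates(other, m) for other in modules)]

# Hypothetical candidate modules
modules = [
    {"name": "X", "support": 0.9, "hit_prob": 0.5},
    {"name": "Y", "support": 0.6, "hit_prob": 0.1},
    {"name": "Z", "support": 0.5, "hit_prob": 0.3},  # dominated by Y
    {"name": "W", "support": 0.9, "hit_prob": 0.6},  # dominated by X
]
front_names = [m["name"] for m in pareto_front(modules)]
print(front_names)
```

Unlike a single combined score, the front keeps both X and Y, leaving the trade-off between support and specificity to the user.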



Implementation                                            12



 Motif prediction flowchart




Benchmarking                                                                               13



 Benchmark data set



 • Known composite motifs from the TransCompel database
 • Tests performance by adding “noise matrices” to input
    – Matrices for TFs assumed not to bind in sequence set
        • Will have random (false positive) hits
    – Selected at random from Transfac
        • Max noise level includes all Transfac matrices
    – Similar to actual usage
        • Searching for motifs consisting of unknown TFs


Benchmarking                                                                14



 General performance (nCC)




 • Compo compared to several other tools
    – TransCompel benchmark set
 • Compo clearly has the best performance, in particular at
   realistic settings (high noise level)

Benchmarking                                                                       15



 Background and support
 • Compo gains performance from realistic background (real
   DNA) and support
    – Random DNA based on multinomial sequence model
 • Performance without real DNA background or support
   comparable to other tools




Future development                                                            16



 Pareto front
• Pareto front on support,
  max motif distance and
  significance (colour)
• Compo prediction not
  optimal
    – Compo predicted Ets and
      GATA
    – Annotated motif is AP1 and
      NFAT
• Explore alternative
  solutions
• Explore parameter
  interactions

  Figure legend: X – NFAT, O – AP1
Acknowledgements                                                                                17



  The research group
   BiGR
   Drabløs, Finn

   Postdocs / Researchers
   Sætrom, Pål
   Kusnierczyk, Wacek
   Rye, Morten
   Klein, Jörn
   Anderssen, Endre
   Wang, Xinhui (ERCIM)
   Capatana, Ana (ERCIM, starting 2009)

   PhDs
   Bratlie, Marit Skyrud
   Klepper, Kjetil
   Saito, Takaya
   Lundbæk, Marie
   Håndstad, Tony

   Programmers / Technicians
   Johansen, Jostein
   Thomas, Laurent
   Olsen, Lene C.

   Others
   Solbakken, Trude

   Master students
   Bolstad, Kjersti
   Muiser, Iwe
   Sponberg, Bjørn
   Brands, Stef
   Skaland, Even

   Former members
   Sandve, Geir Kjetil
   Abul, Osman
   Schwalie, Petra
   Lones, Michael


Drablos Composite Motifs Bosc2009

  • 1. 1 Computational discovery of composite motifs in DNA Geir Kjetil Sandve, Osman Abul and Finn Drabløs Finn Drabløs [tare.medisin.ntnu.no]
  • 2. Introduction 2 Basic gene regulation • Proteins (transcription factors, TFs) recognise binding sites (sequence motifs) in gene regulatory regions • The transcription factors stabilise the Michael Lones transcription complex • Distal promoters (enhancers) interact through DNA looping Finn Drabløs [tare.medisin.ntnu.no]
  • 3. Motivation: De novo prediction of binding sites
      • Make a set of co-regulated genes
          – E.g. from microarray experiments, normally imperfect sets
      • Extract assumed regulatory regions
          – Normally a fixed region upstream from the TSS of each gene
      • Search for overrepresented patterns in these regions
          – Use a model for what a motif should look like
              • Consensus sequence with mismatches
              • Position Weight Matrix (PWM) based on log odds scores for occurrences
          – Use a strategy to find (local) optima for this model
              • E.g. Gibbs sampling, expectation maximisation …
      • Problem: More than 100 different methods
          – Which methods are reliable?
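The PWM model mentioned on slide 3 can be illustrated with a minimal Python sketch. It assumes a handful of aligned example sites, a uniform background and a pseudocount of 1; these choices and the function names (`build_pwm`, `score_window`, `best_hit`) are illustrative assumptions, not taken from the slides or from any particular tool.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PWM (one dict per column) from aligned sites."""
    if background is None:
        # Assumption: uniform background; real tools estimate this from DNA.
        background = {base: 0.25 for base in ALPHABET}
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        scores = {}
        for base in ALPHABET:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background[base])
        pwm.append(scores)
    return pwm

def score_window(pwm, window):
    """Sum the per-column log-odds scores for one candidate site."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [(score_window(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Hypothetical example sites, for illustration only.
example_sites = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]
example_pwm = build_pwm(example_sites)
print(best_hit(example_pwm, "GGCGTATAAAGGC"))
```

Scanning every window and keeping the maximum is the simplest possible search; the strategies named on the slide (Gibbs sampling, expectation maximisation) instead have to learn the matrix itself from unaligned sequences.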
  • 4. Motivation: Benchmarking of de novo tools
      • Tompa et al, Nature Biotech 23, 137-144 (2005)
      • Tested 14 different tools for motif discovery
      • Used 52 data sets from fly (6), human (26), mouse (12) and yeast (8)
      • Used data sets with real (Transfac) binding sites in different sequence contexts
          – "real": the actual promoter sequences
          – "generic": randomly chosen promoter sequences from the same genome
          – "markov": sequences generated by a Markov chain of order 3
      • Measured performance at nucleotide level
  • 5. Motivation: Average benchmark performance

      Method           TP     FP    FN      TN
      AlignAce        477   3789  8186  436048
      ANN-Spec        754   7799  7909  432038
      Consensus       178   1394  8485  438443
      GLAM            223   5619  8440  434218
      Improbizer      594   7942  8069  431895
      MEME            581   4836  8082  435001
      MEME3           673   6726  7990  433111
      MITRA           272   4092  8391  435745
      MotifSampler    520   4344  8143  435493
      Oligo/dyad      345   1891  8318  437946
      QuickScore      151   4856  8512  434981
      SeSiMCMC        530  13813  8133  426024
      Weeder          748   1748  7915  438089
      YMF             554   3492  8109  436345

      Average confusion matrix (layout: TP FN / FP TN):

                   Pred_P   Pred_N
      Real_P          471     8192
      Real_N         5167   434670

      nCC = 0.053
      • Performance is close to random!
      • Too many FP, FN
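The nCC value on slide 5 can be checked from the averaged confusion matrix. The sketch below assumes nCC is the nucleotide-level correlation coefficient used in the Tompa et al. benchmark, i.e. the Pearson (Matthews) correlation between predicted and annotated nucleotide positions; the function name `ncc` is illustrative.

```python
import math

def ncc(tp, fp, fn, tn):
    """Nucleotide-level correlation coefficient from a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Averaged counts from the slide: Real_P = (471, 8192), Real_N = (5167, 434670).
average_ncc = ncc(tp=471, fp=5167, fn=8192, tn=434670)
print(round(average_ncc, 3))  # prints 0.053
```

A value of 0 corresponds to random prediction and 1 to perfect prediction, which is why 0.053 is described as close to random.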
  • 6. Motivation: Can we improve performance?
      • Use better motif representations
          – Hidden Markov Models
      • Use better algorithms
          – More exhaustive searching (TODAY!)
          – Discriminative motif discovery
      • Use better background models
          – Real sequences (not Markov models) (TODAY!)
      • Filter out false positives
          – Identify "motif-like" solutions
          – Identify regulatory regions
          – Use co-occurrence of motifs (TODAY!)
              • Modules, composite motifs
  • 7. Approach: Composite motif discovery
      • TFs act together as modules
      • Modules are not completely unique
  • 8. Algorithm: Basic definitions
      • Frequent modules
          – Modules (and motifs) can be ranked by support
              • Fraction of sequences where the module (or motif) is found
          – Support is monotonic
              • Adding a motif to a module can never increase module support
      • Specific modules
          – Modules can be ranked by hit probability
              • Probability that a sequence supports the module
          – Hit probability is monotonic (as for support)
          – Specific modules have low hit probability in background sequences
      • Significant modules
          – Modules can be ranked by significance
              • Probability that support in sequences ≠ background
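The support definition and its monotonicity can be shown in a few lines of Python. This is a sketch under a simplifying assumption: each sequence is reduced to the set of motif identifiers that hit it, and positional constraints between motifs are ignored. The name `support` and the example data are hypothetical.

```python
def support(module, hits_per_sequence):
    """Fraction of sequences that contain every motif in the module."""
    supporting = sum(1 for hits in hits_per_sequence if module <= hits)
    return supporting / len(hits_per_sequence)

# Hypothetical data: motif ids found in each of four sequences.
hits_per_sequence = [{1, 2, 3}, {1, 2}, {2, 3}, {1}]

for module in ({1}, {1, 2}, {1, 2, 3}):
    # Each added motif can only keep or shrink the supporting set.
    print(sorted(module), support(module, hits_per_sequence))
```

Because a sequence must contain all motifs of a module, the supporting set of a larger module is always a subset of that of any smaller module it contains.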
  • 9. Algorithm: Search tree
      • Discretized single motifs {1, 2, 3, …} organised as an implicit search tree
      • Support set H and hit probability P are computed iteratively (monotonicity)
          – Initially H is the full sequence set and P is 1
      • Search tree is efficiently pruned (indicated with X) based on H and P
      • Final output can be ranked by module significance
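A rough sketch of such a search tree is given below, assuming that motifs are numbered, that H is updated by set intersection and that P is updated by multiplication. It prunes only on support, the simpler criterion; the slides do not spell out how P enters the pruning, so P is used here only to filter what is reported. All names and thresholds are hypothetical.

```python
def enumerate_modules(motif_hits, motif_bg_prob, n_seqs,
                      min_support, max_hit_prob):
    """Depth-first search over motif combinations with support pruning.

    motif_hits:    motif id -> set of sequence indices containing the motif
    motif_bg_prob: motif id -> probability of a hit in a background sequence
    """
    motifs = sorted(motif_hits)
    results = []

    def extend(module, start, H, P):
        for i in range(start, len(motifs)):
            m = motifs[i]
            new_H = H & motif_hits[m]        # support set can only shrink
            new_P = P * motif_bg_prob[m]     # hit probability can only drop
            new_support = len(new_H) / n_seqs
            if new_support < min_support:
                continue  # prune: no extension can regain lost support
            new_module = module + (m,)
            if new_P <= max_hit_prob:
                results.append((new_module, new_support, new_P))
            extend(new_module, i + 1, new_H, new_P)

    # Root of the tree: H is the full sequence set and P is 1.
    extend((), 0, set(range(n_seqs)), 1.0)
    return results

# Hypothetical toy input with three motifs and four sequences.
toy_hits = {1: {0, 1, 2, 3}, 2: {0, 1, 2}, 3: {3}}
toy_bg = {1: 0.5, 2: 0.2, 3: 0.1}
for module, sup, prob in enumerate_modules(toy_hits, toy_bg, 4, 0.5, 0.25):
    print(module, sup, prob)
```

Only extending a module with higher-numbered motifs visits each combination once, which is what makes the tree implicit: no node is stored beyond the current path.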
  • 10. Implementation: Module significance
      • Position-level probability in background
          – Probability of single motif at specific location
          – Estimated from real DNA background sequences
      • Sequence-level probability in background
          – Probability of single motif at least once in given background sequence
          – Estimated as union of position-level probabilities
      • Hit-probability in background
          – Probability of composite motif at least once in background sequence
          – Estimated as product of individual motif components
      • Significance p-value of observed support
          – Probability of seeing at least observed support in background set
          – Estimated as right tail of binomial distribution
              • At least k out of n successes given hit-probability p
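The chain of estimates on slide 10 can be sketched as follows, assuming independence between positions (for the union) and between motifs (for the product). The function names and the numbers in the usage lines are hypothetical.

```python
import math

def sequence_level_prob(position_probs):
    """P(motif occurs at least once), as a union over independent positions."""
    miss = 1.0
    for p in position_probs:
        miss *= 1.0 - p
    return 1.0 - miss

def module_hit_prob(motif_seq_probs):
    """P(all motifs of a module occur), as a product over its motifs."""
    return math.prod(motif_seq_probs)

def binomial_right_tail(k, n, p):
    """P(at least k successes out of n) given per-sequence probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Hypothetical numbers: two motifs, 500 positions each, 20 sequences.
p_motif_a = sequence_level_prob([0.001] * 500)
p_motif_b = sequence_level_prob([0.0005] * 500)
p_module = module_hit_prob([p_motif_a, p_motif_b])
print(binomial_right_tail(k=12, n=20, p=p_module))
```

A small tail probability means the observed support would rarely arise in background sequences, which is the sense in which a module is called significant.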
  • 11. Implementation: Problem specification
      • Frequent and specific modules
          – Use thresholds on support and specificity
          – Complete solutions but multi-objective optimization
      • Top-ranking modules
          – Combine objectives into single measure, e.g. p-value
      • Pareto-optimal modules
          – Each objective is a separate dimension of optimality (see http://en.wikipedia.org/wiki/Pareto_efficiency)
          – Return Pareto front of composite motifs
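The Pareto-optimal formulation can be illustrated with a small Python sketch. It assumes every objective is expressed so that larger is better (for example support and one minus background hit probability); the function name, the candidate labels and the scores are hypothetical.

```python
def pareto_front(candidates):
    """Return candidates not dominated by any other candidate.

    candidates: list of (name, objectives) with all objectives maximised.
    """
    def dominates(a, b):
        # a dominates b if it is no worse everywhere and better somewhere.
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [(name, obj) for name, obj in candidates
            if not any(dominates(other, obj) for _, other in candidates)]

# Hypothetical modules scored on (support, specificity).
modules = [("A", (0.9, 0.2)), ("B", (0.5, 0.5)),
           ("C", (0.4, 0.4)), ("D", (0.2, 0.9))]
print([name for name, _ in pareto_front(modules)])
```

Unlike a single combined score, the front keeps every trade-off that no other module beats on all objectives at once, leaving the final choice to the user.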
  • 12. Implementation: Motif prediction flowchart
  • 13. Benchmarking: Benchmark data set
      • Known composite motifs from the TransCompel database
      • Tests performance by adding "noise matrices" to input
          – Matrices for TFs assumed not to bind in sequence set
              • Will have random (false positive) hits
          – Selected at random from Transfac
      • Max noise level includes all Transfac matrices
          – Similar to actual usage
              • Searching for motifs consisting of unknown TFs
  • 14. Benchmarking: General performance (nCC)
      • Compo compared to several other tools
          – TransCompel benchmark set
      • Compo has clearly best performance, in particular at realistic settings (high noise level)
  • 15. Benchmarking: Background and support
      • Compo gains performance from realistic background (real DNA) and support
          – Random DNA based on multinomial sequence model
      • Performance without real DNA background or support comparable to other tools
  • 16. Future development: Pareto front
      • Pareto front on support, max motif distance and significance (colour)
      • Compo prediction not optimal
          – Compo predicted Ets and GATA
          – Annotated motif is AP1 and NFAT
      • Explore alternative solutions
      • Explore parameter interactions
      [Figure legend: X – NFAT, O – AP1]
  • 17. Acknowledgements The research group BiGR Programmers / Technicians Johansen, Jostein Drabløs, Finn Thomas, Laurent Olsen, Lene C. Postdocs / Researchers Sætrom, Pål Others Kusnierczyk, Wacek Solbakken, Trude Rye, Morten Klein, Jörn Master students Anderssen, Endre Bolstad, Kjersti Wang, Xinhui (ERCIM) Muiser, Iwe Capatana, Ana (ERCIM, starting 2009) Sponberg, Bjørn Brands, Stef PhDs Skaland, Even Bratlie, Marit Skyrud Klepper, Kjetil Former members Saito, Takaya Sandve, Geir Kjetil Lundbæk, Marie Abul, Osman Håndstad, Tony Schwalie, Petra Lones, Michael