XPRIME: A Novel Motif Searching Method

             Rachel L. Poulsen

             Department of Statistics
            ...
Presentation prepared for the WNAR conference held at Portland State University in 2009

  1. 1. XPRIME: A Novel Motif Searching Method
        Rachel L. Poulsen, Department of Statistics, Brigham Young University
        June 15, 2009
  3. 3. Introduction
        • DNA contains the genetic instructions that uniquely define an organism.
        • RNA is created to carry genetic instructions from the DNA to the rest of the cell.
        • The process of DNA “talking” to the rest of the cell is called transcription.
  6. 6. Transcription DNA RNA
  8. 8. Position Weight Matrix (PWM) (Hertz et al 1990)
        ETS1 TF binding motif

        Position:    1      2      3      4      5      6      7      8
        A          0.067  0.333  0.0    0.0    1.0    0.533  0.267  0.067
        C          0.933  0.600  0.0    0.0    0.0    0.133  0.067  0.400
        G          0.000  0.000  1.0    1.0    0.0    0.000  0.667  0.000
        T          0.000  0.067  0.0    0.0    0.0    0.333  0.000  0.533
  9. 9. Sequence Logos Figure: DNA binding motif for the ETS1 TF
  14. 14. De Novo motif searching
        • Regular expression enumeration
            1. Actual count vs. expected count
            2. Dictionary-based sequence model (Bussemaker et al. 2000)
        • PWM updating
            1. MEME (Bailey et al 1995)
            2. Gibbs Motif Sampler (GMS) (Lawrence et al 1993)
            3. BioProspector (Liu et al 2001)
            4. AlignACE (Roth et al 1998)
  15. 15. Known Motif Search
            1. GREP
            2. Database search with scoring function (Hertz et al 1990)
  19. 19. XPRIME: An Improved Method
        • TRANSFAC (Matys et al 2003)
            • Information pulled from in vitro experiments and literature
            • Most methods justify results using TRANSFAC
        • XPRIME incorporates prior information
        • XPRIME can search for both de novo motifs and known motifs simultaneously
  23. 23. Notation and Data
        • Indices
            • w: width of motif
            • L: length of sequence
            • m: motif indicator
            • i: position in sequence
            • j: position in motif
            • s: indicates sequence
        • The data, $z_s$
            $$z_s = (y_{is}, \Delta_{1i}, \Delta_{2i}, \cdots, \Delta_{(m+1)i})$$
            • $y_i$ represents the position (w-mer)
            • $\Delta_{mi}$ indicates whether $y_i$ belongs to motif $m$ or not
            • $\Delta_{(m+1)i}$ indicates whether $y_i$ belongs to the background motif or not
  24. 24. The Scoring Function
        $$\mathrm{MotifScore} = f(y) = \prod_{j=1}^{w} \sum_{i \in \{A,C,G,T\}} p_{ij}\, I(y_j = i)$$
  26. 26. Methods: Complete Data Likelihood
        • (m+1)-component mixture model
            $$L(\theta \mid z) = \prod_{i=1}^{L_s} C(y_i)\, [r_1 f_1(y_i)]^{\Delta_{1i}}\, [r_2 f_2(y_i)]^{\Delta_{2i}} \cdots [r_{m+1} f_{m+1}(y_i)]^{\Delta_{(m+1)i}}$$
        • $f(y)$ is the Motif Score equation
  27. 27. Methods: Priors
        • $f_{m+1}(y)$ is fixed a priori
        • The $\Delta_{(m+1)i}$'s are missing a priori
        • $f_1(y), \cdots, f_m(y)$ have product Dirichlet priors such that
            $$\pi(f_m(y)) \propto \prod_{j=1}^{L} \prod_{k \in \{A,C,G,T\}} p_{mjk}^{\,a_{p_{mij}} - 1}$$
        • $r$ also has a Dirichlet prior
            $$\pi(r) \propto \prod_{i=1}^{M} r_i^{\,a_{r_i} - 1}$$
  31. 31. Methods: Gibbs Algorithm
            1. Draws the $\Delta$'s from a multinomial distribution
               $$p_\Delta \propto r_M \cdot f_M(y)$$
            2. Draws $r$ from a Dirichlet distribution
               $$\alpha_r = \sum_{i=1}^{L} \Delta_{Mi} + a_M$$
            3. Draws $p_{mij}$ from a Dirichlet distribution
               $$\alpha_{p_{mij}} = \sum_{i=1}^{L} \sum_{k \in \{A,C,G,T\}} \Delta_{mi}\, I(y_{ij} = k) + a_{p_{mij}}$$
  32. 32. An Example: ETS1
        • We hypothesize that ETS1 has a specific binding site
        • The Data
            1. ETS1 only
            2. GABP only
            3. ETS1 and GABP
  33. 33. ETS1 Binding Motifs
        Figure panels: (a) ETS1 from TRANSFAC; (b) ETS1 from ETS1 only; (c) ETS1 from GABP only; (d) ETS1 from ETS1/GABP
  34. 34. Justification of Prior Information Pete Hollenhorst sequence logo
  35. 35. Justification of Prior Information
        Figure: Motif found without prior specification
        Figure: Motif found with prior specification
  40. 40. Conclusions and Future Research
        • XPRIME successfully searches for de novo and known motifs
        • Evidence found suggesting ETS1 has its own binding motif
        • Hidden Markov Models and forward backward algorithm
        • Prior information on r
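
As an illustration of the scoring function on slide 24, the sketch below scores a w-mer against the ETS1 matrix from slide 8. It assumes the score is the product, over the w motif positions, of the matrix entry p_ij for the nucleotide observed at each position. The names `ETS1_PWM`, `motif_score`, and `best_site` are hypothetical and do not come from the slides.

```python
# ETS1 position weight matrix copied from slide 8:
# one row per nucleotide, one column per motif position (w = 8).
ETS1_PWM = {
    "A": [0.067, 0.333, 0.0, 0.0, 1.0, 0.533, 0.267, 0.067],
    "C": [0.933, 0.600, 0.0, 0.0, 0.0, 0.133, 0.067, 0.400],
    "G": [0.000, 0.000, 1.0, 1.0, 0.0, 0.000, 0.667, 0.000],
    "T": [0.000, 0.067, 0.0, 0.0, 0.0, 0.333, 0.000, 0.533],
}


def motif_score(wmer, pwm):
    """Multiply, over positions j, the entry p_ij for the base i seen at j.

    Assumed reading of the slide-24 formula (product over positions).
    """
    width = len(pwm["A"])
    if len(wmer) != width:
        raise ValueError(f"expected a {width}-mer, got {len(wmer)} bases")
    score = 1.0
    for j, base in enumerate(wmer.upper()):
        score *= pwm[base][j]
    return score


def best_site(sequence, pwm):
    """Slide a window of motif width along the sequence.

    Returns (start, wmer, score) for the highest-scoring window.
    """
    width = len(pwm["A"])
    candidates = []
    for start in range(len(sequence) - width + 1):
        wmer = sequence[start:start + width]
        candidates.append((start, wmer, motif_score(wmer, pwm)))
    return max(candidates, key=lambda item: item[2])


if __name__ == "__main__":
    print(motif_score("CCGGAAGT", ETS1_PWM))
    print(best_site("TTTCCGGAAGTTT", ETS1_PWM))
```

Because positions 3 to 5 of the matrix put all their weight on G, G, and A, any w-mer without that core scores exactly zero under this reading. A known-motif search in the style of slide 15 could rank windows by this score.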
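
The three draws listed on slide 31 can be sketched as a single Gibbs sweep. This is a simplified illustration, not the XPRIME implementation. It assumes the w-mers are treated as independent observations, that each PWM is stored as a list of per-position dictionaries, and that the Dirichlet prior parameters are supplied as pseudo-counts of the same shape. All function and parameter names (`gibbs_sweep`, `prior_counts`, and so on) are hypothetical.

```python
import random

ALPHABET = "ACGT"


def score(wmer, pwm):
    """Product over positions of the probability of the observed base."""
    value = 1.0
    for j, base in enumerate(wmer):
        value *= pwm[j][base]
    return value


def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def gibbs_sweep(wmers, pwms, background, r, a_r, prior_counts, rng):
    """One sweep over the three conditional draws (illustrative only).

    pwms:         list of m motif PWMs, each a list of w dicts base -> prob
    background:   fixed background PWM in the same format
    r:            m+1 mixing weights (motifs first, background last)
    a_r:          m+1 Dirichlet prior parameters for r
    prior_counts: per motif, per position, dict base -> prior parameter
    """
    components = pwms + [background]
    n_comp = len(components)

    # Step 1: draw each indicator with probability proportional to r_M * f_M(y).
    labels = []
    for y in wmers:
        weights = [r[c] * score(y, components[c]) for c in range(n_comp)]
        labels.append(rng.choices(range(n_comp), weights=weights)[0])

    # Step 2: draw r from a Dirichlet with parameters (count in M) + a_M.
    counts = [labels.count(c) for c in range(n_comp)]
    new_r = dirichlet([counts[c] + a_r[c] for c in range(n_comp)], rng)

    # Step 3: redraw each motif column from a Dirichlet whose parameters are
    # the base counts among w-mers assigned to that motif, plus the prior.
    width = len(background)
    new_pwms = []
    for m in range(len(pwms)):
        members = [y for y, lab in zip(wmers, labels) if lab == m]
        pwm = []
        for j in range(width):
            alphas = [
                sum(y[j] == k for y in members) + prior_counts[m][j][k]
                for k in ALPHABET
            ]
            pwm.append(dict(zip(ALPHABET, dirichlet(alphas, rng))))
        new_pwms.append(pwm)

    return labels, new_r, new_pwms
```

In this sketch the background component stays fixed across sweeps, matching the statement on slide 27 that the background density is fixed a priori. A known motif could be encoded by making `prior_counts` large where a database matrix is confident, while flat pseudo-counts leave a motif free to be found de novo.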