TRANSFORMED HIDDEN MARKOV MODEL
original version
hmm(_,[]).
hmm(S,...
BENCHMARKING RESULTS
THE NEXT STEP
Integration at the Prolog engine implementation leve...
CONSTRAINED HMMS
Definition: A constrained HMM (CHMM) is an HMM exte...
CONSTRAINED HMMS
Why extend an HMM with side-constraints?
Convenie...
ALIGNMENT WITH A CONSTRAINED PAIR HMM
In a biological context, we ...
ALIGNMENT WITH CONSTRAINTS
ADDING CONSTRAINT CHECKING TO THE HMM
HMM with constraint checking...
A LIBRARY OF GLOBAL CONSTRAINTS FOR HIDDEN MARKOV MODELS
Our imple...
TABLING ISSUES
Problem: The extra Store argument makes PRISM table...
IMPACT OF USING A SEPARATE CONSTRAINT STORE
STACK
GENOME MODELS
Gene finding in a genomic context
What are the constr...
AN APPLICATION OF CONSTRAINED MARKOV MODELS
We wish to incorporate...
PRUNING STEP AS A CONSTRAINT OPTIMIZATION PROBLEM
CSP formulation
...
COP IMPLEMENTATION WITH MARKOV CHAIN (1)
We propose to use a (cons...
FROM CONFIDENCE SCORES TO TRANSITION PROBABILITIES
P(α1|begin) = σ...
ENCODING CONSTRAINTS WITH CONSTRAINT HANDLING RULES
Constraints: a...
EXPERIMENTAL RESULTS
Prediction on E. coli using simplistic codon ...
A MODEL FOR THE GENOME-WIDE SEQUENCE OF READING FRAMES
We wish to ...
METHODOLOGY
Gene predictions are sorted by stop codon position.
G...
MODEL
F1 . . . F6 Emissions: Finite
set of i symbols δ1 . . . δn
c...
RESULTS
(Figure: ROC curve, Sensitivity (TPR) versus 1 − Specificity (FPR).)
CONCLUSIONS
1. To what extent is it possible to use
probabilistic ...
CONCLUSIONS
Commonly used models for biological analysis can
conve...
THANKS
Efficient Probabilistic Logic Programming for Biological Sequence Analysis

Slides from Ph.D. defense


  1. Efficient Probabilistic Logic Programming for Biological Sequence Analysis. Christian Theil Have, Research group PLIS: Programming, Logic and Intelligent Systems, Department of Communication, Business and Information Technologies, Roskilde University
  2. OUTLINE
      - Introduction: domain; research questions
      - Background: gene finding; probabilistic logic programming
      - A peek into my work: overview of papers; the trouble with tabling of structured data; constrained HMMs; applications: genome models
      - Conclusions
  3. INTRODUCTION
  12. BIOLOGICAL SEQUENCE ANALYSIS: A subfield of bioinformatics. Analyze biological sequences (DNA, RNA, proteins) to understand their features, functions and evolutionary relationships.
  13. PROBABILISTIC LOGIC PROGRAMMING: A declarative programming paradigm with the ability to express common and complex models used in biological sequence analysis:
      - concise expression of complex models
      - separation between logic and control
      - generic inference algorithms
      - transformations
  14. MODELS FOR BIOLOGICAL SEQUENCE ANALYSIS: Models reflect relationships between features of sequence data, embody constraints (assumptions about the data), and are used to infer information from data. Reasoning about uncertainty calls for probabilities.
  15. THE LOST PROJECT: ". . . seeks to improve ease of modeling, accuracy and reliability of sequence analysis by using logic-statistical models . . ." Key focus areas: the PRISM system; prokaryotic gene finding. My Ph.D. project is part of the LoSt project and shares these focus areas.
  16. RESEARCH QUESTIONS
      1. To what extent is it possible to use probabilistic logic programming for biological sequence analysis?
      2. How can constraints relevant to the domain of biological sequence analysis be combined with probabilistic logic programming?
      3. What are the limitations with regard to efficiency, and how can these be dealt with?
      I believe that these are the central questions that need to be addressed in order to construct useful tools for biological sequence analysis using probabilistic logic programming.
  17. RELATIONS BETWEEN RESEARCH QUESTIONS (questions 1–3 as on slide 16)
  18. APPROACH: To build and evaluate applications, abstractions and optimizations for biological sequence analysis using probabilistic logic programming.
  19. APPROACH
      - Applications: deal with relevant biological sequence analysis problems; potentially contribute new knowledge to biology or bioinformatics; direct substantiation with regard to research question 1.
      - Abstractions: ease modeling; a language for incorporating constraints from the domain; a higher level of declarativity, focusing on the problem rather than on implementation (model) details.
      - Optimizations: deal with limitations of probabilistic logic programming that may hinder its use in biological sequence analysis. Efficient inference is a precondition for practical use.
  20. BACKGROUND: Prokaryotic gene finding; probabilistic logic programming.
  21. PROKARYOTIC GENE FINDING: Identify regions of DNA which encode proteins. A (prokaryotic) gene is a consecutive stretch of DNA which
      - is transcribed as part of an RNA,
      - is translated to a complete protein,
      - has a length which is a multiple of three (codons),
      - starts with a “start” codon, and
      - ends with a “stop” codon.
  22. GENES AND OPEN READING FRAMES: The identification of prokaryotic genes may be decomposed into two distinct problems:
      1. Identification of ORFs which contain protein-coding genes.
      2. Identification of the correct start codon within an ORF.
          ORF      ::= start not-stop* stop
          start    ::= TTG | CTG | ATT | ATC | ATA | ATG | GTG
          stop     ::= TAA | TAG | TGA
          not-stop ::= AAA | ... | TTT   // all codons except those in stop
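The ORF grammar can be written almost verbatim as a Prolog DCG. This is a minimal sketch, assuming DNA is given as a list of lower-case nucleotide atoms (a representation chosen here, not taken from the slides); the nonterminal names orf_start, orf_stop, not_stop and not_stops are hypothetical. It only recognizes a complete ORF and does not address the second problem, choosing the correct start codon.

```prolog
% DNA is represented as a list of lower-case nucleotide atoms, e.g. [a,t,g,...].
start_codon([t,t,g]). start_codon([c,t,g]). start_codon([a,t,t]).
start_codon([a,t,c]). start_codon([a,t,a]). start_codon([a,t,g]).
start_codon([g,t,g]).

stop_codon([t,a,a]). stop_codon([t,a,g]). stop_codon([t,g,a]).

nucleotide(N) :- memberchk(N, [a,c,g,t]).

% ORF ::= start not-stop* stop
orf --> orf_start, not_stops, orf_stop.

orf_start --> [A,B,C], { start_codon([A,B,C]) }.
orf_stop  --> [A,B,C], { stop_codon([A,B,C]) }.

% not-stop: any codon that is not a stop codon
not_stop --> [A,B,C],
    { nucleotide(A), nucleotide(B), nucleotide(C), \+ stop_codon([A,B,C]) }.

not_stops --> [].
not_stops --> not_stop, not_stops.
```

For example, `phrase(orf, [a,t,g,a,a,a,t,a,g])` (ATG AAA TAG) succeeds, whereas a sequence with an in-frame stop codon before the final one, or with a length that is not a multiple of three, is rejected.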
  23. SIGNALS FOR PROKARYOTIC GENE FINDING: open reading frames; length; nucleotide sequence composition; conservation (sequence similarity in other organisms); local context: promoters, ribosomal binding site, termination signal.
      (Figure: local context of a gene, with positions relative to the transcription start site (tss, +1): GB and −35, PB and −10, SD ≈ +10, gene ≈ +15–20, terminator.)
  24. READING FRAMES AND OVERLAPPING GENES: RNA can be transcribed from either strand, and genes may start in different “reading frames”. Genes can overlap, both in the same and in different reading frames, including on opposite strands.
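To make the six reading frames concrete, here is a minimal Prolog sketch, assuming the same list-of-atoms representation of DNA. The predicate names (revcomp/2, codons/2, reading_frame/3) and the numbering of frames 1–3 on the given strand and 4–6 on the reverse complement are hypothetical conventions chosen for illustration.

```prolog
:- use_module(library(apply)).   % maplist/3
:- use_module(library(lists)).   % append/3, reverse/2

complement(a, t). complement(t, a).
complement(c, g). complement(g, c).

% revcomp(+Seq, -RC): reverse complement, i.e. the opposite strand read 5' to 3'
revcomp(Seq, RC) :-
    maplist(complement, Seq, Comp),
    reverse(Comp, RC).

% codons(+Seq, -Codons): consecutive triplets; an incomplete tail is dropped
codons([A,B,C|Rest], [[A,B,C]|Codons]) :- !, codons(Rest, Codons).
codons(_, []).

% reading_frame(+Seq, ?Frame, -Codons)
%   Frames 1-3: offsets 0-2 on the given strand.
%   Frames 4-6: offsets 0-2 on the reverse complement.
reading_frame(Seq, Frame, Codons) :-
    between(1, 6, Frame),
    (   Frame =< 3
    ->  Strand = Seq, Offset is Frame - 1
    ;   revcomp(Seq, Strand), Offset is Frame - 4
    ),
    length(Skip, Offset),
    append(Skip, Rest, Strand),
    codons(Rest, Codons).
```

For ATGAAATAG, frame 1 reads ATG AAA TAG, while frame 4 reads the reverse complement CTATTTCAT as CTA TTT CAT.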
  25. PROBABILISTIC LOGIC PROGRAMMING: Logic programming and Prolog; probabilistic logic programming and PRISM.
  26. LOGIC PROGRAMMING AND PROLOG: A Prolog program consists of a finite sequence of rules, B :- A1, ..., An. These rules define implications, i.e., B if A1 and ... and An.
  27. TERMS, LITERALS AND VARIABLES: Literals can consist of (possibly) structured terms that may include variables:
          number(0).
          number(s(X)) :- number(X).
      Here number(0) is a fact, 0 is a constant, s(X) is a structured term, and X is a variable.
  32. RESOLUTION: Problems are stated as theorems (goals) to be proved, e.g., number(X). To prove a goal, we use a rule whose consequent matches it and recursively prove that rule's antecedents in the same way.
          number(0).
          number(s(X)) :- number(X).
      Solutions and their derivations (X′ and X″ are renamed copies of the clause variable):
          X = 0:        number(X) → number(0)
          X = s(0):     number(X) → number(s(X′)) → number(s(0))
          X = s(s(0)):  number(X) → number(s(X′)) → number(s(s(X″))) → number(s(s(0)))
          . . .
  42. DERIVATIONS AND EXPLANATION GRAPHS: Consider the following program, which adds natural numbers:
          add(0+0,0).
          add(A+s(B),s(C)) :- add(A+B,C).
          add(s(A)+B,s(C)) :- add(A+B,C).
      Suppose we call the goal add(s(s(0))+s(s(0)),R). Two alternative clauses are applicable, resulting in either add(s(0)+s(s(0)),C) or add(s(s(0))+s(0),C), where R = s(C).
  43. DERIVATION TREE (each node is a subgoal A+B of add/2; its children are the subgoals produced by the applicable recursive clauses):
          s(s(0))+s(s(0))
            s(0)+s(s(0))
              0+s(s(0))
                0+s(0)
                  0+0
              s(0)+s(0)
                0+s(0)
                  0+0
                s(0)+0
                  0+0
            s(s(0))+s(0)
              s(0)+s(0)
                0+s(0)
                  0+0
                s(0)+0
                  0+0
              s(s(0))+0
                s(0)+0
                  0+0
      The tree is exponential in the size of the arguments: identical subgoals such as s(0)+s(0) are derived over and over again.
  45. EXPLANATION GRAPH: When identical subgoals are shared, the derivation tree collapses into an explanation graph of polynomial size¹.
      ¹ O(n·m), but it would be O(n + m) if the arguments were ordered by size.
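The gap between the derivation tree and the explanation graph can be checked by counting. In this sketch (hypothetical helper names tree_size/3 and graph_size/3, with natural numbers written as integers rather than s/1 terms), a subgoal a+b has one child per applicable recursive clause: one when a > 0 and one when b > 0. In the graph each distinct subgoal i+j, with 0 ≤ i ≤ a and 0 ≤ j ≤ b, appears exactly once.

```prolog
% tree_size(+A, +B, -N): number of nodes in the derivation tree of add/2
% for the goal A+B (same branching as the Peano program, counted on integers).
tree_size(A, B, N) :-
    (   A > 0 -> A1 is A - 1, tree_size(A1, B, NA) ; NA = 0 ),   % clause add(s(A)+B, ...)
    (   B > 0 -> B1 is B - 1, tree_size(A, B1, NB) ; NB = 0 ),   % clause add(A+s(B), ...)
    N is 1 + NA + NB.

% graph_size(+A, +B, -N): number of distinct subgoals i+j, i.e. the nodes of
% the explanation graph when identical subgoals are shared.
graph_size(A, B, N) :-
    N is (A + 1) * (B + 1).
```

For the goal from slide 42 (2+2) this gives 19 tree nodes, matching the tree on slide 43, against 9 graph nodes; for 3+3 it is already 69 against 16.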
  46. TABLING
      Idea: The system maintains a table of calls and their answers. When a new call is entered, it checks whether the call is already stored in the table; if so, the previously found answers are reused.
      Consequences: an explanation-graph representation of the derivations, and a significant speed-up of program execution.
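The effect of tabling on the add/2 program can be observed in a Prolog system that supports a table directive, for example SWI-Prolog (assumed here). The sketch keeps an untabled copy and a tabled copy, tadd/2 (a hypothetical name), of the same three clauses. Without tabling, the goal from slide 42 succeeds once per 0+0 leaf of the derivation tree; with tabling, the identical answer is stored in the table and returned only once.

```prolog
% Untabled: every derivation is explored separately.
add(0+0, 0).
add(A+s(B), s(C)) :- add(A+B, C).
add(s(A)+B, s(C)) :- add(A+B, C).

% Tabled copy of the same program: calls and answers are recorded,
% so each distinct subgoal is solved once and duplicate answers are dropped.
:- table tadd/2.
tadd(0+0, 0).
tadd(A+s(B), s(C)) :- tadd(A+B, C).
tadd(s(A)+B, s(C)) :- tadd(A+B, C).
```

Collecting all answers to the goal s(s(0))+s(s(0)) yields six copies of s(s(s(s(0)))) from add/2, but a single one from tadd/2.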
  47. PROBABILISTIC LOGIC PROGRAMMING: Probabilistic logic programming is a form of logic programming which deals with uncertainty by assigning a probability to each possible derivation in a logic program. This enables probabilistic inference, e.g.:
      - derive the probability of a goal,
      - infer the most probable derivation of a goal,
      - learn the affinities for different derivations from data.
  48. PRISM: PRogramming In Statistical Modelling is a framework for probabilistic logic programming, developed by collaboration partners of the LoSt project: Yoshitaka Kameya, Taisuke Sato and Neng-Fa Zhou. It is an extension of Prolog with random variables, called MSWs (multi-valued switches), and provides efficient generalized inference algorithms. A PRISM program is a probabilistic model.
  49. HIDDEN MARKOV MODEL EXAMPLE: A postcard, serving as the observation sequence:
      "Greetings from wherever, where I am having a great time. Here is what I have been doing: The first two days, I stayed at the hotel reading a good book. Then, on the third day I decided to go shopping. The next three days I did nothing but lie on the beach. On my last day, I went shopping for some gifts to bring home and wrote you this postcard. Sincerely, Some friend of yours"
  50. HIDDEN MARKOV MODEL RUN
      Definition: A run of an HMM is a pair consisting of a sequence of states s(0) s(1) ... s(n), called a path, and a corresponding sequence of emissions e(1) ... e(n), called an observation, such that
      - s(0) = s0 (the initial state);
      - ∀i, 0 ≤ i ≤ n − 1: p(s(i); s(i+1)) > 0 (the probability to transit from s(i) to s(i+1));
      - ∀i, 0 < i ≤ n: p(s(i); e(i)) > 0 (the probability to emit e(i) from s(i)).
      Definition: The probability of such a run is defined as
      $\prod_{i=1}^{n} p(s(i-1); s(i)) \cdot p(s(i); e(i))$
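The probability of a run can be computed directly from this definition. A minimal sketch in plain Prolog, assuming hypothetical transition and emission tables for the two weather states and three activities of the postcard example (the numbers are invented for illustration): trans/3 and emit/3 hold the probabilities, and a missing fact stands for probability 0, so run_prob/3 fails for pairs that are not runs.

```prolog
% Hypothetical parameters (invented for illustration).
trans(start, sunny, 0.6). trans(start, rainy, 0.4).
trans(sunny, sunny, 0.7). trans(sunny, rainy, 0.3).
trans(rainy, sunny, 0.4). trans(rainy, rainy, 0.6).
emit(sunny, beach, 0.6). emit(sunny, shop, 0.3). emit(sunny, read, 0.1).
emit(rainy, beach, 0.1). emit(rainy, shop, 0.4). emit(rainy, read, 0.5).

% run_prob(+Path, +Obs, -P)
%   Path = [s(0), s(1), ..., s(n)] with s(0) the initial state,
%   Obs  = [e(1), ..., e(n)].
%   P is the product of p(s(i-1); s(i)) * p(s(i); e(i)) for i = 1..n.
run_prob([_], [], 1.0).
run_prob([S0, S1|Ss], [E1|Es], P) :-
    trans(S0, S1, PT),
    emit(S1, E1, PE),
    run_prob([S1|Ss], Es, PRest),
    P is PT * PE * PRest.
```

With these numbers, the path start, rainy, sunny with observation read, beach has probability 0.4 · 0.5 · 0.4 · 0.6 = 0.048.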
  51. DECODING WITH HIDDEN MARKOV MODELS: Infer the hidden path given the observation sequence,
      $\arg\max_{\mathit{path}} P(\mathit{path} \mid \mathit{observation})$
      which is computed by the Viterbi algorithm. (Figure source: Wikipedia.)
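The Viterbi algorithm can be sketched in plain Prolog, reusing hypothetical parameters (invented numbers, not from the slides). For each observation the sketch keeps, per state, only the most probable partial path ending in that state; viterbi/3, viterbi_steps/3 and best_into/4 are hypothetical names. PRISM offers Viterbi inference for arbitrary PRISM programs; this hand-written version only illustrates the computation for one HMM.

```prolog
:- use_module(library(lists)).   % member/2, reverse/2, max_member/2

% Hypothetical parameters (invented for illustration).
trans(start, sunny, 0.6). trans(start, rainy, 0.4).
trans(sunny, sunny, 0.7). trans(sunny, rainy, 0.3).
trans(rainy, sunny, 0.4). trans(rainy, rainy, 0.6).
emit(sunny, beach, 0.6). emit(sunny, shop, 0.3). emit(sunny, read, 0.1).
emit(rainy, beach, 0.1). emit(rainy, shop, 0.4). emit(rainy, read, 0.5).

states([sunny, rainy]).

% viterbi(+Obs, -Path, -P): most probable hidden path (initial state omitted)
% for a non-empty observation list, together with its probability.
viterbi([E|Es], Path, P) :-
    states(Ss),
    findall(P0-[S], ( member(S, Ss), trans(start, S, PT), emit(S, E, PE),
                      P0 is PT * PE ), Init),
    viterbi_steps(Es, Init, Final),
    max_member(P-RevPath, Final),
    reverse(RevPath, Path).

% Candidates are Prob-RevPath pairs; the head of RevPath is the last state.
viterbi_steps([], Cands, Cands).
viterbi_steps([E|Es], Cands, Final) :-
    states(Ss),
    findall(Best, ( member(S, Ss), best_into(S, E, Cands, Best) ), Next),
    viterbi_steps(Es, Next, Final).

% best_into(+S, +E, +Cands, -Best): best way to extend a candidate into state S emitting E
best_into(S, E, Cands, P-[S|Rev]) :-
    emit(S, E, PE),
    findall(P1-Rev0, ( member(P0-Rev0, Cands), Rev0 = [Prev|_],
                       trans(Prev, S, PT), P1 is P0 * PT * PE ), Options),
    max_member(P-Rev, Options).
```

For the observation read, beach the sketch returns the path rainy, sunny with probability 0.048.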
  52. EXAMPLE HMM IN PRISM: values/2 declares the outcomes of random variables; msw/2 simulates a random variable, stochastically selecting one of its outcomes; the model itself is written in Prolog and specifies the relations between the variables.
          % outcomes of the transition and emission switches, for every state
          values(trans(_), [sunny,rainy]).
          values(emit(_), [shop,beach,read]).
          % hmm(L): L is a sequence of run_length/1 emissions, starting in state start
          hmm(L) :- run_length(T), hmm(T,start,L).
          hmm(0,_,[]).
          hmm(T,State,[Emit|EmitRest]) :-
              T > 0,
              msw(trans(State),NextState),    % choose the next state
              msw(emit(NextState),Emit),      % emit a symbol from that state
              T1 is T-1,
              hmm(T1,NextState,EmitRest).
          run_length(7).                      % seven days, as on the postcard
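For comparison, the following sketch mirrors the PRISM model in plain Prolog, replacing each msw/2 with a lookup in hypothetical probability tables (invented numbers), and computes the probability of an observation sequence by summing the probabilities of all runs that produce it. The helper names hmm_run/3 and hmm_prob/2 are hypothetical. This brute-force enumeration is exponential in the sequence length; PRISM instead computes such probabilities by dynamic programming over tabled explanation graphs.

```prolog
:- use_module(library(lists)).   % sum_list/2

% Hypothetical parameters (invented for illustration), one entry per msw/2 outcome.
trans(start, sunny, 0.6). trans(start, rainy, 0.4).
trans(sunny, sunny, 0.7). trans(sunny, rainy, 0.3).
trans(rainy, sunny, 0.4). trans(rainy, rainy, 0.6).
emit(sunny, beach, 0.6). emit(sunny, shop, 0.3). emit(sunny, read, 0.1).
emit(rainy, beach, 0.1). emit(rainy, shop, 0.4). emit(rainy, read, 0.5).

% hmm_run(+State, +Obs, -P): one run from State emitting Obs, with its probability.
% Like hmm/3 in the PRISM model: transit to the next state, then emit from it.
hmm_run(_, [], 1.0).
hmm_run(State, [Emit|EmitRest], P) :-
    trans(State, Next, PT),
    emit(Next, Emit, PE),
    hmm_run(Next, EmitRest, PRest),
    P is PT * PE * PRest.

% hmm_prob(+Obs, -P): probability of the observation, summed over all runs.
hmm_prob(Obs, P) :-
    findall(PR, hmm_run(start, Obs, PR), Ps),
    sum_list(Ps, P).
```

For the observation read, beach the four runs contribute 0.0252, 0.0018, 0.048 and 0.012, giving 0.087.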
  55. A PEEK INTO MY WORK: Overview of papers. A few selected cases:
      - An abstraction: constrained HMMs (also an optimization)
      - An optimization: regarding tabling of structured data
      - A couple of applications (genome models): using constrained probabilistic models for gene finding with overlapping genes; gene finding with a probabilistic model for the genome-wide sequence of reading frames
  56. PAPERS 1
      1. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit. Taming the Zoo of Discrete HMM Subspecies & some of their Relatives. Frontiers in Artificial Intelligence and Applications, 2011.
      2. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit. Inference with constrained hidden Markov models in PRISM. Theory and Practice of Logic Programming, 2010.
      3. Christian Theil Have. Constraints and Global Optimization for Gene Prediction Overlap Resolution. Workshop on Constraint Based Methods for Bioinformatics, 2011.
  57. PAPERS 2
      4. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit. The Viterbi Algorithm expressed in Constraint Handling Rules. 7th International Workshop on Constraint Handling Rules, 2010.
      5. Christian Theil Have and Henning Christiansen. Modeling Repeats in DNA Using Probabilistic Extended Regular Expressions. Frontiers in Artificial Intelligence and Applications, 2011.
      6. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit. Bayesian Annotation Networks for Complex Sequence Analysis. Technical Communications of the 27th International Conference on Logic Programming (ICLP’11).
  58. PAPERS 3
      7. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit. A declarative pipeline language for big data analysis. Presented at LOPSTR, 2012.
      8. Christian Theil Have and Henning Christiansen. Efficient Tabling of Structured Data Using Indexing and Program Transformation. Practical Aspects of Declarative Languages, 2012.
      9. Neng-Fa Zhou and Christian Theil Have. Efficient tabling of structured data with enhanced hash-consing. Theory and Practice of Logic Programming, 2012.
  59. PAPERS 4
      10. Christian Theil Have and Søren Mørk. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model. Submitted to PLOS One, 2012.
      11. Christian Theil Have, Sine Zambach and Henning Christiansen. Effects of using Coding Potential, Sequence Conservation and mRNA Structure Conservation for Predicting Pyrrolysine Containing Genes. Submitted to BMC Bioinformatics, 2012.
THE TROUBLE WITH TABLING OF STRUCTURED DATA
An innocent-looking predicate, last/2, which traverses a list to find its last element:
    last([X],X).
    last([_|L],X) :- last(L,X).
Time/space complexity: O(n).
If we table last/2, every recursive call is stored in the call table together with a copy of its list argument, so the table holds n + (n − 1) + (n − 2) + … + 1 ≈ O(n²) list cells!
    call: last([1,2,3,4,5],X)
    calls made, each recorded as a separate call-table entry:
    last([1,2,3,4,5],X).
    last([1,2,3,4],X).
    last([1,2,3],X).
    last([1,2],X).
    last([1],X).
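Assuming each tabled call keeps its own copy of the remaining list, the total size of the call table for a list of length n is an arithmetic series:

```latex
\sum_{k=1}^{n} k \;=\; n + (n-1) + \dots + 1 \;=\; \frac{n(n+1)}{2} \;=\; O(n^2)
```

For the n = 5 example this is 15 list cells; for a sequence of length n = 5000 it is already 12,502,500.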
A WORKAROUND IMPLEMENTED IN PROLOG
We describe a workaround that gives O(1) time and space complexity for table lookups in programs with arbitrarily large ground structured data as input arguments.
A term is represented as a set of facts.
A subterm is referenced by a unique integer serving as an abstract pointer.
Matching related to tabling is done solely by comparing such pointers.
AN ABSTRACT DATA TYPE
store_term(+ground-term, -pointer)
    The ground-term is any ground term, and the pointer returned is a unique reference (an integer) for that term.
retrieve_term(+pointer, ?functor, ?arg-pointers-list)
    Returns the functor and a list of pointers to the representations of the subterms of the term referenced by pointer.
ADT EXAMPLE
The following call converts the term f(a,g(b)) into its internal representation and returns a pointer value in the variable P:
    store_term(f(a,g(b)),P).
An implementation based on assert could produce, e.g., the facts:
    retrieve_term(100,f,[101,102]).
    retrieve_term(101,a,[]).
    retrieve_term(102,g,[103]).
    retrieve_term(103,b,[]).
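A minimal sketch of an assert-based store_term/2 in plain Prolog, assuming consecutive integer pointers starting at 100 (chosen only so that it reproduces the facts in the example). The helpers next_pointer/1, fresh_pointer/1 and store_args/2 are hypothetical names, not taken from the original implementation, and this sketch stores repeated subterms separately rather than sharing them.

```prolog
% Sketch of the store_term/2 + retrieve_term/3 abstract data type using assert.
% Assumption: pointers are consecutive integers starting at 100.
% next_pointer/1, fresh_pointer/1 and store_args/2 are hypothetical helpers.
:- dynamic retrieve_term/3.
:- dynamic next_pointer/1.

next_pointer(100).

% store_term(+GroundTerm, -Pointer): record GroundTerm as retrieve_term/3
% facts and return the pointer to its root node.
store_term(Term, Pointer) :-
    ground(Term),
    fresh_pointer(Pointer),           % allocate the root before its children
    Term =.. [Functor|Args],
    store_args(Args, ArgPointers),
    assertz(retrieve_term(Pointer, Functor, ArgPointers)).

% store_args(+Args, -Pointers): store each argument, collecting its pointer.
store_args([], []).
store_args([Arg|Args], [P|Ps]) :-
    store_term(Arg, P),
    store_args(Args, Ps).

% fresh_pointer(-P): take the next free integer and advance the counter.
fresh_pointer(P) :-
    retract(next_pointer(P)),
    P1 is P + 1,
    assertz(next_pointer(P1)).
```

Calling store_term(f(a,g(b)),P) on a fresh database binds P to 100 and asserts exactly the four retrieve_term/3 facts listed in the example, because the root is numbered before its arguments (preorder).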
AN AUTOMATIC PROGRAM TRANSFORMATION
We introduce an automatic transformation: structured terms are moved from the heads of clauses into calls to retrieve_term/2 in the body.
TRANSFORMED HIDDEN MARKOV MODEL
Original version:
    hmm(_,[]).
    hmm(S,[Ob|Y]) :-
        msw(out(S),Ob),
        msw(tr(S),Next),
        hmm(Next,Y).
Transformed version:
    hmm(_,ObsPtr) :-
        retrieve_term(ObsPtr,[]).
    hmm(S,ObsPtr) :-
        retrieve_term(ObsPtr,[Ob,Y]),
        msw(out(S),Ob),
        msw(tr(S),Next),
        hmm(Next,Y).
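The same idea applied by hand to last/2 can be run in SWI-Prolog without PRISM. This is a sketch under stated assumptions: it is written against the three-argument retrieve_term/3 ADT (so it differs in detail from the two-argument calls above), list cells are assumed to be recorded under the functor name '.', the facts for [1,2,3] are hand-written in the style of the ADT example, and last_ptr/2 is a hypothetical name.

```prolog
% Pointer-based, tabled version of last/2 (hypothetical name last_ptr/2).
% Assumptions: SWI-Prolog tabling; list cells stored under the functor '.';
% the facts below are a hand-written representation of the list [1,2,3].
:- table last_ptr/2.

retrieve_term(100, '.', [101,102]).   % [1|...]
retrieve_term(101, 1,   []).
retrieve_term(102, '.', [103,104]).   % [2|...]
retrieve_term(103, 2,   []).
retrieve_term(104, '.', [105,106]).   % [3|...]
retrieve_term(105, 3,   []).
retrieve_term(106, [],  []).          % []

% Corresponds to last([X],X): a cell whose tail is the empty list.
last_ptr(ListPtr, X) :-
    retrieve_term(ListPtr, '.', [XPtr, NilPtr]),
    retrieve_term(NilPtr, [], []),
    retrieve_term(XPtr, X, []).       % elements are assumed to be atomic
% Corresponds to last([_|L],X) :- last(L,X).
last_ptr(ListPtr, X) :-
    retrieve_term(ListPtr, '.', [_, TailPtr]),
    last_ptr(TailPtr, X).
```

The query last_ptr(100,X) yields X = 3, and the tabled calls are last_ptr(100,_), last_ptr(102,_), last_ptr(104,_) and last_ptr(106,_): each call-table key is a single integer, so a table entry no longer grows with the length of the remaining list.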
BENCHMARKING RESULTS
[Figure: running time (seconds) versus sequence length (0 to 5000). Panel a) running time with indexed lookup (y-axis 0 to 0.08 s); panel b) running time without indexed lookup (y-axis 0 to 140 s).]
THE NEXT STEP
Integration at the Prolog engine implementation level:
Neng-Fa Zhou and Christian Theil Have. Efficient tabling of structured data with enhanced hash-consing. Theory and Practice of Logic Programming, 2012.
Full sharing between tables (call and answer)
Sharing with structured data in the call stack
CONSTRAINED HMMS
Definition: a constrained HMM (CHMM) is an HMM extended with a set of constraints C, each of which is a mapping from HMM runs into {true, false}.
CONSTRAINED HMMS
Why extend an HMM with side-constraints?
It is convenient to express knowledge in terms of constraints.
The underlying model can be reused with different assumptions.
Some constraints are not feasible to express as model structure (e.g. all_different).
There are fewer paths to consider for any given sequence → decreased running time (under certain conditions).
ALIGNMENT WITH A CONSTRAINED PAIR HMM
In a biological context, we may want to consider only alignments with a limited number of insertions and deletions, given the assumption that the two sequences are closely related.
ALIGNMENT WITH CONSTRAINTS
ADDING CONSTRAINT CHECKING TO THE HMM
HMM with constraint checking:
    hmm(T,State,[Emit|EmitRest],StoreIn) :-
        T > 0,
        msw(trans(State),NxtState),
        msw(emit(NxtState),Emit),
        check_constraints([NxtState,Emit],StoreIn,StoreOut),
        T1 is T-1,
        hmm(T1,NxtState,EmitRest,StoreOut).
A call to check_constraints/3 follows each distinct sequence of msw applications.
Side-constraints: the constraints are assumed to be declared elsewhere and not interleaved with the model specification.
An extra Store argument is added to the probabilistic predicate.
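To show what check_constraints/3 might do at each step, here is a PRISM-free sketch for the constrained pair-HMM alignment: at most a fixed number of steps may be in an insert or delete state, and the store is just a counter. The limit (2), the state names match/insert/delete, and the helpers max_indels/1, indel_state/1 and check_run/2 are hypothetical.

```prolog
% Hypothetical side-constraint: at most max_indels/1 alignment steps may be
% in an insert or delete state. The constraint store is a single counter.
max_indels(2).

indel_state(insert).
indel_state(delete).

% check_constraints(+[State,Emit], +StoreIn, -StoreOut)
% Called once per time step; fails as soon as the limit is exceeded,
% which prunes the current partial run.
check_constraints([State, _Emit], StoreIn, StoreOut) :-
    (   indel_state(State)
    ->  StoreOut is StoreIn + 1,
        max_indels(Max),
        StoreOut =< Max
    ;   StoreOut = StoreIn
    ).

% check_run(+Steps, +StoreIn): thread the store through a whole run,
% mimicking the per-step calls made by hmm/4.
check_run([], _).
check_run([Step|Steps], StoreIn) :-
    check_constraints(Step, StoreIn, StoreOut),
    check_run(Steps, StoreOut).
```

Because the check runs incrementally, a run is rejected at the first step that exceeds the limit rather than after the whole sequence has been generated.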
A LIBRARY OF GLOBAL CONSTRAINTS FOR HIDDEN MARKOV MODELS
Our implementation contains a few well-known global constraints adapted to hidden Markov models.
Global constraints: cardinality, lock_to_sequence, all_different, lock_to_set
In addition, the implementation provides operators that can be used to apply constraints to a limited set of variables.
Constraint operators: state_specific, emission_specific, forall_subseq (sliding-window operator), for_range (time-step range operator)
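Following the definition of a constraint as a mapping from runs to {true, false}, the sketch below gives whole-run checks for two of the global constraints and for the sliding-window operator. These are simplified, hypothetical definitions for illustration (a run is assumed to be a list of State-Emission pairs); they are not the library's own code and are not incremental.

```prolog
% Simplified whole-run versions of two global constraints and the
% sliding-window operator. A run is a list of State-Emission pairs.
:- use_module(library(lists)).

% all_different(+Run): no state occurs twice in the run.
all_different(Run) :-
    findall(S, member(S-_, Run), States),
    sort(States, Distinct),          % sort/2 removes duplicates
    length(States, N),
    length(Distinct, N).

% cardinality(+State, +Min, +Max, +Run): State is visited Min..Max times.
cardinality(State, Min, Max, Run) :-
    findall(x, member(State-_, Run), Hits),
    length(Hits, N),
    N >= Min,
    N =< Max.

% forall_subseq(+Len, :Constraint, +Run): Constraint holds for every
% contiguous window of Len steps of the run.
forall_subseq(Len, Constraint, Run) :-
    forall(( append(_, Suffix, Run),
             length(Window, Len),
             append(Window, _, Suffix) ),
           call(Constraint, Window)).
```

For example, forall_subseq(2, all_different, [s1-a, s2-b, s1-c]) succeeds because no state repeats within any window of two steps, even though s1 occurs twice in the whole run.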
TABLING ISSUES
Problem: the extra Store argument makes PRISM table multiple goals (one for each constraint store) where it should store only one:
    hmm(T,State,[Emit|EmitRest],Store)
To get rid of the extra argument, check_constraints dynamically maintains the store as a stack using assert/retract.
Note: this is not a sound solution for all types of constraints (some need tabling).
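One way to keep the store outside the tabled predicate is a stack of facts that is pushed on each step and popped again on backtracking. The sketch below uses the standard "assert, and retract on backtracking" idiom; store/1, current_store/1, push_store/1, the one-argument check_constraints/1 and the limit of two insert/delete steps are hypothetical.

```prolog
% The constraint store as a stack of store/1 facts; the top fact is current.
% All names and the indel limit of 2 are hypothetical.
:- dynamic store/1.
store(0).                        % initial store: no insert/delete steps yet

% current_store(-S): the store on top of the stack.
current_store(S) :-
    store(S), !.

% push_store(+S): make S the current store; on backtracking, pop it again
% so that alternative paths see the store as it was before this step.
push_store(S) :-
    (   asserta(store(S))
    ;   retract(store(S)),
        fail
    ).

% check_constraints(+[State,Emit]): no Store arguments, so the HMM predicate
% can drop its Store argument and be tabled once per goal.
check_constraints([State, _Emit]) :-
    current_store(In),
    (   memberchk(State, [insert, delete])
    ->  Out is In + 1,
        Out =< 2
    ;   Out = In
    ),
    push_store(Out).
```

The pop happens only when execution backtracks into push_store/1; a cut that removes that choice point leaves the pushed fact on the stack. This is one concrete reason why the approach needs care and, as noted above, is not sound for all constraint types.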
IMPACT OF USING A SEPARATE CONSTRAINT STORE STACK
GENOME MODELS
Gene finding in a genomic context: what are the constraints between adjacent genes in the genome?
Extent of (possible) overlap: modeled as hard constraints.
Gene reading frames, shaped by, e.g., leading-strand bias and operons: modeled as (probabilistic) soft constraints.
AN APPLICATION OF CONSTRAINED MARKOV MODELS
We wish to incorporate overlapping-gene constraints into gene finding.
A divide-and-conquer, two-step approach to gene finding:
1. Gene prediction: a gene finder supplies a set of candidate predictions $p_1, \dots, p_n$, called the initial set.
2. Pruning: the initial set is pruned according to certain rules or constraints. We call the pruned set the final set.
PRUNING STEP AS A CONSTRAINT OPTIMIZATION PROBLEM
CSP formulation:
We introduce variables $X = x_1, \dots, x_n$ corresponding to the predictions $p_1, \dots, p_n$ in the initial set (sorted by position in the genome).
All variables have boolean domains, $\forall x_i \in X: D(x_i) = \{true, false\}$, and $x_i = true \Leftrightarrow p_i \in$ final set.
There are multiple solutions, and we want the "best" one, optimizing for prediction confidence scores: a Constraint Optimization Problem (COP).
COP formulation:
Let the scores of $p_1, \dots, p_n$ be $s_1, \dots, s_n$ with $s_i \in \mathbb{R}^+$. Maximize $\sum_{i:\, x_i = true} s_i$, subject to C.
COP IMPLEMENTATION WITH MARKOV CHAIN (1)
We propose to use a (constrained) Markov chain for the COP.
The Markov chain has a begin state, an end state and two states for each variable $x_i$, one for each value in its boolean domain $D(x_i)$. The state corresponding to $x_i = true$ is denoted $\alpha_i$ and the state corresponding to $x_i = false$ is denoted $\beta_i$.
In this model, a path from the begin state to the end state corresponds to a potential solution of the CSP.
FROM CONFIDENCE SCORES TO TRANSITION PROBABILITIES
$$P(\alpha_1 \mid begin) = \sigma_1, \qquad P(\beta_1 \mid begin) = 1 - \sigma_1, \qquad P(end \mid \alpha_n) = P(end \mid \beta_n) = 1$$
$$P(\alpha_i \mid \alpha_{i-1}) = P(\alpha_i \mid \beta_{i-1}) = \sigma_i, \qquad P(\beta_i \mid \alpha_{i-1}) = P(\beta_i \mid \beta_{i-1}) = 1 - \sigma_i$$
$$\sigma_i = 0.5 + \lambda + (0.5 - \lambda) \times \frac{s_i - \min(s_1, \dots, s_n)}{\max(s_1, \dots, s_n) - \min(s_1, \dots, s_n)}$$
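The score-to-probability mapping for $\sigma_i$ can be computed directly. This sketch assumes that the scores are not all equal (otherwise the denominator is zero) and uses hypothetical predicate names sigmas/3 and sigma/5; Lambda stands for $\lambda$.

```prolog
% Map confidence scores s_1..s_n to sigma_i following the formula for sigma_i.
% Assumes max(s) > min(s); sigmas/3 and sigma/5 are hypothetical names.
:- use_module(library(lists)).
:- use_module(library(apply)).

sigmas(Lambda, Scores, Sigmas) :-
    min_list(Scores, Min),
    max_list(Scores, Max),
    Range is Max - Min,
    Range > 0,
    maplist(sigma(Lambda, Min, Range), Scores, Sigmas).

% sigma(+Lambda, +Min, +Range, +S, -Sigma)
sigma(Lambda, Min, Range, S, Sigma) :-
    Sigma is 0.5 + Lambda + (0.5 - Lambda) * (S - Min) / Range.
```

With $\lambda = 0.1$ and scores [1,2,3], the sigmas are approximately 0.6, 0.8 and 1.0: the lowest-scoring prediction gets $0.5 + \lambda$ and the highest gets 1, so every $\alpha_i$ is at least as likely as $\beta_i$ when $\lambda \ge 0$ and only the constraints can force predictions out of the final set.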
ENCODING CONSTRAINTS WITH CONSTRAINT HANDLING RULES
Constraints: alpha/2 and beta/2 ≈ visited states.
Example: Genemark inconsistency rules
    alpha(Left1,Right1), alpha(Left2,Right2) <=>
        Left1 =< Left2, Right1 >= Right2 | fail.
    beta(Left1,Right1), alpha(Left2,Right2) <=>
        Left1 =< Left2, Right1 >= Right2 | fail.
The most probable consistent path is found using PRISM's generic adaptation of the Viterbi algorithm.
Each step adds either an alpha or a beta (active) constraint.
Incremental pruning: at each step we only apply constraints that may be transitively involved in rules with the active constraint.
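The two Genemark rules can be tried out with SWI-Prolog's CHR library. In this sketch an exhaustive search over all $2^n$ alpha/beta paths stands in for PRISM's Viterbi, so it only illustrates the COP on tiny inputs; predictions are assumed to be Left-Right pairs sorted by position, and path/4 and best_path/4 are hypothetical names.

```prolog
% Genemark inconsistency rules in SWI-Prolog CHR, plus an exhaustive search
% standing in for PRISM's Viterbi (exponential; tiny inputs only).
% path/4 and best_path/4 are hypothetical names.
:- use_module(library(chr)).
:- use_module(library(lists)).
:- chr_constraint alpha/2, beta/2.

alpha(Left1,Right1), alpha(Left2,Right2) <=>
    Left1 =< Left2, Right1 >= Right2 | fail.
beta(Left1,Right1), alpha(Left2,Right2) <=>
    Left1 =< Left2, Right1 >= Right2 | fail.

% path(+Predictions, +Sigmas, -Choices, -Prob): one begin-to-end path; each
% step adds an alpha (selected) or beta (rejected) constraint to the store.
path([], [], [], 1.0).
path([L-R|Ps], [S|Ss], [alpha|Cs], P) :-
    alpha(L, R),
    path(Ps, Ss, Cs, P0),
    P is P0 * S.
path([L-R|Ps], [S|Ss], [beta|Cs], P) :-
    beta(L, R),
    path(Ps, Ss, Cs, P0),
    P is P0 * (1 - S).

% best_path(+Predictions, +Sigmas, -Choices, -Prob): most probable
% consistent path (findall/3 backtracks, so the CHR store is reset).
best_path(Preds, Sigmas, Choices, Prob) :-
    findall(P-Cs, path(Preds, Sigmas, Cs, P), Pairs),
    max_member(Prob-Choices, Pairs).
```

With predictions [1-100, 10-50] and sigmas [0.9, 0.8], every path that selects the contained prediction 10-50 is rejected by one of the rules, so the best consistent path is [alpha, beta] with probability about 0.18.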
EXPERIMENTAL RESULTS
Prediction on E. coli using a simplistic codon-frequency-based gene finder. Pruning using our global optimization approach (with all inconsistency rules) versus local heuristic rules (see note).

| Method | #predictions | Sensitivity | Specificity | Time (seconds) |
|---|---|---|---|---|
| initial set | 10799 | 0.7625 | 0.2926 | n/a |
| Genemark rules | 5823 | 0.7558 | 0.5379 | 1.4 |
| ECOGENE rules | 4981 | 0.7148 | 0.5947 | 1.7 |
| global optimization | 5222 | 0.7201 | 0.5714 | 75 |

Sensitivity = fraction of known reference genes that are predicted.
Specificity = fraction of predicted genes that are correct.
Note: the results for the ECOGENE heuristic may vary depending on execution strategy; in the results above, predictions with a lower left position are considered first.
A MODEL FOR THE GENOME-WIDE SEQUENCE OF READING FRAMES
We wish to incorporate gene reading frame constraints into gene finding.
A divide-and-conquer, two-step approach to gene finding (again):
1. Gene prediction: a gene finder supplies a set of candidate predictions $p_1, \dots, p_n$, called the initial set.
2. Pruning: the initial set is pruned according to gene finder confidence scores and the probabilities of adjacent gene reading frames. We call the pruned set the final set.
METHODOLOGY
Gene predictions are sorted by stop codon position.
Gene finder scores are discretized into symbolic values.
We use a type of hidden Markov model which we call a delete-HMM: one state for each of the six possible reading frames (F1, F2, F3, F4, F5, F6) and one delete state.
MODEL
Emissions: a finite set of n symbols $\delta_1, \dots, \delta_n$ corresponding to ranges of prediction scores.
Delete state emission: $P(\delta_i \mid state = delete) = \frac{FP_{\delta_i}}{FP}$
Frame state transitions: relative frequency of "observed" adjacent gene reading frame pairs (normalized).
Transition to delete: $P(delete) = 1 - \frac{TP}{TP + FP}$ (tunable)
[Diagram: states F1 … F6 and delete]
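These parameters are simple ratios of counts. The sketch below estimates them under assumptions not stated on the slide: $FP_{\delta_i}$ is taken to be the number of false-positive predictions whose score falls in bin $\delta_i$, $FP$ and $TP$ are total false- and true-positive counts on training data, and frame pairs are read from a genome-ordered list of frames. All predicate names are hypothetical.

```prolog
% Estimating delete-HMM parameters from counts (hypothetical names).
% Assumption: FP_delta = false positives whose score falls in bin delta.
:- use_module(library(lists)).
:- use_module(library(apply)).
:- use_module(library(pairs)).

% delete_emission(+FPCounts, -Dist): P(delta | delete) = FP_delta / FP,
% where FPCounts is a list of Delta-Count pairs.
delete_emission(FPCounts, Dist) :-
    pairs_values(FPCounts, Counts),
    sum_list(Counts, FP),
    FP > 0,
    maplist(normalize(FP), FPCounts, Dist).

normalize(Total, Key-Count, Key-P) :-
    P is Count / Total.

% delete_transition(+TP, +FP, -P): P(delete) = 1 - TP / (TP + FP).
delete_transition(TP, FP, P) :-
    P is 1 - TP / (TP + FP).

% frame_transition(+Frames, +From, +To, -P): relative frequency with which
% frame To directly follows frame From in the genome-ordered list Frames.
frame_transition(Frames, From, To, P) :-
    findall(Next, nextto(From, Next, Frames), Followers),
    length(Followers, N),
    N > 0,
    include(==(To), Followers, Hits),
    length(Hits, K),
    P is K / N.
```

For instance, with 30 true and 10 false positives, P(delete) = 1 − 30/40 = 0.25, i.e. the fraction of predictions that are false positives; the slide marks this value as tunable, so the count-based estimate would only be a starting point.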
RESULTS
[Figure: ROC curves, sensitivity (TPR) versus 1 − specificity (FPR), for "threshold" and for frameseq trained on Escherichia, Salmonella, Legionella, Bacillus and Thermoplasma.]
CONCLUSIONS
1. To what extent is it possible to use probabilistic logic programming for biological sequence analysis?
2. How can constraints relevant to the domain of biological sequence analysis be combined with probabilistic logic programming?
3. What are the limitations with regard to efficiency, and how can these be dealt with?
CONCLUSIONS
Commonly used models for biological sequence analysis can be conveniently expressed using probabilistic logic programming.
Probabilistic logic programming is also a powerful tool for experimenting with new kinds of models.
It can support the integration of constraints in a variety of ways.
Efficiency is an issue, but with suitable optimizations it is efficient enough for many interesting problems.
It is not merely a powerful abstraction: it is a valuable and practical tool for biological sequence analysis.
THANKS
