INTRODUCTION Background A peek into my work Conclusions
Efficient
Probabilistic Logic Programming
for
Biological Sequence Analysis
Christian Theil Have
Research group PLIS: Programming, Logic and Intelligent Systems
Department of Communication, Business and Information Technologies
Roskilde University
INTRODUCTION Background A peek into my work Conclusions
OUTLINE
INTRODUCTION
Domain
Research questions
Background
Gene finding
Probabilistic Logic Programming
A peek into my work
Overview of papers
The trouble with tabling of structured data
Constrained HMMs
Applications: Genome models
Conclusions
INTRODUCTION Background A peek into my work Conclusions
INTRODUCTION
INTRODUCTION Background A peek into my work Conclusions
BIOLOGICAL SEQUENCE ANALYSIS
Subfield of bioinformatics
Analyze biological sequences (DNA, RNA, proteins)
to understand features, functions and evolutionary relationships
INTRODUCTION Background A peek into my work Conclusions
PROBABILISTIC LOGIC PROGRAMMING
Declarative programming paradigm
Ability to express common and complex models used in
biological sequence analysis
Concise expression of complex models
Separation between logic and control
Generic inference algorithms
Code as data: Transformations
INTRODUCTION Background A peek into my work Conclusions
MODELS FOR BIOLOGICAL SEQUENCE ANALYSIS
Reflect relationships between features of sequence data
Embody constraints – assumptions about data
Infer information from data
Reasoning about uncertainty → probabilities
INTRODUCTION Background A peek into my work Conclusions
THE LOST PROJECT
. . . seeks to improve ease of modeling, accuracy and reliability of
sequence analysis by using logic-statistical models that are yet largely
untested in bioinformatics . . .
Key focus areas:
The PRISM system
Prokaryotic gene finding
My Ph.D. project is part of the LoSt project and shares these
focus areas.
INTRODUCTION Background A peek into my work Conclusions
RESEARCH QUESTIONS
1. To what extent is it possible to use probabilistic logic
programming for biological sequence analysis?
2. How can constraints relevant to the domain of biological
sequence analysis be combined with probabilistic logic
programming?
3. What are the limitations with regard to efficiency and how can
these be dealt with?
I believe that these are the central questions that need to be
addressed in order to construct useful tools for
biological sequence analysis using probabilistic logic
programming.
INTRODUCTION Background A peek into my work Conclusions
RELATIONS BETWEEN RESEARCH QUESTIONS
1. To what extent is it possible to use
probabilistic logic programming for
biological sequence analysis?
2. How can constraints relevant to the
domain of biological sequence analysis
be combined with probabilistic logic
programming?
3. What are the limitations with regard to
efficiency and how can these be dealt
with?
INTRODUCTION Background A peek into my work Conclusions
APPROACH
To build and evaluate
Applications
Abstractions
Optimizations
for biological sequence analysis using probabilistic logic
programming.
INTRODUCTION Background A peek into my work Conclusions
APPROACH
Applications
Deal with relevant biological sequence analysis problems
Potentially contribute new knowledge to biology or
bioinformatics
Direct substantiation with regard to research question 1
Abstractions
Ease modeling
Language for incorporating constraints from the domain
A higher level of declarativity:
focus on the problem rather than implementation (model)
details
Optimizations
Deal with limitations of probabilistic logic programming
that may hinder its use in biological sequence analysis.
Efficient inference is a precondition for practical use.
INTRODUCTION Background A peek into my work Conclusions
BACKGROUND
Prokaryotic gene finding
Probabilistic logic programming
INTRODUCTION Background A peek into my work Conclusions
PROKARYOTIC GENE FINDING
Identify regions of DNA which encode proteins:
A (prokaryotic) gene is a consecutive stretch of DNA which
is transcribed as part of an RNA,
is translated to a complete protein,
has a length which is a multiple of three (codons),
starts with a "start" codon, and
ends with a "stop" codon.
INTRODUCTION Background A peek into my work Conclusions
GENES AND OPEN READING FRAMES
The identification of prokaryotic genes may be decomposed
into two distinct problems:
1. Identification of ORFs which contain protein coding genes.
2. Identification of the correct start codon within an ORF.
ORF      ::= start not-stop* stop
start    ::= TTG | CTG | ATT | ATC | ATA | ATG | GTG
stop     ::= TAA | TAG | TGA
not-stop ::= AAA | ... | TTT   // all codons except those in stop
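As an illustration (not taken from the thesis), this grammar can be written directly as a Prolog DCG over a list of codon atoms; the predicate names below are made up for the example:

% Illustrative sketch: ORF grammar as a DCG over codons such as [atg,aaa,ccc,taa].
orf --> start_codon, not_stops, stop_codon.

not_stops --> [].
not_stops --> [C], { \+ stop(C) }, not_stops.

start_codon --> [C], { member(C, [ttg,ctg,att,atc,ata,atg,gtg]) }.
stop_codon  --> [C], { stop(C) }.

stop(taa).  stop(tag).  stop(tga).

For example, phrase(orf, [atg,aaa,ccc,taa]) succeeds, while phrase(orf, [atg,aaa,taa,ccc]) fails because the last codon is not a stop codon.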
INTRODUCTION Background A peek into my work Conclusions
SIGNALS FOR PROKARYOTIC GENE FINDING
Open reading frames
Length
Nucleotide sequence composition
Conservation (sequence similarity in other organisms)
Local context
Promoters
Ribosomal binding site
Termination signal
[Figure: signal layout relative to the transcription start site (+1): promoter boxes at -35 and -10,
Shine-Dalgarno (SD) site at about +10, gene start at about +15-20, and a terminator downstream.]
INTRODUCTION Background A peek into my work Conclusions
READING FRAMES AND OVERLAPPING GENES
RNA can be transcribed from either strand
Genes may start in different “reading frames”
Genes can overlap
in the same and in different reading frames
on opposite strands
INTRODUCTION Background A peek into my work Conclusions
PROBABILISTIC LOGIC PROGRAMMING
Logic programming and Prolog
Probabilistic logic programming and PRISM
INTRODUCTION Background A peek into my work Conclusions
LOGIC PROGRAMMING AND PROLOG
A Prolog program consists of a finite sequence of rules,
B :- A1, ..., An.
These rules define implications, i.e.,
B if A1 and ... and An
INTRODUCTION Background A peek into my work Conclusions
TERMS, LITERALS AND VARIABLES
Literals can consist of (possibly) structured terms, which may
include variables.
number(0).                    % a fact; 0 is a constant
number(s(X)) :- number(X).    % s(X) is a structured term; X is a variable
INTRODUCTION Background A peek into my work Conclusions
RESOLUTION
Problems are stated as theorems (goals) to be proved, e.g.,
number(X)
To prove a consequent, we recursively need to prove the
antecedents by using rules where these appear as consequents,
number(0).
number(s(X)) :-
number(X).
Solutions
number(X) → X = 0
number(X) → X = s(0)
number(X) → X = s(s(0))
. . .
Derivation
number(X) →
number(s(X)) →
number(s(s(X))) →
number(s(s(0)))
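For example, posed as a query to a Prolog system, the goal enumerates these solutions one by one on backtracking:

?- number(X).
X = 0 ;
X = s(0) ;
X = s(s(0)) ;
...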
INTRODUCTION Background A peek into my work Conclusions
DERIVATION TREES AND EXPLANATION GRAPHS
Consider the following
program which adds natural
numbers:
add(0+0,0).
add(A+s(B),s(C)) :-
add(A+B,C).
add(s(A)+B,s(C)) :-
add(A+B,C).
And suppose we call the goal,
add(s(s(0))+s(s(0)),R)
We now have two alternative
applicable clauses,
resulting in either
add(s(0)+s(s(0)),s(R))
or
add(s(s(0))+s(0),s(R))
INTRODUCTION Background A peek into my work Conclusions
DERIVATION TREE
s(s(0))+s(s(0))
  s(0)+s(s(0))
    0+s(s(0))
      0+s(0)
        0+0
    s(0)+s(0)
      0+s(0)
        0+0
      s(0)+0
        0+0
  s(s(0))+s(0)
    s(0)+s(0)
      0+s(0)
        0+0
      s(0)+0
        0+0
    s(s(0))+0
      s(0)+0
        0+0
Exponential!
INTRODUCTION Background A peek into my work Conclusions
EXPLANATION GRAPH
Polynomial: O(n · m), but would be O(n + m) if arguments were ordered by size.
INTRODUCTION Background A peek into my work Conclusions
TABLING
Idea
The system maintains a table of calls and their answers:
when a new call is entered, check if it is stored in the table;
if so, reuse the previously found solutions.
Consequences:
Explanation graph representation.
Significant speed-up of program execution.
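For illustration, in tabled Prolog systems such as B-Prolog (on which PRISM is built), tabling is enabled with a table declaration; the fib/2 program below is a standard textbook example, not taken from the thesis:

% Illustrative example of a table declaration (not from the thesis).
:- table fib/2.        % memoize answers to fib/2 calls

fib(0, 1).
fib(1, 1).
fib(N, F) :-
    N > 1,
    N1 is N - 1, N2 is N - 2,
    fib(N1, F1), fib(N2, F2),
    F is F1 + F2.

With the declaration each distinct call is evaluated only once, so fib(N,F) runs in linear time; without it the naive program takes exponential time.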
INTRODUCTION Background A peek into my work Conclusions
PROBABILISTIC LOGIC PROGRAMMING
Probabilistic logic programming is a form of logic
programming which deals with uncertainty.
A logic program induces a set of possible worlds, i.e., the set of
derivable consequents and their alternative proofs.
Probabilistic logic programming extends logic programming by
assigning probabilities to each of these possible worlds and
extends logical inference into probabilistic inference, making it
possible to, e.g.,
derive the probability of a goal
infer the most probable derivation of a goal
infer the affinities (represented by probabilities) for
different possible worlds from data
INTRODUCTION Background A peek into my work Conclusions
PRISM
PRogramming In Statistical Modelling is a framework for
probabilistic logic programming
Developed by collaboration partners of the LoSt project:
Yoshitaka Kameya, Taisuke Sato, and Neng-Fa Zhou.
An extension of Prolog with random variables, called MSWs
Provides efficient generalized inference algorithms
(Viterbi, EM, etc) using tabling
PRISM program = probabilistic model
INTRODUCTION Background A peek into my work Conclusions
HIDDEN MARKOV MODEL EXAMPLE
Postcard
Greetings from wherever, where I am having
a great time. Here is what I have been doing:
The first two days, I stayed at the hotel
reading a good book. Then, on the third day I
decided to go shopping. The next three days I
did nothing but lie on the beach. On my last
day, I went shopping for some gifts to bring
home and wrote you this postcard.
Sincerely, Some friend of yours
Observation sequence
INTRODUCTION Background A peek into my work Conclusions
HIDDEN MARKOV MODEL run
Definition
A run of an HMM is a pair consisting of a sequence of states
s(0) s(1) . . . s(n), called a path, and a corresponding sequence of
emissions e(1) . . . e(n), called an observation, such that
s(0) = s0;
for all i, 0 ≤ i ≤ n−1: p(s(i), s(i+1)) > 0
(probability to transit from s(i) to s(i+1));
for all i, 0 < i ≤ n: p(s(i), e(i)) > 0
(probability to emit e(i) from s(i)).
Definition
The probability of such a run is defined as
∏ i=1..n p(s(i−1), s(i)) · p(s(i), e(i))
INTRODUCTION Background A peek into my work Conclusions
DECODING WITH HIDDEN MARKOV MODELS
Infer the hidden path given the observation sequence:
argmax_path P(path | observation)
(figure source: Wikipedia)
The Viterbi algorithm can be seen as keeping track of, for each prefix
of an observed emission sequence, the most probable (partial) path
leading to each possible state, and extending those step by step into
longer paths, eventually covering the entire emission sequence.
INTRODUCTION Background A peek into my work Conclusions
EXAMPLE HMM IN PRISM
values/2 declares the outcomes of random variables.
msw/2 simulates a random variable, stochastically selecting one of the outcomes.
The model in Prolog specifies the relation between the variables.

values(trans(_), [sunny,rainy]).
values(emit(_), [shop,beach,read]).

hmm(L) :- run_length(T), hmm(T,start,L).

hmm(0,_,[]).
hmm(T,State,[Emit|EmitRest]) :-
    T > 0,
    msw(trans(State),NextState),
    msw(emit(NextState),Emit),
    T1 is T-1,
    hmm(T1,NextState,EmitRest).

run_length(7).
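Once such a program is loaded into PRISM, its generic built-ins can be used for inference and learning. A rough sketch of typical queries (the exact predicate names and output formats may differ slightly between PRISM versions):

?- sample(hmm(L)).
   % sample an observation sequence of length 7 from the model
?- prob(hmm([read,read,shop,beach,beach,beach,shop]), P).
   % probability of the postcard observation sequence
?- viterbif(hmm([read,read,shop,beach,beach,beach,shop])).
   % most probable explanation, i.e. the hidden weather sequence
?- learn([hmm([read,read,shop,beach,beach,beach,shop])]).
   % EM estimation of the msw parameters from observed data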
INTRODUCTION Background A peek into my work Conclusions
A PEEK INTO MY WORK
Overview of papers
A few selected cases:
An abstraction: Constrained HMMs (also an optimization)
An optimization: Regarding tabling of structured data
A couple of applications: Genome models
Using constrained probabilistic models for gene finding with
overlapping genes
Gene finding with a probabilistic model for the genome-wide
sequence of gene reading frames
INTRODUCTION Background A peek into my work Conclusions
PAPERS 1
1. Henning Christiansen, Christian Theil Have, Ole Torp Lassen
and Matthieu Petit
Taming the Zoo of Discrete HMM Subspecies & some of their Relatives
Frontiers in Artificial Intelligence and Applications, 2011
2. Henning Christiansen, Christian Theil Have, Ole Torp Lassen
and Matthieu Petit
Inference with constrained hidden Markov models in PRISM
Theory and Practice of Logic Programming, 2010
3. Christian Theil Have
Constraints and Global Optimization for Gene Prediction Overlap
Resolution
Workshop on Constraint Based Methods for Bioinformatics, 2011
INTRODUCTION Background A peek into my work Conclusions
PAPERS 2
4. Henning Christiansen, Christian Theil Have, Ole Torp Lassen
and Matthieu Petit
The Viterbi Algorithm expressed in Constraint Handling Rules
7th International Workshop on Constraint Handling Rules, 2010
5. Christian Theil Have and Henning Christiansen
Modeling Repeats in DNA Using Probabilistic Extended Regular
Expressions
Frontiers in Artificial Intelligence and Applications, 2011
6. Henning Christiansen, Christian Theil Have, Ole Torp Lassen
and Matthieu Petit
Bayesian Annotation Networks for Complex Sequence Analysis
Technical Communications of the 27th International Conference
on Logic Programming (ICLP’11)
INTRODUCTION Background A peek into my work Conclusions
PAPERS 3
7. Henning Christiansen, Christian Theil Have, Ole Torp Lassen
and Matthieu Petit
A declarative pipeline language for big data analysis
Presented at LOPSTR, 2012
8. Christian Theil Have and Henning Christiansen
Efficient Tabling of Structured Data Using Indexing and Program
Transformation
Practical Aspects of Declarative Languages, 2012
9. Neng-Fa Zhou and Christian Theil Have
Efficient tabling of structured data with enhanced hash-consing
Theory and Practice of Logic Programming, 2012
INTRODUCTION Background A peek into my work Conclusions
PAPERS 4
10. Christian Theil Have and Søren Mørk
A Probabilistic Genome-Wide Gene Reading Frame Sequence Model
Submitted to PLOS One, 2012
11. Christian Theil Have, Sine Zambach and Henning
Christiansen
Effects of using Coding Potential, Sequence Conservation and mRNA
Structure Conservation for Predicting Pyrrolysine Containing Genes
Submitted to BMC Bioinformatics, 2012
INTRODUCTION Background A peek into my work Conclusions
THE TROUBLE WITH TABLING OF STRUCTURED DATA
INTRODUCTION Background A peek into my work Conclusions
THE TROUBLE WITH TABLING OF STRUCTURED DATA
An innocent looking
predicate: last/2
last([X],X).
last([_|L],X) :-
last(L,X).
Traverses a list to
find the last element.
Time/space
complexity: O(n).
If we table last/2:
n + (n−1) + (n−2) + . . . + 1
≈ O(n²)!
call:
last([1,2,3,4,5],X)
last([1,2,3,4,5],X)
last([1,2,3,4],X)
last([1,2,3],X)
last([1,2],X)
last([1],X)
call table
last([1,2,3,4,5],X).
last([1,2,3,4],X).
last([1,2,3],X).
last([1,2],X).
last([1],X).
INTRODUCTION Background A peek into my work Conclusions
A WORKAROUND IMPLEMENTED IN PROLOG
We describe a workaround giving O(1) time and space
complexity for table lookups for programs with arbitrarily
large ground structured data as input arguments.
A term is represented as a set of facts.
A subterm is referenced by a unique integer serving as an
abstract pointer.
Matching related to tabling is done solely by comparison
of such pointers.
INTRODUCTION Background A peek into my work Conclusions
AN ABSTRACT DATA TYPE
The representation is given by the following predicates which
all together can be understood as an abstract datatype.
store_term( +ground-term, pointer)
The ground-term is any ground term, and the
pointer returned is a unique reference (an integer)
for that term.
retrieve_term( +pointer, ?functor, ?arg-pointers-list)
Returns the functor and a list of pointers to
representations of the substructures of the term
represented by pointer.
full_retrieve_term( +pointer, ?ground-term)
Returns the term represented by pointer.
INTRODUCTION Background A peek into my work Conclusions
ADT EXAMPLE
Example
The following call converts the term f(a,g(b)) into its
internal representation and returns a pointer value in the
variable P.
store_term(f(a,g(b)),P).
After this, the following sequence of calls will succeed.
retrieve_term(P,f,[P1,P2]),
retrieve_term(P1,a,[]),
retrieve_term(P2,g,[P21]),
retrieve_term(P21,b,[]),
full_retrieve_term(P,f(a,g(b))).
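A minimal sketch of how such an ADT could be realized in plain Prolog (illustrative only; among other things, the actual implementation additionally ensures that identical subterms receive identical pointers, i.e. hash-consing, so that table lookups reduce to integer comparison):

% Illustrative sketch, not the thesis implementation.
:- dynamic stored_term/3, term_counter/1.
term_counter(0).

% store_term(+GroundTerm, -Ptr): assign a fresh integer pointer and
% record the functor together with pointers to the argument subterms.
store_term(Term, Ptr) :-
    Term =.. [Functor|Args],
    store_args(Args, ArgPtrs),
    retract(term_counter(Ptr)),
    Next is Ptr + 1,
    assertz(term_counter(Next)),
    assertz(stored_term(Ptr, Functor, ArgPtrs)).

store_args([], []).
store_args([A|As], [P|Ps]) :- store_term(A, P), store_args(As, Ps).

% retrieve_term(+Ptr, ?Functor, ?ArgPtrs): one level of the stored term.
retrieve_term(Ptr, Functor, ArgPtrs) :- stored_term(Ptr, Functor, ArgPtrs).

% full_retrieve_term(+Ptr, ?Term): rebuild the complete ground term.
full_retrieve_term(Ptr, Term) :-
    stored_term(Ptr, Functor, ArgPtrs),
    retrieve_args(ArgPtrs, Args),
    Term =.. [Functor|Args].

retrieve_args([], []).
retrieve_args([P|Ps], [A|As]) :- full_retrieve_term(P, A), retrieve_args(Ps, As).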
INTRODUCTION Background A peek into my work Conclusions
AN AUTOMATIC PROGRAM TRANSFORMATION
We introduce an automatic transformation from a tabled
program to an efficient version using our approach.
Structured terms are moved from the head of clauses to calls in
the body to retrieve_term/3 and full_retrieve_term/2.
INTRODUCTION Background A peek into my work Conclusions
TRANSFORMED HIDDEN MARKOV MODEL
We only need to consider the recursive predicate, hmm/2.
original version
hmm(_,[]).
hmm(S,[Ob|Y]) :-
msw(out(S),Ob),
msw(tr(S),Next),
hmm(Next,Y).
transformed version
hmm(S,ObsPtr):-
retrieve_term(ObsPtr,[]).
hmm(S,ObsPtr) :-
retrieve_term(ObsPtr,[Ob,Y]),
msw(out(S),Ob),
msw(tr(S),Next),
hmm(Next,Y).
INTRODUCTION Background A peek into my work Conclusions
BENCHMARKING RESULTS
[Figure: running time (seconds) as a function of sequence length (0-5000):
(a) with indexed lookup and (b) without indexed lookup.]
INTRODUCTION Background A peek into my work Conclusions
THE NEXT STEP
Integration at the Prolog engine implementation level.
Neng-Fa Zhou and Christian Theil Have
Efficient tabling of structured data with enhanced hash-consing
Theory and Practice of Logic Programming, 2012
Full sharing between tables (call and answer)
Sharing with structured data in call stack
INTRODUCTION Background A peek into my work Conclusions
CONSTRAINED HMMS
Definition
A constrained HMM (CHMM)
is an HMM
extended with a set of constraints C, each of which is a
mapping from HMM runs into {true, false}.
A run (path, observation) of a CHMM is a run of the
corresponding HMM for which C(path, observation) is true.
INTRODUCTION Background A peek into my work Conclusions
CONSTRAINED HMMS
Why extend an HMM with side-constraints?
To create better, more specific models with fewer states
Convenient to express prior knowledge in terms of
constraints
No need to change underlying HMM
Sometimes it is not possible or feasible to express such
constraints as HMM structure (e.g. all_different)
→ infeasibly huge state and parameter space
fewer paths to consider for any given sequence
→ decreased running time
INTRODUCTION Background A peek into my work Conclusions
PAIR HMMS FOR SEQUENCE ALIGNMENT
A pair HMM is a special kind of HMM that emits two
sequences
The match state emits a pair (xi, yj) of symbols
The insert state emits one symbol xi, from sequence x
The delete state emits one symbol yj, from sequence y
A run of this model produces an alignment of x and y
INTRODUCTION Background A peek into my work Conclusions
ALIGNMENT WITH A CONSTRAINED PAIR HMM
Consider adding constraints to the pair HMM introduced
earlier.
For instance,
In a biological context, we may want to
only consider alignments with a limited
number of insertions and deletions given
the assumption that the two sequences
are closely related.
C = {cardinality_atmost(Nd, [S1, . . . , Sn], delete),
cardinality_atmost(Ni, [S1, . . . , Sn], insert)} .
The constraint cardinality_atmost(N, L, X) is satisfied whenever
L is a list of elements, out of which at most N are equal to X.
INTRODUCTION Background A peek into my work Conclusions
ALIGNMENT WITH CONSTRAINTS
INTRODUCTION Background A peek into my work Conclusions
ADDING CONSTRAINT CHECKING TO THE HMM
HMM with constraint checking
hmm(T,State,[Emit|EmitRest],StoreIn) :-
T > 0,
msw(trans(State),NxtState),
msw(emit(NxtState),Emit),
check_constraints([NxtState,Emit],StoreIn,StoreOut),
T1 is T-1,
hmm(T1,NxtState,EmitRest,StoreOut).
Call to check_constraints/3 after each distinct
sequence of msw applications
Side-constraints: The constraints are assumed to be
declared elsewhere and not interleaved with the model
specification
Extra Store argument in the probabilistic predicate
INTRODUCTION Background A peek into my work Conclusions
CHECKING THE CONSTRAINTS
The goal check_constraints/3 calls constraint checkers for
all constraints declared on the model.
For instance, with our example pair HMM constraint,
C = {cardinality_atmost(Nd, [S1, . . . , Sn], delete),
cardinality_atmost(Ni, [S1, . . . , Sn], insert)} .
We have the following incremental constraint checker
implementation
A cardinality_atmost constraint checker
init_constraint_store(cardinality_atmost(_,_), 0).
% the constrained symbol U occurs: count it and check the bound
check_sat(cardinality_atmost(U,Max), U, In, Out) :-
    Out is In + 1, Out =< Max.
% any other symbol leaves the store unchanged
check_sat(cardinality_atmost(X,_), U, S, S) :- X \= U.
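To illustrate how such checkers could be tied together, here is a hypothetical sketch of check_constraints/3 (the store layout, a list of Constraint-State pairs, and the helper names are made up for the example and are not the paper's actual code):

% Hypothetical glue code, for illustration only.
% check_constraints(+Update, +StoreIn, -StoreOut)
% fails as soon as any declared constraint is violated.
check_constraints(_, [], []).
check_constraints(Update, [C-S0|Rest0], [C-S1|Rest1]) :-
    check_constraint(Update, C, S0, S1),
    check_constraints(Update, Rest0, Rest1).

% thread each symbol of the update (e.g. [NxtState,Emit]) through
% the incremental checker for a single constraint.
check_constraint([], _, S, S).
check_constraint([U|Us], C, S0, S2) :-
    check_sat(C, U, S0, S1),
    check_constraint(Us, C, S1, S2).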
INTRODUCTION Background A peek into my work Conclusions
A LIBRARY OF GLOBAL CONSTRAINTS FOR HIDDEN
MARKOV MODELS
Our implementation contains a few well-known global
constraints adapted to Hidden Markov Models.
Global constraints
cardinality lock_to_sequence
all_different lock_to_set
In addition, the implementation provides operators which may
be used to apply constraints to a limited set of variables.
Constraint operators
state_specific
emission_specific
forall_subseq (sliding window operator)
for_range (time step range operator)
INTRODUCTION Background A peek into my work Conclusions
TABLING ISSUES
Problem: The extra Store argument makes PRISM table
multiple goals (for different constraint stores) when it should
only store one.
hmm(T,State,[Emit|EmitRest],Store)
To get rid of the extra argument, check_constraints
dynamically maintains it as a stack using assert/retract:
check_constraints(Update) :-
get_store(StoreBefore),
check_constraints(Update,StoreBefore,StoreAfter),
forward_store(StoreAfter).
get_store(S) :- store(S), !.
forward_store(S) :- asserta(store(S)) ; retract(store(S)).
INTRODUCTION Background A peek into my work Conclusions
IMPACT OF USING A SEPARATE CONSTRAINT STORE
STACK
INTRODUCTION Background A peek into my work Conclusions
DISCUSSION AND LIMITATIONS
B-Prolog has later added an nt tabling mode which avoids
tabling of arguments, but it is implemented at the level of the
Prolog system.
However,
Avoiding tabling does not work for all types of constraints
For some constraints, it only works under certain
assumptions about the model
No interaction between constraints in this implementation
For tabled constraints we need
canonical representation
Pruning of non-essential parts of the constraint store
INTRODUCTION Background A peek into my work Conclusions
GENOME MODELS
Gene finding in a genomic context
What are the constraints between adjacent genes in the
genome?
Extent of (possible) overlap
Modeled as hard constraints
Gene reading frames, e.g., due to leading strand bias,
operons etc.
Modeled as (probabilistic) soft constraints
INTRODUCTION Background A peek into my work Conclusions
AN APPLICATION OF CONSTRAINED MARKOV
MODELS
We wish to incorporate overlapping gene constraints into gene
finding.
Divide and conquer two step approach to gene finding:
1. Gene prediction: A gene finder supplies a set of candidate
predictions p1 . . . pn, called the initial set.
2. Pruning: The initial set is pruned according to certain rules
or constraints. We call the pruned set the final set.
INTRODUCTION Background A peek into my work Conclusions
PRUNING STEP AS A CONSTRAINT OPTIMIZATION
PROBLEM
CSP formulation
We introduce variables X = x1 . . . xn corresponding to the
predictions p1 . . . pn in the initial set. All variables have boolean
domains, ∀xi ∈ X, D(xi) = {true, false}, and
xi = true ⇔ pi ∈ final set.
Multiple solutions
We want the “best” solution
Optimize for prediction confidence scores
Constraint Optimization Problem (COP)
COP formulation
Let the scores of p1 . . . pn be s1 . . . sn with si ∈ R+.
Maximize Σ i=1..n si (summed over the selected predictions,
i.e. those with xi = true), subject to C.
INTRODUCTION Background A peek into my work Conclusions
VARIABLE ORDERING
Assume an ordering on the variables,
Initial set predictions p1 . . . pn are sorted by the position of
their left-most base, such that
∀pi, pj, i < j ⇒ left-most(pi) ≤ left-most(pj).
The variables x1 . . . xn of the CSP/COP are given the same
ordering.
INTRODUCTION Background A peek into my work Conclusions
COP IMPLEMENTATION WITH MARKOV CHAIN (1)
We propose to use a (constrained) Markov chain for the COP.
The Markov chain has a begin state, an end state and two
states for each variable xi corresponding to its boolean
domain D(xi).
The state corresponding to D(xi) = true is denoted αi and
the state corresponding to D(xi) = false is denoted βi.
In this model, a path from the begin state to the end state
corresponds to a potential solution of the CSP.
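A rough PRISM-style sketch of this chain (illustrative only: the predicate names are made up, the probabilities σi would be attached to the switches, e.g. with set_sw/2, and the CHR-based consistency checking described on the following slides is omitted):

% Illustrative sketch of the selection chain, not the paper's actual code.
values(select(_), [alpha, beta]).      % D(xi) = {true, false}

% solution(+N, -Choices): one path through the chain for predictions 1..N
solution(N, Choices) :- chain(1, N, Choices).

chain(I, N, []) :- I > N.
chain(I, N, [Choice|Rest]) :-
    I =< N,
    msw(select(I), Choice),            % alpha keeps p_i, beta discards it
    I1 is I + 1,
    chain(I1, N, Rest).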
INTRODUCTION Background A peek into my work Conclusions
FROM CONFIDENCE SCORES TO TRANSITION
PROBABILITIES
P(α1 | begin) = σ1
P(β1 | begin) = 1 − σ1
P(end | αn) = P(end | βn) = 1
P(αi | αi−1) = P(αi | βi−1) = σi
P(βi | αi−1) = P(βi | βi−1) = 1 − σi
σi = 0.5 + λ + (0.5 − λ) × (si − min(s1 . . . sn)) / (max(s1 . . . sn) − min(s1 . . . sn))
INTRODUCTION Background A peek into my work Conclusions
ENCODING CONSTRAINTS WITH CONSTRAINT
HANDLING RULES
Constraints: alpha/2 and beta/2 ≈ visited states.
Example: Genemark inconsistency rules
alpha(Left1,Right1), alpha(Left2,Right2) <=>
Left1 =< Left2, Right1 >= Right2 | fail.
beta(Left1,Right1), alpha(Left2,Right2) <=>
Left1 =< Left2, Right1 >= Right2 | fail.
The most probable consistent path is found using PRISM's
generic adaptation of the Viterbi algorithm
Each step adds either an alpha or a beta (active) constraint
Incremental Pruning: For each step we only apply
constraints which may be transitively involved in rules
with the active constraint
INTRODUCTION Background A peek into my work Conclusions
EXPERIMENTAL RESULTS
Prediction on E. coli using a simplistic codon-frequency-based
gene finder.
Pruning using our global optimization approach (with all
inconsistency rules) versus local heuristic rules².
Method #predictions Sensitivity Specificity Time (seconds)
initial set 10799 0.7625 0.2926 na
Genemark rules 5823 0.7558 0.5379 1.4
ECOGENE rules 4981 0.7148 0.5947 1.7
global optimization 5222 0.7201 0.5714 75
Sensitivity = fraction of known reference genes predicted.
Specificity = fraction of predicted genes that are correct.
² Note that the results for the ECOGENE heuristic may vary depending on
execution strategy; in the results above, predictions with lower left
position are considered first.
INTRODUCTION Background A peek into my work Conclusions
A MODEL FOR THE GENOME-WIDE SEQUENCE OF
READING FRAMES
We wish to incorporate gene reading frame constraints into
gene finding.
Divide and conquer two step approach to gene finding (again):
1. Gene prediction: A gene finder supplies a set of candidate
predictions p1 . . . pn, called the initial set.
2. Pruning: The initial set is pruned according to gene finder
confidence scores and the probabilities of adjacent gene
reading frames. We call the pruned set the final set.
INTRODUCTION Background A peek into my work Conclusions
METHODOLOGY
Gene predictions are
sorted by stop codon
position.
Gene finder scores are
discretized into symbolic
values.
A type of Hidden Markov
Model which we call a
delete-HMM:
A state for each of the
six possible reading
frames and
one delete state
[Figure: delete-HMM with states F1-F6, one per reading frame, and a delete state]
INTRODUCTION Background A peek into my work Conclusions
MODEL
Emission: a finite set of
symbols δ1 . . . δn
corresponding to ranges of
prediction scores
Frame state transitions:
relative frequency of
"observed" adjacent gene
reading frame pairs
Transition to delete:
P(δi | state = delete) = FPδi / FP
(tunable)
INTRODUCTION Background A peek into my work Conclusions
RESULTS
[Figure: ROC curves, sensitivity (TPR) vs. 1−specificity (FPR), across score
thresholds, for frameseq trained on Escherichia, Salmonella, Legionella,
Bacillus and Thermoplasma.]
INTRODUCTION Background A peek into my work Conclusions
CONCLUSIONS
1. To what extent is it possible to use
probabilistic logic programming for
biological sequence analysis?
2. How can constraints relevant to the
domain of biological sequence analysis
be combined with probabilistic logic
programming?
3. What are the limitations with regard to
efficiency and how can these be dealt
with?
INTRODUCTION Background A peek into my work Conclusions
TO WHAT EXTENT IS IT POSSIBLE TO USE
PROBABILISTIC LOGIC PROGRAMMING FOR
BIOLOGICAL SEQUENCE ANALYSIS?
Commonly used models for biological analysis can be
implemented with probabilistic logic programming
Probabilistic logic programming is a powerful tool for
experimenting with new kinds of models
Efficiency is an issue, but with tabling optimizations it is
efficient enough for many interesting problems
Not merely a powerful abstraction, but
a valuable and practical tool for biological sequence
analysis
INTRODUCTION Background A peek into my work Conclusions
HOW CAN CONSTRAINTS RELEVANT TO THE DOMAIN
OF BIOLOGICAL SEQUENCE ANALYSIS BE COMBINED
WITH PROBABILISTIC LOGIC PROGRAMMING?
Probabilistic logic programming is a suitable language for
building higher level abstractions
A variety of models from biological sequence analysis can
be expressed, e.g., ZOO paper
Constrained Hidden Markov Models
Probabilistic Regular Expressions
(Probabilistic) soft constraints and hard constraints
Side-constraints or as part of model
Constraints affect the search space
INTRODUCTION Background A peek into my work Conclusions
WHAT ARE THE LIMITATIONS WITH REGARD TO
EFFICIENCY AND HOW CAN THESE BE DEALT WITH?
Tabling issues
Discriminating arguments (Christiansen and Gallagher)
Tabling of structured data
Tabling of constraint stores
Constraints.
Can be useful for reducing the search space
Can make search space exponential
Problem decomposition with Bayesian Annotation
Networks
Approximation
Feasibility of inference with complex models
Automatic parallelization – BANpipe
More Related Content

What's hot

20051128.doc
20051128.doc20051128.doc
20051128.docbutest
 
Non-parametric Subject Prediction
Non-parametric Subject PredictionNon-parametric Subject Prediction
Non-parametric Subject PredictionShenghui Wang
 
Latent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationLatent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationElaheh Barati
 
Browsing-oriented Semantic Faceted Search
Browsing-oriented Semantic Faceted SearchBrowsing-oriented Semantic Faceted Search
Browsing-oriented Semantic Faceted SearchWagner Andreas
 
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...Rommel Carvalho
 
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary StudyOn the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study Andre Freitas
 
Centroid-based Text Summarization through Compositionality of Word Embeddings
Centroid-based Text Summarization through Compositionality of Word EmbeddingsCentroid-based Text Summarization through Compositionality of Word Embeddings
Centroid-based Text Summarization through Compositionality of Word EmbeddingsGaetano Rossiello, PhD
 
AN IMPLEMENTATION, EMPIRICAL EVALUATION AND PROPOSED IMPROVEMENT FOR BIDIRECT...
AN IMPLEMENTATION, EMPIRICAL EVALUATION AND PROPOSED IMPROVEMENT FOR BIDIRECT...AN IMPLEMENTATION, EMPIRICAL EVALUATION AND PROPOSED IMPROVEMENT FOR BIDIRECT...
AN IMPLEMENTATION, EMPIRICAL EVALUATION AND PROPOSED IMPROVEMENT FOR BIDIRECT...ijaia
 
Topic detecton by clustering and text mining
Topic detecton by clustering and text miningTopic detecton by clustering and text mining
Topic detecton by clustering and text miningIRJET Journal
 
Ch 9-1.Machine Learning: Symbol-based
Ch 9-1.Machine Learning: Symbol-basedCh 9-1.Machine Learning: Symbol-based
Ch 9-1.Machine Learning: Symbol-basedbutest
 
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...csandit
 
Which Rationality For Pragmatics6
Which Rationality For Pragmatics6Which Rationality For Pragmatics6
Which Rationality For Pragmatics6Louis de Saussure
 
OwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxes
OwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxesOwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxes
OwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxesRokan Uddin Faruqui
 

What's hot (18)

[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
 
20051128.doc
20051128.doc20051128.doc
20051128.doc
 
Non-parametric Subject Prediction
Non-parametric Subject PredictionNon-parametric Subject Prediction
Non-parametric Subject Prediction
 
Latent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationLatent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text Summarization
 
Browsing-oriented Semantic Faceted Search
Browsing-oriented Semantic Faceted SearchBrowsing-oriented Semantic Faceted Search
Browsing-oriented Semantic Faceted Search
 
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
 
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary StudyOn the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
 
Centroid-based Text Summarization through Compositionality of Word Embeddings
Centroid-based Text Summarization through Compositionality of Word EmbeddingsCentroid-based Text Summarization through Compositionality of Word Embeddings
Centroid-based Text Summarization through Compositionality of Word Embeddings
 
AN IMPLEMENTATION, EMPIRICAL EVALUATION AND PROPOSED IMPROVEMENT FOR BIDIRECT...
AN IMPLEMENTATION, EMPIRICAL EVALUATION AND PROPOSED IMPROVEMENT FOR BIDIRECT...AN IMPLEMENTATION, EMPIRICAL EVALUATION AND PROPOSED IMPROVEMENT FOR BIDIRECT...
AN IMPLEMENTATION, EMPIRICAL EVALUATION AND PROPOSED IMPROVEMENT FOR BIDIRECT...
 
Topic detecton by clustering and text mining
Topic detecton by clustering and text miningTopic detecton by clustering and text mining
Topic detecton by clustering and text mining
 
Ch 9-1.Machine Learning: Symbol-based
Ch 9-1.Machine Learning: Symbol-basedCh 9-1.Machine Learning: Symbol-based
Ch 9-1.Machine Learning: Symbol-based
 
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
 
ComparativeMotifFinding
ComparativeMotifFindingComparativeMotifFinding
ComparativeMotifFinding
 
Which Rationality For Pragmatics6
Which Rationality For Pragmatics6Which Rationality For Pragmatics6
Which Rationality For Pragmatics6
 
OwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxes
OwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxesOwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxes
OwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxes
 
4 full
4 full4 full
4 full
 
International Journal of Engineering Inventions (IJEI)
International Journal of Engineering Inventions (IJEI)International Journal of Engineering Inventions (IJEI)
International Journal of Engineering Inventions (IJEI)
 
Thesis_Rehan_Aziz
Thesis_Rehan_AzizThesis_Rehan_Aziz
Thesis_Rehan_Aziz
 

Similar to Efficient Probabilistic Logic Programming for Biological Sequence Analysis

Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Miningbutest
 
A scalable ontology reasoner via incremental materialization
A scalable ontology reasoner via incremental materializationA scalable ontology reasoner via incremental materialization
A scalable ontology reasoner via incremental materializationRokan Uddin Faruqui
 
Directed versus undirected network analysis of student essays
Directed versus undirected network analysis of student essaysDirected versus undirected network analysis of student essays
Directed versus undirected network analysis of student essaysRoy Clariana
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2Viral Gupta
 
Probabilistic Latent Factor Induction and
 Statistical Factor Analysis
Probabilistic Latent Factor Induction and
 Statistical Factor AnalysisProbabilistic Latent Factor Induction and
 Statistical Factor Analysis
Probabilistic Latent Factor Induction and
 Statistical Factor AnalysisBayesia USA
 
The Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologyThe Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologySnow Owl
 
Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...butest
 
A report of the work done in this project is available here
A report of the work done in this project is available hereA report of the work done in this project is available here
A report of the work done in this project is available herebutest
 
download
downloaddownload
downloadbutest
 
download
downloaddownload
downloadbutest
 
Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research ObjectsDavid De Roure
 
Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...
Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...
Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...Facultad de Informática UCM
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inKumari Naveen
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.pptHaHa501620
 
The Role Of Ontology In Modern Expert Systems Dallas 2008
The Role Of Ontology In Modern Expert Systems   Dallas   2008The Role Of Ontology In Modern Expert Systems   Dallas   2008
The Role Of Ontology In Modern Expert Systems Dallas 2008Jason Morris
 
Sound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software TestingSound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software TestingJaguaraci Silva
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Reviewchangedaeoh
 

Similar to Efficient Probabilistic Logic Programming for Biological Sequence Analysis (20)

Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
A scalable ontology reasoner via incremental materialization
A scalable ontology reasoner via incremental materializationA scalable ontology reasoner via incremental materialization
A scalable ontology reasoner via incremental materialization
 
Directed versus undirected network analysis of student essays
Directed versus undirected network analysis of student essaysDirected versus undirected network analysis of student essays
Directed versus undirected network analysis of student essays
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2
 
How to write a paper
How to write a paperHow to write a paper
How to write a paper
 
Probabilistic Latent Factor Induction and
 Statistical Factor Analysis
Probabilistic Latent Factor Induction and
 Statistical Factor AnalysisProbabilistic Latent Factor Induction and
 Statistical Factor Analysis
Probabilistic Latent Factor Induction and
 Statistical Factor Analysis
 
The Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologyThe Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to Terminology
 
Exposé Ontology
Exposé OntologyExposé Ontology
Exposé Ontology
 
Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...Lecture 9 slides: Machine learning for Protein Structure ...
Lecture 9 slides: Machine learning for Protein Structure ...
 
A report of the work done in this project is available here
A report of the work done in this project is available hereA report of the work done in this project is available here
A report of the work done in this project is available here
 
download
downloaddownload
download
 
download
downloaddownload
download
 
Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research Objects
 
Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...
Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...
Like Alice in Wonderland: Unraveling Reasoning and Cognition Using Analogies ...
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful in
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.ppt
 
The Role Of Ontology In Modern Expert Systems Dallas 2008
The Role Of Ontology In Modern Expert Systems   Dallas   2008The Role Of Ontology In Modern Expert Systems   Dallas   2008
The Role Of Ontology In Modern Expert Systems Dallas 2008
 
Sound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software TestingSound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software Testing
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
 
Paul Groth
Paul GrothPaul Groth
Paul Groth
 

More from Christian Have

Efficient Tabling of Structured Data Using Indexing and Program Transformation
Efficient Tabling of Structured Data Using Indexing and Program TransformationEfficient Tabling of Structured Data Using Indexing and Program Transformation
Efficient Tabling of Structured Data Using Indexing and Program TransformationChristian Have
 
Constraints and Global Optimization for Gene Prediction Overlap Resolution
Constraints and Global Optimization for Gene Prediction Overlap ResolutionConstraints and Global Optimization for Gene Prediction Overlap Resolution
Constraints and Global Optimization for Gene Prediction Overlap ResolutionChristian Have
 
Nagios præsentation (på dansk)
Nagios præsentation (på dansk)Nagios præsentation (på dansk)
Nagios præsentation (på dansk)Christian Have
 
Stochastic Definite Clause Grammars
Stochastic Definite Clause GrammarsStochastic Definite Clause Grammars
Stochastic Definite Clause GrammarsChristian Have
 
ICLP 2009 doctoral consortium presentation; Logic-Statistic Models with Const...
ICLP 2009 doctoral consortium presentation; Logic-Statistic Models with Const...ICLP 2009 doctoral consortium presentation; Logic-Statistic Models with Const...
ICLP 2009 doctoral consortium presentation; Logic-Statistic Models with Const...Christian Have
 
Inference with Constrained Hidden Markov Models in PRISM
Inference with Constrained Hidden Markov Models in PRISMInference with Constrained Hidden Markov Models in PRISM
Inference with Constrained Hidden Markov Models in PRISMChristian Have
 

More from Christian Have (6)

Efficient Tabling of Structured Data Using Indexing and Program Transformation
Efficient Tabling of Structured Data Using Indexing and Program TransformationEfficient Tabling of Structured Data Using Indexing and Program Transformation
Efficient Tabling of Structured Data Using Indexing and Program Transformation
 
Constraints and Global Optimization for Gene Prediction Overlap Resolution
Constraints and Global Optimization for Gene Prediction Overlap ResolutionConstraints and Global Optimization for Gene Prediction Overlap Resolution
Constraints and Global Optimization for Gene Prediction Overlap Resolution
 
Nagios præsentation (på dansk)
Nagios præsentation (på dansk)Nagios præsentation (på dansk)
Nagios præsentation (på dansk)
 
Stochastic Definite Clause Grammars
Stochastic Definite Clause GrammarsStochastic Definite Clause Grammars
Stochastic Definite Clause Grammars
 
ICLP 2009 doctoral consortium presentation; Logic-Statistic Models with Const...
ICLP 2009 doctoral consortium presentation; Logic-Statistic Models with Const...ICLP 2009 doctoral consortium presentation; Logic-Statistic Models with Const...
ICLP 2009 doctoral consortium presentation; Logic-Statistic Models with Const...
 
Inference with Constrained Hidden Markov Models in PRISM
Inference with Constrained Hidden Markov Models in PRISMInference with Constrained Hidden Markov Models in PRISM
Inference with Constrained Hidden Markov Models in PRISM
 

Efficient Probabilistic Logic Programming for Biological Sequence Analysis

  • 1. INTRODUCTION Background A peek into my work Conclusions Efficient Probabilistic Logic Programming for Biological Sequence Analysis Christian Theil Have Research group PLIS: Programming, Logic and Intelligent Systems Department of Communication, Business and Information Technologies Roskilde University
  • 2. INTRODUCTION Background A peek into my work Conclusions OUTLINE INTRODUCTION Domain Research questions Background Gene finding Probabilistic Logic Programming A peek into my work Overview of papers The trouble with tabling of structured data Constrained HMMs Applications: Genome models Conclusions
  • 3. INTRODUCTION Background A peek into my work Conclusions INTRODUCTION
  • 4. INTRODUCTION Background A peek into my work Conclusions BIOLOGICAL SEQUENCE ANALYSIS Subfield of bioinformatics
  • 5. INTRODUCTION Background A peek into my work Conclusions BIOLOGICAL SEQUENCE ANALYSIS Subfield of bioinformatics Analyze biological sequences
  • 6. INTRODUCTION Background A peek into my work Conclusions BIOLOGICAL SEQUENCE ANALYSIS Subfield of bioinformatics Analyze biological sequences DNA
  • 7. INTRODUCTION Background A peek into my work Conclusions BIOLOGICAL SEQUENCE ANALYSIS Subfield of bioinformatics Analyze biological sequences DNA RNA
  • 8. INTRODUCTION Background A peek into my work Conclusions BIOLOGICAL SEQUENCE ANALYSIS Subfield of bioinformatics Analyze biological sequences DNA RNA Proteins
  • 9. INTRODUCTION Background A peek into my work Conclusions BIOLOGICAL SEQUENCE ANALYSIS Subfield of bioinformatics Analyze biological sequences DNA RNA Proteins to understand
  • 10. INTRODUCTION Background A peek into my work Conclusions BIOLOGICAL SEQUENCE ANALYSIS Subfield of bioinformatics Analyze biological sequences DNA RNA Proteins to understand Features
  • 11. INTRODUCTION Background A peek into my work Conclusions BIOLOGICAL SEQUENCE ANALYSIS Subfield of bioinformatics Analyze biological sequences DNA RNA Proteins to understand Features Functions
  • 12. INTRODUCTION Background A peek into my work Conclusions BIOLOGICAL SEQUENCE ANALYSIS Subfield of bioinformatics Analyze biological sequences DNA RNA Proteins to understand Features Functions Evolutionary relationships
  • 13. INTRODUCTION Background A peek into my work Conclusions PROBABILISTIC LOGIC PROGRAMMING Declarative programming paradigm Ability to express common and complex models used in biological sequence analysis Concise expression of complex models Separation between logic and control Generic inference algorithms Code as data: Transformations
  • 14. INTRODUCTION Background A peek into my work Conclusions MODELS FOR BIOLOGICAL SEQUENCE ANALYSIS Reflect relationships between features of sequence data Embody constraints – assumptions about data Infer information from data Reasoning about uncertainty → probabilities
  • 15. INTRODUCTION Background A peek into my work Conclusions THE LOST PROJECT . . . seeks to improve ease of modeling, accuracy and reliability of sequence analysis by using logic-statistical models that are yet largely untested in bioinformatics . . . Key focus areas: The PRISM system Prokaryotic gene finding My Ph.D. project is part of the LoSt project and share these focus areas.
  • 16. INTRODUCTION Background A peek into my work Conclusions RESEARCH QUESTIONS 1. To what extent is it possible to use probabilistic logic programming for biological sequence analysis? 2. How can constraints relevant to the domain of biological sequence analysis be combined with probabilistic logic programming? 3. What are the limitations with regard to efficiency and how can these be dealt with? I believe that these are the central questions that need be addressed in order to be able to construct useful tools for biological sequence analysis using probabilistic logic programming.
  • 17. INTRODUCTION Background A peek into my work Conclusions RELATIONS BETWEEN RESEARCH QUESTIONS 1. To what extent is it possible to use probabilistic logic programming for biological sequence analysis? 2. How can constraints relevant to the domain of biological sequence analysis be combined with probabilistic logic programming? 3. What are the limitations with regard to efficiency and how can these be dealt with?
  • 18. INTRODUCTION Background A peek into my work Conclusions APPROACH To build and evaluate Applications Abstractions Optimizations for biological sequence analysis using probabilistic logic programming.
  • 19. INTRODUCTION Background A peek into my work Conclusions APPROACH Applications Deal with relevant biological sequence analysis problems Potentially to contribute new knowledge to biology or bioinformatics Direct substantiation with regard to research question 1 Abstractions Ease modeling Language for incorporating constraints from the domain A higher level of declarativity; Focus on problem rather than implementation (model) details Optimizations Deal with limitations of probabilistic logic programming that may hinder its use in biological sequence analysis. Efficient inference is a precondition for practical use.
  • 20. INTRODUCTION Background A peek into my work Conclusions BACKGROUND Prokaryotic gene finding Probabilistic logic programming
  • 21. INTRODUCTION Background A peek into my work Conclusions PROKARYOTIC GENE FINDING Identify regions of DNA which encode proteins: A (prokaryotic) gene is a consecutive stretch of DNA which: is transcribed as part of an RNA, is translated to a complete protein, has a length which is a multiple of three (codons), starts with a “start” codon, and whose last codon is a “stop” codon
  • 22. INTRODUCTION Background A peek into my work Conclusions GENES AND OPEN READING FRAMES The identification of prokaryotic genes may be decomposed into two distinct problems: 1. Identification of ORFs which contain protein coding genes. 2. Identification of the correct start codon within an ORF. ORF ::= start not-stop * stop start ::= TTG | CTG | ATT | ATC | ATA | ATG | GTG stop ::= TAA | TAG | TGA not-stop ::= AAA | ... | TTT //all codons except those in stop
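This grammar maps almost one-to-one onto a logic program. The following is a minimal DCG sketch of it in Prolog (an illustration, not code from the thesis; nucleotides are lower-case atoms and member/2 is assumed available from library(lists)):

    orf --> start, not_stops, stop.

    not_stops --> [].
    not_stops --> not_stop, not_stops.

    start --> codon(C), { member(C, [[t,t,g],[c,t,g],[a,t,t],[a,t,c],[a,t,a],[a,t,g],[g,t,g]]) }.
    stop  --> codon(C), { member(C, [[t,a,a],[t,a,g],[t,g,a]]) }.
    not_stop --> codon(C), { \+ member(C, [[t,a,a],[t,a,g],[t,g,a]]) }.

    codon([X,Y,Z]) --> [X,Y,Z].

    % ?- phrase(orf, [a,t,g, g,c,a, c,g,t, t,a,a]).   succeeds
    % ?- phrase(orf, [a,t,g, t,a,a, c,g,t, t,a,a]).   fails: a stop codon cannot occur inside the ORF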
  • 23. INTRODUCTION Background A peek into my work Conclusions SIGNALS FOR PROKARYOTIC GENE FINDING Open reading frames Length Nucleotide sequence composition Conservation (sequence similarity in other organisms) Local context Promoters Ribosomal binding site Termination signal (diagram: promoter region with the -35 and -10 boxes, transcription start site at +1, Shine-Dalgarno site at ≈ +10, gene start at ≈ +15-20, and terminator)
  • 24. INTRODUCTION Background A peek into my work Conclusions READING FRAMES AND OVERLAPPING GENES RNA can be transcribed from either strand Genes may start in different “reading frames” Genes can overlap in the same and in different reading frames on opposite strands
  • 25. INTRODUCTION Background A peek into my work Conclusions PROBABILISTIC LOGIC PROGRAMMING Logic programming and Prolog Probabilistic logic programming and PRISM
  • 26. INTRODUCTION Background A peek into my work Conclusions LOGIC PROGRAMMING AND PROLOG A Prolog program consists of a finite sequence of rules, B:-A1, . . . , An. These rules define implications, i.e., B if A1 and . . . and An
  • 27. INTRODUCTION Background A peek into my work Conclusions TERMS, LITERALS AND VARIABLES Literals can consist of (possibly) structured terms, that may include variables. number(0). number(s(X)) :- number(X).
  • 28. INTRODUCTION Background A peek into my work Conclusions TERMS, LITERALS AND VARIABLES Literals can consist of (possibly) structured terms, that may include variables. fact number(0). number(s(X)) :- number(X).
  • 29. INTRODUCTION Background A peek into my work Conclusions TERMS, LITERALS AND VARIABLES Literals can consist of (possibly) structured terms, that may include variables. constant number(0). number(s(X)) :- number(X).
  • 30. INTRODUCTION Background A peek into my work Conclusions TERMS, LITERALS AND VARIABLES Literals can consist of (possibly) structured terms, that may include variables. number(0). number(s(X)) :- number(X). term
  • 31. INTRODUCTION Background A peek into my work Conclusions TERMS, LITERALS AND VARIABLES Literals can consist of (possibly) structured terms, that may include variables. number(0). number(s(X)) :- number(X). variables
  • 32. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation
  • 33. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation number(X) → number(0)
  • 34. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation number(X) → number(s(X)) →
  • 35. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation number(X) → number(s(X)) →
  • 36. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation number(X) → number(s(X)) → number(s(0))
  • 37. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation number(X) → number(s(X)) →
  • 38. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation number(X) → number(s(X)) →
  • 39. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation number(X) → number(s(X)) → number(s(s(X))) →
  • 40. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation number(X) → number(s(X)) → number(s(s(X))) →
  • 41. INTRODUCTION Background A peek into my work Conclusions RESOLUTION Problems are stated as theorems (goals) to be proved, e.g., number(X) To prove a consequent, we recursively need to prove the antecedents by using rules where these appear as consequents, number(0). number(s(X)) :- number(X). Solutions number(X) → X = 0 number(X) → X = s(0) number(X) → X = s(s(0)) . . . Derivation number(X) → number(s(X)) → number(s(s(X))) → number(s(s(0)))
  • 42. INTRODUCTION Background A peek into my work Conclusions DERIVATION TREES AND EXPLANATION GRAPHS Consider the following program which adds natural numbers: add(0+0,0). add(A+s(B),s(C)) :- add(A+B,C). add(s(A)+B,s(C)) :- add(A+B,C). And suppose we call the goal, add(s(s(0))+s(s(0)),R) We now have two alternative applicable clauses, alternatives Resulting in either, add(s(0)+s(s(0)),s(R)) or add(s(s(0))+s(0),s(R))
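Querying the program directly illustrates the point: every derivation computes the same sum, but along a different path (the query below is only an illustration of the add/2 program above):

    ?- add(s(s(0))+s(s(0)), R).
    R = s(s(s(s(0)))) .

    % On backtracking the same answer is found again and again, once per
    % distinct derivation, because at every step either the left or the
    % right argument can be decremented. The number of derivations grows
    % exponentially with the size of the arguments.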
  • 43. INTRODUCTION Background A peek into my work Conclusions DERIVATION TREE (diagram: the derivation tree for add(s(s(0))+s(s(0)),R), with root s(s(0))+s(s(0)) and subgoals such as s(0)+s(0), 0+s(0) and 0+0 repeated in several branches)
  • 44. INTRODUCTION Background A peek into my work Conclusions DERIVATION TREE (the same derivation tree) Exponential!
  • 45. INTRODUCTION Background A peek into my work Conclusions EXPLANATION GRAPH Polynomial: O(n ∗ m), but would be O(n + m) if arguments were ordered by size
  • 46. INTRODUCTION Background A peek into my work Conclusions TABLING Idea The system maintains a table of calls and their answers. when a new call is entered, check if it is stored in the table if so, use previously found solution Consequence: Explanation graph representation. Significant speed-up of program execution.
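As a concrete, standard illustration of table declarations in B-Prolog/XSB-style systems (an example of mine, not taken from the thesis): without the table directive the naive Fibonacci predicate below makes an exponential number of calls, while with it each subgoal fib(I,F) is computed once and every later occurrence is answered from the table, giving linear time.

    :- table fib/2.

    fib(0, 1).
    fib(1, 1).
    fib(N, F) :-
        N > 1,
        N1 is N - 1, N2 is N - 2,
        fib(N1, F1), fib(N2, F2),
        F is F1 + F2.

    % ?- fib(30, F).
    % F = 1346269.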
  • 47. INTRODUCTION Background A peek into my work Conclusions PROBABILISTIC LOGIC PROGRAMMING Probabilistic logic programming is a form of logic programming which deals with uncertainty. A logic program induces a set of possible worlds, i.e., the set of derivable consequents and their alternative proofs. Probabilistic logic programming extends logic programming by assigning probabilities to each of these possible worlds and extends logical inference into probabilistic inference, e.g., to derive the probability of a goal, infer the most probable derivation of a goal, or infer the affinities (represented by probabilities) for different possible worlds from data
  • 48. INTRODUCTION Background A peek into my work Conclusions PRISM PRogramming In Statistical Modelling is a framework for probabilistic logic programming Developed by collaboration partners of the Lost project: Yoshitaka Kameya, Taisuke Sato, and Neng-Fa Zhou. An extension of Prolog with random variables, called MSWs Provides efficient generalized inference algorithms (Viterbi, EM, etc) using tabling PRISM program = probabilistic model
  • 49. INTRODUCTION Background A peek into my work Conclusions HIDDEN MARKOV MODEL EXAMPLE Postcard Greetings from wherever, where I am having a great time. Here is what I have been doing: The first two days, I stayed at the hotel reading a good book. Then, on the third day I decided to go shopping. The next three days I did nothing but lie on the beach. On my last day, I went shopping for some gifts to bring home and wrote you this postcard. Sincerely, Some friend of yours Observation sequence
  • 50. INTRODUCTION Background A peek into my work Conclusions HIDDEN MARKOV MODEL run Definition A run of an HMM is a pair consisting of a sequence of states s(0) s(1) . . . s(n), called a path, and a corresponding sequence of emissions e(1) . . . e(n), called an observation, such that s(0) = s0; ∀i, 0 ≤ i ≤ n − 1, p(s(i); s(i+1)) > 0 (probability to transit from s(i) to s(i+1)); ∀i, 0 < i ≤ n, p(s(i); e(i)) > 0 (probability to emit e(i) from s(i)). Definition The probability of such a run is defined as ∏i=1..n p(s(i−1); s(i)) · p(s(i); e(i))
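As a small worked instance (with made-up probabilities): for a run with path s0 s(1) s(2) and observation e(1) e(2), the product above is p(s0; s(1)) · p(s(1); e(1)) · p(s(1); s(2)) · p(s(2); e(2)); with assumed values 0.3, 0.1, 0.6 and 0.4 this gives 0.3 · 0.1 · 0.6 · 0.4 = 0.0072.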
  • 51. INTRODUCTION Background A peek into my work Conclusions DECODING WITH HIDDEN MARKOV MODELS Infer the hidden path given the observation sequence. argmax_path P(path|observation) source: wikipedia The Viterbi algorithm can be seen as keeping track of, for each prefix of an observed emission sequence, the most probable (partial) path leading to each possible state, and extending those step by step into longer paths, eventually covering the entire emission sequence.
  • 52. INTRODUCTION Background A peek into my work Conclusions EXAMPLE HMM IN PRISM values/2 declares the outcomes of random variables msw/2 simulates a random variable, stochastically selecting one of the outcomes Model in Prolog Specifies relation between variables Example HMM in PRISM values(trans(_), [sunny,rainy]). values(emit(_), [shop,beach,read]). hmm(L):- run_length(T),hmm(T,start,L). hmm(0,_,[]). hmm(T,State,[Emit|EmitRest]) :- T > 0, msw(trans(State),NextState), msw(emit(NextState),Emit), T1 is T-1, hmm(T1,NextState,EmitRest). run_length(7).
  • 53. INTRODUCTION Background A peek into my work Conclusions EXAMPLE HMM IN PRISM values/2 declares the outcomes of random variables msw/2 simulates a random variable, stochastically selecting one of the outcomes Model in Prolog Specifies relation between variables Example HMM in PRISM values(trans(_), [sunny,rainy]). values(emit(_), [shop,beach,read]). hmm(L):- run_length(T),hmm(T,start,L). hmm(0,_,[]). hmm(T,State,[Emit|EmitRest]) :- T > 0, msw(trans(State),NextState), msw(emit(NextState),Emit), T1 is T-1, hmm(T1,NextState,EmitRest). run_length(7).
  • 54. INTRODUCTION Background A peek into my work Conclusions EXAMPLE HMM IN PRISM values/2 declares the outcomes of random variables msw/2 simulates a random variable, stochastically selecting one of the outcomes Model in Prolog Specifies relation between variables Example HMM in PRISM values(trans(_), [sunny,rainy]). values(emit(_), [shop,beach,read]). hmm(L):- run_length(T),hmm(T,start,L). hmm(0,_,[]). hmm(T,State,[Emit|EmitRest]) :- T > 0, msw(trans(State),NextState), msw(emit(NextState),Emit), T1 is T-1, hmm(T1,NextState,EmitRest). run_length(7).
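Given such a program, PRISM's generic inference machinery applies directly. The queries below sketch typical usage, assuming PRISM's prob/2, viterbif/1 and learn/1 built-ins; consult the PRISM manual for the exact call forms.

    % Probability of a particular week of observations:
    ?- prob(hmm([read,read,shop,beach,beach,beach,shop]), P).

    % Most probable explanation (Viterbi) for the same observation,
    % i.e. the most likely hidden weather sequence:
    ?- viterbif(hmm([read,read,shop,beach,beach,beach,shop])).

    % EM learning of the msw parameters from a list of observed goals:
    ?- learn([hmm([read,read,shop,beach,beach,beach,shop]),
              hmm([shop,shop,read,read,beach,beach,shop])]).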
  • 55. INTRODUCTION Background A peek into my work Conclusions A PEEK INTO MY WORK Overview of papers A few selected cases: An abstraction: Constrained HMMs (also an optimization) An optimization: Regarding tabling of structured data A couple of applications: Genome models Using constrained probabilistic models for gene finding with overlapping genes Gene finding with a probabilistic model of the genome-wide sequence of gene reading frames
  • 56. INTRODUCTION Background A peek into my work Conclusions PAPERS 1 1. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit Taming the Zoo of Discrete HMM Subspecies & some of their Relatives Frontiers in Artificial Intelligence and Applications, 2011 2. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit Inference with constrained hidden Markov models in PRISM Theory and Practice of Logic Programming, 2010 3. Christian Theil Have Constraints and Global Optimization for Gene Prediction Overlap Resolution Workshop on Constraint Based Methods for Bioinformatics, 2011
  • 57. INTRODUCTION Background A peek into my work Conclusions PAPERS 2 4. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit The Viterbi Algorithm expressed in Constraint Handling Rules 7th International Workshop on Constraint Handling Rules, 2010 5. Christian Theil Have and Henning Christiansen Modeling Repeats in DNA Using Probabilistic Extended Regular Expressions Frontiers in Artificial Intelligence and Applications, 2011 6. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit Bayesian Annotation Networks for Complex Sequence Analysis Technical Communications of the 27th International Conference on Logic Programming (ICLP’11)
  • 58. INTRODUCTION Background A peek into my work Conclusions PAPERS 3 7. Henning Christiansen, Christian Theil Have, Ole Torp Lassen and Matthieu Petit A declarative pipeline language for big data analysis Presented at LOPSTR, 2012 8. Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data Using Indexing and Program Transformation Practical Aspects of Declarative Languages, 2012 9. Neng-Fa Zhou and Christian Theil Have Efficient tabling of structured data with enhanced hash-consing Theory and Practice of Logic Programming, 2012
  • 59. INTRODUCTION Background A peek into my work Conclusions PAPERS 4 10. Christian Theil Have and Søren Mørk A Probabilistic Genome-Wide Gene Reading Frame Sequence Model Submitted to PLOS One, 2012 11. Christian Theil Have, Sine Zambach and Henning Christiansen Effects of using Coding Potential, Sequence Conservation and mRNA Structure Conservation for Predicting Pyrrolysine Containing Genes Submitted to BMC Bioinformatics, 2012
  • 60. INTRODUCTION Background A peek into my work Conclusions THE TROUBLE WITH TABLING OF STRUCTURED DATA
  • 61. INTRODUCTION Background A peek into my work Conclusions THE TROUBLE WITH TABLING OF STRUCTURED DATA An innocent looking predicate: last/2 last([X],X). last([_|L],X) :- last(L,X). Traverses a list to find the last element. Time/space complexity: O(n). If we table last/2: n + (n − 1) + (n − 2) + . . . + 1 ≈ O(n²)! For the call last([1,2,3,4,5],X) the table ends up holding a variant for every suffix: last([1,2,3,4,5],X), last([1,2,3,4],X), last([1,2,3],X), last([1,2],X), last([1],X).
  • 62. INTRODUCTION Background A peek into my work Conclusions A WORKAROUND IMPLEMENTED IN PROLOG We describe a workaround giving O(1) time and space complexity for table lookups for programs with arbitrarily large ground structured data as input arguments. A term is represented as a set of facts. A subterm is referenced by a unique integer serving as an abstract pointer. Matching related to tabling is done solely by comparison of such pointers.
  • 63. INTRODUCTION Background A peek into my work Conclusions AN ABSTRACT DATA TYPE The representation is given by the following predicates which all together can be understood as an abstract datatype. store_term( +ground-term, pointer) The ground-term is any ground term, and the pointer returned is a unique reference (an integer) for that term. retrieve_term( +pointer, ?functor, ?arg-pointers-list) Returns the functor and a list of pointers to representations of the substructures of the term represented by pointer. full_retrieve_term( +pointer, ?ground-term) Returns the term represented by pointer.
  • 64. INTRODUCTION Background A peek into my work Conclusions ADT EXAMPLE Example The following call converts the term f(a,g(b)) into its internal representation and returns a pointer value in the variable P. store_term(f(a,g(b)),P). After this, the following sequence of calls will succeed. retrieve_term(P,f,[P1,P2]), retrieve_term(P1,a,[]), retrieve_term(P2,g,[P21]), retrieve_term(P21,b,[]), full_retrieve_term(P,f(a,g(b))).
  • 65. INTRODUCTION Background A peek into my work Conclusions AN AUTOMATIC PROGRAM TRANSFORMATION We introduce an automatic transformation from a tabled program to an efficient version using our approach. Structured terms are moved from the head of clauses to calls in the body to retrieve_term/3 and full_retrieve_term/2.
  • 66. INTRODUCTION Background A peek into my work Conclusions TRANSFORMED HIDDEN MARKOV MODEL We only need to consider the recursive predicate, hmm/2. original version hmm(_,[]). hmm(S,[Ob|Y]) :- msw(out(S),Ob), msw(tr(S),Next), hmm(Next,Y). transformed version hmm(_,ObsPtr) :- retrieve_term(ObsPtr,[],[]). hmm(S,ObsPtr) :- retrieve_term(ObsPtr,'.',[ObPtr,Y]), retrieve_term(ObPtr,Ob,[]), msw(out(S),Ob), msw(tr(S),Next), hmm(Next,Y).
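A minimal usage sketch, assuming that the observation list is first converted with store_term/2 and that s0 is the name of the initial state (the state and symbol names are only illustrative):

    ?- store_term([shop,beach,read], ObsPtr),
       prob(hmm(s0, ObsPtr), P).

    % All tabled calls to hmm/2 now carry the integer pointer ObsPtr (and
    % pointers to its suffixes) instead of copies of the list, so table
    % lookups compare integers rather than traversing structured terms.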
  • 67. INTRODUCTION Background A peek into my work Conclusions BENCHMARKING RESULTS ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 1000 2000 3000 4000 5000 020406080100120140 b) Running time without indexed lookup sequence length Runningtime(seconds) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 1000 2000 3000 4000 5000 0.000.020.040.060.08 a) Running time with indexed lookup sequence length Runningtime(seconds)
  • 68. INTRODUCTION Background A peek into my work Conclusions THE NEXT STEP Integration at the Prolog engine implementation level. Neng-Fa Zhou and Christian Theil Have Efficient tabling of structured data with enhanced hash-consing Theory and Practice of Logic Programming, 2012 Full sharing between tables (call and answer) Sharing with structured data in call stack
  • 69. INTRODUCTION Background A peek into my work Conclusions CONSTRAINED HMMS Definition A constrained HMM (CHMM) is an HMM extended with a set of constraints C, each of which is a mapping from HMM runs into {true, false}. A run ⟨path, observation⟩ of a CHMM is a run of the corresponding HMM for which C(path, observation) is true.
  • 70. INTRODUCTION Background A peek into my work Conclusions CONSTRAINED HMMS Why extend an HMM with side-constraints? To create better, more specific models with fewer states Convenient to express prior knowledge in terms of constraints No need to change underlying HMM Sometimes it is not possible or feasible to express such constraints as HMM structure (e.g. all_different) → infeasibly huge state and parameter space Fewer paths to consider for any given sequence → decreased running time
  • 71. INTRODUCTION Background A peek into my work Conclusions PAIR HMMS FOR SEQUENCE ALIGNMENT A pair HMM is a special kind of HMM that emits two sequences
  • 72. INTRODUCTION Background A peek into my work Conclusions PAIR HMMS FOR SEQUENCE ALIGNMENT A pair HMM is a special kind of HMM that emits two sequences The match state emits a pair (xi, yj) of symbols
  • 73. INTRODUCTION Background A peek into my work Conclusions PAIR HMMS FOR SEQUENCE ALIGNMENT A pair HMM is a special kind of HMM that emits two sequences The match state emits a pair (xi, yj) of symbols The insert state emits one symbol xi from sequence x
  • 74. INTRODUCTION Background A peek into my work Conclusions PAIR HMMS FOR SEQUENCE ALIGNMENT A pair HMM is a special kind of HMM that emits two sequences The match state emits a pair (xi, yj) of symbols The insert state emits one symbol xi from sequence x The delete state emits one symbol yj from sequence y
  • 75. INTRODUCTION Background A peek into my work Conclusions PAIR HMMS FOR SEQUENCE ALIGNMENT A pair HMM is a special kind of HMM that emits two sequences The match state emits a pair (xi, yj) of symbols The insert state emits one symbol xi from sequence x The delete state emits one symbol yj from sequence y A run of this model produces an alignment of x and y
  • 76. INTRODUCTION Background A peek into my work Conclusions ALIGNMENT WITH A CONSTRAINED PAIR HMM Consider adding constraints to the pair HMM introduced earlier. For instance: In a biological context, we may want to only consider alignments with a limited number of insertions and deletions given the assumption that the two sequences are closely related. C = {cardinality_atmost(Nd, [S1, . . . , Sn], delete), cardinality_atmost(Ni, [S1, . . . , Sn], insert)}. The constraint cardinality_atmost(N, L, X) is satisfied whenever L is a list of elements, out of which at most N are equal to X.
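To make the semantics concrete before the incremental checker shown below, here is a straightforward, non-incremental Prolog reading of cardinality_atmost/3 over a complete sequence (an illustrative sketch, not the thesis implementation; member/2 and length/2 are standard list predicates):

    % cardinality_atmost(+N, +List, +X): at most N elements of List are equal to X.
    cardinality_atmost(N, List, X) :-
        findall(E, (member(E, List), E == X), Es),
        length(Es, Count),
        Count =< N.

    % ?- cardinality_atmost(2, [match,insert,match,insert,match], insert).  succeeds
    % ?- cardinality_atmost(1, [match,insert,match,insert,match], insert).  fails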
  • 77. INTRODUCTION Background A peek into my work Conclusions ALIGNMENT WITH CONSTRAINTS
  • 78. INTRODUCTION Background A peek into my work Conclusions ADDING CONSTRAINT CHECKING TO THE HMM HMM with constraint checking hmm(T,State,[Emit|EmitRest],StoreIn) :- T > 0, msw(trans(State),NxtState), msw(emit(NxtState),Emit), check_constraints([NxtState,Emit],StoreIn,StoreOut), T1 is T-1, hmm(T1,NxtState,EmitRest,StoreOut). Call to check_constraints/3 after each distinct sequence of msw applications Side-constraints: The constraints are assumed to be declared elsewhere and not interleaved with model specification Extra Store argument in the probabilistic predicate
  • 79. INTRODUCTION Background A peek into my work Conclusions CHECKING THE CONSTRAINTS The goal check_constraints/3 calls constraint checkers for all constraints declared on the model. For instance, with our example pair HMM constraint, C = {cardinality_atmost(Nd, [S1, . . . , Sn], delete), cardinality_atmost(Ni, [S1, . . . , Sn], insert)}. We have the following incremental constraint checker implementation A cardinality_atmost constraint checker init_constraint_store(cardinality_atmost(_,_), 0). check_sat(cardinality_atmost(U,Max), U, In, Out) :- Out is In + 1, Out =< Max. check_sat(cardinality_atmost(X,_),U,S,S) :- X \= U.
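Folding the incremental checker over a path shows how the constraint store (here just a counter) evolves; the driver below is only an illustration, since in the model check_constraints/3 is called step by step as shown above:

    run_checker(_Constraint, [], Store, Store).
    run_checker(Constraint, [U|Us], In, Out) :-
        check_sat(Constraint, U, In, Mid),
        run_checker(Constraint, Us, Mid, Out).

    % ?- init_constraint_store(cardinality_atmost(delete,2), S0),
    %    run_checker(cardinality_atmost(delete,2),
    %                [match,delete,match,delete,match], S0, S).
    % S0 = 0, S = 2.    % a third delete in the path would make the check fail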
  • 80. INTRODUCTION Background A peek into my work Conclusions A LIBRARY OF GLOBAL CONSTRAINTS FOR HIDDEN MARKOV MODELS Our implementation contains a few well-known global constraints adapted to Hidden Markov Models. Global constraints cardinality lock_to_sequence all_different lock_to_set In addition, the implementation provides operators which may be used to apply constraints to a limited set of variables. Constraint operators state_specific emission_specific forall_subseq (sliding window operator) for_range (time step range operator)
  • 81. INTRODUCTION Background A peek into my work Conclusions TABLING ISSUES Problem: The extra Store argument makes PRISM table multiple goals (for different constraint stores) when it should only store one. hmm(T,State,[Emit|EmitRest],Store) To get rid of the extra argument, check_constraints dynamically maintains it as a stack using assert/retract: check_constraints(Update) :- get_store(StoreBefore), check_constraints(Update,StoreBefore,StoreAfter), forward_store(StoreAfter). get_store(S) :- store(S), !. forward_store(S) :- asserta(store(S)) ; retract(store(S)).
  • 82. INTRODUCTION Background A peek into my work Conclusions IMPACT OF USING A SEPARATE CONSTRAINT STORE STACK
  • 83. INTRODUCTION Background A peek into my work Conclusions DISCUSSION AND LIMITATIONS B-Prolog has since added an nt tabling mode which avoids tabling of selected arguments, implemented at the level of the Prolog system. However: Avoiding tabling does not work for all types of constraints For some constraints, it only works under certain assumptions about the model No interaction between constraints in this implementation For tabled constraints we need a canonical representation Pruning of non-essential parts of the constraint store
  • 84. INTRODUCTION Background A peek into my work Conclusions GENOME MODELS Gene finding in a genomic context What are the constraints between adjacent genes in the genome? Extent of (possible) overlap Modeled as hard constraints Gene reading frames, e.g., due to leading strand bias, operons, etc. Modeled as (probabilistic) soft constraints
  • 85. INTRODUCTION Background A peek into my work Conclusions AN APPLICATION OF CONSTRAINED MARKOV MODELS We wish to incorporate overlapping gene constraints into gene finding. Divide and conquer two step approach to gene finding: 1. Gene prediction: A gene finder supplies a set of candidate predictions p1 . . . pn, called the initial set. 2. Pruning: The initial set is pruned according to certain rules or constraints. We call the pruned set the final set.
  • 86. INTRODUCTION Background A peek into my work Conclusions PRUNING STEP AS A CONSTRAINT OPTIMIZATION PROBLEM CSP formulation We introduce variables X = x1 . . . xn corresponding to each prediction p1 . . . pn in the initial set. All variables have boolean domains, ∀xi ∈ X, D(xi) = {true, false} and xi = true ⇔ pi ∈ final set. Multiple solutions We want the “best” solution Optimize for prediction confidence scores Constraint Optimization Problem (COP) COP formulation Let the scores of p1 . . . pn be s1 . . . sn and si ∈ R+. Maximize the total score of the final set, ∑{ si | xi = true }, subject to C.
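For comparison, such a COP can also be written down directly with a finite-domain constraint solver. The sketch below uses SWI-Prolog's library(clpfd) with 0/1 variables, integer-scaled scores and an assumed list of index pairs of mutually inconsistent predictions; it is not the Markov-chain encoding used in the thesis, which is described on the following slides.

    :- use_module(library(clpfd)).

    % Scores: integer-scaled confidence scores s1..sn.
    % Pairs:  I-J index pairs of predictions that may not both be selected.
    prune(Scores, Pairs, Selection, Total) :-
        length(Scores, N),
        length(Selection, N),
        Selection ins 0..1,
        maplist(exclusive(Selection), Pairs),
        scalar_product(Scores, Selection, #=, Total),
        labeling([max(Total)], Selection).

    exclusive(Selection, I-J) :-
        nth1(I, Selection, Xi),
        nth1(J, Selection, Xj),
        Xi + Xj #=< 1.

    % ?- prune([10,7,5,3], [1-2, 2-3], Sel, Total).
    % Sel = [1,0,1,1], Total = 18.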
  • 87. INTRODUCTION Background A peek into my work Conclusions VARIABLE ORDERING Assume an ordering on the variables, Initial set predictions p1 . . . pn are sorted by the position of their left-most base, such that ∀pi, pj, i < j ⇒ left-most(pi) ≤ left-most(pj). The variables x1 . . . xn of the CSP/COP are given the same ordering.
  • 88. INTRODUCTION Background A peek into my work Conclusions COP IMPLEMENTATION WITH MARKOV CHAIN (1) We propose to use a (constrained) Markov chain for the COP. The Markov chain has a begin state, an end state and two states for each variable xi corresponding to its boolean domain D(xi). The state corresponding to D(xi) = true is denoted αi and the state corresponding to D(xi) = false is denoted βi. In this model, a path from the begin state to the end state corresponds to a potential solution of the CSP.
  • 89. INTRODUCTION Background A peek into my work Conclusions FROM CONFIDENCE SCORES TO TRANSITION PROBABILITIES P(α1|begin) = σ1 P(β1|begin) = 1 − σ1 P(end|αn) = P(end|βn) = 1. P(αi|αi−1) = P(αi|βi−1) = σi P(βi|αi−1) = P(βi|βi−1) = 1 − σi σi = 0.5 + λ + (0.5 − λ) × (si − min(s1 . . . sn)) / (max(s1 . . . sn) − min(s1 . . . sn))
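A small helper predicate makes the score-to-probability mapping tangible; this is an illustrative sketch following the formula above, assuming min_list/2 and max_list/2 from library(lists) and a score list with at least two distinct values:

    % sigma(+Score, +Scores, +Lambda, -Sigma)
    % Scales a raw prediction score linearly into a transition probability
    % between 0.5 + Lambda (worst score) and 1.0 (best score).
    sigma(S, Scores, Lambda, Sigma) :-
        min_list(Scores, Min),
        max_list(Scores, Max),
        Sigma is 0.5 + Lambda + (0.5 - Lambda) * (S - Min) / (Max - Min).

    % ?- sigma(7.0, [2.0, 7.0, 12.0], 0.05, Sigma).
    % Sigma = 0.775.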
  • 90. INTRODUCTION Background A peek into my work Conclusions ENCODING CONSTRAINTS WITH CONSTRAINT HANDLING RULES Constraints: alpha/2 and beta/2 ≈ visited states. Example: Genemark inconsistency rules alpha(Left1,Right1), alpha(Left2,Right2) <=> Left1 =< Left2, Right1 >= Right2 | fail. beta(Left1,Right1), alpha(Left2,Right2) <=> Left1 =< Left2, Right1 >= Right2 | fail. The most probable consistent path is found using PRISM's generic adaptation of the Viterbi algorithm Each step adds either an alpha or a beta (active) constraint Incremental Pruning: For each step we only apply constraints which may be transitively involved in rules with the active constraint
  • 91. INTRODUCTION Background A peek into my work Conclusions EXPERIMENTAL RESULTS Prediction on E. coli using a simplistic codon-frequency-based gene finder. Pruning using our global optimization approach (with all inconsistency rules) versus local heuristic rules².
    Method                #predictions  Sensitivity  Specificity  Time (seconds)
    initial set           10799         0.7625       0.2926       n/a
    Genemark rules        5823          0.7558       0.5379       1.4
    ECOGENE rules         4981          0.7148       0.5947       1.7
    global optimization   5222          0.7201       0.5714       75
    Sensitivity = fraction of known reference genes predicted. Specificity = fraction of predicted genes that are correct.
    ² Note that the results for the ECOGENE heuristic may vary depending on execution strategy: in the case of the above results, predictions with lower left position are considered first.
  • 92. INTRODUCTION Background A peek into my work Conclusions A MODEL FOR THE GENOME-WIDE SEQUENCE OF READING FRAMES We wish to incorporate gene reading frame constraints into gene finding. Divide and conquer two step approach to gene finding (again): 1. Gene prediction: A gene finder supplies a set of candidate predictions p1 . . . pn, called the initial set. 2. Pruning: The initial set is pruned according to gene finder confidence scores and the probabilities of adjacent gene reading frames. We call the pruned set the final set.
  • 93. INTRODUCTION Background A peek into my work Conclusions METHODOLOGY Gene predictions are sorted by stop codon position. Gene finder scores are discretized into symbolic values. A type of Hidden Markov Model which we call a delete-HMM: a state for each of the six possible reading frames and one delete state (state diagram: F1–F6 and delete)
  • 94. INTRODUCTION Background A peek into my work Conclusions MODEL Emission: finite set of symbols δ1 . . . δn corresponding to ranges of prediction scores Frame state transitions: relative frequency of "observed" adjacent gene reading frame pairs Transition to delete: P(δi | state = delete) = FPδi / FP (tunable) (state diagram: F1–F6 and delete)
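In PRISM terms, the state and emission spaces of such a delete-HMM could be declared roughly as below; the state and symbol names are illustrative, and the actual frameseq implementation may differ:

    % Six reading-frame states plus a delete state.
    values(trans(_), [f1, f2, f3, f4, f5, f6, delete]).

    % Discretized prediction-score symbols (the number of score ranges
    % is a modelling choice).
    values(emit(_), [delta1, delta2, delta3, delta4]).

Decoding such a model with Viterbi and discarding the predictions that end up in the delete state would then realise the pruning step.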
  • 95. INTRODUCTION Background A peek into my work Conclusions RESULTS (ROC curves: Sensitivity (TPR) against 1 − Specificity (FPR) as the threshold is varied, for frameseq trained on Escherichia, Salmonella, Legionella, Bacillus and Thermoplasma)
  • 96. INTRODUCTION Background A peek into my work Conclusions CONCLUSIONS 1. To what extent is it possible to use probabilistic logic programming for biological sequence analysis? 2. How can constraints relevant to the domain of biological sequence analysis be combined with probabilistic logic programming? 3. What are the limitations with regard to efficiency and how can these be dealt with?
  • 97. INTRODUCTION Background A peek into my work Conclusions TO WHAT EXTENT IS IT POSSIBLE TO USE PROBABILISTIC LOGIC PROGRAMMING FOR BIOLOGICAL SEQUENCE ANALYSIS? Commonly used models for biological analysis can be implemented with probabilistic logic programming Probabilistic logic programming is a powerful tool for experimenting with new kinds of models Efficiency is an issue, but with tabling optimizations it is efficient enough for many interesting problems Not merely a powerful abstraction, but a valuable and practical tool for biological sequence analysis
  • 98. INTRODUCTION Background A peek into my work Conclusions HOW CAN CONSTRAINTS RELEVANT TO THE DOMAIN OF BIOLOGICAL SEQUENCE ANALYSIS BE COMBINED WITH PROBABILISTIC LOGIC PROGRAMMING? Probabilistic logic programming is a suitable language for building higher level abstractions A variety of models from biological sequence analysis can be expressed, e.g., ZOO paper Constrained Hidden Markov Models Probabilistic Regular Expressions (Probabilistic) soft constraints and hard constraints Side-constraints or as part of model Constraints affect the search space
  • 99. INTRODUCTION Background A peek into my work Conclusions WHAT ARE THE LIMITATIONS WITH REGARD TO EFFICIENCY AND HOW CAN THESE BE DEALT WITH? Tabling issues Discriminating arguments (Christiansen and Gallagher) Tabling of structured data Tabling of constraint stores Constraints. Can be useful for reducing the search space Can make search space exponential Problem decomposition with Bayesian Annotation Networks Approximation Feasibility of inference with complex models Automatic parallelization – BANpipe