1. I think this paper, "Probabilistic Logic Learning", introduces approaches to combining logic, probability, and learning. More precisely, it is an overview of approaches that combine a knowledge representation framework such as first-order logic, an inference method such as that of Bayesian networks or hidden Markov models, and machine learning.
In the past few years there have been several attempts to solve the problem of probabilistic logic learning, and the authors survey this state of the art.
First, they split the logical view into model theory and proof theory. In the model-theoretic view, a logic program restricts the set of possible worlds, the so-called interpretations. In the proof-theoretic view, a logic program is used to prove that a goal is logically entailed by the program, via so-called resolution.
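To make the two views concrete, here is a minimal sketch of my own in Python (a toy propositional program, not an example from the paper): the model-theoretic view computes an interpretation by forward chaining, while the proof-theoretic view proves a goal by backward chaining.

    # Program: clauses as (head, [body atoms]); facts have an empty body.
    program = [
        ("bird", []),           # fact: bird.
        ("flies", ["bird"]),    # rule: flies :- bird.
        ("happy", ["flies"]),   # rule: happy :- flies.
    ]

    # Model-theoretic view: the least Herbrand model (an interpretation),
    # computed by forward chaining until a fixpoint is reached.
    def least_model(rules):
        model = set()
        changed = True
        while changed:
            changed = False
            for head, body in rules:
                if head not in model and all(a in model for a in body):
                    model.add(head)
                    changed = True
        return model

    # Proof-theoretic view: prove a goal by SLD-style backward chaining.
    def prove(goal, rules):
        return any(all(prove(a, rules) for a in body)
                   for head, body in rules if head == goal)

    print(least_model(program))     # {'bird', 'flies', 'happy'}
    print(prove("happy", program))  # True: happy is entailed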
Second, the formalisms for defining probabilities on each logical view are introduced. One is the Bayesian network, which describes conditional probability distributions; the other is the stochastic context-free grammar, which defines a probability for each derivation.
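To see what "probability on derivations" means, here is a small sketch with a toy grammar of my own (not the paper's example): the probability of a derivation is simply the product of the probabilities of the rules used in it.

    # Rules: nonterminal -> list of (probability, expansion); the
    # probabilities of the rules for one nonterminal sum to 1.
    scfg = {
        "S":  [(0.7, ["NP", "VP"]), (0.3, ["VP"])],
        "NP": [(1.0, ["john"])],
        "VP": [(0.6, ["runs"]), (0.4, ["sleeps"])],
    }

    def derivation_probability(rule_choices):
        """rule_choices: list of (nonterminal, rule index) used in a derivation."""
        p = 1.0
        for nt, i in rule_choices:
            p *= scfg[nt][i][0]
        return p

    # The derivation S => NP VP => john VP => john runs uses rules
    # (S, 0), (NP, 0), (VP, 0), so its probability is 0.7 * 1.0 * 0.6.
    print(derivation_probability([("S", 0), ("NP", 0), ("VP", 0)]))  # about 0.42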
Third, approaches that combine probabilistic reasoning with logical representations are presented. Some approaches upgrade Bayesian networks for the model-theoretic view, but they inherit the Bayesian network's requirements. The first problem is guaranteeing the acyclicity requirement, which causes high computational cost. The second is that a random variable may depend on arbitrarily many others, which makes its conditional distribution complicated; this is solved by introducing combining rules and aggregate functions (a small sketch of a combining rule follows this paragraph). Peter Haddawy's approach came first, and later he and Ngo used combining rules to solve the second problem (probabilistic-logic programs). There are also Bayesian logic programs, which have a minimal set of primitives. Last are probabilistic relational models, which use the database's entity-relationship model; this makes them efficient to learn and helps to solve the acyclicity problem.
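Here is a small sketch of my own of one common combining rule, noisy-or (my choice for illustration; the paper discusses combining rules in general): it collapses any number of parent influences into one conditional probability, which a fixed-size conditional probability table cannot do.

    # Noisy-or: the child is true unless every parent's influence fails
    # independently; works for any number of parents.
    def noisy_or(parent_probs):
        p_all_fail = 1.0
        for p in parent_probs:
            p_all_fail *= (1.0 - p)
        return 1.0 - p_all_fail

    print(noisy_or([0.9]))            # one parent: 0.9
    print(noisy_or([0.9, 0.5, 0.3]))  # three parents: about 0.965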
For the proof-theoretic view, PRISM and stochastic logic programs (SLPs) are introduced. SLPs upgrade stochastic context-free grammars. There is a problem that failed derivations lose probability mass, so normalization is needed; in PRISM, attaching the probability labels to facts avoids this problem. What distinguishes these approaches from the previous ones is that they put more stress on probabilistic than on logical inference.
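The following sketch uses made-up numbers of my own (not the paper's example) to show the mechanism: each derivation gets the product of the labels of the clauses it uses, and because failed derivations lose probability mass, the successful ones are normalized so that they sum to 1.

    # Raw products of clause labels for three derivations of one goal.
    raw = {"d1": 0.24, "d2": 0.36, "d3_failed": 0.40}

    # Only d1 and d2 succeed; the failed derivation's mass is discarded.
    successes = {d: p for d, p in raw.items() if not d.endswith("_failed")}
    Z = sum(successes.values())                        # 0.6
    normalized = {d: p / Z for d, p in successes.items()}
    print(normalized)   # about {'d1': 0.4, 'd2': 0.6}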
The above two families are expressive but not efficient, so intermediate representations are used. A Markov model is a probabilistic finite state automaton. There are relational Markov models, which do not allow variables and unification, and logical hidden Markov models, which do. Hidden tree Markov models can define distributions over spaces of trees and graphs.
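To illustrate the difference, here is a heavily simplified sketch of my own of the idea behind logical hidden Markov models: transitions are defined between abstract states (atoms with variables), and a variable shared by the source and target atoms carries its binding across a transition, which is what relational Markov models cannot express.

    import random

    # Abstract transitions between states emacs(F) and latex(F); the shared
    # variable F is preserved along each transition.
    transitions = {
        "emacs": [(0.7, "latex"), (0.3, "emacs")],
        "latex": [(0.6, "latex"), (0.4, "emacs")],
    }

    def sample_trace(state, arg, steps):
        """Sample a ground trace like emacs(f1), latex(f1), ..."""
        trace = [f"{state}({arg})"]
        for _ in range(steps):
            probs, nexts = zip(*transitions[state])
            state = random.choices(nexts, weights=probs)[0]
            trace.append(f"{state}({arg})")  # binding of F carried along
        return trace

    print(sample_trace("emacs", "f1", 4))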
2. Learning probabilistic logics is the process by which one adapts a probabilistic model described in the previous part on the basis of data. The distinction between learning from interpretations and learning from entailment is useful from the inductive logic programming perspective.
There are two learning tasks: the parameter estimation problem and structure learning. One can estimate the parameters by maximizing the likelihood of the data. When the data are only partially observed, there is the EM algorithm, which is famous for iterating until convergence. For parameter estimation in Bayesian networks there are the methods discussed in Heckerman's tutorial, and Cussens introduced failure-adjusted maximization, which fixes the logical part of a stochastic logic program and learns only its parameters.
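As a reminder of how EM works, here is a minimal sketch on a toy problem of my own (two coins whose identity is hidden, not an example from the paper): the E-step fills in the hidden choice with its posterior probability, and the M-step re-estimates the parameters from those expected counts.

    from math import comb

    # Data: heads counts in groups of n = 10 flips; which of two coins
    # produced each group is hidden. Estimate each coin's heads probability.
    data = [9, 8, 5, 4, 7]
    n = 10
    pA, pB = 0.6, 0.5                      # initial guesses

    def lik(p, h):                         # binomial likelihood of h heads
        return comb(n, h) * p**h * (1 - p)**(n - h)

    for _ in range(20):
        hA = tA = hB = tB = 0.0
        for h in data:
            rA = lik(pA, h) / (lik(pA, h) + lik(pB, h))   # E-step
            hA += rA * h; tA += rA * (n - h)
            hB += (1 - rA) * h; tB += (1 - rA) * (n - h)
        pA = hA / (hA + tA)                # M-step
        pB = hB / (hB + tB)

    print(round(pA, 2), round(pB, 2))      # converges to about 0.80 and 0.52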
Structure learning is to find the optimal hypothesis satisfying the cover criterion and a scoring function. It can be seen as applying refinement operators to the program, which add or delete parts of its structure.
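Here is a small sketch of my own of what a refinement operator can look like, assuming a clause is represented as a head plus a list of body literals: specialization adds a literal from a fixed vocabulary, generalization deletes one, and a search would score each candidate and keep the best.

    VOCAB = ["parent(X,Y)", "male(X)", "female(Y)"]

    def refinements(clause):
        head, body = clause
        candidates = []
        for lit in VOCAB:                     # specialize: add a literal
            if lit not in body:
                candidates.append((head, body + [lit]))
        for i in range(len(body)):            # generalize: delete a literal
            candidates.append((head, body[:i] + body[i + 1:]))
        return candidates

    for c in refinements(("father(X,Y)", ["parent(X,Y)"])):
        print(c)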
In Bayesian networks this happens at the propositional level. In probabilistic relational models, aggregate literals can be added or deleted. Bayesian logic programs are refined similarly to ILP, but the operator has to avoid violating the Bayesian network's requirements. In stochastic logic programs, the operators apply to regular clauses; there are variants, such as Muggleton's two-phase approach and an approach that integrates both phases for single-predicate learning. Last are logical and relational Markov models, where refinement concerns the transitions between the abstract states.
I think this paper integrates various concepts and compares many approaches. Some of the materials in this course are part of this integrated concept of "probabilistic logic learning", so the paper explains the relationships between these works. For example, probabilistic relational models (PRMs) are explained in the first-order probabilistic logics chapter. PRMs use the entity-relationship model instead of definite clause logic, which allows storing multiple attributes per entity and expressing dependencies at the attribute level.
Inductive logic programming has been implemented in applications such as CIGOL, Golem, and Progol. I think probabilistic logic learning is an extended version of inductive logic programming with probabilistic methods. The important approaches are learning Bayesian networks, stochastic logic programs, and Markov models. These all put probabilistic structure on interpretations, proofs, and traces, respectively. So, by using this structure in the ILP way, they can learn from examples that are closer to the real world.
3. This paper is well written and makes explicit assumptions about the boundaries of this introductory survey. I think this research field is important because it asks how we can overcome the difficulties of real-world learning (or data mining), which deals with very complicated data.
But there are some portions that need more detailed explanation, which I do not understand thoroughly: the stochastic logic program and PRISM parts. I still do not see where the value 0.624 comes from. I also found it somewhat inconvenient to have to consult so many referenced materials.