# Stochastic Definite Clause Grammars

### Stochastic Definite Clause Grammars

1. Stochastic Definite Clause Grammars. InterLogOnt, Nov 24, Saarbrücken. Christian Theil Have, cth@ruc.dk
2. What and why?
    - DCG syntax: convenient, expressive, flexible
    - Probabilistic model: polynomial parsing, parameter learning, robust
    - Together: Stochastic Definite Clause Grammars
3. DCG grammar rules
    - Definite Clause Grammars
        - Grammar formalism on top of Prolog
        - Production rules with unification variables
        - Context-sensitive (actually stronger)
        - Exploits the unification semantics of Prolog

    Simple DCG grammar:

        sentence --> subject(N), verb(N), object.
        subject(sing) --> [he].
        subject(plur) --> [they].
        object --> [cake].
        object --> [food].
        verb(sing) --> [eats].
        verb(plur) --> [eat].

    Difference list representation:

        sentence(L1,L4) :-
            subject(N,L1,L2),
            verb(N,L2,L3),
            object(L3,L4).
        subject(sing,[he|R],R).
        ...
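
    The grammar on this slide can be tried out in plain Prolog. The following is a minimal sketch, assuming a system with standard DCG translation such as SWI-Prolog; `sentence_dl/2` is a hypothetical name for the hand-written difference-list version of the first rule, and it relies on the translated predicates `subject/3`, `verb/3` and `object/2`.

    ```prolog
    % Number agreement is enforced by the shared variable N.
    sentence --> subject(N), verb(N), object.

    subject(sing) --> [he].
    subject(plur) --> [they].

    verb(sing) --> [eats].
    verb(plur) --> [eat].

    object --> [cake].
    object --> [food].

    % Hand-written difference-list form of the sentence rule (hypothetical name).
    % Each nonterminal takes the input list and the remainder left after parsing.
    sentence_dl(L1, L4) :-
        subject(N, L1, L2),
        verb(N, L2, L3),
        object(L3, L4).
    ```

    With this loaded, `phrase(sentence, [he, eats, cake])` succeeds, while `phrase(sentence, [he, eat, cake])` fails because `sing` and `plur` do not unify. `sentence_dl([they, eat, food], [])` succeeds in the same way as the DCG version.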
4. Stochastic Definite Clause Grammars
    - Implemented as a DCG compiler, with some extensions to DCG syntax
    - Transforms a DCG (grammar) into a stochastic logic program implemented in PRISM
    - Probabilistic inference and parameter learning are then performed using PRISM

    (S)DCG → Compilation → PRISM program
5. Compilation process
6. PRISM (http://sato-www.cs.titech.ac.jp/prism/)
    - Extends Prolog with random variables (msws in PRISM lingo)
    - Performs probabilistic inference over such programs:
        - Probability calculation: probability of a derivation
        - Viterbi: find the most probable derivation
        - EM learning: learn parameters from a set of example goals

    PRISM program example: Bernoulli trials

        target(ber,2).
        values(coin,[heads,tails]).
        :- set_sw(coin, 0.6+0.4).
        ber(N,[R|Y]) :-
            N>0,
            msw(coin,R),   % Probabilistic choice
            N1 is N - 1,
            ber(N1,Y).     % Recursion
        ber(0,[]).
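
    To make "probability of a derivation" concrete, here is a small plain-Prolog sketch that does not use PRISM at all. It assumes the coin distribution from the slide (0.6 for heads, 0.4 for tails); `coin_prob/2` and `ber_prob/2` are hypothetical names introduced only for illustration. The probability of a sequence of outcomes is the product of the probabilities of the individual choices.

    ```prolog
    % Hypothetical stand-in for the switch set by set_sw(coin, 0.6+0.4).
    coin_prob(heads, 0.6).
    coin_prob(tails, 0.4).

    % ber_prob(+Outcomes, -P): probability that successive coin choices
    % produce exactly Outcomes, i.e. the product of the choice probabilities.
    ber_prob([], 1.0).
    ber_prob([R|Rs], P) :-
        coin_prob(R, P0),
        ber_prob(Rs, P1),
        P is P0 * P1.
    ```

    For example, `ber_prob([heads, tails], P)` multiplies 0.6 by 0.4, giving a value of about 0.24.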
7. The probabilistic model

    One random variable encodes the probability of expansion for rules with the same functor/arity:

        s(N) ==> np(N).
        s(N) ==> np(N),vp(N).

    The choice is made by a selection rule. The selected rule is invoked through unification. The transformation yields:

        target(s,2).
        values(s,[s1,s2]).

    Selection rule:

        s(A,B) :- msw(s,Outcome), s(Outcome, A, B).

    Implementation rules:

        s(s1, A, B) :- np(_, A, B).
        s(s2, A, B) :- np(N, A, D), vp(N, D, B).
8. Unification failure
    - Since SDCG embodies unification constraints, some derivations may fail.
    - We only observe the successful derivations in sample data.
    - If the training algorithm only considers successful derivations, it will converge to a wrong probability distribution (missing probability mass).
    - In PRISM this is handled using the fgEM algorithm, which is based on Cussens' Failure-Adjusted Maximization (FAM) algorithm.
    - A "failure program" which traces all derivations is derived using First Order Compilation (FOC), and the probabilities of failed derivations are estimated as part of the fgEM algorithm.

    Figure labels: all derivations, failed derivations.
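
    The missing probability mass can be seen in a very small example. The sketch below is plain Prolog, not PRISM, and assumes a hypothetical two-word grammar in which the subject and the verb are each chosen with probability 0.5; `subj/4`, `verb/4`, `s/3` and `success_mass/1` are illustrative names, and `sum_list/2` is taken from SWI-Prolog's `library(lists)`.

    ```prolog
    :- use_module(library(lists)).   % sum_list/2 (SWI-Prolog)

    % Each fact: nonterminal(Number, Input, Rest, ProbabilityOfThisChoice).
    subj(sing, [he|R],   R, 0.5).
    subj(plur, [they|R], R, 0.5).
    verb(sing, [eats|R], R, 0.5).
    verb(plur, [eat|R],  R, 0.5).

    % A derivation chooses a subject and a verb; it only succeeds
    % when the two choices agree on N.
    s(L0, L, P) :-
        subj(N, L0, L1, P1),
        verb(N, L1, L, P2),
        P is P1 * P2.

    % Total probability of all successful derivations.
    success_mass(Mass) :-
        findall(P, s(_, [], P), Ps),
        sum_list(Ps, Mass).
    ```

    Of the four equally likely combinations of choices, only "he eats" and "they eat" succeed, so `success_mass(M)` gives 0.5. The other half of the probability mass belongs to failed derivations, which is what a failure-adjusted training procedure has to account for.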
9. Unification failure issues
    - Infinite/long derivation paths
        - Impossible/difficult to derive the failure program.
        - Workaround: SDCG has an option which limits the depth of derivation.
        - Still, the size of the failure program is very much an issue.
    - FOC requirement: "universally quantified clauses"
        - Not the case with difference lists: `'C'([X|Y], X,Y).`
        - Workaround 1: trick the first order compiler by manually adding implications after the program is partly compiled. Works empirically, but may be dubious.
        - Workaround 2: append-based grammar. Works, but has inherent inefficiencies.
10. Syntax extensions
    - SDCG extends the usual DCG syntax
        - Compatible with DCG (superset)
    - Extensions:
        - Regular expression operators: convenient rule recursion
        - "Macros": allow writing rules as templates which are filled out according to certain rules
        - Conditioning: convenient expression of higher-order HMMs; lexicalization
11. Regular expression operators

    Regular expression operators can be associated with rule constituents:

        name ==> ?(title), +(firstname), *(lastname).

    Meaning:
    - `?` may be repeated zero or one times
    - `*` may be repeated zero or more times
    - `+` may be repeated one or more times

    The constituent in the original rule is replaced with a substitute which refers to intermediary rules, which implement the regular expression:

        regex_sub ==> []
        regex_sub ==> original_constituent
        regex_sub ==> regex_sub,regex_sub

    (In the slide, the labels ?, * and + mark which of these rules each operator uses.)

    Limitation: cannot be used in rules with unification variables.
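
    For comparison, the same three repetition patterns can be written by hand in an ordinary DCG. This is only an illustrative sketch in plain Prolog, not the expansion SDCG generates: it uses right-recursive helper rules, because a left-recursive rule such as `regex_sub ==> regex_sub,regex_sub` would not terminate under plain Prolog's depth-first execution without tabling. The rule is called `full_name` because `name/2` is a built-in predicate in many Prolog systems, and the vocabulary is made up.

    ```prolog
    % Hand-expanded equivalent of: name ==> ?(title), +(firstname), *(lastname).
    full_name --> opt_title, firstnames, lastnames.

    % ?(title): zero or one title.
    opt_title --> [].
    opt_title --> title.

    % +(firstname): one or more first names.
    firstnames --> firstname.
    firstnames --> firstname, firstnames.

    % *(lastname): zero or more last names.
    lastnames --> [].
    lastnames --> lastname, lastnames.

    % Made-up vocabulary, for illustration only.
    title --> [dr].
    firstname --> [anna].
    firstname --> [maria].
    lastname --> [smith].
    ```

    `phrase(full_name, [dr, anna, maria, smith])` and `phrase(full_name, [anna])` both succeed, while `phrase(full_name, [smith])` fails because at least one first name is required.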
12. Template macros

    Special goals prefixed with @ are treated as macros. Grammar rules with macros are dynamically expanded.

    Example:

        word(he,sg,masc).
        word(she,sg,fem).
        number(Word,Number) :- word(Word,Number,_).
        gender(Word,Gender) :- word(Word,_,Gender).
        wordlist(X,[X]).

    expand_mode determines which variables to keep (`-`: remove, `+`: insert):

        expand_mode(number(-, +)).
        expand_mode(gender(-, +)).
        expand_mode(wordlist(-, +)).

    Grammar rule with macros:

        word(@number(Word, N), @gender(Word,G)) ==> @wordlist(Word, WordList).

    A meta rule is created and called to find all answers:

        exp(Word, N, G, WordList) :- number(Word,N), gender(Word, G), wordlist(Word,WordList).

    Resulting grammar:

        word(sg,masc) ==> [ he ].
        word(sg,fem) ==> [ she ].
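
    The "create a meta rule and find all answers" step can be imitated in plain Prolog. The sketch below assumes nothing about how SDCG implements it internally: it reuses the facts and the `exp/4` meta rule from the slide and collects one expanded rule per answer with `findall/3`. `expanded_rules/1` and the `rule(Head, Body)` representation are hypothetical names chosen for illustration.

    ```prolog
    word(he, sg, masc).
    word(she, sg, fem).

    number(Word, Number) :- word(Word, Number, _).
    gender(Word, Gender) :- word(Word, _, Gender).
    wordlist(X, [X]).

    % Meta rule from the slide: conjunction of all macro goals in the rule.
    exp(Word, N, G, WordList) :-
        number(Word, N),
        gender(Word, G),
        wordlist(Word, WordList).

    % One rule(Head, Body) term per answer of the meta rule. Arguments marked
    % '-' in expand_mode (here Word) are dropped, '+' arguments are kept.
    expanded_rules(Rules) :-
        findall(rule(word(N, G), WordList),
                exp(_Word, N, G, WordList),
                Rules).
    ```

    Calling `expanded_rules(Rules)` yields `[rule(word(sg,masc),[he]), rule(word(sg,fem),[she])]`, which corresponds to the two rules of the resulting grammar.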
13. Conditioning

    A conditioned rule takes the form

        name(F1,F2,...,Fn) | V1,V2,...,Vn ==> C1,C2,...,Cn.

    The | operator can be seen as a guard that ensures the rule is only expanded if the conditions V1..Vn unify with F1..Fn.

    It is possible to specify which variables must unify using a condition_mode:

        condition_mode(n(+,+,-)).
        n(A,B,C) | x,y ==> c1, c2.

    Conditioned rules are grouped by non-terminal name and arity and always have the same number of conditions.

    Probabilistic semantics: a distinct probability distribution for each distinct set of conditions.
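14. Conditioning semantics

    Model without conditioning:

        n ==> n1.
        n ==> n2.
        n1 ==> ...
        ...

    Model with conditioning:

        n|a ==> n1(X).
        n|a ==> n2(X).
        n|b ==> n1(X).
        n|b ==> n2(X).
        ...

    Figure: selection trees for the two models. Without conditioning, n selects n1 or n2 by stochastic selection. With conditioning, n|a or n|b is selected using unification, followed by stochastic selection among that condition's own rules (node labels n1_1, n2_1, n1_2, n2_2).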
15. Example: simple toy grammar

        start ==> s(N).
        s(N) ==> np(N).
        s(N) ==> np(N),vp(N).
        np(N) ==> n(sg),n(N).
        np(N) ==> n(N).
        vp(N) ==> v(N),np(N).
        vp(N) ==> v(N).
        n(sg) ==> [time].
        n(pl) ==> [flies].
        v(sg) ==> [flies].
        v(sg) ==> [crawls].
        v(pl) ==> [fly].

    Probability of a sentence:

        | ?- prob(start([time,flies],[],Tree), P).
        P = 0.083333333333333 ?
        yes

    The most probable parse:

        | ?- viterbig(start([time,flies],[],Tree), P).
        Tree = [start,[[s(pl),[[np(pl),[[n(sg),[[]]],[n(pl),[[]]]]]]]]]
        P = 0.0625 ?
        yes

    Most probable parses (here, both of them):

        | ?- n_viterbig(10,start([time,flies],[],Tree), P).
        Tree = [start,[[s(pl),[[np(pl),[[n(sg),[[]]],[n(pl),[[]]]]]]]]]
        P = 0.0625 ?;
        Tree = [start,[[s(sg),[[np(sg),[[n(sg),[[]]]]],[vp(sg),[[v(sg),[[]]]]]]]]]
        P = 0.020833333333333 ?;
        no
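
    The numbers on this slide can be reproduced by hand in plain Prolog, without PRISM. The sketch below assumes that every group of rules with the same functor/arity has a uniform distribution (so 1/2 for each `s`, `np`, `vp` and `n` rule and 1/3 for each `v` rule); that assumption is not stated on the slide, but it is consistent with the reported values. `rule/3`, `derive/4` and `sentence_prob/2` are hypothetical names, and `sum_list/2` comes from SWI-Prolog's `library(lists)`.

    ```prolog
    :- use_module(library(lists)).   % sum_list/2 (SWI-Prolog)

    % rule(Head, Body, P): Body lists nonterminals and word(W) terminals.
    % P is the assumed uniform probability within Head's functor/arity group.
    rule(start, [s(_)],         1.0).
    rule(s(N),  [np(N)],        0.5).
    rule(s(N),  [np(N), vp(N)], 0.5).
    rule(np(N), [n(sg), n(N)],  0.5).
    rule(np(N), [n(N)],         0.5).
    rule(vp(N), [v(N), np(N)],  0.5).
    rule(vp(N), [v(N)],         0.5).
    rule(n(sg), [word(time)],   0.5).
    rule(n(pl), [word(flies)],  0.5).
    rule(v(sg), [word(flies)],  P) :- P is 1.0/3.0.
    rule(v(sg), [word(crawls)], P) :- P is 1.0/3.0.
    rule(v(pl), [word(fly)],    P) :- P is 1.0/3.0.

    % derive(Symbols, Words0, Words, P): expand Symbols left to right over the
    % difference list Words0-Words; P is the product of the rule probabilities.
    derive([], Ws, Ws, 1.0).
    derive([word(W)|Syms], [W|Ws0], Ws, P) :-
        derive(Syms, Ws0, Ws, P).
    derive([Sym|Syms], Ws0, Ws, P) :-
        rule(Sym, Body, P0),
        derive(Body, Ws0, Ws1, P1),
        derive(Syms, Ws1, Ws, P2),
        P is P0 * P1 * P2.

    % Sentence probability: sum over all successful derivations.
    sentence_prob(Words, P) :-
        findall(P0, derive([start], Words, [], P0), Ps),
        sum_list(Ps, P).
    ```

    For `[time, flies]` there are two successful derivations. The noun-noun reading has probability 0.5 × 0.5 × 0.5 × 0.5 = 0.0625, and the noun-verb reading has 0.5 × 0.5 × 0.5 × 0.5 × 1/3 ≈ 0.0208. `sentence_prob([time, flies], P)` sums them to about 0.0833, matching the `prob` and `n_viterbig` output above.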
16. More interesting example

    Simple part-of-speech tagger: fully connected first-order HMM.

        consume_word([Word]) :- word(Word).
        conditioning_mode(tag_word(+,-,-)).
        start(TagList) ==> tag_word(none,_,TagList).
        tag_word(Previous, @tag(Current), [Current|TagsRest]) | @tag(SomeTag) ==>
            @consume_word(W),
            ?(tag_word(Current,_,TagsRest)).

    Some tags:

        tag(none).
        tag(det).
        tag(noun).
        tag(verb).
        tag(modalverb).

    Some words:

        word(the).
        word(can).
        word(will).
        word(rust).
17. Questions?