Optimization of probabilistic argumentation with Markov processes
E. Hadoux (1), A. Beynier (1), N. Maudet (1), P. Weng (2) and A. Hunter (3)
Tue., Sept. 29th
(1) Sorbonne Universités, UPMC Univ Paris 6, UMR 7606, LIP6, F-75005, Paris, France
(2) SYSU-CMU Joint Institute of Engineering, Guangzhou, China
SYSU-CMU Shunde International Joint Research Institute, Shunde, China
(3) Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
Introduction

∙ Debate argumentation problems between two agents
∙ Probabilistic executable logic to improve expressivity
∙ New class of problems: Argumentation Problem with Probabilistic Strategies (APS) (Hunter, 2014)
∙ Purpose of this work: optimize the sequence of arguments of one agent

Note: we will abuse the word "predicate" throughout.
Formalization of a debate problem

∙ Turn-based game between two agents
∙ Rules are fired to attack the opponent's arguments and to revise knowledge

Let us define a debate problem with:
∙ A, the set of arguments
∙ E, the set of attacks
∙ P = 2^A × 2^E, the public space gathering the voiced arguments and attacks
∙ Two agents: agent 1 and agent 2
Notation

∙ Arguments: literals (e.g., a, b, c)
∙ Attacks: e(x, y) if x attacks y
∙ Arguments in the public (resp. private) space: a(x) (resp. h_i(x))
∙ Goals: ∧_k g(x_k) (resp. g(¬x_k)) if x_k is (resp. is not) accepted in the public space (Dung, 1995)
∙ Rules: prem ⇒ Pr(Acts)
∙ Premises: conjunctions of e(·,·), a(·), h_i(·)
∙ Acts: conjunctions of ⊞, ⊟ on e(·,·), a(·) and of ⊕, ⊖ on h_i(·)
Formalization of an APS

An APS is characterized (from the point of view of agent 1) by the tuple ⟨A, E, G, S1, g1, g2, S2, P, R1, R2⟩:
∙ A, E, P as specified above
∙ G, the set of all possible goals
∙ S_i, the set of private states of agent i
∙ g_i ∈ G, the goal given to agent i
∙ R_i, the set of rules of agent i
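To make the tuple concrete, here is a minimal sketch of one possible in-memory representation of an APS instance. The type names (Predicate, Rule, APS) and field layout are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: one possible encoding of an APS instance.
# Names and field layout are assumptions, not the paper's implementation.
from dataclasses import dataclass
from typing import FrozenSet, Tuple, List

Predicate = str  # e.g. "a(c)", "e(f,a)", "h1(b)", "g(a)"

@dataclass(frozen=True)
class Rule:
    premises: FrozenSet[Predicate]                    # conjunction of e(.,.), a(.), h_i(.)
    acts: Tuple[Tuple[float, FrozenSet[str]], ...]    # [(prob, acts)], probabilities sum to 1

@dataclass
class APS:
    arguments: FrozenSet[str]                 # A
    attacks: FrozenSet[Tuple[str, str]]       # E, (x, y) means x attacks y
    goal1: FrozenSet[Predicate]               # g1, conjunction of g(x) / g(not x)
    goal2: FrozenSet[Predicate]               # g2
    rules1: List[Rule]                        # R1, agent 1's rules
    rules2: List[Rule]                        # R2, agent 2's rules
    # S1, S2 (private states) and P (public space) are induced by the predicates
```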
Example: Arguments

Is e-sport a sport?

a: E-sport is a sport
b: E-sport requires focusing and precision, and generates tiredness
c: Not all sports are physical
d: Sports not referenced by the IOC exist
e: Chess is a sport
f: E-sport is not a physical activity
g: E-sport is not referenced by the IOC
h: Working requires focusing and generates tiredness but is not a sport
Probabilistic Finite State Machine

To optimize the sequence of arguments of agent 1, we could optimize the PFSM, but:
1. it depends on the initial state
2. it requires knowledge of the opponent's private state

Using Markov models, we can relax requirements 1 and 2. Moreover, the APS formalization can be modified to comply with the Markov assumption.
Markov Decision Process

A Markov Decision Process (MDP) (Puterman, 1994) is characterized by a tuple ⟨S, A, T, R⟩:
∙ S, a set of states,
∙ A, a set of actions,
∙ T : S × A → Pr(S), a transition function,
∙ R : S × A → ℝ, a reward function.
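As a quick reminder of how such a model is exploited, here is a minimal value-iteration sketch for a finite MDP ⟨S, A, T, R⟩; the dictionary-based encodings of T and R are assumptions made for illustration, not tied to the paper.

```python
# Minimal value iteration for a finite MDP <S, A, T, R> (illustrative encoding).
# Assumed encoding: T[s][a] is a dict {s2: prob}, R[s][a] is a float.
def value_iteration(S, A, T, R, gamma=0.95, eps=1e-6):
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                for a in A
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V  # V[s] approximates the optimal expected discounted return
```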
Partially-Observable Markov Decision Process

A Partially-Observable MDP (POMDP) (Puterman, 1994) is characterized by a tuple ⟨S, A, T, R, O, Q⟩:
∙ S, a set of states,
∙ A, a set of actions,
∙ T : S × A → Pr(S), a transition function,
∙ R : S × A → ℝ, a reward function,
∙ O, a set of observations,
∙ Q : S × A → Pr(O), an observation function.
Mixed-Observability Markov Decision Process

A Mixed-Observability MDP (MOMDP) (Ong et al., 2010) is characterized by a tuple ⟨Sv, Sh, A, T, R, Ov, Oh, Q⟩:
∙ Sv, Sh, the visible and hidden parts of the state,
∙ A, a set of actions,
∙ T : Sv × A × Sh → Pr(Sv × Sh), a transition function,
∙ R : Sv × A × Sh → ℝ, a reward function,
∙ Ov = Sv, the observation set on the visible part of the state,
∙ Oh, the observation set on the hidden part of the state,
∙ Q : Sv × A × Sh → Pr(Ov × Oh), an observation function.
Transformation to a MOMDP

An APS, from the point of view of agent 1, can be transformed into a MOMDP:
∙ Sv = S1 × P, Sh = S2
∙ A = {prem(r) ⇒ m | r ∈ R1 and m ∈ acts(r)}
∙ Ov = Sv and Oh = ∅
∙ Q(⟨sv, sh⟩, a, sv) = 1, and 0 otherwise
∙ T, the transition function, defined below
Transformation to a MOMDP: Transition function

Application set
Let C_s(R_i) be the set of rules of R_i that can be fired in state s. The application set F_r(m, s) is the set of predicates resulting from the application of act m of a rule r on s. If r cannot be fired in s, F_r(m, s) = s.

∙ s, a state, and r : p ⇒ m, an action such that r ∈ A
∙ s′ = F_r(m, s)
∙ r′ ∈ C_{s′}(R2) such that r′ : p′ ⇒ [π1/m1, …, πn/mn]
∙ s″_i = F_{r′}(m_i, s′)
∙ T(s, r, s″_i) = π_i
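Below is a hedged sketch of how T(s, r, ·) could be assembled from an agent-1 action and the opponent's probabilistic rules, following the two-step construction above. The helpers fire and can_fire stand in for F_r(m, s) and the membership test in C_s(R), and are hypothetical, as is the selection of a single applicable opponent rule r′.

```python
# Illustrative sketch: building the distribution T(s, r, .) for one agent-1 action.
# fire(m, s) and can_fire(prem, s) are hypothetical helpers standing for
# F_r(m, s) and the test "r can be fired in s" (i.e. membership in C_s(R)).
def transition(s, action, rules2, fire, can_fire):
    prem, m = action                                   # action = (prem(r) => m), r in R1
    s1 = fire(m, s) if can_fire(prem, s) else s        # s' = F_r(m, s)
    applicable = [r2 for r2 in rules2 if can_fire(r2.premises, s1)]  # C_{s'}(R2)
    if not applicable:
        return {s1: 1.0}                               # opponent has no answer
    r2 = applicable[0]                                 # assumed selection of r' in C_{s'}(R2)
    dist = {}
    for prob, m_i in r2.acts:                          # r' : p' => [pi_1/m_1, ..., pi_n/m_n]
        s2 = fire(m_i, s1)                             # s''_i = F_{r'}(m_i, s')
        dist[s2] = dist.get(s2, 0.0) + prob            # T(s, action, s''_i) = pi_i
    return dist
```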
Reward function

For the reward function:
∙ with Dung's semantics: a positive reward for each part of the goal that holds
∙ this can be generalized with the General Gradual Valuation (Cayrol and Lagasquie-Schiex, 2005)
Transformation to a MOMDP

Model sizes:
∙ APS: 8 arguments, 8 attacks, 6 rules
∙ POMDP: 4 294 967 296 states
∙ MOMDP: 16 777 216 states

Intractable instances → the problem needs to be optimized upstream, at the APS level.
Solving an APS

Two algorithms to solve MOMDPs:
∙ MO-IP (Araya-López et al., 2010), the IP algorithm for POMDPs adapted to MOMDPs (exact method)
∙ MO-SARSOP (Ong et al., 2010), the SARSOP algorithm for POMDPs adapted to MOMDPs (approximate yet very efficient method)

Two kinds of optimizations: with or without dependencies on the initial state.
Optimizations without dependencies

∙ Irr.: prunes irrelevant arguments
∙ Enth.: infers attacks
∙ Dom.: removes dominated arguments

These optimizations guarantee the uniqueness and optimality of the solution.
Attacks graph

Argument dominance: if an argument is attacked by an unattacked argument, it is dominated.

[Figure: attacks graph of the e-sport example, over arguments a–h]
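A small sketch of how this dominance check could be applied to an attack graph; the encoding of the graph as (attacker, target) pairs and the example edges are assumptions for illustration only.

```python
# Illustrative sketch of the Dom. optimization: an argument attacked by an
# unattacked argument is dominated and can be removed. Graph encoding assumed.
def dominated_arguments(arguments, attacks):
    """arguments: iterable of argument names; attacks: set of (attacker, target)."""
    attacked = {target for (_, target) in attacks}
    unattacked = set(arguments) - attacked
    return {target for (attacker, target) in attacks if attacker in unattacked}

# Hypothetical usage (edges are not the ones of the e-sport example):
# dominated_arguments({"x", "y", "z"}, {("x", "y"), ("y", "z")}) -> {"y"}
```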
Optimization with dependencies

Irr(s0) has to be reapplied each time the initial state changes.

1. For each predicate that is never modified but is used in premises:
   1.1 Remove all the rules that are not compatible with the value of this predicate in the initial state.
   1.2 For all remaining rules, remove the predicate from the premises.
2. For each remaining action of agent 1, track the rules of agent 2 that are compatible with the application of this action. If a rule of agent 2 is not compatible with any application of an action of agent 1, remove it.
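A hedged sketch of step 1 of Irr(s0), under the same assumed Rule structure as earlier: rules whose premises contradict a never-modified predicate of the initial state are dropped, and the remaining premises are simplified. The sets fixed_true and fixed_false are hypothetical inputs listing the never-modified predicates that are true or false in s0.

```python
# Illustrative sketch of step 1 of Irr(s0); Rule(premises, acts) is the assumed
# structure from the earlier sketch; fixed_true / fixed_false are hypothetical
# sets of never-modified predicates that are true / false in the initial state.
def prune_with_fixed_predicates(rules, fixed_true, fixed_false):
    kept = []
    for r in rules:
        # 1.1 drop rules whose premises require a predicate that is false in s0
        if any(p in fixed_false for p in r.premises):
            continue
        # 1.2 remove the now-redundant fixed predicates from the premises
        new_premises = frozenset(p for p in r.premises if p not in fixed_true)
        kept.append(type(r)(premises=new_premises, acts=r.acts))
    return kept
```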
Experiments

We computed a solution for the e-sport problem with:
∙ MO-IP, which did not finish after tens of hours
∙ MO-SARSOP without optimizations, which did not finish either
∙ MO-SARSOP with optimizations, which found the optimal solution in 4 seconds
Conclusion

We presented:
1. A new framework to represent more complex debate problems (APS)
2. A method to transform those problems into a MOMDP
3. Several optimizations that can be used outside the MOMDP context
4. A method to optimize the actions of an agent in an APS
Perspectives

We are currently working on using POMCP (Silver and Veness, 2010). We are also investigating the use of HS3MDPs (Hadoux et al., 2014).
Bibliography

Araya-López, M., Thomas, V., Buffet, O., and Charpillet, F. (2010). A closer look at MOMDPs. In 22nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI).

Cayrol, C. and Lagasquie-Schiex, M.-C. (2005). Graduality in argumentation. Journal of Artificial Intelligence Research (JAIR), 23:245–297.

Dung, P. M. (1995). On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321–358.

Hadoux, E., Beynier, A., and Weng, P. (2014). Solving Hidden-Semi-Markov-Mode Markov Decision Problems. In Straccia, U. and Calì, A., editors, Scalable Uncertainty Management, volume 8720 of Lecture Notes in Computer Science, pages 176–189. Springer International Publishing.

Hunter, A. (2014). Probabilistic strategies in dialogical argumentation. In International Conference on Scalable Uncertainty Management (SUM'14), LNCS volume 8720.

Ong, S. C., Png, S. W., Hsu, D., and Lee, W. S. (2010). Planning under uncertainty for robotic tasks with mixed observability. International Journal of Robotics Research.

Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons.

Silver, D. and Veness, J. (2010). Monte-Carlo planning in large POMDPs. In Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS), pages 2164–2172.