A proof of concept of an early warning system that simulates users and spots potential dangers before they occur. In particular, we aim to assist and alert users so that they avoid dangerous situations, which is especially important when dealing with impaired individuals.
An Early Warning System for Ambient Assisted Living
Andrea Monacchi
School of Computer Science
Reykjavik University
Menntavegur 1, IS-101, Iceland
andrea11@ru.is
June 4th, 2012
4. Introduction Background Related Work Approach Implementation Evaluating the solution Conclusions
Motivation
Life expectancy increased significantly → more and more elderly people
http://www.minutewomen.net
Many elderly people live on their own.
may be affected by a cognitive or physical impairment
may need assistance to ensure their health, safety and well-being
Assistive technologies help reduce the cost of dedicated caregivers.
Daily life activities at home can generate dangers that may lead to accidents.
http://www.boomers-with-elderly-parents.com/
People with impairments find it difficult to notice those situations.
Discovering dangers and warning users is important for preventing accidents.
Problem
Problem statement
Monitoring world changes:
being aware of the current context
predicting intentions leading to dangers
“Let’s suppose we have a way to recognize the current state and user’s goal.”
An early warning system is about:
Finding a safe path leading to the goal (i.e. simulating the user)
Disclosing dangers close to the user
Preventing dangers by alerting the user beforehand
One day in the future
It is like taking a look at the future to improve the present.
http://s2.thisnext.com
“Here is the thing about the future. Every time you look at it, it changes, because you looked at it, and that changes everything else.” (from the movie Next)
Research statement
Design a system that:
Gets a representation of the environment as input
Learns to evaluate states according to their danger level
Explores/interacts with the environment model
Stores its experience
Guides and Alerts the user to prevent potential dangers
unitedshutdownsafety.com
We need to:
Represent the environment in terms of properties
Implement a decision maker that evaluates the danger level
Evaluate the effectiveness of the system
Context-aware computing
Context is the way to produce unobtrusive systems.
Situational information (environment, user, ICT)
Understanding the human intent in order to act properly
Reducing interaction and disappearing into the environment
Context adaptation: (adaptive systems)
Planning agents
Machine learning agents
learning user’s preferences
tailored and adaptive service
Context prediction: (proactive systems)
Anticipating future contexts
Proactive adaptation of services
e.g. heating based on next activity
Specifying dynamical systems
Knowledge representation
e.g. Situation calculus, Event calculus, Fluent calculus
The Game Description Language
First order logic and purely axiomatic language
GDL-I: deterministic and fully observable games; GDL-II: games with imperfect information
Games as state machines
State: set of fluents (holding properties)
Each Player selects an action to modify the global state
Specification of multiagent societies as games
Declarative language: agents learn to behave from rules
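The state-machine view above can be sketched in a few lines of Python. This is only an illustration: the fluent names, the single `user` role and the `next_state` helper are assumptions made for the example and do not come from the presented system.

```python
from typing import Dict, FrozenSet

State = FrozenSet[str]      # a state is the set of fluents that currently hold
JointMove = Dict[str, str]  # one selected action per role


def next_state(state: State, moves: JointMove) -> State:
    """Hypothetical update rule for a kitchen with a single stove."""
    fluents = set(state)
    action = moves.get("user")
    if action == "switch_stove_on":
        fluents.add("stove_on")
    elif action == "switch_stove_off":
        fluents.discard("stove_on")
    # Fluents not touched by the action persist unchanged.
    return frozenset(fluents)


s0 = frozenset({"pot_on_stove"})
s1 = next_state(s0, {"user": "switch_stove_on"})
s2 = next_state(s1, {"user": "switch_stove_off"})
```

Using immutable sets keeps states hashable, so they can later serve as keys of a value table.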
GDL relations: an example
role(?r)        ?r is a player
init(?f)        ?f holds in the initial position
true(?f)        ?f holds in the current position
legal(?r,?m)    role ?r can perform the move ?m
does(?r,?m)     role ?r does move ?m
next(?f)        ?f holds in the next position
terminal        the state is terminal
goal(?r,?v)     role ?r gets the reward ?v
sees(?r,?p)     the role ?r perceives ?p in the next turn
random          the random player

(role x)
(role o)
(init (cell 1 1 b))
(<= (legal ?player (mark ?x ?y))
    (true (cell ?x ?y b))
    (true (control ?player)))
(<= (next (cell ?x ?y ?player))
    (does ?player (mark ?x ?y)))
(<= (goal ?player 100)
    (line ?player))
(<= terminal
    (role ?player)
    (line ?player))
(<= terminal
    (not open))
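To make the semantics of the `legal` and `next` rules above concrete, here is a rough Python mirror of them. It is a sketch under assumptions: the slide only shows the `init` fact for one cell, so the fully blank 3x3 board, the persistence of untouched cells and the alternation of `control` are filled in here as assumed rules, and the function names are hypothetical.

```python
def initial_state():
    # Assumption: every cell starts blank ("b") and x moves first.
    cells = {("cell", x, y, "b") for x in (1, 2, 3) for y in (1, 2, 3)}
    return frozenset(cells | {("control", "x")})


def legal_moves(state, player):
    # legal(?player, mark(?x, ?y)) holds if the cell is blank
    # and ?player has control.
    if ("control", player) not in state:
        return []
    blanks = [f for f in state if f[0] == "cell" and f[3] == "b"]
    return sorted(("mark", x, y) for _, x, y, _ in blanks)


def apply_mark(state, player, move):
    # next(cell(?x, ?y, ?player)) holds if ?player does mark(?x, ?y).
    _, x, y = move
    # Assumed frame rule: all other cells keep their content.
    kept = {f for f in state if f[0] == "cell" and (f[1], f[2]) != (x, y)}
    kept.add(("cell", x, y, player))
    # Assumed rule: control passes to the other player.
    kept.add(("control", "o" if player == "x" else "x"))
    return frozenset(kept)


s0 = initial_state()
s1 = apply_mark(s0, "x", ("mark", 1, 1))
```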
Learning to make complex decisions
Decision making: making a choice among several alternatives.
The real environment is stochastic.
Acting may imply unexpected effects
The same behaviour may yield different scores
Deterministic planners (e.g. online replanning) may not be enough
Various solutions:
Markov Decision Processes (MDPs)
MDP = (S, A, R, T)
Partially Observable MDPs
sensor model for (belief) states
Computing a policy:
Dynamic programming → complete transition model
Optimization methods → search for a policy
Reinforcement Learning
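As an illustration of the dynamic programming route, which needs the complete transition model, the following sketch runs value iteration on a toy MDP. The three states, the two actions, the probabilities and the rewards are invented for the example; only the discount factor 0.95 echoes a value used later in the slides.

```python
def q_value(V, s, a, T, reward, gamma):
    """Expected return of taking action a in state s under estimate V."""
    return sum(p * (reward(s2) + gamma * V[s2]) for s2, p in T[(s, a)].items())


def value_iteration(states, actions, T, reward, terminal, gamma=0.95, tol=1e-9):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in terminal:
                continue  # terminal states keep value 0
            best = max(q_value(V, s, a, T, reward, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(V, s, a, T, reward, gamma))
        for s in states
        if s not in terminal
    }
    return V, policy


# Toy model (assumed): T[(state, action)] maps next states to probabilities.
STATES = ["start", "goal", "danger"]
ACTIONS = ["safe", "risky"]
T = {
    ("start", "safe"): {"goal": 0.8, "start": 0.2},
    ("start", "risky"): {"goal": 0.5, "danger": 0.5},
}


def toy_reward(next_state):
    return {"goal": 1.0, "danger": -1.0}.get(next_state, -0.01)


V, policy = value_iteration(STATES, ACTIONS, T, toy_reward, {"goal", "danger"})
```

In this toy model the risky action has an expected return of zero, so the computed policy prefers the slower but safe action.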
Related work
RL-GGP: integrating GGP and reinforcement learning (Jocular+RL-Glue)
Assisted Living with MDPs
Handwashing tutoring system
Prompting aids
Minimizing intrusiveness and maximizing completed handwashing
Questionnaire for system-caregiver comparison
Control tasks
Smart light control (e.g. MAVHOME)
Energy saving
Maximizing comfort and minimizing interaction
Visual and audio cues to notify dangers beforehand
Rule-based risk assessment
User study to understand how users perceive and react to notifications
Modeling a domestic environment
Simulating user’s behaviour
Classification of people’s actions in domestic environments:
Action                            Examples
Position changes                  left, right, forward and backward
Manipulation of passive objects   take an apple, hold an apple, release the apple
Interaction with active objects   switch a stove on/off, open/close a cupboard
Table: Actions in a domestic context
Game Description Language for modeling the domestic setting.
Using tools from the General Game Playing context
Leading expertise of Reykjavik University
Designing an early warning system
Guiding the user
Planning problem: finding a path of actions leading to the goal
Search driven by danger level: the path must avoid dangers
Environment is stochastic → Deterministic planners may not be enough
Probabilistic planning by means of MDPs and POMDPs
Solution is a behaviour/policy covering each state
Computing a policy
TD-learning → model-free algorithms (no prior knowledge of the model required)
Q-learning
Off-policy method
Action selection can be guided by a pseudorandom strategy (e.g. ε-greedy)
Thus more flexible, less realistic, and slower than on-policy ones
General way to perform planning in stochastic environments
Storing experience:
Tabular version (e.g. Hash table)
Knowledge as entries (state,action) → value
Requires filling entries → infeasible for big state spaces
Function approximator: allows the learner to generalize from experience
linear (e.g. weighted sum of features)
non-linear (e.g. neural network)
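A minimal sketch of tabular Q-learning with ε-greedy selection follows. It is not the QBox implementation: the one-dimensional corridor with a danger at one end and a goal at the other, the reward values and all names are assumptions chosen to keep the example small.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
DANGER, START, GOAL = 0, 2, 4  # positions on a hypothetical corridor


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    nxt = state + (1 if action == "right" else -1)
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == DANGER:
        return nxt, -1.0, True
    return nxt, -0.01, False


def epsilon_greedy(Q, state, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit


def q_learning(episodes=500, alpha=0.2, gamma=0.95, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # tabular store: (state, action) -> value
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap on episode length
            action = epsilon_greedy(Q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # Off-policy target: best next value, whatever is done next.
            best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
            if done:
                break
    return Q


Q = q_learning()
greedy_at_start = max(ACTIONS, key=lambda a: Q[(START, a)])
```

The table only holds entries for visited state-action pairs, which is why this representation stops scaling once the state space grows.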
Warning the user
Monitoring user’s sphere of protection
Finding dangerous states within sphere
Alerting the user when too close
Distance = number of actions to risk
First action of each risky sequence
Variant of breadth-first search → limited depth
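The sphere-of-protection check can be sketched as a depth-limited breadth-first search that reports, for each first action that starts a risky sequence, the number of actions to the nearest danger. The function name, the `successors` callback and the small kitchen graph are assumptions made for illustration.

```python
from collections import deque


def risky_first_actions(state, successors, is_dangerous, max_depth):
    """Map each risky first action to its distance (in actions) to a danger."""
    risks = {}
    visited = set()  # (first_action, state) pairs already expanded
    queue = deque([(state, None, 0)])
    while queue:
        current, first, depth = queue.popleft()
        if depth == max_depth:
            continue  # edge of the sphere of protection
        for action, nxt in successors(current):
            head = action if first is None else first
            if is_dangerous(nxt):
                # BFS order: the first hit per head is the shortest one.
                risks.setdefault(head, depth + 1)
                continue
            if (head, nxt) not in visited:
                visited.add((head, nxt))
                queue.append((nxt, head, depth + 1))
    return risks


# Hypothetical state graph: state -> list of (action, next_state).
GRAPH = {
    "kitchen": [("take_bottle", "holding_bottle"), ("walk_away", "hall")],
    "holding_bottle": [
        ("put_near_stove", "bottle_near_stove"),
        ("put_in_cupboard", "bottle_stored"),
    ],
    "bottle_near_stove": [("switch_stove_on", "fire")],
    "hall": [("walk_back", "kitchen")],
    "bottle_stored": [],
    "fire": [],
}


def successors(state):
    return GRAPH[state]


def is_dangerous(state):
    return state == "fire"


far = risky_first_actions("kitchen", successors, is_dangerous, max_depth=3)
near = risky_first_actions("bottle_near_stove", successors, is_dangerous, 1)
```

Tracking visited states per first action, rather than globally, avoids missing a risky first action whose path shares states with a path already explored.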
Practical reasoning with GDL
An overview of the system
The system consists of:
Practical reasoning with GDL
Game dynamics as state machine
Automatic reasoning tool: The General Game Playing Base package
Language modifications
goal → reward
danger relation
appliance and object to use certain rules
other agents as roles (e.g. telephone ringing)
Implementing a warning agent
QBox
Plenty of libraries and frameworks
However:
Need for a customizable tool
Simple implementation and learning experience
QBox library
TD(0), Q(0), Watkins Q(λ), SARSA
Figure: The QBox logo
Figure: The QBox organization
Implementing a warning agent
The warning agent
Warning process: running an episode
Tabular Q(λ) agent + depth-limited breadth-first search
Experience stored in the brain is used to evaluate and guide the actual user’s behaviour
System returns:
Last action evaluation
Best action
Danger level
Action to avoid
The user interface
Testing the system:
Providing awareness of current state
showing a view
using visual indicators
Simulating particular situations
Solution:
Virtual environments for simulating smart environments
Rapid prototyping technique in HCI
Flexible, fast and cheap
jMonkey engine
Figure: The GUI during a simulation
Evaluating the system: environment
Optimal policy specified by going through the state space
Deviation increases for unexplored states and wrong orders
1 Experiment = 20 policies trained for 200 episodes
Results reported as charts (jFreeChart library)
$\mathrm{ExpDev}(\%) = \frac{\mathrm{AvgDEV}}{AN} \cdot 100, \qquad \mathrm{AvgDEV} = \frac{1}{N} \sum_{k=1}^{N} \mathrm{dev}_k$
$\mathrm{Acc}(\%) = 100 - \mathrm{ExpDev}$
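The two measures can be computed as in the sketch below. The slide does not define AN, so the `an` parameter is treated here simply as the normalising constant from the formula, and the deviation values in the example are made up.

```python
def expected_deviation(deviations, an):
    """ExpDev(%) = (AvgDEV / AN) * 100, with AvgDEV the mean deviation."""
    avg_dev = sum(deviations) / len(deviations)
    return avg_dev / an * 100.0


def accuracy(deviations, an):
    """Acc(%) = 100 - ExpDev."""
    return 100.0 - expected_deviation(deviations, an)


# Invented numbers: four policies with deviations 2, 4, 0 and 2, AN = 10.
example_acc = accuracy([2, 4, 0, 2], an=10)
```

With these invented numbers the mean deviation is 2, so ExpDev is 20% and the accuracy is 80%.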
Evaluating the system: scenario
Domestic scenario as testing environment
User’s goal: cooking - using the pot and the stove
Danger: a flammable cleaning product
Evaluating the system: exploring the state space
Exploration of the state space: ε = 0.1, 0.3, 0.5, 0.7, 0.9, exponential decay 0.9999.
Parameter              Value
α (learning rate)      0.2
α-decay                0.8
α-decay type           exponential (ensures convergence)
γ (discount factor)    0.95
λ (decay rate)         0.9
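One common reading of an exponential decay is a multiplicative schedule, sketched below. The slides give the decay factors but not whether they are applied per step or per episode, so the `step` argument, the function name and the choice of 0.9 as a starting ε are assumptions.

```python
def decayed(initial, decay, step):
    """Value of a parameter after `step` multiplicative decays."""
    return initial * decay ** step


# Assumed usage: exploration rate starting at 0.9 with factor 0.9999,
# and learning rate starting at 0.2 with factor 0.8.
epsilon_after_200 = decayed(0.9, 0.9999, 200)
alpha_after_3 = decayed(0.2, 0.8, 3)
```

A factor as close to 1 as 0.9999 lowers the exploration rate very slowly, whereas a factor of 0.8 shrinks the learning rate within a handful of applications.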
Evaluating the system: defining rewards
Rewards determine system behaviour
Difficult task
May produce cycles in the policy
Main behaviours:
Take the bottle away from danger (danger matters)
Stove set on without the pot (goal matters)
No-danger/Goal   Danger/No-goal   No-danger/No-goal   Danger/Goal   Accuracy
-1.0             1.0              -0.01               0.0           44.09%
0.0              1.0              -0.01               0.0           39.63%
1.0              0.0              -0.01               0.0           71.01%
1.0              -1.0             -0.01               0.0           84.35%
0.0              -1.0             -0.01               0.0           75.88%
Table: Results for different reward functions
Results may be improved by increasing exploration
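A reward function of this kind can be written as a lookup on two state properties. The sketch uses the row of the table with the highest reported accuracy (1.0, -1.0, -0.01, 0.0); the boolean encoding and the function name are assumptions.

```python
# Keys are (in_danger, goal_reached); values follow the 84.35% row.
REWARDS = {
    (False, True): 1.0,     # no danger, goal reached
    (True, False): -1.0,    # danger, goal not reached
    (False, False): -0.01,  # neither: small cost per step
    (True, True): 0.0,      # goal reached while in danger
}


def reward(in_danger, goal_reached):
    return REWARDS[(in_danger, goal_reached)]
```

Keeping the four values in one table makes it straightforward to swap in another row and rerun the experiment.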
Assessing the interaction with users
Conclusions
System able to prevent users from getting too close to dangers
General solution: GDL definitions
Danger is evaluated automatically
Indicators report suggestions and warning notifications to users
Future work
Future work: learning to intervene
Need for a dynamic threshold to decide whether to intervene
Adapting to different preferences and awareness faculties
System trained by the end user accepting or rejecting the intervention
Tailored service
Lack of generality
Requires interaction with actual users
Future work
Implementing a function approximator and/or tile coding to scale the solution
Exploiting hierarchical approaches
Assigning rewards through apprenticeship learning
Taking habits into account for the exploration
Learning to intervene to minimize discomfort
Speeding the reasoning process up by using FPGAs
Using virtual environments as time machines for simulating future events
Questions
Thanks for your attention.
“An early warning system for Ambient Assisted Living”
Andrea Monacchi
andrea11@ru.is
http://andreamonacchi.tk