SlideShare a Scribd company logo
1 of 19
© SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger
Explaining Online Reinforcement Learning Decisions
of Self-Adaptive Systems
Felix Feit, Andreas Metzger, Klaus Pohl
ACSOS 2022
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Agenda
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
A. Metzger, ACSOS 2022 2
1. Motivation
2. Explanation Approach “XRL-DINE“
3. Validation
4. Discussion and Outlook
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Online Reinforcement Learning for SAS
A. Metzger, ACSOS 2022
3
MAPE-K Combination
RL
Self-Adaptation Logic
Analyze
Monitor Execute
Plan
Knowledge
Online RL for SAS
Execute
Policy
(K)
Monitor
Action
Selection
(A + P)
Policy
Update
Action
A
State S
Reward R
Next state S’
Action A
State S
Reward R
Action
Selection
Next state S’
RL Agent
Policy
Policy
Update
Environ-
ment
• Online RL = Emerging approach for addressing design time uncertainty [Palm et al. @ CAiSE 2020]
 Leverage information only available @ runtime (i.e., during live system execution)
• Since 2019 use of Learning for SAS most prominent strategy [Porter et al. @ ACSOS 2020]
• Conceptual Model of Online RL:
[Metzger et al. @ Computing 2022]
Learning Goal
defined by
Reward
Function
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Online Reinforcement Learning for SAS
Policy (Knowledge) represented
as deep neural network
Pro
• Handling of continuous
states and actions
• Generalization over unseen,
neighboring states
Con
• Deep RL = “black box”
•  Limited trustworthiness
•  Difficult to debug
e.g., reward function
correctly defined?
A. Metzger, ACSOS 2022 4
Increasing use of Deep RL The power of deep learning:
“/imagine yellow Labrador in the style of…”
[generated using Midjourney AI]
UP RIGHT DOWN LEFT
-12,224348 -12,044198 -12,463232 -12,349292
-11,368766 -11,407281 -11,567699 -11,741966
-10,724603 -10,758294 -10,878073 -10,956671
-10,118299 -9,9104485 -10,209248 -9,9700161
-9,2663503 -9,0282991 -9,4003475 -9,4925009
-8,1970854 -8,145322 -8,3578677 -8,6139991
-7,5583819 -7,3056764 -7,4218645 -7,6021728
-6,5359939 -6,4629871 -6,6290838 -6,800472
-5,8610507 -5,6457718 -5,6477581 -6,1414037
-5,01 -4,8068304 -4,7890915 -4,8609271
-3,9 -3,902123 -3,9078592 -4,0513057
-3,1389 -3,273 -2,9849214 -3,3706948
-12,609306 -12,612216 -12,579926 -12,784373
-11,845763 -11,794108 -11,845683 -12,478945
-11,237922 -10,873018 -10,900784 -11,436694
-10,008564 -9,9425947 -9,9517958 -10,268937
-9,2581936 -8,9864275 -8,9889605 -9,1798613
-8,1866584 -7,9931344 -7,9940185 -8,9510004
-7,0729858 -6,9970914 -6,9977784 -8,2920288
-6,0319487 -5,9988816 -5,9989739 -6,6224484
-6,0996963 -4,9994285 -4,999519 -6,232246
-4,8057376 -3,9997425 -3,9997874 -4,9212681
-3,1293615 -2,9999285 -2,9999384 -3,556979
-2,7588749 -2,1 -2 -2,2345458
-13,360876 -12 -13,954412 -12,991696
-12,565624 -11 -112,1835 -12,995431
-11,733772 -10 -112,79659 -11,980454
-10,740429 -9 -111,48554 -10,947642
-9,95339 -8 -112,12695 -9,9878453
-8,9112 -7 -112,67844 -8,8890419
-7,992178 -6 -112,91331 -7,965498
-6,9846114 -5 -112,41604 -6,9763523
-5,9533325 -4 -111,81117 -5,9401313
-4,9217978 -3 -110,6219 -4,8068301
-3,9307738 -2 -112,85584 -3,9418704
-2,9796888 -1,9340884 -1 -2,9827181
-13 -112,90933 -13,998779 -13,995334
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
UP RIGHT DOWN LEFT
-12,224348 -12,044198 -12,463232 -12,349292
-11,368766 -11,407281 -11,567699 -11,741966
-10,724603 -10,758294 -10,878073 -10,956671
-10,118299 -9,9104485 -10,209248 -9,9700161
-9,2663503 -9,0282991 -9,4003475 -9,4925009
-8,1970854 -8,145322 -8,3578677 -8,6139991
-7,5583819 -7,3056764 -7,4218645 -7,6021728
-6,5359939 -6,4629871 -6,6290838 -6,800472
-5,8610507 -5,6457718 -5,6477581 -6,1414037
-5,01 -4,8068304 -4,7890915 -4,8609271
-3,9 -3,902123 -3,9078592 -4,0513057
-3,1389 -3,273 -2,9849214 -3,3706948
-12,609306 -12,612216 -12,579926 -12,784373
-11,845763 -11,794108 -11,845683 -12,478945
-11,237922 -10,873018 -10,900784 -11,436694
-10,008564 -9,9425947 -9,9517958 -10,268937
-9,2581936 -8,9864275 -8,9889605 -9,1798613
State
S
Action A
State
S
Action
A
Deep RL Classical RL (Q-Learning)
Environ-
ment
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Explainable Reinforcement Learning (XRL) for SAS
A. Metzger, ACSOS 2022 5
State of the Art
Goal-based Models [Welsh et al. @ Trans. CCI 2014]
• Explanations in terms of the satisficement of softgoals
• Requires making assumptions about environment dynamics at
design time (difficult due to design time uncertainty)
Provenance Graphs [Reynolds et al. @ MODELS Wkshp 2020]
• Graph history to determine if, how and why model changed
• Graph can become too complex to be meaningfully interpreted by humans
• Query language suggested for “pruning” graphs but not for explanations
Temporal Graph Models [Ullauri et al. @ SoSym 2022]
• Explicitly considers Deep RL
• Explanations via queries to model @ runtime
• Interesting points of interactions extracted via CEP
• No detailed, contrastive decomposition of explanations
Example: Vacuum Cleaner
Example:
Fibonacci
Example: Remote Data Mirroring
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Agenda
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
A. Metzger, ACSOS 2022 6
1. Motivation
2. Explanation Approach “XRL-DINE“
3. Validation
4. Discussion and Outlook
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
XRL-DINE
Reward Decomposition
[Sequeira et al. @ Artif. Intell. 2020]
Decompose reward function to explain short-
term goal orientation of RL (train sub-RL agents)
Pro
• Helpful in the presence of multiple, “competing”
quality goals for learning
• Provides contrastive (counterfactual) explanations
Con
• No indication of explanation’s relevance
• Requires manually selecting relevant explanations
 cognitive overhead
Interestingness Elements
[Juozapaitis et al. @ IJCAI Wkshp 2019]
Identify relevant moments of interaction
between agent and environment at runtime
Pro
• Facilitates automatically selecting relevant
interactions to be explained
Con
• Does not explain whether RL behaves as expected
and for the right reasons
7
Augment and Combine RL Explanation Techniques from AI Research
Decomposed Interestingness Elements (DINEs)
A. Metzger, ACSOS 2022
+
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
XRL-DINE
Internal Behaviour
Evolution of Reward
External Behaviour
Evolution of States and Actions
8
Understanding RL without DINEs?
A. Metzger, ACSOS 2022
R
S, A
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
XRL-DINE
Important Interaction
Determine whether RL in given state is
uncertain (wide range of actions) or
certain (almost always same action)
• How much does relative importance
of actions differ for each sub-agent?
• Number of DINES shown can be
tuned via Threshold ρ (level of
inequality)
9
Three Types of DINEs
A. Metzger, ACSOS 2022
Visualization in Dashboard
Certain Uncertain
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
XRL-DINE
Reward Channel Dominance
Influence that each sub-agent has on
each possible action
• Influence of rewards of sub-agents
on composed decision
10
Three Types of DINEs
A. Metzger, ACSOS 2022
Visualization in Dashboard
Relative
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
XRL-DINE
Reward Channel Extremum
Points after local minimum/maximum
of state-value  RL decisions in
potentially critical states
• ExpectedReward (S) –
ExpectedReward (S’) > ϕ
 Maximum
• Number of DINES shown can be
tuned via Threshold ϕ
11
Three Types of DINEs
A. Metzger, ACSOS 2022
Visualization in Dashboard
Minimum
Maximum
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Agenda
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
A. Metzger, ACSOS 2022 12
1. Motivation
2. Explanation Approach “XRL-DINE“
3. Validation
4. Discussion and Outlook
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Validation
Proof-of-Concept
Implementation
• Double Deep Q-Networks with
Experience Replay [Hasselt et al.
@ AAAI 2016]
• Approximation of environment
model using supervised
learning on contents of replay
memory
• OpenAI Gym interface to
connect RL and SAS
A. Metzger, ACSOS 2022 13
Experimental Setup
RL Problem Formulation
• Action Space =
{Add / remove web servers,
Change dimmer value}
• State Space =
{Request arrival rate,
Average throughput,
Average response time}
• Decomposed Reward Function
Self-Adaptive System
• “SWIM” Exemplar [Moreno et al.
@ SEAMS 2018]
• Self-adaptive multi-tier web
application
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Validation
A. Metzger, ACSOS 2022 14
Qualitative Results
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Validation
Important Interactions Reward Channel Extrema
A. Metzger, ACSOS 2022 15
Quantitative Results
Cognitive load ~ number of DINEs shown to developers
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Agenda
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
A. Metzger, ACSOS 2022 16
1. Motivation
2. Explanation Approach “XRL-DINE“
3. Validation
4. Discussion and Outlook
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Discussion
A. Metzger, ACSOS 2022 17
Limitations of XRL-DINE
May generate difficult to understand explanations
• Reason 1: Reward function was decomposed incorrectly or non-optimally
• Reason 2: Environment dynamics may delay effects of adaptations and thus understandability
Not directly applicable to collaborative adaptive systems
• XRL-DINE does not consider decisions of other RL agents
• May lead to misleading explanations if DINE-XRL is directly applied in collaborative setting
Only works for value-based deep RL
• XRL-DINEs computed using value-function Q(S, A) – details see paper
• Value-based deep RL: policy = Q(S, A) approximated by neural network
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Outlook
A. Metzger, ACSOS 2022 18
Considering Explanation Requirements from Social Sciences [Miller @ Artif. Intell. 2019]
Contrastive: “why P happened instead of Q?”
 “Reward Channel Dominance” DINE
Selective: “no need for complete course of events”
 “Reward Channel Extrema” DINE &
“Important Interactions” DINE
Causal: “most likely explanation not
necessarily the best”
 Future work: e.g., check whether agent relies on
spurious correlations (and not causality)
[Gajcin et al. @ AAAMAS Wkshp 2022]
Social: “transfer of knowledge as part of a
conversation”
 Future work: e.g., Chatbot4XAI


?
? Prediction
Explanation
Train will be
delayed
Train passed
last light 5
min later
than typical
This also
happened
yesterday, but
why today a
problem?
Because of
attribute
“number of
trains behind
current one” > 5
What would
have to change
such that not
delay
(counterfactual)
“number of
trains behind
current one” < 2
Human
(Explainee)
Chatbot4XAI
(Explainer)
©
SSE,
Prof.
Dr.
Klaus
Pohl,
Prof.
Dr.
Andreas
Metzger
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Thank You!
Research leading to these results has received funding from the EU’s Horizon 2020 research and
innovation programme under grant agreements no. 780351 & 871493
Further Reading
• A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, “Realizing Self-Adaptive Systems via Online
Reinforcement Learning and Feature-Model-guided Exploration”, Computing, Springer, March, 2022
• A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, “Feature model-guided online reinforcement
learning for self-adaptive services,” in 18th Int’l Conf. on Service-Oriented Computing (ICSOC 2020),
LNCS 12571, Springer, 2020
• A. Palm, A. Metzger, and K. Pohl, “Online reinforcement learning for self-adaptive information
systems,” in 32nd Int’l Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS
12127. Springer, 2020
A. Metzger, ACSOS 2022 19
www.enact-project.eu www.dataports-project.eu

More Related Content

Similar to Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems

Robust Tracking Via Feature Mapping Method and Support Vector Machine
Robust Tracking Via Feature Mapping Method and Support Vector MachineRobust Tracking Via Feature Mapping Method and Support Vector Machine
Robust Tracking Via Feature Mapping Method and Support Vector MachineIRJET Journal
 
Advances dynamic pile testing technology - Independent Geoscience Pty Ltd
Advances dynamic pile testing technology - Independent Geoscience Pty LtdAdvances dynamic pile testing technology - Independent Geoscience Pty Ltd
Advances dynamic pile testing technology - Independent Geoscience Pty LtdIndependent Geoscience
 
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...Daiki Ikeuchi
 
"How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo...
"How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo..."How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo...
"How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo...Smart Grid Interoperability Panel
 
Knowledge Discovery in Environmental Management
Knowledge Discovery in Environmental Management Knowledge Discovery in Environmental Management
Knowledge Discovery in Environmental Management Dr. Aparna Varde
 
Optalysys Optical Processing for HPC
Optalysys Optical Processing for HPCOptalysys Optical Processing for HPC
Optalysys Optical Processing for HPCinside-BigData.com
 
Program Management in MBSE
Program Management in MBSEProgram Management in MBSE
Program Management in MBSETaylorDuffy11
 
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...Paragon_Science_Inc
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudSigOpt
 
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...ChemAxon
 
3D Television: When will it become economically feasible?
3D Television: When will it become economically feasible?3D Television: When will it become economically feasible?
3D Television: When will it become economically feasible?Jeffrey Funk
 
Michael_Kogan_portfolio
Michael_Kogan_portfolioMichael_Kogan_portfolio
Michael_Kogan_portfolioMichael Kogan
 
Michael_Kogan_portfolio
Michael_Kogan_portfolioMichael_Kogan_portfolio
Michael_Kogan_portfolioMichael Kogan
 
EC-TEL 2016: Which Algorithms Suit Which Learning Environments?
EC-TEL 2016: Which Algorithms Suit Which Learning Environments?EC-TEL 2016: Which Algorithms Suit Which Learning Environments?
EC-TEL 2016: Which Algorithms Suit Which Learning Environments?Simone Kopeinik
 
cv_lukas_mandrake_jpl_update2016_formatted
cv_lukas_mandrake_jpl_update2016_formattedcv_lukas_mandrake_jpl_update2016_formatted
cv_lukas_mandrake_jpl_update2016_formattedLukas Mandrake
 

Similar to Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems (20)

A cross-layer approach to energy management in manufacturing
A cross-layer approach to energy management in manufacturingA cross-layer approach to energy management in manufacturing
A cross-layer approach to energy management in manufacturing
 
Robust Tracking Via Feature Mapping Method and Support Vector Machine
Robust Tracking Via Feature Mapping Method and Support Vector MachineRobust Tracking Via Feature Mapping Method and Support Vector Machine
Robust Tracking Via Feature Mapping Method and Support Vector Machine
 
Advances dynamic pile testing technology - Independent Geoscience Pty Ltd
Advances dynamic pile testing technology - Independent Geoscience Pty LtdAdvances dynamic pile testing technology - Independent Geoscience Pty Ltd
Advances dynamic pile testing technology - Independent Geoscience Pty Ltd
 
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
Machine Learning Approach to Geometry Prediction in Cold Spray Additive Manuf...
 
"How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo...
"How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo..."How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo...
"How Today's Power Grid Implementation Choices Impact Future Smart Grid Deplo...
 
Knowledge Discovery in Environmental Management
Knowledge Discovery in Environmental Management Knowledge Discovery in Environmental Management
Knowledge Discovery in Environmental Management
 
report
reportreport
report
 
Electronics engineer resume
Electronics engineer resumeElectronics engineer resume
Electronics engineer resume
 
Optalysys Optical Processing for HPC
Optalysys Optical Processing for HPCOptalysys Optical Processing for HPC
Optalysys Optical Processing for HPC
 
Program Management in MBSE
Program Management in MBSEProgram Management in MBSE
Program Management in MBSE
 
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
 
PPT FOR BIG
PPT FOR BIGPPT FOR BIG
PPT FOR BIG
 
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
 
3D Television: When will it become economically feasible?
3D Television: When will it become economically feasible?3D Television: When will it become economically feasible?
3D Television: When will it become economically feasible?
 
Zack hsu resume
Zack hsu resumeZack hsu resume
Zack hsu resume
 
Michael_Kogan_portfolio
Michael_Kogan_portfolioMichael_Kogan_portfolio
Michael_Kogan_portfolio
 
Michael_Kogan_portfolio
Michael_Kogan_portfolioMichael_Kogan_portfolio
Michael_Kogan_portfolio
 
EC-TEL 2016: Which Algorithms Suit Which Learning Environments?
EC-TEL 2016: Which Algorithms Suit Which Learning Environments?EC-TEL 2016: Which Algorithms Suit Which Learning Environments?
EC-TEL 2016: Which Algorithms Suit Which Learning Environments?
 
cv_lukas_mandrake_jpl_update2016_formatted
cv_lukas_mandrake_jpl_update2016_formattedcv_lukas_mandrake_jpl_update2016_formatted
cv_lukas_mandrake_jpl_update2016_formatted
 

More from Andreas Metzger

Antrittsvorlesung - APL.pptx
Antrittsvorlesung - APL.pptxAntrittsvorlesung - APL.pptx
Antrittsvorlesung - APL.pptxAndreas Metzger
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesAndreas Metzger
 
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...Andreas Metzger
 
Data-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software SystemsData-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software SystemsAndreas Metzger
 
Data-driven Deep Learning for Proactive Terminal Process Management
Data-driven Deep Learning for Proactive Terminal Process ManagementData-driven Deep Learning for Proactive Terminal Process Management
Data-driven Deep Learning for Proactive Terminal Process Management Andreas Metzger
 
Big Data Technology Insights
Big Data Technology InsightsBig Data Technology Insights
Big Data Technology InsightsAndreas Metzger
 
Proactive Process Adaptation using Deep Learning Ensembles
Proactive Process Adaptation using Deep Learning Ensembles Proactive Process Adaptation using Deep Learning Ensembles
Proactive Process Adaptation using Deep Learning Ensembles Andreas Metzger
 
Data-driven AI for Self-adaptive Information Systems
Data-driven AI for Self-adaptive Information SystemsData-driven AI for Self-adaptive Information Systems
Data-driven AI for Self-adaptive Information SystemsAndreas Metzger
 
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud Andreas Metzger
 
Considering Non-sequential Control Flows for Process Prediction with Recurren...
Considering Non-sequential Control Flows for Process Prediction with Recurren...Considering Non-sequential Control Flows for Process Prediction with Recurren...
Considering Non-sequential Control Flows for Process Prediction with Recurren...Andreas Metzger
 
Big Data Value in Mobility and Logistics
Big Data Value in Mobility and Logistics Big Data Value in Mobility and Logistics
Big Data Value in Mobility and Logistics Andreas Metzger
 
Predictive Business Process Monitoring considering Reliability and Risk
Predictive Business Process Monitoring considering Reliability and RiskPredictive Business Process Monitoring considering Reliability and Risk
Predictive Business Process Monitoring considering Reliability and RiskAndreas Metzger
 
Risk-based Proactive Process Adaptation
Risk-based Proactive Process AdaptationRisk-based Proactive Process Adaptation
Risk-based Proactive Process AdaptationAndreas Metzger
 
Predictive Process Monitoring Considering Reliability Estimates
Predictive Process Monitoring Considering Reliability EstimatesPredictive Process Monitoring Considering Reliability Estimates
Predictive Process Monitoring Considering Reliability EstimatesAndreas Metzger
 

More from Andreas Metzger (14)

Antrittsvorlesung - APL.pptx
Antrittsvorlesung - APL.pptxAntrittsvorlesung - APL.pptx
Antrittsvorlesung - APL.pptx
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
 
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
Triggering Proactive Business Process Adaptations via Online Reinforcement Le...
 
Data-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software SystemsData-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software Systems
 
Data-driven Deep Learning for Proactive Terminal Process Management
Data-driven Deep Learning for Proactive Terminal Process ManagementData-driven Deep Learning for Proactive Terminal Process Management
Data-driven Deep Learning for Proactive Terminal Process Management
 
Big Data Technology Insights
Big Data Technology InsightsBig Data Technology Insights
Big Data Technology Insights
 
Proactive Process Adaptation using Deep Learning Ensembles
Proactive Process Adaptation using Deep Learning Ensembles Proactive Process Adaptation using Deep Learning Ensembles
Proactive Process Adaptation using Deep Learning Ensembles
 
Data-driven AI for Self-adaptive Information Systems
Data-driven AI for Self-adaptive Information SystemsData-driven AI for Self-adaptive Information Systems
Data-driven AI for Self-adaptive Information Systems
 
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
Towards an End-to-End Architecture for Run-time Data Protection in the Cloud
 
Considering Non-sequential Control Flows for Process Prediction with Recurren...
Considering Non-sequential Control Flows for Process Prediction with Recurren...Considering Non-sequential Control Flows for Process Prediction with Recurren...
Considering Non-sequential Control Flows for Process Prediction with Recurren...
 
Big Data Value in Mobility and Logistics
Big Data Value in Mobility and Logistics Big Data Value in Mobility and Logistics
Big Data Value in Mobility and Logistics
 
Predictive Business Process Monitoring considering Reliability and Risk
Predictive Business Process Monitoring considering Reliability and RiskPredictive Business Process Monitoring considering Reliability and Risk
Predictive Business Process Monitoring considering Reliability and Risk
 
Risk-based Proactive Process Adaptation
Risk-based Proactive Process AdaptationRisk-based Proactive Process Adaptation
Risk-based Proactive Process Adaptation
 
Predictive Process Monitoring Considering Reliability Estimates
Predictive Process Monitoring Considering Reliability EstimatesPredictive Process Monitoring Considering Reliability Estimates
Predictive Process Monitoring Considering Reliability Estimates
 

Recently uploaded

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems

  • 1. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems Felix Feit, Andreas Metzger, Klaus Pohl ACSOS 2022
  • 2. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Agenda © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger A. Metzger, ACSOS 2022 2 1. Motivation 2. Explanation Approach “XRL-DINE“ 3. Validation 4. Discussion and Outlook
  • 3. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Online Reinforcement Learning for SAS A. Metzger, ACSOS 2022 3 MAPE-K Combination RL Self-Adaptation Logic Analyze Monitor Execute Plan Knowledge Online RL for SAS Execute Policy (K) Monitor Action Selection (A + P) Policy Update Action A State S Reward R Next state S’ Action A State S Reward R Action Selection Next state S’ RL Agent Policy Policy Update Environ- ment • Online RL = Emerging approach for addressing design time uncertainty [Palm et al. @ CAiSE 2020]  Leverage information only available @ runtime (i.e., during live system execution) • Since 2019 use of Learning for SAS most prominent strategy [Porter et al. @ ACSOS 2020] • Conceptual Model of Online RL: [Metzger et al. @ Computing 2022] Learning Goal defined by Reward Function
  • 4. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Online Reinforcement Learning for SAS Policy (Knowledge) represented as deep neural network Pro • Handling of continuous states and actions • Generalization over unseen, neighboring states Con • Deep RL = “black box” •  Limited trustworthiness •  Difficult to debug e.g., reward function correctly defined? A. Metzger, ACSOS 2022 4 Increasing use of Deep RL The power of deep learning: “/imagine yellow Labrador in the style of…” [generated using Midjourney AI] UP RIGHT DOWN LEFT -12,224348 -12,044198 -12,463232 -12,349292 -11,368766 -11,407281 -11,567699 -11,741966 -10,724603 -10,758294 -10,878073 -10,956671 -10,118299 -9,9104485 -10,209248 -9,9700161 -9,2663503 -9,0282991 -9,4003475 -9,4925009 -8,1970854 -8,145322 -8,3578677 -8,6139991 -7,5583819 -7,3056764 -7,4218645 -7,6021728 -6,5359939 -6,4629871 -6,6290838 -6,800472 -5,8610507 -5,6457718 -5,6477581 -6,1414037 -5,01 -4,8068304 -4,7890915 -4,8609271 -3,9 -3,902123 -3,9078592 -4,0513057 -3,1389 -3,273 -2,9849214 -3,3706948 -12,609306 -12,612216 -12,579926 -12,784373 -11,845763 -11,794108 -11,845683 -12,478945 -11,237922 -10,873018 -10,900784 -11,436694 -10,008564 -9,9425947 -9,9517958 -10,268937 -9,2581936 -8,9864275 -8,9889605 -9,1798613 -8,1866584 -7,9931344 -7,9940185 -8,9510004 -7,0729858 -6,9970914 -6,9977784 -8,2920288 -6,0319487 -5,9988816 -5,9989739 -6,6224484 -6,0996963 -4,9994285 -4,999519 -6,232246 -4,8057376 -3,9997425 -3,9997874 -4,9212681 -3,1293615 -2,9999285 -2,9999384 -3,556979 -2,7588749 -2,1 -2 -2,2345458 -13,360876 -12 -13,954412 -12,991696 -12,565624 -11 -112,1835 -12,995431 -11,733772 -10 -112,79659 -11,980454 -10,740429 -9 -111,48554 -10,947642 -9,95339 -8 -112,12695 -9,9878453 -8,9112 -7 -112,67844 -8,8890419 -7,992178 -6 -112,91331 -7,965498 -6,9846114 -5 -112,41604 -6,9763523 -5,9533325 -4 -111,81117 -5,9401313 -4,9217978 -3 -110,6219 -4,8068301 -3,9307738 -2 -112,85584 -3,9418704 -2,9796888 -1,9340884 -1 -2,9827181 -13 -112,90933 -13,998779 -13,995334 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 UP RIGHT DOWN LEFT -12,224348 -12,044198 -12,463232 -12,349292 -11,368766 -11,407281 -11,567699 -11,741966 -10,724603 -10,758294 -10,878073 -10,956671 -10,118299 -9,9104485 -10,209248 -9,9700161 -9,2663503 -9,0282991 -9,4003475 -9,4925009 -8,1970854 -8,145322 -8,3578677 -8,6139991 -7,5583819 -7,3056764 -7,4218645 -7,6021728 -6,5359939 -6,4629871 -6,6290838 -6,800472 -5,8610507 -5,6457718 -5,6477581 -6,1414037 -5,01 -4,8068304 -4,7890915 -4,8609271 -3,9 -3,902123 -3,9078592 -4,0513057 -3,1389 -3,273 -2,9849214 -3,3706948 -12,609306 -12,612216 -12,579926 -12,784373 -11,845763 -11,794108 -11,845683 -12,478945 -11,237922 -10,873018 -10,900784 -11,436694 -10,008564 -9,9425947 -9,9517958 -10,268937 -9,2581936 -8,9864275 -8,9889605 -9,1798613 State S Action A State S Action A Deep RL Classical RL (Q-Learning) Environ- ment
  • 5. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Explainable Reinforcement Learning (XRL) for SAS A. Metzger, ACSOS 2022 5 State of the Art Goal-based Models [Welsh et al. @ Trans. CCI 2014] • Explanations in terms of the satisficement of softgoals • Requires making assumptions about environment dynamics at design time (difficult due to design time uncertainty) Provenance Graphs [Reynolds et al. @ MODELS Wkshp 2020] • Graph history to determine if, how and why model changed • Graph can become too complex to be meaningfully interpreted by humans • Query language suggested for “pruning” graphs but not for explanations Temporal Graph Models [Ullauri et al. @ SoSym 2022] • Explicitly considers Deep RL • Explanations via queries to model @ runtime • Interesting points of interactions extracted via CEP • No detailed, contrastive decomposition of explanations Example: Vacuum Cleaner Example: Fibonacci Example: Remote Data Mirroring
  • 6. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Agenda © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger A. Metzger, ACSOS 2022 6 1. Motivation 2. Explanation Approach “XRL-DINE“ 3. Validation 4. Discussion and Outlook
  • 7. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl XRL-DINE Reward Decomposition [Sequeira et al. @ Artif. Intell. 2020] Decompose reward function to explain short- term goal orientation of RL (train sub-RL agents) Pro • Helpful in the presence of multiple, “competing” quality goals for learning • Provides contrastive (counterfactual) explanations Con • No indication of explanation’s relevance • Requires manually selecting relevant explanations  cognitive overhead Interestingness Elements [Juozapaitis et al. @ IJCAI Wkshp 2019] Identify relevant moments of interaction between agent and environment at runtime Pro • Facilitates automatically selecting relevant interactions to be explained Con • Does not explain whether RL behaves as expected and for the right reasons 7 Augment and Combine RL Explanation Techniques from AI Research Decomposed Interestingness Elements (DINEs) A. Metzger, ACSOS 2022 +
  • 8. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl XRL-DINE Internal Behaviour Evolution of Reward External Behaviour Evolution of States and Actions 8 Understanding RL without DINEs? A. Metzger, ACSOS 2022 R S, A
  • 9. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl XRL-DINE Important Interaction Determine whether RL in given state is uncertain (wide range of actions) or certain (almost always same action) • How much does relative importance of actions differ for each sub-agent? • Number of DINES shown can be tuned via Threshold ρ (level of inequality) 9 Three Types of DINEs A. Metzger, ACSOS 2022 Visualization in Dashboard Certain Uncertain
  • 10. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl XRL-DINE Reward Channel Dominance Influence that each sub-agent has on each possible action • Influence of rewards of sub-agents on composed decision 10 Three Types of DINEs A. Metzger, ACSOS 2022 Visualization in Dashboard Relative
  • 11. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl XRL-DINE Reward Channel Extremum Points after local minimum/maximum of state-value  RL decisions in potentially critical states • ExpectedReward (S) – ExpectedReward (S’) > ϕ  Maximum • Number of DINES shown can be tuned via Threshold ϕ 11 Three Types of DINEs A. Metzger, ACSOS 2022 Visualization in Dashboard Minimum Maximum
  • 12. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Agenda © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger A. Metzger, ACSOS 2022 12 1. Motivation 2. Explanation Approach “XRL-DINE“ 3. Validation 4. Discussion and Outlook
  • 13. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Validation Proof-of-Concept Implementation • Double Deep Q-Networks with Experience Replay [Hasselt et al. @ AAAI 2016] • Approximation of environment model using supervised learning on contents of replay memory • OpenAI Gym interface to connect RL and SAS A. Metzger, ACSOS 2022 13 Experimental Setup RL Problem Formulation • Action Space = {Add / remove web servers, Change dimmer value} • State Space = {Request arrival rate, Average throughput, Average response time} • Decomposed Reward Function Self-Adaptive System • “SWIM” Exemplar [Moreno et al. @ SEAMS 2018] • Self-adaptive multi-tier web application
  • 14. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Validation A. Metzger, ACSOS 2022 14 Qualitative Results
  • 15. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Validation Important Interactions Reward Channel Extrema A. Metzger, ACSOS 2022 15 Quantitative Results Cognitive load ~ number of DINEs shown to developers
  • 16. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Agenda © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger A. Metzger, ACSOS 2022 16 1. Motivation 2. Explanation Approach “XRL-DINE“ 3. Validation 4. Discussion and Outlook
  • 17. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Discussion A. Metzger, ACSOS 2022 17 Limitations of XRL-DINE May generate difficult to understand explanations • Reason 1: Reward function was decomposed incorrectly or non-optimally • Reason 2: Environment dynamics may delay effects of adaptations and thus understandability Not directly applicable to collaborative adaptive systems • XRL-DINE does not consider decisions of other RL agents • May lead to misleading explanations if DINE-XRL is directly applied in collaborative setting Only works for value-based deep RL • XRL-DINEs computed using value-function Q(S, A) – details see paper • Value-based deep RL: policy = Q(S, A) approximated by neural network
  • 18. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Outlook A. Metzger, ACSOS 2022 18 Considering Explanation Requirements from Social Sciences [Miller @ Artif. Intell. 2019] Contrastive: “why P happened instead of Q?”  “Reward Channel Dominance” DINE Selective: “no need for complete course of events”  “Reward Channel Extrema” DINE & “Important Interactions” DINE Causal: “most likely explanation not necessarily the best”  Future work: e.g., check whether agent relies on spurious correlations (and not causality) [Gajcin et al. @ AAAMAS Wkshp 2022] Social: “transfer of knowledge as part of a conversation”  Future work: e.g., Chatbot4XAI   ? ? Prediction Explanation Train will be delayed Train passed last light 5 min later than typical This also happened yesterday, but why today a problem? Because of attribute “number of trains behind current one” > 5 What would have to change such that not delay (counterfactual) “number of trains behind current one” < 2 Human (Explainee) Chatbot4XAI (Explainer)
  • 19. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger SOFTWARE SYSTEMS ENGINEERING Prof. Dr. K. Pohl Thank You! Research leading to these results has received funding from the EU’s Horizon 2020 research and innovation programme under grant agreements no. 780351 & 871493 Further Reading • A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, “Realizing Self-Adaptive Systems via Online Reinforcement Learning and Feature-Model-guided Exploration”, Computing, Springer, March, 2022 • A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, “Feature model-guided online reinforcement learning for self-adaptive services,” in 18th Int’l Conf. on Service-Oriented Computing (ICSOC 2020), LNCS 12571, Springer, 2020 • A. Palm, A. Metzger, and K. Pohl, “Online reinforcement learning for self-adaptive information systems,” in 32nd Int’l Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS 12127. Springer, 2020 A. Metzger, ACSOS 2022 19 www.enact-project.eu www.dataports-project.eu

Editor's Notes

  1. Monet – Kandinski -- Marc