Towards the next generation of interactive and adaptive explanation methods

IWM-Lecture series - 7 Dec 2021
Katrien Verbert
Augment/HCI - KU Leuven
@katrien_v

Human-Computer Interaction group
Explainable AI - recommender systems – visualization – intelligent user interfaces
Learning analytics &
human resources
Media
consumption
Precision agriculture
Healthcare
Augment prof. Katrien Verbert
ARIA prof. Adalberto Simeone
Computer
Graphics
prof. Phil Dutré
Language
Intelligence &
Information
Retrieval
prof. Sien Moens

Augment/HCI team
Robin De Croon
Postdoc researcher
Katrien Verbert
Associate Professor
Francisco Gutiérrez
Postdoc researcher
Tom Broos
PhD researcher
Nyi Nyi Htun
Postdoc researcher
Houda Lamqaddam
PhD researcher
Oscar Alvarado
Postdoc researcher
http://augment.cs.kuleuven.be/
Diego Rojo García
PhD researcher
Maxwell Szymanski
PhD researcher
Arno Vanneste
PhD researcher
Jeroen Ooge
PhD researcher
Aditya Bhattacharya
PhD researcher
Ivania Donoso Guzmán
PhD researcher

Explainable Artificial Intelligence (XAI)
“Given an audience, an explainable artificial
intelligence is one that produces details or reasons to
make its functioning clear or easy to understand.”
[Arr20]
[Arr20] Arrieta, Alejandro Barredo, et al. "Explainable Artificial Intelligence (XAI): Concepts, taxonomies,
opportunities and challenges toward responsible AI." Information Fusion 58 (2020): 82-115.

Objectives
• Explaining model outcomes to increase user trust and acceptance
• Enabling users to interact with the explanation process to improve the model
Models

Recommendation techniques
• Collaborative filtering
• Content-based filtering
• Knowledge-based filtering
• Hybrid

Example: TasteWeights
Bostandjiev, S., O'Donovan, J. and Höllerer, T. TasteWeights: a visual interactive hybrid recommender system. In Proceedings of the sixth ACM conference on Recommender systems (RecSys '12). ACM, New York, NY, USA (2012), 35-42.

Overview
Application domains
Algorithmic foundation


Explanations
Millecamp, M., Htun, N. N., Conati, C., & Verbert, K. (2019, March). To explain or not to explain: the effects of personal characteristics when explaining music recommendations. In Proceedings of the 2019 Conference on Intelligent User Interfaces (pp. 397-407). ACM.

Personal characteristics
Need for cognition
• Measurement of the tendency for an individual to engage in, and enjoy, effortful cognitive activities
• Measured by test of Cacioppo et al. [1984]
Visualisation literacy
• Measurement of the ability to interpret and make meaning from information presented in the form of
images and graphs
• Measured by test of Boy et al. [2014]
Locus of control (LOC)
• Measurement of the extent to which people believe they have power over events in their lives
• Measured by test of Rotter et al. [1966]
Visual working memory
• Measurement of the ability to recall visual patterns [Tintarev and Masthoff, 2016]
• Measured by Corsi block-tapping test
Musical experience
• Measurement of the ability to engage with music in a flexible, effective and nuanced way
[Müllensiefen et al., 2014]
• Measured using the Goldsmiths Musical Sophistication Index (Gold-MSI)
Tech savviness
• Measured by confidence in trying out new technology

User study
• Within-subjects design: 105 participants recruited with Amazon Mechanical Turk
• Baseline version (without explanations) compared with explanation interface
• Pre-study questionnaire for all personal characteristics
• Task: based on a chosen scenario for creating a playlist, explore songs and rate all songs in the final playlist
• Post-study questionnaire:
  • Recommender effectiveness
  • Trust
  • Good understanding
  • Use intentions
  • Novelty
  • Satisfaction
  • Confidence

Results
The interaction effect between NFC (divided into four quartiles, Q1-Q4) and interfaces in terms of confidence
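A minimal sketch of how need-for-cognition (NFC) scores could be split into the four quartile groups Q1-Q4 mentioned here. The scores, the function name `nfc_quartile_labels`, and the use of Python's `statistics.quantiles` are illustrative assumptions; the slides do not state how the quartiles were computed.

```python
from bisect import bisect_right
from statistics import quantiles


def nfc_quartile_labels(scores):
    """Label each score Q1..Q4 by its position relative to the quartile cut points."""
    cuts = quantiles(scores, n=4)  # three cut points separating four groups
    return [f"Q{bisect_right(cuts, score) + 1}" for score in scores]


# Hypothetical NFC questionnaire totals, not data from the study.
example_scores = [1, 2, 3, 4, 5, 6, 7, 8]
labels = nfc_quartile_labels(example_scores)
```

Each participant's label can then be crossed with the interface condition (baseline or explanation) to inspect the interaction in terms of confidence.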

Design implications
• Explanations should be personalised for different groups of end-users.
• Users should be able to choose whether or not they want to see explanations.
• Explanation components should be flexible enough to present varying levels of detail depending on a user's preference.

User control
Users tend to be more satisfied when they have control over
how recommender systems produce suggestions for them
Control recommendations
Douban FM
Control user profile
Spotify
Control algorithm parameters
TasteWeights

Controllability Cognitive load
Additional controls may increase cognitive load
(Andjelkovic et al. 2016)
Ivana Andjelkovic, Denis Parra, and John O’Donovan. 2016. Moodplay: Interactive mood-based music
discovery and recommendation. In Proc. of UMAP’16. ACM, 275–279.

Different levels of user control
Level | Recommender component | Controls
low | Recommendations (REC) | Rating, removing, and sorting
medium | User profile (PRO) | Select which user profile data will be considered by the recommender
high | Algorithm parameters (PAR) | Modify the weight of different parameters
Jin, Y., Tintarev, N., & Verbert, K. (2018, September). Effects of personal characteristics on music recommender
systems with different levels of controllability. In Proceedings of the 12th ACM Conference on Recommender
Systems (pp. 13-21). ACM.

User profile (PRO) Algorithm parameters (PAR) Recommendations (REC)
8 control settings
No control
REC
PAR
PRO
REC*PRO
REC*PAR
PRO*PAR
REC*PRO*PAR
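The eight settings are every on/off combination of the three control components. A short sketch, assuming only the component abbreviations from the slide (the function name `control_settings` is hypothetical):

```python
from itertools import combinations

COMPONENTS = ("REC", "PRO", "PAR")


def control_settings(components=COMPONENTS):
    """Enumerate every subset of control components, from none to all."""
    settings = []
    for size in range(len(components) + 1):
        for combo in combinations(components, size):
            settings.append("*".join(combo) if combo else "No control")
    return settings


settings = control_settings()
```

With three binary factors this yields 2 x 2 x 2 = 8 settings.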

Evaluation method
• Between-subjects design: 240 participants recruited with Amazon Mechanical Turk (AMT)
• Independent variable: settings of user control
  • 2x2x2 factorial design
• Dependent variables:
  • Acceptance (ratings)
  • Cognitive load (NASA-TLX), Musical Sophistication, Visual Memory
• Evaluation framework of Knijnenburg et al. [2012]

Results
• Main effects: from REC to PRO to PAR → higher cognitive load
• Two-way interactions: combining controls does not necessarily result in higher cognitive load. Adding an additional control component to PAR increases acceptance. PRO*PAR has less cognitive load than PRO and PAR
• High musical sophistication leads to higher quality, and thereby results in higher acceptance

Overview
Application domains

Learning analytics
Src: Steve Schoettler

Explaining exercise recommendations
Goals and research questions
• Automatic adaptation: How to automatically adapt the exercise recommendations on Wiski to the level of students?
• Explanations & trust: How do (placebo) explanations affect initial trust in Wiski for recommending exercises?
• Young target audience: Middle and high school students
Ooge, J., Kato, S., Verbert, K. (to appear) Explaining Recommendations in E-Learning: Effects on Adolescents'
Initial Trust. Proceedings of the 27th IUI conference on Intelligent User Interfaces

Methodology: Automatic adaptation
All questions → Elo filter → Potential questions of same level → Rank with collaborative filtering → Sorted recommended questions
* Topics are chosen by the user and are thus not part of the recommendation scheme
Combine Elo rating system with collaborative filtering:
• Elo rating system finds questions of similar difficulty level
• Collaborative filtering ranks found questions
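A rough sketch of the two-stage pipeline described above: an Elo-style filter followed by collaborative-filtering ranking. The K-factor, the rating window, the cosine-similarity ranking, and all function names are assumptions for illustration; the slides do not specify how Wiski implements either stage.

```python
from math import sqrt


def elo_expected(student_rating, question_rating):
    """Standard Elo expectation that the student answers the question correctly."""
    return 1.0 / (1.0 + 10 ** ((question_rating - student_rating) / 400))


def elo_update(student_rating, question_rating, correct, k=32):
    """Move both ratings after one attempt (k is an assumed K-factor)."""
    delta = k * ((1.0 if correct else 0.0) - elo_expected(student_rating, question_rating))
    return student_rating + delta, question_rating - delta


def filter_same_level(student_rating, question_ratings, window=100):
    """Keep questions whose rating lies within an assumed window of the student's."""
    return [q for q, r in question_ratings.items() if abs(r - student_rating) <= window]


def cosine(a, b):
    """Cosine similarity between two sparse score dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[q] * b[q] for q in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_with_cf(target, others, candidates):
    """Rank candidates by similarity-weighted scores of other students (user-based CF)."""
    scores = {}
    for q in candidates:
        num = den = 0.0
        for other in others:
            if q in other:
                sim = cosine(target, other)
                num += sim * other[q]
                den += abs(sim)
        scores[q] = num / den if den else 0.0
    return sorted(candidates, key=lambda q: scores[q], reverse=True)


# Hypothetical data: per-question scores in [0, 1] for one target student and two peers.
target = {"q1": 1.0, "q2": 0.0}
peers = [
    {"q1": 1.0, "q2": 0.0, "q3": 1.0, "q4": 0.0},
    {"q1": 0.0, "q2": 1.0, "q3": 0.0, "q4": 1.0},
]
same_level = filter_same_level(1500, {"q3": 1450, "q4": 1550, "q5": 1800})
ranked = rank_with_cf(target, peers, same_level)
```

Here the Elo stage drops `q5` as too hard for a student rated 1500, and the collaborative-filtering stage orders the remaining questions by what similar students scored.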

Methodology: Explanations
Iterative design of explanation interfaces through a user-centred design methodology
Full-fledged tutorial for full transparency → Single-screen explanation → Final explanation interface

Why? Justification | Comparison with others
Real explanation | Placebo explanation | No explanation

Methodology: Randomised controlled experiment
Equal probability to
be assigned to any
experimental group
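A minimal sketch of equal-probability assignment to the three explanation conditions named on the previous slide. The function name, the participant identifiers, and the seeding are hypothetical; the slides do not describe the actual assignment procedure.

```python
import random

GROUPS = ("real explanation", "placebo explanation", "no explanation")


def assign_groups(participant_ids, seed=None):
    """Assign each participant to one group, each group being equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    return {pid: rng.choice(GROUPS) for pid in participant_ids}


# Hypothetical participant identifiers.
assignment = assign_groups(range(30), seed=1)
```

Simple randomisation like this does not guarantee equal group sizes; block randomisation would be an alternative if balanced groups are needed.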

Methodology: Trust
Direct trust measurement: ask about trust factors with 7-point Likert-type questions
• Multidimensional trust: trusting beliefs (competence, benevolence, integrity), intention to return, perceived transparency
• One-dimensional trust
Indirect trust measurement: log whether students accept recommendations or not

Results: Real explanations…
… did increase multidimensional initial trust
… did not increase one-dimensional initial trust
… led to accepting more recommended exercises
compared to both placebo and no explanations

Results: Placebo explanations…
… did not increase initial trust compared to no explanations
… may undermine perceived integrity
… are a useful baseline:
• how critical are students towards explanations?
• how much transparency do students need?

Results: No explanations
Can be acceptable in low-stakes situations (e.g., drilling
exercises):
indications of difficulty level might suffice
Personal level indication:
Easy, Medium and Hard
tags

LADA: a learning analytics dashboard for study advisors
uncertainty
Gutiérrez Hernández F., Seipp K., Ochoa X., Chiluiza K., De Laet T., Verbert K. (2018). LADA: A learning analytics dashboard for academic advising. Computers in Human Behavior, pp 1-13. doi: 10.1016/j.chb.2018.12.004

Methodology
Evaluation @KU Leuven Monitoraat
N = 12
6 Experts (4F, 2M)
6 Laymen (1F, 5M)
Evaluation @ESPOL (Ecuador)
N = 14
8 Experts (3F, 5M)
6 Laymen (6M)

Results
• LADA was perceived as a valuable tool for more accurate and efficient decision making.
• LADA enables expert advisers to evaluate significantly more scenarios.
• More transparency in the prediction model is required in order to increase trust.

Overview
Application domains

AHMoSe
Rojo, D., Htun, N. N., Parra, D., De Croon, R., & Verbert, K. (2021). AHMoSe: A knowledge-based visual
support system for selecting regression machine learning models. Computers and Electronics in Agriculture,
187, 106183.

AHMoSe Visual Encodings
Model Explanations
(SHAP)
Model + Knowledge Summary

Case Study – Grape Quality Prediction
• Grape quality prediction scenario [Tag14]
• Data
  • Years 2010 and 2011 (train), 2012 (test)
  • 48 cells (Central Greece)
• Knowledge-based rules
Figure source: [Tag14]
[Tag14] Tagarakis, A., et al. "A fuzzy inference system to model grape quality in vineyards." Precision Agriculture 15.5 (2014): 555-578.

Simulation Study
• AHMoSe vs full AutoML approach to support model selection.
Scenario | RMSE (AutoML) | RMSE (AHMoSe) | Difference %
A: Complete knowledge | 0.430 | 0.403 | ▼ 6.3%
B: Incomplete knowledge | 0.458 | 0.385 | ▼ 16.0%
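For reference, a small sketch of how RMSE and the relative difference can be computed. It assumes the "Difference %" column is the reduction relative to the AutoML RMSE, which the slide does not state explicitly; the function names are illustrative.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction_pct(baseline, candidate):
    """Percentage by which candidate is lower than baseline."""
    return (baseline - candidate) / baseline * 100.0


# RMSE values as reported on the slide (rounded to three decimals).
scenario_a = relative_reduction_pct(0.430, 0.403)
scenario_b = relative_reduction_pct(0.458, 0.385)
```

From the rounded RMSE values this gives about 6.3% for Scenario A and about 15.9% for Scenario B; the reported 16.0% was presumably computed from unrounded RMSE values.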

Qualitative Evaluation
• 10 open-ended questions
• 5 viticulture experts and 4 ML experts.
• Thematic analysis: potential use cases, trust, usability, and understandability.

Qualitative Evaluation - Trust
• Showing the dis/agreement of model outputs with experts' knowledge can promote trust.
“The thing that makes us trust the models is the fact that most of
the time, there is a good agreement between the values
predicted by the model and the ones obtained for the knowledge
of the experts.”
– Viticulture Expert

Overview
Application domains

Designing for interacting with recommendations and
predictions for finding jobs

Predicting duration to find a job
Key Issues: Missing data, prediction trust issues, job seeker
motivation, lack of control.

Methodology
• A Customer Journey approach (5 mediators).
• Hands-on time with the original dashboard (22 mediators).
• Observations of mediation sessions (3 mediators, 6 job seekers).
• Questionnaire regarding perception of the dashboard and prediction model (15 mediators).
Charleer S., Gutiérrez Hernández F., Verbert K. (2019). Supporting job mediator and job seeker through an actionable dashboard. In: Proceedings of the 24th IUI conference on Intelligent User Interfaces Presented at the ACM IUI 2019, Los Angeles, USA.

Evaluation
• Qualitative evaluation with expert users (N = 12, 10 female; age: M = 40.7, SD = 9.4)
• Semi-structured interviews
  1. Feedback on parameter visuals.
  2. Interaction feedback with the working prototype dashboard.

Results
• Our design attempts to clarify predictions by supporting conversations between mediators and job seekers.
• Need for customization and contextualization.
• The human expert plays a crucial role when interpreting and relaying the predicted or recommended output.
• Our explanatory tool helps mediators to control the message they wish to convey depending on the situation context.
Charleer S., Gutiérrez Hernández F., Verbert K. (2019). Supporting job mediator and job seeker
through an actionable dashboard. In: Proceedings of the 24th IUI conference on Intelligent User
Interfaces Presented at the ACM IUI 2019, Los Angeles, USA. (Core: A)

Second dashboard: explaining
recommendations
Gutiérrez, F., Charleer, S., De Croon, R., Htun, N. N., Goetschalckx, G., & Verbert, K. (2019). Explaining and exploring job recommendations: a user-driven approach for interacting with knowledge-based job recommender systems. In Proceedings of the 13th ACM Conference on Recommender Systems. ACM.

Ranking of parameters as voted by participants

Labor Market Explorer Design Goals
[DG1] Exploration/Control
Job seekers should be able to control recommendations and filter out the information flow coming from the recommender engine by prioritizing specific items of interest.
[DG2] Explanations
Recommendations and matching scores should be explained, and details should be provided on-demand.
[DG3] Actionable Insights
The interface should provide actionable insights to help job-seekers find new or more job recommendations from different perspectives.

Final Evaluation
66 job seekers (age 33.9 ± 9.5, 18F)
8 training programs, 4 groups, 1 hour.
ResQue questionnaire + two open questions.
Users explored the tool freely.
All interactions were logged.

Results: user empowerment
• The approach is perceived as effective to explore job recommendations.
• Most participants felt confident and would use the explorer again.
• Explanations contribute to supporting user empowerment.
• A diverse set of actionable insights was also mentioned by participants.

Results: personal characteristics
• The explorer was slightly better perceived by older participants (45+).
• Participants in the technical group engaged more with all the different features of the dashboard.
• Non-native speakers, sales and construction groups engaged more with the map.
• The table overview was perceived as very useful by all user groups, but the interaction may need further simplification for some users.

Overview
Application domains

Ooge, J., Stiglic, G., & Verbert, K. (2021). Explaining artificial intelligence with visual
analytics in healthcare. Wiley Interdisciplinary Reviews: Data Mining and Knowledge
Discovery, e1427. https://doi.org/10.1002/widm.1427

Health Recommender Systems
https://www.jmir.org/2021/6/e18035
Nutrition: Nutrition advice (7), Diets (7), Recipes (7), Menus (2), Fruit (1), Restaurants (1)
General health information: Doctors (4), Hospital (5), Thread / fora (3), Self-diagnosis (3), Healthcare information (5), Similar users (2), Advice for children (2)
Lifestyle: Routes (2), Physical activity (10), Leisure activity (2), Wellbeing motivation (2), Behaviour (7), Wearable devices (1), Tailored messages (2)
Specific health conditions

Recommender systems for food

https://augment.cs.kuleuven.be/demos

Design and Evaluation
Gutiérrez F., Cardoso B., Verbert K. (2017). PHARA: a personal health augmented reality assistant to support
decision-making at grocery stores. In: Proceedings of the International Workshop on Health Recommender
Systems co-located with ACM RecSys 2017 (Paper No. 4) (10-13).

Design
Gutiérrez Hernández F., Htun NN., Charleer S., De Croon R., Verbert K. (2018). Designing augmented reality
applications for personal health decision-making. In: Proceedings of the 2019 52nd Hawaii International
Conference on System Sciences Presented at the HICSS, Hawaii, 07 Jan 2019-11 Jan 2019.

Methodology
• Within-subjects design
• n = 28 (1F, 27M), ages from 22 to 38 (M = 25.81, SD = 4.57)
• Post-questionnaires
  • TAM (Technology Acceptance)
  • NASA-TLX (Task Load Index)

Results
• PHARA allows users to make informed decisions, and resulted in selecting healthier food products.
• The stack layout performs better on HMD devices with a limited field of view, like the HoloLens, at the cost of some affordances.
• The grid and pie layouts performed better on handheld devices, allowing users to explore with more confidence and enjoyment and with less effort.

https://www.imec-int.com/en/what-we-offer/research-portfolio/discrete

RECOMMENDER ALGORITHMS
MACHINE LEARNING
INTERACTIVE DASHBOARDS
SMART ALERTS
RICH CARE PLANS
OPEN IoT ARCHITECTURE

User centered design approach

Evaluation methodology
• 12 nurses used the app for three months
• Data collection
  • Interaction logs
  • ResQue questions
  • Semi-structured interviews


Results
• The iterative design process identified several important features, such as the pending list, the overview and the feedback shortcut to encourage feedback.
• Explanations seem to contribute well to better supporting the healthcare professionals.
• Results indicate a better understanding of the call notifications by being able to see the reasons for the calls.
• More trust in the recommendations and increased perceptions of transparency and control.
• Interaction patterns indicate that users engaged well with the interface, although some users did not use all features to interact with the system.
• Need for further simplification and personalization.

Explaining health recommendations
Word cloud | Feature importance | Feature importance + %

PERNUG
• Increased access to more nutritious plants
• Improved iron and B12 intakes for vegan and vegetarian subgroups
Consumer app with recipe recommendations (biofortification info, plants to cultivate)
Hydroponic system with biofortified plants
https://www.eitfood.eu/projects/pernug

Take-away messages
• Involvement of end-users has been key to come up with interfaces tailored to the needs of non-expert users
• Actionable vs non-actionable parameters
• Domain expertise of users and need for cognition are important personal characteristics
• Need for personalisation and simplification

Collaborations
Peter Brusilovsky, Nava Tintarev, Cristina Conati, Denis Parra, Bart Knijnenburg, Jurgen Ziegler

Questions?
katrien.verbert@cs.kuleuven.be
@katrien_v
Thank you!
http://augment.cs.kuleuven.be/
