2. Human-Computer Interaction group
recommender systems – visualization – intelligent user interfaces
Learning analytics – Media consumption – Research Information Systems – Wellness & health
Augment – prof. Katrien Verbert
ARIA – prof. Adalberto Simeone
Computer Graphics – prof. Phil Dutré
Language Intelligence & Information Retrieval – prof. Sien Moens
3. Augment/HCI team
Robin De Croon
Postdoc researcher
Katrien Verbert
Associate Professor
Francisco Gutiérrez
PhD researcher
Tom Broos
PhD researcher
Martijn Millecamp
PhD researcher
Sven Charleer
Postdoc researcher
Nyi Nyi Htun
Postdoc researcher
Houda Lamqaddam
PhD researcher
Yucheng Jin
PhD researcher
Oscar Alvarado
PhD researcher
http://augment.cs.kuleuven.be/
Diego Rojo García
PhD researcher
5. Mixed-initiative recommender systems
Core objectives:
• Explaining recommendations to increase user trust and acceptance
• Enabling users to interact with the recommendation process
9. Interactive recommender systems
Transparency: explaining the rationale of recommendations
User control: closing the gap between browse and search
Diversity – novelty
Cold start
Context-aware interfaces
He, C., Parra, D. and Verbert, K., 2016. Interactive recommender systems: A survey of the
state of the art and future research challenges and opportunities. Expert Systems with
Applications, 56, pp.9-27.
10. Flexible interaction with RecSys
Research visit
Host: Carnegie Mellon University
& University of Pittsburgh
Collaboration: John Stamper,
Peter Brusilovsky, Denis Parra
Period: April 2012 – June 2012
Second post-doctoral
fellowship FWO
host university: KU Leuven,
Belgium
supervisor: Erik Duval
period: Oct 2012 – Sept 2015
11. Overview research topics
2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018
Learning Analytics - Media Consumption – Research Information Systems - Healthcare
14. Contributions
new approach to support exploration, transparency and
controllability
recommender systems are shown as agents
in parallel to real users and tags
users can interrelate entities to find items
evaluation study that assesses
effectiveness
probability of item selection
Verbert, K., Parra, D., Brusilovsky, P., & Duval, E. (2013). Visualizing recommendations to
support exploration, transparency and controllability. In Proceedings of the IUI 2013
international conference on Intelligent user interfaces (pp. 351-362). ACM.
17. Results of studies 1 & 2
Effectiveness: # bookmarked items / # explorations
Effectiveness increases with
intersections of more entities
Effectiveness wasn’t affected
in the field study (study 2)
… but exploration distribution
was affected
[Charts: average effectiveness; total number of explorations]
Verbert, K., Parra, D., & Brusilovsky, P. (2016). Agents vs. users: Visual recommendation of research talks
with multiple dimensions of relevance. ACM Transactions on Interactive Intelligent Systems (TIIS), 6(2), 11.
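The effectiveness metric used in these studies (# bookmarked items divided by # explorations, broken down by how many entities an exploration intersects) can be sketched as follows. The log format below is hypothetical, for illustration only; the studies' actual data format differs.

```python
# Sketch of the effectiveness metric: bookmarked items / explorations,
# grouped by the number of entities (agents, users, tags) an exploration
# intersects. The log records are made up for illustration.
from collections import defaultdict

def effectiveness_by_intersection(log):
    """log: list of (num_entities_intersected, bookmarked: bool) events."""
    explorations = defaultdict(int)
    bookmarks = defaultdict(int)
    for n_entities, bookmarked in log:
        explorations[n_entities] += 1
        if bookmarked:
            bookmarks[n_entities] += 1
    return {n: bookmarks[n] / explorations[n] for n in explorations}

log = [(1, False), (1, False), (1, True), (2, True), (2, False), (3, True)]
print(effectiveness_by_intersection(log))  # effectiveness rises with more entities
```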
19. Three user studies
Study 1:
Within-subjects study with 20 users
baseline: exploration of recommendations in CN3
Second condition: exploration of recommendations in IEx
Data from two conferences EC-TEL 2014, EC-TEL 2015
Study 2:
Field study at Digital Humanities conference
1000+ participants, less technically oriented
Study 3:
Field study at IUI conference
Smaller scale, technical audience
20. Study 1 vs Study 2 vs Study 3
Overall, "augmented agents" were used in all three studies
Precision scores significantly higher for augmented agents in study
1 and study 3
Participants of study 2 (Digital Humanities)
more interested in content perspective
Rated several dimensions lower (use intention, fun, information
sufficiency, control)
Cardoso, B., Sedrakyan, G., Gutiérrez, F., Parra, D., Brusilovsky, P., & Verbert, K. (2018). IntersectionExplorer, a multi-perspective approach for exploring recommendations. International Journal of Human-Computer Studies.
21. Overview research topics
2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018
Learning analytics - Media Consumption – Research Information Systems - Healthcare
23. Personal characteristics
Need for cognition
• Measurement of the tendency for an individual to engage in, and enjoy, effortful cognitive activities
• Measured by test of Cacioppo et al. [1984]
Visualisation literacy
• Measurement of the ability to interpret and make meaning from information presented in the form of
images and graphs
• Measured by test of Boy et al. [2014]
Locus of control (LOC)
• Measurement of the extent to which people believe they have power over events in their lives
• Measured by test of Rotter et al. [1966]
Visual working memory
• Measurement of the ability to recall visual patterns [Tintarev and Masthoff, 2016]
• Measured by Corsi block-tapping test
Musical experience
• Measurement of the ability to engage with music in a flexible, effective and nuanced way
[Müllensiefen et al., 2014]
• Measured using the Goldsmiths Musical Sophistication Index (Gold-MSI)
Tech savviness
• Measured by confidence in trying out new technology
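Several of the characteristics above are measured with Likert-style questionnaires in which some items are reverse-coded before averaging. A minimal scoring sketch, with hypothetical item keys (not the actual item numbering of the NFC or Gold-MSI instruments):

```python
# Hypothetical sketch of scoring a Likert questionnaire such as the
# Need for Cognition scale (Cacioppo et al., 1984): reverse-coded items
# are flipped before averaging. Item ids below are illustrative only.
def likert_score(responses, reverse_items, scale_max=5):
    """responses: dict item_id -> rating on 1..scale_max."""
    total = 0
    for item, rating in responses.items():
        total += (scale_max + 1 - rating) if item in reverse_items else rating
    return total / len(responses)

answers = {1: 5, 2: 4, 3: 2, 4: 1}  # four sample items; 3 and 4 reverse-coded
print(likert_score(answers, reverse_items={3, 4}))  # → 4.5
```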
24. User study
Within-subjects design: 105 participants recruited with Amazon Mechanical Turk
Baseline version (without explanations) compared with explanation interface
Pre-study questionnaire for all personal characteristics
Task: Based on a chosen scenario for creating a play-list, explore songs and rate all
songs in the final playlist
Post-study questionnaire:
Recommender effectiveness
Trust
Good understanding
Use intentions
Novelty
Satisfaction
Confidence
26. Design implications
Explanations should be personalised for different groups of end-
users.
Users should be able to choose whether or not they want to see
explanations.
Explanation components should be flexible enough to present
varying levels of details depending on a user’s preference.
27. User control
Users tend to be more satisfied when they have control over
how recommender systems produce suggestions for them
(Konstan and Riedl, 2012)
Control recommendations
Douban FM
Control user profile
Spotify
Control algorithm parameters
TasteWeights
29. Different levels of user control
Level  | Recommender components     | Controls
low    | Recommendations (REC)      | Rating, removing, and sorting
medium | User profile (PRO)         | Select which user profile data will be considered by the recommender
high   | Algorithm parameters (PAR) | Modify the weight of different parameters
Jin, Y., Tintarev, N., & Verbert, K. (2018, September). Effects of personal characteristics on music recommender
systems with different levels of controllability. In Proceedings of the 12th ACM Conference on Recommender
Systems (pp. 13-21). ACM.
30. User profile (PRO) – Algorithm parameters (PAR) – Recommendations (REC)
8 control settings
No control
REC
PAR
PRO
REC*PRO
REC*PAR
PRO*PAR
REC*PRO*PAR
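The eight settings above are exactly the subsets of the three control components in the 2x2x2 design. A small sketch that enumerates them (note the enumeration order may differ slightly from the slide's ordering):

```python
# Enumerate the 8 experimental settings of the 2x2x2 factorial design:
# every subset of the control components {REC, PRO, PAR}.
from itertools import combinations

components = ("REC", "PRO", "PAR")
settings = [
    "*".join(combo) if combo else "No control"
    for r in range(len(components) + 1)
    for combo in combinations(components, r)
]
print(settings)
# ['No control', 'REC', 'PRO', 'PAR', 'REC*PRO', 'REC*PAR', 'PRO*PAR', 'REC*PRO*PAR']
```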
31. Evaluation method
Between-subjects – 240 participants recruited with AMT
Independent variable: settings of user control
2x2x2 factorial design
Dependent variables:
Acceptance (ratings)
Cognitive load (NASA-TLX), Musical Sophistication, Visual Memory
Framework Knijnenburg et al. [2012]
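To make the factorial design concrete, here is a minimal sketch of how a main effect is estimated in a 2x2x2 between-subjects design: the mean of the dependent variable with a component present minus its mean with the component absent. The observations are made up; the actual study had N=30 per cell and analyzed the data with Knijnenburg et al.'s framework, not this shortcut.

```python
# Minimal sketch: a main effect in a 2x2x2 between-subjects design as the
# difference in mean acceptance between cells with and without a component.
# All data below is hypothetical.
from statistics import mean

# (REC, PRO, PAR, acceptance) -- one fabricated observation per cell
data = [
    (0, 0, 0, 2.1), (1, 0, 0, 3.0), (0, 1, 0, 2.4), (0, 0, 1, 2.2),
    (1, 1, 0, 3.1), (1, 0, 1, 3.3), (0, 1, 1, 2.9), (1, 1, 1, 3.6),
]

def main_effect(data, factor_index):
    on = [y for *f, y in data if f[factor_index] == 1]
    off = [y for *f, y in data if f[factor_index] == 0]
    return mean(on) - mean(off)

print(round(main_effect(data, 0), 3))  # main effect of REC on acceptance
```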
32. Results
Main effects: from REC to PRO to PAR → higher cognitive load
Two-way interactions: combining controls does not necessarily result
in higher cognitive load. Adding an additional control component to
PAR increases acceptance. PRO*PAR has lower cognitive load than PRO
or PAR alone
High musical sophistication (MS) leads to higher perceived quality,
and thereby results in higher acceptance
33. Overview research topics
2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018
Learning Analytics - Media Consumption – Research Information Systems - Healthcare
41. References
Boy, J., Rensink, R. A., Bertini, E., & Fekete, J. D. (2014). A principled way of assessing visualization
literacy. IEEE transactions on visualization and computer graphics, 20(12), 1963-1972.
Cacioppo, J.T., Petty, R.E. and Kao, C.F., 1984. The efficient assessment of need for cognition.
Journal of personality assessment, 48(3), pp.306-307.
Konstan, J.A. and Riedl, J., 2012. Recommender systems: from algorithms to user experience. User
modeling and user-adapted interaction, 22(1-2), pp.101-123.
Müllensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: an
index for assessing musical sophistication in the general population. PloS one, 9(2), e89642.
Rotter, J. B. (1966). Generalized expectancies for internal versus external control of reinforcement.
Psychological monographs: General and applied, 80(1), 1.
Tintarev, N., & Masthoff, J. (2016). Effects of Individual Differences in Working Memory on Plan
Presentational Choices. Frontiers in psychology, 7, 1793.
Editor's Notes
Amazon.com uses a collaborative filtering technique: it looks for similarities between users and then makes recommendations based on what similar users buy.
Bring tools that have been researched for decades within reach of lay-humanists – traditional scholars are not well versed in computational methods
- In field study, users explore mainly clusters connected to one entity
Users don’t even get to explore interactions between 4 items
enabling users to explore interrelationships between prospects increases probability of finding a relevant item
interrelating tags with other entities increases their effectiveness significantly
The procedure contains the following steps:
\begin{enumerate}
\item \textit{Tutorial of study} - Participants were invited to read the description of the user study and to choose a scenario for generating a play-list. Then, they were asked to watch a task tutorial. Only the features of the particular setting were shown in this video. The ``Start'' button of the study was only activated after finishing the tutorial. Users logged in with their Spotify accounts to our experimental system, so that our recommenders could leverage the Spotify API and user listening history to generate ``real'' recommendations.
\item \textit{Pre-study questionnaire} - This questionnaire collects user demographics and measures user's personal characteristics such as musical sophistication and visual memory capacity. %and their trust in recommender systems.
The visual memory capacity is measured by ``Corsi block-tapping test''. In the test, a number of tiles are highlighted one at a time, and participants are asked to select the tiles in the correct order afterward. The number of highlighted tiles increases until the user makes too many errors. In Experiments 1 and 3, we used a test with a more sophisticated implementation of the Corsi test~\footnote{\url{https://www.humanbenchmark.com/tests/memory}, accessed June 2018}, which allows us to better distinguish participants by the level of visual memory capacity. In Experiment 2, to control the workload of participants in the within-subjects design, we chose a simple version of the Corsi test~\footnote{\url{http://www.psytoolkit.org/experiment-library/corsi.html}, accessed June 2018} for measuring visual short-term memory.
\item \textit{Manipulating the recommender and rating songs} - To ensure that participants spent enough time exploring recommendations, the questionnaire link was only activated after 10 minutes. After tweaking the recommender, participants were asked to rate the top-20 recommended songs that resulted from their interactions.
\item \textit{Post-study questionnaire} - Participants were asked to evaluate the perceived quality, perceived accuracy, perceived diversity, satisfaction, effectiveness, and choice difficulty of the recommender system. After answering all the questions, participants were given opportunities to provide free-text comments of their opinions and suggestions about our recommender.
\end{enumerate}
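The Corsi block-tapping test used in the pre-study questionnaire can be scored roughly as sketched below: the sequence length grows until the participant fails, and the Corsi span is the longest sequence reproduced correctly. This is a simplification; real implementations (including the two linked above) differ in stopping rules and scoring details.

```python
# Simplified sketch of Corsi block-tapping scoring. Trials are assumed to be
# presented in increasing sequence length; the span is the longest sequence
# the participant tapped back correctly before the first failure.
def corsi_span(trials):
    """trials: list of (shown_sequence, tapped_sequence) pairs."""
    span = 0
    for shown, tapped in trials:
        if tapped == shown:
            span = len(shown)
        else:
            break  # stop at the first failed sequence
    return span

trials = [([1, 4], [1, 4]), ([2, 5, 3], [2, 5, 3]), ([6, 1, 4, 2], [6, 1, 2, 4])]
print(corsi_span(trials))  # → 3
```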
Figure 2 shows that participants with low NFC reported higher confidence in their playlist with the explanation interface than in the baseline. Participants with a high NFC reported the opposite. Hence, participants with low NFC had more confidence in the explanation interface than in the baseline, in contrast to users with high NFC. An explanation might be that low NFC participants benefited from the explanations because they did not spontaneously engage in much extra reasoning to justify the recommendations they received, and when they received the rationale from the explanation this increased their confidence in their song selection.
Figure 2 also indicates that as NFC increased, the confidence of participants in the playlist created in the baseline also increased. This result indicates that participants with a high NFC were more willing to understand their own musical preferences in relation to the attributes of the recommended songs. This may have resulted in a higher confidence in their playlist.
We did not see the same increase in confidence as NFC increased in the explanation interface. As Figure 2 shows, the NFC scores in the third quartile were almost the same for both interfaces. At the highest NFC level, participants had higher confidence in the baseline than in the explanation interface. The reduced confidence within the explanation interface could be an indication that users with a high NFC have less need for explanations.
We employed a between-subjects study to investigate the effects of interactions among the different user control components on acceptance, perceived diversity, and cognitive load. We consider each of the three user control components as a variable. Following the 2x2x2 factorial design, we created eight experimental settings (Table~\ref{tab:table1}), which allows us to analyze three main effects, three two-way interactions, and one three-way interaction. We also investigate which specific \textit{personal characteristics} (musical sophistication, visual memory capacity) influence acceptance and perceived diversity. Each experimental setting is evaluated by a group of participants (N=30). Of note, to minimize the effects of UI layout, all settings have the same UI and disable the unsupported UI controls, e.g., graying out sliders.
As shown in Section~\ref{evaluation questions}, we employed Knijnenburg et al.'s framework~\citep{knijnenburg2012explaining} to measure six subjective factors: perceived quality, perceived diversity, perceived accuracy, effectiveness, satisfaction, and choice difficulty. In addition, we measured cognitive load by using a classic cognitive load questionnaire, the NASA-TLX~\footnote{\url{https://humansystems.arc.nasa.gov/groups/tlx}}. It assesses cognitive load on six aspects: mental demand, physical demand, temporal demand, performance, effort, and frustration.
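A raw ("RTLX") NASA-TLX score is commonly computed as the unweighted mean of the six subscale ratings. A minimal sketch, with made-up ratings; the full instrument additionally weights subscales via pairwise comparisons, which this sketch omits:

```python
# Sketch of a raw ("RTLX") NASA-TLX score: the unweighted mean of the six
# subscale ratings, each on a 0-100 scale. The sample ratings are fabricated.
from statistics import mean

def raw_tlx(ratings):
    dims = ("mental", "physical", "temporal", "performance", "effort", "frustration")
    return mean(ratings[d] for d in dims)

sample = {"mental": 70, "physical": 10, "temporal": 55,
          "performance": 30, "effort": 60, "frustration": 45}
print(raw_tlx(sample))  # → 45
```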
The procedure follows the design outlined in the general methodology (cf. Section \ref{sec:general-procedure}). The \textit{experimental task} is to compose a play-list for the chosen scenario by interacting with the recommender system. Participants were presented with play-list style recommendations (Figure~\ref{fig:vis1}c). Conditions were altered on a between-subjects basis. Each participant was presented with only one setting of user control. For each setting, initial recommendations are generated based on the selected top three artists, top two tracks, and top one genre. According to the controls provided in a particular setting, participants were able to manipulate the recommendation process.
Our results show that the settings of user control significantly influence cognitive load and recommendation acceptance. We discuss the results by the main effects and interaction effects in a 2x2x2 factorial design.
Moreover, we discuss how visual memory and musical sophistication affect cognitive load, perceived diversity, and recommendation acceptance.
\subsubsection{Main effects}
We discuss the main effects of the three control components. An increased control level, from control of recommendations (REC) to user profile (PRO) to algorithm parameters (PAR), leads to higher cognitive load (see Figure \ref{fig:margin}c). The increased cognitive load, in turn, leads to lower interaction times. Compared to the control of algorithm parameters (PAR) or user profile (PRO), the control of recommendations (REC) introduces the least cognitive load and supports users in finding songs they like.
We observe that most existing music recommender systems only allow users to manipulate the recommendation results, e.g., users provide feedback to a recommender through acceptance. However, the control of recommendations is a limited operation that does not allow users to understand or control the deep mechanism of recommendations.
\subsubsection{Two-way interaction effects}
Adding multiple controls allows us to improve on existing systems w.r.t. control, and does not necessarily result in higher cognitive load. Adding an additional control component to algorithm parameters increases the acceptance of recommended songs significantly.
Interestingly, all the settings that combine two control components do \textit{not} lead to significantly higher cognitive load than using only one control component. We even find that users' cognitive load is significantly \textit{lower} for (PRO*PAR) than (PRO, PAR), which shows a benefit of combining user profile and algorithm parameters in user control. Moreover, combining multiple control components potentially increases acceptance without increasing cognitive load significantly. Arguably, it is beneficial to combine multiple control components in terms of acceptance and cognitive load.
\subsubsection{Three-way interaction effects}
The interaction of PRO*PAR*REC tends to increase acceptance (see Figure \ref{fig:margin}a), and it does not lead to higher cognitive load (see Figure \ref{fig:margin}c). Moreover, it also tends to increase interaction times and accuracy. Therefore, we may consider having three control components in a system.
Consequently, we answer the research question. \textbf{RQ1}: \textit{Does the UI setting (user control, visualization, or both) have a significant effect on recommendation acceptance?} It seems that combining PAR with a second control component, or combining three control components, increases acceptance significantly.
\subsubsection{Effects of personal characteristics}
Having observed the trends across all users, we survey the difference in cognitive load and item acceptance due to personal characteristics. We study two kinds of characteristics: visual working memory and musical sophistication.
\paragraph{Visual working memory}
The SEM model suggests that visual memory is not a significant factor that affects the cognitive load of controlling recommender systems. The cognitive load for the type of controls used may not be strongly affected by individual differences in visual working memory. In other words, controlling the more advanced recommendation components in this study does not seem to demand a high visual memory.
In addition, we did not find an effect of visual memory on acceptance (or perceived accuracy and quality). Finally, the question items for diversity did not converge in our model, so we are not able to make a conclusion about the influence of visual working memory on diversity.
\paragraph{Musical sophistication}
Our results imply that high musical sophistication allows users to perceive higher recommendation quality, and may thereby be more likely to accept recommended items. However, higher musical sophistication also increases choice difficulty, which may negatively influence acceptance.
One possible explanation is that users with higher musical sophistication are able to leverage different control components to explore songs, and this influences their perception of recommendation quality, thereby accepting more songs. Finally, the question items for diversity did not converge in our model, so we are not able to make a conclusion about the influence of musical sophistication on diversity.