Trustworthy MIR
ISMIR 2022 Tutorial
Christine Bauer, Andrés Ferraro,
Emilia Gómez, Lorenzo Porcaro
1
2
Christine Bauer
Utrecht University,
The Netherlands
Andrés Ferraro
McGill University &
Mila, Canada*
Emilia Gómez,
Music Technology
Group (UPF) &
Joint Research
Center (EC), Spain
Lorenzo Porcaro
Music Technology
Group (UPF) &
Joint Research
Center (EC), Italy
* Currently at Pandora-SiriusXM
Outline
3
➔ Part 1 - Trustworthy MIR
◆ Trustworthy AI and its application to Music IR (Emilia, 30’)
< SHORT BREAK 10’>
◆ Evaluation (Christine, 30’)
< LONG BREAK 20’>
➔ Part 2 - Values in practice
◆ Diversity (Lorenzo, 30’)
< SHORT BREAK 10’>
◆ Fairness (Andrés, 30’)
➔ Wrap up (10’)
Part I
Trustworthy AI and its application to
Music IR
4
5
Trustworthy /ˈtrʌstˌwɚði/
: able to be relied on to do or provide what is needed
or right; deserving of trust; worthy of confidence.
System-oriented
Extraction and inference
of meaningful
information from music.
Socio-technical system
Interaction with the social
system, e.g. business
processes, organizations,
society (law, culture).
Markus Schedl, Emilia Gómez and Julián Urbano (2014), "Music Information Retrieval:
Recent Developments and Applications", Foundations and Trends in Information Retrieval:
Vol. 8: No. 2-3, pp 127-261. http://dx.doi.org/10.1561/1500000042
Paradigm change in MIR
6
1. Lawful - respecting all applicable laws and
regulations.
2. Ethical - respecting ethical principles and values.
3. Robust - both from a technical and a social perspective, taking into
account the system's social environment and avoiding intentional or
unintentional harm.
7
Ethics guidelines for Trustworthy AI (2019)
KR1. Human agency and oversight
KR2. Technical robustness and safety
KR3. Privacy and data governance
KR4. Transparency
KR5. Diversity, non-discrimination and fairness
KR6. Societal and environmental well-being
KR7. Accountability
7 Key Requirements for
Trustworthy AI
To be continuously evaluated
and addressed throughout the
AI system’s life cycle
8
Hupont, I., & Gómez, E. (2022). Documenting use cases in the affective
computing domain using Unified Modeling Language. International Conference
on Affective Computing and Intelligent Interaction (ACII).
https://arxiv.org/abs/2209.09666v1
9
From systems to use cases: example
Description
AI systems should empower human beings,
allowing them to make informed decisions
and fostering their fundamental rights.
At the same time, proper oversight
mechanisms need to be ensured, which can
be achieved through human-in-the-loop,
human-on-the-loop, and human-in-command
approaches.
Related topics
● Relevant user interfaces and HCIs.
● Ways to override or reverse the system output.
● Expertise needed to operate a system, how to
evaluate its correct operation.
Questions:
- How can an affective music recommender empower a listener?
- How can listeners take control of a music recommender system?
KR1. Human agency and oversight
11
Description
AI systems need to be resilient and secure.
They need to be safe, ensuring a fall back
plan in case something goes wrong, as well
as being accurate, reliable and reproducible.
That is the only way to ensure that
unintentional harm can also be minimized and
prevented.
Related topics
● Accuracy levels and metrics.
● Robustness tests: unintended (e.g. noise)
or intended errors (e.g. attacks).
● Consideration of edge cases.
● Reproducibility, open science.
KR2. Technical robustness and safety
MIR considerations
● Different metrics for different tasks and
contexts.
● Robustness tests not widely applied.
● Reproducibility fostered in ISMIR
research.
Description
Besides ensuring full respect for privacy and
data protection, adequate data governance
mechanisms must also be ensured, taking
into account the quality and integrity of the
data, and ensuring legitimised access to data.
Related topics
● Protection of personal data (e.g. minimization).
● Privacy-preserving recommendations.
● Define how to collect, label, process, audit and monitor
the data that goes into developing the AI system.
● Datasets used for training and validation should be
relevant, without errors and representative: ensure data is
up to date, matches demographics, carry out sanity
checks.
KR3. Privacy and data governance
Challenges:
• Data sources, crossing,
reconciliation.
• Time dimension.
• User understandability and
control.
• Different legal approaches
(e.g. EU - GDPR vs US).
• Different levels of personal data:
○ Nominative data leading to the identification of
natural persons.
○ Data leading to identification through
“complex” processes.
Pierre Saurel, Francis Rousseaux, & Marc Danger. (2014). On The Changing Regulations of Privacy and Personal Information in MIR.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 597–602.
https://doi.org/10.5281/zenodo.1416638
KR3. Privacy and data governance
J. S. Gómez-Cañón et al., "Music
Emotion Recognition: Toward new,
robust standards in personalized and
context-sensitive applications," in
IEEE Signal Processing Magazine, vol.
38, no. 6, pp. 106-114, Nov. 2021,
doi: 10.1109/MSP.2021.3106232.
KR3. Privacy and personalization
Description
The data, system and business models linked to AI
should be transparent.
Traceability mechanisms can help achieve this.
Moreover, AI systems and their decisions should be
explained in a manner adapted to the stakeholder
concerned.
Humans need to be aware that they are interacting with
an AI system, and must be informed of the system’s
capabilities and limitations.
Related topics
● How the system is built and evaluated, training data, limitations.
● How the system is monitored, e.g. logs.
● How the system is controlled, e.g. how to interpret its outputs; potential misuse.
● Pre-determined changes, expected lifetime and necessary maintenance/care measures.
● Communication to users, e.g. generative models.
KR4. Transparency
https://algorithmic-transparency.ec.europa.eu/index_en
● Transparency can serve to empower
people to challenge AI systems.
● Listeners → fake news,
impersonation.
● Artists → intellectual property, e.g.
copyrighted training data, infringement
risks for the engineer.
Gómez, E., Blaauw, M., Bonada, J., Chandna, P., & Cuesta, H. (2018).
Deep learning for singing processing: Achievements, challenges and
impact on singers and listeners. arXiv preprint arXiv:1807.03046.
Sturm, B., Iglesias, M., Ben-Tal, O., Miron, M., & Gómez, E. (2019). Artificial
Intelligence and Music: Open Questions of Copyright Law and Engineering
Praxis. Arts, 8(3), 115. https://doi.org/10.3390/arts8030115
KR4. Transparency: music generation
Description
Unfair bias must be avoided, as it could have multiple
negative implications, from the marginalization of
vulnerable groups, to the exacerbation of prejudice and
discrimination. Fostering diversity, AI systems should be
accessible to all, regardless of any disability, and involve
relevant stakeholders throughout their entire life cycle.
Related topics
Demographics, gender, sex, age, ethnicity, culture, religion,
sexual orientation, political orientation, culture….
Specific section later in the
second part of the tutorial!
KR5. Diversity, non-discrimination and fairness
Description
AI systems should benefit all human beings,
including future generations. It must hence be
ensured that they are sustainable and
environmentally friendly. Moreover, they
should take into account the environment,
including other living beings, and their social
and societal impact should be carefully
considered.
Related topics
● Impact on jobs and skills.
● Computing resources.
Questions:
- How will MIR affect music workplaces?
e.g. instrument players, musicians, composers.
Tolan, S., Pesole, A., Martínez-Plumed, F., Fernández-Macías, E., Hernández-Orallo, J., & Gómez, E. (2021). Measuring the occupational impact of AI: tasks,
cognitive abilities and AI benchmarks. Journal of Artificial Intelligence Research, 71, 191-236.
Emilia Gómez (2020). Human and Machine Intelligence: A Music Information Retrieval Perspective. Keynote speech, 11th International Conference on
Computational Creativity.
KR6. Societal and environmental well-being
• Creative and performing artists
• Architects, planners, surveyors,
designers
• Artistic, cultural and culinary
associate professionals
• Creativity and resolution
• Accounting
• Business
• Communication
• Conceptualization
• Text comprehension
• Attention and search
• Quantitative reasoning
• AI exposure score
• Few benchmarking initiatives on
creative systems
Songul Tolan, Annarosa Pesole, Fernando Martínez-Plumed, Enrique
Fernandez-Macias, José Hernández-Orallo & Emilia Gómez,
“Measuring the Occupational Impact of AI: Tasks, Cognitive
Abilities and AI Benchmarks”, JRC Working Papers on Labour,
Education and Technology 2020-02, Joint Research Centre (Seville
site).
KR6. Societal well-being: jobs
Training and evaluation costs, e.g. Jukebox, OpenAI (2020).
Evaluation: version identification on the accuracy-scalability plane
(traditional systems favor scalability; DL systems favor accuracy).
Embedding distillation techniques: less storage, faster retrieval and
similar accuracy (Yesiler et al., 2020).
NIME Conference Environmental Statement: https://eco.nime.org/
F. Yesiler, J. Serrà, & E. Gómez (2020). Less is more: Faster and better
music version identification with embedding distillation. ISMIR 2020.
KR6. Environmental well-being
Description
Mechanisms should be put in place to ensure
responsibility and accountability for AI
systems and their outcomes. Auditability,
which enables the assessment of algorithms,
data and design processes, plays a key role
therein, especially in critical applications.
Moreover, adequate and accessible redress
should be ensured.
Related topics
● Reproducibility and open data, code.
● Algorithmic audits.
Questions:
- Which are the challenges to reproduce an
existing paper, e.g. audit an MIR system?
KR7. Accountability
7 Key Requirements for Trustworthy AI
(maturity of each requirement in MIR research: newly addressed, some
research, or strong background)
KR1. Human agency and oversight
KR2. Technical robustness and safety
KR3. Privacy and data governance
KR4. Transparency
KR5. Diversity, non-discrimination and fairness
KR6. Societal and environmental well-being
KR7. Accountability
To be continuously evaluated
and addressed throughout the
AI system's life cycle
From ethical guidelines to legal requirements
AI Act: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52021PC0206 **Under negotiation**
Legal requirements:
• Robustness, accuracy and cybersecurity.
• Human oversight (measures built into the
system and/or to be implemented by users).
• Ensure appropriate degree of transparency and
provide users with information on capabilities
and limitations of the system and how to use it.
• Technical documentation and logging
capabilities (traceability and auditability).
• High-quality and representative training,
validation and testing data.
• Risk management.
AI Act - Scope: AI systems (software products)
The seven key requirements (KR1-KR7) map onto the AI Act's risk-based approach:
● Unacceptable risk → Prohibited
● High risk → Permitted subject to compliance with AI requirements and ex-ante conformity assessment
● 'Transparency' risk → Permitted subject to information/transparency obligations
● Minimal or no risk → Permitted with no restrictions
*Risk categories are not mutually exclusive
● Risk management.
● Transparency of recommender systems, online
advertisement.
● External & independent auditing, internal
compliance function and public accountability.
● Data sharing with authorities and researchers.
● Crisis response cooperation.
Digital Services Act:
https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/digital-services-act-ensuring-safe-and-accountable-online-environment_en
Entering into force in Europe in 2023!
Digital Services Act - Scope: digital services (e.g. search engines, online platforms)
From ethical guidelines to legal requirements
PRINCIPLES
● Proportionality and Do No Harm
● Safety and security
● Fairness and non-discrimination
● Sustainability
● Right to Privacy, and Data Protection
● Human oversight and determination
● Transparency and explainability
● Responsibility and accountability
● Awareness and literacy
● Multi-stakeholder and adaptive governance and
collaboration
https://unesdoc.unesco.org/ark:/48223/pf0000381137
Towards worldwide recommendations
Part I
Trustworthy MIR:
Evaluation
27
28
Excursus
In 1878, a grave was unearthed in Birka (southeastern Sweden),
a Viking settlement inhabited from about 750 to 950.
29
High-status,
Viking warrior,
male.
Weapons found in the grave.
(Image credit: Neil Price, Charlotte
Hedenstierna-Jonson, Torun Zachrisso, Anna
Kjellström; Copyright : Antiquity Publications Ltd.)
Illustration of how the burial might have looked just
before it was closed in Viking times.
(Image credit: Drawing by Þórhallur Þráinsson;
Copyright Antiquity Publications Ltd.)
buried person
is female
2017, DNA test
30
XX
chromosomes
A second method (other
than looking at clothes,
arms, and setting)—the
DNA test—gave better
insight.
Assumption that arms,
“non-female appearing” clothes,
etc., indicate a male warrior.
?
This finding opened up new questions
31
Were all Viking warriors women?
Was she a warrior?
What was the social role of women in Viking times?
32
There are blind spots when taking
only one perspective,
using a single method
with one metric.
33
MIR evaluation:
from system-centered to
human-centered approaches
MIR evaluation - first steps: system-centered
34
2004: ISMIR 2004 Audio Description Contest,
the first international evaluation project in MIR
2005: first edition of the Music Information Retrieval Evaluation eXchange (MIREX),
organized by IMIRSEL (International Music IR Systems Evaluation Laboratory)
2011: MusiClef campaign (later MediaEval series)
2012: Million Song Dataset Challenge
(Urbano et al., 2013)
MIR evaluation - mimicking users
35
● follow Cranfield paradigm for Information Retrieval (IR) evaluation
● trying to mimic the potential requests of the final users:
“removing actual users from the experiment but including a static user component:
the ground truth”
● 3 major simplifying assumptions:
○ 1. Relevance can be approximated by topical similarity.
○ 2. A single set of judgments for a topic is representative of the user population.
○ 3. The list of relevant documents for each topic is complete (all relevant
documents are known).
(Urbano et al., 2013)
(Cleverdon, 1991)
(Voorhees, 2002)
mimicking users ≈ system-centered
MIR evaluation - considering the user: user-centered?
36
● the neglected user:
○ too simplistic view of “the user”
○ we need to model the user comprehensively
● still: no user involved?!
(Schedl et al., 2013)
37
Nothing about us without us.
MIR evaluation - involving the user = (more) user-centered
38
● If you target users, you have to involve users.
● Typical questions to address:
○ (How) do users accomplish their tasks?
○ (How) do users accomplish the sub-tasks (if any) in their tasks?
○ How many steps do users take to finish their tasks?
○ How long do users take to finish their tasks? (task completion time)
● Important aspects→ Variations and differences across users
○ Individual differences and subjectivity
○ Still: Users groups will share similarities
(Liu & Hu, 2010)
MIR evaluation - from user-centered to human-centered evaluation
39
Users are not the only humans
involved in and impacted by
MIR systems!
Considering multiple stakeholders
40
music creators performers
music companies platform/service providers
end consumers/listeners
society
...
E.g., Stakeholders for (music) recommender systems
41
What is my target group?
Who else is affected/involved?
Also consider sub-groups!
(Jannach & Bauer, 2020, adapted)
What is the provider's goal, need, or intent?
Do these overlap with the users’ perspective?
Do they contradict?
Different stakeholders — task, intent, goal, need
42
What is the user’s task?
What is the user’s intent?
What does the user want?
What does the user need?
Are there multiple tasks, intents,
demands, needs?
Is this fair?
Is this desirable?
Who says that?
Different stakeholders — different risks
43
(Jannach & Jugovac, 2019)
44
Striving toward trustworthy MIR,
we need to
assess our systems
concerning
the various perspectives.
Comprehensive evaluation is not one-size-fits-all
45
● A one-size-fits-all evaluation strategy does not exist.
● Each scenario requires its fitting evaluation strategy.
● Different stakeholders may have different perspectives, goals, demands,
wishes, values, risks—potentially contradicting.
46
Comprehensive
evaluation
Goal:
Getting an integrated big picture
of a system’s performance
Comprehensive evaluation = multi-* evaluation
47
multiple methods
multiple metrics
multiple datasets
multiple perspectives
…
Benefits
48
● Explore sophisticated issues more
holistically and widely
● Capture diverse, but complementary
phenomena
● Apply diverse methods to capture the
same phenomenon from possibly
different angles
● Neutralize biases inherent to evaluation
approaches
(Celik et al., 2018)
We need to thoughtfully configure the evaluation design space.
And we have to do this on several levels for a comprehensive evaluation.
49
(Zangerle & Bauer, 2022)
Framework for EValuating
Recommender systems (FEVR)
Typical differentiation of experiment types
50
Different (sub-)communities
→ different terminology
● Computational or algorithmic
approaches
● User studies (in the lab or online)
● Field studies/experiment (using a
real-world system)
(Zangerle & Bauer, 2020)
The scenarios to evaluate are multifaceted
51
Potential overall goals:
● Convince/persuade user
● User satisfaction
● Replace user
● Assist user
● …
Target group:
● Individual users vs. user groups
● Novices vs experts
● Adults vs children
● Niche music vs mainstream listeners
● …
Situation of a user in session:
● Main task vs side task
● Self-initiated vs other-initiated
● Clear idea/goal vs exploration
● …
Starting point for evaluation setting:
● Ground truth exists vs subjective
perception (i.e., variation across users)
● Existing data vs data collection
○ Fit of data to goal
○ Representation of target group
● …
52
What exactly do we want to find out?
What do we need to find out?
Is it relevant?
Does it matter in practice?
What is good enough?
Let us look at some variables of interest
53
Does my system accurately identify a song?
54
● accuracy-based metrics
(e.g., precision, recall, F-score)
● What is the harm of a false positive?
● What is the harm of a false negative?
For the person who uploaded a song:
Does my system accurately identify a song?
55
● What is the harm of a false positive?
● What is the harm of a false negative?
TP: marked as infringement, is infringement
FP: marked as infringement, is not infringement
TN: not marked as infringement, is not infringement
FN: not marked as infringement, is infringement
Song identification to detect copyright infringement:
For the copyright holder of the song:
Does my system accurately identify a song?
56
● What is the harm of a false positive?
● What is the harm of a false negative?
TP: marked as infringement, is infringement
FP: marked as infringement, is not infringement
TN: not marked as infringement, is not infringement
FN: not marked as infringement, is infringement
Song identification to detect copyright infringement:
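To make the FP/FN asymmetry concrete, here is a minimal Python sketch (hypothetical labels and data, not from the tutorial) computing the confusion counts and the accuracy-based metrics built on them:

```python
# Minimal sketch (hypothetical data): accuracy-based metrics for a
# copyright-infringement detector. Labels: 1 = infringement, 0 = not.

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true = [1, 0, 0, 1, 0, 1, 0, 0]  # ground truth
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]  # system output

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # low precision: uploaders wrongly flagged (FPs)
recall = tp / (tp + fn)     # low recall: infringements missed (FNs)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

The same numbers weigh differently per stakeholder: the uploader bears the cost of false positives, the copyright holder that of false negatives.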
Is a MIR system fair?
57
Is my system fair for artists?
● Are artists fairly represented in a playlist?
● …in a dataset?
● …fair regarding gender of artists?
● …fair regarding genre?
● …fair regarding country of origin?
● …
Is my system fair for users?
● Are world-music fans as well served as
heavy-metal fans?
● Are gender minority users served well?
● Are novices served well?
● …
Do artists perceive it as fair?
Do users perceive it as fair?
Do artists/users perceive any differences?
58
We need to ask a lot of questions.
We need to ask the right questions.
We need to ask the right questions right.
Part II
Values in Practice:
Diversity and Fairness
59
Key Requirements for Trustworthy AI (AI HLEG, 2019)
60
“In order to achieve Trustworthy AI, we
must enable inclusion and diversity
throughout the entire AI system’s life
cycle. Besides the consideration and
involvement of all affected
stakeholders throughout the process,
this also entails ensuring equal access
through inclusive design processes as
well as equal treatment. This
requirement is closely linked with the
principle of fairness.” (Section 1.5)
Recommendation on the Ethics of AI (UNESCO, 2020)
61
“AI actors should promote social justice and safeguard fairness and non-discrimination of any kind [...].
This implies an inclusive approach to ensuring that the benefits of AI technologies are available and
accessible to all, taking into consideration the specific needs of different age groups, cultural systems,
different language groups, persons with disabilities, girls and women, and disadvantaged, marginalized
and vulnerable people or people in vulnerable situations.” (Section III.2, Article 28)
(IS)MIR Initiatives
62
❖ Women in Music Information Retrieval (WiMIR), 2011-present https://wimir.wordpress.com/
❖ Ethics guidelines for MIR, WiMIR workshop, Nov. 3 2019 (link)
❖ ISMIR Tutorials:
➢ Why is Greek Music Interesting? Towards an Ethics of MIR, Section 4, 2014 (link)
➢ Computational Approaches for Analysis of Non-Western Music Traditions, 2018 (link)
➢ Fairness, Accountability and Transparency in Music Information Research, 2019 (link)
❖ ISMIR Special Call for Papers: Cultural and Social Diversity in MIR (2021, 2022)
❖ TISMIR Call for Papers: Special Collection on Cultural Diversity in MIR (2022)
❖ NIME Principles & Code of Practice on Ethical Research https://www.nime.org/ethics/
Part II
Values in Practice:
Diversity
63
Organizing Team ISMIR 2022 (w/o volunteers)
64
https://ismir2022.ismir.net/team
Is it a diverse team?
Yes, but… No, but…
68
69
The Cycle of Diversity:
1. Context & Concept
2. Measure
3. Perception
4. Impact
Contexts & Concepts of Diversity (Steel et al., 2018)
Concept (domain agnostic): an understanding of what constitutes diversity,
abstracted from questions of which attributes are relevant to diversity in a
specific context.
Context (domain specific): background situation that suggests attributes
relevant for assessing diversity.
Concepts of Diversity - Egalitarian (Steel et al., 2018)
Let L be the line-up of an electronic music festival.
If the artists in L are distributed uniformly over several styles of
electronic music S = {s1, ..., sn}, the festival is diverse from an
egalitarian perspective.
Concepts of Diversity - Normic (Steel et al., 2018)
Let L be the line-up of an electronic music festival, S = {s1, ..., sn} be
styles of electronic music, and the non-diverse norm Snd be the Electronic
Dance Music (EDM) style.
Then, L is diverse in a normic sense if the majority of artists performing
do not play EDM, i.e., their style ∉ Snd.
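As a toy illustration, the two concepts can be operationalized as simple checks; this sketch is our own reading of the definitions (the uniformity tolerance and the EDM norm are assumptions), not code from Steel et al.:

```python
# Illustrative sketch: egalitarian vs. normic diversity of a festival
# line-up, given each artist's style (toy data).
from collections import Counter

lineup = ["EDM", "EDM", "techno", "house", "ambient", "EDM"]

def egalitarian_diverse(styles, tolerance=1):
    """Egalitarian: artists (near-)uniformly distributed over styles."""
    counts = Counter(styles)
    return max(counts.values()) - min(counts.values()) <= tolerance

def normic_diverse(styles, non_diverse_norm="EDM"):
    """Normic: the majority of artists do NOT play the non-diverse norm."""
    outside = sum(1 for s in styles if s != non_diverse_norm)
    return outside > len(styles) / 2

print(egalitarian_diverse(lineup))  # False: EDM dominates the distribution
print(normic_diverse(lineup))       # False: only half the line-up is non-EDM
```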
78
“When a measure becomes a target,
it ceases to be a good measure.”
Strathern (1997)
79
Measures of Diversity
Index-based
Given a set 𝛀 of N elements, analyze richness and evenness and return a
single value:
- Compact representation
- Monodimensional
Distance-based
Given a set 𝛀 of N elements, compute the pairwise distance between
elements & aggregate:
- Element-wise information
- Multidimensional
81
Measures of Diversity - Index-based
Given a set 𝛀 of N elements:
Richness: number of categories present in the set 𝛀
Evenness: distribution of elements across the categories of 𝛀
Example: tracks placed on the valence-arousal plane, one per emotion
quadrant (Q1, Q2, Q3, Q4):
Richness = 4 = N; Evenness = (1, 1, 1, 1) = (n1, n2, n3, n4)
Simpson's Diversity Index (Simpson, 1949): 0 = no diversity, 1 = high diversity.
How diverse is the set of tracks (in terms of emotion recognized)
according to the Simpson index?
D(𝛀) = 1. N.B. quite egalitarian.
But a second set with one track per quadrant also yields D(𝛀) = 1, even if
its tracks sit much closer together in the valence-arousal plane: the index
alone cannot distinguish the two.
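For reference, here is a minimal sketch of one common 0-1 formulation of the index (1 minus Simpson's original index), consistent with the D(𝛀) = 1 result above:

```python
# Minimal sketch: Simpson's diversity index in the 0-1 form used here
# (0 = no diversity, 1 = high diversity), from per-category counts:
# D = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)).

def simpson_index(counts):
    """counts: elements per category, e.g. tracks per emotion quadrant."""
    n_total = sum(counts)
    if n_total < 2:
        raise ValueError("need at least two elements")
    return 1 - sum(n * (n - 1) for n in counts) / (n_total * (n_total - 1))

print(simpson_index([1, 1, 1, 1]))  # 1.0: one track per quadrant
print(simpson_index([4, 0, 0, 0]))  # 0.0: all tracks in one quadrant
```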
89
Measures of Diversity - Distance-based (Porcaro and Gómez, 2019)
Given a set 𝛀 of N elements, compute the pairwise distance between
elements (Euclidean, Jaccard, Cosine, etc.) and aggregate.
Approach: take a 2-dimensional projection of a tag-embedding space,
compute the pairwise distances within clusters; the cluster with the
maximum average distance is the most diverse.
Example: which playlist is more diverse, "Jazz vocal - Swing" or
"Ballad - Love"?
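A minimal sketch of the distance-based idea, with toy 2-D points standing in for tag embeddings (averaging all pairwise distances is one common aggregation; the data here are assumptions):

```python
# Minimal sketch: average pairwise Euclidean distance as a
# distance-based diversity measure (larger average = more diverse).
import numpy as np

def avg_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of points."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dists = [np.linalg.norm(pts[i] - pts[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

# Toy 2-D "tag embeddings": one spread-out set, one tightly packed set.
spread = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
packed = [(0.4, 0.4), (0.5, 0.4), (0.4, 0.5), (0.5, 0.5)]

print(avg_pairwise_distance(spread))  # ~0.80: more diverse
print(avg_pairwise_distance(packed))  # ~0.08: less diverse
```

Note that an index-based measure could assign both sets the same score if each point falls in its own category; the pairwise distances tell them apart.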
96
Do differences in system-measures directly translate to
the same differences in user-measures? How do those
differences affect end users?
Construct Validity (Urbano et al., 2013)
97
Perceptions of Diversity (Porcaro et al., 2022)
Two playlists of electronic music DJs, four artists each:
- Pretty Lights (US, Downtempo); Technoboy (IT, Hardstyle); Bonobo (UK, Downbeat); Laurent Garnier (FR, Techno)
- The Blessed Madonna (US, House); Helena Hauff (DE, Techno); Miss DJAX (NL, Hardcore); Beatrice Dillon (UK, Experimental)
Which one is the most diverse playlist?
Both playlists have:
- Same number of styles
- Only Western-based DJs
- Only white skin DJs
- Only one sex (biological aspect)
According to computational diversity metrics, NO statistical difference.
but…
Asking 110 participants, 72% of them agree that List Blue is more diverse
than List Green.
Most of the participants employed a normic concept of diversity…
… while our model works under an egalitarian concept.
106
MIR and Subjectivity
MIR systems include a great level of subjectivity:
- Track A and Track B are very similar
- Track A is happier than Track B
- Playlist A is less diverse than Playlist B
- This track is 80% EDM and 20% Tech House
Human annotations are fundamental resources,
but here comes the problem…
108
Agreement and Perceptions
- Listeners trained in classical music tend to disagree more on perceived
emotions than those not trained [Schedl et al., 2018]
- Preference, familiarity, and lyrics comprehension increase agreement for
emotions corresponding to quadrants Q1 and Q3. [Gómez-Cañón et al., 2020]
- Current emotional state of users influences their perception of music similarity
[Flexer et al., 2021]
109
Diversity is a means to an end, not an end in itself.
Higher education → prepare students for life (allow them to interact with
students from different socio-cultural backgrounds).
Media studies → reaching democratic goals (e.g., informed citizenry, inclusive
public discourse).
Economics → foster innovation; hedging against ignorance; mitigating lock-in;
accommodating plural perspectives.
Impacts of Diversity
In the MIR field, which are the ends wherein the
fostering of diversity may make a difference, that
as a community we should care about?
113
Part II
Values in Practice:
Fairness
114
● General notions of Fairness and Bias
● Example of fairness studies in MIR and
recommendations:
○ What is fair in Music recommendation?
○ Gender imbalance in music Recsys
○ Fairness for users
○ Music record labels
○ Cover song identification
Overview
● Fairness: Identification and prevention of situations where individuals or a
group are systematically disfavored, e.g., based on their characteristics
● Fairness is an interdisciplinary topic
● It is based on normative ideas of what it means to treat people “fairly”
Introduction - Fairness
May refer to:
● societal biases
● statistical biases
...
Bias as an objective deviation/imbalance in a model,
measure or data compared to an intended target
Introduction - Bias
118
Bias: we refer to a fact of the system without making
any inherently normative judgment
Fairness: we use it to discuss the normative aspects
of the system and its effects
Fairness vs Bias
● No single general answer to fairness, ethics, etc.
● Fairness will hardly be defined with clear, universal objectives alone
but:
● We can identify/measure/mitigate specific harms
● Then, look for general strategies or principles
What is fair? Fair for who?
The ML community has begun to address claims of algorithmic
bias under the rubric of fairness, accountability, and transparency.
We should clearly define:
● Assumptions
● Goals or principles that we build on
● Methods used to evaluate the ability to meet those goals
● How to assess the relevance and appropriateness of those
goals to the social problem in question
What is fair? Fair for who?
[*] Redistribution and Rekognition: A Feminist Critique of Algorithmic Fairness [West, 2020]
● Distributional harm: Harmful distribution of resources or outcomes
(e.g., popularity vs. what actually gets recommended)
● Representational harm: Harmful internal or external representation
(e.g., reducing Jamaica to reggae, or Mississippi to blues)
General Concepts - I
Fairness in Information Access Systems [Ekstrand et al., 2022]
● Individual fairness: Similar individuals have similar experience/treatment
● Group fairness: Different groups have similar experience/treatment
○ Sensitive attribute: Attribute identifying group membership
○ Disparate treatment: Groups intentionally treated differently
○ Disparate impact: Groups receive outcomes at different rates
○ Disparate mistreatment: Groups receive erroneous effects at different rates
General Concepts - II
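As an illustration of group fairness, here is a short sketch (hypothetical data) of a disparate-impact check, comparing the rates at which two groups receive a positive outcome:

```python
# Illustrative sketch (hypothetical data): disparate impact as the ratio
# of positive-outcome rates between groups; a ratio far below 1 signals
# that one group receives the outcome (e.g., being recommended) less often.

def positive_rate(outcomes):
    """Fraction of a group receiving the positive outcome."""
    return sum(outcomes) / len(outcomes)

# 1 = artist was recommended, 0 = not; grouped by a sensitive attribute.
group_a = [1, 1, 0, 1, 1, 0, 1, 1]  # e.g., male artists
group_b = [0, 1, 0, 0, 1, 0, 0, 0]  # e.g., female artists

ratio = positive_rate(group_b) / positive_rate(group_a)
print(f"disparate impact ratio = {ratio:.2f}")  # ~0.33 for this toy data
```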
Provider vs Consumer Fairness
● We want to make sure that different applications are free of undesired biases
and prevent any kind of discrimination
● Usually RS are user-centered: maximizing user satisfaction is the main goal
● Consider systems' impact on multiple groups of individuals
● How can platforms be fair to the artists?
● The leading questions are:
○ How do artists feel affected by current systems?
○ What do artists want to see improved in the future?
○ What concrete ideas or even algorithmic decisions could be adopted in
future systems?
What is fair? exploring the artists' perspective on the fairness of music streaming platforms [Ferraro et al., 2021]
What is fair in Music recommendation?
135
Design of semi-structured interviews → Documents preparation →
Participant selection based on diversity → Conducting interviews →
Interview transcription → Coding → Analysis of results
Methodology - Semi-structured interviews and Qualitative Content Analysis
Qualitative content analysis [Mayring, 2004]
136
Topics
- What do artists think about the platforms' opportunities and power to influence
the users' listening behavior?
- We see a strong tendency against influencing users
Results - Interviews
138
"I don’t see why we should tell the users which
genres they should listen to"
- All artists agreed that the platforms should promote content by female artists
to reach a gender balance in what users consume
Results - Interviews
140
"I think there should be actions to correct some
biases. The question is in which cases it should be
corrected and in which not. In heavy metal music, I
imagine that there aren’t many female singers. Maybe
we could give them more visibility, otherwise they would
never be seen"
141
"[...] the population of the world is 50% women. So it
would be ridiculous if the system wouldn’t recommend
them.”
Based on the interviews, we focus on the topic of gender balance
● Qualitative approach:
○ Artists express that platforms should promote gender balance
● Quantitative approach:
○ Using 2 datasets of user listening data, evaluate how users' listening
histories compare to the collaborative filtering (CF) recommendations
○ What is the effect on user consumption in the long term?
142
Break the Loop: Gender Imbalance in Music Recommenders [Ferraro, et al., 2021]
Gender Imbalance in Music Recsys
- Identify: We identify that these artists expect a balanced representation of
artists according to their gender
- Measure: Representation, Exposure and Utility
- Mitigate: Simulation approach to compare longitudinal effects considering
feedback loops.
Gender Imbalance in Music Recsys
- 3 sources of unfairness:
- Representation: Similar proportion to users' history
- Exposure: Users have to put more effort to reach the first female artist in the recommendations
- Utility: The model makes more "mistakes" for female artists than for male artists
Similar results in:
Exploring Artist Gender Bias in Music Recommendation [Shakespeare et al., 2020]
Gender Imbalance in Music Recsys - Results
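To illustrate, here is a minimal sketch (assumed data and simplified metric definitions, not the exact formulations from the paper) of representation and exposure in a ranked recommendation list:

```python
# Minimal sketch: representation and exposure of female artists in a
# ranked list (toy data; real studies use full rankings and user histories).

recommendations = [  # (artist, gender), ordered by rank
    ("artist_1", "male"), ("artist_2", "male"), ("artist_3", "male"),
    ("artist_4", "female"), ("artist_5", "male"), ("artist_6", "female"),
]

# Representation: proportion of female artists among the recommendations.
n_female = sum(1 for _, g in recommendations if g == "female")
representation = n_female / len(recommendations)

# Exposure: rank of the first female artist (deeper rank = more user effort).
first_female_rank = next(
    i + 1 for i, (_, g) in enumerate(recommendations) if g == "female")

print(f"representation of female artists: {representation:.2f}")
print(f"first female artist at rank {first_female_rank}")
```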
Simulate user interactions: start from a dataset with user interactions
and a trained model, then
1. Generate recommendations for users.
2. Simulate user interactions with the recommendations.
3. Retrain the model.
4. … repeat the process and evaluate the recommendations.
150
Feedback Loop
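The loop above can be sketched schematically in a few lines (all components are placeholders; real studies plug in a trained recommender and a behavioral user model):

```python
# Schematic sketch of a feedback-loop simulation: recommend -> simulate
# user interactions -> retrain, repeated over several rounds.
import random

def train(interactions):
    """Placeholder 'model': ranks items by interaction count."""
    counts = {}
    for _, item in interactions:
        counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

def recommend(model, k=3):
    return model[:k]

def simulate_clicks(user, recs):
    """Placeholder user behavior: click each recommended item w.p. 0.5."""
    return [(user, item) for item in recs if random.random() < 0.5]

interactions = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "c")]
for round_ in range(5):
    model = train(interactions)                  # (re)train on all feedback
    for user in {u for u, _ in interactions}:
        recs = recommend(model)                  # generate recommendations
        interactions += simulate_clicks(user, recs)  # simulated feedback
    print(f"round {round_}: most-recommended item = {recommend(model, 1)[0]}")
```

Even this toy loop tends to reinforce whatever is popular at the start, which is the dynamic the longitudinal studies below examine.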
- Exploring Longitudinal Effects of Session-based Recommendations. In Proceedings of
the 14th ACM Conference on Recommender Systems [Ferraro, Jannach & Serra, 2020]
- Consumption and performance: understanding longitudinal dynamics of recommender
systems via an agent-based simulation framework [Zhang et al., 2019]
- Longitudinal impact of preference biases on recommender systems’ performance [Zhou,
Zhang, & Adomavicius, 2021]
- The Dual Echo Chamber: Modeling Social Media Polarization for Interventional
Recommending [Donkers & Ziegler, 2021]
152
References - Simulations
- Exploring User Opinions of Fairness in Recommender Systems [Smith et al.,
2020]
- Fairness and Transparency in Recommendation: The Users’ Perspective
[Sonboli et al., 2021]
- The multisided complexity of fairness in recommender systems [Sonboli et al.,
2022]
References - Fairness for Users
- The Unfairness of Popularity Bias in Music Recommendation: A Reproducibility
Study [Kowald, Schedl & Lex, 2020]
- Support the underground: characteristics of beyond-mainstream music
listeners [Kowald et al., 2021]
- Analyzing Item Popularity Bias of Music Recommender Systems: Are Different
Genders Equally Affected? [Lesota et al., 2021]
References - Unfairness for users
Music Record Labels
Bias and Feedback Loops in Music Recommendation: Studies on Record Label
Impact [Knees, Ferraro & Hübler, 2022]
Cover Song Identification
Assessing Algorithmic Biases for Musical Version Identification [Yesiler et al., 2022]
- Multiple stakeholders: composers and other artists
- Compare 5 systems
- Use different attributes of the songs/artists
They identify algorithmic biases in cover song identification systems that may
lead to performance differences that (dis)favor different groups of artists, e.g.:
- Tracks released after 2000 work better
- A larger gap between release years leads to worse performance
- Learning-based systems perform better for underrepresented artists
References
160
161
- Celik, I., Torre, I., Koceva, F., Bauer, C., Zangerle, E., & Knijnenburg, B. (2018). UMAP 2018 Intelligent
User-adapted Interfaces: Design and Multi-modal Evaluation (IUadaptMe). Workshop Chairs’ Welcome &
Organization. Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization (UMAP
2018).
- Cleverdon, C.W. (1991). The significance of the cranfield tests on index languages. International ACM SIGIR
conference on research and development in information retrieval, 3–12.
- Flexer, A., Lallai, T., & Rašl, K. (2021). On Evaluation of Inter- and Intra-Rater Agreement in Music
Recommendation. Transactions of the International Society for Music Information Retrieval, 4(1), 182–194.
https://doi.org/10.5334/tismir.107
- Gómez-Cañón, J. S., Cano, E., Herrera, P., & Gómez, E. (2020). Joyful for You and Tender for Us: The Influence
of Individual Characteristics and Language on Emotion Labeling and Classification. Proceedings of the 21st
International Society for Music Information Retrieval Conference (ISMIR 2020).
- Jannach, D. & Bauer, C. (2020). Escaping the McNamara Fallacy: Toward More Impactful Recommender
Systems Research. AI Magazine, 41(4), 79-95. https://doi.org/10.1609/aimag.v41i4.5312
- Jannach, D. & Jugovac, M. (2019). Measuring the Business Value of Recommender Systems. ACM
Transactions on Management Information Systems 10(4), 1-23. https://doi.org/10.1145/3370082
162
- Lee, J. H. & Cunningham, S. J. (2013). Toward an understanding of the history and impact of user studies in
music information retrieval. Journal of Intelligent Information Systems. 41(3), 499-521.
https://doi.org/10.1007/s10844-013-0259-2
- Liu, J. & Hu, X. (2010). User-centered music information retrieval evaluation. Proceedings of the Joint
Conference on Digital Libraries (JCDL) Workshop: Music Information Retrieval for the Masses.
- Miron, M. & Porcaro, L. (2021). MIR Evaluation Practices. Zenodo. https://doi.org/10.5281/zenodo.4611731
- Porcaro, L., & Gómez, E. (2019). 20 Years of Playlists: a Statistical Analysis on Popularity and Diversity.
Proceedings of the 20th International Symposium on Music Information Retrieval, ISMIR 2019, July, 4–11.
- Porcaro, L., Gómez, E., & Castillo, C. (2022). Perceptions of Diversity in Electronic Music: The Impact of Listener,
Artist, and Track Characteristics. Proc. ACM Hum.-Comput. Interact., 6(CSCW1).
https://doi.org/10.1145/3512956
- Schedl, M., Gomez, E., Trent, E. S., Tkalcic, M., Eghbal-Zadeh, H., & Martorell, A. (2018). On the interrelation
between listener characteristics and the perception of emotions in classical orchestra music. IEEE Transactions
on Affective Computing, 9(4), 507–525. https://doi.org/10.1109/TAFFC.2017.2663421
163
- Schedl, M., Flexer, A. & Urbano, J. (2013). The neglected user in music information retrieval research. Journal of
Intelligent Information Systems, 41, 523–539. https://doi.org/10.1007/s10844-013-0247-6
- Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688.
- Steel, D., Fazelpour, S., Gillette, K., Crewe, B., & Burgess, M. (2018). Multiple diversity concepts and their
ethical-epistemic implications. European Journal for Philosophy of Science, 8(3), 761–780.
https://doi.org/10.1007/s13194-018-0209-5
- Strathern, M. (1997). “Improving ratings”: audit in the British University system. European Review, 5(03), 305.
https://doi.org/10.1017/s1062798700002660
- Urbano, J., Schedl, M., & Serra, X. (2013). Evaluation in music information retrieval. Journal of Intelligent
Information Systems, 41(3), 345–369. https://doi.org/10.1007/s10844-013-0249-4
- Voorhees, E. M. (2002). The Philosophy of Information Retrieval Evaluation. In: Peters, C., Braschler, M., Gonzalo,
J., Kluck, M. (eds) Evaluation of Cross-Language Information Retrieval Systems. CLEF 2001. Lecture Notes in
Computer Science, vol 2406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45691-0_34
- Zangerle, E. & Bauer, C. (2022). Evaluating recommender systems: survey and framework. ACM Computing
Surveys. https://doi.org/10.1145/3556536

Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

TRUSTWORTHY MIR - ISMIR 2022 Tutorial.pdf

  • 12. Description AI systems need to be resilient and secure. They need to be safe, ensuring a fallback plan in case something goes wrong, as well as being accurate, reliable and reproducible. Only in this way can unintentional harm be minimized and prevented. Related topics ● Accuracy levels and metrics. ● Robustness tests: unintended (e.g. noise) or intended errors (e.g. attacks). ● Consideration of edge cases. ● Reproducibility, open science. KR2. Technical robustness and safety MIR considerations ● Different metrics for different tasks and contexts. ● Robustness tests not widely applied. ● Reproducibility fostered in ISMIR research.
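Robustness tests of the kind mentioned above (unintended errors such as noise) can be prototyped in a few lines. The sketch below is illustrative, not from the tutorial: the `toy_predict` model, the SNR values, and the clips are assumptions standing in for a real MIR classifier and dataset.

```python
import numpy as np

def add_noise(audio: np.ndarray, snr_db: float) -> np.ndarray:
    """Corrupt a mono signal with white Gaussian noise at a given SNR (dB)."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.default_rng(0).normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

def robustness_score(predict, clips, snr_db: float = 20.0) -> float:
    """Fraction of clips whose predicted label survives noise injection."""
    stable = sum(predict(c) == predict(add_noise(c, snr_db)) for c in clips)
    return stable / len(clips)

# Toy stand-in for a trained classifier: label 1 if RMS energy exceeds a threshold.
def toy_predict(x: np.ndarray) -> int:
    return int(np.sqrt(np.mean(x ** 2)) > 0.4)

rng = np.random.default_rng(42)
clips = [rng.standard_normal(22050) * gain for gain in (0.1, 0.5, 1.0, 2.0)]
print(robustness_score(toy_predict, clips, snr_db=10.0))
```

A real test would sweep several SNR levels and perturbation types (time stretching, encoding artifacts, adversarial attacks) and report the degradation curve rather than a single score.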
  • 13. Description Besides ensuring full respect for privacy and data protection, adequate data governance mechanisms must also be ensured, taking into account the quality and integrity of the data, and ensuring legitimised access to data. Related topics ● Protection of personal data (e.g. minimization). ● Privacy-preserving recommendations. ● Define how to collect, label, process, audit and monitor the data that goes into developing the AI system. ● Datasets used for training and validation should be relevant, error-free and representative: ensure data is up to date and matches demographics, and carry out sanity checks. KR3. Privacy and data governance
  • 14. Challenges: • Data sources, crossing, reconciliation. • Time dimension. • User understandability and control. • Different legal approaches (e.g. EU - GDPR vs US). • Different levels of personal data: ○ Nominative data leading to the identification of natural persons. ○ Data leading to identification through “complex” processes. Pierre Saurel, Francis Rousseaux, & Marc Danger. (2014). On The Changing Regulations of Privacy and Personal Information in MIR. Proceedings of the 15th International Society for Music Information Retrieval Conference, 597–602. https://doi.org/10.5281/zenodo.1416638 KR3. Privacy and data governance
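As one concrete reading of "minimization" and of the different levels of personal data listed above, the sketch below pseudonymizes a listening-log record and coarsens its timestamp. The field names, salt handling, and truncation choices are illustrative assumptions, not a recipe for legal compliance.

```python
import hashlib

SALT = "replace-with-a-secret-salt"  # assumption: managed outside version control

def minimize(record: dict) -> dict:
    """Keep only what the MIR task needs; pseudonymize the rest."""
    return {
        # One-way salted hash instead of a nominative identifier.
        "user_pseudo_id": hashlib.sha256(
            (SALT + record["user_email"]).encode()
        ).hexdigest()[:16],
        # Coarsen the timestamp to the hour: still usable for context
        # modeling, but harder to cross-reference and re-identify.
        "hour": record["timestamp"][:13],
        "track_id": record["track_id"],
    }

raw = {
    "user_email": "listener@example.com",  # nominative data
    "timestamp": "2022-12-04T18:23:41",
    "track_id": "TRK-000123",
}
print(minimize(raw))
```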
  • 15. J. S. Gómez-Cañón et al., "Music Emotion Recognition: Toward new, robust standards in personalized and context-sensitive applications," in IEEE Signal Processing Magazine, vol. 38, no. 6, pp. 106-114, Nov. 2021, doi: 10.1109/MSP.2021.3106232. KR3. Privacy and personalization
  • 16. Description The data, system and business models linked to AI should be transparent. Traceability mechanisms can help achieve this. Moreover, AI systems and their decisions should be explained in a manner adapted to the stakeholder concerned. Humans need to be aware that they are interacting with an AI system, and must be informed of the system’s capabilities and limitations. Related topics How the system is built and evaluated, training data, limitations. How the system is monitored, e.g. logs. How the system is controlled, e.g. how to interpret its outputs. Potential misuse. Pre-determined changes, expected lifetime and necessary maintenance/care measures. Communication to users, e.g. generative models. KR4. Transparency https://algorithmic-transparency.ec.europa.eu/index_en
  • 17. ● Transparency can serve to empower people to challenge AI systems. ● Listeners → fake news, impersonation. ● Artists → intellectual property, e.g. copyright in training data, infringement liability for the engineer. Gómez, E., Blaauw, M., Bonada, J., Chandna, P., & Cuesta, H. (2018). Deep learning for singing processing: Achievements, challenges and impact on singers and listeners. arXiv preprint arXiv:1807.03046. Sturm B., Iglesias M., Ben-Tal O., Miron M., Gómez E. Artificial Intelligence and Music: Open Questions of Copyright Law and Engineering Praxis. Arts. 2019; 8(3):115. https://doi.org/10.3390/arts8030115 KR4. Transparency: music generation
  • 18. Description Unfair bias must be avoided, as it could have multiple negative implications, from the marginalization of vulnerable groups, to the exacerbation of prejudice and discrimination. Fostering diversity, AI systems should be accessible to all, regardless of any disability, and involve relevant stakeholders throughout their entire life cycle. Related topics Demographics, gender, sex, age, ethnicity, culture, religion, sexual orientation, political orientation… Specific section later in the second part of the tutorial! KR5. Diversity, non-discrimination and fairness
  • 19. Description AI systems should benefit all human beings, including future generations. It must hence be ensured that they are sustainable and environmentally friendly. Moreover, they should take into account the environment, including other living beings, and their social and societal impact should be carefully considered. Related topics ● Impact on jobs and skills. ● Computing resources. Questions: - How will MIR affect music workspaces? e.g. instrument players, musicians, composers. Tolan, S., Pesole, A., Martínez-Plumed, F., Fernández-Macías, E., Hernández-Orallo, J., & Gómez, E. (2021). Measuring the occupational impact of AI: tasks, cognitive abilities and AI benchmarks. Journal of Artificial Intelligence Research, 71, 191-236. Emilia Gómez (2020). Human and Machine Intelligence: A Music Information Retrieval Perspective. Keynote speech, 11th International Conference on Computational Creativity. KR6. Societal and environmental well-being
  • 20. KR6. Societal well-being: jobs. (Figure: occupations such as creative and performing artists; architects, planners, surveyors, designers; and artistic, cultural and culinary associate professionals, mapped to cognitive abilities including creativity and resolution, accounting, business, communication, conceptualization, text comprehension, attention and search, and quantitative reasoning, together with an AI exposure score.) • Few benchmarking initiatives on creative systems. Songul Tolan, Annarosa Pesole, Fernando Martínez-Plumed, Enrique Fernandez-Macias, José Hernández-Orallo & Emilia Gómez, “Measuring the Occupational Impact of AI: Tasks, Cognitive Abilities and AI Benchmarks”, JRC Working Papers on Labour, Education and Technology 2020-02, Joint Research Centre (Seville site).
  • 21. KR6. Environmental well-being. Training costs: e.g. Jukebox, Open AI (2020). Evaluation costs: version identification on the accuracy-scalability plane (figure: traditional systems vs. DL systems). Embedding distillation techniques: less storage, faster retrieval and similar accuracy (Yesiler et al. 2022). F. Yesiler, J. Serrà, E. Gómez. Less is more: Faster and better music version identification with embedding distillation, ISMIR 2020. NIME Conference Environmental Statement https://eco.nime.org/
  • 22. Description Mechanisms should be put in place to ensure responsibility and accountability for AI systems and their outcomes. Auditability, which enables the assessment of algorithms, data and design processes, plays a key role therein, especially in critical applications. Moreover, adequate and accessible redress should be ensured. Related topics ● Reproducibility and open data, code. ● Algorithmic audits. Questions: - Which are the challenges to reproduce an existing paper, e.g. audit an MIR system? KR7. Accountability
  • 23. 7 Key Requirements for Trustworthy AI, to be continuously evaluated and addressed throughout the AI system’s life cycle. (Figure: KR1. Human agency and oversight; KR2. Technical robustness and safety; KR3. Privacy and data governance; KR4. Transparency; KR5. Diversity, non-discrimination and fairness; KR6. Societal and environmental well-being; KR7. Accountability; each color-coded as newly addressed, some research, or strong background in MIR.)
  • 24. From ethical guidelines to legal requirements. AI Act - Scope: AI systems (software products). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52021PC0206 **Under negotiation** Risk levels (*not mutually exclusive): Unacceptable risk → prohibited. High risk → permitted subject to compliance with AI requirements and ex-ante conformity assessment. ‘Transparency’ risk → permitted subject to information/transparency obligations. Minimal or no risk → permitted with no restrictions. Legal requirements (mapping to KR1-KR7): • Robustness, accuracy and cybersecurity. • Human oversight (measures built into the system and/or to be implemented by users). • Ensure appropriate degree of transparency and provide users with information on capabilities and limitations of the system and how to use it. • Technical documentation and logging capabilities (traceability and auditability). • High-quality and representative training, validation and testing data. • Risk management.
  • 25. ● Risk management. ● Transparency of recommender systems, online advertisement. ● External & independent auditing, internal compliance function and public accountability. ● Data sharing with authorities and researchers. ● Crisis response cooperation. Digital Services Act: https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/digital-services-act-ensuring-safe-and-accountable-online-environment_en Entering into force in Europe in 2023! Digital Services Act - Scope: digital services (e.g. search engines, online platforms) From ethical guidelines to legal requirements
  • 26. PRINCIPLES ● Proportionality and Do No Harm ● Safety and security ● Fairness and non-discrimination ● Sustainability ● Right to Privacy, and Data Protection ● Human oversight and determination ● Transparency and explainability ● Responsibility and accountability ● Awareness and literacy ● Multi-stakeholder and adaptive governance and collaboration https://unesdoc.unesco.org/ark:/48223/pf0000381137 Towards worldwide recommendations
  • 29. In 1878, a grave was excavated in Birka (southeastern Sweden), a Viking settlement inhabited from about 750 to 950. 29 High-status Viking warrior, male. Weapons found in the grave. (Image credit: Neil Price, Charlotte Hedenstierna-Jonson, Torun Zachrisson, Anna Kjellström; Copyright: Antiquity Publications Ltd.) Illustration of how the burial might have looked just before it was closed in Viking times. (Image credit: Drawing by Þórhallur Þráinsson; Copyright Antiquity Publications Ltd.)
  • 30. 2017, DNA test: the buried person is female (XX chromosomes). 30 The assumption had been that arms, “non-female appearing” clothes, etc., indicate a male warrior. A second method (beyond looking at clothes, arms, and setting), the DNA test, gave a better insight.
  • 31. This finding opened up new questions. 31 Were all Viking warriors women? Was she a warrior? What was the social role of women in Viking times?
  • 32. 32 There are blind spots when taking only one perspective, using a single method with one metric.
  • 33. 33 MIR evaluation: from system-centered to human-centered approaches
  • 34. MIR evaluation - first steps: system-centered 34 2004 ISMIR 2004 Audio Description Contest first international evaluation project in MIR 2005 first edition of the Music Information Retrieval Evaluation eXchange (MIREX), organized by IMIRSEL (International Music IR Systems Evaluation Laboratory) 2011 MusiClef campaign (later MediaEval series) 2012 Million Song Dataset Challenge (Urbano et al., 2013)
  • 35. MIR evaluation - mimicking users 35 ● follow Cranfield paradigm for Information Retrieval (IR) evaluation ● trying to mimic the potential requests of the final users: “removing actual users from the experiment but including a static user component: the ground truth” ● 3 major simplifying assumptions: ○ 1. Relevance can be approximated by topical similarity. ○ 2. A single set of judgments for a topic is representative of the user population. ○ 3. The list of relevant documents for each topic is complete (all relevant documents are known). (Urbano et al., 2013) (Cleverdon, 1991) (Voorhees, 2002) mimicking users ≈ system-centered
  • 36. MIR evaluation - considering the user: user-centered? 36 ● the neglected user: ○ too simplistic view of “the user” ○ we need to model the user comprehensively ● still: no user involved?! (Schedl et al., 2013)
  • 37. 37 Nothing about us without us.
  • 38. MIR evaluation - involving the user = (more) user-centered 38 ● If you target users, you have to involve users. ● Typical questions to address: ○ (How) do users accomplish their tasks? ○ (How) do users accomplish the sub-tasks (if any) in their tasks? ○ How many steps do the users spend in finishing their tasks? ○ How long do the users spend in finishing their tasks? (task completion time) ● Important aspects → variations and differences across users ○ Individual differences and subjectivity ○ Still: user groups will share similarities (Liu & Hu, 2010)
  • 39. MIR evaluation - from user-centered to human-centered evaluation 39 Users are not the only humans involved in and impacted by MIR systems!
  • 40. Considering multiple stakeholders 40 music creators performers music companies platform/service providers end consumers/listeners society ...
  • 41. E.g., Stakeholders for (music) recommender systems 41 What is my target group? Who else is affected/involved? Also consider sub-groups! (Jannach & Bauer, 2020, adapted)
  • 42. What is the provider’s goal, need or intent? Do these overlap with the users’ perspective? Do they contradict? Different stakeholders — task, intent, goal, need 42 What is the user’s task? What is the user’s intent? What does the user want? What does the user need? Are there multiple tasks, intents, demands, needs? Is this fair? Is this desirable? Who says that?
  • 43. Different stakeholders — different risks 43 (Jannach & Jugovac, 2019)
  • 44. 44 Striving toward trustworthy MIR, we need to assess our systems concerning the various perspectives.
  • 45. Comprehensive evaluation is not one-size-fits-all 45 ● A one-size-fits-all evaluation strategy does not exist. ● Each scenario requires a fitting evaluation strategy. ● Different stakeholders may have different perspectives, goals, demands, wishes, values, risks—potentially contradicting.
  • 46. 46 Comprehensive evaluation Goal: Getting an integrated big picture of a system’s performance
  • 47. Comprehensive evaluation = multi-* evaluation 47 multiple methods multiple metrics multiple datasets multiple perspectives …
  • 48. Benefits 48 ● Explore sophisticated issues more holistically and widely ● Capture diverse, but complementary phenomena ● Apply diverse methods to capture the same phenomenon from possibly different angles ● Neutralize biases inherent to evaluation approaches (Celik et al., 2018)
  • 49. We need to thoughtfully configure the evaluation design space. And we have to do this on several levels for a comprehensive evaluation. 49 (Zangerle & Bauer, 2022) Framework for EValuating Recommender systems (FEVR)
  • 50. Typical differentiation of experiment types 50 Different (sub-)communities → different terminology ● Computational or algorithmic approaches ● User studies (in the lab or online) ● Field studies/experiments (using a real-world system) (Zangerle & Bauer, 2020)
  • 51. The scenarios to evaluate are multifaceted 51 Potential overall goals: ● Convince/persuade user ● User satisfaction ● Replace user ● Assist user ● … Target group: ● Individual users vs. user groups ● Novices vs experts ● Adults vs children ● Niche music vs mainstream listeners ● … Situation of a user in session: ● Main task vs side task ● Self-initiated vs other-initiated ● Clear idea/goal vs exploration ● … Starting point for evaluation setting: ● Ground truth exists vs subjective perception (i.e., variation across users) ● Existing data vs data collection ○ Fit of data to goal ○ Representation of target group ● …
  • 52. 52 What exactly do we want to find out? What do we need to find out? Is it relevant? Does it matter in practice? What is good enough?
  • 53. Let us look at some variables of interest 53
  • 54. Does my system accurately identify a song? 54 ● accuracy-based metrics (e.g., precision, recall, F-score) ● What is the harm of a false positive? ● What is the harm of a false negative?
  • 55. For the person who uploaded a song: Does my system accurately identify a song? 55 ● What is the harm of a false positive? ● What is the harm of a false negative? TP marked as infringement, is infringement FP marked as infringement, is not infringement TN not marked as infringement, is not infringement FN not marked as infringement, is infringement Song identification to detect copyright infringement:
  • 56. For the copyright holder of a song: Does my system accurately identify a song? 56 ● What is the harm of a false positive? ● What is the harm of a false negative? TP marked as infringement, is infringement FP marked as infringement, is not infringement TN not marked as infringement, is not infringement FN not marked as infringement, is infringement Song identification to detect copyright infringement:
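The accuracy-based metrics named on these slides follow directly from the four confusion-matrix cells above. A minimal sketch (the counts are invented for illustration):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Invented counts for a copyright-infringement detector:
# FP hurts the uploader (legitimate upload blocked);
# FN hurts the copyright holder (infringement goes undetected).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f}, recall={r:.2f}, F1={f1:.2f}")
```

Which metric matters more depends on the stakeholder: precision tracks the uploader's harm, recall the rights holder's.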
  • 57. Is a MIR system fair? 57 Is my system fair for artists? ● Are artists fairly represented in a playlist? ● …in a dataset? ● …fair regarding gender of artists? ● …fair regarding genre? ● …fair regarding country of origin? ● … Is my system fair for users? ● Are world-music fans as well served as heavy-metal fans? ● Are gender minority users served well? ● Are novices served well? ● … Do artists perceive it as fair? Do users perceive it as fair? Do artists/users perceive any differences?
  • 58. 58 We need to ask a lot of questions. We need to ask the right questions. We need to ask the right questions right.
  • 59. Part II Values in Practice: Diversity and Fairness 59
  • 60. Key Requirements for Trustworthy AI (AI-HLEG, 2018) 60 “In order to achieve Trustworthy AI, we must enable inclusion and diversity throughout the entire AI system’s life cycle. Besides the consideration and involvement of all affected stakeholders throughout the process, this also entails ensuring equal access through inclusive design processes as well as equal treatment. This requirement is closely linked with the principle of fairness.” (Section 1.5)
  • 61. Recommendation on the Ethics of AI (UNESCO, 2020) 61 “AI actors should promote social justice and safeguard fairness and non-discrimination of any kind [...]. This implies an inclusive approach to ensuring that the benefits of AI technologies are available and accessible to all, taking into consideration the specific needs of different age groups, cultural systems, different language groups, persons with disabilities, girls and women, and disadvantaged, marginalized and vulnerable people or people in vulnerable situations.” (Section III.2, Article 28)
  • 62. (IS)MIR Initiatives 62 ❖ Women in Music Information Retrieval (WiMIR), 2011-present https://wimir.wordpress.com/ ❖ Ethics guidelines for MIR, WiMIR workshop, Nov. 3 2019 (link) ❖ ISMIR Tutorials: ➢ Why is Greek Music Interesting? Towards an Ethics of MIR, Section 4, 2014 (link) ➢ Computational Approaches for Analysis of Non-Western Music Traditions, 2018 (link) ➢ Fairness, Accountability and Transparency in Music Information Research, 2019 (link) ❖ ISMIR Special Call for Papers: Cultural and Social Diversity in MIR (2021, 2022) ❖ TISMIR Call for Papers: Special Collection on Cultural Diversity in MIR (2022) ❖ NIME Principles & Code of Practice on Ethical Research https://www.nime.org/ethics/
  • 63. Part II Values in Practice: Diversity 63
  • 64. Organizing Team ISMIR 2022 (w/o volunteers) 64 https://ismir2022.ismir.net/team
  • 65. Organizing Team ISMIR 2022 (w/o volunteers) 65 Is it a diverse team? https://ismir2022.ismir.net/team
  • 66. Organizing Team ISMIR 2022 (w/o volunteers) 66 Is it a diverse team? Yes, but… https://ismir2022.ismir.net/team
  • 67. Organizing Team ISMIR 2022 (w/o volunteers) 67 Is it a diverse team? Yes, but… No, but… https://ismir2022.ismir.net/team
  • 68. 68
  • 70. Contexts & Concepts of Diversity (Steel et al., 2018) 70 An understanding of what constitutes diversity, abstracted from questions of which attributes are relevant to diversity in a specific context. Background situation that suggests attributes relevant for assessing diversity. Context Concept
  • 72. Contexts & Concepts of Diversity (Steel et al., 2018) 72 An understanding of what constitutes diversity, abstracted from questions of which attributes are relevant to diversity in a specific context. Background situation that suggests attributes relevant for assessing diversity. Context Concept Domain Agnostic Domain Specific
  • 74. Concepts of Diversity - Egalitarian (Steel et al., 2018) 74 Let L be the line-up of an electronic music festival.
  • 75. Concepts of Diversity - Egalitarian (Steel et al., 2018) 75 Let L be the line-up of an electronic music festival. If the artists in L are distributed uniformly over several styles of electronic music S = {s1, ..., sn}, the festival is diverse from an egalitarian perspective.
  • 76. Concepts of Diversity - Normic (Steel et al., 2018) 76 Let L be the line-up of an electronic music festival, S = {s1, ..., sn} be styles of electronic music, and let the non-diverse norm Snd be the Electronic Dance Music (EDM) style.
  • 77. Concepts of Diversity - Normic (Steel et al., 2018) 77 Let L be the line-up of an electronic music festival, S = {s1, ..., sn} be styles of electronic music, and let the non-diverse norm Snd be the Electronic Dance Music (EDM) style. Then L can be considered diverse in a normic sense if the majority of performing artists do not play EDM, i.e. their style ∉ Snd.
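The two concepts can be contrasted on a toy line-up, as in the sketch below. The uniformity tolerance, the example styles, and the function names are assumptions made for illustration; they are not part of Steel et al.'s definitions.

```python
from collections import Counter

def egalitarian_diverse(lineup_styles, tolerance=0.10):
    """Egalitarian reading: styles roughly uniformly represented."""
    counts = Counter(lineup_styles)
    shares = [c / len(lineup_styles) for c in counts.values()]
    return max(shares) - min(shares) <= tolerance

def normic_diverse(lineup_styles, non_diverse_norm="EDM"):
    """Normic reading: the majority of the line-up lies outside S_nd."""
    outside = sum(1 for s in lineup_styles if s != non_diverse_norm)
    return outside > len(lineup_styles) / 2

lineup = ["EDM", "EDM", "Techno", "House", "Downtempo", "Hardcore"]
print(egalitarian_diverse(lineup))  # False: EDM is over-represented
print(normic_diverse(lineup))       # True: most artists do not play EDM
```

The same line-up can thus count as diverse under one concept and not under the other, which is exactly the gap the later perception study exposes.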
  • 78. 78 “When a measure becomes a target, it ceases to be a good measure.” Strathern (1997)
  • 79. 79 Measures of Diversity Index-based Given a set 𝛀 of N elements, analyze their distribution into categories and return a single value: - Compact representation - Monodimensional
  • 80. 80 Measures of Diversity Index-based Given a set 𝛀 of N elements, analyze richness and evenness and return a single value: - Compact representation - Monodimensional Distance-based Given a set 𝛀 of N elements, compute the pairwise distances between elements & aggregate: - Element-wise information - Multidimensional
  • 81. 81 Measures of Diversity - Index-based Given a set 𝛀 of N elements, Richness: number of categories present in the set 𝛀 Evenness: distribution of elements across the categories of 𝛀
  • 82. 82 Measures of Diversity - Index-based Given a set 𝛀 of N elements, Richness: number of categories present in the set 𝛀 Evenness: distribution of elements across the categories of 𝛀 (Figure: four tracks placed in the quadrants Q1-Q4 of the valence-arousal plane.)
  • 83. 83 Measures of Diversity - Index-based Richness: number of categories present in the set 𝛀 = 4 = N Evenness: distribution of elements across the categories of 𝛀 = (1,1,1,1) = (n1, n2, n3, n4)
  • 84. 84 Measures of Diversity - Index-based Simpson’s Diversity Index (Simpson, 1949): 0 = no diversity, 1 = high diversity.
  • 85. 85 Measures of Diversity - Index-based How diverse is the set of tracks (in terms of the emotions recognized) according to the Simpson index?
  • 86. 86 Measures of Diversity - Index-based D(𝛀) = 1. N.B. quite egalitarian
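The slides state D(𝛀) = 1 without reproducing the formula. A common form of Simpson's index that matches that number for the counts (1, 1, 1, 1) is D = 1 - Σ nᵢ(nᵢ - 1) / (N(N - 1)); the sketch below assumes this variant, which may differ from the exact one used in the deck.

```python
def simpson_diversity(counts):
    """D = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)), with N = sum(n_i).

    0 = no diversity (all elements in one category),
    1 = maximal diversity (every element in its own category).
    """
    n = sum(counts)
    if n < 2:
        return 0.0
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))

# Four tracks spread over the four valence-arousal quadrants, as on the slide:
print(simpson_diversity([1, 1, 1, 1]))  # 1.0
# All four tracks in a single quadrant:
print(simpson_diversity([4, 0, 0, 0]))  # 0.0
```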
  • 87. 87 Measures of Diversity - Index-based Richness: number of categories present in the set 𝛀 = 4 = N Evenness: distribution of elements across the categories of 𝛀 = (1,1,1,1) = (n1, n2, n3, n4) (Figure: two different sets of four tracks over the valence-arousal quadrants Q1-Q4.)
  • 88. 88 Measures of Diversity - Index-based D(𝛀) = 1 ??? (The index assigns the same value to both sets, even though they spread very differently within the quadrants.)
  • 89. 89 Measures of Diversity - Distance-based Given a set 𝛀 of N elements, compute the pairwise distances between elements (Euclidean, Jaccard, cosine, etc.), and aggregate.
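A minimal sketch of this compute-and-aggregate recipe, using Euclidean distance and the mean as the aggregate (both choices, and the toy 2-D "tag embeddings", are assumptions for illustration):

```python
import numpy as np

def mean_pairwise_distance(embeddings: np.ndarray) -> float:
    """Aggregate (here: average) Euclidean distance over all unordered pairs."""
    n = len(embeddings)
    dists = [
        np.linalg.norm(embeddings[i] - embeddings[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.mean(dists))

# Two toy "playlists" in a 2-D tag-embedding space: tight cluster vs. spread-out set.
tight = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
spread = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(mean_pairwise_distance(tight))   # small -> less diverse
print(mean_pairwise_distance(spread))  # larger -> more diverse
```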
  • 90. 90 Measures of Diversity - Distance-based (Porcaro and Gómez, 2019) 2-dimensional projection of tag-embedding space
  • 91. 91 Measures of Diversity - Distance-based (Porcaro and Gómez, 2019) 2-dimensional projection of tag-embedding space Computed pairwise distance within clusters
  • 92. 92 Measures of Diversity - Distance-based (Porcaro and Gómez, 2019) 2-dimensional projection of tag-embedding space Computed pairwise distance within clusters The one with maximum average distance is the most diverse!
  • 94. 94 Measures of Diversity - Distance-based (Porcaro and Gómez, 2019)
  • 95. 95 Measures of Diversity - Distance-based (Porcaro and Gómez, 2019) Jazz vocal - Swing ??? Ballad - Love
  • 96. 96 Do differences in system-measures directly translate to the same differences in user-measures? How do those differences affect end users? Construct Validity (Urbano et al., 2013)
  • 97. 97 Perceptions of Diversity (Porcaro et al., 2022) Pretty Lights (US) Downtempo Technoboy (IT) Hardstyle Bonobo (UK) Downbeat Laurent Garnier (FR) Techno The Blessed Madonna (US) House Helena Hauff (DE) Techno Miss DJAX (NL) Hardcore Beatrice Dillon (UK) Experimental
  • 98. 98 Perceptions of Diversity (Porcaro et al., 2022) Pretty Lights (US) Downtempo Technoboy (IT) Hardstyle Bonobo (UK) Downbeat Laurent Garnier (FR) Techno The Blessed Madonna (US) House Helena Hauff (DE) Techno Miss DJAX (NL) Hardcore Beatrice Dillon (UK) Experimental Which one is the most diverse playlist?
  • 99. 99 Perceptions of Diversity (Porcaro et al., 2022) Pretty Lights (US) Downtempo Technoboy (IT) Hardstyle Bonobo (UK) Downbeat Laurent Garnier (FR) Techno The Blessed Madonna (US) House Helena Hauff (DE) Techno Miss DJAX (NL) Hardcore Beatrice Dillon (UK) Experimental Both playlists have: - Same number of styles - Only Western-based DJs - Only white skin DJs - Only one sex (biological aspect)
  • 100. 100 Perceptions of Diversity (Porcaro et al., 2022) Pretty Lights (US) Downtempo Technoboy (IT) Hardstyle Bonobo (UK) Downbeat Laurent Garnier (FR) Techno The Blessed Madonna (US) House Helena Hauff (DE) Techno Miss DJAX (NL) Hardcore Beatrice Dillon (UK) Experimental According to computational diversity metrics, NO statistical difference.
  • 101. 101 Perceptions of Diversity (Porcaro et al., 2022) Pretty Lights (US) Downtempo Technoboy (IT) Hardstyle Bonobo (UK) Downbeat Laurent Garnier (FR) Techno The Blessed Madonna (US) House Helena Hauff (DE) Techno Miss DJAX (NL) Hardcore Beatrice Dillon (UK) Experimental but…
  • 102. 102 Perceptions of Diversity (Porcaro et al., 2022) Pretty Lights (US) Downtempo Technoboy (IT) Hardstyle Bonobo (UK) Downbeat Laurent Garnier (FR) Techno The Blessed Madonna (US) House Helena Hauff (DE) Techno Miss DJAX (NL) Hardcore Beatrice Dillon (UK) Experimental Asking 110 participants, 72% of them agree that List Blue is more diverse than List Green
  • 103. 103 Perceptions of Diversity (Porcaro et al., 2022) Pretty Lights (US) Downtempo Technoboy (IT) Hardstyle Bonobo (UK) Downbeat Laurent Garnier (FR) Techno The Blessed Madonna (US) House Helena Hauff (DE) Techno Miss DJAX (NL) Hardcore Beatrice Dillon (UK) Experimental Most of the participants employed a normic concept of diversity…
  • 104. 104 Perceptions of Diversity (Porcaro et al., 2022) Pretty Lights (US) Downtempo Technoboy (IT) Hardstyle Bonobo (UK) Downbeat Laurent Garnier (FR) Techno The Blessed Madonna (US) House Helena Hauff (DE) Techno Miss DJAX (NL) Hardcore Beatrice Dillon (UK) Experimental Most of the participants employed a normic concept of diversity… … while our model works under an egalitarian concept.
  • 106. 106 MIR and Subjectivity MIR systems involve a great deal of subjectivity: - Track A and Track B are very similar - Track A is happier than Track B - Playlist A is less diverse than Playlist B - This track is 80% EDM and 20% Tech House
  • 107. 107 MIR and Subjectivity MIR systems involve a great deal of subjectivity: - Track A and Track B are very similar - Track A is happier than Track B - Playlist A is less diverse than Playlist B - This track is 80% EDM and 20% Tech House Human annotations are fundamental resources, but here comes the problem…
  • 108. 108 Agreement and Perceptions - Listeners trained in classical music tend to disagree more on perceived emotions than those not trained [Schedl et al., 2018] - Preference, familiarity, and lyrics comprehension increase agreement for emotions corresponding to quadrants Q1 and Q3. [Gómez-Cañón et al., 2020] - The current emotional state of users influences their perception of music similarity [Flexer et al., 2021]
  • 109. 109 Diversity is a means to an end, not an end in itself.
  • 110. Higher education → prepare students for life (allow them to interact with students different socio-cultural backgrounds). Media studies → reaching democratic goals (e.g., informed citizenry, inclusive public discourse). Economics → foster innovation; hedging against ignorance; mitigating lock-in; accommodating plural perspectives. 110 Impacts of Diversity
  • 113. In the MIR field, which are the ends where fostering diversity may make a difference, and that we as a community should care about? 113
  • 114. Part II Values in Practice: Fairness 114
  • 115. ● General notions of Fairness and Bias ● Examples of fairness studies in MIR and recommendation: ○ What is fair in music recommendation? ○ Gender imbalance in music RecSys ○ Fairness for users ○ Music record labels ○ Cover song identification Overview
  • 116. ● Fairness: Identification and prevention of situations where individuals or a group are systematically disfavored, e.g., based on their characteristics ● Fairness is an interdisciplinary topic ● It is based on normative ideas of what it means to treat people “fairly” Introduction - Fairness
  • 117. Introduction - Bias Bias: an objective deviation/imbalance in a model, measure or data compared to an intended target. May refer to: ● societal biases ● statistical biases ...
  • 118. 118 Bias: we refer to a fact of the system without making any inherently normative judgment Fairness: we use it to discuss the normative aspects of the system and its effects
  • 120. ● No single general answer to fairness, ethics, etc. ● Fairness will hardly be defined only with clear, universal objectives, but: ● We can identify/measure/mitigate specific harms ● Then, look for general strategies or principles What is fair? Fair for who?
  • 121. The ML community has begun to address claims of algorithmic bias under the rubric of fairness, accountability, and transparency. We should clearly define: ● Assumptions ● Goals or principles that we build on ● Methods used to evaluate the ability to meet those goals ● How to assess the relevance and appropriateness of those goals to the social problem in question What is fair? Fair for who? [*] Redistribution and Rekognition: A Feminist Critique of Algorithmic Fairness [West, 2020]
  • 123. ● Distributional harm: Harmful distribution of resources or outcomes ● Representational harm: Harmful internal or external representation General Concepts - I (Figure: item popularity vs. how often items are recommended.) Fairness in Information Access Systems [Ekstrand et al., 2022]
  • 124. ● Distributional harm: Harmful distribution of resources or outcomes ● Representational harm: Harmful internal or external representation General Concepts - I (Examples: Jamaica / reggae, Mississippi / blues.)
  • 125. ● Individual fairness: Similar individuals have similar experience/treatment ● Group fairness: Different groups have similar experience/treatment ○ Sensitive attribute: Attribute identifying group membership ○ Disparate treatment: Groups intentionally treated differently ○ Disparate impact: Groups receive outcomes at different rates ○ Disparate mistreatment: Groups receive erroneous effects at different rates General Concepts - II
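Disparate impact, as defined above, can be detected by comparing favorable-outcome rates across groups. A toy sketch (the group labels, catalog, and the "was recommended" outcome are invented for illustration):

```python
from collections import defaultdict

def outcome_rates(items, group_of, favored):
    """Share of favorable outcomes (e.g. being recommended) per group."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for item in items:
        g = group_of(item)
        totals[g] += 1
        wins[g] += favored(item)
    return {g: wins[g] / totals[g] for g in totals}

# Toy catalog of (artist_group, was_recommended) pairs; labels are invented.
catalog = [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]
rates = outcome_rates(catalog, group_of=lambda x: x[0], favored=lambda x: x[1])
print(rates)  # {'A': 0.666..., 'B': 0.333...}: outcomes at different rates
```

A large gap between the groups' rates is a disparate-impact signal; whether it constitutes unfairness is the normative question the slides distinguish from bias.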
  • 130. ● We want to make sure that different applications are free of undesired biases and prevent any kind of discrimination ● Usually RS are user-centered: maximizing user satisfaction is the main goal ● Consider systems’ impact on multiple groups of individuals ● How can platforms be fair to the artists? Provider vs Consumer Fairness
  • 134. ● The leading questions are: ○ How do artists feel affected by current systems? ○ What do artists want to see improved in the future? ○ What concrete ideas or even algorithmic decisions could be adopted in future systems? What is fair? Exploring the artists' perspective on the fairness of music streaming platforms [Ferraro et al., 2021] What is fair in Music recommendation?
  • 135. 135 Methodology - Semi-structured interviews and Qualitative Content Analysis. (Flow: design of semi-structured interviews → documents preparation → participant selection based on diversity → conducting interviews → interview transcription → coding → analysis of results.) Qualitative content analysis [Mayring, 2004]
  • 137. - What do artists think about the platforms' opportunities and power to influence the users' listening behavior? - We see a strong tendency against influencing users Results - Interviews
  • 138. 138 "I don’t see why we should tell the users which genres they should listen to"
  • 139. - All artists agreed that the platforms should promote content by female artists to reach a gender balance in what users consume Results - Interviews
  • 140. 140 "I think there should be actions to correct some biases. The question is in which cases it should be corrected and in which not. In heavy metal music, I imagine that there aren’t many female singers. Maybe we could give them more visibility, otherwise they would never be seen"
  • 141. 141 "[...] the population of the world is 50% women. So it would be ridiculous if the system wouldn’t recommend them.”
  • 142. Based on the interviews, we focus on the topic of gender balance ● Qualitative approach: ○ Artists express that platforms should promote gender balance ● Quantitative approach: ○ Using 2 datasets of user listening data, evaluate how users’ listening histories compare to the (CF) recommendations ○ What is the effect on user consumption in the long term? 142 Break the Loop: Gender Imbalance in Music Recommenders [Ferraro, et al., 2021] Gender Imbalance in Music Recsys
  • 143. - Identify: we find that these artists expect a balanced representation of artists according to their gender - Measure: Representation, Exposure and Utility - Mitigate: simulation approach to compare longitudinal effects considering feedback loops. Gender Imbalance in Music Recsys
  • 144. - 3 sources of unfairness: - Representation: similar proportion to users’ history - Exposure: users have to put more effort to reach the first female artist in the recommendations - Utility: the model makes more “mistakes” for female artists than for male artists Similar results in: Exploring Artist Gender Bias in Music Recommendation [Shakespeare et al., 2020] Gender Imbalance in Music Recsys - Results
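Two of these measurements are easy to make concrete: the sketch below computes the share of female artists in a recommendation list (representation) and the rank of the first female artist (a proxy for the exposure effort). The list and gender labels are invented for illustration and do not come from the paper's data.

```python
def representation(recs, is_group_member):
    """Share of recommended artists belonging to the group of interest."""
    return sum(map(is_group_member, recs)) / len(recs)

def first_group_position(recs, is_group_member):
    """1-based rank of the first group member (effort to reach it), or None."""
    for rank, artist in enumerate(recs, start=1):
        if is_group_member(artist):
            return rank
    return None

# Toy recommendation list of (artist, gender_label) pairs; data is invented.
recs = [("artist_1", "M"), ("artist_2", "M"), ("artist_3", "F"), ("artist_4", "M")]
def is_female(artist):
    return artist[1] == "F"

print(representation(recs, is_female))        # 0.25
print(first_group_position(recs, is_female))  # 3
```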
  • 146. Simulate user interactions. Step 1: the trained model generates recommendations for each user (User X).
  • 147. Simulate user interactions. Step 2: simulate user interactions with the recommendations.
  • 148. Simulate user interactions. Step 3: retrain the model.
  • 149. Simulate user interactions. Step 4: … repeat the process and evaluate the recommendations.
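The four steps above form a feedback loop that can be sketched in a few lines. This is not the paper's actual simulation: the scoring model, click probability, and boost value below are placeholder assumptions that only illustrate the recommend-interact-retrain cycle.

```python
import random

def simulate_feedback_loop(scores, rounds=3, top_k=2, boost=1.0, p_click=0.7):
    """Recommend -> simulate clicks -> 'retrain' -> repeat, as on the slides."""
    random.seed(0)  # reproducible toy run
    for r in range(rounds):
        # Step 1: generate recommendations (top-k items by current score).
        recs = sorted(scores, key=scores.get, reverse=True)[:top_k]
        # Step 2: simulate user interactions with the recommendations.
        clicks = [item for item in recs if random.random() < p_click]
        # Step 3: "retrain" the model: clicked items gain score.
        for item in clicks:
            scores[item] += boost
        # Step 4: evaluate before repeating; here we just inspect the state.
        print(f"round {r}: recs={recs}, clicks={clicks}")
    return scores

print(simulate_feedback_loop({"a": 3.0, "b": 2.5, "c": 2.4, "d": 1.0}))
```

Even this toy version shows the rich-get-richer dynamic: items recommended early accumulate score and keep being recommended, which is why longitudinal simulation matters for fairness analyses.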
  • 152. - Exploring Longitudinal Effects of Session-based Recommendations. In Proceedings of the 14th ACM Conference on Recommender Systems [Ferraro, Jannach & Serra, 2020] - Consumption and performance: understanding longitudinal dynamics of recommender systems via an agent-based simulation framework [Zhang et al., 2019] - Longitudinal impact of preference biases on recommender systems’ performance [Zhou, Zhang, & Adomavicius, 2021] - The Dual Echo Chamber: Modeling Social Media Polarization for Interventional Recommending [Donkers & Ziegler, 2021] 152 References - Simulations
  • 153. - Exploring User Opinions of Fairness in Recommender Systems [Smith et al., 2020] - Fairness and Transparency in Recommendation: The Users’ Perspective [Sonboli et al., 2021] - The multisided complexity of fairness in recommender systems [Sonboli et al., 2022] References - Fairness for Users
  • 154. - The Unfairness of Popularity Bias in Music Recommendation: A Reproducibility Study [Kowald, Schedl & Lex, 2020] - Support the underground: characteristics of beyond-mainstream music listeners [Kowald et al., 2021] - Analyzing Item Popularity Bias of Music Recommender Systems: Are Different Genders Equally Affected? [Lesota et al., 2021] References - Unfairness for users
  • 155. Bias and Feedback Loops in Music Recommendation: Studies on Record Label Impact [Knees, Ferraro & Hübler, 2022] Music Record Labels
  • 158. Assessing Algorithmic Biases for Musical Version Identification [Yesiler et al., 2022] - Multiple stakeholders: composers and other artists - Compares 5 systems - Uses different attributes of the songs/artists They identify algorithmic biases in cover song identification systems that lead to performance differences (dis)favoring different groups of artists, e.g.: - tracks released after 2000 work better - a larger gap between release years leads to worse performance - learning-based systems perform better for underrepresented artists Cover Song Identification