TRUSTWORTHY MIR - ISMIR 2022 Tutorial.pdf

The Music Information Retrieval community shows an increasing interest in understanding how current technologies affect the everyday experience of people all over the world, e.g., how we listen to music, compose songs, or learn to play an instrument. As introduced in the FAT-MIR tutorial held at ISMIR 2019, a lively discussion has arisen around the ethical, social, economic, legal, and cultural implications that the use of MIR systems has in our lives.

In this tutorial, we aim to build upon and expand the aforementioned debate, discussing more recent results obtained by the MIR community and beyond. The goal of the tutorial is to show how values, such as fairness and diversity, can be embedded in the life cycle of MIR systems to make them trustworthy: from algorithmic design to evaluation practices and regulatory proposals. To achieve that, we will discuss examples of, among others, popularity bias, gender bias, algorithmic bias, the underrepresentation of music styles, and diversity-related phenomena (e.g., filter bubbles).

This tutorial is suitable for researchers and students working in any MIR domain, as these issues are relevant for all MIR tasks. The examples will mostly focus on music information retrieval and recommendation, but there are no prerequisites for taking this tutorial. Besides presenting recent research insights, the tutorial will include two hands-on sessions, in which we will involve participants in reflecting on the design of evaluation methods that take into account the values for which MIR systems should be accountable.


  1. 1. Trustworthy MIR ISMIR 2022 Tutorial Christine Bauer, Andrés Ferraro, Emilia Gómez, Lorenzo Porcaro 1
  2. 2. 2 Christine Bauer Utrecht University, The Netherlands Andrés Ferraro McGill University & Mila, Canada* Emilia Gómez, Music Technology Group (UPF) & Joint Research Center (EC), Spain Lorenzo Porcaro Music Technology Group (UPF) & Joint Research Center (EC), Italy * Currently at Pandora-SiriusXM
  3. 3. Outline 3 ➔ Part 1 - Trustworthy MIR ◆ Trustworthy AI and its application to Music IR (Emilia, 30’) < SHORT BREAK 10’> ◆ Evaluation (Christine, 30’) < LONG BREAK 20’> ➔ Part 2 - Values in practice ◆ Diversity (Lorenzo, 30’) < SHORT BREAK 10’> ◆ Fairness (Andrés, 30’) ➔ Wrap up (10’)
  4. 4. Part I Trustworthy AI and its application to Music IR 4
  5. 5. 5 Trustworthy /ˈtrʌstˌwɚði/ : able to be relied on to do or provide what is needed or right; deserving of trust; worthy of confidence.
  6. 6. System-oriented Extraction and inference of meaningful information from music. Socio-technical system Interaction with the social system, e.g. business processes, organizations, society (law, culture). Markus Schedl, Emilia Gómez and Julián Urbano (2014), "Music Information Retrieval: Recent Developments and Applications", Foundations and Trends in Information Retrieval: Vol. 8: No. 2-3, pp 127-261. http://dx.doi.org/10.1561/1500000042 Paradigm change in MIR 6
  7. 7. 1. Lawful - respecting all applicable laws and regulations. 2. Ethical - respecting ethical principles and values. 3. Robust - both from a technical perspective while taking into account its social environment. Avoid intentional/unintentional harm. 7 Ethics guidelines for Trustworthy AI (2019)
  8. 8. KR1. Human agency and oversight KR2. Technical robustness and safety KR3. Privacy and data governance KR4. Transparency KR5. Diversity, non- discrimination and fairness KR6. Societal and environmental well-being KR7. Accountability 7 Key Requirements for Trustworthy AI To be continuously evaluated and addressed throughout the AI system’s life cycle 8
  9. 9. Hupont, I., & Gomez, E. (2022). Documenting use cases in the affective computing domain using Unified Modeling Language. Affective Computing and Intelligent Interaction. https://arxiv.org/abs/2209.09666v1 9 From systems to use cases: example
  10. 10. Description AI systems should empower human beings, allowing them to make informed decisions and fostering their fundamental rights. At the same time, proper oversight mechanisms need to be ensured, which can be achieved through human-in-the-loop, human-on-the-loop, and human-in-command approaches. Related topics ● Relevant user interfaces and HCIs. ● Ways to override or reverse the system output. ● Expertise needed to operate a system, how to evaluate its correct operation. Questions: - How may an affective music recommender empower a listener? - How can listeners take control of a music recommender system? KR1. Human agency and oversight
  11. 11. 11
  12. 12. Description AI systems need to be resilient and secure. They need to be safe, ensuring a fallback plan in case something goes wrong, as well as being accurate, reliable and reproducible. That is the only way to ensure that unintentional harm, too, can be minimized and prevented. Related topics ● Accuracy level, metrics. ● Robustness tests: unintended (e.g. noise) or intended errors (e.g. attacks). ● Consideration of edge cases. ● Reproducibility, open science. KR2. Technical robustness and safety MIR considerations ● Different metrics for different tasks and contexts. ● Robustness tests not widely applied. ● Reproducibility fostered in ISMIR research.
  13. 13. Description Besides ensuring full respect for privacy and data protection, adequate data governance mechanisms must also be ensured, taking into account the quality and integrity of the data, and ensuring legitimised access to data. Related topics ● Protection of personal data (e.g. minimization). ● Privacy-preserving recommendations. ● Define how to collect, label, process, audit and monitor the data that goes into developing the AI system. ● Datasets used for training and validation should be relevant, without errors and representative: ensure data is up to date, matches demographics, carry out sanity checks. KR3. Privacy and data governance
  14. 14. Challenges: • Data sources, crossing, reconciliation. • Time dimension. • User understandability and control. • Different legal approaches (e.g. EU - GDPR vs US). • Different levels of personal data: ○ Nominative data leading to the identification of natural persons. ○ Data leading to identification through “complex” processes. Pierre Saurel, Francis Rousseaux, & Marc Danger. (2014). On The Changing Regulations of Privacy and Personal Information in MIR. Proceedings of the 15th International Society for Music Information Retrieval Conference, 597–602. https://doi.org/10.5281/zenodo.1416638 KR3. Privacy and data governance
  15. 15. J. S. Gómez-Cañón et al., "Music Emotion Recognition: Toward new, robust standards in personalized and context-sensitive applications," in IEEE Signal Processing Magazine, vol. 38, no. 6, pp. 106-114, Nov. 2021, doi: 10.1109/MSP.2021.3106232. KR3. Privacy and personalization
  16. 16. Description The data, system and business models linked to AI should be transparent. Traceability mechanisms can help achieve this. Moreover, AI systems and their decisions should be explained in a manner adapted to the stakeholder concerned. Humans need to be aware that they are interacting with an AI system, and must be informed of the system’s capabilities and limitations. Related topics How the system is built and evaluated, training data, limitations. How the system is monitored, e.g. logs. How the system is controlled, e.g. how to interpret its outputs. Potential misuse. Pre-determined changes, expected lifetime and necessary maintenance/care measures. Communication to users, e.g. generative models. KR4. Transparency https://algorithmic-transparency.ec.europa.eu/index_en
  17. 17. ● Transparency can serve to empower people to challenge AI systems. ● Listeners → fake news, impersonation. ● Artists → intellectual property, e.g. training data has copyright, infringement on the engineer. Gómez, E., Blaauw, M., Bonada, J., Chandna, P., & Cuesta, H. (2018). Deep learning for singing processing: Achievements, challenges and impact on singers and listeners. arXiv preprint arXiv:1807.03046. Sturm B., Iglesias M, Ben-Tal O, Miron M, Gómez E. Artificial Intelligence and Music: Open Questions of Copyright Law and Engineering Praxis. Arts. 2019; 8(3):115. https://doi.org/10.3390/arts8030115 KR4. Transparency: music generation
  18. 18. Description Unfair bias must be avoided, as it could have multiple negative implications, from the marginalization of vulnerable groups, to the exacerbation of prejudice and discrimination. Fostering diversity, AI systems should be accessible to all, regardless of any disability, and involve relevant stakeholders throughout their entire life cycle. Related topics Demographics, gender, sex, age, ethnicity, culture, religion, sexual orientation, political orientation, culture…. Specific section later in the second part of the tutorial! KR5. Diversity, non-discrimination and fairness
  19. 19. Description AI systems should benefit all human beings, including future generations. It must hence be ensured that they are sustainable and environmentally friendly. Moreover, they should take into account the environment, including other living beings, and their social and societal impact should be carefully considered. Related topics ● Impact on jobs and skills. ● Computing resources. Questions: - How will MIR affect music workspaces? e.g. instrument players, musicians, composers. Tolan, S., Pesole, A., Martínez-Plumed, F., Fernández-Macías, E., Hernández-Orallo, J., & Gómez, E. (2021). Measuring the occupational impact of AI: tasks, cognitive abilities and AI benchmarks. Journal of Artificial Intelligence Research, 71, 191-236. Emilia Gómez (2020). Human and Machine Intelligence: A Music Information Retrieval Perspective. Keynote speech, 11th International Conference on Computational Creativity. KR6. Societal and environmental well-being
  20. 20. • Creative and performing artists • Architects, planners, surveyors, designers • Artistic, cultural and culinary associate professionals • Creativity and resolution • Accounting • Business • Communication • Conceptualization • Text comprehension • Attention and search • Quantitative reasoning • AI exposure score • Few benchmarking initiatives on creative systems Songul Tolan, Annarosa Pesole, Fernando Martínez-Plumed, Enrique Fernandez-Macias, José Hernández-Orallo & Emilia Gómez, “Measuring the Occupational Impact of AI: Tasks, Cognitive Abilities and AI Benchmarks”, JRC Working Papers on Labour, Education and Technology 2020-02, Joint Research Centre (Seville site). KR6. Societal well-being: jobs
  21. 21. Training Evaluation: version identification, accuracy-scalability plane. Embedding distillation techniques: less storage, faster retrieval and similar accuracy (Yesiler et al. 2022). NIME Conference Environmental Statement https://eco.nime.org/ Traditional systems DL systems Scalability Accuracy F. Yesiler, J. Serrà, E. Gómez. Less is more: Faster and better music version identification with embedding distillation, ISMIR 2020. Jukebox, Open AI (2020) Evaluation KR6. Environmental well-being
  22. 22. Description Mechanisms should be put in place to ensure responsibility and accountability for AI systems and their outcomes. Auditability, which enables the assessment of algorithms, data and design processes, plays a key role therein, especially in critical applications. Moreover, adequate and accessible redress should be ensured. Related topics ● Reproducibility and open data, code. ● Algorithmic audits. Questions: - Which are the challenges to reproduce an existing paper, e.g. audit an MIR system? KR7. Accountability
  23. 23. Newly addressed Some research Strong background KR1. Human agency and oversight KR2. Technical robustness and safety KR3. Privacy and data governance KR4. Transparency KR5. Diversity, non-discrimination and fairness KR6. Societal and environmental well-being KR7. Accountability 7 Key Requirements for Trustworthy AI To be continuously evaluated and addressed throughout the AI system’s life cycle
  24. 24. From ethical guidelines to legal requirements AI Act: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52021PC0206 **Under negotiation** Legal requirements: • Robustness, accuracy and cybersecurity. • Human oversight (measures built into the system and/or to be implemented by users). • Ensure appropriate degree of transparency and provide users with information on capabilities and limitations of the system and how to use it. • Technical documentation and logging capabilities (traceability and auditability). • High-quality and representative training, validation and testing data. • Risk management. AI Act - Scope: AI systems (software products) KR1. Human agency and oversight KR2. Technical robustness and safety KR3. Privacy and data governance KR4. Transparency KR5. Diversity, non-discrimination and fairness KR6. Societal and environmental well-being KR7. Accountability Unacceptable risk High risk ‘Transparency’ risk Minimal or no risk Prohibited Permitted subject to compliance with AI requirements and ex-ante conformity assessment Permitted subject to information/transparency obligations Permitted with no restrictions *Not mutually exclusive
  25. 25. ● Risk management. ● Transparency of recommender systems, online advertisement. ● External & independent auditing, internal compliance function and public accountability. ● Data sharing with authorities and researchers. ● Crisis response cooperation. Digital Services Act: https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/digital-services-act-ensuring-safe-and-accountable-online-environment_en Entering into force in Europe in 2023! Digital Services Act - Scope: digital services (e.g. search engines, online platforms) From ethical guidelines to legal requirements
  26. 26. PRINCIPLES ● Proportionality and Do No Harm ● Safety and security ● Fairness and non-discrimination ● Sustainability ● Right to Privacy, and Data Protection ● Human oversight and determination ● Transparency and explainability ● Responsibility and accountability ● Awareness and literacy ● Multi-stakeholder and adaptive governance and collaboration https://unesdoc.unesco.org/ark:/48223/pf0000381137 Towards worldwide recommendations
  27. 27. Part I Trustworthy MIR: Evaluation 27
  28. 28. 28 Excursus
  29. 29. In 1878, a grave was excavated at Birka (southeastern Sweden), a Viking settlement inhabited from about 750 to 950. 29 High-status Viking warrior, male. Weapons found in the grave. (Image credit: Neil Price, Charlotte Hedenstierna-Jonson, Torun Zachrisson, Anna Kjellström; Copyright: Antiquity Publications Ltd.) Illustration of how the burial might have looked just before it was closed in Viking times. (Image credit: Drawing by Þórhallur Þráinsson; Copyright: Antiquity Publications Ltd.)
  30. 30. 2017 DNA test: buried person is female 30 XX chromosomes. The second method, the DNA test (rather than looking at clothes, arms, and setting), gave better insight. The earlier assumption was that arms, “non-female appearing” clothes, etc., indicate a male warrior.
  31. 31. This finding opened up new questions 31 Were all Viking warriors women? Was she a warrior? What was the social role of women in Viking times?
  32. 32. 32 There are blind spots when taking only one perspective, using a single method with one metric.
  33. 33. 33 MIR evaluation: from system-centered to human-centered approaches
  34. 34. MIR evaluation - first steps: system-centered 34 2004 ISMIR 2004 Audio Description Contest first international evaluation project in MIR 2005 first edition of the Music Information Retrieval Evaluation eXchange (MIREX), organized by IMIRSEL (International Music IR Systems Evaluation Laboratory) 2011 MusiClef campaign (later MediaEval series) 2012 Million Song Dataset Challenge (Urbano et al., 2013)
  35. 35. MIR evaluation - mimicking users 35 ● follow Cranfield paradigm for Information Retrieval (IR) evaluation ● trying to mimic the potential requests of the final users: “removing actual users from the experiment but including a static user component: the ground truth” ● 3 major simplifying assumptions: ○ 1. Relevance can be approximated by topical similarity. ○ 2. A single set of judgments for a topic is representative of the user population. ○ 3. The list of relevant documents for each topic is complete (all relevant documents are known). (Urbano et al., 2013) (Cleverdon, 1991) (Voorhees, 2002) mimicking users ≈ system-centered
  36. 36. MIR evaluation - considering the user: user-centered? 36 ● the neglected user: ○ too simplistic view of “the user” ○ we need to model the user comprehensively ● still: no user involved?! (Schedl et al., 2013)
  37. 37. 37 Nothing about us without us.
  38. 38. MIR evaluation - involving the user = (more) user-centered 38 ● If you target users, you have to involve users. ● Typical questions to address: ○ (How) do users accomplish their tasks? ○ (How) do users accomplish the sub-tasks (if any) in their tasks? ○ How many steps do users take to finish their tasks? ○ How long do users spend finishing their tasks? (task completion time) ● Important aspects → variations and differences across users ○ Individual differences and subjectivity ○ Still: user groups will share similarities (Liu & Hu, 2010)
  39. 39. MIR evaluation - from user-centered to human-centered evaluation 39 Users are not the only humans involved in and impacted by MIR systems!
  40. 40. Considering multiple stakeholders 40 music creators performers music companies platform/service providers end consumers/listeners society ...
  41. 41. E.g., Stakeholders for (music) recommender systems 41 What is my target group? Who else is affected/involved? Also consider sub-groups! (Jannach & Bauer, 2020, adapted)
  42. 42. What is the provider’s goal, need or intent? Do these overlap with the users’ perspective? Do they contradict? Different stakeholders — task, intent, goal, need 42 What is the user’s task? What is the user’s intent? What does the user want? What does the user need? Are there multiple tasks, intents, demands, needs? Is this fair? Is this desirable? Who says that?
  43. 43. Different stakeholders — different risks 43 (Jannach & Jugovac, 2019)
  44. 44. 44 Striving toward trustworthy MIR, we need to assess our systems concerning the various perspectives.
  45. 45. Comprehensive evaluation is not one-size-fits-all 45 ● A one-size-fits-all evaluation strategy does not exist. ● Each scenario requires its fitting evaluation strategy. ● Different stakeholders may have different perspectives, goals, demands, wishes, values, risks—potentially contradicting.
  46. 46. 46 Comprehensive evaluation Goal: Getting an integrated big picture of a system’s performance
  47. 47. Comprehensive evaluation = multi-* evaluation 47 multiple methods multiple metrics multiple datasets multiple perspectives …
  48. 48. Benefits 48 ● Explore sophisticated issues more holistically and widely ● Capture diverse, but complementary phenomena ● Apply diverse methods to capture the same phenomenon from possibly different angles ● Neutralize biases inherent to evaluation approaches (Celik et al., 2018)
  49. 49. We need to thoughtfully configure the evaluation design space. And we have to do this on several levels for a comprehensive evaluation. 49 (Zangerle & Bauer, 2022) Framework for EValuating Recommender systems (FEVR)
  50. 50. Typical differentiation of experiment types 50 Different (sub-)communities →different terminology ● Computational or algorithmic approaches ● User studies (in the lab or online) ● Field studies/experiment (using a real-world system) (Zangerle & Bauer, 2020)
  51. 51. The scenarios to evaluate are multifaceted 51 Potential overall goals: ● Convince/persuade user ● User satisfaction ● Replace user ● Assist user ● … Target group: ● Individual users vs. user groups ● Novices vs experts ● Adults vs children ● Niche music vs mainstream listeners ● … Situation of a user in session: ● Main task vs side task ● Self-initiated vs other-initiated ● Clear idea/goal vs exploration ● … Starting point for evaluation setting: ● Ground truth exists vs subjective perception (i.e., variation across users) ● Existing data vs data collection ○ Fit of data to goal ○ Representation of target group ● …
  52. 52. 52 What exactly do we want to find out? What do we need to find out? Is it relevant? Does it matter in practice? What is good enough?
  53. 53. Let us look at some variables of interest 53
  54. 54. Does my system accurately identify a song? 54 ● accuracy-based metrics (e.g., precision, recall, F-score) ● What is the harm of a false positive? ● What is the harm of a false negative?
  55. 55. For person who uploaded a song: Does my system accurately identify a song? 55 ● What is the harm of a false positive? ● What is the harm of a false negative? TP marked as infringement, is infringement FP marked as infringement, is not infringement TN not marked as infringement, is not infringement FN not marked as infringement, is infringement Song identification to detect copyright infringement:
  56. 56. For copyright holder of song: Does my system accurately identify a song? 56 ● What is the harm of a false positive? ● What is the harm of a false negative? TP marked as infringement, is infringement FP marked as infringement, is not infringement TN not marked as infringement, is not infringement FN not marked as infringement, is infringement Song identification to detect copyright infringement:
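The confusion-matrix framing on these slides maps directly onto the standard accuracy-based metrics. A minimal sketch, with hypothetical counts for the copyright-detection example (the function name and numbers are ours, not from the tutorial):

```python
def precision_recall_f1(tp, fp, fn):
    """Accuracy-based metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # hurt by FPs: wrongly flagged uploads
    recall = tp / (tp + fn) if tp + fn else 0.0     # hurt by FNs: missed infringements
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical run: 90 infringements caught, 10 legitimate uploads
# wrongly flagged (harms uploaders), 5 infringements missed (harms
# copyright holders).
p, r, f = precision_recall_f1(tp=90, fp=10, fn=5)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.9 0.947 0.923
```

Which error costs more depends on the stakeholder: the uploader cares about false positives, the copyright holder about false negatives, which is exactly why a single aggregate score hides the stakeholder perspective.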
  57. 57. Is a MIR system fair? 57 Is my system fair for artists? ● Are artists fairly represented in a playlist? ● …in a dataset? ● …fair regarding gender of artists? ● …fair regarding genre? ● …fair regarding country of origin? ● … Is my system fair for users? ● Are world-music fans as well served as heavy-metal fans? ● Are gender minority users served well? ● Are novices served well? ● … Do artists perceive it as fair? Do users perceive it as fair? Do artists/users perceive any differences?
  58. 58. 58 We need to ask a lot of questions. We need to ask the right questions. We need to ask the right questions right.
  59. 59. Part II Values in Practice: Diversity and Fairness 59
  60. 60. Key Requirements for Trustworthy AI (AI-HLEG, 2018) 60 “In order to achieve Trustworthy AI, we must enable inclusion and diversity throughout the entire AI system’s life cycle. Besides the consideration and involvement of all affected stakeholders throughout the process, this also entails ensuring equal access through inclusive design processes as well as equal treatment. This requirement is closely linked with the principle of fairness.” (Section 1.5)
  61. 61. Recommendation on the Ethics of AI (UNESCO, 2020) 61 “AI actors should promote social justice and safeguard fairness and non-discrimination of any kind [...]. This implies an inclusive approach to ensuring that the benefits of AI technologies are available and accessible to all, taking into consideration the specific needs of different age groups, cultural systems, different language groups, persons with disabilities, girls and women, and disadvantaged, marginalized and vulnerable people or people in vulnerable situations.” (Section III.2, Article 28)
  62. 62. (IS)MIR Initiatives 62 ❖ Women in Music Information Retrieval (WiMIR), 2011-present https://wimir.wordpress.com/ ❖ Ethics guidelines for MIR, WiMIR workshop, Nov. 3 2019 (link) ❖ ISMIR Tutorials: ➢ Why is Greek Music Interesting? Towards an Ethics of MIR, Section 4, 2014 (link) ➢ Computational Approaches for Analysis of Non-Western Music Traditions, 2018 (link) ➢ Fairness, Accountability and Transparency in Music Information Research, 2019 (link) ❖ ISMIR Special Call for Papers: Cultural and Social Diversity in MIR (2021, 2022) ❖ TISMIR Call for Papers: Special Collection on Cultural Diversity in MIR (2022) ❖ NIME Principles & Code of Practice on Ethical Research https://www.nime.org/ethics/
  63. 63. Part II Values in Practice: Diversity 63
  64. 64. Organizing Team ISMIR 2022 (w/o volunteers) 64 https://ismir2022.ismir.net/team
  65. 65. Organizing Team ISMIR 2022 (w/o volunteers) 65 Is it a diverse team? https://ismir2022.ismir.net/team
  66. 66. Organizing Team ISMIR 2022 (w/o volunteers) 66 Is it a diverse team? Yes, but… https://ismir2022.ismir.net/team
  67. 67. Organizing Team ISMIR 2022 (w/o volunteers) 67 Is it a diverse team? Yes, but… No, but… https://ismir2022.ismir.net/team
  68. 68. 68
  69. 69. 69 The Cycle of Diversity: 1. Context & Concept, 2. Measure, 3. Perception, 4. Impact
  70. 70. Contexts & Concepts of Diversity (Steel et al., 2018) 70 An understanding of what constitutes diversity, abstracted from questions of which attributes are relevant to diversity in a specific context. Background situation that suggests attributes relevant for assessing diversity. Context Concept
  71. 71. Contexts & Concepts of Diversity (Steel et al., 2018) 71 An understanding of what constitutes diversity, abstracted from questions of which attributes are relevant to diversity in a specific context. Background situation that suggests attributes relevant for assessing diversity. Context Concept
  72. 72. Contexts & Concepts of Diversity (Steel et al., 2018) 72 An understanding of what constitutes diversity, abstracted from questions of which attributes are relevant to diversity in a specific context. Background situation that suggests attributes relevant for assessing diversity. Context Concept Domain Agnostic Domain Specific
  73. 73. Contexts & Concepts of Diversity (Steel et al., 2018) 73 An understanding of what constitutes diversity, abstracted from questions of which attributes are relevant to diversity in a specific context. Background situation that suggests attributes relevant for assessing diversity. Context Concept Domain Agnostic Domain Specific
  74. 74. Concepts of Diversity - Egalitarian (Steel et al., 2018) 74 Let L be the line-up of an electronic music festival.
  75. 75. Concepts of Diversity - Egalitarian (Steel et al., 2018) 75 Let L be the line-up of an electronic music festival. If the artists part of L are distributed uniformly over several styles of electronic music S = {s1 , ..., sn}, the festival is diverse from an egalitarian perspective.
  76. 76. Concepts of Diversity - Normic (Steel et al., 2018) 76 Let L be the line-up of an electronic music festival, S = {s1 , ..., sn } be styles of electronic music, and the non-diverse norm Snd be the Electronic Dance Music (EDM) style.
  77. 77. Concepts of Diversity - Normic (Steel et al., 2018) 77 Let L be the line-up of an electronic music festival, S = {s1 , ..., sn } be styles of electronic music, and the non-diverse norm Snd be the Electronic Dance Music (EDM) style. Then, we can assume that L is expected to be diverse in a normic sense if the majority of artists performing do not play EDM, i.e., their styles ∉ Snd.
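The egalitarian and normic concepts can be sketched as toy predicates over a line-up represented by its style labels (the function names and the simple majority threshold are our own illustration, not from Steel et al.):

```python
from collections import Counter

def is_egalitarian_diverse(lineup_styles):
    """Egalitarian sketch: diverse if artists are distributed
    uniformly over the styles present in the line-up."""
    counts = Counter(lineup_styles).values()
    return len(set(counts)) == 1

def is_normic_diverse(lineup_styles, non_diverse_norm=("EDM",)):
    """Normic sketch: diverse if the majority of artists do not
    play a style in the non-diverse norm Snd."""
    outside = sum(1 for s in lineup_styles if s not in non_diverse_norm)
    return outside > len(lineup_styles) / 2

line_up = ["Techno", "House", "EDM", "Ambient"]
print(is_egalitarian_diverse(line_up))  # True: one artist per style
print(is_normic_diverse(line_up))       # True: 3 of 4 artists play non-EDM styles
```

Note how the two concepts can disagree: a line-up of four EDM acts is egalitarian-diverse under this sketch (uniform over the styles present) but not normic-diverse.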
  78. 78. 78 “When a measure becomes a target, it ceases to be a good measure.” Strathern (1997)
  79. 79. 79 Measures of Diversity Index-based Given a set 𝛀 of N elements, analyze their distribution into categories and return a single value: - Compact representation - Monodimensional
  80. 80. 80 Measures of Diversity Index-based Given a set 𝛀 of N elements, analyze richness and evenness and return a single value: - Compact representation - Monodimensional Distance-based Given a set 𝛀 of N elements, compute the pairwise distance between elements & aggregate: - Element-wise information - Multidimensional
  81. 81. 81 Measures of Diversity - Index-based Given a set 𝛀 of N elements, Richness: number of categories present in the set 𝛀 Evenness: distribution of elements across the categories of 𝛀
  82. 82. 82 Measures of Diversity - Index-based Given a set 𝛀 of N elements, Richness: number of categories present in the set 𝛀 Evenness: distribution of elements across the categories of 𝛀 Valence Arousal Q1 Q2 Q3 Q4
  83. 83. 83 Measures of Diversity - Index-based Richness: number of categories present in the set 𝛀 = 4 = N Evenness: distribution of elements across the categories of 𝛀 = (1,1,1,1) = (n1, n2, n3, n4) Valence Arousal Q1 Q2 Q3 Q4
  84. 84. 84 Measures of Diversity - Index-based Richness: number of categories present in the set 𝛀 = 4 = N Evenness: distribution of elements across the categories of 𝛀 = (1,1,1,1) = (n1, n2, n3, n4) Valence Arousal Q1 Q2 Q3 Q4 Simpson’s Diversity Index (Simpson, 1949) 0 = no diversity 1 = high diversity
  85. 85. 85 Measures of Diversity - Index-based Richness: number of categories present in the set 𝛀 = 4 = N Evenness: distribution of elements across the categories of 𝛀 = (1,1,1,1) = (n1, n2, n3, n4) Valence Arousal Q1 Q2 Q3 Q4 How diverse is the set of tracks (in terms of emotion recognized) according to the Simpson index? Simpson’s Diversity Index (Simpson, 1949)
  86. 86. 86 Measures of Diversity - Index-based Richness: number of categories present in the set 𝛀 = 4 = N Evenness: distribution of elements across the categories of 𝛀 = (1,1,1,1) = (n1, n2, n3, n4) Valence Arousal Q1 Q2 Q3 Q4 D(𝛀) = 1 N.B. quite egalitarian Simpson’s Diversity Index (Simpson, 1949)
  87. 87. 87 Measures of Diversity - Index-based Richness: number of categories present in the set 𝛀 = 4 = N Evenness: distribution of elements across the categories of 𝛀 = (1,1,1,1) = (n1, n2, n3, n4) Valence Arousal Q1 Q2 Q3 Q4 Valence Arousal Q1 Q2 Q3 Q4
  88. 88. 88 Measures of Diversity - Index-based Richness: number of categories present in the set 𝛀 = 4 = N Evenness: distribution of elements across the categories of 𝛀 = (1,1,1,1) = (n1, n2, n3, n4) Valence Arousal Q1 Q2 Q3 Q4 Valence Arousal Q1 Q2 Q3 Q4 D(𝛀) = 1 ???
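The Simpson computation on these examples can be sketched as follows, using the unbiased form D = 1 − Σ nᵢ(nᵢ−1) / (N(N−1)), the variant consistent with D(𝛀) = 1 for one track per quadrant:

```python
def simpson_diversity(counts):
    """Simpson's diversity index, D = 1 - sum(n_i*(n_i-1)) / (N*(N-1)):
    0 = no diversity, 1 = high diversity."""
    N = sum(counts)
    if N < 2:
        return 0.0
    return 1 - sum(n * (n - 1) for n in counts) / (N * (N - 1))

# One track per valence-arousal quadrant (Q1..Q4):
print(simpson_diversity([1, 1, 1, 1]))  # 1.0
# All four tracks in the same quadrant:
print(simpson_diversity([4, 0, 0, 0]))  # 0.0
```

Because the index depends only on the category counts (n1, ..., n4), any two sets with one track per quadrant get D = 1, no matter how the tracks are placed inside each quadrant; that is the limitation behind the "D(𝛀) = 1 ???" question.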
  89. 89. 89 Measures of Diversity - Distance-based Given a set 𝛀 of N elements, compute the pairwise distance between elements (Euclidean, Jaccard, cosine, etc.), and aggregate.
  90. Measures of Diversity - Distance-based (Porcaro and Gómez, 2019). 2-dimensional projection of a tag-embedding space.
  91. Pairwise distances computed within each cluster.
  92-93. The cluster with the maximum average pairwise distance is the most diverse!
  94-95. Measures of Diversity - Distance-based (Porcaro and Gómez, 2019). Jazz vocal - Swing ??? Ballad - Love
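A distance-based diversity score along these lines can be sketched as the mean pairwise Euclidean distance over a set of embeddings (the 2-D vectors below are hypothetical, standing in for projected tag embeddings):

```python
import itertools
import math

def avg_pairwise_distance(vectors):
    """Diversity of a set as the mean pairwise Euclidean distance.
    Any other pairwise distance (Jaccard, cosine, ...) could be plugged in."""
    pairs = list(itertools.combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two elements: no pairs to compare
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 2-D projections of two tag clusters:
tight_cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread_cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# The cluster with the larger average distance counts as the more diverse one.
print(avg_pairwise_distance(spread_cluster) > avg_pairwise_distance(tight_cluster))  # -> True
```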
  96. Construct Validity (Urbano et al., 2013). Do differences in system measures directly translate into the same differences in user measures? How do those differences affect end users?
  97. Perceptions of Diversity (Porcaro et al., 2022). Two playlists of electronic music DJs: Pretty Lights (US, Downtempo), Technoboy (IT, Hardstyle), Bonobo (UK, Downbeat), Laurent Garnier (FR, Techno); and The Blessed Madonna (US, House), Helena Hauff (DE, Techno), Miss DJAX (NL, Hardcore), Beatrice Dillon (UK, Experimental).
  98. Which one is the most diverse playlist?
  99. Both playlists have: the same number of styles; only Western-based DJs; only white-skinned DJs; only one sex (biological aspect).
  100. According to computational diversity metrics, NO statistical difference.
  101. but…
  102. Asked, 72% of 110 participants agreed that List Blue is more diverse than List Green.
  103-104. Most of the participants employed a normic concept of diversity… while our model works under an egalitarian concept.
  105. N.B.
  106. MIR and Subjectivity. MIR systems involve a great deal of subjectivity: Track A and Track B are very similar; Track A is happier than Track B; Playlist A is less diverse than Playlist B; this track is 80% EDM and 20% Tech House.
  107. Human annotations are fundamental resources, but here comes the problem…
  108. Agreement and Perceptions. Listeners trained in classical music tend to disagree more on perceived emotions than those not trained [Schedl et al., 2018]. Preference, familiarity, and lyrics comprehension increase agreement for emotions corresponding to quadrants Q1 and Q3 [Gómez-Cañón et al., 2020]. The current emotional state of users influences their perception of music similarity [Flexer et al., 2021].
  109. Diversity is a means to an end, not an end in itself.
  110-112. Impacts of Diversity. Higher education → preparing students for life (allowing them to interact with students from different socio-cultural backgrounds). Media studies → reaching democratic goals (e.g., informed citizenry, inclusive public discourse). Economics → fostering innovation; hedging against ignorance; mitigating lock-in; accommodating plural perspectives.
  113. In the MIR field, what are the ends for which fostering diversity may make a difference, and that we as a community should care about?
  114. Part II. Values in Practice: Fairness
  115. Overview. ● General notions of fairness and bias ● Examples of fairness studies in MIR and recommendation: ○ What is fair in music recommendation? ○ Gender imbalance in music recsys ○ Fairness for users ○ Music record labels ○ Cover song identification
  116. Introduction - Fairness. ● Fairness: identification and prevention of situations where individuals or a group are systematically disfavored, e.g., based on their characteristics ● Fairness is an interdisciplinary topic ● It is based on normative ideas of what it means to treat people “fairly”
  117. Introduction - Bias. Bias, as an objective deviation or imbalance in a model, measure, or data compared to an intended target, may refer to: ● societal biases ● statistical biases …
  118. Bias: we refer to a fact of the system without making any inherently normative judgment. Fairness: we use it to discuss the normative aspects of the system and its effects.
  119. Fairness vs Bias [diagram]
  120. What is fair? Fair for whom? ● There is no single general answer to fairness, ethics, etc. ● Fairness will hardly be defined by clear, universal objectives alone, but: ● We can identify/measure/mitigate specific harms ● Then look for general strategies or principles
  121-122. What is fair? Fair for whom? The ML community has begun to address claims of algorithmic bias under the rubric of fairness, accountability, and transparency. We should clearly define: ● our assumptions ● the goals or principles we build on ● the methods used to evaluate the ability to meet those goals ● how to assess the relevance and appropriateness of those goals to the social problem in question. [*] Redistribution and Rekognition: A Feminist Critique of Algorithmic Fairness [West, 2020]
  123. General Concepts - I. ● Distributional harm: harmful distribution of resources or outcomes ● Representational harm: harmful internal or external representation. [figure: Popularity vs. Recommended] Fairness in Information Access Systems [Ekstrand et al., 2022]
  124. Examples: Jamaica reggae; Mississippi blues.
  125-129. General Concepts - II. ● Individual fairness: similar individuals have a similar experience/treatment ● Group fairness: different groups have a similar experience/treatment ○ Sensitive attribute: attribute identifying group membership ○ Disparate treatment: groups intentionally treated differently ○ Disparate impact: groups receive outcomes at different rates ○ Disparate mistreatment: groups receive erroneous effects at different rates
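These group-level notions can be made concrete with a small sketch (the data and function names below are hypothetical): compute a per-group rate of some outcome, e.g. being recommended, and compare the rates across a sensitive attribute. A ratio far below 1.0 is one common operationalization of disparate impact.

```python
from collections import Counter

def group_rates(outcomes, groups):
    """Rate of positive outcomes (e.g., being recommended) per group."""
    pos, tot = Counter(), Counter()
    for outcome, group in zip(outcomes, groups):
        tot[group] += 1
        pos[group] += outcome
    return {g: pos[g] / tot[g] for g in tot}

def disparate_impact(rates, protected, reference):
    """Ratio of the protected group's rate to the reference group's rate;
    values far below 1.0 signal disparate impact."""
    return rates[protected] / rates[reference]

# Toy example: 1 = item was recommended, 0 = it was not.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
groups   = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = group_rates(outcomes, groups)     # {'a': 0.75, 'b': 0.25}
print(disparate_impact(rates, "b", "a"))  # -> 0.333...
```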
  130-133. Provider vs Consumer Fairness. ● We want to make sure that different applications are free of undesired biases and prevent any kind of discrimination ● RS are usually user-centered: maximizing user satisfaction is the main goal ● We should also consider systems’ impact on multiple groups of individuals ● How can platforms be fair to the artists?
  134. What is fair in music recommendation? The leading questions are: ○ How do artists feel affected by current systems? ○ What do artists want to see improved in the future? ○ What concrete ideas or even algorithmic decisions could be adopted in future systems? What is fair? Exploring the artists' perspective on the fairness of music streaming platforms [Ferraro et al., 2021]
  135. Methodology - Semi-structured interviews and qualitative content analysis [Mayring, 2004]: design of semi-structured interviews → document preparation → participant selection based on diversity → conducting interviews → interview transcription → coding → analysis of results.
  136. Topics
  137. Results - Interviews. What do artists think about the platforms' opportunities and power to influence the users' listening behavior? We see a strong tendency against influencing users.
  138. "I don’t see why we should tell the users which genres they should listen to"
  139. Results - Interviews. All artists agreed that the platforms should promote content by female artists to reach a gender balance in what users consume.
  140. "I think there should be actions to correct some biases. The question is in which cases it should be corrected and in which not. In heavy metal music, I imagine that there aren’t many female singers. Maybe we could give them more visibility, otherwise they would never be seen"
  141. "[...] the population of the world is 50% women. So it would be ridiculous if the system wouldn’t recommend them.”
  142. Gender Imbalance in Music Recsys. Based on the interviews, we focus on the topic of gender balance. ● Qualitative approach: artists express that platforms should promote gender balance ● Quantitative approach: using 2 datasets of user listening data, evaluate how users' listening histories compare to the (CF) recommendations. What is the effect on user consumption in the long term? Break the Loop: Gender Imbalance in Music Recommenders [Ferraro et al., 2021]
  143. Gender Imbalance in Music Recsys. Identify: these artists expect a balanced representation of artists according to their gender. Measure: representation, exposure, and utility. Mitigate: a simulation approach to compare longitudinal effects, considering feedback loops.
  144. Gender Imbalance in Music Recsys - Results. Three sources of unfairness: ● Representation: similar proportion to the users' history ● Exposure: users have to put in more effort to reach the first female artist in the recommendations ● Utility: the model makes more “mistakes” for female artists than for male artists. Similar results in: Exploring Artist Gender Bias in Music Recommendation [Shakespeare et al., 2020]
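The first two of these measures can be sketched in a few lines (hypothetical data and function names; `gender_of` maps artist to a gender label): representation as the share of recommendation slots per group, and exposure effort as the rank of the first female artist a user has to scroll to.

```python
def representation(ranked_artists, gender_of, target="female"):
    """Share of recommendation slots occupied by the target group."""
    hits = sum(1 for a in ranked_artists if gender_of[a] == target)
    return hits / len(ranked_artists)

def first_hit_rank(ranked_artists, gender_of, target="female"):
    """1-based rank of the first target-group artist, or None if absent:
    a proxy for the effort a user needs to reach such an artist."""
    for rank, artist in enumerate(ranked_artists, start=1):
        if gender_of[artist] == target:
            return rank
    return None

# Hypothetical recommendation list of four artists:
gender_of = {"A": "male", "B": "male", "C": "female", "D": "male"}
recs = ["A", "B", "C", "D"]
print(representation(recs, gender_of))   # -> 0.25
print(first_hit_rank(recs, gender_of))   # -> 3
```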
  145-149. Simulate user interactions: start from a dataset of user interactions and a trained model; generate recommendations for each user (e.g., User X); simulate the user's interactions with the recommendations; retrain the model; … repeat the process and evaluate the recommendations.
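The retrain-recommend-consume loop above can be caricatured with a toy popularity model (all names and numbers here are illustrative, not the paper's actual setup): each round the "model" recommends the currently most-played item, simulated users mostly accept it, and play counts concentrate over the rounds; the feedback loop amplifies the initial imbalance.

```python
import random
from collections import Counter

def simulate_popularity_loop(items, n_users=50, rounds=10, accept=0.8, seed=0):
    """Toy feedback loop: recommend the top item, simulate consumption,
    'retrain' on the updated play counts, repeat."""
    rng = random.Random(seed)
    plays = Counter({item: 1 for item in items})  # uniform warm start
    for _ in range(rounds):
        top = max(plays, key=plays.get)           # retrained "model"
        for _ in range(n_users):
            # users follow the recommendation with prob. `accept`, else explore
            chosen = top if rng.random() < accept else rng.choice(items)
            plays[chosen] += 1
    return plays

plays = simulate_popularity_loop(["w", "x", "y", "z"])
# After 10 rounds, consumption has concentrated on a single item:
print(plays.most_common())
```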
  150-151. Feedback Loop
  152. References - Simulations
  - Exploring Longitudinal Effects of Session-based Recommendations. Proceedings of the 14th ACM Conference on Recommender Systems [Ferraro, Jannach & Serra, 2020]
  - Consumption and performance: understanding longitudinal dynamics of recommender systems via an agent-based simulation framework [Zhang et al., 2019]
  - Longitudinal impact of preference biases on recommender systems’ performance [Zhou, Zhang & Adomavicius, 2021]
  - The Dual Echo Chamber: Modeling Social Media Polarization for Interventional Recommending [Donkers & Ziegler, 2021]
  153. References - Fairness for Users
  - Exploring User Opinions of Fairness in Recommender Systems [Smith et al., 2020]
  - Fairness and Transparency in Recommendation: The Users’ Perspective [Sonboli et al., 2021]
  - The multisided complexity of fairness in recommender systems [Sonboli et al., 2022]
  154. References - Unfairness for Users
  - The Unfairness of Popularity Bias in Music Recommendation: A Reproducibility Study [Kowald, Schedl & Lex, 2020]
  - Support the underground: characteristics of beyond-mainstream music listeners [Kowald et al., 2021]
  - Analyzing Item Popularity Bias of Music Recommender Systems: Are Different Genders Equally Affected? [Lesota et al., 2021]
  155-157. Music Record Labels. Bias and Feedback Loops in Music Recommendation: Studies on Record Label Impact [Knees, Ferraro & Hübler, 2022]
  158-159. Cover Song Identification. Assessing Algorithmic Biases for Musical Version Identification [Yesiler et al., 2022]. ● Multiple stakeholders: composers and other artists ● Compares 5 systems ● Uses different attributes of the songs/artists. The authors identify algorithmic biases in cover song identification systems that may lead to performance differences (dis)favoring different groups of artists, e.g.: ○ tracks released after 2000 work better ○ a larger gap between release years leads to worse performance ○ learning-based systems perform better for underrepresented artists
  160. References
  161.
  - Celik, I., Torre, I., Koceva, F., Bauer, C., Zangerle, E., & Knijnenburg, B. (2018). UMAP 2018 Intelligent User-adapted Interfaces: Design and Multi-modal Evaluation (IUadaptMe). Workshop Chairs’ Welcome & Organization. Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization (UMAP 2018).
  - Cleverdon, C. W. (1991). The significance of the Cranfield tests on index languages. International ACM SIGIR Conference on Research and Development in Information Retrieval, 3–12.
  - Flexer, A., Lallai, T., & Rašl, K. (2021). On Evaluation of Inter- and Intra-Rater Agreement in Music Recommendation. Transactions of the International Society for Music Information Retrieval, 4(1), 182–194. https://doi.org/10.5334/tismir.107
  - Gómez-Cañón, J. S., Cano, E., Herrera, P., & Gómez, E. (2020). Joyful for You and Tender for Us: The Influence of Individual Characteristics and Language on Emotion Labeling and Classification. Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR 2020).
  - Jannach, D. & Bauer, C. (2020). Escaping the McNamara Fallacy: Toward More Impactful Recommender Systems Research. AI Magazine, 41(4), 79–95. https://doi.org/10.1609/aimag.v41i4.5312
  - Jannach, D. & Jugovac, M. (2019). Measuring the Business Value of Recommender Systems. ACM Transactions on Management Information Systems, 10(4), 1–23. https://doi.org/10.1145/3370082
  162.
  - Lee, J. H. & Cunningham, S. J. (2013). Toward an understanding of the history and impact of user studies in music information retrieval. Journal of Intelligent Information Systems, 41(3), 499–521. https://doi.org/10.1007/s10844-013-0259-2
  - Liu, J. & Hu, X. (2010). User-centered music information retrieval evaluation. Proceedings of the Joint Conference on Digital Libraries (JCDL) Workshop: Music Information Retrieval for the Masses.
  - Miron, M. & Porcaro, L. (2021). MIR Evaluation Practices. Zenodo. https://doi.org/10.5281/zenodo.4611731
  - Porcaro, L., & Gómez, E. (2019). 20 Years of Playlists: a Statistical Analysis on Popularity and Diversity. Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR 2019), 4–11.
  - Porcaro, L., Gómez, E., & Castillo, C. (2022). Perceptions of Diversity in Electronic Music: The Impact of Listener, Artist, and Track Characteristics. Proc. ACM Hum.-Comput. Interact., 6(CSCW1). https://doi.org/10.1145/3512956
  - Schedl, M., Gómez, E., Trent, E. S., Tkalcic, M., Eghbal-Zadeh, H., & Martorell, A. (2018). On the interrelation between listener characteristics and the perception of emotions in classical orchestra music. IEEE Transactions on Affective Computing, 9(4), 507–525. https://doi.org/10.1109/TAFFC.2017.2663421
  163.
  - Schedl, M., Flexer, A. & Urbano, J. (2013). The neglected user in music information retrieval research. Journal of Intelligent Information Systems, 41, 523–539. https://doi.org/10.1007/s10844-013-0247-6
  - Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688.
  - Steel, D., Fazelpour, S., Gillette, K., Crewe, B., & Burgess, M. (2018). Multiple diversity concepts and their ethical-epistemic implications. European Journal for Philosophy of Science, 8(3), 761–780. https://doi.org/10.1007/s13194-018-0209-5
  - Strathern, M. (1997). “Improving ratings”: audit in the British University system. European Review, 5(3), 305. https://doi.org/10.1017/s1062798700002660
  - Urbano, J., Schedl, M., & Serra, X. (2013). Evaluation in music information retrieval. Journal of Intelligent Information Systems, 41(3), 345–369. https://doi.org/10.1007/s10844-013-0249-4
  - Voorhees, E. M. (2002). The Philosophy of Information Retrieval Evaluation. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds), Evaluation of Cross-Language Information Retrieval Systems. CLEF 2001. Lecture Notes in Computer Science, vol 2406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45691-0_34
  - Zangerle, E. & Bauer, C. (2022). Evaluating recommender systems: survey and framework. ACM Computing Surveys. https://doi.org/10.1145/3556536
