Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW... - Krishnaram Kenthapadi
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we will present open problems and research directions for the data mining / machine learning community.
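The fairness checks such a tutorial covers can be made concrete with a short sketch. Below is a minimal, illustrative computation of the demographic parity difference of a binary classifier's decisions across population groups; the function name and toy data are my own assumptions, not material from the tutorial.

```python
# A hedged, minimal sketch of one common fairness metric: the demographic
# parity difference. The decisions and group labels below are illustrative.

def demographic_parity_difference(decisions, groups):
    """Gap between the highest and lowest positive-decision rates per group."""
    rates = {}
    for g in set(groups):
        group_decisions = [d for d, grp in zip(decisions, groups) if grp == g]
        rates[g] = sum(group_decisions) / len(group_decisions)
    return max(rates.values()) - min(rates.values())

# Toy example: 1 = positive decision (e.g. a candidate surfaced), 0 = negative.
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(decisions, groups))  # 0.75 - 0.25 = 0.5
```

A "fairness by design" workflow would compute a metric like this during model development, not after deployment.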
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS... - Krishnaram Kenthapadi
Please cite as:
Sarah Bird, Ben Hutchinson, Krishnaram Kenthapadi, Emre Kiciman, and Margaret Mitchell. Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned. WSDM 2019.
[Video recording available at https://www.youtube.com/playlist?list=PLewjn-vrZ7d3x0M4Uu_57oaJPRXkiS221]
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with the proliferation of AI based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, as well as critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We present case studies across different companies, spanning application domains such as search & recommendation systems, hiring, sales, and lending. Finally, based on our experiences in industry, we identify open problems and research directions for the data mining / machine learning community.
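One of the model-agnostic explainability techniques tutorials like this one survey, permutation feature importance, can be sketched in a few lines. The toy model and data below are illustrative assumptions, not code or examples from the tutorial itself.

```python
# A hedged sketch of permutation feature importance: how much a metric
# drops when one feature's column is shuffled. The "model" here is a toy
# scoring function; in practice it would be a trained ML model.
import random

def accuracy(preds, labels):
    return sum(int(p == t) for p, t in zip(preds, labels)) / len(labels)

def permutation_importance(model, X, y, feature_idx, metric, seed=0):
    """Drop in the metric when one feature's column is shuffled."""
    base = metric([model(row) for row in X], y)
    column = [row[feature_idx] for row in X]
    random.Random(seed).shuffle(column)
    X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
              for row, v in zip(X, column)]
    return base - metric([model(row) for row in X_perm], y)

# Toy model that only looks at feature 0, so feature 1's importance is 0.
model = lambda row: int(row[0] > 0.5)
X = [[0.9, 0.1], [0.8, 0.9], [0.2, 0.2], [0.1, 0.8]]
y = [1, 1, 0, 0]
print(permutation_importance(model, X, y, 0, accuracy))  # importance of feature 0
print(permutation_importance(model, X, y, 1, accuracy))  # 0.0: feature 1 is ignored
```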
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021) - Krishnaram Kenthapadi
[Video available at https://sites.google.com/view/ResponsibleAITutorial]
Artificial Intelligence is increasingly being used in decisions and processes that are critical for individuals, businesses, and society, especially in areas such as hiring, lending, criminal justice, healthcare, and education. Recent ethical challenges and undesirable outcomes associated with AI systems have highlighted the need for regulations, best practices, and practical tools to help data scientists and ML developers build AI systems that are secure, privacy-preserving, transparent, explainable, fair, and accountable – to avoid unintended and potentially harmful consequences and compliance challenges.
In this tutorial, we will present an overview of responsible AI, highlighting model explainability, fairness, and privacy in AI, key regulations/laws, and techniques/tools for providing understanding around AI/ML systems. Then, we will focus on the application of explainability, fairness assessment/unfairness mitigation, and privacy techniques in industry, wherein we present practical challenges/guidelines for using such techniques effectively and lessons learned from deploying models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning many industries and application domains. Finally, based on our experiences in industry, we will identify open problems and research directions for the AI community.
How do we protect the privacy of users when building large-scale AI based systems? How do we develop machine learned models and systems taking fairness, accountability, and transparency into account? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical, legal, and technical challenges encountered by researchers and practitioners alike. In this talk, we will first motivate the need for adopting a "fairness and privacy by design" approach when developing AI/ML models and systems for different consumer and enterprise applications. We will then focus on the application of fairness-aware machine learning and privacy-preserving data mining techniques in practice, by presenting case studies spanning different LinkedIn applications (such as fairness-aware talent search ranking, privacy-preserving analytics, and LinkedIn Salary privacy & security design), and conclude with the key takeaways and open challenges.
Responsible AI in Industry: Practical Challenges and Lessons Learned - Krishnaram Kenthapadi
How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? How do we protect the privacy of users when building large-scale AI based systems? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains such as hiring, lending, and healthcare. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of responsible AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
Responsible Data Use in AI - core tech pillars - Sofus Macskássy
In this deck, we cover four core pillars of responsible data use in AI: fairness, transparency, explainability, and data governance.
A presentation by Nick Schmidt of BLDS, LLC to the Maryland AI meetup, June 4, 2019 (https://www.meetup.com/Maryland-AI). Nick discusses ideas of fairness and how they apply to machine learning. He explores recent academic work on identifying and mitigating bias, and how his work in lending and employment can be applied to other industries. Nick explains how to measure whether an algorithm is fair and demonstrates the techniques that model builders can use to ameliorate bias when it is found.
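The kind of fairness measurement Nick describes for lending and employment can be illustrated with the adverse impact ratio behind the "four-fifths rule". This is a hedged sketch on made-up data, not his actual analysis code.

```python
# A hedged sketch (not the speaker's code) of the adverse impact ratio:
# the protected group's selection rate divided by the reference group's.
# Under the four-fifths rule, a ratio below 0.8 warrants investigation.
# All data below is illustrative.

def adverse_impact_ratio(selected, groups, protected, reference):
    """Selection rate of the protected group over that of the reference group."""
    def rate(g):
        rows = [s for s, grp in zip(selected, groups) if grp == g]
        return sum(rows) / len(rows)
    return rate(protected) / rate(reference)

selected = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # 1 = hired / approved
groups = ["p"] * 5 + ["r"] * 5              # protected vs. reference group
air = adverse_impact_ratio(selected, groups, "p", "r")
print(round(air, 3))   # 0.75: selection rates of 0.6 vs 0.8
print(air >= 0.8)      # False: below the four-fifths threshold
```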
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with the proliferation of AI based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, as well as critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we will first motivate the need for model interpretability and explainability in AI from societal, legal, customer/end-user, and model developer perspectives. [Note: Due to time constraints, we will not focus on techniques/tools for providing explainability as part of AI/ML systems.] Then, we will focus on the real-world application of explainability techniques in industry, wherein we present practical challenges / implications for using explainability techniques effectively and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning application domains such as search and recommendation systems, sales, lending, and fraud detection. Finally, based on our experiences in industry, we will identify open problems and research directions for the research community.
This collection of slides is meant as a starting point and tutorial for those who want to understand AI ethics, in particular the challenges around bias and fairness. Furthermore, I have also included studies on how we as humans perceive AI's influence in our private as well as our working lives.
The impact of AI on society is getting bigger and bigger, and it is not all good. We as data scientists really have to put in the work to avoid ending up in ML hell.
This presentation was given at the Dutch Data Science Week.
Data/AI driven product development: from video streaming to telehealth - Xavier Amatriain
Healthcare is different from any other application domain, or is it not? While it is true that there are specific aspects, such as high stakes decisions and a complex regulatory framework, that make healthcare somewhat different, it is also the case that many of the lessons learned from building data-driven products in other domains translate remarkably well into healthcare. This is particularly so because healthcare is also a user facing domain, where users can be either patients or healthcare professionals. Given that data has been shown to improve user experience while ensuring quality and scalability, few would argue that healthcare cannot benefit from being much more data-driven than it has traditionally been.
In this talk, I described how decades of experience building impactful data and AI solutions into user facing products can be leveraged to revolutionize telehealth. At Curai, we combine approaches such as state-of-the-art large language models with expert systems in areas such as NLP, vision, and automated diagnosis to augment and scale doctors, and to improve user experience and healthcare outcomes. We will see some of those applications while analyzing the role of data and ML algorithms in making them possible.
Data Con LA 2020
Description
More and more organizations are embracing AI technology by infusing it in their products and services to differentiate themselves from their competitors. AI is being utilized in some sensitive areas of human life. In this session, let's look at some of the principles governing adoption of AI in a responsible manner. Why are companies accelerating adoption of AI?
Increasingly, organizations are accelerating adoption of AI to differentiate their products and services in the market. We have seen the outcomes of this digital transformation in the areas of optimizing operations, engaging customers, empowering employees, and transforming products and services.
*List some of the sensitive use cases where AI is being applied
*Why is governing AI important, and what are those principles?
*How Microsoft is approaching it?
Speaker
Suresh Paulraj, Microsoft, Principal Cloud Solution Architect Data & AI
Ethical Considerations in the Design of Artificial Intelligence - John C. Havens
A presentation for IEEE's Ethics Symposium, held in Vancouver in May 2016. Featuring presentations from John C. Havens, Mike Van der Loos, John P. Sullins, and Alan Mackworth.
Preserving privacy of users is a key requirement of web-scale data mining applications and systems such as web search, recommender systems, crowdsourced platforms, and analytics applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. In this tutorial, we will first present an overview of privacy breaches over the last two decades and the lessons learned, key regulations and laws, and evolution of privacy techniques leading to differential privacy definition / techniques. Then, we will focus on the application of privacy-preserving data mining techniques in practice, by presenting case studies such as Apple's differential privacy deployment for iOS / macOS, Google's RAPPOR, LinkedIn Salary, and Microsoft's differential privacy deployment for collecting Windows telemetry. We will conclude with open problems and challenges for the data mining / machine learning community, based on our experiences in industry.
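The randomized-response mechanism underlying local differential privacy deployments such as Google's RAPPOR can be sketched briefly. This is an illustrative sketch of the general technique only; the epsilon value and data are assumptions, not details of Apple's, Google's, LinkedIn's, or Microsoft's actual systems.

```python
# A hedged sketch of randomized response, the classic local differential
# privacy building block that systems like RAPPOR extend. Each user
# reports their true bit with probability e^eps / (e^eps + 1), else the
# flipped bit, so no single report reveals the truth with certainty.
import math
import random

def randomized_response(true_bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if rng.random() < p_truth else 1 - true_bit

def debias(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from the noisy reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

rng = random.Random(42)
true_bits = [1] * 300 + [0] * 700       # true fraction of 1s is 0.30
reports = [randomized_response(b, 1.0, rng) for b in true_bits]
estimate = debias(reports, 1.0)
print(round(estimate, 2))  # close to 0.30, recovered without trusting any single report
```

The aggregate statistic is recovered from noisy reports, while each individual report satisfies epsilon-differential privacy.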
Invited talk on fairness in AI systems at the 2nd Workshop on Interactive Natural Language Technology for Explainable AI co-located with the International Conference on Natural Language Generation, 18/12/2020.
Machine Learning ML Overview Algorithms Use Cases And Applications - SlideTeam
Machine Learning ML Overview Algorithms Use Cases and Applications is for mid level managers, giving information about machine learning, how machine learning works, machine learning algorithms, and their use cases. You can also learn the difference between machine learning and traditional programming to understand how to implement machine learning in a better way for business growth. https://bit.ly/2ZaVSG9
Ethical Issues in Machine Learning Algorithms (Part 3) - Vladimir Kanchev
The presentation deals with ethical issues in a few currently widely used machine learning (or AI) technologies and algorithms. The ML applications are described in detail, along with their current state of the art, their specific challenges, and their ethical problems. Current solutions are given from both academic and industrial perspectives. A mixture of academic and applied sources is used for the presentation, aiming to make it more interesting for students and practitioners.
Introduction to the ethics of machine learning - Daniel Wilson
A brief introduction to the domain that is variously described as the ethics of machine learning, data science ethics, AI ethics and the ethics of big data. (Delivered as a guest lecture for COMPSCI 361 at the University of Auckland on May 29, 2019)
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...Saurabh Mishra
This group reviewed data and measurements indicating the positive potential of AI to serve Sustainable Development Goals (SDG’s). Alongside these optimistic inquiries, this group also investigated the risks of AI in areas such as privacy, vulnerable populations, human rights, workplace and organizational policy. The socio-political consequences of AI raise many complex questions which require continued rigorous examination.
Responsible Data Use in AI - core tech pillarsSofus Macskássy
In this deck, we cover four core pillars of responsible data use in AI, including fairness, transparency, explainability -- as well as data governance.
Nick Schmidt of BLDS, LLC to the Maryland AI meetup, June 4, 2019 (https://www.meetup.com/Maryland-AI). Nick discusses ideas of fairness and how they apply to machine learning. He explores recent academic work on identifying and mitigating bias, and how his work in lending and employment can be applied to other industries. Nick explains how to measure whether an algorithm is fair and also demonstrate the techniques that model builders can use to ameliorate bias when it is found.
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with proliferation of AI based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, as well as critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we will first motivate the need for model interpretability and explainability in AI from societal, legal, customer/end-user, and model developer perspectives. [Note: Due to time constraints, we will not focus on techniques/tools for providing explainability as part of AI/ML systems.] Then, we will focus on the real-world application of explainability techniques in industry, wherein we present practical challenges / implications for using explainability techniques effectively and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning application domains such as search and recommendation systems, sales, lending, and fraud detection. Finally, based on our experiences in industry, we will identify open problems and research directions for the research community.
This collection of slides are meant as a starting point and tutorial for the ones who want to understand AI Ethics and in particular the challenges around bias and fairness. Furthermore, I have also included studies on how we as humans perceive AI influence in our private as well as working lives.
The impact of AI on society gets bigger and bigger - and it is not all good. We as Data Scientists have to really put in work to not end up in ML hell.
This presentation was given at the Dutch Data Science Week.
Data/AI driven product development: from video streaming to telehealthXavier Amatriain
Healthcare is different from any other application domain, or is it not? While it is true that there are specific aspects, such as high stakes decisions and a complex regulatory framework, that make healthcare somewhat different, it is also the case that many of the lessons learned from building data-driven products in other domains translate remarcably well into healthcare. This is particularly so because healthcare is also a user facing domain, where users can be both patients or healthcare professionals. Given that data has shown to improve user experience while ensuring quality and scalability, few would argue that healthcare cannot benefit from being much more data-driven than it has traditionally been.
In this talk, I described how this experience building impactful data and AI solutions into user facing products for decades can be leveraged to revolutionize telehealth. At Curai, we combine approaches such as state-of-the-art large language models with expert systems in areas such as NLP, vision, and automated diagnosis to augment and scale doctors, and to improve user experience and healthcare outcomes. We will see some of those applications while analyzing the role of data and ML algorithms in making them possible.
Data Con LA 2020
Description
More and more organizations are embracing AI technology by infusing it in their products and services to to differentiate themselves against their competitors. AI is being utilized in some sensitive areas of human life. In this session let's look at some of principles governing adoption of AI in a responsible manner. Why companies are accelerating adoption of AI?
Increasingly organization are accelerating adoption of AI to differentiate their product and services in the market. Outcomes of this digital transformation that we have seen in the areas of optimizing operations, engaging customers, empowering employees and transforming their products and services.
*List some of the sensitive use cases where AI is being applied
*Why governing AI is important and what are those principles?
*How Microsoft is approaching it?
Speaker
Suresh Paulraj, Microsoft, Principal Cloud Solution Architect Data & AI
Ethical Considerations in the Design of Artificial IntelligenceJohn C. Havens
A presentation for IEEE's Ethics Symposium happening in Vancouver, May 2016. Featuring presentations from John C. Havens, Mike Van der Loos, John P. Sullins, and Alan Mackworth.
Preserving privacy of users is a key requirement of web-scale data mining applications and systems such as web search, recommender systems, crowdsourced platforms, and analytics applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. In this tutorial, we will first present an overview of privacy breaches over the last two decades and the lessons learned, key regulations and laws, and evolution of privacy techniques leading to differential privacy definition / techniques. Then, we will focus on the application of privacy-preserving data mining techniques in practice, by presenting case studies such as Apple's differential privacy deployment for iOS / macOS, Google's RAPPOR, LinkedIn Salary, and Microsoft's differential privacy deployment for collecting Windows telemetry. We will conclude with open problems and challenges for the data mining / machine learning community, based on our experiences in industry.
[Video recording available at https://www.youtube.com/playlist?list=PLewjn-vrZ7d3x0M4Uu_57oaJPRXkiS221]
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with proliferation of AI based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, and critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We present case studies across different companies, spanning application domains such as search & recommendation systems, hiring, sales, and lending. Finally, based on our experiences in industry, we identify open problems and research directions for the data mining / machine learning community.
Invited talk on fairness in AI systems at the 2nd Workshop on Interactive Natural Language Technology for Explainable AI co-located with the International Conference on Natural Language Generation, 18/12/2020.
Machine Learning Ml Overview Algorithms Use Cases And ApplicationsSlideTeam
"You can download this product from SlideTeam.net"
Machine Learning ML Overview Algorithms Use Cases and Applications is for the mid level managers giving information about Machine Learning, how Machine Learning works, Machine Learning algorithms and its use cases. You can also learn the difference between Machine learning vs Traditional programming to understand how to implement machine learning in a better way for business growth. https://bit.ly/2ZaVSG9
Ethical Issues in Machine Learning Algorithms. (Part 3)Vladimir Kanchev
The presentation deals with ethical issues in a few currently widely used machine learning (or AI) technologies and algorithms. The ML applications are described in details, their current state of the art, their specific challenges and ethical problems. Current solutions from academic and industrial perspective are given. A mixture of academic and applied sources are used for the presentation - it aims to be more interesting for students and practitioners.
Introduction to the ethics of machine learningDaniel Wilson
A brief introduction to the domain that is variously described as the ethics of machine learning, data science ethics, AI ethics and the ethics of big data. (Delivered as a guest lecture for COMPSCI 361 at the University of Auckland on May 29, 2019)
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...Saurabh Mishra
This group reviewed data and measurements indicating the positive potential of AI to serve Sustainable Development Goals (SDG’s). Alongside these optimistic inquiries, this group also investigated the risks of AI in areas such as privacy, vulnerable populations, human rights, workplace and organizational policy. The socio-political consequences of AI raise many complex questions which require continued rigorous examination.
e-SIDES presentation at Leiden University 21/09/2017e-SIDES.eu
On September 21st the eLaw team member of e-SIDES, Magdalena Jozwiak, made a presentation of the e-SIDES project at a lunch event at the Leiden University’s Law Faculty. The event, organized within the Interaction Between Legal Systems research theme, attracted an interdisciplinary audience and was followed by a discussion on e-SIDES, its goals and approaches.
Responsible AI in Industry: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? How do we protect the privacy of users when building large-scale AI based systems? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains such as hiring, lending, and healthcare. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of responsible AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
In AI We Trust? Ethics & Bias in Machine Learning
Hosted by Seven Peaks Ventures and Fast Forward Labs
September 2016
Decision-making about critical life stages (college admissions, creditworthiness, employability, jail sentencing) is rapidly becoming centralized and predicted by automated systems like machine learning models. As Data Scientists, the creators of those models, how do we take responsibility for those decisions? How do we define our goals, and how do we measure the effect? As this presents a unique opportunity and risk to businesses, we all become invested in the answers to these questions. Here, I focus on the tactical elements of measuring fairness as well as the forward-looking concerns and opportunities this paradigmatic change presents.
https://www.eventbrite.com/e/in-ai-we-trust-ethics-bias-in-machine-learning-tickets-26313436196#
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit-mallick
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Audrey Jill Boguchwal, Senior Product Manager at Samasource, presents the "Practical Approaches to Training Data Strategy: Bias, Legal and Ethical Considerations" tutorial at the May 2019 Embedded Vision Summit.
Recent McKinsey research cites the top five limitations that prevent companies from adopting AI technology. Training data strategy is a common thread. Companies face challenges obtaining enough AI training data, developing strategies for robust data quality and ensuring that bias does not occur.
In this presentation, Boguchwal explores training data strategies that avoid bias in the data and that consider legal and ethical factors. She explains common types of bias, how bias can creep into datasets, the impact of bias, how to avoid bias and how to test your model for bias. She discusses legal and ethical considerations in data sourcing, including real cases where legal and ethical complications can arise, the impact of these complications and best practices for avoiding or mitigating them.
Ethical Issues in Machine Learning Algorithms (Part 2)Vladimir Kanchev
The presentation deals with types of biases found in AI/ML systems - data bias, algorithmic bias, and lack of interpretability. Reasons for their appearances are given, and major approaches for their reduction.
Talk on Algorithmic Bias given at York University (Canada) on March 11, 2019. This is a shorter version of an interactive workshop presented at University of Minnesota, Duluth in Feb 2019.
Spring Splash 3.4.2019: When AI Meets Ethics by Meeri Haataja Saidot
Meeri Haataja's keyote 'When AI Meets Ethics' at Keväthumaus 2019 / Spring Splash 2019 (organised by Väestörekisterikeskus / Population Register Centre).
Ethical Issues in Machine Learning Algorithms. (Part 1)Vladimir Kanchev
This presentation describes recent ethical issues related to AI and ML algorithms. Its focus is data and algorithmic bias, algorithmic interpretability and how GDPR relates to these issues.
1. Fairness-aware Machine Learning:
Practical Challenges and Lessons Learned
KDD 2019 Tutorial
August 2019
Sarah Bird (Microsoft), Ben Hutchinson (Google), Sahin Geyik (LinkedIn)
Krishnaram Kenthapadi (LinkedIn), Emre Kıcıman (Microsoft),
Margaret Mitchell (Google), Mehrnoosh Sameki (Microsoft)
https://sites.google.com/view/kdd19-fairness-tutorial
2. The Coded Gaze [Joy Buolamwini 2016]
• Face detection software: fails for some darker faces
https://www.youtube.com/watch?v=KB9sI9rY3cA
3. Gender Shades [Joy Buolamwini & Timnit Gebru, 2018]
• Facial analysis software: higher accuracy for light-skinned men
• Error rates for dark-skinned women: 20% - 34%
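The disparities above come from disaggregated evaluation: computing error rates per subgroup rather than a single aggregate accuracy. A minimal sketch of such an audit (not from the tutorial; the function name and toy data are illustrative):

```python
from collections import defaultdict

def error_rates_by_group(y_true, y_pred, groups):
    # Disaggregated evaluation: classification error rate per subgroup,
    # rather than one aggregate accuracy that can mask disparities.
    errors = defaultdict(int)
    counts = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        counts[g] += 1
        errors[g] += int(yt != yp)
    return {g: errors[g] / counts[g] for g in counts}

# Toy labels: the model is perfect on group "a" but errs on 3 of 4
# examples from group "b", the kind of gap a Gender Shades style
# audit surfaces.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(error_rates_by_group(y_true, y_pred, groups))  # {'a': 0.0, 'b': 0.75}
```

In practice the same computation is run over every evaluation metric of interest (false positive rate, false negative rate, etc.) and over intersectional subgroups, not just single attributes.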
4. Algorithmic Bias
▪ Ethical challenges posed by AI systems
▪ Inherent biases present in society
– Reflected in training data
– AI/ML models prone to amplifying such biases
▪ ACM FAT* conference / KDD’16 & NeurIPS’17 Tutorials
5. Laws against Discrimination
Citizenship: Immigration Reform and Control Act
Disability status: Rehabilitation Act of 1973; Americans with Disabilities Act of 1990
Race: Civil Rights Act of 1964
Age: Age Discrimination in Employment Act of 1967
Sex: Equal Pay Act of 1963; Civil Rights Act of 1964
And more...
12. Outline
• Algorithmic Bias / Discrimination
• Sources of Data Biases in ML Lifecycle
• Techniques for Fairness in ML
• AI Fairness Tools
• Case Studies
• Key Takeaways
14. Other Great Tutorials
• Fairness in Machine Learning (Solon Barocas and Moritz Hardt, NeurIPS 2017)
• Challenges of incorporating algorithmic fairness into practice (Henriette Cramer, Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miroslav Dudík, Hanna Wallach, Sravana Reddy, Jean Garcia-Gathright, FAT* 2019)
• Defining and Designing Fair Algorithms (Sam Corbett-Davies, Sharad Goel, ICML 2018)
• The Trouble with Bias (Kate Crawford, NeurIPS 2017 Keynote)
16. "[H]iring could become faster and less expensive, and […] lead recruiters to more highly skilled people who are better matches for their companies. Another potential result: a more diverse workplace. The software relies on data to surface candidates from a wide variety of places and match their skills to the job requirements, free of human biases."
Miller (2015)
[Barocas & Hardt 2017]
18. "But software is not free of human influence. Algorithms are written and maintained by people, and machine learning algorithms adjust what they do based on people’s behavior. As a result […] algorithms can reinforce human prejudices."
Miller (2015)
[Barocas & Hardt 2017]
20. More positive outcomes & avoiding harmful outcomes of algorithms for groups of people
[Cramer et al 2019]
21. More positive outcomes & avoiding harmful outcomes of automated systems for groups of people
[Cramer et al 2019]
22. Legally Recognized Protected Classes
Race (Civil Rights Act of 1964)
Color (Civil Rights Act of 1964)
Sex (Equal Pay Act of 1963; Civil Rights Act of 1964)
Religion (Civil Rights Act of 1964)
National origin (Civil Rights Act of 1964)
Citizenship (Immigration Reform and Control Act)
Age (Age Discrimination in Employment Act of 1967)
Pregnancy (Pregnancy Discrimination Act)
Familial status (Civil Rights Act of 1968)
Disability status (Rehabilitation Act of 1973; Americans with Disabilities Act of 1990)
Veteran status (Vietnam Era Veterans' Readjustment Assistance Act of 1974; Uniformed Services Employment and Reemployment Rights Act)
Genetic information (Genetic Information Nondiscrimination Act)
[Barocas & Hardt 2017]
23. Other Categories
Societal categories: e.g., political ideology, language, income, location, topical interests, (sub)culture, physical traits, etc.
Intersectional subpopulations: e.g., women in tech
Application-specific subpopulations: e.g., device type
24. Types of Harm
Harms of allocation: withhold opportunity or resources
Harms of representation: reinforce subordination along the lines of identity, stereotypes
[Cramer et al 2019; Shapiro et al. 2017; Kate Crawford, “The Trouble With Bias” keynote, NeurIPS 2017]
25. Bias, Discrimination & Machine Learning
Isn’t bias a technical concept? Selection bias, sampling bias, reporting bias, bias of an estimator, inductive bias
Isn’t discrimination the very point of machine learning? Unjustified basis for differentiation
[Barocas & Hardt 2017]
26. Discrimination is not a general concept
It is domain specific
Concerned with important opportunities that affect people’s life chances
It is feature specific
Concerned with socially salient qualities that have served as the basis for unjustified and
systematically adverse treatment in the past
[Barocas & Hardt 2017]
27. Regulated
Domains
Credit (Equal Credit Opportunity Act)
Education (Civil Rights Act of 1964;
Education Amendments of 1972)
Employment (Civil Rights Act of 1964)
Housing (Fair Housing Act)
‘Public Accommodation’ (Civil Rights Act
of 1964)
Extends to marketing and advertising;
not limited to final decision
[Barocas & Hardt 2017]
32. Why do this?
Better product and Serving Broader
Population
Responsibility and Social Impact
Legal and Policy
Competitive Advantage and Brand
[Barocas & Hardt 2017]
36. State of the
Art in Industry
• People: Bring in domain expertise and diversity
• Measure: Emphasize analysis and testing
• Process: Focus on processes and reviews
37. Process Best
Practices
Identify product goals
Get the right people in the room
Identify stakeholders
Select a fairness approach
Analyze and evaluate your system
Mitigate issues
Monitor continuously and have escalation plans
Auditing and Transparency
39. Google's Responsible Fairness Practices
https://ai.google/education/responsible-ai-practices?category=fairness
Summary:
• Design your product using concrete goals for fairness and inclusion.
• Engage with social scientists and other relevant experts.
• Set fairness goals
• Check system for unfair biases.
• Include diverse testers and adversarial/stress testing.
• Consider feedback loops
• Analyze performance.
• Evaluate user experience in real-world scenarios.
• Use representative datasets to train and test your model.
40. Fairness
AI systems should treat everyone fairly and
avoid affecting similarly situated groups of
people in different ways
Key considerations
1. Understand the scope, spirit, and
potential uses of the AI system
2. Attract a diverse pool of talent
3. Put processes and tools in place to
identify bias in datasets and machine
learning algorithms
4. Leverage human review and domain
expertise
5. Research and employ best practices,
analytical techniques, and tools
42. Collaborators
Much of this section is based on survey paper and
tutorial series written by Alexandra Olteanu,
Carlos Castillo, Fernando Diaz, Emre Kıcıman
43. To present a taxonomy of challenges that can
occur at different stages of social data analysis.
To recognize, understand, or quantify some
major classes of limitations around data
To give us food for thought, by looking critically
at our work.
Survey goals
47. data bias
Data bias: a systematic
distortion in data that
compromises its use
for a task.
48. Note: Bias must be considered relative to task
Gender discrimination is
illegal
Gender-specific medical
diagnosis is desirable
49. What does data bias look like?
Measure systematic distortions along 5 data properties
1. Population Biases
2. Behavioral Biases
3. Content Production Biases
4. Linking Biases
5. Temporal Biases
50. What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
Differences in demographics or other user characteristics between a user
population represented in a dataset or platform and a target population
2. Behavioral Biases
3. Content Production Biases
4. Linking Biases
5. Temporal Biases
51. Example:
Different user
demographics on
different social
platforms
See [Hargittai’07] for statistics about social media use
among young adults according to gender, race and
ethnicity, and parental educational background.
Figure from http://www.pewinternet.org/2016/11/11/social-media-update-2016/
52. Systematic distortions must be evaluated in a
task dependent way
Gender Shades
E.g., for many tasks, populations
should match target population,
to improve external validity
But for some other tasks,
subpopulations require
approximately equal
representation to achieve task
parity
http://gendershades.org/
53. What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
2. Behavioral Biases
Differences in user behavior across platforms or contexts, or across
users represented in different datasets
3. Content Production Biases
4. Linking Biases
5. Temporal Biases
54. Behavioral Biases from Functional Issues
Platform functionality and algorithms influence human behaviors
and our observations of human behaviors
[Miller et al. ICWSM’16]
Figure from: http://grouplens.org/blog/investigating-the-potential-for-miscommunication-using-emoji/
55. Cultural elements and social contexts are
reflected in social datasets
Figure from
[Hannak et al. CSCW 2017]
56. Societal biases embedded in behavior can be
amplified by algorithms
Users pick
biased options
Biased
actions are
used as
feedback
System learns to mimic
biased options
System presents
options,
influencing user
choice
57. What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
2. Behavioral Biases
3. Content Production Biases
Lexical, syntactic, semantic, and structural differences in the
contents generated by users
4. Linking Biases
5. Temporal Biases
58. Behavioral Biases from Normative Issues
Community norms and societal biases influence observed behavior
and vary across online and offline communities and contexts
What kind of pictures would you share
on Facebook, but not on LinkedIn?
Are individuals comfortable
contradicting popular opinions?
E.g., after singer Prince died, most
SNs showed public mourning. But
not anonymous site PostSecret
The same mechanism can embed
different meanings in different
contexts [Tufekci ICWSM’14]
[the meaning of retweets
or likes] “could range
from affirmation to
denunciation to sarcasm
to approval to disgust”
59. Privacy concerns affect what content users share, and,
thus, the type of patterns we observe.
Foursquare/Image from [Lindqvist et al. CHI’11]
The awareness of being observed by others impacts user behavior: privacy and safety concerns
60. As other media, social
media contains
misinformation and
disinformation
Misinformation is false information,
unintentionally spread
Disinformation is false information,
deliberately spread
Figures from [Kumar et al. 2016]
Hoaxes on Wikipedia: (left) impact as number
of views per day for hoaxes surviving at least 7
days, and (right) time until a hoax gets detected
and flagged.
61. What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
2. Behavioral Biases
3. Content Production Biases
4. Linking Biases
Differences in the attributes of networks obtained from user
connections, interactions, or activity
5. Temporal Biases
63. Online social networks formation
also depends on factors
external to the social
platforms
● Geography & distance
● Co-visits
● Dynamics of offline relations
● [...]
Figure from [Gilbert and Karahalios CHI 2009]
64. What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
2. Behavioral Biases
3. Content Production Biases
4. Linking Biases
5. Temporal Biases
Differences in populations and behaviors over time
65. Different demographics can
exhibit different growth rates
across and within social
platforms
TaskRabbit and Fiverr are online freelance
marketplaces.
Figure from [Hannak et al. CSCW 2017]
66. E.g., Change in Features over Time
Introducing a new feature or
changing an existing feature
impacts usage patterns on the
platform.
67. Biases can come in at any step along the data analysis pipeline
Data Source
● Functional: biases due to platform affordances and algorithms
● Normative: biases due to community norms
● External: biases due to phenomena outside social platforms
● Non-individuals: e.g., organizations, automated agents
Data Collection
● Acquisition: biases due to, e.g., API limits
● Querying: biases due to, e.g., query formulation
● Filtering: biases due to removal of data “deemed” irrelevant
Data Processing
● Cleaning: biases due to, e.g., default values
● Enrichment: biases from manual or automated annotations
● Aggregation: e.g., grouping, organizing, or structuring data
Data Analysis
● Qualitative Analyses: lack of generalizability, interpretation biases
● Descriptive Statistics: confounding bias, obfuscated measurements
● Prediction & Inferences: data representation, performance variations
● Observational studies: peer effects, selection bias, ignorability
Evaluation
● Metrics: e.g., reliability, lack of domain insights
● Interpretation: e.g., contextual validity, generalizability
● Disclaimers: e.g., lack of negative results and reproducibility
68. Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
69. Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Consider
team composition
for diversity of thought,
background and
experiences
70. Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Understand the task,
stakeholders, and
potential for errors and
harm
71. Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Check data sets
Consider data provenance
What is the data intended to
represent?
Verify through qualitative,
experimental, survey and
other methods
72. Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Check models and validate results
Why is the model making decision?
What mechanisms would explain
results? Is supporting evidence
consistent?
Twyman’s law: The more unusual the
result, the more likely it’s an error
73. Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Post-Deployment
Ensure optimization and guardrail metrics
consistent w/responsible practices and avoid
harms
Continual monitoring, including customer
feedback
Have a plan to identify and respond to
failures and harms as they occur
74. Key Takeaways
• Many, complex biases at all stages of data
collection and analysis
• Population, Behavioral, Content Production, Linking,
Temporal Biases
• Mitigate through deeper investigation,
understanding
• Read more: Social Data: Biases, Methodological
Pitfalls, and Ethical Boundaries, Olteanu,
Castillo, Diaz and Kıcıman
76. Thanks to
Ben Hutchison
Alex Beutel (Research Scientist, fairness in ML),
Allison Woodruff (UX Research, privacy, fairness and ethics),
Andrew Zaldivar (Developer Advocate, ethics and fairness in AI),
Hallie Benjamin (Senior Strategist, ethics and fairness in ML),
Jamaal Barnes (Program Manager, fairness in ML),
Josh Lovejoy (UX Designer, People and AI Research; now Microsoft),
Margaret Mitchell (Research Scientist, ethics and fairness in AI),
Rebecca White (Program Manager, fairness in ML)
and others!
79. Product Introspection (1):
Make Your Key Choices Explicit [Mitchell et al., 2018]
Goals | Decision | Prediction
Profit from loans | Whether to lend | Loan will be repaid
Justice, public safety | Whether to detain | Crime committed if not detained
• Goals are ideally measurable
• What are your non-goals?
• Which decisions are you not considering?
• What is the relationship between Prediction
and Decision?
80. Product Introspection (2):
Identify Potential Harms
• What are the potential harms?
• Applicants who would have repaid are not
given loans
• Convicts who would not commit a crime
are locked up.
• Are there also longer term harms?
• Applicants are given loans, then go on to
default, harming their credit score
• Are some harms especially bad?
81. Seek out Diverse Perspectives
• Fairness Experts
• User Researchers
• Privacy Experts
• Legal
• Social Science Backgrounds
• Diverse Identities
• Gender
• Sexual Orientation
• Race
• Nationality
• Religion
83. Launch with Confidence: Testing for Bias
• How will you know if users are being
harmed?
• How will you know if harms are unfairly
distributed?
• Detailed testing practices are often not
covered in academic papers
• Discussing testing requirements is a
useful focal point for cross-functional
teams
86. Model Predictions
Positive Negative
● Exists
● Predicted
True Positives
● Doesn’t exist
● Not predicted
True Negatives
Evaluate for Inclusion - Confusion Matrix
87. Model Predictions
Positive Negative
● Exists
● Predicted
True Positives
● Exists
● Not predicted
False Negatives
● Doesn’t exist
● Predicted
False Positives
● Doesn’t exist
● Not predicted
True Negatives
Evaluate for Inclusion - Confusion Matrix
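The per-group confusion-matrix evaluation sketched above can be implemented in a few lines; the function name and toy data below are illustrative, not from the tutorial:

```python
import numpy as np

def group_confusion_rates(y_true, y_pred, groups):
    """Per-group false positive / false negative rates from the confusion matrix."""
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        t, p = y_true[m], y_pred[m]
        fp = np.sum((p == 1) & (t == 0))   # predicted, doesn't exist
        fn = np.sum((p == 0) & (t == 1))   # exists, not predicted
        tn = np.sum((p == 0) & (t == 0))
        tp = np.sum((p == 1) & (t == 1))
        rates[g] = {"fpr": fp / max(fp + tn, 1),
                    "fnr": fn / max(fn + tp, 1)}
    return rates

# Toy data: comparing the two groups' rates is the "evaluate for inclusion" step.
y_true = np.array([1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0])
groups = np.array(["a", "a", "a", "b", "b", "b"])
print(group_confusion_rates(y_true, y_pred, groups))
```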
88. Efficient Testing for Bias
• Development teams are under multiple
constraints
• Time
• Money
• Human resources
• Access to data
• How can we efficiently test for bias?
• Prioritization
• Strategic testing
89. Choose your evaluation metrics in light
of acceptable tradeoffs between
False Positives and False Negatives
90. Privacy in Images
False Positive: Something that doesn’t
need to be blurred gets blurred.
Can be a bummer.
False Negative: Something that
needs to be blurred is not blurred.
Identity theft.
False Positives Might be Better than False Negatives
91. Spam Filtering
False Negative: Email that is SPAM is
not caught, so you see it in your inbox.
Usually just a bit annoying.
False Positive: Email flagged as SPAM
is removed from your inbox.
If it’s from a friend or loved one, it’s a
loss!
False Negatives Might Be Better than False Positives
93. 1. Targeted Tests
Based on prior experience/knowledge
• Computer Vision
⇒ Test for dark skin
• Natural Language Processing
⇒ Test for gender stereotypes
Cf. smoke tests
(non-exhaustive tests that check that
most important functions work)
94. Targeted Testing of a Gender Classifier
[Joy Buolamwini & Timnit Gebru, 2018]
• Facial recognition
software:
Higher accuracy for
light skinned men
• Error rates for dark
skinned women:
20% - 34%
95. 2. Quick
Tests
• "Cheap"
• Useful throughout product cycle
• Spot check extreme cases
• Low coverage but high informativity
• Need to be designed thoughtfully, e.g.
• World knowledge
• Prior product failures
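One way to realize such a quick test is a template-based spot check. Everything below is a hypothetical stand-in: `score_fn` for a real model, the template for a real probe set, and `toy` is a deliberately biased scorer so the probe has something to find:

```python
def quick_gender_check(score_fn, templates, terms=("he", "she")):
    """Spot-check: does swapping an identity term change the model's score?
    Returns the largest score gap observed across the probe templates."""
    gaps = []
    for t in templates:
        a, b = (score_fn(t.format(term)) for term in terms)
        gaps.append(abs(a - b))
    return max(gaps)

# Toy scorer with a built-in bias (illustrative only):
toy = lambda s: 0.9 if "he is" in s and "she" not in s else 0.5
print(quick_gender_check(toy, ["{} is a doctor"]))
```

Low coverage, but cheap enough to run throughout the product cycle, as the slide suggests.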
97. 3. Comprehensive Tests
Include sufficient data for each subgroup
• May include relevant combinations of attributes
• Sometimes synthetic data is appropriate
Particularly important if model will be used in larger
system
Cf. Unit tests
(verify correct outputs for wide range of correct inputs)
101. AUC Metrics for
Comprehensive Testing
• Subgroup AUC:
• Subgroup Positives vs
Subgroup Negatives
• "BPSN" AUC:
• Background Positives vs
Subgroup Negatives
• "BNSP" AUC:
• Background Negatives vs
Subgroup Positives
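These three AUC variants can be computed directly from model scores; the helper names below are my own, and the pairwise AUC definition (probability a random positive outranks a random negative) is used for simplicity:

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def bias_aucs(y, scores, sub):
    """Subgroup / BPSN / BNSP AUCs; `sub` is a boolean subgroup mask."""
    y, scores, sub = np.asarray(y), np.asarray(scores), np.asarray(sub, dtype=bool)
    bg = ~sub
    return {
        # Subgroup positives vs subgroup negatives
        "subgroup": auc(scores[sub & (y == 1)], scores[sub & (y == 0)]),
        # Background positives vs subgroup negatives
        "bpsn": auc(scores[bg & (y == 1)], scores[sub & (y == 0)]),
        # Subgroup positives vs background negatives
        "bnsp": auc(scores[sub & (y == 1)], scores[bg & (y == 0)]),
    }

y = np.array([1, 0, 1, 0])
scores = np.array([0.9, 0.2, 0.6, 0.8])
sub = np.array([False, False, True, True])
print(bias_aucs(y, scores, sub))
```

A low BPSN AUC, for instance, indicates the model scores subgroup negatives above background positives — the pattern behind false-positive disparities.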
102. Comprehensive Testing of a Toxicity Detector
https://github.com/conversationai/perspectiveapi/blob/master/model_cards/English/toxicity.md
103. 4. Ecologically Valid
Testing
Data is drawn from a distribution representative of the
deployment distribution
• Goal is NOT to be representative of the training
distribution
• (When appropriate) Condition on labels & certainty
Example usage scenarios :
• Continuous monitoring
• You have historical product usage data
• You can estimate user distribution reasonably well
105. Challenges with Ecologically Valid Testing
• Post-deployment distributions may not be known
• Product may not be launched yet!
• Sensitive attributes often not available in deployment
• User distributions may change
• We may want user distributions to change
• e.g., broaden user base
106. 5. Adversarial
Tests
Search for rare but extreme harms
• “Poison needle in haystack”
• Requires knowledge of society
Typical usage scenario:
• Close to launch
107. Hypothetical Example of Adversarial Testing
• Emoji autosuggest: are happy emoji suggested for sad sentences?
My dog has gone to heaven
Suggest:
Input:
😊
108. Summary of Practical
Fairness Testing
1. Targeted Tests: domain specific (image, language, etc)
2. Quick Tests: cheap tests throughout dev cycle
3. Comprehensive Tests: thorough
4. Ecologically Valid Tests: real-world data
5. Adversarial Testing: find poison needles
109. Fairness Testing Practices
are Good ML Practices
• Confidence in your product's fairness
requires fairness testing
• Fairness testing has a role throughout
the product iteration lifecycle
• Contextual concerns should be used to
prioritize fairness testing
111. Fairness-aware Data Collection
[Holstein et al., 2019]
• ML literature generally assumes data is fixed
• Often the solution is more and/or better training data
But: need to be thoughtful!
When might more Data not Help?
• If your data sampling techniques are biased
• Fundamental problems in data quality [Eckhouse et al., 2018]
• What does your data really represent? E.g. crimes vs arrests
• Recall: Product Introspection: How do Predictions relate to Decisions?
114. Fairness-Aware Data Collection Techniques
1. Address population biases
• Target under-represented (with respect to the user population) groups
2. Address representation issues
• Oversample from minority groups
• Sufficient data from each group may be required to avoid model treating them as
"outliers"
3. Data augmentation: synthesize data for minority groups
• E.g. from observed "he is a doctor" → synthesize "she is a doctor"
4. Fairness-aware active learning
• Collect more data for group with highest error rates
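Step 3 (data augmentation) is often implemented as a counterfactual word swap. The tiny swap map below is illustrative only and ignores real linguistic ambiguity (e.g., "her" can correspond to either "him" or "his"):

```python
import re

# Hypothetical, deliberately minimal swap list; real systems need curated,
# context-aware substitutions.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "him": "her", "man": "woman", "woman": "man"}

def counterfactual(sentence):
    """Swap gendered terms to synthesize a counterfactual training example."""
    def swap(m):
        w = m.group(0)
        out = SWAPS.get(w.lower(), w)
        return out.capitalize() if w[0].isupper() else out
    return re.sub(r"\b\w+\b", swap, sentence)

print(counterfactual("He is a doctor"))  # synthesizes "She is a doctor"
```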
117. Practical Concerns with Fair Machine Learning
•Is the training process stable?
•Can we guarantee that fairness policies
will be satisfied?
• Cf. Legal requirements in education,
employment, finance
118. Machine Learning Techniques: Adversarial Training?
P(Label=1) P(Group)
Negative
Gradient
Fairly well-studied with some nice
theoretical guarantees.
But can be difficult to train.
Features, Label, Group
119. Machine Learning: Correlation Loss
[Beutel et al., 2018]
Motivation: Overcome training instability with adversarial training
Key idea: include fairness objective in the loss function
120. Predicted P(Target) distribution
for “Blue” and “Red” examples
(Illustrative Example)
min Loss(Label, Pred)
Pred = P(Label=1)
Features, Label, Group
Machine Learning Techniques: Correlation Loss
121. min Loss(Label, Pred)
+ Abs(Corr(Pred, Group))|Label=0
Pred = P(Label=1)
Predicted P(Target) distribution
for “Blue” and “Red” examples
(Illustrative Example)
Features, Label, Group
Machine Learning Techniques: Correlation Loss
122. ● Computed per batch
● Easy to use
● More stable than adversarial
training.
min Loss(Label, Pred)
+ Abs(Corr(Pred, Group))|Label=0
Pred = P(Label=1)
Features, Label, Group
Machine Learning Techniques: Correlation Loss
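A minimal numpy sketch of the per-batch penalty above, assuming a binary group indicator; the names are mine, and a real implementation would compute this inside the training framework so gradients flow through it:

```python
import numpy as np

def correlation_penalty(pred, group, label):
    """Abs(Pearson corr(pred, group)) over the negative (label == 0)
    examples in a batch, as in the correlation-loss term."""
    mask = label == 0
    p, g = pred[mask], group[mask].astype(float)
    if p.std() == 0 or g.std() == 0:
        return 0.0  # degenerate batch: no signal to penalize
    return abs(np.corrcoef(p, g)[0, 1])

def batch_loss(task_loss, pred, label, group, weight=1.0):
    """Total loss = Loss(Label, Pred) + weight * |Corr(Pred, Group)| on negatives.
    `weight` is an illustrative tradeoff knob."""
    return task_loss + weight * correlation_penalty(pred, group, label)

# Predictions that track group membership get a large penalty:
pred = np.array([0.9, 0.1, 0.8, 0.2])
label = np.zeros(4)
group = np.array([1, 0, 1, 0])
print(correlation_penalty(pred, group, label))  # close to 1
```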
123. Machine Learning: Constrained Optimization
[Cotter et al., 2018]
Motivation: Can we ensure that fairness policies are satisfied?
• Fairness goals are explicitly stated as constraints on predictions, e.g.
• FPR on group 1 <= 0.8 * FPR on group 2
• Machine learner optimizes objective function subject to the
constraints
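The example constraint can be written as a violation term that a constrained optimizer (e.g., a Lagrangian method) drives toward zero. This sketch only evaluates the constraint; it does not implement the full optimizer from Cotter et al.:

```python
import numpy as np

def fpr(y_true, y_pred):
    """False positive rate: fraction of true negatives predicted positive."""
    neg = y_true == 0
    return np.sum((y_pred == 1) & neg) / max(int(np.sum(neg)), 1)

def constraint_violation(y_true, y_pred, groups):
    """Violation of the slide's example policy: FPR(group 1) <= 0.8 * FPR(group 2).
    A Lagrangian learner would add lambda * violation to its objective."""
    f1 = fpr(y_true[groups == 1], y_pred[groups == 1])
    f2 = fpr(y_true[groups == 2], y_pred[groups == 2])
    return max(0.0, f1 - 0.8 * f2)

# Toy check: group 1 has FPR 0.5, group 2 has FPR 0 -> constraint violated.
y_true = np.array([0, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0])
groups = np.array([1, 1, 2, 2])
print(constraint_violation(y_true, y_pred, groups))
```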
130. Overview of Transparency and Fairness Tools
Bias Detection: Microsoft InterpretML, Microsoft Azure Interpretability Toolkit, IBM Open Scale, Google What-if, H2O
Bias Mitigation: IBM Fairness 360, Microsoft Fairlearn
Responsible Metadata: Datasheets for datasets, Model Cards, Fact Sheets
131. Overview of Transparency and Fairness Tools
Bias Detection: Microsoft InterpretML, Microsoft Azure Interpretability Toolkit, IBM Open Scale, Google What-if, H2O
Bias Mitigation: IBM Fairness 360, Microsoft Fairlearn
Responsible Metadata: Datasheets for datasets, Model Cards, Fact Sheets
134. InterpretML
Goal: to provide researchers and AI developers with a toolkit that
allows for:
• Explaining machine learning models globally on all data, or locally
on a specific data point, using state-of-the-art technologies
• Easily adding new explainers and comparing them to state-of-the-art explainers
• A common API and data structure across the integrated libraries
github.com/Microsoft/interpret
pip install -U interpret
136. • Training Time Input: model + training data
• Any models that are trained on datasets in Python `numpy.array`, `pandas.DataFrame`,
`iml.datatypes.DenseData`, or `scipy.sparse.csr_matrix` format
• Accepts both models and pipelines as input.
• Model: model must implement the prediction function `predict` or `predict_proba` that conforms to the Scikit
convention.
• Pipeline: the explanation function assumes that the running pipeline script returns a prediction.
• Inferencing Time Input: test data
Azure Machine Learning Interpretability Toolkit
137.
138. • Interpretability at training time
• Combination of glass-box models and black-box explainers
• Auto reason code generation for local predictions
• Ability to cross reference to other techniques to ensure stability and
consistency in results
H2O
139.
140. IBM Open Scale
Goal: to provide AI operations team with a toolkit that allows for:
• Monitoring and re-evaluating machine learning models after
deployment
141. IBM Open Scale
Goal: to provide AI operations team with a toolkit that allows for:
• Monitoring and re-evaluating machine learning models after
deployment
• ACCURACY
• FAIRNESS
• PERFORMANCE
146. Overview of Transparency and Fairness Tools
Bias Detection: Microsoft InterpretML, Microsoft Azure Interpretability Toolkit, IBM Open Scale, Google What-if, H2O
Bias Mitigation: IBM Fairness 360, Microsoft Fairlearn
Responsible Metadata: Datasheets for datasets, Model Cards, Fact Sheets
147. What If Tool
Goal: Code-free probing of machine learning models
• Feature perturbations (what if scenarios)
• Counterfactual example analysis
• [Classification] Explore the effects of different classification
thresholds, taking into account constraints such as
different numerical fairness metrics.
156. IBM Fairness 360
Datasets
Toolbox
Fairness metrics (30+)
Bias mitigation algorithms (9+)
Guidance
Industry-specific tutorials
Pre-processing algorithm:
a bias mitigation algorithm that is applied to training data
In-processing algorithm:
a bias mitigation algorithm that is applied to
a model during its training
Post-processing algorithm:
a bias mitigation algorithm that is applied to predicted
labels
158. • The toolkit should only be used in a very limited setting:
allocation or risk assessment problems with well-defined
protected attributes in which one would like to have some sort of
statistical or mathematical notion of sameness
• The metrics and algorithms clearly do not capture the full scope
of fairness in all situations
• Only a starting point to a broader discussion among multiple
stakeholders on overall decision-making workflows
Appropriateness of AIF360
@IBM Research’19
161. Microsoft Research Fairlearn
Wrapper around any classification/regression algorithm
• Easily integrated into existing ML systems
• Doesn’t require test-time access to protected attribute
Versatile:
• Many measures of fairness
• Multiple protected attributes with many values
• Define fairness metric w.r.t. protected attribute(s)
• ML goal becomes minimizing classification/regression
error while minimizing unfairness according to the
metric
Challenges:
1. Defining an appropriate fairness metric
2. Learning an accurate model subject to the metric
"Fair" Classification/Regression
163. Microsoft Research Fairlearn
Goal: find a classifier/regressor [in some family]
that minimizes classification/regression error
subject to fairness constraints (user-defined fairness
metric)
Given: a standard ML algorithm as a black box
Approach: iteratively call black box and reweight (and
possibly relabel) the data
165. Overview of Transparency and Fairness Tools
Bias Detection: Microsoft InterpretML, Microsoft Azure Interpretability Toolkit, IBM Open Scale, Google What-if, H2O
Bias Mitigation: IBM Fairness 360, Microsoft Fairlearn
Responsible Metadata: Datasheets for datasets, Model Cards, Fact Sheets
166. Datasheets for Datasets [Gebru et al., 2018]
• Better data-related documentation
• Datasheets for datasets: every dataset, model, or pre-trained API should be
accompanied by a data sheet that documents its
• Creation
• Intended uses
• Limitations
• Maintenance
• Legal and ethical considerations
• Etc.
168. Fact Sheets [Arnold et al., 2019]
• Is distinguished from “model cards” and “datasheets” in that the
focus is on the final AI service:
• What is the intended use of the service output?
• What algorithms or techniques does this service implement?
• Which datasets was the service tested on? (Provide links to
datasets that were used for testing, along with corresponding
datasheets.)
• Describe the testing methodology.
• Describe the test results.
• Etc.
169. Overview of Transparency and Fairness Tools
Bias Detection: Microsoft InterpretML, Microsoft Azure Interpretability Toolkit, IBM Open Scale, Google What-if, H2O
Bias Mitigation: IBM Fairness 360, Microsoft Fairlearn
Responsible Metadata: Datasheets for datasets, Model Cards, Fact Sheets
172. Create economic opportunity for every
member of the global workforce
LinkedIn’s Vision
Connect the world's professionals to make
them more productive and successful
LinkedIn’s Mission
174. AI @LinkedIn
Scale
• 200 ML A/B experiments per week
• 2.15 PB data processed offline per day
• 2 PB data processed nearline per day
• 53 B graph edges with 1 B nodes
• 25 B parameters in ML models
181. Representative Ranking for Talent Search
S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search, KDD’19.
[Microsoft’s AI/ML conference (MLADS’18). Distinguished Contribution Award]
Building Representative Talent Search at LinkedIn (LinkedIn engineering blog)
182. Intuition for Measuring Representativeness
• Ideal: Top ranked results should follow a desired distribution on
gender/age/…
• E.g., same distribution as the underlying talent pool
• Inspired by “Equal Opportunity” definition [Hardt et al, NIPS’16]
183. Desired Proportions within the Attribute of Interest
• Compute the proportions of the values of the attribute (e.g., gender,
gender-age combination) amongst the set of qualified candidates
• “Qualified candidates” = Set of candidates that match the search query
criteria
• Retrieved by LinkedIn’s Galene search engine
• Desired proportions could also be obtained based on legal mandate /
voluntary commitment
184. Measuring (Lack of) Representativeness
• Skew@k
• (Logarithmic) ratio of the proportion of candidates having a given attribute value
among the top k ranked results to the corresponding desired proportion
• Variants:
• MinSkew: Minimum over all attribute values
• MaxSkew: Maximum over all attribute values
• Normalized Discounted Cumulative Skew
• Normalized Discounted Cumulative KL-divergence
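A minimal sketch of Skew@k as defined above (log ratio of the observed proportion in the top k to the desired proportion); the epsilon smoothing to avoid log(0) is my own simplification:

```python
import math
from collections import Counter

def skew_at_k(ranked_attrs, attr_value, desired_prop, k):
    """Skew@k: log of (proportion of attr_value in top k) / (desired proportion).
    Negative => under-represented; positive => over-represented."""
    observed = Counter(ranked_attrs[:k])[attr_value] / k
    eps = 1e-9  # smoothing to avoid log(0); illustrative choice
    return math.log((observed + eps) / (desired_prop + eps))

ranked = ["m", "m", "m", "f", "m", "f"]
# Desired proportion of "f" among qualified candidates: 0.5
print(skew_at_k(ranked, "f", 0.5, k=4))  # negative: "f" under-represented
```

MinSkew@k and MaxSkew@k then take the minimum / maximum of this quantity over all attribute values.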
185. Fairness-aware Reranking Algorithm (Simplified)
• Partition the set of potential candidates into different buckets for
each attribute value
• Rank the candidates in each bucket according to the scores assigned
by the machine-learned model
• Merge the ranked lists, balancing the representation requirements
and the selection of highest scored candidates
• Algorithmic variants based on how we choose the next attribute
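The simplified algorithm above can be sketched as a greedy merge of per-attribute buckets; this is an illustrative reconstruction, not the production algorithm from the KDD’19 paper:

```python
def fair_rerank(candidates, desired, k):
    """Greedy merge of per-attribute ranked buckets.
    candidates: list of (score, attr); desired: {attr: target proportion}.
    If some attribute is below its target share so far, take its best
    remaining candidate; otherwise take the best-scored candidate overall."""
    buckets = {a: sorted((c for c in candidates if c[1] == a), reverse=True)
               for a in desired}
    out, counts = [], {a: 0 for a in desired}
    for i in range(1, k + 1):
        under = [a for a in desired if buckets[a] and counts[a] < desired[a] * i]
        pool = under or [a for a in desired if buckets[a]]
        best = max(pool, key=lambda a: buckets[a][0][0])
        out.append(buckets[best].pop(0))
        counts[best] += 1
    return out

candidates = [(0.9, "m"), (0.8, "m"), (0.7, "m"), (0.6, "f"), (0.5, "f")]
reranked = fair_rerank(candidates, {"m": 0.5, "f": 0.5}, k=4)
print([a for _, a in reranked])  # representation balanced against score order
```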
187. Validating Our Approach
• Gender Representativeness
• Over 95% of all searches are representative compared to the qualified
population of the search
• Business Metrics
• A/B test over LinkedIn Recruiter users for two weeks
• No significant change in business metrics (e.g., # InMails sent or accepted)
• Ramped to 100% of LinkedIn Recruiter users worldwide
188. Lessons
learned
• Post-processing approach desirable
• Model agnostic
• Scalable across different model
choices for our application
• Acts as a “fail-safe”
• Robust to application-specific
business logic
• Easier to incorporate as part of existing
systems
• Build a stand-alone service or
component for post-processing
• No significant modifications to the
existing components
• Complementary to efforts to reduce bias
from training data & during model training
189. Acknowledgements
•Team:
• AI/ML: Sahin Cem Geyik, Stuart Ambler, Krishnaram Kenthapadi
• Application Engineering: Gurwinder Gulati, Chenhui Zhai
• Analytics: Patrick Driscoll, Divyakumar Menghani
• Product: Rachel Kumar
•Acknowledgements
• Deepak Agarwal, Erik Buchanan, Patrick Cheung, Gil Cottle, Nadia
Fawaz, Rob Hallman, Joshua Hartman, Sara Harrington, Heloise Logan,
Stephen Lynch, Lei Ni, Igor Perisic, Ram Swaminathan, Ketan Thakkar,
Janardhanan Vembunarayanan, Hinkmond Wong, Lin Yang, Liang
Zhang, Yani Zhang
190. Reflections
• Lessons from fairness challenges ⇒
Need “Fairness by Design” approach
when building AI products
• Case studies on fairness-aware ML in
practice
• Collaboration/consensus across
key stakeholders (product, legal,
PR, engineering, AI, …)
196. Design Data Model Application
Questions to ask during
AI Design
Who is affected by AI?
How might AI cause harms?
197. Stakeholders: Who might be affected?
1. Humans speaking with the agent
• Emotional harms, misinformation, threaten task completion
2. The agent “owner”
• Harm practices and reputation of the owner
3. Third-party individuals and groups
• People mentioned in conversations!
4. Audiences listening to the conversation
• This may include general public!
198. How might AI cause harms?
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about task status
• Misunderstanding user and doing the
wrong task
• Revealing private information
inappropriately
Yes, I can do that!
Can you order
sushi for me?
Great, one
California roll
please
I don’t
understand.
?!?!
199. How might AI cause harms? Functional Harms
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about task status
• Misunderstanding user and doing the
wrong task
• Revealing private information
inappropriately
In just 1 minute
When will my
order arrive?
Where’s my
order?
Arriving in 1
minute
?!?!
200. How might AI cause harms? Functional Harms
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about task status
• Misunderstanding user and doing
the wrong task
• Revealing private information
inappropriately
Ordering egg yolks
Tell me a joke
?!?!
201. How might AI cause harms? Functional Harms
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about task status
• Misunderstanding user and doing the
wrong task
• Revealing private information
inappropriately
Bob Smith’s CC
number is …
What’s Bob
Smith’s number?
?!?!
202. How might AI cause harms?
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about task status
• Misunderstanding user and doing the
wrong task
• Revealing private information
inappropriately
These harms are even more problematic when
they systematically occur for some groups of
people but not others
203. How might AI cause harms?
Social Harms: Harms to Individuals
• Inciting/encouraging harmful behavior
• Self/harm, suicide
• Violence or harassment against others
• Discouraging good behavior, e.g., visiting doctors
• Providing wrong information
• Medical, financial, legal advice
• Verbal harassment
• Bullying, sexual harassment
204. How might AI cause harms?
Social Harms: Harms to Communities
• Promoting violence, war, ethnic cleansing, …
• Including promoting related organizations and philosophies
• Engaging in hate speech, disparagement, mocking, …
• Including inadvertent or inappropriate imitation (dialect, accent, …)
• Disruption to social processes
• Election disruption, fake news, false disaster response, …
205. Why is this hard?
Language is ambiguous and complex, with social context.
Examples of complex failures:
• Failure to deflect/terminate contentious topics
• Polite agreement with unrecognized bias
• Refusing to discuss when disapproval would be better
Example (failure to deflect a contentious topic):
User: Let’s talk about <something evil>
Bot: Yes, I can do that!
206. Why is this hard?
Example (polite agreement with unrecognized bias):
User: Men are better at <whatever> than women
Bot: Sounds ok.
207. Why is this hard?
Example (refusing to discuss when disapproval would be better):
User: I was bullied at school because I’m Muslim
Bot: I don’t like talking about religion
209. Implications for data collection
Common data sources
• Hand-written rules
• Existing conversational data (e.g., social media)
• New online conversations (e.g., from new customer interactions)
Cleaning training data
• For anonymization
• E.g., remove individual names, but keep famous names (fictional characters, celebrities, politicians, …)
• Ensure data adheres to social norms
• Not enough to filter individual words: filter “I hate [X]”, and you’ll miss “I’m not a fan of [X]”.
• Remember meanings change with context
• Differentiate between bot input and bot output in training data
• Remove offensive text from bot output training
• But don’t remove it from bot inputs → allow learning of good responses to bad inputs
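The asymmetric filtering in the last two bullets can be sketched in Python. The phrase list, the matcher, and the (input, output) pair format below are illustrative assumptions, not a production filter:

```python
import re

# Illustrative phrase patterns -- a real system needs context-aware
# classification, since word lists miss paraphrases and sarcasm.
OFFENSIVE_PATTERNS = [
    r"\bi hate \w+",
    r"\bnot a fan of \w+",  # the paraphrase a naive word filter misses
]

def is_offensive(text: str) -> bool:
    """Toy matcher over the hypothetical phrase list."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in OFFENSIVE_PATTERNS)

def clean_pairs(pairs):
    """Keep (user_input, bot_output) pairs whose *output* is clean.

    Offensive user inputs are kept so the bot can learn good responses
    to bad inputs; offensive bot outputs are filtered out."""
    return [(u, b) for (u, b) in pairs if not is_offensive(b)]

pairs = [
    ("I hate mondays", "Let's talk about something more cheerful."),
    ("How are you?", "I'm not a fan of small talk."),  # dropped: bad output
]
print(clean_pairs(pairs))  # keeps only the first pair
```

Note the asymmetry: `is_offensive` is applied only to the output side of each pair, so the first pair survives even though its input is negative.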
211. Design → Data → Model → Application
Responsible bots: 10 guidelines for developers of conversational AI
1. Articulate the purpose of your bot
2. Be transparent that you use bots
3. Elevate to a human when needed
4. Design bot to respect cultural norms
5. Ensure bot is reliable (metrics, feedback)
6. Ensure your bot treats people fairly
7. Ensure your bot respects privacy
8. Ensure your bot handles data securely
9. Ensure your bot is accessible
10. Accept responsibility
https://www.microsoft.com/en-us/research/publication/responsible-bots/
214. Key take-away points
• Many stakeholders are affected by conversational AI agents
• Not only people directly interacting with the AI, but also those indirectly affected
• Many potential functional and social harms to individuals and communities
• Functional harms are exacerbated when systematically biased against groups
• Challenges include the complexity and ambiguity of natural language
• Avoiding these harms requires careful consideration across the entire AI lifecycle
217. Google Assistant
Key Points:
• Think about user harms: how does your product make people feel?
• Adversarial (“stress”) testing for all Google Assistant launches
• People might say racist, sexist, homophobic stuff
• Use diverse testers
• Think about expanding who your users could and should be
• Consider the diversity of your users
221. This is a “Shirley Card”
SKIN TONE IN PHOTOGRAPHY
Named after Shirley Page, a Kodak studio model, Shirley Cards were the primary method for calibrating color when processing film.
SOURCES
Color film was built for white people. Here's what it did to dark skin. (Vox)
How Kodak's Shirley Cards Set Photography's Skin-Tone Standard (NPR)
222. Until about 1990, virtually all Shirley Cards featured Caucasian women.
SOURCES
Color film was built for white people. Here's what it did to dark skin. (Vox)
Colour Balance, Image Technologies, and Cognitive Equity (Roth)
How Photography Was Optimized for White Skin Color (Priceonomics)
223. As a result, photos featuring people with light skin looked fairly accurate.
[Photo: Kodachrome film, 1970. Credit: Darren Davis, Flickr]
224. Photos featuring people with darker skin, not so much...
[Photo: Kodachrome film, 1958. Credit: Peter Roome, Flickr]
226. Google Clips
“We created controlled datasets by sampling subjects from different genders and skin tones in a balanced manner, while keeping variables like content type, duration, and environmental conditions constant. We then used this dataset to test that our algorithms had similar performance when applied to different groups.”
https://ai.googleblog.com/2018/05/automatic-photography-with-google-clips.html
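The per-group testing described in the Google Clips quote can be sketched as follows. The group labels, example data, and the gap computation are hypothetical stand-ins, not the actual test harness:

```python
from collections import defaultdict

def accuracy_by_group(examples):
    """Per-group accuracy. examples: iterable of (group, prediction, label)."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, pred, label in examples:
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}

def max_gap(per_group_scores):
    """Largest accuracy difference between any two groups."""
    values = list(per_group_scores.values())
    return max(values) - min(values)

# Toy balanced test set: equal example counts per (hypothetical) group,
# mirroring the balanced sampling described in the quote.
examples = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0), ("group_a", 1, 1),
    ("group_b", 1, 1), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 0),
]
scores = accuracy_by_group(examples)
print(scores, "gap:", max_gap(scores))  # group_a: 0.75, group_b: 1.0, gap: 0.25
```

In practice, a launch test would compare the gap against a chosen tolerance and fail the check if the metric differs too much between groups.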
231. 1. Detect Gender-Neutral Queries
Train a text classifier to detect when a Turkish query is gender-neutral.
• Trained on thousands of human-rated Turkish examples
232. 2. Generate Gender-Specific Translations
• Training: Modify training data to add an additional input token specifying the required gender:
• (<2MALE> O bir doktor, He is a doctor)
• (<2FEMALE> O bir doktor, She is a doctor)
• Deployment: If step (1) predicts the query is gender-neutral, add male and female tokens to the query:
• O bir doktor → {<2MALE> O bir doktor, <2FEMALE> O bir doktor}
233. 3. Check for Accuracy
Verify that:
1. The requested feminine translation is feminine.
2. The requested masculine translation is masculine.
3. The feminine and masculine translations are exactly equivalent except for gender-related changes.
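The three steps above can be sketched end-to-end. The classifier, the canned "translator", and the gender check below are toy stubs standing in for the trained components described in the slides:

```python
def is_gender_neutral(turkish_query: str) -> bool:
    """Stand-in for the step-1 classifier. Here: queries starting with the
    gender-neutral Turkish pronoun 'o' are treated as neutral."""
    return turkish_query.lower().startswith("o ")

def translate(tagged_query: str) -> str:
    """Stand-in for the step-2 NMT model, keyed on the gender token."""
    canned = {
        "<2MALE> O bir doktor": "He is a doctor",
        "<2FEMALE> O bir doktor": "She is a doctor",
    }
    return canned[tagged_query]

def translate_with_genders(query: str) -> dict:
    """If step 1 says the query is gender-neutral, request both
    gender-specific translations (step 2, deployment)."""
    if not is_gender_neutral(query):
        raise NotImplementedError("single-translation path not sketched")
    return {
        "feminine": translate(f"<2FEMALE> {query}"),
        "masculine": translate(f"<2MALE> {query}"),
    }

def differs_only_in_gender(feminine: str, masculine: str) -> bool:
    """Toy version of the step-3 check: translations must match
    token-for-token except at known gendered word pairs."""
    gendered_pairs = {("She", "He"), ("she", "he"), ("her", "his")}
    f_tokens, m_tokens = feminine.split(), masculine.split()
    if len(f_tokens) != len(m_tokens):
        return False
    return all(a == b or (a, b) in gendered_pairs
               for a, b in zip(f_tokens, m_tokens))

out = translate_with_genders("O bir doktor")
print(out)  # {'feminine': 'She is a doctor', 'masculine': 'He is a doctor'}
```

Only translation pairs that pass the step-3 equivalence check would be shown to users; the gendered-word-pair list here is a deliberate simplification of that check.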
241. Good ML Practices Go a Long Way
1. Lots of low-hanging fruit in terms of improving fairness simply by using machine learning best practices:
• Representative data
• Introspection tools
• Visualization tools
• Testing
2. Fairness improvements often lead to overall improvements
• It’s a common misconception that there is always a tradeoff
242. Breadth and Depth Required
1. Looking end-to-end is critical
• Need to be aware of bias and potential problems at every stage of product and ML pipelines (from design, data gathering, … to deployment and monitoring)
2. Details matter
• Slight changes in features or labeler criteria can change the outcome
• Must have experts who understand the effects of decisions
• Many details are not technical, such as how labelers are hired
243. Process Best Practices (spanning policy and technology)
• Identify product goals
• Get the right people in the room
• Identify stakeholders
• Select a fairness approach
• Analyze and evaluate your system
• Mitigate issues
• Monitor continuously; have escalation plans
• Auditing and transparency
245. The Real World is What Matters
• Decisions should be made considering real-world goals and outcomes
• You must involve people who understand these real-world effects
• Social scientists, lawyers, domain experts…
• Hire experts (even ones that don’t code)
• You need different types of testing depending on the application
• We need more research focused on people, applications, and real-world effects
• A lot of the current research is not that useful in practice
• We need more social science + machine learning research
247. Key Open Problems in Applied Fairness
• What if you don’t have the sensitive attributes?
• When should you use which approach? For example, equal treatment vs. equal outcome?
• How to identify harms?
• Processes for framing AI problems: will the chosen metrics lead to desired results?
• How to tell if the data generation and collection method is appropriate for a task? (e.g., causal structure analysis)
• Processes for mitigating harms and misbehaviors quickly
248. Related Tutorials / Resources
• Sara Hajian, Francesco Bonchi, and Carlos Castillo, Algorithmic bias: From discrimination discovery to fairness-aware data mining, KDD Tutorial, 2016.
• Solon Barocas and Moritz Hardt, Fairness in machine learning, NeurIPS Tutorial, 2017.
• Kate Crawford, The Trouble with Bias, NeurIPS Keynote, 2017.
• Arvind Narayanan, 21 fairness definitions and their politics, FAT* Tutorial, 2018.
• Sam Corbett-Davies and Sharad Goel, Defining and Designing Fair Algorithms, Tutorials at EC 2018 and ICML 2018.
• Ben Hutchinson and Margaret Mitchell, Translation Tutorial: A History of Quantitative Fairness in Testing, FAT* Tutorial, 2019.
• Henriette Cramer, Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miroslav Dudík, Hanna Wallach, Sravana Reddy, and Jean Garcia-Gathright, Translation Tutorial: Challenges of incorporating algorithmic fairness into industry practice, FAT* Tutorial, 2019.
• ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*)
249. Fairness · Privacy · Transparency · Explainability
Related KDD’19 sessions:
1. Tutorial: Explainable AI in Industry (Sun, 1-5pm)
2. Workshop: Explainable AI/ML (XAI) for Accountability, Fairness, and Transparency (Mon)
3. Social Impact Workshop (Wed, 8:15-11:45)
4. Keynote: Cynthia Rudin, Do Simpler Models Exist and How Can We Find Them? (Thu, 8-9am)
5. Research Track Session RT17: Interpretability (Thu, 10-12)
6. Several papers on fairness (e.g., ADS7 (Thu, 10-12), ADS9 (Thu, 1:30-3:30))