ML practitioners and advocates are increasingly finding themselves becoming gatekeepers of the modern world. The models you create have the power to get people arrested or vindicated, to get loans approved or rejected and to set the interest rate charged on those loans, to determine who appears in your long list of matches on Tinder, what news you read, and who gets called for a job phone screen or even a college admission... the list goes on. My goal in this talk is to summarize the kinds of disparate outcomes caused by cargo-cult machine learning, and recent academic efforts to address some of them.
Fairness and Transparency in Machine Learning - Andreas Dewes
My presentation on fairness and transparency in machine learning, given at PyData Berlin. I investigated the "Stop and Frisk" dataset and tried to show how algorithms can pick up (or remove) biases from our data.
The impact of AI on society keeps growing - and not all of it is good. As data scientists, we have to put in real work to avoid ending up in ML hell.
This presentation was given at the Dutch Data Science Week.
Nick Schmidt of BLDS, LLC, presenting to the Maryland AI meetup, June 4, 2019 (https://www.meetup.com/Maryland-AI). Nick discusses ideas of fairness and how they apply to machine learning. He explores recent academic work on identifying and mitigating bias, and how his work in lending and employment can be applied to other industries. Nick explains how to measure whether an algorithm is fair and demonstrates the techniques that model builders can use to ameliorate bias when it is found.
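As a concrete illustration of one fairness measure commonly applied in lending and employment analysis, here is a minimal sketch of the adverse impact ratio (the basis of the "four-fifths rule"); the function name and data are hypothetical, not taken from Nick's talk.

```python
# Minimal sketch: adverse impact ratio, the basis of the "four-fifths rule".
# Hypothetical data and function name, for illustration only.
import numpy as np

def adverse_impact_ratio(outcomes, groups, protected, reference):
    """Approval rate of the protected group divided by that of the
    reference group; values below 0.8 are commonly flagged for review."""
    outcomes, groups = np.asarray(outcomes), np.asarray(groups)
    protected_rate = outcomes[groups == protected].mean()
    reference_rate = outcomes[groups == reference].mean()
    return protected_rate / reference_rate

# Hypothetical loan decisions: 1 = approved, 0 = denied.
decisions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
group = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(adverse_impact_ratio(decisions, group, protected="B", reference="A"))
```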
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD... - Krishnaram Kenthapadi
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we focus on the application of fairness-aware machine learning techniques in practice by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we present open problems and research directions for the data mining / machine learning community.
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WSDM 2019) - Krishnaram Kenthapadi
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we will present open problems and research directions for the data mining / machine learning community.
Please cite as:
Sarah Bird, Ben Hutchinson, Krishnaram Kenthapadi, Emre Kiciman, and Margaret Mitchell. Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned. WSDM 2019.
[Video recording available at https://www.youtube.com/playlist?list=PLewjn-vrZ7d3x0M4Uu_57oaJPRXkiS221]
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with the proliferation of AI-based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust in and adoption of AI systems in high-stakes domains requiring reliability and safety, such as healthcare and automated transportation, and in critical industrial applications with significant economic implications, such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We present case studies across different companies, spanning application domains such as search & recommendation systems, hiring, sales, and lending. Finally, based on our experiences in industry, we identify open problems and research directions for the data mining / machine learning community.
Fairness, Transparency, and Privacy in AI @LinkedIn - C4Media
Video and slides synchronized; mp3 and slide download available at https://bit.ly/2V9zW73.
Krishnaram Kenthapadi talks about privacy breaches, algorithmic bias/discrimination issues observed in the Internet industry, regulations & laws, and techniques for achieving privacy and fairness in data-driven systems. He focuses on the application of privacy-preserving data mining and fairness-aware ML techniques in practice, presenting case studies spanning different LinkedIn applications. Filmed at qconsf.com.
Krishnaram Kenthapadi is part of the AI team at LinkedIn, where he leads the transparency and privacy modeling efforts across different applications. He is LinkedIn's representative in Microsoft's AI and Ethics in Engineering & Research Committee. He shaped the technical roadmap for LinkedIn Salary product, and served as the relevance lead for the LinkedIn Careers & Talent Solutions Relevance team.
How do we protect privacy of users when building large-scale AI based systems? How do we develop machine learned models and systems taking fairness, accountability, and transparency into account? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical, legal, and technical challenges encountered by researchers and practitioners alike. In this talk, we will first motivate the need for adopting a "fairness and privacy by design" approach when developing AI/ML models and systems for different consumer and enterprise applications. We will then focus on the application of fairness-aware machine learning and privacy-preserving data mining techniques in practice, by presenting case studies spanning different LinkedIn applications (such as fairness-aware talent search ranking, privacy-preserving analytics, and LinkedIn Salary privacy & security design), and conclude with the key takeaways and open challenges.
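As a hedged sketch of the kind of technique behind the privacy-preserving analytics mentioned here, the snippet below shows the classic Laplace mechanism from differential privacy; the function name and numbers are illustrative assumptions, not LinkedIn's actual implementation.

```python
# Minimal sketch of the Laplace mechanism for differentially private counts.
# Illustrative only; not LinkedIn's production system.
import numpy as np

def dp_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with epsilon-differential privacy: a single user
    changes the count by at most `sensitivity`, so we add noise drawn from
    Laplace(0, sensitivity/epsilon) before publishing."""
    if rng is None:
        rng = np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: report an article-view count with a privacy budget of epsilon = 0.5.
print(dp_count(true_count=1234, epsilon=0.5))
```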
Responsible AI in Industry: Practical Challenges and Lessons Learned - Krishnaram Kenthapadi
How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? How do we protect the privacy of users when building large-scale AI based systems? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains such as hiring, lending, and healthcare. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of responsible AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
Algorithmic Impact Assessment: Fairness, Robustness and Explainability in Aut... - Adriano Soares Koshiyama
The workshop session focuses on the following topics:
Introduction to AI & Machine Learning (Algorithms)
Key Components of Algorithmic Impact Assessment
Algorithmic Explainability
Algorithmic Fairness
Algorithmic Robustness
This tutorial extensively covers the definitions, nuances, challenges, and requirements for the design of interpretable and explainable machine learning models and systems in healthcare. We discuss many use cases in which interpretable machine learning models are needed in healthcare and how they should be deployed. Additionally, we explore the landscape of recent advances to address the challenges of model interpretability in healthcare, and describe how one would go about choosing the right interpretable machine learning algorithm for a given problem in healthcare.
NYAI #24: Developing Trust in Artificial Intelligence and Machine Learning fo... - Maryam Farooq
NYAI #24: Developing Trust in Artificial Intelligence and Machine Learning for High-Stakes Applications with Dr. Kush Varshney (Principal Research Manager, IBM Research AI).
Check out the IBM AI Fairness 360 open source toolkit: https://www.ibm.com/blogs/research/2018/09/ai-fairness-360/
nyai.co
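For readers who want to try the AI Fairness 360 toolkit mentioned above, here is a minimal sketch computing two standard group-fairness metrics; it assumes `pip install aif360` and follows the pattern of the toolkit's own Adult-dataset tutorials (the exact dataset setup may vary by version).

```python
# Minimal AIF360 sketch: measure group fairness on the Adult income dataset.
# AdultDataset() expects the raw UCI Adult files to be present locally;
# aif360 prints download instructions if they are missing.
from aif360.datasets import AdultDataset
from aif360.metrics import BinaryLabelDatasetMetric

data = AdultDataset()
metric = BinaryLabelDatasetMetric(
    data,
    unprivileged_groups=[{"sex": 0}],  # in this dataset, 1 encodes the privileged group
    privileged_groups=[{"sex": 1}],
)
# Disparate impact: ratio of favorable-outcome rates between the groups.
print("Disparate impact:", metric.disparate_impact())
# Statistical parity difference: difference of those same rates.
print("Statistical parity difference:", metric.statistical_parity_difference())
```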
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple... - Sri Ambati
This session was recorded in NYC on October 22nd, 2019 and can be viewed here: https://youtu.be/ngOBhhINWb8
Explainable Machine Learning with Shapley Values
Shapley values are a popular approach for explaining predictions made by complex machine learning models. In this talk I will discuss what problems Shapley values solve, give an intuitive presentation of what they mean, and show examples of how they can be used through the 'shap' Python package.
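As a hedged sketch of the 'shap' workflow the talk describes (assuming `pip install shap xgboost scikit-learn`; exact API details may vary by version):

```python
# Minimal shap sketch: explain a gradient-boosted regressor's predictions.
import shap
import xgboost
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)   # fast, exact attributions for tree ensembles
shap_values = explainer.shap_values(X)  # one additive contribution per feature per row
shap.summary_plot(shap_values, X)       # global view: which features drive predictions
```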
Bio: I am a senior researcher at Microsoft Research. Before joining Microsoft, I did my Ph.D. studies at the Paul G. Allen School of Computer Science & Engineering of the University of Washington working with Su-In Lee. My work focuses on explainable artificial intelligence and its application to problems in medicine and healthcare. This has led to the development of broadly applicable methods and tools for interpreting complex machine learning models that are now used in banking, logistics, sports, manufacturing, cloud services, economics, and many other areas.
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021) - Krishnaram Kenthapadi
[Video available at https://sites.google.com/view/ResponsibleAITutorial]
Artificial Intelligence is increasingly being used in decisions and processes that are critical for individuals, businesses, and society, especially in areas such as hiring, lending, criminal justice, healthcare, and education. Recent ethical challenges and undesirable outcomes associated with AI systems have highlighted the need for regulations, best practices, and practical tools to help data scientists and ML developers build AI systems that are secure, privacy-preserving, transparent, explainable, fair, and accountable – to avoid unintended and potentially harmful consequences and compliance challenges.
In this tutorial, we will present an overview of responsible AI, highlighting model explainability, fairness, and privacy in AI, key regulations/laws, and techniques/tools for providing understanding around AI/ML systems. Then, we will focus on the application of explainability, fairness assessment/unfairness mitigation, and privacy techniques in industry, wherein we present practical challenges/guidelines for using such techniques effectively and lessons learned from deploying models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning many industries and application domains. Finally, based on our experiences in industry, we will identify open problems and research directions for the AI community.
Talk presented at the Analytics Frontiers Conference in Charlotte on March 21. The presentation evaluates opportunities and risks of AI and how consumers, businesses, society and governments can mitigate some of the risks.
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned - Krishnaram Kenthapadi
How do we protect the privacy of users when building large-scale AI based systems? How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of privacy-preserving AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
Algorithmic Bias: Challenges and Opportunities for AI in Healthcare - Gregory Nelson
Gregory S. Nelson, VP, Analytics and Strategy – Vidant Health | Adjunct Faculty Duke University
The promise of AI is quickly becoming a reality for a number of industries, including healthcare. For example, we have seen early successes in augmenting clinical intelligence for diagnostic imaging and in the early detection of pneumonia and sepsis. But what happens when the algorithms are biased? In this presentation, we will outline a framework for AI governance and discuss ways in which we can address algorithmic bias in machine learning.
Objective 1: Illustrate the issues of bias in AI through examples specific to healthcare.
Objective 2: Summarize the growing body of work in the legal, regulatory, and ethical oversight of AI models and the implications for healthcare.
Objective 3: Outline steps that we can take to establish an AI governance strategy for our organizations.
Explainable AI with H2O Driverless AI's MLI module - Martin Dvorak
My H2O.ai Prague meetup presentation on explainable AI, model debugging, and strategies for identifying security vulnerabilities and undesired ML biases, plus an explanation of how the Driverless AI Machine Learning Interpretability module helps mitigate these problems.
Provides a basic overview of automated decision systems and recent attempts to guarantee that their allocations are fair. It then examines four key papers in the field suggesting that this cannot be guaranteed, and offers a brief sketch of a different direction.
AI should be Fair, Accountable and Transparent (FAT* AI), so it is crucial to raise awareness of these topics not only among machine learning practitioners but across the entire population, as ML systems can make life-changing decisions and influence our lives now more than ever.
Frankie Rybicki slide set for Deep Learning in Radiology / Medicine - Frank Rybicki
These are my #AI slides for medical deep learning, using #radiology and medical imaging examples. Please use and modify them to teach your own group about medical AI.
Coming to an Understanding: a Cross-institutional Examination of Assessments ... - Stephanie Wright
Data curation has emerged as a strategic growth area for academic libraries. Many libraries have conducted needs assessments as a precursor towards developing services; however there have been few comparisons of the findings across institutions. This panel brings together four librarians from different institutions to discuss both common and distinct findings from their respective needs assessments. The panelists will speculate on the application of these findings at their specific libraries and in academic libraries generally.
Predictive Analytics - How to get stuff out of your Crystal Ball - DATAVERSITY
Everyone wants to leverage data. The optimal implementation of analytics is an organization-wide set of capabilities, called advantageous organizational analytic capabilities because a clear ROI is demonstrable from these efforts. It turns out that there are a number of prerequisites to advantageous organizational analytics. These include:
Adopting a crawl, walk, run strategy
Understanding current and potential organizational maturity and corresponding capabilities
Achieving an appropriate technology/human capability balance
Implementing useful IT systems development practices
Installing necessary non-IT leadership
This webinar will explore these and other topics using examples drawn from DOD, healthcare researchers, and donation center operations.
Video at https://www.youtube.com/watch?v=kHZqgmrIg8k
Event page : https://www.meetup.com/Tech-Valley-Machine-Learning-and-AI/events/246094251/
Speaker: Dan Elton
Tech Valley Machine Learning Meetup, Troy New York, 12-28-17
Building off David Martuscello's intro talk at the last meetup, Dan Elton presented some pitfalls one can encounter with machine learning. It is important not to get caught up in the hype surrounding machine learning and to maintain a sense of what ML really is and what its limitations are. The pitfalls discussed included:
-- overfitting & underfitting
-- not cleaning & normalizing your data
-- trying to use machine learning for extrapolation
-- biased data sampling
-- not doing hyperparameter optimization carefully
-- not comparing your results to simple baselines
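As a minimal sketch of the baseline-comparison pitfall just listed, here is a scikit-learn DummyClassifier check on a stock dataset (an illustration, not code from the talk):

```python
# Minimal sketch: always compare a model against a trivial baseline.
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

baseline = DummyClassifier(strategy="most_frequent")  # always predicts the majority class
model = RandomForestClassifier(random_state=0)

print("baseline accuracy:", cross_val_score(baseline, X, y, cv=5).mean())
print("model accuracy:   ", cross_val_score(model, X, y, cv=5).mean())
```

If the model barely beats the baseline, an impressive-looking accuracy number mostly reflects class imbalance, not learning.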
ODSC East 2017: Data Science Models For Good - Karry Lu
Abstract: The rise of data science has been largely fueled by the promise of changing the business landscape: enhancing one's competitive advantage, increasing business optimization and efficiency, and ultimately delivering a better bottom line. This promise reaches across sectors as machine learning methods get better, data access continues to grow, and computation power becomes easily accessible. However, because the practice of doing data science can be expensive, there is a danger that this so-called promise of data science may only be available to the most well-resourced organizations with sophisticated data capabilities and staff. For the past five years, DataKind has been working to ensure social change organizations also have access to data science, teaming them up with data scientists to build machine learning and artificial intelligence solutions that aim to reduce human suffering. In doing so, DataKind has learned what it takes to apply data science in the social sector and the many applications it has for creating positive change in the world. This session presents DataKind projects showcasing the wide range of applications of ML/AI for social good: using satellite imagery and remote sensing techniques to detect wheat farm boundaries and protect livelihoods in Ethiopia; leveraging NLP to automate the time-consuming process of synthesizing findings from academic studies to inform conservation efforts; classifying text records to better understand human rights conditions across the world; and using machine learning to reduce traffic fatalities in U.S. cities. Learn about some of the latest breakthroughs and findings in the data science for social good space, and how you can get involved.
General AI for Medical Educators April 2024 - Janet Corral
Learn how to consider Artificial Intelligence as augmentation, to enhance your work. In this presentation we cover augmentation and cyborgs, and critically appraise examples of #AI in #MedEd. We then discuss faculty development and whether #AI can be an #instructionaldesigner.
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
Delta Analytics is a 501(c)3 non-profit in the Bay Area. We believe that data is powerful, and that anybody should be able to harness it for change. Our teaching fellows partner with schools and organizations worldwide to work with students excited about the power of data to do good.
Welcome to the course! These modules will teach you the fundamental building blocks and the theory necessary to be a responsible machine learning practitioner in your own community. Each module focuses on accessible examples designed to teach you about good practices and the powerful (yet surprisingly simple) algorithms we use to model data.
To learn more about our mission or provide feedback, take a look at www.deltanalytics.org.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Search and Society: Reimagining Information Access for Radical Futures - Bhaskar Mitra
The field of Information Retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
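As a hedged sketch of that Python binding (assuming `pip install pypowsybl`; exact attribute and column names may vary by version), the snippet below loads a bundled IEEE 14-bus test network and runs an AC power flow:

```python
# Minimal pypowsybl sketch: load a sample network and run an AC power flow.
import pypowsybl as pp

network = pp.network.create_ieee14()   # IEEE 14-bus test grid shipped with the library
results = pp.loadflow.run_ac(network)  # AC power flow over all connected components

print(results[0].status)                                 # convergence status of the main component
print(network.get_buses()[["v_mag", "v_angle"]].head())  # resulting bus voltage magnitudes and angles
```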
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
22. Discrimination Based on Redundant Encoding
Features = {‘loc’, ‘income’, ...}
Polynomial kernel with degree 2
Feature 6578: Loc=’East Oakland’ ∧ Income=’<10k’
effectively encodes
Feature 4231: Race=’Black’
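To see how such a crossing arises, here is a small, purely synthetic sketch (all feature names, probabilities, and data are made up for illustration): a degree-2 interaction over location and income ends up strongly correlated with the withheld race attribute.

```python
# Hypothetical sketch: degree-2 feature crossings can act as a proxy
# for a sensitive attribute that was never given to the model.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
n = 10_000

# Synthetic, illustrative data: race is withheld from the model, but
# location and income are both correlated with it.
race = rng.binomial(1, 0.3, n)  # 1 = protected class
loc = (rng.random(n) < np.where(race == 1, 0.8, 0.1)).astype(float)         # e.g. Loc='East Oakland'
low_income = (rng.random(n) < np.where(race == 1, 0.7, 0.2)).astype(float)  # e.g. Income='<10k'

X = np.column_stack([loc, low_income])

# A degree-2 expansion with interactions creates the conjunction
# loc * low_income -- the "Feature 6578"-style crossing from the slide.
crossed = PolynomialFeatures(degree=2, interaction_only=True,
                             include_bias=False).fit_transform(X)

conj = crossed[:, -1]  # the loc * low_income interaction column
print("corr(conjunction, race) =", np.corrcoef(conj, race)[0, 1])
```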
33. How can we characterize fairness?
What does fairness even mean?
Group Fairness vs. Individual Fairness
34. How can we characterize fairness?
One way to characterize group fairness is to ensure that both the majority and the protected population have similar outcomes, i.e.,
P(FavorableOutcome | S) : P(FavorableOutcome | S^c) = 1 : 1
35. How can we characterize fairness?
One way to characterize group fairness is to ensure that both the majority and the protected population have similar outcomes, i.e.,
P(FavorableOutcome | S) : P(FavorableOutcome | S^c) = 1 : 1
In practice, exact parity is often hard to achieve.
For example, for hiring, the EEOC specifies this ratio should be no less than 0.8 : 1 (the "80% rule").
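To make the 80% rule concrete, here is a minimal sketch that computes the disparate impact ratio for a binary classifier (the array names and toy numbers are assumptions for illustration):

```python
# Minimal sketch: compute the disparate impact ratio for a binary
# classifier and check it against the EEOC's 80% rule.
import numpy as np

def disparate_impact(y_pred: np.ndarray, is_protected: np.ndarray) -> float:
    """P(favorable | protected) / P(favorable | majority)."""
    p_protected = y_pred[is_protected == 1].mean()
    p_majority = y_pred[is_protected == 0].mean()
    return p_protected / p_majority

# Toy example: 40% favorable outcomes for the protected group vs. 80%
# for the majority -> ratio 0.5, which fails the 80% rule.
y_pred       = np.array([1, 0, 0, 1, 0, 1, 1, 1, 1, 0])
is_protected = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

ratio = disparate_impact(y_pred, is_protected)
print(f"disparate impact ratio = {ratio:.2f}")
print("passes 80% rule" if ratio >= 0.8 else "fails 80% rule")
```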
36. Characterizing Fairness of a black-box classifier
One way: is the classifier outcome correlated with membership in S?
37. Fairness as a constraint
Is the classifier outcome correlated with membership in S?
With sensitive attribute z and decision function d_θ(x), we want their covariance to vanish:
Cov(z, d_θ(x)) ≈ 0
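As a quick check of this constraint for a trained linear model, one can compute the empirical covariance directly. The sketch below follows the covariance formulation from Zafar et al. (cited in the reading list), with random data standing in for a real model:

```python
# Minimal sketch of the covariance fairness check from Zafar et al.:
# for a linear model, the decision function is the signed distance
# theta^T x, and we want Cov(z, theta^T x) to be close to zero.
import numpy as np

def fairness_covariance(z: np.ndarray, X: np.ndarray, theta: np.ndarray) -> float:
    """Empirical covariance between sensitive attribute z and theta^T x."""
    scores = X @ theta  # decision function d_theta(x) for each sample
    return float(np.mean((z - z.mean()) * (scores - scores.mean())))

# Illustrative usage with random data and weights.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
z = rng.binomial(1, 0.4, 1000)
theta = rng.normal(size=5)

print("Cov(z, d_theta(x)) =", fairness_covariance(z, X, theta))
```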
40. “If we allowed a model to be used for college admissions in 1870, we’d still have 0.7% of women going to college.”
Recommended Reading
41. Reading List
There’s much material on fairness in data-driven decision/policy making from the literature in:
- law
- sociology
- political science
- computer science/machine learning
- economics
(the machine learning literature is nascent, only from around 2009 onwards)
42. Reading List (Fairness in ML)
Pedreschi, Dino, Salvatore Ruggieri, and Franco Turini. "Measuring Discrimination in Socially-Sensitive Decision Records." SDM 2009.
Kamiran, Faisal, and Toon Calders. "Classifying without Discriminating." 2nd International Conference on Computer, Control and Communication (IC4), IEEE, 2009.
Dwork, Cynthia, et al. "Fairness through Awareness." Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ACM, 2012.
Romei, Andrea, and Salvatore Ruggieri. "A Multidisciplinary Survey on Discrimination Analysis." The Knowledge Engineering Review 29.05 (2014).
43. Reading List (Fairness in ML)
Friedler, Sorelle, Carlos Scheidegger, and Suresh Venkatasubramanian. "Certifying and Removing Disparate Impact." CoRR (2014).
Barocas, Solon, and Andrew D. Selbst. "Big Data's Disparate Impact" (August 14, 2015). California Law Review, Vol. 104.
Zafar, Muhammad Bilal, et al. "Fairness Constraints: A Mechanism for Fair Classification." arXiv preprint arXiv:1507.05259 (2015).
Zliobaite, Indre. "On the Relation between Accuracy and Fairness in Binary Classification." arXiv preprint arXiv:1505.05723 (2015).
44. Other resources
NSF’s “Big Data Innovation Hubs” were created in part to address these challenges:
http://www.nsf.gov/news/news_summ.jsp?cntn_id=136784
The Stanford Law Review touches upon this topic regularly:
http://www.stanfordlawreview.org/online/privacy-and-big-data
Fairness blog: http://fairness.haverford.edu
Academic: FATML workshops (NIPS 2014, ICML 2015): www.fatml.org
45. Lessons
Discrimination is an emergent property of any learning algorithm.
Watch out for discrimination (implicitly) encoded in features.
Big Data can cause Big Problems.
Watch out for the proportion of the protected classes.
Always do error analysis with the protected classes in mind.
Notions of fairness are nascent at best; involve as many people as possible to improve understanding.
There is no single best notion of fairness.
Say you are building a dating app like Tinder. This involves solving a recommendation problem: recommend a set of profiles to be rated thumbs up/thumbs down. This is a very straightforward collaborative filtering problem, and we can build really performant models because, in addition to the historical rating data, we also have very rich profile data.
Note: I’m using Tinder as a convenient example. This work has nothing to do with the Match Group or their dating app Tinder.
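For concreteness, here is a minimal matrix-factorization sketch of such a recommender (the shapes, hyperparameters, and synthetic swipe data are all assumptions, not any real system):

```python
# Minimal sketch: matrix factorization for swipe prediction.
# Everything here (shapes, learning rate, data) is illustrative only.
import numpy as np

rng = np.random.default_rng(42)
n_users, n_profiles, k = 500, 500, 16

# Observed swipes: (user, profile, 1 = right swipe / 0 = left swipe).
observed = [(rng.integers(n_users), rng.integers(n_profiles),
             rng.integers(2)) for _ in range(20_000)]

U = rng.normal(0, 0.1, (n_users, k))      # user embeddings
P = rng.normal(0, 0.1, (n_profiles, k))   # profile embeddings
lr = 0.05

for epoch in range(5):
    for u, p, y in observed:
        pred = 1 / (1 + np.exp(-U[u] @ P[p]))   # predicted P(right swipe)
        grad = pred - y                          # logistic-loss gradient
        U[u], P[p] = U[u] - lr * grad * P[p], P[p] - lr * grad * U[u]

# Recommend: top profiles by predicted swipe probability for user 0.
scores = P @ U[0]
print("top-5 profile ids for user 0:", np.argsort(-scores)[:5])
```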
Let’s say our goal is to improve the following two business/engagement metrics:
% of right swipes
% of matches (a match happens when two people right-swipe each other)
A good question to ask yourself: “If this was a newspaper personals ad, would mentioning ‘looking for white guys/girls only’ be tolerated? Would a newspaper publish such an ad?” If not, how can we build apps that essentially enable the same thing?
“But that is what the people want! I am race blind, but I have to give my users what they want”
People wanted segregated bathrooms at some point. Perhaps some people still do. But as a society, as a collective conscience, we agreed that it is not in the interest of minorities and vulnerable classes, and that it promotes discrimination.
So why are we okay in building the online equivalent of segregated bathrooms?
This disparity can arise not just from machine learning, but in any kind of data-driven policy making, which is becoming the norm today. Consider a city fixing potholes.
Imagine if there was an app to report the potholes, and the city could send somebody to fix them. Crowdsourcing for efficient governance. Sounds like a good idea, right?
What if most of the complaints came from well-to-do neighborhoods, because they complain about every little thing, and most of the limited city road-repair resources were diverted to these well-to-do neighborhoods?
The disparity caused by the Street Bump app was actually noted in this report commissioned by the White House in early 2014.
From the report abstract
Among the many calls to action, expanding technical expertise was identified as a major one.
Speaking of expanding technical expertise, consider this piece on predictive policing by Gillian Tett, which appeared three months after the White House report came out.
Gillian is an experienced journalist who railed against Wall Street quants for thoughtlessly deploying models.
The same Gillian Tett now writes this about Chicago’s predictive policing: “The program has nothing to do with race but multi-variable equations.”
Today’s ML landscape is changing as we speak. Things are scaling in many dimensions.
Scaling dimension 1: Data collection and processing
It’s super easy to collect troves of data of all kinds, and very cheap and fast to store and process it.
Scaling dimension 2: Tools
Today, pretty much anyone with basic programming skills (not even algorithmic skills) can build a predictive application. With startups offering ML as a service, all you need is a REST endpoint and basic JavaScript to build a model and serve predictions.
Scaling dimension 3: Data Scientist Factories
These are filling a need, but I am not sure they are the right way to fill it.
Statistics used to be a rigorous discipline. That hasn’t stopped cargo-cult statisticians from popping up and doing bad statistics. Similarly, ML used to be a very academic discipline, with practitioners having insight into the models they were training. Today that’s no longer the case.
The curricula in most of these “data scientist factories” are questionable at best (one promises ML/NLP/DataViz in eight weeks), and none of them make time for the critical aspects of model thinking. What is worrisome is that graduates from these places go on to build applications affecting people’s lives. At the very least, an awareness of the ethical issues of machine learning should be incorporated into every ML curriculum.
Let’s say there’s a sensitive attribute, or set of attributes (like gender, race, etc.), that partitions a population into a protected class (or minority class) and a majority class.
Sometimes information about the sensitive attributes can “leak in” from other data sources, even if it is not explicitly encoded. For example, someone’s last name could divulge their ethnicity, or an individual’s location could correlate with race. Or, in this example, knowing who your Facebook friends are, and their sexual orientation, tells a lot about your own sexual orientation!
http://firstmonday.org/article/view/2611/2302
Often, sensitive attributes like race and gender are redundantly encoded in other variables. For example, the text in tweets can reveal many demographic variables:
http://www.fastcompany.com/1769217/you-cant-keep-your-secrets-twitter
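One pragmatic way to test for such redundant encoding (in the spirit of Feldman et al.’s disparate-impact certification, cited in the reading list) is to check how well the supposedly neutral features predict the sensitive attribute. A purely synthetic sketch, with all data assumed:

```python
# Minimal sketch: detect redundant encoding by trying to predict the
# sensitive attribute from the *other* features. High accuracy means
# the sensitive attribute has leaked into supposedly neutral features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 5_000

# Synthetic data: `race` is withheld from the model, but
# `zip_code_bucket` and `income_bucket` are correlated with it.
race = rng.binomial(1, 0.3, n)
zip_code_bucket = np.where(rng.random(n) < 0.8, race, rng.integers(0, 2, n))
income_bucket = np.where(rng.random(n) < 0.6, race, rng.integers(0, 2, n))
X = np.column_stack([zip_code_bucket, income_bucket])

# If this score is well above chance, the features redundantly encode race.
score = cross_val_score(LogisticRegression(), X, race, cv=5).mean()
print(f"sensitive attribute predictable with accuracy ~{score:.2f}")
```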
Big data can cause big problems
The very thing that makes big data helpful in driving down error rates and producing better models also explains why models perform poorly on minority populations: the number of training samples for a minority population is dwarfed by the majority’s. One could try balancing the number of training samples, but that in turn creates an accuracy-fairness tradeoff.
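As a rough illustration of one side of that tradeoff, the sketch below oversamples the minority group until the training set is balanced (loosely in the spirit of the preprocessing ideas in Kamiran & Calders, cited above); the function and its inputs are hypothetical:

```python
# Minimal sketch: naively rebalance a training set by oversampling the
# minority group. This usually helps per-group error at some cost to
# overall accuracy -- the accuracy-fairness tradeoff mentioned above.
import numpy as np

def oversample_minority(X, y, group, rng=np.random.default_rng(0)):
    """Duplicate minority-group rows (group == 1, assumed the smaller
    group) until both groups have equal representation."""
    idx_min = np.flatnonzero(group == 1)
    idx_maj = np.flatnonzero(group == 0)
    extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min), replace=True)
    keep = np.concatenate([idx_maj, idx_min, extra])
    return X[keep], y[keep], group[keep]
```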
https://en.wikipedia.org/wiki/Nymwars
If your classifier has 95% accuracy and you deploy it in production, the 5% of errors might fall disproportionately on the minority population.
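Per-group error analysis makes this failure mode visible. A minimal sketch with made-up numbers matching the 95% figure:

```python
# Minimal sketch: overall accuracy can hide group-level failure.
# `y_true`, `y_pred`, `is_protected` are hypothetical 0/1 numpy arrays.
import numpy as np

def group_error_rates(y_true, y_pred, is_protected):
    errors = (y_true != y_pred)
    return {
        "overall": errors.mean(),
        "protected": errors[is_protected == 1].mean(),
        "majority": errors[is_protected == 0].mean(),
    }

# Toy numbers: 5% overall error, but the minority is 10% of the data
# and absorbs most of the mistakes -> a 30% error rate for them.
n = 10_000
is_protected = (np.arange(n) < 1_000).astype(int)
y_true = np.ones(n, dtype=int)
y_pred = y_true.copy()
y_pred[:300] = 0          # 300 errors on the 1,000 protected rows
y_pred[1_000:1_200] = 0   # 200 errors on the 9,000 majority rows
print(group_error_rates(y_true, y_pred, is_protected))
```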
Not exhaustive, but this should provide a good seed set. The ones in bold are must-reads.