AI should be Fair, Accountable and Transparent (FAT* AI). It is therefore crucial to raise awareness of these topics, not only among machine learning practitioners but among the entire population, as ML systems can make life-changing decisions and influence our lives now more than ever.
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
Medical Image Data Augmentation using GANs, by Hazrat Ali
Medical Image Data Augmentation using Generative Adversarial Networks. The talk is based on the NeurIPS 2019 paper, Generative Image Translation for Data Augmentation in Colorectal Histopathology Images.
I reviewed the "Causal Confusion in Imitation Learning" paper.
- Abstract
Behavioral cloning reduces policy learning to supervised learning by training a discriminative model to predict expert actions given observations. Such discriminative models are non-causal: the training procedure is unaware of the causal structure of the interaction between the expert and the environment. We point out that ignoring causality is particularly damaging because of the distributional shift in imitation learning. In particular, it leads to a counter-intuitive “causal misidentification” phenomenon: access to more information can yield worse performance. We investigate how this problem arises, and propose a solution to combat it through targeted interventions—either environment interaction or expert queries—to determine the correct causal model. We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations.
- Outline
1. Introduction
2. Causality and Causal Inference
3. Causality in Imitation Learning
4. Experiments Setting
5. Resolving Causal Misidentification
- Causal Graph-Parameterized Policy Learning
- Targeted Intervention
6. Experiments
Link: https://papers.nips.cc/paper/9343-causal-confusion-in-imitation-learning.pdf
Thank you!
As electricity is difficult to store, it is crucial to strictly maintain the balance between production and consumption. The integration of intermittent renewable energies into the production mix has made the management of the balance more complex. However, access to near real-time data and communication with consumers via smart meters suggest demand response. Specifically, sending signals would encourage users to adjust their consumption according to the production of electricity. The algorithms used to select these signals must learn consumer reactions and optimize them while balancing exploration and exploitation. Various sequential or reinforcement learning approaches are being considered.
Online violence amplifies IRL discriminations, and the lack of diversity grows in a vicious circle. Understanding cyber-violence, its forms and mechanisms, can help us fight back. To process massive volumes of data, AI finally comes into play for good.
In the energy sector, the use of temporal data stands as a pivotal topic. At GRDF, we have developed several methods to effectively handle such data. This presentation will specifically delve into our approaches for anomaly detection and data imputation within time series, leveraging transformers and adversarial training techniques.
Natasha shares her experience to delve into the complexities, challenges, and strategies associated with effectively leading tech teams dispersed across borders.
Nour and Maria present the work they did at Tweag, Modus Create's innovation arm, where the GenAI team developed an evaluation framework for Retrieval-Augmented Generation (RAG) systems. RAG systems provide an easy and low-cost way to extend the knowledge of Large Language Models (LLMs), but measuring their performance is not an easy task.
The presentation will review existing evaluation frameworks, ranging from those based on the traditional ML approach of using ground-truth datasets, including Tweag's, to those that use LLMs to compute evaluation metrics.
It will also delve into the practical implementation of Tweag's chatbot over two distinct document datasets and provide insights on chunking, embedding and how open-source and commercial LLMs compare.
Sharone Dayan, Machine Learning Engineer and Daria Stefic, Data Scientist, both from Contentsquare, delve into evaluation strategies for dealing with partially labelled or unlabelled data.
Laure talked about a very hot topic in the community at the moment with the ChatGPT phenomenon: how to supervise a PhD thesis in NLP in the age of Large Language Models (LLMs)?
Abstract: Who hasn't heard of the "Pilot Syndrome"? 85% of Data Science pilots remain pilots and do not make it to the production stage. Let's build a production-ready and end-user-friendly Data Science application. 100% Python and 100% open source.
Phase 1 | Building the GUI: create an interactive and powerful interface in a few lines of code
Phase 2 | Integrated back end: Manage your models and pipelines and create scenarios the smart way
"Natural Language Processing for proteins" by Amélie Héliou, Software Engineer @ Google Research
Abstract: Over the past few months, Large Language Models have become very popular.
We'll see how a simple LLM works, from input sentence to prediction.
I'll then present an application of LLM to protein name prediction.
Twitter: @Amelie_hel
"We are not passing by, and we are not a trend". What if an automated and large scale version of the Bechdel-Wallace test could confirm the speech of Alice Diop at the Cesar 2023?
That's the objective of BechdelAI: to build an open-source tool based on Artificial Intelligence that measures the inequalities and the under-representation of women in movies and audiovisual media.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions, by Victor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Identifying and mitigating bias in machine learning, by Ruta Binkyte
1. FAIR AI
Identifying and mitigating bias in machine learning
Image by James Sutherland from Pixabay
2. Comete Team
Privacy, Fairness, Causality
Catuscia Palamidessi: Director of the Comete team at Inria Saclay-Ile-de-France, Ecole Polytechnique
Sami Zhioua: Senior Researcher at Inria Saclay-Ile-de-France, Ecole Polytechnique
Ruta Binkyte: Ph.D. Student at Inria Saclay-Ile-de-France, Ecole Polytechnique, IP Paris
5. Racial bias in computer vision
What happened? Face recognition algorithms have much lower accuracy for darker-skinned female faces.
Why? Black women are underrepresented in the training data.
6. Racial bias in healthcare AI
What happened? The algorithm built to predict the need for medical interventions (sickness) gave lower scores to Black patients who were as sick as, or sicker than, white patients.
Why? The proxy used for “sickness” was healthcare spending, which correlated with race.
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
8. Machine Learning Process
Diagram stages: Raw Data, Features/Attributes, The Learning Process, The Prediction, The Decision, Feedback Loop
9. Types of Bias
Representation Bias: A group is underrepresented in a sample. (Figure labels: sample, population)
Historical Bias: A historically disadvantaged group has a lower occurrence of positive labels.
https://www.britannica.com/topic/racial-segregation
10. When can under-representation affect discrimination?
<3%
Common for intersectional sensitive attributes
Zhioua, S., & Binkytė, R. (2023). Shedding light on underrepresentation and Sampling Bias in machine learning. arXiv preprint arXiv:2306.05068.
12. Representation Bias: Does data augmentation always solve the problem?
“When representation bias is combined with historical bias, data augmentation can increase the disparity”
Zhioua, S., & Binkytė, R. (2023). Shedding light on underrepresentation and Sampling Bias in machine learning. arXiv preprint arXiv:2306.05068.
13. Random augmentation vs. positive label augmentation
When representation bias is combined with historical bias, oversampling can increase the bias.
Figure: Discrimination while augmenting the training set with randomly selected female group samples. The male group size is fixed at 100. The dataset is Dutch Census and the training algorithm is logistic regression.
Figure: Discrimination while augmenting the training set with only positive-outcome female group samples. The male group size is fixed at 100. The dataset is Dutch Census and the training algorithm is logistic regression.
Zhioua, S., & Binkytė, R. (2023). Shedding light on underrepresentation and Sampling Bias in machine learning. arXiv preprint arXiv:2306.05068.
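The two augmentation strategies named on this slide can be sketched as follows. This is only an illustration on made-up records: the helper names `positive_rate` and `oversample`, the group codes "M"/"F" and the sample counts are assumptions, no model is trained, and the Dutch Census experiment is not reproduced. It only shows how each strategy changes the share of positive labels in the training set.

```python
import random

def positive_rate(records, group):
    """Share of positive labels among records of one group."""
    labels = [label for g, label in records if g == group]
    return sum(labels) / len(labels)

def oversample(records, group, n, rng, positive_only=False):
    """Return records plus n samples drawn with replacement from one group.

    With positive_only=True, only positive-outcome samples are drawn.
    """
    pool = [r for r in records
            if r[0] == group and (r[1] == 1 or not positive_only)]
    return records + [rng.choice(pool) for _ in range(n)]

# Hypothetical training set: (group, label) pairs.
# 100 male records (60 positive), 20 female records (5 positive).
base = [("M", 1)] * 60 + [("M", 0)] * 40 + [("F", 1)] * 5 + [("F", 0)] * 15

rng = random.Random(0)
random_aug = oversample(base, "F", 80, rng)
positive_aug = oversample(base, "F", 80, rng, positive_only=True)

for name, data in [("base", base),
                   ("random", random_aug),
                   ("positive only", positive_aug)]:
    print(name, len(data), round(positive_rate(data, "F"), 3))
```

Random oversampling draws from the female group as it is, so the historical gap in positive labels stays roughly in place, whereas positive-only oversampling changes the label distribution itself. How either choice affects the discrimination of a trained classifier is what the figures on the slide report.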
14. Most bias mitigation algorithms aim to satisfy Statistical Parity
Statistical Parity: $P(\hat{Y} = 1 \mid S = 0) = P(\hat{Y} = 1 \mid S = 1)$
where $\hat{Y}$ is the prediction and $S$ is the sensitive attribute.
Photo by Possessed Photography on Unsplash
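As a minimal sketch of how statistical parity can be checked on a set of predictions, the snippet below estimates both conditional probabilities from paired lists and reports their difference. The function names and the toy data are assumptions made for illustration; they do not come from the talk.

```python
def positive_rate(preds, group, value):
    """Empirical estimate of P(Y_hat = 1 | S = value)."""
    selected = [p for p, s in zip(preds, group) if s == value]
    if not selected:
        raise ValueError(f"no samples with S = {value}")
    return sum(selected) / len(selected)

def statistical_parity_difference(preds, group):
    """P(Y_hat = 1 | S = 1) - P(Y_hat = 1 | S = 0); zero means parity."""
    return positive_rate(preds, group, 1) - positive_rate(preds, group, 0)

# Hypothetical binary predictions and sensitive attribute values.
y_hat = [1, 0, 1, 1, 0, 0, 1, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(y_hat, s)
print(spd)  # 0.25 - 0.75 = -0.5
```

A difference of zero corresponds to the equality on the slide. In practice a small tolerance is usually applied, since empirical rates rarely match exactly.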
15. Historical Bias: Does “equal” always mean “fair”?
“Black patients on average have 26.3% more chronic diseases than the white”
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
16. More nuanced fairness notions:
Conditional Statistical Parity: $P(\hat{Y} = 1 \mid S = 0, E) = P(\hat{Y} = 1 \mid S = 1, E)$
Equal Opportunity: $P(\hat{Y} = 1 \mid S = 0, Y = 1) = P(\hat{Y} = 1 \mid S = 1, Y = 1)$
where $\hat{Y}$ is the prediction, $S$ is the sensitive attribute, $E$ is the explanatory attribute and $Y$ is the label.
Photo by Possessed Photography on Unsplash
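The two notions on this slide can be estimated in the same empirical way. The sketch below is illustrative only: the helper names, the "high"/"low" values of the explanatory attribute and the toy data are assumptions. The data is chosen so that conditional statistical parity holds within each stratum of E even though the overall positive rates differ between groups.

```python
def rate(preds, mask):
    """Mean prediction over the selected rows, or None if none are selected."""
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected) if selected else None

def conditional_parity_differences(preds, group, explanatory):
    """For each value e of E: P(Y_hat=1 | S=1, E=e) - P(Y_hat=1 | S=0, E=e)."""
    result = {}
    for e in sorted(set(explanatory)):
        r1 = rate(preds, [s == 1 and x == e for s, x in zip(group, explanatory)])
        r0 = rate(preds, [s == 0 and x == e for s, x in zip(group, explanatory)])
        result[e] = None if r1 is None or r0 is None else r1 - r0
    return result

def equal_opportunity_difference(preds, labels, group):
    """P(Y_hat=1 | S=1, Y=1) - P(Y_hat=1 | S=0, Y=1)."""
    tpr1 = rate(preds, [s == 1 and y == 1 for s, y in zip(group, labels)])
    tpr0 = rate(preds, [s == 0 and y == 1 for s, y in zip(group, labels)])
    if tpr1 is None or tpr0 is None:
        raise ValueError("each group needs at least one sample with Y = 1")
    return tpr1 - tpr0

# Hypothetical predictions, labels, sensitive and explanatory attributes.
y_hat = [1, 1, 0, 0, 1, 0, 0, 0]
y = [1, 1, 0, 1, 1, 1, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
e = ["high", "high", "low", "low", "high", "low", "low", "low"]

cond = conditional_parity_differences(y_hat, s, e)
eod = equal_opportunity_difference(y_hat, y, s)
print(cond)
print(eod)
```

In this toy data the overall positive rates are 0.5 for S=0 and 0.25 for S=1, so statistical parity is violated, yet the per-stratum differences are all zero because the groups differ in their distribution of E.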
17. BaBE: Bayesian Bias Elimination
When representation bias is combined with historical bias, the oversampling can increase the bias
a) Causal relation between the variables $S, E, Z$, and the decision $Y_Z$ based on $Z$.
b) Derivation of $E \mid S, Z$. The decision $Y_E$ is then based directly on $E$.
Binkytė, R., Gorla, D., Palamidessi, C. (2023). BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables
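The slide does not show how $E \mid S, Z$ is derived. As a generic reminder only, and not necessarily the estimation procedure used in BaBE, Bayes' rule expresses this posterior for a discrete explanatory variable $E$ in terms of the likelihood of the observed $Z$ and the prior of $E$ within each group:

```latex
P(E = e \mid S = s, Z = z)
  = \frac{P(Z = z \mid E = e, S = s)\, P(E = e \mid S = s)}
         {\sum_{e'} P(Z = z \mid E = e', S = s)\, P(E = e' \mid S = s)}
```

Here $Z$ is the biased observed variable and $S$ the sensitive attribute, as in the slide. Treating $E$ as discrete is an assumption of this sketch.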
18. Results for Disparate Impact Remover and BaBE
Figure: The distributions of E (green) and biased Z (blue) for S = 0, and of E (orange) and biased Z (magenta) for S = 1.
Figure: The original distributions P[E|S = 0] (green), P[E|S = 1] (orange) and the distributions of the E computed by Disparate Impact Remover (blue and magenta) for S=0. The probability for higher values for S=0 is increased minimally to match the distribution of S=1.
Figure: The original distributions P[E|S = 0] (green), P[E|S = 1] (orange) and the distributions of the E estimated by BaBE (blue and magenta) for S=0. BaBE accurately matches the true distributions both for S=0 and S=1.
Panel labels: S=1, S=0