Fairness-aware Learning through Regularization Approach
The 3rd IEEE International Workshop on Privacy Aspects of Data Mining (PADM 2011)
Dec. 11, 2011 @ Vancouver, Canada, in conjunction with ICDM2011
Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2011.83
Article @ Personal Site: http://www.kamishima.net/archive/2011-ws-icdm_padm.pdf
Handnote: http://www.kamishima.net/archive/2011-ws-icdm_padm-HN.pdf
Workshop Homepage: http://www.zurich.ibm.com/padm2011/
Abstract:
With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect people's lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be socially and legally fair from the viewpoint of social responsibility; namely, they must be unbiased and nondiscriminatory with respect to sensitive features, such as race, gender, religion, and so on. Several researchers have recently begun to develop analysis techniques that are aware of social fairness or discrimination. They have shown that simply avoiding the use of sensitive features is insufficient for eliminating biases in determinations, due to the indirect influence of sensitive information. From a privacy-preserving viewpoint, this can be interpreted as hiding sensitive information when classification results are observed. In this paper, we first discuss three causes of unfairness in machine learning. We then propose a regularization approach that is applicable to any prediction algorithm with probabilistic discriminative models. We further apply this approach to logistic regression and empirically show its effectiveness and efficiency.
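To make the regularization idea concrete, here is a minimal sketch, assuming a binary class label and a binary sensitive feature and using illustrative names throughout; it penalizes a logistic-regression loss with a plug-in estimate of the mutual information between the prediction and the sensitive feature, in the spirit of the approach described above rather than the paper's exact formulation:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def mutual_info_penalty(p, s, eps=1e-12):
        # Plug-in estimate of I(y_hat; s) for binary y_hat and s, computed
        # from predicted probabilities p (illustrative names and form).
        mi, p_y1 = 0.0, p.mean()
        for sv in (0, 1):
            mask = (s == sv)
            pr_s, p_y1_s = mask.mean(), p[mask].mean()
            for py, py_s in ((p_y1, p_y1_s), (1.0 - p_y1, 1.0 - p_y1_s)):
                mi += pr_s * py_s * np.log((py_s + eps) / (py + eps))
        return mi

    def objective(w, X, y, s, eta=1.0, lam=0.01):
        # Negative log-likelihood + eta * fairness penalty + L2 regularizer.
        p = sigmoid(X @ w)
        nll = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        return nll + eta * mutual_info_penalty(p, s) + lam * np.dot(w, w)

Minimizing this objective (for example, with scipy.optimize.minimize) trades prediction accuracy against independence of the prediction from s, controlled by eta.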
Consideration on Fairness-aware Data Mining
IEEE International Workshop on Discrimination and Privacy-Aware Data Mining (DPADM 2012)
Dec. 10, 2012 @ Brussels, Belgium, in conjunction with ICDM2012
Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2012.101
Article @ Personal Site: http://www.kamishima.net/archive/2012-ws-icdm-print.pdf
Handnote: http://www.kamishima.net/archive/2012-ws-icdm-HN.pdf
Workshop Homepage: https://sites.google.com/site/dpadm2012/
Abstract:
With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair regarding sensitive features such as race, gender, religion, and so on. Several researchers have recently begun to develop fairness-aware or discrimination-aware data mining techniques that take into account issues of social fairness, discrimination, and neutrality. In this paper, after demonstrating the applications of these techniques, we explore the formal concepts of fairness and techniques for handling fairness in data mining. We then provide an integrated view of these concepts based on statistical independence. Finally, we discuss the relations between fairness-aware data mining and other research topics, such as privacy-preserving data mining or causal inference.
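As a compact statement of the integrated view mentioned above (in our notation): a decision variable Y is fair with respect to a sensitive feature S when the two are statistically independent,

    Y \mathrel{\perp\!\!\!\perp} S
    \quad\Longleftrightarrow\quad
    \Pr[Y = y \mid S = s] = \Pr[Y = y] \;\;\text{for all } y, s
    \quad\Longleftrightarrow\quad
    I(Y; S) = 0,

where I(Y; S) denotes mutual information; many fairness, discrimination, and neutrality measures can then be read as empirical surrogates for the degree to which this independence fails.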
Correcting Popularity Bias by Enhancing Recommendation Neutrality
The 8th ACM Conference on Recommender Systems (RecSys 2014), Poster
Article @ Official Site: http://ceur-ws.org/Vol-1247/
Article @ Personal Site: http://www.kamishima.net/archive/2014-po-recsys-print.pdf
Abstract:
In this paper, we attempt to correct a popularity bias, which is the tendency for popular items to be recommended more frequently, by enhancing recommendation neutrality. Recommendation neutrality involves excluding specified information from the prediction process of recommendation. This neutrality was formalized as the statistical independence between a recommendation result and the specified information, and we developed a recommendation algorithm that satisfies this independence constraint. We correct the popularity bias by enhancing neutrality with respect to information regarding whether candidate items are popular or not. We empirically show that a popularity bias in the predicted preference scores can be corrected.
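A minimal sketch of the neutrality penalty in the spirit of this poster, assuming a binary popularity flag and illustrative names; it penalizes the gap between the mean predicted scores of popular and non-popular items:

    import numpy as np

    def popularity_neutrality_penalty(scores, is_popular):
        # Squared gap between the mean predicted preference scores of
        # popular and non-popular candidate items: a simple mean-matching
        # surrogate for statistical independence (not the poster's code).
        gap = scores[is_popular].mean() - scores[~is_popular].mean()
        return gap ** 2

Adding eta * popularity_neutrality_penalty(...) to a recommender's training loss pushes predicted scores toward independence of the popularity flag.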
The Independence of Fairness-aware Classifiers
IEEE International Workshop on Privacy Aspects of Data Mining (PADM), in conjunction with ICDM2013
Article @ Official Site:
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-icdm-print.pdf
Handnote: http://www.kamishima.net/archive/2013-ws-icdm-HN.pdf
Program codes: http://www.kamishima.net/fadm/
Workshop Homepage: http://www.cs.cf.ac.uk/padm2013/
Abstract:
Due to the spread of data mining technologies, such technologies are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair with respect to sensitive features, such as race, gender, religion, and so on. The goal of fairness-aware classifiers is to classify data while taking into account potential issues of fairness, discrimination, neutrality, and/or independence. In this paper, after reviewing fairness-aware classification methods, we focus on one such method, Calders and Verwer's two-naive-Bayes method. This method has been shown to be superior to other classifiers in terms of fairness, formalized as the statistical independence between a class and a sensitive feature. However, the cause of this superiority is unclear, because the method relies on a somewhat heuristic post-processing technique rather than an explicitly formalized model. We clarify the cause by comparing this method with an alternative naive Bayes classifier, modified by a modeling technique called "hypothetical fair-factorization." This investigation reveals the theoretical background of the two-naive-Bayes method and its connections with other methods. Based on these findings, we develop another naive Bayes method with an "actual fair-factorization" technique and empirically show that this new method can achieve a level of fairness equal to that of the two-naive-Bayes classifier.
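One way to write down the factorization idea discussed above (our notation; a sketch consistent with the abstract rather than a quotation of the paper): a naive Bayes model over class Y, sensitive feature S, and features X can be factorized as

    \Pr[Y, S, X] = \Pr[Y \mid S]\,\Pr[S] \prod_i \Pr[X_i \mid Y, S],

whereas a fair-factorized variant replaces \Pr[Y \mid S] with \Pr[Y], so that the class and the sensitive feature are independent by construction:

    \Pr[Y, S, X] = \Pr[Y]\,\Pr[S] \prod_i \Pr[X_i \mid Y, S].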
Fairness-aware Classifier with Prejudice Remover Regularizer
Proceedings of the European Conference on Machine Learning and Principles of Knowledge Discovery in Databases (ECMLPKDD), Part II, pp.35-50 (2012)
Article @ Official Site: http://dx.doi.org/10.1007/978-3-642-33486-3_3
Article @ Personal Site: http://www.kamishima.net/archive/2012-p-ecmlpkdd-print.pdf
Handnote: http://www.kamishima.net/archive/2012-p-ecmlpkdd-HN.pdf
Program codes: http://www.kamishima.net/fadm/
Conference Homepage: http://www.ecmlpkdd2012.net/
Abstract:
With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair with respect to sensitive features, such as race, gender, religion, and so on. Several researchers have recently begun to develop analysis techniques that are aware of social fairness or discrimination. They have shown that simply avoiding the use of sensitive features is insufficient for eliminating biases in determinations, due to the indirect influence of sensitive information. In this paper, we first discuss three causes of unfairness in machine learning. We then propose a regularization approach that is applicable to any prediction algorithm with probabilistic discriminative models. We further apply this approach to logistic regression and empirically show its effectiveness and efficiency.
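Schematically, the proposed approach can be summarized by the regularized objective (a sketch in our notation, with eta and lambda as trade-off parameters, not a verbatim quotation of the paper)

    \min_{\Theta}\; -\ln L(D; \Theta)
    \;+\; \eta\, R_{\mathrm{PR}}(D; \Theta)
    \;+\; \frac{\lambda}{2}\, \lVert \Theta \rVert_2^2,

where L is the likelihood of the probabilistic discriminative model, R_PR is the prejudice remover regularizer quantifying the dependence of predictions on the sensitive feature, and the L2 term is an ordinary regularizer.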
Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, and Theoretical Aspects
Invited Talk @ Workshop on Fairness, Accountability, and Transparency in Machine Learning
In conjunction with the ICML 2015 @ Lille, France, Jul. 11, 2015
Web Site: http://www.kamishima.net/fadm/
Handnote: http://www.kamishima.net/archive/2015-ws-icml-HN.pdf
The goal of fairness-aware data mining (FADM) is to analyze data while taking into account potential issues of fairness. In this talk, we will cover three topics in FADM:
1. Fairness in a Recommendation Context: In classification tasks, the term "fairness" is regarded as anti-discrimination. We will present other types of problems related to fairness in a recommendation context.
2. What is Fairness: Most formal definitions of fairness have a connection with the notion of statistical independence. We will explore other types of formal fairness based on causality, agreement, and unfairness.
3. Theoretical Problems of FADM: After reviewing technical and theoretical open problems in the FADM literature, we will introduce the theory of the generalization bound in terms of accuracy as well as fairness.
Joint work with Jun Sakuma, Shotaro Akaho, and Hideki Asoh
Model-based Approaches for Independence-Enhanced Recommendation
IEEE International Workshop on Privacy Aspects of Data Mining (PADM), in conjunction with ICDM2016
Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2016.0127
Workshop Homepage: http://pddm16.eurecat.org/
Abstract:
This paper studies a new approach to enhance recommendation independence. Such approaches are useful in ensuring adherence to laws and regulations, fair treatment of content providers, and exclusion of unwanted information. For example, recommendations that match an employer with a job applicant should not be based on socially sensitive information, such as gender or race, from the perspective of social fairness. An algorithm that could exclude the influence of such sensitive information would be useful in this case. We previously gave a formal definition of recommendation independence and proposed a method adopting a regularizer that imposes such an independence constraint. As no other options than this regularization approach have been put forward, we here propose a new model-based approach, which is based on a generative model that satisfies the constraint of recommendation independence. We apply this approach to a latent class model and empirically show that the model-based approach can enhance recommendation independence. Recommendation algorithms based on generative models, such as topic models, are important, because they have a flexible functionality that enables them to incorporate a wide variety of information types. Our new model-based approach will broaden the applications of independence-enhanced recommendation by integrating the functionality of generative models.
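As a generic diagnostic for any such model (not the paper's own measure), one can compare the distributions of predicted ratings across sensitive values; recommendation independence would make them coincide. A sketch with illustrative names:

    from scipy.stats import ks_2samp

    def independence_gap(pred_scores, sensitive):
        # Two-sample Kolmogorov-Smirnov statistic between predicted-score
        # distributions of the two sensitive groups; 0 means the empirical
        # distributions are identical (illustrative diagnostic only).
        a = pred_scores[sensitive == 0]
        b = pred_scores[sensitive == 1]
        return ks_2samp(a, b).statistic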
Efficiency Improvement of Neutrality-Enhanced Recommendation
Workshop on Human Decision Making in Recommender Systems, in conjunction with RecSys 2013
Article @ Official Site: http://ceur-ws.org/Vol-1050/
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-recsys-print.pdf
Handnote: http://www.kamishima.net/archive/2013-ws-recsys-HN.pdf
Program codes: http://www.kamishima.net/inrs/
Workshop Homepage: http://recex.ist.tugraz.at/RecSysWorkshop/
Abstract:
This paper proposes an algorithm for making recommendations so that neutrality from a viewpoint specified by the user is enhanced. This algorithm is useful for avoiding decisions based on biased information, a problem pointed out as the filter bubble: the biasing of social decisions through personalization technologies. To provide a neutrality-enhanced recommendation, we must first assume that the user can specify a particular viewpoint with respect to which neutrality is to be enforced, because a recommendation that is neutral from all viewpoints is no longer a recommendation. Given such a target viewpoint, we implement an information-neutral recommendation algorithm by introducing a penalty term that enforces statistical independence between the target viewpoint and a rating. We empirically show that our algorithm enhances independence from the specified viewpoint.
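A minimal sketch of the penalized objective the abstract describes, assuming a matrix-factorization recommender and a binary viewpoint variable; all names are illustrative, and the mean-matching penalty is a simple surrogate for the paper's independence term:

    import numpy as np

    def neutral_mf_loss(P, Q, b, ratings, viewpoint, eta=1.0):
        # ratings: list of (user, item, rating) triples; viewpoint maps
        # (user, item) to 0/1. Squared prediction error plus a penalty on
        # the gap between mean predictions under the two viewpoint values.
        err, pred = 0.0, {0: [], 1: []}
        for u, i, r in ratings:
            r_hat = b + P[u] @ Q[i]
            err += (r - r_hat) ** 2
            pred[viewpoint[(u, i)]].append(r_hat)
        gap = np.mean(pred[1]) - np.mean(pred[0])
        return err / len(ratings) + eta * gap ** 2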
Considerations on Recommendation Independence for a Find-Good-Items Task
Workshop on Responsible Recommendation (FATREC), in conjunction with RecSys2017
Article @ Official Site: http://doi.org/10.18122/B2871W
Workshop Homepage: https://piret.gitlab.io/fatrec/
Abstract:
This paper examines the notion of recommendation independence, a constraint that a recommendation result be independent of specific information. This constraint is useful in ensuring adherence to laws and regulations, fair treatment of content providers, and exclusion of unwanted information. For example, to make a job-matching recommendation socially fair, the matching should be independent of socially sensitive information, such as gender or race. We previously developed several recommenders satisfying recommendation independence, but these were all designed for a predicting-ratings task, whose goal is to predict the rating score a user would give. Here we focus on another task, find-good-items, which aims to find items that a user would prefer. In this task, scores representing the degree of preference for items are first predicted, and the items with the largest scores are displayed in the form of a ranked list. We developed a preliminary algorithm for this task through a naive approach: enhancing independence between a preference score and sensitive information. We empirically show that although this algorithm can enhance the independence of a preference score, it is not fit for the purpose of enhancing independence in terms of a ranked list. This result indicates the need for a notion of independence that is suitable for use with a ranked list and applicable to a find-good-items task.
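The gap the paper points at is easy to check numerically: even when score distributions match across groups, the top of the ranked list can still be skewed. A sketch with illustrative names:

    import numpy as np

    def topk_exposure(scores, sensitive, k=10):
        # Fraction of the top-k ranked items whose sensitive value is 1.
        # Score-level independence does not bound this quantity, which is
        # the difference between score and ranked-list independence.
        topk = np.argsort(scores)[::-1][:k]
        return sensitive[topk].mean()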
Recommendation Independence
The 1st Conference on Fairness, Accountability, and Transparency
Article @ Official Site: http://proceedings.mlr.press/v81/kamishima18a.html
Conference site: https://fatconference.org/2018/
Abstract:
This paper studies a recommendation algorithm whose outcomes are not influenced by specified information. Such an algorithm is useful in contexts where potentially unfair decisions should be avoided, such as job-applicant recommendations that are not influenced by socially sensitive information; an algorithm that could exclude the influence of sensitive information would thus be useful for job-matching with fairness. We call this condition between a recommendation outcome and a sensitive feature Recommendation Independence, formally defined as statistical independence between the outcome and the feature. Our previous independence-enhanced algorithms simply matched the means of predictions between sub-datasets sharing the same sensitive value. However, this approach could not remove the sensitive information represented by the second or higher moments of the distributions. In this paper, we develop new methods that can deal with the second moment, i.e., variance, of recommendation outcomes without increasing the computational complexity. These methods remove the sensitive information more strictly, and experimental results demonstrate that our new algorithms can more effectively eliminate the factors that undermine fairness. Additionally, we explore potential applications for independence-enhanced recommendation and discuss its relation to other concepts, such as recommendation diversity.
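A minimal sketch of a penalty that matches both first and second moments across sensitive groups, in the spirit of the methods described above (a toy form with illustrative names; the paper's actual distance measures may differ):

    import numpy as np

    def two_moment_penalty(pred, s):
        # Match mean and variance of predicted outcomes between the two
        # sensitive groups; mean-matching alone leaves second-moment
        # differences, which this term also penalizes.
        a, b = pred[s == 0], pred[s == 1]
        return (a.mean() - b.mean()) ** 2 + (a.var() - b.var()) ** 2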
Delta Analytics is a 501(c)3 non-profit in the Bay Area. We believe that data is powerful, and that anybody should be able to harness it for change. Our teaching fellows partner with schools and organizations worldwide to work with students excited about the power of data to do good.
Welcome to the course! These modules will teach you the fundamental building blocks and the theory necessary to be a responsible machine learning practitioner in your own community. Each module focuses on accessible examples designed to teach you about good practices and the powerful (yet surprisingly simple) algorithms we use to model data.
To learn more about our mission or provide feedback, take a look at www.deltanalytics.org. If you would like to use this course content for teaching, please reach out to inquiry@deltanalytics.org
The Formation of Job Referral Networks: Evidence from a Field Experiment in U... (essp2)
International Food Policy Research Institute (IFPRI) and Ethiopian Development Research Institute (EDRI) in collaboration with Ethiopian Economics Association (EEA). Eleventh International Conference on Ethiopian Economy. July 18-20, 2013
Probabilistic Credit Scoring for Cohorts of Borrowers (Andresz26)
[Translated from Spanish] This working paper addresses the level of credit risk. Recognizing that a group's risk arises from the diversity of its members, it proposes a methodology for measuring credit risk that makes it possible to rank a population by risk level, distinguishing among the different risk rankings and taking into account, within the ranking, the risks reflected in the preferences behind borrowers' decisions.
http://www.udla.edu.ec/
International Journal of Computational Engineering Research (IJCER)
The International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da... (Simplilearn)
This Decision Tree Algorithm in Machine Learning presentation will help you understand all the basics of Decision Trees: what Machine Learning is, problems in Machine Learning, what a Decision Tree is, the advantages and disadvantages of Decision Trees, and how the Decision Tree algorithm works, with solved examples; at the end, we implement a Decision Tree use case/demo in Python on loan payment prediction. This Decision Tree tutorial is ideal for both beginners and professionals who want to learn Machine Learning algorithms.
Below topics are covered in this Decision Tree Algorithm Presentation:
1. What is Machine Learning?
2. Types of Machine Learning?
3. Problems in Machine Learning
4. What is Decision Tree?
5. What are the problems a Decision Tree Solves?
6. Advantages of Decision Tree
7. How does Decision Tree Work?
8. Use Case - Loan Repayment Prediction
What is Machine Learning: Machine Learning is an application of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
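For readers who want to try the loan-repayment use case (item 8 above), here is a self-contained sketch on synthetic data; the feature names, thresholds, and data are invented for illustration only:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n = 1000
    income = rng.normal(50, 15, n)   # annual income, in $1k (synthetic)
    debt = rng.normal(20, 8, n)      # outstanding debt, in $1k (synthetic)
    repaid = (income - debt + rng.normal(0, 10, n) > 25).astype(int)

    X = np.column_stack([income, debt])
    X_tr, X_te, y_tr, y_te = train_test_split(X, repaid, random_state=0)
    clf = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)
    print("test accuracy:", clf.score(X_te, y_te))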
- - - - - - - -
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people's digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection, and even self-driving cars. This Machine Learning course prepares engineers, data scientists, and other professionals with the knowledge and hands-on skills required for certification and job competency in Machine Learning.
- - - - - - -
Why learn Machine Learning?
Machine Learning is taking over the world, and with that comes a growing need among companies for professionals who know the ins and outs of Machine Learning.
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
- - - - - -
What skills will you learn from this Machine Learning course?
By the end of this Machine Learning course, you will be able to:
1. Master the concepts of supervised, unsupervised, and reinforcement learning, and of modeling.
2. Gain practical mastery over principles, algorithms, and applications of Machine Learning through a hands-on approach which includes working on 28 projects and one capstone project.
3. Acquire thorough knowledge of the mathematical and heuristic aspects of Machine Learning.
4. Understand the concepts and operation of support vector machines, kernel SVM, naive bayes, decision tree classifier, random forest classifier, logistic regression, K-nearest neighbors, K-means clustering and more.
5. Be able to build a wide variety of robust Machine Learning models, including deep learning, clustering, and recommendation systems.
- - - - - - -
"I don't trust AI": the role of explainability in responsible AIErika Agostinelli
This Tech talk was part of the Women in Data Science 2021 Bristol Event. An Introduction to Explainable AI: what to consider when developing an explainable strategy, what are the state of the art techniques and open-source tools used in the field, with concrete examples. To see the recording: https://www.crowdcast.io/e/o4gjxatp
An Explanation Framework for Interpretable Credit Scoring (gerogepatton)
With the recent boost in enthusiasm for Artificial Intelligence (AI) and Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. However, despite ever-growing achievements, the biggest obstacle in most AI systems is their lack of interpretability. This deficiency of transparency limits their application in different domains, including credit scoring. Credit scoring systems help financial experts make better decisions regarding whether or not to accept a loan application, so that loans with a high probability of default are not accepted. Apart from the noisy and highly imbalanced data challenges faced by such credit scoring models, recent regulations such as the 'right to explanation' introduced by the General Data Protection Regulation (GDPR) and the Equal Credit Opportunity Act (ECOA) have added the need for model interpretability to ensure that algorithmic decisions are understandable and coherent. A recently introduced concept is eXplainable AI (XAI), which focuses on making black-box models more interpretable. In this work, we present a credit scoring model that is both accurate and interpretable. For classification, state-of-the-art performance on the Home Equity Line of Credit (HELOC) and Lending Club (LC) datasets is achieved using the Extreme Gradient Boosting (XGBoost) model. The model is then further enhanced with a 360-degree explanation framework, which provides different explanations (i.e., global, local feature-based, and local instance-based) that are required by different people in different situations. Evaluation through functionally-grounded, application-grounded, and human-grounded analysis shows that the explanations provided are simple and consistent as well as correct, effective, easy to understand, sufficiently detailed, and trustworthy.
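As a rough sketch of the classification stage only, on placeholder data (the HELOC/LC preprocessing and the 360-degree explanation framework are beyond a snippet; all names here are illustrative):

    import numpy as np
    from xgboost import XGBClassifier

    X = np.random.rand(500, 10)                # stand-in for HELOC/LC features
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # stand-in default label

    model = XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)
    print(model.feature_importances_)          # one simple global view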
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Prof George Alter, UMich, ICPSR, presenting at the Managing and publishing sensitive data in the Social Sciences webinar on 29/3/17.
FULL webinar recording: https://youtu.be/7wxfeHNfKiQ
Webinar description:
Prof George Alter (Research Professor, ICPSR, and Visiting Professor, ANU) shares the benefit of over 50 years of experience in managing sensitive social science data at ICPSR: https://www.icpsr.umich.edu/icpsrweb/
More about ICPSR: ICPSR (USA) maintains a data archive of more than 250,000 files of research in the social and behavioral sciences, hosting 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields. ICPSR collaborates with a number of funders, including U.S. statistical agencies and foundations, to create thematic collections: see https://www.icpsr.umich.edu/icpsrweb/content/about/thematic-collections.html
Classification with No Direct Discrimination (Editor IJCATR)
In many automated applications, large amounts of data are collected every day and used both to learn classifiers and to make automated decisions. If the training data is biased toward or against a certain entity, race, nationality, or gender, then the mined model may lead to discrimination. This paper elaborates a direct discrimination prevention method: the DRP algorithm modifies the original data set to prevent direct discrimination, which takes place when decisions are made based on discriminatory attributes specified by the user. The performance of the system is evaluated using measures such as MC, GC, DDPP, and DPDM; different discrimination measures can be used to discover discrimination.
Scoring and predicting risk preferences (Gurdal Ertek)
This study presents a methodology to determine risk scores of individuals for a given financial risk preference survey. To this end, we use a regression-based iterative algorithm to determine the weights for survey questions in the scoring process. Next, we generate classification models to classify individuals into risk-averse and risk-seeking categories, using a subset of survey questions. We illustrate the methodology through a sample survey with 656 respondents. We find that the demographic (indirect) questions can be almost as successful as risk-related (direct) questions in predicting risk preference classes of respondents. Using a decision-tree based classification model, we discuss how one can generate actionable business rules based on the findings.
http://research.sabanciuniv.edu.
Outlier Detection Using Unsupervised Learning on High Dimensional Data (IJERA Editor)
Outliers in data mining can be detected using semi-supervised and unsupervised methods. Outlier detection in high-dimensional data faces various challenges arising from the curse of dimensionality: due to distance concentration, distances between points become uninformative in high dimensions. Among outlier detection techniques, distance-based methods are used to detect and label outlying points. To detect outliers effectively in high-dimensional data, we use unsupervised learning methods such as IQR and KNN with Antihub.
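For reference, the IQR rule mentioned above in a minimal per-dimension form (which is exactly why it struggles in high dimensions):

    import numpy as np

    def iqr_outliers(x, k=1.5):
        # Flag values outside [Q1 - k*IQR, Q3 + k*IQR] for a 1-D array x;
        # the standard interquartile-range rule, applied per dimension.
        q1, q3 = np.percentile(x, [25, 75])
        iqr = q3 - q1
        return (x < q1 - k * iqr) | (x > q3 + k * iqr)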
The use of data and its modelling in science provides meaningful interpretation of real-world problems. This presentation provides an easy-to-understand overview of data visualization and analytics, and snippets of data science applications using R programming.
What is the impact of bias in data analysis, and how can it be mitigated? (Soumodeep Nanee Kundu)
Data analysis is a powerful tool for deriving insights and making informed decisions across various domains, from healthcare and finance to marketing and criminal justice. However, data analysis is not immune to bias, which can significantly impact the quality and fairness of the results. Bias in data analysis can stem from various sources, including biased data collection, algorithmic biases, and human biases in decision-making. In this article, we will explore the impact of bias in data analysis and discuss strategies for mitigating it to ensure more accurate, ethical, and fair outcomes.
Shift AI 2020: How to identify and treat biases in ML Models | Navdeep Sharma... (Shift Conference)
Shift AI was a success, connecting hundreds of professionals who were eager to propel the progress of AI and discuss the newest technologies in data mining, machine learning, and neural networks. More at https://ai.shiftconf.co/.
Talk description:
With all the breakthroughs in the Machine Learning space, ML models are being used to make decisions affecting human lives more than ever. Judging the quality of a model can therefore no longer rest on accuracy, precision, and recall alone: it is important to ensure that each individual and group of people is treated equally, without any historical bias that existed in the data. This talk focuses on some of the many potential ways to establish fairness metrics for ML models in your organization, along with my learnings and the challenges I encountered while building a fairness tool for data scientists and business stakeholders.
Demo: Algorithmic Fairness Tool (AFT) was an innovation project, done at Accenture The Dock, which focused on bringing the latest research from academia and building a tool for the industry.
ML practitioners and advocates are increasingly finding themselves becoming gatekeepers of the modern world. The models you create have power to get people arrested or vindicated, get loans approved or rejected, determine what interest rate should be charged for such loans, who should be shown to you in your long list of pursuits on your Tinder, what news do you read, who gets called for a job phone screen or even a college admission... the list goes on. My goal in this talk is to summarize the kinds of disparate outcomes that are caused by cargo cult machine learning, and recent academic efforts to address some of them.
Data visualization has become increasingly more important and sits at the center of how people learn about and experience the world. We process information about politics, business insights and every day decisions through “visual soundbites”. As data journalists, we have incredible power to both positively influence as well as misguide conversations with the choices that we make when presenting graphical results.
In this presentation, we will share some of the best practices that help deliver stories that matter and avoid creating those that mislead.
Keynote at the Dutch-Belgian Information Retrieval Workshop, November 2016, Delft, Netherlands.
Based on KDD 2016 tutorial with Sara Hajian and Francesco Bonchi.
Article Source: Approach-Induced Biases in Human Information Sampling
Hunt LT, Rutledge RB, Malalasekera WMN, Kennerley SW, Dolan RJ (2016) Approach-Induced Biases in Human Information Sampling. PLOS Biology 14(11): e2000638. https://doi.org/10.1371/journal.pbio.2000638
Exploring Author Gender in Book Rating and Recommendation
M. D. Ekstrand, M. Tian, M. R. I. Kazi, H. Mehrpouyan, and D. Kluver
https://doi.org/10.1145/3240323.3240373
RecSys2018 paper reading group (2018-11-17) https://atnd.org/events/101334
WSDM2018 reading group
2018-04-14 @ Cookpad
https://atnd.org/events/95510
Offline A/B Testing for Recommender Systems
A. Gilotte, C. Calauzénes, T. Nedelec, A. Abraham, and S. Dollé
https://doi.org/10.1145/3159652.3159687
KDD2016 study group https://atnd.org/events/80771
Paper: "Why Should I Trust You?" Explaining the Predictions of Any Classifier
Authors: M. T. Ribeiro, S. Singh, and C. Guestrin
Paper link: http://www.kdd.org/kdd2016/subtopic/view/why-should-i-trust-you-explaining-the-predictions-of-any-classifier
WSDM2016 reading group https://atnd.org/events/74341
Paper: Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior
Authors: F. Kooti, K. Lerman, L. M. Aiello, M. Grbovic, and N. Djuric
Paper link: http://dx.doi.org/10.1145/2835776.2835831
Absolute and Relative Clustering
4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering (Multiclust 2013)
Aug. 11, 2013 @ Chicago, U.S.A, in conjunction with KDD2013
Article @ Official Site: http://dx.doi.org/10.1145/2501006.2501013
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-kdd-print.pdf
Handnote: http://www.kamishima.net/archive/2013-ws-kdd-HN.pdf
Workshop Homepage: http://cs.au.dk/research/research-areas/data-intensive-systems/projects/multiclust2013/
Abstract:
Research into (semi-)supervised clustering has been increasing. Supervised clustering aims to group similar data, partially guided by the user's supervision. In supervised clustering, there are many choices of formalization. For example, as a type of supervision, one can adopt labels of data points, must/cannot links, and so on. Given a real clustering task, such as grouping documents or image segmentation, users must confront the question “How should we mathematically formalize our task?” To help answer this question, we propose the classification of real clusterings into absolute and relative clusterings, which are defined based on the relationship between the resultant partition and the data set to be clustered. This categorization can be exploited to choose a type of task formalization.
Enhancement of the Neutrality in Recommendation
Workshop on Human Decision Making in Recommender Systems, in conjunction with RecSys 2012
Article @ Official Site: http://ceur-ws.org/Vol-893/
Article @ Personal Site: http://www.kamishima.net/archive/2012-ws-recsys-print.pdf
Handnote : http://www.kamishima.net/archive/2012-ws-recsys-HN.pdf
Program codes : http://www.kamishima.net/inrs
Workshop Homepage: http://recex.ist.tugraz.at/RecSysWorkshop2012
Abstract:
This paper proposes an algorithm for making recommendations so that neutrality toward a viewpoint specified by the user is enhanced. This algorithm is useful for avoiding decisions based on biased information. Such a problem has been pointed out as the filter bubble: the influence on social decisions of biases introduced by personalization technology. To provide such recommendations, we assume that a user specifies a viewpoint toward which the user wants to enforce neutrality, because a recommendation that is neutral with respect to all information is no longer a recommendation. Given such a target viewpoint, we implemented an information-neutral recommendation algorithm by introducing a penalty term to enforce statistical independence between the target viewpoint and a preference score. We empirically show that our algorithm enhances independence toward the specified viewpoint and then demonstrate how the sets of recommended items change.
Fairness-aware Learning through Regularization Approach
1. Fairness-aware Learning
through Regularization Approach
Toshihiro Kamishima†, Shotaro Akaho†, and Jun Sakuma‡
† National Institute of Advanced Industrial Science and Technology (AIST)
‡ University of Tsukuba & Japan Science and Technology Agency
http://www.kamishima.net/
IEEE Int’l Workshop on Privacy Aspects of Data Mining (PADM 2011)
Dec. 11, 2011 @ Vancouver, Canada, co-located with ICDM2011
3. Introduction
Due to the spread of data mining technologies...
Data mining is being increasingly applied to serious decisions
ex. credit scoring, insurance rating, employment applications
The accumulation of massive data makes it possible to reveal personal information
Fairness-aware / Discrimination-aware Mining
takes care of the following kinds of sensitive information
in its decisions or results:
information considered from the viewpoint of social fairness
ex. gender, religion, race, ethnicity, handicap, political conviction
information restricted by law or contracts
ex. insider information, customers’ private information
4. Outline
Backgrounds
an example to explain why fairness-aware mining is needed
Causes of Unfairness
prejudice, underestimation, negative legacy
Methods and Experiments
our prejudice removal technique, experimental results
Related Work
finding unfair association rules, situation testing, fairness-aware data publishing
Conclusion
6. Why Fairness-aware Mining?
[Calders+ 10]
US Census Data: predict whether income is high or low

              Male    Female
High-income   3,256      590
Low-income    7,604    4,831

Females are a minority in the high-income class:
the number of high-income male records is 5.5 times the number of high-income female records,
and while 30% of male records are high-income, only 11% of female records are
Occam's Razor: mining techniques prefer simple hypotheses
Minor patterns are frequently ignored, and thus minorities tend to be treated unfairly
7. Calders-Verwer Discrimination Score
[Calders+ 10]
Calders-Verwer discrimination score (CV score):
Pr[ Y=High-income | S=Male ] - Pr[ Y=High-income | S=Female ]
Y: objective variable, S: sensitive feature
the conditional probability of the favorable decision given the non-protected value, minus that given the protected value
Using the values of Y observed in the sample data, the baseline CV score is 0.19
When objective variables Y are predicted by a naive Bayes classifier trained on data containing all sensitive and non-sensitive features, the CV score increases to 0.34, indicating unfair treatment
Even if sensitive features are excluded when training the classifier, the score improves to 0.28, but is still less fair than the baseline
Ignoring sensitive features fails to remove their indirect influence (the red-lining effect)
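As a sanity check, the baseline CV score can be recomputed from the census counts in the table above. The snippet below is a minimal sketch of that arithmetic (our own illustration, not code from the paper):

```python
# Recompute the baseline CV score from the US Census counts shown above.
counts = {
    ("Male", "High"): 3256, ("Male", "Low"): 7604,
    ("Female", "High"): 590, ("Female", "Low"): 4831,
}

def cv_score(counts):
    """Pr[Y=High | S=Male] - Pr[Y=High | S=Female]."""
    p_high_male = counts[("Male", "High")] / (
        counts[("Male", "High")] + counts[("Male", "Low")])
    p_high_female = counts[("Female", "High")] / (
        counts[("Female", "High")] + counts[("Female", "Low")])
    return p_high_male - p_high_female

print(round(cv_score(counts), 2))  # -> 0.19, the baseline reported above
```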
8. Social Backgrounds
Equality Laws
Many international laws prohibit discrimination in socially sensitive decision-making tasks [Pedreschi+ 09]
Circulation of Private Information
Apparently irrelevant information can help to reveal private information
ex. users’ demographics can be predicted from query logs [Jones 09]
Contracts with Customers
Customers’ information must be used only for purposes within the scope of privacy policies
We need sophisticated techniques for data analysis whose outputs are neutral to specific information
10. Three Causes of Unfairness
There are at least three causes of unfairness in data analysis
Prejudice
the statistical dependency of sensitive features on an objective variable or non-sensitive features
Underestimation
incompletely converged predictors due to training on a finite number of samples
Negative Legacy
training data used for building predictors are unfairly sampled or labeled
11. Prejudice: Prediction Model
variables
objective variable Y : a binary class representing the result of a social decision, e.g., whether or not to allow credit
sensitive feature S : a discrete type, representing socially sensitive information such as gender or religion
non-sensitive feature X : a continuous type, corresponding to all features other than the sensitive feature
prediction model
a classification model M[ Y | X, S ] representing Pr[ Y | X, S ]
the joint distribution is derived by multiplying by a sample distribution:
Pr[ Y, X, S ] = M[ Y | X, S ] Pr[ X, S ]
the independence between these variables is considered over this joint distribution
12. Prejudice
Prejudice : the statistical dependency of sensitive features on an objective variable or non-sensitive features
Direct Prejudice : Y ⊥̸ S | X
a clearly unfair state in which a prediction model directly depends on a sensitive feature,
implying the conditional dependence between Y and S given X
Indirect Prejudice : Y ⊥̸ S | ∅
a sensitive feature depends on an objective variable,
bringing about the red-lining effect (unfair treatment caused by information that is non-sensitive but depends on sensitive information)
Latent Prejudice : X ⊥̸ S | ∅
a sensitive feature depends on a non-sensitive feature;
removing this completely excludes sensitive information
13. Relation to PPDM
indirect prejudice
the dependency between an objective variable Y and a sensitive feature S
from the information-theoretic perspective...
the mutual information between Y and S is non-zero
from the viewpoint of privacy preservation...
leakage of sensitive information when an objective variable is known
different conditions from PPDM
introducing randomness is occasionally inappropriate for severe decisions, such as job applications
disclosure of identity is generally not problematic
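Since indirect prejudice corresponds to non-zero mutual information between S and Y, it can be quantified directly from observed decisions. The sketch below (our own illustration; the names are hypothetical) estimates that mutual information from a 2x2 contingency table:

```python
import math

def mutual_information(table):
    """Mutual information (in nats) between S and Y, estimated from joint
    counts table[s][y]; it is 0 if and only if S and Y are independent."""
    total = sum(sum(row.values()) for row in table.values())
    p_s = {s: sum(row.values()) / total for s, row in table.items()}
    p_y = {}
    for row in table.values():
        for y, n in row.items():
            p_y[y] = p_y.get(y, 0.0) + n / total
    mi = 0.0
    for s, row in table.items():
        for y, n in row.items():
            if n == 0:
                continue
            p_sy = n / total
            mi += p_sy * math.log(p_sy / (p_s[s] * p_y[y]))
    return mi

table = {"Male": {"High": 3256, "Low": 7604},
         "Female": {"High": 590, "Low": 4831}}
print(mutual_information(table))  # > 0: decisions leak sensitive information
```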
14. Underestimation
Underestimation : incompletely converged predictors due to training on a finite number of samples
If the number of training samples is finite, the learned classifier may lead to more unfair decisions than those observed in the training samples
Though such decisions are not intentional, they might arouse suspicions of unfair treatment
The notion of asymptotic convergence is mathematically sound, but unfavorable decisions for minorities due to a shortage of training samples might not be socially accepted
Techniques for anytime algorithms or the class imbalance problem might help to alleviate underestimation
15. Negative Legacy
Negative Legacy : training data used for building predictors are unfairly sampled or labeled
Unfair Sampling
If people in a protected group have been refused without investigation, those people are less frequently sampled
This can be considered a kind of sample selection bias problem, but it is difficult to detect the existence of the sampling bias
Unfair Labeling
If people in a protected group who should have been favorably accepted were unfavorably rejected, the labels of the training samples become unfair
Transfer learning might help to address this problem if additional information, such as a small number of fairly labeled samples, is available
17. Logistic Regression
Logistic Regression : a discriminative classification model
A method to remove indirect prejudice from decisions made by logistic regression: two regularizers are added to the negative log-likelihood of the samples of objective variables, non-sensitive features, and sensitive features,

- ln Pr[ {(y, x, s)}; Θ ] + η R( {(y, x, s)}, Θ ) + (λ/2) ||Θ||²

R : the regularizer for prejudice removal; a smaller value more strongly constrains the independence between S and Y
η : a regularization parameter; a larger value more strongly enforces fairness
the L2 regularizer with parameter λ avoids over-fitting
18. Prejudice Remover
Prejudice Remover Regularizer :
the mutual information between S and Y, so as to make Y independent from S

R = Σ_{Y,X,S} M[ y | x, s; Θ ] Pr[ x, s ] ln( Pr[ y, s ] / ( Pr[ s ] Pr[ y ] ) )

The true distribution is replaced with the sample distribution, and the regularizer is approximated by the mutual information at the means of X instead of marginalizing over X:

R ≈ Σ_{y∈{0,1}} Σ_{(x,s)} M[ y | x, s; Θ ] ln( Pr̂[ y | x̄_s, s; Θ ] / Σ_{s'} Pr̂[ y | x̄_{s'}, s'; Θ ] )

We are currently improving this computation method
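Putting the two preceding slides together, a minimal sketch of the resulting training procedure might look as follows. This is our own simplified illustration, assuming a binary sensitive feature, one logistic-regression weight vector per sensitive value, and the mean-of-X approximation above; it is not the authors' released implementation, and all names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(theta, X, y, s, eta, lam):
    """-log-likelihood + eta * (approximate MI of S and Y) + (lam/2)||theta||^2."""
    n, d = X.shape
    W = theta.reshape(2, d)                  # one weight vector per value of s
    p = sigmoid(np.sum(X * W[s], axis=1))    # M[y=1 | x, s]
    eps = 1e-12
    loglik = np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    # Pr[y=1 | s], approximated at the per-group mean of x (as on the slide);
    # we weight the marginal by the empirical group frequencies (our reading).
    pr_s = np.array([(s == 0).mean(), (s == 1).mean()])
    xbar = np.stack([X[s == k].mean(axis=0) for k in (0, 1)])
    py1_s = sigmoid(np.sum(xbar * W, axis=1))
    py1 = float(pr_s @ py1_s)                # marginal Pr[y=1]
    # prejudice remover: sum over samples and y in {0, 1}
    R = np.sum(p * np.log(py1_s[s] / (py1 + eps))
               + (1 - p) * np.log((1 - py1_s[s]) / (1 - py1 + eps)))
    return -loglik + eta * R + 0.5 * lam * theta @ theta

# toy data: x correlates with s, so plain LR would show indirect prejudice
rng = np.random.default_rng(0)
n, d = 200, 3
s = rng.integers(0, 2, n)
X = rng.normal(size=(n, d)) + s[:, None]
y = (X.sum(axis=1) + rng.normal(size=n) > 1.5).astype(int)
res = minimize(objective, np.zeros(2 * d), args=(X, y, s, 1.0, 1.0),
               method="L-BFGS-B")
W = res.x.reshape(2, d)                      # fitted fairness-aware model
```

Raising eta trades likelihood (accuracy) for independence between S and Y, which is exactly the trade-off examined in the experiments below.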
19. Calders-Verwer Two Naive Bayes
[Calders+ 10]
[Graphical models: in naive Bayes, S and X are conditionally independent given Y; in Calders-Verwer Two Naive Bayes (CV2NB), the non-sensitive features X are mutually conditionally independent given Y and S.]
Unfair decisions are modeled by introducing the dependency of X on S as well as on Y
A model representing the joint distribution of Y and S is built so as to enforce fairness in decisions
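The essential trick of CV2NB is post-processing the learned joint model of Y and S until the decisions it induces are fair. The sketch below illustrates that idea only loosely (it is not the authors' exact algorithm, and the names are ours): starting from per-sample scores produced by any generative model, it shifts group-wise log-priors until the CV score of the induced decisions is near zero.

```python
import numpy as np

def rebalance(scores, s, step=0.01, tol=0.01, max_iter=1000):
    """scores: per-sample log-odds of the favorable decision;
    s: group in {0, 1}, where group 1 is the currently favored one.
    Returns an additive log-prior per group that nearly zeroes the CV score."""
    bias = np.zeros(2)
    for _ in range(max_iter):
        yhat = (scores + bias[s]) > 0    # induced decisions
        cv = yhat[s == 1].mean() - yhat[s == 0].mean()
        if abs(cv) < tol:
            break
        bias[0] += step * np.sign(cv)    # nudge priors toward fairness
        bias[1] -= step * np.sign(cv)
    return bias
```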
20. Experimental Results
[Figure: two panels plotted against η (0.01 to 10). Left: accuracy, from 0.60 to 0.85 (higher is better). Right: fairness measured by mutual information between S and Y, on a log scale from 10^-6 to 1 (lower is fairer). Methods: LR (without S), Prejudice Remover, Naive Bayes (without S), CV2NB.]
Both CV2NB and PR made fairer decisions than their corresponding baseline methods
For larger η, the accuracy of PR tends to decline, but no clear trend is found in terms of fairness
The instability of PR may be due to the influence of the approximation or the non-convexity of the objective function
22. Finding Unfair Association Rules
[Pedreschi+ 08, Ruggieri+ 10]
ex: association rules extracted from the German Credit Data Set
(a) city=NYC ⇒ class=bad (conf=0.25)
25% of NYC residents are denied their credit applications
(b) city=NYC & race=African ⇒ class=bad (conf=0.75)
75% of NYC residents whose race is African are denied their credit applications
extended lift (elift) : elift = conf( A ∧ B ⇒ C ) / conf( A ⇒ C )
the ratio of the confidence of a rule with an additional condition to the confidence of the base rule
a-protection : a rule is considered unfair if its elift is larger than a
ex: (b) is not a-protected if a = 2, because elift = conf(b) / conf(a) = 3
They proposed an algorithm to enumerate rules that are not a-protected
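In code, the elift computation and the a-protection check from this example reduce to a couple of lines; the function names below are ours:

```python
def elift(conf_extended, conf_base):
    """Extended lift: conf(A & B -> C) / conf(A -> C)."""
    return conf_extended / conf_base

def is_a_protected(conf_extended, conf_base, a):
    """A rule is a-protected unless its elift exceeds the threshold a."""
    return elift(conf_extended, conf_base) <= a

print(elift(0.75, 0.25))              # 3.0, as in rules (a) and (b)
print(is_a_protected(0.75, 0.25, 2))  # False: rule (b) is not a-protected
```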
23. Situation Testing
[Luong+ 11]
Situation Testing : when all conditions other than a sensitive condition are the same, people in a protected group are considered unfairly treated if they receive an unfavorable decision
They proposed a method for finding unfair treatments by checking the statistics of decisions among the k-nearest neighbors of data points in a protected group
The condition of situation testing is
Pr[ Y | X, S=a ] = Pr[ Y | X, S=b ] ∀ X
This implies the independence between S and Y
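A minimal sketch of that k-nearest-neighbor check is given below. It is our own simplification of the procedure in [Luong+ 11], not their exact method: for a data point in the protected group, compare the favorable-decision rates among its k nearest neighbors taken separately within each group.

```python
import numpy as np

def situation_test(X, y, s, i, k=5):
    """Difference in favorable-decision rates (y=1) among the k nearest
    neighbors of sample i, taken separately within each group.
    s: group membership, 0 = protected; a large positive value suggests
    that individuals like i are treated unfairly."""
    dist = np.linalg.norm(X - X[i], axis=1)
    rates = []
    for group in (0, 1):
        idx = np.where(s == group)[0]
        nearest = idx[np.argsort(dist[idx])[:k]]
        rates.append(y[nearest].mean())
    return rates[1] - rates[0]
```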
24. Fairness-aware Data Publishing
[Dwork+ 11]
[Diagram: the vendor (a data user) sends the data owner a loss function representing the vendor's utilities; the data owner transforms the original data into archetypes, a data representation chosen so as to maximize the vendor's utility under constraints that guarantee fair decisions in analysis.]
They show the conditions that these archetypes should satisfy
The conditions imply that the probability of receiving a favorable decision is irrelevant to membership in a protected group
25. Conclusion
Contributions
three causes of unfairness: prejudice, underestimation, and negative legacy
a prejudice remover regularizer, which enforces a classifier's independence from sensitive information
experimental results of logistic regression with our prejudice remover
Future Work
the computation of the prejudice remover has to be improved
Socially Responsible Mining
Methods of data exploitation that do not damage people’s lives, such as fairness-aware mining, PPDM, and adversarial learning, together comprise the notion of socially responsible mining, which should become an important concept in the near future.
Editor's Notes
I’m Toshihiro Kamishima, and this is joint work with Shotaro Akaho and Jun Sakuma. Today, we would like to talk about fairness-aware data mining.
Due to the spread of data mining technologies, data mining is being increasingly applied to serious decisions: for example, credit scoring, insurance rating, and employment applications. Furthermore, the accumulation of massive data makes it possible to reveal personal information. To cope with this situation, several researchers have begun to develop fairness-aware mining techniques, which take sensitive information into account in their decisions. First, information considered from the viewpoint of social fairness: for example, gender, religion, race, ethnicity, handicap, or political conviction. Second, information restricted by law or contracts: for example, insider information or customers’ private information.
This is an outline of our talk. First, we show an example to explain why fairness-aware mining is needed. Second, we discuss three causes of unfairness. Third, we show our prejudice removal technique. Finally, we review this new research area.
Let’s turn to the background of fairness-aware mining.
This is Calders & Verwer’s example showing why fairness-aware mining is needed. Consider a task of predicting whether income is high or low. In this US census data, females are a minority in the high-income class. Because mining techniques prefer simple hypotheses, minor patterns are frequently ignored, and thus minorities tend to be treated unfairly.
Calders & Verwer proposed a score to quantify the degree of unfairness. It is defined as the difference between the conditional probabilities of high-income cases given Male and given Female; a larger score indicates a less fair decision. The baseline CV score is 0.19. If objective variables are predicted by a naive Bayes classifier, the CV score increases to 0.34, indicating unfair treatment. Even if sensitive features are excluded, the score improves to 0.28, but is still less fair than the baseline. Ignoring sensitive features is ineffective against their indirect influence; this is called the red-lining effect. More affirmative actions are required.
Furthermore, fairness-aware mining is required due to social backgrounds, such as equality laws, the circulation of private information, and contracts with customers. Because these prohibit the use of specific information, we need sophisticated techniques for data analysis whose outputs are neutral to that information.
We next discuss causes of unfairness.
We consider that there are at least three causes of unfairness in data analysis: prejudice, underestimation, and negative legacy. We present these causes in turn.
Before talking about prejudice, we’d like to give some notation. There are three types of variables: an objective variable Y, sensitive features S, and non-sensitive features X. We introduce a classification model representing the conditional probability of Y given X and S. Using this model, the joint distribution is obtained by multiplying by a sample distribution. We consider the independence between these variables over this joint distribution.
The first cause of unfairness is prejudice, which is defined as the statistical dependency of sensitive features on an objective variable or non-sensitive features. We further classify prejudice into three types: direct, indirect, and latent. Direct prejudice is a clearly unfair state in which a prediction model directly depends on a sensitive feature; this implies the conditional dependence between Y and S given X. Indirect prejudice is a state in which a sensitive feature depends on an objective variable; this brings about a red-lining effect. Latent prejudice is a state in which a sensitive feature depends on a non-sensitive feature.
Here, we’d like to point out the relation between indirect prejudice and PPDM. Indirect prejudice refers to the dependency between Y and S. From an information-theoretic perspective, this means that the mutual information between Y and S is non-zero. From the viewpoint of privacy preservation, this is interpreted as the leakage of sensitive information when an objective variable is known. On the other hand, the conditions differ from PPDM: introducing randomness is occasionally inappropriate for severe decisions (for example, if my job application were rejected at random, I would complain about the decision and immediately consult lawyers), and disclosure of identity is generally not problematic.
Now we turn to the second cause of unfairness. Underestimation is caused by incompletely converged predictors due to training on a finite number of samples. If the number of training samples is finite, the learned classifier may lead to more unfair decisions than those observed in the training samples. Though such decisions are not intentional, they might arouse suspicions of unfair treatment. The notion of asymptotic convergence is mathematically sound, but unfavorable decisions for minorities due to a shortage of training samples might not be socially accepted.
The third cause of unfairness is negative legacy, which refers to training data for building predictors being unfairly sampled or labeled. Unfair sampling occurs when, for example, people in a protected group have been refused without investigation and are therefore less frequently sampled. Unfair labeling occurs when, for example, people in a protected group who should have been favorably accepted were unfavorably rejected, so the labels of the training samples become unfair.
Among these causes of unfairness, we focus on the removal of indirect prejudice. We next talk about our methods and experimental results.
We propose a method to remove indirect prejudice from decisions made by logistic regression. For this purpose, we add this term to the original objective function of logistic regression. The regularizer for prejudice removal is designed so that a smaller value more strongly constrains the independence between S and Y. η is a regularization parameter; a larger value more strongly enforces fairness.
To remove indirect prejudice, we develop a prejudice remover regularizer. This regularizer is the mutual information between S and Y, designed to make Y independent from S. It is calculated by this formula; a rather brute-force approximation is adopted.
We compared our logistic regression with the Calders-Verwer two-naive-Bayes classifier. Unfair decisions are modeled by introducing the dependency of X on S as well as on Y. A model representing the joint distribution of Y and S is built so as to enforce fairness in decisions.
This is a summary of our experimental results. Our method balances accuracy and fairness by tuning the η parameter. The two-naive-Bayes classifier is very powerful: we could win in accuracy, but lost in fairness. Both CV2NB and PR made fairer decisions than their baseline methods. For larger η, the accuracy of PR tends to decline, but no clear trend is found in terms of fairness. The instability of PR may be due to the influence of the brute-force approximation or the non-convexity of the objective function. We plan to investigate this point.
Fairness-aware mining is a relatively new problem, so we briefly review related work.
To our knowledge, Pedreschi, Ruggieri, and Turini first addressed the problem of fairness in data mining. They proposed a measure to quantify the unfairness of association rules. The elift is defined as the ratio of the confidence of an association rule with an additional condition to the confidence of the base rule. They proposed the notion of a-protection: rules are considered unfair if there exist association rules whose extended lift is larger than a. They proposed an algorithm to enumerate rules that are not a-protected.
This group proposed another notion, situation testing. When all conditions other than a sensitive condition are the same, people in a protected group are considered unfairly treated if they receive an unfavorable decision. They proposed a method for finding unfair treatments by checking the statistics of decisions among the k-nearest neighbors of data points in a protected group. Situation testing is related to our indirect prejudice.
Dwork and colleagues proposed a framework for fairness-aware data publishing. The vendor sends a loss function representing the vendor’s utilities. Using this function, the data owner transforms the original data into archetypes: a data representation that maximizes the vendor’s utility under constraints that guarantee fair results. They show the conditions that these archetypes should satisfy, and the conditions imply that the probability of receiving a favorable decision is irrelevant to membership in a protected group. This further implies the independence between S and Y and is related to our indirect prejudice.
Our contributions are as follows. In future work, the computation of the prejudice remover has to be improved. Methods of data exploitation that do not damage people’s lives, such as fairness-aware mining, PPDM, and adversarial learning, together comprise the notion of socially responsible mining, which should become an important concept in the near future. That’s all we have to say. Thank you for your attention.