Exploring Author Gender in Book Rating and Recommendation
M. D. Ekstrand, M. Tian, M. R. I. Kazi, H. Mehrpouyan, and D. Kluver
https://doi.org/10.1145/3240323.3240373
RecSys2018 Paper Reading Group (2018-11-17) https://atnd.org/events/101334
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
Vahid Taslimitehrani
Presented at 15th International Conference on BioInformatics and BioEngineering (BIBE2014)
Prognostic modeling is central to medicine, as it is often used to predict patients’ outcome and response to treatments and to identify important medical risk factors. Logistic regression is one of the most used approaches for clinical prediction modeling. Traumatic brain injury (TBI) is an important public health issue and a leading cause of death and disability worldwide. In this study, we adapt CPXR (Contrast Pattern Aided Regression, a recently introduced regression method), to develop a new logistic regression method called CPXR(Log), for general binary outcome prediction (including prognostic modeling), and we use the method to carry out prognostic modeling for TBI using admission time data. The models produced by CPXR(Log) achieved AUC as high as 0.93 and specificity as high as 0.97, much better than those reported by previous studies. Our method produced interpretable prediction models for diverse patient groups for TBI, which show that different kinds of patients should be evaluated differently for TBI outcome prediction and the odds ratios of some predictor variables differ significantly from those given by previous studies; such results can be valuable to physicians.
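As a reading aid for the two metrics quoted in this abstract, the following is a minimal sketch of how AUC and specificity can be computed from predicted probabilities. It is not CPXR(Log) itself; the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def specificity(labels, scores, threshold=0.5):
    """True-negative rate: share of actual negatives scored below the threshold."""
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    true_negatives = sum(1 for s in negatives if s < threshold)
    return true_negatives / len(negatives)


# Hypothetical toy data (not from the study): 1 = poor outcome, 0 = good outcome.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))          # 0.75
print(specificity(toy_labels, toy_scores))  # 1.0
```

AUC is threshold-free, while specificity depends on the chosen cut-off, so the two figures in the abstract need not come from the same operating point.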
Presentation at the 7th Lithuanian Young Scientists Conference "Operations Research and Applications"
"Kompiuterininkų dienos – 2015" (Computer Scientists' Days 2015), Panevėžys, KTU PTVF, 2013-09-18
This presentation covers a recommender system for question paper prediction using machine learning techniques. We conducted a literature survey and implemented a system using the same techniques.
Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a mult...
GigaScience, BGI Hong Kong
Jie Zheng at the #ICG12 GigaScience Prize Track: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome. ICG12, Shenzhen, 26th October 2017
Poster for Society for Clinical Trials annual meeting in Boston, MA
Abstract
Randomization methods generally are designed to be both unpredictable and balanced between treatment allocations overall and within strata. However, when planning studies, little consideration is given to measuring these characteristics, nor are they examined jointly, and published comparisons between methods often use incompatible metrics and simulation assumptions. Furthermore, for purposes of real-world planning, such simulations often make unrealistic assumptions (e.g., equal sized strata), and summary statistics give limited information.
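To make the two properties named in this abstract concrete, here is a small sketch that measures balance and predictability for one allocation sequence. Permuted-block randomization is used only as a familiar example; the function names, the two-arm setup, and the "guess the arm that is behind" strategy are assumptions for illustration and are not taken from the poster.

```python
import random


def permuted_block(n, block_size, rng):
    """Allocate n participants to arms A/B in shuffled, internally balanced blocks."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    allocations = []
    while len(allocations) < n:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n]


def max_imbalance(allocations):
    """Largest absolute difference (#A - #B) seen at any point in the sequence."""
    diff = worst = 0
    for arm in allocations:
        diff += 1 if arm == "A" else -1
        worst = max(worst, abs(diff))
    return worst


def guess_rate(allocations):
    """Share of correct guesses when always guessing the arm with fewer
    assignments so far; a tie counts as half a correct guess."""
    diff = 0
    correct = 0.0
    for arm in allocations:
        if diff == 0:
            correct += 0.5
        elif (diff < 0) == (arm == "A"):
            correct += 1.0
        diff += 1 if arm == "A" else -1
    return correct / len(allocations)


sequence = permuted_block(20, 4, random.Random(0))
print(max_imbalance(sequence), guess_rate(sequence))
```

Smaller blocks tighten balance but raise the guess rate, which is why the two characteristics are worth examining jointly rather than one at a time.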
Penalized Regressions with Different Tuning Parameter Choosing Criteria and t...
CSCJournals
Recently a great deal of attention has been paid to modern regression methods such as penalized regressions which perform variable selection and coefficient estimation simultaneously, thereby providing new approaches to analyze complex data of high dimension. The choice of the tuning parameter is vital in penalized regression. In this paper, we studied the effect of different tuning parameter choosing criteria on the performances of some well-known penalization methods including ridge, lasso, and elastic net regressions. Specifically, we investigated the widely used information criteria in regression models such as Bayesian information criterion (BIC), Akaike’s information criterion (AIC), and AIC correction (AICc) in various simulation scenarios and a real data example in economic modeling. We found that predictive performance of models selected by different information criteria is heavily dependent on the properties of a data set. It is hard to find a universal best tuning parameter choosing criterion and a best penalty function for all cases. The results in this research provide reference for the choices of different criteria for tuning parameter in penalized regressions for practitioners, which also expands the nascent field of applications of penalized regressions.
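The criteria compared in this abstract can be sketched as follows for a model with Gaussian errors. This uses one common form of AIC, BIC, and AICc with additive constants dropped; the candidate tuning parameters, residual sums of squares, and effective parameter counts below are hypothetical numbers chosen for illustration, not results from the paper.

```python
from math import log


def information_criteria(rss, n, k):
    """AIC, BIC and AICc for a Gaussian model with residual sum of squares rss,
    n observations and k (effective) parameters; constants are dropped."""
    fit_term = n * log(rss / n)
    aic = fit_term + 2 * k
    bic = fit_term + k * log(n)
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    return {"aic": aic, "bic": bic, "aicc": aicc}


def select_tuning_parameter(candidates, n, criterion):
    """Return the tuning parameter whose fitted model minimizes the criterion.
    candidates: iterable of (tuning_parameter, rss, k) tuples."""
    best = min(candidates, key=lambda c: information_criteria(c[1], n, c[2])[criterion])
    return best[0]


# Hypothetical fits: a larger penalty gives fewer effective parameters but a larger RSS.
fits = [(0.01, 40.0, 10), (0.1, 44.0, 6), (1.0, 50.0, 3)]
for name in ("aic", "bic", "aicc"):
    print(name, select_tuning_parameter(fits, n=50, criterion=name))
```

On these made-up numbers AIC prefers the middle penalty while BIC and AICc prefer the strongest one, a small-scale echo of the abstract's point that no criterion is best in every case.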
1 Assignment Quantitative Methods 2 The following ass.docx
teresehearn
Assignment: Quantitative Methods 2
The following assignment is designed to give you a first experience in conducting original empirical
research. You will have to use your own ingenuity in constructing an interesting hypothesis, and finding the
relevant data before conducting the analysis. This exercise should assist you in understanding the statistical
components of the course and in building skills that will be useful in completing your 3rd year dissertations.
The Task
To complete this assignment you must find two or more variables that you believe are related, one of which
is to be explained by the others using an OLS regression, ANOVA or PROBIT regression;
The data may be found anywhere on the web or in books, journals or magazines in the library. You are free
to do any topic that is of interest to you. It need not be directly relevant to your course. The only
limitations are that:
The data must be secondary (it cannot come from a survey that you conduct yourself);
The data must not already have been analysed in a similar way to the way you propose;
The sample size must be at least 25 observations (for all variables)
Your analysis needs to be yours and must be different to others in the class. The same data set
can be used but if two students present essentially the same analysis then this will be looked at
very closely to verify that it is original.
It cannot be from the data sets provided to you on BB.
Python is recommended, but you can use Excel or other software if you wish. Then:
I) Complete a report summarising the analysis; and,
II) Present a copy of the data and a copy of the results of the analysis.
Within the report (I) you must complete the following objectives:
1. Clearly state what hypothesis or hypotheses are to be tested, and write one or two paragraphs
on why you believe that the analysis is worth doing with supporting evidence from literature
(e.g. textbooks, articles or internet).
2. Give an exact source for the data. This must be a verifiable source so that we can check if the
data is genuine.
3. Present a summary analysis of the results, with a formal test of the appropriate hypothesis using
the data.
4. It should contain references if they have been cited within the text.
Failure to attach the data in full along with a verifiable source, or a full set of regression results
that can be clearly read, will result in a mark of 0.
Full criteria for marking are given in the attached Rubric at the end of this document.
Assignments should not exceed 1000 words (excluding graphs, tables and references).
The material necessary to complete this assignment will be completed late in the Autumn Term. Data
should be obtained prior to the Christmas break. However, you will need to submit the assignment
electronically, and it will be due on Monday 14/1/2019.
Note that you should expect your marks back on the Friday 01.
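For orientation, the formal test asked for in objective 3 can be sketched for the simplest case: an OLS regression with one explanatory variable and a t-test of the slope. This is an illustrative outline only; the function name and the five-point toy data are assumptions, and a real submission needs at least 25 observations from a verifiable source.

```python
from math import sqrt


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares and return the slope's t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the standard error of the slope.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    dof = n - 2
    se_slope = sqrt(rss / dof / sxx)
    # A perfect fit has zero standard error; report an infinite t-statistic then.
    t_stat = slope / se_slope if se_slope > 0 else float("inf")
    return {"slope": slope, "intercept": intercept, "t_stat": t_stat, "dof": dof}


# Toy data for illustration only (too small for the assignment itself).
result = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

To test the null hypothesis that the slope is zero, compare the t-statistic with the critical value of the t-distribution with `dof` degrees of freedom at the chosen significance level.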
WSDM2018 Reading Group
2018-04-14 @ Cookpad
https://atnd.org/events/95510
Offline A/B Testing for Recommender Systems
A. Gilotte, C. Calauzénes, T. Nedelec, A. Abraham, and S. Dollé
https://doi.org/10.1145/3159652.3159687
Recommendation Independence
The 1st Conference on Fairness, Accountability, and Transparency
Article @ Official Site: http://proceedings.mlr.press/v81/kamishima18a.html
Conference site: https://fatconference.org/2018/
Abstract:
This paper studies a recommendation algorithm whose outcomes are not influenced by specified information. It is useful in contexts where potentially unfair decisions should be avoided, such as job-applicant recommendations that are not influenced by socially sensitive information. An algorithm that could exclude the influence of sensitive information would thus be useful for job-matching with fairness. We call the condition between a recommendation outcome and a sensitive feature Recommendation Independence, which is formally defined as statistical independence between the outcome and the feature. Our previous independence-enhanced algorithms simply matched the means of predictions between sub-datasets consisting of the same sensitive value. However, this approach could not remove the sensitive information represented by the second or higher moments of distributions. In this paper, we develop new methods that can deal with the second moment, i.e., variance, of recommendation outcomes without increasing the computational complexity. These methods can more strictly remove the sensitive information, and experimental results demonstrate that our new algorithms can more effectively eliminate the factors that undermine fairness. Additionally, we explore potential applications for independence-enhanced recommendation, and discuss its relation to other concepts, such as recommendation diversity.
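The idea of matching the first and second moments of predictions across sensitive groups can be illustrated with a naive post-processing step. This is not the algorithm proposed in the paper, which builds the constraint into model training; the function names and toy scores are assumptions, and equal means and variances are necessary for statistical independence but do not guarantee it.

```python
from statistics import fmean, pstdev


def group_moments(scores, sensitive):
    """Mean and (population) standard deviation of scores per sensitive value."""
    groups = {}
    for score, s in zip(scores, sensitive):
        groups.setdefault(s, []).append(score)
    return {s: (fmean(values), pstdev(values)) for s, values in groups.items()}


def match_first_two_moments(scores, sensitive):
    """Rescale each group's scores so every group shares the overall mean and
    standard deviation (a naive illustration, applied after prediction)."""
    target_mean, target_std = fmean(scores), pstdev(scores)
    moments = group_moments(scores, sensitive)
    adjusted = []
    for score, s in zip(scores, sensitive):
        mean, std = moments[s]
        z = (score - mean) / std if std > 0 else 0.0
        adjusted.append(target_mean + target_std * z)
    return adjusted


# Hypothetical predicted ratings for two groups of a binary sensitive feature.
scores = [4.0, 4.5, 5.0, 2.0, 3.0, 4.0]
sensitive = [0, 0, 0, 1, 1, 1]
print(group_moments(scores, sensitive))
print(group_moments(match_first_two_moments(scores, sensitive), sensitive))
```

Matching only the means would leave the two groups with different spreads, which is the kind of residual sensitive information the abstract attributes to the second moment.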
Considerations on Recommendation Independence for a Find-Good-Items Task
Toshihiro Kamishima
Workshop on Responsible Recommendation (FATREC), in conjunction with RecSys2017
Article @ Official Site: http://doi.org/10.18122/B2871W
Workshop Homepage: https://piret.gitlab.io/fatrec/
This paper examines the notion of recommendation independence, which is a constraint that a recommendation result is independent from specific information. This constraint is useful in ensuring adherence to laws and regulations, fair treatment of content providers, and exclusion of unwanted information. For example, to make a job-matching recommendation socially fair, the matching should be independent of socially sensitive information, such as gender or race. We previously developed several recommenders satisfying recommendation independence, but these were all designed for a predicting-ratings task, whose goal is to predict a score that a user would rate. We here focus on another find-good-items task, which aims to find some items that a user would prefer. In this task, scores representing the degree of preference to items are first predicted, and some items having the largest scores are displayed in the form of a ranked list. We developed a preliminary algorithm for this task through a naive approach, enhancing independence between a preference score and sensitive information. We empirically show that although this algorithm can enhance independence of a preference score, it is not fit for the purpose of enhancing independence in terms of a ranked list. This result indicates the need for inventing a notion of independence that is suitable for use with a ranked list and that is applicable for completing a find-good-items task.
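The gap between independence of scores and independence of a ranked list can be seen in a toy case. The sketch below is an assumption-laden illustration, not the paper's experiment: the function name, the scores, and the choice of k are hypothetical. Both groups have the same mean score, yet the top of the list is filled by one group.

```python
from statistics import fmean


def top_k_share(scores, sensitive, group, k):
    """Fraction of the k highest-scored items that belong to the given group."""
    ranked = sorted(zip(scores, sensitive), key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, s in ranked[:k] if s == group) / k


# Hypothetical preference scores: group 0 is tightly clustered, group 1 is spread out.
scores = [3, 3, 3, 3, 1, 5, 1, 5]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]

mean_0 = fmean(s for s, g in zip(scores, sensitive) if g == 0)
mean_1 = fmean(s for s, g in zip(scores, sensitive) if g == 1)
print(mean_0, mean_1)                                  # equal group means
print(top_k_share(scores, sensitive, group=1, k=2))    # 1.0
```

Group 1 makes up half of the items but all of the top-2 list, so a measure defined on scores alone can miss what a user actually sees in a ranked list.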
Model-based Approaches for Independence-Enhanced Recommendation
Toshihiro Kamishima
IEEE International Workshop on Privacy Aspects of Data Mining (PADM), in conjunction with ICDM2016
Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2016.0127
Workshop Homepage: http://pddm16.eurecat.org/
Abstract:
This paper studies a new approach to enhance recommendation independence. Such approaches are useful in ensuring adherence to laws and regulations, fair treatment of content providers, and exclusion of unwanted information. For example, recommendations that match an employer with a job applicant should not be based on socially sensitive information, such as gender or race, from the perspective of social fairness. An algorithm that could exclude the influence of such sensitive information would be useful in this case. We previously gave a formal definition of recommendation independence and proposed a method adopting a regularizer that imposes such an independence constraint. As no other options than this regularization approach have been put forward, we here propose a new model-based approach, which is based on a generative model that satisfies the constraint of recommendation independence. We apply this approach to a latent class model and empirically show that the model-based approach can enhance recommendation independence. Recommendation algorithms based on generative models, such as topic models, are important, because they have a flexible functionality that enables them to incorporate a wide variety of information types. Our new model-based approach will broaden the applications of independence-enhanced recommendation by integrating the functionality of generative models.
KDD2016 Study Group https://atnd.org/events/80771
Paper: “Why Should I Trust You?”: Explaining the Predictions of Any Classifier
Authors: M. T. Ribeiro, S. Singh, and C. Guestrin
Paper link: http://www.kdd.org/kdd2016/subtopic/view/why-should-i-trust-you-explaining-the-predictions-of-any-classifier
WSDM2016 Study Group https://atnd.org/events/74341
Paper: Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior
Authors: F. Kooti, K. Lerman, L. M. Aiello, M. Grbovic, and N. Djuric
Paper link: http://dx.doi.org/10.1145/2835776.2835831
Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, and Theoretical Aspects
Toshihiro Kamishima
Invited Talk @ Workshop on Fairness, Accountability, and Transparency in Machine Learning
In conjunction with the ICML 2015 @ Lille, France, Jul. 11, 2015
Web Site: http://www.kamishima.net/fadm/
Handnote: http://www.kamishima.net/archive/2015-ws-icml-HN.pdf
The goal of fairness-aware data mining (FADM) is to analyze data while taking into account potential issues of fairness. In this talk, we will cover three topics in FADM:
1. Fairness in a Recommendation Context: In classification tasks, the term "fairness" is regarded as anti-discrimination. We will present other types of problems related to the fairness in a recommendation context.
2. What is Fairness: Most formal definitions of fairness have a connection with the notion of statistical independence. We will explore other types of formal fairness based on causality, agreement, and unfairness.
3. Theoretical Problems of FADM: After reviewing technical and theoretical open problems in the FADM literature, we will introduce the theory of the generalization bound in terms of accuracy as well as fairness.
Joint work with Jun Sakuma, Shotaro Akaho, and Hideki Asoh
Correcting Popularity Bias by Enhancing Recommendation Neutrality
Toshihiro Kamishima
The 8th ACM Conference on Recommender Systems, Poster
Article @ Official Site: http://ceur-ws.org/Vol-1247/
Article @ Personal Site: http://www.kamishima.net/archive/2014-po-recsys-print.pdf
Abstract:
In this paper, we attempt to correct a popularity bias, which is the tendency for popular items to be recommended more frequently, by enhancing recommendation neutrality. Recommendation neutrality involves excluding specified information from the prediction process of recommendation. This neutrality was formalized as the statistical independence between a recommendation result and the specified information, and we developed a recommendation algorithm that satisfies this independence constraint. We correct the popularity bias by enhancing neutrality with respect to information regarding whether candidate items are popular or not. We empirically show that a popularity bias in the predicted preference scores can be corrected.
The Independence of Fairness-aware Classifiers
IEEE International Workshop on Privacy Aspects of Data Mining (PADM), in conjunction with ICDM2013
Article @ Official Site:
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-icdm-print.pdf
Handnote : http://www.kamishima.net/archive/2013-ws-icdm-HN.pdf
Program codes : http://www.kamishima.net/fadm/
Workshop Homepage: http://www.cs.cf.ac.uk/padm2013/
Abstract:
Due to the spread of data mining technologies, such technologies are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair in sensitive features, such as race, gender, religion, and so on. The goal of fairness-aware classifiers is to classify data while taking into account the potential issues of fairness, discrimination, neutrality, and/or independence. In this paper, after reviewing fairness-aware classification methods, we focus on one such method, Calders and Verwer's two-naive-Bayes method. This method has been shown superior to the other classifiers in terms of fairness, which is formalized as the statistical independence between a class and a sensitive feature. However, the cause of the superiority is unclear, because it utilizes a somewhat heuristic post-processing technique rather than an explicitly formalized model. We clarify the cause by comparing this method with an alternative naive Bayes classifier, which is modified by a modeling technique called "hypothetical fair-factorization." This investigation reveals the theoretical background of the two-naive-Bayes method and its connections with other methods. Based on these findings, we develop another naive Bayes method with an "actual fair-factorization" technique and empirically show that this new method can achieve an equal level of fairness as that of the two-naive-Bayes classifier.
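If a predicted class is statistically independent of a sensitive feature, the rate of positive predictions must be the same in every group. The sketch below computes the gap between those rates as one simple diagnostic; it is not the two-naive-Bayes method or either fair-factorization technique, and the function name, group labels, and toy predictions are assumptions for illustration.

```python
def positive_rate_gap(predictions, sensitive, protected):
    """Positive-prediction rate of the non-protected group minus that of the
    protected group; 0.0 means the two rates are equal."""
    protected_preds = [p for p, s in zip(predictions, sensitive) if s == protected]
    other_preds = [p for p, s in zip(predictions, sensitive) if s != protected]
    rate_protected = sum(protected_preds) / len(protected_preds)
    rate_other = sum(other_preds) / len(other_preds)
    return rate_other - rate_protected


# Hypothetical binary predictions (1 = favourable decision) and a binary sensitive feature.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
sensitive = ["m", "m", "m", "m", "f", "f", "f", "f"]
print(positive_rate_gap(predictions, sensitive, protected="f"))  # 0.5
```

A gap of zero is necessary for independence between a binary class and a binary sensitive feature, and in that two-by-two case it is also sufficient.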
Efficiency Improvement of Neutrality-Enhanced Recommendation
Toshihiro Kamishima
Workshop on Human Decision Making in Recommender Systems, in conjunction with RecSys 2013
Article @ Official Site: http://ceur-ws.org/Vol-1050/
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-recsys-print.pdf
Handnote : http://www.kamishima.net/archive/2013-ws-recsys-HN.pdf
Program codes : http://www.kamishima.net/inrs/
Workshop Homepage: http://recex.ist.tugraz.at/RecSysWorkshop/
Abstract:
This paper proposes an algorithm for making recommendations so that neutrality from a viewpoint specified by the user is enhanced. This algorithm is useful for avoiding decisions based on biased information. Such a problem is pointed out as the filter bubble, which is the influence in social decisions biased by personalization technologies. To provide a neutrality-enhanced recommendation, we must first assume that a user can specify a particular viewpoint from which the neutrality can be applied, because a recommendation that is neutral from all viewpoints is no longer a recommendation. Given such a target viewpoint, we implement an information-neutral recommendation algorithm by introducing a penalty term to enforce statistical independence between the target viewpoint and a rating. We empirically show that our algorithm enhances the independence from the specified viewpoint.
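One way to quantify how far a rating is from being statistically independent of a viewpoint is the empirical mutual information between the two, which is zero exactly when the observed joint distribution factorizes. The sketch below is an illustration under that assumption; the abstract does not state which independence measure the penalty term uses, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    return sum(
        (c / n) * log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical binarized ratings (1 = liked) against a binary viewpoint feature.
viewpoint = [0, 0, 1, 1]
print(mutual_information(viewpoint, [0, 1, 0, 1]))  # 0.0: rating ignores the viewpoint
print(mutual_information(viewpoint, [0, 0, 1, 1]))  # log(2): rating is fully determined
```

A penalty of this kind would be added to the usual prediction loss, so that training trades some accuracy for a smaller dependence on the chosen viewpoint.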
Absolute and Relative Clustering
4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering (Multiclust 2013)
Aug. 11, 2013 @ Chicago, U.S.A, in conjunction with KDD2013
Article @ Official Site: http://dx.doi.org/10.1145/2501006.2501013
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-kdd-print.pdf
Handnote: http://www.kamishima.net/archive/2013-ws-kdd-HN.pdf
Workshop Homepage: http://cs.au.dk/research/research-areas/data-intensive-systems/projects/multiclust2013/
Abstract:
Research into (semi-)supervised clustering has been increasing. Supervised clustering aims to group similar data that are partially guided by the user's supervision. In this supervised clustering, there are many choices for formalization. For example, as a type of supervision, one can adopt labels of data points, must/cannot links, and so on. Given a real clustering task, such as grouping documents or image segmentation, users must confront the question "How should we mathematically formalize our task?" To help answer this question, we propose the classification of real clusterings into absolute and relative clusterings, which are defined based on the relationship between the resultant partition and the data set to be clustered. This categorization can be exploited to choose a type of task formalization.
Consideration on Fairness-aware Data Mining
IEEE International Workshop on Discrimination and Privacy-Aware Data Mining (DPADM 2012)
Dec. 10, 2012 @ Brussels, Belgium, in conjunction with ICDM2012
Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2012.101
Article @ Personal Site: http://www.kamishima.net/archive/2012-ws-icdm-print.pdf
Handnote: http://www.kamishima.net/archive/2012-ws-icdm-HN.pdf
Workshop Homepage: https://sites.google.com/site/dpadm2012/
Abstract:
With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair regarding sensitive features such as race, gender, religion, and so on. Several researchers have recently begun to develop fairness-aware or discrimination-aware data mining techniques that take into account issues of social fairness, discrimination, and neutrality. In this paper, after demonstrating the applications of these techniques, we explore the formal concepts of fairness and techniques for handling fairness in data mining. We then provide an integrated view of these concepts based on statistical independence. Finally, we discuss the relations between fairness-aware data mining and other research topics, such as privacy-preserving data mining or causal inference.
Fairness-aware Classifier with Prejudice Remover Regularizer
Toshihiro Kamishima
Proceedings of the European Conference on Machine Learning and Principles of Knowledge Discovery in Databases (ECMLPKDD), Part II, pp.35-50 (2012)
Article @ Official Site: http://dx.doi.org/10.1007/978-3-642-33486-3_3
Article @ Personal Site: http://www.kamishima.net/archive/2012-p-ecmlpkdd-print.pdf
Handnote: http://www.kamishima.net/archive/2012-p-ecmlpkdd-HN.pdf
Program codes : http://www.kamishima.net/fadm/
Conference Homepage: http://www.ecmlpkdd2012.net/
Abstract:
With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair in sensitive features, such as race, gender, religion, and so on. Several researchers have recently begun to attempt the development of analysis techniques that are aware of social fairness or discrimination. They have shown that simply avoiding the use of sensitive features is insufficient for eliminating biases in determinations, due to the indirect influence of sensitive information. In this paper, we first discuss three causes of unfairness in machine learning. We then propose a regularization approach that is applicable to any prediction algorithm with probabilistic discriminative models. We further apply this approach to logistic regression and empirically show its effectiveness and efficiency.
Enhancement of the Neutrality in Recommendation
Workshop on Human Decision Making in Recommender Systems, in conjunction with RecSys 2012
Article @ Official Site: http://ceur-ws.org/Vol-893/
Article @ Personal Site: http://www.kamishima.net/archive/2012-ws-recsys-print.pdf
Handnote : http://www.kamishima.net/archive/2012-ws-recsys-HN.pdf
Program codes : http://www.kamishima.net/inrs
Workshop Homepage: http://recex.ist.tugraz.at/RecSysWorkshop2012
Abstract:
This paper proposes an algorithm for making recommendations so that neutrality toward a viewpoint specified by the user is enhanced. This algorithm is useful for avoiding decisions based on biased information. Such a problem has been pointed out as the filter bubble, which is the influence on social decisions biased by personalization technology. To provide such recommendations, we assume that a user specifies a viewpoint toward which the user wants to enforce neutrality, because a recommendation that is neutral with respect to all information is no longer a recommendation. Given such a target viewpoint, we implemented an information-neutral recommendation algorithm by introducing a penalty term to enforce statistical independence between the target viewpoint and a preference score. We empirically show that our algorithm enhances independence toward the specified viewpoint and then demonstrate how sets of recommended items are changed.
Fairness-aware Learning through Regularization Approach
The 3rd IEEE International Workshop on Privacy Aspects of Data Mining (PADM 2011)
Dec. 11, 2011 @ Vancouver, Canada, in conjunction with ICDM2011
Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2011.83
Article @ Personal Site: http://www.kamishima.net/archive/2011-ws-icdm_padm.pdf
Handnote: http://www.kamishima.net/archive/2011-ws-icdm_padm-HN.pdf
Workshop Homepage: http://www.zurich.ibm.com/padm2011/
Abstract:
With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect people's lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be socially and legally fair from a viewpoint of social responsibility; namely, it must be unbiased and nondiscriminatory in sensitive features, such as race, gender, religion, and so on. Several researchers have recently begun to attempt the development of analysis techniques that are aware of social fairness or discrimination. They have shown that simply avoiding the use of sensitive features is insufficient for eliminating biases in determinations, due to the indirect influence of sensitive information. From a privacy-preserving viewpoint, this can be interpreted as hiding sensitive information when classification results are observed. In this paper, we first discuss three causes of unfairness in machine learning. We then propose a regularization approach that is applicable to any prediction algorithm with probabilistic discriminative models. We further apply this approach to logistic regression and empirically show its effectiveness and efficiency.
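The regularization idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' regularizer: as a stand-in assumption, the penalty below is the squared covariance between the predicted probability and a binary sensitive feature, and the names `penalized_log_loss` and `eta` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def penalized_log_loss(w, X, y, s, eta):
    """Logistic-regression loss plus a simple dependence penalty.

    Assumption: the penalty is the squared covariance between the
    predicted probability and the sensitive feature s. This is a
    stand-in chosen for brevity, not the regularizer from the paper.
    """
    p = sigmoid(X @ w)
    eps = 1e-12  # guards against log(0)
    log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    cov = np.mean((p - p.mean()) * (s - s.mean()))
    return log_loss + eta * cov ** 2


# Tiny made-up data set: 4 rows, 2 features, binary label y, binary sensitive s.
X = np.array([[1.0, 0.2], [1.0, -0.4], [1.0, 1.5], [1.0, -1.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
s = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.1, 0.8])

plain = penalized_log_loss(w, X, y, s, eta=0.0)
fair = penalized_log_loss(w, X, y, s, eta=10.0)
```

With `eta=0.0` the function reduces to the ordinary log loss; a larger `eta` trades accuracy for weaker dependence between the prediction and `s`.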
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round-table discussion of vector databases, unstructured data, AI, big data, real-time, robots, and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
4. 4
RecSys ’18, October 2–7, 2018, Vancouver, BC, Canada
[Figure: plate diagram of the full model, with a plate over users $u \in U$ and a plate over algorithms $a \in A$.]
5. 5
[Figure: user-profile part of the plate diagram ($u \in U$), annotated with the distributions below.]
$n_u \sim \mathrm{NegBinomial}(\nu, \gamma)$
$y_u \sim \mathrm{Binomial}(n_u, \theta_u)$
$\mathrm{logit}(\theta_u) \sim \mathrm{Normal}(\mu, \sigma)$
6. 6
[Figure: recommendation-list part of the plate diagram ($a \in A$), annotated with the distribution below.]
$\mathrm{logit}(\bar{\theta}_{ua}) \sim \mathrm{Normal}(b_a + s_a \, \mathrm{logit}(\theta_u), \sigma_a^2)$
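The distributions named on slide 5 (a negative binomial for profile sizes, a normal on the log-odds scale for user tendencies, and a binomial for the count of female-author books) can be read as a generative process. The sketch below simulates it under stated assumptions: the parameter values are made up, the function name `simulate_profiles` is hypothetical, and NegBinomial(ν, γ) is mapped directly onto NumPy's (n, p) parameterization, which may differ from the one used on the slides.

```python
import numpy as np


def simulate_profiles(num_users, nu, gamma, mu, sigma, seed=0):
    """Draw synthetic user profiles from the hierarchical model.

    Returns (n, theta, y): profile sizes, per-user tendencies, and
    counts of female-author books in each profile.
    """
    rng = np.random.default_rng(seed)
    # Profile sizes. Assumption: (nu, gamma) are used as NumPy's (n, p).
    n = rng.negative_binomial(nu, gamma, size=num_users)
    # User tendency is normal on the log-odds scale, then mapped to (0, 1).
    log_odds = rng.normal(mu, sigma, size=num_users)
    theta = 1.0 / (1.0 + np.exp(-log_odds))
    # Number of female-author books among the n rated books.
    y = rng.binomial(n, theta)
    return n, theta, y


# Hypothetical parameter values, chosen only for illustration.
n, theta, y = simulate_profiles(1000, nu=2.0, gamma=0.1, mu=-0.3, sigma=1.0)
```

A negative `mu` places the typical user below an even balance, which is the direction the slides report for the rating data.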
7. 7
btain author information from
(VIAF)3, a directory of author
ity records from the Library of
und the world. Author gender
s for many records.
mployed by the VIAF is flexible
ender identities, supporting an
es for the validity of an identity.
se flexibility: all its assertions
This is a significant limitation
on 5.1.
book data with rating data by
ve data linking coverage, and
works instead of individual edi-
m a bipartite graph of ISBNs and
“edition” records, and OpenLi-
e) and consider each connected
ess than 1% of ratings) this caus-
or a book; we resolve multiple
ir ratings.
VIAF do not share linking iden-
hority records by author name.
ontain multiple name entries,
izations of the author’s name.
arry multiple known forms of
ng names to improve matching
ng both “Last, First” and “First
e all VIAF records containing a
d names for the first author of
n a book’s cluster. If all records
hor’s gender agree, we take that
ontradicting gender statements,
as “ambiguous”.
ure good coverage while main-
Table 2: Summary of rating data
Statistic | BookCrossing | Amazon
Ratings | 1,149,780 | 22,507,155
Users | 105,283 | 8,026,324
Rated ISBNs/ASINs | 340,554 | 2,330,066
Rated ‘Books’ | 295,935 | 2,286,656
Matched Books | 240,255 | 1,083,066
Known-Gender Books | 166,928 | 616,317
Female-Author Books | 66,524 | 181,850
Male-Author Books | 100,404 | 434,467
% Female Books | 39.9% | 29.5%
% Female Ratings | 45.3% | 36.2%
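As a quick consistency check on Table 2, the "% Female Books" row can be recomputed from the female-author and male-author counts. The sketch assumes, as the numbers suggest, that the percentage is taken over known-gender books only; the names `COUNTS` and `pct_female` are introduced here for illustration.

```python
# Counts copied from Table 2.
COUNTS = {
    "BookCrossing": {"female": 66_524, "male": 100_404, "known": 166_928},
    "Amazon": {"female": 181_850, "male": 434_467, "known": 616_317},
}


def pct_female(row):
    """Share of known-gender books with a female author, in percent."""
    return 100.0 * row["female"] / (row["female"] + row["male"])


for name, row in COUNTS.items():
    # Female and male counts should add up to the known-gender total.
    assert row["female"] + row["male"] == row["known"]
    print(f"{name}: {pct_female(row):.1f}% female-author books")
```

Both values agree with the table after rounding to one decimal place.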
[Figure 1 panels: BXA, BXE, LOC, AZ. x-axis: Linking Result (female, male, ambiguous, unknown, unlinked); y-axis: Coverage Percent; series (Scope): Books, Ratings.]
Figure 1: Results of data linking and gender resolution. LOC
is the set of books with Library of Congress records; other
panes are the results of linking rating data.
8. 8
dependent
TAN 2.17.3
each per-
We report
arameters
h existing
acterizing
nalyze the
Tables 1–
sample of
nders are
in our cat-
has a more
ookCross-
wn-gender
oportions
(est. sd log odds) 1.03 1.11 1.77
Posterior Mean 0.42 0.40 0.37
Std. Dev. 0.23 0.23 0.28
[Figure 4 panels: AZ, BXA, BXE. x-axis: Proportion of Female Authors (0.00 to 1.00); y-axis: Density; legend (Method): Estimated θ, Observed y/n, Predicted y/n.]
Figure 4: Distribution of user author-gender tendencies. Histogram shows observed proportions; lines show kernel densities of estimated tendencies (θ′) along with observed and predicted proportions.
and Figure 4 shows the distribution of observed author gender
9. 9
Algorithm | Users | Dist. Items | % Dist. | Users | Dist. Items | % Dist. | Users | Dist. Items | % Dist. | Users | Dist. Items | % Dist.
Profile | 1,000 | 35,187 | 66.5 | 1,000 | 24,913 | 73.6 | 1,000 | 27,525 | 88.2 | 1,000 | 27,525 | 88.2
UserUser | 1,000 | 6,007 | 12.0 | 988 | 6,235 | 12.7 | 1,000 | 15,343 | 30.7 | 939 | 25,853 | 55.1
ItemItem | 1,000 | 21,282 | 42.6 | 997 | 10,174 | 20.4 | 999 | 33,363 | 67.7 | 999 | 22,360 | 45.6
MF | 1,000 | 140 | 0.3 | 1,000 | 264 | 0.5 | 1,000 | 164 | 0.3 | 1,000 | 651 | 1.3
PF | 1,000 | 1,506 | 3.0 | 1,000 | 4,105 | 8.2 | 1,000 | 2,746 | 5.4 | 1,000 | 3,538 | 7.0
[Figure 5 panels: columns AZ (Explicit), AZ (Implicit), BXA, BXE; rows UserUser, ItemItem, MF, PF. x-axis: Proportion of Books by Female Authors; y-axis: Density; legends: Mean (Algorithm, Popular, Profile) and Method (Observed, Predicted).]
Figure 5: Posterior densities of recommender biases from integrated regression model.
proportions. The ripples in predicted and observed proportions are due to the commonality of 5-item user profiles, for which there are only 6 possible proportions; estimated tendency (θ) smooths them out. This smoothing, along with avoiding estimated extreme biases based on limited data, is why we find it useful to estimate tendency instead of directly computing statistics on observed proportions. To support direct comparison of the densities of observations and predictions, we resampled observed proportions with replacement to yield 10,000 observations.
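Two points in this paragraph are easy to check numerically: a 5-item profile admits only 6 distinct proportions, and resampling with replacement can bring a set of observed proportions up to 10,000 draws. The sketch below illustrates both; the function names and the example input values are hypothetical, and NumPy's `Generator.choice` stands in for whatever resampling tool was actually used.

```python
from fractions import Fraction

import numpy as np


def possible_proportions(profile_size):
    """All proportions k/n that a profile of n items can take."""
    return [Fraction(k, profile_size) for k in range(profile_size + 1)]


def resample_with_replacement(observed, size=10_000, seed=0):
    """Bootstrap-style resample of observed proportions."""
    rng = np.random.default_rng(seed)
    values = np.asarray(observed, dtype=float)
    return rng.choice(values, size=size, replace=True)


five_item = possible_proportions(5)  # 0, 1/5, 2/5, 3/5, 4/5, 1
# Made-up observed proportions, for illustration only.
sample = resample_with_replacement([0.0, 0.2, 0.4, 0.4, 0.6, 1.0])
```

Because so many profiles land on these few values, a density estimate of raw proportions shows ripples at multiples of 1/5, which is the effect the paragraph describes.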
We observe a population tendency to rate male authors more frequently than female authors in all data sets (µ < 0), but to rate female authors more frequently than they would be rated were users drawing books uniformly at random from the available set. The average user author-gender tendency is slightly closer to an even balance than the set of rated books. We also found a large diversity amongst users about their estimated tendencies (s.d. of
Table 6: Mean / SD of recommendation list female author proportions.
Algorithm | BXA | BXE | AZ (Implicit) | AZ (Explicit)
Popular | 0.458 | 0.500 | 0.364 | 0.364
Rating | - | 0.383 | - | 0.222
UserUser | 0.399 / 0.180 | 0.435 / 0.190 | 0.315 / 0.186 | 0.367 / 0.278
ItemItem | 0.465 / 0.200 | 0.348 / 0.124 | 0.351 / 0.245 | 0.389 / 0.336
MF | 0.134 / 0.027 | 0.334 / 0.039 | 0.468 / 0.079 | 0.418 / 0.124
PF | 0.372 / 0.208 | 0.429 / 0.177 | 0.374 / 0.144 | 0.394 / 0.177
basic coverage statistics of these algorithms along with corresponding user profile statistics. Users for which an algorithm could not produce recommendations are rare. We also computed the extent to which algorithms recommend different items to different users; “% Dist.” is the percentage of all recommendations that were distinct items. Algorithms that repeatedly recommend the same items will
10. 10
BXE
-0.139 0.162 0.906 -0.573 0.129 0.531 -0.652 0.002 0.161 -0.166 0.298 0.772
(-0.20,-0.08) (0.10,0.22) (0.87,0.95) (-0.61,-0.54) (0.09,0.16) (0.51,0.56) (-0.66,-0.64) (-0.01,0.01) (0.15,0.17) (-0.22,-0.11) (0.25,0.35) (0.74,0.81)
AZ (Implicit)
-0.127 0.688 0.715 0.094 0.863 0.895 -0.244 0.011 0.364 -0.224 0.287 0.537
(-0.19,-0.06) (0.65,0.73) (0.68,0.76) (0.02,0.17) (0.81,0.92) (0.84,0.95) (-0.27,-0.22) (-0.00,0.02) (0.35,0.38) (-0.26,-0.18) (0.26,0.31) (0.51,0.56)
AZ (Explicit)
-0.580 0.322 0.681 -0.380 0.438 0.852 -0.117 0.006 0.273 -0.403 0.141 0.525
(-0.63,-0.53) (0.29,0.35) (0.65,0.71) (-0.44,-0.32) (0.40,0.48) (0.81,0.89) (-0.14,-0.10) (-0.00,0.02) (0.26,0.29) (-0.44,-0.37) (0.12,0.16) (0.50,0.55)
[Figure 6 panels: columns AZ (Explicit), AZ (Implicit), BXA, BXE; rows UserUser, ItemItem, MF, PF. x-axis: Profile Proportion of Female Authors; y-axis: Recommender Proportion of Female Authors.]
Figure 6: Scatter plots and regression curves for recommender response to individual users.
more concentrated. In the BookCrossing data, it tends to favor male authors more than the underlying data would support; in implicit feedback mode, it is highly biased towards male authors with respect even to the baseline distributions.
4.4 From Profiles to Recommendations
Our extended Bayesian model (Section 3.4.2) allows us to address RQ4: the extent to which our algorithms propagate individual users’ tendencies into their recommendations.
Figure 5 shows the posterior predictive and observed densities of recommender author-gender tendencies, and Figure 6 shows scatter plots of observed recommendation proportions against user profile proportions with regression curves (regression lines in log-
place. Visual inspection of the scatter plot suggests that there is a
strong component with consistent tendencies, but the regression
may accurately model the remaining users. Future work will use a
model that can better account for some global consistency.
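The regression curves discussed here can be sketched numerically under one assumption: that the regression is linear on the log-odds (logit) scale, with an intercept and a slope per algorithm. The function names and coefficient values below are hypothetical and are not taken from the tables.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: maps log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_rec_proportion(profile_prop, intercept, slope):
    """Expected female-author proportion in a recommendation list.

    Assumption: a line in log-odds space, which appears as a curve
    when plotted on the proportion scale.
    """
    return inv_logit(intercept + slope * logit(profile_prop))


# Hypothetical coefficients: a slope near 0 ignores the user's profile,
# while a slope near 1 reproduces the profile balance.
for prop in (0.1, 0.5, 0.9):
    print(prop, round(predicted_rec_proportion(prop, -0.2, 0.5), 3))
```

Under this reading, the slope measures how strongly an algorithm propagates a user's tendency into the recommendations, and the intercept captures a shift that applies to every user.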
4.5 Summary
RQ1 (Baseline Gender Distribution): Known books are significantly more likely to be written by men than by women; representation among rated books is more balanced.
RQ2 (User Input Gender Distributions): Users are diffuse in their rating tendencies, with an overall trend favoring male authors but less strongly than the baseline distribution.
RQ3 (Recommender Output Distributions): Different CF