Enhancement of the Neutrality in Recommendation
Workshop on Human Decision Making in Recommender Systems, in conjunction with RecSys 2012
Article @ Official Site: http://ceur-ws.org/Vol-893/
Article @ Personal Site: http://www.kamishima.net/archive/2012-ws-recsys-print.pdf
Handnote : http://www.kamishima.net/archive/2012-ws-recsys-HN.pdf
Program codes : http://www.kamishima.net/inrs
Workshop Homepage: http://recex.ist.tugraz.at/RecSysWorkshop2012
Abstract:
This paper proposes an algorithm for making recommendations so that neutrality toward a viewpoint specified by the user is enhanced. This algorithm is useful for avoiding decisions based on biased information. Such a problem has been pointed out as the filter bubble: the biasing influence of personalization technologies on social decisions. To provide such recommendations, we assume that the user specifies a viewpoint toward which neutrality should be enforced, because a recommendation that is neutral with respect to all information is no longer a recommendation. Given such a target viewpoint, we implement an information-neutral recommendation algorithm by introducing a penalty term that enforces statistical independence between the target viewpoint and a preference score. We empirically show that our algorithm enhances independence toward the specified viewpoint, and then demonstrate how the sets of recommended items change.
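To make the penalty term concrete, here is a rough Python sketch of the kind of neutrality penalty the abstract describes: a histogram-based estimate of the mutual information between predicted preference scores and a binary viewpoint variable. The binning scheme and the estimator are illustrative assumptions, not the paper's exact formulation; during training, this penalty would be added to the squared prediction error, weighted by a neutrality parameter eta.

```python
import numpy as np

def neutrality_penalty(scores, v, n_bins=10):
    """Rough estimate of the mutual information I(score; v).

    Sketch only: scores are discretized into quantile bins and MI is
    computed from the empirical joint histogram; the penalty is zero
    when scores are statistically independent of the viewpoint v.
    """
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1)[1:-1])
    b = np.digitize(scores, edges)
    mi = 0.0
    for bv in np.unique(b):
        for vv in (0, 1):
            p_joint = np.mean((b == bv) & (v == vv))
            if p_joint > 0:
                mi += p_joint * np.log(
                    p_joint / (np.mean(b == bv) * np.mean(v == vv)))
    return mi
```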
Efficiency Improvement of Neutrality-Enhanced Recommendation
Workshop on Human Decision Making in Recommender Systems, in conjunction with RecSys 2013
Article @ Official Site: http://ceur-ws.org/Vol-1050/
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-recsys-print.pdf
Handnote : http://www.kamishima.net/archive/2013-ws-recsys-HN.pdf
Program codes : http://www.kamishima.net/inrs/
Workshop Homepage: http://recex.ist.tugraz.at/RecSysWorkshop/
Abstract:
This paper proposes an algorithm for making recommendations so that neutrality from a viewpoint specified by the user is enhanced. This algorithm is useful for avoiding decisions based on biased information. Such a problem has been pointed out as the filter bubble: the biasing influence of personalization technologies on social decisions. To provide a neutrality-enhanced recommendation, we must first assume that the user specifies a particular viewpoint toward which neutrality should be enforced, because a recommendation that is neutral from all viewpoints is no longer a recommendation. Given such a target viewpoint, we implement an information-neutral recommendation algorithm by introducing a penalty term to enforce statistical independence between the target viewpoint and a rating. We empirically show that our algorithm enhances independence from the specified viewpoint.
Model-based Approaches for Independence-Enhanced Recommendation
IEEE International Workshop on Privacy Aspects of Data Mining (PADM), in conjunction with ICDM2016
Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2016.0127
Workshop Homepage: http://pddm16.eurecat.org/
Abstract:
This paper studies a new approach to enhance recommendation independence. Such approaches are useful in ensuring adherence to laws and regulations, fair treatment of content providers, and exclusion of unwanted information. For example, recommendations that match an employer with a job applicant should not be based on socially sensitive information, such as gender or race, from the perspective of social fairness. An algorithm that could exclude the influence of such sensitive information would be useful in this case. We previously gave a formal definition of recommendation independence and proposed a method adopting a regularizer that imposes such an independence constraint. As no other options than this regularization approach have been put forward, we here propose a new model-based approach, which is based on a generative model that satisfies the constraint of recommendation independence. We apply this approach to a latent class model and empirically show that the model-based approach can enhance recommendation independence. Recommendation algorithms based on generative models, such as topic models, are important, because they have a flexible functionality that enables them to incorporate a wide variety of information types. Our new model-based approach will broaden the applications of independence-enhanced recommendation by integrating the functionality of generative models.
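As a toy illustration of why a generative model can satisfy the independence constraint by construction, the following sketch (an assumed structure for illustration, not the paper's exact latent class model) draws the latent class independently of the sensitive feature and lets the rating depend only on the latent class; the empirical rating distributions of the two sensitive groups then coincide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative process in which rating R is independent of the
# sensitive feature S by construction:
#   s ~ Bernoulli(0.4); z ~ Categorical(pi), drawn independently of s;
#   r | z ~ Categorical(theta[z]).
# Marginalizing out z gives P(r, s) = P(r) P(s).
pi = np.array([0.5, 0.3, 0.2])
theta = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.2, 0.7]])

n = 200_000
s = rng.random(n) < 0.4
z = rng.choice(3, size=n, p=pi)
# Vectorized inverse-CDF draw of r given z.
r = (rng.random(n)[:, None] < np.cumsum(theta[z], axis=1)).argmax(axis=1)

# Empirical check: the rating distribution is the same in both groups.
for sv in (False, True):
    print(sv, np.bincount(r[s == sv], minlength=3) / (s == sv).sum())
```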
Correcting Popularity Bias by Enhancing Recommendation Neutrality
Poster @ the 8th ACM Conference on Recommender Systems (RecSys 2014)
Article @ Official Site: http://ceur-ws.org/Vol-1247/
Article @ Personal Site: http://www.kamishima.net/archive/2014-po-recsys-print.pdf
Abstract:
In this paper, we attempt to correct a popularity bias, which is the tendency for popular items to be recommended more frequently, by enhancing recommendation neutrality. Recommendation neutrality involves excluding specified information from the prediction process of recommendation. This neutrality was formalized as the statistical independence between a recommendation result and the specified information, and we developed a recommendation algorithm that satisfies this independence constraint. We correct the popularity bias by enhancing neutrality with respect to information regarding whether candidate items are popular or not. We empirically show that a popularity bias in the predicted preference scores can be corrected.
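A minimal sketch of how the binary viewpoint variable for this setting might be derived from rating data. The helper and its quantile threshold are hypothetical illustration choices; the paper defines popularity from how frequently items are rated.

```python
import numpy as np

def popularity_viewpoint(item_ids, threshold_quantile=0.5):
    """Label each rating event with a binary 'popular item' flag.

    Hypothetical helper: an item counts as popular when its number of
    ratings exceeds the given quantile of all items' rating counts.
    """
    ids, counts = np.unique(item_ids, return_counts=True)
    cutoff = np.quantile(counts, threshold_quantile)
    popular = dict(zip(ids, counts > cutoff))
    return np.array([popular[i] for i in item_ids])
```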
The Independence of Fairness-aware Classifiers
IEEE International Workshop on Privacy Aspects of Data Mining (PADM), in conjunction with ICDM2013
Article @ Official Site:
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-icdm-print.pdf
Handnote : http://www.kamishima.net/archive/2013-ws-icdm-HN.pdf
Program codes : http://www.kamishima.net/fadm/
Workshop Homepage: http://www.cs.cf.ac.uk/padm2013/
Abstract:
Due to the spread of data mining technologies, such technologies are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair with respect to sensitive features, such as race, gender, religion, and so on. The goal of fairness-aware classifiers is to classify data while taking into account the potential issues of fairness, discrimination, neutrality, and/or independence. In this paper, after reviewing fairness-aware classification methods, we focus on one such method, Calders and Verwer's two-naive-Bayes method. This method has been shown to be superior to other classifiers in terms of fairness, which is formalized as the statistical independence between a class and a sensitive feature. However, the cause of this superiority is unclear, because the method relies on a somewhat heuristic post-processing technique rather than an explicitly formalized model. We clarify the cause by comparing this method with an alternative naive Bayes classifier, which is modified by a modeling technique called "hypothetical fair-factorization." This investigation reveals the theoretical background of the two-naive-Bayes method and its connections with other methods. Based on these findings, we develop another naive Bayes method with an "actual fair-factorization" technique and empirically show that this new method can achieve a level of fairness equal to that of the two-naive-Bayes classifier.
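The following sketch illustrates the fair-factorization idea in its simplest form: a naive Bayes model whose joint distribution factorizes as P(c) P(s) ∏_k P(x_k | c, s), so the class and the sensitive feature are independent in the model by construction. It is a minimal reading of the abstract, not the paper's exact hypothetical or actual fair-factorization procedure.

```python
import numpy as np

def fit_fair_factorized_nb(X, c, s, alpha=1.0):
    """Fit the factors of P(c) P(s) prod_k P(x_k | c, s).

    X: (n, d) array of binary features; c, s: 0/1 arrays;
    alpha: Laplace smoothing. C and S are independent in this model
    by construction, since P(c, s) = P(c) P(s).
    """
    p_c = np.array([(c == 0).mean(), (c == 1).mean()])
    p_s = np.array([(s == 0).mean(), (s == 1).mean()])
    p_x = np.empty((2, 2, X.shape[1]))  # P(x_k = 1 | c, s)
    for cv in (0, 1):
        for sv in (0, 1):
            m = (c == cv) & (s == sv)
            p_x[cv, sv] = (X[m].sum(axis=0) + alpha) / (m.sum() + 2 * alpha)
    return p_c, p_s, p_x

def predict_class(p_c, p_s, p_x, x, sv):
    # argmax_c of log P(c) + sum_k log P(x_k | c, s); P(s) is constant in c.
    ll = [np.log(p_c[cv])
          + np.sum(x * np.log(p_x[cv, sv])
                   + (1 - x) * np.log(1 - p_x[cv, sv]))
          for cv in (0, 1)]
    return int(np.argmax(ll))
```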
Consideration on Fairness-aware Data Mining
IEEE International Workshop on Discrimination and Privacy-Aware Data Mining (DPADM 2012)
Dec. 10, 2012 @ Brussels, Belgium, in conjunction with ICDM2012
Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2012.101
Article @ Personal Site: http://www.kamishima.net/archive/2012-ws-icdm-print.pdf
Handnote: http://www.kamishima.net/archive/2012-ws-icdm-HN.pdf
Workshop Homepage: https://sites.google.com/site/dpadm2012/
Abstract:
With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair regarding sensitive features such as race, gender, religion, and so on. Several researchers have recently begun to develop fairness-aware or discrimination-aware data mining techniques that take into account issues of social fairness, discrimination, and neutrality. In this paper, after demonstrating the applications of these techniques, we explore the formal concepts of fairness and techniques for handling fairness in data mining. We then provide an integrated view of these concepts based on statistical independence. Finally, we discuss the relations between fairness-aware data mining and other research topics, such as privacy-preserving data mining or causal inference.
Fairness-aware Classifier with Prejudice Remover Regularizer
Proceedings of the European Conference on Machine Learning and Principles of Knowledge Discovery in Databases (ECMLPKDD), Part II, pp.35-50 (2012)
Article @ Official Site: http://dx.doi.org/10.1007/978-3-642-33486-3_3
Article @ Personal Site: http://www.kamishima.net/archive/2012-p-ecmlpkdd-print.pdf
Handnote: http://www.kamishima.net/archive/2012-p-ecmlpkdd-HN.pdf
Program codes : http://www.kamishima.net/fadm/
Conference Homepage: http://www.ecmlpkdd2012.net/
Abstract:
With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect individuals' lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair with respect to sensitive features, such as race, gender, religion, and so on. Several researchers have recently begun to attempt the development of analysis techniques that are aware of social fairness or discrimination. They have shown that simply avoiding the use of sensitive features is insufficient for eliminating biases in determinations, due to the indirect influence of sensitive information. In this paper, we first discuss three causes of unfairness in machine learning. We then propose a regularization approach that is applicable to any prediction algorithm with probabilistic discriminative models. We further apply this approach to logistic regression and empirically show its effectiveness and efficiency.
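A compact sketch of the regularization approach applied to logistic regression. It loosely follows the spirit of the paper's released fadm code, but several details are assumptions made for illustration: per-group weight vectors, a prediction-based approximation of the mutual information as the prejudice remover term, and generic numerical optimization.

```python
import numpy as np
from scipy.optimize import minimize

def pr_objective(w_flat, X, y, s, eta=1.0, lam=1.0):
    """Logistic loss + prejudice remover + L2 (illustrative sketch)."""
    d = X.shape[1]
    W = w_flat.reshape(2, d)  # one weight vector per sensitive group
    p = 1.0 / (1.0 + np.exp(-np.einsum('ij,ij->i', X, W[s])))
    eps = 1e-12
    nll = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    # Prejudice remover: approximate the mutual information between the
    # predicted class and s, using the model's own predictions.
    pi = 0.0
    for sv in (0, 1):
        m = (s == sv)
        for q, q_s, q_all in ((p[m], p[m].mean(), p.mean()),
                              (1 - p[m], 1 - p[m].mean(), 1 - p.mean())):
            pi += np.sum(q * np.log((q_s + eps) / (q_all + eps)))
    return nll + eta * pi + 0.5 * lam * np.sum(W ** 2)

# Usage on toy data; res.x holds the fitted weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
s = (rng.random(200) < 0.5).astype(int)
y = (X[:, 0] + 0.5 * s + 0.1 * rng.standard_normal(200) > 0).astype(int)
res = minimize(pr_objective, np.zeros(6), args=(X, y, s))
```

Increasing eta trades prediction accuracy for independence between the predicted class and the sensitive feature, which is the trade-off the paper evaluates empirically.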
Fairness-aware Learning through Regularization Approach
The 3rd IEEE International Workshop on Privacy Aspects of Data Mining (PADM 2011)
Dec. 11, 2011 @ Vancouver, Canada, in conjunction with ICDM2011
Article @ Official Site: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2011.83
Article @ Personal Site: http://www.kamishima.net/archive/2011-ws-icdm_padm.pdf
Handnote: http://www.kamishima.net/archive/2011-ws-icdm_padm-HN.pdf
Workshop Homepage: http://www.zurich.ibm.com/padm2011/
Abstract:
With the spread of data mining technologies and the accumulation of social data, such technologies and data are being used for determinations that seriously affect people's lives. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be socially and legally fair from a viewpoint of social responsibility; namely, they must be unbiased and nondiscriminatory with respect to sensitive features, such as race, gender, religion, and so on. Several researchers have recently begun to attempt the development of analysis techniques that are aware of social fairness or discrimination. They have shown that simply avoiding the use of sensitive features is insufficient for eliminating biases in determinations, due to the indirect influence of sensitive information. From a privacy-preserving viewpoint, this can be interpreted as hiding sensitive information when classification results are observed. In this paper, we first discuss three causes of unfairness in machine learning. We then propose a regularization approach that is applicable to any prediction algorithm with probabilistic discriminative models. We further apply this approach to logistic regression and empirically show its effectiveness and efficiency.
Absolute and Relative Clustering
4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering (Multiclust 2013)
Aug. 11, 2013 @ Chicago, U.S.A, in conjunction with KDD2013
Article @ Official Site: http://dx.doi.org/10.1145/2501006.2501013
Article @ Personal Site: http://www.kamishima.net/archive/2013-ws-kdd-print.pdf
Handnote: http://www.kamishima.net/archive/2013-ws-kdd-HN.pdf
Workshop Homepage: http://cs.au.dk/research/research-areas/data-intensive-systems/projects/multiclust2013/
Abstract:
Research into (semi-)supervised clustering has been increasing. Supervised clustering aims to group similar data under the partial guidance of the user's supervision. In supervised clustering, there are many choices of formalization. For example, as a type of supervision, one can adopt labels of data points, must/cannot links, and so on. Given a real clustering task, such as grouping documents or image segmentation, users must confront the question "How should we mathematically formalize our task?" To help answer this question, we propose classifying real clusterings into absolute and relative clusterings, which are defined based on the relationship between the resultant partition and the data set to be clustered. This categorization can be exploited to choose a type of task formalization.
Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, and Theoretical Aspects
Invited Talk @ Workshop on Fairness, Accountability, and Transparency in Machine Learning
In conjunction with ICML 2015 @ Lille, France, Jul. 11, 2015
Web Site: http://www.kamishima.net/fadm/
Handnote: http://www.kamishima.net/archive/2015-ws-icml-HN.pdf
The goal of fairness-aware data mining (FADM) is to analyze data while taking into account potential issues of fairness. In this talk, we will cover three topics in FADM:
1. Fairness in a Recommendation Context: In classification tasks, the term "fairness" is regarded as anti-discrimination. We will present other types of problems related to the fairness in a recommendation context.
2. What is Fairness: Most formal definitions of fairness have a connection with the notion of statistical independence. We will explore other types of formal fairness based on causality, agreement, and unfairness.
3. Theoretical Problems of FADM: After reviewing technical and theoretical open problems in the FADM literature, we will introduce the theory of the generalization bound in terms of accuracy as well as fairness.
Joint work with Jun Sakuma, Shotaro Akaho, and Hideki Asoh
WSDM 2016 reading group https://atnd.org/events/74341
Paper: Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior
Authors: F. Kooti, K. Lerman, L. M. Aiello, M. Grbovic, and N. Djuric
Paper link: http://dx.doi.org/10.1145/2835776.2835831
Exploring Author Gender in Book Rating and Recommendation
M. D. Ekstrand, M. Tian, M. R. I. Kazi, H. Mehrpouyan, and D. Kluver
https://doi.org/10.1145/3240323.3240373
RecSys 2018 paper reading session (2018-11-17) https://atnd.org/events/101334
WSDM 2018 reading group
2018-04-14 @ Cookpad
https://atnd.org/events/95510
Offline A/B Testing for Recommender Systems
A. Gilotte, C. Calauzénes, T. Nedelec, A. Abraham, and S. Dollé
https://doi.org/10.1145/3159652.3159687
Recommendation Independence
The 1st Conference on Fairness, Accountability, and Transparency
Article @ Official Site: http://proceedings.mlr.press/v81/kamishima18a.html
Conference site: https://fatconference.org/2018/
Abstract:
This paper studies a recommendation algorithm whose outcomes are not influenced by specified information. It is useful in contexts where potentially unfair decisions should be avoided, such as job-applicant recommendations that must not be influenced by socially sensitive information. An algorithm that could exclude the influence of sensitive information would thus be useful for job-matching with fairness. We call this condition between a recommendation outcome and a sensitive feature recommendation independence, which is formally defined as statistical independence between the outcome and the feature. Our previous independence-enhanced algorithms simply matched the means of predictions between sub-datasets consisting of the same sensitive value. However, this approach could not remove sensitive information represented by the second or higher moments of distributions. In this paper, we develop new methods that can deal with the second moment, i.e., the variance, of recommendation outcomes without increasing the computational complexity. These methods can more strictly remove sensitive information, and experimental results demonstrate that our new algorithms can more effectively eliminate the factors that undermine fairness. Additionally, we explore potential applications of independence-enhanced recommendation and discuss its relation to other concepts, such as recommendation diversity.
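One way to penalize differences in both the first and second moments, as the abstract describes, is the Bhattacharyya distance between Gaussians fitted to the predicted scores of each sensitive group. The sketch below illustrates that idea; the paper's exact regularizer may differ.

```python
import numpy as np

def bhattacharyya_penalty(scores, s):
    """Bhattacharyya distance between per-group Gaussians fitted to
    the predicted scores; it is zero exactly when the two groups share
    both mean and variance, so it is sensitive to the second moment
    that simple mean matching ignores (illustrative sketch).
    """
    m0, v0 = scores[s == 0].mean(), scores[s == 0].var()
    m1, v1 = scores[s == 1].mean(), scores[s == 1].var()
    return (0.25 * (m0 - m1) ** 2 / (v0 + v1)
            + 0.5 * np.log((v0 + v1) / (2.0 * np.sqrt(v0 * v1))))
```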
Considerations on Recommendation Independence for a Find-Good-Items Task
Workshop on Responsible Recommendation (FATREC), in conjunction with RecSys2017
Article @ Official Site: http://doi.org/10.18122/B2871W
Workshop Homepage: https://piret.gitlab.io/fatrec/
This paper examines the notion of recommendation independence: a constraint that a recommendation result be independent of specified information. This constraint is useful in ensuring adherence to laws and regulations, fair treatment of content providers, and exclusion of unwanted information. For example, to make a job-matching recommendation socially fair, the matching should be independent of socially sensitive information, such as gender or race. We previously developed several recommenders satisfying recommendation independence, but these were all designed for a predicting-ratings task, whose goal is to predict the score that a user would give to an item. Here we focus on another task, find-good-items, which aims to find items that a user would prefer. In this task, scores representing the degree of preference for items are first predicted, and the items with the largest scores are displayed in the form of a ranked list. We developed a preliminary algorithm for this task through a naive approach: enhancing independence between a preference score and sensitive information. We empirically show that although this algorithm can enhance the independence of a preference score, it is not fit for the purpose of enhancing independence in terms of a ranked list. This result indicates the need to invent a notion of independence that is suitable for use with a ranked list and applicable to a find-good-items task.
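The gap between score-level and list-level independence is easy to reproduce synthetically. In the sketch below (illustrative data, not from the paper), the two groups' scores share their mean and variance, yet the heavier-tailed group ends up over-represented in the top-k list produced for a find-good-items task.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 1000
s = rng.random(n_items) < 0.5  # sensitive flag per item

# Scores whose two groups share mean (0) and variance (1) but differ
# in shape: group 1 is heavy-tailed (Student t, df=3, rescaled),
# group 0 is standard normal.
scores = np.where(s,
                  rng.standard_t(df=3, size=n_items) / np.sqrt(3),
                  rng.standard_normal(n_items))

top = np.argsort(scores)[::-1][:50]  # find-good-items: top-50 list
print("share of sensitive items overall :", s.mean())
print("share of sensitive items in top-k:", s[top].mean())
```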
KDD 2016 reading group https://atnd.org/events/80771
Paper: "Why Should I Trust You?" Explaining the Predictions of Any Classifier
Authors: M. T. Ribeiro, S. Singh, and C. Guestrin
Paper link: http://www.kdd.org/kdd2016/subtopic/view/why-should-i-trust-you-explaining-the-predictions-of-any-classifier
1. Enhancement of the Neutrality in Recommendation
Toshihiro Kamishima*, Shotaro Akaho*, Hideki Asoh*, and Jun Sakuma†
*National Institute of Advanced Industrial Science and Technology (AIST), Japan
†University of Tsukuba, Japan; and Japan Science and Technology Agency
Workshop on Human Decision Making in Recommender Systems
In conjunction with the RecSys 2012 @ Dublin, Ireland, Sep. 9, 2012
2. Overview
Decision Making and Neutrality in Recommendation
Decisions based on biased information bring undesirable results
Providing neutral information is important in recommendation
Information Neutral Recommender System
An absolutely neutral recommendation is intrinsically infeasible, because a recommendation is always biased in the sense that it is arranged for a specific user
A system makes recommendations so as to enhance neutrality from a viewpoint specified by the user
3. Outline
Introduction
Importance of the Neutrality and the Filter Bubble
influence of biased recommendation, filter bubble problem,
discussion in the RecSys 2011 panel
Neutrality in Recommendation
ugly duckling theorem, information neutral recommendation
Information Neutral Recommender System
latent factor model, viewpoint variable, neutrality function
Experiments
Conclusion
5. Biased Recommendation
Biased Recommendation
exclude a good candidate from a set of options
rate relatively inferior options higher
⇒ Inappropriate Decision
The Filter Bubble Problem
Pariser posed a concern that personalization technologies narrow and bias the topics of information provided to people
http://www.thefilterbubble.com/
6. Filter Bubble
[TED Talk by Eli Pariser]
Friend Recommendation List in Facebook
conservative people were eliminated from Pariser's recommendation list
A summary of Pariser's claim:
Users lose opportunities to obtain information about a wide variety of topics
Each user obtains overly personalized information, and this makes it difficult to build consensus in our society
7. RecSys 2011 Panel on Filter Bubble
[RecSys 2011 Panel on the Filter Bubble]
RecSys 2011 Panel on Filter Bubble
Are there “filter bubbles?”
To what degree is personalized filtering a problem?
What should we as a community do to address the filter bubble
issue?
http://acmrecsys.wordpress.com/2011/10/25/panel-on-the-filter-bubble/
Intrinsic trade-off:
providing a diversity of topics vs. focusing on users’ interests
To select something is not to select other things
8. RecSys 2011 Panel on Filter Bubble
[RecSys 2011 Panel on the Filter Bubble]
Intrinsic trade-off:
providing a diversity of topics vs. focusing on users’ interests
To select something is not to select other things
Personalized filtering is a necessity
Personalized filtering is a very effective tool
to find interesting things from the flood of information
9. RecSys 2011 Panel on Filter Bubble
[RecSys 2011 Panel on the Filter Bubble]
Personalized filtering is a necessity
Personalized filtering is a very effective tool
to find interesting things from the flood of information
recipes for alleviating
the undesirable influence of personalized filtering
capture the users’ long-term interests
consider preferences over an item portfolio, not over individual items
follow the changes of users’ preference patterns
our approach:
give users control of their perspective
to see the world through other eyes
11. Ugly Duckling Theorem
[Watanabe 69]
Ugly Duckling Theorem on fundamental property of classification
if the similarity between a pair of ducklings is measured by the number of potential
binary classification rules that classify both of them into the same positive class:
similarity between an ugly duckling and a normal duckling
= similarity between any pair of normal ducklings
An ugly duckling and a normal duckling are indistinguishable
Extremely unintuitive! Why?
We classify them by methods like SVMs every day...
12. Ugly Duckling Theorem
[Watanabe 69]
Extremely unintuitive! Why?
The number of classification rules is considered,
but the properties of the rules are completely ignored
All features are treated equally
ex. the weight of a body-color feature equals that of a
body-length feature
The complexity of rules is ignored
ex. the number of features included in a rule
When classifying, one must emphasize some features of objects
and must ignore the other features
13. Information Neutral Recommendation
Ugly Duckling Theorem
Some aspects must be stressed when classifying objects
It is infeasible to make recommendations
that are neutral from all viewpoints
Information Neutral Recommendation
enhance the neutrality from a viewpoint specified by a user;
other viewpoints are not considered
ex. a recommender system enhances the neutrality in terms of
whether friends are conservative or progressive, but it is allowed
to make biased recommendations in terms of other viewpoints, for
example, the birthplace or age of friends
15. Information Neutral Recommender System
v : viewpoint variable
A binary variable representing a viewpoint specified by a user
Information Neutral Recommender System
neutral from a specified viewpoint
maximize statistical independence
between a preference score and a viewpoint variable
+
high prediction accuracy
minimize an empirical error plus an L2 regularization term
Information neutral version of a latent factor model
16. Latent Factor Model
[Koren 08]
Latent Factor Model : basic model of matrix decomposition
Predicting Ratings Task
predict a preference score of an item y rated by a user x
\hat{s}(x, y) = \mu + b_x + c_y + \mathbf{p}_x^\top \mathbf{q}_y
(\mu: global bias; b_x: user-dependent bias; c_y: item-dependent bias;
\mathbf{p}_x^\top \mathbf{q}_y: cross effect of users and items)
For a given training data set, model parameters are learned by
minimizing the squared loss function with an L2 regularizer
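As a rough illustration of this model (a minimal sketch, not the authors' released code; the parameter layout is an assumption), prediction and the regularized squared loss might look like this in Python:

```python
import numpy as np

def predict(mu, b, c, P, Q, x, y):
    # s_hat(x, y) = mu + b_x + c_y + p_x . q_y
    return mu + b[x] + c[y] + P[x] @ Q[y]

def loss(mu, b, c, P, Q, data, lam):
    # squared loss over (user, item, score) triples plus an L2 regularizer
    err = sum((s - predict(mu, b, c, P, Q, x, y)) ** 2 for x, y, s in data)
    reg = lam * (b @ b + c @ c + np.sum(P * P) + np.sum(Q * Q))
    return err + reg
```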
17. Information Neutral Latent Factor Model
modifications of a latent factor model
adjust scores according to the state of a viewpoint
incorporate dependency on a viewpoint variable
enhance the neutrality of a score from a viewpoint
add a neutrality function as a constraint term
adjust scores according to the state of a viewpoint
viewpoint variables
\hat{s}(x, y, v) = \mu^{(v)} + b_x^{(v)} + c_y^{(v)} + \mathbf{p}_x^{(v)\top} \mathbf{q}_y^{(v)}
(each parameter set is indexed by the value of the viewpoint variable v)
Multiple latent factor models are built separately, and each of these
models corresponds to one value of the viewpoint variable
When predicting scores, a model is selected according to the value
of the viewpoint variable (see the sketch below)
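A sketch of this per-viewpoint model selection, continuing the assumptions above (the `models` container keyed by viewpoint value is hypothetical):

```python
# models[v] holds one full latent factor model per viewpoint value v in {0, 1}
# (a hypothetical container; each entry is (mu, b, c, P, Q) as in the earlier sketch)
def predict_neutral(models, x, y, v):
    mu, b, c, P, Q = models[v]  # pick the model matching the viewpoint value
    return mu + b[x] + c[y] + P[x] @ Q[y]
```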
18. Information Neutral Latent Factor Model
enhance the neutrality of a score from a viewpoint
neutrality function neutral(\hat{s}, v) : quantifies the degree of neutrality
The larger the output of the neutrality function,
the higher the degree of neutrality of a predicted score
from the viewpoint variable
\sum_{\mathcal{D}} \big(s_i - \hat{s}(x_i, y_i, v_i)\big)^2 \;-\; \eta\, \mathrm{neutral}(\hat{s}(x_i, y_i, v_i), v_i) \;+\; \frac{\lambda}{2} \|\theta\|_2^2
(squared loss function; neutrality function; L2 regularizer.
The neutrality parameter \eta balances the neutrality against the accuracy,
and \lambda is the regularization parameter)
Parameters are learned by minimizing this objective function
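Continuing the same sketch, the combined objective might be assembled as follows (the `neutral_fn` argument stands in for the neutrality function formalized on the next slide; the parameter container is still the hypothetical one above):

```python
import numpy as np

def objective(models, data, neutral_fn, eta, lam):
    # squared loss - eta * neutrality + (lam / 2) * L2 penalty, to be minimized
    s_hat = np.array([predict_neutral(models, x, y, v) for x, y, v, _ in data])
    s_true = np.array([s for _, _, _, s in data])
    views = np.array([v for _, _, v, _ in data])
    sq_loss = np.sum((s_true - s_hat) ** 2)
    l2 = sum(float(np.sum(np.square(w))) for m in models.values() for w in m)
    return sq_loss - eta * neutral_fn(s_hat, views) + (lam / 2) * l2
```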
19. Mutual Information as Neutrality Function
neutrality = scores are not influenced by a viewpoint variable
neutrality function = negative mutual information
We treat the neutrality as statistical independence,
which is quantified by the mutual information
between a predicted score and a viewpoint variable
Pr[\hat{s} \mid v] is required for computing the mutual information
This distribution is modeled by a histogram model
We failed to derive an analytical form of the gradients of the
objective function, so the objective function is minimized by
Powell's method, which requires no gradients
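A minimal sketch of such a histogram-based estimate (the bin count and binning scheme are assumptions), returning the negated mutual information so that larger values mean greater neutrality; the commented lines show how a derivative-free Powell search could be invoked via SciPy:

```python
import numpy as np
from scipy.optimize import minimize

def neutral(s_hat, v, bins=10):
    # Estimate I(s_hat; v) from a 2-D histogram and return its negation,
    # so that a larger value means a more neutral (more independent) score.
    edges = np.histogram_bin_edges(s_hat, bins=bins)
    idx = np.clip(np.digitize(s_hat, edges[1:-1]), 0, bins - 1)
    joint = np.zeros((bins, 2))
    for i, vi in zip(idx, v):
        joint[i, int(vi)] += 1.0
    joint /= joint.sum()
    marg = np.outer(joint.sum(axis=1), joint.sum(axis=0))  # Pr[s] * Pr[v]
    nz = joint > 0
    return -float(np.sum(joint[nz] * np.log(joint[nz] / marg[nz])))

# Powell's method in SciPy needs no gradients; pack/unpack, which convert the
# model parameters to and from a flat vector, are hypothetical helpers:
# res = minimize(lambda th: objective(unpack(th), data, neutral, eta, lam),
#                pack(models), method='Powell')
```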
21. Experimental Conditions
General Conditions
9,409 user-item pairs are sampled from the MovieLens 100k data set
(the Powell optimizer is computationally inefficient and cannot be
applied to a large data set)
the number of latent factors K = 1
regularization parameter λ = 0.01
Evaluation measures are calculated by using five-fold cross validation
Evaluation Measure
MAE (mean absolute error)
prediction accuracy
NMI (normalized mutual information)
the neutrality of a preference score from a specified viewpoint
(the mutual information between the predicted scores and the values
of the viewpoint variable, normalized into the range [0, 1])
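For illustration, NMI could be computed by reusing the `neutral()` sketch above; note that normalizing by the geometric mean of the two entropies is just one common convention, and the paper's exact normalization may differ:

```python
def nmi(s_hat, v, bins=10):
    # NMI in [0, 1]: I(s_hat; v) divided by the geometric mean of the two
    # marginal entropies (one common convention; assumed here).
    mi = -neutral(s_hat, v, bins)  # recover I(s_hat; v) from the sketch above
    edges = np.histogram_bin_edges(s_hat, bins=bins)
    idx = np.clip(np.digitize(s_hat, edges[1:-1]), 0, bins - 1)
    entropy = lambda labels: -sum(
        p * np.log(p) for p in np.bincount(labels) / len(labels) if p > 0)
    return mi / np.sqrt(entropy(idx) * entropy(np.asarray(v, dtype=int)))
```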
22. Viewpoint Variables
The values of viewpoint variables are determined
depending on a user and/or an item
“Year” viewpoint : whether a movie’s release year is newer than 1990
Older movies tend to be rated higher, perhaps because
only masterpieces have survived [Koren 2009]
“Gender” viewpoint : whether a user is male or female
Movie ratings would depend on the user’s gender
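As an illustration of how such viewpoint variables could be derived (assuming the standard MovieLens 100k `u.item` and `u.user` file layouts; the exact "newer than 1990" threshold convention is assumed):

```python
import pandas as pd

# Standard MovieLens 100k metadata files, '|'-separated (layouts assumed)
items = pd.read_csv('u.item', sep='|', encoding='latin-1', header=None)
users = pd.read_csv('u.user', sep='|', header=None,
                    names=['user', 'age', 'gender', 'occupation', 'zip'])

# "Year" viewpoint: 1 if the release year (third field of u.item) is
# newer than 1990, 0 otherwise
year = pd.to_datetime(items[2], errors='coerce').dt.year
v_year = (year > 1990).astype(int)

# "Gender" viewpoint: 1 if the rating user is male, 0 if female
v_gender = (users['gender'] == 'M').astype(int)
```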
23. Experimental Results
[Two charts: prediction accuracy (MAE, left) and degree of neutrality (NMI, right),
plotted against the neutrality parameter η (0–100) for the Year and Gender viewpoints]
neutrality parameter η : the larger the value, the more the neutrality is enhanced
As the neutrality parameter η increases, prediction accuracy worsens
slightly, but the neutrality improves drastically
INRS successfully improved the neutrality
without seriously sacrificing the prediction accuracy
24. Conclusion
Our Contributions
We formulated the neutrality in recommendation based on the ugly
duckling theorem
We developed a recommender system that can enhance the
neutrality in recommendation
Our experimental results show that the neutrality is successfully
enhanced without seriously sacrificing the prediction accuracy
Future Work
Our current formulation scales poorly
a formulation of the objective function whose gradients can be
derived analytically
neutrality functions other than mutual information, such as the
kurtosis used in ICA
an information-neutral version of generative recommendation models,
such as pLSA / LDA models
25. Program Codes and Data Sets
http://www.kamishima.net/inrs
Acknowledgements
We would like to thank the GroupLens research lab for providing the
data set
This work is supported by MEXT/JSPS KAKENHI Grant Numbers
16700157, 21500154, 22500142, 23240043, and 24500194, and by JST
PRESTO 09152492
Editor's Notes
Today, we would like to talk about the enhancement of the neutrality in recommendation.
Because decisions based on biased information bring undesirable results, providing neutral information is important in recommendation. For this purpose, we propose an information neutral recommender system. Unfortunately, absolutely neutral recommendation is intrinsically infeasible. Therefore, this recommender system makes recommendations so as to enhance the neutrality from a viewpoint specified by a user.
This is an outline of our talk. After showing the importance of the neutrality in recommendation, we introduce the filter bubble problem. We then discuss the neutrality in recommendation and present our information neutral recommender system. Finally, we summarize our experimental results and conclude our talk.
We begin with the importance of the neutrality and the filter bubble problem.
Biased recommendations may exclude a good candidate from the set of options, or may rate relatively inferior options higher. Consequently, decisions would become inappropriate. Pariser pointed out this problem of biased recommendation as the filter bubble: a concern that personalization technologies narrow and bias the topics of information provided to people.
Pariser showed an example of a friend recommendation list in Facebook. To fit his preference, conservative people were eliminated from his recommendation list, and this fact was not noticed by him. His claim can be summarized in two points: users lose opportunities to obtain information about a wide variety of topics, and each user obtains overly personalized information, which makes it difficult to build consensus in our society.
At the RecSys 2011 conference, a panel on the filter bubble problem was held, and these three sub-problems were discussed. For the first sub-problem, panelists pointed out that the filter bubble is an intrinsic trade-off between providing a diversity of topics and focusing on users' interests, because to select something is not to select other things.
Though personalized filtering has such a flaw, it is a very effective tool for finding interesting things in the flood of information. Clearly, personalized filtering is a necessity.
In the RecSys panel, panelists suggested recipes for alleviating the undesirable influence of personalized filtering. Among these recipes, we took the approach of giving users control of their perspective, to see the world through other eyes.
We then discuss the neutrality in recommendation.
Before discussing the neutrality, we reconsider the well-known ugly duckling theorem on a fundamental property of classification. According to this theorem, under this condition, the similarity between an ugly duckling and a normal duckling is equivalent to the similarity between any pair of normal ducklings. It follows that an ugly duckling and a normal duckling are indistinguishable. This looks extremely unintuitive! Why?
This is because the number of classification rules is considered, while the properties of the rules are completely ignored: all features are treated equally, and the complexity of rules is ignored. This theorem implies that, when classifying, one must emphasize some features of objects and ignore the other features.
Because the ugly duckling theorem indicates that some aspects must be stressed when classifying objects, it is infeasible to make recommendations that are neutral from all viewpoints. Therefore, we take the approach of enhancing the neutrality from a viewpoint specified by a user, while other viewpoints are not considered. In the case of Pariser's Facebook example, a system enhances the neutrality in terms of whether friends are conservative or progressive, but it is allowed to make biased recommendations in terms of other viewpoints, for example, the birthplace or age of friends.
To enhance such neutrality, we propose an information neutral recommender system.
This system adopts a viewpoint variable, which is a binary variable representing a viewpoint specified by a user. The goal of an information neutral recommender system is to make recommendations that are neutral from the specified viewpoint while keeping high prediction accuracy. The neutrality is enhanced by maximizing the statistical independence between a preference score and the viewpoint variable. High prediction accuracy is achieved by minimizing an empirical error plus an L2 regularization term. We then show an information neutral version of a latent factor model.
A latent factor model is a basic matrix decomposition model designed for predicting preference scores. A preference score is modeled by this formula, which consists of three bias terms and one cross term. For a given training data set, model parameters are learned by minimizing the squared loss function with an L2 regularizer.
These two points are modified in the information neutral version of a latent factor model. First, we modify the model so that it can adjust scores according to the state of a viewpoint, in order to incorporate dependency on a viewpoint variable. Multiple latent factor models are built separately, and each of these models corresponds to one value of the viewpoint variable. When predicting scores, a model is selected according to the state of the viewpoint variable.
Second, the model is modified so that it can enhance the neutrality between a score and a viewpoint. For this purpose, we introduce a neutrality function that quantifies the degree of neutrality. This neutrality function is added to the objective function like a regularization term. A neutrality parameter η balances the neutrality against the accuracy. Parameters are learned by minimizing this objective function.
We finally formalize the neutrality function. Here, the neutrality means that scores are not influenced by a viewpoint variable. Therefore, we treat the neutrality as statistical independence, which is quantified by the mutual information between a predicted score and a viewpoint variable. The computation of the mutual information is fairly complicated, but we omit the details here.
We finally summarize our experimental results.
These are our experimental conditions. We tested on this sampled data set, because the Powell optimizer is computationally inefficient and cannot be applied to a large data set. We used two types of evaluation measures. MAE, the mean absolute error, measures the prediction accuracy. NMI, the normalized mutual information, measures the neutrality between a predicted score and a viewpoint variable.
We tested two types of viewpoint variables, whose values are determined depending on a user and/or an item. First, a "Year" viewpoint variable represents whether a movie's release year is newer than 1990 or not. Second, a "Gender" viewpoint variable represents whether a user is male or female.
These are our experimental results. The x-axes correspond to the neutrality parameter; the larger its value, the more the neutrality is enhanced. The left chart shows the change of the prediction accuracy, and the right chart shows the change of the degree of neutrality. As the neutrality parameter η increases, the prediction accuracy worsens slightly, while the neutrality is enhanced drastically. Therefore, we can conclude that our information neutral recommender system successfully improved the neutrality without seriously sacrificing the prediction accuracy.
These are our contributions. Our current formulation scales poorly. We plan to develop a formulation of the objective function whose gradients can be derived analytically. We also consider using neutrality functions other than mutual information, such as the kurtosis used in ICA.
Program codes and data sets are available here. That's all I have to say. Thank you for your attention.