
Recommender Systems Bener


Cognitive Systems Institute Group Speaker Series January 22, 2015. Speaker Ayse Bener from Ryerson University.

Published in: Technology

  1. 1. Recommender Systems: Challenges and Opportunities Ayse Bener January 22, 2015
  2. 2. From Search to Recommendation
       • “Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you.” – CNN Money, “The race to create ‘smart’ Google.”
  3. 3. Recommender problem
       • Estimate a utility function to predict how a user will like an item
       • The user is
           • Consumer
           • Subscriber
           • Member
       • The item is
           • Movie
           • Apps
           • Travel destinations
  4. 4. Recommender
       • A good recommendation
           • Relevant to the user
           • Personalized
           • Diverse
           • Expands the user's taste into neighboring areas (serendipity – unsought finding)
  5. 5. Paradigm of Recommender Systems
       • Recommender systems reduce information overload by estimating relevance
       • Collaborative filtering: what is popular in a community
           • User profile & community information
       • Content based: provides more of what the user liked before
           • User profile & item profile
       • Knowledge based: what is best based on the users' needs
           • User profile & item profile & knowledge model
       • Hybrid method: combination of inputs and/or composition of different methods
           • User profile & item profile & knowledge model & community information
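The collaborative filtering paradigm on this slide can be illustrated with a minimal user-based sketch. It is not taken from the talk: the `ratings` data, the user and item names, and the choice of cosine similarity with a similarity-weighted average are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy ratings (not from the talk): user -> {item: rating, 1 to 5}.
ratings = {
    "ann": {"movie_a": 5, "movie_b": 3, "movie_c": 4},
    "bob": {"movie_a": 4, "movie_b": 2, "movie_c": 5, "movie_d": 4},
    "cat": {"movie_a": 1, "movie_b": 5, "movie_d": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    # None signals that no neighbor rated the item (a cold-start case).
    return weighted_sum / weight_total if weight_total > 0 else None


predicted = predict_rating(ratings, "ann", "movie_d")
print(predicted)
```

Here the community information is the only input: "ann" never rated `movie_d`, so the estimate leans on the users whose past ratings resemble hers. An item nobody has rated yields `None`, which is the new-item problem listed among the challenges.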
  6. 6. Recommender Systems Challenges
       • Dealing with Big Data problems
       • Lack of useful data
       • Unstructured data
       • Missing data
       • New user and new item
       • Cold start problem
       • Temporality
       • Changing data
       • Changing user preferences and biases
       • Negative choices
       • Evaluating recommenders
  7. 7. Main Research Issues
       • Understanding the context and modeling context
       • Algorithms
       • Evaluation
       • Engineering
  8. 8. Bayesian Networks For Evidence-Based Decision-Making in Software Engineering. Ayse Tosun Misirli and Ayse Bener, IEEE Transactions on Software Engineering, vol. 40, no. 6, June 2014
  9. 9. Recommendation systems for software engineering (RSSE)
       • Recommendation systems and prediction models should be designed so that they can integrate evidence (facts and probabilities systematically collected or measured from real data and observations) into practitioners' experience.
       • In this study, we follow the lead of computational biology and healthcare decision-making and investigate the applications of Bayesian networks (BNs) in software engineering (SE).
  10. 10. The Bayesian Approach
        • Provides a natural statistical framework for evidence-based decision-making by incorporating an integrated summary of the available evidence and the associated uncertainty (of consequences)
        • Maintains observations, statistical distributions, prior assumptions, and expert judgment in a single model
        • Encodes causal relationships among variables for predicting future actions
        • Supports “information propagation through the network”, i.e., gaming over the network to see all possible scenarios and their outcomes and to give the best action
        • Imitates the process of human thinking while going beyond the capabilities of human reasoning, with fact-based, error-free intelligence drawn from enormous amounts of historical data
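As a minimal sketch of evidence propagating through a network, consider a two-node BN in which high code complexity influences the chance of a defect. The structure, the variable names, and every probability below are hypothetical and chosen only for illustration; they are not from the study.

```python
# Hypothetical two-node network: HighComplexity -> Defect.
# All probabilities are illustrative assumptions, not values from the study.
p_high = 0.3                               # prior P(HighComplexity = true)
p_defect_given = {True: 0.6, False: 0.1}   # P(Defect = true | HighComplexity)


def joint(high, defect):
    """Joint probability P(HighComplexity = high, Defect = defect)."""
    p_h = p_high if high else 1.0 - p_high
    p_d = p_defect_given[high] if defect else 1.0 - p_defect_given[high]
    return p_h * p_d


def posterior_high(defect):
    """P(HighComplexity = true | Defect = defect), by enumeration."""
    evidence = joint(True, defect) + joint(False, defect)
    return joint(True, defect) / evidence


# Marginal probability of a defect before any evidence is observed.
p_defect = joint(True, True) + joint(False, True)

# Belief about complexity after a defect has been observed.
posterior = posterior_high(True)
print(p_defect, posterior)
```

Observing a defect raises the belief in high complexity from the prior of 0.3 to about 0.72. The prior could come from expert judgment and the conditional table from repository data, which is the sense in which both sit in one model.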
  11. 11. Example of a simple BN with different variable types
  12. 12. Systematic Mapping of BNs in SWE
        • To investigate the applications of BNs in SE
            • Main software engineering challenges addressed
            • Techniques used to learn causal relationships among variables
            • Techniques used to infer the parameters
            • Variable types used as BN nodes
  13. 13. Empirical Analysis on Bayesian Decision-Making
        • Hybrid Bayesian network that would solve a specific software engineering challenge
            • Predicting software reliability in terms of post-release defects
            • A ‘mixed data’ model that represents software life cycle phases by incorporating expert judgment (qualitative data gathered through surveys) into quantitative data collected from software repositories
            • A ‘hybrid’ BN that incorporates both continuous and categorical variables
  14. 14. Demographics for Two Software Companies
  15. 15. BN Models in this Study
  16. 16. Model Representation Model #1 Model #2 Model #3
  17. 17. Graphical Representation of BN (Co. A)
  18. 18. Graphical Representation of BN (Co. B)
  19. 19. Setting Prior Distributions
        • Model #1
            • Expert knowledge
        • Model #2
            • Lilliefors significance test on all variables and on post-release defects
            • Normal probability plots
        • Model #3
            • The requirements specification subnet, whose distributions were set based on expert knowledge, is combined with the development and testing subnet from Model #2, whose variables are assigned different distributions based on the significance tests
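The Lilliefors test used for Model #2 compares a sample's empirical distribution with a normal distribution whose mean and standard deviation are estimated from that same sample. The sketch below computes only the test statistic, using synthetic data generated for illustration (the sample sizes, seed, and distribution parameters are assumptions). A p-value requires Lilliefors' own critical values, which a library routine such as `lilliefors` in `statsmodels.stats.diagnostic` supplies.

```python
import random
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(sample):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return d


# Synthetic samples for illustration only (not data from the study).
rng = random.Random(0)
normal_like = [rng.gauss(10.0, 2.0) for _ in range(200)]
skewed = [rng.expovariate(1.0) for _ in range(200)]

d_normal = lilliefors_statistic(normal_like)
d_skewed = lilliefors_statistic(skewed)
print(d_normal, d_skewed)
```

A larger statistic means a poorer fit to normality, so a skewed variable such as a defect count would be assigned a different prior distribution from a variable that passes the test.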
  20. 20. Structure Learning
        • Expert judgement
        • Chi-plot
            • Independence between two variables
        • Copula models: a transformation of data with marginal distributions
        • Prior to modeling, it is necessary to check for the presence of dependence
        • There is a positive monotone dependence between test cases and post-release defects, as the data pairs are shifted towards the right from the center
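The slide checks dependence with chi-plots and copula models. As a simpler stand-in for those techniques, the sketch below measures monotone dependence with Kendall's tau, a rank-based statistic that depends only on the ordering of the data. The two lists are hypothetical values, not measurements from the study.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if direction > 0:
                concordant += 1
            elif direction < 0:
                discordant += 1
            # Tied pairs (direction == 0) count toward neither total.
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical per-release measurements (illustrative only).
test_cases = [12, 30, 45, 51, 60, 75, 90]
post_release_defects = [1, 2, 2, 4, 3, 6, 7]

tau = kendall_tau(test_cases, post_release_defects)
print(tau)
```

A value near +1 indicates positive monotone dependence, near -1 a negative one, and near 0 little monotone association. Only when dependence is present is an edge between the two variables worth considering in the network structure.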
  21. 21. Inference
        • Bayesian learning for complex models using Monte Carlo methods, especially Gibbs sampling
            • Handles insufficient statistics
            • Handles incomplete data
            • Successively samples from the posterior distribution of each node in a Bayesian model given all the others (the full conditionals)
            • Successful when estimating the unknown parameters of probability distributions or when conducting empirical analysis to infer true values of a given sample
            • Enables predictions for future scenarios even when some of the input variables are missing
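The core loop of Gibbs sampling, drawing each variable in turn from its full conditional given the current values of the others, can be sketched on a textbook case. The example assumes a standard bivariate normal with correlation `rho`, where each full conditional is itself normal; this target, the function names, and the settings are illustrative assumptions, not the model from the study.

```python
import random
from math import sqrt


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    cond_sd = sqrt(1.0 - rho ** 2)  # std. dev. of each full conditional
    samples = []
    for step in range(n_samples + burn_in):
        # Full conditionals: x | y ~ N(rho * y, 1 - rho^2), and likewise y | x.
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if step >= burn_in:  # discard early draws taken before convergence
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Pearson correlation of the sampled (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cov = sum((x - mx) * (y - my) for x, y in samples)
    vx = sum((x - mx) ** 2 for x, _ in samples)
    vy = sum((y - my) ** 2 for _, y in samples)
    return cov / sqrt(vx * vy)


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
estimated_rho = sample_correlation(samples)
print(estimated_rho)
```

The sampler never evaluates the joint distribution directly, only the conditionals. In a BN with missing inputs, the unobserved nodes would be treated the same way: sampled from their conditionals alongside the parameters.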
  22. 22. Prediction Performance of the Models
  23. 23. Threats to Validity
        • Internal validity
            • Biases during data collection
            • Used scripts to extract data
            • Eliminated outliers
            • BNs for causality and to avoid over-fitting
        • Construct validity
            • Large set of metrics were chosen
            • Well-known performance measures are used
        • Conclusion validity
            • Non-parametric test (Mann-Whitney U-test), ANOVA, and t-test were used
        • External validity
            • We aim to transfer the methodology behind BN construction to enhance the usage of these graphical, probabilistic models in software engineering
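The Mann-Whitney U-test named under conclusion validity compares two samples by rank and makes no normality assumption. The sketch below computes only the U statistic by direct pair counting. The two lists are hypothetical error values invented for illustration, and what the study actually compared is not stated on the slide. For a p-value, `scipy.stats.mannwhitneyu` is the usual routine.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: each pair with a > b counts 1, each tie 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical prediction errors for two models (illustrative only).
errors_a = [0.42, 0.51, 0.38, 0.47, 0.55]
errors_b = [0.21, 0.33, 0.29, 0.40, 0.26]

u_a = mann_whitney_u(errors_a, errors_b)
u_b = mann_whitney_u(errors_b, errors_a)
print(u_a, u_b)
```

The two statistics always sum to the number of pairs, here 5 × 5 = 25. A split as lopsided as 24 to 1 says that values in `errors_a` are almost always larger than values in `errors_b`.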
  24. 24. Conclusions
        • Similar to computational biology and healthcare, we need to make decisions under uncertainty using multiple data sources
        • As we understand the dynamics of BNs and the techniques used for model learning, these models would enable us to uncover hidden relationships between variables that cannot be easily identified by experts
        • Understanding the theory behind BNs also gives us the opportunity to adapt these models to different industrial settings by changing the set of metrics, their distributions, and the causal relationships among variables
  25. 25. Conclusions
        • Integrated tool support (intelligent software delivery platform)
            • Dione – to be integrated into IBM Rational