A N O V E R V I E W O F L I T E R A T U R E A N D P R O B L E MS T A T E M E N T S A R O U N D S E N S I T I V I T Y A N DI N T E R E S T I N G N E S S I N B E L I E F N E T W O R K SA D N A N M A S O O DS C I S . N O V A . E D U / ~ A D N A NA D N A N @ N O V A . E D UD O C T O R A L C A N D I D A T EN O V A S O U T H E A S T E R N U N I V E R S I T YBayesian Networks andAssociation Analysis
Preliminaries Data mining is a statistical process to extract useful information,unknown patterns and interesting relationships in large databases.In this process, many statistical methods are used. Two of thesemethods are Bayesian networks and association analysis. Bayesian networks are probabilistic graphical models thatencode relationships among a set of random variables in a database.Since they have both causal and probabilistic aspects, datainformation and expert knowledge can easily be combined by them.Bayesian networks can also represent knowledge about uncertaindomain and make strong inferences. Association analysis is a useful technique to detect hiddenassociations and rules in large databases, and it extracts previouslyunknown and surprising patterns from already known information.A drawback of association analysis is that many patterns aregenerated even if the data set is very small. Hence, suitableinterestingness measures must be performed to eliminateuninteresting patterns.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Bayesian Networks Bayesian networks are Directed Acyclic Graphs (DAGs) thatencode probabilistic relationships between random variables.They provide to model joint probability distribution of a set ofrandom variables efficiently and to make some computationsfrom this model. Bayesian Networks have have emerged as an importantmethod which adds uncertain expert knowledge to the system Bayesian networks provides support for inference andlearning. Bayesian networks can be learned directly from data, they canalso be learned from expert opinion. The results obtained from association analysis can be used tolearn and update Bayesian networks.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Association Analysis Association analysis helps examine items that are seen togetherfrequently in data set and to reveal patterns that help decision making.These patterns are represented as “association rules” or “frequentitemsets” in association analysis. In association analysis obtaining the patterns is complicated and time consuming when data set islarge. patterns found by association analysis can be deceptive since somerelationships may arise by chance. A problem encountered in association analysis is that a great number ofpatterns are generated even if data set is small, so millions of patternscan be obtained when data set is large. Therefore, patterns obtained by association analysis should beevaluated according to their interestingness levels and the patternswhich are found uninteresting according to these measures should beeliminated.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Measuring Interestingness Interestingness levels are measured by interestingnessmeasures. These measures are categorized into“objective interestingness measures” and “subjectiveinterestingness measures”. Objective measures arebased on data and structure of pattern, subjectivemeasures are also based on expert knowledge inaddition to data and structure of the pattern. Subjective interestingness measures aregenerally specified through belief systems.Since Bayesian networks are belief systems,these measurescan be specified over themReference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Interestingness Measures for AssociationPatterns (Tan et al, 2002)
Mutual collaboration between Bayesian networksand Association analysis Bayesian networks and association analysis can be usedtogether in knowledge discovery. While Bayesiannetworks are used to generate an objective or subjectivemeasure, interesting patterns obtained via associationanalysis are used to learn Bayesian networks. Bayesian networks encode the expert belief systems andthey can be used to create a subjective and objectivemeasure. Therefore, the most different patterns from thepast information (belief) indicated by a Bayesian networkare considered interesting. There are some methodssuggested in the literature to determine interestingpatterns by using Bayesian networks. Two of thesemethods were suggested by Jaroszewicz and Simoviciand Malhas and Aghbari.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Interestingness according to Jaroszewicz & Simovici Jaroszewicz and Simovici define interestingness ofan itemset as absolute difference between itssupports estimated from data and Bayesiannetwork. If this difference for an itemset is biggerthan a given threshold, this itemset is consideredinteresting. In this method, interesting itemsets aredetermined instead of association rules. Direction of rules is specified according to user‟sexperience.
Interestingness according to Malhas & Aghbari Another subjective interestingness measure generatedusing Bayesian networks were suggested by Malhas andAghbari. This measure is the sensitivity of the Bayesiannetwork to the patterns discovered and it is obtained byassessing the uncertainty-increasing potential of apattern on the beliefs of Bayesian network. The patternshaving the highest sensitivity value is considered themost interesting patterns. In this approach, mutualinformation is a measure of uncertainty. Sensitivity of apattern is the sum of the mutual information increaseswhen a pattern enters as an evidence/finding to theBayesian network.
A Belief-Driven Method for Discovering Unexpected PatternsTuzhilin and Padmanabhan (AAAI)Abstract Several pattern discovery methods proposed in the data mining literaturehave the drawbacks that they discover too many obvious or irrelevantpatterns and that they do not leverage to a full extent valuable prior domainknowledge that decision makers have. (Tuzhilin and Padmanabhan)proposed a new method of discovery that addresses these drawbacks. In particular they propose a new method of discovering unexpectedpatterns that takes into consideration prior background knowledge ofdecision makers. This prior knowledge constitutes a set of expectations orbeliefs about the problem domain. Tuzhilin and Padmanabhan‟s proposed method of discovering unexpectedpatterns uses these beliefs to seed the search for patterns in data thatcontradict the beliefs. To evaluate the practicality of our approach, authorsapplied our algorithm to consumer purchase data from a major marketresearch company and to web logfile data tracked at an academic Web site.
Fast Discovery of Unexpected Patterns in Data, Relative to aBayesian Network (Jaroszewicz & Scheffer) Jaroszewicz and Scheffer considered a model in whichbackground knowledge on a given domain of interest isavailable in terms of a Bayesian network, in addition to a largedatabase. The mining problem is to discoverunexpected patterns: goal is to find the strongestdiscrepancies between network and database. This problem is intrinsically difficult because it requiresinference in a Bayesian network and processing the entire,potentially very large, database. A sampling-based method that we introduce is efficient andyet provably finds the approximately most interestingunexpected patterns. Jaroszewicz & Scheffer give a rigorousproof of the method‟s correctness. Experiments shed light onits efficiency and practicality for large-scale Bayesiannetworks and databases.
Scalable pattern mining with Bayesian networks as backgroundknowledge (Szymon Jaroszewicz , Tobias Scheffer & Dan A. Simovici)Abstract Authors study a discovery framework in which background knowledge on variables andtheir relations within a discourse area is available in the form of a graphical model. Starting from an initial, hand-crafted or possibly empty graphical model, the networkevolves in an interactive process of discovery, researchers focus on the central step of thisprocess: Given a graphical model and a database, authors address the problem of finding the mostinteresting attribute sets. Authors formalize the concept of interestingness of attributesets as the divergence between their behavior as observed in the data, and the behaviorthat can be explained given the current model. Jaroszewicz et al derive an exact algorithm that finds all attribute sets whoseinterestingness exceeds a given threshold. Authors then consider the case of a very largenetwork that renders exact inference unfeasible, and a very large database or datastream. They devise an algorithm that efficiently finds the most interesting attribute setswith prescribed approximation bound and confidence probability, even for very largenetworks and infinite streams.
Fast Discovery Of Interesting Patterns Based On Bayesian Network Background Knowledgeby Rana Malhas & Zaher Al AghbariAbstract The main problem faced by all association rule/pattern mining algorithmsis their production of a large number of rules which incur a secondarymining problem; namely, mining interesting association rules/patterns.The problem is compounded by the fact that „common knowledge‟discovered rules are not interesting, but they are usually strong rules withhigh support and confidence levels- the classical measures. In their research paper, authors presented a fast algorithm for discoveringinteresting (unexpected) patterns based on background knowledge,represented by a Bayesian network. A pattern/rule is unexpected if it is„surprising‟ to the user. The algorithm profiles a pattern as interesting(unexpected), if the absolute difference between its support estimated fromthe dataset and the Bayesian network exceeds a user specified threshold(ε). Itemsets with the highest diverging supports are considered the mostinteresting.
Measuring the Interestingness The most interesting variable set according to Jaroszewiczand Simovici (2004) approach using Bayesian network isbased on the absolute difference between this set‟s supportvalues obtained from data and Bayesian network. This interestingness value is based on the discrepancybetween Bayesian network and data. By using differentBayesian networks depending on expert knowledge, moreinteresting variable sets may be obtained. Bayesian network can be updated again using the mostinteresting variable sets, and this updated network can beused again to find interesting patterns. As these steps repeat,the interestingness values of variable sets are reduced. This is because Bayesian network adapts data well anddiscrepancy between Bayesian network and data decreases.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Using Interesting Patterns to Learn Bayesian Networks Learning structure and parameters in Bayesiannetworks is an important problem in literature.Expert knowledge is generally used to solve thisproblem. However, it is not always possible to reachthe appropriate expert opinion. In currentapplications, data set is exploited to learn Bayesiannetwork because of the lack of expert opinion.Learning problem in Bayesian networks areseparated into “parameter learning” and “structurelearning”.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Bayesian Network and Expert Opinion Bayesian networks are generally created according toexpert opinion about the problem and the data.Achieving expert opinion is generally difficult and costly.If expert opinion about interested data can not bereached, knowledge obtained from association analysisresults can be used to create a Bayesian network. Inaddition, if expert opinion does not exist but a Bayesiannetwork is created according to non-expert opinion, thisBayesian network can be updated according toassociation analysis results. Hence, a suitable Bayesiannetwork can be created without the need for expertopinion.
Expert Knowledge and Belief Network Learning If there is no expert knowledge about the structure ofBayesian network, the data is used to learn the DAGstructure that best describes the data. In principle, inorder to find the best DAG structure for the variableset V, all possible DAG representations for V shouldbe established and compared.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Research Problem – Interestingness for Rare Beliefs As seen in earlier examples from the the literature, theproperties of several interestingness measures have beenanalyzed and several frameworks has been proposed forselecting a right interestingness measure for extractingassociation rules. However, for rare beliefs, anomalies and outliers, whichcontain useful knowledge, researchers are making efforts toinvestigate efficient approaches to extract the same. The research problem is to analyze the properties ofinterestingness measures for determining the interestingnessof outliers and rare beliefs. Based on the analysis, researchers can suggest a set ofmeasures and properties an expert should consider whileselecting a measure to find the interestingness of rareassociations.
Conclusion Association analysis and Bayesian networks are twomethods which are used to accomplish different goals indata mining. Whereas the aim of association analysis isto obtain interesting patterns in a data set, the aim ofBayesian networks is to calculate local probabilitydistributions of the variables by modelling causalrelationships between variables. Output of one of these two methods can be used as aninput to another method. Interesting patternsdetermined by association analysis is exploited inlearning and updating Bayesian networks. Also, Bayesiannetworks are exploited to create interestingnessmeasures used in association analysis.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Conclusion In association analysis, objective interestingness measures aregenerally used to determine interesting patterns.Interestingness is the incompatibility degree of the pattern tothe prior knowledge of the researcher. Objectiveinterestingness measures do not fully comply with thismeaning of interestingness. These measures identify patterns frequently seen in data setrather than interesting patterns. However, subjectiveinterestingness measures comply with the meaning ofinterestingness rather than objective measures. Subjectiveinterestingness measures are defined over expert systems. Bayesian networks are expert systems and they may be usedto define a subjective measure. The most different patternsfrom the knowledge represented by Bayesian networks arespecified as the most interesting patterns.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Conclusion (cont) Using together these two data mining techniquesprovides to add prior information to knowledgediscovery process. Even if this prior information is not received from anexpert, suitable results can still be obtained bysupporting this non-expert information with data.Hence, there is no need to complicated and timeconsuming algorithms to learn Bayesian networksand to create interestingness measures, and moresuitable results to real world are reached.Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
References and Bibliography Ersel, D., & Günay, S. Bayesian Networks and Association Analysis in Knowledge Discovery Process. D. Ersel, S. Günay/ İstatistikçiler Dergisi 5 (2012) 51-64 64 I. Ben-Gal, 2007, Bayesian Networks,Encyclopedia of Statistics in Quality &Reliability,F. Ruggeri, F.Faltin, R. Kenett,R. (eds), Wiley & Sons. M.W. Berry,M.Browne, 2006,Lecture Notes in Data Mining, World Scientific Publishing, Singapore, 222p. Y. Dong-Peng, L. Jin-Lin, 2008, Research on personal credit evaluation model based on bayesian network andassociation rules, Knowledge Discovery and Data Mining, 2008. WKDD 2008. First International Workshop on , 457-460. D. Heckerman, 1995, Bayesian networks for data mining, Data Mining and Knowledge Discovery1, 79-119. S. Jaroszewicz, D.A. Simovici, 2004, Interestingness of frequent itemsets using Bayesian networks as backgroundknowledge, Proceedings of the 10th ACM SIGKDD Conference on Knowledge Dicovery and Data Mining, August 20-25,2004, New York, USA, 178-186. F.V. Jensen, 2001, Bayesian Networks and Decision Graphs, Springer-Verlag, New York, 268p. R. Malhas, Z. Aghbari, 2007, Fast discovery of interesting patterns based on Bayesian network background knowledge,University of Sharjah Journal of Pure and Applied Science, 4 (3). K. Murphy, 1998, A brief introduction to graphical models and Bayesian networks,http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#infer. A. Siberschatz, A. Tuzhilin, A., 1995, On subjective measures of interestingness in knowledge discovery, Proceedings of the 1st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 20-21, 1995, Montreal, Canada, 275-281. P. Tan, V. Kumar, J. Srivastava, 2002, Selecting the right interestingness measure for association patterns, Proceedingsof the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002,Edmonton, Alberta, Canada, 32-41. P.Tan, M. Steinbach, V. Kumar, 2006, Introduction to Data Mining, Addison-Wesley, Boston,769p.