Bayesian Networks and Association Analysis

An Overview of Literature and Problem Statements around Sensitivity and
Interestingness in Belief Networks
Adnan Masood
scis.nova.edu/~adnan
adnan@nova.edu
Doctoral Candidate
Nova Southeastern University

Preliminaries
 Data mining is a statistical process for extracting useful information,
unknown patterns and interesting relationships from large databases.
Many statistical methods are used in this process; two of them
are Bayesian networks and association analysis.
 Bayesian networks are probabilistic graphical models that
encode relationships among a set of random variables in a database.
Since they have both causal and probabilistic aspects, they make it easy
to combine information from data with expert knowledge.
Bayesian networks can also represent knowledge about an uncertain
domain and support strong inferences.
 Association analysis is a useful technique for detecting hidden
associations and rules in large databases; it extracts previously
unknown and surprising patterns from already known information.
A drawback of association analysis is that many patterns are
generated even if the data set is very small. Hence, suitable
interestingness measures must be applied to eliminate
uninteresting patterns.
Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64

Bayesian Networks
 Bayesian networks are Directed Acyclic Graphs (DAGs) that
encode probabilistic relationships between random variables.
They make it possible to model the joint probability distribution of a set of
random variables efficiently and to perform computations
with this model.
 Bayesian networks have emerged as an important
method for adding uncertain expert knowledge to a system.
 Bayesian networks provide support for inference and
learning.
 Bayesian networks can be learned directly from data; they can
also be built from expert opinion.
 The results obtained from association analysis can be used to
learn and update Bayesian networks.
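
As a minimal sketch of how a DAG encodes a joint distribution, the joint probability can be written as a product of one conditional probability per node given its parents. The three-variable network (Rain, Sprinkler, WetGrass), its probabilities and the function names below are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (all numbers are illustrative).
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}

# cpt[node][parent_values] = P(node = True | parents = parent_values)
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}

def joint_probability(assignment):
    """P(assignment) as the product of P(node | parents) over all nodes."""
    prob = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[p] for p in pars)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob

def marginal(query):
    """P(query), summing the joint over all full assignments consistent with query."""
    names = list(parents)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(assignment[k] == v for k, v in query.items()):
            total += joint_probability(assignment)
    return total
```

Brute-force summation like this is only workable for a handful of variables; it is shown here to make the factorization concrete, not as a practical inference method.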

Association Analysis
 Association analysis examines items that frequently occur together
in a data set and reveals patterns that help decision making.
These patterns are represented as “association rules” or “frequent
itemsets”.
 Obtaining the patterns is complicated and time consuming when the data set is
large.
 Patterns found by association analysis can be deceptive, since some
relationships may arise by chance.
 A further problem is that a great number of
patterns are generated even if the data set is small, so millions of patterns
can be obtained when the data set is large.
 Therefore, patterns obtained by association analysis should be
evaluated according to their interestingness levels, and the patterns
that these measures find uninteresting should be
eliminated.
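
The sketch below illustrates the two classical quantities behind frequent itemsets and association rules, support and confidence, and shows why the number of candidate patterns grows quickly. The transactions and function names are hypothetical, and the enumeration is deliberately brute force rather than an actual mining algorithm.

```python
from itertools import combinations

# Hypothetical market-basket transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = supp(A and C together) / supp(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets; real miners prune this search."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result
```

Even with three distinct items there are seven candidate itemsets; with a minimum support of 0.5, five of them are frequent in this toy data, which hints at why a filtering step is needed on realistic data sets.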

Measuring Interestingness
 Interestingness levels are quantified by interestingness
measures. These measures are categorized into
“objective interestingness measures” and “subjective
interestingness measures”. Objective measures are
based on the data and the structure of the pattern; subjective
measures additionally draw on expert knowledge.
 Subjective interestingness measures are
generally specified through belief systems.
Since Bayesian networks are belief systems,
these measures can be specified over them.

Interestingness Measures for Association
Patterns (Tan et al, 2002)
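
The table from Tan et al. is not reproduced in this text. As a hedged illustration of what an objective measure looks like, the sketch below computes two widely used ones, lift (also called interest) and leverage (Piatetsky-Shapiro), directly from supports. The function names and argument layout are assumptions made for this example.

```python
def lift(supp_a, supp_b, supp_ab):
    """Lift (interest): P(A,B) / (P(A) * P(B)); 1.0 means A and B look independent."""
    return supp_ab / (supp_a * supp_b)

def leverage(supp_a, supp_b, supp_ab):
    """Leverage (Piatetsky-Shapiro): P(A,B) - P(A) * P(B); 0.0 means independence."""
    return supp_ab - supp_a * supp_b
```

Both measures use only counts from the data, which is what makes them objective: no prior belief about A and B enters the calculation.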

Mutual collaboration between Bayesian networks
and Association analysis
 Bayesian networks and association analysis can be used
together in knowledge discovery: Bayesian
networks are used to generate an objective or subjective
measure, while interesting patterns obtained via association
analysis are used to learn Bayesian networks.
 Bayesian networks encode expert belief systems, so
they can be used to create subjective as well as objective
measures. The patterns that differ most from the
prior information (belief) encoded in a Bayesian network
are considered interesting. Several methods have been
suggested in the literature for determining interesting
patterns by using Bayesian networks. Two of these
methods were suggested by Jaroszewicz and Simovici
and by Malhas and Aghbari.

Interestingness according to Jaroszewicz & Simovici
 Jaroszewicz and Simovici define the interestingness of
an itemset as the absolute difference between its
support estimated from the data and its support estimated from the Bayesian
network. If this difference is larger
than a given threshold, the itemset is considered
interesting. This method determines interesting itemsets
rather than association rules.
 The direction of the rules is specified according to the user's
experience.
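
The definition above can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' algorithm: the data set, the edgeless background network (which treats A and B as independent) and all function names are hypothetical.

```python
# Hypothetical binary data set in which A and B tend to occur together.
rows = (
    [{"A": 1, "B": 1}] * 40
    + [{"A": 1, "B": 0}] * 10
    + [{"A": 0, "B": 1}] * 10
    + [{"A": 0, "B": 0}] * 40
)

# Hypothetical background network with no edge between A and B,
# so it models them as independent with P(A=1) = P(B=1) = 0.5.
bn_marginals = {"A": 0.5, "B": 0.5}

def support_from_data(itemset, rows):
    """Fraction of rows that match every attribute=value pair in itemset."""
    return sum(all(r[k] == v for k, v in itemset.items()) for r in rows) / len(rows)

def support_from_network(itemset, marginals):
    """Probability of the itemset under the edgeless (independence) network."""
    prob = 1.0
    for name, value in itemset.items():
        prob *= marginals[name] if value == 1 else 1.0 - marginals[name]
    return prob

def interestingness(itemset, rows, marginals):
    """Absolute difference between the data support and the network support."""
    return abs(support_from_data(itemset, rows) - support_from_network(itemset, marginals))

def is_interesting(itemset, rows, marginals, threshold):
    """True when the discrepancy exceeds the user-chosen threshold."""
    return interestingness(itemset, rows, marginals) > threshold
```

In this toy setting the itemset {A=1, B=1} has support 0.40 in the data but 0.25 under the network, so its interestingness is 0.15, whereas {A=1} alone agrees with the network and scores 0.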

Interestingness according to Malhas & Aghbari
 Another subjective interestingness measure based on
Bayesian networks was suggested by Malhas and
Aghbari. This measure is the sensitivity of the Bayesian
network to the discovered patterns, and it is obtained by
assessing the uncertainty-increasing potential of a
pattern on the beliefs of the Bayesian network. The patterns
with the highest sensitivity values are considered the
most interesting. In this approach, mutual
information serves as the measure of uncertainty. The sensitivity of a
pattern is the sum of the increases in mutual information
when the pattern is entered as evidence (a finding) into the
Bayesian network.
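
The sketch below covers only the mutual-information ingredient of this measure, computed from a joint probability table of two discrete variables. It does not reproduce the authors' sensitivity computation; the function name and the table layout are assumptions for illustration.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint table where joint[x][y] = P(X=x, Y=y)."""
    px = [sum(row) for row in joint]          # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:  # zero cells contribute nothing to the sum
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi
```

For an independent table such as [[0.25, 0.25], [0.25, 0.25]] the function returns 0 bits, and for a perfectly dependent table such as [[0.5, 0.0], [0.0, 0.5]] it returns 1 bit.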

A Belief-Driven Method for Discovering Unexpected Patterns
Tuzhilin and Padmanabhan (AAAI)
Abstract
 Several pattern discovery methods proposed in the data mining literature
have two drawbacks: they discover too many obvious or irrelevant
patterns, and they do not fully leverage the valuable prior domain
knowledge that decision makers have. Tuzhilin and Padmanabhan
proposed a new method of discovery that addresses these drawbacks.
 In particular, they propose a method of discovering unexpected
patterns that takes into consideration the prior background knowledge of
decision makers. This prior knowledge constitutes a set of expectations or
beliefs about the problem domain.
 Tuzhilin and Padmanabhan's method of discovering unexpected
patterns uses these beliefs to seed the search for patterns in data that
contradict the beliefs. To evaluate the practicality of their approach, the authors
applied their algorithm to consumer purchase data from a major market
research company and to web logfile data tracked at an academic Web site.

Fast Discovery of Unexpected Patterns in Data, Relative to a
Bayesian Network (Jaroszewicz & Scheffer)
 Jaroszewicz and Scheffer consider a model in which
background knowledge on a given domain of interest is
available in the form of a Bayesian network, in addition to a large
database. The mining problem is to discover
unexpected patterns: the goal is to find the strongest
discrepancies between network and database.
 This problem is intrinsically difficult because it requires
inference in a Bayesian network and processing of the entire,
potentially very large, database.
 The sampling-based method that the authors introduce is efficient and
yet provably finds the approximately most interesting
unexpected patterns. Jaroszewicz & Scheffer give a rigorous
proof of the method's correctness. Experiments shed light on
its efficiency and practicality for large-scale Bayesian
networks and databases.
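
The paper's algorithm is not reproduced here. As a generic illustration of why sampling can come with a provable guarantee, the sketch below uses Hoeffding's inequality, a standard bound that is swapped in as an assumption and is not claimed to be the authors' procedure. It gives a sample size that keeps the estimate of a single support within epsilon of its true value with probability at least 1 - delta. All names are hypothetical.

```python
import math
import random

def hoeffding_sample_size(epsilon, delta):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta, for one proportion."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def sampled_support(predicate, database, n, rng):
    """Estimate the support of predicate from n rows drawn with replacement."""
    sample = [rng.choice(database) for _ in range(n)]
    return sum(predicate(row) for row in sample) / n
```

With epsilon = 0.05 and delta = 0.05 the bound asks for 738 sampled rows, regardless of how large the database is. That independence from database size is what makes sampling attractive for very large databases and streams.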

Scalable pattern mining with Bayesian networks as background
knowledge (Szymon Jaroszewicz, Tobias Scheffer & Dan A. Simovici)
Abstract
 The authors study a discovery framework in which background knowledge on variables and
their relations within a discourse area is available in the form of a graphical model.
 Starting from an initial, hand-crafted or possibly empty graphical model, the network
evolves in an interactive process of discovery. The authors focus on the central step of this
process:
 Given a graphical model and a database, they address the problem of finding the most
interesting attribute sets. They formalize the interestingness of attribute
sets as the divergence between their behavior as observed in the data and the behavior
that can be explained given the current model.
 Jaroszewicz et al. derive an exact algorithm that finds all attribute sets whose
interestingness exceeds a given threshold. They then consider the case of a very large
network that renders exact inference infeasible, and a very large database or data
stream. They devise an algorithm that efficiently finds the most interesting attribute sets
with a prescribed approximation bound and confidence probability, even for very large
networks and infinite streams.

Fast Discovery Of Interesting Patterns Based On Bayesian Network Background Knowledge
by Rana Malhas & Zaher Al Aghbari
Abstract
 The main problem faced by all association rule/pattern mining algorithms
is that they produce a large number of rules, which creates a secondary
mining problem: mining the interesting association rules/patterns.
The problem is compounded by the fact that discovered rules expressing “common knowledge”
are not interesting, yet they are usually strong rules with
high support and confidence levels, the classical measures.
 In their paper, the authors present a fast algorithm for discovering
interesting (unexpected) patterns based on background knowledge
represented by a Bayesian network. A pattern/rule is unexpected if it is
“surprising” to the user. The algorithm profiles a pattern as interesting
(unexpected) if the absolute difference between its support estimated from
the dataset and from the Bayesian network exceeds a user-specified threshold
(ε). Itemsets with the most strongly diverging supports are considered the most
interesting.

Measuring the Interestingness
 In the Jaroszewicz
and Simovici (2004) approach, the most interesting variable set is
the one with the largest absolute difference between its support
values obtained from the data and from the Bayesian network.
 This interestingness value reflects the discrepancy
between the Bayesian network and the data. By using different
Bayesian networks depending on expert knowledge, more
interesting variable sets may be obtained.
 The Bayesian network can be updated using the most
interesting variable sets, and the updated network can be
used again to find interesting patterns. As these steps repeat,
the interestingness values of the variable sets decrease.
 This is because the Bayesian network fits the data better and the
discrepancy between the Bayesian network and the data decreases.
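
A toy illustration of this feedback loop, under assumptions that are not from the source: a two-variable data set, a first network without any edge, and an updated network that adds the edge A -> B with probabilities estimated from counts. The largest discrepancy between data and network drops once the network has been updated.

```python
# Hypothetical (A, B) observations in which A and B tend to agree.
rows = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 40

def data_support(a, b):
    """Relative frequency of the pair (a, b) in the data."""
    return sum(r == (a, b) for r in rows) / len(rows)

def fit_independent():
    """Network without an edge: P(A, B) = P(A) * P(B)."""
    pa = sum(r[0] for r in rows) / len(rows)
    pb = sum(r[1] for r in rows) / len(rows)
    return lambda a, b: (pa if a else 1 - pa) * (pb if b else 1 - pb)

def fit_with_edge():
    """Network with edge A -> B: P(A, B) = P(A) * P(B | A), estimated from counts."""
    pa = sum(r[0] for r in rows) / len(rows)
    pb_given = {
        a: sum(r[1] for r in rows if r[0] == a) / sum(r[0] == a for r in rows)
        for a in (0, 1)
    }
    return lambda a, b: (pa if a else 1 - pa) * (pb_given[a] if b else 1 - pb_given[a])

def max_interestingness(model):
    """Largest absolute gap between data support and model probability."""
    return max(abs(data_support(a, b) - model(a, b)) for a in (0, 1) for b in (0, 1))
```

Here the edgeless network leaves a maximum discrepancy of 0.15, while the updated network reproduces the data and leaves essentially none.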

Using Interesting Patterns to Learn Bayesian Networks
 Learning the structure and parameters of Bayesian
networks is an important problem in the literature.
Expert knowledge is generally used to solve this
problem. However, it is not always possible to obtain
appropriate expert opinion. In current
applications, the data set is exploited to learn the Bayesian
network when expert opinion is lacking.
The learning problem in Bayesian networks is
separated into “parameter learning” and “structure
learning”.
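
As a minimal sketch of parameter learning, assuming the structure is already fixed and the data are complete, a conditional probability table can be estimated by relative frequencies (maximum likelihood). The variable names, the sample rows and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical complete observations for the edge Rain -> Wet.
rows = (
    [{"Rain": 1, "Wet": 1}] * 3
    + [{"Rain": 1, "Wet": 0}] * 1
    + [{"Rain": 0, "Wet": 0}] * 4
)

def learn_cpt(rows, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from complete data."""
    joint_counts = Counter()
    parent_counts = Counter()
    for row in rows:
        key = tuple(row[p] for p in parents)
        joint_counts[(key, row[child])] += 1
        parent_counts[key] += 1
    # Each entry is count(parents, child) / count(parents).
    return {
        (key, value): count / parent_counts[key]
        for (key, value), count in joint_counts.items()
    }

cpt_wet = learn_cpt(rows, "Wet", ["Rain"])
```

With these rows the estimate of P(Wet=1 | Rain=1) is 0.75. Combinations that never occur in the data are simply absent from the table, which is one reason practical systems usually add smoothing or priors.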

Bayesian Network and Expert Opinion
 Bayesian networks are generally created according to
expert opinion about the problem and the data.
Obtaining expert opinion is generally difficult and costly.
If expert opinion about the data of interest cannot be
obtained, knowledge gained from association analysis
results can be used to create a Bayesian network. In
addition, if expert opinion does not exist but a Bayesian
network has been created according to non-expert opinion, this
Bayesian network can be updated according to
association analysis results. Hence, a suitable Bayesian
network can be created without the need for expert
opinion.

Expert Knowledge and Belief Network Learning
 If there is no expert knowledge about the structure of the
Bayesian network, the data is used to learn the DAG
structure that best describes the data. In principle,
finding the best DAG structure for the variable
set V requires enumerating and comparing all possible DAGs over V.
Because the number of DAGs grows super-exponentially with the
number of variables, this is feasible only for small V.
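
To give a feel for the size of this search space, the sketch below counts the labeled DAGs on n nodes with Robinson's recurrence. The function name is an assumption for this example, and math.comb requires Python 3.8 or later.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edge.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )
```

The counts for n = 1 to 5 are 1, 3, 25, 543 and 29281, so exhaustive comparison quickly stops being an option and structure learning in practice relies on heuristic or constraint-based search.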

Research Problem – Interestingness for Rare Beliefs
 As seen in the earlier examples from the literature, the
properties of several interestingness measures have been
analyzed and several frameworks have been proposed for
selecting the right interestingness measure for extracting
association rules.
 However, rare beliefs, anomalies and outliers also
contain useful knowledge, and researchers are still working to
find efficient approaches for extracting them.
 The research problem is to analyze the properties of
interestingness measures for determining the interestingness
of outliers and rare beliefs.
 Based on this analysis, researchers can suggest a set of
measures and properties that an expert should consider when
selecting a measure of the interestingness of rare
associations.

Conclusion
 Association analysis and Bayesian networks are two
methods that are used to accomplish different goals in
data mining. Whereas the aim of association analysis is
to obtain interesting patterns in a data set, the aim of
Bayesian networks is to calculate local probability
distributions of the variables by modelling causal
relationships between variables.
 The output of either method can be used as
input to the other. Interesting patterns
determined by association analysis are exploited in
learning and updating Bayesian networks, and Bayesian
networks are exploited to create the interestingness
measures used in association analysis.

Conclusion (cont)
 In association analysis, objective interestingness measures are
generally used to determine interesting patterns.
Interestingness is the degree to which a pattern is incompatible with
the prior knowledge of the researcher. Objective
interestingness measures do not fully comply with this
meaning of interestingness.
 These measures identify patterns that are frequently seen in the data set
rather than interesting patterns. Subjective
interestingness measures comply with this meaning of
interestingness better than objective measures do. Subjective
interestingness measures are defined over expert systems.
 Bayesian networks are expert systems, and they may be used
to define a subjective measure. The patterns that differ most
from the knowledge represented by the Bayesian network are
specified as the most interesting patterns.

Conclusion (cont)
 Using these two data mining techniques together
makes it possible to add prior information to the knowledge
discovery process.
 Even if this prior information does not come from an
expert, suitable results can still be obtained by
supporting the non-expert information with data.
Hence, there is no need for complicated and time-consuming
algorithms to learn Bayesian networks
and to create interestingness measures, and results
better suited to the real world are reached.

References and Bibliography
 D. Ersel, S. Günay, 2012, Bayesian Networks and Association Analysis in Knowledge Discovery Process,
İstatistikçiler Dergisi 5, 51-64.
 I. Ben-Gal, 2007, Bayesian Networks, Encyclopedia of Statistics in Quality & Reliability, F. Ruggeri, F. Faltin,
R. Kenett (eds), Wiley & Sons.
 M.W. Berry, M. Browne, 2006, Lecture Notes in Data Mining, World Scientific Publishing, Singapore, 222p.
 Y. Dong-Peng, L. Jin-Lin, 2008, Research on personal credit evaluation model based on bayesian network and
association rules, Knowledge Discovery and Data Mining, 2008. WKDD 2008. First International Workshop on, 457-460.
 D. Heckerman, 1995, Bayesian networks for data mining, Data Mining and Knowledge Discovery 1, 79-119.
 S. Jaroszewicz, D.A. Simovici, 2004, Interestingness of frequent itemsets using Bayesian networks as background
knowledge, Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 20-25,
2004, New York, USA, 178-186.
 F.V. Jensen, 2001, Bayesian Networks and Decision Graphs, Springer-Verlag, New York, 268p.
 R. Malhas, Z. Aghbari, 2007, Fast discovery of interesting patterns based on Bayesian network background knowledge,
University of Sharjah Journal of Pure and Applied Science, 4 (3).
 K. Murphy, 1998, A brief introduction to graphical models and Bayesian networks,
http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#infer.
 A. Silberschatz, A. Tuzhilin, 1995, On subjective measures of interestingness in knowledge discovery,
Proceedings of the 1st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 20-21,
1995, Montreal, Canada, 275-281.
 P. Tan, V. Kumar, J. Srivastava, 2002, Selecting the right interestingness measure for association patterns, Proceedings
of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002,
Edmonton, Alberta, Canada, 32-41.
 P. Tan, M. Steinbach, V. Kumar, 2006, Introduction to Data Mining, Addison-Wesley, Boston, 769p.
