An Overview of Literature and Problem
Statements Around Sensitivity and
Interestingness in Belief Networks
Adnan Masood
scis.nova.edu/~adnan
adnan@nova.edu
Doctoral Candidate
Nova Southeastern University
Bayesian Networks and
Association Analysis
Preliminaries
 Data mining is a statistical process to extract useful information,
unknown patterns and interesting relationships in large databases.
In this process, many statistical methods are used. Two of these
methods are Bayesian networks and association analysis.
 Bayesian networks are probabilistic graphical models that
encode relationships among a set of random variables in a database.
Because they have both causal and probabilistic aspects, they make it
easy to combine information from data with expert knowledge.
Bayesian networks can also represent knowledge about an uncertain
domain and support strong inferences.
 Association analysis is a useful technique for detecting hidden
associations and rules in large databases; it extracts previously
unknown and surprising patterns from already known information.
A drawback of association analysis is that many patterns are
generated even if the data set is very small. Hence, suitable
interestingness measures must be applied to eliminate
uninteresting patterns.
Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
Bayesian Networks
 Bayesian networks are Directed Acyclic Graphs (DAGs) that
encode probabilistic relationships between random variables.
They make it possible to model the joint probability distribution
of a set of random variables efficiently and to perform
computations from this model.
 Bayesian networks have emerged as an important
method for adding uncertain expert knowledge to a system.
 Bayesian networks provide support for inference and
learning.
 Bayesian networks can be learned directly from data; they can
also be learned from expert opinion.
 The results obtained from association analysis can be used to
learn and update Bayesian networks.
Bayesian Networks (cont)
Association Analysis
 Association analysis helps examine items that are frequently seen
together in a data set and reveals patterns that help decision making.
These patterns are represented as “association rules” or “frequent
itemsets”. Association analysis has two main difficulties:
 Obtaining the patterns is complicated and time consuming when the
data set is large.
 Patterns found by association analysis can be deceptive, since some
relationships may arise by chance.
 A further problem is that a great number of patterns are generated
even if the data set is small, so millions of patterns
can be obtained when the data set is large.
 Therefore, patterns obtained by association analysis should be
evaluated according to their interestingness levels, and the patterns
found uninteresting according to these measures should be
eliminated.
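The pattern explosion described above can be seen on a toy example. The following is a minimal sketch, assuming a hypothetical five-transaction basket data set and a brute-force enumeration (real miners such as Apriori prune the search instead); the function names are illustrative and do not come from the source.

```python
from itertools import combinations

# Hypothetical toy transactions (an assumption, not data from the source).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of its antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets meeting `min_support`."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                found[candidate] = s
    return found

patterns = frequent_itemsets(transactions, min_support=0.2)
print(len(transactions), "transactions ->", len(patterns), "itemsets")
print("confidence(diapers -> beer) =",
      confidence({"diapers"}, {"beer"}, transactions))
```

Even with only five transactions, the number of itemsets that occur at least once is several times the number of transactions, which is why a filtering step based on interestingness is needed.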
Measuring Interestingness
 Interestingness levels are measured by interestingness
measures. These measures are categorized into
“objective interestingness measures” and “subjective
interestingness measures”. Objective measures are
based on the data and the structure of the pattern; subjective
measures are based on expert knowledge in
addition to the data and the structure of the pattern.
 Subjective interestingness measures are
generally specified through belief systems.
Since Bayesian networks are belief systems,
these measures can be specified over them.
Interestingness Measures for Association
Patterns (Tan et al, 2002)
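Objective measures of the kind surveyed by Tan et al. are computed from co-occurrence counts alone. The sketch below computes four widely used ones (support, confidence, lift, also called interest factor, and the φ-coefficient) for a rule A → B. The function name and the counts are illustrative assumptions, not values from the source.

```python
import math

def objective_measures(n_ab, n_a, n_b, n):
    """Measures for the rule A -> B from raw counts.

    n_ab: records containing both A and B
    n_a, n_b: records containing A, respectively B
    n: total number of records
    """
    p_ab, p_a, p_b = n_ab / n, n_a / n, n_b / n
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,
        "lift": p_ab / (p_a * p_b),
        "phi": (p_ab - p_a * p_b)
               / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b)),
    }

# Hypothetical counts: 1000 records, A in 200, B in 800, both in 150.
m = objective_measures(n_ab=150, n_a=200, n_b=800, n=1000)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts the rule has a confidence of 0.75, yet its lift is below 1 and its φ-coefficient is negative, so different objective measures can rank the same pattern very differently.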
Mutual collaboration between Bayesian networks
and Association analysis
 Bayesian networks and association analysis can be used
together in knowledge discovery: Bayesian
networks are used to generate an objective or subjective
measure, while interesting patterns obtained via association
analysis are used to learn Bayesian networks.
 Bayesian networks encode expert belief systems, so
they can be used to create a subjective or objective
measure. The patterns that differ most from the
prior information (beliefs) encoded by a Bayesian network
are considered interesting. Several methods have been
suggested in the literature to determine interesting
patterns by using Bayesian networks. Two of these
methods were suggested by Jaroszewicz and Simovici
and by Malhas and Aghbari.
Interestingness according to Jaroszewicz & Simovici
 Jaroszewicz and Simovici define the interestingness of
an itemset as the absolute difference between its
support estimated from the data and its support estimated
from the Bayesian network. If this difference is bigger
than a given threshold, the itemset is considered
interesting. This method determines interesting itemsets
rather than association rules.
 The direction of rules is specified according to the user's
experience.
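The definition above amounts to |supp_data(I) - supp_BN(I)| > ε. A minimal sketch follows, assuming a hypothetical two-variable network with no edge between A and B and ten made-up records; all names and numbers are illustrative, and a real network would need a proper inference routine in place of `bn_support`.

```python
# Hypothetical background network: A and B are independent, each with P(=1) = 0.5.
p_a = {1: 0.5, 0: 0.5}
p_b = {1: 0.5, 0: 0.5}

def bn_support(assignment):
    """Probability of a partial assignment under the (assumed) network."""
    prob = 1.0
    if "A" in assignment:
        prob *= p_a[assignment["A"]]
    if "B" in assignment:
        prob *= p_b[assignment["B"]]
    return prob

# Hypothetical data in which A and B tend to take the same value.
data = ([{"A": 1, "B": 1}] * 4 + [{"A": 1, "B": 0}]
        + [{"A": 0, "B": 1}] + [{"A": 0, "B": 0}] * 4)

def data_support(assignment, records):
    """Fraction of records that match every variable in the assignment."""
    hits = sum(all(r[k] == v for k, v in assignment.items()) for r in records)
    return hits / len(records)

def interestingness(assignment, records):
    """Absolute difference between data support and network support."""
    return abs(data_support(assignment, records) - bn_support(assignment))

def is_interesting(assignment, records, epsilon):
    return interestingness(assignment, records) > epsilon

print(interestingness({"A": 1, "B": 1}, data))   # data 0.4 vs network 0.25
print(is_interesting({"A": 1}, data, epsilon=0.1))
```

The itemset {A=1, B=1} is flagged because the data show a dependence the network does not encode, while {A=1} alone is not, since the network already matches its marginal.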
Interestingness according to Malhas & Aghbari
 Another subjective interestingness measure generated
using Bayesian networks was suggested by Malhas and
Aghbari. This measure is the sensitivity of the Bayesian
network to the discovered patterns, and it is obtained by
assessing the uncertainty-increasing potential of a
pattern on the beliefs of the Bayesian network. The patterns
with the highest sensitivity values are considered the
most interesting. In this approach, mutual
information is used as the measure of uncertainty. The sensitivity of a
pattern is the sum of the mutual information increases
when the pattern is entered as evidence (a finding) into the
Bayesian network.
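The building block of this measure is mutual information, I(X;Y) = Σ p(x,y) log [p(x,y) / (p(x) p(y))]. The sketch below computes only that quantity for two discrete variables from a joint table; how the increases are summed over the network when a pattern is entered as evidence is not specified here, so that part is deliberately left out. The function name and the example tables are assumptions.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal of X
        py[y] = py.get(y, 0.0) + p   # marginal of Y
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0                      # 0 * log(0) is taken as 0
    )

# Two hypothetical joint tables over binary variables.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
dependent = {(0, 0): 0.5, (1, 1): 0.5}

print(mutual_information(independent))  # no shared information
print(mutual_information(dependent))    # X determines Y
```

Mutual information is zero when the variables are independent and reaches one bit when one binary variable fully determines the other.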
A Belief-Driven Method for Discovering Unexpected Patterns
Tuzhilin and Padmanabhan (AAAI)
Abstract
 Several pattern discovery methods proposed in the data mining literature
have two drawbacks: they discover too many obvious or irrelevant
patterns, and they do not fully leverage the valuable prior domain
knowledge that decision makers have. Tuzhilin and Padmanabhan
propose a new method of discovery that addresses these drawbacks.
 In particular, they propose a method of discovering unexpected
patterns that takes into consideration the prior background knowledge of
decision makers. This prior knowledge constitutes a set of expectations or
beliefs about the problem domain.
 The proposed method uses these beliefs to seed the search for patterns in data that
contradict the beliefs. To evaluate the practicality of their approach, the authors
applied their algorithm to consumer purchase data from a major market
research company and to web logfile data tracked at an academic Web site.
Fast Discovery of Unexpected Patterns in Data, Relative to a
Bayesian Network (Jaroszewicz & Scheffer)
 Jaroszewicz and Scheffer consider a model in which
background knowledge on a given domain of interest is
available in terms of a Bayesian network, in addition to a large
database. The mining problem is to discover
unexpected patterns: the goal is to find the strongest
discrepancies between network and database.
 This problem is intrinsically difficult because it requires
inference in a Bayesian network and processing the entire,
potentially very large, database.
 The sampling-based method that they introduce is efficient and
yet provably finds the approximately most interesting
unexpected patterns. Jaroszewicz and Scheffer give a rigorous
proof of the method's correctness. Experiments shed light on
its efficiency and practicality for large-scale Bayesian
networks and databases.
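To illustrate why sampling can give provable approximations, the sketch below estimates an itemset's support from a random sample whose size is chosen with Hoeffding's inequality. This is a standard textbook bound swapped in for illustration, not the authors' algorithm; the function names, the toy database and the parameter values are all assumptions.

```python
import math
import random

def hoeffding_sample_size(epsilon, delta):
    """Samples needed so that |estimate - true support| <= epsilon
    holds with probability at least 1 - delta (Hoeffding's inequality)."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

def estimate_support(records, predicate, epsilon, delta, seed=0):
    """Estimate the fraction of records satisfying `predicate`
    by sampling with replacement instead of scanning everything."""
    rng = random.Random(seed)
    n = hoeffding_sample_size(epsilon, delta)
    hits = sum(predicate(rng.choice(records)) for _ in range(n))
    return hits / n

# Hypothetical database in which the true support of A=1 is 0.3.
database = [{"A": 1}] * 300 + [{"A": 0}] * 700
estimate = estimate_support(database, lambda r: r["A"] == 1,
                            epsilon=0.05, delta=0.05)
print(hoeffding_sample_size(0.05, 0.05), "samples, estimate =", estimate)
```

The required sample size depends only on the tolerance and the confidence level, not on the size of the database, which is what makes this kind of approach attractive for very large data sets.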
Scalable pattern mining with Bayesian networks as background
knowledge (Szymon Jaroszewicz , Tobias Scheffer & Dan A. Simovici)
Abstract
 The authors study a discovery framework in which background knowledge on variables and
their relations within a discourse area is available in the form of a graphical model.
 Starting from an initial, hand-crafted or possibly empty graphical model, the network
evolves in an interactive process of discovery. The authors focus on the central step of this
process:
 Given a graphical model and a database, they address the problem of finding the most
interesting attribute sets. They formalize the interestingness of an attribute
set as the divergence between its behavior as observed in the data and the behavior
that can be explained given the current model.
 Jaroszewicz et al. derive an exact algorithm that finds all attribute sets whose
interestingness exceeds a given threshold. They then consider the case of a very large
network that renders exact inference infeasible, and a very large database or data
stream. They devise an algorithm that efficiently finds the most interesting attribute sets
with prescribed approximation bound and confidence probability, even for very large
networks and infinite streams.
Fast Discovery Of Interesting Patterns Based On Bayesian Network Background Knowledge
by Rana Malhas & Zaher Al Aghbari
Abstract
 The main problem faced by all association rule/pattern mining algorithms
is that they produce a large number of rules, which creates a secondary
mining problem: mining the interesting association rules/patterns.
The problem is compounded by the fact that discovered rules expressing 'common knowledge'
are not interesting, yet they are usually strong rules with
high support and confidence levels (the classical measures).
 In their paper, the authors present a fast algorithm for discovering
interesting (unexpected) patterns based on background knowledge
represented by a Bayesian network. A pattern/rule is unexpected if it is
'surprising' to the user. The algorithm profiles a pattern as interesting
(unexpected) if the absolute difference between its support estimated from
the dataset and its support estimated from the Bayesian network exceeds a
user-specified threshold (ε). Itemsets with the most divergent supports are
considered the most interesting.
Measuring the Interestingness
 In the Jaroszewicz and Simovici (2004) approach, the most
interesting variable set is determined from
the absolute difference between the set's support
values obtained from the data and from the Bayesian network.
 This interestingness value therefore reflects the discrepancy
between the Bayesian network and the data. By using different
Bayesian networks depending on expert knowledge, more
interesting variable sets may be obtained.
 The Bayesian network can be updated using the most
interesting variable sets, and the updated network can be
used again to find interesting patterns. As these steps repeat,
the interestingness values of variable sets decrease.
 This is because the Bayesian network fits the data better after each
update, so the discrepancy between the network and the data decreases.
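The shrinking discrepancy can be shown with two variables. In the sketch below, a hypothetical network without an edge between A and B is compared with an updated network that adds the edge A → B, and both are fitted to the same made-up records; the update rule (add one edge and re-estimate the probabilities by relative frequency) is an assumption chosen for illustration.

```python
from itertools import product

# Hypothetical records of (A, B) in which the two variables are correlated.
records = [(1, 1)] * 4 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 4

def data_support(a, b):
    return sum(r == (a, b) for r in records) / len(records)

def fit_independent():
    """Network with no edge: P(A, B) = P(A) * P(B)."""
    p_a1 = sum(a for a, _ in records) / len(records)
    p_b1 = sum(b for _, b in records) / len(records)
    def joint(a, b):
        return (p_a1 if a else 1 - p_a1) * (p_b1 if b else 1 - p_b1)
    return joint

def fit_with_edge():
    """Updated network with edge A -> B: P(A, B) = P(A) * P(B | A)."""
    p_a1 = sum(a for a, _ in records) / len(records)
    def p_b1_given(a):
        rows = [b for x, b in records if x == a]
        return sum(rows) / len(rows)
    def joint(a, b):
        p_b1 = p_b1_given(a)
        return (p_a1 if a else 1 - p_a1) * (p_b1 if b else 1 - p_b1)
    return joint

def max_discrepancy(joint):
    """Largest |data support - network support| over all value combinations."""
    return max(abs(data_support(a, b) - joint(a, b))
               for a, b in product((0, 1), repeat=2))

before = max_discrepancy(fit_independent())
after = max_discrepancy(fit_with_edge())
print("before update:", before, "after update:", after)
```

Once the edge suggested by the most interesting set is added, the network reproduces the observed supports, so the same set is no longer flagged in the next round.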
Using Interesting Patterns to Learn Bayesian Networks
 Learning the structure and parameters of Bayesian
networks is an important problem in the literature.
Expert knowledge is generally used to solve this
problem. However, it is not always possible to obtain
an appropriate expert opinion. In current
applications, the data set is used to learn the Bayesian
network when expert opinion is lacking.
The learning problem in Bayesian networks is
divided into “parameter learning” and “structure
learning”.
Bayesian Network and Expert Opinion
 Bayesian networks are generally created according to
expert opinion about the problem and the data.
Obtaining expert opinion is generally difficult and costly.
If expert opinion about the data of interest cannot be
obtained, knowledge obtained from association analysis
results can be used to create a Bayesian network. In
addition, if no expert opinion exists but a Bayesian
network has been created according to non-expert opinion, this
Bayesian network can be updated according to
association analysis results. Hence, a suitable Bayesian
network can be created without the need for expert
opinion.
Expert Knowledge and Belief Network Learning
 If there is no expert knowledge about the structure of
the Bayesian network, the data are used to learn the DAG
structure that best describes them. In principle,
finding the best DAG structure for the variable
set V requires that all possible DAG representations for V
be established and compared.
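Comparing every DAG is only feasible "in principle" because the number of candidates grows super-exponentially with the number of variables. The sketch below counts labeled DAGs with Robinson's recurrence, a known combinatorial result that is not part of the source; the function name is an assumption.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence):
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k), a(0) = 1.
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(1, 11):
    print(n, count_dags(n))
```

Three variables already admit 25 structures and five admit 29281, so practical structure learning relies on heuristic or constrained search rather than exhaustive comparison.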
Research Problem – Interestingness for Rare Beliefs
 As seen in the earlier examples from the literature, the
properties of several interestingness measures have been
analyzed, and several frameworks have been proposed for
selecting the right interestingness measure for extracting
association rules.
 However, rare beliefs, anomalies and outliers also
contain useful knowledge, and researchers are still working to
find efficient approaches for extracting them.
 The research problem is to analyze the properties of
interestingness measures for determining the interestingness
of outliers and rare beliefs.
 Based on this analysis, researchers can suggest a set of
measures and properties that an expert should consider when
selecting a measure to find the interestingness of rare
associations.
Conclusion
 Association analysis and Bayesian networks are two
methods used to accomplish different goals in
data mining. Whereas the aim of association analysis is
to obtain interesting patterns in a data set, the aim of
Bayesian networks is to calculate local probability
distributions of the variables by modelling causal
relationships between variables.
 The output of one of these two methods can be used as
input to the other. Interesting patterns
determined by association analysis are exploited in
learning and updating Bayesian networks, and Bayesian
networks are exploited to create the interestingness
measures used in association analysis.
Conclusion
 In association analysis, objective interestingness measures are
generally used to determine interesting patterns.
Interestingness is the degree to which a pattern is incompatible with
the prior knowledge of the researcher. Objective
interestingness measures do not fully comply with this
meaning of interestingness.
 These measures identify patterns frequently seen in the data set
rather than interesting patterns. Subjective
interestingness measures comply with the meaning of
interestingness better than objective measures do. Subjective
interestingness measures are defined over expert systems.
 Bayesian networks are expert systems, so they may be used
to define a subjective measure. The patterns that differ most
from the knowledge represented by a Bayesian network are
specified as the most interesting patterns.
Conclusion (cont)
 Using these two data mining techniques together
makes it possible to add prior information to the knowledge
discovery process.
 Even if this prior information is not received from an
expert, suitable results can still be obtained by
supporting this non-expert information with data.
Hence, there is no need for complicated and time
consuming algorithms to learn Bayesian networks
and to create interestingness measures, and results
better suited to the real world are reached.
References and Bibliography
 D. Ersel, S. Günay, 2012, Bayesian networks and association analysis in knowledge discovery process,
İstatistikçiler Dergisi 5, 51-64.
 I. Ben-Gal, 2007, Bayesian networks, Encyclopedia of Statistics in Quality & Reliability, F. Ruggeri, F. Faltin,
R. Kenett (eds), Wiley & Sons.
 M.W. Berry, M. Browne, 2006, Lecture Notes in Data Mining, World Scientific Publishing, Singapore, 222p.
 Y. Dong-Peng, L. Jin-Lin, 2008, Research on personal credit evaluation model based on Bayesian network and
association rules, Knowledge Discovery and Data Mining, 2008. WKDD 2008. First International Workshop on, 457-460.
 D. Heckerman, 1995, Bayesian networks for data mining, Data Mining and Knowledge Discovery 1, 79-119.
 S. Jaroszewicz, D.A. Simovici, 2004, Interestingness of frequent itemsets using Bayesian networks as background
knowledge, Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 20-25,
2004, New York, USA, 178-186.
 F.V. Jensen, 2001, Bayesian Networks and Decision Graphs, Springer-Verlag, New York, 268p.
 R. Malhas, Z. Aghbari, 2007, Fast discovery of interesting patterns based on Bayesian network background knowledge,
University of Sharjah Journal of Pure and Applied Science, 4 (3).
 K. Murphy, 1998, A brief introduction to graphical models and Bayesian networks,
http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#infer.
 A. Silberschatz, A. Tuzhilin, 1995, On subjective measures of interestingness in knowledge discovery,
Proceedings of the 1st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 20-21,
1995, Montreal, Canada, 275-281.
 P. Tan, V. Kumar, J. Srivastava, 2002, Selecting the right interestingness measure for association patterns, Proceedings
of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002,
Edmonton, Alberta, Canada, 32-41.
 P. Tan, M. Steinbach, V. Kumar, 2006, Introduction to Data Mining, Addison-Wesley, Boston, 769p.
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhDAdnan Masood
 
Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionAdnan Masood
 
Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachAdnan Masood
 
System Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureSystem Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureAdnan Masood
 
Agile Software Development
Agile Software DevelopmentAgile Software Development
Agile Software DevelopmentAdnan Masood
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionAdnan Masood
 
Web API or WCF - An Architectural Comparison
Web API or WCF - An Architectural ComparisonWeb API or WCF - An Architectural Comparison
Web API or WCF - An Architectural ComparisonAdnan Masood
 
SOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User GroupSOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User GroupAdnan Masood
 
Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...Adnan Masood
 

More from Adnan Masood (9)

Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
 
Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief Introduction
 
Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality Approach
 
System Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureSystem Quality Attributes for Software Architecture
System Quality Attributes for Software Architecture
 
Agile Software Development
Agile Software DevelopmentAgile Software Development
Agile Software Development
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
Web API or WCF - An Architectural Comparison
Web API or WCF - An Architectural ComparisonWeb API or WCF - An Architectural Comparison
Web API or WCF - An Architectural Comparison
 
SOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User GroupSOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User Group
 
Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...
 

Recently uploaded

Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Recently uploaded (20)

Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Bayesian Networks and Association Analysis

  • 1. An Overview of Literature and Problem Statements around Sensitivity and Interestingness in Belief Networks. Adnan Masood, SCIS.NOVA.EDU/~ADNAN, ADNAN@NOVA.EDU, Doctoral Candidate, Nova Southeastern University. Bayesian Networks and Association Analysis
  • 2. Preliminaries  Data mining is a statistical process for extracting useful information, unknown patterns and interesting relationships from large databases. Many statistical methods are used in this process; two of them are Bayesian networks and association analysis.  Bayesian networks are probabilistic graphical models that encode relationships among a set of random variables in a database. Since they have both causal and probabilistic aspects, they make it easy to combine information from data with expert knowledge. Bayesian networks can also represent knowledge about an uncertain domain and support strong inferences.  Association analysis is a useful technique for detecting hidden associations and rules in large databases; it extracts previously unknown and surprising patterns from already known information. A drawback of association analysis is that many patterns are generated even if the data set is very small. Hence, suitable interestingness measures must be applied to eliminate uninteresting patterns. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
  • 3. Bayesian Networks  Bayesian networks are Directed Acyclic Graphs (DAGs) that encode probabilistic relationships between random variables. They make it possible to model the joint probability distribution of a set of random variables efficiently and to perform computations from this model.  Bayesian networks have emerged as an important method for adding uncertain expert knowledge to the system.  Bayesian networks provide support for inference and learning.  Bayesian networks can be learned directly from data; they can also be learned from expert opinion.  The results obtained from association analysis can be used to learn and update Bayesian networks. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
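To make the "joint probability distribution" point concrete, here is a minimal sketch of how a DAG factorizes a joint distribution into one conditional table per variable. The three-variable network (Rain, Sprinkler, WetGrass) and every probability in it are made-up assumptions for illustration, not taken from the slides or from Ersel and Günay.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (all variables binary).
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}

# P(variable = 1 | parent values). All numbers are illustrative assumptions.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.3},
    "WetGrass": {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.95},
}


def joint(assignment):
    """P(assignment) as the product of P(x_i | parents(x_i)) over all variables."""
    p = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(assignment[q] for q in pa)]
        p *= p_one if assignment[var] == 1 else 1.0 - p_one
    return p


def marginal(query):
    """P(query) by brute-force summation of the joint over all other variables."""
    names = list(parents)
    total = 0.0
    for values in product((0, 1), repeat=len(names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in query.items()):
            total += joint(a)
    return total


print(marginal({"WetGrass": 1}))
```

The network stores 1 + 1 + 4 = 6 numbers instead of the 7 free parameters of a full joint table over three binary variables; the saving grows quickly with the number of variables when each node has few parents. The brute-force `marginal` is only practical for tiny networks.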
  • 5. Association Analysis  Association analysis helps to examine items that frequently occur together in a data set and to reveal patterns that support decision making. These patterns are represented as “association rules” or “frequent itemsets”. In association analysis:  obtaining the patterns is complicated and time-consuming when the data set is large.  patterns found by association analysis can be deceptive, since some relationships may arise by chance.  A problem encountered in association analysis is that a great number of patterns are generated even if the data set is small, so millions of patterns can be obtained when the data set is large.  Therefore, patterns obtained by association analysis should be evaluated according to their interestingness levels, and the patterns that are found uninteresting according to these measures should be eliminated. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
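As a rough illustration of frequent itemsets and association rules, the sketch below computes support and confidence over a toy transaction list. The transactions, the function names and the 0.6 threshold are assumptions chosen for the example; the brute-force enumeration stands in for a real miner such as Apriori and is not the method used in the cited paper.

```python
from itertools import combinations

# Toy market-basket data (made up for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Enumerate candidate itemsets level by level and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
                found = True
        if not found:
            # No frequent itemset of this size means none of a larger size either.
            break
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


for itemset, s in frequent_itemsets(transactions, 0.6).items():
    print(sorted(itemset), s)
print(confidence({"bread"}, {"milk"}, transactions))
```

Even with four distinct items there are 15 non-empty candidate itemsets, which hints at why the number of patterns grows so quickly and why interestingness measures are needed to prune them.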
  • 6. Measuring Interestingness  Interestingness levels are measured by interestingness measures. These measures are categorized into “objective interestingness measures” and “subjective interestingness measures”. Objective measures are based on the data and the structure of the pattern; subjective measures are additionally based on expert knowledge.  Subjective interestingness measures are generally specified through belief systems. Since Bayesian networks are belief systems, these measures can be specified over them. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
  • 7. Interestingness Measures for Association Patterns (Tan et al, 2002)
  • 8. Mutual collaboration between Bayesian networks and Association analysis  Bayesian networks and association analysis can be used together in knowledge discovery. While Bayesian networks are used to generate an objective or subjective measure, interesting patterns obtained via association analysis are used to learn Bayesian networks.  Bayesian networks encode expert belief systems, and they can be used to create a subjective and objective measure. The patterns that differ most from the prior information (belief) encoded by a Bayesian network are therefore considered interesting. Several methods have been suggested in the literature for determining interesting patterns by using Bayesian networks. Two of these methods were suggested by Jaroszewicz and Simovici and by Malhas and Aghbari. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
  • 9. Interestingness according to Jaroszewicz & Simovici  Jaroszewicz and Simovici define the interestingness of an itemset as the absolute difference between its support estimated from the data and its support estimated from the Bayesian network. If this difference is bigger than a given threshold, the itemset is considered interesting. In this method, interesting itemsets are determined instead of association rules.  The direction of rules is specified according to the user's experience.
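The thresholded support difference described above can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the records are made up, and the background network is assumed to be an edge-free Bayesian network (all attributes independent, marginals taken from the data), so that the network support of an itemset is just a product of marginals.

```python
from itertools import combinations

# Made-up binary records; A and B tend to occur together, C is unrelated.
records = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 1},
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 0},
]


def data_support(itemset):
    """Support of the itemset estimated from the records."""
    return sum(all(r[x] == 1 for x in itemset) for r in records) / len(records)


def network_support(itemset):
    """Support under the assumed edge-free network: product of marginals."""
    p = 1.0
    for x in itemset:
        p *= data_support((x,))
    return p


def interesting_itemsets(threshold):
    """Itemsets whose data support and network support differ by more than threshold."""
    attrs = sorted(records[0])
    scores = {}
    for size in range(2, len(attrs) + 1):
        for itemset in combinations(attrs, size):
            score = abs(data_support(itemset) - network_support(itemset))
            if score > threshold:
                scores[itemset] = score
    return scores


print(interesting_itemsets(0.1))
```

With these records the itemsets containing both A and B are flagged, because the independence assumption in the network underestimates how often A and B co-occur. Adding an edge between A and B to the network would remove that discrepancy.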
  • 10. Interestingness according to Malhas & Aghbari  Another subjective interestingness measure generated using Bayesian networks was suggested by Malhas and Aghbari. This measure is the sensitivity of the Bayesian network to the discovered patterns, and it is obtained by assessing the uncertainty-increasing potential of a pattern on the beliefs of the Bayesian network. The patterns with the highest sensitivity values are considered the most interesting. In this approach, mutual information is a measure of uncertainty. The sensitivity of a pattern is the sum of the mutual information increases when the pattern enters the Bayesian network as evidence (a finding).
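The sensitivity measure itself is not specified in enough detail here to reproduce, so the sketch below shows only its building block: the standard mutual information between two discrete variables, computed from a joint probability table. The function name and the two example tables are assumptions for illustration.

```python
import math


def mutual_information(joint):
    """I(X;Y) in bits, given a dict {(x, y): probability} that sums to 1."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0  # zero-probability cells contribute nothing
    )


# Two illustrative joint tables over binary variables.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
dependent = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

print(mutual_information(independent))
print(mutual_information(dependent))
```

Mutual information is zero when the variables are independent and grows as knowing one variable tells more about the other. A sensitivity score in the spirit of the slide would compare such values before and after a pattern is entered as evidence.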
  • 11. A Belief-Driven Method for Discovering Unexpected Patterns Tuzhilin and Padmanabhan (AAAI) Abstract  Several pattern discovery methods proposed in the data mining literature have the drawbacks that they discover too many obvious or irrelevant patterns and that they do not fully leverage the valuable prior domain knowledge that decision makers have. Tuzhilin and Padmanabhan proposed a new method of discovery that addresses these drawbacks.  In particular, they propose a new method of discovering unexpected patterns that takes into consideration the prior background knowledge of decision makers. This prior knowledge constitutes a set of expectations or beliefs about the problem domain.  Tuzhilin and Padmanabhan's method of discovering unexpected patterns uses these beliefs to seed the search for patterns in data that contradict the beliefs. To evaluate the practicality of their approach, the authors applied their algorithm to consumer purchase data from a major market research company and to web logfile data tracked at an academic Web site.
  • 12. Fast Discovery of Unexpected Patterns in Data, Relative to a Bayesian Network (Jaroszewicz & Scheffer)  Jaroszewicz and Scheffer considered a model in which background knowledge on a given domain of interest is available in terms of a Bayesian network, in addition to a large database. The mining problem is to discover unexpected patterns: the goal is to find the strongest discrepancies between network and database.  This problem is intrinsically difficult because it requires inference in a Bayesian network and processing the entire, potentially very large, database.  The sampling-based method that the authors introduce is efficient and yet provably finds the approximately most interesting unexpected patterns. Jaroszewicz & Scheffer give a rigorous proof of the method's correctness. Experiments shed light on its efficiency and practicality for large-scale Bayesian networks and databases.
  • 13. Scalable pattern mining with Bayesian networks as background knowledge (Szymon Jaroszewicz, Tobias Scheffer & Dan A. Simovici) Abstract  The authors study a discovery framework in which background knowledge on variables and their relations within a discourse area is available in the form of a graphical model.  Starting from an initial, hand-crafted or possibly empty graphical model, the network evolves in an interactive process of discovery. The researchers focus on the central step of this process:  Given a graphical model and a database, the authors address the problem of finding the most interesting attribute sets. They formalize the concept of interestingness of attribute sets as the divergence between their behavior as observed in the data and the behavior that can be explained given the current model.  Jaroszewicz et al. derive an exact algorithm that finds all attribute sets whose interestingness exceeds a given threshold. The authors then consider the case of a very large network that renders exact inference unfeasible, and a very large database or data stream. They devise an algorithm that efficiently finds the most interesting attribute sets with prescribed approximation bound and confidence probability, even for very large networks and infinite streams.
  • 14. Fast Discovery Of Interesting Patterns Based On Bayesian Network Background Knowledge by Rana Malhas & Zaher Al Aghbari Abstract  The main problem faced by all association rule/pattern mining algorithms is their production of a large number of rules, which incurs a secondary mining problem; namely, mining interesting association rules/patterns. The problem is compounded by the fact that discovered 'common knowledge' rules are not interesting, even though they are usually strong rules with high support and confidence levels, the classical measures.  In their research paper, the authors presented a fast algorithm for discovering interesting (unexpected) patterns based on background knowledge represented by a Bayesian network. A pattern/rule is unexpected if it is 'surprising' to the user. The algorithm profiles a pattern as interesting (unexpected) if the absolute difference between its support estimated from the dataset and from the Bayesian network exceeds a user-specified threshold (ε). Itemsets with the highest diverging supports are considered the most interesting.
  • 15. Measuring the Interestingness  In the approach of Jaroszewicz and Simovici (2004), the most interesting variable set is determined by the absolute difference between the set's support values obtained from the data and from the Bayesian network.  This interestingness value is based on the discrepancy between the Bayesian network and the data. By using different Bayesian networks depending on expert knowledge, more interesting variable sets may be obtained.  The Bayesian network can be updated using the most interesting variable sets, and the updated network can be used again to find interesting patterns. As these steps repeat, the interestingness values of variable sets are reduced.  This is because the Bayesian network fits the data better with each update, so the discrepancy between the Bayesian network and the data decreases. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
  • 16. Using Interesting Patterns to Learn Bayesian Networks  Learning the structure and parameters of Bayesian networks is an important problem in the literature. Expert knowledge is generally used to solve this problem. However, it is not always possible to obtain an appropriate expert opinion. In current applications, the data set is exploited to learn the Bayesian network when expert opinion is lacking. The learning problem in Bayesian networks is divided into “parameter learning” and “structure learning”. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
  • 17. Bayesian Network and Expert Opinion  Bayesian networks are generally created according to expert opinion about the problem and the data. Obtaining expert opinion is generally difficult and costly. If expert opinion about the data of interest cannot be obtained, knowledge obtained from association analysis results can be used to create a Bayesian network. In addition, if expert opinion does not exist but a Bayesian network has been created according to non-expert opinion, this Bayesian network can be updated according to association analysis results. Hence, a suitable Bayesian network can be created without the need for expert opinion.
  • 18. Expert Knowledge and Belief Network Learning  If there is no expert knowledge about the structure of the Bayesian network, the data is used to learn the DAG structure that best describes it. In principle, in order to find the best DAG structure for the variable set V, all possible DAG representations for V should be established and compared. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
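To see why comparing all possible DAGs is only feasible "in principle", the sketch below counts the labeled DAGs on n nodes using Robinson's well-known recurrence. The recurrence is a standard combinatorial result and is not taken from the slides; the function name is an assumption.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 11):
    print(n, count_dags(n))
```

The count is 3 for two variables, 25 for three and 543 for four, and it grows super-exponentially from there, which is why practical structure learning relies on heuristic search or on prior knowledge such as expert opinion or association analysis results.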
  • 19. Research Problem – Interestingness for Rare Beliefs  As seen in earlier examples from the literature, the properties of several interestingness measures have been analyzed, and several frameworks have been proposed for selecting the right interestingness measure for extracting association rules.  However, for rare beliefs, anomalies and outliers, which contain useful knowledge, researchers are still investigating efficient approaches to extract them.  The research problem is to analyze the properties of interestingness measures for determining the interestingness of outliers and rare beliefs.  Based on the analysis, researchers can suggest a set of measures and properties that an expert should consider while selecting a measure to find the interestingness of rare associations.
  • 20. Conclusion  Association analysis and Bayesian networks are two methods that are used to accomplish different goals in data mining. Whereas the aim of association analysis is to obtain interesting patterns in a data set, the aim of Bayesian networks is to calculate local probability distributions of the variables by modelling causal relationships between variables.  The output of one of these two methods can be used as an input to the other. Interesting patterns determined by association analysis are exploited in learning and updating Bayesian networks. Also, Bayesian networks are exploited to create interestingness measures used in association analysis. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
  • 21. Conclusion  In association analysis, objective interestingness measures are generally used to determine interesting patterns. Interestingness is the degree to which a pattern is incompatible with the prior knowledge of the researcher. Objective interestingness measures do not fully comply with this meaning of interestingness.  These measures identify patterns that are frequently seen in the data set rather than interesting patterns. Subjective interestingness measures, in contrast, comply with this meaning of interestingness better than objective measures do. Subjective interestingness measures are defined over expert systems.  Bayesian networks are expert systems, and they may be used to define a subjective measure. The patterns that differ most from the knowledge represented by Bayesian networks are specified as the most interesting patterns. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64 52
  • 22. Conclusion (cont)  Using these two data mining techniques together makes it possible to add prior information to the knowledge discovery process.  Even if this prior information does not come from an expert, suitable results can still be obtained by supporting the non-expert information with data. Hence, there is no need for complicated and time-consuming algorithms to learn Bayesian networks and to create interestingness measures, and the results obtained are better suited to the real world. Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64
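The conclusion's idea that the patterns differing most from the Bayesian network are the most interesting can be sketched as follows. This is a minimal illustration in the spirit of the Bayesian-network-based measures listed in the bibliography, not a reproduction of any published algorithm. The two-node network (Smoker -> Cough), its probabilities and the data records are all hypothetical. Interestingness is taken here to be the absolute difference between a pattern's observed frequency and the probability the network assigns to it.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical prior knowledge: a two-node network Smoker -> Cough.
P_SMOKER = 0.3                              # assumed P(Smoker = 1)
P_COUGH_GIVEN_SMOKER = {1: 0.6, 0: 0.1}     # assumed P(Cough = 1 | Smoker)


def bn_joint(smoker: int, cough: int) -> float:
    """Joint probability P(Smoker, Cough) implied by the network."""
    p_s = P_SMOKER if smoker else 1 - P_SMOKER
    p_c1 = P_COUGH_GIVEN_SMOKER[smoker]
    p_c = p_c1 if cough else 1 - p_c1
    return p_s * p_c


def observed_support(records: List[Tuple[int, int]], smoker: int, cough: int) -> float:
    """Relative frequency of the pattern (smoker, cough) in the data."""
    return sum(1 for r in records if r == (smoker, cough)) / len(records)


def interestingness(records: List[Tuple[int, int]]) -> Dict[Tuple[int, int], float]:
    """|observed frequency - network probability| for every pattern."""
    return {
        (s, c): abs(observed_support(records, s, c) - bn_joint(s, c))
        for s, c in product((0, 1), repeat=2)
    }


# Hypothetical data: 100 records of (smoker, cough), invented for illustration.
records = [(1, 1)] * 10 + [(1, 0)] * 10 + [(0, 1)] * 35 + [(0, 0)] * 45

scores = interestingness(records)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for pattern, score in ranked:
    print(pattern, round(score, 3))
```

In this toy data the pattern "non-smoker with a cough" ranks first: it is common in the records but improbable under the assumed network, so it contradicts the prior knowledge most strongly.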
  • 23. References and Bibliography  D. Ersel, S. Günay, 2012, Bayesian Networks and Association Analysis in Knowledge Discovery Process, İstatistikçiler Dergisi 5, 51-64.  I. Ben-Gal, 2007, Bayesian Networks, Encyclopedia of Statistics in Quality & Reliability, F. Ruggeri, F. Faltin, R. Kenett (eds), Wiley & Sons.  M.W. Berry, M. Browne, 2006, Lecture Notes in Data Mining, World Scientific Publishing, Singapore, 222p.  Y. Dong-Peng, L. Jin-Lin, 2008, Research on personal credit evaluation model based on Bayesian network and association rules, First International Workshop on Knowledge Discovery and Data Mining (WKDD 2008), 457-460.  D. Heckerman, 1995, Bayesian networks for data mining, Data Mining and Knowledge Discovery 1, 79-119.  S. Jaroszewicz, D.A. Simovici, 2004, Interestingness of frequent itemsets using Bayesian networks as background knowledge, Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 20-25, 2004, New York, USA, 178-186.  F.V. Jensen, 2001, Bayesian Networks and Decision Graphs, Springer-Verlag, New York, 268p.  R. Malhas, Z. Aghbari, 2007, Fast discovery of interesting patterns based on Bayesian network background knowledge, University of Sharjah Journal of Pure and Applied Science, 4 (3).  K. Murphy, 1998, A brief introduction to graphical models and Bayesian networks, http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#infer.  A. Silberschatz, A. Tuzhilin, 1995, On subjective measures of interestingness in knowledge discovery, Proceedings of the 1st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 20-21, 1995, Montreal, Canada, 275-281.  P. Tan, V. Kumar, J. Srivastava, 2002, Selecting the right interestingness measure for association patterns, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada, 32-41.  P. Tan, M. Steinbach, V. Kumar, 2006, Introduction to Data Mining, Addison-Wesley, Boston, 769p.