Republic of Iraq
Ministry of Higher Education & Scientific Research
Iraqi Commission for Computers and Informatics
Informatics Institute for Postgraduate Studies
Study of Association Rules' Visualization
Techniques
A Project
Submitted to the Informatics Institute
For Postgraduate Studies of the Iraqi Commission
For Computers and Informatics as a partial fulfillment of the
Requirements for the degree of Higher Diploma in Web Site
Technology in Computer Science
By
Mustafa S. Shaheed
Supervised by
Dr. Hussein K. Khafaji
Baghdad, Iraq
Feb 2011 1432
In the name of Allah, the Most Gracious, the Most Merciful
"And say: My Lord, increase me in knowledge."
Allah the Almighty has spoken the truth
Surah Ta-Ha, verse 114
Dedication
To My Family With Love
And Affection
Acknowledgments
My first and deepest gratitude goes to ALLAH the
almighty for his uncountable blessing, help, and
guidance.
I would like to express my deepest appreciation to
my supervisor Dr. Hussein K. Khafaji for his guidance,
helpful comments, and suggestions.
Supervisor's Certification
I certify that the project entitled "Comparative Study of
Association Rules' Visualization Techniques" was prepared under
my supervision at the Informatics Institute for Postgraduate
Studies in the Iraqi Commission for Computers and Informatics as a
partial fulfillment of the requirements for the degree of Higher
Diploma in Web Site Technology in Computer Science.
Signature:
Name: Dr. Hussein K. Khafaji
Date: /2/2011
Examining Committee Certification
We certify that we have read this project, entitled "Comparative Study
of Association Rules' Visualization Techniques", and as an examining
committee, examined the student "Mustafa S. Shaheed" in its contents and
what is related to it, and that in our opinion it meets the standard of a project
for the Higher Diploma in Web Site Technology in Computer Science.
Signature
Name: Dr. Hussein K. Khafaji
Title:
Date: /2/2011
Supervisor
Approved by the Informatics Institute for Postgraduate Studies of the
Iraqi Commission for Computers and Informatics.
Signature
Name: Prof. Dr. Imad Hussain Al-Hussaini
Date: /10/2010
Dean of the Institute
Signature
Name: Dr.
Title:
Date: /2/2011
Chairman
Signature
Name: Dr.
Title:
Date: /2/2011
Member
Signature
Name: Dr.
Title:
Date: /2/2011
Member
Abstract
Computers are used in more and more areas, and large volumes of data are
continuously collected and stored in databases. An important issue is
how to find useful information in this massive data.
Data mining, also known as knowledge discovery in databases, is a
research area that aims to extract implicit, understandable, previously unknown, and
potentially useful information from data.
Association rules are one of the most widespread data mining tools because
they provide valuable information for many application fields, in spite of the
difficulty of mining them.
The exploration of large data sets is an important but difficult problem.
Information visualization techniques can be useful in solving this problem.
Visual data exploration has a high potential and many applications.
Association rule visualization is emerging as a crucial step in the data
mining process in order to use the extracted knowledge profitably.
In this project, the most important association rule visualization techniques are
studied. These techniques present the association rules discovered from databases by
the algorithms developed for this purpose. The project identifies the strengths and
weaknesses of these techniques in order to reach the most appropriate technique to
address the main drawback of association rules.
List of Contents
Chapter One: Introduction
1.1 Introduction
1.2 Introduction to Data Mining
1.3 Introduction to Association Rule
1.4 Introduction to Functional Dependencies
1.4.1 Candidate Key
1.5 Aim of the study
Chapter Two: Data Mining And Functional Dependency
2.1 Introduction
2.2 Data Mining Overview
2.2.1 Data Mining Application
2.2.2 The process before Data Mining
2.2.3 Data Mining tasks
2.2.3.1 Association Rules
2.2.3.2 Apriori algorithm
2.3 Functional depe
2.3.1 Definition (1)
2.3.2 Definition (2)
2.3.3 Multi Valued Dependencies
2.4 Candidate Keys
2.5 Primary Key
2.6 Super key
2.7 Armstrong's Axioms
Chapter Three: Proposed System To Determine the Candidate Keys
3.1 Introduction
3.2 The relation between data mining and functional dependency
3.3 An Algorithm of determining closure sets
3.4 System Architecture
3.4.1 Sets Generator
3.4.2 Candidate key tester
3.5 Set closure producer
3.6 Key filter
3.7 Candidate keys system execution
Chapter Four: Discussion and Future works
4.1 Discussion
4.2 Future works
List of algorithms
Algorithm (3-1) testing the closure of sets of attributes algorithm
Algorithm (3-2) Rule testing algorithm
Algorithm (3-3) Closure generator algorithm
List of programs
Program (3-1) Candidate key tester
Program (3-2) Candidate key function
Program (3-3) merge program
List of Figures
Figure (3-1) the architecture of generating candidate keys
Figure (3-2) the main view of application
Figure (3-3) the interface of set generator
Figure (3-4) the interface of candidate key tester
Figure (3-5) the interface of table (sets)
Figure (3-6) the interface of table (candid)
List of tables
Table (2.1) A database with 4 items and 5 transactions
Table (2.2) How employees get to work
Table (2.3) Functional Dependencies defined over two sets
Table (2.4) Employees information
Table (2.5) Students information
Table (2.6) Managers phone#
Table (2.7) Manager- employee
Table (2.8) Relation of Managers, phone, and employee
Table (3.1) Sets stored table
Table (3.2) Candidate keys stored table
Table (3.3) Temporary values stored table
Chapter One
Introduction
1.1 Overview
Recent years have seen an enormous increase in the amount of
information stored in electronic format. It has been estimated that the
amount of collected information in the world doubles every 20 months,
that the size and number of databases are increasing even faster, and that the
ability to rapidly collect data has outpaced the ability to analyze it.
Information is crucial for decision making, especially in business
operations. As a response to these trends, the term 'Data Mining' (or
'Knowledge Discovery') has been coined to describe a variety of
techniques to identify nuggets of information or decision-making
knowledge in bodies of data, and to extract these in such a way that they
can be put to use in areas such as decision support, prediction,
forecasting, and estimation. Automated tools must be developed to help
extract meaningful information from a flood of information. Moreover,
these tools must be sophisticated enough to search for correlations
among the data unspecified by the user, as the potential for unforeseen
relationships to exist among the data is very high. A successful tool set
to accomplish these goals will locate useful nuggets of information in
the otherwise chaotic data space, and present them to the user in a
contextual format.
Knowledge discovery in databases (KDD) is a new field
that draws on ideas from statistics, machine learning, databases, parallel
computing, computer graphics, data visualization, and other fields. KDD
systems generally use methods, algorithms, and techniques from all of
these fields. It has emerged due to the extraordinary growth of
data in all areas of human activity, the inability of database
management systems (DBMS) to extract hidden knowledge from databases,
and the economic and scientific need for tools that extract such knowledge. KDD
includes techniques and tools to address this need.
Knowledge discovery in databases is defined as follows [27]:
"KDD is the non-trivial process of identifying valid, novel,
potentially useful, and ultimately understandable patterns in the data".
Many publications use the terms data mining (DM) and KDD
interchangeably and regard them as synonymous. At the first
international KDD conference in Montreal in 1995, it was proposed that
the term "KDD" be employed to describe the whole process of
extracting knowledge from data. It was further proposed that the term
"data mining" should be used exclusively for the discovery stage of the
KDD process. A more or less official definition of DM is the process of
automatic extraction of novel, useful, and understandable patterns
from large databases [20,21]. Hence, KDD
includes several steps: Focussing, Preprocessing,
Transformation, Data Mining, and Evaluation. Figure (1.1) abstracts
the KDD process [14].
1- Focussing: define the goal of the particular KDD task.
2- Preprocessing: integrate the specified data.
3- Transformation: ensure that each data object is represented in a
common form that is suitable as input to the next step.
4- Data Mining: detect the desired patterns contained within the
given data.
5- Evaluation: the user evaluates the extracted patterns with
respect to the task defined in the focussing step.
Data mining is the most important step within the KDD
process. It is defined as follows [27]:
Data mining is a step in the KDD process consisting of applying data
analysis and discovery algorithms that, under acceptable computational
efficiency limitations, produce a particular enumeration of patterns over
the data.
According to this definition, data mining is the step that is responsible
for the actual knowledge discovery. Data mining has many tasks,
such as Association Rules (AR), Sequential Patterns, Classification,
Clustering, and Similarity Search.
Association rules are the most important task of DM. ARs represent the
correlation between sets of items in a transaction database. An AR is an
implication of the form:
X ⇒ Y (c%),
where X and Y are sets of items, each of which is called an
itemset. X is called the antecedent, while Y is called the consequent, such that
X ∩ Y = ∅, and c% is the confidence of the implication. For example,
the following rule
{"The love in cholera era", "The Merchant of Venice", "Zoorba"} ⇒ {"The Trees and Marzooq's Association", "One Hundred Years of Segregation"} (60%)
means that the person who reads the novels "The love in cholera
era", "The Merchant of Venice", and "Zoorba" also reads the novels
"The Trees and Marzooq's Association" and "One Hundred Years of
Segregation", with a certainty factor of 60%. The confidence of a rule is
calculated as follows:
Confidence = support(X ∪ Y) / support(X),
where the support of an itemset is the number of its occurrences in the
database. A confident rule is a rule whose confidence is greater than or equal to
the user-defined threshold called minimum confidence, minconf.
The ARs are extracted from mined frequent itemsets. Mining of
frequent itemsets is a very complex process [3]. The mining of association
rules consists of two steps: the first one is the mining of frequent itemsets,
while the second one is extracting the rules from these frequent itemsets.
The first step, an intermediate step, is computationally massive and
has attracted the interest of researchers; over many years, many
algorithms have been produced to accomplish this complicated mining
process, such as Apriori, AprioriTid, AprioriHybrid [20], FP-growth
[12], and CHARM [17]. The second step is extracting the association
rules from the results of the previous step.
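The confidence calculation described above can be illustrated with a minimal Python sketch. The reading records below are invented purely for illustration (they are not data from this project), and the helper names `support` and `confidence`, as well as the chosen minconf value, are assumptions of the sketch.

```python
# Hypothetical reading records: each set lists the novels one person has read.
readers = [
    {"Zoorba", "The Merchant of Venice", "The love in cholera era"},
    {"Zoorba", "The Merchant of Venice"},
    {"Zoorba", "The love in cholera era"},
    {"The Merchant of Venice"},
    {"Zoorba", "The Merchant of Venice", "The love in cholera era"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X) for the rule X => Y."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


x = {"Zoorba", "The Merchant of Venice"}   # antecedent X
y = {"The love in cholera era"}            # consequent Y
minconf = 0.6  # user-defined minimum confidence (assumed value)

c = confidence(x, y, readers)
print(f"confidence = {c:.2f}, confident rule: {c >= minconf}")
```

Here X occurs in three records and X ∪ Y in two, so the rule is kept only if minconf is at most 2/3.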
The main drawback of association rules is thus the huge number
of extracted rules, which cannot be inspected manually, and the
existence of trivial or meaningless associations that are usually mined
due to the exhaustive nature of the extraction algorithms [24]. Graphical
tools and pruning methods are the main approaches used to face these
problems. For data mining to be effective and well evaluated, it
is important to include the human in the data exploration process and
combine the flexibility, creativity, and general knowledge of the human
with the enormous storage capacity and computational power of
today's computers. Visual data exploration aims at integrating the
human in the data exploration process, applying human perceptual
abilities to the analysis of large data sets available in today's computer
systems. The basic idea of visual data exploration is to present the data
in some visual form, allowing the user to gain insight into the data, draw
conclusions, and directly interact with the data. Visual data mining
techniques have proven to be of high value in exploratory data analysis,
and have a high potential for exploring large databases. Visual data
exploration is especially useful when little is known about the data and
the exploration goals are vague. Since the user is directly involved in the
exploration process, shifting and adjusting the exploration goals is
automatically done if necessary. There are many techniques used to
visually represent the data; some of them are discussed in this project.
Figure (1-1) Visualization and Data Mining
1.2 Aim of the project
The aim of the project is to study the techniques used to
present the association rules discovered from databases by the
algorithms developed for this purpose, and to identify the strengths and
weaknesses of these techniques in order to
reach the most appropriate technique to address the main drawback of
association rules.
1.3 Project Outline
Chapter two explains the stages of Knowledge Discovery in
Databases (KDD) and the tasks of data mining, and concentrates on
association rules (AR).
Chapter three focuses on the concept of visualization, the benefits of
visualization, and the visualization techniques used to visualize
association rules (AR), due to their importance as an interesting field of
this study.
Chapter four presents the summary and future work of the
techniques used to visualize association rules.
Chapter Two
Data Mining
And
Association Rules
2.1 Introduction
This chapter presents the general steps of Knowledge Discovery
in Databases (KDD) and its relation to data mining. It also presents
the tasks of data mining (DM) and concentrates on association rules
due to their importance as an interesting field of DM.
2.2 Knowledge Discovery in Databases
In recent years the amount of data that is collected by advanced
information systems has increased tremendously. Although very useful
information of strategic importance is buried within this data, this
information is not readily available to the users. To analyze these huge
amounts of data, the interdisciplinary field of Knowledge Discovery in
Databases (KDD) has emerged. KDD applies efficient algorithms to extract
interesting patterns and regularities from the data.
KDD is defined as follows [27]:
Knowledge Discovery in Databases is the non-trivial process of
identifying valid, novel, potentially useful, and ultimately
understandable patterns in data.
According to this definition, data is a set of facts that is somehow
accessible in electronic form. The term patterns indicates models and
regularities which can be observed within the data. Patterns have to be
valid, i.e. they should be true on new data with some degree of certainty.
A novel pattern is not previously known or trivially true. The potential
usefulness of patterns refers to the possibility that they lead to an action
providing a benefit.
A pattern is understandable if it is interpretable by a human user.
Finally, KDD is a process, indicating that there are several steps that are
repeated in several iterations.
Figure (2-1) displays the process of KDD in its basic form.
Figure (2-1) The KDD process
2.3 KDD Process Stages
The KDD process is an interactive and iterative multi-step process
which uses five steps to extract interesting knowledge according to
some specific measures and thresholds [14]:
1- Focussing
2- Preprocessing
3- Transformation
4- Data Mining
5- Evaluation
2.3.1 Focussing
The first step is to define the goal of the particular KDD task.
Another important aspect of this step is to determine the data to be
analyzed and how to obtain it.
2.3.2 Preprocessing
In this step the specified data has to be integrated, because it is not
necessarily accessible on the same system. Furthermore, several objects
may be described incompletely. Thus, the missing values need to be
completed and inconsistent data should be corrected or left out.
2.3.3 Transformation
The transformation step has to assure that each data object is
represented in a common form which is suitable as input in the next step.
2.3.4 Data Mining
Data mining is the application of efficient algorithms to detect the
desired patterns contained within the given data. Thus, the data mining
step is responsible for finding patterns according to the predefined task.
Since this step is the most important within the KDD process, we are
going to have a closer look at it in the next section(2.4).
2.3.5 Evaluation
Finally, the user evaluates the extracted patterns with respect to the
task defined in the focussing step. An important aspect of this evaluation
is the representation of the found patterns. Depending on the given task,
there are several quality measures and visualizations available to
describe the result. Representing the result of the KDD
process with visualization techniques is an important phase, since these
techniques allow the user to assess the results more easily and flexibly.
If the user is satisfied with
the quality of the patterns, the process is terminated. However, in most
cases the results might not be satisfying after only one iteration. In those
cases, the user might return to any of the previous steps to achieve more
useful results.
2.4 Data Mining
Since data mining is the most important step within the KDD
process, we will treat it more carefully in this section. In [27, 30] data
mining is defined as follows:
Data mining is a step in the KDD process consisting of applying
data analysis and discovery algorithms that, under acceptable
computational efficiency limitations, produce a particular enumeration
of patterns over the data.
According to this definition, data mining is the step that is responsible for
the actual knowledge discovery. To emphasize that data
mining algorithms need to process large amounts of data, the desired
patterns have to be found under acceptable computational efficiency
limitations. Note that there are many other definitions of data
mining and that the terms data mining and KDD are often used
synonymously.
Data mining has many tasks, such as:
1- Association Rules (AR): Given a database of transactions, where each
transaction consists of a set of items, association discovery finds all the
itemsets that frequently occur together, and also the rules among them.
Association rules are examined more closely in Section 2.5.
2- Sequential Patterns: Sequence discovery aims at extracting sets of
events that commonly occur over a period of time.
3- Classification and Regression: Classification aims to assign a new data
item to one of several predefined categorical classes. The goal of
classification and regression is to build a model that minimizes the error
between the predicted and true values of the target variable [15,18].
It is also known as supervised induction [14]. Supervised induction is the
machine learning task of inferring a function from supervised training
data [30].
4- Clustering: Clustering is the process of grouping the data records into
meaningful subclasses (clusters) in a way that maximizes the similarity
within clusters and minimizes the similarity between different
clusters [10]. Clustering is also called unsupervised induction [3].
5- Similarity search: Similarity search is performed on a database of
objects to find the object(s) that are within a user-defined distance from
the queried object, or to find all pairs within some distance of each other.
Figure (2-2) Classification separates the data space (left) and clustering
groups data objects (right)
2.5 Association Rule
Association rules are one of the promising aspects of data mining
as a knowledge discovery tool, and have been widely explored to
date [27,14]. They make it possible to capture all rules that explain the
presence of some attributes according to the presence of other attributes.
An association rule, X ⇒ Y, is a statement of the form "for a specified
fraction of transactions, a particular value of an attribute set X
determines the value of attribute set Y as another particular value under a
certain confidence". Thus, association rules aim at discovering the
patterns of co-occurrence of attributes in a database. For instance, an
association rule in supermarket basket data may be "In 10% of
transactions, 85% of the people buying milk also buy milky-sweets
in that transaction". Association rules may be useful in many
applications such as supermarket transaction analysis, store layout and
promotions on the items, telecommunications alarm correlation,
university course enrollment analysis, customer behavior analysis in
retailing, catalog design, word occurrence in text documents, stock
transactions, etc. [29,21,16].
Let I = {I1, ..., Im} be a set of literals, called items. Let D be a set of
transactions, where each transaction T is a set of items such that T ⊆ I,
and each transaction is associated with a unique identifier called TID.
Definition 2.1 An itemset X is a set of items in I. An itemset X is called a
k-itemset if it contains k items from I.
Definition 2.2 A transaction T satisfies an itemset X if X ⊆ T. The
support of an itemset X in D, support_D(X), is the number of transactions
in D that satisfy X.
Definition 2.3 An itemset X is called a large itemset if the support of X
in D exceeds a minimum support threshold explicitly declared by the
user, and a small itemset otherwise.
Definition 2.4 The negative border of a set S ⊂ P(R), closed with
respect to the set inclusion relation, is the set of minimal itemsets X ⊂ R
not in S. The negative border of the set of large itemsets is the set of
itemsets that are generated as candidates but fail to qualify for the set
of large itemsets.
Definition 2.5 An association rule is an implication of the form X ⇒ Y,
where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. X is called the antecedent of the
rule, and Y is called the consequent of the rule. The rule X ⇒ Y holds in
D with confidence c, where c = support_D(X ∪ Y) / support_D(X). The rule
X ⇒ Y has support s in D if the fraction s of the transactions in D
contains X ∪ Y.
Example: Suppose I = {A, B, C, D, E} is the set of abbreviations of movie titles
in a movie-CD shop; these abbreviations are shown in Table (2.1). Table
(2.2) represents a database of the shop's sales. Each transaction is identified
by a transaction identifier, TID. Table (2.3) shows the frequent itemsets
according to minsup = 33%, while Table (2.4) depicts the ARs
according to minconf = 100%.
Table (2.1) The item abbreviations of the database
Abbreviation    Item
A               Golden mountain
B               Gone with the Wind
C               Zoorba
D               Rain Man
E               Sound of Music
Table (2.2) The transaction database of the shop
TID (Person)    Items (Attributes)
1               B,C,E
2               B,C,D,E
3               A,B,C,D,E
4               B,C,D
5               A,B,F
6               A,B,C,E
Table (2.3) Large itemsets with minsup = 33% (2 transactions)
Support     Itemsets                                            No.
6 = 100%    B                                                   1
5 = 83%     C, BC                                               2
4 = 67%     E, BE, CE, BCE                                      4
3 = 50%     A, D, AB, BD, CD, BCD                               6
2 = 33%     AC, AE, DE, ABC, ABE, ACE, BDE, CDE, ABCE, BCDE     10
Table (2.4) Association rules with minconf = 100%
A→B (3/3)      AC→B (2/2)     AC→BE (2/2)
C→B (5/5)      AE→B (2/2)     AE→BC (2/2)
D→B (3/3)      AC→E (2/2)     DE→BC (2/2)
E→B (4/4)      AE→C (2/2)     ABC→E (2/2)
D→C (3/3)      DE→B (2/2)     ABE→C (2/2)
E→C (4/4)      DE→C (2/2)     ACE→B (2/2)
ABE→C (2/2)    ACE→B (2/2)    ABC→E (2/2)
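The example above can be checked with a small brute-force Python sketch that counts the support of every subset of items in the transactions of Table (2.2) and then splits each large itemset into an antecedent and a consequent. The variable and function names are choices made for this sketch, and the exhaustive enumeration is only practical for a toy database of this size. Because it enumerates every rule that meets the threshold, its output may be longer than the selection shown in Table (2.4).

```python
from itertools import combinations

# Transactions of Table (2.2), keyed by TID.
database = {
    1: {"B", "C", "E"},
    2: {"B", "C", "D", "E"},
    3: {"A", "B", "C", "D", "E"},
    4: {"B", "C", "D"},
    5: {"A", "B", "F"},
    6: {"A", "B", "C", "E"},
}
MIN_SUPPORT = 2        # absolute count, i.e. 33% of 6 transactions
MIN_CONFIDENCE = 1.0   # minconf = 100%


def support(itemset):
    """Number of transactions that contain the whole itemset."""
    return sum(1 for items in database.values() if itemset <= items)


# Brute force: test every non-empty subset of the items that occur in the data.
universe = sorted(set().union(*database.values()))
large = {}
for k in range(1, len(universe) + 1):
    for combo in combinations(universe, k):
        count = support(frozenset(combo))
        if count >= MIN_SUPPORT:
            large[frozenset(combo)] = count

# Split each large itemset into antecedent X and consequent Y.
# Every subset of a large itemset is itself large, so large[x] always exists.
rules = []
for itemset, count in large.items():
    for k in range(1, len(itemset)):
        for x in combinations(sorted(itemset), k):
            x = frozenset(x)
            if count / large[x] >= MIN_CONFIDENCE:
                rules.append((x, itemset - x, count, large[x]))

print(len(large), "large itemsets")
for x, y, num, den in rules:
    print(f"{''.join(sorted(x))} -> {''.join(sorted(y))} ({num}/{den})")
```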
The mining of association rules is decomposed into two
sub-problems:
1- Discovering all frequent (large) patterns, represented by the large
itemsets defined above; and
2- Generating the association rules from those frequent itemsets.
The first sub-problem is very tedious, I/O intensive, and
computationally expensive for very large databases, and this is the case
for many real-life applications. In large retailing data, the number of
transactions is generally in the order of millions, and the number of items
(attributes) is generally in the order of thousands. When the data
contains N items, the number of possible itemsets is 2^N. There
are many algorithms to mine frequent itemsets, such as Apriori,
AprioriTid, and AprioriHybrid [12]. The second sub-problem is
straightforward and can be done efficiently in a reasonable time;
there is a well-known algorithm to accomplish the
extraction of ARs. The databases of frequent itemsets and ARs are
assumed to be available in this thesis; therefore, there is no focus on any
frequent itemset or AR mining algorithms.
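Although mining algorithms are outside the scope of this work, the first sub-problem is commonly attacked level by level rather than by testing all 2^N subsets. The following minimal Python sketch illustrates that level-wise idea in the spirit of Apriori; it is a simplified illustration, not a reproduction of any published algorithm. The function name `level_wise_large_itemsets` is an assumption of this sketch, and the sample data are the transactions of Table (2.2).

```python
from itertools import combinations


def level_wise_large_itemsets(transactions, min_count):
    """Return {itemset: support count} for all itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data per level; keep only the large candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {frozenset([i]) for t in transactions for i in t}
    current = count(items)          # large 1-itemsets
    large = dict(current)
    k = 2
    while current:
        # Join step: unite pairs of large (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        large.update(current)
        k += 1
    return large


sample = ["BCE", "BCDE", "ABCDE", "BCD", "ABF", "ABCE"]
result = level_wise_large_itemsets(sample, min_count=2)
print(len(result), "large itemsets out of", 2 ** 6, "possible subsets of 6 items")
```

The prune step is what keeps the search manageable: a candidate is counted against the data only if all of its smaller subsets were already found to be large.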
Chapter Three
Visualization
Techniques of
Association Rules
3.1 Introduction
This chapter presents the concept of visualization, the benefits of
visualization, and the visualization techniques used to visualize
association rules (AR) in the KDD process.
3.2 Visualization
Visualization is the process of transforming data, information,
and knowledge into visual form, making use of humans' natural visual
capabilities [9]. A typical visualization application area is
computer graphics. The invention of computer graphics may be the most
important development in visualization since the invention of central
perspective in the Renaissance period. The development of animation
also helped advance visualization. In spite of the importance of
visualization, there are many limitations and difficulties that must be
taken into consideration [28, 4]. The main limitations are:
• Visualization techniques are always difficult to evaluate.
• The implementation may require the use of an operating system from
one specific vendor.
• The visualization techniques offered are very limited.
• A limitation of many 3D visualizations is the possible waste of
screen space towards the corners of the screen.
• The traditional menu bar approach would require long mouse
movements from the visualization to the menu bar and vice versa.
• Object interaction becomes complex within a 3D environment; for
example, the user can transform the parallel bar chart into a matrix
format and vice versa.
3.3 Benefits of Visualization
Visual data exploration can be seen as a hypothesis generation
process: the visualizations of the data allow the user to gain insight into the
data and come up with new hypotheses. The verification of the hypotheses
can also be done via data visualization, but may also be accomplished by
automatic techniques from statistics, pattern recognition, or machine
learning. In addition to the direct involvement of the user, the main
advantages of visual data exploration over automatic data analysis
techniques are:
• Visual data exploration can easily deal with highly non-homogeneous
and noisy data.
• Visual data exploration is intuitive and requires no understanding of
complex mathematical or statistical algorithms or parameters.
• Visualization can provide a qualitative overview of the data, allowing
data phenomena to be isolated for further quantitative analysis.
As a result, visual data exploration usually allows faster data
exploration and often provides more interesting results, especially in
cases where automatic algorithms fail. In addition, visual data
exploration techniques provide a much higher degree of confidence in
the findings of the exploration. These facts lead to a high demand for
visual exploration techniques and make them indispensable in
conjunction with automatic exploration techniques [6].
3.4 Visualization of Association Rule
Visualizing association rules aims at solving some major
problems that come with association rules. First of all, the rules found by
automatic procedures must be filtered. Depending on the minimum
confidence and support specified, a vast number of rules may be
generated.
There are at least five parameters involved in a visualization of
association rules [19]:
• Sets of antecedent items.
• Sets of consequent items.
• Associations between antecedent and consequent.
• Rules' support.
• Rules' confidence.
The goal of association rule generation is to find interesting patterns
and trends in transaction databases. Association rules are statistical
relations between two or more items in the data set. In a supermarket
basket application, associations express the relations between items that
are bought together. It is, for example, interesting to find out that in
70% of the cases when people buy bread, they also buy milk.
Association rules tell us that the presence of some items in a transaction
implies the presence of other items in the same transaction with a certain
probability, called confidence. A second important parameter is the
support of an association rule, which is defined as the percentage of
transactions in which the items co-occur.
Let I = {i1, ..., in} be a set of items and let D be a set of transactions,
where each transaction T is a set of items such that T ⊆ I. An association
rule is an implication of the form X → Y, where X ⊆ I, Y ⊆ I, and X, Y ≠ ∅.
The confidence c is defined as the percentage of transactions that contain
Y, given X. The support is the percentage of transactions that contain
both X and Y. For a given support and confidence level, there are
efficient algorithms to determine all association rules. A problem,
however, is that the resulting set of association rules is usually very
large, especially for low support and confidence levels [8,9]. Using
higher support and confidence levels may not be effective, since
useful rules may then be overlooked. Pattern visualization techniques have
been used to overcome this problem and to allow an interactive selection
of good support and confidence levels. Figure (3.1) shows SGI MineSet's
Rule Visualizer [14], which maps the left- and right-hand sides of the
rules to the x- and y-axes of the plot, respectively, and shows the
confidence as the height of the bars and the support as the height of the
discs.
The color of the bars shows the interestingness of the rule.
Figure (3.1) MineSet's Association Rule Visualizer
Using the visualization, the user is able to see groups of related rules and
the impact of different confidence and support levels. The goal of
association rule visualization is to visualize a large number of
association rules and their metadata in a two-dimensional (2D) or
three-dimensional (3D) display with minimum human interaction,
minimum occlusion, and no screen swapping. Many
approaches have been developed to visualize association rules, including:
1- Rule Table
2- Two-Dimensional Matrix
3- Directed Graph
4- Rule-Item Approach
5- Mosaic Plot
6- Double Decker Plot
7- Parallel Coordinates
8- Many-to-Many AR Visualization Technique
3.4.1 Rule Table
The most straightforward method for association rule
visualization is to use the rule table. The following rule table format has
been used [26]:
Item 1   Item 2   Item 3   Item 4   Item 5   Item N   Rule N   Antecedent N   Confidence   Support
Here Item 1, Item 2, ..., and Item 5 are the items of the rule, Rule N is
the number of items in the rule, and Antecedent N is the number of items
in the rule antecedent. The number of items in the consequent is therefore
Rule N - Antecedent N.
Table (3.1) Example of Association Rules in Rule Table Format
Item 1  Item 2  Item 3  Item 4  Item 5  Item N  Rule N  Antecedent N  Confidence  Support
Bread   Milk    Null    Null    Null    Null    2       1             90%         10%
Eggs    Bread   Milk    Null    Null    Null    3       1             85%         7%
Milk    Bread   Eggs    Olive   Null    Null    4       2             60%         3%
In Table (3.1), rule #3 (the third row) has Rule N = 4, which means the
rule consists of 4 items, and Antecedent N = 2, which means there are
2 items in the rule antecedent. The rule is therefore
Milk, Bread → Eggs, Olive, with confidence 60% and support 3%.
The rule table is the most straightforward way to show association
rules to the users. However, it is only suitable for displaying a
limited number of rules. If the user needs a global view of all the
rules, the rule table is not a suitable approach.
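The way a row of the rule table encodes a rule can be sketched in a few lines of Python. This is only an illustration of the format in Table (3.1): the function name decode_row and the tuple layout are assumptions, and the sketch assumes that the antecedent items are listed first in each row, as in the rows of the table.

```python
def decode_row(row):
    """Split a rule-table row into (antecedent, consequent, confidence, support).

    `row` is assumed to hold the item columns (padded with "Null"),
    followed by Rule N, Antecedent N, Confidence and Support.
    """
    *items, rule_n, antecedent_n, conf, sup = row
    used = [i for i in items if i != "Null"]
    if len(used) != rule_n:
        raise ValueError("Rule N does not match the number of items in the row")
    # The first Antecedent N items form the antecedent; the rest form the consequent.
    return used[:antecedent_n], used[antecedent_n:], conf, sup


# The three rows of Table (3.1).
table = [
    ("Bread", "Milk", "Null", "Null", "Null", "Null", 2, 1, "90%", "10%"),
    ("Eggs", "Bread", "Milk", "Null", "Null", "Null", 3, 1, "85%", "7%"),
    ("Milk", "Bread", "Eggs", "Olive", "Null", "Null", 4, 2, "60%", "3%"),
]

for row in table:
    lhs, rhs, conf, sup = decode_row(row)
    print(f"{', '.join(lhs)} -> {', '.join(rhs)}  (confidence {conf}, support {sup})")
```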
3.4.2 Two-Dimensional Matrix
The design of a two-dimensional (2D) association matrix
positions the antecedent and consequent items on separate axes of a
square matrix. Customized icons are drawn on the matrix tiles that
connect the antecedent and the consequent items of the corresponding
association rules. Different icons can be used to depict different
metadata such as the support and confidence values of the rules. Figure
(3.2) depicts an association rule (B→C). Both the height and the color of
the column icon can be used to present metadata values. The values of
support and confidence are mapped to 3D columns that are built
separately on and beneath the matrix tiles. Other icons such as disks and
bars are also used to visualize metadata in the rule visualizer of MineSet
[4,22,28]. A 2D matrix is arguably the most effective technique to show
one-to-one binary relationships.
• The strengths of a 2D matrix, however, break down when we need to
visualize many-to-one relationships such as association rules with
multiple antecedent items. For example, in Figure (3.3) it is almost
impossible to tell whether there is only one association rule (A+B→C) or
two (A→C and B→C).
• The lack of a practical way to identify the togetherness of individual
antecedent items makes a 2D matrix a weaker candidate to visualize
rules with multiple antecedent items. MineSet [23] addresses the problem
by grouping all the antecedent items of an association rule as one unit
and plotting it against its consequent, i.e., an antecedent-to-consequent
plot. For example, a dedicated item group (A+B) is created in Figure
(3.4) to describe the association rule (A+B→C).
Figure (3.2) The colored column indicates the association
rule (B →C). Different icon colors are used to show
different metadata values of the association rule
• The strategy works fine for smaller antecedent sets (e.g., fewer than
3 items). In text mining studies, however, association rules with
as many as 12 items in the antecedent are encountered.
• The replication of items in the antecedent groups creates a much larger
antecedent-to-consequent plot when compared with the corresponding
item-to-item plot.
• The loss of item identity within an antecedent group also defeats the
purpose of visualizing the associations with a matrix. For example, the
row (or column) of the matrix connected to an item can no longer be
used to search for all the rules involving that item.
Figure (3.3) It is very difficult to determine the difference
between (A+B→C) and (A→C and B→C)
Figure (3.4) The identities of A and B are lost in the
new item group that was created to depict the
association rule (A+B→C).
• Another problem in a 2D matrix display is object occlusion, especially
when multiple icons are used to depict different metadata values on the
matrix tiles. The occlusion problem is obvious in Figure (3.5).
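The ambiguity between (A+B→C) and (A→C and B→C) described in the points above can be reproduced with a small Python sketch. The representation of a rule as a pair of item sets and the function names item_matrix and group_matrix are assumptions made for illustration only; they do not describe how MineSet stores its plots.

```python
def item_matrix(rules):
    """Tiles (antecedent item, consequent item) marked in an item-to-item matrix."""
    return {(a, c) for lhs, rhs in rules for a in lhs for c in rhs}


def group_matrix(rules):
    """Tiles (antecedent group, consequent item) in an antecedent-to-consequent plot."""
    return {("+".join(sorted(lhs)), c) for lhs, rhs in rules for c in rhs}


one_rule = [({"A", "B"}, {"C"})]               # A+B -> C
two_rules = [({"A"}, {"C"}), ({"B"}, {"C"})]   # A -> C and B -> C

# Both rule sets mark exactly the same tiles in the item-to-item matrix.
same_tiles = item_matrix(one_rule) == item_matrix(two_rules)
print("item-to-item tiles identical:", same_tiles)

# Grouping the antecedent separates the two cases, but the row "A+B"
# can no longer be found by looking up item A or item B alone.
print("grouped tiles for A+B -> C:", sorted(group_matrix(one_rule)))
print("grouped tiles for A -> C, B -> C:", sorted(group_matrix(two_rules)))
```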
Figure (3.5) Object occlusions are unavoidable.
Figure (3.6) Left: A →C and B →C. Right: A+B→C.
3.4.3 Directed Graph
A directed graph is another prevailing technique to depict item
associations. The nodes of a directed graph represent the items, and the
edges represent the associations. Figure (3.6) shows three association
rules (A→C, B→C, A+B→C).
• This technique works well when only a few items (nodes) and
associations (edges) are involved. An association graph can quickly turn
into a tangled display with as few as a dozen rules. Hetzler et al. [19]
address the problem by animating the edges to show the association of
certain items with 3D rainbow arcs. The animation technique requires
significant human interaction to turn the item nodes on and off. It is
also not easy to show multiple metadata values, including support and
confidence, alongside the association rules.
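As a rough illustration of the graph structure described above, the three rules of Figure (3.6) can be turned into a list of directed edges. The sketch below is written in Python under the assumption that each edge is labeled with the identifiers of the rules that use it; the identifiers R1 to R3 and the function name build_graph are hypothetical.

```python
from collections import defaultdict


def build_graph(rules):
    """Map each directed edge (antecedent item, consequent item) to the ids of the rules using it."""
    edges = defaultdict(list)
    for rule_id, (lhs, rhs) in rules.items():
        for a in sorted(lhs):
            for c in sorted(rhs):
                edges[(a, c)].append(rule_id)
    return dict(edges)


rules = {
    "R1": ({"A"}, {"C"}),        # A -> C
    "R2": ({"B"}, {"C"}),        # B -> C
    "R3": ({"A", "B"}, {"C"}),   # A+B -> C
}

graph = build_graph(rules)
for (src, dst), ids in sorted(graph.items()):
    print(f"{src} -> {dst}: {ids}")
```

Three rules already share two edges, so without a label per edge the drawing cannot show which edges belong together. Each additional rule contributes one edge for every antecedent and consequent item pair, which is consistent with the observation that a dozen rules can already produce a tangled display.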
3.4.4 Rule-to-Item Visualization Technique
To visualize many-to-one association rules, instead of using the
tiles of a 2D matrix to show the item-to-item association rules, a
matrix of the rule-to-item relationship is used to depict many-to-one
rules [19]. In Figure (3.7) the rows of the matrix floor represent the items
(or topics in the context of text mining), and the columns represent the
item associations. The blue and red blocks of each column (rule)
represent the antecedent and the consequent of the rule. The identities of
the items are shown along the right side of the matrix. The confidence
and support levels of the rules are given by the corresponding bar charts
in different scales at the far end of the matrix. The rule-to-item
visualization approach has many advantages over all the other
matrix-based predecessors:
• There is virtually no upper limit on the number of items in an
antecedent. We can analyze the distributions of the association
rules (horizontal axis) as well as the items within (vertical axis)
simultaneously.
• Unlike Figure (3.4), the identity of individual items within an
antecedent group is clearly shown.
• No new antecedent groups are created because of the multiple
antecedent items in association rules.
• Because all the metadata are plotted at the far end and the height of the
columns is scaled so that the front columns do not block the rear ones,
few occlusions occur.
• No screen swapping, animation, or human interaction (other than basic
mouse zooming) is required to analyze the rules.
Although this technique improves on its predecessors, it suffers from
serious drawbacks, such as:
• It is unable to visualize many-to-many association rules.
• It suffers from antecedent-consequent interleaving, i.e., the items of
the antecedent and the consequent are interleaved along the item axis,
although they are given different colors.
• The natural sequence of the parts of a rule is lost.
Figure (3.7) A visualization of item associations with
support 0.4% and confidence 50%.
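The rule-to-item layout can be sketched as a small character grid in Python. This is a simplified text rendering under stated assumptions, not the 3D display of [19]: the letter A stands for an antecedent block, C for a consequent block, and the two example rules and the function name rule_item_matrix are hypothetical.

```python
def rule_item_matrix(rules, items):
    """Rows are items, columns are rules: 'A' = antecedent, 'C' = consequent, '.' = not in rule."""
    grid = []
    for item in items:
        row = []
        for lhs, rhs in rules:
            row.append("A" if item in lhs else "C" if item in rhs else ".")
        grid.append(row)
    return grid


rules = [
    ({"a", "b"}, {"c"}),        # a, b -> c
    ({"a", "c", "d"}, {"b"}),   # a, c, d -> b
]
items = ["a", "b", "c", "d"]

grid = rule_item_matrix(rules, items)
for item, row in zip(items, grid):
    print(item, " ".join(row))
```

Reading the second column from top to bottom gives A, C, A, A: the consequent item b sits between antecedent items, which is the interleaving drawback listed above. A column can hold any number of A cells, so the size of the antecedent is not limited.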
3.4.5 Parallel Coordinates
In the parallel coordinates technique [1,2,13], the basic elements of
association rules are sets of items, which can be handled by listing all
items along a vertical coordinate. The resulting coordinate is then
repeated evenly in the horizontal direction until there are enough
coordinates to host the longest association rule. An association rule can
be visualized as a polygonal line connecting all items in the rule.
Parameters such as support and confidence can be mapped to graphical
features such as line width and color. Figure (3.8) illustrates an
association rule ab → cd as one polygonal line for its LHS, followed by an
arrow connecting another polygonal line for its RHS. This visualization
handles nicely the upward closure property of association rules: subsets
of the RHS are absorbed and are not displayed. For example, ab → cd
implies that abc → d, abd → c, ab → c, and ab → d are valid association
rules. The implied association rules are not displayed. If two or more
itemsets or rules have parts in common, for example adbe and cdb in
Figure (3.8), their polygonal lines overlap along the shared part.
Figure (3.8) association rule ab → cd in Parallel Coordinates
Visualization technique
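The set of rules absorbed by a displayed rule can be enumerated mechanically. The Python sketch below generalizes the example ab → cd under an assumption of ours: every rule X ∪ Z → W is treated as implied when Z and W are disjoint subsets of the RHS and W is not empty. The function names subsets, implied_rules and fmt are hypothetical.

```python
from itertools import combinations


def subsets(items):
    """All subsets of `items`, including the empty set, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def implied_rules(lhs, rhs):
    """Rules (lhs + moved) -> kept, where moved and kept are disjoint parts of rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    result = set()
    for moved in subsets(rhs):              # RHS items moved to the left-hand side
        for kept in subsets(rhs - moved):   # RHS items kept on the right-hand side
            # Skip empty consequents and the original rule itself.
            if kept and (moved, kept) != (frozenset(), rhs):
                result.add((lhs | moved, kept))
    return result


def fmt(itemset):
    """Write an itemset in the compact notation used in the text, e.g. 'abc'."""
    return "".join(sorted(itemset))


implied = sorted(f"{fmt(l)} -> {fmt(r)}" for l, r in implied_rules("ab", "cd"))
print(implied)
```

For ab → cd the sketch yields exactly the four rules named in the text, so a single polygonal line stands for five rules in total.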
3.4.6 Mosaic Plot
The basic idea is to partition a rectangle on the y-axis according to
one attribute and to make the regions proportional to the sum of the
corresponding data values. The plot uses the height of the bars instead
of the width to show the parameter value. Then each resulting area is
split in the same way according to a second attribute [13]. The coloring
reflects the percentage of data items that fulfill a third attribute. The
visualization shows the support and confidence values of all rules of the
form X1,X2 → Y (Figure (3.9)). Mosaic plots are restricted to two
attributes on the left side of the association rule [6].
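The relation between the tiles of a mosaic plot and the measures of a rule X1,X2 → Y can be made concrete with a short Python sketch. The ten transactions below are invented for illustration, and the function name mosaic_tiles is an assumption; only the partitioning idea comes from the description above.

```python
from collections import Counter


def mosaic_tiles(rows):
    """For each (x1, x2) combination return (area, highlighted share).

    area  = fraction of all rows with that combination (size of the tile)
    share = fraction of those rows in which y holds (colored part of the tile)
    """
    totals, hits = Counter(), Counter()
    for x1, x2, y in rows:
        totals[(x1, x2)] += 1
        hits[(x1, x2)] += int(y)
    n = len(rows)
    return {key: (totals[key] / n, hits[key] / totals[key]) for key in totals}


# Hypothetical data: one tuple (X1 present, X2 present, Y present) per transaction.
rows = (
    [(True, True, True)] * 3
    + [(True, True, False)] * 1
    + [(True, False, True)] * 1
    + [(True, False, False)] * 1
    + [(False, True, False)] * 2
    + [(False, False, False)] * 2
)

tiles = mosaic_tiles(rows)
area, share = tiles[(True, True)]
# The colored share of the tile is the confidence; area times share is the support.
print(f"X1,X2 -> Y: confidence={share:.2f}, support={area * share:.2f}")
```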
Figure (3.9) X1,X2 → Y in Mosaic Plot
Figure (3.10) X1,X2 → Y in Double Decker Plot
3.4.7 Double Decker Plot
Double decker plots can be used to show more than two attributes
on the left side. The idea is to show a hierarchy of attributes on the
bottom (heineken, coke, and chicken in the example shown in Figure (3.10))
corresponding to the left-hand side of the association rules. The bars
on the top correspond to the number of items in the corresponding subset
of the database and therefore visualize the support of the rule. The
colored areas in the bars correspond to the percentage of data
transactions that contain an additional item and therefore correspond to
the confidence [6,11].
3.4.8 Many to Many AR Visualization Technique
As previously mentioned, three of the approaches developed to
visualize association rules are the two-dimensional matrix, the directed
graph, and the rule-to-item approach. It was also shown that the
rule-to-item approach is the best of these techniques in spite of its
drawbacks, such as its inability to represent many-to-many association
rules and the interleaving of consequent and antecedent items in the
visualization area. This section presents a new technique which removes
these drawbacks. It eliminates the item interleaving and efficiently
represents many-to-many association rules. This technique is called the
many-to-many AR visualization technique, MARVT. In this technique the
visualization area is divided into three regions: the antecedent region,
the statistical region, and the consequent region. The technique can be
implemented in two or three dimensions. If the 2D implementation is
chosen, the x-axis of the visualization area holds the rule identifiers,
while the y-axis of the antecedent region holds the items of the
antecedents of the rules to be visualized. The y-axis of the statistical
region is divided according to the confidence and support levels of the
rules, while the y-axis of the consequent region holds the items of the
consequents of the selected rules. Figure (3.11) depicts the general
structure of the visualization area of the proposed technique. If an item
i belongs to the antecedent of a rule R, a red ellipse is drawn at
position (R, i) of the antecedent region, and if an item j is part of the
consequent of the rule R, a black ellipse is drawn at position (R, j) of
the consequent region. The statistical region contains important
statistical values of each rule, such as the confidence, the support, the
support of the antecedent itemset, and the support of the consequent
itemset, in the part of the region that belongs to the rule. The y-axis
of the statistical region begins at the minsup and minconf thresholds and
ends at 100%. The technique is flexible enough to visualize more
statistical information, such as the support of each item. It is also
possible to display the order of the rule. If the technique is
implemented in 3D, the same regions are used. The x-axis is determined by
the rule id. The y-axis is determined by the items of the antecedent and
consequent in their respective regions. The z-axis is determined by the
support and confidence, beginning at the minconf or minsup threshold.
The third dimension is used to show the support of the items, the
confidence and support of a rule, and the supports of the antecedent and
consequent itemsets. With this technique it is possible to visualize
many-to-many, one-to-many, many-to-one, and other rules, because it
provides two separate regions for the antecedent and the consequent, each
of which can hold an unlimited number of items. This separation also
eliminates the item interleaving because the items of the consequent and
the antecedent are presented in different regions.
Figure (3.11) General Structure of the Visualization Area of the
Proposed Many-to-Many Association Rules
Visualization Technique, MARVT.
To illustrate this technique, consider the following rules:
1- a,b→c,q1 with confidence 63 and support 2.
2- a,b,c→q1,m with confidence 100 and support 3.
3- b,c→c,m,q1 with confidence 50 and support 1.
Figure (3.12) shows a hypothetical visualization of these rules. The
antecedent items of R1 are a and b; therefore, the positions (R1, a)
and (R1, b) of the antecedent region are marked with red ellipses, and so
on for the rest of the rules. Also, (R1, c) and (R1, q1) of the consequent
region are marked with black ellipses because c and q1 are the consequent
items of R1. The same process is done for R2 and R3. The statistical
region visualizes the supports of the antecedent and consequent itemsets
and, furthermore, the support and confidence of the rules. It is also
possible to add the support of each item next to its ellipse. For
example, the number 3 beside the ellipse of the item a in R1 represents
the support of the item a, and so on for each item. Figure (3.13) depicts
the 3D general structure of MARVT. This structure preserves the same
regions: consequent, antecedent, and statistical.
Figure (3.12) Visualization Area of
Many-to-Many Association Rules
Visualization Technique
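The placement of ellipses for the three example rules can be sketched in Python as follows. This is a text-only approximation of the 2D layout under stated assumptions: the letter o stands for an ellipse regardless of its color, the tuple layout of a rule and the function names marvt_layout and render are hypothetical, and the supports of the individual itemsets are omitted because the example does not list them.

```python
# Each rule: (id, antecedent items, consequent items, confidence, support),
# copied from the three example rules above.
rules = [
    ("R1", ["a", "b"], ["c", "q1"], 63, 2),
    ("R2", ["a", "b", "c"], ["q1", "m"], 100, 3),
    ("R3", ["b", "c"], ["c", "m", "q1"], 50, 1),
]


def marvt_layout(rules):
    """Return the marked (rule, item) positions of both item regions plus per-rule statistics."""
    antecedent = [(rid, i) for rid, lhs, _, _, _ in rules for i in lhs]
    consequent = [(rid, j) for rid, _, rhs, _, _ in rules for j in rhs]
    stats = {rid: {"confidence": conf, "support": sup} for rid, _, _, conf, sup in rules}
    return antecedent, consequent, stats


def render(rules, region_index):
    """Text sketch of one region: rows are items, columns are rules, 'o' marks an ellipse."""
    items = sorted({i for r in rules for i in r[region_index]})
    lines = []
    for item in items:
        cells = ["o" if item in r[region_index] else "." for r in rules]
        lines.append(f"{item:>2} " + " ".join(cells))
    return lines


antecedent, consequent, stats = marvt_layout(rules)
print("antecedent region (columns R1 R2 R3):")
print("\n".join(render(rules, 1)))
print("statistical region:", stats)
print("consequent region (columns R1 R2 R3):")
print("\n".join(render(rules, 2)))
```

Because the antecedent and consequent items are drawn in separate regions, an item such as c can appear on both sides of different rules without the two roles being mixed in one column.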
Figure (3.13) 3D General Structure of MARVT
Chapter Four
Summary and Future Work
4.1 Introduction
Chapter three presented the most important techniques for visualizing
association rules. This chapter summarizes these techniques by
reviewing their most important advantages and disadvantages.
4.2 Summary
The most important characteristics of the techniques presented in
chapter three are reviewed below.
4.2.1 Rule Table
1- Visualizes one-to-one, many-to-one, and many-to-many
relationships.
2- Able to sort the results by the column of interest.
3- Visualizes full details of the rule (antecedent, consequent, support,
confidence).
4- Displays only a limited number of rules.
5- Its main limitation is the close resemblance to the original raw
textual form, so that the user can inspect only a few rules without
having a global view of all the information.
6- Not interactive.
4.2.2 Two-Dimensional Matrix
1- Effective technique to show one-to-one binary relationships.
2- Breaks down when we need to visualize many-to-one and
many-to-many relationships.
3- Visualizes full details of the rule (antecedent, consequent,
support, confidence).
4- Object occlusion, especially when multiple icons are used to
depict different metadata values on the matrix tiles.
5- Displays only a limited number of rules.
6- Not interactive.
4.2.3 Directed Graph
1- Visualizes one-to-one and many-to-one relationships.
2- Displays only a limited number of rules.
3- Lacks a clear representation of the support and confidence.
4- Edges of different rules overlap with each other.
5- Not interactive.
4.2.4 Rule-to-Item Visualization Technique
1- Visualizes many-to-one relationships.
2- Breaks down when we need to visualize many-to-many
relationships.
3- No upper limit on the number of items in an antecedent.
4- The individual items within an antecedent group are clearly shown.
5- No new antecedent groups are created because of the multiple
antecedent items in association rules.
6- No object occlusion.
7- The natural sequence of the parts of a rule is lost.
8- Interleaving of the items of the antecedent and consequent,
although they are given different colors.
9- Interactive.
4.2.5 Parallel Coordinates
1- Visualizes one-to-one, many-to-one, and many-to-many relationships.
2- Visualizes full details of the rule (antecedent, consequent, support,
confidence).
3- Visualized rules overlap with each other.
4- Object occlusion.
5- Lacks a clear representation of the support and confidence
(Figure (4.1)).
Figure (4.1) Overlapping rules and the lack of a clear representation of
the support and confidence
4.2.6 Mosaic Plot
1- Visualizes one-to-one, many-to-one, and many-to-many relationships.
2- Restricted to two attributes on the left side of the association rule.
3- Visualizes one rule at a time.
4- Difficult to understand and implement.
5- Lacks a clear representation of the support and confidence.
4.2.7 Double Decker Plot
1- Visualizes one-to-one, many-to-one, and many-to-many relationships.
2- Shows more than two attributes on the left side.
3- Visualizes one rule at a time.
4- Lacks a clear representation of the support and confidence.
5- Difficult to understand and implement.
4.2.8 Many to Many AR Visualization Technique
1- Best technique to visualize many-to-many relationships.
2- Visualizes full details of the rule (antecedent, consequent,
support, confidence).
3- No object occlusion.
4- No upper limit on the number of items in an antecedent.
5- Clear representation of the support and confidence.
6- Interactive.
7- Flexible enough to visualize more statistical information.
8- It is possible to display the order of the rule.
4.3 Future work
The exploration of large data sets is an important but difficult problem.
Information visualization techniques can be useful in solving this
problem. Visual data exploration has a high potential, and many
applications such as fraud detection and data mining can use information
visualization technology for improved data analysis.
Avenues for future work include the tight integration of
visualization techniques with traditional techniques from such
disciplines as statistics, machine learning, operations research, and
simulation. Integration of visualization techniques and these more
established methods would combine fast automatic data mining
algorithms with the intuitive power of the human mind, improving the
quality and speed of the data mining process. Visual data mining
techniques also need to be tightly integrated with the systems used to
manage the vast amounts of relational and semi-structured information,
including database management and data warehouse systems. The
ultimate goal is to bring the power of visualization technology to every
desktop to allow a better, faster and more intuitive exploration of very
large data resources. This will not only be valuable in an economic sense
but will also stimulate and delight the user.
References
[1] Alfred Inselberg, “Parallel Coordinates: Visual Multidimensional
Geometry and Its Application”, University of San Francisco, 2009.
[2] Alfred Inselberg, “Visualizing high dimensional datasets and
multivariate relations” (tutorial), In: Proc. 6th ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD 2000), Boston, MA, 2000.
[3] Anil K. Jain and Richard C. Dubes, “Algorithms for Clustering Data”,
Prentice Hall, 1988.
[4] B. Bustos, D. Keim, C. Panse, T. Schreck, “Pattern Visualization”,
2003.
[5] Cheung D.W., Ng V., Fu A.W. and Fu Y., “Efficient Mining of
Association Rules in Distributed Databases”, Special Issue in Data
Mining, IEEE Transactions on Knowledge and Data Engineering, IEEE
Computer Society, 1996.
[6] Daniel Keim and Matthew Ward, “Visual Data Mining Techniques”,
University of Konstanz, Germany and Worcester Polytechnic Institute,
USA, 2002.
[7] D. Bruzzese, C. Davino, “Visual Post-Analysis of Association Rules”,
Dept. of Mathematics and Statistics, University of Naples Federico, Italy,
{dbruzzes, cdavino}@unina.it, 2002.
[8] D. Keim, “Designing Pixel-Oriented Visualization Techniques”,
University of Florida, 2000.
[9] Gershon N., Eick S. G., and Card S., “Information Visualization”, ACM
Interactions, vol. 5, no. 2, pp. 9-15, March/April 1998.
[10] G. Karypis and V. Kumar, “Scalable Parallel Data Mining for
Association Rules”, University of Arizona, 2000.
[11] H. Hofmann, A. Siebes, and A. Wilhelm, “Visualizing association
rules with interactive mosaic plots”, SIGKDD Int. Conf. on Knowledge
Discovery & Data Mining (KDD 2000), Boston, MA, 2000.
[12] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate
generation”, In: Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data
(SIGMOD’00), Dallas, TX, May 2000.
[13] Martin, A., Ward, M.O., “High dimensional brushing for interactive
exploration of multivariate data”, In: Proc. IEEE Conf. on Visualization,
Atlanta, 1995.
[14] Matthias Schubert, “Advanced Data Mining Techniques for Compound
Objects”, Maximilians-University, 2004.
[15] M. Deshpande and G. Karypis, “Evaluation of Techniques for
Classifying Biological Sequences”, Taipei, Taiwan, 2002.
[16] Michael Hahsler and Sudheer Chelluboina, “Visualizing Association
Rules: Introduction to the R-extension Package arulesViz”, Southern
Methodist University, 2004.
[17] M. J. Zaki and C. J. Hsiao, “CHARM: An efficient algorithm for closed
itemset mining”, In: Proc. 2002 SIAM Int. Conf. Data Mining (SDM’02),
pages 457–473, Arlington, VA, April 2002.
[18] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, “Introduction to
Data Mining”, University of Minnesota, 2005.
[19] P. C. Wong, P. Whitney, J. Thomas, “Visualizing Association Rules for
Text Mining”, Pacific Northwest National Laboratory, 2000.
[20] Rakesh Agrawal, Ramakrishnan Srikant, “Fast Algorithms for Mining
Association Rules”, IBM Almaden Research Center, 1994.
[21] Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami, “Mining
Association Rules between Sets of Items in Large Databases”, SIGMOD
Conference, 1993.
[22] Redpath, B. Srinivasan, “Criteria for Comparative Study of
Visualization Techniques in Data Mining”, IEEE 3rd Int. Conf. on
Intelligent System, Tulsa, USA, 2003.
[23] S. G. Inc. Mineset. http://www.sgi.com/software/mineset, 2001.
[24] Simeon J. Simoff, Michael H. Böhlen, “Visual Data Mining”,
University of Western Sydney, 1998.
[25] Stefanos Manganaris, “Supervised Classification with Temporal Data”,
PhD thesis, School of Engineering, Vanderbilt University, 1997.
[26] Thomas S., “Architectures and Optimizations for Integrating Data
Mining Algorithms with Database Systems”, Ph.D. dissertation, University
of Florida, Gainesville, 1998.
[27] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Editors),
“Advances in Knowledge Discovery and Data Mining”, Menlo Park, 1996.
[28] U. M. Fayyad, G. Grinstein, “Information Visualization in Data Mining
and Knowledge Discovery”, Morgan Kaufman, San Francisco (CA), 2004.
[29] Vincent Wing-Sing Cho, “Knowledge Discovery from Distributed and
Textual Data”, Hong Kong University of Science and Technology, 1999.
[30] http://en.wikipedia.org/wiki/Association_rule_learning.
More Related Content

What's hot

Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar reportmayurik19
 
New approaches of Data Mining for the Internet of things with systems: Litera...
New approaches of Data Mining for the Internet of things with systems: Litera...New approaches of Data Mining for the Internet of things with systems: Litera...
New approaches of Data Mining for the Internet of things with systems: Litera...
IRJET Journal
 
Comparison Between WEKA and Salford System in Data Mining Software
Comparison Between WEKA and Salford System in Data Mining SoftwareComparison Between WEKA and Salford System in Data Mining Software
Comparison Between WEKA and Salford System in Data Mining Software
Universitas Pembangunan Panca Budi
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining
14894
 
A Web Extraction Using Soft Algorithm for Trinity Structure
A Web Extraction Using Soft Algorithm for Trinity StructureA Web Extraction Using Soft Algorithm for Trinity Structure
A Web Extraction Using Soft Algorithm for Trinity Structure
iosrjce
 
5. data mining tools and techniques a review--31-39
5. data mining tools and techniques  a review--31-395. data mining tools and techniques  a review--31-39
5. data mining tools and techniques a review--31-39Alexander Decker
 
11.0005www.iiste.org call for paper. data mining tools and techniques- a revi...
11.0005www.iiste.org call for paper. data mining tools and techniques- a revi...11.0005www.iiste.org call for paper. data mining tools and techniques- a revi...
11.0005www.iiste.org call for paper. data mining tools and techniques- a revi...Alexander Decker
 
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data MiningPerformance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
idescitation
 
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in CloudEnabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
IOSR Journals
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logs
IOSR Journals
 
Data Transformation Technique for Protecting Private Information in Privacy P...
Data Transformation Technique for Protecting Private Information in Privacy P...Data Transformation Technique for Protecting Private Information in Privacy P...
Data Transformation Technique for Protecting Private Information in Privacy P...
acijjournal
 
RESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
ieijjournal
 
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
Anastasija Nikiforova
 
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
cscpconf
 
IRJET- Swift Retrieval of DNA Databases by Aggregating Queries
IRJET- Swift Retrieval of DNA Databases by Aggregating QueriesIRJET- Swift Retrieval of DNA Databases by Aggregating Queries
IRJET- Swift Retrieval of DNA Databases by Aggregating Queries
IRJET Journal
 

What's hot (17)

Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar report
 
New approaches of Data Mining for the Internet of things with systems: Litera...
New approaches of Data Mining for the Internet of things with systems: Litera...New approaches of Data Mining for the Internet of things with systems: Litera...
New approaches of Data Mining for the Internet of things with systems: Litera...
 
Comparison Between WEKA and Salford System in Data Mining Software
Comparison Between WEKA and Salford System in Data Mining SoftwareComparison Between WEKA and Salford System in Data Mining Software
Comparison Between WEKA and Salford System in Data Mining Software
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining
 
A Web Extraction Using Soft Algorithm for Trinity Structure
A Web Extraction Using Soft Algorithm for Trinity StructureA Web Extraction Using Soft Algorithm for Trinity Structure
A Web Extraction Using Soft Algorithm for Trinity Structure
 
5. data mining tools and techniques a review--31-39
5. data mining tools and techniques  a review--31-395. data mining tools and techniques  a review--31-39
5. data mining tools and techniques a review--31-39
 
11.0005www.iiste.org call for paper. data mining tools and techniques- a revi...
11.0005www.iiste.org call for paper. data mining tools and techniques- a revi...11.0005www.iiste.org call for paper. data mining tools and techniques- a revi...
11.0005www.iiste.org call for paper. data mining tools and techniques- a revi...
 
Ijariie1184
Ijariie1184Ijariie1184
Ijariie1184
 
Association rule visualization technique

  • 1. Republic of Iraq, Ministry of Higher Education & Scientific Research, Iraqi Commission for Computers and Informatics, Informatics Institute for Postgraduate Studies. Study of Association Rules' Visualization Techniques. A Project Submitted to the Informatics Institute for Postgraduate Studies of the Iraqi Commission for Computers and Informatics as a partial fulfillment of the requirements for the degree of Higher Diploma in Web Site Technology in Computer Science. By Mustafa S. Shaheed. Supervised by Dr. Hussein K. Khafaji. Baghdad, Iraq, Feb 2011 / 1432
  • 2. In the name of God, the Most Gracious, the Most Merciful. "And say: My Lord, increase me in knowledge." God Almighty has spoken the truth. (Surah Ta-Ha, verse 114)
  • 3. Dedication: To My Family With Love And Affection
  • 4. Acknowledgments: My first and deepest gratitude goes to ALLAH the Almighty for His uncountable blessings, help, and guidance. I would like to express my deepest appreciation to my supervisor, Dr. Hussein K. Khafaji, for his guidance, helpful comments, and suggestions.
  • 5. Supervisor's Certification: I certify that the project entitled "Comparative Study of Association Rules' Visualization Techniques" was prepared under my supervision at the Informatics Institute for Postgraduate Studies in the Iraqi Commission for Computers and Informatics as a partial fulfillment of the requirements for the degree of Higher Diploma in Web Site Technology in Computer Science. Signature: Name: Dr. Hussein K. Khafaji Date: /2/2011
  • 6. Examining Committee Certification
We certify that we have read this project, entitled "Comparative Study of Association Rules' Visualization Techniques", and as an examining committee examined the student "Mustafa S. Shaheed" in its contents and what is related to it, and that in our opinion it meets the standard of a project for the Higher Diploma in Web Site Technology in Computer Science.
Supervisor: Signature, Name: Dr. Hussein K. Khafaji, Title:, Date: /2/2011
Chairman: Signature, Name: Dr., Title:, Date: /2/2011
Member: Signature, Name: Dr., Title:, Date: /2/2011
Member: Signature, Name: Dr., Title:, Date: /2/2011
Approved by the Informatics Institute for Postgraduate Studies of the Iraqi Commission for Computers and Informatics.
Dean of the Institute: Signature, Name: Prof. Dr. Imad Hussain Al-Hussaini, Date: /10/2010
  • 7. Abstract
As computers are used in more and more areas, large volumes of data are continuously collected and stored in databases. An important issue is how to find the useful information in these massive data. Data mining, also known as knowledge discovery in databases, is the research area concerned with extracting implicit, understandable, previously unknown, and potentially useful information from data. Association rules are one of the most widespread data mining tools because they provide valuable information for many application fields, in spite of the difficulty of mining them. The exploration of large data sets is an important but difficult problem, and information visualization techniques can be useful in solving it. Visual data exploration has a high potential and many applications. Association rule visualization is emerging as a crucial step in the data mining process in order to use the extracted knowledge profitably. In this project, the most important association rule visualization techniques are studied. These techniques present the association rules discovered from databases by the algorithms developed for this purpose. The project identifies the strengths and weaknesses of these techniques in order to reach the most appropriate technique for solving the main drawback of association rules.
  • 8. List of Contents
Chapter One: Introduction
1.1 Introduction
1.2 Introduction to Data Mining
1.3 Introduction to Association Rule
1.4 Introduction to Functional Dependencies
1.4.1 Candidate Key
1.5 Aim of the study
Chapter Two: Data Mining And Functional Dependency
2.1 Introduction
2.2 Data Mining Overview
2.2.1 Data Mining Application
2.2.2 The process before Data Mining
2.2.3 Data Mining tasks
2.2.3.1 Association Rules
2.2.3.2 Apriori algorithm
2.3 Functional depe
2.3.1 Definition (1)
2.3.2 Definition (2)
2.3.3 Multi Valued Dependencies
2.4 Candidate Keys
2.5 Primary Key
2.6 Super key
  • 9. 2.7 Armstrong's Axioms
Chapter Three: Proposed System To Determine the Candidate Keys
3.1 Introduction
3.2 The relation between data mining and functional dependency
3.3 An Algorithm of determining closure sets
3.4 System Architecture
3.4.1 Sets Generator
3.4.2 Candidate key tester
3.5 Set closure producer
3.6 Key filter
3.7 Candidate keys system execution
Chapter Four: Discussion and Future Works
4.1 Discussion
4.2 Future works
  • 10. List of algorithms
Algorithm (3-1) Testing the closure of sets of attributes algorithm
Algorithm (3-2) Rule testing algorithm
Algorithm (3-3) Closure generator algorithm
List of programs
Program (3-1) Candidate key tester
Program (3-2) Candidate key function
Program (3-3) Merge program
List of Figures
Figure (3-1) The architecture of generating candidate keys
Figure (3-2) The main view of application
Figure (3-3) The interface of set generator
Figure (3-4) The interface of candidate key tester
Figure (3-5) The interface of table (sets)
Figure (3-6) The interface of table (candid)
  • 11. List of tables
Table (2.1) A database with 4 items and 5 transactions
Table (2.2) How employees get to work
Table (2.3) Functional Dependencies defined over two sets
Table (2.4) Employees information
Table (2.5) Students information
Table (2.6) Managers phone#
Table (2.7) Manager-employee
Table (2.8) Relation of Managers, phone, and employee
Table (3.1) Sets stored table
Table (3.2) Candidate keys stored table
Table (3.3) Temporary values stored table
  • 13. Chapter One: Introduction
1.1 Overview
Recent years have seen an enormous increase in the amount of information stored in electronic format. It has been estimated that the amount of collected information in the world doubles every 20 months, that the size and number of databases are increasing even faster, and that the ability to rapidly collect data has outpaced the ability to analyze it. Information is crucial for decision making, especially in business operations. In response to these trends, the term 'Data Mining' (or 'Knowledge Discovery') has been coined to describe a variety of techniques that identify nuggets of information or decision-making knowledge in bodies of data and extract them in such a way that they can be put to use in areas such as decision support, prediction, forecasting, and estimation. Automated tools must be developed to help extract meaningful information from this flood of data. Moreover, these tools must be sophisticated enough to search for correlations among the data that the user has not specified, as the potential for unforeseen relationships among the data is very high. A successful tool set will locate useful nuggets of information in the otherwise chaotic data space and present them to the user in a contextual format.
Knowledge discovery in databases (KDD) is a new field that draws on ideas from statistics, machine learning, databases, parallel computing, computer graphics, data visualization, and other fields. KDD systems generally use methods, algorithms, and techniques from all of these fields. The field has emerged due to the extraordinary growth of data in all areas of human activity, the inability of database management systems (DBMS) to extract hidden knowledge in databases,
  • 14. and the economic and scientific need for such knowledge. KDD includes techniques and tools to address this need. Knowledge discovery in databases is defined as follows [27]: "KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in the data". Much of the literature uses the terms data mining (DM) and KDD interchangeably and regards them as synonymous. At the first international KDD conference in Montreal in 1995, it was proposed that the term "KDD" be employed to describe the whole process of extracting knowledge from data. It was further proposed that the term "data mining" be used exclusively for the discovery stage of the KDD process. A more or less official definition of DM is the process of automatic extraction of novel, useful, and understandable patterns from large databases [20,21]. Hence, KDD includes several steps: Focussing, Preprocessing, Transformation, Data Mining, and Evaluation. Figure (1.1) abstracts the KDD process [14].
1- Focussing: define the goal of the particular KDD task.
2- Preprocessing: integrate the specified data.
3- Transformation: ensure that each data object is represented in a common form that is suitable as input to the next step.
4- Data Mining: detect the desired patterns contained within the given data.
5- Evaluation: the user evaluates the extracted patterns with respect to the task defined in the focussing step.
  • 15. Data mining is the most important step within the KDD process. It is defined as follows [27]: data mining is a step in the KDD process consisting of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data. According to this definition, data mining is the step responsible for the actual knowledge discovery. Data mining has many tasks, such as Association Rules (AR), Sequential Patterns, Classification, Clustering, and Similarity Search.
Association rule mining is the most important task of DM. ARs represent the correlation between sets of items in a transaction database. An AR is an implication of the form X → Y (c%), where X and Y are sets of items, each of which is called an itemset. X is called the antecedent and Y the consequent, such that X ∩ Y = ∅, and c% is the confidence of the implication. For example, the rule
{"The love in cholera era", "The Merchant of Venice", "Zoorba"} → {"The Trees and Marzooq's Association", "One Hundred Years of Segregation"} (60%)
means that a person who reads the novels "The love in cholera era", "The Merchant of Venice", and "Zoorba" also reads the novels "The Trees and Marzooq's Association" and "One Hundred Years of Segregation", with a certainty factor of 60%. The confidence of a rule is calculated as follows:
Confidence = support(X ∪ Y) / support(X)
where the support of an itemset is the number of its occurrences in the database. A confident rule is a rule whose confidence is greater than or equal to a user-defined threshold called the minimum confidence, minconf.
ARs are extracted from mined frequent itemsets. Mining frequent itemsets is a very complex process [3]. The mining of association rules consists of two steps; the first one is mining the frequent itemsets,
  • 16. while the second one is extracting the rules from these frequent itemsets. The first step is computationally massive and has attracted the interest of researchers for many years. Many algorithms have been produced to accomplish this complicated mining process, such as Apriori, AprioriTID, AprioriHybrid [20], FP-growth [12], and CHARM [17]. The second step extracts the association rules from the results of the first step.
The main drawback of association rules is the huge number of extracted rules, which cannot be inspected manually, together with the trivial or meaningless associations that are usually mined due to the exhaustive nature of the extraction algorithms [24]. Graphical tools and pruning methods are the main approaches used to face these problems. For data mining to be effective and well evaluated, it is important to include the human in the data exploration process and to combine the flexibility, creativity, and general knowledge of the human with the enormous storage capacity and computational power of today's computers. Visual data exploration aims at integrating the human in the data exploration process, applying human perceptual abilities to the analysis of the large data sets available in today's computer systems. The basic idea of visual data exploration is to present the data in some visual form, allowing the user to gain insight into the data, draw conclusions, and directly interact with the data. Visual data mining techniques have proven to be of high value in exploratory data analysis and have a high potential for exploring large databases. Visual data exploration is especially useful when little is known about the data and the exploration goals are vague. Since the user is directly involved in the exploration process, the exploration goals can be shifted and adjusted as necessary. Many techniques are used to represent data visually; some of them are discussed in this project.
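The support and confidence measures and the two-step mining procedure described in this chapter can be illustrated with a minimal Python sketch. The transaction database, the item names, and the function names below are hypothetical and do not come from the project. Step 1 is done by brute-force enumeration rather than by Apriori, FP-growth, or CHARM, so the sketch is only suitable for tiny examples.

```python
from itertools import combinations

# Hypothetical toy transaction database (illustrative only, not from the project).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]

def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def confidence(x, y, db):
    """Confidence of the rule X -> Y: support(X u Y) / support(X)."""
    return support(x | y, db) / support(x, db)

def frequent_itemsets(db, minsup):
    """Step 1 (brute force): every itemset whose support is >= minsup."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), db)
            if s >= minsup:
                result[frozenset(combo)] = s
    return result

def extract_rules(freq, minconf):
    """Step 2: split each frequent itemset into X -> Y and keep confident rules."""
    rules = []
    for itemset, sup in freq.items():
        for k in range(1, len(itemset)):
            for x in combinations(sorted(itemset), k):
                x = frozenset(x)
                # Every subset of a frequent itemset is itself frequent,
                # so freq[x] is always present.
                conf = sup / freq[x]
                if conf >= minconf:
                    rules.append((set(x), set(itemset - x), conf))
    return rules

freq = frequent_itemsets(transactions, minsup=3)
rules = extract_rules(freq, minconf=0.75)
```

Even on five transactions the second step returns several rules, which hints at why the number of extracted rules becomes unmanageable on real databases.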
  • 17. Figure (1-1) Visualization and Data Mining
1.2 Aim of the project
The aim of the project is to study the techniques used to present the association rules discovered from databases by the algorithms developed for this purpose, and to identify the strengths and weaknesses of these techniques in order to
  • 18. reach the most appropriate technique for solving the main drawback of association rules.
1.3 Project Outline
Chapter two explains the stages of Knowledge Discovery in Databases (KDD) and the tasks of data mining, and concentrates on association rules (AR). Chapter three focuses on the concept of visualization, its benefits, and the visualization techniques used to visualize association rules, which are the main interest of this study. Chapter four presents a summary of the techniques used to visualize association rules and future work.
  • 20. Chapter Two: Data Mining and Association Rules
2.1 Introduction
This chapter presents the general steps of Knowledge Discovery in Databases (KDD) and its relation to data mining. It also presents the tasks of data mining (DM) and concentrates on association rules due to their importance as an interesting field of DM.
2.2 Knowledge Discovery in Databases
In recent years the amount of data collected by advanced information systems has increased tremendously. Although very useful information of strategic importance is buried within this data, this information is not readily available to users. To analyze these huge amounts of data, the interdisciplinary field of Knowledge Discovery in Databases (KDD) has emerged. KDD applies efficient algorithms to extract interesting patterns and regularities from the data. KDD is defined as follows [27]: Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
  • 21. According to this definition, data is a set of facts that is somehow accessible in electronic form. The term patterns indicates models and regularities that can be observed within the data. Patterns have to be valid, i.e. they should hold on new data with some degree of certainty. A novel pattern is one that is not previously known or trivially true. The potential usefulness of patterns refers to the possibility that they lead to an action providing a benefit. A pattern is understandable if it is interpretable by a human user. Finally, KDD is a process, indicating that there are several steps that are repeated in several iterations. Figure (2-1) displays the process of KDD in its basic form.
Figure (2-1) The KDD process
  • 22. 2.3 KDD Process Stages
The KDD process is an interactive and iterative multi-step process that uses five steps to extract interesting knowledge according to some specific measures and thresholds [14]:
1- Focussing
2- Preprocessing
3- Transformation
4- Data Mining
5- Evaluation
2.3.1 Focussing
The first step is to define the goal of the particular KDD task. Another important aspect of this step is to determine the data to be analyzed and how to obtain it.
2.3.2 Preprocessing
In this step the specified data has to be integrated, because it is not necessarily accessible on the same system. Furthermore, several objects may be described incompletely. Thus, missing values need to be completed, and inconsistent data should be corrected or left out.
2.3.3 Transformation
The transformation step has to ensure that each data object is represented in a common form that is suitable as input to the next step.
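As a rough illustration of the preprocessing and transformation steps, the following Python sketch completes a missing value, leaves out an inconsistent record, and converts every remaining record into a common form. The records, the field names, the mean-imputation rule, and the function names are assumptions made for this example; the project does not prescribe any of them.

```python
# Hypothetical raw records; None marks a missing value (illustrative only).
raw_records = [
    {"id": 1, "age": 34, "items": "bread;milk"},
    {"id": 2, "age": None, "items": "milk"},
    {"id": 3, "age": -5, "items": "bread;butter"},  # inconsistent age
    {"id": 4, "age": 40, "items": ""},
]

def preprocess(records):
    """Complete missing ages with the mean valid age; leave out inconsistent records."""
    valid_ages = [r["age"] for r in records if r["age"] is not None and r["age"] >= 0]
    mean_age = sum(valid_ages) / len(valid_ages)
    cleaned = []
    for r in records:
        if r["age"] is not None and r["age"] < 0:
            continue  # inconsistent data is left out
        age = mean_age if r["age"] is None else r["age"]
        cleaned.append({**r, "age": age})
    return cleaned

def transform(records):
    """Represent each record in a common form: (id, frozenset of items)."""
    return [
        (r["id"], frozenset(item for item in r["items"].split(";") if item))
        for r in records
    ]

cleaned = preprocess(raw_records)
transactions = transform(cleaned)
```

The resulting list of (id, itemset) pairs is the kind of uniform input that an association rule mining algorithm expects in the data mining step.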
  • 23. 2.3.4 Data Mining
Data mining is the application of efficient algorithms to detect the desired patterns contained within the given data. Thus, the data mining step is responsible for finding patterns according to the predefined task. Since this step is the most important within the KDD process, it is examined more closely in Section 2.4.
2.3.5 Evaluation
Finally, the user evaluates the extracted patterns with respect to the task defined in the focussing step. An important aspect of this evaluation is the representation of the found patterns. Depending on the given task, several quality measures and visualizations are available to describe the result. Representing the result of the KDD process with visualization techniques is an important phase, because these techniques allow the user to assess the results more easily and flexibly. If the user is satisfied with the quality of the patterns, the process terminates. In most cases, however, the results are not satisfying after only one iteration. The user may then return to any of the previous steps to achieve more useful results.
2.4 Data Mining
Since data mining is the most important step within the KDD process, it is treated in more detail in this section. In [27, 30] data mining is defined as follows: Data mining is a step in the KDD process consisting of applying data analysis and discovery algorithms that, under acceptable
  • 24. computational efficiency limitations, produce a particular enumeration of patterns over the data.
According to this definition, data mining is the step responsible for the actual knowledge discovery. Because data mining algorithms need to process large amounts of data, the definition stresses that the desired patterns have to be found under acceptable computational efficiency limitations. Note that there are many other definitions of data mining, and that the terms data mining and KDD are often used synonymously. Data mining has many tasks, such as:
1- Association Rules (AR): Given a database of transactions, where each transaction consists of a set of items, association discovery finds all the itemsets that frequently occur together, and also the rules among them. Association rules are examined more closely in Section 2.5.
2- Sequential Patterns: Sequence discovery aims at extracting sets of events that commonly occur over a period of time.
3- Classification and Regression: Classification aims to assign a new data item to one of several predefined categorical classes. The goal of classification and regression is to build a model that minimizes the error between the predicted and true values of the target variable [15,18]. This task is also known as supervised induction [14], the machine learning task of inferring a function from supervised training data [30].
4- Clustering: Clustering is the process of grouping data records into meaningful subclasses (clusters) in a way that maximizes the similarity within clusters and minimizes the similarity between different clusters [10]. Clustering is also called unsupervised induction [3].
  • 25. 5- Similarity Search: Similarity search is performed on a database of objects to find the object(s) within a user-defined distance from the queried object, or to find all pairs within some distance of each other.
Figure (2-2) Classification separates the data space (left) and clustering groups data objects (right)
2.5 Association Rule
Association rules are one of the promising aspects of data mining as a knowledge discovery tool and have been widely explored to date [27,14]. They capture the rules that explain the presence of some attributes according to the presence of other attributes. An association rule, X ⇒ Y, is a statement of the form "for a specified fraction of transactions, a particular value of an attribute set X determines the value of attribute set Y as another particular value under a certain confidence". Thus, association rules aim at discovering the patterns of co-occurrence of attributes in a database. For instance, an association rule in supermarket basket data may be "In 10% of transactions, 85% of the people buying milk also buy milky-sweets in that transaction". Association rules may be useful in many
  • 26. applications such as supermarket transaction analysis, store layout and promotions on items, telecommunications alarm correlation, university course enrollment analysis, customer behavior analysis in retailing, catalog design, word occurrence in text documents, stock transactions, etc. [29,21,16].
Let I = {I1, ..., Im} be a set of literals, called items. Let D be a set of transactions, where each transaction T is a set of items such that T ⊆ I, and each transaction is associated with a unique identifier called TID.
Definition 2.1 An itemset X is a set of items in I. An itemset X is called a k-itemset if it contains k items from I.
Definition 2.2 A transaction T satisfies an itemset X if X ⊆ T. The support of an itemset X in D, supportD(X), is the number of transactions in D that satisfy X.
Definition 2.3 An itemset X is called a large itemset if the support of X in D is not below a minimum support threshold explicitly declared by the user, and a small itemset otherwise.
Definition 2.4 The negative border of a set S ⊂ P(R), closed with respect to the set inclusion relation, is the set of minimal itemsets X ⊂ R not in S. The negative border of the set of large itemsets is the set of itemsets that are generated as candidates but fail to qualify for the set of large itemsets.
Definition 2.5 An association rule is an implication of the form X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. X is called the antecedent of the rule, and Y is called the consequent of the rule. The rule X ⇒ Y holds in D with confidence c, where c = supportD(X ∪ Y) / supportD(X). The rule X ⇒ Y has support s in D if the fraction s of the transactions in D contains X ∪ Y.
  • 27. Example: Suppose I = {A, B, C, D, E} is the set of abbreviations of movie titles in a movie-CD shop; these abbreviations are shown in Table (2.1). Table (2.2) represents a database of the shop's sales. Each transaction is identified by a transaction identifier, TID. Table (2.3) shows the frequent itemsets according to minsup = 33%, while Table (2.4) lists association rules that satisfy minconf = 100%.
Table (2.1) The item abbreviations of the database
Abbreviation | Item
A | Golden mountain
B | Gone with the Wind
C | Zoorba
D | Rain Man
E | Sound of Music
  • 28. Table (2.2) The transaction database of the shop
Transaction TID (Person) | Items (Attributes)
1 | B,C,E
2 | B,C,D,E
3 | A,B,C,D,E
4 | B,C,D
5 | A,B,F
6 | A,B,C,E
Table (2.3) Large itemsets with minsup = 33% (2 transactions)
Support | Itemsets | No.
6 = 100% | B | 1
5 = 83% | C, BC | 2
4 = 67% | E, BE, CE, BCE | 4
3 = 50% | A, D, AB, BD, CD, BCD | 6
2 = 33% | AC, AE, DE, ABC, ABE, ACE, BDE, CDE, ABCE, BCDE | 10
Table (2.4) Association rules with minconf = 100%
A→B (3/3) | AC→B (2/2) | AC→BE (2/2)
C→B (5/5) | AE→B (2/2) | AE→BC (2/2)
D→B (3/3) | AC→E (2/2) | DE→BC (2/2)
E→B (4/4) | AE→C (2/2) | ABC→E (2/2)
D→C (3/3) | DE→B (2/2) | ABE→C (2/2)
E→C (4/4) | DE→C (2/2) | ACE→B (2/2)
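The support and confidence definitions above can be checked against the example with a short script. This is only a sketch: it uses brute-force enumeration rather than any of the mining algorithms mentioned in this chapter, and the function names (support, confidence, large_itemsets) are hypothetical helpers introduced here for illustration. The transactions are copied from Table (2.2) as given, including item F in transaction 5.

```python
from itertools import combinations

# Transactions copied from Table (2.2); keys are TIDs, values are item sets.
TRANSACTIONS = {
    1: {"B", "C", "E"},
    2: {"B", "C", "D", "E"},
    3: {"A", "B", "C", "D", "E"},
    4: {"B", "C", "D"},
    5: {"A", "B", "F"},
    6: {"A", "B", "C", "E"},
}


def support(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions.values() if items <= t)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """support(X u Y) / support(X) for the rule X => Y."""
    x = set(antecedent)
    return support(x | set(consequent), transactions) / support(x, transactions)


def large_itemsets(min_count, transactions=TRANSACTIONS):
    """Brute-force enumeration of itemsets whose support is at least `min_count`."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = support(combo, transactions)
            if count >= min_count:
                result["".join(combo)] = count
    return result


large = large_itemsets(min_count=2)
for name, count in sorted(large.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0])):
    print(f"{name}: {count}/{len(TRANSACTIONS)}")
print("confidence(A => B) =", confidence("A", "B"))
```

With a threshold of 2 transactions the enumeration yields the 23 itemsets listed in Table (2.3). Item F occurs in a single transaction and therefore falls below the threshold.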
  • 29. The mining of association rules is decomposed into two subproblems:
1- Discovering all frequent (large) patterns, represented by the large itemsets defined above; and
2- Generating the association rules from those frequent itemsets.
The first subproblem is tedious, I/O intensive, and computationally expensive for very large databases, which is the case in many real-life applications. In large retailing data, the number of transactions is generally in the order of millions, and the number of items (attributes) is generally in the order of thousands. When the data contains N items, the number of possible itemsets is 2^N. There are many algorithms to mine frequent itemsets, such as Apriori, AprioriTid, and AprioriHybrid [12]. The second subproblem is straightforward and can be done efficiently in reasonable time, and there is a well-known algorithm in the literature for extracting ARs. The databases of frequent itemsets and ARs are assumed to be available in this thesis; therefore, there is no focus on any frequent itemset or AR mining algorithm.
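To illustrate why the second subproblem is straightforward once the supports are known, the following sketch generates the rules of a single frequent itemset. It is a naive enumeration of all antecedent/consequent splits, not the well-known algorithm referred to above. The support counts are taken from Table (2.3), and rules_from_itemset is a hypothetical helper name.

```python
from itertools import combinations

# Support counts from Table (2.3) for the itemset ABC and its non-empty subsets.
SUPPORT = {
    frozenset("A"): 3, frozenset("B"): 6, frozenset("C"): 5,
    frozenset("AB"): 3, frozenset("AC"): 2, frozenset("BC"): 5,
    frozenset("ABC"): 2,
}


def rules_from_itemset(itemset, support, minconf):
    """Yield (antecedent, consequent, confidence) for every split of `itemset`
    into a non-empty antecedent and a non-empty consequent that meets `minconf`."""
    itemset = frozenset(itemset)
    for k in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), k):
            x = frozenset(antecedent)
            conf = support[itemset] / support[x]
            if conf >= minconf:
                yield "".join(sorted(x)), "".join(sorted(itemset - x)), conf


for x, y, conf in rules_from_itemset("ABC", SUPPORT, minconf=1.0):
    print(f"{x} -> {y} (confidence {conf:.0%})")
```

For the itemset ABC and minconf = 100%, the only rule produced is AC → B, which corresponds to the entry AC→B (2/2) in Table (2.4). No database scan is needed at this stage, because every confidence is a ratio of two stored supports.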
  • 31. Visualization Techniques of Association Rules 20 Chapter 3
Chapter Three
Visualization Techniques of Association Rules
3.1 Introduction
This chapter presents the concept of visualization, the benefits of visualization, and the visualization techniques used to visualize association rules (AR) in the KDD process.
3.2 Visualization
Visualization is the process of transforming data, information, and knowledge into visual form, making use of humans' natural visual capabilities [9]. A typical visualization application is the field of computer graphics. The invention of computer graphics may be the most important development in visualization since the invention of central perspective in the Renaissance period. The development of animation also helped advance visualization. Despite the importance of visualization, there are many limitations and difficulties that must be taken into consideration, such as [28, 4]:
• Visualization techniques are always difficult to evaluate.
• The implementation may require the use of an operating system from one specific vendor.
• The visualization techniques offered are very limited.
• A limitation of many 3D visualizations is the possible waste of screen space towards the corners of the screen.
• The traditional menu bar approach requires long mouse movements from the visualization to the menu bar and vice versa.
  • 32. • Object interaction complexity occurs within a 3D environment; for example, the user can transform the parallel bar chart into a matrix format and vice versa.
3.3 Benefits of Visualization
Visual data exploration can be seen as a hypothesis generation process: the visualizations of the data allow the user to gain insight into the data and come up with new hypotheses. The verification of the hypotheses can also be done via data visualization, but may also be accomplished by automatic techniques from statistics, pattern recognition, or machine learning. In addition to the direct involvement of the user, the main advantages of visual data exploration over automatic data analysis techniques are:
• Visual data exploration can easily deal with highly non-homogeneous and noisy data.
• Visual data exploration is intuitive and requires no understanding of complex mathematical or statistical algorithms or parameters.
• Visualization can provide a qualitative overview of the data, allowing data phenomena to be isolated for further quantitative analysis.
As a result, visual data exploration usually allows faster data exploration and often provides more interesting results, especially in cases where automatic algorithms fail. In addition, visual data exploration techniques provide a much higher degree of confidence in the findings of the exploration. These facts lead to a high demand for visual exploration techniques and make them indispensable in conjunction with automatic exploration techniques [6].
3.4 Visualization of Association Rule
Visualizing association rules aims at solving some major problems that come with association rules. First of all, the rules found by automatic procedures must be filtered. Depending on the minimum confidence and minimum support specified, a vast number of rules may be generated. There are at least five parameters involved in a visualization of association rules [19]:
• Sets of antecedent items.
• Sets of consequent items.
  • 33. • Associations between antecedent and consequent.
• Rules' support.
• Rules' confidence.
The goal of association rule generation is to find interesting patterns and trends in transaction databases. Association rules are statistical relations between two or more items in the data set. In a supermarket basket application, associations express the relations between items that are bought together. It is, for example, interesting to find out that in 70% of the cases when people buy bread, they also buy milk. Association rules tell us that the presence of some items in a transaction implies the presence of other items in the same transaction with a certain probability, called confidence. A second important parameter is the support of an association rule, which is defined as the percentage of transactions in which the items co-occur.
Let I = {i1, ..., in} be a set of items and let D be a set of transactions, where each transaction T is a set of items such that T ⊆ I. An association rule is an implication of the form X → Y, where X ⊆ I, Y ⊆ I, and X, Y ≠ ∅. The confidence c is defined as the percentage of transactions containing X that also contain Y. The support is the percentage of transactions that contain both X and Y. For a given support and confidence level, there are efficient algorithms to determine all association rules. A problem, however, is that the resulting set of association rules is usually very large, especially for low support and confidence levels [8,9]. Using higher support and confidence levels may not be effective, since useful rules may then be overlooked. Pattern visualization techniques have been used to overcome this problem and to allow an interactive selection of good support and confidence levels.
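The two measures described above can be written compactly as follows. This is only a restatement of the verbal definitions, assuming both are expressed as fractions of transactions; the operator names support and confidence are introduced here for illustration.

```latex
\[
\mathrm{support}(X \rightarrow Y)
  = \frac{\left|\{\, T \in D : X \cup Y \subseteq T \,\}\right|}{|D|},
\qquad
\mathrm{confidence}(X \rightarrow Y)
  = \frac{\left|\{\, T \in D : X \cup Y \subseteq T \,\}\right|}
         {\left|\{\, T \in D : X \subseteq T \,\}\right|}.
\]
```

In the bread and milk example, the statement that 70% of the transactions containing bread also contain milk corresponds to a confidence of 0.70 for the rule bread → milk.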
Figure (3.1) shows SGI MineSet's Rule Visualizer [14], which maps the left- and right-hand sides of the rules to the x- and y-axes of the plot, respectively, and shows the confidence as the height of the bars and the support as the height of the discs. The color of the bars shows the interestingness of the rule.
  • 34. Figure (3.1) MineSet's Association Rule Visualizer
Using the visualization, the user is able to see groups of related rules and the impact of different confidence and support levels. The goal of association rule visualization is to display a large number of association rules and their metadata in a two-dimensional (2D) or three-dimensional (3D) display with minimum human interaction, minimum occlusion, and no screen swapping. Many approaches have been developed to visualize association rules:
1- Rule Table
2- Two-Dimensional Matrix
3- Directed Graph
4- Rule-to-Item Approach
5- Mosaic Plot
6- Double Decker Plot
7- Parallel Coordinates
8- Many-to-Many AR Visualization Technique
3.4.1 Rule Table
The most straightforward method for association rule visualization is the rule table. The following rule table format has been used [26]:
Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item N | Rule N | Antecedent N | Confidence | Support
  • 35. Here Item 1, Item 2, ..., Item 5 are the items of the rule, Rule N is the number of items in the rule, and Antecedent N is the number of items in the rule antecedent; the number of consequent items is Rule N - Antecedent N.
Table (3.1) Example of Association Rules in Rule Table Format
Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item N | Rule N | Antecedent N | Confidence | Support
Bread | Milk | Null | Null | Null | Null | 2 | 1 | 90% | 10%
Eggs | Bread | Milk | Null | Null | Null | 3 | 1 | 85% | 7%
Milk | Bread | Eggs | Olive | Null | Null | 4 | 2 | 60% | 3%
In Table (3.1), rule #3 (the third row) has Rule N = 4, meaning the rule consists of 4 items, and Antecedent N = 2, meaning the antecedent contains 2 items. The row therefore encodes the rule Milk, Bread → Eggs, Olive with confidence 60% and support 3%.
The rule table is the most straightforward way to show association rules to users. However, it is only suitable for displaying a limited number of rules. If the user needs a global view of all the rules, the rule table is not a suitable approach.
3.4.2 Two-Dimensional Matrix
The design of a two-dimensional (2D) association matrix positions the antecedent and consequent items on separate axes of a square matrix. Customized icons are drawn on the matrix tiles that connect the antecedent and the consequent items of the corresponding association rules. Different icons can be used to depict different metadata such as the support and confidence values of the rules. Figure (3.2) depicts an association rule (B→C). Both the height and the color of the column icon can be used to present metadata values. The values of support and confidence are mapped to 3D columns that are built separately on and beneath the matrix tiles. Other icons such as disks and bars are also used to visualize metadata in the Rule Visualizer of MineSet [4,22,28].
A 2D matrix is arguably the most effective technique to show one-to-one binary relationships.
• The strengths of a 2D matrix, however, break down when we need to visualize many-to-one relationships such as association rules with multiple antecedent items. For example, in Figure (3.3) it is almost impossible to tell whether there is only one association rule (A+B→C) or two (A→C and B→C).
  • 36. • The lack of a practical way to identify the togetherness of individual antecedent items makes a 2D matrix a weaker candidate for visualizing rules with multiple antecedent items. MineSet [23] addresses the problem by grouping all the antecedent items of an association rule as one unit and plotting it against its consequent, i.e., an antecedent-to-consequent plot. For example, a dedicated item group (A+B) is created in Figure (3.4) to describe the association rule (A+B→C).
Figure (3.2) The colored column indicates the association rule (B→C). Different icon colors are used to show different metadata values of the association rule.
• The strategy works fine for smaller antecedent sets (e.g., fewer than 3 items). In our text mining studies, we encounter association rules with as many as 12 items in the antecedent.
• The replication of items in the antecedent groups creates a much larger antecedent-to-consequent plot when compared with the corresponding item-to-item plot. The loss of item identity within an antecedent group also defeats the purpose of visualizing the associations with a matrix. For example, the row (or column) of the matrix connected to an item can no longer be used to search for all the rules involving that item.
  • 37. Figure (3.3) It is very difficult to determine the difference between (A+B→C) and (A→C and B→C).
Figure (3.4) The identities of A and B are lost in the new item group that was created to depict the association rule (A+B→C).
• Another problem in a 2D matrix display is object occlusion, especially when multiple icons are used to depict different metadata values on the matrix tiles. The occlusion problem is obvious in Figure (3.5).
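The ambiguity illustrated by Figure (3.3) can be reproduced with a small sketch. It assumes a plain item-to-item matrix in which a tile is marked whenever some rule links that antecedent item to that consequent item; item_to_item_matrix is a hypothetical helper and does not model MineSet's icons or metadata.

```python
def item_to_item_matrix(rules, items):
    """Mark tile (antecedent item, consequent item) with 1 for every rule.

    `rules` is a list of (antecedent_items, consequent_items) pairs.
    Rows are antecedent items and columns are consequent items.
    """
    index = {item: i for i, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in items]
    for antecedent, consequent in rules:
        for a in antecedent:
            for c in consequent:
                matrix[index[a]][index[c]] = 1
    return matrix


ITEMS = ["A", "B", "C"]
two_rules = [({"A"}, {"C"}), ({"B"}, {"C"})]  # A→C and B→C
one_rule = [({"A", "B"}, {"C"})]              # A+B→C

same = item_to_item_matrix(two_rules, ITEMS) == item_to_item_matrix(one_rule, ITEMS)
print("matrices identical:", same)
```

Both rule sets mark exactly the tiles (A, C) and (B, C), so the two matrices are identical and the viewer cannot tell which set of rules produced the display.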
  • 38. Figure (3.5) Object occlusions are unavoidable.
Figure (3.6) Left: A→C and B→C. Right: A+B→C.
3.4.3 Directed Graph
A directed graph is another prevailing technique to depict item associations. The nodes of a directed graph represent the items, and the edges represent the associations. Figure (3.6) shows three association rules (A→C, B→C, A+B→C).
• This technique works well when only a few items (nodes) and associations (edges) are involved. An association graph can quickly turn into a tangled display with as few as a dozen rules. Hetzler et al. [19] address the problem by animating the edges to show the association of certain items with 3D rainbow arcs. The animation technique requires significant human interaction to turn the item nodes on and off. It is not an easy task to show multiple metadata values, including support and confidence, alongside the association rules.
  • 39. 3.4.4 Rule-to-Item Visualization Technique
To visualize many-to-one association rules, the matrix of the rule-to-item relationship is used instead of the tiles of a 2D matrix that show item-to-item associations [19]. In Figure (3.7) the rows of the matrix floor represent the items (or topics in the context of text mining), and the columns represent the item associations. The blue and red blocks of each column (rule) represent the antecedent and the consequent of the rule. The identities of the items are shown along the right side of the matrix. The confidence and support levels of the rules are given by the corresponding bar charts in different scales at the far end of the matrix. The rule-to-item visualization approach has many advantages over its matrix-based predecessors:
• There is virtually no upper limit on the number of items in an antecedent. We can analyze the distribution of the association rules (horizontal axis) as well as the items within them (vertical axis) simultaneously.
• Unlike Figure (3.4), the identity of individual items within an antecedent group is clearly shown.
• No new antecedent groups are created because of multiple antecedent items in association rules.
• Because all the metadata are plotted at the far end and the height of the columns is scaled so that the front columns do not block the rear ones, few occlusions occur.
• No screen swapping, animation, or human interaction (other than basic mouse zooming) is required to analyze the rules.
Although this technique is better than its predecessors, it suffers from serious drawbacks, such as:
• It is unable to visualize many-to-many association rules.
• It suffers from antecedent-consequent interleaving, i.e., the items of the antecedent and the consequent are interleaved, although they are given different colors.
  • 40. • Deterioration of the naturalness of the sequence of the rule's parts.
Figure (3.7) A visualization of item associations with support 0.4% and confidence 50%.
3.4.5 Parallel Coordinates
In parallel coordinates [1,2,13], the basic elements of association rules are sets of items, which can be handled by listing all items along a vertical coordinate. The resulting coordinate is then repeated evenly in the horizontal direction until there are enough coordinates to host the longest association rule. An association rule can be visualized as a polygonal line connecting all items in the rule. Parameters such as support and confidence can be mapped to graphical features such as line width and color. Figure (3.8) illustrates an association rule ab → cd as one polygonal line for its LHS, followed by an arrow connecting another polygonal line for its RHS. This visualization handles nicely the upward closure property of association rules: subsets of the RHS are absorbed and are not displayed. For example, ab → cd implies that abc → d, abd → c, ab → c, and ab → d are valid association rules. The implied association rules are not displayed. Two or more itemsets or rules may have parts in common, for example adbe and cdb in Figure (3.8).
  • 41. Figure (3.8) Association rule ab → cd in the Parallel Coordinates visualization technique
3.4.6 Mosaic Plot
The basic idea is to partition a rectangle along the y-axis according to one attribute and to make the regions proportional to the sum of the corresponding data values, so that the height of the bars, instead of the width, shows the parameter value. Each resulting area is then split in the same way according to a second attribute [13]. The coloring reflects the percentage of data items that fulfill a third attribute. The visualization shows the support and confidence values of all rules of the form X1,X2 → Y (Figure (3.9)). Mosaic plots are restricted to two attributes on the left side of the association rule [6].
  • 42. Figure (3.9) X1,X2 → Y in Mosaic Plot
Figure (3.10) X1,X2 → Y in Double Decker Plot
3.4.7 Double Decker Plot
Double decker plots can be used to show more than two attributes on the left side. The idea is to show a hierarchy of attributes on the bottom (heineken, coke, chicken in the example shown in Figure (3.10)), corresponding to the left-hand side of the association rules. The bars on the top correspond to the number of items in the corresponding subset of the database and therefore visualize the support of the rule. The colored areas in the bars correspond to the percentage of data transactions in that subset that contain an additional item and therefore correspond to the confidence [6,11].
  • 43. 3.4.8 Many-to-Many AR Visualization Technique
As previously mentioned, three of the approaches developed to visualize association rules are the two-dimensional matrix, the directed graph, and the rule-to-item approach. It was also shown that the rule-to-item approach is the best of these techniques in spite of its drawbacks, such as its inability to represent many-to-many ARs and the interleaving of consequent and antecedent items in the visualization area. This section presents a technique that removes these drawbacks: it avoids item interleaving and efficiently represents many-to-many ARs. The technique is called the many-to-many AR visualization technique, MARVT. In this technique the visualization area is divided into three regions: the antecedent region, the statistical region, and the consequent region. The technique can be implemented in two or three dimensions. If the two-dimensional implementation is chosen, the x-axis of the visualization area holds the rule identifiers, while the y-axis of the antecedent region holds the items of the antecedents of the rules to be visualized. The y-axis of the statistical region is divided according to the confidence and support levels of the rules, while the y-axis of the consequent region holds the items of the consequents of the selected rules. Figure (3.11) depicts the general structure of the visualization area of the proposed technique. If an item i belongs to the antecedent of a rule R, a red ellipse is drawn at position (R, i) of the antecedent region, and if an item j is part of the consequent of rule R, a black ellipse is drawn at position (R, j) of the consequent region. The statistical region contains important statistical values of each rule, such as its confidence, its support, the support of its antecedent itemset, and the support of its consequent itemset, in the part of the region assigned to that rule. The y-axis of the statistical region begins at the minsup and minconf thresholds and ends at 100%.
The technique is flexible enough to visualize more statistical information, such as the support of each item. It is also possible to display the order of the rule. If the technique is implemented in three dimensions, the same regions are used. The x-axis is determined by the rule id. The y-axis is determined by the items of the antecedent and the consequent for their respective regions. The z-axis is determined by the support and confidence, beginning at the minconf or minsup threshold.
  • 44. The third dimension is used to show the support of the items, the confidence and the support of a rule, and the support of the antecedent and consequent itemsets. With this technique it is possible to visualize many-to-many, one-to-many, and many-to-one rules, because it defines two separate regions for the antecedent and the consequent, each of which can hold an unlimited number of items. This separation also excludes item interleaving, because the items of the consequent and the antecedent are presented in different regions.
Figure (3.11) General Structure of the Visualization Area of the Proposed Many-to-Many Association Rules Visualization Technique, MARVT
  • 45. To illustrate this technique, consider the following rules:
1- a,b → c,q1, with confidence 63 and support 2.
2- a,b,c → q1,m, with confidence 100 and support 3.
3- b,c → c,m,q1, with confidence 50 and support 1.
Figure (3.12) shows the hypothetical visualization of these rules. As shown, the antecedent items of R1 are a and b; therefore, positions (R1, a) and (R1, b) of the antecedent area are marked with red circles, and so on for the rest of the rules. Also, (R1, c) and (R1, q1) of the consequent area are marked with black circles because c and q1 are the consequent items of R1. The same process is done for R2 and R3.
Figure (3.12) Visualization Area of the Many-to-Many Association Rules Visualization Technique
  • 46. The statistical area visualizes the support of the antecedent and consequent itemsets, as well as the support and confidence of the rules. It is also possible to add the support of each item next to its ellipse. For example, the number 3 beside the ellipse of item a in R1 represents the support of item a, and so on for each item. Figure (3.13) depicts the general 3D structure of MARVT. This structure preserves the same regions: consequent, antecedent, and statistical.
  • 47. Figure (3.13) 3D General Structure of MARVT
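The 2D layout of MARVT described in Section 3.4.8 can be approximated in plain text. The sketch below is only an illustration under stated assumptions: the vertical ordering of the three regions, the "o" marker standing in for the red and black ellipses, and the helper names region_rows and render_marvt are all choices made here, not part of the original technique. The three example rules are copied from the text as given.

```python
# Example rules from the text: (id, antecedent, consequent, confidence, support).
RULES = [
    ("R1", ["a", "b"], ["c", "q1"], 63, 2),
    ("R2", ["a", "b", "c"], ["q1", "m"], 100, 3),
    ("R3", ["b", "c"], ["c", "m", "q1"], 50, 1),
]


def region_rows(rules, part):
    """Return {item: [marker per rule]} for the antecedent (part=1) or consequent (part=2)."""
    items = sorted({item for rule in rules for item in rule[part]})
    return {item: ["o" if item in rule[part] else "." for rule in rules] for item in items}


def render_marvt(rules, width=5):
    """Render the three regions as text, one column per rule (x-axis = rule id)."""
    def row(label, cells):
        return label.rjust(10) + "  " + "".join(str(c).ljust(width) for c in cells)

    lines = [row("", [rule[0] for rule in rules]), "antecedent region"]
    lines += [row(item, marks) for item, marks in region_rows(rules, 1).items()]
    lines.append("statistical region")
    lines.append(row("confidence", [rule[3] for rule in rules]))
    lines.append(row("support", [rule[4] for rule in rules]))
    lines.append("consequent region")
    lines += [row(item, marks) for item, marks in region_rows(rules, 2).items()]
    return "\n".join(lines)


print(render_marvt(RULES))
```

Because the antecedent and consequent items are drawn in separate regions, an item such as c can appear in both without the two roles being interleaved, and a rule may have any number of items on either side.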
  • 49. Conclusion 38 Chapter 4
Chapter Four
Summary and Future Work
4.1 Introduction
Chapter three presented the most important techniques for visualizing association rules. This chapter summarizes these techniques by reviewing their most important advantages and disadvantages.
4.2 Summary
The most important characteristics of the techniques presented in chapter three are reviewed below.
4.2.1 Rule Table
1- Visualizes one-to-one, many-to-one, and many-to-many relationships.
2- Can sort the results by the column of interest.
3- Visualizes full details of the rule (antecedent, consequent, support, confidence).
4- Displays only a limited number of rules.
5- Its main limitation is its close resemblance to the original raw textual form, so that the user can inspect only a few rules without having a global view of all the information.
6- Not interactive.
  • 50. 4.2.2 Two-Dimensional Matrix
1- Effective technique for showing one-to-one binary relationships.
2- Breaks down when many-to-one or many-to-many relationships need to be visualized.
3- Visualizes full details of the rule (antecedent, consequent, support, confidence).
4- Object occlusion, especially when multiple icons are used to depict different metadata values on the matrix tiles.
5- Displays only a limited number of rules.
6- Not interactive.
4.2.3 Directed Graph
1- Visualizes one-to-one and many-to-one relationships.
2- Displays only a limited number of rules.
3- Lacks a clear representation of the support and confidence.
4- Edges belonging to different rules overlap with each other.
5- Not interactive.
4.2.4 Rule-to-Item Visualization Technique
1- Visualizes many-to-one relationships.
2- Breaks down when many-to-many relationships need to be visualized.
3- No upper limit on the number of items in an antecedent.
4- Clearly shows the individual items within an antecedent group.
5- No new antecedent groups are created because of multiple antecedent items in association rules.
6- No object occlusion.
7- Deterioration of the naturalness of the sequence of the rule's parts.
8- Interleaving of the items of the antecedent and consequent, although they are given different colors.
9- Interactive.
  • 51. 4.2.5 Parallel Coordinates
1- Visualizes one-to-one, many-to-one, and many-to-many relationships.
2- Visualizes full details of the rule (antecedent, consequent, support, confidence).
3- Visual rules overlap with each other.
4- Object occlusion.
5- Lacks a clear representation of the support and confidence (Figure (4.1)).
Figure (4.1) Overlapping rules and the lack of a clear representation of the support and confidence
4.2.6 Mosaic Plot
1- Visualizes one-to-one, many-to-one, and many-to-many relationships.
2- Restricted to two attributes on the left side of the association rule.
3- Visualizes one rule at a time.
4- Difficult to understand and implement.
5- Lacks a clear representation of the support and confidence.
  • 52. 4.2.7 Double Decker Plot
    1- Visualizes one-to-one, many-to-one, and many-to-many relationships.
    2- Shows more than two attributes on the left side.
    3- Visualizes one rule at a time.
    4- Lacks a clear representation of support and confidence.
    5- Difficult to understand and to implement.
    4.2.8 Many to Many AR Visualization Technique
    1- Best technique for visualizing many-to-many relationships.
    2- Visualizes the full details of a rule (antecedent, consequent, support, confidence).
    3- No object occlusion.
    4- No upper limit on the number of items in an antecedent.
    5- Clear representation of support and confidence.
    6- Interactive.
    7- Flexible enough to visualize more statistical information.
    8- Can display the order of the rule.
    4.3 Future work
    The exploration of large data sets is an important but difficult problem. Information visualization techniques can be useful in solving this problem. Visual data exploration has a high potential, and many applications such as fraud detection and data mining can use information visualization technology for improved data analysis. Avenues for future work include the tight integration of visualization techniques with traditional techniques from such disciplines as statistics, machine learning, operations research, and simulation. Integrating visualization techniques with these more established methods would combine fast automatic data mining algorithms with the intuitive power of the human mind, improving the quality and speed of the data mining process. Visual data mining techniques also need to be tightly integrated with the systems used to manage the vast amounts of relational and semi-structured information, including database management and data warehouse systems. The ultimate goal is to bring the power of visualization technology to every desktop to allow a better, faster, and more intuitive exploration of very large data resources. This will not only be valuable in an economic sense but will also stimulate and delight the user.
  • 57. Republic of Iraq
    Ministry of Higher Education and Scientific Research
    Iraqi Commission for Computers and Informatics
    Informatics Institute for Postgraduate Studies
    Study of Association Rules' Visualization Techniques
    A project submitted to the Iraqi Commission for Computers and Informatics, Informatics Institute for Postgraduate Studies, in partial fulfillment of the requirements for the degree of Higher Diploma in Web Site Technology
    By Mustafa Sabah Shaheed
    Supervised by Dr. Hussein Khafaji
    Rabi' al-Awwal 1432, February 2011