Agenda
 Introduction
 Data Mining Process
 Techniques in Data Mining
 Association Rule Mining
 Hash Based Techniques
 Multi level Association Rules
 Partition Algorithm
 Parallel and distributed algorithms
 Measuring Quality of Rules
Data Mining
Definition
 Data mining is the process of sorting through
large data sets to identify patterns and
relationships that can help solve business
problems through data analysis.
 Data mining techniques and tools enable
enterprises to predict future trends and make
more-informed business decisions.
Data mining process: How does it work?
 Data gathering. Relevant data for an analytics
application is identified and assembled.
 The data may be located in different source
systems, a data warehouse or a data lake, an
increasingly common repository in big data
environments that contains a mix of structured
and unstructured data.
 External data sources may also be used.
Wherever the data comes from, a data scientist
often moves it to a data lake for the remaining
steps in the process.
Data mining process…
 Data preparation
 This stage includes a set of steps to get the data
ready to be mined.
 It starts with data exploration, profiling and
pre-processing, followed by data cleansing work
to fix errors and other data quality issues.
 Data transformation is also done to make data
sets consistent, unless a data scientist is
looking to analyze unfiltered raw data for a
particular application.
Data mining process…
 Mining the data. Once the data is prepared, a
data scientist chooses the appropriate data
mining technique and then implements one or
more algorithms to do the mining.
 In machine learning applications, the
algorithms typically must be trained on sample
data sets to look for the information being
sought before they're run against the full set of
data.
Data mining process…
 Data analysis and interpretation.
 The data mining results are used to create analytical
models that can help drive decision-making and other
business actions.
 The data scientist or another member of a data
science team must communicate the findings to
business executives and users, often through data
visualization and the use of data storytelling
techniques.
Data Mining Process
Techniques in Data Mining
 1. Classification
 2. Clustering
 3. Regression
 4. Association Rule Mining
 5. Pattern Mining
 6. Anomaly Detection
 7. Neural Network Classifier
 8. Genetic Algorithms
ASSOCIATION RULE MINING
APRIORI-Market Basket Analysis
Association Rule Mining
 The purchasing of one product when another product is
purchased represents an association rule.
 Association rules are frequently used by retail stores to
assist in marketing, advertising, floor placement, and
inventory control.
 Although they have direct applicability to retail businesses, they
have been used for other purposes as well, including
predicting faults in telecommunication networks.
 Association rules are used to show the relationships
between data items.
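The market basket idea can be sketched in Python. This is a minimal, illustrative Apriori implementation rather than anything taken from the slides; the baskets, the 0.6 threshold, and the function name `apriori` are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=0.6)
for itemset, sup in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step relies on the large itemset property: an itemset can only be frequent if all of its subsets are frequent.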
Mining Multilevel Association
Rules
 For many applications, it is difficult to find strong
associations among data items at low or primitive levels of
abstraction due to the sparsity of data at those levels. Strong
associations discovered at high levels of abstraction may
represent commonsense knowledge.
 Therefore, data mining systems should provide capabilities
for mining association rules at multiple levels of abstraction,
with sufficient flexibility for easy traversal among different
abstraction spaces.
Mining Multilevel Association
Rules
 Mining multilevel association rules. Suppose we are given
the task-relevant set of transactional data in Table for sales
in an AllElectronics store, showing the items purchased
for each transaction.
 A concept hierarchy defines a sequence of mappings from
a set of low-level concepts to higher level, more general
concepts.
 Data can be generalized by replacing low-level
concepts within the data by their higher-level concepts,
or ancestors, from a concept hierarchy

 The concept hierarchy for the items is shown in the figure.
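Generalizing data through a concept hierarchy can be sketched as follows. The hierarchy, the item names, and the helper names `ancestors` and `generalize` are hypothetical and only illustrate the mapping from low-level concepts to their ancestors.

```python
# Hypothetical concept hierarchy, stored as child -> parent.
# Items without an entry are treated as top-level concepts.
PARENT = {
    "IBM laptop computer": "laptop computer",
    "Dell desktop computer": "desktop computer",
    "laptop computer": "computer",
    "desktop computer": "computer",
    "HP printer": "printer",
}

def ancestors(item):
    """Return the chain [item, parent, grandparent, ...] up to the top level."""
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def generalize(transaction, levels_up=1):
    """Replace each item by its ancestor `levels_up` steps higher (or its top-level concept)."""
    result = set()
    for item in transaction:
        chain = ancestors(item)
        result.add(chain[min(levels_up, len(chain) - 1)])
    return result

t = {"IBM laptop computer", "HP printer"}
one_up = generalize(t, 1)
two_up = generalize(t, 2)
print(one_up, two_up)
```

Mining the generalized transactions instead of the raw ones is what makes associations visible at higher levels of abstraction.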
Mining Multilevel Association
Rules
 Association rules generated from mining data at
multiple levels of abstraction are called multiple-level
or multilevel association rules.
 Multilevel association rules can be mined
efficiently using concept hierarchies under a
support-confidence framework.
 In general, a top-down strategy is employed. For
each level, any algorithm for discovering frequent
itemsets may be used, such as Apriori or its
variations.
Mining Multilevel Association
Rules
 Using uniform minimum support for all levels
(referred to as uniform support): The same
minimum support threshold is used when mining
at each level of abstraction.
 For example, in Figure 5.11, a minimum support
threshold of 5% is used throughout (e.g., for mining
from “computer” down to “laptop computer”).
Both “computer” and “laptop computer” are found
to be frequent, while “desktop computer” is not.
Mining Multilevel
Association Rules
 When a uniform minimum support threshold is
used, the search procedure is simplified. The
method is also simple in that users are required to
specify only one minimum support threshold.
 An Apriori-like optimization technique can be
adopted, based on the knowledge that an ancestor
is a superset of its descendants: The search avoids
examining itemsets containing any item whose
ancestors do not have minimum support.
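The ancestor-based pruning can be sketched for single items. The hierarchy and the support values below are hypothetical; they are chosen so that a 5% threshold gives the outcome described for "computer", "laptop computer", and "desktop computer".

```python
# Hypothetical hierarchy (parent -> children) and item supports
# expressed as fractions of all transactions.
CHILDREN = {
    "computer": ["laptop computer", "desktop computer"],
    "printer": ["inkjet printer"],
}
SUPPORT = {
    "computer": 0.10, "laptop computer": 0.06, "desktop computer": 0.04,
    "printer": 0.03, "inkjet printer": 0.02,
}

def frequent_items_top_down(roots, min_support):
    """Walk the hierarchy top-down, never examining descendants of infrequent items."""
    frequent, examined = [], []
    stack = list(roots)
    while stack:
        item = stack.pop()
        examined.append(item)
        if SUPPORT[item] >= min_support:
            frequent.append(item)
            # A descendant can be frequent only if its ancestor is frequent.
            stack.extend(CHILDREN.get(item, []))
    return sorted(frequent), sorted(examined)

frequent, examined = frequent_items_top_down(["computer", "printer"], 0.05)
print(frequent)
print(examined)
```

Because "printer" falls below the threshold, "inkjet printer" is never examined at all.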
Using reduced minimum support at lower
levels (referred to as reduced support):
 Each level of abstraction has its own minimum support
threshold. The deeper the level of abstraction, the smaller
the corresponding threshold is.
 For example, in Figure, the minimum support thresholds
for levels 1 and 2 are 5% and 3%, respectively. In this
way, “computer,” “laptop computer,” and “desktop
computer” are all considered frequent.
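The difference between uniform and reduced support can be shown with a few lines of Python. The support values are hypothetical; only the 5% and 3% thresholds come from the example above.

```python
# Hypothetical items: name -> (hierarchy level, support as a fraction).
ITEMS = {
    "computer": (1, 0.10),
    "laptop computer": (2, 0.06),
    "desktop computer": (2, 0.04),
}

def frequent_by_level(items, thresholds):
    """Return the items whose support meets the threshold of their own level."""
    return sorted(name for name, (level, support) in items.items()
                  if support >= thresholds[level])

uniform = frequent_by_level(ITEMS, {1: 0.05, 2: 0.05})  # same threshold at every level
reduced = frequent_by_level(ITEMS, {1: 0.05, 2: 0.03})  # lower threshold at level 2
print(uniform)
print(reduced)
```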
Using item or group-based minimum support
(referred to as group-based support):
 Because users or experts often have insight as to which
groups are more important than others, it is sometimes
more desirable to set up user-specific, item-based, or group-based
minimum support thresholds when mining multilevel
rules.
 For example, a user could set up the minimum support
thresholds based on product price, or on items of interest,
such as by setting particularly low support thresholds
for laptop computers and flash drives in order to pay
particular attention to the association patterns containing
items in these categories.
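A group-based threshold can be sketched as a lookup table. The threshold values are hypothetical, and taking the lowest threshold among an itemset's items is only one possible convention, assumed here for illustration.

```python
# Hypothetical thresholds: a default plus lower values for items of interest.
DEFAULT_MIN_SUPPORT = 0.05
GROUP_MIN_SUPPORT = {"laptop computer": 0.01, "flash drive": 0.01}

def itemset_threshold(itemset):
    """Assumed convention: an itemset uses the lowest threshold among its items."""
    return min(GROUP_MIN_SUPPORT.get(item, DEFAULT_MIN_SUPPORT) for item in itemset)

def is_frequent(itemset, support):
    """Check an itemset's support against its group-based threshold."""
    return support >= itemset_threshold(itemset)

print(is_frequent({"laptop computer", "flash drive"}, 0.02))
print(is_frequent({"printer", "scanner"}, 0.02))
```

With this convention, any pattern that contains a laptop computer or a flash drive is kept even when its support is well below the default.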
Mining Multidimensional Association Rules from
Relational Databases and Data Warehouses
 We have studied association rules that imply a single
predicate, that is, the predicate buys. For instance, in
mining our AllElectronics database, we may discover
the Boolean association rule
Mining Multidimensional
Association Rules
 Following the terminology used in multidimensional
databases, we refer to each distinct predicate in a rule as a
dimension.
 Hence, we can refer to the rule above as a single-dimensional
or intradimensional association rule because it contains a
single distinct predicate (e.g., buys) with multiple
occurrences (i.e., the predicate occurs more than once
within the rule).
Mining Multidimensional
Association Rules
 Considering each database attribute or warehouse
dimension as a predicate, we can therefore mine
association rules containing multiple predicates, such
as

Mining Multidimensional
Association Rules
 Association rules that involve two or more dimensions or
predicates can be referred to as multidimensional association
rules.
 The rule above contains three predicates (age, occupation,
and buys), each of which occurs only once in the rule. Hence,
we say that it has no repeated predicates.
 Multidimensional association rules with no repeated predicates
are called interdimensional association rules. We can also mine
multidimensional association rules with repeated predicates,
which contain multiple occurrences of some predicates.
 These rules are called hybrid-dimensional association rules. An
example of such a rule is the following, where the
predicate buys is repeated:
Note that database attributes can be categorical or quantitative. Categorical
attributes have a finite number of possible values, with no ordering among the
values (e.g., occupation, brand, color).
 Categorical attributes are also called nominal attributes, because their
values are “names of things.” Quantitative attributes are numeric and have an
implicit ordering among values (e.g., age, income, price).
Techniques for mining multidimensional association rules can be
categorized into two basic approaches regarding the treatment of quantitative
attributes.
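The distinction between single-dimensional, interdimensional, and hybrid-dimensional rules depends only on how often each predicate occurs. A small sketch, in which the rule encoding and the attribute values are hypothetical:

```python
from collections import Counter

def classify_rule(antecedent, consequent):
    """Classify a rule whose sides are lists of (predicate, value) pairs."""
    counts = Counter(pred for pred, _ in antecedent + consequent)
    if len(counts) == 1:
        return "single-dimensional"   # one distinct predicate, repeated
    if all(c == 1 for c in counts.values()):
        return "interdimensional"     # several predicates, none repeated
    return "hybrid-dimensional"       # several predicates, at least one repeated

# Hypothetical rules for illustration.
r1 = classify_rule([("buys", "digital camera")], [("buys", "printer")])
r2 = classify_rule([("age", "20-29"), ("occupation", "student")], [("buys", "laptop")])
r3 = classify_rule([("age", "20-29"), ("buys", "laptop")], [("buys", "printer")])
print(r1, r2, r3)
```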
Mining Quantitative Association Rules
 Quantitative association rules are multidimensional
association rules in which the numeric attributes
are dynamically discretized during the mining process so
as to satisfy some mining criteria, such as maximizing the
confidence or compactness of the rules mined.
 In this section, we focus specifically on how to mine
quantitative association rules having two quantitative
attributes on the left-hand side of the rule and one
categorical attribute on the right-hand side of the rule.
That is,
Aquan1 ∧ Aquan2 ⇒ Acat
where Aquan1 and Aquan2 are tests on quantitative
attribute intervals (where the intervals are dynamically
determined), and Acat tests a categorical attribute from
the task-relevant data.
Such rules have been referred to as two-dimensional
quantitative association rules, because they contain two
quantitative dimensions.
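A two-dimensional quantitative rule can be sketched as a grid of interval pairs, with the rule confidence computed per cell. For simplicity this sketch uses fixed equal-width bins instead of dynamically determined intervals; the customer records, the bin widths, and the function names are hypothetical.

```python
from collections import defaultdict

def bin_label(value, width):
    """Map a number to the equal-width interval [lo, lo + width) that contains it."""
    lo = (value // width) * width
    return (lo, lo + width)

def two_d_rules(records, target, age_width, income_width, min_conf):
    """records: (age, income, category) tuples.

    Returns {(age_interval, income_interval): confidence} for each grid cell
    whose rule "age in interval AND income in interval => target" reaches min_conf.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for age, income, category in records:
        cell = (bin_label(age, age_width), bin_label(income, income_width))
        totals[cell] += 1
        if category == target:
            hits[cell] += 1
    return {cell: hits[cell] / totals[cell]
            for cell in totals if hits[cell] / totals[cell] >= min_conf}

# Hypothetical customer records: (age, income, purchased item).
customers = [
    (34, 45_000, "HDTV"), (36, 47_000, "HDTV"), (31, 43_000, "HDTV"),
    (38, 41_000, "DVD player"), (25, 30_000, "DVD player"), (52, 80_000, "HDTV"),
]
rules = two_d_rules(customers, "HDTV", age_width=10, income_width=10_000, min_conf=0.7)
print(rules)
```

A dynamic method would adjust or merge the intervals to improve confidence or compactness; the fixed grid only shows where such a search would start.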
Mining Quantitative Association
Rules
 For instance, suppose you are curious about the
association relationship between pairs of quantitative
attributes, like customer age and income, and the type of
television (such as high-definition TV, i.e., HDTV) that
customers like to buy. An example of such a 2-D
quantitative association rule is

Partition Algorithm
 If we are given a database with a small number of
potential large itemsets, say, a few thousand, then the
support for all of them can be tested in one scan by
using a partitioning technique.
 Partitioning divides the database into nonoverlapping
subsets; these are individually considered as separate
databases and all large itemsets for that partition,
called local frequent itemsets, are generated in one
pass.
Partition Algorithm
 The Apriori algorithm can then be used efficiently on
each partition if it fits entirely in main memory.
Partitions are chosen in such a way that each partition
can be accommodated in main memory.
Partition Algorithm
 As such, a partition is read only once in each pass. The
only limitation with the partition method is that the
minimum support used for each partition has a
slightly different meaning from the original value.
 The minimum support is based on the size of the
partition rather than the size of the database for
determining local frequent (large) itemsets.
 The actual support threshold value is the same as given
earlier, but the support is computed only for a
partition.
Partition Algorithm
 At the end of pass one, we take the union of all frequent
itemsets from each partition. This forms the global
candidate frequent itemsets for the entire database. When
these lists are merged, they may contain some false
positives.
 That is, some of the itemsets that are frequent (large) in
one partition may not qualify in several other partitions
and hence may not exceed the minimum support when the
original database is considered. Note that there are no false
negatives; no large itemsets will be missed.
Partition Algorithm
 The global candidate large itemsets identified in pass
one are verified in pass two; that is, their actual
support is measured for the entire database. At the end
of pass two, all global large itemsets are identified.
The Partition algorithm lends itself naturally to a
parallel or distributed implementation for better
efficiency.
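The two passes can be sketched in Python. To keep the example short, the local miner enumerates all subsets by brute force instead of running Apriori on each partition; the data, the threshold, and the function names are assumptions.

```python
from itertools import combinations

def local_frequent(partition, min_support):
    """All itemsets frequent within one partition (brute force, small data only)."""
    counts = {}
    for t in partition:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] = counts.get(combo, 0) + 1
    # The threshold is applied relative to the partition size.
    return {s for s, c in counts.items() if c / len(partition) >= min_support}

def partition_mine(transactions, num_partitions, min_support):
    size = -(-len(transactions) // num_partitions)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Pass 1: the union of local frequent itemsets forms the global candidates.
    candidates = set().union(*(local_frequent(p, min_support) for p in parts))
    # Pass 2: measure each candidate's actual support over the whole database.
    n = len(transactions)
    result = {}
    for c in candidates:
        support = sum(1 for t in transactions if set(c) <= t) / n
        if support >= min_support:
            result[c] = support
    return candidates, result

# Hypothetical database of six transactions, split into two partitions.
db = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}, {"c", "d"}, {"c"}]
candidates, frequent = partition_mine(db, num_partitions=2, min_support=0.5)
print(sorted(candidates))
print(sorted(frequent))
```

Here the itemset ("a", "b") is frequent in the first partition only, so it appears among the candidates but is rejected in pass two as a false positive.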
PARALLEL AND DISTRIBUTED
ALGORITHMS
 Algorithms can be classified along the following
dimensions [DXGH00]:
 Target: The algorithms we have examined generate all
rules that satisfy a given support and confidence level.
Alternatives to these types of algorithms are those that
generate some subset of the rules based on the
constraints given.
 Type: Algorithms may generate regular association
rules or more advanced association rules such as those
introduced in Section 6.7 and Chapters 8 and 9.
 Data type: We have examined rules generated for data
in categorical databases. Rules may also be derived for
other types of data such as plain text. This concept is
further investigated in Section 6.7 and in Chapter 7
when we look at Web usage mining.
 Data source: Our investigation has been limited to
the use of association rules for market basket data.
This assumes that data are present in a transaction.
The absence of data may also be important.
 Technique: The most common strategy to generate
association rules is that of finding large itemsets.
Other techniques may also be used.
 Itemset strategy: Itemsets may be counted in different
ways. The most naive approach is to generate all
itemsets and count them. As this is usually too space
intensive, the bottom-up approach used by Apriori,
which takes advantage of the large itemset property, is
the most common approach. A top-down technique
could also be used.
 Transaction strategy: To count the itemsets, the
transactions in the database must be scanned. All
transactions could be counted, only a sample may be
counted, or the transactions could be divided into
partitions.
 Itemset data structure: The most common data structure
used to store the candidate itemsets and their counts is a
hash tree. Hash trees provide an effective technique to
store, access, and count itemsets. They are efficient to
search, insert, and delete itemsets. A hash tree is a
multiway search tree where the branch to be taken at
each level in the tree is determined by applying a hash
function as opposed to comparing key values to
branching points in the node.
 Transaction data structure: Transactions
may be viewed as in a flat file or as a TID
list, which can be viewed as an inverted
file. The items usually are encoded (as
seen in the hash tree example), and the use
of bit maps has also been proposed.
 Optimization: These techniques
look at how to improve on the
performance of an algorithm given
data distribution (skewness) or
amount of main memory.
 Architecture: Sequential, parallel,
and distributed algorithms have
been proposed.
 Parallelism strategy: Both data
parallelism and task parallelism
have been used.
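The TID-list view of transactions mentioned above can be sketched as an inverted file that maps each item to the set of transaction IDs containing it. The data and the function names are hypothetical.

```python
from collections import defaultdict

def build_tid_lists(transactions):
    """Invert {tid: items} into {item: set of tids}."""
    tid_lists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tid_lists[item].add(tid)
    return dict(tid_lists)

def support_count(itemset, tid_lists):
    """The support count is the size of the intersection of the items' TID lists."""
    tids = [tid_lists.get(item, set()) for item in itemset]
    return len(set.intersection(*tids)) if tids else 0

# Hypothetical flat-file view: transaction ID -> items.
db = {
    1: {"bread", "milk"},
    2: {"bread", "beer"},
    3: {"bread", "milk", "beer"},
    4: {"milk"},
}
tid_lists = build_tid_lists(db)
print(sorted(tid_lists["bread"]))
print(support_count({"bread", "milk"}, tid_lists))
```

With TID lists, counting an itemset becomes a set intersection rather than a scan over every transaction.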
Comparing algorithms
Algorithm      Scans    Data structure    Parallelism
Apriori        m + 1    hash tree         none
Sampling       2        not specified     none
Partitioning   2        hash table        none
CDA            m + 1    hash tree         data
DDA            m + 1    tree              task
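Data parallelism, as listed for CDA, can be sketched as follows, assuming CDA refers to the count distribution algorithm: every worker counts all candidates on its own partition, and the local counts are then summed. Threads stand in for processors here; the data and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_counts(partition, candidates):
    """Count each candidate itemset within one worker's partition."""
    return Counter({c: sum(1 for t in partition if c <= t) for c in candidates})

def count_distribution(partitions, candidates):
    """Every worker counts all candidates locally; the counts are then summed."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        partials = pool.map(local_counts, partitions, [candidates] * len(partitions))
        for partial in partials:
            total.update(partial)  # Counter.update adds counts together
    return total

# Hypothetical data: two partitions and three candidate itemsets.
partitions = [
    [{"a", "b"}, {"a"}],
    [{"a", "b"}, {"b"}],
]
candidates = [frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]
global_counts = count_distribution(partitions, candidates)
print(global_counts)
```

Task parallelism would instead divide the candidates among the workers, so that each worker counts only its own share.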
Measuring Quality of Rules
Support
Confidence
Lift
 https://www.researchgate.net/publication/284921921_Analysing_the_Quality_of_Association_Rules_by_Computing_an_Interestingness_Measures
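The three measures can be computed directly from their standard definitions. The transactions and the function names below are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A and B) / support(A): how often B appears when A does."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """confidence(A => B) / support(B); a value above 1 suggests a positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Hypothetical transactions.
db = [{"bread", "butter"}, {"bread", "butter"}, {"bread"}, {"milk"}]
A, B = {"bread"}, {"butter"}
s = support(A | B, db)
c = confidence(A, B, db)
l = lift(A, B, db)
print(s, c, l)
```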
Association rule mining.pptx
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
 

Association rule mining.pptx

  • 2. Agenda  Introduction  Data Mining Process  Techniques in Data Mining  Association Rule Mining  Hash Based Techniques  Multi level Association Rules  Partition Algorithm  Parallel and distributed algorithms  Measuring Quality of Rules
  • 3. Data Mining: Definition  Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis.  Data mining techniques and tools enable enterprises to predict future trends and make more informed business decisions.
  • 4. Data mining process: How does it work?  Data gathering. Relevant data for an analytics application is identified and assembled.  The data may be located in different source systems, a data warehouse or a data lake, an increasingly common repository in big data environments that contains a mix of structured and unstructured data.  External data sources may also be used. Wherever the data comes from, a data scientist often moves it to a data lake for the remaining steps in the process.
  • 5. Data mining process…  Data preparation  This stage includes a set of steps to get the data ready to be mined.  It starts with data exploration, profiling and pre-processing, followed by data cleansing work to fix errors and other data quality issues.  Data transformation is also done to make data sets consistent, unless a data scientist is looking to analyze unfiltered raw data for a particular application.
  • 6. Data mining process…  Mining the data. Once the data is prepared, a data scientist chooses the appropriate data mining technique and then implements one or more algorithms to do the mining.  In machine learning applications, the algorithms typically must be trained on sample data sets to look for the information being sought before they're run against the full set of data.
  • 7. Data mining process…  Data analysis and interpretation.  The data mining results are used to create analytical models that can help drive decision-making and other business actions.  The data scientist or another member of a data science team must communicate the findings to business executives and users, often through data visualization and data storytelling techniques.
  • 9. Techniques in Data Mining  1. Classification  2. Clustering  3. Regression  4. Association Rule Mining  5. Pattern Mining  6. Anomaly Detection  7. Neural Network Classifier  8. Genetic Algorithms
  • 12. Association Rule Mining  The purchasing of one product when another product is purchased represents an association rule.  Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control.  Although they have direct applicability to retail businesses, they have been used for other purposes as well, including predicting faults in telecommunication networks.  Association rules are used to show the relationships between data items.
  • 25. Mining Multilevel Association Rules  For many applications, it is difficult to find strong associations among data items at low or primitive levels of abstraction due to the sparsity of data at those levels. Strong associations discovered at high levels of abstraction may represent commonsense knowledge.  Therefore, data mining systems should provide capabilities for mining association rules at multiple levels of abstraction, with sufficient flexibility for easy traversal among different abstraction spaces.
  • 26. Mining Multilevel Association Rules  Mining multilevel association rules. Suppose we are given the task-relevant set of transactional data in Table for sales in an AllElectronics store, showing the items purchased for each transaction.  A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.  Data can be generalized by replacing low-level concepts within the data by their higher-level concepts, or ancestors, from a concept hierarchy.
  • 27.  The concept hierarchy for the items is shown in Figure .
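The generalization step described above (replacing low-level concepts by their ancestors) can be sketched in a few lines. This is a minimal illustration only: the hierarchy in `PARENT` and the sample transactions are hypothetical and are not taken from the AllElectronics table or figure.

```python
# Hypothetical concept hierarchy: child -> parent (illustrative names only).
PARENT = {
    "laptop computer": "computer",
    "desktop computer": "computer",
    "inkjet printer": "printer",
    "laser printer": "printer",
}

def generalize(transaction, parent=PARENT):
    """Replace each item by its parent concept when one exists.

    Using a set means two items with the same parent collapse into one.
    """
    return {parent.get(item, item) for item in transaction}

# Made-up transactions at the lowest level of abstraction.
transactions = [
    {"laptop computer", "inkjet printer"},
    {"desktop computer", "laser printer"},
    {"laptop computer"},
]
generalized = [generalize(t) for t in transactions]
```

After generalization the first two transactions become identical ({"computer", "printer"}), which is why associations that are too sparse at the item level can become visible one level up.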
  • 30. Mining Multilevel Association Rules  Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.  Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.  In general, a top-down strategy is employed. For each level, any algorithm for discovering frequent itemsets may be used, such as Apriori or its variations.
  • 31. Mining Multilevel Association Rules  Using uniform minimum support for all levels (referred to as uniform support): The same minimum support threshold is used when mining at each level of abstraction.  For example, in Figure 5.11, a minimum support threshold of 5% is used throughout (e.g., for mining from “computer” down to “laptop computer”). Both “computer” and “laptop computer” are found to be frequent, while “desktop computer” is not.
  • 32. Mining Multilevel Association Rules  When a uniform minimum support threshold is used, the search procedure is simplified. The method is also simple in that users are required to specify only one minimum support threshold.  An Apriori-like optimization technique can be adopted, based on the knowledge that an ancestor is a superset of its descendants: the search avoids examining itemsets containing any item whose ancestors do not have minimum support.
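A minimal sketch of the top-down search with uniform support and ancestor-based pruning, restricted to single items for brevity. The hierarchy, the level lists, and the transaction counts (chosen so that "computer" has 10% support, "laptop computer" 6%, and "desktop computer" 4%) are assumptions made for illustration.

```python
# Hypothetical two-level hierarchy (child -> parent); names are illustrative.
PARENT = {
    "laptop computer": "computer",
    "desktop computer": "computer",
    "laser printer": "printer",
}
LEVELS = [
    ["computer", "printer"],                                   # level 1
    ["laptop computer", "desktop computer", "laser printer"],  # level 2
]

def covers(concept, item):
    """True if item is the concept itself or one of its descendants."""
    while item is not None:
        if item == concept:
            return True
        item = PARENT.get(item)
    return False

def support(concept, transactions):
    """Fraction of transactions containing the concept or any descendant."""
    hits = sum(1 for t in transactions if any(covers(concept, i) for i in t))
    return hits / len(transactions)

def mine_uniform(transactions, min_sup):
    """Top-down pass using one threshold for every level."""
    frequent = []
    for depth, concepts in enumerate(LEVELS):
        for concept in concepts:
            # Pruning: a descendant cannot be frequent if its ancestor is not.
            if depth > 0 and PARENT[concept] not in frequent:
                continue
            if support(concept, transactions) >= min_sup:
                frequent.append(concept)
    return frequent

# 100 made-up transactions: 6 laptops, 4 desktops, 2 laser printers, 88 others.
transactions = ([{"laptop computer"}] * 6 + [{"desktop computer"}] * 4
                + [{"laser printer"}] * 2 + [{"flash drive"}] * 88)
result = mine_uniform(transactions, min_sup=0.05)
```

With these assumed counts, "printer" falls below 5%, so "laser printer" is never examined, while "desktop computer" is examined but rejected.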
  • 34. Using reduced minimum support at lower levels (referred to as reduced support):  Each level of abstraction has its own minimum support threshold. The deeper the level of abstraction, the smaller the corresponding threshold is.  For example, in Figure, the minimum support thresholds for levels 1 and 2 are 5% and 3%, respectively. In this way, “computer,” “laptop computer,” and “desktop computer” are all considered frequent.
  • 35. Using item or group-based minimum support (referred to as group-based support):  Because users or experts often have insight as to which groups are more important than others, it is sometimes more desirable to set up user-specific, item-based, or group-based minimum support thresholds when mining multilevel rules.  For example, a user could set up the minimum support thresholds based on product price, or on items of interest, such as by setting particularly low support thresholds for laptop computers and flash drives in order to pay particular attention to the association patterns containing items in these categories.
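The three threshold policies (uniform, reduced, and group-based support) differ only in how the minimum support for an item is looked up. The sketch below makes that explicit. The support values, the level assignments, and the 1% threshold for flash drives are assumed for illustration; only the 5% and 3% level thresholds come from the slides.

```python
# Assumed supports and levels (illustrative, not taken from the figure).
SUPPORT = {
    "computer": 0.10,
    "laptop computer": 0.06,
    "desktop computer": 0.04,
    "flash drive": 0.02,
}
LEVEL = {"computer": 1, "laptop computer": 2, "desktop computer": 2, "flash drive": 2}

def min_support(item, level_thresholds, item_thresholds=None):
    """An item-specific threshold wins; otherwise use the item's level threshold."""
    if item_thresholds and item in item_thresholds:
        return item_thresholds[item]
    return level_thresholds[LEVEL[item]]

def frequent_items(level_thresholds, item_thresholds=None):
    return [item for item, sup in SUPPORT.items()
            if sup >= min_support(item, level_thresholds, item_thresholds)]

uniform = frequent_items({1: 0.05, 2: 0.05})
reduced = frequent_items({1: 0.05, 2: 0.03})
group_based = frequent_items({1: 0.05, 2: 0.03}, item_thresholds={"flash drive": 0.01})
```

Under these assumptions, the uniform policy drops "desktop computer", the reduced policy keeps it, and the group-based policy additionally keeps "flash drive".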
  • 36. Mining Multidimensional Association Rules from Relational Databases and Data Warehouses  We have studied association rules that imply a single predicate, that is, the predicate buys. For instance, in mining our AllElectronics database, we may discover the Boolean association rule
  • 37. Mining Multidimensional Association Rules  Following the terminology used in multidimensional databases, we refer to each distinct predicate in a rule as a dimension.  Hence, we can refer to the rule above as a single-dimensional or intradimensional association rule because it contains a single distinct predicate (e.g., buys) with multiple occurrences (i.e., the predicate occurs more than once within the rule).
  • 38. Mining Multidimensional Association Rules  Considering each database attribute or warehouse dimension as a predicate, we can therefore mine association rules containing multiple predicates, such as
  • 39. Mining Multidimensional Association Rules  Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules.  The rule above contains three predicates (age, occupation, and buys), each of which occurs only once in the rule. Hence, we say that it has no repeated predicates.  Multidimensional association rules with no repeated predicates are called interdimensional association rules. We can also mine multidimensional association rules with repeated predicates, which contain multiple occurrences of some predicates.  These rules are called hybrid-dimensional association rules. An example of such a rule is the following, where the predicate buys is repeated:
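The distinction between single-dimensional, interdimensional, and hybrid-dimensional rules depends only on how often each predicate occurs in a rule. A minimal sketch follows, assuming a rule is represented as two lists of (predicate, value) pairs. The example rules and their values are hypothetical stand-ins, not the rules shown on the slides.

```python
from collections import Counter

def classify(antecedent, consequent):
    """Classify a rule by the predicates it uses.

    antecedent, consequent: lists of (predicate, value) pairs.
    """
    counts = Counter(pred for pred, _ in antecedent + consequent)
    if len(counts) == 1:
        return "single-dimensional"   # one distinct predicate, repeated
    if all(n == 1 for n in counts.values()):
        return "interdimensional"     # several predicates, none repeated
    return "hybrid-dimensional"       # several predicates, some repeated

# Hypothetical rules; the values are made up for illustration.
r1 = classify([("buys", "digital camera")], [("buys", "printer")])
r2 = classify([("age", "20-29"), ("occupation", "student")], [("buys", "laptop")])
r3 = classify([("age", "20-29"), ("buys", "laptop")], [("buys", "printer")])
```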
  • 40. Note that database attributes can be categorical or quantitative. Categorical attributes have a finite number of possible values, with no ordering among the values (e.g., occupation, brand, color).  Categorical attributes are also called nominal attributes, because their values are “names of things.” Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price). Techniques for mining multidimensional association rules can be categorized into two basic approaches regarding the treatment of quantitative attributes.
  • 41. Mining Quantitative Association Rules  Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria, such as maximizing the confidence or compactness of the rules mined.  In this section, we focus specifically on how to mine quantitative association rules having two quantitative attributes on the left-hand side of the rule and one categorical attribute on the right-hand side of the rule. That is, Aquan1 ∧ Aquan2 ⇒ Acat
  • 42. where Aquan1 and Aquan2 are tests on quantitative attribute intervals (where the intervals are dynamically determined), and Acat tests a categorical attribute from the task-relevant data. Such rules have been referred to as two-dimensional quantitative association rules, because they contain two quantitative dimensions.
  • 43. Mining Quantitative Association Rules  For instance, suppose you are curious about the association relationship between pairs of quantitative attributes, like customer age and income, and the type of television (such as high-definition TV, i.e., HDTV) that customers like to buy. An example of such a 2-D quantitative association rule is
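As a rough illustration of a 2-D quantitative rule, the sketch below bins age and income into fixed-width intervals and computes, for each (age interval, income interval) cell, the confidence of "customers in this cell buy the target item". Note the simplification: the slides describe intervals that are determined dynamically during mining, whereas this sketch uses fixed equal-width bins as a starting point. The customer records and bin widths are made up.

```python
from collections import defaultdict

def bin_of(value, width):
    """Return the half-open interval [lo, lo + width) that contains value."""
    lo = (value // width) * width
    return (lo, lo + width)

def cell_confidence(rows, target, age_width=2, income_width=10):
    """Confidence of (age bin AND income bin) => buys(target), per grid cell."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for age, income, bought in rows:
        cell = (bin_of(age, age_width), bin_of(income, income_width))
        totals[cell] += 1
        if bought == target:
            hits[cell] += 1
    return {cell: hits[cell] / totals[cell] for cell in totals}

# Made-up records: (age, income in thousands, item bought).
customers = [
    (34, 35, "HDTV"), (35, 38, "HDTV"), (34, 32, "HDTV"),
    (35, 45, "HDTV"), (52, 70, "DVD player"), (34, 36, "DVD player"),
]
conf = cell_confidence(customers, "HDTV")
```

Cells with high confidence and enough records are the candidates that a fuller method would then merge into larger age and income intervals.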
  • 44. Partition Algorithm  If we are given a database with a small number of potential large itemsets, say, a few thousand, then the support for all of them can be tested in one scan by using a partitioning technique.  Partitioning divides the database into nonoverlapping subsets; these are individually considered as separate databases and all large itemsets for that partition, called local frequent itemsets, are generated in one pass.
  • 45. Partition Algorithm  The Apriori algorithm can then be used efficiently on each partition if it fits entirely in main memory. Partitions are chosen in such a way that each partition can be accommodated in main memory.
  • 46. Partition Algorithm  As such, a partition is read only once in each pass. The only limitation with the partition method is that the minimum support used for each partition has a slightly different meaning from the original value.  The minimum support is based on the size of the partition rather than the size of the database for determining local frequent (large) itemsets.  The actual support threshold value is the same as given earlier, but the support is computed only for a partition.
  • 47. Partition Algorithm  At the end of pass one, we take the union of all frequent itemsets from each partition. This forms the global candidate frequent itemsets for the entire database. When these lists are merged, they may contain some false positives.  That is, some of the itemsets that are frequent (large) in one partition may not qualify in several other partitions and hence may not exceed the minimum support when the original database is considered. Note that there are no false negatives; no large itemsets will be missed.
  • 48. Partition Algorithm  The global candidate large itemsets identified in pass one are verified in pass two; that is, their actual support is measured for the entire database. At the end of pass two, all global large itemsets are identified. The Partition algorithm lends itself naturally to a parallel or distributed implementation for better efficiency.
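The two passes of the Partition algorithm can be sketched as follows. To keep the example short, local frequent itemsets are found by brute-force enumeration of itemsets up to a small size rather than by Apriori, and the transactions, the partition count, and the `max_size` cap are assumptions made for illustration.

```python
from itertools import combinations
from math import ceil

def local_frequent(partition, min_sup, max_size=2):
    """Itemsets (as sorted tuples) that meet min_sup within one partition.

    Brute force stands in for Apriori here to keep the sketch short.
    """
    counts = {}
    for t in partition:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] = counts.get(combo, 0) + 1
    # The threshold is relative to the partition size, not the database size.
    needed = min_sup * len(partition)
    return {c for c, n in counts.items() if n >= needed}

def partition_mine(db, n_parts, min_sup, max_size=2):
    size = ceil(len(db) / n_parts)
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    # Pass one: the union of local frequent itemsets forms the global candidates.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_sup, max_size)
    # Pass two: measure the actual support of each candidate on the whole database.
    counts = {c: sum(1 for t in db if set(c) <= t) for c in candidates}
    frequent = {c: n / len(db) for c, n in counts.items() if n >= min_sup * len(db)}
    return frequent, candidates

# Made-up database of six transactions, split into two partitions of three.
db = [
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "jam"},
    {"milk", "eggs"}, {"milk", "eggs"}, {"milk"},
]
frequent, candidates = partition_mine(db, n_parts=2, min_sup=0.5)
```

Here ("bread", "milk") is frequent in the first partition but not globally, so it appears among the candidates and is discarded in pass two: a false positive of exactly the kind described above.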
  • 57.  Algorithms can be classified along the following dimensions [DXGH00]:  Target: The algorithms we have examined generate all rules that satisfy a given support and confidence level. Alternatives to these types of algorithms are those that generate some subset of the rules based on the constraints given.
  • 58.  Type: Algorithms may generate regular association rules or more advanced association rules such as those introduced in Section 6.7 and Chapters 8 and 9.
  • 59.  Data type: We have examined rules generated for data in categorical databases. Rules may also be derived for other types of data such as plain text. This concept is further investigated in Section 6.7 and in Chapter 7 when we look at Web usage mining.
  • 60.  Data source: Our investigation has been limited to the use of association rules for market basket data. This assumes that data are present in a transaction. The absence of data may also be important.
  • 61.  Technique: The most common strategy to generate association rules is that of finding large itemsets. Other techniques may also be used.
  • 62.  Itemset strategy: Itemsets may be counted in different ways. The most naive approach is to generate all itemsets and count them. As this is usually too space intensive, the bottom-up approach used by Apriori, which takes advantage of the large itemset property, is the most common approach. A top-down technique could also be used.
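A compact sketch of the bottom-up strategy: candidates of size k are built only from large itemsets of size k - 1, and any candidate with a subset that is not large is pruned before counting (the large itemset property). The transactions and the `min_count` threshold are made up for illustration.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count each candidate with one scan over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: combine large (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(transactions, min_count=3)
```

In this example all single items and all pairs are large, but {"a", "b", "c"} occurs only twice and is rejected, so the search stops after three scans.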
  • 63.  Transaction strategy: To count the itemsets, the transactions in the database must be scanned. All transactions could be counted, only a sample may be counted, or the transactions could be divided into partitions.
  • 64.  Itemset data structure: The most common data structure used to store the candidate itemsets and their counts is a hash tree. Hash trees provide an effective technique to store, access, and count itemsets. They are efficient to search, insert, and delete itemsets. A hash tree is a multiway search tree where the branch to be taken at each level in the tree is determined by applying a hash function as opposed to comparing key values to branching points in the node.
  • 65.  Transaction data structure: Transactions may be viewed as in a flat file or as a TID list, which can be viewed as an inverted file. The items usually are encoded (as seen in the hash tree example), and the use of bit maps has also been proposed.
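A simplified hash tree sketch, assuming items are encoded as integers, all candidates have the same length, and the hash function is `item % n_buckets`. The class names, bucket count, and leaf capacity are hypothetical choices. For brevity an overfull leaf is split only once at insertion time, and counting looks up each subset of a transaction individually rather than walking the tree with the whole transaction.

```python
from itertools import combinations

class HashTreeNode:
    def __init__(self):
        self.is_leaf = True
        self.children = {}   # bucket number -> HashTreeNode (interior nodes)
        self.itemsets = {}   # itemset tuple -> count (leaf nodes)

class HashTree:
    def __init__(self, n_buckets=3, leaf_capacity=2):
        self.root = HashTreeNode()
        self.n_buckets = n_buckets
        self.leaf_capacity = leaf_capacity

    def _bucket(self, item):
        return item % self.n_buckets  # items are assumed to be integer codes

    def insert(self, itemset):
        node, depth = self.root, 0
        # At depth d, branch on the hash of the d-th item of the itemset.
        while not node.is_leaf:
            node = node.children.setdefault(self._bucket(itemset[depth]), HashTreeNode())
            depth += 1
        node.itemsets[itemset] = 0
        # Split an overfull leaf while there is still an item left to hash on.
        if len(node.itemsets) > self.leaf_capacity and depth < len(itemset):
            node.is_leaf = False
            old, node.itemsets = node.itemsets, {}
            for s, count in old.items():
                child = node.children.setdefault(self._bucket(s[depth]), HashTreeNode())
                child.itemsets[s] = count

    def _find_leaf(self, itemset):
        node, depth = self.root, 0
        while not node.is_leaf:
            node = node.children.get(self._bucket(itemset[depth]))
            if node is None:
                return None
            depth += 1
        return node if itemset in node.itemsets else None

    def increment(self, itemset):
        leaf = self._find_leaf(itemset)
        if leaf is not None:
            leaf.itemsets[itemset] += 1

    def count(self, itemset):
        leaf = self._find_leaf(itemset)
        return leaf.itemsets[itemset] if leaf is not None else 0

tree = HashTree()
for candidate in [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]:
    tree.insert(candidate)
for transaction in [(1, 2, 3), (2, 3, 4), (1, 3)]:
    for subset in combinations(transaction, 2):
        tree.increment(subset)
```

Because the branch is chosen by hashing rather than by key comparison, a lookup touches one node per level and then a single small leaf.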
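The two transaction layouts can be contrasted in a short sketch: a flat mapping from TID to items is inverted into per-item TID lists, and support counts then come from set intersection. A bit map variant is included as well. The transactions are made up for illustration.

```python
def to_tid_lists(transactions):
    """Invert {tid: items} into {item: set of tids}."""
    tid_lists = {}
    for tid, items in transactions.items():
        for item in items:
            tid_lists.setdefault(item, set()).add(tid)
    return tid_lists

def to_bitmap(tids):
    """Encode a TID set as an integer with bit t set for each TID t."""
    return sum(1 << t for t in tids)

# Flat layout: one record per transaction.
transactions = {
    1: {"bread", "milk"},
    2: {"bread"},
    3: {"milk", "jam"},
    4: {"bread", "milk", "jam"},
}
tid_lists = to_tid_lists(transactions)

# Support count of {bread, milk} by intersecting the two TID lists.
count_by_sets = len(tid_lists["bread"] & tid_lists["milk"])

# Same count using bit maps: AND the bit maps, then count the set bits.
both = to_bitmap(tid_lists["bread"]) & to_bitmap(tid_lists["milk"])
count_by_bits = bin(both).count("1")
```

With the inverted layout, counting an itemset no longer requires scanning every transaction, only combining the lists of its items.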
  • 66.  Optimization: These techniques look at how to improve on the performance of an algorithm given data distribution (skewness) or amount of main memory.
  • 67.  Architecture: Sequential, parallel, and distributed algorithms have been proposed.  Parallelism strategy: Both data parallelism and task parallelism have been used.
  • 68. Comparing algorithms
      Algorithm      Scans    Data Structure   Parallelism
      Apriori        m + 1    hash tree        none
      Sampling       2        not specified    none
      Partitioning   2        hash table       none
      CDA            m + 1    hash tree        data
      DDA            m + 1    tree             task
  • 69. Measuring Quality of Rules: Support
  • 71. Lift
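Assuming the standard definitions of these measures (support as the fraction of transactions containing an itemset, confidence as support(A ∪ C) / support(A), and lift as confidence divided by support(C)), they can be computed as in the minimal sketch below. The function names and the transactions are illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Made-up transactions.
transactions = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"bread"},
]
a, c = {"bread"}, {"butter"}
sup = support(a | c, transactions)
conf = confidence(a, c, transactions)
lft = lift(a, c, transactions)
```

A lift above 1 suggests the antecedent and consequent occur together more often than they would if independent; a lift near 1 suggests the rule adds little beyond the consequent's base rate.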