The document discusses association rule mining and market basket analysis. It begins by describing the problem: a retailer tracking which items customers purchase together. It then defines the key concepts of association rule mining: support, confidence, and lift. The goal is to discover all association rules that meet minimum support and confidence thresholds. Algorithms such as Apriori are described for efficiently generating frequent itemsets and association rules from transactional data.
2. Prithwis
Mukerjee 2
Let us describe the problem ...
A retailer sells the following items:
Bread, Cheese, Coffee, Juice, Milk, Tea, Biscuits, Sugar, Newspaper
We assume that the shopkeeper keeps track of what each customer purchases:
Trans ID   Items
10         Bread, Cheese, Newspaper
20         Bread, Cheese, Juice
30         Bread, Milk
40         Cheese, Juice, Milk, Coffee
50         Sugar, Tea, Coffee, Biscuits, Newspaper
60         Sugar, Tea, Coffee, Biscuits, Milk, Juice, Newspaper
70         Bread, Cheese
80         Bread, Cheese, Juice, Coffee
90         Bread, Milk
100        Sugar, Tea, Coffee, Bread, Milk, Juice, Newspaper
He needs to know which items are generally sold together
3. Prithwis
Mukerjee 3
Associations
Rules expressing relations between items in a “Market Basket”
{ Sugar and Tea } => {Biscuits}
Is it true that if a customer buys Sugar and Tea, she will also buy Biscuits?
If so, then
These items should be ordered together
But discounts should not be given on these items at the same time!
We can make a guess, but it would be better if we could structure this problem in terms of mathematics
4. Prithwis
Mukerjee 4
Basic Concepts
Set of n items on sale
$I = \{ i_1, i_2, i_3, \dots, i_n \}$
Transaction
A subset of I : T ⊆ I
A set of items purchased in an individual transaction
With each transaction having m items
$t_i = \{ i_1, i_2, i_3, \dots, i_m \}$ with $m \le n$
If we have N transactions, then $t_1, t_2, t_3, \dots, t_N$ serve as unique identifiers for the transactions
D is our total data about all N transactions
$D = \{ t_1, t_2, t_3, \dots, t_N \}$
5. Prithwis
Mukerjee 5
An Association Rule
Whenever X appears, Y also appears
X ⇒ Y
X ⊆ I, Y ⊆ I, X ∩ Y = ∅
X and Y may be
Single items or
Sets of items, with no item appearing in both X and Y
X is referred to as the antecedent
Y is referred to as the consequent
Whether a rule like this exists is the focus of
our analysis
6. Prithwis
Mukerjee 6
Two key concepts
Support (or prevalence)
How often do X and Y appear together in the basket?
If this number is very low, the rule is not worth examining
Expressed as a fraction of the total number of transactions
Say 10% or 0.1
Confidence (or predictability)
Of all the occurrences of X, in what fraction does Y also appear?
Expressed as a fraction of all transactions containing X
Say 80% or 0.8
We are interested in rules that have a
Minimum value of support : say 25%
Minimum value of confidence : say 75%
7. Prithwis
Mukerjee 7
Mathematically speaking ...
Support (X)
= (Number of transactions containing X) / N
= P(X)
Support (XY)
= (Number of transactions containing both X and Y) / N
= P(X ∩ Y)
Confidence (X ⇒ Y)
= Support (XY) / Support (X)
= P(X ∩ Y) / P(X)
= Conditional probability P(Y | X)
Lift : an optional term
Measures the strength of the association
Lift (X ⇒ Y) = P(Y | X) / P(Y)
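The three measures can be checked against the ten transactions from the retailer example. The following is a minimal sketch, not code from the deck: the function names `support`, `confidence` and `lift` and the `TRANSACTIONS` list are illustrative assumptions, and transactions are modelled as Python sets.

```python
# Transactions 10..100 from the retailer example, one set per basket.
TRANSACTIONS = [
    {"Bread", "Cheese", "Newspaper"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk", "Coffee"},
    {"Sugar", "Tea", "Coffee", "Biscuits", "Newspaper"},
    {"Sugar", "Tea", "Coffee", "Biscuits", "Milk", "Juice", "Newspaper"},
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice", "Coffee"},
    {"Bread", "Milk"},
    {"Sugar", "Tea", "Coffee", "Bread", "Milk", "Juice", "Newspaper"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(x, y, transactions=TRANSACTIONS):
    """Support(XY) / Support(X), i.e. the conditional probability P(Y | X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions=TRANSACTIONS):
    """P(Y | X) / P(Y); values above 1 suggest a positive association."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"Sugar", "Tea"}, {"Biscuits"}
print(support(x | y), confidence(x, y), lift(x, y))
```

For the rule { Sugar and Tea } => {Biscuits}, Sugar and Tea appear together in three baskets and Biscuits joins them in two, so the confidence is 2/3; Biscuits alone appears in 2 of 10 baskets, which gives a lift of about 3.3.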
8. Prithwis
Mukerjee 8
The task at hand ...
Given a large set of transactions, we seek a
procedure ( or algorithm )
That will discover all association rules
That have a minimum support of p%
And a minimum confidence level of q%
And to do so in an efficient manner
Algorithms
The Naive or Brute Force Method
The Improved Naive algorithm
The Apriori Algorithm
Improvements to the Apriori algorithm
FP ( Frequent Pattern ) Algorithm
9. Prithwis
Mukerjee 9
Let us try the Naive Algorithm manually !
This is the set of transactions that we have ...
Trans ID   Items
100        Bread, Cheese
200        Bread, Cheese, Juice
300        Bread, Milk
400        Cheese, Juice, Milk
We want to find Association Rules with
Minimum 50% support and
Minimum 75% confidence
10. Prithwis
Mukerjee 10
Itemsets & Frequencies
Which sets are frequent?
Since we are looking for a support of 50%, a set needs to appear in at least 2 out of 4 transactions
Support (X) = (# of times X appears) / N = P(X)
6 sets meet this criterion
Item Sets Frequency
{Bread} 3
{Cheese } 3
{Juice} 2
{Milk} 2
{Bread, Cheese} 2
{Bread, Juice } 1
{Bread, Milk} 1
{Cheese, Juice} 2
{Cheese, Milk} 1
{Juice, Milk} 1
{Bread, Cheese, Juice} 1
{Bread, Cheese, Milk} 0
{Bread, Juice, Milk} 0
{Cheese, Juice, Milk} 1
{Bread, Cheese, Juice, Milk} 0
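The frequency table above can be reproduced with a brute-force enumeration. This is a minimal sketch under the assumption that each transaction is held as a Python set; the variable names are illustrative and not taken from the deck.

```python
from itertools import combinations

# The four transactions (IDs 100..400) of the manual example.
transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
items = sorted(set().union(*transactions))
min_count = 2  # 50% support over 4 transactions

# Naive method: count every non-empty itemset, whether or not it ever occurs.
frequency = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        frequency[candidate] = sum(1 for t in transactions if set(candidate) <= t)

frequent = {s: c for s, c in frequency.items() if c >= min_count}
for itemset, count in frequent.items():
    print(itemset, count)
```

All 15 non-empty itemsets are counted, and the six that reach a count of 2 are kept.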
11. Prithwis
Mukerjee 11
A closer look at the “Frequent Set”
Look at frequent itemsets with more than 1 item
{Bread, Cheese}, {Cheese, Juice}
4 rules are possible
Look for confidence levels
Confidence (X ⇒ Y)
= Support (XY) / Support(X)
Item Sets          Frequency
{Bread}            3
{Cheese}           3
{Juice}            2
{Milk}             2
{Bread, Cheese}    2
{Cheese, Juice}    2
Rule               Confidence
Bread => Cheese    2 / 3    66.67%
Cheese => Bread    2 / 3    66.67%
Cheese => Juice    2 / 3    66.67%
Juice => Cheese    2 / 2    100.00%
Only Juice => Cheese meets the minimum confidence of 75%
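The four confidence values can be recomputed directly from the frequencies of the frequent itemsets. The sketch below assumes the frequencies are kept in a dictionary keyed by `frozenset`; that layout and the variable names are illustrative choices, not something the deck prescribes.

```python
from itertools import combinations

# Frequencies of the six frequent itemsets in the four-transaction example.
frequent = {
    frozenset({"Bread"}): 3,
    frozenset({"Cheese"}): 3,
    frozenset({"Juice"}): 2,
    frozenset({"Milk"}): 2,
    frozenset({"Bread", "Cheese"}): 2,
    frozenset({"Cheese", "Juice"}): 2,
}
min_confidence = 0.75

rules = []
for itemset, count in frequent.items():
    if len(itemset) < 2:
        continue  # a rule needs a non-empty antecedent and consequent
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            # Confidence = frequency of the whole set / frequency of the antecedent.
            conf = count / frequent[lhs]
            rules.append((sorted(lhs), sorted(itemset - lhs), conf))

accepted = [rule for rule in rules if rule[2] >= min_confidence]
print(accepted)
```

Every antecedent is itself a frequent itemset, so its frequency is always available in the dictionary.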
13. Prithwis
Mukerjee 13
The Big Picture
List all itemsets
Find frequency of each
Identify “frequent sets”
Based on support
Search for Rules within “frequent sets”
Based on confidence
14. Prithwis
Mukerjee 14
Looking Beyond the Retail Store
Counter Terrorism
Track phone calls made or received from a particular number every day
Is an incoming call from a particular number followed by a call to another number?
Are there any sets of numbers that are always called together?
Expand the item sets to include
Electronic fund transfers
Travel between two locations
Boarding cards
Railway reservations
All data is available in electronic format
15. Prithwis
Mukerjee 15
Major Problem
Exponential growth of the number of itemsets
4 items : 16 = $2^4$ members
n items : $2^n$ members
As n becomes larger, the problem can no longer be solved in a reasonable amount of time
Every attempt is made to reduce the number of itemsets to be processed
“Improved” Naive algorithm
Ignore sets with zero frequency
Item Sets Frequency
{Bread} 3
{Cheese } 3
{Juice} 2
{Milk} 2
{Bread, Cheese} 2
{Bread, Juice } 1
{Bread, Milk} 1
{Cheese, Juice} 2
{Cheese, Milk} 1
{Juice, Milk} 1
{Bread, Cheese, Juice} 1
{Bread, Cheese, Milk} 0
{Bread, Juice, Milk} 0
{Cheese, Juice, Milk} 1
{Bread, Cheese, Juice, Milk} 0
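One way to ignore zero-frequency sets is to enumerate only the subsets of transactions that actually occurred. The deck does not spell out how the "improved" naive algorithm is implemented, so the following is a sketch under that assumption, with illustrative names.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]

# Only subsets of real transactions are generated, so an itemset that never
# occurs (frequency 0) is never created or counted.
counts = Counter()
for t in transactions:
    ordered = sorted(t)
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            counts[subset] += 1

n_items = len(set().union(*transactions))
total_subsets = 2 ** n_items  # 16 for 4 items, counting the empty set
print(total_subsets, len(counts))
```

Here 12 itemsets are counted instead of all 15 non-empty ones. The saving is small on this example, and the work per transaction still grows exponentially with the size of the basket.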
16. Prithwis
Mukerjee 16
The APriori Algorithm
Consists of two PARTS
First find the frequent itemsets
Most of the cleverness happens here
We will do better than the naive algorithm
Find the rules
This is relatively simpler
17. Prithwis
Mukerjee 17
APriori : Part 1 - Frequent Sets
Step 1
Scan all transactions and find all frequent items that have support above p%. This is set $L_1$
Step 2 : Apriori-Gen
Build potential sets of k items from $L_{k-1}$ by using pairs of itemsets in $L_{k-1}$ that have the first k-2 items in common, plus the one remaining item from each member of the pair.
This is the candidate set $C_k$
Step 3 : Find Frequent Item Sets again
Scan all transactions and find the frequency of each set in $C_k$; the sets that are frequent give $L_k$
If $L_k$ is empty, stop; else go back to Step 2
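The three steps can be sketched as a short loop. This is an illustrative implementation, not the author's code: itemsets are assumed to be sorted tuples, the threshold is expressed as a minimum count rather than a percentage, and the names `apriori_gen` and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Step 2: join sorted (k-1)-itemsets that share their first k-2 items."""
    candidates = set()
    for a, b in combinations(sorted(prev_frequent), 2):
        if a[:k - 2] == b[:k - 2]:
            candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates


def frequent_itemsets(transactions, min_count):
    """Return {itemset tuple: frequency} for all itemsets meeting min_count."""
    transactions = [frozenset(t) for t in transactions]
    # Step 1: frequent single items (L1).
    level = {}
    for item in sorted(set().union(*transactions)):
        count = sum(1 for t in transactions if item in t)
        if count >= min_count:
            level[(item,)] = count
    result = {}
    k = 2
    while level:  # stop as soon as L(k-1) is empty
        result.update(level)
        candidates = apriori_gen(level.keys(), k)  # Step 2: C_k
        level = {}
        for cand in candidates:  # Step 3: scan and keep frequent sets (L_k)
            count = sum(1 for t in transactions if set(cand) <= t)
            if count >= min_count:
                level[cand] = count
        k += 1
    return result


example = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
found = frequent_itemsets(example, min_count=2)
print(found)
```

On the four-transaction example with a minimum count of 2, the loop stops after the 2-item level because {Bread, Cheese} and {Cheese, Juice} do not share a first item, so no 3-item candidate is formed.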
20. Prithwis
Mukerjee 20
Apriori : Step 1 – Computing L1
Count frequency for each item and exclude those that are below minimum support (25%)
Item No   Item Name   Frequency
1         Biscuits    4
2         Bread       13
3         Cereal      10
4         Cheese      11
5         Chocolate   9
6         Coffee      9
7         Donuts      10
8         Eggs        2
9         Juice       11
10        Milk        6
11        Newspaper   2
12        Pastry      1
13        Rolls       2
14        Sugar       1
15        Tea         4
16        Yogurt      2
Items that meet 25% support. This is set $L_1$
Item No   Item Name   Frequency
2         Bread       13
3         Cereal      10
4         Cheese      11
5         Chocolate   9
6         Coffee      9
7         Donuts      10
9         Juice       11
22. Prithwis
Mukerjee 22
Step 2 : Computing C2
Given $L_1$, we now form the candidate pairs $C_2$. The 7 items in $L_1$ form 21 pairs : d*(d-1)/2. This is a quadratic function and not an exponential function.
1    {Bread, Cereal}
2    {Bread, Cheese}
3    {Bread, Chocolate}
4    {Bread, Coffee}
5    {Bread, Donuts}
6    {Bread, Juice}
7    {Cereal, Cheese}
8    {Cereal, Coffee}
9    {Cereal, Chocolate}
10   {Cereal, Donuts}
11   {Cereal, Juice}
12   {Cheese, Chocolate}
13   {Cheese, Coffee}
14   {Cheese, Donuts}
15   {Cheese, Juice}
16   {Chocolate, Coffee}
17   {Chocolate, Donuts}
18   {Chocolate, Juice}
19   {Coffee, Donuts}
20   {Coffee, Juice}
21   {Donuts, Juice}
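The pair count can be verified with a few lines. This is a small sketch using the seven frequent items; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# The seven frequent single items that make up L1.
L1 = ["Bread", "Cereal", "Cheese", "Chocolate", "Coffee", "Donuts", "Juice"]

C2 = list(combinations(L1, 2))  # every unordered pair of frequent items
d = len(L1)
print(len(C2), d * (d - 1) // 2, comb(d, 2), 2 ** d)
```

The first three numbers agree at 21, against 128 subsets of the same seven items if nothing were pruned.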
24. Prithwis
Mukerjee 24
From $C_2$ to $L_2$ based on minimum support (25%)
This is a computationally intensive step
Candidate 2-Item Set     Freq
{Bread, Cereal}          9
{Bread, Cheese}          8
{Bread, Chocolate}       4
{Bread, Coffee}          8
{Bread, Donuts}          4
{Bread, Juice}           6
{Cereal, Cheese}         5
{Cereal, Coffee}         4
{Cereal, Chocolate}      5
{Cereal, Donuts}         4
{Cereal, Juice}          6
{Cheese, Chocolate}      4
{Cheese, Coffee}         9
{Cheese, Donuts}         3
{Cheese, Juice}          4
{Chocolate, Coffee}      1
{Chocolate, Donuts}      7
{Chocolate, Juice}       7
{Coffee, Donuts}         1
{Coffee, Juice}          2
{Donuts, Juice}          9
Frequent 2-Item Set      Freq
{Bread, Cereal}          9
{Bread, Cheese}          8
{Bread, Coffee}          8
{Cheese, Coffee}         9
{Chocolate, Donuts}      7
{Chocolate, Juice}       7
{Donuts, Juice}          9
This is set $L_2$
$L_2$ is not empty
26. Prithwis
Mukerjee 26
Step 2 Again : Get C3
We combine the appropriate frequent 2-item sets from $L_2$ (which must have the same first item) and obtain four such itemsets, each containing three items
Frequent 2-Item Set      Freq
{Bread, Cereal}          9
{Bread, Cheese}          8
{Bread, Coffee}          8
{Cheese, Coffee}         9
{Chocolate, Donuts}      7
{Chocolate, Juice}       7
{Donuts, Juice}          9
This is set $L_2$
Candidate 3-item set
{Bread, Cheese, Cereal}
{Bread, Cereal, Coffee}
{Bread, Cheese, Coffee}
{Chocolate, Donuts, Juice}
$L_2$ to $C_3$
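The join on a shared first item can be replayed on the seven frequent pairs. The sketch below assumes pairs are stored as alphabetically sorted tuples. The `prune` function is an addition that the slides do not use: it is the standard Apriori check that every (k-1)-subset of a candidate is itself frequent, shown here only to illustrate how two of the four candidates could be discarded before any counting.

```python
from itertools import combinations

# The seven frequent 2-item sets, each as an alphabetically sorted tuple.
L2 = {
    ("Bread", "Cereal"), ("Bread", "Cheese"), ("Bread", "Coffee"),
    ("Cheese", "Coffee"), ("Chocolate", "Donuts"), ("Chocolate", "Juice"),
    ("Donuts", "Juice"),
}


def join(prev, k):
    """Merge pairs of (k-1)-itemsets that agree on their first k-2 items."""
    out = set()
    for a, b in combinations(sorted(prev), 2):
        if a[:k - 2] == b[:k - 2]:
            out.add(tuple(sorted(set(a) | set(b))))
    return out


def prune(candidates, prev):
    """Optional step (not in the slides): drop candidates with an infrequent subset."""
    return {
        c for c in candidates
        if all(sub in prev for sub in combinations(c, len(c) - 1))
    }


C3 = join(L2, 3)
C3_pruned = prune(C3, L2)
print(sorted(C3))
print(sorted(C3_pruned))
```

The join yields the four candidates listed above. Pruning would leave only {Bread, Cheese, Coffee} and {Chocolate, Donuts, Juice}, because {Cereal, Cheese} and {Cereal, Coffee} are not in the set of frequent pairs.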
27. Prithwis
Mukerjee 27
Step 3 Again : $C_3$ to $L_3$
Again based on minimum support (25%)
Candidate 3-item set          Frequency
{Bread, Cheese, Cereal}       4
{Bread, Cereal, Coffee}       4
{Bread, Cheese, Coffee}       8
{Chocolate, Donuts, Juice}    7
Frequent 3-item set           Frequency
{Bread, Cheese, Coffee}       8
{Chocolate, Donuts, Juice}    7
The two sets in $L_3$ do not share their first two items, so $C_4$ cannot be formed. Hence $L_4$ cannot be formed, and we stop here
30. Prithwis
Mukerjee 30
APriori : Part 2 – Find Rules
Rules will be found by looking at
3-item sets found in $L_3$
2-item sets in $L_2$ that are not subsets of the itemsets in $L_3$
In each case we
Calculate confidence (A ⇒ B)
= P(B | A) = P(A ∩ B) / P(A)
Some shorthand
{Bread, Cheese, Coffee} is written as {B, C, D}
31. Prithwis
Mukerjee 31
Rules for Finding Rules!
A 3-item frequent set {BCD} results in 6 rules
B ⇒ CD, C ⇒ BD, D ⇒ BC
CD ⇒ B, BD ⇒ C, BC ⇒ D
Also note that
B ⇒ CD implies B ⇒ D and B ⇒ C, each with confidence at least as high
We now look at the two 3-item sets from L3 (the highest L set) and find the confidence levels of their rules
{Bread, Cheese, Coffee}
{Chocolate, Donuts, Juice}
Note that the support counts for these rules are 8 and 7 respectively
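The six rules from a 3-item set are simply every way of splitting the set into a non-empty left-hand side and a non-empty right-hand side. A minimal sketch, with the hypothetical helper name `candidate_rules`:

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield (LHS, RHS) for every non-empty proper subset used as the left-hand side."""
    items = tuple(sorted(itemset))
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            yield lhs, rhs

rules = list(candidate_rules({"B", "C", "D"}))
for lhs, rhs in rules:
    print("".join(lhs), "=>", "".join(rhs))
```

In general a frequent set of k items gives 2^k - 2 candidate rules, which is 6 for k = 3.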
Rules from First of 2 Itemsets in L3
Calculate confidence (X ⇒ Y) = P(Y | X) = P(X ∩ Y) / P(X)
Confidence of association rules from {Bread (B), Cheese (C), Coffee (D)}
Rule      Support of BCD   Frequency of LHS   Confidence
B => CD   8                13                 0.615
C => BD   8                11                 0.727
D => BC   8                9                  0.889
CD => B   8                9                  0.889
BD => C   8                8                  1.000
BC => D   8                8                  1.000
One rule (B => CD) drops out because confidence < 70%
Item No   Item Name   Frequency
1         Biscuits    4
2         Bread       13
3         Cereal      10
4         Cheese      11
5         Chocolate   9
6         Coffee      9
7         Donuts      10
8         Eggs        2
9         Juice       11
10        Milk        6
11        Newspaper   2
12        Pastry      1
13        Rolls       2
14        Sugar       1
15        Tea         4
16        Yogurt      2
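The confidence column can be reproduced directly from the frequencies, since the total number of transactions cancels out of P(X ∩ Y) / P(X) and only the two counts are needed. A small sketch using the counts from this example (the variable names are chosen for illustration only):

```python
from fractions import Fraction

# Counts taken from the tables in this example.
item_freq = {"B": 13, "C": 11, "D": 9}     # Bread, Cheese, Coffee
pair_freq = {"CD": 9, "BD": 8, "BC": 8}    # frequent 2-item sets
support_bcd = 8                            # frequency of {Bread, Cheese, Coffee}
MIN_CONFIDENCE = Fraction(70, 100)

lhs_freq = {**item_freq, **pair_freq}

def rhs_of(lhs):
    """Items of BCD that are not on the left-hand side."""
    return "".join(x for x in "BCD" if x not in lhs)

kept, dropped = [], []
for lhs, freq in lhs_freq.items():
    confidence = Fraction(support_bcd, freq)   # P(X ∩ Y) / P(X) as a ratio of counts
    rule = f"{lhs} => {rhs_of(lhs)}"
    (kept if confidence >= MIN_CONFIDENCE else dropped).append(rule)
    print(f"{rule:8} {float(confidence):.3f}")
```

Exact fractions are used for the threshold comparison so that a value sitting exactly on 70% is not lost to floating-point rounding.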
Rules from Second of 2 Itemsets in L3
Confidence of association rules from {Chocolate (N), Donuts (M), Juice (P)}
Rule      Support of NMP   Frequency of LHS   Confidence
N => MP   7                9                  0.778
M => NP   7                10                 0.700
P => NM   7                11                 0.636
MP => N   7                9                  0.778
NP => M   7                7                  1.000
NM => P   7                7                  1.000
One rule (P => NM) drops out because confidence < 70%
Set of 14 Rules obtained from L3
C => BD
C => B 1 Cheese => Bread
C => D 2 Cheese => Coffee
D => BC
D => B 3 Coffee => Bread
D => C 4 Coffee => Cheese
CD => B 5 Cheese, Coffee => Bread
BD => C 6 Bread, Coffee => Cheese
BC => D 7 Bread, Cheese => Coffee
N => MP
N => M 8 Chocolate => Donuts
N => P 9 Chocolate => Juice
M => NP
M => P 10 Donuts => Juice
M => N 11 Donuts => Chocolate
MP => N 12 Donuts, Juice => Chocolate
NP => M 13 Chocolate, Juice => Donuts
NM => P 14 Chocolate, Donuts => Juice
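The count of 14 comes from taking the ten rules that passed the confidence test and splitting every two-item right-hand side into single-item rules. The split is safe because a rule with a smaller right-hand side can only have equal or higher confidence. A sketch of that expansion follows; the listing order may differ slightly from the numbering on the slide.

```python
# Rules from L3 that met the 70% confidence threshold, written as (LHS, RHS) symbol strings.
kept_rules = [
    ("C", "BD"), ("D", "BC"), ("CD", "B"), ("BD", "C"), ("BC", "D"),
    ("N", "MP"), ("M", "NP"), ("MP", "N"), ("NP", "M"), ("NM", "P"),
]
names = {"B": "Bread", "C": "Cheese", "D": "Coffee",
         "N": "Chocolate", "M": "Donuts", "P": "Juice"}

def spell(symbols):
    """Turn a symbol string such as 'CD' into 'Cheese, Coffee'."""
    return ", ".join(names[s] for s in symbols)

expanded = []
for lhs, rhs in kept_rules:
    for item in rhs:                 # one rule per right-hand item
        expanded.append((lhs, item))

for lhs, rhs in expanded:
    print(spell(lhs), "=>", spell(rhs))
```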
What about L2?
Look for sets in L2 that are not subsets of any set in L3
{Bread, Cereal} is the only candidate
Which gives us two more rules
Bread ⇒ Cereal
Cereal ⇒ Bread
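The search for leftover 2-item sets, and the confidence of the two rules they give, can be sketched with the frequencies from this example. By these counts Cereal ⇒ Bread has confidence 9/10, while Bread ⇒ Cereal comes out at 9/13, about 0.692, which is marginally below 70%, so its inclusion depends on how strictly the threshold is applied.

```python
from fractions import Fraction

# Frequent 2-item sets with their frequencies, and the two frequent 3-item sets.
L2 = {
    frozenset({"Bread", "Cereal"}): 9,
    frozenset({"Bread", "Cheese"}): 8,
    frozenset({"Bread", "Coffee"}): 8,
    frozenset({"Cheese", "Coffee"}): 9,
    frozenset({"Chocolate", "Donuts"}): 7,
    frozenset({"Chocolate", "Juice"}): 7,
    frozenset({"Donuts", "Juice"}): 9,
}
L3 = [frozenset({"Bread", "Cheese", "Coffee"}), frozenset({"Chocolate", "Donuts", "Juice"})]
item_freq = {"Bread": 13, "Cereal": 10}

# 2-item sets that are not contained in any frequent 3-item set.
leftover = [pair for pair in L2 if not any(pair <= triple for triple in L3)]

for pair in leftover:
    for lhs in sorted(pair):
        (rhs,) = pair - {lhs}
        confidence = Fraction(L2[pair], item_freq[lhs])
        print(f"{lhs} => {rhs}: {float(confidence):.3f}")
```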
Which are now added to get 16 rules
Rules 1 to 14 obtained from L3
15 Bread => Cereal
16 Cereal => Bread
So where are we?
The Apriori Algorithm consists of two PARTS
First find the frequent itemsets
Most of the cleverness happens here
We will do better than the naive algorithm
Then find the rules
This is relatively simple
We have just completed the two PARTS
Overall approach to ARM is as follows
List all itemsets
Find frequency of each
Identify "frequent sets" based on support
Search for rules within "frequent sets" based on confidence
Naive Algorithm: exponential time, because every possible itemset is listed and counted
Apriori Algorithm: far fewer candidates are counted in practice, although the worst case is still exponential
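To give a feel for the gap between listing every itemset and building candidates level by level, the counts for this example can be tallied as below. The figure of 7 frequent single items is an assumption read off the items that appear in the frequent 2-item sets.

```python
from math import comb

n_items = 16                        # items in the item-frequency table
naive_itemsets = 2 ** n_items - 1   # every non-empty subset of the items

# Assumption: L1 holds the 7 items that appear in the frequent 2-item sets.
n_frequent_items = 7
apriori_candidates = (
    n_items                         # C1: every single item is counted once
    + comb(n_frequent_items, 2)     # C2: all pairs of frequent items
    + 4                             # C3: the four candidates built from L2
)

print(naive_itemsets)      # 65535
print(apriori_candidates)  # 41
```

The saving comes entirely from the support threshold: with a very low threshold, most items stay frequent and the candidate sets grow towards the naive count.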
Observations
Actual values of support and confidence
The 25% support and 70% confidence thresholds used here are very high values
In reality one works with far smaller values
“Interestingness” of a rule
If X and Y are related events (not independent), then P(X ∩ Y) ≠ P(X)P(Y)
Interestingness ≈ P(X ∩ Y) - P(X)P(Y)
Triviality of rules
Rules involving very frequent items can be trivial
You always buy potatoes when you go to the market and
so you can get rules that connect potatoes to many things
Inexplicable rules
Toothbrush was the most frequent item on Tuesday ??
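The interestingness measure above is easy to compute from counts. In this sketch the function name `interestingness` and all of the counts are hypothetical, chosen only to show a positive, a zero and a negative value.

```python
from fractions import Fraction

def interestingness(count_xy, count_x, count_y, n_transactions):
    """P(X ∩ Y) - P(X)P(Y), estimated from transaction counts."""
    p_xy = Fraction(count_xy, n_transactions)
    p_x = Fraction(count_x, n_transactions)
    p_y = Fraction(count_y, n_transactions)
    return p_xy - p_x * p_y

# Hypothetical counts, chosen only to illustrate the three cases.
print(interestingness(30, 40, 50, 100))   # 1/10: bought together more often than chance
print(interestingness(20, 40, 50, 100))   # 0: consistent with independence
print(interestingness(10, 40, 50, 100))   # -1/10: bought together less often than chance
```

A value near zero suggests the rule says little beyond what the individual item frequencies already imply, which is one way to screen out the trivial rules mentioned on this slide.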
Better Algorithms
Enhancements to the Apriori Algorithm
Apriori-TID (AP-TID)
Direct Hashing and Pruning (DHP)
Dynamic Itemset Counting (DIC)
Frequent Pattern (FP) Tree
Only frequent items are needed to find association rules, so ignore the others!
Move the data of only the frequent items to a more compact and efficient structure
A tree structure or a directed graph is used
Multiple transactions with the same (frequent) items are stored once, with count information
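The FP-Tree idea described above can be sketched as a prefix tree with counts. This is a simplified illustration, not a full FP-growth implementation: it omits the header table and node links used for mining, and the names `Node`, `build_fp_tree` and `show`, as well as the toy baskets, are assumptions.

```python
from collections import Counter

class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: count items and keep only the frequent ones.
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i: n for i, n in freq.items() if n >= min_count}

    # Pass 2: insert each transaction, most frequent item first, sharing common prefixes.
    root = Node(None)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root

def show(node, depth=0):
    for child in sorted(node.children.values(), key=lambda c: c.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

# Hypothetical toy data, not the transactions used in the slides.
baskets = [
    ["Bread", "Cheese", "Coffee", "Eggs"],
    ["Bread", "Cheese", "Coffee"],
    ["Bread", "Cheese"],
    ["Bread", "Juice"],
    ["Cheese", "Coffee"],
]
tree = build_fp_tree(baskets, min_count=2)
show(tree)
```

In this toy run the infrequent items (Eggs, Juice) never enter the tree, and the four baskets containing Bread share a single Bread node whose count is 4.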
Software Support
KDNuggets.com
Excellent collections of software available
Bart Goethals
Free software for Apriori, FP-Tree
ARMiner
GNU Open Source software from UMass/Boston
DMII
National University of Singapore
DB2 Intelligent Data Miner
IBM Corporation
Equivalent software available from other vendors as well