1. A Method of Mining Association Rules for
Geographical Points of Interest
Shiwei Lian, Jinning Gao, and Hongwei Li
International Journal of Geo-Information
Published: 10 April 2018
Presenter:
Iva Nurwauziyah | P66067021
Muhammad Irsyadi Firdaus | P66067055
Data Mining
Final Paper Presentation
6 June 2018
NATIONAL CHENG KUNG UNIVERSITY
3. Introduction
Association rule (AR) mining has become one of the most important tasks in the
field of knowledge discovery.
Example application fields include:
Urban bus networks
Recommendation systems
Product-service systems
Etc.
This paper proposes a method of mining ARs for geographical points of interest,
in order to find the relationships between points of interest, including
quantitative relationships.
4. Experiment
Step 1: Generating the Transactional Dataset
Data Preparation → Spatial Clustering (DBSCAN) → Generating Transaction Data
5. Data Preparation
The experimental objects are urban shops in Luoyang, China.
All data were provided by the Urban Planning Bureau of Luoyang.
Item ID  Item Name                Quantity
1        Catering                 8094
2        Bank                     1014
3        Entertainment            4299
4        Store                    17,888
5        Educational Institution  784
6        Hotel                    786
7        Government               469
8        Medical Institution      1332
Sum      All                      34,666
6. Spatial Clustering with DBSCAN
The target data contain coordinate information.
The spatial similarity of data points is used to generate clusters.
The algorithm is DBSCAN, which takes a radius and a density as its parameters.
Figure 1. The data objects of Luoyang
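The clustering step can be sketched with a minimal from-scratch DBSCAN in Python. This is a simplified illustration on hypothetical coordinates, not the authors' implementation; the paper's ε-radius and μ-density correspond to the `eps` and `min_pts` parameters below.

```python
from math import hypot

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster ID, or -1 for noise."""
    labels = [None] * len(points)  # None = unvisited

    def neighbors(i):
        # Points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if hypot(points[i][0] - q[0], points[i][1] - q[1]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise (may later be re-labelled as a border point)
            continue
        cluster += 1                 # start a new cluster from this core point
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts: # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical coordinates: two dense blobs and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=2.0, min_pts=3)
```

Each dense blob becomes one cluster, while the isolated point is labelled noise (-1), mirroring how the paper's clusters group nearby shops.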
7. The Radius and Density
For a fixed ε-radius, the quantity of clusters decreases with increasing density.
For a fixed μ-density, the quantity of clusters forms a single wave shape with increasing radius.
Table 1. DBSCAN Clustering
8. Spatial Data Clustering
Figure 2. Spatial Data Clustering. (a) Clusters (radius = 110, density = 80); (b) Detailed view of area in red in the left panel.
9. Converting the Clusters to Transactional Data
Each cluster contains many data points belonging to different items.
The items in one cluster can be regarded as the items of one transaction.
Trans. ID List of Item IDs Trans. ID List of Item IDs
1 1,3,4,5,7,8 17 1,2,3,4,5,6,7,8
2 1,2,3,4,5,6,7,8 18 1,3,4,5,6,8
3 1,3,4,5,6,8 19 1,2,3,4,5,6,7,8
4 1,3,4,5,6 20 1,3,4,6,7,8
5 1,2,3,4,6 21 1,2,3,4,5,7,8
6 1,2,3,4,5,6,8 22 1,2,3,4,5,6,8
7 1,3,4,7 23 1,2,3,4,5,6,7,8
8 1,2,3,4,8 24 1,2,3,4,5,6,7,8
9 1,2,3,4,5,6,8 25 1,2,3,4,6,7
10 1,3,4,5,6,8 26 1,2,3,4,5,6,7,8
11 1,2,3,4,6,8 27 1,2,3,4,5,6,7,8
12 1,2,3,4,7 28 1,2,3,4,5,6,7,8
13 1,3,4,6,8 29 1,2,3,4,5,6,8
14 1,2,3,4,5,6,8 30 1,2,3,4,6,8
15 1,2,3,4,6,8 31 1,2,3,4,5,6,7,8
16 1,2,3,4,5,6,8 32 1,2,3,4,6,8
Table 2. Transactional Data
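The cluster-to-transaction conversion above can be sketched as follows. The labelled points are hypothetical (item IDs follow the coding of the Data Preparation table: 1 = Catering, 2 = Bank, …, 8 = Medical Institution); each cluster's distinct item IDs become one transaction.

```python
# Hypothetical clustered points as (cluster_id, item_id) pairs.
labelled = [
    (1, 1), (1, 3), (1, 4), (1, 5), (1, 7), (1, 8), (1, 4),  # cluster 1
    (2, 1), (2, 2), (2, 3),                                  # cluster 2
]

def clusters_to_transactions(labelled_points):
    """Collect the distinct item IDs in each cluster into one transaction."""
    trans = {}
    for cluster_id, item_id in labelled_points:
        trans.setdefault(cluster_id, set()).add(item_id)
    return {cid: sorted(items) for cid, items in trans.items()}

transactions = clusters_to_transactions(labelled)
print(transactions)
```

With this toy input, transaction 1 comes out as [1, 3, 4, 5, 7, 8], matching the first row of Table 2.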
10. Experiment
Step 2: Generating Association Rules
Generating Frequent Itemsets (FP-GCID) → Filtering Association Rules
11. Apriori vs. FP-Growth
Apriori
A classic method for frequent pattern mining.
Uses a generate-and-test approach: it generates candidate itemsets and tests
whether they are frequent.
Disadvantages:
Candidate generation can be extremely slow (pairs, triplets, etc.).
Candidate generation can produce duplicates, depending on the implementation.
The counting step iterates through all of the transactions each time.
A large number of items makes the algorithm much heavier.
Huge memory consumption.
FP-Growth
Allows frequent itemset discovery without candidate generation.
Advantages:
The algorithm only needs to read the file twice, as opposed to Apriori,
which reads it once for every iteration.
No candidate generation.
Removes the need to enumerate the pairs to be counted (compresses the dataset).
Much faster than Apriori.
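Apriori's generate-and-test loop, the source of the disadvantages listed above, can be sketched compactly. This is a toy illustration, not the paper's implementation; notice how every level regenerates candidates and rescans all transactions.

```python
def apriori(transactions, min_support):
    """Generate-and-test Apriori: returns {frozenset(itemset): support_count}."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    # Level 1: count single items and keep the frequent ones.
    level = []
    for i in items:
        count = sum(1 for t in transactions if i in t)
        if count >= min_support:
            freq[frozenset([i])] = count
            level.append(frozenset([i]))
    k = 2
    while level:
        # Generate step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = []
        for c in candidates:
            # Test step: rescan every transaction to count this candidate.
            count = sum(1 for t in transactions if c <= set(t))
            if count >= min_support:
                freq[c] = count
                level.append(c)
        k += 1
    return freq

# Toy transactions (hypothetical, small for illustration).
toy = [[1, 2, 3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]
frequent = apriori(toy, min_support=3)
```

Here {1, 2} is frequent (support 3) but {1, 2, 3} is not (support 2), so the level-3 candidate is generated and then discarded, which is exactly the wasted work FP-Growth avoids.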
12. FP-GCID
A Novel FP-Growth Algorithm Based on Cluster IDs
• FP-GCID is a novel FP-Growth algorithm based on cluster IDs.
• FP-GCID reads one transaction at a time and maps it onto a path.
• Paths can overlap when transactions share items.
Figure 3. FP-GCID—the novel FP-Growth algorithm based on cluster IDs.
13. Generating Frequent Itemsets
Step 1: Constructing the CFP-Tree (Conditional Frequent Pattern Tree).
Each node contains:
an item (inside the circle);
the count of that item;
the cluster IDs (in the red brackets).
For example, the first node below the root represents item 1, and there are 32
transactions containing item 1.
Figure 4. Constructing the CFP-Tree (Conditional Frequent Pattern Tree).
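The node structure described above can be sketched as follows. This is an illustration of the idea (item, count, and cluster IDs per node), not the authors' exact data structure; the insertion shows how shared prefixes overlap into one path.

```python
class Node:
    """FP-GCID-style tree node: an item, its count, and the cluster IDs of
    the transactions whose paths pass through it (a sketch, not the paper's
    exact structure)."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.cluster_ids = []
        self.children = {}   # item -> child Node

def insert(root, transaction, cluster_id):
    """Map one transaction (a sorted item list) onto a path from the root."""
    node = root
    for item in transaction:
        node = node.children.setdefault(item, Node(item))
        node.count += 1                 # one more transaction through this node
        node.cluster_ids.append(cluster_id)
    return root

root = Node()
insert(root, [1, 3, 4], cluster_id=1)
insert(root, [1, 2, 3], cluster_id=2)
# The two paths share the node for item 1, so its count is 2 and it carries
# both cluster IDs.
```

This mirrors the figure: the node for item 1 sits directly below the root with a count equal to the number of transactions containing item 1.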
14. Step 2: Combining the repeated frequent itemsets.
The same frequent itemset can appear more than once in the CFP-Tree, so these
repeated itemsets must be combined.
Suppose that itemset X = {2,5,7} appears as repeated frequent itemsets.
Combine these itemsets with the following operations:
Adding the counts of the itemsets together.
Merging the cluster IDs of the itemsets.
Deleting duplicate cluster IDs from the merged itemset.
15. Step 3: Checking the frequent itemsets.
Based on the quantity of transactional data items and the CFP-Tree, we set the
minimum support to 15, so that more than 50 percent of the frequent itemsets
can be filtered out.
16. List of item IDs contained in the clusters
Cluster IDs List of Item IDs
2 1,2,3,4,5,6,7,8
17 1,2,3,4,5,6,7,8
19 1,2,3,4,5,6,7,8
21 1,2,3,4,5,7,8
23 1,2,3,4,5,6,7,8
24 1,2,3,4,5,6,7,8
26 1,2,3,4,5,6,7,8
27 1,2,3,4,5,6,7,8
28 1,2,3,4,5,6,7,8
31 1,2,3,4,5,6,7,8
Combining the repeated occurrences of itemset {2,5,7} yields the terminal itemset.
Finally, we obtain these 10 cluster IDs.
17. Association Rule Filtering
• The objective of filtering is to remove redundant ARs.
• The filtering process can be divided into two steps.
• Confidence filtering
- The maximum possible minimum confidence (0.99) is used; that is, only rules
with a confidence greater than 0.99 are retained.
- After minimum confidence filtering, 360 association rules remain.
• Type filtering
- Rule types: One-to-Multi, Multi-to-Multi, and Multi-to-One.
- After type filtering, only 193 ARs remain.
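The confidence-filtering step can be sketched in a few lines. This is a generic illustration on toy transactions (rows borrowed from Table 2), not the paper's code; confidence(A ⇒ B) is support(A ∪ B) / support(A).

```python
def confidence(transactions, antecedent, consequent):
    """confidence(A => B) = support(A ∪ B) / support(A)."""
    a, ab = set(antecedent), set(antecedent) | set(consequent)
    count_a = sum(1 for t in transactions if a <= set(t))
    count_ab = sum(1 for t in transactions if ab <= set(t))
    return count_ab / count_a if count_a else 0.0

def filter_rules(transactions, rules, min_conf=0.99):
    """Keep only rules (antecedent, consequent) with confidence > min_conf."""
    return [r for r in rules if confidence(transactions, *r) > min_conf]

# First three transactions of Table 2.
trans = [[1, 3, 4, 5, 7, 8], [1, 2, 3, 4, 5, 6, 7, 8], [1, 3, 4, 5, 6, 8]]
rules = [((2, 3), (1,)),   # {Bank, Entertainment} => Catering
         ((3,), (2,))]     # {Entertainment} => Bank
kept = filter_rules(trans, rules, min_conf=0.99)
```

On this toy data {2, 3} ⇒ 1 survives (confidence 1.0) while {3} ⇒ 2 is dropped, illustrating why only high-confidence rules remain after this stage.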
19. Finding Interesting Rules with MPP
• Mean-Product of Probabilities (MPP)
This section presents a ranking algorithm based on the MPP, which is
interpreted as averaging and then multiplying the probability values derived
from the calculation in the corresponding equation.
20. Rule ranking with two antecedents and a fixed consequent of 1.
Ranking Rules Probability Ranking Rules Probability
1 ‘2,3-1’ 0.509 9 ‘6,4-1’ 0.007
2 ‘2,4-1’ 0.148 10 ‘8,3-1’ 0.007
3 ‘2,6-1’ 0.114 11 ‘5,6-1’ 0.006
4 ‘5,2-1’ 0.096 12 ‘5,4-1’ 0.006
5 ‘2,8-1’ 0.037 13 ‘7,4-1’ 0.004
6 ‘6,3-1’ 0.026 14 ‘8,4-1’ 0.002
7 ‘5,3-1’ 0.019 15 ‘8,6-1’ 0.002
8 ‘7,3-1’ 0.015 16 ‘5,8-1’ 0.001
Rules with rank values of less than 0.09 are combined.
21. Fixing the consequent as 1 and taking n = 2 ('n' represents the number of
antecedents in an association rule):
{Bank, Entertainment} ⇒ Catering is the most important rule for n = 2 with the
consequent equaling 1.
The rule {2, 3} ⇒ 1 constitutes the largest proportion of the rule type
{a1, a2} ⇒ 1, where a1, a2 ∈ {2, 3, 4, 5, 6, 7, 8}.
This indicates that one should pay more attention to nearby banks and
entertainment establishments when deploying a restaurant at a point of interest.
22. With the quantity of antecedents fixed at 3, we can find the most effective
rule for each consequent, i.e., determine which items have the closest relation
when choosing a point of interest.
The rule {2, 3, 4} ⇒ 1
The rule {2, 1, 4} ⇒ 3
The rule {2, 1, 3} ⇒ 4
{Bank, Entertainment, Store} ⇒ Catering is the most important rule for n = 3
with the consequent equaling 1.
{Bank, Catering, Store} ⇒ Entertainment is the most important rule for n = 3
with the consequent equaling 3.
23. The distribution for the rule {2, 3} ⇒ 1.
According to the MPP equation, the mean probabilities of the items of this rule
are {0.5395, 0.1025} ⇒ 0.2663.
After normalization, this can be expressed as {5.3, 1} ⇒ 2.7.
That is, near every 5.3 banks and 1 entertainment establishment, there would be
approximately 2.7 catering establishments.
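The normalization step above appears to divide each mean probability by the smallest one, which can be checked with a few lines of arithmetic. This is my reading of the slide, not the paper's code; small differences from the reported {5.3, 1} ⇒ 2.7 may come from rounding inside the paper's own calculation.

```python
# Mean probabilities reported on the slide for the rule {2, 3} => 1.
probs = {"bank": 0.5395, "entertainment": 0.1025, "catering": 0.2663}

# Normalize by the smallest value to turn probabilities into quantity ratios.
base = min(probs.values())
ratios = {name: round(p / base, 1) for name, p in probs.items()}
print(ratios)
```

The antecedent ratios reproduce the slide's {5.3, 1}; the consequent comes out near the slide's 2.7 up to rounding.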
24. Conclusions
• FP-GCID produces rules that contain not only the information of the
antecedent and the consequent, but also the cluster information of each
transaction item in the rule.
• The MPP method solves the problem of ranking all the rules, with the top
rule as the desired result.