Lect7 Association analysis to correlation analysis
Mining Data from Reservoir Simulation Result
1. Mining Data from Reservoir Simulation Results
using R
(to be presented at ICIPEG ’10)
Akmal Aulia, Tham Boon Keat, M. Sanif Maulut,
Dr. Noaman El-Khatib, Mazuin Jasamai
EOR Centre, UT PETRONAS
Supervisor: Prof. Dr. Noaman El-Khatib
June 9th, 2010
6. Introduction to Association Rules
Market Basket Analysis - imagine a set of transactions
“Does a person who purchases milk and eggs tend to buy
bread?”
Math-wise: how likely is the frequent itemset S, where
S = {milk, eggs, bread},
A = {milk, eggs},
B = {bread}
Thus, A ⇒ B, A ∪ B ⊆ S, and A ∩ B = ∅
A ⇒ B is called a “Rule”
Association Rules in Amazon.com:
“Customers who bought this item also bought...”
11. Introduction to Association Rules: A Simple Example
Table: Transactional Data Sample
Transaction ID   Items
1                milk, eggs
2                eggs, butter
3                peanut
4                milk, eggs, bread
5                eggs, bread
Support of A = {milk, eggs} = 2/5 = 0.4 = 40%
Support of B = {bread} = 2/5 = 0.4 = 40%
Support of A ⇒ B = 1/5 = 0.2 = 20%
Confidence of A ⇒ B = (Support of A ⇒ B) / (Support of A) = 0.2 / 0.4 = 0.5 = 50%
Lift of A ⇒ B = (Support of A ⇒ B) / ((Support of A)(Support of B)) = 0.2 / ((0.4)(0.4)) = 1.25
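The three measures can be recomputed directly from the five transactions in the table. The sketch below is in Python rather than R, purely for illustration; the helper name `support` and the variable names are assumptions made for this example and are not part of the original slides.

```python
# Transactions copied from the sample table (IDs 1 to 5).
transactions = [
    {"milk", "eggs"},
    {"eggs", "butter"},
    {"peanut"},
    {"milk", "eggs", "bread"},
    {"eggs", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


A = {"milk", "eggs"}
B = {"bread"}

supp_a = support(A, transactions)
supp_b = support(B, transactions)
supp_rule = support(A | B, transactions)  # transactions holding all of A and B

confidence = supp_rule / supp_a           # estimate of P(B | A)
lift = supp_rule / (supp_a * supp_b)      # ratio to the independence baseline

print(f"support(A)      = {supp_a:.2f}")
print(f"support(B)      = {supp_b:.2f}")
print(f"support(A => B) = {supp_rule:.2f}")
print(f"confidence      = {confidence:.2f}")
print(f"lift            = {lift:.2f}")
```

Counting with a subset test (`itemset <= t`) keeps the code close to the definition: a transaction supports an itemset only if it contains every item in it.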
15. Association Rules: Formal Definition
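\[ \mathrm{Support}(A \Rightarrow B) = P(A \cup B) \tag{1} \]
\[ \mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{P(A \cup B)}{P(A)} \tag{2} \]
\[ \mathrm{Lift}(A \Rightarrow B) = \frac{P(B \mid A)}{P(B)} = \frac{P(A \cup B)}{P(A)\,P(B)} \tag{3} \]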
Reliable Rule: Large Confidence, Large Support, and Lift > 1
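A short worked step, added here as an illustration, shows why the lift threshold is 1. Read P(A ∪ B) as the probability that a transaction contains every item of A and of B. If A and B occur independently, then P(A ∪ B) = P(A)P(B), and the lift definition gives:

```latex
\[
\mathrm{Lift}(A \Rightarrow B)
  = \frac{P(A \cup B)}{P(A)\,P(B)}
  = \frac{P(A)\,P(B)}{P(A)\,P(B)}
  = 1 .
\]
```

Lift > 1 therefore means that A and B appear together more often than independence would predict, while Lift < 1 means they appear together less often.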
20. Implementation using R
Language for statistical computing and graphics
GNU General Public License ⇒ FREE!!
Over 2416 contributed packages - ARULES, GA, ANN, etc.
Over 106 books published - Bayesian, Monte Carlo, Chemistry
Parallel computation
21. Mining Data from Reservoir Simulation Results
1 injection well at (1,1), 1 production well at (5,5)
22. Mining Data from Reservoir Simulation Results
Let Xi, i ∈ {1, 2, ..., 8}, denote the reservoir simulation parameters.
Table: Description of Parameters
Parameter   Description                                          Units
X1          Surface rate at the injection well                   stb/day
X2          Bottom-hole pressure limit at the injection well     psia
X3          Liquid rate at the production well                   stb/day
X4          Bottom-hole pressure limit at the production well    psia
X5          Bottom-hole pressure datum at the production well    ft
X6          Bottom-hole pressure datum at the injection well     ft
X7          Inner diameter of the production well                ft
X8          Inner diameter of the injection well                 ft
T           Final oil recovery (recovery factor),                -
            T = (OIP_t0 − OIP_t) / OIP_t0
23. Dataset Construction
Use Excel to generate random numbers for each parameter Xi:
ROUND(RAND() * (max(Xi) - min(Xi)) + min(Xi), 0)
Figure: Dataset Formation
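The same sampling step can be sketched outside Excel. The Python below mirrors the ROUND(RAND() * (max - min) + min, 0) formula; the parameter ranges, the number of cases, and the seed are hypothetical placeholders, since the slides do not list the actual limits used.

```python
import random

# Hypothetical (min, max) limits for two parameters; the real limits
# used in the study are not given in the slides.
ranges = {
    "X1": (100, 1000),
    "X2": (3000, 6000),
}


def sample_parameter(lo, hi, rng):
    """Uniform draw in [lo, hi), rounded to the nearest integer.

    Python's round() rounds exact halves to the nearest even integer,
    while Excel's ROUND rounds them away from zero. For continuous
    random draws the two agree except on exact .5 values.
    """
    return round(rng.random() * (hi - lo) + lo)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_cases = 5              # hypothetical number of simulation cases

dataset = [
    {name: sample_parameter(lo, hi, rng) for name, (lo, hi) in ranges.items()}
    for _ in range(n_cases)
]

for row in dataset:
    print(row)
```

Each dictionary in `dataset` corresponds to one simulation case, that is, one row of the table that is later fed to the simulator.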
26. Data Pre-processing
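Association rule mining works on categorical data ⇒ convert it!
Split each parameter Xi = {Xi1, Xi2, ..., Xik, ..., Xi8} at a threshold value Xik, which can be
\[ X_{ik} = \operatorname{mean}(X_i) \tag{4} \]
\[ X_{ik} = \operatorname{median}(X_i) \tag{5} \]
Thus, ∀Xi, every value X_{ij} with j ≠ k is labelled
\[
X_{ij} =
\begin{cases}
\text{High } (\Uparrow), & \text{for } X_{ij} > X_{ik} \\
\text{Low } (\Downarrow), & \text{for } X_{ij} \le X_{ik}
\end{cases}
\]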
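As a minimal sketch of this split (in Python rather than R, and with made-up sample values), each value is compared against the mean or the median of its own parameter. The function name `discretize` and the sample list are assumptions for illustration only.

```python
from statistics import mean, median


def discretize(values, split="mean"):
    """Label each value HIGH if it exceeds the split point, else LOW."""
    if split == "mean":
        threshold = mean(values)
    elif split == "median":
        threshold = median(values)
    else:
        raise ValueError("split must be 'mean' or 'median'")
    # Values equal to the threshold fall in the LOW class (the <= branch).
    return ["HIGH" if v > threshold else "LOW" for v in values]


# Hypothetical values of one parameter across five simulation cases.
x_sample = [100, 200, 300, 400, 2000]

by_mean = discretize(x_sample, "mean")      # split point is 600
by_median = discretize(x_sample, "median")  # split point is 300

print(by_mean)
print(by_median)
```

The skewed sample shows why the choice of split matters: the single large value pulls the mean up, so the mean-based split labels only one case HIGH, while the median-based split labels two.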
27. Data Pre-processing
The result looks something like this (use R to do this):
Table: Obtained Categorical Dataset
X1     X2     X3     X4     X5     X6     X7     X8     T
HIGH   LOW    LOW    LOW    HIGH   LOW    LOW    LOW    HIGH
LOW    HIGH   LOW    LOW    LOW    HIGH   LOW    LOW    LOW
...    ...    ...    ...    ...    ...    ...    ...    ...
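Before rule mining, each categorical row has to be viewed as a transaction, that is, a set of items. One common encoding, sketched here in Python under the assumption that items are labelled `column=level`, turns the two rows shown in the table into item sets. The variable names are illustrative only.

```python
columns = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "T"]

# The two rows visible in the categorical dataset table.
rows = [
    ["HIGH", "LOW", "LOW", "LOW", "HIGH", "LOW", "LOW", "LOW", "HIGH"],
    ["LOW", "HIGH", "LOW", "LOW", "LOW", "HIGH", "LOW", "LOW", "LOW"],
]

# One transaction per row; each item pairs a column with its level.
transactions = [
    frozenset(f"{col}={level}" for col, level in zip(columns, row))
    for row in rows
]

for t in transactions:
    print(sorted(t))
```

With this encoding, a rule such as {X2=HIGH, X3=HIGH} ⇒ {T=HIGH} is an ordinary association rule over items, so the support, confidence, and lift definitions apply unchanged.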
29. The ARULES package
Use R’s ARULES package
Apriori algorithm:
  i = 1
  D_i = {G : G is an itemset of size 1}
  while D_i is not empty do
    database pass:
      for each set in D_i, test whether it is frequent
      let F_i be the collection of frequent sets from D_i
    candidate formation:
      let D_{i+1} be those sets of size i + 1 all of whose subsets of
      size i are in F_i
      i = i + 1
  end while
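The level-wise loop can be sketched in a few lines of Python. This is a plain, unoptimized illustration of the database pass and candidate formation steps, not the implementation used by the ARULES package; the function name `apriori`, the `min_support` argument, and the grocery transactions used to exercise it are assumptions for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)
    # D_1: all itemsets of size 1.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    i = 1
    while candidates:
        # Database pass: keep the candidates that are frequent (F_i).
        level = {}
        for c in candidates:
            supp = sum(1 for t in transactions if c <= t) / n
            if supp >= min_support:
                level[c] = supp
        frequent.update(level)
        # Candidate formation: sets of size i + 1 whose size-i subsets
        # are all in F_i.
        items = sorted({item for s in level for item in s})
        candidates = {
            frozenset(combo)
            for combo in combinations(items, i + 1)
            if all(frozenset(sub) in level for sub in combinations(combo, i))
        }
        i += 1
    return frequent


transactions = [
    {"milk", "eggs"},
    {"eggs", "butter"},
    {"peanut"},
    {"milk", "eggs", "bread"},
    {"eggs", "bread"},
]

result = apriori(transactions, min_support=0.4)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

The pruning in the candidate step is what keeps the search tractable: an itemset is only counted against the database if every smaller subset has already been found frequent.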
30. Results and Discussion
Table: Limits for the Apriori algorithm’s parameters
Lift Confidence
1.5 0.9
⇒ Generated some 24098 rules (for mean-based splitting)
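To illustrate how confidence and lift limits prune a rule set, the sketch below enumerates rules with a single-item consequent on a small grocery-style transaction list and keeps only those meeting both limits. It is a brute-force Python illustration under assumed names (`candidate_rules`, `filter_rules`), not the ARULES workflow, and it does not reproduce the 24098 rules reported above.

```python
from itertools import combinations

# Small illustrative transaction list (grocery items).
transactions = [
    frozenset({"milk", "eggs"}),
    frozenset({"eggs", "butter"}),
    frozenset({"peanut"}),
    frozenset({"milk", "eggs", "bread"}),
    frozenset({"eggs", "bread"}),
]


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def candidate_rules():
    """Yield each (antecedent, consequent) pair once, single-item consequent."""
    seen = set()
    for t in transactions:
        for size in range(2, len(t) + 1):
            for combo in combinations(sorted(t), size):
                itemset = frozenset(combo)
                for item in combo:
                    rule = (itemset - {item}, frozenset([item]))
                    if rule not in seen:
                        seen.add(rule)
                        yield rule


def filter_rules(min_confidence=0.9, min_lift=1.5):
    """Keep rules whose confidence and lift both reach the given limits."""
    kept = []
    for a, b in candidate_rules():
        conf = support(a | b) / support(a)
        lift = conf / support(b)
        if conf >= min_confidence and lift >= min_lift:
            kept.append((sorted(a), sorted(b), conf, lift))
    return kept


strict = filter_rules()               # limits from the table: 0.9 and 1.5
relaxed = filter_rules(min_lift=1.2)  # looser lift limit for comparison

print(len(strict), "rules pass the strict limits")
for rule in relaxed:
    print(rule)
```

The limits are passed as arguments because a five-row example is too small for any rule to reach a lift of 1.5; relaxing the lift limit shows the filter keeping the high-confidence rules.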
36. Summary
X2 (BHP limit at the injection well) and X3 (liquid rate at the
production well) frequently showed up in the rules - a clue to higher recovery!
More parameters and more wells would make for a more legitimate study.