R for finding the non-dominated rules
in multi-objective optimization

Bo-Han Wu
Jan 27, 2014

Taiwan R User Group/MLDM Monday
Search Google for "資料科學實驗室" (Data Science Lab)

Wu Bo-Han rippleblue2002@gmail.com
Outline
• Introduction
• Classification rule
• Accuracy
• Comprehensibility
• Interestingness
• Multi-objective optimization
• Non-dominated rules
• SPEA2
• Case study
Data growth

Introduction
• In the age of data explosion, the amount of data stored in databases is growing rapidly.
• This data often contains hidden knowledge that can be used to improve the decision-making process of any kind of company.
Classification rule
• Classification rule mining is a common technique in data mining.
• Rules are generalized from historical data and then used to classify unseen samples or predict future outcomes.

Classification rule
• IF <some conditions are satisfied> AND <some conditions are satisfied> THEN <assign some values of the goal attribute>
• Example:
IF Sex = Male AND Location = Taipei THEN Product = beer
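One way to hold such a rule in R is a list with an antecedent and a consequent, each a set of attribute = value pairs. The sketch below is only an illustration: the `customers` data frame, the `rule` structure and the `matches` helper are hypothetical names, not taken from the talk's code.

```r
# Hypothetical toy data, one row per customer.
customers <- data.frame(
  Sex      = c("Male", "Male", "Female", "Male"),
  Location = c("Taipei", "Taipei", "Taipei", "Tainan"),
  Product  = c("beer", "wine", "beer", "beer"),
  stringsAsFactors = FALSE
)

# A rule: IF part (antecedent) and THEN part (consequent),
# each stored as a named list of attribute = value pairs.
rule <- list(
  antecedent = list(Sex = "Male", Location = "Taipei"),
  consequent = list(Product = "beer")
)

# TRUE for every row that satisfies all attribute = value pairs in `conds`.
matches <- function(data, conds) {
  hit <- rep(TRUE, nrow(data))
  for (attr in names(conds)) {
    hit <- hit & data[[attr]] == conds[[attr]]
  }
  hit
}

fires   <- matches(customers, rule$antecedent)           # rows where the IF part holds
correct <- fires & matches(customers, rule$consequent)   # rows where IF and THEN both hold
```

Here the rule fires on the first two customers and predicts the product correctly for the first one only.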

Classification rule
• Traditional mining techniques focus mostly on accuracy and usually generate so many rules that it is hard to pick out the meaningful ones.
• To select the most meaningful rules, accuracy, comprehensibility and interestingness are used as three objectives.

Accuracy

$$A(R) = \frac{\mathrm{sup}(A \,\&\, C)}{\mathrm{sup}(A)}$$

• sup(A & C) is the support for the rule R
• sup(A) represents the support for the antecedent of rule R
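The accuracy ratio can be sketched in a few lines of R. The logical vectors `covers_A` and `covers_C` are hypothetical: they mark which records satisfy the antecedent and the consequent. Support is assumed to be a record count, and returning 0 for a rule that never fires is a choice made for this sketch.

```r
# Hypothetical coverage vectors, one entry per record.
covers_A <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)   # antecedent holds
covers_C <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)  # consequent holds

accuracy <- function(covers_A, covers_C) {
  sup_A <- sum(covers_A)                 # sup(A), counted in records
  if (sup_A == 0) return(0)              # assumption: a rule that never fires scores 0
  sum(covers_A & covers_C) / sup_A       # sup(A & C) / sup(A)
}

accuracy(covers_A, covers_C)  # 2 / 4 = 0.5
```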
Comprehensibility

$$C(R) = 1 - \frac{N_c(R)}{M_c}$$

• Nc(R) is the number of conditions in the rule
• Mc is the maximum number of conditions that a rule can have
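As a small worked example in R, assuming a hypothetical maximum of 8 conditions per rule (the value of Mc used in the talk is not stated), a rule with two conditions scores as follows.

```r
comprehensibility <- function(n_conditions, max_conditions) {
  stopifnot(max_conditions > 0,
            n_conditions >= 0,
            n_conditions <= max_conditions)
  1 - n_conditions / max_conditions      # C(R) = 1 - Nc(R) / Mc
}

# Hypothetical rule with two conditions.
rule_conditions <- list(Sex = "Male", Location = "Taipei")
comprehensibility(length(rule_conditions), max_conditions = 8)  # 1 - 2/8 = 0.75
```

Shorter rules score closer to 1, so the objective rewards rules that are easier to read.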
Interestingness
$$I(R) = \frac{\mathrm{sup}(A \,\&\, C)}{\mathrm{sup}(A)} \times \frac{\mathrm{sup}(A \,\&\, C)}{\mathrm{sup}(C)} \times \left(1 - \frac{\mathrm{sup}(A \,\&\, C)}{|D|}\right)$$

• sup(A & C) / sup(A) gives the probability of generating the rule depending on the antecedent part
• sup(A & C) / sup(C) gives the probability of generating the rule depending on the consequent part
• sup(A & C) / |D| gives the probability of generating the rule depending on the whole data set D
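The three factors can be multiplied out directly in R. This is a sketch under the assumption that support is a record count and that the divisor of the last factor is the number of records in the data set; `covers_A` and `covers_C` are hypothetical coverage vectors.

```r
interestingness <- function(covers_A, covers_C) {
  n_D    <- length(covers_A)             # number of records in the data set
  sup_A  <- sum(covers_A)
  sup_C  <- sum(covers_C)
  sup_AC <- sum(covers_A & covers_C)
  if (sup_A == 0 || sup_C == 0) return(0)  # assumption: undefined ratios score 0
  (sup_AC / sup_A) * (sup_AC / sup_C) * (1 - sup_AC / n_D)
}

# Hypothetical coverage vectors, one entry per record.
covers_A <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
covers_C <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

interestingness(covers_A, covers_C)  # (2/4) * (2/3) * (1 - 2/6) = 2/9
```

The last factor lowers the score of rules that cover a large share of the data set, since such rules tend to state what is already obvious.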

Multi-objective optimization
Low price and high performance
[Figure: performance (40% to 90%) plotted against price (10k to 100k); a non-dominated solution is marked.]

Multi-objective optimization
Low price and high performance
[Figure: performance (40% to 90%) plotted against price (10k to 100k), with five candidate solutions labeled 1 to 5; a non-dominated solution is marked.]

Multi-objective optimization
Low price and high performance
[Figure: performance (40% to 90%) plotted against price (10k to 100k), with five candidate solutions labeled 1 to 5; the non-dominated solutions and the non-dominated solution set are marked.]
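The idea behind the figure can be sketched in R: a solution is non-dominated when no other solution is at least as cheap and at least as fast while being strictly better in one of the two. The coordinates below are hypothetical, since the slide's actual values cannot be recovered.

```r
# Hypothetical candidates (not the coordinates from the slide).
cand <- data.frame(
  id          = 1:5,
  price       = c(20, 40, 60, 80, 90),   # thousands, lower is better
  performance = c(45, 55, 50, 85, 70)    # percent, higher is better
)

# a dominates b: no worse in both objectives, strictly better in at least one.
dominates <- function(a, b) {
  a$price <= b$price && a$performance >= b$performance &&
    (a$price < b$price || a$performance > b$performance)
}

is_nondominated <- sapply(seq_len(nrow(cand)), function(i) {
  !any(sapply(seq_len(nrow(cand)), function(j) {
    j != i && dominates(cand[j, ], cand[i, ])
  }))
})

cand$id[is_nondominated]  # 1 2 4
```

Candidate 3 is dominated by candidate 2 (cheaper and faster), and candidate 5 by candidate 4, so the non-dominated solution set is {1, 2, 4}.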

Multi-objective optimization
• However, traditional methods handle multi-objective problems by converting them into a single-objective problem.
• This approach cannot guarantee finding solutions that are optimal across all of the objectives.
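A small R experiment illustrates the limitation. It assumes the conversion is a weighted sum of the objectives, which is one common choice and not necessarily the one the talk has in mind. The three solutions are hypothetical, both objectives are minimized, and none of the three dominates another.

```r
# Three hypothetical, mutually non-dominated solutions (minimize f1 and f2).
f1 <- c(A = 0.0, B = 1.0, C = 0.6)
f2 <- c(A = 1.0, B = 0.0, C = 0.6)

# Assumed scalarization: w * f1 + (1 - w) * f2, tried for many weights.
weights <- seq(0, 1, by = 0.05)
winners <- sapply(weights, function(w) {
  names(which.min(w * f1 + (1 - w) * f2))
})

unique(winners)  # only "B" and "A" ever win
```

Solution C always scores 0.6, while the better of A and B never scores above 0.5, so no weight selects C even though it is a valid trade-off. A method that searches for the whole non-dominated set does not lose such solutions.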

SPEA2
• SPEA2 (Strength Pareto Evolutionary Algorithm 2) is built on the concept of "survival of the fittest" from natural evolution.
• It aims to improve the quality of the individuals in the solution space with each generation.
• SPEA2 uses selection, crossover and mutation to retain the best individuals and discard the worst ones.
SPEA2

[Figure: an initial population of individuals next to an empty archive.]
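In R, this starting state could look like the sketch below, assuming individuals are fixed-length bit strings stored as matrix rows. The population size, the string length and the variable names are hypothetical.

```r
set.seed(1)       # only to make this sketch reproducible
pop_size <- 6     # hypothetical number of individuals
n_bits   <- 8     # hypothetical length of each bit string

# Initial population: one random bit string per row.
population <- matrix(sample(0:1, pop_size * n_bits, replace = TRUE),
                     nrow = pop_size, ncol = n_bits)

# Empty archive: same number of columns, no rows yet.
archive <- matrix(integer(0), nrow = 0, ncol = n_bits)

dim(population)
dim(archive)
```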
Non-dominated

Non-dominated solution

Non-dominated solution set
[Figure: the non-dominated solution set, with solutions E and F labeled.]

SPEA2

[Figure: the population, with a legend distinguishing individuals from non-dominated individuals.]
SPEA2
Truncation operator

[Figure: the truncation operator applied to the archive, with a legend distinguishing individuals from non-dominated individuals.]
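The sketch below shows one way to write a SPEA2-style truncation step in R. It follows the general description in the SPEA2 literature: while the archive is too large, drop the member whose nearest neighbour in objective space is closest, breaking ties by the second nearest neighbour and so on. The function name and the test data are hypothetical, and the talk's own Truncation.r may work differently.

```r
# `objectives`: numeric matrix, one row per archive member, one column per objective.
truncate_archive <- function(objectives, max_size) {
  stopifnot(max_size >= 1)
  keep <- seq_len(nrow(objectives))
  while (length(keep) > max_size) {
    d <- as.matrix(dist(objectives[keep, , drop = FALSE]))
    diag(d) <- Inf                       # ignore the distance to oneself
    sorted <- apply(d, 2, sort)          # column i: distances of member i, ascending
    # Sort keys: nearest-neighbour distance first, then second nearest, and so on.
    keys <- lapply(seq_len(nrow(sorted)), function(k) sorted[k, ])
    crowded <- do.call(order, keys)[1]   # member in the most crowded region
    keep <- keep[-crowded]
  }
  keep                                   # row indices of the surviving members
}

# Hypothetical archive of five members with two objectives each.
archive_obj <- rbind(c(0, 0), c(0.1, 0), c(1, 1), c(2, 0), c(0, 2))
truncate_archive(archive_obj, max_size = 4)  # 1 3 4 5
```

Members 1 and 2 are each other's nearest neighbours, so the tie is decided by the second nearest neighbour, which is closer for member 2. Removing it keeps the remaining archive spread out.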
SPEA2
Recombination
= 10101101011001100100010010111
= 01100110010111001011101101101

Mutation
= 01100101011001100100010010111
= 10010101011001100100010010111
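The bit strings on this slide are consistent with a one-point crossover followed by a bit-flip mutation: taking the first six bits of the second recombination string and the rest of the first gives the first mutation string, and flipping its first four bits gives the second. The R sketch below reproduces that reading. The function names, the cut point and the flipped positions are inferred for illustration and are not taken from Crossover.r or Mutation.r.

```r
parent1 <- "10101101011001100100010010111"
parent2 <- "01100110010111001011101101101"

# One-point crossover: head of `a` up to `cut`, tail of `b` after it.
crossover <- function(a, b, cut) {
  paste0(substr(a, 1, cut), substr(b, cut + 1, nchar(b)))
}

# Bit-flip mutation at the given positions.
mutate <- function(bits, positions) {
  chars <- strsplit(bits, "")[[1]]
  chars[positions] <- ifelse(chars[positions] == "0", "1", "0")
  paste(chars, collapse = "")
}

child   <- crossover(parent2, parent1, cut = 6)
mutated <- mutate(child, 1:4)

child    # "01100101011001100100010010111"
mutated  # "10010101011001100100010010111"
```

In an actual run the cut point and the mutated positions would normally be drawn at random, for example with `sample()`.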

Non-dominated rules
• Three objectives
– Accuracy
– Comprehensibility
– Interestingness

IF Sex = Male AND Location = Taipei
THEN Product = beer

A = 0.333333
C = 0.875000
I = 0.080000

[Figure: non-dominated rules.]
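With all three objectives maximized, the non-dominated rules can be picked out of a table of scores. In the R sketch below, R1 carries the A, C and I values of the example rule; R2 to R4 and their scores are hypothetical and only there to make the comparison visible.

```r
rules <- data.frame(
  rule = c("R1", "R2", "R3", "R4"),
  A    = c(0.333333, 0.30, 0.60, 0.20),
  C    = c(0.875000, 0.75, 0.75, 0.875),
  I    = c(0.080000, 0.05, 0.02, 0.080),
  stringsAsFactors = FALSE
)

obj <- as.matrix(rules[, c("A", "C", "I")])

# Rule i is dominated if some other rule is >= in every objective and > in at least one.
dominated <- sapply(seq_len(nrow(obj)), function(i) {
  others <- obj[-i, , drop = FALSE]
  target <- matrix(obj[i, ], nrow = nrow(others), ncol = ncol(obj), byrow = TRUE)
  any(rowSums(others >= target) == ncol(obj) & rowSums(others > target) > 0)
})

rules$rule[!dominated]  # "R1" "R3"
```

R2 and R4 are dominated by R1. R3 survives because it is more accurate than R1, even though it is worse on the other two objectives.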
Case study
Transaction data of an insurance broker company
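Date: 2005-2006

Attribute            Attribute values
Gender               Male, Female
Occupation           Professional (士), Laborer (工), Military (軍)
Payment frequency    Monthly, Annual, Single premium (one-time payment)
Sales methods        Telemarketing, Over-the-counter insurance
Payment methods      Credit card, Cash, Postal giro transfer, Bank transfer
Location             North, Central, South, East (including outlying islands)
Data source          Department store, Telecom, Bank
Company              Foreign life insurance company, Domestic life insurance company, Local property insurance company
Product              Annuity insurance, Long-term life insurance, Short-term life insurance, Accident insurance, Medical insurance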

Case study

• Data cleaning
• Data transformation (encoding). Example: Male→01, Female→10
• Training data and test data
• SPEA2, with accuracy, comprehensibility and interestingness as the objectives
• Data transformation (decoding). Example: 01→Male, 10→Female
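The encoding and decoding steps can be sketched in R with a named lookup vector. The codes for Male and Female come from the slide; the `encode` and `decode` helpers are hypothetical names.

```r
# Code table from the slide: Male -> 01, Female -> 10.
gender_code <- c(Male = "01", Female = "10")

# Attribute values -> bit strings.
encode <- function(values, code) unname(code[values])

# Bit strings -> attribute values.
decode <- function(bits, code) names(code)[match(bits, code)]

encoded <- encode(c("Male", "Female", "Male"), gender_code)
decoded <- decode(encoded, gender_code)

encoded  # "01" "10" "01"
decoded  # "Male" "Female" "Male"
```

The same pattern extends to the other attributes by adding one code table per attribute.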

Case study
[Figure: R source files of the SPEA2 implementation.]
• RuleMing.r
• Objective Functions.r
• SPEA2 Functions.r
• Truncation.r
• Crossover.r
• Mutation.r

Case study
Non-dominated rules
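IF Sales methods = Over-the-counter insurance AND Data source = Department store AND Company = Foreign life insurance company
THEN Product = Short-term life insurance
IF Payment methods = Cash AND Data source = Department store AND Company = Foreign life insurance company
THEN Product = Short-term life insurance
IF Payment frequency = Monthly AND Data source = Department store AND Company = Foreign life insurance company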

Case study
Non-dominated rules
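IF Sales methods = Over-the-counter insurance AND Data source = Department store AND Company = Foreign life insurance company
THEN Product = Short-term life insurance

"Department store customers who take out insurance over the counter are more likely to consider buying short-term life insurance from a foreign life insurance company."

This suggests that foreign life insurance companies can recommend short-term life insurance to department store customers who buy insurance over the counter.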

Thanks for listening


MLDM Monday -- Optimization Series Talk