MLDM Monday -- Optimization Series Talk

Taiwan R User Group/MLDM Monday

Published in: Technology

  1. R for finding the non-dominated rules in multi-objective optimization. Bo-Han Wu, Jan 27, 2014, Taiwan R User Group/MLDM Monday
  2. Search Google for 「資料科學實驗室」 (Data Science Lab). Wu Bo-Han rippleblue2002@gmail.com
  3. Outline • Introduction • Classification rule • Accuracy • Comprehensibility • Interestingness • Multi-objective optimization • Non-dominated rules • SPEA2 • Case study
  4. Data growth
  5. Introduction • In the age of data explosion, the amount of data stored in databases is growing rapidly. • These data often contain hidden knowledge that can be used to improve a company's decision-making process.
  6. Classification rule • Classification rule mining is a common data mining technique. • Rules generalized from historical data can be used to classify unknown samples or to predict future outcomes.
  7. Classification rule • IF <some conditions are satisfied> AND <some conditions are satisfied> THEN <assign some values of the goal attribute> • Example: IF Sex = Male AND Location = Taipei THEN Product = beer
  8. Classification rule • Traditional mining techniques focus mostly on accuracy and usually generate so many rules that it is hard to pick out the meaningful ones. • To select the most meaningful rules, accuracy, comprehensibility and interestingness are used as three objectives.
  9. Accuracy: $A(R) = \frac{\mathrm{sup}(A \,\&\, C)}{\mathrm{sup}(A)}$ • $\mathrm{sup}(A \,\&\, C)$ is the support for the rule R • $\mathrm{sup}(A)$ is the support for the antecedent of rule R
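
As a rough illustration of the accuracy measure, the following Python sketch counts support over a small list of records. The toy records and the helper names `support` and `accuracy` are assumptions made for this example, and support is assumed to be a raw record count; none of this comes from the talk's R scripts.

```python
# Hypothetical toy records; attribute names follow the example rule on the slides.
records = [
    {"Sex": "Male", "Location": "Taipei", "Product": "beer"},
    {"Sex": "Male", "Location": "Taipei", "Product": "wine"},
    {"Sex": "Male", "Location": "Taipei", "Product": "tea"},
    {"Sex": "Female", "Location": "Taipei", "Product": "beer"},
]


def support(records, conditions):
    """Count the records that satisfy every attribute=value condition."""
    return sum(
        all(record.get(key) == value for key, value in conditions.items())
        for record in records
    )


def accuracy(records, antecedent, consequent):
    """A(R) = sup(A & C) / sup(A); returns 0.0 when the antecedent never matches."""
    sup_a = support(records, antecedent)
    if sup_a == 0:
        return 0.0
    sup_ac = support(records, {**antecedent, **consequent})
    return sup_ac / sup_a


antecedent = {"Sex": "Male", "Location": "Taipei"}
consequent = {"Product": "beer"}
rule_accuracy = accuracy(records, antecedent, consequent)
print(rule_accuracy)
```

Three toy records match the antecedent and one of them also matches the consequent, so the rule's accuracy here is one third.
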
  10. Comprehensibility: $C(R) = 1 - \frac{N_c(R)}{M_c}$ • $N_c(R)$ is the number of conditions in the rule • $M_c$ is the maximum number of conditions that a rule can have
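
A small Python sketch of the comprehensibility measure follows. The function name and the choice of `max_conditions=8` are assumptions for illustration only; the talk does not state the value of Mc used.

```python
def comprehensibility(num_conditions, max_conditions):
    """C(R) = 1 - Nc(R) / Mc: fewer conditions give a score closer to 1."""
    if max_conditions <= 0:
        raise ValueError("max_conditions must be positive")
    if not 0 <= num_conditions <= max_conditions:
        raise ValueError("num_conditions must lie between 0 and max_conditions")
    return 1 - num_conditions / max_conditions


# Hypothetical setting: a rule may use at most 8 conditions.
short_rule_score = comprehensibility(1, 8)
long_rule_score = comprehensibility(6, 8)
print(short_rule_score, long_rule_score)
```

The shorter rule scores higher, which is the intended effect: the measure rewards rules that are easier to read.
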
  11. Interestingness: $I(R) = \frac{\mathrm{sup}(A \,\&\, C)}{\mathrm{sup}(A)} \times \frac{\mathrm{sup}(A \,\&\, C)}{\mathrm{sup}(C)} \times \left(1 - \frac{\mathrm{sup}(A \,\&\, C)}{D}\right)$ • $\frac{\mathrm{sup}(A \,\&\, C)}{\mathrm{sup}(A)}$ gives the probability of generating the rule depending on the antecedent part • $\frac{\mathrm{sup}(A \,\&\, C)}{\mathrm{sup}(C)}$ gives the probability of generating the rule depending on the consequent part • $\frac{\mathrm{sup}(A \,\&\, C)}{D}$ gives the probability of generating the rule depending on the whole data-set
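
The interestingness measure can be sketched in Python as a product of three factors. This sketch assumes that support values are record counts and that D is the total number of records in the data-set; both are assumptions, as are the function name and the sample counts.

```python
def interestingness(sup_ac, sup_a, sup_c, total):
    """I(R) = [sup(A&C)/sup(A)] * [sup(A&C)/sup(C)] * [1 - sup(A&C)/D]."""
    if min(sup_a, sup_c, total) <= 0:
        return 0.0  # the measure is undefined without support, so treat it as 0
    antecedent_factor = sup_ac / sup_a
    consequent_factor = sup_ac / sup_c
    dataset_factor = 1 - sup_ac / total
    return antecedent_factor * consequent_factor * dataset_factor


# Hypothetical counts: 10 records, 4 match A, 5 match C, 2 match both.
value = interestingness(sup_ac=2, sup_a=4, sup_c=5, total=10)

# A rule that covers every record carries no surprise, so the last factor is 0.
trivial = interestingness(sup_ac=10, sup_a=10, sup_c=10, total=10)
print(value, trivial)
```

The last factor is what separates this measure from accuracy: a rule that holds for the whole data-set is accurate but scores zero on interestingness.
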
  12. Multi-objective optimization: low price and high performance [Figure: performance (40% to 90%) against price (10k to 100k), with a non-dominated solution marked]
  13. Multi-objective optimization: low price and high performance [Figure: performance (40% to 90%) against price (10k to 100k), with candidate solutions 1 to 5 and a non-dominated solution marked]
  14. Multi-objective optimization: low price and high performance [Figure: performance (40% to 90%) against price (10k to 100k), with candidate solutions 1 to 5, a non-dominated solution and the non-dominated solution set marked]
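
The price/performance picture can be read as a dominance check: one solution dominates another if it is no worse on both objectives and strictly better on at least one. The Python sketch below assumes price is minimized and performance is maximized. The five candidate values are hypothetical and are not read off the figure.

```python
# Hypothetical candidates: label -> (price, performance as a fraction).
candidates = {
    1: (10_000, 0.40),
    2: (40_000, 0.55),
    3: (30_000, 0.70),
    4: (60_000, 0.90),
    5: (100_000, 0.85),
}


def dominates(a, b):
    """True if a is at least as cheap and as fast as b, and strictly better on one."""
    price_a, perf_a = a
    price_b, perf_b = b
    no_worse = price_a <= price_b and perf_a >= perf_b
    strictly_better = price_a < price_b or perf_a > perf_b
    return no_worse and strictly_better


def non_dominated(points):
    """Return the labels of points that no other point dominates."""
    return sorted(
        label
        for label, point in points.items()
        if not any(
            dominates(other, point)
            for other_label, other in points.items()
            if other_label != label
        )
    )


front = non_dominated(candidates)
print(front)
```

With these made-up values, candidate 2 is dominated by 3 (cheaper and faster) and candidate 5 by 4, so the non-dominated set is 1, 3 and 4.
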
  15. Multi-objective optimization • Traditional methods, however, handle multi-objective problems by converting them into a single-objective problem. • This approach cannot guarantee finding solutions that are optimal for all objectives.
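
One common way to do this conversion is a weighted sum of the objectives; the talk does not say which conversion it has in mind, so the weighted sum is an assumption here. The Python sketch below uses hypothetical candidates and weights to show that each weight setting returns a single solution, and that the answer changes with the weights.

```python
# Hypothetical candidates: name -> (price, performance as a fraction).
# None of these dominates another: cheaper always means slower.
candidates = {
    "a": (10_000, 0.40),
    "b": (30_000, 0.70),
    "c": (60_000, 0.90),
}
MAX_PRICE = 100_000  # assumed scale used to normalize price into [0, 1]


def weighted_score(price, performance, weight):
    """Single objective: reward performance, penalize normalized price."""
    return weight * performance - (1 - weight) * price / MAX_PRICE


def best_by_weighted_sum(points, weight):
    """Return the one candidate with the highest weighted score."""
    return max(points, key=lambda name: weighted_score(*points[name], weight))


picks = {w: best_by_weighted_sum(candidates, w) for w in (0.1, 0.5, 0.9)}
print(picks)
```

All three candidates are non-dominated, yet any single run reports only one of them, and which one depends on weights that have to be chosen in advance.
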
  16. SPEA2 • SPEA2 is built on the concept of "survival of the fittest" from natural evolution. • It aims to improve the quality of the individuals in the solution space with each generation. • SPEA2 uses selection, crossover and mutation to retain the best individuals and discard the worst ones.
  17. SPEA2
  18. SPEA2 [Figure: initial population and empty archive; legend: individual]
  19. SPEA2
  20. Non-dominated
  21. Non-dominated solution
  22. Non-dominated solution set [Figure: points labeled E and F]
  23. SPEA2 [Figure legend: individual, non-dominated individual]
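
To tell non-dominated individuals apart from the rest, SPEA2 as usually described assigns each individual a strength (how many individuals it dominates) and a raw fitness (the summed strengths of the individuals that dominate it), so non-dominated individuals get a raw fitness of 0. The Python sketch below illustrates that step only; it assumes all objectives are maximized, leaves out the density term, and uses hypothetical objective values.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere (maximize)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def strength_and_raw_fitness(points):
    """Return (strength, raw fitness) lists aligned with the input points."""
    n = len(points)
    strength = [
        sum(dominates(points[i], points[j]) for j in range(n)) for i in range(n)
    ]
    raw = [
        sum(strength[j] for j in range(n) if dominates(points[j], points[i]))
        for i in range(n)
    ]
    return strength, raw


# Hypothetical individuals described by two objective values each.
points = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
strength, raw = strength_and_raw_fitness(points)
non_dominated_indices = [i for i, value in enumerate(raw) if value == 0]
print(strength, raw, non_dominated_indices)
```

Lower raw fitness is better in this scheme, so the individuals at indices 0, 1 and 2 would be the ones copied to the archive.
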
  24. SPEA2
  25. SPEA2 [Figure legend: individual, non-dominated individual]
  26. SPEA2: truncation operator [Figure legend: individual, non-dominated individual]
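
When more non-dominated individuals exist than the archive can hold, the truncation operator removes the most crowded ones first. A minimal Python sketch follows, based on the usual SPEA2 description in which the individual with the lexicographically smallest sorted list of neighbor distances is dropped repeatedly. The point coordinates are hypothetical, and this is not taken from the talk's `Truncation.r`.

```python
import math


def sorted_neighbor_distances(archive, index):
    """Distances from one point to all others, nearest first."""
    return sorted(
        math.dist(archive[index], other)
        for j, other in enumerate(archive)
        if j != index
    )


def truncate(archive, max_size):
    """Drop the most crowded point until the archive fits max_size."""
    archive = list(archive)
    while len(archive) > max_size:
        # Lists compare lexicographically: nearest neighbor first, then the next one.
        most_crowded = min(
            range(len(archive)),
            key=lambda i: sorted_neighbor_distances(archive, i),
        )
        archive.pop(most_crowded)
    return archive


# Hypothetical non-dominated points in a two-objective space.
points = [(0.0, 10.0), (1.0, 9.0), (1.5, 8.5), (5.0, 5.0), (10.0, 0.0)]
kept_four = truncate(points, 4)
kept_three = truncate(points, 3)
print(kept_four)
print(kept_three)
```

The points removed first sit in the tight cluster near the top left, while the two extreme points survive, which keeps the archive spread out along the front.
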
  27. SPEA2
  28. SPEA2
  29. SPEA2 [Figure: items labeled 1 to 4]
  30. SPEA2
  31. SPEA2 • Recombination: 10101101011001100100010010111 and 01100110010111001011101101101 • Mutation: 01100101011001100100010010111 and 10010101011001100100010010111
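
Recombination and mutation on bit strings can be sketched in Python as follows. One-point crossover and bit flipping are assumed here as the operators, since the transcript does not say which variants the talk uses. The short strings, the cut point and the flipped positions are hypothetical and fixed so that the result is reproducible; a real run would draw them at random.

```python
def one_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two equal-length bit strings after position cut."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < cut < len(parent_a):
        raise ValueError("cut must fall strictly inside the string")
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def flip_bits(chromosome, positions):
    """Return a copy of the bit string with the given positions inverted."""
    bits = list(chromosome)
    for position in positions:
        bits[position] = "1" if bits[position] == "0" else "0"
    return "".join(bits)


child_a, child_b = one_point_crossover("10101101", "01100110", cut=4)
mutated = flip_bits(child_a, [0, 7])
print(child_a, child_b, mutated)
```
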
  32. SPEA2 [Figure: items labeled 4, 3, 2, 1]
  33. Non-dominated rules • Three objectives: accuracy, comprehensibility, interestingness • Example: IF Sex = Male AND Location = Taipei THEN Product = beer, with A = 0.333333, C = 0.875000, I = 0.080000 [Figure: non-dominated rules]
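
With three objectives per rule, picking the non-dominated rules is the same dominance test applied to (A, C, I) triples, all maximized. In the Python sketch below only rule R1 uses the values shown on the slide; R2 to R4 are hypothetical rules added so that the filter has something to reject.

```python
# (accuracy, comprehensibility, interestingness) per rule.
rules = {
    "R1": (0.333333, 0.875, 0.080),  # values from the slide's example rule
    "R2": (0.300000, 0.750, 0.050),  # hypothetical: worse than R1 everywhere
    "R3": (0.600000, 0.625, 0.040),  # hypothetical: more accurate, less readable
    "R4": (0.600000, 0.500, 0.040),  # hypothetical: like R3 but longer
}


def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_rules(scored_rules):
    """Keep the rules that no other rule dominates."""
    return sorted(
        name
        for name, scores in scored_rules.items()
        if not any(
            dominates(other, scores)
            for other_name, other in scored_rules.items()
            if other_name != name
        )
    )


selected = non_dominated_rules(rules)
print(selected)
```

R1 and R3 both survive because neither beats the other on all three objectives: R3 is more accurate, while R1 is shorter and more interesting.
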
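  34. Case study: transaction data of an insurance broker company, 2005 to 2006. Attributes and attribute value index:
       • Gender: male, female
       • Occupation: professional (士), worker (工), military (軍)
       • Payment frequency: monthly, yearly, single premium (one-time payment)
       • Sales methods: telemarketing, over-the-counter insurance
       • Payment methods: credit card, cash, postal giro, bank transfer
       • Location: North, Central, South, East (including outlying islands)
       • Data source: department store, telecom, bank
       • Company: foreign life insurance company, domestic life insurance company, local property insurance company
       • Product: annuity, long-term life insurance, short-term life insurance, accident insurance, medical insurance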
  35. Case study [Figure: workflow] Data cleaning → data transaction (example: Male→01, Female→10) → training data and test data → SPEA2 with the objectives accuracy, comprehensibility and interestingness → data transaction (example: 01→Male, 10→Female)
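
The Male→01 and Female→10 example suggests that each attribute value is mapped to a fixed-width bit code before SPEA2 runs and mapped back afterwards. The Python sketch below is one way this could look. Only the gender codes come from the slide; the payment frequency codes and the function names are hypothetical.

```python
CODES = {
    "Gender": {"Male": "01", "Female": "10"},  # codes shown on the slide
    "Payment frequency": {  # hypothetical codes, for illustration only
        "Monthly": "001",
        "Yearly": "010",
        "Single premium": "100",
    },
}


def encode_record(record):
    """Concatenate the bit code of each attribute in a fixed attribute order."""
    return "".join(CODES[attribute][record[attribute]] for attribute in CODES)


def decode_record(bits):
    """Split the bit string by code width and map each chunk back to a value."""
    record = {}
    position = 0
    for attribute, codes in CODES.items():
        width = len(next(iter(codes.values())))
        chunk = bits[position:position + width]
        inverse = {code: value for value, code in codes.items()}
        record[attribute] = inverse.get(chunk)  # None if the chunk is not a valid code
        position += width
    return record


original = {"Gender": "Male", "Payment frequency": "Yearly"}
encoded = encode_record(original)
decoded = decode_record(encoded)
print(encoded, decoded)
```

Chunks that do not match any code, which crossover or mutation can produce, decode to `None` in this sketch; how the talk's scripts treat such chunks is not stated.
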
  36. Case study • SPEA2 R scripts: RuleMing.r, Objective Functions.r, SPEA2 Functions.r, Truncation.r, Crossover.r, Mutation.r
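  37. Case study: non-dominated rules • Sales methods = over-the-counter insurance AND Data source = department store AND Company = foreign life insurance company THEN Product = short-term life insurance • Payment methods = cash AND Data source = department store AND Company = foreign life insurance company THEN Product = short-term life insurance • Payment frequency = monthly AND Data source = department store Company = foreign life insurance company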
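  38. Case study: non-dominated rules • Sales methods = over-the-counter insurance AND Data source = department store AND Company = foreign life insurance company THEN Product = short-term life insurance • "Department store customers who take out insurance over the counter are more likely to consider buying short-term life insurance from a foreign life insurance company." • This suggests that foreign life insurance companies can recommend short-term life insurance to department store customers who buy insurance over the counter.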
  39. Thank you for listening.
