Datamining 2nd Decisiontree



  1. (1/3)
  2. (2/3) Classification (pattern recognition): assign each new example to one of a set of predefined classes, e.g., decide whether it belongs to class A.
  3. (3/3) Clustering: group examples so that similar ones, such as A and B, end up in the same group. Association rules: find rules of the form A ⇒ B that capture co-occurrences in the data.
  6. (figure: two decision trees, T in panel (A) and T' in panel (B), each with Yes/No branches)
  7. Cost of a decision tree: for each leaf x of a tree T, let cost(x) be the number of tests on the path from the root to x; the cost of T is cost(T) = Σ{cost(x) | x is a leaf of T}. (figure: an example tree over tests X1 to X4 with the cost of each leaf)
  9. The hardness proof is a reduction from EXACT COVER BY 3-SETS, a known NP-complete problem: given a set X whose size is a multiple of 3 and a family S = {T1, T2, ...} of 3-element subsets of X, decide whether there is a subfamily S1 ⊆ S such that (1) ∪{T | T ∈ S1} = X and (2) Ti ∩ Tj = ∅ for all distinct Ti, Tj ∈ S1. Example (Figure 5.4): for X = {1, ..., 9} laid out as a 3 × 3 grid, {{1, 4, 7}, {2, 3, 5}, {6, 8, 9}} is an exact cover; this example is checked in the first sketch after the slide list.
  10. The reduction builds a table whose objects are the elements of X together with |X| new objects Y = {y1, y2, ..., y|X|}, every object forming its own class. There is one binary attribute per 3-set Ti and one per new object yi: t[Ti] = 1 if t ∈ Ti and 0 otherwise, and t[yi] = 1 if t = yi and 0 otherwise.
  11. An attribute Ti is therefore 1 on exactly the three elements of Ti and 0 everywhere else (in particular on all of Y), while an attribute yi is 1 only on the single object yi; a yi test splits off one object, and a Ti test splits off three (Figure 5.5(A), (B)).
  12. Under the cost measure of slide 7, a chain of k tests has k + 1 leaves at depths 1, 2, ..., k, k, hence cost 1 + 2 + ··· + k + k. Since every object of Y has Ti = 0 for all i, the |Y| = |X| objects of Y can only be told apart by yi tests, and a chain of yi tests contributes 1 + 2 + ··· + |X| + |X| (Figure 5.6(A), (B)).
  13. If an exact cover exists, testing its |X|/3 sets in a chain handles the elements of X at a cost of only 1 + 2 + ··· + |X|/3 + |X|/3 for the chain itself; if no exact cover exists, every tree must pay strictly more. Hence the instance has an exact cover if and only if there is a decision tree whose cost stays within the corresponding threshold, and since EXACT COVER BY 3-SETS is NP-complete, constructing a minimum-cost decision tree is NP-hard (Section 5.4.2; figure: the chain y1 → y2 → ··· → y9).
  15. (figure: the two trees T and T' with Yes/No branches again, panels (A) and (B))
  16. Training data: S = {(x1, c1), (x2, c2), ..., (xN, cN)}, each example labeled with class ○ or ×. The entropy of the class distribution is H(C) = −p○ log2 p○ − p× log2 p×, where p○ and p× are the fractions of ○ and × examples.
  17. Example: with p○ = 4/10 and p× = 6/10, H(C) = −(4/10) log2(4/10) − (6/10) log2(6/10) = 0.971 (see the entropy sketch after the slide list).
  18. Cross-tabulating the class against a candidate test T1 (a split at 30): T1 = Yes has ○ 2 and × 2 (4 examples), T1 = No has ○ 2 and × 4 (6 examples); overall ○ 4 and × 6 (10 examples).
  19. For this split, H(C | T1 = Yes) = −(2/4) log2(2/4) − (2/4) log2(2/4) = 1.0 and H(C | T1 = No) = −(2/6) log2(2/6) − (4/6) log2(4/6) = 0.918. Weighting the branches by size: H(C | T1) = (4/10) H(C | T1 = Yes) + (6/10) H(C | T1 = No) = 0.951.
  20. The information gain of a test T is I(T) = H(C) − H(C | T). Here I(T1) = 0.971 − 0.951 = 0.020, while I(T2) = 0.420, I(T3) = 0.091, and I(T4) = 0.420. T2 and T4 tie for the largest gain, and T2 is selected as the root test (see the information-gain sketch after the slide list).
  21. (figure: the root split on T2 into Yes and No branches; the same selection is applied recursively to each branch)
  22. Recursing on the Yes branch (6 examples: ○ 4, × 2): H(C) = −(4/6) log2(4/6) − (2/6) log2(2/6) = 0.918. For T1, H(C | T1) = (4/6)(−(2/4) log2(2/4) − (2/4) log2(2/4)) + (2/6)(−(2/2) log2(2/2)) = 0.667, so I(T1) = 0.918 − 0.667 = 0.251; further, I(T3) = 0 and I(T4) = 0.918, so T4 is selected (the full greedy recursion is sketched after the slide list).
  24. Comparison with other classification methods such as naive Bayes.
  25. Representative decision tree algorithms: ID3, CART (Classification And Regression Tree), and C4.5.
  26. CART restricts every split to be binary. C4.5 extends ID3, e.g., with handling of continuous attributes and pruning. Many trees can also be combined into an ensemble (a random forest).
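
As a quick check of the example on slide 9, the following sketch verifies in Python that {{1, 4, 7}, {2, 3, 5}, {6, 8, 9}} is an exact cover of X = {1, ..., 9}; the variable names are illustrative, not from the lecture.

    # Verify the EXACT COVER BY 3-SETS example from slide 9:
    # the 3-sets must be pairwise disjoint and their union must be X.
    X = set(range(1, 10))
    cover = [{1, 4, 7}, {2, 3, 5}, {6, 8, 9}]

    union = set().union(*cover)
    pairwise_disjoint = sum(len(t) for t in cover) == len(union)
    print(pairwise_disjoint and union == X)  # True: an exact cover of X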
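
The entropy calculation of slides 16 and 17, as a minimal Python sketch; the function name and the count-list interface are assumptions made for illustration.

    from math import log2

    def entropy(counts):
        # H(C) = -sum_i p_i log2 p_i over the nonzero class counts.
        n = sum(counts)
        return -sum(c / n * log2(c / n) for c in counts if c > 0)

    # Slide 17: 4 examples of one class and 6 of the other.
    print(f"{entropy([4, 6]):.3f}")  # 0.971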
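
The conditional entropy and information gain of slides 18 to 20, under the same assumed interface; only the T1 branch counts appear in the transcript, so the call below reproduces I(T1) = 0.020.

    from math import log2

    def entropy(counts):
        n = sum(counts)
        return -sum(c / n * log2(c / n) for c in counts if c > 0)

    def information_gain(branch_counts):
        # I(T) = H(C) - H(C|T): each branch's entropy is weighted by
        # the fraction of examples that reach that branch.
        totals = [sum(col) for col in zip(*branch_counts)]
        n = sum(totals)
        h_cond = sum(sum(b) / n * entropy(b) for b in branch_counts)
        return entropy(totals) - h_cond

    # Slide 18: T1 = Yes has class counts (2, 2); T1 = No has (2, 4).
    print(f"{information_gain([[2, 2], [2, 4]]):.3f}")  # 0.020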
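
Finally, a sketch of the full greedy recursion of slides 20 to 22 in the style of ID3. The row format (dicts mapping test names to Yes/No values) and the four-example dataset in the demo are invented for illustration; the lecture's actual table for T1 to T4 is not in the transcript.

    from math import log2
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def id3(rows, labels, tests):
        # Leaf: all labels agree, or no tests remain (take the majority class).
        if len(set(labels)) == 1 or not tests:
            return Counter(labels).most_common(1)[0][0]
        # Greedy step: pick the test with the largest gain I(T) = H(C) - H(C|T).
        def gain(t):
            h_cond = 0.0
            for v in {r[t] for r in rows}:
                sub = [c for r, c in zip(rows, labels) if r[t] == v]
                h_cond += len(sub) / len(labels) * entropy(sub)
            return entropy(labels) - h_cond
        best = max(tests, key=gain)
        # Recurse on each branch with the chosen test removed.
        branches = {}
        for v in {r[best] for r in rows}:
            keep = [i for i, r in enumerate(rows) if r[best] == v]
            branches[v] = id3([rows[i] for i in keep],
                              [labels[i] for i in keep],
                              [t for t in tests if t != best])
        return {best: branches}

    # Invented demo data (not the lecture's table):
    rows = [{"T1": "Yes", "T2": "Yes"}, {"T1": "No", "T2": "Yes"},
            {"T1": "Yes", "T2": "No"}, {"T1": "No", "T2": "No"}]
    print(id3(rows, ["o", "o", "x", "x"], ["T1", "T2"]))
    # e.g. {'T2': {'Yes': 'o', 'No': 'x'}}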
