# Datamining 2nd decisiontree



1. Data mining tasks (1/3)
2. Data mining tasks (2/3): classification (pattern recognition): predict the class of a record from labeled examples.
3. Data mining tasks (3/3): clustering (grouping similar records into clusters such as A and B) and association rules (rules of the form A ⇒ B).
4. [no recoverable text]
5. [no recoverable text]
6. [Figure: two decision trees T and T' for the same data, each branching on Yes/No tests, panels (A) and (B)]
7. Cost of a decision tree: each node v ∈ T has a cost cost(v), and the cost of T is determined by the leaf costs {cost(x) | x ∈ T, x is a leaf}. [Figure: an example tree with tests X1, X2, X3, X4 and its per-leaf costs]
8. [no recoverable text]
9. Theorem 5.1: constructing a minimum-cost decision tree is NP-hard; the proof is a reduction from the NP-complete problem EXACT COVER BY 3-SETS. Definition 5.2 (EXACT COVER BY 3-SETS): given a set X whose size is a multiple of 3 and a collection S = {T1, T2, ...} of 3-element subsets of X, decide whether there is a subcollection S1 ⊂ S such that (1) ∪{T | T ∈ S1} = X and (2) Ti ∩ Tj = ∅ for i ≠ j. Figure 5.4: for X = {1, ..., 9} arranged as a 3×3 grid, {{1, 4, 7}, {2, 3, 5}, {6, 8, 9}} is an exact cover.
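The decision problem above can be checked by brute force on small instances such as the slide's example. A minimal sketch in Python (the helper names `is_exact_cover` and `has_exact_cover` are mine, not from the slides); the exponential search over subcollections is consistent with the problem being NP-complete:

```python
from itertools import combinations

def is_exact_cover(X, subcollection):
    """True if the 3-sets in subcollection partition X exactly."""
    covered = [e for T in subcollection for e in T]
    return sorted(covered) == sorted(X)  # every element covered exactly once

def has_exact_cover(X, S):
    """Brute force: an exact cover, if one exists, uses exactly |X|/3 sets."""
    k = len(X) // 3
    return any(is_exact_cover(X, combo) for combo in combinations(S, k))

# Slide example: X = {1, ..., 9}, exact cover {{1,4,7}, {2,3,5}, {6,8,9}}
X = list(range(1, 10))
S = [{1, 4, 7}, {2, 3, 5}, {6, 8, 9}, {1, 2, 3}]
print(has_exact_cover(X, S))  # True
```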
10. The reduction: given X and the 3-sets T1, T2, ..., add |X| new records Y = {y1, y2, ..., y|X|}. The class attribute A is t[A] = 1 for t ∈ X and t[A] = 0 for t ∈ Y.
11. Each 3-set Ti and each element yi becomes a binary test on a record t: t[Ti] = 1 if t ∈ Ti and 0 if t ∉ Ti; t[yi] = 1 if t = yi and 0 if t ≠ yi. A Ti test separates the three records of Ti, while a yi test separates only the single record yi (Figures 5.5(A) and 5.5(B)).
12. Cost comparison: since |Y| = |X|, a tree that tests y1, y2, ... in a chain classifies everything at cost 1 + 2 + ... + |X| + |X| (Figure 5.6(A)), while a tree that tests the 3-sets of an exact cover in a chain has cost 1 + 2 + ... + |X|/3 + |X|/3.
13. An exact cover of X exists if and only if the constructed instance admits a decision tree of cost at most 1 + 2 + ... + |X|/3 + |X|/3. Since EXACT COVER BY 3-SETS is NP-complete, minimum-cost decision tree construction is NP-hard (Section 5.4.2).
14. [no recoverable text]
15. [Figure: two decision trees T and T' with Yes/No branches, panels (A) and (B)]
16. Training data S = {(x1, c1), (x2, c2), ..., (xN, cN)}. Entropy of the class C: H(C) = −p○ log2 p○ − p× log2 p×, where p○ and p× are the fractions of records with class ○ and class ×.
17. Example: with p○ = 4/10 and p× = 6/10, H(C) = −(4/10) log2(4/10) − (6/10) log2(6/10) = 0.971.
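The entropy value on this slide can be reproduced directly. A minimal sketch in Python (the helper name `entropy` is mine, not from the slides):

```python
import math

def entropy(counts):
    """Shannon entropy H(C) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Slide example: 4 records of class "o" and 6 of class "x"
print(round(entropy([4, 6]), 3))  # 0.971
```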
18. Example: 10 records split by a Yes/No test T1:

   | C | Yes | No | total |
   |---|-----|----|-------|
   | ○ | 2 | 2 | 4 |
   | × | 2 | 4 | 6 |
   | total | 4 | 6 | 10 |
19. Conditional entropies for T1:
   H(C | T1 = Yes) = −(2/4) log2(2/4) − (2/4) log2(2/4) = 1.0
   H(C | T1 = No) = −(2/6) log2(2/6) − (4/6) log2(4/6) = 0.918
   H(C | T1) = (4/10) H(C | T1 = Yes) + (6/10) H(C | T1 = No) = 0.951
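The weighted average on this slide is the conditional entropy H(C | T1). A sketch in Python (function names are mine), using the branch class counts from the table, Yes: (2 ○, 2 ×) and No: (2 ○, 4 ×):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def conditional_entropy(branches):
    """H(C | T): branch entropies weighted by branch size."""
    total = sum(sum(counts) for counts in branches)
    return sum(sum(counts) / total * entropy(counts) for counts in branches)

yes, no = [2, 2], [2, 4]   # class counts (o, x) in each branch of T1
print(round(entropy(yes), 3))                     # 1.0
print(round(entropy(no), 3))                      # 0.918
print(round(conditional_entropy([yes, no]), 3))   # 0.951
```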
20. Information gain of a test T: I(T) = H(C) − H(C | T).
   I(T1) = H(C) − H(C | T1) = 0.971 − 0.951 = 0.020
   I(T2) = 0.420, I(T3) = 0.091, I(T4) = 0.420
   T2 has the largest gain (tied with T4), so T2 is chosen as the root test.
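The gain I(T1) follows from the same counts; the transcript only preserves T1's branch counts, so the other gains are taken as given from the slide. A sketch (names mine):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(class_counts, branches):
    """I(T) = H(C) - H(C | T)."""
    total = sum(class_counts)
    h_cond = sum(sum(b) / total * entropy(b) for b in branches)
    return entropy(class_counts) - h_cond

# T1 splits the 10 records (4 o, 6 x) into Yes (2 o, 2 x) and No (2 o, 4 x)
gain_t1 = information_gain([4, 6], [[2, 2], [2, 4]])
print(round(gain_t1, 3))  # 0.02
# Per the slide, I(T2) = 0.420, I(T3) = 0.091, I(T4) = 0.420,
# and the test with the largest gain becomes the root.
```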
21. Split the records on T2 into its Yes and No branches and repeat the test selection within each branch.
22. In the Yes branch of T2 (6 records: 4 ○ and 2 ×):
   H(C) = −(4/6) log2(4/6) − (2/6) log2(2/6) = 0.918
   H(C | T1) = (4/6)(−(2/4) log2(2/4) − (2/4) log2(2/4)) + (2/6)(−(2/2) log2(2/2)) = 0.667
   I(T1) = 0.918 − 0.667 = 0.251; I(T3) = 0, I(T4) = 0.918, so T4 is chosen.
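Repeating the same computation on the Yes branch reproduces these numbers; note the slide's 0.251 comes from subtracting the already-rounded values 0.918 and 0.667, while the unrounded gain is about 0.2516. A sketch (names mine):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(class_counts, branches):
    total = sum(class_counts)
    h_cond = sum(sum(b) / total * entropy(b) for b in branches)
    return entropy(class_counts) - h_cond

# Yes branch of T2: 4 o and 2 x records overall;
# T1 splits them into (2 o, 2 x) and (2 o, 0 x).
print(round(entropy([4, 2]), 3))                      # 0.918
gain_t1 = information_gain([4, 2], [[2, 2], [2, 0]])
print(round(gain_t1, 3))                              # 0.252
# Per the slide, I(T3) = 0 and I(T4) = 0.918, so T4 is selected here.
```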
23. [no recoverable text]
24. Comparison of decision trees with other classification methods, e.g. naive Bayes.
25. Representative decision tree algorithms: ID3, CART (Classification And Regression Tree), and C4.5.
26. CART builds binary (2-way) splits; C4.5 extends ID3; ensembles of trees such as Random Forest are also used.
27. Course information (10/21): contact sesejun+dm10@sel.is.ocha.ac.jp; next deadline 11/2; materials at http://togodb.sel.is.ocha.ac.jp/ (2010).
28. [no recoverable text]