BOAZ 11th cohort - Jo Danbi

01 Bagging  02 Boosting  03 Stacking
A summary of how these models are formed

Decision Tree
A supervised learning method for classification / prediction.
It builds a tree according to decision rules,
then classifies / predicts with that tree.
Purpose
1) Segmentation
Divide the data into a few groups with similar characteristics,
either to discover each group's characteristics or to find out which group each observation belongs to.
Ex) market segmentation, customer segmentation
2) Classification
Classify observations into a few grades of the target variable's categories, based on several predictor variables.
Ex) classifying customers as good / bad according to credit rating
3) Prediction
Find rules in the data and use them to predict future events.
Ex) predicting a loan limit from customer attributes
4) Data reduction and variable screening
From a very large number of predictor variables, select the ones that strongly affect the target variable.
5) Interaction effect identification
Identify the rules by which several predictor variables act on the target variable in combination (interaction effects).
6) Merging categories, or binning a continuous variable
Merge the categories of a categorical target variable into a few, or discretize a continuous target variable into a few grades.
Parent node
> The node directly above a child node
Child node
> A node split off from another node
Depth
> The number of nodes that make up a branch
Branch
> The nodes connected from one node down to a terminal node

[Diagram: an example tree labeling a parent node, its child nodes, and a depth of 3]

Root node
> The node where the tree structure begins
> The starting node; it contains the entire dataset
Internal node
> A node in the middle of the tree that is not a terminal node
> A node that has both a parent node and child nodes
Terminal node (leaf)
> A node located at the end of a branch
> A node with no child nodes
Ex) Predict whether each person caught a cold!

Name            Commute (min)  Gargles  Mask  Cold
Lee Yechan      40             X        O     No
Kim Yeongdo     35             O        O     No
Park Jeonghyun  110            O        X     Yes
Kim Taehee      85             O        O     No
Hong Jimin      100            X        O     Yes

Dependent variable: Cold
Train_data: the table above.

[Tree diagram]
Commute >= 60?
  No  -> Cold: No          (Lee Yechan, Kim Yeongdo)
  Yes -> Gargles?
           No  -> Cold: Yes    (Hong Jimin)
           Yes -> Wears a mask?
                    No  -> Cold: Yes   (Park Jeonghyun)
                    Yes -> Cold: No    (Kim Taehee)

This tree classifies Train_data perfectly!
Accuracy = 1
A tree with good performance.
Test_data
Name      Commute (min)  Gargles  Mask  Cold
Jo Danbi  50             O        X     No

Running the new observation through the tree above:
Jo Danbi?? -> misclassified.

Now remove one question from the tree:

[Tree diagram: one question fewer]
Commute >= 60?
  No  -> Cold: No          (Lee Yechan, Kim Yeongdo, and Jo Danbi)
  Yes -> Gargles?
           No  -> Cold: Yes    (Hong Jimin)
           Yes -> Cold: No     (Park Jeonghyun, Kim Taehee)

What if we had asked one question fewer?
This tree no longer classifies Train_data perfectly, but it gets Test_data right.

"On what criterion should we ask the questions?"
"On what criterion should we cut the questions off?"

Choose a split criterion and a stopping rule appropriate to the purpose of the analysis
and the structure of the data, assign predicted values, and obtain the decision tree.
Forming the decision tree
"Why does the root node ask this particular question?"
> The split criterion
"Why do the terminal nodes sit at different positions?"
> The stopping rule
"Which value is assigned at the final classification (prediction)?"
> Predicted-value assignment
Formation process
Split Rule (Growing Rule): the tree's splitting / growing rule
- The criterion by which child nodes are created from a parent node
- Purity: the degree to which the cases in a node belong to one particular category of the target variable
- How well the target variable is separated is measured by purity or impurity
- Child nodes are formed by the split criterion that increases purity
Stop Rule: the stopping rule
- The criterion that stops any further splitting
Pruning
- A tree with too many nodes runs into the problem of overfitting
- Cut off inappropriate nodes to simplify the model
Forming the decision tree
Validity evaluation
Evaluate the tree using a gain chart, a risk chart, or cross-validation.
Interpretation and prediction
Feed new data into the tree, then classify and predict.
Validity evaluation & interpretation and prediction

Formation process
Classification tree
The model (tree) built when the dependent variable is categorical.
Regression tree
The model (tree) built when the dependent variable is continuous.
On what criterion is the tree built?
The overall procedure is the same for both kinds of tree;
only the split (growing) rule and the way predictions are made differ!
Classification tree
Split rule
Splits are based on the frequencies with which the target variable falls in each category.
Impurity: how many cases of other categories are mixed into a node.
- Entropy index
- Gini index
- p-value of the chi-square statistic
> Split by the criterion with the smallest impurity.

Classification tree
Entropy index
Create child nodes by the split criterion
with the smallest entropy index.
Gini index
Create child nodes by the split criterion
that decreases the Gini index.
Classification tree

> P_k: among the observations in region A, the proportion belonging to class k

Gini index
G(L) = 1 - [(7/8)^2 + (1/8)^2] = 0.21875
G(R) = 1 - [(3/8)^2 + (5/8)^2] = 0.46875
Gini = (8/16) x 0.21875 + (8/16) x 0.46875 = 0.34375

Entropy index
E(L) = -[(7/8) log2(7/8) + (1/8) log2(1/8)] = 0.5436
E(R) = -[(3/8) log2(3/8) + (5/8) log2(5/8)] = 0.9544
Entropy = (8/16) x 0.5436 + (8/16) x 0.9544 = 0.749

[Graph: Gini index, entropy, and misclassification error as functions of the proportion p in one class]
> Impurity is largest when the proportion is 0.5 (the two classes are mixed half and half).

If every observation in region A belongs to the same class:
uncertainty is minimal = purity is maximal,
entropy = 0.
If there are two classes mixed exactly half and half:
uncertainty is maximal = purity is minimal,
entropy = 1.
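The worked example above can be reproduced with a few lines of plain Python. This is a minimal sketch: the class counts (7 vs 1 in the left node, 3 vs 5 in the right node, 16 observations in total) are the ones from the slide.

```python
import math

def gini(counts):
    """Gini index of one node, given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy (base 2) of one node, given its class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def split_impurity(measure, left, right):
    """Weighted impurity of a split into two child nodes."""
    n = sum(left) + sum(right)
    return sum(left) / n * measure(left) + sum(right) / n * measure(right)

print(gini([7, 1]))                             # 0.21875
print(gini([3, 5]))                             # 0.46875
print(split_impurity(gini, [7, 1], [3, 5]))     # 0.34375
print(split_impurity(entropy, [7, 1], [3, 5]))  # close to 0.749
```

Note that `entropy([4, 4])` returns exactly 1, matching the half-and-half worst case described above.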
Classification tree: chi-square statistic
Create child nodes by the split with the smallest p-value
> i.e., the split with the largest chi-square statistic (= the smallest p-value).

        Good       Bad       Total
Left    32 (56)    48 (24)    80
Right   178 (154)  42 (66)   220
Total   210        90        300
(expected counts in parentheses)

Chi-square statistic: the sum over all cells of (expected - observed)^2 / expected
(56-32)^2/56 + (24-48)^2/24 + (154-178)^2/154 + (66-42)^2/66 = 46.75
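The chi-square computation above can be sketched in Python. The expected counts are derived from the row and column totals, so only the observed table needs to be supplied:

```python
def chi_square(observed):
    """Chi-square statistic for a split table.
    observed[i][j]: observed count for split side i and class j."""
    n = sum(sum(row) for row in observed)
    col_totals = [sum(row[j] for row in observed) for j in range(len(observed[0]))]
    stat = 0.0
    for row in observed:
        row_total = sum(row)
        for j, o in enumerate(row):
            expected = row_total * col_totals[j] / n  # e.g. 80 * 210 / 300 = 56
            stat += (expected - o) ** 2 / expected
    return stat

# The slide's table: Left = (32 Good, 48 Bad), Right = (178 Good, 42 Bad).
print(round(chi_square([[32, 48], [178, 42]]), 2))  # 46.75
```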
Classification tree
When the explanatory variable is categorical ('A', 'B', 'C'):
Case 1: R1 = {A}, R2 = {B, C}
Case 2: R1 = {B}, R2 = {A, C}
Case 3: R1 = {C}, R2 = {A, B}
Compare the Gini or entropy index of each case and choose the split rule (case) with the smallest value.

Classification tree
When the explanatory variable is continuous, find a cutoff value.
Ex) Commute >= 60?  (Yes -> R1, No -> R2)
Find the cutoff that yields the smallest Gini or entropy index.
Cutoff candidates:
Ex1) 1st / 2nd / 3rd quartile
Ex2) seq(from=Q1, to=Q3, length.out=10)
Ex3) seq(from=min(x), to=max(x), length.out=10)
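The cutoff search for a continuous variable can be sketched as below. The data are made up, and the candidate set used here (midpoints between consecutive observed values) is one common choice alongside the quantile-based candidates listed above:

```python
def gini(labels):
    """Gini index of a node, given the list of class labels in it."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_cutoff(x, y):
    """Try midpoints between consecutive sorted x values as cutoff candidates
    and return (cutoff, weighted Gini) for the smallest weighted Gini index."""
    pairs = sorted(zip(x, y))
    xs = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in candidates:
        left = [yy for xx, yy in pairs if xx < t]
        right = [yy for xx, yy in pairs if xx >= t]
        g = len(left) / len(y) * gini(left) + len(right) / len(y) * gini(right)
        if best is None or g < best[1]:
            best = (t, g)
    return best

x = [1, 2, 3, 7, 8, 9]
y = [0, 0, 0, 1, 1, 0]
print(best_cutoff(x, y))  # the cutoff 5.0 gives the smallest weighted Gini
```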
Regression tree
Split rule
Nodes are split based on the mean and standard deviation of the target variable.
The number of distinct predicted values equals the number of terminal nodes.
Impurity: how widely the target values within a node are spread.
- p-value of the F-statistic
- Variance reduction
> Split so as to reduce the sum of squared errors (SSE).

Splitting so as to reduce the SSE:

Node1: y = 10, 12, 13, 8, 7    (mean = 10)
Node2: y = 20, 24, 26, 16, 14  (mean = 20)

Node1 SSE: (10-10)^2 + (12-10)^2 + ... + (7-10)^2 = 26
Node2 SSE: (20-20)^2 + (24-20)^2 + ... + (14-20)^2 = 104
Weighted sum of SSEs: (5/10) x 26 + (5/10) x 104 = 65

Split in the direction that reduces this SSE.
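The SSE bookkeeping above, as a short sketch:

```python
def node_sse(ys):
    """Sum of squared errors of a node's y values around the node mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

node1 = [10, 12, 13, 8, 7]     # mean 10
node2 = [20, 24, 26, 16, 14]   # mean 20
n = len(node1) + len(node2)
weighted = len(node1) / n * node_sse(node1) + len(node2) / n * node_sse(node2)
print(node_sse(node1), node_sse(node2), weighted)  # 26.0 104.0 65.0
```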
Regression tree: F-statistic & variance reduction
[Figure: choosing the split via the F-statistic / variance reduction]

Regression tree
[Figure: a fitted regression tree on features X0-X3 and target Y]
mse = the MSE of each node
samples = the number of samples that came down from the parent node into this node
value = the mean of the y values of the samples in the node

Stopping rule
Stop rule: the criterion that stops any further splitting.
1. When further splitting no longer reduces the impurity.
2. When too few samples remain in a child node.
3. When the tree reaches a depth the analyst fixed in advance.
Pruning
Removing inappropriate nodes.
> Otherwise the tree may overfit the training data.
> A rule that keeps the tree at a reasonable size.

Q. What happens as the decision tree gains more splits?
A. The misclassification rate on new data decreases.
Q. And if the number of splits keeps growing without bound?
A. Past a certain point, the misclassification rate increases instead.
When to prune?
> At the point where the misclassification rate on validation data starts to increase.
Think of pruning not as throwing data away,
but as merging classifiers (splits).
Pruning: cost complexity
Learn to find the subtree that minimizes the cost complexity:
CC(T) = Err(T) + a x L(T)
CC(T): the cost complexity of the decision tree
(= smaller for simple models with few errors and few terminal nodes)
Err(T): the misclassification rate (impurity) on validation data
L(T): the number of terminal nodes (structural complexity)
a: the weight combining Err(T) and L(T) (usually between 0.01 and 0.1)

Pruning: pessimistic pruning
When splitting, assume an extra error of (number of terminal nodes x 0.5) on top of the misclassification rate
> in order to avoid overfitting.

Before pruning: misclassification rate 0.45, 4 terminal nodes
-> pessimistic error 0.45 + 4 x 0.5 = 2.45
After pruning: misclassification rate 0.66, 3 terminal nodes
-> pessimistic error 0.66 + 3 x 0.5 = 2.16
Pruning the tree model gives the lower pessimistic error, so prune.
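The pessimistic-error comparison above is plain arithmetic; a sketch, using the slide's penalty of 0.5 per terminal node:

```python
def pessimistic_error(misclassification_rate, n_terminal_nodes, penalty=0.5):
    """Observed error plus a fixed penalty per terminal node."""
    return misclassification_rate + n_terminal_nodes * penalty

before_pruning = pessimistic_error(0.45, 4)  # 2.45
after_pruning = pessimistic_error(0.66, 3)   # 2.16
print(after_pruning < before_pruning)        # True, so pruning is worthwhile
```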
Interpretation and prediction
Classification tree
Predict with the majority vote of the classes in the terminal node.
Ex) a leaf with Yes: 90, No: 10 predicts Yes; a leaf with Yes: 34, No: 66 predicts No.
Regression tree
Predict with the mean of the y values in the terminal node.
Ex) Node1 (y = 10, 12, 13, 8, 7) predicts 10;
    Node2 (y = 20, 24, 26, 16, 14) predicts 20.
Advantages
1. Ease of interpretation
- The model is expressed as a tree structure, so it is easy to interpret.
- It is easy to apply the model to new data.
- It is easy to see which input variables matter.
2. Interpretation of interaction effects
- It is easy to see how two or more variables combine to affect the target variable.
3. Non-parametric model
- No assumptions of linearity, normality, or homoscedasticity are needed.
- Only the ranks affect the analysis, so it is not sensitive to outliers.

Disadvantages
1. Discontinuity
- Continuous variables are treated as discontinuous values, so the prediction error can be large.
2. Lack of linear or main effects
- It cannot produce the same results as a linear or main-effects model.
3. Instability
- It depends only on the analysis sample, so its predictions for new data are unstable.
Summary
A tree is easy to interpret, and its decision process resembles human decision-making,
but its predictive power is considerably lower and a bias-variance problem arises.
To solve this:
build several trees and combine them,
using an ensemble (bagging, boosting, Random Forest)
to raise the tree's predictive power.

Let's take a break!!!!
Ensemble
A method that combines several classification or regression models to raise prediction accuracy.
The goal is to lower the error by handling the bias-variance trade-off.
Bagging / Boosting / Stacking

Why use an ensemble?
Errors in learning:
1. Underfitting (high bias)
2. Overfitting (high variance)
Bagging in particular averages the results from each sample
toward a kind of middle value, so it can avoid overfitting.
Categorical target: aggregate by voting.
Continuous target: aggregate by averaging.
Minimize the error through ensemble techniques!
Why use an ensemble?
https://www.youtube.com/watch?v=Un9zObFjBH0

Bagging (Bootstrap AGGregatING)
Draw several samples, train a model on each,
and aggregate the results.
> Random Forest
Boosting
Combine weak learners to build a strong learner.
> AdaBoost / XGBoost / GBM / Light GBM
Bagging
Bagging: short for Bootstrap AGGregatING.
(Bootstrap: drawing a sample from the data)
- Draw several simple random samples with replacement, each the same size as the original data set,
  build a classifier on each sample, and ensemble the results.
- Because the draws are with replacement,
  the same observation may be drawn into one sample several times, while another may not be drawn at all.
- When a model overfits a single training set and performs poorly on test data,
  i.e., when it has high variance, building several training sets and combining the models
  turns high variance into low variance and improves predictive performance.

1. Draw n simple random samples with replacement
[Diagram: Train data {A, B, C, D} resampled into Bootstrap sample1, Bootstrap sample2, Bootstrap sample3, ...]
2. Build a single model on each sample
[Diagram: each bootstrap sample -> training -> model1, model2, model3, ...]
Categorical target: voting
Continuous target: averaging
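Steps 1 and 2 above can be sketched in plain Python. The four-row train set mirrors the diagram; the "models" are left out, and a hard-coded list of their predictions stands in for them:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Step 1: a simple random sample with replacement, the same size as the data."""
    return [rng.choice(data) for _ in range(len(data))]

def majority_vote(predictions):
    """Aggregate categorical predictions by voting."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
train = ["A", "B", "C", "D"]
samples = [bootstrap_sample(train, rng) for _ in range(3)]
for s in samples:
    print(s)  # each sample has 4 rows; repeats and omissions are expected

# Step 2 would train one model per sample; suppose they predicted:
print(majority_vote(["Yes", "Yes", "No"]))  # Yes
```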
Bagging
https://www.youtube.com/watch?time_continue=1&v=2Mg8QD0F1dQ

> Good at lowering variance.
Because the final result takes the outputs of several different models into account,
bagging avoids training that leans too heavily on one train set (overfitting),
while keeping the model's accuracy high.
>> An ensemble method well suited to machine-learning techniques that are prone to overfitting
(= that have high variance).

Random Forest
> Bagging applied to decision trees:
= Data Bagging + Feature Bagging + Decision Tree
Data Bagging
> Simple random sampling of the data with replacement (ordinary bagging)
Feature Bagging
> Simple random sampling of the variables with replacement (using randomly selected features = randomized node optimization)
Decision Tree
> A decision tree (CART)
Bagging: Random Forest
Random Forest = Data Bagging + Feature Bagging + Decision Tree

Original data
     Temp  Humidity  Wind  Rain
A    15    60        15    1
B    21    10        1     1
C    3     70        5     0
D    7     2         30    0

Tree1 (rows resampled; features Temp, Humidity)
     Temp  Humidity  Rain
A    15    60        1
A    15    60        1
C    3     70        0
A    15    60        1

Tree2 (rows resampled; features Humidity, Wind)
     Humidity  Wind  Rain
D    2         30    0
B    10        1     1
C    70        5     0
C    70        5     0

Feature bagging is applied to keep tree correlation
(the problem that arises when every tree draws the same variables) from growing too high.
Bagging: Random Forest
Performance evaluation
OOB (Out-Of-Bag) error
> The error measured on the data that were not drawn during bootstrap sampling,
i.e., the rows each tree never saw.
> OOB samples are mainly used to estimate the misclassification rate
expected on evaluation data and to estimate variable importance.

An important Random Forest feature: variable importance
- One of the most important issues in data analysis is variable selection!
- Because it applies feature bagging, a Random Forest can rank the importance of each variable.
- While the Random Forest is built, an OOB (Out-Of-Bag) error is obtained for each feature.
(OOB: the test data for each feature left over from the bootstrap's random sampling with replacement)
- For classification, variable importance is determined by how much the impurity decreases when the variable is split on.
- For regression, variable importance is measured through the residual sum of squares.
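The out-of-bag idea can be made concrete: the OOB rows of one bootstrap draw are simply the indices that were never sampled. A sketch (the 6-row example is made up):

```python
import random

def oob_indices(n_rows, bootstrap_idx):
    """Rows never drawn into the bootstrap sample: the tree's out-of-bag set."""
    return sorted(set(range(n_rows)) - set(bootstrap_idx))

print(oob_indices(6, [0, 0, 2, 5, 5, 1]))  # [3, 4]

# With many rows, roughly e^-1 (about 36.8%) of the rows end up out-of-bag:
rng = random.Random(1)
n = 10_000
draw = [rng.randrange(n) for _ in range(n)]
print(len(oob_indices(n, draw)) / n)  # close to 0.368
```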
Boosting
After bootstrap sampling,
use the classification results to give larger weights to the misclassified data when drawing the next sample.
> Bagging trains in parallel, whereas boosting trains sequentially.
> After each round of training, the weights are redistributed according to the results.
> Wrong answers receive high weights and correct answers receive low weights,
so the learner can concentrate on the wrong answers.

Algorithm   Features                                                        Year
AdaBoost    - Weights the misclassified cases                               2003
GBM         - Weights wrong answers via the gradient of the loss function   2007
XGBoost     - Better performance than GBM
            - Efficient use of system resources (CPU, memory)
            - Performance proven through Kaggle                             2014
Light GBM   - Better performance and lower resource use than XGBoost
            - Can train on large data that XGBoost cannot handle
            - Gains speed by approximating the split                        2016
Boosting: AdaBoost
0. Draw sample data just as in bagging.
1. Train one model (Box1).
2. Predict with the trained model, give large weights to the cases it got wrong, and train again (Box2).
3. Repeat this process (Box3) and combine the training results (Box4).
> Weak against noisy data and outliers.

AdaBoost: 1. Draw one sample with replacement and build a tree model (same as bagging)
[Diagram: Train data {A, B, C, D} -> bootstrap sample {A, A, C, D} -> Tree 1]
AdaBoost: 2. Update the weights
After testing on Train_data (here only B is misclassified),
raise the probability that the misclassified data are drawn, then run step 1 again.

Initial weights: A, B, C, D each 1/4.

Updated weights (the denominator is the normalizing sum over A, B, C, D):
A: (1/4) * exp(-a) / (A + B + C + D)
B: (1/4) * exp(a) / (A + B + C + D)    <- misclassified
C: (1/4) * exp(-a) / (A + B + C + D)
D: (1/4) * exp(-a) / (A + B + C + D)

-> Tree 2 is trained on a sample drawn with these weights.

#. e (error rate): sum of the weights of the misclassified data / sum of all weights
#. a (confidence): (1/2) * ln((1 - e) / e)
AdaBoost: 3. Repeat
Keep repeating the steps above until the error rate reaches 0
or the number of tree models reaches a preset count.
[Diagram: Train data repeatedly resampled with updated weights -> Tree 1, Tree 2, Tree 3, ...]

AdaBoost: 4. Vote, multiplying each tree's prediction by its confidence a
Final prediction = Tree 1 predict * a + Tree 2 predict * a + Tree 3 predict * a + ...

https://www.youtube.com/watch?v=GM3CDQfQ4sw
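The weight update in step 2 can be sketched directly from the formulas for e and a above. This is a minimal sketch in plain Python; the True/False vector marking which samples the tree classified correctly is the only input besides the weights:

```python
import math

def adaboost_update(weights, correct):
    """One AdaBoost round: error rate e, confidence a, and renormalized weights."""
    e = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    a = 0.5 * math.log((1 - e) / e)
    updated = [w * math.exp(a if not ok else -a) for w, ok in zip(weights, correct)]
    z = sum(updated)  # the normalizing denominator (A + B + C + D on the slide)
    return e, a, [w / z for w in updated]

# Slide example: A, B, C, D start at 1/4 each, and only B is misclassified.
e, a, w = adaboost_update([0.25] * 4, [True, False, True, True])
print(e)  # 0.25
print(w)  # about [1/6, 1/2, 1/6, 1/6]: B's weight grows, the others shrink
```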
Boosting: GBM
GBM (Gradient Boosting) = gradient boosting
> Each stage learns the difference between the previous stage's result and the truth.
> Applies gradient descent in the weight-calculation step.
The second learner f2 learns the first learner's prediction error.
If the first learner f1 fits well, the error variance of the second stage, Var(e2),
will be smaller than that of the first stage, Var(e1).

Predicting Y with learner M:
> Y = M(x) + error
> error = G(x) + error2     (error > error2)
> error2 = H(x) + error3    (error2 > error3)
> Y = M(x) + G(x) + H(x) + error3
Accuracy should be higher than with learner M alone (error > error3).
Subdividing the error.
Boosting: GBM
But the classifiers M, G, and H perform differently,
and giving them all the same weight can make them interfere with one another
for some x and raise the error.

Finding the optimal weight for each function improves prediction accuracy
(the gradient descent algorithm computes the optimal weights):
Y = alpha * M(x) + beta * G(x) + gamma * H(x) + error3

How are the weights assigned?
> Initial data weight D = 1/n (n: the number of training samples)
> Error e: misclassified data / all training data
> Weak model's function h(t) (= the model's prediction)
> Weight update D
> Model weight a (= acts as a learning rate)
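The residual-fitting idea (Y = M(x) + G(x) + H(x) + ...) can be sketched with regression stumps as the weak learners. The toy data are made up; each round fits a stump to the current residuals, so the training SSE can only go down:

```python
def fit_stump(x, y):
    """Regression stump: one threshold, with left/right mean predictions."""
    best = None
    xs = sorted(set(x))
    for t in [(a + b) / 2 for a, b in zip(xs, xs[1:])]:
        left = [yy for xx, yy in zip(x, y) if xx < t]
        right = [yy for xx, yy in zip(x, y) if xx >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((yy - (lm if xx < t else rm)) ** 2 for xx, yy in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v < t else rm

def gradient_boost(x, y, rounds=3):
    """Each stage fits a stump to the current residuals, as in Y = M(x) + G(x) + ..."""
    stumps, residual = [], list(y)
    for _ in range(rounds):
        h = fit_stump(x, residual)
        stumps.append(h)
        residual = [r - h(v) for v, r in zip(x, residual)]
    return lambda v: sum(h(v) for h in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [3, 2, 4, 10, 9, 12]
model = gradient_boost(x, y)
sse_model = sum((yy - model(xx)) ** 2 for xx, yy in zip(x, y))
sse_mean = sum((yy - sum(y) / len(y)) ** 2 for yy in y)
print(sse_model < sse_mean)  # True: the boosted model beats predicting the mean
```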
Boosting: XGBoost
XGBoost (eXtreme Gradient Boosting)
- GBM + distributed / parallel processing
- Considers only a subset of candidates when choosing split points
- Sparsity-aware
- Binary classification / multiclass classification / regression / learning to rank
> Defines an objective function and updates the weights toward the values that minimize it.
> Supervised learning: learns the features x to predict the labels y.

> CPU parallelism: each core grows its own branches for the variables assigned to it.
> Shorter computation time.

> For a continuous variable, rather than examining every value
when choosing a split point, it decides from a subset of candidates.
ID  Korean  Math
1   100     90
2   95      80
3   75      90
4   20      55
-> decide the split point from a few candidate values only.

> Sparsity awareness
> It can skip zero entries while training.
> One-hot (dummy) encoding the input raises the speed!
ID  Subject        ID  Korean  Math  English  Science
1   Korean         1   1       0     0        0
2   Math           2   0       1     0        0
3   English        3   0       0     1        0
4   Science        4   0       0     0        1
Boosting: Light GBM
Light GBM
> A GBM framework based on decision tree algorithms (faster, stronger performance)
> Used for ranking, classification, and similar problems
How does it differ from other tree algorithms?
- Grows the tree leaf-wise (vertically)
- Grows the leaf with the largest delta loss;
that is, for the same number of leaves, leaf-wise growth can reduce the loss further.
https://www.slideshare.net/freepsw/boosting-bagging-vs-boosting

Leaf-wise (Light GBM)
> Splits the data at the node with the largest change in loss
> Excellent performance when there is a lot of training data
> Vertical growth
Level-wise (XGBoost, Random Forest)
> Visits the nodes closest to the root first
> Horizontal growth

Why is Light GBM so popular?
> It can train on large amounts of data quickly and in parallel (low memory use; can use the GPU).
> Its prediction accuracy is often higher (the strength of the leaf-wise tree).
Characteristics
> 2 to 10 times faster than XGBoost with the same parameter settings.
> A leaf-wise tree is sensitive to overfitting, so Light GBM suits large datasets
(at least roughly 10,000 rows).
Stacking
"Two heads are better than one"
= meta modeling
1) Average or vote the outputs of several models to produce a new result.
2) Turn the outputs of several models into new variables of a data set (then train another model).

> Averaging / voting the outputs of several models
Dependent variable: categorical
- Average the predicted probabilities into the final prediction
- Or vote on the predicted labels themselves
Dependent variable: continuous
- Average the predictions themselves into the final prediction

> Turning the outputs of several models into new variables (then training a model again)
Dependent variable: categorical
- Use the predicted probabilities as new variables
- Or the predicted labels themselves
Dependent variable: continuous
- Use the predictions themselves as new variables
[Diagram: Model 1, Model 2, and Model 3 are each trained on Train and predict on Test;
the predictions are combined by voting or averaging.]

[Diagram: Model 1 and Model 2 each predict on Train and Test;
their predictions become feature1 and feature2 of a new data set.]
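Both stacking variants operate on per-model prediction lists; a sketch (the probability values are made up):

```python
from collections import Counter

def average_probs(per_model_probs):
    """Variant 1, continuous / probability output: average across models per sample."""
    return [sum(ps) / len(ps) for ps in zip(*per_model_probs)]

def vote_labels(per_model_labels):
    """Variant 1, categorical output: vote across models per sample."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*per_model_labels)]

def as_new_features(per_model_preds):
    """Variant 2: one new feature column per model, one row per sample."""
    return [list(row) for row in zip(*per_model_preds)]

probs = [[0.9, 0.2], [0.8, 0.4], [0.7, 0.3]]  # 3 models x 2 test samples
print(average_probs(probs))                    # about [0.8, 0.3]
print(vote_labels([["Yes", "No"], ["Yes", "No"], ["No", "No"]]))  # ['Yes', 'No']
print(as_new_features(probs))  # rows [0.9, 0.8, 0.7] and [0.2, 0.4, 0.3] feed a meta model
```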
Q & A
Decision tree and ensemble
Decision Trees.pptDecision Trees.ppt
Decision Trees.ppt
ย 
unit classification.pptx
unit  classification.pptxunit  classification.pptx
unit classification.pptx
ย 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
ย 

More from Danbi Cho

Crf based named entity recognition using a korean lexical semantic network
Crf based named entity recognition using a korean lexical semantic networkCrf based named entity recognition using a korean lexical semantic network
Crf based named entity recognition using a korean lexical semantic networkDanbi Cho
ย 
Gpt models
Gpt modelsGpt models
Gpt modelsDanbi Cho
ย 
Attention boosted deep networks for video classification
Attention boosted deep networks for video classificationAttention boosted deep networks for video classification
Attention boosted deep networks for video classificationDanbi Cho
ย 
A survey on deep learning based approaches for action and gesture recognition...
A survey on deep learning based approaches for action and gesture recognition...A survey on deep learning based approaches for action and gesture recognition...
A survey on deep learning based approaches for action and gesture recognition...Danbi Cho
ย 
ELECTRA_Pretraining Text Encoders as Discriminators rather than Generators
ELECTRA_Pretraining Text Encoders as Discriminators rather than GeneratorsELECTRA_Pretraining Text Encoders as Discriminators rather than Generators
ELECTRA_Pretraining Text Encoders as Discriminators rather than GeneratorsDanbi Cho
ย 
A survey on automatic detection of hate speech in text
A survey on automatic detection of hate speech in textA survey on automatic detection of hate speech in text
A survey on automatic detection of hate speech in textDanbi Cho
ย 
Zero wall detecting zero-day web attacks through encoder-decoder recurrent ne...
Zero wall detecting zero-day web attacks through encoder-decoder recurrent ne...Zero wall detecting zero-day web attacks through encoder-decoder recurrent ne...
Zero wall detecting zero-day web attacks through encoder-decoder recurrent ne...Danbi Cho
ย 
Can recurrent neural networks warp time
Can recurrent neural networks warp timeCan recurrent neural networks warp time
Can recurrent neural networks warp timeDanbi Cho
ย 
Man is to computer programmer as woman is to homemaker debiasing word embeddings
Man is to computer programmer as woman is to homemaker debiasing word embeddingsMan is to computer programmer as woman is to homemaker debiasing word embeddings
Man is to computer programmer as woman is to homemaker debiasing word embeddingsDanbi Cho
ย 
Situation recognition visual semantic role labeling for image understanding
Situation recognition visual semantic role labeling for image understandingSituation recognition visual semantic role labeling for image understanding
Situation recognition visual semantic role labeling for image understandingDanbi Cho
ย 
Mitigating unwanted biases with adversarial learning
Mitigating unwanted biases with adversarial learningMitigating unwanted biases with adversarial learning
Mitigating unwanted biases with adversarial learningDanbi Cho
ย 

More from Danbi Cho (11)

Crf based named entity recognition using a korean lexical semantic network
Crf based named entity recognition using a korean lexical semantic networkCrf based named entity recognition using a korean lexical semantic network
Crf based named entity recognition using a korean lexical semantic network
ย 
Gpt models
Gpt modelsGpt models
Gpt models
ย 
Attention boosted deep networks for video classification
Attention boosted deep networks for video classificationAttention boosted deep networks for video classification
Attention boosted deep networks for video classification
ย 
A survey on deep learning based approaches for action and gesture recognition...
A survey on deep learning based approaches for action and gesture recognition...A survey on deep learning based approaches for action and gesture recognition...
A survey on deep learning based approaches for action and gesture recognition...
ย 
ELECTRA_Pretraining Text Encoders as Discriminators rather than Generators
ELECTRA_Pretraining Text Encoders as Discriminators rather than GeneratorsELECTRA_Pretraining Text Encoders as Discriminators rather than Generators
ELECTRA_Pretraining Text Encoders as Discriminators rather than Generators
ย 
A survey on automatic detection of hate speech in text
A survey on automatic detection of hate speech in textA survey on automatic detection of hate speech in text
A survey on automatic detection of hate speech in text
ย 
Zero wall detecting zero-day web attacks through encoder-decoder recurrent ne...
Zero wall detecting zero-day web attacks through encoder-decoder recurrent ne...Zero wall detecting zero-day web attacks through encoder-decoder recurrent ne...
Zero wall detecting zero-day web attacks through encoder-decoder recurrent ne...
ย 
Can recurrent neural networks warp time
Can recurrent neural networks warp timeCan recurrent neural networks warp time
Can recurrent neural networks warp time
ย 
Man is to computer programmer as woman is to homemaker debiasing word embeddings
Man is to computer programmer as woman is to homemaker debiasing word embeddingsMan is to computer programmer as woman is to homemaker debiasing word embeddings
Man is to computer programmer as woman is to homemaker debiasing word embeddings
ย 
Situation recognition visual semantic role labeling for image understanding
Situation recognition visual semantic role labeling for image understandingSituation recognition visual semantic role labeling for image understanding
Situation recognition visual semantic role labeling for image understanding
ย 
Mitigating unwanted biases with adversarial learning
Mitigating unwanted biases with adversarial learningMitigating unwanted biases with adversarial learning
Mitigating unwanted biases with adversarial learning
ย 

Recently uploaded

(Genuine) Escort Service Lucknow | Starting โ‚น,5K To @25k with A/C ๐Ÿง‘๐Ÿฝโ€โค๏ธโ€๐Ÿง‘๐Ÿป 89...
(Genuine) Escort Service Lucknow | Starting โ‚น,5K To @25k with A/C ๐Ÿง‘๐Ÿฝโ€โค๏ธโ€๐Ÿง‘๐Ÿป 89...(Genuine) Escort Service Lucknow | Starting โ‚น,5K To @25k with A/C ๐Ÿง‘๐Ÿฝโ€โค๏ธโ€๐Ÿง‘๐Ÿป 89...
(Genuine) Escort Service Lucknow | Starting โ‚น,5K To @25k with A/C ๐Ÿง‘๐Ÿฝโ€โค๏ธโ€๐Ÿง‘๐Ÿป 89...gurkirankumar98700
ย 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
ย 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
ย 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
ย 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
ย 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
ย 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
ย 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
ย 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
ย 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
ย 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
ย 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
ย 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
ย 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
ย 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
ย 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
ย 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
ย 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
ย 

Recently uploaded (20)

(Genuine) Escort Service Lucknow | Starting โ‚น,5K To @25k with A/C ๐Ÿง‘๐Ÿฝโ€โค๏ธโ€๐Ÿง‘๐Ÿป 89...
(Genuine) Escort Service Lucknow | Starting โ‚น,5K To @25k with A/C ๐Ÿง‘๐Ÿฝโ€โค๏ธโ€๐Ÿง‘๐Ÿป 89...(Genuine) Escort Service Lucknow | Starting โ‚น,5K To @25k with A/C ๐Ÿง‘๐Ÿฝโ€โค๏ธโ€๐Ÿง‘๐Ÿป 89...
(Genuine) Escort Service Lucknow | Starting โ‚น,5K To @25k with A/C ๐Ÿง‘๐Ÿฝโ€โค๏ธโ€๐Ÿง‘๐Ÿป 89...
ย 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
ย 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
ย 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
ย 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
ย 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
ย 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
ย 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
ย 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
ย 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
ย 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
ย 
Vip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS LiveVip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS Live
ย 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
ย 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
ย 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
ย 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
ย 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
ย 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
ย 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
ย 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
ย 

Decision tree and ensemble

  • 2. Agenda: 01 Bagging / 02 Boosting / 03 Stacking · 01 Model / 02 Formation Process / 03 Summary
  • 3. Decision Tree: a supervised learning method for classification and prediction. It builds a tree according to decision rules and then uses that tree to classify or predict.
  • 4. ๋ชฉ์  1). ์„ธ๋ถ„ํ™”(Segmentation) ๋ฐ์ดํ„ฐ๋ฅผ ๋น„์Šทํ•œ ํŠน์„ฑ์„ ๊ฐ–๋Š” ๋ช‡ ๊ฐœ์˜ ๊ทธ๋ฃน์œผ๋กœ ๋ถ„ํ• , ๊ทธ๋ฃน๋ณ„ ํŠน์„ฑ์„ ๋ฐœ๊ฒฌํ•˜๋Š” ๊ฒฝ์šฐ ๋˜๋Š” ๊ฐ ๋ฐ์ดํ„ฐ๊ฐ€ ์–ด๋–ค ์ง‘๋‹จ์— ์†ํ•˜๋Š”์ง€๋ฅผ ํŒŒ์•…ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฒฝ์šฐ Ex) ์‹œ์žฅ ์„ธ๋ถ„ํ™”, ๊ณ ๊ฐ ์„ธ๋ถ„ํ™” 2). ๋ถ„๋ฅ˜(Classification) ๊ด€์ธก๊ฐœ์ฒด๋ฅผ ์—ฌ๋Ÿฌ ์˜ˆ์ธก๋ณ€์ˆ˜๋“ค์— ๊ทผ๊ฑฐํ•ด ๋ชฉํ‘œ๋ณ€์ˆ˜์˜ ๋ฒ”์ฃผ๋ฅผ ๋ช‡ ๊ฐœ์˜ ๋“ฑ๊ธ‰์œผ๋กœ ๋ถ„๋ฅ˜ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฒฝ์šฐ Ex) ๊ณ ๊ฐ์„ ์‹ ์šฉ๋„์— ๋”ฐ๋ผ ์šฐ๋Ÿ‰/๋ถˆ๋Ÿ‰์œผ๋กœ ๋ถ„๋ฅ˜ 3). ์˜ˆ์ธก(Prediction) ์ž๋ฃŒ์—์„œ ๊ทœ์น™์„ ์ฐพ์•„๋‚ด๊ณ  ์ด๋ฅผ ์ด์šฉํ•ด ๋ฏธ๋ž˜์˜ ์‚ฌ๊ฑด์„ ์˜ˆ์ธกํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฒฝ์šฐ Ex) ๊ณ ๊ฐ์†์„ฑ์— ๋”ฐ๋ผ ๋Œ€์ถœํ•œ๋„์•ก์„ ์˜ˆ์ธก 4). ์ฐจ์›์ถ•์†Œ ๋ฐ ๋ณ€์ˆ˜์„ ํƒ(Data reduction and variable screening) ๋งค์šฐ ๋งŽ์€ ์ˆ˜์˜ ์˜ˆ์ธก๋ณ€์ˆ˜ ์ค‘์—์„œ ๋ชฉํ‘œ๋ณ€์ˆ˜์— ํฐ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ๋ณ€์ˆ˜๋“ค์„ ๊ณจ๋ผ๋‚ด๊ณ ์ž ํ•˜๋Š” ๊ฒฝ์šฐ 5). ๊ตํ˜ธ์ž‘์šฉํšจ๊ณผ์˜ ํŒŒ์•…(Interaction effect identification) ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์˜ˆ์ธก๋ณ€์ˆ˜๋“ค์„ ๊ฒฐํ•ฉํ•ด ๋ชฉํ‘œ๋ณ€์ˆ˜์— ์ž‘์šฉํ•˜๋Š” ๊ทœ์น™(๊ตํ˜ธ์ž‘์šฉํšจ๊ณผ)์„ ํŒŒ์•…ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฒฝ์šฐ 6). ๋ฒ”์ฃผ์˜ ๋ณ‘ํ•ฉ ๋˜๋Š” ์—ฐ์†ํ˜• ๋ณ€์ˆ˜์˜ ์ด์‚ฐํ™”(Binning) ๋ฒ”์ฃผํ˜• ๋ชฉํ‘œ๋ณ€์ˆ˜์˜ ๋ฒ”์ฃผ๋ฅผ ์†Œ์ˆ˜์˜ ๋ช‡ ๊ฐœ๋กœ ๋ณ‘ํ•ฉํ•˜๊ฑฐ๋‚˜ ์—ฐ์†ํ˜• ๋ชฉํ‘œ๋ณ€์ˆ˜๋ฅผ ๋ช‡ ๊ฐœ์˜ ๋“ฑ๊ธ‰์œผ๋กœ ์ด์‚ฐํ™” ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฒฝ์šฐ 4
  • 5. Parent node: the node directly above a child node
Child node: a node split off from another node
Depth: the number of nodes along a branch
Branch: the nodes connected from one node down to a terminal node
(figure: example tree with a parent node, a child node, and depth 3)
  • 6. Root node: the node where the tree structure starts; it contains the entire data set
Internal node: a non-terminal node in the middle of the tree; it has both a parent and children
Terminal node (leaf): a node at the end of a branch; it has no children
(figure: example tree with the root node, internal nodes, and terminal nodes labeled)
  • 7. Ex) Predict whether someone catches a cold!
Name | Commute time (min) | Brushes teeth | Wears mask | Cold
์ด์˜ˆ์ฐฌ | 40 | X | O | No
๊น€์˜๋„ | 35 | O | O | No
๋ฐ•์ •ํ˜„ | 110 | O | X | Yes
๊น€ํƒœํฌ | 85 | O | O | No
ํ™์ง€๋ฏผ | 100 | X | O | Yes
Target variable: Cold
  • 8. Train_data: the table from slide 7. (tree diagram: root node "Commute time >= 60?", deeper splits on "Brushes teeth?" and "Wears mask?", leaves labeled Cold: Yes / Cold: No)
  • 9. This tree classifies Train_data perfectly: accuracy = 1, a well-performing tree. (same Train_data table and tree diagram as slide 8)
  • 10. Test_data: ์กฐ๋‹จ๋น„ | commute 50 min | brushes teeth O | mask X | Cold: No. Where does ์กฐ๋‹จ๋น„ land in the tree? -> misclassified.
  • 11. What if we had asked one question fewer? The smaller tree no longer classifies Train_data perfectly, but it classifies Test_data correctly. (tree diagram: "Commute time >= 60?" and "Brushes teeth?" only)
"By what criterion should we ask the questions?"
"By what criterion should we cut the questions off?"
  • 12. ๋ถ„์„์˜ ๋ชฉ์ ๊ณผ ์ž๋ฃŒ๊ตฌ์กฐ์— ๋”ฐ๋ผ์„œ ์ ์ ˆํ•œ ๋ถ„ํ• ๊ธฐ์ค€๊ณผ ์ •์ง€๊ทœ์น™์„ ์ง€์ •ํ•˜๊ณ  ์˜ˆ์ธก๊ฐ’์„ ํ• ๋‹นํ•˜์—ฌ ์˜์‚ฌ๊ฒฐ์ • ๋‚˜๋ฌด๋ฅผ ์–ป๋Š”๋‹ค. ์˜์‚ฌ๊ฒฐ์ •๋‚˜๋ฌด์˜ ํ˜•์„ฑ โ€œ๋ฟŒ๋ฆฌ๋งˆ๋””์˜ ์งˆ๋ฌธ์ด ์™œ ๋‚จ์ž์ธ๊ฐ€โ€ > ๋ถ„ํ• ๊ธฐ์ค€ โ€œ๋๋งˆ๋””์˜ ์œ„์น˜๊ฐ€ ์™œ ๋‹ค๋ฅธ๊ฐ€๏ผ‚ > ์ •์ง€๊ทœ์น™ โ€œ์ตœ์ข… ๋ถ„๋ฅ˜(์˜ˆ์ธก)์‹œ ์–ด๋–ค ๊ฐ’์„ ํ• ๋‹นํ•˜๋Š”๊ฐ€โ€œ > ์˜ˆ์ธก๊ฐ’ ํ• ๋‹น 12 ํ˜•์„ฑ๊ณผ์ •
  • 13. Formation process
Split Rule (Growing Rule): the rule for splitting/growing the tree
- The criterion for creating child nodes from a parent node
- Purity: the degree to which the observations in a node belong to one particular category of the target variable
- How well the target variable is separated is measured by purity or impurity
- Child nodes are formed along the split criterion that increases purity
Stop Rule: the criterion that stops further splitting
Pruning: when the tree has too many nodes, overfitting occurs; cut off inappropriate nodes to simplify the model
  • 14. Formation process
Validity assessment: evaluate the tree using a gain chart, a risk chart, or cross validation.
Interpretation and prediction: feed new data into the constructed tree, then classify and predict.
  • 15. Classification tree: the model (tree) built when the target variable is categorical.
Regression tree: the model (tree) built when the target variable is continuous.
What criterion builds the tree? The two trees are built the same way; only the split (growing) rule and the way predictions are made differ!
  • 16. Classification tree split rule: splits are made based on the frequency with which the target variable falls in each category.
Impurity: how many observations from other categories are mixed into a node
- Entropy index
- Gini index
- p-value of the chi-square statistic
-> Split on the criterion with the smallest impurity.
  • 17. Classification tree
Entropy index: create child nodes along the split with the smallest entropy.
Gini index: create child nodes along the split that decreases the Gini index.
  • 18. Classification tree: worked example (left node: 7 vs 1; right node: 3 vs 5; 8 observations each out of 16)
Pk: among the observations in region A, the proportion belonging to class K
Gini index:
G(L) = 1 - ((7/8)^2 + (1/8)^2) = 0.21875
G(R) = 1 - ((3/8)^2 + (5/8)^2) = 0.46875
Gini = (8/16) x 0.21875 + (8/16) x 0.46875 = 0.34375
Entropy index:
E(L) = -((7/8) log2(7/8) + (1/8) log2(1/8)) = 0.5436
E(R) = -((3/8) log2(3/8) + (5/8) log2(5/8)) = 0.9544
Entropy = (8/16) x 0.5436 + (8/16) x 0.9544 = 0.749
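The arithmetic on this slide can be checked with a short script. A minimal sketch, assuming the split shown: a left child with class counts 7/1, a right child with 3/5, eight observations each out of sixteen:

```python
from math import log2

def gini(counts):
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy index of a node: -sum of p * log2(p) over classes."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

left, right = [7, 1], [3, 5]      # class counts in each child node
w_left, w_right = 8 / 16, 8 / 16  # weights = node size / total size

gini_split = w_left * gini(left) + w_right * gini(right)
entropy_split = w_left * entropy(left) + w_right * entropy(right)

print(round(gini_split, 5))     # 0.34375, as on the slide
print(round(entropy_split, 3))  # 0.749
```

A candidate split is preferred over another when this weighted impurity is smaller.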
  • 19. Misclassification error: the change in impurity as a function of the proportion p in one category, drawn as a graph.
-> Impurity is maximal when the proportion is 0.5 (the two categories are mixed exactly half and half).
If every observation in region A belongs to the same category: uncertainty is minimal = purity is maximal; entropy = 0.
With two categories mixed exactly half and half: uncertainty is maximal = purity is minimal; entropy reaches its maximum of 1 (misclassification error and the Gini index peak at 0.5).
  • 20. Classification tree: chi-square statistic
Create child nodes along the split with the smallest p-value
-> the one with the largest chi-square statistic (= the smallest p-value)
Table (expected counts in parentheses):
| Good | Bad | Total
Left | 32 (56) | 48 (24) | 80
Right | 178 (154) | 42 (66) | 220
Total | 210 | 90 | 300
Sum over cells of (expected - observed)^2 / expected:
(56-32)^2/56 + (24-48)^2/24 + (154-178)^2/154 + (66-42)^2/66 = 46.75
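The 2x2 computation above, sketched in Python; the observed counts and margins are taken from the slide's table, and each expected count is row total x column total / grand total:

```python
observed = [32, 48, 178, 42]   # Left-Good, Left-Bad, Right-Good, Right-Bad
row_totals, col_totals, total = [80, 220], [210, 90], 300

# Expected count of a cell = row total * column total / grand total
expected = [r * c / total for r in row_totals for c in col_totals]

chi2 = sum((e - o) ** 2 / e for o, e in zip(observed, expected))
print(expected)        # [56.0, 24.0, 154.0, 66.0], matching the parentheses
print(round(chi2, 2))  # 46.75
```

The split with the largest chi-square statistic (smallest p-value) wins.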
  • 21. Classification tree: categorical predictor (categories 'A', 'B', 'C')
Case 1: R1 = {A}, R2 = {B, C}
Case 2: R1 = {B}, R2 = {A, C}
Case 3: R1 = {C}, R2 = {A, B}
Compare the Gini or entropy index of each case and choose the split rule (case) with the smallest value.
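A minimal sketch of enumerating the binary splits of a categorical predictor; for three categories this reproduces exactly the three cases above:

```python
from itertools import combinations

categories = ['A', 'B', 'C']

# Send each non-empty proper subset to R1; the complement goes to R2.
# Skipping mirrors keeps each unordered split only once.
splits = []
for size in range(1, len(categories)):
    for r1 in combinations(categories, size):
        r2 = tuple(c for c in categories if c not in r1)
        if (r2, r1) not in splits:
            splits.append((r1, r2))

print(splits)  # [(('A',), ('B', 'C')), (('B',), ('A', 'C')), (('C',), ('A', 'B'))]
```

In general a k-category predictor admits 2^(k-1) - 1 such binary splits, which is why each candidate is scored by its Gini or entropy index rather than tried by hand.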
  • 22. Classification tree: continuous predictor -> find a boundary (threshold) value.
Ex) Commute time >= 60: R1 (Yes) / R2 (No)
Find the threshold that gives the smallest Gini or entropy index. Threshold candidates:
Ex1) 1st / 2nd / 3rd quartile
Ex2) seq(from=Q1, to=Q3, length.out=10)
Ex3) seq(from=min(x), to=max(x), length.out=10)
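A sketch of generating threshold candidates in Python (the slide shows the equivalent R `seq` calls); `x` here is hypothetical commute-time data, not from the slides:

```python
import random
import statistics

random.seed(0)
x = [random.uniform(20, 120) for _ in range(30)]  # hypothetical commute times

# Ex1) the three quartiles as candidate thresholds
q1, q2, q3 = statistics.quantiles(x, n=4)

# Ex3) 10 evenly spaced candidates from min(x) to max(x)
lo, hi = min(x), max(x)
candidates = [lo + i * (hi - lo) / 9 for i in range(10)]

print(round(q1, 1), round(q2, 1), round(q3, 1))
print(len(candidates))  # 10
```

Each candidate threshold is then scored with the Gini or entropy index, exactly as on slide 21, and the best one is kept.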
  • 23. Regression tree split rule: nodes are split based on the mean and standard deviation of the target variable. The number of distinct predicted values equals the number of terminal nodes.
Impurity measures:
- p-value of the F-statistic
- Variance reduction
-> Split so as to reduce the sum of squared errors (SSE).
  • 24. Regression tree: F-statistic & variance reduction
Split so as to reduce the sum of squared errors (SSE).
Node1: y = 10, 12, 13, 8, 7; mean = 10
Node2: y = 20, 24, 26, 16, 14; mean = 20
Node1 SSE: (10-10)^2 + (12-10)^2 + ... + (7-10)^2 = 26
Node2 SSE: (20-20)^2 + (24-20)^2 + ... + (14-20)^2 = 104
Weighted SSE: (5/10) x 26 + (5/10) x 104 = 65
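The SSE arithmetic above as a runnable check:

```python
node1 = [10, 12, 13, 8, 7]    # mean 10
node2 = [20, 24, 26, 16, 14]  # mean 20

def sse(ys):
    """Sum of squared deviations from the node mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

n = len(node1) + len(node2)
weighted = len(node1) / n * sse(node1) + len(node2) / n * sse(node2)

print(sse(node1), sse(node2), weighted)  # 26.0 104.0 65.0
```

Among all candidate splits, the one with the smallest weighted SSE is chosen.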
  • 26. Regression tree (figure: a fitted regression tree on features X0, X1, X2, X3 and target Y)
mse = each node's MSE
sample = the number of samples that reached the child node from its parent
value = the mean of the y values of the node's samples
  • 27. Stopping rule (stop rule): the criterion that stops further splitting
1. Splitting further would not reduce impurity
2. Too few samples remain in a child node
3. The depth the analyst set in advance has been reached
  • 28. Pruning: remove inappropriate nodes
-> Otherwise overfitting to the training data is likely
-> A rule that keeps the tree structure at a reasonable size
  • 29. Q. ์˜์‚ฌ๊ฒฐ์ • ๋‚˜๋ฌด์˜ ๋ถ„๋ฅ˜๊ธฐ๊ฐ€ ์ฆ๊ฐ€ํ•˜๋ฉด? A. ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ค๋ถ„๋ฅ˜์œจ์ด ๊ฐ์†Œํ•œ๋‹ค. Q. ๊ณ„์† ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ๋ฌดํ•œํžˆ ์ฆ๊ฐ€ํ•˜๋ฉด? A. ์ผ์ • ์ˆ˜์ค€ ์ด์ƒ์˜ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ๊ฐ€์งˆ ๊ฒฝ์šฐ, ์˜ค๋ถ„๋ฅ˜์œจ์ด ์˜คํžˆ๋ ค ์ฆ๊ฐ€ํ•œ๋‹ค. ๊ฐ€์ง€์น˜๋Š” ์‹œ์ ? >๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ค๋ถ„๋ฅ˜์œจ์ด ์ฆ๊ฐ€ํ•˜๋Š” ์‹œ์  ๊ฐ€์ง€์น˜๊ธฐ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋ฒ„๋ฆฌ๋Š” ๊ฐœ๋…์ด ์•„๋‹Œ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ํ•ฉ์น˜๋Š” ๊ฐœ๋…์œผ๋กœ ์ดํ•ด 29 Pruning
  • 30. ๋น„์šฉ๋ณต์žก๋„๋ฅผ ์ตœ์†Œ๋กœ ํ•˜๋Š” ๋ถ„๊ธฐ๋ฅผ ์ฐพ์•„๋‚ด๋„๋ก ํ•™์Šต CC(T): ์˜์‚ฌ๊ฒฐ์ • ๋‚˜๋ฌด์˜ ๋น„์šฉ ๋ณต์žก๋„ (= ์˜ค๋ฅ˜๊ฐ€ ์ ์œผ๋ฉด์„œ terminal node์ˆ˜๊ฐ€ ์ ์€ ๋‹จ์ˆœํ•œ ๋ชจ๋ธ์ผ์ˆ˜๋ก ์ž‘์€ ๊ฐ’) Err(T): ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ค๋ถ„๋ฅ˜์œจ(๋ถˆ์ˆœ๋„) L(T): ๋๋งˆ๋””์˜ ์ˆ˜(๊ตฌ์กฐ์˜ ๋ณต์žก๋„) a: Err(T)์™€ L(T)๋ฅผ ๊ฒฐํ•ฉํ•˜๋Š” ๊ฐ€์ค‘์น˜ (๋ณดํ†ต 0.01~0.1์˜ ๊ฐ’) 30 ๋น„์šฉ๋ณต์žก๋„(cost-complexity) Pruning
  • 31. Pessimistic pruning: when splitting, assume an extra error of (number of terminal nodes x 0.5) on top of the misclassification rate -> to avoid overfitting.
Full tree: misclassification rate 0.45, 4 terminal nodes -> pessimistic error 0.45 + 4 x 0.5 = 2.45
Pruned tree: misclassification rate 0.66, 3 terminal nodes -> pessimistic error 0.66 + 3 x 0.5 = 2.16
Since the pessimistic error is lower after pruning the tree model, prune.
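The pessimistic-error comparison above, as a runnable check:

```python
def pessimistic_error(misclass_rate, n_leaves, penalty=0.5):
    # Charge an extra 0.5 per terminal node to discourage overly deep trees
    return misclass_rate + n_leaves * penalty

full = pessimistic_error(0.45, 4)    # 0.45 + 4 * 0.5 = 2.45
pruned = pessimistic_error(0.66, 3)  # 0.66 + 3 * 0.5 = 2.16

print(pruned < full)  # True: prune, since the pessimistic error drops
```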
  • 32. Interpretation and prediction
Classification tree: predict with the terminal node's majority (voted) category. Ex) Yes: 90, No: 10 -> yes; Yes: 34, No: 66 -> no
Regression tree: predict with the terminal node's mean of y. Node1: 10, 12, 13, 8, 7 -> mean = 10; Node2: 20, 24, 26, 16, 14 -> mean = 20
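The two prediction rules in code, using the slide's numbers:

```python
from collections import Counter
from statistics import mean

# Classification tree: a terminal node predicts its majority (voted) class
leaf_labels = ['Yes'] * 90 + ['No'] * 10
pred_class = Counter(leaf_labels).most_common(1)[0][0]

# Regression tree: a terminal node predicts the mean of its y values
leaf_values = [10, 12, 13, 8, 7]
pred_value = mean(leaf_values)

print(pred_class, pred_value)  # Yes 10
```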
  • 33. ๋‹จ์  ์žฅ์  1. ํ•ด์„์˜ ์šฉ์ด์„ฑ - ๋‚˜๋ฌด๊ตฌ์กฐ์— ์˜ํ•ด์„œ ๋ชจํ˜•์ด ํ‘œํ˜„๋˜๊ธฐ ๋•Œ๋ฌธ์— ํ•ด์„์ด ์‰ฝ๋‹ค. - ์ƒˆ๋กœ์šด ์ž๋ฃŒ์— ๋ชจํ˜•์„ ์ ํ•ฉ ์‹œํ‚ค๊ธฐ ์‰ฝ๋‹ค. - ์–ด๋–ค ์ž…๋ ฅ๋ณ€์ˆ˜๊ฐ€ ์ค‘์š”ํ•œ์ง€ ํŒŒ์•…์ด ์‰ฝ๋‹ค. 2. ๊ตํ˜ธ์ž‘์šฉ ํšจ๊ณผ์˜ ํ•ด์„ - ๋‘ ๊ฐœ ์ด์ƒ์˜ ๋ณ€์ˆ˜๊ฐ€ ๊ฒฐํ•ฉํ•˜์—ฌ ๋ชฉํ‘œ๋ณ€์ˆ˜์— ์–ด๋– ํ•œ ์˜ํ–ฅ์„ ์ฃผ๋Š”์ง€ ์•Œ๊ธฐ ์‰ฝ๋‹ค. 3. ๋น„๋ชจ์ˆ˜์  ๋ชจํ˜• - ์„ ํ˜•์„ฑ, ์ •๊ทœ์„ฑ, ๋“ฑ๋ถ„์‚ฐ์„ฑ์˜ ๊ฐ€์ •์ด ํ•„์š”ํ•˜์ง€ ์•Š๋‹ค. - ๋‹จ์ง€ ์ˆœ์œ„๋งŒ ๋ถ„์„์— ์˜ํ–ฅ์„ ์ฃผ๋ฏ€๋กœ ์ด์ƒ์น˜์— ๋ฏผ๊ฐํ•˜์ง€ ์•Š๋‹ค. 1. ๋น„์—ฐ์†์„ฑ - ์—ฐ์†ํ˜• ๋ณ€์ˆ˜๋ฅผ ๋น„์—ฐ์†์ ์ธ ๊ฐ’์œผ๋กœ ์ทจ๊ธ‰ํ•˜์—ฌ ์˜ˆ์ธก์˜ค๋ฅ˜๊ฐ€ ํด ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๋‹ค. 2. ์„ ํ˜•์„ฑ ๋˜๋Š” ์ฃผํšจ๊ณผ์˜ ๊ฒฐ์—ฌ - ์„ ํ˜• ๋˜๋Š” ์ฃผํšจ๊ณผ ๋ชจํ˜•์—์„œ์™€ ๊ฐ™์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์—†๋‹ค. 3. ๋น„์•ˆ์ •์„ฑ - ๋ถ„์„์šฉ ์ž๋ฃŒ์—๋งŒ ์˜์กดํ•˜๋ฏ€๋กœ ์ƒˆ๋กœ์šด ์ž๋ฃŒ์˜ ์˜ˆ์ธก์— ๋ถˆ์•ˆ์ •ํ•˜๋‹ค. 33
  • 34. Summary: a tree is easy to interpret and its decision process resembles human decision-making, but its predictive power is much lower and the bias-variance problem arises. To solve this, create several trees and combine them: ensembles (bagging, boosting, Random Forest) raise the tree's predictive power.
  • 36. Ensemble: a method that combines several classification or regression models to raise prediction accuracy. The goal is to resolve the bias & variance trade-off and lower the error. Bagging / Boosting / Stacking.
  • 37. ์•™์ƒ๋ธ”์„ ์™œ ์‚ฌ์šฉํ• ๊นŒ? ํ•™์Šต์—์„œ์˜ ์˜ค๋ฅ˜ 1. Underfitting(๋‚ฎ์€ bias) 2. Overfitting(๋†’์€ Variance) ํŠนํžˆ ๋ฐฐ๊น…์€ ๊ฐ ์ƒ˜ํ”Œ์—์„œ ๋‚˜ํƒ€๋‚œ ๊ฒฐ๊ณผ๋ฅผ ์ผ์ข…์˜ ์ค‘๊ฐ„๊ฐ’์œผ๋กœ ๋งž์ถ”์–ด ์ฃผ๊ธฐ ๋•Œ๋ฌธ์— overfitting์„ ํ”ผํ•  ์ˆ˜ ์žˆ๋‹ค. ๋ฒ”์ฃผํ˜•์ผ ๊ฒฝ์šฐ, Voting์œผ๋กœ ์ง‘๊ณ„ ์—ฐ์†ํ˜•์ผ ๊ฒฝ์šฐ, Average๋กœ ์ง‘๊ณ„ ์•™์ƒ๋ธ” ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ์˜ค๋ฅ˜๋ฅผ ์ตœ์†Œํ™”! 37
  • 39. Bagging, i.e. Bootstrap AGGregatING: draw samples several times, train a model on each, and aggregate the results. Ex) Random Forest.
Boosting: combine weak learners into a strong learner. Ex) AdaBoost / GBM / XGBoost / LightGBM.
  • 41. Bagging: short for Bootstrap aggregating. (Bootstrap: sampling a subset of the data.)
- Draw several simple random samples with replacement, each the same size as the original data set, build a classifier on each sample, and ensemble the results.
- Because sampling is with replacement, the same observation may be drawn several times into one sample, and some observations may not be drawn at all.
- When a model overfits one training set and performs poorly on test data, i.e. has high variance, building and combining several training sets turns high variance into low variance and improves predictive performance.
  • 42. 1. n๋ฒˆ ๋‹จ์ˆœ ์ž„์˜ ๋ณต์› ์ถ”์ถœ Bagging 42 A B C D Train data A A A B B D D D C C C C . . . Bootstrap sample1 Bootstrap sample2 Bootstrap sample3
  • 43. 2. ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋‹จ์ผ ๋ชจ๋ธ ์ƒ์„ฑ Bagging 43 A A A B B D D D C C C C . . . training training training model1 model2 model3 . . . ๋ฒ”์ฃผํ˜•: voting ์—ฐ์†ํ˜•: averaging
  • 46. Bagging > good for lowering variance. Because the final result takes the outputs of several different models into account, it avoids training that leans too heavily on a single train set (overfitting), while at the same time maintaining high model accuracy. >> Bagging is therefore an ensemble method well suited to machine-learning techniques with a high risk of overfitting (= high variance).
  • 47. Bagging: Random Forest > bagging applied to decision trees = Data Bagging + Feature Bagging + Decision Tree. Data Bagging > simple random sampling of the observations with replacement (ordinary bagging). Feature Bagging > random sampling of the variables (each split uses a randomly chosen subset of features = randomized node optimization). Decision Tree > decision trees (CART).
  • 48. Random Forest = Data Bagging + Feature Bagging + Decision Tree. [Example: train data with features temperature, humidity, and wind speed and target "rain (yes/no)"; after data and feature bagging, Tree 1 sees only (temperature, humidity) and Tree 2 only (humidity, wind speed).] Feature bagging is run to keep tree correlation — the problem that arises when every tree draws on the same variables — from growing.
  • 49. Evaluating Random Forest performance: OOB (Out-Of-Bag) error > the error obtained by predicting each observation with the trees whose bootstrap samples did not include it. > OOB samples are mainly used to estimate the misclassification rate expected on evaluation data and to estimate variable importance.
  • 50. A key feature of Random Forest: variable importance. - Variable selection is one of the most important issues in data analysis! - Because it applies feature bagging, a Random Forest can rank the importance of each variable. - While the forest is being built, an OOB error (Out-Of-Bag error) is obtained for each feature (OOB: the observations left out of each bootstrap draw act as test data). - For classification, importance is determined by how much the impurity decreases when the variable is split on. - For regression, importance is measured through the residual sum of squares.
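The Random Forest ingredients on the slides above — data bagging, feature bagging, OOB error, variable importance — map onto a few parameters of scikit-learn's `RandomForestClassifier` (scikit-learn assumed available; the synthetic dataset is made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A small synthetic classification problem with 4 features.
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",   # feature bagging: a random feature subset per split
    bootstrap=True,        # data bagging: each tree sees a bootstrap sample
    oob_score=True,        # evaluate on the left-out (out-of-bag) observations
    random_state=0,
).fit(X, y)

print(forest.oob_score_)            # OOB accuracy, an estimate of test accuracy
print(forest.feature_importances_)  # impurity-based importances, summing to 1
```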
  • 52. Boosting: after bootstrap sampling, use the classification results to give misclassified observations larger weights when drawing the next sample. > Bagging trains models in parallel; boosting trains them sequentially. > When a round of training ends, the weights are redistributed according to its results. > Wrong answers receive higher weights and correct answers lower weights, so the next learner can concentrate on the mistakes.
  • 54. Boosting algorithms at a glance:
- AdaBoost — gives weight to misclassified observations (2003)
- GBM — weights errors via the gradient of the loss function (2007)
- XGBoost — outperforms GBM; uses system resources (CPU, memory) efficiently; performance proven on Kaggle (2014)
- Light GBM — outperforms XGBoost with lower resource use; can train on large data XGBoost cannot handle; gains speed by approximating the split (2016)
  • 55. AdaBoost. 0. Draw sample data exactly as in bagging. 1. Train a single model (Box 1). 2. Predict with it, give the misclassified cases larger weights, and train again (Box 2). 3. Repeat this process (Box 3) and combine the learners' results (Box 4). > AdaBoost is vulnerable to noisy data and outliers.
  • 56. 1. 1ํšŒ ๋ณต์›์ถ”์ถœ ํ›„ ํŠธ๋ฆฌ ๋ชจ๋ธ ์ƒ์„ฑ AdaBoost 56 A B C D Train data A C D Tree 1 Bagging๊ณผ ๋™์ผ A
  • 57. 2. ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ AdaBoost 57 B C D A B C D Train data Train_Data๋กœ test๋ฅผ ํ•œ ํ›„, ์˜ค๋ถ„๋ฅ˜๋œ ๋ฐ์ดํ„ฐ๋“ค์ด ์ถ”์ถœ๋  ํ™•๋ฅ ์„ ๋†’์ธ ํ›„ 1๋ฒˆ ๊ณผ์ •์„ ๋‹ค์‹œ ์‹œํ–‰ ๊ฐ€์ค‘์น˜ A 1 4 B 1 4 C 1 4 D 1 4 ์—…๋ฐ์ดํŠธ๋œ ๊ฐ€์ค‘์น˜ A 1 4 โˆ— exp(โˆ’๐‘Ž)/(๐ด + ๐ต + ๐ถ + ๐ท) B 1 4 โˆ— exp(๐‘Ž)/(๐ด + ๐ต + ๐ถ + ๐ท) C 1 4 โˆ— exp(โˆ’๐‘Ž)/(๐ด + ๐ต + ๐ถ + ๐ท) D 1 4 โˆ— exp(โˆ’๐‘Ž)/(๐ด + ๐ต + ๐ถ + ๐ท) B Tree 2 #. e(์—๋Ÿฌ์œจ): ์˜ค๋ฅ˜๋ฐ์ดํ„ฐ ๊ฐ€์ค‘์น˜ ํ•ฉ / ์ „์ฒด ๋ฐ์ดํ„ฐ ๊ฐ€์ค‘์น˜ ํ•ฉ #. a(์‹ ๋ขฐ๋„): 1 2 โˆ— ln( 1โˆ’๐‘’ ๐‘’ ) B๋งŒ ์˜ค๋ถ„๋ฅ˜๋œ ์ƒํƒœ
  • 58. 3. ์œ„ ๊ณผ์ • ๋ฐ˜๋ณต AdaBoost 58 ์—๋Ÿฌ์œจ์ด 0์ด ๋  ๋•Œ๊นŒ์ง€ ํ˜น์€ ํŠธ๋ฆฌ ๋ชจ๋ธ ์ˆ˜๊ฐ€ ์ผ์ •ํ•œ ์ˆ˜์— ๋„๋‹ฌํ•  ๋•Œ๊นŒ์ง€ ์œ„ ๊ณผ์ •๋“ค์„ ๊ณ„์† ๋ฐ˜๋ณต . . . A B C D Train data D C D B B C D B C Tree 1 Tree 2 Tree 3 A A A
  • 59. 4. ์‹ ๋ขฐ๋„(a)๋ฅผ ๊ณฑํ•˜์—ฌ voting AdaBoost 59 Tree 1 Predict*a Tree 2 Predict*a Tree 3 Predict*a . . . . . . + + =
  • 61. GBM (Gradient Boosting) = gradient boosting. > Each stage learns the difference between the previous stage's predictions and the truth. > Gradient Descent is applied in the weight-calculation step. The second-stage learner f2 fits the prediction error of the first-stage learner f1; if learning goes well, the second stage's error variance Var(e2) is smaller than the first stage's Var(e1).
  • 62. Decomposing the error when learner M predicts Y: > Y = M(x) + error > error = G(x) + error2 (error > error2) > error2 = H(x) + error3 (error2 > error3) > Y = M(x) + G(x) + H(x) + error3. The combination should be more accurate than using M alone (error > error3).
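The decomposition above can be reproduced directly: fit M on y, fit G on M's residuals, and check that the residual variance shrinks at each stage (scikit-learn assumed available; shallow regression trees and a synthetic sine dataset stand in for the weak learners and the data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# M(x): the first weak learner fits y directly.
M = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
error1 = y - M.predict(X)

# G(x): the second weak learner fits the first learner's error.
G = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, error1)
error2 = error1 - G.predict(X)

# Var(e2) < Var(e1): each stage explains part of the remaining error.
print(error1.var(), error2.var())
```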
  • 63. However, the classifiers M, G, and H differ in performance; if they all carry the same weight, they can interfere with one another on a given x and drive the error up.
  • 64. Finding the optimal weight for each function improves prediction accuracy (the optimal weights are computed with the Gradient Descent algorithm): Y = alpha · M(x) + beta · G(x) + gamma · H(x) + error3.
  • 65. How the weights are assigned: > initial data weight D = 1/n (n: number of training observations) > error e = misclassified observations / all training observations > the weak model's function h(t) (= the model's prediction) > weight update D > model weight a (plays the role of a learning rate).
  • 66. XGBoost (eXtreme Gradient Boosting) = GBM + distributed/parallel processing. - Considers only a subset of candidate split points. - Sparsity-aware. - Handles binary classification / multiclass classification / regression / learning to rank. > Defines an objective function and updates the weights toward the values that minimize it. > Supervised learning: learns from the features (x) to predict the target (y).
  • 67. XGBoost parallelizes over CPU cores: each core grows branches for the features assigned to it. > Shorter computation time.
  • 68. When choosing a split point on a continuous variable, XGBoost does not inspect every value; it decides from a subset. [Example table: IDs 1-4 with Korean scores 100/95/75/20 and math scores 90/80/90/55.] Only a shortlist of candidate points is examined to pick the split.
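The idea of inspecting only a shortlist of candidates can be sketched with quantiles: instead of testing a threshold between every pair of sorted values, test only a few percentile cut points (plain Python; the real XGBoost uses a more elaborate weighted quantile sketch):

```python
from statistics import quantiles

scores = [100, 95, 75, 20, 88, 61, 43, 97, 52, 70]  # one continuous feature

# Exact split search would try every gap between sorted values (9 thresholds);
# the approximate method tries only a few quantile-based candidates.
candidates = quantiles(scores, n=4)  # the three quartile cut points

print(candidates)  # 3 candidate thresholds instead of 9
```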
  • 69. Sparsity awareness: XGBoost can skip zero entries during training, so dummy-coding (one-hot encoding) the input speeds it up. [Example: a "subject" column with values Korean/math/English/science becomes four 0/1 indicator columns, one per subject.]
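The dummy coding on this slide, in a few lines of plain Python (a sparsity-aware learner can then skip the zeros in each row):

```python
subjects = ["Korean", "math", "English", "science"]
categories = sorted(set(subjects))  # fix a column order for the dummy columns

def one_hot(value, categories):
    """Return a 0/1 indicator row with a single 1 at the value's column."""
    return [1 if c == value else 0 for c in categories]

encoded = [one_hot(s, categories) for s in subjects]
print(categories)
print(encoded)  # each row has exactly one 1 and zeros elsewhere
```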
  • 70. Light GBM > a GBM framework based on decision tree algorithms (improved speed and performance) > used for ranking, classification, and similar problems. How does it differ from other tree algorithms? - It grows the tree leaf-wise (vertically). - It grows the leaf with the maximum delta loss; for the same number of leaves, leaf-wise growth can therefore reduce the loss more. https://www.slideshare.net/freepsw/boosting-bagging-vs-boosting
  • 71. Leaf-wise (Light GBM) > splits the node where the change in loss is largest > excels when training data is plentiful > vertical growth. Level-wise (XGBoost, Random Forest) > visits the nodes nearest the root first > horizontal growth.
  • 72. Why is Light GBM popular? > It trains on large data quickly and in parallel (low memory use, GPU support). > Higher prediction accuracy from leaf-wise trees, at the cost of sensitivity to overfitting. Characteristics: > 2-10x faster than XGBoost with the same parameter settings. > Because leaf-wise trees are sensitive to overfitting, Light GBM suits large datasets (roughly 10,000+ rows).
  • 73. Stacking: "Two heads are better than one" = meta-modeling. Either combine several models' outputs by averaging or voting to produce a new prediction, or build a new dataset whose variables are the models' outputs (and then train another model on it).
  • 74. Stacking by averaging/voting the models' outputs. Categorical target: - average the models' predicted probabilities for the final prediction, or - vote with the predicted classes themselves. Continuous target: - average the predictions themselves.
  • 75. Stacking by turning the models' outputs into new variables of a dataset (then training a meta-model on it). Categorical target: - use the predicted probabilities as new variables, or - use the predicted classes themselves. Continuous target: - use the predictions themselves as new variables.
  • 77. [Diagram: Model 1 and Model 2 each predict on the train and test sets; their predictions become feature1 and feature2 of a new dataset on which a further model is trained.]
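The diagram above, written out with scikit-learn (assumed available; dataset and base models are made up for illustration): two base models' predicted probabilities become the columns of a new dataset, and a logistic regression is trained on it as the meta-model. A production version would generate the train-side features with cross-validation to avoid leakage, as `StackingClassifier` does internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Model 1 and Model 2 each predict on train and test...
m1 = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
m2 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# ...and their predicted probabilities become feature1 and feature2.
meta_train = np.column_stack([m1.predict_proba(X_tr)[:, 1],
                              m2.predict_proba(X_tr)[:, 1]])
meta_test = np.column_stack([m1.predict_proba(X_te)[:, 1],
                             m2.predict_proba(X_te)[:, 1]])

# The meta-model learns from the base models' outputs, not the raw features.
meta_model = LogisticRegression().fit(meta_train, y_tr)
print(meta_model.score(meta_test, y_te))  # accuracy of the stacked model
```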
  • 78. Q & A