7. Regularization for Deep Learning
장경욱
[Slide diagram: Training → Test — a model fit on training data must also handle new inputs; regularization addresses this generalization gap]
Regularization
: "any modification made to a learning algorithm that is intended to reduce its generalization error, not its training error"
์ˆ˜์ถ• ๊ธฐ๋ฒ• = ์ •๊ทœํ™”(Regularization)
ํŒจ๋„ํ‹ฐ๋ฅผ ๋ถ€๊ณผํ•˜์—ฌ ๊ณ„์ˆ˜๋ฅผ ์ˆ˜์ถ•ํ•˜๋Š” ๊ฒƒ
๋ณ€์ˆ˜ p์˜ ๊ฐœ์ˆ˜ โ†‘ โ˜ž ๋ชจ๋ธ ๊ณผ์ ํ•ฉ ์œ„ํ—˜ (ํŽธํ–ฅ โ†“ ๋ถ„์‚ฐ โ†‘)
โ˜ž ๋ชจ๋ธ์˜ ๊ณ„์ˆ˜ ์ œํ•œ โ˜ž ๋ชจ๋ธ์˜ ๋ถ„์‚ฐ์„ ์ค„์ด๋Š” ์‹œ๋„ = ์ •๊ทœํ™”(Regularization
์ •๊ทœํ™”
Thanks to ISLR Chapter 6
7.1 Parameter Norm Penalties
Add a parameter norm penalty $\Omega$ to the objective function $J$:
$\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \Omega(\theta)$
$\alpha$ is a hyperparameter that acts as a weight determining the relative contribution of $\Omega$.
What about b? The biases are usually left unregularized: each bias controls only a single variable, so penalizing it gains little and can cause significant underfitting.
7.1.1 L2 Parameter Regularization
Add $\Omega(\theta) = \frac{1}{2}\|w\|_2^2$
L2 regularization = ridge regression = Tikhonov regularization
$\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \Omega(\theta)$
$\tilde{J}(\theta; X, y) = J(\theta; X, y) + \frac{\alpha}{2}\|w\|_2^2$
7.1.1 L2 Parameter Regularization
$\tilde{J}(\theta; X, y) = J(\theta; X, y) + \frac{\alpha}{2}\|w\|_2^2$
$\nabla_w \tilde{J}(\theta; X, y) = \nabla_w J(\theta; X, y) + \alpha w$
$w \leftarrow w - \epsilon(\alpha w + \nabla_w J(\theta; X, y))$
$w \leftarrow (1 - \epsilon\alpha)w - \epsilon \nabla_w J(\theta; X, y)$
$(1 - \epsilon\alpha)w < w$
"Whatever value w has, each step shrinks it a little"
L2 regularization = weight decay
7.1.1 L2 Parameter Regularization
$(1 - \epsilon\alpha)w < w$ — an intuitive reading
Let $\alpha = \frac{\lambda}{2m}$, where $\lambda$ is the regularization parameter and $m$ is the size of the data.
With $z = wx + b$: $\lambda$ ↑ → $\alpha$ ↑ → $w$ ↓ → $z$ ↓
→ with z small, the activation function operates in its nearly linear regime
→ the whole model becomes more linear → a less complex model → regularization
7.1.2 L1 Regularization
$\Omega(\theta) = \|w\|_1 = \sum_i |w_i|$
"The L1 penalty is the sum of the absolute values of the individual parameters"
Compared with L2 regularization, L1 regularization yields sparser solutions.
Sparsity: some parameters reach an optimal value of exactly 0.
The sparsity induced by L1 regularization has long been used as a mechanism for feature selection.
Andrew Ng said: unless the goal is to compress the model, he does not use L1
→ L2 is used far more often than L1
7.2 Norm Penalties as Constrained Optimization
$\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \Omega(\theta)$
To impose the constraint that $\Omega(\theta)$ must be smaller than some constant $k$, build the generalized Lagrangian:
$L(\theta, \alpha; X, y) = J(\theta; X, y) + \alpha(\Omega(\theta) - k)$
The solution of the constrained problem is
$\theta^* = \arg\min_\theta \max_{\alpha \ge 0} L(\theta, \alpha)$
Fixing $\alpha^*$:
$\theta^* = \arg\min_\theta L(\theta, \alpha^*) = \arg\min_\theta J(\theta; X, y) + \alpha^* \Omega(\theta)$
— the same form as the penalized objective $\tilde{J}$, with the size of the constraint region $k$ controlled implicitly through $\alpha^*$.
7.3 Regularization and Under-Constrained Problems
Many linear models in machine learning, including linear regression and PCA, depend on inverting the matrix $X^T X$.
If $X^T X$ is singular, the inverse does not exist —
when the data-generating distribution has no variance in some direction, or
when there are fewer examples than features, so no variance is observed in that direction.
Ex) 50 features but only 30 examples
Solution: invert $X^T X + \alpha I$ instead, which is guaranteed to be invertible.
7.4 Dataset Augmentation
7.5 Noise Robustness
Another way… add the noise to the weights rather than to the inputs!
Shrinking the parameters' magnitudes vs. injecting noise
7.5.1 Injecting Noise at the Output Targets
When y may be a mistake — that is, when it is not the correct label for the example — maximizing log p(y | x) harms the results.
Instead, reflect the noise explicitly in the labels.

Editor's Notes

  • #3 The central problem of machine learning is to make an algorithm perform well not only on the training data but also on new inputs. Strategies are designed primarily to reduce test error, even at the cost of increased training error. Collectively, these strategies are called regularization.
  • #6 The larger α is, the stronger the effect of regularization. A bias can be fit accurately with far less data than a weight: each weight specifies how two variables interact, so fitting it well requires observing both variables under diverse conditions, whereas a bias controls only a single variable. → Leaving the biases unregularized does not add much variance, while regularizing them can cause significant underfitting. Therefore only the weights are regularized.
  • #12 In principle k can be solved for, but the relationship between k and α* depends on the form of J. We cannot know the exact size of the constraint region, but we can control it roughly by increasing or decreasing α: a larger α gives a smaller constraint region, a smaller α a larger one. Algorithms such as stochastic gradient descent can also be modified to project θ onto the nearest point satisfying Ω(θ) < k. This approach is useful when a desirable value of k is already known and we do not want to waste time searching for the α that corresponds to it.
  • #13 Sometimes we must solve underdetermined problems with no closed-form solution. Ex: applying logistic regression to linearly separable classes — if a weight vector w classifies the examples perfectly, then 2w classifies them perfectly with even higher likelihood. → What happens then depends on how the implementation handles it.
  • #14 The best way to improve a model's generalization is to train it on more data, but the amount of data is limited → create fake data and add it to the training set.
  • #15 In general, injecting noise can be far more powerful than merely shrinking the parameters' magnitudes, especially when the noise is added to the hidden units. Label smoothing.