Latent Dirichlet Allocation
David M. Blei | Andrew Y. Ng | Michael I. Jordan
Hi
๋ชจ๋ธ ๊ฐœ์š”
ํ† ํ”ฝ๋ณ„ ๋‹จ์–ด์˜ ๋ถ„ํฌ
๋ฌธ์„œ๋ณ„ ํ† ํ”ฝ์˜ ๋ถ„ํฌ
๊ฐ ๋ฌธ์„œ์— ์–ด๋–ค ์ฃผ์ œ๋“ค์ด ์กด์žฌํ•˜๋Š”์ง€์— ๋Œ€ํ•œ ํ™•๋ฅ ๋ชจํ˜•
The writing process
A person: chooses the subject matter and topics, then decides which words to use.
LDA's assumption: a topic is picked from the topic distribution obtained from the corpus, then words that belong to the chosen topic are drawn and written down.
This is not what actually happens; LDA only assumes that documents are produced this way.
Thinking in the reverse direction
Which topics did the words that appear in the current document come from?
This is hard to know explicitly.
LDA is used to infer the information hidden behind the corpus.
The L in LDA stands for latent: the model uncovers latent information.
Then what is the D, Dirichlet?
For now, it is enough to know that a distribution called the Dirichlet exists.
Architecture
[Figure: plate diagram of LDA, with the annotations below]
D: number of documents in the entire corpus
K: total number of topics (hyperparameter)
N: number of words in the d-th document
w_{d,n}: the only observable variable
Document generation process
๋ชจ๋ธ์˜ ๋ณ€์ˆ˜
ฯ•k ๋Š” k๋ฒˆ์งธ ํ† ํ”ฝ์˜ ๋‹จ์–ด๋น„์ค‘์„ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฒกํ„ฐ
๋ง๋ญ‰์น˜ ์ „์ฒด ๋‹จ์–ด ๊ฐœ์ˆ˜๋งŒํผ์˜ ๊ธธ์ด๋ฅผ ๊ฐ–๊ฒŒ๋จ.
ฯ•1 ฯ•2 ฯ•3
๊ฐ entry value๋Š” ํ•ด๋‹น ๋‹จ์–ด๊ฐ€ k๋ฒˆ์งธ ํ† ํ”ฝ์—์„œ
์ฐจ์ง€ํ•˜๋Š” ๋น„์ค‘์„ ๋‚˜ํƒ€๋ƒ„
๊ฐ ์š”์†Œ๋Š” ํ™•๋ฅ ์ด๋ฏ€๋กœ ์—ด์˜ ์ด ํ•ฉ์€ 1์ด ๋œ๋‹ค.
์•„ํ‚คํ…์ฒ˜๋ฅผ ์‚ดํŽด๋ณด๋ฉด ฯ•k ๋Š” ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ฮฒ ์˜
์˜ํ–ฅ์„ ๋ฐ›๊ณ  ์žˆ์Œ. ์ด๋Š” LDA์—์„œ ํ† ํ”ฝ์˜ ๋‹จ์–ด๋น„์ค‘ ฯ•k ์ด ๋””๋ฆฌํด๋ ˆ ๋ถ„ํฌ๋ฅผ
๋”ฐ๋ฅธ๋‹ค๋Š” ๊ฐ€์ •์„ ์ทจํ•˜๊ธฐ ๋•Œ๋ฌธ. ์ž์„ธํ•œ ์ด๋ก ์  ๋‚ด์šฉ์€ ์ž ์‹œ ํ›„์—
๋ชจ๋ธ์˜ ๋ณ€์ˆ˜
ฮธd ๋Š” d๋ฒˆ์งธ ๋ฌธ์„œ๊ฐ€ ๊ฐ€์ง„ ํ† ํ”ฝ ๋น„์ค‘์„ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฒกํ„ฐ
์ „์ฒด ํ† ํ”ฝ ๊ฐœ์ˆ˜ K๋งŒํผ์˜ ๊ธธ์ด๋ฅผ ๊ฐ–๊ฒŒ๋จ.
ฮธ1 ๊ฐ entry value๋Š” k๋ฒˆ์งธ ํ† ํ”ฝ์ด ํ•ด๋‹น d๋ฒˆ์งธ
๋ฌธ์„œ์—์„œ ์ฐจ์ง€ํ•˜๋Š” ๋น„์ค‘์„ ๋‚˜ํƒ€๋ƒ„
๊ฐ ์š”์†Œ๋Š” ํ™•๋ฅ ์ด๋ฏ€๋กœ ๊ฐ ํ–‰์˜ ์ด ํ•ฉ์€ 1์ด ๋œ๋‹ค.
์•„ํ‚คํ…์ฒ˜๋ฅผ ์‚ดํŽด๋ณด๋ฉด ฮธd ๋Š” ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ฮฑ ์˜
์˜ํ–ฅ์„ ๋ฐ›๊ณ  ์žˆ์Œ. ์ด๋Š” LDA์—์„œ ๋ฌธ์„œ์˜ ํ† ํ”ฝ ๋น„์ค‘ ฮธd ์—ญ์‹œ ๋””๋ฆฌํด๋ ˆ ๋ถ„ํฌ๋ฅผ
๋”ฐ๋ฅธ๋‹ค๋Š” ๊ฐ€์ •์„ ์ทจํ•˜๊ธฐ ๋•Œ๋ฌธ. ์ž์„ธํ•œ ์ด๋ก ์  ๋‚ด์šฉ์€ ์ž ์‹œ ํ›„์—
ฮธ2
ฮธ3
ฮธ4
ฮธ5
ฮธ6
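The shapes described for ϕ and θ can be illustrated with a small sketch. It assumes NumPy, symmetric Dirichlet priors, and toy sizes and hyperparameter values (V, K, D, alpha, beta are all chosen here for illustration and do not come from the slides).

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, D = 5, 3, 6        # vocabulary size, topics, documents (toy sizes, assumed)
alpha, beta = 0.5, 0.5   # symmetric Dirichlet hyperparameters (assumed values)

# phi: one column per topic, one row per vocabulary word (V x K).
phi = rng.dirichlet(np.full(V, beta), size=K).T
# theta: one row per document, one column per topic (D x K).
theta = rng.dirichlet(np.full(K, alpha), size=D)

print(phi.sum(axis=0))    # every column of phi sums to 1
print(theta.sum(axis=1))  # every row of theta sums to 1
```

The transpose on phi only mirrors the column-per-topic layout of the slide's table; storing topics as rows works equally well.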
๋ชจ๋ธ์˜ ๋ณ€์ˆ˜
zd,n ๋Š” d๋ฒˆ์งธ ๋ฌธ์„œ์˜ n๋ฒˆ์งธ ๋‹จ์–ด๊ฐ€ ์–ด๋–ค ํ† ํ”ฝ์— ํ•ด๋‹นํ•˜๋Š”์ง€ ํ• ๋‹นํ•ด์ฃผ๋Š” ์—ญํ• 
์˜ˆ์ปจ๋ฐ ์„ธ๋ฒˆ์งธ ๋ฌธ์„œ์˜ ์ฒซ๋ฒˆ์งธ ๋‹จ์–ด๋Š” Topic2์ผ ๊ฐ€๋Šฅ์„ฑ์ด ๊ฐ€์žฅ ๋†’๋‹ค๊ณ  ํ•  ์ˆ˜ ์žˆ์Œ
wd,n ์€ ๋ฌธ์„œ์— ๋“ฑ์žฅํ•˜๋Š” ๋‹จ์–ด๋ฅผ ํ• ๋‹นํ•ด ์ฃผ๋Š” ์—ญํ• .
์ง์ „ ์˜ˆ์‹œ์—์„œ z_3,1์ด ์‹ค์ œ๋กœ Topic2์— ํ• ๋‹น๋˜์—ˆ๋‹ค๊ณ  ํ–ˆ์„๋•Œ, Topic2์˜ ๋‹จ์–ด๋ถ„ํฌ ๊ฐ€์šด๋ฐ
Money์˜ ํ™•๋ฅ ์ด ๊ฐ€์žฅ ๋†’์œผ๋ฏ€๋กœ w_3,1์€ Money๊ฐ€ ๋  ๊ฐ€๋Šฅ์„ฑ์ด ๊ฐ€์žฅ ๋†’์Œ
๋™์‹œ์— ์˜ํ–ฅ์„ ๋ฐ›์Œzd,nฯ•k
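The generative story described so far can be sketched in a few lines. This is a minimal illustration, assuming NumPy, a hypothetical five-word vocabulary, and arbitrary hyperparameter values; none of the concrete names or numbers come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab = ["money", "bank", "loan", "river", "stream"]  # hypothetical toy vocabulary
K = 2                              # number of topics (assumed)
alpha = np.full(K, 0.5)            # prior over per-document topic proportions
beta = np.full(len(vocab), 0.5)    # prior over per-topic word proportions

# Draw phi_k ~ Dirichlet(beta) for every topic (rows are topics here, for indexing).
phi = rng.dirichlet(beta, size=K)

def generate_document(n_words):
    """Generate one document following the process LDA assumes."""
    theta_d = rng.dirichlet(alpha)              # topic proportions of this document
    z = rng.choice(K, size=n_words, p=theta_d)  # z_{d,n} ~ Categorical(theta_d)
    # w_{d,n} ~ Categorical(phi_{z_{d,n}}): it depends on both z_{d,n} and phi.
    w = [rng.choice(len(vocab), p=phi[k]) for k in z]
    return theta_d, z, [vocab[i] for i in w]

theta_d, z, words = generate_document(8)
print(theta_d)
print(z)
print(words)
```

Only `words` would be visible in real data; `theta_d`, `z`, and `phi` are the latent quantities that inference has to recover.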
LDA์˜ inference
์ง€๊ธˆ๊นŒ์ง€๋Š” LDA๊ฐ€ ๊ฐ€์ •ํ•˜๋Š” ๋ฌธ์„œ์ƒ์„ฑ๊ณผ์ •๊ณผ ์ž ์žฌ๋ณ€์ˆ˜๋“ค์˜ ์—ญํ• ์„ ์‚ดํŽด๋ณด์•˜๋‹ค.
์ด์ œ๋Š” ๋ฐ˜๋Œ€๋กœ ๊ด€์ธก๋œ W_d,n์„ ๊ฐ€์ง€๊ณ  ์ž ์žฌ๋ณ€์ˆ˜๋ฅผ ์ถ”์ •ํ•˜๋Š” inference ๊ณผ์ •์„ ์‚ดํŽด๋ณด์ž.
LDA๋Š” ํ† ํ”ฝ์˜ ๋‹จ์–ด๋ถ„ํฌ์™€ ๋ฌธ์„œ์˜ ํ† ํ”ฝ๋ถ„ํฌ์˜ ๊ฒฐํ•ฉ์œผ๋กœ ๋ฌธ์„œ ๋‚ด ๋‹จ์–ด๋“ค์ด ์ƒ์„ฑ๋จ์„ ๊ฐ€์ •ํ•˜๊ณ  ์žˆ๋‹ค.
์‹ค์ œ ๊ด€์ธก๋œ ๋ฌธ์„œ ๋‚ด ๋‹จ์–ด๋ฅผ ๊ฐ€์ง€๊ณ  ์šฐ๋ฆฌ๊ฐ€ ์•Œ๊ณ  ์‹ถ์€ ํ† ํ”ฝ์˜ ๋‹จ์–ด ๋ถ„ํฌ, ๋ฌธ์„œ์˜ ํ† ํ”ฝ ๋ถ„ํฌ๋ฅผ ์ถ”์ •ํ•  ๊ฒƒ
๋ฌธ์„œ ์ƒ์„ฑ ๊ณผ์ •์ด ํ•ฉ๋ฆฌ์ ์ด๋ผ๋ฉด ์ด ๊ฒฐํ•ฉํ™•๋ฅ ์ด ๋งค์šฐ ํด ๊ฒƒ
ฯ•k ฮธd
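The joint probability mentioned here appeared as an image and is not preserved in the text. As a reference sketch, the standard factorization for LDA with priors on both θ and ϕ can be written as follows; the index ranges D, K, and N_d follow the architecture slide, and the exact notation is an assumption.

```latex
p(\phi_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D} \mid \alpha, \beta)
  = \prod_{k=1}^{K} p(\phi_k \mid \beta)
    \prod_{d=1}^{D} \Big[ p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \phi_{1:K}, z_{d,n}) \Big]
```

Each factor corresponds to one arrow in the plate diagram: β to ϕ_k, α to θ_d, θ_d to z_{d,n}, and (z_{d,n}, ϕ) to w_{d,n}.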
LDA์˜ inference
์—ฌ๊ธฐ์—์„œ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ์•ŒํŒŒ์™€ ๋ฒ ํƒ€, ๊ทธ๋ฆฌ๊ณ  ๊ด€์ฐฐ ๊ฐ€๋Šฅํ•œ w_d,n์„ ์ œ์™ธํ•œ ๋ชจ๋“  ๋ณ€์ˆ˜๊ฐ€ ๋ฏธ์ง€์ˆ˜.
p(z, ฯ•, ฮธ|w)๊ฒฐ๊ตญ, ๋ฅผ ์ตœ๋Œ€๋กœ ๋งŒ๋“œ๋Š” z, ฯ•, ฮธ ๋ฅผ ์ฐพ๋Š”๊ฒƒ์ด ๋ชฉ์ 
๊ทธ๋Ÿฐ๋ฐ ์—ฌ๊ธฐ์—์„œ ๋ถ„๋ชจ์— ํ•ด๋‹นํ•˜๋Š” p(w) ๋ฅผ ๋ฐ”๋กœ ๊ตฌํ• ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— ๊น์Šค ์ƒ˜ํ”Œ๋ง ํ™œ์šฉ
Dirichlet Distribution
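The slides under this title were figures, and their content is not preserved in the text. For reference, the standard density of a K-dimensional Dirichlet distribution and its conjugate update with multinomial counts are given below; the symbols n_k for observed counts are an assumption of this sketch.

```latex
p(\theta \mid \alpha)
  = \frac{\Gamma\!\big(\sum_{k=1}^{K} \alpha_k\big)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad \theta_k \ge 0,\ \ \sum_{k=1}^{K} \theta_k = 1

\theta \sim \mathrm{Dir}(\alpha),\ \ n \mid \theta \sim \mathrm{Mult}(\theta)
  \ \Longrightarrow\ \theta \mid n \sim \mathrm{Dir}(\alpha_1 + n_1, \dots, \alpha_K + n_K)
```

The conjugate update is what lets counts and hyperparameters appear added together in the Gibbs sampling step.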
LDA์˜ ๊น์Šค ์ƒ˜ํ”Œ๋ง
LDA ์—์„œ๋Š” ๋‚˜๋จธ์ง€ ๋ณ€์ˆ˜๋Š” ๊ณ ์ •์‹œํ‚จ ์ฑ„ ํ•œ ๋ณ€์ˆ˜๋งŒ์„ ๋ณ€ํ™”์‹œํ‚ค๋˜, ๋ถˆํ•„์š”ํ•œ ๋ณ€์ˆ˜๋ฅผ ์ œ์™ธํ•˜๋Š” collapsed gibbs sampling ๊ธฐ๋ฒ• ํ™œ์šฉํ•œ๋‹ค.
์‰ฝ๊ฒŒ ๋งํ•ด์„œ, z๋งŒ ๊ตฌํ•˜๋ฉด phi์™€ theta๋Š” z๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๊ตฌํ• ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— z๋งŒ ๊ตฌํ•˜๊ฒ ๋‹ค๋Š” ๊ฒƒ
LDA์˜ ๊น์Šค ์ƒ˜ํ”Œ๋ง ๊ณผ์ •์„ ์ˆ˜์‹์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.
i๋ฒˆ์งธ ๋‹จ์–ด์˜ ํ† ํ”ฝ์ •๋ณด๋ฅผ ์ œ์™ธํ•œ ๋ชจ๋“  ๋‹จ์–ด์˜ ํ† ํ”ฝ์ •๋ณด
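The formula this slide points to is not preserved in the text. A common way to write the collapsed Gibbs update, using the A and B labels that the deck uses later, is sketched below. The count symbol v and the vocabulary index m are assumptions of this sketch; n_{d,k} matches the n_{1,2} notation used in the deck.

```latex
p(z_{d,i} = j \mid z_{-i}, w)
  \;\propto\;
  \underbrace{\frac{n_{d,j} + \alpha_j}{\sum_{k=1}^{K} \big(n_{d,k} + \alpha_k\big)}}_{A}
  \times
  \underbrace{\frac{v_{j,\,w_{d,i}} + \beta_{w_{d,i}}}{\sum_{m=1}^{V} \big(v_{j,m} + \beta_m\big)}}_{B}
```

Here n_{d,k} is the number of words in document d assigned to topic k, and v_{k,m} is the number of times vocabulary word m is assigned to topic k across the corpus, both counted with the current word removed.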
LDA์˜ ๊น์Šค ์ƒ˜ํ”Œ๋ง
LDA์˜ ๊น์Šค ์ƒ˜ํ”Œ๋ง
LDA์˜ ๊น์Šค ์ƒ˜ํ”Œ๋ง
LDA์˜ ๊น์Šค ์ƒ˜ํ”Œ๋ง
LDA์˜ ๊น์Šค ์ƒ˜ํ”Œ๋ง
LDA์˜ ๊น์Šค ์ƒ˜ํ”Œ๋ง
LDA์˜ ๊น์Šค ์ƒ˜ํ”Œ๋ง
The actual computation
Starting from the initial conditions, use Gibbs sampling to compute p(z_{1,2}).
In this example, z_{1,2} is most likely to be assigned to Topic1.
However, topics are assigned stochastically, so z_{1,2} is not guaranteed to be assigned to Topic1.
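A single update of this kind can be sketched as follows. The counts are hypothetical (the slide's own numbers were in an image), the priors are assumed symmetric, and `topic_probabilities` is a helper name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def topic_probabilities(n_dk, v_kw, word, alpha, beta):
    """Normalized p(z = k | all other assignments) for one word.

    n_dk: (K,) topic counts in the word's document, current word excluded.
    v_kw: (K, V) counts of each vocabulary word per topic, current word excluded.
    """
    K, V = v_kw.shape
    a = (n_dk + alpha) / (n_dk.sum() + K * alpha)               # document-topic term
    b = (v_kw[:, word] + beta) / (v_kw.sum(axis=1) + V * beta)  # topic-word term
    p = a * b
    return p / p.sum()

# Hypothetical counts: 2 topics, 3 vocabulary words.
n_dk = np.array([3, 0])
v_kw = np.array([[4, 1, 0],
                 [1, 2, 3]])
p = topic_probabilities(n_dk, v_kw, word=0, alpha=0.5, beta=0.01)
print(p)  # index 0 (Topic1) has the larger probability

# The new topic is drawn at random, so index 1 (Topic2) can still be selected.
draws = rng.choice(len(p), size=1000, p=p)
print(np.bincount(draws, minlength=len(p)))
```

Taking the argmax instead of sampling would turn this into a greedy procedure rather than Gibbs sampling, which is why the draw is kept random.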
Supposing z_{1,2} did end up assigned to Topic1, the topic distribution of Doc1, θ_1, and the word distribution of the first topic, ϕ_1, are updated accordingly.
๋””๋ฆฌํด๋ ˆ ํŒŒ๋ผ๋ฏธํ„ฐ์˜ ์—ญํ• 
A๋Š” d๋ฒˆ์งธ ๋ฌธ์„œ๊ฐ€ k๋ฒˆ์งธ ํ† ํ”ฝ๊ณผ ๋งบ๊ณ  ์žˆ๋Š” ์—ฐ๊ด€์„ฑ ๊ฐ•๋„๋ฅผ ๋‚˜ํƒ€๋ƒ„
B๋Š” d๋ฒˆ์งธ ๋ฌธ์„œ์˜ n๋ฒˆ์งธ ๋‹จ์–ด(w_d,n)๊ฐ€ k๋ฒˆ์งธ ํ† ํ”ฝ๊ณผ ๋งบ๊ณ  ์žˆ๋Š” ์—ฐ๊ด€์„ฑ ๊ฐ•๋„๋ฅผ ๋‚˜ํƒ€๋ƒ„
์ด์ „ ์˜ˆ์‹œ์—์„œ Topic2์— ํ• ๋‹น๋œ ๋‹จ์–ด๊ฐ€ ํ•˜๋‚˜๋„ ์—†๋Š” ์ƒํ™ฉ์ด ์žˆ์—ˆ๋‹ค. (n_1,2 = 0)
์›๋ž˜๋Œ€๋กœ๋ผ๋ฉด ์ฒซ๋ฒˆ์งธ ๋ฌธ์„œ๊ฐ€ Topic2์™€ ๋งบ๊ณ ์žˆ๋Š” ์—ฐ๊ด€์„ฑ ๊ฐ•๋„, A๋Š” 0์ด์–ด์•ผ ํ•  ๊ฒƒ,
A๊ฐ€ 0์ด๋˜๋ฉด z_d,i๊ฐ€ Topic2๊ฐ€ ๋  ํ™•๋ฅ  ๋˜ํ•œ 0์ด๊ฒŒ ๋œ๋‹ค.
๋””๋ฆฌํด๋ ˆ ํŒŒ๋ผ๋ฏธํ„ฐ์˜ ์—ญํ• 
ํ•˜์ง€๋งŒ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ์•ŒํŒŒ ๋•๋ถ„์— A๊ฐ€ ์•„์˜ˆ 0์ด๋˜๋Š” ์ƒํ™ฉ์„ ๋ฐฉ์ง€ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋จ.
์ผ์ข…์˜ Smoothing ์—ญํ• . ์•ŒํŒŒ๊ฐ€ ํด์ˆ˜๋ก ํ† ํ”ฝ๋“ค์˜ ๋ถ„ํฌ๊ฐ€ ๋น„์Šทํ•ด์ง€๊ณ  ์ž‘์„์ˆ˜๋ก ํŠน์ • ํ† ํ”ฝ์ด ํฌ๊ฒŒ ๋‚˜ํƒ€๋‚˜๊ฒŒ ๋จ.
