Semantic analysis
Representation ofdocuments
Axes of a spatial Probabilistic topics
- Euclidean space์์ ์ ์ ๊ฐ๋ฅ
- Hard to interprete
- ๋จ์ด์์ ์ ์๋ probability distribution
- Interpretable
Topic models
Topic models์๊ธฐ๋ณธ ์์ด๋์ด
๏ฌ ๋ฌธ์๋ ํ ํฝ๋ค์ ํผํฉ ๋ชจ๋ธ์ด๋ฉฐ ๊ฐ ํ ํฝ์ ๋จ์ด์์ ์ ์๋ ํ๋ฅ ๋ถํฌ
Document
Topic i Topic j Topic k
Word Word Word Word Word Word
Probabilistic topic models. Steyvers, M. & Griffiths, T. (2006)
23.
Topic models
- TopicA: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, โฆ
- Topic B: 20% cats, 20% cute, 15% dogs, 15% hamster, โฆ
Doc 1 : I like to eat broccoli and bananas.
Doc 2 : I ate a banana and tomato smoothie for breakfast.
Doc 3 : Dogs and cats are cute.
Doc 4 : My sister adopted a cats yesterday.
Doc 5 : Look at this cute hamster munching on a piece of broccoli.
์์ )
- Doc 1 and 2 : 100% topic A
- Doc 3 and 4 : 100% topic B
- Doc 5 : 60% topic A, 40% topic B