🔤 word2vec
Tags: NLP
URL: https://arxiv.org/abs/1301.3781
Date:
Person: 정민화
Files & media:
Paper list
First paper
Efficient Estimation of Word Representations in Vector Space
We propose two novel model architectures for computing continuous vector
representations of words from very large data sets. The quality of these
representations is measured in a word similarity task, and the results are …
https://arxiv.org/abs/1301.3781
Second paper: next time… 8ㅅ8
@January 12, 2023
Paper list
A simple word2vec implementation with gensim (antonyms)
A simple word2vec implementation with gensim 2 (applications)
First paper
1. Introduction
1.1 Goals of the Paper
❓ What is the distributional hypothesis? ❓
3. New Log-linear Models
3.1 Continuous Bag-of-Words Model (CBOW)
3.2 Continuous Skip-gram Model
4. Results
4.1 Task Description
4.2 Maximization of Accuracy
4.3 Comparison of Model Architectures
5. Examples of the Learned Relationships
6. Conclusion
Distributed Representations of Words and Phrases and their Compositionality
The recently introduced continuous Skip-gram model is an efficient method for learning
high-quality distributed vector representations that capture a large number of precise
syntactic and semantic word relationships. In this paper we present several extensions …
https://arxiv.org/abs/1310.4546
A simple word2vec implementation with gensim (antonyms)
Google Colaboratory
https://colab.research.google.com/drive/1vUbL74gVVXGIm4A8PTwTWMgXqsUrYv0x?usp=sharing
A simple word2vec implementation with gensim 2 (applications)
Google Colaboratory
https://colab.research.google.com/drive/10YZU2APAravfqLpz-HFxESJ9AaBmcTbv?usp=sharing
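The notebooks themselves are not reproduced in this note; below is a minimal gensim sketch of the same workflow (assuming gensim ≥ 4.0; the toy corpus and hyperparameters are illustrative, not taken from the notebooks).

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens (illustrative only).
sentences = [
    ["the", "fat", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=0 trains CBOW, sg=1 trains skip-gram; window=2 matches the examples below.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Embedding vector for a word, and its nearest neighbours in the toy space.
print(model.wv["cat"])
print(model.wv.most_similar("cat"))
```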
First paper
*** Explanation of CBOW and Skip-gram ***
1. Introduction
1.1 Goals of the Paper
Introduces techniques for turning language datasets made up of vast numbers of words into high-quality word vectors
Reflects the multiple degrees of similarity that similar words have
Multiple degrees of similarity means covering semantic similarity beyond syntactic regularities (here, part-of-speech inflections)
Development of a new model that preserves linear regularities between words
e.g.) vector('King') − vector('Man') + vector('Woman') ≈ vector('Queen')
w2v turns words that used to be represented as one-hot vectors (sparse representations) into lower-dimensional distributed representations
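This linear regularity can be checked with gensim's pretrained vectors; a sketch assuming the gensim-data model 'word2vec-google-news-300' (a large download on first use):

```python
import gensim.downloader as api

# Pretrained Google News word2vec vectors (downloads on first use).
wv = api.load("word2vec-google-news-300")

# vector('king') - vector('man') + vector('woman') ≈ vector('queen')
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```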
sliding window: once the window size is fixed, slide the window along the text, changing which words are picked as context words and which as the center word, to build the training dataset
That is, as in the figure below, the word 'sat' is predicted from 'cat' and 'on'
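A minimal sketch of this sliding-window construction in plain Python (the sentence "the fat cat sat on the mat" and window = 2 follow the figures; everything else is illustrative):

```python
# Build (context, center) training pairs with a sliding window.
tokens = ["the", "fat", "cat", "sat", "on", "the", "mat"]
window = 2

pairs = []
for i, center in enumerate(tokens):
    # Context = up to `window` words on each side of the center word.
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    pairs.append((context, center))

# e.g. (['fat', 'cat', 'on', 'the'], 'sat') -> CBOW predicts 'sat' from its context.
for context, center in pairs:
    print(context, "->", center)
```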
2) Input layer - projection layer
Between the input layer and the projection layer (hidden layer), the input is multiplied by a weight matrix W.
The weight matrix W has dimensions V × M, and M becomes the dimension of the embedding vectors.
In the figure below, this operation reduces the 7-dimensional vector (x_cat) to 5 dimensions (V_cat).
Since this is effectively an operation on a one-hot vector, it amounts to reading the row of W at that word's index.
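A small numpy sketch of this lookup equivalence (V = 7 and M = 5 as in the figure; the random W and the index chosen for 'cat' are assumptions for illustration):

```python
import numpy as np

V, M = 7, 5                      # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W = rng.standard_normal((V, M))  # input-side weight matrix (V x M)

cat_index = 2                    # assumed index of 'cat' in the vocabulary
x_cat = np.zeros(V)
x_cat[cat_index] = 1.0           # one-hot vector for 'cat'

# Multiplying the one-hot vector by W equals reading row `cat_index` of W.
v_cat = x_cat @ W
assert np.allclose(v_cat, W[cat_index])
```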
The result vectors (V_fat, V_cat, V_on, V_the) produced by multiplying the one-hot vectors (x_fat, x_cat, x_on, x_the) by the weight matrix W are then averaged over the number of context words. In the figure below the window size is 2, so we divide by 4; see the sketch below.
*** Skip-gram takes a single one-hot vector as input, so this averaging step is skipped ***
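Continuing the same assumed setup, the projection vector is just the mean of the four looked-up rows (the vocabulary indices are again illustrative):

```python
import numpy as np

V, M = 7, 5
rng = np.random.default_rng(0)
W = rng.standard_normal((V, M))   # input-side weights, as above

# Assumed vocabulary indices of the context words 'fat', 'cat', 'on', 'the'.
context = [1, 2, 4, 0]

# Mean of the four projected vectors (window = 2 -> 2 * 2 = 4 context words).
v = W[context].mean(axis=0)       # projection-layer vector, shape (M,)
```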
3) Projection layer - output layer
The averaged result v is then multiplied by a second weight matrix W′.
Since W′ has dimensions M × V, this operation outputs a vector z with the same dimension as the original one-hot vectors (here, 7 dimensions).
z then passes through a softmax function, becoming a score vector ŷ whose elements are real numbers between 0 and 1 and represent the probability of each word.
To reduce the error of the score vector ŷ, cross entropy is used as the loss function.
The weight matrices W and W′ are updated via backpropagation.
Usually the rows of the W matrix are used as each word's embedding vector, though sometimes both W and W′ are used.
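A numpy sketch of this output step under the same assumed shapes (the index of the true center word 'sat' is an illustrative assumption):

```python
import numpy as np

V, M = 7, 5
rng = np.random.default_rng(0)
v = rng.standard_normal(M)            # averaged projection vector from above
W2 = rng.standard_normal((M, V))      # output-side weight matrix W' (M x V)

z = v @ W2                            # scores, back to V = 7 dimensions
y_hat = np.exp(z) / np.exp(z).sum()   # softmax -> probabilities in (0, 1)

center_index = 3                      # assumed index of the true center word 'sat'
loss = -np.log(y_hat[center_index])   # cross entropy with a one-hot target
```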
3.2 Continuous Skip-gram Model
A model that predicts the context words from the center word
Unlike CBOW, where a word is updated only once, a word can be updated several times, so skip-gram is generally known to give better results (though it requires much more computation)
1) Preparing the dataset
Example of how the dataset is constructed when the window (how many words to look at around the center word) is 2; see the sketch below
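A minimal sketch of skip-gram pair construction on the same toy sentence (each (center, context) pair is one training example, which is why a word gets updated several times):

```python
# Build (center, context) pairs for skip-gram with window = 2.
tokens = ["the", "fat", "cat", "sat", "on", "the", "mat"]
window = 2

pairs = []
for i, center in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            pairs.append((center, tokens[j]))

# e.g. ('sat', 'fat'), ('sat', 'cat'), ('sat', 'on'), ('sat', 'the')
print(pairs)
```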