Computational Linguistics, Week 5
Neural Networks and Neural Language Models
By Mark Chang

Outline

•  Machine Learning
•  Neural Networks
•  Training Neural Networks
•  Vector Space of Semantics
•  Neural Language Models (word2vec)

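Machine Learning

[Figure: the machine learning workflow. During training: Training Data → Machine Learning Model → Output; the Output is compared with the Answer to give an Error, which is returned to the model as Feedback. After training: Testing Data → Machine Learning Model → Output.]
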
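Machine Learning

•  Training Data: $X, Y$, made of pairs $x^{(i)}, y^{(i)}$
•  Model: $h$
•  Parameter: $w$
•  Output: $h(X)$
•  Answer: $Y$
•  Cost Function: $E(h(X), Y)$
•  Feedback: from the cost function back to the parameter $w$
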
Logistic Regression

Training Data

X              Y
-0.47241379    0
-0.35344828    0
-0.30148276    0
0.33448276     1
0.35344828     1
0.37241379     1
0.39137931     1
0.41034483     1
0.44931034     1
0.49827586     1
0.51724138     1
…              …

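Model

Sigmoid function:

$$h(x) = \frac{1}{1 + e^{-(w_0 + w_1 x)}}$$

$w_0 + w_1 x < 0 \Rightarrow h(x) \approx 0$
$w_0 + w_1 x > 0 \Rightarrow h(x) \approx 1$

These approximations hold when $|w_0 + w_1 x|$ is large; at $w_0 + w_1 x = 0$ the output is exactly 0.5.
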
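The saturation of the sigmoid can be checked numerically. The sketch below is only illustrative: the parameter values `w0 = 0` and `w1 = 20` are assumptions chosen for the example, not values from the slides, while the two non-zero inputs are taken from the training data table.

```python
import math


def sigmoid(t):
    """Logistic function 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def h(x, w0, w1):
    """Logistic regression model h(x) = sigmoid(w0 + w1 * x)."""
    return sigmoid(w0 + w1 * x)


# Hypothetical parameters, chosen only for illustration.
w0, w1 = 0.0, 20.0

low = h(-0.47241379, w0, w1)   # w0 + w1*x is strongly negative
mid = h(0.0, w0, w1)           # w0 + w1*x is exactly zero
high = h(0.51724138, w0, w1)   # w0 + w1*x is strongly positive
print(low, mid, high)
```

With these assumed parameters the first value is close to 0, the second is 0.5, and the third is close to 1.
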
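Cost Function

•  Cross Entropy

$$E(h(X), Y) = -\frac{1}{m} \left( \sum_{i}^{m} y^{(i)} \log(h(x^{(i)})) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right)$$

When $y^{(i)} = 1$:

$$E(h(x^{(i)}), y^{(i)}) = -\log(h(x^{(i)}))$$

$h(x^{(i)}) \approx 0 \Rightarrow E(h(x^{(i)}), y^{(i)}) \approx \infty$
$h(x^{(i)}) \approx 1 \Rightarrow E(h(x^{(i)}), y^{(i)}) \approx 0$

When $y^{(i)} = 0$:

$$E(h(x^{(i)}), y^{(i)}) = -\log(1 - h(x^{(i)}))$$

$h(x^{(i)}) \approx 0 \Rightarrow E(h(x^{(i)}), y^{(i)}) \approx 0$
$h(x^{(i)}) \approx 1 \Rightarrow E(h(x^{(i)}), y^{(i)}) \approx \infty$
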
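Cost Function

•  Cross Entropy

$$E(h(X), Y) = -\frac{1}{m} \left( \sum_{i}^{m} y^{(i)} \log(h(x^{(i)})) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right)$$

$h(x^{(i)}) \approx 0$ and $y^{(i)} = 0 \Rightarrow E(h(X), Y) \approx 0$
$h(x^{(i)}) \approx 1$ and $y^{(i)} = 1 \Rightarrow E(h(X), Y) \approx 0$
$h(x^{(i)}) \approx 0$ and $y^{(i)} = 1 \Rightarrow E(h(X), Y) \approx \infty$
$h(x^{(i)}) \approx 1$ and $y^{(i)} = 0 \Rightarrow E(h(X), Y) \approx \infty$
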
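The four cases can be reproduced with a small cross-entropy function. This is a minimal sketch: the clipping constant `eps` and the example predictions are assumptions added so that `log` never receives 0; they are not part of the slides.

```python
import math


def cross_entropy(predictions, labels, eps=1e-12):
    """Mean cross entropy: -(1/m) * sum(y*log(h) + (1-y)*log(1-h)).

    eps is an assumption of this sketch: it clips h away from 0 and 1
    so that log() is always defined.
    """
    m = len(labels)
    total = 0.0
    for h, y in zip(predictions, labels):
        h = min(max(h, eps), 1.0 - eps)
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / m


# Hypothetical predictions for two examples with labels 0 and 1.
good = cross_entropy([0.01, 0.99], [0, 1])  # predictions agree with the labels
bad = cross_entropy([0.99, 0.01], [0, 1])   # predictions contradict the labels
print(good, bad)
```

The cost stays near 0 when predictions agree with the labels and grows without bound as a prediction approaches the wrong extreme.
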
 	
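Feedback

•  Gradient Descent:

$$w_0 \leftarrow w_0 - \eta \frac{\partial E(h(X), Y)}{\partial w_0}$$

$$w_1 \leftarrow w_1 - \eta \frac{\partial E(h(X), Y)}{\partial w_1}$$

[Figure: plot with axes $w_0$ and $w_1$ showing the gradient vector $\left( \frac{\partial E(h(X), Y)}{\partial w_0}, \frac{\partial E(h(X), Y)}{\partial w_1} \right)$.]
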
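The update rule can be tried on the eleven listed training pairs. The sketch below makes several assumptions that are not on the slides: the learning rate `eta = 1.0`, the starting point `(0, 0)`, the 1000 iterations, and the closed-form partial derivatives of the mean cross entropy for a sigmoid model, $\frac{1}{m}\sum_i (h(x^{(i)}) - y^{(i)})$ and $\frac{1}{m}\sum_i (h(x^{(i)}) - y^{(i)})\,x^{(i)}$, which are a standard result rather than something derived in this deck.

```python
import math

# The eleven (x, y) pairs listed on the Training Data slide.
X = [-0.47241379, -0.35344828, -0.30148276, 0.33448276, 0.35344828,
     0.37241379, 0.39137931, 0.41034483, 0.44931034, 0.49827586, 0.51724138]
Y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


def h(x, w0, w1):
    """Sigmoid model h(x) = 1 / (1 + e^-(w0 + w1*x))."""
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))


def cost(w0, w1):
    """Mean cross entropy over the training pairs."""
    m = len(X)
    return -sum(y * math.log(h(x, w0, w1)) + (1 - y) * math.log(1 - h(x, w0, w1))
                for x, y in zip(X, Y)) / m


def gradient(w0, w1):
    """Partial derivatives of the mean cross entropy w.r.t. w0 and w1."""
    m = len(X)
    g0 = sum(h(x, w0, w1) - y for x, y in zip(X, Y)) / m
    g1 = sum((h(x, w0, w1) - y) * x for x, y in zip(X, Y)) / m
    return g0, g1


eta = 1.0           # learning rate (assumed value)
w0, w1 = 0.0, 0.0   # starting point (assumed)
initial_cost = cost(w0, w1)
for _ in range(1000):
    g0, g1 = gradient(w0, w1)
    w0, w1 = w0 - eta * g0, w1 - eta * g1
final_cost = cost(w0, w1)

# Threshold the model output at 0.5 to get a class label.
predictions = [1 if h(x, w0, w1) > 0.5 else 0 for x in X]
print(initial_cost, final_cost, predictions)
```

Both parameters are updated from the same gradient evaluation, so each step moves against the full gradient vector rather than one coordinate at a time.
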
  
Neural Networks

Neurons & Action Potential

http://humanphisiology.wikispaces.com/file/view/neuron.png/216460814/neuron.png
http://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/Action_potential.svg/1037px-Action_potential.svg.png

Synapse

http://www.quia.com/files/quia/users/lmcgee/Systems/endocrine-nervous/synapse.gif

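Artificial Neurons

[Figure: a neuron n with inputs $x_1$ and $x_2$ (weights $w_1$, $w_2$), a bias input b (weight $w_b$), and output y.]

$$n_{in} = w_1 x_1 + w_2 x_2 + w_b$$

$$n_{out} = \frac{1}{1 + e^{-n_{in}}}$$

$$y = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + w_b)}}$$

[Figure: $n_{out}$ over the $(x_1, x_2)$ plane, with the levels $n_{out} = 0$, $n_{out} = 0.5$ and $n_{out} = 1$ marked.]

$w_1 x_1 + w_2 x_2 + w_b = 0$: the boundary line, where $n_{out} = 0.5$
$w_1 x_1 + w_2 x_2 + w_b > 0$: the side where $n_{out}$ approaches 1
$w_1 x_1 + w_2 x_2 + w_b < 0$: the side where $n_{out}$ approaches 0
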
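Binary Classification: AND Gate

x1   x2   y
0    0    0
0    1    0
1    0    0
1    1    1

[Figure: the points (0,0), (0,1), (1,0) and (1,1) in the $(x_1, x_2)$ plane, split into class 0 and class 1 by the line $20 x_1 + 20 x_2 - 30 = 0$.]

Neuron n: weight 20 from $x_1$, weight 20 from $x_2$, bias weight -30.

$$y = \frac{1}{1 + e^{-(20 x_1 + 20 x_2 - 30)}}$$
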
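Binary Classification: OR Gate

x1   x2   y
0    0    0
0    1    1
1    0    1
1    1    1

[Figure: the points (0,0), (0,1), (1,0) and (1,1) in the $(x_1, x_2)$ plane, split into class 0 and class 1 by the line $20 x_1 + 20 x_2 - 10 = 0$.]

Neuron n: weight 20 from $x_1$, weight 20 from $x_2$, bias weight -10.

$$y = \frac{1}{1 + e^{-(20 x_1 + 20 x_2 - 10)}}$$
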
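The AND and OR neurons can be checked directly against their truth tables. The sketch below uses the weights given on the slides (20, 20, -30 and 20, 20, -10); the function names and the rounding of the output to 0 or 1 are choices made for this example.

```python
import math


def neuron(x1, x2, w1, w2, wb):
    """Sigmoid neuron: n_out = 1 / (1 + e^-(w1*x1 + w2*x2 + wb))."""
    n_in = w1 * x1 + w2 * x2 + wb
    return 1.0 / (1.0 + math.exp(-n_in))


def and_gate(x1, x2):
    return neuron(x1, x2, 20, 20, -30)


def or_gate(x1, x2):
    return neuron(x1, x2, 20, 20, -10)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Round the sigmoid output to the nearest class label.
and_outputs = [round(and_gate(x1, x2)) for x1, x2 in inputs]
or_outputs = [round(or_gate(x1, x2)) for x1, x2 in inputs]
print(and_outputs)  # [0, 0, 0, 1]
print(or_outputs)   # [0, 1, 1, 1]
```

The large weights push every input combination at least 10 units away from the boundary, so the outputs sit very close to 0 or 1.
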
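XOR Gate?

x1   x2   y
0    0    0
0    1    1
1    0    1
1    1    0

[Figure: the points (0,0), (0,1), (1,0) and (1,1) in the $(x_1, x_2)$ plane; (0,1) and (1,0) belong to class 1, while (0,0) and (1,1) belong to class 0.]

No single straight line separates the two classes, so one neuron is not enough.
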
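Binary Classification: XOR Gate

[Figure: three panels in the $(x_1, x_2)$ plane, each showing the points (0,0), (0,1), (1,0) and (1,1) with their class regions: the two hidden neurons and the combined output.]

Hidden neuron $n_1$: weight 20 from $x_1$, weight 20 from $x_2$, bias weight -30.
Hidden neuron $n_2$: weight 20 from $x_1$, weight 20 from $x_2$, bias weight -10.
Output neuron n: weight -20 from $n_1$, weight 20 from $n_2$, bias weight -10.

x1   x2   n1   n2   y
0    0    0    0    0
0    1    0    1    1
1    0    0    1    1
1    1    1    1    0
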
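The two-layer XOR construction can be verified row by row. In this sketch the weights are the ones on the slide; assigning -20 to the connection from $n_1$ and 20 to the connection from $n_2$ is the reading that reproduces the table, and the rounding to 0 or 1 is added for the comparison.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def xor_gate(x1, x2):
    n1 = sigmoid(20 * x1 + 20 * x2 - 30)   # hidden neuron n1 (behaves like AND)
    n2 = sigmoid(20 * x1 + 20 * x2 - 10)   # hidden neuron n2 (behaves like OR)
    y = sigmoid(-20 * n1 + 20 * n2 - 10)   # output neuron: n2 and not n1
    return n1, n2, y


rows = []
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    n1, n2, y = xor_gate(x1, x2)
    # Round each activation to the nearest class label, as in the table.
    rows.append((x1, x2, round(n1), round(n2), round(y)))

for row in rows:
    print(row)
```

The hidden layer maps the four inputs to points that a single output neuron can separate, which is what the single-neuron model could not do.
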
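Neural Networks

[Figure: a network with three layers.]

•  Input Layer: x, y, and a bias b
•  Hidden Layer: $n_{11}$, $n_{12}$, and a bias b
•  Output Layer: $n_{21}$, $n_{22}$, with $z_1$, $z_2$ on the output side
•  Weights into the hidden layer: $W_{11,x}$, $W_{11,y}$, $W_{11,b}$, $W_{12,x}$, $W_{12,y}$, $W_{12,b}$
•  Weights into the output layer: $W_{21,11}$, $W_{21,12}$, $W_{21,b}$, $W_{22,11}$, $W_{22,12}$, $W_{22,b}$

Visual Pathway

http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg
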
Training Neural Networks

[Figure: the training loop. Training Data → Neural Networks → Output, which is compared with the Answer.]

•  Initialization
•  Forward Propagation
•  Error Function
•  Backward Propagation

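Initialization

•  Randomly sample each weight W from the range -N to N

[Figure: the network with inputs x, y, hidden neurons $n_{11}$, $n_{12}$, output neurons $n_{21}$, $n_{22}$, outputs $z_1$, $z_2$, and the twelve weights $W_{11,x}$, $W_{11,y}$, $W_{11,b}$, $W_{12,x}$, $W_{12,y}$, $W_{12,b}$, $W_{21,11}$, $W_{21,12}$, $W_{21,b}$, $W_{22,11}$, $W_{22,12}$, $W_{22,b}$.]
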
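A minimal sketch of this initialization step follows. The slide gives only the range, so the uniform distribution, the value `n = 1.0`, the seed, and the dictionary layout keyed by weight name are all assumptions of this example.

```python
import random

# Names of the twelve weights in the network diagram.
WEIGHT_NAMES = [
    "w11,x", "w11,y", "w11,b", "w12,x", "w12,y", "w12,b",
    "w21,11", "w21,12", "w21,b", "w22,11", "w22,12", "w22,b",
]


def initialize_weights(n, seed=None):
    """Draw every weight independently and uniformly from [-n, n]."""
    rng = random.Random(seed)
    return {name: rng.uniform(-n, n) for name in WEIGHT_NAMES}


# n = 1.0 and seed = 0 are hypothetical choices for a reproducible example.
weights = initialize_weights(n=1.0, seed=0)
for name, value in weights.items():
    print(name, value)
```

Random starting values break the symmetry between neurons in the same layer; if all weights started equal, the two hidden neurons would receive identical updates.
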
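Forward Propagation

Error Function

$$J = -\left( z_1 \log(n_{21(out)}) + (1 - z_1) \log(1 - n_{21(out)}) \right) - \left( z_2 \log(n_{22(out)}) + (1 - z_2) \log(1 - n_{22(out)}) \right)$$

Here $n_{21(out)}$ and $n_{22(out)}$ are the outputs of $n_{21}$ and $n_{22}$, and $z_1$, $z_2$ are the corresponding answers.

$n_{out} \approx 0$ and $z = 0 \Rightarrow J \approx 0$
$n_{out} \approx 1$ and $z = 1 \Rightarrow J \approx 0$
$n_{out} \approx 0$ and $z = 1 \Rightarrow J \approx \infty$
$n_{out} \approx 1$ and $z = 0 \Rightarrow J \approx \infty$
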
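Forward propagation and the error function for this 2-2-2 network can be sketched as follows. The weight names mirror the diagram, but the dictionary layout, the all-zero weights, and the example input and answers are assumptions made only to keep the arithmetic easy to follow.

```python
import math

WEIGHT_NAMES = [
    "w11,x", "w11,y", "w11,b", "w12,x", "w12,y", "w12,b",
    "w21,11", "w21,12", "w21,b", "w22,11", "w22,12", "w22,b",
]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def forward(x, y, w):
    """Propagate the inputs x, y through the hidden layer to the output layer."""
    n11 = sigmoid(w["w11,x"] * x + w["w11,y"] * y + w["w11,b"])
    n12 = sigmoid(w["w12,x"] * x + w["w12,y"] * y + w["w12,b"])
    n21 = sigmoid(w["w21,11"] * n11 + w["w21,12"] * n12 + w["w21,b"])
    n22 = sigmoid(w["w22,11"] * n11 + w["w22,12"] * n12 + w["w22,b"])
    return n11, n12, n21, n22


def error(n21, n22, z1, z2):
    """Cross entropy summed over the two output neurons."""
    return (-(z1 * math.log(n21) + (1 - z1) * math.log(1 - n21))
            - (z2 * math.log(n22) + (1 - z2) * math.log(1 - n22)))


# Hypothetical weights: all zero, so every neuron outputs exactly 0.5.
zero_weights = {name: 0.0 for name in WEIGHT_NAMES}
_, _, n21, n22 = forward(1.0, 0.0, zero_weights)
J = error(n21, n22, z1=1, z2=0)
print(n21, n22, J)
```

With every output at 0.5, each of the two terms contributes $\log 2$, so $J = 2 \log 2$ regardless of the answers.
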
 	
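Gradient Descent

Weights into the output layer:

$w_{21,11} \leftarrow w_{21,11} - \eta \frac{\partial J}{\partial w_{21,11}}$
$w_{21,12} \leftarrow w_{21,12} - \eta \frac{\partial J}{\partial w_{21,12}}$
$w_{21,b} \leftarrow w_{21,b} - \eta \frac{\partial J}{\partial w_{21,b}}$
$w_{22,11} \leftarrow w_{22,11} - \eta \frac{\partial J}{\partial w_{22,11}}$
$w_{22,12} \leftarrow w_{22,12} - \eta \frac{\partial J}{\partial w_{22,12}}$
$w_{22,b} \leftarrow w_{22,b} - \eta \frac{\partial J}{\partial w_{22,b}}$

Weights into the hidden layer:

$w_{11,x} \leftarrow w_{11,x} - \eta \frac{\partial J}{\partial w_{11,x}}$
$w_{11,y} \leftarrow w_{11,y} - \eta \frac{\partial J}{\partial w_{11,y}}$
$w_{11,b} \leftarrow w_{11,b} - \eta \frac{\partial J}{\partial w_{11,b}}$
$w_{12,x} \leftarrow w_{12,x} - \eta \frac{\partial J}{\partial w_{12,x}}$
$w_{12,y} \leftarrow w_{12,y} - \eta \frac{\partial J}{\partial w_{12,y}}$
$w_{12,b} \leftarrow w_{12,b} - \eta \frac{\partial J}{\partial w_{12,b}}$

[Figure: plot with axes $w_0$ and $w_1$ showing the descent direction $\left( -\frac{\partial J}{\partial w_0}, -\frac{\partial J}{\partial w_1} \right)$.]

Backward Propagation

http://cpmarkchang.logdown.com/posts/277349-neural-network-backward-propagation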
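One way to obtain all twelve partial derivatives is the usual chain-rule computation for sigmoid units with a cross-entropy error. The sketch below is not taken from the linked post; the list-based weight layout, the learning rate, the seed, the single training example and the 200 iterations are assumptions of this example.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def forward(inputs, hidden_w, output_w):
    """Each weight row is [weight for input 1, weight for input 2, bias weight]."""
    hidden = [sigmoid(w[0] * inputs[0] + w[1] * inputs[1] + w[2]) for w in hidden_w]
    outputs = [sigmoid(w[0] * hidden[0] + w[1] * hidden[1] + w[2]) for w in output_w]
    return hidden, outputs


def error(outputs, targets):
    return -sum(z * math.log(o) + (1 - z) * math.log(1 - o)
                for o, z in zip(outputs, targets))


def train_step(inputs, targets, hidden_w, output_w, eta):
    """One gradient-descent update of all twelve weights, in place."""
    hidden, outputs = forward(inputs, hidden_w, output_w)
    # For a sigmoid output with cross entropy, dJ/d(n_in) = n_out - z.
    delta_out = [o - z for o, z in zip(outputs, targets)]
    # Propagate the output deltas backward through the not-yet-updated weights.
    delta_hidden = [
        sum(delta_out[k] * output_w[k][j] for k in range(2))
        * hidden[j] * (1 - hidden[j])
        for j in range(2)
    ]
    for k in range(2):
        for j in range(2):
            output_w[k][j] -= eta * delta_out[k] * hidden[j]
        output_w[k][2] -= eta * delta_out[k]
    for j in range(2):
        for i in range(2):
            hidden_w[j][i] -= eta * delta_hidden[j] * inputs[i]
        hidden_w[j][2] -= eta * delta_hidden[j]


rng = random.Random(0)
hidden_w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
output_w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
inputs, targets = (1.0, 0.0), (1, 0)   # one hypothetical training example

error_before = error(forward(inputs, hidden_w, output_w)[1], targets)
for _ in range(200):
    train_step(inputs, targets, hidden_w, output_w, eta=0.5)
error_after = error(forward(inputs, hidden_w, output_w)[1], targets)
print(error_before, error_after)
```

The hidden-layer deltas are computed before any weight is changed, so every update in a step uses gradients from the same forward pass.
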
  
Vector Space of Semantics

Distributional Semantics

•  The meaning of a word can be inferred from its context.

The dog runs.
A cat runs.
A dog sleeps.
The cat sleeps.
A dog barks.
The cat meows.

The meanings of dog and cat are similar.

Semantic Vectors

The dog runs. A cat runs. A dog sleeps. The cat sleeps. A dog barks. The cat meows.

Each word is represented by how often it occurs with each context word:

       the   a    run   sleep   bark   meow
dog    1     2    2     2       1      0
cat    2     1    2     2       0      1

Semantic Vectors

dog  $(1, 2, \ldots, x_n)$
cat  $(2, 1, \ldots, x_n)$
car  $(0, 0, \ldots, x_n)$

Cosine Similarity

•  Cosine Similarity between A & B is:

$$\frac{A \cdot B}{|A||B|}$$

dog  $(a_1, a_2, \ldots, a_n)$
cat  $(b_1, b_2, \ldots, b_n)$

Cosine similarity between dog & cat is:

$$\frac{a_1 b_1 + a_2 b_2 + \ldots + a_n b_n}{\sqrt{a_1^2 + a_2^2 + \ldots + a_n^2} \sqrt{b_1^2 + b_2^2 + \ldots + b_n^2}}$$

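As a worked example, the formula can be applied to the dog and cat rows of the count table from the Semantic Vectors slide. The function name is a choice made for this sketch.

```python
import math


def cosine_similarity(a, b):
    """(a . b) / (|a| |b|) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Columns: the, a, run, sleep, bark, meow (counts from the Semantic Vectors table).
dog = [1, 2, 2, 2, 1, 0]
cat = [2, 1, 2, 2, 0, 1]

similarity = cosine_similarity(dog, cat)
print(similarity)
```

Here the dot product is 12 and both vectors have squared length 14, so the similarity is 12/14, about 0.857.
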
Operation of Vectors

Woman + King - Man = Queen

[Figure: the vectors Woman, Queen, Man and King; the offset King - Man appears twice, once from Man to King and once from Woman to Queen.]

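The arithmetic can be illustrated with toy vectors. The two-dimensional values below are entirely hypothetical and hand-picked so that the offset King - Man equals Queen - Woman; real word vectors are learned and satisfy the relation only approximately.

```python
import math

# Hypothetical 2-D vectors, invented for illustration only.
vectors = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
}


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    return [x - y for x, y in zip(a, b)]


def nearest(target):
    """Word whose vector has the smallest Euclidean distance to target."""
    return min(vectors, key=lambda word: math.dist(vectors[word], target))


# Woman + King - Man
result = add(vectors["woman"], subtract(vectors["king"], vectors["man"]))
closest = nearest(result)
print(result, closest)
```
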
Neural Language Models (word2vec)

Dimension is too LARGE

dog  $(x_1 = \text{the}, x_2 = \text{a}, \ldots, x_n)$

The dimension of semantic vectors is equal to the size of the vocabulary.

[Figure: a vector with components $x_1, x_2, x_3, x_4, \ldots, x_n$.]

Compressed Vectors

dog → One-Hot Encoding (1, 0, 0, 0) → Neural Network → Compressed Vector (1.2, 0.7, 0.5)

One-Hot Encoding

dog  (1, 0, 0, 0)
cat  (0, 1, 0, 0)
run  (0, 0, 1, 0)
fly  (0, 0, 0, 1)

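Initialize Weights

The rows of W and V correspond to dog, cat, run, fly.

$$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \end{bmatrix} \qquad V = \begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \\ v_{41} & v_{42} & v_{43} \end{bmatrix}$$
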
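Compressed Vectors

The one-hot vector of dog (high dimension) selects the first row of V (low dimension):

$$\begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} V = \begin{bmatrix} v_{11} & v_{12} & v_{13} \end{bmatrix}$$

Each word has its own compressed vector:

dog  $(v_{11}, v_{12}, v_{13})$
cat  $(v_{21}, v_{22}, v_{23})$
run  $(v_{31}, v_{32}, v_{33})$
fly  $(v_{41}, v_{42}, v_{43})$
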
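Multiplying a one-hot vector by V is the same as looking up one row of V. The sketch below shows this with plain lists; apart from the first row (1.2, 0.7, 0.5), which is the compressed vector shown for dog, the entries of V are hypothetical.

```python
VOCABULARY = ["dog", "cat", "run", "fly"]


def one_hot(word):
    """Vector with a 1 at the word's position in the vocabulary, 0 elsewhere."""
    return [1 if w == word else 0 for w in VOCABULARY]


def compress(one_hot_vector, V):
    """Row vector times matrix: sum_i one_hot[i] * V[i][k] for each column k."""
    columns = len(V[0])
    return [sum(one_hot_vector[i] * V[i][k] for i in range(len(V)))
            for k in range(columns)]


# Hypothetical values for V; only the dog row comes from the slides.
V = [
    [1.2, 0.7, 0.5],    # dog
    [1.0, 0.8, 0.4],    # cat
    [-0.3, 0.1, 0.9],   # run
    [0.2, -0.6, 0.3],   # fly
]

dog_vector = compress(one_hot("dog"), V)
run_vector = compress(one_hot("run"), V)
print(dog_vector, run_vector)
```

In practice the multiplication is skipped and the row is read directly by index, which gives the same result at lower cost.
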
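Context Word

dog and run: the one-hot vector of dog, (1, 0, 0, 0), selects $V_1 = (v_{11}, v_{12}, v_{13})$, and the one-hot vector of run, (0, 0, 1, 0), selects $W_3 = (w_{31}, w_{32}, w_{33})$.

$$V_1 \cdot W_3 = v_{11} w_{31} + v_{12} w_{32} + v_{13} w_{33}$$

$$\frac{1}{1 + e^{-V_1 \cdot W_3}} \approx 1$$

cat and run: the one-hot vector of cat, (0, 1, 0, 0), selects $V_2 = (v_{21}, v_{22}, v_{23})$, and run again selects $W_3$.

$$V_2 \cdot W_3 = v_{21} w_{31} + v_{22} w_{32} + v_{23} w_{33}$$

$$\frac{1}{1 + e^{-V_2 \cdot W_3}} \approx 1$$
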
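Non-context Word

dog and fly: the one-hot vector of dog, (1, 0, 0, 0), selects $V_1 = (v_{11}, v_{12}, v_{13})$, and the one-hot vector of fly, (0, 0, 0, 1), selects $W_4 = (w_{41}, w_{42}, w_{43})$.

$$V_1 \cdot W_4 = v_{11} w_{41} + v_{12} w_{42} + v_{13} w_{43}$$

$$\frac{1}{1 + e^{-V_1 \cdot W_4}} \approx 0$$

cat and fly: the one-hot vector of cat, (0, 1, 0, 0), selects $V_2 = (v_{21}, v_{22}, v_{23})$, and fly again selects $W_4$.

$$V_2 \cdot W_4 = v_{21} w_{41} + v_{22} w_{42} + v_{23} w_{43}$$

$$\frac{1}{1 + e^{-V_2 \cdot W_4}} \approx 0$$
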
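Result

After training, the rows of V are the compressed vectors of the words:

dog  $(v_{11}, v_{12}, v_{13})$
cat  $(v_{21}, v_{22}, v_{23})$
run  $(v_{31}, v_{32}, v_{33})$
fly  $(v_{41}, v_{42}, v_{43})$
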
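The context and non-context targets can be combined into a small training loop. This is a toy sketch under stated assumptions, not the word2vec implementation: the four training pairs, the initial range, the learning rate, the seed and the 500 passes are all hypothetical, and the update is plain gradient descent on the cross entropy of $\sigma(V_i \cdot W_j)$.

```python
import math
import random

VOCABULARY = ["dog", "cat", "run", "fly"]
# (word, other word, label): label 1 marks a context pair, 0 a non-context pair.
PAIRS = [("dog", "run", 1), ("cat", "run", 1), ("dog", "fly", 0), ("cat", "fly", 0)]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
# One 3-dimensional row per word in each matrix, with small random values.
V = {w: [rng.uniform(-0.1, 0.1) for _ in range(3)] for w in VOCABULARY}
W = {w: [rng.uniform(-0.1, 0.1) for _ in range(3)] for w in VOCABULARY}

eta = 0.5  # learning rate (assumed)
for _ in range(500):
    for word, other, label in PAIRS:
        v, w = V[word], W[other]
        g = sigmoid(dot(v, w)) - label  # derivative of the cross entropy w.r.t. v . w
        # Both new rows are computed from the old values of v and w.
        new_v = [vk - eta * g * wk for vk, wk in zip(v, w)]
        new_w = [wk - eta * g * vk for vk, wk in zip(v, w)]
        V[word], W[other] = new_v, new_w


def score(word, other):
    return sigmoid(dot(V[word], W[other]))


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


print(score("dog", "run"), score("dog", "fly"))
print(cosine(V["dog"], V["cat"]))
```

Because dog and cat are trained against the same context and non-context words, their rows of V are pushed in the same directions and end up with a high cosine similarity.
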
Further Reading

•  Logistic Regression 3D
–  http://cpmarkchang.logdown.com/posts/189069-logisti-regression-model
•  Overfitting and Regularization
–  http://cpmarkchang.logdown.com/posts/193261-machine-learning-overfitting-and-regularization
•  Model Selection
–  http://cpmarkchang.logdown.com/posts/193914-machine-learning-model-selection
•  Neural Network Back Propagation
–  http://cpmarkchang.logdown.com/posts/277349-neural-network-backward-propagation
•  Neural Probabilistic Language Model:
–  http://cpmarkchang.logdown.com/posts/255785-neural-network-neural-probabilistic-language-model
–  http://cpmarkchang.logdown.com/posts/276263--hierarchical-probabilistic-neural-networks-neural-network-language-model
•  Word2vec
–  http://arxiv.org/pdf/1301.3781.pdf
–  http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
–  http://www-personal.umich.edu/~ronxin/pdf/w2vexp.pdf

