Summer 2017
Elvis Saravia
PhD, Information Systems and Applications
ellfae@gmail.com
Github username: omarsar
Questions: sli.do (#Z217)
● Knowledge Discovery in Databases (KDD) Process
ConceptNet
Motel = [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
Hotel = [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
One-hot representation
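A minimal sketch of why one-hot vectors carry no notion of similarity: any two different words have a dot product of zero. The vocabulary below is hypothetical; only the positions of "hotel" and "motel" follow the 16-slot vectors shown above.

```python
# Hypothetical 16-word vocabulary; only the slots for "hotel" (index 5)
# and "motel" (index 8) mirror the vectors on the slide.
vocab = ["w%d" % i for i in range(16)]
vocab[5] = "hotel"
vocab[8] = "motel"


def one_hot(word, vocab):
    """Return a vector of zeros with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


hotel = one_hot("hotel", vocab)
motel = one_hot("motel", vocab)

# Different words never share an active slot, so the product is 0
# no matter how related the words are in meaning.
print(dot(hotel, motel))
print(dot(hotel, hotel))
```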
hotel = [0.728 0.234 -0.23 0.223]
Distributed representation (low-dimension vector)
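A small sketch of what a dense vector buys us: similarity between words becomes measurable, here with cosine similarity. The "hotel" vector is the one above; the "motel" and "banana" vectors are made-up values for illustration only, not output from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


hotel = [0.728, 0.234, -0.23, 0.223]   # from the slide
motel = [0.70, 0.25, -0.20, 0.20]      # hypothetical, chosen to be close to hotel
banana = [-0.5, 0.6, 0.4, -0.1]        # hypothetical, chosen to be unrelated

sim_motel = cosine(hotel, motel)
sim_banana = cosine(hotel, banana)

# Related words should score higher than unrelated ones.
print(sim_motel > sim_banana)
```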
Paper source: https://arxiv.org/pdf/1301.3781.pdf
Feedforward Neural Net Language Model (NNLM)
variables to optimize
denotes window range
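The two annotations above read like labels on an objective function whose formula did not survive on this slide. One common form they would fit is the skip-gram average log-likelihood, sketched below. This is an assumption about what the slide showed: here θ stands for the variables to optimize, c for the window range, T for the number of words in the corpus, and w_t for the word at position t.

```latex
J(\theta) = \frac{1}{T} \sum_{t=1}^{T}
            \sum_{\substack{-c \le j \le c \\ j \ne 0}}
            \log p\left(w_{t+j} \mid w_t ; \theta\right)
```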
P(the|over)
P(fox|over)
P(jumped|over)
P(the|over)
P(lazy|over)
P(dog|over)
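The six probabilities above correspond to the context words around the center word "over". A sketch of how such (center, context) pairs can be generated, assuming the sentence is "the fox jumped over the lazy dog" and the window range is 3 (both inferred from the list, not stated on the slide):

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for every position in tokens."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Assumed sentence and window size (see lead-in).
tokens = "the fox jumped over the lazy dog".split()
pairs = skipgram_pairs(tokens, window=3)

# Context words the model is asked to predict from "over".
over_contexts = [context for center, context in pairs if center == "over"]
print(over_contexts)
```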
P(V_OUT | V_IN)
How to define this probability distribution?
Determines similarity in [-1,1]
Get a probability in [0,1] out of a similarity in [-1,1]
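One standard way to turn similarity scores into probabilities is the softmax function. The sketch below assumes softmax is the intended step (the slide does not name it), and the similarity scores are made-up numbers rather than values from a trained model.

```python
import math


def softmax(scores):
    """Map arbitrary real scores to probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical similarities in [-1, 1] between V_IN = "over"
# and three candidate output words.
similarities = {"fox": 0.8, "lazy": 0.1, "banana": -0.9}

probs = dict(zip(similarities, softmax(list(similarities.values()))))

# Higher similarity gives higher probability; all values lie in [0, 1].
for word, p in probs.items():
    print(word, round(p, 3))
```

Note that the denominator sums over every candidate word, which is why the full softmax gets expensive for large vocabularies.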
https://www.healthvault.com/en-us/health-bot/
● https://goo.gl/ppHX65
● Gensim guide for word2vec: https://goo.gl/i2UrdH
● https://goo.gl/7b72S9
● https://goo.gl/uNJDrs
● https://goo.gl/KYacjz
● https://goo.gl/JezgYg
a. Build API: (Flask/Django recommended)
b. Pretrained models: (Guide: https://goo.gl/5qt2Ki)
c. Visualization: d3js / plotly / tensorboard
a. LSTM - (Guide: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
b. CNN - (Guide: https://goo.gl/PgLUs7)
c. RNN - (Guide: https://goo.gl/5L9kci)
a. Starting point: https://rare-technologies.com/word2vec-tutorial#app

Text mining lab (summer 2017) - Word Vector Representation