2. A CNN for Modelling Sentences
Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A convolutional neural network for modelling sentences." arXiv preprint arXiv:1404.2188 (2014).
3. Sentence model
• Sentence -> feature vector, that's all!
• However, it is the core of:
sentiment analysis, paraphrase detection, entailment recognition, summarisation, discourse analysis, machine translation, grounded language learning, image retrieval …
12. Take a look at the DCNN
[Architecture diagram with callouts: "Need to be optimized during training"; "If we use Max-TDNN"]
13. K-max pooling
• Given k, no matter how many dimensions the input has, pool the top k values as the output; "the order of the output corresponds to their order in the input"
• Better than Max-TDNN because it:
– Preserves the order of features
– Discerns more finely how strongly a feature is activated
• Guarantees that the input length to the fully connected layer is independent of sentence length
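The bullets above can be sketched in a few lines of NumPy. This is an illustrative sketch of k-max pooling as described on this slide, not the authors' code: keep the k largest activations, but emit them in their original order.

```python
import numpy as np

def k_max_pooling(x, k):
    """k-max pooling over a 1-D feature sequence.

    Keeps the k largest activations, in their original input order,
    so the output length is k regardless of the input length.
    """
    x = np.asarray(x, dtype=float)
    if len(x) <= k:
        return x  # nothing to drop
    # indices of the k largest values, then sort to restore input order
    idx = np.sort(np.argpartition(x, -k)[-k:])
    return x[idx]

print(k_max_pooling([0.1, 0.9, 0.3, 0.7, 0.2], 3))  # [0.9 0.3 0.7]
```

Note that the output is [0.9, 0.3, 0.7], not [0.9, 0.7, 0.3]: the selected features stay in sentence order, which is what lets the upper layers remain order-sensitive.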
14. Only the fully connected layer needs a fixed length
• Intermediate layers can be more flexible
• Dynamic k-max pooling!
15. Dynamic k-max pooling
• k is a function of the length of the input sentence and the depth of the network:
k_l = max( k_top, ⌈ (L − l) / L · s ⌉ )
– k_l : the k of the currently concerned layer l
– k_top : the fixed k of the k-max pooling at the top
– L : total number of conv. layers in the network (the depth)
– s : input sentence length
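The formula above is a one-liner; a minimal sketch (the function name and example values are mine, the formula is from the slide):

```python
import math

def dynamic_k(layer, total_layers, sent_len, k_top):
    """k for dynamic k-max pooling at conv layer `layer` (1-indexed).

    k_l = max(k_top, ceil((L - l) / L * s))
    """
    return max(k_top, math.ceil((total_layers - layer) / total_layers * sent_len))

# e.g. a 3-layer network, sentence length 18, k_top = 3:
print(dynamic_k(1, 3, 18, 3))  # 12
print(dynamic_k(2, 3, 18, 3))  # 6
```

So the pooled length shrinks smoothly with depth, and the max(·, k_top) floor guarantees the top layer always hands exactly k_top values per feature to the fully connected layer.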
16. Folding
• Otherwise, feature detectors in different rows are independent of each other until the top fully connected layer
• Folding simply sums every two adjacent rows (a component-wise vector sum), halving the number of rows
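The row-sum step above can be sketched as follows (an illustrative sketch assuming the feature map has shape (d, n) with an even number of rows d):

```python
import numpy as np

def fold(feature_map):
    """Folding: sum every pair of adjacent rows of a feature map.

    Input shape (d, n) -> output shape (d // 2, n); d assumed even.
    """
    d = feature_map.shape[0]
    # rows 0,2,4,... plus rows 1,3,5,... => component-wise pair sums
    return feature_map[0:d:2] + feature_map[1:d:2]

m = np.arange(12).reshape(4, 3)  # 4 rows, 3 columns
print(fold(m))  # rows 0+1 and 2+3 summed -> shape (2, 3)
```

Because the sum mixes each pair of embedding dimensions, information can flow between rows one layer earlier than the fully connected layer, at zero extra parameters.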
18. Properties
• Sensitive to the order of words
• Filters of the first layer model n-grams with n ≤ m
• Invariance to absolute position is captured by the higher convolutional layers
• Induces an internal feature graph over the sentence
23. Think about it
• Can this kind of k-max pooling be applied to image tasks?
24. A CNN for matching natural language sentences
Hu, Baotian, et al. "Convolutional neural network architectures for matching natural language sentences." Advances in Neural Information Processing Systems. 2014.
28. A trick on zero-padding
• Sentence lengths may vary over a fairly broad range
• Introduce a gate operation:
g(z) = 0 when z = 0 (the window is all padding); otherwise, g(z) = 1
• No bias! (so convolution over zero padding stays exactly zero)
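The gate and the no-bias choice work together: with no bias term, a convolution over an all-zero (padded) window is already zero, and the gate forces it to stay zero after the nonlinearity. A minimal sketch (the function name and the weight matrix are mine for illustration):

```python
import numpy as np

def gated_conv(window, weights):
    """One convolution window with the zero-padding gate.

    g(z) = 0 when the input window z is all zeros (pure padding),
    otherwise 1.  No bias term, so padded positions stay exactly zero.
    """
    gate = 1.0 if np.any(window) else 0.0
    return gate * np.maximum(0.0, weights @ window)  # ReLU, no bias

w = np.array([[1.0, -2.0], [0.5, 0.5]])
print(gated_conv(np.zeros(2), w))            # [0. 0.] -- padding stays zero
print(gated_conv(np.array([1.0, 1.0]), w))   # [0. 1.]
```

This way the network can be fed fixed-size zero-padded inputs without the padding leaking spurious activations into the upper layers.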
Each word or phrase is a vector => compose word-meaning vectors into a sentence-meaning vector
Strength: the composition is no longer naïve or rule based; it adjusts with the surrounding context
How many values k-max pooling keeps (the k) is determined by the other layers of the network
The filters in the upper layers can integrate relations between words that are far apart => just like a CNN for images, it works bottom-up, from fine details toward the bigger picture
Or, the feature graph can be thought of as a kind of syntax tree
Back to the earlier cross-lingual question: why can this work across languages? => because the only grammar involved is one the network itself learns