2. A CNN for Modelling Sentences
Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A convolutional neural network for modelling sentences." arXiv preprint arXiv:1404.2188 (2014).
3. Sentence model
• Sentence -> feature vector, that's all!
• However, it is the core of:
sentiment analysis, paraphrase detection, entailment recognition, summarisation, discourse analysis, machine translation, grounded language learning, image retrieval …
12. Take a look at the DCNN
[Architecture diagram with callouts: "Need to be optimized during training"; "If we use Max-TDNN"]
13. K-max pooling
• Given k, no matter how many dimensions the input has, pool the top k values as the output; "the order of the output corresponds to their order in the input"
• Better than Max-TDNN because it:
– Preserves the order of features
– Discerns more finely how strongly a feature is activated
• Guarantees that the input length to the fully connected layer is independent of sentence length
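The bullets above can be sketched in a few lines of NumPy. This is an illustrative sketch of k-max pooling as described on this slide, not the authors' code: keep the k largest activations, but emit them in their original order.

```python
import numpy as np

def k_max_pooling(x, k):
    """k-max pooling over a 1-D feature sequence.

    Keeps the k largest activations, in their original input order,
    so the output length is k regardless of the input length.
    """
    x = np.asarray(x, dtype=float)
    if len(x) <= k:
        return x  # nothing to drop
    # indices of the k largest values, then sort to restore input order
    idx = np.sort(np.argpartition(x, -k)[-k:])
    return x[idx]

print(k_max_pooling([0.1, 0.9, 0.3, 0.7, 0.2], 3))  # [0.9 0.3 0.7]
```

Note that the output is [0.9, 0.3, 0.7], not [0.9, 0.7, 0.3]: the selected features stay in sentence order, which is what lets the upper layers remain order-sensitive.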
14. Only the fully connected layer needs a fixed length
• Intermediate layers can be more flexible
• Dynamic k-max pooling!
15. Dynamic k-max pooling
• k is a function of the length of the input sentence and the depth of the network:
k_l = max( k_top, ⌈ (L − l) / L · s ⌉ )
– k_l : the k of the currently concerned layer l
– k_top : the fixed k of the k-max pooling at the top
– L : total number of conv. layers in the network (the depth)
– s : input sentence length
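The formula above is a one-liner; a minimal sketch (the function name and example values are mine, the formula is from the slide):

```python
import math

def dynamic_k(layer, total_layers, sent_len, k_top):
    """k for dynamic k-max pooling at conv layer `layer` (1-indexed).

    k_l = max(k_top, ceil((L - l) / L * s))
    """
    return max(k_top, math.ceil((total_layers - layer) / total_layers * sent_len))

# e.g. a 3-layer network, sentence length 18, k_top = 3:
print(dynamic_k(1, 3, 18, 3))  # 12
print(dynamic_k(2, 3, 18, 3))  # 6
```

So the pooled length shrinks smoothly with depth, and the max(·, k_top) floor guarantees the top layer always hands exactly k_top values per feature to the fully connected layer.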
16. Folding
• Otherwise, feature detectors in different rows are independent of each other until the top fully connected layer
• Folding simply sums every two adjacent rows (a component-wise vector sum), halving the number of rows
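The row-sum step above can be sketched as follows (an illustrative sketch assuming the feature map has shape (d, n) with an even number of rows d):

```python
import numpy as np

def fold(feature_map):
    """Folding: sum every pair of adjacent rows of a feature map.

    Input shape (d, n) -> output shape (d // 2, n); d assumed even.
    """
    d = feature_map.shape[0]
    # rows 0,2,4,... plus rows 1,3,5,... => component-wise pair sums
    return feature_map[0:d:2] + feature_map[1:d:2]

m = np.arange(12).reshape(4, 3)  # 4 rows, 3 columns
print(fold(m))  # rows 0+1 and 2+3 summed -> shape (2, 3)
```

Because the sum mixes each pair of embedding dimensions, information can flow between rows one layer earlier than the fully connected layer, at zero extra parameters.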
18. Properties
• Sensitive to the order of words
• Filters of the first layer model n-grams with n ≤ m
• Invariance to absolute position is captured by the higher convolutional layers
• Induces an internal feature graph over the sentence
23. Think about it
• Can this kind of k-max pooling be applied to image tasks?
24. A CNN for matching natural language sentences
Hu, Baotian, et al. "Convolutional neural network architectures for matching natural language sentences." Advances in Neural Information Processing Systems. 2014.
28. A trick on zero-padding
• Sentence lengths may vary over a fairly broad range
• Introduce a gate operation:
g(z) = 0 when z = 0 (the window is all padding); otherwise, g(z) = 1
• No bias! (so convolution over zero padding stays exactly zero)
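The gate and the no-bias choice work together: with no bias term, a convolution over an all-zero (padded) window is already zero, and the gate forces it to stay zero after the nonlinearity. A minimal sketch (the function name and the weight matrix are mine for illustration):

```python
import numpy as np

def gated_conv(window, weights):
    """One convolution window with the zero-padding gate.

    g(z) = 0 when the input window z is all zeros (pure padding),
    otherwise 1.  No bias term, so padded positions stay exactly zero.
    """
    gate = 1.0 if np.any(window) else 0.0
    return gate * np.maximum(0.0, weights @ window)  # ReLU, no bias

w = np.array([[1.0, -2.0], [0.5, 0.5]])
print(gated_conv(np.zeros(2), w))            # [0. 0.] -- padding stays zero
print(gated_conv(np.array([1.0, 1.0]), w))   # [0. 1.]
```

This way the network can be fed fixed-size zero-padded inputs without the padding leaking spurious activations into the upper layers.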
Each word or phrase is a vector => compose word-meaning vectors into a sentence-meaning vector
Strength: the composition is no longer naïve or rule based; it adjusts with the surrounding context
How many values k-max pooling keeps (the k) is determined by the other layers of the network
The filters in the upper layers can integrate relations between words that are far apart => just like a CNN for images, it works bottom-up, from fine details toward the bigger picture
Or, the feature graph can be thought of as a kind of syntax tree
Back to the earlier cross-lingual question: why can this work across languages? => because the only grammar involved is one the network itself learns