Zeng, X., Yang, C., Tu, C., Liu, Z., & Sun, M. (2018). Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention. AAAI 2018.
LIWC
• The Linguistic Inquiry and Word Count
• Widely used for computerized text analysis in social science
• Consists of 92 predefined word categories (e.g., Work, Leisure, Money, Time, …)
• For each category, LIWC reports the proportion of words in the given text that fall under that category:
  $\text{proportion}(c) = n_c / N$, where $n_c$ is the number of words of the text belonging to category $c$ and $N$ is the total word count
• The word lists were first constructed by Pennebaker et al. in the early 1990s
• LIWC-based applications:
  - Depression detection (Ramirez-Esparza et al. 2008)
  - Social status prediction (Kacewicz and Pennebaker 2013)
  - Personality prediction (Golbeck 2011)
• LIWC in other languages:
  - Traditional and Simplified Chinese (Huang et al. 2012)
  - Italian (Alparone 2004)
  - Dutch (van Wissen 2017)
• Although a Chinese LIWC has been developed:
  - The included lexicon is insufficient
  - It does not cover emerging words and netspeak
• Expanding the Chinese LIWC manually is impractical:
  - Annotating new words by hand is too time-consuming
  - Language expertise is needed
• This paper proposes to expand the LIWC lexicon automatically
• Problems in automated lexicon expansion:
  - Polysemy: multiple meanings carried by a single word
  - Indistinctness: categories are too fine-grained, making them hard to distinguish from one another
• LIWC categories form a tree-structured hierarchy
  - Hierarchical SVMs do not account for the polysemy and indistinctness properties
  - This paper proposes to use HowNet together with an attention mechanism for expansion learning
• Hierarchical multilabel classification
• LIWC lexicon expansion is a hierarchical multilabel classification problem with non-mandatory leaf-node prediction
• < ", $%&, %' >
- %' (Partial Depth Labeling) indicates that some
instances have a partial depth of labeling
• Most previous works on lexicon expansion are based on feature engineering techniques
  - They require substantial human knowledge for feature design
  - They are non-trivial to adopt
• Hierarchical classification approaches:
  - Flat Classifier (Barbedo and Lopes 2007)
  - Local Classifier Per Node (Fagni and Sebastiani 2007)
  - Local Classifier Per Parent Node (Silla and Freitas 2009)
  - Local Classifier Per Level (Clare and King 2003)
  - Global Classifier (Kiritchenko et al. 2006)
• Deep-learning-based hierarchical classification:
  - Use the predictions of one neural network as inputs for the next network, associated with the next hierarchical level (Cerri, Barros, and De Carvalho 2014)
  - Use an RNN encoder-decoder for entity mention classification, generating paths in the hierarchy from the top node to leaf nodes (Karn, Waltinger, and Schütze 2017)
• Model the hierarchical classification problem as sequence-to-sequence decoding
  - Input: word embedding of a given word $w$
  - Output: hierarchical label sequence $c$ of $w$
• Notation:
  - $\mathcal{C}$ denotes the label set
  - $\pi: \mathcal{C} \to \mathcal{C}$ denotes the parent relationship, where $\pi(c)$ is the parent node of $c \in \mathcal{C}$
  - Given a word $w$, its labels form a tree structure. Each path from the root node to a leaf node can be transformed into a sequence $c = (c_1, c_2, \dots, c_L)$, where $\pi(c_i) = c_{i-1}, \forall i \in [2, L]$, and $L$ is the number of levels in the hierarchy
• The Hierarchical Decoder (HD) predicts a label $c_i$ by taking the parent label sequence $(c_1, c_2, \dots, c_{i-1})$ into consideration
!" = %('( ) ℎ"+,, ." + 0()
2" = %('3 ) ℎ"+,, ." + 03)
4̃" = tanh(': ) ℎ"+,, ." + 0:)
4" = !" ∘ 4"+, + 2" ∘ 4̃"
<" = %('= ) ℎ"+,, >" + 0=)
ℎ" = <" ∘ tanh(4")
• Denotations:
- ∘ is element-wise multiplication
- % is the sigmoid function
- 0∗ are the biases
- '∗ are weights
- <", 2", and !" are output gate, input gate and forget
gate respectively
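A minimal NumPy sketch of one decoder step implementing these gate equations; the dimensions and random initialization below are illustrative, not the paper's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t, W, b):
    """One LSTM decoder step following the gate equations above.
    Each W[k] maps the concatenated [h_{t-1}; x_t] to one gate; b[k] are the biases."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t

d_h, d_x = 4, 3  # illustrative dimensions
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(d_h, d_h + d_x)) for k in "fico"}
b = {k: np.zeros(d_h) for k in "fico"}
h, c = lstm_step(np.zeros(d_h), np.zeros(d_h), rng.normal(size=d_x), W, b)
```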
• Take advantage of word embeddings:
  - Initial state $h_0 = \mathbf{w}$, where $\mathbf{w}$ is the word embedding of word $w$
  - $V \in \mathbb{R}^{|V| \times d_w}$ denotes the word embedding matrix, where $d_w$ is the embedding dimension
  - $L \in \mathbb{R}^{|\mathcal{C}| \times d_c}$ denotes the label embedding matrix, where $d_c$ is the embedding dimension
  - Word embeddings are pre-trained and fixed during training
• The hierarchical decoder alone is insufficient:
  - Each word has only one representation
  - It cannot handle polysemy and indistinctness
  - Propose to incorporate sememe information
• Sememe attention:
  - Different sememes represent different meanings of a word
  - The same sememe can appear under different categories
  - Incorporate sememe weights when predicting labels
• Consider the tree and HowNet structure:
  - When the decoder chooses among the sub-classes of PersonalConcerns, the sememe "location" should have a relatively lower weight
• Propose to utilize the attention mechanism (Bahdanau, Cho, and Bengio 2014)
  - Apply word embeddings as the initial state
  - The conditional probability is defined as $p(c_i \mid c_1, \dots, c_{i-1}, w) = g(c_{i-1}, s_i, m_i)$, where $m_i$ is the context vector, a weighted sum of the sememe embeddings: $m_i = \sum_{j} \alpha_{ij} h_j$
  - $S \in \mathbb{R}^{|S| \times d_s}$ denotes the sememe embedding matrix, where $d_s$ is the embedding dimension
  - $\{h_1, \dots, h_N\}$ is the set of sememe embeddings from $S$ for the sememes of word $w$
• The weight $\alpha_{ij}$ of each sememe embedding $h_j$ is defined as
  $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{N} \exp(e_{ik})}$, with $e_{ij} = v^\top \tanh(W_1 s_{i-1} + W_2 h_j)$
  where
  - $e_{ij}$ is the score indicating how well the sememe embedding $h_j$ is associated with predicting the current label $c_i$
  - $v \in \mathbb{R}^{A}$, $W_1 \in \mathbb{R}^{A \times d_h}$, and $W_2 \in \mathbb{R}^{A \times d_s}$ are weight matrices, with $d_h$ the decoder state dimension
  - $A$ is the number of hidden units in the attention model
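A minimal NumPy sketch of this Bahdanau-style attention over a word's sememe embeddings; shapes and random parameters are illustrative only:

```python
import numpy as np

def sememe_attention(s_prev, H, v, W1, W2):
    """Compute attention weights alpha_ij over sememe embeddings and the context vector.
    s_prev: decoder state s_{i-1}; H: (N, d_s) matrix whose rows are the h_j."""
    scores = np.array([v @ np.tanh(W1 @ s_prev + W2 @ h_j) for h_j in H])  # e_ij
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()           # softmax over the word's sememes
    m_i = alpha @ H                # context vector: weighted sum of sememe embeddings
    return alpha, m_i

A, d_h, d_s, N = 8, 6, 6, 4        # illustrative sizes (A = attention hidden units)
rng = np.random.default_rng(1)
alpha, m = sememe_attention(rng.normal(size=d_h), rng.normal(size=(N, d_s)),
                            rng.normal(size=A), rng.normal(size=(A, d_h)),
                            rng.normal(size=(A, d_s)))
```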
• At each time step, HDSA:
  - Chooses which sememes to pay attention to when predicting the current word label
  - Different sememes can have different weights
  - The same sememe can have different weights under different categories
• For optimization, the objective function is the cross-entropy loss
  $\mathcal{L} = -\sum_{j=1}^{M} \sum_{c} \big( y_{cj} \log y'_{cj} + (1 - y_{cj}) \log (1 - y'_{cj}) \big)$
  where
  - $y_{cj} \in \{0, 1\}$ represents whether word $w_j$ belongs to label $c$
  - $y'_{cj}$ represents the predicted probability of word $w_j$ belonging to label $c$
  - $M$ is the number of words
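Assuming the standard binary cross-entropy reading of this objective, a short NumPy sketch:

```python
import numpy as np

def lexicon_loss(Y, Y_pred, eps=1e-9):
    """Binary cross-entropy summed over labels c (rows) and words j (columns),
    matching the objective above: Y[c, j] in {0, 1}, Y_pred[c, j] in (0, 1)."""
    Y_pred = np.clip(Y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.sum(Y * np.log(Y_pred) + (1 - Y) * np.log(1 - Y_pred))

Y = np.array([[1, 0], [0, 1]])
Y_pred = np.array([[0.9, 0.2], [0.1, 0.7]])
print(lexicon_loss(Y, Y_pred))  # small loss for mostly-correct predictions
```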
• In the implementation:
  - The Adam algorithm is used to automatically adapt the learning rate of each parameter
  - Beam search: a word is assigned to a label sequence $c$ only when $p(c \mid w) > \tau$, where $\tau$ is a threshold
  - Recurrent dropout and layer normalization are employed to prevent overfitting
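A toy sketch of the thresholding step; the label paths and probabilities below are illustrative, not from the paper:

```python
def accept(sequences, tau):
    """Keep only beam-search label sequences whose probability exceeds the threshold tau.
    `sequences` is a list of (label_path, probability) pairs from the beam."""
    return [path for path, prob in sequences if prob > tau]

beams = [(("Affect", "PosEmo"), 0.62), (("Affect", "NegEmo"), 0.03)]
print(accept(beams, tau=0.3))  # only the first path survives
```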
• Use the Chinese LIWC by Huang et al. 2012
  - Hierarchy depth is 3
  - Lexicon statistics (table not reproduced)
• Employ the sememe annotation in HowNet
  - 1,617 distinct sememes are used
• Use the Sogou-T corpus to learn word embeddings and sememe embeddings
  - Sogou-T contains over 130 million web pages provided by a Chinese commercial search engine
• The following hierarchical algorithms are chosen for comparison:
  - Top-down k-NN (TD k-NN): top-down decision making using k-NN at each parent label
  - Top-down SVM (TD SVM): top-down decision making using SVM at each parent label
  - Structural SVM: a margin-rescaled structural SVM using the 1-slack formulation and cutting-plane method
  - Condensing Sort and Select Algorithm (CSSA): a hierarchical classification algorithm usable on both tree- and DAG-structured hierarchies
  - Hierarchical Decoder (HD): the hierarchical decoder without sememe attention
• The word embedding matrix $V$ is pre-trained using word2vec with the following settings:
  - Directly use the embeddings from $V$ as sememe embeddings
  - Dimensions $d_w = d_s = 300$
  - Window size $K = 5$
  - Negative sampling $N_s = 5$
  - Only consider words with frequency > 50
• Randomly initialize the label embedding matrix $L$, and use backpropagation to update its values during training
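A hedged gensim sketch of this pre-training setup; the skip-gram choice, the toy corpus, and `min_count=1` (so the tiny example runs) are assumptions, not from the slides:

```python
from gensim.models import Word2Vec

corpus = [["今天", "天气", "很好"], ["他", "在", "公司", "工作"]]  # toy sentences; the paper uses Sogou-T
model = Word2Vec(
    sentences=corpus,
    vector_size=300,  # d_w = d_s = 300
    window=5,         # window size K = 5
    negative=5,       # negative sampling N_s = 5
    min_count=1,      # the slides keep words with frequency > 50; 1 here so the toy corpus works
    sg=1,             # skip-gram (assumption; the slides do not state the variant)
)
V = model.wv          # pre-trained word vectors, kept fixed during decoder training
```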
• Transform tree structure to label sequence
- If a word has more than one path in the tree, it will be
transformed into multiple label sequences
"
#$%
#$&
#$' #$(
" #$% #$& #$'
" #$% #$& #$(
)
)
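A short Python sketch that enumerates the label sequences for the example tree above; the dict-based tree encoding is hypothetical:

```python
def label_sequences(tree, root):
    """Enumerate every root-to-leaf path as one label sequence.
    `tree` maps each label to the list of its children."""
    children = tree.get(root, [])
    if not children:
        return [[root]]
    return [[root] + path for child in children for path in label_sequences(tree, child)]

# The tree from the example: c3 and c4 share the parent chain root -> c1 -> c2
tree = {"root": ["c1"], "c1": ["c2"], "c2": ["c3", "c4"]}
print(label_sequences(tree, "root"))
# [['root', 'c1', 'c2', 'c3'], ['root', 'c1', 'c2', 'c4']]
```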
• Micro-$F_1$
• Macro-$F_1$
  - Weighted Macro-F1 (W-M-F1) is used because categories in LIWC are highly imbalanced, with some categories containing more than 1,000 instances while others have fewer than 40
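In scikit-learn terms, these metrics correspond to the `micro`, `macro`, and `weighted` averaging modes of `f1_score` (assuming W-M-F1 is the support-weighted macro average); a toy single-label example:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2]   # imbalanced toy labels
y_pred = [0, 0, 0, 1, 1, 1, 0]
print(f1_score(y_true, y_pred, average="micro"))     # Micro-F1
print(f1_score(y_true, y_pred, average="macro"))     # Macro-F1
print(f1_score(y_true, y_pred, average="weighted"))  # support-weighted Macro-F1 (W-M-F1)
```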
• Both HD and HDSA outperform all other baselines
• HDSA outperforms HD by 2%
• The performance gap between HDSA and the top-down approaches grows larger as the level goes deeper
• The threshold $\tau$ has a direct impact on both precision and recall
  - A higher $\tau$ means a stricter standard, resulting in higher precision and lower recall
• Micro-F1 and W-M-F1 do not change much as $\tau$ increases
• Contributions:
  - Utilize a sequence-to-sequence model for hierarchical classification in LIWC lexicon expansion
  - Propose to use sememes and an attention mechanism to select the senses of words
  - Show that the model is capable of understanding the meaning of words with the help of sememe attention
• Future directions
- Consider the complicated structure and relations of
sememes and words in HowNet
  - Explore the possibility of updating word embeddings during training instead of keeping the pre-trained embeddings fixed