Incorporating Word Embeddings into Open Directory Project
based Large-Scale Classification
Presented by KANG-MIN KIM
kangmin89@korea.ac.kr
Data Intelligence Laboratory, Korea University
5th June, 2018
Kang-Min Kim, Alieva Dinara, Byung-Ju Choi, SangKeun Lee
Korea University
PAKDD 2018
1
Contents
1. Problem Statement
2. Contributions
3. Preliminary
4. Joint Model of Explicit and Implicit Representation
5. Semantic Similarity Measure
6. Experiments
7. Conclusion
2
1. Problem Statement
• How to incorporate word embeddings into explicit
knowledge representation to improve the
performance of large-scale text classification?
[Figure: combining the explicit representation and the implicit representation]
3
2. Contributions
1. We propose a novel methodology for large-scale text classification that utilizes both the explicit and implicit representations.
2. We develop two novel joint models of ODP-based classification and word2vec to generate category vectors that represent the semantics of ODP categories. In addition, we develop a new semantic similarity measure that utilizes both the category and word vectors.
3. We demonstrate the efficacy of the proposed methodology through extensive experiments on real-world datasets. Our approach significantly outperforms the state-of-the-art techniques.
4
* Previous Works
• Text classification using NN [1][2]
▫ Requires a lot of training data per class
▫ Limited to small- or moderate-scale classification
[1] Kim, Y., “Convolutional neural networks for sentence classification”, EMNLP 2014, 1746-1751
[2] Le, Q.V., Mikolov, T., “Distributed representations of sentences and documents”, ICML 2014, 1188-1196
5
* Previous Works
• Text classification using knowledge bases [1][2]
[1] Lee, J.H., Ha, J., Jung, J.Y., Lee, S., “Semantic contextual advertising based on the open directory project”, ACM Trans. on the Web 7(4), 2013, 24:1-24:22
[2] Shin, H., Lee, G., Ryu, W.J., Lee, S., “Utilizing Wikipedia knowledge in open directory project based text classification”, SAC 2017, 309-314
6
* Open Directory Project (ODP)
• Large-scale and tree-structured (maximum of 15 levels) web directory
constructed by humans.
• Contains about 1 million categories and 4 million web pages.
[Figure: ODP taxonomy tree — Top, with children Game and Society; Online Gambling under Game; Crime and Government under Society]
7
3. Preliminary
3.1 ODP-based Knowledge Representation [1]
[Figure: the ODP taxonomy tree with ODP documents attached to its categories; e.g., a document under Society/Government reads “List of government officials and their title, updated weekly…”]
Bag-of-words
word frequency
government 1
officials 1
updated 1
weekly 1
... ...
[1] Lee, J.H., Ha, J., Jung, J.Y., Lee, S., “Semantic contextual advertising based on the open directory project”, ACM Trans. on the Web 7(4), 2013, 24:1-24:22
8
Averaged term vector (TF-IDF scheme):
Society/Government
government 0.47
election 0.29
president 0.10
… …
3.1 ODP-based Knowledge Representation [1]
[Figure: the ODP taxonomy tree, with averaged term vectors computed per category (illustrative weights 0.51 and 0.31)]
9
[1] Lee, J.H., Ha, J., Jung, J.Y., Lee, S., “Semantic contextual advertising based on the open directory project”, ACM Trans. on the Web 7(4), 2013, 24:1-24:22
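To make the construction above concrete, here is a minimal sketch of computing an averaged TF-IDF term vector (a centroid) per ODP category, assuming scikit-learn is available; the toy documents and category paths are illustrative, not taken from the actual ODP data.

```python
# Minimal sketch: centroid (averaged) TF-IDF term vectors per ODP category.
# Toy documents and category paths are for illustration only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

category_docs = {
    "Society/Government": [
        "list of government officials and their title updated weekly",
        "government election results and president announcements",
    ],
    "Game/Gambling": [
        "online casino card games and gambling odds",
    ],
}

all_docs = [d for docs in category_docs.values() for d in docs]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(all_docs)

# Average the TF-IDF vectors of the documents belonging to each category.
centroids, start = {}, 0
for cat, docs in category_docs.items():
    rows = tfidf[start:start + len(docs)]
    centroids[cat] = np.asarray(rows.mean(axis=0)).ravel()
    start += len(docs)

# Highest-weighted terms of a category centroid, as on the slide.
vocab = vectorizer.get_feature_names_out()
top = centroids["Society/Government"].argsort()[::-1][:3]
print([(vocab[i], round(centroids["Society/Government"][i], 2)) for i in top])
```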
Society/Government/President
president 0.44
government 0.31
trump 0.10
… …
Game/Gambling/Casinos
casino 0.53
trump 0.38
card 0.26
… …
3.1 ODP-based Knowledge Representation
• This approach cannot capture the semantic relations between words
(e.g., prez and president).
[Figure: a given document d, “Trump became prez”, has term weights trump 0.67 and prez 0.51; the ODP-based representation cannot match prez in the document against president in the ODP taxonomy]
10
• Word2vec projects words through a shallow layer structure and directly learns word representations from context words.
• Trained word vectors with similar semantic meanings are located in close proximity within the vector space.
3.2 Implicit Representation Model: Word2Vec [1][2]
[Figure: 2-D view of the vector space — prez, president, and election lie close together, while casino and gambling form a separate cluster]
[1] Mikolov, T., Chen, K., Corrado, G., Dean, J., “Efficient estimation of word representations in vector space”, ICLR (Workshop) 2013
[2] Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J., “Distributed representations of words and phrases and their
compositionality”, NIPS 2013, 3111-3119
11
“Additive compositionality”
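As an illustration of these two properties, the following sketch (assuming gensim 4.x; the tiny corpus is made up, purely for demonstration) trains skip-gram vectors, queries nearest neighbors, and composes vectors by element-wise addition:

```python
# Sketch: skip-gram word vectors and additive compositionality with gensim.
# A real setup would train on a large corpus such as the One Billion Word
# benchmark; this toy corpus only demonstrates the API.
from gensim.models import Word2Vec

corpus = [
    ["trump", "became", "president"],
    ["the", "president", "urged", "congress"],
    ["prez", "urged", "congress"],
    ["casino", "card", "gambling", "odds"],
] * 50  # repeat so the toy vocabulary gets enough training updates

model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=20)

# Semantically related words should be close (e.g., prez ~ president).
print(model.wv.most_similar("president", topn=3))

# Additive compositionality: combine word vectors element-wise.
composed = model.wv["president"] + model.wv["congress"]
print(model.wv.similar_by_vector(composed, topn=3))
```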
4. Joint Model of Explicit and Implicit Representation
[Figure: the explicit representation (e.g., Society/Government/President — president 0.44, government 0.31, trump 0.10, …) is joined with the implicit representation]
12
• We propose two joint models of ODP-based text classification and word2vec.
• These joint models generate category vectors, which represent the semantics
of ODP categories.
4.1 Generating Category Vector with Algebraic Operation
[Figure: the category vector of Society/Government/President is built from the explicit representation (term weights: president 0.44, government 0.31, trump 0.10, …) and the implicit representation (word vectors): each word vector is multiplied by its term weight, and the weighted vectors are added element-wise]
13
• The first approach generates the category vector using scalar multiplication and element-wise addition of word vectors, as sketched below.
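A minimal sketch of this algebraic operation, assuming a trained gensim KeyedVectors object such as model.wv from the earlier sketch; the term weights are the illustrative ones from the slide:

```python
# Sketch: category vector = sum over words of (ODP term weight) x (word vector).
# Term weights are illustrative; in the paper they come from the ODP-based
# knowledge representation (the averaged TF-IDF centroid).
import numpy as np

def category_vector(term_weights, wv):
    """Scale each word vector by its term weight and sum element-wise."""
    vec = np.zeros(wv.vector_size, dtype=np.float32)
    for word, weight in term_weights.items():
        if word in wv:  # skip out-of-vocabulary terms
            vec += weight * wv[word]
    return vec

president_terms = {"president": 0.44, "government": 0.31, "trump": 0.10}
# cv = category_vector(president_terms, model.wv)  # model from the earlier sketch
```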
4.2 Generating Category Vector with Embedding
[Figure: while skip-gram trains the target word “Trump” to predict its context words (“US”, “President”, “urged”, “congress”), ODP-based classification identifies candidate categories (e.g., Game/Gambling, Society/Government/President); the category that best fits the context is selected and its category vector is trained jointly with the word vectors]
14
• The second approach extends word2vec to represent category vectors.
• The category vector of an ODP category is expected to represent the
collective semantics of words under this category.
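The paper trains category vectors jointly with skip-gram; one rough way to emulate the idea (an approximation under stated assumptions, not the paper's exact training procedure) is to inject the category selected by the ODP-based classifier as a pseudo-token into each training sentence, so its vector is learned from the same contexts as the words. The classify stub below stands in for the real ODP-based classifier.

```python
# Rough emulation of category embedding: prepend the ODP category chosen for
# each sentence as a pseudo-token, so skip-gram learns a vector for it from
# the same contexts as the words.
from gensim.models import Word2Vec

def with_category_tokens(sentences, classify):
    """Prepend the predicted category label (a pseudo-token) to each sentence."""
    return [[classify(s)] + s for s in sentences]

def classify(sentence):
    # Trivial stand-in for the ODP-based classifier over candidate categories.
    return ("CAT:Society/Government/President"
            if "president" in sentence else "CAT:Game/Gambling")

sentences = [["us", "president", "trump", "urged", "congress"],
             ["casino", "card", "gambling"]] * 50
model = Word2Vec(with_category_tokens(sentences, classify),
                 vector_size=50, window=3, min_count=1, sg=1, epochs=20)

# The category token's vector should land near the words it governs.
print(model.wv.most_similar("CAT:Society/Government/President", topn=3))
```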
5. Semantic Similarity Measure
[Figure: a given document d, “Trump became prez” (term weights: trump 0.67, prez 0.51), is mapped to a document vector; each ODP category, e.g., Society/Government/President (president 0.44, government 0.31, trump 0.10, …) and Game/Gambling/Casinos (casino 0.53, trump 0.38, card 0.26, …), has both a term vector and a category vector]
15
• We develop a novel semantic similarity measure, which captures both the
semantic relations between words and the semantics of ODP categories.
5.1 Using Word-level Semantics
Society/Government/President
president 0.44
government 0.31
trump 0.10
… …
Game/Gambling/Casinos
casino 0.53
trump 0.38
card 0.26
… …
[Figure: for a given document d, “Trump became prez” (trump 0.67, prez 0.51), the embedding similarity between prez and president is 0.7; since 0.7 exceeds the threshold 0.6, the two words are matched and their term weights are combined as 0.51 * 0.44 * 0.7]
16
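A sketch of the word-level matching rule described on this slide, assuming numpy and a gensim-style KeyedVectors object; the threshold 0.6 and the example weights come from the slide, while the function and variable names are ours:

```python
# Sketch: word-level semantic matching between a document term vector and a
# category term vector. Exact word overlaps multiply the two term weights;
# different words that are similar enough in the embedding space
# (cosine > 0.6) additionally contribute weight_d * weight_c * similarity,
# e.g. 0.51 (prez) * 0.44 (president) * 0.7.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def soft_match_score(doc_terms, cat_terms, wv, threshold=0.6):
    score = 0.0
    for dw, dweight in doc_terms.items():
        for cw, cweight in cat_terms.items():
            if dw == cw:
                score += dweight * cweight           # exact word overlap
            elif dw in wv and cw in wv:
                sim = cosine(wv[dw], wv[cw])
                if sim > threshold:                  # semantically related words
                    score += dweight * cweight * sim
    return score

doc = {"trump": 0.67, "prez": 0.51}
president_terms = {"president": 0.44, "government": 0.31, "trump": 0.10}
# print(soft_match_score(doc, president_terms, model.wv))
```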
5.2 Using Category- and Word-level Semantics
[Figure: document d is scored against the candidate categories …/President and …/Casinos]
Society/Government/President
president 0.44
government 0.31
trump 0.10
… …
Game/Gambling/Casinos
casino 0.53
trump 0.38
card 0.26
… …
A given document d,
“Trump became prez”
0.67 trump
0.51 prez
… …
17
[Figure annotation: α denotes the pseudo term weight assigned to the category and document vectors]
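A hedged sketch of the combined measure: following the speaker notes, the category and document vectors act as pseudo words whose weight is α (α = 0.9 performs best in the experiments reported later). This convex combination is our reading of the slide, not necessarily the paper's exact formula; soft_match_score() comes from the previous sketch.

```python
# Sketch: combined similarity. The category and document vectors act as
# pseudo words weighted by alpha, added on top of the word-level matching.
# Assumes soft_match_score() from the previous sketch.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def combined_score(doc_terms, doc_vec, cat_terms, cat_vec, wv, alpha=0.9):
    word_level = soft_match_score(doc_terms, cat_terms, wv)  # word-level semantics
    category_level = cosine(doc_vec, cat_vec)                # category-level semantics
    return alpha * category_level + (1 - alpha) * word_level
```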
6. Experiments
6.1 Dataset
• We train our category embedding model and word2vec model on the
“One Billion Word Language Model Benchmark” dataset.
18
6.2 Experimental Setup
• Baselines
▫ ODP [1]
▫ PV [2]
▫ CNN [3]
• Proposed models
▫ ODP_CV (category-level semantics)
▫ ODP_WV (word-level semantics)
▫ ODP_CV+WV (category- and word-level semantics)
19
[1] Lee, J.H., Ha, J., Jung, J.Y., Lee, S., “Semantic contextual advertising based on the open directory project”, ACM Trans. on the Web 7(4), 2013, 24:1-24:22
[2] Le, Q.V., Mikolov, T., “Distributed representations of sentences and documents”, ICML 2014, 1188-1196
[3] Kim, Y., “Convolutional neural networks for sentence classification”, EMNLP 2014, 1746-1751
6.3 Experimental Results
20
• Comparison of Category Vector Generations
6.3 Experimental Results
• We found that the curve reaches a peak at α = 0.9. This result shows that the category vector plays a major role in the performance of ODP_CV+WV.
• We observed that when the weight of the category vector is 1.0, the performance drops sharply. This means that the word overlap feature is still helpful.
21
• Classification Performance based on Different α Values
6.3 Experimental Results
• ODP_CV+WV outperforms ODP by over 9%, 12%, and 10% on average in terms of precision, recall, and F1-score, respectively.
• We observe that CNN exhibits better performance than ODP in moderate-scale text classification.
22
• Classification Performance on the ODP Dataset
[1] Lee, J.H., Ha, J., Jung, J.Y., Lee, S., “Semantic contextual advertising based on the open directory project”, ACM Trans. on the Web 7(4), 2013, 24:1-24:22
[2] Le, Q.V., Mikolov, T., “Distributed representations of sentences and documents”, ICML 2014, 1188-1196
[3] Kim, Y., “Convolutional neural networks for sentence classification”, EMNLP 2014, 1746-1751
6.3 Experimental Results
23
• Classification Performance on the NYT Dataset
(2,735 categories)
• ODP_CV+WV outperforms all the other methods.
• We also observe that both ODP_CV and ODP_WV outperform ODP. These results clearly demonstrate that both category and word vectors are effective for text classification.
6.4 Analysis
24
• Nearest Words of Category Vector and Highly Weighted
Words in Centroid Vector of ODP Categories
• We also observe that the category vector identifies semantically related words that do not appear in the ODP knowledge base.
7. Conclusion
1. We have proposed novel joint models of the explicit and implicit representation techniques to handle large-scale text classification.
2. We have incorporated the well-known word2vec model into the ODP-based classification framework.
3. We have verified the large-scale classification performance of the proposed methodology using real-world datasets.
25
Q & A
26
* Explicit Knowledge Representation
27
* Implicit Knowledge Representation: Embedding
28

Editor's Notes

  • #2 Hello, this is Kang-Min Kim. I'll talk about “Incorporating Word Embeddings into Open Directory Project based Large-Scale Classification”.
  • #4 I will start with the problem statement. In our paper, the research question is “How to incorporate word embeddings into explicit knowledge representation to improve the performance of large-scale text classification?” For large-scale classification, many studies have been done with an explicit representation model, which uses a popular knowledge base such as Probase, Wikipedia, or the Open Directory Project. Explicit representation models are effective in text classification. However, the performance of these models is limited by the associated knowledge bases. To alleviate this limitation, we incorporate implicit representation, such as word embeddings, into the explicit representation.
  • #5 So, our technical contributions are as follows. We propose a novel methodology for large-scale text classification that utilizes both the explicit and implicit representations. We develop two novel joint models of ODP-based classification and word2vec to generate category vectors that represent the semantics of ODP categories. In addition, we develop a new semantic similarity measure that utilizes both the category and word vectors. Finally, we demonstrate the efficacy of the proposed methodology through extensive experiments on real-world datasets. Our approach significantly outperforms the state-of-the-art techniques.
  • #6 Recently, implicit representation models, such as embeddings or deep learning, have been successfully adopted for the text classification task due to their outstanding performance. However, neural network-based classification usually requires a lot of training data per class. In large-scale text classification, the training data for each class is relatively insufficient and skewed. Thus, such implicit representation models are limited to small- or moderate-scale text classification.
  • #7 To handle large-scale text classification, our previous works have utilized popular knowledge bases, such as the Open Directory Project.
  • #8 The Open Directory Project is a large-scale, taxonomy-structured web directory constructed by volunteer editors.
  • #9 We have used an ODP-based knowledge representation to represent ODP knowledge: all ODP documents in each category are represented as bags-of-words.
  • #10 We then calculate the averaged term vector with the TF-IDF scheme. However, due to the large-scale taxonomy structure of the ODP, each ODP category contains a different number of documents, sometimes resulting in sparsity or unavailability of training documents. Our previous work therefore proposed a machine learning technique that utilizes the hierarchical structure: it merges the centroids of the descendant categories and exhibits robust performance in large-scale text classification.
  • #11 However, this approach cannot capture the semantic relations between words. For example, given a document “Trump became prez”, the document and each ODP category are represented by the ODP-based knowledge representation, and the similarity is computed by cosine similarity. As you can see, the term weight of trump in the given document is matched against the term weight of trump in the President and Casinos categories. However, the existing ODP-based classification cannot relate the term weights of prez and president. Therefore, this classifier may classify the document into the Game/Gambling/Casinos category.
  • #12 To complement the ODP-based text classification, we adopt word2vec, a popular word embedding technique. Word2vec directly learns the representation of words from context words. Trained word vectors with similar semantic meanings are located in close proximity. In addition, word vectors can be composed by element-wise addition; this property is called “additive compositionality”.
  • #13 In order to utilize this property of word2vec, we propose two joint models of ODP-based text classification and word2vec. These joint models generate category vectors, which represent the semantics of ODP categories.
  • #14 The first approach generates the category vector by using algebraic operations. For example, the ODP-based knowledge representation contains words, such as president, government, and trump, along with the term weight of each word. We multiply the term weight of each word by its word vector obtained from the pre-trained word2vec model. Then, the weighted word vectors are composed into a category vector by element-wise addition. Finally, we obtain the category vector of Society/Government/President.
  • #15 Our second approach extends word2vec to embed both category and word vectors. For example, when Trump is the target word and “US President … urged congress” are the context words, trump is trained by predicting the context words; at the same time, ODP-based classification identifies candidate categories such as Game/Gambling and Society/Government/President. We then select the most appropriate ODP category for the current context. When the context is “US President urged congress”, the most appropriate category may be Society/Government/President. Finally, we train the category vector of Society/Government/President. The President category is expected to represent the collective semantics of words under this category, such as Trump, president, White House, and so on.
  • #16 We can classify the given document into the most relevant category by calculating the cosine similarity between the category and document vectors. However, word overlap is still an important factor for text classification. So, we develop a novel semantic similarity measure, which captures both the semantic relations between words and the semantics of ODP categories.
  • #17 First, we propose a semantic similarity measure that considers word-level semantics. In our similarity measure, if two words have similar meanings, we combine their term weights by using the word vectors. We first calculate the similarity between all word pairs. In this example, the similarity between prez and president is 0.7, which is larger than the threshold 0.6, so we consider them to have similar meanings. Then, we multiply the term weight of prez by the term weight of president and by 0.7.
  • #18 Second, we develop a robust similarity measure by utilizing both the category and word vectors. A category vector is utilized as a pseudo word in the process of computing semantic similarity. Therefore, the similarity measure captures the semantic relations between the category and the document and between words and the category, as well as the semantic relations between words. We examined how to determine the pseudo term weights of the category and document through extensive parameter experiments.
  • #19 Now, I will proceed with the experiment part. First, I will describe the datasets. To obtain a well-organized ODP taxonomy, we apply heuristic rules and build our own taxonomy with 2,735 categories. To construct the moderate-scale classification dataset, we use only the 13 top-level categories. In addition to the ODP test datasets, we use 120 New York Times articles.
  • #20 The baselines include the existing ODP-based text classification, the paragraph vector (PV) text classifier, and the convolutional neural network (CNN) text classifier. There are three proposed models: ODP_CV, ODP_WV, and ODP_CV+WV.
  • #21 We first compare the two methods for generating category vectors. This table shows the results; unexpectedly, we observed that the simple algebraic-operation variant clearly outperforms the relatively elaborate embedding variant.
  • #22 Next, we performed a parameter-setting experiment to determine the pseudo term weight. This figure shows the classification performance for different α values. We found that the curve reaches a peak at α = 0.9. This result shows that the category vector plays a major role in the performance of ODP_CV+WV.
  • #23 The table on the left summarizes the experimental results for text classification on the ODP test dataset with 2,735 categories. We observed that ODP_CV+WV outperforms all the other proposed methods, as well as the baselines. Our experimental results show that CNN performs the worst among the six methods, so we also compare the performance of CNN with the ODP baseline at moderate scale. From the table on the right, we observed that CNN exhibits better performance than ODP in moderate-scale text classification. However, from both tables, we confirm that CNN is indeed limited to moderate-scale text classification.
  • #24 This table shows the results for classification performance on the New York Times dataset. Again, ODP_CV+WV outperforms all the other methods.
  • #25 This slide shows the qualitative analysis of the nearest words of ODP category vectors. We observed that the category vector captures the semantics better than the centroid vector. Interestingly, we also observe that the category vector identifies semantically related words that do not appear in the ODP knowledge base, such as Henin, a former Belgian professional tennis player.
  • #26 We conclude as follows.
  • #28 Recently, text classification techniques based on neural networks have shown good performance and gained popularity. However, such neural-network-based classification techniques require a lot of training data per class. Therefore, these models are limited to small- or moderate-scale classification results. Moreover, they are difficult for humans to interpret (implicit).
  • #29 Recently, text classification techniques based on neural networks have shown good performance and gained popularity. However, such neural-network-based classification techniques require a lot of training data per class. Therefore, these models are limited to small- or moderate-scale classification results. Moreover, they are difficult for humans to interpret (implicit).