SlideShare a Scribd company logo
1 of 6
Download to read offline
LATENT SEMANTIC WORD SENSE
DISAMBIGUATION USING GLOBAL
CO-OCCURRENCE INFORMATION
Minoru Sasaki
Department of Computer and Information Sciences, Faculty of Engineering,
Ibaraki University, 4-12-1, Nakanarusawa, Hitachi, Ibaraki, Japan
msasaki@mx.ibaraki.ac.jp

ABSTRACT
In this paper, I propose a novel word sense disambiguation method based on the global cooccurrence information using NMF. When I calculate the dependency relation matrix, the
existing method tends to produce very sparse co-occurrence matrix from a small training set.
Therefore, the NMF algorithm sometimes does not converge to desired solutions. To obtain a
large number of co-occurrence relations, I propose to use co-occurrence frequencies of
dependency relations between word features in the whole training set. This enables us to solve
data sparseness problem and induce more effective latent features. To evaluate the efficiency of
the method of word sense disambiguation, I make some experiments to compare with the result
of the two baseline methods. The results of the experiments show this method is effective for
word sense disambiguation in comparison with the all baseline methods. Moreover, the
proposed method is effective for obtaining a stable effect by analyzing the global co-occurrence
information.

KEYWORDS
Word Sense Disambiguation, Global Co-occurrence information, Dependency Relations, NonNegative Matrix Factorization

1. INTRODUCTION
In natural language processing, acquisition of sense examples from an example set that contain a
given target word enables to construct an extensive data set of tagged examples to demonstrate a
wide range of semantic analysis. For example, using the obtained data set, we can construct a
classifier that identifies its word sense by analyzing co-occurrence statistics of a target word.
Also, we can make a wide-coverage case frame dictionary automatically and construct thesaurus
for each meaning of a polysemous word. To construct large-sized training data, language
dictionary and thesaurus, it is increasingly important to further improve to select the most
appropriate meaning of the ambiguous word.
If we have training data, word sense disambiguation (WSD) task reduces to a classification
problem based on supervised learning. This approach is generally applicable to construct a
classifier from a set of manually sense-tagged training data. Then, this classifier is used to
identify the appropriate sense for new examples. A typical method for this approach is the
David C. Wyld et al. (Eds) : CCSIT, SIPP, AISC, PDCTA, NLP - 2014
pp. 463–468, 2014. © CS & IT-CSCP 2014

DOI : 10.5121/csit.2014.4240
464

Computer Science & Information Technology (CS & IT)

classical bag-of-words (BOW) approach [5], where each document is represented as a feature
vector counting the number of occurrences of different words as features. By using such features,
it becomes easy to adapt many existing supervised learning methods such as Support Vector
Machine (SVM) [1] for the WSD task. However, when the general vector space model, in which
a document is represented as a vector using term frequency based weighting methods, is applied
to the WSD, the local context words are typically used as features and the global context
information without dictionary information is not employed in the previous research.
In this paper, I propose a novel word sense disambiguation method based on the global cooccurrence information using NMF. In previous research, [2] proposes a novel WSD method of
particular word instances using the automatically extracted sense information. When we calculate
the dependency relation matrix, the existing method tends to produce very sparse co-occurrence
matrix from a small training set. Therefore, the NMF algorithm sometimes does not converge to
desired solutions. To obtain a large number of co-occurrence relations, I propose to use cooccurrence frequencies of dependency relations between word features in the whole training set.
This enables us to solve data sparseness problem and induce more effective latent features.

2. NON-NEGATIVE MATRIX FACTORIZATION
Non-Negative Matrix Factorization (NMF) is a popular decomposition method for multivariate
data [4]. NMF decomposes the m n non-negative matrix X to the m k matrix W and the k n
matrix H, while these matrixes have no negative elements. Usually, k is chosen to be smaller
value than n and m.

×

×

×

(1)

X ≈ WH

Using the NMF for a term-document matrix X, the matrix H represents the clustering result with k
topics.
For quantifying the quality of this approximation, cost functions based on Kullback-Leibler
divergence is used and minimized using iterative update rules as follows:

Wij ← Wij

( XH )ij

(2)

(WHH )
T

ij

(X H )

(3)

T

H ij ← H ij

ij

(HW W )
T

,

ij

where Wij and Hij are the i-th row and the j-th column element respectively. These matrices W and
H are initialized randomly with non-negative data and these above update rules are iteratively
applied until the max iteration number (or convergence) is reached.

3. WSD USING GLOBAL CO-OCCURRENCE INFORMATION
3.1. Latent Semantic WSD Using Local Co-occurrence Information
In previous research, [3] proposes a novel WSD method of particular word instances using the
automatically extracted sense information. This method induces latent features for three matrices.
The first matrix A contains co-occurrence frequencies of words that the target word co-occurs
Computer Science & Information Technology (CS & IT)

465

with. The second matrix B contains term frequencies of words that appear in the context window.
The third matrix C contains co-occurrence frequencies of words that the co-occurring context
words of the target word co-occur with. Then, NMF is applied to the three matrices to factorize
each matrix into two non-negative matrices, while the former results are used to initialize the next
factorization, as shown in Figure 1.

Figure 1. Interleaved NMF algorithm for Latent Semantic WSD

Given a non-negative matrices A, B and C in the beginning of this method, matrices W, H, G and
F are initialized randomly with non-negative values. Then, it decomposes the matrix A into the
matrices W and H using NMF. In the decomposition of the matrix B, the updated matrix W is
copied to matrix V and the updated matrices V and G is computed using NMF. In the
decomposition of the matrix C, the transpose of the updated matrix G is copied to matrix U and
the updated matrices U and F are obtained using NMF. At the last step of the iteration, the matrix
F is copied to matrix H. This iteration is repeated until the maximum number of iterations is
reached or the objective function of all NMF decomposition no longer improves.
In order to perform this method for WSD, it needs to fold each sense of the target word into
semantic space using the matrix H. For each sense in training data, centroid vector c is calculated
and this centroid is mapped into the semantic space using the matrix H as follows:

b = cH T

(4)

For input example of the target word, its context words are extracted to construct a vector f and
the vector f is also mapped into the same semantic space using the matrix G as follows:

d = fG T

(5)

Then, cosine similarity between the vector d and each of the sense vectors b are calculated and
the sense that is the largest cosine similarity is selected.
466

Computer Science & Information Technology (CS & IT)

3.2. Latent Semantic WSD Using Global Co-occurrence Information
This Latent Semantic WSD method is efficient for finding a reduced semantic space. However,
problem arises when we apply this method. When we calculate the third matrix C, this method
tends to produce very sparse co-occurrence matrix from a small training set. To obtain a large
number of co-occurrence relations, I propose to use co-occurrence frequencies of dependency
relations between word features in the whole training set. This enables us to solve data sparseness
problem and induce more effective latent features. Like the above method, the proposed method
needs three matrices, but the third matrix is different from the previous method. The third matrix
D contains co-occurrence frequency of context words that co-occur in dependency relations to
context words in a large document set. The proposed method induces latent features for these
three matrices A, B and D.

4. EXPERIMENT
To evaluate the efficiency of the proposed WSD method using the global Co-occurrence
information, I conduct some experiments to compare with the result of the existing methods. In
this section, I describe an outline of the experiments.

4.1. Data
I used the Semeval-2010 Japanese WSD task data set, which includes 50 target words comprising
22 nouns, 23 verbs, and 5 adjectives [2]. In this data set, there are 50 training and 50 test instances
for each target word. Moreover, to obtain a large number of co-occurrence relations, I use 22,832
documents chosen from the Japanese BCCWJ corpus1.

4.2. Evaluation Method
4.2.1. Baseline System 1
As the first baseline method, I only use the first matrix A described in section 3.1. To construct
the matrix A, I represent each sentence with the target word in the training set as a highdimensional vector where each component represents the co-occurrence frequency of the target
word in the sentence. Then, NMF is applied to the matrix A to factorize each matrix into two nonnegative matrices W and H. Each vector is tagged with the sense of the target word in that
sentence. So centroid ci of the co-occurrence vectors that are assigned the same sense i is
calculated and each centroid ci is mapped into the semantic space using the matrix H as follows:

bi = ci H T

(6)

For input example of the target word, its context words are extracted to construct a vector f and
the vector f is also mapped into the same semantic space using the matrix H as follows:

d = fH T

(7)

Then, cosine similarity between the vector d and each of the sense vectors b are calculated and
the sense that is the largest cosine similarity is selected.

1

http://www.ninjal.ac.jp/english/products/bccwj/
Computer Science & Information Technology (CS & IT)

467

4.2.2. Baseline System 2
In the second baseline system, I use the latent semantic WSD using local co-occurrence
information described in Section 3.1. I construct the three matrices A, B and C to induce latent
semantic dimensions using NMF.

Table 1. Experimental Results of Each Execution
(highest precision rates among all the run are written in bold font)
System
Baseline System 1
Baseline System 2
Proposed Method

Run 1
53.28%
59.68%
60.44%

Run 2
53.88%
61.08%
61.48%

Run 3
51.80%
61.08%
61.04%

62.00%
61.00%
60.00%
59.00%
58.00%
57.00%
56.00%
55.00%
54.00%
53.00%
52.00%
Baseline 1

Baseline 2

proposed method

Figure 2. Highest Precision of Each system

5. EXPERIMENTAL RESULTS
Figure 2 shows the experimental results of the baseline methods and the proposed method.
Computational experience reported shows that the choice of initial point is quite important to the
NMF's goal. In practice, the algorithms are run several times with different initial points and the
NMF is chosen as the feasible solution. In our experiments, each method is executed three times
and average precision of all execution is calculated.
In this Figure 2, the proposed method shows higher precision than the other baseline methods so
that this approach is effective for word sense disambiguation. In comparison with the baseline
system 1, the proposed method can obtain better precision so that it is effective for WSD to use
context information and co-occurrence information. In comparison with the baseline system 2, the
proposed method provides slightly better precision than the baseline system 2. As shown in Table
1, the proposed method can obtain the highest precision and can be stable at high precision value.
However, the baseline system 2 cannot achieve stable precision because of the lack of the number
of co-occurrence information. Therefore, the proposed method is effective for obtaining a stable
effect by analyzing the global co-occurrence information.
468

Computer Science & Information Technology (CS & IT)

ACKNOWLEDGEMENT
In this paper, I propose a novel word sense disambiguation method based on the global cooccurrence information using NMF. To evaluate the efficiency of the method of word sense
disambiguation, I conduct some experiments to compare with the result of the two baseline
methods. The results of the experiments show this method is effective for word sense
disambiguation in comparison with the all baseline methods. Moreover, the proposed method is
effective for obtaining a stable effect by analyzing the global co-occurrence information.
Further work would be required to consider a larger sized training data to obtain a large amount
of co-occurrence information.

REFERENCES
[1]
[2]

[3]

[4]

[5]

Corinna Cortes & Vladimir Vapnik, (1995) “Support-Vector Networks”, Machine Learning, Vol. 20,
No. 3, pp273-297.
Okumura, Manabu & Shirai, Kiyoaki & Komiya, Kanako & Yokono, Hikaru, (2010) “SemEval-2010
task: Japanese WSD”, Proceedings of the 5th International Workshop on Semantic Evaluation, pp6974.
Van de Cruys, Tim & Apidianaki, Marianna, (2011) “Latent Semantic Word Sense Induction and
Disambiguation”, Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies-Volume 1, pp1476-1485.
Lee, Daniel D. & Seung, Sebastian, (2001) “Algorithms for Non-negative Matrix Factorization”,
Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference, MIT
Press, pp556–562.
Witten, Ian H. & Moffat, Alistair & Bell, Timothy C., (1999) “Managing Gigabytes (second edition):
Compressing and Indexing Documents and Images”, Morgan Kaufmann Publishers Inc.

AUTHORS
Minoru Sasaki: received his B.E., M.E. and D.Eng. degrees in information science and
intelligent systems from the University of Tokushima in 1996, 1998 and 2000. He is now
a lecturer in the department of computer and information sciences at Ibaraki University.
His research interests include natural language processing and information, information
retrieval and text min ing.

More Related Content

What's hot

Big Data Processing using a AWS Dataset
Big Data Processing using a AWS DatasetBig Data Processing using a AWS Dataset
Big Data Processing using a AWS DatasetVishva Abeyrathne
 
A hybrid naïve Bayes based on similarity measure to optimize the mixed-data c...
A hybrid naïve Bayes based on similarity measure to optimize the mixed-data c...A hybrid naïve Bayes based on similarity measure to optimize the mixed-data c...
A hybrid naïve Bayes based on similarity measure to optimize the mixed-data c...TELKOMNIKA JOURNAL
 
Interactive Fuzzy Goal Programming approach for Tri-Level Linear Programming ...
Interactive Fuzzy Goal Programming approach for Tri-Level Linear Programming ...Interactive Fuzzy Goal Programming approach for Tri-Level Linear Programming ...
Interactive Fuzzy Goal Programming approach for Tri-Level Linear Programming ...IJERA Editor
 
Instance based learning
Instance based learningInstance based learning
Instance based learningswapnac12
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithmsswapnac12
 
Bayesian Co clustering
Bayesian Co clusteringBayesian Co clustering
Bayesian Co clusteringlau
 
Supervised Quantization for Similarity Search (camera-ready)
Supervised Quantization for Similarity Search (camera-ready)Supervised Quantization for Similarity Search (camera-ready)
Supervised Quantization for Similarity Search (camera-ready)Xiaojuan (Kathleen) WANG
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classificationcsandit
 
Multiview Alignment Hashing for Efficient Image Search
Multiview Alignment Hashing for Efficient Image SearchMultiview Alignment Hashing for Efficient Image Search
Multiview Alignment Hashing for Efficient Image Search1crore projects
 
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...CSCJournals
 
Ch 9-1.Machine Learning: Symbol-based
Ch 9-1.Machine Learning: Symbol-basedCh 9-1.Machine Learning: Symbol-based
Ch 9-1.Machine Learning: Symbol-basedbutest
 
A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...IJERA Editor
 
Combining inductive and analytical learning
Combining inductive and analytical learningCombining inductive and analytical learning
Combining inductive and analytical learningswapnac12
 

What's hot (17)

Big Data Processing using a AWS Dataset
Big Data Processing using a AWS DatasetBig Data Processing using a AWS Dataset
Big Data Processing using a AWS Dataset
 
A hybrid naïve Bayes based on similarity measure to optimize the mixed-data c...
A hybrid naïve Bayes based on similarity measure to optimize the mixed-data c...A hybrid naïve Bayes based on similarity measure to optimize the mixed-data c...
A hybrid naïve Bayes based on similarity measure to optimize the mixed-data c...
 
228-SE3001_2
228-SE3001_2228-SE3001_2
228-SE3001_2
 
Interactive Fuzzy Goal Programming approach for Tri-Level Linear Programming ...
Interactive Fuzzy Goal Programming approach for Tri-Level Linear Programming ...Interactive Fuzzy Goal Programming approach for Tri-Level Linear Programming ...
Interactive Fuzzy Goal Programming approach for Tri-Level Linear Programming ...
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithms
 
Bayesian Co clustering
Bayesian Co clusteringBayesian Co clustering
Bayesian Co clustering
 
Supervised Quantization for Similarity Search (camera-ready)
Supervised Quantization for Similarity Search (camera-ready)Supervised Quantization for Similarity Search (camera-ready)
Supervised Quantization for Similarity Search (camera-ready)
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classification
 
Multiview Alignment Hashing for Efficient Image Search
Multiview Alignment Hashing for Efficient Image SearchMultiview Alignment Hashing for Efficient Image Search
Multiview Alignment Hashing for Efficient Image Search
 
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
 
Av33274282
Av33274282Av33274282
Av33274282
 
Ch 9-1.Machine Learning: Symbol-based
Ch 9-1.Machine Learning: Symbol-basedCh 9-1.Machine Learning: Symbol-based
Ch 9-1.Machine Learning: Symbol-based
 
Deep leaning Vincent Vanhoucke
Deep leaning Vincent VanhouckeDeep leaning Vincent Vanhoucke
Deep leaning Vincent Vanhoucke
 
Ai inductive bias and knowledge
Ai inductive bias and knowledgeAi inductive bias and knowledge
Ai inductive bias and knowledge
 
A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...
 
Combining inductive and analytical learning
Combining inductive and analytical learningCombining inductive and analytical learning
Combining inductive and analytical learning
 

Similar to Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence Information

Supervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting TechniqueSupervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting Techniqueiosrjce
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...cscpconf
 
Max stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problemMax stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problemnooriasukmaningtyas
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)neeraj7svp
 
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...ijsc
 
Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...ijsc
 
A Magnified Application of Deficient Data Using Bolzano Classifier
A Magnified Application of Deficient Data Using Bolzano ClassifierA Magnified Application of Deficient Data Using Bolzano Classifier
A Magnified Application of Deficient Data Using Bolzano Classifierjournal ijrtem
 
A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...o_almasi
 
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...ijnlc
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...acijjournal
 

Similar to Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence Information (20)

J017256674
J017256674J017256674
J017256674
 
Supervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting TechniqueSupervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting Technique
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
 
semeval2016
semeval2016semeval2016
semeval2016
 
A1804010105
A1804010105A1804010105
A1804010105
 
Max stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problemMax stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problem
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
 
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
 
Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...
 
A Magnified Application of Deficient Data Using Bolzano Classifier
A Magnified Application of Deficient Data Using Bolzano ClassifierA Magnified Application of Deficient Data Using Bolzano Classifier
A Magnified Application of Deficient Data Using Bolzano Classifier
 
ssc_icml13
ssc_icml13ssc_icml13
ssc_icml13
 
Hindi/Bengali Sentiment Analysis using Transfer Learning and Joint Dual Input...
Hindi/Bengali Sentiment Analysis using Transfer Learning and Joint Dual Input...Hindi/Bengali Sentiment Analysis using Transfer Learning and Joint Dual Input...
Hindi/Bengali Sentiment Analysis using Transfer Learning and Joint Dual Input...
 
A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...
 
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
 
61_Empirical
61_Empirical61_Empirical
61_Empirical
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence Information

  • 1. LATENT SEMANTIC WORD SENSE DISAMBIGUATION USING GLOBAL CO-OCCURRENCE INFORMATION Minoru Sasaki Department of Computer and Information Sciences, Faculty of Engineering, Ibaraki University, 4-12-1, Nakanarusawa, Hitachi, Ibaraki, Japan msasaki@mx.ibaraki.ac.jp ABSTRACT In this paper, I propose a novel word sense disambiguation method based on the global cooccurrence information using NMF. When I calculate the dependency relation matrix, the existing method tends to produce very sparse co-occurrence matrix from a small training set. Therefore, the NMF algorithm sometimes does not converge to desired solutions. To obtain a large number of co-occurrence relations, I propose to use co-occurrence frequencies of dependency relations between word features in the whole training set. This enables us to solve data sparseness problem and induce more effective latent features. To evaluate the efficiency of the method of word sense disambiguation, I make some experiments to compare with the result of the two baseline methods. The results of the experiments show this method is effective for word sense disambiguation in comparison with the all baseline methods. Moreover, the proposed method is effective for obtaining a stable effect by analyzing the global co-occurrence information. KEYWORDS Word Sense Disambiguation, Global Co-occurrence information, Dependency Relations, NonNegative Matrix Factorization 1. INTRODUCTION In natural language processing, acquisition of sense examples from an example set that contain a given target word enables to construct an extensive data set of tagged examples to demonstrate a wide range of semantic analysis. For example, using the obtained data set, we can construct a classifier that identifies its word sense by analyzing co-occurrence statistics of a target word. Also, we can make a wide-coverage case frame dictionary automatically and construct thesaurus for each meaning of a polysemous word. To construct large-sized training data, language dictionary and thesaurus, it is increasingly important to further improve to select the most appropriate meaning of the ambiguous word. If we have training data, word sense disambiguation (WSD) task reduces to a classification problem based on supervised learning. This approach is generally applicable to construct a classifier from a set of manually sense-tagged training data. Then, this classifier is used to identify the appropriate sense for new examples. A typical method for this approach is the David C. Wyld et al. (Eds) : CCSIT, SIPP, AISC, PDCTA, NLP - 2014 pp. 463–468, 2014. © CS & IT-CSCP 2014 DOI : 10.5121/csit.2014.4240
  • 2. 464 Computer Science & Information Technology (CS & IT) classical bag-of-words (BOW) approach [5], where each document is represented as a feature vector counting the number of occurrences of different words as features. By using such features, it becomes easy to adapt many existing supervised learning methods such as Support Vector Machine (SVM) [1] for the WSD task. However, when the general vector space model, in which a document is represented as a vector using term frequency based weighting methods, is applied to the WSD, the local context words are typically used as features and the global context information without dictionary information is not employed in the previous research. In this paper, I propose a novel word sense disambiguation method based on the global cooccurrence information using NMF. In previous research, [2] proposes a novel WSD method of particular word instances using the automatically extracted sense information. When we calculate the dependency relation matrix, the existing method tends to produce very sparse co-occurrence matrix from a small training set. Therefore, the NMF algorithm sometimes does not converge to desired solutions. To obtain a large number of co-occurrence relations, I propose to use cooccurrence frequencies of dependency relations between word features in the whole training set. This enables us to solve data sparseness problem and induce more effective latent features. 2. NON-NEGATIVE MATRIX FACTORIZATION Non-Negative Matrix Factorization (NMF) is a popular decomposition method for multivariate data [4]. NMF decomposes the m n non-negative matrix X to the m k matrix W and the k n matrix H, while these matrixes have no negative elements. Usually, k is chosen to be smaller value than n and m. × × × (1) X ≈ WH Using the NMF for a term-document matrix X, the matrix H represents the clustering result with k topics. For quantifying the quality of this approximation, cost functions based on Kullback-Leibler divergence is used and minimized using iterative update rules as follows: Wij ← Wij ( XH )ij (2) (WHH ) T ij (X H ) (3) T H ij ← H ij ij (HW W ) T , ij where Wij and Hij are the i-th row and the j-th column element respectively. These matrices W and H are initialized randomly with non-negative data and these above update rules are iteratively applied until the max iteration number (or convergence) is reached. 3. WSD USING GLOBAL CO-OCCURRENCE INFORMATION 3.1. Latent Semantic WSD Using Local Co-occurrence Information In previous research, [3] proposes a novel WSD method of particular word instances using the automatically extracted sense information. This method induces latent features for three matrices. The first matrix A contains co-occurrence frequencies of words that the target word co-occurs
  • 3. Computer Science & Information Technology (CS & IT) 465 with. The second matrix B contains term frequencies of words that appear in the context window. The third matrix C contains co-occurrence frequencies of words that the co-occurring context words of the target word co-occur with. Then, NMF is applied to the three matrices to factorize each matrix into two non-negative matrices, while the former results are used to initialize the next factorization, as shown in Figure 1. Figure 1. Interleaved NMF algorithm for Latent Semantic WSD Given a non-negative matrices A, B and C in the beginning of this method, matrices W, H, G and F are initialized randomly with non-negative values. Then, it decomposes the matrix A into the matrices W and H using NMF. In the decomposition of the matrix B, the updated matrix W is copied to matrix V and the updated matrices V and G is computed using NMF. In the decomposition of the matrix C, the transpose of the updated matrix G is copied to matrix U and the updated matrices U and F are obtained using NMF. At the last step of the iteration, the matrix F is copied to matrix H. This iteration is repeated until the maximum number of iterations is reached or the objective function of all NMF decomposition no longer improves. In order to perform this method for WSD, it needs to fold each sense of the target word into semantic space using the matrix H. For each sense in training data, centroid vector c is calculated and this centroid is mapped into the semantic space using the matrix H as follows: b = cH T (4) For input example of the target word, its context words are extracted to construct a vector f and the vector f is also mapped into the same semantic space using the matrix G as follows: d = fG T (5) Then, cosine similarity between the vector d and each of the sense vectors b are calculated and the sense that is the largest cosine similarity is selected.
  • 4. 466 Computer Science & Information Technology (CS & IT) 3.2. Latent Semantic WSD Using Global Co-occurrence Information This Latent Semantic WSD method is efficient for finding a reduced semantic space. However, problem arises when we apply this method. When we calculate the third matrix C, this method tends to produce very sparse co-occurrence matrix from a small training set. To obtain a large number of co-occurrence relations, I propose to use co-occurrence frequencies of dependency relations between word features in the whole training set. This enables us to solve data sparseness problem and induce more effective latent features. Like the above method, the proposed method needs three matrices, but the third matrix is different from the previous method. The third matrix D contains co-occurrence frequency of context words that co-occur in dependency relations to context words in a large document set. The proposed method induces latent features for these three matrices A, B and D. 4. EXPERIMENT To evaluate the efficiency of the proposed WSD method using the global Co-occurrence information, I conduct some experiments to compare with the result of the existing methods. In this section, I describe an outline of the experiments. 4.1. Data I used the Semeval-2010 Japanese WSD task data set, which includes 50 target words comprising 22 nouns, 23 verbs, and 5 adjectives [2]. In this data set, there are 50 training and 50 test instances for each target word. Moreover, to obtain a large number of co-occurrence relations, I use 22,832 documents chosen from the Japanese BCCWJ corpus1. 4.2. Evaluation Method 4.2.1. Baseline System 1 As the first baseline method, I only use the first matrix A described in section 3.1. To construct the matrix A, I represent each sentence with the target word in the training set as a highdimensional vector where each component represents the co-occurrence frequency of the target word in the sentence. Then, NMF is applied to the matrix A to factorize each matrix into two nonnegative matrices W and H. Each vector is tagged with the sense of the target word in that sentence. So centroid ci of the co-occurrence vectors that are assigned the same sense i is calculated and each centroid ci is mapped into the semantic space using the matrix H as follows: bi = ci H T (6) For input example of the target word, its context words are extracted to construct a vector f and the vector f is also mapped into the same semantic space using the matrix H as follows: d = fH T (7) Then, cosine similarity between the vector d and each of the sense vectors b are calculated and the sense that is the largest cosine similarity is selected. 1 http://www.ninjal.ac.jp/english/products/bccwj/
  • 5. Computer Science & Information Technology (CS & IT) 467 4.2.2. Baseline System 2 In the second baseline system, I use the latent semantic WSD using local co-occurrence information described in Section 3.1. I construct the three matrices A, B and C to induce latent semantic dimensions using NMF. Table 1. Experimental Results of Each Execution (highest precision rates among all the run are written in bold font) System Baseline System 1 Baseline System 2 Proposed Method Run 1 53.28% 59.68% 60.44% Run 2 53.88% 61.08% 61.48% Run 3 51.80% 61.08% 61.04% 62.00% 61.00% 60.00% 59.00% 58.00% 57.00% 56.00% 55.00% 54.00% 53.00% 52.00% Baseline 1 Baseline 2 proposed method Figure 2. Highest Precision of Each system 5. EXPERIMENTAL RESULTS Figure 2 shows the experimental results of the baseline methods and the proposed method. Computational experience reported shows that the choice of initial point is quite important to the NMF's goal. In practice, the algorithms are run several times with different initial points and the NMF is chosen as the feasible solution. In our experiments, each method is executed three times and average precision of all execution is calculated. In this Figure 2, the proposed method shows higher precision than the other baseline methods so that this approach is effective for word sense disambiguation. In comparison with the baseline system 1, the proposed method can obtain better precision so that it is effective for WSD to use context information and co-occurrence information. In comparison with the baseline system 2, the proposed method provides slightly better precision than the baseline system 2. As shown in Table 1, the proposed method can obtain the highest precision and can be stable at high precision value. However, the baseline system 2 cannot achieve stable precision because of the lack of the number of co-occurrence information. Therefore, the proposed method is effective for obtaining a stable effect by analyzing the global co-occurrence information.
  • 6. 468 Computer Science & Information Technology (CS & IT) ACKNOWLEDGEMENT In this paper, I propose a novel word sense disambiguation method based on the global cooccurrence information using NMF. To evaluate the efficiency of the method of word sense disambiguation, I conduct some experiments to compare with the result of the two baseline methods. The results of the experiments show this method is effective for word sense disambiguation in comparison with the all baseline methods. Moreover, the proposed method is effective for obtaining a stable effect by analyzing the global co-occurrence information. Further work would be required to consider a larger sized training data to obtain a large amount of co-occurrence information. REFERENCES [1] [2] [3] [4] [5] Corinna Cortes & Vladimir Vapnik, (1995) “Support-Vector Networks”, Machine Learning, Vol. 20, No. 3, pp273-297. Okumura, Manabu & Shirai, Kiyoaki & Komiya, Kanako & Yokono, Hikaru, (2010) “SemEval-2010 task: Japanese WSD”, Proceedings of the 5th International Workshop on Semantic Evaluation, pp6974. Van de Cruys, Tim & Apidianaki, Marianna, (2011) “Latent Semantic Word Sense Induction and Disambiguation”, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pp1476-1485. Lee, Daniel D. & Seung, Sebastian, (2001) “Algorithms for Non-negative Matrix Factorization”, Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference, MIT Press, pp556–562. Witten, Ian H. & Moffat, Alistair & Bell, Timothy C., (1999) “Managing Gigabytes (second edition): Compressing and Indexing Documents and Images”, Morgan Kaufmann Publishers Inc. AUTHORS Minoru Sasaki: received his B.E., M.E. and D.Eng. degrees in information science and intelligent systems from the University of Tokushima in 1996, 1998 and 2000. He is now a lecturer in the department of computer and information sciences at Ibaraki University. His research interests include natural language processing and information, information retrieval and text min ing.