A Note on Spelling Correction Methods based upon Statistical Decision Theory
Yasunari MAEDA (Kitami Institute of Technology)
Hideki YOSHIDA (Kitami Institute of Technology)
Yoshitaka FUJIWARA (Kitami Institute of Technology)
Toshiyasu MATSUSHIMA (Waseda University)

Topics
1. Introduction
2. Definitions and Previous Research
3. Spelling Correction Methods based upon Statistical Decision Theory
3.1. Evaluating an Error Rate per Sentence
3.2. Evaluating an Error Rate per Word
4. Conclusion

1. Introduction
Decoding problem in coding theory: a string of codes -> discrete memoryless channel -> (codes + noises)
Spelling correction problem in natural language processing: a string of words (a string of alphabets) -> discrete memoryless channel -> a string of alphabets (words + spelling misses)

2. Definitions and Previous Research
an alphabet (a single letter)
a set of alphabets
a word (a string of alphabets)
a set of words
the probability that a word occurs
the probability that a word occurs next to a given word
parameters
true parameters (unknown)
the probability that a transmitted alphabet is received as a given alphabet through the DMC (discrete memoryless channel)

2. Definitions and Previous Research
data for learning: the transmitted sentences and their received strings (both are known)
the number of data
each data item in the data for learning
the number of words in each sentence
each sentence in the data for learning, and the received string of alphabets when that sentence is transmitted
each word in a sentence, and the received string of alphabets when that word is transmitted
each alphabet in each word
the length (the number of alphabets) of each word

2. Definitions and Previous Research
Equation (1): the probability that a sentence occurs.
Equation (2): the probability that a sentence is received as a string.

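2. Definitions and Previous Research
a new sentence (unknown)
a received string when the new sentence is transmitted (known)
each word in the new sentence
each alphabet in each word of the new sentence
the received alphabet when that alphabet is transmitted
Equation (3): the probability that the new sentence occurs.
Equation (4): the probability that the new sentence is received as a string.
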
2. Definitions and Previous Research
Spelling correction problem: estimating the new sentence under the condition that the learning data and the new received string are given.

2. Definitions and Previous Research
Previous research: the spelling correction problem is divided into two problems.
1) estimating the unknown parameters
2) estimating the new sentence
The Maximum Likelihood Estimate (MLE) is used (equation (5)).
There is no theoretical guarantee when the number of data for learning is finite.

3. Spelling Correction Methods based upon Statistical Decision Theory
We treat the spelling correction problem as one problem based upon statistical decision theory.
1) Bayes optimal method: minimizes an error rate with reference to a Bayes criterion when the number of data for learning is finite
2) approximate method: reduces the computational complexity
2 types of error rates
3.1. Evaluating an error rate per sentence
3.2. Evaluating an error rate per word

3.1. Evaluating an Error Rate per Sentence
Loss function: equation (6), where the decision function returns an estimate of the new sentence.
Risk function: equation (7).

3.1. Evaluating an Error Rate per Sentence
Bayes risk: equation (8), defined with prior density functions for the parameters.
Bayes optimal decision: equation (9).
The error rate per sentence is minimized with reference to the Bayes criterion.

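For a 0-1 loss on the whole sentence, the Bayes optimal decision takes a standard form, sketched below. The symbols are assumptions introduced here for illustration only and may differ from the notation of equations (6) to (9): x is the new sentence, y its received string, D the data for learning, \theta the unknown parameters, and \hat{x} the decision. The new pair (x, y) is assumed to be independent of D given \theta.

```latex
\begin{align*}
L(x, \hat{x}) &=
  \begin{cases}
    0 & \text{if } \hat{x} = x \\
    1 & \text{otherwise}
  \end{cases} \\
\hat{x}^{*}(y, D) &= \arg\max_{x} \, p(x \mid y, D)
  = \arg\max_{x} \int p(x \mid \theta)\, p(y \mid x, \theta)\, p(\theta \mid D)\, d\theta
\end{align*}
```

The integral averages over the posterior of the parameters instead of plugging in a single point estimate, which is the difference from the two-step approach of the previous research.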
3.1. Evaluating an Error Rate per Sentence
A Dirichlet distribution is used as the prior density.
Equation (10) uses the numbers of times that a transmitted alphabet is received as a given alphabet in the data for learning, together with the parameter of the Dirichlet distribution.

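A minimal sketch of how counts and a Dirichlet prior combine for the channel, assuming a symmetric prior with a single hyperparameter. The function name channel_predictive, the argument alpha, and the toy data are hypothetical and are not taken from the slides; the exact form of equation (10) may differ.

```python
from collections import Counter


def channel_predictive(pairs, alphabet, alpha=1.0):
    """Posterior predictive P(received = y | sent = x) for a DMC.

    pairs    : iterable of (sent, received) alphabet pairs from the learning data
    alphabet : iterable of all alphabets (letters)
    alpha    : symmetric Dirichlet hyperparameter (an assumption of this sketch)
    """
    counts = Counter(pairs)
    alphabet = list(alphabet)
    pred = {}
    for x in alphabet:
        # Total observations of x being sent, plus the prior pseudo-counts.
        total = sum(counts[(x, y)] for y in alphabet) + alpha * len(alphabet)
        pred[x] = {y: (counts[(x, y)] + alpha) / total for y in alphabet}
    return pred


# Toy learning data: "a" sent three times (received a, a, b), "b" sent once.
pairs = [("a", "a"), ("a", "a"), ("a", "b"), ("b", "b")]
pred = channel_predictive(pairs, "ab", alpha=1.0)
print(pred["a"]["a"])  # (2 + 1) / (3 + 2) = 0.6
```

Because of the pseudo-counts, every transition keeps a non-zero probability even when it never appears in the data for learning.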
3.1. Evaluating an Error Rate per Sentence
The Bayes optimal solution can be calculated using a DP (Dynamic Programming) method.
[Figure: DP-tree, arranged by depth. Each node represents a string of words.]
The Bayes optimal solution can be calculated by continuing the calculation at each node, depth by depth.

3.1. Evaluating an Error Rate per Sentence
The Bayes optimal solution can be calculated by continuing the calculation at each node, depth by depth.
The calculation at each node is given by equation (11), which involves an expected probability.
The computational complexity of the Bayes optimal solution is proportional to the number of nodes in the DP-tree, and it is of exponential order.

3.1. Evaluating an Error Rate per Sentence
Approximate method
Predictive distributions calculated by using the posterior density are used as estimates of parameters.
e.g. equation (12), which uses the number of times that one word occurs next to another in the data for learning and a parameter of the Dirichlet distribution.

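The predictive estimate for word transitions can be sketched in the same way. This is an illustration under a symmetric Dirichlet prior, not necessarily the exact form of equation (12); the names bigram_predictive and beta and the toy sentences are hypothetical.

```python
from collections import Counter


def bigram_predictive(sentences, vocab, beta=1.0):
    """Posterior predictive P(next word | previous word).

    sentences : list of sentences, each a list of words from the learning data
    vocab     : list of all words
    beta      : symmetric Dirichlet hyperparameter (an assumption of this sketch)
    """
    counts = Counter()
    for sentence in sentences:
        for prev, nxt in zip(sentence, sentence[1:]):
            counts[(prev, nxt)] += 1
    pred = {}
    for prev in vocab:
        total = sum(counts[(prev, w)] for w in vocab) + beta * len(vocab)
        pred[prev] = {w: (counts[(prev, w)] + beta) / total for w in vocab}
    return pred


vocab = ["the", "cat", "hat"]
sentences = [["the", "cat"], ["the", "cat"], ["the", "hat"]]
trans = bigram_predictive(sentences, vocab, beta=1.0)
print(trans["the"]["cat"])  # (2 + 1) / (3 + 3) = 0.5
```

A word that never appears as a predecessor, such as "cat" here, falls back to the uniform distribution implied by the prior.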
3.1. Evaluating an Error Rate per Sentence
The approximate method is equal to a Viterbi algorithm.
[Figure: trellis diagram over time, with a metric at each time step.]
The approximate solution can be calculated by continuing the calculation at each node of the trellis, one time step after another (equations (13) and (14)).
The computational complexity is reduced relative to the Bayes optimal method.

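A minimal Viterbi sketch over a word trellis, under assumptions that are not stated in the slides: word boundaries in the received string are known, the channel only substitutes alphabets (so a candidate word must have the same length as the received word), and the probability tables are already available, for example as predictive estimates. All names and the toy numbers are hypothetical.

```python
import math


def viterbi(received, vocab, init, trans, chan):
    """Return the most probable word sequence for a list of received words."""

    def emit(word, obs):
        # Log-probability that `word` is received as `obs` through the DMC.
        if len(word) != len(obs):
            return float("-inf")
        return sum(math.log(chan[x][y]) for x, y in zip(word, obs))

    # Metric at time 1.
    metric = {w: math.log(init[w]) + emit(w, received[0]) for w in vocab}
    back = []
    for obs in received[1:]:
        new_metric, pointers = {}, {}
        for w in vocab:
            # Best predecessor of w at the previous time step.
            best = max(vocab, key=lambda v: metric[v] + math.log(trans[v][w]))
            new_metric[w] = metric[best] + math.log(trans[best][w]) + emit(w, obs)
            pointers[w] = best
        metric = new_metric
        back.append(pointers)

    # Trace back from the best final node.
    path = [max(vocab, key=lambda w: metric[w])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


letters = "aceht"
chan = {x: {y: 0.9 if x == y else 0.1 / (len(letters) - 1) for y in letters}
        for x in letters}
vocab = ["the", "cat", "hat"]
init = {"the": 0.8, "cat": 0.1, "hat": 0.1}
trans = {
    "the": {"the": 0.1, "cat": 0.6, "hat": 0.3},
    "cat": {"the": 1 / 3, "cat": 1 / 3, "hat": 1 / 3},
    "hat": {"the": 1 / 3, "cat": 1 / 3, "hat": 1 / 3},
}
result = viterbi(["tha", "cet"], vocab, init, trans, chan)
print(result)  # ['the', 'cat']
```

In this sketch the work per time step is quadratic in the number of candidate words and the number of time steps equals the number of words in the sentence, in contrast to the exponential growth of the DP-tree.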
3.2. Evaluating an Error Rate per Word
Loss function: equation (15), where the decision function returns an estimate of the new sentence.
Bayes optimal decision: equation (16).
The computational complexity is of exponential order.

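For a per-word loss, a standard form of the loss and of the resulting Bayes optimal decision is sketched below. The symbols are assumptions introduced here for illustration and may differ from equations (15) and (16): x = (x_1, ..., x_T) is the new sentence with T words, y its received string, D the data for learning, and \hat{x}_t the decision for the t-th word.

```latex
\begin{align*}
L(x, \hat{x}) &= \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\left[\hat{x}_t \neq x_t\right] \\
\hat{x}_t^{*}(y, D) &= \arg\max_{w} \sum_{x \,:\, x_t = w} p(x \mid y, D),
  \qquad t = 1, \dots, T
\end{align*}
```

Each word is decided by its own marginal posterior, so the decided words taken together need not coincide with the single most probable sentence.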
3.2. Evaluating an Error Rate per Word
Approximate method
Predictive distributions calculated by using the posterior density are used as estimates of parameters.
The approximate method is equal to the BCJR algorithm (equations (17) and (18)).

3.2. Evaluating an Error Rate per Word
[Figure: diagram over time steps.]
Approximate solution: equations (19), (20) and (21).
The computational complexity is reduced relative to the Bayes optimal method.

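A minimal forward-backward (BCJR-style) sketch that picks, at each time step, the word with the largest marginal posterior. It rests on assumptions that are not stated in the slides: word boundaries are known, the channel only substitutes alphabets, and the probability tables are given. All names and the toy numbers are hypothetical, and no scaling is applied, so the sketch is only suitable for short sentences.

```python
def forward_backward(received, vocab, init, trans, chan):
    """Return the word with the highest marginal posterior at each time step."""

    def emit(word, obs):
        # Probability that `word` is received as `obs` through the DMC.
        if len(word) != len(obs):
            return 0.0
        p = 1.0
        for x, y in zip(word, obs):
            p *= chan[x][y]
        return p

    T = len(received)
    # Forward pass: joint probability of the prefix and the word at time t.
    alpha = [{w: init[w] * emit(w, received[0]) for w in vocab}]
    for t in range(1, T):
        alpha.append({
            w: emit(w, received[t])
            * sum(alpha[t - 1][v] * trans[v][w] for v in vocab)
            for w in vocab
        })
    # Backward pass: probability of the suffix given the word at time t.
    beta = [dict.fromkeys(vocab, 1.0) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = {
            v: sum(trans[v][w] * emit(w, received[t + 1]) * beta[t + 1][w]
                   for w in vocab)
            for v in vocab
        }
    # The marginal posterior is proportional to alpha * beta.
    return [max(vocab, key=lambda w: alpha[t][w] * beta[t][w]) for t in range(T)]


letters = "aceht"
chan = {x: {y: 0.9 if x == y else 0.1 / (len(letters) - 1) for y in letters}
        for x in letters}
vocab = ["the", "cat", "hat"]
init = {"the": 0.8, "cat": 0.1, "hat": 0.1}
trans = {
    "the": {"the": 0.1, "cat": 0.6, "hat": 0.3},
    "cat": {"the": 1 / 3, "cat": 1 / 3, "hat": 1 / 3},
    "hat": {"the": 1 / 3, "cat": 1 / 3, "hat": 1 / 3},
}
decoded = forward_backward(["tha", "cet"], vocab, init, trans, chan)
print(decoded)  # ['the', 'cat']
```

Unlike a Viterbi pass, which keeps only the best path into each node, both passes here sum over all predecessors, which is what yields per-word marginals.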
4. Conclusion
We studied the spelling correction problem based upon statistical decision theory.
We studied two types of error rates: the error rate per sentence and the error rate per word.
We proposed Bayes optimal methods which minimize an error rate with reference to the Bayes criterion.
We also proposed approximate methods.
As further work, we want to study properties of the proposed approximate methods. We also want to apply statistical decision theory to other tasks in natural language processing.
