Maximum Likelihood Set - Introduction


Transcript

  • 1. Language Modeling with the Maximum Likelihood Set (Karakos & Khudanpur, ISIT-2006)
    http://dx.doi.org/10.1109/ISIT.2006.261575
    Yusuke Matsubara, Tsujii lab. meeting, 2006-06-22
  • 2. Necessity of smoothing
    • Estimation from small samples
      • Suppose
        • The possible word set is { A , B , C }
        • Maximum likelihood estimation (unigram)
        • The count of word C happens to be 0 in the corpus
          • e.g., the corpus “A B A A B”
      • MLE predicts that C will never occur
        • Even though we know C can occur
        • This underestimates p(C); a toy sketch follows below
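    A minimal sketch of the zero-count problem, assuming the toy corpus from the slide above (the code and names are illustrative, not from the paper):

```python
# Unigram MLE from the toy corpus "A B A A B"; word C never appears.
from collections import Counter

corpus = "A B A A B".split()
vocab = ["A", "B", "C"]

counts = Counter(corpus)                 # {'A': 3, 'B': 2}; Counter gives 0 for C
n = len(corpus)
mle = {w: counts[w] / n for w in vocab}  # {'A': 0.6, 'B': 0.4, 'C': 0.0}
print(mle)                               # MLE assigns p(C) = 0, even though C can occur
```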
  • 3. The Maximum Likelihood Set [ Jedynak & Khudanpur 2005 ]
    • Given a set of words and word counts (= a corpus)
    • The true pmf should make the given corpus at least as probable as any other corpus of the same size
    • The MLS contains all such pmfs
    [Figure: the simplex p1 + p2 + p3 = 1, showing the MLEs obtained from all possible count vectors; sample size n = 3, word-set size k = 3]
  • 4. The Maximum Likelihood Set (formal definition)
    • Given counts c = (c1, …, ck) with n = Σi ci over a k-word vocabulary
    • Comparing the multinomial likelihood of c with that of any count vector obtained by moving one count from word i to word j yields
      • MLS(c) = { p : ci · pj ≤ (cj + 1) · pi for all i ≠ j }
    • k² linear inequality constraints (one per ordered pair; a sketch for checking them follows below)
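    These pairwise constraints are easy to check directly. A sketch assuming the pairwise multinomial comparison above; the helper in_mls is illustrative, not the paper's code:

```python
# p is in MLS(c) iff c[i] * p[j] <= (c[j] + 1) * p[i] for every ordered pair i != j.
def in_mls(p, c):
    """Check whether pmf p satisfies all pairwise MLS constraints for counts c."""
    k = len(c)
    return all(
        c[i] * p[j] <= (c[j] + 1) * p[i]
        for i in range(k) for j in range(k) if i != j
    )

counts = [3, 2, 0]                         # counts of A, B, C in "A B A A B"
print(in_mls([0.60, 0.40, 0.00], counts))  # True: the MLE always belongs to its MLS
print(in_mls([0.55, 0.35, 0.10], counts))  # True: a mildly smoothed pmf can too
print(in_mls([0.50, 0.30, 0.20], counts))  # False: 3 * 0.20 > (0 + 1) * 0.50
```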
  • 5. The Maximum Likelihood Set
    [Figure: the MLS region on the probability simplex for n = 3, k = 3 and for n = 10, k = 3; larger samples bring the MLS nearer to the MLE]
  • 6. Choosing a pmf from an MLS
    • Assume a reference pmf (e.g., a smoothed estimate)
    • Choose the pmf in the MLS that minimizes the KL divergence to the reference pmf (a projection sketch follows below)
    [Figure: the reference pmf and its projection onto the MLS]
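    The slides mention that the paper solved this optimization with CFSQP (slide 8). Here is a hedged sketch of the same projection using SciPy's SLSQP solver instead; the KL direction D(p || reference) and the function mls_project are assumptions for illustration, not the paper's exact setup:

```python
import numpy as np
from scipy.optimize import minimize

def mls_project(ref, c):
    """Find the pmf in MLS(c) closest (in KL divergence) to the reference pmf.

    Assumes ref has strictly positive entries.
    """
    c = np.asarray(c, dtype=float)
    ref = np.asarray(ref, dtype=float)
    k = len(c)

    def kl(p):  # D(p || ref), with the convention 0 * log 0 = 0
        p = np.clip(p, 1e-12, 1.0)
        return float(np.sum(p * np.log(p / ref)))

    cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]  # p must sum to 1
    for i in range(k):
        for j in range(k):
            if i != j:  # linear MLS constraint: (c_j + 1) p_i - c_i p_j >= 0
                cons.append({"type": "ineq",
                             "fun": lambda p, i=i, j=j: (c[j] + 1) * p[i] - c[i] * p[j]})

    res = minimize(kl, ref, method="SLSQP", bounds=[(0.0, 1.0)] * k, constraints=cons)
    return res.x

# Project a uniform reference onto the MLS of the toy counts from slide 2.
print(mls_project([1/3, 1/3, 1/3], [3, 2, 0]))
```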
  • 7. Conditional pmf estimation
    • A different MLS for each conditioning context
    • In the case of trigram language modeling,
      • |V|² MLSs (one per two-word history)
      • And each MLS has |V|² constraints
        • However, many redundant constraints can be removed
  • 8. Experimental results
    • Corpus
      • UPenn Treebank
        • Sections 00-22 (900K words) for training
        • Sections 23-24 (100K words) for testing
    • Evaluation
      • Code length per word (cross-entropy, in bits)
    • Optimization solver
      • CFSQP
        • Handles linear constraints and a differentiable objective function
               Bigram                     Trigram
               Witten-Bell  Kneser-Ney    Witten-Bell  Kneser-Ney
    Reference  8.47         8.36          8.21         8.08
    MLS        8.44         8.38          8.24         8.12
  • 9. Conclusion
    • MLS achieves competitive performance
    • It can incorporate prior knowledge through the reference pmf
    • Additional desirable properties are proven
      • Consistent estimation:
        • MLS shrinks to {MLE} as #samples → ∞
      • Faithful to the counts: ci < cj ⇒ p(i) ≤ p(j)
  • 10. [Figure: points on the probability simplex, e.g. (0, 0, 1), (0, 1, 0), (1/3, 1/3, 1/3)]