Maximum Entropy Model LING 572 Fei Xia 02/07-02/09/06
History The concept of Maximum Entropy can be traced back along multiple threads to Biblical times. Introduced to the NLP area by Berger et al. (1996). Used in many NLP tasks: MT, tagging, parsing, PP attachment, LM, …
Outline Modeling: Intuition, basic concepts, … Parameter training Feature selection Case study
Reference papers (Ratnaparkhi, 1997), (Ratnaparkhi, 1996), (Berger et al., 1996), (Klein and Manning, 2003). Note: these papers use different notations.
Modeling
The basic idea Goal: estimate p Choose p with maximum entropy (or “uncertainty”) subject to the constraints (or “evidence”).
Setting From training data, collect (a, b) pairs: a: thing to be predicted (e.g., a class in a classification problem) b: the context Ex: POS tagging:  a=NN b=the words in a window and previous two tags Learn the prob of each (a, b):  p(a, b)
Features in POS tagging (Ratnaparkhi, 1996) context (a.k.a. history) allowable classes
Maximum Entropy Why maximum entropy? Maximize entropy = minimize commitment. Model all that is known and assume nothing about what is unknown. Model all that is known: satisfy a set of constraints that must hold. Assume nothing about what is unknown: choose the most “uniform” distribution → choose the one with maximum entropy.
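For a discrete distribution p, the entropy being maximized here is the usual Shannon entropy:
\[ H(p) = -\sum_{x} p(x)\,\log p(x) \]
so among all the distributions that satisfy the constraints, the most “uniform” one is precisely the one with the largest H(p).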
Ex1: Coin-flip example (Klein & Manning 2003) Toss a coin: p(H)=p1, p(T)=p2. Constraint: p1 + p2 = 1. Question: what’s your estimate of p=(p1, p2)? Answer: choose the p that maximizes H(p). [Figure: H(p) as a function of p1, with the point p1 = 0.3 marked]
Coin-flip example (cont) [Figure: the entropy surface H(p1, p2), with the constraint line p1 + p2 = 1 and the point p1 = 0.3 marked]
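A quick worked version of the coin example (standard calculus, not from the slides): substitute p2 = 1 − p1 and maximize.
\[ H(p_1) = -p_1\log p_1 - (1-p_1)\log(1-p_1), \qquad \frac{dH}{dp_1} = \log\frac{1-p_1}{p_1} = 0 \;\Rightarrow\; p_1^* = p_2^* = \tfrac12 . \]
A point such as p1 = 0.3 satisfies the constraint but has lower entropy, so it is not chosen.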
Ex2: An MT example (Berger et al., 1996) Possible translations for the word “in” are: Constraint: Intuitive answer:
An MT example (cont) Constraints: Intuitive answer:
An MT example (cont) Constraints: Intuitive answer:  ??
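The formulas on these three MT slides did not survive extraction. As a hedged reconstruction of the example from Berger et al. (1996), where “in” has the five French translations dans, en, à, au cours de, and pendant, the constraints are added one at a time:
\[
\begin{aligned}
&\text{(1)}\;\; p(\textit{dans})+p(\textit{en})+p(\textit{à})+p(\textit{au cours de})+p(\textit{pendant}) = 1
&&\Rightarrow\;\text{intuitive answer: } 1/5 \text{ each}\\
&\text{(2)}\;\; \text{add } p(\textit{dans})+p(\textit{en}) = 3/10
&&\Rightarrow\; p(\textit{dans})=p(\textit{en})=3/20,\ \text{the other three } 7/30 \text{ each}\\
&\text{(3)}\;\; \text{add } p(\textit{dans})+p(\textit{à}) = 1/2
&&\Rightarrow\;\text{no obvious intuitive split remains}
\end{aligned}
\]
which is exactly the point: once constraints overlap, maximum entropy gives a principled answer where intuition no longer does.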
Ex3: POS tagging (Klein and Manning, 2003)
Ex3 (cont)
Ex4: overlapping features (Klein and Manning, 2003)
Modeling the problem Objective function: H(p) Goal: Among all the distributions that satisfy the constraints, choose the one, p*, that maximizes H(p). Question: How to represent constraints?
Features A feature (a.k.a. feature function or indicator function) is a binary-valued function on events: A: the set of possible classes (e.g., tags in POS tagging); B: the space of contexts (e.g., neighboring words/tags in POS tagging). Ex:
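As a minimal sketch in code (the particular word/tag pair is an illustration made up for this example, not one of Ratnaparkhi's actual features), a feature is just a binary function of the class a and the context b:

```python
def f_book_NN(a, b):
    """Binary feature: fires when the proposed tag is NN and the
    current word in the context is 'book'.
    a: a class from A (a POS tag); b: a context from B (here a dict
    holding the current word and the previous two tags)."""
    return 1 if a == "NN" and b.get("word") == "book" else 0

# Usage on an (a, b) event:
context = {"word": "book", "t_minus_1": "DT", "t_minus_2": "BOS"}
print(f_book_NN("NN", context))  # 1
print(f_book_NN("VB", context))  # 0
```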
Some notations Finite training sample of events: Observed probability of x in S: The model p’s probability of x: Model expectation of f_j: Observed expectation of f_j: The j-th feature: (empirical count of f_j)
Constraints Model’s feature expectation = observed feature expectation How to calculate  ?
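The blanks above correspond to the standard definitions; writing S for the training sample of N events and p̃ for the empirical distribution, they are (a reconstruction in the usual notation):
\[ \tilde p(x) = \frac{\operatorname{count}(x \in S)}{N}, \qquad
E_{\tilde p}[f_j] = \sum_x \tilde p(x)\, f_j(x), \qquad
E_{p}[f_j] = \sum_x p(x)\, f_j(x), \]
and the j-th constraint equates the two expectations:
\[ E_{p}[f_j] = E_{\tilde p}[f_j], \qquad j = 1, \dots, k. \]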
Training data → observed events
Restating the problem The task: find p* s.t.   where Objective function:  -H(p) Constraints:  Add a feature
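Putting this together, the restated optimization problem is (reconstructed in the notation above; the slide's “add a feature” remark presumably refers to writing the normalization condition as one more feature constraint, but that reading is a guess):
\[ p^* = \arg\min_{p \in P} \big(-H(p)\big), \qquad
P = \Big\{\, p \;:\; E_{p}[f_j] = E_{\tilde p}[f_j]\ (j=1,\dots,k),\ \ \sum_x p(x) = 1 \,\Big\}. \]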
Questions Is P empty? Does p* exist? Is p* unique? What is the form of p*?  How to find p*?
What is the form of p*? (Ratnaparkhi, 1997) Theorem: if  then  Furthermore, p* is unique.
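The theorem body is missing above; as stated in Ratnaparkhi (1997) (reconstructed from memory, so treat the exact wording as approximate), let P be the set of distributions satisfying the constraints and Q the set of models of exponential (log-linear) form:
\[ Q = \Big\{\, p \;:\; p(x) = \pi \prod_{j=1}^{k+1} \alpha_j^{\,f_j(x)},\ \ \alpha_j > 0 \,\Big\}. \]
Theorem: if \(p^* \in P \cap Q\), then \(p^* = \arg\max_{p \in P} H(p)\); furthermore, \(p^*\) is unique.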
Using Lagrangian multipliers Minimize A(p):
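A sketch of the Lagrangian (sign conventions vary across the reference papers; this version is chosen so the stationary point comes out in the familiar log-linear form):
\[ A(p) = -H(p) \;-\; \sum_j \lambda_j\Big(E_p[f_j] - E_{\tilde p}[f_j]\Big) \;-\; \mu\Big(\sum_x p(x) - 1\Big). \]
Setting \(\partial A / \partial p(x) = 0\) gives \(\log p(x) + 1 - \sum_j \lambda_j f_j(x) - \mu = 0\), i.e. \(p(x) \propto \exp\big(\sum_j \lambda_j f_j(x)\big)\).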
Two equivalent forms
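The two equivalent parameterizations are the product form and the log-linear form, related by \(\alpha_j = e^{\lambda_j}\):
\[ p(x) = \pi \prod_j \alpha_j^{\,f_j(x)}
\quad\Longleftrightarrow\quad
p(x) = \frac{1}{Z}\exp\Big(\sum_j \lambda_j f_j(x)\Big), \qquad \alpha_j = e^{\lambda_j},\ \ \pi = \frac{1}{Z}. \]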
Relation to Maximum Likelihood The log-likelihood of the empirical distribution  as predicted by a model q is defined as Theorem: if  then  Furthermore, p* is unique.
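Filling in the missing formulas with the standard definitions (a reconstruction):
\[ L_{\tilde p}(q) = \sum_x \tilde p(x)\,\log q(x). \]
Theorem: if \(p^* \in P \cap Q\), then \(p^* = \arg\max_{q \in Q} L_{\tilde p}(q)\); furthermore, \(p^*\) is unique. In other words, the constrained maximum entropy model and the maximum likelihood model within the log-linear family Q coincide.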
Summary (so far) The model p* in P with maximum entropy is the model in Q that maximizes the likelihood of the training sample  Goal: find p* in P, which maximizes H(p). It can be proved that when p* exists it is unique.
Summary (cont) Adding constraints (features): (Klein and Manning, 2003) Lower maximum entropy Raise maximum likelihood of data Bring the distribution further from uniform Bring the distribution closer to data
Parameter estimation
Algorithms Generalized Iterative Scaling (GIS): (Darroch and Ratcliff, 1972) Improved Iterative Scaling (IIS): (Della Pietra et al., 1995)
GIS: setup Requirements for running GIS: obey the form of the model and the constraints, plus an additional constraint. Let C be the constant below, and add a new feature f_{k+1} (see the reconstruction below):
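The GIS-specific constraint and the correction feature, reconstructed in their usual form (C may be taken as the largest total feature count observed):
\[ \sum_{j=1}^{k} f_j(x) = C \ \ \text{for all } x, \qquad
C = \max_x \sum_{j=1}^{k} f_j(x), \qquad
f_{k+1}(x) = C - \sum_{j=1}^{k} f_j(x). \]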
GIS algorithm Compute d_j, j=1, …, k+1. Initialize λ_j (any values, e.g., 0). Repeat until convergence: for each j, compute the model expectation of f_j and update λ_j (update rule reconstructed below).
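The update rule referred to above is, in standard GIS form (with d_j the observed expectation of f_j and p^{(n)} the current model):
\[ d_j = E_{\tilde p}[f_j], \qquad
\lambda_j^{(n+1)} = \lambda_j^{(n)} + \frac{1}{C}\,\log\frac{d_j}{E_{p^{(n)}}[f_j]}
\quad\Big(\text{equivalently } \alpha_j^{(n+1)} = \alpha_j^{(n)}\Big(\tfrac{d_j}{E_{p^{(n)}}[f_j]}\Big)^{1/C}\Big). \]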
Approximation for calculating feature expectation
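For a conditional model p(a|b), the model expectation is approximated by summing only over the contexts seen in training, \(E_p[f_j] \approx \frac{1}{N}\sum_i \sum_{a \in A} p(a \mid b_i)\, f_j(a, b_i)\). The sketch below ties the pieces together: a toy conditional MaxEnt trainer that uses this approximation with the GIS update. It is illustrative code under those assumptions, not Ratnaparkhi's implementation; the data format and example features are made up.

```python
import math

def train_gis(data, classes, features, iterations=100):
    """Toy conditional MaxEnt model p(a|b) trained with GIS.
    data: list of observed (a, b) events; classes: the set A;
    features: list of binary feature functions f_j(a, b).
    Returns the weights, with the correction feature's weight last."""
    N, k = len(data), len(features)

    # GIS constant C: largest total feature count over any (class, context)
    # pair that will ever be scored; the correction feature pads events to C.
    C = max(sum(f(a, b) for f in features) for (_, b) in data for a in classes)
    lambdas = [0.0] * (k + 1)

    def padded(a, b):
        c = [f(a, b) for f in features]
        c.append(C - sum(c))            # correction feature f_{k+1}
        return c

    def cond_prob(b):
        # p(a|b) under the current weights, for every class a
        scores = {a: math.exp(sum(l * ci for l, ci in zip(lambdas, padded(a, b))))
                  for a in classes}
        Z = sum(scores.values())
        return {a: s / Z for a, s in scores.items()}

    # d_j: observed feature expectations
    d = [0.0] * (k + 1)
    for (a, b) in data:
        for j, cj in enumerate(padded(a, b)):
            d[j] += cj / N

    for _ in range(iterations):
        # model expectations, via the empirical approximation above
        E = [0.0] * (k + 1)
        for (_, b) in data:
            p_ab = cond_prob(b)
            for a in classes:
                for j, cj in enumerate(padded(a, b)):
                    E[j] += p_ab[a] * cj / N
        # GIS update: lambda_j += (1/C) * log(d_j / E_j)
        for j in range(k + 1):
            if d[j] > 0 and E[j] > 0:
                lambdas[j] += math.log(d[j] / E[j]) / C
    return lambdas

# Hypothetical usage with two toy features:
feats = [lambda a, b: 1 if a == "NN" and b == "book" else 0,
         lambda a, b: 1 if a == "VB" else 0]
events = [("NN", "book"), ("VB", "run"), ("NN", "book")]
print(train_gis(events, {"NN", "VB"}, feats, iterations=50))
```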
Properties of GIS L(p^(n+1)) ≥ L(p^(n)). The sequence is guaranteed to converge to p*. Convergence can be very slow. The running time of each iteration is O(NPA): N: the training set size; P: the number of classes; A: the average number of features that are active for a given event (a, b).
IIS algorithm Compute d_j, j=1, …, k+1, and f^#(x). Initialize λ_j (any values, e.g., 0). Repeat until convergence: for each j, let δ_j be the solution to the equation reconstructed below, and update λ_j ← λ_j + δ_j.
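The equation δ_j must solve is, in standard IIS form (reconstructed; f^#(x) is the total feature count of event x):
\[ E_{\tilde p}[f_j] \;=\; \sum_x p^{(n)}(x)\, f_j(x)\, \exp\!\big(\delta_j\, f^{\#}(x)\big), \qquad f^{\#}(x) = \sum_j f_j(x), \]
after which the update is \(\lambda_j \leftarrow \lambda_j + \delta_j\).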
Calculating δ_j If f^#(x) is a constant for all x, then GIS is the same as IIS; else δ_j must be calculated numerically.
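Concretely, when \(f^{\#}(x) = C\) for every x the IIS equation has a closed-form solution, which is exactly the GIS step; otherwise it must be solved numerically (e.g., with Newton's method):
\[ \delta_j = \frac{1}{C}\,\log\frac{E_{\tilde p}[f_j]}{E_{p^{(n)}}[f_j]}. \]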
Feature selection
Feature selection One approach: throw in many features (manually specify feature templates) and let the machine select the weights. Problem: too many features. An alternative: a greedy algorithm: start with an empty set S and add a feature at each iteration.
Notation The gain in the log-likelihood of the training data: After adding a feature: With the feature set S:
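In the spirit of Berger et al. (1996) (the exact symbols on the slide are lost), write \(p_S\) for the MaxEnt model built from feature set S and \(L(p)\) for the log-likelihood of the training data under p; the gain of a candidate feature f is then
\[ \Delta L(S, f) \;=\; L\big(p_{S \cup \{f\}}\big) - L\big(p_S\big). \]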
Feature selection algorithm (Berger et al., 1996) Start with S being empty; thus p_S is uniform. Repeat until the gain is small enough: For each candidate feature f, compute the model using IIS and calculate the log-likelihood gain. Choose the feature with maximal gain, and add it to S. Problem: too expensive (see the sketch below).
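A sketch of that greedy loop in code. `train_maxent` and `log_likelihood` are hypothetical helpers standing in for an IIS trainer and a data-likelihood computation; they are not functions of any particular library.

```python
def select_features(candidates, data, train_maxent, log_likelihood,
                    min_gain=1e-3):
    """Greedy feature selection in the style of Berger et al. (1996).
    candidates: candidate feature functions.
    train_maxent(features, data): hypothetical helper returning a fitted model.
    log_likelihood(model, data): hypothetical helper scoring the training data."""
    selected = []
    best_ll = log_likelihood(train_maxent(selected, data), data)  # uniform model
    remaining = list(candidates)
    while remaining:
        # retrain once per candidate -- this is the "too expensive" part
        scored = [(log_likelihood(train_maxent(selected + [f], data), data), f)
                  for f in remaining]
        ll, f = max(scored, key=lambda x: x[0])
        if ll - best_ll < min_gain:      # stop when the gain is small enough
            break
        selected.append(f)
        remaining.remove(f)
        best_ll = ll
    return selected
```

The approximation on the next slide avoids this full retraining by adjusting only the new feature's weight.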
Approximating gains (Berger et al., 1996) Instead of recalculating all the weights, calculate only the weight of the new feature.
Training a MaxEnt Model Scenario #1: Define feature templates. Create the feature set. Determine the optimum feature weights via GIS or IIS. Scenario #2: Define feature templates. Create the candidate feature set S. At every iteration, choose the feature from S (with max gain) and determine its weight (or choose the top-n features and their weights).
Case study
POS tagging (Ratnaparkhi, 1996) Notation variation: f_j(a, b): a: class, b: context; f_j(h_i, t_i): h_i: history for the i-th word, t_i: tag for the i-th word. History: Training data: treat it as a list of (h_i, t_i) pairs. How many pairs are there?
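If memory serves (this is a reconstruction of Ratnaparkhi (1996), so treat the exact window as approximate), the history for the i-th word consists of the word itself, two words of context on each side, and the previous two tags, and the corpus is flattened into one (h_i, t_i) pair per word token, which answers the slide's question: there are as many pairs as word tokens.
\[ h_i = \{\, w_i,\; w_{i+1},\; w_{i+2},\; w_{i-1},\; w_{i-2},\; t_{i-1},\; t_{i-2} \,\}. \]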
Using a MaxEnt Model Modeling: Training: Define feature templates. Create the feature set. Determine the optimum feature weights via GIS or IIS. Decoding:
Modeling
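The model itself (sketched from Ratnaparkhi (1996); notation may differ slightly from the original slide) is a conditional log-linear model over tags given histories, with the tag sequence scored as a product over positions:
\[ p(t \mid h) = \frac{\prod_j \alpha_j^{\,f_j(h,t)}}{\sum_{t'} \prod_j \alpha_j^{\,f_j(h,t')}}, \qquad
p(t_1 \cdots t_n \mid w_1 \cdots w_n) \approx \prod_{i=1}^{n} p(t_i \mid h_i). \]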
Training step 1: define feature templates. History h_i, Tag t_i
Step 2: Create the feature set Collect all the features from the training data; throw away features that appear fewer than 10 times.
Step 3: determine the feature weights GIS Training time: Each iteration: O(NTA): N: the training set size T: the number of allowable tags A: average number of features that are active for a (h, t). About 24 hours on an IBM RS/6000 Model 380. How many features?
Decoding: Beam search Generate tags for w_1, find the top N, and set s_{1j} accordingly, j=1, 2, …, N. For i=2 to n (n is the sentence length): For j=1 to N: generate tags for w_i, given s_{(i-1)j} as the previous tag context, and append each tag to s_{(i-1)j} to make a new sequence. Find the N highest-probability sequences generated above, and set s_{ij} accordingly, j=1, …, N. Return the highest-probability sequence s_{n1} (sketched in code below).
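A sketch of the decoder in code. `tag_probs(i, words, prev_tags)` is a hypothetical hook into the trained model's conditional distribution p(t | h_i), restricted to the allowable tags for the word; it is not from Ratnaparkhi's implementation.

```python
import math

def beam_search(words, tag_probs, beam_size):
    """Beam-search tagging. Each beam entry is (log_prob, tag_sequence);
    tag_probs(i, words, prev_tags) returns {tag: p(tag | history)} with
    nonzero probabilities for the allowable tags only."""
    beam = [(0.0, [])]                         # start with the empty sequence
    for i in range(len(words)):
        candidates = []
        for logp, seq in beam:
            # score every allowable tag for word i given the partial sequence
            for tag, p in tag_probs(i, words, seq).items():
                candidates.append((logp + math.log(p), seq + [tag]))
        # keep the N highest-probability partial sequences
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    return beam[0][1]                          # highest-probability sequence
```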
Beam search
Viterbi search
Decoding (cont) Tags for words: Known words: use tag dictionary Unknown words: try all possible tags Ex: “time flies like an arrow” Running time: O(NTAB) N: sentence length B: beam size T: tagset size A: average number of features that are active for a given event
Experiment results
Comparison with other learners HMM: MaxEnt uses more context. SDT (statistical decision trees): MaxEnt does not split data. TBL: MaxEnt is statistical and provides probability distributions.
MaxEnt Summary Concept: choose the p* that maximizes entropy while satisfying all the constraints. Max likelihood: p* is also the model within a model family that maximizes the log-likelihood of the training data. Training: GIS or IIS, which can be slow. MaxEnt handles overlapping features well. In general, MaxEnt achieves good performance on many NLP tasks.
Additional slides
Ex4 (cont) ??
