20. Some notations Finite training sample of events: Observed probability of x in S: The model p’s probability of x: Model expectation of : Observed expectation of : The j th feature: (empirical count of )
28. Relation to Maximum Likelihood The log-likelihood of the empirical distribution as predicted by a model q is defined as Theorem: if then Furthermore, p* is unique.
29. Summary (so far) The model p* in P with maximum entropy is the model in Q that maximizes the likelihood of the training sample Goal: find p* in P, which maximizes H(p). It can be proved that when p* exists it is unique.