Gabriel Poesia

A maximum entropy approach to natural language processing (@ Computational Linguistics 1996)

Adam L. Berger, Vincent J. Della Pietra, Stephen A. Della Pietra

Link

This paper is a classic in natural language processing, having introduced the maximum entropy method into the field. The principle of maximum entropy is simple: when fitting a probabilistic model to a data set, we want the model to be consistent with the observed data (which can be formulated as equality constraints), but otherwise as uncertain as possible (which gives an objective function: maximize entropy). Thus, to fit a model, we maximize its entropy subject to the constraint that the model's feature expectations match those of the data. The paper then works with the dual of this constrained problem, obtained via Lagrange multipliers, which is unconstrained and thus easier to optimize; the dual solution turns out to be a log-linear (exponential) model whose parameters are the multipliers, and maximizing the dual coincides with maximum-likelihood estimation for that model.
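A minimal sketch of the setup, in notation close to the paper's: the $f_i$ are binary feature functions on context-outcome pairs $(x, y)$, $\tilde{p}$ is the empirical distribution, and the $\lambda_i$ are the Lagrange multipliers.

```latex
% Primal: maximize conditional entropy subject to matching feature expectations
\max_{p} \; H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)
\quad \text{s.t.} \quad
\sum_{x,y} \tilde{p}(x)\, p(y \mid x)\, f_i(x,y)
  = \sum_{x,y} \tilde{p}(x,y)\, f_i(x,y) \;\; \forall i

% The solution is an exponential (log-linear) model with the multipliers
% \lambda_i as parameters; the dual problem is to pick \lambda maximizing
% the log-likelihood of this model on the data:
p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)}
  \exp\Big(\sum_i \lambda_i f_i(x,y)\Big),
\qquad
Z_\lambda(x) = \sum_{y} \exp\Big(\sum_i \lambda_i f_i(x,y)\Big)
```

To make the "easier to optimize" point concrete, here is a toy sketch (my own, not from the paper) that fits the multipliers by plain gradient ascent on the dual: the gradient is just the gap between empirical and model feature expectations. The paper itself uses improved iterative scaling; the features and counts below are made up for illustration.

```python
import numpy as np

# Toy maximum entropy model: 2 contexts, 3 outcomes, 2 binary features.
# feats[x, y, i] = f_i(x, y)
feats = np.array([
    [[1, 0], [0, 1], [1, 1]],
    [[0, 0], [1, 0], [0, 1]],
], dtype=float)

# Empirical joint distribution over (x, y), e.g. normalized counts.
p_xy = np.array([
    [0.3, 0.1, 0.1],
    [0.1, 0.3, 0.1],
])
p_x = p_xy.sum(axis=1)  # empirical marginal over contexts

def model(lmbda):
    """p(y|x) proportional to exp(sum_i lambda_i * f_i(x, y))."""
    scores = feats @ lmbda                        # shape (X, Y)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)

lmbda = np.zeros(2)
emp = np.einsum('xy,xyi->i', p_xy, feats)         # empirical E[f_i]
for _ in range(500):
    p_y_given_x = model(lmbda)
    # model expectation of f_i, using the empirical marginal over x
    exp_model = np.einsum('x,xy,xyi->i', p_x, p_y_given_x, feats)
    lmbda += 0.5 * (emp - exp_model)              # gradient ascent on the dual

print(model(lmbda))  # feature expectations now (approximately) match the data
```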

I think maximum entropy has found its way into inverse reinforcement learning as well.