# Word Learning as Bayesian Inference (@ Psychological Review 2007)

### Fei Xu, Joshua B. Tenenbaum

The key idea is simple, as in most good papers. Let's say a person is given examples $X$ of the meaning of a word (e.g. three apples, and the word "apple"). Suppose the person is trying to decide between hypotheses in a class $H$ (e.g. each hypothesis might be the set of things in the world for which the word applies). Then,
$$p(h | X) \propto p(X | h) p(h) \enspace .$$
$p(X | h)$ is the probability that, if $h$ were indeed the set of objects the word applies to, the examples in $X$ would be independently sampled from that set: for $n$ examples, $p(X | h) = (1/|h|)^n$ if every example lies in $h$, and $0$ otherwise (the "size principle"). This explains why more positive examples make us more confident: if the word "apple" also applied to other things (such as pears, or chairs), it would be too much of a coincidence that three random samples of "apples" all happen to be this same kind of fruit.

$p(h)$ is a prior over hypotheses, reflecting what we usually use words for. The authors use a simple hierarchical taxonomy of objects; words are most useful when they are neither too general (not very informative) nor too specific (rarely applicable), so the prior favors middle-ground hypotheses, and the examples calibrate the posterior from there.
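To make the mechanics concrete, here is a minimal sketch in Python: a toy three-level taxonomy, the size-principle likelihood, and a hand-picked prior peaked at the middle level. The taxonomy, set sizes, and prior weights are illustrative assumptions, not the paper's actual stimuli or fitted values.

```python
# Toy world: each hypothesis is a named set of objects at one level
# of a nested taxonomy (sizes chosen arbitrarily for illustration).
hypotheses = {
    "dalmatians": set(range(4)),    # subordinate: 4 objects
    "dogs":       set(range(12)),   # basic level: 12 objects
    "animals":    set(range(40)),   # superordinate: 40 objects
}

# Assumed prior, peaked at the mid-level hypothesis (hand-picked
# weights standing in for the paper's taxonomy-based prior).
prior = {"dalmatians": 0.25, "dogs": 0.5, "animals": 0.25}

def likelihood(examples, h):
    """Size principle: n examples drawn independently and uniformly
    from h have probability (1/|h|)^n; zero if any example falls outside h."""
    if not all(x in h for x in examples):
        return 0.0
    return (1.0 / len(h)) ** len(examples)

def posterior(examples):
    """Bayes' rule: p(h | X) proportional to p(X | h) p(h), normalized."""
    scores = {name: likelihood(examples, h) * prior[name]
              for name, h in hypotheses.items()}
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

# One example leaves broader hypotheses plausible; three examples from
# the smallest set make "dogs" and "animals" suspicious coincidences.
print(posterior([0]))        # mass still spread across hypotheses
print(posterior([0, 1, 2]))  # mass concentrated on "dalmatians"
```

Running this shows the suspicious-coincidence effect directly: with one example the posterior is split, while with three examples from the subordinate set the likelihood term $(1/|h|)^n$ penalizes the larger sets enough to overwhelm the basic-level prior.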