# "Why Should I Trust You?": Explaining the Predictions of Any Classifier (@ KDD 2016)

### Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin

Suppose the black-box model maps inputs $\mathcal{X}$ (e.g. texts) into labels $\mathcal{Y}$ (e.g. categories for text classification). The idea of LIME is to map each input into an interpretable representation (for text, that can be a binary vector indicating the presence/absence of words). Then, given an input, LIME samples a bunch of perturbations of that representation (some close to the input, some far away), maps each perturbation back to the original input space to query the black box, and fits a simple interpretable model that (1) is locally faithful, i.e. accurately predicts what the black box does when you change the input, and (2) is simple, i.e. uses few words in the explanation. The simple interpretable model might be, for example, a shallow decision tree or a sparse linear classifier with binary weights.
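The loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: `predict_proba` is a hypothetical black box taking a list of texts, the proximity kernel here is an exponential kernel on the fraction of words removed, and the local model is a ridge-regularized weighted linear fit (the paper instead obtains sparsity with K-LASSO).

```python
import numpy as np

def lime_text_sketch(words, predict_proba, num_samples=500,
                     kernel_width=0.25, seed=0):
    """Toy LIME for text: explain predict_proba's output on ' '.join(words).

    words         -- list of tokens of the instance being explained
    predict_proba -- hypothetical black box: list of texts -> array of P(class)
    """
    rng = np.random.default_rng(seed)
    d = len(words)
    # Sample binary masks z in the interpretable space (word present/absent).
    Z = rng.integers(0, 2, size=(num_samples, d))
    Z[0] = 1  # keep the original instance in the sample
    # Map each mask back to a text and query the black box.
    texts = [" ".join(w for w, keep in zip(words, z) if keep) for z in Z]
    y = np.asarray(predict_proba(texts), dtype=float)
    # Proximity weight: closer perturbations (fewer words removed) count more.
    dist = 1.0 - Z.sum(axis=1) / d
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Fit a weighted, lightly ridge-regularized linear model g(z) ~ f(x_z).
    X = np.hstack([np.ones((num_samples, 1)), Z.astype(float)])
    A = (X * w[:, None]).T @ X + 1e-3 * np.eye(d + 1)
    b = (X * w[:, None]).T @ y
    coef = np.linalg.solve(A, b)
    # Per-word contribution to the prediction (intercept dropped).
    return dict(zip(words, coef[1:]))
```

With a toy black box that predicts 1 whenever the word "great" is present, the fitted local model assigns "great" a much larger weight than the other words, which is the kind of explanation LIME returns.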