# Auto-Encoding Variational Bayes (@ ICLR 2014)

### Diederik P Kingma, Max Welling

This is the paper that introduced the now classical Variational Auto-Encoder.
Not all points of the derivations are clear to me at this point, but the main
idea is easy to understand. The goal is to train a generative model from
an unlabeled dataset (e.g. of images of faces). On the surface, auto-encoders
could be used for that: we train an encoder $\phi(x)$ and a decoder that
reconstructs $x$ from $\phi(x)$. Now if we sample a random vector in $\phi$ space,
and use the decoder, we should get an image. Unfortunately, we do not know what
distribution to sample from, and the latent space of a plain auto-encoder has no
particular structure. So the idea of Auto-Encoding Variational Bayes is
to force the representations toward a known distribution (e.g. a standard Gaussian).
So what $\phi(x)$ outputs is not a code directly, but the parameters (means and
variances) of the Gaussian that the latent code for $x$ is sampled from. Given those
parameters, we sample a vector and use the decoder to reconstruct the input.
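The "forcing" is done by penalizing the divergence between the encoder's Gaussian and the standard Gaussian prior, which for diagonal Gaussians has the closed form given in the paper. A minimal numpy sketch (the function name and the `log_var` parameterization are my choices, though parameterizing by log-variance is the common convention):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    Closed form from the paper:
    -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    """
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

# The penalty vanishes exactly when the encoder already outputs
# a standard Gaussian (mu = 0, log_var = 0, i.e. sigma = 1):
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))
```

Adding this term to the reconstruction loss is what pulls the codes toward the prior, so that sampling from a standard Gaussian at generation time lands in a region the decoder knows about.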

That almost works, except that the sampling step is stochastic, and we can't
backpropagate through it. Here comes the famous reparameterization trick.
Instead of sampling directly from the Gaussian whose parameters $\phi(x)$ gives us,
we notice that a sample from a Gaussian with mean $\mu$ and variance $\sigma^2$ can be
rewritten as $\mu + \sigma \epsilon$, where $\epsilon$ is a standard Gaussian sample.
So we sample $\epsilon$ from a standard Gaussian, take the $\mu$ and $\sigma$ that
$\phi(x)$ produced, multiply $\epsilon$ by $\sigma$ and add $\mu$ to obtain the input
to the decoder. The randomness is now isolated in $\epsilon$, so we can backpropagate
through $\mu$ and $\sigma$ and train the model end-to-end.
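The trick fits in a couple of lines; here is a numpy sketch (the `mu` and `sigma` values stand in for an encoder's outputs, which I'm making up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, sigma, rng):
    """Reparameterized sample from N(mu, sigma^2).

    All the randomness lives in eps ~ N(0, I); z is a deterministic
    function of mu and sigma, so in a framework with autodiff the
    gradients would flow back into the encoder that produced them.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

# e.g. a 2-d latent code:
mu = np.array([1.0, -1.0])
sigma = np.array([0.5, 2.0])
z = sample_latent(mu, sigma, rng)
```

Averaged over many draws, `z` has exactly the mean and standard deviation the encoder asked for, which is the sense in which the reparameterization is equivalent to sampling from the original Gaussian.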

Qualitatively, in the faces dataset, they find features in the representation that
correspond to several interpretable concepts, like the left-right orientation of the
face, and how much it is smiling. The main problem with VAEs is that it has been
harder to produce high-quality outputs with them. It seems GANs have had way more
success on that front, though one could argue they're less statistically principled
than VAEs.