Gabriel Poesia

f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization (@ NeurIPS 2016)

Sebastian Nowozin, Botond Cseke, Ryota Tomioka


Generative Adversarial Nets implicitly minimize the Jensen-Shannon divergence between the generator's distribution and the true data distribution, and they do so without computing likelihoods. One natural question, then, is whether this framework can be applied to other divergences.

This paper shows that the original GAN is a special case of a more general family of f-GANs, which can optimize any f-divergence. An f-divergence between two distributions $p$ and $q$ is any function of the form:

\(D_f(p||q) = \int_X q(x) f\left(\frac{p(x)}{q(x)} \right) dx \)

where $f$ is required to be convex and lower semicontinuous (all discontinuities must "go up"), and to satisfy $f(1) = 0$ (which guarantees that the divergence is 0 when $p = q$). Different choices of $f$ recover different divergences, including the KL, the reverse KL, Jensen-Shannon, and many others.
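To make the definition concrete, here is a minimal sketch that evaluates the discrete analogue of the formula above (a sum instead of an integral) for three standard choices of $f$. The function names and the two example distributions are made up for illustration, and the sketch assumes every probability is strictly positive so that the ratio and the logarithms are defined.

```python
import math

def f_divergence(p, q, f):
    """Discrete analogue of D_f(p||q): sum over x of q(x) * f(p(x) / q(x)).

    Assumes p and q are lists of strictly positive probabilities
    over the same finite set of outcomes.
    """
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

# Three generator functions f, each convex with f(1) = 0.
def f_kl(u):
    # Recovers KL(p||q) = sum p log(p/q).
    return u * math.log(u)

def f_reverse_kl(u):
    # Recovers KL(q||p) = sum q log(q/p).
    return -math.log(u)

def f_js(u):
    # Recovers the Jensen-Shannon divergence (with the usual 1/2 factor).
    return 0.5 * (u * math.log(u) - (u + 1) * math.log((u + 1) / 2))

# Hypothetical example distributions over three outcomes.
p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]

kl = f_divergence(p, q, f_kl)
reverse_kl = f_divergence(p, q, f_reverse_kl)
js = f_divergence(p, q, f_js)

print("KL(p||q):", kl)
print("KL(q||p):", reverse_kl)
print("JS(p, q):", js)
```

Swapping in a different $f$ is the only change needed to get a different divergence, which is the property the f-GAN framework builds on.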

Note that directly using the above formula requires evaluating the densities $p(x)$ and $q(x)$, which is exactly what we want to avoid. However, since $f$ is convex, we can obtain a lower bound on the divergence by instead using the convex conjugate (or Fenchel conjugate) of $f$, defined as:

\(f^*(t) = \sup_{u \in \mathrm{dom}_f} \{ut - f(u)\}\)
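As a sanity check on this definition, the sketch below approximates the supremum by a maximum over a finite grid of $u$ values and compares it with the closed-form conjugate for the KL generator $f(u) = u \log u$, which is $f^*(t) = \exp(t - 1)$. The grid range and resolution are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math

def conjugate_on_grid(f, t, u_grid):
    """Approximate f*(t) = sup_u {u*t - f(u)} by a max over a finite grid of u.

    Because the grid is a subset of the domain, this never exceeds the true supremum.
    """
    return max(u * t - f(u) for u in u_grid)

def f_kl(u):
    # Generator of the KL divergence.
    return u * math.log(u)

def f_kl_conjugate(t):
    # Closed-form Fenchel conjugate of u*log(u); the maximizer is u = exp(t - 1).
    return math.exp(t - 1)

# Grid over (0, 20] with step 0.001 (an arbitrary choice for this sketch).
u_grid = [i / 1000 for i in range(1, 20001)]

for t in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    approx = conjugate_on_grid(f_kl, t, u_grid)
    exact = f_kl_conjugate(t)
    print(f"t={t:+.1f}  grid={approx:.6f}  closed form={exact:.6f}")
```

The grid only covers $u \leq 20$, so the approximation breaks down once the maximizer $\exp(t - 1)$ leaves that range (roughly $t > 4$).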

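It can be shown that $(f^*)^* = f$ when $f$ is convex and lower semicontinuous. Replacing $f$ by $(f^*)^*$ above turns the integrand into a pointwise supremum over $t$. Exchanging the integral and the supremum (which can only decrease the value), and restricting the supremum to a parameterized function $T(x)$ (that we'll later optimize over), we get: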

\(D_f(p||q) \geq \sup_{T} \left[ \mathbb{E}_{x \sim P} T(x) - \mathbb{E}_{x \sim Q} f^*(T(x)) \right] \)

Here, $P$ is the true data distribution, $Q$ is the distribution of our generator, and $T$ plays the role of the discriminator in the original GAN.
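The bound can be checked numerically on a small discrete example. The sketch below uses the KL generator, whose conjugate is $f^*(t) = \exp(t - 1)$, and represents $T$ as a plain table of values (one per outcome) rather than a neural network. It relies on the fact that the bound is tight at $T^*(x) = f'(p(x)/q(x))$, which for KL is $1 + \log(p(x)/q(x))$. The distributions, the perturbation scheme, and the function names are all assumptions made for illustration.

```python
import math
import random

def variational_bound(p, q, T, f_conj):
    """E_{x~P}[T(x)] - E_{x~Q}[f*(T(x))] for discrete p, q and a table T of values."""
    expect_p = sum(px * tx for px, tx in zip(p, T))
    expect_q = sum(qx * f_conj(tx) for qx, tx in zip(q, T))
    return expect_p - expect_q

def kl_conjugate(t):
    # Fenchel conjugate of f(u) = u*log(u).
    return math.exp(t - 1)

# Hypothetical example distributions over three outcomes (all strictly positive).
p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]
kl = sum(px * math.log(px / qx) for px, qx in zip(p, q))

# Optimal "discriminator" for KL: T*(x) = f'(p/q) = 1 + log(p/q).
T_opt = [1 + math.log(px / qx) for px, qx in zip(p, q)]
tight = variational_bound(p, q, T_opt, kl_conjugate)

# Any other T can only give a smaller (or equal) value.
rng = random.Random(0)
perturbed = [
    variational_bound(p, q, [t + rng.uniform(-1, 1) for t in T_opt], kl_conjugate)
    for _ in range(5)
]

print("KL(p||q):          ", kl)
print("bound at T*:       ", tight)
print("bounds at other T: ", perturbed)
```

In an actual f-GAN, the table `T` would be replaced by a network trained to maximize this quantity from samples, while the generator is trained to minimize it.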