The ELBO (evidence lower bound) is a lower bound on the log-likelihood that is especially useful in variational inference, where inference (computing probabilities) is cast as an optimization problem.

The most standard derivation of the ELBO comes from Jensen's inequality. We have:

\begin{align*}
\log p(x) &= \log \int_Z p(x, z) \, dz \\
&= \log \int_Z q(z) \frac{p(x,z)}{q(z)} \, dz \\
&= \log \mathbb{E}_{z \sim q(z)} \frac{p(x,z)}{q(z)} \\
&\geq \mathbb{E}_{z \sim q(z)} \log \frac{p(x,z)}{q(z)} \\
&= \mathbb{E}_{z \sim q(z)} \log \frac{p(x|z)p(z)}{q(z)} \\
&= \mathbb{E}_{z \sim q(z)} \log p(x|z) + \mathbb{E}_{z \sim q(z)} \log \frac{p(z)}{q(z)} \\
&= \mathbb{E}_{z \sim q(z)} \log p(x|z) - \mathbb{E}_{z \sim q(z)} \log \frac{q(z)}{p(z)} \\
&= \mathbb{E}_{z \sim q(z)} \log p(x|z) - D_{KL}\left(q(z) \,\|\, p(z)\right)
\end{align*}

where the inequality follows from Jensen's inequality (since $\log$ is concave), and the remaining steps are algebraic rewrites using definitions. The ELBO is the final expression, which is a lower bound on the log probability of $x$.
Thus, *maximizing* the ELBO over the variational distribution $q(z)$ makes the bound tighter.
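The bound can be checked numerically. Below is a minimal sketch using an assumed toy conjugate model, $p(z) = \mathcal{N}(0, 1)$ and $p(x|z) = \mathcal{N}(z, 1)$, chosen because its marginal $p(x) = \mathcal{N}(0, 2)$ and its posterior $p(z|x) = \mathcal{N}(x/2, 1/2)$ are available in closed form, so we can compare the ELBO against the exact $\log p(x)$ (the function names `log_normal` and `elbo` are illustrative, not from any library):

```python
import math
import random

# Toy conjugate model (assumed for illustration):
#   p(z)   = N(0, 1)   prior
#   p(x|z) = N(z, 1)   likelihood
# so the marginal is p(x) = N(0, 2), and the posterior is p(z|x) = N(x/2, 1/2).

def log_normal(v, mean, var):
    """Log density of N(mean, var) at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def elbo(x, m, s, n_samples=100_000, seed=0):
    """Monte Carlo ELBO for the variational choice q(z) = N(m, s^2)."""
    rng = random.Random(seed)
    # E_{z~q}[log p(x|z)], estimated by sampling z from q
    recon = 0.0
    for _ in range(n_samples):
        z = m + s * rng.gauss(0, 1)        # z ~ q(z)
        recon += log_normal(x, z, 1.0)     # log p(x|z)
    recon /= n_samples
    # KL(N(m, s^2) || N(0, 1)) has a closed form for Gaussians
    kl = 0.5 * (s * s + m * m - 1.0 - math.log(s * s))
    return recon - kl

x = 1.5
print(log_normal(x, 0.0, 2.0))            # exact log p(x)
print(elbo(x, 0.0, 1.0))                  # q = prior: a loose lower bound
print(elbo(x, x / 2, math.sqrt(0.5)))     # q = exact posterior: bound is tight
```

With $q$ equal to the prior the ELBO sits strictly below $\log p(x)$; with $q$ equal to the true posterior the bound is attained (up to Monte Carlo noise), illustrating that maximizing over $q$ tightens it.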

Some things to note. First, $x$ is a free variable in the expression above, which means that
the bound holds *for all* $x$. Second, evaluating the ELBO only requires three things: the ability to sample from $q(z)$
(which is typically a simple distribution of known density, like a Gaussian);
the ability to evaluate $p(x|z)$, which is given by the generative process we specify for producing $x$ from $z$;
and the ability to compute the KL divergence (which, in simple cases, like between two Gaussians,
can be done analytically, and otherwise can be estimated from Monte Carlo samples).
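The two routes to the KL term can be compared directly. As a sketch (function names are illustrative), here is the closed-form KL between two univariate Gaussians next to a Monte Carlo estimate of the same quantity, $\mathbb{E}_{z \sim q}[\log q(z) - \log p(z)]$:

```python
import math
import random

def kl_gauss(m_q, s_q, m_p, s_p):
    """Closed-form KL(N(m_q, s_q^2) || N(m_p, s_p^2))."""
    return (math.log(s_p / s_q)
            + (s_q ** 2 + (m_q - m_p) ** 2) / (2 * s_p ** 2)
            - 0.5)

def kl_mc(m_q, s_q, m_p, s_p, n=200_000, seed=0):
    """Monte Carlo estimate: E_{z~q}[log q(z) - log p(z)]."""
    rng = random.Random(seed)

    def logpdf(z, m, s):
        return -0.5 * math.log(2 * math.pi * s * s) - (z - m) ** 2 / (2 * s * s)

    total = 0.0
    for _ in range(n):
        z = m_q + s_q * rng.gauss(0, 1)    # sample from q
        total += logpdf(z, m_q, s_q) - logpdf(z, m_p, s_p)
    return total / n

print(kl_gauss(0.5, 0.8, 0.0, 1.0))       # analytic value
print(kl_mc(0.5, 0.8, 0.0, 1.0))          # Monte Carlo estimate, close to it
```

In simple cases the analytic form is preferred (it is exact and adds no gradient variance); the Monte Carlo route is what remains available when $q$ or $p$ is not a distribution with a known closed-form KL.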