Variational Autoencoders (VAEs) are a family of latent variable deep generative models. They are trained by optimizing the Evidence Lower Bound (ELBO) with an encoder-decoder architecture. In the ELBO, $q(z)$ is defined as $q_\phi(z|x)$, i.e. an amortized model in which a neural network (the encoder) maps $x$ to the variational parameters of a distribution over $z$, for example the mean and standard deviation of a Gaussian; $p_\theta(x|z)$ is the decoder, mapping $z$ back to $x$. Training proceeds as follows: (1) sample a data point $x$ from the dataset, whose empirical distribution approximates $p(x)$; (2) compute the variational parameters with the encoder and sample $z'$ from $q_\phi(z|x)$; (3) reconstruct $x$ from $z'$ using $p_\theta(x|z')$; (4) optimize the ELBO with gradient updates on $\phi$ and $\theta$. The sampling of $z'$ is done via the reparametrization trick: noise is drawn from a standard Gaussian and then deterministically shifted and rescaled using the variational parameters, so that gradients can flow through the sample to $\phi$. This typically yields lower-variance gradient estimates than REINFORCE (which is always an option).
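
The training steps above can be sketched in NumPy as a single-sample ELBO estimate, $\log p_\theta(x|z') - \mathrm{KL}(q_\phi(z|x) \,\|\, p(z))$. This is a minimal sketch under several assumptions that are not in the text: the encoder and decoder are hypothetical linear maps (`W_mu`, `W_logvar`, `W_dec`), the variational distribution is a diagonal Gaussian, the prior is a standard Gaussian, and the decoder is a unit-variance Gaussian. The gradient step is omitted; in an autodiff framework the gradient would flow through `mu` and `logvar`, because the noise `eps` does not depend on the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    """Hypothetical linear encoder: maps x to the mean and log-variance of q(z|x)."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    """Reparametrization trick: z' = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def elbo_estimate(x, W_mu, W_logvar, W_dec, rng):
    """Single-sample ELBO estimate per data point (assumed unit-variance Gaussian decoder)."""
    mu, logvar = encode(x, W_mu, W_logvar)      # step (2): variational parameters
    z = reparameterize(mu, logvar, rng)         # step (2): sample z'
    x_hat = z @ W_dec                           # step (3): reconstruct x from z'
    d = x.shape[-1]
    log_px_given_z = (-0.5 * np.sum((x - x_hat) ** 2, axis=-1)
                      - 0.5 * d * np.log(2.0 * np.pi))
    return log_px_given_z - gaussian_kl(mu, logvar)

# Toy usage: 4 data points of dimension 5, latent dimension 2 (arbitrary sizes).
x = rng.standard_normal((4, 5))
W_mu = 0.1 * rng.standard_normal((5, 2))
W_logvar = 0.1 * rng.standard_normal((5, 2))
W_dec = 0.1 * rng.standard_normal((2, 5))
elbo = elbo_estimate(x, W_mu, W_logvar, W_dec, rng)  # one value per data point
```

The closed-form KL is available here only because both $q_\phi(z|x)$ and the prior are assumed Gaussian; with other choices it would typically be estimated from the same sample $z'$.
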
There are many variants of the standard VAE. The $\beta$-VAE multiplies the KL term by a weight $\beta > 1$ to encourage disentangled representations. The IWAE (importance weighted autoencoder) obtains a tighter bound on $\log p(x)$ by averaging importance weights over multiple Monte Carlo samples of $z$. The GMVAE uses a more flexible prior: a Gaussian mixture instead of a single Gaussian. VAEs can also be used in a semi-supervised setting, where only a small number of classification labels are available.
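
As a rough illustration of how the first three variants change the objective, the sketch below writes each one as a small NumPy function. The function names and argument layouts are hypothetical, and the mixture prior is assumed to have unit-variance components for brevity; none of this is taken from a particular implementation.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)

def beta_elbo(log_px_given_z, kl, beta):
    """beta-VAE objective: the KL term is weighted by beta (beta = 1 gives the usual ELBO)."""
    return log_px_given_z - beta * kl

def iwae_bound(log_w):
    """IWAE bound from K log importance weights.

    log_w has shape (K, batch), with log_w[k] = log p(x, z_k) - log q(z_k | x).
    Returns log( (1/K) * sum_k exp(log_w[k]) ) for each data point.
    """
    K = log_w.shape[0]
    return logsumexp(log_w, axis=0) - np.log(K)

def log_mixture_prior(z, log_pi, means):
    """log p(z) under a mixture of unit-variance Gaussians (a simplified GMVAE-style prior).

    z: (batch, d), log_pi: (C,) log mixture weights, means: (C, d).
    """
    d = z.shape[-1]
    diff = z[:, None, :] - means[None, :, :]                  # (batch, C, d)
    log_comp = -0.5 * np.sum(diff**2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)
    return logsumexp(log_pi[None, :] + log_comp, axis=1)      # (batch,)
```

With $K = 1$ the IWAE bound reduces to the single-sample ELBO estimate, and by Jensen's inequality it is never smaller than the average of the $K$ log-weights. With a mixture prior the KL term generally has no closed form, so one option is to estimate it as $\log q_\phi(z'|x) - \log p(z')$ at the sampled $z'$.
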