Creating separable encodings of images
Figure 11.1 shows sample images from the CIFAR-10 dataset, alongside output from an early VAE algorithm that generates fuzzy versions of these images from a random number input:

Figure 11.1: CIFAR-10 sample (left), VAE (right)
More recent work on VAE networks has allowed these models to generate much better images, as you will see later in this chapter. To start, let’s revisit the problem of generating MNIST digits and how we can extend this approach to more complex data.
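The generation process just described, feeding a random number input through a trained decoder to produce an image, can be sketched in miniature. The following is an illustrative toy, not a trained model: the "decoder" is a single linear-plus-sigmoid layer with random weights standing in for a real VAE decoder, so its output is image-shaped noise rather than a recognizable digit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained VAE decoder: one linear layer with a
# sigmoid activation. The weights are random here, so the "image" is noise.
latent_dim, image_side = 2, 28
W = rng.normal(0.0, 0.1, size=(latent_dim, image_side * image_side))
b = np.zeros(image_side * image_side)

def decode(z):
    """Map a latent code z to pixel intensities in [0, 1]."""
    logits = z @ W + b
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

# Generation = draw a random latent code, then decode it into an image.
z = rng.standard_normal(latent_dim)
image = decode(z).reshape(image_side, image_side)
print(image.shape)  # (28, 28)
```

In a real VAE, `decode` would be a deep network whose weights were fit to a dataset such as MNIST or CIFAR-10; the sampling step, however, looks exactly like this.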
Early successes in using neural networks for image generation relied upon an architecture known as a Restricted Boltzmann Machine (RBM). In essence, an RBM learns the probability distribution of images (x) given some latent "code" (z) represented by the hidden layer(s) of the network, along with the "marginal likelihood" of x:

p(x) = Σ_z p(x|z)p(z)

We can see z as being an "encoding" of the image x...
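To make the marginal likelihood concrete, consider a deliberately tiny, assumed-for-illustration setup: binary images with three pixels and a discrete latent code z with two states. Because everything is enumerable, p(x) = Σ_z p(x|z)p(z) can be computed exactly by summing over the latent states:

```python
import numpy as np

# Toy model (assumed for illustration): a uniform prior over two latent
# states, each defining per-pixel "on" probabilities for a 3-pixel image.
p_z = np.array([0.5, 0.5])             # prior p(z)
pixel_on = np.array([[0.9, 0.9, 0.1],  # p(pixel=1 | z=0)
                     [0.1, 0.1, 0.9]]) # p(pixel=1 | z=1)

def likelihood(x, z):
    """p(x | z): pixels are independent Bernoullis given the latent state."""
    p = pixel_on[z]
    return float(np.prod(np.where(x == 1, p, 1 - p)))

def marginal(x):
    """Marginal likelihood p(x) = sum over z of p(x|z) p(z)."""
    return sum(likelihood(x, z) * p_z[z] for z in range(len(p_z)))

x = np.array([1, 1, 0])
print(round(marginal(x), 4))  # 0.5*0.729 + 0.5*0.001 = 0.365
```

In a real RBM or VAE the latent space is far too large to enumerate this way, which is precisely why these models resort to approximations such as sampling or variational bounds.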