The reparameterization trick
To backpropagate through our autoencoder, we need to express the stochastic sample z as a deterministic, differentiable function of a noise variable ε, which is drawn from a standard normal distribution:

z = μ + σ ⊙ ε,

where μ and σ are the mean and standard deviation produced by the encoder, and ⊙ denotes elementwise multiplication.
Once we have sampled ε, all of the randomness in z is accounted for: z no longer depends stochastically on the parameters of the variational distribution Q (the encoder), and we can backpropagate from end to end. Our network now looks like Figure 11.7, and we can optimize our objective using random samples of ε drawn from a fixed noise distribution (for example, a standard normal distribution). This reparameterization moves the “random” node out of the encoder/decoder framework so we can backpropagate through the whole system, but it also has a subtler advantage: it reduces the variance of the gradient estimates. Note that in the un-reparameterized network, the distribution of z depends on the parameters of the encoder distribution...
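The sampling step itself can be sketched in a few lines. This is a minimal NumPy illustration, not the text's implementation; the function name `reparameterize` and the log-variance parameterization of the encoder output are choices made here for the sketch:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, I).

    mu and log_var stand in for the encoder's outputs (predicting the
    log-variance is a common convention). All randomness lives in eps,
    so z is a deterministic, differentiable function of mu and log_var.
    """
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)  # the only stochastic node
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
log_var = np.array([0.0, 0.0])  # sigma = 1 in both dimensions
samples = np.stack([reparameterize(mu, log_var, rng) for _ in range(20000)])
print(samples.mean(axis=0))  # close to mu
print(samples.std(axis=0))   # close to sigma
```

Because the gradient path from z back to μ and σ is the deterministic expression μ + σ ⊙ ε, an automatic-differentiation framework can propagate gradients through it, with ε treated as a constant input for each sample.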