Variational autoencoder

What is a variational autoencoder?

A variational autoencoder (VAE) is an autoencoder whose training is regularized to prevent overfitting and to give the latent space properties that support a generative process.

It is a generative system and serves a purpose similar to that of a generative adversarial network.

Like a standard autoencoder, a variational autoencoder is an architecture that consists of an encoder and a decoder, and it is trained to minimise the reconstruction error between the encoded-decoded data and the initial data. To regularise the latent space, however, the encoding-decoding process is adjusted slightly: rather than encoding an input as a single point, the model encodes it as a distribution over the latent space.

Here is how the model is trained:

First, the input is encoded as a distribution over the latent space.
After that, a point in the latent space is sampled from that distribution.
Next, the sampled point is decoded and the reconstruction error is computed.
Finally, the reconstruction error is backpropagated through the network.
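The forward part of this procedure can be sketched in a few lines. The following is only a toy illustration, not a real model: the `encode` and `decode` functions are hypothetical hand-written stand-ins for neural networks, the reconstruction error is assumed to be a squared error, and the backpropagation step is omitted because it would normally be handled by a deep learning framework.

```python
import random

def encode(x):
    """Hypothetical toy encoder: maps a 2-D input to the mean and
    standard deviation of a 1-D Gaussian over the latent space."""
    mu = [0.5 * x[0] + 0.5 * x[1]]
    sigma = [0.1 + 0.05 * abs(x[0] - x[1])]  # always positive
    return mu, sigma

def sample(mu, sigma, rng):
    """Draw one latent point from the Gaussian described by mu and sigma."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def decode(z):
    """Hypothetical toy decoder: maps a 1-D latent point back to 2-D."""
    return [z[0], z[0]]

def reconstruction_error(x, x_hat):
    """Squared error between the input and its reconstruction (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

rng = random.Random(0)
x = [1.0, 3.0]

mu, sigma = encode(x)                    # encode the input as a distribution
z = sample(mu, sigma, rng)               # sample a point from that distribution
x_hat = decode(z)                        # decode the sampled point
error = reconstruction_error(x, x_hat)   # compute the reconstruction error
# Backpropagating `error` through the encoder and decoder is omitted here.
print(mu, sigma, z, x_hat, error)
```

In a real model the encoder and decoder would be trainable networks, and the last step would update their weights.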

The input is encoded as a distribution with some variance, rather than as a single point, so that the regularisation of the latent space can be expressed naturally: the distributions returned by the encoder are constrained to be close to a standard normal distribution.

The loss function that is minimized has two parts. A reconstruction term on the final layer makes the encoding-decoding scheme as accurate as possible. A regularization term on the latent layer organises the latent space by pushing the distributions returned by the encoder towards a standard normal distribution.

The regularization term is the Kullback-Leibler divergence between the returned distribution and a standard Gaussian.
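When the encoder returns a Gaussian with independent dimensions, described by a vector of means and a vector of standard deviations, this divergence has a well-known closed form: half the sum, over latent dimensions, of mean squared plus variance minus one minus the log of the variance. The sketch below computes it and adds it to a reconstruction term. The squared-error reconstruction term and the equal weighting of the two terms are assumptions made for illustration; the function names are hypothetical.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL divergence from N(mu, diag(sigma^2)) to N(0, I),
    summed over the latent dimensions."""
    return 0.5 * sum(
        m * m + s * s - 1.0 - math.log(s * s)
        for m, s in zip(mu, sigma)
    )

def vae_loss(x, x_hat, mu, sigma):
    """Reconstruction term (squared error, an assumption) plus KL term."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + kl_to_standard_normal(mu, sigma)

# The KL term vanishes only when the encoder returns a standard normal.
print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))
print(kl_to_standard_normal([1.0, 0.0], [1.0, 0.5]))
```

The second call returns a positive value because both the shifted mean and the shrunken standard deviation move the distribution away from the standard Gaussian.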

What is a variational autoencoder used for?

A variational autoencoder takes on the challenge of latent space irregularity by causing the encoder to return a distribution over the latent space rather than a single point and by introducing a regularization term to the loss function over that returned distribution to ensure that the latent space is organized in a better manner.

Why are variational autoencoders useful?

Variational autoencoders are extremely useful for the purpose of generative modeling. This is because their latent spaces are designed to be continuous, which makes the processes of random sampling and interpolation very easy.

To achieve this, the encoder does not output a single encoding vector of size n. Instead, it outputs two vectors of size n: a vector of means, μ, and a vector of standard deviations, σ.

Together these are the parameters of a vector of n random variables: the i-th elements of μ and σ are the mean and standard deviation of the i-th random variable, X_i. Sampling from these random variables gives the encoding that is passed on to the decoder.

Even though the means and standard deviations stay the same for a given input, the actual encoding changes on each pass because of the sampling.

The mean vector has control over where the encoding of an input should be centered around, while the standard deviation controls the area, which is the extent to which the encoding can vary from the mean. Since encodings can be generated randomly from anywhere in the distribution, the decoder learns that it is not just a single point in the latent space that refers to a sample of that class, but all the nearby points refer to it as well.
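A small experiment makes the roles of the two vectors concrete. The sketch below draws many encodings for one fixed pair of mean and standard deviation vectors, then checks where the draws are centered and how far they spread. Each coordinate is drawn as the mean plus the standard deviation times standard normal noise, which is one common way to sample from a Gaussian; the values and the function name are made up for illustration.

```python
import random
import statistics

def sample_encoding(mu, sigma, rng):
    """Draw one encoding: coordinate i is mu[i] + sigma[i] * noise,
    where noise is standard normal."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

rng = random.Random(0)
mu = [0.0, 2.0]      # where the encodings are centered
sigma = [1.0, 0.1]   # how far they may stray from the center

draws = [sample_encoding(mu, sigma, rng) for _ in range(10000)]

for i in range(len(mu)):
    values = [d[i] for d in draws]
    print(
        f"dimension {i}: "
        f"mean ~ {statistics.mean(values):.2f}, "
        f"std ~ {statistics.pstdev(values):.2f}"
    )
```

No two draws are identical, yet the empirical mean and spread of each dimension stay close to the corresponding entries of the two vectors.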

This makes it possible for the decoder to do more than decode single, specific encodings in the latent space, which would leave the decodable latent space discontinuous. It can also decode encodings that vary slightly, because during training it was exposed to several variations of the encoding of the same input.

Because the model is exposed to a certain degree of local variation through the varying encoding of each sample, the latent space becomes smooth on a local scale, that is, for similar samples. Ideally there should also be some overlap between samples that are not very similar, so that interpolation between classes is possible.

We want encodings which are as close as possible to each other, while remaining distinct to allow smooth interpolation and enable the construction of new samples. This can be forced by adding the Kullback–Leibler divergence (KL divergence) into the loss function.

Minimizing the KL divergence optimizes the parameters of the probability distribution (μ and σ) so that it closely resembles the target distribution.
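To see this effect in isolation, the sketch below runs plain gradient descent on the KL divergence alone for a single latent dimension, with a standard normal as the target. The starting values and learning rate are arbitrary choices for illustration. In a full VAE the reconstruction term pulls in the other direction, so the parameters do not collapse all the way to the target as they do here.

```python
import math

def kl(mu, sigma):
    """KL divergence from N(mu, sigma^2) to N(0, 1) for one dimension."""
    return 0.5 * (mu * mu + sigma * sigma - 1.0 - math.log(sigma * sigma))

def kl_gradient(mu, sigma):
    """Partial derivatives of kl with respect to mu and sigma."""
    return mu, sigma - 1.0 / sigma

mu, sigma = 3.0, 0.2   # arbitrary starting point, far from the target
learning_rate = 0.1
history = [kl(mu, sigma)]

for _ in range(200):
    grad_mu, grad_sigma = kl_gradient(mu, sigma)
    mu -= learning_rate * grad_mu
    sigma -= learning_rate * grad_sigma
    history.append(kl(mu, sigma))

print(f"start KL = {history[0]:.3f}, final KL = {history[-1]:.2e}")
print(f"final mu = {mu:.6f}, final sigma = {sigma:.6f}")
```

The mean is driven towards 0 and the standard deviation towards 1, which are the parameters of the standard normal target.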

An equilibrium is reached due to the cluster-forming nature of the reconstruction loss, and the dense packing nature of the KL loss. This causes the formation of distinct clusters that the decoder will actually be able to decode and if interpolation is being done, there will be a smooth mix of features that the decoder is able to understand, instead of sudden gaps between clusters.
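Interpolation itself is simple once the latent space is smooth. The sketch below only builds a straight-line path between two latent vectors; in practice each point on the path would be passed to a trained decoder, which is not shown here. The function name and the example vectors are hypothetical.

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the straight line
    from z_start to z_end, both endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 to 1.0
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

# Two made-up latent encodings, e.g. of samples from different classes.
path = interpolate([0.0, 0.0], [1.0, 2.0], 5)
for z in path:
    print(z)  # each z would be decoded to obtain an intermediate sample
```

If the latent space had gaps between clusters, some of the intermediate points would fall in regions the decoder never saw during training.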

What is the difference between an autoencoder and a variational autoencoder?

An autoencoder accepts inputs, compresses them, and then proceeds to recreate the original data. Since all that is needed is the original data, without labels of known correct results, it is an unsupervised technique.

Autoencoders are primarily used to compress data to two or three dimensions so that it can be graphed, and to compress and decompress images or documents, which removes noise from the data.

Variational autoencoders assume that the source data has some underlying probability distribution and attempt to identify the parameters of that distribution. They are used mainly to generate new data that is related to the original source data.

Implementing a variational autoencoder is considerably harder than implementing a standard autoencoder.

What is the most crucial drawback of VAEs?

The biggest drawback of variational autoencoders is that they have a tendency to generate blurry, unrealistic outputs. This is related to the manner in which VAEs recover data distributions and calculate loss functions.