2. What are Autoencoders?
• An autoencoder neural network is an unsupervised machine
learning algorithm that applies backpropagation, setting the target
values to be equal to the inputs.
• Autoencoders are used to reduce the size of our inputs into a smaller
representation.
• If anyone needs the original data, they can reconstruct an
approximation of it from the compressed data.
3. Autoencoder
• An autoencoder is an unsupervised machine learning algorithm that takes an image
as input and tries to reconstruct it from fewer bits at the bottleneck,
also known as the latent space.
• The image is most heavily compressed at the bottleneck.
• The compression in autoencoders is achieved by training the network for a period of
time; as it learns, it tries to best represent the input image at the bottleneck.
• General-purpose image compression algorithms such as JPEG and lossless JPEG,
by contrast, compress images without the need for any kind of
training and do fairly well in compressing the images.
• Autoencoders are similar to dimensionality reduction techniques like Principal
Component Analysis (PCA).
• Such techniques project the data from a higher dimension to a lower dimension (PCA
does so with a linear transformation) and try to preserve the important features of the
data while removing the non-essential parts.
4. • However, the major difference between autoencoders and PCA lies in the transformation:
PCA uses a linear transformation, whereas autoencoders use non-linear transformations.
• Now that you have a bit of understanding about autoencoders, let's break the term down
and try to get some intuition about it!
• The above figure is a two-layer vanilla autoencoder with one hidden layer.
• In deep learning terminology, the input layer is usually not counted when stating
the total number of layers in an architecture.
• The total comprises only the hidden layers and the output layer.
5. • As shown in the image above, the input and output layers have the same number
of neurons.
• Let's take an example. You feed an image with just five pixel values into the
autoencoder, and the encoder compresses it into three values at the
bottleneck (middle layer), or latent space.
• Using these three values, the decoder tries to reconstruct the five pixel values,
that is, the input image which you fed to the network.
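The five-to-three-to-five example above can be sketched as a single forward pass. This is only an illustration: the pixel values, the random (untrained) weights and the sigmoid activation are assumptions, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Five made-up "pixel" values in [0, 1].
x = np.array([0.1, 0.9, 0.4, 0.7, 0.2])

# Randomly initialised, untrained weights for a 5 -> 3 -> 5 network.
W_enc, b_enc = rng.normal(size=(3, 5)), np.zeros(3)
W_dec, b_dec = rng.normal(size=(5, 3)), np.zeros(5)

code = sigmoid(W_enc @ x + b_enc)               # bottleneck: three values
reconstruction = sigmoid(W_dec @ code + b_dec)  # output: five values again

print(code.shape, reconstruction.shape)  # (3,) (5,)
```

Before training, the reconstruction is essentially random; training adjusts the weights so that the five outputs approach the five inputs.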
6. The need for Autoencoders
PCA is a similar machine learning algorithm that performs the same task, but
autoencoders are often preferred over PCA because:
• An autoencoder can learn non-linear transformations
with a non-linear activation function and multiple layers.
• It is not restricted to dense layers. It can use
convolutional layers, which are better suited to video,
image and series data.
• It can be more efficient to learn several layers with an
autoencoder than to learn one huge transformation
with PCA.
• An autoencoder provides a representation at each layer
as an output.
• It can make use of pre-trained layers from another model,
applying transfer learning to enhance the
encoder/decoder.
7. Need for Autoencoders
• Used for non-linear dimensionality reduction. The hidden layer encodes the input
in a smaller dimension than the input dimension. The hidden layer is later
decoded as the output, which has the same dimension as the input. An autoencoder
can reduce the dimensionality of both linear and nonlinear data, so it is more
powerful than PCA.
• Used in recommendation engines. These use deep encoders to understand user
preferences in order to recommend movies, books or other items.
• Used for feature extraction: an autoencoder tries to minimize the reconstruction error.
In the process of reducing the error, it learns some of the important features present in the
input. It reconstructs the input from the encoded state present in the hidden layer.
Encoding generates a new set of features that is a combination of the original
features, which helps to identify the latent features present in the
input data.
• Image recognition: stacked autoencoders are used for image recognition. Stacking
multiple encoders together helps to learn different features of an image.
9. Image Coloring
Autoencoders are used for converting any black and white picture into
a colored image.
Depending on what is in the picture, it is possible to tell what the color
should be.
10. Feature Variation
It extracts only the required features of an image and generates the
output by removing any noise or unnecessary interruption.
11. Dimensionality Reduction
The encoded representation describes the same input but with reduced
dimensions. It helps in providing a similar image with a reduced number
of pixel values.
12. Denoising Image
The input seen by the autoencoder is not the raw input but a
stochastically corrupted version. A denoising autoencoder is thus
trained to reconstruct the original input from the noisy version.
13. Watermark Removal
Autoencoders are also used for removing watermarks from images, or for
removing an unwanted object from video or film footage.
Now that you have an idea of the different industrial applications of
Autoencoders, let’s continue our article and understand the complex
architecture of Autoencoders.
15. An autoencoder consists of three parts:
• Encoder
• Code
• Decoder
Encoder: This part of the network compresses the input into a latent space
representation. The encoder encodes the input image as a compressed
representation in a reduced dimension. The compressed image is a distorted
version of the original image.
Code: This part of the network represents the compressed input which is fed to
the decoder.
Decoder: This part decodes the encoded image back to the original dimension.
The decoded image is a lossy reconstruction of the original image, and it is
reconstructed from the latent space representation.
16. The layer between the encoder and decoder, i.e. the
code, is also known as the bottleneck. This is a well-designed
approach to decide which aspects of the
observed data are relevant information and which
aspects can be discarded. It does this by balancing two
criteria:
• The compactness of the representation, measured as its
compressibility.
• The retention of some behaviorally relevant variables from
the input.
Now that you have an idea of the architecture of an
autoencoder, let’s continue our article and understand
the different properties and the hyperparameters
involved in training autoencoders.
17. Autoencoder Components
• Autoencoders consist of 4 main parts:
– Encoder: the part in which the model learns how to reduce the input dimensions
and compress the input data into an encoded representation.
– Bottleneck: the layer that contains the compressed
representation of the input data. This is the lowest-dimensional representation
of the input data.
– Decoder: the part in which the model learns how to reconstruct the data from the
encoded representation to be as close to the original input as possible.
– Reconstruction loss: the method that measures how well the decoder is
performing and how close the output is to the original input.
• Training then uses backpropagation to minimize the network’s
reconstruction loss.
18. How do Autoencoders work?
• We take the input and encode it to identify a latent feature representation, then decode the
latent feature representation to recreate the input.
• We calculate the loss by comparing the input and output.
• To reduce the reconstruction error, we backpropagate and update the weights.
• Each weight is updated based on how much it is responsible for the error.
• In our example, we use a dataset of products bought by customers.
• Step 1: Take the first row of the customer data, covering all products, as an array for
the input. 1 represents that the customer bought the product; 0 represents that the
customer did not buy the product.
• Step 2: Encode the input into another vector h, which has a lower dimension than the
input. We can use the sigmoid activation function for h, as its output ranges from 0 to 1. W is the
weight applied to the input and b is the bias term.
h = f(Wx + b)
20. • Step 3: Decode the vector h to recreate the input. The output has the same dimension as the
input.
• Step 4: Calculate the reconstruction error L. The reconstruction error is the difference between
the input and output vectors. Our goal is to minimize the reconstruction error so that the output is
similar to the input vector.
• Reconstruction error = input vector - output vector
• Step 5: Backpropagate the error from the output layer to the input layer to update the weights.
Weights are updated based on how much they were responsible for the error.
• The learning rate decides by how much we update the weights.
• Step 6: Repeat steps 1 through 5 for each observation in the dataset. Weights
are updated after each observation.
• Step 7: Repeat for more epochs. An epoch is completed when all the rows in the dataset have passed
through the neural network.
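Steps 1 to 7 can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions: the purchase matrix, the hidden size of 2, the learning rate, the number of epochs and the squared-error form of L are all made up for the example and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy purchase matrix: rows are customers, columns are products (1 = bought).
X = np.array([
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
], dtype=float)

n_in, n_hidden = X.shape[1], 2
W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_in, n_hidden))
b2 = np.zeros(n_in)
learning_rate = 0.2

def mean_loss():
    """Average squared reconstruction error over all rows."""
    H = sigmoid(X @ W1.T + b1)
    Y = sigmoid(H @ W2.T + b2)
    return float(np.mean(np.sum((X - Y) ** 2, axis=1)))

initial_loss = mean_loss()

for epoch in range(2000):                  # Step 7: repeat for more epochs
    for x in X:                            # Steps 1 and 6: one row at a time
        h = sigmoid(W1 @ x + b1)           # Step 2: encode, h = f(Wx + b)
        y = sigmoid(W2 @ h + b2)           # Step 3: decode
        err = y - x                        # Step 4: L = sum(err ** 2)
        # Step 5: backpropagate the error through both sigmoid layers.
        delta_out = 2 * err * y * (1 - y)
        delta_hid = (W2.T @ delta_out) * h * (1 - h)
        W2 -= learning_rate * np.outer(delta_out, h)
        b2 -= learning_rate * delta_out
        W1 -= learning_rate * np.outer(delta_hid, x)
        b1 -= learning_rate * delta_hid

final_loss = mean_loss()
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Updating the weights after every row corresponds to stochastic gradient descent; frameworks such as Keras usually update after a mini-batch instead.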
22. Properties of Autoencoders:
• Data-specific: Autoencoders are only able to compress data similar to
what they have been trained on.
• Lossy: The decompressed outputs will be degraded compared to the
original inputs.
• Learned automatically from examples: It is easy to train specialized
instances of the algorithm that will perform well on a specific type of
input.
24. Code size: It represents the number of nodes in the middle layer.
Smaller size results in more compression.
Number of layers: The autoencoder can consist of as many layers as we
want.
Number of nodes per layer: The number of nodes per layer decreases
with each subsequent layer of the encoder, and increases back in the
decoder. The decoder is symmetric to the encoder in terms of the layer
structure.
Loss function: We either use mean squared error or binary cross-
entropy. If the input values are in the range [0, 1] then we typically use
cross-entropy, otherwise, we use the mean squared error.
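As a rough illustration of the two loss functions named above, the sketch below computes both for a close and a poor reconstruction. The helper names and the sample values are assumptions made for the example.

```python
import numpy as np

def mean_squared_error(x, x_hat):
    return float(np.mean((x - x_hat) ** 2))

def binary_cross_entropy(x, x_hat, eps=1e-7):
    x_hat = np.clip(x_hat, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat)))

x = np.array([0.0, 1.0, 1.0, 0.0])     # input scaled to [0, 1]
good = np.array([0.1, 0.9, 0.8, 0.2])  # close reconstruction
bad = np.array([0.9, 0.2, 0.3, 0.7])   # poor reconstruction

print(mean_squared_error(x, good), mean_squared_error(x, bad))
print(binary_cross_entropy(x, good), binary_cross_entropy(x, bad))
```

Both losses are lower for the closer reconstruction; cross-entropy is only meaningful when inputs and outputs lie in [0, 1].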
26. Undercomplete Autoencoders
– The goal of the autoencoder is to capture the most
important features present in the data.
– Undercomplete autoencoders have a smaller dimension
for the hidden layer than for the input layer. This helps to
obtain important features from the data.
– The objective is to minimize the loss function by penalizing
g(f(x)) for being different from the input x.
– When the decoder is linear and we use a mean squared
error loss function, an undercomplete autoencoder
learns a reduced feature space similar to PCA.
– We get a powerful nonlinear generalization of PCA
when the encoder function f and decoder function g are
nonlinear.
– Undercomplete autoencoders do not need any
regularization, as they maximize the probability of the data
rather than copying the input to the output.
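To make the PCA connection concrete, the sketch below treats PCA as a linear encoder f(x) = Wᵀx and a linear decoder g(h) = Wh and measures the mean squared reconstruction error for different code sizes. The toy data and the function name are assumptions; a trained linear autoencoder is not shown here, only the PCA solution it is compared with.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy data
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by explained variance.
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

def pca_reconstruction_error(k):
    W = Vt[:k].T              # (5, k) projection matrix
    H = X_centered @ W        # linear "encoder": f(x) = W^T x
    X_hat = H @ W.T           # linear "decoder": g(h) = W h
    return float(np.mean((X_centered - X_hat) ** 2))

errors = [pca_reconstruction_error(k) for k in range(1, 6)]
print(errors)
```

The error shrinks as the code size k grows and vanishes when k equals the input dimension, which is the same trade-off an undercomplete autoencoder faces.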
27. Convolutional Autoencoders
• Autoencoders in their traditional formulation do not take into account the fact that a signal can be
seen as a sum of other signals.
• Convolutional autoencoders use the convolution operator to exploit this observation.
• They learn to encode the input as a set of simple signals and then try to reconstruct the input from
them, or to modify the geometry or the reflectance of the image.
Use cases of CAE:
• Image reconstruction
• Image colorization
• Latent space clustering
• Generating higher resolution images
28. Sparse Autoencoders
– Sparse autoencoders can have more hidden nodes than input
nodes and still discover important features from the data.
– A sparsity constraint is introduced on the hidden layer. This
prevents the output layer from simply copying the input data.
– Sparse autoencoders have a sparsity penalty, Ω(h), a value close
to zero but not zero. The sparsity penalty is applied to the hidden
layer in addition to the reconstruction error. This prevents
overfitting.
– Sparse autoencoders take the highest activation values in the
hidden layer and zero out the rest of the hidden nodes. This
prevents the autoencoder from using all of the hidden nodes at once
and forces only a reduced number of hidden nodes to be used.
– Because different hidden nodes are activated and deactivated for each row in the
dataset, each hidden node learns to extract a feature from the data.
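The two sparsity mechanisms described above can be sketched as follows. The function names, the choice of an L1 norm for Ω(h) and the penalty weight are assumptions for illustration; other penalties (for example a KL-divergence term) are also used in practice.

```python
import numpy as np

def k_sparse(h, k):
    """Keep the k largest activations in each row of h and zero out the rest."""
    h = np.asarray(h, dtype=float)
    top_k = np.argsort(h, axis=1)[:, -k:]   # column indices of the k largest values
    mask = np.zeros_like(h, dtype=bool)
    np.put_along_axis(mask, top_k, True, axis=1)
    return np.where(mask, h, 0.0)

def l1_sparsity_penalty(h, weight=1e-3):
    """One common choice for the penalty Ω(h): a scaled L1 norm of the activations."""
    return weight * float(np.sum(np.abs(h)))

# Hidden activations for two rows of data (made-up values).
h = np.array([[0.1, 0.8, 0.3, 0.6],
              [0.9, 0.2, 0.7, 0.4]])

print(k_sparse(h, k=2))
print(l1_sparsity_penalty(h))
```

During training the penalty would be added to the reconstruction error, so the total loss is reconstruction error plus Ω(h).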
29. Denoising Autoencoders (DAE)
– Denoising autoencoders intentionally add noise to the raw input before providing
it to the network. The corruption can be achieved using a stochastic mapping.
– Denoising autoencoders create a corrupted copy of the input by introducing
some noise. This keeps the autoencoder from copying the input to the
output without learning features of the data.
– Corruption of the input can be done randomly by setting some of the input values to
zero. The remaining values are copied unchanged into the noised input.
– Denoising autoencoders must remove the corruption to generate an output
that is similar to the input. The output is compared with the original input and not with
the noised input. We keep minimizing the loss function until convergence.
– Denoising autoencoders therefore minimize the loss function between the output
and the original, uncorrupted input.
– Denoising helps the autoencoder learn the latent representation present
in the data. It ensures that a good representation is one that
can be derived robustly from a corrupted input and that will be useful for
recovering the corresponding clean input.
– A denoising autoencoder is a stochastic autoencoder, as we use a stochastic corruption
process to set some of the inputs to zero.
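The masking corruption described above can be sketched like this. The function names, the drop probability and the toy batch are assumptions; `reconstruct` is a hypothetical stand-in for a trained autoencoder's forward pass.

```python
import numpy as np

def mask_corrupt(x, drop_prob, rng):
    """Set each input value to zero independently with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def reconstruct(batch):
    """Hypothetical placeholder for the autoencoder; here it returns its input."""
    return batch

rng = np.random.default_rng(0)
x_clean = rng.random((4, 8)) + 0.1   # toy batch with strictly positive values
x_noisy = mask_corrupt(x_clean, drop_prob=0.3, rng=rng)

# The network sees x_noisy, but the loss compares its output with x_clean.
loss = float(np.mean((reconstruct(x_noisy) - x_clean) ** 2))
print(loss)
```

The key point is the last step: the corrupted batch is the input, while the clean batch is the target.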
30. Contractive Autoencoders (CAE)
• The objective of a contractive autoencoder (CAE) is a robust learned
representation that is less sensitive to small variations in the data.
• Robustness of the representation is achieved by adding
a penalty term to the loss function. The penalty term is the Frobenius
norm of the Jacobian matrix of the hidden layer, calculated with respect to
the input. The squared Frobenius norm of the Jacobian matrix is the sum of
the squares of all its elements.
• The contractive autoencoder is another regularization technique, like
sparse autoencoders and denoising autoencoders.
• CAE has been reported to surpass results obtained by regularizing an autoencoder
with weight decay or by denoising, which makes it a strong alternative to the
denoising autoencoder for learning useful features.
• The penalty term generates a mapping that strongly contracts
the data, hence the name contractive autoencoder.
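For a single sigmoid hidden layer h = sigmoid(Wx + b), the Jacobian of h with respect to x is diag(h(1 - h))·W, so the penalty has a simple closed form. The sketch below computes it and checks it against a finite-difference Jacobian. The sigmoid encoder and all names are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def contractive_penalty(W, b, x):
    """Squared Frobenius norm of dh/dx for h = sigmoid(W x + b)."""
    h = sigmoid(W @ x + b)
    # For a sigmoid layer the Jacobian is diag(h * (1 - h)) @ W.
    return float(np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1)))

def numerical_jacobian(W, b, x, eps=1e-6):
    """Finite-difference Jacobian, used only to check the closed form."""
    J = np.zeros((W.shape[0], x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (sigmoid(W @ (x + step) + b) - sigmoid(W @ (x - step) + b)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 5)), rng.normal(size=3), rng.normal(size=5)

closed_form = contractive_penalty(W, b, x)
numeric = float(np.sum(numerical_jacobian(W, b, x) ** 2))
print(closed_form, numeric)
```

In training, this value would be scaled by a hyperparameter and added to the reconstruction loss.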
31. Stacked Denoising Autoencoders
• A stacked autoencoder is a neural network with multiple layers of sparse
autoencoders.
• Adding more than one hidden layer to an
autoencoder helps to reduce high-dimensional data to a smaller code
representing important features.
• Each hidden layer is a more compact representation than the previous hidden layer.
• We can also apply denoising to the input and then pass the data through the stacked
autoencoders; these are called stacked denoising autoencoders.
• In stacked denoising autoencoders, input corruption is used only for the initial
denoising. This helps learn important features present in the data. Once the
mapping function f(θ) has been learnt, further layers use the
uncorrupted input from the previous layers.
• After training a stack of encoders as explained above, we can use the output
of the stacked denoising autoencoders as input to a stand-alone
supervised machine learning model such as a support vector machine or multiclass
logistic regression.
32. Deep Autoencoders
• The extension of the simple Autoencoder is the Deep Autoencoder.
• The first layer of the Deep Autoencoder is used for first-order features in the raw input.
• The second layer is used for second-order features corresponding to patterns in the
appearance of first-order features.
• Deeper layers of the Deep Autoencoder tend to learn even higher-order features.
A deep autoencoder is composed of two symmetrical deep-belief networks:
• The first four or five shallow layers represent the encoding half of the net.
• The second set of four or five layers makes up the decoding half.
Use cases of Deep Autoencoders
• Image Search
• Data Compression
• Topic Modeling & Information Retrieval (IR)
33. Implementation of Autoencoder: Basic Autoencoder
Load the dataset
To start, you will train the basic autoencoder using the Fashion MNIST
dataset. Each image in this dataset is 28x28 pixels.
34. Define an autoencoder with two Dense layers: an encoder, which compresses the images
into a 64-dimensional latent vector, and a decoder, which reconstructs the original image
from the latent space.
35. • Train the model using x_train as both the input and the target.
The encoder will learn to compress the dataset from 784 dimensions to
the latent space, and the decoder will learn to reconstruct the original
images.
• Now that the model is trained, let's test it by encoding and decoding
images from the test set.
37. Example: Image denoising
• An autoencoder can also be trained to remove noise from images. In the following
section, you will create a noisy version of the Fashion MNIST dataset by applying
random noise to each image. You will then train an autoencoder using the noisy image
as input, and the original image as the target.
• Let's reimport the dataset to omit the modifications made earlier.
38. • Adding random noise to the images.
• Plot the noisy images.
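Adding random noise to a batch of images can be sketched as below. The random stand-in batch, the Gaussian noise and the `noise_factor` value are assumptions; with the real data, `x_train` would hold Fashion MNIST images scaled to [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a batch of images scaled to [0, 1].
x_train = rng.random((16, 28, 28, 1)).astype("float32")

noise_factor = 0.2  # assumed noise strength
x_train_noisy = x_train + noise_factor * rng.normal(size=x_train.shape)

# Clip so the noisy images stay valid pixel intensities.
x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0).astype("float32")

print(x_train_noisy.shape, x_train_noisy.min(), x_train_noisy.max())
```

The noisy array is used as the model input and the original array as the training target.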
39. Define a convolutional autoencoder
• In this example, you will train a convolutional autoencoder using Conv2D layers in the
encoder, and Conv2DTranspose layers in the decoder.
40. • The decoder upsamples the images back from 7x7 to 28x28.
• Plotting both the noisy images and the denoised images produced by the
autoencoder.
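The 28x28 to 7x7 to 28x28 path can be checked with simple size arithmetic. The sketch assumes two stride-2 layers with "same" padding on each side, for which a Keras Conv2D layer outputs ceil(size / stride) and a Conv2DTranspose layer outputs size * stride; the helper names are made up for the example.

```python
import math

def conv_same_output(size, stride):
    """Spatial output size of a strided convolution with 'same' padding."""
    return math.ceil(size / stride)

def conv_transpose_same_output(size, stride):
    """Spatial output size of a strided transposed convolution with 'same' padding."""
    return size * stride

size = 28
encoder_sizes = [size]
for stride in (2, 2):            # two assumed downsampling layers
    size = conv_same_output(size, stride)
    encoder_sizes.append(size)

decoder_sizes = [size]
for stride in (2, 2):            # two assumed upsampling layers
    size = conv_transpose_same_output(size, stride)
    decoder_sizes.append(size)

print(encoder_sizes, decoder_sizes)  # [28, 14, 7] [7, 14, 28]
```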
42. Example: Anomaly detection
• In this example, you will train an autoencoder to detect anomalies on the ECG5000
dataset. This dataset contains 5,000 Electrocardiograms, each with 140 data points.
You will use a simplified version of the dataset, where each example has been
labeled either 0 (corresponding to an abnormal rhythm), or 1 (corresponding to a
normal rhythm). You are interested in identifying the abnormal rhythms.
• How will you detect anomalies using an autoencoder? Recall that an autoencoder is
trained to minimize reconstruction error. You will train an autoencoder on the normal
rhythms only, then use it to reconstruct all the data. Our hypothesis is that the abnormal
rhythms will have higher reconstruction error. You will then classify a rhythm as an
anomaly if the reconstruction error surpasses a fixed threshold.
• Load ECG data
48. • You will soon classify an ECG as anomalous if the reconstruction error is
more than one standard deviation above the mean error of the normal training examples.
First, let's plot a normal ECG from the training set, the reconstruction after
it's encoded and decoded by the autoencoder, and the reconstruction
error.
50. Detect anomalies
• Detect anomalies by calculating whether the reconstruction loss is greater
than a fixed threshold. In this tutorial, you will calculate the mean absolute
error for normal examples from the training set, then classify future
examples as anomalous if the reconstruction error is more than one
standard deviation above the training-set mean.
• Plot the reconstruction error on normal ECGs from the training set.
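The thresholding rule can be sketched as follows. The random arrays are stand-ins for ECGs and their reconstructions (a trained autoencoder is not included), and the helper names are assumptions.

```python
import numpy as np

def mean_absolute_error(x, x_hat):
    """Per-example mean absolute reconstruction error."""
    return np.mean(np.abs(x - x_hat), axis=1)

rng = np.random.default_rng(0)

# Made-up stand-ins for normal ECGs (140 points each) and their reconstructions.
normal_train = rng.normal(size=(100, 140))
normal_recon = normal_train + rng.normal(scale=0.05, size=normal_train.shape)

train_loss = mean_absolute_error(normal_train, normal_recon)
threshold = float(np.mean(train_loss) + np.std(train_loss))

def is_anomaly(x, x_hat, threshold):
    """Flag examples whose reconstruction error exceeds the threshold."""
    return mean_absolute_error(x, x_hat) > threshold

# One example reconstructed badly and one reconstructed perfectly.
test_x = rng.normal(size=(1, 140))
bad_recon = test_x + rng.normal(scale=1.0, size=test_x.shape)
good_recon = test_x.copy()

print(threshold, is_anomaly(test_x, bad_recon, threshold), is_anomaly(test_x, good_recon, threshold))
```

Choosing mean plus one standard deviation is a simple heuristic; a different multiple trades false alarms against missed anomalies.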
54. Let’s import the required libraries
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras.datasets import mnist
import matplotlib.pyplot as plt
Declaration of Hidden Layers and Variables
# this is the size of our encoded representations
encoding_dim = 32
# 32 floats -> compression of factor 24.5, assuming the input is 784 floats
# this is our input placeholder
input_img = Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(784, activation='sigmoid')(encoded)
55. # this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)
# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)
# create a placeholder for an encoded (32-dimensional) input
encoded_input = Input(shape=(encoding_dim,))
# retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# create the decoder model
decoder = Model(encoded_input, decoder_layer(encoded_input))
# configure our model to use a per-pixel binary crossentropy loss,
# and the Adadelta optimizer:
autoencoder.compile(optimizer='adadelta',
                    loss='binary_crossentropy')
56. Preparing the input data (MNIST Dataset)
(x_train, _), (x_test, _) = mnist.load_data()
# normalize all values between 0 and 1 and flatten the 28x28
# images into vectors of size 784.
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)
Training Autoencoders for 50 epochs
autoencoder.fit(x_train, x_train,
epochs=50,
batch_size=256,
shuffle=True,
validation_data=(x_test, x_test))
# encode and decode some digits
# note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
57. Visualizing the reconstructed inputs and the encoded representations using Matplotlib
n = 20  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
OUTPUT
Input is in the first row and the output is the second row.
59. Variational AutoEncoders
• The variational autoencoder was proposed in 2013 by Kingma and Welling
(later of Google and Qualcomm, respectively).
• A variational autoencoder (VAE) provides a probabilistic manner for
describing an observation in latent space.
• Thus, rather than building an encoder that outputs a single value to
describe each latent state attribute, we’ll formulate our encoder to
describe a probability distribution for each latent attribute.
• It has many applications such as data compression, synthetic data
creation etc.
60. Architecture
• Autoencoders are a type of neural network that learns data encodings from the dataset in an
unsupervised way.
• An autoencoder basically contains two parts: the first one is an encoder, which is similar to a
convolutional neural network except for the last layer.
• The aim of the encoder is to learn an efficient data encoding from the dataset and pass it into a
bottleneck architecture.
• The other part of the autoencoder is a decoder, which uses the latent space in the bottleneck layer to
regenerate images similar to those in the dataset.
• The reconstruction error is backpropagated through the neural network in the form of the loss function.
• A variational autoencoder differs from an autoencoder in that it provides a statistical
manner for describing the samples of the dataset in latent space.
• Therefore, in a variational autoencoder, the encoder outputs a probability distribution in the
bottleneck layer instead of a single output value.
62. Mathematics behind variational autoencoder:
• A variational autoencoder uses KL-divergence in its loss function; the goal
of this is to minimize the difference between an assumed distribution and
the original distribution of the dataset.
• Suppose we have a latent variable z from which the observation x is
generated. We want to infer z from x; in other words, we want to calculate
$$p(z|x)$$
• We can do it in the following way:
$$p(z|x) = \frac{p(x|z)\,p(z)}{p(x)}$$
But the calculation of p(x) can be quite difficult:
$$p(x) = \int p(x|z)\,p(z)\,dz$$
63. • This usually makes it an intractable distribution. Hence, we need to
approximate p(z|x) with a tractable distribution q(z|x).
• To make q(z|x) a good approximation of p(z|x), we minimize the KL-divergence,
which measures how different two distributions are:
$$\min \mathrm{KL}\big(q(z|x)\,\|\,p(z|x)\big)$$
• By simplifying, the above minimization problem is equivalent to the
following maximization problem:
$$\mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] - \mathrm{KL}\big(q(z|x)\,\|\,p(z)\big)$$
• The first term represents the reconstruction likelihood and the other term
ensures that our learned distribution q is similar to the true prior
distribution p.
• Thus our total loss consists of two terms, one is the reconstruction error and
the other is the KL-divergence loss:
$$\text{Loss} = L(x, \hat{x}) + \sum_j \mathrm{KL}\big(q_j(z|x)\,\|\,p(z)\big)$$
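When q(z|x) is a Gaussian with a diagonal covariance and the prior p(z) is a standard normal, the KL term has a closed form. The sketch below computes it, together with the reparameterised sampling step, in NumPy. The Gaussian assumptions and the function names are choices made for this illustration; the `z_mean` and `z_log_var` names follow the implementation slides.

```python
import numpy as np

def kl_to_standard_normal(z_mean, z_log_var):
    """KL(N(z_mean, exp(z_log_var)) || N(0, 1)), summed over latent dimensions j."""
    return -0.5 * np.sum(1 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=-1)

def sample_latent(z_mean, z_log_var, rng):
    """Reparameterisation: z = mean + std * epsilon, with epsilon ~ N(0, 1)."""
    epsilon = rng.normal(size=z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon

rng = np.random.default_rng(0)
z_mean = np.array([[0.0, 0.0], [1.0, -1.0]])   # two examples, two latent dimensions
z_log_var = np.zeros((2, 2))                   # unit variance everywhere

kl = kl_to_standard_normal(z_mean, z_log_var)
z = sample_latent(z_mean, z_log_var, rng)
print(kl)  # [0. 1.]
```

The first example already matches the prior, so its KL term is zero; the second is shifted away from the origin and is penalised accordingly.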
64. Implementation
• We will be using the Fashion-MNIST dataset. This dataset is already available in the
keras.datasets API, so we don’t need to add or upload it manually.
• First, we need to import the necessary packages into our Python environment. We
will be using the Keras package with TensorFlow as a backend.
# Code
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Layer, Conv2D, Flatten, Dense,
                                     Reshape, Conv2DTranspose)
import matplotlib.pyplot as plt
65. • For variational autoencoders, we need to define the architecture of two parts encoder
and decoder but first, we will define the bottleneck layer of architecture, the sampling
layer.
# this sampling layer is the bottleneck layer of the variational autoencoder;
# it takes the outputs of two dense layers, z_mean and z_log_var, as input,
# samples from the normal distribution they describe and passes the sample to the decoder
class Sampling(Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.random.normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon
66. • Now, we define the architecture of encoder part of our autoencoder, this part takes
images as input and encodes their representation in the Sampling layer.
# Define Encoder Model
latent_dim = 2
encoder_inputs = Input(shape =(28, 28, 1))
x = Conv2D(32, 3, activation ="relu", strides = 2, padding ="same")(encoder_inputs)
x = Conv2D(64, 3, activation ="relu", strides = 2, padding ="same")(x)
x = Flatten()(x)
x = Dense(16, activation ="relu")(x)
z_mean = Dense(latent_dim, name ="z_mean")(x)
z_log_var = Dense(latent_dim, name ="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = Model(encoder_inputs, [z_mean, z_log_var, z], name ="encoder")
encoder.summary()
67. • Now, we define the architecture of decoder part of our autoencoder, this part takes the
output of the sampling layer as input and output an image of size (28, 28, 1).
# Define Decoder Architecture
latent_inputs = keras.Input(shape =(latent_dim, ))
x = Dense(7 * 7 * 64, activation ="relu")(latent_inputs)
x = Reshape((7, 7, 64))(x)
x = Conv2DTranspose(64, 3, activation ="relu", strides = 2, padding ="same")(x)
x = Conv2DTranspose(32, 3, activation ="relu", strides = 2, padding ="same")(x)
decoder_outputs = Conv2DTranspose(1, 3, activation ="sigmoid", padding ="same")(x)
decoder = Model(latent_inputs, decoder_outputs, name ="decoder")
decoder.summary()
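68. • In this step, we combine the models and define the training procedure with loss
functions.
# this class takes encoder and decoder models and
# defines the complete variational autoencoder architecture
class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def train_step(self, data):
        if isinstance(data, tuple):
            data = data[0]
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)
            # NOTE: the rest of train_step is missing from the source; what follows is an
            # assumed completion using the two-term loss (reconstruction error + KL).
            reconstruction_loss = 28 * 28 * tf.reduce_mean(
                keras.losses.binary_crossentropy(data, reconstruction))
            kl_loss = -0.5 * tf.reduce_mean(tf.reduce_sum(
                1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
            total_loss = reconstruction_loss + kl_loss
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return {"loss": total_loss, "reconstruction_loss": reconstruction_loss,
                "kl_loss": kl_loss}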
70. • Now it’s the right time to train our variational autoencoder model, we will train it
for 100 epochs. But first we need to import the fashion MNIST dataset.
# load fashion mnist dataset from keras.dataset API
(x_train, _), (x_test, _) = keras.datasets.fashion_mnist.load_data()
fmnist_images = np.concatenate([x_train, x_test], axis = 0)
# expand dimension to add a color map dimension
fmnist_images = np.expand_dims(fmnist_images, -1).astype("float32") / 255
# compile and train the model
vae = VAE(encoder, decoder)
vae.compile(optimizer ='rmsprop')
vae.fit(fmnist_images, epochs = 100, batch_size = 64)
71. • In this step, we display the training results, arranged
according to their positions in the latent space.
def plot_latent(encoder, decoder):
    # display an n * n 2D manifold of images
    n = 10
    img_dim = 28
    scale = 2.0
    figsize = 15
    figure = np.zeros((img_dim * n, img_dim * n))
    # linearly spaced coordinates corresponding to the 2D plot
    # of image classes in the latent space
    grid_x = np.linspace(-scale, scale, n)
    grid_y = np.linspace(-scale, scale, n)[::-1]
    for i, yi in enumerate(grid_y):
        for j, xi in enumerate(grid_x):
            z_sample = np.array([[xi, yi]])
            x_decoded = decoder.predict(z_sample)
            images = x_decoded[0].reshape(img_dim, img_dim)
            # NOTE: the source is cut off here; the remaining lines are an assumed
            # completion that tiles each decoded image into the grid and shows it.
            figure[i * img_dim:(i + 1) * img_dim,
                   j * img_dim:(j + 1) * img_dim] = images
    plt.figure(figsize=(figsize, figsize))
    plt.imshow(figure, cmap="Greys_r")
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.show()
73. • To get a clearer view of the latent vector values, we will
draw a scatter plot of the training data based on the values of the
corresponding latent dimensions generated by the encoder.
# NOTE: `labels` is not defined in the source; this mapping of Fashion-MNIST class
# indices to class names is an assumed definition.
labels = {0: "T-shirt/top", 1: "Trouser", 2: "Pullover", 3: "Dress", 4: "Coat",
          5: "Sandal", 6: "Shirt", 7: "Sneaker", 8: "Bag", 9: "Ankle boot"}

def plot_label_clusters(encoder, decoder, data, test_lab):
    z_mean, _, _ = encoder.predict(data)
    plt.figure(figsize=(12, 10))
    sc = plt.scatter(z_mean[:, 0], z_mean[:, 1], c=test_lab)
    cbar = plt.colorbar(sc, ticks=range(10))
    cbar.ax.set_yticklabels([labels.get(i) for i in range(10)])
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.show()