Image Compression Using Autoencoders in Keras

Sep 26, 2024 06:28 PM

Introduction

Autoencoders are a deep learning model for transforming data from a high-dimensional space to a lower-dimensional space. They work by encoding the data, whatever its size, to a 1-D vector. This vector can then be decoded to reconstruct the original data (in this case, an image). The more accurate the autoencoder, the closer the generated data is to the original.

In this tutorial we’ll explore the autoencoder architecture and see how we can apply this model to compress images from the MNIST dataset using TensorFlow and Keras. In particular, we’ll consider:

  • Discriminative vs. Generative Modeling
  • How Autoencoders Work
  • Building an Autoencoder in Keras
    • Building the Encoder
    • Building the Decoder
  • Training
  • Making Predictions
  • Complete Code
  • Conclusion

Prerequisites

  • Basic Knowledge of Python: Familiarity with Python programming is essential.
  • Understanding of Neural Networks: A basic understanding of neural networks, including layers, activation functions, and training processes.
  • Keras and TensorFlow: Basic knowledge of Keras and TensorFlow, including installation and usage.
  • NumPy and Matplotlib: Familiarity with NumPy for data manipulation and Matplotlib for plotting and visualizing data.
  • Image Processing Concepts: Basic understanding of image formats and how images are represented as arrays.

Discriminative vs. Generative Modeling

The most common type of machine learning model is discriminative. If you’re a machine learning enthusiast, it’s likely that the models you’ve built or used have been mainly discriminative. These models recognize the input data and then take appropriate action. For a classification task, a discriminative model learns how to differentiate between the classes. Based on what the model has learned about the properties of each class, it assigns a new input sample to the appropriate label. Let’s apply this understanding to the next image representing a warning sign.

If a machine/deep learning model is to recognize the following image, it may understand that it consists of three main elements: a triangle, a line, and a dot. When another input image has features which match these elements, then it should also be recognized as a warning sign.

If the algorithm is able to identify the properties of an image, could it generate a new image similar to it? In other words, could it draw a new image that has a triangle, a line, and a dot? Unfortunately, discriminative models are not clever enough to draw new images even if they know the structure of these images. Let’s take another example to make things clearer.

Assume there is someone who can recognize things well. For a given image, he/she can easily identify the salient properties and then classify the image. Does it follow that such a person will be able to draw such an image again? No. Some people cannot draw things. Discriminative models are like those people who can just recognize images, but could not draw them on their own.

In contrast with discriminative models, there is another group called generative models which can create new images. For a given input image, the output of a discriminative model is a class label; the output of a generative model is an image of the same size and similar appearance as the input image.

One of the simplest generative models is the autoencoder (AE for short), which is the focus of this tutorial.

How Autoencoders Work

Autoencoders are a deep neural network model that can take in data, propagate it through a number of layers to condense and understand its structure, and finally generate that data again. In this tutorial we’ll consider how this works for image data in particular. To accomplish this task an autoencoder uses two different types of networks. The first is called an encoder, and the other is the decoder. The decoder is just a reflection of the layers inside the encoder. Let’s explain how this works.

The job of the encoder is to accept the original data (e.g. an image) that could have two or more dimensions and generate a single 1-D vector that represents the entire image. The number of elements in the 1-D vector varies based on the task being solved. It could have 1 or more elements. The fewer elements in the vector, the harder it is to reproduce the original image accurately.

By representing the input image in a vector of relatively few elements, we actually compress the image. For example, the size of each image in the MNIST dataset (which we’ll use in this tutorial) is 28x28. That is, each image has 784 elements. If each image is compressed so that it is represented using just two elements, then we save 782 elements and thus (782/784)*100=99.745% of the data.
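The savings figure above can be reproduced with a few lines of arithmetic. This is only a sketch: the helper name compression_savings is hypothetical, introduced here for illustration, and it counts elements rather than bytes.

```python
def compression_savings(original_elements, encoded_elements):
    """Return (elements saved, percentage saved) when an input with
    original_elements values is represented by encoded_elements values."""
    saved = original_elements - encoded_elements
    percent = 100.0 * saved / original_elements
    return saved, percent


# A 28x28 MNIST image has 784 elements; the encoder in this tutorial keeps 2.
saved, percent = compression_savings(28 * 28, 2)
print(saved, round(percent, 3))
```

Calling the same helper with a larger encoded size shows how quickly the savings shrink as the vector grows.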

The next figure shows how an encoder generates the 1-D vector from an input image. The layers included are of your choosing, so you can use dense, convolutional, dropout, etc.

Figure: encoder network applied to the warning sign image

The 1-D vector generated by the encoder from its last layer is then fed to the decoder. The job of the decoder is to reconstruct the original image with the highest possible quality. The decoder is just a reflection of the encoder. According to the encoder architecture in the previous figure, the architecture of the decoder is given in the next figure.

Figure: decoder network

The loss is calculated by comparing the original and reconstructed images, i.e. by calculating the difference between the pixels in the two images. Note that the output of the decoder must be of the same size as the original image. Why? Because if the sizes of the images are different, there is no way to calculate the loss.
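To make the pixel-by-pixel comparison concrete, here is a minimal pure-Python sketch of a mean squared error between two flattened images. The function name and the four-pixel "images" are made up for illustration; they are not part of the tutorial's model.

```python
def mean_squared_error(original, reconstructed):
    """Average of the squared pixel differences between two flattened images."""
    if len(original) != len(reconstructed):
        # Without matching sizes there is no pixel-to-pixel pairing to compare.
        raise ValueError("images must have the same number of pixels")
    squared_diffs = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared_diffs) / len(squared_diffs)


# Two tiny made-up "images" with pixel values in [0, 1].
original = [0.0, 0.5, 1.0, 0.0]
reconstructed = [0.0, 0.25, 1.0, 0.5]
loss = mean_squared_error(original, reconstructed)
print(loss)
```

A perfect reconstruction gives a loss of 0, and the size check mirrors the reason the decoder output must match the input size.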

After discussing how the autoencoder works, let’s build our first autoencoder using Keras.

Building an Autoencoder in Keras

Keras is a powerful tool for building machine and deep learning models because it’s simple and abstracted, so in little code you can achieve great results. Keras has three ways for building a model:

  1. Sequential API
  2. Functional API
  3. Model Subclassing

The three ways differ in the level of customization allowed.

The sequential API allows you to build sequential models, but it is less customizable compared to the other two types. The output of each layer in the model is only connected to a single layer.

Although this is the type of model we want to create in this tutorial, we’ll use the functional API. The functional API is simple, very similar to the sequential API, and also supports additional features such as the ability to connect the output of a single layer to multiple layers.

The last option for building a Keras model is model subclassing, which is fully-customizable but also very complex. You can read more about these three methods in this tutorial.

Now we’ll focus on using the functional API for building the autoencoder. You might think that we are going to build a single Keras model for representing the autoencoder, but we will actually build three models: one for the encoder, another for the decoder, and yet another for the complete autoencoder. Why do we build a model for both the encoder and the decoder? We do this in case you want to explore each model separately. For instance, we can use the model of the encoder to visualize the 1-D vector representing each input image, and this might help you to know whether it’s a good representation of the image or not. With the decoder we’ll be able to test whether good representations are being created from the 1-D vectors, assuming they are well-encoded (i.e. better for debugging purposes). Finally, by building a model for the entire autoencoder we can easily use it end-to-end by feeding it the original image and receiving the output image directly.

Let’s start by building the encoder model.

Building the Encoder

The following code builds a model for the encoder using the functional API. At first, the layers of the model are created using the tensorflow.keras.layers API because we are using TensorFlow as the backend library. The first layer is an Input layer which accepts the original image. This layer accepts an argument named shape representing the size of the input, which depends on the dataset being used. We’re going to use the MNIST dataset where the size of each image is 28x28. Rather than setting the shape to (28, 28), it’s just set to (784,). Why? Because we’re going to use only dense layers in the network and thus the input must be in the form of a vector, not a matrix. The tensor representing the input layer is returned to the variable x.

The input layer is then propagated through a number of layers:

  • Dense layer with 300 neurons
  • LeakyReLU layer
  • Dense layer with 2 neurons
  • LeakyReLU layer

The last Dense layer in the network has just two neurons. When fed to the LeakyReLU layer, the final output of the encoder will be a 1-D vector with just two elements. In other words, all images in the MNIST dataset will be encoded as vectors of two elements.

import tensorflow.keras.layers
import tensorflow.keras.models

# Input layer: each 28x28 image is flattened to a vector of 784 values.
x = tensorflow.keras.layers.Input(shape=(784,), name="encoder_input")

# Hidden dense layer followed by its LeakyReLU activation.
encoder_dense_layer1 = tensorflow.keras.layers.Dense(units=300, name="encoder_dense_1")(x)
encoder_activ_layer1 = tensorflow.keras.layers.LeakyReLU(name="encoder_leakyrelu_1")(encoder_dense_layer1)

# Bottleneck: two neurons, so each image is encoded as a 2-element vector.
encoder_dense_layer2 = tensorflow.keras.layers.Dense(units=2, name="encoder_dense_2")(encoder_activ_layer1)
encoder_output = tensorflow.keras.layers.LeakyReLU(name="encoder_output")(encoder_dense_layer2)
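The LeakyReLU layers used above pass positive values through unchanged and scale negative values by a small slope, so the two-element encoder output can contain negative numbers. The sketch below shows the idea in plain Python. The function name leaky_relu is hypothetical, and the slope of 0.3 is an assumption chosen to match what is commonly documented as the Keras default; check the documentation for your version.

```python
def leaky_relu(values, negative_slope=0.3):
    """Apply a leaky ReLU element-wise to a list of numbers.

    negative_slope=0.3 is an assumed value for illustration.
    """
    return [v if v >= 0 else negative_slope * v for v in values]


# Positive inputs are unchanged; negative inputs are scaled down, not zeroed.
activated = leaky_relu([-2.0, 0.0, 1.5])
print(activated)
```

Unlike a plain ReLU, the negative input is not clipped to zero, which keeps a small gradient flowing for negative pre-activations.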

After building and connecting all of the layers, the next step is to build the model using the tensorflow.keras.models API by specifying the input and output tensors according to the next line:

encoder = tensorflow.keras.models.Model(x, encoder_output, name="encoder_model")

To print a summary of the encoder architecture we’ll use encoder.summary(). The output is below. This network is not large and you can increase the number of neurons in the dense layer named encoder_dense_1, but I just used 300 neurons to avoid taking much time training the network.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
encoder_input (InputLayer)   [(None, 784)]             0
_________________________________________________________________
encoder_dense_1 (Dense)      (None, 300)               235500
_________________________________________________________________
encoder_leakyrelu_1 (LeakyRe (None, 300)               0
_________________________________________________________________
encoder_dense_2 (Dense)      (None, 2)                 602
_________________________________________________________________
encoder_output (LeakyReLU)   (None, 2)                 0
=================================================================
Total params: 236,102
Trainable params: 236,102
Non-trainable params: 0
_________________________________________________________________
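The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper name dense_param_count below is hypothetical and only illustrates that arithmetic.

```python
def dense_param_count(num_inputs, units):
    """Weights (num_inputs * units) plus one bias per unit."""
    return num_inputs * units + units


# Layer sizes taken from the encoder summary above.
encoder_dense_1 = dense_param_count(784, 300)
encoder_dense_2 = dense_param_count(300, 2)
encoder_total = encoder_dense_1 + encoder_dense_2
print(encoder_dense_1, encoder_dense_2, encoder_total)
```

The Input and LeakyReLU layers contribute nothing because they have no trainable weights.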

After building the encoder, next is to work on the decoder.

Building the Decoder

Similar to building the encoder, the decoder will be built using the following code. Because the input layer of the decoder accepts the output returned from the last layer in the encoder, we have to make sure these two layers match in size. The last layer in the encoder returns a vector of two elements and thus the input of the decoder must have two neurons. You can easily note that the layers of the decoder are just a reflection of those in the encoder.

# Input layer: the 2-element vector produced by the encoder.
decoder_input = tensorflow.keras.layers.Input(shape=(2,), name="decoder_input")

# Hidden dense layer mirroring the encoder's 300-neuron layer.
decoder_dense_layer1 = tensorflow.keras.layers.Dense(units=300, name="decoder_dense_1")(decoder_input)
decoder_activ_layer1 = tensorflow.keras.layers.LeakyReLU(name="decoder_leakyrelu_1")(decoder_dense_layer1)

# Output layer: 784 values, the same size as the flattened input image.
decoder_dense_layer2 = tensorflow.keras.layers.Dense(units=784, name="decoder_dense_2")(decoder_activ_layer1)
decoder_output = tensorflow.keras.layers.LeakyReLU(name="decoder_output")(decoder_dense_layer2)

After connecting the layers, next is to build the decoder model according to the next line.

decoder = tensorflow.keras.models.Model(decoder_input, decoder_output, name="decoder_model")

Here is the output of decoder.summary(). It is very important to make sure the size of the output returned from the decoder matches the original input size.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
decoder_input (InputLayer)   [(None, 2)]               0
_________________________________________________________________
decoder_dense_1 (Dense)      (None, 300)               900
_________________________________________________________________
decoder_leakyrelu_1 (LeakyRe (None, 300)               0
_________________________________________________________________
decoder_dense_2 (Dense)      (None, 784)               235984
_________________________________________________________________
decoder_output (LeakyReLU)   (None, 784)               0
=================================================================
Total params: 236,884
Trainable params: 236,884
Non-trainable params: 0
_________________________________________________________________
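The mirror relationship between the encoder and the decoder can be checked with a short sketch. It also suggests why the two parameter totals differ slightly even though the layer sizes are reversed copies: the weight counts match, but the bias counts do not. The variable and function names are hypothetical.

```python
# Layer widths, input first, as listed in the two model summaries.
encoder_sizes = [784, 300, 2]
decoder_sizes = [2, 300, 784]


def weight_count(sizes):
    """Number of weights between each pair of consecutive layers."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))


def bias_count(sizes):
    """One bias per neuron in every layer after the input."""
    return sum(sizes[1:])


is_mirror = decoder_sizes == encoder_sizes[::-1]
encoder_params = weight_count(encoder_sizes) + bias_count(encoder_sizes)
decoder_params = weight_count(decoder_sizes) + bias_count(decoder_sizes)
print(is_mirror, encoder_params, decoder_params)
```

Both networks hold 235,800 weights; the decoder has more biases because its layers after the input are 300 and 784 neurons wide rather than 300 and 2.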

After building the two blocks of the autoencoder (encoder and decoder), next is to build the complete autoencoder.

Building the Autoencoder

The code that builds the autoencoder is listed below. The tensor named ae_input represents the input layer that accepts a vector of length 784. This tensor is fed to the encoder model as an input. The output from the encoder is saved in ae_encoder_output which is then fed to the decoder. Finally, the output of the autoencoder is saved in ae_decoder_output.

A model is created for the autoencoder which accepts the input ae_input and the output ae_decoder_output.

# The autoencoder chains the two models: image -> encoder -> decoder -> image.
ae_input = tensorflow.keras.layers.Input(shape=(784,), name="AE_input")
ae_encoder_output = encoder(ae_input)
ae_decoder_output = decoder(ae_encoder_output)

ae = tensorflow.keras.models.Model(ae_input, ae_decoder_output, name="AE")

The summary of the autoencoder is listed below. Here you can see that the shapes of the input and output of the autoencoder are identical, which is necessary for calculating the loss.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
AE_input (InputLayer)        [(None, 784)]             0
_________________________________________________________________
encoder_model (Model)        (None, 2)                 236102
_________________________________________________________________
decoder_model (Model)        (None, 784)               236884
=================================================================
Total params: 472,986
Trainable params: 472,986
Non-trainable params: 0
_________________________________________________________________

The next step in the model building process is to compile the model using the compile() method according to the next code. The mean square error loss function is used and the Adam optimizer is used with the learning rate set to 0.0005.

import tensorflow.keras.optimizers

ae.compile(loss="mse", optimizer=tensorflow.keras.optimizers.Adam(learning_rate=0.0005))

The model is now ready for accepting the training data and thus the next step is to prepare the data for being fed to the model.

Just remember that there are three models which are:

  1. encoder
  2. decoder
  3. ae (for the autoencoder)

Loading the MNIST Dataset and Training Autoencoder

Keras has an API named tensorflow.keras.datasets through which a number of datasets can be used. We are going to use the MNIST dataset which is loaded according to the next code. The dataset is loaded as NumPy arrays representing the training data, test data, train labels, and test labels. Note that we are not interested in using the class labels at all while training the model; they are just used to display the results.

The x_train_orig and the x_test_orig NumPy arrays hold the MNIST image data where the size of each image is 28x28. The pixel values are scaled to the range [0, 1] by dividing by 255. Because our model accepts the images as vectors of length 784, these arrays are then reshaped using the numpy.reshape() function.

import tensorflow.keras.datasets
import numpy

(x_train_orig, y_train), (x_test_orig, y_test) = tensorflow.keras.datasets.mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1].
x_train_orig = x_train_orig.astype("float32") / 255.0
x_test_orig = x_test_orig.astype("float32") / 255.0

# Flatten each 28x28 image into a vector of length 784.
x_train = numpy.reshape(x_train_orig, (x_train_orig.shape[0], numpy.prod(x_train_orig.shape[1:])))
x_test = numpy.reshape(x_test_orig, (x_test_orig.shape[0], numpy.prod(x_test_orig.shape[1:])))
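To see what the scaling and reshaping steps do to a single image, the sketch below performs the same two operations in plain Python. The image is synthetic (made-up pixel values, not MNIST data), and plain lists stand in for the NumPy arrays used in the tutorial.

```python
ROWS, COLS = 28, 28

# Synthetic stand-in for one MNIST image: integer pixel values in [0, 255].
image = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]

# Scale to [0, 1] and flatten row by row into one vector of 784 values.
flat = [pixel / 255.0 for row in image for pixel in row]

# Undo the flattening: cut the vector back into 28 rows of 28 values.
restored = [flat[r * COLS:(r + 1) * COLS] for r in range(ROWS)]

print(len(flat), len(restored), len(restored[0]))
```

Flattening row by row is reversible, which is why the decoder output can later be reshaped back to 28x28 for display.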

At this point, we can train the autoencoder using the fit() method as follows:

ae.fit(x_train, x_train, epochs=20, batch_size=256, shuffle=True, validation_data=(x_test, x_test))

Note that the training data inputs and outputs are both set to x_train because the desired output is identical to the original input. The same applies to the validation data. You can change the number of epochs and batch size to other values.
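When experimenting with epochs and batch size, it helps to know how many weight updates a given setting implies. The sketch below works this out for the values used above, assuming the 60,000-image MNIST training set and assuming the final, smaller batch of each epoch is still used.

```python
import math

num_training_images = 60000  # size of the MNIST training set
batch_size = 256
epochs = 20

# Each epoch walks through the data once, one batch per weight update.
steps_per_epoch = math.ceil(num_training_images / batch_size)
total_updates = steps_per_epoch * epochs

# The last batch of an epoch holds whatever is left over.
last_batch_size = num_training_images - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, total_updates, last_batch_size)
```

Halving the batch size roughly doubles the number of updates per epoch, which usually makes each epoch slower but gives the optimizer more steps.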

After the autoencoder is trained, next is to make predictions.

Making Predictions

The predict() method is used in the next code to return the outputs of both the encoder and decoder models. The encoded_images NumPy array holds the 1-D vectors representing all training images. The decoder model accepts this array to reconstruct the original images.

encoded_images = encoder.predict(x_train)
decoded_images = decoder.predict(encoded_images)

Note that the output of the decoder is a 1-D vector of length 784. To display the reconstructed images, the decoder output is reshaped to 28x28 as follows:

decoded_images_orig = numpy.reshape(decoded_images, (decoded_images.shape[0], 28, 28))

The next code uses Matplotlib to show the original and reconstructed images of 5 random samples.

import matplotlib.pyplot

num_images_to_show = 5
for im_ind in range(num_images_to_show):
    # Odd subplot indices are the left column, even indices the right column.
    plot_ind = im_ind*2 + 1
    rand_ind = numpy.random.randint(low=0, high=x_train.shape[0])
    # Left: the original image.
    matplotlib.pyplot.subplot(num_images_to_show, 2, plot_ind)
    matplotlib.pyplot.imshow(x_train_orig[rand_ind, :, :], cmap="gray")
    # Right: its reconstruction.
    matplotlib.pyplot.subplot(num_images_to_show, 2, plot_ind+1)
    matplotlib.pyplot.imshow(decoded_images_orig[rand_ind, :, :], cmap="gray")
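The subplot indexing in the loop above is easier to follow when written out. Matplotlib numbers the cells of a grid from 1, row by row, so in a grid with two columns the odd indices fall in the left column and the even ones in the right. This sketch only computes the index pairs and does not draw anything.

```python
num_images_to_show = 5

# For each sample, record (index of original, index of reconstruction).
layout = []
for im_ind in range(num_images_to_show):
    plot_ind = im_ind * 2 + 1
    layout.append((plot_ind, plot_ind + 1))

print(layout)
```

Each pair occupies one row of the 5x2 grid, so every original image sits directly beside its reconstruction.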

The next figure shows 5 original images and their reconstructions. You can see that the autoencoder is able to at least reconstruct an image close to the original one, but the quality is low.

Figure: five original MNIST images and their reconstructions

One of the reasons for the low quality is using a low number of neurons (300) within the dense layer. Another reason is using just two elements for representing each image. The quality might be increased by using more elements but this increases the size of the compressed data.

Another reason is not using convolutional layers at all. Dense layers are good for capturing the global properties of the images and convolutional layers are good for the local properties. The result could be enhanced by adding some convolutional layers.
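The trade-off between the number of encoded elements and the size of the compressed data can be estimated in bytes. The sketch below assumes each original pixel is stored as one byte (MNIST pixels are 8-bit) and each encoded element as a 4-byte float32; the storage format of the encoded vector is an assumption, and the candidate sizes 16 and 64 are made up for illustration.

```python
ORIGINAL_BYTES = 28 * 28 * 1  # 784 pixels, 1 byte each (uint8)
BYTES_PER_FLOAT32 = 4         # assumed storage type for encoded elements


def encoded_bytes(num_elements):
    """Bytes needed to store one encoded vector."""
    return num_elements * BYTES_PER_FLOAT32


def percent_saved(num_elements):
    """Percentage of the original image size saved by the encoding."""
    return 100.0 * (ORIGINAL_BYTES - encoded_bytes(num_elements)) / ORIGINAL_BYTES


for num_elements in (2, 16, 64):
    print(num_elements, encoded_bytes(num_elements), round(percent_saved(num_elements), 2))
```

Measured in bytes, the savings are slightly lower than the element count suggests, and they fall as the encoded vector grows.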

To have a better understanding of the output of the encoder model, let’s display all the 1-D vectors it returns according to the next code.

matplotlib.pyplot.figure()
matplotlib.pyplot.scatter(encoded_images[:, 0], encoded_images[:, 1], c=y_train)
matplotlib.pyplot.colorbar()

The plot generated by this code is shown below. Generally, you can see that the model is able to cluster the different images in different regions but there is overlap between the different clusters.

Figure: scatter plot of the two-element encoded vectors, colored by digit label

Complete Code

The complete code discussed in this tutorial is listed below.

import tensorflow.keras.layers
import tensorflow.keras.models
import tensorflow.keras.optimizers
import tensorflow.keras.datasets
import numpy
import matplotlib.pyplot

# Encoder: 784 -> 300 -> 2
x = tensorflow.keras.layers.Input(shape=(784,), name="encoder_input")
encoder_dense_layer1 = tensorflow.keras.layers.Dense(units=300, name="encoder_dense_1")(x)
encoder_activ_layer1 = tensorflow.keras.layers.LeakyReLU(name="encoder_leakyrelu_1")(encoder_dense_layer1)
encoder_dense_layer2 = tensorflow.keras.layers.Dense(units=2, name="encoder_dense_2")(encoder_activ_layer1)
encoder_output = tensorflow.keras.layers.LeakyReLU(name="encoder_output")(encoder_dense_layer2)

encoder = tensorflow.keras.models.Model(x, encoder_output, name="encoder_model")
encoder.summary()

# Decoder: 2 -> 300 -> 784
decoder_input = tensorflow.keras.layers.Input(shape=(2,), name="decoder_input")
decoder_dense_layer1 = tensorflow.keras.layers.Dense(units=300, name="decoder_dense_1")(decoder_input)
decoder_activ_layer1 = tensorflow.keras.layers.LeakyReLU(name="decoder_leakyrelu_1")(decoder_dense_layer1)
decoder_dense_layer2 = tensorflow.keras.layers.Dense(units=784, name="decoder_dense_2")(decoder_activ_layer1)
decoder_output = tensorflow.keras.layers.LeakyReLU(name="decoder_output")(decoder_dense_layer2)

decoder = tensorflow.keras.models.Model(decoder_input, decoder_output, name="decoder_model")
decoder.summary()

# Autoencoder: encoder followed by decoder
ae_input = tensorflow.keras.layers.Input(shape=(784,), name="AE_input")
ae_encoder_output = encoder(ae_input)
ae_decoder_output = decoder(ae_encoder_output)

ae = tensorflow.keras.models.Model(ae_input, ae_decoder_output, name="AE")
ae.summary()

# Compile with mean squared error loss and the Adam optimizer
ae.compile(loss="mse", optimizer=tensorflow.keras.optimizers.Adam(learning_rate=0.0005))

# Load MNIST, scale pixels to [0, 1], and flatten each image to 784 values
(x_train_orig, y_train), (x_test_orig, y_test) = tensorflow.keras.datasets.mnist.load_data()
x_train_orig = x_train_orig.astype("float32") / 255.0
x_test_orig = x_test_orig.astype("float32") / 255.0

x_train = numpy.reshape(x_train_orig, (x_train_orig.shape[0], numpy.prod(x_train_orig.shape[1:])))
x_test = numpy.reshape(x_test_orig, (x_test_orig.shape[0], numpy.prod(x_test_orig.shape[1:])))

# Train: the target output is the input itself
ae.fit(x_train, x_train, epochs=20, batch_size=256, shuffle=True, validation_data=(x_test, x_test))

# Encode the training images, then decode them back to 28x28 images
encoded_images = encoder.predict(x_train)
decoded_images = decoder.predict(encoded_images)
decoded_images_orig = numpy.reshape(decoded_images, (decoded_images.shape[0], 28, 28))

# Show 5 random originals (left) next to their reconstructions (right)
num_images_to_show = 5
for im_ind in range(num_images_to_show):
    plot_ind = im_ind*2 + 1
    rand_ind = numpy.random.randint(low=0, high=x_train.shape[0])
    matplotlib.pyplot.subplot(num_images_to_show, 2, plot_ind)
    matplotlib.pyplot.imshow(x_train_orig[rand_ind, :, :], cmap="gray")
    matplotlib.pyplot.subplot(num_images_to_show, 2, plot_ind+1)
    matplotlib.pyplot.imshow(decoded_images_orig[rand_ind, :, :], cmap="gray")

# Scatter plot of the 2-element encoded vectors, colored by digit label
matplotlib.pyplot.figure()
matplotlib.pyplot.scatter(encoded_images[:, 0], encoded_images[:, 1], c=y_train)
matplotlib.pyplot.colorbar()

Conclusion

This tutorial introduced the deep learning generative model known as the autoencoder. This model consists of two building blocks: the encoder and the decoder. The former encodes the input data as 1-D vectors, which are then decoded to reconstruct the original data. We saw how to apply this model using Keras to compress images from the MNIST dataset into just two elements.
