Beginners: Keras Mnist Data Classification using CNN (Part 2)

This is the 2nd part of this tutorial series (Part 1). If you have not gone through the first part, please check it out first, because this post continues directly from it.

In the previous part, we did the data loading and preprocessing steps; in this part we are going to train our model. So let's get started:

4- Declaring model layers

First we need to declare the Keras Sequential model layers to build the model graph. We declare a Sequential object named model, then add our first convolutional layer by calling model.add with a Convolution2D layer. Its first parameter is the number of filters to apply to the input image, and the second is the filter size; we use a 3*3 filter in this problem, which suits most other problems as well. The ReLU activation function transforms the output of the convolution operation so that all negative values become zero while positive values remain the same.

After the first convolutional layer we add a max pooling layer, which reduces the spatial size of the image. Then we flatten the output into a one-dimensional vector and append a dense layer containing 32 neurons. We also add 50% dropout, because the model sometimes tends to overfit, and dropout helps avoid overfitting. Finally we add another dense layer of size 10, which is our number of classes. The model will therefore predict 10 small numbers representing probabilities that all sum up to 1, because of the softmax activation function in the last layer.

#CNN model
model = Sequential()
model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # 32 filters is a typical choice
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

After building the graph with sequential layers, we compile the model with the model.compile function, passing the loss function, which in our case is categorical crossentropy, the Adam optimizer, and accuracy as the metric.

We use categorical cross entropy because our problem is multi-class classification, so we need a loss function suited to that setting that can be optimized easily. You can read more about this loss function here.
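To build intuition for what this loss computes, here is a minimal NumPy sketch for a single sample; the function name and example numbers below are mine for illustration, not from Keras:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    # y_true: one-hot label vector, y_pred: predicted probabilities (sum to 1)
    eps = 1e-7  # keeps log() away from zero
    return -np.sum(y_true * np.log(y_pred + eps))

# one-hot label for digit 3, and a prediction putting 0.7 on class 3
y_true = np.zeros(10)
y_true[3] = 1.0
y_pred = np.full(10, 0.3 / 9)
y_pred[3] = 0.7

loss = categorical_crossentropy(y_true, y_pred)
# because y_true is one-hot, the loss reduces to -log(0.7) ~ 0.357:
# only the probability assigned to the correct class matters
```

Note how the one-hot encoding zeroes out every term except the true class, so the model is penalized only for the probability it failed to put on the correct digit.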

5- Training the model

Alright! We have done all the basic steps; now it is time to train the model. We will call the fit function, which takes our training data x_train and y_train. The validation_split parameter indicates how big a sample the model will hold out from the training set for validation during training. The batch_size parameter tells the model how many images to process at a time in a single batch, and epochs is the number of times the model will iterate through the whole dataset.

#training without callbacks
history =, y_train, validation_split=0.2, batch_size=256, epochs=100)

Training will take some time: on a Google Colab GPU it will take 1 to 2 minutes, and on a CPU it will take 5 to 8 minutes.

After training the model, we will get around 99% training accuracy and 98% validation accuracy. To see the history of training and validation loss and accuracy, we will plot graphs using matplotlib. Now let's visualize the loss and accuracy curves for both the training and validation sets.

#this function will plot accuracy and loss curves
import matplotlib.pyplot as plt

def plot_curve(train, val, string1, location):
  plt.plot(train)
  plt.plot(val)
  plt.title('model ' + string1)
  plt.ylabel(string1)
  plt.xlabel('epoch')
  plt.legend(['train', 'val'], loc=location)

plot_curve(history.history['accuracy'], history.history['val_accuracy'], 'Accuracy', 'lower right')
plot_curve(history.history['loss'], history.history['val_loss'], 'Loss', 'upper right')

Model Accuracy

Model Loss

If we look at the loss curves, the validation loss curve (green) starts to climb upward after a certain number of iterations, which means the model starts overfitting, because the training curve is still going downward. That means we need to stop training when the model reaches its minimal validation loss value; otherwise we will get an overfitted model. We also need to save the best model weights at the point where validation loss is minimal. To do this we add callbacks.

Callbacks are very handy in the Keras API for saving the optimal model based on some metric, and for stopping the training process when the model starts overfitting and validation loss does not improve for a number of epochs. So we add ModelCheckpoint and EarlyStopping callbacks so that the model stops training when validation loss is no longer improving. Let's add the callbacks and train the model again.

#Training with callbacks
my_callbacks = [
    keras.callbacks.ModelCheckpoint(filepath='best_model.h5', save_best_only=True),
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=5),  # patience value is a typical choice
]

history =, y_train, validation_split=0.2, batch_size=256, epochs=100, callbacks=my_callbacks)

After adding the callbacks and training the model, let's visualize the history object again and see how it works.

Model Accuracy

Model Loss

We can see that the model stops just after training for about 30 epochs, because validation loss starts increasing again; we can observe this phenomenon by looking at the curves. Now we have trained the model and saved the best version of it as well. It is time to load the best model and use it to evaluate on the testing dataset.

#loading the best saved model and evaluating on the test set
model = keras.models.load_model('best_model.h5')
loss, acc = model.evaluate(x_test, y_test)
print("Testing Accuracy: ", round(acc*100, 2))
print("Testing Loss: ", round(loss, 2))
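To turn the model's 10 softmax probabilities into an actual digit prediction, we take the index of the highest probability with argmax. The vector below is a hypothetical example of what model.predict(x_test[:1]) might return for one image; the real values would come from the trained model:

```python
import numpy as np

# hypothetical softmax output for one test image (values sum to 1);
# the trained model would produce such a vector via model.predict
probs = np.array([0.01, 0.02, 0.01, 0.03, 0.02, 0.01, 0.02, 0.85, 0.02, 0.01])
predicted_digit = int(np.argmax(probs))  # index of the highest probability
# here the model is most confident (0.85) that the image shows a 7
```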

We achieved around 98% accuracy on the testing data, which is quite high, but well over 99% testing accuracy is possible on this dataset, and can be reached by improving the layers further. We will improve the model in the next part of this tutorial series.

Code Link:

Beginners: Keras Mnist Data Classification using CNN (Part 1)

This tutorial series is for readers who want to learn how convolutional neural networks work in Keras, especially with the Sequential API. We will build a custom neural network with some CNN layers. If you have no prior knowledge of CNNs, please check out my short tutorial on how CNN networks work. So let's jump into the tutorial:

CNN networks are very good at learning image representations and hence can be used for different computer vision tasks such as image classification, object detection, segmentation, etc. In this tutorial we will use the MNIST dataset. Want to know more about this dataset? Check out this guide.

The MNIST dataset contains a total of 70K grey-scale images of handwritten digits with dimensions 28*28. We will use this dataset to illustrate an image classification problem using the Keras API, which is built on top of TensorFlow to make it easy for AI developers to develop their own neural networks with a few simple lines of code.

We will do image classification task in few steps mentioned below to make it easy to understand:
1- Importing Keras libraries
2- Loading dataset
3- Preprocessing of dataset
4- Declaring keras Model layers
5- Training the model

1- Importing Keras and other important Libraries

First of all we will import some important libraries modules from keras API.

#Importing some Important Keras modules or libraries
import keras
from keras.models import Sequential
from keras.datasets import mnist
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
import numpy as np

We have imported the Keras Sequential API module along with the mnist class from the Keras standard datasets. We have also imported some layer functions like Convolution2D, MaxPooling2D, etc. NumPy is a very efficient library for handling multidimensional data, so we import it too.

2- Loading Dataset

Now we will load the dataset from keras datasets module and load it into training and testing variables:

#Loading Dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

The mnist.load_data function will download the dataset if you are using it for the very first time, which will take some time depending on your internet speed. The dataset is already divided into training and testing sets. Let's print out the shape of the loaded dataset.


#Printing out dataset shapes
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

#Remember x_train contains our input images of digits and y_train contains
#labels of training images. Similarly, x_test contains testing images and
#y_test contains labels of testing images.

Output will be something like this: (60000, 28, 28) (60000,) (10000, 28, 28) (10000,). We can see x_train contains 60,000 images with dimensions 28*28 and y_train contains 60,000 labels with values in the range 0-9.

3- Preprocessing of dataset

As the MNIST dataset contains grey-scale images and Keras convolutional layers take 4-dimensional input, we need to append an extra dimension to our training and testing input images. Let's get it done using the NumPy function expand_dims with the parameter axis=-1, which means the new dimension is appended at the last position.

# Appending extra channel dimension at last position
x_train = np.expand_dims(x_train,axis=-1)
x_test = np.expand_dims(x_test,axis=-1)

We have appended the extra channel dimension, and now our dimension sizes will look like this: (60000, 28, 28, 1) (10000, 28, 28, 1).

Now we need to convert our labels into one hot encoding using the keras.utils module. In one hot encoding, instead of a single number label for each image, we have a vector whose size equals the number of classes, which in our case is 10. So each image will be represented by a one-dimensional vector of size 10 containing values 0 and 1: there will be a single 1 placed at the index of the class number, and the remaining 9 values will be zero. For example, if we have an image of the digit 8 and its label y is 8, then in one hot encoding its label will look like this: [0,0,0,0,0,0,0,0,1,0].

#Converting labels into one hot encoded vectors
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
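To see exactly what this conversion produces, here is the same encoding done by hand in NumPy; the one_hot helper below is my own sketch of the idea, not the Keras internals:

```python
import numpy as np

def one_hot(labels, num_classes):
    # hand-rolled equivalent of keras.utils.to_categorical for integer labels
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0  # set the 1 at each class index
    return encoded

label_for_8 = one_hot(np.array([8]), 10)[0]
# digit 8 -> the single 1 sits at index 8: [0,0,0,0,0,0,0,0,1,0]
```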

After converting our labels into a suitable format, we are now going to normalize the input image pixel values. As we know, a grey-scale image has values between 0 and 255, so we divide each pixel by the max value 255. This is called normalization of the data. There are many other methods to normalize data as well, like subtracting the mean image from each input image, but we will stick to normalizing by the max value in this tutorial. Let's do it:

#Preprocessing the data 
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
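A quick sanity check of what this scaling does, on a tiny made-up grey-scale array (img below is an illustrative example, not part of the dataset):

```python
import numpy as np

# fake 1x3 grey-scale image with the extreme and middle pixel values
img = np.array([[0, 128, 255]], dtype='uint8')
img = img.astype('float32') / 255
# all values now lie in [0, 1]; 0 maps to 0.0 and 255 maps to exactly 1.0
```

The astype('float32') step matters: integer division would truncate every scaled pixel to 0.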

Alright! In this part we have done the basic steps of loading and preprocessing the dataset. In the next (2nd) part we will start training the CNN model.

Click here for Part 2.

Code Link: