CIFAR10 - Image Classifier


This blog covers image classification on the CIFAR-10 image dataset. CIFAR-10 has ten classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels. This tutorial is based on the following link: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py.

Following this tutorial, we will build a Convolutional Neural Network (CNN) with 2 convolutional layers to classify the CIFAR-10 dataset. All code in part one is adapted from the PyTorch tutorial.

First, we set up the PyTorch environment by importing the required packages, including torchvision, which we will use to load the CIFAR-10 dataset:
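Following the tutorial, the imports look like this:

```python
import torch
import torchvision
import torchvision.transforms as transforms
```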

The outputs of torchvision datasets are PILImage images in the range [0, 1]. We transform them to tensors with values normalized to the range [-1, 1].
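A sketch of the transform, as in the tutorial; Normalize with a per-channel mean and standard deviation of 0.5 maps [0, 1] to [-1, 1]:

```python
# Convert PIL images to tensors, then shift [0, 1] -> [-1, 1]
# via (x - 0.5) / 0.5 on each of the 3 channels.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```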

We will download the dataset and create data loaders for the training and test sets.
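A sketch of the download step, following the tutorial (the './data' root path and the batch size of 4 are the tutorial's defaults):

```python
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')
```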

We use matplotlib and numpy to visualize the data and get a quick idea of what it looks like. Below, we display a few images from the dataset:
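A sketch following the tutorial's visualization code; imshow un-normalizes the images before plotting:

```python
import matplotlib.pyplot as plt
import numpy as np

def imshow(img):
    img = img / 2 + 0.5                          # undo the [-1, 1] normalization
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))   # CHW -> HWC for matplotlib
    plt.show()

# Grab one batch of training images and show them with their labels.
dataiter = iter(trainloader)
images, labels = next(dataiter)
imshow(torchvision.utils.make_grid(images))
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(4)))
```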

Now we need to define the architecture of the neural network. The network is a Convolutional Neural Network with 2 convolutional layers (conv1 and conv2) connected to 3 fully connected layers (fc1, fc2, fc3); except for the final output layer, each layer's output is passed through the ReLU activation function. The input convolutional layer takes a 3-channel 32x32 color image and produces 6 feature maps using a 5x5 kernel, followed by max pooling with a 2x2 filter. The second convolutional layer produces 16 feature maps from the previous 6, also using a 5x5 kernel. Another round of 2x2 max pooling is applied, and the result is flattened and fed into the 3-layer fully connected network. The fully connected layers reduce the dimensionality to 120 units, then 84, and finally 10 to predict the class.

Below is a sketch of the neural network architecture, following the tutorial's code:
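```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)       # 3-channel input -> 6 features, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)        # 2x2 max pooling
        self.conv2 = nn.Conv2d(6, 16, 5)      # 6 -> 16 features, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)          # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)               # flatten all dims except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

net = Net()
```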

Now, we define the loss function and the optimization method. For this, cross-entropy loss and stochastic gradient descent (SGD) are used.
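A sketch, using the learning rate (0.001) and momentum (0.9) from the tutorial:

```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```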

Cross-entropy loss is useful for measuring the performance of models whose outputs are probability values between 0 and 1.

Stochastic gradient descent is a form of gradient descent where each weight is updated to the old weight minus the learning rate times the gradient of the loss, i.e. w ← w − η·∇L(w). In SGD the true gradient is replaced by a stochastic estimate computed on a small batch of data.

We simply have to loop over our data iterator, feed the inputs to the network, and optimize.

To train, we first do a forward pass, then backpropagate to compute the gradients and update the model parameters. The loss is printed every 2000 batches, and the network is trained for 2 epochs, i.e. 2 full passes over the training data.
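A sketch of the training loop, following the tutorial:

```python
for epoch in range(2):  # loop over the dataset twice
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        optimizer.zero_grad()             # reset gradients from the last step

        outputs = net(inputs)             # forward pass
        loss = criterion(outputs, labels)
        loss.backward()                   # backpropagate to compute gradients
        optimizer.step()                  # update the parameters

        running_loss += loss.item()
        if i % 2000 == 1999:              # print the average loss every 2000 batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
```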

Finally, we need to test the network on the test dataset of 10000 images. To do this, we run a forward pass of the network over the test data and compare the predictions with the ground truth, then record the overall accuracy across all 10000 images.
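A sketch of the evaluation loop, following the tutorial; torch.no_grad() disables gradient tracking since we are not training here:

```python
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)  # class with the highest score
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')
```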

The network achieved 54% accuracy on the test data. Finally, below is the accuracy of the predictions for each of the 10 classes.
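The per-class accuracies can be computed with a loop like this (a sketch following the tutorial):

```python
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1

for classname, count in correct_pred.items():
    accuracy = 100 * float(count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')
```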

Below is a bar graph of the accuracy across the number of trials.

CONTRIBUTION

Secondly, analyzing and modifying the network.

My first attempt at a more accurate model was to add an additional convolutional layer that finds 32 features from the 16-channel input using a 3x3 kernel. Unlike the first and second layers, this final layer is not pooled.

Adding one more layer to the network gives the architecture sketched below.
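A sketch of the modified architecture; the class name Net2 and the adjusted fc1 input size (32 * 3 * 3, since a 3x3 convolution on the 5x5 feature maps yields 3x3 maps) are assumptions reconstructed from the description above:

```python
class Net2(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.conv3 = nn.Conv2d(16, 32, 3)       # new layer: 16 -> 32 features, 3x3 kernel
        self.fc1 = nn.Linear(32 * 3 * 3, 120)   # 32 maps of 3x3 after conv3 (no pooling)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))    # 32x32 -> 28x28 -> 14x14
        x = self.pool(F.relu(self.conv2(x)))    # 14x14 -> 10x10 -> 5x5
        x = F.relu(self.conv3(x))               # 5x5 -> 3x3, no pooling on this layer
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```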

The graph below shows the accuracy for the experiments above: