Handwritten Digit Recognition


Overview


I created this as a personal project shortly after writing a relatively naive implementation for extra credit in my intro to AI course during my junior year of university. My goal for this project was to create an easily customizable implementation of a fully connected neural network.



The Dataset

The dataset used is "THE MNIST DATABASE of handwritten digits", a collection of 28x28 images of handwritten digits. The training set consists of 60,000 images, and a separate test set consists of 10,000 images. Since the images are 28x28, each one has a total of 784 pixels. In a color image, each pixel is described on the RGB scale, where the intensity of each of the red, green, and blue channels is an integer between 0 and 255. For example, the RGB value (255, 0, 0) gives red, (0, 0, 0) gives black, and (255, 255, 255) gives white. In our case the specific color doesn't matter, so it's safe to treat the images as grayscale to simplify things (the original MNIST files already store a single intensity per pixel), giving us one value between 0 and 255 for each pixel. Next we need to center and scale the pixel values. This is done by subtracting the mean pixel value and dividing by 255.
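As a rough illustration of that last step, here is a minimal Python sketch. It assumes the mean is computed per image over a flat list of grayscale values; the project may instead use a mean taken over the whole dataset. The function name normalize is hypothetical and not taken from the project.

```python
def normalize(pixels):
    """Center a flat list of 0-255 grayscale values on its mean, then divide by 255.

    Assumption: the mean is taken over this one image, not over the whole dataset.
    """
    mean = sum(pixels) / len(pixels)
    return [(p - mean) / 255 for p in pixels]


# A tiny 4-pixel "image" stands in for a real 784-pixel MNIST digit.
sample = [0, 255, 0, 255]
normalized = normalize(sample)
print(normalized)
```

After this step the values are centered on zero and span a range no wider than 1, which tends to be easier for a fully connected net to train on than raw 0 to 255 integers.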



Fulfilling my Goals

I won't go into the design decisions and technical reasoning behind the neural net itself, because that could fill an entire website. Instead, I will explore the features of my program. To simplify testing, I created a way to write the trained weights to a text file and automatically load them on subsequent runs. Switching between retraining and loading saved weights can be done by commenting out a single for loop. Included in my GitHub repository is a pre-trained weights.txt file which can frequently hit over 70% accuracy. This shows that the program provides the intended functionality, but there is room for improvement.
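The save-and-reload idea can be sketched roughly as follows. This is not the project's actual code: the text layout (one row of space-separated numbers per line, a blank line between layers), the function names, and the layer sizes are all assumptions made for illustration. The sketch also decides between loading and initializing by checking whether the file exists, rather than by commenting out a loop.

```python
import os
import random
import tempfile


def save_weights(path, layers):
    """Write each layer as rows of space-separated floats; a blank line ends a layer."""
    with open(path, "w") as f:
        for layer in layers:
            for row in layer:
                f.write(" ".join(repr(w) for w in row) + "\n")
            f.write("\n")


def load_weights(path):
    """Read back the layout written by save_weights."""
    layers, current = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                current.append([float(x) for x in line.split()])
            elif current:
                layers.append(current)
                current = []
    if current:
        layers.append(current)
    return layers


def init_weights(sizes, rng):
    """Random starting weights: one n_out x n_in matrix per pair of adjacent layers."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


def load_or_init(path, sizes, rng):
    """Reuse saved weights when the file exists, otherwise start fresh."""
    if os.path.exists(path):
        return load_weights(path)
    return init_weights(sizes, rng)


# Hypothetical tiny network (4 inputs, 3 hidden, 2 outputs) in a temp directory.
path = os.path.join(tempfile.mkdtemp(), "weights.txt")
weights = load_or_init(path, [4, 3, 2], random.Random(0))   # first run: no file yet
save_weights(path, weights)
restored = load_or_init(path, [4, 3, 2], random.Random(1))  # later run: loaded from file
```

Writing each float with repr keeps the round trip exact, so a reloaded network behaves the same as the one that was saved.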

Testing and improving the neural net is simple. The parameters can be customized from the main method, which controls all the vital aspects such as input size, hidden layer count, hidden layer size, output size, and learning rate with simple numeric edits.
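To make the parameters concrete, here is a small Python sketch of how such a configuration could be grouped and turned into layer sizes. The class name, field names, and the hidden layer and learning rate values are hypothetical. Only the input size of 784 (from the 28x28 images) and the output size of 10 (one per digit) follow from the dataset, and the parameter count assumes one bias per neuron.

```python
from dataclasses import dataclass


@dataclass
class NetworkConfig:
    input_size: int = 784          # 28x28 pixels per image
    hidden_layer_count: int = 2    # hypothetical value
    hidden_layer_size: int = 16    # hypothetical value
    output_size: int = 10          # one output per digit, 0 through 9
    learning_rate: float = 0.1     # hypothetical value

    def layer_sizes(self):
        """Neuron counts from the input layer through to the output layer."""
        hidden = [self.hidden_layer_size] * self.hidden_layer_count
        return [self.input_size] + hidden + [self.output_size]

    def parameter_count(self):
        """Total weights plus biases, assuming one bias per non-input neuron."""
        sizes = self.layer_sizes()
        return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


config = NetworkConfig()
print(config.layer_sizes(), config.parameter_count())

# Trying a wider, shallower network is a one-line change.
wider = NetworkConfig(hidden_layer_count=1, hidden_layer_size=64)
```

Keeping every tunable value in one place means an experiment only touches a single line, which matches the goal of easy customization.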