Thursday, April 28, 2016

More Fun with Machine Learning - Recognizing Digits

I've been interested in machine learning for a long time, but I've been moving very slowly. I got curious recently and decided to put two of my projects together to see what would happen.

Back in June 2014, I came across a machine learning competition to recognize hand-written digits. I wasn't up for the machine-learning parts, but I did write a program to figure out how to display the datasets: Coding Practice: Displaying Bitmaps from Pixel Data


Then in July 2015, I got really excited about Mathias Brandewinder's book Machine Learning Projects for .NET Developers -- mainly because Chapter 1 was all about recognizing hand-written digits using this same training set:


I went through the code, first with the C# sample and then with F#, and working through the process step-by-step was very helpful. The end result, though, was just an accuracy percentage: the initial evaluator, which used Manhattan Distance, came out at 93.4%.
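To make "Manhattan Distance" a bit more concrete, here's a minimal sketch of the idea. This is my own illustration, not the code from the book: the class and method names are made up, and it assumes each image is an array of pixel brightness values of the same length. The distance is the sum of the absolute differences between pixels at the same position, so a smaller number means two images look more alike.

```csharp
using System;

public static class DistanceSketch
{
    // Hypothetical helper: adds up how far apart the two images are,
    // one pixel position at a time.
    public static int ManhattanDistance(int[] pixels1, int[] pixels2)
    {
        if (pixels1.Length != pixels2.Length)
            throw new ArgumentException("Images must have the same number of pixels.");

        var total = 0;
        for (int i = 0; i < pixels1.Length; i++)
        {
            total += Math.Abs(pixels1[i] - pixels2[i]);
        }
        return total;
    }

    public static void Main()
    {
        var a = new[] { 0, 128, 255, 0 };
        var b = new[] { 0, 100, 255, 10 };

        // |0-0| + |128-100| + |255-255| + |0-10| = 38
        Console.WriteLine(ManhattanDistance(a, b)); // prints 38
    }
}
```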

Then Mathias goes on to show other algorithms that get that accuracy much closer to 100%.

A Visual Display of Accuracy
Fast forward to a couple weeks ago: I got it into my head that I should combine these two things. Rather than just showing an accuracy percentage, why don't I display what the computer thinks the digit is next to the bitmap of the digit itself?

And that's exactly what I did.

You can get the code from GitHub. I put it into the same "digit-display" project that I used previously, but I added a new branch to hold the combined code: GitHub: jeremybytes/digit-display - Recognizer Branch.

Just a note: if you download and run the code as it is right now, it takes about 3 minutes to process 1,000 records. I'm working on improving the speed, but I'm just happy to get the results right now.

Here's the output:


This is a *large* image, so you might want to click on it to see the full size (non-blurry) version.

Before analyzing the results, let's take a look at how I modified the code.

The Original Project
The original project had a WPF application and a separate project that loaded the data (which is a string of pixel brightness values) and turned them into bitmaps.


The New Project
I added the C# code from Mathias' book in a separate project (named "Recognizer"):


I know I should be using the F# code here, but most of that is in a script. Future steps will be for me to put that F# code into a library that I can bring in to this solution.

Updates to the WPF Application
I took the path of least resistance to get this to work. In my WPF application, I have most of the code in the code-behind of the MainWindow.

I added a new method that would initialize the digit recognizer (GitHub: MainWindow.xaml.cs):


This uses the "BasicClassifier" (as described in Mathias' book). I load up the data from the "training" set and use that to train the classifier.
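If you haven't seen this kind of classifier before, here's a rough sketch of how "train, then predict" can work with a nearest-neighbor approach. To be clear, this is not the "BasicClassifier" from the book or the code in my repo. The type names (LabeledImage, NearestNeighborSketch) are made up for illustration, and the real classifier may differ in its details. The idea is that "training" just stores the labeled images, and "predicting" finds the stored image with the smallest distance to the new one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type: a known digit plus its pixel values.
public class LabeledImage
{
    public string Label { get; }
    public int[] Pixels { get; }

    public LabeledImage(string label, int[] pixels)
    {
        Label = label;
        Pixels = pixels;
    }
}

// Illustrative nearest-neighbor classifier (an assumption, not the book's code).
public class NearestNeighborSketch
{
    private List<LabeledImage> trainingData = new List<LabeledImage>();

    // "Training" here is nothing more than remembering the labeled images.
    public void Train(IEnumerable<LabeledImage> trainingSet)
    {
        trainingData = new List<LabeledImage>(trainingSet);
    }

    // Returns the label of the stored image closest to the input.
    public string Predict(int[] pixels)
    {
        if (trainingData.Count == 0)
            throw new InvalidOperationException("Train the classifier first.");

        LabeledImage closest = null;
        var shortest = int.MaxValue;
        foreach (var candidate in trainingData)
        {
            var distance = Distance(candidate.Pixels, pixels);
            if (closest == null || distance < shortest)
            {
                shortest = distance;
                closest = candidate;
            }
        }
        return closest.Label;
    }

    // Manhattan distance: sum of absolute pixel differences.
    private static int Distance(int[] a, int[] b)
    {
        var total = 0;
        for (int i = 0; i < a.Length; i++)
        {
            total += Math.Abs(a[i] - b[i]);
        }
        return total;
    }

    public static void Main()
    {
        var classifier = new NearestNeighborSketch();
        classifier.Train(new[]
        {
            new LabeledImage("1", new[] { 0, 255, 0, 0 }),
            new LabeledImage("7", new[] { 255, 255, 0, 255 }),
        });

        // Distance to the "1" is 15; distance to the "7" is 525.
        Console.WriteLine(classifier.Predict(new[] { 0, 250, 10, 0 })); // prints 1
    }
}
```

Since every prediction compares the input against every training record, this approach gets slow as the training set grows.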

In the previous code, I loaded up data from the training set to display. But I changed the code a bit to display the "test" set.

The difference between the "training" set and the "test" set is that the training set has a field that tells what the hand-written digit is supposed to be. This gives our classifier both the bitmap data and the expected output.

The "test" set only has the bitmap data. It does not have a field that tells what the number is supposed to be. Instead, we'll use our eyes to pick out the wrong ones.
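Here's a small sketch of what that difference can look like when parsing a record. I'm assuming a layout where a training line puts the expected digit first, followed by the pixel values, all separated by commas. Check the actual files before relying on that. The names here (TrainingRecordSketch, ParseTrainingLine) are made up for illustration.

```csharp
using System;
using System.Linq;

public static class TrainingRecordSketch
{
    // Assumption: the first value is the expected digit and the rest are pixels.
    // A "test" line would have no label, so every value would be a pixel.
    public static int ParseTrainingLine(string line, out int[] pixels)
    {
        int[] values = line.Split(',').Select(v => int.Parse(v)).ToArray();
        pixels = values.Skip(1).ToArray();
        return values[0];
    }

    public static void Main()
    {
        int[] pixels;
        int label = ParseTrainingLine("7,0,255,128", out pixels);

        Console.WriteLine(label);         // prints 7
        Console.WriteLine(pixels.Length); // prints 3
    }
}
```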

In the App.config file, we have both files referenced so we can load them in the right places:


Displaying the Digits
In the old code, I created an Image control and loaded it into a WPF wrap panel. In the new code, I add the Image, and then I add a TextBlock that holds the result of our classifier.


The "imageString.Split..." code takes our original input (which is a string of comma-separated values) and turns it into an integer array -- the format needed by our classifier.

Then we pass that to the "classifier.Predict()" method, and it gives us back the recognized digit.
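In case the screenshot doesn't come through, those two steps can be sketched roughly like this. This is not a copy of the code in the repo: the Recognize method is made up for illustration, and the "predict" delegate is a stand-in for "classifier.Predict()" (I'm assuming here that the prediction comes back as a string).

```csharp
using System;
using System.Linq;

public static class RecognizeSketch
{
    // Turns "0,0,255,0" into an int[] and hands it to the classifier.
    // The predict delegate stands in for classifier.Predict().
    public static string Recognize(string imageString, Func<int[], string> predict)
    {
        int[] pixels = imageString
            .Split(',')
            .Select(value => int.Parse(value.Trim()))
            .ToArray();

        return predict(pixels);
    }

    public static void Main()
    {
        // Obviously fake classifier: it just reports how many pixels it received.
        Func<int[], string> fakePredict = pixels => pixels.Length.ToString();

        Console.WriteLine(Recognize("0, 0, 255, 0", fakePredict)); // prints 4
    }
}
```

The returned string is what would go into the TextBlock next to the image.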

And that gives us the output (again, it takes about 3 minutes to process 1,000 records):


Analyzing the Results
This is where things get interesting. Now we (as humans) can look through the results to find the pairs of numbers that do not match. Here are just a few.



When looking through the numbers that are wrong, it's easy to see why the classifier made the choice that it did. There are similarities in shapes. And these similarities are easy to see when looking at all the numbers the classifier got *right*.

My next steps are to convert this to use the F# code for the machine learning bits and also figure out how to speed things up -- probably by parallelizing a bunch of the code.
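I haven't done the parallel work yet, but one option I'm considering is PLINQ. The sketch below is only an idea of the shape, not code from the project: PredictAll is a made-up name, and the "predict" delegate again stands in for the classifier. It assumes the classifier is safe to call from multiple threads at once (it only reads the training data), and the WPF controls would still need to be created on the UI thread afterward.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelSketch
{
    // Runs the predictions across multiple cores.
    // AsOrdered() keeps the results in the same order as the input images.
    public static List<string> PredictAll(
        IEnumerable<int[]> images, Func<int[], string> predict)
    {
        return images
            .AsParallel()
            .AsOrdered()
            .Select(pixels => predict(pixels))
            .ToList();
    }

    public static void Main()
    {
        var images = new List<int[]>
        {
            new[] { 0, 0 },
            new[] { 255, 0 },
            new[] { 255, 255 },
        };

        // Stand-in for the classifier: counts the non-zero pixels.
        Func<int[], string> countLit = pixels => pixels.Count(p => p > 0).ToString();

        Console.WriteLine(string.Join(",", PredictAll(images, countLit))); // prints 0,1,2
    }
}
```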

Humans are Awesome
Looking at the hand-written digits next to the numbers predicted by the classifier gives me an appreciation for how difficult this problem really is.

As a human, I have no trouble interpreting the hand-written digits (except for a couple that are really ambiguous). And it's interesting to think about the things that go on in our brains that allow us to recognize things so quickly -- without conscious analysis. It all happens in a moment without having to think about it.

Before you leave, scan through the results to see how it gets some "hard" ones right, and some "easy" ones wrong. Our brains are pretty amazing:

Teaching a computer how to do that is pretty impressive. And honestly, I'm surprised that such a simple algorithm (remember, this is the "step 1" algorithm that we're using here) can get the accuracy as high as it does. This really taught me that we should start out simple and only get more complex as we need to.

And it also gives us a lot of new places to explore.

Happy Coding!
