Sunday, June 22, 2014

Coding Practice: Displaying Bitmaps from Pixel Data

I'm not sure what you do when you wake up early and can't get back to sleep. For some reason, I decided to read about machine learning, and I ended up somewhere I didn't expect.

I went from this:


To building an application that does this:


Along the way, I paid attention to the trial-and-error approach I took to solve the problem. So, here it is with the mistakes (and some code that's still not quite good) as well as the working solution.

You can download the completed code here: http://www.jeremybytes.com/downloads.aspx#CPDB.

[Update: 7/11/2015: Project has been uploaded to GitHub: https://github.com/jeremybytes/digit-display]

[Update 06/27/2016: This project has been expanded beyond what's shown in this article. To see the code described here, check the "DigitDisplay" branch of the jeremybytes/digit-display repository on GitHub.]

How I Got There
So, I started by running a search for articles on machine learning with F#. And I came across lots of links from my friend Mathias Brandewinder (http://www.clear-lines.com/blog/). I guess I'll really have to attend one of his machine learning workshops the next time we're both in the same place.

After bouncing through a few links, I ended up looking at a Kaggle competition (https://www.kaggle.com/c/digit-recognizer/data). The challenge is to build a system that can recognize hand-written digits. Now, I knew that I was not up to that challenge (complex algorithms are something I need to work up to). But I looked at the sample data files, and I got a bit intrigued.

The sample data was in .csv format. And it was basically a comma-separated collection of values between 0 and 255 that represented the darkness of a pixel. The data sets were there to help you train and test your system. But I thought it might be interesting to try to display the images in an application.

Here's what a record looks like:

1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,188,255,94,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,191,250,253,93,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,123,248,253,167,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,80,247,253,208,13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,207,253,235,77,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,54,209,253,253,88,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,93,254,253,238,170,17,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,23,210,254,253,159,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,209,253,254,240,81,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,27,253,253,254,13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,206,254,254,198,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,168,253,253,196,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,203,253,248,76,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,22,188,253,245,93,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,103,253,253,191,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,89,240,253,195,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,220,253,253,80,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,94,253,253,253,94,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,89,251,253,250,131,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,214,218,95,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

For the training data, the first value (1 in this case) is the number represented by the data. Then there are 784 values that represent a 28 by 28 pixel bitmap. And there were lots of records in the file.

Step 1: Build a Parser
I created my WPF shell application, and then created a library to handle loading the file and parsing the strings into integer arrays and turning the arrays into bitmaps.

I knew I could parse the file pretty easily; I've done that lots of times before. But I wasn't quite as sure about turning a big string of characters into a bitmap. So, that's where I started.

Here was my first shot at it:


I started by splitting the CSV string into an array. But this is an array of strings; I needed integers. So, I looped through the strings and parsed them into integers. Note that I don't have any error handling here. If there is bad data, this whole thing will blow up. But the dataset I was working with was "clean", so I didn't worry about it. Then I took the flat integer array and parsed it into a 28x28 array.
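As a rough illustration of that first step, here is a minimal sketch in Python (the application itself is C#, and this is not the original code). The function name "parse_record" and the shortened record are made up for the example. Like the version described above, it has no error handling.

```python
def parse_record(record: str) -> list[int]:
    """Split one CSV record and convert every field to an integer.

    There is no error handling: a non-numeric field raises ValueError.
    """
    return [int(field) for field in record.split(",")]


# A shortened, made-up record: the label followed by a few pixel values.
values = parse_record("1,0,0,188,255,94")
label, pixels = values[0], values[1:]
print(label, pixels)  # 1 [0, 0, 188, 255, 94]
```

A real record has 785 fields: the label plus 784 pixel values.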

But I wasn't at all sure that I got this right. In fact, I'm famous for being "off by one". Since I knew that I was still a long way off from creating a bitmap, I figured it would be best to create some unit tests to at least make sure that I'm parsing the array correctly.

This was a bit tedious because I had to take my sample input (just part of it here):


And turn it into 28 arrays of 28 elements each (what my expected output was):


This was a lot of counting, copying, and pasting. But I eventually got the data set up.

After that, the tests were pretty easy (there are 2 other tests that have the other "chunks" of arrays):


And what I found is that I was off by one. In the code that sets the "pixelIndex", I was subtracting one (to account for the first value that I wanted to discard). Instead, I should have been adding one. Here's the corrected code:
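The indexing fix can be sketched like this in Python (again, not the original C#). It assumes the flat array still holds the label at index 0 and that the pixels are stored row by row; the function name mirrors "GenerateDigitArray" but is only illustrative.

```python
SIZE = 28  # the images are 28 x 28 pixels


def generate_digit_array(values: list[int]) -> list[list[int]]:
    """Turn a flat record (label + 784 pixels) into 28 rows of 28 values."""
    grid = []
    for row in range(SIZE):
        current = []
        for col in range(SIZE):
            # Adding 1 skips the label stored at index 0.
            pixel_index = row * SIZE + col + 1
            current.append(values[pixel_index])
        grid.append(current)
    return grid


# Synthetic record: label 7, then pixel k holds the value k (0..783),
# so the expected content of every cell is easy to predict.
record = [7] + list(range(SIZE * SIZE))
grid = generate_digit_array(record)
print(grid[0][:3], grid[27][27])  # [0, 1, 2] 783
```

A synthetic record like this one is also a handy test input, because a wrong offset shows up immediately in the first cell.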


Since I knew that my 28 x 28 array was correct, I moved on to creating the bitmap image. As a side note, my unit testing pretty much stopped here. The rest of the code deals with images -- and even better, I had no idea what these images were supposed to look like (as we'll see). So, I didn't have much that I could test.

Step 2: Creating the Bitmap
I now had an integer array that I could work with: int[28,28]. This represented the pixels that I needed to create the bitmap. Here was my first stab at that:


I'm not a big fan of nested "for" statements. But I was just trying to get the job done here. I used this to copy the values from the arrays into the pixels from the bitmap image. Since these are grayscale, I applied the value to the RGB values. So if the value was 238, I would get an RGB value of "238, 238, 238".
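The grayscale mapping itself is simple enough to show without any imaging library. This Python sketch (not the original C#, and with a made-up function name) turns each value into an RGB triple with three equal components:

```python
def to_gray_pixels(grid: list[list[int]]) -> list[list[tuple[int, int, int]]]:
    """Map every 0-255 value to an (R, G, B) triple with equal components."""
    return [[(value, value, value) for value in row] for row in grid]


# A tiny 2 x 2 stand-in for the 28 x 28 array.
sample = [[0, 238], [255, 17]]
print(to_gray_pixels(sample)[0][1])  # (238, 238, 238)
```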

I really didn't know if this code was right or not. I needed to display this somehow to find out. But first, I had to load the data.

Step 3: Parse the File
I created another class that would load the data from the CSV file. I just created a static method that would parse the file and generate an array of strings (that I could then pass to the methods I already created).


This checks the config file for the file name, and then looks for the file in the same folder as the executable. Then it just loops through the file and creates an array of strings where each string represents a separate bitmap/digit.

There's a note that we're skipping the first line. That's because the training file (the one I'm using here) has a header row that enumerates the columns.
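Here is a minimal Python sketch of the header-skipping part (not the original C#). It leaves out the config-file lookup and reads from an in-memory sample so it runs on its own; the column names in the sample are placeholders.

```python
import io


def load_data_strings(lines) -> list[str]:
    """Return one string per record, skipping the header row."""
    rows = [line.strip() for line in lines if line.strip()]
    return rows[1:]  # the first line holds column names, not pixel data


# In-memory stand-in for the training file.
sample = io.StringIO("label,pixel0,pixel1\n1,0,255\n7,12,0\n")
records = load_data_strings(sample)
print(records)  # ['1,0,255', '7,12,0']
```

With a real file, the same function could be called on an open file object instead of the StringIO sample.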

Now it was time to start putting things together.

Step 4: Displaying a Bitmap
I flipped over to my WPF application, added an <Image> element to the markup, and then flipped to the code behind to populate that image.

I was doing pretty good for a while...


But now what? There's no way to directly assign a System.Drawing.Bitmap (the type that I had) to a WPF Image control.

Fortunately, the developer's friend, StackOverflow, had my solution: http://stackoverflow.com/a/6775114. So, I created a static class for this code and completed the method.


Now it was time for the moment of truth -- running the application:


At least I got an image. But it doesn't look quite right. Time to read the instructions (from the Kaggle site):

"Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels in total. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive."

Okay, so I got my light/dark reversed. In my image, the higher numbers represent lighter values. Let's fix the method by subtracting the value from 255 (to invert it):
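The inversion is a one-line change. A small Python sketch of the arithmetic (not the original C#; the function names are made up):

```python
def invert(value: int) -> int:
    """Flip a 0-255 darkness value into a 0-255 brightness value."""
    return 255 - value


def to_display_pixels(grid: list[list[int]]) -> list[list[tuple[int, int, int]]]:
    """Build gray (R, G, B) triples so that high input values come out dark."""
    return [[(invert(v),) * 3 for v in row] for row in grid]


print(invert(0), invert(255), invert(238))  # 255 0 17
print(to_display_pixels([[0, 255]]))        # [[(255, 255, 255), (0, 0, 0)]]
```

So a background value of 0 becomes white (255, 255, 255), and a fully inked value of 255 becomes black.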


And see how that looks:


Much better. And I know from looking at the data that this should be the digit "1".

Step 5: Displaying All the Digits
So, I successfully parsed and displayed a single digit image. Next it was time to try to display a lot of them. I swapped out my <Image> for a <ListBox> and then created the image items to put into the list in code:


This just gets the first 10 items, loops through them, and generates bitmaps. Then it creates a WPF Image control for each one and puts it into the ListBox. Here's the output:


Not quite what I wanted. It needs some size constraints:


I set the height and width to 28 since that matched the pixel dimensions. The output was much better:


So, I increased to 100 items:


Then 1000:


And this is where I started to look at the data. Something's not right.

Step 6: Fixing the Image
These numbers don't look quite right. It almost looks like some of them are upside down and some of them are backwards. I thought at first that this might have been intentional. But after staring at it for a bit more, I knew things weren't right on my end.

This probably had something to do with the X-Y coordinate system. Some systems use the top left corner as 0,0. Some systems use the bottom left corner as 0,0. Instead of trying to change my arrays, I figured it would be easier to manipulate the bitmap after it's created. So, I played around a bit with the "RotateFlip" method until I found a setting that worked:
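The article doesn't say which "RotateFlip" setting did the trick, so this sketch doesn't guess. It only illustrates one plausible cause, which is an assumption on my part: if rows and columns get swapped while copying pixels, the image comes out transposed, and a transposed image is the same as one that has been rotated and then mirrored. In Python, on a small grid:

```python
def rotate90_clockwise(grid):
    """Rotate a grid of rows 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid):
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


def transpose(grid):
    """Swap rows and columns."""
    return [list(row) for row in zip(*grid)]


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

fixed = flip_horizontal(rotate90_clockwise(grid))
print(fixed)                     # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(fixed == transpose(grid))  # True
```

That would explain why the digits looked partly "upside down" and partly "backwards", and why a single rotate-and-flip operation on the finished bitmap can undo it.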


That seems to take care of things:


Much better.

Step 7: Final Image Fix
There was one last thing that bothered me. The digits looked blurry. It made sense that they would be blurry when they were stretched out (like in the initial display), but they should be at the native sizes now.

Or should they?

That's when I remembered how WPF works. When you set the height and width of controls, you don't actually set pixel sizes; you set device-independent units (each one is 1/96 of an inch). This is so that WPF can scale appropriately on devices with different DPI settings. As a side note, I really appreciate the scaling now that I'm using a high-resolution display.

Instead of setting the height and width to "28", let's set them to the size of the image itself:
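The arithmetic behind the blurriness can be sketched in a few lines of Python. A WPF unit is 1/96 of an inch, so the number of physical pixels a size covers depends on the DPI. The 144 DPI figure and the function names below are just examples.

```python
WPF_UNITS_PER_INCH = 96  # one device-independent unit is 1/96 inch


def units_to_pixels(units: float, dpi: float) -> float:
    """Physical pixels covered by a size given in device-independent units."""
    return units * dpi / WPF_UNITS_PER_INCH


def pixels_to_units(pixels: float, dpi: float) -> float:
    """Size in device-independent units that covers exactly `pixels` pixels."""
    return pixels * WPF_UNITS_PER_INCH / dpi


print(units_to_pixels(28, 96))   # 28.0 -> one image pixel per screen pixel
print(units_to_pixels(28, 144))  # 42.0 -> 28 image pixels stretched over 42
print(round(pixels_to_units(28, 144), 2))  # 18.67 units for a crisp 28 pixels
```

At anything above 96 DPI, a size of "28" stretches the 28-pixel image over more screen pixels, which is where the blur comes from. Using the image's own reported size avoids the stretch.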


This gives us more digits per line (they are a bit smaller). But none of them are blurry now.



Step 8: Functional Style
Since I started out this experiment looking for functional programming ideas, I figured it would be good to incorporate a few in the application.

I made "GenerateDigitArray" and "GetBitmapFromDigitArray" into static methods. And the "LoadDataStrings" was already static.



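What's important about these methods is that they are self-contained -- the two conversion methods do not depend on any external state or the state of the class that they are in. They do not make any modifications to existing objects. They take parameters and return separate objects. That means they can operate completely independently and have no side effects. (Functional programmers call functions like this "pure". "LoadDataStrings" is the exception in one respect: it still has to read the config setting and the file.)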

By making them static, they also become easier to use in our code:


This code shows that we do not need to create instances of the FileLoader or the DigitBitmap objects. We just use the static methods that are on those classes.
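The same idea in a small Python sketch (not the original C#; the two functions are simplified stand-ins, not the article's methods). Each function takes a value and returns a new one, so they can be chained without creating any objects to hold state, and the inputs are left untouched.

```python
def parse(record: str) -> tuple[int, ...]:
    """Convert a CSV string into a tuple of integers."""
    return tuple(int(field) for field in record.split(","))


def invert(values: tuple[int, ...]) -> tuple[int, ...]:
    """Return a new tuple with every value flipped (255 - value)."""
    return tuple(255 - value for value in values)


records = ("0,255", "238,17")

# Chain the functions; nothing is instantiated and nothing is modified.
results = [invert(parse(record)) for record in records]

print(results)  # [(255, 0), (17, 238)]
print(records)  # ('0,255', '238,17') -- the inputs are unchanged
```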

We can definitely take this further, and the functions are still fairly specific since they only deal with 28x28 arrays. But this is a really good place to start. And again, one of the important things is that we consciously think about these functional concepts.

Step 9: General Clean-Up
That's the working application that I wanted. I know, it's not a very exciting application. But it's kind of cool to think that I'm displaying all of these hand-written numbers based only on comma-separated values.

I did a little bit of clean-up after this. I won't go into all the details (you can check the code download if you're interested). I made a change to the "LoadDataStrings" method. In my main code, I wanted to get rid of the "for" loop in the main block of code and make it a "foreach" so I could process however many records were in the file. You can see this by looking at the previous code sample.

The problem is the first time I did this, my computer just started spinning. That's when I checked the file and found that there were over 40,000 items. That's a bit much to process into images all at once. So, I added a "threshold" parameter to the "LoadDataStrings" method. That way I could say, "Just give me the first 1,000 values". And it would be easier to experiment with. It needs a bit of optimization to work with that many records, and the WPF ListBox may not be the best choice of controls for displaying that many items. It's something to think about further.
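One way to sketch the "threshold" idea in Python (not the original C#; the parameter handling is an assumption about how such a limit might work). It stops reading as soon as enough records have been collected, so a large file is never processed in full.

```python
import io
from itertools import islice


def load_data_strings(lines, threshold=None) -> list[str]:
    """Skip the header row, then return at most `threshold` records.

    With threshold=None, every record is returned.
    """
    body = islice(lines, 1, None)  # drop the header row
    records = (line.strip() for line in body if line.strip())
    return list(islice(records, threshold))


# In-memory stand-in for the training file; the column names are placeholders.
text = "label,pixel0\n1,0\n2,0\n3,0\n"

print(load_data_strings(io.StringIO(text), threshold=2))  # ['1,0', '2,0']
print(len(load_data_strings(io.StringIO(text))))          # 3
```

Because the records come from a generator, the calling code can use a plain "for each" style loop over whatever comes back, whether that is 10 records or 1,000.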

Wrap Up
You never know where coding explorations will take you. I found this to be a good exercise for me -- a way to do a few things that I've never done before. And hopefully you've gotten a bit of insight by seeing my thought process. I don't always get things right the first time. And that's perfectly okay. But by working in small steps, it's really easy to keep making progress.

Happy Coding!
