It inspired me to go back to a bit of my coding practice and do some more experimentation. I took some code that was written in C# and replaced it with F# code and then worked on some optimization. This let me explore the dataset and find some interesting results regarding accuracy based on the size of the training set and the algorithm used.
Previous Code
So, back in June 2014, I first did some exploration into a machine learning problem: recognizing hand-written digits. I wasn't up to the challenge at that point, so I just figured out a way to display the data (represented by comma-separated values from 0 to 255 that indicate the "lightness" of each pixel). You can read about that here: Coding Practice: Displaying Bitmaps from Pixel Data.
This gave me an output like this:
*The Original Digit Display*
This let me see the original bitmaps next to the prediction from the machine learning algorithm. You can read more about the process here: More Fun with Machine Learning: Recognizing Digits.
Here's the result of that:
*Digit Display with Predictions (C# Code)*
This is great because I get to see a side-by-side comparison of the hand-written digit and what the computer thinks it is.
But this was all C# code (taken from Mathias' book). I wanted to use the F# code here.
Integrating the F# Code
So, I had the F# code (also from Mathias' book). But there were a couple of challenges since I haven't worked extensively with F# projects.
If you'd like to take a look at the code, go to the GitHub project: jeremybytes/digit-display and take a look at the FSharpRecognizer branch. This project has taken on a life of its own, so I'll probably be re-organizing things in a bit.
The first challenge was that the code I had was in a script file. That's great for running interactively in the REPL, but it wouldn't work for what I needed: a .NET library that I could call from my WPF application.
The other thing is that the original code ran the machine learning algorithm against an entire set of data so that it could get an accuracy percentage at the end. I needed to run the algorithm against one image at a time so that I could associate the prediction with the image.
The first thing I did was create an F# library in my solution. I copied the script file (.fsx) that I had and created a new F# file (.fs) from it.
I had to make some changes to the top of the file to get things to work. This was new territory for me, but I managed to get a working solution.
Here's the top of the file:
*Top of the F# Library File*
The first thing I had to do was create a module. The way I figured this out was from Visual Studio telling me that I needed to have a module.
So, I declared a module.
Next, I needed to update where I got my data. The original script was using a hard-coded path, but I wanted to get that value from configuration (since that's what I was doing with the C# code). With a little hunting, I found "FSharp.Configuration" that let me do just that.
This gives us an AppSettings type provider (type providers are *extremely* useful in F#). The code is pretty simple:
*Getting the File from Configuration*
Now I had the data pointed at the right files.
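Since the code images aren't reproduced here, a rough sketch of what the top of the file ends up looking like may help. The module name, the config file name, and the "TrainingFile" key are my assumptions for illustration:

```fsharp
module Recognizer

open FSharp.Configuration

// The AppSettings type provider reads the keys from the config file at
// compile time and exposes each one as a typed property.
type Settings = AppSettings<"App.config">

// Assumes a hypothetical key in the config file:
//   <add key="TrainingFile" value="..." />
let trainingPath = Settings.TrainingFile
```

The nice part is that a typo in the key name becomes a compile-time error rather than a runtime surprise.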
The last thing that I really needed to do was create a function that would work with an individual bitmap image (or technically an integer array that represented that bitmap image). This was some new code that I needed to write.
I took a couple different shots at this, first using the "Observation" object from the original script. That contains both the integer array plus a value that represents the actual digit. I didn't have the actual digit here since I was using a different data set (the original script used the training data, which *does* have the actual digit).
This wasn't all that difficult since I had the original script code I could copy from:
*A New Prediction Function*
The "evaluate" method takes the classifier (our algorithm) and runs it against all of the data values. The important bit is right in the middle where it calls "classifier x.Pixels". This creates a prediction (and then it compares it to the actual value in the test data).
So, I just pulled that one piece out and created the "predict" method. This would now return a single prediction (a string) based on a single integer array.
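A sketch of what this looks like, based on the description above (the "Observation" type comes from the original script; the exact signatures here are my assumptions):

```fsharp
// Labeled record from the training data: the actual digit plus the pixels.
type Observation = { Label: string; Pixels: int[] }

// "evaluate" runs the classifier across a whole labeled set and reports
// the fraction of correct predictions.
let evaluate (classifier: int[] -> string) (data: Observation[]) =
    data
    |> Array.averageBy (fun x -> if classifier x.Pixels = x.Label then 1.0 else 0.0)

// "predict" pulls out just the single-image piece: one pixel array in,
// one prediction (a string) out.
let predict (classifier: int[] -> string) (pixels: int[]) : string =
    classifier pixels
```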
Using the Code
Once I had this, I could go to my WPF project and replace the code that called into the C# library with calls to the F# library.
Here's the snippet of code to do that:
*Using the F# Code in the C# Application*
Since the data comes from a text file and is represented by a list of comma-separated integer values, I first had to do some parsing.
The first line takes that entire string and "Split"s it on the commas. This will give us an array of strings. Then the "Select" does a conversion to turn those string values (such as "50", "240", etc.) into integers. The result is an enumeration of integer values. The final method "ToArray" turns it into the integer array that we need for our function.
This syntax takes a bit of getting used to. But I've found that I've become more and more comfortable with this over the years, and I've started to prefer a fluent syntax (where we "dot" methods together) to pipe the output of one function into the input of another.
This is actually a sign that I should jump into functional programming more. The pipe syntax in F# is a bit cleaner than the "dot" syntax that we use in C#.
After getting the integer array, we can run the "predict" function that we just created. All we have to do is pass in the integer array and the classifier that we want to use. In this case, we're using the "manhattanClassifier" -- this was copied from the script file into the F# file with no changes.
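Pieced together, the parsing and the call look something like this sketch. The variable names, and exactly how the F# module and classifier are exposed to C#, are my assumptions:

```csharp
using System;
using System.Linq;

// One record from the data file: comma-separated pixel values
// (shortened here for illustration).
string line = "50,240,17,0";

// Split on the commas, convert each string value to an int,
// and materialize the result as an integer array.
int[] pixels = line.Split(',')
                   .Select(int.Parse)
                   .ToArray();

// Call the F# "predict" function, passing in the classifier we want to use.
string prediction = Recognizer.predict(Recognizer.manhattanClassifier, pixels);
```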
I won't lie. It took me a while to figure out the right syntax for everything here. It was the first time that I was calling F# code from a C# application.
Success!
When I did this, I got a successful output. I was able to display the bitmap side-by-side with the prediction coming from the F# library.
But...
(Everyone I know has a big but...)
The application was *SLOW*. The code with the C# library would load up 1000 records in about 3 minutes. The F# library took 3 times that long.
I figured that I would need to do some optimization. For one thing, I wasn't sure how the F# code was behaving. I was using the "manhattanClassifier" inside of a loop. I was pretty sure that the functions behind this method ran for each loop iteration. This meant loading the training set and other things like that.
I really wanted to simply change the loop that I had from a "foreach" to a "Parallel.ForEach". But that wouldn't work with the existing code. The problem is that I was creating the UI elements (the bitmap image and the text block with the predicted number) inside of the loop. So those really needed to be on the UI thread in order to work.
So, I ended up leaving this code alone for a while.
Optimization - Step 1
At NDC Oslo, I showed the code that I had to Mathias, and he worked with me to figure out where the bottleneck was in the code. When you're optimizing, you first have to figure out where the biggest problem is.
Based on stepping through the code, he suggested that we optimize the code that calculates the distance between the pixels (the so called Manhattan distance). Here's the original code:
*Original F# Manhattan Distance Calculation*
This function is applied between the pixels of our bitmap and the pixels of all of the data in our training file. So this gets run *a lot*.
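The original calculation, as described, pairs up the two pixel arrays, maps each pair to an absolute difference, and sums the results. A sketch of that shape:

```fsharp
// Functional version: allocates intermediate arrays for the zip and the map.
let manhattanDistance (pixels1: int[]) (pixels2: int[]) =
    Array.zip pixels1 pixels2
    |> Array.map (fun (x, y) -> abs (x - y))
    |> Array.sum
```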
Mathias actually suggested that we go a bit non-functional by using mutable data. This would run a bit faster:
*Optimized F# Manhattan Distance Calculation*
Instead of piping the data through a map function and then summing the values, we created a mutable total object that would be updated along the way.
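A sketch of the mutable version described above:

```fsharp
// Imperative version: no intermediate arrays, just a running total.
let manhattanDistance (pixels1: int[]) (pixels2: int[]) =
    let mutable total = 0
    for i in 0 .. pixels1.Length - 1 do
        total <- total + abs (pixels1.[i] - pixels2.[i])
    total
```

The win comes from skipping the intermediate arrays that `Array.zip` and `Array.map` allocate on every call, which matters when the function runs millions of times.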
This seems counter-intuitive to how I felt we *should* be doing things. But sometimes optimization leads us to these types of choices. I'm glad that Mathias was there to recommend this because I never would have thought of this on my own.
With this change in place, the F# code was now just as fast as the C# code. That really makes sense since the C# code is doing the same thing. Here's a snippet of the C# code for comparison:
*C# Manhattan Distance Calculation*
By looking at these 2 blocks of code, it makes sense that these would run at the same speed -- they are doing the same thing.
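A minimal C# sketch of that same loop-and-accumulate shape (the method name is my assumption):

```csharp
using System;

private static int ManhattanDistance(int[] pixels1, int[] pixels2)
{
    // Same accumulation as the mutable F# version: no intermediate
    // collections, just a running total of absolute differences.
    int total = 0;
    for (int i = 0; i < pixels1.Length; i++)
    {
        total += Math.Abs(pixels1[i] - pixels2[i]);
    }
    return total;
}
```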
So I had successfully swapped out the F# code for the C# code and had the same performance in the application. But I knew that I could eke out some more performance by parallelizing things a bit.
Optimization - Step 2
The working application had one big flaw (which is a flaw that I had with the C# code as well). When the application was running, it would go into a "Not Responding" state while the data was loading up. That's because everything was happening on the UI thread. So everything was blocked until the process was complete.
This is never a good experience.
In addition, when I looked at the Task Manager, I saw that only 25% of my CPU was being used by the application (meaning, I was only using 1 of the 4 cores available on my machine).
Time to figure out how to run the calculations in parallel.
I really wanted to use the Parallel.ForEach, even if I had to move my code around a bit. But I couldn't figure out a good way to do that while still keeping things on the appropriate threads. I wanted the "predict" method to run on another thread, but I needed all of the UI element creation and interaction to happen on the UI thread.
So I went to my experience with Task. I knew that I could easily create a Task that would run on a different thread. And then I could create a continuation that ran on the UI thread. (For more information, check out my videos and articles for "I'll Get Back to You: Task, Await, and Asynchronous Methods".)
I extracted out the code that creates the UI elements into a separate method. This was originally inside the foreach loop that iterated over the data:
*Creating the UI Elements*
Everything in this method should run on the UI thread. To manage that, I created a task and a continuation inside the foreach loop:
*Getting Tasks on the Right Threads*
The first Task will run the "predict" method on a separate thread. This will keep our UI thread clear during the long processing parts.
After that, we have a continuation that calls the "CreateUIElements" method that we have above. This method will get called after the original task completes. But more importantly, since we use "TaskScheduler.FromCurrentSynchronizationContext()", this will run the "CreateUIElements" method on the UI thread of our application.
This keeps our UI elements on the UI thread and everything else on separate threads.
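Put together, the pattern looks something like this sketch (the loop variable, the parsing helper, and the "CreateUIElements" signature are my assumptions):

```csharp
using System.Threading.Tasks;

// Capture the UI thread's scheduler before spinning up any tasks.
var uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();

foreach (var imageString in imageStrings)
{
    int[] pixels = ParsePixels(imageString);   // hypothetical parsing helper

    // Run the expensive prediction on a worker thread...
    Task.Run(() => Recognizer.predict(Recognizer.manhattanClassifier, pixels))
        // ...then create the UI elements back on the UI thread.
        .ContinueWith(t => CreateUIElements(imageString, t.Result), uiScheduler);
}
```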
The Results
The result of creating the tasks is that we now use all 4 cores of my machine, and we can get CPU usage up to 100%.
In addition, our application no longer goes into a "Not Responding" state. Instead, we see the results get gradually added to the list box in our UI as the processes complete. It's not completely smooth. I think the UI thread is still getting starved a bit by the other activity, so sometimes there are pauses and then a bunch of elements get added to the UI at the same time. But the overall effect is good.
*FASTER!*
As we can see, this loads up the data in 1 minute, 16 seconds. This is about 3 times faster than before (it's not 4 times faster because there's some overhead for the tasks, threading, scheduling, etc.).
As a reminder, you can get the code from the GitHub project: jeremybytes/digit-display and take a look at the FSharpRecognizer branch.
More to Come
I was really excited to get the F# code working in this application. This makes me feel like I can integrate F# code into my existing applications without too much trouble. But this also opened up a lot of possibilities for more exploration.
1. Different Classifiers
In addition to the classifier that uses that Manhattan distance calculation, the F# code has a classifier that uses Euclidean distance. In the script examples that Mathias shows, the Euclidean classifier is more accurate. Now I can swap between the Manhattan classifier and Euclidean classifier to see what the actual output looks like. I'm curious to see if the Euclidean classifier simply reduces the number of errors made by the Manhattan classifier or if it makes completely different errors.
I've already done a bit of this experimentation, so look forward to an article on that soon.
2. Speed - Training Set
The Euclidean classifier is a bit slower than the Manhattan classifier -- the calculation squares each difference instead of taking an absolute value. This code runs quite a bit more slowly than the code from Mathias' book, and that's because I'm using a different training set. Mathias uses a portion of the training set provided by Kaggle, but I'm using all 42,000 records as the training set. This significantly slows down the predictions.
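For reference, a sketch of what the Euclidean version of the distance calculation looks like in the same mutable style (the function name is my assumption):

```fsharp
let euclideanDistance (pixels1: int[]) (pixels2: int[]) =
    let mutable total = 0
    for i in 0 .. pixels1.Length - 1 do
        let diff = pixels1.[i] - pixels2.[i]
        // Squaring costs a bit more than the abs in the Manhattan version.
        total <- total + diff * diff
    total
// Note: for ranking nearest neighbors, the final square root can be
// skipped, since it doesn't change which distance is smallest.
```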
I've experimented with different size training sets (which affect both performance and accuracy). So the results of that will be coming soon, too.
3. Speed - Prediction
As another way of speeding things up, Mathias suggested that there may be a way to optimize the predictions (such as short-circuiting if a value gets too high). This will require a bit more thought.
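One possible shape for that short-circuit, sketched against the Manhattan calculation (the cutoff parameter and function name are my assumptions, not code from the project):

```fsharp
// Stop accumulating once the running total exceeds the best (smallest)
// distance found so far -- this candidate can no longer win.
let manhattanDistanceWithCutoff (cutoff: int) (pixels1: int[]) (pixels2: int[]) =
    let mutable total = 0
    let mutable i = 0
    while total < cutoff && i < pixels1.Length do
        total <- total + abs (pixels1.[i] - pixels2.[i])
        i <- i + 1
    total
```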
4. Speed - Reloading
Finally, I'm pretty sure that the training data is getting reloaded each time I call the "predict" function. When this code is being used in a script, it's really easy to only run part of the script so that the data only gets loaded once. But because I'm calling this in a library, I think it gets run each time. This doesn't appear to be terribly slow, but eliminating duplicate work is always a good idea.
[Update: After some more experimentation, I found that the training data is *not* being reloaded each time.]
So there's still quite a bit of work to do on this application.
Wrap Up
This process has reminded me of how much I like functional programming. I really need to do the deep dive that I've been threatening to do for the last 2 years. The good news is that I've got some downtime after the next few weeks. I think I'll use that time to go full-bore into the functional world. I've spent enough time sitting on the edge; it's time to dive in.
Happy Coding!
Hi Jeremy,
I tried rewriting your distance calc in a more functional way using fold, and my benchmarking gave me a ~33% performance improvement (2.4s -> 1.6s for 10 million pixel arrays).
The code would be:
```fsharp
let manhattanDistance2 (pixels1, pixels2 : int []) =
    Array.zip pixels1 pixels2
    |> Array.fold (fun z (x1, y1) -> abs(x1 - y1) + z) 0
```
Does that have the same sort of impact as using a mutable variable for you?
Hi Doug,
Thanks for contributing. The proposed code does run faster than the original, but it's still quite far off from the mutable version.
Original (Array.map) - 470 seconds
Proposed (Array.fold) - 373 seconds
Current (mutable) - 72 seconds
I got these numbers by running the application against the full training set (42,000 records) and recognizing 1,000 items. I ran the application from a development branch (DetailedComparison - https://github.com/jeremybytes/digit-display/tree/DetailedComparison). I used this branch because it's possible to change parameters in the UI. There are some bugs in this branch, but they don't affect the timing comparison.
I'm a bit torn at having the mutable code in here. As I've been getting a bit deeper into functional programming, I'm finding that a lot of the situations I normally deal with work better with object-oriented programming (particularly around UIs). I may be taking the wrong approach, or I may be missing something completely obvious. But I'm going to keep exploring and learning.
Thanks again for your help
-Jeremy