When dealing with Tasks, where we create our Progress<T> objects affects where our progress handlers run (or don't run).
Let's jump into a sample. Our application is pretty simple (and you can find the code on GitHub: jeremybytes/progress-task). Here's the UI:
What we're doing is starting a new Task and then reporting progress back to our UI. We've got our code in the code-behind to see how the Progress<T> object reacts. This isn't ideal, but it's what we have for small apps, and it will let us see where we can run into issues.
The Basics
We've got a couple of methods that we want to use (these are in MainWindow.xaml.cs along with all the other code):
The "StartLongProcess" method is what we'll put into our Task. This is pretty simple. We have a "for" loop that goes from 0 to 50. Then inside the loop, we send back the value of our indexer through the progress object. Finally, we Sleep for 100 milliseconds to add some delay to our process. We'll pretend that that "Sleep(100)" is actually processing data or some other operation like that.
When "progress.Report" is called, it runs the "UpdateProgress" method (we'll see how these get associated in just a bit). The parameter "message" is the value that is passed in to the "Report" method above. Since we have a "Progress<string>", our parameter will be a string type.
So ultimately, we'll take the value of the loop counter from our "for" loop and put it into a text block in our UI.
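The actual methods are in the GitHub project; here's a minimal sketch of the shape described above (the text block name is an assumption):

    // Sketch only -- the real code is in the jeremybytes/progress-task project.
    // Requires: using System; using System.Threading;
    private void StartLongProcess(IProgress<string> progress)
    {
        for (int i = 0; i <= 50; i++)
        {
            // Send the current loop counter back through the progress object.
            progress.Report(i.ToString());

            // Stand-in for processing data or some other real work.
            Thread.Sleep(100);
        }
    }

    private void UpdateProgress(string message)
    {
        // "ProgressTextBlock" is an assumed name for the text block in the UI.
        ProgressTextBlock.Text = message;
    }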
Progress<T> in "Bad Progress"
We'll start by looking at the code we have on our "Bad Progress" button:
This creates and starts a new Task by using "Task.Run" (for more information see "Task.Run() vs. Task.Factory.StartNew()").
We create a task that will execute the "StartLongProcess" method. This takes an "IProgress<string>" as a parameter (as we saw above). We use "new Progress<string>" to create the progress object. "Progress<string>" implements the "IProgress<string>" interface, so we can pass it to this method as a parameter.
The Progress constructor takes a delegate as a parameter. This is the method that will be called when we "Report" progress. Notice that this is the "UpdateProgress" method from above. This is how our progress object gets associated with the progress handler.
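Here's a hedged sketch of what that click handler might look like (the handler name is an assumption; the key point is that the Progress<string> is created inside the lambda):

    // Requires: using System.Threading.Tasks; using System.Windows;
    private void BadProgressButton_Click(object sender, RoutedEventArgs e)
    {
        Task.Run(() =>
        {
            // Created inside the lambda, on a thread-pool thread, so there is
            // no UI SynchronizationContext for Progress<T> to capture.
            var progress = new Progress<string>(UpdateProgress);
            StartLongProcess(progress);
        });
    }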
Let's run this code to see what happens. If we click the "Bad Progress" button, we get the following message:
As we can see, we're getting an "InvalidOperationException". The additional information tells us what we need to know: "The calling thread cannot access this object because a different thread owns it."
This means that our "UpdateProgress" method is not running on the UI thread. When we try to directly interact with UI elements from a non-UI thread, we get this error.
We can fix this by marshalling this call back to the UI thread. But there's actually an easier way to get this code to work.
Progress<T> in "Good Progress"
What's the easier way? Let's look at our "Good Progress" button for a clue:
The difference here is subtle. We're still calling "StartLongProcess" inside our Task, and we're still creating a "Progress<string>" object that points to our "UpdateProgress" method.
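A sketch of the "Good Progress" handler (again, the handler name is an assumption):

    private void GoodProgressButton_Click(object sender, RoutedEventArgs e)
    {
        // The Progress<string> is created *before* the Task -- on the UI thread.
        var progress = new Progress<string>(UpdateProgress);

        Task.Run(() => StartLongProcess(progress));
    }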
But as we'll see, our results are completely different when we click the "Good Progress" button:
This code actually works as expected! We can see our text block has the value of "22" (I grabbed a screenshot in the middle of the process).
This calls the exact same "UpdateProgress" method, but it has no trouble interacting with the UI elements. Let's focus on the differences between our 2 methods.
Where We Create Progress<T> Matters
Here are both methods together:
What we need to pay attention to is where we create the "Progress<string>" object in each case.
In the "BadProgress" method, we create the progress object inside of our Task, i.e. we're creating the object inside the lambda expression that we have to define our Task. Since we're creating the progress object on our Task thread, the progress handler ("UpdateProgress") will be run on this thread -- not on the UI thread.
In the "GoodProgress" method, we create the progress object outside of our Task -- before we even create the Task. Since we're creating the progress object on our UI thread, the progress handler will also run on that thread. This is why we can safely interact with our UI elements from the "UpdateProgress" method here.
As a side note, our "progress" variable is captured by the lambda expression. Captured variables are pretty cool. You can learn more about them here: Video - Lambda Expression Basics.
What we've seen is that it's important to pay attention to where we create our Progress<T> objects. This is a subtle difference, but it can cause unexpected problems if we're not looking out for it.
Other Options
There are other ways that we can get this code to work. For example, we can get a reference to the Dispatcher of our UI thread and then use "Dispatcher.Invoke" in the "UpdateProgress" method to marshal things back to the UI thread. This will let us interact with the UI elements as we expect.
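For example, a sketch of that approach (the control name is assumed, and this assumes we're in the window's code-behind so "Dispatcher" is available):

    private void UpdateProgress(string message)
    {
        // Marshal the UI update back to the UI thread explicitly.
        Dispatcher.Invoke(() => ProgressTextBlock.Text = message);
    }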
Another option is to move this code out of the code-behind of our form and into a separate view model class. In this scenario, our "UpdateProgress" method could update a property in the view model rather than updating the UI directly. This would eliminate the direct interaction with the UI elements -- that would all be handled through data binding.
These options have their pros and cons (and special quirks that we need to be aware of). It's always good to think about other ways of accomplishing a task so that we can pick the one that fits our needs.
Wrap Up
I came across this particular issue when I was helping another developer troubleshoot some code. In this code, there were 2 different Tasks that were created a little bit differently. One of them worked, and one of them didn't. On closer examination, we found that the progress objects were being created on different threads (just like our sample code here).
I will be exploring progress reporting in asynchronous methods a bit more in the future. In the meantime, be sure to take a look at the basics of Task, Await and Asynchronous Methods.
When dealing with multiple threads, it's common to run across issues just like this one. Task and Await make some things easier, but we still need to understand some of the underlying functionality so that we can use these appropriately -- and hopefully not be surprised when something strange crops up.
Happy Coding!
Sunday, July 19, 2015
Getting a Bit LINQ-ier
Last time we converted a file load method to be a bit more LINQ-y. The result was a compact and very readable method:
But should we be happy with this? Commenter "TomThumb" is not:
"Have you thought of using File.ReadLines() instead? It returns a lazily evaluated enumerable. It's arguably a bit more LINQy, and perhaps fits your use case a bit better (selecting subsets of the file)?"
Yep, this is probably a good idea. Let's take a closer look and do some informal metrics.
ReadAllLines vs. ReadLines
We'll start by looking at the difference between the methods ReadAllLines() and ReadLines(). Here are the method signatures from the documentation:
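For reference (the documentation screenshots aren't reproduced here), the simplest overloads of these two methods on System.IO.File are:

    public static string[] ReadAllLines(string path);
    public static IEnumerable<string> ReadLines(string path);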
This shows us that "ReadAllLines" returns a string array, and "ReadLines" returns an string enumeration.
What's the difference? Well, "ReadAllLines" will read the entire file. "ReadLines" will read one line at a time as we enumerate through the file. If we stop enumerating, then the "ReadLines" will stop reading from the file. So this gives us an opportunity to short-circuit the file read.
And this is important because of how we're using the data from the file:
Pay attention to the "Take" method. Our file has about 40,000 records in it. But we may only "Take" 1,000 records, so there's no need for us to read the rest of the file.
This sounds good in theory, but let's put it into practice and see if we get a performance difference.
Comparing Performance
For this, we'll compare our original method (using "ReadAllLines") with a new method that uses "ReadLines". Here's the updated method:
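The real method is in FileLoader.cs in the digit-display project; here's a sketch of the ReadLines version being described (the configuration key and signature details are assumptions):

    // Sketch -- see FileLoader.cs in jeremybytes/digit-display for the real code.
    // Requires: using System.Configuration; using System.IO; using System.Linq;
    public static string[] LoadDataStrings(int maxCount = int.MaxValue)
    {
        string fileName = ConfigurationManager.AppSettings["DataFileName"]; // key name assumed

        return File.ReadLines(fileName) // lazily enumerates the file
            .Skip(1)                    // skip the header row
            .Take(maxCount)             // stop reading once we have enough records
            .ToArray();
    }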
I'm also curious about how performance differs based on how much of the file we read. So we'll do tests reading 1,000 records, 10,000 records, and the entire file (about 40,000 records).
1,000 Records
The application already has a duration timer built in, so we'll just use that. Note that this shows the duration for the entire process, which includes reading the file, processing the data into bitmaps, and loading them into a WPF list box.
The file load process is the fastest part of this process, so I expect to see the biggest differences when I only load a small portion of the data file.
Here are the results when we limit things to 1,000 records:
As expected, this is a pretty dramatic difference. Over several runs, the average with "ReadAllLines" was 1.316 seconds. The average with "ReadLines" was 0.951 seconds.
This makes sense since we don't have to read 39,000 records from the file, just the 1,000 that we're using. But let's keep going. I'm curious as to whether we'll keep that advantage with more records.
10,000 Records
The next tests were with 10,000 records. And the difference is still noticeable:
The average with "ReadAllLines" with 10,000 records was 9.311 seconds. The average with "ReadLines" was 9.042 seconds. Again, the bulk of the work being done here is creating bitmaps and loading the list box. But we see that there is about a 1/4 second difference in our results.
40,000 Records
For the last set of tests, I read the entire file, which is about 40,000 records. This is where I was most curious. I wondered if the overhead of the enumeration would outweigh the efficiency of reading all the data into memory.
Let's look at the results:
This is where the tables turn -- but just a little bit. The average with "ReadAllLines" when we read the entire file is 37.811 seconds. The average with "ReadLines" is 37.877 seconds.
These results are so close (within a few 1/100ths of a second) that I don't feel right calling one "faster" than the other.
What this tells us is that when we "short-circuit" the file reading process (by only reading a portion of the file), we do get a definite advantage from "ReadLines". And when we read the entire file, we do not get a noticeable performance hit from using "ReadLines".
So, in this particular instance, it would be better for us to use "ReadLines".
Considerations
I have updated the code in the GitHub project: jeremybytes/digit-display. If you want to play with this code yourself, the file load method is in the FileLoader.cs file, and the read threshold is set in the MainWindow.xaml.cs file.
Here's our file loader method:
Now that we've made this method more LINQ-y by using a Read method that returns an IEnumerable instead of an array, we can look at taking this further.
I'm a big fan of enumerations and IEnumerable. There are some really cool things that we can do (particularly around lazy-loading). In this method, when we call "ToArray" it causes the enumeration to be evaluated. So any lazy-loading goes away right there. We force all of the records to be enumerated.
But we have another option. What if we were to push the enumeration further down in our application? So instead of returning a "string[]" from our "LoadDataStrings" method, we could return "IEnumerable<string>".
How would that affect our downstream code? Well here's where this method is used (in our MainWindow.xaml.cs file):
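That code isn't reproduced here, but a rough sketch of the shape being described (the method and control names below are illustrative, not the actual project code):

    // Rough sketch of the consuming code in MainWindow.xaml.cs -- names are hypothetical.
    string[] rawData = FileLoader.LoadDataStrings(recordsToLoad);

    foreach (string line in rawData)
    {
        // Process the comma-separated pixel data into a bitmap
        // and load it into the WPF list box.
        var digitImage = DigitBitmap.GetBitmapFromRawData(line); // hypothetical helper name
        DigitsListBox.Items.Add(digitImage);                     // hypothetical control name
    }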
Notice that "rawData" is a string array, but this could just as easily be an "IEnumerable<string>". This would change how our code runs. This may be good or bad depending on our needs.
If we switch to an enumeration (rather than an array), then our "foreach" will end up going all the way back to our original enumeration from "ReadLines". This means that a line will be read from the file, then processed into a bitmap, then loaded into the list box before the next line is read from the file.
Is this a good thing? I'm not quite sure -- I'll need to think about this some more. One downside to doing things this way is that the file stays open during the entire process -- even while we're loading our UI controls.
With our current code (which calls "ToArray"), we're done with the file as soon as the file load method is finished. So the file can be closed before we start doing any processing on the data.
I'll leave things the way they are (with the arrays) for now. If you have a preference one way or the other, be sure to leave a comment.
Wrap Up
Thanks to "TomThumb" for making me think about things a bit more. Mathias actually does have a sidebar in his book that talks about the difference between "ReadAllLines" and "ReadLines". I appreciate the push to explore this a bit further.
When the question "How LINQ-y can you get?" comes up, the answer is often "a bit LINQ-ier". I'll look forward to exploring this (and a bunch of F#) as I dive deeper.
Happy Coding!
Friday, July 17, 2015
Jeremy on Developer On Fire
Have you ever wanted to learn more about me? Now's your chance. All you have to do is tune in to Dave Rael's podcast: Developer On Fire. Check out Episode 012 to hear Dave and me have a great conversation about what drives me forward in the development world as well as the world of "making developers better."
As I get older, I have more and more stories. These are some of the best -- including my favorite success, how I add value, my least-favorite failure, and how I got started on my career path as a developer and swerved into my latest career as a Developer Betterer.
Listen now: Episode 012 - Jeremy Clark - Understanding Users and Making Developers Better
Dave has put together a great podcast that gives you a look at developers you've probably heard of, including John Sonmez, Rob Eisenberg, Oren Eini, Udi Dahan, and many others.
Be sure to subscribe to the feed or pick up the podcast from iTunes or Stitcher. Take your pick. Lots of options available from the podcast website: Developer On Fire Podcast.
A big thanks to Dave for giving me the chance to share these stories all in one place.
Happy Coding!
Thursday, July 16, 2015
Getting More LINQ-y
As I explore functional programming more, I'm learning how I haven't been using LINQ as much as I could be.
Last time, I talked about how I was excited about what I saw in Mathias Brandewinder's book Machine Learning Projects for .NET Developers (Amazon link). I made it through the first chapter, and I'm absorbing lots of stuff about machine learning and F#, but I also came across something simple that I hadn't thought about before:
LINQ can be used much more extensively than I've been using it.
Now, I've been a huge proponent of LINQ, but I usually think about it for things like handling existing data sets. But I don't really think about it as a way to create those data sets.
Functionalizing Code
I'll show what I'm talking about by going back to an article from last year where I needed to load data from a text file (Coding Practice: Displaying Bitmaps from Pixel Data).
Here's my original load from file method:
This is fairly straight-forward (and a bit verbose) code. It gets the data file name from configuration, then reads the file, and returns an array of strings (one for each line in the file).
There are a couple of quirks. Notice the "ReadLine()" with the comment "skip the first line". The first line contains header information, so I just did a read to ignore that.
The other quirk is that I have a parameter for the number of lines that I want to read. If the parameter is supplied, then I only want to read that number of lines (the original data file has 40,000 lines, and it was much easier to deal with smaller data sets). And if the parameter is not supplied, then we read in all the lines.
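The original method isn't shown here, but based on that description (a StreamReader, a throwaway ReadLine for the header, and a line-count parameter that defaults to 0), a sketch might look like this (the configuration key and exact structure are assumptions):

    // Sketch of the original, more verbose version.
    // Requires: using System.Collections.Generic; using System.Configuration; using System.IO;
    public static string[] LoadDataStrings(int maxCount = 0)
    {
        string fileName = ConfigurationManager.AppSettings["DataFileName"]; // key name assumed

        var lines = new List<string>();
        using (var reader = new StreamReader(fileName))
        {
            reader.ReadLine(); // skip the first line (header information)

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);

                // If a line count was supplied, stop after that many lines;
                // a value of 0 means "read the whole file".
                if (maxCount > 0 && lines.Count >= maxCount)
                    break;
            }
        }

        return lines.ToArray();
    }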
If you want to look at this code, it's available on GitHub: jeremybytes/digit-display. Look at the initial revision of the file here: FileLoader.cs original.
LINQ to the Rescue!
Now, Mathias shows code to load data from the same file/format (we'll look at that in just a bit). But instead of using a stream reader like I did, he used LINQ pretty much straight across.
So I went back and retro-fitted my file loading method. Here's what's left:
This code does the same thing as the method above. Well, not exactly the same, but close enough for government work.
Rather than using a stream reader, we use "ReadAllLines" to bring in the entire file. (I'll talk about the performance implications of this in just a bit). After that, we just use the standard LINQ methods to get the data we want.
The "Skip(1)" call will skip the header row. The "Take" method will limit the number of records to what's passed in to the parameter. As a side note, notice that I changed the parameter default from "0" to "int.MaxValue". This way if the parameter is omitted, the entire file will be read (well up to the max integer value anyway -- this would be a problem for large data sets (but then so would the rest of this application)).
Then we just use the "ToArray" method to get it into the final format that we want.
I don't know why I never thought of doing things this way before. Now there is a performance difference here. Since "ReadAllLines" reads the entire file, we're bringing in more data than we need, but the heavy-lifting of this application is done after this step, so the performance difference is negligible. But these are the things we need to think about as we make changes to our code.
[Update: 07/20/2015: As suggested in the comments, I explore the difference between "ReadAllLines" and "ReadLines" in Getting a Bit LINQ-ier.]
If you want to look at this code yourself, the latest version of this file is on GitHub: jeremybytes/digit-display. Here's a link to the file: FileLoader.cs current.
The Inspiration
Like I said, the way that Mathias loads the data in his book was the eye-opener. Here's that code (from Chapter 1):
This is doing multiple steps. The second method (ReadObservations) reads the data from the file. In addition to skipping the first line, it does a data transformation using the "Select" method.
And we can see this in the first method. It takes the string data (which is a collection of comma-separated integers) and turns it into an integer array. This process skips the first value because it tells us which digit the data represents. Everything after that on the line is the pixel data.
More LINQ-y Goodness
So, I wasn't content with just the file loading part of the application. I also needed to take the string data and convert it into a list of integer values.
Here's my original attempt at that (just part of this particular method):
This isn't bad code. I split the line on the commas, then "foreach" over the elements to convert them from strings to integers. File here: DigitBitmap.cs original.
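A sketch of that original shape (variable names are assumptions):

    // Sketch of the original conversion (part of a larger method in DigitBitmap.cs).
    string[] elements = line.Split(',');

    var pixelValues = new List<int>();
    foreach (string element in elements)
    {
        pixelValues.Add(int.Parse(element));
    }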
And here's the updated code:
Here's a link to the file on GitHub: DigitBitmap.cs current.
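A sketch of the Select-based version described here (names are illustrative):

    // Sketch of the updated conversion using Select.
    int[] pixelValues = line.Split(',')
        .Skip(1)            // the first value is the digit label, not pixel data
        .Select(int.Parse)  // transform each string element into an int
        .ToArray();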
As far as lines of code are concerned, these are about equivalent. You might notice that the original code doesn't have the "Skip(1)" functionality. That's because I took care of it later in the method (which resulted in an off-by-one error that you can read about in the original article).
A Bit of a Stumbling Block
The thing that I hadn't really thought about before was using "Select" to transform the data from string to integer. And this was a bit of a stumbling block for me.
I have seen "Select" mis-used in the past. The code sample that I saw called a method in "Select" that had side effects -- it mutated state on another object rather than simply returning data. This smelled really bad to me when I saw it. Since then, I've been careful and shied away from using "Select" with other methods.
But in this case, it's perfectly appropriate. We are not mutating data. We are not changing state. We are transforming the data -- and this is exactly what "Select" is designed for. This was a real eye-opener, and I'm going to be a bit more creative about how I use LINQ in the future.
Wrap Up
I've been a big fan of LINQ for a really long time. And I really like showing people how to use it (and if you don't believe me, just check out this video series: Lambdas and LINQ in C#). I'm really happy that I can still extend my outlook and find new and exciting things to do with my existing tools.
And I find myself getting sucked more into the functional world. The problems are interesting, the techniques are intriguing, and the people I know are awesome.
I'm pretty sure that by the time I get through this book, I'll be looking for ways to use functional programming (and most likely F#) as a primary tool in my toolbox.
Happy Coding!
Thursday, July 9, 2015
Preview: Machine Learning Projects for .NET Developers
So, I just got my copy of Machine Learning Projects for .NET Developers by Mathias Brandewinder (Amazon link). I'm excited about this for 2 reasons: (1) machine learning has interested me for a long time, but always seemed out of reach, (2) Mathias is smart, funny, practical, and a good teacher.
But I got even more excited when I flipped through and saw what was on page 4:
Why does this look familiar? Oh yeah...
I've already worked with this data set. I found it when looking at machine learning projects, and I ended up writing an application that would change the raw data into displayable bitmaps.
Read more about it here: Coding Practice: Displaying Bitmaps from Pixel Data.
I'm looking forward to finally understanding the machine learning bits of this particular challenge.
Happy Coding!
Wednesday, July 8, 2015
An Outsider's View of the Functional Community
I really like the functional developer community. I'm not currently part of this community (although I've been threatening to do a deep dive into functional programming for a while), but there are several things that I find attractive.
I've had several opportunities to talk to some really great folks including Mathias Brandewinder (@brandewinder), Phil Trelford (@ptrelford), Bryan Hunter (@bryan_hunter), Jessica Kerr (@jessitron), and Dave Fancher (@davefancher) -- and many others (please don't take it personally if I didn't list you here).
So let's take a look at what I like.
Lots of Explorers
The functional community has lots of explorers. Earlier this year, Phil Trelford posted a series of random art images:
"Random Art from recursive nested random symbolic math expressions http://t.co/NUrjrprcCs http://t.co/SuhIp2NsCX pic.twitter.com/46FMSJtgRT" -- Sean's dad (@ptrelford) January 16, 2015
I love this kind of stuff. And this makes me want to dig into the details further.
In addition, there's a lot going on around machine learning. This is another area that has me intrigued. And I'm looking forward to reading Mathias Brandewinder's new book Machine Learning Projects for .NET Developers. (My copy should arrive tomorrow.)
I expect that this will be a bit over my head, but that's okay. You need to start somewhere. The last time I looked into machine learning, I ended up building something completely different.
And there's much, much more I could say in this area.
No Language Bigots
I really like that there are no language bigots in the functional world (okay, so there are some, but they are the minority).
Every time a functional conference comes up (or a conference with a strong functional track), I see the same types of messages coming out -- most recently, from NDC Oslo. It probably goes along with the large number of explorers in the community: "Show me new stuff!"
"Look at this cool thing that xxx did in Erlang."
"Wow, did you see the Elixir demo from xxx?"
"You're doing awesome stuff with Clojure? Show me the awesome stuff you're doing with Clojure."
And it goes on and on. It seems like it doesn't matter whether someone is using Erlang or Haskell or Clojure or F# or Elixir or Scala. The functional community recognizes that each environment has strengths and weaknesses. Instead of focusing on the weaknesses, there is a huge focus on the strengths -- finding out what each environment is good for and seeing how that can make us super productive.
Welcome to Functional Programming
The people I talked to have been very welcoming and encouraging. Granted, I'm talking to folks who are leading user groups, speaking at events, and writing books, so these are the folks who are actively promoting their community. But my impression is that the same attitude has "filtered through the ranks" of the community in general.
There are a lot of people making it easy for folks to get started. For example, Jessica Kerr has a Pluralsight course Functional Programming with Java. I asked her, "Why not Scala?" (since she introduced herself as a Scala dev), and she answered that this was about how to use functional techniques in your current environment. We can use those functional concepts now, and then maybe jump to a full-functional environment if the need arises.
BTW, I really like her bio on Pluralsight: "Her mission is to bridge the gap between object-oriented and functional development."
I really like the idea of this. For those who have been following my functional programming articles, a lot of what I've written about isn't necessarily because I want to go full functional (although it is attractive). Instead, it's been about how I can leverage functional concepts in my current environment -- which happens to be C#. And I'm a big fan of declarative programming styles such as XAML and LINQ.
Another example is Mathias Brandewinder's presentation "F# for the C# Developer". I've seen this presentation, and it was great to give me a gentle introduction using familiar concepts. There's a recording of the presentation on YouTube so you can experience it:
Why am I Writing This?
So why am I writing about this now? It's actually been on my mind for the last couple of days. Every so often, I see a "You're an idiot if you're still doing OO programming" message come through my Twitter stream.
Fortunately, this is very rare. But whenever I see something like this get retweeted, I die a little inside. I always try to promote the right tool for the job. OO programming has things that it's really good for. Functional programming has things that it's really good for. So let's try to get the best of both worlds.
This morning, I came across an article written by Richard Dalton: Some Functional Programmers are Arrogant. It was written in response to a tweet by Uncle Bob (not shown here).
Even though I have run into this attitude, it has been extremely rare. The vast majority of functional developers that I've interacted with are very welcoming. Richard's article is a very good response to this from someone in the functional community.
I thought that I would add my view from outside the community.
To all you functional developers out there: keep being awesome!
Happy Coding!
Monday, July 6, 2015
Tracking Property Changes in Unit Tests
When we were looking at Assert messages last time, I mentioned a property change tracker class that I've found useful in my tests. Let's take a closer look at why we might need this.
Here's a sample test (pay attention to the "tracker" variable):
The "WaitForChange" method takes 2 parameters: the property name we're checking and a timeout (in seconds). Before going into the code, let's see why this may be important.
Articles in this Series
o Tracking Property Changes in Unit Tests (this article)
o Update 1: Supporting "All Properties"
o Update 2: Finer-Grained Timeout
o Update 3: Async Tests
INotifyPropertyChanged and Asynchronous Methods
I've spent a lot of my time in view models. These are just classes that have properties that we data bind to our UI elements and methods that can call into the rest of our application.
In order for data binding to work as expected, we implement the "INotifyPropertyChanged" interface. This interface has a single member (an event), but we usually set up a helper method that makes it easy to raise this event.
Here's a common implementation of INotifyPropertyChanged:
This code is a little bit different from code I've shown before. Thanks to Brian Lagunas for pointing out that we should add the intermediate "handler" variable to take care of potential multithreading issues. I haven't run into these issues myself, but I'm sure I will as our UIs get more asynchronous and parallel.
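Here's a sketch of that pattern (this is the standard shape rather than the exact code from the sample project):

    // Requires: using System.ComponentModel;
    public class CatalogViewModel : INotifyPropertyChanged
    {
        public event PropertyChangedEventHandler PropertyChanged;

        protected void RaisePropertyChanged(string propertyName)
        {
            // Copy the delegate to a local variable so we don't race with a
            // subscriber unhooking between the null check and the invocation.
            var handler = PropertyChanged;
            if (handler != null)
            {
                handler(this, new PropertyChangedEventArgs(propertyName));
            }
        }
    }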
Then in the setters for our properties, we call "RaisePropertyChanged":
So now, whenever the "LastUpdateTime" property is set (and the value is different from the current value), it will raise the PropertyChanged event, and this will notify the affected UI elements that they need to update their values.
Note: I would normally recommend using the CallerMemberName attribute here, but this code is taken from a .NET 4.0 application, so that attribute is not available.
Asynchronous Method Calls
Normally, we don't need to know when a property changes in our unit tests. The adventure comes when we have asynchronous method calls. If we have an asynchronous method call that updates a property when it completes, we can tap into the INotifyPropertyChanged event rather than setting up full interaction with the asynchronous method.
This is especially true when dealing with the Asynchronous Programming Model (APM), which this particular code sample uses. This async approach has been around for a long time (and thankfully has been mostly replaced by TAP (the Task-based Asynchronous Pattern) with Task, async, and await).
Here's our asynchronous method:
I won't go into the details here. This takes a service call that uses APM (the "_service" object with "BeginGetPeople" and "EndGetPeople") and switches it to TAP (creating a Task and a continuation).
The important part that we're looking for here is that the "LastUpdateTime" is set in the continuation. That means we can use this as a signal that the continuation has completed.
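The method isn't reproduced here, but as a rough sketch of the APM-to-Task shape being described (the service delegate signatures, result type, and property names other than LastUpdateTime are assumptions):

    // Rough sketch only -- the real method wraps the service's APM calls
    // (BeginGetPeople/EndGetPeople) in a Task with a continuation.
    private void RefreshCatalogFromService()
    {
        var peopleTask = Task<List<Person>>.Factory.FromAsync(
            _service.BeginGetPeople, _service.EndGetPeople, null);

        peopleTask.ContinueWith(task =>
            {
                Catalog = task.Result; // "Catalog" and "Person" are assumed names

                // Setting this property signals that the continuation has
                // completed (and raises PropertyChanged for the tracker).
                LastUpdateTime = DateTime.Now;
            },
            TaskScheduler.FromCurrentSynchronizationContext());
    }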
Tests with the Asynchronous Call
I'm a bit torn on whether it's best to look at the test in detail or to look at the "PropertyChangeTracker" first. Let's start with the test:
This test checks to see if the client-side cache is working. This application has a 10 second cache (for demonstration purposes). If the cache is still valid, then the catalog service should *not* be called. But if the cache is expired, then the service *should* be called. This test checks that the service is called again when the cache has expired.
Here's the progression of this test:
- The view model (CatalogViewModel) is created. This is the class that we're testing.
- The tracker (PropertyChangeTracker) is created.
- The "Initialize" method is called on the view model. This calls the "RefreshCatalogFromService" method that we saw above. This has asynchronous behavior.
- The "WaitForChange" method waits for the "LastUpdateTime" to be changed by the asynchronous method. This signals that the asynchronous call is complete and it is safe to continue.
- The "LastUpdateTime" is set to an hour in the past. This sets the cache to expired.
- The "Reset" method is called on the tracker.
- The "RefreshCatalog" method is called on the view model. We expect that this *will* call the "RefreshCatalogFromService" method.
- The "WaitForChange" method is called again on "LastUpdateTime".
- The assert will check that the method on our mock service was called 2 times. (The mock setup is hidden in this case, which isn't good. It would be better if we got rid of the setup method for these tests.)
By waiting for the "LastUpdateTime" property to be updated, we know that the asynchronous process has completed, and it is safe to continue with our tests. We can think of this as a "block until changed or timeout".
Let's look at the code that makes this possible.
PropertyChangeTracker in Prism 4
I borrowed the PropertyChangeTracker class from the Prism 4 test helpers. Here's the code for that class (notice that "WaitForChange" is not part of this class).
I ran across this class when I was working with Prism 4 a few years back (here and here). I thought it was pretty clever and useful. Here's what it does in its "raw" state.
The constructor takes a parameter of a class that implements the "INotifyPropertyChanged" interface. Then it hooks up its own event handler to the "PropertyChanged" event. In this case, it will add the name of the property to a private "notifications" collection.
The only public method on this class is "Reset" which will clear out the notification list. The "ChangedProperties" property lets us get an array of the property names so that we can see if a particular one was updated.
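Reconstructed from that description, the class looks roughly like this (a sketch, not the verbatim Prism source):

    // Reconstructed sketch of the Prism 4 test helper described above.
    // Requires: using System.Collections.Generic; using System.ComponentModel;
    public class PropertyChangeTracker
    {
        private readonly INotifyPropertyChanged changer;
        private readonly List<string> notifications = new List<string>();

        public PropertyChangeTracker(INotifyPropertyChanged changer)
        {
            this.changer = changer;

            // Record the name of every property that raises PropertyChanged.
            changer.PropertyChanged += (s, e) => notifications.Add(e.PropertyName);
        }

        public string[] ChangedProperties
        {
            get { return notifications.ToArray(); }
        }

        public void Reset()
        {
            notifications.Clear();
        }
    }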
Note: I haven't found an equivalent object in the latest code for Prism 6. I'll need to ping Brian Lagunas about that to see if there is something similar in the test suite.
[Update 07/06/2015: Per Brian, this class was no longer needed for the Prism test suite, so it was removed.]
New Functionality
This class was used as a helper as part of the Prism tests. But I wanted to use this same technique to get a little more functionality. Rather than just getting a list of the properties that have changed, I wanted to be notified *when* a property was changed.
Most of my class is the same as the Prism class:
I ended up removing the private "changer" field because it isn't used anywhere else in the class. The important bit that I added is the "WaitForChange" method:
This method takes 2 parameters: the property name and a timeout (in seconds). It uses a (very inefficient) "while" loop to keep checking whether the property name is showing up in the "notifications" collection.
The loop will continue until the property name shows up in the list or the timeout expires. The method returns "true" if the property was changed; otherwise "false".
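A sketch of that method (written against the tracker sketch above):

    // Sketch of the added method -- an (inefficient) polling loop.
    public bool WaitForChange(string propertyName, int maxWaitSeconds)
    {
        DateTime endTime = DateTime.Now.AddSeconds(maxWaitSeconds);

        // Keep checking until the property shows up in the notification
        // list or the timeout expires.
        while (!notifications.Contains(propertyName) && DateTime.Now < endTime)
        {
            // busy-wait (see the "Concerns" section below)
        }

        return notifications.Contains(propertyName);
    }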
Based on this information, it should be more apparent how the previous test works (I won't repeat it here; you can scroll up and re-read the procedure).
Why Have a Timeout?
So the question you might have is why do we have the "maxWaitSeconds" parameter (a timeout)? It's so we can write tests like this:
This test is similar to the previous test, but it is testing the inverse state. If the cache is still valid, then the catalog service should *not* be called again. It should only be called once when the view model is initialized.
And this is why we need the timeout.
Our "Arrange" section is just the same as the previous test. But our "Act" section is different. Since we are not expiring the cache (by setting the "LastUpdateTime"), the cache should still be valid in this case. That means that when we call "RefreshCatalog", the service should *not* be called. And as a byproduct of that, the "LastUpdateTime" property is *not* updated.
So when we call "WaitForChange" the second time, we expect that the "LastUpdateTime" property will *not* be updated. So this method will not return until the timeout has expired. But it will return; it will not simply hang.
Concerns for PropertyChangeTracker
There are a few things we should look at with PropertyChangeTracker, including cases it currently misses.
Missed "All" Property Changed
One scenario that this class misses is when all of the properties are changed with a single call. If we want, we can call "RaisePropertyChanged" with a null value for the parameter. The effect of this is that the UI will rebind all of the properties.
But in our case, "WaitForChange" would return false. This would be a fairly easy fix; we just need to change the "while" condition to check for the parameter name or a "null" to denote "all properties".
The "while" Loop
The "while" loop in our method is a blocking call (it needs to be blocking because we want our test to also be "blocked"). But this has the advantage of notifying us immediately when the property is changed.
So even though this is inefficient, it works for what we need here. I have a feeling that there is a better way of implementing this immediate notification; one way may be by setting up something in the event handler. It will be interesting to explore further.
Finer-Grained Timeout
One good change would be to allow for a finer-grained timeout parameter. Right now, it is set to seconds (and whole seconds, since it is an integer). It may be better to set this to milliseconds or maybe even a TimeSpan. This would let us use increments smaller than 1 second.
Implementing IDisposable
A practice that we're ignoring here would be to implement the IDisposable interface on this class. Why would we want to? Because we're hooking up an event handler.
When we hook up event handlers, we create references between our objects. And because of this, we may keep the garbage collector from collecting objects that are no longer used. So generally, we like to unhook our event handlers in a Dispose method to clean things up a bit.
But we don't need to worry about that here. Our objects are extremely short-lived. In fact, they only live for the length of the individual test method. We create both objects (the view model that is being tracked, and the tracker object itself) in the test method, and they will both go out of scope when the method exits.
Because of this, we don't need to be worried about these references.
Helper Class
This is really a helper class for our tests. As such, we usually just give it the smallest amount of functionality that it needs to be useful. As we run into different scenarios, then we can expand the class a bit. But generally, if it's working for what we need, then we leave it alone.
There are a lot of interesting techniques out there. For people who have lots of view models and need to track changes to properties, this little change tracker may be a good option.
Happy Coding!
Here's a sample test (pay attention to the "tracker" variable):
The "WaitForChange" method takes 2 parameters: the property name we're checking and a timeout (in seconds). Before going into the code, let's see why this may be important.
Articles in this Series
o Tracking Property Changes in Unit Tests (this article)
o Update 1: Supporting "All Properties"
o Update 2: Finer-Grained Timeout
o Update 3: Async Tests
INotifyPropertyChanged and Asynchronous Methods
I've spent a lot of my time in view models. These are just classes that have properties that we data bind to our UI elements and methods that can call into the rest of our application.
In order for data binding to work as expected, we implement the "INotifyPropertyChanged" interface. This interface has a single member (an event), but we usually set up a helper method that make it easy to raise this event.
Here's a common implementation of INotifyPropertyChanged:
This code is a little bit different than code I've shown before. Thanks to Brian Lagunas for pointing out that the we should add the intermediate "handler" variable to take care of potential multithreading issues. I haven't run into these issues myself, but I'm sure I will as our UIs get more asynchronous and parallel.
Then in the setters for our properties, we call "RaisePropertyChanged":
So now, whenever the "LastUpdateTime" property is set (and the value is different from the current value), it will raise the PropertyChanged event, and this will notify the affected UI elements that they need to update their values.
Note: I would normally recommend using the CallerMemberName attribute here, but this code is taken from a .NET 4.0 application, so that attribute is not available.
Asynchronous Method Calls
Normally, we don't need to know when a property changed in our unit tests. The adventure comes when we have asynchronous method calls. If we have an asynchronous method call that updates a property when it completed, we can tap into the INotifyPropertyChanged event rather than setting up full interaction with the asynchronous method.
This is especially true when dealing with the Asynchronous Programming Model (APM) which this particular code sample uses. This async approach has been around for a long time (and thankfully has been mostly replaced by TAP (Task Asynchronous Pattern) with Task, async, and await).
Here's our asynchronous method:
I won't go into the details here. This takes a service call that uses APM (the "_service" object with "BeginGetPeople" and "EndGetPeople") and switches it to TAP (creating a Task and a continuation).
The important part that we're looking for here is that the "LastUpdateTime" is set in the continuation. That means we can use this as a signal that the continuation has completed.
Tests with the Asynchronous Call
I'm a bit torn on whether it's best to look at the test in detail or to look at the "PropertyChangeTracker" first. Let's start with the test:
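(A sketch -- the test name, the constructor, the Moq-style verification, the 1-second timeout, and the _serviceMock field from the test setup are assumptions; the steps match the progression listed below.)

```csharp
[TestMethod]
public void RefreshCatalog_WithExpiredCache_CallsServiceAgain()
{
    // Arrange
    var viewModel = new CatalogViewModel(_serviceMock.Object);
    var tracker = new PropertyChangeTracker(viewModel);

    viewModel.Initialize();                        // asynchronous load from the service
    tracker.WaitForChange("LastUpdateTime", 1);    // wait for the continuation to finish

    viewModel.LastUpdateTime = DateTime.Now.AddHours(-1);   // expire the cache
    tracker.Reset();

    // Act
    viewModel.RefreshCatalog();
    tracker.WaitForChange("LastUpdateTime", 1);

    // Assert -- the service should have been called twice (Initialize + RefreshCatalog)
    _serviceMock.Verify(s => s.BeginGetPeople(It.IsAny<AsyncCallback>(), It.IsAny<object>()),
        Times.Exactly(2));
}
```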
This test checks to see if the client-side cache is working. This application has a 10 second cache (for demonstration purposes). If the cache is still valid, then the catalog service should *not* be called. But if the cache is expired, then the service *should* be called. This test checks that the service is called again when the cache has expired.
Here's the progression of this test:
- The view model (CatalogViewModel) is created. This is the class that we're testing.
- The tracker (PropertyChangeTracker) is created.
- The "Initialize" method is called on the view model. This calls the "RefreshCatalogFromService" method that we saw above. This has asynchronous behavior.
- The "WaitForChange" method waits for the "LastUpdateTime" to be changed by the asynchronous method. This signals that the asynchronous call is complete and it is safe to continue.
- The "LastUpdateTime" is set to an hour in the past. This sets the cache to expired.
- The "Reset" method is called on the tracker.
- The "RefreshCatalog"method is called on the view model. We expect that this *will* call the "RefreshCatalogFromService" method.
- The "WaitForChange" method is called again on "LastUpdateTime".
- The assert checks that the method on our mock service was called 2 times. (The mock setup is hidden in a shared setup method here, which isn't great; it would be better to get rid of that shared setup for these tests.)
By waiting for the "LastUpdateTime" property to be updated, we know that the asynchronous process has completed, and it is safe to continue with our tests. We can think of this as a "block until changed or timeout".
Let's look at the code that makes this possible.
PropertyChangeTracker in Prism 4
I borrowed the PropertyChangeTracker class from the Prism 4 test helpers. Here's the code for that class (notice that "WaitForChange" is not part of this class):
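(Reconstructed from the description that follows; the actual Prism 4 source may differ a bit in layout.)

```csharp
using System.Collections.Generic;
using System.ComponentModel;

public class PropertyChangeTracker
{
    private readonly INotifyPropertyChanged changer;
    private readonly List<string> notifications = new List<string>();

    public PropertyChangeTracker(INotifyPropertyChanged changer)
    {
        this.changer = changer;
        changer.PropertyChanged += (o, e) => notifications.Add(e.PropertyName);
    }

    public string[] ChangedProperties
    {
        get { return notifications.ToArray(); }
    }

    public void Reset()
    {
        notifications.Clear();
    }
}
```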
I ran across this class when I was working with Prism 4 a few years back (here and here). I thought it was pretty clever and useful. Here's what it does in its "raw" state.
The constructor takes a parameter of a class that implements the "INotifyPropertyChanged" interface. Then it hooks up its own event handler to the "PropertyChanged" event; this handler adds the name of the changed property to a private "notifications" collection.
The only public method on this class is "Reset" which will clear out the notification list. The "ChangedProperties" property lets us get an array of the property names so that we can see if a particular one was updated.
Note: I haven't found an equivalent object in the latest code for Prism 6. I'll need to ping Brian Lagunas about that to see if there is something similar in the test suite.
[Update 07/06/2015: Per Brian, this class was no longer needed for the Prism test suite, so it was removed.]
New Functionality
This class was used as a helper as part of the Prism tests. But I wanted to use this same technique to get a little more functionality. Rather than just getting a list of the properties that have changed, I wanted to be notified *when* a property was changed.
Most of my class is the same as the Prism class:
I ended up removing the private "changer" field because it isn't used anywhere else in the class. The important bit that I added is the "WaitForChange" method:
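(A sketch based on the description below; whether the loop sleeps between checks or just spins is an assumption.)

```csharp
public bool WaitForChange(string propertyName, int maxWaitSeconds)
{
    DateTime endTime = DateTime.Now.AddSeconds(maxWaitSeconds);

    // Very inefficient: keep checking the notifications list until the
    // property shows up or we run out of time.
    while (!notifications.Contains(propertyName) && DateTime.Now < endTime)
    {
    }

    return notifications.Contains(propertyName);
}
```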
This method takes 2 parameters: the property name and a timeout (in seconds). It uses a (very inefficient) "while" loop to keep checking whether the property name is showing up in the "notifications" collection.
The loop will continue until the property name shows up in the list or the timeout expires. The method returns "true" if the property was changed; otherwise "false".
Based on this information, it should be more apparent how the previous test works (I won't repeat it here; you can scroll up and re-read the procedure).
Why Have a Timeout?
So the question you might have is why do we have the "maxWaitSeconds" parameter (a timeout)? It's so we can write tests like this:
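(Again, a sketch with the same assumed names as the earlier test.)

```csharp
[TestMethod]
public void RefreshCatalog_WithValidCache_DoesNotCallServiceAgain()
{
    // Arrange -- same as before, but we do NOT expire the cache
    var viewModel = new CatalogViewModel(_serviceMock.Object);
    var tracker = new PropertyChangeTracker(viewModel);

    viewModel.Initialize();
    tracker.WaitForChange("LastUpdateTime", 1);

    tracker.Reset();

    // Act -- the cache is still valid, so no new service call (and no property change)
    viewModel.RefreshCatalog();
    tracker.WaitForChange("LastUpdateTime", 1);   // waits out the full timeout, then returns false

    // Assert -- the service was only called once (by Initialize)
    _serviceMock.Verify(s => s.BeginGetPeople(It.IsAny<AsyncCallback>(), It.IsAny<object>()),
        Times.Once());
}
```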
This test is similar to the previous test, but it is testing the inverse state. If the cache is still valid, then the catalog service should *not* be called again. It should only be called once when the view model is initialized.
And this is why we need the timeout.
Our "Arrange" section is just the same as the previous test. But our "Act" section is different. Since we are not expiring the cache (by setting the "LastUpdateTime"), the cache should still be valid in this case. That means that when we call "RefreshCatalog", the service should *not* be called. And as a byproduct of that, the "LastUpdateTime" property is *not* updated.
So when we call "WaitForChange" the second time, we expect that the "LastUpdateTime" property will *not* be updated. So this method will not return until the timeout has expired. But it will return; it will not simply hang.
Concerns for PropertyChangeTracker
There are a few things we should look at with PropertyChangeTracker, including cases it currently misses.
Missed "All" Property Changed
One scenario that this class misses is when all of the properties are changed with a single call. If we want, we can call "RaisePropertyChanged" with a null value for the parameter. The effect of this is that the UI will rebind all of the properties.
But in that case, "WaitForChange" would return false. This would be a fairly easy fix; we just need to change the "while" condition to check for the property name or a null entry (where null denotes "all properties").
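One way that fix could look inside "WaitForChange" (just a sketch of the idea -- Update 1 in the series covers the actual change):

```csharp
// A null entry in the notifications list means "all properties changed",
// so treat it as a match for any property name.
while (!notifications.Contains(propertyName)
       && !notifications.Contains(null)
       && DateTime.Now < endTime)
{
}

return notifications.Contains(propertyName)
       || notifications.Contains(null);
```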
The "while" Loop
The "while" loop in our method is a blocking call (it needs to be blocking because we want our test to also be "blocked"). But this has the advantage of notifying us immediately when the property is changed.
So even though this is inefficient, it works for what we need here. I have a feeling that there is a better way of implementing this immediate notification; one way may be by setting up something in the event handler. It will be interesting to explore further.
Finer-Grained Timeout
One good change would be to allow for a finer-grained timeout parameter. Right now, it is set to whole seconds (since the parameter is an integer). It may be better to use milliseconds or maybe even a TimeSpan. This would let us use increments smaller than 1 second.
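For example, the parameter could become a TimeSpan (a sketch of the idea -- Update 2 in the series covers the actual change):

```csharp
public bool WaitForChange(string propertyName, TimeSpan maxWait)
{
    DateTime endTime = DateTime.Now + maxWait;

    while (!notifications.Contains(propertyName) && DateTime.Now < endTime)
    {
    }

    return notifications.Contains(propertyName);
}

// Usage: wait up to half a second instead of a whole second
// tracker.WaitForChange("LastUpdateTime", TimeSpan.FromMilliseconds(500));
```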
Implementing IDisposable
A practice that we're ignoring here would be to implement the IDisposable interface on this class. Why would we want to? Because we're hooking up an event handler.
When we hook up event handlers, we create references between our objects. And because of this, we may keep the garbage collector from collecting objects that are no longer used. So generally, we like to unhook our event handlers in a Dispose method to clean things up a bit.
But we don't need to worry about that here. Our objects are extremely short-lived. In fact, they only live for the length of the individual test method. We create both objects (the view model that is being tracked, and the tracker object itself) in the test method, and they will both go out of scope when the method exits.
Because of this, we don't need to be worried about these references.
Helper Class
This is really a helper class for our tests. As such, we usually just give it the smallest amount of functionality that it needs to be useful. As we run into different scenarios, then we can expand the class a bit. But generally, if it's working for what we need, then we leave it alone.
There are a lot of interesting techniques out there. If you have lots of view models and need to track changes to their properties, this little change tracker may be a good option.
Happy Coding!
Friday, July 3, 2015
Unit Testing Asserts: Skip the Message Parameter
The Assert methods that we get with our unit testing frameworks have an optional message parameter. Under normal conditions, we should *not* include this message. This is the opposite of what I used to do. But I'm rarely perfect at something when I'm first starting out.
Getting Started with Testing
When I first came across the Assert object (and all of the methods that go with it), I saw that most of the methods had an optional message object. I figured that it was a best practice to fill in this message to let the developer know what went wrong in a test failure.
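For example, I would write asserts like this (a sketch; the object names match the test shown below):

```csharp
Assert.AreEqual(0, viewModel.Model.SelectedPeople.Count,
    "SelectedPeople count is not zero");
```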
To me, this was helpful. It tells us that the SelectedPeople property is not empty. But this turns out to be unnecessary.
One of my problems when I was first starting out is that I didn't have a good naming scheme for tests. So, this is what a typical test looked like:
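(A sketch -- the view model type, the Person class, and the _service field from the setup are stand-ins; "Model", "SelectedPeople", and "ClearSelection" are the real names.)

```csharp
[TestMethod]
public void ClearSelectionTest()
{
    // Arrange
    var viewModel = new MainViewModel(_service);
    viewModel.Model.SelectedPeople.Add(new Person());
    viewModel.Model.SelectedPeople.Add(new Person());

    // Act
    viewModel.ClearSelection();

    // Assert
    Assert.AreEqual(0, viewModel.Model.SelectedPeople.Count,
        "SelectedPeople count is not zero");
}
```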
Even with the bad naming, the Assert message is not necessary, but let's explore a couple of good testing practices before looking specifically at the message.
Good Naming
The first step in helping the developer when tests fail is to make sure that we have good names for our tests. Roy Osherove recommends a 3-part naming scheme: the unit of work, the state under test, and the expected behavior.
The idea is that by just looking at the names of the tests, we can figure out what went wrong without needing to look at the test code or the test failure details.
The methodology that I use is a bit of a mutation of this. Here's the test from above with a better name:
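(A sketch -- the exact name is an approximation, and it includes the sanity-check Assert in the Arrange section that we'll come back to below.)

```csharp
[TestMethod]
public void SelectedPeople_OnClearSelection_IsEmpty()
{
    // Arrange
    var viewModel = new MainViewModel(_service);
    viewModel.Model.SelectedPeople.Add(new Person());
    viewModel.Model.SelectedPeople.Add(new Person());
    Assert.AreEqual(2, viewModel.Model.SelectedPeople.Count,
        "Invalid arrangement: SelectedPeople should start with 2 items");

    // Act
    viewModel.ClearSelection();

    // Assert
    Assert.AreEqual(0, viewModel.Model.SelectedPeople.Count,
        "SelectedPeople count is not zero");
}
```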
In this case, we're testing the "SelectedPeople" property of the "Model" object in our view model that we're working with. When we call the "ClearSelection" method, we expect that the "SelectedPeople" property will be empty.
When this test fails, we see it in the test results:
And this tells us that the SelectedPeople property is *not* empty just by looking at the failing test name.
Note: I can think of a couple of different ways to name this method based on Osherove's scheme. The level of detail would depend on the types/number of tests that we have as well as how we are organizing our tests.
Additional Note: This test was written a couple of years ago, and I can think of a few ways to enhance the readability based on my experience since then. But that's true of pretty much any code that I've written more than 6 months ago.
So if we have a good naming scheme, the test name tells us what went wrong.
Assert Message is Just a Comment
The Assert Message is really just a comment. And just like we want to avoid unnecessary comments in our code, we want to avoid unnecessary comments in our tests.
And if we feel the need to explain code in comments, we need to look at rewriting the code. (Check out slides 18 - 26 in my Clean Code presentation or watch a few minutes of the presentation starting at 21:04.)
Let's look at the test run details to see if this message is really necessary:
The standard Assert.AreEqual method gives us a useful message. We expected a "0" value for the count of our collection, but the actual value was "2". And the "SelectedPeople count is not zero" message does not add any useful information.
So, let's remove it. Here's our updated test:
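(Only the final Assert changes -- the message parameter goes away; the rest of the test stays the same.)

```csharp
// Assert
Assert.AreEqual(0, viewModel.Model.SelectedPeople.Count);
```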
And the failure details:
This is still perfectly readable. We already know from the test name that we expect "SelectedPeople" to be empty. And the standard Assert.AreEqual message tells us that we have 2 actual items.
And it's easy to see that we don't need the additional message.
Minimized, but not Eliminated
So, I've modified my testing behavior a bit. By ensuring good names, the Assert messages become unnecessary. But I find them useful in a couple of situations.
Notice from the test above that I have an Assert (with message) in the Arrange section of the test:
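(These are the relevant lines from the Arrange section of the test sketched above.)

```csharp
viewModel.Model.SelectedPeople.Add(new Person());
viewModel.Model.SelectedPeople.Add(new Person());
Assert.AreEqual(2, viewModel.Model.SelectedPeople.Count,
    "Invalid arrangement: SelectedPeople should start with 2 items");
```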
There are situations where I want to validate the arrangement before testing. In this case, since I'm clearing out a collection, I want to make sure that the collection is actually populated first. If I don't populate the collection, then I don't know whether "Clear" actually worked, or if the collection was empty when I started.
This arrangement populates the collection with 2 items, and then uses an Assert as a sanity check to make sure that we have the 2 items we expect to be there. In this case, the message tells us that the Arrangement went bad (rather than the "Act" part of our test).
Is this a good way to do things? Well, I've been re-thinking this as well. I've used this pattern in the past, and it has worked. But I'm thinking about ways that this can be improved.
Multiple Asserts
The problem is that we really should not have multiple "Asserts" in a single test. In the test above, the first assert is really a sanity check, and the second assert is the actual one that we want for the test.
But I have written tests that have multiple asserts. Let's take a look at the application:
The test with the multiple asserts has to do with the checkbox filters that are under the list box. Here's one of the tests:
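(A sketch -- only "Include70s" and the "tracker.WaitForChange" call come from the actual code; the other filter names, the property being waited on, the Initialize call, and the setup are assumptions.)

```csharp
[TestMethod]
public void Filters_OnInitialization_AreAllTrue()
{
    // Arrange
    var viewModel = new MainViewModel(_service);
    var tracker = new PropertyChangeTracker(viewModel);

    // Act
    viewModel.Initialize();
    tracker.WaitForChange("People", 1);   // block until the APM service call completes

    // Assert -- multiple asserts in one test, each with its own message
    Assert.IsTrue(viewModel.Include70s, "Include70s filter is not set");
    Assert.IsTrue(viewModel.Include80s, "Include80s filter is not set");
    Assert.IsTrue(viewModel.Include90s, "Include90s filter is not set");
    Assert.IsTrue(viewModel.Include00s, "Include00s filter is not set");
}
```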
This test is a bit more complicated than I'd like it to be -- primarily because this is testing an asynchronous service call that uses the APM (Asynchronous Programming Model). So, you'll notice the "tracker.WaitForChange" call. The tracker is a helper class that waits for a property to be changed (by hooking up to the PropertyChanged event).
Note: The "tracker" object is actually very useful when we're testing OO code. I'll show that in a future article.
And rather than having a separate test for each filter, I've combined them into a single test.
This has a Big Downside
The big downside to this is that if the first "Assert.IsTrue" fails (the one checking "Include70s"), then none of the other Asserts will run. This is because the Assert method throws an exception which prevents the rest of the test from running.
So if we get one failure, we don't know whether the other items would have failed as well. This may change the approach we take to fixing the bug. If we fix the first problem, we may find that the second Assert fails, and we have to go through the process again.
Because we're testing multiple things in 1 test, our test name needs to be generic enough to be relevant to all of the cases. This is why our test name says "Filters" rather than "Include70sFilter". Since the test name does not give us the details we need when it fails, we need to look at the Message that we're sending back from the Assert.
This is why it's recommended that we only have 1 Assert per test. Ideally, this test should be broken up into 4 different tests. That would enable us to give each test a more specific name, and we could eliminate the Assert messages.
As a side note, there are testing frameworks that are designed for multiple asserts on a single arrangement. One of these frameworks is MSpec. I haven't used it myself, but when I was talking to David Batten (Pluralsight and Twitter) at the Denver Dev Day, he said he's had good experiences with it.
Constant Improvement
Programming is a constant learning process. By examining our existing code, we can determine which things work well and which things can be improved. And by looking at other people's code, we come across ideas that we would not have thought of ourselves.
Tips for Good Tests
- Useful Test Names
- One Assert per Test
- Avoid Assert Messages
Happy Coding!
Thursday, July 2, 2015
July 2015 Speaking Engagements
I've got 3 events currently scheduled in July (one is a late addition). It should be a lot of fun.
Monday, July 13, 2015
LADOTNET
Los Angeles, CA
Meetup Event
Topic: Unit Testing Makes Me Faster
I've been a big fan of unit testing for a long time. And I've worked with it enough to see how it has sped up my development process. I'll show some examples of how unit testing can be much faster than manually testing the application or poking at it with a test app. And we'll look at some features that make up good unit testing -- these are key to having tests that speed us up rather than slow us down.
This is a new talk. I gave it for the first time this past weekend at the So Cal Code Camp. That talk went very well, and I'm looking to keep improving it. (I've been accepted to give this talk at Visual Studio Live! this November, so I'm making it as awesome as I can before then.) Look forward to a lot of fun and lots of practical advice.
Tuesday, July 28, 2015
IE.NET
Riverside, CA
Meetup Event
Topic: Getting Started with Git
I'm fairly new to Git myself; I've only been using it for about 6 months now. But I'm a convert. I really like having full history on my machine without needing network access. In this session we'll create a repository and use it to run through the basic Git commands -- both on the command line and through Visual Studio integration. And we'll see a bit of GitHub along the way. In the end, we'll see that it's not that complicated, and there are some great resources to help us get started.
This is also a new talk that I gave for the first time last week at DotNetGroup.org in Las Vegas. I had a great time doing it, and some good questions came out of the session.
Wednesday, July 29, 2015 - LATE ADDITION
C# Entertainment
Los Angeles, CA
Meetup Event
Topic: I'll Get Back to You: C# Task, Await, and Asynchronous Methods
C# Entertainment is a BRAND NEW group in Los Angeles. The goal is to mix programming technology with stand-up comedy. And don't worry, these are separate acts -- I'm not the one doing the stand-up. I'm looking forward to a great evening of tech and laughter. Come check it out.
A Look Ahead
I have a few exciting things scheduled for August, including a trip to Detroit for the Quicken Loans Technical Conference and then to Wisconsin for That Conference. I'm looking forward to both of these events. (BTW, Uncle Bob will be giving a keynote at That Conference. It will be great to see him in person.)
A Look Back
I had a lot of fun at the events I spoke at in June. The Denver Dev Day was a great day. 250 excited devs were packed into the rooms throughout the day, and tons of conversations were going on in the hallways. I ended up doing 4 talks that day, and I met a lot of new people and caught up with a few old friends.
Here are a few pics from one of my presentations (Learn to Love Lambdas):
And this past weekend, I had a great time at the So Cal Code Camp. I got to catch up with a lot of folks I hadn't seen for a while, meet some new people, attend some good talks, and give several talks myself. Plus, folks got to hear me play the banjo.
Here are some pictures: one from the speaker dinner at the beginning of the event, and one from the raffle at the tail end. These are the brave souls who managed to make it all the way to Sunday afternoon. I was totally exhausted afterwards, but it was totally worth it.
Hunt down a community event near you. It's a great investment in yourself and in others in your local community.
Happy Coding!