Monday, December 19, 2016

When Progress Reporting Goes Bad: Incremental Progress vs. Cumulative Progress

Last time, we looked at how to report progress with an asynchronous task by using whatever payload we like. For that example, we used cumulative progress reporting, where the asynchronous method reported back on how far along it was in the process. This worked for our example, but if we try to add some parallelism to our asynchronous process, we run into problems.

The idea behind incremental progress reporting is that we just report increments from the asynchronous process, and we leave the aggregation up to the calling code.

Note: this article is inspired by Stephen Cleary's article on Reporting Progress from Async Tasks (specifically look at the section "Defining 'Progress'"). Stephen is a great resource for all things async, so be sure to check out his articles and his book.

Let's take a look at our current solution, see the problem when we add some parallelism, and then see how we can fix our reporting by thinking incrementally.

Cumulative Progress Reporting
The way we left our code in the last article is using cumulative progress reporting. Here's what the reporting looks like:

[Image: Cumulative Progress Reporting]

If we look at the output, we see that the numbers count up in order, from 1 to 7. Some of the numbers blip by pretty quickly, but they are all there.

The code comes from the 10b-CustomProgress branch of the GitHub project: jeremybytes/using-task.

The values are coming from the progress report itself. As a reminder, here is the code that reports the progress:

[Image: Reporting Progress (Original)]

This uses a "for" loop to iterate through the items in our collection. The "Report" method is what we want to pay attention to here. This sends back a "PersonProgressData" object that has 3 pieces: (1) the item number, (2) the total number of items, and (3) the name on the item.

Since these are processed in order and reported in order, things come out in the right order on the UI side as well. So we can simply put these values into a string:

[Image: Showing Progress (Original)]
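Since the original code only appears as a screenshot, here is a minimal sketch of what cumulative reporting could look like. Only the "Item" property name comes from the article. "Total", "Name", "CumulativeSketch", "ProcessAsync", and "Format" are hypothetical names, the message format is a guess, and "Task.Delay" stands in for real work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical payload: only the "Item" property name appears in the article.
public class PersonProgressData
{
    public int Item { get; set; }           // position of the item in the loop
    public int Total { get; set; }          // total number of items
    public string Name { get; set; } = "";  // name on the item
}

public static class CumulativeSketch
{
    // The async method does the aggregation: it reports "I am on item i + 1".
    public static async Task ProcessAsync(
        IReadOnlyList<string> names,
        IProgress<PersonProgressData> progress,
        CancellationToken cancelToken)
    {
        for (int i = 0; i < names.Count; i++)
        {
            await Task.Delay(10, cancelToken); // stand-in for real processing
            cancelToken.ThrowIfCancellationRequested();
            progress.Report(new PersonProgressData
            {
                Item = i + 1,
                Total = names.Count,
                Name = names[i]
            });
        }
    }

    // The caller simply formats whatever the async method sent back.
    public static string Format(PersonProgressData data) =>
        $"Processing {data.Item} of {data.Total}: {data.Name}";
}
```

In a WPF code-behind, the caller side would typically wrap "Format" in a "Progress<PersonProgressData>" whose delegate assigns the string to a UI element. The caller never counts anything itself, which is exactly what breaks once reports arrive out of order.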

We're letting the asynchronous method handle the aggregation (meaning that it's telling us how many items have been processed). This is what Stephen refers to as "cumulative" progress reporting. And it works fine as long as things stay in order.

But when we're talking about asynchronous code, things don't always happen in order.

The Problem: Parallelism and Cumulative Reporting
As long as we're doing async coding, we might as well add some parallelism. Right now, we're "processing" items one at a time. What if we create some more asynchronous tasks so that we can process multiple items at the same time?

That's where we run into problems. Let's see our output with some parallel processing:

[Image: Cumulative Progress Reporting (with parallel processing)]

Yikes! Our items are reported out of order now: 6, 5, 2, 7, 1. We don't see all of the items reported because we have a couple items that are reported "at the same time" (or at least close enough together that we don't see the UI updates).

Think about the chaos this would cause if we had this hooked up to a progress bar. It would be jumping all over the place. It's particularly bad since it ends with "1 of 7" in this case rather than starting with "1 of 7". At the end, our progress bar would show "14%" (1 of 7 items) even though all of the work is done.

If you're curious to see this code, you can check out the 10c-IncrementalProgress branch.

The asynchronous method has been updated to look like this:

[Image: Reporting Progress (with parallel processing)]

This code has a few kludges in it since we're not doing real processing. Inside the "for" loop, we create a new Task for each of our items. The "Thread.Sleep()" is a kludge to make sure that the tasks don't all finish so close together that we can't see the reporting.

After the "Thread.Sleep()", we do the same thing as we did before: check the cancellation token and then report the progress.

We add each task to a collection so that we can use the "WhenAll" method on Task. When we "await" this, our code won't move forward until all 7 of our tasks are complete.
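The shape described above (one Task per item, a "Thread.Sleep()" kludge, a cancellation check, a report, and "Task.WhenAll") can be sketched roughly as follows. This is not the code from the branch: "ParallelSketch", "ProcessAsync", and the tuple payload are hypothetical, the sleep durations are made up, and "InlineProgress<T>" is a hypothetical console stand-in for "Progress<T>" that uses a lock where the real application relies on the UI thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical console stand-in for Progress<T>: runs the handler on the
// reporting thread, one call at a time (the lock plays the UI thread's role).
public sealed class InlineProgress<T> : IProgress<T>
{
    private readonly Action<T> handler;
    private readonly object gate = new object();

    public InlineProgress(Action<T> handler) => this.handler = handler;

    public void Report(T value)
    {
        lock (gate) { handler(value); }
    }
}

public static class ParallelSketch
{
    // Still cumulative-style: each task reports its own item number.
    public static async Task ProcessAsync(
        IReadOnlyList<string> names,
        IProgress<(int Item, int Total)> progress,
        CancellationToken cancelToken)
    {
        var tasks = new List<Task>();
        for (int i = 0; i < names.Count; i++)
        {
            int item = i + 1; // copy, so each lambda captures its own value
            tasks.Add(Task.Run(() =>
            {
                // Kludge: spread out completion times so reports are visible.
                Thread.Sleep(new Random(item).Next(50, 300));
                cancelToken.ThrowIfCancellationRequested();
                progress.Report((item, names.Count));
            }, cancelToken));
        }

        // Does not continue until every task has finished.
        await Task.WhenAll(tasks);
    }
}
```

Every item gets reported exactly once, but nothing ties the order of the reports to the item numbers. A handler that displays "Item" as the progress will jump around just like the screenshot.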

This code is not a recommendation for creating your own parallel tasks (for that, you can check out the processing that I've done in the Digit Recognizer project). This code just helps us see one of the problems we run into when we add parallelism.

Incremental Reporting
To fix this code, we won't rely on the asynchronous process to tell us where we are in the process. Instead, we'll just use it for incremental reporting -- to tell us that one more item is complete. We'll keep a counter on the client side to handle the aggregation.

Here's the result:

[Image: Incremental Progress Reporting (with parallel processing)]

Notice the difference here. We can see our progress increasing "2 of 7", "3 of 7", "5 of 7", "6 of 7", and "7 of 7". (The "1" and "4" blip by really quickly.)

The number in parentheses is the item number. So it ends with item #2 (Dylan Hunt), but since we're keeping the actual completed count on the client side, our progress is in the right order.

I made a couple changes to the client-side code to accommodate this. These are in the "MainWindow.xaml.cs" file (the code-behind of our form).

First, I added a "count" field that we could use as an accumulator. This is at the top of the class:

[Image: "count" Field]

The next trick is that we need to have the "count" reset to 0 each time we kick off our asynchronous method. I handled this by getting rid of the class-level "IProgress<T>" object that we had in our last example.

Instead of creating our progress object in the constructor of our class, I added a factory method that would reset the counter and create the IProgress:

[Image: Factory Method for IProgress]

The important bit of the updated string is that we are using the "Item" property for reference only. To keep track of actual progress, we're using the "count" field.

With regard to the "count" field, notice that we set it to 0 at the top of the method, then we increment it inside the delegate that gets called when progress is reported. (We use the pre-increment operator (with the "++" before "count") so that it will increment the value before it's printed out.)
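Here is a rough sketch of the counter-plus-factory idea, written so it can run outside of WPF. "GetFreshProgress" and "count" are the names used in this article. Everything else is an assumption: the class name, the tuple payload, the message format, and the "Message" property that stands in for the TextBlock. The handler is also a named method here rather than an inline delegate, purely so it is easy to call directly.

```csharp
using System;

public class ProgressFactorySketch
{
    private int count; // accumulator kept on the caller's side

    // Stand-in for the TextBlock text in the real form.
    public string Message { get; private set; } = "";

    // Resets the counter, then hands out a fresh progress object for one run.
    // Call this on the UI thread so Progress<T> captures the UI context.
    public IProgress<(int Item, int Total, string Name)> GetFreshProgress()
    {
        count = 0;
        return new Progress<(int Item, int Total, string Name)>(OnProgress);
    }

    // Invoked once per report. "Item" is shown for reference only;
    // ++count increments first, so the first report displays 1.
    public void OnProgress((int Item, int Total, string Name) data)
    {
        Message = $"Processed {++count} of {data.Total} (item {data.Item}: {data.Name})";
    }
}
```

With this in place, reports for items 6 and then 2 would produce "Processed 1 of 7" and "Processed 2 of 7", regardless of the item numbers. Calling "GetFreshProgress" again starts the count over for the next run.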

We still need to pass the "IProgress<T>" to the "Get" method like we did before. But instead of using the class-level field, we use the factory method:

[Image: Passing the IProgress as a Parameter]

When we put all of these pieces together, we get progress reporting that shows the actual progress (even if the items are not reported "in order").


If we were to hook this up to a progress bar, we would get a nice increasing progress that would show "100%" at the end.

Threading Concerns
As mentioned in previous articles, we do need to be careful about where we create the progress objects. Since we want the delegate to run on the UI thread (because we're interacting with UI elements), we need to make sure we create the progress object on the UI thread. (See "Pay Attention to Where You Create Progress Objects" for more information.)

In this case, we're calling the factory method (GetFreshProgress) as a parameter to the "Get" method. This will be resolved on the current thread before the "Get" method is called. So our progress object will be created in the right spot. This is how we can safely interact with the TextBlock on our UI.

Next, we might be concerned about the "count" field since it's shared state that we're incrementing in our delegate. But in this case, the delegate is running on the UI thread (as mentioned above). So even if progress is reported "at the same time", the delegates will only execute one at a time since they run on the UI thread (and the UI thread will only do one thing at a time). So we don't have to worry about race conditions here.
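That guarantee comes from the UI thread, not from "Progress<T>" itself. As a hedged aside: when a "Progress<T>" is created where there is no synchronization context (a console application, for example), its callbacks are invoked on the thread pool, and a plain "++count" is no longer safe. One option in that situation is "Interlocked.Increment". The sketch below is not from the project; "ThreadSafeCounter" and "CounterDemo" are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class ThreadSafeCounter
{
    private int count;

    // Atomically adds one and returns the new value.
    public int Increment() => Interlocked.Increment(ref count);

    public int Value => Volatile.Read(ref count);
}

public static class CounterDemo
{
    // Simulates many reports arriving on thread pool threads at once.
    public static int CountInParallel(int reports)
    {
        var counter = new ThreadSafeCounter();
        Parallel.For(0, reports, _ => { counter.Increment(); });
        return counter.Value;
    }
}
```

None of this is needed in our WPF example, since the UI thread already serializes the delegate calls.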

As a side note, we could move "count" to be a method-level variable in the factory method. Since it gets captured by the delegate, things would still work as expected. I might do that in the final project code. For more information on captured variables, see Lambda Expression Basics.
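To make the captured-variable idea concrete, here is a small sketch. It returns the handler delegate directly rather than a "Progress<T>" so that it can be exercised without a UI. "CapturedCounterSketch", "CreateHandler", the "show" callback, and the message format are all hypothetical.

```csharp
using System;

public static class CapturedCounterSketch
{
    // Each call creates a new "count" that only the returned delegate can see.
    public static Action<(int Item, int Total)> CreateHandler(Action<string> show)
    {
        int count = 0; // local variable, captured by the lambda below

        return data =>
            show($"Processed {++count} of {data.Total} (item {data.Item})");
    }
}
```

Because every call to the factory gets its own "count", there is nothing to reset and no field for other code to touch. In the real form, the returned delegate would be passed to the "Progress<T>" constructor on the UI thread.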

Wrap Up
By using incremental progress reporting, we don't care in what order things are processed or reported on the asynchronous side. Instead, we do the aggregation on the client side before we report the progress. This ensures that our progress is always moving in the right direction.

When we start talking about asynchronous and parallel processing, things get really interesting. It's easy to get things "out of order". And even better, things will process differently based on the number of cores on a particular machine. (It's really interesting to watch the Digit Recognizer max out the CPU by using all the cores on the machine.)

So we do need to put a little bit more thought into things. But once we get in that mindset, things become a bit easier. And if we use functional-style programming that doesn't use shared state, then running things in parallel gets a lot easier.

Happy Coding!
