Thursday, February 6, 2014

Book Review: Parallel Programming with Microsoft .NET

I recently finished reading Parallel Programming with Microsoft .NET by Colin Campbell, Ralph Johnson, Ade Miller, and Stephen Toub (Amazon link or read for free online). This book has a rather intimidating subtitle: "Design Patterns for Decomposition and Coordination on Multicore Architectures", but the content is actually very approachable.

I picked up this book to get better familiarity with the Task Parallel Library (TPL), and it gave me the information I was looking for (plus, quite a bit of information that I didn't know I needed).

This is a fairly short book -- just 130 pages, plus the appendices -- and it clearly describes the problems that can be solved with parallel programming.

What's Covered
Parallel Programming shows how to implement various scenarios using the Task Parallel Library (TPL) and PLINQ (Parallel Language INtegrated Query). This book is from 2010, so it is from the .NET 4.0 era (after the TPL was added, but before async/await).

What I really like about this book is that it focuses on problems and solutions. It shows several different patterns and ways to implement parallelism in our applications, but it has a primary concern of making sure that we select the solution that matches our problem and environment.

We can get an idea of that by looking at the chapter titles:
  1. Introduction
  2. Parallel Loops
  3. Parallel Tasks
  4. Parallel Aggregation
  5. Futures
  6. Dynamic Task Parallelism
  7. Pipelines
  8. Appendix A - Adapting Object-Oriented Patterns
  9. Appendix B - Debugging and Profiling Parallel Applications
  10. Appendix C - Technology Overview
The Introduction gives the basics of what parallelism gives us and also what limitations we can expect. Just because we run parallel code across 8 cores does not mean our application runs 8 times faster. But we can get a good idea of performance improvement by calculating the degree of parallelism along with a few other factors. This is an easy-to-follow introduction, so even if you've never worked with parallel code or don't understand how the hardware handles requests across multiple cores, you can still understand the concepts presented.
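To make the "8 cores is not 8 times faster" point concrete, here is a minimal sketch that uses Amdahl's law, one common way to estimate the upper bound on speedup. The 90% parallel fraction and the `SpeedupEstimate` class are made-up values for illustration, and I'm assuming this is close to the kind of calculation the introduction has in mind.

```csharp
using System;

public static class SpeedupEstimate
{
    // Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction
    // of the work that can run in parallel and n is the number of cores.
    public static double Amdahl(double parallelFraction, int cores)
    {
        if (parallelFraction < 0.0 || parallelFraction > 1.0)
            throw new ArgumentOutOfRangeException("parallelFraction");
        if (cores < 1)
            throw new ArgumentOutOfRangeException("cores");

        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }

    public static void Main()
    {
        // Hypothetical workload: 90% of the run time can be parallelized.
        foreach (int cores in new[] { 1, 2, 4, 8 })
        {
            Console.WriteLine("{0} cores: {1:F2}x", cores, Amdahl(0.9, cores));
        }
    }
}
```

With these made-up numbers, 8 cores give roughly a 4.7x speedup rather than 8x, because the sequential 10% never gets any faster.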

The introduction also gives a quick overview of what will be covered in the other chapters -- with a focus on making sure that we select the right parallel pattern for our problem.

Chapter 2 covers parallel loops. This includes Parallel.For, Parallel.ForEach and also AsParallel in PLINQ. It gives the basics of the syntax, but more importantly, it talks about situations where we *can* use these constructs.

We cannot simply change a "for" loop into "Parallel.For" without considering the work that is actually being done inside the loop. This could result in corrupted data (if multiple threads update the same values without proper synchronization) or deadlocks (if multiple threads try to update values with improper synchronization). And even if our code works, we may actually see slower performance than if the code ran sequentially (due to the overhead of coordinating the additional threads).
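Here is a small sketch of a loop that *is* a good candidate: each iteration reads its own input and writes only to its own slot in the results array, so no synchronization is needed. The `Work` method and the class name are hypothetical stand-ins, not code from the book.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelLoopSketch
{
    // Hypothetical per-item work that touches no shared state.
    public static double Work(int i)
    {
        return Math.Sqrt(i) * 2.0;
    }

    public static double[] RunSequential(int count)
    {
        var results = new double[count];
        for (int i = 0; i < count; i++)
        {
            results[i] = Work(i);
        }
        return results;
    }

    public static double[] RunParallel(int count)
    {
        var results = new double[count];
        // Safe to parallelize: iteration i writes only to results[i].
        Parallel.For(0, count, i =>
        {
            results[i] = Work(i);
        });
        return results;
    }

    public static double[] RunPlinq(int count)
    {
        // AsOrdered keeps the output in the same order as the input range.
        return Enumerable.Range(0, count)
                         .AsParallel()
                         .AsOrdered()
                         .Select(i => Work(i))
                         .ToArray();
    }

    public static void Main()
    {
        double[] expected = RunSequential(1000);
        Console.WriteLine(expected.SequenceEqual(RunParallel(1000)));
        Console.WriteLine(expected.SequenceEqual(RunPlinq(1000)));
    }
}
```

If the loop body instead did something like `total += Work(i)` against a shared variable, the same one-word change to Parallel.For would introduce a race. And for work as cheap as this, the parallel versions may well be slower than the plain loop, so it's worth measuring.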

So, I really appreciate that quite a bit of time is spent in determining whether our code is suitable for a parallel loop. If not, there may be some changes we can make to our code to facilitate it, or there may even be a better pattern for us to look at.

And we don't simply want to add locks to our code to handle synchronization. This can lead to slower performance and deadlocks. It's much better if we can architect our objects and methods so that they do not require synchronization. Instead of using a shared state that needs synchronization, we should look at ways to accomplish the same task without needing that shared state.
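As a rough illustration of removing shared state rather than locking around it, this sketch sums squares two ways. The first uses the Parallel.For overload that gives each worker its own subtotal and only combines subtotals at the end; the second lets PLINQ handle the aggregation. The class and method names are mine, and this is just one way to approach it.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class NoSharedStateSketch
{
    public static long SumOfSquares(int[] values)
    {
        long total = 0;
        Parallel.For(
            0, values.Length,
            // localInit: each worker starts with its own private subtotal.
            () => 0L,
            // body: updates only the private subtotal, so no lock is needed.
            (i, loopState, subtotal) => subtotal + (long)values[i] * values[i],
            // localFinally: runs once per worker to merge its subtotal.
            subtotal => Interlocked.Add(ref total, subtotal));
        return total;
    }

    public static long SumOfSquaresPlinq(int[] values)
    {
        // PLINQ partitions the input and combines partial sums for us.
        return values.AsParallel().Sum(v => (long)v * v);
    }

    public static void Main()
    {
        int[] values = Enumerable.Range(1, 100).ToArray();
        Console.WriteLine(SumOfSquares(values));
        Console.WriteLine(SumOfSquaresPlinq(values));
    }
}
```

The first version still synchronizes, but only once per worker instead of once per item. The second has no explicit synchronization in our code at all.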

Other chapters cover the topics in a similar manner, and there are quite a few references from one chapter to another to explore different options. Just to give you an idea of what's covered in the chapters, here's a screenshot from the introduction (it's in the middle of this page):

Again, this is a great approach to understand the types of problems that can be addressed with parallel programming and how to make sure we use the right implementation for our problem.

Another thing that I really like: there are clear recommendations and examples on how to handle things like cancellation and exceptions. These things are quite a bit more complicated when we may have things running in parallel.
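To show why exceptions get more complicated, here is a minimal sketch of a parallel loop where several iterations can fail. Parallel.For collects whatever was thrown into a single AggregateException. The failure condition and names are invented for the example.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelExceptionSketch
{
    // Returns the number of inner exceptions collected from the loop.
    public static int RunAndCountFailures()
    {
        try
        {
            Parallel.For(0, 100, i =>
            {
                // Hypothetical failure: every 25th item throws.
                if (i % 25 == 0)
                    throw new InvalidOperationException("Item " + i + " failed");
            });
            return 0;
        }
        catch (AggregateException ex)
        {
            // Each failed iteration shows up as one inner exception.
            foreach (Exception inner in ex.Flatten().InnerExceptions)
            {
                Console.WriteLine(inner.Message);
            }
            return ex.InnerExceptions.Count;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Failures seen: " + RunAndCountFailures());
    }
}
```

The count varies from run to run. The loop stops scheduling new iterations once one fails, so we see at least one exception but not necessarily all four.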

Parallel Programming vs. Async
Parallel programming and asynchronous programming are two different things, but they are related. We most often think of parallel programming as running the same function across multiple cores so that we can enhance performance. (And from the chart above, we see that there are several variations that let us take advantage of parallelism if we have some slightly different requirements.)

Asynchronous programming is generally more about "go do this, and let me know when you're done" without interrupting the flow of the rest of the program. So, we may kick off a long-running calculation on a separate thread so that our UI stays responsive.
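A minimal sketch of that "go do this, and let me know when you're done" idea, using Task.Run and async/await. Note that both arrived in .NET 4.5, after the book was written, and `SlowSum` is just a hypothetical stand-in for a long-running calculation.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncSketch
{
    // Hypothetical long-running calculation.
    public static long SlowSum(int n)
    {
        long sum = 0;
        for (int i = 1; i <= n; i++)
        {
            sum += i;
        }
        return sum;
    }

    // Runs the calculation on a thread-pool thread and returns a Task that
    // completes when the result is ready.
    public static async Task<long> SlowSumAsync(int n)
    {
        return await Task.Run(() => SlowSum(n));
    }

    public static void Main()
    {
        Task<long> pending = SlowSumAsync(1000000);
        Console.WriteLine("The caller keeps going while the sum is computed...");
        // Blocking on Result is fine in a console sketch; in a UI event
        // handler we would await the task instead to keep the UI responsive.
        Console.WriteLine(pending.Result);
    }
}
```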

But in both of these cases, the result is that we end up using multiple threads (most of the time). And so we see both parallel programming and asynchronous programming implemented with Tasks and the Task Parallel Library.

Last year, I reviewed Async in C# 5.0 (it looks like almost exactly a year ago, actually). And one of the caveats in my review is that you need a good understanding of Tasks before reading it. It turns out that Parallel Programming with Microsoft .NET gives you (almost) all the information you need to understand how Tasks are used with async.

Is This Still Relevant?
This book is from 2010, and we've had several important updates and additions to the .NET framework. So, is this book still relevant?

The answer is "Yes". The Task Parallel Library has had a few additions (including the IProgress<T> interface), but the core functionality is still the same, including using a CancellationTokenSource, spinning up new Tasks, and dealing with AggregateExceptions.
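Here is a small sketch of those core pieces working together in the .NET 4.0 style: spin up a Task, cancel it through a CancellationTokenSource, and handle the resulting AggregateException. The loop body and class name are placeholders of mine.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Starts a worker, cancels it, and reports whether it ended as canceled.
    public static bool RunAndCancel()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;

            Task worker = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 1000; i++)
                {
                    // Cooperative cancellation: the work checks the token.
                    token.ThrowIfCancellationRequested();
                    Thread.Sleep(10); // placeholder for a slice of real work
                }
            }, token);

            cts.Cancel();

            try
            {
                worker.Wait();
                return false;
            }
            catch (AggregateException ex)
            {
                // Treat cancellation as handled; anything else is rethrown.
                ex.Handle(inner => inner is OperationCanceledException);
                return worker.IsCanceled;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine("Canceled: " + RunAndCancel());
    }
}
```

Passing the same token to StartNew matters here: it lets the Task end up in the Canceled state instead of Faulted when the work throws in response to cancellation.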

I found the Appendix on Adapting Object-Oriented Patterns to be very interesting. I'm a big fan of design patterns (as long as they are used properly). The appendix takes several OO patterns and shows how we may need to modify them to make sure they are thread-safe. For example, the "standard" implementation of the Singleton pattern (which includes a class with a static instance field) can cause problems in a multi-threaded environment. The book provides several ways to make a thread-safe Singleton, including adding synchronization or using Lazy<T> for the instance. Very cool stuff.
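As a quick sketch of the Lazy<T> approach (my own minimal version, not the book's listing), the `AppSettings` class below is a hypothetical singleton. With the constructor used here, Lazy<T> defaults to a thread-safe mode, so the factory runs once even if many threads ask for the instance at the same time.

```csharp
using System;

public sealed class AppSettings
{
    // Lazy<T> defers creation until first use and, by default, makes sure
    // only one thread runs the factory delegate.
    private static readonly Lazy<AppSettings> instance =
        new Lazy<AppSettings>(() => new AppSettings());

    // Private constructor prevents callers from creating their own copies.
    private AppSettings()
    {
        CreatedAt = DateTime.UtcNow;
    }

    public static AppSettings Instance
    {
        get { return instance.Value; }
    }

    public DateTime CreatedAt { get; private set; }
}

public static class SingletonDemo
{
    public static void Main()
    {
        Console.WriteLine(
            object.ReferenceEquals(AppSettings.Instance, AppSettings.Instance));
    }
}
```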

Wrap Up
I'm not actually expecting to dive into parallel programming too deeply. Historically, I haven't had too many business problems that really warranted it. But I do have a much better understanding of the Task Parallel Library and PLINQ. I have been using the TPL more and more in my code (primarily for the asynchronous bits), and this book has given me some good ideas on how I can start expanding into the parallel world.

Parallel Programming with Microsoft .NET is an approachable resource for anyone who is looking to better understand the problems of parallel development and also for developers who simply want a better understanding of the Task Parallel Library for use with async method calls. Personally, I expect to use this information as I dive deeper into functional programming in .NET.

Happy Coding!
