Friday, August 7, 2015

When Does DRY Become ARID?

The DRY Principle is a great thing to keep in mind:
The DRY Principle
Don't Repeat Yourself
But like all principles, we need balance. Otherwise we might find we've gone from DRY to ARID:
All Readability Is Dead
To me, readability is king. Sometimes I will pick a not-completely-optimized way of writing code so that it is more understandable. (Obviously, if I really need speed optimized code -- such as tight loops or other processes -- then that will get prioritized.) Let's see why we need to watch out for over-doing DRY and straying into ARID code.

Challenged By a Friend
I'm a big fan of the DRY principle, and that's because it's caused me a lot of grief in code I've dealt with. And this is one of the reasons I spend time sharing my stories with other developers (see "Clean Code" and "Abstract Art").

One of my friends -- a developer I worked with for many years -- recently watched my Clean Code presentation, and he made a comment about the refactoring part.

"I don't think you went far enough... You're violating DRY."

I'm always up for some exploration, so let's take a look at the code.

The Original Refactoring
If you want to watch the refactoring in video, here's a link to the right spot in the recorded presentation: YouTube - Clean Code 46:03.

In the sample, we start with a simple (but a bit difficult to approach) method:

Then we extract a couple of methods to hide some details and add some good method names to let us know what's going on:

This makes our "Initialize" method much easier to understand. If we need to know how we "get the service from the container", we can drill into that method. Otherwise, we can simply move on.

A Bit Drier
My friend pointed out that there is duplicated code in the two extracted methods. So this is where we should concentrate our efforts.

When we look at these a bit closer, we see a lot of similarity. We have conditions based on "IsRegistered" to see if an item is in the dependency injection container. Note that these are not *exactly* the same. The first method has a "CurrentOrder" string as a parameter; the second method has no parameter.

If the object is not in the container, then we throw a MissingFieldException (same exception with a slightly different message).

Then we resolve the actual item from the container. Again, one of the methods takes a string parameter, and the other does not.

To help us get rid of some of this duplication, we can combine them into a single method with the help of a generic type parameter. Here's a first stab at this:

Now we have a single method that returns a generic type ("T"). But we really haven't eliminated the duplication. We simply have 2 conditional blocks based on whether we have that optional string parameter. If we have a string parameter (like "CurrentOrder"), then it is passed to the "IsRegistered" and "Resolve" methods. And it is also used in the message of our exception.

If there is no parameter, then we use the name of the type itself in the exception message.

With a little more understanding of how the underlying methods work, we can eliminate some duplication:

This takes advantage of the fact that if we call "IsRegistered" with a "null" parameter, it works the same as if we had no parameter. The same is true of "Resolve". It was very easy for me to verify this functionality because I have unit tests in place that test both the success state and the error state for this.

We still need to use the right string for our exception message. So for that, we create a variable ("missingObject") that we can populate with the parameter or the type name as applicable.

Using the Drier Method
Now that we have our drier method, we need to actually use it in our code. Let's update our "Initialize" method:

All of our unit tests still pass, so we know that this code is functionally the same as we had before, but have we made the readability better or worse?

This exposes some details that we don't have in the other version. For example, we see that the model is of type "CatalogOrder" (which we could tell from inspecting the "_model" variable as well). But it also tells us that the model is referred to as the "CurrentOrder". Is this important? Or does it just get in the way?

Let's put the methods side by side:


A Bit Drier

Personally, I prefer the original version in this case. We don't need the details here; that's the reason we extracted the methods to start with. In addition, I know the environment that this code came from. It's based on real code and a real situation. Based on my understanding, the original refactoring is a better fit for the developers who would be maintaining the code.

As with all of our best practices, this is a matter of balance. For a bit more, see Beware of "Never" and "Always".

Truly ARID
I would not consider either version above to be unreadable, but readability is always at the top of my mind. The primary reason for this is that in addition to seeing violations of the DRY principle, I've seen people take this much too far. The result was something completely unreadable.

Here's an example. I was working on a WPF project, and we had several value converters to get our UI working the way that we needed for our users. (Value converters are one of the great things about data binding in WPF -- it let's us do cool things like bind a color in our UI to a date property in our data.)

Someone got tired of the proliferation of value converters, so they decided to create "one converter to rule them all". The idea was to have one value converter that had all sorts of options that could be provided as parameters. We'll look at just a little bit of this code (unfortunately, there was much, much more).

This is the main "Convert" method. And we can see that there are a couple of properties used in this method: StringFormat and FalseText. And there are a couple of methods called as well: IsStringFormatConversion and Match.

Let's take a look at the Match method:

Yikes. This contains more properties: ConditionalText, IsNullCompare, IsStringCompare, and IncludeEmpty. The mixture of "&&" and "||" also makes the logic extremely difficult to follow.

But worse that this is that the converter was confusing to use. Here are a couple basic examples:

These only use two of the possible parameters: ConditionalText and IsStringCompare. But there were examples that used many more.

The big question is "What do these converters actually do?" And that's the problem. Just by looking at the usage, I have no idea what's going on. I would need to dig deeper.

This code has definitely crossed the line into ARID.

Balance in Everything
As developers, we have to constantly find the right balance. This could be weighing the pros and cons of a particular tool. It could be finding the balance between readability and testability. Sometimes it is finding the sweet spot between speed and memory footprint.

Sometimes these choices are clear cut. But often, they are subjective -- especially when we're talking about readability. Because we always have to ask, "Readable by whom?" Readable by a senior developer? Readable by a new developer? Readable by our users?

In the example we looked at today, I could see arguments being made either way. I have a bit more insight since this is based on real code, and I know the developers who needed to understand the code. Because of this, I erred on the side of simplicity. These projects had a lot of new concepts to this team (DI, XAML, delegates, and a few more), so I would simplify wherever possible.

The decisions aren't always easy. But that's part of the fun of this type of work.

Happy Coding!


  1. However, there's a level of repeating yourself that you are completely missing. The Resolve<> method naturally has to check if the item is in the container, and almost certainly throws an exception is the item is not registered. So, what really have we gained by wrapping the Resolve() inside another check-and-throw??

    If we were to write it merely like this, what really have we lost? We'll get the container's proprietary MissingObject exception rather than our own, but we were already exploiting implementation details of the container, so that shouldn't be a problem.

    public void Initialize()
    _service = _container.Resolve();
    _model = _container.Resolve("CurrentOrder");

    This is the most pernicious form of DRY violation, because, while the others merely bloat your code, this will be slowing it down, by doing the same action twice in a row.

    1. Hi, James. I'll agree with your points; however, I'm still okay with the original refactored code. If the container is not populated, then this is truly exceptional; meaning if this happens, we have a coding error (as opposed to unexpected user action).

      I look at who will ultimately support the code. The sanity check of calling "IsRegistered" lets us take full control over the exception (or even whether we want to throw an exception at all). This code was more readable for the developers who were responsible for it.

      There is a bit of bloat, but quite honestly, the slowest part of this application (a desktop app) is the user. The extra sanity check will be unnoticeable in this case. Of course, in different circumstances (such as resolving a large number of objects at one time), we'd look for different solutions. Our job as developers is to constantly balance pros and cons of the approaches that we use.

      DRY is especially dear to me since I've been bitten by it in pretty nasty ways in the past. But I will feel free to make other decisions based on the current environment:

      Thanks for your insight. We all learn from each other.