Wednesday, August 31, 2016

Code Coverage Should Not Be The Goal

Metrics (such as code coverage) are useful tools. They can tell us if we're headed in the right direction. But when a particular metric becomes the goal, we won't get the results that we expect.

I was reminded of this yesterday as I went through the drive-thru of a restaurant. So, it's time to take a closer look at the real problem of "metrics as a goal".

A Good Metric Becomes Useless
Yesterday, I was in the drive-thru of a local fast food restaurant. After paying at the window, the employee asked me to pull around to the front of the building, and they would bring me my food there. They have asked me to do this three times over the last couple months, so it stuck out a bit more this time.

Here's the problem: The employees at the drive-thru were being judged on the length of each transaction. The restaurant has sensors set up to see how long each car is at the window (and the shorter, the better). To get me off of the sensor, they asked me to drive around to the front of the restaurant. At this point, the employee has to walk around the counter and come outside to bring me the food.

This sounds like a good metric to check ("how long does it take to serve each customer?"). But the metric became the goal. The effect is that the employees were actually working *harder* to meet that goal. It takes them longer to walk out to the front of the restaurant (and it is work that is completely unnecessary). And this also means that it takes longer to actually serve the customer.

Because the metric became the goal, the employees were working harder to meet the metric, and the actual numbers lost their value -- the restaurant no longer knows how long it *really* takes to serve each customer.

Code Coverage as a Goal
Bosses love metrics because they are something to grab on to. This is especially true in the programming world where we try to get a handle on subjective things like "quality" and "maintainability".

Code Coverage is one of the metrics that can easily get misused. Having our code covered 100% by unit tests (meaning every line of code is executed by at least one test) sounds like a really good quality to have in our projects. But when the number becomes the goal, we run into problems.

I worked with a group that believed if they had 100% code coverage, they would have 0 defects in the code. Because of this, they mandated that all projects would have to have 100% coverage.

And that's where we run into a problem.

100% Coverage, 0% Useful
As a quick example, let's look at a method that I use in my presentation "Unit Testing Makes Me Faster" (you can get code samples and other info on that talk on my website). The project contains a method called "PassesLuhnCheck" that we want to test.

As a little background, the Luhn algorithm is a way to sanity-check a credit card number. It's a simple checksum designed to catch common data-entry mistakes -- like a mistyped digit or two adjacent digits swapped -- when people type in numbers manually. You can read more about it on Wikipedia: Luhn Algorithm.
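To make the example concrete, here is a rough sketch of what a Luhn check can look like in C#. This is *not* the exact "PassesLuhnCheck" method from the talk (as noted below, that code came from another article), and the "LuhnChecker" class name is just for illustration. The detail that matters for this discussion is that the method contains a catch block:

using System;

public static class LuhnChecker
{
    public static bool PassesLuhnCheck(string value)
    {
        try
        {
            int sum = 0;
            bool doubleThisDigit = false;

            // Walk the digits from right to left.
            for (int i = value.Length - 1; i >= 0; i--)
            {
                // int.Parse throws a FormatException for non-numeric characters.
                int digit = int.Parse(value[i].ToString());

                if (doubleThisDigit)
                {
                    digit *= 2;
                    if (digit > 9) digit -= 9;
                }

                sum += digit;
                doubleThisDigit = !doubleThisDigit;
            }

            return sum % 10 == 0;
        }
        catch (FormatException)
        {
            // Non-numeric input ends up here.
            return false;
        }
    }
}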

So let's write a test:
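The test from the talk isn't reproduced here, but a no-assertion test along these lines captures the idea (assuming MSTest and the illustrative "LuhnChecker" class from the sketch above):

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class LuhnCheckTests
{
    [TestMethod]
    public void PassesLuhnCheck_ValidCardNumber()
    {
        // Calls the method under test, but never checks the return value.
        LuhnChecker.PassesLuhnCheck("4111111111111111");
    }
}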


This test is (almost) 100% useless. It calls our "PassesLuhnCheck" method, but there are no assertions -- meaning, it doesn't check the results.

The bad part is that this is a passing test:

[Screenshot: the test runner reporting this test as passed]
This doesn't really "pass", but most unit testing frameworks are looking for failures. If something doesn't fail, then it's considered a "pass".

Note: I said that this test is *almost* useless because if the "PassesLuhnCheck" method throws an exception, then this test will fail.
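For comparison, a genuinely useful version of this test needs only one more step: capture the result and assert on it. Here's a minimal sketch (same MSTest and "LuhnChecker" assumptions as above):

[TestMethod]
public void PassesLuhnCheck_ValidCardNumber_ReturnsTrue()
{
    bool result = LuhnChecker.PassesLuhnCheck("4111111111111111");

    // The assertion is what makes this a real test: a wrong result now causes a failure.
    Assert.IsTrue(result);
}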

Analyzing Code Coverage
Things get a bit worse when we run our code coverage metrics against this code. By using the built-in Visual Studio "Analyze Code Coverage" tool, we get this result:

[Screenshot: code coverage results showing 92% coverage]
This says that with this one test, we get 92% code coverage! It's a bit easier to see when we turn on the coverage coloring and look at the method itself:

[Screenshot: the "PassesLuhnCheck" method with coverage coloring turned on]
Note: I didn't write this method; I took it from this article: Extremely Fast Luhn Function for C# (Credit Card Validation).

The blue represents the code that the tool says is "covered" by the tests. The red shows the code that is not covered. (My colors are a bit more obnoxious than the defaults -- I picked bold colors that show up well on a projector when I'm showing this during presentations.)

So this shows that everything is covered except for a catch block. Can we fix that?

From my experience with this method, I know that if we pass a non-numeric parameter, it will throw an exception internally -- and that exception is handled by the catch block we're trying to cover. So all we have to do is add another method call to our "test":
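Continuing the sketch from above, the "test" just gets a second call with a non-numeric value (the exact value from the talk isn't reproduced here):

[TestMethod]
public void PassesLuhnCheck_CoverageOnly()
{
    // Still no assertions -- these calls exist only to execute code.
    LuhnChecker.PassesLuhnCheck("4111111111111111");

    // A non-numeric value causes an exception inside the method, which is
    // handled by the catch block -- so that block now counts as "covered".
    LuhnChecker.PassesLuhnCheck("not-a-number");
}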


This test also passes (since it does not throw an unhandled exception). And our code coverage gets a bit better:

[Screenshot: code coverage results showing 100% coverage]
We now have 100% code coverage. Success! Except, our number means absolutely nothing.

When a number becomes the goal rather than a guide, that number can easily become useless.

Code coverage is a very useful metric. It can tell us that we're headed in the right direction. If we have 0% code coverage, then we know that we don't have any useful tests. As that number gets higher (assuming that we care about the tests and not just the number), we know that we have more and more useful tests. We just have to be careful that the number doesn't become the goal.

Overly Cynical?
Some people might think that I'm overly cynical when it comes to this topic. But I've unfortunately seen this behavior in a couple different situations. In addition to the restaurant employees that I mentioned above, I ran into someone who worked only for the metric -- to the detriment of the customer.

Many years ago, I worked in a call center that took hotel reservations. The manager of that department *loved* metrics. Every day, she would print out the reports from the previous day and hang them up outside her office, with the name at the top of the list highlighted.

There were 2 key metrics on that report: number of calls answered and average talk time. "Number of calls answered" means what you think it means: the number of calls that a particular agent answers in an hour. "Average talk time" tracked how long the agent was on the phone with each customer.

There was a particular agent who was consistently at the top of the report whenever she had a shift. But there was something that the manager didn't know: the agent was working strictly for the metrics.

This agent always took a desk at the far end of the room (away from the manager's office). Then she would only answer every 3rd call -- meaning, she would answer and then immediately hang up on 2 out of 3 customers. This got the "number of calls answered" number up -- she was answering 3 times as many calls as she otherwise would. This got the "average talk time" number down -- 2 out of 3 calls had "0" talk time, so the average went down. Since the metrics were up, she could take extra breaks and no one would notice.

Not The Goal
So maybe I am overly cynical when it comes to metrics. But I have seen them horribly misused. We can have "short" drive-thru times while making the experience longer for the customer. We can have "100%" code coverage without actually having useful tests. We can have "short" talk time because we hang up on our customers.

Measuring is important. This is how we can objectively track progress. But when the measurement becomes the goal, we only care about that number and not the reason we're tracking it.

Use those numbers wisely.

Happy Coding!

5 comments:

  1. Agree completely with the concept, and the call center example, but there is a reason beyond hitting metrics for the fast food example.

    Let's say I and my hatred for condiments go to FastFoodCo, and order a plain double cheeseburger at the drive thru. So they need to make it fresh, as they don't keep plain burgers baking under the lights (a side benefit to my pickiness). So they have two choices:
    1 - I wait at the window for them to prepare my sandwich. They get it to me as quickly as possible, though it takes a couple minutes. My wait time is as short as possible.
    2 - They ask me to pull around to the waiting spot until they have fixed my sandwich. It takes ME longer because they have to walk out and give me the food. But in the meantime the three cars behind me with basic orders are able to order AND get their food without having to wait. So while my time is longer, the overall goal of decreasing total customer wait time is met.

    Just felt I had to share... plus, another free blog topic, about keeping an eye on the overall goal, not being overly focused on each sample. :)

    thanks for the hard work,
    Mike

    1. Yep, I agree with you on the "special order" scenario. A lot of drive-thrus around here have a waiting spot just for this. The problem with my experience at this particular drive-thru is that they have only asked me to pull around when there is no one behind me. Plus, the "pull around" location blocks the exit. :)

  2. I agree that 100% code coverage is not good, and it's a useless metric if it is just a goal that the team needs to meet. Here's what I have done, which could reinforce this article on why we should not...
    My team has built up JS and C# code coverage to 90%. With the team's consensus, I customized our TFS CI build to verify during the build that coverage is maintained, or else it rejects the build. What this did was increase confidence, so any developer can touch any part of the product code base without creating lots of defects.
    Granted, I did also implement TDD, automated UI testing, and API test cases, and added a "road block" on continuous delivery/deployment (before it is able to deploy to QA) so that all automated cases are executed with a 100% pass rate before the code base is pushed to QA.

    1. Thanks for sharing your process; it's good to see how other people do things so that I can steal the best parts ;)

      One of the best things about having good unit tests in place is that you know immediately if your new code broke something it shouldn't have. These are the things I try to focus on. If the developers think that unit testing is a good idea (because they see the advantages), then you're going to be successful with it. If it's something that developers are forced to do (just to hit a metric), then it probably won't be very useful.

      I try to show developers how awesome unit testing is when it's done with certain things in mind. This gives us good, useful tests that help us get to delivery faster.

  3. Coverage measurement is a useful tool (to know what you're not testing), but when it gets viewed as the hammer, soon every problem is a thumb.

    What is most terrifying is that even though it is well known that you get what you measure (and only that), you can still, in this day and age, get initiatives handed down from on high to drive this or that metric in the name of product quality.

    The C-levels might see comforting numbers on their dashboard as the system is gamed, but producing numbers does not fix a broken culture.
