Thursday, November 13, 2014

Smart Unit Tests (Preview) and Conway's Game of Life

I have a love/hate relationship with automated test generation. Visual Studio 2015 Preview has a new built-in feature: Smart Unit Tests. This is one of the features that we got a sneak peek of at the Microsoft MVP Summit last week, and it was announced publicly at the Connect(); event this week.

Smart Unit Tests came out of the Pex project from Microsoft Research. And it works by going through the code, finding all of the possible code paths, and generating unit tests for each scenario it finds. I want to emphasize that this is currently a preview. So, we aren't looking at final functionality here.

I downloaded the latest Visual Studio 2015 Preview yesterday (11/12/2014) so that I could give this a try. Let's see how it works with our Conway's Game of Life implementation.

[Update: This exploration of Smart Unit Tests has turned into a series. Get all the articles here: Exploring Smart Unit Tests (Preview).]

[Update: Smart Unit Tests has been released as IntelliTest with Visual Studio 2015 Enterprise. My experience with IntelliTest is largely the same as with the preview version as shown in this article.]

Generating Tests
A couple weeks ago, we implemented the rules for Conway's Game of Life using TDD and MSTest (and then we looked at parameterized tests with NUnit). We'll work with that same method.

Here's the code we ended up with:
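A minimal reconstruction of that method, pieced together from the walkthrough in this article rather than copied from the original listing. The names "LifeRules" and "GetNewState" are assumptions; the enum order (Alive = 0, Dead = 1) and the conditions come from the test descriptions below.

```csharp
public enum CellState
{
    Alive, // underlying value 0
    Dead   // underlying value 1
}

// Hypothetical class and method names; the logic follows the article's description.
public static class LifeRules
{
    public static CellState GetNewState(CellState currentState, int liveNeighbors)
    {
        switch (currentState)
        {
            case CellState.Alive:
                // Fewer than 2 dies from loneliness; more than 3 dies from overcrowding.
                if (liveNeighbors < 2 || liveNeighbors > 3)
                    return CellState.Dead;
                break;
            case CellState.Dead:
                // Exactly 3 live neighbors brings a dead cell to life.
                if (liveNeighbors == 3)
                    return CellState.Alive;
                break;
        }

        // No rule applied, so the state is unchanged.
        return currentState;
    }
}
```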

Generating Smart Unit Tests is really simple: just right-click on the method (or class) and choose "Smart Unit Tests".

The result for this method is 5 tests:

Let's walk through how these are generated (this is the cool part about the technology).

Test #1
For the first test, it provides the default values for the parameters. Since "CellState" is an enum (which is actually an integer underneath), it uses the "0" value which happens to be "Alive". Since the "liveNeighbors" parameter is an integer, it just supplies "0".

The result when following this path is that it hits the "CellState.Alive" in the switch statement and then the "if" statement evaluates to "True" (since 0 is less than 2), so the result is "CellState.Dead".

Here's the actual test that was generated:

Other test methods look similar, so I'll let you use your imagination for those.

Test #2
For the next test, Smart Unit Tests recognized that we hit the "True" branch of the first "if" statement, so it supplies a parameter that would cause it to hit the "False" branch. In this case, it chooses "2" because the conditional includes "liveNeighbors < 2".

The result when following this path is that it just drops through to the "return" statement, so the value is unchanged: "Alive".

Test #3
For the next test, we take the next "case" in the "switch" statement. So, a test is created to check "CellState.Dead" and goes back to the default "0" value for number of live neighbors. The "if" statement evaluates to "False" (since it is not equal to "3").

The result from this path is that the value is unchanged: "Dead".

Test #4
For the next test, Smart Unit Tests figures out that the "liveNeighbors" parameter needs to be "3" so that it hits the other branch of the "if" statement. This way it will evaluate to "True".

The result from this path is that the value is updated to "Alive".

Test #5
The final test passes in a value for the CellState enum that will not be caught by the "switch" statement. This value is "2" (which is an invalid value for the enum, but still allowed since it is really just an integer). This does not hit any branches, and so the value is returned unchanged: "2".
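The five input/output pairs described above can be collected in one place. This is a sketch, not the code that Smart Unit Tests generates: "LifeRules.GetNewState" is the assumed method name, and "ExploredPaths" is a hypothetical helper added only for illustration. Note the cast "(CellState)2", which compiles because an enum is an integer underneath.

```csharp
public enum CellState { Alive, Dead }

// Assumed shape of the method under test (names are hypothetical).
public static class LifeRules
{
    public static CellState GetNewState(CellState currentState, int liveNeighbors)
    {
        switch (currentState)
        {
            case CellState.Alive:
                if (liveNeighbors < 2 || liveNeighbors > 3) return CellState.Dead;
                break;
            case CellState.Dead:
                if (liveNeighbors == 3) return CellState.Alive;
                break;
        }
        return currentState;
    }
}

public static class ExploredPaths
{
    // (input state, live neighbors, observed result) for Tests #1 through #5.
    public static readonly (CellState State, int Neighbors, CellState Result)[] Cases =
    {
        (CellState.Alive, 0, CellState.Dead),   // #1: default values, "if" is true
        (CellState.Alive, 2, CellState.Alive),  // #2: "if" is false, falls through
        (CellState.Dead, 0, CellState.Dead),    // #3: next case, "if" is false
        (CellState.Dead, 3, CellState.Alive),   // #4: "if" is true
        ((CellState)2, 0, (CellState)2),        // #5: no case matches
    };

    // Returns true when the method reproduces every observed result.
    public static bool AllMatch()
    {
        foreach (var c in Cases)
        {
            if (LifeRules.GetNewState(c.State, c.Neighbors) != c.Result)
                return false;
        }
        return true;
    }
}
```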

Pretty Cool
So this technology is pretty cool. It is not just passing in random values for the parameters of the tests. It is actually walking through the code and keeping track of which branches are hit or not hit. Then it generates parameters to exercise each branch of the code.

Now, I wish that this had generated at least one more test with the parameters "Alive" and "4". This would catch the "liveNeighbors > 3" condition. I'm hoping that this will be updated in a future release (again, remember we are dealing with preview code here). Regardless, each code path *is* exercised.

These tests are dynamically generated, but we can save them off as regular MSTest methods that we run with our standard test runner. This is good because none of these tests show what the code is *expected* to do; this shows what the code *actually* does. So, it's really up to us to analyze the tests that are generated to see if the results are what we expect.

Once we have these tests saved off, we'll be able to tell if code updates change the functionality because the saved tests would fail (just like our other tests). This reminds me of a similar workflow that we have with Approval Tests, where we say "yes, this is a valid result" and then are warned if we don't get that expected result.

I'm a Bit Cautious
Even though this technology is really cool, I still have some misgivings about it -- mainly based on my experience with various teams.

Not Good for TDD
Obviously, we're generating tests *after* we have the code, so it's not possible to use this technology in a TDD environment. As I've mentioned previously, I'm not quite at TDD in my own coding practices -- I'm more of a "test alongside" person. This means that I write some code and then immediately write some unit tests to exercise the code. (This is much different from people who build their tests days or weeks after the code is written.)

Not Quite Exhaustive
Even though we are exercising all of the code paths, we aren't actually testing all of the valid values. Remember from our parameterized testing, we added test cases for both CellStates (Alive and Dead) and for the valid range of live neighbors (which is from 0 to 8). The result was 18 test cases:

This more closely matches our actual business case for this method.
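To make the 18 cases concrete, here is a sketch that walks both states and all nine neighbor counts. It uses a plain loop rather than the NUnit parameterized tests from the earlier article, and every name in it ("LifeRules", "GetNewState", "AllValidInputs") is an assumption. The "Expected" helper restates the rules independently of the method under test. It also covers "Alive" with "4", the case missing from the generated tests.

```csharp
public enum CellState { Alive, Dead }

// Assumed shape of the method under test (names are hypothetical).
public static class LifeRules
{
    public static CellState GetNewState(CellState currentState, int liveNeighbors)
    {
        switch (currentState)
        {
            case CellState.Alive:
                if (liveNeighbors < 2 || liveNeighbors > 3) return CellState.Dead;
                break;
            case CellState.Dead:
                if (liveNeighbors == 3) return CellState.Alive;
                break;
        }
        return currentState;
    }
}

public static class AllValidInputs
{
    // The rules written out directly: a live cell survives with 2 or 3
    // neighbors, and a dead cell comes alive with exactly 3.
    public static CellState Expected(CellState state, int liveNeighbors)
    {
        if (state == CellState.Alive)
            return (liveNeighbors == 2 || liveNeighbors == 3) ? CellState.Alive : CellState.Dead;
        return liveNeighbors == 3 ? CellState.Alive : CellState.Dead;
    }

    // Counts how many of the 2 x 9 = 18 valid inputs give the expected result.
    public static int CountMatching()
    {
        int matches = 0;
        foreach (var state in new[] { CellState.Alive, CellState.Dead })
        {
            for (int liveNeighbors = 0; liveNeighbors <= 8; liveNeighbors++)
            {
                if (LifeRules.GetNewState(state, liveNeighbors) == Expected(state, liveNeighbors))
                    matches++;
            }
        }
        return matches;
    }
}
```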

Now, if we add code contracts, we can add valid ranges for our parameters. In this case, Smart Unit Tests would provide some tests for parameters outside of the range to make sure that it behaves as expected (such as by throwing an ArgumentOutOfRangeException). But it won't generate tests for all of the values in the valid range (again, this may change in future releases).

[Note: I experiment with guard clauses in the next article: More Smart Unit Tests (Preview): Guard Clauses.]

Developers Who Don't Understand Unit Testing
My biggest misgiving about auto-generated tests is more of a developer problem than a technical problem. I've worked with groups who consider success as having 100% code coverage in their tests. So a tool like this would simply encourage them to auto-generate tests and say that they are done.

It's easy to get a tool to report 100% code coverage, but it's much harder to actually have 100% coverage. Our generated tests above do give us 100% code coverage, but they don't deal with 100% of the scenarios. For example, we really should put in a parameter check to make sure that "liveNeighbors" is between 0 and 8 (inclusive). But there's no way for the computer to know about a requirement that is missing from our code.

So I'm a bit wary of tools that make it easy for these types of teams to simply "right-click and done". This is great as a helper tool, but it should not replace our normal testing processes.

The Best Thing
The best thing about auto-generated tests is that we can use them to help fill in the gaps in our testing. In our scenario above, there was a test generated for a "CellState" of "2". This is out of range for our enum, but it is still valid code since the enum is ultimately just an integer.

I could use a test like this to say, "Whoa! No one should be passing in a '2' for that value." And then I could immediately add a range check to my method. The test would still be generated, but instead of the method returning normally, it would throw an exception -- which is exactly what we want.
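One way such checks might look, as a sketch using plain guard clauses (not code contracts). The class name "GuardedLifeRules" and the exception messages are assumptions; the "liveNeighbors" range of 0 to 8 is the missing requirement mentioned earlier.

```csharp
using System;

public enum CellState { Alive, Dead }

// Hypothetical variant of the rules method with guard clauses added.
public static class GuardedLifeRules
{
    public static CellState GetNewState(CellState currentState, int liveNeighbors)
    {
        // Reject values such as (CellState)2 that are not named members of the enum.
        if (!Enum.IsDefined(typeof(CellState), currentState))
            throw new ArgumentOutOfRangeException(
                nameof(currentState), currentState, "Unknown cell state.");

        // A cell in the grid has at most 8 neighbors.
        if (liveNeighbors < 0 || liveNeighbors > 8)
            throw new ArgumentOutOfRangeException(
                nameof(liveNeighbors), liveNeighbors, "Expected a value from 0 to 8.");

        switch (currentState)
        {
            case CellState.Alive:
                if (liveNeighbors < 2 || liveNeighbors > 3) return CellState.Dead;
                break;
            case CellState.Dead:
                if (liveNeighbors == 3) return CellState.Alive;
                break;
        }
        return currentState;
    }
}
```

With these guards in place, a call such as "GetNewState((CellState)2, 0)" throws instead of quietly returning "2".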

Wrap Up
So like I said, I have a love/hate relationship with auto-generated testing. It is an extremely cool technology. And I like that Smart Unit Tests really is smart -- it traverses the code to find all of the valid code paths. I'm going to be running this against a bunch of my other code to see what it comes up with.

But I am a bit wary as well. I'm afraid that some developers will use it to auto-generate the tests and then think nothing more about testing. That really scares me.

I'm going to be following this technology closely. And I expect to use it quite a bit to fill in the gaps in my testing. It will be interesting to see where this goes.

Happy Coding!


  1. This looks awesome. I've been following PEX for a while, but they limited it to just PCL (portable class libraries), which doesn't help you much if you're not using them.

    Is the usefulness of Smart Unit Tests best to be thought of as a scaffolding tool, with the idea that you use it once on existing code that has no tests and then tweak the generated test code?

    1. The best use that I've heard so far is working with legacy code that does not have tests. Running Smart Unit Tests would quickly generate a suite of tests that shows what the code does now. Then when you start to make changes (updates, refactoring, etc.) you can verify that you don't break the existing functionality.

      As I mentioned, I'm still trying to figure out how (or if) this fits in to a green field scenario where we're writing completely new code.


  2. Replies
    1. It does allow you to create custom factories, so if you're using constructor injection, you can modify the factory method for the dependent objects with the code that you want to use for testing. There's a short demo of this on Channel 9:

      I don't believe it goes much beyond this (at this point), but we'll see what the future holds.

  3. Just curious, but why is

    if (liveNeighbors < 2 or liveNeighbors > 3)

    better than?

    if (liveNeighbors != 3)

    am I missing something obvious? Or is optimization so 80s and 90s?

    1. It actually goes back to the original article. The rules are "less than 2 dies from loneliness" and "more than 3 dies from overcrowding". So a value of 2 or 3 remains unchanged.

  4. >> Now, I wish that this had generated at least one more test with the parameters "Alive" and "4".
    This is reasonable. Will look into this. Thank you.