Wednesday, September 25, 2019

C# 8 Interfaces: Dangerous Assumptions in Default Implementation

In taking a closer look at C# 8 interfaces, we need to consider that we can make things worse with default implementation.
Making assumptions with default implementation can lead to broken code, runtime exceptions, and slow performance.
One of the features that is being promoted about C# 8 interfaces is that we can add members to an interface without breaking existing implementers. But we can cause a lot of pain if we aren't careful. Let's look at some code that makes bad assumptions so that we can understand the importance of avoiding these problems.

The code for this article is available on GitHub: jeremybytes/interfaces-in-csharp-8, specifically the DangerousAssumptions project.

Note: this article uses C# 8 features, which are currently only implemented in .NET Core 3.0. For these samples, I used Visual Studio 16.3.0 and .NET Core 3.0.100.

Assuming Implementation Details
The main reason I am calling this out is because of a blog article that promotes code that makes really bad assumptions about implementation. (I'm not calling out the specific article because I don't want that blogger to get hammered with comments; I'm going to contact them discretely.)

The article in question is about how good default implementation is because we can add to interfaces after there are existing implementers. However, the code makes some bad assumptions. (This code is in the BadInterface folder of the GitHub project.)

Here is the original interface:

The article continues by showing an implementation of the interface "MyFile" (in the MyFile.cs file):

The article then shows that we can add a "Rename" method with a default implementation that won't break the existing "MyFile" class.

Here is the updated interface (from the IFileHandler.cs file):

MyFile still works so everything's great, right? Not exactly.

Bad Assumptions
The main problem with the "Rename" method is that it makes a *HUGE* assumption: the implementations are using a physical file on the file system.

Let's take a look at an implementation that I created to use an in-memory file system. (Note: this is my code; it is not from the original article. You can see the full implementation in the MemoryStringFileHandler.cs file.)

This class implements a naive file system that uses an in-memory Dictionary to hold text files. There is nothing here that touches the physical file system, and it does not reference System.IO at all.

Broken Implementer
With the update to the interface, this class is now broken.

If client code calls the "Rename" method, it will generate a runtime error (or worse, it will actually rename a file on the file system).

Even if our implementation is using physical files, it may be using files that are located in cloud storage and are not accessible through System.IO.File.

This is also a potential problem when it comes to unit testing. If a fake or mock object is not updated, but the code under test is updated, this will try to touch the file system when the unit tests are run.
Because a bad assumption is made in the interface, implementers of the interface are broken.
Unreasonable Fear?
It's pretty easy to think about this as an unreasonable fear. When I talk about mis-use of code, often the response is "well, that's bad programming." I don't disagree with that.

My general response is to wait and see how things are used. For example, I had a fear that "static using" would be abused. So far, I haven't seen that. But in this instance, there is at least one article out there that is promoting this type of code (and there may be more).

We need to be aware that these ideas are out there so that we can help people move back onto a happier path that won't cause as much pain.

As always, we get better by helping each other out.

Performance Issues
I started thinking about other ways that we could get into trouble when we make assumptions about interface implementers.

The previous example calls code that is outside of the interface itself (specifically, from System.IO). We can probably agree that doing something like that is a danger sign. But if we're using things that are already part of the interface, we should be okay, right?

Not necessarily.

Just as a quick example, I created an "IReader" interface. (This is more thought out than the embarrassing IReader interface that included a "Save" method in a previous article.)

Original Interface & Implementation
Here is the original IReader interface (from the IReader.cs file -- although the file has updates):

This is a generic interface with a method to get a read-only collection of items.

One implementation of this interface generates a Fibonacci Sequence (Yes, I have an unhealthy interest in generating Fibonacci Sequences.) Here is a FibonacciReader (from the FibonacciReader.cs file -- this file also has updates on GitHub):

The FibonacciSequence class is an implementation of IEnumerable<int> (from the FibonacciSequence.cs file). It uses an 32-bit integer as a data-type, so it overflows pretty quickly.

If you're interested in this implementation, take a look at TDDing into a Fibonacci Sequence in C#.

The DangerousAssumptions project is a console application that writes out the results of the FibonacciReader (from the Program.cs file):

And here's the output:

Updated Interface
So, we have working code. But at some point we might want to get an individual item from the IReader rather than getting the entire collection of items. Since we are using a generic type on the interface, and we do not have a natural ID property on the object, we'll pull out an item at a specific index.

Here's our interface with the "GetItemAt" method added (from the final IReader.cs file):

The "GetItemAt" includes a default implementation. At first glance, this doesn't look too bad. It uses an existing member of the interface (GetItems), so it is not making external assumptions. And it uses a LINQ method on the results. I'm a big fan of LINQ, and this code looks reasonable.

Differences in Performance
Since the default implementation calls "GetItems", it requires that the entire collection be returned before a specific item is picked out.

In the case of the FibonacciReader, this means that all of the values are generated. An update to the Program.cs file will show this:

This calls "GetItemAt". Here is the output:

If we set a breakpoint inside the FibonacciSequence.cs file, we can see that the entire sequence is generated for this.

Running the program will hit this breakpoint twice: once for the "GetItems" call and again for the "GetItemAt" call.

Performance Hindering Assumption
The biggest problem with this method is that it requires the entire collection of items to be retrieved. If this "IReader" is going against a database, then all of the items are pulled back so that a single item can be selected. It would be much better to let the database handle this functionality.

With our FibonacciReader, we are calculating each new item. This means that the entire list needs to be calculated in order to pull out the one item that we want. Calculating a Fibonacci sequence is not very CPU-intensive, but what if we were doing something a bit more intensive, like calculating prime numbers.

You might be saying, "Well, there's a 'GetItems' method that returns everything. If that takes so long to do, then it probably shouldn't be there." And that's a fair statement.

But the calling code doesn't know about any of this. If I'm calling "GetItems", I know that I'm going through a (possibly) network and data-intensive process. If I'm asking for a single item, I don't necessarily think that same thing.

Specific Performance Fix
For the FibonacciReader, we can put in our own implementation to get much better performance (in the final FibonacciReader.cs file):

The "GetItemAt" method overrides the default implementation that is provided in the interface.

This uses the same "ElementAt" LINQ method that the default implementation uses. But instead of using that method on the read-only collection that comes back from "GetItems", it uses it on the underlying FibonacciSequence, which is an IEnumerable.

Since the FibonacciSequence is an IEnumerable, calling ElementAt will stop once it gets to the selected item. So rather than generating the entire collection, it only generates items up to that specific index.

To try this out, keep the same breakpoint as above and run the application again. This time, it will only hit the breakpoint once (for the "GetItems" call). It does not hit the breakpoint during the "GetItemAt" call.

A Bit Artificial
This example is a bit artificial because we don't normally pick items out of a dataset by index. But we could think of something similar that would happen if we had a natural ID property.

If we pulled items out by ID rather than index, we could have the same performance issues on a default implementation. The default implementation would require all items to be returned to pick out just one. Letting a database or other "reader" pull out an individual item by ID would be much more efficient.

Think About Your Assumptions
We always make assumptions. If we tried to code for every possible use of our libraries, we would never be done. But we need to think about the assumptions that we do make for our code.

This doesn't necessarily mean that the "GetElementAt" default implementation is a bad one. It does have potential performance issues. But if the data sets are small or the calculated elements are "cheap", then it may be worth the trade.

I'm still not a big fan of changing an interface after it has existing implementers. But I understand that there are scenarios where it is preferable to other options. Programming is all about problem solving, and that includes weighing pros and cons for each tool and technique that we use.

Default implementation has the potential to cause harm to the implementers of the interfaces (and potentially the callers of those implementations). We need to take special care regarding our assumptions when it comes to default implementation.

Happy Coding!

No comments:

Post a Comment