Tuesday, May 22, 2012

Next, Please! - A Closer Look at IEnumerable (Part 2 - Extension Methods and LINQ)

This is a continuation of "Next, Please! - A Closer Look at IEnumerable".  The articles in the series are collected together here: JeremyBytes - Downloads.

Last time we took a look at the Iterator Pattern and how IEnumerable<T> lets us use the "foreach" loop.  IEnumerable<T> also gives us access to a huge number of extension methods that make up LINQ (Language INtegrated Query).

Extension Methods
Extension methods allow us to add methods to a class -- with no subtyping required.  This means that we can even add methods to sealed classes, and the classes will behave as if our methods are native to the class.  (Technically, we are not adding methods to the class, but the usage appears that way.)  To explore extension methods a little further, we'll take a look at a quick sample.  The source code is available here: Quick Byte: Extension Methods.

In this sample, we will add an extension method to IEnumerable<T>.  Here's the definition of our method:


The idea behind this method is that we take a collection (any collection that implements IEnumerable<T>), pass in a delimiter (such as a comma or pipe), and the method will output a string with all of the elements delimited with the specified value.

On its face, this method should be pretty easy to figure out.  First, it is a public static method (ToDelimitedString<T>()) that is part of a public static class (JBExtensions).  The only strange part of this method is that there is the "this" keyword before the first parameter (IEnumerable<T> input).  We'll come back to this in just a bit.
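A minimal sketch of what such a method might look like is below.  The class name (JBExtensions), the method name, and the "input" parameter come from the description above; the "delimiter" parameter name, the method body, and the small Main method are assumptions for illustration, not necessarily the sample's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class JBExtensions
{
    // "this" on the first parameter is what marks the method as an extension method.
    // The body is an assumed implementation; only the signature shape is from the article.
    public static string ToDelimitedString<T>(this IEnumerable<T> input, string delimiter)
    {
        var output = new StringBuilder();
        bool first = true;
        foreach (T item in input)
        {
            // Put the delimiter between items, not before the first one.
            if (!first)
                output.Append(delimiter);
            output.Append(item);
            first = false;
        }
        return output.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };
        // Called like any other static method on a static class.
        Console.WriteLine(JBExtensions.ToDelimitedString<int>(numbers, "|")); // 1|2|3
    }
}
```

Because the method is generic over T, the same code works for a list of strings, an array of integers, or anything else that implements IEnumerable<T>.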

We can use this method just like we would use any other static method:


This is a button click event in a simple WPF application.  First, we create a variable called "months" which is a list of strings.  The Months.GetMonths() method will populate this list with the 12 calendar months.  Next, we populate a text box based on our static method.  Notice that we call this with the static class (JBExtensions), then the static method (ToDelimitedString<string>).  There are 2 parameters: our list (months) and the delimiter (a comma and a space).  This results in the following output:


But remember how we described extension methods: the ability to add a method to a class (or at least appear to do so).  Because we used the "this" keyword before the first parameter, we can treat the method as if it were a method on that first parameter.  In our example, this means that it would behave as if the "ToDelimitedString" method is part of the "months" object (a list of strings).  Here's what that looks like:


Now instead of calling "JBExtensions.ToDelimitedString...", we call "months.ToDelimitedString...".  The parameters of the method are then any remaining parameters (after the first one).  In this case, we have the delimiter parameter.  If we compare the syntax to what we had before, this version is much more readable.  It is clear what is happening: we are operating on the "months" object by calling "ToDelimitedString" with a delimiter as the parameter.  The great thing about Visual Studio is that we get full IntelliSense as well.  When we type "months." we will see "ToDelimitedString" in the list of methods that are available (assuming that all of the requirements for an extension method are met).
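To make the comparison concrete, here is a small console sketch (rather than the WPF button click from the sample) that calls the method both ways.  The Months class below is a hypothetical stand-in for the sample's Months.GetMonths(), and the one-line body of ToDelimitedString is a compact assumption built on string.Join.

```csharp
using System;
using System.Collections.Generic;

public static class JBExtensions
{
    // Compact stand-in for the article's method; string.Join does the delimiting.
    public static string ToDelimitedString<T>(this IEnumerable<T> input, string delimiter)
    {
        return string.Join<T>(delimiter, input);
    }
}

// Hypothetical stand-in for the sample's Months class.
public static class Months
{
    public static List<string> GetMonths()
    {
        return new List<string>
        {
            "January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December"
        };
    }
}

public static class Program
{
    public static void Main()
    {
        List<string> months = Months.GetMonths();

        // Static-method form: the list is passed as the first argument.
        string viaStatic = JBExtensions.ToDelimitedString<string>(months, ", ");

        // Extension-method form: the list appears to own the method,
        // and only the remaining parameter (the delimiter) is passed.
        string viaExtension = months.ToDelimitedString(", ");

        Console.WriteLine(viaExtension);
        Console.WriteLine(viaStatic == viaExtension); // True
    }
}
```

Both calls compile to the same static method call; the extension form is just friendlier syntax.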

Here are the requirements:
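  • Extension methods must be static methods in a static class (one that is neither generic nor nested).  Making both public, as in our sample, lets other projects use them, but that is an accessibility choice rather than a requirement.  The class name itself is unimportant.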
  • Extension methods are declared by including the "this" keyword in front of the first parameter.  The "this" keyword can only be used with the first parameter.
  • Extension methods are used by including the namespace of the public static class in the scope where the methods are to be used.  This means that extension methods can be collected in a shared library that is used across projects, if desired.
That's all there is to it.
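The third requirement is the one that tends to trip people up, so here is a small sketch of it.  The namespace names, the StringExtensions class, and the Shout method are all made up for illustration; the point is only that the extension method becomes visible once its namespace is brought into scope.

```csharp
using System;

namespace SharedLibrary
{
    public static class StringExtensions
    {
        // string is a sealed class, so subclassing is not an option here.
        public static string Shout(this string input)
        {
            return input.ToUpperInvariant() + "!";
        }
    }
}

namespace DemoApp
{
    // Without this using directive, "hello".Shout() would not compile.
    using SharedLibrary;

    public static class Program
    {
        public static void Main()
        {
            Console.WriteLine("hello".Shout()); // HELLO!
        }
    }
}
```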

IEnumerable<T> and LINQ
So, why all of this talk about extension methods?  Well, it turns out that much of Language INtegrated Query (LINQ) is implemented as extension methods on IEnumerable<T>.  This gives us a ton of really cool functionality that we can use with our collections.  The one qualification is that we need to include a using statement for "System.Linq" in order for the extension methods to be available (the 3rd bullet point, above).  The good news is that Visual Studio includes System.Linq as part of the default "using" statements for most code-file types.

LINQ offers us multiple syntaxes for implementation.  The first is using Query Syntax.  This syntax looks a lot like a SQL query (except the "select" is at the end instead of the beginning).  Here's a sample:


The "people" variable is the same one we saw last time.  It is a list of Person that is populated by the static "GetPeople" method (just a list of hard-coded values).  If you are familiar with SQL queries, then the second statement should look pretty familiar.  We are looking in the "people" object, doing a filter based on the FirstName property, and sorting the records based on the StartDate property.
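Here is a sketch of a query along those lines.  The Person class is trimmed to the two properties mentioned above, the People class name and the hard-coded data are stand-ins, and the specific filter (FirstName equal to "John") is an assumption since any condition on FirstName would fit the description.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public DateTime StartDate { get; set; }
}

// Hypothetical stand-in for the sample's hard-coded data.
public static class People
{
    public static List<Person> GetPeople()
    {
        return new List<Person>
        {
            new Person { FirstName = "John", StartDate = new DateTime(2005, 3, 1) },
            new Person { FirstName = "Mary", StartDate = new DateTime(1999, 6, 15) },
            new Person { FirstName = "John", StartDate = new DateTime(2001, 9, 30) }
        };
    }
}

public static class Program
{
    public static IEnumerable<Person> FindJohns(IEnumerable<Person> people)
    {
        // Query Syntax: reads like SQL, but "select" comes last.
        return from p in people
               where p.FirstName == "John"
               orderby p.StartDate
               select p;
    }

    public static void Main()
    {
        // Prints the two Johns, earliest StartDate first.
        foreach (Person p in FindJohns(People.GetPeople()))
            Console.WriteLine("{0} {1:yyyy-MM-dd}", p.FirstName, p.StartDate);
    }
}
```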

If we take a look at the members of the IEnumerable<T> interface, we will see all of the Extension Methods: IEnumerable<T> Interface.  And when I say that there is a ton of functionality, this includes over 50 unique extension methods -- and several of those extension methods have multiple overloads.

By using these methods directly, we can implement LINQ by using what is often called Fluent Syntax.  Here's what the same query looks like using the fluent syntax:


The "Where" extension method takes an IEnumerable<T> and returns an IEnumerable<T>.  This means that we can keep chaining the extension methods, one call after another, so we end up with people.Where().OrderBy().  As you can imagine, this could make our code lines quite long.  Fortunately, C# allows us to add line breaks before the dots:


You can see that this is a bit more legible.  One thing to point out about LINQ extension methods is that they generally take lambda expressions in the parameters.  For a full discussion of Lambda Expressions and LINQ, please refer to Learn to Love Lambdas.
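As a sketch, a filter-and-sort query in Fluent Syntax with a line break before each dot might look like this.  The trimmed Person class, the sample data, and the "John" filter are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public DateTime StartDate { get; set; }
}

public static class Program
{
    public static IEnumerable<Person> FindJohns(IEnumerable<Person> people)
    {
        // Each extension method returns a sequence, so the calls chain.
        // The lambdas say what to filter on and what to sort by.
        return people
            .Where(p => p.FirstName == "John")
            .OrderBy(p => p.StartDate);
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { FirstName = "John", StartDate = new DateTime(2005, 3, 1) },
            new Person { FirstName = "Mary", StartDate = new DateTime(1999, 6, 15) },
            new Person { FirstName = "John", StartDate = new DateTime(2001, 9, 30) }
        };

        // Prints the two Johns, earliest StartDate first.
        foreach (Person p in FindJohns(people))
            Console.WriteLine("{0} {1:yyyy-MM-dd}", p.FirstName, p.StartDate);
    }
}
```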

When trying to decide between Query Syntax and Fluent Syntax, there are a few things to note.  First, not all of the extension methods have Query Syntax keywords.  The basics are there (from, where, orderby, join, group...by, select), but others are not (such as First(), Single(), Count(), Average()).  This means that there are times when you may need to mix the Query Syntax and Fluent Syntax.  And that's okay.  Ultimately, which syntax you use is up to you.  Personally, I think that the Fluent Syntax is easier to read and work with once you are comfortable with lambda expressions.  But that's just my preference.
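Mixing the two usually means wrapping a query expression in parentheses and calling an extension method on the result.  The sketch below uses a plain list of integers and made-up method names to keep things short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MixedSyntax
{
    public static int CountEvens(IEnumerable<int> numbers)
    {
        // Query Syntax has no "count" keyword, so wrap the query in
        // parentheses and finish with the Count() extension method.
        return (from n in numbers
                where n % 2 == 0
                select n).Count();
    }

    public static int FirstAbove(IEnumerable<int> numbers, int threshold)
    {
        // Same idea with First(): smallest value above the threshold.
        return (from n in numbers
                where n > threshold
                orderby n
                select n).First();
    }

    public static void Main()
    {
        var numbers = new List<int> { 3, 8, 5, 12, 7 };
        Console.WriteLine(CountEvens(numbers));    // 2
        Console.WriteLine(FirstAbove(numbers, 5)); // 7
    }
}
```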

Look at the IEnumerable<T> Extension Methods
Look at the extension methods that LINQ provides.  No, really.  Look at the Extension Methods.  There are a ton of useful methods in there.  They aren't all just "query" type methods (where, order by, grouping, etc.).  They also include aggregations such as Average, Count, Min, and Max, and element operators such as First and Last.  So, look through the list.

Here's a sample of how the extension methods can make your code much more concise.  This sample is taken from Introduction to Data Templates and Value Converters in Silverlight (this works in WPF and Windows Phone as well).  Here is the output:


The Person class includes a Rating field, which is an integer value from 0 to 10.  For the UI above, we want to take the value of the "Rating (Stars)" text box and count the number of asterisks that it contains.  We want to ignore any character that is not an asterisk.  Here is the "traditional" way of doing this (this is located in the "RatingStarConverter" in the "Converters.cs" file):


This takes the incoming value (from the text box), and assigns it to the "input" variable.  Then it uses a foreach loop to iterate through the characters of that "string" (remember from last time that string implements IEnumerable<char>, so we can use it with foreach).  If the current character is an asterisk, then we increment our rating value.
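Stripped of the value converter plumbing, the loop described above might look something like this.  The RatingParser class and CountStars method are hypothetical names; the real code lives inside the converter and may differ in its details.

```csharp
using System;

public static class RatingParser
{
    public static int CountStars(string input)
    {
        int rating = 0;
        // string implements IEnumerable<char>, so foreach walks its characters.
        foreach (char c in input)
        {
            // Anything that is not an asterisk is ignored.
            if (c == '*')
                rating++;
        }
        return rating;
    }

    public static void Main()
    {
        Console.WriteLine(CountStars("**a*-*")); // 4
    }
}
```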

But the IEnumerable<T> extension methods give us a much quicker way of writing this: Count().


This code does exactly the same thing as the foreach loop.  The Count() method takes an optional predicate parameter that lets us set a condition on the items that we want to count.  In this case, we only count the characters that are asterisks.
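As a sketch, the whole loop collapses to a single call.  The RatingParser and CountStars names are hypothetical, used here only to give the call a home.

```csharp
using System;
using System.Linq;

public static class RatingParser
{
    public static int CountStars(string input)
    {
        // Count() with a predicate counts only the characters that match.
        return input.Count(c => c == '*');
    }

    public static void Main()
    {
        Console.WriteLine(CountStars("**a*-*")); // 4
    }
}
```

The lambda expression (c => c == '*') is the condition that the foreach version spelled out with an if statement.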

There are tons of other useful methods.  Some of my favorites are Average, Count, Skip, SingleOrDefault, OrderBy, Except (a set difference: it returns the items of one sequence that do not appear in another), and Sum.  Many of these methods use Func<> of some type in the declaration.  Wherever you see "Func<>", treat it as a big sign that says, "PUT YOUR LAMBDA EXPRESSION HERE."  Of course, the more comfortable you are with delegates and lambda expressions, the better off you'll be.  You can refer to Learn to Love Lambdas and Get Func<>-y: Delegates in .NET for additional information and samples.
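Here is a quick tour of a few of those methods on a small array of integers.  The Samples class and its data are made up for illustration.

```csharp
using System;
using System.Linq;

public static class Samples
{
    public static readonly int[] Numbers = { 4, 8, 15, 16, 23, 42 };

    public static void Main()
    {
        Console.WriteLine(Numbers.Sum());     // 108
        Console.WriteLine(Numbers.Average()); // 18

        // Skip the first two items and keep the rest.
        Console.WriteLine(string.Join(", ", Numbers.Skip(2))); // 15, 16, 23, 42

        // SingleOrDefault takes a Func<int, bool>, so a lambda goes here.
        // It returns the one match, or default(int) when there is none.
        Console.WriteLine(Numbers.SingleOrDefault(n => n > 40));  // 42
        Console.WriteLine(Numbers.SingleOrDefault(n => n > 100)); // 0

        // Except takes a second sequence, not a lambda: it is a set difference.
        Console.WriteLine(string.Join(", ", Numbers.Except(new[] { 8, 15 }))); // 4, 16, 23, 42
    }
}
```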

The extension methods on the IEnumerable<T> interface make this an extremely powerful interface.  By simply including the System.Linq namespace, we get over 50 additional functions that can make our code more concise and readable.

Next Time
Today, we took a look at what extension methods are and how we can use them to add functionality to existing classes.  Then we saw how LINQ adds a huge number of extension methods to the IEnumerable<T> interface.  These extension methods let us harness the power of LINQ in any of our collections (or other classes that implement IEnumerable<T>).

Next time, we'll create our own class that implements the IEnumerable<T> interface.  This will give us a chance to explore the interface in a bit more detail and to see exactly what we need to do in order to create our own enumerable classes.

Happy Coding!
