:::: MENU ::::

Tuesday, July 15, 2008

The yield keyword in C# is pretty powerful and expressive, but it doesn’t seem to be very widely known about. In this post we’ll take a quick look at what yield does and then I’ll post a follow-up that looks at what the compiler generates for you. Let’s start by looking at a simple (and contrived) example:

private static readonly string[] StringValues = new string[] { "The", "quick", "brown", "fox", 
                            "jumped", "over", "the", "lazy", "dog" };
static IEnumerable<string> TestIterator()
{
foreach(string value in StringValues)
{
yield return value;
}
}

The return type for the TestIterator method is IEnumerable, but you can see that we don’t have a return statement in the implementation. Instead, we’re using the yield return statement to return each item that we want the caller to operate on. The compiler automatically generates a class that implements IEnumerable for us. We can call this function using the following code:

foreach(string value in TestIterator())
{
Console.WriteLine("In foreach:{0}", value);
}

This code will iterate over the IEnumerable instance that is returned from the TestIterator method. In this example we’ve simply iterated over an existing collection, so we’re not providing much functionality! A more interesting example would be for a binary tree such as:

class BinaryTree
{
public T Item { get; set; }
public BinaryTree Left { get; set; }
public BinaryTree Right { get; set; }
}

In this case, the yield keyword makes it a breeze to add IEnumerable support. First, we add IEnumerable to the implemented interfaces and then we just need the code below to provide the implementation:

public IEnumerator GetEnumerator()
{
yield return Item;
if (Left != null)
{
foreach (T t in Left)
{
yield return t;
}
}
if (Right != null)
{
foreach (T t in Right)
{
yield return t;
}
}
}

This code returns the item for the current node and then recurses into the items from the left and right hand of the tree – how easy is that? This is one of the big advantages of the yield keyword: it allows you to write readable and concise code to produce an iterator.

Above we took a quick tour of the yield keyword. In this post we’re going to have a look at the code that the compiler generates for us when we use yield. We’ll return to the first example from last time and insert a Console.WriteLine before the yield return statement:

private static readonly string[] StringValues = new string[] { "The", "quick", "brown", "fox", 
                            "jumped", "over", "the", "lazy", "dog" };
static IEnumerable<string> TestIterator()
{
    foreach (string value in StringValues)
{
Console.WriteLine("In iterator:{0}", value);
yield return value;
}
}

If we run the following code to execute the iterator, we’d now see the output shown below:

foreach(string value in TestIterator()) 
{ 
    Console.WriteLine("In foreach:{0}", value); 
}
Output:
In iterator:The
In foreach:The
In iterator:quick
In foreach:quick
In iterator:brown
In foreach:brown
...
 

It’s clear from this output that the TestIterator isn’t returning all of the values in one go and then returning control to the calling code. Conceptually, you can view it that the TestIterator method is iterating over the collection that it wants to return, temporarily returning control to the caller and then resuming where it left off. I recently fired up Reflector (http://www.aisto.com/roeder/dotnet/) to look at the generated code and was initially quite surprised. The most interesting part is the IEnumerator.MoveNext function, and here’s a roughly equivalent piece of code (tidied up for readability):

private bool MoveNext()
{
switch (this.state)
{
case 0:
this.state = -1;
this.state = 1;
this.Values = Program.StringValues;
this.index = 0;
while (this.index < this.Values.Length)
{
this.temp = this.Values[this.index];
Console.WriteLine("In iterator:{0}", this.temp);
this.current = this.temp;
this.state = 2;
return true;
Label_0084:
this.state = 1;
this.index++;
}
this.state = -1;
break;

case 2:
goto Label_0084;
}
return false;

}

The state member is initialised to 0, so when MoveNext is first called it drops into the first case block. This sets up the Values and index members and then starts iterating over the array. Once it has got the first item, it sets the state to 2, stores the value in current (which is then returned via IEnumerator.Current) and returns true so that the caller can process the result. When MoveNext is called the second time, the code drops into the second case block which causes it to jump to immediately after the return true statement (the Label_0084 line). This was the ‘aha’ moment for me – conceptually it is doing exactly what I described above: iterating over the list, returning each item (and control) to the caller, and then resuming exactly where it left off. It’s so brilliant that it almost shouldn’t work! Since all of the state is stored in member variables, when it comes in to the function on subsequent calls it can safely carry on processing. And when it gets to the end of the list it breaks out of the switch block and returns false (setting the state to –1 to ensure it keeps returning false if called again).

Now imagine the code above (and the other methods required to implement IEnumerator) and compare it to the TestIterator method at the top of the post. Which could you code up more quickly? And more importantly, which is more readable to you?

More

Categories: