IEnumerable Performance Tip: Any() vs. Count()

I can’t count the number of times (unintentional pun) I’ve checked if an IEnumerable<T> sequence contains elements using Count().

static void Method(IEnumerable<Status> statuses)
{
    if (statuses != null && statuses.Count() > 0)
    {
        // do something...
    }
}

To get the count, the code has to traverse the entire sequence (unless the underlying object implements ICollection<T>, in which case Count() just reads its Count property). On a long, lazily evaluated sequence, this can take significant time. Since I only want to know if the sequence contains one or more elements, it’s computationally more efficient to use the Any() extension method.

static void Method(IEnumerable<Status> statuses)
{
    if (statuses != null && statuses.Any())
    {
        // do something...
    }
}

In this case, Any() will return after examining the first element in the sequence. It also reads better (IMHO).
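To make the difference visible, here is a minimal sketch that counts how many elements a lazy sequence actually produces under each call. The names AnyVersusCount, LazyNumbers and Produced are made up for this illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCount
{
    // How many elements the lazy sequence has produced so far.
    public static int Produced;

    // Hypothetical lazy sequence that records every element it yields.
    public static IEnumerable<int> LazyNumbers(int length)
    {
        for (var i = 0; i < length; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        Produced = 0;
        var viaCount = LazyNumbers(1000).Count() > 0;
        // Count() walks the whole iterator: Produced is 1000 here.
        Console.WriteLine("Count() > 0: " + viaCount + ", elements produced: " + Produced);

        Produced = 0;
        var viaAny = LazyNumbers(1000).Any();
        // Any() stops after the first element: Produced is 1 here.
        Console.WriteLine("Any(): " + viaAny + ", elements produced: " + Produced);
    }
}
```

Both calls give the same answer, but the Count() version does a thousand times the work to get it.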

 

PDC09 – Day 1

I was going to give a summary of the keynote proceedings but Dan Rigsby has already posted a great summary here.

It’s clear that Microsoft has really scaled back this year’s PDC. There is not much hype (at least for a PDC), the freebies, food and swag are “minimal” and the whole thing feels like a giant corporate yawn. Part of the fun of going to these events is to pick up on the “buzz” of what others are excited about. So far it’s been missing.

That said, I went to some very good breakout sessions yesterday. This is the “other reason” to go to conferences like PDC and in this case PDC has delivered the goods. I want sessions that “hurt my brain” and make me think about how I approach programming problems, and two of the sessions did exactly that.

First breakout was Future Directions of C# and VB. Luca Bolognese is a disarmingly charming young Italian with a thick accent and a great sense of humor. He wowed everyone at last year’s F# presentation. I attended this session because of the speaker and he delivered the goods. The most interesting bit of news here is a new use for the “yield” keyword. Although experimental, the idea is to yield control of a thread during async operations. Its purpose is to resolve the “many threads, not enough cores” issue for parallel processing. I’m thinking we’ll use it in our current project when it becomes available.

Next came ASP.NET Futures. There are just so many cool things happening in ASP.NET that one presentation really can’t cover them all. My favorite: ActiveRecord Integration. And not just with Entity Framework but other data providers. It even sports a “code first” model where you write the classes and just run. The framework creates and wires up a database and you’re off and running. Very nice.

Microsoft ASP.NET 4 Core Runtime for Web Developers. Again, I’m just blown away at the amount of “new stuff” coming in ASP.NET 4. This session focused on new tooling to allow better management of server resources. Frankly, much of it was “over my head” but then that’s sort of the point. Perhaps most interesting is that there are new tools to help find the “bad application” in an app pool that is running multiple applications. If you have ever encountered this problem (I have) you’ll really appreciate this new tooling.

Manycore and the Microsoft .NET Framework 4: A Match Made in Microsoft Visual Studio 2010. Usually by the end of the day I’m burned out and the last session can be a dud for that reason alone. Not this time. I suspect this will be the best breakout session (for me at least) of the conference. I can’t do any justice to it with a summary. Just spend an hour and watch it. It’s that good.

 

Nested Switch Statements to Table Lookup using LINQ

Note: My colleague Brian Genisio deserves much of the credit for this work. He urged me to consider a functional approach during a code review and then later contributed significantly to the code and this article. Check out his blog. Trust me, you’ll be a better programmer for it.

Note 2: This isn’t so much a “How to” as it is a “What I did” article. I would be really interested in any comments on this approach or other approaches. Even how it might be done in other languages like Ruby.

I was refactoring some code I wrote a few months ago and ran across this (illustrative example from original code):

static Expression<Func<Part, bool>> PartsFilter(FieldId field, Condition condition, string value)
{
    switch (field)
    {
        case FieldId.Id:
            switch (condition)
            {
                case Condition.Is: return part => part.Id == Convert.ToInt32(value);
                case Condition.LessThan: return part => part.Id < Convert.ToInt32(value);
                case Condition.GreaterThan: return part => part.Id > Convert.ToInt32(value);
            }
            break;

        case FieldId.Name:
            switch (condition)
            {
                case Condition.Is: return part => part.Name.CompareTo(value) == 0;
                case Condition.Contains: return part => part.Name.Contains(value);
                case Condition.StartsWith: return part => part.Name.StartsWith(value);
                case Condition.LessThan: return part => part.Name.CompareTo(value) < 0;
                case Condition.GreaterThan: return part => part.Name.CompareTo(value) > 0;
            }
            break;

        case FieldId.Description:
            switch (condition)
            {
                case Condition.Is: return part => part.Description.CompareTo(value) == 0;
                case Condition.Contains: return part => part.Description.Contains(value);
                case Condition.StartsWith: return part => part.Description.StartsWith(value);
                case Condition.LessThan: return part => part.Description.CompareTo(value) < 0;
                case Condition.GreaterThan: return part => part.Description.CompareTo(value) > 0;
            }
            break;
    }

    throw new InvalidOperationException();
}

It gets the job done, but there were about 20 top-level case statements in the original code, so it spanned multiple pages, which made it hard to visualize.

Given the repetitive nature of the code, it really warrants some kind of lookup table. At the time, I was just coming up to speed on LINQ to SQL and Dynamic Predicate Building so “Getting it to just work” was my primary concern.

You’ll notice that the switch statement is just a double lookup: Find a given Part, filtered by a Part descriptor and an additional search criterion. Since I’m building my query expression dynamically, what I want to return here is a Predicate describing the filter. (Reminder: a predicate here is just a function that takes an argument and returns a bool.) That’s what the switch statement is doing.

Here’s the same solution using a lookup table.

class FieldCondition : Dictionary<FieldId, ConditionFilter> { }
class ConditionFilter : Dictionary<Condition, Func<string, Expression<Func<Part, bool>>>> { }

static readonly FieldCondition _filters = new FieldCondition
{
    { FieldId.Id, new ConditionFilter {
        { Condition.Is, val => part => part.Id == Convert.ToInt32(val) },
        { Condition.LessThan, val => part => part.Id < Convert.ToInt32(val) },
        { Condition.GreaterThan, val => part => part.Id > Convert.ToInt32(val) }}
    },
    { FieldId.Name, new ConditionFilter {
        { Condition.Is, val => part => part.Name.CompareTo(val) == 0 },
        { Condition.Contains, val => part => part.Name.Contains(val) },
        { Condition.StartsWith, val => part => part.Name.StartsWith(val)},
        { Condition.LessThan, val => part => part.Name.CompareTo(val) < 0 },
        { Condition.GreaterThan, val => part => part.Name.CompareTo(val) > 0 }}
    },
    { FieldId.Description, new ConditionFilter {
        { Condition.Is, val => part => part.Description.CompareTo(val) == 0 },
        { Condition.Contains, val => part => part.Description.Contains(val) },
        { Condition.StartsWith, val => part => part.Description.StartsWith(val)},
        { Condition.LessThan, val => part => part.Description.CompareTo(val) < 0 },
        { Condition.GreaterThan, val => part => part.Description.CompareTo(val) > 0 }}
    }
};

// The PartsFilter method reduces down to a single line:
static Expression<Func<Part, bool>> PartsFilter(FieldId field, Condition condition, string value)
{
    // Look up the filter builder for this field/condition pair and apply it to the value.
    return _filters[field][condition](value);
}


The interesting bit here is the double dictionary declaration:
 
class FieldCondition : Dictionary<FieldId, ConditionFilter> { }
class ConditionFilter : Dictionary<Condition, Func<string, Expression<Func<Part, bool>>>> { }

It looks a little daunting until you take it apart. It actually reads naturally left to right. It’s a dictionary of FieldId returning a dictionary of Condition returning a function that takes a string that returns an expression. The dictionary part is easy enough to understand but the “Function taking a string returning an expression” might be less familiar.

Func<string, Expression<Func<Part, bool>>>

If you work in LINQ to SQL for any length of time the Expression<Func<Part, bool>> part should be familiar. Because I need to pass a string to the filter for the comparison, I’ve wrapped the expression in a function that takes the string value.

The final part is constructing the lambda expression to express the filter. A lambda expression is a good choice here because it’s a concise and compact way of communicating intent. In this case, we have a lambda “going into” a lambda. A bit unusual but totally legal (and kind of cool looking).

val => part => part.Name.StartsWith(val)
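To see the lambda-inside-a-lambda in isolation, here is a minimal sketch. The Part class below is a stand-in with only a Name property (the real Part type isn’t shown in this article), and CurriedFilterDemo and NameStartsWith are names invented for the example. Compile() is used here only so the expression can be run in memory; LINQ to SQL would translate the expression rather than compile it.

```csharp
using System;
using System.Linq.Expressions;

// Stand-in for the real Part entity (assumption: only Name is needed here).
public class Part
{
    public string Name { get; set; } = string.Empty;
}

public static class CurriedFilterDemo
{
    // Outer lambda: takes the string to compare against.
    // Inner lambda: becomes an expression tree over Part.
    public static readonly Func<string, Expression<Func<Part, bool>>> NameStartsWith =
        val => part => part.Name.StartsWith(val);

    public static void Main()
    {
        // Supplying the string yields the expression.
        Expression<Func<Part, bool>> filter = NameStartsWith("spr");

        // Compile it to a delegate so it can be tested in memory.
        Func<Part, bool> compiled = filter.Compile();

        Console.WriteLine(compiled(new Part { Name = "spring" })); // True
        Console.WriteLine(compiled(new Part { Name = "washer" })); // False
    }
}
```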

In the end, the code is not much smaller, so what have we gained here? The first version is imperative, whereas the second version is functional. The first version has control flow, which has more opportunities for subtle bugs. The second version, however, is really just a lookup table of functions to be executed, so it is more maintainable. Not only is it easy to retrieve the expression from the lookup table to generate new expressions, it is now possible to work with the lookup table like data.

For instance, if you need to get a list of all expressions that respond to a particular condition, you can extract it from the table:

public IEnumerable<Expression<Func<Part, bool>>> PartsFiltersOfCondition(Condition condition, string value)
{
    return from conditionFilter in _filters.Values
           where conditionFilter.ContainsKey(condition) // skip fields that don't support this condition
           let partsFilter = conditionFilter[condition]
           select partsFilter(value);
}

Calling PartsFiltersOfCondition(Condition.Is, "spring") will return the filters for every field that supports Condition.Is and generate the expressions for you.

Try doing THAT with a switch statement :)
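One difference from the switch version is worth keeping in mind: the switch throws InvalidOperationException for a field/condition pair it doesn’t handle, while a plain dictionary indexer throws KeyNotFoundException. If callers depend on the original exception, a TryGetValue-based lookup can preserve it. This is only a sketch under assumptions: the table is trimmed down, Part is a stand-in with just Id and Name, and SafeLookup is a name invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public enum FieldId { Id, Name }
public enum Condition { Is, Contains }

// Stand-in for the real Part entity (assumption: only these two properties).
public class Part
{
    public int Id { get; set; }
    public string Name { get; set; } = string.Empty;
}

public class ConditionFilter : Dictionary<Condition, Func<string, Expression<Func<Part, bool>>>> { }
public class FieldCondition : Dictionary<FieldId, ConditionFilter> { }

public static class SafeLookup
{
    // Trimmed-down table, for illustration only.
    static readonly FieldCondition _filters = new FieldCondition
    {
        { FieldId.Id, new ConditionFilter {
            { Condition.Is, val => part => part.Id == Convert.ToInt32(val) } }
        },
        { FieldId.Name, new ConditionFilter {
            { Condition.Is, val => part => part.Name == val },
            { Condition.Contains, val => part => part.Name.Contains(val) } }
        }
    };

    public static Expression<Func<Part, bool>> PartsFilter(FieldId field, Condition condition, string value)
    {
        // TryGetValue avoids KeyNotFoundException for unsupported pairs.
        if (_filters.TryGetValue(field, out var byCondition) &&
            byCondition.TryGetValue(condition, out var buildFilter))
        {
            return buildFilter(value);
        }

        // Same exception type the switch version throws.
        throw new InvalidOperationException("No filter for " + field + " / " + condition + ".");
    }

    public static void Main()
    {
        var filter = PartsFilter(FieldId.Name, Condition.Contains, "pri").Compile();
        Console.WriteLine(filter(new Part { Name = "spring" })); // True
    }
}
```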

 

Use LINQ Aggregate to Create Comma Separated Lists

It’s a common problem. You’re handed a collection of strings or numbers and you need to emit a comma-separated list. The usual approach I’ve seen (and done) is to resort to a loop and some check for whether it’s the first or last element to control when the comma is added.

var items = new string[] { "Wine", "Cheese", "Bread" };
bool first = true;
string picnicItems = string.Empty;

foreach (var item in items)
{
    if (!first)
        picnicItems += ", ";

    first = false;
    picnicItems += item;
}

I’ve always felt dissatisfied with doing this but could never find the energy to do something about it.

Enter the LINQ Aggregate extension method. As the name implies, it will “Aggregate” an IEnumerable using a given function.

var items = new string[] { "Wine", "Cheese", "Bread" };
var picnicItems = items.Aggregate((s1,s2) => s1 + ", " + s2);

Much nicer IMHO. Also, if there is only one item in the list, it correctly creates the string without the comma. Sweet!
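Two caveats, sketched below. Aggregate without a seed throws InvalidOperationException on an empty sequence, so an empty list needs a guard. And each + builds a new string, so for long lists the built-in string.Join is a common alternative (it is a different technique from Aggregate, shown here only for comparison). CommaListDemo and its method names are invented for this example.

```csharp
using System;
using System.Linq;

public static class CommaListDemo
{
    public static string JoinWithAggregate(string[] items)
    {
        // Guard the empty case: seedless Aggregate throws on an empty sequence.
        return items.Any()
            ? items.Aggregate((s1, s2) => s1 + ", " + s2)
            : string.Empty;
    }

    public static string JoinWithStringJoin(string[] items)
    {
        // string.Join handles zero, one, or many items without a guard.
        return string.Join(", ", items);
    }

    public static void Main()
    {
        var items = new[] { "Wine", "Cheese", "Bread" };
        Console.WriteLine(JoinWithAggregate(items));           // Wine, Cheese, Bread
        Console.WriteLine(JoinWithAggregate(new[] { "Wine" })); // Wine
        Console.WriteLine(JoinWithStringJoin(items));          // Wine, Cheese, Bread
    }
}
```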

 
