SelectMany, Sorting and Grouping Objects

02 Jul 2010

So here is the problem: I have a list of items that looks like this:

var collection = new[]
{
    new { Title = "One", References = "1;3" },
    new { Title = "Two", References = "2;3" },
    new { Title = "Three", References = "1;4" },
    new { Title = "Four", References = "4" }
};

The References field of each object holds its categories, separated by semicolons. What I want is one list per distinct reference (in this example: 1, 2, 3 and 4) containing all the items tagged with that reference. Items will be duplicated if they belong to more than one category.

To sum it up, the expected output would be: One, Three, Two, One, Two, Three, Four (that is, 1: One, Three; 2: Two; 3: One, Two; 4: Three, Four).
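For comparison, here is a minimal sketch that builds the same groups without LINQ, using plain loops. It is written as a modern C# top-level program (C# 9 or later); the SortedDictionary and the `groups` variable are my own choices for illustration, not part of the original problem.

```csharp
using System;
using System.Collections.Generic;

var collection = new[]
{
    new { Title = "One", References = "1;3" },
    new { Title = "Two", References = "2;3" },
    new { Title = "Three", References = "1;4" },
    new { Title = "Four", References = "4" }
};

// SortedDictionary keeps the reference keys in order; each value holds the
// titles of the items tagged with that reference, in their original order.
var groups = new SortedDictionary<string, List<string>>();
foreach (var item in collection)
{
    foreach (var reference in item.References.Split(';'))
    {
        if (!groups.TryGetValue(reference, out var titles))
        {
            titles = new List<string>();
            groups[reference] = titles;
        }
        titles.Add(item.Title);
    }
}

foreach (var pair in groups)
{
    // Prints "1: One, Three", "2: Two", "3: One, Two" and "4: Three, Four"
    Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
}
```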

After fooling around a bit, here is the query I came up with:

var query = from c in collection
            from d in c.References.Split(';')
            orderby d
            group c by d into groups
            select groups;
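To check the result, here is a self-contained sketch that runs the query above unchanged on the sample data and prints each group. It assumes a C# 9+ top-level program; the `titles` array is only there to show the flattened output.

```csharp
using System;
using System.Linq;

var collection = new[]
{
    new { Title = "One", References = "1;3" },
    new { Title = "Two", References = "2;3" },
    new { Title = "Three", References = "1;4" },
    new { Title = "Four", References = "4" }
};

var query = from c in collection
            from d in c.References.Split(';')
            orderby d
            group c by d into groups
            select groups;

// Each group's Key is a reference; its elements are the original items.
foreach (var g in query)
{
    Console.WriteLine($"{g.Key}: {string.Join(", ", g.Select(item => item.Title))}");
}

// All titles across all groups, in group order.
var titles = query.SelectMany(grp => grp.Select(item => item.Title)).ToArray();
Console.WriteLine(string.Join(", ", titles)); // One, Three, Two, One, Two, Three, Four
```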

This does exactly what I want and produces the output I expected from the input data.

However, when I use LINQ, I generally call the extension methods directly rather than using the pretty query syntax. This is mostly because I want to understand what happens behind the scenes, and I have to admit that this query was quite a beast.

First, as there are two from clauses, there is a SelectMany somewhere. You probably know that SelectMany is a bit of a beast and that fully understanding it is quite a challenge compared to the other operators/extension methods. I also expected the grouping to be tough, since we group the c items by d, which comes from the inner sequence produced by the second from clause.
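To see what the two from clauses produce before any sorting or grouping, here is a small sketch (C# 9+ top-level program) that calls SelectMany with a result selector and prints each (item, reference) pair. The `pairs` name and the Title/Reference projection are just for illustration.

```csharp
using System;
using System.Linq;

var collection = new[]
{
    new { Title = "One", References = "1;3" },
    new { Title = "Two", References = "2;3" },
    new { Title = "Three", References = "1;4" },
    new { Title = "Four", References = "4" }
};

// The collection selector splits each item's references; the result selector
// pairs the outer item with each of them, giving one element per (item, reference).
var pairs = collection
    .SelectMany(c => c.References.Split(';'), (c, d) => new { c.Title, Reference = d })
    .ToList();

foreach (var p in pairs)
{
    Console.WriteLine($"{p.Title} -> {p.Reference}");
}
// Prints seven lines: One -> 1, One -> 3, Two -> 2, Two -> 3, Three -> 1, Three -> 4, Four -> 4
```

Once the data is in this flat shape, grouping by Reference is an ordinary GroupBy.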

I couldn’t figure out on my own how to write that query with extension methods, so I fell back on good old Reflector, which gave me a straight answer:

var query = collection.SelectMany(delegate (<>f__AnonymousType0 c) {
    return c.References.Split(new char[] { ';' });
}, delegate (<>f__AnonymousType0 c, string d) {
    return new { c = c, d = d };
}).OrderBy(delegate (<>f__AnonymousType1<<>f__AnonymousType0, string> <>h__TransparentIdentifier0) {
    return <>h__TransparentIdentifier0.d;
}).GroupBy(delegate (<>f__AnonymousType1<<>f__AnonymousType0, string> <>h__TransparentIdentifier0) {
    return <>h__TransparentIdentifier0.d;
}, delegate (<>f__AnonymousType1<<>f__AnonymousType0, string> <>h__TransparentIdentifier0) {
    return <>h__TransparentIdentifier0.c;
}).Select(delegate (IGrouping<string, <>f__AnonymousType0> groups) {
    return groups;
});

After reading that, it made much more sense. Here is what I came up with when writing it on my own:

var p = collection
    .SelectMany(c => c.References.Split(';'), (c, d) => new { c, d })
    .OrderBy(t => t.d)
    .GroupBy(t => t.d, t => t.c);

Much more readable. The idea here is that SelectMany outputs a sequence of anonymous objects, each pairing an item (c) with one of its references (d). This sequence is then sorted with OrderBy and finally fed through GroupBy, which uses the d property as the grouping key and projects the c property into each resulting group. Not that difficult after all…
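Why sort before grouping? In LINQ to Objects, GroupBy yields groups in the order their keys first appear in the source, and OrderBy is a stable sort, so sorting the pairs first gives ordered groups while keeping the original item order inside each group. A quick sketch (C# 9+ top-level program; the variable names are mine) comparing the group keys with and without the OrderBy:

```csharp
using System;
using System.Linq;

var collection = new[]
{
    new { Title = "One", References = "1;3" },
    new { Title = "Two", References = "2;3" },
    new { Title = "Three", References = "1;4" },
    new { Title = "Four", References = "4" }
};

var pairs = collection.SelectMany(c => c.References.Split(';'), (c, d) => new { c, d });

// Without sorting: groups come out in first-seen key order.
var unsortedKeys = pairs.GroupBy(t => t.d, t => t.c).Select(g => g.Key).ToArray();

// Sorting first: keys come out in order, and the stable sort keeps
// the original item order within each key.
var sortedKeys = pairs.OrderBy(t => t.d).GroupBy(t => t.d, t => t.c).Select(g => g.Key).ToArray();

Console.WriteLine(string.Join(", ", unsortedKeys)); // 1, 3, 2, 4
Console.WriteLine(string.Join(", ", sortedKeys));   // 1, 2, 3, 4
```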

Here is another version that is probably a bit clearer:

var q = collection
    .SelectMany(c => c.References.Split(';'), (c, d) => new { Title = c.Title, Reference = d })
    .GroupBy(c => c.Reference, c => c.Title)
    .OrderBy(g => g.Key);
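This version differs in two small ways: each group now contains the titles (strings) rather than the original objects, and the sorting is applied to the groups by their Key instead of to the flattened sequence. A quick sketch to check that it yields the same grouping (C# 9+ top-level program; the `lines` variable is only for illustration):

```csharp
using System;
using System.Linq;

var collection = new[]
{
    new { Title = "One", References = "1;3" },
    new { Title = "Two", References = "2;3" },
    new { Title = "Three", References = "1;4" },
    new { Title = "Four", References = "4" }
};

var q = collection
    .SelectMany(c => c.References.Split(';'), (c, d) => new { Title = c.Title, Reference = d })
    .GroupBy(x => x.Reference, x => x.Title)
    .OrderBy(g => g.Key);

// One line per reference: "1: One, Three", "2: Two", "3: One, Two", "4: Three, Four"
var lines = q.Select(g => $"{g.Key}: {string.Join(", ", g)}").ToArray();
foreach (var line in lines)
{
    Console.WriteLine(line);
}
```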

Note that this is a simplified version of the original issue, which was to do the same thing with some ListItems retrieved from SharePoint. The objects were a bit more complicated, but the logic is the same.
