04 Oct 2008

A few months ago, we wanted to query an XML log file on our project. On that particular project, we could use .NET 3.5 features, but we had to stick to Visual Studio 2005.

For this task, LINQ seemed like a natural choice, provided that we could use it inside Visual Studio 2005, which doesn't natively support the sexy LINQ query syntax.

At first, we thought that it was not possible to use LINQ in Visual Studio 2005, but after some research it turned out that we were wrong. I found some clues on how to achieve it in a comment by Charles Young on this post.

The LINQ libraries are located in the System.Core assembly, so the project has to reference it. Then, each file that uses LINQ needs a using directive for the System.Linq namespace.

A simple LINQ query

Let's take a standard LINQ query on a String array:

String[] persons = new String[] { "Philippe", "Steve", "Bill" };

var search = from p in persons
             where p.Contains("l")
             select p.ToUpper();

This is a fairly simple query, involving only filtering (the where clause) and projection (a select that transforms the output). The compiler translates this query in several steps; we just have to follow the pipeline until we reach something that Visual Studio 2005 understands.

The first translation results in a statement that uses extension methods on the array:

var search = persons
    .Where(p => p.Contains("l"))
    .Select(p => p.ToUpper());

These extension methods come from the Enumerable class. However, extension methods are not natively supported in Visual Studio 2005.

Because extension methods are really static methods, the previous statement can be rewritten as direct calls to Enumerable's static methods: the object that the method appears to be called on is simply passed as the first parameter. In this particular example, it translates to:

var search = Enumerable.Select(
    Enumerable.Where(persons, p => p.Contains("l")),
    p => p.ToUpper());

Please note that this translation uses Enumerable because, in this particular case, we are using LINQ to Objects. I suppose that LINQ to SQL would result in using Queryable's static methods instead, but I haven't tried it yet.

Also note that there is a call to Enumerable.Select because our initial query involves projection, in the sense that the result of the query (here a String that is a transformation of the input) is different from the input (here a String object). If the projection of the initial query were "select p" (no transformation of the output), there would be no call to Enumerable.Select, since the where clause already yields the elements unchanged. More on this below.
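To make that last point concrete, here is a minimal sketch of the query "from p in persons where p.Contains("l") select p", written in the explicit C# 2.0 form that the rest of this post builds up to. The class and method names are made up for the example, and it assumes a reference to System.Core.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SelectIdentityDemo   // hypothetical name, for illustration only
{
    // Equivalent of: from p in persons where p.Contains("l") select p
    public static IEnumerable<String> NamesWithL(String[] persons)
    {
        // No Enumerable.Select here: "select p" returns each element
        // unchanged, so Enumerable.Where alone produces the result.
        return Enumerable.Where<String>(
            persons,
            delegate(String p)
            {
                return p.Contains("l");
            });
    }

    static void Main()
    {
        String[] persons = new String[] { "Philippe", "Steve", "Bill" };

        foreach (String p in NamesWithL(persons))
        {
            Console.WriteLine(p);   // prints Philippe, then Bill
        }
    }
}
```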

Still, this statement does not compile in Visual Studio 2005 because lambda expressions are not supported.

The next step for the compiler is to translate these lambda expressions into anonymous delegates:

var search = Enumerable.Select(
    Enumerable.Where(
        persons,
        delegate(String p)
        {
            return p.Contains("l");
        }),
    delegate(String p)
    {
        return p.ToUpper();
    });

Even though this looks like something we could use in Visual Studio 2005, it is not. Visual Studio 2005 will complain because it is unable to infer the generic type arguments (and because of the var keyword, which is also part of C# 3.0 and hence not supported in Visual Studio 2005).

We have to specify types explicitly for the Enumerable methods:

IEnumerable<String> search = Enumerable.Select<String, String>(
    Enumerable.Where<String>(
        persons,
        delegate(String p)
        {
            return p.Contains("l");
        }),
    delegate(String p)
    {
        return p.ToUpper();
    });

There we are, a LINQ query that can be used inside Visual Studio 2005! It's a bit confusing to write at first, but you'll get used to it.
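For reference, here is the same statement dropped into a complete console program. This is only a sketch: the class and method names are mine, and it assumes the project references System.Core.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SimpleQueryDemo   // hypothetical name, for illustration only
{
    // C# 3.0: from p in persons where p.Contains("l") select p.ToUpper()
    public static IEnumerable<String> Search(String[] persons)
    {
        return Enumerable.Select<String, String>(
            Enumerable.Where<String>(
                persons,
                delegate(String p)
                {
                    return p.Contains("l");
                }),
            delegate(String p)
            {
                return p.ToUpper();
            });
    }

    static void Main()
    {
        String[] persons = new String[] { "Philippe", "Steve", "Bill" };

        // Nothing is filtered yet: the query only runs when it is enumerated.
        IEnumerable<String> search = Search(persons);

        foreach (String name in search)
        {
            // Under an English or invariant culture: PHILIPPE, then BILL
            Console.WriteLine(name);
        }
    }
}
```

As with the C# 3.0 syntax, execution is deferred, so the delegates are only invoked inside the foreach loop.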

Now that we have seen a simple query, let's have a look at the other query statements.

Ordering

Still using the same String array, here is the C# 3.0 style LINQ query:

var search = from p in persons
             orderby p
             select p;

which translates to:

IEnumerable<String> search = Enumerable.OrderBy<String, String>(
    persons,
    delegate(String p)
    {
        return p;
    });

As you can see, there is no need for an Enumerable.Select call, as the projection returns the same object.

If you want to order descending, use the Enumerable.OrderByDescending method instead. It is also possible to specify the IComparer to be used by the OrderBy/OrderByDescending method, which lets you order custom objects.
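Both variations can be sketched as follows. LengthComparer and the other names are hypothetical and only serve the example; the comparer is passed as a third argument to the overload of OrderBy that accepts one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for this sketch: shorter strings sort first.
class LengthComparer : IComparer<String>
{
    public int Compare(String x, String y)
    {
        return x.Length.CompareTo(y.Length);
    }
}

static class OrderingDemo   // hypothetical name, for illustration only
{
    // C# 3.0: from p in persons orderby p descending select p
    public static IEnumerable<String> Descending(String[] persons)
    {
        return Enumerable.OrderByDescending<String, String>(
            persons,
            delegate(String p) { return p; });
    }

    // The query syntax has no place for a comparer, so this form
    // only exists as a method call.
    public static IEnumerable<String> ByLength(String[] persons)
    {
        return Enumerable.OrderBy<String, String>(
            persons,
            delegate(String p) { return p; },
            new LengthComparer());
    }

    static void Main()
    {
        String[] persons = new String[] { "Philippe", "Steve", "Bill" };

        foreach (String p in Descending(persons))
        {
            Console.WriteLine(p);   // Steve, Philippe, Bill
        }

        foreach (String p in ByLength(persons))
        {
            Console.WriteLine(p);   // Bill, Steve, Philippe
        }
    }
}
```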

Joining

For joining, we need to define a struct in order to have two distinct lists of elements.

So, here is a simple struct:

struct Name
{
    public int Id;
    public String Value;

    public Name(int Id, String Value)
    {
        this.Id = Id;
        this.Value = Value;
    }
}

And here are two array declarations:

Name[] firstNames = new Name[] {
    new Name(1, "Philippe"),
    new Name(2, "Steve"),
    new Name(3, "Bill")
};

Name[] lastNames = new Name[] {
    new Name(1, "Vlérick"),
    new Name(2, "Ballmer"),
    new Name(3, "Gates")
};

Using two arrays, we will be joining on the Id field of the Name struct.

The C# 3.0 query:

var search = from fn in firstNames
             join ln in lastNames on fn.Id equals ln.Id
             select fn.Value + " " + ln.Value;

Here is the translation:

IEnumerable<String> search = Enumerable.Join<Name, Name, int, String>(
    firstNames,
    lastNames,
    delegate(Name n)
    {
        return n.Id;
    },
    delegate(Name n)
    {
        return n.Id;
    },
    delegate(Name n1, Name n2)
    {
        return n1.Value + " " + n2.Value;
    });

This one is a bit tricky and needs some clarification.

The first two type arguments are the types of the objects that each collection contains. The third is the type of the key the join is made on, which has to be the same for both collections. The last is the type of the objects stored in the returned collection. So, in this particular case, both collections contain Name objects, the join is made on int keys and the returned collection contains String objects.

As for the parameters, the first two are the collections that contain the objects to join. The next two parameters are the delegates that must return the value the join is made on (in this case, an int). The first delegate is used with each object of the first collection (in this case, firstNames) and the second delegate is used with each object of the second collection (lastNames). The last parameter is a delegate that receives the two joined objects as parameters and must return the given return type (here a String object).

Note that we don't need to explicitly call the projection method (Enumerable.Select) as there is a selector delegate that builds the output object.
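Here is a self-contained sketch of such a join, with each argument commented. To show that Enumerable.Join behaves like an inner join, the sample data is changed slightly from the arrays above: the second array deliberately has no entry for Id 2. JoinDemo and FullNames are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

struct Name
{
    public int Id;
    public String Value;

    public Name(int id, String value)
    {
        Id = id;
        Value = value;
    }
}

static class JoinDemo   // hypothetical name, for illustration only
{
    public static IEnumerable<String> FullNames(Name[] firstNames, Name[] lastNames)
    {
        return Enumerable.Join<Name, Name, int, String>(
            firstNames,                             // first (outer) collection
            lastNames,                              // second (inner) collection
            delegate(Name fn) { return fn.Id; },    // key of each first name
            delegate(Name ln) { return ln.Id; },    // key of each last name
            delegate(Name fn, Name ln)              // builds one output element
            {
                return fn.Value + " " + ln.Value;
            });
    }

    static void Main()
    {
        Name[] firstNames = new Name[] {
            new Name(1, "Philippe"),
            new Name(2, "Steve"),
            new Name(3, "Bill")
        };

        // Sample data for this sketch only: Id 2 is deliberately missing.
        Name[] lastNames = new Name[] {
            new Name(1, "Vlérick"),
            new Name(3, "Gates")
        };

        foreach (String fullName in FullNames(firstNames, lastNames))
        {
            // Two lines are printed, for Philippe and Bill;
            // Steve has no matching Id and is dropped.
            Console.WriteLine(fullName);
        }
    }
}
```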

Grouping

As a reminder, grouping means splitting the output in sequences of groups that have the same key value. The output of this type of query is a bit different from the previous queries.

A custom struct is needed to have an example object that we can test grouping on:

struct Student
{
    public String Name;
    public String Course;

    public Student(String Name, String Course)
    {
        this.Course = Course;
        this.Name = Name;
    }
}

We can then declare an array that we will query on:

Student[] students = new Student[] {
    new Student("Philippe", "Data Structure"),
    new Student("Bill", "Marketing"),
    new Student("Steve", "Finance"),
    new Student("Bill", "Data Structure"),
    new Student("Steve", "Dance"),
    new Student("Bill", "Finance")
};

Let's group on the Course field of the struct. Here is the C# 3.0 query:

var search = from s in students
             group s by s.Course;

As you can see, there is no projection. The output is an IEnumerable of IGrouping objects. An IGrouping object is a collection of objects that share a common key; here, Student objects with String keys.

To iterate through this, we need two foreach loops. The first one iterates through each IGrouping object, and the second one iterates through each Student of that group.

foreach (var course in search)
{
    Console.WriteLine(course.Key);
    foreach (var student in course)
    {
        Console.WriteLine("- " + student.Name);
    }
}

This is how this query translates in Visual Studio 2005:

IEnumerable<IGrouping<String, Student>> search = Enumerable.GroupBy<Student, String>(
    students,
    delegate(Student s)
    {
        return s.Course;
    });

And the loops to display the content:

foreach (IGrouping<String, Student> course in search)
{
    Console.WriteLine(course.Key);
    foreach (Student student in course)
    {
        Console.WriteLine("- " + student.Name);
    }
}

Again, you have to specify types everywhere in Visual Studio 2005.
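Put together as a complete program, the grouping example might look like the sketch below. GroupingDemo and GroupByCourse are made-up names; the expected output in the comment relies on Enumerable.GroupBy yielding groups in the order in which each key first appears in the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

struct Student
{
    public String Name;
    public String Course;

    public Student(String name, String course)
    {
        Name = name;
        Course = course;
    }
}

static class GroupingDemo   // hypothetical name, for illustration only
{
    // C# 3.0: from s in students group s by s.Course
    public static IEnumerable<IGrouping<String, Student>> GroupByCourse(Student[] students)
    {
        return Enumerable.GroupBy<Student, String>(
            students,
            delegate(Student s) { return s.Course; });
    }

    static void Main()
    {
        Student[] students = new Student[] {
            new Student("Philippe", "Data Structure"),
            new Student("Bill", "Marketing"),
            new Student("Steve", "Finance"),
            new Student("Bill", "Data Structure"),
            new Student("Steve", "Dance"),
            new Student("Bill", "Finance")
        };

        // Groups appear as: Data Structure (Philippe, Bill), Marketing (Bill),
        // Finance (Steve, Bill), Dance (Steve).
        foreach (IGrouping<String, Student> course in GroupByCourse(students))
        {
            Console.WriteLine(course.Key);
            foreach (Student student in course)
            {
                Console.WriteLine("- " + student.Name);
            }
        }
    }
}
```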

Other Thoughts

One of the other painful things with Visual Studio 2005 is that there are no anonymous types, which are very handy with LINQ. Every type used in a query has to be written out as a class (or better, a struct).
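As an illustration, a C# 3.0 query such as "from p in persons select new { Name = p, Length = p.Length }" needs a named type for its result. The sketch below uses a struct called NameInfo, which is a name invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical struct standing in for the anonymous type
// new { Name = p, Length = p.Length } of C# 3.0.
struct NameInfo
{
    public String Name;
    public int Length;

    public NameInfo(String name, int length)
    {
        Name = name;
        Length = length;
    }
}

static class NamedTypeDemo   // hypothetical name, for illustration only
{
    public static IEnumerable<NameInfo> Describe(String[] persons)
    {
        // The second type argument is the named struct, since there is
        // no anonymous type for the compiler to generate.
        return Enumerable.Select<String, NameInfo>(
            persons,
            delegate(String p) { return new NameInfo(p, p.Length); });
    }

    static void Main()
    {
        String[] persons = new String[] { "Philippe", "Steve", "Bill" };

        foreach (NameInfo info in Describe(persons))
        {
            // prints "Philippe: 8", "Steve: 5", "Bill: 4"
            Console.WriteLine(info.Name + ": " + info.Length);
        }
    }
}
```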

Conclusion

Using static methods from Enumerable (and possibly Queryable), it is possible to use LINQ queries inside Visual Studio 2005. However, queries are a bit more complicated to write.

A good practice is to comment the query's intent extensively, as the query statement itself is hard to read.
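One possible way to do this, shown here as a sketch with made-up names, is to keep the C# 3.0 query as a comment right above its expansion:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CommentedQueryDemo   // hypothetical name, for illustration only
{
    public static IEnumerable<String> Initials(String[] persons)
    {
        // Intent, in C# 3.0 query syntax:
        //
        //   from p in persons
        //   where p.Length > 4
        //   orderby p
        //   select p.Substring(0, 1)
        //
        // Read the expansion inside out: Where, then OrderBy, then Select.
        return Enumerable.Select<String, String>(
            Enumerable.OrderBy<String, String>(
                Enumerable.Where<String>(
                    persons,
                    delegate(String p) { return p.Length > 4; }),
                delegate(String p) { return p; }),
            delegate(String p) { return p.Substring(0, 1); });
    }

    static void Main()
    {
        String[] persons = new String[] { "Philippe", "Steve", "Bill" };

        foreach (String initial in Initials(persons))
        {
            Console.WriteLine(initial);   // prints P, then S
        }
    }
}
```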

Resources

Here are some of the resources I used to write this entry:

  • Programming C# 3.0 - O'Reilly, January 2008
  • "Linq To Anything" - Bart De Smet, March 2008
  • "LINQ Under the Covers" - Alex Turner, March 2008

