VB.NET: Beware of the 'Aggregate' keyword (updated)
UPDATE I tested this initially with EmployeeID and noticed the strange behavior. Writing this blogpost I thought the max of the employeeID was a little artificial, so I changed that in OrderDate. But... what happened (see my reply to this blogpost below in the comments) ? When o.OrderDate is used, the VB.NET compiler produces a proper expression tree! But when EmployeeID is used, it doesn't. Both are nullable types. When using CustomerID it also works properly. It's beyond me why it fails to produce a proper expression tree when EmployeeID is used. Nevertheless, the advice I gave below is still valid: do yourself a favor and call the aggregate extension methods of All, Any, Max, Min, Sum etc. when you want to obtain these values from a set in a scalar query.
This afternoon I was writing some VB.NET specific Linq queries for the documentation for our upcoming LLBLGen Pro v2.6 (currently in beta) version, and typically I was simply porting over C# queries to VB.NET code and see if they work and if they do, use them, if not, start some cursing, quests for help in the MSDN docs and repeat the process. During my travels through the MSDN Linq documentation and what Microsoft had added to VB9 (.NET 3.5), I ended up in the VB.NET Linq specific language clauses documentation. One of them is about the Aggregate clause.
The Aggregate clause was a strange fellow to me. It didn't ring any bell. What does it do? Well, let's look at a typical example: I want to fetch the maximum EmployeeID for the Northwind Orders. In SQL one would do:
SELECT MAX(EmployeeID) FROM Orders
Easy enough. In Linq for VB.NET, using this Aggregate keyword, you can write:
Dim max = Aggregate o in metaData.Order Into m=Max(o.EmployeeID)
Looks ok, right? Running it will give the maximum EmployeeID, all is well. Or not? When you examine the trace output of the query generated, or check the expression tree passed into the provider, you'll see that all Order entities are fetched, and the aggregate is simply ran in memory!
This is because the Expression tree has no notion of 'Aggregate' nor 'Max'. The only thing that's in the expression tree is metaData.Order.
So I checked it with Linq to Sql. Same result. This is a very dangerous keyword if applied without proper caution: you'll fetch all data into memory for the aggregate to be applied. This can lead to serious performance issues at runtime, for example if the Order table contains 10 million rows, the retrieval of the Max date using the Aggregate keyword isn't going to perform...
What to use instead?
Use the C# way: call the extension method:
Dim max = metaData.Order.Max(Function(o) o.EmployeeID)
This will give you a query which executes the Max aggregate function on the single field EmployeeID in the database server, so no slow fetching of all the data.