Developing Linq to LLBLGen Pro, part 12
(Updated Wednesday 30-jan-2008). It was mentioned that we would implement 'Skip' as well, although we already had a paging method added, TakePage(). After carefull analysis, we decided not to implement Skip for now. The reason is that it can lead to confusing queries, while paging is what the developer wants. We believe TakePage() serves the developer better than a combination of Skip / Take (Take is still supported separately) which won't work in a lot of cases if Skip is specified alone.
(This is part of an on-going series of articles, started here)
Cast again
The last episode in this series contained a remark about Queryable.Cast, and that it can be ignored. I've to correct myself there, this isn't entirely correct. Let's look at the example query at hand:
// LLBLGen Pro Linq example var q = from e in metaData.Employee.Where(e => e is BoardMemberEntity).Cast<BoardMemberEntity>() select e.CompanyCarId;
Here, the Cast is actually irrelevant because the Where already filters on the BoardMemberEntity type. However in the following small query, it's not:
var q = from e in metaData.Employee.Cast<BoardMemberEntity>() select e.CompanyCarId;
Here, the Cast is actually a type filter. Well... sort of. The thing is: there's no real 'good' way to handle this, similar to 'as', which is discussed below. Imagine the situation where the employee instance is of type ManagerEntity (supertype of BoardMemberEntity), and not of type BoardMemberEntity. What should the query above do in that case? The instance 'e' in that case doesn't have a CompanyCarId field, as that's a field only available in the BoardMemberEntity type. The specification of Queryable.Cast doesn't reveal any help here: it doesn't say what should be done when the Cast can't be performed. So, I decided it emits a type filter instead. This means that the query above always succeeds if there are BoardMemberEntity instances in the database, it simply filters out any types which aren't of the specified type, in this case BoardMemberEntity.
OfType
Queryable.OfType is a bit of a weird method, in that it actually does more or less what Cast does in that it filters out any elements which aren't of the type specified. Of course, 'Cast' by definition isn't a filter, but in database queries, you have little choice here: SQL isn't imperative, at least not in the way SELECT queries work. OfType therefore is implemented similar to Cast: it emits a type filter into the query.
The 'as' keyword
I already had implemented support for 'is', which resulted also in a typefilter (if you now have the feeling there are a couple of redundant ways to specify the same thing in Linq, you're correct), but 'as' is a little different. It again is more or less a keyword which is actually not really usable in database queries. Let's look at a query which illustrates this:
var q = from e in metaData.Employee where (e as BoardMemberEntity).CompanyCarId==1 select e;
This in itself looks pretty straight forward, but look closely to what it represents: what if 'e' is a ClerkEntity (not a supertype of BoardMemberEntity) ? The 'as' keyword should then result in null/Nothing, so accessing the property CompanyCarId on it should result in an exception, at least in .NET code it does. But how are you going to convert this into SQL? One could argue, and I agree with that, that the query above is pretty poor code and constructions like that should be discouraged. Though the thing is that just because it's possible, it will cause some people to write code just like that.
Fortunately, the code above is translatable to SQL in general form, if the 'as' keyword is seen as a type conversion, so 'e' is simply seen as a BoardMemberEntity and the fact that a 'null' can occur is ignored as the filter on CompanyCarId will weed out any types without that field in the query anyway (no matter what inheritance mapping strategy is used). The only difference is that if you expect an error, you won't get one. Let's look at a more nastier version of the one above which actually is more correct:
var q = from e in metaData.Employee let x = e as BoardMemberEntity where (x!=null) && (x.CompanyCarId == 1) select x;
Here, across two statements, a type filter is written: first the 'as' conversion in the let statement, and then the test for null on the result. You can generally write this as a handling of type specification equal/not equal null and by adding a special handler for that in the general binary expression handler, you can convert that situation to a type filter!
There's something not really great about the query above though. Linq to Sql has no problems, because it only supports single-table inheritance. If the O/R mapper supports multi-table/view inheritance, like LLBLGen Pro does, you have a bit of a problem with field names. Consider a hierarchy where the root has three branches of subtypes, and in two branches a field Foo is present, though not in the third branch. If you now fetch all entities of the root type, you will end up with multiple times the same field in the projection. The fix for that is field aliasing, namely you give every field an artificial alias, e.g. Fx_n, where x is some index to specify the entity and n is the index of the field in the entity.
The problem begins when 'let' entered the room: a 'let' is effectively a wrapping SELECT statement. (You can rewrite some queries by moving the query to other parts of the query but that's not always possible). The query wraps the original query in its FROM clause as a Derived Table and simply selects from that source, making the projection required for let. Now, imagine you have your fields aliased with artificial aliases inside that source. Your Derived Table fields now have names like F0_1, F3_4 etc., but not 'CompanyCarId'. This means that the filter on CompanyCarId, which is outside the Derived Table, as let wrapped what's before 'let' in the query, will fail if it targets 'CompanyCarId'. Normally it would work, as the JOIN statements which joined every subtype's Table/View would be accessable for the WHERE clause, but in this case that's not possible because the joins are inside the Derived Table.
As elements are processed in different areas of the Linq provider (you're not going to write a big routine which handles every situation), the handler for Where has no clue if the field should be re-aliased, at least not at that point. So I wrote a traverser, a class which crawls over all LLBLGen Pro query api objects and finds the elements sought. Basicly it's a base class which simply visits all object elements and if you in a derived class override a given method you can tap into this process and do what you need to do, e.g. collect Derived Table definitions, fields targeting derived tables etc. I wrote a couple of derived classes which collected information for me this way and with that I could easily fix every field alias and target in the entire query, so CompanyCarId references in filters outside the Derived Table would then be fixed to target the Fx_n field which represented CompanyCarId. The crawlers are also handy to find places where inheritance relations have to be injected into the query, for example because the query is folded into a subquery inside a Derived Table. The classes will be public in the upcoming runtime library of LLBLGen Pro v2.6 so our customers can use them as well for their own fine-grained query voodoo magic .
Except and Intersect
Except and Intersect are two methods which are actually each other complement: you can implement both with a single construct and just a flag to add 'NOT' to the query fragment. Except and Intersect can be implemented as an EXISTS query construct, similar to the work done for Contains: a lot of code can be re-used for these two methods.
Except and Intersect can have an IEnumerable as argument. Although Linq to Sql doesn't support it (I don't know why) it's perfectly doable, and in Linq to LLBLGen Pro you can pass an in-memory set of elements to Except or Intersect. What's particularly weird with Linq to Sql not supporting it is that the code to support in-memory sets as argument for these two methods is also usable for Contains. Oh well... .
There's some pitfall to be noticed with these two methods however. Consider this query:
// won't work var q = from c in metaData.Customers where someCollection.Except(c.Orders).First().EmployeeId==3 select c;
This query might look like a bit far-fetched, but the general idea is this: it's not possible to use an in-memory collection in a database-targeting query where Except is called on that in-memory collection with a database set as argument. Do you see why? It took me 3 days to realize this, so don't worry if you don't see it right away. The thing is: Except filters the set it is called on. But that set isn't in the database, you've to pass it to the database in the query. With PK fields, that might be possible, but not with complete entities, that's undoable. With Intersect it could be done though, however I haven't implemented that for now, as it's easy to work around it (swap the operands of Intersect ).
Single
Now here's some method I have no idea why it is in the API: Queryable.Single. Let me first quote the specification of the method:
Returns the only element of a sequence, and throws an exception if there is not exactly one element in the sequence.
There's an overload which accepts a predicate and which simply means that Single(predicate) will return the only element in the sequence which matches this predicate. What I find odd is the remark about throwing an exception: why would anyone call such a method? There is a rule about exceptions: "Don't ever use exceptions for control flow in your application". Exceptions aren't expressions for if/else constructs, they're serious business: they mean something was definitely wrong and needs handling to avoid a total crash. In database scenario's it's even weirder: which exception should you expect? And more importantly: what does it mean? If you don't know what the true meaning of an exception is, you'll never be able to handle it.
The example in the MSDN to illustrate Queryable.Single() is pretty bad actually, because it uses exceptions as control flow. Not only that, the example fails to illustrate a valid case where the exception would be something you would want. This is important, because... I can't think of a use case for Queryable.Single() where you would want the exception. After all, exceptions aren't meant for control flow, so I don't expect an exception to drive my code as if an if-statement resolved to false: it's not meant to be a test if a set contains more than one element.
The thing with Single is that it's redundant, at least for database queries. You can also use First(). First() also returns a single element, but it doesn't throw an exception when there are more than one element in the sequence. With database queries, using Single() has the same behavior as First(). Sure, I could add code which flags the resulting QueryExpression object that the result should be checked if it has more than one element and if so, an exception should be thrown, as Linq to Sql does with this query:
// Linq to Sql code var q = (from c in nw.Customers select c).Single();
However, do I get the same exception that there are more than one element in the sequence with this one:
// Linq to Sql code var q = from c in nw.Customers where c.Orders.Single().EmployeeId==3 select c;
No. With this query I get a hardcore severity level 16 error from SqlServer that the query is wrong because the subquery returns more than one element which can't be used with the operator (=) specified. This isn't the fault of Linq to Sql, what else can it do? Call RAISERROR (someone at Sybase still feels ashamed about that typo I bet ) with a Count check? Why? Would that help the caller of the query? I'm not convinced it necessary will help. The 'Single' method is simply not useful in database queries: for the Single() overload, use First(), for the Single(predicate) overload use Where(predicate).First(). Though, as the requirement of the method is that it should throw an (not specified which one) exception, these usable synonym statements aren't completely covering what Single() represents. Though in my opinion, the exception requirement is a big mistake: if you need behavior to be called when a set has more than one element, you should test on that specification and call the behavior if the test succeeds, not by issuing a query which in the end fails with whatever error takes place, and count on that to handle things further.
I'll add support for it, though under protest. The reason I'll add support for it is to be compatible with queries which target other O/R mapper frameworks.
Linq to Sql issue(s)
During development of the Linq to LLBLGen Pro provider I often check what Linq to Sql does with a given query I use for testing and to see if for example my SQL is more efficient or falls flat on its face. Sometimes you run into unexpected things. When I tried an Except or Intersect query, I saw that 'DISTINCT' was emitted into the query. Everyone who knows SQL knows that DISTINCT is a keyword you have to be careful with, not all databases support every type of field with DISTINCT. In SqlServer for example, (n)text and image fields aren't supported in a DISTINCT projection. Not sure if this was a small glitch or that there was logic which would prevent DISTINCT in queries where it's not allowed I tried:
// Linq to Sql, gives crash with NotSupportedException. var q = (from e in nw.Employees select e).Distinct();
It too emitted DISTINCT into the query, which was caught by its validation checker. However what's worse: Except and Intersect therefore also aren't usable with Linq to Sql and any image/(n)text containing entity type: DISTINCT is always emitted into the query. This query for example gives the same exception, though no DISTINCT was specified in the Linq query:
var q = from e in nw.Employees.Except( from e in nw.Employees where e.Country == "USA" select e) select e;
One could argue: "But if DISTINCT isn't possible, how to weed out the duplicates?". Well, you do that on the client in the routine which consumes the datareader and constructs the objects to return. You keep hashtables with hashes calculated from identifying attributes like PK fields and with that you filter out duplicates. At least with entity fetches like this one. Not supporting situations where DISTINCT can't be emitted into a SQL query is a typical error one could make in a v1 O/R mapper, it's only a bit sad for Linq to Sql users that Microsoft is so generous with releasing fixes for their framework on a regular basis.
Writing a Linq provider is a lot about true software engineering, I've written about that several times before: some things are known but the biggest part is unknown territory: what is constructed in which order, when to expect what, is it safe to ignore this or that? Unclear things one can only find out when it is used, i.e. by trial/error approaches. This is actually a bit sad, because it's now easy to overlook mistakes or miss corner cases, as the bigger picture isn't always clear. For example: what does VB.NET emit into the expression tree for string concatenations if C# emits 'Add' operations ?
With the DISTINCT keyword popping up in the Linq to Sql query for Except I was immediately alarmed: why is it there? Linq to Sql never emits DISTINCT when it's not told to do so. You then start thinking about it: is it something which is a left-over from their tree manipulations? Or is it hardcoded set to be emitted? It turns out it is. In QueryConverter.VisitExcept and QueryConverter.VisitIntersect it sets 'IsDistinct' to true. I couldn't think of a reason to have it default to DISTINCT, so I didn't add a requirement for that to our tree for Except and Intersect, also because it uses an EXISTS query, which doesn't care if DISTINCT is there or not.
Both routines are also a clone of eachother. Clones are easily created and often overlooked, however this one is particularly obvious, especially because the behavior of the two routines is closely related, so the implementation of the handlers is then too closely related.
What's next?
Implementing 'Single', probably tomorrow, then on to implementing the database function framework I have in mind: it has to be a framework where developers can add their own function mappings to the provider so they can map their own extension methods to database functions easily. When that's done, Queryable.Convert gets another look as some scenario's require it not to be stripped off but handled instead, though that relies on the function-mapping framework, so it has to wait till then. After that, hierarchical projections and prefetch path support are on the menu (prefetch paths are more a small addition as the core functionality is already in the runtime for quite some time, hierarchical projections require the prefetch path merge code already in the framework to be opened up to the Linq provider) and then I'm done with the Linq part. Finally. Stay tuned.