API's and production code shouldn't be designed by scientists
One of the biggest mistakes Microsoft made in Database land was the absense of a proper paging mechanism in SqlServer 7 and SqlServer 2000. (No, don't come to me with tricks with @@ROWCOUNT because these don't always work in all situations). People had to use temp tables to get a mechanism which always works in all situations. What a surprise it was to see that Microsoft said they solved it in SqlServer 2005: they added a construct which offered paging inside the database without the necessity of temp tables. Though, who designed this feature? Did that person ever write production SQL code or did that person ever write normal business applications? Did anyone ever tell that person that all other major databases on the planet, except Access, had a simple paging mechanism build in so anything more complex than that would be bad? Apparently not.
There's a saying in IT land: when you let smart people solve a simple problem you get a overly complicated solution. The solution Microsoft chose for paging in SqlServer 2005 is one of the most bulkiest, complicated solution to paging ever created. I dare to say it's even more complicated than temp tables. Let's look at an example and how other databases handle it, shall we? In the code snippets below, pageNumber starts with 1 (which is the first page) and pageSize is the normal size of the page to retrieve.
Example
The typical example we want to use is this query:
SELECT * FROM Orders WHERE CustomerID LIKE 'A%' ORDER BY OrderDate DESC, ShippingDate DESC
(I use '*' instead of a list of fields for simplicity here, but you get the idea). The query is pretty simple. Now, if you ask a random developer what s/he would add to the SELECT syntaxis to implement paging, what do you think that person will answer in general? Very likely the person will say that the SELECT statement has to have a page specification and a page size specification and that's about it. You know, how Firebird solves it:
SELECT FIRST pageSize SKIP ((pageNumber-1) * pageSize) * FROM Orders WHERE CustomerID LIKE 'A%' ORDER BY OrderDate DESC, ShippingDate DESC
Which basicly means: skip the first n rows and then simply take the first m rows. Pretty simple, works very straight forward and works on every resultset. Syntaxis is clean, simple to understand and to the point. Clearly the work of an engineer who knows what s/he's doing.
Another simple approach would be how MySql and PostgreSql solve it: with a specification at the end of the query:
-- MySql SELECT * FROM ORDERS WHERE CustomerID LIKE 'A%' ORDER BY OrderDate DESC, ShippingDate DESC LIMIT ((pageNumber -1) * pageSize), pageSize
-- PostgreSql SELECT * FROM ORDERS WHERE CustomerID LIKE 'A%' ORDER BY OrderDate DESC, ShippingDate DESC LIMIT pageSize OFFSET ((pageNumber -1) * pageSize)
Which basicly comes down to the FIRST, SKIP approach: skip a set of rows and take the subset with the length specified.
Though are the big guns in the field, i.e. DB2 and Oracle, also doing paging this way? No, they have a slightly different approach. They do it a little more complicated, but not that much complicated. What they basicly do is that you can refer to the number of the row in the total resultset and simply filter on that. So if the example query results in 1,000,000 rows, and you want the 10th page (which is the page with pageNumber==10) of 25 results, you can simply refer to the ((10-1) * 25)th row in the resultset as the first row (row counting starts with 0) in the resultset to grab and then grab the 25 rows you want. You can do this with a normal WHERE clause. It's quite straight forward. Let's look at DB2's query first:
-- IBM DB2 SELECT * FROM ( SELECT a.* , rownumber() over () AS rn FROM ( SELECT * FROM ORDERS WHERE CustomerID LIKE 'A%' ORDER BY OrderDate DESC, ShippingDate DESC ) AS a ) AS rs WHERE rs.rn between (((pageNumber-1) * pageSize)+1) AND (pageNumber * pageSize)
This looks complicated at first but it's not: whatever query you have, it's always placed at the same spot, so the paging query itself is always the same. That's a key point, as writing queries can be tedious and if you can use a paging query which is nothing more than a wrapper around the main query, you're not likely to make a mistake, in fact you can factor out the paging query to a piece of sequel you simply apply to your main query by using a routine.
The Oracle one is a bit simpler:
SELECT * FROM ( SELECT a.*, rownum r__ FROM ( SELECT * FROM ORDERS WHERE CustomerID LIKE 'A%' ORDER BY OrderDate DESC, ShippingDate DESC ) a WHERE rownum < ((pageNumber * pageSize) + 1 ) ) WHERE r__ >= (((pageNumber-1) * pageSize) + 1)
The same basic idea: refer to the physical row number in the final resultset of the row you want to retrieve and then return that row and the n following rows which all form the page you want to fetch. In Oracle you too can use a wrapper routine for every query you have.
So, two main alternatives to choose from: add simple keywords to the SELECT syntaxis and hide every little dirty element necessary for paging for the developer or offer a way to use the row number inside a paging query and basicly let the developer wrap any query with a standard paging query.
So what does Microsoft bring to the table? Well, it does sound like the DB2/Oracle method: the ROW_NUMBER() function. At first, it looks like we can go the DB2/Oracle route and simply write a wrapper query which pages everything. But no, that's not the case. Of every option thinkable, the scientist who designed this syntax thought it had to be as much complicated as it possibly can be made: your original query has to be split up to work properly with the ROW_NUMBER() function. That means that any wrapper approach isn't going to work, which means for every query that has to be paged, a complicated new query has to be formulated. No, there's no simple SELECT statement syntaxis. Any engineer working on this API would have opted for that approach. A person however who never uses his/her own work will perhaps opt for a different approach but even then, the DB2/Oracle approach is more simpler to use than the SqlServer 2005's approach will ever be. Let's take a look at how the simple example query will look like in SqlServer 2005. I won't use a CTE here, to focus on the main SQL.
select TOP 10 b.* FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY OrderDate DESC, ShippingDate DESC) as row_number, * FROM ( SELECT * FROM Orders ) AS a WHERE a.CustomerID LIKE 'A%' ) AS b where b.row_number > (pageSize * (pageNumber-1))
So, let me get this straight: to page over a resultset in SqlServer 2005: you have to split your query into fragments which then move to different parts of the final query. Not only that, but if you refer in your WHERE clause to a table, like: WHERE Orders.CustomerID LIKE 'A%', it goes wrong: in the new WHERE clause, you have to refer to the set with the ROW_NUMBER function. This can become complex very very quickly, mainly because the actual query is completely ripped apart. There are alternatives to this structure though, neither one of them is easier than the one I pointed out above.
Scientists should stick with science. What they invent and discover should be moved to the real world by engineers, not by scientists as well. Because, if you leave it to scientists, you'll end up with horrible syntaxis like the ROW_NUMBER() function in SqlServer 2005. It might be a powerfull feature as it has functionality which might be usable in some situations (mostly in reporting scenario's, not in paging), however they never ever should have used the function as the paging functionality cornerstone in SqlServer 2005: they simply should have added a couple of simple keywords to SELECT and be done with it. Though if you aren't an engineer, if you don't work with the code you write yourself, if you don't write user-code up front to see if your API really is easy to use and consistent, you fall back on what you understand of what you created, which is basicly everything used as the foundation of what you created and of course what you created yourself, which makes everything look easy and straight forward. That's the pitfall in which scientists fall and which engineers many times manage to avoid
For completeness:
Again, my disclaimer: you can think that @@ROWCOUNT works always, or a simple TOP trick works always: they don't. There's no wrapper/SELECT trick in SqlServer whatever version which works with all queries you throw at it so you can write consistent code.