Reply to "What ORMs have taught me: just learn SQL"
This is a reply to "What ORMs have taught me: just learn SQL" by Geoff Wozniak.
I've spent the last 12 years of my life full time writing ORMs and entity modeling systems, so I think I know a thing or two about this topic. I'll briefly address some of the things mentioned in the article.
Reading the article I got the feeling Geoff didn't truly understand the material: what ORMs are meant for and what they're not meant for. It's not the first time I've seen an article like this and I'm convinced it won't be the last. That's fine; you'll find a lot of these kinds of articles on many frameworks/paradigms/languages etc. in our field. I'd like to add that I don't know Geoff and therefore have to base my conclusions on the article alone.
Re: intro
The reference to the Neward article made me chuckle: sorry to say it but bringing that up always gives me the notion one has little knowledge of what an ORM does and what it doesn't do. An ORM is just a tool to translate between two projections of the same abstract entity model (class and table, which result in instances: object and table row); it doesn't magically make your crappy DB look like one designed by CELKO himself nor does it magically make your 12 level deep, 10K object wide graph persist to tables in a millisecond as if there was just 1 table. Neither will SQL for that matter, but Geoff (and Neward before him) silently ignores that.
An ORM consists of two parts: a low level system which translates between class instances and table rows to transport the entity instances (== the data) back and forth, and a series of sub-systems on top of that to provide entity services (validation, graph persistence, unit of work, lazy / eager loading etc. etc.)
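To make the first part concrete, here is a minimal sketch of that low-level layer in Python with the standard `sqlite3` module. The `Customer` class, the `customer` table and the helper names `materialize` and `to_row` are made up for illustration; a real ORM does far more (type conversion, mapping metadata, caching), so read this as the bare idea only.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Customer:  # hypothetical entity class, one projection of the entity model
    id: int
    name: str
    country: str


def materialize(cls, cursor):
    """Turn every row of an executed cursor into an instance of cls.

    Columns are matched to dataclass fields by name; extra columns are ignored.
    """
    column_names = [description[0] for description in cursor.description]
    wanted = {f.name for f in fields(cls)}
    for row in cursor:
        values = {c: v for c, v in zip(column_names, row) if c in wanted}
        yield cls(**values)


def to_row(entity):
    """The opposite direction: entity instance to a tuple of column values."""
    return tuple(getattr(entity, f.name) for f in fields(entity))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", to_row(Customer(1, "Ada", "UK")))

cursor = conn.execute("SELECT id, name, country FROM customer")
customers = list(materialize(Customer, cursor))
print(customers)  # [Customer(id=1, name='Ada', country='UK')]
```

Everything else an ORM offers (validation, unit of work, lazy / eager loading) sits on top of a layer like this one.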
It is not some sort of 'magic connector' which eats object graphs and takes care of transforming those to tabular data of some sort, about which you don't want to know anything. It also isn't a 'magic connector' which reads your insanely crappy relational model into a dense object graph as if you read the objects from memory.
Re: Attribute Creep
He mentions attribute creep (more and more attributes (==columns) per relation (==table)) and FKs in the same section, however I don't think one is related to the other. Having wide tables is a problem but it's a problem regardless of what you're using as a query system. Writing projections on top of an entity model is easy, if your ORM allows you to, but even if it doesn't, the wide tables are a problem of the way the database is set up: they'll be a problem in SQL as well as an ORM.
What struck me as odd was that he has wide tables and also a problem with a lot of joins, which sounds like he either has a highly normalized model, which should have resulted in narrow tables, or uses deep inheritance hierarchies. Nevertheless, if a projection requires 14 joins, it requires 14 joins: the data itself isn't obtainable in any other way, otherwise it would be doable through the ORM as well (as any major ORM allows you to write a custom projection with joins etc. to obtain the data, materialized in instances of the class type you provide). It's hard to ignore the fact the author might have overlooked easy-to-use features (which hibernate provides) to overcome the problems he ran into. At the same time it's a bit odd that a highly normalized model is the problem of the ORM and won't be a problem when using SQL (which has to work with the same normalized tables).
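A custom projection looks roughly like the sketch below. It is plain Python over `sqlite3`, not Hibernate, and the tables, columns and the `OrderSummary` class are invented for the example. The point is only that the join is the same join whether an ORM's query system emits it or you type it yourself, and that the result lands in a narrow class you provide instead of in the wide entities.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSummary:
    # A narrow, read-only shape: three columns out of two much wider tables.
    order_id: int
    customer_name: str
    total: float


def fetch_order_summaries(conn):
    """Project only the needed columns across a join into OrderSummary instances."""
    sql = """
        SELECT o.id, c.name, o.total
        FROM orders AS o
        JOIN customer AS c ON c.id = o.customer_id
        ORDER BY o.id
    """
    return [OrderSummary(*row) for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, street TEXT,
                           city TEXT, notes TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id),
                         total REAL, shipping_notes TEXT, internal_code TEXT);
    INSERT INTO customer VALUES (1, 'Ada', '1 Main St', 'London', 'vip');
    INSERT INTO orders VALUES (10, 1, 99.5, 'fragile', 'X1');
""")

summaries = fetch_order_summaries(conn)
print(summaries)  # [OrderSummary(order_id=10, customer_name='Ada', total=99.5)]
```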
He says:
Attribute creep and excessive use of foreign keys shows me is that in order to use ORMs effectively, you still need to know SQL. My contention with ORMs is that, if you need to know SQL, just use SQL since it prevents the need to know how non-SQL gets translated to SQL.
I agree with the fact that you still need to know SQL, as you need to formulate the queries in your code in such a way that they lead to more efficient SQL; an ORM can do a bit of optimization but it is almost impossible to do without statistics/data (which are not available at that stage). But you can't conclude 'just use SQL' from that, as that's like recommending to learn to write Java Bytecode because the syntax of Clojure is too hard to grasp. A better conclusion would be to learn the query system better so you can predict the SQL which will be produced.
Re: Data Retrieval
Query performance is always a concern, and anything between code and the actual execution of the DML in the DB is overhead. Hand-optimized SQL might be a good option in some areas, but in the majority of cases queries generated by ORMs are fine, even hibernate's ;). Most ORMs have a query language / system which is derived from SQL to begin with (the mentioned hibernate does: HQL) and it is predictable what SQL it will roughly produce.
Sure, if you create deep inheritance hierarchies over your tables, you might run into a lot of joins, but that's known up front: inheritance isn't free, one knows what it will do at runtime. "Know the tool you're working with". If Geoff was surprised to see a lot of joins because a 14-entity deep inheritance hierarchy was pulled from the DB, he should have known better.
He says:
From what I've seen, unless you have a really simple data model (that is, you never do joins), you will be bending over backwards to figure out how to get an ORM to generate SQL that runs efficiently. Most of the time, it's more obfuscated than actual SQL.
I find this hard to believe with the query systems I've seen and written myself, with one exception: Linq. Linq is a bit different because it has constructs (like GroupBy) which behave differently in Linq/code than they do in the DB. These require a translation of intent from the query to SQL and thus can / will lead to a SQL query which might not be what one would expect when reading the Linq query.
The usage of Window functions and other DB specific features (like query hints) might be something not doable in an ORM query language. There are several solutions to that though, one being creating DB functions which are mapped to code methods so you can execute the constructs inside your query using those methods which will result in using the functions in the SQL query, another being DB Views. They both require actions inside the RDBMS which is less ideal, but if it helps in edge cases, why not? They're equal to adding an index to some fields to speed up data retrieval, or creating a denormalized table because the data is read-only anyway and it saves the system using it a lot of joins.
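The view route can be sketched like this. The example uses SQLite through Python's `sqlite3` and assumes SQLite 3.25 or newer, as older versions have no window functions. The `sale` table, the `ranked_sale` view and the `RankedSale` class are hypothetical names. The window function stays inside the database and the application maps a plain class onto the view as if it were a table.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RankedSale:  # mapped onto the view, not onto a table
    region: str
    amount: float
    rank_in_region: int


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO sale (region, amount) VALUES ('EU', 10), ('EU', 30), ('US', 20);

    -- The DB-specific construct lives in the database, behind a view.
    CREATE VIEW ranked_sale AS
        SELECT region, amount,
               RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
        FROM sale;
""")


def top_sales(conn):
    """Query the view like any other table; no window syntax on the client side."""
    sql = """
        SELECT region, amount, rank_in_region
        FROM ranked_sale
        WHERE rank_in_region = 1
        ORDER BY region
    """
    return [RankedSale(*row) for row in conn.execute(sql)]


top = top_sales(conn)
```

The query in `top_sales` contains nothing an ORM query language couldn't express, which is exactly why a view helps in these edge cases.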
Re: Dual schema dangers
Here I saw the struggle Geoff had with the concept of ORMs. This isn't uncommon, e.g. Neward (in my opinion) expresses the same struggle in his cited essay. There are two sides with a gap between them: classes and table definitions. Starting with classes and trying to create table definitions from them is the same as starting with table definitions and trying to create classes from them: both are the projection result of an abstract entity model. To get one from the other you have to reverse engineer the side you start with to the abstract entity model it was the projection of, and then project that model to the side you want to create. Whether you start from classes or table definitions doesn't matter.
I do understand the pain point when you start with either side and have to bridge the gap to the other side: without the abstract entity model as the one true source of truth, it's always a problem when one side changes to update the other side.
Geoff tries to blame this on the ORM but that's not really fair: the ORM is meant to work with both sides (class and tables) at run time, not at design time; it requires a system meant for modeling an abstract entity model to manage both sides, as both sides are the result of that model, not the source of it. (I wrote one, see 'Links to my work' at the top left. I didn't want to pollute this article with references to my work)
Re: Identities
New entity instances which get their PK set by a sequence in the DB are the main cause of the problem, if I understand Geoff's description correctly. In memory, these entities have no real ID and referring to them is a bit of a pain, true. But that's related to working with objects in general: any object created is either identified by some sort of ID you give it or its memory location ("the instance itself"). I don't get the struggle with the cache and partial commits: if you want to refer to objects in memory, it's equal to what you would do if they weren't persisted to a DB. That they get IDs in the DB in the case of sequenced PKs is not a problem: the objects get updated after the DB transaction completes. Even hibernate is capable of doing that.
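A rough sketch of that last step, using Python's `sqlite3` with an autoincrement column standing in for a sequence. `Customer` and `save_all` are illustrative names and not any particular ORM's API. Before the save the instances are referred to by the objects themselves; after the transaction has completed they carry the keys the DB generated.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    name: str
    id: Optional[int] = None  # unknown until the database assigns it


def save_all(conn, customers):
    """Insert new entities in one DB transaction, then sync the generated keys."""
    assigned = []
    with conn:  # commits on success, rolls back if an exception is raised
        for customer in customers:
            cursor = conn.execute(
                "INSERT INTO customer (name) VALUES (?)", (customer.name,)
            )
            assigned.append((customer, cursor.lastrowid))
    # Only after the transaction completed do the in-memory objects get their IDs,
    # so a rollback never leaves an object holding a key that doesn't exist.
    for customer, new_id in assigned:
        customer.id = new_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)

ada, bob = Customer("Ada"), Customer("Bob")
team = [ada, bob]  # in memory, the instance itself is the identity
save_all(conn, team)
print(ada.id, bob.id)  # 1 2
```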
Re: Transactions
This section is a typical description of what happens when you confuse a DB transaction with a business transaction. A business transaction can span more than one DB transaction, might involve several subsystems / services, might even use message queues, might even be parked for a period of time before commit. A DB transaction is more explicit and low-level: you start the transaction, you do work, you commit (or rollback) the transaction and that's it.
Geoff's reference to scope is good: it illustrates that there's a difference between the two and that you therefore shouldn't use a DB transaction when you need a business transaction. However, it's too bad he misses this himself. Often developers try to implement a business transaction at the level of an ORM by using its unit of work, but it's too low level for that: a business transaction might span several systems and an ORM isn't the right system to control such a transaction; it's meant to control one DB transaction, that's it.
That doesn't mean the ORM shouldn't provide the tools to help a developer write proper business transaction code with the systems controlling the business transaction. After all, the second part of an ORM is 'entity services', one of which is 'Unit of Work'. Most ORMs follow the Ambler paper and combine a Unit of Work with their central Session or Context object. This leads to the problem that you can't offer a Unit of Work without the central Session or Context object. So when you actually want a Unit of Work to pass around, collecting work for (a part of) the business transaction, you don't want to deal with a Session / Context object which also controls the DB connection / transaction; it might be that at that level / scope it's not even allowed / possible to do DB oriented work.
It's therefore essential to have an ORM which offers a separate Unit of Work object, which solves this problem. In addition, the developer has to be aware that a business transaction is more than just a DB transaction and should design the code accordingly.
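What I mean by a separate Unit of Work can be sketched in a few lines of Python. The `UnitOfWork` class below is a toy written for this example, not the API of any existing ORM. It only collects work and knows nothing about connections, so it can be passed through layers which aren't allowed to touch the DB. Only `commit` needs a connection, and there everything runs inside a single DB transaction.

```python
import sqlite3


class UnitOfWork:
    """Collects pending work; holds no connection and starts no transaction."""

    def __init__(self):
        self._pending = []

    def add_insert(self, table, values):
        # Table and column names are trusted identifiers in this sketch.
        columns = ", ".join(values)
        placeholders = ", ".join("?" for _ in values)
        sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
        self._pending.append((sql, tuple(values.values())))

    def add_delete(self, table, pk):
        self._pending.append((f"DELETE FROM {table} WHERE id = ?", (pk,)))

    def commit(self, conn):
        """Run everything collected so far inside one DB transaction."""
        with conn:  # commits on success, rolls back on an exception
            for sql, params in self._pending:
                conn.execute(sql, params)
        self._pending.clear()  # only reached when the transaction succeeded


def register_customer(uow, name):
    # Business-level code: it collects work and never sees a connection.
    uow.add_insert("customer", {"name": name})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

uow = UnitOfWork()
register_customer(uow, "Ada")
register_customer(uow, "Bob")
uow.commit(conn)  # the one place where DB-oriented work happens

names = [row[0] for row in conn.execute("SELECT name FROM customer ORDER BY id")]
print(names)  # ['Ada', 'Bob']
```

Whatever controls the business transaction decides when, and against which connection, `commit` gets called.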
Re: Where do I see myself going
A highly normalized relational model (4+ normal form) which is used to retrieve denormalized sets is not likely to perform well (as the chance of a high number of joins in most queries is significant), no matter what query system you're using. I get the feeling that what Geoff ran into is partly caused by reporting requirements (which often require denormalized sets of (aggregated) data), partly by inheritance hierarchies (not mentioned, but judging by the number of unexpected joins I think this is the case) and partly by poorly designed relational models.
None of those are solved magically if you use SQL instead of HQL or whatever query language you're using in an ORM. Not only is 'SQL' a query language and not a query system, it also doesn't make the core problems go away. Well, perhaps the inheritance one as you can't have inheritance in SQL, but then again, you're not forced to use inheritance in your entity model either.
He says:
By moving away from thinking of the objects in my application as something to be stored in a database (the raison d'être for ORMs) and instead thinking of the database as a (large and complex) data type, I've found working with a database from an application to be much simpler.
Here Geoff clearly illustrates a misconception about ORMs: they're not there to persist object graphs into some magic box in the corner, they're a system to move entity instances (==data) across the gap between two projections of the same abstract entity model. It's no surprise it turns out to be much simpler if you see your DB as part of your application, because it is part of your application. If we ignore the difference in level of abstraction, talking to a DB through a REST service is the same as talking to a DB through an ORM which provides you with data: both times you go through an API to work with the entity instances on the other side. The REST service isn't a bucket you throw data in, and neither is the ORM.
Re: conclusion
SQL is a query language, not a query system. It's therefore not an alternative to the functionality provided by an ORM. ORMs make some things, namely the things they're built for, very easy. They make other things, namely the things they're not built for, hard. But the same can be said about any tool, including SQL (if we see a language as a tool): SQL is set oriented, and therefore imperative logic is hard to do, so one shouldn't do imperative logic in SQL. Blaming SQL for being crap in dealing with imperative logic doesn't make it so, it merely shows the person doing the blaming doesn't understand what SQL is meant to do and what it isn't meant to do.
In closing I'd like to note that what's ignored in the article is the optimized SQL ORMs generate with respect to e.g. updates and graph fetches (eager loading). Let alone the fact that to execute the SQL query and consume the results, one has to write a system which is the core of any ORM: the low-level query execution system and object materializer.
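To give an idea of what a graph fetch involves, here is one common eager loading strategy written out by hand in Python over `sqlite3`: one query per node in the graph, merged in memory, instead of one extra query per parent row. The schema, the classes and `fetch_customers_with_orders` are assumptions made up for the example, and real ORMs pick between several strategies (joins, subqueries, parameterized IN lists) depending on the situation.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    id: int
    total: float


@dataclass
class Customer:
    id: int
    name: str
    orders: list = field(default_factory=list)


def fetch_customers_with_orders(conn):
    """Load customers plus their orders in two queries, regardless of row count."""
    customers = {
        row[0]: Customer(row[0], row[1])
        for row in conn.execute("SELECT id, name FROM customer ORDER BY id")
    }
    if not customers:
        return []
    # One IN-list query for all children; SQLite caps the number of parameters,
    # so a real implementation would batch large key sets.
    placeholders = ", ".join("?" for _ in customers)
    sql = (
        "SELECT id, customer_id, total FROM orders "
        f"WHERE customer_id IN ({placeholders}) ORDER BY id"
    )
    for order_id, customer_id, total in conn.execute(sql, tuple(customers)):
        customers[customer_id].orders.append(Order(order_id, total))
    return list(customers.values())


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 1.0);
""")

graph = fetch_customers_with_orders(conn)
```

Write this for every relationship in your model, add change tracking for updates, and you've written a good part of an ORM.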
It always pains me to read an article like Geoff's about a long struggle with ORMs, as it's often based on a set of misconceptions about what ORMs do and what they don't do. This is partly to blame on some ORM developers (let's not name names) themselves, who try to sell the image that an ORM is a magic object graph persister and will turn your RDBMS into an object store. It's also partly to blame on the complexity of the systems themselves: you don't simply learn how to use all of the ORM features and quirks overnight.
And sadly, it's also partly to blame on the users, the developers using the ORMs, themselves. Suggesting a query language as the answer (and with that the tools that come with it) isn't going to solve anything: the root problem, working with relational data in an OO system, i.e. bridging the gap between class and table definition, still has to be solved, and using SQL and low-level systems to execute it will only move that problem onto your own plate, where you run the risk of re-inventing the wheel, albeit poorly.