Why use the Entity Framework? Yeah, why exactly?
Danny Simmons wrote a marketing piece about the project he's been working on for so long: "Why use the Entity Framework?". I don't expect Danny to be unbiased towards his own work, so at first I just ignored it: Microsoft produces these kinds of 'use our stuff, it's better than sliced bread' articles a dozen times a day. However, this particular article seems to have become a topic of discussion and is being endorsed by non-Microsoft people on other blogs, so it's time to stop ignoring it and start refuting its contents, despite it being marketing. After all, it doesn't look like marketing.
I've spent the last 6 years of my life on something called Object-Relational Mapping, so I think I can comment on Danny's claims a bit. Object-Relational Mapping, or O/R mapping, or 'ORM' (for the people who aren't aware that ORM is also the acronym of Object Role Modelling), can be implemented in a lot of ways, and it is always used to solve the mismatch between two projections of an abstract entity model: the projection onto a relational schema and the projection onto an object-oriented language. For more details, read my essay about this subject (don't let the title give you presumptions about the contents). As this is rather abstract, let's use an example: a very simple Order system.
When you're creating a system for a client, that system has to offer functionality which is usable in the reality the client lives in. In short, this means that the functionality and features of the system have to connect to what the client does, and have to help the client solve problems and overcome challenges s/he would otherwise run into without the system (otherwise, why bother using it, right?). To be successful in this, the system should work with elements which are recognizable in the reality of the client: if the client works with customers who file orders for products, the system should work with customers who file orders for products (this is oversimplified, but it's an example in a blogpost, not a book). But what are these 'Customer', 'Order' and 'Product' exactly?
When discussing the project with the client, the client will tell you how s/he sees 'Customer', 'Order' and 'Product' and how they relate to each other: which information elements they consist of, how long they'll live inside the reality of the client, who must alter them, etc. This information is abstract, i.e. it's not physically available to you. To get a deeper understanding of it, you'll create a model out of this information, in one way or another: the model contains the information obtained from the client and shows you the definitions of Customer, Order and Product and how they relate to each other, perhaps with other information as well if it benefits the model and its purpose. Now, the word 'Model' will make a lot of people think of Visio diagrams or some other kind of picture. But that doesn't have to be the case. It can be anything, as long as it represents exactly the information you obtained from the client (or, for DDD enthusiasts: the Domain Expert), so if you like to write long-winded Word documents, or write everything down in a text-based DSL, it's up to you.
As I also described in my essay linked above, you'll soon find out that a 'Customer', 'Order' and 'Product' in the reality of the client (and thus the system you're creating for this client!) are actually names for different groups of data elements. In other words: the instances of these three elements are tuples of data. If in your abstract model, the definitions of 'Customer', 'Order' and 'Product' are called entity definitions, the tuples are entity instances. So if I say: "ALFKI, Alfreds Futterkiste, Maria Anders, Sales Representative etc.", what does that mean? For the Northwind impaired: not much. For the people who recognize the first Customer record in Northwind's Customer table, it means: "It's a customer!". Well, almost. It's a Customer instance.
The system you're creating will deal with these entity instances in memory, but it also has to store them in a persistent storage (e.g. a database). If a database is used as persistent storage, these entity instances flow from memory to a database table (insert/update) and back (fetch). If we define the reality of the application to be the in-memory state inside the application, we can state that an entity instance should be the same data tuple, identified by the same identifying attribute (e.g. primary key, Id), whether we save it or fetch it: if we fetch the ALFKI customer instance from Northwind on Monday morning at 8 am, it has to be the same instance when we fetch it on Thursday afternoon at 4 pm (unless someone has deleted it in the meantime, of course). Some attributes (fields) might have changed value, but it is the same instance.
However, looking at the data tuple's contents, it's just a bunch of strings and other constants. So this data tuple only becomes an entity instance if there's a valid entity definition in the same space (e.g. memory, database, application) to give it context. In other words: if you want to be able to talk about Customer, Order and Product in the application as well as in the database, you have to have a definition of these entities available where you work with the data tuples.
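To make the identity rule concrete, here's a minimal sketch in Java, used as a close stand-in for C#. It assumes a hypothetical Customer class whose equality is based solely on the identifying attribute (CustomerID); the two objects stand in for the Monday and Thursday fetches, and the changed company name is an invented example value.

```java
import java.util.Objects;

public class EntityIdentityDemo {
    // Hypothetical in-memory Customer class. Identity is defined by the
    // identifying attribute (primary key), not by the other attribute values.
    static final class Customer {
        private final String customerId;   // identifying attribute, e.g. "ALFKI"
        private final String companyName;  // other attribute; may differ between fetches

        Customer(String customerId, String companyName) {
            this.customerId = Objects.requireNonNull(customerId);
            this.companyName = companyName;
        }

        String customerId() { return customerId; }
        String companyName() { return companyName; }

        // Same key means same entity instance, even if other fields differ.
        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Customer)) return false;
            return customerId.equals(((Customer) other).customerId);
        }

        @Override
        public int hashCode() { return customerId.hashCode(); }
    }

    public static void main(String[] args) {
        // Simulated fetches on Monday and Thursday; the new company name is invented.
        Customer monday = new Customer("ALFKI", "Alfreds Futterkiste");
        Customer thursday = new Customer("ALFKI", "Alfreds Futterkiste GmbH");
        System.out.println(monday.equals(thursday)); // true
    }
}
```

Basing equals/hashCode on the key alone is one common way to express "same instance, possibly changed values"; an O/R mapper may additionally hand out a single object per key within a context, which this sketch doesn't attempt.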
As we've already made the abstract entity model (in whatever form you like, e.g. a chalkboard drawing with food stamps, knock yourself out, as long as it represents exactly the information it should represent), why not use that information as the entity definitions we need to give the data tuples context? This is called projection: we project this information onto a different space (e.g. a programming language, a storage structure) and the result of that projection is the element we can use inside that space. The advantage of this is that the projection result isn't something that fell out of the sky: it is based on the result of analysis with the client or Domain Expert. As 'projection' sounds rather abstract, what is it exactly? Think of it as a transformation of a definition from one domain to another.
Take our abstract Customer entity definition. It's a definition of the attributes (fields/data elements) which together form the Customer: a unique ID, a company name, the contact person's name, the contact person's title, etc. If you see that entity definition, could you write a .NET class which represents it? I think you can. That's called a projection: you projected the abstract entity definition onto a .NET language. So a C#/VB.NET class which is the projection of an abstract entity is the projection result of that entity, and can be used inside that space (C#/VB.NET/.NET) as the representative of that abstract definition. This means that we can use Customer, Order and Product classes, which are the projections of these abstract entities onto, for example, C#/VB.NET, in our C#/VB.NET application to give a Customer, Order and Product instance (data tuple) meaning. If we load the same data into an instance of some random other class (or an object array, for example), you'll see just a bunch of constant values. But do they mean the same? You can interpret the data as if it's a Customer instance, but is that correct? In other words: you need the definition of the Customer to give the data meaning: create an instance of the projection result (the class) and store the data tuple (the customer entity instance) inside that class instance, so the data inside the class instance has meaning: it is a Customer instance.
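A minimal Java sketch of that projection, again using Java as a stand-in for C#/VB.NET: the record is the projection result of the abstract Customer definition, and the hypothetical fromTuple helper stores a raw data tuple (the first Northwind customer) inside an instance of it. The order of the values in the tuple is an assumption of this sketch.

```java
public class ProjectionDemo {
    // Projection result of the abstract Customer entity definition onto Java:
    // a unique ID, company name, contact person's name and title.
    record Customer(String customerId, String companyName,
                    String contactName, String contactTitle) {

        // Hypothetical helper: give a raw data tuple meaning by storing it in
        // an instance of the projection result. The value order is assumed.
        static Customer fromTuple(Object[] tuple) {
            if (tuple.length != 4) {
                throw new IllegalArgumentException("a Customer tuple has 4 values");
            }
            return new Customer((String) tuple[0], (String) tuple[1],
                                (String) tuple[2], (String) tuple[3]);
        }
    }

    public static void main(String[] args) {
        // Without a definition, this is just a bunch of constant values.
        Object[] tuple = {"ALFKI", "Alfreds Futterkiste", "Maria Anders", "Sales Representative"};
        // Stored in a Customer instance, the same values become a Customer.
        Customer customer = Customer.fromTuple(tuple);
        System.out.println(customer.contactName()); // Maria Anders
    }
}
```

The Object[] and the record hold identical values; only the record carries the definition that says which value is the contact's name.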
The same can be said about the persistent storage. Let's assume the data tuples are stored in a relational database. Because data tuples are just that, groups of constants, we need an entity definition to give them context, to give them meaning. In the relational model, these definitions are called tables, views and select queries (by Codd/Chen's definition, a query is also an entity). So if we project our abstract entity definitions onto the relational model, we get representing elements which we can use in that space. Some will pick the table as the form they want to work with, others will pick a view. Both share the same characteristic: they represent the abstract entity definition they're a projection of, and they give meaning to the instances of that entity.
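The same projection onto the relational model can be sketched in Java as a function that turns an abstract entity definition into a CREATE TABLE statement. The Attribute and EntityDefinition types are hypothetical, and the SQL Server-style column types are assumptions chosen to resemble Northwind's customer table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class RelationalProjection {
    // Hypothetical, tool-neutral description of one attribute of an abstract entity.
    record Attribute(String name, String sqlType, boolean identifying) {}

    // Hypothetical abstract entity definition: a name plus its attributes.
    record EntityDefinition(String name, List<Attribute> attributes) {}

    // Project the entity definition onto the relational model as a table definition.
    static String toCreateTable(EntityDefinition entity) {
        // One column per attribute; identifying attributes must not be NULL.
        List<String> lines = new ArrayList<>();
        for (Attribute a : entity.attributes()) {
            lines.add("    " + a.name() + " " + a.sqlType()
                      + (a.identifying() ? " NOT NULL" : ""));
        }
        // The identifying attributes become the primary key, if there are any.
        String keys = entity.attributes().stream()
            .filter(Attribute::identifying)
            .map(Attribute::name)
            .collect(Collectors.joining(", "));
        if (!keys.isEmpty()) {
            lines.add("    PRIMARY KEY (" + keys + ")");
        }
        return "CREATE TABLE " + entity.name() + " (\n"
             + String.join(",\n", lines) + "\n)";
    }

    public static void main(String[] args) {
        EntityDefinition customer = new EntityDefinition("Customer", List.of(
            new Attribute("CustomerID", "NCHAR(5)", true),
            new Attribute("CompanyName", "NVARCHAR(40)", false),
            new Attribute("ContactName", "NVARCHAR(30)", false),
            new Attribute("ContactTitle", "NVARCHAR(30)", false)));
        System.out.println(toCreateTable(customer));
    }
}
```

Feeding the same EntityDefinition to a class generator instead would give the other projection, which is the point: both sides derive from one abstract definition.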
For your Order system you need two projections of the same entity definitions: one onto a .NET language, and another one onto the relational model used in the relational database of choice (e.g. Oracle). As we've seen above, this results in Customer, Order and Product classes and Customer, Order and Product tables (or views, if you like views). Each projection result lives inside a reality with its own rules and boundaries: the .NET classes live in the OO world of .NET, the tables live in the world of relational schemas, algebra and set theory.
Earlier in this post we've seen that an entity instance (data tuple) has the same meaning in the application's space whatever you do: working with the Customer instance represented by the ID 'ALFKI' means you're working with that instance, not with some random instance but with that very instance. Inside the persistent storage, the instance is stored in a row which is defined by the table (or view) it is part of, i.e. the table definition (which happens to be the projection result of the Customer entity definition onto the relational model!). In memory, the instance is stored in an instance of the result of the projection of the same definition onto the .NET language of choice, i.e. the Customer class.
However, these two worlds don't live together in the same space: transferring an entity instance from its entity class instance (customer object) to the table row inside the database and back can be seen as a transformation: perhaps the entity is stored inside the database in two or more tables, perhaps the projection onto the .NET language resulted in multiple classes. For the application, however, they must look like they live together: the transformation between the two worlds should just be there, it should 'just work', so the developer writing the system for the client doesn't have to worry about it. This is the service provided by an O/R mapper: it makes sure that the entity instances can be transported to the persistent storage (where they're stored in instances of the projection result on the relational model, i.e. a table or view) and back, and that they keep the same meaning.
If a class definition C is the projection of an abstract entity E and a table definition T is also the projection of the same entity E, isn't it possible to project C out of T? Or T out of C? Given the rules and boundaries of the spaces T and C live in, and the projection rules of E onto C and T, one can define a projection from T to C and from C to T. This is what most of you are doing today: you pick an O/R mapper, you start with an abstract entity model in one form or another, use it to create either the tables or the classes, and tell the O/R mapper of choice to produce the other side. There are some variants on this, but it more or less comes down to this; in the ideal situation, where you start from scratch, the abstract entity model is used to produce both the tables and the classes. Some people will now argue that their .NET classes are very different from any table, and that the classes follow the application's needs, but frankly that's not true: the classes written in such an application haven't fallen out of the sky either. If a Customer, Order or Product class has to be created, how is it decided which fields are defined in these classes? Exactly: by projecting the abstract entity model.
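A minimal Java sketch of the transformation service described above, under loudly stated assumptions: the two tables "Customers" and "CustomerContacts" are hypothetical (Northwind keeps all these columns in one table), and rows are modelled as plain maps from column name to value rather than going through a real database API. What it shows is that the round trip from object to rows and back yields an equal Customer, i.e. the entity instance keeps its meaning.

```java
import java.util.HashMap;
import java.util.Map;

public class MiniMapper {
    // In-memory projection result of the Customer entity.
    record Customer(String customerId, String companyName,
                    String contactName, String contactTitle) {}

    // Hypothetical relational projection: the entity is split over two tables,
    // "Customers" and "CustomerContacts", both keyed by CustomerID.
    static Map<String, Map<String, Object>> toRows(Customer c) {
        Map<String, Object> customers = new HashMap<>();
        customers.put("CustomerID", c.customerId());
        customers.put("CompanyName", c.companyName());

        Map<String, Object> contacts = new HashMap<>();
        contacts.put("CustomerID", c.customerId());
        contacts.put("ContactName", c.contactName());
        contacts.put("ContactTitle", c.contactTitle());

        Map<String, Map<String, Object>> rows = new HashMap<>();
        rows.put("Customers", customers);
        rows.put("CustomerContacts", contacts);
        return rows;
    }

    // Reverse transformation: rebuild the object from the two rows.
    static Customer fromRows(Map<String, Map<String, Object>> rows) {
        Map<String, Object> customers = rows.get("Customers");
        Map<String, Object> contacts = rows.get("CustomerContacts");
        if (!customers.get("CustomerID").equals(contacts.get("CustomerID"))) {
            throw new IllegalStateException("rows belong to different entity instances");
        }
        return new Customer((String) customers.get("CustomerID"),
                            (String) customers.get("CompanyName"),
                            (String) contacts.get("ContactName"),
                            (String) contacts.get("ContactTitle"));
    }

    public static void main(String[] args) {
        Customer alfki = new Customer("ALFKI", "Alfreds Futterkiste",
                                      "Maria Anders", "Sales Representative");
        Customer roundTripped = fromRows(toRows(alfki));
        System.out.println(alfki.equals(roundTripped)); // true
    }
}
```

A real O/R mapper adds SQL generation, change tracking and so on, but the core contract is this pair of transformations.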
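Projecting C out of T can be sketched as generating a class declaration from a table definition. The Java below is a hypothetical illustration, not how any particular O/R mapper does it: the Column type, the deliberately tiny SQL-to-Java type mapping and the naming rule are all assumptions, and the result is a Java record declaration emitted as text.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TableToClass {
    // Hypothetical description of a table column, as read from a schema.
    record Column(String name, String sqlType) {}

    // Assumed, deliberately tiny SQL-to-Java type mapping for this sketch.
    static String javaType(String sqlType) {
        String base = sqlType.toUpperCase().replaceAll("\\(.*\\)", "");
        switch (base) {
            case "NCHAR": case "NVARCHAR": case "CHAR": case "VARCHAR": return "String";
            case "INT": return "int";
            case "BIT": return "boolean";
            case "MONEY": case "DECIMAL": return "java.math.BigDecimal";
            case "DATETIME": return "java.time.LocalDateTime";
            default: throw new IllegalArgumentException("unmapped type: " + sqlType);
        }
    }

    // Lower-case the first character: CompanyName -> companyName.
    static String fieldName(String column) {
        return Character.toLowerCase(column.charAt(0)) + column.substring(1);
    }

    // Project table definition T onto class definition C (as Java source text).
    static String toRecordSource(String className, List<Column> columns) {
        String components = columns.stream()
            .map(c -> javaType(c.sqlType()) + " " + fieldName(c.name()))
            .collect(Collectors.joining(", "));
        return "record " + className + "(" + components + ") {}";
    }

    public static void main(String[] args) {
        List<Column> orders = List.of(
            new Column("OrderID", "INT"),
            new Column("CustomerID", "NCHAR(5)"),
            new Column("OrderDate", "DATETIME"),
            new Column("Freight", "MONEY"));
        System.out.println(toRecordSource("Order", orders));
        // record Order(int orderID, String customerID, java.time.LocalDateTime orderDate, java.math.BigDecimal freight) {}
    }
}
```

The reverse direction (C to T) follows the same pattern with the mapping inverted; either way, the fields of the class are determined by the entity definition, not invented independently.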
The Entity Framework is just an O/R mapper
Now that we understand how entity definitions, entity classes, tables, views and entity instances relate to each other, that there's a need to make them all work together, and that this need is fulfilled by the service provided by an O/R mapper, the question arises: why does Danny Simmons argue that the Entity Framework is so much more than an O/R mapper? What does it provide beyond the service an O/R mapper provides? After all, isn't what we needed, namely making the two projection results work together, already provided by an O/R mapper? Why is a system needed which apparently can do more (whatever that is)?
The truth is: you don't need more. What you need is an O/R mapper which provides a rich service to make classes work together with tables to make sure your entity instances are the same whatever you do and you are able to work with entity instances without worries, and above all: which fits how you want to work. As discussed above: if you want to start with the abstract entity model, or with a projection result, or with two projection results, that's up to you: pick the O/R mapper which fits the way you want to work, the way you deal with the abstract entity definitions. Because that's its purpose: providing the service to make the two spaces work together as if they're one space, so you only have to worry about the functionality you have to write for your client. Everything else is overhead, plumbing, and above all: your client pays you to build functionality for the sole benefit of that client, s/he doesn't pay you to build overhead / plumbing, if that's already available.
So what does Danny say about the Entity Framework and why does that make the article marketing? Let's quote a couple of snippets:
The big difference between the EF and nHibernate is around the Entity Data Model (EDM) and the long-term vision for the data platform we are building around it. The EF was specifically structured to separate the process of mapping queries/shaping results from building objects and tracking changes. This makes it easier to create a conceptual model which is how you want to think about your data and then reuse that conceptual model for a number of other services besides just building objects.
First this one. I'm not going into detail about the comparisons between the Entity Framework and ADO.NET (apples vs. oranges), nor between the Entity Framework and Linq to SQL ('We deliberately limit framework B and compare our other framework A with it to make A look good!'). Danny uses NHibernate as the example for comparing the Entity Framework with a 3rd party O/R mapper. I'm not sure why he uses NHibernate in particular; perhaps because Microsoft thinks it's an open source framework which isn't owned by any company (so bashing it in a comparison matrix doesn't run the risk of a lawsuit). It is owned by a company, though (JBoss Inc., which is owned by Red Hat), but that's not important. The important bit is in that last sentence.
I've tried to explain briefly what the purpose of an O/R mapper is, and why it is needed in the first place. For example, an application which is written entirely inside the same space as the tables doesn't need an O/R mapper, because the projection result onto the relational schema (the table/view definition) is reusable to give meaning to an entity instance used in that application. An application which uses an ODBMS (object database) also doesn't need an O/R mapper, as the projection of the entity definitions onto the OO language is usable inside the persistent storage as well. Danny now paints the picture that the Entity Framework makes it easier to create a 'conceptual model which is how you want to think about your data'. But, Danny, isn't the way we want to think about our data already defined in our abstract entity model? You know, the model which is used as the projection source for our projections onto a .NET language and onto the relational model? If I use that abstract entity model to produce classes in the .NET space, and use an O/R mapper to make sure that what's inside the instances of these classes represents the entity instances I work with, so the data tuples inside these class instances have meaning, what else is there? Isn't that exactly what I need to create an application which can work with these entity instances?
Danny hints in that last sentence that there are other services besides building objects in which the Entity Framework can assist. But... are these services only usable in your application, with the data tuples having the same meaning in these services as in your application, if the Entity Framework is used? Or is another O/R mapper, say LLBLGen Pro or NHibernate, also usable for that? Of course these other O/R mappers are suitable for that too: they're transformation services which make it possible to work with entity instances in your .NET language. If you want, for example, your Astoria (ADO.NET Data Services) service to work with another O/R mapper, you can, simply because there's no conceptual element required for these services which isn't available in the group of mature O/R mappers out there. In fact, it's the Entity Framework which lacks some conceptual elements when you compare it to mature O/R mapper frameworks like LLBLGen Pro: distributed scenarios, for example, or multi-database models.
Danny's last sentence is worth quoting:
So the differentiator is not that the EF supports more flexible mapping than nHibernate or something like that, it's that the EF is not just an ORM--it's the first step in a much larger vision of an entity-aware data platform.
I don't get this: how can a person like Danny Simmons, who has worked on the Entity Framework for so long, ignore the fact that every O/R mapper is about entity awareness? What's described in that last sentence is exactly an O/R mapper's sole purpose: it's there so that the developer can work with entity instances in the OO language and store these instances in a non-OO environment like a relational database, and vice versa. What larger vision is there to have, if all there is is the abstract entity model and its projections? Tooling perhaps? To make it easier for the developer to create these projections and to position the O/R mapper service in the application code?
If that's so, then why is the Entity Framework designer such a pain to use? And why does it lack true maintenance features, like true projection maintenance of T to C, for example? After all, the core point of the Entity Framework seems to be that a conceptual model can be defined on top of a set of table/view definitions. But if these table/view definitions change, who's doing the maintenance? Isn't that the purpose of the Entity Framework, as the developer works with the abstraction created in the form of the conceptual model, on top of these tables/views? Then why doesn't the tooling for the Entity Framework take care of it? If I can create a designer which can find inheritance hierarchies automatically, which can deal with thousands of entities, which does maintain your model after the table definitions have changed, which can actually deal with UDT types written in C#, and which can deal with multiple catalogs/schemas in a single model, why can't MS? Taking that into account, what's left of that much larger vision if we look at the bits today, and at what Danny said the Entity Framework apparently is: not just an O/R mapper, but part of a much larger vision?
I'm sure the Entity Framework is built by competent people, who are very smart and know what O/R mapping is. What I'm not sure about is whether these competent people actually have windows in their building, whether they have an internet connection, and whether someone has told them it's no longer 1994, that times have changed and that there are numerous people out there who have solved the same problem they've been trying to solve for so long. That these people now have mature solutions at hand, which match what developers need. That these mature solutions solve the exact same problem. That these mature solutions don't need a corporate spin to make the real problem (the mismatch between two projections) look like some kind of different 'problem' which can only be solved by the product sold by that same corporation.
So for the people who echo the Microsoft spin: you too should look further than the shiny brochure on your desk. There's no 'bigger vision', as there's nothing bigger to envision. If there's a reason the Entity Framework can only work with technology ABC, the reason is artificial: the bottom line is that there are just entity definitions, entity instances and projections of entity definitions to store the entity instances. Nothing more: just data, the definitions to give that data context and meaning, and a service to make that happen.