Concurrency Control Methods. Is there a silver bullet?
Which concurrency control method do you use most of the time? "First Save Wins" ? (optimistic locking in ADO.NET) "Last Save Wins" ? (Overwrite the row, no matter what). Ever wondered what the difference between the two is when it comes to efficiency? Most people haven't and think "Last Save Wins" is BadTM and "First Save Wins" is GoodTM. But both make at least one person loose his work to preserve the work of another person. I read a thread today in the microsoft.public.dotnet.framework.adonet newsgroup where one person asked how he could make the SQL generator in Visual Studio.NET to use the Last Save Wins method and another person stepped in and bashed him for using a not very smart concurrency control method like Last Save Wins. But does it matter which one you pick when both are as inefficient as you can possibly make them to be (i.e. someone looses work) ?
What is a concurrency control method?
A concurrency control method is a method to regulate multiple requests (transactions if you will) for modification of a shared resource, for example a source file in a source control system or a table row in a database. Several different methods are known and each have their pro's and con's. Not all methods are usable in all areas of software development: the concurrency control methods often used in source control systems (if conflict then merge) is not usable most of the time in a database world. However, when there are more than a few methods, which one is the best? To answer that question, we first have to define what 'best' means in this context. Let's see why we should think about using a concurrency control method in the first place to formulate our answer.
Most developers just write the software they have to write and focus on a single user using the software, interacting f.e. with a shared resource, probably a database. When multiple users reference and modify data in a shared resource, the resource manager (f.e. SqlServer or Oracle) should take care of the issues related to the multiple transactions accessing the same table or row, or at least that's how a lot of developers think. In a way this is understandable because the developer is a user of the resource manager, and the resource manager should have that kind of functionality, right? Well, in a way it should but if it does have it, do you want the functionality in the way it is provided to you?
A simple example of a common problem: Joe and Jack want to modify the same row in a database. They do that using the same tool. Joe starts the functionality to edit the specific row and starts working on it. Jack opens the functionality a little later, but because he's the quicker thinker he beats Joe in completing the modifications and saves his work to the database first. When Jack is done editing the specific row, Joe finally completes his modifications of the same row and saves is work too.
Now... what should happen?
- Joe should be able to overwrite Jack's data. Jack looses his work. (Last Save Wins)
- Jack's work should be preserved, Joe shouldn't be able to save his work and thus looses his work. (First Save Wins)
Besides the emotional stress this can bring (this is no joke, there are applications who have screens which require 10 or more minutes to complete and at the end the user gets a notification that it was all for nothing because he can't save the work due to a nice concurrency control mechanism), the time spend by Joe on the useless modifications can be spend better. Both methods are inefficient: always at least one person looses time and work. Other known methods, like 2-phase locking (2PL / Strict 2PL), Multi-version-concurrency-control (MVCC in Oracle), other database-locking variants, they all cause the loss of work at the end of a workflow cycle, when the work performed is stored.
So there is no best method?
If you focus on the resources in the database and use low level methods to prevent multiple changes on a shared resource on that low level in the application stack, there is no best method: all are causing the loss of work in some form. The only true method that works is the method of good organization and an application which helps you with that, sometimes called Functionality Locking. In the example above, Joe and Jack are both modifying the same row. If you didn't wonder "Why?", start this article at the top and re-read it. If you did, you are on the right track. You see, the less concurrency you create in your application, the less problems it will bring. If Jack, who starts later with the same functionality, isn't able to start the editor for editing the same datarow, because Joe is already busy with the row, he can do something else with his time, because the modifications he has to make will or be lost or will cause that someone else looses his work.
It would be better if Jack then would be notified that a user "Joe" was already working on that row, which then would enlighten an inefficient way of working in the organization, which is fixed by transferring the work Jack had planned for that datarow to Joe, because he's working on that row already. This can be implemented with a common gateway, today we call them webservice. A common gateway to your shared functionality among users can control functionality locking for one application and also for more applications who want to modify the same shared resource.
So, every time you write code to pop up a window to notify the user he can't save his work because another user has already modified the rows he tries to overwrite, ask yourself this: "wouldn't it be better to notify the user before he started working on that rows?". Software is not created to keep us busy (ok, some is ;)), it is created (or better: should be created) for making processes more efficient and thus organizations which have a lot of such processes should perform more efficient. Software which tosses away even a fraction of the work done by a single employee because some other employee has done the same already, is inefficient and should be replaced, since it doesn't help the processes be more efficient, on the contrary, software should help the organization with the implementation of more efficient ways to perform the processes encountered in the organization.