Saturday, November 6, 2010

Concurrency for Persistent Storage

I have been thinking about concurrent updates to relational databases and a strategy for dealing with them. It is hardly the hottest topic on earth, but until I worked out what to do, I simply did not know the solution. I am documenting it here in the hope of referring back to it in the future. If it also helps anyone else, I will be glad.
A few words about concurrency itself will put the following discussion in perspective. Any object that is visible to multiple threads can be read or written by any of them, and that is where the problem of concurrency comes from. A particular thread may 'think' that it has updated the value of a String variable from "a" to "b" when in reality it has updated it from "c" to "b". Consider the events below.


1. Thread 1 reads the value of the String variable name to be "Joe".
2. Thread 2 also reads the value of the String variable name to be "Joe".
3. Threads 1 and 2 decide to update the value to "Jack" and "Jill" respectively.
4. Thread 2 gets its chance and changes name to "Jill".
5. Thread 1 (having no knowledge of Thread 2's existence) changes name to "Jack".
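
The five steps above can be replayed in a single thread, which fixes the interleaving and makes the lost update easy to see. This is only a sketch: the class LostUpdateDemo and its log are made up for illustration, and the two "threads" are simulated with plain local variables rather than real threads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; replays the Joe/Jack/Jill steps in a fixed order.
final class LostUpdateDemo {
    private String name = "Joe";
    private final List<String> log = new ArrayList<>();

    // Blind write: records what the writer believed versus what really happened.
    private void write(String who, String believedOld, String newName) {
        log.add(who + " believed " + believedOld + " -> " + newName
                + ", actually " + name + " -> " + newName);
        name = newName;
    }

    static List<String> replay() {
        LostUpdateDemo shared = new LostUpdateDemo();
        String seenByThread1 = shared.name;              // step 1: Thread 1 reads "Joe"
        String seenByThread2 = shared.name;              // step 2: Thread 2 reads "Joe"
        // step 3: each thread has decided on its new value
        shared.write("Thread 2", seenByThread2, "Jill"); // step 4: Thread 2 writes first
        shared.write("Thread 1", seenByThread1, "Jack"); // step 5: Thread 1 overwrites blindly
        return shared.log;
    }
}
```

The second log entry shows Thread 1 believing it changed "Joe" to "Jack" when it really replaced "Jill", so Thread 2's update is silently lost.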


In my real-life applications I would like Thread 1 to be told at step 5 something like "sorry mate, that name has changed; please go and fetch the new value and then come back and update it". The problem begins to fade if the threads are required to acquire the intrinsic lock on the object that holds name. Note that merely acquiring the lock does not make the problem go away completely. To fix it, a thread must acquire the lock, read the current value, verify that it is the same as the value it read before acquiring the lock, update it to the new desired value and release the lock. In code that looks like this.


public void setName(String oldName, String newName) {
    synchronized (this) {
        String currentName = getName(); // Database equivalent of "Select... for Update"
        if (currentName.equals(oldName)) { // verify name has not been changed since our last read
            name = newName; // only now are we ready to make an exclusive update
        } else {
            throw new NameAlreadyChangedException();
        }
    }
}
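
To see the check in action, here is a minimal self-contained sketch that wraps the setName method above in a holder class and replays the Joe/Jack/Jill scenario. The names NameHolder and NameHolderDemo are made up for illustration, and NameAlreadyChangedException is assumed to be a simple unchecked exception, since its definition is not shown above.

```java
// Assumed shape of the exception; only its name appears in the post.
class NameAlreadyChangedException extends RuntimeException {
    NameAlreadyChangedException(String message) {
        super(message);
    }
}

// Hypothetical holder class around the compare-then-set logic.
class NameHolder {
    private String name;

    NameHolder(String initialName) {
        this.name = initialName;
    }

    synchronized String getName() {
        return name;
    }

    // Updates the name only if it still equals the value the caller last read.
    synchronized void setName(String oldName, String newName) {
        if (!name.equals(oldName)) {
            throw new NameAlreadyChangedException(
                    "expected \"" + oldName + "\" but found \"" + name + "\"");
        }
        name = newName;
    }
}

class NameHolderDemo {
    // Returns true if Thread 1's stale write is rejected and "Jill" survives.
    static boolean staleWriteRejected() {
        NameHolder holder = new NameHolder("Joe");
        String readByThread1 = holder.getName();
        String readByThread2 = holder.getName();
        holder.setName(readByThread2, "Jill");      // Thread 2 gets there first
        try {
            holder.setName(readByThread1, "Jack");  // Thread 1 still thinks it is "Joe"
            return false;
        } catch (NameAlreadyChangedException e) {
            return "Jill".equals(holder.getName());
        }
    }
}
```

Thread 1 now gets the "sorry mate" treatment instead of silently overwriting Thread 2's change.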


OK, problem solved. But wait: the name field lives in persistent storage, and locking a Java object does not mean that no one else can update that row in the store. That is why most good databases let the user lock a record using "select for update", generally referred to as the 'last select'. That works for some (maybe most, if not all) relational databases, but I am not ready to make any assumption about the type of store, relational or otherwise. I am looking for a strategy, not a specific solution.


In my test environment I am using Spring, Spring's JpaTemplate and MySQL, and I particularly like the approach of optimistic locking using version attributes. So here is my Entity class.

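import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Person {
    @Id
    private Long id; // assumed: the repository code calls getId(), and JPA requires an identifier

    @Version
    @Column(name = "VERSION", nullable = false)
    private long version; // managed by the JPA provider

    @Column(name = "NAME")
    private String name;

    public Long getId() {
        return id;
    }

    public long getVersion() {
        return version;
    }

    // private on purpose: application code should never set the version
    private void setVersion(long version) {
        this.version = version;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}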


JPA provides this @Version annotation, and the value of the field is managed automatically (i.e. incremented on every update). Very nice. So my original if(currentName.equals(oldName)) translates to something like if(pojo.getVersion() == persistedPojo.getVersion()).
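
Since I am after a strategy rather than a specific store, here is a minimal store-agnostic sketch of the same version check. It is not how JPA is implemented (a JPA provider typically puts the version into the WHERE clause of its UPDATE statement); the classes Row and VersionedStore are made up for illustration, and an in-memory map stands in for the table.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable row: a name plus the version it was stored with.
final class Row {
    final String name;
    final long version;

    Row(String name, long version) {
        this.name = name;
        this.version = version;
    }
}

// Hypothetical in-memory store standing in for the database table.
final class VersionedStore {
    private final ConcurrentMap<Long, Row> rows = new ConcurrentHashMap<>();

    void insert(long id, String name) {
        rows.put(id, new Row(name, 0L));
    }

    Row find(long id) {
        return rows.get(id);
    }

    // Writes newName only if the stored version still equals expectedVersion,
    // incrementing the version on success. Returns false for a stale caller.
    boolean update(long id, String newName, long expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        // replace(key, old, new) succeeds only if the mapping is still `current`,
        // so the version check and the write behave as one atomic step.
        return rows.replace(id, current, new Row(newName, expectedVersion + 1));
    }
}
```

Two callers that both read version 0 can both call update, but only the first one succeeds; the second gets false and has to re-read the row.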
And I can finish off my Repository (aka DAO) code in style like so.

@Repository
public class PersonRepository extends JpaTemplate // not exactly so in real code
...
// pojo.setName() has been called elsewhere
public Person update(Person pojo) {
    synchronized (pojo) {
        // re-read the row to get the latest version number from the store
        Person persistedPojo = super.find(Person.class, pojo.getId());
        if (pojo.getVersion() == persistedPojo.getVersion()) {
            return super.merge(pojo); // JpaTemplate writes detached state via merge
        } else {
            throw new OptimisticLockingFailureException();
        }
    }
}

Beginning with the commented line, I can let hundreds of threads call the setName(...) method totally 'un-thread-safely'; only one is going to succeed in committing its data to the persistence layer anyway. The synchronized(pojo) line acquires the intrinsic lock, so threads sharing the same pojo instance execute the remaining lines one at a time. The find call then reads the current row (hence the latest version number) from the store, and the update happens only if the version check in the if condition passes. The point I want to make is that multiple threads can still get inside the synchronized block (after the first one relinquishes the lock), but they will all fail the version test and hence face the OptimisticLockingFailureException, which is an unchecked exception defined by my application.
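
What a failed caller does next is the "go and fetch the new value and then come back" part. Here is a minimal sketch of such a retry loop, under my own assumptions: VersionedCell, Snapshot and StaleVersionException are hypothetical stand-ins for the store, the entity and the application's exception, and a single in-memory cell replaces the repository.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the application's unchecked exception.
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

// Hypothetical single-row "store": a name and its version, swapped atomically.
final class VersionedCell {
    static final class Snapshot {
        final String name;
        final long version;

        Snapshot(String name, long version) {
            this.name = name;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> state;

    VersionedCell(String name) {
        state = new AtomicReference<>(new Snapshot(name, 0L));
    }

    Snapshot read() {
        return state.get();
    }

    // Throws StaleVersionException if `seen` is no longer the current snapshot.
    void write(Snapshot seen, String newName) {
        Snapshot next = new Snapshot(newName, seen.version + 1);
        if (!state.compareAndSet(seen, next)) {
            throw new StaleVersionException("version " + seen.version + " is stale");
        }
    }

    // Re-reads and re-applies `change` until a write succeeds or attempts run out.
    String updateWithRetry(UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            Snapshot seen = read();                   // fetch the latest value
            String wanted = change.apply(seen.name);  // recompute from fresh data
            try {
                write(seen, wanted);
                return wanted;
            } catch (StaleVersionException e) {
                if (attempt >= maxAttempts) {
                    throw e;                          // give up, let the caller decide
                }
            }
        }
    }
}
```

Whether retrying automatically is appropriate depends on the application; when the new value was typed in by a user who saw the old one, it may be better to show the conflict than to retry behind their back.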