Partial Update and Immutable Fields Update Dilemma

One of the recurring problems I have encountered in creating services (be they REST, SOAP or Hibernate) is how to handle partial updates. A partial update is the ability to update a subset of the fields (subelements or attributes) exposed by an object. If the representation exposes resource/object references, then the problem gets even more complex; this differential object-graph update is the topic of another blog post.

Some discussions of the problem, such as Joe Gregorio's How To Do RESTful Partial Updates, advocate breaking up the resource into multiple subresources that can be manipulated individually. See Subbu’s PATCH: It is the Diff for another RESTful point of view. Although my discussion will be REST-oriented, the intent is more general in scope and also aims to encompass SOAP services. The goal is to capture the commonality of the partial update problem in different contexts.

Here’s a simplified object domain model to illustrate the issue. All objects extend BaseObject – to avoid visual clutter I have not drawn the derivation connector. Note that BaseObject’s id and version fields and VideoAsset’s workflowState field are read-only fields.

Below is a full representation of the resource that you would obtain from a GET.

<videoAsset xmlns="http://www.myvideo.com/services/rest/v1">
  <id>1066</id>
  <version>1</version>
  <title>Battle of Hastings</title>
  <originalFileKey>Battle-of-Hastings</originalFileKey>
  <longDescription>Battle of Hastings</longDescription>
  <shortDescription>Battle of Hastings</shortDescription>
  <videoSeconds>1000</videoSeconds>
  <workflowState>NOT_DISTRIBUTED</workflowState>
  <genres>
    <link href="History" kind="name"/>
    <link href="Military" kind="name"/>
  </genres>
  <medias>
    <media>
      <version>2</version>
      <uri>http://www.myvideo.com/media/hastings.flv</uri>
      <profile href="defaultFlv" kind="name" type="VIDEO"/>
    </media>
    <media>
      <version>2</version>
      <uri>http://www.myvideo.com/media/hastings.jpg</uri>
      <profile href="defaultThumbnail" kind="name" type="IMAGE"/>
    </media>
  </medias>
</videoAsset>

Now let’s say you want to modify (PUT) just one field – title for example. Are you required to include all the fields of the videoAsset element or can you just specify the new value?

<videoAsset xmlns="http://www.myvideo.com/services/rest/v1">
  <title>Battle of Hastings in England</title>
</videoAsset>

The problem lies in the semantics your service associates with absent fields. If your parser finds a value for title – all is fine. But what about missing fields such as longDescription? We potentially have three cases:

  • Do not set the value of longDescription
  • Set longDescription to empty string
  • Set longDescription to null

In the above example, how do we interpret the missing fields of videoAsset? Does the fact that longDescription is missing mean that we wish to set its database value to null or not do anything at all?

In the example below, we explicitly include an empty longDescription element which the parser will pick up as an empty string. We can therefore unambiguously set its database value to an empty string.

<videoAsset xmlns="http://www.myvideo.com/services/rest/v1">
  <title>Battle of Hastings in England</title>
  <longDescription></longDescription>
</videoAsset>
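
To make the absent-versus-empty distinction concrete, here is a minimal sketch in Python using the standard library's ElementTree. The helper name present_fields is my own invention for illustration, and the sketch assumes that only simple (leaf) child elements are candidates for update.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.myvideo.com/services/rest/v1}"

def present_fields(xml_text):
    """Return {field_name: text} for the simple child elements actually present.

    An element that is present but empty maps to "" (empty string);
    an element that is missing simply has no key in the result.
    """
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        if len(child) == 0:  # skip nested structures such as genres/medias
            name = child.tag.replace(NS, "", 1)
            fields[name] = child.text or ""
    return fields

payload = """
<videoAsset xmlns="http://www.myvideo.com/services/rest/v1">
  <title>Battle of Hastings in England</title>
  <longDescription></longDescription>
</videoAsset>
"""

fields = present_fields(payload)
```

With this payload, fields contains title and an empty-string longDescription, while shortDescription has no key at all. The service still has to decide what a missing key means; the parser can only tell you that it is missing.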

You can resolve this problem by using the XSD nillable constraint. But it is not always apparent which fields should be nillable, and this can lead to unnecessary use of nullable columns in the database. Of course, this only works if you use XSD. If your XML payloads have no XSD schema, as is the case with Atom, this doesn't work as neatly. Offhand, I’m not sure how RELAX NG or Schematron address this issue.

<videoAsset xmlns="http://www.myvideo.com/services/rest/v1">
  <title>New Value</title>
  <longDescription></longDescription>
</videoAsset>
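
For elements declared nillable in the schema, an instance document can mark an explicit null with the xsi:nil attribute. The sketch below shows one way the three cases (absent, null, value) could be told apart on the receiving side. It is only an illustration: the helper field_value is hypothetical, the payload assumes longDescription has been declared nillable, and ElementTree itself performs no schema validation.

```python
import xml.etree.ElementTree as ET

NS = {"v": "http://www.myvideo.com/services/rest/v1"}
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

def field_value(root, name):
    """Classify a child element as ('absent', None), ('null', None) or ('value', text)."""
    el = root.find("v:" + name, NS)
    if el is None:
        return ("absent", None)           # element not in the payload at all
    if el.get(XSI_NIL) in ("true", "1"):  # xsi:nil is an xs:boolean
        return ("null", None)
    return ("value", el.text or "")       # present; empty element means ""

payload = """
<videoAsset xmlns="http://www.myvideo.com/services/rest/v1"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <title>New Value</title>
  <longDescription xsi:nil="true"/>
  <shortDescription></shortDescription>
</videoAsset>
"""

root = ET.fromstring(payload)
title = field_value(root, "title")
long_description = field_value(root, "longDescription")
short_description = field_value(root, "shortDescription")
video_seconds = field_value(root, "videoSeconds")
```

Here longDescription is classified as null, shortDescription as an empty-string value, and videoSeconds as absent, so each of the three cases gets its own unambiguous encoding.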

A major complicating factor is the difficulty that ORM frameworks have with partial updates in general. Are the semantics for missing fields in an update merge or overwrite? The irony is that both SQL and XML can easily handle granular updates – in SQL you just update the required columns and in XML you just parse the elements. It is the Hibernate/Java middle tier that introduces “accidental” complexity for this use case. Since Hibernate is “object-oriented” it updates by objects not fields. The typical solution is to “merge” the new value with the existing object, but this requires the system to read in the entire object and then persist it with the new values – hardly an ideal case. Under any load this scenario would be found wanting – both for data overhead and data consistency.
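
The difference between merge and overwrite semantics can be sketched without any ORM at all. The following Python is purely illustrative: the trimmed-down VideoAsset class and the merge and overwrite helpers are my own assumptions, not how Hibernate implements anything.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class VideoAsset:
    id: int
    version: int
    title: Optional[str]
    longDescription: Optional[str]

def merge(existing, changes):
    """Merge semantics: fields absent from `changes` keep their stored values."""
    return replace(existing, **changes)

def overwrite(existing, changes):
    """Overwrite semantics: mutable fields absent from `changes` are reset to None."""
    return VideoAsset(
        id=existing.id,
        version=existing.version,
        title=changes.get("title"),
        longDescription=changes.get("longDescription"),
    )

# Both approaches need the full stored object before they can persist anything.
stored = VideoAsset(1066, 1, "Battle of Hastings", "Battle of Hastings")
changes = {"title": "Battle of Hastings in England"}

merged = merge(stored, changes)
overwritten = overwrite(stored, changes)
```

After the merge, longDescription still holds its stored value; after the overwrite it is None. Either way the whole object had to be loaded first, which is exactly the overhead described above.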

Imagine if we could just extract the title from the XML payload and just directly create a SQL update statement:

update videoAsset va set va.title='New Value' where va.id=1066
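
As a rough sketch of that idea, the Python below builds a parameterized UPDATE from only the fields found in the payload and runs it against an in-memory SQLite database. The table layout, the MUTABLE_COLUMNS whitelist and the build_update helper are assumptions made for this example; column names come only from the whitelist and values are bound as parameters, never spliced into the SQL.

```python
import sqlite3

# Assumption: only these columns may be changed by a client.
MUTABLE_COLUMNS = {"title", "longDescription", "shortDescription", "videoSeconds"}

def build_update(asset_id, fields):
    """Build a parameterized UPDATE touching only the supplied mutable columns."""
    columns = [name for name in fields if name in MUTABLE_COLUMNS]
    if not columns:
        raise ValueError("no updatable fields supplied")
    assignments = ", ".join(f"{name} = ?" for name in columns)
    sql = f"UPDATE videoAsset SET {assignments} WHERE id = ?"
    params = [fields[name] for name in columns] + [asset_id]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videoAsset (id INTEGER PRIMARY KEY, title TEXT, longDescription TEXT)"
)
conn.execute(
    "INSERT INTO videoAsset VALUES (1066, 'Battle of Hastings', 'Battle of Hastings')"
)

sql, params = build_update(1066, {"title": "New Value"})
conn.execute(sql, params)
row = conn.execute(
    "SELECT title, longDescription FROM videoAsset WHERE id = 1066"
).fetchone()
```

Only the title column is touched; longDescription keeps its stored value without ever being read into memory.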

Of course, the ORM framework provides a host of other benefits, but the point here is to understand the limitations and new accidental complexities introduced by the framework. Alas, as always it ends up as an issue of balancing competing concerns.

Mutable Operations with Immutable Fields

Another related problem is the presence of immutable fields in the inbound payloads of mutable operations such as POST and PUT.

If you are using an XSD schema (as in SOAP), much depends on how you define the optionality constraints for your elements (minOccurs and maxOccurs). Some fields can naturally be optional (the descriptions, for example) but others such as title and originalFileKey are always required for a videoAsset.

The principal problem here is if you are using the same XML representation for multiple use cases (GET, PUT and POST). For simplicity’s sake this is often the case, but to accurately represent each particular use case you would ideally want to use a different representation. A GET-oriented XSD type (VideoAssetGet) would represent a view of the object where all optionality constraints were accurately modeled for retrieval. A POST (VideoAssetCreate) type would only expose mutable fields and would not contain read-only fields such as id, version (derived from BaseObject) or workflowState. It makes no sense to specify these fields for either a POST or PUT since they are immutable. This can get quite messy for large objects that have a complex mix of write-able and read-only fields.

So your choice boils down to either using similar but different objects for each verb, or using one “fat object” that represents the union of all fields across GET, PUT and POST and making sure you have good documentation. The downside of the fat object is that the service becomes less “self-documenting”.

Typically what is done is to use one representation and ignore read-only values for POST and PUT calls. AtomPub does this with the id element (feed/id). GoogleData also insists that you always GET the current state of a resource and use that data as the basis for the PUT. I guess you can’t go wrong with modeling your service on Google, right?
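
The ignore-read-only approach can be sketched in a few lines of Python. The READ_ONLY_FIELDS set mirrors the read-only fields of the domain model above; the strip_read_only helper and the idea of returning the ignored names (say, for logging) are my own assumptions.

```python
# Read-only fields from the domain model: BaseObject's id and version,
# and VideoAsset's workflowState.
READ_ONLY_FIELDS = {"id", "version", "workflowState"}

def strip_read_only(fields):
    """Split an inbound field map into (writable, ignored) parts."""
    writable = {k: v for k, v in fields.items() if k not in READ_ONLY_FIELDS}
    ignored = sorted(k for k in fields if k in READ_ONLY_FIELDS)
    return writable, ignored

inbound = {
    "id": "1066",
    "version": "1",
    "title": "Battle of Hastings in England",
    "workflowState": "DISTRIBUTED",
}
writable, ignored = strip_read_only(inbound)
```

Only title survives as writable; id, version and workflowState are silently dropped. A stricter service could reject the request instead when the ignored list is non-empty, which is less forgiving but makes the contract more obvious to clients.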
