cmis:objectId – A Case of Mistaken Identity

It’s ok to store cmis:objectIds in my CMIS client application, right?

Ah, such a simple question, yet hiding a plethora of probable pitfalls!

Over the last couple of years I’ve encountered (and held myself!) the misconception that cmis:objectIds are basically synonymous with NodeRefs, Alfresco’s native form of identifier.  Unfortunately there is a subtle but significant difference that traps many an unwary CMIS client developer: an Alfresco NodeRef identifies an entire object including that object’s version history (if any) – in effect documents and versions are fundamentally different types of “thing”, and versions don’t have any independent notion of identity.  In contrast, in CMIS both documents and versions are the same (they’re both cmis:documents) and are each uniquely identified by their own cmis:objectId.  From the versioning section of the spec (emphasis added):

Each version of a document object is itself a document object, i.e. has its own object id, property values, MAY be acted upon using all CMIS services that act upon document objects, etc.

So going back to our original question, the validity of storing a cmis:objectId for later use depends on what the CMIS client application is storing a reference to; some possibilities include:

  1. An unversioned object type (i.e. something other than cmis:document) => golden!
  2. An unversioned cmis:document => peachy!
  3. A specific version of a versioned cmis:document => good to go!
  4. The latest version of a versioned cmis:document => ruh roh raggy!!

One problematic case out of four may not seem like too much of an issue, until we recall that versioning is enabled by default for all CMIS-accessible files in Alfresco (i.e. cmis:document and all sub-types).  Add to this that many CMIS client apps, regardless of the server they’re connecting to, basically don’t care about versioning (and when they do, it’s often limited to concurrency control via private working copies).  They simply want to treat the CMIS repository as a glorified file/folder store, reading and writing files as if they were flat, unversioned objects.  Put those two facts together and you start to appreciate the seriousness of the problem.

In short, cmis:objectId alone cannot satisfy the 80% use case of CMIS client applications i.e. version-agnostic file/folder CRUD!
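
The trap in case 4 is easy to reproduce in a toy model. The sketch below is a minimal in-memory stand-in, not a real CMIS client or server: ToyRepository and its methods are invented for illustration, and the only behaviour it borrows from the spec is that every new version is a document with its own object id.

```python
class ToyRepository:
    """Hypothetical in-memory model of CMIS-style versioning (not a real API)."""

    def __init__(self):
        self._objects = {}  # object id -> (version series id, content)
        self._series = {}   # version series id -> object ids, oldest first

    def create_document(self, content):
        series_id = f"series-{len(self._series) + 1}"
        self._series[series_id] = []
        return self._add_version(series_id, content)

    def update_document(self, object_id, content):
        # Updating a versioned document creates a NEW object with a NEW id.
        series_id, _ = self._objects[object_id]
        return self._add_version(series_id, content)

    def get_content(self, object_id):
        return self._objects[object_id][1]

    def _add_version(self, series_id, content):
        object_id = f"obj-{len(self._objects) + 1}"
        self._objects[object_id] = (series_id, content)
        self._series[series_id].append(object_id)
        return object_id


repo = ToyRepository()
stored_id = repo.create_document("draft")            # the client stores this id
newer_id = repo.update_document(stored_id, "final")  # another process adds a version

print(repo.get_content(stored_id))  # prints "draft": the stored id is now stale
print(repo.get_content(newer_id))   # prints "final"
```

The stored id keeps working, which is exactly what makes the problem subtle: nothing fails, the client just silently reads an old version.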

So what’s the alternative?

There are at least two approaches that I’ve come up with for working around this issue (and there may be more):

  1. Store a cmis:objectId and make additional CMIS calls to manually “fast forward” to the latest version of the object on every subsequent CMIS call.
  2. Store a cmis:objectId for unversioned object types, and a cmis:versionSeriesId for versioned object types, and make subsequent CMIS calls appropriate to each.

Fast forward

With this approach, the CMIS client application would store the cmis:objectId as normal, but every single time it accesses the object it identifies, it would look up the cmis:objectId of the latest version of the object first, before continuing with the original operation.  In detail, this involves:

  1. Call the getObject service with the original cmis:objectId.
  2. Look for the cmis:versionSeriesId property in the response.  If the cmis:versionSeriesId property exists in the response:
    1. Call the getPropertiesOfLatestVersion service with the cmis:versionSeriesId.
    2. Pull out the cmis:objectId from the response – this is guaranteed to be the cmis:objectId of the latest version of the object, at the time of the call.
    3. Update the stored cmis:objectId with the retrieved cmis:objectId (optional).
  3. Call the desired CMIS service.

The advantage of this method is that the logic is reasonably clean and simple.  The downside is that it requires at least 2, and sometimes 3, CMIS calls for every single “original” CMIS call the client application wishes to make (regardless of whether the object is versioned or not), and it risks a race condition between steps 2.1 and 3 (i.e. when a new version of the object gets created by some other process between those two calls).
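
The fast forward steps can be sketched in a few lines of Python. This is only a sketch under stated assumptions: CmisSession is a hypothetical thin wrapper whose two methods mirror the getObject and getPropertiesOfLatestVersion services and return properties as a plain dict; it is not the API of any real CMIS library.

```python
from typing import Optional, Protocol


class CmisSession(Protocol):
    """Hypothetical wrapper around a CMIS binding; method names are assumptions."""

    def get_object(self, object_id: str) -> dict: ...

    def get_properties_of_latest_version(self, version_series_id: str) -> dict: ...


def fast_forward(session: CmisSession, stored_object_id: str) -> str:
    """Return the cmis:objectId of the latest version behind stored_object_id."""
    properties = session.get_object(stored_object_id)  # step 1
    version_series_id: Optional[str] = properties.get("cmis:versionSeriesId")  # step 2
    if not version_series_id:
        # No version series: treat the object as unversioned, so the stored id
        # is already current.
        return stored_object_id
    latest = session.get_properties_of_latest_version(version_series_id)  # step 2.1
    return latest["cmis:objectId"]  # step 2.2
```

A caller would run fast_forward immediately before the real operation (step 3) and, optionally, write the returned id back to its own store (step 2.3). The race condition remains: the id can go stale again between fast_forward returning and the next call.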

[UPDATE 2014-02-28] A vendor I’m working with mentioned another variation of this strategy that uses the “cmis:isLatestVersion” property in step 2 to determine whether the cmis:objectId refers to the latest and greatest version or not.  Other than use of a different property, the logic remains much the same (the client application still needs to “fast forward” to the latest version, using cmis:versionSeriesId).
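
A rough sketch of that variation follows. As before, the session object and its method names are hypothetical, and treating a missing cmis:isLatestVersion property as "already the latest" is my assumption for unversioned objects rather than something the spec promises.

```python
def fast_forward_if_stale(session, stored_object_id):
    """Variant of fast forward that checks cmis:isLatestVersion in step 2.

    `session` is a hypothetical wrapper exposing get_object() and
    get_properties_of_latest_version(), both returning a dict of properties.
    """
    properties = session.get_object(stored_object_id)
    # Assumption: objects without the property are unversioned, hence current.
    if properties.get("cmis:isLatestVersion", True):
        return stored_object_id
    latest = session.get_properties_of_latest_version(
        properties["cmis:versionSeriesId"]
    )
    return latest["cmis:objectId"]
```

In this sketch the second lookup is skipped whenever the stored id already points at the latest version, but the first call is still needed on every access.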

Conditionally store either cmis:objectId or cmis:versionSeriesId

This approach involves storing cmis:objectIds for object types that are not versioned, and cmis:versionSeriesIds for object types that are.  Unversioned object types include everything that isn’t a cmis:document (cmis:folder, cmis:relationship, cmis:policy and cmis:item), as well as, on a case-by-case basis, cmis:document and sub-types of cmis:document (whether such object types are versioned or not can be determined by retrieving the “versionable” property for each cmis:document object type in the system).

For unversioned objects (i.e. those that have a cmis:objectId stored in the CMIS client application), CMIS service calls can be made directly by the client application, secure in the knowledge that the results will always refer to the latest version of the object (since, by definition, there can only ever be one version of such objects).

For versioned objects (i.e. those that have a cmis:versionSeriesId stored in the CMIS client application), one of two possible call sequences is required:

  1. If the CMIS client application only requires metadata, it can call one of the “OfLatestVersion” services (getObjectOfLatestVersion or getPropertiesOfLatestVersion).
  2. For all other use cases:
    1. Call the getPropertiesOfLatestVersion service with the cmis:versionSeriesId.
    2. Pull out the cmis:objectId from the response – this is guaranteed to be the cmis:objectId of the latest version of the object, at the time of the call.
    3. Call the desired CMIS service with the retrieved cmis:objectId.

The advantage of this method is that it optimises the number of CMIS calls needed to perform such “version independent” operations – often only requiring a single call.  The disadvantages are that it requires some initial “discovery” calls to figure out exactly what’s versioned vs what isn’t, the client application’s logic is more complex due to the two different types of CMIS identifier that must be used, and there is the risk of a race condition between steps 2.1 and 2.3 in the event of a concurrent update by another process.
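
Here is a minimal sketch of what the client side of this approach might look like. StoredReference, make_reference and resolve_object_id are names I have made up for illustration, and the session object is again a hypothetical wrapper around the getPropertiesOfLatestVersion service, not a real library API. The type_is_versionable flag stands for the result of the “discovery” calls mentioned above.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class StoredReference:
    """What the client persists: the identifier plus which kind it is."""

    value: str
    is_version_series: bool


class CmisSession(Protocol):
    """Hypothetical wrapper around a CMIS binding; the method name is an assumption."""

    def get_properties_of_latest_version(self, version_series_id: str) -> dict: ...


def make_reference(properties: dict, type_is_versionable: bool) -> StoredReference:
    """Pick the identifier to store, based on the object type's versionable flag."""
    if type_is_versionable:
        return StoredReference(properties["cmis:versionSeriesId"], True)
    return StoredReference(properties["cmis:objectId"], False)


def resolve_object_id(session: CmisSession, ref: StoredReference) -> str:
    """Turn a stored reference into a cmis:objectId usable by any CMIS service."""
    if not ref.is_version_series:
        return ref.value  # unversioned: no extra call needed
    latest = session.get_properties_of_latest_version(ref.value)  # step 2.1
    return latest["cmis:objectId"]  # step 2.2
```

Keeping the kind of identifier next to its value is one way to contain the extra complexity: the rest of the client only ever sees the result of resolve_object_id.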

You might be wondering why a CMIS client application can’t simply store the cmis:versionSeriesId in all cases.  Unfortunately cmis:versionSeriesId is optional (see the cmis:versionSeriesId property definition in the spec) – a compliant CMIS repository does not have to provide this property for unversioned object types, and in my experience most don’t.

This sux – surely there’s something better?

I’ve been unable to come up with a better alternative based strictly on the CMIS 1.x specifications, but that doesn’t mean others don’t exist – I’d love to hear about them if you’ve come up with one.  That said, having worked fairly extensively with CMIS client application implementers over the last couple of years I’m reasonably certain there isn’t a fundamentally better approach.

The good news is that the issue has been brought to the attention of the CMIS Technical Committee, and there is a proposal from Oracle for something called “representative copies” that potentially has some overlap with this use case.

Speaking personally, I would like to see something along the lines of the following, minimally intrusive change:

  • Make cmis:versionSeriesId mandatory for all object types and rename it (e.g. to cmis:id) to showcase its more general utility.
  • Update all services that receive a cmis:objectId to also support the new identifier property as an alternative.  When the new identifier is provided, the semantics would be “perform the requested service against the latest version of the object”.
  • Remove the “OfLatestVersion” services, as they would now be redundant.
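
To illustrate the intended semantics only, here is a toy resolver. Nothing in it comes from a spec or from any committee proposal: the function, its parameters and the two dictionaries are all hypothetical, and a real repository would implement this on the server.

```python
def resolve(objects, series, identifier):
    """Return the content that `identifier` refers to.

    objects: object id -> content (one entry per version)
    series:  stable id -> object ids of its versions, oldest first
    Both mappings are hypothetical stand-ins for repository state.
    """
    if identifier in series:
        # Stable identifier: always means "the latest version of the object".
        return objects[series[identifier][-1]]
    # Plain object id: means exactly that version, as in CMIS 1.x today.
    return objects[identifier]


objects = {"obj-1": "draft", "obj-2": "final"}
series = {"id-1": ["obj-1", "obj-2"]}

print(resolve(objects, series, "id-1"))   # prints "final"
print(resolve(objects, series, "obj-1"))  # prints "draft"
```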

Conclusion

CMIS is a valuable addition to the content management repertoire, but as with version 1s of most products, it has its share of flaws.  This particular flaw happens to be both subtle and of significant impact, which makes it all the more important for CMIS client application developers to understand it and factor it into their designs.

More generally, it is my opinion that this also reflects the specification’s focus on addressing “hard core ECM” requirements, to the (unintended) detriment of the 80% content management case i.e. simple file/folder CRUD.  I suspect no one on the CMIS TC realised at the time that the intersection of versioning and identity would “bleed through” the basic file/folder CRUD use case in this way.

Ultimately the best way for problems like this to be fixed (or better yet, to not surface in the first place!) is community involvement.  I’ve found the CMIS TC to be an open and welcoming place, and I strongly encourage all CMIS client application implementers to get involved in the committee’s good work, at the very least at the level of an observer (as I have).

  • Florian

    Hi Peter,

    Simple file/folder applications can also use the path to access (most) objects. Calling getObjectByPath() for a document usually returns the latest version.

    – Florian

    • pmonks

      File paths make for terrible identifiers as they are not immutable. If the object is renamed or moved, the path no longer identifies the object.

      • Florian

        That’s why I said “simple file/folder applications”. If your application just needs a file system replacement, that’s good enough and simple enough. No need to handle object IDs here.
        If you need ECM capabilities then you have to learn to deal with object IDs.

        You said that the object ID alone cannot satisfy the 80% use case of CMIS client applications. I would be careful with that number. Other repositories have been built for other purposes and the CMIS object ID semantics make a lot of sense to them and their applications. You might be right with “the 80% use case of CMIS client applications that have Alfresco in mind”.

        Adding a “dynamic identifier” is a good thing. But you know from CMIS committee discussions that defining that isn’t trivial. Not all repositories can support it out-of-the-box.

        • pmonks

          Strong identity is not an “ECM capability” – virtually all non-CMIS and/or non-ECM content repositories support the concept, as do most vanilla file systems, albeit in proprietary ways.

          Regardless of the semantic argument, my experience in working with CMIS client application developers is clear and ubiquitous – they expect to have a single identifier they can use to identify an object, for the entire lifetime of that object. CMIS 1.x doesn’t provide that, to the surprise of most developers. The fact that expensive workarounds (as I’ve described above) are necessary adds insult to injury.

          As for repositories that can’t support this out of the box, I have no problem with them doing what Alfresco had to do to support unique version identifiers i.e. emulate it. The call sequences above are effectively the technical design of what that emulation logic would look like (albeit implemented on the server).

      • Florian

        Btw. The Browser Binding allows you to call getObjectOfLatestVersion() with an object ID. (Technically, the AtomPub binding, too.)
        So, storing one object ID of the version series is sufficient to get to the latest version – if you don’t use the Web Services binding.

        • pmonks

          cmis:objectId is not listed as an allowed input parameter for the getObjectOfLatestVersion service (section 2.2.7.4 of the spec), nor, as best I can tell, is it described as an allowed input parameter in the browser binding representation of that service (section 5.4.3.30 of the spec).

          Is that an erratum in the published spec, or a proprietary extension that some CMIS servers support?

          • Florian

            Section 5.4.3.30 defines the arguments that can or have to be added to an Object URL (section 5.3.4). The Object URL contains either an object ID (document ID in this case) or a path. In a nutshell, compile an Object URL to a document and add “returnVersion=latest” and you get the latest version without detour. That’s standard CMIS 1.1 behaviour.

          • Florian

            The AtomPub binding supports something similar. See section 3.7.1.1.2.

            Only the Web Services binding has no equivalent.

          • pmonks

            What about all of the other CMIS services? Do they also support the “returnVersion=latest” parameter?

            Regardless, I think you can see the issue – this is basically an improvement to a subset of the CMIS universe (i.e. CMIS 1.1 servers only, AtomPub / Browser binding only, client must be careful to construct URLs correctly, mechanism may not be available for all services).

            I’d still argue that a better solution is for CMIS 2.0 to address the issue head on, by providing an identifier that’s consistent with the semantics most developers expect.

  • Dick Weisinger

    Peter,

    The fact that Alfresco’s NodeRef and the cmis:objectId aren’t the same thing is something that we too got caught up in when developing an integration between Alfresco and AutoCAD using CMIS. It’s definitely easy to mistake Alfresco’s NodeRef and the cmis:objectId as being one and the same, and that misunderstanding can lead to some head scratching.

    I do like your suggestion of making cmis:versionSeriesId mandatory, although I would think that there would be resistance in trying to rename that property to something else.

    Your description here is a good ‘heads up’ to Alfresco developers using CMIS.

    • pmonks

      Thanks for the comments Dick. I do hope the committee seriously considers making cmis:versionSeriesId mandatory (with or without renaming it), as that seems to be the least intrusive solution to this problem.

      And just to be crystal clear, although I’ve phrased the issue in terms of Alfresco, it’s not specific to CMIS client application developers who target Alfresco. Identifiers have well established semantics across a broad range of software applications (including those that deal with files/content), yet CMIS 1.x does something different, violating the Principle of Least Surprise. It’s these unexpected semantics that cause the head scratching the implementers I’ve worked with have had to go through.

  • kamielvdz

    We use(d) the noderef as a unique identifier in XML content for assets. The assets are added using CMIS. Worked great without versioning, with versioning same issue as above.

    Now we have 2 options. Either keep using NodeRefs in which case we have to create some helper functions to determine the noderef based on the response we get from CMIS.

    Or we use the CMIS ID, which we would prefer. However when we process the XML content back in Alfresco (we parse it here using jdom) there is no easy way to find an object based on the CMIS ID.
    We do not want to deconstruct the CMIS ID using the version label and definitely do not want to make an extra CMIS call.

    A few useful functions in the Alfresco Java or Javascript API to translate from CMIS ID to Alfresco NodeRef would be fantastic. Anybody know of some?

    • pmonks

      The challenge is that there’s no one-to-one mapping between cmis:objectId’s and NodeRefs – for versioned documents, multiple cmis:objectId’s map to the same NodeRef (since Alfresco doesn’t provide a native identifier for individual version entries – they’re always “keyed” off of the same NodeRef plus a version label).

      Instead Alfresco’s CMIS implementation synthesises cmis:objectId values by combining NodeRefs (or, as of v4.2, a GUID) with the version label, and then parsing those values when they’re received via an inbound CMIS call. Regardless, a compliant CMIS client application can’t make any assumptions about cmis:objectId values – they must be treated as opaque string values. This is particularly important when calling Alfresco, since Alfresco changed the format of cmis:objectId values in v4.2.

  • Pingback: Fixing Object Identity in CMIS – Some Proposed Solutions | Peter Monks