Fixing Object Identity in CMIS – Some Proposed Solutions

As mentioned in my last post, the CMIS TC has been working on the issue of version-independent object identity, and that work is being tracked as CMIS-731. At this point there are two basic proposals under discussion (each with a number of variants), and I want to describe, compare and contrast them, and open the discussion up to the CMIS community for feedback.

The committee is actively discussing these proposals right now, but my concern is that, because the TC has minimal representation from developers of CMIS client applications, the requirements and proposed solutions are being discussed in something of a vacuum. While I think I have a reasonable understanding of client application requirements (from my work with the Alfresco ecosystem), I'm still an indirect source. I'd much rather hear requirements from, and have proposals validated by, the developers of CMIS client applications themselves. The OASIS machinery can be a little intimidating to navigate, but a blog is a relatively low-pressure, easy place to provide that kind of feedback, so please don't hesitate to comment here!

I'll start by summarising the two basic proposals and their primary variants as I understand them, and then compare and contrast them against the requirements the committee has discussed to date. I'd then encourage you to provide any and all feedback you have (even if it's a short "Peter, you're wrong and here's why: …!") as comments on this post.

So without further ado, the proposals:

Extend applicability of cmis:versionSeriesId

This proposal is based on the observation that in the current versions of the specification (1.0 and 1.1), the cmis:versionSeriesId property already provides a version-agnostic identifier wherever it is present and for the services that support it; the only gap is that it isn't available on every object type or accepted by every service.

Based on this observation, this proposal mandates cmis:versionSeriesId for all servers, regardless of whether they support versioning or not, and all CMIS services that today accept a cmis:objectId would offer an equivalent that accepts a cmis:versionSeriesId (the semantics being “invoke this service as if the cmis:objectId of the latest version had been provided”).  This could be achieved by continuing the xxxOfLatestVersion service pattern to its ultimate conclusion, or by overloading the existing services to support either cmis:objectId or cmis:versionSeriesId as input.
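To make the overloading alternative concrete, here is a minimal Python sketch of the resolution semantics, assuming a purely hypothetical in-memory repository (`InMemoryRepository`, `StoredObject` and `get_object` are illustrative names, not OpenCMIS or any server's API). Keyword-only parameters stand in for the named parameters a binding would use, so the server always knows which kind of identifier it was given. Giving the unversioned folder a version series id equal to its object id is likewise just an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    object_id: str          # version-specific identifier (cmis:objectId)
    version_series_id: str  # version-agnostic identifier (cmis:versionSeriesId)
    version_label: str
    is_latest_version: bool


class InMemoryRepository:
    """Hypothetical repository used only to illustrate the proposed semantics."""

    def __init__(self, objects):
        self._by_object_id = {o.object_id: o for o in objects}

    def _latest_in_series(self, version_series_id):
        for obj in self._by_object_id.values():
            if obj.version_series_id == version_series_id and obj.is_latest_version:
                return obj
        raise KeyError(f"no version series {version_series_id!r}")

    def get_object(self, *, object_id=None, version_series_id=None):
        """Overloaded service: accepts exactly one of the two named parameters.

        A version series id is handled "as if the cmis:objectId of the
        latest version had been provided"."""
        if (object_id is None) == (version_series_id is None):
            raise ValueError("provide exactly one of object_id or version_series_id")
        if version_series_id is not None:
            return self._latest_in_series(version_series_id)
        return self._by_object_id[object_id]


repo = InMemoryRepository([
    StoredObject("doc-1;1.0", "doc-1", "1.0", False),
    StoredObject("doc-1;2.0", "doc-1", "2.0", True),
    # Assumption: an unversioned object's series id simply equals its object id.
    StoredObject("folder-7", "folder-7", "", True),
])
print(repo.get_object(version_series_id="doc-1").version_label)  # 2.0
```

Because the parameter name carries the identifier type, the same service can serve both kinds of caller without guessing.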

This proposal also optionally renames cmis:versionSeriesId to something more descriptive, since its expanded semantics would no longer be limited to version series. If the alternative of overloading the existing services to accept either cmis:objectId or cmis:versionSeriesId is selected, it would also deprecate or remove the xxxOfLatestVersion services, since they would then be redundant.

Basic Variant – extend the semantics for cmis:document types only

In this variant, cmis:versionSeriesId would only become mandatory for cmis:document and sub-types of it.  Other CMIS object types (cmis:folder, cmis:relationship, cmis:item and cmis:policy) would continue to not support this property, as is already the case in CMIS 1.0 and 1.1 (cmis:objectId would remain the only identifier for these object types).

Extended Variant – extend the semantics to all CMIS object types

In this variant, cmis:versionSeriesId would become mandatory for all object types – not just cmis:document but also cmis:folder, cmis:relationship, cmis:item and cmis:policy.

Add a new identifier

In this proposal, a new mandatory identifier would be added to the specification, tentatively called cmis:representativeCopyId at the time of writing (see CMIS-731).

All CMIS services that today accept a cmis:objectId would offer an equivalent that accepts a cmis:representativeCopyId, with the semantics being “invoke this service as if the cmis:objectId of the latest version had been provided”.
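One way to picture "an equivalent for every service" is as a generic wrapper: resolve the representative copy id to the latest version's cmis:objectId, then delegate to the existing service unchanged. The sketch below assumes an in-memory list of versions and a hypothetical `get_content_stream` stand-in; none of these names come from the specification, and cmis:representativeCopyId is itself only a tentative name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Version:
    object_id: str               # cmis:objectId: identifies one specific version
    representative_copy_id: str  # proposed cmis:representativeCopyId (tentative name)
    is_latest_version: bool
    content: str


def latest_object_id(versions: Iterable[Version], representative_copy_id: str) -> str:
    """Map a representative copy id to the cmis:objectId of the latest version."""
    matches = [v.object_id for v in versions
               if v.representative_copy_id == representative_copy_id and v.is_latest_version]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one latest version for {representative_copy_id!r}")
    return matches[0]


def via_representative_copy(versions, service: Callable[[str], T]) -> Callable[[str], T]:
    """Derive the 'equivalent' service: same behaviour, keyed by representative copy id."""
    def wrapped(representative_copy_id: str) -> T:
        return service(latest_object_id(versions, representative_copy_id))
    return wrapped


versions = [
    Version("a1", "rep-a", False, "draft"),
    Version("a2", "rep-a", True, "final"),
]
by_object_id: Dict[str, Version] = {v.object_id: v for v in versions}


def get_content_stream(object_id: str) -> str:
    # Hypothetical stand-in for an existing objectId-based service.
    return by_object_id[object_id].content


get_content_stream_latest = via_representative_copy(versions, get_content_stream)
print(get_content_stream_latest("rep-a"))  # final
```

The existing objectId-based services need no changes in this sketch; only the lookup from the new identifier to a cmis:objectId is new.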

Basic Variant – add the new identifier to cmis:document types only

In this variant, cmis:representativeCopyId would only become mandatory for cmis:document and sub-types of it.  Other CMIS object types (cmis:folder, cmis:relationship, cmis:item and cmis:policy) would not support this property (cmis:objectId would remain the only identifier for these object types).

Extended Variant – add the new identifier to all CMIS object types

In this variant, cmis:representativeCopyId would become mandatory for all object types – not just cmis:document but also cmis:folder, cmis:relationship, cmis:item and cmis:policy.

Comparison Matrix

With the basic proposals outlined, we can now compare these two proposals (and their variants) vs the requirements that the committee has identified to date (additional requirements welcome!):

| # | Requirement | Extend cmis:versionSeriesId: cmis:document only | Extend cmis:versionSeriesId: all CMIS object types | Add a new identifier: cmis:document only | Add a new identifier: all CMIS object types |
|---|---|---|---|---|---|
| 1 | Avoids extra round trips to the server that are required today (e.g. calls to getTypeDefinition to figure out if a type is versioned or not, calls to "fast forward" through a version history, etc.) | Yes | Yes | Yes | Yes |
| 2 | Provides a single identifier that can be used for cmis:document and all sub-types | Yes | Yes | Yes | Yes |
| 3 | Provides a single identifier that can be used for all CMIS object types | No | Yes | No | Yes |
| 4 | Eliminates conditional logic around identifier handling in CMIS client applications | No | Yes | No | Yes |
| 5 | Avoids identifier proliferation | Yes | Yes | No | No |
| 6 | Avoids adding a 2nd identifier to object types that don't need it (cmis:folder etc.) | Yes | No | Yes | No |
| 7 | Avoids potential confusion around the current semantics of "version series" | No | No | Yes | Yes |
| | SCORE (higher is better) | 4 | 5 | 4 | 5 |
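To illustrate rows 1 and 4, here is a simplified Python sketch comparing the client-side logic needed today with the logic that a single version-agnostic identifier for all object types would allow. `FakeServer` and its methods are hypothetical stand-ins that count simulated round trips; a real client may also need to walk a version history, which this sketch collapses into a single call.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class FakeServer:
    """Hypothetical in-memory server; every method call counts as one round trip."""
    versionable_types: Set[str]
    objects: Dict[str, Tuple[str, Optional[str]]]  # objectId -> (typeId, versionSeriesId)
    latest_by_series: Dict[str, str]               # version-agnostic id -> objectId of latest
    round_trips: int = 0

    def get_object(self, object_id: str) -> Tuple[str, Optional[str]]:
        self.round_trips += 1
        return self.objects[object_id]

    def get_type_definition(self, type_id: str) -> dict:
        self.round_trips += 1
        return {"versionable": type_id in self.versionable_types}

    def get_latest_object_id(self, version_agnostic_id: str) -> str:
        self.round_trips += 1
        return self.latest_by_series[version_agnostic_id]


def current_object_id_today(server: FakeServer, remembered_object_id: str) -> str:
    """Client logic today: branch on versionability, paying for extra round trips."""
    type_id, series_id = server.get_object(remembered_object_id)
    if server.get_type_definition(type_id)["versionable"] and series_id is not None:
        return server.get_latest_object_id(series_id)
    return remembered_object_id


def current_object_id_proposed(server: FakeServer, remembered_id: str) -> str:
    """Client logic if every object had a version-agnostic id: one call, no branching."""
    return server.get_latest_object_id(remembered_id)


server = FakeServer(
    versionable_types={"cmis:document"},
    objects={"doc;1.0": ("cmis:document", "doc"), "fld": ("cmis:folder", None)},
    latest_by_series={"doc": "doc;2.0", "fld": "fld"},
)
today = current_object_id_today(server, "doc;1.0")
print(today, server.round_trips)     # doc;2.0 3
server.round_trips = 0
proposed = current_object_id_proposed(server, "doc")
print(proposed, server.round_trips)  # doc;2.0 1
```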

What’s been apparent during the committee’s discussions, and is borne out by this comparison, is that none of these proposals is a clear winner.  What this exercise does do, however, is focus the conversation on the key differentiating characteristics of the proposals, which are:

  • Line 3: Is it important to have a single identifier for all objects in CMIS, or is it acceptable to require client applications to deal with two (one for cmis:document and its sub-types, and another for everything else)?
  • Line 4: What value should be placed on keeping all client applications simpler, even at the expense of more complex server side implementations?
  • Line 5: What is the value of keeping the specification simpler, by avoiding identifier proliferation?
  • Line 6: How bad is it to add another identifier property to object types that technically don’t need it?
  • Line 7: Does expanding the semantics of cmis:versionSeriesId confuse or devalue the concept of “version series”?

While I and the CMIS TC members have our own answers to these questions, I’m much more interested in hearing directly from the developers of CMIS client applications.  Which of these proposed solutions makes your life easiest?  Which requirements do you care about, and which don’t matter?  What requirements are missing from the list above?

The window is closing on identifying a preferred solution to the long-standing problems of CMIS identity, and once closed it’s unlikely to be reopened for a long time (if ever), so now is your chance to have a say!

As always, the CMIS mailing list is the best place to leave ad hoc feedback, but feel free to comment here and I'll pass your feedback along to the committee.

A note on Private Working Copies (PWCs)

One topic that’s come up in the TC meetings is how this new mechanism should interact with Private Working Copies (PWCs).  As a refresher, a PWC is the temporary copy of a document that gets created when the checkout service is called on it.

The complication revolves around whether a PWC, while it exists, should be the target of the proposed version-independent identifier (however it is implemented).  In the description of the checkout service, the CMIS specification states:

until it is checked in (using the checkIn service), the PWC MUST NOT be considered the latest or latest major version in the version series.

which implies it should not be resolvable via the new identifier.

However, my experience has been that CMIS client applications commonly want to retrieve the latest "usable" version of an object: the PWC for those users who have permission to access it, and the latest non-PWC version for everyone else. So there is real tension between the spec's position that a PWC is not the latest version, and the reasonable expectation that the new identifier would resolve to a PWC where appropriate.
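The two interpretations can be stated side by side. This is a minimal sketch assuming a hypothetical `VersionSeries` record that carries a set of users allowed to see the PWC; real repositories express PWC visibility through their own permission models, which this does not attempt to capture.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class VersionSeries:
    latest_object_id: str                      # latest checked-in version
    pwc_object_id: Optional[str] = None        # present only while checked out
    pwc_readers: FrozenSet[str] = frozenset()  # hypothetical: users allowed to see the PWC


def latest_usable_object_id(series: VersionSeries, user: str) -> str:
    """'Latest usable': the PWC for users who may access it, else the latest checked-in version."""
    if series.pwc_object_id is not None and user in series.pwc_readers:
        return series.pwc_object_id
    return series.latest_object_id


def spec_latest_object_id(series: VersionSeries, user: str) -> str:
    """Spec wording: the PWC is never the latest version, so the user is ignored."""
    return series.latest_object_id


checked_out = VersionSeries("doc;2.0", "doc;pwc", frozenset({"alice"}))
print(latest_usable_object_id(checked_out, "alice"))  # doc;pwc
print(latest_usable_object_id(checked_out, "bob"))    # doc;2.0
print(spec_latest_object_id(checked_out, "alice"))    # doc;2.0
```

The first function returns different objects for different users, which is exactly the trade-off the committee has to weigh.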

The upshot is that further consideration is needed around how the new identifier would interact with PWCs, if at all. Feedback from CMIS client application developers is, again, very welcome.

  • Florian

    Hi Peter,

    I would like to add a few more aspects.

    The acceptance of a new CMIS specification version depends strongly on backwards compatibility. An application that works against a CMIS 1.0 or CMIS 1.1 server must work against a CMIS 2.0 server basically without code change.

    That rules out a few things. For example, renaming the cmis:versionSeriesId property would break some applications. Also existing services cannot accept cmis:versionSeriesId values instead of cmis:objectId values because in some repositories both IDs overlap and the repository cannot distinguish between them.

    Going the xxxOfLatestVersion route or the overloading route would add about 18 new services. You may want to add an “Avoids service proliferation” row to your comparison matrix. Also, the CMIS TC has decided not to extend the Web Services API. That is, these services wouldn’t be available through this binding. You may want to add “Works with Web Services” row to your matrix, too.

    For me, these aspects rule out the "Extend cmis:versionSeriesId" option.

    Adding a new identifier is a cleaner and backwards compatible solution. But that is only necessary for documents. Introducing another identifier for non-document types would only duplicate the cmis:objectId property.

    Conditional logic around identifier handling is actually a good thing. Everybody who has ever saved a Word document under a different name to keep the old version knows the concept of versioning. Application developers should make a conscious decision about whether they want to address a specific version or a representative (the latest) version. That avoids surprises later. The conditional logic in this case would be one if-statement, which should be a reasonable effort for application developers.

    The PWC topic is interesting because this has been clarified in CMIS 1.1. The TC explicitly decided that the PWC is not the latest version (see https://tools.oasis-open.org/issues/browse/CMIS-728).

    I think it would be confusing if the latest version were different for different users. Application developers who work with PWCs deal with versioning anyway. If the end-user expects something different from these applications, the applications can/should handle that.

    Apart from that, changing this would be an incompatible change that would break existing applications. Also, some repositories cannot implement it (for example, they cannot support PWCs in query results).

    Cheers,

    Florian

    Just for the record: I’m working in a team that develops four CMIS clients (three end-user applications and one server application). I also help other teams to build CMIS client applications.

    • pmonks

      I'm surprised that you're unwilling to consider backwards-compatibility-breaking changes in a new *major* version of the specification. Is that SAP's official position on this matter?

      To your other points, briefly:

      “existing services cannot accept cmis:versionSeriesId values instead of cmis:objectId values”
      This is incorrect – both the AtomPub and browser bindings use named parameters (see CMIS 1.1 sections 3.1.4, 5.3.4) and it would be trivial to extend the set of allowed named parameters to add cmis:versionSeriesId (or its successor), without disrupting the current parameters (i.e. cmis:objectId). The Web Services (SOAP) binding may or may not use named parameters – I didn’t check that binding, as it sounds like it will effectively go away in CMIS 2.0 (a backwards compatibility breaking change that you *don’t* appear to have a problem with, intriguingly…).

      “Going the xxxOfLatestVersion route or the overloading route would add about 18 new services.”
      Agreed. As I’ve said several times already, I would prefer to see cmis:versionSeriesId (or its successor) accepted everywhere a cmis:objectId can be used today, and the current (partial) set of xxxOfLatestVersion services removed. As described in the previous point, this is trivial to do cleanly, while simultaneously reducing the total number of CMIS services.

      “Adding a new identifier is a cleaner [snip] solution.”
      I disagree that this is clean. A content management API only requires two identifiers:
      1. a version specific one
      2. a version agnostic one
      Having 3 identifiers with partially overlapping semantics creates incidental complexity. Amongst other things, the cost of learning the API and implementing client applications that use it goes up unnecessarily.

      “But that is only necessary for documents. Introducing another identifier for non-document types would only duplicate the cmis:objectId property.”
      While technically correct, this seems to miss the critical point that there is no benefit to client applications in having different identifiers for different object types. If there is zero benefit, why introduce this unnecessary complexity (and associated costs)?

      “Conditional logic around identifier handling is actually a good thing.”
      This is incorrect. Extra logic in an application adds costs, and in the case of applications that don’t care about versions (the majority of CMIS client applications), you’re imposing those costs unnecessarily – the alternative approach does not impose them. Applications that do care about versions are going to require that conditional logic regardless of the approach, so both approaches are cost-equivalent in that case.

      “Application developers who work with PWCs deal with versioning anyway.”
      This is incorrect – as you pointed out, PWCs aren’t versions according to the spec. Ignoring that (problematic) definition for a moment, recall that in many cases the CMIS client application is not the only agent interacting with the corpus. There will likely be the server’s own UI(s) & protocols, other CMIS client applications, applications that use native APIs etc. etc. What matters is that all of these applications work together to provide users with a consistent and seamless experience, and that may very well require alternative “views” of the content, depending on who the user is.

      • Jay Brown

        I have a suggestion that could preserve the schema (no changes to current methods and no additional ones) and add only a single additional boolean to the repositoryInfo.

        Here goes.

        In 2.0 we add a boolean – supportsVersionSeriesRepresentativeIds (or something like that)

        If this is set to true, the server will accept versionSeriesIds for any cmis:document operation that also accepts an objectId, and a versionSeriesId will be taken to mean the latest version in the series.

        The new boolean gets us past the issue Florian pointed out, that some 1.0/1.1 repositories cannot distinguish between these two ids (they may overlap). Those repositories will report false to opt out.
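From a client's point of view, this proposal boils down to choosing a strategy once per repository. The sketch below assumes repositoryInfo has already been parsed into a dict and uses hypothetical stand-in functions for the getObject and getObjectOfLatestVersion calls; `supportsVersionSeriesRepresentativeIds` is only the flag name suggested above, not part of CMIS 1.0 or 1.1.

```python
from typing import Callable, Dict, List, Mapping, Tuple


def make_latest_fetcher(
    repository_info: Mapping[str, object],
    get_object: Callable[[str], Dict[str, str]],
    get_object_of_latest_version: Callable[[str], Dict[str, str]],
) -> Callable[[str], Dict[str, str]]:
    """Choose once, per repository, how to fetch the latest version from a versionSeriesId."""
    if repository_info.get("supportsVersionSeriesRepresentativeIds") is True:
        # Opted-in repository: a versionSeriesId is accepted wherever an objectId is.
        return get_object
    # Flag false or absent (e.g. a 1.0/1.1 repository whose ids may overlap):
    # fall back to the existing latest-version service.
    return get_object_of_latest_version


# Hypothetical stand-ins for the real service calls, recording what was invoked.
calls: List[Tuple[str, str]] = []


def fake_get_object(some_id: str) -> Dict[str, str]:
    calls.append(("getObject", some_id))
    return {"id": some_id}


def fake_get_object_of_latest_version(series_id: str) -> Dict[str, str]:
    calls.append(("getObjectOfLatestVersion", series_id))
    return {"id": series_id + ";latest"}


opted_in = make_latest_fetcher(
    {"supportsVersionSeriesRepresentativeIds": True},
    fake_get_object, fake_get_object_of_latest_version)
legacy = make_latest_fetcher({}, fake_get_object, fake_get_object_of_latest_version)
opted_in("series-42")
legacy("series-42")
print(calls)  # [('getObject', 'series-42'), ('getObjectOfLatestVersion', 'series-42')]
```

Repositories that cannot tell the two id spaces apart simply leave the flag false, and clients keep using the service they use today.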