Archive for the ‘DM’ Category

Get your social content going with Alfresco and Liferay

Tuesday, March 29th, 2011

By Luis Sala

The fates of Alfresco and Liferay have been intertwined pretty much from the early days, not too long after Alfresco version 1.0 was released in late 2005. There’s a natural, dare I say, “synergy” in combining a robust content repository with a powerful and highly extensible portal.

As an Epicentric [ http://en.wikipedia.org/wiki/Epicentric ] alumnus, my technology career was practically launched working with portal technology. In many respects, Liferay has changed the portal landscape by demonstrating that open source portals are as good as, if not superior to, their proprietary counterparts.

I therefore take a great degree of satisfaction and even greater pleasure in seeing that Alfresco and Liferay are partnering to show the world how portals and ECM can work together to give users an unparalleled platform for improved collaboration and social content publishing.

Attendees of this week’s Gartner Portal and Collaboration Summit in Los Angeles are getting a first-hand look at our joint success stories and learning about our roadmap for further collaboration between the two organizations.

The keys to achieving our goals hinge on open standards such as Java Portlets (JSR-286) and CMIS, the Content Management Interoperability Services standard. We also employ other open source components such as the Activiti Business Process Management engine, an Apache-licensed project sponsored by Alfresco, and the Spring Framework, sponsored by VMware’s SpringSource business unit.

It’s through the interoperability enabled by the aforementioned standards and open source components that joint customers can quickly deploy a solution built on Alfresco and Liferay.

Alfresco recently announced its Social Content Management message which dovetails nicely into Liferay’s social roadmap by facilitating agile collaboration and exposing the functionality through OpenSocial, an industry standard for interoperability among socially-aware applications.

For my part, I’m very excited about the prospect of working more closely with Liferay and hope you are too.

You can learn more about our joint efforts by visiting alfresco.com/liferay.

Technical Overview of the Alfresco / Jive Toolkit

Tuesday, March 29th, 2011

Recently I transitioned from my long-standing role leading Alfresco’s Professional Services team to being the in-house technologist for the Business Development team, and one of my first tasks in the new role has been to work on an integration between Alfresco and Jive Engage. This work is being done in partnership with SolutionSet (a partner of both Alfresco and Jive), and I wanted to discuss some of the technical design work that has gone into the Toolkit, ahead of its availability (which will be soon after the Jive 5.0 launch – the version of Jive that the Toolkit is targeting).

Functional Overview, aka “What will it do?”

As announced at Gartner’s Portals, Collaboration & Content Summit this week, the integration (known as the “Jive Toolkit”) is a set of pre-built components that allows Jive to store documents in Alfresco while still offering all of the same social features as “native” Jive documents (commenting, rating, discussions, etc.). While not yet all-encompassing – Jive’s “social” content cannot yet be stored or managed within Alfresco – the Toolkit will provide a foundational level of document-centric integration, allowing implementers to focus on more use-case specific integrations as required (hence the positioning as a “toolkit”, rather than a fully fledged solution).

More specifically, the initial version of the Toolkit will allow users of Alfresco and/or Jive to create “managed” documents in any of the following 3 ways:

  1. By uploading a document to Alfresco, using the Jive UI.
  2. By “publishing” an existing document from Alfresco to Jive, using Alfresco’s Share UI.
  3. By “linking” an existing document stored in Alfresco to Jive, using the Jive UI.

In all 3 cases, the result is the same: the document is visible and accessible via the Jive UI in exactly the same way as any “native” document, but the content of the document is stored and managed in Alfresco only. Jive will maintain some metadata about the document – for example the document’s filename and a pointer to the document in Alfresco – but it will not store the binary content of the document. This approach ensures that the document is a first class citizen in both the Alfresco and Jive worlds, while minimising the risk of synchronisation issues between the two systems.

Here are some screenshots that demonstrate uploading a document to Alfresco using the Jive UI:

Step 1 – Navigating to a community in Jive

Step 2 – Managing a document

Step 3 – Select a file to upload

Step 4 – Select the target space in Alfresco

Document details (Alfresco)

Document details (Jive)

Technical Details, aka “Rubber, meet road”

As mentioned above, there are a variety of ways that the initial “linkage” of a document between Alfresco and Jive can be achieved; however, all 3 creation mechanisms produce the same end state: Alfresco has the document in its entirety (including the filename, content, etc.) while Jive has a “proxy object” (a structured data-only object that has the filename and a pointer to the document in Alfresco, but does not have the actual binary content).

This means that all downstream events (updates, metadata modifications, deletes) can be handled the same way, irrespective of how the content was linked between the two systems in the first place – a major simplification in the logic for those downstream events.
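To make the “proxy object” described above a little more concrete, here is a minimal Java sketch of the kind of data-only record Jive might hold. The class name and field names (ManagedDocumentProxy, alfrescoObjectId) are hypothetical and chosen purely for illustration; they are not taken from the Toolkit’s actual data model.

```java
import java.util.Objects;

/**
 * Hypothetical illustration of a Jive-side "proxy object".
 * It carries a filename and a pointer into Alfresco, and nothing else.
 */
final class ManagedDocumentProxy {
    private final String filename;          // name displayed in the Jive UI
    private final String alfrescoObjectId;  // pointer to the document held in Alfresco

    ManagedDocumentProxy(String filename, String alfrescoObjectId) {
        this.filename = Objects.requireNonNull(filename, "filename");
        this.alfrescoObjectId = Objects.requireNonNull(alfrescoObjectId, "alfrescoObjectId");
    }

    String getFilename() {
        return filename;
    }

    String getAlfrescoObjectId() {
        return alfrescoObjectId;
    }

    // Deliberately no field for the binary content: in this design the
    // content is fetched from Alfresco on demand using the pointer above.
}
```

Because every creation path ends with an object of this one shape, later events only ever need to deal with a single representation.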

Integration Mechanism, aka “CMIS, by any other name would smell as sweet…”

Another nice characteristic of this approach is that the calls from Jive to Alfresco (to create content, update and retrieve it) can be accomplished using the CMIS API. This has several benefits, from reduced development effort in the Toolkit itself (due to the ready availability of client-side CMIS libraries), to the potential for portability to other CMIS compliant repositories in the future.

One important thing to note is that the Alfresco-to-Jive API calls are not standards-based – they make use of Jive’s proprietary REST API. There are two reasons for this: Jive does not expose a standards-based API (indeed, no suitable standard exists for social business systems yet), and CMIS doesn’t provide any kind of callback mechanism for clients to be notified when repository events of interest occur (i.e. a mechanism equivalent to Alfresco’s Component Policies).

Tricky Bits, aka “The Devil is in the Details”

As with any integration between complex enterprise applications, there is some trickery in some parts of the integration, and it’s critical to understand these if you’re evaluating the Jive Toolkit.

Deletion

The first piece of trickery involves deletion of the content, specifically deletion in Alfresco. Because Jive maintains a pointer to the document in Alfresco (specifically, the “cmis:objectId”), rather than the content itself, if the document is deleted in Alfresco without Jive being notified, attempts within Jive to retrieve that content will fail. To prevent this, the Toolkit is currently designed to veto deletes in Alfresco if the document has been socialised in Jive. To delete a document, it will first need to be deleted in Jive, at which point it can be deleted from Alfresco too. The reason the Toolkit doesn’t simply synchronise deletes between Jive and Alfresco is that there are common use cases where the document may be removed from Jive, but needs to be retained in Alfresco – replicating deletes between the two systems would have ruled out these use cases.
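The veto rule can be sketched in a few lines of plain Java. This is only an illustration of the logic described above, not the Toolkit’s code: the class DeleteVetoSketch and its method names are hypothetical, and a real implementation would hook into Alfresco’s policy mechanism rather than an in-memory set.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: refuse deletes for documents that are still socialised in Jive. */
final class DeleteVetoSketch {
    // Ids of Alfresco documents that currently have a proxy object in Jive.
    private final Set<String> socialisedIds = new HashSet<>();

    /** Called when a document is uploaded, published or linked to Jive. */
    void markSocialised(String documentId) {
        socialisedIds.add(documentId);
    }

    /** Called when the document has been deleted on the Jive side. */
    void removedFromJive(String documentId) {
        socialisedIds.remove(documentId);
    }

    /** Called before a delete in Alfresco; throws to veto the delete. */
    void beforeDelete(String documentId) {
        if (socialisedIds.contains(documentId)) {
            throw new IllegalStateException(
                "Document " + documentId + " is still socialised in Jive; delete it there first");
        }
    }
}
```

Note that removing the document from Jive only lifts the veto; it does not delete anything in Alfresco, which keeps the “retain in Alfresco” use cases possible.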

Full Text Indexing

The second item of trickery revolves around full text indexing of the document in Jive. To accomplish this, Jive will retain a copy of the content of the document just long enough to index it into Jive’s full text index, and once indexing is complete the content of the document will be removed from Jive. As you’d expect, Alfresco will also notify Jive of any updates to the document, so that the content can be re-indexed on the Jive side.
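The “index, then discard” idea can be illustrated with a toy inverted index in Java. Everything here (the class TransientIndexSketch, its naive tokenisation, the in-memory map) is an assumption made for illustration and bears no relation to how Jive’s full text index is really implemented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: index a document's text without retaining the text itself. */
final class TransientIndexSketch {
    // term -> ids of the documents containing that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes (or re-indexes) a document; the content is not stored anywhere. */
    void index(String documentId, String content) {
        remove(documentId); // drop stale terms from any earlier version
        for (String term : content.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(documentId);
            }
        }
        // 'content' goes out of scope here; only the terms and the id remain.
    }

    /** Removes every posting for the given document. */
    void remove(String documentId) {
        Iterator<Set<String>> it = postings.values().iterator();
        while (it.hasNext()) {
            Set<String> ids = it.next();
            ids.remove(documentId);
            if (ids.isEmpty()) {
                it.remove();
            }
        }
    }

    /** Returns the ids of the documents containing the given term. */
    Set<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase());
        return ids == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(ids);
    }
}
```

Calling index again with the same id models the update notification from Alfresco: the old terms are dropped and the new content is indexed in their place.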

Access Control and Identity

Access control to the documents is also tricky, primarily because the Alfresco and Jive ACL models differ in their level of granularity. Jive’s access control is primarily Community-centric (i.e. defined and enforced at the level of the Community), while Alfresco has a fine-grained, per-node (file or folder) ACL mechanism. In this first release, the Toolkit will initially create the document in both systems in such a way that the ACLs are in sync, but modification of those ACLs in either system will not be replicated to the other system. The upshot is that direct manipulation of the document’s ACLs in Alfresco may cause errors in Jive (e.g. users who can see the document in the Jive UI, but are unable to download it).

Furthermore, in order for Alfresco and Jive to agree on the principal set, the initial version of the Toolkit assumes that both Alfresco and Jive are configured to use the same LDAP repository for user identity and authentication. During the design sessions it was felt that this was likely to be a requirement for an integrated solution anyway and hence wouldn’t be an impediment, but we’re keen to have that assumption validated as broadly as possible.

In Conclusion

So there you have it – a whirlwind tour of the upcoming Jive Toolkit! As a v1.0 there are some more sophisticated use cases that the Toolkit doesn’t address yet, including multi-document / library based integration, and capture of Jive’s social content (discussions, ratings, wiki pages, etc.) in Alfresco. The intention with the Toolkit is to initially provide Alfresco+Jive Systems Integrators (such as SolutionSet) with a small but solid base on which such extensions could be built, and if/when common requirements are identified for these more sophisticated use cases they can be rolled back into the Toolkit.

We’re keen to hear your feedback and look forward to your participation in the project!

Alfresco and Groovy, Baby!

Thursday, August 19th, 2010

For quite a few years now I’ve been a fan of scripting languages that run on the JVM, initially experimenting with the venerable BeanShell, then tinkering with JavaScript (via Rhino) and JRuby, and finally discovering Groovy in late 2007. A significant advantage that Groovy has over most of those other languages (with the possible exception of BeanShell) is that it is basically a superset of Java, so most valid Java code is also valid Groovy code and can therefore be executed by the Groovy “interpreter” [1] without requiring compilation, packaging or deployment – three things that significantly drag down one’s productivity with “real” Java.

To that end I decided to see if there was a way to implement Alfresco Web Scripts using Groovy, ideally in the hope of gaining access to the powerful Alfresco Java APIs with all of the productivity benefits of working in a scripting-like interpreted environment.

It turns out that the Spring Framework (a central part of Alfresco) moved in this direction some time ago, with support for what they refer to as dynamic-language-backed beans. Given that a Java backed Web Script is little more than a Spring bean plus a descriptor and some view templates, initially it seemed like Groovy backed Web Scripts might be possible in Alfresco already, merely by adding the Groovy runtime JAR to the Alfresco classpath and then configuring a Java-backed Web Script with a dynamic-language-backed Spring bean.

Oh behave!

Unfortunately this approach ran into one small snag: Alfresco requires that Java Web Script beans have a “parent” of “webscript”, as follows:

  <bean id="webscript.my.web.script.get"
        class="com.acme.MyWebScript"
        parent="webscript">
    <constructor-arg index="0" ref="ServiceRegistry" />
  </bean>

but Spring doesn’t allow dynamic-language-backed beans to have a “parent” clause.

It’s freedom baby, yeah!

There are several ways to work around this issue, but the simplest was to implement a “proxy” Web Script bean in Java that simply delegates to another Spring bean, which itself could be a dynamic-language-backed Spring bean implemented in any of the dynamic languages Spring supports.

This class ends up looking something like (imports and comments removed in the interest of brevity):

    public class DelegatingWebScript
        extends DeclarativeWebScript
    {
        private final DynamicDeclarativeWebScript dynamicWebScript;

        public DelegatingWebScript(final DynamicDeclarativeWebScript dynamicWebScript)
        {
            this.dynamicWebScript = dynamicWebScript;
        }

        @Override
        protected Map<String, Object> executeImpl(WebScriptRequest request, Status status, Cache cache)
        {
            return dynamicWebScript.execute(request, status, cache);
        }
    }

While DynamicDeclarativeWebScript looks something like:

    public interface DynamicDeclarativeWebScript
    {
        Map<String, Object> execute(WebScriptRequest request, Status status, Cache cache);
    }

This Java interface defines the API the Groovy code needs to implement in order for the DelegatingWebScript to be able to delegate to it correctly when the Web Script is invoked.

The net effect of all this is that a Web Script can now be implemented in Groovy (or any of the dynamic languages Spring supports for beans), by implementing the DynamicDeclarativeWebScript interface in a Groovy class, declaring a Spring bean with the script file containing that Groovy class and then configuring a new DelegatingWebScript instance with that dynamic bean. This may sound complicated but, as this example shows, it is pretty straightforward:

  <lang:groovy id="groovy.myWebScript"
               refresh-check-delay="5000"
               script-source="classpath:alfresco/extension/groovy/MyWebScript.groovy">
    <lang:property name="serviceRegistry" ref="ServiceRegistry" />
  </lang:groovy>

  <bean id="webscript.groovy.myWebScript"
        class="org.alfresco.extension.webscripts.groovy.DelegatingWebScript"
        parent="webscript">
    <constructor-arg index="0" ref="groovy.myWebScript" />
  </bean>

While a little more work than I’d expected, this approach meets all of my goals of being able to write Groovy backed Web Scripts, and in the interests of sharing I’ve put the code up on the Alfresco forge.

I demand the sum… …OF 1 MILLION DOLLARS!

But wait – there’s more! Not content with simply providing a framework for developing custom Web Scripts in Groovy, I decided to test out this framework by implementing a “Groovy Shell” Web Script. The idea here is that rather than having to develop and register a new Groovy Web Script each and every time I want to tinker with some Groovy code, instead the Web Script would receive the Groovy code as a parameter and execute whatever is passed to it.

Before we go any further, I should mention one very important thing: this opens up a massive script-injection-attack hole in Alfresco, and as a result this Web Script should NOT be used in any environment where data loss (or worse!) is unacceptable!! It is trivial to upload a script that does extremely nasty things to the machine hosting Alfresco (including, but by no means limited to, formatting all drives attached to the system) so please be extremely cautious about where this Web Script gets deployed!

Getting back on track, I accomplished this using Groovy’s GroovyShell class to evaluate a form POSTed parameter to the Web Script as Groovy code (this is conceptually identical to JavaScript’s “eval” function, hence the warning about injection attacks). Effectively we have a Groovy-backed Web Script that interprets an input parameter as Groovy code, and then goes ahead and dynamically executes it! It’s turtles all the way down!

The code also transforms the output of the script into JSON format, since there are existing Java libraries for transforming arbitrary object graphs (as would be returned by an arbitrary Groovy script) into JSON format.

Here’s a screenshot showing the end result:

Alfresco Groovy Shell - Vanilla Groovy Script

The more observant reader will have noticed the notes in the top right corner, particularly the note referring to a “serviceRegistry” object. Before evaluating the script, the Web Script injects the all important Alfresco ServiceRegistry object into the execution context of the script, in a Groovy variable called “serviceRegistry”. The reason for doing so is obvious – this allows the script to interrogate and manipulate the Alfresco repository:

Alfresco Groovy Shell - Groovy Script that Interrogates the Alfresco Repository

Sharks with lasers strapped to their heads!

Now if you look carefully at this script, you’ll notice that it (mostly) looks like Java, and this is where the value of this Groovy Shell Web Script starts to become apparent: because most valid Java code is also valid Groovy code, you can use this Web Script to prototype Java code that interacts with the Alfresco repository, without going through the usual Java rigmarole of compiling, packaging, deploying and restarting!

I recently conducted an in-depth custom code review for an Alfresco customer who had used Java extensively, and this Web Script was a godsend – not only did I eliminate the drudgery of compiling, packaging and deploying the customer’s custom code (not to mention restarting Alfresco each time), I also completely avoided the time consuming (and, let’s be honest, painful) task of trying to reverse engineer their build toolchain so that I could build the code in my environment. This alone was worth the price of admission, but coupled with the rapid turnaround on changes (the mythical “edit / test / edit / test” cycle), I was able to diagnose their issues in a much shorter time than would otherwise have been possible.

Conclusion

As always I’m keen to hear of your experiences with this project should you choose to use it, and am keen to have others join me in maintaining and enhancing the code (which is surprisingly little, once all’s said and done).


[1] Technically Groovy does not have an interpreter; rather it compiles source scripts into JVM bytecode on demand. The net effect for the developer however is the same – the developer doesn’t have to build, package or deploy their code prior to execution – a serious productivity boost.

Version Baselining

Tuesday, November 3rd, 2009

One of the great things about working with Alfresco is the vast number of extension points the system offers to developers.  Some of these stem from the pervasive use of the Spring framework, some from a well-thought-out application architecture, and many from a number of guiding principles that are consistently applied even when their potential uses aren’t necessarily known with certainty ahead of time.

I recently had the pleasure of being reminded of this latter case when a customer asked for an extension that allowed their content contributors to control the “baseline” version number of documents in their Alfresco installation.  The idea was to allow their contributors to (optionally) enter a version number along with each document, and have the Alfresco versioning system start with that version number instead of the default of 1.0.

Although I didn’t know how this might be achieved, in less than 10 minutes I had my answer and it relied on a slight variation of a mechanism that I’d used in the past.  The customer was also gracious enough to release the IP, so I’ve made the initial version of the extension available on Google Code.

Here is a brief overview of its usage:

This extension works by extending Alfresco with a custom content type called “Version Baselined Content” that includes a single property called “Base Version”.  This property is where the content contributor can set the base version to be used if/when versioning is enabled on the document.

In order to create content of this type, “Version Baselined Content” needs to be selected in the “Type” dropdown of the “Add Content Dialog”:

Provided the “Modify all properties when this page closes” checkbox is left checked (the default), the contributor will then be presented with the option to specify the base version number for this document (if/when versioning is enabled):

The default value for this field is “0.1” – if the contributor elects to skip modification of the new content’s properties, this is the base version number it will be assigned automatically.

The base version number must be a valid non-negative decimal number (i.e. it must be a number greater than or equal to 0.0).  If an invalid value is entered, an error will be displayed when the user clicks the “OK” button.
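The validation rule above is simple enough to sketch in standalone Java. This is a guess at the shape of the check rather than the extension’s actual code; the class BaseVersionValidator is hypothetical, and it relies on java.math.BigDecimal to decide what counts as a valid decimal number (which, for instance, also accepts exponent notation).

```java
import java.math.BigDecimal;

/** Hypothetical sketch of the "non-negative decimal number" rule. */
final class BaseVersionValidator {

    /** Parses the contributor's input, or throws if it is not a number >= 0.0. */
    static BigDecimal parse(String input) {
        if (input == null) {
            throw new IllegalArgumentException("Base version is required");
        }
        final BigDecimal value;
        try {
            value = new BigDecimal(input.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a decimal number: " + input, e);
        }
        if (value.signum() < 0) {
            throw new IllegalArgumentException("Base version must not be negative: " + input);
        }
        return value;
    }
}
```

In this sketch BaseVersionValidator.parse("0.1") and parse("0.0") succeed, while parse("-1.0") and parse("abc") throw an IllegalArgumentException, which is where a UI would display its error.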

Once the version number is populated, it may be edited via the document’s properties as many times as are necessary, up until the time versioning is enabled for the document:

Once versioning is enabled for the document, the initial version number will be set to the value of the “Base Version Number” property at that time:

From this point on, any modifications to the “Base Version Number” property will be ignored as it is not possible to renumber an existing Alfresco version history.

Other than allowing explicit control over the initial version number for a document, this extension does not change any other versioning behavior in the system.  For example creating a new minor revision of a document (via checkout and checkin) will increment the version number by 0.1.  Similarly, creating a new major revision of a document (via checkout and checkin) will increment the major component of the version number by 1, and set the minor component to 0:
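As a rough illustration of the minor and major increments just described, here is a small Java sketch. It assumes the version label is a “major.minor” pair of integers, so the step past x.9 (to x.10) is a property of this sketch rather than something stated above; the class VersionLabelSketch is hypothetical and is not Alfresco’s versioning code.

```java
/** Hypothetical sketch of minor/major version increments on a "major.minor" label. */
final class VersionLabelSketch {
    private final int major;
    private final int minor;

    VersionLabelSketch(int major, int minor) {
        if (major < 0 || minor < 0) {
            throw new IllegalArgumentException("Version components must be non-negative");
        }
        this.major = major;
        this.minor = minor;
    }

    /** Parses labels such as "0.1" or "2.0". */
    static VersionLabelSketch parse(String label) {
        String[] parts = label.trim().split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected major.minor but got: " + label);
        }
        return new VersionLabelSketch(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    /** Minor revision: bump the minor component, keep the major component. */
    VersionLabelSketch nextMinor() {
        return new VersionLabelSketch(major, minor + 1);
    }

    /** Major revision: bump the major component and reset the minor component to 0. */
    VersionLabelSketch nextMajor() {
        return new VersionLabelSketch(major + 1, 0);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}
```

Starting from the default base version of 0.1, a minor revision gives 0.2 in this sketch and a subsequent major revision gives 1.0.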

While the extension is quite neat and (due to the generosity of the customer) available for anyone to use, refine and extend, what really grabbed me as I developed it was how, despite having no prior experience with this particular extension point, it was familiar enough that I was able to understand it almost immediately and leverage it to achieve the desired goal.

Bulk Import from a Filesystem

Thursday, October 22nd, 2009

The Use Case

In any CMS implementation an almost ubiquitous requirement is to load existing content into the new system. That content may reside in a legacy CMS, on a shared network drive, on individual users’ hard drives or in email, but the requirement is almost always there – to inventory the content that’s out there and bring some or all of it into the CMS with a minimum of effort.

Alfresco provides several mechanisms that can be used to import content, including:

Alfresco is also fortunate to have SI partners such as Technology Services Group who provide specialised content migration services and tools (their open source OpenMigrate tool has proven to be popular amongst Alfresco implementers).

That said, most of these approaches suffer from one or more of the following limitations:

  • They require the content to be massaged into some other format prior to ingestion.
  • Orchestration of the ingestion process is performed externally (i.e. out-of-process) to Alfresco, resulting in excessive chattiness between the orchestrator and Alfresco.
  • They require development or configuration work.
  • They’re more general in nature, and so aren’t as performant as a specialised solution.

An Opinionated (but High Performance!) Alternative

For that reason I recently set about implementing a bulk filesystem import tool that focuses on satisfying a single, highly specific use case in the most performant manner possible: to take a set of folders and files on local disk and load them into the repository as quickly and efficiently as possible.

The key assumption that allows this process to be efficient is that the source folders and files must be on disk that is locally accessible to the Alfresco server – typically this will mean a filesystem that is located on a hard drive physically housed in the server Alfresco is running on.  This allows the code to directly stream from disk into the repository, which basically devolves into disk-to-disk streaming – far more efficient than any kind of mechanism that requires network I/O.

How those folders and files got onto the local disk is left as an exercise for the reader, but most OSes provide efficient mechanisms for transferring files across a network (rsync and robocopy, for example).  Alternatively it’s also possible to mount a remote filesystem using an OS-native mechanism (CIFS, NFS, GFS and the like), although doing so reintroduces network I/O overhead.

Another key differentiator of this solution is that all of the logic for ingestion executes in-process within Alfresco.  This completely eliminates expensive network RPCs while ingestion is occurring, and also provides fine grained control of various expensive operations (such as transaction commits / rollbacks).

Which leads into another advantage of this solution: as with most transactional systems, there are some general strategies that should be followed when writing large amounts of data into the Alfresco repository:

  1. Break up large volumes of writes into multiple batches – long running transactions are problematic for most transactional systems (including Alfresco).
  2. Avoid updating the same objects from different concurrent transactions.  In the case of Alfresco, this is particularly noticeable when writing content into the same folder, as those writes cause updates to the parent folder’s modification timestamp. [EDIT] In recent versions of Alfresco, the automatic update of a folder’s modification timestamp (cm:modified property) has been disabled by default. It can be turned back on (by setting the property “system.enableTimestampPropagation” to true), but since the default is false this is likely to have less impact on bulk ingestion than I’d originally thought.

The bulk filesystem import tool implements both of these strategies (something that is not easily accomplished when ingestion is coordinated by a separate process).  It batches the source content by folder, using a separate transaction per folder, and it also breaks up any folder containing more than a specific number of files (1,000 by default) into multiple transactions.  It also creates all of the children of a given folder (both files and sub-folders) as part of the same transaction, so that indirect updates to the parent folder occur from that single transaction.
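The per-folder batching described above boils down to splitting a folder’s list of children into fixed-size chunks and writing each chunk in its own transaction. The Java below is a minimal sketch of that splitting step only; the class FolderBatcher and its constant are hypothetical names, and the tool’s real implementation may well differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split one folder's children into per-transaction batches. */
final class FolderBatcher {
    // Mirrors the default of 1,000 files per transaction mentioned above.
    static final int DEFAULT_BATCH_SIZE = 1000;

    /** Returns consecutive batches of at most batchSize items, in the original order. */
    static <T> List<List<T>> split(List<T> children, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < children.size(); start += batchSize) {
            int end = Math.min(start + batchSize, children.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(children.subList(start, end)));
        }
        return batches;
    }
}
```

With the default size, a folder of 2,500 files yields batches of 1,000, 1,000 and 500, while a folder of 1,000 files or fewer stays in a single batch; the caller would then wrap the creation of each batch in one repository transaction.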

But What Does this Mean in Real Life?

The benefit of this approach was demonstrated recently when an Alfresco implementation had a bulk ingestion process that regularly loaded large numbers (1,000s) of large image files (several MBs per file) into the repository via CIFS.  In one test, it took approximately an hour to load 1,500 files into the repository via CIFS.  In contrast the bulk filesystem import tool took less than 5 minutes to ingest the same content set.

Now clearly this ignores the time it took to copy the 1,500 files onto the Alfresco server’s hard drive prior to running the bulk filesystem import tool, but in this case it was possible to modify the sourcing process so that it dropped the content directly onto the Alfresco server’s hard drive, providing a substantial (order of magnitude) overall saving.

What Doesn’t it Do (Yet)?

Despite already being in use in production, this tool is not what I would consider complete.  The issue tracker in the Google Code project has details on the functionality that’s currently missing; the most notable gap being the lack of support for population of metadata (folders are created as cm:folder and files are created as cm:content). [EDIT] v0.5 adds a first cut at metadata import functionality.  The “user experience” (I hesitate to call it that) is also very rough and could easily be substantially improved. [EDIT] v0.4 added several UI Web Scripts that significantly improve the usability of the tool (at least for the target audience: Alfresco developers and administrators).

That said, the core logic is sound, and has been in production use for some time.  You may find that it’s worth investigating even in its currently rough state.

[POST EDIT] This tool seems to have attracted quite a bit of interest amongst the Alfresco implementer community. I’m chuffed that that’s the case and would request that any questions or comments you have be raised on the mailing list.  If you believe you’ve found a bug, or wish to request an enhancement to the tool, the issue tracker is the best place. Thanks!



© 2012 Alfresco Software, Inc. All Rights Reserved.