
Alfresco and Groovy, Baby!

Thursday, August 19th, 2010

For quite a few years now I’ve been a fan of scripted languages that run on the JVM: I started by experimenting with the venerable BeanShell, then tinkered with JavaScript (via Rhino) and JRuby, and finally discovered Groovy in late 2007. A significant advantage Groovy has over most of those other languages (with the possible exception of BeanShell) is that it is basically a superset of Java, so most valid Java code is also valid Groovy code and can therefore be executed by the Groovy “interpreter”¹ without requiring compilation, packaging or deployment – three things that significantly drag down one’s productivity with “real” Java.

To that end, I decided to see whether Alfresco Web Scripts could be implemented in Groovy, in the hope of gaining access to the powerful Alfresco Java APIs while keeping all of the productivity benefits of working in a scripting-like, interpreted environment.

It turns out that the Spring Framework (a central part of Alfresco) moved in this direction some time ago, with support for what it calls dynamic-language-backed beans. Given that a Java-backed Web Script is little more than a Spring bean plus a descriptor and some view templates, it initially seemed that Groovy-backed Web Scripts might already be possible in Alfresco, simply by adding the Groovy runtime JAR to the Alfresco classpath and then configuring a Java-backed Web Script with a dynamic-language-backed Spring bean.

Oh behave!

Unfortunately this approach ran into one small snag: Alfresco requires that Java Web Script beans have a “parent” of “webscript”, as follows:

  <bean id="webscript.my.web.script.get"
        class="com.acme.MyWebScript"
        parent="webscript">
    <constructor-arg index="0" ref="ServiceRegistry" />
  </bean>

but Spring doesn’t allow dynamic-language-backed beans to have a “parent” attribute.

It’s freedom baby, yeah!

There are several ways to work around this issue, but the simplest was to implement a “proxy” Web Script bean in Java that just delegates to another Spring bean, which can itself be a dynamic-language-backed bean implemented in any of the dynamic languages Spring supports.

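This class ends up looking something like this (comments trimmed in the interest of brevity; the imports assume Alfresco 3.3 or later, where the Web Script runtime lives in Spring Surf’s org.springframework.extensions.webscripts package):

    import java.util.Map;

    import org.springframework.extensions.webscripts.Cache;
    import org.springframework.extensions.webscripts.DeclarativeWebScript;
    import org.springframework.extensions.webscripts.Status;
    import org.springframework.extensions.webscripts.WebScriptRequest;

    // Plain Java "proxy" Web Script: it can be declared with parent="webscript",
    // and forwards each request to the dynamic-language-backed bean.
    public class DelegatingWebScript
        extends DeclarativeWebScript
    {
        private final DynamicDeclarativeWebScript dynamicWebScript;

        public DelegatingWebScript(final DynamicDeclarativeWebScript dynamicWebScript)
        {
            this.dynamicWebScript = dynamicWebScript;
        }

        @Override
        protected Map<String, Object> executeImpl(WebScriptRequest request, Status status, Cache cache)
        {
            // The returned map is the model handed to the Web Script's view templates.
            return dynamicWebScript.execute(request, status, cache);
        }
    }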

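The DynamicDeclarativeWebScript interface, in turn, looks something like this:

    import java.util.Map;
    import org.springframework.extensions.webscripts.*;

    public interface DynamicDeclarativeWebScript
    {
        Map<String, Object> execute(WebScriptRequest request, Status status, Cache cache);
    }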

This Java interface defines the API that the Groovy code must implement so that DelegatingWebScript can delegate to it when the Web Script is invoked.
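To see the delegation in isolation, here is a minimal, self-contained sketch that runs without Alfresco or Spring. It is an illustration under simplifying assumptions, not the project’s code: the request, status and cache parameters are collapsed into a hypothetical map of request parameters, and GreetingScript is a hypothetical stand-in for the Groovy class Spring would load from the script file.

```java
import java.util.HashMap;
import java.util.Map;

public class DelegationSketch {

    // Hypothetical, simplified stand-in for DynamicDeclarativeWebScript:
    // request parameters in, view model out.
    interface DynamicScript {
        Map<String, Object> execute(Map<String, String> requestParams);
    }

    // Stand-in for DelegatingWebScript: a fixed Java class the container can
    // configure (e.g. with a parent bean), which forwards every call unchanged.
    static final class DelegatingScript {
        private final DynamicScript delegate;

        DelegatingScript(DynamicScript delegate) {
            this.delegate = delegate;
        }

        Map<String, Object> executeImpl(Map<String, String> requestParams) {
            return delegate.execute(requestParams);
        }
    }

    // Hypothetical implementation; in the real setup this would be the
    // Groovy class behind the dynamic-language-backed bean.
    static final class GreetingScript implements DynamicScript {
        @Override
        public Map<String, Object> execute(Map<String, String> requestParams) {
            Map<String, Object> model = new HashMap<>();
            model.put("greeting", "Hello, " + requestParams.getOrDefault("name", "world"));
            return model;
        }
    }

    public static void main(String[] args) {
        DelegatingScript webScript = new DelegatingScript(new GreetingScript());
        Map<String, Object> model = webScript.executeImpl(Map.of("name", "Alfresco"));
        System.out.println(model.get("greeting")); // prints Hello, Alfresco
    }
}
```

Because the proxy depends only on the interface, the Java side never needs to change when the Groovy implementation does; picking up edits to the script file is handled by Spring’s refreshable-bean support, configured via the refresh-check-delay attribute.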

The net effect of all this is that a Web Script can now be implemented in Groovy (or any other dynamic language Spring supports for beans) by implementing the DynamicDeclarativeWebScript interface in a Groovy class, declaring a Spring bean backed by the script file containing that class, and then configuring a new DelegatingWebScript instance with that dynamic bean. This may sound complicated, but as the following example shows, it is pretty straightforward:

  <lang:groovy id="groovy.myWebScript"
               refresh-check-delay="5000"
               script-source="classpath:alfresco/extension/groovy/MyWebScript.groovy">
    <lang:property name="serviceRegistry" ref="ServiceRegistry" />
  </lang:groovy>

  <bean id="webscript.groovy.myWebScript"
        class="org.alfresco.extension.webscripts.groovy.DynamicDelegatingWebScript"
        parent="webscript">
    <constructor-arg index="0" ref="groovy.myWebScript" />
  </bean>

While this was a little more work than I’d expected, the approach meets my goal of being able to write Groovy-backed Web Scripts, and in the interest of sharing I’ve put the code up on the Alfresco Forge.

I demand the sum… …OF 1 MILLION DOLLARS!

But wait – there’s more! Not content with simply providing a framework for developing custom Web Scripts in Groovy, I decided to test the framework by implementing a “Groovy Shell” Web Script. The idea is that, rather than developing and registering a new Groovy Web Script every time I want to tinker with some Groovy code, I can use a single Web Script that receives Groovy code as a parameter and executes whatever is passed to it.

Before we go any further, I should mention one very important thing: this Web Script opens up a massive script-injection hole in Alfresco, and as a result it should NOT be used in any environment where data loss (or worse!) is unacceptable! It is trivial to submit a script that does extremely nasty things to the machine hosting Alfresco (including, but by no means limited to, formatting all drives attached to the system), so please be extremely cautious about where this Web Script gets deployed!

Getting back on track, I accomplished this by using Groovy’s GroovyShell class to evaluate a form parameter POSTed to the Web Script as Groovy code (this is conceptually identical to JavaScript’s “eval” function, hence the warning about injection attacks). Effectively we have a Groovy-backed Web Script that interprets an input parameter as Groovy code and then dynamically executes it. It’s turtles all the way down!

The Web Script also converts the script’s return value into JSON, since existing Java libraries can serialise arbitrary object graphs (such as whatever an arbitrary Groovy script might return) to JSON.
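The conversion itself is delegated to an existing library. Purely to illustrate what turning an arbitrary object graph into JSON involves, here is a small hand-rolled sketch in plain Java; it is a stand-in written for this illustration, not the library the Web Script uses. It handles maps, collections, arrays, strings, numbers, booleans and null, falls back to toString() for anything else, and does not detect cycles.

```java
import java.lang.reflect.Array;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class JsonSketch {

    // Recursively serialises a script's return value into JSON text.
    // Cyclic object graphs are not detected and would overflow the stack.
    public static String toJson(Object value) {
        StringBuilder out = new StringBuilder();
        write(value, out);
        return out.toString();
    }

    private static void write(Object value, StringBuilder out) {
        if (value == null) {
            out.append("null");
        } else if (value instanceof Boolean || value instanceof Number) {
            out.append(value); // NaN and Infinity would produce invalid JSON here
        } else if (value instanceof Map) {
            out.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    out.append(',');
                }
                first = false;
                writeString(String.valueOf(entry.getKey()), out); // JSON keys must be strings
                out.append(':');
                write(entry.getValue(), out);
            }
            out.append('}');
        } else if (value instanceof Collection) {
            out.append('[');
            boolean first = true;
            for (Object element : (Collection<?>) value) {
                if (!first) {
                    out.append(',');
                }
                first = false;
                write(element, out);
            }
            out.append(']');
        } else if (value.getClass().isArray()) {
            out.append('[');
            for (int i = 0; i < Array.getLength(value); i++) {
                if (i > 0) {
                    out.append(',');
                }
                write(Array.get(value, i), out); // boxes primitive elements
            }
            out.append(']');
        } else {
            writeString(value.toString(), out); // strings and everything else
        }
    }

    private static void writeString(String s, StringBuilder out) {
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c)); // other control chars
                    } else {
                        out.append(c);
                    }
            }
        }
        out.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> result = new LinkedHashMap<>(); // keeps insertion order
        result.put("count", 2);
        result.put("names", List.of("a", "b"));
        result.put("ok", true);
        System.out.println(toJson(result)); // prints {"count":2,"names":["a","b"],"ok":true}
    }
}
```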

Here’s a screenshot showing the end result:

[Screenshot: Alfresco Groovy Shell - Vanilla Groovy Script]

The more observant reader will have noticed the notes in the top right corner, particularly the one referring to a “serviceRegistry” object. Before evaluating the script, the Web Script injects the all-important Alfresco ServiceRegistry object into the script’s execution context as a Groovy variable called “serviceRegistry”. The reason for doing so is obvious – it allows the script to interrogate and manipulate the Alfresco repository:

[Screenshot: Alfresco Groovy Shell - Groovy Script that Interrogates the Alfresco Repository]

Sharks with lasers strapped to their heads!

Now if you look carefully at this script, you’ll notice that it (mostly) looks like Java, and this is where the value of this Groovy Shell Web Script starts to become apparent: because most valid Java code is also valid Groovy code, you can use this Web Script to prototype Java code that interacts with the Alfresco repository, without going through the usual Java rigmarole of compiling, packaging, deploying and restarting!

I recently conducted an in-depth review of custom code for an Alfresco customer who had used Java extensively, and this Web Script was a godsend. Not only did I eliminate the drudgery of compiling, packaging and deploying the customer’s custom code (not to mention restarting Alfresco each time), I also completely avoided the time-consuming (and, let’s be honest, painful) task of reverse engineering their build toolchain so that I could build the code in my environment. That alone was worth the price of admission, and combined with the rapid turnaround on changes (the mythical “edit / test / edit / test” cycle), it let me diagnose their issues in far less time than would otherwise have been possible.

Conclusion

As always, I’d love to hear about your experiences with this project should you choose to use it, and I’m keen to have others join me in maintaining and enhancing the code (of which there is surprisingly little, once all’s said and done).


¹ Technically, Groovy does not have an interpreter; rather, it compiles source scripts into JVM bytecode on demand. The net effect for the developer, however, is the same: there is no need to build, package or deploy code prior to execution, which is a serious productivity boost.

Implementing “DocFlip” for FSRs

Thursday, October 30th, 2008

In my previous post I discussed how File System Receivers (FSRs) implement deployment transactions on top of non-transactional filesystems. As discussed in that post, there is a window of time, while the FSR is in the middle of its commit phase, in which an application reading the content could see an inconsistent state. The duration of this window varies with a number of factors, but in some cases it’s critical to minimise it, and in those cases a technique called “docflip” can help.

I first heard about “docflip” almost 10 years ago, and have seen it in use several times since then.  The basic approach is relatively simple:

  1. Two full copies of the target directory are maintained.
  2. A symlink is used that points to one of these directories.  All applications that are reading content use this symlink exclusively (they are unaware of the two underlying directories).
  3. At any point in time:
    1. One of the directories (the one pointed to by the symlink) is the “live” copy.
    2. The other directory (that is not pointed to by anything) is the “shadow” copy.
  4. A transaction involves:
    1. Writing all of the changes to the shadow copy.
    2. Either committing the transaction, which involves:
      1. Flipping the symlink from the current live directory to the (newly updated) shadow directory, effectively swapping which directory is live and which is the shadow.
      2. Re-running step 4.1 against the (new) shadow directory (the directory that was live up until step 4.2.1) – this can also be achieved by simply rsyncing from the (new) live to the (new) shadow directory, if rerunning the original set of content modifications is too difficult or expensive.
    3. Or rolling back the transaction, which involves replacing the (partially updated) shadow directory with the contents of the current live directory, without touching the symlink at all.
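The flip in step 4.2.1 is the heart of the technique, so here is a minimal Java sketch of it using java.nio.file. It assumes a POSIX filesystem (Linux or similar) and uses hypothetical names throughout: two copies called a and b under a common root, and a symlink called live that all readers go through. Rather than deleting and recreating the symlink, which would briefly leave readers with no live link at all, it creates the new link under a temporary name and renames it over the old one; POSIX rename(2) replaces the old link atomically, so a reader resolving the link sees either the old copy or the new one. Writing changes to the shadow, re-syncing the new shadow and rollback are left to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class DocFlip {

    private final Path liveLink; // the symlink every reader goes through
    private final Path dirA;     // hypothetical copy #1
    private final Path dirB;     // hypothetical copy #2

    public DocFlip(Path root) {
        this.liveLink = root.resolve("live");
        this.dirA = root.resolve("a");
        this.dirB = root.resolve("b");
    }

    /** Creates both copies and points the live symlink at copy "a". */
    public void init() throws IOException {
        Files.createDirectories(dirA);
        Files.createDirectories(dirB);
        Files.createSymbolicLink(liveLink, dirA.getFileName()); // relative link target
    }

    /** The copy the symlink currently points to. */
    public Path live() throws IOException {
        return liveLink.resolveSibling(Files.readSymbolicLink(liveLink));
    }

    /** The copy nothing points to; step 4.1 writes its changes here. */
    public Path shadow() throws IOException {
        return live().equals(dirA) ? dirB : dirA;
    }

    /** Step 4.2.1: make the shadow copy live by renaming a fresh link over the old one. */
    public void flip() throws IOException {
        Path newLive = shadow();
        Path tmpLink = liveLink.resolveSibling("live.tmp");
        Files.deleteIfExists(tmpLink); // leftover from an interrupted flip, if any
        Files.createSymbolicLink(tmpLink, newLive.getFileName());
        // On Unix-like systems this is rename(2), which replaces "live" atomically:
        // a reader resolving the link sees either the old copy or the new one.
        Files.move(tmpLink, liveLink, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("docflip");
        DocFlip docflip = new DocFlip(root);
        docflip.init();
        Files.writeString(docflip.shadow().resolve("index.html"), "v2"); // step 4.1
        docflip.flip();                                                  // step 4.2.1
        System.out.println(Files.readString(root.resolve("live/index.html"))); // prints v2
        // Step 4.2.2 (bringing the new shadow up to date) and rollback (step 4.3)
        // are not shown; both amount to syncing the shadow from the live copy.
    }
}
```

Files.move with ATOMIC_MOVE maps onto rename(2) on Unix-like systems; its Javadoc leaves replacement of an existing target implementation-specific, which is one more reason this sketch is not portable to Windows.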

Note that there are some downsides to this approach, including:

  • It requires two full copies of the target directory, which can be problematic with large content sets.
  • It assumes that applications don’t keep files open for extended periods of time – updates to a file are only visible when that file is (re)opened.
  • It doesn’t work very well on Windows platforms due to Windows’ unfortunate choice of using fully qualified paths for file handles instead of inodes, making it impossible to flip the symlink / junction if any files are currently held open by an application.

Regardless, “docflip” greatly reduces the window of time in which the filesystem is in an inconsistent state – basically to the time it takes to rewrite a symlink. That said, it doesn’t completely eliminate non-repeatable reads: an application could read a file before a transaction, the transaction could then commit (flipping the symlink), and when the application re-reads the file afterwards, the file has changed. However, without introducing read transactions (which would require changes to the applications reading the filesystem, along with some kind of transaction coordinator), it’s probably impossible to obtain serialisable isolation on a non-transactional filesystem.

So now that we have a technique for minimising the inconsistency window, how would it be implemented with an Alfresco FSR?

Without enhancing the FSR in any way, the approach I’ve considered involves:

  1. Having three copies of the target filesystem – one managed by the FSR, the other two (the live and shadow copies) managed by the custom “docflip” process. As with vanilla “docflip”, a symlink would point to the currently live copy of the content, and all applications reading the content would read via that symlink.
    • It’s not possible to use the FSR’s own target directory as one of the live / shadow directories, since that would require the FSR to be dynamically reconfigured so that it always writes to the shadow copy (which changes with every flip of the symlink).
  2. Configuring a ProgramRunnable that calls a “docflip” shell script. This shell script:
    1. Replicates the deployed delta from the FSR target directory to the shadow copy.
    2. Commits the transaction by flipping the symlink (i.e. swaps the shadow and live copies).
    3. Re-replicates the deployed delta from the FSR target directory to the (new) shadow copy.
  3. Rollback doesn’t need to be considered, since by the time the ProgramRunnable is invoked, the FSR has already committed the deployed content to its target directory. The only concern would be if step 2.3 fails – that would need to raise a critical administration alert of some kind, since manual intervention would be required to avoid throwing all subsequent deployments into disarray. Forcibly shutting down the FSR in this case might be justified, just to ensure that no further deployments can occur until the issue is resolved.

Replicating the changes made to the FSR’s target directory to the “docflip” directories (steps 2.1 and 2.3) could be done in a number of ways, including:

  1. Brute-force rsync of the entire target directory.
  2. Directed rsync, using the manifest of changes that the ProgramRunnable sends to the shell script.
  3. Interpreting the manifest of changes that the ProgramRunnable sends to the shell script and executing equivalent cp / rm / mkdir / rmdir commands.
  4. Implementing the entire “docflip” process in Java instead of a shell script, and interpreting the manifest of changes directly.

These are listed, I believe, in order from least dev effort / worst performance to most dev effort / best performance. The “sweet spot” is likely to be a combination of options 2 and 3, where rsync is used for file creates / updates and rm / mkdir / rmdir are used for file deletes and directory operations. If performance trumps all else, option 4 is worth considering, possibly leveraging Java NIO and/or multithreading (being careful to preserve the order of any order-dependent operations in the manifest, e.g. create directory A, …, …, create file A/B.txt).
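Option 4 might start out something like the following Java sketch. The manifest format here is entirely hypothetical (one “OP relative/path” pair per line, with MKDIR, COPY, DELETE and RMDIR operations and paths relative to the FSR target directory); the manifest the ProgramRunnable actually passes will look different. The sketch applies operations strictly in manifest order, which keeps order-dependent pairs such as “create directory A, then create file A/B.txt” correct, and it does not attempt any parallelism.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public final class ManifestReplicator {

    /**
     * Replays a hypothetical manifest ("OP relative/path" per line) from the
     * FSR's target directory onto one of the docflip copies, strictly in order,
     * so "MKDIR A" is always applied before "COPY A/B.txt".
     */
    public static void replay(List<String> manifest, Path fsrTarget, Path docflipCopy)
            throws IOException {
        Path copyRoot = docflipCopy.toAbsolutePath().normalize();
        for (String line : manifest) {
            if (line.isBlank()) {
                continue;
            }
            int space = line.indexOf(' ');
            if (space < 0) {
                throw new IOException("Malformed manifest line: " + line);
            }
            String op = line.substring(0, space);
            String relative = line.substring(space + 1);
            Path source = fsrTarget.resolve(relative);
            Path dest = copyRoot.resolve(relative).normalize();
            if (!dest.startsWith(copyRoot)) { // refuse paths such as "../etc"
                throw new IOException("Path escapes the docflip copy: " + relative);
            }
            switch (op) {
                case "MKDIR":
                    Files.createDirectories(dest);
                    break;
                case "COPY": // file created or updated
                    Files.copy(source, dest, StandardCopyOption.REPLACE_EXISTING,
                            StandardCopyOption.COPY_ATTRIBUTES);
                    break;
                case "DELETE":
                case "RMDIR": // fails if the directory is not yet empty
                    Files.deleteIfExists(dest);
                    break;
                default:
                    throw new IOException("Unknown manifest operation: " + op);
            }
        }
    }
}
```

Applying the operations one at a time is what keeps order-dependent steps safe; parallelising this would require grouping operations so that no file is copied before its parent directory exists and no directory is removed before its contents.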

So there you have it – a (hopefully enlightening!) exploration of the intricacies of FSR deployment, as well as ways to mitigate some of the potential concerns with the default implementation.


© 2012 Alfresco Software, Inc. All Rights Reserved.