Over the last week or so we’ve been having an interesting discussion on one of our internal mailing lists about File System Receivers (FSRs) and ACID transactions, and I thought it was worth sharing.
The question that was originally posed was quite simple:
If I have a web site where pages are typically made up of multiple assets, isn’t it possible that for a split second during FSR deployment there will be a window in which not all of those assets will be available, as they may not have been copied from the Alfresco WCM authoring environment yet?
(note: edited for brevity)
Now the short answer to this question is yes, this window does indeed exist, but to understand the duration of the window (and hence the likelihood of this situation occurring), one has to look a little deeper into how Alfresco WCM deployment works, and more importantly how the FSR attempts to implement ACID transactions on top of non-transactional filesystems.
FSRs basically implement transactional deployment via the following process:
1. All of the content modifications received from the authoring environment as part of a single deployment job are written to the depdata directory
2. If step 1 succeeds in its entirety, the transaction is committed, which involves replicating the changes across from the depdata directory to the target directory (this step is assumed to succeed - it’s conceptually the same as the commit phase in the two-phase commit protocol)
3. If step 1 fails at some point, the transaction is rolled back, which involves resetting the depdata directory and leaving the target directory untouched
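To make the stage / commit / rollback flow above concrete, here is a minimal Python sketch. It is not the FSR's actual code: the `deploy` function, the dict of path-to-bytes "changes" and the use of `None` to simulate a failed transfer are all assumptions made for illustration, and deletes are left out entirely.

```python
import shutil
from pathlib import Path


def deploy(changes, depdata, target):
    """Stage every change in depdata, then commit to target or roll back.

    `changes` maps a relative path to its new bytes. A value of None
    simulates a transfer failure (hypothetical, for illustration only).
    """
    depdata, target = Path(depdata), Path(target)
    try:
        # Stage: write every received modification to the staging area.
        for rel_path, content in changes.items():
            if content is None:
                raise OSError(f"transfer of {rel_path} failed")
            staged = depdata / rel_path
            staged.parent.mkdir(parents=True, exist_ok=True)
            staged.write_bytes(content)
    except OSError:
        # Rollback: reset the staging area, leave the target untouched.
        shutil.rmtree(depdata, ignore_errors=True)
        depdata.mkdir(parents=True, exist_ok=True)
        return False

    # Commit: replicate the staged files into the target directory.
    # This loop is the window in which readers can see a mix of versions.
    for staged in sorted(p for p in depdata.rglob("*") if p.is_file()):
        dest = target / staged.relative_to(depdata)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(staged, dest)

    # Clear the staging area ready for the next deployment job.
    shutil.rmtree(depdata)
    depdata.mkdir()
    return True
```

Note that only local disk I/O happens inside the commit loop; everything that depends on the network has already finished by the time staging completes.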
Now in terms of ACIDity, this process:
- Is Atomic - all changes either appear in the target directory or none of them do
- Is (mostly*) Consistent - prior to step 2 the target directory contains the content from the previously deployed snapshot, and after step 2 it contains the content from the newly deployed snapshot, both of which are consistent states for the target directory to be in
- Is (mostly*) Isolated - the only process modifying the target directory is the FSR, and read accesses do not require isolation from one another
- Is Durable - to the extent that the underlying filesystem is durable
* While the commit phase is in progress, other applications (web servers, app servers, etc.) could read multiple assets that are inconsistent with one another (one is the version prior to deployment while the other is the version post deployment).
So an accurate answer to the original question is yes, there is a window in which an inconsistent state could be read, but that window is limited to the commit phase of the FSR. In database terms, FSR transactions allow both non-repeatable reads and phantom reads of the target directory while the commit phase is in progress.
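A toy simulation shows why a reader can see mixed versions partway through the commit. The target directory is modelled here as a plain dict and the commit as a generator that pauses after each file. Both are simplifications I've assumed for illustration, not how the FSR is written.

```python
def commit(staged, target):
    """Apply staged files to the target one at a time, pausing after each."""
    for name, content in staged.items():
        target[name] = content
        yield name


def observe_commit(target, staged):
    """Record which versions a reader would see before and after each copy."""
    seen = [sorted(set(target.values()))]
    for _ in commit(staged, target):
        seen.append(sorted(set(target.values())))
    return seen


target = {"page.html": "v1", "style.css": "v1"}
staged = {"page.html": "v2", "style.css": "v2"}
print(observe_commit(target, staged))
# prints [['v1'], ['v1', 'v2'], ['v2']]
```

The middle observation is the inconsistent one: the page is already at the new version while its stylesheet is still at the old one. Before and after the commit, the reader only ever sees a single version.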
But how long is this window, so that we can get a sense for the risk of it occurring?
Well, the answer is that it depends on three things: the raw I/O horsepower of the disk subsystem(s) the depdata and target directories are located on, the typical size of the deployment jobs and (to a lesser extent) the makeup of those deployment jobs (file vs directory operations and creates vs updates vs deletes). The good news is that there’s no network I/O during the commit phase (at least not unless either the depdata or target directories are located on a mounted network drive - an architecture that needs to be carefully considered), so the questioner’s concerns about files “not having been copied from the Alfresco WCM authoring environment yet” are unfounded.
This suggests some crude tuning tips for servers that host FSRs:
- keep the FSR’s metadata directories on a different disk subsystem to the target directory (it’s typically faster to read from one disk subsystem while writing to another disk subsystem than to do both on the same disk subsystem)
- consider using a disk subsystem for the target directory that maximises write performance (eg. RAID 0), but balance this against the (heavy!) read load that’s likely to be coming in from the web server or app server
- consider optimising your filesystems and disk subsystem for the typical sizes of assets that you’re deploying (some filesystems perform better or worse with small files, for example)
In addition, in Alfresco v3.0 the commit phase has been optimised via the use of multi-threading and java.nio, which should reduce the duration of the commit window, particularly for large deployment jobs.
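To give a feel for what a multi-threaded commit looks like, here is a rough Python analogue. It is emphatically not Alfresco's implementation (which is Java and uses java.nio): the `parallel_commit` function, its `workers` parameter and the choice of a thread pool are my own assumptions, intended only to show the general idea of copying staged files concurrently to shorten the commit window.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_commit(depdata, target, workers=4):
    """Copy every staged file from depdata into target using a thread pool."""
    depdata, target = Path(depdata), Path(target)
    files = [p for p in depdata.rglob("*") if p.is_file()]

    # Create destination directories up front, serially, so that the
    # worker threads only ever perform file copies.
    for directory in {target / p.relative_to(depdata).parent for p in files}:
        directory.mkdir(parents=True, exist_ok=True)

    def copy_one(src):
        dest = target / src.relative_to(depdata)
        shutil.copy2(src, dest)
        return dest

    # Leaving the "with" block waits for all copies to finish, and
    # consuming the map results re-raises any error from a worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(copy_one, files))
```

Whether this actually shortens the window depends on the disk subsystem: parallel copies help most where the storage can service several requests at once, and may not help at all on a single slow spindle.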
The email conversation then moved on to ways that this window might be mitigated, but I’ll reserve that as a topic for my next post.