Fighting the Sea of Unstructured Content

Every business has a content and collaboration problem of some sort. The problem usually sounds something like this…

  • “We’re drowning in documents (or videos or images). We don’t know what we have and none of it is organized. We waste so much time and money recreating stuff that probably already exists—if we could just find it.”
  • “We’ve got serious business risk caused by people using the first thing they find instead of the right thing.”
  • “We have a process for sending stuff around to the rest of the team for review and approval, but we have no idea what’s in flight or who we’re waiting on or why.”
  • “We have teams of people from both inside and outside the organization that need to be able to work together efficiently. They need to share files, of course, but really, it’s more than that.”
  • “We’ve got business systems that generate, store, and process things like reports and images at an alarming rate.”

Content management professionals tend to assume that everyone else knows a content management problem when they see it, so they move straight to categorizing it with the acronyms they’ve grown accustomed to. But I think it’s time we realized that most people, business people as well as technical folks who don’t focus on content management, don’t care (and shouldn’t care) about the difference between Document Management, Digital Asset Management, Records Management, Web Content Management, and the rest. The pains expressed above can be boiled down to:

  • I’ve got a ton of files (I doubt they would say “content” but they might).
  • I’ve got people that produce them, sometimes collaboratively, and people that consume them.
  • I want to somehow make it easier to deal with all of this.

In a past life, we called this the “capture, organize, and share” problem and that still describes the situation today even as the amount of content, number of channels, and type of devices have exploded. It still comes down to those three things.

For many IT organizations, this is an unwieldy problem, made so by the volume of data that’s involved, security and metadata requirements that are often associated with it, and the sometimes squishy business processes that permeate the whole mess. Compounding the frustration is that the structured data approach—stick it in a relational database and slap a front-end on it—is ill-suited to address this problem.

What’s certain is that most IT organizations won’t be able to anticipate the “capture, organize, and share” needs of the business with enough specificity for a one-size-fits-all approach to be practical. The best plan is to establish an unstructured stack. Or call it a pattern of use. Or a toolkit. It doesn’t matter what you call it. The point is that, just as you standardized on a relational database, some middleware, reporting tools, and development frameworks, you can standardize on a stack for your “content and collaboration” problems.

Forming the foundation of this stack is a content repository. The content repository is purpose-built to handle unstructured data (like files and their associated metadata), and it does so much more efficiently than a traditional relational database. Just like in relational land, your stack includes more than just data persistence. There’s a whole slew of helpful services that wrap around your repository. Now you’ve got a platform.

In my mind, a platform is a set of services that developers can leverage to build applications. The services that are important vary depending on the application being built. For content-centric applications, the critical services might be a subset of this list:

  • User interface/presentation layer
  • Data model/Content model/persistence layer
  • Library services (check-in/check-out, upload/download, versioning)
  • Transactions
  • Security
  • Workflow/Business Process Management (BPM)
  • Integration hooks
  • Scheduler
  • Public API
  • Search
  • Transformation/Rendition
  • Tagging/Categorization
  • Development model (config, customize, extend, manage, deploy)

So you take the repository and wrap it with some or all of these services and you’ve got a platform. It’s a platform upon which just about any content-centric application can be built. If it seems like a lot of work, it is. The good news is that you don’t have to build either the repository or the platform yourself. It exists today.

Alfresco is an open source, standards-compliant platform for building content-centric applications. It offers functionality to address each of these services. I think that’s what got me excited about it originally. Here was this toolkit that offered all of the functionality of the so-called “leaders” in the space, but in a much more svelte package that was open source and either freely available in the community-supported edition or available at a fraction of the cost in the commercially supported edition. Thousands of others have made the same realization: You can save time and money by addressing your “capture, organize, and share” problems with Alfresco.

This has been true from our first release. Since then, major innovations have been added that make Alfresco even more useful as a platform. In terms of the services listed above, the most notable enhancements have been in the areas of Public API, Integration Hooks, and Workflow/BPM. Let’s take a look at the specifics.

Web Scripts (Integration hooks, Public API, Development Model)

By far, the most critical addition to the platform has been the Web Script Framework. Now part of Spring, the Web Script Framework provides developers with an easy way to expose the repository in a RESTful way. This was perfectly timed because it happened at about the same time REST came into favor as a method of remotely interacting with a set of services compared to heavier alternatives such as SOAP or RMI.

In a nutshell, Web Scripts are a Model-View-Controller implementation based on your choice of either server-side JavaScript or Java for the controller and FreeMarker for the view. Originally, it was kind of up to you to define and configure a REST API that made sense for your application, but over time Alfresco has implemented a full-fledged REST API for the most common types of interactions with the repository. Now you have a choice: You can create new Web Scripts to implement your own API, make calls to Alfresco’s out-of-the-box REST API, extend Alfresco’s existing web scripts to change their behavior, or use some mix of all three.
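To make that division of labor concrete, here is a minimal sketch of the same Model-View-Controller shape. It is written in Python purely for illustration: real Web Scripts use server-side JavaScript or Java for the controller, FreeMarker for the view, and a descriptor file to bind them to a URL. Every name here (DESCRIPTOR, VIEW, controller, dispatch, and the /example/greeting URL) is hypothetical and not part of any Alfresco API.

```python
from string import Template

# Hypothetical descriptor: binds an HTTP method and URL to this script.
# In a real Web Script this role is played by a descriptor file.
DESCRIPTOR = {"method": "GET", "url": "/example/greeting"}

# View: a plain-text template standing in for a FreeMarker template.
VIEW = Template("Hello, $name! You sent $count argument(s).")


def controller(args):
    """Controller: read the request arguments and populate the model."""
    return {"name": args.get("name", "world"), "count": len(args)}


def dispatch(method, url, args):
    """Stand-in for the framework: match the request, run the controller,
    then render the view with the model the controller produced."""
    if (method, url) != (DESCRIPTOR["method"], DESCRIPTOR["url"]):
        return 404, ""
    model = controller(args)
    return 200, VIEW.substitute(model)


status, body = dispatch("GET", "/example/greeting", {"name": "Ada"})
print(status, body)
```

The point of the pattern is that the controller never formats output and the view never touches the repository, so either half can be swapped without disturbing the other.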

Web Scripts are fun and fast to develop. And it is a development model that should be easy for your team to pick up. The pattern is extremely common and well-known to most developers.

CMIS (Integration Hooks, Public API)

Before SQL was standardized, query languages were specific to the database an app sat on top of. Once standardization happened, developers could write software that, with a certain amount of care, was pretty much guaranteed to run on any compliant back-end. The Content Management Interoperability Services (CMIS) standard seeks to do that for rich content repositories. It is a vendor-neutral, language-neutral standard for working with Documents, Folders, ACLs, Types, and Relationships that exist in a repository. Theoretically, developers can write their application against the CMIS API and it will work against other CMIS-compliant repositories.
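The SQL comparison is more than an analogy, because CMIS defines its own SQL-like query language. The sketch below only assembles a query string and does not talk to a server. The helper names (escape_literal, build_cmis_query) are my own, and the escaping rule (backslash before single quotes and backslashes) is my reading of the CMIS specification, so verify it against the repository you target.

```python
def escape_literal(value):
    """Backslash-escape backslashes and single quotes for a CMIS string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_cmis_query(type_id, columns, name_contains=None):
    """Assemble a CMIS query statement over a type such as cmis:document.

    type_id and columns are inserted as-is, so they must come from trusted
    code, not from user input.
    """
    statement = "SELECT {} FROM {}".format(", ".join(columns), type_id)
    if name_contains is not None:
        # LIKE with % wildcards matches the text anywhere in the name.
        statement += " WHERE cmis:name LIKE '%{}%'".format(
            escape_literal(name_contains)
        )
    return statement


query = build_cmis_query("cmis:document", ["cmis:name", "cmis:objectId"], "report")
print(query)
```

A CMIS client library, such as one of those from Apache Chemistry, would send a statement like this to the repository. Because the statement uses only standard types and properties, it should in principle work unchanged against any compliant server.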

Alfresco was first to production with a fully-compliant CMIS server. And we continue to invest heavily in the standard through our involvement on the OASIS Technical Committee and through our leadership on the Apache Chemistry project, home to the CMIS reference server implementation and several client libraries, including Java, PHP, .NET, and Python. In short, we want to be the leader when it comes to fast, scalable, standards-compliant CMIS servers.

When people ask for advice on building content-centric applications on top of Alfresco, the first thing I tell them is to use CMIS as much as possible. It will save you all kinds of time. Plus, if you do decide to switch repositories at some point in the future, you might be able to salvage some or all of the front-end.

Additional file protocols: SharePoint, IMAP, and SMTP

Alfresco has supported WebDAV, FTP, and CIFS/SMB from the beginning. It was important to do that so that content creators could use the tools they were used to working with, but still be able to enjoy the security, metadata, and search benefits of storing their content in a repository like Alfresco instead of a file share or FTP server.

Over time that goal didn’t change, but the number of different tools and clients supported increased. We now support the SharePoint Protocol which means Alfresco looks like a SharePoint server to tools like Microsoft Office. The repository can also be accessed as if it were a shared folder in a mail client (IMAP) and, because every space in the repository has an email address, content creators can simply email content into the repository via our inbound SMTP support.
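As a rough illustration of the inbound email route, the sketch below builds a message with a file attached and addresses it to a space. It uses only Python's standard library and does not send anything. The addresses, the file name, and the build_inbound_message helper are all hypothetical, and how a particular server maps an address to a space depends on its configuration.

```python
from email.message import EmailMessage


def build_inbound_message(sender, space_address, filename, data):
    """Build an email carrying one file, addressed to a repository space."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = space_address  # hypothetical address assigned to a space
    msg["Subject"] = filename
    msg.set_content("Filed by email.")
    # Attach the file bytes; a generic binary content type is assumed here.
    msg.add_attachment(
        data, maintype="application", subtype="octet-stream", filename=filename
    )
    return msg


message = build_inbound_message(
    "author@example.com",
    "projects-space@alfresco.example.com",
    "status-report.pdf",
    b"placeholder bytes standing in for a real file",
)
attachments = [part.get_filename() for part in message.iter_attachments()]
print(message["To"], attachments)

# To actually deliver it, something like
# smtplib.SMTP(host, port).send_message(message) would be used,
# with host and port pointing at the server's inbound SMTP listener.
```

From the content creator's point of view nothing here is specific to a content repository, which is exactly the appeal: any mail client can do the same thing.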

Workflow: jBPM & Activiti

Ah, if I had a nickel for every time I custom-developed a finite state machine . . . Okay, I guess I’d only have about 15 cents. The point is that before workflow engines and frameworks were common, developers created their own. Even in platforms that were supposedly “good at workflow,” developers wanted something even more flexible so that the business process could change without requiring low-level code changes.
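For anyone who never had to write one, a hand-rolled state machine for a simple review-and-approve process might look something like the sketch below. The states, actions, and the ReviewWorkflow class are invented for illustration and do not correspond to any engine's API.

```python
# Allowed transitions: (current state, action) -> next state.
TRANSITIONS = {
    ("draft", "submit"): "in_review",
    ("in_review", "approve"): "approved",
    ("in_review", "reject"): "draft",
}


class ReviewWorkflow:
    """Tracks one document through a fixed review process."""

    def __init__(self, document, reviewer):
        self.document = document
        self.reviewer = reviewer
        self.state = "draft"
        self.history = []  # (state, action) pairs, in the order they fired

    def fire(self, action):
        """Apply an action, or raise ValueError if it is not allowed now."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} while {self.state!r}")
        self.history.append(key)
        self.state = TRANSITIONS[key]
        return self.state


workflow = ReviewWorkflow("plan.doc", "alice")
workflow.fire("submit")
workflow.fire("approve")
print(workflow.state, workflow.history)
```

Notice that the business process lives in the TRANSITIONS table inside the code. Adding a second reviewer or a timeout means editing and redeploying the application, which is the low-level change that workflow engines are meant to spare you.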

Seeing JBoss jBPM for the first time was an epiphany similar to my first Alfresco experience. Lots of companies were out there selling workflow engines for hundreds of thousands of dollars and here was one for free that seemed pretty great. Alfresco embeds JBoss jBPM, which means there’s a workflow service you can call from your code that already understands that you’re probably routing documents that live in Alfresco, making assignments to Alfresco users, and showing a task list in the Alfresco web client. Developers simply deploy their business process and Alfresco takes care of the rest.

In 4.0 we’ll be adding a second workflow engine to the platform called Activiti. Activiti is a new Alfresco-sponsored open source project created by some of the original jBPM developers. It’s an Apache-licensed workflow engine that complies with BPMN 2.0, an important standard in the space. One of the nice things about the Alfresco architecture is that the new Activiti engine just slides right into place and sits side-by-side with jBPM. Developers can choose which engine to use based on their specific requirements.

Summary

Every business has a “capture, organize, and share” problem—the only question is one of scale and scope. The IT organization can help address the business pains and save time for their technical teams by identifying a stack that can be used to address content and collaboration problems. It is important that the stack be as open and standards-compliant as possible because that should lower costs and keep your options open—it’s hard to anticipate what the business will need next, so don’t tie your hands with closed or inflexible solutions.

The Alfresco platform is a great choice as the foundation of your content and collaboration stack. The development model is fast and fun, there are a variety of file protocols and APIs for getting content into and out of the repository, and we give you more options, not fewer, in terms of operating systems, databases, application servers, and integration with other enterprise systems.

Why not take a look at Alfresco today? We have three editions to choose from depending on your needs (Community, Team, Enterprise). All offer at least one way to get started for free and in the case of the Community Edition, you can run free forever for as many documents and users as you need.

About the author

Jeff Potts

Jeff Potts is the Chief Community Officer of Alfresco Software. Jeff has been working with Alfresco since 2005 and has 20 years of content management, document management, and collaboration experience. Jeff wrote the first developer-focused book on Alfresco, the Alfresco Developer Guide, and recently co-authored CMIS & Apache Chemistry in Action. Follow Jeff on his blog at http://ecmarchitect.com.

The opinions expressed on this website are those of each author, not of the author's employer or of Alfresco.
© 2014 Alfresco Software, Inc. All Rights Reserved.