Using Apache to load balance Alfresco Share with an Alfresco Repository Cluster and clustering Alfresco Share.

This post assumes a reasonable sys-admin level of knowledge of Alfresco, and that you are already familiar with setting up the Alfresco Repository in a cluster configuration and with configuring an Apache web server instance.

Clustering the Alfresco Repository is a commonly used setup for high availability and performance reasons. It is also easy to use Apache to load balance an Alfresco Share web-tier against an Alfresco cluster, to cluster (i.e. web farm) the Alfresco Share web-tier itself, or to combine the two. This post examines the configuration required for those setups using Alfresco 3.4/4.0.

These are the three setups we will be considering. Note that all examples assume you have already successfully set up a working Alfresco 3.4/4.0 repository cluster. They also assume you are deploying the Share web-tier(s) to a separate Tomcat instance, not the same one as the Alfresco repository instances.

  1. Single Share instance load balanced to a repository cluster e.g. Browser->Share->Apache->Repo Cluster
  2. Share tier cluster to a single repository instance e.g. Browser->Apache->Share Cluster->Repo
  3. Share tier cluster load balanced to a repository cluster e.g. Browser->Apache->Share Cluster->Apache->Repo Cluster

NOTE: all the example config below mentions “localhost”, because for testing purposes I set this all up on a single machine, changing the various Tomcat ports as I went. In reality you would probably be using different machine instances (virtual or otherwise) for each, so care must be taken when setting up the config!

For setup 1. Configuring Share to point to an Apache reverse proxy that is set up to load balance to a multi-node Alfresco repository cluster.

You should be familiar with “share-config-custom.xml”, generally located in \tomcat\shared\classes\alfresco\web-extension and used to override the common Share startup parameters, including the remote endpoint locations. There is an example “share-config-custom.xml.sample” provided in the Alfresco distribution as a starting point. Add this section:

   <config evaluator="string-compare" condition="Remote">
      <remote>
         <endpoint>
            <id>alfresco-noauth</id>
            <name>Alfresco - unauthenticated access</name>
            <description>Access to Alfresco Repository WebScripts that do not require authentication</description>
            <connector-id>alfresco</connector-id>
            <endpoint-url>http://localhost:8089/alfresco/s</endpoint-url>
            <identity>none</identity>
         </endpoint>

         <endpoint>
            <id>alfresco</id>
            <name>Alfresco - user access</name>
            <description>Access to Alfresco Repository WebScripts that require user authentication</description>
            <connector-id>alfresco</connector-id>
            <endpoint-url>http://localhost:8089/alfresco/s</endpoint-url>
            <identity>user</identity>
         </endpoint>

         <endpoint>
            <id>alfresco-feed</id>
            <name>Alfresco Feed</name>
            <description>Alfresco Feed - supports basic HTTP authentication via the EndPointProxyServlet</description>
            <connector-id>http</connector-id>
            <endpoint-url>http://localhost:8089/alfresco/s</endpoint-url>
            <basic-auth>true</basic-auth>
            <identity>user</identity>
         </endpoint>
      </remote>
   </config>

The endpoint-url elements are the important bits – set them to your Apache location, in my example this is “localhost:8089”.
And in your Apache httpd.conf file, enable the various proxy/balancer modules (these modules are used in both examples – so you might not actually need them all, but there is no harm in enabling them):

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule proxy_http_module modules/mod_proxy_http.so

And then add this section:

# Reverse Proxy Settings (cluster load balancing)
ProxyRequests Off
#NOTE: stickysession only required if non-default Alfresco auth stack is used i.e. NTLM2 or similar
#stickysession=JSESSIONID|jsessionid nofailover=On
ProxyPass /alfresco balancer://app
ProxyPassReverse /alfresco balancer://app
<Proxy balancer://app>
  #Add your Alfresco cluster nodes here
  BalancerMember ajp://localhost:8009/alfresco route=tomcat1
  BalancerMember ajp://localhost:8014/alfresco route=tomcat2
</Proxy>

The various cluster nodes should be added into the Proxy element as above. Then, in the Tomcat server.xml of each Alfresco node instance, ensure the AJP connector is enabled and that its port matches the BalancerMember setting above. Also ensure the Engine element jvmRoute attribute is set to match the route, i.e. using the above example:

<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

and

<Engine name="Catalina" defaultHost="localhost" jvmRoute="tomcat1">

Note that sticky sessions are not required by default when using the standard Alfresco authentication stack as the communication between Share and Alfresco is completely stateless and does not require a web Session. This means that your Apache instance can load balance the requests, even multiple Ajax requests from a single page instance, across multiple Alfresco cluster nodes to give the best performance. However, the side effect of this is that Lucene results will potentially differ in a cluster and Lucene is used to retrieve even simple things like Share document list and various picker results in 3.4 and below. The good news is that from Alfresco 4.0 and onwards, Lucene is no longer used for list or picker results by default (canned DB queries are used instead), so sticky sessions are not required in 4.0.
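
If you do need sticky sessions between Share and the repository nodes (for example on 3.4, or with a non-default authentication stack such as NTLM), the parameters from the commented-out line in the config above can be appended to the ProxyPass directive. The following is a minimal sketch that reuses the same balancer name, ports and routes as the example above; treat it as a starting point rather than a tested configuration:

```apache
# Reverse proxy with sticky sessions enabled (sketch)
ProxyRequests Off
# stickysession names the session cookie / URL parameter issued by Tomcat
# nofailover=On stops Apache moving a session to another node if its own node is down
ProxyPass /alfresco balancer://app stickysession=JSESSIONID|jsessionid nofailover=On
ProxyPassReverse /alfresco balancer://app
<Proxy balancer://app>
  # route must match the jvmRoute of the corresponding Tomcat Engine element
  BalancerMember ajp://localhost:8009/alfresco route=tomcat1
  BalancerMember ajp://localhost:8014/alfresco route=tomcat2
</Proxy>
```

The trade-off is that requests from one session are pinned to a single node, so you give up some of the load spreading described above in exchange for consistent results.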

For setup 2. We need to configure an Apache forward proxy to point to multiple Share web-tier instances, and each user's browser must be configured to point to the Apache proxy. Load balancing can of course be used, although the simple Apache config below doesn’t show that. This time, however, sticky sessions are required between the browser and the Share instance, as a small amount of Session state is managed for each user in Share.

Apache httpd.conf (not a load balanced example but you get the idea)

# Forward Proxy Settings
ProxyRequests On
ProxyVia On
<Proxy *>
  Order Allow,Deny
  Allow from all
</Proxy>
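
For a load balanced Share tier, one option is to put Share behind a reverse-proxy balancer instead of the forward proxy shown above, so that browsers need no proxy configuration. This is only a sketch of that alternative: the balancer name “share”, the AJP ports 8019 and 8024, and the routes share1 and share2 are assumptions for illustration, and each Share Tomcat instance would need a matching AJP connector port and jvmRoute in its server.xml.

```apache
# Reverse proxy load balancing a Share web-tier cluster (sketch, hypothetical ports and routes)
ProxyRequests Off
# Sticky sessions keep each browser on the Share instance that holds its Session state
ProxyPass /share balancer://share stickysession=JSESSIONID|jsessionid
ProxyPassReverse /share balancer://share
<Proxy balancer://share>
  # Add your Share web-tier instances here; route must match each Tomcat jvmRoute
  BalancerMember ajp://localhost:8019/share route=share1
  BalancerMember ajp://localhost:8024/share route=share2
</Proxy>
```

The stickysession parameter is what satisfies the sticky session requirement described for this setup; without it, consecutive requests from one browser could land on different Share instances.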

You may be familiar with “custom-slingshot-application-context.xml”, generally located in \tomcat\shared\classes\alfresco\web-extension and used to override the Spring application context beans for Share. There is an example “custom-slingshot-application-context.xml.sample” provided in the Alfresco distribution as a starting point.
Add this section to each Share Tomcat instance to disable the appropriate web-tier caches (this example actually comes from the currently shipped custom-slingshot-application-context.xml.sample, which is where the helpful comments come from):

   <!-- Web-tier cluster configuration -->
   <!-- Enable this section if you are clustering or load balancing the share.war i.e. multiple web-tiers behind a proxy -->
   <!-- If you have a single web-tier running against an Alfresco cluster via a reverse proxy you don't need this -->
   <bean id="webframework.slingshot.persister.remote" class="org.springframework.extensions.surf.persister.PathStoreObjectPersister" parent="webframework.sitedata.persister.abstract">
      <property name="store" ref="webframework.webapp.store.remote" />
      <property name="pathPrefix"><value>alfresco/site-data/${objectTypeIds}</value></property>
      <property name="noncachableObjectTypes">
         <set>
            <value>page</value>
            <value>component</value>
         </set>
      </property>
   </bean>

For setup 3. Use a combination of the above two setups! Modify the “custom-slingshot-application-context.xml” and “share-config-custom.xml” for each Share Tomcat instance to disable the web-tier caches and to point Share at the reverse proxy, and configure the forward and reverse proxy Apache instances (you will need two Apache instances for this).


© 2012 Alfresco Software, Inc. All Rights Reserved.