Configuring the Share HTML processing black/white list

Alfresco Share has a number of features to protect against XSS (Cross Site Scripting) attacks, session hijacking and similar. One of the most aggressive features is the automatic processing of 3rd party HTML to “sanitise” or “strip” out unwanted HTML tags and attributes before rendering in the page. By 3rd party HTML, I mean any HTML content that is displayed in Share that is sourced from a node content stream, such as a Wiki page, Blog post or Discussion post. In other words, any content that may be user edited or could come from any source (not just Share itself!).

This is a well-tested feature that handles all commonly known XSS attack holes and many less well-known ones, including all the attack vectors listed here: http://ha.ckers.org/xss.html

One downside is that some otherwise useful HTML attributes and elements are stripped, mainly to guard against issues in legacy browsers such as IE6 and IE7. Consider the STYLE attribute: not a problem attribute you would assume, how could setting a STYLE cause an XSS attack?! Well in IE8, Firefox, Safari, Chrome etc. it can’t. But in IE6/7 Microsoft in their wisdom allowed JavaScript to be inserted into a STYLE attribute (called “CSS Expressions”; a better name would have been “CSS Hacks”). This is a potential XSS hole that only affects those legacy browsers, but the HTML stripping process cannot rely on your browser's user agent (which of course could be faked) so must always assume the worst and strip those STYLE attributes.
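
For illustration, the kind of markup in question looks roughly like the fragment below. It follows the well-known CSS expression pattern found in XSS cheat sheets; the alert call is a harmless placeholder for whatever script an attacker would supply, and the fragment is an assumption for demonstration, not something taken from Alfresco itself.

```xml
<!-- In IE6/7 the expression(...) value is evaluated as JavaScript when the style is applied -->
<div style="width: expression(alert('XSS'));">Looks like ordinary formatted content</div>
```

With STYLE on the black list, the whole style attribute is removed before rendering, so the DIV and its text survive but the expression does not.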

For the majority of Alfresco users who discarded IE6 (or even just IE…) long ago, why should they be punished with this limitation? And it is an annoying limitation, as most of the in-line editing capabilities of TinyMCE and other in-line editors that can potentially be used with Alfresco use STYLE attributes to apply formatting to much of their generated content.

In Alfresco 3.4.9/4.0.2 and onwards, it is now possible to fully configure the black/white list of HTML tags and attributes that the HTML stripping process will use.

This is the default configuration that is applied OOTB (out of the box):

      <!-- the set of HTML tags considered safe for rendering when mixing with existing client-side output -->
      <!-- NOTE: define all tags in UPPER CASE only -->
      <property name="tagWhiteList">
         <set>
            <value>!DOCTYPE</value>
            <value>HTML</value>
            <value>HEAD</value>
            <value>BODY</value>
            <value>META</value>
            <value>BASE</value>
            <value>TITLE</value>
            <value>LINK</value>
            <value>CENTER</value>
            <value>EM</value>
            <value>STRONG</value>
            <value>SUP</value>
            <value>SUB</value>
            <value>P</value>
            <value>B</value>
            <value>I</value>
            <value>U</value>
            <value>BR</value>
            <value>UL</value>
            <value>OL</value>
            <value>LI</value>
            <value>H1</value>
            <value>H2</value>
            <value>H3</value>
            <value>H4</value>
            <value>H5</value>
            <value>H6</value>
            <value>SPAN</value>
            <value>DIV</value>
            <value>A</value>
            <value>IMG</value>
            <value>FONT</value>
            <value>TABLE</value>
            <value>THEAD</value>
            <value>TBODY</value>
            <value>TR</value>
            <value>TH</value>
            <value>TD</value>
            <value>HR</value>
            <value>DT</value>
            <value>DL</value>
            <value>DT</value>
            <value>PRE</value>
            <value>BLOCKQUOTE</value>
            <value>BUTTON</value>
            <value>CODE</value>
            <value>FORM</value>
            <value>OPTION</value>
            <value>SELECT</value>
            <value>TEXTAREA</value>
         </set>
      </property>
      <!-- The set of HTML tag attributes that are to be removed before rendering -->
      <!-- NOTE: define all attributes in UPPER CASE only -->
      <!-- IMPORTANT: JavaScript event handler attributes starting with "on" are always removed -->
      <property name="attributeBlackList">
         <set>
            <value>STYLE</value>
         </set>
      </property>
      <!-- The set of HTML tag attributes that are considered for sanitisation i.e. script content removed -->
      <!-- NOTE: define all attributes in UPPER CASE only -->
      <property name="attributeGreyList">
         <set>
            <value>SRC</value>
            <value>DYNSRC</value>
            <value>LOWSRC</value>
            <value>HREF</value>
            <value>BACKGROUND</value>
         </set>
      </property>

As you can see it’s quite a list. The important config for STYLE attribute processing is here:

      <property name="attributeBlackList">
         <set>
            <value>STYLE</value>
         </set>
      </property>

To stop STYLE attributes being stripped, simply override the black list in the stringutils bean in your custom-slingshot-application-context.xml file (generally found in \tomcat\shared\classes\alfresco\web-extension), as detailed in previous blog posts:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans-2.0.dtd'>

<beans>

   <!-- Override HTML processing black list -->
   <bean id="webframework.webscripts.stringutils" parent="webframework.webscripts.stringutils.abstract"
         class="org.springframework.extensions.webscripts.ui.common.StringUtils">
      <property name="attributeBlackList">
         <set></set>
      </property>
   </bean>

</beans>

Restart the Share web-application and STYLE attributes will no longer be removed by Share.
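
The same override pattern can, in principle, extend a list rather than empty it. The sketch below is an assumption rather than part of the shipped configuration: it relies on Spring's collection merging (the merge="true" attribute on a set in a child bean definition) and assumes the abstract parent bean supplies the default tagWhiteList shown earlier. IFRAME is used purely as an illustrative tag; allowing it widens what 3rd party HTML can do, so treat this as something to test carefully, not a recommendation.

```xml
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans-2.0.dtd'>

<beans>

   <!-- Sketch only: extend (rather than replace) the inherited tag white list -->
   <bean id="webframework.webscripts.stringutils" parent="webframework.webscripts.stringutils.abstract"
         class="org.springframework.extensions.webscripts.ui.common.StringUtils">
      <property name="tagWhiteList">
         <!-- merge="true" asks Spring to combine this set with the parent's set;
              this assumes the parent bean defines tagWhiteList -->
         <set merge="true">
            <!-- IFRAME is an illustrative choice; define tags in UPPER CASE only -->
            <value>IFRAME</value>
         </set>
      </property>
   </bean>

</beans>
```

Merging can only add entries, which is why removing STYLE from the black list is done by replacing the set with an empty one. If you want both changes, both property elements would go inside the one bean definition, since the bean id should only be defined once in the file.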

20 thoughts on “Configuring the Share HTML processing black/white list”

  1. Olivier

    Hi Kev,

    Thank you for this article, this is exactly what I need now. However, after saving custom-slingshot-application-context.xml, I get the following error from Alfresco when restarting:

    ERROR [web.context.ContextLoader] [pool-2-thread-1] Context initialization failed
    org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean definition with name ‘webframework.webscripts.stringutils’ defined in file [
    /home/alfresco/tom_shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml]: Could not resolve parent bean definition ‘webframework.webscripts.stringutils.abstract’; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named ‘webframework.webscripts.stringutils.abstract’ is defined

    The parent bean seems missing. Am I missing something in my configurations ?

    I’m running Alfresco 4.0.0.d.

    Regards

  2. Kevin Roast Post author

    You need to be running a more recent version unfortunately, this is a relatively new feature.

    Either a HEAD nightly build or one of the Enterprise builds such as 4.0.3.

  3. Kevin Roast Post author

    Yes, the development blogs do tend to be around the newer features. Fortunately the good news is that a new Community release is coming in the next few weeks. You are welcome to try out a nightly build until then to test the feature if you like.

  4. Pingback: Kev’s blog » Blog Archive » Alfresco Community 4.2 – Shiny. Fast. Awesome.

  5. Pingback: Media Viewers - extending Alfresco viewer capability | Loftux AB

  6. Pingback: Media Viewers - exempel på hur det ser ut | Loftux AB

  7. Arno Hagen

    Hi Kevin,

    we implemented this (whitelisted the attribute) and it works fine for the Wiki module. But it still will be stripped out in the other TinyMCE instances (Blog, Discussion, HTML content). Any idea why?

    We assumed that this is a system wide configuration.

    We are on Enterprise version 4.1.1.9.

    Alfresco support is not helpful so far. Any hint is really appreciated.

    Thanks!
    Arno

  8. Born Konrad

    I have the problem that Alfresco removes specific HTML tags (e.g. ‘iframe’) after saving content (in Share, but also when working with WebDAV, so this should not be a Share issue), but I find no ‘custom-slingshot-application-context.xml’ File and I find no ‘StringUtils’ class in any Spring Context??!!
    Besides this blog, I find no information about this issue (which makes me wonder why Alfresco has no documentation about this, how can that be???)

  9. Kevin Roast Post author

    >We assumed that this is a system wide configuration.
    It affects all Share processing of HTML, including Wiki, Blog, Discussion etc.

    >but I find no ‘custom-slingshot-application-context.xml’ File and I find no ‘StringUtils’ class in any Spring Context?
    You create the custom-slingshot-application-context.xml file in your web-extension folder (which is part of the installed Tomcat, e.g. W:\apache-tomcat\shared\classes\alfresco\web-extension).
    An example file called: custom-slingshot-application-context.xml.sample is provided which you can edit.

  10. Born Konrad

    Hi Kevin,

    thanx for your answer – but I have exactly the same problem as Arno Hagen:
    Wiki pages accept ‘IFRAME’ tags, but Blog pages do not (and my customer would like to publish blog posts with YouTube content, which does not work)
    I also found this JIRA issue, which also indicates to me that the problem is something different:
    https://issues.alfresco.com/jira/browse/ALF-17862

    Any idea???

  11. Daniel Oderbolz

    Dear Kevin,

    thanks for the interesting article. I found that Share also applies this “sanitisation” to any HTML file that is uploaded (4.2c CE). Now my customer has created such a file with Word, and guess what: many tags are gone.
    Also, it seems that the process does not work properly – in many cases, the opening tag is in upper case but the closing tag is still lowercase.
    Since Word creates an XHTML file, the tags must (by standard) be lowercase anyway, so the process in effect breaks the document…
    In principle I can understand the reasoning behind this, but it frightens people if suddenly a document is altered in Alfresco for no apparent reason.

    Just my 2 cents (I will feed Jira)

    Cheers,
    Daniel

  12. Kevin Roast Post author

    Hi Daniel,

    Actually I agree – however we only sanitise HTML/XHTML documents on display in the browser, not on download. So we are only modifying the content for reasons of security when it is viewed directly in the browser. If you download the document the content is not modified.

    We recently upgraded the html parser library used for this process and it looks like there is a regression in that we now see mixed case start/end tags which you have identified. I will ensure this is fixed for 4.2.d.

  13. Daniel Oderbolz

    Dear Kevin,

    thanks, I saw your activity in Jira, that's good news!
    However, in our 4.2c CE, the HTML is also changed when you download the file in Share (this is how we found out).

    Cheers,
    Daniel

  14. Andreas Amstutz

    Kevin,
    I need to backport this to the Enterprise version 4.0.2.9 as one of our customers wants to upload/download reports (HTML files) generated in some other tool.

    For a quick test I built the spring-webscripts-1.2.0-SNAPSHOT.jar from the sources and replaced the version that comes with 4.0.2.9 with this snapshot. Unfortunately that did not work very well – got errors related to LocalWebScriptRuntimeContainer.addExtensibilityDirectives at server startup.

    How are the chances to get alfresco support provide a patch (generated via svn) that could cover that task?

    1. Kevin Roast Post author

      >How are the chances to get alfresco support provide a patch (generated via svn) that could cover that task?
      It would be possible to generate a SP that had this backport in, please generate a request via support.

