Configuring the Share HTML processing black/white list

Alfresco Share has a number of features to protect against XSS (Cross Site Scripting) attacks, session hijacking and similar. One of the most aggressive features is the automatic processing of 3rd party HTML to “sanitise” or “strip” out unwanted HTML tags and attributes before rendering in the page. By 3rd party HTML, I mean any HTML content displayed in Share that is sourced from a node content stream – such as a Wiki page, Blog post or Discussion post. That covers any content that may be user edited or could come from any source (not just Share itself!).

This is a well-tested feature that handles all commonly known XSS attack holes and many lesser-known ones – including all the attack vectors listed here: http://ha.ckers.org/xss.html

One of the downsides to this is that some otherwise useful HTML attributes and elements are stripped, mainly to guard against issues in legacy browsers such as IE6 and IE7. Consider the STYLE attribute – not a problem attribute you would assume; how could setting a STYLE cause an XSS attack?! Well, in IE8, Firefox, Safari, Chrome etc. it can’t. But in IE6/7 Microsoft in their wisdom allowed JavaScript to be inserted into a STYLE attribute (called “CSS Expressions” – a better name would have been “CSS Hacks”). This is a potential XSS hole that only affects those legacy browsers – but the HTML stripping process cannot rely on the user agent your browser reports (which of course could be faked) so must always assume the worst and strip those STYLE attributes.
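
To make the risk concrete, here is a minimal sketch of the kind of markup the stripping process has to defend against. The fragment is illustrative only and is not taken from Share or its test suite; the alert call is just a stand-in payload.

```xml
<!-- Illustrative only: a CSS expression hidden inside a STYLE attribute. -->
<!-- IE6/7 evaluate the body of expression(...) as JavaScript; browsers -->
<!-- without CSS expression support discard the declaration. -->
<div style="width: expression(alert('XSS'));">Looks like harmless formatting</div>
```

Because the script lives entirely inside the attribute value, removing the STYLE attribute is the simplest defence that does not depend on knowing which browser will render the page.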

For the majority of Alfresco users who discarded IE6 (or even just IE…) long ago, why should they be punished with this limitation? And it is an annoying limitation, as most of the in-line editing capabilities of TinyMCE and other in-line editors that can potentially be used with Alfresco use STYLE attributes to apply formatting to much of their generated content.

In Alfresco 3.4.9/4.0.2 and onwards, it is now possible to fully configure the black/white list of HTML tags and attributes that the HTML stripping process will use.

This is the default configuration that is applied OOTB (out of the box):

      <!-- the set of HTML tags considered safe for rendering when mixing with existing client-side output -->
      <!-- NOTE: define all tags in UPPER CASE only -->
      <property name="tagWhiteList">
         <set>
            <value>!DOCTYPE</value>
            <value>HTML</value>
            <value>HEAD</value>
            <value>BODY</value>
            <value>META</value>
            <value>BASE</value>
            <value>TITLE</value>
            <value>LINK</value>
            <value>CENTER</value>
            <value>EM</value>
            <value>STRONG</value>
            <value>SUP</value>
            <value>SUB</value>
            <value>P</value>
            <value>B</value>
            <value>I</value>
            <value>U</value>
            <value>BR</value>
            <value>UL</value>
            <value>OL</value>
            <value>LI</value>
            <value>H1</value>
            <value>H2</value>
            <value>H3</value>
            <value>H4</value>
            <value>H5</value>
            <value>H6</value>
            <value>SPAN</value>
            <value>DIV</value>
            <value>A</value>
            <value>IMG</value>
            <value>FONT</value>
            <value>TABLE</value>
            <value>THEAD</value>
            <value>TBODY</value>
            <value>TR</value>
            <value>TH</value>
            <value>TD</value>
            <value>HR</value>
            <value>DT</value>
            <value>DL</value>
            <value>DT</value>
            <value>PRE</value>
            <value>BLOCKQUOTE</value>
            <value>BUTTON</value>
            <value>CODE</value>
            <value>FORM</value>
            <value>OPTION</value>
            <value>SELECT</value>
            <value>TEXTAREA</value>
         </set>
      </property>
      <!-- The set of HTML tag attributes that are to be removed before rendering -->
      <!-- NOTE: define all attributes in UPPER CASE only -->
      <!-- IMPORTANT: JavaScript event handler attributes starting with "on" are always removed -->
      <property name="attributeBlackList">
         <set>
            <value>STYLE</value>
         </set>
      </property>
      <!-- The set of HTML tag attributes that are considered for sanitisation i.e. script content removed -->
      <!-- NOTE: define all attributes in UPPER CASE only -->
      <property name="attributeGreyList">
         <set>
            <value>SRC</value>
            <value>DYNSRC</value>
            <value>LOWSRC</value>
            <value>HREF</value>
            <value>BACKGROUND</value>
         </set>
      </property>

As you can see, it’s quite a list. The important config for STYLE attribute processing is here:

      <property name="attributeBlackList">
         <set>
            <value>STYLE</value>
         </set>
      </property>

So simply override the black list in the stringutils bean in your custom-slingshot-application-context.xml file – generally found in \tomcat\shared\classes\alfresco\web-extension – as detailed in previous blog posts:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans-2.0.dtd'>

<beans>

   <!-- Override HTML processing black list -->
   <bean id="webframework.webscripts.stringutils" parent="webframework.webscripts.stringutils.abstract"
         class="org.springframework.extensions.webscripts.ui.common.StringUtils">
      <property name="attributeBlackList">
         <set></set>
      </property>
   </bean>

</beans>

Restart the Share web-application and STYLE attributes will no longer be removed by Share.
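
The same bean can in principle be used to extend a list rather than empty it. The sketch below is an assumption, not a documented recipe: it relies on the defaults shown earlier being defined on the abstract parent bean, and on Spring's standard collection merging (the merge attribute on a set in a child bean definition). DD is used purely as an example because it does not appear in the default white list above.

```xml
<!-- Sketch only: add one tag to the inherited white list instead of replacing it. -->
<!-- Assumes the default tagWhiteList is defined on the abstract parent bean. -->
<bean id="webframework.webscripts.stringutils" parent="webframework.webscripts.stringutils.abstract"
      class="org.springframework.extensions.webscripts.ui.common.StringUtils">
   <property name="tagWhiteList">
      <!-- merge="true" asks Spring to combine this set with the parent's set -->
      <set merge="true">
         <!-- NOTE: tags must be in UPPER CASE, as in the default configuration -->
         <value>DD</value>
      </set>
   </property>
</bean>
```

A bean id can only be defined once per file, so if you also override attributeBlackList, put both property elements inside the one bean definition. If merging is not honoured in your version, the fallback is to copy the full default set into the override and append the extra value. Note that merging can only add entries, so it is no help for removing STYLE from the black list.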


20 Responses to “Configuring the Share HTML processing black/white list”

  1. Olivier Says:

    Hi Kev,

    Thank you for this article, this is exactly what I need now. However, after saving custom-slingshot-application-context.xml, I get the following error from Alfresco when restarting:

    ERROR [web.context.ContextLoader] [pool-2-thread-1] Context initialization failed
    org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean definition with name ‘webframework.webscripts.stringutils’ defined in file [
    /home/alfresco/tom_shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml]: Could not resolve parent bean definition ‘webframework.webscripts.stringutils.abstract’; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named ‘webframework.webscripts.stringutils.abstract’ is defined

    The parent bean seems missing. Am I missing something in my configurations ?

    I’m running Alfresco 4.0.0.d.

    Regards

  2. Kevin Roast Says:

    You need to be running a more recent version unfortunately, this is a relatively new feature.

    Either a HEAD nightly build or one of the Enterprise builds such as 4.0.3.

  3. Olivier Says:

    Thank you. We are using the community 4.0.0d, I guess I’ll have to wait for the next update :-/

  4. Kevin Roast Says:

    Yes, the development blogs do tend to be around the newer features. Fortunately the good news is that a new Community release is coming in the next few weeks. You are welcome to try out a nightly build until then to test the feature if you like.

  5. Kev’s blog » Blog Archive » Alfresco Community 4.2 – Shiny. Fast. Awesome. Says:

    [...] Kev’s blog A blog from Alfresco Engineering « Configuring the Share HTML processing black/white list [...]

  6. Media Viewers - extending Alfresco viewer capability | Loftux AB Says:

    [...] same for plain text files (rendered in browser instead of transformed to flash). Note that Alfresco sanitize some html tags for security [...]

  7. Media Viewers - examples of how it looks | Loftux AB Says:

    [...] (displayed directly in the browser instead of being converted to flash). Note that Alfresco strips out certain HTML tags for [...]

  8. Arno Hagen Says:

    Hi Kevin,

    we implemented this (whitelisted the attribute) and it works fine for the Wiki module. But it still will be stripped out in the other TinyMCE instances (Blog, Discussion, HTML content). Any idea why?

    We assumed that this is a system wide configuration.

    We are on Enterprise version 4.1.1.9.

    Alfresco support is not helpful so far. Any hint is really appreciated.

    Thanks!
    Arno

  9. Born Konrad Says:

    I have the problem, that alfresco removes specific html tags (e.g. ‘iframe’) after saving content (in share, but also by working with WEBDAV, so this should be no SHARE issue), but I find no ‘custom-slingshot-application-context.xml’ File and I find no ‘StringUtils’ class in any Spring Context??!!
    Beside this blog, I find no information about this issue (which makes me wondering, that Alfresco has no documentation about this, how can that be???)

  10. Kevin Roast Says:

    >We assumed that this is a system wide configuration.
    It affects all Share processing of HTML, including Wiki, Blog, Discussion etc.

    >but I find no ‘custom-slingshot-application-context.xml’ File and I find no ‘StringUtils’ class in any Spring Context?
    You create the custom-slingshot-application-context.xml file in your web-extension folder (which is part of the installed Tomcat – e.g. W:\apache-tomcat\shared\classes\alfresco\web-extension).
    An example file called custom-slingshot-application-context.xml.sample is provided, which you can edit.

  11. Born Konrad Says:

    Hi Kevin,

    thanx for your answer – but I have exactly the same problem as Arno Hagen:
    Wiki pages accept ‘IFRAME’ tags, but Blog pages do not (and my customer would like to publish blog posts with YouTube content – which does not work)
    I also found this JIRA issue which also indicates me that the problem is something different:
    https://issues.alfresco.com/jira/browse/ALF-17862

    Any idea???

  12. Daniel Oderbolz Says:

    Dear Kevin,

    thanks for the interesting article. I found that Share also applies this “sanitisation” to any HTML file that is uploaded (4.2c CE). Now my customer has created this with Word, and guess what – many tags are gone.
    Also, it seems that the process does not work properly – in many cases, the opening tag is in upper case but the closing tag is still lowercase.
    Since Word creates an XHTML file, the tags must (by standard) be lowercase anyway, so the process in effect breaks the document…
    In principle I can understand the reasoning behind this, but it frightens people if suddenly a document is altered in Alfresco for no apparent reason.

    Just my 2 cents (I will feed Jira)

    Cheers,
    Daniel

  13. Daniel Oderbolz Says:

    In fact, someone else already did it:

    https://issues.alfresco.com/jira/browse/ALF-18696

  14. Kevin Roast Says:

    Hi Daniel,

    Actually I agree – however we only sanitise HTML/XHTML documents on display in the browser, not on download. So we are only modifying the content for reasons of security when it is viewed directly in the browser. If you download the document the content is not modified.

    We recently upgraded the html parser library used for this process and it looks like there is a regression in that we now see mixed case start/end tags which you have identified. I will ensure this is fixed for 4.2.d.

  15. Kevin Roast Says:

    The issue has now been fixed in SpringSurf. It will appear in 4.2.d during the next trunk merge.

  16. Daniel Oderbolz Says:

    Dear Kevin,

    thanks, I saw your activity in Jira, thats good news!
    However, in our 4.2c CE, the HTML is also changed when you download the file in Share (this is how we found out).

    Cheers,
    Daniel

  17. Kevin Roast Says:

    Yes – that was spotted also and already fixed for 4.2.d. Thanks.

  18. Andreas Amstutz Says:

    Kevin,
    I need to backport this to the Enterprise version 4.0.2.9 as one of our customers wants to upload/download reports (HTML files) generated in some other tool.

    For a quick test I built the spring-webscripts-1.2.0-SNAPSHOT.jar from the sources and replaced the version that comes with 4.0.2.9 with this snapshot. Unfortunately that did not work very well – got errors related to LocalWebScriptRuntimeContainer.addExtensibilityDirectives at server startup.

    How are the chances to get alfresco support provide a patch (generated via svn) that could cover that task?

  19. Kevin Roast Says:

    >How are the chances to get alfresco support provide a patch (generated via svn) that could cover that task?
    It would be possible to generate a SP that had this backport in, please generate a request via support.

  20. fabrizio Says:

    This solution is useful in 4.2.d to enable the background color on table.

    BTW the option align=”center” for the table still doesn’t work at all. any solution? (i’ve created a post in jira https://issues.alfresco.com/jira/browse/ALF-20059)


© 2012 Alfresco Software, Inc. All Rights Reserved.