Friday 21 February 2014

Is contextual output escaping too hard?

Whilst writing my post on Unobtrusive Javascript and JSON Data Islands, I got to thinking about how successful contextual output escaping (or encoding) has been as a tactic for mitigating XSS vulnerabilities.

I think there is an argument that it hasn't been very successful, and I don't think it is too hard to imagine the reasons for this:
  • As a mitigation for XSS it involves a task that is inherently difficult.  Knowing the context (HTML, HTML attribute, JavaScript, CSS, etc.) in which you are binding data into a template involves either writing an HTML parser (to do it properly) or having developers manually specify the context for each binding site.  That is not a solution so much as a whole new problem.
  • There is a reason why the security community does not promote "contextual SQL escaping" as a mitigation, and why parameterised queries are the mandated mitigation for SQL injection.  History has shown us that people are terrible at combining data and code, and that the best option is to let the component that executes the code (and hence has the best code parser) do the combining for us.  XSS is the same problem of combining data with code (HTML), which suggests that the browser should be combining the data and code(1) and that we shouldn't be suggesting people do this themselves.
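
Both points can be made concrete with a minimal Python sketch using only the standard library.  The first half shows why the context matters: the same untrusted value needs a different encoding depending on where it lands, and HTML encoding alone does nothing for a value bound into an href.  The second half shows the SQL analogue, where the query and the data travel separately and the database engine does the combining.  The payloads, table name and helper names are illustrative assumptions, not taken from any particular framework.

```python
import html
import json
import sqlite3

# Illustrative hostile input (an assumption for this sketch).
untrusted = "</script><script>alert(1)</script>"


def escape_for_html_body(value: str) -> str:
    # HTML body or quoted-attribute context: encode &, <, >, " and '.
    return html.escape(value, quote=True)


def escape_for_js_string(value: str) -> str:
    # JavaScript string literal inside an inline <script> block.
    # json.dumps quotes the string, but "<" is escaped as well so the
    # HTML parser never sees a literal "</script>".
    return json.dumps(value).replace("<", "\\u003c")


html_safe = escape_for_html_body(untrusted)
js_safe = escape_for_js_string(untrusted)

# Wrong context: HTML encoding leaves a javascript: URL untouched, so it
# is no defence when the value is bound into an href attribute.
url_payload = "javascript:alert(1)"
still_dangerous = html.escape(url_payload)

# The SQL analogue: the query text is fixed and the data is passed
# separately, so the engine (which owns the parser) does the combining.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
hostile_name = "x'); DROP TABLE users; --"
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile_name,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]
```

Note that the developer has to pick the right escaping helper at every binding site, whereas the parameterised query needs no decision at all: the hostile name is simply stored as data.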
Of course it's all very well saying that people should architect their web applications to have the browser combine code and data, but developing web applications is a complex problem and there are lots of considerations to take into account when architecting the application, not just security.  My impression is that the main considerations are performance and convenience.

Performance is tough to comment on and depends on many things, but moving more work to the client side could potentially improve server performance.  You could argue that client-side performance, and thus user experience, might suffer, although JavaScript performance keeps improving.  There is no doubt, though, that some businesses are very sensitive to page load times.

Convenience isn't a good reason in my opinion.  It might not be convenient today, but that doesn't mean it can't be made convenient.  If a web application framework were designed from the ground up to let the browser combine code and data, then it would likely (eventually) do so in a way that was convenient.  Of course relying on frameworks means getting popular frameworks(2) to change, and I think the security community has a role to play there.
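
As a rough illustration of what letting the browser combine code and data could look like from the server side, here is a minimal Python sketch.  The page template is static and never interpolated with user data; the data is served separately as JSON, and a small script binds it as text in the browser.  The /profile.json endpoint, the element id and the function names are all hypothetical, and this is only one possible arrangement, not how any specific framework does it.

```python
import json
from typing import Tuple

# Static template: identical for every request, with no user data
# interpolated on the server.  The inline script fetches the data and
# binds it with textContent, so the browser treats it as text, not markup.
PAGE_TEMPLATE = """<!doctype html>
<title>Profile</title>
<p>Hello, <span id="name"></span></p>
<script>
  fetch("/profile.json")  // hypothetical endpoint
    .then(function (r) { return r.json(); })
    .then(function (d) {
      document.getElementById("name").textContent = d.name;
    });
</script>
"""


def render_page() -> Tuple[str, str]:
    # Returns (content type, body); the body never varies with user input.
    return "text/html; charset=utf-8", PAGE_TEMPLATE


def render_profile_json(name: str) -> Tuple[str, str]:
    # The untrusted value travels as data in a JSON response.
    return "application/json", json.dumps({"name": name})


hostile = "<img src=x onerror=alert(1)>"
page_type, page = render_page()
data_type, body = render_profile_json(hostile)
```

The hostile value never appears in the HTML the server emits, so there is no binding site on the server for which a context has to be worked out.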

I will say that I've spent a good deal of time telling businesses to use contextual output escaping, and it is always difficult to challenge a long-held belief, but there is evidence and reasoning to suggest that parameterising HTML is the better mitigation.  The jury is still out in my mind, and I would be keen to hear other points of view.

(1) See also "Insane in the IFRAME - The case for client-side HTML sanitisation" (video - slides) from OWASP AppSecEU 2013.
(2) That's not to say there aren't web application frameworks that have contextual output escaping built in, or that support data binding on the client, e.g. JavaScript templating engines (although I'm not sure the browser does the escaping in that case).  I think most don't, though I am not an expert and there are a lot of web application frameworks out there.
