Tuesday, 26 March 2013

Categories of XSS

So I was reading an article recently that talked about finding XSS where the malicious script was injected via a cookie.  This got me thinking about which of the standard categories of XSS - Stored (or Persistent), Reflected (or Non-persistent) and DOM-based - that fell into.  It wasn't immediately clear as it was persistent, but stored on the client, but the fix would need to be in the JavaScript which makes it similar to DOM-based XSS.  This got me thinking about the basis for the standard categories or types of XSS.

I think the first thing to point out is that at some time in history someone, or more likely a bunch of different people, discovered XSS and thought of it as a single type of attack.  It's an interesting thought experiment to imagine you find that you can inject JavaScript into a web page, and that as far as you know no one has done it before.  I imagine I would think of it as a code injection attack, where the code just happens to be JavaScript, and the novelty would be that I proxied the attack via the web server and that the victims were other users of the web server, rather than the server itself.  Potentially I might not even think it was that interesting running arbitrary JavaScript on a client, as it wouldn't have been comparable to the usual arbitrary code execution attacks that existed already.  But then if I could have predicted how popular the web would become, and what it would be used for, well I would have <insert genius idea> and profited.

I'm going to make the completely unfounded assumption that Stored XSS was discovered first, as it's considered the more serious attack (which for now I'll state without justification) and then Reflected XSS was considered a variant type of attack, but different to Stored XSS because it required user interaction.  But really the whole user interaction thing is a fairly weak argument these days, with the proliferation of ad-networks inserting arbitrary HTML in popular websites and 'watering hole' type attacks, a user just surfin' the net is far more likely than ever to hit upon a malicious iframe that exploits a reflected XSS attack.  But granted, the user does have to be logged in to the web site so the window of opportunity for the attacker is smaller in a Reflected XSS attack.

Comparing to a Stored XSS attack, well the user is most likely already logged in to the web site so the attacker's window of opportunity is much larger, however that attacker could face a second issue, the victim still has to navigate to the page with the injected script, so the attack is location limited.  It's interesting to note that Reflected XSS is not location limited, since the attacker will direct the victim to precisely the page with the vulnerability.  But then it doesn't actually make sense to say Stored XSS is location limited if Reflected XSS isn't, because whatever vector of attack Reflected XSS can use, so can Stored XSS i.e. a malicious iframe can point to a web page with Stored XSS.

It would seem then that we should categorise XSS by the vectors of attack that can be used by an attacker to exploit a victim.

Possible types of XSS when the attacker manipulates ...

XSS Type       | the victim's response | the victim's request | the victim's DOM properties
Stored XSS     | Y                     | Y                    | N
Reflected XSS  | N                     | Y                    | N
DOM-based XSS  | N                     | Y                    | Y

So it might not be clear what I mean by "the victim's response", what I mean is that the attacker does not control the victim's request, but is able to include data of their choosing in the response to the victim's request.  This is the standard Stored XSS scenario where the data comes from the DB for instance.  However Reflected XSS isn't possible because the attacker doesn't control the request and DOM-based isn't possible because either the vulnerable DOM property is set by the server, or set by the client with data from the server.

When the attacker manipulates "the victim's request", Stored, Reflected and DOM-based XSS are all possible.  Stored XSS is possible because the attacker can direct the victim to the infected page.  Reflected XSS is possible because the attacker controls the parameters of the request.  Finally, DOM-based XSS is possible because the attacker can set DOM properties that aren't sent to the server, e.g. the fragment identifier if the attacker can specify the URL, or window.name if the attacker makes the request from script.

In the case of "the victim's DOM property", what I mean is a DOM object property that is set client-side and not set by (or sent to) the server.  In this case only DOM-based XSS is possible, as the DOM property is not set by the server so Stored or Reflected XSS isn't possible.  This is in fact a special case, or a subset, of "the victim's request" scenario.
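As a rough sketch, the table and the three explanations above can be written down as data.  The names POSSIBLE_TYPES, possible_xss_types and vectors_for are made up for illustration, and the vector labels are just shorthand for the column headings.

```python
# Attack vector (what the attacker manipulates) mapped to the XSS types
# marked "Y" in the table above. The keys are shorthand for the columns.
POSSIBLE_TYPES = {
    "response": {"Stored"},
    "request": {"Stored", "Reflected", "DOM-based"},
    "dom_property": {"DOM-based"},
}


def possible_xss_types(vector):
    """Return the XSS types that are possible for a given attack vector."""
    return set(POSSIBLE_TYPES[vector])


def vectors_for(xss_type):
    """Return the attack vectors through which a given XSS type is possible."""
    return sorted(v for v, types in POSSIBLE_TYPES.items() if xss_type in types)


print(possible_xss_types("response"))
print(vectors_for("DOM-based"))
```

Written this way it is easy to check the claim that the DOM property case is a subset of the request case: every type possible via "dom_property" is also possible via "request".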

I've taken a fairly strict definition of DOM-based XSS here, as I think this is required to truly separate it out from Reflected XSS.  My logic is that if the malicious data comes from, or goes via, the server, then the attack can be considered to come via the server in the same way as Stored or Reflected XSS respectively, but if the malicious data never actually gets sent to the server (e.g. it's in the fragment identifier, window.name, see domxsswiki for more examples) then that is quite plainly a different attack vector.  Note that for DOM-based XSS a request still needs to be made to the server (most likely), so it shares that in common with the other attack vectors, but there is no malicious data sent as part of that request, unlike the other attack vectors.
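To make the fragment identifier case concrete, here is a small sketch using Python's standard urllib.parse.  The URL and the request_target helper are hypothetical; the point is only that the fragment is split off on the client and is not part of what gets sent on the HTTP request line.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path and query that would appear on the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Hypothetical URL an attacker might direct a victim to.
url = "https://example.com/page?tab=1#<img src=x onerror=alert(1)>"

print(request_target(url))     # the part the server receives
print(urlsplit(url).fragment)  # the part that stays in the browser
```

Any server-side filter or log only ever sees the first value, so a fix for this kind of issue has to live in the client-side JavaScript that reads the fragment.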

There are other definitions of DOM-based XSS that are looser, that include any DOM property, even if coming via the server, and that's fine for most practical purposes.  For the purpose of categorisation though I think being stricter helps understand how the different attack vectors for XSS fit together.

So what about the cookie-based XSS?  Well if an attacker finds a way to set a victim's cookie such that client-side JavaScript reads the malicious cookie value resulting in code injection, I would say that the value of the cookie goes via the server so that is Reflected XSS.  Whilst I think that is probably the most common scenario, I can imagine scenarios where you would call it Stored XSS and other scenarios where it would be DOM-based XSS.

So there you have it, my opinion on the different categories of XSS, some people will agree and some will disagree.  Generally speaking the security industry has settled on some loose categories of XSS, and that's fine, it's not clear we need strict category definitions, but it can be an informative exercise to consider them.

NB: I refuse to use Type-0, Type-1, or Type-2 names for the different types of XSS, they are meaningless names that only serve to confuse.


Saturday, 13 October 2012

The horrible asymmetry of processing HTML

So I've got my developer hat on for the purpose of this blog post.

If you were designing a client/server system where you had to transport a representation of an object from the server to the client, I'm almost certain that the code you have that serialises and de-serialises that object would be identical on both the client and server.  As an example imagine a web service, the representation of a request and response object defined in XML Schema and instantiated in XML, but both client and server code would only ever deal with objects and not directly with XML i.e. the serialisation and de-serialisation is handled automatically.

So I had a kind of "Huh?, that's weird" moment when thinking about how the majority of web application frameworks handle HTML.  Most frameworks have an HTML template on the server side and then some scripting language interspersed in the template that customises the template for that request.  So that's how an HTML representation of a web page is generated on the server side, but how is that representation processed on the client side?  Well, as we all know, the browser (or User Agent (UA)) creates a Document Object Model (DOM) from the HTML and from this internal object representation displays the web page to the user.

So the client side UA receives a serialised representation of the web page object (in HTML) and de-serialises that to a DOM (object representation) for processing.  However, the server side starts with a serialised version of the web page object (in HTML) and makes direct edits to the serialised version of the object (i.e. the HTML) before sending it to the client.

The lack of symmetry hurts that part of me that likes nice design.

So either the way web application frameworks work is particularly clever and those of us that have been serialising and de-serialising objects symmetrically in every other system we build have really missed a trick.  Or, web application frameworks have evolved from the days of static HTML pages into a system where serialised objects are edited directly in their serialised form, a design that would be universally mocked in any other system.

Now to be fair web application frameworks have evolved into convenient and efficient systems so it could be argued that this justifies the current design.  I would worry though that that is an institutionalised point of view, since making changes to HTML (the serialised web page object) directly is all we have ever known.  Of course it's the right way to do it, because it's the only way we've ever done it!

I'll be the first to raise my hand and say I'm not sure exactly how you might go about generating web pages in a convenient and efficient way in code, before serialising them in HTML, but I certainly don't think it's impossible and I accept that any initial design would be less convenient and efficient than what we already have (but that seems inevitable and would change as we get experience of the new system).

Now for all I know there may be great reasons why we generate the web pages the way we do, but if there are I'm guessing they're not widely known.  I can certainly think of some advantages to manipulating a web page as an object on the server before serialising it and sending it to the client, and if you think about it for a bit, perhaps you can too?
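Here is a minimal sketch of the two approaches side by side, using Python's standard xml.etree.ElementTree as a stand-in for a server-side page object.  This is not how any particular framework works, and the user_comment value is a made-up piece of untrusted input.

```python
import xml.etree.ElementTree as ET

user_comment = "<script>alert(1)</script>"  # hypothetical untrusted input

# Approach 1: edit the serialised form directly, as a template would.
templated = "<p>" + user_comment + "</p>"

# Approach 2: build an object first, then let the serialiser produce the HTML.
paragraph = ET.Element("p")
paragraph.text = user_comment
serialised = ET.tostring(paragraph, encoding="unicode", method="html")

print(templated)   # the input has become markup
print(serialised)  # the input is still just text inside the paragraph
```

In the second approach the code says "this is the text of a paragraph" and the serialiser decides how that text has to be encoded, which is one possible advantage of working with the object rather than the serialised form.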

Sunday, 7 October 2012

Scaling Defences

In defending against vulnerabilities in code there is a concept probably best summed up by this quote:
"We only need to be lucky once. You need to be lucky every time."
IRA to Margaret Thatcher
The concept is basically that an attacker needs to find just one vulnerability amongst all possible instances of that vulnerability, whereas a defender needs to mitigate every instance. If you consider this in the sense of a single vulnerability type e.g. buffer overflows, then I don't think it's true.  The reason I don't think it's true is that the actual problem is the way we create our defences, that is the security controls we put in place to mitigate the vulnerability.

Take the buffer overflow example.  If a developer tries to defend against this vulnerability by ensuring, at each place where he writes to a buffer, that his code never writes beyond the boundaries of that buffer, then failing to do this correctly in just one instance might allow an attacker to find and exploit that vulnerability.  But what if that developer is programming in C#?  There is no need for the developer to be right everywhere in his code, as C# is (effectively) not vulnerable to buffer overflow attacks.  So if we can choose the right defence, we don't need to mitigate every instance.

For me the next question is what differentiates defences that are 'right'?  I would argue that one important criterion, often overlooked, is scale.  Taking the buffer overflow example again, if the developer has to check his bounds every time he writes to a buffer, then the number of vulnerabilities scales with the number of times the developer writes to their buffers.  That's a problem that scales linearly, that is, if there are Y buffers referenced in code, then the number of places you have to check for vulnerabilities is X = aY, where a is the average number of times a buffer is written to.  Other common compensating security controls we put in place to make sure the developer doesn't introduce a vulnerability also tend to scale linearly; code reviewers, penetration tests, security tests, etc.  By this I mean if you have D developers and C code reviewers, then if you increase to 2D developers you will likely need 2C code reviewers.

If you choose the 'right' defence though, so for example using C# or Java, then you don't need either the developer or compensating controls worrying about buffer overflows (realistically you would probably have some compensating controls for such things as unsafe code).  Note, I'm not suggesting that changing programming languages is a practical solution, I'm just trying to give an example of a control that completely mitigates a vulnerability.
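To illustrate a control that mitigates by design, here is a small sketch in Python, used here only as a stand-in for any memory-safe runtime such as C# or Java.  The copy_into function is hypothetical and deliberately contains no bounds check written by the developer.

```python
def copy_into(buffer, data):
    """Naive copy: the developer has written no bounds check at all."""
    for i, byte in enumerate(data):
        buffer[i] = byte  # the runtime checks the index on every write


buf = bytearray(4)
try:
    copy_into(buf, b"AAAAAAAA")  # twice as much data as the buffer holds
    overflowed = True
except IndexError:
    overflowed = False

print(overflowed, len(buf))
```

The developer got the code wrong, but the write past the end of the buffer is refused by the runtime rather than corrupting adjacent memory.  The cost of that control was paid once, in the language, not at every place a buffer is written to.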

Below is a graphical representation of how the costs of certain types of security controls (defences) scale with the number of possible locations of a vulnerability.

The red line is showing the cost of a security control that scales linearly with number of possible locations of a vulnerability. The blue line is the cost of the 'right' security control, including an initial up-front cost.

The purple line is for another type of security control, for situations where we do not have a 'right' security control that mitigates a vulnerability by design.  This type of security control is one where we make accessible, at the locations where the vulnerability might exist, some relevant information about the control.  For example, annotating input data with their data types (which are then automatically validated).  If this information is then available to an auditing tool where it can be reviewed, then the cost of this type of control scales in a manageable way.
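A rough sketch of what this kind of control could look like, assuming input is described by a class whose fields carry type annotations.  TransferRequest, parse_input and audit are hypothetical names, and the validation here is nothing more than type conversion.

```python
from dataclasses import dataclass, fields


@dataclass
class TransferRequest:  # hypothetical input object
    account_id: int
    amount: int
    memo: str


def parse_input(cls, raw):
    """Convert raw string inputs using the declared field types.

    Raises ValueError if a value cannot be converted to its declared type.
    """
    values = {f.name: f.type(raw[f.name]) for f in fields(cls)}
    return cls(**values)


def audit(cls):
    """List every input and its declared type so they can be reviewed in one place."""
    return [(f.name, f.type.__name__) for f in fields(cls)]


print(parse_input(TransferRequest, {"account_id": "7", "amount": "100", "memo": "rent"}))
print(audit(TransferRequest))
```

The developer still has to annotate each input, so the cost is not zero, but a reviewer or tool can look at the output of audit rather than reading every line of code that touches input.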

What is also interesting to note from the graph is that the red line has a lower cost initially than the blue line, until they intersect.  This implies that until there is a sufficient number of possible locations for a vulnerability, it is not worth the initial cost overhead to implement a control that mitigates the vulnerability automatically.  This perhaps goes some way to explaining why we use controls that don't scale, as the controls are just the 'status quo' from when the software was much smaller and it made sense to use that control.
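The intersection point can be sketched with a very simple cost model.  The functions and the numbers below are illustrative assumptions only, they are not measurements of any real control.

```python
import math


def linear_cost(locations, cost_per_location):
    """Cost of a control that must be applied at every possible location."""
    return cost_per_location * locations


def upfront_cost(locations, initial_cost, cost_per_location=0.0):
    """Cost of a control with an initial overhead and a small (or zero) per-location cost."""
    return initial_cost + cost_per_location * locations


def break_even(initial_cost, linear_rate, residual_rate=0.0):
    """Smallest whole number of locations at which the up-front control is no more expensive."""
    if linear_rate <= residual_rate:
        return None  # the lines never cross
    return math.ceil(initial_cost / (linear_rate - residual_rate))


# Made-up figures: 200 units up front versus 5 units per location.
print(break_even(200, 5))
```

Below the break-even number of locations the linear control is cheaper, which is exactly the region where small code bases start out.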

My main point is this; when we design or choose security controls we must factor in how the cost of that control will scale with the number of possible instances of the vulnerability.

Sunday, 9 September 2012

The Model Developer

In security we spend a lot of time focusing on attackers and imagining all the possible ways they might be able to compromise an application or system.  While I think we are currently immature in our ability to model attackers, the industry does seem to spend some time thinking about this, and generally ends up assuming attackers are very well resourced.

I come from a cryptographic background, and in crypto you tend to define an adversary in a more mathematical way.  When designing a new crypto algorithm the adversary is effectively modelled as another algorithm that is only limited in the sense it cannot be 'computationally unbounded' and it does not have direct access to the secrets of the crypto algorithm.  Apart from that no real assumptions are made and it is most certainly expected that the adversary is much smarter than you.

For all the time we spend thinking about what attackers can do, I wonder if we should also spend some of that time modelling what developers can do.  Developers, after all, are an essential part of the systems we build.  Let's try and model a developer:
  • Knowledge.  Developers understand how to code securely.
  • Experience.  Developers have experience in coding securely.
  • Time.  Developers are given sufficient time to write secure code.
  • Priority.  Developers prioritise security over functionality.
  • Consistency.  Developers code security the same way every time.
  • Reviewed.  Developer code is thoroughly reviewed for security.
  • Tested.  Developer code is thoroughly tested for security.
[We are actually modelling more than just a developer here, but also the environment or support structures in which they develop as that directly affects security too.]

How accurate does that model seem to you?  It would be great for people that design systems and their security if developers could be modelled in this way, it would make their jobs a lot easier.  Unfortunately it seems that people who suggest security controls for vulnerabilities are sometimes making an implicit assumption about developers; they have modelled the developer in a certain way without even realising it, and that model is often fairly similar to the one given above.

My favourite example of this is when people say the solution to XSS is to output encode (manually i.e. all data written to a page is individually escaped).  When this is suggested as a solution it is implicitly modelling that developer as: knowledgeable about how to output encode, experienced in output encoding, has the time to write the extra code, will make it a priority, will be completely consistent and not forget to output encode anywhere, will have their code thoroughly reviewed and tested.  Don't misunderstand me, some of these assumptions might be perfectly reasonable for your developers, but all of them?  Consider yourself fortunate if you can model a developer this way.
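A small sketch of why the consistency assumption matters, using html.escape from the Python standard library.  The render_profile function is hypothetical, and the missing escape call is deliberate.

```python
from html import escape


def render_profile(name, bio, website):
    """Manual output encoding: every value has to be escaped individually."""
    return (
        f"<h1>{escape(name)}</h1>"
        f"<p>{escape(bio)}</p>"
        f"<span>{website}</span>"  # escape() forgotten on this one value
    )


payload = "<script>alert(1)</script>"
page = render_profile(payload, payload, payload)
print(page)
```

Two of the three values are encoded correctly, and the page is still vulnerable.  The developer was knowledgeable and mostly consistent, which is not enough for this control to work.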

Much in the same way that we model an attacker to be as powerful as we can (within reason) when designing systems, I think we also need to model the developers of our system to be as limited as possible (within reason).  It's not that I want people to treat developers as idiots, because they are clearly not, it's that I'd like to see the design of security controls have an implicit (or explicit) model of a developer that is practical.

The Content Security Policy (CSP) is an example of a control that I think comes pretty close to having a practical model for developers, since; the developer requires knowledge about how to completely separate HTML and JavaScript and about how to configure the CSP settings, needs some experience using the CSP, has to take time to write separate HTML and JavaScript, doesn't need to prioritise security, doesn't need to try to be consistent (CSP enforces consistency), does require their CSP settings to be reviewed, does not require extra security testing.  The CSP solution does model a developer as someone that needs to understand the CSP solution and code to accommodate it, which could be argued is a reasonable model for a developer.
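As a minimal sketch of the configuration side, here is a helper that builds a Content-Security-Policy header value.  build_csp is a made-up helper and static.example.com is a placeholder host; default-src, script-src and 'self' are standard CSP directives and keywords.

```python
def build_csp(directives):
    """Join a mapping of directive name to source list into a header value."""
    return "; ".join(
        name + " " + " ".join(sources) for name, sources in directives.items()
    )


policy = build_csp({
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://static.example.com"],
})
headers = {"Content-Security-Policy": policy}
print(headers)
```

Because 'unsafe-inline' is not listed for script-src, a browser enforcing this policy refuses inline script, so the developer has to keep JavaScript in separate files but does not have to remember anything at each place data is written to the page.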

Ideally of course we want to model developers like this:
  • Knowledge.  Developers don't understand how to code securely.
  • Experience.  Developers don't have experience in coding securely.
  • Time.  Developers have no time to write secure code.
  • Priority.  Developers prioritise only functionality.
  • Consistency.  Developers code security inconsistently.
  • Reviewed.  Developer code isn't reviewed for security.
  • Tested.  Developer code isn't tested for security.
If our security controls worked even when a developer gave no thought to security at all, then in my opinion that's a great security control.  I can't think of a lot of current controls in the web application space that have this model of the developer.  In the native application world we have platforms like .NET and Java that have controls for buffer overflows that model the developer this way, as developers on these platforms don't even have to think about buffer overflows.  You might be thinking that's not a great example as developers are able to write code with a buffer overflow in .NET or Java i.e. in unsafe code, however I think we have to model developers to be as limited as possible, within reason, and the reality is that unsafe code is enough of a corner case that we can treat it like the exception it is.

Modelling developers in a way that accounts for the practical limitations they face leads me to believe that creating frameworks for developers to work in, a sand-boxed environment if you will, allows for security controls to be implemented out of view of developers, enabling them to focus on business functionality.  A framework allows a developer to be modelled as requiring some knowledge, experience and testing, but minimal time, priority and consistency.  A framework does still have substantial demands for review though (although I think automating reviews is the key for making this manageable).
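A toy sketch of a control implemented out of view of the developer: a render function that encodes every value by default.  SafeHtml and render are hypothetical names and this is not the API of any real framework, it only uses html.escape and string.Template from the Python standard library.

```python
from html import escape
from string import Template


class SafeHtml(str):
    """Marker type for markup that the framework itself produced."""


def render(template, **values):
    """Fill in a template, encoding every value unless it is already SafeHtml."""
    encoded = {
        key: value if isinstance(value, SafeHtml) else escape(str(value))
        for key, value in values.items()
    }
    return SafeHtml(Template(template).substitute(encoded))


greeting = render("<b>Hello $name</b>", name="<script>alert(1)</script>")
page = render("<div>$body</div>", body=greeting)
print(page)
```

The developer writing the template never calls an escape function and so cannot forget to.  What still needs review is the framework itself and any place a developer constructs SafeHtml directly, which is a much smaller surface than every line that writes data to a page.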

If we can start being explicit about the model we use for developers when we create new security controls (or evaluate existing ones) we can hopefully better judge the benefits and effectiveness of those controls and move closer to developing more secure applications.

Drupal 7 security notes

So I just put together a page on Drupal 7 Security.  It doesn't require a lot of existing knowledge about Drupal, but some appreciation would probably help - at least knowing that Drupal is extendable via Modules and customisable via Hooks.

The notes were created so I could give some advice on securing Drupal 7, and since I didn't have any knowledge about Drupal security, the goal of the notes is to bring someone up to speed on what mitigations or approaches Drupal makes available to solve certain security threats.

Here are the topics I cover:
The Basics
Sessions
User Login
Mixed Mode HTTP/HTTPS
CSRF
Access Control
Dynamic Code Execution
Output Encoding
Cookies
Headers
Redirects

What is interesting after you understand what Drupal offers, is to think about the things it does not offer.  I worry a lot about validating input and if you use the Drupal Form API then you get a good framework for validation as well, similarly for the Menu system.  However for other types of input, GET request parameters, Cookies, Headers etc., you are on your own.  There are a variety of 3rd party modules that implement various security solutions e.g. security related headers etc., but it would be good if these were part of Drupal Core, as security should never just be an add-on.

Saturday, 28 April 2012

Why being a defender is twice as hard as being an attacker

So it occurred to me that being a defender is twice as hard as being an attacker (at least).  I don't mean that in an absolute or measurable sense of course, just in some sense that will become obvious.  I also will limit the context of that claim to applications although it may apply to other areas of security as well.

The goal of an attacker is to find vulnerabilities in an application.  An application is protected by defenders who design vulnerability mitigations and developers who implement functionality.  Of course an attacker only needs to find a single weakness and a defender needs to try to protect against all attacks, which itself would probably support my claim, but it's not what my point is going to be.

Conversely, a defender's goal is to minimise the number of vulnerabilities in an application.  Defenders attempt to realise this goal by designing defenses that both limit what the attacker can do and limit the flexibility the developer has.  However, it is not only attackers that will hack away at a defender's defenses, it's also the developer.  The point of this blog post is that developers show surprisingly similar characteristics to attackers when they create novel ways to circumvent the defense mechanisms defenders put in place.  After all developers have the goal of implementing functionality with the minimum amount of effort possible, and defenses often make that more difficult (even if only marginally more difficult).

Clearly the motivations are entirely different in the attacker's and the developer's cases, but at the end of the day the defenders are being attacked on twin fronts; by the attackers looking to get in and by the developers looking to break out.

Monday, 23 April 2012

CVSS doesn't measure up.

I was doing some basic research into software metrics the other day and I came across something that I was probably taught once but had long since forgotten.  It was to do with the way we measure things and is covered in the Wikipedia article on Level of Measurement.

Basically there are 4 different scales which are available to measure things:
  • Nominal scale - Assigning data to named categories or levels.
  • Ordinal scale - A Nominal scale but the levels have a defined order.
  • Interval scale - An Ordinal scale but the difference between levels (the unit) is well defined.
  • Ratio scale - An Interval scale but with a non-arbitrary zero-point.
Why these scales are interesting is that only certain types of math can be done, and therefore only certain conclusions can be drawn, depending on which scale the measurements belong to.  For instance we can order the finishing place of a horse race into 1st, 2nd, 3rd etc. (an Ordinal scale), but we can't meaningfully say what the average finishing place of a horse is as there is no magnitude associated with the difference between the levels.  If on the other hand the races were over the same distance, we could measure the time the horse took to complete the race (a Ratio scale) and calculate its average time.

Sometimes we have an Ordinal scale that looks like an Interval or Ratio scale, for instance when we assign a numeric value to the levels e.g. ask people how much they like something on a scale of 1 to 5.  But this is still an Ordinal scale, and although we can assume that the difference between each level is a constant amount, nothing actually makes that true.  Thus calculating the average amount that people like something e.g. 2.2, is often a meaningless number.
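A small worked example shows why the average is not trustworthy here.  The two groups of responses and the two numeric codings below are made up; both codings preserve the order low < medium < high, which is all an Ordinal scale guarantees.

```python
from statistics import mean, median_low

group_a = ["medium", "medium", "medium", "medium"]
group_b = ["low", "low", "low", "high"]

# Two numeric codings that both respect the order low < medium < high.
coding_1 = {"low": 1, "medium": 2, "high": 3}
coding_2 = {"low": 1, "medium": 2, "high": 10}


def summarise(responses, coding):
    """Return (mean, low median) of the responses under a numeric coding."""
    scores = [coding[r] for r in responses]
    return mean(scores), median_low(scores)


for coding in (coding_1, coding_2):
    print(summarise(group_a, coding), summarise(group_b, coding))
```

Under the first coding group A has the higher mean, under the second group B does, even though nothing about the responses changed.  The median, which only relies on order, ranks group A above group B under both codings.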

When reading about this I was reminded of the way vulnerabilities are categorised and how we would so dearly like to be able to assign numbers to them so we can do some math and reach some greater insight into the nature of the vulnerabilities we have to deal with.  The Common Vulnerability Scoring System (CVSS) suffers essentially from this problem; vulnerabilities are assigned attributes from certain (ordered) categories, and then a complicated formula is used to derive a number in a range from 0 to 10.  It is basically optimistic to think that a complicated formula can bridge the theoretical problem of doing math on values from an Ordinal scale.  I wouldn't necessarily go to the other extreme and say it makes CVSS totally without merit - just that it's not the metric you likely wish it was.