Samadhic Security: The horrible asymmetry of processing HTML

So I've got my developer hat on for the purpose of this blog post.

If you were designing a client/server system that had to transport a representation of an object from the server to the client, I'm almost certain that the code that serialises and de-serialises that object would be identical on both the client and the server. As an example, imagine a web service: the request and response objects are defined in XML Schema and instantiated in XML, but both client and server code only ever deal with objects and never directly with XML, i.e. the serialisation and de-serialisation is handled automatically.
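To make that symmetry concrete, here is a minimal sketch in Python. The `Greeting` class and the `serialise`/`deserialise` helpers are hypothetical names made up for illustration, and a real web service would generate this plumbing from the schema rather than write it by hand. The point is only that both ends share one pair of functions and otherwise touch nothing but objects.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Greeting:
    """Hypothetical response object shared by client and server."""
    recipient: str
    message: str


def serialise(greeting: Greeting) -> str:
    """Turn the object into its XML wire form."""
    root = ET.Element("greeting")
    ET.SubElement(root, "recipient").text = greeting.recipient
    ET.SubElement(root, "message").text = greeting.message
    return ET.tostring(root, encoding="unicode")


def deserialise(xml_text: str) -> Greeting:
    """Rebuild the object from its XML wire form."""
    root = ET.fromstring(xml_text)
    return Greeting(
        recipient=root.findtext("recipient"),
        message=root.findtext("message"),
    )


# Server side: work with the object, serialise only at the boundary.
wire = serialise(Greeting(recipient="Alice", message="Hello & welcome"))

# Client side: de-serialise at the boundary, then work with the object again.
received = deserialise(wire)
```

Neither side ever edits the XML text itself; the ampersand in the message is escaped on the way out and restored on the way in without either side thinking about it.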

So I had a kind of "Huh, that's weird" moment when thinking about how the majority of web application frameworks handle HTML. Most frameworks have an HTML template on the server side, with some scripting language interspersed in the template that customises it for each request. So that's how an HTML representation of a web page is generated on the server side, but how is that representation processed on the client side? Well, as we all know, the browser (or User Agent (UA)) creates a Document Object Model (DOM) from the HTML, and from this internal object representation displays the web page to the user.

So the client-side UA receives a serialised representation of the web page object (in HTML) and de-serialises it to a DOM (an object representation) for processing. However, the server side starts with a serialised version of the web page object (in HTML) and makes direct edits to that serialised version (i.e. the HTML) before sending it to the client.
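The contrast can be sketched in a few lines of Python. This is an illustration only: `PAGE_TEMPLATE`, `render_on_server`, `Node`, `TreeBuilder` and `parse_on_client` are hypothetical names, the template stands in for whatever a real framework uses, and the tree builder is a toy that ignores attributes and void elements, nothing like a real browser's parser.

```python
from html.parser import HTMLParser
from string import Template

# Server side: the page only ever exists as text, and is edited as text.
PAGE_TEMPLATE = Template("<html><body><h1>Hello, $name</h1></body></html>")


def render_on_server(name: str) -> str:
    """Customise the page by substituting into the serialised form."""
    return PAGE_TEMPLATE.substitute(name=name)


class Node:
    """A deliberately tiny stand-in for a DOM element."""

    def __init__(self, tag: str):
        self.tag = tag
        self.children = []  # child Node objects or text strings


class TreeBuilder(HTMLParser):
    """Client side: de-serialise the HTML text into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self._stack[-1].children.append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Never pop the synthetic document root.
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        self._stack[-1].children.append(data)


def parse_on_client(html_text: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root


html_text = render_on_server("Alice")
document = parse_on_client(html_text)
# Walk html -> body -> h1 in the object representation.
heading = document.children[0].children[0].children[0]
```

The server function never has an object to work with, only a string with a hole in it, while the client immediately turns the same bytes into a tree it can walk.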

The lack of symmetry hurts that part of me that likes nice design.

So either the way web application frameworks work is particularly clever, and those of us who have been serialising and de-serialising objects symmetrically in every other system we build have really missed a trick, or web application frameworks have evolved from the days of static HTML pages into a system where serialised objects are edited directly in their serialised form, a design that would be universally mocked in any other system.

Now, to be fair, web application frameworks have evolved into convenient and efficient systems, so it could be argued that this justifies the current design. I would worry, though, that this is an institutionalised point of view, since making changes to HTML (the serialised web page object) directly is all we have ever known. Of course it's the right way to do it, because it's the only way we've ever done it!

I'll be the first to raise my hand and say I'm not sure exactly how you might go about generating web pages in a convenient and efficient way in code, before serialising them as HTML. But I certainly don't think it's impossible, and I accept that any initial design would be less convenient and efficient than what we already have (that seems inevitable, and would change as we gained experience with the new system).
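As a rough idea of what that could look like, and no more than that, here is a sketch that builds the page as a tree with Python's standard `xml.etree.ElementTree` module and serialises it only at the very end. The `build_page` and `serialise_page` helpers are hypothetical, and I'm not claiming any existing framework works this way.

```python
import xml.etree.ElementTree as ET


def build_page(title: str, items: list[str]) -> ET.Element:
    """Build the web page as an object tree, not as text."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    ul = ET.SubElement(body, "ul")
    for item in items:
        ET.SubElement(ul, "li").text = item
    return html


def serialise_page(page: ET.Element) -> str:
    """Serialise the finished tree to HTML in one place, at the boundary."""
    return "<!DOCTYPE html>\n" + ET.tostring(page, encoding="unicode", method="html")


page = build_page("Shopping list", ["Milk", "Eggs & bacon"])

# The page is still an object here, so it can be changed structurally.
extra = ET.SubElement(page.find("body/ul"), "li")
extra.text = "<script>alert(1)</script>"  # stored as text, not as markup

html_text = serialise_page(page)
```

It is clearly clunkier than a template for anything page-sized, which is the inconvenience I mentioned. One thing that does fall out of it is that text placed in the tree stays text: the serialiser escapes it on the way out, so the last list item arrives at the client as characters to display rather than as a script element.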

Now, for all I know, there may be great reasons why we generate web pages the way we do, but if there are, I'm guessing they're not widely known. I can certainly think of some advantages to manipulating a web page as an object on the server before serialising it and sending it to the client, and if you think about it for a bit, perhaps you can too?


Saturday, 13 October 2012
