SitePen Support
Orbited

Independence Day: HTML5 WebSocket Liberates Comet From Hacks

by Michael CarterJuly 4th, 2008

A recent set of HTML5 discussions are changing the course of Comet. First, a recap of the last two years of Comet: With long-polling we set the bar to cross-browser push. With XHR streaming and ActiveXObject(’htmlfile’) we raised it to cross-browser streaming. With SSE we’ve been trying to raise the bar to native, cross-browser streaming. And there we’ve sat, hoping that browser vendors actually implement the latest SSE spec.

I say we’ve been selling ourselves short. All this time pushing for a native server->client streaming transport, but we still lack client->server streaming, and anything resembling a standard transport for bi-directional communication. The Holy Grail of Comet development has always been native browser support of a full-duplex, single-connection communication’s channel, otherwise known as a TCP socket. But we’ve been mired down in hacks so long that we’ve lost the vision.

No Longer. The HTML5 specification now offers WebSocket, a full-duplex communications channel that operates over a single socket. I have been listening closely, and in some cases contributing, to the process of ensuring that WebSocket will:

  • Seamlessly traverse firewalls and routers
  • Allow duly authorized cross-domain communication
  • Integrate well with cookie-based authentication
  • Integrate with existing HTTP load balancers
  • Be compatible with binary data

The API of WebSocket is very straightforward. You create a WebSocket and point it at a url:

var conn = new WebSocket("ws://www.example.com/livedemo")

Then you attach three callbacks:

conn.onopen = function(evt) { alert("Conn opened"); }
conn.onread = function(evt) { alert("Read: " + evt.data); }
conn.onclose = function(evt) { alert("Conn closed"); }

And finally, you can send upstream data with a simple function call:

conn.send("Hello World")

The browser will perform an HTTP handshake with the target web server to determine support, and then a direct stream will be exposed via the onread and send functions. The uri scheme “ws://” is used for WebSocket connections, and the “wss://” URI scheme is for secure WebSocket connections.

After the handshake, bi-directional framed communication ensues. Each frame can be either binary or text, thus allowing for swapping the encoding mid-stream. You can find more information about the protocol itself at the network section of the whatwg HTML5 draft page

While the HTML5 specification is not in a finalized stage, the first public draft was published by the W3C in January 2008, and browser vendors have already began targeting features in the specification. The idea of putting a duplex channel into the spec is not a new one; the TCPConnection API and protocol was initially drafted more than two years ago. Unfortunately there were many significant problems with TCPConnection that held back browser adoption. Ian Hickson, the editor of the HTML5 specification, tackled these problems head on and under his guidance the standard has evolved to a usable state with WebSocket. About this new feature, Mr. Hickson comments:

“I’m looking forward to seeing Web Socket implemented in browsers, as I think it’s going to enable all kinds of realtime applications like chatting, remote controls, and the like, without the ridiculous hacks authors have to use today.”

Comet

WebSocket in a browser is terrific because it drastically cuts down the complexity of the Comet server by an order of magnitude. What’s more though, it provides a straightforward, understandable API to JavaScript developers. The most important part of the specification is that developers can wrap their heads around the API in about five seconds. That’s because it looks so much like a socket.

If the future prospect of a native WebSocket isn’t enough good news, I am proud to announce that the Orbited project has implemented WebSocket for all major browsers, today. We do this by communicating over various Comet transports with the browsers, then performing the WebSocket handshake with the remote server, and proxying data in between. This means that today you can write a WebSocket server and application, start Orbited up, and be on your way. Tomorrow, you won’t need to change any of your server or client code whatsoever. Your application will fall forward to the native implementation of WebSocket for improved performance.

TCPSocket

The single most voiced criticism to this specification has been that a WebSocket isn’t quite the same as a raw TCP socket, because a WebSocket server needs to understand a specific handshake in order for browsers to connect directly, and as such a WebSocket can’t connect to existing servers. If we did allow raw TCP sockets in the browser, a malicious site could cause any visitors to open up a TCP connection to an SMTP server, for instance, turning a casual web visitor into a spambot. There are many variations on this scenario, but the general problem is that a raw socket connections in a browser will allow any sites that a user visits to access network services as if they were the user, in the same network security context as the user. We need to therefore make this an opt-in process or we’ll catch existing servers off-guard. Furthermore, very few protocols have any kind of cross-domain authorization or security mechanisms built in. If we were to allow raw TCP, then we would be opening all manner of cross-site security holes. We could fix these by limiting TCP connections to the origin domain and port, (meaning a direct sockets back to the webserver only) but that would limit any usefulness the TCP socket could provide.

I fully understand the criticism though; Earlier this week I discussed exactly why having a raw socket in the browser is so desirable. You could, for instance, quickly prototype a Gmail clone using a raw socket, an IMAP client, and an XMPP client in the browser.

We have a clear problem then: Direct access to existing network servers could greatly simplify application architecture, but due to security restrictions it’s a non-starter; we absolutely must retrofit each network server with the new WebSocket protocol first. I hope that happens, but we can’t count on it, at least not right away. What we really need is a way to allow the server to opt-in without putting it in the protocol, a way to seamlessly layer access control in front of the back-end server. It turns out that this problem has already been solved for traditional networks. That is, if we have two end-points communicating over TCP, and we need transparent access control in between, then we can use a well known device: A firewall. The beauty of a firewall is that server behind it requires no re-programming, or even re-configuration, yet gains all of the access control/security benefits. What we really need in the browser case, is a custom firewall that can listen for WebSocket connections from the browser, enforce access control, and relay TCP to a back-end server.

That is why Orbited provides this feature under the API name TCPSocket. Orbited is the firewall that sits between the back-end server and the browser. It understands WebSocket protocol for browser communication, and uses whitelist security to accept or reject requests to proxy TCP data to and from a back-end server. That’s right, you can fire up a stock XMPP server, and Orbited, and write the XMPP client entirely in JavaScript. This works cross-browser today. We also offer a binary mode that uses an intermediate encoding to allow the browser to read raw bytes (in the form of JavaScript integer arrays) from a remote server. Here is a diagram of the architecture:

Orbited is a Web Firewall

The Future

Now its up to the browser developers to implement Websocket. I expect some will be very quick on the uptake, while it may take years for others. I expect to see a common pattern emerge where application servers listen to the WebSocket protocol directly from new browsers, but fallback to Orbited’s emulation layer for legacy browsers. The key here is that we don’t have to wait on the browser vendors to get started. We can all develop these WebSocket applications now, and when browsers have native support, we’ll all get a performance boost.

We will probably never get a native (raw) TCP socket in the browser, for the security reasons I outlined. It’s okay though — we can use the firewall pattern I outlined. For more information about installing and configuring Orbited, check out the documentation and the getting started section.

[Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]
Orbited

16 Responses to “Independence Day: HTML5 WebSocket Liberates Comet From Hacks”

  1. Martin Tyler Says:

    I’ve not had time yet to look at the spec for this, but you seem to be mainly talking about browsers implementing this. Surely every proxy out there needs to somehow support this.. and what does a server wanting to support it need to do? You, or maybe ‘they’, seem to be presenting it as a browser thing, but it needs support on the server too surely.

  2. Frank Salim Says:

    Martin:
    In the article, it says that WebSocket has a protocol designed to work through current proxies. Also, until there are servers supporting WebSocket, Orbited provides backwards compatibility. It has a WebSocket to TCP bridge.

  3. Martin Tyler Says:

    The point of this is to get a socket, not a socket wrapper over comet ‘hacks’ though.

    Ok, I have now read the spec. It uses the existing CONNECT command when going through a proxy, which is intended for SSL connections. Thats something I played around with years ago, but obviously only things like signed applets can make connections to the proxy directly like that.

    With no proxy configured it just makes a straight connection to the server.

    The communication isnt a straight socket in that you couldnt connect to an existing server and handle its protocol - the websocket protocol requires specific headers and encoding (although pretty simple).

    So this looks just like a much better way to implement a comet server, rather than being able to connect browsers to existing servers.

    So anyone know an ETA on browsers supporting this?

  4. Metal Hurlant Says:

    > We will probably never get a native (raw) TCP socket in the browser, for the security reasons I outlined.

    I question the validity of this statement. Every browser with Flash 9 installed has support for raw TCP sockets, and the security model it relies on doesn’t seem to have the security problems you allude to. (it uses socket-served crossdomain.xml files.)

    You mentioned how neat it would be to have an XMPP client in the browser.
    There’s an open-source project that does just that already: http://www.igniterealtime.org/projects/xiff/

    In truth, with so much of HTML 5’s spec aiming squarely at catching up with Flash functionality, I’m a little bit amazed to see several people behind the spec act in their html 5-related blog posts as if Flash doesn’t exist.

    It does exist, it has already implemented a lot of the new exciting stuff HTML 5 promises, and it seems peculiar not to learn from it before re-implementing your own flavor of it.

    (As a fun aside, the SVG 1.2 draft spec hints at some support for raw TCP socket support. It’s obviously not popular to acknowledge Flash, but their source of inspiration is fairly obvious to anyone who knows the flash APIs and bothers to read their spec http://www.w3.org/TR/2004/WD-SVG12-20041027/ )

  5. David Davis Says:

    I think It should use an observer pattern, so you don’t have to wrap it in a scope.

    conn.addListener( ‘open’, this.handleOpen, this );
    conn.addListener( ‘read’ this.handleRead, this );
    conn.addListener( ‘close’, this.handleClose, this );

  6. Jacob Rus Says:

    David: It does. Notice that the spec says “WebSocket objects must also implement the EventTarget interface.”

  7. RogerV Says:

    WebSocket as a much superior bi-directional, full-duplex Comet is superb just by itself. Not particularly concerned about trying to get JavaScript in a browser to consume legacy TCP services at the server side.

    I’ve been waiting about five years for the web to get a protocol like WebSocket so that web apps can be implemented to be like distributed rich client messaging-based apps.

    What will be way cool is to keep writing Flex apps today to Adobe’s BlazeDS remoting and messaging abstraction layer, and then down the road get an automatic upgrade in performance when BlazeDS and the FlashPlayer implement WebSocket support.

    Finally the W3C has something cooking that is very worthwhile. Otherwise I’d written W3C off as completely hopeless.

  8. David Davis Says:

    Jacob: Ah I see

    Thanks

  9. GregWilkins Says:

    I think this is a huge step backwards. Sockets are simply not the right level of abstraction that we want to expose in javascript. Asynchronous socket programming is hard, with one of my favorite examples being, what does a programmer do with 3 byte of a 6 bytes UTF-8 character?

    Currently, nice native code in the browser handles our dataframing and character conversion for us. Websockets is putting that difficult burden onto the javascript framework.

    HTTP gives us a lot of support with character encoding and data framing. Many of the so-called hacks take great advantage of this and use the efficient implementations in servers and browsers. Dropping the semantic of the conversation to raw bytes on the wire will just turn js framework developers into async IO programmers and introduce years of instability while they learn how hard that is!.

  10. Frank Salim Says:

    Greg:
    The authors of the WebSocket spec anticipated this, so WebSocket differs from a TCP socket* in that it is text-based and framed.

    http://www.whatwg.org/specs/web-apps/current-work/multipage/comms.html#ws-sd-framing

    You will see that the browser hides buffering so that JavaScript only deals with complete frames (which are strings). This makes sending and receiving complete JSON objects or XML very easy. As a matter of fact, this sounds like _exactly_ what you want.

    *WebSocket is on top of TCP — it isn’t reinventing the wheel

  11. GregWilkins Says:

    OK, I stand corrected.

    They need to work on there names and descriptions. The name Socket is misleading and their text talks about bytes being available etc.

    So this is a messaging system then, were the messages are just strings that can be XML, Json, javascript or something else. Well that I like!

  12. Orbited Blog » Blog Archive » Talk at OSCON 2008 Says:

    [...] Independence Day: HTML5 WebSocket Liberates Comet From Hacks [...]

  13. Comet Daily » Blog Archive » Comet Gazing: WebSocket Says:

    [...] time, we’ve asked our contributors the question: “With the recent addition of WebSocket to the HTML 5 recommendation, what impact will this have on your Comet implementation in both the [...]

  14. Orbited Blog » Blog Archive » Our Ancestor’s Secrets: WebSocket Article and Panel Says:

    [...] on WebSocket for the Silicon Valley Web Builder blog. He talks about his experiences explaining WebSocket to developers, and how we can “recover our ancestor’s secrets” of good [...]

  15. Comet Daily » Blog Archive » Dojo and WebSocket Says:

    [...] HTML5 defined WebSocket has been gaining in popularity because of its efficient and intuitive approach to Comet. Orbited [...]

  16. John Bailo Says:

    I’ve been trying to avoid Comet altogether until someone came up with a true means to invoke callback functions remotely.

    This seems like the real thing — closer to RMI then a “long polling” hack like Comet.

Leave a Reply



Copyright 2009 Comet Daily, LLC. All Rights Reserved