In part 2 of this series, Michael Carter agrees with Greg Wilkins on the need for a Comet standard, but says that adopting Bayeux as-is will lead to a “fractured community with competing and overlapping standards.”
Part 1: Greg Wilkins explains the need for Bayeux
Part 2: Michael Carter criticizes the current state of Bayeux
Part 3: Greg Wilkins responds to Michael Carter
Andrew Betts’ thoughts (from a related article)
Part 4: Michael Carter responds to Greg Wilkins
Part 5: Kris Zyp’s thoughts
Part 6: Alex Russell responds to Michael Carter
Part 7: Michael Carter responds to Alex Russell
Comet needs a standard. As Comet takes off over the next couple of years, the single most important action we can take is to standardize communication between Comet servers and clients. Given that this is my first priority, my second is to make that standard as good as possible. And I must therefore say that Bayeux, at least in its present form, is not the standard we want. There are a number of problems, both low-level issues with the protocol and high-level philosophical errors in the design of the protocol. We should act now to correct these problems before Bayeux becomes widespread.
In the course of the article I’ll show that Bayeux is a monolithic, “one-size-fits-all” approach to a deeply complex problem. The downside to this approach is that many developers will not adhere to the standard in the cases where it is not a good fit. We want all Comet developers to follow a standard, and the only way to ensure that is to design a standard that can adapt to any use case.
I advocate that we adopt a layered approach whereby we break discrete atoms of functionality out of the Bayeux spec and isolate them in separate protocols/APIs. Together these layers offer the full functionality of Bayeux, but they can be used separately as well. In order to illustrate why this is a good idea, we need to look at existing layered protocol architecture, and also at different classifications of communication protocols.
The protocols on which the internet was founded provide a solid but flexible foundation for any conceivable network application. It makes sense, therefore, to look to the Internet Protocol (IP) as we re-invent network communication for the browser. After all, any modern network application is ultimately built on IP, which routes payloads to destination addresses. IP makes no reliability or message-ordering guarantees.
Transmission Control Protocol (TCP) treats IP as an unreliable data transport and implements message delivery and ordering guarantees, connection management, and flow control. The promise of TCP is that, as long as the connection stays up, it will deliver all of the data, and deliver it in order. To accomplish this, TCP sacrifices latency. Almost all modern protocols and applications are built on TCP, with the exception of gaming and media streaming protocols that cannot accept increased latency.
Internet Relay Chat (IRC) protocol is one of the many higher level protocols built on TCP. IRC is primarily meant for chat, and provides the basic mechanisms for distributed publish/subscribe. It’s trivial to build a chat application on top of IRC because that’s its main use case. I choose to put IRC in this analogy because it most directly relates to the functionality provided by Bayeux.
It turns out that there is no application that cannot be built on top of IRC. If the application needs channels, then it’s a perfect fit. If the app only needs peer messaging, then it can ignore the channels. Even video data could be encoded and delivered over an IRC network. We should therefore combine IP, TCP, and IRC into a single standard. Even Web content could be delivered over IPTCPIRC.
I’m sure this already sounds dubious at best. Actually, combining these protocols is among the worst suggestions I’ve ever made. Reasons include:
- If we change IRC semantics, then we must overhaul our IP routers.
- We can’t experiment with alternatives to TCP/IP. We are stuck with IPTCPIRC, and can never try IPUDPIRC, for example.
- We lose the ability for ultra-low latency data transfer. Video conferencing would be terribly laggy.
- Creating an IPTCPIRC client would be incredibly difficult, so much so that we’d likely just have a single implementation.
There are dozens more reasons why this makes no sense. It’s easy to identify these reasons simply by finding the benefits of having each layer separated. The crux of the issue is that when we combine layers, we lose the potential strengths of the lower-level components. For instance, IPTCPIRC loses the advantages of IP’s low latency and simplicity.
We can also look at a protocol by examining what type of communication it is. Consider three forms of communication: a radio broadcast, a walkie-talkie exchange, and a telephone conversation. Communication between two points will always resemble one of these three. Technically speaking, they are defined as follows:
- Simplex: This is similar to a radio broadcast. A source may send data to a destination, but cannot receive a reply.
- Half-duplex: This is similar to a walkie-talkie conversation. Either side may transmit, but never both at the same time.
- Full-duplex: This is similar to the phone conversation. Either side of the conversation may transmit any data at any time. Both may transmit at the same time.
These definitions do not map directly to communication on the web, but we can draw some close parallels. Typical browser-server (HTTP) interaction very much resembles half-duplex communication; it is similar to a walkie-talkie conversation. HTTP isn’t strictly half-duplex, though, because the server can’t transmit except in response to a client request.
Browser: “Firefox/Windows here, I’d like to do ABC, over.”
Server: “Apache here, I did what you asked, and here’s the result: XYZ, over.”
Ideally though, we’d allow either the server or the browser to transmit data at any point. The supposed purpose of Comet, in fact, is to provide a full-duplex channel over HTTP.
A full-duplex channel can easily be constructed out of two limited channels. Two simplex channels together form a full-duplex channel. Likewise, two half-duplex channels make a full-duplex channel, or one simplex and one half-duplex channel can be used for full-duplex communication.
So, consider long-polling, or iframe streaming. Both of these methods simulate a simplex channel for transmitting data from the server to the browser. At any point the server can asynchronously dispatch a message to the client.
Next consider a series of XHR requests. The XHR requests closely resemble a simplex channel for the purpose of browser -> server transmission. Every time the client has something to say, it transmits it, and the server responds immediately with a minimal “ok” message.
The task of creating a full-duplex communication model can be simplified to that of creating two simplex channels and using them in parallel. This is how Comet works.
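To make the idea concrete, here is a minimal Python sketch of two simplex channels paired into one full-duplex link. The class names `SimplexChannel` and `FullDuplexEndpoint` are invented for this illustration, and the in-memory queues only stand in for real transports such as long-polling (downstream) and XHR requests (upstream).

```python
from collections import deque


class SimplexChannel:
    """One-way channel: messages flow from the sender to the receiver only."""

    def __init__(self):
        self._queue = deque()

    def send(self, message):
        self._queue.append(message)

    def receive(self):
        # Returns the oldest undelivered message, or None if nothing is waiting.
        return self._queue.popleft() if self._queue else None


class FullDuplexEndpoint:
    """Pairs an outgoing simplex channel with an incoming one."""

    def __init__(self, outgoing, incoming):
        self._outgoing = outgoing
        self._incoming = incoming

    def send(self, message):
        self._outgoing.send(message)

    def receive(self):
        return self._incoming.receive()


# Hypothetical stand-ins: "downstream" plays the role of long-polling or
# iframe streaming, "upstream" plays the role of a series of XHR requests.
downstream = SimplexChannel()  # server -> browser
upstream = SimplexChannel()    # browser -> server

browser = FullDuplexEndpoint(outgoing=upstream, incoming=downstream)
server = FullDuplexEndpoint(outgoing=downstream, incoming=upstream)

# Either side can transmit without waiting for the other.
server.send("quake detected")
browser.send("hello")
print(browser.receive())  # quake detected
print(server.receive())   # hello
```

Neither endpoint knows how the other direction is implemented, which is the property that lets the two directions use entirely different mechanisms.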
Bayeux was constructed for a particular class of application. It is a full-stack, full-duplex protocol. It provides everything from data transport to authentication mechanisms and publish/subscribe, as well as an asynchronous, bi-directional communication channel. When we consider that the main drive of Comet has been to provide full-duplex communication, it becomes clear why a chat demo has become the “Hello World” of Comet. After all, a close WWW analogy to a full-duplex phone conversation is a chat room.
Another example where Bayeux succeeds is that of a multiplayer game lobby system. In this system there is one large “room” where people can arrange games of poker. Once they join a game, they are moved from the lobby to that game’s room. Once in the game, users publish their actions to the game channel so that other players in the same game all see what happened. I might publish the JSON data structure [ "raise", "$5" ] to signify that I just raised the pot by five dollars. Publishing and subscribing is exactly the sort of base functionality that is required for this type of application, and so Bayeux works extraordinarily well.
Cost Without Benefit
There exists a second class of application that is just as important, though largely unacknowledged by Bayeux. These applications are known as “broadcast” applications. One example is a web page that shows earthquake information in real time. This is an application that requires no client -> server communication; it uses simplex interaction. It doesn’t need a publish mechanism, or really even a subscribe mechanism. The URL itself is enough of a subscription request for this information.
The unnecessary cost in this scenario is the overhead of defining the upstream communication model in a protocol, and all the development costs associated with implementing this feature in Bayeux clients and servers even though it goes unused. If a developer is creating a Comet server optimized towards a broadcast type of application, then it makes very little sense to implement Bayeux in its current form. And that’s exactly the reasoning that developers behind some existing Comet servers are using to justify not following the Bayeux spec.
The bottom line is that it takes significant additional resources to follow the Bayeux specification, and some servers and applications have no need for those features, so many Comet server developers will end up abandoning the standard. Any application that doesn’t require the clients to publish back will likely benefit from choosing a server that doesn’t follow the Bayeux standard. The developers, after all, will have spent 100% of their time optimizing towards this use case, and none of their time implementing Bayeux’s unnecessary features.
Cost and more Cost
There is another class of application that actually does need some form of publish/subscribe, but the actual system needed is different enough from that provided by Bayeux so as to render Bayeux’s publish/subscribe system a useless burden.
Consider another multiplayer game lobby system. The game this time is a real-time strategy game. It will have a channel for the lobby which works the same as before. In this game though, we want to verify in-game actions performed by the client. If a player found a treasure chest, he might publish to the game channel [ "found", "15 energy" ], but we want to make sure that the player isn’t cheating. Therefore, a player might instead publish [ "found", "item1" ] to the server. The server would then verify that the client could have legally found item1, and publish back [ "found", "15 energy" ] to signify what the item actually was.
But if this real-time strategy game had fog of war (which means that players can only see certain portions of the game environment at a time), then it would be bad for this exchange to be broadcast to all players. Therefore we would have to implement a system whereby the server and the individual clients could communicate directly. The server would subscribe to a channel with the ID “server” and the players would subscribe to a channel that contains their player name, like “player-michael”. Now, when I found item1, I would publish not to the game channel, but to the “server” channel. The server would publish back to the “player-michael” channel with information on what I actually found. It might be that the player Dylan was nearby when I found that item, so the server would also publish the information to the “player-dylan” channel. We have effectively implemented peer messaging on top of publish/subscribe. We’ve then gone a step further to re-implement a customized version of publish/subscribe on top of our peer messaging.
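The workaround can be sketched in a few lines of Python. This is a toy in-memory broker, not Bayeux; the `Broker` class, the `ITEMS` and `NEARBY` tables, and the third player “erin” are all made up for the illustration. The sketch also assumes the sender’s name travels inside the message, whereas a real server would take it from the session.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory publish/subscribe broker (a stand-in, not Bayeux)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in list(self._subscribers[channel]):
            callback(message)


broker = Broker()
inboxes = {"michael": [], "dylan": [], "erin": []}

# Each player gets a private channel with exactly one subscriber.
for name, inbox in inboxes.items():
    broker.subscribe("player-" + name, inbox.append)

# Hypothetical game state: what each item really is, and who is near whom.
ITEMS = {"item1": "15 energy"}
NEARBY = {"michael": ["dylan"]}


def on_server_message(message):
    player, action, item = message
    if action == "found" and item in ITEMS:
        result = [action, ITEMS[item]]
        # Application-level fan-out: the server decides who may see the
        # event, re-implementing "subscription" outside the broker.
        for recipient in [player] + NEARBY.get(player, []):
            broker.publish("player-" + recipient, result)


broker.subscribe("server", on_server_message)

broker.publish("server", ["michael", "found", "item1"])
print(inboxes)
```

Every channel here has a single subscriber, so the broker’s channel bookkeeping adds nothing, while the real routing decision lives in `on_server_message`.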
For developers who engineer systems for this use-case, it makes no sense to support Bayeux. In storing information about channels that ultimately only have one user, we are simply wasting memory and cycles. The actual publish/subscribe system that ends up being used must be re-implemented at the application layer regardless of Bayeux’s system.
What’s more, distributed Bayeux servers will be optimized towards the publish/subscribe mechanisms as specified in the standard. While it would be possible to optimize a Bayeux server towards peer messaging, it would be difficult to do so within the framework of Bayeux. Peer messaging under Bayeux is an ugly hack, and not the sort of hack that is easy to extend and work with. Distributing Bayeux for peer messaging becomes difficult enough that developers are encouraged to adopt a different standard for peer-messaging Comet servers.
The bottom line is that when Bayeux’s publish/subscribe mechanism is not a good fit, applications that use it pay a penalty in memory usage, CPU performance, and scalability. On top of that, the work of implementing publish/subscribe in Bayeux is wasted on these applications.
Bayeux shows promise for applications that fit its model well. I can see a whole plethora of rapidly developed chat-style applications taking the WWW by storm in the near future. But Bayeux also promises non-compliance. Because it fits certain use-cases so poorly, there will always be servers that throw out the specification in its entirety. As Comet gains popularity, all of the discomfited developers who have been operating outside of the world of standards will likely get together and put forth a new specification. Bayeux may live on, but in its current form it practically promises its own fall from grace.
A Better approach
The problems are clearly laid out on the table, so now we want to implement all of the functionality of Bayeux, but in a way that doesn’t impinge on use-cases that don’t need all of those features. The way to do this is to take a layered approach where we separate the functionality into component layers much like IP, TCP, and IRC.
A few goals are:
- Provide a good fit for all application types.
- Allow flexibility for as yet unknown use cases.
- Make each piece easy to implement.
There is a set of guidelines we should follow in approaching the design:
- A layer is not aware of the layer above it.
- A layer is aware of the layer directly below it, and no deeper.
- A layer implements a clear set of related functionality.
If we adhere to these guidelines we can ensure that the design is as flexible as possible.
Transport Layer (Unreliable Simplex)
The Transport Layer defines what a transport must do, but not how to do it. This is a definition of what a Comet Transport is, and what API it supports. External documents might specify how to uphold this API for a particular means of communications, such as long-polling or a flash socket. A few important areas to concentrate on are:
- How to encode message data.
- A means of identifying the connection (query string argument or cookie, for example).
- A means to signify that the transport should RECONNECT.
- A means to signify that the transport should CLOSE.
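One way to picture such a definition is as an abstract interface, sketched below in Python. Everything here is an assumption made for illustration: the `Transport` and `InMemoryTransport` names, the method names, and the choice of JSON frames with a `type` field are not taken from any specification.

```python
import json
from abc import ABC, abstractmethod


class Transport(ABC):
    """States what a transport must do; subclasses decide how."""

    def __init__(self, connection_id):
        # In practice this might arrive as a query string argument or cookie.
        self.connection_id = connection_id

    def encode(self, payload):
        # Assumption: messages are JSON-encoded; the encoding is left open here.
        return json.dumps({"type": "MESSAGE", "data": payload})

    def reconnect_signal(self):
        return json.dumps({"type": "RECONNECT"})

    def close_signal(self):
        return json.dumps({"type": "CLOSE"})

    @abstractmethod
    def deliver(self, frame):
        """Push one encoded frame toward the browser."""


class InMemoryTransport(Transport):
    """Stand-in for long-polling or a Flash socket; it only records frames."""

    def __init__(self, connection_id):
        super().__init__(connection_id)
        self.sent = []

    def deliver(self, frame):
        self.sent.append(frame)


transport = InMemoryTransport(connection_id="abc123")
transport.deliver(transport.encode(["raise", "$5"]))
transport.deliver(transport.close_signal())
print(transport.sent)
```

A long-polling or Flash socket implementation would override only `deliver`, leaving the encoding and control signals identical across transports.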
Session Layer (Reliable Simplex)
The Session Layer uses the Transport Layer to receive information from the server, and further defines a method of creating an identified session as well as a way to request a resend for any dropped messages. This layer is very much like TCP in that it guarantees message delivery and provides a means to gracefully close the connection. Particular aspects we concentrate on are:
- Create a session key to identify connections in the transport layer.
- Manage connection state and latency information with Pings and timeouts.
- Graceful closing of connections via a CLOSE or EOF command.
- Guaranteed delivery and message ordering (Message IDs and a RESEND command).
- Provide a means of identifying the session (identifier and location).
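The delivery guarantee is the part that benefits most from an example. Below is a minimal Python sketch, assuming sequential integer message IDs starting at 1; the `SessionSender` and `SessionReceiver` classes are hypothetical, and the RESEND exchange is reduced to a direct method call.

```python
class SessionSender:
    """Server side: numbers outgoing messages and keeps them for resends."""

    def __init__(self):
        self._next_id = 1
        self._history = {}

    def wrap(self, payload):
        message = (self._next_id, payload)
        self._history[self._next_id] = payload
        self._next_id += 1
        return message

    def resend(self, message_id):
        return (message_id, self._history[message_id])


class SessionReceiver:
    """Client side: delivers payloads in order and reports gaps."""

    def __init__(self):
        self._expected = 1
        self._pending = {}
        self.delivered = []

    def on_message(self, message):
        """Accepts one message; returns the IDs the sender should RESEND."""
        message_id, payload = message
        if message_id >= self._expected:
            self._pending[message_id] = payload
        # Release every message that is now contiguous with what we have seen.
        while self._expected in self._pending:
            self.delivered.append(self._pending.pop(self._expected))
            self._expected += 1
        # Anything still buffered is sitting behind a gap.
        if not self._pending:
            return []
        return [i for i in range(self._expected, max(self._pending))
                if i not in self._pending]


sender = SessionSender()
receiver = SessionReceiver()
first, second, third = (sender.wrap(p) for p in ["a", "b", "c"])

receiver.on_message(first)
missing = receiver.on_message(third)   # "b" was dropped by the transport
print(missing)                         # [2]
for message_id in missing:
    receiver.on_message(sender.resend(message_id))
print(receiver.delivered)              # ['a', 'b', 'c']
```

The application above this layer only ever sees `delivered`, in order, and never needs to know a message was lost.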
Dispatch Layer (External events)
The Dispatch Layer uses a simple readline protocol, not unlike HTTP, to send events from external processes. For instance, the application might be connected to an earthquake monitoring system, and it would dispatch events to the Comet server by using the Dispatch Protocol. It’s conceivable that HTTP itself could be used as the protocol for the dispatch layer. A common upgrade path for current web applications would be to implement a Dispatch Layer client inside of the current web server. Then the browser could communicate as it currently does with the web server, but receive events back from the web server via the Comet Transport or Session server.
Some aspects to concentrate on for the dispatch layer:
- Send messages to either Transport Layer clients or Session Layer clients.
- Human readable protocol.
- As simple as possible.
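To give a feel for how small such a protocol could be, here is a Python sketch of a one-line-per-event format. The `SEND <target> <JSON payload>` syntax is purely hypothetical, invented for this example, as are the function names and the made-up earthquake values.

```python
import json


def format_dispatch_line(target, payload):
    """Builds one line of the assumed format: SEND <target> <JSON payload>."""
    return "SEND {} {}\n".format(target, json.dumps(payload))


def parse_dispatch_line(line):
    """Parses one line of the assumed format into (target, payload)."""
    command, target, body = line.rstrip("\r\n").split(" ", 2)
    if command != "SEND":
        raise ValueError("unknown command: " + command)
    return target, json.loads(body)


# Made-up event from a hypothetical earthquake monitoring process.
line = format_dispatch_line("earthquakes", {"magnitude": 4.2, "region": "example"})
print(line, end="")
print(parse_dispatch_line(line))
```

A format like this stays human readable and can be typed by hand over a raw socket, which is the property the list above asks for.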
App / Protocol Layer (Anyplex)
The App Layer is where we would implement higher-level functionality, such as publish/subscribe. Protocols built at this layer will likely share a few common characteristics:
- Perform application- or protocol-specific communication.
- Use the Session Layer as a downstream communication method.
- Use an upstream communication method of choice (probably HTTP).
- Include higher-level functionality such as authentication or publish/subscribe.
By layering the various bits of functionality, a developer can choose the layer best suited to providing Comet communication for the application. A server that supports a Protocol Layer specification like Bayeux will necessarily also support the Transport Layer and the Session Layer. Some servers, on the other hand, might be optimized towards the Session Layer and not support anything higher level. This gives developers a full range of options to choose from, and allows them to switch at will to servers that are optimized more towards their needs. A short list of benefits includes:
- Actually use HTTP for what it’s good at: client -> server communication.
- Allow any piece of the stack to be used on its own.
- If you need extremely low-latency broadcast communication, and reliability isn’t an issue, then just use the Transport Layer.
- Third tier standards are very simple. They only have to deal with the new functionality added by the app/protocol.
Additional problems with Bayeux
Reliability Guarantees and Message Ordering Missing
Bayeux contains no way of re-requesting missed messages at the protocol level. This is because the message IDs are left to the application layer and are not numerically ordered. This problem would be solved by leaving reliability guarantees to the session layer, but even a monolithic standard like Bayeux in its current form should have some kind of reliability guarantees. This is the 21st century, after all, and we want coherent data.
Standards-based upstream communication
We can solve this by adopting the layered approach, as the App / Protocol Layer doesn’t have any requirements for upstream communication. We can have a publish/subscribe standard that includes JSON encoded upstream communications, but also another standard that doesn’t. The worst thing we can do is use the promise of Comet as a carrot, and the Bayeux standard as a stick, and try to force developers to adopt our way of communication with the HTTP server. Developers want to be able to receive pushed events. They don’t necessarily want to adopt our artificial requirements on upstream communication.
Bayeux has no concept of a Ping. There should be some way to manage the connection state and detect timed-out connections. The layered approach handles this in the Session Layer; the App Layer shouldn’t have to. Rather, the application should receive a notification, on both the client and the server side, when a connection is considered timed out.
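As a rough Python sketch of what that notification could look like, the hypothetical `ConnectionMonitor` below tracks when each session last answered a ping and calls back into the application once a session goes quiet. The class, its method names, and the 30-second timeout are assumptions for illustration only.

```python
class ConnectionMonitor:
    """Marks a session as timed out when no ping reply arrives in time."""

    def __init__(self, timeout_seconds, on_timeout):
        self._timeout = timeout_seconds
        self._on_timeout = on_timeout
        self._last_seen = {}

    def pong(self, session_id, now):
        # Called whenever the peer answers a ping (or sends anything at all).
        self._last_seen[session_id] = now

    def check(self, now):
        # Called periodically; notifies the application about dead sessions.
        for session_id, seen in list(self._last_seen.items()):
            if now - seen > self._timeout:
                del self._last_seen[session_id]
                self._on_timeout(session_id)


timed_out = []
monitor = ConnectionMonitor(timeout_seconds=30, on_timeout=timed_out.append)

# Timestamps are passed in explicitly so the sketch runs instantly;
# a real server would use something like time.monotonic().
monitor.pong("session-a", now=0)
monitor.pong("session-b", now=0)
monitor.pong("session-a", now=25)
monitor.check(now=40)
print(timed_out)  # ['session-b']
```

The application supplies only the `on_timeout` callback; the ping bookkeeping stays below it.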
Bayeux is a great step in the right direction. Standards compliance is the most important goal we can strive for in implementing Comet servers. But right now Bayeux is a “one size fits all” approach to a deeply complex problem. The result is that many server developers will discard the standard in the cases where it doesn’t make sense. If we don’t improve the standard now before it becomes final, we’ll be stuck with a fractured community with competing and overlapping standards.
I don’t think any of my proposals are unreasonable or at odds with the philosophy behind Bayeux. I see no reason not to support the same functionality that Bayeux supports. I just believe that we should do it in layers. It makes good sense from a software architecture point of view, and it makes the same good sense from a protocol design standpoint.