In early 2006, Alex Russell posted about a neat hack that the Google Talk team in Gmail uses to support Comet in Internet Explorer, a trick that works as far back as IE 5.01. What great news! A reliable way to stream Comet messages to Microsoft’s browsers. If only it were that easy.
I am not alone in the following finding: after I connect the htmlfile ActiveX object to my Comet server as a streaming transport, everything works perfectly for a few messages and then abruptly fails. The browser closes the connection, and the server reports the error “Connection reset by peer.” Surprisingly, however, no one seems to have looked very deeply into this problem.
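For readers who have not seen the hack, here is a minimal sketch of the client side, assuming the usual shape of the htmlfile trick: write an empty page into an off-screen htmlfile document, expose a callback on that document's window, and inject an iframe whose long-lived response streams `<script>` chunks calling that callback. The function names `openHtmlfileTransport` and `escapeAttr` and the callback name `onCometMessage` are hypothetical, and the document is passed in so the logic can run outside IE. It is written in ES3-style JScript so it parses in old IE; cross-subdomain setups also set `document.domain` inside the htmlfile page, which is omitted here.

```javascript
// Sketch of an htmlfile streaming transport (IE only in practice).
// In IE the caller passes: new ActiveXObject("htmlfile")
// "onCometMessage" is a hypothetical name; it must match what the server
// writes into each <script> chunk.
function openHtmlfileTransport(doc, url, onMessage) {
  // Write a minimal page into the off-screen htmlfile document.
  doc.open();
  doc.write("<html><body></body></html>");
  doc.close();

  // Script inside the streaming iframe calls parent.onCometMessage(...);
  // "parent" there is the htmlfile document's window.
  doc.parentWindow.onCometMessage = onMessage;

  // Inject the iframe that holds the long-lived streaming request.
  var container = doc.createElement("div");
  doc.body.appendChild(container);
  container.innerHTML = "<iframe src='" + escapeAttr(url) + "'></iframe>";

  // The caller should keep this reference for as long as the stream stays open.
  return doc;
}

// Escape characters that would break out of a single-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;");
}
```

In IE this would be used as `var transferDoc = openHtmlfileTransport(new ActiveXObject("htmlfile"), "/comet", handler);`, where `/comet` is a placeholder URL.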
Finding a Pattern
Comet is complicated. The technical nature of Comet implementations makes it very difficult to isolate the server-side code from the browser code. There are so many interactions between so many moving parts that the first step in debugging a new Comet app or transport is to isolate the problem. So when my original htmlfile code failed, I followed these steps:
- Time it. I figured the culprit might be a fixed timeout, perhaps an idle timeout or a limit on the total time a connection may stay open. I tried waiting 10 seconds between events, then 1 minute, 3 minutes and 10 minutes. For every interval except 10 minutes, I received the same number of messages. So on to my second plan of attack.
- Count messages. I was getting 7 messages before failure.
- Increase message payload. Having established that the problem was not a fixed time limit but rather the number of messages, I wondered whether it depended on the size of those messages. I switched my app to send “Yo” * 50 (the string repeated 50 times) instead of just “Yo”. Nothing changed: I could send the same number of messages (7) even with a payload roughly 50 times larger.
- Change browser callbacks. Quickly running out of ideas, I tried altering how each message was handled. I created a new app that would simply ‘alert’ the data instead of adding it to a styled ‘div’. Behold! The eighth message was successfully received, as was the ninth, and every message up to the 42nd. Strange, and still broken, but encouraging.
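All of these experiments depend on how the server frames each message. Here is a minimal server-side sketch, assuming the common iframe-streaming convention of wrapping every message in a `<script>` tag that calls a function on the htmlfile window; `formatScriptChunk` and the `onCometMessage` callback name are hypothetical, and the author's server code is not shown in this post. With it, the payload test amounts to comparing `formatScriptChunk("Yo")` against `formatScriptChunk("Yo".repeat(50))`.

```javascript
// Server-side sketch (Node.js): wrap one Comet message in a <script> tag that
// the streaming iframe inside the htmlfile document will execute.
// "onCometMessage" is a hypothetical callback name set on the htmlfile window.
function formatScriptChunk(payload, callbackName = "onCometMessage") {
  const literal = JSON.stringify(String(payload))
    // Keep a payload containing "</script>" from ending the tag early.
    .replace(/<\//g, "<\\/")
    // U+2028/U+2029 are valid in JSON but not in older JS string literals.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
  return "<script>parent." + callbackName + "(" + literal + ");</script>\n";
}

// The two payloads from the size experiment.
const small = formatScriptChunk("Yo");
const large = formatScriptChunk("Yo".repeat(50));
```

Each chunk would be written to the open HTTP response without closing it; only the payload length differs between `small` and `large`.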
I set aside this revelation for the time being. Changing the browser callbacks seemed to affect the behavior, so I tried another simple change: alerting twice per message. This time, only 25 messages were processed. Strange… I would have expected 21 messages of two alerts each, since I was allowed 42 alerts sent one at a time. But now I’m allowed 50 alerts if I send them two at a time?
Why not just call a function attached to the parent window, you wonder? It turns out htmlfile’s iframe doesn’t care where the function object lives; instead, it cares which thread is used to execute the code. The htmlfile thread is a capricious beast and will rebel when employed to do too much DOM work. The effect of setInterval is to move the actual DOM manipulations to a thread that is perfectly safe for that sort of scripting. This fix works for IE 5.01+.
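The setInterval idea can be sketched as a small message pump. This is an illustration, not the author's original code: it assumes the callback exposed to the htmlfile iframe does nothing but queue raw data, while a timer created by the main page drains the queue and performs all DOM work. The names `createMessagePump` and `pollMs` are hypothetical, and the code sticks to ES3-style JScript for old IE.

```javascript
// Sketch: the htmlfile callback only enqueues; a setInterval timer owned by
// the main page drains the queue and does the DOM work.
function createMessagePump(handleMessage, pollMs) {
  var queue = [];
  var timer = null;

  // Called from script in the htmlfile iframe: store the data and return
  // immediately, without touching the DOM.
  function enqueue(data) {
    // Index assignment instead of push(), which very old JScript lacks.
    queue[queue.length] = data;
  }

  // Runs on the main page's timer: hand each queued message to the handler.
  function drain() {
    var batch = queue;
    queue = [];
    for (var i = 0; i < batch.length; i++) {
      handleMessage(batch[i]);
    }
    return batch.length;
  }

  function start() {
    if (timer === null) {
      timer = setInterval(drain, pollMs);
    }
  }

  function stop() {
    if (timer !== null) {
      clearInterval(timer);
      timer = null;
    }
  }

  return { enqueue: enqueue, drain: drain, start: start, stop: stop };
}
```

In IE, the function exposed on the htmlfile document's `parentWindow` would be `pump.enqueue`, and `pump.start()` would be called from the main page, so `handleMessage` (the code that updates the styled div) only ever runs from the main page's timer.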
- Recall that my first debugging technique, varying the time between events, failed when I waited 10 minutes between events. Specifically, if the connection stays open for 300 seconds (5 minutes) without receiving any data, Internet Explorer considers the request to have timed out. In IE 5.01 and 5.5 an error dialog is displayed when the connection is broken, but IE6 and IE7 show no such error.
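One way to stay under the 300-second idle limit, assumed here rather than described in this post, is for the server to write a harmless chunk whenever the stream has been quiet for a while. In the sketch below, `createHeartbeat`, `PING_CHUNK` and the 30-second safety margin are hypothetical, and it assumes that any bytes arriving on the connection reset IE's idle clock. Timestamps are passed in explicitly so the logic can be checked without real timers.

```javascript
// Idle limit observed for IE: 300 s without any data on the stream.
const IE_IDLE_TIMEOUT_MS = 300 * 1000;
// Hypothetical no-op chunk: an empty script block executes nothing.
const PING_CHUNK = "<script></script>\n";

function createHeartbeat(write, intervalMs = IE_IDLE_TIMEOUT_MS - 30 * 1000, startMs = Date.now()) {
  let lastWriteMs = startMs;
  return {
    // Send a real message and reset the idle clock.
    send(chunk, nowMs = Date.now()) {
      write(chunk);
      lastWriteMs = nowMs;
    },
    // Call periodically; writes a ping only if the stream has been quiet
    // for at least intervalMs. Returns true when a ping was written.
    tick(nowMs = Date.now()) {
      if (nowMs - lastWriteMs < intervalMs) {
        return false;
      }
      write(PING_CHUNK);
      lastWriteMs = nowMs;
      return true;
    },
  };
}
```

On a real server, `tick` would be driven by a timer (every few seconds, say) for each open stream and cleared when the client disconnects.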
We are left with a robust streaming transport for Internet Explorer 5.01+ that I believe to be production ready. I’d love to know how the Gmail team got around this problem. Perhaps it is as easy as an additional argument when instantiating the htmlfile object; I have no idea, because I cannot find proper documentation. Still, I am satisfied with this solution. It is good enough: it gets the job done, and the user sees none of the ugliness under the hood.