Comet Drives Election Excitement

by Andrew Betts, May 5th, 2008

As I write this, we’re waiting for the results of London’s Mayoral election. The position of Mayor of London has only been around since 2000, and this year a too-close-to-call contest between two of the most colourful politicians in the country has kept the race on the front pages of the papers for weeks. The incumbent and often controversial socialist mayor Ken Livingstone is up against an unpredictable and flamboyant conservative, Boris Johnson, who is better known for hilarious appearances on TV and for having uncontrollable hair than for serious politics. There are about a dozen other candidates, but I’d struggle to tell you who they are. This is the Ken and Boris show.

A couple of days before the polls opened, I helped set up a Comet server for TheLondonPaper, one of the city’s free newspapers, enabling them to provide real time coverage on polling day and during the count, which has now been going for a mammoth 14 hours. I’ve had it open for most of the last 48 hours and I’m absolutely hooked.

TheLondonPaper screenshot

The paper’s reporters are out with all the candidates, at City Hall and at all the counts sending in updates at a frightening rate. What have I learned? Well, Ken started the day with an orange juice, and voted at 08:59. Boris emerged from his house to vote at 10:37. At 11:25, we discover where Boris had his hair cut in preparation for the big day. By 13:10, the hair is back to its usual wild state and the pictures are beaming across the web in real time. Moments later we have coverage of a rain shower that has forced both candidates to take shelter, and Boris gets mobbed by a group of sixty-year-olds. Some serious analysis follows… Ken is modeling the very latest in flasher mac couture and charity shop-style muffler - grand total about £120 ($240). Boris strides out in text book Conservative Paul Smith navy blue suit and Thomas Pink shirt - grand total about £1,200 ($2,400). The fashion editor has the vital statistics at her fingertips.

In fact I’m doing the paper a disservice. The live feed had no match when it came to keeping Londoners informed throughout polling and count day - results literally seconds after they were announced, quotes and sound bites from senior members of both parties, polls and interim results as they came in. And all the ‘flashes’ about rain showers and wardrobe malfunctions are just adding to the sense that we’re right there in the midst of the action. It’s utterly addictive in a way that TV just can’t replicate, and I can have this running on my computer all day.

This is surely one of the very best uses of Comet technology. Tracking live news events that change by the second is a real challenge for the web, and Comet technologies make the difference between a lacklustre service and one that people simply can’t turn off. I hope we see more of this kind of thing in the future. For the moment, the count is finally over and the announcement has just been made - Boris has won it.

TLP screenshot 2

With characteristic flair Boris signed off his acceptance speech:

I hope that everybody who loves this city will put aside party differences to try in the making of Greater London greater still. Let’s get cracking tomorrow, and let’s have a drink tonight. Thank you.

I know a web developer at TheLondonPaper who could probably do with one as well.


Is Comet Becoming Over-complicated?

by Andrew Betts, February 14th, 2008

Note: Andrew’s article wasn’t written as part of the Colliding Comets: Bayeux series, but this article would be of interest to anyone following the Bayeux series:

Part 1: Greg Wilkins explains the need for Bayeux
Part 2: Michael Carter criticizes the current state of Bayeux
Part 3: Greg Wilkins responds to Michael Carter
Andrew Betts’ thoughts (from a related article)
Part 4: Michael Carter responds to Greg Wilkins
Part 5: Kris Zyp’s thoughts
Part 6: Alex Russell responds to Michael Carter
Part 7: Michael Carter responds to Alex Russell

In the late 1950s, when the US and the Soviet Union began manned space programmes in earnest, they both encountered a problem: pens don’t work in space, because a force is required to carry the ink to the nib, and that force is normally gravity. NASA therefore embarked on a research programme to develop pressurised ink cartridges that would work in space. Many millions of dollars later, they perfected the space pen. This revolutionary piece of technology features a tungsten carbide ballpoint precisely engineered to avoid leaking ink, and a float to keep the ink reservoir separate from the pressurised nitrogen that forces it out at any angle. The pen can also write at extreme altitude, on greasy surfaces and at extreme temperatures.

Meanwhile, the Russians used pencils.

Yes, it’s inelegant, a bit messy, doesn’t do much for the cause of human knowledge, but if we’re trying to solve the problem of writing in zero gravity, then job done.

I think this is a great story (despite the fact that, sadly, it isn’t true), because it demonstrates the tendency for engineers to occasionally invent a solution to a problem that doesn’t exist, or to solve a symptom rather than a cause. Others can identify the easy way and grasp what I call the ‘pencil solution’.

The particular problem Comet is trying to solve is pushing data from server to client in an event driven way, and doing so in a way that works in most web browsers, so it can be used on a website. We currently achieve this by making an HTTP connection, but while HTTP is designed to transport hypertext in a stateless, non-persistent way, we’re using it to transport (typically) JSON or XML in a stateful, persistent way. Sounds rather like trying to shove a square peg in a round hole, which is exactly what it is.

Added to that, each browser presents a need for a different implementation strategy which leads us to a series of hacks that differ from one browser to the next.

Ideally, since we’re generally transporting a data stream, and doing so on a persistent connection, it would make sense for there to be a protocol designed from the ground up for this that all browsers support in a consistent way. Some work has gone into this in the WHATWG and W3C HTML working group, with ‘server-sent DOM events’ a feature of the upcoming HTML 5 specification, and already partially implemented in Opera. However, the spec for this feature still envisions developers using regular HTTP to send event streams, and makes only this passing reference to other protocols:

For non-HTTP protocols, UAs should act in equivalent ways.

Is HTTP coming to be seen as a Swiss army knife for any kind of vaguely web-related data transport requirement? Any application protocol built on top of HTTP accepts as a fundamental principle that at the transport level, the protocol is a stateless request-response system. The Bayeux spec pays similar attention to non-HTTP protocols as server-sent events does:

Other transports that support a request/response paradigm may be used. However this document assumes HTTP for reasons of clarity.

So what I’m saying here is that in creating this kind of protocol that builds on top of HTTP, you essentially rule out the possibility of using any transport protocol that is not implemented as a request-response paradigm. As I try to keep up with the debate my co-contributors are having (part 1 here) about Bayeux, this is one of the main issues that troubles me. Why choose a transport protocol that’s so apparently misaligned with the purpose of the system, and then create an application protocol on top of it that corrects for the misalignment?

The obvious answer is that Comet is about communication with a web browser, and that means speaking in HTTP, and so right now this is the best we can do. But when browsers don’t understand the protocols we want natively, traditionally that’s what plug-ins are for. If you start with the objective of eliminating plug-ins, that means accepting a big fudge of browser inconsistent hacks, plus limitations that you just can’t work around, like cross-domain streaming and multiple concurrently connected clients. These are not small problems.

Of course the perfect solution is that mythical transport protocol that all browsers support in the same way and which is designed to do exactly what we’re trying to achieve with Comet. And if re-engineering at the transport level is too much to ask, at least server-sent events might offer a standardised application protocol that can be implemented consistently.

While we wait for this revelation to occur, I prefer to keep my options open. Here’s a use case to consider: a commercial advertising-funded website that serves at least two Flash ads on every page, and which attracts some fifty million page impressions a month. That’s a hundred million SWFs a month—quite some commitment to the technology. Flash has been able to do socket connections via the XMLSocket object since version 5, released in 2000, and Flash / JavaScript integration has been relatively straightforward since Flash player 6 (2002). So if you’re using Flash anyway, why not use it for streaming as well?

Of course you still need a server that supports event streaming on long-lived connections, but the unsavoury client-side of comet suddenly becomes simple. It also trivially solves all cross-domain issues, needs no headers or JSON encoding, and works using the same technique on all browsers that support Flash player 6 or greater.

What about mobile? Well, push-to-mobile is a problem solved as long ago as 1999 by Research in Motion. Their Blackberry devices work so well that the prospect of losing the service thanks to a patent dispute was very nearly ruled a national emergency in the US. The reason these things Just Work is that RIM has control of everything: server software, client software, client OS, even the handset hardware can be designed to work together. It’s one great big happy family of closed source proprietary technology. But imagine a mobile Comet web app providing a user experience anywhere near as good as a Blackberry’s native email client. A web browser running in the background constantly? Even then, you’ve got no access to the phone’s OS to fire vibration or audio alerts. Even if Comet worked perfectly (using any transport), lack of support for Flash would be the least of your problems when it comes to mobile push.

So back in the real (desktop) world, I had a crack at a Flash-based comet client for Meteor. After a few brief experiments I was able to complete a client consisting of a small Flash movie acting as a bridge between an event stream and the browser’s JavaScript event handlers. Not only that, but it works consistently in IE, Firefox, Opera and Safari. And all in under an hour. I spent another precious eight minutes proving it could talk cross domain.

Here’s a video (the Flash bit is the tiny 30×15 widget at the top right of the scrolling DIV):

(Note that the periodic disconnects are being forced at the server end to test client resilience).

Heady with excitement, I got carried away and stuck six of them on the same page, and in doing so proved that I could stream from multiple completely alien domains simultaneously and maintain at least four concurrent connections to the same one. The client also implements auto-reconnect, doesn’t require any padding on the response, and just needs a null byte to separate packages. The bandwidth use is a fifth of the equivalent payloads sent using Bayeux. The ActionScript weighs in at 3KB uncompressed and the finished SWF, even incorporating a simple UI, is a tiny 2.3KB.
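To make the null byte framing concrete, here is a minimal sketch of how a client might split an incoming stream into separate messages. It is written in JavaScript rather than ActionScript, and createNullByteSplitter and onMessage are names invented for this illustration, not part of Meteor or the Flash client described above.

```javascript
// Sketch: split a stream of text chunks into messages delimited by a null byte.
// createNullByteSplitter and onMessage are hypothetical names for illustration.
function createNullByteSplitter(onMessage) {
  var buffer = "";
  return function push(chunk) {
    buffer += chunk;
    var end;
    // Emit every complete message currently held in the buffer
    while ((end = buffer.indexOf("\0")) !== -1) {
      onMessage(buffer.slice(0, end));
      buffer = buffer.slice(end + 1);
    }
  };
}

// Usage: chunks can arrive split at arbitrary points
var received = [];
var push = createNullByteSplitter(function (msg) { received.push(msg); });
push("hel");
push("lo\0wor");
push("ld\0");
// received is now ["hello", "world"]
```

Because the delimiter is a single byte that never appears in the payload, no length headers or padding are needed, which is part of why this kind of framing stays so small.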

Bayeux actually acknowledges the potential of using a Flash client in the spec, but this is surely nonsense, as virtually none of the reasons you might want to use Bayeux apply when you’re building on a client that can make its own raw socket connections.

So in summary I’m saying it could be argued that Bayeux is a solution looking for a problem. Flash combined with a protocol built directly on top of TCP/IP is smaller, leaner, more compatible, less restrictive, and simpler. While not everyone is going to be satisfied with that, it’s certainly a perfectly respectable alternative. And when you have several possible (imperfect) ways of solving a problem (and probably more on the way) standardising on one of them just seems to limit choice. I agree with Michael Carter’s recent post that if we are to standardise anything, it should be the API, not the communication protocol.

So we wait…for the holy grail of standardised server-pushed events in HTML 5, if indeed it materialises. And in the interim, no one imperfect solution seems any better than any other imperfect solution. Now where did I put that pencil…


Real Time Angst

by Andrew Betts, January 15th, 2008

My work often finds me writing support for (or creating) APIs and integrating systems, and I find it depressingly familiar every time a big client suggests connecting disparate parts of their IT empire with RSS feeds (or in fact any kind of XML, which is often mislabelled as RSS when it’s not). Yes, RSS is great, but will the world’s corporations please stop misusing it as the be all and end all of systems integration. I’ve recently had to do battle with a client and two of their existing suppliers over why syndicating a rapidly changing news source by polling an RSS feed every 10 seconds is a bad idea.

Of course integrating systems with real time data streams doesn’t require Comet (if we define Comet as push-to-browser), but because there are so few people doing it, I fear that developers will not see the point of Comet for the ‘last mile’ when their back-end is fundamentally not set up to be event driven.

For example, I have a new Blackberry Curve, but we don’t have Blackberry Enterprise Server at work, so I’m using the Blackberry service provided by my network operator. This checks my IMAP email every 15 minutes and then squirts the new messages to my Blackberry in real time. What on earth is the point of that? The phone may as well just poll.

So we need to extend the real-time vision further into the back-end of systems to make it worthwhile to use Comet. One of the only useful real-time data sources available for hacking with (at time of writing) is the Livejournal update stream from Six Apart. This is basically an XML feed, though it can never validate as XML since it doesn’t have a doctype or wrapper element, and you’ll never finish loading it.

The Six Apart stream features content from Livejournal, and pushes the full text and all metadata out on the feed as soon as it’s published. That’s a lot of data—about 7KB/s on average, or 2-3 posts per second. Pretty impressive for a community that’s often considered to be a bit out of vogue. The demographics of LJ show that most users are aged 15-24, and are two-thirds female. Top topics include school, boys, depression, parents and celebrities. In other words, it’s a never-ending stream of angst.

I’m going to present a tutorial for working with this real time data and streaming it to the browser using Comet (for which I’ll be using the Meteor Comet server, though the comet implementation you choose is fairly irrelevant to this example), in an attempt to demonstrate that Comet is not just useful for chat rooms, and that when you think about the bigger picture, making your applications talk to each other in real time is a very good idea.

Coping with the pain

I wrote a daemon in PHP to listen to Six Apart’s stream, extract English posts, catalogue all the angst-related words and construct a word frequency index. Here’s the bit of code that does the indexing:


// Decode, replace LJ-peculiar entities, and strip tags
$content = strip_tags(str_replace($entities, $replacements,
  html_entity_decode($content)));
// Remove any non-words (phone numbers, ascii art, weird l33t-speak...)
$content = preg_replace("/[^a-z\-]+/i", " ", $content);
// Split resulting text into words
$words = explode(" ", $content);
foreach ($words as $word) {
  // Lowercase the word for comparison and see if it's a target term
  $word = strtolower($word);
  if (in_array($word, $terms)) {
    // Add the time of this occurrence to the occurrences list for this word
    if (!isset($counts[$word])) $counts[$word] = array();
    $counts[$word][] = time();
    // Add the word to the words changed list for quick reference later
    if (!in_array($word, $wordschanged)) $wordschanged[] = $word;
  }
}

This creates an associative array noting the times at which each key word was seen. The number of elements in each word’s array tells us how many times that word has been seen since the earliest timestamp in the array (which will always be the first element). As the array is constructed we’re also keeping note of the words that have been spotted again since the last set of rolling averages were calculated. Whenever data is no longer being received, the script uses the downtime to recalculate those averages, like this:


while (!empty($wordschanged)) {
  $w = array_shift($wordschanged);
  // Remove any occurrences that are older than 5 mins
  while (sizeof($counts[$w]) and $counts[$w][0] < (time()-300)) {
    array_shift($counts[$w]);
  }
  // Calculate the new 5 min average and compare it to the previous one
  $prev = (isset($prevcounts[$w])) ? $prevcounts[$w] : 0;
  $now = round(sizeof($counts[$w])/5,1);
  // If more than 2% different, send the update to Meteor
  if (abs($now-$prev) > (0.02*$prev)) {
    $out = "ADDMESSAGE angst {w:'".addslashes($w)."',c:".$now."}\n";
    echo "> $out";
    fwrite($op, $out);
    $prevcounts[$w] = $now;
  }
}

For each word, the list of occurrences is trimmed until the first element is less than five minutes old. The remaining count is then divided by five to give a per-minute average over the last five minutes. If that average differs from the last value sent by more than 2%, it is inserted into Meteor on a controller connection (already open as $op). If it hasn’t moved that far, there’s no need to insert a message, since the client will continue presenting the old number, knowing that it is still up to date.
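The same sliding-window idea can be sketched in JavaScript, which makes it easy to try out in isolation. This is only an illustration of the technique, not a port of the daemon: createWindow, record and recalc are made-up names, and times are passed in as plain seconds so the behaviour is repeatable.

```javascript
// Sketch of a five-minute sliding window for one word (all names are illustrative).
var WINDOW_SECONDS = 300;
var THRESHOLD = 0.02;

function createWindow() {
  return { times: [], lastSent: 0 };
}

// Record one sighting of the word at time `now` (in seconds)
function record(win, now) {
  win.times.push(now);
}

// Trim sightings older than five minutes, then return the new per-minute rate
// if it differs from the last value sent by more than 2%, or null otherwise.
function recalc(win, now) {
  while (win.times.length && win.times[0] < now - WINDOW_SECONDS) {
    win.times.shift();
  }
  var rate = Math.round((win.times.length / 5) * 10) / 10;
  if (Math.abs(rate - win.lastSent) > THRESHOLD * win.lastSent) {
    win.lastSent = rate;
    return rate;
  }
  return null;
}
```

A null result corresponds to the case where nothing is sent to the Comet server, so quiet words cost no bandwidth at all.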

Finally, and I’d love it if someone could come up with a better way of doing this, we need to fix a PHP memory leak. The $counts array is treated like a queue, with timestamps appended to the end of each word’s occurrence list, and removed from the start. As a result it should gradually move through memory, allocating new blocks at the end of the array, and releasing them at the front. But it doesn’t seem to release them, so the simple fix is to copy the remaining elements into a new array:


$newcounts = $counts;
unset($counts);
$counts = $newcounts;
unset($newcounts);

This is all wrapped in a never-ending loop that maintains connections to both Meteor and the Six Apart stream, and re-establishes them if they drop. The complete commented code can be downloaded here:

livejournalmonitor.php

Now run it, and you will start filling up your Meteor server with word frequency change notifications. The next step is to make a nice JavaScript visualisation of this data.

Giving it some bling

I’m using jQuery, my favourite JavaScript library, to make this all very easy. I’m going to put my code in an include file, which I’ll load after the jQuery include, in the <HEAD> of the HTML document. Of course you could also include it inline, in a <SCRIPT> block. First you need a connection to Meteor (you have installed Meteor, haven’t you? If not, go do it now. We’ll wait):


$(document).ready(function() {
  Meteor.hostid = hostid; // defined earlier
  Meteor.host = "data."+location.hostname;
  Meteor.registerEventCallback("process", newHit);
  Meteor.joinChannel("angst", 25);
  Meteor.mode = 'stream';
  Meteor.connect();
});

This subscribes to the angst channel, retrieves the last 25 messages, and starts streaming new messages. hostid is just a unique reference for this connection, so that when the client reconnects the server knows to terminate the old connection. It can be anything you like, but each client needs to choose a different one, so either allocate it from the server end or use a random number with a high level of entropy. It also registers a function for handling new events: newHit.
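If you would rather generate the hostid on the client, something like the following sketch would do. makeHostId is a hypothetical helper, not part of Meteor’s API, and the 32-character hex format is just an assumption chosen for illustration. It uses the browser’s crypto.getRandomValues where available and falls back to Math.random, which is weaker.

```javascript
// Sketch: build a hard-to-guess connection ID as 32 hex characters.
// makeHostId is a hypothetical helper, not part of Meteor's API.
function makeHostId() {
  var parts = [];
  var i;
  var c = (typeof globalThis !== "undefined") ? globalThis.crypto : undefined;
  if (c && typeof c.getRandomValues === "function") {
    // Preferred: four 32-bit values from a cryptographic source
    var words = new Uint32Array(4);
    c.getRandomValues(words);
    for (i = 0; i < words.length; i++) parts.push(words[i]);
  } else {
    // Fallback: Math.random is not cryptographically strong
    for (i = 0; i < 4; i++) parts.push(Math.floor(Math.random() * 4294967296));
  }
  var id = "";
  for (i = 0; i < parts.length; i++) {
    // Zero-pad each value to eight hex digits
    id += ("0000000" + parts[i].toString(16)).slice(-8);
  }
  return id;
}
```

You would then assign the result to the hostid variable before the connection code above runs.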

The next step is to define a function called newHit and deal with messages.


function newHit(datastr) {
  // Meteor delivers the message as a string; turn it into an object
  eval("var data = "+datastr+";");
  if ($("#wordrow_"+data.w).length) {
    // Word already in the table: work out how much it has changed
    var prev = parseFloat($("#wordrow_"+data.w+" td.c").html());
    var change = Math.round(Math.abs(data.c - prev)*10)/10;
    if (change > 0) {
      var dir = ((data.c - prev) > 0) ? "up" : "down";
      $("#wordrow_"+data.w+" td.c").html(data.c);
      $("#wordrow_"+data.w+" td.ch").empty().append("<div class=\""+
        dir+"\">"+change+"</div>");
      $("#wordrow_"+data.w+" td.ch div").animate({backgroundColor:'#fff'},
        'slow');
      updateAngstOrder(data);
    }
  } else {
    // New word: only add it if it beats the lowest value in a full table
    var min = ($("#datatable1 tr").length >= 11) ?
      parseFloat($("#datatable1 tr:last td.c").html()) : 0;
    if (data.c > min) {
      $("#datatable1").append("<tr id=\"wordrow_"+data.w+
        "\"><td class=\"w\">"+data.w+"</td><td class=\"c\">"+
        data.c+"</td><td class=\"ch\"></td><td class=\"gl\"></td></tr>");
      $("#wordrow_"+data.w).animate({backgroundColor:'#fff',
        fontWeight:'normal', color:'rgb(68,64,88)'}, 3000);
      updateAngstOrder(data);
    }
  }
}

The message that the function receives is a JSON-style object literal, something like {w:'ohmygod',c:13.6}. So you can just eval() it to assign a native object to a variable. The if statement then splits depending on whether we already have an entry for this word or not. If so, we can look up the current value from the HTML we wrote last time to calculate the change, and if there is a change (which there should be, else Meteor would not have sent an update), display it. This means working out the direction of the change, updating the HTML with the new actual value, displaying an arrow showing the direction, and setting off an animation effect to highlight the change.
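If you’d rather not eval() whatever arrives on the wire, a stricter alternative is to match the message against the one shape the daemon produces. The sketch below assumes messages always look like {w:'word',c:number}, with the word limited to letters and hyphens as in the PHP filter. parseHit is a hypothetical helper, and anything that doesn’t match is rejected rather than executed.

```javascript
// Sketch: parse messages of the exact shape {w:'word',c:12.3} without eval().
// parseHit is a hypothetical helper; it returns null for anything unexpected.
function parseHit(datastr) {
  var m = /^\{w:'([a-z\-]+)',c:(\d+(?:\.\d+)?)\}$/.exec(datastr);
  if (!m) return null;
  return { w: m[1], c: parseFloat(m[2]) };
}

// Usage
var hit = parseHit("{w:'ohmygod',c:13.6}");
// hit.w is "ohmygod" and hit.c is 13.6
```

The trade-off is that the parser has to change whenever the message format does, whereas eval() accepts anything, which is exactly the problem.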

If the word doesn’t exist yet, then we may want to add it to the table, but because we only want a maximum of ten items in the table, it will have to have a higher value than the lowest item currently displayed. To do that, we check that the table is full (11 rows equals ten data rows and the header row) and extract the value of the last item, or otherwise set a minimum of zero. To highlight the new entry in the table, it is initially set to a yellow background, and gradually faded to white.

You’ll notice that whenever a value is updated or a new entry is inserted, I’m calling a function called updateAngstOrder(). This shuffles the list to ensure that all the rows are in the correct rank order. Here it is:


function updateAngstOrder(data) {
  var row = $("#wordrow_"+data.w);
  // Move the row up while its value beats the row above
  while (row.prev().length && data.c >
    parseFloat(row.prev().children("td.c").html())) {
    var prev = row.prev();
    row.remove().insertBefore(prev);
  }
  // Move the row down while its value is lower than the row below
  while (row.next().length && data.c <
    parseFloat(row.next().children("td.c").html())) {
    var next = row.next();
    row.remove().insertAfter(next);
  }
  // Trim the table back to the header row plus the top ten
  while ($("#datatable1 tr").length > 11) {
    $("#datatable1 tr").eq(($("#datatable1 tr").length-1)).remove();
  }
}

First we establish what the value of the changed row is, then, while that value is greater than the one in the row above, we shift the row up the table, and while it is lower than the value of the row below, we shift it down. Once the row has been positioned correctly, there may be an extra row at the bottom to remove, so the table is trimmed of the lowest values until it contains just the top ten. This happens because when a new entry comes along that’s higher than the lowest value in the table (but when there are already ten items in the table) we add it anyway, shuffle it to the right position, and then remove the lowest item.
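Stripped of the DOM handling, the ranking logic amounts to "insert or update, sort, keep the top ten". Here is a rough sketch of that on a plain array, which may be easier to reason about or test. applyUpdate and MAX_ROWS are illustrative names, and unlike the table code it re-sorts the whole list on every update rather than shuffling one row.

```javascript
// Sketch: the same top-ten ranking done on a plain array instead of table rows.
// applyUpdate and MAX_ROWS are illustrative names.
var MAX_ROWS = 10;

function applyUpdate(rows, update) {
  // Drop any existing entry for this word, then add the new value
  var next = rows.filter(function (r) { return r.w !== update.w; });
  next.push({ w: update.w, c: update.c });
  // Highest value first, then keep only the top ten
  next.sort(function (a, b) { return b.c - a.c; });
  return next.slice(0, MAX_ROWS);
}

// Usage
var ranking = [];
ranking = applyUpdate(ranking, { w: "me", c: 50 });
ranking = applyUpdate(ranking, { w: "love", c: 20 });
ranking = applyUpdate(ranking, { w: "hate", c: 5 });
ranking = applyUpdate(ranking, { w: "hate", c: 30 });
// ranking order is now: me, hate, love
```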

Plottin it, on like a chart, innit

There seemed to be a nice convenient gap in my page that called to me and said something like “I ought to be a real-time graph, duh”. So I decided to give it a go with a Flash component. Initially I spent quite some time trying to write one from scratch and then, at about the same time as I realised what a mammoth job it was, I discovered amCharts.

The amChart takes care of scrolling points leftwards automatically, adjusting its own scale as the min/max range of plotted values changes, and allowing mouseovers to examine individual data points. But first of all it would need data from my page, so I added the following to updateAngstOrder:


if (data.w == "me") updateFlash("me", data.c);

Quick and dirty, but I only want to plot a single data series for the moment. Now we need an updateFlash function to send the new value to the amChart:


var angstdata = {};
function updateFlash(word, value) {
  var now = new Date();
  var hrs = now.getHours();
  var mns = now.getMinutes();
  var scs = now.getSeconds();
  var nowts = Math.floor(now.getTime()/1000);
  angstdata[nowts] = value;
  // Build the x-axis: one division per second, oldest first
  var xml = "";
  for (var i=0; i<=59; i++) {
    scs--;
    if (scs==-1) { scs = 59; mns--; }
    if (mns==-1) { mns = 59; hrs--; }
    if (hrs==-1) hrs=23;
    var fsec = (scs<10) ? "0"+scs : scs;
    var fmin = (mns<10) ? "0"+mns : mns;
    xml = "<value xid=\""+i+"\">"+hrs+":"+fmin+":"+fsec+"</value>"+xml;
  }
  xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"+
    "<chart>\n<series>\n"+
    xml + "</series><graphs><graph gid=\"1\">";
  // Plot each stored value against the division matching its age in seconds
  for (var ts in angstdata) {
    var secago = nowts - ts;
    if (secago > 59) {
      // Older than the chart covers: drop it rather than plot it
      delete angstdata[ts];
      continue;
    }
    xml += "<value xid=\""+secago+"\">"+angstdata[ts]+"</value>";
  }
  xml += "</graph></graphs></chart>";
  if ($("#amlineswf").length) $("#amlineswf").get(0).setData(xml);
}

So what we’re doing here is building up a string of XML in a format that amCharts understands. First we define the X-axis divisions: one for each of the last 60 seconds, labelled with HH:MM:SS. The first loop steps backwards through time in 1 second increments, formats the time, and writes an entry to the XML string (importantly, each division is keyed using the number of seconds between its timestamp and the current time). Just before that, we added the latest value to a global object, angstdata, using the timestamp in seconds as a key. Now we can write the values for the data series, attaching each value to the relevant x-axis division using the time offset to work out which division each value belongs to. Once the XML is complete, we just tell the amChart to do its stuff.
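The manual rollover of seconds, minutes and hours can also be left to the Date object. The sketch below is one way to produce the 60 labels. lastMinuteLabels and pad2 are hypothetical helpers, and it assumes index i should carry the time i seconds before now, as the description above suggests.

```javascript
// Sketch: build the 60 x-axis labels with Date arithmetic instead of manual rollover.
// lastMinuteLabels and pad2 are hypothetical helpers; index i is i seconds before `now`.
function pad2(n) {
  return (n < 10 ? "0" : "") + n;
}

function lastMinuteLabels(now) {
  var labels = [];
  for (var i = 0; i < 60; i++) {
    // Subtracting milliseconds lets Date handle minute, hour and day boundaries
    var t = new Date(now.getTime() - i * 1000);
    labels.push(t.getHours() + ":" + pad2(t.getMinutes()) + ":" + pad2(t.getSeconds()));
  }
  return labels;
}
```

Each label can then be wrapped in a value element with xid set to its index, in the same way as the loop in updateFlash.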

And there you have it

All the JavaScript code is reproduced on this page, and is also available via the live example of the angst monitor linked above. To implement the amChart you will need to download the amLine release package from amCharts.com—the only files you need from this package are amline.swf, amline_settings.xml, amline_data.xml and swfobject.js (though you can write your own Flash embed code if you prefer). I have modified amline_data.xml to remove the data since it will all be generated real-time from JavaScript, and have tweaked amline_settings.xml to create the visual style that I wanted. You can download both of those modified files here:

amline_data.xml
amline_settings.xml

Phat conclusions

You are like seriously sad and I hate you

~ Comment from anonymous Livejournal user

Conclusions herein should not be taken seriously, and I’m perfectly aware of the numerous scientific flaws in this. Scientific accuracy wasn’t the objective.

That said, some of the nicer findings of this fairly pointless mashup are that life is over ten times as popular as death (although there are still thousands of mentions of death every day) and love triumphs over hate. It’s nice to think the world has 3.8 times more love than hate, but maybe terrorists don’t blog. And remember the most popular word of all: ‘Me’.


Cross Site Scripting Joy

by Andrew Betts, December 4th, 2007

Google for ‘cross site scripting’ and you’ll get a plethora of articles and tutorials about vulnerabilities, loopholes, and exploits. Early in the development of JavaScript it was realised that client-side scripting had the capacity to access information in other browser windows that might be sensitive and which it certainly had no business reading. This was considered a problem and dubbed cross site scripting (thankfully abbreviated to XSS, not CSS). The basic security principle that solves this is the Same Origin Policy, which prevents scripts from accessing resources unless they come from the same host. Sounds simple enough, but modern XSS exploits are incredibly complex, getting around the same origin policy by taking advantage of opportunities to inject script into websites that simply redisplay input without encoding it first.

So the battle over XSS as a security problem has moved on from the same origin policy, but same origin remains a massive obstacle to development of useful non-malicious services, and that’s particularly true of Comet, because there are typically two servers involved in any comet setup: a web server like Apache, and a comet server like Meteor or Orbited.

There are essentially three choices for making these two servers play together:

  1. marry them: have one server that serves both your Comet connections and the standard ones (including any dynamically generated content);
  2. have a regular web server with a Comet server sitting in front of it, so all connections are made to the Comet server, and it proxies the non-comet connections to the web server;
  3. have both the Comet and the regular web server exposed to the web, and request applicable content from each one.

There are very few, if any, web servers that are capable of doing Option 1 efficiently (hence the development of Meteor, Cometd, Orbited, Lightstreamer etc). Option 2 is easy and works, but is a bit of a cop out from the same-origin problem, and puts a lot of unnecessary load on your Comet server.

Which brings us to Option 3, and the need to have content served from two different sources interacting on the same page, and leaves us at loggerheads with the same origin policy.

If only the same origin policy was uniformly implemented, this might not be such a problem. But it’s not. And after getting utterly frustrated with inconsistent behaviours between different browsers, I decided to write a few tests. And I got slightly carried away, so I ended up with 38. Here are the results—I threw in an iPhone for colour:

Transport | Configuration                                                  | IE 6/7  | FF 2    | Op 9    | Saf 2   | Saf 3   | iPhone
----------+----------------------------------------------------------------+---------+---------+---------+---------+---------+--------
XHR       | one-part, same-origin                                          | Yes     | Yes     | Yes     | Yes     | Yes     | Yes
XHR       | one-part, same host, different port                            | Yes     | No      | No      | No      | No      | No
XHR       | one-part, parent host, same port                               | No      | No      | Yes [2] | No      | No      | No
XHR       | one-part, parent host, different port                          | No      | No      | No      | No      | No      | No
XHR       | incremental, same-origin                                       | No [8]  | Yes     | Yes     | No [3]  | No [3]  | Yes
XHR       | incremental, same-origin, 1K prepended ‘noise’                 | No [8]  | Yes     | Yes     | Yes     | Yes     | Yes
XHR       | incremental, same host, different port                         | No [8]  | No      | No      | No      | No      | No
XHR       | incremental, same host, different port, 1K prepended ‘noise’   | No [8]  | No      | No      | No      | No      | No
XHR       | incremental, parent host, same port                            | No      | No      | Yes [2] | No      | No      | No
XHR       | incremental, parent host, same port, 1K prepended ‘noise’      | No      | No      | Yes [2] | No      | No      | No
XHR       | incremental, parent host, different port                       | No      | No      | No      | No      | No      | No
XHR       | incremental, parent host, different port, 1K prepended ‘noise’ | No      | No      | No      | No      | No      | No
IFRAME    | one-part, same-origin                                          | Yes [1] | Yes [1] | Yes [1] | Yes [1] | Yes     | Yes
IFRAME    | one-part, same host, different port                            | Yes [1] | Yes [1] | No      | No      | Yes [1] | No
IFRAME    | one-part, parent host, same port                               | Yes [1] | Yes [1] | Yes [1] | Yes [1] | Yes [1] | Yes [1]
IFRAME    | one-part, parent host, different port                          | Yes [1] | Yes [1] | No      | No      | Yes [1] | No
IFRAME    | incremental, same-origin                                       | No [3]  | Yes [1] | Yes [1] | No [3]  | No [3]  | Yes
IFRAME    | incremental, same-origin, 1K prepended ‘noise’                 | Yes [1] | Yes [1] | Yes [1] | Yes     | Yes     | Yes
IFRAME    | incremental, same host, different port                         | No [3]  | Yes [1] | No      | No      | No [3]  | No
IFRAME    | incremental, same host, different port, 1K prepended ‘noise’   | Yes [1] | Yes [1] | No      | No      | Yes [1] | No
IFRAME    | incremental, parent host, same port                            | No [3]  | Yes [1] | Yes [1] | No [3]  | No [3]  | Yes
IFRAME    | incremental, parent host, same port, 1K prepended ‘noise’      | Yes [1] | Yes [1] | Yes [1] | Yes [1] | Yes [1] | Yes
IFRAME    | incremental, parent host, different port                       | No [3]  | Yes [1] | No      | No      | No [3]  | No
IFRAME    | incremental, parent host, different port, 1K prepended ‘noise’ | Yes [1] | Yes [1] | No      | No      | Yes [1] | No
SCRIPT    | one-part, same-origin                                          | Yes     | Yes     | Yes     | Yes     | Yes     | Yes
SCRIPT    | one-part, same host, different port                            | Yes     | Yes     | Yes     | No      | Yes     | No
SCRIPT    | one-part, parent host, same port                               | Yes     | Yes     | Yes     | Yes     | Yes     | Yes
SCRIPT    | one-part, parent host, different port                          | Yes     | Yes     | Yes     | No      | Yes     | No
SCRIPT    | any incremental-loading configuration                          | No      | No      | No      | No      | No      | No
          | Shorten document.domain                                        | Yes     | Yes     | Yes     | Yes     | Yes     | Yes
          | Lengthen document.domain to original length once shortened     | Yes     | No      | No      | No [7]  | No [7]  | No [7]

Numbers in square brackets refer to the table notes.

Table notes

  1. IFRAME can only be accessed if its document.domain matches the parent frame’s document.domain
  2. XHR can only be made if document.domain is a match for or a parent of the target host. This will always be the case for XHRs to the originating host, but document.domain must be shortened to allow XHR to the parent domain.
  3. Safari and IE have a 1K buffer which must fill up before any response is parsed. Data received before the 1K mark is reached will not be rendered or interpreted and will not fire the Interactive state of an XHR until the buffer is full or the connection is closed. The obvious solution to this is to send 1K of ‘noise’ at the start of any response that needs to be parsed incrementally.
  4. SCRIPT loading is a blocking action in Opera and a non-blocking action in other browsers.
  5. In Safari, HTTP URLs that do not explicitly define a port number (and therefore default to port 80 in all browsers) are considered to have a different port to those URLs that do explicitly define port 80, even though both URLs actually use the same port. The test suite therefore defines ‘same port’ as ‘no port specified’, rather than explicitly setting :80.
  6. Despite setting cache-busting headers, you must give every script you load using the SCRIPT transport a unique URL, else the browser may simply pull the existing script out of memory and run it again.
  7. Fails silently (does not throw an error).
  8. In Internet Explorer, the first incremental response to XHRs does fire a readyState 3 ‘interactive’ event, but neither the body nor the headers of the response are available until the response is complete, making these events essentially useless. So for the purposes of incrementally downloading data into the browser, the incremental XHR is considered non-functional in IE.
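
Notes 3 and 6 both boil down to small helpers, so here is a minimal Python sketch of each. It is not code from the test suite: the function names, the choice of spaces as ‘noise’ and the `nocache` query parameter are all assumptions made for illustration, and the padding is assumed to go at the start of the response body, after the headers.

```python
import itertools
import time

# Note 3: Safari and IE hold back roughly the first 1K of a response.
PAD_BYTES = 1024


def with_noise(first_chunk: bytes, pad_bytes: int = PAD_BYTES) -> bytes:
    """Prefix the first chunk of a streamed body with filler bytes.

    Spaces are used as the filler on the assumption that the client
    ignores leading whitespace; pick whatever your payload tolerates.
    """
    return b" " * pad_bytes + first_chunk


_request_counter = itertools.count(1)


def unique_script_url(base_url: str) -> str:
    """Note 6: give every SCRIPT-transport request a URL not used before.

    'nocache' is a hypothetical parameter name; the server ignores it.
    """
    separator = "&" if "?" in base_url else "?"
    stamp = int(time.time() * 1000)
    # The counter keeps URLs distinct even within the same millisecond.
    return f"{base_url}{separator}nocache={stamp}-{next(_request_counter)}"
```

Only the first chunk of a response needs the padding; once the buffer has filled, later chunks are parsed as they arrive.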

The full results, methodology and online test suite are available on the Meteor site. Feel free to check these, tell me if I’ve got any wrong, and suggest tests that should be added. And if your browser isn’t listed, run it through all the tests, send me the results, and I’ll add it to the table.

Real Time Data Sources

by Andrew Betts, November 26th, 2007

There are so many APIs out there now—it’s still not standard practice to create an API when building web apps, but we’re not far off. But when you compare the availability of APIs with the availability of real time data sources, APIs seem to be everywhere while there is barely a real time source to be found.

And this is disappointing, because there are loads of services for which real time data would be so much easier and more applicable to the task: Flickr’s recent activity method, Digg’s and Technorati’s search APIs, the BBC traffic feeds, and virtually everything offered by the Twitter API…

In fact, I’m only aware of one really useful source of real time data: Sixapart’s excellent Livejournal/Typepad update stream, where you can connect to their server and receive data as it happens. Livejournal users the world over are generating content constantly, and as soon as they publish their data it goes out on the feed. I typically find that you see 2-3 posts per second, which is pretty good for a community many think is a bit out of vogue.

I’d love to know if anyone has come across any other real-time data sources that are freely available.

Developing Markets Live for the FT

by Andrew Betts, November 7th, 2007

When the Financial Times came to us for a new kind of Markets commentary blog, they had in mind something to really break the mould of the daily market report you’ll find on the back of a typical business newspaper. A blog is all well and good, but markets are probably the holy grail of real time data—with 50 trades a second not uncommon during peak periods on the world’s busiest exchanges, markets news moves by the second, not by the day.

FT Alphaville

People building Comet-type projects have long realised this, and the stock price monitor is pretty much the ‘hello world’ of the Comet movement.

What the FT has done really well for 119 years is to make sense of the frenzied market activity and present something more refined: insight and analysis rather than just raw data. The challenge was to present that insight at the speed of the market, not at the speed of a typical newspaper publishing cycle.

The solution we designed is an event-driven blog built on WordPress, where the blog homepage can sit open for a whole day on a trader’s screen and accumulate posts as they are made, plus an hour-long live chat at 11am London time every trading day. This is all powered by Meteor, an open source event-driven webserver designed to distribute real time data to thousands of subscribers efficiently.

The task can be broken down like this:

* Getting new posts into Meteor
* Getting user comments into Meteor
* Getting live chat messages into Meteor
* Providing a 15 second buffer for sanity/legal checking
* Distributing Meteor event messages across a multi-server platform
* Receiving and acting on Meteor messages in JavaScript

We hooked our own plugin into WordPress to notify the Meteor servers when a new post was made, and user comments are simply submitted via Ajax to a handler script that pops them into Meteor after a quick trip through the excellent comment/trackback spam-blocker Akismet. Live chat required the development of a simple chat client, but actually submitting messages is again an Ajax request (we trust the journos enough not to send their messages through Akismet).

When the live chat messages are sent to Meteor, they are added on two channels: one for participants and another for subscribers. The participants’ channel receives the message immediately and the subscribers’ channel receives it after a 15 second delay. This is to allow chat participants to block each other’s messages if they are libelous, technically incorrect or mistyped. Block actions are sent to the server in another Ajax call and distributed in another Meteor message so that all participants can see straight away when a message is blocked.
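
The two-channel delay is easy to sketch. What follows is a minimal Python model of the idea, not the code running on the FT site: the class name, the event tuples and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are all assumptions.

```python
import heapq

DELAY_SECONDS = 15  # the sanity/legal checking window described above


class ChatRelay:
    """Participants see messages at once; subscribers see them after a delay."""

    def __init__(self, delay=DELAY_SECONDS):
        self.delay = delay
        self.participants = []  # events delivered immediately
        self.subscribers = []   # events delivered after the delay
        self._pending = []      # heap of (release_time, msg_id, text)
        self._blocked = set()
        self._next_id = 1

    def post(self, text, now):
        """Publish to participants now and queue the message for subscribers."""
        msg_id = self._next_id
        self._next_id += 1
        self.participants.append(("message", msg_id, text))
        heapq.heappush(self._pending, (now + self.delay, msg_id, text))
        return msg_id

    def block(self, msg_id):
        """Mark a message as blocked and tell the participants straight away."""
        self._blocked.add(msg_id)
        self.participants.append(("block", msg_id))

    def tick(self, now):
        """Release every queued message whose delay has expired, unless blocked."""
        while self._pending and self._pending[0][0] <= now:
            _, msg_id, text = heapq.heappop(self._pending)
            if msg_id not in self._blocked:
                self.subscribers.append(("message", msg_id, text))
```

A message blocked at any point inside the window never reaches the subscribers’ list, while the block itself shows up for participants as soon as it is made.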

The next issue is running Meteor in a multi-server environment. Incoming messages will land on one server, while subscribers or other participants may be connected to a different one. Meteor does not currently support syncing multiple instances, so we wrote an abstraction layer to ensure that messages were copied to each server.
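
In outline, a layer like that only has to fan each message out to every instance. Here is a hedged Python sketch of the shape of it, not our production code: `Replicator` and the per-server `send(channel, message)` callables are hypothetical, standing in for whatever mechanism delivers a message to a single Meteor server.

```python
from typing import Callable, Dict, List

# A sender delivers one message to one server: (channel, message) -> None.
Sender = Callable[[str, str], None]


class Replicator:
    """Copy every message to every server, whichever one it arrived on."""

    def __init__(self, senders: Dict[str, Sender]):
        self.senders = dict(senders)

    def publish(self, channel: str, message: str) -> List[str]:
        """Send to all servers and return the names of those that failed."""
        failed = []
        for name, send in self.senders.items():
            try:
                send(channel, message)
            except OSError:
                # Network-level failure: keep going so one dead server
                # does not stop the others receiving the message.
                failed.append(name)
        return failed
```

Returning the failed servers rather than raising leaves the retry policy to the caller, which is one design choice among several.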

Each page of the blog includes the Meteor JavaScript client. Blog index and category pages subscribe to post channels, the blog article pages subscribe to comment channels (each post has its own comment channel), and the Markets Live chat subscribes to either the subscriber or participant channel, depending on who’s logged in.

FT Alphaville

See it for yourself at ftalphaville.ft.com. And if you can drop in at 11am London time (3am San Francisco, 6am New York, 12pm Berlin, 7pm Tokyo, 8pm Sydney) you can catch Markets Live. Just make sure you’ve got your tin hat on.

Meteor at Hack Day ’07

by Andrew Betts, October 24th, 2007

Since being invited to contribute to Comet Daily, I’ve been thinking about what to do with my opening post, and I can think of nothing better to get things going with a bang than to talk about the antics at Hack Day ’07 here in London. Hack Day was a gathering of around 200 hackers sponsored by the BBC and Yahoo, in which everyone was given 36 hours to hack together the most amazing mashup they could muster.

There were 62 hacks in all, some more outlandish than others. The other guys on our table created Fruitr, a Flickr-powered fruit recognition engine, and there were airships, mapping apps, and even one entirely paper-based hack that had the audience in fits. What struck me, though, was that none of the hacks were event-driven (except ours, naturally!). The prevalence of APIs everywhere is great, but are web service owners missing a trick by only making their data available via REST/SOAP/RSS interfaces?

Take Twitter, for example. I can get an RSS feed of the latest twitters from everyone, but seeing as there are so many, I have to refresh the feed at a rate that borders on denial-of-service to guarantee that I’ll capture all activity. So we thought: let’s do something with Meteor, the open-source Perl-based Comet server, to demonstrate what event-driven interfaces can do.

The event had wifi provided by BT Openzone running from about 15 access points around the hall, so we set up sniffers to capture all the traffic going over the air, and sent it to a Meteor controller script that we knocked up in PHP to inject packet stats into Meteor. We then built an interface in JavaScript/XHTML/CSS, deployed the Meteor JavaScript client, and were able to see in real time what everyone in the room was doing on the web. Suffice it to say there were some fairly bizarre protocols flying about.

Anyway, here’s a screencast of our demo:

The left column shows a ranked list of protocols in use by number of packets, so HTTP, IMAP and SSH were always pretty close to the top. The second column shows the IP addresses of people accessing the Meteor demo and their user agent (that’s from an access log monitor daemon, rather than any data from the wifi). Finally, the right-hand side has a Yahoo Map of the world with flashing dots showing the targets of the traffic from the Hack Day hackers: our token attempt to work BBC/Yahoo APIs into our hack!
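
The ranked protocol list in the left column is just a running tally re-sorted as packets arrive. Our controller script was PHP, so the Python below is only an illustrative sketch of that tally; `ProtocolStats` and its method names are made up for the example.

```python
from collections import Counter


class ProtocolStats:
    """Running packet count per protocol, ranked on demand."""

    def __init__(self):
        self._counts = Counter()

    def record(self, protocol, packets=1):
        """Add newly sniffed packets for one protocol to the tally."""
        self._counts[protocol] += packets

    def ranking(self, top=10):
        """Return (protocol, packet_count) pairs, busiest first."""
        return self._counts.most_common(top)
```

Each time the ranking changes, the new list is what would be pushed out as an event for the page to redraw.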

The event itself became infamous for the lightning strike that hit the building. I was watching a brilliant talk on Yahoo’s new FireEagle project, and suddenly there was a loud bang, then a rising screaming noise (exactly the sort of rising screaming noise that precedes a big explosion, according to Hollywood), then after a couple of seconds of silence, all the fire control vents in the roof opened. This was a particular problem because it was pouring with rain, and two hundred laptops and their attendant hackers started getting rained on.

However, if you’re a hacker, the solution to this kind of incident is quite simple - it’s raining, so you need an umbrella:

Hacking continues regardless as rain pours through fire control vents in the roof

Lightning is, of course, a push technology, so maybe nature was trying to tell us something. I’d also like to personally congratulate the person who tried to download Debian via my GPRS connection that I kindly shared during the wifi outage. I’ll be sending you my phone bill :-(

Copyright 2008 Comet Daily, LLC. All Rights Reserved