Real Time Angst

by Andrew BettsJanuary 15th, 2008

My work often finds me writing support for (or creating) APIs and integrating systems, and I find it depressingly familiar every time a big client suggests connecting disparate parts of their IT empire with RSS feeds (or in fact any kind of XML, which is often mislabelled as RSS when it’s not). Yes, RSS is great, but will the world’s corporations please stop misusing it as the be all and end all of systems integration. I’ve recently had to do battle with a client and two of their existing suppliers over why syndicating a rapidly changing news source by polling an RSS feed every 10 seconds is a bad idea.

Of course integrating systems with real time data streams doesn’t require Comet (if we define Comet as push-to-browser), but because there are so few people doing it, I fear that developers will not see the point of Comet for the ‘last mile’ when their back-end is fundamentally not set up to be event driven.

For example, I have a new Blackberry Curve, but we don’t have Blackberry Enterprise Server at work, so I’m using the Blackberry service provided by my network operator. This checks my IMAP email every 15 minutes and then squirts the new messages to my Blackberry in real time. What on earth is the point of that? The phone may as well just poll.

So we need to extend the real-time vision further into the back-end of systems to make it worthwhile to use Comet. One of the only useful real-time data sources available for hacking with (at time of writing) is the Livejournal update stream from Six Apart. This is basically an XML feed, though it can never validate as XML since it doesn’t have a doctype or wrapper element, and you’ll never finish loading it.

The Six Apart stream features content from Livejournal, and pushes the full text and all metadata out on the feed as soon as it’s published. That’s a lot of data—about 7KB/s on average, or 2-3 posts per second. Pretty impressive for a community that’s often considered to be a bit out of vogue. The demographics of LJ show that most users are aged 15-24, and are two-thirds female. Top topics include school, boys, depression, parents and celebrities. In other words, it’s a never-ending stream of angst.

I’m going to present a tutorial for working with this real time data and streaming it to the browser using Comet (for which I’ll be using the Meteor Comet server, though the comet implementation you choose is fairly irrelevant to this example), in an attempt to demonstrate that Comet is not just useful for chat rooms, and that when you think about the bigger picture, making your applications talk to each other in real time is a very good idea.

Coping with the pain

I wrote a daemon in PHP to listen to Six Apart’s stream, extract English posts, catalogue all the angst-related words and construct a word frequency index. Here’s the bit of code that does the indexing:


// Decode, replace LJ-peculiar entities, and strip tags
$content = strip_tags(str_replace($entities, $replacements,
(html_entity_decode($content))));
// Remove any non-words (phone numbers, ascii art, wierd l33t-speak...)
$content = preg_replace("/[^a-z\-]+/i”, ” “, $content);
// Split resulting text into words
$words = explode(” “, $content);
foreach ($words as $word) {
  // Lowercase the word for comparison and see if it’s a target term
  $word = strtolower($word);
  if (in_array($word, $terms)) {
    // Add the time of this occurence to the occurences list for this word
    if (!isset($counts[$word])) $counts[$word] = array();
    $counts[$word][] = time();
    // Add the word to the words changed list for quick reference later
    if (!in_array($word, $wordschanged)) $wordschanged[] = $word;
  }
}

This creates an associative array noting the times at which each key word was seen. The number of elements in each word’s array tells us how many times that word has been seen since the earliest timestamp in the array (which will always be the first element). As the array is constructed we’re also keeping note of the words that have been spotted again since the last set of rolling averages were calculated. Whenever data is no longer being received, the script uses the downtime to recalculate those averages, like this:


while (!empty($wordschanged)) {
  $w = array_shift($wordschanged);
  // Remove any occurences that are older than 5 mins
  while (sizeof($counts[$w]) and $counts[$w][0] < (time()-300))
  array_shift($counts[$w]);
  // Calcuate the new 5 min average and compare it to the previous one
  $prev = (isset($prevcounts[$w])) ? $prevcounts[$w] : 0;
  $now = round(sizeof($counts[$w])/5,1);
  // If more than 2% different, send the update to Meteor
  if (abs($now-$prev) > (0.02*$prev)) {
    $out = “ADDMESSAGE angst {w:’”.addslashes($w).”‘,c:”.$now.”}\n”;
    echo “> $out”;
    fwrite($op, $out);
    $prevcounts[$w] = $now;
  }
}

For each word, the list of occurrences is trimmed until the first element is less than five minutes old. If the new count is still different to the old count, the new count is divided by five to give a five minute rolling average, and inserted into Meteor on a controller connection (already open as $op). If it’s the same, there’s no need to insert a message, since the client will continue presenting the old number, knowing that it is still up to date.

Finally, and I’d love it if someone could come up with a better way of doing this, we need to fix a PHP memory leak. The $counts array is treated like a queue, with timestamps appended to the end of each word’s ocurrence list, and removed from the start. As a result it should gradually move through memory, allocating new blocks at the end of the array, and releasing them at the front. But it doesn’t seem to release them, so the simple solution is to simply copy the remaining elements into a new array:


$newcounts = $counts;
unset($counts);
$counts = $newcounts;
unset($newcounts);

This is all wrapped in a never-ending loop that maintains connections to both Meteor and the Six Apart stream, and re-establishes them if they drop. The complete commented code can be downloaded here:

livejournalmonitor.php

Now run it, and you will start filling up your Meteor server with word frequency change notifications. The next step is to make a nice JavaScript visualisation of this data.

Giving it some bling

I’m using jQuery, my favourite JavaScript library, to make this all very easy. I’m going to put my code in an include file, which I’ll load after the jQuery include, in the <HEAD> of the HTML document. Of course you could also include it inline, in a <SCRIPT> block. First you need a connection to Meteor (you have installed Meteor, haven’t you? If not, go do it now. We’ll wait):


$(document).ready(function() {
  Meteor.hostid = hostid; // defined earlier
  Meteor.host = "data."+location.hostname;
  Meteor.registerEventCallback("process", newHit);
  Meteor.joinChannel("angst", 25);
  Meteor.mode = 'stream';
  Meteor.connect();
}

This subscribes to the angst channel, retrieves the last 25 messages, and starts streaming new messages. hostid is just a unique reference for this connection, so that when the client reconnects the server knows to terminate the old connection. It can be anything you like, but each client needs to choose a different one, so either allocate it from the server end or use a random number with a high level of entropy. It also registers a function for handling new events: newHit.

The next step is to define a function called newHit and deal with messages.


function newHit(datastr) {
  eval("var data = "+datastr+";");
  if ($("#wordrow_"+data.w).length) {
    var prev = $("#wordrow_"+data.w+" td.c").html();
    var change = Math.round(Math.abs(data.c - prev)*10)/10;
    if (change > 0) {
      var dir = ((data.c - prev) > 0) ? "up" : "down";
      $("#wordrow_"+data.w+" td.c").html(data.c);
      $("#wordrow_"+data.w+" td.ch").empty().append("<div class=\""+
      dir+"\">"+change+"</div>");
      $("#wordrow_"+data.w+" td.ch div").animate({backgroundColor:'#fff'},
      'slow');
      updateAngstOrder(data);
    }
  } else {
    var min = ($("#datatable1 tr").length >= 11) ?
    $("#datatable1 tr:last td.c").html() : 0;
    if (data.c > min) {
      $("#datatable1").append("<tr id=\"wordrow_"+data.w+
      "\"><td class=\"w\">"+data.w+"</td><td class=\"c\">"+
      data.c+"</td><td class=\"ch\"></td><td class=\"gl\"></td>");
      $("#wordrow_"+data.w).animate({backgroundColor:'#fff',
      fontWeight:'normal', color:'rgb(68,64,88)'}, 3000);
      updateAngstOrder(data);
    }
  }
}

The message that the function receives is a JSON object, something like {w:’ohmygod’,c:13.6}. So you can just eval() it to assign a native object to a variable. On line 6 we split depending on whether we already have an entry for this word or not. If so, we can look up the current value from the HTML we wrote last time to calculate the change, and if there is a change (which there should be, else Meteor would not have sent an update), display it. This means working out the direction of the change, updating the HTML with the new actual value, displaying an arrow showing the direction, and setting off an animation effect to highlight the change.

If the word doesn’t exist yet, then we may want to add it to the table, but because we only want a maximum of ten items in the table, it will have to have a higher value than the lowest item currently displayed. To do that, we check that the table is full (11 rows equals ten data rows and the header row) and extract the value of the last item, or otherwise set a minimum of zero. To highlight the new entry in the table, it is initially set to a yellow background, and gradually faded to white.

You’ll notice that whenever a value is updated or a new entry is inserted, I’m calling a function called updateAngstOrder(). This shuffles the list to ensure that all the rows are in the correct rank order. Here it is:


function updateAngstOrder(word) {
  var row = $("#wordrow_"+data.w);
  while (row.prev().length && data.c >
  parseFloat(row.prev().children("td.c").html())) {
    var prev = row.prev();
    row.remove().insertBefore(prev);
  }
  while (row.next().length && data.c <
  parseFloat(row.next().children("td.c").html())) {
    var next = row.next();
    row.remove().insertAfter(next);
  }
  while ($("#datatable1 tr").length > 11) {
    $("#datatable1 tr").eq(($("#datatable1 tr").length-1)).remove();
  }
}

First we establish what the value of the changed row is, then, while that value is greater than the one in the row above, we shift the row up the table, and while it is lower than the value of the row below, we shift it down. Once the row has been positioned correctly, there may be an extra row at the bottom to remove, so the table is trimmed of the lowest values until it contains just the top ten. This will be because when a new entry comes along that’s higher than the lowest value in the table (but when there are already ten items in the table) we add it anyway, shuffle it to the right position, and then remove the lowest item.

Plottin it, on like a chart, innit

There seemed to be a nice convenient gap in my page that called to me and said something like “I ought to be a real-time graph, duh”. So I decided to give it a go with a Flash component. Initially I spent quite some time trying to write one from scratch and then, at about the same time as I realised what a mammoth job it was, I discovered amCharts.

The amChart takes care of scrolling points leftwards automatically, adjusting its own scale as the min/max range of plotted values changes, and allowing mouseovers to examine individual data points. But first of all it would need data from my page, so I added the following to updateAngstOrder:


if (data.w == "me") updateFlash("me", data.c);

Quick and dirty, but I only want to plot a single data series for the moment. Now we need an updateFlash function to send the new value to the amChart:


var angstdata = {};
function updateFlash(word, value) {
  var now = new Date();
  var hrs = now.getHours();
  var mns = now.getMinutes();
  var scs = now.getSeconds();
  var nowts = Math.floor(now.getTime()/1000);
  angstdata[nowts] = value;
  var xml = “”;
  for (var i=0; i<=59; i++) {
    scs–;
    if (scs==-1) { scs = 59; mns–; }
    if (mns==-1) { mns = 59; hrs–; }
    if (hrs==-1) hrs=23;
    var fsec = (scs<10) ? “0″+scs : scs;
    var fmin = (mns<10) ? “0″+mns : mns;
    xml = “<value xid=\”"+i+”\”>”+hrs+”:”+fmin+”:”+fsec+”</value>”+xml;
  }
  var xml = “<?xml version=\”1.0\” encoding=\”UTF-8\”?>\n”+
  “<chart>\n<series>\n”+
  xml + “</series><graphs><graph gid=\”1\”>”;
  for (var ts in angstdata) {
    var secago = nowts - ts;
    if (secago > 60) delete angstdata[ts];
    xml += “<value xid=\”"+secago+”\”>”+angstdata[ts]+”</value>”;
  }
  xml += “</graph></graphs></chart>”;
  if ($(”#amlineswf”).length) $(”#amlineswf”).get(0).setData(xml);
}

So what we’re doing here is building up a string of XML in a format that amCharts understands. First we define the X-axis divisions: one for each of the last 60 seconds, labelled with HH:MM:SS. The loop on lines 9-17 steps backwards through time in 1 second increments, formats the time, and writes an entry to the XML string (importantly, each division is keyed using the number of seconds between its timestamp and the current time). We’ve also just added the latest value to a global array angstdata on line 7, using the timestamp in seconds as a key. Now we can write the values for the data series, attaching each value to the relevant x-axis division using the time offset to work out which division each value belongs to. Once the XML is complete, we just tell the amChart to do its stuff.

And there you have it

All the JavaScript code is reproduced on this page, and is also available via the live example of the angst monitor linked above. To implement the amChart you will need to download the amLine release package from amCharts.com—the only files you need from this package are amline.swf, amline_settings.xml, amline_data.xml and swfobject.js (though you can write your own Flash embed code if you prefer). I have modified amline_data.xml to remove the data since it will all be generated real-time from JavaScript, and have tweaked amline_settings.xml to create the visual style that I wanted. You can download both of those modified files here:

amline_data.xml
amline_settings.xml

Phat conclusions

You are like seriously sad and I hate you

~ Comment from anonymous Livejournal user

Conclusions herein should not be taken seriously, and I’m perfectly aware of the numerous scientific flaws in this. Scientific accuracy wasn’t the objective.

That said, some of the nicer findings of this fairly pointless mashup are that life is over ten times as popular as death (although there are still thousands of mentions of death every day) and love triumphs over hate. It’s nice to think the world has 3.8 times more love than hate, but maybe terrorists don’t blog. And remember the most popular word of all: ‘Me’.

4 Responses to “Real Time Angst”

  1. Kris Zyp Says:

    Very cool. I think you comments about having polling in the middle of process are great. I think a key goal of real time data should be end-to-end pushing. Comet provides the last leg, but having polling in the middle somewhere defeats the goal.

  2. Simon Proctor Says:

    Ok, after I finished giggling and posting selected quotes to my Livejournal, I’m one of the long time users who bucks the demographics, such is life, I went through the code and was impressed. Very very nice.

  3. Comet Daily » Blog Archive » Comet Daily Twitter Says:

    [...] And while on the topic, is anyone using Comet with Twitter’s APIs? Does Twitter provides enough of a real-time data feed? [...]

  4. paramesh Says:

    I want to implement an application like server sends data to client(without ckient request) rapidly.So, i want commet FrameWork how to use it(examples). THANKING YOU SIR,
    paramesh.


Copyright 2015 Comet Daily, LLC. All Rights Reserved