Building a Better Stock Ticker

by Jerod Venema, March 1st, 2010

One of the most common use-cases for Comet is the ubiquitous stock ticker demo. There are several reasons why this particular demo is so popular: it’s easy to understand why people want stock quotes in real-time, it’s obvious that changes are occurring (even without any user interaction), and at least for a simple demo, it’s fairly simple to implement.

Although building a demo is quite trivial, building a stock ticker that has useful functionality quickly raises a number of questions, many of which have been directed to me at Frozen Mountain. One question in particular has been directed to me multiple times, and is more of a design question: what’s the best way to implement a stock ticker using Comet?

Note: for the purposes of this article, I’m going to assume we’re working with a Comet server built on the Bayeux protocol, such as WebSync or Jetty.

The Problem

The “best way to implement a stock ticker” is rather ambiguous, so let’s take a moment and first define our scenario clearly so we know what the problem really entails.

Let’s work with a fairly common scenario. Assume each user has a “portfolio” of stocks, each stock containing between 10 and 50 points of data (such as min price, max price, current value, etc.). Each user also has the ability to show or hide certain data points. Since we’re talking about 10-50 data points per stock, and we obviously want to avoid sending several times more data than necessary, we want to send each user only the exact set of information they require.

At First Glance

If you just take this problem at face value without digging into the details, one approach jumps out pretty quickly. The data set is unique per user, so just send each user their own set of data! This approach would look something like this:

  • Create a unique channel for the user
  • Build a custom “portfolio” object for each user
  • Change the serialization of that object to only include the properties selected by the user
  • For every user, when new data arrives, check to see if the new data matches the list of properties the user cares about, and if so, send it to them
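To make those steps concrete, here’s a minimal sketch of the per-user approach in JavaScript. The `server.publish` call, the `/user/{id}` channel scheme, and the `userPrefs` structure are all hypothetical illustrations, not a real WebSync API:

```javascript
// Each user's preferences: which data points they want to see.
// (Hypothetical structure; in practice this would live in session state.)
var userPrefs = {
  "user-42": ["min", "max", "current"]
};

// Serialize a stock update down to only the properties a user selected.
function filterUpdate(update, selectedProps) {
  var result = {};
  for (var i = 0; i < selectedProps.length; i++) {
    var prop = selectedProps[i];
    if (update.hasOwnProperty(prop)) {
      result[prop] = update[prop];
    }
  }
  return result;
}

// For every user, check whether the new data matches their list of
// properties, and if so, publish it to their private channel.
function pushToUsers(server, symbol, update) {
  for (var userId in userPrefs) {
    var data = filterUpdate(update, userPrefs[userId]);
    if (Object.keys(data).length > 0) {
      server.publish("/user/" + userId, { symbol: symbol, data: data });
    }
  }
}
```

Note how the server has to hold `userPrefs` for every connected user, and how every single update pays the filtering cost per user - exactly the disadvantages discussed next.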

While this approach would (somewhat) work, it has a few fairly notable disadvantages:

  • State must be maintained for each user on the server, increasing memory usage substantially
  • A lot of additional checking needs to be done to determine if the data should be sent
  • The stock data gets tied directly to a specific user

Our first disadvantage, maintaining state, can be mitigated with more hardware and persistent storage, but only mitigated. If we can instead eliminate the need to maintain state for thousands of users, we reduce the application complexity, remove the need for persistent storage (in case of a soft application reset), and we’re going to need a lot less hardware. Pretty straightforward.

Our second disadvantage, additional data checking, is less important, but still a hassle. When the data is sent, it has to be checked to make sure it’s in the list of data this user cares about. If the sum total of the data is nothing, then the request doesn’t actually need to be sent at all. Not impossible by any stretch of the imagination, but annoying. And software should never be annoying.

Our third disadvantage with this approach, the tight coupling of data to individual users, is a little more interesting. Typically, to get data streaming, the data is published from a separate process - a Windows service, a Linux/Unix daemon, whatever. This process contains no information about the web users, and is just managing the data. That means we need to adjust either the process (to make it aware of the users via a common storage mechanism, such as a database) or the web application (to make it pre-process the data, such as in WebSync’s “BeforePublish” event, and push the data to the individual users). Neither of these options sounds ideal, and both would still require the web application to be stateful.

So with all these issues, where does that leave us? If we analyze the problems with our “first glance” approach, it becomes obvious that what we’re looking for is a solution that is both 1) stateless and 2) loosely coupled between the data and the users.

Now that we’ve agreed on that…let’s locate that solution!

Channel Mania

Let’s re-state our problem a little, now that we’ve gotten a little more in-depth. Basically, we need to have a way for any user to be able to access live changes for any, all, or none of the data points. If we think about this in terms of the Bayeux protocol, what we’re really saying is that each data point needs its own channel.

That means no more per-user channels, which in turn means our application can be stateless, which also implies that we can keep our publishing of data separate from our consuming of the data. Ok, interesting. Of course, this solution immediately raises some questions as well.
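As a sketch, the publishing process under this design might look something like the following. The `publisher.publish` call and the exact channel layout are illustrative assumptions, not a specific API:

```javascript
// Build the channel name for a (symbol, data point) pair.
// The "/ticker/{symbol}/{dataPoint}" layout is an assumption.
function channelFor(symbol, dataPoint) {
  return "/ticker/" + symbol + "/" + dataPoint;
}

// Publish each data point of an update on its own channel. The
// publisher needs no knowledge of who (if anyone) is listening,
// and keeps no per-user state at all.
function publishUpdate(publisher, symbol, update) {
  for (var dataPoint in update) {
    publisher.publish(channelFor(symbol, dataPoint), update[dataPoint]);
  }
}
```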

First, isn’t creating a channel for every data point a lot of extra overhead? Well, there are two places where we really have to care about overhead: the server process that’s managing all the messages, and the size of the actual message itself. The message size can be easily managed by keeping channel names short, so that’s not a big problem. The server processing of the messages is actually a very interesting discussion…

While I can’t speak for certain about implementations of the Bayeux protocol other than WebSync, I believe our implementation may be similar to many others in this particular aspect. WebSync makes using multiple channels a very efficient method of message distribution. In fact, WebSync only actually creates a channel when at least one user subscribes to it, and destroys it once the last subscription to that channel is gone. What that means is that publishing a message to 10,000 channels when users are only subscribed to 5 channels results in 9,995 messages immediately getting discarded, which is the fastest possible operation that can be performed. So we’re ok on server overhead too.
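To illustrate why that discard is so cheap, here’s a toy in-memory dispatcher - a simplified sketch, not WebSync’s actual internals. A channel is just a map entry that exists only while someone is subscribed, so publishing to an unused channel is a single failed lookup:

```javascript
// channel name -> array of subscriber callbacks; a channel "exists"
// only while this map has an entry for it.
var subscriptions = {};

function subscribe(channel, callback) {
  (subscriptions[channel] = subscriptions[channel] || []).push(callback);
}

// Returns the number of subscribers the message was delivered to.
function publish(channel, data) {
  var subs = subscriptions[channel];
  if (!subs) return 0; // no subscribers: discard immediately
  for (var i = 0; i < subs.length; i++) {
    subs[i](data);
  }
  return subs.length;
}
```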

Final Result

Now we’ve answered our two main objections to this solution: we can keep message sizes small, and having lots and lots of channels is actually a very good thing for efficient message distribution. We’ve also noted that this solution allows us to reach our goals of statelessness and clean separation of tasks between the publishing and consuming of data. So, as a result, we end up with all kinds of channels, each of which might look something like:

"/ticker/GOOG/max"
"/ticker/GOOG/min"
"/ticker/GOOG/current"

So the only question that remains is how to actually subscribe to all these channels without creating a ton of additional requests. Well, guess what - the Bayeux protocol accounts for this very scenario. When you make a subscribe request, you can subscribe to multiple channels in a single shot:

client.subscribe({
  channels: [
      "/ticker/GOOG/max",
      "/ticker/GOOG/min",
      "/ticker/GOOG/avg"
  ],
  onReceive: function(args){
      // handle the incoming data...args.channel will describe the 
      // channel, so if we need to we can update specific table 
      // columns, rows, etc.
  }
});

Another interesting aspect of this approach is that it is highly extensible. We’re now able to work with 2D data (stocks => rows, data points => columns) but we can extend our approach out to 3D data with another path portion on the channel such as "/ticker/GOOG/{date}/avg", and start displaying data in a cube instead of a table!
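For instance, here’s a sketch of subscribing to one data point across a range of dates - the date format and channel layout are illustrative assumptions:

```javascript
// Build the channel name for a (symbol, date, data point) triple.
// The "/ticker/{symbol}/{date}/{dataPoint}" layout is an assumption.
function channelFor(symbol, date, dataPoint) {
  return "/ticker/" + symbol + "/" + date + "/" + dataPoint;
}

// One data point across many dates gives us the third dimension.
function channelsForDates(symbol, dates, dataPoint) {
  var channels = [];
  for (var i = 0; i < dates.length; i++) {
    channels.push(channelFor(symbol, dates[i], dataPoint));
  }
  return channels;
}

// A single subscribe call can then cover the whole range, e.g.:
// client.subscribe({ channels: channelsForDates("GOOG", dates, "avg"), ... });
```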

Man, now I want to make a 3D stock ticker…

One Response to “Building a Better Stock Ticker”

  1. Mario Miki Says:

    Hi Jerod,

    I liked this post. Simple and clear, but I’m wondering about the overhead of keeping the channels a user is subscribed to on the server side - is this a concern or not? Thank you very much.


Copyright 2015 Comet Daily, LLC. All Rights Reserved