
On-board vs. Off-board Comet

by Joe Walker, May 22nd, 2008

Definitions:

Off-board Comet: When separate web-servers handle normal connections and Comet connections.

On-board Comet: When the web-server that serves your normal content also handles your Comet connections.

What difference does it make?

More than you might think. Although the boundary between the two is a little fuzzy, the choice affects your architecture.

Off-board Comet

  • Favors a more disconnected bus-like architecture
  • Works well with a multi-language system
  • Libraries can offer scaling more easily

off-board Comet diagram

On-board Comet

  • Easier to get started
  • Easier management of state shared with normal content
  • Development platform can be more unified

on-board Comet diagram

What libraries help me do on-board and off-board?

Cometd, Lightstreamer, Caplin Liberator and Orbited are examples of off-board systems. DWR and web-servers with built-in Comet support, such as Jetty, Grizzly and Tomcat, are examples of on-board systems, although the web-server based options could also be used in an off-board way.

For our Comet talk at JavaOne we had a Twitter clone implemented with DWR. We were able to add Comet updates in 2 lines of configuration and under 50 lines of source code. To have done the same using Cometd would have taken much more effort. On the other hand if, like Facebook, we’d wanted to use Erlang to overcome threading problems, DWR would have been a non-starter.

How do I choose?

Some indicators that you need off-board Comet:

  • You are looking for Facebook or Google levels of scalability.
  • You are using PHP or another language that doesn’t play particularly well with Comet.
  • You have a large existing system, and your use-case for Comet is fairly separate from your main business.

Some indicators that you might need on-board Comet:

  • Your Comet needs are tightly integrated with your main business.
  • You are looking to get started with some simple use-cases.
  • You want Comet features without significant server changes.

If you would like help or advice with Comet, it’s worth asking SitePen: we provide professional Comet support, from investigating an architecture to developing a solution.

The JavaOne talk

I’ve uploaded the slides that Alex Russell and I used at JavaOne to SlideShare.

You don’t get to see the demo on these slides, but if you want you can try it out yourself. BlabberOne is a Twitter clone which demonstrates (amongst other things) how easy it can be with on-board Comet to add asynchronous updates. The source code is available from DWR’s SVN repository.


Comet and Cross-Site Scripting

by Joe Walker, May 20th, 2008

When Ajax was the #1 premium buzzword, we had a spate of ‘Ajax security issues’, which were mostly just known browser issues pumped up with a little extra JavaScript. Most web security issues affect Web 1.0 as much as Web 2.0.

I’m not sure that Comet will ever reach the stratospheric level of buzz that Ajax did, but I do have a clue where someone could find the headline "COMET SECURITY FLAW". Like many of its Ajax counterparts it’s not really anything new, but it is something to be aware of.

Short version:

Any site with an XSS flaw + user-editable pages + Comet = a web worm that could rival a Warhol worm for speed.

Long Version + Comparison with Warhol Worm:

I created a clone of Twitter using DWR for my 2 talks at JavaOne. The first was on Comet with Alex Russell and the second on security with Jeremiah Grossman. In the first talk we showed how easy it is to add Comet features to an app, and in the second we hacked a (deliberately insecure) Twitter clone to pieces, ending with a web worm that worked its way around the site infecting user profile after user profile.

There is the potential for a web worm any time you have an XSS flaw on a site with user-editable pages, such as a social network. Perhaps the most obvious example was created by Samy Kamkar, who managed to write some HTML that evaded MySpace’s anti-XSS filters. If you viewed Samy’s profile the script executed, copied itself to your profile and sent a friend request to Samy on your behalf. Then whenever someone looked at your profile they got infected, also befriended Samy, and so on. Samy is just out of his 3-year probation, having pleaded guilty to a violation of penal code section 502(c)(8).

The worm managed to infect over a million profiles in well under 24 hours as people clicked around MySpace. The only option available to the MySpace admins was to shut the site down while they cleaned it out. More technical details on the attack are available.

The good news for MySpace was that they had not taken the Comet step that Facebook has just taken with Facebook Chat. If Samy could have used chat as a vector for his XSS propagation, he could have infected far more than a million profiles in 24 hours; the propagation speed would have been limited only by the latencies built into a chat system like Facebook's. In the Samy incident, propagation relied on you viewing the profile of an infected user. What if you only needed to be online and friends with an infected user?

A Warhol worm could, in theory, infect the entire Internet in 15 minutes (hence the name). The propagation time of a Warhol Worm is limited by the need to find new hosts to infect. Warhol worms can prime themselves with an initial hit-list of vulnerable hosts, but from then on they are limited by the search for hosts to infect.

The propagation rates of a social network worm are different. The random search is replaced by a node space based on our proximity to Kevin Bacon: the idea that we are all connected to each other, even to Kevin Bacon, through a chain of about 6 friends.

Assuming the worm takes 1 second to propagate from one user to their friends, you could in theory infect the majority of people in any social network in 6 seconds, and we can replace the Warhol worm's initial hit-list simply by making sure we are friends with Robert Scoble.
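To make the "1 second per hop" reasoning concrete, here is a minimal Java sketch (the class and method names are hypothetical) that treats the spread as a breadth-first search over a friend graph: every tick, each newly infected user infects all of their friends. It assumes every friend is online and that infection is instant and certain, which is exactly the idealisation above.

```java
import java.util.*;

/**
 * Hypothetical model of a Comet-driven social network worm: each tick
 * (1 second, per the assumption above) every newly infected user infects
 * all of their friends. This is plain breadth-first search over a friend graph.
 */
class WormSpread {

    /** Returns the number of ticks needed to infect every user reachable from patientZero. */
    static int ticksToInfectAll(Map<String, List<String>> friends, String patientZero) {
        Set<String> infected = new HashSet<>();
        infected.add(patientZero);
        List<String> frontier = List.of(patientZero);
        int ticks = 0;
        while (true) {
            List<String> next = new ArrayList<>();
            for (String user : frontier) {
                for (String friend : friends.getOrDefault(user, List.of())) {
                    if (infected.add(friend)) { // add() returns false if already infected
                        next.add(friend);
                    }
                }
            }
            if (next.isEmpty()) {
                return ticks;                   // nobody new was infected this tick
            }
            frontier = next;
            ticks++;
        }
    }

    public static void main(String[] args) {
        // Tiny hypothetical friend graph; friendships are listed in both directions
        Map<String, List<String>> friends = Map.of(
            "scoble", List.of("a", "b", "c"),
            "a", List.of("scoble", "d"),
            "b", List.of("scoble", "e"),
            "c", List.of("scoble"),
            "d", List.of("a", "f"),
            "e", List.of("b"),
            "f", List.of("d"));
        System.out.println(ticksToInfectAll(friends, "scoble"));
    }
}
```

On the seven-user graph in main, starting from "scoble", everyone is infected after 3 ticks: the worm's speed depends only on the graph's depth, not on anyone clicking anything.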

The 6 second theory is flawed simply because any social network is going to fall over very quickly under that strain whether it's written in Erlang or not. This is of course not much comfort to the Facebook admins!

Just to be clear: we’re not saying that Facebook has any holes. Facebook doesn’t allow HTML as input, so their system has some built-in resilience against XSS attacks. The point is that if there is any chance that your system is vulnerable to web worms, Comet could make matters much worse.

Numbers side-track: The time to infect people depends on the size of the total population and the average network size. The only data I could find on Facebook network size claims an average of 164 friends for a population of 70 million or so, whereas the 6 degrees assumption is based on 250 friends in the real world against a population approaching 7 billion. The 6 degrees Facebook application thinks the average number of nodes between 2 people is under 6. In some ways it’s rather academic, however: if a flaw of this type could crash Facebook in 5 or 50 seconds, either is fairly bad.
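The effect of average network size can be checked with a back-of-envelope calculation. This sketch assumes an idealised tree in which every user has exactly k friends and no two users share a friend, so hop d reaches about k^d new people. Real friend graphs overlap heavily, so treat the result as an optimistic floor, not a prediction; the class name is hypothetical.

```java
/**
 * Back-of-envelope estimate of how many hops a worm needs, assuming every user
 * has exactly friendsPerUser friends and no friendships overlap (a tree).
 */
class DegreesEstimate {

    /** Smallest number of hops d such that friendsPerUser^d >= population. */
    static int minHops(long friendsPerUser, long population) {
        if (friendsPerUser < 2 && population > 1) {
            throw new IllegalArgumentException("need at least 2 friends per user to grow");
        }
        int hops = 0;
        long reached = 1;               // people newly reached at the current hop (k^hops)
        while (reached < population) {
            reached *= friendsPerUser;  // tree assumption: every friend is a new person
            hops++;
        }
        return hops;
    }

    public static void main(String[] args) {
        System.out.println(minHops(164, 70_000_000L));    // Facebook figures quoted above
        System.out.println(minHops(250, 7_000_000_000L)); // real-world six degrees figures
    }
}
```

With the figures above this gives 4 hops for Facebook and 5 for the real world, consistent with the observed "under 6".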

Moral: Before you add chat or other Comet features to a site, make very sure that you don't have any XSS holes.


Developing for Comet

by Joe Walker, March 21st, 2008

There are 3 styles of API that are relevant when considering Comet services:

  • Traditional API Style. This API is the most obvious. If you want to set the value of a field, you call field.setValue(43.5) or something similar. It’s possible to imagine complex APIs like the SWT library made available remotely. This type of API does not need Comet, unless the trigger for wanting to alter the UI is asynchronous.
  • Message Passing Style. This API style is very simple: typically just a subscribe() method to declare an interest in some topic, and a publish() method to send some information to a topic. Some messaging APIs (notably JMS) add a large number of classes around this simple concept, but at its core it’s as easy as those 2 methods.
  • Synchronized Data Style. This style involves some data cache on the server that is synchronized with a similar data cache on the client (or clients).
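The message passing style really is as small as those 2 methods. The following in-memory Java sketch is not DWR's hub or JMS; TinyHub and its methods are hypothetical names used only to show the shape of the API. Publishers never learn who is listening, which is where the low coupling of this style comes from.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/**
 * Minimal in-memory sketch of the message passing style (hypothetical, not DWR).
 * In a Comet system a subscriber would usually be a browser connection
 * rather than a local callback.
 */
class TinyHub {
    private final Map<String, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    /** Declare an interest in a topic. */
    void subscribe(String topic, Consumer<Object> listener) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(listener);
    }

    /** Send a message to every current subscriber of a topic; returns how many received it. */
    int publish(String topic, Object message) {
        List<Consumer<Object>> listeners = subscribers.getOrDefault(topic, List.of());
        for (Consumer<Object> listener : listeners) {
            listener.accept(message);
        }
        return listeners.size();
    }
}
```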

There are pros and cons to all these API styles. I’m going to compare them with notes as to how they are supported in DWR. Unless noted otherwise the code below works in DWR version 3, milestone 1.

Traditional API Style

The most obvious API style is fairly heavyweight, in that there could be a lot to learn, but it is flexible and it is easy to understand what is going on.

One of the things that we pride ourselves on with DWR is a minimalist API, so how do we go about supporting this interface style? We certainly don’t want to invent a widget toolkit to control remotely. The route we have taken involves replicating currently existing JavaScript APIs in Java. The first API that we’ve cloned is the TIBCO GI API, and I hope to have the Dojo Toolkit API working remotely before too long, although it might end up going into 3.1.

Here’s an example from the Java version of the GI API that’s in DWR 3 milestone 1. If you are familiar with the TIBCO GI JavaScript API, you will get a strange sense of déjà vu: familiar calls, used from Java. When this code is run, DWR generates JavaScript on the fly and sends it using Comet/reverse Ajax to all the browsers looking at ticketcenter.html.

import java.util.Collection;
 
import org.directwebremoting.ScriptSession;
import org.directwebremoting.ServerContext;
import org.directwebremoting.ServerContextFactory;
 
import jsx3.GI;
import jsx3.app.Server;
import jsx3.gui.*;
import jsx3.xml.*;
 
// ... Then further down
// A CDF document is how GI stores data on the client.
// This code simply populates a new CDF document with server-side data.
CdfDocument cdfdoc = new CdfDocument("jsxroot");
for (Call call : calls) {
    cdfdoc.appendRecord(new Record(call));
}
 
// DWR code to find the browsers looking at a page
ServerContext serverContext = ServerContextFactory.get();
Collection<ScriptSession> sessions =
    serverContext.getScriptSessionsByPage("ticketcenter.html");
 
// Get a handle onto the GI Server object so we can push
// the CDF doc into the client-side cache, and repaint the table
Server tc = GI.getServer(sessions, "ticketcenter");
tc.getCache().setDocument("callers", cdfdoc);
tc.getJSXByName("listCallers", Matrix.class).repaint(null);

Many people will be more familiar with the Dojo Toolkit API. We hope to be able to create a shared text editor with code along these lines (this API is not part of DWR 3 milestone 1).

Firstly the Dojo Toolkit code in the browser in edit-demo.html:

<script>dojo.require("dijit.InlineEditBox");</script>
<script src="/dwr/interface/Server.js"></script>
 
<span id="myedit" dojoType="dijit.InlineEditBox"
	onChange="Server.update(arguments[0])">
Some text that many users can edit
</span>

And then the server code in Java that uses a DWR/Dojo Toolkit integration:

import java.util.Collection;
 
import org.directwebremoting.ScriptSession;
import org.directwebremoting.WebContextFactory;
import org.dojotoolkit.proxy.dijit.Dijit;
import org.dojotoolkit.proxy.dijit.Editor;
 
public class Server {
  public void update(String newValue) {
    // DWR functions to find the users viewing a given page
    Collection<ScriptSession> sessions =
        WebContextFactory.get().getScriptSessionsByPage("edit-demo.html");
 
    // Proxy API generated from the Dojo Toolkit API
    Dijit dijit = new Dijit(sessions);
    Editor editor = dijit.byId("myedit", Editor.class);
    editor.setValue(newValue);
  }
}

These 2 examples make use of ‘drapgen’, a tool that introspects a JavaScript API and automatically generates a Java API that looks similar to it. When the generated Java API is executed, the equivalent JavaScript is generated and sent to the client asynchronously.
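The idea behind a generated proxy can be sketched in a few lines. This is not drapgen's actual output: ScriptBuffer and EditorProxy are hypothetical classes showing how a Java method call can be turned into the equivalent JavaScript statement, ready for a reverse Ajax layer to push to browsers.

```java
import java.util.ArrayList;
import java.util.List;

/** Collects generated JavaScript statements until they are pushed to browsers. */
class ScriptBuffer {
    private final List<String> statements = new ArrayList<>();

    void add(String statement) {
        statements.add(statement);
    }

    String toScript() {
        return String.join("\n", statements);
    }

    /** Encodes a Java string as a double-quoted JavaScript string literal. */
    static String jsString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}

/**
 * Hypothetical generated proxy mirroring a JavaScript widget API: calling
 * setValue() here only records the equivalent JavaScript call.
 */
class EditorProxy {
    private final ScriptBuffer buffer;
    private final String widgetId;

    EditorProxy(ScriptBuffer buffer, String widgetId) {
        this.buffer = buffer;
        this.widgetId = widgetId;
    }

    /** Mirrors dijit.byId(widgetId).setValue(value) in the browser. */
    void setValue(String value) {
        buffer.add("dijit.byId(" + ScriptBuffer.jsString(widgetId) + ").setValue("
                + ScriptBuffer.jsString(value) + ");");
    }
}
```

For example, new EditorProxy(buffer, "myedit").setValue("Hi") appends dijit.byId("myedit").setValue("Hi"); to the buffer.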

DWR also contains a small server-side version of the DOM that’s built into all web browsers:

ServerContext serverContext = ServerContextFactory.get();
Collection<ScriptSession> sessions = serverContext.getAllScriptSessions();
 
Window window = new Window(sessions);
window.alert("Hello, World");

DWR also contains a lower level API that allows you to execute arbitrary JavaScript code.

// 'sessions' is the collection of ScriptSessions from the previous example
ScriptProxy proxy = new ScriptProxy(sessions);
proxy.addFunctionCall("document.writeln", new Date());

What about other Ajax libraries? We’ve already got a server-side version of Scriptaculous.Effect. I’d like to see how easy it would be to create server-side versions of YUI, Ext and maybe jQuery too. I also wonder if it’s possible to create a remote version of the GWT API, so DWR could link to classes compiled against the GWT API but execute them at runtime for asynchronous delivery.

Message Passing Style

Message passing’s strengths are its simplicity and reduction of dependencies. In the Ajax world, the OpenAjax Hub is an excellent place to start thinking about an API provider.

During development of the OpenAjax Hub we developed a client written using TIBCO GI to which we published data using both DWR and Lightstreamer without any changes to the GI code. We just plugged a different data source into the hub.

Version 3 contains 2 new ways to use a message passing API style. The first is the DWR hub, which is accessible from any thread. Messages published to the hub will be sent to any browser that has subscribed to that topic using the OpenAjax Hub.

Person joe = new Person("Joe Public");
HubFactory.get().publish("people", joe);

Subscribing to broadcasts made by browsers is just as easy:

HubFactory.get().subscribe("people", new MessageListener() {
    public void onMessage(MessageEvent message) {
        Person sarah = message.getData(Person.class);
        log.info(sarah.getName());
    }
});

Also in version 3, DWR contains an integration with JMS, so you can use a JMS API to achieve just what has been achieved above. I will skip over the code here as people that care about JMS will probably be able to imagine it anyway, and those that don’t will just be horrified that it takes twice as many lines to do just the same thing. If you are really interested you can take a look at the demo example in the DWR repo.

Synchronized Data Style

Many client-side APIs have data stores that serve as the M in an MVC framework. Perhaps the simplest style of API would be one in which some server side data cache was automatically synchronized with a browser or browsers on some page. Given some setup the API might then be as simple as:

dataStore.add(person);

DWR does not support this style yet, but we hope to add it as part of the work to integrate more deeply with various widget toolkits.
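As a rough illustration of what such an API could look like on the server, here is a hypothetical SyncedStore (DWR has no such class). Callers just put records; the store also remembers which records changed since the last push, keyed by id, so several updates to the same record collapse into a single message when the changes are flushed to browsers. Deletes and conflict handling are left out of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical server-side store for the synchronized data style. */
class SyncedStore<V> {
    private final Map<String, V> data = new LinkedHashMap<>();
    private final Map<String, V> pending = new LinkedHashMap<>();

    synchronized void put(String id, V value) {
        data.put(id, value);
        pending.put(id, value); // a later put for the same id overwrites the earlier pending change
    }

    synchronized V get(String id) {
        return data.get(id);
    }

    /** Returns the coalesced changes to push to browsers, and clears them. */
    synchronized Map<String, V> flush() {
        Map<String, V> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```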

Which Style is Best?

The advantages of message passing style include:

  • Low coupling between data sources and data sinks. This makes systems more robust and more testable.
  • Good inter-language interoperability: SOAP has evolved from an RPC model to a more message-based model predominantly due to interoperability issues.

The advantages of the traditional style include:

  • Tighter coupling, so you rely on a formalized API rather than on documentation to know what options exist. Message passing could be called extreme dynamic typing, so if you like statically typed APIs, you’re more likely to like the traditional API style.
  • Richer API. Remote API models may have a very broad range of possibilities. While message models may be as broad, they typically are not.

The advantages of a synchronized data style include:

  • It requires very little thought or understanding: the API might even be one you know already, like the Java collections API.
  • It is possible for an advanced API to conserve bandwidth by grouping changes, and even pruning multiple updates to the same element into a single update.

On the DWR project, we don’t believe that there is a ‘right’ answer to the question ‘which of these alternatives is best’, so we are extending our support for all of them.


Buzzword Overload

by Joe Walker, February 5th, 2008

Comet is sometimes overshadowed by the large number of buzzwords and phrases that bubble around it. There are so many similar names for the same thing that you could be forgiven for thinking that Comet was invented by Tolkien.

So this is a list of the terms, what they mean, and how they are related to Comet.

Comet: Any technique that uses a long-lived HTTP connection to reduce the latency with which messages are passed from the server to the client.

‘Long’ is a relative term. It’s always amused me that ‘High-Temperature Superconductivity’ happens at a temperature several hundred degrees colder than ‘Cold Fusion’ does. ‘Long-lived’ is similar: we define it to mean anywhere from several seconds to several minutes. Even a Gastrotrich would feel fairly shortchanged at that lifetime. In essence, Comet means not polling the server regularly; instead, the server has an open line of communication through which it can push data to the client.

Forever Frame: A Comet technique which involves opening an invisible iframe pointing at a Comet server, which then sends data back. Jacob discussed this in more detail a few weeks back.
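A forever frame response might look roughly like the output of this sketch. The callback name and escaping rules are assumptions for illustration: each message becomes a script block that calls a function on the parent page, and '<' is escaped so that data containing a closing script tag cannot end the block early.

```java
/**
 * Sketch of the chunks a forever-frame server might write into a hidden iframe.
 * The response never ends; each message is a script block calling a function on
 * the parent page. The callback name is hypothetical.
 */
class ForeverFrame {

    /** Formats one message as a script chunk; the server writes it and flushes. */
    static String chunk(String callback, String message) {
        StringBuilder sb = new StringBuilder("<script>parent.")
                .append(callback)
                .append("(\"");
        for (char c : message.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                // Escape '<' so data containing a closing script tag cannot end the block early
                case '<':  sb.append("\\u003c"); break;
                default:   sb.append(c);
            }
        }
        return sb.append("\");</script>\n").toString();
    }
}
```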

Streaming: A state where the server is able to drip-feed data to the browser without needing to ask the browser to reconnect regularly. Full streaming is the Holy Grail of Comet, but network proxies, anti-virus systems and even web-server modules can prevent data from streaming by holding onto it until the connection has been closed.

Long Polling: Long polling mostly means Comet without full streaming: the server sends messages to the browser asynchronously, but in order to flush proxies it closes the connection soon after data is sent and asks the browser to reconnect. Sometimes, though, the phrase ‘long polling’ is used synonymously with Comet, and the phrase Long-Poll Abortion may also be used. DWR has an early closing mode which implies long polling. Intelligent Polling and Intelligent Long Polling are similar, and vary the polling frequency based on user need. Kris Zyp talked about them in his article “Easing into Comet.”
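The server half of long polling can be sketched with a blocking queue. This is a simplification, not DWR's implementation: LongPollChannel is a hypothetical name, and a real server would use something like Jetty's continuations rather than tying up a thread per waiting request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of one client's long-polling channel. Each poll() corresponds to one
 * HTTP request that is held open until data arrives or the timeout passes;
 * the browser reconnects as soon as it gets a response.
 */
class LongPollChannel {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Called from any server thread when there is news for this client. */
    void send(String message) {
        queue.add(message);
    }

    /**
     * Handles one poll request: waits up to timeoutMillis for the first message,
     * then drains anything else queued so one response carries a batch.
     * An empty list means "timed out, please reconnect".
     */
    List<String> poll(long timeoutMillis) {
        List<String> batch = new ArrayList<>();
        try {
            String first = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                queue.drainTo(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutting down: answer with what we have
        }
        return batch;
    }
}
```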

Polling: Literally, polling is all about head-count (hence polling station), but with Comet it means asking again and again. Polling is generally the opposite of Comet, i.e. short-lived HTTP connections. Polling is a higher-latency, lower-bandwidth alternative to Comet that has a habit of killing servers. On the plus side it’s easy to implement.

Reverse Ajax: DWR allows you to use Comet, polling or even piggybacking (which means no extra network connections at all, because server events are piggybacked onto normal requests). The umbrella phrase reverse Ajax is used to describe all 3, i.e. any way of getting data from the server to the client.
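Piggybacking needs no extra connection at all. In this hypothetical sketch (not DWR's code), server events wait in a queue until the browser makes its next ordinary request, and are then attached to that response. The trade-off is latency: events only arrive when the user happens to do something that triggers a request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-browser queue of server events waiting for a ride. */
class PiggybackQueue {
    private final List<String> pendingScripts = new ArrayList<>();

    synchronized void enqueue(String script) {
        pendingScripts.add(script);
    }

    /** Wraps a normal reply so that any pending events ride along with it. */
    synchronized Reply respond(String normalBody) {
        Reply reply = new Reply(normalBody, new ArrayList<>(pendingScripts));
        pendingScripts.clear();
        return reply;
    }

    /** The ordinary response body plus any piggybacked server events. */
    static final class Reply {
        final String body;
        final List<String> piggybacked;

        Reply(String body, List<String> piggybacked) {
            this.body = body;
            this.piggybacked = piggybacked;
        }
    }
}
```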

Bayeux: Bayeux is a network protocol for routing events between clients and servers in a publish subscribe model. For more see Dylan’s “Introduction to Bayeux.”

Other terms that have been used roughly synonymously with Comet:

  • Push
  • Server push: the term used by Netscape to describe Comet in 1995, referring specifically to XHR multipart (see below)
  • HTTP streaming (although see streaming above)
  • Pushlet (although this is the name of a Java project with similar goals)

In addition there are a number of buzzwords from other Comet techniques:

  • Server-sent events: An HTML 5 tag to allow Comet without any JavaScript or hackery. Championed by Opera.
  • XHR Multipart: A method of sending several sets of data along the same connection using MIME to separate the parts.
  • htmlfile: An ActiveX object in Internet Explorer. Of all the browsers, IE resists attempts to stream data the most, but the htmlfile object can be a solution.
  • Script tag long polling: A method of allowing cross-domain Comet by sending a series of script tags that can point at data from other domains.
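For XHR multipart, each message travels as one MIME part of a never-ending response. This sketch assumes the multipart/x-mixed-replace content type and a made-up boundary string; MultipartWriter is a hypothetical name, and a streaming server would write and flush each part as events occur rather than building the body up front.

```java
/** Sketch of the pieces of a multipart/x-mixed-replace streaming response. */
class MultipartWriter {
    private final String boundary;

    MultipartWriter(String boundary) {
        this.boundary = boundary;
    }

    /** Value for the HTTP Content-Type header of the streaming response. */
    String contentType() {
        return "multipart/x-mixed-replace; boundary=" + boundary;
    }

    /** One part: delimiter line, part headers, blank line, payload. */
    String part(String contentType, String payload) {
        return "--" + boundary + "\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "\r\n"
                + payload + "\r\n";
    }

    /** Closing delimiter, written when the server ends the stream. */
    String end() {
        return "--" + boundary + "--\r\n";
    }
}
```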

More DWR News

by Joe Walker, December 20th, 2007

After the recent announcement about DWR joining the Dojo Foundation, I’ve written a DWR State of the Union.

Also of note to Comet developers, I’ve created a tutorial, Using DWR with TIBCO General Interface, demonstrating some of the basic utilities DWR provides for integrating with Ajax and Comet applications built with TIBCO General Interface.


DWR and Grizzly

by Joe Walker, November 28th, 2007

Thanks to some quick hackery that Jean-Francois Arcand and I have been doing, DWR (and specifically Reverse Ajax) now supports Grizzly, the web-server from Sun’s GlassFish project. The code turned out to be really very simple because Grizzly supports Jetty’s ‘Ajax Continuations,’ so there wasn’t much change needed. The code should be out with the next release of DWR which we’ve been calling 2.1, but I’m thinking there is enough new stuff that it really should be called 3.0.


Why Comet is of Growing Importance

by Joe Walker, October 23rd, 2007

Comet is at the intersection of 2 trends. Put the 2 trends together and you have a compelling case that the time is now right for Comet.

Trend 1: The time spent on a single web page is increasing

It doesn’t take a lot to demonstrate this trend: Ajax-y web pages existed before the term was coined, but only in pockets. Today Ajax is a standard part of every “Web 2.0” startup’s buzzword compliance list.

As the use of Ajax increases, so does the time people spend on a page without navigating to a new one. It’s been noted many times that Ajax means the death of the ‘page-view’ model of measuring how successful a website is.

Through hours of painstaking research into the growth of Ajax on the web, it’s possible to create the following graph:

Trend 1

Trend 2: Pages are becoming more dynamic

There are a number of features that feed into this trend. The first is that websites are becoming smarter and smarter. We are now able to do things with websites that we didn’t think possible 10 years ago, like office suites and multi-player games.

The second, perhaps more important, trend is the growth of the social side of the web. The web used to be somewhere you looked up recipes; now it’s a place where cooks create social networks to comment on what they’re cooking, who they got the recipe from and how it turned out.

It’s easy to see that the half-life of the text on a page is going down. From Facebook news feeds to blog posts and ratings sites, the content of the web is changing faster than ever.

Again, through hours of painstaking research into the half-life of web pages it’s possible to derive the following graph:

Trend 2

The Growth of Comet

It’s easy to see why Comet is such a sure bet. On one hand we’re spending longer and longer on a single page, but on the other hand the chance that the page we’re looking at has changed on the server is going up. Clearly we need a simple, asynchronous, low-latency way to update pages, and Comet provides exactly that.

As final proof, through hours of painstaking cutting and pasting of the graphs above, we can create the ultimate executive proof of the growing need for Comet:

Trend 3

Clearly the crossover point for any web page will be different. For chat sites like Meebo, the crossover happened years ago. For office suites that allow parallel editing, the crossover has also already happened. Sites covering live events are next in line to adopt Comet, and it won’t be long before news feeds in social networks start updating dynamically too.

Comet isn’t going to evolve to be used on every web page on the Internet, but it is going to find its way into places that we’ve not thought of yet. 15 years of plain HTTP request/response has taught us to think in terms of static pages that don’t change by themselves too much. Comet lets us treat the web as a network of people interacting more naturally than they can behind a request/response barrier.


Copyright 2008 Comet Daily, LLC. All Rights Reserved