One topic often brought up with regard to Comet servers is performance. This is mainly due to the different style of communication used by Comet applications compared to more traditional applications and web servers. This article covers various aspects of Comet server performance and how to test them.
Lots of connections
The basic problem that Comet servers first face is having to handle a large number of open connections. If clients are subscribed to events from a Comet server, then they need an open connection to allow the server to send events when they occur, so in general a Comet server needs to be able to hold open at least as many connections as it has concurrent users.
This is a classic problem for client/server applications and one that has been discussed at length (http://www.kegel.com/c10k.html). Unfortunately, most web servers and application servers were not initially designed for such a high number of long-lived connections. A typical web server is designed to handle lots of short-lived connections, although with HTTP keep-alive in place a very busy website could still be managing quite a high number of connections. Many of these web servers were based on a process or thread per connection, which can quickly become a limiting factor. However, most mature Comet servers have overcome this problem: those written in Java often use Java NIO, while those written in lower-level languages have more direct access to some of the methods described in the c10k article linked above.
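The event-driven alternative to a thread per connection can be illustrated with a short sketch. It uses Python's standard `selectors` module (which wraps epoll, kqueue, or select depending on the platform) purely for illustration; it is not how Liberator or any particular Comet server is implemented, and the `drain_ready` and `demo` names are invented for this example. Fifty mostly idle connections are watched by one thread, and only the connections that actually have data are touched.

```python
import selectors
import socket


def drain_ready(sel, timeout=0.2):
    """Service every connection that has data, using a single thread.

    Returns a dict mapping connection id to the bytes read from it.
    """
    received = {}
    while True:
        events = sel.select(timeout)
        if not events:              # nothing became readable within the timeout
            return received
        for key, _mask in events:
            data = key.fileobj.recv(4096)
            if not data:            # peer closed: stop watching this connection
                sel.unregister(key.fileobj)
                continue
            received[key.data] = received.get(key.data, b"") + data


def demo(num_connections=50, active=(3, 42)):
    """Open many idle connections, make a few active, and service them."""
    sel = selectors.DefaultSelector()
    pairs = []
    for conn_id in range(num_connections):
        # socketpair() stands in for an accepted client connection.
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, data=conn_id)
        pairs.append((server_side, client_side))

    # Only a handful of the connections send anything.
    for conn_id in active:
        pairs[conn_id][1].sendall(b"subscribe /topic")

    received = drain_ready(sel)

    sel.close()
    for server_side, client_side in pairs:
        server_side.close()
        client_side.close()
    return received
```

Calling `demo()` returns data only for connections 3 and 42; the other 48 connections cost no thread and no polling work while they are idle, which is the property that matters when the connection count reaches the thousands.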
What do we want to achieve?
The above approaches are all very well, but what do we actually want from our Comet servers in terms of performance? Being able to maintain 10,000 open connections is not that much use if sending a message to them takes 10 seconds or if the server falls over when the message rate to those 10,000 users gets too high.
The two aspects of performance we should look at are message rates and message latency. You might want to keep an eye on server CPU utilization, as it can give you an idea of what is going on, but the message rates and latency are what you should actually be concerned with.
This is where testing comes into play, and with these kinds of numbers it is not that straightforward. The only way to find out how your Comet server behaves with 10,000 clients is to test it with 10,000 clients. This raises a big problem, because most people do not have access to 10,000 client machines that they can use in a controlled way. Even running multiple browsers or virtual machines is not really feasible when trying to hit 10,000 clients.
Short of having 10,000 test machines, the only way to test 10,000 clients is to implement a test client application that can act as multiple clients by making multiple connections. With a test client like this you can use a handful of machines to simulate your 10,000 clients.
Some Comet servers also have client APIs in non-browser technologies, such as Java or C++. This is a good start, although there may still be problems, as these APIs are probably designed to act as a single client. For example, a Java client API may be implemented using a thread or two to handle the socket communication. Using this API to make a multi-connection test client would run into the same problems as the early Java server applications did before Java NIO was used. Aside from this, you might find that a client API designed primarily for a single client is just not efficient enough to serve as a multi-connection test client, so you would still need quite a large number of test machines. This does not mean the API is bad, just that it was built for different requirements, such as usability and stability.
For benchmarking Liberator we implemented a simple command-line C application that can act as multiple clients. It does not have all the capabilities that you would need to write a full client application; it parses just the bare minimum of the protocol needed to record some statistics, and it has some configurable logic for making requests, and so on. This allowed us to run hundreds or even thousands of clients off a single machine.
It is easy to assume that, because you have 10,000 clients logged on and subscribed to data, they are receiving events just as well as the clients did when you tested with five visual clients connected. This will almost certainly not be the case. The only way to be sure is to measure it for all messages on all client connections, making no assumptions.
To take measurements you need some known messages. To create a data set useful for benchmarking you may need to create another test application. For Liberator this means a test DataSource application. We implemented this to allow controllable messages with a controllable update rate. The messages contain a sequence number and a millisecond timestamp. With these known messages the test client can calculate message latency.
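As a minimal sketch of the idea, the snippet below builds a known message carrying a sequence number and a millisecond timestamp, and computes the latency when it is received. The `seq|timestamp_ms` text layout and the function names are assumptions made for this example; the actual format used by the Liberator test DataSource is not described here.

```python
import time


def make_message(seq, now_ms=None):
    """Build a test message: sequence number plus send time in milliseconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return f"{seq}|{now_ms}"


def latency_ms(message, received_ms=None):
    """Return (sequence number, latency in ms) for a received test message."""
    if received_ms is None:
        received_ms = int(time.time() * 1000)
    seq_text, sent_text = message.split("|")
    return int(seq_text), received_ms - int(sent_text)
```

For example, a message built with `make_message(7, now_ms=1000)` and received at `received_ms=1012` yields sequence number 7 and a latency of 12 ms. The result is only meaningful if the sending and receiving clocks agree.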
To measure latency perfectly, the timestamp on the message needs to be created on the same machine that receives it. In other words, the test DataSource needs to be on the same machine as the test clients. This poses a problem, since you will probably be using multiple client machines. However, if the other client machines are synchronized using NTP, the latency measurements on them will be accurate enough for your needs. It is still useful to have one machine that records the latency perfectly.
So we now have a setup that can record message latency, and it can easily record the message rate being received too. These are the two aspects of performance we want to record. Often the message rate will be controlled by the setup of the test; however, it should still be recorded to make sure the expected behavior is achieved.
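One way a test client might keep these records per connection is sketched below. The `ConnectionStats` class is hypothetical and assumes each connection receives a single stream of messages with consecutive sequence numbers; a real test client would need to track sequences per subscribed object. Percentiles use the simple nearest-rank method.

```python
import math


class ConnectionStats:
    """Records latency, message count, and sequence gaps for one connection."""

    def __init__(self):
        self.latencies_ms = []
        self.missing = 0        # messages skipped according to sequence numbers
        self.last_seq = None

    def record(self, seq, latency_ms):
        """Record one received message."""
        if self.last_seq is not None and seq > self.last_seq + 1:
            self.missing += seq - self.last_seq - 1
        self.last_seq = seq
        self.latencies_ms.append(latency_ms)

    def rate_per_sec(self, elapsed_sec):
        """Average received message rate over the measurement window."""
        return len(self.latencies_ms) / elapsed_sec

    def percentile(self, pct):
        """Nearest-rank percentile of the recorded latencies."""
        if not self.latencies_ms:
            raise ValueError("no samples recorded")
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]
```

Comparing `rate_per_sec` against the rate the test was configured to produce, and checking that `missing` stays at zero, confirms the expected behavior instead of assuming it.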
I have already stated that message rates and message latency are the two important aspects of performance. The back-end message rate, the number of clients, and the number of subscriptions those clients make together determine the overall throughput of the system.
A Comet application may have events being produced at a low message rate, but with all the clients subscribed to all those events this would result in a high rate of messages going out to clients. Another Comet application might have a very high message rate being produced, but clients are subscribed to a small subset of the total messages, which might give a different slant on performance. This is a simple example that demonstrates the need for a number of tests to be performed so as to gain a good understanding of how a Comet server behaves under different circumstances. There are many variables involved and they can all affect performance in different ways.
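The two situations described above can be put into rough numbers. The figures below are invented for illustration and the `outbound_rate` helper is hypothetical; it assumes every subscribed object updates at the same fixed rate.

```python
def outbound_rate(clients, subscriptions_per_client, updates_per_object_per_sec):
    """Total messages per second the server must send to all clients."""
    return clients * subscriptions_per_client * updates_per_object_per_sec


# Low back-end rate, everyone subscribed to everything:
# 10 objects at 1 update/sec (10 msgs/sec in), 10,000 clients subscribed to all 10.
fan_out_heavy = outbound_rate(10_000, 10, 1)

# High back-end rate, small subsets:
# 10,000 objects at 1 update/sec (10,000 msgs/sec in), 1,000 clients with 10 each.
feed_heavy = outbound_rate(1_000, 10, 1)
```

Under these assumptions the first application takes in 10 messages per second and sends out 100,000, while the second takes in 10,000 and sends out 10,000. The two stress very different parts of the server, which is why both kinds of scenario need testing.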
Often these variables present a trade-off between latency and throughput. Some Comet servers let you configure certain aspects of their behavior in order to tune this trade-off. For example, batching messages together at the transport level can help greatly with throughput, as it makes better use of the network and often the CPU, at the expense of added latency. However, in real-world conditions, a small amount of batching can actually improve latency. If minimum latency is not absolutely critical, then batching is very useful. In a full streaming transport this is often a configuration option, and in a long-polling transport the batching is inherent in the transport.
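A minimal sketch of transport-level batching follows. It is not taken from any particular Comet server; the `Batcher` class, its `send` callback, and the two limits are assumptions for illustration. Messages are held until either a maximum count is reached or the oldest pending message has waited a maximum time, and are then written in one go.

```python
class Batcher:
    """Groups outgoing messages so several are written in a single send."""

    def __init__(self, send, max_delay_ms, max_messages):
        self.send = send                  # callable that writes bytes to the connection
        self.max_delay_ms = max_delay_ms  # upper bound on latency added by batching
        self.max_messages = max_messages
        self.pending = []
        self.first_ms = None              # arrival time of the oldest pending message

    def add(self, message, now_ms):
        """Queue a message; flush immediately if the batch is full."""
        if not self.pending:
            self.first_ms = now_ms
        self.pending.append(message)
        if len(self.pending) >= self.max_messages:
            self.flush()

    def tick(self, now_ms):
        """Call periodically; flushes once the oldest message has waited long enough."""
        if self.pending and now_ms - self.first_ms >= self.max_delay_ms:
            self.flush()

    def flush(self):
        """Write all pending messages as one payload."""
        if self.pending:
            self.send(b"".join(self.pending))
            self.pending = []
```

Here `max_delay_ms` is the knob that trades latency for throughput: a larger value means fewer, larger writes, while a value of zero effectively disables batching whenever `tick` is called often enough.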
In recent years multi-core and multi-CPU machines have become a lot more common, and server applications must take advantage of this. Most Comet servers are multithreaded, but when it comes to making the best use of the CPU cores available on the machine, it is not always as simple as making a server multithreaded. You don’t want a single thread using all of one CPU core while the other cores are doing very little.
In symmetric Comet applications, where there are only clients and the server threads all perform the same tasks, the workload can usually be spread quite evenly across the CPU cores.
In asymmetric Comet applications, where clients are subscribing to data from a server-side data feed, threading can become more important. This setup means there is more of a flow of data through the server, with different tasks being handled in different places. The design of the Comet server can be very important here to avoid scenarios where some CPU cores are being fully utilized, causing a bottleneck, while other cores are not doing much processing.
This can be a key point when testing different scenarios. A fast data feed with clients subscribed to a small subset of data may utilize your CPU cores in a very different way than a slower data feed with clients subscribed to more data.
In a previous musing I talked about bandwidth usage in Comet applications. This is an often overlooked factor. People talk about high numbers of clients and, for some applications, high message rates without realizing the impact on bandwidth.
We have to be clear on this though. Comet applications may use more bandwidth than non-Comet applications, but it is not a level playing field. If you compare like with like, that is, applications which must keep the user updated with new information, then a Comet application will likely use significantly less bandwidth than a website that relies on page refreshes.
The kind of throughput achieved in benchmark tests requires a lot of bandwidth, and we usually benchmark on dedicated gigabit networks. Testing at this level over the Internet is not very feasible and would probably introduce some interesting artifacts not seen when testing lower throughput over the Internet.
Some actual results
The following results are a small subset of the overall tests we carry out, and hopefully correspond to typical setups of our customers. In all the tests shown here the message size is about 60 bytes; however this size is protocol dependent and may be larger or smaller using another product.
In each test the number of clients is increased. Each client subscribes to a number of objects (or channels) chosen randomly out of a maximum number and with each object having a known fixed update rate.
These tests used a fairly powerful server, a 4 x dual-core AMD Opteron 2.8GHz running Red Hat Enterprise Linux 4.
Low update rates
This test shows a very large number of users receiving a fairly low update rate with a very low latency and low CPU utilization.
Medium update rates
In this test we increase the number of subscriptions each client makes, which increases the update rate to 10 messages per second to each client. The results show that a very high number of users is still achievable. We can see that after about 12,000 clients the latency starts to increase more.
High update rates
This test increases the update rate to 50 messages per second to each client. A lot of Comet applications would never need anything like this level of updates, but many financial applications do, and often need even higher update rates.
We can still see decent performance up to around 10,000 clients. Understandably, latency is higher than in previous tests. It is worth pointing out that at the 10,000 client mark Liberator is sending out 500,000 messages per second in total, which is 29MBytes/sec.
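The arithmetic behind those totals can be checked with a few lines. The `payload_bandwidth` helper is hypothetical, and it assumes exactly 60 bytes per message, whereas the tests state only "about 60 bytes"; it also counts message payload only and ignores TCP/IP overhead.

```python
def payload_bandwidth(clients, msgs_per_client_per_sec, bytes_per_msg):
    """Rough outbound message rate and payload bandwidth for a test scenario."""
    total_msgs = clients * msgs_per_client_per_sec
    bytes_per_sec = total_msgs * bytes_per_msg
    return {
        "messages_per_sec": total_msgs,
        "bytes_per_sec": bytes_per_sec,
        "mebibytes_per_sec": round(bytes_per_sec / 2**20, 1),
        # Fraction of a 1 Gbit/s link consumed by payload alone.
        "share_of_gigabit": round(bytes_per_sec * 8 / 1e9, 2),
    }


high_rate_test = payload_bandwidth(10_000, 50, 60)
```

With these inputs the result is 500,000 messages per second and 30,000,000 bytes per second, or about 28.6 MiB per second, which is consistent with the figure quoted above given that the message size is approximate. That is roughly a quarter of a gigabit link before any protocol overhead.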
Tests were also performed going up to 500 messages/sec to each client, with slightly different subscription scenarios. More details on these tests and the tests shown above are available on The Liberator Free Edition website in HTML and PDF formats.
Benchmarking Comet servers is not easy. There are lots of variables which impact the results, often in unexpected ways, so the only way to be sure is to devise and run the appropriate tests. Making assumptions while testing is not good; if you need to know something, it should be measured and recorded.
It is possible to achieve high numbers of users with high update rates and relatively low latency. However, a project needs to understand the implications, such as bandwidth, which are inherent in the requirements rather than the technology. A serious project should also perform its own benchmark tests, with test scenarios similar to expected usage, rather than just looking at some headline figures from the Comet server vendor.