After some recent optimizations, the Dojo Cometd implementation of the Bayeux protocol running on the Jetty web server can now handle up to 20,000 simultaneous users per server while maintaining sub-second latency.
Previous benchmarking efforts had been done on low end hardware with early versions of Cometd, yet had showed promising results, with 10,000 simultaneous users being possible but with latency blowing out to 7 seconds or more.
This latest testing has taken place on more realistic hardware with the more mature implementation of Cometd that is provided by Dojo and Jetty.
The test machines
The test machines used were mid-sized Amazon EC2 virtual servers: 7.5 GB of memory, 2×2 EC2 Compute Units, 64-bit platform running Ubuntu 7.10 and Sun JVM 1.5.0_13. A single virtual machine was used as the Cometd server and between 1 and 3 virtual machines were used to generate the load of 20,000 clients.
The test server software
The Cometd server used was the Cometd demo bundled with Jetty 6.1.7, with some modifications:
- The war was expanded into a
webapps/rootdirectory and the war file removed
- All other contexts and webapps deleted
Initial benchmarking past 10,000 showed that the limiting factor was lock starvation on the BoundedThreadPool. A new version of the thread pool (QueuedThreadPool) was written that used 4 locks instead of 1 and was configured with a maximum of only 50 threads.
The number of acceptors on the SelectChannelConnector were increased to 4
The lowResourcesConnections was increased to 30,000
- The timeout was increased from 120,000 to 240,000 ms
- All filters were removed for the test
The application tested was the simple chat room (the “hello world” of Comet) using the BayeuxLoadGenerator class shipped with Jetty.
The test client(s)
The load generator uses the Jetty asynchronous HTTP client so that 20,000 Bayeux users can be simulated with only a few threads. While it is somewhat remarkable that the Cometd/Jetty server can scale to 20,000 users, it is truly remarkable that the test client can also handle such loads.
But no matter how good the test client is, 20,000 simulated users on 1 JVM on 1 machine just can’t generate/handle the same load as 20,000 users each with their own computer and network infrastructure. To understand how this affected the results, this test measured the latency of all 20,000 users simulated from 1 machine and compared that with the latency experience on a machine simulating only 1,000 users with the remaining 19,000 users being simulated by 2 other machines.
The test load
The test load generated was the most difficult for Cometd: a constant stream of messages without any introduced jitter that would allow the server to catch up on backlogs of messages. The load was a 50 byte payload sent in a burst to 10 randomly selected chat rooms at an interval fixed for each test. The interval was selected so that a steady state was obtained with the server CPU at approximately 10% and 50% idle.
Tests were performed for 1,000, 5,000, 10,000 and 20,000 simultaneous users for 100 and 1,000 chat rooms. This gave a range of 1 to 200 users per room.
The raw benchmark results are available and are plotted in a bubble diagram above. You can download the results in a Gnumeric spreadsheet (with the diagram) or CSV format (without the diagram). The bubble size relates to the message throughput (a maximum 3,800 messages per second were delivered) and the bubble position shows the round trip latency.
The key result is that sub-second latency is achievable even for 20,000 users. There is an expected latency vs. throughput tradeoff: for example, for 5,000 users, 100ms latency is achievable up to 2,000 messages per second, but increases to over 250ms for rates over 3,000 messages per second.
Above 10,000 users, the test client became brittle, and while it was able to maintain good average latency, the maximum latency could blow out to beyond some of the timeouts. This appears to indicate that lock starvation is still occurring in the test client. The tests were repeated with the load generated from 3 clients instead of 1 and the latency measurements were taken from a client limited to 1,000 simulated users. This approach gave significantly better latency and throughput for 20,000 users and of course real traffic will not suffer from this type of limitation.
The Cometd implementation from Dojo and Jetty has demonstrated scalability for web-2.0 applications to levels that meet or even exceed node utilization levels expected for web-1.0 applications.