
20,000 Reasons Why Comet Scales

by Greg Wilkins, January 7th, 2008

After some recent optimizations, the Dojo Cometd implementation of the Bayeux protocol running on the Jetty web server can now handle up to 20,000 simultaneous users per server while maintaining sub-second latency.

Bayeux Benchmark

Previous benchmarking efforts were done on low-end hardware with early versions of Cometd, yet showed promising results: 10,000 simultaneous users were possible, but latency blew out to 7 seconds or more.

This latest testing has taken place on more realistic hardware with the more mature implementation of Cometd that is provided by Dojo and Jetty.

The test machines

The test machines used were mid-sized Amazon EC2 virtual servers: 7.5 GB of memory, 2×2 EC2 Compute Units, 64-bit platform running Ubuntu 7.10 and Sun JVM 1.5.0_13. A single virtual machine was used as the Cometd server and between 1 and 3 virtual machines were used to generate the load of 20,000 clients.

The test server software

The Cometd server used was the Cometd demo bundled with Jetty 6.1.7, with some modifications:

webapps/comet.war
The war was expanded into a webapps/root directory and the war file was removed.
contexts/*.xml, webapps/test*
All other contexts and webapps were deleted.
etc/jetty.xml
Initial benchmarking past 10,000 users showed that the limiting factor was lock starvation on the BoundedThreadPool. A new thread pool (QueuedThreadPool) was written that uses 4 locks instead of 1 and was configured with a maximum of only 50 threads.
The number of acceptors on the SelectChannelConnector was increased to 4.
The lowResourcesConnections setting was increased to 30,000. (A sketch of these jetty.xml changes follows this list.)
webapps/root/WEB-INF/web.xml
The timeout was increased from 120,000 to 240,000 ms.
webapps/root/WEB-INF/filters.json
All filters were removed for the test.
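For anyone trying to reproduce the server setup, the sketch below shows a rough programmatic equivalent of those jetty.xml changes, assuming the Jetty 6.1.x API (org.mortbay packages). The benchmark itself used the XML configuration files described above rather than this code, and the port number and omitted webapp deployment are placeholders.

    import org.mortbay.jetty.Server;
    import org.mortbay.jetty.nio.SelectChannelConnector;
    import org.mortbay.thread.QueuedThreadPool;

    public class BenchmarkServerSketch
    {
        public static void main(String[] args) throws Exception
        {
            Server server = new Server();

            // Swap the default BoundedThreadPool for the new QueuedThreadPool,
            // capped at 50 threads as in the benchmark configuration.
            QueuedThreadPool threadPool = new QueuedThreadPool();
            threadPool.setMaxThreads(50);
            server.setThreadPool(threadPool);

            // NIO connector with 4 accept threads and a raised low-resources limit.
            SelectChannelConnector connector = new SelectChannelConnector();
            connector.setPort(8080); // placeholder port
            connector.setAcceptors(4);
            connector.setLowResourcesConnections(30000);
            server.addConnector(connector);

            // Deployment of the Cometd webapp (webapps/root) is omitted from this sketch.
            server.start();
            server.join();
        }
    }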

The application tested was the simple chat room (the “hello world” of Comet) using the BayeuxLoadGenerator class shipped with Jetty.

The test client(s)

The load generator uses the Jetty asynchronous HTTP client so that 20,000 Bayeux users can be simulated with only a few threads. While it is somewhat remarkable that the Cometd/Jetty server can scale to 20,000 users, it is truly remarkable that the test client can also handle such loads.

But no matter how good the test client is, 20,000 simulated users on 1 JVM on 1 machine just can’t generate/handle the same load as 20,000 users each with their own computer and network infrastructure. To understand how this affected the results, this test measured the latency of all 20,000 users simulated from 1 machine, and compared that with the latency experienced on a machine simulating only 1,000 users, with the remaining 19,000 users simulated by 2 other machines.
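To give a feel for how one JVM can keep tens of thousands of long-polls in flight with only a handful of threads, here is a minimal sketch of an asynchronous long-poll loop written against the Jetty 6 HTTP client (org.mortbay.jetty.client). It is not the shipped BayeuxLoadGenerator: the endpoint URL, client IDs and message body are illustrative placeholders, and a real Bayeux client must first perform a /meta/handshake to obtain its clientId from the server.

    import java.io.IOException;

    import org.mortbay.io.ByteArrayBuffer;
    import org.mortbay.jetty.client.ContentExchange;
    import org.mortbay.jetty.client.HttpClient;

    public class LongPollSketch
    {
        public static void main(String[] args) throws Exception
        {
            HttpClient client = new HttpClient();
            client.setConnectorType(HttpClient.CONNECTOR_SELECT_CHANNEL); // NIO: a few threads, many connections
            client.setMaxConnectionsPerAddress(20000); // all simulated users talk to the one test server
            client.start();

            // Start the simulated users; each re-issues its long-poll when a response arrives.
            for (int i = 0; i < 20000; i++)
                poll(client, "client-" + i);
        }

        static void poll(final HttpClient client, final String clientId) throws IOException
        {
            ContentExchange exchange = new ContentExchange()
            {
                protected void onResponseComplete() throws IOException
                {
                    super.onResponseComplete();
                    // Any queued messages for this user arrive in the response;
                    // reconnect immediately so the server can park the next long-poll.
                    poll(client, clientId);
                }
            };
            exchange.setMethod("POST");
            exchange.setURL("http://localhost:8080/cometd"); // illustrative endpoint
            exchange.setRequestContentType("text/json;charset=utf-8");
            exchange.setRequestContent(new ByteArrayBuffer(
                "[{\"channel\":\"/meta/connect\",\"clientId\":\"" + clientId
                + "\",\"connectionType\":\"long-polling\"}]"));
            client.send(exchange);
        }
    }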

The test load

The test load generated was the most difficult case for Cometd: a constant stream of messages with no introduced jitter that would let the server catch up on message backlogs. The load was a 50-byte payload sent in a burst to 10 randomly selected chat rooms, at an interval fixed for each test. The interval was selected so that a steady state was obtained with the server CPU at approximately 10% and 50% idle.
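The burst pattern itself is easy to picture in code. The sketch below only mirrors the shape described above (a fixed 50-byte payload, 10 randomly chosen rooms per burst, a fixed pause between bursts); the publish() call and the channel naming are hypothetical stand-ins for whatever the real load generator invokes.

    import java.util.Random;

    public class BurstLoadSketch
    {
        static final Random random = new Random();

        // Publish a burst of 10 messages to randomly chosen rooms, then pause for
        // the configured interval; the pause is between bursts, not per message.
        static void runBursts(int totalRooms, long intervalMs) throws InterruptedException
        {
            byte[] payload = new byte[50]; // constant 50-byte message body

            while (true)
            {
                for (int i = 0; i < 10; i++)
                {
                    int room = random.nextInt(totalRooms);
                    publish("/chat/room" + room, payload); // hypothetical channel naming
                }
                Thread.sleep(intervalMs);
            }
        }

        static void publish(String channel, byte[] payload)
        {
            // hypothetical stand-in: the real generator sends a Bayeux publish here
        }
    }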

Tests were performed for 1,000, 5,000, 10,000 and 20,000 simultaneous users for 100 and 1,000 chat rooms. This gave a range of 1 to 200 users per room.
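To make the fan-out concrete: with 20,000 users spread over 100 rooms there are 200 users per room, so each burst of 10 messages is delivered to 10 × 200 = 2,000 recipients; with 1,000 rooms the same burst reaches only 10 × 20 = 200 recipients.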

Results

The raw benchmark results are available and are plotted in a bubble diagram above. You can download the results in a Gnumeric spreadsheet (with the diagram) or CSV format (without the diagram). The bubble size relates to the message throughput (a maximum 3,800 messages per second were delivered) and the bubble position shows the round trip latency.

The key result is that sub-second latency is achievable even for 20,000 users. There is an expected latency vs. throughput tradeoff: for example, for 5,000 users, 100ms latency is achievable up to 2,000 messages per second, but increases to over 250ms for rates over 3,000 messages per second.

Above 10,000 users, the test client became brittle, and while it was able to maintain good average latency, the maximum latency could blow out to beyond some of the timeouts. This appears to indicate that lock starvation is still occurring in the test client. The tests were repeated with the load generated from 3 clients instead of 1 and the latency measurements were taken from a client limited to 1,000 simulated users. This approach gave significantly better latency and throughput for 20,000 users and of course real traffic will not suffer from this type of limitation.

Summary

The Cometd implementation from Dojo and Jetty has demonstrated that web 2.0 applications can scale to node utilization levels that meet or even exceed those expected of web 1.0 applications.

26 Responses to “20,000 Reasons Why Comet Scales”

  1. ben Says:

    They tried this in ROR and had to restart 400x per day. It achieved a high throughput though: 5 requests per second. ROR 4 lifE!

  2. Ajaxian » 20,000 Reasons that Comet Scales Says:

    [...] Wilkins is marching along with better and more performant Comet support as shown in his piece 20,000 Reasons Why Comet Scales: After some recent optimizations, the Dojo Cometd implementation of the Bayeux protocol running on [...]

  3. David Davis Says:

    Woot!

    Great work greg!

  4. Skowronek Says:

    Hi!
    That’s really nice ;)
    Can I download the code that you used somewhere?

  5. Barak Says:

    These results are impressive. Still, you need to run your test clients from outside Amazon. Keeping the clients and server inside Amazon means you are probably testing inside one data center. Real world clients connect over the internet, with all the latency/bandwidth issues that come with that.
    Consider running the exact same test, but move the clients to another service like Flexiscale. Or, make sure that the clients are running in a different data center.

    Either way, cool stuff.

  6. GregWilkins Says:

    Barak,

    Yes, real latencies will need to add the internet traversal time to the numbers from this test - so mentally add +100ms when looking at the graph. However, note that with the asynchronous approach, a slower network does not result in more resources being needed. If Jetty were synchronous, then a slow network would need more threads to flush the messages. It might mean that we need a few more buffers in the buffer pool.

    I blogged about this recently at http://blogs.webtide.com/gregw/2008/01/03/1199356319554.html

    On the other hand, in a real world situation, you would have a cluster of 20,000 computers to generate the load rather than just 1 - so there may be a -100ms from that effect.

  7. GregWilkins Says:

    Skowronek,
    the test server and test client are both included in the Cometd source. If you check out the latest code from the Jetty svn you will get the exact source used. If you go for the 6.1.7 release, you will get code that is close, but lacks the thread pool and a few other minor tweaks.

  8. Skowronek Says:

    Thanks Greg!

    I downloaded the 6.1.7 release, but now I’ll get it from svn.

  9. Sho Fukamachi Online » Blog Archive » Java looks inevitable Says:

    [...] be actually faster than MRI. Then how glassfish serves, and scales, better than mongrel. How jetty scales to 20,000 simultaneous connections - a figure at least 2 orders of magnitude above anything you’ll be able to do in ruby. How [...]

  10. Martin Tyler Says:

    Interesting tests Greg. Could you clarify what the message rates are - I think, if I have understood you right, you were tweaking the message rate to hit a target CPU figure, but it would be interesting to know what those rates were. Or if I have missed it, could you highlight it please :)

  11. GregWilkins Says:

    Martin,
    the size of the bubble in the diagram represents the message rate, which started at 3800/s and was reduced as the load continued to grow. The exact numbers are available in the spreadsheet that can be downloaded from the article (you need to do a save-as, because the mime type is broken on this server).

  12. Martin Tyler Says:

    Thanks, so just so I understand the test method..

    Each user logs into a single chat room? So (from the csv file) 100 rooms, 1000 clients, ~4000 throughput. Does that mean 10 clients per room and 400 generated messages fanning out to 4000 messages, or 4000 fanning out to 40000?

    I have done a lot of this kind of testing, there are always lots of variables, so coming up with useful test cases without ending up with too many graphs can be tricky :)

  13. GregWilkins Says:

    Martin,

    the tests were run with either 100 or 1000 total rooms. So when we had 20k users with 100 rooms, that was 200 users per room. Thus every message sent in the 10-message load generation burst would be delivered to 200 users, and the burst would generate 2000 messages.

    Note that the delay is a pause between bursts, not the time of the burst. So you need to add the time it takes to actually send each burst.

    Yes, there are lots of variables and I certainly have not exhausted the test space; I’m sure there are different data sizes, burst sizes and room counts that will give better and/or worse numbers. The main thing I ensured was that all of the results I recorded were steady state. I would monitor the latency as the test ran and make sure it was not increasing… you would see the odd increase for a GC, but I only recorded results where the average latency was able to recover after each GC (i.e. each GC did not put you further and further behind).

    cheers

  14. Un expresso sans sucre » COMET : Push HTTP Says:

    [...] programming model (called COMET) was load tested and the results are very good: less than half a second of latency under a very [...]

  15. ChrisB Says:

    Greg…

    I had a few quick questions about the test environment and variables… I was wondering if any kernel-level settings on the OS (i.e. TCP/IP, shared memory, etc.) were adjusted to support that many connections on the client(s)/server. I ask because I’ve been playing around with the load generator and keep running into problems when I try to create too many connections.

    clients = 1250
    clients = 1260
    Exception in thread "main" java.lang.NullPointerException
    at org.mortbay.cometd.client.BayeuxClient$Handshake.<init>(BayeuxClient.java:546)
    at org.mortbay.cometd.client.BayeuxClient.start(BayeuxClient.java:107)

    Reviewed the code and noticed the checkConnections() method which should guarantee that we have a connection. Any thoughts?

    Thanks…

  16. Comet Daily » Blog Archive » 10 Things IT Needs to Know About Ajax? Says:

    [...] 20,000 Reasons that Comet Scales [...]

  17. Comet Daily » Blog Archive » Comet Message Rates Says:

    [...] Chat rooms have been cited as the “Hello World” application of Ajax Comet because chat is something that everybody can understand and represents a good exemplar of the technology. With my own work, I have frequently used the cometD chat demo as the basis of benchmarking and scalability tests. [...]

  18. Pathfinder Development » Comet 2008: The State of Play in Reverse Ajax Says:

    [...] 20,000 Reasons Why Comet Scales - Greg Wilkins achieved sub-second latency using Dojo Cometd/Bayeux and Jetty. Yes, it’s a benchmark, and benchmarks can’t be swallowed whole, but it’s still quite impressive to see just two load balanced servers manage this kind of load. [...]

  19. Nancy Says:

    Hi Greg,

    We’ve been doing performance/stress testing on Jetty’s cometd 6.1.14 for our project. We observed that when we test the deliver feature of cometd - e.g. with 5000 HTTP Bayeux clients connecting/subscribing to the Jetty cometd server and the server delivering to these clients at 3000 notifications/second (here, for performance testing, we deliver the same message to all the clients on the same channel) - the JVM on the server side runs out of Java heap space after running for 20-24 hours.

    The parameters we had (the XML element names did not survive the comment form; see my follow-up comment below for the named values):

    jetty.xml: 10, 500, 20, 2, 2000000000, 2, false, 8443, 50000000000, 50000000000

    We increased the maxIdleTime, lowResourcesConnections and lowResourcesMaxIdleTime to support that many connections.
    We wanted to know if this would be the right approach.

    web.xml: timeout: 12000000, interval: 0, maxInterval: 100000000

    We faced the problem of clients disconnecting when we had default/lower numbers for timeout and maxInterval. So we played around with it a little bit, and increasing those params resolved the issue.

    After this analysis we are not able to figure out why the memory goes out of bounds - or is it something related to the huge numbers for the params in jetty.xml and web.xml?

  20. Nancy Says:

    Sorry - the xml tags didn’t appear in the last post.
    We changed the following params:

    In jetty.xml:
    maxThreads: 500
    maxIdleTime: 2000000000
    lowResourcesConnections: 50000000000
    lowResourcesMaxIdleTime: 50000000000

    In web.xml:
    timeout: 12000000
    maxInterval: 100000000

  21. Greg Wilkins Says:

    Firstly, can you try 6.1.17 and see if you get the same problem?

    However, we have had a few reports of similar problems that appear to be caused by certain combinations of JVM and OS. We have the same cometd stress test that runs for days on Linux and Mac OS, but on some Windows installations there are issues with direct buffer allocation resulting in OOMs after many hours. So if you are suffering from this same problem, it is almost certainly caused by the associated JVM bugs. Please email the details of your testing to cometd-dev@googlegroups.com for further analysis.

  22. Bartek Doszczak Says:

    Hello,

    In my team we have been using the sample app and BayeuxLoadGenerator, and it has been very useful so far; however we hit a problem when we tried to use HTTPS. When we were using the CometD that comes with Jetty 6.22 we were getting an IllegalStateException thrown by SSLEngine during handshake/rehandshake. When we tried to use CometD with Jetty 7, HttpExchange.setStatus is throwing an ISE as well, saying that 7 >= 9… As a result, in both cases, when trying to log on 100 users, usually only around 90 log on successfully.

    Any idea how to fix the problem, or what is actually causing it?

    Cheers,
    Bartek

  23. Sven Driemecker Says:

    Hi Greg,

    did you use any special Garbage Collector settings for your load tests? I’m doing a lot of load testing on my own and I noticed that GC tuning can really increase the performance of my app - up to 20%. My settings on a 4 CPU, 8GB machine with an Ubuntu 9.10 64-bit server OS and the latest stable JDK 6 version were as follows: -Xmx7168m -Xms7168m -XX:+UseParallelOldGC -XX:+UseParallelGC -XX:NewRatio=2 -XX:+AggressiveHeap -XX:ParallelGCThreads=4

    Would like to know what settings you preferred in your test setup.

    Cheers,
    Sven

  24. Comet Servers for a Single-Dealer Platform (SDP) « SerenDiPity Says:

    [...] Average (Blog) [...]

  25. Cometd-2 Throughput vs Latency | Webtide Blogs Says:

    [...] some of our own lies, damned lies and benchmarks. It has been over 2 years since we published the 20,000 reasons that cometd scales and in that time we have completely reworked both the client side and server side of cometd, plus [...]

  26. CometD 2.4.0 WebSocket Benchmarks | Webtide Blogs Says:

    [...] than one year has passed since the last CometD 2 benchmarks, and more than three years since the CometD 1 benchmark. During this year we have done a lot of work on CometD, both by adding features and by continuously [...]
