Guaranteed Messaging in Comet

by Martin Tyler, May 9th, 2008

There is a lot of confusion about guaranteed messaging. Many products claim some level of guaranteed or reliable messaging, but what does it really mean? What are you guaranteeing, and to whom?

As a software vendor, all you need to do is put the words ‘Guaranteed Messaging’ in your literature and, for a lot of customers, the box-ticking exercise will be satisfied. It’s not until the customer’s technical team starts asking awkward questions that you start to think about what ‘Guaranteed Messaging’ really means and whether the related features you have built into your product really fall under that banner.

In some sectors and products the words Guaranteed, Reliable and Certified have specific meaning, but I am just going to look at the basic concepts surrounding these.

There are a few things about sending messages that could be guaranteed:

  • In general you want to be sure a message gets to the intended recipients.
  • In some cases you also want to be sure that successive messages are received in the same order as they were sent.

Once you start thinking about solutions for this, a third requirement might spring up:

  • You want to be sure a message is received or, more importantly, processed only once.

Point to point

When messages are sent between a single producer and a single consumer this is often called point to point messaging. This is usually the area where guaranteed messaging is required as the message relates directly to a particular consumer rather than being a general message for many interested consumers.

Most solutions to this are based around sending acknowledgements. The basic idea here is that the producer sends a message and the consumer sends an acknowledgement when it receives the message. If the producer does not receive the acknowledgement within a specified timeout period then it will resend the message.
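
As a rough illustration of the resend loop, here is a minimal Python sketch. The names send_with_retry, transmit and make_lossy_transport are hypothetical and do not come from any particular product, and the timeout is modelled simply as the transport returning None rather than by a real clock.

```python
from typing import Callable, Optional


def send_with_retry(
    message: str,
    transmit: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> int:
    """Send until an acknowledgement arrives; return the number of attempts used.

    `transmit` is a hypothetical transport hook. It returns the acknowledgement,
    or None to model "no acknowledgement within the timeout period".
    """
    for attempt in range(1, max_attempts + 1):
        ack = transmit(message)
        if ack is not None:
            return attempt
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


def make_lossy_transport(drop_first: int) -> Callable[[str], Optional[str]]:
    """Build a fake transport that loses the first `drop_first` transmissions."""
    state = {"calls": 0}

    def transmit(message: str) -> Optional[str]:
        state["calls"] += 1
        if state["calls"] <= drop_first:
            return None  # the message, or its acknowledgement, was lost
        return f"ack:{message}"

    return transmit


# Two transmissions are lost, so the third attempt is the one acknowledged.
attempts_used = send_with_retry("order-1", make_lossy_transport(drop_first=2))
```

Note that the producer cannot tell which direction the loss happened in; it only knows that no acknowledgement came back.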

This is where we meet our first problem. If the producer does not receive the acknowledgement, is it because the message did not get to the consumer or because the acknowledgement did not get back to the producer? If the producer simply resends the message we run the risk of the consumer processing the message multiple times. To prevent this from occurring, the consumer needs to remember which messages it has processed, and this requires a unique message id or sequence number to be present on all messages. When the consumer receives a message it has previously processed, it can resend the acknowledgement without processing the message again. It is fairly common for a resent message to be flagged as such, so that the consumer can optimise checking whether it has previously seen a message.
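
The consumer side of this can be sketched as follows. This is only an illustration under simple assumptions: DedupingConsumer and its receive() method are made-up names, message ids are plain integers, and the set of processed ids is held in memory and never trimmed.

```python
from typing import Callable, Set


class DedupingConsumer:
    """Processes each message id once, but acknowledges every delivery."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._processed: Set[int] = set()

    def receive(self, message_id: int, payload: str) -> str:
        if message_id not in self._processed:
            self._handler(payload)
            # Only remember the id once the handler has returned normally.
            self._processed.add(message_id)
        # A resent message is acknowledged again without being reprocessed.
        return f"ack:{message_id}"


processed = []
consumer = DedupingConsumer(processed.append)
first_ack = consumer.receive(1, "debit 10")
second_ack = consumer.receive(1, "debit 10")  # resend after a lost acknowledgement
```

After both calls the handler has run once, yet the producer has been sent the same acknowledgement twice.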

So with this in place we can cope with the message being lost in transit, and also with the acknowledgement being lost in transit. To take it to the next level we need to think about what the messaging API might look like and who is being given the guarantee.

The simplest approach is for the send() method to block until the message has definitely been consumed (received and acknowledged). It may not be acceptable for the sending application to be held up in this way, so instead the API could asynchronously notify the producer that the message has been consumed. Without such a notification your guarantee does not cope with software failure. You cannot simply call a non-blocking send() method and be guaranteed that the message will be consumed. This is a key point, and means that the user of the API must also be involved in the guarantee.
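
One possible shape for the non-blocking variant is sketched below. AsyncProducer, on_consumed, on_ack and unconfirmed are hypothetical names chosen for the example; the point is only that send() returns immediately and the application is told later, per message, that it has been consumed.

```python
from typing import Callable, Dict, List


class AsyncProducer:
    """Non-blocking send() that reports consumption through a callback."""

    def __init__(self, transmit: Callable[[int, str], None]) -> None:
        self._transmit = transmit
        self._pending: Dict[int, Callable[[int], None]] = {}
        self._next_id = 1

    def send(self, payload: str, on_consumed: Callable[[int], None]) -> int:
        message_id = self._next_id
        self._next_id += 1
        # Track the message until the consumer's acknowledgement arrives.
        self._pending[message_id] = on_consumed
        self._transmit(message_id, payload)
        return message_id

    def on_ack(self, message_id: int) -> None:
        """Called by the transport layer when an acknowledgement is received."""
        callback = self._pending.pop(message_id, None)
        if callback is not None:  # duplicate acknowledgements are ignored
            callback(message_id)

    def unconfirmed(self) -> List[int]:
        """Ids the application still has no guarantee about."""
        return sorted(self._pending)


wire = []
confirmed = []
producer = AsyncProducer(lambda mid, payload: wire.append((mid, payload)))
producer.send("a", confirmed.append)
producer.send("b", confirmed.append)
producer.on_ack(1)  # only the first message has been acknowledged so far
```

Here the application knows message 1 was consumed and that message 2 is still unconfirmed. If the producer process died at this point, it would be up to the application to decide what to do about message 2 on restart.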

At the consumer end, you can imagine an onMessage() callback. There could be two ways to implement the acknowledgement. The simple way is for the messaging library to send the acknowledgement after the callback has returned. This assumes that the user code is processing the message synchronously and the callback returning implies the message has been processed successfully. Alternatively you can provide a method on the message to explicitly acknowledge it. This gives the user code more flexibility in how it processes messages. Without this, the consumer library could be sending acknowledgements when the message has not been processed, which would open up holes in the system.
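
The two styles can be compared in a small sketch. The Message class, its acknowledge() method and the deliver() function are assumptions made for illustration and are not taken from a real library.

```python
from typing import Callable, List


class Message:
    """A received message that can be acknowledged explicitly by user code."""

    def __init__(self, message_id: int, payload: str,
                 send_ack: Callable[[int], None]) -> None:
        self.message_id = message_id
        self.payload = payload
        self._send_ack = send_ack
        self.acknowledged = False

    def acknowledge(self) -> None:
        if not self.acknowledged:  # never acknowledge the same delivery twice
            self.acknowledged = True
            self._send_ack(self.message_id)


def deliver(message: Message, on_message: Callable[[Message], None],
            auto_ack: bool) -> None:
    """Library-side dispatch: run the callback, then optionally acknowledge."""
    on_message(message)
    if auto_ack:
        # Assumes the callback returning means the message was processed.
        message.acknowledge()


acks: List[int] = []
deferred: List[Message] = []

# Style 1: the callback does its work synchronously, so the library acknowledges.
deliver(Message(1, "sync work", acks.append), lambda m: None, auto_ack=True)

# Style 2: the callback only queues the message; user code acknowledges later.
deliver(Message(2, "queued work", acks.append), deferred.append, auto_ack=False)
acks_before_processing = list(acks)

for message in deferred:
    message.acknowledge()  # sent only once the queued work is actually done
```

With automatic acknowledgement in the second case, message 2 would have been acknowledged while it was still sitting unprocessed in the queue.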

Coping with software failure is an interesting topic. Platforms and libraries can only provide so much; the application itself needs to decide its own requirements and how to handle some errors. For full resilience the application needs to be involved in the guarantee and cannot rely just on the platform being used.

Multiple consumers

In many Comet solutions there are a number of consumers all subscribed to the same subject and therefore receiving the same messages for that subject. A single producer may send a message which the Comet server then sends out to multiple consumers.

In this scenario the producer typically does not require an acknowledgement from each consumer, since the producer is not even aware of all the consumers. In most platforms there will be a component between the producer and the consumer; in this instance it is usually the Comet server. The producer may actually be another Comet client, or it may be a server side component generating messages. The producer may require an acknowledgement from the Comet server, which would state that the server has processed the message and all subscribers will receive the message. However, there are caveats with that acknowledgement: it is really saying that all subscribers will get the message as long as the Comet server does not fail and no consumer fails without reconnecting.

Without an acknowledgement from each client this is the best guarantee you can get. For most applications it is impractical for consumers to be sending acknowledgements for messages with multiple consumers. However, if this is required then a similar implementation to the point-to-point acknowledgements can be used, but this time only between the server and the consumer.

An alternative to acknowledging every message is to implement a lazy acknowledgement, which acknowledges batches of messages. A server might store all messages sent to a client, to enable it to resend them in the case of a reconnection - lazy acknowledgements would allow the server to clear out this cache of messages periodically, preventing excessive memory usage.

Message Order and Missing Messages

Guaranteeing the order of messages a consumer receives can be easy. Depending on the implementation, it may be that nothing special is needed.

In a system where all the components are connected via persistent TCP/IP sockets, as long as the server and libraries do not change the order of messages due to threading, all messages should be received in the order they were sent. However, Comet solutions do not generally have the luxury of a persistent TCP/IP socket, certainly not in both directions. Even with a bidirectional socket, coping with reconnections opens you up to some of the issues that other transports may have to cope with during normal operation.

To handle message ordering each message needs a sequence number. In the simple case of a bidirectional socket, on a reconnection a handshake can take place to ask for all messages since the last received sequence number.
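
The server's half of such a handshake might look like the sketch below. MessageLog, publish() and since() are illustrative names, and the log is assumed to keep every message in memory, which a real server would bound in some way.

```python
from typing import List, Tuple


class MessageLog:
    """Server-side log that assigns sequence numbers starting at 1."""

    def __init__(self) -> None:
        self._messages: List[str] = []  # index i holds sequence number i + 1

    def publish(self, payload: str) -> int:
        self._messages.append(payload)
        return len(self._messages)  # the sequence number just assigned

    def since(self, last_seq: int) -> List[Tuple[int, str]]:
        """Everything after last_seq, in order, for replay on reconnection."""
        return [
            (seq, payload)
            for seq, payload in enumerate(self._messages, start=1)
            if seq > last_seq
        ]


log = MessageLog()
for payload in ["a", "b", "c", "d"]:
    log.publish(payload)

# The client reconnects and reports that 2 was the last sequence number it saw.
replay = log.since(2)
```

In this example the replay contains messages 3 and 4, which the client then processes before any newly published messages.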

This is not just about message ordering; it’s about missing messages. The methods used to guarantee message delivery between producer and consumer will obviously prevent missed messages, but they are generally more end to end (producer to consumer) rather than just server to client at the transport level. End to end guarantees are not always used, or not used for all types of data, but you still want a transport that will not let messages go missing.

Due to the nature of Comet it is often at the transport level that preventing missed messages has to be implemented. Although this could be layered on top of the transport, an understanding of the connection semantics of the transport is beneficial.

With a bidirectional socket, it is really only reconnection time that is an issue. Other streaming transports typically have an open socket for server to client messages, but use transient sockets for client to server. This usually means an HTTP request for every client-to-server message, which means an acknowledgement is built in through the HTTP response. The server-to-client messages are effectively a socket connection, so again it should only be an issue at reconnection time. However, it is common for ‘hidden iframe’ based streaming transports to force a reconnection periodically so it is not just unforeseen disconnections that occur. Other, less connected, transports might introduce other issues since new HTTP requests and responses are used continuously throughout the lifetime of the Comet session.

As long as sequence numbering is maintained and checked then handling of message ordering and missing messages in a layer above the transport can work. The system needs to be able to resend messages that might have been missed when asked to do so. However, in some cases knowledge of the transport could make this more efficient.
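
A layer of this kind can be sketched as a receiver that checks sequence numbers, holds back anything that arrives early and asks for the gap to be resent. SequencedReceiver and the request_resend callback are hypothetical, and this simple version may ask for the same range more than once, which a real implementation would want to avoid.

```python
from typing import Callable, Dict, List, Tuple


class SequencedReceiver:
    """Delivers messages in sequence order and asks for resends when it sees a gap."""

    def __init__(self, deliver: Callable[[str], None],
                 request_resend: Callable[[int, int], None]) -> None:
        self._deliver = deliver
        self._request_resend = request_resend
        self._expected = 1
        self._held: Dict[int, str] = {}

    def on_message(self, seq: int, payload: str) -> None:
        if seq < self._expected:
            return  # already delivered, so this is a duplicate
        self._held[seq] = payload
        if seq > self._expected:
            # Something between expected and seq - 1 has not arrived yet.
            self._request_resend(self._expected, seq - 1)
        # Release every message that is now contiguous with what was delivered.
        while self._expected in self._held:
            self._deliver(self._held.pop(self._expected))
            self._expected += 1


delivered: List[str] = []
resend_requests: List[Tuple[int, int]] = []
receiver = SequencedReceiver(
    delivered.append, lambda first, last: resend_requests.append((first, last))
)

receiver.on_message(1, "a")
receiver.on_message(3, "c")  # message 2 is missing, so a resend is requested
receiver.on_message(2, "b")  # the resend arrives and 2 and 3 are released
receiver.on_message(2, "b")  # a late duplicate is dropped
```

The application sees a, b, c in order and only one resend request is made, for sequence number 2.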

Conclusion

This article only touches on some of the areas of guaranteed messaging with respect to Comet platforms and is intended as food for thought. Many Comet applications will not need guaranteed messaging, but will obviously still want a level of reliability where some of the above issues still apply. Hopefully this illustrates that the application itself must be involved in guaranteed messaging rather than relying solely on the libraries or platform being used.


Copyright 2015 Comet Daily, LLC. All Rights Reserved