Guidance for Asynchronous Messaging
On this page
- Design smart endpoints and dumb pipes
- Strive for atomic messaging
- Build in resiliency
- Secure the pipe
- Decouple across all layers
- Evolve and support the interface throughout its lifecycle
Asynchronous messaging allows for sharing of data across systems and is in wide use across the Government of Canada (GC). This guidance document is intended to guide technical practitioners (e.g., integration developers and architects) in the development of asynchronous messaging across the GC to better support integrated digital processes across departments and agencies.
- What is asynchronous messaging? Asynchronous messaging is a form of send-and-forget messaging where messages are sent and no immediate response is required for processing.
- Why is it important? Using queues to pass messages without waiting for a response can be extremely effective in communicating large amounts of data while reducing dependencies between systems. Care should be taken in designing messaging systems as they can become extremely difficult and expensive to operate when too much complexity is introduced.
- When should I use this type of messaging? Ideal use cases for asynchronous messaging include notifications for multiple consuming systems. Often these notifications indicate either new data is available or existing data has changed. The consumers then decide if the notification requires further action and follow-up accordingly, perhaps by requesting updated information from an API or simply changing a local record to reflect the new information.
- How do I implement asynchronous messaging? This guidance document provides best practices to follow when designing and implementing asynchronous messaging solutions to minimize operational complexity and cost while maximizing the flexibility of integration between systems. These implementations may be done using the GC Event Broker or service bus solutions.
- Should I use asynchronous messaging for all of my data sharing? You should not use asynchronous messaging for all your data sharing. Instead, use it for cases when you have small, contained events and data that does not require an immediate response, such as an event notification or a small packet of sensor data. The GC recognizes the need for many integration styles ranging from Application Programming Interfaces (APIs) to bulk file transfer. Please also refer to the Government of Canada Standards on APIs for more about implementing APIs.
1. Design smart endpoints and dumb pipes
Complexity and logic should be pushed out to the applications at the edges and should not be encapsulated inside the queueing/messaging infrastructure. This design ensures that logic remains in the application layer and minimizes the number of parties involved in troubleshooting processing errors. In particular:
- Implement parameter-based routing – Message senders should set the necessary parameters (e.g., event type, routing headers) to determine how the message should be routed. Routing in the messaging platform should be based on these parameters rather than hard-coded logic within the messaging infrastructure.
- Translate and transform data at the endpoints – Any data transformations or translations should be done by the sending and receiving endpoints and not within the messaging infrastructure itself. If the transformation or translation logic needs to be external to the sending or receiving application code, it should be implemented as an independent application that is managed separately from the messaging infrastructure.
- Avoid distributed transactions – Distributed transactions (e.g., XA transactions) are powerful when trying to coordinate concurrent or related updates to multiple systems, but are extremely difficult to troubleshoot should a failure occur. The preference is to design systems that do not require distributed transactions and can tolerate eventual consistency.
- Avoid logic in the messaging layer – The messaging layer (i.e., messaging infrastructure) should only be responsible for routing and delivery of messages. Any business logic such as data validation or conditional processing (e.g., checking payloads for null values) should be implemented by either the sender or the receiver. See translation note above as well.
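As a minimal sketch of parameter-based routing with smart endpoints and a dumb pipe, the following Python uses in-memory `queue.SimpleQueue` objects as stand-ins for broker destinations. The queue names and the `event_type` header are illustrative assumptions, not GC Event Broker specifics; the point is that the routing function only reads sender-set parameters and never inspects or transforms the payload.

```python
from queue import SimpleQueue

# Hypothetical in-memory queues standing in for broker destinations.
queues = {
    "employee.created": SimpleQueue(),
    "employee.updated": SimpleQueue(),
}

def route(message: dict) -> None:
    """Dumb pipe: routes purely on the sender-set 'event_type' header.

    No payload inspection, validation, or business logic happens here;
    that belongs to the sending and receiving endpoints.
    """
    destination = queues[message["headers"]["event_type"]]
    destination.put(message)

# The sender sets the routing parameters; the pipe just delivers.
route({
    "headers": {"event_type": "employee.created"},
    "body": {"employee_id": "E123"},
})
```

Any change to how messages flow is then a change to the sender's parameters, not to logic buried inside the messaging infrastructure.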
2. Strive for atomic messaging
Messaging interaction should be as atomic as possible. Assembling a transaction across messages and queues creates complexity and a higher risk of failures. The following practices should be applied:
- Do not orchestrate processing across queues – If a process requires messages to be sent across multiple queues and then reassembled by the receiver, it should be re-designed to use a single queue instead. Coordinating cross-queue dependencies becomes exponentially complex when failures occur and a single queue going down could back up all other queues tied to the same process.
- Limit the use of in-order delivery – While most messaging infrastructure solutions offer in-order delivery capabilities, guaranteeing 100% in-order delivery end-to-end is nearly impossible because of the many different paths messages can take between the sender and the receiver (e.g., load balancers, network routes). Dependency on in-order delivery should therefore be avoided unless absolutely necessary. If it cannot be avoided, always assign grouping and message order IDs so the receiver can reassemble messages correctly without relying solely on the messaging infrastructure to manage order. Message groupings should also be kept as small as possible, and unbounded groups should be avoided, because out-of-order messages force all subsequent messages in the group to be held in memory until order is restored.
- Avoid generic queues or topics – Queues or topics should be mapped to either a single event type or a single business entity to allow for granular control of routing flows. Generic queues or topics that carry multiple event or entity types result in more complex routing logic that creates undue operational burden.
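Where in-order delivery truly cannot be avoided, receiver-side reassembly using grouping and order IDs might look like the following sketch. The field names (`group_id`, `seq`, `total`) are assumptions for illustration; the key idea is that the receiver restores order itself and releases a group only when complete, so it never depends on the infrastructure delivering in sequence.

```python
from collections import defaultdict

# Buffer of out-of-order messages per group.
# Keep groups small and bounded: everything buffered here is held in
# memory until the group completes.
pending = defaultdict(dict)

def reassemble(group_id: str, seq: int, total: int, payload):
    """Collect messages for a group keyed by sequence number.

    Returns the ordered list of payloads once all `total` parts have
    arrived, or None while the group is still incomplete.
    """
    pending[group_id][seq] = payload
    if len(pending[group_id]) == total:
        parts = pending.pop(group_id)
        return [parts[i] for i in range(1, total + 1)]
    return None
```

A production version would also evict groups that never complete (e.g., after a timeout) and route them to error handling rather than buffering forever.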
3. Build in resiliency
Endpoints, especially senders, should assume that the messaging/queuing infrastructure will fail at some point. Resiliency should therefore be built into the endpoints to handle failure scenarios, rather than assuming absolute reliability of the messaging/queuing infrastructure. Consider the following:
- Implement retry at the sender – Message senders should implement appropriate logic in case the messaging infrastructure is unavailable.
- Implement idempotent receivers – Once-only (a.k.a. once-and-only-once) delivery is impossible to guarantee end-to-end. A more resilient approach is to implement message receivers in such a way that duplicate messages (i.e., resends) do not adversely affect the system. Queues should therefore be configured for at-least-once delivery.
- Implement durable and persistent queues/topics – Persistent queues ensure that messages are temporarily stored until they are successfully delivered. Durable queues ensure that those messages will survive a restart of the queueing infrastructure. Unless the messages are disposable, it is recommended that all queues and topics be configured to be persistent and durable.
- Allocate persistence stores based on peak volumes and outage windows – Queues that require guaranteed delivery of messages should be configured as persistent. The persistent storage for these queues should be sized based on the following formula to minimize the risk of message loss during any potential outages:
- Required storage = maximum message size × peak messages per hour × maximum scheduled outage window (in hours)
- Implement redelivery and dead letter queues appropriately – Any persistent queues should also be implemented with redelivery and dead letter queues to offload failed messages from the main queue. This configuration will ensure that any message-specific failures do not stop the queue.
- Implement operational processes for message failures – Message failures will certainly occur. The message sender and receiver(s) should agree on operational processes to monitor, identify, and deal with failed messages up front, before any integrations are actually deployed.
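The sender-side retry and idempotent-receiver practices above can be sketched as follows. This is an illustrative outline, not a prescribed implementation: the backoff parameters are arbitrary, and the in-memory `seen_ids` set stands in for whatever durable deduplication store a real receiver would use.

```python
import time

def send_with_retry(send, message, attempts: int = 3, base_delay: float = 0.1):
    """Sender-side retry with exponential backoff for broker outages."""
    for attempt in range(attempts):
        try:
            return send(message)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # infrastructure still down; surface the failure
            time.sleep(base_delay * 2 ** attempt)

# Stand-in for a durable store of processed message IDs.
seen_ids = set()

def idempotent_receive(message: dict) -> str:
    """At-least-once delivery means duplicates will arrive.

    Dedupe on a sender-assigned message ID so resends are harmless.
    """
    if message["id"] in seen_ids:
        return "duplicate-ignored"
    seen_ids.add(message["id"])
    # ... actual processing of message["body"] would happen here ...
    return "processed"
```

Because the receiver tolerates duplicates, the sender is free to retry aggressively, and the queue can safely be configured for at-least-once delivery.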
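The persistence-store sizing formula above can be applied as a simple worked example. The numbers below (500 KB messages, 10,000 messages per hour at peak, a 4-hour outage window) are assumptions for illustration only.

```python
def persistence_store_bytes(max_message_bytes: int,
                            peak_messages_per_hour: int,
                            outage_hours: int) -> int:
    """Apply the sizing formula:
    maximum message size x peak messages/hour x maximum scheduled outage window.
    """
    return max_message_bytes * peak_messages_per_hour * outage_hours

# e.g., 500 KB messages, 10,000 messages/hour at peak, 4-hour outage window:
required = persistence_store_bytes(500 * 1024, 10_000, 4)
# -> 20,480,000,000 bytes, i.e., roughly 20 GB of persistent storage
```

In practice you would also add headroom for redelivery and dead letter queues, which share the same storage in many broker configurations.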
4. Secure the pipe
Security should be top of mind when designing and implementing interfaces. The following practices should be followed for any asynchronous messaging integration other than those exposing public data (e.g., open data). It is important to note that these practices are to provide a baseline set of security controls. Additional controls (e.g., message-level encryption, mutual authentication, and digital signatures) may be required based on the sensitivity level of the data and your own departmental security requirements.
- Enforce transport encryption – Unencrypted TCP is prohibited in the GC. All encryption practices must adhere to the Implementing HTTPS for Secure Web Connections Information Technology Policy Implementation Notice (ITPIN) and the Direction on the Secure Use of Commercial Cloud Services Security Policy Implementation Notice (SPIN).
- Protect endpoints – Authentication by both message senders and receivers should be used, with at least sender authentication for open data and unclassified queues. Message encryption and signing at the application tier outside of the queueing infrastructure is also encouraged.
- Implement certificate-based authentication where possible – Asynchronous messaging is mainly used for machine-to-machine back-end integrations and does not involve users. Certificates provide a much better way of managing machine-to-machine authentication than usernames and passwords, and should therefore be implemented whenever the protocol or queueing infrastructure supports it.
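As one possible sketch of the transport-security baseline, the following configures a client-side TLS context with Python's standard `ssl` module, as a messaging client library might accept before connecting to a broker. The file names in the commented-out mutual-authentication line are placeholders; exact wiring depends on the protocol and client library in use.

```python
import ssl

# Client-side TLS context for a messaging connection (e.g., AMQP over TLS).
# PROTOCOL_TLS_CLIENT enables hostname checking and certificate
# verification by default; we make both explicit here.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy TLS versions
context.check_hostname = True                     # verify the broker's identity
context.verify_mode = ssl.CERT_REQUIRED           # require a valid broker cert
context.load_default_certs()                      # trust the system CA store

# For certificate-based (mutual) authentication, the client presents its
# own certificate instead of a username/password (paths are placeholders):
# context.load_cert_chain(certfile="client.crt", keyfile="client.key")
```

The same context object can typically be passed to the messaging client's connection parameters so that all traffic to the broker is encrypted in transit.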
5. Decouple across all layers
Asynchronous messaging does not automatically provide loose coupling. Instead, loose coupling is achieved through proper design and implementation across the application and data tiers. The following practices should be followed when defining your integration:
- Implement a protocol translation layer if possible – Message queueing protocols are less standardized than those used by APIs, so supporting multiple protocols is necessary to ensure future flexibility and compatibility. Senders and receivers should implement some form of translation layer or leverage technology connectors so that application code is not tightly coupled to the messaging protocol. This also ensures that systems across the GC can interoperate over messaging without having to agree on a single common protocol, and that protocol changes will not require changes to the connected applications.
- Use notifications and eventing – Messaging is most effective when data payloads are small and atomic. Therefore, it is better to send notifications and events (e.g., new employee joined) instead of large complex datasets (e.g., new job record with new employee record). This principle is especially true when different datasets are generated based on the context of a given event (e.g., employee joining an existing position vs. a new position). Notification or event receivers can then make API calls as needed to the sender to retrieve the appropriate dataset based on the specific context of the event. This design is much more efficient than the message sender trying to anticipate all permutations of the data set that might be required and truly decouples the responsibilities of the sender from the processing requirements of the receiver.
- Leverage GC Event Broker – Use the GC Event Broker infrastructure for asynchronous messaging between GC departments and agencies where possible.
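The notification-and-eventing pattern above can be sketched as follows: the sender publishes a small event carrying only identifiers, and the receiver calls back for the data it actually needs. The event shape, the `employee.joined` type, and the in-memory `DATASTORE` standing in for the sender's API are all illustrative assumptions.

```python
# Stand-in for the sending system's data, normally behind an API.
DATASTORE = {"E123": {"name": "A. Singh", "position": "PM-04"}}

def fetch_employee(employee_id: str) -> dict:
    """Stands in for an API call back to the sending system."""
    return DATASTORE[employee_id]

def handle_event(event: dict):
    """Receiver decides whether the notification needs follow-up,
    then pulls only the dataset relevant to its own context."""
    if event.get("type") != "employee.joined":
        return None  # not relevant to this receiver; no follow-up
    return fetch_employee(event["employee_id"])

# The message itself stays small and atomic: a type plus an identifier.
event = {"type": "employee.joined", "employee_id": "E123"}
```

The sender never has to anticipate which dataset each receiver needs; each receiver's processing requirements stay entirely on the receiver's side.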
6. Evolve and support the interface throughout its lifecycle
Messaging interfaces will change over time as system and user needs evolve. As a best practice, that change should be properly supported and managed through the following practices:
- Version messages and queues – Any extensions to the message schema should be versioned. Any changes to the message schema which cannot be implemented as an extension (i.e., not backwards compatible) should result in an entirely new message type. Any changes to queue behaviour that are associated with a schema change (e.g., routing, degree of validation, persistence parameters, error handling) should result in a new queue version. Versions should be a single number (e.g., v1).
- Respect existing consumer dependencies – Support at least one previous major version (i.e., N-1) to ensure receiver systems have time to migrate to the latest version of the message or queue. Communicate your development roadmap with receiving teams and work with them to understand the impact of any major changes. Set clear deprecation policies and timelines up front so receivers understand how long they have to migrate to each new release before the legacy release is taken offline. Coordinate any necessary testing on all releases.
- Provide a point of contact – Provide a designated point of contact to any teams receiving messages from your queue. If the queue is available for GC-wide or external use, provide a support email account. A phone number should also be provided for critical queues.
- Define an SLA up front – Each queue should be accompanied with a clearly-defined Service Level Agreement (SLA). At a minimum, the SLA should define the following:
- Support hours (e.g., 24/7, 9/5, coast to coast business hours)
- Service availability (e.g., 99%)
- Support response time (e.g., within the hour, 24 hours, best effort)
- Scheduled outages (e.g., nightly, weekly, every 2nd Sunday evening)
- Throughput limit (e.g., 100 requests per second per receiver)
- Message size limit (e.g., < 500 KB per message)
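The versioning and N-1 support practices above can be sketched on the receiver side as a version dispatch table. The version tags, field names, and the normalization into one internal shape are illustrative assumptions; the pattern is that current and N-1 message versions are both handled during the migration window, and anything older is rejected cleanly.

```python
def handle_v1(body: dict) -> dict:
    # Hypothetical v1 schema carried a single "name" field.
    return {"name": body["name"]}

def handle_v2(body: dict) -> dict:
    # Hypothetical v2 split the name; normalize both versions to one
    # internal shape so downstream code is version-agnostic.
    return {"name": f'{body["given_name"]} {body["surname"]}'}

# Current release and N-1 only; older versions are past their
# deprecation timeline and no longer supported.
SUPPORTED = {"v1": handle_v1, "v2": handle_v2}

def dispatch(message: dict) -> dict:
    handler = SUPPORTED.get(message["version"])
    if handler is None:
        raise ValueError(f'unsupported message version: {message["version"]}')
    return handler(message["body"])
```

When v3 ships, `handle_v1` is removed and `handle_v3` is added, giving v2 consumers the full deprecation window to migrate.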