Achieving low latency with pub/sub

Jan 23, 2025 - 00:08
In pub/sub messaging systems, getting messages to flow quickly from publishers to subscribers isn't just important for performance; it's central to the system's basic usability. Achieving this at scale introduces extra challenges that require thoughtful architecture design and strategies for handling unexpected behavior (e.g. traffic spikes).

To better understand the best practices we can apply to our architecture to overcome these challenges, let's revisit how the pub/sub pattern works.

What is pub/sub?

Pub/sub (or publish/subscribe) is an architectural design pattern used in distributed systems for asynchronous communication between different components or services. Although publish/subscribe builds on earlier patterns like message queuing and event brokers, it is more flexible and scalable. The key to this is that pub/sub moves messages between different components of the system without the components being aware of each other’s identity (they are decoupled).
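To make the decoupling concrete, here is a minimal in-process sketch of the pattern in Python (illustrative only; real brokers add persistence, networking, and delivery guarantees). Publishers and subscribers share only a channel name, never a reference to each other:

```python
from collections import defaultdict

class Broker:
    """Minimal in-process pub/sub broker. Publishers and subscribers
    only know the channel name, never each other (decoupling)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Fan the message out to every subscriber of the channel.
        for callback in self._subscribers[channel]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("scores", received.append)
broker.publish("scores", {"match": 1, "score": "2-1"})
print(received)  # → [{'match': 1, 'score': '2-1'}]
```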


For a deeper dive into pub/sub, including examples and comparisons to other messaging patterns, see our guide: What is pub/sub?.

Why latency is crucial to pub/sub realtime systems

Latency is the time it takes for data to travel from the backend (such as a datacenter) to the end user’s device. Latencies under 100ms are hard to achieve in general, but for pub/sub systems it’s essential that they remain not only consistently low but effectively imperceptible, so that users stay engaged and don’t abandon the app entirely. Applications such as critical broadcast updates, realtime chat, and live streaming services need to deliver seamless experiences with ultra-low latency to maintain their user bases.

This becomes especially important at scale for a global audience: If a pub/sub system can’t maintain these speeds as it scales up and reaches a global user base, message delays could render it unusable, even if your infrastructure has the raw capacity. Serving a single region is significantly simpler than achieving consistent low latency across a distributed global audience, where factors like inter-region data replication and network variability come into play. If you’re operating at scale, global median latency is a useful summary metric, and it’s the one we use to measure our speeds at Ably.
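As a quick sketch of the metric itself: pooling latency samples from every region and taking the median gives a single number that isn't skewed by a few extreme outliers the way a mean is. The region names and sample values below are illustrative:

```python
import statistics

def global_median_latency(samples_by_region):
    """Pool latency samples (in ms) from all regions and return the median."""
    pooled = [s for samples in samples_by_region.values() for s in samples]
    return statistics.median(pooled)

samples = {
    "us-east": [12, 15, 14],
    "eu-west": [22, 25],
    "ap-south": [48, 51],
}
print(global_median_latency(samples))  # → 22
```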

Some architectural decisions you can make to achieve low latency are:

  • Global datacenter coverage: The physical proximity of datacenters or edge points of presence (PoPs) to end users significantly impacts round-trip times for messages. If you distribute datacenters and PoPs globally, you can drive down latency for your users.

  • Protocol efficiency: The choice of protocol affects how efficiently messages are transmitted. For example, WebSocket is highly efficient for realtime communication compared to HTTP long polling. (WebSockets are a particularly good protocol for achieving low latency in pub/sub systems since they maintain an open connection between the client and server without the need for frequent HTTP responses. For a deeper dive into how WebSockets compare to other protocols in pub/sub systems, check out our guide Pub/Sub vs WebSockets.)

  • Network robustness: A reliable, fault-tolerant underlying network infrastructure can ensure consistent low latency even under high traffic volumes.

Challenges to achieving low latency

The most straightforward obstacle to low latency is network distance - latency is inherently affected by the distance between clients and the server. The farther a client is from a datacenter, the longer it takes for messages to reach them. This is a critical consideration for global systems, where distances between users and datacenters can span continents. But there are other factors that can affect end-user latency:

  • Message routing: Poorly optimized routing can lead to bottlenecks, especially in use cases with high fanout where a single message is delivered to thousands or millions of subscribers.

  • Load balancing: Without a load balancer, or with an improperly configured one, traffic imbalances can overload certain nodes, resulting in delays for subscribers.

  • System resource contention: High message volumes can strain CPU, memory, and storage resources, leading to increased latency. This is particularly true during traffic spikes.

  • Encoding: Inefficient message encoding increases latency by slowing down the system’s ability to translate data into a transmittable format and back again.
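The encoding point is easy to see with a toy comparison. A verbose text format like JSON carries field names and punctuation on every message, while a fixed binary layout (here via Python's `struct`; the field layout is purely illustrative) sends only the values:

```python
import json
import struct

# The same reading encoded two ways: verbose JSON vs a fixed binary layout.
reading = {"sensor_id": 7, "value": 21.5}

json_bytes = json.dumps(reading).encode()

# "<Id" packs an unsigned 32-bit int plus a 64-bit double: 12 bytes total.
binary_bytes = struct.pack("<Id", reading["sensor_id"], reading["value"])

print(len(json_bytes), len(binary_bytes))  # JSON is several times larger
```

Fewer bytes per message means less time spent serializing, transmitting, and parsing - which compounds quickly at high message rates.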

Best practices for achieving low latency

Best practices for achieving low latency are, on paper, straightforward fixes to the points discussed above. However, making these changes to your architecture requires significant engineering effort and potentially a rehaul of your existing infrastructure. Here’s what we recommend you do:

Use a globally-distributed architecture

Deploying servers in multiple regions reduces the physical distance between clients and the server, minimizing network latency. Make sure that your infrastructure includes a combination of core datacenters and edge points of presence (PoPs). This ensures fast, consistent round-trip times for users anywhere in the world.

Optimize message routing

Efficient routing algorithms, such as consistent hashing, can ensure that messages are delivered to subscribers quickly and reliably. For systems with high fanout, prioritize techniques that minimize duplication and ensure messages are processed efficiently.
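As an illustration of the technique (not any particular broker's implementation), here is a minimal consistent-hash ring that maps channels to nodes. With virtual nodes, adding or removing a server remaps only a small fraction of channels instead of reshuffling everything:

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Maps channel names to nodes via a hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []
        for node in nodes:
            # Each physical node gets `vnodes` positions on the ring,
            # which smooths out the distribution of channels.
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}:{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, channel):
        # Walk clockwise to the first ring position at or after the hash.
        idx = bisect(self._ring, (self._hash(channel),)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("scores"))  # always the same node for this channel
```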

Have a load balancer

Dynamic load balancing distributes traffic evenly across servers, preventing overloading. For pub/sub systems, load balancers must account for both connection count and message throughput.
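A sketch of that dual criterion, with purely illustrative weights: rather than counting connections alone, score each server on connections plus message throughput and route new work to the lowest score:

```python
def pick_server(servers):
    """Choose the server with the lowest combined load score.
    The weights on connections vs throughput are illustrative only."""
    def score(s):
        return s["connections"] * 1.0 + s["msgs_per_sec"] * 0.5
    return min(servers, key=score)

servers = [
    {"name": "a", "connections": 900, "msgs_per_sec": 200},   # score 1000
    {"name": "b", "connections": 400, "msgs_per_sec": 1500},  # score 1150
    {"name": "c", "connections": 600, "msgs_per_sec": 300},   # score 750
]
print(pick_server(servers)["name"])  # → c
```

Note that server "b" has the fewest connections but the highest score: a connection-count-only balancer would have picked the busiest node.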

Use message delta compression

Compressing messages reduces their size, enabling faster transmission over the network. Use lightweight, efficient compression algorithms to minimize processing overhead.
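The idea behind delta compression, in a minimal sketch: when consecutive messages on a channel are similar (e.g. a ticker where only the price moves), send just the fields that changed and let the subscriber patch its last-known state:

```python
def delta(previous, current):
    """Return only the fields that changed since the previous message."""
    return {k: v for k, v in current.items() if previous.get(k) != v}

def apply_delta(previous, patch):
    """Reconstruct the full message from the prior state plus the delta."""
    return {**previous, **patch}

prev = {"symbol": "ABC", "price": 100.0, "volume": 5000}
curr = {"symbol": "ABC", "price": 100.5, "volume": 5000}

patch = delta(prev, curr)
print(patch)                             # → {'price': 100.5}
print(apply_delta(prev, patch) == curr)  # → True
```

A production scheme would also version the deltas and periodically resend a full snapshot so a subscriber that misses one patch can resynchronize.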

Autoscale to reduce resource consumption

Optimize resource usage by scaling infrastructure elastically during traffic spikes. Use dynamic autoscaling to add capacity on demand and maintain a significant resource buffer.
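An illustrative scaling policy (the target utilization and headroom figures are assumptions, not a recommendation): compute the node count that would bring utilization back to target, then add a buffer for sudden spikes:

```python
import math

def desired_nodes(current_nodes, utilization, target=0.6, headroom=0.25):
    """Nodes needed to bring utilization back to `target`,
    plus spare headroom for traffic spikes."""
    needed = current_nodes * utilization / target
    return math.ceil(needed * (1 + headroom))

# 10 nodes at 90% CPU: scale to 15 for the target, ~19 with headroom.
print(desired_nodes(10, 0.9))  # → 19
```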

Have redundancy and failover

Build redundancy into servers and have failover mechanisms that reroute traffic during outages. For global systems, failover strategies should account for regional redundancies to make sure that if one region experiences an outage, traffic can seamlessly shift to another without impacting users worldwide. This minimizes latency spikes during failover events and ensures uninterrupted service.
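The routing half of regional failover can be sketched as a preference list: try the region nearest the client first, and fall back down the list when health checks fail. (Real systems would also drain in-flight connections and replicate channel state across regions.)

```python
def route(regions, health):
    """Pick the first healthy region from a proximity-ordered list."""
    for region in regions:  # ordered nearest-first for this client
        if health.get(region, False):
            return region
    raise RuntimeError("no healthy region available")

prefs = ["eu-west", "us-east", "ap-south"]
status = {"eu-west": False, "us-east": True, "ap-south": True}
print(route(prefs, status))  # → us-east
```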

How Ably can help

For many teams, building a system with all of these components from the ground up is impractical and demands a huge investment of time and skill. That investment also tends to be more expensive than initially expected because of maintenance costs and other challenges - like scalability and data integrity - that make maintaining a low enough latency even more difficult.

At Ably, our team is very familiar with the amount of work building a low-latency pub/sub system takes - and all the edge cases around optimum performance. We’ve made it our mission to provide the most reliable realtime service for you - and Ably Pub/Sub is devoted to pub/sub use cases.

Choosing a managed pub/sub service like Ably can save you and your team the headache of managing the architectural challenges of low latency at scale. Performance is one of Ably’s core pillars, and it’s built into what we do. Here’s how:

  • Predictable performance: A low-latency and high-throughput global edge network, with median latencies of <50ms.  

  • Guaranteed ordering & delivery: Messages are delivered in order and exactly once, with automatic reconnections. 

  • Fault-tolerant infrastructure: Redundancy at regional and global levels with 99.999% uptime SLAs. 99.999999% (8x9s) message availability and survivability, even with datacenter failures.

  • High scalability & availability: Built and battle-tested to handle millions of concurrent connections at scale. 

  • Optimized build times and costs: Deployments typically see a 21x lower cost and upwards of $1M saved in the first year.

Low latency is non-negotiable for any pub/sub system that aims to deliver realtime experiences at scale. If you’re looking for a solution that scales up and ensures some of the lowest latencies in the business, Ably provides a robust and reliable platform to power your pub/sub needs. Sign up for a free account to try it for yourself.
