Best practices

Jon Chambers edited this page Jan 2, 2022 · 7 revisions

To make the most of Pushy in high-throughput environments, we offer these best practices to help you build the most reliable, efficient applications possible.

Long-lived resources

ApnsClient instances are designed to stick around for a long time. They're thread-safe and can be shared between many threads in a large application. We recommend creating a single client (per APNs certificate/key), then keeping that client around for the lifetime of your application.
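The create-once-and-share pattern looks like the sketch below. Since Pushy itself isn't on the classpath here, `ExpensiveClient` is a hypothetical stand-in for an `ApnsClient`; in a real application you would build the shared instance with `ApnsClientBuilder` instead.

```java
// A minimal sketch of the create-once, share-everywhere pattern.
final class ExpensiveClient {
    // Imagine connection setup, TLS handshakes, etc. happening here.
}

final class ClientHolder {
    // Initialized once, on first access, by the class loader, which
    // guarantees thread-safe initialization without explicit locking.
    private static final ExpensiveClient CLIENT = new ExpensiveClient();

    static ExpensiveClient getClient() {
        return CLIENT;
    }
}

public class SharedClientExample {
    public static void main(final String[] args) {
        // Every caller, on every thread, gets the same long-lived instance.
        System.out.println(ClientHolder.getClient() == ClientHolder.getClient());
    }
}
```

Because the holder hands out one instance for the lifetime of the application, each APNs certificate/key gets exactly one client, as recommended above.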

Asynchronous operation

Pushy is designed to send lots of notifications asynchronously. Pushy returns a CompletableFuture when sending a notification, and callers can add follow-up actions to those CompletableFutures to track the results of attempts to send notifications. Embracing asynchronous operation means that Pushy can start sending a second notification before receiving a reply to prior notifications.

As an example, let's say it takes about 40 microseconds to write a push notification to the network, but 40 milliseconds to receive a reply from the server. If we send notifications synchronously (i.e. we wait for a reply to one notification before sending the next), our time is dominated by waiting for replies from the server. In fact, we spend only about 0.1% of our time actually sending notifications. At 40 milliseconds per notification, we have a maximum possible throughput of about 25 notifications per second:

for (final ApnsPushNotification pushNotification : collectionOfPushNotifications) {
    final CompletableFuture<PushNotificationResponse<ApnsPushNotification>> sendNotificationFuture =
            apnsClient.sendNotification(pushNotification);

    // This call will block until we get a reply from the server
    final PushNotificationResponse<ApnsPushNotification> response = sendNotificationFuture.get();
}

Put on a timeline, the situation looks something like this:

|- 40ms -||- 40ms -| ... |- 40ms -||- 40ms -|

By contrast, if we send notifications as quickly as we can without waiting for a reply (don't worry—the replies will arrive later), we can go much faster. Even though the server still takes 40 milliseconds to get back to us, we can keep working to send more notifications while we wait. In fact, we could theoretically send up to 25,000 notifications per second to the same server and under the same network conditions as the first case. For example:

for (final ApnsPushNotification pushNotification : collectionOfPushNotifications) {
    final CompletableFuture<PushNotificationResponse<ApnsPushNotification>> sendNotificationFuture =
            apnsClient.sendNotification(pushNotification);

    sendNotificationFuture.whenComplete((response, cause) -> {
        // This will get called when the server has replied. response will
        // be null and cause will be non-null if something went wrong when
        // sending the notification.
    });
}

On a timeline, it looks something like this:

|---------- 40ms ----------|
 |---------- 40ms ----------|
  |---------- 40ms ----------|
   |---------- 40ms ----------|
    |---------- 40ms ----------|

You may notice that this leaves us with lots of notifications "in flight" at the same time. The APNs server allows for (at the time of this writing) 1,500 notifications in flight at any time. If we hit that limit, Pushy will buffer notifications automatically behind the scenes and send them to the server as in-flight notifications are resolved.

In short, asynchronous operation allows Pushy to make the most of local resources (especially CPU time) by sending notifications as quickly as possible.
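The difference between the two timelines above can be demonstrated with the standard library alone. In this sketch, `fakeSend` is a hypothetical stand-in for a notification send with a simulated 40-millisecond round trip; the class and method names are illustrative, not part of Pushy's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncThroughputDemo {

    // Stand-in for a ~40ms round trip to a hypothetical server.
    static String fakeSend(final int i) {
        try {
            Thread.sleep(40);
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "reply-" + i;
    }

    // Returns {syncMillis, asyncMillis} for sending `count` fake notifications.
    static long[] measure(final int count) throws Exception {
        final ExecutorService executor = Executors.newFixedThreadPool(count);

        // Synchronous: wait for each reply before sending the next (~count x 40ms).
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            fakeSend(i);
        }
        final long syncMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        // Asynchronous: fire everything, then wait for all replies (~40ms total).
        start = System.nanoTime();
        final List<CompletableFuture<String>> futures = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int n = i;
            futures.add(CompletableFuture.supplyAsync(() -> fakeSend(n), executor));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        final long asyncMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        executor.shutdown();
        return new long[] { syncMillis, asyncMillis };
    }

    public static void main(final String[] args) throws Exception {
        final long[] millis = measure(20);
        System.out.println("sync: " + millis[0] + "ms, async: " + millis[1] + "ms");
    }
}
```

The synchronous loop takes roughly the sum of all the round trips, while the asynchronous version takes roughly one round trip, mirroring the overlapping timeline above.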

Flow control

Even though Pushy buffers notifications automatically, callers may still want to implement their own flow control layer. For example, if you try to send a billion notifications at the same time, Pushy will have to buffer all of them and will most likely run out of memory. For many users in low- or moderate-throughput environments, this may never become a problem. For users in high-throughput environments, though, controlling the number of push notifications in play is important to avoid exhausting memory.

One popular tool for flow control is Java's Semaphore. Here's a simple example for limiting the number of outstanding push notifications to 10,000:

final Semaphore semaphore = new Semaphore(10_000);

while (pushNotificationSource.hasNext()) {
    semaphore.acquire();

    // Let's assume that pushNotificationSource isn't just an in-memory list of push notifications
    final CompletableFuture<PushNotificationResponse<ApnsPushNotification>> sendFuture =
            apnsClient.sendNotification(pushNotificationSource.next());

    sendFuture.whenComplete((response, cause) -> {
        // Do whatever processing needs to be done
        semaphore.release();
    });
}
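The semaphore pattern above can be exercised without Pushy at all. This self-contained sketch (class and method names are illustrative) replaces the real send with a short simulated round trip and counts how many "sends" are ever in flight at once; the peak never exceeds the number of permits.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class SemaphoreFlowControlDemo {

    // Runs `tasks` simulated sends capped at `permits` in flight;
    // returns the peak number of concurrent sends observed.
    static int run(final int permits, final int tasks) throws Exception {
        final Semaphore semaphore = new Semaphore(permits);
        final AtomicInteger inFlight = new AtomicInteger();
        final AtomicInteger maxInFlight = new AtomicInteger();
        final ExecutorService executor = Executors.newFixedThreadPool(16);

        for (int i = 0; i < tasks; i++) {
            semaphore.acquire(); // blocks once `permits` sends are outstanding

            final CompletableFuture<Void> sendFuture = CompletableFuture.runAsync(() -> {
                // Track how many "sends" are in flight right now.
                final int current = inFlight.incrementAndGet();
                maxInFlight.accumulateAndGet(current, Math::max);
                try {
                    Thread.sleep(5); // simulated round trip
                } catch (final InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            }, executor);

            sendFuture.whenComplete((response, cause) -> semaphore.release());
        }

        // Reacquire every permit to wait for the last sends to drain.
        semaphore.acquire(permits);
        executor.shutdown();
        return maxInFlight.get();
    }

    public static void main(final String[] args) throws Exception {
        System.out.println("peak in-flight: " + run(4, 100));
    }
}
```

Acquiring every permit at the end is a simple way to block until all outstanding sends have released their permits.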

Threads, concurrent connections, and performance

Pushy is built on Netty, an asynchronous event-driven network application framework. Netty relies heavily on the notion of "event loops", which execute tasks in series on a single thread. In Pushy, a single connection to the APNs server is bound to a single event loop, and thus a single thread. Callers may configure ApnsClient instances to open multiple concurrent connections to the APNs server and to use EventLoopGroups (essentially thread pools for event loops) of varying sizes.

Because each connection is bound to a single event loop (and thus a single thread), it never makes sense to give an ApnsClient more threads in its EventLoopGroup than it has concurrent connections. A client with an eight-thread EventLoopGroup that is configured to maintain only one connection will use one thread from the group while the other seven sit idle. Conversely, opening a large number of connections on a small number of threads will likely reduce overall efficiency by increasing competition for CPU time.

Additionally, the number of APNs servers in play at any given time is finite (callers can check with dig +short api.push.apple.com for the production environment or dig +short api.sandbox.push.apple.com for the sandbox environment). At the time of writing, there are eight production servers and four sandbox servers. A client that allows many more concurrent connections than there are servers to receive those connections will wind up opening multiple connections to the same server, and that arrangement is unlikely to yield any performance benefits (and may actually begin to adversely affect performance due to congestion, context-switching, and resource overhead).

As a rule of thumb, Pushy is most efficient in terms of processing time when it has one or two threads per CPU core and one or two connections per thread, not to exceed two connections per server. Still, lots of factors can limit overall performance:

  • Pushy may use all available CPU time
  • Pushy may run up against network throughput limits (i.e. Pushy is generating more than a gigabit of traffic on a gigabit network connection)
  • Connections to the APNs server may become "saturated" by running up against the server-imposed limit on in-flight notifications; this can be especially problematic for servers with high round-trip times to the APNs server
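The rule of thumb above can be turned into a back-of-the-envelope starting point. This sketch uses only the standard library; `connectionsFor` is a hypothetical helper, and the server count of eight is the production figure quoted in this section (it can change at any time, so check with dig before relying on it).

```java
public class ClientSizingSketch {

    // Rule of thumb from the text: one or two threads per core, one or two
    // connections per thread, capped at two connections per server.
    static int connectionsFor(final int cores, final int serverCount) {
        final int threads = cores; // one thread per core (the conservative end)
        return Math.min(threads * 2, serverCount * 2);
    }

    public static void main(final String[] args) {
        final int cores = Runtime.getRuntime().availableProcessors();
        final int apnsServerCount = 8; // production pool size at the time of writing

        System.out.println("cores=" + cores
                + " eventLoopThreads=" + cores
                + " concurrentConnections=" + connectionsFor(cores, apnsServerCount));
    }
}
```

Treat the result as an initial configuration to benchmark against, not a fixed answer; the next paragraph describes how to adjust it based on what you observe.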

For high-volume applications, the key to maximizing performance with Pushy is to make sure that you're completely saturating either your CPU or your network connection. If, for example, you find that you're using all of the processing time on a single core, but have additional cores and bandwidth to spare, try configuring your client to use more threads and concurrent connections. If you find that you're quickly hitting the limit for in-flight notifications, but using little CPU time, try using more concurrent connections. If you're using all available bandwidth and not much CPU time, you can scale back on the number of threads and concurrent connections.