Push Outbox and Suppression Audit

I recorded sent notifications and stopped notifications with their reasons in the same system

Push notifications reach users outside the app. They speak to people who did not open the app, so the first question is not “can we send this?” but “is it okay to send this?”

Hangangjari notifications pass through three stages.

First, create candidates. Second, decide by policy whether to send or suppress each one. Third, queue the ones to be sent in an outbox, send them, and record the result in an audit trail.

When operating notifications, the question that appears most often is: “why did this notification go out, and why did that one not go out?” To answer that, storing only successful sends is not enough. Reasons for not sending also have to become records.

Source changes first become notification candidates

Push does not send raw source events directly. First, it turns changes into facts that can be considered as notifications. For example, a parking lot’s remaining spaces falling below a threshold, a park crowding change, or an event cancellation can become a fact.

Then each fact is matched with subscriptions. The system checks which parks or lots the user cares about, which notification types are enabled, and whether quiet hours apply.

flowchart LR
  Source["Source change"] --> Fact["Notification fact"]
  Fact --> Match["Candidate matching"]
  Subscription["Subscriptions/settings"] --> Match
  Match --> Policy["Send decision"]
  Policy --> Decision["push_now or suppress"]
  Decision --> Outbox["Push outbox"]
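
As a rough illustration of the first two boxes in the diagram, here is a minimal Python sketch of turning a change into a fact and matching it against subscriptions. All names here (`Fact`, `Subscription`, `Candidate`, `threshold_fact`, `match`, the `"parking_threshold"` kind) are hypothetical and are not Hangangjari's actual code. The sketch also assumes quiet hours are carried forward as a flag rather than silently dropping the candidate, so a later stage can record the reason.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    kind: str       # hypothetical kind name, e.g. "parking_threshold"
    target_id: str  # the park or lot the fact is about
    value: int      # e.g. remaining spaces after the change


@dataclass(frozen=True)
class Subscription:
    user_id: str
    target_id: str
    enabled_kinds: frozenset
    quiet_hours: Optional[Tuple[int, int]] = None  # (start_hour, end_hour)


@dataclass(frozen=True)
class Candidate:
    fact: Fact
    subscription: Subscription
    quiet: bool  # True when the match falls inside the user's quiet hours


def threshold_fact(lot_id: str, previous: int, current: int,
                   threshold: int) -> Optional[Fact]:
    """Emit a fact only when remaining spaces cross the threshold downward."""
    if previous >= threshold > current:
        return Fact("parking_threshold", lot_id, current)
    return None


def in_quiet_hours(sub: Subscription, hour: int) -> bool:
    if sub.quiet_hours is None:
        return False
    start, end = sub.quiet_hours
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end  # window wraps past midnight


def match(fact: Fact, subs: List[Subscription], hour: int) -> List[Candidate]:
    """Pair the fact with subscriptions for the same target and an enabled kind."""
    return [
        Candidate(fact, s, in_quiet_hours(s, hour))
        for s in subs
        if s.target_id == fact.target_id and fact.kind in s.enabled_kinds
    ]
```

A change from 12 to 8 remaining spaces with a threshold of 10 produces a fact, while a change from 8 to 6 does not, because the lot was already below the threshold.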

First, there are rules for not sending

Suppression is not failure. It is a choice that protects users.

Hangangjari has explicit suppression rules.

| Example | Reason |
| --- | --- |
| Suppress broad parking-lot state-change notifications | Users are closer to wanting remaining-space threshold alerts |
| Suppress threshold alerts for lots without remaining-space values | There is no actionable number |
| Suppress broad park state-change notifications | Too broad, with unclear user action |
| Suppress unclear crowding changes | A safe message body cannot be made |

These rules are not about “sending fewer notifications.” They are closer to “send only notifications users can act on.”
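
One way to keep such rules explicit is a single function that returns a reason code when a rule applies and `None` otherwise. The sketch below is only an illustration of the four rules listed above. The fact kinds and reason-code strings are made up for the example, and "unclear crowding" is assumed to mean that no crowding level is available.

```python
from typing import Optional


def suppression_reason(kind: str,
                       remaining_spaces: Optional[int] = None,
                       crowding_level: Optional[str] = None) -> Optional[str]:
    """Return a reason code if an explicit suppression rule applies, else None.

    Kind names and reason codes are hypothetical.
    """
    if kind == "parking_state_change":
        # Broad lot state changes: users want threshold alerts instead.
        return "broad_parking_state_change"
    if kind == "parking_threshold" and remaining_spaces is None:
        # No remaining-space value means no actionable number.
        return "no_remaining_space_value"
    if kind == "park_state_change":
        # Too broad, with unclear user action.
        return "broad_park_state_change"
    if kind == "crowding_change" and crowding_level is None:
        # Assumption: "unclear" means no level to build a safe message from.
        return "unclear_crowding_change"
    return None
```

Returning the reason code, rather than a bare boolean, means the caller has something to write into the audit record at the moment it decides not to send.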

User settings and impact are checked together

Even after a candidate passes suppression rules, it does not immediately enter the outbox. The system checks the user’s notification mode together with how important the fact is.

It asks roughly:

  • Did the user turn notifications off?
  • Is this change high urgency?
  • In parking-first mode, is this an important parking signal?
  • In outing-brief mode, is this an important park/event signal?
  • Otherwise, is it important enough for smart mode?

The result is push_now or suppress, and it must leave a reason code.

flowchart TD
  Candidate["Matched candidate"] --> Off{"Notifications off?"}
  Off -->|Yes| Suppress["Record suppression reason"]
  Off -->|No| Rule{"Stop rule applies?"}
  Rule -->|Yes| SuppressRule["Suppress with reason"]
  Rule -->|No| Tier{"Important or T0?"}
  Tier -->|Yes| Push["push_now"]
  Tier -->|No| Mode["Mode-specific decision"]
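
The decision order in the diagram can be sketched as a small pure function. This is an assumption-laden illustration, not the real policy: the mode names, category names, and reason codes are hypothetical, T0 is treated as equivalent to high urgency, and how "important" is computed is not described here, so it is passed in as a boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str       # "push_now" or "suppress"
    reason_code: str  # every decision carries a reason


def decide(mode: str, category: str, high_urgency: bool, important: bool,
           rule_reason: Optional[str] = None) -> Decision:
    """Apply the checks in the same order as the flowchart."""
    if mode == "off":
        return Decision("suppress", "notifications_off")
    if rule_reason is not None:
        # An explicit stop rule already matched; keep its reason code.
        return Decision("suppress", rule_reason)
    if high_urgency:
        return Decision("push_now", "high_urgency")
    if mode == "parking_first":
        if category == "parking" and important:
            return Decision("push_now", "parking_first_important")
        return Decision("suppress", "parking_first_not_relevant")
    if mode == "outing_brief":
        if category in ("park", "event") and important:
            return Decision("push_now", "outing_brief_important")
        return Decision("suppress", "outing_brief_not_relevant")
    # Any other mode is treated as smart mode in this sketch.
    if important:
        return Decision("push_now", "smart_important")
    return Decision("suppress", "smart_below_bar")
```

Because both branches return a `Decision` with a reason code, a `push_now` can be explained in the same way as a `suppress`.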

Not-sent reasons are also one-line records

If a notification was not sent, that reason also has to be recorded.

“Why didn’t it go?” is a common notification question. The answer could be that no candidate was created, that no subscription matched, that a stop rule applied, or that the item entered the outbox and failed delivery. The records have to tell these cases apart.

The delivery-decision record stores subscription, fact, target, decision, reason code, policy mode, and policy version. It also prevents the same fact/subscription pair from being processed twice.
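
A minimal sketch of such a record, using SQLite from the Python standard library. The table and column names are assumptions based on the fields listed above, and the real storage engine is not stated here. The point of the example is the uniqueness constraint on the fact/subscription pair, which makes a second attempt to process the same pair a no-op.

```python
import sqlite3

# Hypothetical schema: one row per (fact, subscription) decision.
SCHEMA = """
CREATE TABLE delivery_decision (
    subscription_id TEXT NOT NULL,
    fact_id         TEXT NOT NULL,
    target_id       TEXT NOT NULL,
    decision        TEXT NOT NULL,   -- 'push_now' or 'suppress'
    reason_code     TEXT NOT NULL,
    policy_mode     TEXT NOT NULL,
    policy_version  TEXT NOT NULL,
    UNIQUE (fact_id, subscription_id)
)
"""

INSERT = """
INSERT INTO delivery_decision
    (subscription_id, fact_id, target_id, decision,
     reason_code, policy_mode, policy_version)
VALUES
    (:subscription_id, :fact_id, :target_id, :decision,
     :reason_code, :policy_mode, :policy_version)
"""


def record_decision(conn: sqlite3.Connection, row: dict) -> bool:
    """Insert the decision; return False if the pair was already recorded.

    Any IntegrityError is treated as a duplicate in this sketch. A real
    implementation would distinguish it from other constraint failures.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False
```

Suppressed and sent decisions land in the same table, so “why did this go out?” and “why did that not go out?” become the same query with a different `decision` value.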

This record is what makes it possible to judge whether notifications are working and to tune them.

In practice, when tuning notifications, stopped notifications are often more important to inspect than sent notifications. Too much suppression may mean the facts are vague or subscription matching is off. Too little suppression may make the app noisy. Suppression is not “failure”; it is an intentional choice.

APNs delivery happens from the outbox

The outbox is a buffer that keeps APNs delivery out of the API request path.

sequenceDiagram
  autonumber
  participant Worker as Push worker
  participant Outbox as Outbox table
  participant Dispatcher as Push dispatcher
  participant APNs as APNs
  participant Repo as Delivery repository

  Worker->>Outbox: Enqueue notification for delivery
  Dispatcher->>Outbox: Fetch notification ready to send
  Dispatcher->>APNs: Send message to APNs
  alt Success
    Dispatcher->>Repo: Record delivery success
  else Invalid device token
    Dispatcher->>Repo: Record terminal failure and token-cleanup candidate
  else APNs auth error
    Dispatcher->>Repo: Defer remaining work
  else Retryable failure
    Dispatcher->>Repo: Store next attempt time and backoff
  end

When sending from the outbox, the system handles these cases differently:

  • Expired items are not sent.
  • Messages without prepared translations become terminal failures.
  • Invalid device tokens become cleanup candidates.
  • When APNs authentication fails, remaining items are deferred instead of being forced through.
  • Retryable failures reflect attempt count and backoff.
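
The cases above can be sketched as one dispatch loop. This is a simplified illustration under several assumptions: `send` is a hypothetical callable standing in for the APNs client, the status strings and `OutboxItem` fields are invented for the example, and the exponential backoff parameters (30-second base, one-hour cap) are arbitrary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Outcome(Enum):
    SUCCESS = "success"
    INVALID_TOKEN = "invalid_token"
    AUTH_ERROR = "auth_error"
    RETRYABLE = "retryable"


@dataclass
class OutboxItem:
    id: str
    expires_at: float
    has_translation: bool
    attempts: int = 0
    status: str = "pending"
    next_attempt_at: float = 0.0
    failure_reason: Optional[str] = None


def backoff_seconds(attempts: int, base: float = 30.0, cap: float = 3600.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap`."""
    return min(cap, base * 2 ** (attempts - 1))


def dispatch(items: List[OutboxItem],
             send: Callable[[OutboxItem], Outcome],
             now: float) -> List[str]:
    """Process items in order; return ids whose device tokens need cleanup."""
    cleanup: List[str] = []
    auth_failed = False
    for item in items:
        if auth_failed:
            item.status = "deferred"  # do not force sends through a bad credential
            continue
        if item.expires_at <= now:
            item.status = "expired"   # expired items are never sent
            continue
        if not item.has_translation:
            item.status, item.failure_reason = "failed", "missing_translation"
            continue
        outcome = send(item)
        if outcome is Outcome.SUCCESS:
            item.status = "sent"
        elif outcome is Outcome.INVALID_TOKEN:
            item.status, item.failure_reason = "failed", "invalid_token"
            cleanup.append(item.id)
        elif outcome is Outcome.AUTH_ERROR:
            item.status = "deferred"
            auth_failed = True        # defer everything after this point
        else:
            item.attempts += 1
            item.status = "retry"
            item.next_attempt_at = now + backoff_seconds(item.attempts)
    return cleanup
```

Each branch leaves a different status or failure reason on the item, which is what later allows the failure types to be counted separately instead of as one number.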

I watch reasons, not only counts

For push, success count is not enough. Sent count, suppressed count, retry count, invalid device tokens, long-waiting outbox items, and backlog by priority all need to be seen together.

Suppression metrics are especially useful because they show whether notifications are too cautious or too noisy. If too many are suppressed, facts may be vague or subscription rules may be misaligned. If too few are suppressed and too many are sent, users will turn notifications off.

So a dashboard that only watches outbox backlog is insufficient. It also needs to show how much was filtered before the outbox, which reason codes are increasing, and where invalid tokens and APNs auth errors diverge.
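
As a small illustration of watching reasons rather than only counts, the sketch below summarizes a batch of decision records by action and by suppression reason. The record shape (dicts with `decision` and `reason_code` keys) and the function name are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize(decisions: Iterable[Dict[str, str]]) -> dict:
    """Count decisions by action and suppressed decisions by reason code."""
    decisions = list(decisions)
    by_action = Counter(d["decision"] for d in decisions)
    by_reason = Counter(
        d["reason_code"] for d in decisions if d["decision"] == "suppress"
    )
    total = len(decisions)
    return {
        "sent": by_action["push_now"],
        "suppressed": by_action["suppress"],
        # Share of all decisions that were filtered before the outbox.
        "suppression_rate": by_action["suppress"] / total if total else 0.0,
        "suppressed_by_reason": dict(by_reason),
    }
```

A rising count for a single reason code points at a specific rule or data source, which a total suppressed count alone would hide.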

Notification quality did not end at send count

Push had to ask “is it okay to send?” before “can we send?” Suppression is a choice that protects users, and reasons for not sending need to be audited with reason codes.

The outbox isolates API requests from APNs delivery failures. Missing translations, invalid device tokens, APNs auth errors, and retryable failures are different problems. If they are merged into one failure count, the next action becomes unclear.

The final question in notification work was not “how many did we send?” It was whether I could explain why we sent, why we did not send, and whether that choice helped users.

That explanation became possible only when stopped notifications and sent notifications were recorded together.
