Write to Postgres Before You Publish the Event

#postgres #real-time #distributed-systems

We run Centrifugo over WebSockets for browser realtime, with NATS as the fan-out broker for workers. A request that creates a comment has three side effects: a row in Postgres, an event over NATS, and a Centrifugo publication to the subscribed clients.

The ordering question is small but consequential.

The wrong order

await centrifugo.publish(channel, payload); // client sees the comment
await db.insert(comments).values(row);      // database catches up

If the second line throws, the client has already rendered a comment that does not exist server-side. A refresh makes it disappear. Worse, a client that connects after the publication but before the row lands asks the database for history and gets a result without the comment, while already-connected clients are showing it.

The right order

const [inserted] = await db.insert(comments).values(row).returning(); // row exists first
await centrifugo.publish(channel, { ...payload, id: inserted.id });   // then clients hear about it

The database is the source of truth. The realtime event is a courtesy — it lets connected clients skip a refetch. If the publish fails after the write succeeds, the worst outcome is that clients learn about the change on their next query. That’s recoverable. The reverse is not.
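A minimal sketch of that failure handling, under stated assumptions: CommentStore, Publisher, CreateResult and MemoryStore are hypothetical names invented for this example, standing in for the real database and Centrifugo clients. The insert is awaited first and allowed to throw. A publish error is caught and reported instead of failing the request, because the row already exists.

```typescript
// Hypothetical stand-ins for the real database client and Centrifugo client.
interface CommentRow { id: number; body: string; }
interface CommentStore { insert(body: string): Promise<CommentRow>; }
interface Publisher { publish(channel: string, data: unknown): Promise<void>; }

interface CreateResult { comment: CommentRow; published: boolean; }

async function createComment(
  store: CommentStore,
  publisher: Publisher,
  channel: string,
  body: string,
): Promise<CreateResult> {
  // Step 1: the write. If this throws, nothing has been announced yet.
  const comment = await store.insert(body);

  // Step 2: the courtesy event. A failure here is logged, not rethrown,
  // because clients can still learn about the row on their next query.
  try {
    await publisher.publish(channel, comment);
    return { comment, published: true };
  } catch (err) {
    console.warn("publish failed; clients will catch up on refetch", err);
    return { comment, published: false };
  }
}

// In-memory store so the sketch runs without a database (assumption, not real code).
class MemoryStore implements CommentStore {
  rows: CommentRow[] = [];
  async insert(body: string): Promise<CommentRow> {
    const row: CommentRow = { id: this.rows.length + 1, body };
    this.rows.push(row);
    return row;
  }
}
```

The caller can use the published flag to decide whether to schedule a retry. Either way the response reflects a row that exists.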

When you can’t await both

For high-volume systems the publish step is sometimes moved into a worker: write the row, enqueue a job, return. The worker publishes. This preserves the invariant, since Postgres still leads, and adds a retry buffer between the two steps. We do this for chatbot responses: the LLM call happens entirely in the worker, but the realtime fan-out still waits until the row is written.
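The worker variant could be sketched roughly as follows. Everything here is illustrative: the Map stands in for Postgres, the array stands in for the job queue, and the names handleCreate, runJob, drainQueue and the maxAttempts parameter are assumptions made for this example. The point is only the ordering: the job carries an id, and the worker reads the stored row before it publishes.

```typescript
// Hypothetical shapes for this sketch.
interface StoredComment { id: number; body: string; }
interface PublishJob { channel: string; commentId: number; }
type Publish = (channel: string, data: StoredComment) => Promise<void>;

const table = new Map<number, StoredComment>(); // stands in for Postgres
const queue: PublishJob[] = [];                 // stands in for the job queue

// Request path: write the row, enqueue a job, return.
function handleCreate(channel: string, body: string): StoredComment {
  const comment: StoredComment = { id: table.size + 1, body };
  table.set(comment.id, comment);                 // the write comes first
  queue.push({ channel, commentId: comment.id }); // then the job
  return comment;
}

// Worker path: publish only what is already stored, with bounded retries.
async function runJob(job: PublishJob, publish: Publish, maxAttempts = 3): Promise<boolean> {
  const comment = table.get(job.commentId);
  if (!comment) return false; // no row, so there is no fact to announce
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await publish(job.channel, comment);
      return true;
    } catch {
      // Transient failure: try again. A real worker would back off here.
    }
  }
  return false;
}

// Processes every queued job once and returns how many were delivered.
async function drainQueue(publish: Publish): Promise<number> {
  let delivered = 0;
  for (let job = queue.shift(); job !== undefined; job = queue.shift()) {
    if (await runJob(job, publish)) delivered++;
  }
  return delivered;
}
```

A job that exhausts its attempts is simply dropped in this sketch, which matches the recoverable case described earlier: clients pick up the row on their next query.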

The rule

The database write is what makes a fact real. Anything that depends on that fact — events, caches, search indexes — runs after, and is allowed to lag, but is never allowed to lead.