PostgreSQL read replicas: replication lag is not a bug—it is a contract (field notes)

Read-replica rationale, WAL lag, replica-side SQL, hot_standby_feedback, planner drift, failover/DNS, and an adoption checklist—plus API routing, RYW, pooling, conflict cancels, and on-call order. itemSCV PostgreSQL field notes.

Key takeaway

In one line: Before you funnel SELECTs to a replica, product and backend must agree how stale reads can be. Without that agreement, “data looks wrong sometimes” tickets never end.

| Question | Why it matters |
| --- | --- |
| Must a user see a write immediately? | If unanswered: missing rows on replica → user reports |
| Can dashboards lag 30s? | If yes: replicas can save cost and load |
| Balance right after payment? | Primary-only routing rules required |

Async replication and lag


Opening: “Replication is fine—why don’t I see it?”

A familiar on-call pattern: “We just updated but the UI didn’t change.” Writes hit primary; reads habitually go to a read-only endpoint. PostgreSQL is not wrong—the app ignores the physical time gap of async replication.

On event and reward tables at itemSCV, reads-after-writes are common. We tried “replica only” first, then moved specific APIs back to primary more than once. This article captures the shared vocabulary and checklists from that work. Concepts apply whether you use managed RDS, AlloyDB, or self-hosted Postgres (February 2026 baseline).

Unless stated otherwise, this post assumes physical streaming replication (WAL). Logical replication and external CDC have different lag and consistency—do not reuse the same routing table blindly.


1. Why split reads in the first place?

The slogan is “reduce primary load,” but in practice you usually want one of:

  1. CPU/IO isolation: heavy analytics/reporting queries stop stealing OLTP disk and CPU
  2. Availability buffer: ties into failover/promotion stories (this post stays focused on read routing)
  3. Deploy/migrations: long read-only jobs on a replica

If you add replicas only because “SELECT feels slow” without handling lag, perceived quality can get worse.

1-1. Do not start with a replica when…

| Situation | Look here first |
| --- | --- |
| One query hogs primary | Plans, indexes, statistics (ANALYZE), partitioning |
| Connection count explodes | Pool sizing, idle timeouts, app leaks |
| Disk IO caps | Instance class, storage type, full-table scans |

A replica does not make a slow query fast. A slow query offloaded to a replica usually stays slow there, and may also interfere with WAL replay.


2. Async means lag is a gap, not a failure

With typical streaming replication, the replica is a separate process catching up on WAL. Write bursts on primary leave the replica briefly behind. That is normal.

| Piece | One-liner |
| --- | --- |
| Primary | Writes WAL and ships it to standbys |
| Replica | Replays received WAL into data files |
| Lag | Time/bytes gap from ship vs replay speed |

Where to send reads

The team’s job is not to wish for “zero lag” in slides, but to write down:

  • Product SLO: e.g. “list views may lag up to N seconds”
  • Routing rules: e.g. “user’s own resource reads go to primary” in code or middleware

Without that, the database always gets blamed.

synchronous_commit / sync standbys change write latency and availability—a different axis from “read staleness.” For read consistency, app routing usually comes first.


3. Common application-level compromises

A. “Just wrote—use primary” pattern

For a session, user, or order id, stick to primary for a short window after writes. Implementation varies (cookie, Redis flag, gateway header). The point is encoding who needs freshness for how long.

B. Read-only APIs on replica only

Dashboards, internal admin, batch reports—paths where slightly old data is OK—can stay on replica. If caches are involved, align cache TTL with lag SLO.

C. Pitfall inside a transaction

Reads after writes in the same transaction must use the same session and primary. Trying to read from a replica “in the same request” means the design is already wrong.

Right after COMMIT, some ORMs attach the next read to a replica connection even outside an explicit transaction block. Post-commit reads break RYW easily—make those paths explicitly primary.

D. ORM/router footguns

| Mistake | Outcome |
| --- | --- |
| “Read-only” flag still points at the same pool | You never hit replica |
| Lazy loads use the read connection | Rows “vanish” right after write in the UI |
| Batch jobs read-only on replica but drive decisions | Stale decisions without FOR UPDATE where needed |

In review, confirm two DSN strings exist and smoke tests run against both primary and replica.
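
The DSN check can be automated as a tiny fail-fast assertion (a sketch; DSN strings are illustrative):

```python
from urllib.parse import urlparse

def check_split(primary_dsn: str, replica_dsn: str) -> None:
    """Fail fast if the 'read-only' config secretly points at primary."""
    p, r = urlparse(primary_dsn), urlparse(replica_dsn)
    assert (p.hostname, p.port) != (r.hostname, r.port), \
        "replica DSN resolves to the same host:port as primary"
```

Run it at app startup or in CI alongside the smoke tests, so a config renamed “replica” that still targets primary is caught before deploy.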


4. Monitoring: don’t alarm on zero—alarm on SLO

For replicas, what matters is less whether lag is exactly zero than whether it exceeds the N seconds we promised.

What to chart on replicas

Metric names differ by platform; treat numbers as directional.

| Signal | Why it matters |
| --- | --- |
| Replay lag (time or bytes) | Early signal for write bursts, network, or disk |
| Replica connection count | Without pooling, the replica dies first |
| Long-running SELECTs | Tied to max_standby_streaming_delay / cancel behavior |

On primary, pair pg_stat_replication with replica-side recovery lag views/metrics to see where the bottleneck is.

Example on primary (session-level sanity; column names vary by version):
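
A sketch of such a query, assuming PostgreSQL 10+ (the interval columns write_lag/flush_lag/replay_lag exist from version 10; the byte aliases are ours):

```sql
-- On primary: per-standby lag, in bytes and (PG 10+) as intervals.
SELECT application_name,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS send_lag_bytes,
       pg_wal_lsn_diff(sent_lsn, replay_lsn)           AS replay_lag_bytes,
       write_lag,
       flush_lag,
       replay_lag
FROM pg_stat_replication;
```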

On the replica, receive vs replay LSN (check version/permissions):
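
A sketch, again assuming PostgreSQL 10+ function names (the replay_queue_bytes alias matches the discussion below):

```sql
-- On a replica: how far replay trails what has been received.
SELECT pg_last_wal_receive_lsn()                 AS receive_lsn,
       pg_last_wal_replay_lsn()                  AS replay_lsn,
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                       pg_last_wal_replay_lsn()) AS replay_queue_bytes,
       now() - pg_last_xact_replay_timestamp()   AS replay_delay;
```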

If replay_queue_bytes stays large, suspect replay (CPU, disk, long queries/conflicts). Pair primary replay_lag_bytes with replica queue to split network vs replay bottlenecks.

4-1. Alerts: SLO and trend, not “non-zero”

| Anti-pattern | Why it fails |
| --- | --- |
| Page if lag > 0 | Noise every burst; alerts get ignored |
| One fixed threshold forever | Meaningless after traffic grows |
| Watch only replica CPU | Lag can pile up in the WAL queue with idle CPU |

Alert when you breach the promised N seconds (or N MB), or when p95 worsens for several days.

Operators should read this as “how many times our SLO”, not “must always be zero”.
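
That framing can be made literal in the alert logic (a sketch; function names and the sustained-breach count are ours to pick):

```python
def lag_slo_ratio(lag_s: float, slo_s: float) -> float:
    """Express lag as 'how many times our SLO', not an absolute number."""
    return lag_s / slo_s

def should_page(lag_s: float, slo_s: float, sustained_breaches: int,
                min_breaches: int = 3) -> bool:
    # Page only on a sustained SLO breach, never on a single burst.
    return lag_slo_ratio(lag_s, slo_s) > 1.0 and sustained_breaches >= min_breaches
```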


5. Hall of anti-patterns

  • “All reads go to replica” → mystery logins for users who just signed up
  • Lag alerts too tight → nightly pages people learn to ignore
  • Heavy long queries on OLTP replica only → fighting replay
  • Flip ORM to replica URL with no agreement → an hour-long blame game
  • Expose sequences/counters directly → replica may disagree with primary on “next value” users see
  • “Read-only batch” on replica without checking → temp tables, COPY, some extensions are blocked or risky

| One-line symptom | Suspect |
| --- | --- |
| “Sometimes the row is missing” | RYW, cache, replica lag |
| “Only replica 5xx” | Connection storm, conflict cancels, instance limits |
| “Only after migration” | Wrong endpoint wiring, pool warm-up |

6. Sync replication in one line

“Can’t we just use synchronous commit?” That trades write latency and availability. Some financial flows justify it; most web stacks do better with app routing + async + SLO. Before enabling sync, re-measure write p99.


6-1. One-page API table—“which endpoint goes where?”

A single table ends a lot of meetings. Fictional e-commerce example:

| API / screen | Read target | Reason |
| --- | --- | --- |
| After POST /orders, GET /orders/:id | primary | Consistency right after payment/inventory |
| Product list, search | replica (30s SLO) | Spread load on traffic/cache miss |
| “My orders” first load | replica | Slight lag usually OK |
| Order detail right after checkout | primary or short RYW window | User expects what they just saw |
| Admin daily revenue rollup | replica + long timeout | Isolate OLTP IO |

Practice: keep this next to OpenAPI or in Notion and ask in PR review: “Can this SELECT hit replica?”
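
The same table can live next to the code so middleware and PR review share one source of truth. A sketch, using the fictional endpoints above:

```python
# Routing table as data: target and lag SLO per read endpoint.
ROUTING = {
    ("GET", "/orders/:id"):    {"target": "primary"},
    ("GET", "/products"):      {"target": "replica", "lag_slo_s": 30},
    ("GET", "/me/orders"):     {"target": "replica", "lag_slo_s": 30},
    ("GET", "/admin/revenue"): {"target": "replica", "lag_slo_s": 300},
}

def read_target(method: str, route: str) -> str:
    # Unlisted read paths default to primary: a missing entry then shows
    # up as primary load, not as a stale-read bug report.
    return ROUTING.get((method, route), {}).get("target", "primary")
```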


6-2. Read-your-writes—“primary for 5 seconds” pattern

Frameworks differ; the contract is the same: “For T seconds after a write, reads for that user/resource go to primary.”

| Approach | Pros | Caveats |
| --- | --- | --- |
| Session last_write_at + middleware | Simple | Clock skew, multi-tab, mobile concurrency |
| Redis user:123:last_write TTL 10s | Fits stateless app tier | Needs a fallback if Redis is down |
| Response header X-Use-Primary-Until | Works with gateways | Needs client cooperation |

Set TTL around replication lag p99 + margin. “Forever primary” negates having a replica.


6-3. Replica-only failures—max_standby_streaming_delay

Long SELECTs on a replica can block WAL replay; Postgres may cancel queries (e.g. “canceling statement due to conflict with recovery”).

| Action | Notes |
| --- | --- |
| Long reports off peak or on a dedicated reporting replica | Cleanest split from the OLTP replica |
| Tune max_standby_streaming_delay | Raising it without agreement can grow visible replication lag |
| Vacuum tuning | Old xmin can increase conflicts (workload-dependent) |
| Review hot_standby_feedback | Old replica transactions can delay primary vacuum → bloat/conflicts; enable only with the tradeoff understood |

Practice: if only the replica is dying, grep the conflict-cancel logs first. A healthy primary plus replica 5xx often matches this picture.
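
The line to grep for is PostgreSQL's own message; shown here against sample log lines (in production, point grep at your replica's log file):

```shell
# Count recovery-conflict cancellations; the message text is PostgreSQL's.
sample='2026-02-01 10:00:01 UTC ERROR:  canceling statement due to conflict with recovery
2026-02-01 10:05:00 UTC LOG:  checkpoint starting'
printf '%s\n' "$sample" | grep -c "conflict with recovery"
```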


6-4. Connection pooling (PgBouncer, etc.) and replica URLs

If apps open a storm of connections to the replica, the replica dies while primary looks healthy.

| Check | Why |
| --- | --- |
| Pool in front of the replica too | Whether numbackends hits instance limits |
| ORM “read-only” sessions use a genuinely different DSN | Catches configs renamed “replica” while still pointing at primary |
| Batch worker pool size | 500 connections from one box to the replica ends badly |

Even with a managed reader endpoint, multiply app pool size by instance count and sanity-check totals.
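
The sanity check is simple multiplication, worth writing down once (a sketch; parameter names and the 80% headroom are our choices):

```python
def total_replica_connections(app_instances: int, pool_size: int,
                              batch_workers: int = 0, batch_pool: int = 0) -> int:
    """Worst-case connections a fleet can open against one reader endpoint."""
    return app_instances * pool_size + batch_workers * batch_pool

def fits(total: int, max_connections: int, headroom: float = 0.8) -> bool:
    # Leave headroom for superuser slots, monitoring, and failover churn.
    return total <= int(max_connections * headroom)
```

For example, 20 app instances with a pool of 20 plus one batch box with a pool of 50 already means 450 potential connections against the replica's max_connections.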


6-5. On-call order when you suspect stale reads

| Step | Check |
| --- | --- |
| 1 | Which DSN the request used (logs/APM) |
| 2 | Whether primary vs replica replay lag breached SLO |
| 3 | Recent bulk writes, migrations, or vacuum |
| 4 | Cache TTL / CDN serving stale responses (don’t blame DB alone) |
| 5 | Whether post-write read paths match the table and code |

Skipping step 4 burns an hour on Postgres for nothing.


7. “Same DB, different plan?”—stats, bloat, and the planner

Replicas follow data, but planner statistics are not guaranteed identical to primary. Divergent ANALYZE timing or autovacuum can make the same query seq-scan on replica only.

| Check | Meaning |
| --- | --- |
| EXPLAIN (ANALYZE, BUFFERS) on replica (careful with load) | If plan differs from primary, chase stats/config/cache |
| Table bloat / dead tuple ratio | Tied to replay, vacuum, and long-running queries |
| Whether hot_standby_feedback is on | Can slow primary vacuum and indirectly hurt replica queries |

Practice: when “the replica is slow,” diff the execution plans once before blaming lag.


8. Failover, DNS, and reader endpoints

After managed failover, reader endpoints may point at a new instance. Apps can cling to old hosts because of DNS TTL and connection pool reuse.

| Check | Notes |
| --- | --- |
| Reader endpoint vs per-instance DNS | How failover is documented to behave |
| Pool idle timeout | Too long → stale sockets after promotion |
| App retries | Whether transient resets are absorbed |

Primary failover opens RPO/RTO discussions; read replicas also see traffic slamming the newly healthy node.
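
The idle-timeout point can be sketched as a toy pool that refuses to reuse old sockets, so a promoted endpoint is picked up on the next checkout (illustrative only; real pools like PgBouncer or your driver's pool expose this as a recycle/idle setting):

```python
import time

class RecyclingPool:
    """Toy pool: drop connections idle longer than max_idle_s so a
    failover's DNS change is picked up on the next checkout."""
    def __init__(self, connect, max_idle_s=30.0):
        self._connect, self._max_idle_s = connect, max_idle_s
        self._idle = []  # list of (conn, returned_at)

    def checkout(self, now=None):
        now = time.monotonic() if now is None else now
        while self._idle:
            conn, returned_at = self._idle.pop()
            if now - returned_at <= self._max_idle_s:
                return conn          # fresh enough: reuse
            # else: stale socket, discard and keep looking
        return self._connect()       # nothing reusable: dial anew

    def checkin(self, conn, now=None):
        self._idle.append((conn, time.monotonic() if now is None else now))
```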


9. One-page checklist before you add a replica

  • SLO: per-surface allowed lag (seconds/MB) is written down
  • Routing table: APIs/batches marked primary / replica / conditional (RYW)
  • On-call runbook: stale-read steps 1–5 + which log fields
  • Monitoring: lag on both sides, connections, conflict cancel logs
  • Pooling: PgBouncer or pool size vs instance max_connections
  • Reporting/batch: separate reporting replica vs OLTP replica?
  • Failover: at least one line on DNS, pools, and retry policy

Closing

A PostgreSQL read replica is less a performance switch than a switch that changes your consistency model. “Replicated” does not mean immediately readable; product must accept the gap.

For new projects, before adding a replica, write one page: which APIs use primary, which use replica, and allowed lag in seconds. Add RYW TTL, on-call order, and pool behavior on failover in a line each—that saves on-call rotations later.
