RM Full Stack & AI Engineer

What is a Write-Ahead Log?

A Write-Ahead Log (WAL) is a durability and crash-recovery technique used in databases and storage systems: every change is first recorded in an append-only log on disk before it is applied to the main data files. This 'log first, apply second' discipline means that once a commit has been acknowledged, its changes survive a crash, even one in the middle of writing data pages, provided the log records reached stable storage first.

What It Is

A Write-Ahead Log is a sequential, append-only file that captures every intended modification to a database as a log record before the affected data pages are written. Each record typically carries a Log Sequence Number (LSN), the ID of the transaction that made the change, and enough information to redo the change (an after-image or a description of the operation). Undo/redo designs such as ARIES also log the before-image so that uncommitted changes can be rolled back, whereas PostgreSQL's WAL is redo-only. Because appending to a log is sequential I/O, it is much faster than updating data pages scattered across the disk. PostgreSQL, SQLite (in WAL mode), MySQL InnoDB (through its redo log), and RocksDB all rely on a write-ahead log as a core mechanism.
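To make the record layout concrete, here is a minimal Python sketch of framing WAL records for an append-only log. It assumes an undo/redo-style record (as in ARIES) with both a before-image and an after-image; a redo-only log such as PostgreSQL's would drop the before field. The JSON payload, the length-plus-CRC32 header, and the key field are hypothetical choices made for readability, not any particular engine's on-disk format.

```python
import json
import struct
import zlib
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass(frozen=True)
class WalRecord:
    lsn: int               # log sequence number, increases with every record
    txid: int              # transaction that made the change
    key: str               # hypothetical: the item that changed
    before: Optional[str]  # before-image (undo info); None for an insert
    after: Optional[str]   # after-image (redo info); None for a delete

# Hypothetical framing: 4-byte payload length + 4-byte CRC32, little-endian.
HEADER = struct.Struct("<II")

def encode(rec: WalRecord) -> bytes:
    """Frame one record so it can be appended to the end of the log."""
    payload = json.dumps(asdict(rec)).encode("utf-8")
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def decode_all(data: bytes) -> list[WalRecord]:
    """Read records in log order, stopping at a torn or corrupt tail."""
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete or damaged record: treat it as the end of the log
        records.append(WalRecord(**json.loads(payload)))
        pos = start + length
    return records

# Usage: append two records, then simulate a crash that tore the last append.
wal_bytes = bytearray()
wal_bytes += encode(WalRecord(1, 7, "balance:alice", "100", "80"))
wal_bytes += encode(WalRecord(2, 7, "balance:bob", "50", "70"))
intact = decode_all(bytes(wal_bytes))
after_crash = decode_all(bytes(wal_bytes[:-3]))
```

The checksum lets a reader detect a record that was only partly written when the system crashed and treat it as the end of the log rather than as valid data.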

Why It Matters

WAL underpins two critical guarantees: atomicity (a transaction either takes effect completely or not at all) and durability (committed data survives crashes). Without it, a power failure in the middle of a write could leave data pages partially updated and the database in an inconsistent state. Beyond crash safety, WAL enables efficient replication, since replicas can apply the same stream of log records, and point-in-time recovery, in which a base backup is restored and archived WAL is replayed up to a chosen moment.
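Point-in-time recovery can be sketched in a few lines, assuming a key-value model in which each archived record carries a commit timestamp that increases with its LSN. The ArchivedRecord type and point_in_time_restore function are hypothetical. The idea is to start from a base backup taken at a known LSN and replay later records until the recovery target is passed; PostgreSQL exposes a comparable setting named recovery_target_time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchivedRecord:
    lsn: int
    commit_time: float  # hypothetical: commit timestamp, seconds since epoch
    key: str
    value: str

def point_in_time_restore(base_backup: dict[str, str], base_lsn: int,
                          archive: list[ArchivedRecord],
                          target_time: float) -> dict[str, str]:
    """Start from a base backup and replay archived WAL up to target_time."""
    state = dict(base_backup)  # never modify the backup itself
    for rec in sorted(archive, key=lambda r: r.lsn):
        if rec.lsn <= base_lsn:
            continue  # already contained in the base backup
        if rec.commit_time > target_time:
            break     # first change after the recovery target: stop replaying
        state[rec.key] = rec.value
    return state

# Usage: undo an accidental overwrite at t=300 by restoring to t=250.
backup = {"a": "1"}
archive = [
    ArchivedRecord(11, 100.0, "a", "2"),
    ArchivedRecord(12, 200.0, "b", "x"),
    ArchivedRecord(13, 300.0, "a", "oops"),
]
restored = point_in_time_restore(backup, 10, archive, target_time=250.0)
```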

How It Works

When a transaction commits, the engine first flushes the WAL up to and including the transaction's commit record to disk, using fsync or an equivalent durable-write call. Only after that flush succeeds is the commit acknowledged to the client. The same rule constrains data pages: a modified page may be written to the data files only after the WAL records describing its changes are on disk. In the background, a checkpoint process periodically writes dirty data pages to the main data files and records the checkpoint position in the WAL, which allows older log segments to be discarded. On restart after a crash, the engine reads the WAL from the last checkpoint and replays every record whose change is not yet reflected in the data files.
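The commit path can be sketched as a toy key-value store in Python. The class name and the one-JSON-line-per-transaction format are hypothetical, and the sketch leaves out checkpoints, so recovery replays the whole log. What it does show is the ordering: each record is written, flushed, and fsynced before the change is applied and the commit returns.

```python
import json
import os
import tempfile

class TinyWalStore:
    """Toy key-value store that logs first and applies second (hypothetical design)."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.data: dict[str, str] = {}  # stands in for the main data pages
        self._recover()

    def commit(self, changes: dict[str, str]) -> None:
        record = json.dumps(changes) + "\n"  # one log record per transaction
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(record)
            f.flush()             # Python buffer -> OS page cache
            os.fsync(f.fileno())  # OS page cache -> stable storage
        # The commit is durable only now, so apply it and acknowledge.
        self.data.update(changes)

    def _recover(self) -> None:
        """Rebuild state by replaying the log (no checkpoints in this sketch)."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                if not line.endswith("\n"):
                    break  # torn final write: that commit was never acknowledged
                self.data.update(json.loads(line))

# Usage: commit twice, then "restart" by opening a new store on the same log.
wal_file = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyWalStore(wal_file)
store.commit({"x": "1"})
store.commit({"x": "2", "y": "3"})
restarted = TinyWalStore(wal_file)
```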

Log Sequence Numbers (LSNs)

Every WAL record is stamped with a monotonically increasing Log Sequence Number that gives all changes a single global order (in PostgreSQL, an LSN is a byte position in the WAL stream). Each data page stores the LSN of the last record applied to it, so during recovery the engine can compare a record's LSN with the page's LSN and skip changes the page already contains. Replication systems compare LSNs to measure how far a replica lags behind the primary and to resume streaming from the correct position. PostgreSQL exposes LSNs through functions such as pg_current_wal_lsn() for operational visibility.
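The page-LSN comparison can be sketched as follows. The Page and RedoRecord types and the redo function are hypothetical simplifications that assume redo-only records and whole-value updates; real engines keep the page LSN in each page header and apply physical or logical changes to the page contents.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    page_lsn: int = 0  # LSN of the last change applied to this page
    rows: dict[str, str] = field(default_factory=dict)

@dataclass(frozen=True)
class RedoRecord:
    lsn: int
    page_id: int
    key: str
    value: str

def redo(pages: dict[int, Page], log: list[RedoRecord], redo_start: int) -> int:
    """Replay log records, skipping changes a page already contains.

    Returns how many records were actually applied."""
    applied = 0
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.lsn < redo_start:
            continue  # before the checkpoint's redo point
        page = pages.setdefault(rec.page_id, Page())
        if rec.lsn <= page.page_lsn:
            continue  # page already contains this change
        page.rows[rec.key] = rec.value
        page.page_lsn = rec.lsn
        applied += 1
    return applied

# Usage: page 1 reached disk after LSN 2, page 2 never did.
redo_log = [
    RedoRecord(1, 1, "a", "x"),
    RedoRecord(2, 1, "b", "y"),
    RedoRecord(3, 2, "c", "z"),
    RedoRecord(4, 1, "a", "w"),
]
pages = {1: Page(2, {"a": "x", "b": "y"}), 2: Page()}
first_pass = redo(pages, redo_log, redo_start=1)
second_pass = redo(pages, redo_log, redo_start=1)  # replaying again changes nothing
```

Because each page remembers the last LSN applied to it, replay is idempotent: running recovery twice, or crashing during recovery and starting over, leaves the pages in the same state.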

Checkpointing and Log Retention

Checkpointing flushes the dirty pages covered by the WAL to the main data files and records a safe restart point; WAL segments older than that point are no longer needed for crash recovery, though they may still be retained for archiving or replication. Checkpoints trade a burst of page writes for shorter recovery time and bounded WAL disk usage. Checkpoint frequency is therefore a key performance knob: checkpoints that are too frequent cause I/O spikes and extra writes, while checkpoints that are too infrequent lengthen crash recovery and let WAL accumulate. PostgreSQL, for example, triggers checkpoints by elapsed time (checkpoint_timeout) or by WAL volume (max_wal_size), and checkpoint_completion_target spreads the page writes across the interval to smooth out the I/O.
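A single-threaded toy model of checkpointing, with hypothetical names, shows how a checkpoint bounds both WAL size and recovery work. It assumes the WAL list is already durable, and it ignores that real checkpoints run concurrently with new writes.

```python
from dataclasses import dataclass, field

@dataclass
class CheckpointingStore:
    """Toy model of checkpointing; the names and structure are hypothetical."""
    wal: list[tuple[int, str, str]] = field(default_factory=list)  # (lsn, key, value)
    cache: dict[str, str] = field(default_factory=dict)      # in-memory pages, lost on crash
    dirty: set[str] = field(default_factory=set)              # keys changed since last checkpoint
    data_file: dict[str, str] = field(default_factory=dict)  # main data files on disk
    redo_point: int = 1  # recovery starts replaying at this LSN
    next_lsn: int = 1

    def write(self, key: str, value: str) -> None:
        self.wal.append((self.next_lsn, key, value))  # log first (assumed durable here)
        self.next_lsn += 1
        self.cache[key] = value                        # then apply in memory only
        self.dirty.add(key)

    def checkpoint(self) -> None:
        start = self.next_lsn  # every change below this LSN gets flushed now
        for key in self.dirty:
            self.data_file[key] = self.cache[key]
        self.dirty.clear()
        self.redo_point = start
        # Older records are no longer needed for crash recovery.
        self.wal = [r for r in self.wal if r[0] >= start]

    def recover(self) -> dict[str, str]:
        """Rebuild the state after a crash: data files plus the WAL tail."""
        state = dict(self.data_file)
        for lsn, key, value in self.wal:
            if lsn >= self.redo_point:
                state[key] = value
        return state

# Usage: two writes, a checkpoint, then one more write before a crash.
cp_store = CheckpointingStore()
cp_store.write("a", "1")
cp_store.write("b", "2")
cp_store.checkpoint()
cp_store.write("a", "3")
```

After the checkpoint only one record remains in the log, so recovery replays a single change on top of the data files instead of the whole history.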

Key Gotcha: fsync Must Not Be Disabled

The durability guarantee of WAL depends on log records being physically flushed to stable storage before a commit is confirmed. Disabling fsync (a common but dangerous performance hack) lets the operating system hold WAL writes in its memory cache, so an OS crash or power failure can silently lose committed data or corrupt the database entirely. Keep fsync enabled in production; to get the performance you need without sacrificing safety, use faster storage, battery-backed write caches, or a tuned wal_sync_method instead.
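The failure mode can be illustrated with a simulated disk in which writes land in a volatile page cache until fsync moves them to stable storage. This is an illustrative model with hypothetical names, not a real storage API.

```python
class SimulatedDisk:
    """Illustrative model: an OS page cache in front of stable storage."""

    def __init__(self) -> None:
        self.page_cache: list[str] = []  # written, but only in volatile memory
        self.stable: list[str] = []      # survives an OS crash or power failure

    def write(self, record: str) -> None:
        self.page_cache.append(record)

    def fsync(self) -> None:
        self.stable.extend(self.page_cache)
        self.page_cache.clear()

    def power_failure(self) -> None:
        self.page_cache.clear()  # anything not yet fsynced is gone

def lost_after_crash(fsync_enabled: bool) -> list[str]:
    """Commit three transactions, crash, and report acknowledged-but-lost ones."""
    disk = SimulatedDisk()
    acknowledged = []
    for record in ("t1", "t2", "t3"):
        disk.write(record)
        if fsync_enabled:
            disk.fsync()
        acknowledged.append(record)  # the client is told the commit succeeded
    disk.power_failure()
    return [r for r in acknowledged if r not in disk.stable]
```

The model shows the worst case. A real kernel also writes cached pages back on its own schedule, so how much is lost depends on timing, and a partially written data page can leave the database corrupt in ways this model does not capture.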
