Caching is the practice of storing copies of data in a faster-access layer so future requests can be served more quickly. Choosing the right caching strategy directly impacts application performance, consistency, and resource costs.
A cache is a high-speed data storage layer, such as Redis, Memcached, or an in-memory store, that holds a subset of data so requests can be fulfilled without hitting the slower origin source (e.g., a database or external API). Caches exploit the principle of locality: recently or frequently accessed data is likely to be accessed again soon. The fraction of all requests that are served from the cache is called the cache hit rate, and maximizing it is a primary optimization goal.
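The hit rate is simple arithmetic: hits divided by total requests. The sketch below counts both against a plain Python dict standing in for a real cache. `HitRateCounter` and the sample keys are hypothetical, and misses are only counted here, not filled.

```python
class HitRateCounter:
    """Tracks cache hits and misses to compute the hit rate (hypothetical helper)."""

    def __init__(self) -> None:
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        # Avoid division by zero before any requests are recorded.
        return self.hits / total if total else 0.0


cache = {"user:1": "Ada"}  # stand-in for a real cache; never filled on a miss here
stats = HitRateCounter()
for key in ["user:1", "user:2", "user:1", "user:1"]:
    stats.record(key in cache)

print(f"hit rate = {stats.hit_rate:.2f}")  # 3 hits / 4 requests -> 0.75
```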
In cache-aside (also called lazy loading), the application itself manages the cache: on a read, it checks the cache first; on a miss, it fetches from the database, stores the result in the cache, then returns it. This is the most common pattern, and it keeps memory usage efficient because only data that is actually requested gets cached. The key gotcha is that every miss, including the first request for any key, pays the full origin latency, and if many requests miss on the same key at the same time (for example, just after a popular entry expires), they can all hit the database at once, triggering a 'thundering herd' problem.
Write-through keeps the cache and database in sync by writing to both synchronously on every update: the write is acknowledged only after both have been updated, which ensures consistency at the cost of slightly higher write latency. Write-behind (or write-back) writes to the cache first and asynchronously flushes changes to the database later, improving write throughput but risking data loss if the cache node fails before flushing. Write-through is preferred when data consistency is critical; write-behind suits write-heavy workloads that can tolerate eventual persistence.
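The two write paths can be contrasted in a small Python sketch. It assumes a hypothetical in-memory `FakeDatabase` in place of a real store and a single background thread as the write-behind flusher. Production write-behind implementations usually batch and coalesce writes; this one flushes each write individually to keep the example short.

```python
import queue
import threading
from typing import Any, Dict


class FakeDatabase:
    """In-memory stand-in for the origin database (hypothetical)."""

    def __init__(self) -> None:
        self.rows: Dict[str, Any] = {}

    def save(self, key: str, value: Any) -> None:
        self.rows[key] = value


class WriteThroughCache:
    def __init__(self, db: FakeDatabase) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Synchronous: persist, then update the cache, then return.
        # The caller pays for the database write on every update.
        self.db.save(key, value)
        self.cache[key] = value


class WriteBehindCache:
    def __init__(self, db: FakeDatabase) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}
        self._pending: queue.Queue = queue.Queue()
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def put(self, key: str, value: Any) -> None:
        # Returns as soon as the in-memory write is done; the database
        # write happens later. Queued writes are lost if the process dies.
        self.cache[key] = value
        self._pending.put((key, value))

    def _flush_loop(self) -> None:
        while True:
            key, value = self._pending.get()
            self.db.save(key, value)
            self._pending.task_done()

    def flush(self) -> None:
        """Block until every queued write has reached the database."""
        self._pending.join()


db = FakeDatabase()
wt = WriteThroughCache(db)
wt.put("a", 1)        # db.rows["a"] is already 1 when put() returns

wb = WriteBehindCache(db)
wb.put("b", 2)        # db.rows may not contain "b" yet
wb.flush()            # wait for the background writer
print(db.rows)        # {'a': 1, 'b': 2}
```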
Read-through is similar to cache-aside but delegates cache population to the cache layer itself rather than the application code, simplifying client logic. Refresh-ahead proactively refreshes cache entries before they expire, based on predicted access patterns, to eliminate cache-miss latency for hot data. Refresh-ahead works best for highly predictable, frequently accessed data; it can waste resources if predictions are inaccurate.
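A read-through cache with refresh-ahead might look like the following Python sketch. The class name, the `loader` callback, and the `refresh_ratio` knob (refresh once less than 20% of the TTL remains) are assumptions for illustration. Rather than predicting access patterns with a model, this simple version treats "read shortly before expiry" as the signal that a key is hot. The clock is injectable only so the example is deterministic.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Set, Tuple


class ReadThroughCache:
    """Read-through cache with refresh-ahead (illustrative sketch).

    Callers only call get(); the cache itself invokes `loader` on a miss.
    If an entry is read when less than `refresh_ratio` of its TTL remains,
    it is reloaded in the background while the current value is served.
    """

    def __init__(self, loader: Callable[[str], Any], ttl: float = 30.0,
                 refresh_ratio: float = 0.2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._refresh_ratio = refresh_ratio
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()
        self.executor = ThreadPoolExecutor(max_workers=2)

    def _load(self, key: str) -> Any:
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (value, self._clock() + self._ttl)
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key: str) -> Any:
        with self._lock:
            entry = self._entries.get(key)
        current = self._clock()
        if entry is None or current >= entry[1]:
            return self._load(key)  # miss or expired: load synchronously
        value, expires_at = entry
        if expires_at - current < self._ttl * self._refresh_ratio:
            with self._lock:
                should_refresh = key not in self._refreshing
                self._refreshing.add(key)
            if should_refresh:      # refresh-ahead: reload before expiry
                self.executor.submit(self._load, key)
        return value                # serve the still-valid cached value


fake_time = [0.0]  # fake clock so the example is deterministic
calls = []

def load_profile(key: str) -> str:
    calls.append(key)
    return f"profile:{key}"

cache = ReadThroughCache(load_profile, ttl=10.0, clock=lambda: fake_time[0])
cache.get("ada")                     # miss: loader runs synchronously
fake_time[0] = 9.0
cache.get("ada")                     # 1s left (< 2s): background refresh
cache.executor.shutdown(wait=True)   # wait for the refresh to finish
print(calls)                         # ['ada', 'ada']
```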
Because caches have finite memory, eviction policies determine which entries are removed when space runs out. Common policies include LRU (Least Recently Used), LFU (Least Frequently Used), and FIFO (First In, First Out); LRU, or an approximation of it, is a common default. Time-To-Live (TTL) settings work independently of eviction, expiring entries after a fixed duration so stale data does not persist indefinitely. Always set a TTL appropriate to how quickly the underlying data changes.
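Python's standard library can express LRU eviction with `collections.OrderedDict`; the sketch below adds a per-entry TTL on top. The class and its parameters are hypothetical, and the clock is injectable only so the example is deterministic. Expired entries are removed lazily on access, so they can occupy capacity until they are read or evicted. For pure memoization without a TTL, `functools.lru_cache` already provides LRU eviction.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LRUCacheWithTTL:
    """Bounded cache: evicts the least recently used entry when full and
    drops entries older than `ttl` seconds on access (illustrative sketch)."""

    def __init__(self, capacity: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl
        self._clock = clock
        self._data: OrderedDict = OrderedDict()  # key -> (value, expires_at)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]          # TTL expired: remove lazily
            return None
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, self._clock() + self.ttl)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


now = [0.0]
lru = LRUCacheWithTTL(capacity=2, ttl=5.0, clock=lambda: now[0])
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")         # "a" becomes most recently used
lru.put("c", 3)      # over capacity: evicts "b"
print(lru.get("b"))  # None (evicted)
now[0] = 6.0
print(lru.get("a"))  # None (TTL expired)
```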
Cache invalidation, ensuring stale data is removed or updated, is famously one of the hardest problems in computer science, so design invalidation logic explicitly (for example, deleting or updating a key whenever its source record changes) rather than relying solely on TTL. Avoid caching highly volatile data, and when caching data that varies per user, namespace keys carefully (e.g., include the user ID in the cache key) so one user's data is never served to another. Monitor hit rate, eviction rate, and memory usage continuously, and plan for cache stampede protection, such as probabilistic early expiration or request coalescing.
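Request coalescing can be sketched with only the Python standard library: the first caller to miss on a key becomes the "leader" and loads it, while concurrent callers for the same key wait on a `threading.Event` and then read the stored result. The class and method names are hypothetical, and this is a single-process sketch; coordinating many application servers would need something shared, such as a distributed lock. The usage also shows a user-namespaced key and an explicit `invalidate` call after a write.

```python
import threading
import time
from typing import Any, Callable, Dict


class CoalescingCache:
    """Cache with request coalescing (a.k.a. single-flight).

    Concurrent misses for the same key share one origin call instead of
    all hitting the database at once (illustrative sketch)."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._data: Dict[str, Any] = {}
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Any:
        while True:
            with self._lock:
                if key in self._data:
                    return self._data[key]
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    # This caller performs the load; others will wait.
                    event = threading.Event()
                    self._inflight[key] = event
            if not leader:
                event.wait()  # wait for the leader, then re-check the cache
                continue
            try:
                value = self._loader(key)
                with self._lock:
                    self._data[key] = value
                return value
            finally:
                # On success or failure, release waiting followers.
                with self._lock:
                    del self._inflight[key]
                event.set()

    def invalidate(self, key: str) -> None:
        """Explicitly drop a key after the underlying data changes."""
        with self._lock:
            self._data.pop(key, None)


calls = []

def slow_load(key: str) -> str:
    calls.append(key)
    time.sleep(0.1)  # simulate a slow database query
    return f"data for {key}"

cache = CoalescingCache(slow_load)
key = "user:42:cart"  # namespaced by user ID
threads = [threading.Thread(target=cache.get, args=(key,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1: ten concurrent requests, one origin call

cache.invalidate(key)  # e.g. after the cart is updated
cache.get(key)
print(len(calls))  # 2
```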
© RM Full Stack & AI Engineer