A concise, beginner-friendly technical guide covering what database indexes are, why they matter for performance, how they work under the hood, and best practices to avoid common pitfalls.
A database index is a separate data structure that stores a sorted subset of column values along with pointers to the corresponding rows in a table. It works similarly to a book's index: instead of scanning every page, you jump directly to the relevant entry. Indexes are created on one or more columns and are maintained automatically by the database engine as data changes.
Without an index, the database performs a full table scan, reading every row to find matches — an O(n) operation that becomes painfully slow on large tables. With an index, lookups can drop to O(log n) or even O(1) depending on the index type. This difference can mean queries that took minutes completing in milliseconds, making indexes one of the highest-impact performance tools available to developers.
The most common index structure is the B-Tree (Balanced Tree), which keeps values sorted and allows searches, insertions, and deletions in O(log n) time. Hash indexes offer O(1) exact-match lookups but cannot support range queries. Specialized types like bitmap indexes suit low-cardinality columns, while full-text indexes enable linguistic search across large text fields.
A composite index is built on multiple columns, such as (last_name, first_name), and is most effective when queries filter on those columns in the same left-to-right order. A covering index includes all the columns a query needs, allowing the database to satisfy the query entirely from the index without touching the base table. This eliminates an extra 'heap fetch' step and can dramatically reduce I/O.
Every index must be updated whenever rows are inserted, updated, or deleted, adding write overhead and consuming additional disk space. Over-indexing a write-heavy table can slow down INSERT and UPDATE operations significantly. A good rule of thumb is to index columns used frequently in WHERE clauses, JOIN conditions, and ORDER BY expressions, while avoiding indexes on columns with very high write rates and low read selectivity.
Indexes are silently ignored when a function is applied to an indexed column in a WHERE clause (e.g., WHERE YEAR(created_at) = 2024 bypasses an index on created_at — use a range condition instead). Regularly run EXPLAIN or EXPLAIN ANALYZE to verify that the query planner is actually using your indexes. Monitor index usage statistics and drop unused indexes, as they cost write performance and storage without providing any query benefit.
© RM Full Stack & AI Engineer · All guides · Roadmaps · Open the app