Load balancing is the process of distributing incoming network traffic across multiple servers to ensure no single server becomes overwhelmed, maximizing availability, reliability, and performance of applications.
A load balancer is a device or software component that sits between clients and a pool of backend servers, routing each incoming request to the most appropriate server. It acts as a reverse proxy, accepting traffic on a single entry point and transparently forwarding it. This abstraction means clients never communicate with backend servers directly, which can also improve security because the backends do not need to be exposed to the public network.
Without load balancing, a single server handles all traffic and becomes a single point of failure — if it goes down, the entire application goes offline. Load balancing enables horizontal scaling, allowing you to add more servers to handle growing traffic rather than upgrading one expensive machine. It also enables zero-downtime deployments by draining traffic from servers before taking them offline.
Round Robin cycles through servers in order, sending each new request to the next server in the list — simple but ignores server load. Least Connections routes requests to the server with the fewest active connections, which is better for long-lived or variable-duration requests. Weighted variants of both algorithms allow you to send proportionally more traffic to more powerful servers. IP Hash deterministically maps a client IP to a specific server, which is useful for session persistence.
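The four strategies above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular load balancer implements them: the server names, connection counts, and weights are made up, and the weighted variant simply repeats each server in the rotation according to its weight.

```python
import hashlib
from itertools import cycle

servers = ["app-1", "app-2", "app-3"]  # hypothetical backend names

# Round Robin: hand out servers in a fixed rotation, ignoring load.
_rotation = cycle(servers)

def round_robin() -> str:
    return next(_rotation)

# Least Connections: pick the server with the fewest active connections.
active_connections = {"app-1": 12, "app-2": 3, "app-3": 7}  # made-up counts

def least_connections() -> str:
    return min(active_connections, key=active_connections.get)

# Weighted Round Robin: each server appears in the rotation `weight` times.
weights = {"app-1": 3, "app-2": 1, "app-3": 1}  # app-1 assumed 3x as capable
_weighted_rotation = cycle(
    [server for server, weight in weights.items() for _ in range(weight)]
)

def weighted_round_robin() -> str:
    return next(_weighted_rotation)

# IP Hash: the same client IP always maps to the same server.
def ip_hash(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Two simplifications are worth noting. The naive weighted rotation sends bursts of consecutive requests to the heaviest server, which a real balancer would normally smooth out by interleaving. The modulo step in `ip_hash` remaps most clients whenever the pool size changes, and consistent hashing is the usual way to limit that.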
Layer 4 (transport-layer) load balancers operate on TCP/UDP traffic and make routing decisions based on IP addresses and ports without inspecting the payload, which makes them very fast. Layer 7 (application-layer) load balancers inspect HTTP headers, cookies, and URLs, enabling smarter routing such as directing /api requests to one server pool and /static to another. NGINX and HAProxy can operate at either layer, while AWS ALB is Layer 7 only. Layer 7 balancers can also perform TLS termination (often still called SSL termination), offloading decryption from backend servers.
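As a rough sketch of the Layer 7 routing described above, the function below picks a server pool from the request path. The pool contents and the default pool are hypothetical, and a real balancer would express this as configuration rather than application code.

```python
from typing import Dict, List

# Hypothetical pools keyed by URL path prefix.
POOLS: Dict[str, List[str]] = {
    "/api": ["api-1", "api-2"],
    "/static": ["static-1"],
}
DEFAULT_POOL: List[str] = ["web-1", "web-2"]

def choose_pool(path: str) -> List[str]:
    """Return the pool whose prefix matches the request path."""
    for prefix, pool in POOLS.items():
        # Match "/api" and "/api/...", but not "/apiary".
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return DEFAULT_POOL
```

A Layer 4 balancer could not make this decision, because the path is only visible after parsing the HTTP request.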
Load balancers continuously probe backend servers using health checks — periodic HTTP pings or TCP connections — to detect failures. If a server fails a health check, the load balancer automatically removes it from the rotation until it recovers, preventing traffic from reaching a broken instance. For the load balancer itself to avoid being a single point of failure, it is typically deployed in an active-passive or active-active pair with a shared virtual IP address.
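The remove-and-restore behavior of health checks can be sketched as follows. The `HealthChecker` class and its `fall` and `rise` thresholds are assumptions made for illustration: requiring several consecutive failures before removal, and several consecutive successes before restoration, is a common way to avoid flapping, but the exact policy varies by product. The probe is passed in as a function so the sketch runs without a network; in practice it would be an HTTP request or TCP connect with a timeout.

```python
from typing import Callable, Dict, List

class HealthChecker:
    def __init__(self, servers: List[str], probe: Callable[[str], bool],
                 fall: int = 2, rise: int = 2) -> None:
        self.servers = servers
        self.probe = probe    # returns True if the server responds correctly
        self.fall = fall      # consecutive failures before removal
        self.rise = rise      # consecutive successes before restoration
        self.in_rotation: Dict[str, bool] = {s: True for s in servers}
        self._streak: Dict[str, int] = {s: 0 for s in servers}

    def run_once(self) -> None:
        """Probe every server once and update its rotation status."""
        for server in self.servers:
            ok = self.probe(server)
            if ok == self.in_rotation[server]:
                # Result agrees with the current state: reset the streak.
                self._streak[server] = 0
                continue
            self._streak[server] += 1
            needed = self.rise if ok else self.fall
            if self._streak[server] >= needed:
                self.in_rotation[server] = ok
                self._streak[server] = 0

    def healthy(self) -> List[str]:
        """Servers that should currently receive traffic."""
        return [s for s in self.servers if self.in_rotation[s]]
```

A balancer would call `run_once()` on a timer and choose backends only from `healthy()`.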
If your application stores session data in server memory, routing the same user to different servers will cause them to lose their session. This is a common pitfall: the load balancer assumes the app is stateless when it is not. The quick fix is sticky sessions (session affinity), where the load balancer pins a client to one server using a cookie, but this undermines balanced distribution and the session is still lost if that server fails. The proper solution is to externalize session state to a shared store like Redis so any server can handle any request, making your app truly stateless and load-balancer friendly.
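The difference externalized state makes can be shown with a small sketch. Here `SharedSessionStore` is an in-memory stand-in for an external store such as Redis, and `AppServer` is a hypothetical application instance; both names are invented for this example. In a real deployment the store would be a network service that every instance connects to.

```python
import uuid
from typing import Dict, Optional

class SharedSessionStore:
    """In-memory stand-in for an external store such as Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)

    def set(self, session_id: str, value: dict) -> None:
        self._data[session_id] = value

class AppServer:
    """Hypothetical app instance that keeps no session state of its own."""

    def __init__(self, name: str, store: SharedSessionStore) -> None:
        self.name = name
        self.store = store

    def login(self, user: str) -> str:
        session_id = uuid.uuid4().hex
        self.store.set(session_id, {"user": user})
        return session_id

    def whoami(self, session_id: str) -> Optional[str]:
        session = self.store.get(session_id)
        return session["user"] if session else None

# A session created on one server is readable from another.
store = SharedSessionStore()
app_1 = AppServer("app-1", store)
app_2 = AppServer("app-2", store)
sid = app_1.login("alice")
print(app_2.whoami(sid))  # alice
```

Because neither server holds the session itself, the load balancer is free to route each request to whichever backend it prefers.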
© RM Full Stack & AI Engineer