Definition: Network device or service that distributes incoming traffic across multiple servers to improve application availability, scalability, and fault tolerance.
— Source: NERVICO, Product Development Consultancy
What is a load balancer?
A load balancer is a network device or service that automatically distributes incoming traffic across multiple backend servers. Its primary function is to ensure no single server becomes overloaded while others remain idle. It acts as a single entry point that users see, while behind the scenes it distributes requests among a pool of servers.
It is a fundamental component in any architecture that needs high availability or horizontal scalability.
How it works
When a user sends a request, it reaches the load balancer instead of a specific server. The load balancer applies a distribution algorithm to decide which backend server should receive the request. The most common algorithms are round-robin (sequential rotation), least connections (server with fewest active connections), weighted (distribution proportional to capacity), and IP hash (same user always routed to the same server).
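The selection logic behind those algorithms is simple enough to sketch. The snippet below is an illustrative toy, not a production balancer; the server names and weights are made up, and a real implementation would handle connection tracking and concurrency.

```python
import itertools
import random
import zlib

# Hypothetical backend pool; names are illustrative.
servers = ["app-1", "app-2", "app-3", "app-4"]

# Round-robin: rotate through the pool sequentially.
rr = itertools.cycle(servers)

def round_robin():
    return next(rr)

# Least connections: pick the server with the fewest active connections.
active = {s: 0 for s in servers}

def least_connections():
    target = min(active, key=active.get)
    active[target] += 1  # caller decrements when the request completes
    return target

# Weighted: servers with larger weights receive proportionally more traffic.
weights = {"app-1": 3, "app-2": 1, "app-3": 1, "app-4": 1}

def weighted():
    return random.choices(list(weights), weights=list(weights.values()))[0]

# IP hash: the same client IP always maps to the same server.
def ip_hash(client_ip):
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]
```

Note that IP hash uses a stable hash (here CRC32) so a given client keeps landing on the same backend, which is what makes it useful for sticky sessions.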
Modern load balancers perform periodic health checks against backend servers. If a server stops responding, the load balancer removes it from the pool automatically and redistributes traffic among healthy servers. When the server recovers, it rejoins the pool.
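A minimal health check can be as simple as testing whether a backend's port accepts a TCP connection. The sketch below assumes hypothetical hosts and ports; real balancers typically also support HTTP-level checks and run them on a timer.

```python
import socket

def is_healthy(host, port, timeout=1.0):
    """TCP health check: the server is healthy if the port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical pool; each entry is (host, port).
pool = [("10.0.0.1", 8080), ("10.0.0.2", 8080), ("10.0.0.3", 8080)]

def healthy_servers():
    # In a real balancer this runs periodically; servers that fail the check
    # are skipped until a later check sees them respond again.
    return [s for s in pool if is_healthy(*s)]
```

This mirrors the pool behavior described above: a server that stops answering simply drops out of `healthy_servers()` and rejoins once it responds again.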
They exist as dedicated hardware (F5, Citrix), software (Nginx, HAProxy), or managed cloud services (AWS ALB, Google Cloud Load Balancing, Azure Load Balancer).
Why it matters
Without a load balancer, a single server is a single point of failure: if it goes down, the entire service goes down. The load balancer provides automatic redundancy and enables horizontal scaling by adding more servers to the pool without changes to client configuration.
Practical example
A web application receives 10,000 requests per second during a marketing campaign. A single server can handle about 3,000 requests per second before degrading. The team configures a load balancer with four backend servers, so each receives approximately 2,500 requests per second, comfortably within capacity. If one server fails, the remaining three absorb roughly 3,333 requests per second each, slightly above capacity, which is tolerable only briefly while the failed server recovers or a replacement is provisioned.
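The capacity arithmetic in this example can be checked directly. The numbers come from the text; the "N+1 sizing" calculation at the end is a common rule of thumb, added here as an illustration rather than something the example prescribes.

```python
import math

# Figures from the example above.
total_rps = 10_000           # peak requests per second
per_server_capacity = 3_000  # req/s before a server degrades
servers = 4

load_all_up = total_rps / servers          # 2500.0 req/s per server
load_one_down = total_rps / (servers - 1)  # ~3333.3 req/s per server

# All four up: within capacity. One down: slightly over capacity,
# which is why the failure can only be absorbed temporarily.
assert load_all_up <= per_server_capacity
assert load_one_down > per_server_capacity

# N+1 sizing: enough servers to meet demand, plus one spare for failure.
pool_size = math.ceil(total_rps / per_server_capacity) + 1  # 5 servers
```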