Load balancing is the process of distributing inbound Internet Protocol (IP) traffic across multiple servers. This improves performance, makes optimal use of the servers, and ensures that no single server is overwhelmed. The practice is particularly important for busy networks, where it is difficult to predict how many requests a server will receive.
Typically, two or more web servers are employed in a load balancing scheme. If one of the servers begins to get overloaded, requests are forwarded to another server. Spreading requests across multiple servers in this way reduces service time: the load balancer identifies which server has the capacity to receive new traffic and directs requests accordingly.
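One common way for a balancer to judge capacity is to track how many requests each server is currently handling and route new traffic to the least busy one. The sketch below illustrates this least-connections idea in Python; the `Server` class, the server names, and the request counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    active_requests: int = 0  # requests currently in flight on this server

def pick_least_loaded(servers: list[Server]) -> Server:
    """Route to the server handling the fewest requests right now."""
    return min(servers, key=lambda s: s.active_requests)

pool = [Server("web1", 12), Server("web2", 3), Server("web3", 7)]
target = pick_least_loaded(pool)
target.active_requests += 1  # the chosen server takes on the new request
print(f"Routing request to {target.name}")  # -> Routing request to web2
```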
The process, very generally, is straightforward. A webpage request is sent to the load balancer, which forwards it to one of the servers. That server returns its response to the balancer, which in turn passes the response on to the end user.
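To make that round trip concrete, here is a minimal reverse-proxy sketch using only Python's standard library. The port numbers and backend addresses are assumptions for illustration; a production balancer would also forward headers, handle upstream errors, and support methods other than GET.

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical backend pool; in practice these are the real web servers.
BACKENDS = ["http://127.0.0.1:8081", "http://127.0.0.1:8082"]
rotation = itertools.cycle(BACKENDS)

class BalancerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(rotation)                 # choose a backend server
        with urlopen(backend + self.path) as upstream:
            body = upstream.read()               # server responds to the balancer...
            status = upstream.status
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                   # ...which relays the response to the user

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), BalancerHandler).serve_forever()
```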
Load balancing also allows service to continue in the face of server downtime due to failure or maintenance. If a company is using several servers and one of them fails, its website or other services remain available to users, because traffic is diverted to the other servers in the server farm. In Global Server Load Balancing (GSLB), the load is distributed across geographically scattered server farms, depending on their load, health, or proximity.
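Diverting traffic away from a failed server presupposes some form of health checking. The following sketch shows one simple approach, a TCP connect probe; the addresses are hypothetical, and real balancers typically use richer checks, such as probing an HTTP endpoint and inspecting the status code.

```python
import socket

def is_healthy(host: str, port: int, timeout: float = 1.0) -> bool:
    """Crude health check: can we complete a TCP handshake with the server?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical server farm; a failed or in-maintenance node simply drops out.
FARM = [("10.0.0.11", 80), ("10.0.0.12", 80), ("10.0.0.13", 80)]

healthy = [addr for addr in FARM if is_healthy(*addr)]
print(f"Traffic will be diverted among: {healthy}" if healthy
      else "No healthy servers in the farm")
```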
There are several methods by which loads can be balanced. If the servers have similar hardware specifications, the Perceptive method (which predicts the best server from historical and current performance data) and the Fastest Response Time method can be good choices. If the hardware specifications differ, the Weighted Round Robin method, which assigns requests to servers in turn according to their weights, may be a better solution because it can assign more requests to the servers that can handle a greater volume, as in the sketch after this paragraph.
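A simple way to implement weighted round robin is to repeat each server in the rotation in proportion to its weight. The weights below are hypothetical; smoother variants interleave the servers rather than grouping the repeats, but the long-run proportions are the same.

```python
import itertools

# Hypothetical weights: web1 can handle twice the volume of web2 or web3.
WEIGHTS = {"web1": 2, "web2": 1, "web3": 1}

def weighted_round_robin(weights: dict[str, int]):
    """Cycle through servers, repeating each one according to its weight."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)

rotation = weighted_round_robin(WEIGHTS)
print(" ".join(next(rotation) for _ in range(8)))
# -> web1 web1 web2 web3 web1 web1 web2 web3
```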