What is HAProxy?
As the official HAProxy website states, “HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications.” HAProxy stands for High Availability Proxy. It is open source and free to use for any purpose. It is built for high availability, meaning it can run continuously for long periods without failing. It is fast and reliable enough that many large companies, such as Airbnb, Adobe Advertising Cloud, Alibaba, GitHub, Instagram, Reddit, Tumblr, Twitter, and Vimeo, rely on it.
What is a load balancer?
If you manage many concurrent connections, you are probably familiar with the term load balancer. A load balancer distributes traffic across multiple servers. It offers several load balancing strategies to use the network efficiently, and it has a built-in health check mechanism to monitor the status of each server. If a server becomes unreliable, the load balancer redirects its workload to a reliable server. This makes a load balancer a dependable way to handle a large number of concurrent connections. If the workload spikes, you can add resources, and you can remove them once the workload drops, which makes it a very cost-efficient way to manage a large workload.
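The traffic distribution and health checking described above can be sketched in a few lines of Python. This is a toy illustration, not how HAProxy or ELB is implemented: the class name `RoundRobinBalancer`, the server names, and the `mark` method (which stands in for the result of a real health check) are all assumptions made for the example.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        # Every server starts out healthy until a check says otherwise.
        self.healthy = {server: True for server in self.servers}
        self._index = 0

    def mark(self, server, is_healthy):
        """Record the outcome of a (hypothetical) health check."""
        self.healthy[server] = is_healthy

    def next_server(self):
        """Return the next healthy server in rotation."""
        # Try each server at most once per call.
        for _ in range(len(self.servers)):
            server = self.servers[self._index]
            self._index = (self._index + 1) % len(self.servers)
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app1", "app2", "app3"])
first_round = [balancer.next_server() for _ in range(3)]

balancer.mark("app2", False)  # pretend app2 failed its health check
after_failure = [balancer.next_server() for _ in range(4)]

print(first_round)    # ['app1', 'app2', 'app3']
print(after_failure)  # ['app1', 'app3', 'app1', 'app3']
```

Once `app2` is marked unhealthy, its share of the requests goes to the remaining servers, which is the redirect behaviour the paragraph describes.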
Load balancer explained in simple words
Let’s look at the load balancer in simple terms.
Suppose you run a restaurant with one waiter. On the first day, you get only five customers, and one waiter can easily serve them all. Over time the restaurant grows, and now you get a hundred customers a day. Luckily your waiter is young and energetic, so you manage to serve them all without hiring anyone new. But what if you get thousands of customers a day? A single waiter can’t serve everyone, so you need to hire more waiters. If you don’t, the business will suffer because of bad service.
The same concept applies to web servers. If your site has only a few connections, a single server can handle them easily. Now assume you have thousands of concurrent connections. Even a very powerful server has limits, such as memory or bandwidth, and once it reaches those limits it will start lagging or may crash.
A load balancer lets us manage a large number of connections without being constrained by the limits of any single server. It offers several load balancing strategies, so you can direct traffic in the way that suits your application. It also has a health check mechanism: if a server becomes unreliable, the load balancer redirects traffic to a reliable one.
If traffic spikes, you can add servers, and when traffic goes down you can remove them. This is cost-efficient because you pay only for the resources you actually use, not for idle ones.
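The scaling decision can be made concrete with a little arithmetic. The sketch below assumes each server can comfortably handle a fixed number of concurrent connections. That number, the function name `servers_needed`, and the sample loads are hypothetical values chosen only for illustration.

```python
def servers_needed(concurrent_connections, capacity_per_server, min_servers=1):
    """Return how many servers are required to carry the given load.

    Uses integer ceiling division so that a partially filled server
    still counts as a whole server.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    required = (concurrent_connections + capacity_per_server - 1) // capacity_per_server
    # Always keep at least min_servers running, even with no traffic.
    return max(min_servers, required)


# Assumed capacity: 1000 concurrent connections per server.
quiet = servers_needed(500, 1000)
spike = servers_needed(4500, 1000)
idle = servers_needed(0, 1000)

print(quiet, spike, idle)  # 1 5 1
```

During the spike the pool grows to five servers, and it shrinks back to one when traffic drops, so you pay for the extra four only while they are needed.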
What is the difference between AWS ELB and HAProxy?
- AWS ELB is a managed, cloud-based service in which AWS provides both the hardware and the software and keeps them highly available. HAProxy is software that runs on hardware you choose, and you have to manage its high-availability setup yourself.
- ELB costs depend on usage. The ELB family includes several kinds of load balancers, among them the Application Load Balancer, Network Load Balancer, and Classic Load Balancer, and each is priced differently. HAProxy is open-source software, so you pay only for the instances it runs on.
- ELB offers a limited choice of load balancing algorithms, mainly round-robin with optional sticky sessions, while HAProxy supports many algorithms, such as round-robin, least connections, source, URI, and URL parameter.
- ELB supports a wide range of protocols, including HTTP, HTTPS, TCP, and SSL. If you want to support streaming protocols such as RTMP, you can extend it by using the CloudFront CDN. HAProxy focuses on TCP and HTTP-based traffic.
- ELB is designed to scale to a very large number of concurrent requests. It handles gradually increasing load well but can struggle with a sudden spike in traffic. If you expect a sudden spike, you can create servers ahead of time and add them to HAProxy.
- ELB integrates with AWS Auto Scaling, so EC2 instances can be launched automatically when traffic spikes. With HAProxy you need to create or remove EC2 instances manually. If you want servers to be created and deleted automatically, you need to write an automation script.
- Let’s assume you have an e-commerce website and you want to send traffic for the product route to server X and traffic for the order route to server Y. In ELB, only the Application Load Balancer supports routing based on the URI or URL parameters. HAProxy can route traffic based on the URI or URL parameters.
- In ELB the access log is disabled by default, but you can enable it. The ELB access log contains information such as request time, client IP address, latency, request path, and server response, and it is stored in an S3 bucket. HAProxy emits log messages to be processed by a syslog server. This is compatible with syslog daemons such as rsyslog as well as the systemd service journald. You can also use a log pipeline such as the ELK stack to receive syslog messages from HAProxy. An HAProxy log message contains a wide range of information, such as timers, byte counts, termination codes, connection counts, and queue length.
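To make the algorithm and routing points in the list above more concrete, here is a minimal Python sketch of three of the ideas: least connections, source-based selection, and routing by URL path. It only illustrates the concepts. The function names, server names, and path prefixes are hypothetical, and the SHA-256 hash is an assumption for the example, not the hash HAProxy itself uses.

```python
import hashlib


def least_connections(active_connections):
    """Pick the server that currently has the fewest active connections."""
    return min(active_connections, key=active_connections.get)


def source_hash(client_ip, servers):
    """Map a client IP to a fixed server so repeat visits reach the same one."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def route_by_path(path, routes, default_backend):
    """Return the backend whose prefix matches the start of the request path."""
    for prefix, backend in routes:
        if path.startswith(prefix):
            return backend
    return default_backend


# Least connections: server_y is the least busy.
busy = {"server_x": 5, "server_y": 2, "server_z": 7}
print(least_connections(busy))  # server_y

# Source: the same client IP always maps to the same server.
pool = ["server_x", "server_y", "server_z"]
print(source_hash("203.0.113.10", pool) == source_hash("203.0.113.10", pool))  # True

# Path-based routing for the e-commerce example.
routes = [("/product/", "server_x"), ("/order/", "server_y")]
print(route_by_path("/product/42", routes, "server_default"))  # server_x
print(route_by_path("/order/7", routes, "server_default"))     # server_y
print(route_by_path("/about", routes, "server_default"))       # server_default
```

In HAProxy itself these choices are made in the configuration file rather than in application code, so treat the sketch as an explanation of the behaviour, not as something you would deploy.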
Conclusion
If you are a startup and don’t want to spend extra money on load balancing, HAProxy is a good fit. If you want a wide range of load balancing algorithms, go with HAProxy. If you don’t want to write extra code to deploy instances when traffic spikes, ELB is the right solution. If your application requires TCP, SSL, or a streaming protocol such as RTMP, ELB is suitable for you.