AWS Bites Podcast

127. Load Balancers

Published 2024-07-12 - Listen on your favourite podcast player

An overview of load balancers, explaining how they distribute traffic across multiple servers and provide high availability. We discuss layer 4 and layer 7 load balancers, detailing their pros and cons. We then focus on AWS load balancers, covering network load balancers and application load balancers in depth, including their features, use cases, and pricing models. We conclude by mentioning some alternatives to AWS load balancers.

AWS Bites is brought to you by fourTheorem, an AWS consulting partner with tons of experience with AWS. If you need someone to help you with your ambitious AWS projects, check out fourtheorem.com!


Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: Hello there and welcome to episode 127 of AWS Bites. My name is Luciano and I'm here with Eoin. Cloud computing is all about elastic scalability, high availability and security, and for most architectures, to get all of that, you'll need a load balancer. But what is a load balancer really? This is the topic of our podcast today. We are going to explore how a load balancer works and what type of load balancer you should choose when it comes to performance, cost and flexibility.

And by the end of this episode, hopefully you'll have a clear idea of which load balancer to choose when you need one on AWS. So let's start with an example. You have a client application: it could be a web browser, another kind of application running on a mobile phone, or maybe an IoT device. This client is going to send a request to a server. "What kind of server?" is probably the first question you should ask yourself.

Well, whether your application is running on a physical host, a virtual machine on EC2, or a container running on Fargate, it's unlikely that your traffic is going to be served directly by that application server, because there would be a few problems with that approach. First of all, it would expose your server directly to the public internet, which can cause some security problems.

You generally don't want to put your web server directly in front of the public internet. That would also mean that you have only one host, so if you start to get a lot of traffic, your single instance is going to be overloaded very quickly, and you have limited options for scaling it in response to an increase in traffic. You can also think of this as a single point of failure, not just in terms of scalability: maybe you install an update and that update goes wrong, or maybe your host fails because of a hardware failure.

At that point, your application just becomes unavailable, because your only server just died and no traffic can be served anymore. And again, if you want to deploy a new version of the software, or maybe change something in the configuration of your server, then you need to decide how to manage the downtime that comes with it. And that's going to degrade the experience of your users.

So ideally, you don't want to have just one instance, which brings us to the challenge: we need something in our architecture that allows us to support multiple instances of an application. And actually, this problem existed even before cloud computing. Back in the 1990s, when people started to ship all kinds of server applications, they ran into these problems even in their own data centers.

So how do we scale and make things highly available for our users? That's where software like NGINX, Squid, HAProxy, Traefik, and many others started to come up to help solve this problem. The idea is that one of these tools effectively becomes the entry point into your application: it can accept and manage many connections and distribute them across a variety of servers that sit behind it. This is where we can say that the idea of a load balancer was invented. Today we will discuss in a bit more detail what the responsibility of a load balancer really is and what different kinds of load balancers exist, and then we'll talk more specifically about AWS: what kinds of load balancers you have there, and which ones you should choose depending on your specific use cases. So maybe it makes sense to start with the features of a load balancer. Eoin, do you want to talk a little bit about that?

Eoin: Yeah, feature number one is the ability to distribute traffic across multiple hosts, which solves a lot of those problems you were talking about there. Load balancers will generally have different algorithms for distributing the traffic, like simple round robin, weighted distribution, etc. And often there are a lot of different edge cases that load balancers will have to handle for you.
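To make those two distribution algorithms concrete, here is a minimal Python sketch of round robin and weighted selection. The host addresses and weights are made up for illustration; real load balancers implement this at far higher performance and with many more edge cases.

```python
import itertools
import random

# Hypothetical backend pool; the addresses are made up for illustration.
hosts = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]

# Round robin: cycle through the hosts in order, one per request.
rr = itertools.cycle(hosts)

def pick_round_robin():
    return next(rr)

# Weighted distribution: hosts with a higher weight receive
# proportionally more of the traffic.
weights = {"10.0.1.10": 3, "10.0.2.10": 1, "10.0.3.10": 1}

def pick_weighted():
    targets = list(weights)
    return random.choices(targets, weights=[weights[t] for t in targets])[0]

# Six round-robin picks walk through the pool twice.
print([pick_round_robin() for _ in range(6)])
print(pick_weighted())
```

In practice the picker also has to skip unhealthy hosts and cope with the pool changing under it, which is exactly what the health checks discussed next feed into.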

Then they'll need health checks, because if you've got multiple hosts and you need to direct traffic, you want to direct it to healthy hosts: ones that are proven to be able to serve responses to requests effectively. So health checks are an important feature. Then there's TLS and SSL termination. These days TLS termination, for HTTPS or any other protocol that supports encryption, is a really important feature.
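As a toy model of the health-check idea, here is a sketch of the "consecutive successes/failures" logic most load balancers use: a target only flips to healthy or unhealthy after several checks in a row agree. The threshold names mirror common load balancer settings but the function itself is purely illustrative.

```python
# Toy model of load balancer health state: a target becomes "healthy"
# only after healthy_threshold consecutive passing checks, and
# "unhealthy" only after unhealthy_threshold consecutive failures.

def update_health(history, healthy_threshold=3, unhealthy_threshold=2):
    """history is the list of recent check results, True = check passed."""
    if len(history) >= healthy_threshold and all(history[-healthy_threshold:]):
        return "healthy"
    if len(history) >= unhealthy_threshold and not any(history[-unhealthy_threshold:]):
        return "unhealthy"
    return "unknown"

print(update_health([True, True, True]))    # three passes in a row
print(update_health([True, False, False]))  # two failures in a row
print(update_health([True]))                # not enough data yet
```

Requiring consecutive results avoids flapping: one slow response doesn't immediately eject a target from the pool.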

And load balancers can offload the computational cost of TLS from your servers by handling it at the entry point into your network. If the route between your load balancer and the host is secured via another mechanism, as is the case within AWS VPCs, your hosts will then have a reduced workload, since they don't have to do the TLS handshaking, encryption and decryption.

Other important features are things like DDoS protection, to protect against distributed denial of service attacks, and firewalls: a web application firewall or other firewall rules, in software or hardware, to detect and prevent intrusions. Now, there are a lot more features you can get with load balancers, but it ultimately depends on the type of load balancer you have. And it's probably a good idea now to start talking about the different types of load balancers.

If we're going to do that, we should talk about the OSI networking model. You might be very familiar with this, maybe you've heard about it, or you might have a vague memory or flashback to your college days or wherever you learned about it before. So let's do a quick refresher, just to make sure we're all on the same page. The OSI model is a theoretical model. It doesn't necessarily translate directly to physical or software implementations, but it's a good model for talking about networking.

Because when we're talking about networking protocols, we usually start by defining which layer they belong to within this OSI model. So it's useful for understanding what happens when traffic is flowing between a source and a destination. There are seven layers in the OSI model, and for load balancers we're going to focus on layers four and seven. But let's quickly run through all the layers so we can understand where those fit.

If we start from the bottom up, the first one is layer one, the physical layer. That's where your hardware devices are. Above that, you have layer two, the data link layer, which covers low-level communication protocols like Ethernet or Wi-Fi. And above that, you have the network layer, which handles routing and addressing. In TCP/IP, this is where IP, the Internet Protocol, sits.

So those are the low-level layers. Most people will be familiar with IP; if you're dealing with things like load balancers, you probably have some familiarity with it and understand what an IP address is. Above that is the transport layer, layer four. This is where things like TCP and UDP live, and TLS more or less goes there too. TLS, I suppose, you can almost put anywhere, because it's an encryption layer on top.

So the transport layer is TCP or UDP traffic between a source and a destination, usually. Above that, you get into higher-level protocols. Layer five is typically called the session layer. This is for coordination of sessions between two hosts, and protocols like NFS operate at that level. Depending on the system, you don't necessarily have all the layers; a lot of systems don't have a session layer, so you don't have to worry about it too much.

Layer six is also something you don't typically have to think about too much. It's called the presentation layer, and it deals with the structure of the data itself: whether it's HTML or JSON or JPEG or text. Compression can fit in there too. But layer seven starts to get relevant for most of us again, because this is the application layer, the highest level, and it deals with high-level application protocols. This is where HTTP and HTTPS live, and for things like email, SMTP is also a protocol that fits into layer seven. So now that that slightly torturous theory lesson is over, let's get back to load balancers, knowing roughly where layer four, the transport layer, and layer seven, the application layer, sit. We can talk about layer four and layer seven load balancers. So what's the deal with layer four load balancers, Luciano?
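The layer walkthrough above can be condensed into a quick reference. This is just the episode's summary of the OSI model captured as a small data structure, with the example protocols mentioned for each layer.

```python
# Quick reference for the OSI layers as discussed above:
# layer number -> (name, example protocols or concerns).
OSI_LAYERS = {
    1: ("physical", "cables, radio, hardware devices"),
    2: ("data link", "Ethernet, Wi-Fi"),
    3: ("network", "IP (routing and addressing)"),
    4: ("transport", "TCP, UDP"),
    5: ("session", "session coordination, e.g. NFS"),
    6: ("presentation", "data formats and compression"),
    7: ("application", "HTTP, HTTPS, SMTP"),
}

for number, (name, examples) in sorted(OSI_LAYERS.items()):
    print(f"Layer {number}: {name} ({examples})")
```

Layers 4 and 7, highlighted in the discussion, are the two that give their names to the load balancer types that follow.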

Luciano: So since layer four load balancers operate at the transport layer, they don't really understand anything about what we might call the application protocol. Is it an HTTP request, or maybe SMTP? They don't even understand whether you are trying to transmit a JPEG or an HTML page. They only deal with transport, which generally means TCP or UDP. And of course, this is a trade-off.

It comes with some advantages and some disadvantages. The main advantage is that because it operates only at the TCP or UDP level, there is very low latency. It doesn't have to unpack any data or try to understand the content of the packets; it can literally move packets from one place to another, which means it can move millions or billions of packets per second with very little hardware cost, so to speak.

That's why layer four load balancers are generally considered the most performant, and they can move massive amounts of traffic. So that's definitely the main advantage. The disadvantage is that because they don't understand the actual traffic, the number of features you can have at the protocol level is very limited. For instance, you cannot say "if this is an HTTP request to a specific path, forward it to one backend but not another".

Because a layer four load balancer cannot even understand the concept of a path, which only exists at the HTTP level, that's a feature that cannot exist in this kind of load balancer. What you can do, because it understands IP, is see the source IP and destination IP, so you can create rules on how to forward traffic based on IP information. That's something that exists at the TCP/IP level, so you can use it for routing, or for whatever distribution of traffic you want to apply.

And there are some more modern load balancers that support this concept of a proxy protocol, which is a way to pass additional information about the original connection along to the backend. For instance, some of them will let your server see the originating IP address, because they prepend that information to the forwarded connection. The idea is that since the load balancer normally rewrites the TCP connection itself, from your server's perspective it looks like every request originates from the load balancer. So if you actually want to know the real client IP, you need to read that additional information from somewhere else, and the proxy protocol helps with that. Hopefully that gives you an idea of what layer 4 is good for and what the pros and cons are, but let's talk about layer 7 now.
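As a rough illustration, here is how a backend might parse the text-based version 1 of the PROXY protocol header (the format defined by HAProxy). The addresses in the example are made up; this sketch also skips the real-world requirement that you only trust this header when the connection comes from your load balancer.

```python
# Sketch of parsing a PROXY protocol v1 header: the single text line a
# layer 4 load balancer can prepend to a forwarded TCP connection so the
# backend can recover the original client address.

def parse_proxy_v1(line: str) -> dict:
    # Format per the HAProxy PROXY protocol v1 spec:
    #   PROXY TCP4 <client-ip> <proxy-ip> <client-port> <proxy-port>\r\n
    parts = line.strip().split(" ")
    if parts[0] != "PROXY":
        raise ValueError("not a PROXY protocol v1 header")
    _, family, src_ip, _dst_ip, src_port, _dst_port = parts
    return {"family": family, "client_ip": src_ip, "client_port": int(src_port)}

header = "PROXY TCP4 203.0.113.7 10.0.1.10 56324 443\r\n"
print(parse_proxy_v1(header)["client_ip"])  # the real client, not the LB
```

Without this header, the backend's socket would only ever report the load balancer's address as the peer.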

Eoin: Yeah, if you're looking for more control, then you'll need a layer 7 load balancer, because it can see the traffic at the HTTP or HTTPS level. That means it can see the headers, the path, query parameters, and even the body. Doing this, as you said, Luciano, is going to introduce some latency, but with that you get a lot more useful features, and the list is a bit longer. You get routing based on header, path, and query, and you can even modify all of those things before you pass the request to the backend. You can do HTTP compression in your load balancer, so you could do gzip encoding at the load balancer level. You can do much finer-grained HTTP health checks, like checking the health of a specific URL and inspecting the response in detail: the status code, the body, or the response headers. You can even do caching in the load balancer. Another thing that is quite common is to do authentication and authorization at the load balancer level; layer 7 load balancers can almost be used as a very effective API gateway. When we're talking about security, we mentioned things like DDoS protection, but when you're operating at layer 7, you can also have a web application firewall linked to your load balancer, so you can detect more specific HTTP attacks, like the OWASP top 10. Because you can see headers, you can see cookies, and therefore you can also do cookie-based stickiness in layer 7 load balancers. And you don't need that proxy protocol thing that was added for layer 4 load balancers, Luciano, because the X-Forwarded-For header is very typical with load balancers and reverse proxies.
Whereas layer 4 load balancers normally have one inbound connection associated with one connection to the backend, layer 7 load balancers typically have a different model: they terminate a connection and then forward traffic through a pool of connections to the backend servers. So they can be more efficient in terms of connection utilization as well. So that's layer 4 and layer 7 load balancers. Where does this all fit into the world of AWS? There are four types of load balancer in AWS, and we're going to talk about two of them in depth: the network load balancer and the application load balancer.
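The X-Forwarded-For mechanism just mentioned can be sketched in a few lines: each proxy in the chain appends the address it saw, so by convention the left-most entry is the original client. The header values here are made up, and note that a real implementation should only trust this header on connections arriving from a known load balancer, since clients can set it themselves.

```python
# Sketch of recovering the original client IP behind a layer 7 load
# balancer. The socket peer address ("remote_addr") is the LB itself;
# the X-Forwarded-For header carries the chain of original addresses.

def client_ip_from_xff(headers: dict) -> str:
    xff = headers.get("X-Forwarded-For", "")
    first = xff.split(",")[0].strip()  # left-most entry = original client
    return first or headers.get("remote_addr", "")

headers = {
    "X-Forwarded-For": "203.0.113.7, 10.0.1.25",  # client, then one proxy
    "remote_addr": "10.0.1.25",                   # what the socket reports: the LB
}
print(client_ip_from_xff(headers))
```

This is the layer 7 equivalent of what the proxy protocol provides at layer 4.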

Luciano: So in AWS, you can of course run any kind of third-party load balancer if you're willing to spend the time to set it up correctly on EC2 instances, for example. But that can be complicated, it can take a lot of time, and you need to know how to do it correctly. So AWS provides alternatives in the form of managed services, and these are probably the most used load balancers you will see in AWS. There are actually four different types: the classic load balancer, which is now deprecated; network load balancers, which are the layer 4 kind we described before; application load balancers, which are layer 7; and gateway load balancers, which are generally used when you have a third-party security appliance for intrusion detection or prevention, and you need all the traffic to go through that particular device.

That is a bit of a niche use case that only makes sense when you have specific security requirements and you are setting up this kind of third-party tool that needs to live at the load balancer level. We can use the short names NLB for network load balancer and ALB for application load balancer. They can be either public or internal facing. That's a great option to have, because it means that if you are creating an application that needs to be internet facing, you can just put a load balancer there and it's going to handle all your incoming traffic.

If you're building something internal, maybe an internal API or internal microservices, you can also use all the same features of both NLBs and ALBs just for internal traffic: traffic that only exists inside your own private VPC. So let's talk a little bit more about network load balancers. As we said, they work at layer 4, and they are great candidates if you don't need all the features of layer 7. They are generally the way to go when latency matters, because their latency is so much lower; if you need to provide as low a latency as possible, you probably want a network load balancer. They have support for TLS termination and you can use AWS Certificate Manager, but that termination is done purely at the TLS level: it doesn't understand, for instance, HTTPS, it only unwraps the TLS traffic. Another interesting thing is that because you are using Certificate Manager and it can use SNI, it can support multiple domains. So that's another great feature of NLBs.

One more reason to consider a network load balancer is that it will give you a fixed public IP address: either it will provision one for you, or you can use an existing Elastic IP that you have pre-allocated. This is great because sometimes you need that, depending on your network requirements. If you compare that with an application load balancer, an ALB will only give you a CNAME, which you can then point your DNS at, but behind the scenes the associated IPs might change over time, so you cannot rely on them being fixed. So if you really need a fixed IP, don't use an application load balancer, use a network load balancer. Now, what are some of the features of application load balancers, and why might they be worth considering?

Eoin: Application load balancers support pretty much everything we talked about in terms of layer 7 load balancer features. So you've got path-based routing, etc. You can do WebSockets, you can do HTTP/2 (not HTTP/3 yet), and you can do gRPC over HTTP/2 as well. Like network load balancers, you can do your TLS termination, which is integrated with Certificate Manager and also supports Server Name Indication. One of the unique features of application load balancers is that they support Lambda functions as targets.

You can't do that with a network load balancer. And they have authorizers built in: you can use an OIDC provider as an authorizer, and you can also use Cognito user pool authorization. They also integrate pretty well with ECS deployments: containers are automatically added as targets to your load balancer, and ECS will watch your health checks and make sure that the deployment is successful. And you can integrate with AWS Web Application Firewall, or WAF, to get your security rules applied, like IP address blocking or preventing SQL injection or cross-site scripting attacks. Now, if we look at how you create NLBs and ALBs, it's actually very similar between the two; there are just a few subtle differences, even though the feature set is quite different. So let's talk through the concepts and how you construct an architecture around load balancers, whether you're using an NLB or an ALB. You create your load balancer first. Once you have a load balancer, it doesn't do anything on its own: you have to create a listener, and a listener is where you specify the port and the protocol. If you want to use TLS termination on your load balancer, you can specify your certificate as well, linking it to ACM. Within the listener, you specify your default action, and an action can either provide a fixed response, like a 404, or forward traffic to a target group. Target groups are really one of the most important concepts when it comes to load balancers, because this is where you route traffic from the load balancer to the backend. Target groups represent your fleet of backend things. Now, it differs a little at this stage if you're using an application load balancer, because you don't just specify the default action: you can also specify additional listener rules, and each rule has its own target group.
A listener rule can say: if the path is /login, go to this target group; or if there's an HTTP header that matches this value, go to a different target group. All of these rules are prioritized: you give each one a unique priority number, and they're evaluated in that order. If no rules match at all, the default action applies. So your default action might be to serve index.html, or it might be to serve a 404. You can have up to 100 rules per application load balancer listener by default, but that's only a soft limit, so you can get more if you ask nicely. Going back to target groups: whether you're using an ALB or an NLB, these are groups of targets, as the name suggests, but the targets can be IP addresses, and an IP address doesn't even have to be on AWS; it could also be on premises. A lot of people use that as a gateway drug, if you'll excuse the pun, to cloud migration: using a load balancer with an on-premises server as the backend, and shifting traffic over to cloud instances over time. You can also have application load balancers as targets of load balancer listener rules, which means you can have a hierarchy of load balancers. It's not uncommon to do that; a lot of people have that kind of architecture, where they fan out traffic across multiple load balancers. You might have a network load balancer in front of an ALB because you want a fixed IP address, for example. That's pretty common. For application load balancers, we mentioned that you can have Lambda functions as targets. At scale, application load balancers can actually be a much more cost-effective alternative to API Gateway in front of Lambda, albeit with a reduced feature set, because you don't have things like caching, rate limiting, API keys, and various other features.
But the target group is also the place where you configure health checks, which make a request to the targets to determine whether they should be added to or removed from the target group based on their health. And you can specify the frequency and the number of health checks that have to succeed in order for a target to be added to your pool. So, Luciano, I think we covered quite a lot, although there's probably quite a lot else we could cover. Is there anything else we need to know?
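The listener, rule, and target group concepts above can be captured in a toy model: rules are evaluated in priority order against the request's path and headers, and the default action catches everything else. The rule contents and target group names are hypothetical, and this is only the routing logic, not anything resembling the actual AWS implementation.

```python
# Toy model of ALB-style listener rules: each rule has a priority, one
# condition, and a target group. Rules are checked in priority order;
# the default action applies when nothing matches.

rules = [
    {"priority": 10, "path_prefix": "/login", "target_group": "auth-servers"},
    {"priority": 20, "header": ("X-Canary", "true"), "target_group": "canary-servers"},
]
default_action = "web-servers"

def route(path: str, headers: dict) -> str:
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if "path_prefix" in rule and path.startswith(rule["path_prefix"]):
            return rule["target_group"]
        if "header" in rule:
            name, value = rule["header"]
            if headers.get(name) == value:
                return rule["target_group"]
    return default_action

print(route("/login", {}))                   # matched by the path rule
print(route("/home", {"X-Canary": "true"}))  # matched by the header rule
print(route("/home", {}))                    # falls through to the default
```

The unique priority numbers are what make the evaluation order deterministic, which matters as soon as a request could match more than one rule.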

Luciano: Maybe it's worth spending a few minutes discussing pricing, because pricing can be a little bit difficult when it comes to, well, probably everything in AWS, and that doesn't change for load balancers. What is the model for load balancers? There is a minimal unit of time, which is the hour. So you pay for... actually, you can also pay for partial hours, I believe. Is that correct?

But yeah, you basically pay for time, and then for capacity units. This is not too different from other pricing models you find in AWS: there is this concept that you are using a service for a certain amount of time, and then, depending on how much you use that service, there are units that try to represent the amount of capacity you are consuming over that time. There are different kinds of capacity units: for NLBs we have NLCUs, Network Load Balancer Capacity Units, and for ALBs we have LCUs. Now, there is a bit too much complexity in these LCUs to fully cover here, but it's basically a formula based on new connections, active connections, processed bytes, and rule evaluations. If you're really curious, I recommend checking out the documentation and trying to figure it out; you'll probably end up with a big spreadsheet if you want a solid model to predict the cost. And keep in mind that you also have to pay for data transfer out to the internet, per gigabyte, beyond 100 gigabytes.
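To give a feel for how the LCU formula works, here is a back-of-the-envelope sketch: in each hour you are billed on whichever of the four dimensions is largest. The per-LCU allowances and prices below are assumptions based on published us-east-1 figures at one point in time, so treat every number as illustrative and check the current AWS pricing page before modeling real costs.

```python
# Rough sketch of the ALB LCU calculation. All allowances and prices
# below are ASSUMED example values (roughly us-east-1); check the AWS
# pricing page for current figures.

ALB_HOURLY = 0.0225  # assumed base price per ALB-hour
LCU_HOURLY = 0.008   # assumed price per LCU-hour

def alb_lcus(new_conn_per_sec, active_conn, gb_per_hour, rule_evals_per_sec):
    # You pay for the single largest dimension, not the sum.
    return max(
        new_conn_per_sec / 25,      # assumed 25 new connections/sec per LCU
        active_conn / 3000,         # assumed 3,000 active connections per LCU
        gb_per_hour / 1,            # assumed 1 GB processed per hour per LCU
        rule_evals_per_sec / 1000,  # assumed 1,000 rule evaluations/sec per LCU
    )

# Example hour: 100 new conn/s, 4,000 active conns, 2 GB processed,
# 50 rule evaluations/s. New connections dominate here: 100/25 = 4 LCUs.
lcus = alb_lcus(100, 4000, 2, 50)
cost = ALB_HOURLY + lcus * LCU_HOURLY
print(round(lcus, 2), round(cost, 4))
```

The "max of dimensions" shape is why the spreadsheet gets big: you need to know which dimension dominates for your particular traffic profile.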

But data transfer between AZs in the same AWS VPC is free. And by the way, you can enable or disable cross-zone traffic in NLBs, but cross-zone is always enabled for ALBs. Now, another thing worth covering very quickly is your alternatives. What if you don't want to use AWS load balancers? What are your options? We already mentioned at the beginning that you could effectively host anything on EC2. For instance, if you like NGINX, you can of course use NGINX as a load balancer; you just need to make sure you provision it correctly on AWS infrastructure. Similarly, you can use HAProxy or whatever else you like.

My caveat there would be that I think it's significantly complex to do that correctly and to make it scalable and secure. So do that only if you really have the skills and you are really knowledgeable about these tools and all the relevant networking configuration. The pricing is simpler, though; that's actually one reason to consider it, if you don't account for the number of hours you will spend configuring all of that. Another option could be DNS load balancing, which has interesting trade-offs. For instance, you could do geographical distribution of traffic; maybe you can distribute traffic across regions using DNS load balancing. But you also know that DNS is always the reason for problems on the web, so there is significant complexity in doing it right, with DNS caching and validation and all that kind of stuff. So again, it's an option, but it comes with its own trade-offs. And another option that we have discussed before is VPC Lattice. We have an entire episode and a blog post on it, so if you haven't seen those, we'll leave the links in the show notes so you can check them out. Hopefully this was a complete enough coverage of load balancers and how to use them on AWS. I'm sure there is a lot we have missed, so let us know if you have any particular experience that might be worth sharing. What kind of load balancers do you like to use? Or maybe any other comments or questions that you think are relevant here. That's all for today. Thank you for being with us. And as always, if you liked this episode, consider sharing it and giving us a like. See you in the next one.