AWS Bites Podcast

72. How do you save cost with ECS?

Published 2023-03-17 - Listen on your favourite podcast player

AWS ECS is a powerful service that allows you to run containerized applications at scale. It's suitable for a variety of use cases, including web applications, microservices, and background processing.

In this episode, we'll provide an introduction to the main concepts of ECS and then dive into cost-optimization strategies. We'll explore the different options for running containers on ECS, including EC2, Fargate, and ECS Anywhere.

We'll discuss various opportunities for saving money, such as using Arm (Graviton) instances, Spot instances, Compute Savings Plans, and RIs or EC2 Saving Plans.

Finally, we'll cover how to set up ECS to use Spot instances, including how to create capacity providers and specify a capacity provider strategy. We'll also discuss whether it's always best to use EC2 instead of Fargate for cost optimization and recommend some tools that can help you find other opportunities to save on container costs.

AWS Bites is sponsored by fourTheorem, an AWS Consulting Partner offering training, cloud migration, and modern application architecture.

In this episode, we mentioned the following resources:

- The AWS Savings Plans calculator
- Amazon EC2 Instance Selector (open-source CLI tool from Amazon)
- ec2instances.info from Vantage, which shows min and max Spot prices
- The Fargate Right Sizing Dashboard
- AWS Cost Explorer rightsizing recommendations for EC2
- fourtheorem.com

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: ECS, or Elastic Container Service, lets you run containerized applications at scale on AWS. It's suitable for web applications, microservices, and background processing. With ECS, you can use Fargate to keep things simple, or the EC2 version for more advanced control. You can even extend the workload to your own data center if that's your thing. And given all these options, how do you get a handle on cost and optimize to get the biggest bang for your buck?

If you stay until the end of this episode, we will share some tips and tricks that you can follow to try to optimize cost for your Fargate or EC2-backed ECS deployments. I am Luciano, and I'm here with Eoin, and this is the AWS Bites podcast. The AWS Bites Podcast is sponsored by fourTheorem. fourTheorem is an AWS partner for migration, architecture, and training, including ECS. Find out more at fourtheorem.com. You'll find the link in the show notes. So first of all, it's probably a good idea to start by explaining what ECS is and what the main concepts are. Eoin, do you want to try to do that?

Eoin: I'll give that a go. You have two primary modes. You have ECS on EC2, which you mentioned. That's the original flavor of it, where you're running containers on EC2 instances that you can see and manage underneath. And then you've got Fargate, which is the so-called serverless flavor of ECS, and that allows you to run containers without worrying too much about EC2 or the operating system underneath.

That mode is a bit simpler. I suppose the trade-off there is that you have less choice on the underlying hardware, network speed, disk, and the ratio between CPU and memory, and you also have no GPU with Fargate, at least not yet. There is a third option, which is ECS Anywhere, and that's where the instances are external. So if you've got a data center with lots of hardware lying idle, you can run ECS tasks there as well.

You just have to run the ECS agent and the systems manager agent on your on-premises instances. There's a couple of terms and concepts we probably want to describe with ECS. So before we get into cost optimization, we'll talk about them because we're going to mention them as we go through the different ways of saving cost. So the primary grouping within ECS is a Cluster, and if you go back to the original EC2-backed flavor of ECS, that's what the cluster was for.

It was for grouping your instances together into a logical grouping, and they would be able to run tasks for you. Now, you still use clusters if you're running Fargate, but you don't really have to configure very much there, and a cluster can combine different workloads that are running EC2 and Fargate, which is kind of interesting, and we'll get into that. And then the smallest, I suppose, unit on ECS is a Task, and a task is one or more containers, so it's a bit like a pod, and you've got multiple containers running in it.

It could be a single container, or one container with a kind of sidecar. In order to run those tasks, you need something called a Task Definition, which is where you declare the container image, whether it's compatible with Fargate or EC2, and lots of other parameters like your networking setup, memory, and CPU. So that's your task and task definition, and then if you're running groups or pools of tasks together, you can use the ECS Service concept. A service is basically where you say: I want to run a desired number of tasks, and I want you to scale it up and down for me.

So you're leveraging ECS's managed capability to scale up and down for you, and it includes a feature called Application Auto Scaling, which will scale tasks up and down. If you're using EC2 under the hood, you can combine that with cluster auto scaling, so that your tasks scale up and down and the instances scale up and down too, to be able to run that number of tasks. A service also gives you a couple of other neat features, like deployment configurations: what happens when you deploy a new version of that task definition, and what happens when things stop working? Do you want a circuit breaker so you can roll back? It includes health checks, and it also includes integration with load balancers, so that when you scale up, you're adding tasks into that load balancer target group. So that's really the set of concepts you need to learn if you're getting started with ECS. A lot of people might be familiar with them already, but pricing is sometimes a little bit more of a nebulous area, Luciano, which you might want to try and describe. How does ECS pricing work?
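
To make these concepts a bit more concrete, here is a minimal sketch of registering a task definition and creating a service. Python and boto3 are our choice here, not something from the episode, and the cluster name, image URI, subnets, and sizes are placeholder values.

```python
import boto3

ecs = boto3.client("ecs")

# Task definition: the blueprint for a task (image, CPU, memory, networking).
task_def = ecs.register_task_definition(
    family="web-app",                      # hypothetical family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="1024",                            # 1 vCPU
    memory="2048",                         # 2 GB
    containerDefinitions=[
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/web-app:latest",  # placeholder
            "portMappings": [{"containerPort": 8080}],
            "essential": True,
        }
    ],
)

# Service: keep a desired number of tasks running and scale them for you.
ecs.create_service(
    cluster="demo-cluster",                # placeholder cluster
    serviceName="web-app-service",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    desiredCount=2,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-aaaa1111", "subnet-bbbb2222"],  # placeholder subnets
            "assignPublicIp": "ENABLED",
        }
    },
)
```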

Luciano: Yes, so you mentioned that there are pretty much two different ways of running containers on ECS. Either you provide your own EC2 instances, or you delegate the job of finding the compute to Fargate, and AWS will figure out how to allocate compute resources for you. When you use EC2, it's actually pretty interesting how that affects the pricing model, because you basically just pay for the EC2 instances that you put into the cluster.

So you just pick whatever set of EC2 instances suits your needs, you know that's going to drive your cost, and there is no extra cost for ECS at that point. If you decide to use Fargate, because it's on AWS to find the compute resources, there is, of course, a pricing model, and that pricing model is based on two dimensions: how much memory you give to the containers that are running, and how many vCPUs you give to them.

And of course, there is a unit cost for each of these two dimensions. For instance, if you use one vCPU and one gigabyte of memory, you might be paying something like four cents per hour for the vCPU and a much smaller amount for the gigabyte of memory. These rates apply to a specific region and to Linux on x86; if you're using Windows, the same way of calculating cost applies, except that the prices are different because you are paying for a Windows instance.

And there is also an additional charge for the Windows license, which gets included. So on top of what you pay per vCPU and per gigabyte, you also pay something around five cents per hour just to cover that Windows licensing cost. Keep in mind that if you use Linux, you're not paying the extra licensing cost; if you use Windows, you have to pay an additional charge to cover the license.
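
To make the Fargate pricing model concrete, here's a back-of-the-envelope sketch in Python. The rates below are illustrative placeholders, not official prices; always check the Fargate pricing page for your region and OS.

```python
# Illustrative per-hour rates (Linux/x86); real prices vary by region.
VCPU_HOUR = 0.04              # ~4 cents per vCPU per hour (illustrative)
GB_HOUR = 0.0045              # per GB of memory per hour (illustrative)
WINDOWS_LICENSE_HOUR = 0.05   # extra Windows licensing charge per vCPU per hour (illustrative)

def fargate_hourly_cost(vcpu, memory_gb, windows=False):
    cost = vcpu * VCPU_HOUR + memory_gb * GB_HOUR
    if windows:
        cost += vcpu * WINDOWS_LICENSE_HOUR
    return cost

# One task with 1 vCPU and 2 GB, running 24/7 for a 30-day month:
print(round(fargate_hourly_cost(1, 2) * 24 * 30, 2))  # ~35.28 (USD, illustrative)
```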

So the next question might be: given this model, what are our options to save on cost? There are different ideas that we are going to put on the table. The first one is that you can use a different processor, because if you use ARM (Graviton), it can get up to 20% cheaper with Fargate. It can be a bit more variable with EC2, because depending on the type of instance you pick there might be differences in cost, but it generally gets cheaper with ARM (Graviton) processors.

There is also another aspect there: you could get better performance with ARM, so it doesn't necessarily affect your unit cost per se, but you might need less compute overall if you switch to ARM. That can be another thing that ends up affecting your cost, because you can reduce the number of instances or the amount of compute you actually need to run your specific workloads.
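
In practice, switching a Fargate task to Graviton is mostly a matter of setting the runtime platform in the task definition, provided your container image is built for (or is multi-arch including) linux/arm64. A hedged sketch in Python/boto3, with placeholder names and image:

```python
import boto3

ecs = boto3.client("ecs")

# Same shape of task definition as before, but targeting ARM64 (Graviton) on Fargate.
ecs.register_task_definition(
    family="web-app-arm",                  # hypothetical family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="1024",
    memory="2048",
    runtimePlatform={"cpuArchitecture": "ARM64", "operatingSystemFamily": "LINUX"},
    containerDefinitions=[
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/web-app:arm64",  # placeholder
            "essential": True,
        }
    ],
)
```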

Another option is to use Spot, and you can use Spot whether you go with Fargate or with EC2. The idea of Spot in general, if you've never used it or never heard of it, is that AWS tries to sell the spare capacity it has available in its data centers. At any given moment it works a bit like a market: you can get additional compute at a better price because you are buying AWS's leftover capacity. That's very interesting, but it makes for a very variable market, where there isn't really a fixed cost that you can easily predict.

Prices shift all the time, so even though most of the time you can get significant savings by going to the Spot market, it's very variable, and it's not something you can predict with extreme accuracy. And of course, the savings differ depending on whether you go with Fargate or EC2: in general, what AWS tells you is that you can get up to 70% saving with Fargate and up to 90% saving with EC2.

But again, that really depends on the prices available at the moment you decide to get those instances. And because it looks like such a good deal, of course there are trade-offs to keep in mind. So stay tuned, because towards the end we'll cover what those trade-offs are.

Another option to save money is a Compute Savings Plan, which is basically a commitment you make to AWS to spend a certain amount on compute, for at least one year and up to three years, and in exchange you get that compute at a discounted rate. It doesn't really lock you in on the type of instances that you can use; you have some flexibility to shift the compute as you go.

Of course you cannot fully predict your use case going forward, so AWS gives you some room to maneuver: if you want to switch to different compute, you can do that. It is actually quite interesting, and we will have a link to the Savings Plans calculator page if you want to play around with this idea. Basically, you can get from 20 to 52% saving with Fargate, and up to 66% with EC2, always depending on instance type, region, and so on.
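
As a rough, purely illustrative calculation of what that commitment looks like (the spend figure and discount below are made up, chosen within the ranges quoted above):

```python
# Hypothetical numbers: current on-demand Fargate spend and an assumed discount.
on_demand_hourly_spend = 2.00     # USD per hour of compute today (illustrative)
savings_plan_discount = 0.30      # e.g. ~30% off, inside the 20-52% Fargate range

committed_hourly_spend = on_demand_hourly_spend * (1 - savings_plan_discount)
yearly_saving = (on_demand_hourly_spend - committed_hourly_spend) * 24 * 365
print(committed_hourly_spend, round(yearly_saving, 2))  # 1.4 per hour, ~5256.0 USD/year saved
```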

Now, another similar concept, and another way you might save money with ECS, is Reserved Instances, where you are literally committing to instances upfront and you get locked into that specific type of instance; that can give you up to 72% saving with fixed instance classes. And finally, another idea is ECS Anywhere, which is the option we mentioned before where you can run ECS on your own hardware.

So you are running containers in your own data center, on spare hardware that you have available outside AWS. You don't pay AWS for that hardware, of course, because we assume you already purchased it, but there is a charge that AWS adds for every single instance that is connected to ECS. It's very low, about $0.01 per hour, but it's something to keep in mind, especially if you have thousands of smaller instances rather than a few very big ones.
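
As a quick sanity check on that per-instance fee, using the approximate $0.01 per instance-hour figure mentioned above (check current pricing; the fleet size is hypothetical):

```python
ECS_ANYWHERE_HOUR = 0.01        # approximate per-instance fee mentioned in the episode
instances = 1000                # hypothetical fleet of small on-premises instances
hours_per_month = 24 * 30

print(instances * ECS_ANYWHERE_HOUR * hours_per_month)  # 7200.0 USD/month (illustrative)
```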

So the next question is: can you mix these cost-saving strategies? Of course you can. For instance, you can combine Graviton processors with Savings Plans, but you cannot use Savings Plans with Spot. So keep in mind that there is some flexibility to combine different strategies, but not every strategy can be used together with the others.

Another point: if you are doing cost optimization, you might have this question. Do you always go for EC2, or do you go for Fargate? From what we said so far, EC2 seems to be cheaper in general, so you might be thinking: should I just use EC2 instead of Fargate and save money that way? It is actually a very interesting question. It's very hard to answer in absolute terms, but in general we feel that it's not necessarily a good idea, because the price difference is not that big, and we need to remember that with EC2 you still have to manage your instances, while Fargate does all that work for you.

When you need compute capacity, AWS will figure out which EC2 instances to run it on, and it's not up to you to provision those instances and make sure they are secure, connected to the network in the right way, and all those kinds of things that you would otherwise need to do on your own. So you definitely get slightly cheaper prices when you go with your own EC2 if you just look at the cost of literally running that particular workload, but at the same time you have to take on all the additional cost of managing those instances. So from a total cost of ownership perspective, Fargate still seems a lot more convenient than EC2, considering that the price difference is not that high after all. So I suppose the question we can try to address next is: how do you set up ECS to use Spot instances?

Eoin: We said that you can get up to 70% saving with Fargate Spot. It's interesting that with Fargate Spot you actually get pretty close to 70% almost all the time: it's generally somewhere in the region of 60-68% saving. Whereas with EC2 you're paying individual Spot prices for individual instance types, so it's a lot more variable per instance type and can swing between roughly 50% and 90%.

There are some tools you can use for EC2 which make the historical Spot prices a little bit more transparent. You can go into the EC2 console and look at the historical pricing, there's an API for it, and you can use the AWS CLI to look at historical Spot prices. There are also a couple of tools worth mentioning. One is an open-source tool from Amazon called Amazon EC2 Instance Selector, which is a really nice CLI tool that lets you filter instance types by the criteria you want and then gives you a report back saying these are the instance types that are suitable for you.
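
For example, the Spot price history API mentioned here can be queried programmatically. A small sketch in Python/boto3; the region and instance types are arbitrary examples:

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2", region_name="eu-west-1")  # example region

# Pull the last 24 hours of Spot prices for a couple of instance types --
# the same data you can browse in the EC2 console's Spot pricing history.
response = ec2.describe_spot_price_history(
    InstanceTypes=["m6g.large", "m5.large"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=24),
)

for price in response["SpotPriceHistory"]:
    print(price["AvailabilityZone"], price["InstanceType"], price["SpotPrice"])
```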

And then there's a website from Vantage, ec2instances.info. We'll give all those links in the show notes, and that tool will show you the min and max Spot prices. So when it comes to actually setting up Spot, we mentioned that for a cluster you can choose Fargate or EC2, or mix them. What you basically do is use this concept of capacity providers. Within your cluster, everybody gets the Fargate and Fargate Spot capacity providers out of the box, because they're always there; you don't have to set anything up.

And then you can create your own capacity providers for EC2 by creating auto scaling groups in EC2 and referencing them. So you basically say: I'm creating a capacity provider, and it's using an auto scaling group whose launch template requests Spot instances and specifies the set of EC2 instance classes I'm going to use. That's one auto scaling group.
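
In API terms, that step might look something like the sketch below (Python/boto3; the ASG ARN, provider name, cluster name, and settings are placeholders): create a capacity provider from an existing Spot auto scaling group, then attach it to the cluster alongside the built-in Fargate providers.

```python
import boto3

ecs = boto3.client("ecs")

# Turn an existing auto scaling group (whose launch template requests Spot
# instances) into an ECS capacity provider.
ecs.create_capacity_provider(
    name="spot-capacity",  # hypothetical name
    autoScalingGroupProvider={
        "autoScalingGroupArn": "arn:aws:autoscaling:eu-west-1:123456789012:autoScalingGroup:11111111-2222-3333-4444-555555555555:autoScalingGroupName/spot-asg",  # placeholder ARN
        "managedScaling": {"status": "ENABLED", "targetCapacity": 100},
        "managedTerminationProtection": "DISABLED",
    },
)

# Associate it with the cluster, alongside the built-in Fargate providers.
ecs.put_cluster_capacity_providers(
    cluster="demo-cluster",  # placeholder cluster
    capacityProviders=["FARGATE", "FARGATE_SPOT", "spot-capacity"],
    defaultCapacityProviderStrategy=[{"capacityProvider": "FARGATE", "weight": 1}],
)
```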

But then you can create an on-demand auto scaling group as well and add that as another capacity provider. I think you can add up to 10 capacity providers. Then you combine them in what's called a capacity provider strategy, where you say: okay, I'm going to use some capacity from this provider and some from this other provider, and that could be a mixture of EC2 on-demand and EC2 Spot, and you give a weight to each of them.

So you can say: for every one on-demand instance, I want four Spot instances. But you can also specify a baseline. You could say: I want a steady baseline of two on-demand instances, so my first two will be on-demand, and beyond that there's a ratio between on-demand and Spot. And you can mix multiple capacity providers in that strategy. It is worth stating that, while you can mix Fargate and EC2 providers in your cluster, you can't really use them at the same time.
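
To illustrate the base and weight idea, here is a hedged sketch of a service using a capacity provider strategy: keep a baseline of two tasks on an on-demand provider, then split the rest 1:4 between on-demand and Spot. The provider, cluster, and task definition names are hypothetical (the Spot provider is the one from the previous sketch), and a task definition using awsvpc networking would also need a networkConfiguration.

```python
import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="demo-cluster",             # placeholder cluster
    serviceName="worker-service",       # hypothetical service
    taskDefinition="worker:1",          # hypothetical EC2-compatible task definition
    desiredCount=10,
    capacityProviderStrategy=[
        # base: the first 2 tasks always go to the on-demand provider
        {"capacityProvider": "on-demand-capacity", "base": 2, "weight": 1},
        # weight: beyond the base, 4 Spot tasks for every 1 on-demand task
        {"capacityProvider": "spot-capacity", "base": 0, "weight": 4},
    ],
)
```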

So if you set up a service that scales up to a certain number of desired tasks, you can't have a mix of Fargate and EC2 in that service. You can only use Fargate together with Fargate Spot, or EC2 together with EC2 Spot. You can always use the RunTask API directly, so you don't have to use a service. You could just launch tasks yourself using the RunTask API, and every time you call it you can launch up to 10 tasks and pick whichever capacity provider you want.
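
A minimal RunTask sketch along those lines (Python/boto3; cluster, task definition, and subnet are placeholders), launching a one-off batch of tasks on Fargate Spot:

```python
import boto3

ecs = boto3.client("ecs")

# Launch a batch of tasks directly (no service), choosing the capacity provider
# per call -- here Fargate Spot. Up to 10 tasks per RunTask call.
ecs.run_task(
    cluster="demo-cluster",            # placeholder cluster
    taskDefinition="batch-job:3",      # hypothetical Fargate task definition
    count=10,
    capacityProviderStrategy=[{"capacityProvider": "FARGATE_SPOT", "weight": 1}],
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-aaaa1111"],   # placeholder subnet
            "assignPublicIp": "DISABLED",
        }
    },
)
```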

So that's another way you could potentially mix and match between Fargate and EC2 and get a blend. But obviously the more you do this, the more complex it becomes: you have to trade off the complexity of basically managing your own scheduler and orchestrator against your cost saving. Spot is pretty advantageous, though. If we just go back to our pricing example, we ran a few calculations earlier.

You have to look at how many vCPUs and how much memory you want, and then look at the available EC2 instances. With EC2 you always want to pick a ratio that matches your task definition's CPU-to-memory ratio, and then you can do a pricing comparison. If you look at a one-vCPU configuration with two or four gigabytes of memory, you're talking about somewhere in the range of four or five cents per hour for CPU and memory.

And when you compare Fargate and EC2, you pay a little bit extra for Fargate, but it's not miles away or anything. So you might say: okay, Spot is giving me up to 70% saving here, so why can't I just use Spot all the time? What are the disadvantages of Spot? Well, you mentioned that it uses spare capacity, so there will come a point where the spare capacity isn't spare anymore, because somebody actually wants to run an on-demand instance.

In that case, you'll get a notification, a 120-second warning from AWS, that they're going to shut down your instance or your Fargate container, and you have to gracefully shut down and move your work somewhere else, or just accept that you're going to have to wait for a while. The other disadvantage is that prices fluctuate, so it's not exactly deterministic. And sometimes instances won't be available for your instance class, or even in Fargate.

It depends on your region, the time, and the demand. And of course, Fargate Spot is Linux and x86 only, so you don't get ARM Fargate Spot containers at the moment, and you don't get Windows on Spot either. If your workload is on-demand and has to be predictable, it's really a good idea to look at Compute Savings Plans instead. They're really flexible, and if you know you're going to be using a certain amount of compute for one or three years, there's a significant saving you can get there. On this topic, what else do we need to bear in mind, Luciano? Is there anything else we need to cover off?
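
How you handle that interruption warning is application-specific, but a common pattern inside the container is to trap SIGTERM (ECS sends SIGTERM when it stops the task, followed by SIGKILL after the stop timeout) and drain in-flight work within that window. A generic worker-loop sketch in Python, with the actual work left as a placeholder:

```python
import signal
import sys
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # ECS delivers SIGTERM on task stop (e.g. Spot reclaim); flag the loop to drain.
    global shutting_down
    shutting_down = True
    print("SIGTERM received: draining in-flight work before exit")

signal.signal(signal.SIGTERM, handle_sigterm)

while not shutting_down:
    # placeholder: poll a queue / process one unit of work here
    time.sleep(1)

# placeholder: checkpoint or hand off remaining work, then exit cleanly
sys.exit(0)
```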

Luciano: I think in general we've focused a lot here on the compute cost itself, and it's definitely the biggest chunk of the cost when you run ECS workloads. But when you run ECS workloads, you will also have additional costs coming from your logs, your metrics, data transfer, and disk I/O, and we can even extend that to databases: how you interact with the database, how much data you store, and the cost of whatever you use, whether it's DynamoDB or RDS.

So in general, it's very hard to do cost optimization looking only at the compute itself. I would suggest expanding the horizon a little and seeing if there are other opportunities for cost optimization. For instance, maybe you can reduce the retention of logs: if you are storing logs indefinitely, you can set a retention period, or if your retention period is very long, maybe you can reduce it, and that can give you significant cost savings almost immediately with a very small change.
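
For example, setting a retention policy on a container log group is a one-line change. A sketch in Python/boto3; the log group name and retention period are placeholders:

```python
import boto3

logs = boto3.client("logs")

# Cap retention on a container log group instead of keeping logs forever.
logs.put_retention_policy(
    logGroupName="/ecs/web-app",   # placeholder log group
    retentionInDays=30,            # pick a retention that fits your compliance needs
)
```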

So I guess the suggestion is to look for cost-saving opportunities not just within ECS itself, but also in everything else ECS effectively forces you to use to run your specific workloads. And there are some other tools we might want to recommend, just to give you more ideas on where there can be opportunities to save cost. One is the Fargate Right Sizing Dashboard, which is a tool you can deploy in your own account; it uses Container Insights metrics to identify waste and optimization opportunities.

We will have the link in the show notes, but again, it's something you can decide to deploy in your own account and use as an additional tool to get more visibility. Similarly, there is a tool already in AWS, in AWS Cost Explorer specifically, that can give you rightsizing recommendations for EC2 instances. That's something to have a look at if you're using a lot of EC2 instances to back your ECS cluster, just to make sure those instances are actually well utilized.

Maybe you can get away with a smaller instance or a different kind of instance. So that's everything we have for today. This is a very big topic, and cost optimization, as we said, is a little bit nebulous; it's a little bit of an art. We would be really curious to know if you have other tricks that you've learned along the way. Please share them with us and we'll be sure to share them on Twitter, on YouTube, or in a future episode. This is definitely a topic where you never learn enough; you always need to discover something new. So I think it's very important that as a community we keep sharing what we learn, so we can all learn from each other. Thank you very much for being with us today. I hope you found this informative, and we look forward to seeing you in the next episode. Bye.