AWS Bites Podcast

61. How do I control AWS cost?

Published 2022-12-02 - Listen on your favourite podcast player

Let's face it: when it comes to AWS, cost is one of the scariest topics!

Why? Mostly because the underlying model can get very complex.

There are too many variables, and ultimately it's just hard to predict how much a given workload is going to cost you on AWS. Are you going to be bankrupted by this unpredictable cost? Probably not!

In this episode, we share some suggestions and tools on how to approach cost when going to AWS. It's not a simple topic, but it's something you need to embrace, learn and get confident with. With a bit of effort, cost will not be so scary anymore and you'll be able to take advantage of all the awesome services and features of AWS without being so worried about cost!

AWS Bites is sponsored by fourTheorem, an AWS Consulting Partner offering training, cloud migration, and modern application architecture.

Some of the resources we mentioned:

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Eoin: Things were simpler back when we could buy hardware and a few software licenses, hoping that it was enough, but not too much to run whatever we needed, and that was it. In the cloud, we don't have to pay for much upfront, and we can scale way beyond what we originally anticipated. This flexibility, though, comes with a trade-off. We are talking about cost complexity. If the cost of cloud is holding you back, this episode is for you.

We're going to talk about understanding AWS pricing and billing, and share tools and tips to get better visibility and control on your AWS costs. We will also let you know some ways you can get AWS to pay the bill for you. I'm Eoin, I'm here with Luciano, and this is the AWS Bites podcast. AWS Bites is sponsored by fourTheorem. fourTheorem is an AWS consulting partner offering training, cloud migration, and modern application architecture. Find out more at fourtheorem.com. You'll find the link in the show notes. Luciano, cloud computing is often compared to your domestic utilities like electricity. You pay for what you use, and this is probably very appealing if you've got very low or very sporadic usage, since you don't pay for the idle time. So if this is the case, why does cloud computing cost scare people off so much?

Luciano: Yeah, I think it's because everything looks great when you don't have surprises, and you can have some surprises because the billing model is a bit complicated. So people are naturally scared: okay, what if I didn't get it right? What if it doesn't turn out to be as cheap as people say it should be? Because there are so many variables, I don't really have a lot of confidence that my understanding of this model is correct.

And the way I'm gonna be using this cloud computing magic thing is not necessarily the same as the way other people use it, so it might be cheaper for them but not necessarily cheaper for me. Another interesting thing is that there is this concept of the free tier, which is a way for AWS to let people step into AWS, play around and try different things, theoretically without incurring any costs.

But again, you need to be very careful about what the promise actually is. What are the limits you need to watch out for? When are you within the free tier? It's not obvious; it depends on how you use AWS. So it can be very easy to not even realize that you are going over the free tier and end up with a bill that might actually be significant for you. We have a link in the show notes with the definition of what the free tier actually is and some frequently asked questions around it.

So that's another thing we suggest people look into if you are still a little bit skeptical about getting started with AWS: what is the free tier, and what can you effectively get for free? It can be a very good resource to figure out exactly where to start. I want to give you another example, just to make the point that sometimes things can be surprisingly expensive.

These are the stories that you hear, and then they scare people off. People say, no, I don't want to go to AWS because I don't want to be the next person with a billing horror story. A very typical one is about NAT gateways. We have talked about them a lot in the past, also in our "Horror Stories" episode, so it's a recurring one. The point is that whenever you are building an application in the cloud, you need to do some networking, and it's very common to create your own private networks.

And then, when you have compute that needs to access the internet, you end up provisioning a NAT gateway in your private networks. A NAT gateway is a service from AWS, so it comes with its own cost structure that you need to understand. It looks very cheap because it's about 5 cents per hour. But then you also pay a certain amount for data processing, which can feel like a bit of a hidden cost.

At first glance that can look very cheap as well, because it's about 5 cents per gigabyte, more or less depending on the region. Oh, and there is also an additional charge, around 10 cents per gigabyte I believe, for the data that goes out to the public internet. So depending on how much traffic you are sending from your private VPC to the public internet, that might escalate very, very quickly, and you might end up with a massive bill without even realizing it, just from a NAT gateway. This is just one of those stories that people use as evidence that the cost structure is not easy to understand, and that you might end up with surprises that are not nice to have.
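As a quick sanity check, the NAT gateway maths Luciano describes can be put in a few lines of Python. The rates are the approximate figures mentioned above, not authoritative prices; check the current AWS pricing page for your region.

```python
# Back-of-envelope NAT gateway monthly cost, using the approximate
# rates mentioned in the episode (illustrative, not current prices).
HOURLY_RATE = 0.05      # $/hour the gateway exists
PROCESSING_RATE = 0.05  # $/GB processed by the gateway
EGRESS_RATE = 0.10      # $/GB transferred out to the public internet

def nat_gateway_monthly_cost(gb_per_month, hours=730):
    """Estimate one NAT gateway's monthly bill for a given traffic volume."""
    return (hours * HOURLY_RATE
            + gb_per_month * PROCESSING_RATE
            + gb_per_month * EGRESS_RATE)

# An idle gateway is cheap, but push 10 TB through it and the bill
# climbs past $1,500 -- which is exactly how the surprise happens.
print(round(nat_gateway_monthly_cost(0), 2))
print(round(nat_gateway_monthly_cost(10_000), 2))
```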

Eoin: I think there are just some things you have to be aware of that you can't really escape, and the NAT gateway is a classic example. But data transfer in general is one of the things most frequently complained about when it comes to AWS pricing. It's always free to get your data in, but, unsurprisingly maybe, it's costly to send it out. So you pay for data transfer between AWS regions and for data transfer out to the internet.

You also pay for data transfer between availability zones in a region. So you can have an EC2 instance that's talking to an RDS database in a different AZ, and you'll pay for that. S3 pricing has become a bit better. We talked about that in a few previous episodes because of the competition from Cloudflare with their S3 alternative. Now you can get 100 gigabytes out for free. So maybe we can paint a picture of cost complexity with an example.

So if you imagine we have a typical web server architecture with a database cluster, EC2 instances, some EBS volumes attached to those, a load balancer, internet gateway, NAT gateway, DNS, you've got a number of services there already. But for each of those services, you'll also have multiple different pricing dimensions. So if we just look at the EBS component, the price for your EBS volumes, so having this disk essentially attached to your instance will depend on the volume type, and you will choose a different volume type depending on whether you're optimizing for throughput or IO operations.

And then it's a function of that volume type: the size, the throughput, the number of IO operations. You also have things like EBS snapshots, and you might have backups as well. These are difficult things to understand. EBS snapshots are incremental deltas, so it's not always easy to predict what the cost is going to be. And in that example, load balancer cost is another tricky one.
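To make those EBS dimensions concrete, here is a gp3-style cost sketch. The rates and free baselines are assumptions for illustration, not current AWS prices.

```python
# Sketch of a gp3-style EBS monthly cost model: storage, IOPS and
# throughput are billed as separate dimensions. Rates are assumed.
def ebs_gp3_monthly_cost(size_gb, iops=3000, throughput_mbps=125):
    storage = size_gb * 0.08                         # $/GB-month (assumed)
    extra_iops = max(0, iops - 3000) * 0.005         # first 3000 IOPS included
    extra_tp = max(0, throughput_mbps - 125) * 0.04  # first 125 MB/s included
    return storage + extra_iops + extra_tp

# A 500 GB volume at baseline performance...
print(round(ebs_gp3_monthly_cost(500), 2))
# ...versus the same volume tuned for 10,000 IOPS and 500 MB/s.
print(round(ebs_gp3_monthly_cost(500, 10_000, 500), 2))
```

The point is exactly the one made above: the same disk more than doubles in price once you change the performance dimensions.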

With load balancers, you pay a standing charge, but you also pay for LCUs, which are load balancer capacity units. Lots of different AWS services have some kind of <something> CU: DynamoDB has write capacity units, and even the new Neptune serverless has, I think, Neptune capacity units or something. In the load balancer case, each LCU gives you a certain number of connections, new connections, data throughput and rule evaluations.
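The LCU model can be sketched the same way: you are billed on whichever dimension you consume the most of. The dimension sizes and rates below are commonly documented approximations; verify them against the current ALB pricing page.

```python
# Hedged sketch of ALB LCU billing: the billed LCU count is the
# maximum across the four dimensions. Rates and sizes are assumed.
def alb_lcus(new_conns_per_sec, active_conns, gb_per_hour, rule_evals_per_sec):
    return max(
        new_conns_per_sec / 25,     # 25 new connections/second per LCU
        active_conns / 3000,        # 3,000 active connections/minute per LCU
        gb_per_hour / 1.0,          # 1 GB/hour processed per LCU
        rule_evals_per_sec / 1000,  # 1,000 rule evaluations/second per LCU
    )

def alb_monthly_cost(lcus, hours=730, lcu_rate=0.008, hourly_rate=0.0225):
    return hours * (hourly_rate + lcus * lcu_rate)

# Traffic-heavy workload: data processed dominates the LCU count.
lcus = alb_lcus(new_conns_per_sec=50, active_conns=4000,
                gb_per_hour=20, rule_evals_per_sec=100)
print(lcus, round(alb_monthly_cost(lcus), 2))
```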

So you have to work out all these things for your workload and figure out what the cost is going to be. Unfortunately, a lot of the advice here is: do your homework and get good at it. It's just becoming part of software architecture that you have to understand pricing models just as much as you understand performance and scalability. But look, it's not all horror stories. There's actually quite a lot you can do.

It can also be really advantageous in terms of cost if you get it right. I mean, some people get away with ridiculously low AWS bills. I think one good example was A Cloud Guru, who ended up building their entire learning management system on serverless and AWS, and frequently publicly talked about how, despite the fact that they were pretty much the poster child for serverless applications for a number of years, with hundreds of thousands of users and massive revenues, their AWS bill was essentially zero. So there's a good side to this as well. But what would you recommend, Luciano, for first early adopters to AWS to try and ease the burden and complexity here? Because it does end up a bit scary.

Luciano: Yeah, I want to echo what you just said: you need to do your own homework and get good at it. And there are some tools that can help. For instance, there are billing simulators, even official AWS ones, and we'll have a link in the show notes. But what I find more useful most of the time is to just use a spreadsheet, or whatever tool makes sense to you, because building the model yourself forces you to understand these dimensions a little bit more.

It makes you go a little bit deeper than just using a simulator, where you might forget to look at a particular field and then not consider a particular dimension that ends up being very relevant in the final cost calculation. And building your own spreadsheet can be an interesting exercise. I don't know if you have startup experience, but when I do this exercise myself, it always feels like I am writing a business plan for a startup.

And when you do a business plan for a startup, there is always a lot of guesstimation, because there are so many dimensions. You might have a feeling for where things will go over time, but it's a more or less informed guess, depending on how much control you actually have over that particular dimension, and most of the time you don't have a lot of control. Still, it's an interesting exercise, and when you do it for a startup there is not only the question of how much is this going to cost me, but also: where can I save money?

Can I get some credits from somewhere? You can definitely do the same exercise with AWS, because as we said in the intro, there are also ways to get discounts or to get AWS to pay something for you in one way or another. One really interesting thing is a program called AWS Activate, which can give you credits if you are a startup or a solo founder.

We are gonna have, again, the link in the show notes, but the idea is that if you are a startup, you can get up to $100,000 in credits. There are, of course, constraints and limits you need to check to make sure you actually qualify. But if you do, it's a huge advantage to have that level of discount, especially for a company that is not stable yet. You might be scared that the cloud can just bankrupt you if you don't do things the right way.

And at the very beginning, maybe you're not confident, you don't have all the skills yet to get the architecture right straight away. You might need to experiment, to pivot, to try different things. So having that extra cash available that you don't have to spend yourself can boost your confidence and your success rate, because you will have a lot more freedom to make mistakes and then recover from them.

That's also something you can do even if you are a solo founder who is just starting out; it's actually quite easy to get Activate credits up to $1,000. So even if you just have a very simple idea that you want to test, and you don't have any revenue because you don't even know if that idea is going to be successful, you can pretty much get the credits and probably build everything for free, because with $1,000 you can build a lot of stuff and validate your idea. Then you can decide whether to invest more, grow the company, hire somebody, maybe apply again for the Activate program at a bigger level and get more credits. We need to keep in mind, of course, that if you are a company that has been in the market for a long time, you might not be considered a startup. So what are the options if you are an established company? Does it mean you don't get any credits, or is there something else you would do there?

Eoin: Yeah, there's even more options, I would say, for larger companies. Before we get to that, actually, I've previously used AWS Activate credits. I was a solo founder of a company. I think my advice there is ask for the credits. Don't just look at the official channels. I mean, generally you have to go through an accelerator or incubator to get them, but AWS are keen to give you those credits. And it's an investment from them as well because I was in a case where I used those credits, did a lot with AWS, went on in subsequent companies to do even more with AWS and it's all good for their business, right?

So they want people to get stuck into AWS and have success stories building on AWS. And you can apply multiple times as well to hit that $100,000 limit. So definitely don't be shy when it comes to applying for AWS Activate credits. For larger companies, there's one great program, which is called the AWS MAP program or the Migration Acceleration Program. And this is a significant fund for migrating to AWS.

When we're talking about migrating to AWS, that could be an existing on-prem application, something hosted somewhere else, or even a set of Excel sheets and a business process you have internally in your company. And there's a load of funding available for that. They break it down into three phases. You've got a fund for the assessment phase, which is where you're doing discovery and planning for your migration.

And they'll pay for a partner like fourTheorem to do a lot of that work for you. And they'll fund up to $60,000 for that. And then the next phase is Mobilize. So this is basically when you're getting ready to actually do the migration. So that could be running proof of concept projects. It could be preparing your AWS organization with landing zones, et cetera. And they'll pay for half of that as well.

Then, when you get into the actual migration, AWS will give you credits or cash back for up to 25% of the annual recurring revenue. So for anybody migrating to AWS, it's a bit of a no-brainer. You do have to reach certain targets; there are two different versions of the program, and it ultimately depends on your annual revenue. But I would always say, no matter what you're doing with AWS, pester your solutions architect and account manager as much as possible to do as much for you as they can.

The more you talk to them, the more you'll find out about these programs. You can also talk to us about this. Like I mentioned, fourTheorem, our employer and the sponsor of the show, is an accredited MAP partner, so we can do these migration workloads. There's a lot of due diligence that goes into making sure that the people who help you with the workloads know how to do it. We'll give a link in the show notes to the fourTheorem page about MAP, because it's a very nice summary of what the program provides.

But you can also do this for multiple workloads. If you're an enterprise and you've got lots to migrate to AWS, you can apply for multiple MAP programs. Coming back to tips and tools for optimizing costs, one thing I'd recommend setting up is AWS Budgets. Of all the tooling out there for cost optimization and getting control of budgets, AWS Budgets is a pretty simple one.

It doesn't take very long to set up and it gives you a bit of a safety factor right away. With these budgets, the simplest approach is just to set a cost amount for your organization or for each account. When your actual spend exceeds that threshold, or is forecast to reach it (you can configure it to use actual data or forecasts), you'll get an alert.

That alert can come in via email, or via Slack, SMS, et cetera. You can make budgets more advanced: you can select specific services, like having a different budget for EC2 and a different one for S3. But the simplest thing is to start off with X dollars on each account, measure over time and adjust the budgets. And it works pretty well.

You can also set up budgets for things like Reserved Instances and Compute Savings Plans, which we'll mention later. The last thing on budgets is that you can even automate actions. When budgets are exceeded, you can take serious action, like denying people in the accounts from doing EC2 RunInstances, if you really, really have to ensure that you're not going to incur those costs. There's always this balance between giving developers and AWS practitioners freedom and controlling costs.
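A minimal budget like the one described can be created with boto3's `budgets` client. The account ID, budget name and email address below are placeholders.

```python
# Minimal monthly cost budget with an alert at 80% of the forecast.
# The account ID and email address are placeholders.
def monthly_budget_params(account_id, name, limit_usd, email):
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [{
            "Notification": {
                "NotificationType": "FORECASTED",  # fire on forecast, not just actuals
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,                 # percent of the budget limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }],
    }

params = monthly_budget_params("123456789012", "dev-account-monthly", 500, "team@example.com")
print(params["Budget"]["BudgetName"])
# To actually create it (needs AWS credentials):
# import boto3
# boto3.client("budgets").create_budget(**params)
```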

So you need to decide how much leeway you're going to give, and different companies have different philosophies on that. I remember hearing, I don't know if it was Brian Scanlon or somebody else from Intercom, speaking at an event a few years ago about their approach to AWS costs, and their approach was always retroactive: give developers as much freedom as they need to create resources, then look at the cost in retrospect and see if action is needed. Their belief was that it's better for the business to allow people freedom and innovation and accept the cost surprises that might come along, rather than slowing people down and always having that ongoing expense in terms of people power.

Luciano: Yeah, there are other tools in the AWS console that can be very useful for understanding cost and also managing it in one way or another. One of the most famous is definitely Cost Explorer, which is an interesting one because it's not the simplest, but when you get to grips with it, it can be pretty helpful: you can effectively drill down and build interactive visualizations of your actual cost.

There are some rough edges. For instance, sometimes the usage type is a bit confusing; it can be hard to attribute some costs to something specific. This, again, has to do with the way certain types of costs are grouped together, but overall it's a pretty useful tool. And as soon as you learn which things are a little bit misleading, everything else is still very, very valuable.

One interesting thing is that it has a forecast feature, so it gives you a figure of how much you are probably going to be spending at the end of the month for a particular visualization you are building. That, of course, might be more or less accurate depending on how predictable your cost is. If you have a very spiky service, where the spikes don't depend on anything you do recurringly, it's a different story.

Maybe you just randomly get mentioned in the news and your traffic multiplies by 100. Of course, that's not something the forecast is going to predict very well. What it's really saying is: if everything remains more or less the same shape we have observed before, here is the figure, and in that case it can be pretty accurate. It's something to keep checking, so that you have a cost figure that is also a little bit projected into the future.

One case where that can be useful is when you are testing new services. If you're spinning up clusters, say a Cassandra cluster just as an example, you can see that prediction skyrocketing, because it gets projected into the future. That can be a case where this feature is very beneficial, because it gives you an early indication that you have made some significant change that projects to a much higher cost in the future.

So keep that in mind, try it, play around with it, and maybe you'll find some value in it. There are also Cost and Usage Reports, which effectively let you export CSVs to S3. So you can take all the data about cost and analyze it with whatever tool is most convenient for you, do fine-grained queries on the data, and aggregate it as you need. It is updated once per day, so at the end of the day you have all the data, and you can build your own tooling around it if you really want fine-grained control over the cost data.
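If you'd rather crunch a Cost and Usage Report export yourself, a minimal sketch with the standard library looks like this. `lineItem/ProductCode` and `lineItem/UnblendedCost` are standard CUR columns, though real exports contain many more.

```python
import csv
import io
from collections import defaultdict

# A tiny in-line sample standing in for a real CUR export from S3.
sample = io.StringIO(
    "lineItem/ProductCode,lineItem/UnblendedCost\n"
    "AmazonEC2,12.50\n"
    "AmazonS3,0.75\n"
    "AmazonEC2,3.25\n"
)

# Aggregate unblended cost per service -- the simplest useful query.
cost_by_service = defaultdict(float)
for row in csv.DictReader(sample):
    cost_by_service[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"])

for service, cost in sorted(cost_by_service.items()):
    print(f"{service}: ${cost:.2f}")
```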

Of course, you can also query it with AWS tools. For instance, Athena is a very good tool to query structured data in S3, so you can just use Athena for more advanced queries; you don't need to build your own infrastructure just to query that data. I suppose the next question is: assuming you are a large enough company, with different departments and groups with different responsibilities, maybe with different areas of the company responsible for different microservices, you probably want to see cost at the department level. What can we do there? My feeling is that you can use accounts and organize things by accounts, or maybe you can use tags, but I don't know, Eoin, if you have any tips or best practices you would recommend.

Eoin: I agree 100%. Those are the two things you can do there. Generally, people recommend tags, and that was always because it used to be normal to have shared accounts, so tagging was more important. Tagging is still important, and it depends on your business, how it's structured, how many cost centers you have and where the budget is allocated. Have a cost center tag if you need to, and project-specific tags as well, because you'll end up with resources that are shared in some accounts, and it's just a useful thing to do. But if you've got a modern AWS landing zone with different accounts for different workloads, then it's easier to allocate costs, because you can just look at it per account. I would definitely recommend having both if you can. By the way, we have an episode dedicated to tagging.

Luciano: We are gonna be posting the link on the show notes if you want to deep dive on that topic.

Eoin: You mentioned that with Cost and Usage Reports you get data that's updated once per day. So what do you do if you want to react quicker? Sometimes people feel like a day isn't enough: I could have run up $50,000 by then. One thing I would recommend, going back to our CloudWatch episodes, is having active monitoring on CloudWatch metrics. Those can be a useful proxy for billing, because excessive service usage can mean a billing spike as well.

So if you've got Lambda, and you're worried about suddenly having some sort of cascading or recursive Lambda bug that causes a herd of functions to run at your maximum account concurrency for 24 hours, then put an alarm on the concurrent executions of your Lambda functions and monitor that. You can use an anomaly detection alarm, or you can say: look, I don't expect any of my functions to be invoked very frequently, especially as an early-stage startup, so just alarm if my functions run at more than 100 concurrent executions for an hour-long period.
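The alarm Eoin describes can be expressed as a `put_metric_alarm` parameter set for CloudWatch. The SNS topic ARN is a placeholder; the metric and namespace are the real account-level Lambda concurrency metric.

```python
# CloudWatch alarm on account-wide Lambda concurrency: fire if it
# stays above 100 for a full hour. The SNS topic ARN is a placeholder.
alarm = {
    "AlarmName": "lambda-concurrency-spike",
    "Namespace": "AWS/Lambda",
    "MetricName": "ConcurrentExecutions",
    "Statistic": "Maximum",
    "Period": 300,            # one data point every 5 minutes
    "EvaluationPeriods": 12,  # 12 x 5 minutes = 1 hour
    "Threshold": 100,
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:eu-west-1:123456789012:billing-alerts"],
}
print(alarm["AlarmName"])
# To create it (needs AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```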

All of a sudden you've got an alarm, and that can give you an early indicator that something might eventually cause a billing issue. There are also useful metrics around API usage: there's a set of CloudWatch metrics called API usage metrics, so you can find out how often certain APIs are called and use that to get cost insights in advance. But you can also create your own metrics.

We've done this in the past with containers. If you've got large fleets of instances or containers, AWS doesn't give you very good metrics out of the box telling you how many instances of each type you've got running at any time. So you can create your own: have a function on a schedule that monitors the number of instances of the types you care about, create your own metric and put an alarm on it.
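A sketch of that custom metric: count running instances per type from a `describe_instances`-shaped response, then publish the counts. The `Custom/Fleet` namespace and metric name are arbitrary choices for illustration.

```python
from collections import Counter

# Count running instances per type from a describe_instances-shaped
# response, ready to publish as a custom CloudWatch metric.
def count_instance_types(reservations):
    counts = Counter()
    for reservation in reservations:
        for instance in reservation["Instances"]:
            if instance["State"]["Name"] == "running":
                counts[instance["InstanceType"]] += 1
    return counts

# Tiny in-line sample standing in for a real EC2 API response.
sample = [{"Instances": [
    {"InstanceType": "m5.large", "State": {"Name": "running"}},
    {"InstanceType": "m5.large", "State": {"Name": "running"}},
    {"InstanceType": "t3.micro", "State": {"Name": "stopped"}},
]}]

counts = count_instance_types(sample)
print(dict(counts))
# Publish (needs AWS credentials); "Custom/Fleet" is an arbitrary namespace:
# import boto3
# boto3.client("cloudwatch").put_metric_data(
#     Namespace="Custom/Fleet",
#     MetricData=[{"MetricName": "RunningInstances",
#                  "Dimensions": [{"Name": "InstanceType", "Value": t}],
#                  "Value": float(n)} for t, n in counts.items()])
```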

And then you can automatically turn off resources with Lambda functions or Systems Manager based on those alarms. So if you don't want to be focused on just detection and remediation, you can get strict. If you want to get really draconian, you can take more of a preventative approach, and there is a reasonable case for having some preventative measures in place. You can use Service Control Policies.

A very simple thing is to exclude regions that you don't expect to be using, so you don't have somebody accidentally running an experiment in a region you're not monitoring and suddenly ending up with a cost there. Just limit the surface area you have to observe for your billing data by turning off regions. You can also turn off certain services. If you don't expect people to be using Sumerian, turn it off: you can do that at an organizational level. And you can limit resources of specific types, like access to those really awesome, expensive EC2 instances with terabytes of RAM. When we're talking about costs, are we normally talking about compute, like EC2, containers, Lambda? Is that where the bulk of the cost is going to come from? Yeah, I think it might, but it also might not.
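The region-exclusion SCP mentioned above might look something like this. The allowed regions and the list of exempted global services are assumptions to adapt to your own organization.

```python
import json

# Deny all actions outside an allow-list of regions. Global services
# (IAM, Organizations, Route 53, Support) must be exempted, or the
# policy would break them too; adjust both lists to your own setup.
scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnusedRegions",
        "Effect": "Deny",
        "NotAction": ["iam:*", "organizations:*", "route53:*", "support:*"],
        "Resource": "*",
        "Condition": {
            "StringNotEquals": {"aws:RequestedRegion": ["eu-west-1", "us-east-1"]}
        },
    }],
}
print(json.dumps(scp, indent=2))
# Attach it (needs an Organizations admin session):
# import boto3
# boto3.client("organizations").create_policy(
#     Name="deny-unused-regions", Type="SERVICE_CONTROL_POLICY",
#     Description="Limit usable regions", Content=json.dumps(scp))
```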

Luciano: There is definitely a lot more to explore, and every deployment is somewhat different. You might end up using an entirely different set of resources in AWS. So I think it's important to try to get a full picture of where the cost might be, and then apply that to your use case: in my case, I might be using more storage than compute, or more networking than anything else.

So yeah, definitely do the research and spend the time to understand what your workload looks like and which things you're going to be using the most, because those will probably be the ones contributing the most to the final cost. And there are certain things you might want to look into to fine-tune your cost, depending on what you're actually going to use the most.

For instance, if you feel that instances, virtual machines, are where you're going to spend most of your money on the AWS bill, you can look into how to optimize instances. You might start to look at different instance types that are more optimized for your specific workloads. You don't need to buy a big machine with tons of CPU, tons of memory and graphics cards if you just need a good CPU.

You can find machines that give you just the good CPU, without everything else you don't need. And that's just one example. There are so many combinations of EC2 instances, and even at re:Invent we see new ones being created all the time. So there is definitely an EC2 instance type that matches exactly the kind of use case you have.

So do your research and look into that. But that's not the only option, because sometimes you think you need a lot of EC2 instances because you have a very variable workload, when in reality you don't need all those instances all the time. So you can start to think about how to make it cheaper: can you use auto scaling groups to spin instances up and down depending on your actual usage, your actual traffic?

Another thing you can think about is spot instances, which is an entirely different model, and a really interesting one; maybe we'll have a dedicated episode to talk more about that. The gist of it is that you are not going to pay the full price of an EC2 instance; you're going to pay a discounted price. The caveat is that the instance is somewhat volatile: it's not going to stick around forever.

You can use it as long as that instance is not claimed by somebody willing to pay a bigger price for it; it's essentially an auction market for spare EC2 capacity. But if your compute model can tolerate an instance going down at any minute, spot can be much cheaper. So you might have a pool of instances that are reserved, extended with spot instances that give you extra compute when you need it at a much cheaper price.
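A back-of-envelope comparison of an all-on-demand fleet versus the reserved-baseline-plus-spot pattern just described. The 70% spot discount and the hourly rate are assumptions; real discounts vary by instance type and market.

```python
# Hypothetical fleet: 10 instances on-demand all month, versus a
# 4-instance baseline plus up to 6 spot instances used 25% of the time.
ON_DEMAND_HOURLY = 0.10  # $/hour per instance (assumed)
SPOT_DISCOUNT = 0.70     # assumed average discount versus on-demand

def fleet_monthly_cost(baseline, peak_extra, peak_fraction=0.25, hours=730):
    base = baseline * hours * ON_DEMAND_HOURLY
    spot = peak_extra * hours * peak_fraction * ON_DEMAND_HOURLY * (1 - SPOT_DISCOUNT)
    return base + spot

all_on_demand = 10 * 730 * ON_DEMAND_HOURLY
mixed = fleet_monthly_cost(baseline=4, peak_extra=6)
print(round(all_on_demand, 2), round(mixed, 2))
```

Under these assumptions the mixed fleet comes in at well under half the all-on-demand cost, which is the whole appeal of the pattern.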

Another option is Reserved Instances, where you pay some amount upfront. Again, this is maybe worth going into in more detail in a future episode, but the idea is that if you know in advance you're going to need these EC2 instances for the next few years, why not get a discount on them? Reserve them for a one- or three-year term, and AWS will give you a better deal on those instances.

So this is another way to decrease that compute cost, and there are other things like Compute Savings Plans. Again, the idea is: once you understand what the bulk of your cost is, think about which levers you can pull to reduce it. Sometimes it's architecture, sometimes it means using other services, sometimes it's finding a better usage model that fits your actual usage, so you have less waste.

You are paying exactly for what you use. We already mentioned the cost calculator and the spreadsheet, so exercise those muscles. You'll understand more and more the more you use AWS, and I think it's going to become less and less scary. It becomes just another capability in your business: every time you want to do something new, you put in some time and some expertise to understand the cost in advance, and you include that as part of building new workloads and capabilities. That's all I have on my side. I don't know, Eoin, if you have any closing advice for people?

Eoin: Yeah, I definitely recommend checking out the AWS Pricing Calculator. You just go to calculator.aws, so that should be pretty easy to remember; you don't even need the link in the show notes. It's a web-based application where you can pick your services and enter the different units for the different dimensions, and it'll give you a cost. You get a shareable link you can pass to other people, and you can also download a spreadsheet from it.

There are some bugs and niggles with it. I've heard people say that sometimes the calculation data doesn't all stick, so just be careful with that. But I always recommend the spreadsheet approach anyway, especially if you're doing a serious cost calculation and not just a back-of-the-envelope one. With spreadsheets you can get a bit more granular and say: based on this number of users, I expect this number of API requests and this number of function invocations.

Then you can actually play around with the business inputs as well. What if I get peak usage on Black Friday? What would that look like in terms of cost? Spreadsheets are good for that, and you can use the spreadsheet to validate the cost calculator outputs and vice versa. It's also worth mentioning the AWS Well-Architected Framework Cost Optimization Pillar, one of the six pillars.
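That spreadsheet approach translates directly into code. Every rate and ratio below is an assumption to replace with your own numbers; the point is that the cost is derived from a business input (users), so you can play with that input.

```python
# Business-input-driven cost model, serverless-style. All rates and
# ratios are illustrative assumptions, not authoritative AWS prices.
def monthly_cost(users):
    requests = users * 500                  # assumed API requests per user per month
    invocations = requests                  # one Lambda invocation per request
    gb_seconds = invocations * 0.2 * 0.128  # 200 ms at 128 MB per invocation
    api_gateway = requests / 1e6 * 3.50     # assumed $/million requests
    lambda_requests = invocations / 1e6 * 0.20
    lambda_compute = gb_seconds * 0.0000166667
    return api_gateway + lambda_requests + lambda_compute

# Play with the business input: what does a Black Friday spike cost?
for users in (1_000, 10_000, 100_000):
    print(users, round(monthly_cost(users), 2))
```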

That documentation is just good advice on cost optimization in general and on putting the practices in place, and there's also a hands-on lab. Then, just a reminder: don't forget to check out the AWS Activate and MAP programs. What better way to deal with your bill than getting AWS to pay it for you? We have experience of both, so if you have any questions, feel free to reach out to Luciano or myself. Finally, if you like AWS Bites, please share the link with a friend or a colleague, because our audience is growing all the time, and as it grows we get lots of good ideas, suggestions and feedback. So keep those comments coming on YouTube and Twitter. Thanks for joining us, and we'll see you in the next episode.