AWS Bites Podcast

129. Lambda Provisioned Concurrency

Published 2024-08-23 - Listen on your favourite podcast player

In this episode, we discuss AWS Lambda provisioned concurrency. We start with a recap of Lambda cold starts and the different concurrency control options. We then explain how provisioned concurrency works to initialize execution environments in advance to avoid cold starts. We cover how to enable it, pricing details, common issues like over/under-provisioning, and alternatives like self-warming functions or using other services like ECS and Fargate.

This episode of AWS Bites is powered by fourTheorem. Whether you're looking to architect, develop, or modernize on AWS, fourTheorem has you covered. Ready to take your cloud game to the next level? Head to fourtheorem.com to check out our in-depth articles and case studies, and see how we can help transform your AWS journey.

In this episode, we mentioned the following resource: James Eastham's video walkthrough of Lambda provisioned concurrency.

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Eoin: Ever had one of those days when a cloud deployment just refuses to play nice? We sure did, thanks to some quirky issues with Lambda's provisioned concurrency. Every issue is an opportunity to learn something new, and after some deep digging, we uncovered some insights about Lambda provisioned concurrency that we thought we'd share with you today. So we're going to talk about the joy of cold starts, Lambda concurrency and the different concurrency control features available, how provisioned concurrency itself works, some of its limitations, common problems, and of course, those pesky pricing details. I'm Eoin, I'm here with Luciano, and this is another exciting episode of the AWS Bites podcast. This episode of AWS Bites is powered by fourTheorem. Whether you're looking to architect, develop, or modernize on AWS, fourTheorem has you covered. If you want to take your cloud game to the next level, then head over to fourtheorem.com and check out our articles and case studies, and see how we can help transform your AWS journey. Luciano, take it away. What have we got to say about Lambda and provisioned concurrency?

Luciano: Yeah, let's start with a little bit of an introduction. We have spoken before about how Lambda works in general and what a cold start is, and there are a few episodes that you can check out if you want to review these topics: episode 60, "What is AWS Lambda?"; episode 104, explaining how Lambda runtimes work; episode 108, on how to solve cold starts, specifically in Python; and then we also have an entire episode dedicated to Lambda best practices, which is episode 120.

So definitely review those if you're interested in going deep down the rabbit hole of all things AWS Lambda. But of course, it's probably worth doing a super quick recap of the things that are important today. And I think it's important to mention what happens when a Lambda function starts. A Lambda function basically needs an execution environment, which is created on demand when a specific event occurs.

And if you have multiple concurrent events, more Lambda environments are created as needed to try to keep up with the load. Remember that AWS Lambda will create environments and each environment will process only one event at a time. So if you have two concurrent events, a new environment needs to be created. And of course, these environments are totally dynamic. If there isn't a lot going on, maybe the throughput of events decreases, or at some point you have a period of total inactivity, AWS will start to reclaim resources and it will destroy those environments.

And you have to consider that creating one of those environments is not a trivial operation, so it requires some work on the AWS side. Just to simplify, you can imagine that this execution environment needs to be created somewhere. Specifically, these are microVMs running on Firecracker, which behind the scenes is deployed on EC2 instances. And when all of that is created, the code that you want to provision into your Lambda function needs to be pulled from somewhere.

So that can be either S3 or a container registry. Then at that point, the instance is ready to be initialized, and the initialization phase has multiple steps. For instance, if you have Lambda extensions enabled, those need to be initialized first. Then, depending on the specific runtime you're using, for instance Node.js, the interpreter itself needs to start, and maybe it's going to do things like loading libraries, doing JIT compilation, and whatever else makes sense for that particular runtime.

And then finally, your code starts to be initialized. You know that your code generally has two parts: there is the init code, which is where you do all the things that should happen only the first time the environment is initialized, and then there is the handler code, the code that gets executed for every event. So the init part of your code is what runs when the environment is created.
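
As a minimal sketch of these two phases in a Node.js function (TypeScript here; the S3 usage and the event shape are just illustrative assumptions, not anything from the episode):

```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

// Init code: runs once, when the execution environment is created (the cold
// start). Expensive setup like SDK clients belongs here so it is reused
// across invocations of the same environment.
const s3 = new S3Client({});

// Handler code: runs for every event, on an already-initialized environment.
export const handler = async (event: { bucket: string; key: string }) => {
  const object = await s3.send(
    new GetObjectCommand({ Bucket: event.bucket, Key: event.key })
  );
  return { contentType: object.ContentType };
};
```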

And of course, all of this stuff can take some time. When this happens while you are trying to process an event, all this extra delay is called a cold start. Now, you might be wondering, is this something bad? And the answer is that it really depends on the way you look at it, because on one side, cold starts are actually cool: they are the necessary trade-off that allows Lambda as a service to scale to zero.

If we didn't have cold starts, we would probably have something running all the time, and the pricing behind Lambda would be very different from what it is today. So in a way, they are kind of a necessary evil. The other thing is that sometimes they are negligible, because if you're doing some kind of background processing that is not particularly time sensitive, having to wait a few extra milliseconds, or even a few seconds, is probably not going to be the end of the world.

Imagine you are just sending an email in the background or maybe resizing a picture. It's probably fine if that particular process happens a few milliseconds later rather than immediately. But if you have a use case like an API request triggered by a user, maybe from a browser, and that request is handled by Lambda, then if there is a cold start, the user might actually perceive that slowness and it might affect the user experience. In that case, you need to be a little bit careful with cold starts. And if you are in one of those cases, you probably want to know: okay, what are my options for reducing cold starts? One of those options is provisioned concurrency, which is what we are going to talk about today. Yeah.

Eoin: And maybe before we go further, I always feel a little bit reluctant to talk about topics like this, because I think cold start problems are generally overstated, especially by people who don't really use Lambda in anger. So it's really an advanced topic. It's something that's useful to know about, but I wouldn't fret about knowing all the different options and hyper-optimizing functions that probably don't need to be optimized in a lot of cases. Simpler is generally the better approach if you can get away without all of these fine-tuning options.

Luciano: I actually remember a study by AWS where they looked at all the Lambda invocations they have across all their customers, and they came up with a percentage that I don't remember exactly, but I think it was in the order of 1% of all function invocations being cold starts. So generally speaking, across all customers, this doesn't happen that often. Of course, if you have very sparse workloads, you might be more affected than customers with lots of events coming in all the time. Yeah.

Eoin: But look, these are useful things to know so that you've got these configuration options in your back pocket if you ever really do need to take advantage of them. So let's first clarify that there is a quota on the number of Lambda environments you can have running in a given AWS account in a region. The documented default is 1,000 concurrent executions across all functions in an account in a region, but that's a soft limit and can be increased if needed.
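
If you do need a higher limit, a quota increase request through the Service Quotas API might look something like this sketch (the quota code below is an assumption; verify the "Concurrent executions" quota code for Lambda in your own account first):

```typescript
import {
  ServiceQuotasClient,
  RequestServiceQuotaIncreaseCommand,
} from "@aws-sdk/client-service-quotas";

const client = new ServiceQuotasClient({});

// Ask for a higher Lambda concurrency quota in the current region.
await client.send(
  new RequestServiceQuotaIncreaseCommand({
    ServiceCode: "lambda",
    QuotaCode: "L-B99A9384", // assumed code for "Concurrent executions"
    DesiredValue: 1000,
  })
);
```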

Now, a lot of people have been seeing for a while now that new accounts have a limit of 10. We suspect that this is for abuse prevention, to stop people spinning up new accounts and managing to mine Bitcoin before they pay the bill. But if that is your case, it can be raised: you just need to request a quota change through support. Now, let's talk about concurrency then. Concurrency is the number of in-flight requests that your function is currently handling, and it generally matches the number of active execution environments for Lambda.

Now, there are two types of concurrency controls available: you've got reserved concurrency and provisioned concurrency, and they can be confusing. Reserved concurrency is the maximum number of concurrent instances allocated to your function. When a function has reserved concurrency, that concurrency is reserved for that function alone, so no other function can use it. Now, if you've got lots of traffic, or maybe triggers generating tons of unnecessary invocations, you might end up in a scenario where you spin up enough Lambda environments to reach the account-level concurrent execution limit.

And that means no more Lambda environments can be created. If you consider that environments are created for specific Lambda functions, you might end up in a scenario where you can't even process events, because you can't spin up new Lambda function environments to handle them. So reserved concurrency is useful to ensure that you've got both a cap on the number of concurrent executions for a function and a guarantee that other functions can't steal the allocation for that specific function.

And of course, that has the impact of other Lambda functions having less capacity available, so it's a trade-off. Now, reserved concurrency is just something you can configure for a function. It doesn't have any additional charge; it's just a question of putting that cap on a function. It's also useful in some cases if you've got an errant function, something that's causing a lot of problems. Maybe you've got a recursive loop, or something that's triggering a lot of errors, or a cost issue.

You can just set its reserved concurrency to zero, and that will stop your function from being invoked altogether. That's a useful tip. Now, this one doesn't really help with cold starts. It just helps you put a cap on things and make sure specific functions can keep scaling up to a certain point. Environments are still created on demand, and cold starts are still part of the picture in that case.
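
In code, both the cap and the emergency stop are a single API call. A minimal sketch, assuming the AWS SDK for JavaScript v3 and a hypothetical function name:

```typescript
import {
  LambdaClient,
  PutFunctionConcurrencyCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// Reserve up to 50 concurrent executions for this (hypothetical) function.
await lambda.send(
  new PutFunctionConcurrencyCommand({
    FunctionName: "my-function",
    ReservedConcurrentExecutions: 50,
  })
);

// Emergency stop: zero reserved concurrency throttles every invocation.
await lambda.send(
  new PutFunctionConcurrencyCommand({
    FunctionName: "my-function",
    ReservedConcurrentExecutions: 0,
  })
);
```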

Now, provisioned concurrency is something that AWS added a good bit later, and it's something that a lot of people welcomed. It essentially means that you've got a number of pre-initialized execution environments for your function. Once you've deployed them and they're in an active state, they're ready to respond immediately to incoming events. So this is something that can be useful for reducing cold start latencies for a function, and of course it is, because you've got these environments running and essentially ready: they've started warm. There is a cost impact to that, so there are additional charges. Let's talk about how provisioned concurrency works then.

Luciano: Yeah, provisioned concurrency, as you said, keeps a certain number of Lambda execution environments warm for you. So this basically means that as soon as you have enabled provisioned concurrency and set a specific amount for a function, AWS will need to spin up that number of execution environments for you so that they are ready and warm for whenever new events come in. So basically if you receive a request, you will have this Lambda environment already available.

Also, these environments are not going to be disposed of by AWS, even if you have a period of time where you don't receive enough events, or maybe even zero traffic. If you have provisioned concurrency, your instances will still be there and available, even if nothing is happening in your account. So in a way, this is going to help you fight cold starts, but it doesn't necessarily mean that you won't have cold starts anymore.

In fact, if you think about it, you are just setting a number of instances that are ready for you. But if you start to have more events than you anticipated, Lambda still needs to scale up even more, and that means that beyond the amount of provisioned instances, AWS will start to create new instances, and those new instances will incur a cold start. So you might still see cold starts if you didn't predict exactly the number of warm instances that you needed in the first place. Just be aware that this is not a universal solution that's going to totally eliminate cold starts; it's something that might help you reduce the number of cold starts you will see for specific Lambda functions. And one more thing: unlike reserved concurrency, you can't really set provisioned concurrency to zero to stop a function from running altogether; that trick you described before is specific to reserved concurrency. Now, how do you enable provisioned concurrency? It's probably something that we should discuss, and I'll let you talk about that.

Eoin: Enabling provisioned concurrency is theoretically quite simple. It's just a number, an integer property that you associate with a Lambda function through the web console or through APIs. You can configure up to the unreserved concurrency in your account minus 100; that 100 is a reservation of concurrency units for functions that aren't using reserved concurrency. For example, if your account has a limit of 1,000 and you haven't assigned any reserved or provisioned concurrency to any of your other functions, you can configure a maximum of 900 provisioned concurrency units on a single function.

Now, with Lambda functions, you have different versions and aliases. Generally, you can get away with using the default $LATEST version of a function. But when you're using provisioned concurrency, you need to publish an explicit function version with an alias, and it's on this alias that you set the provisioned concurrency value, not on the function itself. So this is something that can introduce a little bit more complexity.
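
Putting the pieces together, the flow might look like this sketch, assuming the AWS SDK for JavaScript v3 and hypothetical function and alias names:

```typescript
import {
  LambdaClient,
  PublishVersionCommand,
  CreateAliasCommand,
  PutProvisionedConcurrencyConfigCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// 1. Publish an immutable version of the current code and configuration.
const version = await lambda.send(
  new PublishVersionCommand({ FunctionName: "my-function" })
);

// 2. Point an alias (e.g. "live") at that version.
await lambda.send(
  new CreateAliasCommand({
    FunctionName: "my-function",
    Name: "live",
    FunctionVersion: version.Version!,
  })
);

// 3. Keep 10 execution environments initialized for the alias.
await lambda.send(
  new PutProvisionedConcurrencyConfigCommand({
    FunctionName: "my-function",
    Qualifier: "live",
    ProvisionedConcurrentExecutions: 10,
  })
);
```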

And this is a reason why you shouldn't just jump to these optimizations by default. For example, if your function has an event source mapping, you have to make sure that the event source mapping points to the correct function alias; otherwise, your function won't use the provisioned concurrency environments. Again, it's worth remembering that configuring provisioned concurrency for a function has an impact on the concurrency pool available to other functions.

So if you've got function A and function B, and you configure 100 units of provisioned concurrency for function A, other functions in your account must share the remaining 900 units of concurrency. This is true even if function A isn't being invoked and you're not making use of those 100 units. And it's very similar with reserved concurrency, because when you reserve concurrency, you're also making it unavailable to other functions.

The difference is that with provisioned concurrency, you have warm Lambdas running all the time; with reserved concurrency, you don't. Now, it's possible to allocate both reserved concurrency and provisioned concurrency for the same function, and if you do that, the provisioned concurrency can't be greater than the reserved concurrency. Now, if you're using all of this stuff, you probably want to monitor your metrics.

With CloudWatch metrics, you have a ConcurrentExecutions metric that will show you the number of concurrent executions for your account, and you should look at that and tweak your settings accordingly. Once you're looking at concurrent executions for a given function, you can use that to figure out what the optimal provisioned concurrency might be; then you're more likely to reduce cold starts while balancing the cost impact. There's a good video by James Eastham with a walkthrough and some code examples, and we'll definitely have that link in the show notes. So that's configuration. Let's talk about money. Yes.
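
As an aside before the money talk: pulling that ConcurrentExecutions metric programmatically might look like this sketch (AWS SDK for JavaScript v3; the function name is hypothetical):

```typescript
import {
  CloudWatchClient,
  GetMetricStatisticsCommand,
} from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({});

// Fetch per-function concurrency maxima over the last 24 hours.
const stats = await cloudwatch.send(
  new GetMetricStatisticsCommand({
    Namespace: "AWS/Lambda",
    MetricName: "ConcurrentExecutions",
    Dimensions: [{ Name: "FunctionName", Value: "my-function" }],
    StartTime: new Date(Date.now() - 24 * 60 * 60 * 1000),
    EndTime: new Date(),
    Period: 300, // 5-minute buckets
    Statistics: ["Maximum"],
  })
);

// The peak of the maxima is a starting point for a provisioned concurrency value.
const peak = (stats.Datapoints ?? []).reduce(
  (max, d) => Math.max(max, d.Maximum ?? 0),
  0
);
console.log(`Peak concurrency over the last day: ${peak}`);
```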

Luciano: So provisioned concurrency cost is calculated from the time you enable it for a specific function until you disable it, if you ever disable it, and it's rounded up to the nearest five minutes. So imagine that you enable it for seven minutes before disabling it: you will be paying for 10 minutes. The price depends on the amount of memory you allocate, similar to the invocation cost of a Lambda.

And of course, it depends on the amount of concurrency you configure. Duration is calculated from the time your code begins executing until it returns or otherwise terminates, rounded up to the nearest one millisecond. So basically you are paying an extra cost on top of the usual invocation cost that you would pay if you were not using provisioned concurrency. And that makes sense, because AWS is keeping those instances reserved for you, and nobody else can use them, even if you are not processing any events. So of course, there is a cost associated with having all this infrastructure reserved for you. We'll be linking the full pricing documentation in the show notes if you want to review the exact fee for your specific region; it also changes depending on the architecture and the memory that you use. So if you really want to run some simulations, or get a better understanding of how this might impact your cost, definitely check out the official documentation for all the official numbers; there's also a quick back-of-the-envelope sketch below. Now, what are some common issues, and maybe suggestions for troubleshooting, based on our experience? Yeah, there are definitely things to look out for.
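
Here is that back-of-the-envelope sketch, with illustrative numbers (the rate below is an assumption loosely based on us-east-1 x86 pricing; check the official pricing page for your region and architecture):

```typescript
// Rough monthly cost of keeping environments warm, before invocation charges.
const memoryGb = 1; // 1024 MB configured memory
const provisionedUnits = 10; // environments kept warm
const hours = 730; // roughly one month
const pcRatePerGbSecond = 0.0000041667; // assumed rate; verify for your region

const gbSeconds = memoryGb * provisionedUnits * hours * 3600;
const monthlyCost = gbSeconds * pcRatePerGbSecond;

// 1 GB * 10 units * 730 h * 3600 s = 26,280,000 GB-seconds ≈ $109.50/month.
console.log(`Provisioned concurrency: ~$${monthlyCost.toFixed(2)}/month`);
```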

Eoin: One is over-provisioning or under-provisioning. If you over-provision, you're going to end up paying for compute you won't use, and it seems like you're getting away from the goal of using Lambda in the first place. If you under-provision, you may still see cold starts. So you really have to think about whether you want to get into this or not. Scaling limitations as well: if you abuse reserved concurrency, you might end up in a situation where you erode the total Lambda concurrency pool available in a given account and region.

The same goes for provisioned concurrency. This can make it very hard for you and your team to keep using Lambda functions, and it can affect the capacity you have for Lambda functions that don't have provisioned concurrency. Now, we mentioned an issue we encountered recently: essentially, it was a deployment error when we were deploying with provisioned concurrency on an alias, and the error, which I think was surfaced through CloudFormation, just said something like "handler error code: not stabilized".

And this is AWS telling you that it was trying to warm up an execution environment for a Lambda with provisioned concurrency, but it failed to do so. The error is pretty vague, but there are actually a number of reasons why this can happen. It can happen because the specific version can't be deployed: maybe your Lambda zip size is bigger than the 50-megabyte limit or the total 250-megabyte limit.

Or your Lambda is deployed correctly, but the initialization code fails: maybe you've got a bug or a typo in your code, it fails to import a dependency or to initialize a client, there's a permissions error, that kind of thing. So it makes sense, of course, that this can fail your entire deployment, because AWS cannot fulfill its contract of warming up these functions as you have requested and creating this provisioned concurrency. But it's something that you mightn't think of if you're just moving from a non-provisioned concurrency setup, where you don't have to worry about failures in your code until the function is actually invoked. So you just have to be a little bit more careful about that. Right. So I think we've given a good overview of provisioned concurrency and talked about pros and cons. It's not as simple as you might like; it's just the nature of it. What are some alternatives, if we've put people off?
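
As a troubleshooting aid, you can ask Lambda directly how a provisioned concurrency rollout is doing. A hedged sketch, again assuming the AWS SDK for JavaScript v3 and hypothetical function and alias names:

```typescript
import {
  LambdaClient,
  GetProvisionedConcurrencyConfigCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

const config = await lambda.send(
  new GetProvisionedConcurrencyConfigCommand({
    FunctionName: "my-function",
    Qualifier: "live",
  })
);

// Status is IN_PROGRESS, READY, or FAILED; StatusReason explains failures,
// such as init code crashing before the environments could be warmed.
console.log(config.Status, config.StatusReason);
console.log(
  `${config.AvailableProvisionedConcurrentExecutions}/` +
    `${config.RequestedProvisionedConcurrentExecutions} environments ready`
);
```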

Luciano: Yeah, I think it's definitely worth mentioning that it's not a silver bullet, and there are lots of trade-offs that you need to carefully analyze before deciding whether this is the solution that is going to solve your problems, or whether you want to look at other solutions. So let's try to give you some alternative ideas for fighting cold starts, because that's our premise today: if I am annoyed by cold starts because they are affecting my applications in one way or another, what can I do to reduce or totally eliminate them?

The first thing that comes to mind is that you can do your own warm-up as needed. This is actually something that people used to do before the provisioned concurrency feature was available in Lambda. When I was working on the very first version of Middy, this feature didn't exist, and one of the very first middlewares we created in Middy was one that would help you use EventBridge as a scheduler to effectively send a ping every so often, maybe every five minutes or every 10 minutes, whatever made the most sense for you, to wake up a Lambda environment and make sure that there was at least one environment around.

The middleware would basically check: okay, if this is an event coming from EventBridge, I'm just going to ignore it, because I know it was only used to wake me up; but if another kind of event comes in, maybe an API request, then of course your own handler logic is going to run. And you don't have to use Middy to do this. You can do it on your own; the amount of code you need to write is relatively small and simple, as in the sketch below.

You can even do that without using EventBridge: whatever triggers your Lambda is potentially going to create a new environment, so if you trigger it recurringly, you are going to create instances that stick around for a little while and are warm to handle real events. So that's just an idea. Of course, it's very unlikely that this particular approach will avoid cold starts entirely.
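
Here is a minimal self-warming sketch in TypeScript, in the spirit of the Middy middleware described above. It assumes an EventBridge scheduled rule that sends a payload like { warmer: true }; that field name is an illustrative choice, not a Lambda convention:

```typescript
import type {
  APIGatewayProxyEvent,
  APIGatewayProxyResult,
} from "aws-lambda";

type WarmerEvent = { warmer?: boolean };

export const handler = async (
  event: APIGatewayProxyEvent | WarmerEvent
): Promise<APIGatewayProxyResult> => {
  // Scheduled ping: keep the environment warm and bail out immediately.
  if ((event as WarmerEvent).warmer) {
    return { statusCode: 200, body: "warmed" };
  }

  // Real request: run the actual handler logic.
  return {
    statusCode: 200,
    body: JSON.stringify({ message: "hello from a (hopefully) warm Lambda" }),
  };
};
```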

It's just a way to try to reduce cold starts, and depending on how well you can predict the traffic coming in, you will see more or fewer of them. Another interesting approach is Lambda SnapStart. This is more relevant if you're using Java, and again, it doesn't really solve cold starts per se, but it can greatly reduce the duration of a cold start, especially for languages like Java, where cold starts can be more significant than with other runtimes.

So if you're using Java and you want to reduce the cold start duration, definitely check out SnapStart. The other approach is that you might want to consider other AWS services, because if you really are in a situation where you cannot tolerate cold starts, maybe Lambda is not the solution for your problem. Maybe you need to use something like a container, perhaps running on Fargate if you still want a kind of serverless deployment experience.

In that case, you will have an instance that is running all the time, and therefore you are not going to have that particular problem of seeing cold starts. Of course, you might then have the problem of how to scale up, and you need to see what that service offers to let you scale when you start to get more and more traffic. And then one final suggestion, which maybe feels a little bit funny but is actually serious, is that you could consider using Rust with Lambda. The reason is that with Rust we have seen really amazing performance, especially when it comes to cold starts: they can be 10 or 20 milliseconds most of the time, if your Lambdas are relatively small. That's an amount of time that basically makes the cold start negligible. So if you're interested in this approach, we have two podcast episodes, number 64 and 128, where we talk about creating Lambda functions in Rust and what all of that entails, so you might check those out. So that's all I have for suggestions. What else do we want to share?

Eoin: I'd like to remind people before we finish up that AWS keeps introducing optimizations under the hood, so you don't necessarily have to change anything to get performance improvements. One thing we didn't talk about, and maybe we can find a link for the show notes, is that I think people discovered about a year or so ago that AWS was doing some pre-warming of functions that didn't have provisioned concurrency turned on as well.

So they're doing things to try and make sure that your cold starts are as low as possible, and I think that's going to continue. We've seen it with SnapStart, and in that episode about optimizing Python functions, we talked about how container images are now optimized for lower cold starts. So I would say again: just think a little bit before you invest too much time in all the complexity of configuring provisioned concurrency, if you really don't need it.

But I think that wraps up our deep dive, and hopefully now you've got a clear understanding of how it works, its benefits, and potential pitfalls. I think provisioned concurrency is definitely a useful tool in your AWS arsenal. Now, if you enjoyed the episode, do like, subscribe, and share it with your fellow cloud enthusiasts. Don't forget, we really love hearing from you, and thanks very much to everyone who reaches out to us with comments and questions and lets us know that they like the podcast. So do drop us a comment or question; your feedback does help shape our future episodes. Thanks for tuning in, and we'll catch you in the next episode of AWS Bites.