Help us to make this transcription better! If you find an error, please
submit a PR with your corrections.
Luciano: Hello and happy 2026. In our latest episode of AWS Bites, we covered ECS managed instances, which is a new way to power your ECS cluster with managed EC2 capacity. This basically means that you still use EC2 instances under the hood, but in this case, AWS takes care of all the usual instance chores like picking the base image, operating system lifecycle, security patching, ongoing maintenance, and you basically focus on describing what you need, for example, if you want a GPU, or maybe if you want specific storage profiles, or maybe particular networking characteristics. And then at that point, AWS provisions the right instance for your cluster in exchange for a management fee. Now, coming straight out of re:Invent in Las Vegas, which is AWS's biggest conference of the year, AWS has taken this very idea, managed instances, and applied it to, drumroll, Lambda. So the name of the announcement is Lambda managed instances, and it has been a little bit controversial. It has sparked a lot of curiosity, plus a fair amount of confusion, because let's be honest, why in the world would you want to bring EC2 instances, aka servers, into one of the most serverless compute services out there? So you might be wondering, what does it really change for Lambda? Do you gain something that you could not do before? Honestly, we were as curious as we were excited by this announcement. So we took Lambda MI for a proper test drive during the holidays, and in this episode, we're going to share our take on it. So we want to share what Lambda managed instances enables that default Lambda could not do before. And there are a few spoilers I want to give you here, just to capture your attention. There are no more cold starts, kinda, we'll talk more about that. And a single Lambda environment can now handle multiple requests concurrently, which is also very interesting, and we'll cover that in detail. We'll also talk about how to set it up and make the most out of it, the different ways to scale the underlying EC2 capacity, and the limitations and pitfalls we noticed, and there are quite a few of them to be aware of. Then the use cases, when we think this is actually a good fit and a good idea to use this new technology. And of course, we'll talk about pricing and whether it's worth the cost and the effort. Now, since we built a realistic application for this exercise, we'll also talk about our use case and our example application. My name is Luciano and I'm joined by Eoin, and this is AWS Bites. AWS Bites is brought to you by fourTheorem. Stay tuned to hear more about fourTheorem at the end of the show. So, Eoin, maybe we should start by clarifying again, what is the difference between default, I guess we should call them, Lambdas, and Lambda on EC2?
Eoin: Yeah, let's call the old model the default, that's what AWS calls it in the documentation. So let's roll with that. Your default Lambda is the fully managed, just run my code service that we know and love and most people refer to when they say Lambda functions. And a function is invoked in response to an event, right? An object gets uploaded to S3, that can be an event that triggers a Lambda that resizes an image, extracts metadata. That's a canonical example. So when an event arrives, AWS is running your code in this isolated execution environment.
Where does this environment exist in terms of compute? It doesn't matter. Of course, it will be in a server somewhere, but we don't get to see that. We don't care. That's what we like. The important detail is that each environment processes only one event or one invocation at a time. If more events come in, Lambda scales out by creating more environments to handle concurrency. And then if traffic scales back, Lambda will manage scaling back the execution environments, often effectively to zero.
The main trade-off is cold starts then, because all of that scaling that Lambda is doing behind the scenes, you don't pay for that. You don't think about it. But it means that sometimes there isn't a warm environment ready to serve your event. And Lambda will need to create a brand new environment, which takes time. Sometimes it can be a few seconds. Sometimes it's as low as 100 milliseconds or even less if you're using Rust, for example. But often it can be significant. We've talked about that in previous episodes when we dealt with Python and large data science packages. Now, let's talk about the new way of doing things. The new option, at least. Lambda managed instances. This one is about the Lambda service keeping the same programming model and integrations, but changing how and where your code runs. Now your function executes in containers on EC2 instances in your account.
These EC2 instances are chosen via something called a capacity provider. And we'll talk more about that in a second. So AWS is still managing provisioning, patching, routing, load balancing, the scaling mechanics, and the lifecycle. That's important. So it's not necessarily about taking on all the burden of managing EC2 instances. It is about changing how capacity is provisioned and scaled, and also the cost model for Lambda, which we'll also talk about.
So it's not just Lambda on different hardware. There are a few big behavioral changes that will materially affect how your functions perform, how they scale, and how you need to write and operate them. So it's definitely something you need to go into with your mind fully informed on how all these things work. So what changes in practice is, well, for one, the concurrency model. One execution environment can now handle multiple concurrent invocations, unlike default Lambda's single invocation per environment model. This is a big change in how Lambda operates and something that people might have been asking about for a long time. And it reminds us a little bit of Vercel's fluid compute. It also makes Lambda suitable as a back end for high throughput APIs. We've seen people who are familiar with, say, the Node.js ecosystem and how it can handle high concurrency, asynchronous I/O in one process. This is one of the great benefits of Node.js when it came out originally. And then when you move to Lambda, you're wondering, well, I'm losing all of that power now because I can only do one event in a process. But now you can do multiple events per process, so you get a little bit more of that benefit back. Another thing that has changed with managed instances is that we now have always-on environments. So environments can stay continuously active, no freezing between invocations, and this helps to mitigate most of the traditional cold starts. Scaling behavior now is asynchronous and driven by things like CPU utilization.
It doesn't scale to zero because it maintains a configured minimum capacity. And fast traffic ramps can outpace scaling briefly. So you'll need to think about that. And then when you publish a function to run on managed instances, Lambda by default is going to launch three EC2 instances in your account for resiliency. It's good practice. And it'll bring those environments up before making the version active. Now, there are ways, which we might touch on a little bit later, to avoid there being three instances all the time, but that's just generally a good practice. When we talk about lifecycle as well, the instance lifecycle, these instances will stay up for a maximum of 14 days. That's what Lambda is saying, and then they're going to rotate them. So that's something you should plan for if it's important. And now with this model, you can pick instance characteristics. So you can get the latest CPUs, like Graviton4, here now. Configurable memory-to-CPU ratios, and if you want high-bandwidth networking, which is something you didn't have that much control over before, now you have that option. So these are all new considerations. What stays the same: it's still Lambda in how you build and integrate it. But you should treat your code more like a concurrent, long-lived service process, because the execution environment can handle parallel invocations and sticks around. With the old short-lived execution environments, sandboxes that didn't hang around, you may not have noticed your memory leaks before. Now with these 14 day instances, it might be something you'll have to think about. Should we talk now about how scaling works in a little bit more detail? What do you think?
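As a rough illustration of what treating your code like a long-lived, concurrent service process can mean in practice, here is a minimal Node.js/TypeScript sketch (not taken from our example repo; the table name and the naive cache cap are placeholder assumptions):

```typescript
// A minimal sketch (not from the episode's repo) of a handler written as a
// long-lived, concurrent process. Objects created outside the handler are
// shared by every concurrent invocation in the same execution environment.
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

// Created once per execution environment and reused for up to 14 days.
const dynamo = new DynamoDBClient({});

// Any cache kept here must be bounded, or it becomes a slow memory leak now
// that environments stick around. A naive size cap is used for illustration.
const cache = new Map<string, unknown>();
const MAX_CACHE_ENTRIES = 1000;

export const handler = async (event: { videoId: string }) => {
  const cached = cache.get(event.videoId);
  if (cached) return cached;

  const result = await dynamo.send(
    new GetItemCommand({
      TableName: process.env.TABLE_NAME, // assumed to be injected by your IaC
      Key: { pk: { S: event.videoId } },
    })
  );

  if (cache.size >= MAX_CACHE_ENTRIES) {
    // Evict the oldest entry; a real implementation might use an LRU library.
    cache.delete(cache.keys().next().value as string);
  }
  cache.set(event.videoId, result.Item);
  return result.Item;
};
```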
Luciano: Okay, I'll try to cover that. So I guess the first mental model shift is that Lambda managed instances scales proactively, as you explained, and not on demand, which is the case for default Lambda. And with default Lambda, let's repeat that just for clarity, when an invocation arrives, Lambda looks for a free execution environment. And if there isn't one available, it creates a new one on demand.
And this is where you can see that famous or infamous cold start. With managed instances, Lambda does not scale because an invocation arrived. It scales asynchronously, basically up front, watching things like CPU utilization and multi-concurrency saturation. And it's basically trying to determine how busy is the environment and do I need to effectively provide more room for executing Lambdas. And we'll talk about what that actually means in practice in a bit more detail in a few minutes. But just think about these two differences.
In the case of default Lambda, you basically don't have to think about anything. You just know that when an event arrives, if there are no environments, they will be created on demand. With Lambda MI, those environments are basically created in the background, up front, before your code is executed. And one analogy that I think can explain this idea a little bit better is managing a restaurant. Of course, you need to have tables in a restaurant, and you can imagine that those are your EC2 instances.
Then you need to have execution environments, and we can compare those to your staff working and serving those tables. And then max concurrency is basically the idea of how many guests a single server can handle at once. This analogy is a little bit of a stretch, but I think it's still useful to get the mental model right, so we can think about scaling in these terms. So when demand goes up, what Lambda MI can do is basically scale two different layers. It can add more execution environments on existing instances.
So this is like hiring more staff for the same exact restaurant: the same space available, you just have more staff available to do more things. And then if your instances are running out, you can add more managed instances. So basically you are adding more EC2 capacity under the hood with your capacity provider. And this is basically like adding more tables to the restaurant, not necessarily by cramming more tables into the same space, but by taking more space. Maybe you're putting tables outside, or maybe you are expanding the restaurant somehow, right? Maybe taking over the next building or something like that. So these are the two dimensions that are used for increasing your capacity for running more Lambda code.
And this is why scaling can feel a little bit different with this new way of running Lambda code. And also because this scaling is asynchronous, it's basically trying to scale up front. But sometimes if you have lots of demand happening very quickly, what happens is that you might not have that capacity available when you need it. So you might actually see throttles. So you try to execute Lambda code, there is no capacity available, and your Lambda execution might effectively be throttled.
So there are some default mechanisms that AWS put in place where basically, if your traffic doesn't double within five minutes, you should be okay. But if you go over that, so if your traffic doubles more quickly than that, then maybe that's where you start to see throttles. You need to play around with that to see exactly how it works. But this is what we can infer from the documentation. So again, just to remark, the main change is that instead of thinking about the first request being slower, which is the case with a cold start, the failure mode is more like maybe you have a sudden spike and you might see throttles. But in general, if you have predictable traffic and your capacity is enough, you are not going to see cold starts or throttles. So that gives you a little bit more of a predictable and always available environment to run on, which is nice, especially if you're using languages that tend to have quite a long cold start, like Java, Python, or sometimes Node.js, or maybe if you have lots of dependencies that take a long time to bring the environment up for the first time.
This can actually be a nice use case where you can effectively eliminate that problem. Now, if we want to deep dive even a little bit more, I think there are a few moving pieces that we also need to mention. And these are the router, the scaler, and a Lambda agent. This is more to explain how AWS implemented all these things under the hood. So in practice, when you publish a new version with a capacity provider, Lambda launches managed instances in your account.
Multiple instances, of course, because they will be spread across different availability zones. And as we said, by default, there are three instances for resiliency in different AZs. And this version will eventually become active. So effectively, you publish a new Lambda, the instances are created in different availability zones, and that Lambda is now considered active. When an invocation comes in, your environment is going to start to consume CPU and memory.
And that's where you have a Lambda agent, which is running within the EC2 instances and reporting this consumption to what AWS calls the scaler. And the scaler is effectively the component that decides, do we need to add more environments or more instances? So it's this continuous conversation between all these different moving parts to try to determine, are we using enough capacity? Do we need more? Is there still space to run more Lambda functions?
So that's another dimension to consider. And then there's the other direction: when the traffic goes down, the agent reports that too, and the whole system can decide to scale down environments and instances accordingly. So I think that gives you the general idea of how things work at a more abstract level. We talked about the restaurant analogy, and we also went a little bit more into the actual implementation details with the different components. But what probably matters the most is what you can control as a developer building your applications with this new model in mind. What kind of tweaks and toggles can you touch? Well, at the function level, you can pick how big your execution environment is.
Eoin: That's in terms of vCPUs and memory. The smallest supported size is 2 gigabytes and 1 vCPU. The key is to choose a size that supports your intended multi-concurrency because each environment is meant to handle multiple invocations. The other big thing here is that previously you had a 10 gigabyte memory limit for default Lambda mode. Now you can have 32 gigabytes available to a Lambda invocation, which is a big change.
The rule of thumb, I think, is that if you're doing CPU-heavy work with not that much IO, you typically want more vCPUs rather than just cranking up concurrency. And then you can specify the maximum concurrency per environment. So with default Lambda, that would be a one-to-one ratio. But here you can have up to 64 concurrent invocations per vCPU. So you can increase it if each invocation is light on CPU and maybe more IO bound.
Then you get more throughput per environment and you'll get more cost benefit as well. And you can decrease it if you're memory heavy and CPU light. Then that's at the function level. And then at the capacity provider level, you can specify your target resource utilization. So how much headroom do you want really? So a higher target would be higher utilization, potentially lower cost, but less headroom.
And then a lower target you could use if you want more spare capacity for bursts, but you'll pay for more idle compute. And then you can specify your instance types so you can constrain instance types. But AWS recommends letting Lambda choose for best availability. So don't be too restrictive and specific on what instance types you support. So when it comes to the two scaling modes, manual versus automatic scaling, at the capacity provider level, you can specify which one you want.
Auto is the default. Manual exists when you want precise control over the scaling threshold. Now, from what we've seen, that's basically just a CPU utilization scaling threshold. I haven't seen any way yet, and haven't really tried it either, of using custom metrics or other metrics to scale, like you could with an auto scaling group. Then separately at the function level, you can specify the minimum and maximum execution environments.
So this is a particularly important one, because if you want your function to be invokable, you have to specify a non-zero minimum. Then AWS might add more as it sees fit. But if you set it to zero, as we'll discuss later, that's basically turning off your function, making it not invokable. But you can change the scaling characteristics with a put function scaling config API. And that'll allow you to have more brute-force or manual control over scaling before a batch of background processing, if you wanted to do some large scale processing during the night, for example. So given that, I hope that made some sense. Luciano, do you want to talk through what we built?
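To make these knobs a bit more concrete, here is a quick back-of-the-envelope sizing calculation in TypeScript (all the numbers are illustrative assumptions, except the 2 GB / 1 vCPU minimum and the 64-per-vCPU ceiling mentioned above):

```typescript
// Back-of-the-envelope sizing for a managed-instances execution environment.
// All inputs are illustrative; only the 64-per-vCPU ceiling and the
// 2 GiB / 1 vCPU minimum come from the limits discussed in the episode.
const vCpus = 2;                   // environment size you picked
const memoryGiB = 4;               // must be at least the 2 GiB / 1 vCPU minimum
const maxConcurrencyPerVcpu = 32;  // tune down for CPU-heavy work, up (max 64) for I/O-heavy
const memoryPerInvocationMiB = 50; // measured or estimated footprint of one invocation

const concurrencyLimit = vCpus * maxConcurrencyPerVcpu;                       // 64 concurrent invocations
const memoryNeededGiB = (concurrencyLimit * memoryPerInvocationMiB) / 1024;   // ~3.1 GiB

console.log({ concurrencyLimit, memoryNeededGiB, fits: memoryNeededGiB < memoryGiB });
```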
Luciano: Yes, we like to do practical test drives. And so we needed to find an excuse and think about what we could actually build that makes a little bit of sense in the context of Lambda MI and its particular characteristics. So what we thought of is, again, video processing, which seems to come up a lot in our examples, maybe because we think about this podcast and how to optimize the production of the podcast itself.
And to be fair, this is not a full implementation. We didn't really wire up FFmpeg or ML vision models or, I don't know, subtitle generation, that type of thing. It's more that the processing is simulated, but we built everything around it. So in the computation bit, you could plug in whatever logic you like. So if you want to use what we built to, say, extract the audio from a video, convert the video, or whatever else you can do with FFmpeg or something else, you can definitely do it.
So the idea is that we built a service with three main components. So there is a REST API that allows you to manage videos. So effectively it's like a CRUD API where you can create a video entry, you can list them, you can get the details. And most importantly, you can trigger processing and we'll see how that works in a second. So one detail that might be important to know is that we built a little bit of a lambdalith.
So effectively it's one Lambda that can respond to all the routes and it's behind an HTTP API gateway. So then we have a simulated video processor. So whenever you call the process Lambda, sorry, the process API endpoint that we mentioned before, effectively you want to trigger the processing of a video. So that happens in this other component, which is a simulated video processor that effectively is where you will put all the heavy lifting.
As we said, different use cases might come to mind: thumbnail generation, transcoding, content analysis, subtitle generation, chapter generation, whatever you think makes sense. In a real system, this is generally CPU-intensive work, so it's particularly well suited to this model, where you might spin up a lot of EC2 capacity just to be able to do that at scale.
And then you still have the convenience of Lambda to package your code in a way that gives you a nice developer experience. And then finally, we have a step function, which may be a little bit unexpected. We'll explain why we did that. The idea is that with this step function, we can orchestrate everything: we want to have capacity always available for the REST API, but for the processor component, we only want to have capacity on demand, while still having the convenience of Lambda MI.
So, I don't know if this is actually a hack, but it feels a little bit like one. The idea is that we keep the capacity for the processor at zero, which means that by default, effectively, that Lambda is deactivated. But then anytime somebody calls the process endpoint, we use the step function to spin up more capacity. So we basically change the capacity definition at runtime to go from zero to a different range that, of course, you can configure to whatever makes sense for you.
But of course, one to whatever you like. So at that point, we can start to see the instances appearing, and the step function monitors until there is enough capacity to start doing the processing and effectively run our code. So it is a little bit of a hack to effectively get that scale to zero, which of course only makes sense if you control the event that triggers, in this case, the processing.
It wouldn't make too much sense, for example, in a REST API. So this is how we tested the two different scenarios, one where we have some kind of manual control of the capacity and the other one where we are just letting the service manage all of that. And the way we do that scaling up, or effectively control the details of the scaling configuration of the processor function, is through an API called put function scaling config.
And that's where you can define the minimum and the maximum capacity. If you set the minimum to zero, you are basically saying this function is disabled. And when you set it to one or more, you are letting the capacity provider create the EC2 instances. And that's when your Lambda function becomes active and you can invoke it.
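As a hedged sketch only: the core of the trick is a single call to that put function scaling config operation, for example from a Step Functions task. The command and parameter names below are assumptions derived from the API name, so check the Lambda API reference for the real shapes before relying on this:

```typescript
// Hedged sketch: flip the processor function from zero to N execution environments
// on demand. The episode only names a "put function scaling config" API; the
// command and parameter names below are assumptions, not verified signatures.
import { LambdaClient, PutFunctionScalingConfigCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

export async function enableProcessor(functionName: string, qualifier: string): Promise<void> {
  await lambda.send(
    new PutFunctionScalingConfigCommand({
      FunctionName: functionName,
      Qualifier: qualifier, // scaling config applies to a published version or alias
      // Assumed parameter names: a non-zero minimum re-activates the version,
      // setting it back to zero deactivates it again.
      MinExecutionEnvironments: 1,
      MaxExecutionEnvironments: 10,
    })
  );
}
```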
Other small details that might be relevant: we use CDK with TypeScript, and if you use the latest versions, everything we just mentioned is supported out of the box. So there are no weird hacks that we could see. We use Node.js 24 on ARM64, and we store metadata in DynamoDB. So all pretty standard, I would say. All this code is available on GitHub, so publicly available. You'll find the link in the show notes. So feel free to check it out and let us know if you like it, or even feel free to submit a PR if there is something that you find doesn't work or maybe you want to change and improve. So based on all of that, Eoin, what are our impressions, or maybe limitations or pitfalls that we discovered?
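For reference, a minimal sketch of the stack skeleton just described, an HTTP API in front of a single lambdalith handler plus a DynamoDB table, could look roughly like this in CDK (the managed-instances capacity provider wiring is deliberately left out, and the paths and names are illustrative, not copied from our repo):

```typescript
// Minimal sketch of the stack skeleton: HTTP API -> lambdalith handler -> DynamoDB.
// Assumes a recent aws-cdk-lib v2 where the HTTP API constructs are stable.
import { Stack, StackProps, RemovalPolicy } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import { HttpApi } from "aws-cdk-lib/aws-apigatewayv2";
import { HttpLambdaIntegration } from "aws-cdk-lib/aws-apigatewayv2-integrations";

export class VideoApiStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Table for video metadata.
    const table = new dynamodb.Table(this, "Videos", {
      partitionKey: { name: "pk", type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: RemovalPolicy.DESTROY, // fine for a demo, not for production
    });

    // One handler for all CRUD routes (the "lambdalith" approach).
    const apiFn = new NodejsFunction(this, "ApiHandler", {
      entry: "src/api/handler.ts", // hypothetical path
      handler: "handler",
      runtime: lambda.Runtime.NODEJS_22_X, // we used Node 24; pick the newest your CDK version exposes
      architecture: lambda.Architecture.ARM_64,
      environment: { TABLE_NAME: table.tableName },
    });
    table.grantReadWriteData(apiFn);

    // HTTP API Gateway in front of the lambdalith.
    new HttpApi(this, "VideosHttpApi", {
      defaultIntegration: new HttpLambdaIntegration("ApiIntegration", apiFn),
    });
  }
}
```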
Eoin: Well, given that we just did the ECS managed instances review in our last episode, selecting instances here with Lambda MI is a bit more limited because in ECS MI you have more of a query system where you specify characteristics of your instance requirements. And AWS will pick the right instance type for that, which kind of abstracts you from having to think about specific instance types. And if new instance types become available, they'll automatically get included in the potential query results.
But with Lambda MI, you can only specify a very limited set of parameters, initially maximum vCPU count, and the instance type filter, which is basically allowed instance types or excluded instance types. So an inclusion list and an exclusion list, and you still have to think about actual instance types. So it's strange they didn't follow the same model. Region availability is limited for now. If your workload is multi-region or you want to use specific regions, right now we've got what?
US East 1, US East 2, US West 2, AP Northeast 1, and EU West 1. So just five regions. In terms of runtime, the support is for the latest version only. Some people, especially enterprises, rely on the ability to pin specific versions of managed runtimes. So if you're migrating existing functions, anything on older runtimes will not qualify. You have to use the latest supported one. Anything that's on the deprecated list or previous list is not possible.
VPC networking is now mandatory. Of course it is, because you have to run EC2 instances, and EC2 instances always need a VPC, no exceptions. So if you put them in a private subnet with no egress, you're going to have an issue. You have to make sure your VPC has reachability, for example through VPC endpoints, to the AWS services your Lambda function might need. Think about S3, DynamoDB, SSM.
So you'll need either a NAT gateway, an internet gateway, or a VPC endpoint. This is exactly the same concern as when you have a normal Lambda function in a VPC. But you just don't have an option to avoid VPCs now.
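For the VPC endpoint route, the standard CDK constructs are all you need. A minimal sketch (the VPC layout here is just an example, not something Lambda MI prescribes):

```typescript
// Hedged sketch: a VPC with private, egress-less subnets plus the endpoints a
// managed-instances function typically needs to reach AWS services.
import * as ec2 from "aws-cdk-lib/aws-ec2";
import { Construct } from "constructs";

export function addPrivateVpcWithEndpoints(scope: Construct): ec2.Vpc {
  const vpc = new ec2.Vpc(scope, "LambdaMiVpc", {
    maxAzs: 3,
    natGateways: 0, // no NAT: rely on VPC endpoints instead
    subnetConfiguration: [
      { name: "private", subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
    ],
  });

  // Gateway endpoints are free and cover S3 and DynamoDB.
  vpc.addGatewayEndpoint("S3Endpoint", {
    service: ec2.GatewayVpcEndpointAwsService.S3,
  });
  vpc.addGatewayEndpoint("DynamoEndpoint", {
    service: ec2.GatewayVpcEndpointAwsService.DYNAMODB,
  });

  // Interface endpoints (priced per hour) for anything else you call, e.g. SSM.
  vpc.addInterfaceEndpoint("SsmEndpoint", {
    service: ec2.InterfaceVpcEndpointAwsService.SSM,
  });

  return vpc;
}
```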
Deployments with Lambda MI can be noticeably slower. The first version published on a new capacity provider has to launch managed instances and bring up execution environments before the version becomes active. AWS explicitly says this can take several minutes, and we've encountered examples of around eight minutes for end-to-end deployment. And then there's a minimum size jump. You don't have the option of small utility Lambda functions with 128 megabytes of RAM anymore. With managed instances, the smallest one is two gigabytes with one vCPU. So it could be a surprise if you're just trying it out and trying to keep things minimal in terms of resources and costs. Also important to know that creating a capacity provider with manual scaling still spins up baseline capacity. We saw in our testing that even when we created a capacity provider without attaching any functions to it, AWS was starting two instances. It's worth knowing if you expect it to be zero until you attach a function. We also had cases where, after scaling down, instances remained active for arbitrary lengths of time. It's definitely something to keep an eye on. Anything else to add to that, Luciano?
Luciano: Yeah, I think it's worth remarking that scale to zero is possible, but it is complex, so to speak. We have an example in our repo and people can check out the way we achieve it, but I don't think it's a general purpose approach that you can use for any use case. In our use case, it makes sense just because we have a very clear execution path that determines when we need that compute capacity available.
In our example, we effectively get something almost like a cold start on demand, if you want to call it that, using these instances under the hood and changing the min and max execution environments on the fly. But as we were saying, that's not something you can use, for instance, for an API, that wouldn't work, right? So you just need to be aware that while it is possible, it is complex and it's not something you can use as a general purpose mechanism.
And the main reason is because when you set the min execution environments to zero, effectively your Lambda function becomes deactivated. And there is a very clear indication: when you go to that particular Lambda function in the web UI, you will see a blue banner saying this version is deactivated. To activate it, you need to set the scaling to a value that is non-zero. So just be aware of that. If you were expecting automatic scale to zero, that's not necessarily the case.
The other thing is that because now you get concurrent execution, that comes with a few more headaches from a developer perspective, which I don't think is necessarily a bad thing. It's probably useful, but it's just something you need to be aware of and consider in your code, because otherwise you might have unexpected bugs or side effects. And this is the same thing that you need to worry about anytime you are building for this kind of environment.
It can be a container, it can be effectively anything where you're running code concurrently, where you might end up with race conditions or issues of that kind. In the case of Node.js, for instance, which is what we used, what can happen is that you have global state that is shared across concurrent invocations in the same execution environment. So imagine you have a global variable that you put outside your handler, and then you reference that variable inside your handler.
Because you might have multiple invocations of that handler running at the same time, if they are both changing that value, then you might end up with inconsistent state, where one handler suddenly sees data that was changed by another handler. Imagine this is, I don't know, a user session. You might have two users that are effectively overwriting each other. And this is something that might lead to very serious bugs. So just be aware of that.
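To make the pitfall concrete, here is a minimal Node.js sketch (the user/session shape is made up purely for illustration):

```typescript
// Illustration of the shared-global-state pitfall under concurrent invocations.

// BAD: module-level mutable state is shared by every invocation running
// concurrently in the same execution environment.
let currentUserId: string | undefined;

export const buggyHandler = async (event: { userId: string }) => {
  currentUserId = event.userId;
  await doSomethingAsync();               // another invocation may overwrite currentUserId here
  return { processedFor: currentUserId }; // may now belong to a different request!
};

// BETTER: keep per-request state local to the invocation; reserve module scope
// for immutable config and reusable clients.
export const saferHandler = async (event: { userId: string }) => {
  const userId = event.userId; // local to this invocation
  await doSomethingAsync();
  return { processedFor: userId };
};

async function doSomethingAsync(): Promise<void> {
  // Stand-in for real async work (a DynamoDB call, an S3 upload, ...).
  await new Promise((resolve) => setTimeout(resolve, 10));
}
```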
Of course, there are other ways to avoid this problem. We're not going to go into detail here on how you can solve this particular problem, but just be aware the problem exists. And there are tons of best practices that you can find online for your specific language so that you don't run into these kinds of threading or concurrency issues, which you now might have, depending on your language of choice, just because you have concurrency.
And another similar issue is that if you're using the /tmp folder, that's also shared between concurrent executions. So again, the same issue might happen: if you are creating a file from one invocation, and another invocation is also trying to create that file, they might end up overwriting each other. So just be aware and make sure you select file names that don't conflict, maybe using UUIDs or something like that.
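A small sketch of that pattern (again illustrative, not code from our repo):

```typescript
// Avoiding collisions in the shared /tmp space when invocations run concurrently.
import { randomUUID } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { writeFile, rm } from "node:fs/promises";

export const handler = async (event: { videoId: string }) => {
  // A fixed name like /tmp/frame.jpg could be overwritten by a concurrent
  // invocation; a per-invocation UUID keeps the paths unique.
  const workFile = join(tmpdir(), `${event.videoId}-${randomUUID()}.tmp`);
  try {
    await writeFile(workFile, "intermediate data");
    // ... do the actual processing here ...
  } finally {
    await rm(workFile, { force: true }); // clean up: the instance lives for up to 14 days
  }
};
```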
Logs can also be problematic in that sense because they will interleave. And I suppose this is the reason why structured JSON logs are enabled by default and you don't get to change that. And AWS says they will include a request ID by default in every JSON line. So that should make it a little bit easier to avoid confusion between logs when you're just looking at the logs. But you can still see interleaved lines, so it's up to you to filter by request ID.
There are lots more potential pitfalls, and it's nice that AWS has published a documentation page that goes into a lot of detail, not just on the problems, but with detailed solutions. So we'll just give you a link that you can find in the show notes if you're curious. And they are also organized by programming language, so that probably removes a lot of the noise depending on your language of choice. You'll be focused on what really matters for that particular language. Now, I suppose the last topic, and probably one of the most interesting for most people, is how much is this going to cost me?
Eoin: Yeah, a big change here really, because one of the biggest mindset shifts with Lambda managed instances is that the billing is no longer your memory times duration default Lambda compute model. So you're not paying for that dimension at all anymore. But of course, you're paying for the underlying EC2 instances which are running in your account for a duration that you don't necessarily control to a fine level. So with managed instances for Lambda, the official pricing model has three dimensions running in parallel. So you still have request charges just like with default Lambda. You pay 20 cents per million requests.
That's simple, familiar. It's independent from how long each request runs. And generally in any bills I've seen, that's a tiny, negligible component compared to the other dimensions. Then you have your EC2 instance charge. And now you're paying for the EC2 instances that back your capacity providers, using standard on-demand EC2 pricing. The key benefit is that now you can apply EC2 instance savings plans and reserved instances and any other EC2 discount mechanisms that might be applicable.
The new thing is, just like with ECS MI, now you've got kind of a management fee, a managed instance tax, if you like. With ECS, that was 12%. We worked it out. With Lambda, it's a 15% premium calculated on the EC2 on-demand price of the instance. So the important nuance here is that EC2 discounts apply to the compute portion, not to the management fee. You're always paying 15% of the on-demand list price for the management fee.
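To make the arithmetic concrete, here is an illustrative-only calculation; the instance price and traffic numbers are made up, and only the 20 cents per million requests and the 15% fee come from the pricing model just described:

```typescript
// Illustrative-only sketch of the three pricing dimensions.
const instanceHourlyOnDemand = 0.10; // USD/hour, placeholder for whatever instance Lambda picks
const instanceCount = 3;             // e.g. the default three instances across AZs
const hoursPerMonth = 730;
const requestsPerMonth = 50_000_000; // made-up traffic volume

const ec2Charge = instanceHourlyOnDemand * instanceCount * hoursPerMonth;   // ~219 USD
const managementFee = ec2Charge * 0.15;   // always 15% of the on-demand list price
const requestCharge = (requestsPerMonth / 1_000_000) * 0.20;                // 10 USD

console.log({
  ec2Charge,
  managementFee,
  requestCharge,
  total: ec2Charge + managementFee + requestCharge,
});
```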
And also, critically, spot instances are not yet supported, just like with ECS managed instances. What this all means is that if you've got steady state, high volume workloads, you might have massive cost savings with Lambda MI, but you'll have to measure and have a look. You really need a consistent load, or to be doing something like we did with the video processing example, where you're scaling it up for a certain amount of time, doing a large volume of batch processing, and then scaling it down. But also, with things like multi-concurrency and, like you said Luciano, with a high volume API, you've got potential for cost savings too. So your mileage will vary, but high volume requests and longer running Lambdas really can benefit here. And of course, remember that you can leverage existing savings plans with EC2 and reserved instances to save even more. Yeah, I guess let's jump to the conclusions.
Luciano: I think what's important to mention here is that Lambda MI, it isn't like a new version of Lambda. It isn't Lambda replacing Lambda or Lambda V2 or whatever you want to call it. It's just a different execution model. So it's like a new option for executing Lambda code. So it's still the same Lambda developer experience and integrations. It's just the compute, the underlying compute is something that you get to control.
Before it was just happening magically behind the scenes. That's why we love to call it serverless. Now it's a little bit less serverless, but it's an option, and there are benefits and cases where you might want to use this particular option. But personally, I don't know if I like it or not. I think it's a bit sad that it makes my mental model, or decision making process, or decision tree if you want to call it that, a lot more complicated, because now there are more options and more dimensions to think about.
But at the same time, it's also a good thing because there are definitely cases where something like this is useful. So now you have that option without having to leave the comfort of Lambda or without having to do a massive refactor if you already have a solution that runs on Lambda. So again, good and bad things. You have more options to decide on. But at the same time, those options can be very useful in certain particular cases.
I still expect those cases are maybe limited. Maybe it's more enterprises with big workloads or maybe cases where you're doing cost optimizations or maybe cases where you need a significant amount of concurrency. But those cases exist. So now be aware that this option exists. So as we said, it shines where you have steady state, predictable loads, high throughput APIs, CPU heavy or long running work, batch workflows and so on.
I think it might not be the best fit if you are worried about fast spikes, for example, because those spikes can cause throttling. So depending on your use case, the more traditional Lambda approach might be better suited for that. You have more knobs to tune. So for instance, you have to think about max concurrency, utilization targets, instance shape. So definitely something else that's worth considering and adds complexity.
And then the minimum function size is also something you need to be aware of, because I really like, for instance, to do single-purpose Lambda functions, so I sometimes end up with even hundreds of small Lambda functions. I think this approach pushes you a little bit more into lambdalith land, as we did in our particular example. There is nothing necessarily wrong with it, it's just something you need to consider. And I'd really like it if there was an option here where, if it's scaling slowly and maybe you're getting some throttling because it's trying to provision a new EC2 managed instance for you...
Eoin: Then you could say, well, in the meantime, just use default Lambda for that function, right? And handle the scale by using the normal execution mode. But you can't mix the two with Lambda MI. You're either using one or the other, which is a bit of a shame. It would be nice to be able to have a blend. Yeah, maybe we can consider this a feature request if anyone from AWS is listening, but I definitely agree with you.
Luciano: So just to close things off, the bottom line is that this is a new tool, and it might be a very good tool for the right workloads. It's not necessarily something we should consider an upgrade to Lambda, but then again, it's definitely a useful tool, and it's worth knowing when and how to use it. I remind you that we have a repository with an example. You can find it in the show notes.
So if you have tried it, let us know. If you like it, let us know why. If you don't like it, also let us know why. We are always open to talking to you, hearing your opinion, and seeing if maybe you found other use cases that we didn't think about. One last thing. Thanks to fourTheorem for sponsoring yet another episode of AWS Bites. fourTheorem is an AWS partner and a consulting company. We can help you with your AWS architecture and make sure that your implementations are simple, scalable, and cost-sane. So if you're curious, check out fourTheorem.com, find our case studies, and get in touch if you want to know more. So thank you very much and we'll see you in the next episode.