Help us to make this transcription better! If you find an error, please
submit a PR with your corrections.
Eoin: Regular listeners know that we are big fans of Fargate for running containers on AWS. Anytime that AWS gives us a nice way of doing less configuration, getting up and running quickly and reliably, we grab the opportunity. Of course, there's always a trade-off when you cede control to a cloud provider, and Fargate is not without its limitations. You don't get to use GPUs, you can't control network bandwidth, and you don't have a large number of storage options.
And that means you sometimes need to jump back to ECS on EC2, and then you have to take on the extra work of configuring and securing those instances and keeping them running. We now have the brand-new ECS Managed Instances (ECS MI) support, and this could be the best of both worlds, promising control over storage, networking, and GPU support with the low management overhead you get with Fargate. We've spent a good bit of time evaluating ECS MI, and we wanted to share with you what we think, and we'll cover GPU support, cost factors, scaling, and performance.
You'll also hear from us what kind of use cases and workloads are best suited to ECS MI. So let's dive in. I'm Eoin. I'm back with Luciano for the latest episode of AWS Bites. AWS Bites is brought to you by fourTheorem, and we'll find out more about fourTheorem at the end of the show. So choosing something to run your containers is always a trade-off between simplicity, cost, and control. We've talked about it a lot before, and we've mentioned the simplicity of App Runner and Fargate, which we both like. And with both of those options, you don't ever really need to think about the underlying EC2 instances that AWS completely hides from you. ECS on EC2 lets you run tasks on a huge range of instances, but you need to pick the AMI, configure the EC2 instances correctly, ensure they have the ECS agent, all that stuff, and ensure that the instances are scaling in tandem with ECS tasks. So Luciano, what is this ECS MI, and where does it fit? So yeah, basically, when you say I need to run a container, there is always a hidden requirement baked in, which is you need a machine somewhere that runs that container.
Luciano: And that means thinking about CPU, memory, networking, storage, sometimes even GPUs, as we'll see in the next few minutes. So the real question is how much of that stuff do you want to care about? How much work do you want to put into managing all of that infrastructure? And when it comes to ECS, typically it's a spectrum, and you have two contrasting options. On one side, you have Fargate, and Eoin, as you put it really well in the introduction, Fargate is almost like the serverless option of containers, if you want to call it like that, where basically you only worry about your application code.
You define the task as a container that runs your application, and then whatever is the underlying infrastructure, AWS manages all of that and actually hides all of that complexity. This is great when you need to move fast and when you don't have really advanced requirements in terms of the underlying infrastructure, which is probably most of the use cases. So I think for most people using Fargate, it's probably a good win.
A caveat, maybe, is if you need something like a GPU. And in that case, we need to look at the other end of the spectrum, which for ECS basically means using ECS with EC2 instances. That means when you create a cluster, you need to provision your own EC2 instances, which suddenly means you need to pick the right instances. You need to decide which operating system you are going to use. You need to make sure you provision all the right software and libraries.
You keep the operating system up to date and so on. So you are on to AMIs, launch templates, auto scaling groups, patching, maintenance, all that kind of stuff, which for some people can be very joyful, but I think for other people can be very daunting and time-consuming. So in a sense, we could say that ECS managed instances, in this spectrum, is something that stands in the middle, and it tries to give you the best of both worlds.
And so with this third option, basically you keep the flexibility of EC2, but without that management burden. So in practice, what this means is that you still need to define somehow what kind of compute power you need, but rather than actually picking the instances and the operating system and creating AMIs and so on, you just define a list of requirements and give it to AWS. And then AWS is going to try to select the best machine that fits that list of requirements and just spin it up for you.
You don't even know which operating system is running. Technically, you know, we can tell you it's Amazon Bottlerocket, but in practice, it doesn't matter because you're not managing it, and you cannot even log into the machine. So you kind of get a Fargate-like experience where you know there is a machine that is running your container, but you don't get to see much of that machine. The advantage compared to Fargate is that you have much more control in expressing the requirements for the underlying infrastructure. So today we are going to be spending a little bit more time giving you the details of what we learned by exploring this new option that you have in ECS. But I think we should start by talking about costs, because I think there are some important nuances there that we need to clarify up front. Yeah, I think so.
Eoin: We normally talk about cost at the end, but since it's such a big aspect of the pitch around ECS managed instances, let's give the lowdown now. The pricing model itself is reasonably straightforward. You're paying for the underlying EC2, right? So you pay the normal EC2 per second pricing for the instances. And that includes the normal benefits you would get with compute savings plans or reserved instances, etc., if you have those enabled.
And this is a big part of the cost calculation with ECS MI: you can use reserved instances. It's not an option you have with Fargate. Now, the other important point is that there is a management fee on top of that EC2 pricing. So AWS provides a separate ECS managed instances price list with the fees per instance type on the pricing page. So you have to look at each instance type. But we did an analysis of all that data against the EC2 instance pricing.
And it seems like 12% is the management fee overhead. And that's based on the on-demand instance price. So in some cases, that might be fine. As much as we all hate giving money to AWS, if they're doing a little bit more work and investing in infrastructure to lower your total cost of ownership and reduce the maintenance effort, the management fee might seem fair. But on the other hand, it does kind of challenge the claim that ECS MI is a more cost-effective approach.
If you're benefiting from compute savings plans or reserved instances, you're still going to pay 12% of the on-demand cost for the managed instances. So you would need to do that calculation and figure out what works out for you. But do bear in mind the total cost of ownership and the time you'll save because that's not trivial. There is one more bit of bad news. At least as of today, ECS MI does not support spot instances.
Now with ECS on EC2 and ECS on Fargate, you have a spot option. With ECS MI, right now, you don't. So that's a pity, especially since we thought we'd be able to use this to get really, really cheap infrastructure. But that's not really the case. That's the key caveat, right? It's not automatically true that ECS MI will save you money. Depending on your workload, how much you benefit from spot today, and how mature your existing ECS on EC2 setup is, you may still come out ahead running your own EC2 capacity, especially if you've invested time in rightsizing, scaling policies, AMI pipelines, operational playbooks, all of that, and you've got your compute savings plans, reserved instances, all of that already sorted. ECS MI is essentially trading some control and a margin on top of on-demand pricing for reduced operational work. And whether that trade is worth it really depends on your situation and your constraints. The only honest answer here is to benchmark it against your workload and your current costs. I'm afraid we can't do that for you. But we did do some work. So do we want to talk about our story and how we evaluated ECS MI? Hey, Luciano, do you want to talk about how we got started and how people will get started?
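To make the trade-off concrete, here is a back-of-the-envelope sketch of the cost comparison just described. It assumes the ~12% management fee figure from our analysis, charged on the on-demand price even when you pay a discounted RI or savings-plan rate for the instance itself. All prices and the ops-overhead estimate are illustrative placeholders, not real AWS rates.

```python
# Rough comparison of ECS MI vs. self-managed ECS on EC2 hourly cost.
# All dollar figures below are invented for illustration.

def ecs_mi_hourly_cost(on_demand_price, effective_instance_price,
                       management_fee_rate=0.12):
    """ECS MI: you pay your effective instance rate (on-demand, or a
    discounted RI/savings-plan rate) plus a management fee charged as a
    percentage of the *on-demand* price."""
    return effective_instance_price + on_demand_price * management_fee_rate

def self_managed_hourly_cost(effective_instance_price, ops_overhead_per_hour):
    """Self-managed: the instance rate plus an estimate of your own
    operational effort (patching, AMI pipelines, scaling policies)
    amortised to an hourly figure."""
    return effective_instance_price + ops_overhead_per_hour

on_demand = 1.00   # $/hour, placeholder
ri_rate = 0.60     # e.g. a ~40% reserved-instance discount

print(f"ECS MI:       ${ecs_mi_hourly_cost(on_demand, ri_rate):.2f}/h")
print(f"Self-managed: ${self_managed_hourly_cost(ri_rate, 0.05):.2f}/h")
```

Note how the fee is computed from the on-demand price, so the deeper your RI or savings-plan discount, the bigger the fee looks as a share of what you actually pay for compute — which is exactly why you have to run the numbers for your own workload.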
Luciano: Yeah, I think it's a good idea. But maybe before we actually get into that, I think it's useful to give some plain English definitions for the necessary terminology that you are going to encounter when dealing with ECS and especially managed instances. Because, yeah, if you haven't used ECS in a while, I think there is a lot of terminology that is not necessarily obvious. So I always get confused myself.
So I think it's just a good refresher. So let's start with cluster. So with cluster, you can say it's basically like a logical home for your ECS workloads. It's basically where you define tasks that can run. And effectively, underneath a cluster, you have basically servers that are powering all of that machinery. Then we have task definition. So a task definition, you can think of it like a blueprint for container workloads, which doesn't just contain a container image, but it also contains the specifications such as how much CPU do you need, memory, maybe environment variables, networking configuration, and so on.
Then you have a task as opposed to task definition. So task definition is the blueprint, while a task is actually a running copy of that task definition. So it's almost like you have an instance that is running for that specific task definition. And as you might have one task definition, you might have multiple tasks that use the same definition to distribute that workload. Then you have the concept of service, which is also a bit confusing because sometimes I find it feels almost the same as a task definition, but actually it's a bit more in the sense that it's almost like what keeps the tasks running.
So the task definition gives you the blueprint. The task is the actual instance running. But a service is basically where you specify how many tasks do you want and what are the conditions for which these tasks should run. So that includes, for instance, deployments. Like, for instance, if you're going from version 1 to version 2, how you do roll out those version changes. Or maybe auto scaling, like if you have specific requirements where, I don't know, maybe depending on traffic or maybe depending on work available in a queue, you might want to scale things up and down.
You define all of that stuff in the service definition. Then we have capacity provider. And basically, a capacity provider bridges the gap between ECS and where compute comes from. ECS is more like a service that manages containers, if you want, and you can think of it as abstracting the underlying infrastructure. So when you work with capacity providers, you actually say, okay, what kind of abstraction do you want to use?
And that can be Fargate in the case of a more serverless option. It can be EC2 with auto scaling groups or EC2 with managed instances. For managed instances, the capacity provider is basically where you have to describe what kind of instances are acceptable using something called attributes. And there are generally two options. One is the default capacity provider, which is pretty much automatic. So it tries to select the most cost-optimized general purpose instances for your workload requirements.
Or you can also use custom capacity providers where you have more control in specifying those attributes, which are things like the vCPU count, memory, CPU manufacturers. If you need any accelerator, which is not just GPUs, you can even use, I think, like TPUs and FPGAs as well. So, yeah, basically, this covers also what an attribute is. It's basically a list of desirable features of your underlying infrastructure.
And you can even get more specific. Technically you don't want to hardcode the instance types, but you can even define a subset of instance types that are acceptable if you want, or, as we said, the kind of CPU that they need, or maybe whether you want burstable performance or bare metal instances. So it can get pretty granular, and you can imagine it like a big filtering query across all the possible available instances in AWS, trying to select the one that is the most cost-effective while also matching all the requirements that you are providing. So hopefully that's a good introduction before we start to talk about the step-by-step things you need to do to get to a running deployment using ECS managed instances. Okay, we'll try and go through the list of steps then as comprehensively as we can.
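That "big filtering query" idea can be sketched in a few lines of plain Python: filter a catalog of instance types by minimum requirements, then take the cheapest match. The catalog entries and prices below are made up for illustration — the real selection happens inside ECS based on the capacity provider's instance requirement attributes.

```python
# Toy model of how a capacity provider might pick an instance type:
# filter the fleet by required attributes, then take the cheapest match.
# Instance specs and prices are illustrative, not authoritative.

CATALOG = [
    {"type": "m5.large",    "vcpu": 2,  "memory_gib": 8,   "gpus": 0, "price": 0.096},
    {"type": "m5.2xlarge",  "vcpu": 8,  "memory_gib": 32,  "gpus": 0, "price": 0.384},
    {"type": "g5.xlarge",   "vcpu": 4,  "memory_gib": 16,  "gpus": 1, "price": 1.006},
    {"type": "g5.12xlarge", "vcpu": 48, "memory_gib": 192, "gpus": 4, "price": 5.672},
]

def select_instance(catalog, min_vcpu=0, min_memory_gib=0, min_gpus=0):
    """Return the cheapest instance type meeting every minimum requirement,
    or None if nothing in the catalog qualifies."""
    matches = [
        i for i in catalog
        if i["vcpu"] >= min_vcpu
        and i["memory_gib"] >= min_memory_gib
        and i["gpus"] >= min_gpus
    ]
    return min(matches, key=lambda i: i["price"]) if matches else None

print(select_instance(CATALOG, min_vcpu=4, min_gpus=1)["type"])  # g5.xlarge
```

The default capacity provider behaves roughly like calling this with only your task's CPU and memory needs, while a custom capacity provider is where you tighten the filters with accelerators, CPU manufacturers, and so on.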
Eoin: I mean, it starts with an ECS cluster like you might have used before. You may already have one that you can use. So you'll need your VPC, you need to create your cluster or use one if you've got one already. And then managed instances needs two IAM roles. One is going to be the instance profile role, which is the same as you always have with EC2. And then you've got an infrastructure role, which is the one that ECS MI would use to launch your instances and pass that other role to the instance itself.
You need a security group, of course. And then you get into the real MI-specific stuff, like the capacity provider Luciano was already explaining very well. You can start with a default capacity provider and ECS will just pick the capacity for you. Or you can specify those instance attributes, like vCPU and memory, CPU manufacturer, accelerators, storage, networking. It's basically a bit like SQL, right?
You're just providing a list of filters for all of the instance types that are supported by ECS MI. Then you attach your capacity provider to the cluster and you're ready to start thinking about tasks and containers. So you publish your container image to ECR and define a task definition. One thing is that in your task definition, you should specify managed instances compatibility. There's a new value for that.
And then you can add your containers, your environment variables, logging, etc. And if you need GPUs in your container, that's also something you need to specify with the GPU count resource attribute in your container as well. Then you will set up your task role as normal to give the actual task its permissions. And you can create an ECS service. You don't absolutely have to create an ECS service.
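As a sketch, a task definition for a GPU container might look roughly like the dict below, in the shape ECS's RegisterTaskDefinition API expects. The family name, image URI, and environment value are hypothetical placeholders, and we've written the new managed instances compatibility value as "MANAGED_INSTANCES" — verify the exact literal against the current ECS documentation before using it.

```python
# Sketch of a GPU task definition for ECS managed instances.
# Names and values below are placeholders; the compatibility string is our
# reading of the "new value" mentioned above -- check the ECS docs.

task_definition = {
    "family": "whisper-transcriber",  # hypothetical task family name
    "requiresCompatibilities": ["MANAGED_INSTANCES"],
    "networkMode": "awsvpc",
    "cpu": "4096",       # 4 vCPU, expressed in CPU units
    "memory": "16384",   # 16 GiB, expressed in MiB
    "containerDefinitions": [
        {
            "name": "worker",
            "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/worker:latest",
            "essential": True,
            # GPUs are requested per container via resourceRequirements.
            "resourceRequirements": [{"type": "GPU", "value": "1"}],
            "environment": [
                {"name": "QUEUE_URL", "value": "REPLACE_WITH_QUEUE_URL"},
            ],
        }
    ],
}

# With boto3 this dict would be passed as keyword arguments:
#   boto3.client("ecs").register_task_definition(**task_definition)
```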
You can always just run tasks directly in the cluster. ECS services are what I think 95% of workloads are using, because they will scale for you and have other features. So you will set the capacity provider strategy on your service to the managed instances capacity provider name. And then you may want to set a desired count or let the service scale that for you. Another interesting point when we're talking about CDK is that if you're using managed instances, you're still using the Fargate service L2 construct, which is a bit confusing.
But that's the way it is right now. There are official examples from AWS using that, and our own example, which we'll share later, does the same. We'll talk about that in a bit. If you want auto scaling, you can also add your auto scaling configuration in the ECS service too. And then you will deploy the stack and trigger your workload. And based on your scaling configuration, you should see tasks starting in ECS and managed EC2 instances starting, and they'll appear in your ECS console under infrastructure, but also in the EC2 console. So that's roughly the list of steps you need to get going. Luciano, do you want to talk a bit more about that CDK support? And maybe we'll talk about the code example we have as well, because there's not a lot out there, and we thought we'd try and add some examples that people might use as a basis for their own. Yes, so we are going to make available this template, or example, if you want to call it that, which uses CDK to provision an application running on ECS managed instances.
Luciano: And yeah, this is just our honest attempt at providing an example. It might not be the best one going forward because, yeah, we just tried to build it from scratch. We didn't really find a huge amount of reference or examples out there. So hopefully this is our first good attempt, but we're definitely going to rely on the rest of the community to improve it or provide other examples as well. So what do we do in this template?
So we make a little bit of an assumption, which is not necessarily what you might want to do, but it's one use case. And the use case is basically you have a workload that needs GPU, and this workload runs based on tasks that are available on a queue. So it automatically scales to zero. If there is nothing in the queue, you have nothing running. If you drop messages in the queue, it will spin up all the necessary infrastructure until it processes everything from the queue and then shuts down all the infrastructure again.
And we actually provide two examples. So there is almost like a vanilla container that the only code you will find, it's Python code, by the way, is just pulling from the queue and then just telling you this is the message, do whatever you want with it. And then we have a little bit of a more realistic example. So the first one is almost like a template. You can put whatever business logic you need inside of the container and deploy it.
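The vanilla container boils down to a small polling loop like the one below. The real example talks to SQS via boto3; here the receive and delete calls are injected as functions so the loop's control flow — poll, handle, delete, exit when the queue stays empty — can run and be tested without AWS. The in-memory queue at the bottom is a stand-in, not real SQS behaviour.

```python
# Minimal shape of the queue-driven worker: poll, process, delete, and
# exit once the queue stays empty (which lets the service scale to zero).
# receive/delete are injected; in the real container they would wrap
# boto3's SQS receive_message / delete_message.

def drain_queue(receive, delete, handle, empty_polls_before_exit=3):
    """Process messages until `receive` comes back empty several times in
    a row, then return how many messages were handled."""
    handled = 0
    empty_polls = 0
    while empty_polls < empty_polls_before_exit:
        messages = receive()          # e.g. up to 10 messages per poll
        if not messages:
            empty_polls += 1
            continue
        empty_polls = 0
        for msg in messages:
            handle(msg)               # business logic goes here
            delete(msg)               # delete only after successful handling
            handled += 1
    return handled

# Tiny in-memory stand-in for SQS to show the control flow:
pending = [{"Body": "episode-142.mp3"}, {"Body": "episode-143.mp3"}]
done = []
count = drain_queue(
    receive=lambda: [pending.pop()] if pending else [],
    delete=lambda m: done.append(m),
    handle=lambda m: None,
)
print(count)  # 2
```

Deleting a message only after it has been handled is what makes the pattern safe: if the task dies mid-job, SQS makes the message visible again after its visibility timeout and another task picks it up.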
While the second one is a realistic example where we use OpenAI Whisper to transcribe a podcast audio file that is coming from S3. You might remember we have talked about our current solution to transcribe this very podcast, which is using SageMaker today. So maybe eventually this is going to become a replacement if we see that there is an improvement in terms of pricing and/or performance. We haven't done all the maths yet to decide whether this is the solution we want to go with.
But technically, we could prove that this solution works. So it might be an alternative. So I think it's important to mention what we learned while writing all this CDK code. It's not huge. I think it's a file with just a couple of hundred lines of code. And Eoin already mentioned the limitation that you have to use this Fargate service construct, which is a little bit counterintuitive. It looks like you're doing something wrong, because as we already mentioned today, Fargate is a different way of using ECS.
I think it's just that the team hasn't created a more specific construct just to be able to work with managed instances. So Fargate is probably the best approximation you get so far. But it doesn't just get confusing because of the name. It gets confusing also because you will have attributes that are available to you. But then those attributes don't really do anything useful. And this is something that misled us into thinking that you could use spot instances, for example, because there is a very long attribute that is called something like max spot price as percentage of optimal on-demand price.
Something like that, which basically makes you think, OK, this is a way for me to select spot instances. And we spent hours just trying to figure out the right configuration to get a spot instance, only to eventually find out that this attribute is ignored, and there is no way, at least today, to get a spot instance. So this is another side effect of having this Fargate service construct, which is, again, just an approximation. You probably get more than what you actually need, and it's just confusing. And then, again, we already mentioned there is a bit of a lack of documentation and proper examples. But we are confident this is just something that is going to improve. It's a new service, and it's probably normal that there isn't a lot out there yet. So hopefully we can help a little with this one example. But, yeah, we look forward to seeing more examples from the rest of the community. So what do you think about the GPU support? Because this was, I think, a big component in our tests.
Eoin: Yeah, I think you've asked publicly on the podcast many times for AWS to add GPU support to Fargate. It was on your wish list. And Lambda. Well done. Oh, yeah. OK, good luck with that. But look, it did work with GPUs. We were able to use it for the podcast case. I think we did share our frustrations with SageMaker batch transforms in the past, which we use for the podcast. It works.
And it has worked very reliably since we started using it with Whisper to do the transcription. But the overall process takes about 30 minutes, and the majority of that is just overhead while we wait for SageMaker to get a container up and running. So we were looking for other options. And this worked reasonably well, I think I can say. You can select a number of GPU accelerators, which is good.
I'm sure these days people will have lots of use cases in mind. But also bear in mind that you may need to request a quota increase, because GPUs don't just become available automatically on your account. They want to protect you from bill shock by making you ask explicitly for the really expensive instance types, and GPUs typically belong to that category. So make sure you get the service quota requests in quickly if you want to launch ECS MI in production next week. Now, that's GPU support. Should we talk about where ECS MI fits for use cases, Luciano? Yes. So if you have reserved instances or capacity reservations and you want to combine the benefits of those things with the ease of management of ECS, I think using managed instances is a good idea.
Luciano: And basically you get something that is close to Fargate in terms of experience, but with a lot more control over the underlying hardware. So, for instance, you can enable GPUs this way, which was our main use case. Another thing that might be really useful is when you need fine-grained control over storage and networking, maybe because you are doing something that requires very specific performance or very specific characteristics in either the storage or the networking layer.
But if we want to provide some more practical examples, what we did was basically a low-volume AI workload that required a GPU. In our case, it was audio transcription, but you can imagine something like, I don't know, you have an S3 bucket full of pictures, and you want to automatically label all these pictures or maybe create captions, that kind of stuff.
That could be another very similar use case, and I think you can very easily change our container example to build something like that. So, I think this is a very good use case, and you can just summarize it as when there is work to do, and you need very specific infrastructure for that work to happen. You just define all of it, put the work as a message in a queue, and you know that it's going to scale up when needed and scale to zero when all the work has been completed.
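The scaling side of that pattern comes down to a small calculation: turn queue depth into a desired task count, clamped between zero and a maximum. A sketch of that, with illustrative throughput numbers (how many jobs one task clears per minute is an assumption you'd measure for your own workload):

```python
import math

# Backlog-driven scaling: derive the desired task count from queue depth,
# clamped between zero (scale-to-zero) and a maximum task count.
# Throughput and target figures are illustrative assumptions.

def desired_task_count(queue_depth, jobs_per_task_per_minute,
                       target_drain_minutes, max_tasks):
    """Tasks needed to drain the backlog within the target time,
    never exceeding max_tasks, and zero when the queue is empty."""
    if queue_depth == 0:
        return 0
    capacity_per_task = jobs_per_task_per_minute * target_drain_minutes
    return min(max_tasks, math.ceil(queue_depth / capacity_per_task))

print(desired_task_count(0, 2, 10, 8))    # 0 -> scale to zero
print(desired_task_count(50, 2, 10, 8))   # 3 -> ceil(50 / 20)
print(desired_task_count(500, 2, 10, 8))  # 8 -> capped at max_tasks
```

In practice you'd feed something like this from the SQS queue-depth metric via application auto scaling, but the arithmetic is the whole idea: work in the queue drives capacity up, an empty queue drives it to zero.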
So, I think that's an excellent use case, and I can see ourselves as an AWS consulting company using this kind of approach often enough because we see the kind of problem happening in many different ways very often. Another thing we see a lot, especially with financial services customers, is HPC, high-performance computing. And this is the case where basically you need to have lots of compute, but also lots of performance in terms of networking and disk access.
So, using something like ECS MI can be a very good option. And we haven't done a lot of experimentation yet, but I think it's a very interesting fit, at least on paper. So, we want to test it more and see if it's really something we can run in production for this use case. And in general, in our experience, we can say that starting a single task took three to four minutes for the use case we just described.
So, we didn't do a huge amount of testing with many containers. Generally, we were trying to run one instance and one container. But I think it would be really interesting to see how much this scales and how fast. Can you run thousands of containers? If it's possible and there is very little overhead, then there are a lot of other scenarios that are interesting and possibly even competing with Lambda. So, I don't know. I think we'll need to see what the timing is there and the volumes that you can get to. But it's definitely something worth experimenting with more, because it might unlock a lot of other use cases. But that only tells you what you can do with it. I think it's also interesting to see what are some use cases that you should definitely not use ECS MI for. I don't think there are that many, to be fair.
Eoin: But one thing that's pointed out in the documentation is around strong isolation. With Fargate, you may know that it uses these really cool Firecracker VMs that offer very strong guarantees around isolation. With ECS MI, you're still using EC2. Tasks can be running on the same instance, which means you don't have the same level of guaranteed isolation and security. You can configure ECS MI to run tasks on separate instances.
But then you're kind of losing the benefit of being able to pack multiple containers on a shared instance for cost and efficiency. So, security-conscious workloads, take note of that. If you need a custom AMI for whatever reason, say you have special monitoring or security agents, you can't, right? You can't use managed instances with a custom AMI. That's the whole point. It's managed. And I think, lastly, if you want to shell into the instance itself, you can't do that either.
Just like Fargate, you can't shell into the underlying instance. You can ECS exec into the container itself, but not into the host. And that's it, really. I think there isn't that long a list of use cases I wouldn't explore ECS MI for. I think it's quite broad in its applications. It's probably time to wrap this up. I think, overall, we think this is a good addition to the ECS suite. Pricing might be a bit of a downer, but that completely depends, as always.
So, it's worth evaluating carefully. The lack of support for spot instances, I think, was the biggest disappointment, but we hope it's going to be there eventually. The documentation and CDK support could be better, with lots more examples. Again, that's something we hope will improve with time and more adoption. We can see ourselves using ECS MI in production. So, I think, overall, we're pretty positive on it. But if you've got any more thoughts on ECS MI, reach out to us and let us know in the comments below or on social media. All our links will be in the description. Finally, I just want to thank fourTheorem again for powering this episode. If you want help designing an AWS architecture that's simple, scalable, and cost-sane, head to fourTheorem.com. Thank you. We'll see you in the next one.