AWS Bites Podcast

31. CloudFormation or Terraform?

Published 2022-04-07 - Listen on your favourite podcast player

Should I use CloudFormation or should I use Terraform instead? If you are just starting to write Infrastructure as Code (IaC) you probably have this question. In this episode, we will discuss in detail how these two amazing pieces of technology compare against each other and what their features, weaknesses and strengths are. We will share our opinions based on our experience with these 2 technologies and guess what, for once we have a bit of a clash of opinions! Can you guess who is in the Terraform camp and who is in the CloudFormation camp instead?

In this episode we mentioned the following resources:

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: Should I use CloudFormation or should I use Terraform instead? If you are just starting to do infrastructure as code, you probably have this question. And in today's episode, we'll try to cover these two technologies and highlight some similarities and differences. We will also try to give you our opinions on which one is best and which one you should use, but get ready for some debate because I think here we have a difference of opinion. My name is Luciano and today I'm joined by Eoin and this is AWS Bites Podcast. So let's start by maybe recapping what both CloudFormation and Terraform are useful for and maybe what infrastructure as code is. What do you think there?

Eoin: When you're using infrastructure as code, what you're talking about is declaring the state of everything you want to be in AWS and you use some sort of tooling. It could be Terraform or CloudFormation and that tooling will perform a set of actions to get you from your current state into the target state that you've declared in code. And so why would you do that? Well, we've covered this in previous episodes, but with infrastructure as code, what you want is to make sure you've got a predictable deployment so you know exactly where you're coming from and where you're going to, and you want to get some safety around what's in there and you don't want to have any kind of unpredictability there. So you can do things like code review, so you can have pull requests on your infrastructure as code and traceability, you know, you have a branch with your resource code changes as well as your application code changes. So what it gives you is the ability to deploy your infrastructure in multiple environments and make sure that it's exactly the same in each one. So given that, I guess we could probably go into the two tools then. So we're talking today about CloudFormation and Terraform. Would you like to talk about Terraform first?

Luciano: Yeah, I can start with Terraform. So Terraform is a tool that was not created by AWS, but by another company called HashiCorp. And basically it's an open source product. So you can just go on GitHub and look at the source code, even contribute yourself. But of course, HashiCorp being a company also has a commercial offering, and specifically in relation to Terraform, they have something called Terraform Cloud and Terraform Enterprise, which are tools that allow you to automate, I suppose, some of the manual work that you'd need to do yourself if you're just using the bare-bones open source Terraform. It supports, of course, infrastructure as code, but not just for AWS, but also for many other cloud providers. And by the way, not just the main ones, like Google Cloud or Azure, there is a very long list and you can even support, I don't know, smaller DNS providers. So you can really get very granular. There is a huge variety of providers, which I guess is the technical term, supported by Terraform.

One interesting thing, which is very distinctive compared to other tools, is that Terraform uses, it's not really a proprietary language, but it's something that came out of Terraform itself. So it's kind of a bespoke language to define infrastructure as code. And this language is called HCL, which stands for HashiCorp Configuration Language. And the CLI, once you have defined your own infrastructure as code using HCL, allows you to create a plan, which basically means: if I want to apply this particular configuration to a remote deployment, what's going to happen? And Terraform is basically going to give you a plan of the different things that will change. So this is something very visible that you can see and decide, okay, this looks correct. At that point, you can decide to execute that plan and actually apply all the changes. Another interesting thing in Terraform is the concept of state.

And by state, the way I like to think about it is basically the view that Terraform has of your current deployment in production or whatever other environment that is. And that is important because every time you need to reapply something, like you want to change resources, Terraform will look into the last state, the last known state, and use that to decide what to do next. So this is another interesting thing because basically Terraform doesn't really go into the infrastructure itself and try to assess the current state, but uses the previous representation of the state. So be very careful there because if you change things manually in between, Terraform is not going to see your manual changes. It only remembers the last state managed by Terraform. Now, by default, the state is just a JSON file that ends up in your file system when you run Terraform locally.
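To make the idea of planning against a recorded state a little more concrete, here is a minimal Python sketch. It is a toy model of the concept as described above, not how Terraform itself is implemented: the `plan` function, the resource names and their settings are all hypothetical, and real tools do considerably more than compare dictionaries.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, dict]


def plan(desired: Resources, recorded_state: Resources) -> List[Tuple[str, str]]:
    """Return the (action, resource_name) pairs needed to reach `desired`.

    The comparison is made against the recorded state only, so anything
    changed by hand outside the tool is invisible to it.
    """
    actions = []
    for name, config in desired.items():
        if name not in recorded_state:
            actions.append(("create", name))
        elif recorded_state[name] != config:
            actions.append(("update", name))
    for name in recorded_state:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# What the code declares (hypothetical resources).
desired = {
    "bucket": {"versioning": True},
    "queue": {"retention_days": 4},
}
# What the tool recorded after the previous apply.
recorded_state = {
    "bucket": {"versioning": False},
    "topic": {"display_name": "alerts"},
}
# What is really deployed: someone enabled versioning by hand in the console.
actual_infrastructure = {
    "bucket": {"versioning": True},
    "topic": {"display_name": "alerts"},
}

changes = plan(desired, recorded_state)
for action, name in changes:
    print(f"{action}: {name}")
```

In this sketch the plan still contains an update for the bucket, even though the manual change already brought it in line with the code, because only the recorded state is consulted. Planning against `actual_infrastructure` instead would drop that update.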

But of course, this is not ideal because if you're working on a team, you might want to have that shared so that everyone deploying will have the same view of the world. So there are ways that you can store that state in S3 or in DynamoDB or in some other shared storage. And this leads to another interesting feature of Terraform, which is that, we can say, it's a client-server model, where you are running these changes from a client. It could be your own development machine, it could be a CI/CD pipeline. But what happens is that once you run the CLI, the CLI itself will make all the changes on your infrastructure by calling AWS APIs directly. At least this is in the case of AWS. Of course, if you use other providers, it will interact with the APIs of those providers.

So this is kind of the idea: Terraform will call the APIs for you as if you were calling the APIs directly yourself, or maybe as if you were running CLI commands against AWS. So just to recap, the main features are that you can write your infrastructure as code using HCL, then you can plan what happens if you want to apply those changes. You can actually apply the changes or execute the plan. You also have rollbacks. So basically you can decide to go back in case something went wrong. But there are also other interesting extensibility features like plugins and modules that we can probably discuss a little bit more later. And you can deploy to multiple cloud providers. And finally, you can also query existing resources. So for instance, if you want to use things that are already online as part of your definition of infrastructure as code, like, I don't know, you want to reference a bucket that was already provisioned before, you can do that as part of the HCL language. There are constructs that allow you to do that. I hope that was a comprehensive overview, but yeah, what about CloudFormation at this point?

Eoin: Yeah, I think that sets it up nicely for a comparison. So I would describe CloudFormation like this: the primary difference really is that while Terraform is a tool that operates in that client-server mode you described, CloudFormation is an AWS service. So it has a client, but the main features of CloudFormation happen in the AWS managed service. And that's the fundamental difference. And it's quite an important one as well. So what you're doing essentially is you're declaring the state of all of your cloud resources and you're giving the templates to AWS and asking the AWS CloudFormation service to apply the differences for you. So it all happens cloud-side rather than in this client-server model. In terms of what the templates look like, it uses JSON or YAML. The language that CloudFormation gives you out of the box is not as powerful as what you get with HCL, but there are additional tools you can use to overcome some of the limitations of JSON or YAML. There are a lot of features in CloudFormation. It's actually growing; even in the last year, we've had a lot of new features. But fundamentally the state in CloudFormation is stored in something called a stack. And that state is something that AWS manages for you. So you don't have to think about where that state is.

So a stack is essentially a collection of resources with its own state. And there's also something called stack sets, which is essentially the same stack applied across multiple accounts or regions. You also have things like nested stacks. So you can have a hierarchy of CloudFormation templates. And when you mentioned Terraform plan, there's a similar idea in CloudFormation called a change set. So you can create a change set and that's also created on the cloud side. And it's essentially a list of changes that will be applied. And then you can decide whether to execute that change set or not.

You also mentioned rollbacks. So one of the things I like about CloudFormation is that it has automated rollbacks. I think about it more like when you're interacting with a database: you've got a transaction and within that transaction, you've got a series of changes, and a CloudFormation update is very much like a transaction. And if one of those updates fails, CloudFormation will manage the rollback for you. And for that reason, it feels safer using CloudFormation because it's kind of AWS's responsibility to fulfill that rollback. Recently, you actually have support for removing or disabling rollbacks in development as well. So that makes the development process a bit handier, a bit faster. And you also mentioned one of the things that can happen with any of these tools, which is manual changes: people can go in and make changes that aren't reflected in the stored state. And one of the newer features of CloudFormation is drift detection. So it'll let you know and track the state of resources against the template. So you can see what's changed compared to that state. That can be useful. It doesn't support all the resource types, but I think there's a growing set. There is support in CloudFormation as well for importing existing resources. It's not something I would like to have to do very often because it's a little bit of a laborious process, but it's also a reasonably new feature where if you've got some resources that you created manually in the console, you can kind of adopt them into your CloudFormation stack.

There's a lot of other new features as well, like hooks. So you can execute arbitrary code at different stages in the deployment life cycle. And there's other stuff as well. Like if you've got an auto scaling group, CloudFormation integrates with that as well and can do rolling deployments. I think one of the last features I'd call out is that CloudFormation gives you very good secrets management. So it's well integrated with SSM Parameter Store for secure secrets and also Secrets Manager. So you don't have to pass those around. They can be imported securely on the cloud side, within the CloudFormation service, for you. And again, because it's JSON or YAML, it's fairly flat and declarative and not very dynamic. And that can be a benefit, but also a drawback. I know in Terraform you have loops. You can use count to loop up to a count, like a for loop essentially. In CloudFormation, you don't have loops, but you do have conditions. So you can decide whether to include something or not based on the value of a parameter. And that parameter could come from inputs to the template, or it could come from an SSM parameter.
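As a rough illustration of the conditions mentioned at the end of that answer, the Python sketch below builds a small CloudFormation-style template as a dictionary. The `Parameters`, `Conditions`, `Fn::Equals`, `Ref` and resource-level `Condition` keys are standard template syntax; the parameter name `Environment`, the condition name `IsProd`, the two bucket resources and the tiny local evaluator `included_resources` are assumptions made up for this example (CloudFormation evaluates conditions itself, in the service).

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "Environment": {
            "Type": "String",
            "AllowedValues": ["dev", "prod"],
            "Default": "dev",
        }
    },
    "Conditions": {
        # True only when the Environment parameter equals "prod".
        "IsProd": {"Fn::Equals": [{"Ref": "Environment"}, "prod"]}
    },
    "Resources": {
        "DataBucket": {"Type": "AWS::S3::Bucket"},
        # Only created when the IsProd condition holds.
        "AuditBucket": {"Type": "AWS::S3::Bucket", "Condition": "IsProd"},
    },
}


def resolve(value, params):
    """Resolve a {"Ref": name} against the given parameters (sketch only)."""
    if isinstance(value, dict) and "Ref" in value:
        return params[value["Ref"]]
    return value


def included_resources(template, params):
    """Hypothetical local evaluator: handles only Fn::Equals conditions."""
    conditions = {}
    for name, expr in template.get("Conditions", {}).items():
        left, right = expr["Fn::Equals"]
        conditions[name] = resolve(left, params) == resolve(right, params)
    return sorted(
        name
        for name, resource in template["Resources"].items()
        if conditions.get(resource.get("Condition"), True)
    )


print(json.dumps(template["Conditions"]))
print(included_resources(template, {"Environment": "dev"}))
print(included_resources(template, {"Environment": "prod"}))
```

The template itself stays flat and declarative: there is no loop anywhere, only a switch that includes or excludes `AuditBucket` depending on a parameter value.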

Luciano: On that one, we have an article that we wrote some time ago with examples that we'll put in the show description.

Eoin: Yeah, definitely. That's a good shout. Because it's an AWS service as well, I suppose it's worth calling out some of the integrations that CloudFormation already has with other AWS services. So if you're into CodeDeploy for deploying to EC2 or Lambda, CloudFormation integrates well with that. So you can do rolling deployments there. Of course, it's integrated with IAM. So your CloudFormation actions are going to be done under a role that you can specify and control. And if you want to give users the capability to deploy CloudFormation templates from the console on demand, like if they need a bucket or whatever application you might want to deploy on demand, there's a service called Service Catalog that uses CloudFormation under the hood for that. So those are some integrations of note. And of course, you also have the tooling that's built on top of CloudFormation. I think I read somewhere recently that like 70% of CloudFormation is deployed using the Serverless Framework. Don't quote me on that. Something pretty high. Yeah. And AWS SAM is a similar tool that's also built on CloudFormation. And since we mentioned that with YAML and JSON it's not very dynamic, of course, you have the CDK, which we have covered in a previous episode in a lot of depth. And that's a programmatic, imperative way of generating CloudFormation in your language of choice.

There are some limitations and quotas because it's an AWS service. In terms of numbers, you can put 500 resources in a stack and you can have up to like 2000 stacks, which should be plenty for a given account. That number of resources was increased a couple of years ago, maybe even last year, but I definitely haven't reached the 500 resource limit because I tend to use small stacks. I prefer things that way. The template size itself can be 50K, but you can put it up on S3 and then you can use a template size of up to a megabyte. One of the other important limitations, actually, just to mention it quickly, is that with CloudFormation, you cannot modify a resource that isn't within the stack. And that's an important one to be aware of. So if you've got an existing bucket, like previously it was quite common that you'd want to create an application with a Lambda function triggered by a notification from an S3 bucket. If you were trying to modify the bucket's notification configuration, you couldn't do that in a different stack. So you had all these workarounds in Serverless Framework that would create custom resources to fulfill that for you.
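The idea of generating CloudFormation from a programming language, together with the quotas just mentioned, can be sketched in a few lines of Python. This is not the CDK API, only an illustration of the general approach: `build_template` and `delivery_method` are hypothetical helpers, the queue names are invented, and the byte limits are assumptions that interpret the "50K" and "a megabyte" figures quoted in the episode as 51,200 bytes and 1,048,576 bytes.

```python
import json

MAX_RESOURCES_PER_STACK = 500         # figure quoted in the episode
MAX_INLINE_TEMPLATE_BYTES = 51_200    # "50K", assumed to mean 51,200 bytes
MAX_S3_TEMPLATE_BYTES = 1_048_576     # "a megabyte", assumed to mean 1 MiB


def build_template(queue_names):
    """Generate one SQS queue resource per name using an ordinary loop."""
    resources = {}
    for name in queue_names:
        # Logical IDs must be alphanumeric, so drop dashes and capitalise.
        logical_id = name.title().replace("-", "") + "Queue"
        resources[logical_id] = {
            "Type": "AWS::SQS::Queue",
            "Properties": {"QueueName": name},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


def delivery_method(template):
    """Decide how a template could be submitted, given the assumed quotas."""
    if len(template["Resources"]) > MAX_RESOURCES_PER_STACK:
        raise ValueError("too many resources for a single stack")
    size = len(json.dumps(template).encode("utf-8"))
    if size <= MAX_INLINE_TEMPLATE_BYTES:
        return "inline"
    if size <= MAX_S3_TEMPLATE_BYTES:
        return "s3"
    raise ValueError("template too large even when uploaded to S3")


template = build_template(["orders", "payments", "emails"])
print(sorted(template["Resources"]))
print(delivery_method(template))
```

The loop lives in the generator rather than in the template, which is the trade-off described above: the output is still flat JSON, but the code that produces it can be as dynamic as needed.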

Luciano: But you can also import something into a stack, right?

Eoin: Yeah, you can import that, but you couldn't, say, have a shared bucket and then have lots of different other stacks that create notification configurations in that bucket. So that's it. Should we talk about some of the differences then? CloudFormation, Terraform, let's try and pick a winner here.

Luciano: Yeah, let's try to summarize the differences first. We already mentioned some of them, but I think it's good to highlight them in a little bit more detail. So probably the first one, again, is that with Terraform, you have this client-side model where everything is happening on the machine that runs the Terraform CLI. So that machine is responsible for calling all the APIs and making sure that all the changes are applied through API calls. CloudFormation, on the other hand, is a managed service by AWS: you just submit your YAML or JSON, and then AWS will take care of applying all the changes for you. So you could even disconnect the machine at that point and all the changes will still go on. So in that sense, probably plus one to CloudFormation for me, because of course it gives you a little bit more peace of mind, because you don't have to think, what's going to happen to this machine while the changes are happening? So AWS will take care of all of that for you if you use CloudFormation. Do you agree with that?

Eoin: Yeah, I agree. It's just the managed service idea, right? It's taking more of the responsibility away from you, which is always a good thing in my book. I can cite Ben Kehoe's tweet on the matter there. I know he's a big fan of CloudFormation and that cloud-side model. I think I stole that term, cloud-side, from him. But he mentioned in a tweet, which we can link in the show notes as well, he said: going from CloudFormation to Terraform because of CloudFormation's shortcomings is like getting frustrated with Lambda and going to Kubernetes. Sure, you can accomplish what you want there, but with a bigger TCO. So your total cost of ownership might be higher because you're adopting tooling that isn't a managed service from the cloud provider. It's probably an opinionated view, but I would lean towards that side of the argument.

Luciano: Absolutely. And if you want something like that, I think you can use one of the commercial offerings from HashiCorp. But of course, at that point, you have to pay another provider and set up that account and manage that account. So maybe you have less responsibility at that point, but it comes with the additional cost of paying the provider, but also starting to use a whole new set of tools there.

Eoin: For sure. Yeah.

Luciano: Another interesting thing, and again, this is maybe a little bit opinionated, is that Terraform feels a little bit more modular and extensible, if you want, because there is a concept of modules, which literally use the same syntax you use to define, I suppose we can call it a stack in Terraform. You can say, this is not a stack I want to apply right now to an actual deployment, but it's just like a prototype. And I'm going to accept some generic inputs, produce some outputs, and that becomes a module that at that point you can import in different stacks and just provide the different inputs that are expected. And it will do the same things as if you had that same code copy-pasted into your actual HCL code.

So that's a nice feature because basically by using the exact same syntax, with very small differences, you get that modularity and it feels like importing functions in a programming language and just calling the functions. So that's something I really like from Terraform. But also there are other ways to extend Terraform. There is already a concept of provider. There are a lot of built-in providers like AWS, Azure, and all sorts of different providers. But of course, you can also create your own if you want to support, I don't know, any provider or any cloud service that is not natively supported. Or if you just want to do custom things to interact even with providers that are supported, but maybe using features that don't currently exist in the actual built-in providers. And another interesting thing that I used in the past, and I think is not that uncommon, is this idea of a null resource (null_resource), which is basically a way to say I'm not really defining a resource that Terraform itself needs to manage. It's more that I want to have a hook in my provisioning steps to say this is kind of a virtual resource, and I can define conditions like, I don't know, maybe when something else changes. And then with that condition, you can attach, for instance, a script or something else. And that way you can create mechanisms to say, okay, maybe before every deployment, if this particular condition happens, run a script that, I don't know, maybe tries to get an SSL certificate from somewhere and then use that certificate as part of your stack. So that's another, I suppose, easy enough way that you can create custom hooks into your Terraform deployments. In those terms, how does extensibility compare between Terraform and CloudFormation?
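The comparison between modules and functions can be taken literally in a short Python sketch. Nothing here is Terraform syntax: `static_site_module`, its inputs and the resource shapes are hypothetical, and the sketch only shows the pattern of "generic inputs in, resources and outputs out, reused with different inputs".

```python
def static_site_module(name: str, environment: str) -> dict:
    """Hypothetical 'module': takes inputs, returns resources and outputs."""
    bucket_name = f"{name}-{environment}-assets"
    return {
        "resources": {
            f"{name}_bucket": {"type": "bucket", "name": bucket_name},
            f"{name}_cdn": {"type": "cdn", "origin": bucket_name},
        },
        "outputs": {"bucket_name": bucket_name},
    }


# The "stack" reuses the same module twice with different inputs,
# instead of copy-pasting the two resource definitions for each site.
stack = {}
outputs = {}
for site in ("blog", "docs"):
    module = static_site_module(site, environment="prod")
    stack.update(module["resources"])
    outputs[site] = module["outputs"]

print(sorted(stack))
print(outputs["blog"]["bucket_name"])
```

Each call contributes its own resources to the stack and exposes outputs that other parts of the configuration could consume, which is the modularity being described.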

Eoin: Yeah, this is an interesting one because it used to be difficult: if you had a gap in the supported resource types in CloudFormation, you were quite limited, but now there are so many options, there are almost too many. So the simplest one is probably custom resources, where you can fairly quickly create a custom resource and you use AWS Lambda to fulfill the creation, update or deletion of that resource in your account. And that's reasonably straightforward to create. It can be a little bit difficult to troubleshoot, but it's fairly easy to get started if you find that there's a gap in functionality or you want to create something unique to you in CloudFormation. You also have support for CloudFormation modules now.
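As a rough illustration of the Lambda-backed custom resource mechanism described in this answer, here is a minimal Python handler sketch. The `RequestType` values and the request and response field names follow the custom resource request/response format; the `demo-` physical ID scheme, the `Name` property and the injectable `send` callable are assumptions for this example. In a real function the JSON response would be sent with an HTTP PUT to the pre-signed URL in `event["ResponseURL"]`, which is deliberately left out here so the sketch runs locally.

```python
import json


def build_response(event, status, physical_id, data=None, reason=""):
    """Assemble the response body CloudFormation expects from a custom resource."""
    return {
        "Status": status,                    # "SUCCESS" or "FAILED"
        "Reason": reason,                    # should explain any failure
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def handler(event, context=None, send=print):
    """Dispatch on the lifecycle action requested by CloudFormation.

    `send` is a hypothetical hook standing in for the HTTP PUT to
    event["ResponseURL"] that a deployed function would perform.
    """
    props = event.get("ResourceProperties", {})
    try:
        request_type = event["RequestType"]
        if request_type == "Create":
            physical_id = f"demo-{props['Name']}"      # made-up ID scheme
        elif request_type in ("Update", "Delete"):
            physical_id = event["PhysicalResourceId"]  # keep the existing ID
        else:
            raise ValueError(f"unexpected request type: {request_type}")
        response = build_response(
            event, "SUCCESS", physical_id, data={"Name": props.get("Name", "")}
        )
    except Exception as error:
        response = build_response(
            event,
            "FAILED",
            event.get("PhysicalResourceId", "unknown"),
            reason=str(error),
        )
    send(json.dumps(response))
    return response


example_event = {
    "RequestType": "Create",
    "StackId": "example-stack-id",
    "RequestId": "example-request-id",
    "LogicalResourceId": "DemoResource",
    "ResourceProperties": {"Name": "certificate"},
}
result = handler(example_event)
```

Always reporting back, including on failure, matters in practice: if the function never responds, the stack operation waits until it times out, which is part of why these resources can be awkward to troubleshoot.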

So you can create either a single resource or multiple resources. And it's a bit like a CDK construct, but you're just doing it with declarative CloudFormation. There's a new thing called the CloudFormation registry, where you can then register those modules publicly. And another thing you can put into the CloudFormation registry is a CloudFormation resource type. And this is where you're going all in on creating your own CloudFormation resource type. And it's a much more involved process. There is some tooling, there's this CloudFormation command line tool that you can use to bootstrap this and to publish it to the CloudFormation registry. But it's essentially like you're adding a feature into CloudFormation properly. So it includes validation, progress updates, all of the features you get with any CloudFormation resource type. I believe it's the same mechanism that CloudFormation internally uses for creating resources. And the difference between it and CloudFormation custom resources is that it's not running in your own Lambda, in your own AWS account. It's running in AWS, in their managed service. And there's something else called CloudFormation macros, which allows you to do transformations and templating essentially. So people who have used AWS SAM might be familiar with the serverless transform, which allows you to create a Lambda function with just a few lines of code. That actually uses the CloudFormation macro feature under the hood. And there's also another popular one called CloudFormation include, which is just for doing includes in your templates. And that's also using macros. But I would recommend, if anyone is interested in learning more about creating custom things in CloudFormation and extending CloudFormation support where there are missing resources, the cloudonaut blog. We've mentioned them on the podcast before. Those guys know a lot about CloudFormation and they have a really good podcast episode called Three and a Half Ways to Work Around Missing CloudFormation Support, which talks about all this stuff and more in depth.

Luciano: Yeah. So at this point, another topic that comes to mind is what about multi-account deployments? Do any of these tools out of the box allow you to start a deployment that actually is going to deploy resources, not just in one account, but in a few different AWS accounts?

Eoin: Yeah. So when you talk about multiple accounts, Luciano, I guess one of the things you think about is AWS Organizations, and you can create AWS Organizations accounts using Terraform, or you can also create them with the AWS SDK. But what I found is that there's quite a lot of missing support across both of these ecosystems when it comes to multiple accounts. Now, CloudFormation does give you stack sets. We mentioned that already.

So you can deploy the same stack to multiple accounts, but another tool which really fills in all of the gaps here is OrganizationFormation, or org-formation. And this is just a really great bit of open source tooling that uses CloudFormation syntax, but extends it with lots of really, really great multiple account deployment capability. So it allows you to create your accounts, but also decide what stacks you want to deploy into different accounts and perform tasks in each of those accounts and manage all of your organization's cross-account infrastructure as code. Terraform does have some support, so you can create accounts, like I said, but it's not as powerful as org-formation. I don't think there's really a good replacement for org-formation's capabilities. There is something called Control Tower Account Factory for Terraform. And I know that AWS and HashiCorp have put a lot of effort into that experience to make it easy for people to manage accounts and all the resources across a large organization. But I think it's still fairly new. It's been a long time in development and it's not yet widely adopted. So I haven't used it. I don't have personal experience of it. So your mileage may vary with it, but it's probably one to watch.

Luciano: Yeah, absolutely. I never had to do cross-account or multi-account deployments. So I've seen these tools, but I never had first-hand experience with them. So I wouldn't be able to compare them or give an opinion on those. Okay.

Eoin: So one of the nice things we can mention actually on that topic is that AWS just announced the ability to close an AWS account via an API. So it kind of opens up a lot of new possibilities for people to do kind of ephemeral account deployments with infrastructure as code. Yeah.

Luciano: Or even just to experiment with these features, because you don't have to be worried about: I just created a new account just for testing, now how do I close it, and that manual process. A lot of pain. Okay. So just to try to wrap this up: who is the winner? Let's start with when do you use CloudFormation. I'll leave this to you because you are in the CloudFormation camp. Yeah. Yeah.

Eoin: The managed service aspect of it and the fact that AWS is managing your state for you is a big advantage for me. But of course that advantage only applies if you're only talking about AWS. So I would say use CloudFormation if you're just talking about deployments to AWS and you're not dealing with multi-cloud deployments or other third-party resources. Then you'll get the benefit of automated rollbacks. You have lots of good tooling like the Serverless Framework.

So if you're doing serverless applications, I would say embrace CloudFormation and one of the tools that allows you to build on top of it, like SAM or Serverless Framework. It makes it a lot, lot easier. So I would say my decision tree for infrastructure as code is: use CloudFormation. It's good tooling. Obviously it depends on the organization you're working with and what skills people have. These are really important considerations. I'm not going to go in and try to convert everybody to CloudFormation if they're already using Terraform. That doesn't make sense. But I would have a bias towards CloudFormation, especially if you're AWS only. So what's the case for Terraform, Luciano?

Luciano: Yeah. So at this point, I think it's clear that I'm more in the Terraform camp, even though I've been using CloudFormation more and more in the last few years. But I still think that Terraform gives you a little bit of a better user experience still today. Like, yes, it's true that you need to learn a new custom language, but I feel that custom language is a lot more expressive, and you will not feel stuck on the limitations of trying to express certain concepts in JSON or YAML. So I definitely like that. And I also like how clear the diff is: when you do a plan in Terraform, it's very clear to see what's changing and what's not. And also Terraform has very good documentation and very good IDE integration. So you get a lot of auto-complete and it's easy to figure out the right properties and resources for what you're trying to do. So in general, and of course, this is opinionated, feel free to call me out if you think otherwise, I had a better user experience using Terraform rather than CloudFormation. So that's maybe one data point to keep in mind.

But of course, if you are a company that is already heavily invested in Terraform, go with that. You don't need to change just because you think CloudFormation could be better. They are almost the same to some extent. The only win that I'm going to give to CloudFormation is that with Terraform, you are a little bit on your own in figuring out how to manage deployment, meaning which machine is going to actually do the deployment and where you are going to keep the state so that it's consistent across deployments. And that's always a little bit of a pain, but there are ways to automate all of that through CI/CD and by keeping the state in shared places like S3, DynamoDB or other shared storage. And of course, one final point in favor of Terraform is that if you are building applications and those applications need to live in different cloud providers, or maybe your application uses resources in different cloud providers, Terraform can give you a lot more control there because it supports out of the box a bunch of different cloud providers. Now, that doesn't mean that it's doing some magic translation for you.

You still need to explicitly say, I want to use this resource with this provider. So if you use, I don't know, an Azure function compared to a Lambda, there isn't any abstraction for you, but you can reuse the same HCL syntax and the same Terraform concepts to provision both. It's going to be different code, but you have the same user experience. I hope that summarizes my opinion, and maybe to finish off this episode, what we can do is give a quick mention to other tools that I personally haven't used, but I've heard them coming up more and more in conversation. And I think the main one is Pulumi, which is kind of a crossover between CDK and Terraform. And by that, I mean that it's like CDK, meaning that you use programming languages to actually define that infrastructure as code. So you get something a lot more dynamic and well integrated with your IDE of choice. You don't need to learn a new language, but at the same time, it's multi-cloud. So where CDK is targeting only AWS, in quotes, because I think that that's going to be changing in the near future, but right now it is really well-built only for AWS, Pulumi is already aiming to target a bunch of different cloud providers. And we already mentioned CDK as an alternative, but also SAM and Serverless Framework, which are built on top of CloudFormation. So sometimes it can be convenient for you to use these higher-level abstractions rather than just going straight to CloudFormation, which might be a lot more low-level and verbose than using these other tools. And I think that concludes this episode. Do you have any final remark, Eoin?

Eoin: I guess I'm really interested to hear what people think of it, and whether you've got any strongly held opinions or reasons for using one over the other that we haven't covered here. Because it does tend to be a battle of camps sometimes when you're discussing CloudFormation versus Terraform. I think the main thing is that you use infrastructure as code. No matter what tool you're using, you're already in a good place if you have infrastructure as code. And if not, it's time to pick one and move forward. Yeah.

Luciano: And I'm also really curious to know if there is any other tool, maybe something older that I haven't seen, or maybe something really, really new that we haven't seen yet. So definitely let us know if there is any other tool that you think should be part of this type of conversation and people should consider. So thank you very much for following and please give us a thumbs up, a like, share, whatever, if you are getting value from this episode. See you the next time.