AWS Bites Podcast

96. AWS Governance and Landing Zone with Control Tower, Org Formation, and Terraform

Published 2023-09-22 - Listen on your favourite podcast player

In this episode of AWS Bites, Luciano and Eoin dive deep into the world of AWS governance, landing zones, and automation tools. AWS emphasizes the importance of good governance for customers of all sizes, whether you're starting from scratch or have been using AWS for years. But with so many tools available, which one should you choose?

Join us as we explore the best practices for setting up your AWS accounts correctly and discover tools that can automate the process, including AWS Control Tower and open-source alternatives like OrgFormation and Terraform.

Whether you're new to AWS or a seasoned user, there's something valuable for everyone in this episode.

fourTheorem is the company that makes AWS Bites possible. If you are looking for a partner to accompany you on your cloud journey, check them out at fourtheorem.com!

In this episode, we mentioned the following resources:

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: According to AWS, good governance is a must-have from the get-go for customers of all sizes. This area is not trivial, however, whether you are starting from a clean slate or you have been winging it with AWS for many years. Luckily, there are plenty of tools to automate this and guide you to a good setup from the start. But which one should you choose? If you stick until the end of this episode, you'll know some of the best practices to set up your AWS accounts correctly and which tools can help you to automate the work, including AWS Control Tower and open source alternatives like OrgFormation or Terraform or OpenTF, should I say. I don't know. My name is Luciano and I'm joined by Eoin and this is AWS Bites. fourTheorem is the company that makes AWS Bites possible. If you're looking for a partner to accompany you on your cloud journey, check them out at fourtheorem.com.

Eoin: Luciano, you mentioned that this is a must-have, so what must we have? What are our objectives? And we mentioned terms like landing zones and governance. What does this all mean?

Luciano: Yeah, Landing Zone is a term that you probably hear a lot when you start to look into governance of AWS accounts. And it's something that if you look at the Well-Architected Framework is described as basically a multi-account AWS environment that is scalable and secure. And it's something we might have mentioned before in previous episodes, so check out our previous ones. But basically, if you want to summarize that, the idea is that you most likely want to have a multi-account setup for different reasons. And also multi-account setup is going to ensure you that you can have isolation between workloads. For instance, if you have different teams, they're not going to step onto each other's toes. For instance, by consuming each other quotas, you can keep the different workloads separated in different accounts. From a security perspective, you can also use ideas like security boundaries, which basically they can reduce the impact for security incidents. And the other idea is that you probably want to automate the process of provisioning new accounts, so just to make sure that you have all these best practices baked into the process. You don't have to repeat the setup over and over again, and especially you don't want to do it manually.

And finally, a landing zone can also include the concept of auditing and compliance rules. So how can you make sure that every time you create a new account that you have good observability and you know exactly what's going on, you can collect the logs, and if you have shared resources, you can have ways to accommodate for all of that. So basically, when we set up a landing zone, we are looking for a solution that allows us to define all the accounts in the organization structure. You can imagine that as a tree where you have units and then under certain units you can have multiple accounts. We want to have that very clear, but then we want to have a way to define all of that in a formal way, basically.
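As a rough illustration of the "tree of units and accounts" idea, here is a minimal Python sketch. The class names, OU names, and account IDs are all made up for the example; a real landing zone tool would read this structure from AWS Organizations or from its own definition files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Account:
    name: str
    account_id: str


@dataclass
class OrganizationalUnit:
    name: str
    accounts: List[Account] = field(default_factory=list)
    children: List["OrganizationalUnit"] = field(default_factory=list)

    def all_accounts(self) -> List[Account]:
        """Return the accounts in this OU and in every nested OU (depth-first)."""
        found = list(self.accounts)
        for child in self.children:
            found.extend(child.all_accounts())
        return found


# Hypothetical organization: every name and ID below is illustrative only.
root = OrganizationalUnit(
    name="Root",
    children=[
        OrganizationalUnit(
            name="Security",
            accounts=[
                Account("log-archive", "111111111111"),
                Account("audit", "222222222222"),
            ],
        ),
        OrganizationalUnit(
            name="Workloads",
            children=[
                OrganizationalUnit("Dev", accounts=[Account("team-a-dev", "333333333333")]),
                OrganizationalUnit("Prod", accounts=[Account("team-a-prod", "444444444444")]),
            ],
        ),
    ],
)

account_names = [account.name for account in root.all_accounts()]
print(account_names)
```

Anything attached to an OU (a policy, a baseline stack) can then be resolved to a concrete list of accounts by walking the tree from that OU downwards.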

And we might want to set up service control policies, or SCPs, which are policies that limit the permissions available to multiple accounts within an organizational unit. Sometimes they're also referred to as preventive guardrails because they will stop certain actions from happening from the very beginning. Just to give you an example, you might create an SCP that says this particular set of accounts cannot spin up a large EC2 instance. And that's something that at the account level is not going to be possible at all. It's a policy that stops that action from happening. You can also use things like AWS Config rules for compliance. These are more detective guardrails because they don't necessarily stop the action, but they will inform you if certain things that are not compliant are actually happening in a given account. Just as an example, I was looking into the set of managed rules that you get with AWS Config. One of them is that if you have API Gateway APIs, you might want to have X-Ray enabled in all of them. So you might enable that rule, and it's going to notify you if somebody creates an API Gateway that doesn't have X-Ray enabled.

Another one is CloudTrail for log auditing. And this is actually really, really important. Imagine that you know some credentials may have been leaked and you want to investigate whether they were used and for what. If you have CloudTrail configured from the very beginning, you know that you have coverage across all your accounts, and you can have that peace of mind that if you ever need to check something, you can just go there and access that information.

Another important thing is delegated administrator accounts. So basically, even if you need to run important services, you don't want to use the management account for pretty much anything. You want to use other accounts. The reason is that the management account is somewhat special: certain security features like service control policies don't really apply to it. So if you want to make sure proper security is enforced, try to avoid using the management account. You need to think about how to delegate certain things to other administrator accounts. And just to quickly mention other stuff that might be important: billing alerts and budgets, enabling things like GuardDuty and using Security Hub, SSO, networking (creating, for instance, VPCs and connecting them), or even deployment of common resources. Maybe as a standard, you want to have an EventBridge bus or an S3 bucket in every account. You might want to provision them immediately, as soon as the account is created. And again, just to reiterate, this is not something that you need to do only if you are creating a landing zone from scratch, for a new set of accounts, let's say. It's something you should think about even if you already have a large usage of AWS in your organization but you never created this kind of governance structure. Now is the time to put all of that into place. So it definitely seems like a lot of work, and I think it is actually. But we need to find powerful allies here. And in general, things like infrastructure as code can make your life so much easier when dealing with this stuff. What kind of tools can we use and do they all support infrastructure as code? What do you think, Eoin?
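To make the "no large EC2 instances" SCP example from this part of the conversation more concrete, here is a small Python sketch that builds such a policy document. It follows the general shape of an SCP (a `Deny` statement with a condition on `ec2:InstanceType`), but the helper name, the `Sid`, and the list of blocked size patterns are assumptions for illustration and were not discussed in the episode.

```python
import json


def deny_large_ec2_scp(blocked_patterns):
    """Build an SCP document denying launches of matching EC2 instance types."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLargeInstanceTypes",  # illustrative name
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # Deny when the requested type matches any blocked pattern.
                    "StringLike": {"ec2:InstanceType": blocked_patterns}
                },
            }
        ],
    }


# Which sizes count as "large" is an assumption made for this example.
policy = deny_large_ec2_scp(["*.8xlarge", "*.12xlarge", "*.16xlarge", "*.24xlarge", "*.metal"])
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON is what you would attach to an organizational unit; because an SCP is preventive, the launch is rejected no matter which IAM permissions the caller has inside the member account.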

Eoin: There are a lot of tools now. The set is growing and the capability of each is growing. And it's kind of catching up with what people actually want, because it has been a little bit difficult in the past. We're going to talk about three main ones, right? The clue's in the title. The AWS solution is Control Tower, and we'll dive into that. Then you've got Terraform, which a lot of people will gravitate towards anyway. And another really, really good one is OrgFormation, which is an open source one. And you have some other options. You could look at CDK. You could just look at using CloudFormation directly with StackSets and custom scripting or orchestration to apply it to multiple accounts. You can't provision accounts with CloudFormation.

You can set up individual accounts with CloudFormation, but there are some gaps you need to address yourself, and then you have to orchestrate it all. So we're going to focus on Control Tower, OrgFormation, and Terraform. And we might as well start with Control Tower, since this is the one that AWS is pushing and talking a lot about. It has been, I guess, built on over the years and is becoming more and more capable. I think people are beginning to take it more and more seriously as a contender here. The main thing I would say about Control Tower, and what differentiates it from all the others, is that it's all driven from the console. So while you can have some customizations in source control, it's not really infrastructure as code or GitOps from the beginning. The setup and the administration of it is all through the AWS Management Console. Now, this is great for some people. I know a lot of people who think this is fine, but it's not for everyone, particularly if you want to have all that control and visibility that goes with infrastructure as code. I think one of the good things about the Control Tower approach is that it has an opinionated set of recommendations and best practices out of the box.

So it can help you to kind of jumpstart the setup without worrying about all the different decisions you have to make. AWS is basically just giving you the recommendation. With other solutions, you tend to have to codify all of these practices yourself, although you can always take the approach, which I have seen plenty of people do, which is just to copy and replicate the Control Tower best practices in one of the other tools. Cost-wise, there's no additional cost for Control Tower itself because it is essentially something that orchestrates a lot of other AWS services.

And it's those services under the hood that you'll pay for - things like AWS Config and CloudTrail and all the rest. When you go into the console and you set up Control Tower, it'll set up an organizational unit structure for you with security OU, which is the one it has to create from the start. It mandates that one. You can also create a sandbox OU for other accounts at the start, and then it'll create your log archive account and your audit account.

It will also create things like the Identity Center, SSO users, if you want that, and create service control policies from the start. So it'll do things like ensuring that you can only use the regions you want to use and other preventative and detective controls. So it's basically using things like AWS Config for the detective controls, AWS service control policies for the preventative controls, and then it gives you a nice UI, a dashboard on top of all that, and it presents us all in pretty much one pane of glass.
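To contrast detective controls with preventive ones, here is a minimal Python sketch of what a detective check does conceptually, using the API Gateway X-Ray example mentioned earlier in the episode. It is not how AWS Config is implemented: the function name and the inventory are hypothetical, and the `stageName` and `tracingEnabled` fields are assumed to mirror what the API Gateway API reports for a REST API stage.

```python
def find_stages_without_xray(stages_by_api):
    """Return (api_id, stage_name) pairs for stages with tracing disabled."""
    findings = []
    for api_id, stages in stages_by_api.items():
        for stage in stages:
            # A missing flag is treated as "not enabled".
            if not stage.get("tracingEnabled", False):
                findings.append((api_id, stage["stageName"]))
    return findings


# Hypothetical inventory, shaped like per-API lists of stage descriptions.
inventory = {
    "abc123": [
        {"stageName": "prod", "tracingEnabled": True},
        {"stageName": "dev", "tracingEnabled": False},
    ],
    "def456": [{"stageName": "prod"}],
}

non_compliant = find_stages_without_xray(inventory)
print(non_compliant)
```

Nothing is blocked here: the resources already exist, and the check only reports which ones are out of line, which is the difference from an SCP.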

Now you can set it up with new accounts that don't even have an organization set up yet, or if you've already got an organization set up, you can kickstart Control Tower in there too. It'll basically just leave everything you have and set up new resources in parallel. So it doesn't migrate any existing stuff you have over to Control Tower or anything like that. It sounds pretty simple. It takes about half an hour to get it set up in an account. But of course we know that there's a lot of services out there where they're quick to set up, but can be painful to run in the long run. With Control Tower, it can be frustrating because it does try to hide a lot of stuff from you. And when things go wrong, it can give you very vague errors. Like I've seen errors that just say fail to set up landing zone, please try again.

And you don't really know what to do with that. I was actually just trying this today as preparation for the episode. In one of my own accounts, I went through the Control Tower process and it failed with an error saying, oh, we can't deploy the stack because the CloudTrail bucket has an invalid bucket policy. And I was thinking, well, I didn't create the policy, Control Tower did, yet it failed. So it can have a few rough edges. And I sometimes think that Control Tower is best if you've got somebody from AWS, like your technical account manager or your solutions architect, or a good support agreement in place, and they can guide you through the process.

Now, there are some pretty cool features like the Account Factory. So we mentioned the ability to create new accounts in an automated way. Control Tower has this Account Factory, which allows users to come along and self-provision accounts. And then you can deploy resources into them, like specific Service Catalog products. So you can ensure that people can self-serve when they need new accounts, but you can also have some guardrails around that. That's pretty nice. Another thing they have is customizations. So Control Tower customizations are a means for you as an administrator of an organization to say that when people create new accounts, resources can be automatically provisioned in them. So the way you work with Control Tower customizations is actually a whole CloudFormation template you deploy once you've got your landing zone set up. And that puts a whole load of resources into your AWS account, like EventBridge rules and Step Functions, a CodeCommit repository, CodePipeline, S3 buckets, CloudFormation, the whole lot.

And it's basically this big machine that will listen for things like Control Tower accounts being created. And then it responds to that event, kicks off Lambda functions and CodePipeline, pulls templates from your CodeCommit repository and deploys them to all the different accounts. So it works pretty well and it achieves the goal of having the ability to customize what happens when an account is created. I have to say though, that the whole implementation scares me a lot.

It's one of those features that seems like a Rube Goldberg machine, where you've got all these AWS services and one thing happens, it kicks off another, which kicks off another event. And if you look at the Step Functions implementation, it's got an amazing number of states. And I just kind of worry a little bit about what will happen when this goes wrong and I have to troubleshoot it, because it's a bit of a leaky abstraction, if you know what I mean. But Control Tower is getting a lot of popularity, and I think a lot of even bigger companies, enterprises, are now starting to adopt it. So it's definitely worth checking out, especially if you're of the "let AWS worry about it" approach and don't want that really tight control and customizability yourself. What other options should we talk about next?
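The event-driven flow described for Control Tower customizations (listen for a new account, then deploy into it) can be sketched as a small Lambda-style handler in Python. This is not the actual customizations solution. The event shape below is an assumption based on how Control Tower lifecycle events are commonly documented (`CreateManagedAccount` delivered through EventBridge), and the returned "deploy-baseline" action is a placeholder.

```python
def extract_new_account(event):
    """Return (account_id, ou_name) for a successful account creation, else None."""
    if event.get("source") != "aws.controltower":
        return None
    detail = event.get("detail", {})
    if detail.get("eventName") != "CreateManagedAccount":
        return None
    status = detail.get("serviceEventDetails", {}).get("createManagedAccountStatus", {})
    if status.get("state") != "SUCCEEDED":
        return None
    return (
        status["account"]["accountId"],
        status["organizationalUnit"]["organizationalUnitName"],
    )


def handler(event, context):
    """Lambda-style entry point: decide whether a baseline deployment is needed."""
    new_account = extract_new_account(event)
    if new_account is None:
        return {"action": "ignored"}
    account_id, ou_name = new_account
    # Placeholder: a real pipeline would start deploying templates here.
    return {"action": "deploy-baseline", "accountId": account_id, "ou": ou_name}


# Hypothetical event, trimmed to the fields this sketch reads.
sample_event = {
    "source": "aws.controltower",
    "detail": {
        "eventName": "CreateManagedAccount",
        "serviceEventDetails": {
            "createManagedAccountStatus": {
                "state": "SUCCEEDED",
                "account": {"accountName": "team-a-dev", "accountId": "333333333333"},
                "organizationalUnit": {"organizationalUnitName": "Sandbox"},
            }
        },
    },
}

result = handler(sample_event, None)
print(result)
```

Even in this tiny form you can see where the troubleshooting worry comes from: each hop (event rule, function, pipeline) is a separate place where a deployment can silently stop.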

Luciano: Let's talk about OrgFormation. It is definitely quite different from Control Tower. The main thing is that it's a totally open source project. So it's a community effort. And the good news about that is that if anything feels like magic, you can just go on GitHub, check out the code and try to figure out exactly what's happening when the magic behavior is presented. It focuses on simple extensions. So it's basically trying to enhance the capabilities of CloudFormation, and that makes it a little bit simpler and more intuitive in a way, because if you know CloudFormation, you can understand what's missing and what OrgFormation is giving you on top of CloudFormation. And because it supports CloudFormation, the idea is that they keep extending on the idea of using YAML files. So it's all infrastructure as code. You have a special CLI that you need to install from the repository. And at that point, you can use the CLI to give you kind of an initial structure: you can import all the existing AWS accounts that you might already have, and it generates all the YAML files that contain the definition. And that could be a starting point for you. For instance, if you have SCPs, it's going to import all of them into this definition. And again, this is a great thing because at that point you can put all of this stuff in a repository and then you can manage changes to your account structure using source control. When you want to deploy changes, it's basically running CloudFormation, but it needs to orchestrate the execution of different CloudFormation stacks across multiple regions and accounts. And it can do that in parallel as well.

It is interesting there because you might be wondering, okay, that's a fairly complicated bit to execute correctly. And if there is an issue, or in order to understand what's actually changing or not, it needs to keep the state somewhere. And if you have been using Terraform, that concept should be very clear to you. How do you manage the state of changes? And it turns out that OrgFormation uses the same idea, so it can manage its own state and it stores it in an S3 bucket. So you can actually check how it is stored and how it keeps track of all the changes. You can use it to manage the organization and accounts, but you can also deploy stacks to multiple accounts. So in that way, it's similar to Control Tower because if you want to provision a new account and deploy a set of resources, you can easily do that as well.

And the way you do that is by just defining CloudFormation code. So if you are familiar with CloudFormation, that shouldn't be anything surprising. You should be able to use it and learn how to do that very, very quickly. There have been some interesting developments. For instance, you can use CDK and Terraform as well, if that's something that you would prefer to use. So this is actually a nice feature because it still gives you all that kind of orchestration, but then you can pick the tool of your choice to write the infrastructure as code for the things you want to provision in every account. And another very cool feature, which is probably a little bit outside the scope, but still very closely related to the project, is that there are plenty of custom CloudFormation resources that you can use to basically fill the gaps where CloudFormation is lacking a little bit. For instance, SSO assignment groups, service quotas, and much more. There is a repository, we will have the link in the show notes, that you can check out to see all the additional resources that the project is providing to you to make this whole experience even more powerful. This is powerful. There is one big limitation. I have to say I really like it, but we have to be fair and mention the limitation as well. It is not simple to do a diff. For instance, when you have made a number of changes in your infrastructure as code definition, before you deploy you want some kind of reassurance that the changes you are trying to apply are actually the ones that you want, so you want to see the effect before applying it. That feature is missing, so if you're doing something serious, it's not as simple as doing a Terraform plan or a CDK diff, if you have used these other tools. That feature is simply lacking, so you just run it and hope that everything goes well. Maybe it's something that can be fixed at some point in the future, but right now it's a pretty important missing feature that is worth mentioning. There are some very good examples which you can use as a starting point to create your own landing zone structure, and we will have a link to the specific part of the repository that has the examples in the show notes. That's pretty much it. Should we talk about Terraform?
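To show what a pre-deployment diff buys you, here is a deliberately simple Python sketch that compares two resource definitions keyed by logical ID. It is not an OrgFormation feature and not the workaround mentioned by the hosts. The function name and the sample resources are hypothetical, and a real diff would also need to resolve templates, parameters, and per-account targets.

```python
def diff_resources(current, desired):
    """Summarize which logical IDs would be added, removed, or changed."""
    current_ids, desired_ids = set(current), set(desired)
    return {
        "added": sorted(desired_ids - current_ids),
        "removed": sorted(current_ids - desired_ids),
        "changed": sorted(
            key for key in current_ids & desired_ids if current[key] != desired[key]
        ),
    }


# Hypothetical "deployed" and "about to be deployed" definitions.
deployed = {
    "AuditBucket": {"Type": "AWS::S3::Bucket", "Versioning": "Enabled"},
    "OldTopic": {"Type": "AWS::SNS::Topic"},
    "BudgetAlarm": {"Type": "AWS::Budgets::Budget", "Limit": 100},
}
proposed = {
    "AuditBucket": {"Type": "AWS::S3::Bucket", "Versioning": "Enabled"},
    "BudgetAlarm": {"Type": "AWS::Budgets::Budget", "Limit": 250},
    "EventBus": {"Type": "AWS::Events::EventBus"},
}

summary = diff_resources(deployed, proposed)
print(summary)
```

Even a summary this coarse is enough to catch the scary case before running a deployment: an entry under `removed` that nobody meant to delete.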

Eoin: Terraform is one thing that's going to give you a good diff, which is probably a marked difference to OrgFormation. Although, I suppose, with OrgFormation, we have had the ability here at fourTheorem to work around it by just implementing our own diff on top of it, but it would really be nice to have a proper implementation. Now, you might not be excited by Control Tower or OrgFormation because of these disadvantages: they both have limitations when it comes to either feature set or the level of control and visibility you get. Terraform is a lot more mature than either of those solutions, I'd say. Now, it's not necessarily designed for multi-account deployment, especially across a large number of accounts, but it still has a lot of distinct advantages. You can provision AWS organizations and accounts just as Terraform resources, and there are a lot of great community modules that make this whole setup easy as well.

You can also create your own modules for resources that you want to be deployed across multiple accounts to make it easy for teams to get onboarded quickly. I think the whole idea of using Terraform for this has improved with Terraform Cloud as well, which is just a really nice managed solution to manage your projects and your workspaces, and it integrates very well with GitHub and AWS. Previously, it was a little bit difficult when you had to manage your state yourself with S3 and DynamoDB. Terraform Cloud takes care of that and makes it a much more robust solution. You can also provision non-AWS resources, another major advantage. So if you wanted to think about deploying Azure resources or GitHub repositories or even Terraform Cloud workspaces themselves, you can do that with Terraform modules and do it in the same projects as your AWS resources.

When you do this, you get very nice deployment controls, so you can have really nice GitOps workflows, and when you have pull requests with changes to your infrastructure and your organization, you'll get a very nice Terraform plan. It'll integrate well into your pull requests. You get really good visibility of it in Terraform Cloud, and you can put manual approvals in place. So I think if you're a mature organization already familiar with Terraform, this is going to really appeal to you. I think the only real disadvantage with Terraform is that it's not really that easy, or at least I don't know of a good way, to have a dynamic number of Terraform providers. And when you're deploying to multiple AWS accounts, which is essentially what we're talking about here, you pretty much have to declare a Terraform provider for each account.

So it doesn't really have a seamless kind of Account Factory way, or a "for every account, deploy the same stack" concept like you do with Control Tower and OrgFormation. The more idiomatic approach in Terraform is basically to copy-paste the boilerplate at the start that says, okay, here's my entry point, my main for a new account. And then within that, you just use modules to compose everything else that goes underneath it. So you don't have to have a massive copy-paste everywhere, but you do have to kind of copy-paste the entry point. And once you do that, then you can integrate it seamlessly into your deployment workflow and get your diffs. You can get a Terraform plan, which is a really nice feature, every time you have a pull request. And then when you merge it, you can have your approval workflow. I'm not an expert in this area at all, but luckily we have a colleague, Conor Maher, who has done a huge amount of work in this space. And he's also provided a nice demo repository showcasing a really nice, mature landing zone setup with Terraform Cloud, AWS and GitHub, and it's well documented as well. So we'll have that link to Conor's Terraform demo in the show notes.
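The "one provider per account, copy-paste the entry point" pattern can be illustrated with a small Python sketch that renders that boilerplate from a template. This is only one way to picture it, not the approach used in the demo repository mentioned above. The module path `./modules/account-baseline`, the default region, and the helper name are assumptions. `OrganizationAccountAccessRole` is the role AWS Organizations normally creates in member accounts, but your setup may use a different one.

```python
from string import Template

# Terraform entry point for one account: an aliased provider plus one module call.
ENTRY_POINT = Template('''\
provider "aws" {
  alias  = "$alias"
  region = "$region"

  assume_role {
    role_arn = "arn:aws:iam::$account_id:role/$role_name"
  }
}

module "baseline_$alias" {
  source = "./modules/account-baseline" # hypothetical shared module
  providers = {
    aws = aws.$alias
  }
}
''')


def render_entry_point(name, account_id, region="eu-west-1",
                       role_name="OrganizationAccountAccessRole"):
    """Render the per-account boilerplate; hyphens become underscores in the alias."""
    alias = name.replace("-", "_")
    return ENTRY_POINT.substitute(
        alias=alias, region=region, account_id=account_id, role_name=role_name
    )


# Hypothetical account name and ID.
rendered = render_entry_point("team-a-dev", "333333333333")
print(rendered)
```

The repeated part stays small (a provider and a module call), while everything that actually defines the account baseline lives once in the shared module.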

Luciano: Yeah, let's close this episode with some honorable mentions. One is CDK. And there is a specific example, which again we will have in the show notes, that shows how to do landing zones using CDK. It's not necessarily the best solution for ops teams or anyone who prefers a more declarative approach, because with CDK you use a programming language, you instantiate classes that represent resources, and then you combine them together. So it might be a little bit different from what you're used to if you do declarative stuff, but it's still very powerful and it's still a very good dynamic way of provisioning all the different accounts in your landing zone structure. And you can make it very modular by using this idea of constructs, which are somewhat similar to Terraform modules, but again more in the spirit of a programming language, where you can import a library and that gives you a class that you can just instantiate and that represents an entire stack, where you can apply certain customizations very easily. So CDK is an option and you might want to consider it as well, especially if you have CDK experience. The other one is AWS Control Tower Account Factory for Terraform, also called AFT for short, and it's a way of using Terraform to customize accounts instead of CloudFormation. Once you have a Control Tower landing zone already set up, you can enable that and then use Terraform if that's something that's more familiar to you.

We haven't tried it yet; we've just heard of people using it and being relatively happy with it, so check it out. It might be worth experimenting with, especially if you like this mix of features that come from Control Tower, but you also prefer Terraform as a language for writing resource definitions. Now to wrap things up, I'm gonna try to do a quick summary. I think what we mentioned today is that the best option really depends on your context. I know it's a bit of a cliche answer, but it really depends on the level of expertise in your company, the kind of tools that you might have used already, and whether you prefer a specific approach. Maybe you prefer to go through the UI, and then Control Tower might be a little bit more friendly to you.

If you prefer to do infrastructure as code, then maybe Terraform or OrgFormation are a little bit more ideal in that sense, so definitely try to weigh all the pros and cons of the different approaches and pick the one that might be most suitable for your organization. I think if Control Tower had some way of supporting infrastructure as code, it would come out much stronger in this comparison, at least in our view, but it's still a good tool if you don't really care too much about infrastructure as code. We will have more links in the show notes. We found some additional deep dives and material that you might want to check out to understand even more about this topic. And again, there are probably many solutions out there. Let us know what works for you: if you liked a specific tool, why, or if you didn't like some of them, what the issue is or what is missing. I think if we can have a healthy conversation about these tools, especially the open source ones, chances are that we can get the features that we are looking for, and we can even contribute to make these features happen. Now, one last thing that I have to say is that we are approaching 100 videos and almost 2,000 subscribers on YouTube, so please help us reach that milestone by subscribing, and if you can, please share this podcast with your colleagues and friends. We would greatly appreciate that. So thank you very much, and we'll see you in the next episode.