Eoin: There are loads of reasons you might want to find out what resources are in your AWS account. Maybe you've got a lot of cruft you need to clean up, or you want to sort out security issues and get some compliance. How would you explore and query what actually exists in your AWS account so that you feel in control of a tidy cloud environment? Today, we're talking about tools and services that help you to answer the question, what's in my cloud? We are going to give you some practical suggestions for tools to give you visibility and queryability in AWS. I'm Eoin, I'm here with Luciano, and this is the AWS Bites podcast. AWS Bites is sponsored by fourTheorem. fourTheorem is your AWS partner for migration, architecture, and training. Find out more at fourtheorem.com, and you'll find the link in the show notes. Luciano, what are we trying to achieve when we're talking about finding out what goes on in the cloud? What are our use cases?
Luciano: Yeah, I think it's very common that you look at your account and you realize it is a bit of a mess, and there are a lot of resources there that maybe you created years ago and you don't remember anymore. It is actually very common that because AWS is such a learning journey that when you start using it, you either didn't have a lot of time to learn how to do things properly, or maybe you were just experimenting, you weren't even sure you would end up using AWS for a long time, you didn't even use infrastructure as code, so you might end up creating a lot of things manually.
And basically, what could happen is that eventually, you might have something that you actually run in some kind of production environment, but at the same time, it is living with all that crap that you accumulated over the years. So that might be one reason why you might end up with an account that is not really tidy. So you might want to figure out, okay, what's going on right now and what should I clean?
Other reasons could be maybe you just want to be sure that you are not using resources that you don't really need. So it's more of a cost optimization reason. You want to remove everything that you are not actually using, because of course, you might be paying for that. Other reasons could be more related to compliance or security, where maybe you want to make sure that you don't have resources that are not really up to date or up to the latest standards.
One example could be maybe you want to make sure you don't have Lambdas that are still running on the Node.js 14 runtime, and if you find any, you might want to update them to a more recent one. Other ones could be, again, in security terms, you might want to check your VPCs and make sure that you don't have, I don't know, VPCs that are too open, or maybe you are not even running any resource on them, or maybe you want to make sure your S3 buckets are not public and all these kinds of things.
So it could be also more specific about compliance, because maybe at some point you define a set of good standards and you want to make sure that every time something changes in the account, it is still respecting the standards that you set up for your organization. So in order to do all these things, basically you need some kind of capabilities. You need to be able to list all the resources in your account and somehow understand what are the different types of every resource, so you can kind of categorize them logically. It might be useful to be able to run queries or filtering, because you might end up with a very big list with thousands of items. That's not going to be very helpful, so you should have some kind of control on figuring out exactly what's in this list. And of course, what if you have things in multiple regions, or even worse, in multiple accounts, those are like extra layers of complexity. So ideally you need tools that can do all of these things across regions and across accounts. And I believe that we have some tools in AWS to do that, right?
Eoin: Yeah, and your choice of tooling in AWS is going to depend on whether you're doing just like an ad hoc exercise, or whether you're trying to put in place some kind of formal, continuous inspection of resources over time, and then also really meet like third-party compliance or internal corporate compliance. So if we're talking about doing it really properly with that continuous practice, then AWS Config is probably the good place to start.
So this is something that keeps track of resources and registers resource configurations. So it's called configuration items. It's kind of a standard practice in IT service management to record your configuration items, your assets in your cloud, and keep track of them. So Config supports quite a large number of services. It's growing all the time, so it's not every service. And this is something to be aware of.
I think it's about 84 services, from what I can count. And that includes support for about 240 resources. So what it does is it gives you a basic repository of all of the items in your cloud by continuously monitoring for you. And it gives you then query capability. So there's an advanced query support where you can run SQL-like queries. It's not exactly SQL, but it's a subset of SQL that you can run on most of those resources.
So again, not supporting all of the resource types that are captured, but probably most of the ones that you're interested in. And one of the nice things about AWS Config is that it will allow you to aggregate all of the configurations from all of the accounts in all of the regions you want. So we would typically set up AWS Config for all of the regions that you support. You disable the regions you don't want to support, and then you gather all of that data into one centralized aggregator within AWS Config.
And then you can run queries to find out how many EC2 instances do I have across all my organization, all accounts, all regions, and what's their configuration. So that's really nice. You'll get a picture of the configuration, and then you can see how the configuration item has changed over time. And the other thing that it's really aimed at within Config is evaluating compliance rules. So you can use a lot of the pre-canned rules that AWS will give you, and those rules can also then form part of what's called a conformance pack.
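As a rough illustration of the kind of query described here, the following Python sketch builds an AWS Config advanced-query expression for EC2 instances and, optionally, runs it against an aggregator with boto3. The function names and the aggregator name are hypothetical, and `run_query` assumes boto3 is installed and AWS credentials are configured; it also reads only the first page of results.

```python
def build_instance_query(instance_type=None):
    """Build an AWS Config advanced-query expression for EC2 instances."""
    query = (
        "SELECT accountId, awsRegion, resourceId, configuration.instanceType "
        "WHERE resourceType = 'AWS::EC2::Instance'"
    )
    if instance_type:
        # Optional filter on a specific instance type, e.g. "t3.micro"
        query += f" AND configuration.instanceType = '{instance_type}'"
    return query


def run_query(expression, aggregator_name):
    """Run the expression against a Config aggregator (first page only).

    Assumes boto3 is installed and credentials are available;
    aggregator_name is whatever you called your aggregator.
    """
    import boto3  # imported lazily so the query builder works without it

    client = boto3.client("config")
    response = client.select_aggregate_resource_config(
        Expression=expression,
        ConfigurationAggregatorName=aggregator_name,
    )
    return response["Results"]  # list of JSON strings, one per resource


print(build_instance_query("t3.micro"))
```

Calling `run_query(build_instance_query(), "my-org-aggregator")` would be one way to use it, where `my-org-aggregator` is a placeholder name.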
So that's a collection of rules all together that might tell you whether you are meeting the requirements of SOC 2 or NIST or CIS benchmarks or whatever industry standard you might need. You can also add your own rules using either Lambda functions to evaluate them or the CloudFormation Guard syntax to check if resources meet your standards, essentially. So all in all, if you want to up your compliance game, but also just do querying of resources and monitoring over time, Config is a good source.
Now, the downside with Config is always, when it comes to certain cloud deployments, pricing might hit you, because it's priced per configuration item recorded, and that's priced every time it changes. And also, if you're evaluating compliance rules, every time they're evaluated, you're charged as well. So it's like three tenths of a cent per item recorded and one tenth of a cent every time you evaluate a rule.
And a lot of the time, the pricing there is fine, but if you're somebody who deploys frequently and you're creating lots of resources and you're trying to capture rules for lots of different resource types, that can quickly escalate and you get an exponential effect there. You also might get a surprise when they add support for new resource types that they didn't support before, because your pricing, your bill will then go up, even though you didn't change anything.
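To make the pricing model concrete, here is a small back-of-the-envelope sketch in Python. It uses only the two prices quoted in the episode (three tenths of a cent per configuration item recorded, one tenth of a cent per rule evaluation); the workload numbers are invented for illustration, and current AWS pricing may differ.

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.003")        # USD per configuration item, as quoted in the episode
PRICE_PER_EVALUATION = Decimal("0.001")  # USD per rule evaluation, as quoted in the episode


def monthly_config_cost(items_recorded, rule_evaluations):
    """Estimate a monthly AWS Config bill from the two quoted unit prices."""
    return items_recorded * PRICE_PER_ITEM + rule_evaluations * PRICE_PER_EVALUATION


# Hypothetical workload: 500 resources, each changing 20 times a month,
# with 10 rules evaluated on every change.
changes = 500 * 20
evaluations = changes * 10
print(f"Estimated monthly cost: ${monthly_config_cost(changes, evaluations):.2f}")
```

Note how the rule evaluations dominate the estimate in this made-up scenario: doubling the number of rules or the deployment frequency scales the bill accordingly.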
So it's definitely worth looking at. I wouldn't dismiss it just based on pricing, but just understand the pricing model well. Now, since we're talking about pricing, we'll go to something then next in AWS, which is completely free. And this is released last year, end of last year roughly, called Resource Explorer. And it's not really for continuous monitoring, but it's more for ad hoc querying. How do I find out what's in my account right now?
How do I find that EC2 instance that I'm pretty sure I created last week? And Resource Explorer is pretty easy to set up. It's much simpler than Config, for example, and it builds an index. So it basically builds a cache that you can query and then you can query even just from the search bar at the top of the AWS console. So it's quite nice, quite easy to use, and you can aggregate all of the data from all of the regions you want into one account, or into one region.
So it makes it easy to query cross region. Unfortunately, the downside is that it does not support multiple accounts yet. So you're just running a query, you have to run your queries per account. And that's a bit of a downside, since people are using more and more organizations with lots and lots of member accounts. The number of services supported is still quite small. It's just 18 services, but within each service, it's very comprehensive in the actual number of resources it supports. Like within RDS, you can query parameter groups, for example, which is very low level of granularity resource. But it is free. So I would say, I suggest to people, definitely try that one out. It's very easy to onboard and get up to speed with. Before we talk about maybe the non AWS services, what else have we missed in AWS, Luciano?
Luciano: Yeah, there are another couple I will mention. For instance, Resource Groups and Tag Editor, which are a little bit older than the ones you mentioned. They're probably going to be eventually superseded by Config and Resource Explorer. But they are also free. So maybe also worth having a look. And I suppose they're not necessarily built for this particular intent, but they can be useful, for instance, to tag resources.
And you have different pieces of utility to actually find resources in bulk and apply tags. So maybe something that once you know what you have in your account or you start to discover, you can go to tag editor to apply tags. And we'll talk a little bit more about that later on. Another one which is actually really interesting and you should definitely rely on that one is CloudTrail. And the reason why I think it deserves some mention here is that even though it's not necessarily built for search, it basically tracks the entire history of how your resources change in a given account.
So it's kind of an audit log. And therefore you can use it in many different ways. For instance, you could go all in and use CloudTrail to build your own inventory, maybe dump all the data in something like Elasticsearch. And at that point you have a lot of querying power and you can really understand what's going on in an account. And even more interesting, CloudTrail can help you to answer the question, how did something get into my account?
Not just it is there, but also you can try to figure out the history of that particular resource. So who created it, when it was created, how many times did it change? So definitely worth considering because of that. And more often than not, you might find something through one of the other tools we mentioned. And then you might want to go to CloudTrail to actually try to understand the history of that particular resource. So CloudTrail is definitely something that is recommended to learn and to enable in every account and then start to use it heavily. Yeah, so you mentioned that there are other resources that are not AWS services. So which one of those?
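The "who created it, when, and how many times did it change" questions can be sketched in Python as follows. The sample records are invented, but they mimic the shape of the events returned by CloudTrail's `LookupEvents` API (`EventName`, `EventTime`, `Username`). The `fetch_events` helper is a hypothetical wrapper that assumes boto3 and credentials are available and reads only the first page of results; `LookupEvents` covers only recent management events.

```python
from datetime import datetime


def summarize_history(events):
    """Return (time, user, action) tuples for a resource, oldest first."""
    ordered = sorted(events, key=lambda e: e["EventTime"])
    return [
        (e["EventTime"], e.get("Username", "unknown"), e["EventName"])
        for e in ordered
    ]


def fetch_events(resource_name):
    """Look up recent CloudTrail events for a resource (first page only)."""
    import boto3  # imported lazily; not needed for the sample data below

    client = boto3.client("cloudtrail")
    response = client.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "ResourceName", "AttributeValue": resource_name}
        ]
    )
    return response["Events"]


# Invented sample data in the same shape as the API response
sample = [
    {"EventName": "PutBucketPolicy", "EventTime": datetime(2023, 3, 2, 9, 30), "Username": "bob"},
    {"EventName": "CreateBucket", "EventTime": datetime(2023, 1, 15, 14, 0), "Username": "alice"},
]

for when, who, what in summarize_history(sample):
    print(when.isoformat(), who, what)
```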
Eoin: Well, one of the areas where third-party services can really help is where you just talked about CloudTrail and how Config and Resource Explorer will give you the ability to find out what's in my account now. But you have to go to CloudTrail to find out how did it get there and what's the story behind this resource. But those are two disconnected things and you might go through, use Athena to query your CloudTrail logs to try and do some detective work to go back.
And I've kind of been there before. It's a little bit cumbersome and it takes time. But there's one commercial offering that I have been playing with recently and I managed to have a conversation with the founder of this company. So I decided I'd give it a try. And the product is called Resmo. And this is essentially a paid SaaS application and it's similar to AWS Config in its intent, but it supports a lot more than just AWS.
So it's designed to support all the cloud providers and also a lot of different SaaS applications. So the advantage, I would suppose, as always when you go to a third-party provider is that you're getting a much more consistent and usable experience, much easier to get started. You don't have to do any configuration. It's all done for you. You can start once you set up access to your AWS account, which I thought was a fairly seamless onboarding experience and not too intrusive from a security point of view either.
It will scan your resources and immediately you can start querying them with like full SQL syntax. It'll also give you lots of nice queries out of the box and compliance out of the box. This is kind of refreshing having set up Config a few times. With Config, you have to pick your compliance rules and conformance packs and turn them on and set up notifications.
With commercial offerings, in particular with Resmo, that is all out of the box. And the pricing model is different then again. So it's free for up to 3,000 resources and then you start paying per month based on the number of resources. You basically hit tiers after that. So, yeah, if you're looking for something a little bit more user-friendly, that definitely ticks all the boxes. And as I say, it also allows you to search outside of AWS and start correlating resources across different providers, which is really nice. So that's one of the commercial ones and I'm sure there's lots more. What about open source offerings? There's actually a couple of options there, right? So if you're trying to save on cost or you don't need to invest in any of these tools, what suggestions would you give, Luciano?
Luciano: Yeah, so the two that I know that are probably the ones I would recommend people to just give it a spin and see if they work well for you. One is called Steampipe, steampipe.io, which comes actually from a company called Turbot, turbot.com, which is focused on extensive like cloud security, compliance, posture management. So they probably created Steampipe because it's something that they've been doing a lot in their line of duty.
So they basically created this as an open source tool and everyone can actually use the basics of this approach, which again, it's SQL-based. So it will allow you to run kind of SQL queries on top of your account and it will give you the ability to figure out exactly what exists in your account by running these queries. There is a CLI, there isn't really a UI, but yeah, it's an open source tool. So probably you can get a lot of value, you can get started quite quickly from your own desktop and that way you can just keep going.
And then if you need something more sophisticated, you can move into some commercial offering. Very similar is CloudQuery, cloudquery.io, which is basically an application that syncs cloud resources to a Postgres database. You can even use other destinations if you want to. And then at that point you can use PostgreSQL to basically query what you have in that account. It supports other vendors, so it's not just an AWS tool.
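To give a feel for the "sync resources into a database, then query with SQL" approach, here is a self-contained Python sketch. It uses an in-memory SQLite database as a stand-in for the real destination, and the table name, columns, and rows are all hypothetical; they are not the actual Steampipe or CloudQuery schemas. The query looks for Lambda functions still on the Node.js 14 runtime, the example mentioned earlier in the episode.

```python
import sqlite3

# In-memory stand-in for an inventory database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lambda_functions "
    "(account_id TEXT, region TEXT, name TEXT, runtime TEXT)"
)
conn.executemany(
    "INSERT INTO lambda_functions VALUES (?, ?, ?, ?)",
    [
        ("111111111111", "eu-west-1", "orders-api", "nodejs14.x"),
        ("111111111111", "us-east-1", "thumbnailer", "python3.11"),
        ("222222222222", "eu-west-1", "legacy-cron", "nodejs14.x"),
    ],
)

# Find functions on an outdated runtime, across accounts and regions
outdated = conn.execute(
    "SELECT account_id, region, name FROM lambda_functions "
    "WHERE runtime = ? ORDER BY name",
    ("nodejs14.x",),
).fetchall()

for account_id, region, name in outdated:
    print(account_id, region, name)
```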
You can use it, for instance, with Vercel. I've seen it supports other cloud providers. I've seen it also supports GitHub, so you can probably query your repositories and the people in your organization that have access to those repositories. I think it also supports authentication providers like Okta and Auth0. So definitely there is a huge variety there and this can be actually useful to even understand.
For instance, if you are using these services in correlation with your AWS account, maybe you can do queries to try to understand what is the relationship there. Are you syncing data from one to the other? And is this data actually in sync? Is there something that you are missing? And it also supports compliance rules. So you can define your rules and use this tool to evaluate against those rules and see what's your current posture. Again, this is a CLI tool, so there isn't really much of a UI, but nonetheless it's something that can be useful and you can be really productive with it. Now, the next question I have is we have been talking a lot about tools that will allow you to discover exactly what do you have in your account. So you might end up having some surprises and figure out oh, there is a lot of stuff that I don't want to be there. So what do you do then?
Eoin: Yeah, we actually had this episode way back, episode 11, which was how do you move away from the management console and we'll link that in the show notes. So that gives you a bit more detail in there on how you can do that. I would say though there's kind of no silver bullet here. If you have a huge number of resources that aren't part of infrastructure as code and you need to tidy them up and clean them up, there's a bit of labor involved.
If you've got an account and you're happy to clean up everything, that makes it a lot easier for you. So there's a couple of tools there. One is called Cloud Nuke and the other one is called AWS Nuke and they kind of do what you would imagine and allow you to destroy everything in an account. But they also have support for filtering the resources you want to delete based on resource type or tag. So what you could do, you could always manually delete things one by one.
That's sometimes the best way of doing it. But it's a good idea to kind of label things first. If you're cleaning out your house, you might have one box for things to throw away, one box for things to keep and another for things to give away. So you take a similar approach here and you'd use tags to label things. So you might tag things first by project or by happy to delete and then you can kind of query that tag later, go through them and either import them into infrastructure as code or just delete them.
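The "boxes" idea can be sketched in a few lines of Python. The `cleanup` tag key, its values, and the resource records below are all hypothetical; in practice the list would come from whichever inventory tool you use.

```python
from collections import defaultdict


def triage(resources, tag_key="cleanup"):
    """Group resource ARNs into boxes by the value of a triage tag."""
    boxes = defaultdict(list)
    for resource in resources:
        # Anything without the tag goes into an "untagged" box for review
        box = resource["tags"].get(tag_key, "untagged")
        boxes[box].append(resource["arn"])
    return dict(boxes)


# Invented sample inventory
resources = [
    {"arn": "arn:aws:s3:::old-experiments", "tags": {"cleanup": "delete"}},
    {"arn": "arn:aws:s3:::company-website", "tags": {"cleanup": "keep", "project": "website"}},
    {"arn": "arn:aws:sqs:eu-west-1:111111111111:mystery-queue", "tags": {}},
]

for box, arns in triage(resources).items():
    print(box, arns)
```

The "untagged" box is the useful one here: it is the list of things nobody has made a decision about yet.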
And there's nothing there that will make that process really automated for you unless you're happy to just blanket delete everything. I'm always happy if I find an old resource, I don't know what it is, but then I look at the tags and I see something like a CloudFormation stack name, because then I can just go to CloudFormation, look at the whole application and decide if I want to delete the whole stack or not. And that's one of the beautiful things about CloudFormation is that it'll show you that it's part of an entire application and you can then destroy the whole thing together. But what if there are things you want to keep, Luciano, what would you recommend if you want to tidy everything up?
Luciano: Yeah, you already mentioned it is a good idea to start to define all these things as infrastructure as code because at that point you can store that in a repository and you can actually start to create a process to keep these things in order and evolve them over time without having to think too much about kind of a manual process that might change every single time. So it will help you to keep things more in order going forward and to actually distinguish what's really important and what's not and what's part of a specific application rather than maybe something else that you are just playing with for some time.
And of course you can do all of that manually, but it might be a lot of work depending on how many resources you have to convert into infrastructure as code. So there are some tools that can help. And to be fair, I would like to say that your mileage might vary a lot. Like these tools are not necessarily perfect. They have their own quirks. So it might be good to try them, but I still find that the manual approach is always a little bit more reliable.
It takes more time, but at least you are in charge of deciding what needs to happen, what really needs to be kept, what kind of tags are you going to apply. So definitely it might be a lot more frustrating having to do all that work manually, but I think the end result might be a little bit higher quality if you go for the manual approach. Nonetheless, feel free to try those kind of tools for automation. One is called Terraformer and the other one is called Former2.
They are slightly different in what they offer and we'll provide the links in the show notes so that you can check the set of features. And again, I can just remark that it is important that even if you decide to keep things, it's actually even more important in that case to apply correct tags because at that point you can even track the cost. So maybe you decided to keep something without really knowing how much it's going to cost you.
You can tag it and then you can start to observe cost and then maybe later on re-evaluate that decision and maybe realize, no, this is too expensive for the value that I'm getting. I'm just going to get rid of it. Or maybe you can actually realize, no, this is actually cheap enough. I can keep going and maybe I will be working on this a little bit later. And another idea could be consider moving things into different accounts if you find yourself having a lot of mixed things, which I probably do a lot with my own personal account.
I have some small kind of production projects maybe from some of my own personal automation stuff or small applications that I built. But then I mixed that up a lot with, I don't know, maybe I'm just playing with some concept, I want to try some service, and then I ended up mixing all of these resources. This is definitely not a good practice going forward, and if you find yourself doing that in corporate accounts, I think it's probably a good idea at that point to start to separate and create sub-accounts, one maybe for production applications or even multiple ones for different applications, and then keep other accounts for more kind of experimental processes where you can start to apply different policies, you can start to apply different rules to clean up resources over time, and you are not going to run the risk of accidentally deleting things that you're actually using in production just because you are trying to experiment with some new service or some new tool that maybe is going to give you some kind of value.
Eoin: Yeah, that's really good advice and I'm really interested to hear, did we miss anything, are there tools out there that could make this a little bit easier, especially the cleanup part, are there other tools that people use out there for querying their cloud and finding out what's going on under the hood? Thanks very much for listening, let us know your suggestions in the comments and we'll see you in the next episode.