AWS Bites Podcast

40. What do you need to know about IAM?

Published 2022-06-10 - Listen on your favourite podcast player

Identity and Access Management, also known as IAM, can be an intimidating service when getting started with AWS. But IAM is also one of those core services that you can’t really avoid. In this episode we try to distill down everything that you need to know to understand IAM and start to use it proficiently. We cover what IAM is, why it is so important, how authentication and authorization work, what policy documents are and how to write them, how a user or an application gets credentials to interact with AWS and finally many examples, tips and tricks.

In this episode we mentioned the following resources:

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Eoin: Identity and access management, also known as IAM, is one of those services that you cannot escape when learning AWS. It can be a bit intimidating to learn it at first. So today we are going to try to distill down what you really need to know about IAM. And by the end of this episode, you will know what IAM is and why it's so important, how authentication and authorization work with IAM, how policies work, how users and applications can get credentials, and a few general interesting tips and tricks. My name is Eoin, I'm joined by Luciano, and this is the AWS Bites Podcast. So this is a video for starters, but also for those who are struggling, like a lot of people, I feel, with making sense of the various concepts of IAM. How do we start with this one? What is IAM? How would you just explain it to a beginner?

Luciano: Yeah, I guess the simplest way to describe IAM is just to define the acronym. So it means Identity and Access Management, which basically tells us that it's a service in AWS that allows us to define identities and access to different resources. Another way of saying it: it's basically a way to apply fine-grained permissions to different types of services and resources. And it is very important because it's not an isolated service. It really has ramifications in everything you do with AWS. So you really need to understand how it works because, probably in your day-to-day work with AWS, you need to define permissions. You need to understand the language behind those permissions, the policy language, and how to read and write different policies. So all of this means that you basically need to be very familiar with IAM to be proficient with AWS. So, yeah, I guess maybe we can explain next how it works at a very high level. What do you think?

Eoin: Yeah, let's try that. So the main thing with IAM is that you're trying to grant access to resources for specific principals of some sort. So a principal is going to be a user or a service. So maybe you can think about the formula as being made up of who: what's the user or the service who's trying to access something, and what level of access are you giving them? So what are the actions that you want this user or service to be able to perform?

And what are the resources associated with this action? So who are you giving access to? What can they access and what actions can they perform? So if we break that down a little bit, right, we're talking about resources. So what is a resource? AWS has hundreds of services and each of those services generally allows you to create resources. So an S3 bucket is a resource, a Lambda function is a resource, an API Gateway API is a resource.

They have other things like a load balancer, an EC2 instance or a CloudWatch log group. Everything is a resource, right? And you can define access to those resources, or even to specific subsets of those resources in some cases, actually. Then we can talk about the principal, right? So the principal, as we mentioned, could be a user, but it could also be an AWS service itself. So people may be familiar with an IAM user. An IAM user is one of the older kinds of principal there. So you can create users in IAM and use that essentially as a way for people to log on to AWS and do stuff. And you can log on with a username and password on the console, or you can use keys. What about services? Can you think of some good examples that would be illustrative of what a non-human interaction with IAM would be?

Luciano: Yeah, for instance, if you are creating an application that runs on EC2, that application probably needs to access other resources in AWS: maybe it needs to send a message to an SQS queue, maybe it needs to write files to S3. So that application will somehow need to be authorized to perform those actions. And yeah, it's important to understand that in AWS, everything is blocked by default. So if we don't explicitly say that that application is authorized to perform those actions on those resources, you will get a permission error when running the application in an AWS environment. So that's, I think, a good example and it applies to every compute layer. Even Lambda or ECS would need to have some sort of policy applied to them that grants them permission to perform certain actions against AWS APIs and resources.
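To make "blocked by default" concrete, here is a minimal Python sketch of what that failure looks like from the application's point of view, assuming boto3 is configured with credentials that have no SQS permissions; the queue URL and account ID are invented for illustration.

import boto3
from botocore.exceptions import ClientError

sqs = boto3.client("sqs")

try:
    # Without a policy that allows sqs:SendMessage on this queue,
    # AWS rejects the call even though the queue may exist.
    sqs.send_message(
        QueueUrl="https://sqs.eu-west-1.amazonaws.com/123456789012/orders",
        MessageBody="hello",
    )
except ClientError as error:
    # Typically surfaces as an AccessDenied error code.
    print(error.response["Error"]["Code"])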

Eoin: Okay, that's interesting that you mentioned that everything is, you don't have access by default, you're denied by default. So it becomes a little bit challenging, I guess; you have to understand how to grant access to this one specific thing for the specific principal, under specific conditions, maybe. And that can be hard to get right and probably explains why a lot of people like to use wildcards for everything and grant access, because it's just easier. But of course, maybe something people can take from this episode is that it is possible to be specific. It's just a question of understanding how. So where do we start? How do you define those relationships between users, actions and resources?

Luciano: Yeah, I think the main concept to explore is this concept of policy. We already mentioned it a few times. So what does it mean? A policy is basically a document that is generally written in JSON format. And this document contains like a description of a particular permission or set of permissions. So let's try to describe what are the main parts of a policy. Generally a policy contains one or more statements.

So in a policy, you could have multiple statements. So imagine that a policy starts with an array of things, in a way, right? And this array is literally where you try to define those relationships between the different concepts that we defined so far. So there are going to be resources, actions and other things. So let's go in order and describe all of them. Generally the first thing that you see is something called effect.

So there is a property inside a statement called effect and the value of this property can be either allow or deny. Now this is interesting because most of the time you will need to write policies with an allow effect because you are trying to allow access to something. Rarely will you need to write explicit denies for specific actions. So I don't want to go too much into detail right now, but imagine that you need to explicitly say, I want to allow something to happen or I want to deny something to happen.

Most of the time you're just gonna write allow effects. Then we have another section called action, and action is actually interesting because it can be either just a string or an array of strings. And the idea is that you can put together multiple actions in a single statement. And an action is a string that identifies a specific action that you could perform against a set of resources. And they are generally namespaced, for instance by service, which I think is most commonly the case.

An example can be s3:CreateBucket. And an interesting thing is that you can specify wildcards there. So for instance, you could say s3 colon asterisk (s3:*), which means any action available in the S3 service; I am, for instance, allowing it if the effect is allow. You can also use wildcards more specifically, like you can narrow down a specific subset of S3 actions. For instance, you could say s3:Create*, which basically means all the action names that start with s3:Create, like s3:CreateBucket or s3:CreateAccessPoint, or I don't even know if there are others.

But yeah, this is generally common, for instance, to try to distinguish read and write actions. Sometimes you can classify all the read actions with a prefix and then an asterisk, and similarly with the write actions. It can be a little bit dangerous because sometimes you end up allowing more permissions than you really want to. So just a word of caution: try to use asterisks as little as possible, unless you are really sure that you are describing the very limited subset of actions that you are interested in.

And then the last interesting part of a policy is resource. And again, resource is an array or a string. So you can have either one single string or multiple strings, because you can group together multiple resources. How do you define a resource? Like what is the content of the string that identifies a resource? You generally do that using an ARN, which means Amazon Resource Name. And you can imagine that as a unique ID for every single resource that exists in an AWS account.

The interesting thing is that these unique IDs are not pseudo-random values, like, I don't know, auto-incremented integers or UUIDs, but they actually have a very well-defined structure. So they follow a namespace, and then you can have an account, a region and so on. So basically it's like a tree, and you can also use wildcards at some point to say, take the whole subtree there. And the common example is when you want to allow, for instance, read access to a set of files in S3: you can do that either by prefix, because you can say the ARN of the bucket, slash a certain prefix and then asterisk, or, if you want to give access to all the files in the bucket, you just say the ARN of the bucket slash asterisk. So it's actually a really interesting approach from AWS to give you these ARNs in a way that you can easily define expressions to describe an entire subset of resources. And then I think there is a more advanced use case about conditions. Do you want to talk about that, Eoin?
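Putting those pieces together, a minimal sketch of a policy document might look like the following, written here as a Python dictionary and created with boto3; the policy name, bucket name and prefix are invented for illustration.

import json
import boto3

# A single-statement policy: allow read-style S3 actions on one prefix of one bucket.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",           # the bucket itself (for ListBucket)
                "arn:aws:s3:::example-bucket/reports/*"  # every object under the reports/ prefix
            ],
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="read-reports-prefix",
    PolicyDocument=json.dumps(policy_document),
)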

Eoin: Yeah, that's probably worth talking about because I think conditions are becoming more and more common. They were less so in the past, but they're becoming more expressive. So conditions allow you to have more fine-grained access for specific resources. So you can grant access to S3 to create a bucket, for example, but you can also restrict it to certain conditions. For example, you might only want to have a condition which allows people to create objects in S3, but only if they have a .luciano extension.

That's one example. There are also other ones, like based on a tag in the request or a tag on the actual principal; you can kind of use it to enforce access control based on the calling principal. And every specific service has a different set of conditions that it supports, as well as global conditions that are supported across services. So other examples are: if the source IP address is within a certain CIDR block, or you're coming from a certain VPC. There are lots and lots of different conditions supported and that's really growing, like I said. So one of the things we can link to in the show notes is a document I really, really like and have bookmarked, which outlines all the actions, resources, and condition keys that are supported for all the different AWS services. So it's definitely a go-to when you're trying to develop policies.
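As a rough sketch of what a Condition block looks like (the bucket, prefix and CIDR range are all invented), here is a single statement that only allows uploads when the request comes from a specific IP range:

# One statement with a Condition: allow s3:PutObject only from a given IP range.
# All values below are purely illustrative.
statement_with_condition = {
    "Effect": "Allow",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::example-bucket/uploads/*",
    "Condition": {
        "IpAddress": {"aws:SourceIp": "203.0.113.0/24"}
    },
}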

Luciano: It can get very complicated, but you can be extremely specific.

Eoin: Yeah, and it sometimes feels like a real burden for people having to do this, but it's really powerful as well because it allows you to enforce really good security. Mm-hmm. Where do people start then as a user when they're trying to get up and running with AWS, maybe in their own account or in an account they've been given in their organization?

Luciano: Yeah, that's a very good question. So I suppose as a user, if you're just starting to learn AWS, it's probably okay to have an admin role and just use that to log into the web console and play around and try to create different things, destroy them and see what happens. But of course, that's not acceptable for a production account. Like if you have a production account where you are deploying production applications, try to limit as much as possible write access for humans.

Limit that to read access, because of course you need people to be able to log in and check if everything is working correctly, read the logs and all these kinds of operational tasks, but try to limit the amount of writes that everyone can do. And a much better approach in that case would be to have an account that is dedicated to deployment into production accounts. You provision CI/CD pipelines in those accounts and those pipelines will do all the changes in the production account for you.

And the idea is that you reduce as much as possible uncontrolled write access from people and you try to do that in a programmatic and auditable way. So you try to have processes that will do that with a very well-defined, I don't know, pipeline or set of steps, and all the steps will give you the audit trails that you can use in the future to make sure that everything is happening as expected. This is kind of the best practice. So don't be afraid to just use an admin user for your own testing stuff, play around and try to learn as much as possible. But when you start to move to production, then having an admin account there is kind of a big no-no. So try to be careful at that point. And I think we can approach the same topic for applications as well. So are there, I don't know, best practices? How do you start to define permissions for applications instead?

Eoin: Yeah, ideally you want anything that's important in your system not to be triggered by a user, but to be automatically triggered by some code running in EC2, Lambda or a container maybe. So the idea there is that you will create a role, an IAM role and you attach policies to it. Same kind of policies we've already described. So you're giving permissions to that programmatic piece of code running there to write to the bucket or to put a message on the queue.

And you try to be restrictive, and you would then associate that role with the resource in some way. So with EC2, you have an instance role, or in Lambda you have an execution role, and lots of AWS services have a place where you can link the role that that service is going to use. And when you create that role, you have to say, I'm going to trust EC2 to assume this role. And the idea then is that you try to keep that as minimal as possible, because if access to that compute environment is ever compromised, you want to minimize the blast radius, minimize what people can do. So there are a couple of other things there. Maybe it's worth mentioning the IAM Access Analyzer tool that you can use: it can analyze your roles for you and let you know if you've over-assigned privileges that aren't being used. That's a really good one. Is there anything that people should be aware of here when they're developing code, creating a role for it? Some people are probably used to running that code also locally. When they run it locally, they're not necessarily using the same privileges. What kind of practice would you recommend to make sure that people have a good understanding of what to watch out for there?
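As a hedged sketch of what that looks like in practice (the role name, queue ARN and account ID are invented), creating an execution role for Lambda with boto3 involves a trust policy that says which service may assume the role, plus a permissions policy attached afterwards:

import json
import boto3

iam = boto3.client("iam")

# Trust policy: only the Lambda service is allowed to assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.create_role(
    RoleName="order-processor-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach an inline policy granting only what the function actually needs.
iam.put_role_policy(
    RoleName="order-processor-role",
    PolicyName="send-to-orders-queue",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sqs:SendMessage",
            "Resource": "arn:aws:sqs:eu-west-1:123456789012:orders",
        }],
    }),
)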

Luciano: Yeah, that's a very good one. I think we'll explain a little bit more later in this episode, how the whole credential system works and maybe at that point, more gaps will be filled and everything will be more clear. But basically one thing that happens when you are testing something locally, when you're running, for instance, a Lambda function somehow locally, is that generally you have authenticated yourself through the AWS CLI using your own personal credentials.

So as a user, you probably have, in a development account, admin credentials or a very extensive set of permissions anyway. So you are basically running your local environment with those credentials. So what can happen is that you have a false sense of security, because you see your application being able to do a bunch of different things and you think, okay, this application is ready to go and be shipped somewhere.

Then you deploy that application to some environment and as soon as you run it, you bump into a permission issue. And there you have to realize that there is a disconnect between when you run things locally and that local environment inherits your own personal permissions as opposed to when you run the same application in an AWS environment. And in that environment, the environment is not necessarily inheriting your own permissions.

It's probably gonna have a much more restrictive set of permissions, maybe ones that you haven't even defined yet. So the permission set should literally be as small as possible for that particular application. So yeah, it's important to understand how to go and define those application permissions and make sure that they actually work for your production use cases. I think you mentioned to me a few days ago that on Twitter somebody was suggesting that it might be an interesting feature for tools like SAM to be able to actually run a local simulation with real credentials, the ones you are specifying for the function, because that's the tricky bit. Even if you are specifying a role, for instance, when you use SAM for a particular Lambda function, when you run that function locally it's still using your own personal credentials and not the role that you have defined. So even if you define the role, that experience can be very misleading. You might think it's using the role that you believe you defined correctly, then you ship to production and there is an error that you haven't caught locally. You only realize it in production. So that's another thing that might bite you, especially at the beginning. And it's a little bit tricky to realize what's actually going on there.
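One possible workaround, sketched here with invented names, is to assume the function's role yourself before testing locally, so your local code runs with the role's permissions instead of your own admin credentials; this assumes the role's trust policy actually allows your user to assume it.

import boto3

sts = boto3.client("sts")

# Assume the same role that the deployed function uses (name is illustrative).
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/order-processor-role",
    RoleSessionName="local-testing",
)

creds = assumed["Credentials"]

# Build a session that only has the role's permissions, not your admin ones.
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# Any client created from this session is limited to what the role allows.
sqs = session.client("sqs")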

Eoin: Yeah, that's a good point. I think we've talked about policies being attached to roles. It might be worth just covering briefly where policies fit in with the various other pieces that they can be attached to. So we said we can attach them to IAM users and also IAM user groups. Now, a lot of people say don't use IAM users and groups anymore. And we did cover this a little bit in episode 12, when we talked about how to manage your AWS credentials, link will be below.

And the alternative to that is to use AWS single sign-on. We're not going to go into that in much detail now, again, refer to that previous episode, but you can also attach your policy statements, put them into these SSO permission sets, and they kind of get converted into IAM roles under the hood. But you also have specific services that can have policies attached. So these are sometimes called resource policies, and an example of a resource policy is an S3 bucket policy.

And that allows you to have specific policies that are attached to the resource itself. And that policy gets combined with all the other policies that come into play, like your identity policy. And the two policies together will work to determine the permissions. We've already covered then the EC2 case and the ECS case, where applications will generally assume a role and will inherit policies and permissions from that role. So maybe we can briefly cover the interesting technical details about how AWS recognizes who you are, what your principal is, whether you're using a user or an EC2 instance, takes what it knows about you, and converts that into some sort of allow or deny for every API call or SDK call you make.
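As an example of a resource policy (the account ID and bucket name are invented), an S3 bucket policy is attached to the bucket itself rather than to a principal; here is a sketch applied with boto3:

import json
import boto3

s3 = boto3.client("s3")

# A resource policy attached to the bucket: allow another account's role to read objects.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::210987654321:role/reporting"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(bucket_policy))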

Luciano: Yeah, I think that's a very good way of understanding a little bit more how things work and therefore making the right assumptions when you are building an application for AWS. So there are generally two phases. The first phase is authentication and the second phase is authorization. Authentication is where AWS needs to understand who you are, as a person or as a service. And then authorization is based on the principal that was identified during the authentication phase.

Is this principal authorized to perform a specific action on a certain resource? And in a certain context, if you also take into account the conditions that you can have in a policy, right? So all these rules need to be evaluated, but the rules are evaluated only after it is clear to AWS who the user actually is. So it's very important to distinguish these two phases. Now, how do you identify who the user is?

It changes a little bit depending on whether you are logging in on the web console, so a human user who is in the web console doing things, or whether you're doing something programmatically through either the CLI or the SDK, which both use the AWS APIs underneath. So when you are logging in manually, you generally provide your credentials through username and password. Or, if you use federated login, you probably go through a federated login flow that involves, I don't know, your Google identity provider or Azure AD, whatever.

Instead, when you're doing something programmatically, you have probably seen that there is this concept of keys to identify credentials. And keys are generally a pair: an access key ID and a secret access key. And this is generally what you need to use when you authenticate your local CLI, but also what you could use when you do API calls or use the SDK.

Now, it's very interesting. There is something I really like, and maybe I'm a little bit nerdy about this stuff, about the authentication protocol that is actually used underneath when calling APIs on AWS, because it's a signature-based protocol. And I really like this kind of protocol. The idea is that in most applications you write, when you need to have some sort of authentication, there is a flow that is like, okay, I'm gonna send my credentials somewhere.

I'm gonna get a session token or a cookie or whatever. And then I keep using that session ID to communicate with a backend, right? And that's a little bit annoying because you have these two distinct phases. I need to get my authentication first with my token, and then with the token, I can do whatever I want to do. In AWS, you don't need to do all of that because you already have your secret access key and your key ID.

And the way that it works is basically that every single request is signed using your secret. So you never really send the secret to AWS. You only use it to sign every single request. So this way you can just fire the request straight away without having to create a session first. And this is documented in SigV4, which stands for the Signature Version 4 protocol. So it's actually well-documented and you can read it.

You can even implement it yourself if you really want to. But I just find it very interesting because it's something that I always liked from AWS. And sometimes I have even implemented it myself in my own APIs with my own authentication. Not necessarily 100% of it, but you can take some principles from this approach and use them yourself. So we're gonna have a link to the protocol in the show notes. But there is another interesting angle, and maybe, Eoin, you want to talk a little bit more about that: if you are an application, you shouldn't just copy-paste those credentials into the application. So how do you do it in that case?
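For the curious, here is a rough sketch of the signing flow using botocore's own SigV4 implementation (the region and endpoint are just examples): the secret key is only used locally to compute a signature that goes into the Authorization header, and is never sent over the wire.

from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.session import Session

# Load whatever credentials the default provider chain finds
# (environment variables, credentials file, metadata service, ...).
credentials = Session().get_credentials()

# Build a plain HTTP request for an AWS API (STS GetCallerIdentity here).
request = AWSRequest(
    method="POST",
    url="https://sts.eu-west-1.amazonaws.com/",
    data="Action=GetCallerIdentity&Version=2011-06-15",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# Sign the request: the secret key only produces the signature locally;
# the signature, not the secret, travels in the Authorization header.
SigV4Auth(credentials, "sts", "eu-west-1").add_auth(request)

print(request.headers["Authorization"])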

Eoin: Yeah, that's a good point. You want to avoid generally copying, pasting credentials everywhere. So there's lots of tools you can use and you can look into it more to avoid having to copy paste credentials, particularly with SSO. But if you're talking about your application, right? So your EC2 instance or your ECS container, for example, you never have to provide environment variables for a process running in that EC2 instance, like your secret access key and access key ID.

You don't have to put those into environment variables. You do not want to do that. Instead, there are mechanisms already in place. We talked about the role that you can attach to your EC2 instance and the role you can attach to containers and Lambda functions. AWS already has a mechanism built in that will seamlessly under the hood, allow that instance to get authorized access to the resources in that role's policy.

So if you've got an instance role attached to your EC2 instance, AWS has something called the instance metadata service. There are actually two versions of it, version one and version two. You should use version two because it's the secure one. And under the hood, if you make an SDK call to put an object in a bucket, the SDK will use that metadata service to pick up the instance role's credentials, and AWS will authorize that code, that principal, to perform the action or not.

So essentially there's a little HTTP endpoint available inside EC2 and also ECS that performs that for you under the hood. You can look into the details of that if you're curious, but otherwise it will just work. The AWS SDKs are already designed to check for credentials in a specific order, and it'll vary depending on which language you use, but generally it's looking at environment variables, then your credentials file, and then these metadata services.
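For anyone curious about what the SDK does under the hood on EC2, here is a rough sketch of the IMDSv2 flow using only the Python standard library; the instance role name is invented, and this only works from inside an instance that has a role attached.

import json
import urllib.request

IMDS = "http://169.254.169.254"

# IMDSv2: first obtain a short-lived session token with a PUT request.
token_req = urllib.request.Request(
    f"{IMDS}/latest/api/token",
    method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
)
token = urllib.request.urlopen(token_req).read().decode()

# Then fetch temporary credentials for the instance role (name is illustrative).
creds_req = urllib.request.Request(
    f"{IMDS}/latest/meta-data/iam/security-credentials/my-instance-role",
    headers={"X-aws-ec2-metadata-token": token},
)
creds = json.loads(urllib.request.urlopen(creds_req).read())

# The SDK refreshes these automatically; they include an AccessKeyId,
# SecretAccessKey, Token and an Expiration timestamp.
print(creds["Expiration"])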

So that's something to bear in mind: as always, avoid putting secrets in environment variables. Another thing to be aware of is STS. When you're dealing with IAM, from time to time you'll see the Security Token Service, or STS, pop up. It's used in a lot of different places, but very briefly, if you've got a federated identity, like sign-on with Google or maybe SAML sign-on with Active Directory, you can use the Security Token Service to exchange your federated identity for some AWS credentials.

And that's gonna be really useful. It's used in SSO, but it's also used in Cognito for providing access to the front end to gain direct access to services on the backend. You can also use it very effectively if you want to, if you have a user who's authenticated in one account, they can use it to get access to another account. So the account you're trying to get access to can grant cross account access essentially for you to assume a role in that destination account. And it means that you don't have to create users and have credentials for every account you need to work in, which would be very difficult to manage and ultimately not very secure. So STS allows you to do an assume role action. And if you're somebody who ends up working with multiple accounts, you'll end up using that explicitly or implicitly one way or the other. Like even in the AWS console, when you switch role that will use an STS assume role under the hood. So it's great because it avoids you having to create users everywhere and manage lots of credentials.
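The cross-account case uses the same assume-role mechanism as the local-testing sketch earlier; as a hedged example with invented account IDs and role name, your credentials in one account are exchanged for temporary credentials in the destination account:

import boto3

sts = boto3.client("sts")

# Your credentials live in one account; the role below lives in another
# account (222222222222) whose trust policy allows your account to assume it.
response = sts.assume_role(
    RoleArn="arn:aws:iam::222222222222:role/cross-account-admin",
    RoleSessionName="switch-role-demo",
)

creds = response["Credentials"]
remote = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# Calls made with this session run as the assumed role in the other account.
print(remote.client("sts").get_caller_identity()["Account"])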

Luciano: Yeah, absolutely. Do we have any final tips maybe before we wrap up?

Eoin: Yeah, I think at this point there are so many resources that are really powerful. But one thing to mention is that we've got lots of different types of policies. We talked about the identity-based policies. We talked about resource policies. You also have something called a permissions boundary, which is like another set of policies that can be attached to your role that restricts further what that role can do.

Organizations sometimes use this to provide more guardrails. And then you also have the service control policy, which is like the organization-level control that says: within this account or this set of accounts, you cannot do these things. It's the kind of policy where you would actually encounter deny quite often. And all of these things work together. It's generally like an intersection of all of the different policies, but there's a really good document called policy evaluation logic on the AWS docs, which we can link to.
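As an illustration of where you do see explicit denies, a service control policy often looks something like the sketch below; the region list is invented and a real guardrail would usually carve out exceptions for global services.

# An SCP-style document: deny every action outside two approved regions.
# Purely illustrative; production guardrails typically add exceptions
# for global services such as IAM, CloudFront or Route 53.
deny_outside_regions = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": ["eu-west-1", "eu-west-2"]
                }
            },
        }
    ],
}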

And it's got a nice diagram of the whole flow and how it results in an allow or deny at the end. We've got some other really useful references in the show notes. I want to call out a few videos though, finally. There's quite a good one on IAM core concepts, if you want to just explore the fundamentals there; it's on the Be a Better Dev YouTube channel. And we've also got two really good ones, which I tend to share quite frequently.

One from Becky Weiss at AWS about AWS identity, and another one from Brigid Johnson, which is a really good one, called Become an IAM Policy Master in 60 Minutes or Less. I think if you understand everything in those videos, you will truly be in the top 5% of IAM users in the world, because there are just so many useful tips in them. So thanks very much for joining us and for sharing and liking these episodes. Please send us your feedback and let us know what else you'd like to learn about security or any other topic. And we'll see you in the next episode.