Help us to make this transcription better! If you find an error, please
submit a PR with your corrections.
spk_0: If you're thinking of using an external CI/CD tool to deploy to AWS, you're probably wondering how to securely connect your pipelines to an AWS account. You could create a user for your CI/CD tool of choice and copy some hard-coded credentials into it. But let's face it, that doesn't really feel like the right way, or at least not the most secure approach. In the previous episode, we discussed how AWS and GitHub solved this problem by using OIDC identity providers, and this seems to be a much better approach to this particular problem, or at least a much more secure one. My name is Luciano, and today I'm joined by Eoin. In this episode of AWS Bites, we'll try to demystify the secrets of OIDC identity providers and understand how they really work under the hood. Let's start by summarizing that use case again. We have some process running outside of AWS.
spk_1: For instance, we have a pipeline running on GitHub Actions. This process needs to interact with resources on AWS, like making API calls to AWS to create resources, so it needs some sort of authentication. A classic way of doing that would be to create a user in IAM, attach a policy with the right permissions, and then generate an access key and secret access key, and those are long-lived credentials. At this point, you could put those credentials into your pipeline. A lot of people might have done this before. You put them in some secret store, like the GitHub Actions secret store, and use those long-lived credentials to interact with AWS with the CLI or one of the SDKs. Now, the problem with that approach, as we may know by now, is that long-lived credentials might easily be leaked, and then it's very hard to detect that and protect against that sort of attack. A leak can allow an attacker to impersonate your pipeline and execute malicious code. Of course, pipelines tend to have very extensive permissions, because they have to be able to create and delete important resources in your account and update your code, so this is a really dangerous scenario. So the better alternative here is to use an OpenID Connect identity provider, or more specifically, configuring AWS to trust GitHub as an identity provider using the OIDC protocol. Luciano, do you feel like you could describe how OIDC works in broad strokes?
spk_0: Yeah, I'll try my best to do that, but before I do that, there is a link that we are going to have in the show notes, which is actually the GitHub documentation that explains really well how all of that works, with illustrations. So if everything we say today is not 100% clear, we really recommend checking out that particular article. So the first thing to clarify is that there are two main entities here, AWS itself and GitHub, and we need to figure out how to make them talk to each other and how to create this kind of trust relationship. In the OpenID Connect lingo, there are two pieces of terminology, identity provider and service provider. In this case, GitHub is the identity provider and AWS is the service provider. So GitHub is kind of the one providing users, while AWS is the one providing a specific service to the user. And this is where I was a little bit confused at first, because I don't think this is the most intuitive use case to understand this difference. In fact, GitHub doesn't really have a concept of users for AWS itself. GitHub doesn't really have a database, let's say, where there is a mapping between particular user names and particular roles. All this stuff actually still lives in AWS. So we'll see how this can be a little bit confusing and hopefully we'll try to explain it a little bit better. Another thing to keep in mind, to really understand why there is this slightly blurry definition of who is providing the users and who is providing the services, is that this is not a user-facing integration, but more of a service-to-service integration. So in a way, we are connecting two services, and the definition of a user here is not the canonical one, I would say. So in reality, the way we could see it is that AWS is providing a particular role, and AWS is basically trusting GitHub to generate some sort of credential that will allow GitHub to assume that role. And we'll try to explain better how all of that works. So yeah, the first thing that needs to happen in this particular scenario is that we'll need to tell AWS to trust GitHub. So we need to create a trust relationship. And once trust is established, GitHub can just say, okay, here is a token that proves that I am the thing you trusted before, now give me access to this particular role. And that access is safer than the permanent credential scenario, because it uses temporary credentials issued through STS. So those credentials will be short-lived, and even if they are leaked, it's much harder for an attacker to take advantage of them, or at least not long-term. Okay. That sounds really good.
spk_1: So you've got the identity provider, you've got AWS as a service provider. We've mentioned STS and short-lived credentials, I think, a few times in various different episodes. So how do you start and how do you create the trust relationship between the identity provider and AWS as a service provider? What are the steps there? Yeah, this is something that I've done only manually.
spk_0: I don't know if there is a way to actually automate that through Terraform or something else. Probably there is, I'm going to guess, but if you want to do it manually, it's kind of a one-off type of thing, at least for creating that first trust relationship, and then you can probably automate the creation of roles. To do that, you can just go to the IAM console, and there is a section there called identity providers. If you go in there, you can create a new identity provider. Once you go into that interface, it allows you to select different kinds of identity providers, one of which is OIDC, the OpenID Connect identity provider. It gives you a form, and you need to fill that form with certain information that allows AWS to recognize GitHub Actions as an identity provider. The first thing that you need to provide is a URL. This is the OIDC URL.
And this is actually an interesting thing. I don't know if everyone is familiar with OIDC. It's kind of an extension of OAuth2 that also specifies in a much stricter way how the URL structure should be made and how the tokens should be created. While OAuth2 was much more liberal and every OAuth2 provider could be implemented in a very different way, in OIDC you literally just need to know that one URL and everything else is standardized. So that's why we can afford to specify only one particular URL here. The other field that we need to populate in this form is something called audience. The audience, I don't think, is extremely important here, because I think GitHub Actions can customize that audience on demand if you want to. But the standard convention that you find in the documentation is to set it to sts.amazonaws.com. This is basically a value that will be available in your tokens and that you need to check to make sure that GitHub generated the token for the right application, in this case the integration with AWS. And then the last thing that you need to do is to provide the thumbprint of the TLS certificate. This is not really something that you need to copy and paste, you just need to click a button in the UI and AWS will fetch the thumbprint of the TLS certificate of the connection to that URL that we specified as the OIDC identity provider. This is important because we need to make sure that, in the future, when AWS connects again to GitHub Actions, it's still connecting to the same server, so to speak; that trust is given by the TLS certificate. So if that TLS certificate changes, most likely we want to revisit the trust relationship and make sure we are still talking with the right provider at the other end of the line. Okay, so that's interesting to know.
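For reference, the same registration can also be scripted. Here is a minimal sketch using boto3 in Python (an assumption for illustration, not how it was done in the episode); the URL and audience are the values just mentioned, and the thumbprint is a placeholder for whatever value is reported for GitHub's current TLS certificate chain.

```python
import boto3

iam = boto3.client("iam")

# Register GitHub Actions as an OIDC identity provider in this account.
# The thumbprint below is a placeholder; use the value reported for the
# provider's current TLS certificate chain.
response = iam.create_open_id_connect_provider(
    Url="https://token.actions.githubusercontent.com",
    ClientIDList=["sts.amazonaws.com"],  # the expected audience value
    ThumbprintList=["0123456789abcdef0123456789abcdef01234567"],  # placeholder
)

print(response["OpenIDConnectProviderArn"])
```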
spk_1: So this trust relationship can expire, so you need to have some process in place to make sure you're keeping on top of that and renew before these things potentially expire. I did actually just check whether it was possible to create all of this in CloudFormation and Terraform, and there's a Terraform plugin or resource for this, and there's also a CloudFormation OIDC provider resource. So it looks like everything that you just said is also possible there. I think you just have to figure out how you're going to get the thumbprints into your infrastructure as code template, whether you're going to hard-code those or do something more dynamic. So you've now got this trust relationship. So that's step one. How do we link that? What's the next step in linking that through to permissions in AWS?
spk_0: So the next step is to create a trust policy in AWS, and that trust policy needs to have certain particular fields to make sure that you are locking down the security as much as possible, so you're not allowing anyone or anything to assume the role. You just want GitHub Actions, and maybe even a specific workflow, to be able to assume that particular role. So I suppose at this point you should have in mind exactly the kind of pipeline you're going to build in GitHub and what kind of permissions that pipeline would require, and you create this trust policy where you specify that the principal is the ARN of the IAM identity provider that we just created. Then the action is sts:AssumeRoleWithWebIdentity, and then we can specify a bunch of conditions. We want to check that the audience is actually the one we specified, sts.amazonaws.com, but also, if we want to lock down the role to a particular GitHub Actions workflow, we can specify another condition saying that the subject, which is going to be a field in the token identifying exactly the workflow that triggered that particular execution, matches exactly our expectation. So let's say that you have, I don't know, a project called e-commerce, and that project has a particular repository and a particular workflow in GitHub Actions called, I don't know, build and deploy. You will have a way to say: assume this role only if the repository was e-commerce and the workflow was build and deploy. So you can create a condition to limit that kind of thing. At that point you have this trust policy attached to a role, and that role can have specific permissions like, I don't know, being able to create a bucket, deploy a Lambda, and all the things that you need to do to deploy your application.
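A minimal sketch of what such a trust policy could look like, again using boto3 in Python; the account ID, role name, and the repository and branch in the subject condition are placeholders, and note that by default GitHub's `sub` claim identifies the repository and ref (or environment) rather than the workflow name.

```python
import json
import boto3

iam = boto3.client("iam")

# Placeholder values: account ID in the provider ARN, role name, and the
# repository/branch the role should be limited to.
provider_arn = "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # The token must have been issued for the expected audience...
                    "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
                    # ...and for a specific repository and branch (the sub claim).
                    "token.actions.githubusercontent.com:sub": "repo:my-org/e-commerce:ref:refs/heads/main",
                }
            },
        }
    ],
}

role = iam.create_role(
    RoleName="ecommerce-deploy-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
print(role["Role"]["Arn"])
```

The permission policies for the role (creating buckets, deploying Lambda functions, and so on) would then be attached to it separately.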
spk_1: You've mentioned the token here, and you mentioned things like the audience and the subject. Should we talk about the technology underpinning this, which people may have come across before in various different authentication and authorization flows? What is a JWT? What do we need to know about it in this context?
spk_0: Yeah, one of the things that we mentioned before is that OIDC also standardizes the format of the token. A token can really be anything, like any string that you can verify and make sure is actually trustworthy, because, I don't know, maybe you can do an API call and learn from that API call that the token is reliable, or the token itself is somehow signed and you can trust that that signature gives you a guarantee that somebody trustworthy generated that token. In the case of OIDC, this is the choice they made: they went for signed tokens, and the technology of choice is JWT, the JSON Web Token. I wrote an article a couple of years ago with some illustrations that tries to describe in brief what the structure is and how they are generated and validated. We'll have a link to that article if you want to go deeper, but the summary is that a JWT token is a string made of three parts separated by a dot, and those three parts are a header, a payload, and a signature. They are all encoded in base64url, and if you split the three parts and do a base64url decode, the header and the payload are actually two JSON-encoded objects. The payload can contain properties that are generally called claims, and those properties can be whatever you want, but there are some standards. For instance, audience is one of those, AUD, and it generally represents the particular application for which the token was generated. So if you have an identity provider that can generate tokens for multiple applications, you can use the audience to make sure that you are receiving a token that is meant to be used in a particular application.
Then there are other claims like time validity: don't use this token before a certain date, don't use this token after a certain date. Or there is information about the issuer, for instance, which identity provider created that token. And again, if you have an application that accepts tokens from multiple identity providers, that's important information because it also tells you how to check the signature for that particular token.
And if you want to check the signature, you also need to know which key was used to sign the token, so the ID of the key is another field that you will generally find, this time in the header. And in the case of GitHub Actions, there is also the subject, which in this case will contain a reference to the workflow that generated the token. So that's another piece of information we can use in our trust policies to make sure that only a particular workflow can assume a particular role. One interesting thing is that you might wonder how the signature part works, because it's a little bit magic if you never really looked under the hood. Actually, JWT is a little bit open here: you can use both symmetric and asymmetric signing, so you could have either just a shared key to sign the token, or a public and private key pair. Of course, in the case of OIDC, you want to have a public and private key, because you don't want to share a secret key between GitHub and AWS; that would mean both parties know that secret key and both would be able to create signed tokens. Instead, when you use a model with asymmetric keys, you will have a public key that allows you to validate tokens, and anyone can read that, while the private key is only known to GitHub in this case, which means that only GitHub will be able to sign these tokens. So, in reality, you almost never want to use symmetric keys in this kind of scenario; you always go for public and private keys.
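To make the structure concrete, here is a small Python sketch (standard library only) that splits a token into its three parts and prints the claims discussed above. It only inspects the token, without verifying the signature, and the commented-out call at the end is a hypothetical usage example; a real token would come from GitHub's OIDC endpoint.

```python
import base64
import json

def b64url_decode(part: str) -> bytes:
    # base64url encoding drops padding, so re-add it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def inspect_jwt(token: str) -> None:
    # A JWT is three base64url-encoded parts separated by dots:
    # header.payload.signature. Header and payload are JSON objects.
    header_b64, payload_b64, _signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))

    print("key id:   ", header.get("kid"))   # which key signed the token
    print("issuer:   ", payload.get("iss"))  # which identity provider created it
    print("audience: ", payload.get("aud"))  # which application it was created for
    print("subject:  ", payload.get("sub"))  # e.g. the GitHub repository/workflow ref
    print("expires:  ", payload.get("exp"))  # don't use after this timestamp

# inspect_jwt(github_actions_token)  # hypothetical: token fetched in a workflow run
```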
spk_1: Given that we've got this trust relationship and we've got the role created, and we've set the trust policy in the role so that it can only be assumed by principals coming from this identity provider, what's the next step? How do we get that identity provider, GitHub, and specifically our pipelines, to get credentials? So, to assume a role or whatever it is that allows us to enter the AWS world and make API calls. Yeah.
spk_0: So, in this part, I think there is a little bit of speculation, because some parts are well known and well described, and for others we can only assume what AWS is doing to actually validate the token, based on the OIDC standard. So I'm going to try to come up with a narrative, but it might not be 100% faithful to what AWS actually does. Basically, the point is that at some point we start a workflow in GitHub Actions. And GitHub is kind of event-based in that sense, meaning that every time there is a new workflow run, it's going to generate a token for that particular run. And with that token, in your workflow, you might decide to use it or not. But of course, if you're going to interact with AWS, you might want to use that token and exchange it for AWS temporary credentials. And that's something that can be done either manually, if you want to write all that code with the CLI or an SDK, or, if you want to make your life easier, there is an action that is provided by AWS, and you can just import that action into your workflow and configure it to assume the particular role that you have in your AWS account. What happens behind the scenes in that action is that it's basically fetching the token generated for the GitHub workflow and then making an STS AssumeRoleWithWebIdentity call, passing that token to AWS.
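Roughly, this is what that exchange could look like if you did it manually in Python instead of using the AWS action. It's a sketch based on the environment variables GitHub documents for requesting the job's OIDC token and on the STS AssumeRoleWithWebIdentity API; the role ARN, session name, and region are placeholders.

```python
import os
import json
import urllib.request

import boto3

# Step 1: fetch the OIDC token that GitHub makes available to the job.
# The ACTIONS_ID_TOKEN_REQUEST_* variables are injected by GitHub Actions
# when the workflow has the id-token permission.
req = urllib.request.Request(
    os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"] + "&audience=sts.amazonaws.com",
    headers={"Authorization": "bearer " + os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"]},
)
with urllib.request.urlopen(req) as resp:
    github_token = json.load(resp)["value"]

# Step 2: exchange the token for temporary AWS credentials via STS.
# Role ARN and region are placeholders for your own setup.
sts = boto3.client("sts", region_name="eu-west-1")
creds = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::123456789012:role/ecommerce-deploy-role",
    RoleSessionName="github-actions-deploy",
    WebIdentityToken=github_token,
)["Credentials"]

# Temporary credentials: access key, secret key, session token, expiration.
print(creds["AccessKeyId"], creds["Expiration"])
```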
Now, this is where it gets a little bit speculative, because we'll need to imagine what AWS does to actually trust that particular token. Because the token, as we understood, is just a string where you can do some decoding and get some information out of it, and then there is a signature that gives some sort of trust that it was generated by a trusted source. So what AWS should do, in my opinion at least, is first of all check if the token is well formed. So is it a valid JWT? Can we decode it? Are there three parts, a header, a payload, and a signature? Can we read the claims inside the payload? And then, when we read the claims, is this token issued by an identity provider that we recognize? So does this particular account have a trust relationship with this particular identity provider? If yes, then at that point it needs to check the audience. Do we recognize the application for which this token was created? In our example, we said we would just use the generic sts.amazonaws.com value. You can keep that generic, or you can customize it if you have different applications, and GitHub Actions can actually change that value for you when you create the token. So that value is actually a little bit of a placeholder that you can configure: either keep it standard if you have one particular use case, or customize it by application. And then the next phase is, okay, once we have validated that the token is well formed, that the information in the token looks good and we understand it, we need to make sure that the token is authentic. So it wasn't forged by a third party, but it really comes from GitHub Actions. And the way that I assume AWS is going to verify that is by using OIDC. It's going to look at the key ID in the token header, it knows the URL of the identity provider and, from the OIDC protocol, the well-known endpoint where the public keys are published, and it's going to use that to download that particular public key. At that point, it can actually check: was it really this key that signed the token? So there is kind of a double trust there. One is given by the fact that we created this trust relationship with that particular URL of the OIDC provider, and the other is given by the fact that AWS can download a public key from that URL, and that public key can verify that the token was signed by that particular OIDC provider. At that point, if everything is good, STS will do its own thing. It will create temporary credentials and return them, and those credentials can be used to interact with AWS and will carry the permissions attached to the particular role they were issued for.
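We can only guess at AWS's internal implementation, but the standard OIDC validation steps described here can be illustrated with a short Python sketch using the PyJWT library. This is purely an assumption for illustration, not what AWS actually runs; the issuer URL is GitHub's, and the audience is the one configured earlier.

```python
import json
import urllib.request

import jwt                      # the PyJWT library
from jwt import PyJWKClient

ISSUER = "https://token.actions.githubusercontent.com"

def validate_github_token(token: str) -> dict:
    # 1. The OIDC discovery document is always at a well-known path relative
    #    to the issuer URL, and it tells us where the public keys live.
    with urllib.request.urlopen(ISSUER + "/.well-known/openid-configuration") as resp:
        jwks_uri = json.load(resp)["jwks_uri"]

    # 2. Look at the kid in the token header and download the matching
    #    public key from the provider's JWKS endpoint.
    signing_key = PyJWKClient(jwks_uri).get_signing_key_from_jwt(token)

    # 3. Verify the signature and the standard claims (issuer, audience,
    #    expiry). This raises an exception if anything doesn't match.
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience="sts.amazonaws.com",
        issuer=ISSUER,
    )

# claims = validate_github_token(github_token)  # hypothetical usage
# claims["sub"] could then be matched against the trust policy conditions.
```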
spk_1: Okay, so what do those credentials look like? What form do they take? If I understand correctly, it's like the usual result when you assume a role with STS.
spk_0: So my understanding is that you get an access key, a secret access key, a session token, and an expiration field. Okay. So yeah, temporary credentials that are linked to a particular role.
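Continuing the earlier hypothetical sketch, this is how those temporary credentials could be plugged into an SDK session in Python; the `creds` dictionary here uses placeholder values standing in for what the AssumeRoleWithWebIdentity call returns.

```python
import boto3

# `creds` would be the Credentials dictionary returned by the earlier
# assume_role_with_web_identity call; shown here with placeholder values.
creds = {
    "AccessKeyId": "ASIA...",              # temporary access key (placeholder)
    "SecretAccessKey": "placeholder",      # temporary secret key
    "SessionToken": "placeholder",         # session token tied to the role
    "Expiration": "2022-07-15T12:00:00Z",  # when the credentials stop working
}

session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# API calls made through this session run as the assumed role until expiry.
print(session.client("sts").get_caller_identity()["Arn"])
```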
spk_1: Okay. Yeah. So I guess this is kind of familiar in some ways if you've used SSO or some of the Cognito flows, where you've got some credentials from a third-party identity provider and you're exchanging them for temporary credentials; you get the exact same thing. Okay. That sounds a little bit clearer now. How would you use this outside of GitHub? Is this really limited to GitHub for now? What if you've got other CD providers? What other things would you use OIDC providers for?
spk_0: Yeah, this is something that got my curiosity, because I was thinking, okay, how does AWS trust GitHub? And of course they made that generic. So if you can understand how that connection works, then you can create your own sources for allowing, on a certain event, the assumption of a particular role. You can kind of delegate to this particular source: given that there is a trust relationship, when a particular event happens, it can assume a particular role with temporary credentials.
So I don't know if there are interesting examples outside the CI/CD world, but, for instance, if you have an on-premises Jenkins and you have your own OIDC provider, you could build basically that integration pretty much the same way as we explained for GitHub Actions. AWS is just going to trust your own on-premises identity provider to generate tokens that then give access to particular roles on AWS. But I was also thinking, can you use this for other workflows? I don't know if it's the best way of doing this, but technically you could use it for event-driven things where, I don't know, maybe a physical action in the real world triggers something in AWS. I'm thinking, maybe you have an application where every time you enter the office, you swipe your own card to track time or something like that. If there is an OIDC provider connected there, there could be an application that creates a token using that OIDC provider, assumes a role, and then maybe records in a DynamoDB table that somebody accessed the building at a certain point. So you could create these kinds of actions where you have a source of authentication and you want to assume a role, but in a time-limited fashion. Now, probably there are better ways to implement this kind of stuff, but I was trying to stretch my imagination: once you understand this integration, how much can you use it? How far can you go? All right.
spk_1: It sounds like this is applicable in any case where you've got a system-to-system interaction between a non-AWS environment and an AWS environment. So it could be used when you've got an on-premises application that needs to talk to an AWS application, for example, and you don't want to have access keys configured. We know that in EC2 or in ECS you've got a profile you can associate with that resource, so you don't have to have secret keys, but outside of AWS it has been very common for people to just use long-lived keys to perform that kind of interaction. So I guess this is one way of overcoming that. You just need to think about what your OIDC provider is and how you are going to issue those credentials. I know some people would maybe integrate it into Active Directory and have some sort of service credentials, so that might be another way of doing it. It might be worthwhile mentioning actually, as a slight segue, that there was a very recent announcement for a new feature called IAM Roles Anywhere, and we can link to this announcement in the show notes. It sounds like another way of doing this kind of system-to-system interaction where, instead of having an OIDC provider, you use a public key infrastructure, PKI. You've got a root certificate authority yourself, or you can use AWS Certificate Manager, and you can issue client certificates, and you actually set up a trust chain between your certificate authority and AWS, and then use client certificates as a means to exchange them for temporary credentials. So it's slightly tangential, but related, and it's a very recent announcement, so I just thought I'd call it out there. But I guess what this is kind of saying to us is that all you need to do is to create something that follows the OIDC protocol, and you can pretty much use it to exchange identities for credentials in AWS. So I guess that means it's potentially something that could cause security issues if you don't get it right. You could create your own identity provider and use it to give administrator access to all of your accounts. So you need to understand exactly what the trust model is. Is it worthwhile maybe summarizing that? How would you describe the trust model for this OIDC relationship?
spk_0: Yeah, so as we said, the first step is to create the trust relationship between AWS and GitHub Actions. In this case, we go to AWS IAM and we create the OIDC provider connection. At that point, GitHub Actions can create tokens in the form of JWTs, and these tokens are something that AWS should be able to trust and recognize. So with a token like that, GitHub Actions can say, assume a particular role, and it is basically exchanging that automatically generated token for temporary credentials that are given by AWS for a particular role. So in summary, I suppose that we are basically creating a configuration where AWS trusts the signature of the OIDC provider, and with that trust comes the ability to assume a role with temporary credentials.
spk_1: Okay, that makes sense. So it seems quite powerful, and it's nice the way it's using the standard, and it potentially opens up support for a lot of other OIDC providers. I think we've covered it in quite a lot of detail, so you've given a lot of information there. From a developer point of view, if you're thinking, okay, that's all very well and good, I'm informed now, but as a developer, I've got a CD pipeline, or maybe I'm creating a new one, or maybe I've got one that already uses long-lived credentials, and I want to switch over to using short-lived credentials with this new way. What are the steps in summary?
spk_0: Yeah, just make sure that you have configured the OIDC provider in AWS so that you have created that trust relationship. We have explained extensively how to do that manually, but Eoin, you also pointed out that you can do it programmatically using Terraform, or CloudFormation, or CDK, or something like that. So make sure that happens, first of all. Then you need to create your own roles; you can create a role for every single workflow if you want to be very strict. Make sure to set up the right permissions for every role, and at that point, in your GitHub Actions workflow, you can use the AWS action, configure-aws-credentials, to have a step, before you interact with any AWS resource, that gets the temporary credentials. At that point, you can remove all your hard-coded credentials and swap them for this particular step that uses the AWS action to exchange a JWT token for AWS temporary credentials.
spk_1: Are there any other resources we should point people to who want to get started with this?
spk_0: Yeah, so I was actually reading a very good post by Elias Branche that is kind of a tutorial that guides you step by step through all the things we described today, and it has very good examples and also a lot of screenshots, so you can be sure that you are following along and doing the right things. So I definitely recommend, if you are doing this for the first time, using this tutorial as a reference to guide you through the whole process, and we're going to have a link in the show notes. But also, if you haven't seen our previous episode where we discuss why you should consider using GitHub Actions rather than CodePipeline, maybe that's a good one to check out after this one to make sure you really get all the context on why all this stuff might be interesting for you. That's all we have for today and we'll see you at the next episode. Thank you very much.