Help us to make this transcription better! If you find an error, please submit a PR
with your corrections.
Eoin: Cognito is a frequently used and core AWS service for managing users, authentication and authorization. But getting started with Cognito and knowing what features to apply to different use cases can be really challenging. By the end of this episode, you will know the differences between Cognito user pools and identity pools, also known as Cognito federated identities. My name is Eoin, I'm here with Luciano and this is AWS Bites.
Before we start, we have a favor to ask. If you have been enjoying this podcast, please consider giving us a review on Apple Podcasts or wherever else you get your podcasts. If you follow us on YouTube, consider subscribing, liking our videos and leaving us plenty of comments. This will really help us keep the podcast relevant and discoverable for other AWS enthusiasts. Thank you so much. So let's get to it. Luciano, what are some of the use cases where you might actually need to use Amazon Cognito in the first place?
Luciano: Yeah, the most common use case is when you need sign-up and sign-in for a mobile or web application. That's the service you go to in AWS. Outside of AWS, a comparable service is Auth0, so if you have been using Auth0, you can think of Cognito as the AWS counterpart to Auth0. But of course, you can do more. You can also use Cognito to limit access to APIs. For instance, say you build your own Lambda function and use it as the backend for API Gateway. You could use an authorizer backed by Cognito to make sure that only users who are actually logged in to that particular application, with some level of access, can call that API.
And similarly, that's not limited to REST APIs. You can do the same if you're using GraphQL, for instance, by combining Cognito with AppSync. So basically, this is the use case where users can perform specific actions only if they are logged in, and you can build that whole security layer into your applications. Very similarly, you can do that for other kinds of resources. For instance, you can allow users to access S3 or DynamoDB by using features that are built into Cognito. So once the user is authenticated and recognized, you can give them access to these kinds of services and resources in your AWS account. Yeah, so I think one of the things that confused me the most when starting to use Cognito is these two concepts of identity pools and user pools. Identity pools are now also called federated identities, which I think just creates more confusion. So what are these two things, and what are the fundamental differences?
Eoin: When I started building serverless applications about six years ago, Cognito was the one service that confused me the most. Coming from places where you used to have to build this kind of thing yourself, it's really great to have a service that takes care of all of the security needs and the standards you need to comply with. But it's just not simple to understand at first. User pools and identity pools sound like they do similar things and have similar names, which adds to the confusion.
So let's try our best to clarify what they are. User pools allow you to create your own identity provider, so they're used for implementing authentication: think sign-up and login, and they let you build those sign-up and login flows. As well as that, you also get a place to store user data. Identity pools, or federated identities, on the other hand, are more for authorization.
They don't give you a place to store any user data. I prefer to think about identity pools as an identity broker, because there isn't really a pool of anything. It's just a place that allows you to exchange one set of credentials for another. What you get back from identity pools are short-lived IAM credentials. So you're essentially swapping your already verified identity from an identity provider, also known as an IDP, for short-lived IAM credentials.
Now, identity pools don't have a lot of features, so let's first talk about some of the features of user pools, because there are quite a lot. You get a place to store your users and organize them into groups. You get hosted sign-up, sign-in and password reset pages, as well as the ability to implement all of that yourself and customize the UI.
You have lots of ways to authenticate: you can authenticate with username and password, and it now also supports multi-factor authentication with an authenticator app. You can do server-to-server authentication. You can do lots of OAuth 2.0 flows. You can also do social sign-in into your user pool with Google, Amazon, Facebook and Apple. If you're not using any of those, you can also use SAML and OIDC federated sign-in for anything that supports those standards.
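For the username-and-password case, an app client that has a client secret must also send a SECRET_HASH, defined as Base64(HMAC-SHA256(client secret, username + client ID)). Here is a minimal Python sketch that only builds the request parameters, assuming the USER_PASSWORD_AUTH flow is enabled on the app client; the usernames, client IDs and secrets are placeholders, and the boto3 call is shown commented out because it needs a real user pool.

```python
import base64
import hashlib
import hmac
from typing import Optional


def secret_hash(username: str, client_id: str, client_secret: str) -> str:
    """Base64(HMAC-SHA256(key=client_secret, msg=username + client_id))."""
    digest = hmac.new(
        client_secret.encode("utf-8"),
        (username + client_id).encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def password_auth_parameters(username: str, password: str, client_id: str,
                             client_secret: Optional[str] = None) -> dict:
    """Build AuthParameters for the USER_PASSWORD_AUTH flow."""
    params = {"USERNAME": username, "PASSWORD": password}
    if client_secret is not None:
        # Only app clients configured with a secret need SECRET_HASH.
        params["SECRET_HASH"] = secret_hash(username, client_id, client_secret)
    return params


# Hypothetical usage with boto3 (needs AWS access, so not executed here):
# client = boto3.client("cognito-idp", region_name="eu-west-1")
# resp = client.initiate_auth(
#     AuthFlow="USER_PASSWORD_AUTH",
#     ClientId=CLIENT_ID,
#     AuthParameters=password_auth_parameters("alice", "pw", CLIENT_ID, CLIENT_SECRET),
# )
# If MFA is required, resp contains ChallengeName instead of AuthenticationResult.
# tokens = resp["AuthenticationResult"]  # AccessToken, IdToken, RefreshToken
```

Public clients without a secret simply omit SECRET_HASH, which is why the secret is optional here.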
If you want to customize any part of the login flow, you also have Lambda triggers, so you can put hooks in place at various points of the sign-up and login flow. And if you're coming from somewhere else, user pools have mechanisms that allow you to migrate your existing users and their usernames into user pools. So that's pretty good. When a user logs in with Cognito, they get an access token. That access token is a JWT, a JSON Web Token.
It's a signed JSON object with a lot of properties, or claims, in it. That access token can be used to secure access to some AWS services. So you get your access token, and you usually get a refresh token as well. If you request the openid scope, you can get an ID token too. On the other hand, when you're using identity pools, you're swapping a token like that, which you've already got from your identity provider, for an IAM session.
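To make the token format concrete: a JWT is three base64url-encoded segments (header, payload, signature) joined by dots, and the payload holds claims such as sub, token_use and, for users in groups, cognito:groups. This Python sketch decodes the payload for inspection only; it deliberately skips signature verification, which any real authorization decision must do against the user pool's published public keys (JWKS).

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url-encoded with the "=" padding stripped; restore it.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def unverified_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT checking its signature.

    Debugging aid only: before trusting any claim, verify the signature and
    check exp, iss and token_use.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has exactly three dot-separated segments")
    return json.loads(_b64url_decode(parts[1]))
```

Calling unverified_claims(access_token)["cognito:groups"] is a quick way to see which groups a logged-in user was issued.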
Once you've got IAM credentials, it's just like having an IAM user. You can use them to make any AWS API call allowed by the policy associated with the role and the session. So at this point, you might realize that, since a user pool is an identity provider that issues tokens and an identity pool is a broker that swaps tokens for IAM credentials, you can actually combine the two to get IAM credentials for the users stored in your Cognito user pool. What makes this confusing is that you can use any other IDP instead of a user pool as well. So you've got a lot of options here, and it's worth exploring when to use one over the other and when to use both.
Luciano: Yeah, absolutely. I think it's good to discuss that again, maybe with an example. Just to summarize what I understand from what you just said: if you want a place where you can sign up users and store attributes about them, and you don't necessarily have an existing identity provider, you should use user pools. If you also want to allow users to access AWS resources, like S3 as we mentioned, or something else.
Then you need to use identity pools, because identity pools will be the broker, as you say, that converts whatever authentication mechanism you have, from user pools or something else, into actual credentials that AWS recognizes to give you access to services. So if you need both, you can definitely use both. Again, I want to clarify that with a practical example, because otherwise it's going to be a little too abstract.
So let's take our classic favorite example, an e-commerce website. As on many other e-commerce sites, users will be navigating and looking at different products, and we want to collect all of that information. We want to understand the user journey and the user preferences, so that we can suggest products that might be interesting to them, as close to real time as possible.
So while they navigate, they should see suggestions that are calculated more or less in real time. The user will be logging in with username and password, so there is definitely a user pool there that allows users to log in and also stores all the necessary information and attributes about every single user. Then, while the user is navigating, we want IAM credentials so that we can send user activity, for instance clicks or the pages the user is viewing, into a Kinesis stream.
To do that from the client, we need something like an identity pool, so that we can get IAM credentials that are authorized to send records to this Kinesis stream. So in this case, we are using both: user pools for login and storing user attributes, and identity pools to get the AWS credentials needed to connect to Kinesis.
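A minimal Python sketch of that flow with boto3, assuming an identity pool that trusts the user pool and an authenticated role allowed to call kinesis:PutRecord on the stream. The region, pool IDs, stream name and click-event shape are hypothetical placeholders, and a real browser or mobile client would more likely use an AWS SDK for JavaScript or a mobile SDK. boto3 is imported inside send_click so that the pure helpers work without it.

```python
import json


def user_pool_provider_name(region: str, user_pool_id: str) -> str:
    # Key under which an identity pool expects tokens issued by a Cognito user pool.
    return f"cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def click_record(user_sub: str, page: str, action: str) -> dict:
    """Build keyword arguments for kinesis.put_record (hypothetical event shape)."""
    return {
        "Data": json.dumps({"user": user_sub, "page": page, "action": action}).encode("utf-8"),
        # Partitioning by user keeps one user's events ordered within a shard.
        "PartitionKey": user_sub,
    }


def send_click(region: str, identity_pool_id: str, user_pool_id: str, id_token: str,
               stream_name: str, user_sub: str, page: str, action: str) -> None:
    import boto3  # imported lazily so the helpers above run without boto3

    logins = {user_pool_provider_name(region, user_pool_id): id_token}
    identity = boto3.client("cognito-identity", region_name=region)

    # 1. Resolve (or create) this user's identity ID in the identity pool.
    identity_id = identity.get_id(IdentityPoolId=identity_pool_id, Logins=logins)["IdentityId"]

    # 2. Swap the user pool ID token for short-lived IAM credentials.
    creds = identity.get_credentials_for_identity(
        IdentityId=identity_id, Logins=logins)["Credentials"]

    # 3. Use those credentials like any other IAM session.
    kinesis = boto3.client(
        "kinesis",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretKey"],  # the field is "SecretKey" here
        aws_session_token=creds["SessionToken"],
    )
    kinesis.put_record(StreamName=stream_name, **click_record(user_sub, page, action))
```

The Logins key built by user_pool_provider_name is exactly where the user pool acts as the IDP for the identity pool.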
Another interesting detail that took me a while to realize is that when we create a setup like this, we are using our user pool as an IDP for our identity pool. This is how we connect the dots between user pools and identity pools and use them together. We are creating a trust relationship that tells our credentials broker, the identity pool, to trust our user pool. So when a user is logged in to that user pool, the identity pool gives that user credentials to connect to the services in AWS that we want to authorize for them.
Eoin: There are also ways to access a limited subset of AWS resources just using user pools. There, you're using your token to protect an API and implementing authorization in that API. So how does that work? Let's give a few examples. If you've got API Gateway or an Application Load Balancer, you can put a Cognito authorizer in front of it that restricts access to your APIs to users authenticated with a user pool. You don't need an identity pool in that particular case.
The access token is validated by the authorizer that you've configured in the load balancer or in API Gateway. That doesn't really give you any role-based or attribute-based access control; it's all-or-nothing access for each authorizer you configure. If your API is backed by Lambda, you do get information about the principal, or identity, making the request. So you can find out which groups the user belongs to, because Cognito also has the concept of user groups.
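As an illustration, here is a Python Lambda handler sketch that checks group membership, assuming a REST API with a Cognito user pool authorizer, where API Gateway passes the token's claims under requestContext.authorizer.claims. How cognito:groups is serialized can differ between integrations, so the helper is an assumption-laden tolerant parser: it accepts a list, a comma-separated string or a bracketed space-separated string, and assumes group names contain no spaces or commas. The "admin" group name is hypothetical.

```python
import json


def groups_from_claims(claims: dict) -> set:
    """Extract Cognito group names from token claims (format-tolerant)."""
    raw = claims.get("cognito:groups")
    if raw is None:
        return set()
    if isinstance(raw, list):
        return set(raw)
    # Strings may look like "a,b" or "[a b]" depending on the integration.
    cleaned = str(raw).strip("[]")
    return {g for g in cleaned.replace(",", " ").split() if g}


def handler(event, context):
    claims = event.get("requestContext", {}).get("authorizer", {}).get("claims", {})
    if "admin" not in groups_from_claims(claims):  # hypothetical group name
        return {"statusCode": 403, "body": json.dumps({"message": "Forbidden"})}
    return {"statusCode": 200, "body": json.dumps({"message": f"Hello admin {claims.get('sub')}"})}
```

The authorizer has already rejected unauthenticated callers; this check only adds the finer-grained, group-based decision behind the API.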
You can use that group information to implement your own level of access control, by checking which groups the user is a member of and implementing your own checks further down the chain, behind the API. If you're using AppSync, it's actually a little bit better, because AppSync's authorizer also lets you protect some of your queries or mutations with schema directives (annotations) that specify: this is the user pool, but the user also needs to be a member of a specific group.
So AppSync's approach is a little bit more powerful. An alternative is to say, okay, let's not use user pools to protect this API; let's use IAM authorization instead. API Gateway supports IAM authorization, so you're just using standard IAM policies, and the request to your API has to be signed using AWS Signature Version 4 (SigV4), just like a request to an AWS API. In that case, you're back to using identity pools, because that's where you get your IAM credentials from if you've got an IDP, as you've clearly explained.
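To make the signing requirement concrete, here is a minimal SigV4 signer in Python for a GET request to an API Gateway endpoint, using temporary credentials such as those an identity pool returns. It is a sketch under simplifying assumptions: no query string, a path that needs no URI encoding, an empty body, and the session token included as a signed header. In practice an SDK (for example botocore's SigV4Auth) would do this for you.

```python
import datetime
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # kDate -> kRegion -> kService -> kSigning, as defined by SigV4.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_get(host: str, path: str, region: str, access_key: str, secret_key: str,
             session_token: str, now: datetime.datetime,
             service: str = "execute-api") -> dict:
    """Return headers for a SigV4-signed GET. `now` must be a UTC timestamp."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(b"").hexdigest()  # empty body

    # Canonical headers: lowercase names, sorted, each line ending in "\n".
    headers = {"host": host, "x-amz-date": amz_date, "x-amz-security-token": session_token}
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))

    # Method, path, (empty) query string, headers, signed header list, payload hash.
    canonical_request = "\n".join(
        ["GET", path, "", canonical_headers, signed_headers, payload_hash])
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    signature = hmac.new(signing_key(secret_key, date_stamp, region, service),
                         string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "Host": host,
        "X-Amz-Date": amz_date,
        "X-Amz-Security-Token": session_token,
        "Authorization": (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                          f"SignedHeaders={signed_headers}, Signature={signature}"),
    }
```

Sending these headers with any HTTP client to the same host and path is what lets API Gateway evaluate the caller's IAM policy.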
With IAM authorization, you don't get the user's token claims further down. Once the user has access through the IAM policy attached to the credentials issued by the identity pool, they can invoke the API, but the code behind that API has much less visibility into who that user is. So in general, you find that most people use user pools and access tokens for restricting API access in web and mobile applications, while identity pool credentials are more often used for accessing other services directly. S3 is a common example, but you also mentioned Kinesis, and Kinesis Firehose or Amazon Pinpoint are others. That's useful if you want to collect data directly from a client with low latency and overhead, let Amazon manage all the scalability, and avoid putting an API and a Lambda function between the client and that AWS service. Are there any other interesting features we should talk about?
Luciano: Yeah, let's mention a few other interesting things that might not be too obvious and might be worth checking out after you listen to this episode. One is that, since identity pools just give you IAM credentials, you can use them for attribute-based or role-based access control (ABAC or RBAC). The idea is that users have certain attributes, and you can use those attributes to create conditions in your IAM policies.
For instance, you could say that if a user belongs to a particular group, that condition allows your policy to authorize an action. A more practical use case: you have an admin flag or group, and you use it to restrict specific actions, so that only users with the admin flag can perform a particular action, maybe deleting a user or something like that.
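As an illustration of the admin-flag idea, here is a Python sketch that builds an IAM policy for the identity pool's authenticated role. It assumes the identity pool's attributes-for-access-control feature maps a user attribute to a principal tag named role; the tag key, tag value, table ARN and DynamoDB actions are hypothetical choices for the example.

```python
import json


def admin_only_delete_policy(table_arn: str, tag_key: str = "role",
                             tag_value: str = "admin") -> dict:
    """IAM policy: any authenticated user may read; only role=admin principals may delete."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadForAuthenticatedUsers",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            },
            {
                "Sid": "DeleteOnlyForAdmins",
                "Effect": "Allow",
                "Action": "dynamodb:DeleteItem",
                "Resource": table_arn,
                # Principal tags come from the identity pool's claim-to-tag mapping.
                "Condition": {"StringEquals": {f"aws:PrincipalTag/{tag_key}": tag_value}},
            },
        ],
    }


print(json.dumps(admin_only_delete_policy(
    "arn:aws:dynamodb:eu-west-1:123456789012:table/Users"), indent=2))
```

Because the condition is evaluated by IAM itself, the delete restriction holds even when the client calls DynamoDB directly with its identity pool credentials.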
Similarly, you can use the same idea for multi-tenant systems, where an attribute tells you exactly which organization a user belongs to, so you allow specific actions only on the resources that are part of that organization. Another thing is that you can do social sign-in. For instance, you might want to allow login not just by username and password, but also through Google, Facebook or some other social login system.
You can configure the social identity provider as part of the user pool configuration, and still use identity pools to exchange tokens for IAM credentials. So this is another way to combine the two: you're not limited to username and password. You can do social login, and if you still need IAM credentials, you can use identity pools too. Another interesting thing, if you ever use Amplify: Amplify is a nice abstraction over all these things. It gives you very easy-to-use APIs, but behind the scenes it's doing all the things we just described. So it's just an easier way to get those IAM credentials and use them in your own web or mobile application.
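For the social sign-in case, one way to send a user straight to Google through the user pool's hosted UI is the /oauth2/authorize endpoint with the identity_provider parameter. This Python sketch only builds the URL; the domain prefix, region, client ID and callback URL are hypothetical, and Google must already be configured as an identity provider on the user pool and enabled for the app client.

```python
import secrets
from typing import Optional, Tuple
from urllib.parse import urlencode


def google_sign_in_url(domain_prefix: str, region: str, client_id: str,
                       redirect_uri: str, state: Optional[str] = None) -> Tuple[str, str]:
    """Build a hosted UI authorize URL that goes directly to Google."""
    # Random state the app should store and compare on callback (CSRF protection).
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",        # authorization code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,   # must be an allowed callback URL on the app client
        "identity_provider": "Google",  # name of the IdP configured on the user pool
        "scope": "openid email",
        "state": state,
    })
    base = f"https://{domain_prefix}.auth.{region}.amazoncognito.com/oauth2/authorize"
    return f"{base}?{query}", state
```

After the redirect back, the app exchanges the code at the /oauth2/token endpoint for user pool tokens, and the resulting ID token can go to the identity pool exactly as with a username-and-password login.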
Eoin: Great. I think that's all we have for today, but we're curious to know if you've been using Cognito. Do you have any further questions about Cognito, or other things we should cover in the future? Do you have any interesting tips or use cases that you've managed to implement with it? Please share them with us. You can drop us a comment on YouTube or reach out on Twitter. All our contact details are in the show notes. Thank you very much for listening, liking and leaving us a review. We really appreciate it, and we'll see you in the next episode.