Eoin: We recently had to integrate a workload running on Azure with some components running on AWS. To do that, we explored a lot of different options and evaluated the trade-offs in terms of simplicity and security. So in this episode, we wanted to share our learnings and discuss how to securely and efficiently integrate workloads running on-premises, in Azure, or in any other cloud with a workload running in AWS. We'll review several different options for integration and discuss their advantages and disadvantages. My name is Eoin and I'm joined by Luciano for another episode of the AWS Bites podcast. AWS Bites is sponsored, as always, by fourTheorem, an advanced AWS consulting partner that works together with you on architecture, migration, and cost optimization. Find out more at fourtheorem.com. That link is in the show notes. Luciano, what are some of the use cases and examples that might require us to think about authorization and communication between Azure, some other cloud, and AWS?
Luciano: As much as we like AWS, we don't always run everything on AWS, for different reasons. So sometimes you have a situation where you're running some kind of workload somewhere else: it can be in another cloud, on-premises, or even in your own home because maybe you have some devices connected to the internet. And you might want to integrate that particular system with something running in your AWS account.
So the question that we want to address today is: how can you establish a secure integration between those two systems? Just to give you some example use cases, you might have your own home NAS where you keep all your personal files, and because you want to be extra cautious that you're never going to lose anything, maybe you want to back up the same files to an S3 bucket and possibly even use Glacier to reduce cost. How do you connect a system running in your own home network with AWS securely so that it can send data to S3 and Glacier? That could be one use case. Another use case could be a little bit more business oriented. In a big corporate network, maybe physically in an office, you might have some kind of network security device that collects network metadata. If you're collecting this data, you probably want to analyze it. One way to do that is to send this data to a Kinesis stream, and then later on you can analyze it and record interesting network activity.
Or maybe you can even implement intrusion detection algorithms once you have that kind of data. So again, how do you connect a device running on a local network or in an office with something that is running on AWS, like a Kinesis stream? Another example: you might have another application running in another cloud, because maybe your company is using multiple clouds. For instance, you might have a billing engine running on Azure that produces invoices, while the rest of the system runs on AWS. So you might have an SQS queue where you receive messages coming from Azure. Then you can take messages from this SQS queue, process them, and maybe send emails to your customers with the invoice attached using SES. So this is an integration where part of the system is running on Azure and part of it is running on AWS. How do you let Azure send data to an SQS queue running in AWS in a secure and simple way? So the questions for today are: what are the mechanisms? Should we use IAM? Is that secure enough? What are some of the alternatives? Where do we start? What's the first option?
Eoin: Yeah, I think IAM is generally secure enough, as long as you obtain the credentials in the right way. And thinking through what you're saying there, all of these use cases are also valid when you're migrating to AWS and maybe going for a hybrid approach. So either you're deciding not to put everything on AWS, or, as you migrate, you end up in this intermediate hybrid state.
So maybe one of the easiest, or at least most understood, ways is to put a public API on the AWS side with an authorization method. That could be something like an API key, and then you just share the API key with the external side. You can use IAM authorization on that API, but you still have to figure out how to get an IAM session. Or you can use OAuth or OIDC and get a token that way. That allows for a secure integration while keeping the client side fairly easy and well understood, because most people know how to call an API. So it's a fairly common way of doing it. On the AWS side, you could use API Gateway. API Gateway allows you to provision API keys and share them with your clients. If you wanted a more secure approach, you could use an authorizer, such as one that validates OIDC tokens. Then you need to configure the clients correctly with a client ID and secret, for example, so that they can perform an OIDC flow and obtain credentials. Similarly, if you just wanted to go with IAM authorization, you have to figure out some way of getting IAM credentials. So a lot of the other options we're going to go through cover getting IAM credentials anyway, whether you use the API Gateway approach or not. But the advantage of an OIDC approach, if you already have an identity provider, is that you can hook it into that method and get a JWT that clients can attach to the request. So that's the public API method. Let's talk a little bit more about IAM. I suppose the first thing people would reach for when they think about IAM is an IAM user, but we often talk about how this is discouraged. So what do you think? Is it a viable option?
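As a rough illustration of the API key option, here is a minimal sketch of how an Azure-side client might build a request for an API Gateway REST API whose method requires an API key. The endpoint URL and the `build_invoice_request` helper are hypothetical; the `x-api-key` header is where API Gateway looks for the key by default. The request is only built here, not sent.

```python
import urllib.request

# Hypothetical invoke URL: replace with your own API Gateway stage URL.
API_URL = "https://example.execute-api.eu-west-1.amazonaws.com/prod/invoices"


def build_invoice_request(api_key: str, payload: bytes) -> urllib.request.Request:
    """Build a POST request carrying the API key in the x-api-key header.

    With the default API key source (HEADER), API Gateway reads the key
    from this header for methods configured with "API key required".
    """
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
    )


# Sending it would be a single call (not executed here):
# with urllib.request.urlopen(build_invoice_request(key, body)) as resp: ...
```

Keep in mind that an API key is a long-lived shared secret, so it carries the same storage and rotation concerns discussed for IAM users.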
Luciano: I think it's definitely the simplest one. For very simple integrations, it's probably the one I've seen the most, even historically. The idea is that you go to your AWS console, create a user, create credentials for this user, and then somehow propagate these credentials to your client. Your client then just uses the SDK or the CLI with these credentials. Sometimes you can make it a little bit better by creating a separate role and giving that user only the permission to assume the role. At least then you have an extra step where you can track exactly when the system is assuming the role, so you have a little bit more control and visibility over the actions that are happening there. On top of that, you can set up proper logging and alerting, and you can try to set up automated key rotation to make it more secure. In reality, I've seen that when you have to do all these things, people stop as soon as it works, and you end up with long-lived credentials, which at some point might become a very serious security liability. This is what pushes us to explore other options: this one is very convenient and very easy, but making it secure requires a lot more work. The risk is that you stop at the first step because it works, and you don't make it secure until maybe you eventually have an incident in production. So just be aware that this solution is tricky: it looks simple, but it's dangerous. Let's figure out if there are other options that maybe take a bit more time to set up but are more secure by default.
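The "user that can only assume a role" pattern boils down to two IAM policy documents. The sketch below builds them as Python dicts; the account ID, user name, and role name are hypothetical placeholders. The client would then call STS `AssumeRole` (for example, boto3's `sts.assume_role(RoleArn=..., RoleSessionName=...)`) with the user's keys, and each assumption shows up in CloudTrail.

```python
def trust_policy_for_user(account_id: str, user_name: str) -> dict:
    """Role trust policy: only this one IAM user may assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:user/{user_name}"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def user_policy_allowing_assume(account_id: str, role_name: str) -> dict:
    """Identity policy for the user: the only thing it can do is assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": f"arn:aws:iam::{account_id}:role/{role_name}",
            }
        ],
    }
```

The actual permissions (for example, `sqs:SendMessage` on the invoices queue) would live on the role, not the user, so the long-lived keys are useless without the extra STS step.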
Eoin: If you do want to use the IAM user approach, I would suggest using infrastructure as code. In CloudFormation, you can create your IAM user, and you can also create the access key, either in CloudFormation or programmatically, and then store it in Secrets Manager or some other secret store or vault that allows you to share it with the external system. But it starts to be a lot of work if you want to implement rotation and alerting yourself. So another option is IAM Roles Anywhere.
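To give a feel for the "implement rotation and alerting yourself" work, here is a minimal sketch of the check at the heart of it: finding active access keys older than a maximum age. The input mirrors the `AccessKeyMetadata` entries returned by IAM `ListAccessKeys` (for example via boto3's `iam.list_access_keys(UserName=...)`); the 90-day threshold and the `keys_due_for_rotation` helper are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(keys, max_age_days=90, now=None):
    """Return the IDs of active access keys older than max_age_days.

    Each entry is a dict shaped like IAM's AccessKeyMetadata:
    {"AccessKeyId": str, "Status": "Active" | "Inactive", "CreateDate": datetime}.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        key["AccessKeyId"]
        for key in keys
        # Inactive keys can't be used, so only flag active ones.
        if key.get("Status") == "Active" and key["CreateDate"] < cutoff
    ]
```

A real setup would still need a scheduler, a way to create the new key, push it to the secret store, and deactivate the old one, which is exactly the extra work that makes this option less attractive.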
And I think this is probably the purest, most suitable solution for this problem today in many ways, because it's really designed for this purpose. The hint is in the name. I think the service has been around for about a year or so. The idea is that it allows you to use your PKI (public key infrastructure) to exchange a certificate and a signature for IAM credentials. A lot of organizations already have a PKI, so it's sometimes a really good fit: they're already issuing private certificates internally for other reasons, so moving to Roles Anywhere can be a really easy jump.
Once you have your private certificate authority, you can set up a few resources in Roles Anywhere. The process works like this. You have your certificate authority, and you create a trust anchor, which establishes the trust relationship between your certificate authority and IAM. Then you create an IAM role that Roles Anywhere can assume, which gives you the permissions you need. Then there's another resource called a profile, which essentially links Roles Anywhere to that role. Once you have those three things, you can use a tool called the AWS Signing Helper. You can execute it manually, or your SDKs can use it to pick up credentials. It sends a signature and your public certificate to IAM Roles Anywhere, and gives you back IAM credentials.
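Two of the pieces above can be sketched as plain data: the trust policy for the role that Roles Anywhere assumes, and an `~/.aws/config` profile that makes SDKs call the signing helper through `credential_process`. This is a sketch based on our reading of the Roles Anywhere documentation (check it against the current docs); the CN condition value, file paths, ARNs, and helper function names are hypothetical.

```python
def roles_anywhere_trust_policy(allowed_cn: str) -> dict:
    """Trust policy allowing IAM Roles Anywhere to assume the role.

    The condition restricts it to certificates whose subject CN matches,
    using the x509Subject session tag that Roles Anywhere sets.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "rolesanywhere.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:TagSession", "sts:SetSourceIdentity"],
                "Condition": {
                    "StringEquals": {"aws:PrincipalTag/x509Subject/CN": allowed_cn}
                },
            }
        ],
    }


def credential_process_profile(name, cert, key, trust_anchor_arn, profile_arn, role_arn):
    """Render an ~/.aws/config profile that delegates to the signing helper."""
    command = " ".join([
        "aws_signing_helper credential-process",
        f"--certificate {cert}",
        f"--private-key {key}",
        f"--trust-anchor-arn {trust_anchor_arn}",
        f"--profile-arn {profile_arn}",
        f"--role-arn {role_arn}",
    ])
    return f"[profile {name}]\ncredential_process = {command}\n"
```

With such a profile in place, any SDK or the CLI using `AWS_PROFILE=<name>` obtains short-lived credentials without ever storing an AWS secret on the Azure side; the private key stays with the certificate.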
Now, this might sound a bit complicated or unfamiliar if you haven't come across these concepts before, so we have put together a very simple demo. There's a repository we created on GitHub that you can check out. It gives you steps to create a dummy private CA on your laptop with OpenSSL, and then a CloudFormation template to deploy everything else. With that, you can set it all up in about five minutes and start getting credentials.
If you don't have a private CA already, setting one up, maintaining it, and securing it is not for the faint-hearted, and I wouldn't generally recommend it. The good news is that you can use AWS Certificate Manager Private Certificate Authority, a managed service that handles all of that for you. There's always a bit of bad news, though, and here it's that it costs $400 per month per certificate authority.
So be careful creating multiple certificate authorities for different development and test environments. You do get one for free per month, but I was recently given a bit of bill shock: I was only ever creating one certificate authority at a time, but, following best practices, I was creating immutable stacks with infrastructure as code. After creating and deleting it three times, I saw an $800 forecasted bill in Cost Explorer. Now, I think this was more of an overestimate in the Cost Explorer forecast, but I did have to open a case with AWS Support to check that I wasn't going to be billed for it. So just be careful with the cost there. If you do have to create one, try to create a single one and share it among multiple environments. So that's Roles Anywhere. If you're going to have a lot of external instances that need IAM permissions over time, I think it's a good option. What else have we got? Anything else useful?
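The advice to share one CA across environments is simple arithmetic. The sketch below uses the $400 per CA per month figure quoted in the episode; it ignores the free allowance and any proration, so treat it as a rough comparison rather than a billing model, and check current AWS Private CA pricing.

```python
CA_MONTHLY_COST_USD = 400  # Figure quoted in the episode; check current pricing.


def monthly_ca_cost(environments: int, shared: bool) -> int:
    """Rough monthly cost: one CA per environment vs. one CA shared by all."""
    ca_count = 1 if shared else environments
    return ca_count * CA_MONTHLY_COST_USD
```

For example, dev, test, and prod with a CA each comes to `monthly_ca_cost(3, shared=False)`, three times the cost of a single shared CA.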
Luciano: Another approach we considered is OIDC federated identities. Again, if you already have an OIDC provider like Azure AD, this can be easy to set up because you are already using that system, so it's mostly about creating the integration between that system and AWS. The idea is that you create managed identities in Azure and link them to whichever compute you are running in Azure, for instance, VMs or Azure Functions. That way, you don't have to explicitly generate credentials or secrets and keep them stored somewhere, because the managed identity does all of that transparently for you.
Then you need to create an OIDC identity provider in AWS, inside IAM, pointing to your Azure AD. As a client, you just need to perform the OIDC authentication flow, which gives you a token. With that token, you can use the AWS SDK to call AssumeRoleWithWebIdentity, passing the token, and at that point you have AWS credentials associated with the role you are assuming, which give you the permissions defined in that role.
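The two calls in this flow can be sketched with the standard library. On an Azure VM, a managed-identity token is fetched from the instance metadata endpoint with a `Metadata: true` header; the token is then passed to STS `AssumeRoleWithWebIdentity`, which does not require an AWS signature. This sketch only builds the requests. The `resource` value (the Azure AD application the token is issued for), role ARN, and session name are hypothetical, and Azure Functions exposes a different token endpoint than VMs.

```python
import urllib.parse
import urllib.request

AZURE_IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def azure_token_request(resource: str) -> urllib.request.Request:
    """Request a managed-identity access token from the Azure VM metadata service."""
    query = urllib.parse.urlencode({"api-version": "2018-02-01", "resource": resource})
    return urllib.request.Request(
        f"{AZURE_IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"}
    )


def sts_web_identity_url(role_arn: str, session_name: str, token: str) -> str:
    """Build an STS AssumeRoleWithWebIdentity query URL (unsigned call)."""
    query = urllib.parse.urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
    })
    return f"https://sts.amazonaws.com/?{query}"
```

In practice you would read `access_token` from the metadata service's JSON response and let the SDK do the exchange, for example boto3's `sts.assume_role_with_web_identity(RoleArn=..., RoleSessionName=..., WebIdentityToken=...)`, rather than parsing the STS XML response yourself.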
It seems a little bit convoluted, but the idea is that you already have Azure AD, so it's mostly about creating the trust relationship. On the Azure side, it's made easier by the concept of managed identities, so you get access to that token automatically, or at least easily enough. Then you use the SDK to exchange the token for credentials by assuming a role.
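The trust relationship on the AWS side is the role's trust policy, which names the IAM OIDC provider and pins the token's audience and subject. Below is a sketch assuming the provider was registered with Azure AD's v1 issuer URL (`https://sts.windows.net/<tenant-id>/`) and that the subject is the managed identity's object ID; tenant, audience, and subject values are hypothetical, and the condition-key prefix must match the provider URL you actually registered.

```python
def azure_oidc_trust_policy(account_id: str, tenant_id: str, audience: str, subject: str) -> dict:
    """Trust policy for a role assumed via AssumeRoleWithWebIdentity with Azure AD tokens."""
    # IAM condition keys for OIDC providers use the provider URL without "https://".
    provider = f"sts.windows.net/{tenant_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{provider}:aud": audience,  # App the token was issued for.
                        f"{provider}:sub": subject,  # The specific managed identity.
                    }
                },
            }
        ],
    }
```

Pinning `sub` matters: without it, any identity in the tenant able to get a token for that audience could assume the role.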
At that point, these are short-lived credentials, so you have the peace of mind that if they get leaked, the blast radius is very limited. For this particular case, we found a blog post with very good instructions and examples, so we'll make sure to have the link in the show notes. We also have a previous episode about how the integration between OIDC providers and IAM works, which goes into more detail about the protocols, the different ideas, and why this approach is secure. So if you are curious to go deeper into the details, we recommend checking out that episode, and again, we'll have the link in the show notes. In general, the advantage of this approach is that you don't have to store any secrets, which is great from a security perspective, because every time you store secrets, you need a process around them: you need to audit them and you need some kind of rotation. In this case, you are relieved of all these concerns. It's especially good if you already use Azure AD or some other OIDC provider, because you don't have to set all of that up. It's already in your organization; it's just a matter of connecting it with AWS. So that's another option, and I actually quite like this one. Is there anything else worth considering?
Eoin: One that might not occur to everyone (at least it didn't occur to us until the very end, when we reached into the back of our memories) is SSM hybrid activations. So what are SSM hybrid activations, you might ask? We've talked about SSM a good bit recently, because we've been discussing Session Manager in the context of bastion hosts, ECS, and EC2, where it's a nice way to access instances. But SSM also supports hybrid cloud setups through hybrid activations. The idea is that by running the SSM agent on the external host, you can open an SSM shell on it or use other SSM features like patching or Run Command. The typical use case isn't really what we're talking about here: it's more like having a fleet of Windows machines that you need to patch. You create a hybrid activation, and then you can run your patching automatically from SSM in AWS, covering your AWS instances as well as your external ones. So that's pretty useful.
But it works for our case as well, because you can just install the SSM agent on the Azure side or in your data center. Then, in AWS, you create the activation resource in any of the usual ways, and the activation is linked to an IAM role. That role needs some specific SSM-related permissions, plus whatever other permissions you need for your use case. Once you create the activation, you get a code and an ID, which are effectively your secrets in this case. When you start the SSM agent on the instance, you provide that code and ID, and it registers the instance in SSM. All of a sudden, it appears in your SSM console, and you can shell into it if you turn on that option, or just use Run Command. So if you have this role and the SSM agent running on your Azure instance, you can run a command from the AWS side that triggers some logic on the Azure side, which can then call back to AWS with the permissions you've given it. That achieves our goal as well. SSM hybrid activations come in two tiers. The standard tier is free for up to 1,000 instances. The advanced tier starts to get a bit spicy in terms of cost: it looks like about $5 per instance per month. You generally don't need it for the kind of case we're talking about, so don't worry too much. Just be aware that if you want to actually connect to your instances with Session Manager using start-session, you need the advanced tier, and that can get expensive if you've got a lot of instances. So what else have we got?
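The three steps of the hybrid activation flow can be sketched as the arguments you would hand to the SSM API and the agent: creating the activation (for example, boto3's `ssm.create_activation(**kwargs)`), registering the agent on a Linux host with the returned code and ID, and triggering logic from AWS with `ssm.send_command(**kwargs)`. The role name, instance name, managed instance ID, and script path are hypothetical; the role itself needs a trust policy for `ssm.amazonaws.com` plus the SSM permissions mentioned above.

```python
def activation_request(role_name: str, instance_name: str, limit: int = 1) -> dict:
    """Arguments for SSM CreateActivation; the response holds ActivationId and ActivationCode."""
    return {
        "IamRole": role_name,
        "DefaultInstanceName": instance_name,
        # Cap how many machines can register with this code and ID.
        "RegistrationLimit": limit,
    }


def register_command(activation_code: str, activation_id: str, region: str) -> list:
    """Command that registers the SSM agent on a Linux host (run with the agent stopped)."""
    return [
        "sudo", "amazon-ssm-agent", "-register",
        "-code", activation_code,
        "-id", activation_id,
        "-region", region,
    ]


def run_command_request(managed_instance_id: str, commands: list) -> dict:
    """Arguments for SSM SendCommand to trigger logic on the registered host."""
    return {
        "InstanceIds": [managed_instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": commands},
    }
```

Keeping `RegistrationLimit` low (and setting an expiration on the activation) limits the damage if the code and ID leak, since they behave like a registration secret.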
Luciano: I think this ties in nicely with another idea, which is a little bit different from most of the others we explored today. In all of them, you have this external system and you have AWS, and you always start the communication from the external system to invoke some kind of action in AWS. In a way, it's a push model, right? But it doesn't have to be a push model. From the perspective of AWS, it could also be a pull model: we could initiate the communication from AWS itself.
Nothing is really stopping us from using that approach. So rather than implementing an API in AWS, which was the first option we explored today, we could implement an API on the other side. Let's say it's Azure: on Azure, we expose an API, and then from AWS we call that API to trigger some kind of integration. You still need to figure out some kind of authentication, because if that Azure API is exposed on the public internet, potentially anyone could call it. So you want to make sure that only your trusted AWS side is actually calling that API and sending data you can trust. But that might not be the only way of solving this problem, because maybe that connection is not on the public internet at all. Maybe you have some kind of private network connection, and you can trust that it gives you good enough guarantees. Maybe you are allowlisting IP addresses, or you have some other form of network security. In general, today we focused more on a zero-trust approach, where every call is strongly authenticated with tokens and specific mechanisms, and you never assume that the network is secure. So we had a bit of a bias towards that kind of solution, but there is an entire realm of network-based security approaches that could be considered as well. I don't know if you have any ideas there, but I think that's also worth calling out.
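For the pull model, the IP allowlisting idea mentioned above could look like this on the Azure-side API: a check of the caller's address against the ranges your AWS side calls from (for example, NAT gateway Elastic IPs). The ranges below are documentation placeholders and the helper is hypothetical; behind a proxy or load balancer you would need the original client IP rather than the proxy's, and, in line with the episode's zero-trust bias, this is best combined with real authentication rather than used alone.

```python
import ipaddress

# Hypothetical egress ranges for the AWS side (e.g. NAT gateway Elastic IPs).
ALLOWED_RANGES = [
    ipaddress.ip_network(cidr) for cidr in ("203.0.113.10/32", "198.51.100.0/28")
]


def is_allowed_caller(remote_ip: str) -> bool:
    """Return True if the caller's IP falls inside one of the allowed ranges."""
    addr = ipaddress.ip_address(remote_ip)
    # Membership across IPv4/IPv6 versions is simply False, not an error.
    return any(addr in network for network in ALLOWED_RANGES)
```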
Eoin: We've really been talking about how to get IAM credentials where possible, and trying to base everything on fine-grained authorization. But there's a whole other set of options we didn't cover: network-based solutions, like a site-to-site VPN between your other cloud or data center and AWS. You could also have a Direct Connect link or something else already in place. So there are lots of network-based approaches you could use if you have a secure tunnel between the two environments.
We still think it's a good idea to have IAM or some other authorization in place if possible. You could also think about IoT. AWS IoT also has methods similar to Roles Anywhere and hybrid activations, where you can use certificates to get credentials to talk to AWS, but they're really geared towards lots of sensors or other IoT devices. In general, we've presented six options in total. The OIDC identity provider and Roles Anywhere approaches are the preferred ones, I would suggest, because they limit the need to store and rotate secrets. The API one is a nice one as well, but you have to make sure you have some authorization method in place. Depending on your context and your restrictions, you might still need to reach for some of the password- or secret-based options. So let us know what you think, and whether we've missed any other options or cool ideas for integrating Azure and AWS. If you like the podcast, please do leave a review wherever you listen to your podcasts. Our audience is growing, but we can always reach more people, get lots more feedback, and grow the community. So we'd love to hear more from you. Thanks for listening, and we'll see you in the next episode.