AWS Bites Podcast

78. When do you need a bastion host?

Published 2023-04-28 - Listen on your favourite podcast player

Harken, good sir! Art thou aware of the arcane art of safeguarding thy AWS instances from malevolent threats whilst keeping them accessible for thy travels? There exists a mighty tool for such purpose, and it is hight the "bastion host."

In this pamphlet, we shall unravel the mysteries of the bastion host and show thee how to use it to safeguard thy web space. We shall commence by presenting a shadowy example architecture and introducing thee to the definition of a bastion host. We shall then delve into the question of whether bastion hosts could be a security liability and explore the enigmatic concept of port-knocking.

We shall also take thee on a valiant journey of how to provision a bastion host on AWS, and explain the cryptic basics of SSH and tunnels. Thou shalt discover the dark side of managing SSH keys and auditing SSH connections, and we shall reveal the secrets of AWS EC2 Instance Connect and AWS Session Manager (SSM) as solutions.

Thou shalt learn how to accept connections without exposing a port on the public internet, and we shall introduce thee to a mysterious tool called "basti" that can make it easier to provision SSM-based bastion hosts and connect to thy databases.

We shall wrap up by revealing alternative security measures to the mysterious bastion host and provide thee with cryptic closing notes to summarize the key takeaways from this episode. Heed our call to this intriguing guide to securing thy web space, and may the forces of the internet be in thy favor!

Harken, good folk! We would like to offer our deepest gratitude to our noble sponsor, fourTheorem, an AWS Consulting Partner that doth offer training, cloud migration, and modern application architecture. Thanks to their generosity, we are able to continue on our journey of imparting wisdom and knowledge regarding AWS.

Verily, in this episode, we have made mention of the following resources:

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: When you use more traditional data storage services such as a MySQL or a Postgres database on RDS, or maybe a Redis instance on ElastiCache, it is good practice to provision these resources in a private subnet. In fact, only your applications running in your private network should be able to access these sensitive resources. We don't want them to be publicly accessible on the internet, of course. But what do you do if you want to connect to those resources from your own desktop machine? Maybe you want to run some ad-hoc queries on your database, maybe you are investigating a bug and trying to figure out what's going on in the data layer, or maybe you simply want to make sure that the data is being persisted correctly because you just released a new application and you want to check that things are going as expected.

So how can we access a resource that is not reachable from the public internet? Today we will be talking about bastion hosts, sometimes also known as jump boxes or jump servers. These are well-known ways to create a secure bridge that allows us to access these private instances through the public internet when we need to. We will also discuss some practical architectures in AWS and the trade-offs of these solutions.

We will explore the different approaches you can use to connect to a bastion host, including plain SSH, EC2 Instance Connect, and AWS Session Manager. Finally, we will give you some resources that can help you to create a bastion host when you need one. My name is Luciano and today I'm joined by Eoin for another episode of the AWS Bites podcast. AWS Bites is sponsored by fourTheorem. fourTheorem is an AWS partner for migration, architecture, and training. Find out more at fourtheorem.com. The link is in the show notes. So Eoin, should we maybe start introducing this topic by providing an example architecture that we can use to wrap our minds around?

Eoin: Why don't we start with an example like a three-tier web application architecture? So you've got a load balancer, which is public facing. You might have a web application, which is running in a private subnet, and then you've got a relational database, also running in a private subnet. For security reasons, we want the web application and the database running in private networks, and only the load balancer is public. The subnet where the load balancer is deployed is sometimes called a perimeter network or DMZ, a demilitarized zone. It's a buffer zone between the internal network and the public network. It's not possible to reach the internal web application server from the public internet, so you're reducing the attack surface there, basically. The only way you can access it is via the very specific routing that's possible from the load balancer to that web server.

And equally, it's not possible to reach your database server. You don't want your database server publicly exposed on the public internet, for many, many reasons. So this is all great, and works fine when you're running in production, until for some reason you decide you want to access that database from your local laptop. Either you've got a production issue, or you're just trying to troubleshoot some environment, maybe a staging environment which is set up in the same way, or you want to run a database user interface that connects over ODBC, for example, to your database. Now, while this is a three-tier web application, you might think of this as a problem that only exists for traditional applications, but this kind of challenge also comes up with modern serverless applications, because you can still have resources in a private VPC. You might have Lambda functions that access RDS or ElastiCache again, or Elasticsearch, or even an internal load balancer. In all of those cases, this is the kind of scenario where something like a bastion host can help. So we've said bastion a few times, Luciano, what is a bastion host?

Luciano: Yes, I'll try my best to describe that. It is basically a virtual machine, so an instance that you run on AWS in this case, and it works almost like a bridge, if we want that kind of mental model. You spin it up at the perimeter, somewhere that is reachable from the public internet, and then you configure your networking to use it to effectively route connections from the public internet, which in this case means your own laptop, to your database server. So you just put it there in the middle and use it effectively as a jump box.

That's why some people like to call it a jump box. Now you might be thinking, okay, we are trying to keep our database in a private area, probably mainly for security reasons. If we have a jump box, isn't that a security liability at that point? Like, I mean, we're re-exposing everything again, right? And that's kind of true. You have to be really careful with a bastion host or a jump box, because if you don't do it correctly, it can become a security liability.

So there are some things that people commonly do to try to keep it as secure as possible. Of course, you need to use an up-to-date operating system and have things in place to always update the operating system every time there is a new security patch available. You need to install only the bare minimum software needed. Traditionally, that's just an SSH server; you don't need much more than that. And you need to open only the ports that are really needed for accepting connections on that SSH server. Also, if you can limit the IP addresses that can connect to that machine, that's even better. Most likely you will want only your own personal IP, so your own laptop, to be able to connect on that port, so you can create a security group that is restrictive. Or, if you are sharing it with your office, you can have a list of IPs that are authorized. But the idea is: don't make it open to any possible IP, limit it to the ones that you trust. And don't use SSH with password authentication, use SSH keys instead. That's another common best practice. One more thing is that you don't necessarily need to have these instances running all the time, because we know that in AWS you can spin things up and down when you need them. So it would be nice to have a process that lets you provision a bastion host really quickly only when you want to make a connection, and then tear it down when you don't need it anymore. I think that's an even better security practice, because then you are limiting the exposed surface only to the time when you are actually using it. And there are other tricks like port knocking. Eoin, you are more familiar with that, so do you want to say something about it?
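As a rough sketch, a restrictive security group for the bastion might be created like this with the AWS CLI (the VPC ID, security group ID and IP address are placeholders; use your own current public IP):

    # Create a dedicated security group for the bastion host
    aws ec2 create-security-group \
      --group-name bastion-sg \
      --description "Bastion host - SSH from trusted IPs only" \
      --vpc-id vpc-0123456789abcdef0

    # Allow inbound SSH (port 22) only from a single trusted IP address
    aws ec2 authorize-security-group-ingress \
      --group-id sg-0123456789abcdef0 \
      --protocol tcp \
      --port 22 \
      --cidr 203.0.113.10/32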

Eoin: Port knocking is kind of just an interesting way of, I suppose, implementing a software-based combination lock for this bastion host. The idea with port knocking is that you don't open up SSH port 22 by default, or whatever port you're using. Instead, you have some special random ports that only you know about, and by hitting those ports in a certain sequence you can have a daemon on the machine which recognizes that you're essentially knocking on the door in the right order, and it will then enable the SSH socket. It could use iptables or something similar at that point to enable SSH inbound. There are a number of different implementations of port knocking out there that allow you to not leave SSH open all the time on a bastion host. So it just makes things more difficult. It's kind of an obscurity mechanism; it makes it harder for attackers to detect that you've got SSH running there. So maybe it's worthwhile just going through how you normally provision a bastion host on AWS. Traditionally, the way you do that is by running an EC2 instance.
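For illustration, with a knockd-style setup the client side might look something like this (the hostname, key file and secret port sequence are placeholders; the sequence itself is configured in the knock daemon on the bastion):

    # Hit three secret ports in the agreed order; the daemon then opens
    # port 22 (for your source IP only), e.g. via an iptables rule
    knock bastion.example.com 7000 8000 9000

    # Now the normal SSH connection goes through
    ssh -i ~/.ssh/bastion-key.pem ec2-user@bastion.example.com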

You'd usually use a very small one with a recent Linux AMI, and you give it very minimal capabilities, because all you want to be able to do is jump right through to your destination in the target VPC. And you will have to generate an SSH key pair for that machine. This is where it gets a little bit challenging, because if you've got key pairs that are long lived, then you know you've already got a security problem: you've got issues with retention of those keys, and also with people who have access to them when they shouldn't. Then you configure the security groups of that instance to accept traffic only on that SSH port.

And you would also make sure that the instance gets a public IP and is reachable from the internet. So you might have to update your routing tables, add an internet gateway, and all that kind of stuff. You should also make sure that the security group on your database can accept connections from this bastion host. And then you should be able to connect to this instance using SSH and create a tunnel to your database. You could open a shell on the bastion and then go to your database, or you can tunnel; we might talk about that in a bit. If people aren't really familiar with SSH: many will have seen it, but maybe not used it that much. It stands for Secure Shell and it's a common tool that has existed since 1995, with OpenSSH being a pretty widespread implementation. It allows you to manage a shell session on a remote machine and also to create tunnels. If you've used GitHub, had a repository of your own and pushed code to it, you've probably used SSH at some point, especially if you don't want to manage passwords all the time. So you have a server application running on the machine you're connecting to, and you also need a client application. OpenSSH is generally available on most platforms these days, including Windows, but you often see people using PuTTY on Windows as well. And it supports lots of different types of authentication. We mentioned username and password, but public and private keys with various levels of encryption are also possible. So that's SSH. And one of the cool things about SSH is its ability to tunnel. So Luciano, what the heck is a tunnel?
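Putting those provisioning steps together, a minimal sketch with the AWS CLI could look like this (the AMI, subnet and security group IDs are placeholders):

    # Generate a key pair and keep the private part locally
    aws ec2 create-key-pair --key-name bastion-key \
      --query 'KeyMaterial' --output text > bastion-key.pem
    chmod 400 bastion-key.pem

    # Launch a small instance from a recent Linux AMI in a public subnet,
    # with a public IP and the restrictive security group from earlier
    aws ec2 run-instances \
      --image-id ami-0123456789abcdef0 \
      --instance-type t3.nano \
      --key-name bastion-key \
      --subnet-id subnet-0aaaabbbbccccdddd \
      --security-group-ids sg-0123456789abcdef0 \
      --associate-public-ip-address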

Luciano: Yeah, so a tunnel is something that is sometimes referred to as SSH port forwarding. The idea is that you can create a secure connection between, let's say, your own local machine and a destination machine using this kind of jump box. On your own local machine you expose that connection on a local port, so you can then connect to that remote endpoint by just using localhost and whatever port you selected locally. Basically, through SSH you create this channel, the channel is encrypted, and it's exposed locally, and all the bytes that you send to that local port will be forwarded to the destination system, maybe the database running on RDS in a private subnet. And of course all the data coming back from RDS will be channeled back through your local port, so you can read and write this stream of bytes in both directions. It's effectively a secure way to create a channel over the public internet when you need to do something like that, and you can rely on it because it runs over TCP and it's encrypted, so it's not going to leak any information along the way. So you also mentioned that managing SSH keys is a security risk. Is there any alternative? If you don't want to take this security risk, what can you possibly do?
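As a concrete sketch, forwarding a local port to a private RDS Postgres instance through the bastion might look like this (the key file, hostnames, username and ports are placeholders):

    # Forward local port 5432 to the RDS endpoint, via the bastion host
    ssh -i bastion-key.pem -N \
      -L 5432:mydb.xxxxxxxx.eu-west-1.rds.amazonaws.com:5432 \
      ec2-user@<bastion-public-ip>

    # In another terminal, connect as if the database were local
    psql -h localhost -p 5432 -U myuser mydatabase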

Eoin: Yeah, once you have long-lived keys we know that's an issue. When we talk about IAM access keys, it's the same problem: where do you store them, who has access to them, how do you revoke them? But another challenge is that with SSH you also have to figure out how to collect logs and maintain an audit trail of access and commands executed. That's quite common for various compliance and governance scenarios. So there are a couple of ways in AWS that make this a bit easier.

The first one, which is maybe less common, is EC2 Instance Connect, and that gives you a way to have native SSH connections with short-lived keys. It works with Amazon Linux 2, and I presume the newer versions as well, and also Ubuntu AMIs; it doesn't work with other distros and OSs. If you want to install it on older versions you'll have to yum install ec2-instance-connect, and at that point you need to make sure that the principal has a policy that authorizes the special action ec2-instance-connect:SendSSHPublicKey.

There will be a link in the show notes with examples and everything you need to do to get up and running with this. Using it basically means that when you want to initialize a new connection, a new SSH key pair is generated on the fly. The public part is pushed to the EC2 instance, updating the SSH server configuration, and the private part is kept locally and used to authenticate the new connection. The public key is then removed from the server after 60 seconds, so you have to establish your connection within that time frame. Once you do that, you can connect from the browser via the EC2 console, and also with the AWS CLI.

If you do use the CLI then you have to generate the key pair yourself and run the SSH connection command. The fact that it's using SSH means that the instance still needs to have SSH installed and the SSH port has to be reachable, and if you don't want your SSH port reachable from any public IP, you can actually download the list of AWS public IP ranges and look for the ones associated with EC2 Instance Connect.
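For example, with the AWS CLI the EC2 Instance Connect flow might look roughly like this (the instance ID, availability zone and key path are placeholders):

    # Generate a throwaway key pair
    ssh-keygen -t rsa -f /tmp/eic-key -N ''

    # Push the public key to the instance; it is only accepted for 60 seconds
    aws ec2-instance-connect send-ssh-public-key \
      --instance-id i-0123456789abcdef0 \
      --instance-os-user ec2-user \
      --ssh-public-key file:///tmp/eic-key.pub \
      --availability-zone eu-west-1a

    # Connect with plain SSH within that window
    ssh -i /tmp/eic-key ec2-user@<bastion-public-ip>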

So there's a bit of work involved in that, but the link for the IP ranges will also be in the show notes. That's EC2 Instance Connect, and since it uses SSH it might be more familiar to people how it works. The other one, which I've used a lot more often and which is much more exciting in my view, is Session Manager. It's a bit more advanced and a broader solution. It works on Linux, it works on Windows, and it even works on premises and on edge devices like IoT Greengrass. What Session Manager does is provide a whole suite of things, so it's not just connecting: it provides auditable node management without the need to have sockets open or SSH keys and all of that, and you're just relying on IAM for authorization for everything. So you don't have to open any ports, no SSH, no RDP; the only thing your EC2 instances need is to have the AWS Session Manager agent installed. So this is an agent-based approach, and once you have it all set up you can create a connection from either a web browser or from the AWS CLI. All of the commands you issue can then be logged and made available in an S3 bucket or in CloudWatch Logs for auditing purposes. The connection creation is also logged in CloudTrail. Because it's IAM based, you need to make sure that the instance profile your EC2 instance is running with has specific permissions; there's a managed policy for that, or you can put the permissions in yourself. There are actually three kinds of services you need to deal with, and you'll see them when you put the actions in the IAM policy: you've got ssm for Systems Manager, ec2messages, and ssmmessages. So you have three namespaces in your actions that you need to use, and if you look at the traffic, it's actually talking to three different hosts in the background. Once you have that set up, you can create SSH tunnels through Session Manager sessions as well, so all of the stuff we talked about with SSH port forwarding can also work over Session Manager. So it might seem like, okay, how does this work? It doesn't use the normal TCP stuff, it just uses IAM and AWS API actions. What kind of black magic is this? How does it work in practice, do you have any idea?
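To give a feel for it, starting a session or a port-forwarding tunnel with the AWS CLI looks roughly like this (it requires the Session Manager plugin for the AWS CLI; the instance ID and database hostname are placeholders):

    # Open an interactive shell on the instance, no inbound ports or SSH keys needed
    aws ssm start-session --target i-0123456789abcdef0

    # Forward local port 5432 to a private RDS endpoint through that instance
    aws ssm start-session \
      --target i-0123456789abcdef0 \
      --document-name AWS-StartPortForwardingSessionToRemoteHost \
      --parameters '{"host":["mydb.xxxxxxxx.eu-west-1.rds.amazonaws.com"],"portNumber":["5432"],"localPortNumber":["5432"]}'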

Luciano: Yeah, so a few years ago I was actually trying to do something similar and I bumped into this tool called Inlets, and I got curious about what the black magic is when you don't have any inbound connection, because you literally cannot reach that machine from the outside. How is it possible that you can still create a channel and connect to that machine when you set up this kind of thing? So there is some kind of networking trick there that I needed to figure out to really understand how this is possible.

So I investigated this Inlets project. It's an open source project you can check out on GitHub; we'll have the link in the show notes. And I tried to figure out how it works, and in practice what it does is something really clever. But later on I also figured out it's really common, it's something that's been used for a while, and even tools like ngrok do something similar to expose your own laptop to the rest of the world, for example if you want to showcase a website you are running locally.

And the idea is that basically you don't accept any connection initiated from the outside, but you can start a connection from the machine itself to the outside world and then create a channel that way. And then on that channel you can basically keep sending bytes. So the trick is that you have to do the opposite: you have to initialize a connection from the machine that is in the private subnet, and then over that connection you can start to accept traffic.

But that means that the machine in the private network needs to initialize the connection with some instance that has the capability to receive traffic from the public internet. So you have to have this kind of, let's call it a service, that runs somewhere on the public internet, and the private machine connects out to that service; this intermediate public service is the one that receives connections from the outside world and then creates this kind of tunnel.
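The same trick can be illustrated with plain SSH reverse forwarding (relay.example.com stands for any host you control that is reachable from the internet):

    # Run on the private machine: dial OUT to the relay and ask it to listen
    # on port 2222, forwarding that traffic back over the outbound connection
    ssh -N -R 2222:localhost:22 user@relay.example.com

    # Run on the relay: port 2222 now leads back into the private machine
    ssh -p 2222 user@localhost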

So in the case of SSM, this is the way it works, because if you inspect the traffic on your machine you will see the machine reaching out to a domain that starts with ec2messages, under amazonaws.com. So they are definitely creating the tunnel this way, by initializing a connection from inside the machine itself, and in the case of AWS, of course, they can use the whole AWS ecosystem: they can use IAM, as you mentioned Eoin, to make sure that you are authorizing all the connections that happen.

At that point you effectively have this service being aware of all the connections that are established, and you can also use that in SSM to keep track of all the instances that you are running, to distribute patches to those instances, to see health checks, and it basically becomes a kind of overlay network that allows you to keep track of all the instances you are running and manage them. So I suppose that brings up another question: if this is such a generic solution to the problem of exposing private resources to the public internet, is this something that we can use only for EC2, or can we use it outside EC2?

Eoin: There's probably a clue you've given already, because we've said it's an open source agent, and all it needs is a connection to AWS services like ec2messages and ssmmessages, so you can probably imagine that it works in other environments too, and that's the case. It works on ECS as well, so you can get it working on ECS containers and Fargate containers, and then not just shell into the host machine but actually into the containers as well.

And that feature has a different name: it's called ECS Exec in that context, and it has a slightly different interface, but it's still just SSM Session Manager. You always need to enable these things in advance. If you need to troubleshoot and you haven't set this up beforehand, you're kind of out of luck, or you have to go back and make sure these things are turned on. So you have to enable ECS Exec in your Fargate service, and you also need to make sure you have those IAM permissions. Now, Fargate doesn't need all the permissions that EC2 needs, because its host is managed, so you just need to make sure you can perform a certain number of ssmmessages actions from your task role in Fargate, and then it can access SSM and you can shell into containers to debug and troubleshoot and all sorts of good stuff.
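For example, the ECS Exec flow might look roughly like this (the cluster, service, task and container names are placeholders):

    # Enable ECS Exec on the service in advance (new tasks pick it up)
    aws ecs update-service \
      --cluster my-cluster \
      --service my-service \
      --enable-execute-command \
      --force-new-deployment

    # Open an interactive shell inside a running container
    aws ecs execute-command \
      --cluster my-cluster \
      --task <task-id> \
      --container my-container \
      --interactive \
      --command "/bin/sh"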

Actually, you can also run it outside of AWS, so if you've got instances on premises, or hosts on other cloud providers, or even embedded devices, you can register them as managed instances in SSM and run the SSM agent on those hosts. There's a special sequence you need to follow to activate and register them, just to set up your security: you get an activation code, you register the instance, it then assumes an IAM role and can run the SSM agent, and at that point you can connect via the AWS console into even your own laptop, for example, if you run the SSM agent on it. There's actually a cost associated with running that: even though it's your machine, once you're running the SSM agent on it there is a certain cost, and you also need to enable a special advanced tier of SSM to do all of that. But you can imagine that if you're using SSM for some of the other things, not just SSH and bastion kind of stuff, but you're using Run Command to run the same command or a set of commands (documents) across a fleet of machines, or you're using Patch Manager to apply patches to thousands of machines, including on-premises machines, then this could be an advantage.
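As a sketch, registering an on-premises machine as a managed instance might go like this (the IAM role name, region and the returned activation code/ID are placeholders):

    # In your AWS account: create an activation for one machine
    aws ssm create-activation \
      --default-instance-name my-laptop \
      --iam-role SSMServiceRole \
      --registration-limit 1

    # On the machine itself: register the agent with the code and ID returned above
    sudo amazon-ssm-agent -register \
      -code "<activation-code>" -id "<activation-id>" -region eu-west-1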

Luciano: Yeah, so you are definitely selling Session Manager as the solution you would want to use if you are thinking of setting up a bastion host. But we also saw that it's not trivial: there are a bunch of steps involved and you need to do many things right for it to work, it's not just one click and everything works. So when we were reviewing the notes for this episode we were thinking, is there any tool that would help you to do all these things right? And we discovered an open source tool called basti by Bohdan Petryshyn.

I hope I'm pronouncing the name correctly. basti is basically a CLI tool that allows you to provision a bastion host in the simplest possible way. It tries to reduce the amount of knowledge you need to be able to provision a bastion host and also create a tunnel to an RDS instance or an ElastiCache instance. It's a tool written in Node.js, so you can easily install it with 'npm install --global', and then the first thing you do once you have it installed is run the CLI with the command basti init, which starts a guided procedure.

On the CLI, it will ask you a bunch of questions. For instance, it's going to list all your RDS databases and ask which one you want to connect to, and based on that choice it will figure out in which VPC that instance is running and let you select a subnet where the bastion instance is going to be provisioned. That instance is provisioned with the right policies, and basti creates the right security groups so that you can do the whole connection using SSM.

At that point, once you have everything provisioned, you can run a second command called basti connect. basti connect is effectively the part that connects to the instance you just provisioned and creates a tunnel on your local machine. So the only thing you need to do at that point is select a local port, and then you can use localhost on that port to connect with, I don't know, another CLI or maybe a graphical client, whatever you want to use to connect to your database.
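The end-to-end flow is roughly this (the exact prompts and the local port are up to you; 5432 and the psql connection details are placeholders):

    # Install the CLI and run the guided setup once per environment
    npm install --global basti
    basti init

    # Provision/start the bastion if needed and open a local tunnel
    basti connect

    # Point any client at the local port you chose, for example
    psql -h localhost -p 5432 -U myuser mydatabase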

I did try to use this tool and I was very pleased with the developer experience; using it seemed very simple. I didn't even have to think much about what I was doing, but unfortunately it didn't work the first time, and this is definitely my fault. I made two very dumb mistakes that basti cannot really protect you against. The first mistake was that I selected a private subnet rather than a public one, so basically my instance didn't have any connection from the public internet. Of course, I fixed that problem.

It still didn't work, and then, looking at the security group of my RDS instance, I realized that I hadn't configured that security group to accept traffic from instances running in the public subnet; I was accepting traffic only from instances running in the private subnet, so I also needed to fix that. At that point basti connect worked straight away and I was able to connect with my graphical client to the database and inspect the data. Another interesting thing worth mentioning about basti is that it tries to keep the costs down, even though it's running a very small machine.

So the cost would be minimal anyway. What it does is quite clever: while you run the basti connect command, it keeps tagging your instance with a timestamp. Then there is a cron job running on the instance itself that scans the instance tags, and if the latest tag is older than a certain threshold, it will automatically shut down the instance, which reduces the cost a bit more. It also makes it more secure, because you are not running that instance all the time, but only when you need it. So that's a really clever trick they figured out, and I was really pleased to look at the code and see how they implemented it. So definitely check out basti and let us know if you like it as well, and maybe, if you are into open source, it's another project worth contributing to. But are there alternatives to bastion hosts?

Eoin: You already mentioned Inlets. You might want to look at other similar tools in that realm. Tailscale is one that I use quite a lot, not necessarily for production deployments on AWS, but between my own machines and hosts. You can even connect from your phone to your hosts. I actually use this sometimes when I'm building Docker images on x86 hosts and I don't want to do it on my Mac and cross-compile: I use a remote Docker host and Tailscale does all the tunneling for that. It's really nice, has a really good user interface, and manages all your devices.

That's built on top of a technology called WireGuard, which is more or less a VPN solution. You could also use a more traditional VPN like OpenVPN, or on AWS you could use Client VPN. So those are other ways of securing access to environments. You could also go for a very basic solution: if you wanted to access something like ElastiCache or RDS, you could create a Lambda function that accepts custom commands and executes them. That's something I've seen in the past for very simple access scenarios, but I don't really think it's worth the investment, and I would recommend that people go with something like an SSM-based solution and set that up in their development, staging, and production environments from the get-go. It makes things a lot easier in the long run. You wouldn't be able to run a graphical client or have proper open sockets through a Lambda function, of course. Still, it's a simple solution for very simple tasks.

Luciano: Yeah, I'm definitely guilty of implementing that solution many times, and I think that's just because, until recently, I always found that provisioning a bastion host the right way was more complicated than I wanted it to be at the moment I needed it. But hopefully, with all the research we did for this episode and trying all these different tools, it is now much easier, and I won't need to go for a Lambda anymore to satisfy these kinds of use cases. So I think that's everything we have for today. I hope you enjoyed this episode. If you did, please remember to like this video on YouTube and subscribe, and if you are listening to the audio podcast, please leave us a review. And if you have any comments, or if you have been using other solutions that maybe we didn't mention, please let us know, because of course we'd like to hear from you, learn from you, and then share these learnings with other people. So thank you very much and we'll see you in the next episode.