Luciano: Hosting a public website on the internet is something pretty easy. You know we love static websites, we've been talking about them quite a lot and you have probably seen some of our work around them. But we also work with a lot of enterprises and they often want to do static sites but they want to keep them private for internal consumption, which means no connectivity from the outside and only connectivity from the enterprise network. So if hosting a static website publicly is very easy, doing it for private ones is a completely different story. And today we're going to talk with you about why something that seems very easy actually becomes very, very hard when you try to do it in a private way.
There are actually many solutions so we want to discover today what you can do and what the pros and cons of every single solution are. I am Luciano and I'm here today with Eoin and this is another episode of the AWS Bites podcast. AWS Bites is sponsored by fourTheorem. fourTheorem is an AWS consulting partner offering training, cloud migration and architecture consulting. Find out more at fourTheorem.com. You'll find this link in the show notes. Okay so let's start by summarizing why you might want to use a static website in a private mode, right? Because maybe if you haven't worked with enterprises it's not something that is necessarily too obvious.
And I think that the main thing to understand is that companies of a certain size they will often have what they would call internal applications or internal corporate applications and those are applications that they don't need to expose to external customers or external stakeholders but something that they use internally to fulfill specific tasks. And you can imagine something like documentation, wiki, so things, information that they want to make available inside the corporate network but it can also be a little bit more advanced than that.
So if they want to expose some kind of functionality, some kind of interactive website, they can do that using a single page application and this single page application will be totally static. It's just HTML, CSS and JavaScript and it will be relying on some internal APIs for more dynamic functionality. And one example for instance could be, I don't know, you can imagine a bank that internally needs to have a way to browse, I don't know, mortgage applications maybe and they can do all of that using an SPA and that SPA can call internal APIs and they can make it available only for internal consumption. Now again this just emphasizes how it is very common for web applications to have a static part and a dynamic part. So we are trying to figure out how do we host that static part in the most efficient way or in the simplest way and the idea is that it's going to be only files, specifically HTML, JS, CSS, maybe some images, maybe some other assets like JSON, whatever, and we need to have an easy way to serve those assets using HTTP. And for public websites, I know you are probably already thinking this is very easy, you will expect to be able to do that in like minutes just taking a repository from GitHub and deploying from there and if you have used Netlify or Vercel, this is the kind of experience you will get and we actually have spoken about that in a dedicated episode, check it out, episode 3, link is in the show notes if you are curious. How does it get difficult if you want to do it for private websites? What are the requirements there I guess is the question.
Eoin: A lot of the requirements are the same as when you're doing a public app so you want HTTPS, normally you want a custom domain so people can find it easily. You probably want as low a latency as possible for fast load times, sometimes that matters less in the corporate context but not really, people's expectations for web applications are generally just going in one direction and you want support for large and small assets so not too limited in terms of asset size. Where it gets different is that unlike a public app, like you said, it should only be accessible from a corporate network, it should not leave the network boundary, often you also have a requirement to use an internal DNS service, also sometimes a third party HTTPS certificate vendor or provisioning mechanism and sometimes even you have a requirement that even for DNS records there's no public DNS trace so even though it might be acceptable for a lot of organisations to have some public DNS resources that resolve to private IP addresses, that's not always the case and sometimes it's just a hard no on things like that. So I guess why can't we just use something like S3 then? That's probably somebody's first answer when you think about static websites on AWS.
Luciano: Yeah and I mean this is something that was also my first idea, right? If you just want to serve files in AWS, that sounds almost obvious, right? That's what S3 is for. But there is a slight caveat there which is S3 is really good because it kind of offers that website feature but it's not quite there because it only works over HTTP, not HTTPS, so it's extremely annoying that you have like 90% of what you need and that 10% becomes kind of the hard no that stops you from using that feature. So that's maybe my wishlist request for AWS to please, please, please support HTTPS on S3 websites that will make some of these things much easier and make S3 a viable option here.
Now the next best solution that we might think of if we still want to use S3 because S3 is really good for hosting files and keeping them around versioning, storing them in a reliable way and all the different things we know about S3. The next thing is probably, okay, can we just put CloudFront in front of it? And this is actually the most common way of serving public static websites. In AWS you put the files in S3 and you create a CloudFront distribution to distribute the files. And with CloudFront you get HTTPS, you even get cache, you get custom domains. So it looks like a good solution and it is definitely for public websites.
Now when you want to make it private, there are some things to consider there. First of all, you are already buying into a level of complexity that you probably don't need. Like why do you need a CDN? I think unless you are like such a huge organization with offices all around the world and you want to distribute that internal application in a way that can be performant for everyone, maybe even in different continents, maybe okay. It could make sense to think about a CDN, but for most use cases probably that's a bit of an overkill.
The other thing is that some assets somehow still need to be unsecured. Like let's say that you have a login page, most likely you would want to have a login page. How do you do all of that? Maybe you can use some kind of IDP and then MFA and all that good stuff. But once you have all of that, how do you connect it with the CDN distribution? And there you could be using Lambda@Edge and do kind of a dynamic layer of authentication there to make sure the user is authenticated before you serve that static content through CloudFront. Another thing is that you need to actually keep it private.
So how do you make it possible that if I am accessing from the internal network, I am authorized to see that content, but if I am somewhere else, I shouldn't be able to see anything? So one way is that you could be setting up a web application firewall in front of CloudFront and with a web application firewall you can define IP filtering. So you could say this class of IPs will be able to pass, anybody else is going to be blocked. And if you use private classes of IPs, that should be a secure enough approach because I think it's not possible, or if it is, I think it's not easy to spoof private IPs. So that should be a good way to create that kind of security boundary.
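The IP filtering idea described here can be illustrated with a minimal sketch in Python. This is not how a web application firewall is configured; it only shows the allow-list logic such a rule applies. The ranges in `ALLOWED_RANGES` and the function name `is_allowed` are assumptions made up for illustration.

```python
import ipaddress

# Hypothetical allow-list: replace with the ranges your organisation
# actually wants to let through. These values are illustrative only.
ALLOWED_RANGES = [
    ipaddress.ip_network(cidr) for cidr in ("10.0.0.0/8", "192.168.0.0/16")
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip falls inside one of the allowed ranges."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Anything that does not parse as an IP address is rejected.
        return False
    return any(addr in network for network in ALLOWED_RANGES)
```

Everything not explicitly listed is blocked, which matches the "this class of IPs will be able to pass, anybody else is going to be blocked" rule.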
Now there is a next issue though, which is companies will tell you, okay, but then the traffic is not flowing through my VPC because effectively we have to think of CloudFront as locations at the edge, which means that these are servers that live outside the data centers, the AWS data centers. So you cannot really have kind of a VPC only flow of data. You need to go around and that's not always desired because companies would want to track all the traffic in VPCs. They often use VPC flow logs. So they are not too happy if you tell them, well, some of your traffic is going to go in the public internet anyway. So that's another big no-no. So this approach is way more complicated than we wanted it to be. It looks feasible. It looks nice because you can use S3, but then there is a lot going on there and setting it up correctly and securely is a lot of work, and still there are some areas where it's not perfect. So what's the next idea Eoin?
Eoin: I think after that, at this point you'd probably think, okay, well, the way we used to do this is just to run it in a container and have something like Nginx serve static content. So let's have a look at that. And again, this is probably more complex than you might think. And the way you would normally do it is like, let's say take ECS or Fargate, you'd package your Nginx application, maybe with the static content in it, or else the Nginx is pulling and streaming from S3 in the background. And you can put an application load balancer in front of that. Then of course you have to make sure it's connected to your VPC correctly.
And then you're running your container in multiple availability zones. And with the load balancer, you can attach your HTTPS certificate. You can integrate it with Route 53 private hosted zones or third-party DNS, and you're ready to go. I still think it's like quite a lot of work for what you're trying to do here. If we go back to one of your first ideas, if we just had HTTPS in S3 and a private option there, you could use your resource policy and restrict it to your VPC.
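As an illustration of the resource policy idea, here is a hedged Python sketch that builds an S3 bucket policy denying object reads unless the request arrives through a specific VPC endpoint, using the `aws:SourceVpce` condition key. The function name, bucket name and endpoint ID are hypothetical, and this does not add HTTPS website hosting to S3; it only shows the shape of the restriction.

```python
import json


def vpc_only_bucket_policy(bucket: str, vpc_endpoint_id: str) -> dict:
    """Build a bucket policy that denies reads from outside one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyReadsOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Requests that do not come through this endpoint are denied.
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(json.dumps(vpc_only_bucket_policy("internal-docs", "vpce-0123456789abcdef0"), indent=2))
```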
That would simplify the thing a lot, but it seems like if you were to compare this kind of container setup to just running a very simple API with Lambda, it's a lot of work. And since we're mentioning API Gateway, I mean, there is a workaround here that maybe some people would think of in this case, which could be a bit easier, which is you could say, well, let's just use our API Gateway to serve our static content instead. And that's something that can definitely work. You can imagine an API Gateway endpoint with a Lambda function behind it, and it will take the content from your S3 bucket and push it through back to your response.
You could also do like a service proxy integration, like we talked about in the last episode, and fetch data directly from S3 and go back through the API Gateway. But there's a couple of limits there, and we talked about these limits recently when we were talking about response streaming with Lambda. You have a 10 megabyte payload limit in API Gateway. So that could be a bit of a blocking issue. You can, of course, if you're using Lambda, you can stream the response, like we mentioned with response streaming. That gives you the time to first byte benefit, which is important for this kind of scenario. And it will also give you the opportunity to go over the 10 megabyte limit. But I don't think response streaming really helps you for this case, because you only get that benefit with Lambda function URLs. And Lambda function URLs are not recommended for production by AWS, really. They are for testing Lambda functions, really, and not for this kind of highly secure corporate environment. So the other thing is that private custom domain names in API Gateway, you still need a load balancer anyway. So it seems like no matter what direction we turn to, things are just getting more complex instead of simpler. Is there any hope for a simpler approach, Luciano?
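To make the payload limit concrete, here is a minimal sketch of the response a Lambda function behind an API Gateway proxy integration could return for a static asset. It assumes the file bytes have already been fetched (for example from S3, which is left out here), and it uses the 10 megabyte figure mentioned above as the cut-off. `build_response` and `MAX_PAYLOAD_BYTES` are hypothetical names; Lambda's own response size limit may be stricter, so treat the check as illustrative.

```python
import base64
import mimetypes

# The API Gateway payload limit discussed above, in bytes.
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024


def build_response(path: str, content: bytes) -> dict:
    """Wrap file bytes in a Lambda proxy integration response."""
    # Binary-safe bodies are base64-encoded, which grows them by about a third.
    encoded = base64.b64encode(content)
    if len(encoded) > MAX_PAYLOAD_BYTES:
        return {
            "statusCode": 413,
            "headers": {"Content-Type": "text/plain"},
            "body": "Asset too large to serve through API Gateway",
            "isBase64Encoded": False,
        }
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return {
        "statusCode": 200,
        "headers": {"Content-Type": content_type},
        "body": encoded.decode("ascii"),
        "isBase64Encoded": True,
    }
```

Small HTML, CSS and JavaScript files pass through comfortably; a large image or bundle is where the limit starts to hurt.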
Luciano: There is maybe one. I mean, recently, we've been looking into AppRunner, which is a relatively new service from AWS. And the nice thing with AppRunner is that it tries to be a lot more like a SaaS offering, in a way similar to what Vercel or Netlify would be doing for you, where you just provide a repository or a container image, and you don't have to worry too much about everything else. AWS will take care of load balancers, network, scaling groups, and all the different things that we generally have to create ourselves when deploying things on AWS. So it's kind of trading control for convenience. You don't see a lot of the stuff that is going on, but you get a lot more convenience. And you can definitely run static websites on AppRunner publicly. You just do a container that will serve your static assets.
But what does it look like for a private alternative, can we do that? And until very recently, you couldn't really do private containers using AppRunner. But this has just changed because now you can create private endpoints in AppRunner, which are only accessible from a specific VPC and use the VPC interface endpoints feature. So we are going to have a link in the show notes if you want to deep dive on this feature. But this is what is kind of making us reconsider AppRunner for this particular use case. And we could get this up and running by following a few steps. So definitely we need to define the container image. So what I could think is the simplest way to serve static assets is to just spin up an NGINX container. You make sure to bundle all your static assets in the NGINX HTML folder. And then that container should be a good enough starting point for serving those static assets on the web. Now, the way that you expose this stuff to AppRunner, again, is either through a repository, for instance, on GitHub, or through an ECR registry. So you can decide whether you want to publish stuff directly to a registry or keep a repository and then AppRunner will abstract the publishing process for you. You also need to create a VPC endpoint, an interface endpoint, and something that is called an ingress connection. That is what is used to link that endpoint to the AppRunner service. Finally, you need to set up a custom domain. And that is where things get a little bit hairy, because that forces you to use a public DNS record. At least it doesn't support private hosted zones yet, but there is an open issue on GitHub. So hopefully that's something that is going to change soon enough in the future. So I think that's quite a reasonable option, but is it what we would recommend today or do we have a different final recommendation?
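The step that links the VPC interface endpoint to the AppRunner service can be sketched as follows. This only builds the request parameters; it assumes they would be passed to the `create_vpc_ingress_connection` operation of the App Runner API (for example through boto3's `apprunner` client), and the parameter names should be checked against the current API reference. The function name, default connection name and all ARNs and IDs are hypothetical.

```python
def ingress_connection_params(
    service_arn: str,
    vpc_id: str,
    vpc_endpoint_id: str,
    name: str = "private-static-site",
) -> dict:
    """Build parameters linking a VPC interface endpoint to an App Runner service.

    Assumption: these keys match the CreateVpcIngressConnection request shape.
    """
    return {
        "ServiceArn": service_arn,
        "VpcIngressConnectionName": name,
        "IngressVpcConfiguration": {
            "VpcId": vpc_id,
            "VpcEndpointId": vpc_endpoint_id,
        },
    }


# Usage sketch (not executed here, requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("apprunner")
#   client.create_vpc_ingress_connection(**ingress_connection_params(...))
```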
Eoin: I think given everything we've said, it depends on what category you fall into, but let's assume that in a strict corporate environment, you can't let anything be visible outside your organization. So no public DNS, everything is completely strict, all within the network boundary. You're not going for a zero trust policy on this at all. So if you have that case, then I think the most pragmatic solution is to go with that Fargate container and load balancer approach that you described. The way I do it though, is I say, okay, within a corporate environment, don't make every team do that every time they need to deploy a static website.
Instead, provide a centralized platform where you do this once and pretty much leave it running and you just monitor it and support it like you do everything else in your platform. Just have a centralized bit of infrastructure with your Fargate container and application load balancer, and allow people to just publish containers somewhere to serve that content, or just put their content into a bucket and automate the process of making that available then as a static website with a certificate and a domain and all of that good stuff. Just as if S3 already had that out of the box for a private corporate network. And you can also use application auto scaling with Fargate to make sure it scales to your needs. So while this is a complex enough setup for just a single web application, if you've got some strict requirements, and you've got multiple teams that you want to allow to publish static websites for different line of business applications with very minimal manual effort, then I think this is a good compromise solution.
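One way to picture the "put content into a bucket and the platform serves it" idea is a small routing function that maps an incoming request to an object key. This is only a sketch under assumptions: the `sites/<hostname>/...` key layout, the `index.html` default and the function name `resolve_key` are all invented for illustration, and fetching from the bucket is left out.

```python
import posixpath


def resolve_key(host: str, path: str) -> str:
    """Map a request's Host header and URL path to a hypothetical bucket key."""
    # Drop any port and normalise case so "Docs.corp.example:443" still matches.
    site = host.split(":")[0].lower()
    # Normalising from the root collapses "." and ".." segments, so a request
    # cannot climb out of its own site prefix.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    # Directory-style requests fall back to an index document.
    if path.endswith("/") or clean == "/":
        clean = posixpath.join(clean, "index.html")
    return f"sites/{site}{clean}"
```

With a layout like this, a team would publish by copying files under its own prefix, and the shared container behind the load balancer would do the rest.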
Luciano: I like this approach because on one side it still gives you the S3 option, which is the simplest interface I could think of if I just want to publish static assets. On the other side, if you have a platform that takes care of updates and making sure everything is always up and running with the most secure setup, and you avoid duplicating all of that work for every single team for every single project, I think that seems like a very good compromise. So it would be nice to be able to suggest some open source solution that does that, but I don't think I'm aware of one. So if you know of something like this, please leave us a comment in the comments box below. So let's maybe try to wrap up this episode by summarizing all the options we mentioned today. So we have S3 with CloudFront, which gives you good performance, global distribution and HTTPS support, but it's definitely overkill. The CDN is still public, and you are relying on IP whitelisting. So good enough, but not quite. Then we discussed containers on ECS Fargate with ALB, which is good because it meets all the strict requirements. It's probably easy enough for new teams to add web applications when they need it, but it would be a bit too complex for a single web application because you need to set up a lot of stuff up front.
Then we have API Gateway, which could be used with Lambda to fetch from S3. It doesn't need to have a multi-AZ setup and load balancers, but then it's limited to 10 megabytes, which could be a very strict limit, maybe if you're serving big files, I don't know, maybe you have big images or other kind of big enough payloads. Also doing private custom domains is not supported yet, so that's another big issue. Finally, we have AppRunner, which is an interesting solution. It is probably something that we will need to revisit more in the future. It might become one of the best solutions if they enable certain features. As of today, it's simple enough to set up, it's multi-AZ, it can give you private endpoints, but then you are still running public DNS records, so that's something that some companies might not be entirely happy with, and they might just disregard the solution because of that. Hopefully, that issue will eventually be closed and we will have that support, which might make AppRunner one of the best options out there for AWS. So that's everything we have for today, and again, if you think that we missed any idea, or maybe you have solved this problem in a different way and you want to share your approach, we'd love to hear that. I think we spent a significant amount of time thinking about this problem. For sure, we are missing some options there. I'm sure there are many other combinations of AWS services that you can use to achieve something like this, so if you know any one of them, please share it with us. We'd love to hear from you. Thank you very much, and we'll see you in the next episode.