Eoin: Zero trust is the major security trend of the last five years. These days, it feels like everything is public and more open, but you have to authenticate every connection. In this episode, we're going to delve into the topic of virtual private clouds, or VPCs, and whether they should be avoided or not as we evolve to this zero trust future. If you stick around until the end, we'll share our strategy for when to use them and when not to.
I'm Eoin and I'm here with Luciano and this is AWS Bites. AWS Bites is sponsored by fourTheorem, an advanced AWS consulting partner that works together with you on architecture, migration and cost optimization. Find out more at fourTheorem.com. You'll find the link in the show notes. Zero trust is a concept that moves away from relying on strong network controls alone to strong authorization, fine grained access control and generally fewer network restrictions.
AWS credentials and IAM are a good example of this, since you can access these resources over the internet even though they might be very powerful, very private and require strong security. But instead of enforcing network controls, AWS provides Signature Version 4 signing for every API call, very fine-grained policies with IAM, and continuous monitoring and detection on top of that. And if you've ever built a publicly accessible API with JWTs for authorization, that can also be part of a zero trust approach. This is all very practical at a time when an ever increasing number of corporate environments are made up of cloud deployments, on-premises systems and third-party SaaS applications. There's just too much corporate footprint now living outside the boundaries of the private corporate network. So you might be wondering what this means for architecting systems on AWS. Do we still need VPCs? What's the story? Where are they relevant? Are they overly complex? Luciano, do you want to start off by talking about what a VPC is, first of all? What does it mean for architects or developers working on AWS?
Luciano: Yeah, I think it's a good idea to start by defining what is a VPC. And in the context of Amazon, VPC means virtual private cloud and the idea is that you are defining a network but in software. So you're not just connecting cables around, I don't know, somewhere in your storage place but no, you are actually doing all of that remotely and you are configuring a virtual network. But it's not too different from like a real network that you might have seen in a data center or somewhere else. The idea is that it's something that is logically isolated even if it lives in the cloud together with all the other accounts that live in AWS, it is isolated from all the others. So you need to define that and configure it so that you can start launching AWS resources in that particular virtual network.
And we can imagine that, again, it's like you build your own network so that you can start connecting things and provide services to whoever needs to be able to access those services, except you're doing all of that in the cloud. And when you start creating your own VPC, what you can do is you can manage a range of different IP addresses and these are private IP addresses. You can define subnets and place them in different availability zones.
You can define your own internal routing rules for inbound and outbound traffic. You can use internet gateways and NAT instances to basically define whether that network needs to have public internet access or needs to be able to reach out to public IPs that live somewhere else in other networks on the public internet. And you can also do things that are a little bit more advanced. For instance, you could be connecting different VPCs together using a feature called VPC peering or using something like Transit Gateway. And finally, you could be provisioning VPN access so that you are able to join that VPC, for instance, from your own development laptop, and access resources that way. Or, alternatively, you could be provisioning bastion hosts. So there are different things that, as soon as you start to think in terms of creating a virtual private network, you need to think about, define, architect, and then actually provision in AWS. So I suppose with that introduction, which I hope was good enough, when do we want to use VPCs?
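As a rough illustration of the subnet planning Luciano describes, here is a minimal sketch using Python's standard `ipaddress` module. The VPC range, the `/20` subnet size and the availability zone names are assumptions picked for the example, not values from the episode, and nothing here talks to AWS.

```python
import ipaddress

# Hypothetical values: the VPC range and AZ names are illustrative only.
vpc_cidr = ipaddress.ip_network("10.0.0.0/16")
availability_zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]

# Carve the VPC range into /20 blocks and hand one block to each availability zone.
subnet_blocks = vpc_cidr.subnets(new_prefix=20)
plan = {az: next(subnet_blocks) for az in availability_zones}

for az, block in plan.items():
    print(f"{az}: {block} ({block.num_addresses} addresses)")
```

Planning the ranges on paper like this, before provisioning anything, makes it easier to leave room for more subnets later.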
Eoin: Well, historically creating a VPC was like the starting point for pretty much every new cloud project on AWS. You needed to configure the place where you put all this cloud infrastructure before you could start provisioning anything. And the rise of managed services and serverless has changed this in a lot of different ways, effectively making it easier for developers and architects to adopt cloud services without needing to think about networking, because it's all just part of AWS's managed network and you don't have to worry too much about IP addresses or routing or security groups. You're just relying on those zero trust pieces like IAM policies and roles and service-linked roles and all that kind of stuff. So why do we still need VPCs then? Well, VPCs can still be incredibly useful when dealing with sensitive data or when you need fine-grained control over the network environment. They can also be used to provide a secure connection between your on-premises infrastructure or another cloud and AWS resources. So they're not only useful for non-serverless applications, I would say; they can be used in serverless applications as well. An example is AWS Lambda: you can deploy your Lambda functions inside a VPC.
That's an optional configuration, which will give them access to resources inside that VPC, such as a database or some other services. So in general, VPCs can be used in serverless applications for a few different use cases. We mentioned accessing resources in a VPC or in an on-premises network. Another one is preventing outbound internet access to prevent data exfiltration in the event of a vulnerability. So you can imagine if there's a supply chain attack and one of your modules is kind of rogue or has a vulnerability and an attacker gets access to that environment, they might try to exfiltrate some keys or some data. If they don't have internet access, you're making it very difficult for them to do that. So having a network boundary on top of your zero trust mechanism just gives you more defense in depth there. Of course, from a compliance point of view, or just for a really strict, well thought through network architecture, you might want to do traffic analysis. And if it's all on AWS's managed network, you can't do that. But if everything is in a VPC, you can use things like VPC Flow Logs to monitor your traffic. And then you can also get fine-grained control over access to AWS services with routing, security groups, and VPC endpoints as well, which can have their own policies. Finally, I'd say service discovery through private DNS is another thing you can get if you have VPCs. So if you've got a hybrid of instances, containers, and things like Lambda functions, it might make sense to think about service discovery and DNS as well. So there are still some valid cases for thinking about VPCs. I wouldn't dismiss them just yet, but you do have to think through the trade-offs. If you've been able to avoid VPCs until now, you might think, okay, should I go and start creating network resources just because I need a database? 
Or should I continue to try and use serverless options where I don't have to think about networking? So why would you avoid VPCs, Luciano?
Luciano: Yeah, my main reason is that VPCs can be quite complicated. Even though they're useful for all the reasons that you described, they are not really that easy to get right. And even when you are just starting, there are so many concepts that you need to learn. I remember the first time that I started to work with AWS and I was just trying to deploy a simple application. I did this one-day course just to learn, I think, 11 different concepts, which are the ones that you just described. And I was overwhelmed, and it felt a bit unnecessary to have to go through all this pain just to deploy an application in the cloud.
Of course, in retrospect, I think it was very valuable to learn all these things. And I feel like now I understand the cloud much better and I can use it better. But I think the point still stands. When is it really worth it? When should you go through all these layers and learn them properly, while maybe other times you can just avoid all this complexity and focus on your application? So the complexity bit is probably the main reason. And kind of a consequence of that is that, because it can be complex to understand and set up correctly, sometimes you can make silly mistakes.
I remember one time I created a subnet, I placed a Lambda in that subnet, and I didn't realize that the subnet had very few IP addresses. So when that Lambda was running at scale, it was basically starving for IP addresses. The Lambda runtime wasn't able to provision more Lambda instances anymore. And therefore, at some point, we hit a ceiling and we couldn't really serve the users that we were trying to serve. And that was just a very silly mistake at the VPC level, in how the subnets were configured. And it was actually a bit complex to fix because we needed to redefine the whole VPC, and it was a bit messy to fix that kind of issue. This is just to give you an example of stuff that can happen. If you want to do it, you need to learn it properly and you need to spend some time making sure you understand all the implications. And similarly, you can think about routing, firewalls, security groups: so many things that can go wrong there.
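To make the subnet sizing pitfall concrete, here is a small sketch of the arithmetic, assuming only the documented AWS behaviour that five addresses in every subnet are reserved (the first four and the last one). The CIDR ranges are made-up examples, and `usable_addresses` is a hypothetical helper, not an AWS API.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Return how many addresses in the subnet are left for your own resources."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


# Illustrative subnet sizes, from very small to comfortable.
for cidr in ["10.0.1.0/28", "10.0.1.0/24", "10.0.16.0/20"]:
    print(f"{cidr}: {usable_addresses(cidr)} usable addresses")
```

A `/28` leaves only 11 usable addresses, which is easy to exhaust if each concurrently running container needs its own network interface.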
The other thing is that you might want to provision bastion hosts or VPNs. Again, if you start to have things in a private network, they are not very easy to access. Sometimes you want to access them from your own laptop, because you are trying to debug something, maybe access a database, for instance, or Redis, and you need to figure out: how do I do that? How do I enable that? And again, that requires provisioning more infrastructure and thinking more about security. You are potentially creating a backdoor that is useful to yourself, but if you don't make it secure enough, you are actually creating a security risk there. We have actually been speaking about some of these topics, specifically around the concept of bastion hosts, in another episode. So we'll have the link in the show notes if you're curious to explore that topic as well. And again, monitoring and intrusion detection. It's something you probably want to do. You want to have ways to understand what's happening in that VPC. And if somebody has access who is not supposed to, how do you spot that and how do you react to that? You need to put all these kinds of things in place somehow. So again, just to boil it down, it's not an easy feat. It's something you need to spend a significant amount of time learning and trying things before you can feel confident that you are doing it correctly and you can go to production without any big surprises. So I suppose if all of that sounds scary enough, how realistic is it that you just go VPC-less and are able to deploy even a significantly complicated application without having to think about VPCs? Is that actually possible today or not?
Eoin: Yeah, definitely possible. I've been part of teams, I think you have as well, where we've built significantly complex applications that are completely VPC-less. And I think there are plenty of public examples out there of companies who are able to do it. There are a lot of people advocating for this VPC-less kind of architecture. And I think a lot of people believe that it's kind of a core tenet of serverless applications, that you should try and avoid VPCs. And I don't necessarily agree that you have to avoid them.
Let's just think about some of the services that don't require VPCs and we might be able to extrapolate what you could do with them. So DynamoDB, for example: I think one of its advantages and appeals is that you don't have to set up any networking in advance. API Gateway, you don't need it. Kinesis. Also, most of the event services: SQS, SNS, EventBridge. Step Functions as well, you don't need to associate it with a VPC, though any of the tasks within your Step Function can connect to a VPC, of course, if you want to. Even CloudFront doesn't use VPCs.
Cognito. And then you have the compute services like Lambda, App Runner, and Glue. So these are all capable of running without a VPC. And you also have lots of third-party services that can integrate with AWS without needing PrivateLink. So you can think about, I suppose, any service that uses IAM or even services that are integrated with EventBridge. All of these services use the AWS network and don't require you to set up a VPC. And Glue, App Runner, and Lambda, those compute services we mentioned, while they don't require a VPC, allow you to specify VPC subnets so you can connect them to network-connected resources. So you have that option there. I think it's very possible then to build large, complex applications without using VPCs. But it begs the question, is this a utopian vision we should all strive for? Is it a good practice? Is it a well-architected thing in 2023 to be trying to avoid VPCs? Are they some sort of anti-pattern?
Luciano: I wouldn't say, at least in the context of serverless, that VPCs are an anti-pattern. As you said, it's something that you can avoid in many situations. But I think there are situations where, if you weigh the benefits and the complexity, there might be a trade-off where it could be worth doing serverless and also using VPCs. And just to give you some examples, probably a use case we've been talking about before is if you start to use RDS: that's the kind of service that forces you to go down the path of creating your own VPCs. And then if you want to connect a Lambda, for instance, to your own RDS database, the easiest way is probably to just put the Lambda in the same VPC where you have your database and make sure that all the routing and subnets are configured correctly so that you have access from the Lambda directly to the database. And very similarly, if you use ECS or EKS or ElastiCache or Kafka or OpenSearch, all these services are built around the concept that you effectively need to provision instances behind the scenes in a specific VPC. So you need to think about how you are going to structure this VPC and how you are going to connect things together. So this is definitely a very good case where, if you use some of these services, you are a bit forced down the path of thinking about VPCs. But again, as you said before, it's not necessarily a bad thing because you gain additional control in the sense of being able to add more in-depth security. So you can add more things around your application that make it a little bit more secure. So it might not be just extra work for you. It gives you some advantages as well. In summary, I would say that VPCs are not necessarily an anti-pattern; it's just something that you might or might not need to use. 
And if you need to use them, of course, you need to be careful not to discount the amount of work and the amount of complexity involved, and make sure you take enough time to absorb all the concepts, to try different things, and to test that your setup is actually doing what you think it needs to be doing. So, anything that you would add in terms of how to think about VPCs in the context of Lambda?
Eoin: Well, if you want to access resources in a VPC or in an on-premises network, you need to define VPC subnets and security groups for the Lambda function. It's actually interesting with security groups and Lambda functions; it can be a little bit misleading, because normally with security groups you can say what ports are allowed in and what ports are allowed out, but Lambda doesn't really work like that because it doesn't have that stateful socket connection. So the only reason you really define security groups with Lambda is so that you can use those security groups in the rules of other security groups and allow that Lambda function to access network resources. Using a VPC with Lambda used to make a massive difference to cold start times. Lambda used to create and attach an elastic network interface, or ENI, during the cold start of each container. I think that was what led to the problem you mentioned having experienced when your IP CIDR block was too small. But this is no longer the case. At the end of 2019, AWS changed this. An ENI is now provisioned for the function when you deploy it, and each Lambda container routes through this interface using a VPC-to-VPC NAT instead. So it's much more efficient. It doesn't impact cold start times like it used to, and it just makes VPCs with Lambda way more useful than they used to be. So there are some advantages of using VPCs for Lambda that are really worthwhile to state. And I think one really interesting one is something we alluded to earlier. You can prevent outbound internet access. There's no other way to do this.
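One way to picture the idea of preventing outbound internet access is to check whether a subnet's route table contains a default route towards an internet-facing gateway. The sketch below is a simplification under stated assumptions: the route dictionaries are hand-written samples modelled on the shape of the EC2 `DescribeRouteTables` response, `has_internet_route` is a hypothetical helper, and other paths to the internet (for example through a transit gateway or a NAT instance) are not covered.

```python
def has_internet_route(routes: list[dict]) -> bool:
    """Return True if any default route targets an internet or NAT gateway."""
    default_destinations = {"0.0.0.0/0", "::/0"}
    for route in routes:
        destination = route.get("DestinationCidrBlock") or route.get(
            "DestinationIpv6CidrBlock"
        )
        if destination not in default_destinations:
            continue  # not a default route, so it cannot lead to the internet
        if route.get("GatewayId", "").startswith("igw-"):
            return True  # default route to an internet gateway
        if route.get("NatGatewayId") or route.get("EgressOnlyInternetGatewayId"):
            return True  # default route to a NAT or egress-only gateway
    return False


# Hypothetical sample data: an isolated subnet only has the local VPC route.
isolated_routes = [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}]
public_routes = isolated_routes + [
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"}
]

print(has_internet_route(isolated_routes))
print(has_internet_route(public_routes))
```

A check like this could run as part of a review of the subnets assigned to sensitive Lambda functions, alongside the traffic analysis discussed in the episode.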
There used to be this product called FunctionShield, which doesn't really exist anymore, which was like a low-level interface that would try to prevent outbound internet access to prevent exfiltration. But the only way to do this effectively now is with a VPC and ensuring that there's no route to the internet from that VPC. So it's a useful security perimeter then. And if you've got that supply chain attack problem, you don't have to worry as much. You can also then analyze traffic with VPC Flow Logs and you've got fine-grained network control, right? So in a corporate environment or if you've got other network resources, it can make a lot of sense to use Lambda with a VPC. Now, in a lot of cases, even when you've got a corporate environment, I would still end up with a hybrid approach where you've got lots of Lambda functions that don't need to access those network resources. So those ones don't get assigned a subnet. And then the ones that do, do get assigned a subnet. So you can mix it like that. So I would definitely think, okay, don't worry so much about VPCs for Lambda. If you've got VPCs and private networks, it makes sense to just start using them. And we've got plenty of very serverless applications out there now that use ElastiCache and RDS and other VPC-connected resources. And it just becomes part of the architecture that you have to use VPCs, at least until we get into a situation where maybe AWS will start to think about more of a zero trust model for all services. Like with RDS, you've got the Data API now; maybe in the future we'll have other examples where you're just using more HTTP-based interaction for all of these services, using IAM instead of network controls.
Luciano: I'm going to try to do a summary of everything we just said today. I suppose that the first point is that applications without VPCs are possible, that's a given, and they can be simpler: if you are able to build something like that, you don't have the extra overhead of thinking about VPCs. But VPCs are not going away, at least not in the short term. And they are an important feature, so I think it's good, especially if you are an architect, to learn the principles, to be able to use them, and to understand the trade-off in particular: when it makes sense to use them, and when you can just keep things simple and not have to bother with VPCs. And if you're using VPCs, again, it's very important to take your time to do them correctly. There are a lot of templates that you can find online that just create VPCs magically for you. They can be useful, but at least make sure you check what they are actually doing, that you understand what's happening, and that you are okay with that particular setup.
Don't just take it for granted that there is a magic default that would work in every case. I think you need to, again, spend the time, understand it, and then make the right choices for the specific context. It can be complex, but I feel that things are going in a direction where providers are giving simpler abstractions. So I think over time we'll be able to use things that are a little bit simpler, even though I think if you know the theory, if you know what's going on behind the scenes, that's always something powerful to have, because you can understand those interfaces a little bit better and you can avoid some of the pitfalls that might still happen, even though you might have a simpler interface to deal with. There is an interesting tool called VPC Lattice, I think it's pronounced, and we'll probably be doing another episode where we talk a little bit more in depth, but the promise of this particular service is that it makes it easier to define networks that are a little bit more advanced and where you can connect things in a very dynamic way. So it could be a very interesting topic to explore a bit more, so we'll probably leave that for another episode in the future. I'm looking forward to that one, actually.
Eoin: We're getting lots of great feedback on AWS Bites online, and when we meet listeners and viewers in person, which is always great as well. If you do get something from these episodes, please do leave a review and rating wherever you listen. It helps other people to discover the show. And if you watch on YouTube, like our episodes, subscribe and share with all your friends and colleagues. We really appreciate that, so thanks for listening again, and we'll see you next week.