AWS Bites Podcast

29. Is serverless more secure?

Published 2022-03-24 - Listen on your favourite podcast player

Eoin and Luciano take you through the ways serverless can give you more security out of the box. We cover the tradeoffs between having more security control and the responsibility that comes with this power. There are always new security challenges so we cover some of the common pitfalls with serverless and AWS security in general. Finally, we share some tips to make your serverless deployments more secure.

In this episode we mentioned the following resources:

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: Is serverless more secure? In today's chat, we're going to answer this question. By the end of this episode, you will know how serverless compares with more traditional deployments in terms of security: what the main security strengths of serverless deployments are, what some of the weaknesses or things to be aware of are, common serverless security challenges, and some tips to make your serverless deployments more secure. My name is Luciano, and today I'm joined by Eoin, and this is the AWS Bites podcast. Let's start with what is probably the most commonly discussed topic around serverless and security. Of course, we are talking mostly in the context of AWS, so we are going to be talking mostly about Lambda. What do we mean when we say that it's easier in that context to apply the principle of least privilege? What do you think, Eoin? I think this is the first thing that comes up when people discuss serverless and security, right?

Eoin: Yeah, for sure. So the idea with least privilege is that if you've got very small units of deployment, very small granular functions, you can have very small granular policies that only allow specifically what that function needs to do. So if you imagine an API where you've got a GET method, a POST method, and one for listing resources, each can have its own policy attached to its Lambda execution role, and then your read-only resource accessors don't need permission to put an item in a DynamoDB table. That's a very good way to reduce the attack surface, right? So if the function is compromised in some way, and functions can be compromised, you're limiting the blast radius, you're limiting the effect that the attack can have. Of course, there are ways to inject vulnerabilities into Lambda deployments. So what are some of the ways there? I guess you're the big Node.js fan on the channel, and one of the things that we've seen from time to time is that compromised dependencies can be injected through the Node.js module system.
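
To make the per-function policy idea concrete, here is a minimal sketch in Python. The table ARN, the function names, and the `policy_for` helper are hypothetical and only for illustration; the policy document layout (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is the standard IAM JSON format.

```python
import json

# Hypothetical table ARN, used only for illustration.
TABLE_ARN = "arn:aws:dynamodb:eu-west-1:123456789012:table/orders"


def policy_for(actions):
    """Build an IAM policy document that allows only the given actions on one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": TABLE_ARN}
        ],
    }


# One policy per function: each handler gets only the actions it calls.
POLICIES = {
    "get_order": policy_for(["dynamodb:GetItem"]),
    "list_orders": policy_for(["dynamodb:Query"]),
    "create_order": policy_for(["dynamodb:PutItem"]),
}

if __name__ == "__main__":
    print(json.dumps(POLICIES["get_order"], indent=2))
```

With this split, a compromised `get_order` function has no permission to write to the table, which is the blast-radius reduction described above.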

Luciano: Exactly. That's probably one of the most common attacks, even outside the world of serverless, but of course, it applies to serverless as well. Most likely, when you write a Lambda function, you are just going to do npm install and get some useful dependency. Of course, you need to be careful with that because those dependencies can be compromised at some point. You might be installing a version of even a very common dependency that has been compromised in some way, and that dependency will be running in the context of your Lambda, and it might try to do dangerous things that we'll discuss more through the course of this episode. The interesting thing in the context of least privilege is that, if you have smaller units, you will probably want to install the minimum set of libraries that you need for every single Lambda. So every Lambda will only keep the libraries that are really needed to perform that particular Lambda's task, and that helps to reduce the surface once again, because if one module gets compromised, most likely that's not going to affect your entire application, but only the subset of Lambdas that actually use that module. So this is another way that I think we can see that the principle of least privilege doesn't just apply to IAM policies, but also applies to dependencies.

Eoin: In AWS, you have this shared responsibility model, don't you, where you have a set of things that AWS take responsibility for from the perspective of security, and then the part that you are responsible for. Is that significantly different, or how would you quantify that if you're moving from a system where you're doing EC2 instances, or maybe something in the middle like containers on EKS, and then looking at Lambda, how much of a benefit do you get with that shared responsibility shift?

Luciano: Yeah, I think this is a very good point, because let's start with the comparison with EC2. When you have to provision an EC2 instance, generally, the first thing you need to do is decide, okay, which operating system am I going to use when I build my image, and with that, which version of that operating system, and then at that point, you will probably need to install some custom software, even, I don't know, system libraries, things like that.

Eventually, you'll get to the point where you install your own code and some sort of configuration on the machine, and then at that point, you have something you can actually execute as a compute unit. So there are a lot of things that you need to keep in mind, and a lot of places where security can go wrong in one way or another, because, of course, you own all these decisions, and you need to make sure that you are doing everything as securely as possible.

When you build systems in the context of, I don't know, Fargate or ECS, so when you're using containers, it gets a little bit better, because you don't worry too much about the machine itself, where things are running. Your worries are closer to your code. But again, when you are building a container, you are still starting from that similar concept of an operating system and libraries. There is still a layer where it's not just your code, there is a lot more that you are bringing in, and you need to make sure that that layer is also not vulnerable or compromised. And again, there is a similar issue with external dependencies. If you're using third-party containers, those might be vulnerable as well, or they might carry malicious code or worms. There's a whole lot of those.

Eoin: Yeah, exactly. And I guess you're thinking about the operating system, but I guess when you have containers and instances, you've got the operating system, but also the disk. So you might have file system security to worry about, and then you have the network in a container or an EC2 instance, so you have network security to worry about too.

Luciano: Yeah, I guess the point that we are trying to make is that in the context of a Lambda, you get a smaller set of options, and therefore you also get a smaller surface. Most of the things happening in the underlying layer that executes your Lambda are managed by AWS, and AWS should reasonably take care of keeping security under control. I think we had a very good example with the famous Log4j vulnerability a few months ago, when everyone was rushing to update their deployments because, of course, it was a very, very dangerous vulnerability.

It could allow pretty much uncontrolled remote execution, so probably one of the most dangerous kinds of vulnerabilities. And Log4j is also one of the most common libraries in the Java world, so almost any Java system was exposed at that point, or at least potentially could have been exposed. So everyone was rushing overnight to fix all their Java deployments, and it's a huge surface to fix overnight.

And in the case of AWS, if you were running, for instance, a Lambda using the Java runtime, AWS almost immediately took care of patching that runtime, to minimize the risk that people could exploit that vulnerability and build attacks on it. So this is just an example of how that shared responsibility model can help in terms of security. Right. So I suppose the next topic I have in mind is another advantage of Lambda, and this is a little bit of a contentious one, because sometimes people talk about short execution times as a negative thing for Lambda, more like a blocker. But it's interesting to discuss because, in terms of security, I think it can even have a positive effect in a way. What do you think? What's your opinion on that?

Eoin: Yeah, and I can't remember who it was, but I've heard members of the Lambda team state this. When people are asking for longer execution times, they say, well, look, one of the great benefits of having a short execution time is that the attack window is much shorter. So even if people do get access to that environment, they only have 15 minutes to do the damage or to exfiltrate the data or whatever it is. But it also means, you know, as systems run, they accumulate state or cruft, and this can also have security consequences. So when things don't accumulate this kind of state (it could even be some sort of memory leak or something that could eventually open up an attack), that 15 minutes is actually a benefit. So I think in a lot of cases people should try to embrace that 15 minutes and say, okay, well, how can I split the workload so that everything runs in a short period of time in this kind of stateless way? And then you get that security benefit as well.

Luciano: Absolutely. Yeah, I agree. Even though sometimes I think that there is a little bit of a double-edged sword in this, because let's say that an attacker managed to somehow inject something in a Lambda and execute code. Again, we'll talk a little bit more about some examples. What's going to happen is that, from the perspective of somebody managing this infrastructure, whatever the attack does is going to vanish very quickly when the Lambda gets disposed of and the next execution is started, right? So it can also become harder, I suppose, to see when something bad is going on, because you don't have any simple way to run scanners constantly over your infrastructure or to detect drift, since things get recreated and deleted all the time. So if an attacker manages to time an attack very well, which is going to be hard for the attacker, of course, it can also be harder to detect that kind of attack. So that's something to be aware of, I guess.

Eoin: Are we generally talking about the normal set of attacks that people should be mindful of in any execution environment, or a subset of that? Or are there new ones that emerge?

Luciano: I think for sure we can talk about the same types of common attacks and how they change. They change a little bit in the context of serverless, but most of them are common attacks. I don't expect anything extremely new. They might just have a slightly different variation in the way that they are performed and the effects that they can have. Of course, the first ones, which we already mentioned, are data exfiltration and remote code execution, and the two always kind of go together, because one is useful for the other.

And we said that there could be ways for an attacker to run arbitrary code in the context of a Lambda. We already mentioned the case of a dependency that gets compromised. That can also happen with injection: if your Lambda is receiving external input and that external input is used in an insecure way, that might lead to remote code execution as well. So there might be several different ways for an attacker to run arbitrary code in the context of your own Lambda.

So let's say that that happened in one way or another. What can happen next? What can the attacker do? The first thing I would expect an attacker to do is probably some recon. They will try to see: OK, I am inside an AWS account, what else can I do? What's exposed from this starting point? And most likely they will try to grab the credentials of the particular Lambda where they are running, and they can do that in different ways. At that point, when you can run arbitrary code, you have access to the credentials, for instance through the environment variables.

Eoin: Which is one good reason not to store additional secrets in environment variables, of course.
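
As a rough illustration of why environment variables are an easy target, the sketch below lists what any code running inside the function process could read. Lambda does expose the execution role's temporary credentials through `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN`; the `SECRET_HINTS` naming convention and the `readable_secrets` helper are assumptions made up for this example.

```python
import os

# Lambda exposes the execution role's temporary credentials through these
# variables, so any code in the process (including a dependency) can read them.
ROLE_CREDENTIAL_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")

# Hypothetical naming convention for spotting application secrets.
SECRET_HINTS = ("SECRET", "PASSWORD", "TOKEN", "API_KEY")


def readable_secrets(environ=os.environ):
    """Return names of environment variables that look like credentials or secrets."""
    names = []
    for name in environ:
        upper = name.upper()
        if name in ROLE_CREDENTIAL_VARS or any(hint in upper for hint in SECRET_HINTS):
            names.append(name)
    return sorted(names)
```

Running `readable_secrets()` inside a function is one quick way to see how much an attacker with code execution would get for free; anything extra that shows up there is a candidate for moving to a secrets store that is read at runtime.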

Luciano: But also the Lambda itself will have a policy that gives it some permissions. If those permissions are very wide open, the attacker can start to list all sorts of resources or try to spin up EC2 instances. By spinning up EC2 instances, they can create a more persistent footprint in the infrastructure. Maybe they can spin up, I don't know, something that allows them to remotely control the infrastructure in a more permanent way.

They can steal data, because they can access S3. So you need to be very, very careful that the surface of that Lambda is as restricted as possible, because whatever the Lambda can do, the attacker will be able to do too. So it's, again, very important to apply that principle of least privilege. And as another example, maybe the attacker doesn't really care about using resources in your AWS account. They don't want to steal your money in that indirect way, where they run compute that you are going to pay for and do their own stuff with it.

Things like crypto mining or DDoS endpoints, exactly. Maybe they care more about data. So another interesting thing is that they might try to exfiltrate data. How can they do that? Again, we mentioned they could try to exfiltrate environment variables, secrets and so on, because that can give them other types of access, even to third-party systems that you use in your company. But they can also just try to exfiltrate interesting data that you might have in your account, from S3 or from databases. And of course, the first thing they are going to try is to upload the data they got access to to some remote server. So the next interesting topic is probably network traffic. Can Lambda give you ways to control the network traffic and limit this kind of attack? It can definitely be beneficial, I suppose, to limit the outbound network traffic. But how is that possible with Lambda? Is it easy? Is it obvious?

Eoin: We should talk about that. So maybe we're getting into the area of, OK, what are some of the challenges that serverless security can bring that you don't have in traditional security, let's say. And you already mentioned, like, OK, your 15 minute window might give you a short opportunity to actually spot attacks. What else do you think is challenging? Because it seems like it's fairly beneficial so far.

Luciano: Yeah, I think there are challenges, mostly from an engineering perspective. Compared with something a bit more monolithic, like a single EC2 instance or even a few containers, with Lambda and serverless you generally have a lot more moving parts. That means you need to be careful and diligent with a lot more small things, so there is more room for mistakes. You have to take care and be strict with a lot more things.

So something might slip more easily. So I think in that case, it's good to have processes in the company and a more structured approach. For instance, one thing that I really like to do, even though it's a little bit painful, is that whenever I write a Lambda, I basically give it zero IAM permissions, apart from logging and the basics. Then, say I need to use DynamoDB: I write the code and I let it fail.

And I see, OK, why did it fail? You don't have permission to put an item, for instance. Then, OK, do I need permission to put an item? Obviously, yes. How can I limit that permission? Maybe I can limit it to a subset of resources, rather than allowing PutItem with an asterisk as the resource. So I try to do that very strict exercise of, OK, now I need to give some permission, but what's the minimum level of permission that I could give?

So that's a very painful way of doing it. But I think it's one way that will help you to make sure that you are at least thinking about the minimum amount of permission you can give to a Lambda. Of course, it's still challenging because over time your Lambda will evolve. Different people will work on that same Lambda. So sometimes you end up changing the implementation and you might forget to remove permissions that you don't need anymore. So that process is something that needs to be revisited every time you do updates, not just the first time you create a Lambda.
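
One way to keep that exercise honest over time is a small check that flags wildcards in a policy document before it is deployed. This is only a sketch: `overly_broad_statements` is a hypothetical helper, it looks only at `Allow` statements, and it treats any `*` in an action or a bare `*` resource as too broad, which is a deliberately strict assumption.

```python
def overly_broad_statements(policy):
    """Return Allow statements whose Action uses a wildcard or whose Resource is '*'."""
    findings = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        # IAM accepts either a single string or a list for both fields.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if any("*" in a for a in actions) or any(r == "*" for r in resources):
            findings.append(statement)
    return findings


# First draft that "just works", and the tightened version after the exercise.
first_draft = {"Statement": [{"Effect": "Allow", "Action": "dynamodb:*", "Resource": "*"}]}
tightened = {
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:PutItem"],
            # Hypothetical table ARN.
            "Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/orders",
        }
    ]
}
```

Here `overly_broad_statements(first_draft)` returns the single wildcard statement, while the tightened policy produces no findings.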

Eoin: You might use IAM Access Analyzer, or some of the tooling that analyzes your CloudTrail logs. But yeah, I guess it's about auditing your policies to make sure they're not overly permissive. You add them, right, but then you remove your DynamoDB use. You don't need your PutItem action anymore, so you should remove it.
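
The audit idea boils down to comparing what a role is allowed to do with what it actually did. The sketch below assumes you already have a list of the actions the function called, however you obtained it; it does not show how IAM Access Analyzer works, and `unused_actions` is a hypothetical helper.

```python
def unused_actions(granted, observed):
    """Return granted actions that never appear in the observed calls."""
    # Compare case-insensitively so "dynamodb:getitem" matches "dynamodb:GetItem".
    seen = {action.lower() for action in observed}
    return sorted(action for action in granted if action.lower() not in seen)


granted = ["dynamodb:PutItem", "dynamodb:GetItem", "logs:PutLogEvents"]
observed = ["dynamodb:getitem", "logs:PutLogEvents"]
stale = unused_actions(granted, observed)
```

In this example `stale` contains only `dynamodb:PutItem`, the leftover permission from the scenario described above.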

Luciano: Yeah, that's actually a very good tool and we'll put a link in the show description because I don't think many people use it enough, I guess.

Eoin: So do you think security can get in the way, then? One of the things we talk about with serverless deployment is that it gives you the ability to increase your deployment velocity, your speed of iteration, because you're able to isolate what you're doing into small units. You're relying on less infrastructure, so you have less to deploy. You're using a lot more of what's available to you in terms of managed databases, API Gateway, and so on, so you're focusing on the minimum amount of business logic you need to implement. That's the idea. Do you think security might become a barrier to that? If you've got hundreds of IAM roles and policies for all of your Lambda execution roles and your Step Functions execution roles, can all of this slow you down?

Luciano: I think there are cases where that can happen, and it depends on the kind of process you have in the company. For instance, suppose you have an external security team, and by external I mean not working directly with the development team, and maybe they need to sign off on everything that is related to security. For instance, they need to manually review every single policy and sign off before you can deploy.

At that point that can become a blocker because you're probably going to create new policies every day and if you need to stop and wait for somebody to approve that it might block basically your work every day and you're not going to take advantage of that velocity that you could generally have with serverless deployments. So there is another case where having a good process, having collaboration and having automation and tooling, it's something that might help you in that direction and yeah at that point you might still get a good enough level of security and still retain that velocity. But I suppose it's tricky to get to that level of maturity and really understand how the process can help this way of development and vice versa how that's not going to affect security in a bad way.

Eoin: Yeah, over the past five to ten years you've had this idea of shifting security left and having development teams take responsibility for security. That happened, I guess, with the DevOps movement and containers: it became possible for developers to control things at a fine-grained level and to take on part of the responsibility for security. Does that increase with serverless? Does it decrease? Is it now the case that you have to have developer teams with an increased level of security awareness? Is that a disadvantage? Because I guess, like you mentioned, previously you might have had a team that was dedicated to security, so it was kind of somebody else's problem. Now, that might have had its disadvantages, because you have a dependency on that centralized team. But does it become an extra skill set that serverless developers need?

Luciano: I would probably say that, because security risks are growing every day and there are a lot more concerns, this is a skill that every developer needs to develop anyway to some extent, so I would be very opinionated on this. But yeah, I agree that it is also beneficial to try to reduce the gap between teams whose core skill is security and teams whose core skill is software engineering. I've seen this new term coming up a few times, DevSecOps, which I think tries to define that idea: development, security and operations are not three different things, but something that needs to work very closely together. It needs to be like one unit, one skill, one methodology, and probably one comprehensive set of tools that helps all of that happen consistently in a company.

Eoin: Yeah in this serverless world then the security where you've got that level of skill does it become all about IAM and can you dispense with the network security practices that have evolved and improved over decades? Is this more just about IAM on AWS?

Luciano: I suppose yes and no, I would say. If you just ship your Lambdas in the default VPC, then you get a standard starting point in terms of network security, where things are open to some extent. Most of the time you are just going to worry about IAM, because you take everything else for granted. But that doesn't necessarily mean that you are secure. It's just that you are not thinking about some things that can happen through the network.

With a Lambda you don't get arbitrary inbound traffic. You only get the events that you configured, so in that sense this is good. But if there is an injection, if there is remote code execution, that Lambda in a default VPC can still reach out to any arbitrary server on the internet. That's something that maybe you want to limit, because of course at that point it's a risk: somebody can exfiltrate data by connecting to any arbitrary server. If you want to control that, you are again in the realm of, okay, let's do our own custom network security, let's do our own VPC and configure network access. Maybe at that point you can have more control over what's going on with the outbound traffic and limit exfiltration that way.
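
Once a function is attached to your own VPC, its security group's egress rules are one place to check how open the outbound path is. The sketch below assumes a dictionary shaped like one security group as returned by the EC2 `DescribeSecurityGroups` API (`IpPermissionsEgress`, `IpRanges`, `CidrIp`, `Ipv6Ranges`, `CidrIpv6`); the helper name and the sample groups are made up for illustration.

```python
import ipaddress


def allows_any_destination(security_group):
    """True if any egress rule permits traffic to every IPv4 or IPv6 address."""
    for rule in security_group.get("IpPermissionsEgress", []):
        ranges = [r["CidrIp"] for r in rule.get("IpRanges", [])]
        ranges += [r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])]
        for cidr in ranges:
            # A prefix length of 0 means the rule matches the whole address space.
            if ipaddress.ip_network(cidr, strict=False).prefixlen == 0:
                return True
    return False


# Hypothetical examples: a wide-open group and one limited to a private subnet.
open_group = {"IpPermissionsEgress": [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]}
restricted_group = {
    "IpPermissionsEgress": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432, "IpRanges": [{"CidrIp": "10.0.1.0/24"}]}
    ]
}
```

A check like this only covers security groups. Whether traffic can actually leave the VPC also depends on routing, for example whether a NAT gateway or internet gateway is reachable from the subnet.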

Eoin: Yeah, it's a very difficult balance. I don't think there's an ideal solution there. Say I want to put my Lambda function in a VPC so that it can access an RDS database. That's fine: you can just ensure that the function has access to that RDS database, that they're in the same VPC, and that there's no additional routing outside it, so no internet gateway and no NAT gateway, and then nobody can exfiltrate data outside that network. But at the same time you might have an existing on-prem system, so you might have a VPC that allows you to route through to the company's corporate network so that the function can access existing on-prem systems, systems running in other clouds, or whatever it is. You can imagine that in order to give network access to one of those systems, you may have to give access to the corporate network in general, and then it becomes a bit more onerous trying to restrict it down to a single IP address, a single host, or a set of hosts. Giving an attacker access to your full corporate network, through Lambda or through EC2, is seriously risky. Usually your corporate network then has access to the internet, and you might have an intrusion detection system or intrusion monitoring system, but suddenly your blast radius seems quite large.

I always kind of believed you should try to avoid VPCs unless you have to, because then you don't have to worry about that level of access. It's also a bit more complexity: you suddenly have to think about network security, and it seems less serverless once you start bringing in VPCs. But like you say, if you don't have a VPC, you can't control data exfiltration very carefully, because the function has access to the internet. So attackers can reach their own hosts, take your secrets, take any data they can pull from your S3 bucket, and upload it to their host wherever it is on the internet.

There was a solution to this at one time, which I know you've encountered, Luciano, called FunctionShield. As far as I know, PureSec, the company that developed it, was acquired by Palo Alto Networks, so I think it's not maintained anymore. It's been absorbed into some sort of commercial offering. The idea behind it was that you could inject it into your Lambda functions, no matter what language you were using, and it would make it harder to do disk access, network access, or other command execution in your Lambda environment. I'm not sure it's there anymore, so I don't think there's a valid alternative. I'd be interested to hear if anybody else has a creative way of solving that problem of controlling internet access from a Lambda without a VPC.

Luciano: Yeah, I'm not sure how it was implemented, but the way you would use it was actually really simple and interesting. You import a module, and that module would work in Node.js, Python and Java, I think, so it was also cross-language. Then with that module you just run a function that says, I want to use this policy, and this policy says this Lambda cannot, I don't know, execute subprocesses, or it cannot use the temporary file system, or it cannot use the network. Then the library will stop all these things from happening. So that would be an extra level of security for you, because these are not things you can easily control with IAM policies. They are more behaviors of your code, and this way you can also block behaviors that you don't really need your code to perform. So yeah, I'm also interested to see if there is any other alternative these days.
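
To give a feel for the "import a module, declare a policy" idea, here is a toy sketch using Python's built-in audit hooks (`sys.addaudithook`, available since Python 3.8). This is not how FunctionShield was implemented and does not reproduce its API; `apply_policy` and its parameters are hypothetical. Audit hooks are also not a sandbox, so treat this as an illustration of the concept rather than a security control.

```python
import sys

# Audit events to refuse; empty until a policy is applied.
_blocked_events = set()


def _audit_hook(event, args):
    # Raising from an audit hook aborts the operation that triggered the event.
    if event in _blocked_events:
        raise PermissionError(f"blocked by policy: {event}")


# Hooks cannot be removed once added, so the policy lives in a mutable set.
sys.addaudithook(_audit_hook)


def apply_policy(allow_network=True, allow_subprocess=True):
    """Hypothetical policy switch: choose which behaviours to refuse."""
    _blocked_events.clear()
    if not allow_network:
        _blocked_events.add("socket.connect")
    if not allow_subprocess:
        _blocked_events.update({"subprocess.Popen", "os.system"})
```

After calling `apply_policy(allow_network=False)`, an outbound `socket.connect` from the handler or from any dependency raises `PermissionError`, which is the kind of behaviour-level restriction that IAM policies cannot express.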

Eoin: Is there also potential for attacks where... We know that one of the other advantages of serverless is that it can scale with your workload, but what if that workload isn't a genuine workload but a malicious, denial-of-service kind of workload? What do you think about that?

Luciano: Yeah, there are a few interesting cases, some of which I've even encountered myself. I suppose the point is that because you have units of compute that just spin up by themselves, and they can spin up in the order of thousands very, very quickly, there are a lot of situations where that might work against you. Let's say it's an attack. If somebody triggers a DDoS attack, that attack might spin up a lot of Lambdas for you. Maybe you don't really see a negative effect on your infrastructure, because the infrastructure can actually scale and absorb the attack, but you might see a negative effect on your billing, because suddenly you are running, I don't know, maybe an order of magnitude more Lambdas than you generally run, so your bill will probably increase proportionally. That can be a dangerous side effect of serverless that should be taken into consideration, even though it might not be strictly related to security.
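
The "bill increases proportionally" point can be shown with a small cost model. Lambda charges per request and per GB-second of execution; the function below takes both prices as parameters, and any numbers you pass in are placeholders rather than current AWS prices.

```python
def lambda_compute_cost(invocations, avg_duration_s, memory_gb,
                        price_per_request, price_per_gb_second):
    """Estimate cost as a per-request charge plus a charge per GB-second."""
    gb_seconds = invocations * avg_duration_s * memory_gb
    return invocations * price_per_request + gb_seconds * price_per_gb_second
```

Because both terms are linear in the number of invocations, ten times the traffic costs ten times as much, whether that traffic is genuine or malicious.
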

Another case that I had myself is also related to configuring your own VPC. I had a case where I configured a VPC very badly, because I put Lambdas in a subnet where there were also other services. I think it was an Elasticsearch cluster living in the same subnet. At some point there was a bug in the processing logic, so many Lambdas were failing and retrying, and suddenly that generated a huge number of Lambdas competing to handle an event that could never succeed because of the bug. So I basically saturated the subnet with Lambdas trying to do something that they would never be able to do, and that stopped Elasticsearch from scaling up, because new Elasticsearch nodes couldn't get an IP address in that subnet. It was a very convoluted set of events, but the reason is that if you don't do your network configuration correctly, and you don't consider that Lambda can scale massively in a very short amount of time, then you might have these interesting side effects where Lambda is actually competing with your other resources and can stop them from scaling as much as you expect. So of course we needed to fix the bug in the code. We also changed the network configuration to isolate the Lambdas in their own subnet rather than keeping them in a shared subnet, and we fixed it that way. But it was not something obvious, and it was not something we anticipated before we actually encountered the problem. So yeah, I suppose we could finish this episode by summarizing what serverless doesn't really protect you against. What do you think?

Eoin: Good idea.

Luciano: Yeah, so first of all, we said you don't get protection from injections. It will probably be a little bit more complicated to inject arbitrary commands, or, I don't know, SQL injection, XML injection, all sorts of injections you can think of. It will maybe be a little bit harder because you have one additional level of indirection: some sort of input gets converted to an event, and your input is encapsulated in that event that goes into Lambda. But that doesn't really give you any guarantee that people cannot perform injection attacks. You are still receiving external input, and the way you process that external input can lead to injection attacks.

We also discussed dependency poisoning. You are using third-party dependencies, so those might be poisoned, and they might create side effects and security vulnerabilities if you are not careful. So one suggestion is to use dependency scanners. Snyk is probably one of the most famous, but there are a bunch of alternatives. Make sure you have a step in your CI/CD or deployment process to always keep your dependencies under control, and at least scan them to find commonly vulnerable dependencies and update them.

Another risk is data tampering or data destruction. Again, if code is executed in your environment, a malicious attacker can do all sorts of different things. They can try to exfiltrate data, they can try to compromise data, and you don't get many guarantees from serverless that that's not going to happen to you. Maybe it's again a little bit harder, but it can still happen and you need to put boundaries in place. Remote code execution is absolutely related to that, because at that point an attacker can try to do anything: not just exfiltrate data, but run any code they want. So you might try to limit the exposure of a Lambda, but if you don't limit it correctly, an attacker can still have a very big surface to use.

I think that's all I have. We can put a bunch of links to more in-depth material. For instance, there is a good OWASP paper that reconsiders the OWASP top 10 vulnerabilities from a serverless perspective. I think you will find some common points with what we discussed today, but there is probably a lot more material there. And then there is an interesting blog article from AWS that gives you a bunch of additional tips on how you can architect secure serverless applications, so we'll put the link for that in the show notes as well. Okay, anything else you want to add, Eoin?
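
On the injection point from the summary, the event wrapper does not sanitize anything: a field from the event is still attacker-controlled text. The sketch below uses an in-memory SQLite database as a stand-in for a real one. The handler signatures (taking a `db` argument) and the table are hypothetical; `queryStringParameters` is the field name used by API Gateway proxy events.

```python
import sqlite3


def make_db():
    """In-memory table standing in for a real database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, email TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [("ada", "ada@example.com"), ("bob", "bob@example.com")])
    return db


def unsafe_handler(event, db):
    # Vulnerable: the event field is pasted straight into the SQL text.
    name = event["queryStringParameters"]["name"]
    return db.execute(f"SELECT email FROM users WHERE name = '{name}'").fetchall()


def safe_handler(event, db):
    # The driver passes the value as a bound parameter, never as SQL.
    name = event["queryStringParameters"]["name"]
    return db.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()
```

With a payload such as `x' OR '1'='1` in the `name` field, the unsafe handler returns every row, while the parameterized one returns nothing because no user has that literal name.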

Eoin: Yeah, it's never going to be possible for us to cover everything in security, but that seems like a pretty comprehensive list. I'm curious to hear what we missed, if anybody has any ideas, or any other security incidents that they've learned from.

Luciano: Yeah absolutely security is a huge topic and it's always evolving so I'm sure that there are a lot of stories that people can share and we can all learn from them so definitely please let us know in the comments reach out on Twitter and yeah we'll be more than happy to learn with you and share your learnings so please do that. Okay and with that thank you very much for being with us today and we'll see you at the next episode.