Help us to make this transcription better! If you find an error, please
submit a PR with your corrections.
Eoin: Lambda functions have revolutionized the way we build and deploy applications in the cloud, but are we really harnessing their power efficiently? Today we're going to uncover some of Lambda's best practices, combining insights from AWS documentation, but also our own experiences to help you to optimize your serverless architecture. Overall, we want to make sure you have a great time when working with Lambda.
I'm Eoin and I'm joined by Luciano for another episode of the AWS Bites podcast. AWS Bites is brought to you by fourTheorem, an AWS partner that specializes in modern application architecture and migration. We are big fans of serverless and we've worked on a lot of serverless projects, even at a massive scale. If you are curious to find out more and to work with us, check us out on fourtheorem.com. One of the things that's worth starting with is how Lambda works under the hood, and we've covered a few different episodes in the past about AWS Lambda, but it's worthwhile having a brief refresher. So, Luciano, do you want to tell us how does Lambda work?
Luciano: I'll try my best and I think this is one of the interesting topics about Lambda that is often not covered enough or not discussed enough. So let's try to discuss how Lambda works under the hood, because I think it's important to come up with a good mental model. And once we do that, I think it's going to be easier to understand how to write Lambda code that is more optimized for that particular kind of environment.
So the first thing to clarify is that Lambda as a service, by definition, is serverless. So that means that if you're not using that particular service, for instance, if you deployed an application but nobody's using it, ideally nothing should be running at that moment in time. And that also means, as a nice side effect, that you don't get charged for it. So it's not like you have a server spinning 24/7 that you're going to be paying for even though maybe nobody's actually using your service.
With Lambda, if nothing is running, you don't get charged for it. It's also event-based, which means that the computation starts when an event is triggered. And when that event happens, AWS will need to be able to run your Lambda code somewhere. And this is where things get a little bit interesting, because if there is no instance of your Lambda at that moment in time, or maybe the caveat there is if you have other instances, but they are all busy, AWS will need to figure out, okay, where do I run the code for this particular event?
And this means that AWS needs to spin up some infrastructure, which generally means loading the code from S3, creating some kind of lightweight virtual machine, and make sure that that virtual machine can run your code. And of course, doing all of that extra setup takes a bit of time. And the time can vary between milliseconds, or even a few seconds, depending on the kind of runtime that you pick for that Lambda function.
And because this is extra time that you are spending not actually executing any code from the Lambda function, this is generally called a cold start. So you need to give AWS time to bring up all the infrastructure before it can run your own code, and that waiting time is the cold start. Now at some point, you take that event, execute that code, you do something useful with it, and you provide some kind of useful response to the system.
So in a way, you can say that that particular Lambda execution is completed, but the infrastructure related to that Lambda function is not immediately disposed of by AWS; it will be kept around for a few minutes. AWS doesn't state exactly how long that time is; we have seen it vary between five and 15 minutes. But it's in the order of minutes. So you need to be aware that your Lambda might stay around doing nothing for a few minutes before it's disposed of.
And the reason why that happens is because if a new event comes in during that period, AWS can be smart about it and recycle that instance, rather than creating an entirely new one for every single event. This is an optimization called a warm start: your code can be executed immediately without waiting for another cold start. Now, why is it important to understand all of this? In practice, once you understand the lifecycle of a Lambda function, you can write code that is way more optimized for this environment.
And just to give you an example, I think at this point it should be clear that there are more or less two phases of your code. One is what you can call the init phase, and the other one is what you can call the event handling phase. The init phase generally happens only once for every Lambda instance that gets created. And generally, you can use this init phase to create all the expensive resources that you might want to recycle across multiple invocations of the same Lambda instance.
For instance, if you need to establish a database connection, or if you need to create some kind of client, maybe an AWS SDK client, all of these resources should be made globally available so that multiple executions of your handler can access them and they don't need to be recreated over and over for every single event. This is something that might be done slightly differently depending on your choice of runtime and language. But pretty much with every language and runtime, you have a way to have this kind of initialization phase, and then you have handler logic that can be called over and over, once for every event, and it can access some kind of global space where all these instantiated globals are available, so you can just use them rather than having to recreate them within your handler code. So what can we cover next? I think one interesting topic is maybe clarifying the different ways you can invoke a Lambda function. Should we try to describe all of that, Eoin?
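To make the two phases concrete, here is a minimal Python sketch; the client and handler names are illustrative. Everything at module level runs once per instance (the init phase), while the handler runs once per event and reuses the globals.

```python
import json

def create_expensive_client():
    # Stand-in for e.g. a database connection or an AWS SDK client.
    return {"created": True}

# Init phase: executed once per Lambda instance (on a cold start only).
CLIENT = create_expensive_client()
INVOCATION_COUNT = 0

def handler(event, context):
    # Event handling phase: executed once per event, warm or cold.
    global INVOCATION_COUNT
    INVOCATION_COUNT += 1
    # The same CLIENT object is reused across invocations of this instance.
    return {
        "statusCode": 200,
        "body": json.dumps({"invocation": INVOCATION_COUNT}),
    }
```

Calling the handler twice simulates two warm invocations of the same instance: the client is created once, while the handler body runs each time.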
Eoin: Yeah, briefly we definitely should, because this is a really important thing when it comes to understanding why best practices are good practices in the first place. And let's break it down. You've got synchronous invocations, asynchronous invocations, and then you've got the third type, which is polling-based or event source mapping. So with synchronous invocation, you have something waiting for the Lambda function to complete and issue a response, and that applies to services like API Gateway and Application Load Balancer, and it also applies to some other more niche cases like CloudFront Lambda@Edge functions, Cognito Lambda function integrations, and you can also synchronously invoke things from the API or the CLI if you choose request response mode.
Now, the best practice when it comes to synchronous invocation is, probably obviously enough, to try to keep the execution time as short as possible. For example, if it's an API, you should return a response before an HTTP timeout will occur. API Gateway will give you a 30-second timeout, so you should probably set your Lambda function's timeout to 29 seconds or less. And when it comes to synchronous invocation, another good practice is to monitor concurrency and throttling, because otherwise, when a lot of synchronous invocations come in, you'll end up in trouble, your users are going to experience those errors, and you can't handle that seamlessly for them.
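As a small illustration of budgeting execution time, here is a hedged sketch built on the Lambda context's real get_remaining_time_in_millis() method; the safety margin and the FakeContext stand-in are our own inventions for local testing.

```python
SAFETY_MARGIN_MS = 1000  # our own arbitrary buffer before the hard timeout

def has_time_left(context, needed_ms):
    """Return True if the invocation still has needed_ms plus a safety margin.

    In a real handler you could check this before starting a slow step,
    and return early or fail fast instead of hitting the timeout.
    """
    return context.get_remaining_time_in_millis() >= needed_ms + SAFETY_MARGIN_MS

class FakeContext:
    # Minimal stand-in for the Lambda context object, for local testing only.
    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms
```

With a 29-second function timeout behind a 30-second API Gateway limit, a check like this lets you return a clean error response instead of letting the gateway time out on your users.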
So if you're not able to spin up new Lambda functions, you'll directly affect those end users. Now, when it comes to asynchronous invocation, this is probably the most common, actually, but with async invocation, you're not waiting for the response. It happens in the background. Now, the execution in this case can take longer, always up to the Lambda global limit of 15 minutes. It also means that Lambda is going to retry if there's a failure, so this is something you can't do with synchronous invocations without handling it yourself.
But with async invocations, Lambda will retry up to three invocations total per event in the case of an error. And it does that by queuing up invocations internally, for up to a maximum of six hours, if it doesn't have enough concurrency available when the event first comes in. So Lambda has its own internal queue, a bit like an SQS queue, but you don't see it. It's all internal, and that's how async functions work.
Now, when it comes to failure handling with async events, you can configure a dead letter queue, or DLQ, to automatically receive events that have failed (after up to three attempts), and then you can inspect them for further investigation, and you can also redrive them back into the function. DLQs only support SNS and SQS as targets. Now, there is another way to do this kind of failure handling, and that's with destinations.
So destinations are a newer and probably better way of doing it that is more recommended these days, and you can have a success destination and a failure destination. And then that destination can go to EventBridge, SQS, SNS, or onto another Lambda function. And one of the advantages of failure destinations is that it gives you additional metadata about the event that caused the failure. Now, when it comes to async event invocations, services like S3 and SNS are examples of services that will invoke asynchronously. EventBridge is another one.
One of the additional best practices that you should really consider when looking at asynchronous invocation is implementing idempotency. And in event-driven architectures in general, idempotency is a very important thing to understand, and it basically means that the result of invoking a function more than once should have the same outcome as just invoking the same function once. So if you end up sending the same event to a destination, to a target for processing multiple times, you shouldn't end up with any additional side effects just because you received that event multiple times.
And the reason why idempotency is important is that most of these delivery modes have at-least-once delivery semantics, so they don't guarantee that they'll deliver an event to you exactly once, because that's a really hard thing to do. You should be prepared to expect more than one invocation for the same event, and there are tools that can help you to implement idempotency, like AWS Lambda Power Tools, which we'll talk about a little bit later as well, especially when we talk about monitoring.
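A minimal sketch of the idea, using an in-memory store; in production you would want a persistent store such as DynamoDB, which is what the Power Tools idempotency utility manages for you. All names here are illustrative.

```python
_results = {}  # idempotency key -> cached result (in-memory for the sketch)

def idempotent(key_fn):
    """Decorator: run the function once per idempotency key, cache the result."""
    def decorator(fn):
        def wrapper(event):
            key = key_fn(event)
            if key in _results:
                # Duplicate delivery: return the cached result, no new side effects.
                return _results[key]
            result = fn(event)
            _results[key] = result
            return result
        return wrapper
    return decorator

processed = []  # records the real side effect, for demonstration

@idempotent(key_fn=lambda event: event["messageId"])
def handle(event):
    processed.append(event["messageId"])  # the side effect we want exactly once
    return {"status": "done", "id": event["messageId"]}
```

Delivering the same event twice now produces the same response both times, but the side effect happens only once.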
When it comes to async events, there's some nice new metrics you get like the async event age, which will tell you the time between the event coming into Lambda to be queued and ultimately being invoked. And if that turns out to be more than you expect, there's probably a concurrency or a throttling issue or also a failure you need to look out for. And you'll also get a metric that tells you how many dropped async events there have been.
So if events aren't processed because they stayed in the queue for more than six hours, that metric will tell you all about it. So that's synchronous invocations and async invocations. And the third one is polling invocation, otherwise known as the event source mapping. And it's called event source mapping because there's a separate piece called an event source mapping that takes events from a source and essentially synchronously invokes the function for you, while managing the failures and retries within this event source mapping feature.
And it applies to things like DynamoDB streams, Kinesis streams, SQS queues, and Kafka topics as well. When it comes to best practices, idempotency applies here equally. You should also monitor the size of the queue or the iterator age of the stream, because if it's growing and you're not processing items fast enough, that can result in a problem for you. Then when it comes to Kinesis, it's important to make sure that you have a process that keeps track of repeated failures.
So items are usually processed with Kinesis in order per shard. And if you want to know more about that, you can look back at our previous Kinesis episode where we dived deep into it. And if you can't process an event in an ordered stream, AWS by default will retry indefinitely and basically block up the whole stream from being processed. So that's something you'll need to configure and handle accordingly.
And then another thing with things like Kinesis, you can receive batches of like up to 10,000 events. So if you receive batches of messages and one of them fails, you don't necessarily want them all to fail. So there's a few different ways to tell AWS which item succeeded and which ones failed. So only the failed ones will need to be reprocessed. So we have a whole set of articles around event processing, which we'll link in the show notes, and you can take a look there. So those are our three different invocation modes. Important to get those out of the way. Maybe now we could talk about best practices relating to performance and cost.
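To illustrate reporting partial batch failures for an SQS event source, here is a sketch; the {"batchItemFailures": ...} response shape is what Lambda expects when the ReportBatchItemFailures setting is enabled on the event source mapping, while process_record is a made-up business function.

```python
def process_record(record):
    # Hypothetical processing; raise to simulate a failure.
    if record["body"] == "bad":
        raise ValueError("cannot process this record")

def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process_record(record)
        except Exception:
            # Report only this message as failed; the rest of the batch
            # is considered successfully processed and won't be retried.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

With this in place, only the failed messages go back to the queue for reprocessing instead of the whole batch.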
Luciano: Yeah, it's very important to understand, first of all, what is the formula to calculate cost when it comes to Lambda. And with many things in AWS, it's not always easy to predict cost with like extreme accuracy. But in the case of Lambda, it's not that bad, meaning that there is a simplified formula that we can use to get a feeling for what the cost is going to look like. And this generally is a function of the memory that you pick for your Lambda invocation and the execution time in milliseconds.
So that basically means that there is a price unit that changes based on the amount of memory that you allocate for your Lambda function. And then you have to multiply that unit price by the number of milliseconds that you have been invoking that particular function. And of course, if you have concurrent invocations, every invocation accounts for its own cost independently, so those are additional milliseconds that you need to multiply as well.
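As a back-of-the-envelope illustration of that formula, here is a sketch; the unit prices below are the published x86 on-demand prices for us-east-1 at the time of writing, they ignore the free tier and tiered discounts, so treat them as placeholders and check the current pricing page.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 us-east-1 rate
PRICE_PER_REQUEST = 0.0000002       # $0.20 per million requests

def monthly_cost(memory_mb, avg_duration_ms, invocations):
    """Rough monthly cost: memory (in GB) x duration (in s) x invocations."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST
```

For example, a 1024 MB function averaging one second per invocation, invoked a million times a month, comes to roughly $16.87 under these assumed rates, and halving the memory roughly halves the compute part of the bill (as long as the duration doesn't grow, which is exactly the trade-off discussed next).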
Now, an interesting thing is that CPU is not part of this mix, and network is not part of this mix either, because these other two units are automatically provisioned for you by AWS, and they are proportional to the amount of memory that you pick. And again, it's not always super clear exactly how much CPU or what kind of networking speed you get. But the general gist of it is that the more memory you provision, even if you don't necessarily need all of that memory, the more CPU you get and the faster the networking for your Lambda is.
So just be aware of these two things, because sometimes you might try to save cost by reducing the memory to the minimum, but then you end up with very little CPU or a very slow network. And that might make your Lambda spend so much more time completing the computation that it doesn't actually result in any cost saving. And you might even have degraded the user experience if you have a user on the other side waiting for things to happen.
And this is where things get a little bit tricky. Thankfully, there is a tool that can help you figure out exactly what is the sweet spot between performance and cost. This tool is called Lambda Power Tuning, by Alex Casalboni. We will have a link in the show notes. What it does is basically run your specific Lambda, so it's not doing something generic: it's actually using your Lambda code, trying different configurations in terms of memory, and giving you a nice chart where you can effectively see, for all the different configurations, what is the execution time and, compared to that time and the amount of memory used, what is the cost.
And at that point, you can decide for that particular Lambda function. Maybe you want to prioritize cost, and it's fine to be a little bit slower, but you're going to save money. While with other Lambda functions, you might want to optimize for performance, and it doesn't matter if you're going to be spending a bit more, because you really want those milliseconds reduced to the minimum. Sometimes there is a good middle ground.
So the chart that you get at the end is definitely a very good way to figure out exactly what the sweet spot is for you for that particular use case. Another thing worth mentioning, which is not necessarily a best practice, because again, it comes with lots of complexity and it's not always a binary choice, is that you might consider using a compiled runtime, as opposed to runtimes such as Python, JavaScript, or Java.
You might want to use something like C++, Rust, or Go, because these runtimes generally have very good performance in terms of cold starts and execution times, just because those languages are more optimized for certain tasks. And this is especially true if you have CPU-intensive Lambdas. Now, the problem with that approach is that sometimes learning those languages is much more complicated than learning JavaScript or Node.js.
So the trade-off is a bit more on making an investment in terms of knowledge and maintenance costs, and then you might get benefits in the long term because your Lambdas might be more efficient. But this is not always an easy choice to make. You need to take care of making sure that in your team you have people that can do that efficiently, they have all the training available, and effectively you are introducing potentially new languages that you might have to support long term.
Another key element to this conversation is that it also affects sustainability, because those compiled languages generally have a much better carbon footprint. So if sustainability is something that you really care about, this might be another element to bring into the equation when you decide whether to invest in these particular languages. And this is something that was mentioned by Werner Vogels in his keynote during the latest re:Invent.
So again, something else worth considering, but I wouldn't say it's an easy choice. Every team generally might end up with different choices depending on what they know already, what their skills are, how much do they want to invest in learning new technologies. And sometimes you don't always have requirements of looking for extreme performance. So yeah, up to teams to decide what to do, but we would be really curious to know if this is something you are considering for your team. I guess the next topic is how should people structure their Lambda code? This is something we get asked a lot, to be honest. And again, maybe there isn't a one way of doing it, but for sure there are some best practices that I think we can recommend.
Eoin: Yeah, this is true. There's definitely not one way to do it and one right answer, but it is a good idea to have some sort of layered architecture and make sure that the handler function itself ideally only contains glue code. So the layer that adapts the Lambda interface and the event source into the underlying implementation. So you can pass it to a service function or a piece of domain logic, and then your handler is basically just serializing the input, passing it on and transforming the output into the required response type.
Now, this is going to make your code more testable because it allows you to test the service logic independently from the Lambda function itself. There's lots of ways of doing this. As we said, hexagonal architecture is one which you'll see mentioned quite a lot, especially in the serverless space, but there are other ways of implementing this, like clean architecture, et cetera. There's a blog post from Luca Mezzalira on hexagonal architecture, which we'll link in the show notes, but there's lots of resources out there on it.
If you want to test the whole Lambda function logic itself, you can use integration tests then and end-to-end tests and we'd definitely recommend that because Lambda as a service itself can introduce lots of interesting behavior, which you should cover in some sort of automated test and even performance and load tests as well. You can also abstract dependencies like storage or interacting with other AWS services as part of an architecture like this, and then you can easily mock these abstractions or have secondary implementations you can use for local testing.
Now, let's give a quick example. Let's say we've got an e-commerce shopping cart example and you want to add an item to a cart and you've got an API endpoint to achieve this. You can define what the expected inputs and outputs are and the validation rules for this API endpoint. So your input might be a cart ID, an item ID and a quantity. With your validation, you can say, well, the cart ID must exist in the database and be associated with an active user.
The item ID must exist in the database and it should be associated with a valid product and then a quantity should be a reasonable positive integer, maybe below some arbitrary limit, depending on your context. It's always good to have lower and upper bounds. And the output then would be an updated view of the cart with the new items included perhaps. Now, this validation is all something that you should be considering implementing in your Lambda Function or at some level in the stack, possibly multiple levels.
Then also defining all the errors, right? Not just thinking about the happy path, but what are the different errors and what is the information associated with those errors that you want to leak outside your function. Then you can implement code to abstract the persistence layer, like how you're storing the state of your user's cart across requests. And this layer might have helper functions such as creating a cart, adding an item to a cart, removing an item from a cart, and maybe emptying the cart.
So that could be in a data access layer or repository. And then you implement the service that takes the inputs, like the cart ID, the item ID, and the quantity, and updates the given cart or returns an error. Then you would update the handler so that it takes the event, validates it, extracts the cart ID, item ID, and quantity, and passes them to the service. It would then take the response from the service and convert it to the appropriate HTTP response, if this is an HTTP API, and the handler would take care of working with the Lambda proxy integration event and response format.
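The layering just described might be sketched like this; all the names (CartRepository, add_item_service, and so on) are illustrative, and an in-memory dictionary stands in for a real database such as DynamoDB.

```python
import json

class ValidationError(Exception):
    pass

class CartRepository:
    """Data access layer; a real one would talk to a database."""
    def __init__(self):
        self.carts = {}

    def create_cart(self, cart_id):
        self.carts[cart_id] = {}

    def add_item(self, cart_id, item_id, quantity):
        cart = self.carts[cart_id]
        cart[item_id] = cart.get(item_id, 0) + quantity
        return dict(cart)

def add_item_service(repo, cart_id, item_id, quantity):
    """Domain logic: validation plus the actual operation, no Lambda types."""
    if cart_id not in repo.carts:
        raise ValidationError("unknown cart")
    if not isinstance(quantity, int) or not 1 <= quantity <= 100:
        raise ValidationError("quantity must be between 1 and 100")
    return repo.add_item(cart_id, item_id, quantity)

repo = CartRepository()  # init phase: created once per instance

def handler(event, context):
    """Glue code only: deserialize, delegate to the service, serialize."""
    body = json.loads(event["body"])
    try:
        cart = add_item_service(repo, body["cartId"], body["itemId"], body["quantity"])
    except ValidationError as err:
        return {"statusCode": 400, "body": json.dumps({"error": str(err)})}
    return {"statusCode": 200, "body": json.dumps({"cart": cart})}
```

Because the service and repository know nothing about Lambda, they can be unit tested directly, and the handler can be exercised with a fake proxy integration event.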
Now, you don't necessarily always have to have all these layers of abstraction. Sometimes it is okay to keep things simple, especially if you've got something that doesn't do something very complex, as long as you've got some way of testing your code and you have confidence that it does exactly what it's supposed to do and that if you apply changes later, you can easily test it and you don't have to rework your tests completely.
And you can also use a middleware library like Middy.js in Node.js, JavaScript, TypeScript, to abstract some of this logic, like validation and serialization, and make it reusable and easily testable. And you can refer back to our previous episode on Middy to find out all about that. And again, this will be related to Power Tools, which maybe we should talk about next. The Power Tools we mentioned already in the show in the context of idempotency and also middleware because it integrates with Middy, but even in Python, it will provide you that middleware support as well. What other things can Power Tools provide? I think we probably regard it as just a default best practice to just start off with Power Tools and functions these days.
Luciano: Yes, I think before we get into the details, it's worth clarifying that Power Tools is effectively a library that exists for different languages and specifically targets Lambda. It tries to solve some common problems that you might have when writing Lambdas, and it tries to provide a comprehensive, batteries-included solution that gives you tools to solve these problems. The common things that Power Tools started with are logs, metrics, and traces, but we already mentioned idempotency too, and different versions of the library, meaning different languages, might have more complete support for some things than others.
Generally speaking, the Python one, probably because historically it was the first one, is the most complete, and the other ones tend to follow along after a few months with the new features. So definitely check out, depending on your language of choice, what the available utilities are, but Power Tools is definitely something you should be using, at the very least for logs, metrics, and traces. But there is more, like idempotency, which is absolutely useful, and if it's there for your runtime, you should definitely consider it.
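To give a flavour of the structured logging that the Power Tools Logger provides, here is an approximation using only the Python standard library; the real Logger also injects Lambda context fields, supports sampling, and more, so this is just a sketch of the output shape.

```python
import json

def make_log_line(level, message, **context):
    """Emit one JSON log line; CloudWatch Logs Insights can then
    query individual fields instead of grepping free-form text."""
    record = {"level": level, "message": message}
    record.update(context)
    return json.dumps(record, sort_keys=True)
```

A call like make_log_line("INFO", "cart updated", cart_id="cart-1", quantity=2) produces a single JSON line with those fields, which is the key idea behind structured logging.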
Other things that are in there include, for instance, support for OpenAPI specifications. For instance, the Python one recently introduced lots of helper methods to make all of that process easier. I'm not really sure if that's already supported in the Java or Node.js equivalents, but it's something that eventually is going to come to those versions of the library as well. Another thing, if you really care about metrics and alarms, is that you might want to check out a tool we have mentioned before called SLIC Watch, which is an open source tool that we created at fourTheorem to simplify, almost automate, the creation of sensible defaults when it comes to metrics, alarms, and dashboards.
So worth checking it out because it can make your life easier, and it can effectively speed up the work around covering those areas for at least 80% of the use cases. Other things that we already mentioned in terms of tooling is Lambda Power Tuning for performance. But when it comes to all this topic, we have a bunch of previous episodes where we cover details about how to do good logging, how to use CloudWatch for logs, how to use CloudWatch alarms, how to do metrics with CloudWatch.
So we'll have all the links for this episode in the show notes if this is a topic that you want to dive into and really understand the details. Moving on to another area, I'm just going to go through some quick suggestions, maybe going into less detail compared to the previous areas we covered today, but hopefully you'll still get some value from some quick suggestions on things to focus on when it comes to writing Lambdas.
And one topic, for instance, is configuration, and it can be a very big topic. There are different ways to manage configuration when it comes to AWS. So the quick suggestion we have on this one is: when it comes to secrets, don't store them in clear text in an environment variable, or even inside your code, which would probably be even worse. There are better ways to do that. For instance, you can use Secrets Manager, or you can use SSM and have encrypted parameters.
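The SSM approach might look something like the following sketch; the parameter name is made up, and the client is injected so it can be faked locally, but get_parameter with WithDecryption=True is the real Parameter Store API shape you would call through boto3.

```python
def fetch_secret(ssm_client, name):
    """Read a SecureString parameter from SSM Parameter Store, decrypted."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]

# In a Lambda you would do this once, in the init phase, not per invocation:
#   import boto3
#   SECRET = fetch_secret(boto3.client("ssm"), "/myapp/db-password")
```

Injecting the client also keeps this testable: locally you can pass any object with a matching get_parameter method.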
So generally speaking, the recommendation there is that, when it comes to secrets, it can be very convenient to store them in plain text in your code or in environment variables, but if you can avoid that, it's best to do so. And the other point is infrastructure as code. This is a topic that we also cover extensively. So the only recommendation we have here is: use it. There is no excuse not to use it.
Of course, you can prefer different tools. There is SAM, Terraform, CDK, the Serverless Framework. We have covered pretty much all of them in previous episodes, and we'll have the links in the show notes. But the point is, regardless of which tool you prefer, you should be doing infrastructure as code, because the advantages definitely outweigh the disadvantages, which might be a little bit of a learning curve, because you need to learn a new tool and become familiar with it.
But then there will be such great advantages that they're definitely going to pay off that initial investment big time. So if you haven't picked up infrastructure as code yet, definitely put it at the top of your to-do list, because you're going to be grateful going forward. And for sure, the whole management of infrastructure is going to become so much easier and so much more reliable. That is something you will be thankful to have finally tackled.
And finally, one last point is security. This can be a massive topic on its own, but the one quick tip that is particularly relevant for Lambda is to apply the principle of least privilege. And the reason why I think this applies particularly to Lambda is because Lambdas generally have such a small focus, like they are one purpose only, so you can really fine-tune the permissions for that one particular purpose.
So you might, for instance, going back to our example of the add-item-to-the-cart Lambda, give that Lambda only the permission to write that particular item, maybe to a DynamoDB table. So it can only add items to carts and not do anything else. So if that Lambda gets compromised, it's not like an attacker can read passwords or maybe manipulate credit card details. An attacker will only be able to add items to a cart. So, of course, not ideal anyway, but the blast radius is very, very limited. And this is why with Lambda it's even more important to apply this principle, because you can really fine-tune it to the maximum. And therefore, your application as a whole is going to become much more secure, at least compared to a more traditional and monolithic architecture where effectively your weakest spot becomes the biggest vulnerability of the entire system.
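Going back to that example, a least-privilege policy for the add-item Lambda's execution role might look roughly like this; the table name, region, and account ID are made up for illustration.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["dynamodb:UpdateItem"],
      "Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/Carts"
    }
  ]
}
```

Note that the role allows a single action on a single table, nothing else: no reads, no deletes, no access to other tables.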
Eoin: There are a lot of other areas you might pick when you're talking about best practices. We didn't even cover things like deployment and dependency management or the hot topic of Lambdalith or monolithic Lambda functions versus single-purpose functions. Like maybe these are topics for a future episode, but at the end of the day, a lot of these choices just come down to personal preference and context. So for now, we'll just leave you with some extra resources to check out. So look at the links in the description below for AWS advice on Lambda best practices, which is worthwhile, but also the great video series from Julian Wood, who has 20 good videos on understanding Lambda, and I really recommend them for anyone who's looking to fill any knowledge gaps there. So thanks for listening, and until next time, we'll see you in the next episode.