Help us to make this transcription better! If you find an error, please submit a PR
with your corrections.
Eoin: Using AWS Lambda together with SQS is a very common serverless pattern that has always suffered from some special limitations. We covered SQS in a dedicated episode last year, but recently we've had a significant new feature solving a common pain. And today we want to dive deeper into using SQS and Lambda together and tell you all you need to know about using SQS triggers, about scaling and concurrency.
I'm Eoin, I'm here with Luciano, and this is another episode of the AWS Bites podcast. AWS Bites is sponsored by fourTheorem. fourTheorem is an AWS consulting partner offering training, cloud migration and modern application architecture. You can find out more at fourtheorem.com and you can find that link in the show notes. Luciano, let's start by addressing the basics for anyone not intimately familiar with AWS Lambda and SQS. We have a lot of seasoned experts out there listening, but we also know that plenty of people listening and watching are taking their first steps with AWS or these serverless offerings. Do you want to give a quick overview and a quick elevator pitch for SQS and Lambda? Okay, I'll try my best and I'm going to start with Lambda.
Luciano: So Lambda is basically a function as a service offering that is available on AWS. What that means is that it is effectively a managed compute service with a very simple abstraction. As developers, we are familiar with the concept of a function, which is basically a piece of code that takes some inputs, executes some logic and returns some output. And Lambda takes that particular model and provides it as a managed, on-demand compute layer.
So as a user, you write your own Lambda function, and you can use many languages, so pick the language of your choice. You write your own business logic in that particular function format, and then you just tell AWS when to run that particular function. And generally, this is in response to a particular event. Just to give you an example, that event can be an HTTP call if you're using something like API Gateway and you are trying to implement an API in a serverless fashion.
It can be a schedule, I don't know, every Monday at 9am, maybe you want to do something, so you can trigger the Lambda that way on that schedule. Or maybe you want to react to certain files being created in S3, that can be another trigger. Or maybe, because we are going to be talking about the integration with a queue, you can trigger your Lambda as a response to a new message being available in SQS, which is a queueing system available in AWS.
So let's talk more about SQS as well. Very similar to Lambda, SQS is another managed, scalable service provided by AWS. So if you need a queueing system and you don't want to manage all of the deployments, updates, security patches and scalability by yourself, you just go on AWS and you provision a new SQS queue. And just to explain why you would want to do that, let's present an example. In a generic sense, let's say that you have your main piece of functionality going on, but you also want to do more work on demand in the background.
For instance, I don't know, you are sending transactional emails, or you need to resize some pictures, or you need to run some workloads. For instance, you have some text available in picture format, and you want to extract the text, maybe running an OCR algorithm. So all things that you don't want to do in line, you probably want to offload in the background, and maybe you want to parallelize that kind of compute.
So what you could do there is create a queueing system, maybe using SQS, and then every time a new job becomes available, you get the definition of that job. Rather than doing it straight away in the process that receives the definition of the job, you just send it to the queue, and then the queue is going to keep it in storage somehow, and other workers in the background can just ask the queue, is there something I can do?
They can pick up the work and just do it in the background. Now, this brings a few interesting advantages. The first one is that you are not blocking the core application. If it's a web server, the web server can reply to the user as fast as possible. This is what the user expects on the web, while all the heavy work is offloaded to the background. If you have a peak of traffic, maybe that creates a lot of work that you'll need to do in the background.
So by having a queue and having workers, you have a decoupled system, and you can decide to scale up the workers' side: add more and more workers to be able to respond to that increased demand for background work. And that can be very elastic, so when the demand is over and you go back to normal, you can remove all the workers that you don't need anymore. And that also adds resiliency, because if something fails, the queuing mechanism can automatically recognize that a job failed and put it back in the queue, which means that another worker will pick it up again later and it can be retried automatically. And even more interesting, if that particular job keeps failing, you can add rules to move that job to one side. These are generally called dead letter queues. It's basically another queue where you store all the jobs that you were not able to process, and a human can go there and try to figure out why this consistently failed. Maybe there is a bug. You can fix that bug in your code and then push the message back to the original queue, and at that point, you are able to reprocess that message correctly. So that's basically giving you a way to never lose jobs and consistently be able to deliver on what the user expects. We want to talk about Lambda and SQS together. So how do they work together?
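To make the dead letter queue idea concrete, here's a small Python sketch of the redrive policy that wires a DLQ to a main queue. The queue names and the choice of five receive attempts are illustrative assumptions, not something from the episode:

```python
import json


def redrive_policy_attributes(dlq_arn: str, max_receives: int = 5) -> dict:
    """SQS queue attributes that wire up a dead letter queue: after
    `max_receives` failed receive/process cycles, SQS moves the message
    to the DLQ instead of retrying it forever."""
    return {
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receives)}
        )
    }


# Illustrative usage with boto3 (queue names are hypothetical):
# sqs = boto3.client("sqs")
# dlq_url = sqs.create_queue(QueueName="jobs-dlq")["QueueUrl"]
# dlq_arn = sqs.get_queue_attributes(
#     QueueUrl=dlq_url, AttributeNames=["QueueArn"]
# )["Attributes"]["QueueArn"]
# sqs.create_queue(QueueName="jobs", Attributes=redrive_policy_attributes(dlq_arn))
```

Once a message has been received `maxReceiveCount` times without being deleted, SQS moves it aside automatically, which is the "never lose jobs" behavior described above.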
Eoin: So with SQS, you're always using a poll-based model. You would need something to poll the queue, retrieve events, process them, and then delete them. It's a fairly simple API, really, when it comes to consuming messages from SQS, and we covered that in detail in the previous episodes. So traditionally, you'd use EC2 or a container or some other piece of long-lived compute running on AWS or even on-premises or anywhere else.
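As a sketch of that traditional poll-based model, here's what one iteration of the receive-process-delete loop might look like. The client is passed in rather than created inside the function, so the loop is easy to test; the queue URL and processing callback are assumptions for illustration:

```python
def poll_once(sqs, queue_url: str, handle) -> int:
    """One iteration of the classic SQS consumer loop: receive a batch,
    process each message, delete on success. Returns the number of
    messages received."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling: wait for messages to arrive
    )
    messages = resp.get("Messages", [])
    for msg in messages:
        handle(msg["Body"])  # your processing logic
        # Deleting acknowledges success; an unprocessed message becomes
        # visible again after the visibility timeout and is retried.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return len(messages)


# On EC2 or a container you'd run this in a long-lived loop:
# sqs = boto3.client("sqs")
# while True:
#     poll_once(sqs, queue_url, print)
```

This is exactly the work that Lambda's Event Source Mapping, discussed next, takes off your hands.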
With AWS Lambda, it's a lot simpler because you don't have to run a poller yourself. The polling service is actually provided as part of Lambda's Event Source Mapping feature. And you may have used SQS and Lambda together without knowing that there was such a feature, because tools like the Serverless Framework or SAM create this for you transparently under the hood when you create that trigger.
So within the AWS Lambda service, you've got this Event Source Mapping feature, and this is the bit that's doing the polling. It's also the same feature that handles Lambda triggers from Kinesis, Kafka, Amazon MQ, and DynamoDB Streams. So if you imagine a simple architecture diagram, you've got your queue on the left and a Lambda function on the right. The Event Source Mapping is essentially a box in the middle that's running that polling, taking messages from the queue and invoking Lambda functions with the messages.
And those invocations are actually synchronous. With Lambda, you've got synchronous and asynchronous invocations, and the Event Source Mapping is using a synchronous invocation and waiting for the function invocation to complete. So Event Source Mappings are very good because they give you a few neat features for free that you would otherwise have to implement yourself. For example, you can control the batch size and the batching window, in terms of the number of events that arrive in a batch and what kind of time interval they need to arrive within.
And since about a year or so ago, you can also specify filters, so that some messages are filtered out before they reach your function. That can save you a lot of execution time and cost as well. Now, if you've got one or a number of EC2 instances polling from a queue, you're in control of the processing rate and the concurrency, because it's directly linked to the number of workers you have, right? You can retrieve a batch of messages and process them with whatever cluster size or worker pool size you have running. With Lambda, the Event Source Mapping is doing this for you. So it's in control of the concurrency. And it's this fact that's been a source of pain for a lot of users. And this was the case until very recently, when AWS announced a feature called SQS maximum concurrency support.
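A sketch of what those Event Source Mapping knobs look like through Lambda's CreateEventSourceMapping API. The specific values, queue ARN, function name, and the signup filter pattern are illustrative assumptions:

```python
import json


def sqs_trigger_config(queue_arn: str, function_name: str) -> dict:
    """Keyword arguments for Lambda's CreateEventSourceMapping API,
    showing the batching and filtering options discussed above."""
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": 10,                      # up to 10 messages per invocation
        "MaximumBatchingWindowInSeconds": 5,  # or invoke once 5s have elapsed
        "FilterCriteria": {                   # drop non-matching messages
            "Filters": [
                {"Pattern": json.dumps({"body": {"type": ["signup"]}})}
            ]
        },
    }


# Illustrative usage:
# boto3.client("lambda").create_event_source_mapping(
#     **sqs_trigger_config(queue_arn, "process-signups")
# )
```

Messages that don't match the filter pattern are discarded by the Event Source Mapping itself, so your function is never invoked (and never billed) for them.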
Luciano: That makes a lot of sense to me, but maybe we can provide an example of what is the pain we are talking about so we can make it more obvious to everyone. And also how is this new feature helping to kind of ease this particular type of pain?
Eoin: Yeah, this is good. Let's try and give some sort of an example. So let's say you've got a queue that has messages relating to signups for your SaaS application. A user fills in a form, signs up, and they're now a customer of yours. As part of this whole signup flow, maybe you've got this event-driven mailing list subscription feature. So when a user signs up, you go off and you want to make an API call to MailChimp, for example, so that they're going to receive your weekly user mailing list.
Now let's say in this contrived example that MailChimp has a rate limit of 10 invocations per second to this API. So you've got a queue and a Lambda function that takes the signup event and initiates a subscription with the MailChimp API. So you want this to scale as users sign up, you want to have this resiliency you talked about, but you don't want to flood this API because it's got a rate limit.
So let's talk about the behavior before we've got this recent change in how Lambda and SQS work together. So the Event Source Mapping in Lambda starts five pollers by default, reading messages in batches from the queue. So let's say you've just got a batch size of one, but you've got more than 10 messages coming in per second. So when messages are available, the Event Source Mapping will pass these to running Lambda containers.
But if you've got more messages coming in because your service is really popular and people are signing up at a really fast rate, the Event Source Mapping is still going to try and scale up the number of Lambda workers by making synchronous invocation requests. And it's going to increase that concurrency level by 60 every minute. And it will keep going up to the account concurrency limit, or the reserved concurrency, or 1000, whichever is the lowest.
So in order to prevent your function from exceeding that API rate limit, you might set the reserved concurrency to five, because you're thinking, okay, maybe this function takes about 500 milliseconds to invoke, so in order to keep it at 10 per second, I'm going to just have a concurrency of five. But Event Source Mappings don't seem to know anything about your function's reserved concurrency or the account level concurrency.
It just keeps scaling up. So the Lambda service will stop there from being more than that number of concurrent workers, but the Event Source Mapping just keeps trying to invoke functions anyway. And this results in invocations being throttled. This can happen in a lot of different cases, like when other functions are consuming the account concurrency and there just isn't the available capacity. It can even happen if you've got multiple Event Source Mappings invoking the same function, which is also possible, because each one is scaling independently. So this has been a source of a lot of pain for use cases like this.
Luciano: So what happens when the throttling actually occurs and how can we actually leverage this new maximum concurrency feature to make our life easier?
Eoin: Yes, when the throttling occurs, messages are going to go back onto the queue once the visibility timeout has expired. And if this keeps happening after a number of retries, which is configurable, the message will be discarded, or if you've got a dead letter queue, it will end up in your dead letter queue. So the new maximum concurrency feature is really doing exactly what it implies. It's specifying a maximum number of concurrent invocations at the Event Source Mapping level.
So you don't need to use a reserved concurrency for the function, although you can use both together in combination. It solves the problem by essentially capping the number of concurrent invocations and reducing the excessive throttling that can happen with the default behavior. So it's helpful for our example, where you don't want to flood the third party API with requests that might cause a rate limiting error. It means you don't have to use a reserved concurrency, which can be annoying, because when you use reserved concurrency, you're also taking away capacity from other functions. It's also nice for anyone using multiple Event Source Mappings with the same function, because you can now control the concurrency of each trigger independently, instead of just the function as a whole. Yeah, that makes a lot of sense.
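For the MailChimp example, the new setting amounts to one extra field on the Event Source Mapping. A sketch, with a made-up queue ARN and function name; note that the allowed range for this setting is 2 to 1000:

```python
def scaling_config(queue_arn: str, function_name: str, max_concurrency: int) -> dict:
    """Arguments for Lambda's CreateEventSourceMapping (or
    UpdateEventSourceMapping) that cap concurrent invocations for this
    trigger only, leaving the rest of the account's capacity untouched."""
    if not 2 <= max_concurrency <= 1000:
        raise ValueError("MaximumConcurrency must be between 2 and 1000")
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }


# Illustrative usage: cap the mailing-list subscriber at 5 concurrent workers.
# boto3.client("lambda").create_event_source_mapping(
#     **scaling_config(queue_arn, "mailchimp-subscriber", 5)
# )
```

Unlike reserved concurrency, this cap lives on the trigger, so the Event Source Mapping itself stops requesting more invocations instead of having them throttled by the Lambda service.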
Luciano: Basically before we were effectively hacking the system, trying to limit the number of execution with something that was not necessarily meant to be used in that way. And that was creating the side effect that the event source mapper was still trying to trigger your Lambda and probably you would end up with a lot messages in the dead letter queue, just because there wasn't capacity to execute them, not because the messages were actually failing. So probably this will lead to a lot of false positives and then somebody needs to look at them, retry them, and a lot of like overhead for just because for lacking the capacity of saying, I don't want you to run more than a certain number of Lambdas at any given time for this particular source of events. So that makes a lot of sense. Is there any other improvement that we would like to see in this integration between Lambda and SQS?
Eoin: One of the things I mentioned was the scaling rate of Lambda and SQS: it's adding 60 concurrent function executions per minute. Now, this is a pretty slow scaling rate. And if you've got batch processing workloads where suddenly you've got tens of thousands of requests coming in and you want to scale out to maybe hundreds of thousands of Lambda functions concurrently, adding 60 every minute is really slow.
And I've encountered this myself and had to use other mechanisms. If you just use the Lambda API and call invoke directly in async mode, you can scale to thousands of concurrent functions in seconds. And I know for a fact that that's using SQS internally to manage the queue of invocations as well. So it's still a bit strange that SQS seems to be really slowing down your scaling rate when other event sources don't; with Kinesis, it's tied to the number of shards. So this is a bit limiting. If there was a new feature coming out in this integration, I'd like to see that changed and made more configurable, so that, if you choose to, you can scale up much faster than that. That would be my number one next feature. That makes a lot of sense.
Luciano: I also have a slightly related comment, which is that another feature available in Lambda is that you can consume messages in batches, not necessarily just one by one. Now, this is not necessarily going to solve this problem, because the problem still exists, but it's another dimension that you might use, for instance, to handle throttling, or to handle cases where the task that you need to perform is very small and therefore it makes sense to batch these tasks together. So you poll once from the queue, and then the Lambda that gets executed can process a certain number of them together, rather than doing them one by one. So this is just something to keep in mind, and something we mentioned in the other SQS episodes. So maybe check out that particular feature if you're trying to figure out what kind of patterns you can use with SQS. With that being said, are there resources that we want to recommend to people if they want to deep dive?
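A sketch of a batch-consuming handler in Python. The `process` function here is a hypothetical stand-in for the real per-message work; the handler also returns the partial-batch-failure response format (used when `ReportBatchItemFailures` is enabled on the trigger), so one bad message doesn't force a retry of the whole batch:

```python
import json


def handler(event, context=None):
    """Lambda handler for an SQS batch event. Returning the IDs of failed
    records lets Lambda delete the successful ones, so only the failures
    go back to the queue for retry."""
    failures = []
    for record in event["Records"]:
        try:
            job = json.loads(record["body"])
            process(job)
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def process(job):
    # Illustrative stand-in for the real work (resizing an image,
    # calling a third-party API, and so on).
    if job.get("fail"):
        raise ValueError("simulated processing error")
```

An empty `batchItemFailures` list means the whole batch succeeded and every message is deleted from the queue.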
Eoin: A lot of people have been writing and talking about this maximum concurrency feature recently, but I think the best place to go is the series of articles written by Zac Charles, who described this problem very well when he originally encountered it, described how to reproduce the problem, and has now written a follow-up in that series about how this maximum concurrency feature solves the problem, but also some other things you might want to watch out for. So that is definitely the go-to guide here. We will also include a link to the AWS blog post and sample code provided with the announcement. There's a SAM template that you can use to explore the new feature. And of course, do check out our previous episode on SQS and all our other episodes on the AWS event services. So that's it for today's episode. Thank you very much for joining us and we'll see you in the next episode.