AWS Bites Podcast

77. How do you use Lambda Response Streaming?

Published 2023-04-21 - Listen on your favourite podcast player

Are you tired of waiting for your Lambda functions to finish before getting a response? Well, now you don't have to! In this episode of the AWS Bites podcast, we will talk about Lambda Response Streaming, a new feature recently added by AWS that lets you stream responses from your Lambda functions in real time.

We'll start by explaining what Lambda Response Streaming is and how it differs from buffering. We'll also discuss HTTP Chunking and other benefits of streaming. If you're a Node.js developer, you'll be happy to know that we'll cover how to work with streams in Node.js and how the new Lambda Response Streaming API works with the Node.js runtime.

But that's not all! We'll also discuss how to consume Lambda Response Streaming responses and compare that with S3 Object Lambda. And if you're wondering about pricing and quotas, we'll cover that too.

Finally, we'll answer the question on everyone's mind: will we get streaming requests as well? You'll have to watch the video to find out!

So if you're interested in learning more about Lambda Response Streaming and how it can improve the performance of your serverless applications, make sure to tune in. We promise it'll be worth your time.

AWS Bites is sponsored by fourTheorem, an AWS Consulting Partner offering training, cloud migration, and modern application architecture.

In this episode, we mentioned the following resources:

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Eoin: The AWS Lambda team recently announced an exciting new feature that might be interesting to all of the JavaScript and TypeScript developers out there using Lambda. We're talking about Lambda response streaming, which allows for sending data incrementally from a Lambda function, reducing the time to first byte. I'm Eoin, joined by Luciano, and in this episode of AWS Bites, we will tell you everything there is to know about this new Lambda feature.

We will discuss the benefits of streaming over buffering, talk about the provided API, what we like and what we don't like about it, and we'll mention quotas and pricing. And finally, we're going to speculate on whether we can expect a more streamable future for Lambda. So stick around for the full episode to learn more about Lambda response streaming and how it can benefit your JavaScript Lambda functions.

AWS Bites is sponsored by fourTheorem. fourTheorem is an AWS partner for migration, architecture, and training. Find out more at fourtheorem.com. That link is in the show notes. Okay, Luciano, for Lambda response streaming, let's talk a little bit about what it is. It was announced, I think, just about a week ago, and it's a Lambda feature that's currently only available for Node.js runtimes and for custom runtimes. And the idea here is that it allows you to send a response from a Lambda function incrementally. Previously, you had to build up your whole response and then send it back in one go. So instead of sending it back in one block, you can now start to send some bytes as soon as you have them, and your client can start receiving them. So this is useful for streaming data. Let's think about some use cases: say, records from a CSV file if you're generating them on the fly, or even server-side rendering. I think that's a big use case for this kind of functionality. So the client can start receiving the data straight away, even while processing is still happening. Now, how do you actually integrate this into your client? It currently works with Lambda function URLs. This is something we talked about in a very recent episode, link is in the show notes, but you can also put a CloudFront distribution in front of it if you want to get all the benefits of caching and edge locations with the CloudFront CDN. It doesn't really work with API Gateway or Application Load Balancer. I mean, it kind of works, but it doesn't give you the streaming benefit. So you can still use it with an API Gateway in front of it, but the API Gateway is going to buffer the response for you. There are actually some subtle differences when you use it with API Gateway that give you some benefits, and we'll cover that later. On support, it's supported in CloudFormation and SAM. You don't actually have to change the Lambda function's configuration itself, only the code. But if you want to use it with function URLs, there is a new CloudFormation property for that. It's not available in CDK yet. It is available in SAM, and it's not yet available in the Serverless Framework, although there is a pull request open for it. Luciano, you're a bit of a streaming guru, if you don't mind me saying so. What is a stream really, and how does it differ from buffering? Now I feel the pressure.
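As a quick illustration (a minimal sketch, not a full deployment), this is roughly how you would enable streaming on a function URL with the JavaScript SDK v3. The function name here is a placeholder, and the new InvokeMode parameter is what opts the URL into streaming; it defaults to BUFFERED. SAM and CloudFormation expose the same setting on their function URL resources.

```javascript
// enable-streaming-url.mjs (run with Node.js 18+, credentials configured)
import { LambdaClient, CreateFunctionUrlConfigCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

// 'my-streaming-function' is a placeholder: the function must already exist.
const { FunctionUrl } = await lambda.send(
  new CreateFunctionUrlConfigCommand({
    FunctionName: 'my-streaming-function',
    AuthType: 'AWS_IAM',
    InvokeMode: 'RESPONSE_STREAM', // the default is BUFFERED
  })
);

console.log(`Streaming function URL: ${FunctionUrl}`);
```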

Luciano: Okay, let me try to explain the difference between streaming and not streaming. On one side, if you think about sending data over the wire, networking is inherently streaming: you keep appending bytes to a channel. So it's more of a conceptual model of how you produce the data before sending it over the wire, or while sending it over the wire. When we talk about buffering, basically the idea is that you are creating some kind of response object or some kind of response data. And of course, it's a sequence of bytes that represents some serialization, some kind of structure. And you might be accumulating all of that information in memory for a while until you actually have the entire message ready. This is what we mean when we say buffering: you buffer all of that information in memory, and only when it's complete do you start to send the bytes over the wire.

As opposed to streaming, where maybe you have meaningful chunks of that information. Again, your example of CSV records is a very good one. So you might be able to send partial information over the wire as it becomes available. Actually, you don't even accumulate: you keep passing the bytes along as they become available. And your client can start to make sense of that information straight away. As soon as the first row of the CSV arrives, for instance, the client can start to take advantage of that information.

So again, in the CSV file example, the client can start to do analysis on that CSV data as it arrives. Maybe it's running a query, so it can filter out the relevant records and display to the user only the ones that match the query. Or maybe it needs to do some kind of streaming aggregation, I don't know, maybe you're calculating a sum or an average or something like that; you can start to do all the computation in real time, and even with partial data you can start to show something useful to the user. Other examples: if you are plotting a chart that has thousands or millions of points, and those points are sent to you in a streaming fashion, you can start to display them straight away as they arrive, rather than waiting for the entire response. Similarly, you could render pictures encoded in progressive formats, so you could start to render the picture as it arrives, even before it's all received. Or even web pages. You mentioned the case of server-side rendered pages, and I think this is a common trend in the front-end world. Lots of single-page application frameworks now have capabilities to server-side render, to make sure that you get the advantage of that time to first byte, which is a very important metric on the web today. Search engines will prioritize web pages that start to render as soon as possible. So this is probably the most interesting use case, and I would bet that this is the main motivation why AWS spent energy enabling this feature, because I've seen a lot of other competitors supporting this story: if you're using Svelte, Vue.js, React, Solid, or Qwik, they all offer ways to server-side render and send dynamic content to users, and Lambda probably needs to be on par with these other engines and make sure that this use case is supported as well. Now, another interesting technical detail, if you're really curious about the actual protocol that gets used: this is just plain HTTP chunked encoding behind the scenes, and we will have a link in the show notes if you're curious. The idea is that because you need to send small frames of data, you need to tell the receiving side how big every frame is. So there is a protocol that documents how to encode that information: not just the raw data, but the data encapsulated in a frame that also specifies the length of that frame. And then, of course, there is a way to specify when the entire stream is over and the entire request can be considered completed. So I hope that clarifies what we mean when we say buffering and streaming, what the benefits are, and the use cases. But I don't know, maybe there is something more to add on why one model could be better than the other, or when one is more convenient than the other.
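To make the chunked encoding idea concrete, here's a toy Node.js server (nothing Lambda-specific, just a sketch) that streams CSV rows one per second. Node switches to Transfer-Encoding: chunked automatically when you write a response without a Content-Length header.

```javascript
import http from 'node:http';

// On the wire, each chunk is framed as "<length in hex>\r\n<data>\r\n",
// and a final zero-length chunk ("0\r\n\r\n") marks the end of the response.
http
  .createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/csv' });
    const rows = ['id,name', '1,Eoin', '2,Luciano'];
    let i = 0;
    const timer = setInterval(() => {
      res.write(rows[i] + '\n'); // each row is sent as soon as it's written
      if (++i === rows.length) {
        clearInterval(timer);
        res.end(); // sends the terminating zero-length chunk
      }
    }, 1000);
  })
  .listen(3000);
```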

Eoin: I think you already mentioned the main benefit, which is the time to first byte metric, and you also mentioned that this is useful for search engine optimization and also for performance metrics like Lighthouse, so it's pretty important for web applications and mobile applications these days. But the other thing, I suppose, is that with streaming you can handle potentially infinite amounts of data in a pretty efficient way. You've already got the alternative of doing frequent polling if you want to get periodic data, like to render a chart based on real-time data, but it's much more efficient to do it with streaming. You don't have the request overhead every time, and you don't have to specify the content length up front with chunked encoding. It can dynamically evolve in that way.

It could also be used for, say, streaming footage from a video camera, a security camera for example. But from an application development point of view, the nice thing about streams, I suppose, is that they are composable. So you can have streams within streams, and if you're a Node.js developer you'll probably see this quite often, but it's also possible in lots of other languages that have streaming abstractions that allow you to nest streams. For example, you can have a zip stream or an encryption stream, and you might have a base64 encoding on top of that, and by writing into one stream you get all of that happening for you under the hood, and the base64 gzipped version is being streamed out at the other end. I suppose there is a disadvantage with streaming as well, in that it can be a little bit more complex and harder to reason about. Buffering is a simpler model and easier to debug, but I still think that in modern web applications it's hard to avoid streaming. So maybe, Luciano, you've done a lot of content, a lot of speaking and writing about Node.js streams and all the different generations of how streaming works in Node.js, so you're probably the best placed of anybody to talk about how this will work in Lambda with response streaming. So how does it work under the hood?

Luciano: Yeah, I'm not going to go too much in depth, but I think it's worth giving a very quick summary of why Node.js is particularly suitable for streams, and for this particular use case in Lambda. And I think it's because streams in Node.js are a first-class citizen. They always have been, even from the very early versions of Node.js. There are some primitives that get used in different places in the Node.js core library and in third-party libraries, and these are all different kinds of streams that you can use and compose in different ways. The main ones are readable streams, which are basically an abstraction that allows you to consume data from a source.

That can be a file from the file system or a network socket; it's basically how you read data in a streamable fashion. Similarly, there are writable streams. So again, if you are trying to send data to a file, or send data over the wire, or maybe write to standard output, you can use writable streams, which again are just an abstraction that allows you to write data incrementally to whatever destination. Then there are transform streams, which are something in between. They basically allow you to take data from a readable stream, do some transformation, and then send it to a writable stream on the other side. If you can imagine, streams are like plumbing: these are generally something you put in between a readable and a writable to change the data on the fly. Good examples in the standard library are encryption and compression, and you can even use them together. You can pipe two transform streams and add encryption and compression at the same time.

And finally, there are duplex streams, which are an abstraction representing bidirectional channels, like a network socket. So for instance, when you have a communication channel where you can both read and write, you are creating a channel where two parties can exchange information in both directions. So again, this is important because these are all classes that you can use from Node.js core, and they are used to implement all the more advanced streaming capabilities like file system, HTTP, compression, encryption, standard input, standard output, standard error. And once you get familiar with those primitives, you can even build your own custom streams and custom transformations. If you, for instance, have to interact with a new database that you're using for the first time and it's not covered in the standard library, you can use streams to read and write data from that database in a streaming fashion too. Now again, I don't want to go too much into detail, but we will have a link in the show notes with a workshop that I created some time ago that guides you through the whole experience of understanding how streams work, with exercises that build up to more advanced topics where you create your own custom streams. So if that's something you are curious about, feel free to check the link and let me know if you like it or not.
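To make the composition idea concrete, here's a small sketch using only Node.js core: a custom base64 Transform piped together with gzip compression through pipeline. The input file name is just a placeholder.

```javascript
import { Transform } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { createReadStream } from 'node:fs';
import { createGzip } from 'node:zlib';

// A custom Transform that base64-encodes a byte stream. It carries input
// over to 3-byte boundaries so base64 padding ('=') can only appear at the
// very end of the stream, never in the middle.
class Base64Encode extends Transform {
  constructor(options) {
    super(options);
    this.leftover = Buffer.alloc(0);
  }

  _transform(chunk, _encoding, callback) {
    const data = Buffer.concat([this.leftover, chunk]);
    const usable = data.length - (data.length % 3); // whole 3-byte groups only
    this.leftover = data.subarray(usable);
    this.push(data.subarray(0, usable).toString('base64'));
    callback();
  }

  _flush(callback) {
    if (this.leftover.length > 0) {
      this.push(this.leftover.toString('base64'));
    }
    callback();
  }
}

// Compose: read a file, gzip it, base64-encode it, stream it to stdout.
await pipeline(
  createReadStream('data.csv'), // placeholder file name
  createGzip(),
  new Base64Encode(),
  process.stdout
);
```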

Eoin: Okay, sounds good. Do you also want to shamelessly plug this book, which also covers Node.js streams? Oh, that's a good one. Yes, we will also have the link to that one in the show notes. Thank you.

Luciano: Thank you. Yeah, there is an entire chapter about streams and I think most of the material is also covered in the workshop, but I think the book probably goes a little bit more in depth.

Eoin: Okay, great. Well, maybe we can talk a little bit then about how the world of Node.js streams meets Lambda functions with this new feature. There's a new API that you can use in your Lambda functions for Node.js. So you end up with a global object called 'awslambda', and then you call the 'streamifyResponse' function on it. So it's not a case of just returning your existing payload. You have to write your functions a little bit differently if you want them to work with response streaming.

So instead of your typical Lambda function signature with event and context parameters, you wrap your handler with streamifyResponse and then you use a function with three parameters. So you get an event, a response stream, and a context, and then you can start writing to this response stream in the way that you just described. It's just a writable stream and it forwards bytes incrementally. So you can write to it, or use any of the libraries out there that support Node.js streams to write to it. Once it's finished, then you call .end() to close the stream.
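Put together, a minimal handler sketch looks roughly like this, based on the API as announced. Note that awslambda is a global injected by the Node.js Lambda runtime, so there is nothing to import for it.

```javascript
// index.mjs
export const handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    responseStream.setContentType('text/plain');

    // Send a chunk roughly every second: the client starts receiving
    // bytes immediately instead of waiting for the whole response.
    for (let i = 1; i <= 5; i++) {
      responseStream.write(`chunk ${i}\n`);
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }

    responseStream.end(); // close the stream when you're done
  }
);
```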

So you mentioned that there are lots of different cool things you can do with Node.js streams. You could use pipeline if you've got multiple steps to compose those streams. Like we talked about gzipping and base64 encoding, that would be one example you could do with it. Now, it is a bit of a weird thing, this global object, awslambda. We haven't had this before. I'm slightly surprised by this because globals are generally frowned upon. This is weird for a few different reasons. Your linter or TypeScript will complain about this object because it's just not the done thing these days to have globals. So you'll have to configure your linter to ignore this global. You won't be able to run your Lambda function code locally as-is either, so you'll have to fix something for your unit tests. It does seem to work with SAM's local invoke, which is good news. But I did try the latest Serverless Framework version, and the one from the PR that has this function streaming URL support, and it didn't work with its local invoke. It didn't know what the awslambda global was at all. So it's a little bit of a strange kind of black box that you have to work with. The code doesn't seem to be publicly available on GitHub, although I think, Luciano, you found that there is something you can find in the Lambda emulator container, the container image for Lambda. You can actually reverse engineer that a little bit and find out how it works. If you don't want to deal with all of these problems, Middy has actually been very quick and added support for response streams, with a nice interface that hides a lot of the complexity. There's a link to the documentation in the show notes. Now, Luciano, is there anything you'd like to add to that? And maybe you can talk about how you actually read the response of a Lambda using response streaming?
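To picture the pipeline approach just described, here's a sketch assuming the awslambda helpers documented at launch (streamifyResponse and HttpResponseStream). The file path is a placeholder.

```javascript
import { createReadStream } from 'node:fs';
import { createGzip } from 'node:zlib';
import { pipeline } from 'node:stream/promises';

export const handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    // Wrap the stream to attach a status code and headers before writing.
    responseStream = awslambda.HttpResponseStream.from(responseStream, {
      statusCode: 200,
      headers: { 'content-type': 'text/csv', 'content-encoding': 'gzip' },
    });

    // Stream a gzipped file to the client without buffering it in memory.
    // pipeline also takes care of calling end() on the response stream.
    await pipeline(createReadStream('/tmp/report.csv'), createGzip(), responseStream);
  }
);
```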

Luciano: I think that's a good segue from what you just mentioned. Yeah, basically the idea is once you have a Lambda that uses this feature, sending responses in a streaming fashion, how do you consume that information? Right. Also from the client perspective, you need to be ready to consume information in a streaming way. So you will receive chunks of bytes rather than just the whole response straight away.

So the way they implemented this in AWS: we already mentioned that this works through Lambda function URLs. You basically can just use HTTP, and you will see the HTTP response coming in in a streaming way. So for instance, if the function emits a chunk every second, maybe using curl or opening the function URL in the browser, you should see the response being rendered over time as the new chunks become available. That's one way, but if you want a more programmatic way, you can also use the SDK. There is a new API called InvokeWithResponseStream.

So basically, I haven't checked exactly the specification in JavaScript, but my expectation is that you will get a readable stream, so you can keep using streams in Node.js if you're using the JavaScript SDK. Now, what about non-Node.js runtimes? That's probably the follow-up question there. And the answer is still a little bit up in the air. I think there is new documentation that will come out in the next few weeks, and it seems that there is some support coming. For instance, we were able to dig into the Rust runtime, which is fully open source, and we found places in the code where it looks like this feature is fully supported there, even though we couldn't really find an official piece of documentation for it. And we will have links in the show notes for all of that. Very similarly, we were able to find something for Golang. But again, we expect news in the coming weeks about more official support for other runtimes, because of course all these concepts are not special to Node.js. Node.js just has better support because streams are a primitive that has existed in Node.js for a long time, and people are more used to using these kinds of concepts in Node.js. But you can use these concepts in any language; you just need an interface to work with them. Okay.
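Based on the shape of the JavaScript SDK v3 (treat this as a sketch rather than verified gospel, since we haven't dug into every detail), consuming a streamed response programmatically looks roughly like this. The function name is a placeholder.

```javascript
import { LambdaClient, InvokeWithResponseStreamCommand } from '@aws-sdk/client-lambda';

const client = new LambdaClient({});

const { EventStream } = await client.send(
  new InvokeWithResponseStreamCommand({
    FunctionName: 'my-streaming-function', // placeholder name
  })
);

// EventStream is an async iterable: payload bytes arrive in PayloadChunk
// events, and an InvokeComplete event signals the end of the stream.
for await (const event of EventStream) {
  if (event.PayloadChunk && event.PayloadChunk.Payload) {
    process.stdout.write(Buffer.from(event.PayloadChunk.Payload));
  }
  if (event.InvokeComplete) {
    console.error('stream complete'); // logged to stderr, keeping stdout clean
  }
}
```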

Eoin: So the official line there is that Node.js and custom runtimes are supported, but not any of the other provided runtimes. So you won't be able to use this in the provided Python runtime, or the Java one, or .NET yet. You'll have to wait for new features to come out in those runtimes. Yeah, that's a good clarification. Okay. Maybe then, given that, we can talk a little bit about some of the other limits and pricing, and anything else that might stop you from plowing ahead with this feature. We talked about the ability to stream potentially infinite amounts of data. Now that's practically not the case with Lambda, of course, because you still have a 15-minute timeout limit. One of the advantages of using Lambda response streaming, though, is that you can go over the traditional limit of a six-megabyte response payload. Now you can stream up to 20 megabytes of data, but that's only a soft limit.

So you can ask for more if you need to. And that will then allow you to go beyond six megabytes, and well beyond that, with function URLs or with the direct invocation method. If you're using API Gateway, you still have a 10-megabyte limit. So it allows you to get up to 10 megabytes with API Gateway, but it's not a massive increase. And if you do go over six megabytes, you start to incur an extra cost. So this is another pricing dimension for Lambda. It used to be that you only had requests and duration (GB-seconds of memory) to worry about when you were thinking about Lambda pricing.

Now there are a few more pricing dimensions. So this is a new one, and it's roughly just under a cent per gigabyte for anything above the normal six-megabyte limit. It's also important to realize that there is a maximum bandwidth throughput limit of 16 megabits per second, otherwise known as two megabytes per second, for streaming functions. So if that doesn't sound like it works for your use case, we should probably remind everybody that there are a couple of other ways to do streaming with Lambda. And if you cast your memory back, you might remember that there was an S3 Object Lambda feature released a couple of years back. I wrote an article about it at the time and did some example repos, which we can link in the show notes.

But this is one way to get a streaming response back to the user. This is where you basically intercept something like a GetObject request to the S3 service, and it will invoke a Lambda function that allows you to intercept the object and pass back a different response. So you can either generate a completely new response, or you can mutate the response in some way, like converting the file format or doing some filtering. And this allows you to stream back as well, using a different stream interface. So this is something that might also work for you if you need to do streaming, but it's only based on objects that exist in an S3 bucket. So an object still has to exist.
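As a refresher, here's a rough sketch of what an S3 Object Lambda handler looks like. The uppercase transform is just a placeholder for whatever mutation you need, and a real handler would stream the body rather than read it all into memory.

```javascript
import { S3Client, WriteGetObjectResponseCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({});

export const handler = async (event) => {
  // S3 Object Lambda passes a presigned URL for the real object, plus a
  // route and token used to send the transformed response back.
  const { outputRoute, outputToken, inputS3Url } = event.getObjectContext;

  const original = await fetch(inputS3Url); // global fetch, Node.js 18+
  const transformed = (await original.text()).toUpperCase(); // placeholder transform

  await s3.send(
    new WriteGetObjectResponseCommand({
      RequestRoute: outputRoute,
      RequestToken: outputToken,
      Body: transformed,
    })
  );

  return { statusCode: 200 };
};
```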

You could probably hack this a little bit and just have some dummy objects that you use and then use the S3 interface to essentially give you a way to get a much longer stream back and then control that stream using Lambda function code. And you can even do S3 presigned URLs on top of it as well. So there's probably all sorts of cool hacks you could build on top of it. Also worth mentioning that you can do WebSockets. So you don't necessarily have to do it in the streaming way.

You can use HTTP WebSockets, and API Gateway has support for those. The disadvantage is that you have to maintain the state yourself, so you generally need a DynamoDB table or something to keep track of all the connections you need to send messages back to. And the other option for WebSockets is to use AWS IoT, which is the original way to do serverless WebSockets on AWS. That way it does the state management for you, but there's a little bit of extra complexity around authorization, although it's not too bad. So I think those are all the response streaming topics we have. Maybe we could speculate a bit about the future. Do you think we'll get request streaming support at some point? And why would we want it? What would it be good for?

Luciano: That's a good question. And I think one of the reasons why this conversation comes up is because if you were dealing with a more classic standalone web server in Node.js, like Express or Fastify or Hapi or something like that, almost all of them will give you an abstraction where the request coming into your handler is a stream. It's a readable stream. So for instance, if you are implementing a piece of functionality that receives an upload, you can actually start to consume that information, process it, and save it somewhere else, without having to buffer the entire upload up front and only later push it somewhere else. And this is one of the common limitations when you implement APIs in Lambda: it's not a great idea to use it for, for instance, uploading files.

You'll probably end up using something like S3 presigned URLs, which, by the way, is something else we talked about before; we will have the link in the show notes as well if you're curious. So now that we have the response streaming functionality, it's legitimate to ask ourselves: are we going to get request streaming as well? So maybe we can support uploads, or other use cases where even the request is streamed, or maybe cases where you need to send a lot of information to the Lambda and you want the Lambda to start processing that information as soon as possible. Now, I don't really have an answer; of course, we are just speculating here. But a couple of years ago, I was curious about the same question. We didn't even have response streaming at the time, and I was just trying to understand how the Lambda runtime was working behind the scenes. And the way I did the research was basically: okay, let me try to build a custom runtime myself for Node.js and see if I can make that custom runtime closer to an actual web framework in Node.js. So rather than receiving an event and a context, I actually wanted my handler to receive a request object and a response object, where the request object was a readable stream and the response object was a writable stream, which is basically the lower-level interface that you get in any web framework, even the HTTP library that is built into Node.js. And I was actually able to achieve that in a bit of a hacky way. The reason why that worked is because when you want to build a custom runtime, AWS gives you two HTTP endpoints to work with. The first HTTP endpoint is like a polling endpoint that you have to constantly poll from your runtime to see: is there a new message there? Is there a new event that I need to handle?

And of course, it's an HTTP request. So you can take that as a readable stream and that's basically your input. And then also you get another endpoint, which is basically when you finish and you have a response, send the response to this HTTP endpoint. So effectively you have an HTTP endpoint to read from and an HTTP endpoint where you send the response to, and what you have to run in between is your actual handler code. So you could build your own custom handler in any way. You could even call something written in COBOL at that point in between, right?

And this is basically the way you build custom runtimes. So at that point, what I did is basically a very thin layer that just calls a handler, passing it the request from the first endpoint and a response that is already streaming to the second endpoint. And then the handler can fill the gap by reading from the first endpoint and writing to the second endpoint. Now, this is very sketchy and barebones, and it's probably exposing more than it should, because it's giving the handler full access to requests and responses that are actually part of the runtime.
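For the curious, here's a rough sketch of that hack: no error handling, heavily simplified, and relying on the documented Runtime API endpoints. As noted, it only buys you real streaming if nothing downstream buffers.

```javascript
import http from 'node:http';

// AWS_LAMBDA_RUNTIME_API ("host:port") is provided to custom runtimes.
const [host, port] = process.env.AWS_LAMBDA_RUNTIME_API.split(':');

// The "next invocation" response body is our readable request stream.
function nextInvocation() {
  return new Promise((resolve, reject) => {
    http
      .get({ host, port, path: '/2018-06-01/runtime/invocation/next' }, (res) => {
        resolve({
          requestId: res.headers['lambda-runtime-aws-request-id'],
          request: res, // the event payload, consumed as a stream
        });
      })
      .on('error', reject);
  });
}

async function run(handler) {
  for (;;) {
    const { requestId, request } = await nextInvocation();
    // The POST to the response endpoint is our writable response stream.
    const response = http.request({
      host,
      port,
      method: 'POST',
      path: `/2018-06-01/runtime/invocation/${requestId}/response`,
    });
    response.on('response', (res) => res.resume()); // drain the runtime's ack
    await handler(request, response); // handler bridges the two streams
    response.end();
  }
}

// Example handler: echo the incoming event back, uppercased, chunk by chunk.
run(async (request, response) => {
  for await (const chunk of request) {
    response.write(chunk.toString().toUpperCase());
  }
});
```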

So I'm not too sure it's the best way of doing this, but it kind of works, and it demonstrates that the idea makes sense in the context of Node.js, even though there is a big caveat: this assumes that you are streaming end to end, that you are not buffering anywhere else outside the runtime, which we know is not the case when, for instance, you integrate this solution with API Gateway. So we saw that for response streaming, AWS needed to come up with different mechanisms and add support in different places to actually make this possible. So even though we expect this is possible for request streaming too, we think AWS will also need to come up with some new feature that enables it end to end. So yeah, this is big speculation here. We cannot say for sure that this is going to happen, or when or how it's going to happen, but we feel that it's technically possible and it could enable some interesting use cases. So hopefully it's something that we will get in the future.

Eoin: I think we've covered all of the features and benefits and everything with function response streaming so far. So I'd just like to say thanks for joining us for this episode of AWS Bites. Again, we hope you enjoyed learning about response streaming and how it can benefit your JavaScript, TypeScript, and custom Lambda functions. If you want to learn more, check out the links in the show notes and don't forget to subscribe to the podcast if you haven't already. Hit like, hit the bell and stay tuned for more AWS news and updates. Thanks for listening and we'll see you next time on AWS Bites.