AWS Bites Podcast

21. What services should I use for events?

Published 2022-01-27 - Listen on your favourite podcast player

In this episode of AWS Bites podcast, Luciano and Eoin talk about AWS services related to events and message passing like SQS, SNS, EventBridge, Kinesis and Kafka (MSK).

We discuss the contexts in which it is convenient to use messages and events, and we give a quick walkthrough of all the services, discussing their major features and some practical examples of how to use them.

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: Hello, today we're going to answer the question: what services should I use for events? We are going to cover in which context it is convenient to use messages and events, and what the major services available in AWS are for these particular cases. Then we are going to discuss all the services at a high level: what the major features are, and some examples. My name is Luciano and today I'm joined by Eoin and this is AWS Bites podcast. Okay, so Eoin, do you want to start by giving us an idea of contexts and use cases where it might make sense to use message passing or the AWS services related to that?

Eoin: Yeah, there's a lot to cover here. I suppose we're talking about asynchronous communication between services or components or systems. There's a lot of things you can do with all of these services, but we might be talking about integrations between two different systems, communication between microservices, and event-driven serverless architectures, which are really exciting when you look at what you can do with things like Lambda and all these services. So I think there's a lot of things we could cover. We're going to cover five different services: SQS, SNS, EventBridge, and then Kinesis, but also Kafka, Managed Streaming for Kafka, MSK. So should we start with SQS, given that it's, I think, the oldest AWS service? What would you use SQS for?

Luciano: Yeah, I suppose a good first way of looking at all the services is trying to understand, at a high level, what the categories are. For instance, talking about SQS, I would say that SQS is more of a point-to-point type of communication. You generally have one producer that is creating tasks or task definitions, job definitions, whatever we want to call them. They are stored in a queue. So the idea is that you push things from one side and you consume them from another. And generally you have one consumer or a group of consumers, but they are all consuming the same type of tasks. So you would generally say that it's a one producer, one consumer type of configuration when you use SQS. For any usage, yeah.

Eoin: Yeah.

Luciano: In comparison, we have something that is more pub/sub, publish and subscribe, and I would probably put SNS and EventBridge in that bucket. The reason is that with pub/sub you generally have, again, one producer, but on the other end you might have from zero to many consumers. You are just saying: this is happening. This is the definition of an event. Nobody might listen to it, so the event just fades away, or you might have one or more consumers actually reacting to that particular event.

And then the other case or group, if you want, is the more streaming-based use case. This is where I would put Kinesis or MSK. Again, you have one producer and zero to many consumers. The difference is that messages are, I would say, somewhat more important because they are persisted and kept around, and that gives you more features. For instance, you can replay them. Also, streaming generally has higher throughput. So Kinesis or MSK are definitely good candidates when you are processing really high-throughput data and you need to do real-time operations on the incoming data.

Eoin: That makes a lot of sense. Yeah, so should we start with SQS first then and think about some of the use cases and where would you use it? What do you think?

Luciano: Yeah, so in general, I would use SQS in all those situations where I want to make a system more resilient, because you can just put a queue in the middle of a particular operation. Between where you receive the input and where you produce your output, if you put a queue, that gives you the ability, first of all, to persist all that input, and then to decouple the collection of the input from actually doing the work to produce the output.

And with that, you get a lot of features. If something fails, you can easily retry it. You can also create a dead letter queue: if you retry a message multiple times and it is still failing, you can save it there. Somebody can then analyze it manually and figure out what went wrong. And then from that dead letter queue, maybe you can re-ingest the message and execute it again after you realize what was going wrong.

So that's generally a good use case for SQS, but it can also help with performance. For instance, you can use a queue for every single action in a system where maybe you are receiving a user request and you need to do some actions, but those actions can be postponed. You don't need to do them right now just to produce a response for the user. Just a few examples, very common ones. A user does something and you need to send a confirmation email. You don't need to do that inline in the user request. You can just save it to a queue, and a background process can pick it up and actually send the email. Very similarly, a user is uploading a picture and you need to resize it. You don't need to let the user wait while you are resizing the picture. You can just say, okay, the picture was received, and you can do all the resizing in the background using a queue. Anything else worth adding?

Eoin: I think that really covers it very well. Like if you imagine at a very simple level, if you've got a synchronous request processing flow right now, if you want to add resilience to that, just put a queue in the middle. And that's a good first step to making your architecture more resilient and more performant.
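
The retry and dead letter queue flow described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not SQS itself: the function `process_queue`, the `MAX_RECEIVES` limit and the `send_confirmation_email` handler are all hypothetical names chosen for the example.

```python
from collections import deque

MAX_RECEIVES = 3  # hypothetical limit: attempts allowed before parking a message


def process_queue(messages, handler, max_receives=MAX_RECEIVES):
    """Drain an in-memory queue, retrying failures and parking poison messages.

    Returns a (processed, dead_letters) tuple.
    """
    queue = deque((body, 0) for body in messages)  # (body, receives so far)
    processed, dead_letters = [], []
    while queue:
        body, receives = queue.popleft()
        receives += 1
        try:
            handler(body)
            processed.append(body)
        except Exception:
            if receives >= max_receives:
                dead_letters.append(body)  # kept aside for manual inspection
            else:
                queue.append((body, receives))  # back on the queue for a retry
    return processed, dead_letters


def send_confirmation_email(body):
    # Hypothetical handler: fails for anything that is not an email address.
    if "@" not in body:
        raise ValueError(f"invalid recipient: {body}")


processed, dead_letters = process_queue(
    ["ann@example.com", "not-an-address", "bob@example.com"],
    send_confirmation_email,
)
print(processed)
print(dead_letters)
```

The producer only has to enqueue the work. Failures stay isolated in the dead letter list, where they can be inspected and re-ingested later without blocking the healthy messages.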

Luciano: Yeah. Do you want to move to SNS then?

Eoin: Yeah, yeah. So SNS is, back to the categorization, a pub/sub mechanism. So instead of having a point-to-point thing like SQS, where you typically have each message being processed by one consumer, with SNS you want to target multiple subscribers. A subscriber could be something like a Lambda function. It could be one of the other things that SNS supports, like email and SMS messages. That's kind of how it got started.

But it's essentially when you want to be able to publish something and have other systems react, but you don't necessarily know what those other systems are in advance. You can anticipate multiple potential subscribers in the future. So the thing about SNS is that you need to create a couple of resources. You need to create both the topic resource and then you need to create one or more subscription resources in order for messages to flow.

And it's also not inherently resilient. So with SNS, if you want your messages to be stored, you would typically have an SQS subscriber and then have your consumer actually react to the queue rather than to the topic directly. That's a very typical way of doing a fan-out: SNS, and then multiple SQS queues at the end. So the examples where you'd use it, I think, are in a microservices architecture, if you want to communicate events between domains.
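
As an illustration of that fan-out shape, here is a small in-memory sketch. It does not use the SNS or SQS APIs: the `Topic` class and the queue names are hypothetical stand-ins, meant only to show a topic copying each message to every subscribed queue and dropping messages when nobody is subscribed.

```python
from collections import deque


class Topic:
    """Toy stand-in for a topic: copies each message to every subscribed queue."""

    def __init__(self):
        self.subscriptions = []

    def subscribe(self, queue):
        self.subscriptions.append(queue)

    def publish(self, message):
        # With no subscriptions the loop does nothing and the message is lost.
        for queue in self.subscriptions:
            queue.append(dict(message))  # each queue gets its own copy
        return len(self.subscriptions)


topic = Topic()
dropped = topic.publish({"event": "early"})  # nobody is listening yet

audit_queue, worker_queue = deque(), deque()  # hypothetical subscriber queues
topic.subscribe(audit_queue)
topic.subscribe(worker_queue)
delivered = topic.publish({"event": "something-happened"})

print(dropped, delivered)
print(list(audit_queue), list(worker_queue))
```

Each consumer then reads from its own queue at its own pace, which is what gives the stored, retryable behaviour that the topic alone does not provide.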

I suppose, if you imagine an e-commerce application with multiple microservices, and let's say you've got an order service and an order is created following a web request, then you might have an analytics service that picks up that order event and stores it in the data warehouse to do some analytics or ETL downstream. Then you might have another service that sends a confirmation email to a customer and another service that starts the fulfillment workflow. So if you're doing this kind of event-driven orchestration, that's one of the ways you could do it. SNS has so many different use cases. Wherever you would use something like a topic in some of the traditional services like ActiveMQ or RabbitMQ, you could use SNS instead. And it's pretty performant as well, so the throughput is pretty good. And that brings up an interesting point, because SNS and EventBridge can seem quite similar on the face of it. If somebody was choosing, should I use SNS or should I use EventBridge, how would you describe it? They both seem like pub/sub. What's the difference?

Luciano: Yeah, that's a good question. And if I have to be honest, I was a little bit confused myself when EventBridge was announced, trying to understand: OK, why a new service when we have SNS? I think it just covers a very similar space. So definitely there is an overlap between the two services. The difference is more in the way they provide you all the tools and features that you might need to fulfill that particular need.

So I think EventBridge is definitely more flexible than SNS, and in different ways. Just to try to make a list, the first obvious difference you will notice is that with EventBridge you don't need to explicitly create topics. You have what is called a default bus, and you can just use that to publish all your messages. The other thing is the way you consume those messages: with SNS, you are basically listening on a particular channel for all the events that are published on that channel.

With EventBridge, the mechanism is based on pattern matching. So you could describe pretty flexible patterns that will allow you to capture even different types of events with just one expression. So for instance, you could say, OK, I want to listen for all the events produced by a particular source, all the events that contain a specific ID in a given field of the event. So you can build all the patterns in the way that makes the most sense for your application, and that can be very powerful.
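
To make the pattern matching idea concrete, here is a deliberately simplified matcher. It follows the general shape of EventBridge event patterns, where each leaf is a list of accepted values and nested objects match nested fields, but it is only a sketch: it ignores array-valued event fields and operators such as prefix or numeric matching, and the `matches` function, the sources and the IDs are hypothetical.

```python
def matches(pattern, event):
    """Return True when every field named in the pattern matches the event.

    Simplified rules: a list is a set of accepted values, a dict recurses
    into a nested object, and fields absent from the pattern are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical pattern: events from one source that carry a specific ID.
pattern = {"source": ["shop.orders"], "detail": {"customerId": ["c-42"]}}

order_created = {
    "source": "shop.orders",
    "detail-type": "OrderCreated",
    "detail": {"orderId": "o-1", "customerId": "c-42"},
}
order_other_customer = {
    "source": "shop.orders",
    "detail-type": "OrderCreated",
    "detail": {"orderId": "o-2", "customerId": "c-7"},
}
payment_event = {"source": "shop.payments", "detail": {"customerId": "c-42"}}

results = [
    matches(pattern, event)
    for event in (order_created, order_other_customer, payment_event)
]
print(results)
```

Only the first event satisfies both conditions. The consumer declares what it cares about, and the producer never needs to know who is listening.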

Other than that, it is very interesting that EventBridge supports out of the box what I would call AWS internal events, if you want. So not necessarily specific to your application, but events that happen in the context of a given AWS account. And that's something you can use to build specific integrations. For instance, you could listen for a particular Step Functions state machine changing its state, and maybe you would want to react to that.

Or, interestingly enough, you can listen for S3 events. So you can listen for new files on an S3 bucket, or files that are updated or deleted. And similarly, you can listen for CloudTrail events. In all these cases, you can still use pattern matching, so you can have very sophisticated ways of capturing very specific events that matter to your application. Another thing is that it supports many more targets than SNS.

For instance, you can use EventBridge to propagate events to SNS itself, SQS, Lambda, Step Functions, log groups, even Batch and EC2, or even other event buses in other regions or accounts. So it is definitely much more powerful in terms of all the different ways you can distribute the messages. Finally, there are other features: you have the schema registry, where you can visualize the shape of all the messages that are going to the bus.

You have discovery functionality, so that every time there are new messages the schema is registered and you can use it. You can archive the messages for long retention, and you can even replay the messages. So definitely many more features from EventBridge compared to SNS. In terms of examples, I guess they are pretty similar to what you would do with SNS. So I think your e-commerce example is still valid.

You could implement that with EventBridge too, definitely. And because you can also listen to AWS-specific events, another example could be that you are interested in files that are uploaded to an S3 bucket, and you can easily build a pattern that will capture those events. Then you can do, I don't know, a virus scan, maybe, on files that are uploaded by users. Or maybe if those are text files, you can pick them up and index their content so you can create a search functionality on top of these files. All of that is something you can build easily because you don't need to create an event bus for them. You don't need to create the events yourself. They will happen automatically in the context of your AWS account. So yeah, that's all I have. Am I missing anything important from your perspective?

Eoin: I think that covers a huge amount. One of the things I've seen is that EventBridge is slightly slower compared to SNS. So that's one consideration people might want to bear in mind. Sometimes, if you need to process an event within a few hundred milliseconds, SNS might be the best option. But generally, for these kinds of events, most of the events in EventBridge are going to arrive pretty quickly anyway.

You may have some outliers that are a little bit slower. So that's one thing to bear in mind, especially when you're doing this common event process you mentioned. It's a very common thing to have an event bus where all the lifecycle events for all of the resources in your application get published. But if you've got strict performance requirements around that, then you might want to think about something else.

Traditionally in microservices, people may have used Kafka or a stream processing system in the past. So maybe that brings us nicely along to the other category we mentioned: Kinesis, Kafka and stream processing. I think we mentioned that it's suitable for high throughput. With Kafka and Kinesis, you're also going to get lower latency, like really low latency. So when you really want low latency and have to react in milliseconds to events, this is when you need to be thinking about using stream processing.

Some people do use these stream processing systems as pub/sub buses, because you can. They all have a concept of a channel or a stream that you can treat like a topic, and then you have consumers for it. The difference is that the way they work and the way data is stored is completely different. Instead of a message bus, it's almost like sequential lines in a file, and the consumers are just pointing at a given line number.
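
The "lines in a file" picture can be sketched as an append-only list with one read position per consumer. This is a toy model under stated assumptions, not the Kinesis or Kafka client API: the `Stream` and `Consumer` classes and the record names are hypothetical.

```python
class Stream:
    """Toy append-only log; records keep the order they were written in."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # position ("line number") of the record

    def read(self, offset, limit=10):
        return self.records[offset:offset + limit]


class Consumer:
    """Tracks its own position, so consumers do not affect each other."""

    def __init__(self, stream, offset=0):
        self.stream = stream
        self.offset = offset

    def poll(self, limit=10):
        batch = self.stream.read(self.offset, limit)
        self.offset += len(batch)
        return batch


stream = Stream()
for record in ["event-1", "event-2", "event-3"]:
    stream.append(record)

reader = Consumer(stream)
first = reader.poll(limit=2)
rest = reader.poll(limit=2)

# Reading does not delete anything, so a second consumer starting from
# position 0 sees the full history again, in the same order.
late_reader = Consumer(stream, offset=0)
replayed = late_reader.poll()

print(first, rest, replayed)
```

Because records are never removed by reading, ordering and replay fall out of the data structure itself.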

And that's kind of a simple model for how those things work. What that means is that you can get guarantees around ordering, and that is one of the fundamental things that can be beneficial with Kinesis and Kafka. You've also got retention. Kinesis will allow you to store messages for up to a year. With Kafka, you can store them forever. But you have to think about scalability, because if you're looking at a Kinesis consumer, there are consumption throughput limits you need to think about.

There are producer throughput limits. And in all these cases, you need to size the cluster if it's Kafka, or you need to size the streams if it's Kinesis. So you really have to think about the numbers and the mathematics around your event flow if you want to use those things. So there's more of an investment, I would say, needed for them. If you can get away with using SNS, SQS, and EventBridge instead, it's going to be much simpler. But they do have their places. So I guess, yeah, maybe we should give some examples. Luciano, do you have some good examples that illustrate the differences between stream processing and pub/sub, and what you'd use it for?

Luciano: Yeah, absolutely. So one example that comes to mind is, for instance, you have an application where you have users interacting with, let's say, products. Again, just to stick with the e-commerce example. Maybe you would be really interested in trying to observe what the user is doing in the page to understand what other products you can suggest to that user. And you could implement something that captures all the user clicks, but not just the clicks, maybe even the user hovering over specific elements in the page, or even scrolling the page and looking at a specific area of the page.

And you can create streams where you are basically sending all these events in real time. Then you can have analytics consuming all this information in real time and responding with suggestions for other products that the user might be interested in looking at and maybe purchasing. Other examples that I've seen are, in general, when you need to transfer large amounts of data between systems, and you need to do that as quickly as possible.

For instance, logs or network traffic, metadata about network activity happening in a particular, I don't know, private network or something like that, because you might want to do a security analysis or things like that. One cool thing about Kinesis is that you basically send messages in batches, and you can receive a batch as soon as it's available. So for instance, as soon as you accumulate, let's say, I don't know, 30 messages in a batch, and that can happen in literally single-digit milliseconds, you will receive that batch. Alternatively, you can define a time-based window. So if the batch is not complete within one second, I still want to receive it straight away because I want to process it as fast as possible. So I think Kinesis will give you these capabilities that are not easy to replicate with something like SNS or EventBridge.

Eoin: Yeah, for sure.
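
The "full batch or time window, whichever comes first" behaviour can be sketched as follows. This is an in-memory illustration under stated assumptions, not the Kinesis API: the `Batcher` class is hypothetical, timestamps are passed in explicitly to keep the example deterministic, and the batch size is 3 rather than 30 for brevity.

```python
class Batcher:
    """Collects records and emits a batch when it is full or the window expires."""

    def __init__(self, batch_size, window_seconds):
        self.batch_size = batch_size
        self.window_seconds = window_seconds
        self.buffer = []
        self.window_start = None

    def add(self, record, now):
        """Add a record at time `now` (in seconds); return a batch or None."""
        if not self.buffer:
            self.window_start = now  # the window opens with the first record
        self.buffer.append(record)
        return self._flush_if_due(now)

    def tick(self, now):
        """Call periodically so a partial batch is still delivered on time."""
        return self._flush_if_due(now) if self.buffer else None

    def _flush_if_due(self, now):
        full = len(self.buffer) >= self.batch_size
        expired = now - self.window_start >= self.window_seconds
        if full or expired:
            batch, self.buffer = self.buffer, []
            return batch
        return None


batcher = Batcher(batch_size=3, window_seconds=1.0)
emitted = [
    batcher.add("a", now=0.00),  # buffered
    batcher.add("b", now=0.01),  # buffered
    batcher.add("c", now=0.02),  # batch is full, emitted immediately
    batcher.add("d", now=5.00),  # buffered, new window opens
    batcher.tick(now=5.50),      # window still open
    batcher.tick(now=6.00),      # window over, partial batch emitted
]
print(emitted)
```

Under load the size limit dominates and batches arrive within milliseconds. When traffic is quiet, the window bounds how long a record can wait.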

Luciano: Yeah, I think with that, we have a good overview of Kinesis. Maybe it's worth spending a little bit more time trying to highlight some of the differences with Kafka, MSK, and why you might want to use that as an alternative to Kinesis. What do you think?

Eoin: Yeah, that's true. Because Kafka, I guess, is similar, but has way more features than Kinesis. I mean, Kinesis is deliberately simple. You have to size it correctly. Both of those services recently announced that they have a serverless mode, so it kind of auto-scales a little bit. It's not completely serverless, but it helps. You still have to think a lot more about sizing. With Kafka, you'll get more features.

If you've already got Kafka, I would say that's a reason you would use MSK. If you don't already have an existing investment in Kafka and all of the tooling and the ecosystem around it, which is really large and fairly complex, then it's probably OK to go with Kinesis if you need stream processing. And I think that's really where MSK fits: it's really for people who are migrating to AWS and have already got Kafka somewhere. But it does have a huge feature set. So if you've got lots of advanced stream processing use cases at scale, then it's definitely worth a look.

Luciano: Yeah, absolutely. I would probably summarize it by saying that Kafka is more of an industry standard for this kind of task.

Eoin: It is. It's de facto.

Luciano: So if you work in multi-cloud environments or, I don't know, run your own Kubernetes clusters, it's probably better to use something like Kafka because you will have it available in all the environments you need it in, while Kinesis is AWS specific, so you can pretty much only run it on AWS. And yeah, I think that brings us to the conclusion of this episode. Just to summarize, I think you can probably cover 90% of the use cases by just sticking with SQS and then SNS or EventBridge.

And then you can use Kinesis or Kafka only for more advanced use cases where you need high throughput, or where maybe you need to store all the events for a long time and you need all the additional features that you get from something like Kafka. So with that, I think we have covered pretty much all that we wanted to cover today. We will be doing a more in-depth series on all the services. So if that's something that interests you, make sure to subscribe and click that Like button so you can support us and be notified every time we publish the next episodes. Thank you very much for being with us, and we'll see you in the next episode.