AWS Bites Podcast

23. What’s the big deal with EventBridge?

Published 2022-02-10 - Listen on your favourite podcast player

Eoin and Luciano continue their series about event services. In this episode, they chat about EventBridge and explore why this AWS service has such great potential for event-based serverless applications. This episode presents some interesting examples of when and how to use EventBridge. It also covers all the different classes of events that you can manage with EventBridge: AWS events, third-party events and custom events. We discuss limits and pricing and, finally, we show how things can go wrong and how much you can end up paying for it. We conclude the episode with some tips and resources to avoid shooting yourself in the foot and get good observability when using EventBridge.

In this episode we mentioned the following resources:

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: Hello everyone, today we're going to answer the question: how do you use EventBridge? By the end of this episode you will know why EventBridge is very different from everything else in this space. We will also discuss some interesting examples of when and how you can use EventBridge, what can possibly go wrong with it, how much you are going to pay for it, and finally we are going to give you some tips to avoid shooting yourself in the foot and get good observability. My name is Luciano and today I'm joined by Eoin, and this is AWS Bites podcast. Before we get started, let me mention that this is a series about AWS services that cover the space of events and message passing. We just finished an episode about SQS, and before that we had a more generic introduction to all these services. Today we're going to focus on EventBridge, but if you're interested in this space make sure to follow and subscribe so you get notified whenever we publish the next episodes. But back to EventBridge: Eoin, what can you tell us about the main features, the main reasons why EventBridge is so interesting?

Eoin: Yeah, EventBridge. If we talk about the classification we previously used, we had queues, then pub/sub, and then streaming. EventBridge we put in the pub/sub category, but it's a little bit different to other pub/sub systems; it's got some interesting features. EventBridge is nice in that you can publish events without really provisioning anything up front, and I think that's the thing people really like. You don't have to provision any resources in AWS if you want to send events to it, but if you want to consume them then you just need to create a rule, and that's how it works.

So you don't really have the subscriber concept like you do with SNS. You don't necessarily have to have listeners; instead you have rules, and a rule is slightly different to a subscription. With a rule you define a pattern, and it really is a pattern-matching concept, similar to the kind of pattern matching you get in some programming languages. You have an event and you're saying: I want to receive events that look a little bit like this pattern here. You can specify some properties in that event and their values, a prefix that a string might begin with, integer comparisons, and then you say: okay, with this rule, if you find any matches, please send these events to these targets. One of the great things about EventBridge is that it supports lots of targets. EventBridge is almost like the "integrate anything with anything" service on AWS, so of all the services we're going to talk about in this series it's probably the most general purpose; it can do almost anything you can imagine. There are of course some limitations, which we'll talk about, but it is really powerful, and it's one of the services that has got a lot of people excited since it was announced. It was actually based on a service that had existed for a long time, CloudWatch Events, but they rebranded it, gave it a new purpose and added a lot of new features, and suddenly people are getting almost as excited about it as they did when Lambda came out. So there are lots of different features in it, and you've got different event types. Do you want to talk about what the different events you can get with EventBridge are?
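
(As an illustration, here is a minimal sketch of the kind of pattern a rule might use. The "order.service" source and the "total" field are made-up examples for this episode's e-commerce scenario, not anything from a real system.)

    # A hypothetical EventBridge rule pattern: match events whose source is
    # "order.service", whose detail-type starts with "Order", and whose
    # detail.total is greater than or equal to 100.
    order_pattern = {
        "source": ["order.service"],
        "detail-type": [{"prefix": "Order"}],
        "detail": {
            "total": [{"numeric": [">=", 100]}]
        }
    }

Any event that matches this pattern would then be forwarded to whatever targets are attached to the rule.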

Luciano: Yeah, before that I just want to mention that it reminds me a little bit, in its purpose and the way it has been marketed, of the "If This Then That" or Zapier answer from AWS, right? So, different types of events: there are, I would say, three main categories of events that you can deal with. The first one is events that are, I would call them, native to AWS, so things that happen in your AWS account or other accounts that you might want to listen for and react to. Just to give you a few examples: you can be notified whenever a spot instance has been interrupted, or maybe you have Step Functions and you want to react to a step function changing state, maybe it started, maybe it's failing, maybe it's timing out. You can even enable notifications and events for S3 objects, so you could say every time there is a new object in a bucket, trigger an event, and you can create a rule to match that. As a very generic option you can use CloudTrail as well, and that will give you access to another wide range of events. There is a list that we will put in the show description, there is a link there, and you will find all the events and services from AWS that can trigger events you can capture in EventBridge. There is an interesting small detail that you need to be a little bit careful about: there are different guarantees in terms of delivery, depending on the type of event source that you are trying to use. Sometimes you have a kind of guaranteed delivery and sometimes you have best effort, so I suppose that pretty much means...

Eoin: ...whether you get at-least-once delivery or at-most-once delivery. Yeah, I guess if it's best effort it's kind of "at least zero", right? Because for some of those services you just won't get an event.

Luciano: So yeah, it's good to check the docs. This is the first category, and it's something that is there for you if you want to use it: you just create a rule and you use it. In some cases you need to enable the events, but most of the time it's there and it works for you. Then there is another category, which is partner events, and this is actually really interesting; this is why sometimes I like to think about EventBridge as a sort of Zapier or If This Then That, because you can plug in events from other SaaS platforms, for instance Salesforce, Auth0 or PagerDuty. If you integrate the platform with your AWS account, you can basically start to get events from those external platforms and consume them. An example that comes to mind: maybe you have a sales pipeline that you are mapping in Salesforce; if that pipeline changes, maybe you have new deals coming in, you can receive events, react to those and create integrations. And finally, the third category of events is custom events, so you can use the SDK to dispatch your own totally custom events. For instance, if you are building an e-commerce, which seems to be our favourite example, you could create your own custom events every time there is a new order, you can define the structure of those events, and then different parts of your application can create patterns and rules to capture those events and react to them. That's all I have for events, but what about targets?
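
(Circling back to the first category for a second: a sketch of a pattern that would match the S3 "Object Created" events you get once EventBridge notifications are enabled on a bucket. The bucket name is a made-up placeholder.)

    # Hypothetical pattern for S3 object-created events delivered via EventBridge.
    # These only flow if EventBridge notifications are enabled on the bucket.
    s3_pattern = {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": ["my-uploads-bucket"]}
        }
    }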

Eoin: that's yeah I mentioned that one of the advantages is that you have loads of targets so compared to SNS or SQS you have many services that you can integrate directly from EventBridge you also have HTTP destinations so if you've got a third-party API you can use EventBridge to route all of these different types of messages that can come in to a third-party API or an external API or one of your other APIs and it also supports neat things like throttling and exponential backup and back off and retry that's really useful and one of the great things that EventBridge has that a lot of other services doesn't have is cross account of EventBridge so you can very if you've got different applications different domains running in different accounts or even third-party events you can set other EventBridges in other accounts as a target so that makes it really easy to integrate without having to have you know specific services that are there with an IAM policy and you know network routing that can route from one account to another that you have to do with some other kinds of services and EventBridge takes care of that so there's definitely the most rich the richest supportive supported set of supported targets available.

I suppose there are a couple of other features that came out after EventBridge was launched: the schema registry, and then archive and replay. For people who have a lot of events in their system, who are trying to support a large number of developers and share knowledge about what kind of events they can listen to, the schema registry is really useful. EventBridge can discover what kind of events are coming through your system and automatically register schemas for them, and then you can use EventBridge to generate code bindings, so if you're using an object-oriented, typed language it can generate classes for you, for example. That's really neat, because it can get pretty difficult if everything is dynamic and event-driven and you don't have types; it can be hard to understand the structure of events and what properties are supported. And then you've got the archive. EventBridge archive allows you to have retention on those events so you don't lose them, because they're otherwise ephemeral: if nobody's listening, if there are no rules, the events magically disappear. But if you've got an archive you can actually replay events, so you can add rules, change rules, fix problems and then replay events, which is pretty good for resilience. Should we go through the process of using EventBridge? It's pretty simple, right, but I guess it depends. What are the steps involved if you want to start sending and receiving events? We talked about how to do it with SQS, how do you do it with EventBridge?
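
(A quick sketch of setting up an archive on the default bus with boto3, before we get into the sending and receiving flow. The account ID, region, archive name and retention period are placeholders.)

    import boto3

    events = boto3.client("events")

    # Archive every event flowing through the default bus for 30 days,
    # so they can be replayed later if a rule was missing or broken.
    events.create_archive(
        ArchiveName="all-events-archive",
        EventSourceArn="arn:aws:events:eu-west-1:123456789012:event-bus/default",
        RetentionDays=30,
    )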

Luciano: Yeah, I think it's good to split the process into two parts, sending events and receiving events, because of course you only care about sending events when you are creating your own custom events. If you want to listen for AWS events or third-party events, you just get them, so you only need to focus on the receiving part. If you're creating your own custom events, the first thing you need to do is select the bus that you want to use. There is a default bus, and I would say that most of the time that's good enough, but of course you also have the option to create more specialised buses if you have to. Then there is an API to send events, so you can do that through the SDK, which is probably the most common pattern, or you can just call the API or the CLI. There is a specific interface that you have to use, but it's pretty free-form because you can structure the event pretty much as you want; it's a big blob of JSON, and we'll talk more about that in a second. Finally, when you want to listen for the event, you'll need to create a rule, and the rule is also another JSON object with properties. So let's zoom in on those two JSON objects. When you create an event there is a kind of best practice, pretty much what you will also see in the AWS events and the third-party events: there are fields that are expected to be there, they have a meaning, and these are the ones you will be using when matching those events. The first one is source, and source pretty much describes who is creating the event. It could be your own service; again, in the e-commerce example it could be the order service, so you could literally say source "order.service" or something like that. In the case of AWS you will see something like "aws.states", for instance, for Step Functions, and for third parties I expect you have something similar, for instance "salesforce.pipeline", maybe. Then another field is detail-type, which is a little bit more descriptive and tries to describe the specific type of event that was generated by that source. In our e-commerce example that could be "OrderCreated"; in the case of AWS they tend to be a little bit more verbose, with entire sentences like "Step Functions Execution Status Change", an entire sentence that you need to match on. And finally, the most interesting part is the detail attribute. Generally you can use that as you want, meaning that it's an object and you can store inside it all the data that you think is relevant to describe the event. This is where it gets a little bit tricky, because you have a lot of freedom and you need to be careful. On one side you can store just the minimum amount of data that represents the event, for instance, if it's an order, just the order ID. On the other side you might store all the data that represents that order. The two approaches impose different constraints: if you go with a very minimal approach, whoever is consuming the event will need to query different data sources or fetch additional data somewhere else, so that might add extra calls and cost on the consumer side; on the other hand, very big messages can also be a problem, especially if you are storing these messages, because you might incur an additional cost. I'm not really sure if there are limits on the size of the payload, but we'll talk more about limits later. I believe it's the same as SNS and SQS actually, so 256 KB for the whole message. Yeah, so be careful, you also have to respect that limit. One suggestion would be to start with something that is relatively small, something that you think makes sense, and then if you realise over time that you need additional fields you can always add them later.
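
(A sketch of publishing one of these custom events with boto3, using the made-up order example; the detail payload is entirely illustrative.)

    import json
    import boto3

    events = boto3.client("events")

    # Publish a custom "OrderCreated" event to the default bus.
    # Source, detail-type and the detail payload are all up to you.
    response = events.put_events(
        Entries=[
            {
                "EventBusName": "default",
                "Source": "order.service",
                "DetailType": "OrderCreated",
                "Detail": json.dumps({"orderId": "1234", "total": 149.99}),
            }
        ]
    )

    # PutEvents can partially fail, so it's worth checking this counter.
    assert response["FailedEntryCount"] == 0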

Eoin: Yeah, it's interesting. When you talk about event-driven architectures, people say "oh, you should do event-driven architectures because it means you're loosely coupled", as if it's a very clear yes/no thing, loosely coupled or not. But I've heard the term semantic coupling being used: when you're relying very tightly on the structure of an event and all its fields, that is a form of semantic coupling. It's not the same as location coupling or time coupling, but it means you're bound to the event that you're used to seeing, so it makes it very difficult for the producer of that message to change it and to change the structure of their data types over time. So you really have to get that balance right.

Luciano: Yeah, I think to all intents and purposes you should consider those messages as an interface, right? Because as soon as another service starts to consume them, you might break things if you suddenly change the structure of the events dramatically.

Eoin: dramatically the structure of the events yeah for example you mentioned that in the source field you might put the order service and so if somebody says oh well then i should match on the source because i want to get order of events so i should match on the order service but it probably doesn't make a lot of sense to match on the source a lot of the time because this is kind of coupling to the producer you might it makes the detail type is usually a little bit more semantically meaningful like order created or order created that sounds like a lifecycle event that you can

Luciano: Yeah. What else can we add? We mentioned that you can use this to communicate across domains, so it could be a good way to do event choreography, and we gave you a few interesting details. Another interesting detail that could change the way you implement the integration between services is that if you really need semantics like exactly-once processing of a particular event, it's on you to do that: you need to provide your own deduplication IDs and you need to handle deduplication, because by default you get at-least-once delivery, so you might get the same message more than once. And finally, another interesting detail is security. You of course need to use IAM and define all the resource constraints, but you can be very granular; for instance, you could say that it is not possible to create a rule that matches certain detail types, so you can limit it based on the structure of the events. That can be useful because, if you have total freedom in listening for events, that can become a side channel for services to listen for information that they are generally not allowed to access, so you might want to be very strict in that regard as well. An interesting thing, the other side of the coin I suppose, is how do you actually write the rules? Do you want to tell us something about that, Eoin?

Eoin: Yeah, so when you create a rule you can create it using CloudFormation or Terraform or the API or the console, and in the rule you'll define the pattern, but you'll also define the targets. For targets, as we mentioned, you can target a Lambda function, an SQS queue or another event bus, and lots of other services, but you can also do mappings. If you want to transform the data, say you're sending it to an HTTP destination and you need to match the payload structure that's required, you can use an input transformer, which allows you to extract out fields and map them into a different kind of structure. You can do that with any event, actually, and any destination. You've also got target configuration that is specific to an individual service, and examples of that would be Kinesis or SQS FIFO, where you specify at what level you want ordering guarantees. With Kinesis you might put in a partition key, and with SQS FIFO you could put in the message group ID, and when you write an EventBridge rule, in the targets you can say "this is the field that I want to become the partition key or the message group ID". So if you've got events coming in through EventBridge, you can specify the ordering you want, or essentially the shard they will be processed on. Now remember that EventBridge doesn't give you any ordering guarantees, because it's not that kind of service, so you don't have ordering guarantees on the input side; it's really more about best-effort ordering, or just controlling the shard the events get allocated to so that you can process them concurrently. That's really the benefit there. Another example would be ECS: you can fire off an ECS task, so run a container in response to an event, and your ECS task configuration in a rule will allow you to specify what container image to use, what the task definition is, all of that stuff. There's so much you can do with this; it's worth looking into the CloudFormation documentation, I always find that's a good way to see what all the configuration parameters for these things are. It really does allow you to integrate almost anything with almost anything.
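
(A sketch of what that might look like with boto3 rather than CloudFormation: a rule matching the made-up "OrderCreated" event from earlier, with an SQS queue as a target and an input transformer reshaping the payload. The rule name and queue ARN are placeholders.)

    import json
    import boto3

    events = boto3.client("events")

    # Create (or update) a rule on the default bus that matches OrderCreated events.
    events.put_rule(
        Name="order-created",
        EventBusName="default",
        EventPattern=json.dumps({
            "source": ["order.service"],
            "detail-type": ["OrderCreated"],
        }),
        State="ENABLED",
    )

    # Attach an SQS queue as a target and reshape the event on the way through:
    # extract two fields from the event and map them into a smaller JSON payload.
    events.put_targets(
        Rule="order-created",
        EventBusName="default",
        Targets=[
            {
                "Id": "order-queue",
                "Arn": "arn:aws:sqs:eu-west-1:123456789012:order-events",
                "InputTransformer": {
                    "InputPathsMap": {
                        "orderId": "$.detail.orderId",
                        "total": "$.detail.total",
                    },
                    "InputTemplate": '{"id": "<orderId>", "amount": <total>}',
                },
            }
        ],
    )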

Luciano: Yeah, the cool thing about that is that most of the time it's so configurable that you just need to write the right configuration to achieve a good integration. You don't even need, for instance, to write a Lambda to reshape the data and call the destination service, which is otherwise maybe the one thing you would still need to do. Because you have such flexibility, you can probably build sophisticated integrations without provisioning any compute; it's just EventBridge configuration.

Eoin: Yep, and that's a good thing, because, okay, it might be a little bit more difficult to troubleshoot an input transformer than a Lambda function, that's the trade-off, but you don't have the latency and the extra Lambda function to manage, so you've just got to pick your battles and figure out which is best for your use case. Since I mentioned latency, should we start talking about limitations, constraints and performance characteristics?

Luciano: Yes, I think that's a good follow-up to how you use EventBridge. I think the main one is latency, like you said already. How different is it, for instance, from SNS, where you have very small latency? We're talking about 30 milliseconds, I believe, in the case of SNS; with EventBridge it's a little bit hand-wavy, but I think it's generally around half a second, which is quite different from SNS, right? SNS is very performant. With EventBridge, I suppose you need to start from the premise that you don't really care about extremely fast delivery; it's going to happen relatively fast, but not milliseconds fast.

Eoin: Yeah, I think this is exactly one of the rules of thumb where you can decide whether you are going to use EventBridge. If you need strict performance, latency and high throughput, you have to go with SNS or one of the streaming solutions. But if you just need to react to business events and process them, and it doesn't have to be immediate, then if you want to fulfil an order, half a second is probably not a significant latency in the grand scheme of things when you're talking about package delivery. But if you're talking about a user who has clicked an action and you're going to do some processing on it, and they're waiting in the web browser or on a mobile device for a green tick to show that something has been processed, you might think twice about EventBridge in that kind of flow, because the guarantees, as the AWS documentation says, are typically around half a second.

Luciano: So it's not exactly confidence-inspiring if you're looking for real-time event processing. That becomes especially true if you have a cascade of events, events depending on one another; of course that effect can compound, so be even more careful if you have that kind of situation.

Eoin: I would still say that for a lot of cross-domain events, and even inter-microservice communication, this should be sufficient in the vast majority of cases, and I would recommend against using this latency limit as a false argument just to reject EventBridge and go with something that could be much more complex. EventBridge is so simple and manages so much for you that it has the potential to save you massive amounts of engineering time, so it's worth sticking with it, and only if you really need to tweak performance might you have to deal with optimisations for specific cases.

Luciano: Yeah, there are other limitations, which I'll mention quickly, but nothing extremely special. For instance, you have invocation quotas; they are slightly different depending on the region, so watch out for that, and they can be increased if you need more invocations. There is also a limit on the number of put events that you can do over a certain period of time. There is a limit of, I think, 300 rules per bus, but it's a soft limit that you can get increased if you need to create more rules, and for every single rule you create you can have up to five different targets to trigger if that rule matches. In terms of pricing, is there anything interesting worth mentioning?

Eoin: The pricing for sending events is pretty straightforward. I think for every region you're looking at a dollar for every million events, and that's whether it's custom events, third-party events or AWS events. In the case of AWS events, that only counts for the ones that you need to explicitly enable, like the S3 object events; not all of them are actually charged, because the default AWS service events are sent already, even if you don't know it, so you're not going to be billed for those. There is an additional cost, though, for things like HTTP destinations: it's a supplemental cost of 20 cents per million deliveries. And those features we talked about, like archive and replay, are the areas where you might want to be a little bit more careful when looking at pricing, because if you're archiving events, sometimes you don't notice that the volume is escalating, and if you've got millions of events you're going to be charged per gigabyte of archive per month, so that's an ongoing cost that will grow over time. With replay you get charged by the events processed; replay is probably not something that you're going to do as part of a programmatic workflow, it's more likely to be part of remediation in the event of a failure, so it's just important to be aware of it. There's also a charge for the schema registry in terms of events ingested when you're discovering schemas automatically. So that's the pricing spiel done. I think one of the things that is challenging with all event-driven systems is how do you test it, and you've been reading some pretty good articles on this, Luciano, and I know Paul Swail has covered this in detail a couple of times. What do you think are the best recommendations there?
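
(Before moving on to testing, a quick back-of-the-envelope using only the figures mentioned above; actual prices vary by region and feature, so check the pricing page. The event volumes here are made up.)

    # Rough monthly cost sketch using the figures mentioned in the episode:
    # $1 per million published events, plus $0.20 per million HTTP
    # (API destination) deliveries. Archive storage is charged per GB-month
    # on top of this and is not included here.
    events_per_month = 50_000_000
    http_deliveries = 10_000_000

    publish_cost = events_per_month / 1_000_000 * 1.00   # $50
    delivery_cost = http_deliveries / 1_000_000 * 0.20   # $2
    print(publish_cost + delivery_cost)                  # ~$52/month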

Luciano: Yeah, I think, at least for me, the main pain point is making sure that the rules I write, the patterns, are actually doing what I think they should be doing, because this is a JSON-based language with its own nuances: you have a lot of power, but you need to learn how to use the different constructs to do the pattern matching, and you need to find a good way to test whether your rule is actually correct. I found that there is a little bit of friction there, maybe something to be improved in AWS in terms of the tools you get to gain reassurance that your patterns are actually doing what you want them to do; there aren't really a lot of great ways to do that in AWS. What I discovered is that there is an API called TestEventPattern that you can use either from the CLI or from an SDK. With that API you send an example event and your pattern, and the API will tell you true or false based on whether your rule actually matches that particular example event. That can be a good way of testing things; it's just a little bit annoying that you need to write it either as code through the SDK or through the CLI, and it's very hard to write JSON in the CLI, especially for big events or rules. Maybe that's an area where somebody could write a little tool or a little UI; that would definitely be useful, at least for me. Maybe AWS itself could do that, that would be amazing. Then there is the article that you mentioned from Paul Swail, which is actually really good and goes in depth on all the things that could possibly go wrong when you start to use EventBridge: from the moment you publish, are you actually able to publish, and if not, why? The moment you try to consume the message, are you actually receiving it, and if not, why? And then, after you receive the message, are you actually processing it correctly, and if you fail to process it, what can possibly happen? That's an interesting article, and if you are interested in all these possible edge cases we recommend you check it out. There are another couple of tools that can help you troubleshoot whether you are actually delivering from source to destination correctly. One is a tool called eventbridge-cli (again, we will have the link in the description); it's written in Go, I believe, and it's very interesting because you just run it locally and it will provision in your account the rule that you have defined, the one you want to test, but it will automatically wire that rule up to SQS, and then the CLI keeps polling that queue. So you get an interactive CLI, you can start to fire events and see if they appear in your local terminal, which is a very convenient way to check that your patterns and your events are actually connected correctly together. I think there is another, probably similar, tool from Lumigo, but I haven't used that, so we'll put the link in the description as well. Other interesting tools are EventBridge Atlas and EventBridge Canon, which also help you with writing the events, testing them and sharing them in a team. I haven't used them extensively, so I cannot give you a lot of detail, but they're definitely interesting to check out.
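
(A sketch of calling TestEventPattern from Python instead of the CLI, which avoids hand-writing JSON on the command line; the event and pattern reuse the made-up order example from earlier.)

    import json
    import boto3

    events = boto3.client("events")

    # TestEventPattern expects a full event envelope, so include the standard
    # top-level fields alongside your own source, detail-type and detail.
    sample_event = {
        "id": "7bf73129-1428-4cd3-a780-95db273d1602",
        "detail-type": "OrderCreated",
        "source": "order.service",
        "account": "123456789012",
        "time": "2022-02-10T12:00:00Z",
        "region": "eu-west-1",
        "resources": [],
        "detail": {"orderId": "1234", "total": 149.99},
    }

    pattern = {
        "source": ["order.service"],
        "detail": {"total": [{"numeric": [">=", 100]}]},
    }

    result = events.test_event_pattern(
        EventPattern=json.dumps(pattern),
        Event=json.dumps(sample_event),
    )
    print(result["Result"])  # True if the pattern matches the event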

Eoin: Yeah, EventBridge Atlas and EventBridge Canon both come from David Boyne, who also tweets and writes a lot about EventBridge and has some really interesting tools, so definitely worth checking those out. Should we talk about integration with SQS? We talked about DLQs last time with SQS, and you can use a DLQ with EventBridge too, which is really good, because EventBridge doesn't give you reliability built in: if you don't catch the event, if you missed it, it's gone, right? It can be in an archive, but it's essentially gone. But it does have retry policies, you can configure retry rules on your rule and you can set a DLQ as well, and then your undelivered messages will go to SQS. So here's an interesting one: when do you decide to process things directly using a target, which might be a Lambda function, and when should you put an SQS queue in between those two things, so that you get that extra piece of resiliency?
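
(A sketch of what the retry policy and dead-letter queue configuration on a target might look like with boto3; the rule name and ARNs are placeholders carried over from the earlier examples.)

    import boto3

    events = boto3.client("events")

    # Attach a Lambda target with a bounded retry policy and an SQS dead-letter
    # queue, so events that EventBridge cannot deliver are not silently lost.
    events.put_targets(
        Rule="order-created",
        EventBusName="default",
        Targets=[
            {
                "Id": "order-processor",
                "Arn": "arn:aws:lambda:eu-west-1:123456789012:function:process-order",
                "RetryPolicy": {
                    "MaximumRetryAttempts": 8,
                    "MaximumEventAgeInSeconds": 3600,
                },
                "DeadLetterConfig": {
                    "Arn": "arn:aws:sqs:eu-west-1:123456789012:order-events-dlq"
                },
            }
        ],
    )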

Luciano: Yeah, this is something, I suppose, we also mentioned in the previous episode about SQS: SQS can be used in general to give you that extra peace of mind that a message is actually being recorded, and then you can consume it and retry. I think that answers your question: whenever you have a use case where you really care about a message and you want to be sure that it's being processed, it's worth putting SQS in between and then consuming the message through SQS, rather than letting EventBridge invoke your target directly. In other cases, where maybe you don't care that much and you can afford to lose a message or two, it's probably not worth it, and you can just keep the direct integration and go ahead with that. Yeah, that's good; it's a question of whether you need best-effort durability or something stronger, I guess. Okay, cool, what else should we cover relating to EventBridge? I think I will just mention a few additional resources. I really like the work that Sheen Brisals from the LEGO Group has been doing around EventBridge, so definitely check out his blog; again, we'll have a link in the description. There are, I think, six or seven different articles that cover, pretty in depth and in a tutorial style, how to do different things with EventBridge, and there is also a video where he talks a lot about different characteristics of EventBridge.

Eoin: We'll have a link for that as well, and I think that's really good. Sheen, in that talk, gives good examples of how they structure their events, and they've got a different style to what we mentioned for the detail field, which is really interesting and really worth watching.

Luciano: So yeah, definitely recommend that. And I think with this you get a pretty good overview of what you can do with EventBridge, how to use it and what to watch out for. This is it for this episode, but make sure to follow and subscribe, because in the next episodes we'll continue the series about event systems: we'll be talking about SNS, Kinesis and Kafka. So yeah, stay tuned, we'll talk more about this stuff. Bye!