AWS Bites Podcast

7. When do you use Step Functions?

Published 2021-10-22 - Listen on your favourite podcast player

In this episode Eoin and Luciano try to reply to a question suggested by Emily Shea: When do you use Step Functions?

Our answer describes what Step Functions is and what you can build with it. We discuss some examples of features that we built in the past using Step Functions (a billing engine and a crawler) and why. We also discuss the main advantages of Step Functions and some things to be aware of, including limitations, cost and when not to use Step Functions.

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: Hello and welcome to another episode of AWS Bites, the weekly show where we try to answer AWS questions in just about five minutes. My name is Luciano and today I'm joined by Eoin. And before we get started, I would like to suggest that you follow and subscribe so you can be updated the next time we publish a new episode. This question was suggested by Emily Shea and it was on Twitter. And the question is, when do you use Step Functions? I will say before we try to answer that question, maybe we should give a very quick description of what Step Functions actually is. Yeah.

Eoin: So Step Functions is AWS's service for implementing state machines. So what that allows you to do is to create workflows that are composed of sequences of steps, and they might have, you know, control flow like if statements in them, and you create them declaratively in JSON or YAML rather. So it's an alternative to trying to implement a workflow in something like a Lambda function, where you might be using code or using an event-driven kind of orchestration approach. So that's what Step Functions is.

Luciano: Yeah, to me, I like to describe them as like the power user version of something like Zapier, where you can easily integrate different services. And the reason why I like to say that is because especially recently they published a new visual editor. So I think the barrier to entry is much lower now and it's easier to just even drag and drop different things and integrate them together. So I like that analogy. And I'd actually like to give an example of the first thing I implemented using Step Functions.

I was working in the electricity industry. So we were working in a company that was selling electricity and we needed to calculate billing for every customer. And that process was actually quite complex. We needed to fetch data from a number of different sources in different formats, different databases. There was like FTP, SQL, CSV repositories. And after we got all the data, basically we needed to do a bunch of different calculations.

And then finally we had that billing figure. Now interestingly enough, there was also a manual step for somebody to actually review the numbers, because those were big industrial customers. So we wanted to be doubly sure that everything was okay. And then when the final bill was reviewed, somebody would click a button somewhere and that would make the step function continue. And the step function would send the final PDF by email, save it in a bunch of different places and finalize the billing process. So that was an interesting use case. And using Step Functions was helpful to us because we could, for every customer, just open the Step Functions page and see literally what the current state was. If there was some error, we could implement retry mechanisms. So it was very convenient rather than orchestrating all of that in like one big Lambda. Do you have any use case on your side, Eoin, that you want to share?
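As a rough illustration of the billing flow described above (fetch, calculate, manual review, send), here is a minimal sketch of what such a state machine definition could look like in Amazon States Language, built as a Python dictionary. It is not the project's actual definition: the state names, the Lambda function names and the `review.approved` field are all hypothetical. The manual step uses the task-token callback pattern, where the execution pauses until something reports the reviewer's decision back.

```python
import json


def lambda_task(function_name, next_state=None, wait_for_token=False):
    """Build a Task state that invokes a (hypothetical) Lambda function."""
    resource = "arn:aws:states:::lambda:invoke"
    payload = {"input.$": "$"}
    state = {"Type": "Task"}
    if wait_for_token:
        # Pause until SendTaskSuccess/SendTaskFailure is called with this token.
        resource += ".waitForTaskToken"
        payload["taskToken.$"] = "$$.Task.Token"
        # Keep the bill data and attach the reviewer's decision next to it.
        state["ResultPath"] = "$.review"
    else:
        # Retry transient failures with exponential backoff.
        state["Retry"] = [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 5,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }]
        # Pass on only what the Lambda function returned.
        state["OutputPath"] = "$.Payload"
    state["Resource"] = resource
    state["Parameters"] = {"FunctionName": function_name, "Payload": payload}
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


# All function and state names below are made up for this sketch.
definition = {
    "Comment": "Sketch of a billing workflow with a manual review step",
    "StartAt": "FetchData",
    "States": {
        "FetchData": lambda_task("fetch-billing-data", "CalculateBill"),
        "CalculateBill": lambda_task("calculate-bill", "WaitForReview"),
        "WaitForReview": lambda_task("request-review", "IsApproved",
                                     wait_for_token=True),
        "IsApproved": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.review.approved",
                "BooleanEquals": True,
                "Next": "SendBill",
            }],
            "Default": "Rejected",
        },
        "SendBill": lambda_task("send-bill-pdf"),
        "Rejected": {"Type": "Fail", "Error": "BillRejected",
                     "Cause": "The reviewer did not approve the bill"},
    },
}


def missing_targets(defn):
    """Return state names that are referenced but never defined."""
    states = defn["States"]
    targets = {defn["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return sorted(targets - set(states))


if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

The small `missing_targets` helper is only a local sanity check on the wiring between states; it does not replace validation by the service itself.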

Eoin: Yeah, you can use it for so much these days because now it's actually integrated with the AWS SDK. So it used to be able to integrate with DynamoDB and Lambda and a few other services, but now you can actually call any AWS SDK service. So the possibilities are pretty unlimited. One of the examples I covered in the book I co-wrote with Peter Elger, AI as a Service, is actually implementing a simple web crawler in Step Functions.

So that was an interesting use case. I wouldn't necessarily implement Google using Step Functions, but it was a case where you have a page to crawl. You need to traverse through the page recursively and do some analysis on it. You're actually extracting text and then using Amazon Comprehend to analyze the text and figure out if there are any dates or locations mentioned in the webpage. So we used Step Functions to implement that.
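To make the text-analysis step more concrete, the sketch below shows one way a single Task state could call Comprehend through the AWS SDK integration mentioned earlier, plus a small function that picks dates and locations out of a `DetectEntities`-shaped result. This is an assumption-laden illustration, not the implementation from the book: the `$.pageText` input field, the `min_score` threshold and the sample data are invented, and the resource ARN follows the documented `aws-sdk:service:action` pattern, which is worth checking against the current documentation.

```python
# Task state using the AWS SDK service integration (names are hypothetical).
detect_entities_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:comprehend:detectEntities",
    "Parameters": {
        "Text.$": "$.pageText",   # assumed field holding the extracted page text
        "LanguageCode": "en",
    },
    "ResultPath": "$.comprehend",  # keep the original input alongside the result
    "End": True,
}


def dates_and_locations(result, min_score=0.8):
    """Keep only confident DATE and LOCATION entities from a DetectEntities result."""
    wanted = {"DATE", "LOCATION"}
    return [
        (entity["Type"], entity["Text"])
        for entity in result.get("Entities", [])
        if entity["Type"] in wanted and entity["Score"] >= min_score
    ]


# Hand-written sample in the shape of a DetectEntities response.
sample_result = {
    "Entities": [
        {"Type": "DATE", "Text": "22 October 2021", "Score": 0.99},
        {"Type": "LOCATION", "Text": "Dublin", "Score": 0.97},
        {"Type": "PERSON", "Text": "Emily", "Score": 0.95},
        {"Type": "LOCATION", "Text": "somewhere", "Score": 0.40},
    ]
}

if __name__ == "__main__":
    print(dates_and_locations(sample_result))
```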

And it's quite good for that kind of thing where you've got loops, recursive flows, or you've got AWS services you want to call out to that might take a long time to execute, like some of the AI services. So that's one thing. I also think where it really shines, and will be adopted more and more, is in enterprise workflow cases. When you've got businesses that need to process business rules, and that could be like batch processing based on some rules relating to pension schemes or insurance policies or financial rules. All of the FinTech, banking and insurance sector has a lot of that kind of stuff that might run on a scheduled basis or in response to an event. And it needs to evaluate some complex workflow and perform a series of steps. And being able to visualize them in the Step Functions console is really handy, especially when you're troubleshooting, for standard workflows that is. Maybe we can get into Express workflows too. So I think those kinds of business process modeling cases, where you might have used other very complex custom code or business process modeling engines in the past, those are really good places where you can use Step Functions. Yeah.

Luciano: All the use cases you just described struck me as like long-running, I don't know, pieces of business logic. So I'm starting to ask myself, what are the limitations that maybe you should consider before using Step Functions for something? Yep.

Eoin: Yeah, that's a good one. The interesting, slightly amusing limitation for Step Functions is that you can run each execution for up to a year. So I think it's the longest time limit of any service. But yeah, and obviously that brings its own challenges, because how do you test something that expires after a year? It's an interesting challenge in development and test. So you can have long-running executions.

And the reason for that is because you can have manual approval steps like you've already outlined, Luciano. So that's why you might need that. People might take months to actually approve something. You also have a limit on the number of transitions. So in a standard workflow, that's 10,000 transitions. And that can hit you, especially if you've got a wait and retry event loop, if you're waiting for a service to run, or if you've got a lot of items. You've also got a limit on the amount of data you can process, right? So you can't throw megabytes of data into a step function, and you can't process millions of items in a map state or a parallel state, some of those really useful control flow states. Have you hit some of those limits?
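The wait-and-retry loop mentioned above is a case where a little arithmetic helps. The sketch below estimates how many transitions a polling loop uses and the shortest wait that stays within a budget. It assumes each loop iteration passes through three states (check status, choice, wait) and takes the 10,000 figure quoted in the episode as the default budget; both are assumptions for illustration, and the current service quotas are the authoritative numbers.

```python
import math


def poll_loop_transitions(job_seconds, wait_seconds, states_per_iteration=3):
    """Estimate the transitions used while polling a job of a given duration."""
    iterations = math.ceil(job_seconds / wait_seconds)
    return iterations * states_per_iteration


def min_wait_seconds(job_seconds, budget=10_000, states_per_iteration=3):
    """Shortest whole-second wait that keeps the loop within the transition budget."""
    max_iterations = budget // states_per_iteration
    return math.ceil(job_seconds / max_iterations)


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    for wait in (10, 30, 60):
        print(wait, poll_loop_transitions(one_day, wait))
    print("minimum wait:", min_wait_seconds(one_day))
```

Under these assumptions, polling a day-long job every 10 seconds would blow well past the budget, while polling every 30 seconds stays under it.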

Luciano: I have hit some of these limits once, and I needed to figure out a different solution. I don't remember exactly what it was, but I think I was doing all the orchestration in a Lambda for that particular step, which was a little bit unsatisfying, I'd say, because I lost the benefit of Step Functions at that point. But on the number of transitions, you reminded me that that's actually an interesting thing to consider, because the pricing model is actually based on how many transitions you are doing in the execution of a step function, including even start and end. So even if you have just one step, you are still paying at least three transitions there. It might get very expensive if you do a lot of transitions very quickly and you do a lot of executions, so that's maybe something else to consider. If you have very simple use cases, maybe with a lot of transitions, and you need to save money, don't go directly into Step Functions. That's also something that could become a limitation, the pricing model there. Yeah.
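A back-of-the-envelope calculation shows how transition-based pricing adds up. The sketch below multiplies executions by transitions per execution and applies a per-1,000-transition price. The default price of 0.025 USD per 1,000 transitions is an assumption based on the published standard workflow price at the time; it varies by region, may change, and the free tier is ignored here.

```python
from decimal import Decimal


def standard_workflow_cost(executions, transitions_per_execution,
                           price_per_1000=Decimal("0.025")):
    """Rough cost in USD for standard workflows, ignoring any free tier."""
    total_transitions = Decimal(executions) * Decimal(transitions_per_execution)
    return total_transitions * price_per_1000 / Decimal(1000)


if __name__ == "__main__":
    # One million executions of the smallest case from the episode (3 transitions).
    print(standard_workflow_cost(1_000_000, 3))
    # The same volume with a 50-transition workflow.
    print(standard_workflow_cost(1_000_000, 50))
```

`Decimal` is used instead of floats so that the money arithmetic stays exact.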

Eoin: And they introduced the Express workflows then, so that can help with the pricing, but only if you have a very short running execution. So when would you use Express workflows as opposed to standard workflows?

Luciano: Yeah, as far as I know, Express workflows work in synchronous mode. So basically you can implement a request-response type of pattern. So maybe you still have a complicated workload behind the scenes, but the way of consuming that workflow is that you just make a request and you expect a response as soon as possible. I think in that case, using the Express version of Step Functions will be a much better fit. Yeah.

Eoin: And then you've got a five-minute limit, which is obviously a completely different scale to a standard workflow. Yeah. So it's for really, really fast executions and things that might be behind an API.
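The request-response pattern discussed above could look roughly like the following from the caller's side. This is a sketch under assumptions: it relies on the `start_sync_execution` operation of the boto3 Step Functions client, the state machine ARN is a placeholder, and a tiny fake client stands in for AWS so the example runs without credentials.

```python
import json


def run_express_sync(sfn_client, state_machine_arn, payload):
    """Start a synchronous Express execution and return its parsed output."""
    response = sfn_client.start_sync_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    if response["status"] != "SUCCEEDED":
        # status can also be FAILED or TIMED_OUT; surface the reason to the caller.
        raise RuntimeError(
            f"{response['status']}: {response.get('error')} {response.get('cause')}"
        )
    return json.loads(response["output"])


class FakeSfnClient:
    """Stand-in for boto3.client('stepfunctions'), for local experimentation only."""

    def start_sync_execution(self, stateMachineArn, input):
        data = json.loads(input)
        return {"status": "SUCCEEDED", "output": json.dumps({"echo": data})}


if __name__ == "__main__":
    # With real AWS access you would pass boto3.client("stepfunctions") instead.
    arn = "arn:aws:states:eu-west-1:123456789012:stateMachine:example"  # placeholder
    print(run_express_sync(FakeSfnClient(), arn, {"customerId": 42}))
```

Taking the client as a parameter keeps the function easy to test and leaves the choice of credentials and region to the caller.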

Luciano: Okay. So now we are approaching the 10-minute mark, so I think this is the time to do the closing. That's all we have for today. Thank you for listening. And if you enjoyed this episode, of course, make sure to subscribe and like. And we are always curious to know your opinion. So if you have other interesting use cases that you want to share, or if you disagree with our opinion, absolutely send us a comment or reach out to us on our social channels. We'd be more than happy to chat with you and compare our experiences. And with that, thanks again. We'll see you at the next episode. Bye.