AWS Bites Podcast


141. Step Functions with JSONata and Variables

Published 2025-03-21 - Listen on your favourite podcast player

In this episode, we provide an overview of AWS Step Functions and dive deep into the powerful new JSONata and variables features. We explain how JSONata allows complex JSON transformations without custom Lambda functions, enabling more serverless workflows. The variables feature also helps avoid the previous 256KB state size limit. We share examples from real projects showing how these features simplify workflows, reduce costs and enable new use cases.

AWS Bites is brought to you in association with fourTheorem. If you need a friendly partner to support you and work with you to de-risk any AWS migration or development project, check them out at fourtheorem.com.



Eoin: Welcome, everyone. This is episode 141 of the AWS Bites podcast, and today we're going to dive deep on some pretty transformative new features in Step Functions. We've always been big fans of Step Functions, and we actually covered them back in episode 54. Recently, AWS released big new features, so we wanted to share our experience of using them with you. We're talking mainly about JSONata support, but we'll also touch on the new variables feature. I'm Eoin. I'm here with Luciano, so let's get started. AWS Bites is brought to you in association with fourTheorem. If you need a friendly partner to support you and work with you to de-risk any AWS migration or development project, check them out at fourtheorem.com. So maybe it makes sense to start by giving a quick recap of what Step Functions are, right? What do you think?

Luciano: Yeah, let's go for it. Okay, I'll try my best. So essentially, Step Functions allow you to avoid writing code in order to define some kind of workflow or state machine. So if you have multiple independent steps and you want to orchestrate them together using steps, conditions, parallelization, and loops, Step Functions is effectively the service you want to use in AWS. And just to give you some examples, it can be, I don't know, an e-commerce order flow, something that you might want to model with Step Functions, or maybe an ETL or some other kind of data transformation process.

One thing that we have done, and we actually talked about it back in episode 103, is automating the transcription for this very podcast. We have a Step Function that uses some AI components to extract transcriptions and help us create all the things that we need to publish the episodes and also update them on our website. So if you're curious about that, go and check out that particular episode, where we share all the details.

And also some of this work is even open source. So you can even check out the code if you're curious. So there are lots of benefits when it comes to Step Functions. The main thing is that you don't have to use a programming language. It's effectively a declarative approach. And therefore, you don't even need to worry about which operating system you need to install things on. Effectively, AWS manages all of that for you.

So you worry less about security issues, library dependencies, upgrades, all this kind of stuff. It's good to use a managed service for this kind of thing. And a few features of Step Functions are actually really cool, and I really like them. One is, for instance, that if a step fails, because maybe you are trying to model something that might occasionally fail, it's very easy to configure retries.

And then if something fails and it breaks your entire Step Function, you can easily inspect that Step Function and replace specific steps. You also have archives, re-drives, and in general, robust error handling. And similarly, observability, I think, is really good. For instance, when something goes wrong, you can easily see exactly what happened throughout all the steps. AWS will retain all the inputs and outputs for each step.
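To make the retry and error-handling configuration a bit more concrete, here is a minimal sketch of a Task state with `Retry` and `Catch` blocks, written as a Python dictionary that serialises to Amazon States Language JSON. The function ARN and the state names (`NotifyFailure`, `ShipOrder`) are hypothetical, and the `retry_delays` helper is only there to illustrate how the interval and backoff rate combine.

```python
import json

# Hypothetical Task state: the Lambda ARN and state names are placeholders.
charge_card_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
        "FunctionName": "arn:aws:lambda:eu-west-1:123456789012:function:charge-card",
        "Payload.$": "$",
    },
    # Retry failed attempts with exponential backoff.
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }
    ],
    # If retries are exhausted (or another error occurs), go to a fallback state.
    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
    "Next": "ShipOrder",
}


def retry_delays(retrier):
    """Return the wait in seconds before each retry attempt."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


print(json.dumps(charge_card_state, indent=2))
print(retry_delays(charge_card_state["Retry"][0]))  # [2.0, 4.0, 8.0]
```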

So effectively, you can easily pinpoint exactly what kind of error happened. You can easily fix it and then maybe retry from there. And the other cool thing is that Step Functions can integrate directly with almost every other AWS service. There are specific optimized integrations with certain services. But whenever an optimized integration doesn't exist, you can rely on the SDK and effectively model an API call directly in Step Functions.

So how do you define Step Functions? This is probably the main question you might have. There is a specific declarative syntax called Amazon States Language, or ASL. It can be written in either JSON or YAML, and it effectively allows you to define all the different components of your state machine, and also to reference things like ECS tasks, HTTP APIs, and much more. You can also use CDK or other infrastructure as code tools if you don't want to write plain JSON or YAML.
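As an illustration of what an ASL definition looks like, here is a small sketch built as a Python dictionary. It uses an SDK integration (`aws-sdk:s3:listObjectsV2`) and a Choice state. The bucket name and all state names are made up for the example, and `missing_targets` is a hypothetical helper that only checks that every referenced state exists.

```python
import json

# Hypothetical workflow: list a bucket through the SDK integration,
# then branch depending on whether any objects were found.
definition = {
    "Comment": "Illustrative example only",
    "StartAt": "ListObjects",
    "States": {
        "ListObjects": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:s3:listObjectsV2",
            "Parameters": {"Bucket": "example-bucket"},
            "Next": "AnyObjects",
        },
        "AnyObjects": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.KeyCount", "NumericGreaterThan": 0, "Next": "Process"}
            ],
            "Default": "NothingToDo",
        },
        "Process": {"Type": "Pass", "End": True},
        "NothingToDo": {"Type": "Succeed"},
    },
}


def missing_targets(defn):
    """Return the names of states that are referenced but never defined."""
    states = defn["States"]
    targets = {defn["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return sorted(targets - set(states))


print(json.dumps(definition, indent=2))
print(missing_targets(definition))  # [] when every transition points at a real state
```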

And those make the process a little bit easier. But there is actually a very good IDE called Workflow Studio, which you can access in the AWS Management Console. Recently, AWS also launched a VS Code extension that supports this UI too, directly in VS Code. So far we have talked about the benefits of Step Functions. But it's fair to say that there are a few drawbacks. The local development isn't great.

This is probably the main thing. So the feedback loop can sometimes be a little bit annoying. There are things like LocalStack that you can try to use, and maybe they solve some of these problems. But in general, what I've seen is that people just deploy to AWS and test directly in AWS. So there might be a little bit of latency between making changes and testing them. The syntax that is supported is, of course, not as good as a fully featured programming language.

So if you are trying to model something very complex, maybe you'll find that the syntax itself might become a little bit limiting. And often you need to do something custom, and you end up creating an AWS Lambda step just to do something very specific, maybe a particular transformation. So those are things that can get in the way and that could be a little bit easier. And there has been another annoying limit: the state, which is effectively all the data that you carry around between steps, and which every step can read from and write to, was limited to 256 kilobytes.

And I think it's still limited to that, but there are ways to work around it now. And this is where we're going to start to talk about JSONata and variables. So one last thing that I'm going to mention is that those two features are available in both standard Step Functions and express Step Functions. Standard is effectively for long-running workflows, and you pay based on the number of transitions. Express is a short-lived version of Step Functions, where your Step Function can only run for a maximum of five minutes, and you pay by execution time. Generally, these are a little bit faster and cheaper. So depending on the type of workflow, you might want to use either standard or express. So I'll pass it now to you, Eoin, because I was talking a lot, and you can tell us everything about JSONata. This is pretty new to me.

Eoin: I think this is the first time I've really used JSONata. Maybe I've heard of it before, but it's not something that's widely used. I think it's supported a lot by IBM, and they use it a lot in their products. But it is a JSON query language, which was inspired by XPath from the world of XML. And it allows you to create sophisticated queries to transform and extract data from JSON. You might be familiar with JSONPath support, which has been used in AWS in a few different places, including Step Functions.

JSONata is a much, much more fully featured syntax. So it supports string manipulation, numerical operations, things like date-time conversion, regular expressions, which we know we all love, comparison operators, and conditional logic. Also array and object manipulation. You can even do sorting, grouping, and aggregation. You can define functions and closures in it. But you've also got things like filter and map and reduce.
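To give a feel for the syntax, the sketch below pairs a few JSONata expressions with plain-Python equivalents evaluated on a made-up document. The sample data is hypothetical, and the Python side is only an approximation of what each expression would produce. The expressions themselves are not executed here; a playground is the easiest way to try them.

```python
# Made-up input document for illustration.
data = {
    "customer": {"name": "Ada"},
    "orders": [
        {"id": "a1", "status": "open", "total": 20},
        {"id": "b2", "status": "shipped", "total": 35},
        {"id": "c3", "status": "open", "total": 5},
    ],
}

# Each pair is (JSONata expression, roughly equivalent Python result).
examples = [
    # Filter with a predicate, then project a field.
    ("orders[status = 'open'].id",
     [o["id"] for o in data["orders"] if o["status"] == "open"]),
    # Aggregate over an array.
    ("$sum(orders.total)",
     sum(o["total"] for o in data["orders"])),
    # String manipulation.
    ("$uppercase(customer.name)",
     data["customer"]["name"].upper()),
    # Sort descending by a field, then project.
    ("orders^(>total).id",
     [o["id"] for o in sorted(data["orders"], key=lambda o: o["total"], reverse=True)]),
]

for expression, result in examples:
    print(f"{expression:30} -> {result}")
```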

So pretty much anything I think you can imagine doing in order to transform a blob of JSON A into a blob of JSON B, JSONata has support for it. Now, traditionally, Step Functions supported the JSONPath mechanism. And it also had some intrinsic functions. They added some of those, like for formatting strings and whatnot. But realistically, JSONPath only gives you a tiny subset of what JSONata can now do.

So the amount of data transformation you can do is massively increased. And there are a load of benefits that come with that. So I guess you might ask, how would you use JSONata in Step Functions? And how does it differ from the traditional approach, if you like? Well, previously, JSONPath was your only option. Now, every Step Function can have a top-level query language specified, which can be JSONata or JSONPath.

And you can also customize it on a state-by-state level. And that's pretty interesting if you already have a large code base, lots of Step Functions, and you just want to start dipping your toes in, or maybe just apply it where it's really, really valuable. So you can do that. You can just specify the query language for one state as being JSONata. And instead of using all of those quite frustrating and difficult to understand properties from before, like OutputPath, ResultPath, InputPath, ResultSelector, all that stuff, you just specify a JSONata expression for the Output field, which encapsulates everything you're going to output from that state, like in a Pass state.
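A minimal sketch of that state-by-state opt-in, again as a Python dictionary that serialises to ASL: the state machine keeps JSONPath as its default, and a single Pass state switches to JSONata and builds its result with `Output`. The state name, field names, and the `jsonata_states` helper are hypothetical.

```python
import json

definition = {
    # The top-level default stays JSONPath, so existing states keep working.
    "QueryLanguage": "JSONPath",
    "StartAt": "Summarise",
    "States": {
        "Summarise": {
            "Type": "Pass",
            # Only this state opts in to JSONata.
            "QueryLanguage": "JSONata",
            # JSONata expressions are strings wrapped in {% ... %}.
            "Output": {
                "orderCount": "{% $count($states.input.orders) %}",
                "total": "{% $sum($states.input.orders.total) %}",
            },
            "End": True,
        }
    },
}


def jsonata_states(defn):
    """List the states whose effective query language is JSONata."""
    default = defn.get("QueryLanguage", "JSONPath")
    return [
        name
        for name, state in defn["States"].items()
        if state.get("QueryLanguage", default) == "JSONata"
    ]


print(json.dumps(definition, indent=2))
print(jsonata_states(definition))  # ['Summarise']
```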

Or you can specify JSONata for an Arguments property as well. So if you imagine you're invoking a Lambda function and you want to pass some parameters to it, you now do that with Arguments, which supports JSONata. And then all of the inputs, like the state's input, can be referenced using a special variable, $states.input. And once you know that, you'll want to start writing some expressions in JSONata.
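Here is a rough sketch of a Lambda invocation in a JSONata state, where `Arguments` shapes the request from `$states.input` and `Output` picks the part of `$states.result` to pass on. The function ARN, the input fields, and the `expressions` helper are assumptions made for the example.

```python
import json

# Hypothetical Task state: the Lambda ARN and field names are placeholders.
price_order_state = {
    "Type": "Task",
    "QueryLanguage": "JSONata",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Arguments": {
        "FunctionName": "arn:aws:lambda:eu-west-1:123456789012:function:price-order",
        # Build the payload from the state's input.
        "Payload": {
            "orderId": "{% $states.input.order.id %}",
            "itemCount": "{% $count($states.input.order.items) %}",
        },
    },
    # Keep only the function's return value as this state's output.
    "Output": "{% $states.result.Payload %}",
    "End": True,
}


def expressions(value):
    """Collect every {% ... %} JSONata expression found in a nested structure."""
    if isinstance(value, str):
        text = value.strip()
        return [text] if text.startswith("{%") and text.endswith("%}") else []
    if isinstance(value, dict):
        return [e for v in value.values() for e in expressions(v)]
    if isinstance(value, list):
        return [e for v in value for e in expressions(v)]
    return []


print(json.dumps(price_order_state, indent=2))
print(expressions(price_order_state))
```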

And you can look up the documentation, which is pretty good. There are also some online tools. One is the more or less official, I think, JSONata Exerciser, which is an online tool. But we also know that Stedi has an online JSONata playground, which has nice autocomplete support and stuff in it. There aren't a lot of other tools like VS Code plugins yet. So the ecosystem is not as rich as some other things.

But those tools, I found, have been pretty useful and give you everything you need. Now, at the time of recording today, JSONata support is not yet fully developed for the AWS CDK constructs. So if you do need JSONata in CDK Step Functions right now, you have to use a custom state and provide the raw Amazon States Language. And I've found that to be pretty okay, actually. It's fine. Because once you're writing JSONata, you're in a string within a property anyway.

So I don't know if CDK is really going to provide anything significant beyond that. But it is something to be aware of. There is an issue in the AWS CDK repo to track that. Overall, the experience with JSONata states is, I think, simpler, easier to read, and easier to understand, since you don't have to deal with all that InputPath, OutputPath, ResultPath stuff that all interacts together. You also don't have that .$ suffix for property names that you might keep forgetting. And overall, I just think it's a much more developer-friendly experience and way more powerful. There is a good guide on moving from JSONPath to JSONata. And Eric Johnson has an overview video on that page as well. We'll link it in the show notes. And yeah, that's JSONata support. I found it pretty useful so far.

Luciano: That's pretty cool. What about this new variables feature then?

Eoin: Variables, yeah. So you had this 256K limit for state data. And state was basically passed all the way down from the top and fell out the bottom of your Step Function. There was no such thing as global state. And the 256K limit could be annoying. And we definitely had cases before where you were pushing data to S3 or DynamoDB and trying to pull it out and use Lambda functions to extract a subset that was under 256K.

Now there's a new feature called variables. And that allows you to declare variables in one state and then reference them in any subsequent state without having to propagate them all the way down the state chain. So the variables are just named. And the total size of these is up to 10 megabytes. So already you can do far, far more than you could previously. Now you assign those variables using either JSONata or JSONPath.
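As a sketch of how variables look in a definition, the example below assigns two variables in the first state with `Assign` and reads them back two states later as `$orderId` and `$customerEmail`, without threading them through the intermediate state's output. All state, field, and variable names are hypothetical, as is the `assigned_variables` helper.

```python
import json

definition = {
    "QueryLanguage": "JSONata",
    "StartAt": "RememberOrder",
    "States": {
        "RememberOrder": {
            "Type": "Pass",
            # Store values once; later states can reference them by name.
            "Assign": {
                "orderId": "{% $states.input.order.id %}",
                "customerEmail": "{% $states.input.customer.email %}",
            },
            "Next": "DoOtherWork",
        },
        "DoOtherWork": {
            "Type": "Pass",
            # This state's output does not carry the order details at all.
            "Output": {"status": "done"},
            "Next": "Notify",
        },
        "Notify": {
            "Type": "Pass",
            "Output": {
                "message": "{% 'Order ' & $orderId & ' is ready for ' & $customerEmail %}"
            },
            "End": True,
        },
    },
}


def assigned_variables(defn):
    """Map each state name to the variable names it assigns."""
    return {
        name: sorted(state["Assign"])
        for name, state in defn["States"].items()
        if "Assign" in state
    }


print(json.dumps(definition, indent=2))
print(assigned_variables(definition))
```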

I think you're going to see them being used with JSONata a lot more, to be honest. But when you combine this feature with JSONata, now you have a really big step forward in what you can do with Step Functions. We have been using it in real projects. And we were able to remove a lot of Lambda functions that only did data transformation, and reduce the complexity and even the cost of Step Functions overall, if you consider the cost of invoking Lambda functions and waiting for them.

Maybe some examples of the things we were able to do. One which I found really satisfying, actually, was on a project where we were able to use Step Functions for doing API integrations. We had a third-party API that was authorized with a bearer token. You're able to set up an EventBridge connection, which is what Step Functions uses for HTTP invocations. And we can call APIs now over HTTPS using Step Functions, using the bearer token from Secrets Manager.
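A rough sketch of what such an HTTP Task could look like in a JSONata state is shown below. The endpoint, the connection ARN, and the response shape (`orders` with a `status` field) are all invented for illustration, and the `Output` expression is just one possible way to trim the response body. `uses_https` is a hypothetical helper.

```python
import json

# Hypothetical HTTP Task: endpoint, connection ARN, and response fields are placeholders.
fetch_orders_state = {
    "Type": "Task",
    "QueryLanguage": "JSONata",
    "Resource": "arn:aws:states:::http:invoke",
    "Arguments": {
        "ApiEndpoint": "https://api.example.com/v1/orders",
        "Method": "GET",
        # The EventBridge connection holds the credentials for the API.
        "Authentication": {
            "ConnectionArn": "arn:aws:events:eu-west-1:123456789012:connection/example-api/11111111-2222-3333-4444-555555555555"
        },
        "QueryParameters": {"customerId": "{% $states.input.customerId %}"},
    },
    # Keep only open orders; the outer [...] keeps the result an array
    # even when a single order matches.
    "Output": {
        "openOrders": "{% [$states.result.ResponseBody.orders[status = 'open']] %}"
    },
    "End": True,
}


def uses_https(state):
    """Check that the HTTP Task targets an HTTPS endpoint."""
    return state["Arguments"]["ApiEndpoint"].startswith("https://")


print(json.dumps(fetch_orders_state, indent=2))
print(uses_https(fetch_orders_state))  # True
```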

Now we can process large API responses. We can process them with JSONata to filter and transform the data in a really powerful way and then fan out and call more specific actions in something like a Map state. And that was an existing code base. So we were able to just remove a whole lot of Lambda functions, which has great benefit. You know, you reduce your deployment time, you reduce your maintenance, you reduce the runtimes you need to keep track of. And it just simplifies everything and makes your whole workflow very easy to understand by reading it. And the other thing we use JSONata a lot for is just simple Pass states. Like you've got the output of one state and you want to do some processing or transformation on it. And we just do all that with JSONata in a simple Pass state.

Luciano: That's awesome. And I'm sure that everyone is wondering about cost. Are these two amazing new features coming with some extra cost or not?

Eoin: Surprisingly, no. The cost of billing model for Step Functions is still the same in that for standard mode Step Functions, you're paying per transition. By the way, you can reduce transitions now with JSONata. So you might actually get a price reduction. And for express functions, those are priced based on the runtime, the time that they take to execute up to five minutes. So no, there's no additional cost for using variables or JSONata.

I would definitely recommend people give them a try and let us know what you think. Have you found any drawbacks? I think for us, there's a bit of a learning curve for sure, but it's not significant. And I think it's not like VTL, the Velocity Template Language you might have come across for AppSync and various other things like API integrations in the past. JSONata has been a much more pleasant experience, at least for me. So let us know what you think. I think that's everything we wanted to share. Share your use cases. Let us know. And we're always looking to compare notes and learn how you use AWS. So thanks for listening. And we'll see you in the next episode.