AWS Bites Podcast

82. Redis on AWS: Is ElastiCache the Right Choice?

Published 2023-05-26

Who is the king of all databases when it comes to performance? Yes, Redis! Of course!

In this episode of AWS Bites, we talk about Redis on ElastiCache, one of the most essential instruments in the cloud architect's toolbox.

We explore the joys and woes of Redis on AWS and share some exciting alternatives regarding in-memory databases and caching systems.

We discuss the use cases of Redis, including session storage, web page caching, database cache, cost optimization, queues and pub/sub messaging, and distributed applications state.

We talk extensively about ElastiCache, the managed cache solution on AWS based on either Redis or Memcached, and its features such as replication groups, auto-scaling, and monitoring.

Finally, we discuss potential alternatives, such as DynamoDB (with DAX), Upstash, or Momento, a serverless cache built on Pelikan.

AWS Bites is sponsored by fourTheorem, an AWS Consulting Partner offering training, cloud migration, and modern application architecture.

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Eoin: For a long time, if you asked a lot of architects, "How do I make my system faster?", the answer was "Redis"! It's still one of the most important instruments there in the architect's toolbox. On AWS, the recommended way to run Redis has been ElastiCache, but that's not without its problems. We're going to talk about the joys and woes of Redis on AWS and share with you some exciting alternatives. I'm Eoin, I'm here with Luciano and this is the AWS Bites podcast. Our sponsor is fourTheorem, an AWS consulting partner that helps your organization get excited about building software again. Find out more at fourtheorem.com. You'll find that link in the show notes. Luciano, what is Redis and how does it work?

Luciano: Let's start with the name, because before we prepared for this episode, I actually didn't even know what the name Redis stands for. We figured out by looking on Wikipedia that it stands for Remote Dictionary Server, which, I don't know, I didn't expect. Let's move on. Let's talk a little bit about the story of Redis and then try to understand how it works. Redis was born from an Italian startup that was building effectively a real-time web log analyzer type of application.

I imagine that as some kind of Google Analytics alternative. I don't know if that's too accurate, but you can imagine the kind of data problems and number crunching that you have to do to keep aggregating all the information coming through very quickly. I believe initially this company was using MySQL for all the persistence, but of course, at the point of scale where they needed to do all these operations fast for large volumes of data, MySQL wasn't really enough.

Antirez (Salvatore Sanfilippo is his real name) was the developer of Redis, which was initially written in Tcl to try to address this problem: how do we figure out a different kind of storage and data persistence layer that allows us to aggregate all this information fast enough for this specific problem? After the first prototype in Tcl, they realized that this solution could work, so they rewrote it in C and open-sourced it.

And it was actually quite successful, especially in the Ruby space. And it was quite quickly adopted by GitHub, later on by Instagram. I believe also Twitter was a heavy user of Redis as well, at least for the very initial period. Now, how does it work and why is it faster than MySQL? The main difference is that while MySQL is a database that tries to give you the best guarantees that when your writes are acknowledged, they are actually persisted on disk, and you can trust that you're not going to lose your data, Redis works in a totally different way because performance is the main concern.

So what they do is just store everything in memory. Of course, it's a little bit less reliable when it comes to making sure that the data is there, but it will give you very fast response times and very, very fast operations. We'll talk more about the trade-off and what you can do to mitigate the risk of losing data. So the idea is that you store everything in memory, but it's also a distributed data store and it works with that key-value kind of mindset.

So the main primitive is that you just say, in this key, I want to store this information, and you don't really have a lot of flexibility like you would have with a relational database. So you really need to think about your key value pairs there. You get very fast sub-millisecond latency. Very commonly, of course, it might depend on networking, but if you have a good networking connection with the client, you get very, very fast times.

It is mostly used for caching. So that's a case where if you lose some of the data, it's not a big deal, but you're going to get very, very fast round trips, reading the data and writing data. It can also be used as a message broker. It does support natively kind of a pub-sub mechanism. And then coming to the durability piece, there are some options there that you can enable to try to mitigate the risk of losing data.

Like you can create snapshots and do different things. We'll spend a little bit more time later on that. Now, another cool thing is that every time you write something on a key, you can decide which data type you want to use for that particular key. And then based on that data type, you can do different operations. So in a way, this could be like your own introduction to algorithms and data structures.

If you read the documentation where you see all the data types supported and all the operations for every different data type, you also get what is the time and space complexity of each and every one of them, which is really cool if you come from a computer science background to see in real life how all these different algorithms and data structure affect things that you can actually use in production to build products.

And just to give you some examples, you can write, of course, strings, but you can also write lists and maps, which are effectively hash maps or dictionaries. You can also write sets and sorted sets. You can even have more advanced data types, for instance, streams, or you can even use the spatial indexes if you are trying to index points in space and do queries on geographical problems. And other data structures like that.

It is also extensible because it supports out-of-the-box Lua scripting. So if you want to build, let's say, your own kind of operation by using the basic APIs, you could provision into a Redis instance your own Lua script and then you can invoke it later to do more complicated stuff. In a way, it kind of reminds me of when you store functions in a database and then you call these functions. It's probably a very similar idea.

So stored procedures, that kind of thing. And you can also create pipelines. So if you want to do a series of commands, you can just send this kind of pipeline of commands to Redis and Redis will execute them in order. And there is kind of a structure in place that allows you to define these kinds of pipelines. And finally, one really cool thing, this is more recent. In recent years, the developers of Redis spent a lot of time trying to make it extensible.

And now there is quite a large ecosystem of modules that you can add on top of Redis to just extend its functionality. Some interesting use cases: there are modules for full-text search, and for converting Redis into kind of a graph database. So for all the kinds of graph problems, you can store the information in Redis and the module will give you query functionality to use Redis as a graph database. And there are also some modules that will give you some ML capabilities. So given all this introduction, and now that you should understand the capabilities of Redis, what is it good for? What are some use cases? Session storage is probably the number one use case.

Eoin: So if you go back to the early days of web applications, it was more common in the beginning to not have any high availability and to have single servers with state in them. And eventually people realised that we needed to be able to scale for high availability and also for performance, horizontally scale. Then the question became, how would you store distributed state like session storage? And Redis is, I think it's probably the number one use case there.

Also with the web applications, web page caching, like pre-rendering, pre-rendered server-side content is something you can also store in Redis. And then you can just use it as a database cache. And I think this is probably one of the areas where it became popular in the Rails community, where you've got a relational database in the back end, but you don't want to hit the database for every single query, especially for reads.

So you could use it as a database cache because it's only coming from memory. You could save a lot of latency. You can reduce the query load on your database and that can in turn save you a lot of money. So Redis is also a cost optimisation tool in that sense. It's not just a cache, either. It does have support for pub-sub messaging, which means that it has become quite common for low latency microservice communication as well.
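The database cache use case described above usually follows a "cache-aside" read pattern. The sketch below is only an illustration of that pattern: `TTLCache` is a hypothetical in-memory stand-in for Redis (so the example runs without a server), and `cached_query`, `load_from_db` and the 60-second default TTL are assumed names and values, not anything from the episode.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a Redis-like cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily, on read.
            del self._data[key]
            return None
        return value


def cached_query(
    cache: TTLCache,
    key: str,
    load_from_db: Callable[[], str],
    ttl_seconds: float = 60.0,
) -> str:
    """Cache-aside read: try the cache first, hit the database only on a miss."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = load_from_db()  # the slow, expensive path
    cache.set(key, value, ttl_seconds)
    return value
```

With a real Redis client the two cache calls would become network round trips, but the shape of the read path stays the same: repeated reads within the TTL never reach the database.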

I've also been using ElastiCache and Redis for things like application state. Especially in a serverless environment where you've got lots of Lambda functions that need some sort of shared state, but you want really low latency for that state, you realise you can't just have point-to-point communication or networking between functions because that doesn't exist. Instead, you use Redis as your state store.

Another example, which is kind of similar and related, is when you need to be able to query lots of keys in S3. S3 isn't a file system. It's an object store, as we know. So one of the disadvantages there is that doing file lookup operations can be very expensive on S3, especially if you want to do a list of all your keys in S3. So what I've done in the past in a number of different cases is every time an object is put into S3, you can capture that event with EventBridge, and then you can have a function or some downstream process that takes that event and registers the presence of the object in Redis. And then if you want to do a lookup of all objects with a certain prefix, you could just do a list operation in Redis or whatever. So that makes it incredibly fast. If you were to do this with the S3 API and do pagination with list objects, you could be there for days reading a large bucket with lots of objects.
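The S3 key index described above could be sketched roughly as follows. This is not the implementation used by the speakers: `KeyIndex` is a hypothetical in-memory stand-in for whatever structure is kept in Redis, `handle_object_created` is an assumed handler name, and the event shape (`detail.object.key`) is an assumption about what the S3 "Object Created" event delivered through EventBridge looks like.

```python
import bisect
from typing import Any, Dict, List


class KeyIndex:
    """Sorted in-memory index of object keys (stand-in for the index kept in Redis)."""

    def __init__(self) -> None:
        self._keys: List[str] = []

    def add(self, key: str) -> None:
        # Keep the list sorted and free of duplicates.
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def with_prefix(self, prefix: str) -> List[str]:
        # In a sorted list, all keys sharing a prefix sit next to each other,
        # starting at the first key that is >= the prefix.
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result


def handle_object_created(event: Dict[str, Any], index: KeyIndex) -> None:
    """Register the object named in an (assumed) EventBridge S3 event."""
    index.add(event["detail"]["object"]["key"])
```

A prefix lookup then only touches the matching keys, instead of paginating through the whole bucket listing.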

Luciano: Yeah, I do remember that use case, and it was pretty cool to be able to solve it with Redis. And yeah, the performance difference was impressive.

Eoin: I think the trade-off there is always that when it comes to Redis, you just need to make sure you have the right amount of memory. And also if you've got lots of objects arriving at a fast pace, you need to be able to scale it correctly. So maybe a little bit later on, we'll talk a little bit more about the non-serverless nature and how we might overcome that. But maybe first let's talk about the persistence. Some people like to use Redis as a database. It's something I would be very fearful of using as a database myself. There are things that will give you ACID compliance. Redis will not. What persistence options do we have?

Luciano: Yeah, so if we just take a vanilla installation of Redis, not necessarily in AWS, there are some interesting options worth exploring. And the main one is that you have point-in-time snapshots. So you can imagine those as a backup that you can do every once in a while, and that will give you a full snapshot of all the data that is currently stored in memory. That's one thing that you can do and is definitely something recommended to do anyway, unless you really don't care about losing the data.

Maybe it's very cheap for you to rebuild all that information in memory if you happen to lose all of it. The other option is the append-only file. You can enable this append-only file log, which basically is going to write a transaction log in the background, and you can configure the frequency. I think the recommended setup that I saw somewhere was one second. So it's going to flush that transaction log to disk every second.

And that way you have a little bit more of a guarantee that if you lose data, it's not going to be more than one second's worth of data. Now, this is still maybe something that might not be acceptable for you. Maybe you really don't want to lose any record. You can actually even configure the system to flush to disk for every single record. But I think that kind of defeats the point of Redis a little bit because then you are converting what is effectively an in-memory database into something that needs to write to disk for every single write operation.

So you're probably going to lose most of the benefits in terms of performance that Redis can give you. So this is definitely an option. It could be interesting for you to explore that option, but just keep in mind that at that point, you are almost having the same constraints that you have with a regular relational database minus all the features that a relational database would give you. So worth considering the kind of trade-off if you want to go down that path.
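To make the append-only file trade-off above more concrete, here is a toy sketch of a key-value store that logs each write and forces it to disk either on every write or at most once per second. It is a simplified illustration, not how Redis implements its AOF: `AppendOnlyStore`, the tab-separated log format and the policy names are all assumptions made for this example, and the one-second check happens on write rather than in the background.

```python
import os
import time
from typing import Callable, Dict


class AppendOnlyStore:
    """Toy key-value store that logs every write to an append-only file."""

    def __init__(
        self,
        path: str,
        fsync_policy: str = "everysec",
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if fsync_policy not in ("always", "everysec"):
            raise ValueError("fsync_policy must be 'always' or 'everysec'")
        self.data: Dict[str, str] = {}
        self.fsync_count = 0
        self._policy = fsync_policy
        self._clock = clock
        self._last_fsync = clock()
        self._replay(path)  # rebuild the in-memory state from the log
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self, path: str) -> None:
        if not os.path.exists(path):
            return
        with open(path, encoding="utf-8") as log:
            for line in log:
                key, _, value = line.rstrip("\n").partition("\t")
                self.data[key] = value

    def set(self, key: str, value: str) -> None:
        # Simplification: keys and values must not contain tabs or newlines.
        self.data[key] = value
        self._log.write(f"{key}\t{value}\n")
        self._log.flush()  # hand the bytes to the operating system
        now = self._clock()
        if self._policy == "always" or now - self._last_fsync >= 1.0:
            os.fsync(self._log.fileno())  # force the bytes onto the disk
            self.fsync_count += 1
            self._last_fsync = now

    def close(self) -> None:
        self._log.close()
```

With the "always" policy every write pays for a disk sync, which is the performance cost discussed above. With "everysec" a crash can lose up to roughly the last second of writes.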

So in general, it's probably worth enabling both of these options. You can have the snapshots just in case, I don't know, something fails or you are restarting your machine. Maybe you are doing, I don't know, an upgrade. It's going to be very easy for you to rehydrate all that memory with the information coming from the backup. And then you can use this append-only file just to kind of protect you from data loss. It's not going to be a perfect protection, but it can be quite good for most use cases. Now, all of this makes sense in the sense of a kind of generic context where we are not specifically talking about AWS. So maybe it's worth spending a little bit of time trying to figure out what is AWS giving us out of the box, what kind of services can we use to provision Redis in AWS, and what are the features available if we do that.

Eoin: I mean, you can always run Redis on EC2 or in ECS or EKS, manage it yourself and get full control and flexibility, but we tend to always look for ways of saving ourselves heavy lifting and all sorts of upgrades and patching. So we turn to the managed version, which is ElastiCache. ElastiCache is a managed cache, and it's based on either Redis or Memcached. So you can kind of choose your flavor when you set up an ElastiCache cluster.

Now, despite the fact that it's called ElastiCache, it's not just a cache, you get all the features pretty much of Redis when you have an ElastiCache Redis instance. So you get PubSub, you get streams. Of course, those aren't necessarily natively integrated with AWS services. It's not possible yet, but it would be interesting if you could trigger like EventBridge or Lambda from a Redis PubSub. It's in memory only, really, so no persistence.

The append-only file mode, the AOF, is not supported anymore on ElastiCache. So you can just use the snapshots, and you can also do backups to S3. You can also do replication groups. So there are two different ways you can set up high availability. You can have vertical scaling, where you just have read replicas in your cluster, but you can also set up what's called cluster mode, which is where you start sharding your data over multiple nodes, and you can scale horizontally as well.
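As a rough illustration of what sharding in cluster mode means: open-source Redis Cluster assigns every key to one of 16384 hash slots using a CRC16 of the key, and each shard owns a range of slots. The sketch below computes that slot with Python's standard library. `key_slot` and `node_for_key` are names made up for this example, and the even split of slots across nodes in `node_for_key` is an assumption for illustration, not a description of how ElastiCache lays out its shards.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Hash slot for a key: CRC16 (XMODEM variant) of the key, modulo 16384."""
    data = key.encode("utf-8")
    # Hash tags: if the key contains a non-empty {...} section, only that
    # part is hashed, so related keys can be forced into the same slot.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


def node_for_key(key: str, node_count: int) -> int:
    """Illustrative only: assume slots are split evenly across node_count nodes."""
    return key_slot(key) * node_count // HASH_SLOTS
```

Because the slot depends only on the key, any client can work out which shard to talk to without a central lookup.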

I mentioned that you can do backups to S3, so minimum of one an hour. If you want to do more granular backups, you'll have to rely on snapshots or some other mechanism. I think the most important message to take away from using Redis on ElastiCache is that it's not a serverless service. You need to right-size it. So that means you have to monitor things like your latency, the performance, CPU, and memory in particular.

Everything is stored in memory, so you need to keep an eye on that. There are CloudWatch metrics for all of that. You should have alarms on them, and you should be constantly revisiting that. Redis is also single-threaded and uses an event loop. So if you've got expensive commands, like the Lua scripts you mentioned, you could have Lua scripts in your event loop. That can tie up your cluster for a long time. So you should look at the latency metric to help you spot that and see if you need to spread your workload in some other way or just increase the size of the instance that's underpinning your ElastiCache. Should we talk about ElastiCache in a serverless context?
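The point above about expensive commands on a single-threaded event loop can be shown with a very small simulation. This is a simplified model, not a measurement of Redis: it assumes all commands arrive at the same moment and are processed strictly one at a time, and the function name and the costs in microseconds are invented for the example.

```python
from typing import List, Tuple


def completion_latencies(commands: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Latency seen by each command when a single thread runs them in order.

    Each command is (name, cost_in_microseconds). All commands are assumed
    to arrive at time zero, so a command's latency is its own cost plus the
    cost of everything queued ahead of it.
    """
    elapsed = 0
    results = []
    for name, cost_us in commands:
        elapsed += cost_us
        results.append((name, elapsed))
    return results


if __name__ == "__main__":
    queue = [("GET a", 100), ("slow Lua script", 500_000), ("GET b", 100)]
    for name, latency_us in completion_latencies(queue):
        print(f"{name}: {latency_us} microseconds")
```

In this model the second `GET` costs the same as the first, but it waits behind the slow script, which is why one expensive command shows up as a latency spike for every other client.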

Luciano: Because you mentioned that, I think it's really interesting to double down on it and see how serverless ElastiCache is. And you already spoiled it a bit that it is not really that serverless. We need to do cluster management, high availability. We need to figure out what's the right size, what kind of... How much memory do we need effectively for the given workload that we want to use Redis for?

So in a way, it reminds me of RDS in terms of experience. You need... Yeah, you don't necessarily need to host and make sure everything is running all the time, but you need to know a lot about how Redis works and how to configure your workload upfront if you want to make sure everything is going to work well in production. And that's, of course, also in terms of networking, it requires a VPC. You need to make sure that any instance that wants to connect to Redis has network access to that VPC.

For instance, if you use Lambda, that Lambda needs to be provisioned in a subnet that has access to that ElastiCache VPC. So you might argue it's not necessarily a great fit for serverless, even though we have been using it and the performance is still pretty good. It's just that the amount of effort in managing it is probably way more than what we wanted it to be. So if we can have an additional wishlist item for re:Invent 2023, that would be to have a real serverless Redis available in AWS that we can just click and it's available and it auto-scales, even scales to zero if you're not using it, and pricing should be, of course, proportional to that. Now, AWS actually announced MemoryDB quite recently, and we were excited, but that excitement didn't last too long because by just reading the announcement, we realized: okay, it's Redis-compatible, it's durable, it's in memory. All of those were very good tick boxes and made us very happy. But then we saw: okay, it still requires a VPC, it still requires us to specify an instance size. Also, it's not open source, while Redis is open source, so we don't really know exactly what's going on there. And it was also quite expensive in terms of pricing, or at least that's what it seemed from just looking at the table of prices. So the next question will be: are there alternatives that maybe are a bit easier to use when you have a serverless setup?

Eoin: Well, if we think about the distributed state example we mentioned earlier, I would actually probably try and go for DynamoDB first before ElastiCache because of the amount of extra management and the whole non-serverless nature of ElastiCache. And it might work for you, but with DynamoDB, we know that you get single-digit millisecond reads for records, but your writes can be slower. And queries are different too: in Redis, you can do a wildcard lookup of a bunch of keys with a specific prefix and it's nanosecond responses.

With DynamoDB, that's going to be multiple tens, hundreds of milliseconds potentially, and you have pagination and all that kind of stuff. Now you can use DynamoDB DAX, which is their cache layer that is on top of DynamoDB, but that's only going to help if you have a read-heavy workload. If you've got lots of writes happening, as many writes as you have reads, then that read cache isn't really going to help you because you still have to write it to DynamoDB, and it's still going to make sure that it's committed to at least two nodes before you get a response.

So if you really need that low latency, then you might look at some of the hosted Redis solutions. I think we've mentioned Upstash on the podcast quite a few times. They're not sponsors, but we like to point people in the right direction from time to time. That's a much more serverless option, at least at the pricing level. So you've got Upstash Redis, and you also have Redis Enterprise Cloud from the company that's managing Redis as well.

Now there's been a fairly new player on the block as well, which I think is a very interesting one to watch, and I've been watching it fairly closely, which is Momento. And this is not hosted Redis or even a Redis compatible cache, but it's a completely new SaaS offering. And it's aimed at a similar space, at least for caching, and they do actually have a new PubSub offering as well. So this is built on Pelikan, which is an open source caching engine that came from a lot of the ideas in Twitter.

And this Pelikan open source caching engine has recently been rewritten in Rust, so we can expect very low latency and good performance and security there. And they have their own SDKs, so they have like Java, Node, .NET, Python, PHP, Go, Rust. And they do actually have a Redis compatible kind of drop-in library for Node.js, which will work for some of the Redis commands, not all of them, I believe.

And the idea of it is it runs in AWS or GCP, so you can pick your cloud host and your region to make sure that you get low latency and you avoid data transfer costs as well. And then the pricing on that, at the moment, it's like 50 cents per gigabyte. So this is the kind of thing where you could, some pricing models will work well for your workload, depending on your patterns, your read and write patterns.

It could be expensive for data-heavy operations, could be very cheap for lower volume operations, but you get a pretty good free tier actually, so 50 gigabytes every month is free. So I think that sounds like a pretty nice incentive to start using Momento. It's pretty new, but we suddenly see them everywhere. And I certainly like the idea of having a completely serverless, lightweight, simple caching that can work with your AWS deployment.
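As a quick worked example of the pricing figures quoted above (50 cents per gigabyte, with 50 gigabytes free every month), the arithmetic could be sketched like this. The numbers are only the ones mentioned in the episode and may not reflect current pricing. The function name is hypothetical, and treating the metered quantity as "gigabytes used per month" is an assumption, since the episode does not spell out exactly what is metered.

```python
PRICE_PER_GB_USD = 0.50    # figure quoted in the episode
FREE_GB_PER_MONTH = 50.0   # free tier quoted in the episode


def monthly_cost_usd(gb_used: float) -> float:
    """Estimated monthly bill: only usage above the free tier is charged."""
    billable_gb = max(0.0, gb_used - FREE_GB_PER_MONTH)
    return billable_gb * PRICE_PER_GB_USD


if __name__ == "__main__":
    for gb in (30, 50, 250):
        print(f"{gb} GB -> ${monthly_cost_usd(gb):.2f}")
```

Under these assumptions a low-volume workload of 30 GB stays within the free tier, while 250 GB leaves 200 billable gigabytes, which comes to 100 dollars.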

I think that they've got funding as well, and I think part of that significant funding they got is towards supporting other clouds as well. So we can probably expect Azure support down the line. I was actually impressed to find that they've got a CloudFormation provider for their caches as well on GitHub, so that you can put that provider in your account, and then you can create a cache just like you can with a Redis cluster.

And you can expect the management overhead will be significantly less. If you're looking to find them, by the way, their website is at gomomento.com, and that link is in the show notes. I mention this because they seem to have a bit of an SEO problem, since there's also a Memento database with an E. Momento, the one we were talking about, is with an O. But definitely check them out. So if you have any other alternatives, please let us know. That's it for today's episode of AWS Bites. Whether you watch on YouTube or you listen on your podcast player, if you like it, please subscribe, leave a review, and share AWS Bites with your friends and colleagues. And we really appreciate that. We'll see you in the next episode.