AWS Bites Podcast

122. Amazing Databases with Aurora

Published 2024-05-03 - Listen on your favourite podcast player

In this episode, we provide an overview of Amazon Aurora, a relational database solution on AWS. We discuss its unique capabilities like distinct storage architecture for better performance and faster recovery. We cover concepts like Aurora clusters, reader and writer instances, endpoints, and global databases. We also compare the serverless versions V1 and V2, noting that V2 is more enterprise-ready while V1 scales to zero. We touch on billing and additional features like the data API, RDS query editor, and RDS proxy. Overall, Aurora is powerful and scalable but not trivial to use at global scale. It's best for serious enterprise use cases or variable traffic workloads.

AWS Bites is brought to you by fourTheorem. If you need someone to work with you to build the best-designed, highly available database on AWS, give us a shout. Check us out on fourtheorem.com!

In this episode, we mentioned the following resources: the AWS Pricing Calculator and Cloudonaut's comparison of Aurora Serverless v1 and v2.

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Eoin: Are you looking for an easy way to set up a relational database with best practices, resilience and disaster recovery in mind? Are you maybe looking for something reliable but also cheap and easy to maintain? Today we're going to try and answer the question, does such a thing even exist? The best hope for an AWS solution to this challenge is Amazon Aurora. We're going to dive into Aurora, talk about its unique capabilities, intricacies and of course, trade-offs. I'm Eoin, I'm joined by Luciano and this is AWS Bites. AWS Bites is brought to you by fourTheorem. If you need someone to work with you to build the best designed, highly available database on AWS, give us a shout. Check out fourtheorem.com or contact us directly using the links in the show notes. Luciano, before we dive into Aurora, what's the use case? Maybe let's try and imagine a scenario where Aurora might be a good fit.

Luciano: Yeah, I think it's always good to frame the conversation by defining a good use case. Here we are talking about enterprise applications, so not really hobbyist databases. That means you might have hundreds or even thousands of users connected to the database, and a volume of transactions that can vary: it can be relatively low, but it needs to be able to grow, because as a business you will have your own spikes and you need to figure out how to manage all of that. Generally speaking, we are talking about really important business data, something we can consider critical, something you really don't want to lose. And we are expecting a low RPO and RTO: RPO probably in terms of minutes, RTO in terms of maybe one hour. And of course, something that needs to be replicated across availability zones and possibly support regional failover. So this is a very common use case, but it's still considered something really, really hard to achieve, and somewhat of a pain point if you find yourself having to set up this kind of database. So we are wondering today if there is something that can make us happy and make all of this easy: a low-overhead managed database with high availability, multi-region support, fast recovery, something that is reasonably secure, as cheap as possible (that's always nice to have), and somewhat developer-friendly.

So is this something we can achieve on AWS? If yes, what kind of services should we be looking at, analyzing, and deciding whether they really fit our description? The first service that comes to mind is Aurora. So what is Aurora? If you're looking for a relational database on AWS, you have a few options. You're probably aware of RDS, which is a service that allows you to create relational databases and supports a large number of database engines, including SQL Server and Oracle, but also open source ones like Postgres, MariaDB, and MySQL. The idea is that it gives you a service that is somewhat managed and removes some of the pain points of running databases yourself, but it's still running on EC2 instances. So it does abstract some things, but there is still a bit of the pain of having to manage servers. Within RDS, you also have another category of databases, which is the one called Aurora. And Aurora promises to deliver MySQL and Postgres compatibility with performance up to three times faster than the regular MySQL and Postgres engines you find on standard RDS. On top of that, Aurora has some different characteristics that are only possible because AWS effectively reinvented this kind of database: it's a service they largely recreated while keeping compatibility with the MySQL and Postgres protocols, which let them optimize it in ways you wouldn't get with the open source alternatives. So let's try to dive into the details. Maybe we should talk about the storage first, because I think that's the first thing that comes to mind when we ask why Aurora is different from plain MySQL or Postgres.

Eoin: Yeah, definitely the most important thing to know about Aurora is that its storage is different to RDS and to most other databases out there. You mentioned three times faster performance in some benchmarks. That ultimately comes down to the way they have designed Aurora storage. I mean, the engines themselves are the open source MySQL and Postgres engines running on top of this new storage layer, but because of the way it's been architected, they've been able to reduce the number of writes that those database engines have to do and achieve that better performance.

Normally when you configure a database on a server, you configure the database running as a process or a set of processes, and then you have the storage, which could be attached in some way. With Aurora, it's a bit different in that the storage is completely separate from the compute layer and it uses its own magic to give you that better performance, as well as more durability and faster recovery, which if you're an enterprise, all of those things sound like a dream.

Now you can think of Aurora storage, I think, as an intelligent kind of EBS volume layer that is database-transaction aware. So all of the data you store is automatically stored in six copies across three availability zones by default. That automatically gives you great resilience. And then you have asynchronous replication processes that happen outside of the database compute. Normally with other databases, you have to configure the engines themselves to do the replication between the compute nodes. With Aurora storage, it happens at the storage layer, so you don't have that compute-level replication. So when your data is replicated, it doesn't actually affect instance performance, which is a really good thing.

And then because it's being written all the time, asynchronously, recovery time is really fast because the data is not affected by instances going down. The data is already replicated by design. And you can then add new instances to a database cluster that give you horizontal scalability using that existing storage layer. The other thing to know about Aurora storage is that it scales automatically, so you don't have to provision storage capacity upfront. It just grows automatically in increments of 10 gigabytes up to a maximum of 128 terabytes. So let's try and understand all this a bit better. And we might take a look at, I suppose, a few Aurora concepts and constructs and how you might architect a database based on Aurora. Luciano, would you like to take us through some of the terminology and concepts in Aurora?

Luciano: Yeah, the first one that comes to mind is probably the concept of a cluster. And this is something that is already different from a more traditional RDS database. The first thing you need to create is the cluster: your database exists inside a cluster, and the cluster represents the storage layer. So even if you don't have any database instance (meaning the compute part of the database) in the cluster, the storage exists as a kind of baseline. And of course, at some point you'll want to add at least one instance to make it useful, because the storage alone is not going to let you run any query or any actual operation. It's just there to keep your data safe. Every cluster can have one writer instance, which effectively handles read and write requests, but you cannot add more than one writer: you can have only one writer per cluster. That means you can only scale writes vertically, by using a bigger writer instance if you end up needing more write throughput. You can, though, add up to 15 reader instances, so you can definitely scale the reader instances horizontally.
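
To make the cluster/instance relationship concrete, here is a minimal sketch of creating an Aurora cluster with one writer and one reader using boto3. All identifiers, the region, and the credentials are placeholder assumptions; in practice you would also configure networking, parameter groups, and secrets properly.

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Create the cluster first: this is the shared storage layer, with no compute yet.
rds.create_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",   # hypothetical name
    Engine="aurora-postgresql",
    MasterUsername="postgres",
    MasterUserPassword="change-me",            # use Secrets Manager in real setups
)

# The first instance added to the cluster becomes the (single) writer.
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-writer",
    DBClusterIdentifier="my-aurora-cluster",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-postgresql",
)

# Additional instances join as readers; you can add up to 15 of these.
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-reader-1",
    DBClusterIdentifier="my-aurora-cluster",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-postgresql",
)
```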

So we can also call them read replicas, if that term is a little more familiar to you, and that's what allows you to handle read scalability. You can easily spin up more instances if you see that you need more read throughput. There are also ways to do auto scaling: you can set up auto scaling policies that look at things like CPU utilization or the number of connections, and provision new reader nodes as your traffic increases. And of course, you can also scale down if that traffic decreases. Each instance in the cluster has its own endpoint, so every instance has its own kind of address that allows you to connect to it directly. But there is also the concept of cluster read and write endpoints. These endpoints are the preferred way to connect to the database, because they will automatically do all the routing for you, figuring out which instance is the most appropriate to handle a particular request. And this is important because, for instance, as you scale up and down, or if there is a failover in your cluster, that kind of cluster-level endpoint will know exactly what to do to make sure your request gets answered correctly. If you manage your own connections directly to specific instances, then doing all of that is on you, and that's not always fun to do; it can lead to all sorts of problems, so try to avoid it unless you really know what you're doing. On the topic of failover and recovery: you only have one writer, so you might be wondering what happens if that writer fails. Of course, there is a failover mechanism: Aurora will automatically promote one of your replicas to be the writer. This is all within a single region. We also mentioned that Aurora supports the concept of multi-region, which seems really, really cool and promising. And if you have ever tried to build a multi-region database, you know that it's extremely complicated to do correctly. So maybe we should talk a little bit more about this particular characteristic of Aurora.
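
As a sketch of what connecting through the cluster-level endpoints looks like, assuming a Postgres-compatible cluster and the psycopg2 driver; the hostnames below are made-up examples of the `cluster-...` (writer) and `cluster-ro-...` (reader) endpoint formats:

```python
import psycopg2

# The cluster (writer) endpoint always routes to the current writer,
# even after a failover promotes a different instance.
writer = psycopg2.connect(
    host="my-aurora-cluster.cluster-abcdefgh1234.eu-west-1.rds.amazonaws.com",
    dbname="appdb", user="postgres", password="change-me", port=5432,
)

# The reader endpoint load-balances connections across the read replicas.
reader = psycopg2.connect(
    host="my-aurora-cluster.cluster-ro-abcdefgh1234.eu-west-1.rds.amazonaws.com",
    dbname="appdb", user="postgres", password="change-me", port=5432,
)

with reader.cursor() as cur:
    cur.execute("SELECT count(*) FROM orders")  # hypothetical table
    print(cur.fetchone())
```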

Eoin: When we get to talk about cost a little bit later in the episode, spoiler alert, Aurora is a little bit more expensive than the alternatives. And for that, you have to expect some extra value. And I think when it comes to replication, failover, all of these disaster recovery scenarios and scalability, that's really where you see the value. And multi-region is one of those things where you just get something that you can't really achieve easily with other databases. So let's talk about global databases. With Aurora, a global database is something that connects together clusters across multiple regions.

So it's essentially a grouping of one or more regional Aurora clusters. And only one of those clusters can be the primary. And that's where the writer instance exists. Now, there is a thing called multi-master for MySQL only, but let's put that aside for the moment. If you're looking at a global database, you'll have one primary region and that's where the writer will exist. And a global database is really just an identifier that Aurora uses to replicate data from the primary cluster to read clusters in different regions. And because Aurora global databases are using Aurora storage, replication is very fast, typically less than a second. And this is with a database that can support up to 150,000 transactions per second. So when it comes to multi-region disaster recovery, if you've got a very low recovery point objective (RPO), this is a way to achieve it. It's really something that's very difficult to achieve without Aurora.
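
For illustration, this is roughly what assembling a global database looks like with boto3: promote an existing regional cluster to be the primary, then attach a secondary cluster in another region. The ARNs, identifiers, and regions are placeholder assumptions.

```python
import boto3

# Wrap an existing regional cluster in a new global database; it becomes the primary.
rds_eu = boto3.client("rds", region_name="eu-west-1")
rds_eu.create_global_cluster(
    GlobalClusterIdentifier="my-global-db",
    SourceDBClusterIdentifier="arn:aws:rds:eu-west-1:123456789012:cluster:my-aurora-cluster",
)

# In a second region, create a secondary cluster attached to the global database.
# It receives data via storage-level replication; no master credentials needed here.
rds_us = boto3.client("rds", region_name="us-east-1")
rds_us.create_db_cluster(
    DBClusterIdentifier="my-aurora-cluster-us",
    Engine="aurora-postgresql",
    GlobalClusterIdentifier="my-global-db",
)
```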

Now, you can have up to six regions in a global database in total. And if we take the 15 read replicas supported per region mentioned already, this allows you to go to a pretty big scale with 90 read replicas in your database in total. So then if you've got your multi-region database setup, you can use this for data locality. If you have some readers in some regions, which might be better to serve requests from users in a specific region, you can also use it for disaster recovery.

And if you have your multi-region global database, you can trigger a failover from the primary region to a secondary region. This is really useful for enterprise use cases where you need to seriously reduce the risk of data loss and lower downtime as well. Now, global databases don't have global endpoints. We mentioned cluster endpoints, the read and write endpoints per region; you still need to use those regional cluster endpoints in your application, and you need to decide which region you're targeting. You can use DNS, of course, to manage that, or ensure that the application is aware of the cluster topology and the failover scenarios and can respond accordingly. But there's no such thing as a global endpoint that automatically does that for you at the moment. Interestingly, there is a thing called write forwarding in Aurora: you can actually configure regional read endpoints to take write requests, and they'll just forward them to the writer node for you, which might be useful, especially in disaster recovery scenarios. Now, I think that's pretty much all the terminology and some of the primary benefits of Aurora, but something that makes the headlines quite frequently is Aurora Serverless, sometimes for good reasons, sometimes not so good. Luciano, can you take us through Aurora Serverless and what it can offer people?
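
That managed cross-region failover can be triggered with a single API call; a minimal sketch with placeholder identifiers:

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Promote the secondary cluster to primary; Aurora reconfigures replication.
rds.failover_global_cluster(
    GlobalClusterIdentifier="my-global-db",
    TargetDbClusterIdentifier="arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster-us",
)
```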

Luciano: Yeah, so what we've described so far is what we would call provisioned mode. Effectively, you have to configure your instance sizes, more or less as you do with standard RDS; the only difference is that you don't have to provision the storage. But as you say, there is also a serverless mode, and this is what is generally referred to as Aurora Serverless. It gets a little bit confusing, because originally there was a V1, which was, of course, just called Aurora Serverless. Then they made some big changes and rebranded it as Aurora Serverless V2. We'll talk a little later about the differences between these two versions. But the point is, what is this concept of Aurora Serverless? Is it somewhat easier to use? The term serverless is generally associated with something where you barely have to manage anything, so a kind of easier user experience.

Or is it maybe something else: if you have ever used services like Neon, PlanetScale, or Supabase, are we talking about something like that? And my personal answer is probably no. It is quite different from products like Neon, PlanetScale, or Supabase. And I think before we go into the details, it's worth remembering what a serverless database looks like, or at least trying to define it in some way. Of course, the first one that comes to mind when you think about a serverless database, at least in the context of AWS, is DynamoDB. And it's kind of the gold standard, if you want, for serverlessness in the database world.

And the idea is that it's a database that scales to zero by default, so you don't really have to even think about that; it can go up and down. In terms of pricing, you can pick between provisioned and on-demand, and when you pick the on-demand approach, even the pricing model becomes more serverless. And you have almost spontaneous creation of tables: you don't even have to think in terms of clusters or databases, you just create tables and they appear almost immediately. But what is the problem with DynamoDB? Why aren't we just using DynamoDB, then? Because DynamoDB is a NoSQL database, not an RDBMS. So when you need relations, DynamoDB gets much trickier to use for all the things you can do with a relational database.

So services like Neon or PlanetScale are really cool because they try to give you that kind of experience, where almost everything is managed for you, almost like with DynamoDB, but you get a fully fledged relational database that you can use straight away: just connect to it and use it. The problem with those services is that they generally seem to be targeting software-as-a-service companies or smaller startups, not so much the enterprise, at least from what we have seen so far.

Aurora, on the other hand, seems to be positioning itself as the relational database product for the serious enterprise that needs a certain set of features, needs something really, really reliable, and where cost is not always the first trade-off they look at. So what we are talking about here is more of a modern take on something like Oracle RAC, maybe a little bit cheaper, but running natively in the cloud on AWS.

So what is Aurora Serverless at this point? Why did we go through all that preamble about DynamoDB and what we mean by serverless? Because we think Aurora Serverless is not really serverless in the way you might expect. The first thing: does it scale up and down? To some degree, it does. The problem is that it doesn't really scale to zero. There is a concept of an ACU, which stands for Aurora Capacity Unit, and one ACU is equal to two gigabytes of RAM, more or less. The thing is, you cannot just scale to zero ACUs. There is a minimum, and the minimum is 0.5, which means that even if you have a database that is totally idle, because maybe, I don't know, it's a dev deployment and you're on a break for the weekend, so nobody is really using that database.

You are still paying for those 0.5 ACUs, 24/7, for that particular deployment. And imagine if you have multiple development environments, maybe trying to segregate things by domain: you might have dozens of databases lying around doing nothing and costing you money. So in that sense, it's not really serverless as we might like it to be; there is a minimum cost. And the other interesting thing is that there are still maintenance windows required.

So you need to plan around those. And depending on what you do, you might need more capacity: eight ACUs are recommended for the primary of a global cluster, so certain parts of your setup will require even more ACUs if you want to follow the recommended configuration. And two more ACUs are suggested for Performance Insights, which is a tool that gives you query metrics. So the baseline can be even more expensive if you actually apply all the suggestions you get from the documentation. And the funny thing is that we recently realized, while we were using Aurora, that it is possible to reboot the server; we were actually watching the server reboot. It's kind of funny, for a serverless product, to see the server rebooting. So again, it doesn't really feel as serverless as you might like to think. With all of that being said, there is a benefit in Aurora Serverless being able to scale up and down, and it can be useful in some circumstances, especially when you have very variable traffic: it can remove some of the headaches of planning your database capacity. So in those cases, even if it's not really a serverless database, this product can be beneficial compared to just going for traditional Aurora or standard RDS. But I mentioned that there are two versions, V1 and V2. Eoin, do you want to try to explain what that is all about?
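
To show where that 0.5 ACU floor lives in practice, here is a sketch of creating a Serverless v2 cluster with boto3: you set a capacity range on the cluster and use the special db.serverless instance class. Names, region, and credentials are placeholder assumptions.

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

rds.create_db_cluster(
    DBClusterIdentifier="my-serverless-cluster",   # hypothetical name
    Engine="aurora-postgresql",
    MasterUsername="postgres",
    MasterUserPassword="change-me",                # use Secrets Manager in real setups
    # The minimum you can set here is 0.5 ACUs; there is no scale-to-zero.
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 8},
)

# Instances in a Serverless v2 cluster use the "db.serverless" instance class.
rds.create_db_instance(
    DBInstanceIdentifier="my-serverless-writer",
    DBClusterIdentifier="my-serverless-cluster",
    DBInstanceClass="db.serverless",
    Engine="aurora-postgresql",
)
```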

Eoin: Yeah, V1 was around for a good few years. I'm not sure exactly when it was announced, but V2 was in beta for a while and has now been generally available for at least a year, I think. They announced at the start of this year that V1 is no longer going to be supported from the end of 2024. And this is quite disappointing for some people, because version one actually did scale to zero. As you correctly said, Luciano, that's something you can't do with version two. But when they announced V2, they did add some big advantages.

So scale-up time is faster. With V1, this could be quite slow, like seconds or minutes. Now you can scale up in milliseconds, but the speed you scale up at actually depends on your baseline capacity, the number of ACUs you're scaling from: the more ACUs you have, the faster you'll scale up. In V1, it could take minutes, and it only scaled in doubling increments, so you would go from two ACUs to four, then to eight, and then to 16. Now you can scale in steps of 0.5 ACUs. Version two supports global databases, where version one did not, and version two supports read replicas, where version one did not. Version one was really just a single-instance database, so you couldn't really regard it as a serious production-ready database for the enterprise. So I think now it's probably worthwhile asking: when should you consider Aurora Serverless version two instead of provisioned Aurora?

And the primary difference is the scaling you mentioned. Aurora Serverless means you can scale vertically without any failover, and that's really one of the sweet spots here. Because normally, if you've got a provisioned instance and you decide it isn't big enough for your needs anymore, you would have to add a larger instance size as a reader, then promote it to the writer, and then deprovision the old one. That takes time, and you might have outage time on the writer instance while you do it. Otherwise, the management overhead between Aurora Serverless and provisioned Aurora isn't really that different. As you said, you still have maintenance windows, and you can reboot the instance. It doesn't really seem like a serverless product; it has a serverless badge, but I think that's a little bit of a mask it's wearing, to be honest. It might be a good fit for your pre-production or development databases, where you might have lots of idle time and then you just want to scale up as you deploy and run test workloads. So that might be one of the cases where you can actually make use of the cost difference, because generally Aurora Serverless will cost you more if you're comparing gigabyte for gigabyte. It only really starts becoming cheaper if you've got a really variable traffic pattern and you spend a lot of time scaled down from peak capacity. So we mentioned cost a few times; let's go into cost a little bit. I've done some calculations on this, and we can share the link to the AWS Pricing Calculator in the show notes. If you look at the cheapest possible Serverless V2 instance, 0.5 ACUs, you're talking about around $50 a month. The cheapest Aurora standard instance that I could find was just a little bit more expensive, closer to $60, but that was for a much bigger instance; I don't have it to hand right now, but I think it was eight or 16 gigabytes of RAM. So you're already getting far more compute and memory than you would for the serverless version. Now, if you compare that to the cheapest possible instance you can get on RDS, I could pick one there that costs $15 a month. Of course, with all of this, you have to try it and measure it yourself; there's no way to give absolute price comparisons between all of these options. It depends on your storage, your traffic, and everything else, so you really just have to give it a go. If you were to look at an r6g.large, a Graviton memory-optimized instance with 16 gigabytes of RAM, which is kind of entry level when it comes to Aurora standard, you're looking at a provisioned cost of over $200 a month. But if you wanted Aurora Serverless as the primary in a global cluster, you're going to need the eight ACUs you mentioned, and that's going to cost you $400 a month. So if you're a bootstrapped startup or solopreneur, you might look at some of these costs and think this isn't going to work for you, and you might be better off starting with something like Neon or one of the other services. Or you could just say: okay, the database is a serious part of my infrastructure, and I'm going to have to spend a serious amount on it. I don't think production-grade, enterprise-grade databases come cheap, unfortunately. So that's our two cents on cost, but there are a lot of other features, and we don't have time to cover them all.
In fact, I think we should probably do a few more episodes on Aurora architecture, maybe on setting up and managing Aurora. But what other topics should we quickly run through before we finish up?
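
As a quick sanity check on those ACU numbers, here is the back-of-the-envelope arithmetic, assuming roughly $0.12 per ACU-hour (an approximate us-east-1 list price at the time of recording; it varies by region, so check the pricing page):

```python
ACU_HOUR_USD = 0.12      # assumption: approximate us-east-1 list price
HOURS_PER_MONTH = 730

def monthly_acu_cost(acus: float) -> float:
    """Cost of holding a Serverless v2 instance at a constant ACU level."""
    return acus * ACU_HOUR_USD * HOURS_PER_MONTH

# The 0.5 ACU floor works out to roughly the "around $50 a month" figure above.
print(f"${monthly_acu_cost(0.5):.2f}/month")  # ~$43.80, before storage and I/O
```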

Luciano: Yeah, let's do a quick list of other things that might be interesting to know, and we'll leave you the task of doing a deep dive and really trying to understand all of the details. The first thing to mention is that there is a Data API, which right now exists only for Aurora Postgres. MySQL was, interestingly enough, supported in Aurora Serverless V1, but it's not there yet in V2. So if you're planning to go from V1 to V2 and you were relying on the MySQL Data API, just be aware that it's not there for V2, at least not yet; hopefully it's going to come soon. But what is the point of this Data API? Normally, you connect to a database through a kind of raw TCP connection using a protocol that is specific to that database system. With the Data API, there is effectively an HTTP API that replaces all of that. And why is this convenient? Because it's a little easier to call, for instance, from a Lambda: you don't necessarily need to install specific drivers for your database, so it can make connectivity from different environments a little easier. Now, should you use it? If you are in a serverless environment, it probably makes things easier. But in other contexts, maybe a more traditional application using an ORM or a standard framework like, I don't know, Spring Boot, it's probably going to be much easier to just use the tools you are already familiar with and do things the classic way.
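
A minimal sketch of calling the Data API from Python with boto3; the cluster ARN, secret ARN, database, and table names are placeholder assumptions:

```python
import boto3

# The Data API turns SQL into plain HTTPS calls: no drivers, no connection pooling.
data = boto3.client("rds-data", region_name="eu-west-1")

response = data.execute_statement(
    resourceArn="arn:aws:rds:eu-west-1:123456789012:cluster:my-aurora-cluster",
    secretArn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:my-db-secret",
    database="appdb",
    sql="SELECT id, name FROM users WHERE id = :id",   # hypothetical table
    parameters=[{"name": "id", "value": {"longValue": 42}}],
)
print(response["records"])
```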

Another interesting thing is the RDS query editor, so you can finally run queries in the AWS console. This is something I was looking for, for a long time, with traditional RDS, and it's good to know that with Aurora you have that option. It is based on the Data API, so you need to enable the Data API for it to work; just be aware of that small detail. There are some limitations that we have observed using it, but if you just need to run a quick query to validate some assumption, it can be a really useful tool and might save you a lot of time. It's not necessarily the main tool you should be using for all your data modeling, but for some quick debugging it can be very beneficial. And the other thing is RDS Proxy, which is probably a better solution for Lambda compared to the Data API. It's an additional resource that you need to provision for each regional cluster. The idea is that when you run Lambdas, you might very quickly end up spinning up thousands of them, and every single Lambda is going to try to establish its own connection. If you do that in the traditional way, establishing a TCP connection directly against the database, you can very easily saturate the pool of connections available in the database. The proxy manages all of that: it provides a kind of shared connection pool, so that you don't end up overloading your database when you spin up thousands of Lambdas at the same time. It keeps your database a little leaner, and it also gives you faster failover, because it's aware of the cluster topology, so you don't have to rely on DNS. With DNS, you can often have problems: maybe you have a TTL and the record doesn't get refreshed fast enough, so you keep failing for a while before you pick up the new DNS record and can connect to the correct instance. Another interesting point is that it allows you to enforce IAM authentication, which can be useful to avoid, for instance, having to share database secrets with the application directly. The small downside is that, because it's an additional resource, there is an additional cost to consider: $25 a month, more or less. It's not a massive cost, but depending on what you are trying to do and what your budget is, it can be significant and is worth considering. So these are the three things I wanted to mention: the Data API, the RDS query editor, and RDS Proxy. If you end up using Aurora, or seriously considering it, check them out in the documentation; we'll have links in the show notes if you really want to understand why these three additions can be beneficial for you. I guess it's time now to jump to the conclusions.
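
As a sketch of that IAM authentication flow through RDS Proxy, assuming the proxy and the IAM policy are already set up; the proxy endpoint and user names are placeholders:

```python
import boto3
import psycopg2

PROXY_ENDPOINT = "my-proxy.proxy-abcdefgh1234.eu-west-1.rds.amazonaws.com"  # placeholder

rds = boto3.client("rds", region_name="eu-west-1")

# A short-lived IAM auth token replaces a long-lived database password.
token = rds.generate_db_auth_token(
    DBHostname=PROXY_ENDPOINT,
    Port=5432,
    DBUsername="app_user",   # hypothetical database user with IAM auth enabled
)

# The proxy pools connections behind the scenes, so thousands of short-lived
# Lambda invocations don't exhaust the database's connection limit.
conn = psycopg2.connect(
    host=PROXY_ENDPOINT,
    port=5432,
    dbname="appdb",
    user="app_user",
    password=token,
    sslmode="require",   # TLS is required with IAM authentication
)
```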

Eoin: Yeah, while we were talking about V1 and V2, I was reminded that the Cloudonaut podcast and blog had a good analysis of the differences between V1 and V2, because I think they were using V1 and were a bit disappointed to realize that the migration path from V1 to V2 isn't great. Essentially, you have to take a snapshot, copy it across, and create a new V2 cluster from it. It's worth checking out. But maybe we should finish up with some advice on when, from our perspective, you should use Aurora. I think it's really for serious enterprise use cases.

And I can see Amazon basically aiming at people who are paying a lot of money for commercial databases with high license costs and lower performance, and saying: this is going to be cheaper and faster, even though it's still a significant cost for average users like us. For enterprises, they can definitely make some savings. But it's also good for single-instance, low-config, serverless-type cases that need a relational database rather than NoSQL. If you're somewhere in between those two extremes, you might just use RDS or another vendor, especially if you are cost-conscious. So, to summarize everything we talked about today: Aurora is a relational database solution on AWS; we talked about some of its unique capabilities, intricacies, and trade-offs, and how it gives you MySQL and Postgres compatibility with faster performance. The most important thing to take away is maybe its distinct storage layer, which gives you that better performance, durability, and faster recovery. And bear in mind as well the concepts of Aurora clusters, readers and writers, endpoints, instances, and global databases. When it comes to Aurora Serverless, comparing its two versions and their features, V2 is definitely more enterprise-grade, but V1 is not going to be supported from the end of this year anyway. And while it definitely does not reach the gold standard of serverlessness set by DynamoDB, it does have its uses, particularly for variable-traffic use cases, and maybe pre-production workloads as well. We also touched on billing aspects and things like the Data API, the RDS query editor, and RDS Proxy. So overall, I think it is a really powerful and scalable solution. It's not trivial to use, especially when it comes to global scale, but it's still far simpler than the alternatives, because when you're dealing with clusters of relational databases, there is, as of yet, no silver-bullet, ultimately simple solution. So thanks very much for joining us, and join us in the upcoming episodes for more on Aurora and a whole load of other AWS topics.