AWS Bites Podcast

117. What do EBS and a jellyfish have in common?

Published 2024-03-08 - Listen on your favourite podcast player

In this episode, we provide an overview of Amazon EBS, which stands for Elastic Block Store. We explain what block storage is and how EBS provides highly available and high-performance storage volumes that can be attached to EC2 instances. We discuss the various EBS volume types, including GP3, GP2, provisioned IOPS, and HDD volumes, and explain how they differ in performance characteristics like IOPS and throughput. We go over important concepts like IOPS, throughput, and volume types so listeners can make informed decisions when provisioning EBS. We also cover EBS features like snapshots, encryption, direct API access, and ECS integration. Overall, this is a comprehensive guide to understanding EBS and choosing the right options based on your workload needs.

AWS Bites is brought to you by fourTheorem, an AWS Partner that does CLOUD stuff really well. Go to fourtheorem.com to read about our case studies!

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: If you have run an EC2 instance on AWS, you are very likely to have used EBS storage. Storage may be a dull topic, but if you scratch under the surface, EBS is actually a bit of a technical wonder. Its design is even based on the Portuguese Man o' War (not the metal band, but a stinging jellyfish-like sea animal). We'll try to explain why, and why EBS is an understated genius amongst AWS offerings. This is AWS Bites and I'm Luciano, joined today by Eoin.

AWS Bites is brought to you by fourTheorem, an AWS partner that does cloud stuff in a really, really good way. Go to fourtheorem.com to read about our case studies and feel free to reach out to us. So if you use EC2, you're going to need some sort of storage, something like a disk drive or remote storage, just as you would with any kind of physical server that needs to store information. Some instances support what they call instance store, which is storage that is physically attached to the instance itself. That makes it really fast, but it is also ephemeral, which means it's temporary storage. So if your instance gets stopped, terminated or hibernated, all the data gets lost. For these reasons, it is very good for cases where you need fast access but not persistent data, like caches or temporary storage. But for almost everything else, imagine you have databases or files that you need to keep around, there is something called EBS, Elastic Block Store. So what is EBS, really? Eoin, do you want to take a stab at it?

Eoin: It might be worthwhile starting with block storage. What is block storage? It's used by physical disks as well as SANs, so storage area networks. And with a block store, data is stored in fixed-size blocks. That's how it gets its name. If you imagine old school traditional spinning disks, these blocks related to physical sectors, and then with software or network-based storage, dividing a storage volume into blocks can actually be used to achieve resilience and performance if you optimize how those blocks are read and written. So when you create a file system on top of a block storage device, file reads and writes are translated to reads and writes of individual blocks. And the volume might also have some form of caching built in. So what about EBS then? Well, EBS is a kind of block store, hence the name, and it's a service that allows you to create storage volumes in AWS that can be attached to EC2 instances, and now actually to ECS containers as well. We'll talk a little bit about that later. The volumes themselves exist independently of the EC2 instances, so they can be detached and reattached to other instances. And they give you really good performance. They also give you different options so you can balance the cost and performance factors. In terms of size, they can go from one gigabyte all the way up to 64 terabytes.
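To make the point about file reads and writes being translated into block reads and writes a bit more concrete, here is a minimal sketch in Python. The 4,096-byte block size and the function name are illustrative assumptions, not values or APIs taken from EBS.

```python
BLOCK_SIZE = 4096  # bytes per block; an illustrative value, not an EBS constant


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the indices of the fixed-size blocks touched by a byte range."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // block_size                 # block holding the first byte
    last = (offset + length - 1) // block_size   # block holding the last byte
    return range(first, last + 1)


# A 10,000-byte read starting at byte 6,000 touches blocks 1, 2 and 3.
touched = list(blocks_for_range(6000, 10_000))
```

Even a read of a couple of bytes can touch two blocks if it straddles a block boundary, which is one reason the way blocks are read and written matters for performance.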

And that's just for a single volume. You can also have multiple volumes and you can even use things like RAID to create arrays of volumes. You can also easily change volume types, so you've got elastic options with EBS as well. You can change the type, the size, as well as other features. Volumes can even be resized dynamically without any downtime on attached instances, which is pretty cool. They can also be encrypted, and one of the amazing things about them is that the encryption doesn't incur any performance impact. Because of their design, they also typically have a reduced failure rate when compared to physical disk drives, which is a real advantage when it comes to choosing EBS over a traditional physical medium in a data center. And they're highly available within an AZ. I think that's one of the important things to understand about EBS volumes: they are tied to a single availability zone. Within that AZ, there is redundancy when it comes to the storage and failure of components of the EBS volume, but they're still within one availability zone. So if you want to make them more resilient, you can do that by taking EBS snapshots. These are backed by S3, so they're stored in multiple AZs. Another point to mention is that we've been talking about EC2, but EBS volumes are of course also used by RDS for storage. So when you're creating databases in RDS, you will choose EBS volume types when you're creating them. That doesn't apply to Aurora, because Aurora has its own very special storage mechanism, which we might talk about in a future episode. One of the neat features of EBS is snapshots. We just referred to them there, but it's worthwhile going into them in a bit more detail. What do you think?

Luciano: Yes. So snapshots are point-in-time backups, and that backup will contain the entirety of your EBS volume. They are stored in S3, so that means that by default you get the 11 nines of durability that S3 gives you. This needs to be monitored for cost because, of course, if you have large volumes, you are effectively replicating all of that data into S3 and that comes with a cost. And if you are not careful, that cost is going to build up, right? But there is actually an interesting feature there, because these backups are incremental. Only the data that has changed since the previous snapshot is actually stored, so that can make them more efficient. You still need to be careful, of course, but generally speaking, the incremental approach means that you are not replicating all the data all the time, just the first snapshot and then the changes layered on top of that. You can create new volumes starting from a snapshot.

So the idea is that basically you might snapshot a volume from a machine. From that snapshot, you can maybe bootstrap another machine. And other things that are relevant are that you can create copies of the snapshot within or across regions. So maybe you can use this approach for instance to spin up a new machine in another region. And you can also share snapshots with other accounts. So you can even use this approach to share data across different accounts, or even you can create snapshots that are publicly accessible. So if you have information that somehow needs to be shared across pretty much everyone that might need that information, maybe it's like a large data set that you want to make public, and people might need to spin up an EC2 and load that information easily into the EC2 instance, you can use this approach as well.
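The incremental idea can be illustrated with a toy model. This is only a sketch of the concept, assuming a volume is a mapping from block index to block contents; it is not how EBS implements snapshots internally, and the names `take_snapshot` and `restore` are made up for the example.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # block index -> block contents (toy model of a volume)


def restore(chain: List[Blocks]) -> Blocks:
    """Rebuild a full volume image by layering snapshots, oldest first."""
    image: Blocks = {}
    for snap in chain:
        image.update(snap)  # newer snapshots overwrite older block contents
    return image


def take_snapshot(volume: Blocks, previous: List[Blocks]) -> Blocks:
    """Store only the blocks that differ from what the existing chain already holds."""
    known = restore(previous)
    return {i: data for i, data in volume.items() if known.get(i) != data}


volume = {0: b"boot", 1: b"data-v1", 2: b"logs-v1"}
chain = [take_snapshot(volume, [])]          # first snapshot stores all 3 blocks
volume[1] = b"data-v2"                       # one block changes
chain.append(take_snapshot(volume, chain))   # second snapshot stores 1 block
```

In this model, creating a new volume from a snapshot corresponds to calling `restore(chain)`, which gives back the full image even though the second snapshot only stored the changed block.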

Snapshots can also be archived, which can be convenient when you don't expect to access that snapshot frequently and you keep it maybe just for extra peace of mind. That will help you to reduce the cost. So you just take the snapshot, archive it, and it's going to be there for 90 days or more, for example. So apart from attaching an EBS volume, is there any other way to access the data? You might be wondering, okay, is that the only use case? There is actually a very interesting API that I came across only very recently. It's not necessarily well known to everybody, and it's called the EBS direct APIs. We will have a link in the show notes with all the documentation and some examples, but generally speaking, it's a lower level API that allows you to access EBS snapshot data directly, without having to mount a volume into an EC2 instance. This is probably a bit of a niche use case. Maybe if you are a vendor that wants to provide some kind of backup or DR facility, you can use this approach to be very efficient in the way you access the data, reading and writing EBS snapshot data directly. You don't necessarily have to create volumes. That's kind of the idea. So if you have a use case where you want to control the data directly and you might do that at large volumes, this might be an API worth exploring a little bit more. It will come at a lower price. That's kind of the idea. We mentioned that EBS itself is a bit of a technical wonder in AWS. So what is this magic? How does it work under the hood?

Eoin: Well, I can't explain all the details, and even if I could, I probably wouldn't understand them. But one of the interesting things, I suppose, fundamentally, is that it's not directly attached physical storage. It's accessed by instances over a network, but it appears to the operating system like a physical disk. Now, there is an academic paper published by AWS on how they evolved their internal configuration management system. The paper is called "Millions of Tiny Databases", which gives you a hint as to how that's architected. The idea is that the configuration management system for EBS uses cells, which are fault-tolerant units that manage a small portion of EBS data. These little cells are replicated across the storage nodes to give durability and availability. And each cell is actually a seven-node Paxos cluster.
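As a small aside on why a seven-node cluster is a sensible size: Paxos makes progress as long as a majority of nodes can talk to each other. The sketch below just works out that arithmetic; the function names are made up for illustration and this is not code from the paper.

```python
def majority_quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority can still be formed."""
    return nodes - majority_quorum(nodes)


cell_size = 7  # each cell is a seven-node Paxos cluster, per the episode
quorum = majority_quorum(cell_size)         # 4 nodes must agree
can_lose = tolerated_failures(cell_size)    # up to 3 nodes can fail
```

So a seven-node cell needs four nodes to agree and can keep working with up to three nodes unavailable.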

This cellular database design is known as Physalia, after the Latin name for the Portuguese Man o' War. The Portuguese Man o' War looks like a jellyfish, but it's not technically a jellyfish because it's not a single animal, but a colony of millions of individual organisms. So I guess the team thought that this was a good fit for the Physalia EBS architecture, which is basically built on a colony of individual distributed database cells. It's difficult to really wrap your head around the idea that this high-performance storage system is actually backed by millions of tiny distributed databases that give you the durability, high availability and consistency that you need. And if you have a look at that paper, like a lot of distributed systems papers, it talks about the CAP theorem, which is the famous theorem stating that in a distributed system you can have two out of three when it comes to consistency, availability, and partition tolerance. Now, consistency and partition tolerance are non-negotiable in the context of EBS: you have to have consistency when it comes to block storage. And the architecture basically optimizes to achieve the maximum possible value of the third one, which is availability. So they're essentially achieving availability with a very high probability, which is the best trade-off you can achieve for a system like this. Now, you often hear AWS people talking about the Nitro System, which is their custom, very extensive hardware that underpins a lot of the modern AWS infrastructure. The Nitro System actually has specific features built into it to facilitate EBS volumes and to enable the encryption we talked about without any performance overhead. You might also have heard of EBS-optimized EC2 instances. Because EBS is network-based storage, these instances have dedicated bandwidth, so they give you more guarantees around the bandwidth you have when accessing EBS.
So that's as much as I can say about the internals of EBS, I think. Feel free to check out the paper if that kind of thing floats your boat. When it comes to actually using them as an end user, though, one of the things I find is that if you're not used to EBS, all you want is a disk, but suddenly you're confronted by all these different terms that seem very confusing. There are a lot of different options, people talk about IOPS, and it's very easy to get lost and confused. So should we try and do our best to explain these concepts at least at a high level without boring everybody to death?

Luciano: I agree. And it's something that to be honest, I still get confused about every once in a while. So it's definitely good for me to also go over all of it again and try to finally memorize all of these different concepts. So we have at least three different pieces of terminology that can be confusing and something that we need to know. And we are talking about volume types, throughput, and IOPS. And those are really important, not just because you need to make the choice, right? As you said, you just need a disk, but you don't just get to pick a random disk, you need to decide based on these options. So definitely, you need to be aware.

But the other element is that it has a massive correlation with the cost. Cost is definitely based on the different values that you pick when it comes to these options. So it's important to understand the meaning and the associated pricing so that you can avoid some kind of random bill shock just because you provisioned a disk and you didn't know what you picked. So let's start with IOPS. What is IOPS? It basically stands for I/O operations per second, where I/O means input/output. It identifies the number of I/O operations that you can perform every second. So what is an I/O operation, you might ask?

And this is basically either a read or a write of either 16 kilobytes, 64 kilobytes, or one megabyte, depending on the EBS volume type. So you are effectively reading or writing a certain amount of data. That's an operation. If you have any meaningful production workload, it's generally a good idea to really understand how IOPS can become kind of a bottleneck and make sure that you fine tune the number of IOPS to make sure it matches your needs for that particular workload.

Just always be aware of cost. Don't just push that to the maximum because you might end up paying a lot of money for something that you don't really need. So, next, what are volume types? This can be a little bit confusing because there are many different options, but let's try to cover the broad categories and how to think about them so that you can choose somewhat consciously. You have one category of volume type, which is basically SSD, so solid state, like the flash memory that you have in your phone, or, if you have a very modern laptop, it will probably have some kind of SSD inside. Then the other option is HDD, so the old school spinning mechanical disks. There are slightly different trade-offs in terms of performance and cost, but you can make this choice. With SSD, you have general purpose SSD and provisioned IOPS types. So this is a kind of subcategory when you pick SSD. Then if you go with HDD, you can pick either throughput optimized or cold HDD. Now again, we could deep dive a lot more, but let's just try to keep it high level. We will put a link in the show notes to some pricing examples that are part of the official documentation, to really understand how these different choices will affect pricing. Again, the pricing model, as sometimes happens with AWS, is not very linear. Depending on the choices, the formula changes. So I think the examples are a very good way to figure out some common use cases and how to think about pricing for those specific use cases. Eoin, do you want to try to give us a more complete overview of what the types are and how to think about them?

Eoin: Yeah, it's probably worth starting with the default option. If you don't listen to anything else in this section, it's probably worth listening to the fact that GP3, the general purpose SSD type, is probably the default option. If you're not sure or you can't be bothered learning the rest of them, I would say go with that one. This is the one that measures IOPS with 64 kilobyte I/O units. So when we talk about IOPS, the different types can have different I/O unit sizes. You need to think about two things: the number of I/O operations you might want to perform per second, but also the throughput.

And these things are two different dimensions. They're slightly related, but you need to think about them separately. Now, GP3 is relatively new and it gives you a decent balance between cost and the ability to control IOPS, throughput and size independently. So there's a lot of flexibility there too. That's why I think it's a good default choice. The cost is generally good as well compared to the other options. And by default, the baseline is that you get 3000 IOPS and 125 megabytes a second of throughput. This is different to the older type, which we'll talk about in a second. You get that baseline, which is pretty good in its own right, but then if you need more, you can adjust those levers. That's nice to know, and it's reassuring. I think GP2 was the default for a long time. It uses smaller 16K I/O units, but the baseline is very variable. It depends on your volume size: they give you three IOPS per gigabyte. That makes it a little bit more difficult to calculate, but the unique thing about GP2 is that it also gave you burstable IOPS, so it was pretty good.

That helped if you didn't know what you were doing and you had something that generally didn't need that much I/O but from time to time, like on boot, would need a burst of I/O, or if you had variable workloads. But then again, you could also exhaust that burstable limit, and you needed to be careful to watch out and monitor for that. As a side note, I think it's still a good recommendation to monitor your I/O consumption against your allocation for any type. GP3 is generally cheaper, on the order of 20% in most cases. So that's why we'd say if you're going for general purpose SSD, go with GP3 and you should be pretty safe with that option. There's a calculator to indicate what savings you'll get, and we'll give a link to that in the show notes. You can, by the way, upgrade GP2 volumes to GP3 and you don't need to restart the attached instances, which is pretty cool.
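To show how these numbers interact, here is a rough sketch in Python. The GP3 baseline (3000 IOPS, 125 megabytes a second) and the GP2 rate of three IOPS per gigabyte come from the discussion above; the GP2 floor of 100 IOPS and cap of 16,000 IOPS are assumptions added here from the AWS documentation as best recalled, so check the current docs before relying on them. Units are treated as binary (KiB, MiB) for simplicity.

```python
KIB = 1024
MIB = 1024 * KIB

GP3_BASELINE_IOPS = 3000               # from the episode
GP3_BASELINE_THROUGHPUT = 125 * MIB    # 125 MB/s from the episode, treated as MiB/s


def gp2_baseline_iops(size_gib: int) -> int:
    """GP2 baseline: 3 IOPS per GiB, with an assumed floor of 100 and cap of 16,000."""
    return max(100, min(16_000, 3 * size_gib))


def achievable_throughput(iops: int, io_size_bytes: int, throughput_limit: int) -> int:
    """Bytes per second: whichever of the IOPS budget or the throughput limit binds first."""
    return min(iops * io_size_bytes, throughput_limit)


# A 500 GiB GP2 volume has a lower baseline than any GP3 volume.
gp2_500 = gp2_baseline_iops(500)  # 1500 IOPS

# With small 4 KiB operations, a baseline GP3 volume is limited by IOPS...
small_io = achievable_throughput(GP3_BASELINE_IOPS, 4 * KIB, GP3_BASELINE_THROUGHPUT)
# ...while with 64 KiB operations the throughput limit is reached first.
large_io = achievable_throughput(GP3_BASELINE_IOPS, 64 * KIB, GP3_BASELINE_THROUGHPUT)
```

This is why IOPS and throughput are worth thinking about separately: depending on the I/O size of your workload, either one can be the bottleneck.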

Now, if you have a performance-hungry workload, then you can go for provisioned IOPS. You can imagine that you're running a database, which is the typical example, or you've got something else that's just really read and write heavy in terms of I/O. Then you've got a few options there. They used to be IO1 and IO2. IO1 would give you up to 64,000 IOPS, which was pretty good. IO2 was available up until the end of last year, and it's actually been replaced now by a new one called IO2 Block Express, so IO2 is essentially legacy. Now, IO2 Block Express has a whole bunch of additional features, but basically, compared to IO1, it allows you to get up to as many as 256,000 IOPS. That will depend on the instance size, because it needs specific instance characteristics to support it, but it'll give you four times higher throughput than IO1 as well. Other benefits of IO2 Block Express are higher durability.

So you get five nines instead of three nines, and you get lower latency as well. Again, GP3 is probably your go-to. If you know you need more I/O, you can always move to one of the provisioned IOPS options. It's worth quickly talking about the HDD types, but these are more niche these days. They're generally much, much slower. The I/O size is one megabyte, and it's a completely different physical architecture. They suit sequential access: people who have used hard disks and SSDs in physical machines in the past may know that SSD is good for random access, because you can read at a consistent speed anywhere on the device, but physical disks have a robotic arm that needs to actually move to the location you're reading from. So they're always much better for linear, sequential access. That can be workloads like big data, things like MapReduce or Kafka, where you've got streams that are sequential, and logs. You're only going to get 500 megabytes a second of max throughput, even on throughput optimized ones, and 500 IOPS. Now, those are one-megabyte I/O operations, but you won't use it for a lot of individual read or write operations. Cold HDD, the last one we have to mention, is really slow, 250 megabytes a second, and that's something you'd only use for cold storage, so things you aren't going to read and write often. And that's the cheapest of them all. Luciano, what do you think? How would you decide?

Luciano: Yeah, I think I can give a very high level decision framework. It's a bit of a shortcut that doesn't really take into account all the intricacies of different workloads, but it might be a good reference framework if you either don't want to spend too much time investigating all the possible options, or maybe as a sanity check just to make sure that your investigation makes sense at a high level. So the idea is that if throughput is more important than IOPS, you should probably go for the ST1 HDD type.

If it's really cold storage that you are looking for, then SC1 is the one to go for. Instead, if IOPS is more important, and in my experience, in most cases, that's what it is, right? You care more about IOPS than throughput. In that case, you need an SSD. And if you really need the maximum I/O performance possible, you need to go for IO2 Block Express. Otherwise, choose the best all-rounder, which is GP3. As we said, it's generally the safest default these days. So again, maybe your decision tree starts with GP3, and then you try to look for reasons not to use GP3, and you can look at the other points we mentioned as a way to steer your decision. Now, I think there are also some other interesting features that we should quickly mention before we wrap up. The first one is that with EBS, you can have Multi-Attach.
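Before getting into those extra features, the decision shortcut above can be written down as a small function. This is only a sketch of the rule of thumb from the episode, not an official selection tool; the function and parameter names are made up, and the returned strings are the lowercase volume type identifiers that AWS uses (`io2` standing in here for IO2 Block Express).

```python
def pick_volume_type(
    throughput_over_iops: bool,
    cold_storage: bool = False,
    max_io_performance: bool = False,
) -> str:
    """Rule-of-thumb EBS volume type choice, following the shortcut in the episode."""
    if cold_storage:
        return "sc1"   # cold HDD: cheapest, for rarely accessed data
    if throughput_over_iops:
        return "st1"   # throughput optimized HDD: sequential workloads
    if max_io_performance:
        return "io2"   # provisioned IOPS SSD (IO2 Block Express)
    return "gp3"       # general purpose SSD: the safe default


default_choice = pick_volume_type(throughput_over_iops=False)
database_choice = pick_volume_type(throughput_over_iops=False, max_io_performance=True)
```

Starting from `gp3` and only moving away when one of the flags clearly applies mirrors the "look for reasons not to use GP3" approach.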

That basically means that when you create a volume, that single volume can be attached to multiple EC2 instances. There is a limit, of course, and the limit is 16 EC2 instances, which basically means that you can share data in a volume across 16 different machines. It only works with provisioned IOPS though, so you are basically forced to use IO1 or IO2 if you want to use this particular feature. And of course, it's something that you need to use carefully, because reading and writing to the same disk from different machines might create consistency problems. One way to avoid those consistency problems is to use a file system that is designed for this particular use case, like a clustered file system. So don't just go for the standard Linux ones like ext4 or XFS, because these might not guarantee you any consistency. There are of course other options if you want to do something like this. You are probably thinking about NFS, or EFS on AWS, or maybe FSx, and these are designed for sharing data across many devices. So these are probably more scalable options. Generally speaking, they might be more suitable for this kind of use case, but we thought it was worth mentioning the idea of Multi-Attach. For specific use cases, maybe you have a cluster of a few machines, it might be an easy way to share information across these machines. And the last thing that I want to mention is that there is ECS support, which basically means that you can configure your tasks or services to create EBS volumes when they are launched or deployed, but you cannot attach an existing EBS volume. What you could do is take a snapshot and then create an EBS volume from that snapshot. So again, in the use case where you might have datasets that you have created before and you want to be able to consume those datasets from an ECS cluster, this could be one way of giving your cluster access to that data.
And of course, this works in Fargate too, so that's something to be aware of. And this is everything we have on this jellyfish EBS episode, so hopefully you enjoyed all of that and you found it valuable. But of course, don't forget that if you don't clean up your old volumes and snapshots, you are going to get stung! So thank you very much for being with us and we'll see you in the next episode.