AWS Bites Podcast

8. What are your AWS horror stories?

Published 2021-10-28

In this Halloween-themed episode, Eoin and Luciano tell some AWS horror stories! Get ready for some trick or treat!

Of course, we have to start with billing: we share some of our failures at predicting cost, which ended in nightmarishly bad billing surprises! We also discuss some horror stories from the perspective of AWS developer experience, and finally we touch on some CloudFormation terrors!

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Eoin: Hello and welcome to a special Halloween episode of AWS Bites, the weekly show where we answer questions about AWS in about five minutes. My name is Eoin and today I'm joined by Luciano. Before we get started, be sure to give us a follow and subscribe so you can get notified when the next episode goes live. But today our Halloween-themed question is: what are your AWS horror stories? I guess the scariest thing about AWS has to be billing. Luciano, do you have any billing nightmares that you can tell us about?

Luciano: Of course, that was actually the first thing that came to my mind about AWS horror stories. Specifically, one time I was working on a very new project. It was a startup and we didn't have big funding. Then suddenly we had a bill of something like $7,000, I think. And then I realized that it was actually my fault: by mistake, in Terraform, I had provisioned a number of IOPS on an EBS volume that was way over what we needed. We only realized when we got the bill, and thankfully we had some credit with AWS at the time, so we didn't just bankrupt the company straight away. So there is a little bit of a silver lining in that story. Do you have any stories like that related to billing?

Eoin: Yeah, I have. I've had a few scary stories. I think in general, if that kind of thing happens, you can try to reach out, plead your case and get some mercy from AWS. I have to say I've left EC2 instances, not one or two, but thousands, running over the weekend and only realized on a Monday morning when I saw the billing alerts. But look, you've got to put billing alerts in place and try to react to them as quickly as possible.
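As a rough illustration of the billing alerts mentioned here, the sketch below builds a CloudWatch alarm on the `EstimatedCharges` metric using boto3. It is not something shown in the episode. The alarm name, the threshold and the SNS topic ARN are hypothetical, and it assumes that billing alerts are enabled in the account's billing preferences and that boto3 is installed with valid credentials. Billing metrics are published in `us-east-1`, which is why the client is pinned to that region.

```python
def billing_alarm_params(threshold_usd, sns_topic_arn):
    """Build the keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        # Hypothetical alarm name, chosen only for this example.
        "AlarmName": f"estimated-charges-over-{threshold_usd}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        # The billing metric is only updated a few times per day,
        # so a 6-hour period is enough.
        "Period": 6 * 60 * 60,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        # The SNS topic that receives the notification (assumed to exist).
        "AlarmActions": [sns_topic_arn],
    }


def create_billing_alarm(threshold_usd, sns_topic_arn):
    """Create the alarm in the current account."""
    import boto3  # imported lazily so the helper above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_params(threshold_usd, sns_topic_arn))
```

Calling `create_billing_alarm(100, topic_arn)` would notify the topic once the estimated monthly charges pass $100. Because the metric lags, an alarm like this shortens the reaction time but would not have caught a weekend of runaway instances within minutes.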

I also had a really interesting one. One of the real standout billing nightmares with AWS is the data transfer cost, and another particular one is NAT gateway cost. A managed NAT gateway is pretty expensive. I did have a case where I was working in an enterprise that had a Direct Connect link into their corporate data center, and I was using CDK, specifically one of the ECS patterns for CDK.

So it's a really nice construct that will create all of the infrastructure you need automatically, so you don't have to worry about what's under the hood. Except I didn't pay enough attention to realize that what it was creating under the hood included its own NAT gateway, its own VPC and its own network constructs, as well as the ECS infrastructure, the cluster, SQS queues and all the great stuff, just from having one line of code that creates this nice AWS queue processing service. And because we were doing a lot of S3 traffic, we ended up getting charged quite a significant amount of money. I can't remember exactly what it was, but that was completely not obvious to me at the time. I've definitely learned my lesson, and now I check these things and double-check each time. What else have we got in terms of AWS horror stories? Is there anything not billing related?

Luciano: Yeah, I think in general there could be a lot of other stories regarding developer experience. We all know, for instance, that the web console is not great. Sometimes you struggle to do even things that you expect to be basic features. But recently I struggled a lot with EventBridge, for instance, which is a fairly new service, so I was expecting this problem not to be there. And it's kind of a funny one because the way EventBridge works is that you can trigger events: you have AWS events and you have custom events.

And then the way you hook into those events and react to them is that you have to create rules. But the problem comes when you are not the one publishing an event, for example because it comes from an integration. In order to visualize that event, you need to write a rule. But if you don't know the shape of that message, it's very hard to write a matching rule. So I found myself spending a lot of time in this kind of chicken-and-egg problem, trying to figure out: OK, how am I supposed to write a rule for a message that I don't really know about? I know that there are solutions, but I did end up spending a lot of time trying to solve that particular problem the first time I was using EventBridge. So maybe there is something to be improved there in terms of user experience.
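One commonly used workaround for this chicken-and-egg problem, not necessarily the one Luciano had in mind, is a temporary catch-all rule that forwards every event on a bus to a CloudWatch Logs group so the event shapes can be inspected. The sketch below assumes boto3 is available. The rule name, target id, bus name and log group ARN are hypothetical, and the log group is assumed to already have a resource policy that lets EventBridge write to it. `matches_source_prefix` is only a tiny local stand-in for EventBridge's matching, handling nothing beyond prefix filters on `source`.

```python
import json


def catch_all_pattern():
    """Event pattern matching any event: every string starts with the empty prefix."""
    return {"source": [{"prefix": ""}]}


def matches_source_prefix(pattern, event):
    """Local illustration of how a prefix filter on 'source' is evaluated."""
    if "source" not in event:
        return False
    return any(
        isinstance(f, dict) and "prefix" in f and event["source"].startswith(f["prefix"])
        for f in pattern["source"]
    )


def create_debug_rule(bus_name, log_group_arn):
    """Create a catch-all rule that sends every event on the bus to a log group."""
    import boto3  # imported lazily so the helpers above work without boto3

    events = boto3.client("events")
    events.put_rule(
        Name="debug-catch-all",  # hypothetical rule name
        EventBusName=bus_name,
        EventPattern=json.dumps(catch_all_pattern()),
        State="ENABLED",
    )
    events.put_targets(
        Rule="debug-catch-all",
        EventBusName=bus_name,
        Targets=[{"Id": "debug-logs", "Arn": log_group_arn}],
    )
```

Once a few real events have been logged, the catch-all rule can be deleted and replaced with a narrower pattern written against the fields that actually appear.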

Eoin: Yeah, for developer experience, there are so many horror stories we could probably talk about, and it's so inconsistent across services as well. One of the things I find is that if you are deciding between a third-party service and AWS, and you're trying to weigh up the pros and cons, a lot of people will lean towards AWS because they've already got everything set up in AWS and the IAM permissions are all in place.

But that can bite you when it comes to developer experience. One of the nightmares I've been through a few times is setting up build pipelines. People can assume that CodePipeline and CodeBuild are something analogous to using GitHub Actions or GitLab CI or Jenkins. But in fact, it's a very different beast. You really have to invest a lot in understanding what the constructs are. You can't just create a YAML file with a declarative workflow and get it working.

You have to create and configure individual resources, and every step in your pipeline is an AWS resource. It's powerful in a lot of ways, in that it's well integrated into AWS. But it doesn't have the flexibility of all those other options, unfortunately. You can't dynamically control the workflow. You can't just skip actions in a CodePipeline stage very easily. And you don't get that nice kind of visualization that you get in GitHub or other services, where you can see your pipeline really easily and link it through to your pull request and everything. So I've definitely had a case where I thought, okay, this is going to be easy to set up with CodePipeline. But it ended up really being a horror story and something that burned me badly. So I'm very cautious about recommending CodePipeline as a result.

Luciano: So one last one. Yeah, go ahead. I was thinking about deployments and, well, pipelines. You made me think about deployment. One thing that I'm sure many people have been through is when you have to fix a very urgent bug in production: you are working really hard, you have the changes, you are ready to ship them. And suddenly CloudFormation throws at you, and this is the actual name, UPDATE_ROLLBACK_FAILED, which takes forever to settle. And then you are there just waiting, like: AWS, please do something for me, I really need to ship these changes. I don't know if that ever happened to you, but it happened to me a bunch of times and it has been a horror story every single time.
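For readers who have not hit this state, the sketch below shows one way a stack stuck in UPDATE_ROLLBACK_FAILED might be nudged along with boto3's `continue_update_rollback`. It is an illustration under assumptions, not a procedure described in the episode. The stack name and logical resource ids are hypothetical, and skipping resources can leave the stack's recorded state out of sync with what actually exists, so it is a last resort.

```python
def continue_rollback_args(stack_name, resources_to_skip=()):
    """Build the arguments for CloudFormation's ContinueUpdateRollback call."""
    args = {"StackName": stack_name}
    if resources_to_skip:
        # Logical ids of resources CloudFormation should stop trying to roll back.
        args["ResourcesToSkip"] = list(resources_to_skip)
    return args


def recover_stack(stack_name, resources_to_skip=()):
    """Retry the rollback if the stack is stuck, and return the status seen beforehand."""
    import boto3  # imported lazily so the helper above works without boto3

    cfn = boto3.client("cloudformation")
    status = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]
    if status == "UPDATE_ROLLBACK_FAILED":
        cfn.continue_update_rollback(
            **continue_rollback_args(stack_name, resources_to_skip)
        )
    return status
```

If the retry succeeds, the stack should eventually reach UPDATE_ROLLBACK_COMPLETE, at which point a new update with the urgent fix can be deployed.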

Eoin: Yeah, I think everybody who uses CloudFormation probably has a nightmare where they wake up in a cold sweat dreaming about UPDATE_ROLLBACK_FAILED. It's really so difficult to recover from. Yeah, that's definitely a true horror story. Maybe with that, we've scared people enough and we should let them go have a lie down. But thanks very much for joining us, and do make sure to like and subscribe so that you can find out about the next episode. We'll see you next week.