AWS Bites Podcast

48. Building a File Transfer application on AWS - Live coding PART 1

Published 2022-08-17 - Listen on your favourite podcast player

How can you build a WeTransfer or a Dropbox Transfer clone on AWS?

This is our first live coding stream. In this episode, we started a new challenge: building a product live on AWS!

In this first issue, we managed to implement a very simple MVP using S3, API Gateway, and Lambda.

All our code is available in this repository: github.com/awsbites/weshare.click.

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: Hello, everyone, and welcome to our first live stream of AWS Bites. My name is Luciano, and today I'm joined by Eoin. So hello, everyone. Hello. Yeah, if this is the first time for you hearing or seeing AWS Bites, AWS Bites is basically a podcast about AWS where we try to share all the things that we learn and discover about AWS. And so far we've been doing it in the form of a regular podcast.

We have a website, awsbites.com. You can go there and watch all the previous episodes. And in every episode, we basically try to answer different kinds of questions about AWS. For instance, we've been talking about all the things that you need to know when doing JavaScript with serverless on AWS, or we compared Terraform and CloudFormation. We have a bunch of different topics. And by all means, feel free to let us know which ones you liked the most, which ones you would like to see in the future, and things like that.

But for today, we want to try something a little bit different. We actually want to try to do a little bit of live coding and actually show what could be the experience of building a product on AWS. And the product that we have in mind for today is something that looks like a clone of something like Dropbox Transfer or maybe WeTransfer, where basically the idea is that you need to share a file with somebody else.

Let's find the easiest possible way to do that, or maybe the fastest possible way to do that, which is basically: let's upload it somewhere and then have a URL that is somehow secret, and we can take that URL, share it with the person we want to share the file with, and they can use the same URL to download the file that we just uploaded. So the idea is that basically we are going to build a service like that using AWS services.

So probably we're going to be using things like S3 to store the files. We're probably going to be using signed URLs to be able to upload and download things. And we might be using API Gateway because probably we want to turn that into an API that then we can use from maybe a CLI, maybe a front end. So we have some kind of rough ideas, but it's very important for you to contribute to the conversation and suggest maybe different things that we could be doing. So for today, we're going to be streaming for about one hour, I think, and our goal is to try to get as far as we possibly can to just build the first MVP, where what we expect to have is one Lambda that allows us to upload to S3 and gives us back a URL that can be used to download the file. So yeah, anything else you want to add, Eoin? No, it's really good to be able to get started on this.

Eoin: And yeah, let's do this. Let's see how we get on, how far we get today, and then we can start talking about maybe where we can take it next. Absolutely.

Luciano: Yeah, that makes sense. So I'm going to start by sharing my screen. Actually first of all, I'm going to mention that we already set up a repository, which is the one we are going to be using today, and I'm going to be posting it on YouTube. But if you are following from Twitter or LinkedIn, you can find it at awsbites slash weshare.click, which is the name we selected for this project. I'm actually going to see if I can show it on screen. So let me share my screen here. Okay, this is the repository. So this is the URL, awsbites weshare.click. Right now it's pretty much an empty repository. We only have a little bit of a readme and an architecture diagram that I'm going to try to describe. Cool.

Eoin: Actually, let me show it here, which is a little bit bigger.

Luciano: So basically the idea is that, as we said, we want to have a quick way for Alice and Bob to share one particular file through the internet. So the idea is that we can give Alice basically a web server where she can upload the file and then she will get back a URL that once shared with Bob allows Bob to download that same file. Now, if we zoom in a little bit more on how we are thinking to implement all of that right now, at least for version one, is that we have an API gateway that gives Alice a particular API endpoint.

On this API endpoint, we can do a post, and what this post is going to do is trigger a Lambda, and this Lambda is going to effectively create a pre-signed URL on S3 that is going to allow Alice to upload the file itself. So the first step is that basically with this request, Alice gets an S3 upload pre-signed URL and then she can use that pre-signed URL to actually put the file into the bucket.

But at the same time with the upload URL, we also generate a download URL that is going to be ready to be shared as soon as the upload is completed. So the first step is actually giving back two things, the upload URL and the download URL. So at that point, Alice can upload the file and then just share the download URL with Bob. I think there might be different ways to implement that. This is just one way that we came up with and this is the way we think we are going to go for today. So if you have other ideas, definitely let us know and we are open to discuss alternatives and maybe pros and cons of different alternatives. Yeah, the URLs will not be pretty.

Eoin: Maybe we should pre-warn people, right? These URLs will not be very user-friendly, but don't worry, future episodes will take care of that. Yeah, absolutely.

Luciano: I see we have a comment here in the chat by Abu. Welcome to the stream. I was just discussing this idea with my colleague as a side project and now I'm seeing this. Sorry, we are going to spoil the side project, but I mean, if you had different ways of...

Luciano: If you were thinking about different ways to implement this, by all means chime in and we can chat about that.

Eoin: Yeah, also different features for sure. Yeah. I mean, I can imagine like we're going to build this and eventually we have the name weshare.click. So this is the domain name we're going to use for this deployment, but I can imagine other people would maybe take this and use it as a way to store files for themselves, to share files between different devices and have it as their own personal Dropbox using their own AWS infrastructure, right? And avoid having to go to Google or Dropbox for their file storage. So yeah, there's probably an opportunity here for us to all have our own forks and customized deployments of this application.

Luciano: And I see that there is another comment on YouTube by Italo. Sorry, I cannot put this one on screen, but it's basically asking, should we try to rebuild this thing and then update you on the progress? Absolutely yes. Feel free to redo it, copy paste our code, try different things because again, in AWS, there are millions of ways to build anything and you can use different services. So it will be interesting to also see what other people will think about how to solve this particular problem.

Okay. So let's maybe start to look at the code, and this is a very vanilla repository that has literally nothing in it. So maybe we can start by just creating the structure for the project. So we are going to be using Node.js as our runtime of choice, or JavaScript. So let's start by doing an init of the project. So I just run npm init -y, which is basically just creating a default package.json.

And we can see the result here. It's actually taking some stuff from the repository, which is pretty cool. But other than that, not that special. One thing that we want to do is we want to start to organize this as a mono repo because we eventually want to create an API, the one we just described, but also some ways to interact with this API, probably a CLI application and maybe also front end. So it might be interesting to put all the same things in the same repository just for convenience and actually recent versions of npm make that very easy.

Because we can just create a new folder and call it backend for instance. And then we can just say that that's a new workspace. And actually this is an array, if I remember correctly, and we just need to say backend is one item. Now at this point we can go into backend and do another npm init -y, and this is how we are creating sub-projects inside our monorepo. But also we want to configure a bunch of other things. For instance, we want to add a license. We want to add a gitignore file. And there are some tools that I really like. For instance, I think there is one called gitignore, to which we can say, I think, node.js, and it should give us a default Node.js gitignore. Let's see if my memory is good. It is not. It gave us an empty one. Okay. So let's see if we just say gitignore, if it allows us to select. Okay. Gitignore types. Okay, you're going to have to do slash types. And what do we have? Do we have node? Node with just one word.

Eoin: Okay, fair enough. Cool.

Luciano: Now we've got something that looks a little bit better with a bunch of default stuff. Okay. And another thing we can do is npm, I think it's called license and we can select one license. Or do we want MIT or something else?

Eoin: Sounds good to me. Okay, let's keep it super free.

Luciano: I don't have that yet.

Eoin: I think that's good.

Luciano: And now we have a license. I don't know about other tools.

Eoin: I like it. Cool.

Luciano: So the next thing that we might want to do is ESLint, I guess. Oh, yeah.

Eoin: So we're going to write all this stuff in JavaScript. That's the plan, right Luciano?

Luciano: Yeah. Okay. It's okay to proceed. Now our mileage might vary because they keep changing this command. It's really updated very often. But it's kind of a guided procedure to pick kind of a default or a starting configuration for ESLint. And it's very convenient because of course we don't need to remember by heart all the different options or presets and things like that. So we want to check syntax, find problems and enforce code style. We want to use JavaScript modules or CommonJS. I don't remember if we decided on that.

Eoin: We're using... Yeah, I think when we're trying to prepare for this, we were using ESM modules. Okay.

Luciano: So should be... Let's give it a go.

Eoin: Yeah.

Luciano: Let's try to be very brave and go with ESM.

Eoin: There are some interesting corner cases and edge cases you encounter with them, but maybe that'll be part of the fun.

Luciano: We also have a question by Mauro on YouTube saying, isn't pnpm usually used for monorepos? Honestly, I don't know. I know that pnpm is just a faster version of npm because it does things more in parallel, but I never used it. So I don't know. Feel free to try it and let us know if you find any meaningful difference in terms of how to manage monorepos as well. Okay. So... I'm still trying to figure out how npm workspaces work, especially for serverless projects.

Eoin: And maybe this is something we can deal with when we start adding third party dependencies into different services in the repo and figure out how to package them. Yeah, exactly.

Luciano: I also use it for very simple cases. So there might be edge cases that we haven't discovered yet, but the gist of it should be that if you run an npm install at the top level of the project, it should go inside every workspace and install all the necessary dependencies. So it's convenient that way, that you can just do npm install at the top level. And then it should also give you a way to import from packages in the same monorepo.

So for instance, we could have a utils package and we could be reusing that utils package in frontend, CLI, backend, and things like that. So it gives you a way to do this kind of cross-import and your code is automatically linked. So if you do changes, you don't need to publish those changes as independent libraries to actually use the changes in the rest of the code. And then you have other advantages, like you can run a command in all workspaces. For instance, if we do tests, you could be running npm test in all workspaces and stuff like that. So it's a very, very simple implementation. I think, for instance, if you are used to using Lerna, Lerna has a lot more features, but at the same time, I don't think we need anything fancy right now. Okay. Do we want to use TypeScript or not? I'd say probably no. So I'm going to go with no, but we might revisit that decision at some point later. And where do we want to run our code? For now, just Node. Popular style guide. Now, do we want semicolons or not?

Eoin: I'll go with, I just defer to the popular opinion here. So I'm easy. I always adapt because otherwise it ends up in a big bikeshedding discussion. Okay.

Luciano: Let's see if anyone. Okay. Is it possible to zoom VS Code a little bit? It should be possible. Let me know if this is better. So okay. I'm going to go with standard just because, even though it's not really a standard, I like to pretend it is a standard. Sounds good.

Eoin: Configuration in JavaScript and now yes.

Luciano: npm. Oh, nice. Now you can pick. Okay. Interesting. This is a new option I haven't seen before, but now you can pick your favorite package manager. We are using npm. So let's stick with it. Perfect. And this should have created a default ESLint configuration for us. Now another thing that we should have been doing is setting type to module. Good one. Yeah. Because this one allows us to use ESM everywhere without having to call the files .cjs or .mjs. So every JS file will automatically default to ECMAScript modules, which means that we'll need to rename this one to .cjs, because this is like a special case where it needs to use module.exports. Okay. So do we want to do anything else in terms of bootstrapping the project? Let's see.

Eoin: I think from a JavaScript point of view, that's okay. Next we're onto the more AWS- and serverless-specific parts.

Luciano: So I'm going to do a commit with what we have so far. Okay.

Eoin: That sounds good.

Luciano: So we have created a new ESLint config, gitignore, license, backend, package-lock and packages. Perfect. Okay. Now you should be able to see these changes in the repository. Okay. So I suppose the next step is based on what we saw here.

Eoin: We need to have a way to create all this infrastructure and to start to write the code for our Lambda.

Luciano: And this is also another topic where there are lots of different options, but I suppose one of the most famous ones is the serverless framework. So let's actually show it here. Serverless framework, serverless.com. So this is basically a way to, it's a framework that you can install and use it in your project, but allows you to define all the infrastructure in a YAML file, and then allows you to deploy that infrastructure together with your code. And every time you do changes, behind the scenes, it's actually using cloud formation, so you can actually deploy changes incrementally and you don't have to worry too much about how do I replicate these changes in different environments. You just do deploy and it's just gonna make sure that your infrastructure converges to the final state that you currently have in your code base. I don't know if I'm describing that well enough, but yeah, we're going to see it in action.

Eoin: YAML will tell a thousand words when we get going with our serverless.yaml. All will become clear.

Luciano: Certainly. If I remember correctly. Oh yeah, we also have DRAM saying, no semicolons please. So I'm glad we picked that option. That was the deciding vote there.

Eoin: Okay, so we can bootstrap our serverless project with the generator. That's our first step, is that it? Yeah, remind me the command because I'm not sure I remember.

Luciano: I think there is an npx sls.

Eoin: It seems to be, yeah, npx sls or npx serverless, and then the command is create. And then we want to do, we want to pick a template. So it's, yeah, --template, or I think you can also use a single -t, and aws-nodejs. Like this, right?

Luciano: Yeah, and you're within the root.

Eoin: Yeah. Oh yeah.

Luciano: Actually you want to be in the backend directory, I think for this. So let's copy all of this.

Eoin: Although you can pass a directory, but I don't know what the path is off the top of my head.

Luciano: Let's do the same thing. Okay. Okay. Now you've got a serverless.yaml that is much bigger than you need.

Eoin: So you can start trimming.

Luciano: Let's look at what we have here. Okay. We get a bunch of comments that we're going to remove just to keep things minimal. So the service is being called backend. It's just taking the name of the folder. We could rename that. Probably let's call it weshare.click, or WeShare, or WC. WC sounds a bit funny. Then this is basically saying that we use version three of the serverless framework. There are different versions and the YAML that you write can change a little bit depending on which version you use. So we want to use the latest. Now here it's using node 12 by default. We want to use node 16. Do we need to specify the region here or is it going to infer it from what we have? Oh, we lost Eoin. I think I'm going to specify the region here. We lost Eoin for a second. Eoin is back. I'm back.

Eoin: I back buttoned on the browser.

Luciano: You ejected yourself. Do you remember if this one is mandatory or not?

Eoin: That's a good question. I always specify it because I want to be explicit. I'm guessing it is, but I don't know. Maybe somebody knows the answer to that and they can tell us.

Luciano: I think it's maybe a good default because if different people are contributing to this project and they might have different default regions, probably this is going to be making sure that everyone uses the same region at the end of the day, right? Yeah.

Eoin: And you might make this configurable, but I think it's good to be clear in that. Okay.
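At this stage the trimmed-down serverless.yaml looks roughly like this (a sketch; the service name is approximated from the project, and the region shown is only a placeholder, since the exact region chosen on stream isn't captured in the transcript):

```yaml
service: weshare
frameworkVersion: '3'

provider:
  name: aws
  runtime: nodejs16.x
  region: eu-west-1 # placeholder - use whichever default region you prefer
```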

Luciano: Now here we have an example about how to add permissions. I'm going to ignore all of this because we are going to be using probably a plugin that allows us to do this in a slightly different way. Yeah, let's get rid of that. So actually I'm going to delete all of this. Then we can also define environment variables, which we don't need right now. And we can define which files need to be included or excluded. I don't think we need to do any of that. Right? Yeah.

Eoin: We don't need that right now. If we really want to trim things down later, we can, but generally it's not needed. Okay.

Luciano: So now this is the interesting part. This is very generic. Like what's the name of the service? What's the framework? What is the provider? This is actually interesting because serverless framework can work not just with AWS, but also with other providers like Azure or Google cloud, I believe. So here we are basically configuring more kind of generic project level settings. We want to say, we want to use AWS as a provider.

We want to specifically use Node.js for everything that is like compute, like Lambda functions, and the region is this one. And this is where we start to specify Lambda functions. And by default, there is like a hello function, and we can see that this also created a handler.js. So this is basically saying we want to have a function called hello where the code lives inside handler, which is handler.js. So we need to change this because by default it's using CommonJS, but you can have kind of a starting point for your Lambda function. And I think that's all. Then here we have a bunch of other examples. We have outputs and other resources. Do we want to create, do we want to start from the function or do you think it's best to do something else before?

Eoin: How about starting with the bucket since this is the center of our architecture?

Luciano: That's a good point. Yeah. So let's do that. Maybe we can even deploy the bucket and then start writing the function.

Eoin: Okay.

Luciano: We can probably leave the hello function there and deploy just to show how a function is created. So resources is basically a more generic thing, which behind the scenes is actually going to use CloudFormation. So here we're actually going to be using mostly CloudFormation syntax, while functions is kind of a higher-level idea. It kind of makes it a little bit easier to create CloudFormation resources.

It's way less verbose. It's going to take a bunch of defaults for you. But I think the first time you see this, it might be a little bit confusing because at the end of the day, they are both creating a bunch of CloudFormation resources. So you could be doing everything even without writing this and just writing pure CloudFormation here. But of course, at that point you are kind of losing the benefits of using serverless framework. So we are going to be using functions for everything that regards AWS Lambda functions. And this is going to make our life much easier to do that. But then for things like buckets, we are going to be using resources because I'm not aware if there is an easier way to do that in serverless framework. Yeah.

Eoin: I mean, when you create a bucket, you have one resource that creates a bucket. When you create a function with the serverless framework, you have a function which might have a configured version as well. And then if you have an event trigger, you might have a lot of API gateway resources as well. So sometimes like five lines of YAML in serverless framework will actually generate 200 lines of CloudFormation with 15 or 20 different resources. And maybe we can do that later on. You can run the serverless package command and we can see the generated output of CloudFormation.

Luciano: We can appreciate that we don't have to write it. That's a good idea. Let's do the bucket first. So the idea is that this is the syntax where we can start to specify resources. So you can see that YAML is very nested. And in here it's basically a collection of different resources. And it's actually a key-value pair where every item has a name. For instance, here we want to give it a name, I don't know, FileBucket.

And then here we specify all the properties. And the properties are generally a type. And in this case, the type is, if I remember correctly, something like this. Yeah, auto complete is helping me. And then you generally have properties. And properties, they will be different depending on the type that you are using. In this case, they are properties that make sense in the context of an S3 bucket.

So definitely we're going to have something like bucket name. And here, this is interesting because we can call it some random name for now. But in reality, we really want to make it random. So we will see how to do that in a second. And then there are some kind of best practices that we can use. For instance, generally when you do a bucket, you want to make sure that it is encrypted. So what we can do is do something like bucket encryption. Specify... Oh, I like that. The autocomplete is doing all of this for me. But we can say something like server side encryption by default. Yeah, that's what I wanted to do. And here, AES256. Now if you search online, you are going to find this stuff. This is something that we looked up before. You don't need to remember all these things by heart. But the idea is basically every file that is going to be in this bucket, it's automatically encrypted server side. And it's going to be using this particular algorithm.

Eoin: So it's not using KMS keys, which is the other option.

Luciano: So AWS is kind of managing all the keys for you. You don't need to worry too much about keys. If you want more control, you can use your own keys through KMS. And another thing is that we want to limit public access. So we can do this one. And then there are a bunch of properties that I'm actually checking in another tab. So I'm not going to pretend I remember them by heart. But basically we want to block public ACLs, which means if somebody is trying to make this bucket public by mistake or for whatever reason, this configuration is going to prevent that. Right?

Eoin: Yeah.

Luciano: I never remember. I always copy paste this too.

Eoin: There are four properties, right?

Luciano: And then this is very similar. Ignore public ACL. True. Like this is trying to prevent all the possible ways that this bucket could be made public. Do we need anything else?

Eoin: So I think these are the best. These are the good security practices. You know, we've got config rules that warn us when we create buckets that don't have encryption turned on or that can be made public. The other thing that I have recently started adding into all bucket declarations is to turn on EventBridge notifications, which is a relatively new feature. But it makes sense because that means you can start reacting to objects being added or removed from the bucket using EventBridge. So you don't have to use some of the older methods like CloudTrail or S3 notifications, which were a lot more limited. So it's definitely a good idea, I think, to add in EventBridge notifications. So if you create a bucket in the AWS console, there's a checkbox for this. In CloudFormation, it's under the notification configuration property. Yeah.

Luciano: And I think we have an example for that if we just want to add it there.

Eoin: Yep.

Luciano: I'm just going to copy paste. Should be something like this. For some reason, Visual Studio Code doesn't like this property, but I think it's correct. Because it's new, maybe.

Eoin: Okay.

Luciano: I've seen that Andrea has a question here. So he's asking the only... It's more of a comment, I guess. The only issue I found by putting S3 buckets in the serverless configuration is that when I needed to remove the stack for any reason and redeploy, all the objects and the bucket were deleted. This is a very good point. And I think that there are ways to limit that behavior. For instance, there is definitely a way to make sure that the bucket is not deleted. I think it's called DeletionPolicy. Yeah.

Eoin: This is a CloudFormation property that you can put on lots of different resources, like DynamoDB tables as well. It's not within the bucket properties themselves, but it's actually at the higher level. So it's a sibling of Properties itself.

Luciano: So we'll need to put it here.

Eoin: Yeah. Still doesn't like it, but...

Luciano: Doesn't like it.

Eoin: Okay. I trust you that this is correct.

Luciano: Trust but verify. Yeah. We'll verify what gets generated later.

Eoin: So one issue on that is CloudFormation won't delete objects from your bucket. So if people have seen that in the past, it might be because you're using some tooling that is deleting objects for you. It shouldn't delete your bucket with objects. But what this does is it just makes sure that even if the bucket is empty, it won't be deleted when we delete the stack. Yeah.
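Putting the pieces from the last few minutes together, the bucket resource sketched on stream looks roughly like this (the property names are standard CloudFormation; the bucket name is still a placeholder at this point):

```yaml
resources:
  Resources:
    FileBucket:
      Type: AWS::S3::Bucket
      DeletionPolicy: Retain # sibling of Properties: keep the bucket when the stack is deleted
      Properties:
        BucketName: some-random-name # replaced with a generated name later
        BucketEncryption:
          ServerSideEncryptionConfiguration:
            - ServerSideEncryptionByDefault:
                SSEAlgorithm: AES256
        PublicAccessBlockConfiguration:
          BlockPublicAcls: true
          BlockPublicPolicy: true
          IgnorePublicAcls: true
          RestrictPublicBuckets: true
        NotificationConfiguration:
          EventBridgeConfiguration:
            EventBridgeEnabled: true
```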

Luciano: Okay. So at this point, what we can do is we can see what gets generated, right? So we can do SLS package, if I remember correctly, right?

Eoin: Which is basically...

Luciano: It's not deploying, it's just bundling up everything. And then we can see what would eventually get deployed if we proceed with that package. So it's kind of a preview of what's going to be produced for us. And we can see that there is a new folder here called .serverless. So if we go in there, we have a bunch of CloudFormation templates and this serverless state JSON. Now I think the one that is most interesting is the first one, CreateStack.

Eoin: The CreateStack is the one that creates the bucket that serverless framework is going to use to deploy its own assets. So all of your stuff is going to be in the update stack, because it does this kind of two phase update on the first deployment.

Luciano: And this is because serverless will need to upload certain assets to be able to proceed with the deploy. So this is kind of a bootstrapping thing.

Eoin: It's a bootstrap. Yeah.

Luciano: Okay. So this is the one we are actually interested in, which looks almost identical because we have... Actually this one is the same, right? It's literally serverless deployment bucket.

Eoin: It should have the same bucket because it has to make sure that it keeps that bucket.

Luciano: And now...

Eoin: Then that's also their bucket, but it's just the policy, the bucket policy.

Luciano: This is our FileBucket. Yep. Okay. It is actually adding this DeletionPolicy: Retain. Type, and is that one Properties? Now this some-random-name is something that we definitely need to change. And we'll talk about that in a second, but everything else seems to make sense. Yeah. Okay. And when we do this, when we have a function that will look a little...

Eoin: When we have a more complex function, it will look a lot more interesting.

Luciano: Now why do we need to change this some random name? Because an interesting thing about S3 buckets is that they have to have a unique name across every account and every region. So if we try to use some random name, maybe we get lucky, meaning that nobody else ever used this one. But if somebody just tries to deploy the same stack as it is, they will bump into a conflict with whoever deployed first.

So it's better to actually have a way to generate a random string straight away. Now this is not something that you can do easily with just a CloudFormation itself, but because we are using serverless framework, serverless framework actually gives us ways to interpolate code, let's say, or code generated strings into whatever is going to be the result in CloudFormation. So it's kind of a template language at the same time. It's not just giving us an easier way to create some resources, but it also gives us more abilities in terms of how do we structure the infrastructure as code. How do we write infrastructure as code? We have more functionality in terms of string interpolation and things like that. So if I remember correctly, it is possible to create a JavaScript file that actually executes some logic and then the result of that logic, it can be a string and then we can use that string in our template, in our serverless.yaml, right?

Eoin: Yeah, there's a specific function signature and I'm going to link into the documentation in serverless.com, serverless framework documentation that tells you how to do this. So you can check that too.

Luciano: So we can call this file, I don't know, unique-bucket-name.js. Actually it needs to be CommonJS, so we need to do .cjs, and then basically what we do here is module.exports equals, and it can be an async function. And basically whatever we return is going to be a variable that we can use in our YAML. By the way, I'm using Copilot. Copilot is trying to suggest us something which is not really what we want to do.

Eoin: No, this looks like a Lambda handler. You're writing a Lambda function.

Luciano: But the idea is that we might be doing something like this. We might be doing, I don't know, bucket name and something random. Then we should be able to reference this bucket name. And before we implement all of that, let's try to wire things in and see if it's actually giving us something random in the final, in the final CloudFormation template. So here what we want to do is basically we can use, I think it's like this syntax, right? We need quotes?

Eoin: You don't need quotes for this. You just need to... And it's like file or something like that.

Luciano: Let's see if I remember.

Eoin: It's file, yeah. And then there's no colon actually at this point. So it's file and then parentheses.

Luciano: Right. Okay.

Eoin: Yeah. I can check. I'm verifying all of this in the background. So don't worry. It's not all off the top of my head.

Luciano: And then the path. We call it unique bucket name.cjs. If I can type. And then at that point it gives you back an object. Let's say like it kind of runs that file and gives you back an object. So we can just reference bucket name, which is one of the properties that we export from that file. Yeah.

Eoin: That's it.

Luciano: Now you just need to check the spelling of unique there to match the file name. Now if we repackage all of that. It was very fast. So if we check here, now it is something random. So just to show that I'm not lying, if we change this to something random too, and we run this again, it should give us something different now. Yeah. Something random too. Okay. So now the trick is that we need to generate something random that is consistently random, right?
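For reference, wiring the helper into the template uses the Serverless Framework file variable syntax, roughly like this (the exact file name punctuation is approximated from the stream):

```yaml
Properties:
  BucketName: ${file(./unique-bucket-name.cjs):bucketName}
```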

Eoin: Yeah. Yeah. So a lot of times what you would do here is you don't necessarily have to do this JavaScript module approach. You can just put like an interpolated string using CloudFormation substitutions or serverless variables where you just add in your account ID and the region as a suffix onto your bucket name. But for our application, since the URLs we generate will include the bucket name, we want to be a little bit more protective of our account ID and the name of our bucket. Just make it a little bit more obtuse, I suppose. So we just want to generate something from these variables rather than exposing our account ID in the bucket name itself.

Luciano: So I'm going to copy paste a solution that we developed before, but basically the idea is that we could hash some information that we get for the current account region and use that as a unique key because at that point we are guaranteed that if you try to deploy the same thing in a different account or a different region, you will get a different bucket. But as long as you use the same account in the same region, you always get the same bucket name consistently.

So you don't end up with a different bucket at every deployment, which is basically what we are trying to avoid. On one side, we want something pseudo-random, on the other side it needs to be consistently the same value for the same account and region. So we can use the createHash function from Node.js. And then an interesting thing is that you can get some information here from the serverless framework. So serverless framework is going to run this code by passing certain things into it. And this resolveVariable is a function that allows you to actually retrieve information from the current context. So for instance, we can get the current account like this. Then we could get the region. Yeah, this is exactly what I wanted to do. Thank you, Copilot. And then we can get the stage. aws:stage. Actually region is slightly different. I think we want provider.region. Stage should be...

Eoin: No, I think you could do self:provider.region, or aws:region is a new variable in serverless version 3 that you can use directly. Interesting.

Luciano: Okay. Should we try region then? Yeah, that's good to go.

Eoin: Andrea has another useful comment actually about this particular topic. SLS print to display all the... Oh, that's a good one.

Luciano: Let's see.

Eoin: Yeah, we can try that.

Luciano: It doesn't like something. Stage.

Eoin: Yeah, we have aws:stage. That should be sls:stage because that's serverless-specific. Right.

Luciano: Okay. Now it's actually printing all our... Like a preview of our CloudFormation without having to package. Yeah, that's a good point.

Eoin: Yeah. Thank you. It resolves all the variables. It's not CloudFormation strictly because it still has the high-level functions and stuff. Okay.

Luciano: So at this point we are still returning something random too, but we have information that we can hash. So basically what we can do is say const input equals, and we can do a string like weshare, then account, region, and stage. Thank you, Copilot. And at this point, the bucket name we want to generate is basically... Let's see. What we can do is we still want to retain a prefix, right? So let's say weshare dash plus createHash MD5, update with the input. Yeah, that's exactly what I want. Nice. So this is basically saying take the whole string, hash it using MD5, and then prepend this weshare dash to whatever the hash is. Now at this point, we can just return this bucket name. And if we do this again, we should get something slightly different. weshare-something-something. And because these values are not going to change, we should get the same bucket again. Makes sense? Cool. Should we try to deploy all of this?
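For reference, the finished helper sketched out above looks roughly like this (the file name and variable names are approximated from the stream; resolveVariable is the resolver that Serverless Framework v3 passes into JavaScript variable files):

```javascript
// unique-bucket-name.cjs - must stay CommonJS, hence the .cjs extension
const { createHash } = require('node:crypto')

module.exports = async ({ resolveVariable }) => {
  // Pull account, region and stage from the Serverless Framework context
  const accountId = await resolveVariable('aws:accountId')
  const region = await resolveVariable('aws:region')
  const stage = await resolveVariable('sls:stage')

  // Hash the combination: pseudo-random, but stable for a given
  // account/region/stage, so redeploys reuse the same bucket
  const input = `weshare-${accountId}-${region}-${stage}`
  const bucketName = `weshare-${createHash('md5').update(input).digest('hex')}`

  return { bucketName }
}
```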

Eoin: This makes sense. Yeah, let's deploy. Okay.

Luciano: We have some credentials set up.

Eoin: Do I have credentials? I think I did have credentials in another terminal.

Luciano: Let's see. Here I don't think I have credentials. Here I have my own credentials. So how do we do this? I think I can deploy from here. We can do SLS deploy, right? Actually, let me make sure we have the right credentials. I do. That looks good to me. And now just SLS deploy should be enough, right? We don't need any other option. Yep, that looks good.

Eoin: Okay. This is creating the CloudFormation stack.

Eoin: So this should just create our bucket and that default boilerplate Lambda function we have. So if we go into the AWS console, we should see that these resources are being created.

Luciano: Meanwhile we have a question from Juan Lopez from YouTube saying the need for uniqueness across bucket names in S3 is something that I had to deal with in the past. I would not have expected that S3 bucket names need to be unique across all of S3 and not only your account or region as well, I guess. And I think the main reason for that is that S3 is like one of the oldest services. So probably some of the thinking that happens now when AWS creates a new service didn't happen at the time.

And the other thing is that S3 creates domain names. So in that sense, the name of a bucket is kind of one-to-one with the domain name that gets created. And also some of the rules for creating a bucket name are pretty much the same rules as for a domain name, right? So I think that was the idea at the time and probably AWS is now stuck with that decision. Okay. So everything was deployed. I should be able to go into the account and show you that we have the bucket deployed. Let me bring up another window. Okay. I have it here. Let me make this a little bit bigger. So I am filtering because in this account we have so many more buckets, but you can see the two buckets were created. This is the bucket created by the serverless framework for dealing with the deployments. This is the one that we just created from our resource. Okay. So maybe what I can do now is commit all these changes and then it's over to you, Eoin, for writing some Lambda code. Okay.

Eoin: Actually, status, git add.

Luciano: Should I add the handler for now? Yeah, we will change it. Either way is good.

Eoin: Serverless. Okay. So whenever you are ready, feel free to share your screen.

Luciano: We also have Gil in the chat who is sending us a chicken. Hello, Gil. We also like chickens.

Eoin: Do you need to stop sharing the screen?

Luciano: Yes, I need to switch.

Eoin: Excellent. Let me pull down these latest changes.

Luciano: By the way, Andrea is asking if the episode will be available later. We will definitely post it on YouTube. We are also considering adding it as an audio-only podcast episode. I don't know if it makes sense, but we will probably try that anyway. So yeah, definitely we will make it available later in different ways. Hopefully, we will see you next time, Andrea. Thank you for all the questions and comments.

Eoin: Great tips. Okay. So we're about to set about creating a function. So if we go back to the architecture, maybe it's worth a quick look at that, actually.

Eoin: We mentioned that we're going to create a function here that will create this kind of share that allows you to get hold of an upload URL and a download URL. So, kind of in a RESTful sense, we're going to create a share object, but we don't create anything in a database. We're just going to talk to S3 here. So let's have a look at that. In serverless.yaml, we've got the existing function boilerplate.

So let's try and take this and make something more of it. So let's say we're going to give our function a name. Now, this is just a name that the serverless framework uses to identify our function. So let's just call it create share because we're going to create a share resource. And we want this to be triggered by an API endpoint. So we're going to create an API endpoint. So our handler in our Lambda function code is going to be responding to this event.

So let's call this handle. Let's just call it handle event. We might rename our handler to be a little bit more explicit. Let's call it share handler. And beyond that, we need to start wiring in the HTTP interfaces. So we've got a couple of ways of doing this. And within API gateway, we've got the API gateway rest API, which is the kind of traditional way of doing it. And now you have the kind of simpler and more cost effective way of doing it, which is the HTTP API method. I know these things aren't particularly well named, but HTTP API is pretty simple. So let's go with that one. So we're saying that this function is going to be triggered by a number of events because this is an array here we're creating. So this function can be triggered by more than one. We're just going to restrict it to HTTP API events. So for that, we just need to give it the method. And since we're creating this share resource, then this will be a HTTP post and we can give it a path.

Luciano: Maybe one thing that is worth clarifying for people that never used Lambda before is that Lambda is not like something that is always running, but it's just a function that is automatically triggered by AWS when a specific event happens. So what we are doing here is basically telling AWS the event that we want to use to trigger our Lambda is an HTTP event in particular, like a post request to a particular path. Yeah.

Eoin: Yeah. This is definitely worth stating. We can give this a path and what we can do here is just give it the root path. And why we do this at the root rather than creating like a specific path, like a share resource here, will become more apparent when we look at introducing domain names and APIs later, because we can create all of the functions and APIs relating to this type of resource all within this serverless project.

And then we can map it to a domain name with a path so we can actually apply the path later. And this makes it much easier to do that. So that's our handler code. This is a handler module we're going to have to write and this is the event and that's it. We will need to add some permissions, but maybe we can come back to that because it's maybe a little bit clearer to just write the handler code that responds to the event first.
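The function definition being described ends up looking roughly like this in serverless.yaml (handler file and function names as chosen on stream):

```yaml
functions:
  createShare:
    handler: shareHandler.handleEvent
    events:
      - httpApi:
          method: POST
          path: /
```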

So let's start creating that handler now. So since we renamed it, handler.js is not going to work anymore. So let's just rename this to shareHandler. I must remember to use Node.js syntax rather than Python. Okay. So let's just rename this one. Okay. So we've got all sorts of errors here, like semicolons and CommonJS modules. So let's just clear this out and go from scratch. So the syntax of our handler is we're going to create a function.

This will be an async function with the syntax for a Lambda function, which is that it takes an event and it also takes context. We will probably not use context in fact, so we could omit it completely, but let's just leave it in there for clarity maybe for now. And within this handler, we're going to do everything we need to create the get URL and the upload URL. So maybe we can just think about the syntax of this.

This is an arrow function, so I'll need an arrow. So we've got a few steps, right? So what are the things we need to do in order to interact with S3 and generate an upload URL and a retrieval URL? So we're using the concept of S3 presigned URLs, which are a really useful feature. And the beauty of that is that it allows us to offload all of the scalability for retrieving and uploading large files to S3 completely. So none of that file data ever has to actually go through any of the systems that we're building here. And that's really the goal, because S3 is way better at handling the throughput and scalability that would be required if the system was to scale.

Luciano: Yeah, I guess another advantage to that is that signed URLs allow us to define certain like boundaries for which the file can be uploaded or downloaded. Now I think for the first version, we're not going to worry too much, but in the future, we might use that for instance, to limit the time that the file is going to be available, just as an example, but you can put other like boundaries and they are just built in. You just need to configure specific properties. You don't need to implement additional codes for that.

Eoin: Yeah. So just referring back to the diagram, then we're creating this function here and we've already declared everything we need to do for the API gateway endpoint. We've declared the post. We're just using the root path at the moment. And the next thing we need to do is then create presigned URLs. So maybe let's just pseudocode this out or comment about the steps we need to do. So when we think about it, if we want people to be able to upload an object, we need to have some sort of identifier for this file or object.

So we should probably first create a key or a file or a file name. Then we'd want to create an upload URL and then we create the download URL. And finally we'll return something like an object, right? So this is an API gateway Lambda proxy, which means that API gateway is proxying to the Lambda service internally within AWS. So there's a specific contract that you'll have to obey here. That means when you return your response, it should have an HTTP status code.

And if you want to return a body, it should also have a body and you can also return HTTP response headers. So in the status code, I guess what we want is a 201, right? Because we're creating something and that would indicate we've successfully created something and then we're going to create a body. So this will be something with a download URL and upload URL. So let's go through this. So we need to start using the AWS SDK because we're going to interact with S3. So let's set about that. In order to get this to work, we'll need to install a few modules. So Luciano, is it a good time to talk about the AWS SDK v3?

Luciano: I think we covered it on the podcast actually a couple of months back, but it works quite a different way to the one we're used to, the AWS SDK version two.

Eoin: Yeah, I think it's interesting to show how it works. Okay. Okay, good. So these are the two modules that we researched and know we need to use in order to interact with S3. So with the AWS SDK v3, everything is a separate module, which allows you to have very small bundles of modules when you deploy. So we actually have two that we can use here. We have the S3 one and we have the S3 request presigner one, which is a separate module just for doing presigning. And I'm installing those as dev dependencies actually. Is that a good idea Luciano? Or should we be installing these as top level dependencies?

Luciano: I would probably consider this top level dependencies because they need to be available when we run our code in production, right? It's not just something we use for building or for testing.

Eoin: And Lambda does provide the AWS SDK in the runtime, but if you want to be sure that you're pinning to a specific version that you've tested with, this is a good practice. Okay. So now that we've got that, let's set about importing these and getting to use our S3 client. Okay. So there's, the first thing we need to do is get our S3 client. So the syntax for that is S3 client. There's a couple of ways you can use the AWS SDK version three, one of which is very similar to SDK version two.

But the new kind of idiomatic way to do it with the version three client is with the command pattern. So if you want to be able to create, get an object from S3 or put an object, then you're basically sending a command to get object command or put object command to the S3 service. So let's import the classes we need to do that from the S3 client module. And even though we're not doing the upload or the download in this handler, we're generating a presigned URL, but the presigned URL needs to know what is the command that this URL will ultimately fulfill.

So that's the pattern we're following here. So let's just have a quick look. Maybe we'll do the download URL first. So maybe what we can do is just have a look at the syntax of the get object command. So let's say we create this command, get object command. So we're creating a new instance of this get object command that needs the properties that get object command accepts, which is going to be a bucket.

So we need to figure out what our bucket name is. I'm going to suggest that we take that from an environment variable, and then we're going to need a key. So the key is the path to this file. So we're missing a couple of things here. We've got red lines all over the place. So what we need to do is make sure we import the environment variable so we can take that from process.env. And the key is something we can generate. So let's figure out how we would generate that. I think it makes sense to use like a UUID, like you have version 4 UUID for that. And there's a new, you don't need to install a third party dependency for that anymore in Node.js, do you? Yeah, I think since node 14 or 16, I'm not sure.

Luciano: But yeah, you have now built in functionality in the crypto module. Excellent.

Eoin: Can I ask you a question, Luciano, because I don't know the answer to this. I saw that I had two autocomplete options here. One was crypto and one was node:crypto. And I've seen people use both, but I'm not sure what the difference is.

Luciano: To be honest. Can you clarify? Yeah, node:crypto is the recommended way, I would say, right now, because the idea is that the module resolution algorithm, the way that it resolves a package, is by giving precedence to whatever you have in your node_modules. So for instance, if you install a third-party module called crypto, then you end up importing something that is not the Node core crypto. So by doing node:crypto, you are kind of explicitly saying, I want to use the Node.js one. In reality, this is not a very common problem, but because it has been a problem and it would be very hard to debug otherwise, I think this is why we now have this new best practice where every time you are importing a Node core module, it's better to prefix it with node:. Okay.

Eoin: Yeah, that makes sense. I guess there could be a vulnerability if you have a spelling mistake in here, right? Yeah.

Eoin: Because then somebody could have published some sort of supply chain attack thing to NPM. Okay. That's really good to know.

Luciano: Yeah. If you end up packaging a folder called crypto inside your node modules, whatever you have there is going to take precedence. Yeah.
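In code, the difference being discussed is just the import specifier; the node: prefix guarantees the built-in module is used:

```javascript
// Always resolves to the Node.js core module, never a third-party
// package that happens to be named "crypto" in node_modules
import { randomUUID } from 'node:crypto'

// vs. the bare specifier, which module resolution could shadow:
// import { randomUUID } from 'crypto'
```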

Eoin: Yeah. Okay. So let's, let's generate a UUID and from that we can create a key. Now we could just make the key be the ID, but if we think about lots of uploads of hundreds, thousands, millions of files over time into the same bucket, it might not be a great user experience when you open the S3 console and you see everything in one prefix. In S3, forward slashes don't really mean anything; there's no such thing as paths.

They're more of a user-friendly way of browsing through files. The S3 console will use slashes in the same way as you would see in a traditional file browser. So maybe it's a good idea. It's also, you know, there are cases if you have extremely high throughput on your bucket that S3 will try to automatically partition the bucket based on prefixes. So it is good to make sure that the start of your keys have kind of an even distribution, and that helps S3 to automatically partition the bucket for you so that you can get a throughput allocation per partition. So let's put everything into a shares prefix, but we'll use like the first two characters of our UUID to just give us some sort of sorting or categorization.

Luciano: Remember this used to be very, very common when doing similar things with like network drives. I think at the time there was also some, maybe depending on the file system, some performance benefit to that.

Luciano: But yeah, I think as a user, as you described the user experience, it makes a lot of sense to do the same thing here as well, because I mean, if we, if we ever need to debug something where we know the UUID and we want to go in the S3 console to see the file, we know we need to click twice to see that file rather than scrolling across potentially millions of items. Yeah.

Eoin: Yeah. Okay. So we've got a key that allows us to create the get command. And from that get command, we can create a URL. So this will allow us to create a retrieval URL for our users to use. So what we can do for that is start using the presigned URL module that we already added into our node modules. It's a separate module, so we need to import it separately.

So this is the AWS SDK S3 request presigner and the function we're going to use is called getSignedUrl. These are asynchronous functions. They're going to return a promise. So we await getSignedUrl and we need to pass in the command that we created. So the function, you can see the function signature here, is we need an S3 client and we need a command. So let's pass in the S3 client, our get command and our properties.

So the properties that are kind of important here are the expiry. So how long is this temporary presigned URL going to last for? You want to make sure that it's long enough that people get to upload their content by the time you send it to them, but not so long that maybe somebody could grab it and intervene. So maybe we'll just create a constant at the top of our file that gives us some sort of default expiration.

And I think maybe 24 hours seems like a good value to start with. The value is in seconds. So that will allow us to have URLs that last for a day. Okay. So we don't have an S3 client. We can create one. And that's fairly straightforward. We can just do that outside the handler because it can be reused for multiple Lambda function invocations. So all of the code that's outside the handler is going to get evaluated when the function is loaded for the first time, in the cold start phase of your Lambda. Everything within the handler is going to be evaluated every time an event comes in. Okay. So now we've got a retrieval URL and the process for an upload URL is going to be very similar. So much so that I'm just going to copy paste and change everything from get to put. So we'll need a put command, which will use the same bucket and key because we have to put it to that key before we get it. And let's change the name of this to upload URL. And that will use the put command. Okay. So I think now we have everything we need to give our users.

Luciano: I think it's also interesting that you can specify two different expiries for the upload and download. Maybe you would want in real life the upload window to be very small while the download can be even a week, I suppose. Right.

Eoin: That's a good idea. Yeah. But for now it makes sense to keep it the same because we can optimize it later.

Luciano: Okay.

Eoin: Now, since this is our MVP and we don't have a command line interface, we don't have a web interface, we're giving our poor users two really ugly URLs. It probably makes sense that we don't return any JSON here. We can just return some instructions and the two URLs that they can use. So we know that they could use something like curl as a command line interface to upload and download their files. So maybe we can just give those instructions in the output. So the upload would be curl, and with curl we can do -X PUT, because this is a put command to upload. So we need to specify the PUT HTTP method, and then they can specify the file name and the upload URL. So that's the upload instruction for the user. And we can say download with curl and the download URL. I didn't call it download URL, did I? I called it retrieval URL. Yeah. Good catch.

Luciano: Okay. So I think we have a function, so we can export that. Yeah, we need to export it.

Eoin: Okay. Yeah, I generally prefer to just say export function at the top and not even use an arrow function. I'm thinking ahead to one of the ideas we have to improve this, which is to use Middy, and in that case we'll do it in two separate steps. But okay, this is good, and there are a couple of things we added here. We added some interactions with S3, which means we're going to need permissions, and we also added an environment variable usage. So before this is going to work, we need to go back to our serverless.yaml and make some changes.

Luciano: Also, before we forget, I think we also need to specify "type": "module" inside the package.json of the backend, because I think that is the one AWS will see. AWS is not going to see the top-level one we created. Okay.

Eoin: Good call. So it looks like that. Is that correct?

Luciano: I think so, yeah.
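For reference, the backend package.json they are describing would contain roughly this; the package name is an assumption and other fields are omitted:

```json
{
  "name": "weshare-backend",
  "type": "module"
}
```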

Eoin: Okay. So let's add some environment variables. We could do this per function, but since everything is going to be centred around this bucket, we can do it globally so it's applied to all functions. We want to say that every function will receive an environment variable called bucket name. In order to pick up that bucket name, it has to use the bucket that Luciano created with that clever unique bucket name, but we can also use the CloudFormation syntax to retrieve the name of the bucket. The shorthand for that looks like this: Ref FileBucket. If you look in the CloudFormation documentation for AWS::S3::Bucket, it will tell you that every CloudFormation resource outputs a reference, and for buckets that reference is the bucket name. It's not consistent across all the different resources, sometimes it's an ARN or something else, but for buckets we know this is the way to get the bucket name. So that should work nicely. I think for buckets it kind of makes sense, because the name is guaranteed to be unique anyway.

Luciano: An ARN is always unique, and a bucket name is always unique as well. Again, it's not consistent across resources, but if you think about uniqueness, it makes sense.
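A sketch of that provider-level environment variable in serverless.yaml; the logical resource name FileBucket and the variable name BUCKET_NAME are assumptions based on the discussion:

```yaml
provider:
  name: aws
  runtime: nodejs16.x
  environment:
    BUCKET_NAME: !Ref FileBucket  # for an AWS::S3::Bucket, Ref returns the bucket name
```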

Eoin: Yeah. Okay. Now, when we are adding IAM permissions, the way I like to do it is using the IAM roles per function serverless plugin. So let's add this plugin into our configuration here. There is a serverless-native way to do that, but I'd like to just do it explicitly with npm. So we'll do npm install, and this is a development dependency, and the name of the plugin is serverless-iam-roles-per-function.

And this allows you to kind of honor the principle of least privilege by having a separate set of IAM policy statements for every single function. So at the top of our serverless.yaml, then we need to declare our plugins array, and we just add in the module we've installed. So this is a plugin that's going to get hooked into the lifecycle when we run serverless package or serverless deploy. And it will pick up the IAM policy statements for each individual function.
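The plugin is installed as a development dependency, for example with npm install --save-dev serverless-iam-roles-per-function, and then declared in serverless.yaml:

```yaml
plugins:
  - serverless-iam-roles-per-function
```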

So let's start writing this then. The syntax is to declare IAM role statements inside the function instead of at the provider level, and then we need to start creating some statements for an IAM role for this function. This is the policy that the function will run with: every time an event comes in through the API, the code within that Lambda function will run within this role.

We need to allow the permissions behind the signed URLs: because we're using the get object command and the put object command, we need permissions to do those things. So the actions we want are s3:GetObject and s3:PutObject. And we'll try to be as specific as we possibly can for this MVP, so the resource identifier should be the ARN of the bucket with the prefix attached to it.

So it's going to be something like arn:aws:s3. You don't need to specify the account or the region because, as Luciano said, bucket names are globally unique. Then we need the bucket name here, so that's a placeholder for now, and then the path, which is shares. We said everything will go under shares/, and after that it's pretty much random, so we need a wildcard. And we can also make use of another CloudFormation intrinsic function with the shorthand syntax, Sub. This is basically telling CloudFormation to substitute variables inside that string with some resolved value. So instead of the bucket name placeholder we can just put in FileBucket, and CloudFormation is going to take the reference of that FileBucket, which is the bucket name, and pop it in here in its place. So that should be enough. Yep. One thing that is worth mentioning, I don't know if we made that very clear.

Luciano: And again, this is more for people who are trying AWS for the first time: in AWS everything is denied by default, so you cannot do anything until you grant permission. If we were not adding this policy, what would happen at runtime is that our Lambda would fail as soon as it tried to do a get object or put object operation, because it doesn't have permission to do that. Now that we've added this policy, the Lambda is going to run in a context where we've authorised the get object and put object actions only on those specific resources that match this particular expression: everything inside our bucket whose key starts with shares/, and then whatever the file name is. Yeah.
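Putting those statements together, the per-function policy described here would look roughly like this in serverless.yaml. The function name createShare, the handler name and the FileBucket logical name are assumptions, and depending on your framework version you may need to adjust how the ${} placeholder is written so the serverless variables engine leaves it for CloudFormation:

```yaml
functions:
  createShare:                       # assumed function name
    handler: handler.handleEvent     # assumed handler reference
    iamRoleStatements:
      - Effect: Allow
        Action:
          - s3:GetObject
          - s3:PutObject
        Resource: !Sub 'arn:aws:s3:::${FileBucket}/shares/*'
```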

Eoin: Yeah, that's a really good point. And the other action we'll need is, well, we don't need it actually. I was just thinking you could also add a list bucket permission. Whether we do it or not doesn't make a massive difference for this application, but you could also allow the function to list the bucket. Why would a Lambda function that only gets and puts objects need permission to list the bucket? For that we just use the bucket ARN directly, like this, and the advantage is that if you try to get an object that doesn't exist, you get a 404 instead of a 403 error, a not found response instead of a permissions error. Because if you don't have permission to list the bucket, S3 won't tell you whether the object can't be retrieved because it doesn't exist or because you don't have permission to read it. And I think that's everything we need. Luciano, have I missed anything? Are we ready to give this a run and try and deploy it?
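If you did add it, the extra statement, appended to the list in the sketch above, would look roughly like this; FileBucket is again an assumed logical name, and !GetAtt is used here to get the bucket ARN without the /shares/* suffix:

```yaml
      # On the bucket itself rather than the objects, so a missing object
      # comes back as 404 rather than 403.
      - Effect: Allow
        Action:
          - s3:ListBucket
        Resource: !GetAtt FileBucket.Arn
```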

Luciano: I think we can try deploying it and see if it works. Okay.

Eoin: Exciting times. So let's do this. I'm going to run serverless deploy within the backend folder. Actually, with a little trepidation, I'm just going to try serverless package first. And we already have a validation error, because it says I put in the word event. I think what I need there is events, plural.

Luciano: It's good that we get this kind of validation from serverless. This would have been tricky to debug otherwise. Yeah.
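For reference, the corrected wiring in serverless.yaml looks roughly like this; the key has to be the plural events, and the function name, method and path shown here are assumptions rather than the exact values typed on stream:

```yaml
functions:
  createShare:                   # assumed function name
    handler: handler.handleEvent
    events:                      # must be plural
      - httpApi:
          method: POST
          path: /share           # assumed path
```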

Eoin: And it's pretty new; this validation has only been around for the last year or so, I think, but it's really good. Okay. So that's everything that has been packaged. Let's have a quick look at our generated CloudFormation. Now we can see that the serverless framework is starting to really take over and do a lot of work here, like creating a CloudWatch Logs log group for us. It's creating our role, so let's have a look at that. The role can be assumed by Lambda, which is important, and it has permissions to create logs.

Luciano: One thing that I would like to mention there is that this is where we see the advantage of the serverless framework, because if we were doing the same thing with something like, I don't know, Terraform or CloudFormation directly, you don't get anything for free. You really need to know: okay, to create a Lambda I need to create a role, then I need to create a log group, which are things you do all the time because they are required for the Lambda to run. With the serverless framework we get all this stuff created for us for free, using best practices, rather than having to copy and paste it every time. Yeah.

Eoin: That's a really good point. Yeah. While we're on this topic, actually, one of the points that Juan has mentioned on the YouTube chat is what is the best way to test locally?

Luciano: And it's probably almost something that could be an episode in its own right.

Eoin: You can definitely test these, and maybe we can cover this topic very briefly. You need to write unit tests anyway to test your handlers and the code within those handlers; we've done it in a fairly simple way because we've got a very simple function today. But with AWS SDK version 3 you can also use mocks. And because we're using the serverless framework, you can use local simulation of API Gateway and Lambda to test things a little bit more end to end before you deploy. But once you've unit tested your code, you want to get it into AWS and test on AWS as quickly as possible, and then start doing things like integration tests and end-to-end tests for a real-world application. The local simulations can be very useful from time to time, but there's always a point at which there are limitations you can't overcome. Yeah.

Luciano: For instance, here you start to bump into these kinds of philosophical questions: okay, if I run things locally, should I test against the real S3 or should I simulate S3 locally as well? There are ways to simulate S3 locally, but as Eoin said, the degree of fidelity might vary depending on what kind of features you are going to use. For simple use cases maybe everything works as expected, but as soon as you start to use more advanced features you might start to bump into discrepancies, and you might have this kind of false sense of security because everything works locally.

Then you test it remotely and you bump into issues. Another thing that is very interesting is permissions. When you test things locally, you end up using the permissions that you have as a user most of the time, depending on which tool you use, but most of the time the tools will use your own local credentials. And generally you have very broad credentials, like an admin, right? Because when you are deploying, you need a large set of permissions.

So you simulate your code with this large set of credentials and you don't realise that you haven't specified your policies correctly. That's another case I've seen a lot: people test locally, everything works, then they deploy, run it for the first time, hit a permission error and have to revisit their permissions. So I think it's still a very open debate whether you should try everything you can to test locally or whether you should go as fast as possible to a real AWS environment and test it there. Usually there is some kind of middle ground that gives you benefits, but for this simple use case I think it's just easier to deploy remotely and see if it works that way. Okay. Yeah. Feel free to disagree with our opinion if you know better ways of testing this locally. Yeah, of course.
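As one concrete example of the SDK v3 mocking Eoin mentions, here is a minimal sketch using the third-party aws-sdk-client-mock library with a Jest-style test runner; neither tool was named on stream, so treat both choices as assumptions:

```js
import { mockClient } from 'aws-sdk-client-mock'
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

// All S3Client instances created after this point have their send() intercepted
const s3Mock = mockClient(S3Client)

beforeEach(() => {
  s3Mock.reset()
})

test('uploads an object to the shares prefix', async () => {
  // Any PutObjectCommand sent through an S3Client resolves with this response
  s3Mock.on(PutObjectCommand).resolves({ ETag: '"abc123"' })

  const client = new S3Client({})
  const result = await client.send(
    new PutObjectCommand({ Bucket: 'test-bucket', Key: 'shares/some-id', Body: 'hello' })
  )

  expect(result.ETag).toBe('"abc123"')
  expect(s3Mock.commandCalls(PutObjectCommand)).toHaveLength(1)
})
```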

Eoin: Yeah, so in the generated CloudFormation we can see we've got the API Gateway resources, including the integration and routes. And then we can see the specific policy that's been generated. This is the one generated by the IAM roles per function plugin, and as well as permission to create logs, it's got get object and put object on this bucket. So let's go ahead and try to deploy this, and we have some time in case we've made any further typos or mistakes in the code. Okay. So this is deploying with CloudFormation now. Let's flick over to the CloudFormation console, where we can see the stack. This is from the time Luciano created it previously. We should see that it's doing an update.

Luciano: I don't see it update yet.

Eoin: It says it's creating the CloudFormation stack, which is slightly concerning. I would have expected it to do an update.

Luciano: Hopefully you are not deploying to another account. Let's see. It could be.

Eoin: I don't think so, but let me check here, because I think... yeah, this is the right account here. I'm just going to pause this other one here and check if that was the same account. Yeah, it's the same account.

Luciano: Maybe a different region, but that shouldn't be the case because we hardcoded it in our serverless framework config. It shouldn't be the case.

Eoin: So the account is the same. What did we change? Did I change the name of the stack or something inadvertently? But I don't see any other stack being deployed here either.

Luciano: Are we using the same account? Good question.

Eoin: Let me check. We are, because this is definitely from your deployment, right? Yes.

Luciano: This is the right time.

Eoin: Let me give it a full browser refresh for good luck.

Luciano: This is where we show the joys of debugging on AWS. Wow. Well, it has deployed somewhere.

Eoin: Okay, so let me check this identity again. I mean, it makes sense; this is the right account. Yeah, this looks good. So the question is, what's the region? I'm actually using credentials that have region variables here: we've got AWS_REGION and AWS_DEFAULT_REGION, just for good measure. And we have the region defined in the serverless stack, so that's eu-west-1, and I haven't changed the service name, which is weshare-backend. So the deployed stack name should indeed be weshare-backend-dev.

Luciano: This is really interesting.

Eoin: I'm going to give it one more go, right? So what I'm going to do is create a new session in my terminal and set up some new credentials. I'll stop screen sharing to avoid the risk of inadvertently leaking any credentials. Ah, I can already see what the problem is here, looking at my credentials. Do you want to try and guess what it is? Maybe someone can guess what it is.

Luciano: Did we deploy to the wrong account?

Eoin: I deployed to the wrong account just now. So we're using AWS SSO, right? We use SSO to get credentials for our accounts, and the serverless framework doesn't properly support SSO credentials yet. There is an issue about it on their GitHub, and unfortunately the latest response I saw on that issue is that they're not going to fix it anytime soon. So one of the workarounds is to use some kind of tool that takes your SSO credentials and converts them into credentials on your file system or environment variables.

And I'm using the approach that uses environment variables, and I can actually show you what that is. I think this will work a lot better. I'm just going to share my screen again, because I think we're past the point of worrying about leaking credentials. So this is the command I use. Ben Kehoe from iRobot wrote a couple of really useful utilities around SSO, and one of them is aws-export-credentials.

And then I can specify my SSO profile and it will convert this into a set of environment variables that I can use. So when I called get-caller-identity previously, it was using my SSO credentials, but I also had an AWS_PROFILE environment variable set. So I actually had environment variables for two different profiles set up. I don't know how I managed that; they were both my accounts. But I think what the serverless framework did was pick up AWS_PROFILE and use that instead of the other variables. Let's go back to our deployment.

Luciano: By the way, if you're curious about these topics, we have an entire episode dedicated to credentials. And I'm going to post the link here in the YouTube chat.

Eoin: Now we can see your stack is being updated, Luciano. We can actually see the events occur: it's creating the HTTP API resources, it's creating the log group, and it'll be creating the function as well. So at the end of this, we'll be able to take a file, upload it to S3, and I'll be able to share it with you, Luciano, or anyone else, and they'll be able to download it. Already this is an MVP.

You've got ugly URLs, but already this is useful. If you want to transfer a file from one laptop to another, or from your mobile to your laptop and vice versa, you could use this service as it is, as your own personal file transfer tool, or to share files with your friends. Now, there are a couple of restrictions, right? We've got some security issues here, because anyone who gets that URL, and in fact anyone who gets the API Gateway URL, can start creating upload URLs and putting files onto our bucket, which isn't great. They could certainly do a denial of wallet attack if they wanted to, by continually uploading large objects and also retrieving them, because that's where you really pay: the data transfer costs out of AWS. So the next thing we're going to do is start figuring out how to protect that and lock it down a little bit more. Okay. So this has been deployed, and now you've got an upload URL, or sorry, we've got an API URL. So we can start invoking our POST URI, creating a share and seeing what response we get back. Let's clear the screen for this. So let's make a POST to our API endpoint. Aha. What went wrong?

Luciano: Okay, let's give it a go.

Eoin: Okay. So this is our log group. We've got a log stream here, and it says it can't find the package @aws-sdk/client-s3, and in fact, if you look at the function code, there is no node_modules.

Luciano: I'm not really sure why that's the case. Do we need to specify something in serverless framework v3 to include dependencies?

Eoin: It's a good question. What we've got here is, I guess if we look, we've got node_modules here. I'm just thinking about npm workspaces; this has bitten me before. If we look at the parent node_modules, I guess we would find the AWS SDK. Oh, client-s3 is in there. So I'm wondering if it's failing to pick that up, but we checked in advance that this would work and we didn't come across this problem. So I'm just wondering now. Can you try doing npm install inside the backend folder?

Luciano: Yeah.

Eoin: I don't think it's going to change anything, but I'm happy to be wrong. So we still don't have a node modules here.

Luciano: Did you try NPM install already?

Eoin: I did. Yeah. Okay.

Luciano: I didn't see that command.

Eoin: Yeah.

Luciano: That's interesting.

Eoin: So would it be okay if I just disable the workspaces, since we've only got a single workspace, and see if this goes away? Maybe this is a bit of homework we can do for next time, to figure out how to make it work with workspaces.

Luciano: Maybe there is a plugin for serverless framework to basically use the workspace definition. Yeah.

Eoin: Yeah. That would be good. So, okay. Let's try that. So I think what we have to do then is edit the root package.json and remove this basically. Remove the workspaces property. Okay. So if I do that and I go back to the shell and I do NPM install now within backend, does it realize that it has its own world? It does. Okay. So now I've got node modules and let's just check that I've got AWS. Maybe my problem is here. Oh, maybe we added them at the top level.

Luciano: Interesting.

Eoin: This is a known problem, not a Node problem. Okay. So let's put workspaces back; I added the dependencies at the top level instead of in the backend. So let's go back to npm install and add them into backend. I'm guessing, yeah, that's gone now because we've got workspaces back. If I go back to the root, yeah, I added them in here.

Luciano: So we could probably remove them from there.

Eoin: Yep. Which is NPM uninstall, right?

Luciano: Or RM, I think as well.

Eoin: RM. Okay. They're gone from there, and they're in here now as direct dependencies. So, the AWS SDK version 3. The other learning I'm getting here is that the version 3 SDK is not bundled with the Lambda runtime, even with the latest Node.js 16 runtime.

Luciano: As far as I remember, there is a discrepancy depending on whether you use CommonJS or ESM modules. I think if you use CommonJS you might have some version of the SDK available there, but your mileage may vary, because it's not guaranteed to be the latest version or any specific version: you get whatever version the Lambda runtime currently ships. So it's always best to package your own dependencies; at least you are guaranteed to get whatever you are requesting, right?

Eoin: Yeah, I agree. It's just something you should be more conscious of now than we used to be with version 2. Okay. What we can also do is run npx sls package, and inside the .serverless directory you have the full backend package. Now, the file size looks a little small, so let's have a look at the contents of this zip.

Luciano: I'm actually looking at the files uploaded in the Lambda, and we have a node_modules folder, but it's empty. So that's interesting.

Eoin: I think we're still going to have some problems there. Okay. Can I go back to removing workspaces? I think so, yeah. Okay. All right. So let's do another install npm install.

Luciano: I like that it works even with a typo. With a typo.

Eoin: Yeah. Any substring, any prefix will do. Okay. So in node_modules now it looks like we have something a bit more expected. Let's just run the package command again, because that's really quick. Okay. And we'll do our unzip again. Ah yeah, now we've got a lot more going on. All right, I'm getting more confident that this is going to work.

Luciano: I also like that we did everything we could to avoid bringing in that uuid dependency, and somehow it's there anyway. I'm going to guess it's a dependency of the AWS SDK.

Eoin: Yeah.

Luciano: So we have Will in the chat saying that AWS SDK version 2 is included in the Node.js runtime, so you don't get version 3. Thank you, Will, for the clarification. Okay. That's right.

Eoin: So does that mean you get the AWS SDK version 3 if you use CommonJS, or not? You only get the AWS SDK version 2? You get the AWS SDK version 2, as far as I understand. Okay. All right, let's give our command another go. So we're going to curl -X POST to this, and hopefully we'll get back a couple of URLs. Let's see. No such joy. Okay. Ah, okay, I've used Python syntax in our handler reference; snake case is no good here. So I think it was handle event, but let's just verify that here. Nope, it's called handler. So I think I prefer to give it a verb.

Luciano: Yeah. Than a noun.

Eoin: So let's stick with handleEvent.

Luciano: It's always weird when you see handler.handler. It's not very clear what that means.

Eoin: Yeah. Oops. So in a couple of seconds we should have all that resolved. With a proper project structure here, we would have a subdirectory for our handlers and then separate directories for the logic that runs within those handlers, and we would probably avoid having all of our modules at the root alongside all of the code and the Lambda handler itself.

Luciano: Will is also clarifying that the SDK version 3 is not included in any runtime, not just this one, but hopefully it will be in Node.js 18.

Eoin: Oh, okay. I was wondering if the reason not to include it was just to keep the runtime light and reduce cold start time, because I suppose the advantage of the version 3 SDK is that it's modular and you don't have to bundle all of it. So that's interesting. I guess having it in the runtime is good for people who are using Cloud9 or the built-in Lambda console editor to try something fairly basic, because you don't have to worry about packaging modules there. Okay, nice, we've got our first URLs. So now we've got an upload URL and our download URL. I did promise that the URLs weren't going to be pretty; I think I've lived up to that promise. So let's put something up on S3, let's upload something to WeShare. What'll I use? Let's use our diagram from the repository. What is it called? It's in the directory, and it's called MVP-DIAG.png.

Luciano: Okay. That looks to have succeeded.

Eoin: So let me just go back to the download URL.

Luciano: Ah, I don't know if this is going to work for me, my scrolling and pasting in my terminal.

Eoin: This might be a slight gotcha for this demo. Let's see how we get on.

Luciano: Well, I can confirm that I see something in a stream. Oh yeah.

Eoin: Okay. This is good. Yeah, I have a problem here with this URL, so let me do it a slightly different way. I can show very quickly that we have this file in S3. Nice. Okay, let me do this again: MVP-DIAG.png. What did I miss? I typed MVN instead of MVP, a hangover from my days of using Maven. Excellent. Let's give our download URL a go. I'm going to do curl -v just so we can see things like the response headers.

Luciano: Can you zoom a little bit your font?

Eoin: Yeah. Actually, let's just paste this into the browser. So this is the download URL; I don't need the curl command. Okay. So now I've got a file with a not too obvious name in my downloads folder. Let me open up my downloads folder so everyone can see. And I guess I'm going to have to rename this to .png. Let's open it up. And we have our architecture diagram. Awesome. That's a success for the MVP.

Luciano: I think so. Excellent.

Eoin: Let's have a quick check of what we added since the last commit. I'm going to commit this as "add handler to generate upload and download URLs" and push it, so everybody can have a go and deploy it to their own environments. And before we wrap up, I'm just going to remove this stack, because I don't want us to be DDoSed on our bucket. Yeah, because everyone saw the URL for uploading, right? Yeah. Maybe we've got a few surprise objects already, but we'd better not share them publicly. Yeah, absolutely.

Luciano: I think just to wrap things up, of course, we did a very simple implementation, like the bare minimum that you could possibly do to get this working. There are a few things that would be nice to do and maybe something we can consider for the next episode. For instance, one thing that we saw is that when you were downloading the file, you totally lost the file name, the extension. So you needed to figure out, oh, this is a PNG.

Let me rename it and open it. There are actually ways that we could retain the MIME type and the file name, so we could implement all of that in the next session. And then other things we will be doing are making this a little bit more production ready, for instance by adding proper logging, metrics and things like that. We could be using something like Middy to implement all these features, and we will be using Powertools as well. So maybe in the next session we try to do all of this. But by all means, if you have other ideas or specific questions, feel free to ask and we'll try to tackle those questions in the next episode, which is probably going to be next week, more or less same day, same time. Just stay tuned on YouTube and our social media channels, because as soon as we have all of that scheduled, we are going to announce it. Yeah.
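For anyone curious, one way to retain the original file name and MIME type on download is to override the response headers on the presigned GET. This is a sketch of one possible approach, not what was built on stream, and the function name and parameters are illustrative:

```js
// Ask S3 to send back a friendly filename and content type when the presigned
// download URL is used, via the response override parameters on GetObject.
import { GetObjectCommand } from '@aws-sdk/client-s3'

export function buildDownloadCommand (bucketName, key, originalName, contentType) {
  return new GetObjectCommand({
    Bucket: bucketName,
    Key: key,
    ResponseContentDisposition: `attachment; filename="${originalName}"`,
    ResponseContentType: contentType // e.g. 'image/png'
  })
}
```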

Eoin: Looking forward to it. It was good to get all that setup done, because there were quite a few small pieces there at the start before we got to actually writing the code. I think it's going to be interesting next week. We should be able to plow ahead with making this a little bit more user friendly; I don't like the look of those URLs.

Luciano: Just as a reminder, in the chat, I'm gonna be posting our repository. And I think this is all we have for today. So hopefully you enjoyed all of that and we'll see you next time. Bye everyone.

Eoin: Thanks everyone. We'll see you next time.