Help us to make this transcription better! If you find an error, please submit a PR
with your corrections.
Eoin: Automated continuous build and continuous delivery are must-haves when you're building modern applications on AWS. To achieve this you've got numerous options, including third-party providers like GitHub Actions and CircleCI, and the AWS services CodePipeline and CodeBuild. This is our topic for today, so you're going to hear about what CodePipeline offers and how to set it up, what the trade-offs are and when to choose one over the other, and when you should look outside AWS to a third-party provider for continuous deployment. We'll also talk about the rise of GitHub Actions, when to choose it, and all the features it offers. My name is Eoin, I'm joined by Luciano, and this is the AWS Bites Podcast. Okay Luciano, regardless of the service we're trying to pick for continuous integration, continuous deployment, and continuous build, what are we trying to achieve? Maybe we should talk about that first.
Luciano: Yeah, I think there are a number of use cases when it comes to discussing topics such as CI and CD. So let's try to come up with a list of the different things you might want to do with this kind of technology. First of all, there's something we can define as automated build. For instance, you have a repository with all your source code and you do things on the repository — you open a PR, you merge that PR, or you just push a commit. You might want something to happen as a reaction to that event, and that reaction is probably building your code and doing something with it: running tests, making sure your code is okay and conforms to certain standards that you define.
Then another aspect is releasing a specific artifact. This is something you might do, for instance, when you create a tag on your repository: you can say this is version 1.5, and at that point you want to archive that particular version and package it in such a way that it can be easily deployed to a particular environment. Then there is the deploying itself, where you basically take one artifact, with all the code inside it, and somehow deploy it to an environment. If you're using containers, the artifact is probably going to be an image in a registry, and you might want to publish it and run a task on ECS using that image; or maybe it's a serverless function, so you want to deploy that function and run it.
Or sometimes it can be something more complex — it can be an entire CloudFormation stack, and you want to apply all the changes in that stack. So it's really up to you to define the granularity, but the concept is: you build, you create an artifact, and you deploy that artifact. And there are other things to keep in mind. For instance, most likely we'll be using different environments and different applications, so how do you manage different AWS accounts — maybe one per environment? How do you manage multiple applications running in AWS accounts?
We spoke before about whether you should use multiple accounts for multiple applications, or multiple accounts for multiple environments, so you might have a very large matrix of combinations there, and your CI/CD needs to be able to interact with all of them. Then there are two other aspects that I would classify as observability and security. On one side, you want to know exactly what's going on: if you're doing a deployment, what are the different steps, and if something goes wrong, at which step did it go wrong? You should be able to see logs and react to whatever is going wrong. Finally, security is a very broad term, but in general we want to make sure that our CI/CD doesn't get compromised — that it doesn't become an attack surface. What if someone can steal credentials, for instance? Then they can impersonate your CI/CD and do all the things your CI/CD can do. So there needs to be a certain level of concern around making sure your CI/CD infrastructure is as secure as possible, because that layer generally has a lot of permissions — it's literally spinning up new infrastructure, changing existing infrastructure, and so on. So yeah, I think that covers more or less what the need is. But speaking of AWS, how do we do all this? I know there are so many different services for all these different things, and I often get confused about which service does what. So should we try to do a recap?
Eoin: Yeah, in the AWS console, if you go to Developer Tools, you see four or five different services. The main ones are CodeCommit, which is kind of their alternative to GitHub and Bitbucket and the like — we're not really going to cover that here, since we're assuming people are using something like Bitbucket or GitHub. We've also got CodeDeploy, a special service for deploying to EC2, ECS, or Lambda; we talked about it a little bit before — you can do blue-green deployments with it, and it can be used regardless of whether you use CodePipeline or a third-party service — so let's also park that to one side. The other two main services are CodeBuild and CodePipeline, and those are really what we're going to focus on today. CodeBuild is the basic building block, and it's a little bit comparable to what CircleCI or Travis or other similar services offer, in that you can declare a YAML file called a buildspec that allows you to declaratively write all your build steps, and CodeBuild will execute them for you. Because it's an AWS service, you always need to create a resource for this to work: before anything runs, you have to create a CodeBuild project and associate it with a source like GitHub or Bitbucket.
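As an illustrative sketch (the runtime version, commands, and paths here are placeholders, not from the episode), a minimal buildspec for a Node.js project might look like:

```yaml
# buildspec.yml — minimal CodeBuild build spec (hypothetical example)
version: 0.2

phases:
  install:
    runtime-versions:
      nodejs: 18          # placeholder runtime version
  pre_build:
    commands:
      - npm ci            # install dependencies
  build:
    commands:
      - npm test          # run the test suite
      - npm run build     # produce build output

artifacts:
  files:
    - 'dist/**/*'         # placeholder output path to archive
```

CodeBuild reads this file from the repository root of the associated source and runs each phase's commands in order inside the configured container image.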
Then you can pick a container image for your project to run in, give it an IAM role to use, and configure the size of the instance. All in all, I think CodeBuild is a pretty good service as a basic build execution environment, but I would say it's no frills — it doesn't provide a particularly good UI or anything. CodePipeline, then, is a continuous deployment or continuous delivery workflow service: it allows you to chain multiple stages and actions together.
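The CodeBuild project setup just described — container image, IAM role, instance size — can be sketched in CloudFormation. The names, repository URL, and role reference below are placeholders:

```yaml
# CloudFormation sketch of a CodeBuild project (illustrative only)
Resources:
  MyBuildProject:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: my-app-build
      ServiceRole: !GetAtt BuildRole.Arn          # IAM role the build assumes
      Source:
        Type: GITHUB
        Location: https://github.com/my-org/my-app.git
        BuildSpec: buildspec.yml
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_MEDIUM        # the instance size
        Image: aws/codebuild/standard:7.0         # the container image
      Artifacts:
        Type: NO_ARTIFACTS
```

Because the project itself is an AWS resource, it can live in the same CloudFormation, Terraform, or CDK codebase as the rest of the application.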
You can use it together with CodeBuild, and it will allow you to orchestrate multiple steps. In CodePipeline you have this concept of stages: each phase of your workflow is called a stage, and each stage can have multiple actions, which can run in parallel or in sequence. There are lots of different actions. You've got source actions — the source can come from GitHub, S3, ECR, or CodeCommit — and then, for actually doing stuff based on that source, you can run a CodeBuild job, or even a Jenkins job, and there are lots of third-party providers as well, including lots of providers for running specific tests. So it's well integrated with lots of AWS services but also third-party services, and it then allows you to deploy with CodeDeploy out to ECS, or deploy out to S3 or Elastic Beanstalk; and if you really want to do custom stuff, you can invoke a Lambda function or a Step Function from CodePipeline. I've actually spent a lot of time creating pipelines based on CodePipeline and CodeBuild, and that might be a little bit of a hint as to where these services fit, because you do end up spending a lot of time. We have an open source project called SLIC Starter, which is a serverless template project you can use for exploring lots of the different things you need to implement when you're creating a serverless project. One of those is continuous deployment: in there, there's a CICD folder with a CDK project that creates a CodePipeline and CodeBuild setup with all of the build phases, integration tests, and multiple stages to multiple environments. It supports cross-account deployment, and that's probably a good example, because to achieve something like that you do need to spend quite a lot of time. But let's maybe talk about the advantages first.
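The stage-and-action structure described above might be sketched in CloudFormation like this — the connection ARN, bucket name, repository, and project name are all placeholders:

```yaml
# CloudFormation sketch of a two-stage CodePipeline (illustrative only)
MyPipeline:
  Type: AWS::CodePipeline::Pipeline
  Properties:
    RoleArn: !GetAtt PipelineRole.Arn
    ArtifactStore:
      Type: S3                        # stages hand artifacts off via S3
      Location: my-artifact-bucket
    Stages:
      - Name: Source
        Actions:
          - Name: GitHubSource
            ActionTypeId:
              Category: Source
              Owner: AWS
              Provider: CodeStarSourceConnection
              Version: '1'
            Configuration:
              ConnectionArn: !Ref GitHubConnection   # placeholder connection
              FullRepositoryId: my-org/my-app
              BranchName: main
            OutputArtifacts:
              - Name: SourceOutput
      - Name: Build
        Actions:
          - Name: RunBuild
            ActionTypeId:
              Category: Build
              Owner: AWS
              Provider: CodeBuild
              Version: '1'
            Configuration:
              ProjectName: my-app-build
            InputArtifacts:
              - Name: SourceOutput
```

Note the `ArtifactStore`: every stage transition pushes to and pulls from that S3 bucket, which is relevant to the performance discussion below.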
So with CodePipeline, everything is well integrated with AWS — that's always going to be the advantage of picking the AWS-native option. You can also maintain the pipeline code using the same infrastructure as code — be it Terraform, CloudFormation, or CDK — as you do for the rest of the application, so you can manage all those changes together, and the way you do it is fairly consistent: everything is an AWS resource. It also scales well — you've got the elasticity of AWS with your CodeBuild jobs — and you can get notifications as well: you get SNS notifications on your pipeline, and you can integrate those with Slack using AWS Chatbot.
It's obviously quite well integrated into the AWS ecosystem, but having used it quite a lot on multiple projects, I still think there are lots of areas where CodePipeline and CodeBuild could be improved. The main disadvantages, I'd say: there's a steep learning curve compared to the alternatives. You have to design and deploy the CodeBuild projects and understand how the services work — it's never as easy as you think, and you will always underestimate the amount of time you need to create these things. Unsurprisingly, the user experience for both services is not as great as the alternatives: you don't get a nice single pane of glass for all your pipelines, with expandable sections you can quickly go into and out of.
CodePipeline's overview of an execution is pretty good, but it doesn't show you multiple workflows very well; if you want to drill down deeper, you end up clicking across to CodeBuild, and there the user experience is just a big log. There can also be a performance problem. If you've got multiple CodePipeline stages — because you're trying to break things up into lots of distinct steps — the transition between each one can be a little bit slow. Now, they did improve the performance of this, but you still use S3 as intermediate storage between your stages, so there's always a push to S3 at the end of one stage and a pull from S3 at the start of the next, and if you've got a substantial amount of data or source code being passed around, which is quite common these days, that can really slow down your execution.
So that's a bit of a problem, because build and deployment speed is really important for developer productivity, I would say, and we should always be trying to get that deployment time down as low as possible. I have seen people overcome that problem by getting rid of CodePipeline altogether and just using CodeBuild, but then you lose the structured workflow: everything runs as one job. You do have distinct phases, but you don't get any visualization of them, really — they're all just steps logged out to a log file. The last disadvantage, I'd say, is that source providers, be it GitHub or elsewhere, are quite clunky to set up with CodeBuild and CodePipeline. There are different ways of setting up authentication in CodeBuild and CodePipeline; they also have a reasonably new thing called connections, which is a little bit better, but you still can't trigger a CodePipeline from multiple branches. Everything else out there allows you to specify something like a glob pattern for the branches to trigger from, or to trigger from PRs; CodeBuild allows you to do that, but CodePipeline does not. People end up using a CodeBuild job at the start, which uses a wildcard pattern on branches and then triggers the CodePipeline, and it's just not as seamless as you would want. It seems like we've talked a lot about their shortcomings now — how does the alternative compare? Do you want to go through what GitHub Actions is like in comparison?
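For reference before diving in, here is a sketch of the kind of GitHub Actions workflow file discussed next, combining branch glob triggers, a version matrix, a conditional job, and OIDC-based authentication to AWS. The role ARN, region, and commands are placeholders:

```yaml
# .github/workflows/build-and-deploy.yml — hypothetical workflow sketch
name: build-and-deploy

on:
  push:
    branches: ['main', 'feature/**']   # glob patterns on branches

permissions:
  id-token: write    # required for OIDC federation with AWS
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node: [14, 16, 18]             # one run per version, in parallel
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci && npm test

  deploy:
    needs: test
    if: github.ref == 'refs/heads/main'   # conditional via the `if` attribute
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-deploy  # placeholder
          aws-region: eu-west-1
      - run: npm run deploy              # placeholder deploy command
```

Committing this file under `.github/workflows/` is all it takes to enable the workflow — there's no separate resource to create.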
Luciano: Yeah, let's talk specifically about GitHub Actions, because it seems to be the main contender in the market, even outside AWS, just because in open source everyone — or almost everyone, at least — is hosting projects on GitHub, and everyone is starting to take more and more advantage of the built-in GitHub Actions to do all the automation around their open source projects. So it makes sense to use all that knowledge to deploy applications to all sorts of different environments, including AWS. The experience is actually fairly simple, in my opinion. I've been using this extensively for open source — not as much with AWS, but for open source I have a very good grasp of the process of creating a workflow and making it run. It always starts with a YAML file. It's not in the root folder of your repository: there is a special folder called .github, and in that folder you can create a subfolder called workflows, and then every YAML file you define in there automatically becomes a pipeline — or a workflow, if you wish — that will be executed depending on the conditions you specify inside the file. This is what makes the integration almost seamless: you don't have to go and call APIs or click around the UI to enable a specific workflow. You just create a file, and as long as that file exists and is well formatted, your workflow exists and will be executed according to what you specified inside it. In terms of AWS, of course, there is a step to integrate the two systems: GitHub needs to be aware of a particular role that it needs to assume in order to have permission to do all sorts of different actions, and this can be done using an OIDC provider. It used to be a little bit more cumbersome in the past, but I think now it's a bit more simplified, so maybe we can go into the details in another dedicated episode — but basically, you create an OIDC provider for GitHub Actions, and at that point GitHub Actions is able to assume a role and all the permissions related to that role.
So in terms of why this is better: we already mentioned that it looks a little easier to define a workflow and make it run, because it's just a YAML file with a quite simple syntax, but there are a lot more advantages. First of all, if you're already using GitHub as a source — your repositories are in GitHub — that's literally it: you don't need to create another source or another connection; you just create files in the same repository, and that's it, so the integration is very seamless. The other thing is that it's very easy to have conditionals: for instance, if a particular step is something you want to run only on the main branch, or only if the commit was a tag, you can have all sorts of conditions. The language is very flexible, so that's very easy to do: there is literally an if attribute in the YAML statements, and that if attribute has its own expression language, which is quite simple but at the same time powerful enough for most use cases. Another thing I really like, which is also very simple — maybe in comparison with CodeBuild and CodePipeline — is the matrix feature. A common use case, maybe more when you're building a library, is that you might want to run the unit tests against different versions of your runtime. Let's say it's a Node.js project: you probably want to run the tests against Node 14, Node 16, and Node 18, just to make sure that people using different versions of Node.js can use that library without problems. This is extremely easy to do with GitHub Actions, because you literally define a property that says: I have these attributes that are variations of my pipeline, and the values will be Node 14, 16, and 18.
Then you can use those variables inside your action — for instance, you're probably going to have a setup step that says "configure this version of Node.js", using that variable — and at that point GitHub Actions takes all the variations you specify for every attribute (Node.js versions, but you might also have, for instance, the operating system), builds a matrix of all of them, and executes all the variations for you. The UI is actually very sleek in letting you see all the different variations that are running, and by default they're executed in parallel, so the amount of configuration is minimal and the result is quite powerful. The other thing that is very nice is the idea of third-party actions. For most common tasks — for instance, setting up Node.js or authenticating against AWS — in other CI systems you need to write your own bash script using specific CLI utilities, and that's always either a copy-paste, which is a little bit annoying, or something you need to figure out every single time, and then you end up copy-pasting from your previous repository. With GitHub third-party actions, what you do is basically like importing a module: you say "do this thing, and use this particular configuration". For instance, with the setup-node action, which is provided by GitHub itself, you only need to specify the version of Node.js that you want — there are of course other parameters, but it's literally: import this module, initialize it with this configuration. I don't need to know exactly what's happening behind the scenes, but I know it's going to solve this particular use case. That might also be a little bit of a problem, because you might start to think: what about supply-chain attacks? What if I'm using an action that is not trustworthy, and people can use that as an
attack vector? This is definitely a concern. But the good news is that GitHub has 50 official actions that they maintain themselves, and these are the most commonly used ones — setup-node, setup-java, all the basic building blocks you might find across all sorts of different programming languages and runtimes. They also do this thing called "verified by GitHub", where they audit the most commonly used actions to a certain extent — I'm not really sure to what degree — but they will tell you: we spent some time on this and it's trustworthy. So if you see that badge, you can be a little bit more confident that it's not going to create security problems for you. I still recommend you verify the source code, because all these actions are open source — each one is just another repository on GitHub, so you can literally read all the code. They generally run as containers: GitHub will pull the code from that repository and run it as part of your workflow, so you can see exactly what's going to happen, and you can also pin specific commits of that repository if you really want to be sure you're running a specific version that you have audited. That's just a suggestion if you want to be really cautious about importing third-party source code into your pipelines. What else? Oh yeah, there is another interesting point regarding self-hosted runners. In general, when you use GitHub Actions, the pipeline runs on GitHub's infrastructure, and of course that comes with a cost that maybe we'll detail a little bit later. But if you don't want to run your pipelines on GitHub-hosted runners, you can self-host the runners yourself: there is an agent that you can install anywhere you want to run your code — it might even be a Raspberry Pi connected to the internet, if you really want — and at that point you just need to register the
workers with a particular project, and GitHub will dispatch the workload to these runners that you manage yourself. Yeah, I think that's all we have, so maybe in terms of disadvantages, let's see what we can say — in comparison with CodePipeline, of course. We said that a lot of things are simplified, because the user experience of GitHub Actions is just different from what you get with CodePipeline, but at the same time you need the additional step of making sure that credentials are set up correctly and that you configure the OIDC provider to allow GitHub Actions to authenticate against your AWS accounts. If you use multiple environments, that comes with a cost, so that's something to keep in mind. Some people complain that it's not the most reliable service — it has gone down a few times — and even if you use your own workers, if the control plane (I don't know if that's the most correct terminology) goes down, your workers on your own infrastructure are not getting triggered anyway, so that's another case where you're not 100% in control. Also, self-hosted runners have some quirks: I've heard people complain that they're unreliable in different ways, and they seem to be a little bit different from the managed GitHub runners, in the sense that there are subtle differences in behavior, and it's not obvious when they appear. So your mileage may vary, but if you use the self-hosted runners, be careful and make sure you test them, because it might not be 100% the same experience you get with the GitHub-hosted runners.
Eoin: Yeah, it's probably a good idea, I suppose, to try and use the managed runners where possible, unless for some compliance reason you need to keep those builds running on premises.
Luciano: Then we can also talk very quickly about pricing. If you're doing an open source project, this is actually the best part: it's totally free. If you're building a library and your repository is public, you can build as much as you want, literally for free. That's really nice, because it gives you an opportunity to experiment with GitHub Actions without having to worry too much. It's funny that you can even do scheduled executions: for instance, for the AWS Bites website, every Friday at midnight, if we have a new episode being released that day, the website automatically rebuilds itself to show the new episode, and we do this entirely for free, because the whole website is open source in a public repository, so GitHub gives us all of that service for free. But of course, if you're building a startup, you're probably not going to publish all your source code publicly, so what happens when you need something private, something more enterprisey? The pricing is really interesting, because on GitHub you have a bunch of different services — repositories, GitHub Actions, I think maybe even Copilot now — and you don't buy them individually: it's one plan where you pay seats for developers, and you get access to a certain amount of features across all the services. In the case of GitHub Actions, you get 3,000 minutes per month, I think per seat, and that seat is 45 dollars per year. So my understanding is that if you need more than 3,000 minutes of build time — this is, of course, using GitHub's own hosted runners — you probably have to buy more seats. Buying more seats gives access to more developers, but it also gives you more build minutes.
Eoin: Yeah, I'm not sure I'm a big fan of that, just because — why should the number of builds be tied to the number of developers? It seems like those things aren't necessarily going to scale linearly. I would joke that everybody knows every time you add a member to your team, your productivity goes down anyway, because you have so much more coordination to do, so maybe it should be the opposite. Anyway, I digress.
Luciano: Yeah, on one side I appreciate that they're trying to keep it simple, so you don't have to worry about too many dimensions and how they can affect your pricing. But at the same time, it's probably true that you might have specific use cases where you end up paying for many seats just because you need more build time, even though you don't really have that many developers. So there are kind of implicit dimensions, and I guess if you happen to be in the standard use case you're probably fine, but if you deviate from it, the pricing might not make that much sense anymore. What else can we say? There is a thing called GitHub Enterprise that is 231 dollars — per user, per year — and that one gives you environment protection.
Eoin: Yeah, so that means if you want rules — conditions that specify under what circumstances you can deploy to production, so you don't allow everybody to create a build that can trigger a release to production — you need GitHub Enterprise for that. It's something that I guess a lot of people might want, and it's kind of unusual that you would need to go from 45 dollars a year to 231 dollars a year just to get that. So your mileage is going to vary: pricing could work out really well for you with GitHub, but it could also get expensive if you've got long-running builds as a small startup. In comparison to all of that, how do CodePipeline and CodeBuild work — what is the pricing there? I think it's just more linear in terms of the number of pipelines and build jobs you have. CodePipeline is one of the simplest AWS pricing sheets out there: it's a dollar per month per pipeline, and that's it, whether it runs or not, and you get one free on the free tier. CodeBuild then depends on the instance size. You can configure different instance sizes; the standard one is general1.medium, and that's a cent a minute — one dollar cent per minute — for Linux. Windows builds are more expensive; you can go down as far as the smallest Arm instance, which is around a third of a cent per minute, and if you want a really massive GPU, that's about 65 cents per minute. So it's just based on the number of minutes you execute. There are some quotas, but you can get the quotas increased, so I would say one of the advantages of CodeBuild is that it actually scales pretty well. I have had cases, especially on new accounts, where CodeBuild jobs can sometimes take a while to provision, and I guess this is something that will come up now and again as AWS adds more and more infrastructure and as more people run CodeBuild jobs, but I have found, even recently, that you can end up waiting for provisioning, so that's something to be mindful of.
Luciano: Yeah, I was about to say that this is another kind of trade-off: with GitHub Actions, if you use the managed runners, you don't really know what kind of hardware you're running your code on, so if you need specific things like a GPU — because, I don't know, you're training models — you're not necessarily going to get a fine-tuned experience there. If you self-host the runners, then you can use whatever hardware you want. With CodeBuild, it's a lot more obvious that you're going to pay for the compute you actually use, but at the same time you can customize that compute as much as you want.
Eoin: Yeah, that's a good point. Maybe it's a good point to talk about when to choose one over the other, in summary. I think maybe people have already made up their own minds based on the pros and cons we discussed. I'd say use CodeBuild and CodePipeline if you have a good understanding of those services already and want the AWS service integrations — also maybe if you're all in on CDK: CDK actually has something called CDK Pipelines, which creates all of these things very simply for you, with a self-updating pipeline. We'll link in the show notes to a CDK workshop, which is really good — it talks about how you do that. In general, I'd say you should use GitHub Actions if you want to reduce the amount of time developers spend on maintaining the pipelines, because it's just a lower barrier to entry and not as steep a learning curve, and if you don't need those specific AWS integrations that CodePipeline offers. I would say more and more people are going to choose GitHub Actions for this in the future, so it will become the well-traveled path. CodePipeline and CodeBuild are still very widely used, because they're just part of all the AWS services, and they're reasonably well documented in terms of third-party community resources, but it's not the juggernaut that GitHub is, so you just won't find the same level of support. We're interested in everyone else's opinion too. There are plenty of other third-party services out there: CircleCI I've used in the past as well, and it has also been really easy to set up and comparable to GitHub Actions in a lot of ways. In terms of resources, if people are looking for other places to go: one of the reasons we were inspired to talk about this topic today is that Paul Swail released a really good article a couple of weeks ago called "Why I switched from AWS CodePipeline to GitHub Actions". It's a really excellent article and well worth a read. We've also linked to a tutorial showing how you can set up authentication between GitHub Actions and AWS, building and deploying a web app to EC2, and we'll also link to our previous episode on the same theme — when to use alternatives to AWS services — so please check out that episode if you haven't heard it already. Thanks for listening, and we'll see you next time!