AWS Bites Podcast

58. What can kitties teach us about AWS?

Published 2022-11-11 - Listen on your favourite podcast player

Building actual projects is a great way to consolidate our understanding of AWS. In this episode, we present 4 different project ideas to explore services and concepts in the space of web application development, machine learning, and data science.

Ok, you are probably wondering where kitties come into the equation here. Every one of these 4 project ideas involves kitties! 🐱

We can learn stuff and have some fun too!

AWS Bites is sponsored by fourTheorem, an AWS Consulting Partner offering training, cloud migration, and modern application architecture.

Some of the resources we mentioned:

Let's talk!

Do you agree with our opinions? Do you have interesting AWS questions you'd like us to chat about? Leave a comment on YouTube or connect with us on Twitter: @eoins, @loige.

Help us to make this transcription better! If you find an error, please submit a PR with your corrections.

Luciano: The World Wide Web could have been something great, something to move humanity forward. Instead, it turned out to be a place where we spend our time slacking off and looking at cat pictures and videos. But honestly, that's actually cool. We can actually use it to our advantage. And yes, I promise you, we can learn about AWS while having fun with kitties. This is Luciano and Eoin and another episode of AWS Bites podcast for you.

OK, after this intro, I feel that I need to be a little bit more serious, but we really want to talk about AWS and ways that you can actually learn more about AWS. And we believe that you should be building projects on AWS to actually learn really what it takes to be successful with AWS. And of course, when you build projects, we can also have fun. So we can build projects that involve kitties. And we have actually prepared a list with four different project ideas that you can build on AWS so that you can practice different skills, different architectures and grow in parts like application development, data science, DevOps and even machine learning.

Again, the idea is that on one side it's good to know the theory, maybe get some certification, but you really need to put those skills into practice if you really want to remember and learn deeply what it takes to build applications and projects on AWS. Before we get into the first project, I want to mention that AWS Bites is sponsored by fourTheorem. fourTheorem is an AWS consulting partner that offers training, cloud migrations and modern application architecture. You can find out more at fourtheorem.com and there will be a link in the show notes. Okay, with that being said, Eoin, what do you think? Should we start to discuss the first project?

Eoin: Yeah, the first project I really like and it's one that our colleague Peter Elger came up with for the book that I co-wrote with him, AI as a Service. It's a cat detector project and it's a good bit of fun, but it's actually a really good learning one for people who are getting into AWS and maybe exploring some of the newer capabilities that you have there. Things like serverless technology, infrastructure as code, but also some of the machine learning services. This might be useful for people interested in machine learning or computer vision, but it's certainly not a prerequisite that you have any specialty there.

That's the whole point of these managed machine learning services that anybody can use them for use cases like this. So obviously the use case here is a really critically important one that is detecting pictures of cats on the internet. So let me talk through briefly how this works. So there's a couple of services here, right? It uses things like AWS Lambda, SQS, Rekognition, that's the machine learning service, and API Gateway as well as S3.

And it shows you how to build up this architecture slowly piece by piece. Now we've got a workshop as well. So there is a GitHub repo with a workshop that goes through this step by step. You can have a look at the book if you want to, but you can just follow the workshop as well and you can see the code. So it brings it through fairly step by step. And it's not a particularly complicated architecture, but it shows you a couple of things like event driven communication over SQS, as well as showing you how Rekognition works.

And the Rekognition part is actually quite simple. Everything is really managed for you. So what this application does is it allows you to bring up a front end, very simple web page where you can enter a URL. We'll go off to that URL in Lambda, pull down the HTML, find images on that web page, and then we'll take the images and put them on S3. And then we'll submit the images to Rekognition and it will identify objects in that.
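As a rough illustration of the crawl and detection steps just described, here is a minimal Python sketch using only the standard library. It is not the workshop's code. The function names and the 80% confidence threshold are assumptions made for this example. The response shape (a `Labels` list whose entries carry `Name` and `Confidence`) follows what Rekognition's DetectLabels operation returns.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URL of every <img src="..."> found on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Relative paths are resolved against the URL of the page itself.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the image URLs referenced by an HTML document, in page order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls


def confident_labels(response, min_confidence=80.0):
    """Return (name, confidence) pairs from a DetectLabels-shaped response,
    most confident first. The threshold is an arbitrary choice for this sketch."""
    labels = [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)
```

In a real Lambda function, the images would first be copied to S3 and the response would come from a call such as boto3's `detect_labels` on the Rekognition client, pointing at the S3 object. That part is left out here so the sketch runs without AWS credentials.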

And then the back end will also generate a word cloud based on the objects detected in the image. So hopefully cats. It will use the confidence of the objects that it has found to generate that word cloud. So then in the front end, it'll be able to render the image and what it's found. So hopefully you'll see cat and you might even see things like the breed of cat as well. So this is something you could probably work through in a couple of hours. I know Peter and myself have presented this workshop at various conferences and things. And it only takes a couple of hours to get through. I think it's a really nice one to just get a feel for what modern application development is like on AWS.

Luciano: Yeah, I really like this one. I remember that I saw a similar one. There is a website called Kaggle.com where you can find interesting datasets. And one of the datasets, we'll put the link in the show notes, is basically a dataset where you have pictures of cats and dogs. And basically you can use that idea to try to build a classifier and distinguish between the two. So that could be an alternative project that you can build to try to develop the same skills as the project we just mentioned. But moving on to project number two, what you could build is an HTTP cat clone.

And if you don't know HTTP cat, or http.cat, that's literally my favorite website ever. It's basically a website that is built to make sure that you easily remember the meaning of HTTP status codes. So if you really want to have a quick feeling for what that means: can you remember the meaning of status code 418? I can never remember. You always find it referenced everywhere because it's kind of a web joke.

So if you just want to find out, just go to http.cat/418. And what happens is that you will get a funny picture with a cat. But that picture also describes the meaning of the 418 HTTP status code. So the idea is, OK, what would it take to try to rebuild the same website, literally a clone of this website, by using AWS? And I have at least three different solutions in mind and we can talk through them.

And I think every different solution tries to exercise different architectures and different services you can use in AWS. So I'm not going to be mandating which one is better. I think they are all equally viable. It's going to be more on you to decide which kind of tools you want to practice with and then maybe pick one approach or the other. So the first approach would be we could build this website literally as a static website.

At the end of the day, there isn't really anything dynamic. It's just a collection of pictures mainly. But we have some HTML and CSS. So you create all of that, and you can use any static site generator, for instance Astro, Eleventy, whatever you like. There are hundreds of them at this stage. You will eventually have a collection of assets: HTML, CSS, images. So you need to figure out, OK, how do I put these assets into production so that I can have a public URL that other people can visit?

And one of the simplest solutions is you could be hosting all these assets in S3. So that becomes kind of your place where you put all the files. But then you need to figure out a way to serve all these files as a website. And with S3, you can easily enable a feature that is called S3 websites. But that feature, although it works and for a website like this, it might be just enough, it doesn't support HTTPS.

So if you also want to test how to make a static website and serve it over HTTPS, you can also use CloudFront. And serve the website through CloudFront, which also gives you additional advantages because CloudFront is a CDN. So you are actually replicating all these assets around the world and reducing the latency with the actual users. So that could be approach number one, a static website, S3 and optionally CloudFront.

Another approach could be you could build this as a more traditional website. While it's true that there isn't really a lot of dynamic stuff, nothing is stopping you from still building a web server that is there, accepting requests and deciding which assets should be served back to the user. And to do that, you have a number of different options in AWS. For instance, you are free to pick whatever web framework, web application framework you like.

If you are in Node.js, for instance, you could be using Express or Fastify. If you are using Python, you can be using Django. Any language really has a lot of options for web servers. At that point, it's up to you to build the application and then decide how do we ship the complete application with all the assets to AWS. And again, different options. You can just go for an EC2, figure out a way to just copy all the necessary software and code into an EC2, spin it up, connect a DNS.

And at that point, you basically serve traffic on the public web using an EC2 as a backend. Similarly, you could be using Fargate. So if you prefer to containerize all of this code, you can go and deploy it on Fargate. Or other alternatives could be you could be using something that is more of an application backend like Elastic Beanstalk or App Runner that will give you a bunch of tooling already out of the box when it comes to facilitating deployment or scaling things up if you get a lot of traffic.

So those could be other options to explore when you want something a little bit more out of the box and more kind of production ready. And the third approach could be you could imagine this website a little bit as an API and build it with Lambda and API Gateway. There are interesting concerns that come in at that point because with Lambda and API Gateway, it's very easy to serve JSON or kind of structured responses.

When it comes to serving files, you have a bunch of limits. For instance, your payload cannot be more than six megabytes, which should be enough for this kind of website. I don't expect the images will be more than six megabytes, but still you need to figure out how do I encode a response that contains a binary payload like an image. So there are different approaches there. Again, you can just use S3 and maybe create presigned URLs, or maybe you can just use S3 and CloudFront and then serve the images and all the other static assets from CloudFront.
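To make the binary payload question concrete, here is a minimal sketch of a Lambda handler that returns an image through API Gateway's proxy integration. The response keys (`statusCode`, `headers`, `body`, `isBase64Encoded`) and the `pathParameters` event field are part of that integration format. The `IMAGES` dictionary and the `code` path parameter name are assumptions for this example, standing in for pictures that would really be read from S3.

```python
import base64

# Hypothetical in-memory stand-in for images that would really live in S3.
IMAGES = {"418": b"\xff\xd8\xff fake jpeg bytes"}


def image_response(image_bytes, content_type="image/jpeg"):
    """Build a proxy integration response that carries a binary body."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": content_type},
        # Binary bodies travel as base64 text, flagged with isBase64Encoded.
        "body": base64.b64encode(image_bytes).decode("ascii"),
        "isBase64Encoded": True,
    }


def handler(event, context):
    """Serve the picture for a route like /{code}; 'code' is an assumed name."""
    code = (event.get("pathParameters") or {}).get("code", "")
    image = IMAGES.get(code)
    if image is None:
        return {
            "statusCode": 404,
            "headers": {"Content-Type": "text/plain"},
            "body": f"No cat for status {code}",
            "isBase64Encoded": False,
        }
    return image_response(image)
```

Two things to keep in mind with this approach: base64 encoding grows the payload by roughly a third, which eats into the six megabyte limit, and a REST API in API Gateway typically needs binary media types configured before it decodes the body for the client.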

But again, it's up to you to experiment and figure out how practical this solution is. But if you really want to use Lambda and API Gateway, I think there are ways to make all of that work. Now, one interesting thing is that if you go for option two or three, because you have a backend at that point, you can start to do something a little bit more dynamic. For instance, you could create an endpoint called /random that just gives you a random image every time.
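One possible shape for such a random endpoint, sketched as a Lambda-style handler that redirects to a randomly chosen status code page. The list of codes and the choice of a 302 redirect (instead of returning the image directly) are assumptions for this example.

```python
import random

# Hypothetical subset of the status codes the clone has pictures for.
STATUS_CODES = [200, 201, 301, 404, 418, 500, 503]


def random_handler(event, context, rng=random):
    """Redirect the caller to the page of a randomly chosen status code.

    The rng parameter exists only so that tests can pass a seeded generator.
    """
    code = rng.choice(STATUS_CODES)
    return {
        "statusCode": 302,
        "headers": {
            "Location": f"/{code}",
            # Ask caches not to store the redirect, so each visit can differ.
            "Cache-Control": "no-store",
        },
        "body": "",
    }
```

Redirecting keeps the random endpoint tiny and lets the existing per-code route (or CloudFront) serve the actual picture.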

And you could use it for fun just to discover new status codes that maybe you're not aware of. And you could also consider doing that with option one if you really want, maybe trying to use something like Lambda@Edge to intercept specific requests to CloudFront and then serve that response dynamically using Lambda@Edge. So again, option number one, even though it seems very static, you can still do something more dynamic if you really want and you have an opportunity to explore Lambda@Edge. So, yeah, I think that's probably more than enough ideas for how to build applications and websites on AWS with a bunch of different architectures. So I guess let's move to project number three.

Eoin: Just before we do, it's probably worth stating in the interest of fairness that there's also an http.dog. If you want to go down a more data science path and learn how to store structured data on AWS and run analytics queries, there's a huge amount to learn here and I think this is a really big growth area and one where a lot of skills are sought. So it's a really good one to get into. So we found a cat breed dataset on Kaggle again with more than 65,000 records and pictures. There's a link in the description.

This dataset can be used to train ML models, but since we have a lot of data, we can also use it to just try out some data analytics and exercise those data analytics muscles. So we could take the index CSV file where every record will reference a cat picture and provide other labels like the breed of cat, age, gender, size and coat. So let's say, what could we do with this data? We might want to run some queries and find out what's the most or least common age in this dataset.

What's the distribution of gender and size? Or we could even try to combine different attributes and figure out what is the most and least common combination of breed and gender. So some very simple statistical operations. Now, of course, this is not big data. You don't necessarily need to rely on the power of cloud. You could do this in Excel probably pretty easily. Or you could write a local script or a notebook using Python and pandas and process a CSV file that way.
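As a small local starting point for the kind of queries mentioned above, here is a sketch that counts values and combinations of values in a CSV. It uses Python's standard `csv` and `collections` modules instead of pandas so it runs without extra installs. The sample rows are made up, and the column names simply follow the labels mentioned in the episode; the real file's headers may differ.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample. The real index CSV is much larger and its
# headers may not match these assumed column names.
SAMPLE_CSV = """breed,age,gender,size,coat
Siamese,Adult,Female,Medium,Short
Siamese,Young,Male,Medium,Short
Persian,Adult,Female,Large,Long
Siamese,Adult,Female,Small,Short
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def value_counts(rows, *columns):
    """Count how often each combination of the given columns occurs."""
    return Counter(tuple(row[column] for column in columns) for row in rows)


rows = load_rows(SAMPLE_CSV)
age_counts = value_counts(rows, "age")
breed_gender_counts = value_counts(rows, "breed", "gender")

print(age_counts.most_common(1))          # most common age
print(breed_gender_counts.most_common())  # breed and gender combinations
```

The same `value_counts` helper answers both the single-attribute questions (age, gender, size) and the combined ones (breed and gender), which maps closely to a `GROUP BY` in SQL.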

But you can still use these small datasets and use very powerful cloud services just to get really quick results and then try and think about experimenting with larger datasets. There's lots of datasets out there, including Amazon has a public open dataset with a public bucket where you can pull down much, much larger datasets. So this is more like step one on your journey. So since you're here to learn about AWS, what could you do with this in the cloud?

So some options would be option one, say, sticking with the idea of a simple notebook. You can use SageMaker Studio or SageMaker notebooks and load in the CSV file using pandas and some of the Python data science kit. Maybe if you're into R, you can also do RStudio now in SageMaker as well. Option two would be to put the data in an S3 bucket then and to use other services. Athena being the probably the most obvious example.

So you would create like an external table in Athena and then you could start running queries there. Again, with a small dataset, you're not really showcasing the power of Athena, but you're showing how you can query data on S3. These projects are always a little bit more fun if you can add some visuals in there as well. So apart from visualizing things in your notebook, you might also want to try some BI dashboards and spin up QuickSight on Amazon as well and try some data visualizations.
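To give a feel for the Athena step, here is a sketch that builds the DDL for an external table over a CSV stored in S3, plus one example query. The table name, bucket path and column names are placeholders invented for this example. The OpenCSVSerde class and the `skip.header.line.count` table property are existing Athena features, and with that SerDe it is common to declare every column as a string.

```python
def athena_external_table_ddl(table, s3_location, columns):
    """Return a CREATE EXTERNAL TABLE statement for a CSV folder on S3.

    s3_location should point at the folder holding the file, ending in '/'.
    """
    column_defs = ",\n  ".join(f"`{name}` string" for name in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_defs}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"LOCATION '{s3_location}'\n"
        # Skip the header row so it is not returned as data.
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )


# Placeholder names: adjust to your own bucket and the real CSV headers.
DDL = athena_external_table_ddl(
    "cats",
    "s3://my-example-bucket/cats/",
    ["breed", "age", "gender", "size", "coat"],
)

# Example analytics query: distribution of ages, most common first.
AGE_QUERY = "SELECT age, COUNT(*) AS n FROM cats GROUP BY age ORDER BY n DESC"

print(DDL)
```

Both statements can be pasted into the Athena query editor once the CSV has been uploaded to the bucket.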

You can do some really cool stuff there as well. So for people who are looking to get into basic data analytics and progress a career and look at data science and data engineering on AWS, it's a good place to start and you can grow from there and then start looking at all the other services like Glue and Elastic MapReduce (EMR) and many more, even things like Lake Formation if you're getting into enterprise data engineering. So I think that's number three covered. What have we got for our final exercise?

Luciano: So another idea could be still focusing on the realm of application development, more specifically in the realm of API development on AWS. We could still use the same cat breed CSV file that we mentioned in the previous idea, but this time, rather than just using it for data analytics, we could use it as a data source and build an API on top of it. So we could expose some of this data through a RESTful API. And one idea could be, OK, what kind of APIs can we expose? For instance, people might want to know what are all the different breeds of cats that are known, at least in this dataset.

So we could create an endpoint called /breeds. When you call it, you get this list with all the names of the different breeds. Then because this dataset has pictures, maybe you also want to list all the pictures for a specific breed. That could be funny. I don't know if you're trying to allow other people to build a mobile application where you can see which breed is the cutest or something like that.

Maybe you can display pictures and create a little game that way. So if you're trying to build the API behind it, an endpoint could be /breeds/{breedId}/pictures. And that should give you a list with all the pictures that are available in the dataset for that particular breed. And of course, you might also think how to make those APIs paginated. This is on you to decide exactly what the shape of the API will look like.
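One way the two endpoints and their pagination could look, sketched as plain Python functions with a tiny router. The in-memory `PICTURES` list, the field names, and the `limit`/`offset` query parameters are all assumptions for this example; a real implementation would read from a database and might prefer cursor-based pagination.

```python
# Hypothetical in-memory records standing in for the dataset.
PICTURES = [
    {"breed_id": "siamese", "breed": "Siamese", "url": "https://example.com/1.jpg"},
    {"breed_id": "siamese", "breed": "Siamese", "url": "https://example.com/2.jpg"},
    {"breed_id": "persian", "breed": "Persian", "url": "https://example.com/3.jpg"},
    {"breed_id": "siamese", "breed": "Siamese", "url": "https://example.com/4.jpg"},
]


def list_breeds():
    """Return the distinct breed names in alphabetical order."""
    return sorted({picture["breed"] for picture in PICTURES})


def list_pictures(breed_id, limit=2, offset=0):
    """Return one page of picture URLs for a breed, plus the next offset."""
    matching = [p["url"] for p in PICTURES if p["breed_id"] == breed_id]
    page = matching[offset:offset + limit]
    # next_offset is None once the last page has been reached.
    next_offset = offset + limit if offset + limit < len(matching) else None
    return {"items": page, "next_offset": next_offset}


def route(path, query=None):
    """Dispatch /breeds and /breeds/{breedId}/pictures; None means not found."""
    query = query or {}
    parts = [part for part in path.split("/") if part]
    if parts == ["breeds"]:
        return {"breeds": list_breeds()}
    if len(parts) == 3 and parts[0] == "breeds" and parts[2] == "pictures":
        return list_pictures(
            parts[1],
            limit=int(query.get("limit", 2)),
            offset=int(query.get("offset", 0)),
        )
    return None
```

A client would call `/breeds/siamese/pictures`, then repeat the call with `offset` set to the returned `next_offset` until it comes back as `None`.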

And finally, again, there could be another idea that you can just have a /random endpoint that just gives you a random picture and the details of that picture. So how can we build this API? We already mentioned API Gateway and Lambda, and this is definitely a very valid solution. But there is an opportunity here to experiment a little bit more. And you are not limited to REST. So why not try something like GraphQL? So maybe you can also think about AppSync, for instance, as a way to build an equivalent version of this API, but that exposes the data through GraphQL. In both cases, you still need to think about the data.

Where do we store all this data? Right. And it will be fine to store it in S3, but every single time there is a request, of course, you don't want to load a big multi-megabyte CSV and manipulate it in real time. You probably want something a little bit more structured so that you can respond to the APIs very, very quickly. So an idea here is why not use DynamoDB? So that could be an opportunity to try to figure out, OK, given a bunch of data, how do I store it in DynamoDB so that I can query it efficiently for the particular use cases that I have in mind?
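As one possible answer to that modelling question, here is a sketch that turns CSV records into DynamoDB-style items using a single-table layout. This is only one of several reasonable designs. The `PK`/`SK` naming, the key prefixes, and the `id` and `url` field names are assumptions for this example, not something taken from the dataset.

```python
def slugify(name):
    """Turn a breed name like 'Maine Coon' into an id like 'maine-coon'."""
    return name.strip().lower().replace(" ", "-")


def picture_item(record):
    """One item per picture, grouped under its breed's partition key."""
    return {
        "PK": f"BREED#{slugify(record['breed'])}",
        "SK": f"PICTURE#{record['id']}",
        "breed": record["breed"],
        "url": record["url"],
    }


def breed_item(breed):
    """One item per breed under a shared partition, to list all breeds."""
    return {"PK": "BREEDS", "SK": f"BREED#{slugify(breed)}", "name": breed}


def build_items(records):
    """Return picture items followed by one index item per distinct breed."""
    items = [picture_item(record) for record in records]
    breeds = sorted({record["breed"] for record in records})
    items.extend(breed_item(breed) for breed in breeds)
    return items
```

With this layout, listing breeds is a single Query on the `BREEDS` partition, and listing a breed's pictures is a Query on `BREED#<id>` with a `begins_with` condition on the sort key. DynamoDB's own pagination (the last evaluated key) can then back a paginated API.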

Or again, you can still think about, OK, I'm going to be a little bit more traditional, spin up an RDS, put the data there and then query through SQL. It's really up to you to decide how do you want to store the data and how do you want to consume it? And another thing that you could do is, again, try to think, how do I serve all the pictures? Because, of course, this is going to be the majority of your traffic. If people are using this API and then they want to eventually have access to all the pictures, you still need to be able to provide those pictures. There is a little bit of a shortcut there, because if you look at the CSV, in the fields that you have for every record, one of the fields is a public URL that is already on CloudFront.

Then you can just use that URL to provide access to the actual picture. But it might be interesting to try to think, OK, what if I had to do that myself? How do I actually expose this information? And again, you can just go down the route of, OK, I'm going to put all this data in S3, then I'm going to use CloudFront as a way to efficiently serve all the data. And then you can get links directly from CloudFront.

There are additional concerns that you might try to think about and see what kind of solutions are available on AWS. For instance, one would be authentication. What if you want to allow only authenticated users to consume this particular API? And of course, if you are hosting all of this, you will have to pay the bill on AWS. So maybe you want to offload some of that cost to your users. That's why they might be needing to have an authentication so that you can actually track the usage.

And it's actually really cool if you use API Gateway that you can easily create API keys and then you can create usage plans attached to those API keys. And that way you can make sure that a user is not abusing the system and making too many API calls that will result in an increased bill on your side. So you can experiment with all these ideas. Similarly, you can experiment with documentation. How do you serve documentation to the users? And again, API Gateway supports some degree of Swagger-based or OpenAPI-based documentation formats.

So you can try to experiment with those as well. And there are other topics like, for instance, can we do caching? This data is quite static at the end of the day. So maybe it makes sense to think about should we be using a layer of caching? And again, API Gateway has some options that you can explore. So I think this is just a very interesting project. And if you are coming into AWS as an application developer, I think it's very important to understand how do you build an API on AWS?

Because this is one of the most common topics as a developer that you will need to face when building projects on AWS. So I really recommend that you try to experiment with this idea if you're going down that path of learning AWS as a web application developer. So this is all we have. These are just four projects that you can try to experiment with if you want to learn more about AWS. Let us know if you like them. Let us know if you come up with some variations of these ideas or maybe if you don't like cats and you prefer other pets, definitely send us links for your projects if you end up implementing them, because we would be really, really curious to see them live and working.

And one last thing that I want to mention is that if all these projects are a little bit scary to you because you don't really know where to start and you want something a little bit more guided, like you want to see somebody actually building something on AWS to give you the confidence that you know which steps you need to follow and you have some ideas on how to go from zero to something actually working.

We actually did a series of live streams where we built an application that is like Dropbox Transfer or basically an application that allows you to upload files in S3, get back a link and then use that link to share the file with somebody else. This is an application we built live so you can see all the live recording. We will put a link on the description and that can be another thing that you can try to rebuild yourself as another exercise to learn more about AWS and some of the more common services that people use on AWS. So with that being said, thank you very much for being with us. Remember to like and subscribe and give us reviews and feedback and we will see you at the next episode.