Help us to make this transcription better! If you find an error, please
submit a PR with your corrections.
Eoin: The public cloud gives you amazing machine learning powers with a low barrier to entry. Once you know where to begin, you can quickly build solutions to process images, video, text, and audio, as well as data. Today, you will hear about all the available managed AI services on AWS that require zero machine learning expertise, services you can use to run custom models, some different use cases, and some of the things you need to consider before you do machine learning in production. My name is Eoin, I'm joined by Luciano, and this is the AWS Bites podcast.
Luciano: So, okay, when we're talking about machine learning, I'm always very confused because it's a very broad and loose term. So today, what kind of machine learning are we talking about? Let's give it somewhat of a definition.
Eoin: I think we're mainly focusing today on the latest generation of machine learning, so built on deep neural networks, or deep learning as it's called. So in the last 10 years, there's been a big leap forward in machine learning, mainly down to three things. One is the availability of massive amounts of data from internet-scale web and mobile applications out there. Another one is the availability of the cloud and scalable compute resources to do training and run machine learning. And the third one is the improvements in the machine learning algorithms themselves, and specifically the advancements in deep neural networks, which have allowed us to do all sorts of things like natural language recognition and image recognition. And those are the kinds of things that drive a lot of the machine learning services we see today, like Alexa, or maybe GitHub Copilot. DALL-E is another one which people might be familiar with now, which allows you to generate images from an arbitrary description.
Luciano: Yeah, no, that's super cool. One thing that I'm always fascinated about when talking about machine learning is that there seems to be like a very big and long process every time you want to come up with, I don't know, a new model that solves a specific problem. So what are the different steps or the different main areas that are always present when you want to do ML?
Eoin: Every machine learning expert or data scientist I talk to will say that preparation of data, so getting your data, preparing it, cleaning it, normalizing it, is at least 80% of the effort. So that's the first one. And then you have training, so creating the model from that data. That could be the bulk of the rest of your effort. And then the inference part itself, which is running your model in production, that's like 5% of the effort really. So it's kind of an 80/15/5 split. So there's a huge amount of effort required to train, including acquiring and preparing all your data. And that makes pre-trained managed services very appealing. I mean, specifically for me, because I'm not a machine learning expert. I'm very interested in the topic, but I'm always looking for managed services that take all that heavy lifting and the need to have very specialist skills away from me.
Luciano: Yeah, I suppose also the cost. Like when you put all this time into that particular phase, if you can just use that as a service, you're probably saving a lot of costs and a lot of time to production, right? So what kind of services do we find in AWS that give us that kind of experience? You don't need to think about models and preparing all the data upfront, but more as a user, I just need this feature. Just give me a service that is gonna do that well enough. And maybe I can fine tune a few things for my particular use case.
Eoin: There's a lot here, so let's run through them really quickly. Amazon Rekognition is one of the most popular, and that's the computer vision one. So if you've got images or videos, it can process those and recognize people, text, or objects within photographs. It's even got a feature that allows you to recognize celebrities in an image. So that works on images, video recordings, and also streaming video.
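To make that concrete, here's a minimal boto3 sketch of Rekognition label detection; the bucket and object names are just placeholders:

```python
import boto3

rekognition = boto3.client("rekognition")

# Detect up to 10 labels (objects, scenes, concepts) in an image stored in S3.
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-images-bucket", "Name": "photo.jpg"}},
    MaxLabels=10,
    MinConfidence=80.0,
)
for label in response["Labels"]:
    print(label["Name"], label["Confidence"])
```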
Then you have text-to-speech and speech-to-text. So Polly is the Amazon service that does text-to-speech, and Transcribe is the one that does speech-to-text. There are a couple of newer ones that are more business oriented. Forecast is about time series forecasting. So you can imagine a use case: if you have all of your historical sales data for your e-commerce platform, and you want to project and predict your sales for Q4 2022, you could use Forecast to help you do that.
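For a flavour of how simple these are to call, a minimal boto3 sketch for Polly text-to-speech; the voice and output file are arbitrary choices:

```python
import boto3

polly = boto3.client("polly")

# Synthesize a short phrase to an MP3 file.
response = polly.synthesize_speech(
    Text="Hello from AWS Bites!",
    OutputFormat="mp3",
    VoiceId="Joanna",
)
with open("speech.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```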
Of course, events can change that. I heard Mark, our fourTheorem machine learning expert, mention that at the start of the pandemic, a lot of people were using forecasting models like this, and they all became completely useless in the face of world events. Personalize is another one then. A lot of these services are based on amazon.com retail models that Amazon has trained on its own data, and Personalize is one of those.
So if you're browsing amazon.com, you often see product recommendations based on your browsing history and your purchase history. Personalize gives you the ability to do that within your own service. So if you're building the next version of Netflix and you want to do recommendations on video titles, that's the service you'd go for. Comprehend is one I've used quite a lot, which is for text analysis.
And that allows you to do named entity recognition. So if you've got a document and you're looking to identify the people, the places, the dates, the locations in that document, Comprehend will do that for you. And it'll also do sentiment analysis. So if you want to monitor social media feeds about your company and figure out if people are complaining or very happy and respond accordingly, you could use Comprehend for that.
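As a rough illustration, both entity detection and sentiment analysis are single boto3 calls; the sample text here is made up:

```python
import boto3

comprehend = boto3.client("comprehend")

text = "I love the new release, but the checkout page keeps crashing."

# Overall sentiment: POSITIVE, NEGATIVE, NEUTRAL, or MIXED, with scores.
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"], sentiment["SentimentScore"])

# Named entities: people, places, dates, organizations, and so on.
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
for entity in entities["Entities"]:
    print(entity["Type"], entity["Text"])
```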
If you want to do something more high-level on that theme, Lex is the chatbot one, which a lot of people might be familiar with. This is if you want to have an interactive chatbot in your mobile app or a webpage that's machine learning driven, because it'll understand the intent of what the user is trying to say and allow you to direct a conversation. So it's like an orchestration for an interaction between a customer and a robot representing your company. The last couple we'll mention here are Textract, which is very useful for document processing. So if you've got images or PDFs and you want to extract all the text out of them, or they might have tables and you want to extract those as structured data, Textract will do that for you. And then there's Translate. So Translate is for translating from one language to another.
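Again purely as a sketch, here's what Textract text extraction and Translate look like with boto3; the file name and sample sentence are invented:

```python
import boto3

textract = boto3.client("textract")
translate = boto3.client("translate")

# Extract lines of text from a local document image.
with open("invoice.png", "rb") as f:
    doc = textract.detect_document_text(Document={"Bytes": f.read()})
for block in doc["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])

# Translate a sentence from Italian to English.
result = translate.translate_text(
    Text="Ciao, come stai?",
    SourceLanguageCode="it",
    TargetLanguageCode="en",
)
print(result["TranslatedText"])
```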
Luciano: That's super interesting. But I suppose all these services are meant to work for all different kinds of industries, so there's probably a good-enough expectation in terms of results. But I assume that for more specific use cases, you probably want to fine-tune something. So what are the options there to adapt these services and be more accurate for specific use cases?
Eoin: Yes, some of those services allow you to cross-train with your own data. One example of that would be Rekognition again. So if you're processing images and you want to identify something that it doesn't recognize out of the box. Now, the default set of labels that it does recognize is quite large. You can go onto the website and download a CSV of all the labels of things it identifies. It's a long list.
But you can also add your own, depending on your own business case. One example of that, and I've seen a few companies try to do this: let's say you're a global consumer goods company, and you've got 1500 brands in your portfolio, and you want to monitor social media for people taking pictures of your product, and maybe commenting on your product. So you could take images you see on Twitter and pass them through to Rekognition, but train Rekognition to recognize your logo or images of your products, and then label them accordingly. Then you could maybe use Comprehend to sentiment-analyze the text, and if there's negative sentiment about your brands, you might route that through as a support query to the customer support department of the individual brand. Actually, in the book myself and Peter wrote, the AI as a Service book, we have examples that do something like that with Comprehend for social media sentiment analysis. There are a few other use cases like that in the book too. So we'll link to that in the show notes.
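If you went down that road, the Rekognition Custom Labels flavour of the call might look like this minimal sketch; the project ARN, bucket, and image names are all hypothetical, and you'd need a trained and running Custom Labels model first:

```python
import boto3

rekognition = boto3.client("rekognition")

# ProjectVersionArn points at a Custom Labels model you have trained and started.
response = rekognition.detect_custom_labels(
    ProjectVersionArn=(
        "arn:aws:rekognition:eu-west-1:123456789012:"
        "project/brand-logos/version/brand-logos.2022/1234567890"
    ),
    Image={"S3Object": {"Bucket": "social-media-images", "Name": "tweet-photo.jpg"}},
    MinConfidence=70.0,
)
for label in response["CustomLabels"]:
    print(label["Name"], label["Confidence"])
```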
Luciano: Yeah, that's super cool. And I suppose, being AWS, one of the cool things is that all these services are probably very well integrated with the whole AWS ecosystem. I'm going to assume that you just have, as part of the SDK, access to all these different services, so you can, for instance, call different features of a service from a Lambda or from some other compute service. Is that the case? Or is there something else to be aware of there?
Eoin: Yeah, exactly. That's how you use it. It's all through the SDK. Of course, many of these services have console access if you need to do very ad hoc workflows with them, but the SDK is the way to go really if you're going to integrate it into any kind of application workflow. And it's then a very good fit for serverless, because you can imagine that if you've got images or data arriving in S3, and you want to respond to that and analyze it, things like Lambda and Step Functions really help you to stitch that all together very well without having to put in a load of additional compute infrastructure.
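A hedged sketch of that pattern: a Lambda handler triggered by S3 uploads that passes each new image to Rekognition (the bucket setup and event wiring are not shown):

```python
import boto3

rekognition = boto3.client("rekognition")

def handler(event, context):
    # S3 put events carry the bucket and object key in the Records list.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        labels = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=5,
        )
        print(key, [label["Name"] for label in labels["Labels"]])
```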
Luciano: Awesome. What if instead you want to actually go really into the depths and build your own models or do more advanced research and more of your own ML? What kind of tools are worth considering in the AWS ecosystem? Tools or services, of course.
Eoin: You have a lot of options there outside of AWS and on AWS, but of course, when you start getting into custom models and training, then you start to think about large amounts of compute and also GPUs. So there is SageMaker, and SageMaker is difficult enough to comprehend when you're coming into it for the first time because it has a large number of features with very confusing names. So maybe we'll try and digest that a little bit in a somewhat clear way, and we can start with notebooks.
So anyone used to data science and machine learning development is used to working with notebooks. That could be Python Jupyter notebooks; you also have RStudio in SageMaker now. I don't know what that's like, but I've used the Jupyter notebooks version and it works pretty well. You also now have SageMaker Studio, which is like notebooks, but it has its own domain and account system.
So you can actually use it inside of your AWS accounts. It's a bit like Google Colab. A newer one is SageMaker Canvas, which is their attempt to build a no-code machine learning tool, and this is mainly concerned with processing tabular data. From what I saw when it first came out, it's still a little bit limited in what it can do and its feature set, but the idea is good, right?
Eventually you want people who have business domain knowledge, but no machine learning knowledge, to be able to train their own models. If you are training, then a big part of the job, as we mentioned, 80% of the effort, is in preparing data. So there is a service in SageMaker called Ground Truth, and Ground Truth will allow you to label that data and manage all of the human interaction that's required to take individual items in your data set, label them, mark them as labeled, and quality-control your labeling, also using Mechanical Turk if you want to outsource a lot of that effort, and with some machine learning assistance as well.
It can even do things like creation of synthetic data samples within your data set. Then when it comes to training itself, I would say if you're a really serious machine learning developer, probably the main benefit of SageMaker is the training infrastructure, because it will allow you to build clusters of GPU containers that can be used to train your model. And then it can also help you to tune your model.
So with deep learning, the typical process is hyperparameter tuning, where you're constantly tweaking different configuration parameters of your model, rerunning your training, then testing against your test data to see if your model is improving or disimproving. And the SageMaker training platform is designed to make all of that a lot more automated compared to the typical manual flow.
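As a rough sketch of what that automation looks like with the SageMaker Python SDK (the training image, role ARN, metric regex, and data location are all placeholder assumptions):

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

# A generic training job definition; assumes the container logs an accuracy metric.
estimator = Estimator(
    image_uri="<your-training-image-uri>",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
)

# Let SageMaker search the learning rate range across 20 training jobs.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    metric_definitions=[
        {"Name": "validation:accuracy", "Regex": "accuracy=([0-9\\.]+)"}
    ],
    hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-4, 1e-1)},
    max_jobs=20,
    max_parallel_jobs=4,
)

tuner.fit({"train": "s3://my-bucket/training-data/"})
```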
The most important thing at the end of it is getting it into users' hands, and that's where inference comes in. That's basically when you run the model in production: you pass in the data and you get results. "Is there a cat in this picture?" being a canonical example, maybe. And this is also container based. You deploy your model, you deploy a Python wrapper file, and you get back an HTTP endpoint.
And you can access that publicly, maybe from your web application, or you can access it just from another system within your microservices architecture, however you're building your application. That was typically a provisioned mode, where you pick a size for your infrastructure, whether you need GPUs or not, you run an endpoint, and 20 minutes later it becomes available and you can call the HTTP endpoint. But now they also support a serverless mode. So you deploy your model and then SageMaker will scale the infrastructure for you based on the incoming traffic.
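Calling a deployed endpoint is then a single SDK call, whether it's provisioned or serverless; a minimal sketch with an invented endpoint name, where the payload shape depends entirely on your model's wrapper code:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Send a JSON payload to the endpoint and parse the JSON response.
response = runtime.invoke_endpoint(
    EndpointName="my-model-endpoint",
    ContentType="application/json",
    Body=json.dumps({"image_url": "s3://my-bucket/cat.jpg"}),
)
result = json.loads(response["Body"].read())
print(result)
```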
Luciano: Awesome. So one thing that is interesting is that the work we do at fourTheorem is not so much on the side of doing ML research and creating new models, but more on how to take existing models and put them in production in the most optimal way. So generally we think about, I don't know, what kind of AWS infrastructure do we need to use for data pipelines, model deployment, recording results? What kind of costs are we talking about? Can we optimize anything to save money and not overspend on all this infrastructure? And yeah, we generally start by taking an existing model and thinking, how do we put this on AWS in the best possible way? So do you have any comments or suggestions on the options or the different topics that we generally explore with these kinds of use cases?
Eoin: Yeah, it's very common that people have these really interesting models, and the next step is like, okay, how do we integrate this, how do we give this to our customers? And you end up with some prototype that works on somebody's laptop or in a data center somewhere. And the question is, okay, how do we make this production grade? And it's really about taking typical software engineering best practices and applying them to this machine learning application. So just thinking about, okay, how do you manage your data?
What's the deployment pipeline for your model? How do you manage the different versions of your model, and the process for going from one version to another, testing and rolling back? What do you do with your results? One good thing you might want to put into practice is that every time you do inference, you record that result and store it with the data, so that you can feed it back into future training exercises.
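One minimal sketch of that feedback loop, storing every prediction next to a pointer to its input; the bucket name and record layout are illustrative assumptions:

```python
import json
import time
import uuid

import boto3

s3 = boto3.client("s3")

def record_inference(input_key: str, prediction: dict) -> None:
    """Persist one prediction so it can feed future training runs."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "input": input_key,        # pointer to the raw input object in S3
        "prediction": prediction,  # what the model said
    }
    s3.put_object(
        Bucket="inference-results",
        Key=f"results/{record['id']}.json",
        Body=json.dumps(record),
    )
```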
You mentioned performance optimization and cost optimization. These are typical Well-Architected things, but you're applying them to machine learning as well. So looking at the compute infrastructure: how scalable is it, how manageable is it, what observability do you need, and how do you optimize that cost-versus-performance trade-off? And yeah, it's quite typical that people give us a model, and it's like a PyTorch model or something trained in TensorFlow, and then we're saying, okay, well, what's the best workflow for this? Do we need to build training infrastructure, or just inference infrastructure? And if it's inference infrastructure, then there are quite a few compute options actually. And surprisingly, maybe we can discuss this, but SageMaker inference is something we've used quite a lot, but it's not our number one go-to service for inference.
Luciano: Yeah, that's super interesting. What about other considerations? I know that every time we talk about AI and ML, ethics, for instance, is one of those big things that you need to address somehow. What does AWS do from that perspective? Or maybe, what do customers need to be aware of when using AWS?
Eoin: Yes, if you've been reading about machine learning, you've probably seen many, many instances of bias problems in all aspects of machine learning. And just because you choose a managed service doesn't mean you don't have any responsibility there. There are a couple of interesting studies into this that we can also link in the show notes. There was one from MIT that looked into Rekognition and demonstrated gender and ethnic bias in its accuracy in doing person recognition.
There's also the fact that Rekognition is actually used in law enforcement and the security industry in a number of places. And back in 2020, Amazon announced a moratorium, suspending Rekognition for police use in the States for one year, because they were asking for the legislation to catch up, to make sure that the technology would be applied in an ethical way. So these kinds of things show that we have a long way to go before we can just deploy these things and use them, particularly when it relates to classifying or labeling people.
So this is everybody's responsibility. Just because you're using a managed service doesn't mean you can say it's somebody else's problem. What are the other considerations? Beyond the ethical one, which is probably the most important one, we mentioned compute, right? And I said SageMaker inference isn't our number one. The reason for that is that, as I mentioned, it can take like 20 minutes for your endpoint to be up and running.
It can also be quite expensive. Our number one go-to service for inference is actually Lambda, because if your model can fit in 10 gigabytes of space or on an EFS volume, and can run with less than 10 gigabytes of RAM, I would choose Lambda every time, really, because it scales so quickly and is so responsive as your inference workload rises and falls. It's much better compared to SageMaker, much easier to set up, and much easier to deploy.
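As a hedged illustration of that approach: a Lambda handler that loads a model once per execution environment and serves predictions. The ONNX runtime, the /mnt/models EFS mount path, and the input shape are all assumptions, not a recommendation from the episode:

```python
import numpy as np
import onnxruntime  # packaged in the deployment artifact or container image

# Load the model outside the handler so warm invocations skip the load.
# /mnt/models assumes an EFS access point mounted on the function.
session = onnxruntime.InferenceSession("/mnt/models/classifier.onnx")
input_name = session.get_inputs()[0].name

def handler(event, context):
    # Expect a flat list of numeric features in the event payload.
    features = np.array(event["features"], dtype=np.float32).reshape(1, -1)
    output = session.run(None, {input_name: features})
    return {"prediction": output[0].tolist()}
```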
So I've just had a way better experience deploying machine learning to Lambda. As a developer, for the customer, in production, for performance and scalability, it's just always ticking all the boxes. So SageMaker will be further down the list. Another thing then, when you're looking at cost and performance is maybe look at edge machine learning rather than doing all of the inference in the cloud.
So if you have a mobile app, maybe it's better to do the machine learning on the mobile device and leverage the consumer's computing power rather than your computing power. Then you get some cost benefits, and performance benefits, because you don't have any round trip to the cloud. And there's also a data privacy element there as well, because the customer's data doesn't have to leave their device. So there is SageMaker Edge, another SageMaker service, which allows you to run these models on the edge, as well as a lot of alternatives; all the deep learning frameworks have a mobile equivalent. I think that's pretty much it. And then going back to the typical software engineering practices, the considerations before you go into production are all of those continuous deployment best practices, change management, observability, governance, reliability. All of those Well-Architected pillars have to be applied to machine learning as well.
Luciano: Okay, the last question that I have is: up to which point is it convenient to go with managed services compared to building everything from scratch ourselves? So obviously talking about pricing: what's the price equation? Is it convenient to just go with AWS because the price is reasonable for most use cases? Or is the price sometimes expensive enough that you might want to consider spending more time building your own thing, because in the long run it's going to be cheaper?
Eoin: Well, it depends on your use case. Maybe we can give two examples. Let's say example one is you're a technology provider that is selling cutting-edge devices that do fox detection for smallholders with chickens. So you have some device with a camera that you put in the chicken run, and if it spots a fox in the field of vision, it's going to maybe play some sort of sound that will deter the fox.
So if you connect that up to Amazon Rekognition, you can imagine that if you're processing a number of images and a number of foxes being detected every night, that's not too bad, right? Your volume is low, and each smallholder is paying a subscription, so that would probably cover your Rekognition cost. The other one is if you're airport security and you're scanning all of the incoming passengers in real time on a video stream, right?
Then you're running multiple images per second. Now, a million images in Rekognition is about a thousand dollars. So for your chicken cam, that's probably very achievable, but for your airport security case, it's probably time to say, look, now we should run our own infrastructure and do machine learning somewhere more cost-effective, because with Rekognition at that scale, the use case would want to be very valuable to justify a thousand dollars per million images.
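Some back-of-the-envelope arithmetic on those two examples; the image rates are invented for illustration, and the thousand-dollars-per-million figure is the rough number from this discussion, not a current list price:

```python
price_per_million_usd = 1000.0  # approximate Rekognition image analysis price

# Chicken cam: say one image a minute, all night (~12 hours), for a month.
chicken_cam_images = 1 * 60 * 12 * 30             # 21,600 images
print(chicken_cam_images / 1e6 * price_per_million_usd)   # ~21.6 USD/month

# Airport security: five frames a second, around the clock, for a month.
airport_images = 5 * 60 * 60 * 24 * 30            # 12,960,000 images
print(airport_images / 1e6 * price_per_million_usd)       # ~12,960 USD/month
```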
When you're talking about your own inference, like SageMaker versus Lambda: with SageMaker, if you look at the per-second price, it's like four zeros and a two, some tiny fraction of a cent, but that's about a thousand times more expensive than Lambda from what I can see in the per-second cost. Now, a thousand times sounds extreme, but remember that a single SageMaker endpoint can process multiple images concurrently, while a Lambda processes one event at a time. So it's not exactly comparing apples to apples, but again, it's another reason to go with Lambda, or even Fargate if you want to use containers. Unless you need a GPU, which you don't always need for inference; that will come down to your performance requirements. But if you don't need a GPU, you don't need SageMaker.
Luciano: Right, is there any resource that we can suggest to people as a final thing?
Eoin: Yeah, so myself and Peter Elger wrote the book AI as a Service, and it's really all about that topic and how to use these managed services in a serverless way. So we'll link to that book. And I think the YouTube channel from Julien Simon is a really good one for people who are trying to explore the space, because he worked at AWS for a long time, and he has lots of really practical, use-case-driven scenarios where he shows you how to use these services. And it's very technical. He gives you a very unbiased opinion; it's not full of AWS spin, it's really very factual and honest. He's since left AWS, but he's still doing this kind of content, using SageMaker and reviewing these services. So I think his YouTube channel is a really good resource for anyone doing ML on AWS.
Luciano: Awesome, and I think with that, we have covered everything we wanted to cover. Let us know what you think in the comments. And yeah, we look forward to knowing if you use any of these services, if you found any particular issue that is worth highlighting, or if you're just having fun and learning a lot of things. Please share with us what you're learning and what are maybe the next topics you want us to cover. Until then, see you in the next episode.