Retrieval Augmented Generation (RAG) is a state-of-the-art approach to building question answering systems that combines the strengths of retrieval and foundation models (FMs). A RAG system first retrieves relevant information from a large text corpus and then uses an FM to synthesize an answer grounded in the retrieved information.
An end-to-end RAG solution involves several components, including a knowledge base, a retrieval system, and a generation system. Building and deploying these components can be complex and error-prone, especially when dealing with large-scale data and models.
In this post, we demonstrate how to use Amazon Bedrock Knowledge Bases and AWS CloudFormation to automate the deployment of an end-to-end RAG solution, enabling organizations to set up powerful RAG systems quickly and easily.
Solution overview
This solution uses Amazon Bedrock Knowledge Bases to provide an automated, end-to-end deployment of the RAG workflow, using AWS CloudFormation to set up the required resources, including:
- AWS Identity and Access Management (IAM) role
- Amazon OpenSearch Serverless Collections and Indexes
- A knowledge base with relevant data sources
The RAG workflow allows you to use document data stored in an Amazon Simple Storage Service (Amazon S3) bucket and integrate it with the powerful natural language processing capabilities of FMs provided by Amazon Bedrock. The solution simplifies the setup process, allowing you to quickly deploy and start querying your data with your FM of choice.
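At its core, the workflow's retrieval step runs a vector search over your ingested documents. The following is a minimal sketch of that step using the AWS SDK for Python (Boto3); the knowledge base ID and query text are hypothetical placeholders, and after deployment you would use the ID of the knowledge base this solution creates:

```python
import boto3

# Runtime client for querying Amazon Bedrock knowledge bases
client = boto3.client("bedrock-agent-runtime")

# "EXAMPLEKBID" is a placeholder; use the ID of your deployed knowledge base
response = client.retrieve(
    knowledgeBaseId="EXAMPLEKBID",
    retrievalQuery={"text": "What does the onboarding guide cover?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 3}
    },
)

# Each result includes the matched text chunk and its source document in S3
for result in response["retrievalResults"]:
    print(result["content"]["text"][:200])
    print(result["location"]["s3Location"]["uri"])
```

The generation half of the workflow builds on the same API family; a combined retrieve-and-generate example appears in the testing section later in this post.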
Prerequisites
To implement the solution provided in this post, you will need:
- An active AWS account and familiarity with FMs, Amazon Bedrock, and Amazon OpenSearch Serverless.
- An S3 bucket where your documents are stored in a supported format (.txt, .md, .html, .doc/.docx, .csv, .xls/.xlsx, .pdf).
- The Amazon Titan Embeddings G1 – Text model enabled in Amazon Bedrock. You can confirm this on the Model access page of the Amazon Bedrock console, where the model's access status shows as Access granted when it's enabled (a scripted check follows this list).
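You can also verify model access programmatically. The following is a minimal sketch, assuming Boto3 with default credentials; it attempts a small embedding call and reports whether access has been granted:

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime")

try:
    # Titan Embeddings G1 - Text expects a JSON body with "inputText"
    resp = runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": "hello"}),
    )
    embedding = json.loads(resp["body"].read())["embedding"]
    print(f"Model access granted; embedding dimension: {len(embedding)}")
except runtime.exceptions.AccessDeniedException:
    print("Enable the model on the Model access page of the Amazon Bedrock console.")
```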
Configure the solution
After you complete the prerequisite steps, you’re ready to set up the solution.
- Clone the GitHub repository that contains the solution files.
- Navigate to the solution directory.
- Run the deploy.sh script, which creates a deployment bucket, prepares the CloudFormation template, and uploads the prepared template and required artifacts to the deployment bucket.
If you pass a bucket name as an argument to deploy.sh, the script creates the deployment bucket with that name; otherwise, it uses the default name format e2e-rag-deployment-${ACCOUNT_ID}-${AWS_REGION}.
For example, you can run bash deploy.sh in a terminal (such as on an Amazon SageMaker notebook instance) to create the deployment bucket in your account.
- After the script completes, note the S3 URL for main-template-out.yml.
- On the AWS CloudFormation console, create a new stack.
- For Template source, select Amazon S3 URL and enter the URL you copied earlier.
- Choose Next.
- Enter a stack name and specify the RAG workflow details according to your use case, then choose Next.
- On the next page, leave all other settings at their defaults and choose Next.
- Review the stack details and select the acknowledgement check box.
- Choose Submit to start the deployment process.
You can monitor the stack deployment progress in the AWS CloudFormation console.
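If you prefer to script these console steps, the following is a minimal sketch using Boto3. The stack name is a hypothetical placeholder, and the template URL assumes the script uploaded main-template-out.yml to the bucket root; adjust it to match the S3 URL you noted from the script output:

```python
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.session.Session().region_name

# Default deployment bucket name used by deploy.sh when no name is passed
bucket = f"e2e-rag-deployment-{account_id}-{region}"
template_url = f"https://{bucket}.s3.{region}.amazonaws.com/main-template-out.yml"

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="e2e-rag-stack",              # hypothetical stack name
    TemplateURL=template_url,
    # Add Parameters=[...] here for the RAG workflow details your use case needs
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the stack creates IAM roles
)

# Deployment typically takes 7-10 minutes; wait for it to finish
cfn.get_waiter("stack_create_complete").wait(StackName="e2e-rag-stack")
```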
Test the solution
When the deployment is complete (which may take 7–10 minutes), you can begin testing the solution.
- On the Amazon Bedrock console, navigate to the knowledge base you created.
- Choose Sync to start the data ingestion job.
- After the data synchronization is complete, select the FM you want to use for retrieval and generation (you must be granted access to this FM in Amazon Bedrock before you can use it).
- Start querying your data using natural language queries.
That’s it! You can now work with your documents using a RAG workflow powered by Amazon Bedrock.
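Programmatically, the sync and query steps map to two Bedrock API calls. The following is a minimal sketch with Boto3; the knowledge base ID, data source ID, and model ARN are placeholders for values from your own deployment:

```python
import boto3

agent = boto3.client("bedrock-agent")            # build-time operations
runtime = boto3.client("bedrock-agent-runtime")  # query-time operations

KB_ID = "EXAMPLEKBID"  # placeholder: your knowledge base ID
DS_ID = "EXAMPLEDSID"  # placeholder: your data source ID

# Equivalent of choosing Sync on the console: ingest documents from S3
agent.start_ingestion_job(knowledgeBaseId=KB_ID, dataSourceId=DS_ID)
# In practice, poll get_ingestion_job until the job status is COMPLETE

# Ask a natural language question; Bedrock retrieves relevant chunks and
# generates an answer with an FM you have been granted access to
answer = runtime.retrieve_and_generate(
    input={"text": "Summarize the key points of our product documentation."},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KB_ID,
            # Placeholder model ARN; use any text-generation FM you enabled
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        },
    },
)
print(answer["output"]["text"])
```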
Clean up
To avoid incurring future charges, delete the resources used by this solution.
- On the Amazon S3 console, manually delete the contents of the bucket that you created for the template deployment, and then delete the bucket.
- On the AWS CloudFormation console, choose Stacks in the navigation pane, select the main stack, and choose Delete.
When you delete your stack, the knowledge base you created is also deleted.
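You can script the cleanup as well. The following is a minimal sketch with Boto3; the bucket and stack names are placeholders for the ones you used:

```python
import boto3

s3 = boto3.resource("s3")
cfn = boto3.client("cloudformation")

# Placeholder bucket name; use your actual deployment bucket
bucket = s3.Bucket("e2e-rag-deployment-111122223333-us-east-1")
bucket.objects.all().delete()  # a bucket must be empty before deletion
bucket.delete()

# Deleting the stack also deletes the knowledge base it created
cfn.delete_stack(StackName="e2e-rag-stack")  # placeholder stack name
cfn.get_waiter("stack_delete_complete").wait(StackName="e2e-rag-stack")
```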
Conclusion
In this post, we presented an automated solution for deploying an end-to-end RAG workflow using Amazon Bedrock Knowledge Bases and AWS CloudFormation. By using AWS services and preconfigured CloudFormation templates, you can quickly set up a powerful question answering system without the complexity of building and deploying each component of the RAG application. This automated deployment approach not only saves time and effort, but also provides a consistent and repeatable setup, allowing you to focus on using the RAG workflow to extract valuable insights from your data.
Give it a try, see for yourself how it can streamline your RAG workflow deployment and increase efficiency, and let us know your feedback!
About the Authors
Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping enterprises innovate with generative AI. He specializes in generative AI, machine learning, and system design. He has successfully delivered cutting-edge AI/ML-powered solutions that solve complex business problems across industries, optimizing efficiency and scalability.
Yanyan Chang is a Senior Generative AI Data Scientist and Generative AI Specialist at Amazon Web Services, working on cutting-edge AI/ML technologies and helping customers achieve their desired outcomes with generative AI. He has a keen interest in exploring new areas in the field and is always striving to push the boundaries. Outside of work, he loves to travel, work out, and explore new things.
Mani Kanuja is a technical lead for generative AI specialists, author of the book “Applied Machine Learning and High Performance Computing on AWS,” and a board member of the Women in Manufacturing Education Foundation. She leads machine learning projects in a variety of areas, including computer vision, natural language processing, and generative AI. She has spoken at internal and external conferences, including AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she enjoys long runs along the beach.