This post was co-authored by Michael Shaul and Sasha Korman of NetApp.
Generative artificial intelligence (AI) applications are commonly built using a technique called Retrieval Augmented Generation (RAG). RAG provides the underlying foundation model (FM) with access to additional data that wasn't available during training. This data is used to enrich generative AI prompts, delivering more context-specific and accurate responses without continually retraining the FM, while also improving transparency and minimizing hallucinations.
In this post, we present a solution using Amazon FSx for NetApp ONTAP with Amazon Bedrock to provide a RAG experience for generative AI applications on AWS by ingesting enterprise-specific unstructured user file data into Amazon Bedrock in an easy, fast, and secure way.
Our solution uses an FSx for ONTAP file system as the source of unstructured data and continuously populates an Amazon OpenSearch Serverless vector database with users' existing files, folders, and their associated metadata. Generative AI prompts are then enriched through Amazon Bedrock APIs with enterprise-specific data retrieved from the OpenSearch Serverless vector database, enabling RAG scenarios.
When using RAG to develop generative AI applications such as Q&A chatbots, customers are also concerned about maintaining data security and ensuring that end users can't query information from unauthorized data sources. Our solution uses FSx for ONTAP to enable users to extend their current data security and access mechanisms to enhance model responses from Amazon Bedrock. We use FSx for ONTAP as a source of relevant metadata, specifically the security access control list (ACL) configurations of users attached to files and folders, and populate that metadata into OpenSearch Serverless. By combining access control operations with file events that notify the RAG application of new and changed data on the file system, our solution shows how FSx for ONTAP enables Amazon Bedrock to ensure that embeddings from authorized files are only available to the specific users connecting to our generative AI application.
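To make this permission model concrete, the following is a minimal sketch of what one indexed chunk could look like in OpenSearch Serverless, with the file's ACL SIDs carried as metadata. The field names and values are our own illustration, not the solution's actual schema.

```python
# Hypothetical shape of one indexed chunk; field names are illustrative, not
# the solution's actual schema.
chunk_document = {
    "text": "To create an FSx for ONTAP file system, open the Amazon FSx console...",
    "vector": [0.013, -0.072, 0.154],  # truncated; Titan Text Embeddings vectors are 1,536-dimensional
    "metadata": {
        "source_path": "/mnt/fsx/data/fsx-ontap-user-guide.pdf",
        "acl_sids": [
            "S-1-5-21-1234567890-123456789-1234567890-512",  # example admin SID
            "everyone",
        ],
    },
}
```

At query time, the retrieval layer can then filter search results to chunks whose acl_sids list contains the calling user's SID.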
AWS serverless services provide automatic scaling, built-in high availability, and a pay-per-use model, making it easier to focus on building generative AI applications. Event-driven computing with AWS Lambda is well-suited for compute-intensive, on-demand tasks such as document embedding and flexible large language model (LLM) orchestration, and Amazon API Gateway provides an API interface that allows pluggable frontends and event-driven invocation of LLMs. Our solution also shows how to use API Gateway and Lambda to build a scalable, automated, API-driven serverless application layer on top of Amazon Bedrock and FSx for ONTAP.
Solution overview
This solution provisions an FSx for ONTAP Multi-AZ file system with a storage virtual machine (SVM) joined to an AWS managed Microsoft AD domain. An OpenSearch Serverless vector search collection provides scalable, high-performance similarity search capabilities. You use an Amazon Elastic Compute Cloud (Amazon EC2) Windows server as an SMB/CIFS client to the FSx for ONTAP volume and configure data sharing and ACLs for the SMB shares in the volume. You use this data and these ACLs to test permission-based access to embeddings in a RAG scenario with Amazon Bedrock.
The embedding container component of our solution is deployed on an EC2 Linux server and mounted as an NFS client on the FSx for ONTAP volume. It periodically migrates existing files and folders, along with their security ACL configurations, to OpenSearch Serverless, populating the index in the OpenSearch Serverless vector search collection with company-specific data (and associated metadata and ACLs) from the NFS share on the FSx for ONTAP file system.
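The following is a minimal Python sketch of what this ingestion flow looks like conceptually: walk the NFS mount, chunk each file, embed the chunks with the Amazon Titan Embeddings model, and index them into OpenSearch Serverless with ACL metadata. The mount path, index name, chunking parameters, collection endpoint, and the get_acl_sids helper are our own placeholders, not the solution's actual code.

```python
import json
import os

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

REGION = "us-east-1"  # assumption; use your deployment Region
bedrock = boto3.client("bedrock-runtime", region_name=REGION)

# OpenSearch Serverless client signed for the "aoss" service; the collection
# endpoint below is a placeholder.
auth = AWSV4SignerAuth(boto3.Session().get_credentials(), REGION, "aoss")
aoss = OpenSearch(
    hosts=[{"host": "your-collection-id.us-east-1.aoss.amazonaws.com", "port": 443}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)


def embed(text: str) -> list[float]:
    """Create a vector embedding with the Amazon Titan Embeddings model."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]


def get_acl_sids(path: str) -> list[str]:
    # Placeholder: how the solution reads Windows ACLs from the FSx for ONTAP
    # file system is not shown in this post.
    return ["everyone"]


def chunks(text: str, size: int = 1000, overlap: int = 100):
    for start in range(0, len(text), size - overlap):
        yield text[start:start + size]


# Walk the NFS mount (path is illustrative), embed each chunk, and index it
# together with the file's ACL SIDs as metadata. Assumes plain-text files.
for root, _dirs, files in os.walk("/mnt/fsx/data"):
    for name in files:
        path = os.path.join(root, name)
        with open(path, errors="ignore") as f:
            text = f.read()
        sids = get_acl_sids(path)
        for piece in chunks(text):
            aoss.index(index="rag-index", body={
                "text": piece,
                "vector": embed(piece),
                "metadata": {"source_path": path, "acl_sids": sids},
            })
```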
The solution implements a RAG retrieval Lambda function that enables RAG on Amazon Bedrock by using the company-specific data and associated metadata (including ACLs) retrieved from the OpenSearch Serverless index populated by the embedding container, and uses it to power generative AI prompts through the Amazon Bedrock API. The RAG retrieval Lambda function also stores the conversation history of user interactions in an Amazon DynamoDB table.
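A minimal sketch of this retrieval path follows; the index name, field names, payload shape, and DynamoDB table name are our illustration, not the solution's actual handler. It filters the k-NN search by the caller's SID, assembles the retrieved chunks into a prompt, invokes Anthropic Claude 3 Sonnet through the Bedrock API using its documented Messages request format, and records the exchange in DynamoDB.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
history = boto3.resource("dynamodb").Table("chat-history")  # table name is illustrative


def handler(event, context):
    body = json.loads(event["body"])
    prompt, user_sid, session_id = body["prompt"], body["metadata"], body["session_id"]

    # k-NN search restricted to chunks whose ACL metadata contains the caller's
    # SID. `aoss` and `embed` are the OpenSearch Serverless client and Titan
    # embedding helper from the ingestion sketch above.
    hits = aoss.search(index="rag-index", body={
        "size": 4,
        "query": {"bool": {
            "must": [{"knn": {"vector": {"vector": embed(prompt), "k": 4}}}],
            "filter": [{"term": {"metadata.acl_sids": user_sid}}],
        }},
    })["hits"]["hits"]
    context_text = "\n\n".join(hit["_source"]["text"] for hit in hits)

    # Invoke Anthropic Claude 3 Sonnet using the Bedrock Messages request format.
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{
                "role": "user",
                "content": f"Answer using this context:\n{context_text}\n\nQuestion: {prompt}",
            }],
        }),
    )
    answer = json.loads(response["body"].read())["content"][0]["text"]

    # Persist the exchange so follow-up prompts can use conversation history.
    history.put_item(Item={"session_id": session_id, "prompt": prompt, "answer": answer})
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```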
End users interact with the solution by sending natural language prompts either through the chatbot application or directly through the API Gateway interface. The chatbot application container is built using Streamlit and fronted by an AWS Application Load Balancer (ALB). When a user sends a natural language prompt to the chatbot UI through the ALB, the chatbot container interacts with the API Gateway interface, which invokes the RAG retrieval Lambda function to fetch the user's response. Users can also send prompt requests directly to API Gateway to obtain a response. We demonstrate permission-based access to the RAG documents by explicitly retrieving the user's SID and passing that SID in the chatbot or API Gateway request; the RAG retrieval Lambda function then matches the SID against the Windows ACLs configured for the document. As an additional authentication step in a production environment, you can also authenticate the user against an identity provider and then match the user against the permissions configured for the document.
The following diagram shows the end-to-end flow of our solution. First, we use FSx for ONTAP to set up data shares and ACLs, which the embedding container periodically scans. The embedding container splits documents into chunks and creates vector embeddings from those chunks using the Amazon Titan Embeddings model. It then stores the vector embeddings, along with associated metadata, in a vector database by indexing a vector collection in OpenSearch Serverless.
The following architecture diagram shows the different components of our solution.
Prerequisites
Complete the following prerequisite steps:
- Make sure you have model access in Amazon Bedrock. This solution uses Anthropic Claude 3 Sonnet on Amazon Bedrock.
- Install the AWS Command Line Interface (AWS CLI).
- Install Docker.
- Install Terraform.
Deploy the solution
You can download the solution from this GitHub repository: clone the repository and use the provided Terraform templates, which provision all the components with their required configuration.
- Clone this solution repository.
- From the terraform folder, deploy the entire solution with Terraform.
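The following is the typical Terraform workflow for this step; the folder name comes from the repository layout described above:

```
cd terraform
terraform init
terraform apply
```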
This process can take 15-20 minutes to complete. When it's finished, the output of the terraform commands should look similar to the following:
Load data and set permissions
To test the solution, you use the EC2 Windows server (ad_host) as an SMB/CIFS client to the FSx for ONTAP volume to share sample data and set user permissions, which the embedding container component of the solution then uses to populate the OpenSearch Serverless index. Follow these steps to mount the FSx for ONTAP SVM data volume as a network drive, upload data to the shared network drive, and set permissions based on Windows ACLs:
- Obtain the ad_host instance DNS from the output of your Terraform template.
- Go to AWS Systems Manager Fleet Manager in the AWS console, locate the ad_host instance, and follow the instructions to log in with Remote Desktop. Use the domain admin user bedrock-01\Admin and obtain the password from AWS Secrets Manager using the fsx-secret-id secret ID from the output of your Terraform template.
- To mount an FSx for ONTAP data volume as a network drive, under This PC, select (right-click) Network and choose Map Network Drive.
- Choose a drive letter and use the FSx for ONTAP share path for the mount (\\<svm>.<domain>\c$\<volume-name>).
- Upload the Amazon Bedrock User Guide to the shared network drive and set its permissions to the admin user only (make sure you disable inheritance under Advanced).
- Upload the Amazon FSx for ONTAP User Guide to the shared drive and make sure its permissions are set to Everyone.
- On the ad_host server, open a command prompt and run a command to obtain the SID for the admin user (see the note after this list).
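The exact command isn't reproduced in this post; one standard way to retrieve a user's SID on Windows (assuming the account is named Admin) is:

```
wmic useraccount where name='Admin' get sid
```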
Use a chatbot to test permissions
To test permissions using the chatbot, obtain the lb-dns-name URL from the output of your Terraform template and access it through a web browser.
For the first prompt query, ask a question that is answered in the publicly accessible FSx for ONTAP User Guide. In our scenario, we asked, "How do I create an FSx for ONTAP file system?" and the model replied in the chat window with detailed instructions and source attribution for creating an FSx for ONTAP file system using the AWS Management Console, AWS CLI, or FSx API.
Now, let's ask a question about the Amazon Bedrock User Guide, which is available only to the admin user. In this scenario, you ask "How do I use foundation models with Amazon Bedrock?" and the model responds that it has insufficient information to provide a detailed answer.
Use the admin SID in the User (SID) filter search in the chat UI and ask the same question in the prompt. This time, the model replies with detailed instructions on using FMs with Amazon Bedrock and provides the source attribution it used for the response.
Test permissions using API Gateway
You can also use API Gateway to query the model directly, using the api-invoke-url parameter from the output of your Terraform template.
Next, invoke the API gateway with Everyone access for a query related to the FSx for ONTAP User Guide by setting the value of the metadata parameter to NA to indicate Everyone access.
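The original request isn't reproduced in this post; as an illustration, it might look like the following sketch, where the endpoint path and payload fields are assumptions based on the solution description, not a documented contract:

```python
import requests

# api-invoke-url comes from the Terraform output; the path and payload fields
# below are assumptions, not the solution's documented API contract.
API_INVOKE_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/prod"

payload = {
    "session_id": "1",
    "prompt": "How do I create an FSx for ONTAP file system?",
    "metadata": "NA",  # NA = Everyone access; pass a user SID to scope results instead
}
response = requests.post(f"{API_INVOKE_URL}/rag", json=payload, timeout=60)
print(response.json())
```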
Clean up
To avoid recurring charges, clean up your account after you've tried the solution: from the terraform folder, destroy the resources provisioned by the solution's Terraform templates.
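A minimal teardown, assuming you deployed from the same terraform folder:

```
cd terraform
terraform destroy
```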
Conclusion
In this post, we introduced a solution that uses FSx for ONTAP with Amazon Bedrock, relying on FSx for ONTAP support for file ownership and ACLs, to provide permission-based access in RAG scenarios for generative AI applications. Our solution enables you to build generative AI applications on Amazon Bedrock and enrich generative AI prompts in Amazon Bedrock with company-specific, unstructured user file data from FSx for ONTAP file systems. With this solution, you can provide more relevant, context-specific, and accurate responses while ensuring that only authorized users have access to that data. Finally, this solution demonstrates the use of AWS serverless services with FSx for ONTAP and Amazon Bedrock to enable automatic scaling, event-driven computing, and API interfaces for generative AI applications on AWS.
To learn more about how you can get started building with Amazon Bedrock and FSx for ONTAP, see the following resources:
About the Authors
Kanishk Mahajan is Principal, Solutions Architecture at AWS. He leads cloud transformation and solutions architecture for AWS ISV customers and partners. He specializes in containers, cloud operations, migrations and modernization, AI/ML, resilience, security, and compliance. He is also a Technical Field Community (TFC) member in each of those domains at AWS.
Michael Shaul is a Principal Architect in the CTO's office at NetApp, with over 20 years of experience building data management systems, applications, and infrastructure solutions. He has a unique, in-depth perspective on cloud technologies, builders, and AI solutions.
Sasha Korman is a technical leader of dynamic development and QA teams spread across Israel and India. With 14 years at NetApp that began as a programmer, his hands-on experience and leadership have been instrumental in delivering complex projects, with a focus on innovation, scalability, and reliability.