Generative artificial intelligence (AI) offers the opportunity to improve healthcare by combining and analyzing structured and unstructured data across previously disconnected silos. Generative AI can help drive higher levels of efficiency and effectiveness across the entire spectrum of healthcare delivery.
The healthcare industry generates and collects large amounts of unstructured text data, including clinical documents such as patient information, medical history, and test results, as well as non-clinical documents such as administrative records. This unstructured data is often found in various paper-based formats that are difficult to manage and process, which can affect the efficiency and productivity of clinical services. Streamlining the processing of this information is essential for healthcare providers to improve patient care and optimize their operations.
Processing large volumes of data, extracting unstructured data from multiple paper forms and images, and comparing it to standard and reference forms can be a long, tedious process prone to errors and inefficiencies. However, advances in generative AI solutions have introduced automated approaches that provide a more efficient and reliable solution for comparing multiple documents.
Amazon Bedrock is a fully managed service that makes foundational models (FMs) from leading AI startups and Amazon available via APIs. You can choose from a wide range of FMs to find the best fit for your use case. Amazon Bedrock provides a serverless experience, so you can get started quickly, privately customize your FM with your own data, and quickly integrate and deploy it into your applications using AWS tools without having to manage any infrastructure.
This post describes how to use Anthropic Claude 3 with Amazon Bedrock large-scale language models (LLMs). Amazon Bedrock provides access to several LLMs, including Anthropic Claude 3, that you can use to generate semi-structured data relevant to the healthcare industry. This is particularly useful for creating a variety of healthcare-related forms, such as patient intake forms, insurance claim forms, and medical history questionnaires.
Solution overview
Before diving into the specific elements and services used, we’ll walk through the architectural steps required to build the solution on AWS so that you can get a high-level understanding of how the solution works. We’ll present the key elements of the solution and provide an overview of the different components and their interactions.
We then explore each key element in more detail, discuss the specific AWS services used to build the solution, and explain how these services work together to achieve the desired functionality, providing a solid foundation for further exploration and implementation of the solution.
Part 1: Standard Forms: Extracting and Saving Data
The following diagram shows the key elements of the solution for extracting and storing data using standard forms.
Figure 1: Architecture – Standard formats – Data extraction and storage.
The standard procedure is as follows:
- Users upload images (PDF, PNG, JPEG) of paper forms to Amazon Simple Storage Service (Amazon S3), a highly scalable and durable object storage service.
- Amazon Simple Queue Service (Amazon SQS) is used as a message queue: every time a new form is loaded, an event is raised in Amazon SQS.
- If an S3 object is not processed, after two attempts it is moved to an SQS Dead Letter Queue (DLQ), which can be further configured using an Amazon Simple Notification Service (Amazon SNS) topic to notify users via email.
- The SQS message invokes AWS Lambda, which processes the new form data.
- The Lambda function reads the new S3 object and passes it to the Amazon Textract API to process the unstructured data and generate a hierarchical output. Amazon Textract is an AWS service that can extract text, handwriting, and data from scanned documents and images. This approach enables efficient and scalable processing of complex documents, allowing you to extract valuable insights and data from various sources.
- The Lambda function passes the converted text to Anthropic Claude 3 for Amazon Bedrock to generate a list of questions.
- Finally, the Lambda function saves the list of questions to Amazon S3.
Amazon Bedrock API calls to extract form details
It calls the Amazon Bedrock API twice in the process for the following actions:
- Extract questions from a standard or reference form – The first API call is made to extract a list of questions and subquestions from the standard or reference form. This list serves as a baseline or reference point for comparing other forms. By extracting questions from the reference form, a benchmark can be established against which other forms can be evaluated.
- Extract questions from a custom form – The second API call is made to extract the list of questions and subquestions from the custom form or the form that needs to be compared with the standard or reference form. This step is necessary because the content and structure of the custom form needs to be analyzed to identify the questions and subquestions that can then be compared with the reference form.
By extracting and structuring the questions in both the reference form and the custom form separately, the solution can pass these two lists to the Amazon Bedrock API for a final comparison step. This approach maintains the following:
- Exact Comparison – Because the API has access to structured data from both forms, it can easily identify matches and mismatches and provide associated inferences.
- Efficient Processing – Separating the extraction process for reference and custom forms allows you to avoid redundant operations and optimize your overall workflow.
- Observability and Interoperability – Keeping questions separate allows for better visibility, analysis, and consolidation of questions from different forms.
- Avoiding hallucinations – By following a structured approach and relying on extracted data, the solution avoids content generation and hallucinations and ensures the integrity of the comparison process.
This two-phase approach leverages the power of Amazon Bedrock APIs to optimize workflows, enable accurate and efficient form comparisons, and promote observability and interoperability of related questions.
See the following code (API call):
User prompt to extract and list fields
Provide the following user prompts to Anthropic Claude 3 to extract fields from the raw text and list them for comparison as shown in Step 3B (Figure 3: Data Extraction and Form Field Comparison).
The following image shows the output from Amazon Bedrock with a list of questions from a standard or reference form.
Figure 2: A sample questionnaire in standard format
As shown in part 2 of the process below, you can store this questionnaire in Amazon S3 so that you can compare it with other forms.
Part 2: Data Extraction vs. Form Fields
The following diagram shows the architecture for the next step: data extraction and form field comparison.
Figure 3: Data extraction vs. form fields
Steps 1 and 2 are similar to those in Figure 1, but are repeated for any form that you want to compare with the standard or reference form. The steps are as follows:
- The SQS message invokes a Lambda function, which processes the new form data.
- The raw text is extracted by Amazon Textract using a Lambda function, and the extracted raw text is then passed to step 3B for further processing and analysis.
- Anthropic Claude 3 generates a questionnaire from the custom form that needs to be compared with the standard form. Both the form and the document questionnaire are then passed to Amazon Bedrock to compare the extracted raw text with the standard or reference raw text to identify differences and anomalies and provide insights and recommendations relevant to the healthcare industry by their respective categories. It then generates the final output in JSON format for further processing and dashboarding. The Amazon Bedrock API calls and user prompts from step 5 (Figure 1: Architecture – Standard Form – Data Extraction and Storage) are reused in this step to generate a questionnaire from the custom form.
The following sections explain steps 4 through 6.
The following screenshot shows the output from Amazon Bedrock, including a list of questions from the custom form.
Figure 4: Sample Question List for Custom Forms
Final comparison using Anthropic Claude 3 on Amazon Bedrock:
The following example shows the results of a comparison exercise using Amazon Bedrock and Anthropic Claude 3, showing what did and did not match the reference or standard form.
Below is the form comparison user prompt.
The first call is:
The second call is:
The following screenshot shows the matching questions in the reference form.
The following screenshot shows a question that did not match the reference form.
The steps in the preceding architecture diagram continue as follows:
4. The SQS queue invokes the Lambda function.
5. The Lambda function invokes the AWS Glue job and monitors its completion.
a. The AWS Glue job processes the final JSON output from the Amazon Bedrock model in a tabular format for reporting.
6. Amazon QuickSight is used to create interactive dashboards and visualizations, enabling health professionals to explore the analysis, identify trends, and make informed decisions based on the insights provided by Anthropic Claude 3.
The following screenshot shows a sample QuickSight dashboard.
Next steps
Many healthcare providers are investing in digital technologies such as electronic health records (EHRs) and electronic medical records (EMRs) to streamline data collection and storage and ensure records are accessible to the right staff for patient care. Additionally, digitized health records offer the convenience of electronic forms and remote data editing for patients. Electronic health records provide a more secure and accessible system of record, reducing data loss and promoting data accuracy. Similar solutions can also capture data from these paper forms into the EHR.
Conclusion
Generative AI solutions such as Amazon Bedrock and Anthropic Claude 3 can greatly streamline the process of extracting and comparing unstructured data from paper forms and images. By automating the extraction of form fields and questions, and intelligently comparing them to standard or reference forms, the solution can process large volumes of data more efficiently and accurately. The integration of AWS services such as Lambda, Amazon S3, Amazon SQS, and QuickSight provides a scalable and robust architecture to deploy this solution. As healthcare organizations continue to digitize their operations, AI-powered solutions like this can play a key role in improving data management, maintaining compliance, and ultimately enhancing patient care through better insights and decision-making.
About the Author
Satish Sarapuri He is a Senior Data Architect for Data Lakes at AWS. He helps enterprise-level customers build high-performance, highly available, cost-effective, resilient, and secure generative AI, data mesh, data lake, and analytics platform solutions on AWS, enabling them to make data-driven decisions, drive impactful business outcomes, and support their digital and data transformation efforts. In his spare time, he enjoys spending time with his family and playing tennis.
Harpreet Cheema He is a Machine Learning Engineer in the AWS Generative AI innovation center. He is very passionate about the field of Machine Learning and working on data-oriented problems. He focuses on developing and delivering Machine Learning focused solutions for customers across various sectors.
Deborah Devadason She is a Senior Advisory Consultant with the Professional Services team at Amazon Web Services. She is a result-driven and passionate Data Strategy Specialist with 25+ years of consulting experience across industries across the globe. She leverages her expertise to solve complex problems and accelerate business-focused initiatives, building a stronger backbone for digital and data transformation efforts.
1 Comment
Can you be more specific about the content of your article? After reading it, I still have some doubts. Hope you can help me.