Invoice processing is an important and often tedious task for businesses of all sizes, especially large businesses that deal with invoices in a variety of formats from multiple vendors. The sheer volume of data and the need for accuracy and efficiency can make invoice processing a significant challenge. Invoices vary widely in format, structure, and content, making them difficult to process efficiently at scale. Traditional methods that rely on manual data entry or custom scripts for each vendor's format are not only inefficient, but also increase the likelihood of errors, leading to financial discrepancies, operational bottlenecks, and backlogs.
In this post, we show how to use Amazon Bedrock to extract key details such as invoice numbers, dates, and amounts. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
This post provides a step-by-step guide with the building blocks needed to create a Streamlit application that processes and reviews invoices from multiple vendors. Streamlit is an open source framework for data scientists to efficiently create interactive web-based data applications in pure Python. We use Anthropic's Claude 3 Sonnet model on Amazon Bedrock for extraction and Streamlit to build the application front end.
Solution overview
This solution uses the Amazon Bedrock Knowledge Bases chat with document feature to analyze and extract key details from your invoices, without needing a knowledge base. The results are shown in the Streamlit app, which displays each invoice and its extracted information side by side for easy review. Importantly, documents and data are not stored after processing.
The storage layer uses Amazon Simple Storage Service (Amazon S3) to hold the invoices that business users upload. After upload, you can set up a regular batch job to process these invoices, extract key information, and save the results to a JSON file. In this post, we store the data in JSON format, but you can also choose to store it in the SQL or NoSQL database of your choice.
The application layer uses Streamlit to display the PDF invoices alongside the data extracted by Amazon Bedrock. For simplicity, we deploy the app locally, but you can also run it in Amazon SageMaker Studio, on Amazon Elastic Compute Cloud (Amazon EC2), or on Amazon Elastic Container Service (Amazon ECS) if you prefer.
Prerequisites
To implement this solution, you need an AWS account with access to Amazon Bedrock and Anthropic's Claude 3 Sonnet model enabled, the AWS Command Line Interface (AWS CLI) configured with appropriate credentials, and Python installed on your local machine or EC2 instance.
Install dependencies and clone the sample
To get started, install the required packages on your local machine or EC2 instance. If you are new to Amazon EC2, please refer to the Amazon EC2 User Guide. This tutorial uses your local machine to set up the project.
To install the dependencies and clone the example, follow these steps:
- Clone the repository to a local folder.
- Install the Python dependencies:
- Change to your project directory.
- Upgrade pip.
- (Optional) Create a virtual environment to isolate dependencies.
- Activate your virtual environment.
- Mac/Linux:
- Windows:
- Install the required Python packages by running the following command in the cloned directory:
This will install the required packages including Boto3 (AWS SDK for Python), Streamlit, and other dependencies.
- Update the region in the config.yaml file to the same Region that is configured for the AWS CLI and where Anthropic's Claude 3 Sonnet model is available in Amazon Bedrock. (A sketch of how the script can read this file follows.)
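For reference, here is a minimal sketch of how the processing script might load these settings. It assumes the config file uses keys named region, model_id, prompt, and output_file; the actual key names in the sample repository may differ.

```python
import yaml  # PyYAML; install with pip if it is not already present


def load_config(path: str = "config.yaml") -> dict:
    """Load processing settings such as Region, model ID, prompt, and output file."""
    with open(path, "r") as f:
        return yaml.safe_load(f)


config = load_config()
print(config["region"])  # for example, us-east-1 (hypothetical key name)
```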
After you complete these steps, the invoice processor code is set up in your local environment and you are ready to take the next step in processing invoices using Amazon Bedrock.
Process invoices using Amazon Bedrock
Now that you’ve set up your environment, you’re ready to start processing your invoices and deploying your Streamlit app. Follow these steps to process your invoices using Amazon Bedrock.
Store invoices in Amazon S3
Store invoices from different vendors in an S3 bucket. These can be uploaded directly using the console, API, or as part of your normal business processes. To upload using the CLI, follow these steps:
- Create an S3 bucket.
Replace your-bucket-name with the name of the bucket you created and your-region with the Region configured for the AWS CLI and in config.yaml (for example, us-east-1).
- Upload your invoices to the S3 bucket using one of the following commands:
- To upload an invoice to the root of your bucket:
- To upload invoices to a specific folder (for example, invoices):
- Validate your upload. (A Boto3 alternative to these CLI steps is sketched below.)
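If you prefer to script these steps, the following is a minimal Boto3 sketch that creates the bucket, uploads a sample PDF, and lists the uploaded objects to validate the upload. The bucket name, Region, and file name are placeholders to replace with your own values.

```python
import boto3

region = "us-east-1"              # placeholder: use the Region from config.yaml
bucket_name = "your-bucket-name"  # placeholder: bucket names must be globally unique

s3 = boto3.client("s3", region_name=region)

# Create the bucket (us-east-1 must not specify a LocationConstraint)
if region == "us-east-1":
    s3.create_bucket(Bucket=bucket_name)
else:
    s3.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

# Upload a sample invoice PDF to the invoices/ folder
s3.upload_file("sample-invoice.pdf", bucket_name, "invoices/sample-invoice.pdf")

# Validate the upload by listing the objects under the prefix
response = s3.list_objects_v2(Bucket=bucket_name, Prefix="invoices/")
for obj in response.get("Contents", []):
    print(obj["Key"])
```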
Process invoices with Amazon Bedrock
In this section, you process the invoices in Amazon S3 and save the results to a JSON file (processed_invoice_output.json). The script extracts key details from each invoice (such as the invoice number, date, and amount) and generates a summary.
You can trigger processing of these invoices using the AWS CLI, or you can automate the process using Amazon EventBridge rules or AWS Lambda triggers. This tutorial uses the AWS CLI to trigger processing.
The processing logic is packaged in the Python script invoices_processor.py, which you can run from the command line. The --prefix argument is optional; if you omit it, all PDFs in the bucket are processed. For example, you can process the entire bucket or only the objects under a specific folder such as invoices.
Use the solution
In this section, we walk through the invoices_processor.py code. You can chat with your documents using the Amazon Bedrock console or the Amazon Bedrock RetrieveAndGenerate API (SDK). This tutorial uses the API approach. The code works as follows:
- Initialize the environment: The script imports the required libraries and initializes Amazon Bedrock and Amazon S3 clients.
- Configuration: The config.yaml file specifies the model ID, Region, entity extraction prompt, and output file location for processing.
- Set up the API call: The RetrieveAndGenerate API retrieves the invoice from Amazon S3 and processes it using the FM. It accepts several parameters, including the prompt, source type (S3), model ID, AWS Region, and the S3 URI of your invoice. (A minimal sketch of this call appears after this list.)
- Batch processing: The batch_process_s3_bucket_invoices function processes the invoices in the specified S3 bucket in parallel and writes the results to the output file (processed_invoice_output.json, as specified by output_file in config.yaml). It relies on the process_invoice function, which calls the Amazon Bedrock RetrieveAndGenerate API for each invoice and prompt.
- Post-processing: The extracted data in processed_invoice_output.json can be further structured or customized to suit your needs.
This approach allows you to process invoices from multiple vendors, each with its own format and structure. Using large language models (LLMs), it extracts important details such as invoice numbers, dates, amounts, and vendor information without the need for custom scripts for each vendor format.
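To make the API usage concrete, here is a minimal sketch of a process_invoice-style call that uses the Boto3 bedrock-agent-runtime client and the chat with document (external sources) mode of the RetrieveAndGenerate API. The prompt, Region, and S3 URI are placeholders, and the sample repository's actual code may structure this differently.

```python
import boto3

region = "us-east-1"  # placeholder: the Region from config.yaml
model_arn = (
    f"arn:aws:bedrock:{region}::foundation-model/"
    "anthropic.claude-3-sonnet-20240229-v1:0"
)

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name=region)


def process_invoice(s3_uri: str, prompt: str) -> str:
    """Send one invoice PDF to the RetrieveAndGenerate API (chat with document) and return the model output."""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": prompt},
        retrieveAndGenerateConfiguration={
            "type": "EXTERNAL_SOURCES",
            "externalSourcesConfiguration": {
                "modelArn": model_arn,
                "sources": [
                    {"sourceType": "S3", "s3Location": {"uri": s3_uri}}
                ],
            },
        },
    )
    return response["output"]["text"]


# Example usage with placeholder values
print(
    process_invoice(
        "s3://your-bucket-name/invoices/sample-invoice.pdf",
        "Extract the invoice number, date, total amount, and vendor name as JSON.",
    )
)
```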
Run the Streamlit demo
Now that you have the components in place and have used Amazon Bedrock to process your invoices, it's time to deploy the Streamlit application. You can start the app from the project directory with the streamlit run command, pointing it at the app's Python file.
When the app starts, it opens in your default web browser. From there, you can see each invoice and its extracted data side by side. Previous and Next arrows let you move seamlessly between processed invoices so you can efficiently review and analyze the results. The following screenshot shows the UI.
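For orientation, here is a highly simplified sketch of what the viewer portion of such a Streamlit app can look like. It is not the code from the sample repository, and the output file name and JSON structure are assumptions.

```python
import json

import streamlit as st

# Load the results produced by invoices_processor.py (file name and structure are assumptions)
with open("processed_invoice_output.json") as f:
    results = json.load(f)  # assumed to be a list of per-invoice records

# Track which invoice is currently displayed
idx = st.session_state.get("idx", 0)
col_prev, col_next = st.columns(2)
if col_prev.button("Previous") and idx > 0:
    idx -= 1
if col_next.button("Next") and idx < len(results) - 1:
    idx += 1
st.session_state["idx"] = idx

record = results[idx]
left, right = st.columns(2)
with left:
    st.subheader("Invoice")
    st.write(record.get("s3_uri", "unknown"))  # the real app renders the PDF itself here
with right:
    st.subheader("Extracted data")
    st.json(record)
```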
Amazon Bedrock has quotas (some of which are adjustable) that you should consider when building at scale.
Clean up
To clean up after running the demo, follow these steps:
- Delete the S3 bucket that contains your invoices, including all of its contents, using the following command (a scripted alternative is sketched after this list):
- If you set up a virtual environment, deactivate it by running the deactivate command.
- Delete any local files created during the process, including cloned repositories and output files.
- If you were using AWS resources such as EC2 instances, terminate them to avoid unnecessary charges.
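If you prefer to script the bucket cleanup, the following minimal Boto3 sketch empties and then deletes the bucket; the bucket name is a placeholder.

```python
import boto3

bucket_name = "your-bucket-name"  # placeholder: the bucket you created earlier

s3 = boto3.resource("s3")
bucket = s3.Bucket(bucket_name)

# A bucket must be empty before it can be deleted
bucket.objects.all().delete()
bucket.delete()
print(f"Deleted bucket {bucket_name}")
```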
Conclusion
In this post, we provided a step-by-step guide for automating invoice processing using Streamlit and Amazon Bedrock to address the challenge of processing invoices from multiple vendors in different formats. You learned how to set up an environment to process invoices stored in Amazon S3, and deploy a user-friendly Streamlit application to view and manipulate the processed data.
If you want to further enhance this solution, consider integrating additional functionality or deploying your app to scalable AWS services such as Amazon SageMaker, Amazon EC2, or Amazon ECS. This flexibility allows your invoice processing solution to evolve with your business, delivering long-term value and efficiency.
To learn more, we recommend exploring Amazon Bedrock, Access Amazon Bedrock foundation models, the RetrieveAndGenerate API, and Quotas for Amazon Bedrock, and building a solution using the sample implementation provided in this post with datasets relevant to your business. If you have any questions or suggestions, please leave a comment.
About the authors
Deepika Kumar is a Solutions Architect at AWS. She has over 13 years of experience in the technology industry, helping enterprises and SaaS organizations build and securely deploy their workloads on the cloud. She is passionate about using generative AI responsibly, whether that is driving product innovation, increasing productivity, or improving customer experiences.
Jobandeep Singh is an AWS Associate Solutions Architect specializing in machine learning. He helps customers across a wide range of industries leverage AWS to drive innovation and efficiency in their operations. In his free time, he enjoys sports, especially hockey.
Ratan Kumar is a solutions architect based in Auckland, New Zealand. He works with large enterprise customers to help them design and build secure, cost-effective, and reliable internet-scale applications using the AWS Cloud. He is passionate about technology and loves sharing his knowledge through blog posts and Twitch sessions.