Invoice processing is an important and often tedious task for businesses of all sizes, especially large businesses that deal with invoices in a variety of formats from multiple vendors. The sheer volume of data and the need for accuracy and efficiency can make invoice processing a significant challenge. Invoices vary widely in format, structure, and content, making them difficult to process efficiently at scale. Traditional methods that rely on manual data entry or custom scripts for each vendor's format are not only inefficient, but also increase the likelihood of errors, leading to financial discrepancies, operational bottlenecks, and backlogs.
In this post, we show how to use Amazon Bedrock to extract key details such as invoice numbers, dates, and amounts. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
This post provides a step-by-step guide with the building blocks needed to create a Streamlit application that processes and reviews invoices from multiple vendors. Streamlit is an open source framework for data scientists to efficiently create interactive web-based data applications in pure Python. We use Anthropic's Claude 3 Sonnet model on Amazon Bedrock for extraction and Streamlit to build the application front end.
Solution overview
This solution uses the Amazon Bedrock Knowledge Bases chat with document feature to analyze and extract key details from your invoices, without needing a knowledge base. The results are shown in the Streamlit app, which displays each invoice and its extracted information side by side for easy review. Importantly, documents and data are not stored after processing.
The storage layer uses Amazon Simple Storage Service (Amazon S3) to hold the invoices that business users upload. After upload, you can set up a regular batch job to process these invoices, extract key information, and save the results to a JSON file. In this post, we store the data in JSON format, but you can also choose to store it in the SQL or NoSQL database of your choice.
The application layer uses Streamlit to display the PDF invoices alongside the data extracted by Amazon Bedrock. For simplicity, we deploy the app locally, but you can also run it in Amazon SageMaker Studio, on Amazon Elastic Compute Cloud (Amazon EC2), or on Amazon Elastic Container Service (Amazon ECS) if you prefer.
Prerequisites
To implement this solution, you need an AWS account with access to Amazon Bedrock and Anthropic's Claude 3 Sonnet model enabled, the AWS Command Line Interface (AWS CLI) configured with appropriate credentials, and Python installed on your local machine or EC2 instance.
Install dependencies and clone the sample
To get started, install the required packages on your local machine or EC2 instance. If you are new to Amazon EC2, please refer to the Amazon EC2 User Guide. This tutorial uses your local machine to set up the project.
To install the dependencies and clone the example, follow these steps:
- Clone the repository to a local folder.
- Install the Python dependencies:
- Change to your project directory.
- Upgrade pip.
- (Optional) Create a virtual environment to isolate dependencies.
- Activate your virtual environment.
- Mac/Linux:
- Windows:
- Install the required Python packages by running the following command in the cloned directory:
This will install the required packages including Boto3 (AWS SDK for Python), Streamlit, and other dependencies.
- Update the region in the config.yaml file to the same Region that is configured for the AWS CLI and where Anthropic's Claude 3 Sonnet model is available in Amazon Bedrock. (A sketch of how the script can read this file follows.)
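For reference, here is a minimal sketch of how the processing script might load these settings. It assumes the config file uses keys named region, model_id, prompt, and output_file; the actual key names in the sample repository may differ.

```python
import yaml  # PyYAML; install with pip if it is not already present


def load_config(path: str = "config.yaml") -> dict:
    """Load processing settings such as Region, model ID, prompt, and output file."""
    with open(path, "r") as f:
        return yaml.safe_load(f)


config = load_config()
print(config["region"])  # for example, us-east-1 (hypothetical key name)
```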
After you complete these steps, the invoice processor code is set up in your local environment and you are ready to take the next step in processing invoices using Amazon Bedrock.
Process invoices using Amazon Bedrock
Now that you’ve set up your environment, you’re ready to start processing your invoices and deploying your Streamlit app. Follow these steps to process your invoices using Amazon Bedrock.
Store invoices in Amazon S3
Store invoices from different vendors in an S3 bucket. These can be uploaded directly using the console, API, or as part of your normal business processes. To upload using the CLI, follow these steps:
- Create an S3 bucket.
Replace your-bucket-name with the name of the bucket you created and your-region with the Region configured for the AWS CLI and in config.yaml (for example, us-east-1).
- Upload your invoices to the S3 bucket using one of the following commands:
- To upload an invoice to the root of your bucket:
- To upload invoices to a specific folder (for example, invoices):
- Validate your upload. (A Boto3 alternative to these CLI steps is sketched below.)
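If you prefer to script these steps, the following is a minimal Boto3 sketch that creates the bucket, uploads a sample PDF, and lists the uploaded objects to validate the upload. The bucket name, Region, and file name are placeholders to replace with your own values.

```python
import boto3

region = "us-east-1"              # placeholder: use the Region from config.yaml
bucket_name = "your-bucket-name"  # placeholder: bucket names must be globally unique

s3 = boto3.client("s3", region_name=region)

# Create the bucket (us-east-1 must not specify a LocationConstraint)
if region == "us-east-1":
    s3.create_bucket(Bucket=bucket_name)
else:
    s3.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

# Upload a sample invoice PDF to the invoices/ folder
s3.upload_file("sample-invoice.pdf", bucket_name, "invoices/sample-invoice.pdf")

# Validate the upload by listing the objects under the prefix
response = s3.list_objects_v2(Bucket=bucket_name, Prefix="invoices/")
for obj in response.get("Contents", []):
    print(obj["Key"])
```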
Process invoices with Amazon Bedrock
In this section, you process the invoices in Amazon S3 and save the results to a JSON file (processed_invoice_output.json). The script extracts key details from each invoice (such as the invoice number, date, and amount) and generates a summary.
You can trigger processing of these invoices using the AWS CLI, or you can automate the process using Amazon EventBridge rules or AWS Lambda triggers. This tutorial uses the AWS CLI to trigger processing.
The processing logic is packaged in the Python script invoices_processor.py, which you can run from the command line. The --prefix argument is optional; if you omit it, all PDFs in the bucket are processed. For example, you can process the entire bucket or only the objects under a specific folder such as invoices.
Use the solution
In this section, we walk through the invoices_processor.py code. You can chat with your documents using the Amazon Bedrock console or the Amazon Bedrock RetrieveAndGenerate API (SDK). This tutorial uses the API approach. The code works as follows:
- Initialize the environment: The script imports the required libraries and initializes Amazon Bedrock and Amazon S3 clients.
- Configuration: The config.yaml file specifies the model ID, Region, entity extraction prompt, and output file location for processing.
- Set up the API call: The RetrieveAndGenerate API retrieves the invoice from Amazon S3 and processes it using the FM. It accepts several parameters, including the prompt, source type (S3), model ID, AWS Region, and the S3 URI of your invoice. (A minimal sketch of this call appears after this list.)
- Batch processing: The batch_process_s3_bucket_invoices function processes the invoices in the specified S3 bucket in parallel and writes the results to the output file (processed_invoice_output.json, as specified by output_file in config.yaml). It relies on the process_invoice function, which calls the Amazon Bedrock RetrieveAndGenerate API for each invoice and prompt.
- Post-processing: The extracted data in processed_invoice_output.json can be further structured or customized to suit your needs.
This approach allows you to process invoices from multiple vendors, each with its own format and structure. Using large language models (LLMs), it extracts important details such as invoice numbers, dates, amounts, and vendor information without the need for custom scripts for each vendor format.
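To make the API usage concrete, here is a minimal sketch of a process_invoice-style call that uses the Boto3 bedrock-agent-runtime client and the chat with document (external sources) mode of the RetrieveAndGenerate API. The prompt, Region, and S3 URI are placeholders, and the sample repository's actual code may structure this differently.

```python
import boto3

region = "us-east-1"  # placeholder: the Region from config.yaml
model_arn = (
    f"arn:aws:bedrock:{region}::foundation-model/"
    "anthropic.claude-3-sonnet-20240229-v1:0"
)

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name=region)


def process_invoice(s3_uri: str, prompt: str) -> str:
    """Send one invoice PDF to the RetrieveAndGenerate API (chat with document) and return the model output."""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": prompt},
        retrieveAndGenerateConfiguration={
            "type": "EXTERNAL_SOURCES",
            "externalSourcesConfiguration": {
                "modelArn": model_arn,
                "sources": [
                    {"sourceType": "S3", "s3Location": {"uri": s3_uri}}
                ],
            },
        },
    )
    return response["output"]["text"]


# Example usage with placeholder values
print(
    process_invoice(
        "s3://your-bucket-name/invoices/sample-invoice.pdf",
        "Extract the invoice number, date, total amount, and vendor name as JSON.",
    )
)
```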
Run the Streamlit demo
Now that you have the components in place and have used Amazon Bedrock to process your invoices, it's time to deploy the Streamlit application. You can start the app from the project directory with the streamlit run command, pointing it at the app's Python file.
When the app starts, it opens in your default web browser. From there, you can see each invoice and its extracted data side by side. Previous and Next arrows let you move seamlessly between processed invoices so you can efficiently review and analyze the results. The following screenshot shows the UI.
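For orientation, here is a highly simplified sketch of what the viewer portion of such a Streamlit app can look like. It is not the code from the sample repository, and the output file name and JSON structure are assumptions.

```python
import json

import streamlit as st

# Load the results produced by invoices_processor.py (file name and structure are assumptions)
with open("processed_invoice_output.json") as f:
    results = json.load(f)  # assumed to be a list of per-invoice records

# Track which invoice is currently displayed
idx = st.session_state.get("idx", 0)
col_prev, col_next = st.columns(2)
if col_prev.button("Previous") and idx > 0:
    idx -= 1
if col_next.button("Next") and idx < len(results) - 1:
    idx += 1
st.session_state["idx"] = idx

record = results[idx]
left, right = st.columns(2)
with left:
    st.subheader("Invoice")
    st.write(record.get("s3_uri", "unknown"))  # the real app renders the PDF itself here
with right:
    st.subheader("Extracted data")
    st.json(record)
```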
Amazon Bedrock has quotas (some of which are adjustable) that you should consider when building at scale.
Clean up
To clean up after running the demo, follow these steps:
- Delete the S3 bucket that contains your invoices, including all of its contents, using the following command (a scripted alternative is sketched after this list):
- If you set up a virtual environment, deactivate it by running the deactivate command.
- Delete any local files created during the process, including cloned repositories and output files.
- If you were using AWS resources such as EC2 instances, terminate them to avoid unnecessary charges.
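If you prefer to script the bucket cleanup, the following minimal Boto3 sketch empties and then deletes the bucket; the bucket name is a placeholder.

```python
import boto3

bucket_name = "your-bucket-name"  # placeholder: the bucket you created earlier

s3 = boto3.resource("s3")
bucket = s3.Bucket(bucket_name)

# A bucket must be empty before it can be deleted
bucket.objects.all().delete()
bucket.delete()
print(f"Deleted bucket {bucket_name}")
```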
Conclusion
In this post, we provided a step-by-step guide for automating invoice processing using Streamlit and Amazon Bedrock to address the challenge of processing invoices from multiple vendors in different formats. You learned how to set up an environment to process invoices stored in Amazon S3, and deploy a user-friendly Streamlit application to view and manipulate the processed data.
If you want to further enhance this solution, consider integrating additional functionality or deploying your app to scalable AWS services such as Amazon SageMaker, Amazon EC2, or Amazon ECS. This flexibility allows your invoice processing solution to evolve with your business, delivering long-term value and efficiency.
To learn more, we recommend exploring Amazon Bedrock, Access Amazon Bedrock foundation models, the RetrieveAndGenerate API, and Quotas for Amazon Bedrock, and building a solution using the sample implementation provided in this post with datasets relevant to your business. If you have any questions or suggestions, please leave a comment.
About the authors
Deepika Kumar is a Solutions Architect at AWS. She has over 13 years of experience in the technology industry, helping enterprises and SaaS organizations build and securely deploy their workloads on the cloud. She is passionate about using generative AI responsibly, whether that is driving product innovation, increasing productivity, or improving customer experiences.
Jobandeep Singh is an AWS Associate Solutions Architect specializing in machine learning. He helps customers across a wide range of industries leverage AWS to drive innovation and efficiency in their operations. In his free time, he enjoys sports, especially hockey.
Ratan Kumar is a solutions architect based in Auckland, New Zealand. He works with large enterprise customers to help them design and build secure, cost-effective, and reliable internet-scale applications using the AWS Cloud. He is passionate about technology and loves sharing his knowledge through blog posts and Twitch sessions.