Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon, through a single API, along with the broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
Batch inference in Amazon Bedrock uses FMs to efficiently process large volumes of data when real-time results are not required. It is ideal for workloads that are not latency sensitive, such as embedding generation, entity extraction, FM-as-judge evaluation, and text classification and summarization for business reporting. Its main benefit is cost-effectiveness: batch inference workloads are charged at a 50% discount compared to On-Demand pricing. For currently supported AWS Regions and models, see Regions and Models Supported for Batch Inference.
Batch inference has many benefits, but it is limited to 10 batch inference jobs submitted per model per Region. To address this constraint and make batch inference easier to use at scale, we developed a scalable solution using AWS Lambda and Amazon DynamoDB. This post describes how to implement a queue management system that automatically monitors available job slots and submits new jobs as slots become available.
We walk you through the solution and detail the core logic of the Lambda functions. By the end, you will understand how to implement this solution to maximize the efficiency of your batch inference workflows on Amazon Bedrock. For instructions on starting an Amazon Bedrock batch inference job, see Improve Call Center Efficiency with Batch Inference for Transcript Summarization with Amazon Bedrock.
The power of batch inference
Batch inference allows organizations to process large volumes of data asynchronously, making it ideal for scenarios where real-time results are not required. This capability is especially useful for tasks such as asynchronous embedding generation, large-scale text classification, and high-volume content analysis. For example, companies can use batch inference to generate embeddings for large document collections, classify extensive datasets, and efficiently analyze substantial amounts of user-generated content.
One of the main advantages of batch inference is its cost-effectiveness. Amazon Bedrock offers select FMs for batch inference at 50% of On-Demand inference pricing. This significant cost reduction lets organizations process large datasets more economically, making it attractive for companies that want to optimize their production AI processing costs while retaining the ability to process large amounts of data.
Solution overview
The solution presented in this post uses batch inference in Amazon Bedrock to process a large number of requests efficiently, with the following solution architecture.
This architectural workflow includes the following steps:
- Upload the files you want to process to the Amazon Simple Storage Service (Amazon S3) bucket br-batch-inference-{Account_Id}-{AWS-Region} under the process folder. Amazon S3 then invokes the {stack_name}-create-batch-queue-{AWS-Region} Lambda function.
- The invoked Lambda function creates a new job entry in the DynamoDB table with a Pending status (see the first sketch after this list). The DynamoDB table is central to tracking and managing batch inference jobs throughout their lifecycle; it stores information such as job ID, status, creation time, and other metadata.
- An Amazon EventBridge rule scheduled to run every 15 minutes invokes the {stack_name}-process-batch-jobs-{AWS-Region} Lambda function.
- The {stack_name}-process-batch-jobs-{AWS-Region} Lambda function performs several important tasks (see the second sketch after this list):
  - Scans the DynamoDB table for jobs in InProgress, Submitted, Validation, and Scheduled status
  - Updates the job status in DynamoDB based on the latest information from Amazon Bedrock
  - Calculates the available job slots and submits new jobs from the Pending queue if slots are available
  - Handles error scenarios by updating the job status to Failed and logging error details for troubleshooting
- The Lambda function executes the GetModelInvocationJob API call to retrieve the latest status of the batch inference job from Amazon Bedrock.
- The Lambda function then uses the UpdateItem API call to update the status of the jobs in DynamoDB, ensuring that the table always reflects the latest state of each job.
- The Lambda function calculates the number of available slots before the service quota for batch inference jobs is reached. Based on this number, it queries for Pending jobs that can be submitted.
- If slots are available, the Lambda function makes CreateModelInvocationJob API calls to create new batch inference jobs for the pending jobs.
- It then updates the DynamoDB table with the status of the batch inference jobs created in the previous step.
- Once a batch job completes, its output files are available in the processed folder of the br-batch-inference-{Account_Id}-{AWS-Region} S3 bucket.
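
To make the workflow concrete, here is a minimal sketch of what the create-batch-queue function might look like. This is not the deployed implementation: the table name batch-job-queue and the attribute names (job_id, status, input_s3_uri, created_at) are illustrative assumptions, and the actual stack may use different ones.

```python
# Hypothetical sketch of the create-batch-queue Lambda function.
# Assumption: the S3 event notification invokes it once per uploaded file.
import time
import urllib.parse

import boto3

table = boto3.resource("dynamodb").Table("batch-job-queue")  # assumed table name

def handler(event, context):
    # Standard S3 event shape: one record per uploaded object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        table.put_item(
            Item={
                "job_id": key.rsplit("/", 1)[-1].removesuffix(".jsonl"),  # assumed ID scheme
                "status": "Pending",  # waits in the queue until a slot frees up
                "input_s3_uri": f"s3://{bucket}/{key}",
                "created_at": int(time.time()),
            }
        )
```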
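And here is a sketch of the queue-processing logic that the process-batch-jobs function implements, under the same assumed table and attribute names. The quota of 10 and the status values mirror the ones described above; pagination of the DynamoDB scans is omitted for brevity.

```python
# Hypothetical sketch of the process-batch-jobs Lambda function.
import boto3
from boto3.dynamodb.conditions import Attr

QUOTA = 10  # batch inference jobs allowed per model per Region (default quota)
ACTIVE = ["InProgress", "Submitted", "Validation", "Scheduled"]

bedrock = boto3.client("bedrock")
table = boto3.resource("dynamodb").Table("batch-job-queue")  # assumed table name

def set_status(job_id, status, **extra):
    # Keep the DynamoDB entry in sync with Amazon Bedrock's latest state.
    updates = ", ".join(["#s = :s"] + [f"{k} = :{k}" for k in extra])
    table.update_item(
        Key={"job_id": job_id},
        UpdateExpression=f"SET {updates}",
        ExpressionAttributeNames={"#s": "status"},  # "status" is a reserved word
        ExpressionAttributeValues={":s": status} | {f":{k}": v for k, v in extra.items()},
    )

def handler(event, context):
    # 1. Refresh every job the table still considers active.
    in_flight = 0
    for job in table.scan(FilterExpression=Attr("status").is_in(ACTIVE))["Items"]:
        latest = bedrock.get_model_invocation_job(jobIdentifier=job["job_arn"])["status"]
        set_status(job["job_id"], latest)
        if latest in ACTIVE:
            in_flight += 1

    # 2. Submit Pending jobs into whatever slots remain under the quota.
    free = max(QUOTA - in_flight, 0)
    pending = table.scan(FilterExpression=Attr("status").eq("Pending"))["Items"]
    for job in pending[:free]:
        try:
            arn = bedrock.create_model_invocation_job(
                jobName=str(job["job_id"]),
                modelId=job["model_id"],
                roleArn=job["role_arn"],
                inputDataConfig={"s3InputDataConfig": {"s3Uri": job["input_s3_uri"]}},
                outputDataConfig={"s3OutputDataConfig": {"s3Uri": job["output_s3_uri"]}},
            )["jobArn"]
            set_status(job["job_id"], "Submitted", job_arn=arn)
        except Exception as err:
            print(f"Submission failed for {job['job_id']}: {err}")  # log for troubleshooting
            set_status(job["job_id"], "Failed")
```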
Prerequisites
Before you run this solution, you need an AWS account with access to Amazon Bedrock and to the model you plan to use. If you intend to supply your own IAM role rather than let the CloudFormation stack create one, prepare a role that Amazon Bedrock can assume, with permission to run batch inference jobs and to read from and write to the solution's S3 bucket.
Implementation guide
To deploy your pipeline, follow these steps:
- Choose the Launch Stack button:
- Choose Next, as shown in the following screenshot.
- Specify the pipeline details with the options suited to your use case:
- Stack name (required) – The name you specify for this AWS CloudFormation stack. The name must be unique within the Region in which you create it.
- Model ID (required) – Specify the model ID required to run the batch job.
- Role ARN (optional) – By default, the CloudFormation stack deploys a new IAM role with the required permissions. If you have an existing role you want to use instead, provide the ARN of an IAM role with sufficient permissions to create batch inference jobs in Amazon Bedrock and to read from and write to the created S3 bucket br-batch-inference-{Account_Id}-{AWS-Region}. To create this role, follow the instructions in the prerequisites section.
- In the Configure stack options section, add optional tags, permissions, and other advanced settings if needed, or leave it blank and choose Next, as shown in the following screenshot.
- Review the stack details and select I acknowledge that AWS CloudFormation may create AWS IAM resources, as shown in the following screenshot.
- Choose Submit. This starts the pipeline deployment in your AWS account (for a scripted alternative, see the first sketch after these steps).
- After the stack is successfully deployed, you can start using the pipeline. First, upload your input to the /process folder of the created Amazon S3 bucket; each .jsonl file uploaded to this folder will have a batch job created with the selected model (a sample upload appears in the second sketch after these steps). The following screenshot shows the DynamoDB table where you can track the job status and other metadata related to the job.
- After the first batch job from the pipeline completes, the pipeline creates a /processed folder under the same bucket, as shown in the following screenshot. Output from the batch jobs created by this pipeline is saved to this folder.
- To start using the pipeline, upload .jsonl files prepared for batch inference in Amazon Bedrock.
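
If you prefer to script the deployment rather than use the console, the same stack can be created with the AWS SDK. This is a sketch only: the TemplateURL placeholder must be replaced with the template behind the Launch Stack button, and the parameter key names are assumptions to verify against the template.

```python
# Hypothetical scripted deployment of the pipeline stack.
import boto3

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="br-batch-pipeline",  # must be unique in the Region
    TemplateURL="https://example-bucket.s3.amazonaws.com/template.yaml",  # placeholder
    Parameters=[
        # Parameter key names are assumed; check the template's Parameters section.
        {"ParameterKey": "ModelId", "ParameterValue": "anthropic.claude-3-haiku-20240307-v1:0"},
    ],
    Capabilities=["CAPABILITY_IAM"],  # required because the stack creates IAM resources
)
```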
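The following sketch shows one way to prepare and upload an input file. The record layout (a recordId plus a modelInput payload in the chosen model's native request format) follows the Amazon Bedrock batch inference input convention; the bucket name and prompts are examples only.

```python
# Hypothetical example: build a .jsonl input file and drop it in the process folder.
import json

import boto3

s3 = boto3.client("s3")
bucket = "br-batch-inference-111122223333-us-east-1"  # example account ID and Region

# Each line pairs a recordId with the modelInput the selected model expects,
# shown here in the Anthropic Messages format used by Claude models.
# Note: the service enforces a minimum record count per batch job, so a real
# input file needs many more entries than this illustration.
records = [
    {
        "recordId": f"RECORD{i:07d}",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": [{"type": "text", "text": text}]}],
        },
    }
    for i, text in enumerate(["Summarize our Q3 sales call.", "Classify this support ticket."])
]

body = "\n".join(json.dumps(r) for r in records)
s3.put_object(Bucket=bucket, Key="process/example-input.jsonl", Body=body)
```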
That’s it! The pipeline is now deployed, and you can check the status of batch jobs in the Amazon Bedrock console. To learn more about the status of each .jsonl file, go to the created DynamoDB table {StackName}-DynamoDBTable-{UniqueString} and check the status there (see the sketch that follows). Because EventBridge is scheduled to scan DynamoDB every 15 minutes, it might take up to 15 minutes before you see batch jobs being created.
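
For a quick programmatic check instead of the console, something like the following could read a job's entry. It assumes the table's partition key is the job ID derived from the uploaded file name, which you should verify against your deployed table's key schema.

```python
# Hypothetical status lookup against the pipeline's DynamoDB table.
import boto3

table = boto3.resource("dynamodb").Table("mystack-DynamoDBTable-EXAMPLE123")  # placeholder name
item = table.get_item(Key={"job_id": "example-input"}).get("Item")
print(item["status"] if item else "not tracked yet")
```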
Clean up
If you no longer need this automated pipeline, follow these steps to delete the resources it created and avoid additional costs:
- Manually delete the contents of the bucket in the Amazon S3 console. Make sure the bucket is empty before proceeding to step 2.
- In the AWS CloudFormation console, choose Stacks in the navigation pane.
- Select the stack you created and choose Delete, as shown in the following screenshot.
This will automatically delete the deployed stack.
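
If you prefer to script the cleanup, a minimal boto3 sketch follows. The bucket and stack names are illustrative, and a versioned bucket would also need its object versions removed.

```python
# Hypothetical scripted cleanup: empty the bucket, then delete the stack.
import boto3

bucket = boto3.resource("s3").Bucket("br-batch-inference-111122223333-us-east-1")
bucket.objects.all().delete()  # step 1: the bucket must be empty first

boto3.client("cloudformation").delete_stack(StackName="br-batch-pipeline")  # steps 2-3
```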
Conclusion
In this post, we introduced a scalable and efficient solution for automating batch inference jobs on Amazon Bedrock. We’ve used AWS Lambda, Amazon DynamoDB, and Amazon EventBridge to address key challenges in managing large-scale batch processing workflows.
This solution offers several major advantages:
- Automated queue management – Maximizes throughput by dynamically managing job slots and submissions
- Cost optimization – Achieves economical large-scale processing by taking advantage of the 50% discount on batch inference pricing
This automated pipeline greatly enhances your ability to process large amounts of data using Amazon Bedrock’s batch inference. Whether you want to generate embeddings, classify text, or analyze content in bulk, this solution provides a scalable, efficient, and cost-effective approach to batch inference.
As you implement this solution, remember to regularly review and optimize your configuration based on your specific workload patterns and requirements. With this automated pipeline and the power of Amazon Bedrock, you are equipped to tackle large-scale AI inference tasks efficiently and effectively. Please try it out and share your feedback so we can continue to improve this solution.
About the authors
Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she works as a Generative AI Specialist on cutting-edge AI/ML technologies, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in electrical engineering. Outside of work, she loves traveling, working out, and exploring new things.
Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.
Neeraj Lamba is a Cloud Infrastructure Architect with Amazon Web Services (AWS) Worldwide Public Sector Professional Services. He helps customers transform their businesses by designing cloud solutions and providing technical guidance. Outside of work, he enjoys traveling, playing tennis, and experimenting with new technologies.