Today, we are announcing the general availability of batch inference for Amazon Bedrock. This new feature enables organizations to process large volumes of data when interacting with their foundation models (FMs), addressing a critical need across a range of industries, including call center operations.
Summarizing call center transcripts has become a vital task for businesses looking to derive valuable insights from customer interactions. As the volume of call data grows, traditional analytical methods become insufficient to keep up, calling for scalable solutions.
Batch inference is an attractive approach to address this challenge. It processes large volumes of text transcripts in batches, frequently using parallel processing techniques, and offers advantages over real-time or on-demand processing. It is particularly well suited for large call center operations where immediate results are not always required.
In the following sections, we provide a detailed step-by-step guide on implementing these new features, from data preparation to job submission to output analysis. We also discuss best practices for optimizing your batch inference workflows on Amazon Bedrock, helping you maximize the value of your data across different use cases and industries.
Solution overview
The Amazon Bedrock batch inference capability provides a scalable solution for processing large amounts of data across various domains. This fully managed capability lets organizations submit batch jobs through the CreateModelInvocationJob API or the Amazon Bedrock console, simplifying large-scale data processing tasks.
In this article, we use call center transcript summarization as an example to showcase the capabilities of batch inference. This use case demonstrates the broad potential of the feature to handle various data processing tasks. The general workflow of batch inference consists of three main phases:
- Prepare the data – Prepare your dataset for optimal processing depending on the model you select. For more information about batch format requirements, see Formatting and Uploading Inference Data.
- Submit the batch job – Initiate and manage batch inference jobs through the Amazon Bedrock console or APIs.
- Collect and analyze the output – Retrieve the processed results and integrate them into existing workflows or analytics systems.
By walking through this particular implementation, we aim to show how batch inference can be adapted to suit a variety of data processing needs, regardless of the data source or nature.
Prerequisites
To use the batch inference feature, make sure you meet the following requirements:
Prepare your data
Before starting a batch inference job for call center transcript summarization, it's important to format and upload your data correctly. The input data must be in JSONL format, with each line representing a single transcript to summarize.
Each line in the JSONL file contains two fields: recordId and modelInput. The recordId value is an 11-character alphanumeric string that serves as a unique identifier for each entry; if you omit this field, the batch inference job automatically adds it to the output. The format of the modelInput JSON object must match the body field of the model that you use in the InvokeModel request. For example, if you're using Anthropic Claude 3 on Amazon Bedrock, you should use the Messages API format for the model input.
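As a minimal sketch, the following Python snippet shows how such records might be assembled into a JSONL file; the transcripts, record IDs, and file name are hypothetical placeholders, and the model input body follows the Anthropic Claude 3 Messages API format.

```python
import json

# Hypothetical example transcripts; in practice these come from your call center data.
# Note that a real batch job must contain at least 1,000 records (see the quota table that follows).
transcripts = {
    "CALL0000001": "Agent: Thank you for calling... Customer: I'd like to check my order status...",
    "CALL0000002": "Agent: Good morning... Customer: My invoice amount looks incorrect...",
}

# Write one JSONL record per transcript using the Anthropic Claude 3 Messages API body format.
with open("batch_input.jsonl", "w") as f:
    for record_id, transcript in transcripts.items():
        record = {
            "recordId": record_id,  # 11-character alphanumeric identifier
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 1024,
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": f"Summarize the following call center transcript:\n\n{transcript}",
                            }
                        ],
                    }
                ],
            },
        }
        f.write(json.dumps(record) + "\n")
```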
When preparing your data, keep in mind the batch inference quotas listed in the following table.
| Limit name | Value | Adjustable through Service Quotas? |
|---|---|---|
| Maximum number of batch jobs per account per model ID using base models | 3 | Yes |
| Maximum number of batch jobs per account per model ID using custom models | 3 | Yes |
| Maximum number of records per file | 50,000 | Yes |
| Maximum number of records per job | 50,000 | Yes |
| Minimum number of records per job | 1,000 | No |
| Maximum size per file | 200 MB | Yes |
| Maximum size for all files across the job | 1 GB | Yes |
For optimal processing, ensure that your input data adheres to these size limits and format requirements. If your dataset exceeds these limits, consider splitting it into multiple batch jobs.
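As an illustrative sketch (the split_jsonl helper below is hypothetical and not part of any AWS SDK), you could chunk a large JSONL file by record count before uploading each part as the input for its own batch job:

```python
# A minimal sketch for splitting a large JSONL input file into chunks of at most
# 50,000 records each, so each chunk can be submitted as a separate batch job.
MAX_RECORDS_PER_JOB = 50_000

def split_jsonl(input_path: str, output_prefix: str) -> list[str]:
    chunk_paths = []
    chunk, chunk_index = [], 0
    with open(input_path) as f:
        for line in f:
            chunk.append(line)
            if len(chunk) == MAX_RECORDS_PER_JOB:
                path = f"{output_prefix}_part{chunk_index}.jsonl"
                with open(path, "w") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)
                chunk, chunk_index = [], chunk_index + 1
    if chunk:
        # Write any remaining records as the final chunk.
        path = f"{output_prefix}_part{chunk_index}.jsonl"
        with open(path, "w") as out:
            out.writelines(chunk)
        chunk_paths.append(path)
    return chunk_paths
```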
Start a batch inference job
After you prepare and store your batch inference data in Amazon S3, there are two main ways to start a batch inference job: by using the Amazon Bedrock console or by using APIs.
Run a batch inference job on the Amazon Bedrock console
First, let’s walk through the steps to start a batch inference job from the Amazon Bedrock console.
- On the Amazon Bedrock console, choose Inference in the navigation pane.
- Choose Batch inference and choose Create job.
- For Job name, enter a name for your batch inference job, and select an FM from the list. In this example, we select Anthropic Claude 3 Haiku as the FM for the call center transcript summarization job.
- Under Input data, specify the S3 location of your prepared batch inference data.
- Under Output data, enter the S3 path of the bucket where you want to store the batch inference output.
- Your data is encrypted by default with an AWS managed key. If you want to use a different key, select Customize encryption settings.
- Under Service access, select the method you want to use to authorize Amazon Bedrock. You can choose Use an existing service role if you have an access role with fine-grained IAM policies, or Create and use a new service role.
- Optionally, expand the Tags section to add tags for tracking purposes.
- After you have added all the required configurations for your batch inference job, choose Create batch inference job.
You can check the status of your batch inference job by choosing the corresponding job name in the Amazon Bedrock console. After the job is completed, detailed job information is displayed, including the model name, job duration, status, and the location of the input and output data.
Run a batch inference job using the API
Alternatively, you can start a batch inference job programmatically using the AWS SDK. Follow these steps:
- Create an Amazon Bedrock client.
- Set the input and output data.
- Start a batch inference job.
- Get and monitor the status of your jobs.
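A minimal sketch of these steps using boto3 might look like the following; it assumes your environment is configured with credentials that can call Amazon Bedrock, and it uses the same placeholders described after the code.

```python
import time
import boto3

# Create an Amazon Bedrock client (the control-plane client used for batch inference jobs).
bedrock = boto3.client(service_name="bedrock")

# Set the input and output data locations in Amazon S3.
input_data_config = {
    "s3InputDataConfig": {"s3Uri": "s3://{bucket_name}/{input_prefix}/"}
}
output_data_config = {
    "s3OutputDataConfig": {"s3Uri": "s3://{bucket_name}/{output_prefix}/"}
}

# Start the batch inference job.
response = bedrock.create_model_invocation_job(
    jobName="your-job-name",
    roleArn="arn:aws:iam::{account_id}:role/{role_name}",
    modelId="model-of-your-choice",
    inputDataConfig=input_data_config,
    outputDataConfig=output_data_config,
)
job_arn = response["jobArn"]

# Get and monitor the status of the job until it finishes.
while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)["status"]
    print(f"Job status: {status}")
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)
```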
Replace the placeholders {bucket_name}, {input_prefix}, {output_prefix}, {account_id}, {role_name}, your-job-name, and model-of-your-choice with your actual values.
The AWS SDK allows you to programmatically start and manage batch inference jobs, enabling seamless integration with your existing workflows and automation pipelines.
Collect and analyze the output
After a batch inference job is completed, Amazon Bedrock creates a dedicated folder in the specified S3 bucket with the job ID as the folder name, which contains a summary of the batch inference job and the processed inference data in JSONL format.
You can access the processed output in two convenient ways: through the Amazon S3 console or programmatically using the AWS SDK.
Access the output in the Amazon S3 console
To use the Amazon S3 console, complete the following steps:
- On the Amazon S3 console, choose Buckets in the navigation pane.
- Go to the bucket you specified as the output destination for your batch inference job.
- In the bucket, find the folder with the batch inference job ID.
Within this folder, you will find the processed data files, which you can view and download as needed.
Access the output data using the AWS SDK
Alternatively, you can access the processed output programmatically using the AWS SDK. The output file contains not only the processed text, but also the observability data and the parameters used for inference. In the following Python example, written for the Anthropic Claude 3 model, we read the output file from Amazon S3 and process each line of JSON data. If you used a different model, update the parameter values accordingly. For each record, you can access the processed text through data['modelOutput']['content'][0]['text'], observability data such as input and output tokens, the model, and the stop reason, and inference parameters such as max tokens, temperature, top-p, and top-k.
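The following is a minimal sketch, assuming the output records follow the Anthropic Claude 3 Messages API response format; the bucket name, prefix, and output file name are placeholders.

```python
import json
import boto3

# Placeholder values; replace with your actual bucket, prefix, and output file name.
bucket_name = "your-bucket-name"
output_key = "your-output-prefix/your-output-file.jsonl.out"

s3 = boto3.client("s3")
response = s3.get_object(Bucket=bucket_name, Key=output_key)

for line in response["Body"].read().decode("utf-8").splitlines():
    data = json.loads(line)
    model_output = data.get("modelOutput")
    if model_output is None:
        # Records that failed processing may carry an error instead of a modelOutput.
        print(f"Record {data.get('recordId')} has no model output: {data.get('error')}")
        continue

    # Processed summary text (Anthropic Claude 3 Messages API response format).
    summary = model_output["content"][0]["text"]

    # Observability data: token usage and stop reason.
    usage = model_output.get("usage", {})
    print(f"recordId: {data.get('recordId')}")
    print(f"Summary: {summary[:200]}")
    print(
        f"Input tokens: {usage.get('input_tokens')}, "
        f"output tokens: {usage.get('output_tokens')}, "
        f"stop reason: {model_output.get('stop_reason')}"
    )
```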
The output location specified for the batch inference job also contains a manifest.json.out file that provides a summary of the processed records. This file includes information such as the total number of records processed, the number of successfully processed records, the number of records with errors, and the total input and output token counts.
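As a minimal sketch (the bucket name, output prefix, and job ID below are placeholders), you could fetch and print this job-level summary with boto3:

```python
import json
import boto3

s3 = boto3.client("s3")

# Placeholder key; replace with your output prefix and the batch inference job ID folder.
manifest_key = "your-output-prefix/your-job-id/manifest.json.out"

obj = s3.get_object(Bucket="your-bucket-name", Key=manifest_key)
manifest = json.loads(obj["Body"].read())

# Print the summary (record counts and token totals) exactly as reported by Amazon Bedrock.
print(json.dumps(manifest, indent=2))
```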
You can then process this data as needed, for example by integrating it into existing workflows or performing further analysis.
Don't forget to replace your-bucket-name, your-output-prefix, and your-output-file.jsonl.out with your actual values.
The AWS SDK lets you programmatically access and work with the processed data, observability information, inference parameters, and summary information from your batch inference jobs, enabling seamless integration with your existing workflows and data pipelines.
Conclusion
Amazon Bedrock batch inference provides a solution to process multiple data inputs in a single API call, as demonstrated in the call center transcript summary example. This fully managed service is designed to handle data sets of various sizes, benefiting a wide range of industries and use cases.
We encourage you to implement batch inference in your projects and experience firsthand how it can optimize your interactions with FMs at scale.
About the Authors
Yanyan Chang is a Senior Generative AI Data Scientist at Amazon Web Services, working as a Generative AI Specialist on cutting-edge AI/ML technologies to help customers achieve their desired outcomes with generative AI. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, Yanyan loves traveling, working out, and exploring new things.
Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With extensive experience in AI/ML, Ishan specializes in building generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Bo.
Rahul Virbhadra Mishra is a Senior Software Engineer at Amazon Bedrock who is passionate about delighting customers by building practical solutions for AWS and Amazon. Outside of work, he enjoys sports and spending quality time with his family.
Mohammed Altaf is an SDE with AWS AI Services based in Seattle, US. He works in the AWS AI/ML technical domain and has built various solutions across teams at Amazon. In his spare time, he enjoys playing chess, snooker, and parlor games.