Today, we are announcing the general availability of batch inference for Amazon Bedrock. This new feature enables organizations to process large volumes of data when interacting with their foundation models (FMs), addressing a critical need across a range of industries, including call center operations.
Summarizing call center transcripts has become a vital task for businesses looking to derive valuable insights from customer interactions. As the volume of call data grows, traditional analytical methods become insufficient to keep up, calling for scalable solutions.
Batch inference is an attractive approach to address this challenge. It processes large volumes of text transcripts in batches, frequently using parallel processing techniques, and offers advantages over real-time or on-demand processing. It is particularly well suited for large call center operations where immediate results are not always required.
In the following sections, we provide a detailed step-by-step guide on implementing these new features, from data preparation to job submission to output analysis. We also discuss best practices for optimizing your batch inference workflows on Amazon Bedrock, helping you maximize the value of your data across different use cases and industries.
Solution overview
The Amazon Bedrock batch inference capability provides a scalable solution for processing large amounts of data across various domains. This fully managed capability lets organizations submit batch jobs through the CreateModelInvocationJob API or the Amazon Bedrock console, simplifying large-scale data processing tasks.
In this article, we use call center transcript summarization as an example to showcase the capabilities of batch inference. This use case demonstrates the broad potential of the feature to handle various data processing tasks. The general workflow of batch inference consists of three main phases:
- Prepare the data – Prepare your dataset for optimal processing depending on the model you select. For more information about batch format requirements, see Formatting and Uploading Inference Data.
- Submit the batch job – Initiate and manage batch inference jobs through the Amazon Bedrock console or APIs.
- Collect and analyze the output – Retrieve the processed results and integrate them into existing workflows or analytics systems.
By walking through this particular implementation, we aim to show how batch inference can be adapted to suit a variety of data processing needs, regardless of the data source or nature.
Prerequisites
To use the batch inference feature, make sure you meet the following requirements:
Prepare your data
Before starting a batch inference job for call center transcript summarization, it's important to format and upload your data correctly. The input data must be in JSONL format, with each line representing a single transcript to summarize.
Each line in the JSONL file contains two fields: recordId and modelInput. The recordId value is an 11-character alphanumeric string that serves as a unique identifier for each entry; if you omit this field, the batch inference job automatically adds it to the output. The format of the modelInput JSON object must match the body field of the model that you use in the InvokeModel request. For example, if you're using Anthropic Claude 3 on Amazon Bedrock, you should use the Messages API format for the model input.
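As a minimal sketch, the following Python snippet shows how such records might be assembled into a JSONL file; the transcripts, record IDs, and file name are hypothetical placeholders, and the model input body follows the Anthropic Claude 3 Messages API format.

```python
import json

# Hypothetical example transcripts; in practice these come from your call center data.
# Note that a real batch job must contain at least 1,000 records (see the quota table that follows).
transcripts = {
    "CALL0000001": "Agent: Thank you for calling... Customer: I'd like to check my order status...",
    "CALL0000002": "Agent: Good morning... Customer: My invoice amount looks incorrect...",
}

# Write one JSONL record per transcript using the Anthropic Claude 3 Messages API body format.
with open("batch_input.jsonl", "w") as f:
    for record_id, transcript in transcripts.items():
        record = {
            "recordId": record_id,  # 11-character alphanumeric identifier
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 1024,
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": f"Summarize the following call center transcript:\n\n{transcript}",
                            }
                        ],
                    }
                ],
            },
        }
        f.write(json.dumps(record) + "\n")
```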
When preparing your data, keep in mind the batch inference quotas listed in the following table.
| Limit name | Value | Adjustable through Service Quotas? |
|---|---|---|
| Maximum number of batch jobs per account per model ID using base models | 3 | Yes |
| Maximum number of batch jobs per account per model ID using custom models | 3 | Yes |
| Maximum number of records per file | 50,000 | Yes |
| Maximum number of records per job | 50,000 | Yes |
| Minimum number of records per job | 1,000 | No |
| Maximum size per file | 200 MB | Yes |
| Maximum size for all files across the job | 1 GB | Yes |
For optimal processing, ensure that your input data adheres to these size limits and format requirements. If your dataset exceeds these limits, consider splitting it into multiple batch jobs.
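As an illustrative sketch (the split_jsonl helper below is hypothetical and not part of any AWS SDK), you could chunk a large JSONL file by record count before uploading each part as the input for its own batch job:

```python
# A minimal sketch for splitting a large JSONL input file into chunks of at most
# 50,000 records each, so each chunk can be submitted as a separate batch job.
MAX_RECORDS_PER_JOB = 50_000

def split_jsonl(input_path: str, output_prefix: str) -> list[str]:
    chunk_paths = []
    chunk, chunk_index = [], 0
    with open(input_path) as f:
        for line in f:
            chunk.append(line)
            if len(chunk) == MAX_RECORDS_PER_JOB:
                path = f"{output_prefix}_part{chunk_index}.jsonl"
                with open(path, "w") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)
                chunk, chunk_index = [], chunk_index + 1
    if chunk:
        # Write any remaining records as the final chunk.
        path = f"{output_prefix}_part{chunk_index}.jsonl"
        with open(path, "w") as out:
            out.writelines(chunk)
        chunk_paths.append(path)
    return chunk_paths
```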
Start a batch inference job
After you prepare and store your batch inference data in Amazon S3, there are two main ways to start a batch inference job: by using the Amazon Bedrock console or by using APIs.
Run a batch inference job on the Amazon Bedrock console
First, let’s walk through the steps to start a batch inference job from the Amazon Bedrock console.
- On the Amazon Bedrock console, choose Inference in the navigation pane.
- Choose Batch inference and choose Create job.
- For Job name, enter a name for your batch inference job, and select an FM from the list. In this example, we select Anthropic Claude 3 Haiku as the FM for the call center transcript summarization job.
- Under Input data, specify the S3 location of your prepared batch inference data.
- Under Output data, enter the S3 path of the bucket where you want to store the batch inference output.
- Your data is encrypted by default with an AWS managed key. If you want to use a different key, select Customize encryption settings.
- Under Service access, select the method you want to use to authorize Amazon Bedrock. You can choose Use an existing service role if you have an access role with fine-grained IAM policies, or Create and use a new service role.
- Optionally, expand the Tags section to add tags for tracking purposes.
- After you have added all the required configurations for your batch inference job, choose Create batch inference job.
You can check the status of your batch inference job by choosing the corresponding job name in the Amazon Bedrock console. After the job is completed, detailed job information is displayed, including the model name, job duration, status, and the location of the input and output data.
Run a batch inference job using the API
Alternatively, you can start a batch inference job programmatically using the AWS SDK. Follow these steps:
- Create an Amazon Bedrock client.
- Set the input and output data.
- Start a batch inference job.
- Get and monitor the status of your jobs.
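A minimal sketch of these steps using boto3 might look like the following; it assumes your environment is configured with credentials that can call Amazon Bedrock, and it uses the same placeholders described after the code.

```python
import time
import boto3

# Create an Amazon Bedrock client (the control-plane client used for batch inference jobs).
bedrock = boto3.client(service_name="bedrock")

# Set the input and output data locations in Amazon S3.
input_data_config = {
    "s3InputDataConfig": {"s3Uri": "s3://{bucket_name}/{input_prefix}/"}
}
output_data_config = {
    "s3OutputDataConfig": {"s3Uri": "s3://{bucket_name}/{output_prefix}/"}
}

# Start the batch inference job.
response = bedrock.create_model_invocation_job(
    jobName="your-job-name",
    roleArn="arn:aws:iam::{account_id}:role/{role_name}",
    modelId="model-of-your-choice",
    inputDataConfig=input_data_config,
    outputDataConfig=output_data_config,
)
job_arn = response["jobArn"]

# Get and monitor the status of the job until it finishes.
while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)["status"]
    print(f"Job status: {status}")
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)
```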
Replace the placeholders {bucket_name}, {input_prefix}, {output_prefix}, {account_id}, {role_name}, your-job-name, and model-of-your-choice with your actual values.
The AWS SDK allows you to programmatically start and manage batch inference jobs, enabling seamless integration with your existing workflows and automation pipelines.
Collect and analyze the output
After a batch inference job is completed, Amazon Bedrock creates a dedicated folder in the specified S3 bucket with the job ID as the folder name, which contains a summary of the batch inference job and the processed inference data in JSONL format.
You can access the processed output in two convenient ways: through the Amazon S3 console or programmatically using the AWS SDK.
Access the output in the Amazon S3 console
To use the Amazon S3 console, complete the following steps:
- On the Amazon S3 console, choose Buckets in the navigation pane.
- Go to the bucket you specified as the output destination for your batch inference job.
- In the bucket, find the folder with the batch inference job ID.
Within this folder, you will find the processed data files, which you can view and download as needed.
Access the output data using the AWS SDK
Alternatively, you can access the processed output programmatically using the AWS SDK. The output file contains not only the processed text, but also the observability data and the parameters used for inference. In the following Python example, written for the Anthropic Claude 3 model, we read the output file from Amazon S3 and process each line of JSON data. If you used a different model, update the parameter values accordingly. For each record, you can access the processed text through data['modelOutput']['content'][0]['text'], observability data such as input and output tokens, the model, and the stop reason, and inference parameters such as max tokens, temperature, top-p, and top-k.
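The following is a minimal sketch, assuming the output records follow the Anthropic Claude 3 Messages API response format; the bucket name, prefix, and output file name are placeholders.

```python
import json
import boto3

# Placeholder values; replace with your actual bucket, prefix, and output file name.
bucket_name = "your-bucket-name"
output_key = "your-output-prefix/your-output-file.jsonl.out"

s3 = boto3.client("s3")
response = s3.get_object(Bucket=bucket_name, Key=output_key)

for line in response["Body"].read().decode("utf-8").splitlines():
    data = json.loads(line)
    model_output = data.get("modelOutput")
    if model_output is None:
        # Records that failed processing may carry an error instead of a modelOutput.
        print(f"Record {data.get('recordId')} has no model output: {data.get('error')}")
        continue

    # Processed summary text (Anthropic Claude 3 Messages API response format).
    summary = model_output["content"][0]["text"]

    # Observability data: token usage and stop reason.
    usage = model_output.get("usage", {})
    print(f"recordId: {data.get('recordId')}")
    print(f"Summary: {summary[:200]}")
    print(
        f"Input tokens: {usage.get('input_tokens')}, "
        f"output tokens: {usage.get('output_tokens')}, "
        f"stop reason: {model_output.get('stop_reason')}"
    )
```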
The output location specified for the batch inference job also contains a manifest.json.out file that provides a summary of the processed records. This file includes information such as the total number of records processed, the number of successfully processed records, the number of records with errors, and the total input and output token counts.
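As a minimal sketch (the bucket name, output prefix, and job ID below are placeholders), you could fetch and print this job-level summary with boto3:

```python
import json
import boto3

s3 = boto3.client("s3")

# Placeholder key; replace with your output prefix and the batch inference job ID folder.
manifest_key = "your-output-prefix/your-job-id/manifest.json.out"

obj = s3.get_object(Bucket="your-bucket-name", Key=manifest_key)
manifest = json.loads(obj["Body"].read())

# Print the summary (record counts and token totals) exactly as reported by Amazon Bedrock.
print(json.dumps(manifest, indent=2))
```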
You can then process this data as needed, for example by integrating it into existing workflows or performing further analysis.
Don't forget to replace your-bucket-name, your-output-prefix, and your-output-file.jsonl.out with your actual values.
The AWS SDK lets you programmatically access and work with the processed data, observability information, inference parameters, and summary information from your batch inference jobs, enabling seamless integration with your existing workflows and data pipelines.
Conclusion
Amazon Bedrock batch inference provides a solution to process multiple data inputs in a single API call, as demonstrated in the call center transcript summary example. This fully managed service is designed to handle data sets of various sizes, benefiting a wide range of industries and use cases.
We encourage you to implement batch inference in your projects and experience firsthand how it can optimize your interactions with FMs at scale.
About the Authors
Yanyan Chang is a Senior Generative AI Data Scientist at Amazon Web Services, working as a Generative AI Specialist on cutting-edge AI/ML technologies to help customers achieve their desired outcomes with generative AI. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, Yanyan loves traveling, working out, and exploring new things.
Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With extensive experience in AI/ML, Ishan specializes in building generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Bo.
Rahul Virbhadra Mishra is a Senior Software Engineer at Amazon Bedrock who is passionate about delighting customers by building practical solutions for AWS and Amazon. Outside of work, he enjoys sports and spending quality time with his family.
Mohammed Altaf is an SDE with AWS AI Services based in Seattle, US. He works in the AWS AI/ML technical domain and has built various solutions across teams at Amazon. In his spare time, he enjoys playing chess, snooker, and parlor games.