Amazon SageMaker Ground Truth enables the creation of high-quality, large-scale training datasets, which are essential for fine-tuning across a wide range of applications, including large language models (LLMs) and generative AI. SageMaker Ground Truth dramatically reduces the cost and time required to label data by combining human annotators with machine learning. Whether you annotate images, video, or text, SageMaker Ground Truth helps you build accurate datasets while maintaining human oversight and feedback at scale. This human-in-the-loop approach is important for aligning the underlying model with human preferences and enhancing its ability to perform tasks tailored to specific requirements.
To support a variety of labeling needs, SageMaker Ground Truth provides built-in workflows for common tasks such as image classification, object detection, and semantic segmentation. Additionally, you have the flexibility to create custom workflows, allowing you to design your own UI templates for specialized data labeling tasks to suit your unique requirements.
Previously, you needed to specify two AWS Lambda functions to set up a custom labeling job: a pre-annotation function that runs on each dataset object before it is sent to workers, and a post-annotation function that runs on the annotations for each dataset object and consolidates the annotations of multiple workers as needed. These functions provide valuable customization capabilities, but they also add complexity for users who don't require additional data manipulation. In those cases, you would have to create a function that simply returns its input unchanged, which increases development effort and can introduce errors when integrating the Lambda functions with UI templates and input manifest files.
Today, we’re excited to announce that you no longer need to provide pre-annotation and post-annotation Lambda functions when creating custom SageMaker Ground Truth labeling jobs. These functions are now optional in both the SageMaker console and CreateLabelingJob API. This means you can create custom labeling workflows more efficiently when no additional data processing is required.
This post shows you how to use SageMaker Ground Truth to set up a custom labeling job without using Lambda functions. We’ll walk you through setting up a workflow with a multimodal content evaluation template, explain how the workflow works without Lambda functions, and highlight the benefits of this new feature.
Solution overview
Omitting the Lambda functions from your custom labeling job simplifies your workflow in the following ways:
- No pre-annotation Lambda function – Data from the input manifest file is inserted directly into the UI template. You can reference the fields of a data object in your template without mapping them through a Lambda function.
- No post-annotation Lambda function – Each worker's annotations are stored as separate JSON files directly in the specified Amazon Simple Storage Service (Amazon S3) bucket, under the worker response key. Without a post-annotation Lambda function, the output manifest file references these worker response files instead of including all annotations directly in the manifest.
The following sections describe how to use a multimodal content evaluation template to set up a custom labeling job without Lambda functions. In this example, annotators evaluate image descriptions produced by your model: they review the image, the prompt, and the model response, and rate the response on criteria such as accuracy, relevance, and clarity. This provides important human feedback for model fine-tuning and LLM evaluation using reinforcement learning from human feedback (RLHF).
Prepare the input manifest file
To set up a labeling job, start by preparing the input manifest file that the template will use. The input manifest is a JSON Lines file in which each line represents one dataset item to be labeled. Each line contains either a source field with the data embedded directly or a source-ref field that references data stored in Amazon S3. These fields provide the data objects that annotators label. For more information about the structure of the input manifest file, see Input manifest files.
For the specific task of evaluating image descriptions produced by a model, we structure the input manifest to include the following fields:
- “source” – The prompt provided to the model
- “image” – The S3 URI of the image associated with the prompt
- “modelResponse” – The model-generated description of the image
Including these fields lets you present the prompt and its related data directly to the annotator in the UI template. This approach eliminates the need for a pre-annotation Lambda function because all the necessary information is already available in the manifest file.
The following code is an example of what the lines in the input manifest might look like.
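This is a minimal sketch; the bucket name, object keys, and prompt text are placeholders:

```json
{"source": "Describe what is happening in this image.", "image": "s3://your-bucket/images/image1.jpg", "modelResponse": "A dog is catching a frisbee in a sunny park."}
{"source": "Summarize the contents of this chart.", "image": "s3://your-bucket/images/image2.jpg", "modelResponse": "The chart shows monthly sales rising steadily through the third quarter."}
```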
Insert prompts in UI templates
In your UI template, you can insert the prompt using {{ task.input.source }} and display the image with an <img> tag whose src attribute is set to {{ task.input.image | grant_read_access }} (the grant_read_access Liquid filter gives workers access to the S3 object). You can display the model response using {{ task.input.modelResponse }}. Annotators can then evaluate the model's responses against predefined criteria such as accuracy, relevance, and clarity, using tools such as sliders and text input fields for additional comments. The complete UI template for this task is available in the GitHub repository.
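As a rough sketch, the core of such a template might look like the following; the slider names and layout are illustrative assumptions, not the exact template from the repository:

```html
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
  <h3>Prompt</h3>
  <p>{{ task.input.source }}</p>

  <h3>Image</h3>
  <!-- grant_read_access gives workers temporary access to the S3 object -->
  <img src="{{ task.input.image | grant_read_access }}" style="max-width: 40%">

  <h3>Model response</h3>
  <p>{{ task.input.modelResponse }}</p>

  <!-- One slider per evaluation criterion -->
  <p>Accuracy</p>
  <crowd-slider name="accuracy" min="1" max="5" step="1" pin required></crowd-slider>
  <p>Relevance</p>
  <crowd-slider name="relevance" min="1" max="5" step="1" pin required></crowd-slider>
  <p>Clarity</p>
  <crowd-slider name="clarity" min="1" max="5" step="1" pin required></crowd-slider>

  <!-- Free-text field for additional comments -->
  <crowd-text-area name="comments" placeholder="Optional comments"></crowd-text-area>
</crowd-form>
```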
Create a labeling job in the SageMaker console
To configure a labeling job using the AWS Management Console, follow these steps:

- On the SageMaker console, under Ground Truth in the navigation pane, choose Labeling jobs.
- Choose Create labeling job.
- Specify the input manifest location and the output path.
- Choose Custom as the task type.
- Choose Next.
- Enter a title and description for the task.
- Under Templates, upload your UI template.

The pre-annotation and post-annotation Lambda functions are now marked as optional under Additional configuration.

- Choose Preview to review the UI template.
- Choose Create to create the labeling job.
Create a labeling job using the CreateLabelingJob API
You can create custom labeling jobs programmatically using the AWS SDKs and the CreateLabelingJob API. After you upload your input manifest file to your S3 bucket and set up your work team, define the labeling job in code and omit the Lambda function parameters if you don't need them. The following example shows how to do this using Python and Boto3.
In the API, the pre-annotation Lambda function is specified by the PreHumanTaskLambdaArn parameter within the HumanTaskConfig structure, and the post-annotation Lambda function is specified by the AnnotationConsolidationLambdaArn parameter within the AnnotationConsolidationConfig structure. With this update, both PreHumanTaskLambdaArn and AnnotationConsolidationConfig are now optional, so if your labeling workflow doesn't require additional pre- or post-processing, you can omit them.
The following code is an example of how to create a labeling job without specifying a Lambda function.
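This is a minimal sketch; the bucket paths, role ARN, and work team ARN are placeholders you would replace with your own:

```python
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_labeling_job(
    LabelingJobName="multimodal-content-evaluation",
    # Worker responses are recorded under this attribute in the output manifest
    LabelAttributeName="evaluation",
    InputConfig={
        "DataSource": {
            "S3DataSource": {"ManifestS3Uri": "s3://your-bucket/input/manifest.jsonl"}
        }
    },
    OutputConfig={"S3OutputPath": "s3://your-bucket/output/"},
    RoleArn="arn:aws:iam::111122223333:role/YourSageMakerExecutionRole",
    HumanTaskConfig={
        "TaskTitle": "Evaluate model-generated image descriptions",
        "TaskDescription": "Rate each response for accuracy, relevance, and clarity",
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/your-team",
        "UiConfig": {"UiTemplateS3Uri": "s3://your-bucket/templates/evaluation.html"},
        "NumberOfHumanWorkersPerDataObject": 1,
        "TaskTimeLimitInSeconds": 900,
        # No PreHumanTaskLambdaArn: manifest fields flow directly into the UI template
    },
    # No AnnotationConsolidationConfig: worker responses are written to S3 as-is
)
```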
When an annotator submits a rating, the response is saved directly to the specified S3 bucket. The output manifest file contains the original data fields along with a worker-response-ref that points to the worker response file in S3. This worker response file contains all annotations for that data object; if multiple annotators worked on the same data object, their individual responses appear in the answers key, an array of responses. Each response includes the annotator's input and metadata such as acceptance time, submission time, and worker ID.
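As an illustrative sketch (grounded in the fields described above; the exact metadata fields may vary), a worker response file has roughly this shape:

```json
{
  "answers": [
    {
      "answerContent": {"accuracy": 4, "relevance": 5, "clarity": 4, "comments": "Accurate and clear."},
      "acceptanceTime": "2024-10-01T12:00:00.000Z",
      "submissionTime": "2024-10-01T12:03:27.000Z",
      "workerId": "private.us-east-1.0123456789abcdef"
    }
  ]
}
```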
This means that all annotations for a given data object are collected in one place, and you can later process or analyze them according to your specific requirements without a post-annotation Lambda function. You have access to all the raw annotations and can perform any necessary consolidation or aggregation as part of your post-processing workflow.
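For example, the following minimal sketch (assuming the manifest and response shapes above, with "evaluation" as the label attribute name and a hypothetical output path) reads the output manifest, fetches each worker response file, and averages the slider scores across workers:

```python
import json
import boto3

s3 = boto3.client("s3")

def get_s3_text(uri: str) -> str:
    """Download an object from an s3:// URI and return it as text."""
    bucket, key = uri.removeprefix("s3://").split("/", 1)
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

# Output manifest produced by the labeling job (placeholder path)
manifest_uri = "s3://your-bucket/output/multimodal-content-evaluation/manifests/output/output.manifest"

for line in get_s3_text(manifest_uri).splitlines():
    item = json.loads(line)
    # "evaluation" is the LabelAttributeName chosen when creating the job
    response_uri = item["evaluation"]["worker-response-ref"]
    answers = json.loads(get_s3_text(response_uri))["answers"]
    # Average each criterion across all workers for this data object
    scores = {
        criterion: sum(a["answerContent"][criterion] for a in answers) / len(answers)
        for criterion in ("accuracy", "relevance", "clarity")
    }
    print(item["source"], scores)
```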
Advantages of labeling jobs without using Lambda functions
Creating custom labeling jobs without using Lambda functions has the following benefits:
- Simplified setup – Create custom labeling jobs faster by skipping the creation and configuration of unnecessary Lambda functions.
- Time savings – Reducing the number of components in your labeling workflow saves development and debugging time.
- Reduced complexity – Fewer moving parts means less chance of configuration errors or integration issues.
- Cost reduction – Skipping Lambda functions avoids the costs associated with deploying and invoking those resources.
- Flexibility – If your project requires pre-processing or annotation consolidation, you can still use Lambda functions for those steps. This update offers simplicity for straightforward tasks and flexibility for more complex requirements.
This feature is currently available in all AWS Regions that support SageMaker Ground Truth. In the future, we plan to extend this simplification to built-in task types that don't require annotation Lambda functions, to streamline the overall SageMaker Ground Truth experience.
Conclusion
The introduction of SageMaker Ground Truth custom labeling job workflows without Lambda functions greatly simplifies the data labeling process. By making Lambda functions optional, we’ve made it easier and faster to set up custom labeling jobs, reducing potential errors and saving you valuable time.
This update removes unnecessary steps for users who don't require specialized data processing, while preserving the flexibility of custom workflows. Whether you're performing a simple labeling task or a complex multi-step annotation, SageMaker Ground Truth now provides a more streamlined path to high-quality labeled data.
We encourage you to try this new feature and see how it enhances your data labeling workflow. Check out these resources to get started:
About the authors
Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers leverage SageMaker and Bedrock to build scalable, cost-effective pipelines for computer vision applications, natural language processing, and generative AI. In his free time, Sundar loves exploring new places, trying local eateries, and enjoying the great outdoors.
Alan Ismail is a software engineer at AWS based in New York City. He focuses on building and maintaining scalable AI/ML products such as Amazon SageMaker Ground Truth and Amazon Bedrock Model Evaluation. Outside of work, Alan is learning how to play pickleball, with mixed results.
Yinan Ran is a software engineer at AWS Ground Truth. He has worked on Ground Truth, Mechanical Turk, and Bedrock infrastructure, as well as customer-facing projects for Ground Truth Plus. He has also focused on product security, remediating risks and creating security tests. In his spare time, he is an audiophile and especially loves practicing Bach's keyboard compositions.
George King is a summer 2024 intern at Amazon AI. He studies computer science and mathematics at the University of Washington, where he is currently a rising junior. George loves the outdoors, playing games (chess and all types of card games), and exploring Seattle, where he has lived his entire life.