Amazon SageMaker Ground Truth is a data labeling service from AWS that provides a comprehensive and scalable platform for labeling various types of data, including text, images, videos, and 3D point clouds, using a diverse workforce of human annotators. In addition to traditional, custom-tuned deep learning models, SageMaker Ground Truth also supports generative AI use cases, enabling the generation of high-quality training data for artificial intelligence and machine learning (AI/ML) models. SageMaker Ground Truth includes a self-service option and an AWS managed option known as SageMaker Ground Truth Plus. This post focuses on getting started with SageMaker Ground Truth Plus by creating a project and sharing the data you need labeled.
Solution overview
To get started, fill out the consultation form on the Getting Started with Amazon SageMaker Ground Truth page or, if you already have an AWS account, submit the request project form on the SageMaker Ground Truth Plus console. An AWS expert will contact you to review your data labeling requirements. You can share any specific requirements, such as subject matter expertise, language expertise, or the geographic location of the labelers. If you submitted a consultation form, you can then submit the request project form on the SageMaker Ground Truth Plus console, and it will be approved without further discussion. After you submit a project request, the project status changes from Under review to Request approved.
Next, create your project team. A project team includes the people who collaborate on a project. Each team member receives an invitation to join the project. Then upload the data you need labeled to an Amazon Simple Storage Service (Amazon S3) bucket. To add that data to your project, go to your project portal, create a batch, and include your S3 bucket URL. Every project consists of one or more batches, and each batch consists of the data objects to be labeled.
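The upload step described above could be scripted along the following lines. This is a minimal sketch, not an official procedure: the helper name `upload_folder`, the bucket name, and the prefix are all hypothetical, and the S3 client is passed in so that the function itself has no dependency beyond the standard library. It relies only on the `upload_file(Filename, Bucket, Key)` method of a boto3 S3 client.

```python
from pathlib import Path


def upload_folder(s3_client, local_dir, bucket, prefix):
    """Upload every file under local_dir to s3://bucket/prefix/ and return the keys."""
    root = Path(local_dir)
    clean_prefix = prefix.strip("/")
    uploaded = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        # Build the object key from the prefix and the path relative to the folder.
        relative = path.relative_to(root).as_posix()
        key = f"{clean_prefix}/{relative}" if clean_prefix else relative
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded


# Usage (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   s3 = boto3.client("s3", region_name="us-east-1")
#   upload_folder(s3, "./images", "my-labeling-bucket", "batch-001/input")
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at a real bucket.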
At this point, the SageMaker Ground Truth Plus team takes over: it sources annotators based on your specific data labeling needs, trains them on your labeling requirements, and creates the UI for labeling your data. After the labeled data passes internal quality checks, it is delivered back to your S3 bucket for use in training ML models.
The following diagram shows the solution architecture.
Using the steps outlined in this post, you’ll be able to set up your data labeling project in no time. This includes requesting a new project, setting up a project team, and creating batches containing data objects that need to be labeled.
Prerequisites
This tutorial requires that you meet the following prerequisites:
- An AWS account.
- The URI of the S3 bucket where your data is stored. The bucket must be in the US East (N. Virginia) AWS Region.
- An AWS Identity and Access Management (IAM) user. If you are the owner of the AWS account, you have administrator access and can skip this step. If your AWS account is part of an AWS organization, you can ask your AWS administrator to grant your IAM user the necessary permissions. The following identity-based policy specifies the minimum permissions an IAM user needs to perform all the steps in this post (provide the name of the S3 bucket where your data is stored).
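Because the S3 URI from the prerequisites is reused later when creating a batch, it can help to check its shape up front. The following is a minimal sketch using only the Python standard library; the function name `parse_s3_uri` and the example bucket are hypothetical, and the check covers the URI format only, not whether the bucket exists or which Region it is in.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # The path starts with "/", which is not part of the object key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = parse_s3_uri("s3://my-labeling-bucket/batch-001/input/")
print(bucket, prefix)
```

Confirming that the bucket is in US East (N. Virginia) requires a call to Amazon S3 itself and is left out of this sketch.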
Request a project
To request a project, follow these steps:
- On the SageMaker console, under Ground Truth in the navigation pane, choose Plus.
- Choose Request project.
- For Business email address, enter a valid email address.
- For Project name, enter a meaningful name without spaces or special characters.
- For Task type, choose the option that best describes the type of data you need labeled.
- For Contains PII, turn this option on only if your data includes personally identifiable information (PII).
- For IAM role, choose the role that grants SageMaker Ground Truth Plus permissions to access your data in Amazon S3 and run your labeling jobs. You can specify an IAM role using one of the following options:
  - Choose Create an IAM role (recommended). This provides access to the S3 buckets you specify and automatically attaches the necessary permissions and trust policy to the role.
  - Enter a custom IAM role ARN.
  - Choose an existing role.
If you don’t have permission to create an IAM role, you can ask your AWS administrator to create one for you. If you use an existing role or a custom IAM role ARN, the IAM role must have the following permissions and trust policies:
The following code is the permission policy.
The following code is the trust policy.
- Choose Request project.
Under Ground Truth in the navigation pane, you can choose Plus to see your project listed in the Projects section with the status Under review.
An AWS representative will contact you within 72 hours to review your project requirements. After this review is complete, the project's status changes from Under review to Request approved.
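For readers who bring an existing or custom IAM role, the permission and trust policies mentioned above can be sketched as Python dictionaries and serialized to JSON. This is an illustrative sketch, not the authoritative policy text: the list of S3 actions, the service principal `sagemaker-ground-truth-plus.amazonaws.com`, the helper names, and the bucket name are assumptions that should be checked against the current AWS documentation before use.

```python
import json


def build_permission_policy(bucket_name):
    """Permission policy sketch: S3 access limited to one bucket (actions are assumed)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:PutObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


def build_trust_policy():
    """Trust policy sketch: lets the (assumed) service principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker-ground-truth-plus.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# "my-labeling-bucket" is a placeholder for the bucket that holds your data.
print(json.dumps(build_permission_policy("my-labeling-bucket"), indent=2))
print(json.dumps(build_trust_policy(), indent=2))
```

Both resource ARNs are listed because bucket-level actions such as `s3:ListBucket` apply to the bucket ARN, while object-level actions such as `s3:GetObject` apply to the objects under it.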
Create a project team
SageMaker Ground Truth uses Amazon Cognito to manage your workforce and work team members. Amazon Cognito is the service you use to create identities for your workers. To create a project team, follow these steps:
- On the SageMaker console, under Ground Truth in the navigation pane, choose Plus.
- Choose Create project team.
The remaining steps vary depending on whether you are creating a new user group or importing an existing group.
Option 1: Create a new Amazon Cognito user group
You can use this option if you do not want to import members from existing Amazon Cognito user groups in your account, or if your account does not have any Amazon Cognito user groups.
- When creating your project team, select Create a new Amazon Cognito user group.
- For Amazon Cognito user group name, enter a meaningful name without spaces.
- For Email addresses, enter up to 50 addresses, separated by commas.
- Choose Preview invitation to see the email that will be sent to the specified addresses.
- Choose Create project team.
Under Ground Truth in the navigation pane, choose Plus to see your project team listed in the Project team section. The email addresses you added are listed in the Members section.
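The input constraints in the steps above (a group name without spaces, and up to 50 comma-separated email addresses) can be checked before pasting values into the console. The following is a minimal sketch with hypothetical helper names; the email check is deliberately loose and only looks for a single "@" with text on both sides.

```python
def is_valid_group_name(name):
    """A group name must be non-empty and contain no whitespace."""
    return bool(name) and not any(ch.isspace() for ch in name)


def parse_invitees(raw_addresses, limit=50):
    """Split a comma-separated string into addresses and enforce the 50-address limit."""
    addresses = [part.strip() for part in raw_addresses.split(",") if part.strip()]
    if len(addresses) > limit:
        raise ValueError(f"{len(addresses)} addresses given, but the limit is {limit}")
    for address in addresses:
        local, sep, domain = address.partition("@")
        # Loose sanity check only: exactly one "@" with text on both sides.
        if not sep or not local or not domain or "@" in domain:
            raise ValueError(f"Does not look like an email address: {address!r}")
    return addresses


print(is_valid_group_name("labeling-reviewers"))
print(parse_invitees("ana@example.com, li@example.com"))
```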
Option 2: Import existing Amazon Cognito user groups
You can use this option if your account has existing Amazon Cognito user groups whose members you want to import.
- When creating your project team, select Import existing Amazon Cognito user groups.
- For Select existing Amazon Cognito user groups, choose the user group whose members you want to import.
- Choose Create project team.
Under Ground Truth in the navigation pane, choose Plus to see your project team listed in the Project team section. The email addresses from the imported user group are listed in the Members section.
Access the project portal and create a batch
You can use the project portal to create batches with unlabeled input data and track the status of previously created batches within a project. To access the project portal, make sure you have created at least one project and at least one project team with one verified member.
- On the SageMaker console, under Ground Truth in the navigation pane, choose Plus.
- Choose Open project portal.
- Log in to the project portal using the project team user credentials created in the previous step.
A list of all your projects is displayed in the project portal.
- Choose a project to open its details page.
- In the Batches section, choose Create batch.
- Enter the batch name, batch description, S3 location for the input dataset, and S3 location for the output dataset.
- Choose Submit.
To successfully create a batch, ensure that the following criteria are met:
- The S3 bucket is located in the US East (N. Virginia) Region.
- Each file is no larger than 2 GB.
- The maximum number of files in a batch is 10,000.
- The total size of the batch is less than 100 GB.
The batch you submitted is listed in the Batches section with the status Request submitted. After the data transfer is complete, the status changes to Data received.
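The size and count criteria for a batch can be checked locally before submitting it. The following is a minimal sketch under stated assumptions: the function name `check_batch` is hypothetical, a gigabyte is taken to be 1024^3 bytes (the post does not say which definition applies), and the Region requirement is not covered because it cannot be verified from file sizes alone.

```python
GB = 1024 ** 3  # assumption: binary gigabytes
MAX_FILE_BYTES = 2 * GB      # each file must be 2 GB or less
MAX_FILES = 10_000           # at most 10,000 files per batch
MAX_BATCH_BYTES = 100 * GB   # total batch size must be less than 100 GB


def check_batch(file_sizes):
    """Return a list of problems for a mapping of file name -> size in bytes."""
    problems = []
    if len(file_sizes) > MAX_FILES:
        problems.append(f"too many files: {len(file_sizes)} > {MAX_FILES}")
    oversized = sorted(name for name, size in file_sizes.items() if size > MAX_FILE_BYTES)
    if oversized:
        problems.append(f"files larger than 2 GB: {', '.join(oversized)}")
    total = sum(file_sizes.values())
    if total >= MAX_BATCH_BYTES:
        problems.append(f"batch is {total} bytes, which is not less than 100 GB")
    return problems


# An empty list means the batch meets the size and count criteria.
print(check_batch({"scene-001.mp4": 1 * GB, "scene-002.mp4": 3 * GB}))
```

Returning a list of problems rather than raising on the first one lets you fix every issue in a batch in a single pass.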
Next, the SageMaker Ground Truth Plus team sets up the data labeling workflow, which changes the batch status to In progress. Annotators label the data, and you complete the data quality check by accepting or rejecting the labeled data. Rejected objects go back to the annotators to be relabeled. Accepted objects are delivered to an S3 bucket so you can use them to train your ML models.
Conclusion
SageMaker Ground Truth Plus provides a seamless solution for building high-quality training datasets for ML models. SageMaker Ground Truth Plus eliminates the overhead of building and managing your own labeling workforce by automating data labeling workflows using expert labelers managed by AWS. With a user-friendly interface and integrated tools, you can easily submit data, specify label requirements, and monitor project progress. When you receive accurately labeled data, you can train your models with confidence and maintain optimal performance and accuracy. Streamline your ML projects and focus on building innovative solutions with the power of SageMaker Ground Truth Plus.
For more information, see Label your data using Amazon SageMaker Ground Truth Plus.
About the authors
Joydeep Saha is a System Development Engineer at AWS with expertise in designing and implementing solutions that drive business outcomes for customers. His current focus is building cloud-native, end-to-end data labeling solutions that help customers unlock the full potential of their data and drive success through accurate and reliable machine learning models.
Ami Dani is a Senior Technical Program Manager at AWS with a focus on AI/ML services. During her career, she has focused on delivering innovative software development projects for the federal government and large companies in a variety of industries, including advertising, entertainment, and finance. Ami has experience driving business growth, implementing innovative training programs, and successfully managing complex, high-impact projects.