create_response = client.create_guardrail(
    name="math-tutoring-guardrail",
    description='Prevents the model from providing non-math tutoring, in-person tutoring, or tutoring outside grades 6-12.',
    topicPolicyConfig={
        'topicsConfig': [
            {
                'name': 'In-Person Tutoring',
                'definition': 'Requests for face-to-face, physical tutoring sessions.',
                'examples': [
                    'Can you tutor me in person?',
                    'Do you offer home tutoring visits?',
                    'I need a tutor to come to my house.'
                ],
                'type': 'DENY'
            },
            {
                'name': 'Non-Math Tutoring',
                'definition': 'Requests for tutoring in subjects other than mathematics.',
                'examples': [
                    'Can you help me with my English homework?',
                    'I need a science tutor.',
                    'Do you offer history tutoring?'
                ],
                'type': 'DENY'
            },
            {
                'name': 'Non-6-12 Grade Tutoring',
                'definition': 'Requests for tutoring students outside of grades 6-12.',
                'examples': [
                    'Can you tutor my 5-year-old in math?',
                    'I need help with college-level calculus.',
                    'Do you offer math tutoring for adults?'
                ],
                'type': 'DENY'
            }
        ]
    },
    contentPolicyConfig={
        'filtersConfig': [
            {
                'type': 'SEXUAL',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'VIOLENCE',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'HATE',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'INSULTS',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'MISCONDUCT',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'PROMPT_ATTACK',
                'inputStrength': 'HIGH',
                'outputStrength': 'NONE'
            }
        ]
    },
    wordPolicyConfig={
        'wordsConfig': [
            {'text': 'in-person tutoring'},
            {'text': 'home tutoring'},
            {'text': 'face-to-face tutoring'},
            {'text': 'elementary school'},
            {'text': 'college'},
            {'text': 'university'},
            {'text': 'adult education'},
            {'text': 'english tutoring'},
            {'text': 'science tutoring'},
            {'text': 'history tutoring'}
        ],
        'managedWordListsConfig': [
            {'type': 'PROFANITY'}
        ]
    },
    sensitiveInformationPolicyConfig={
        'piiEntitiesConfig': [
            {'type': 'EMAIL', 'action': 'ANONYMIZE'},
            {'type': 'PHONE', 'action': 'ANONYMIZE'},
            {'type': 'NAME', 'action': 'ANONYMIZE'}
        ]
    },
    blockedInputMessaging="""I'm sorry, but I can only assist with math tutoring for students in grades 6-12. For other subjects, grade levels, or in-person tutoring, please contact our customer service team for more information on available services.""",
    blockedOutputsMessaging="""I apologize, but I can only provide information and assistance related to math tutoring for students in grades 6-12. If you have any questions about our online math tutoring services for these grade levels, please feel free to ask.""",
    tags=[
        {'key': 'purpose', 'value': 'math-tutoring-guardrail'},
        {'key': 'environment', 'value': 'production'}
    ]
)
The API response includes the guardrail ID and version. In the next section, you use these two fields to test the guardrail.
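For instance, you can capture these two fields from the response for use in later API calls. The following sketch uses a sample response dict (field names per the CreateGuardrail API response; values invented) in place of the live call:

```python
# Capture the fields needed by later API calls from the CreateGuardrail response.
# Sample values stand in for a real response here.
create_response = {
    "guardrailId": "abc123xyz",
    "guardrailArn": "arn:aws:bedrock:us-east-1:111122223333:guardrail/abc123xyz",
    "version": "DRAFT",
}

guardrail_id = create_response["guardrailId"]
guardrail_version = create_response["version"]
print(guardrail_id, guardrail_version)
```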
Build the test dataset
The project directory contains a sample tests.csv file for testing the math-tutoring-guardrail created in the previous step. To build your own dataset, create a CSV file with the same structure as the sample tests.csv, based on your specific use case, and save it in the data folder of the project directory. The dataset must contain the following columns:
- test_number – A unique identifier for each test case.
- test_type – Either INPUT or OUTPUT.
- test_content_query – The user query or input.
- test_content_grounding_source – Context information provided to the AI (if applicable).
- test_content_guard_content – The AI response (for OUTPUT tests).
- expected_action – Either GUARDRAIL_INTERVENED or NONE. Set it to GUARDRAIL_INTERVENED if the guardrail should block the prompt, and NONE if the prompt should pass through the guardrail.
Make sure your test dataset comprehensively tests all the elements of your guardrail system. Load the test file into your workflow using Python's pandas library, then use df.head() to check the first five rows of the DataFrame and confirm that the dataset was read correctly:
# Import the data file
import pandas as pd
df = pd.read_csv('data/tests.csv')
df.head()
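As a hypothetical illustration of this schema, a couple of rows of such a dataset might look like the following in pandas (all values invented for this example):

```python
import pandas as pd

# Hypothetical rows matching the required schema; values are illustrative only.
sample = pd.DataFrame([
    {
        "test_number": 1,
        "test_type": "INPUT",
        "test_content_query": "Can you tutor me in person?",
        "test_content_grounding_source": "",
        "test_content_guard_content": "",
        "expected_action": "GUARDRAIL_INTERVENED",
    },
    {
        "test_number": 2,
        "test_type": "OUTPUT",
        "test_content_query": "How do I solve 2x + 3 = 7?",
        "test_content_grounding_source": "Grade 8 algebra: solving linear equations.",
        "test_content_guard_content": "Subtract 3 from both sides, then divide by 2: x = 2.",
        "expected_action": "NONE",
    },
])
print(sample["expected_action"].tolist())
```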
Evaluate guardrails using test datasets
To run tests against the guardrails you create, use the ApplyGuardrail API. This API applies guardrails to model input or model response output text without invoking an FM.
The ApplyGuardrail API requires the following:
- Guardrail identifier – The unique ID of the guardrail being tested
- Guardrail version – The version of the guardrail to test
- Source – The source of the data used in the request to apply the guardrail (INPUT or OUTPUT)
- Content – The details used in the request to apply the guardrail
Use the guardrail ID and version from the CreateGuardrail API response. The source and content are extracted from the test CSV file created in the previous step. The following code reads the CSV file and prepares the source and content for the ApplyGuardrail API call:
with open(input_file, 'r') as infile, open(output_file, 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames + ['test_result', 'achieved_expected_result', 'guardrail_api_response']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()

    for row_number, row in enumerate(reader, start=1):
        content = []
        if row['test_type'] == 'INPUT':
            content = [{"text": {"text": row['test_content_query']}}]
        elif row['test_type'] == 'OUTPUT':
            content = [
                {"text": {"text": row['test_content_grounding_source'], "qualifiers": ["grounding_source"]}},
                {"text": {"text": row['test_content_query'], "qualifiers": ["query"]}},
                {"text": {"text": row['test_content_guard_content'], "qualifiers": ["guard_content"]}},
            ]
        # Remove empty content items
        content = [item for item in content if item['text']['text']]
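This loop prepares content for an apply_guardrail helper that wraps the ApplyGuardrail API. The post does not show the helper's definition; a minimal sketch, assuming a boto3 bedrock-runtime client and the parameter names of the ApplyGuardrail API, might look like this:

```python
def apply_guardrail(content, source, guardrail_id, guardrail_version):
    """Call the ApplyGuardrail API for one test row; return the response dict, or None on failure."""
    # boto3 is imported inside the function so the sketch can be defined
    # without an AWS environment configured.
    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("bedrock-runtime")  # in production, create one client and reuse it
    try:
        return client.apply_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion=guardrail_version,
            source=source,    # 'INPUT' or 'OUTPUT'
            content=content,  # the list of {"text": {...}} items built earlier
        )
    except ClientError as err:
        print(f"ApplyGuardrail call failed: {err}")
        return None
```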
You can then make an ApplyGuardrail API call for each row in the test dataset and determine the guardrail action from the API response. If the guardrail behavior matches the expected behavior, the test is marked True (pass); if not, False (fail). Additionally, the API response for each row is saved, so you can inspect the responses as needed. These test results are written to an output CSV file. See the following code:
with open(input_file, 'r') as infile, open(output_file, 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames + ['test_result', 'achieved_expected_result', 'guardrail_api_response']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()

    for row_number, row in enumerate(reader, start=1):
        content = []
        if row['test_type'] == 'INPUT':
            content = [{"text": {"text": row['test_content_query']}}]
        elif row['test_type'] == 'OUTPUT':
            content = [
                {"text": {"text": row['test_content_grounding_source'], "qualifiers": ["grounding_source"]}},
                {"text": {"text": row['test_content_query'], "qualifiers": ["query"]}},
                {"text": {"text": row['test_content_guard_content'], "qualifiers": ["guard_content"]}},
            ]
        # Remove empty content items
        content = [item for item in content if item['text']['text']]

        # Make the actual API call
        response = apply_guardrail(content, row['test_type'], guardrail_id, guardrail_version)

        if response:
            actual_action = response.get('action', 'NONE')
            expected_action = row['expected_action']
            achieved_expected = actual_action == expected_action

            # Prepare the API response for CSV
            api_response = json.dumps({
                "action": actual_action,
                "outputs": response.get('outputs', []),
                "assessments": response.get('assessments', [])
            })

            # Write the results
            row.update({
                'test_result': actual_action,
                'achieved_expected_result': str(achieved_expected).upper(),
                'guardrail_api_response': api_response
            })
        else:
            # Handle the case where the API call failed
            row.update({
                'test_result': 'API_CALL_FAILED',
                'achieved_expected_result': 'FALSE',
                'guardrail_api_response': json.dumps({"error": "API call failed"})
            })

        writer.writerow(row)
        print(f"Processed row {row_number}")  # Print progress

print(f"Processing complete. Results written to {output_file}")
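Once a results file is written, a quick summary helps you spot failing tests before deciding what to change. The following sketch uses an in-memory sample in place of your real results file:

```python
import pandas as pd

# In practice you would load the real results, e.g. pd.read_csv(output_file);
# an in-memory sample stands in for it here.
results = pd.DataFrame({
    "test_number": [1, 2, 3, 4],
    "achieved_expected_result": ["TRUE", "TRUE", "FALSE", "TRUE"],
})

passed = (results["achieved_expected_result"] == "TRUE").sum()
pass_rate = passed / len(results)
print(f"{passed}/{len(results)} tests passed ({pass_rate:.0%})")
```

Rows where achieved_expected_result is FALSE are the ones to investigate when revising the guardrail configuration.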
After reviewing the test results, you can update the guardrail as necessary to meet the needs of your application. This approach allows you to practice TDD with Amazon Bedrock Guardrails. You can see the failed tests in the following table; for these, achieved_expected_result is FALSE because the guardrail intervened when it shouldn't have. You can therefore modify the denied topics and other filters on the guardrail so that these tests pass.
Using a TDD approach, you can prevent bad actors from exploiting your application, identify previously unconsidered edge cases and gaps, and improve your guardrail's success rate in adhering to your responsible AI policies, improving your guardrails over time.
Optional: Automate workflows and iteratively improve guardrails
We recommend checking the test results after each iteration. This procedure does not guarantee that the guardrail will pass all tests; use this step to understand how to modify an existing guardrail configuration.
When practicing a TDD approach, we recommend improving your guardrails over time through multiple iterations. In this optional step, you first prompt the user for details, then use those details to build the guardrail and test cases from scratch. Next, you let the user specify n iterations. In each iteration, all tests are rerun and the denied topics in the guardrail are adjusted based on the test results.
To create the guardrail, prompt the user for a name and description for the guardrail. Use the InvokeModel API with the specified description and the system prompt in guardrail_prompt.txt to generate the guardrail's denied topics. Then use this configuration to build the guardrail by calling the CreateGuardrail API. You can verify that the new guardrail was created by refreshing the Amazon Bedrock Guardrails dashboard. In the following screenshot, you can see that a new guardrail has been created for the photography application.
Using the same parameters, you can use the InvokeModel API to generate test cases for your newly created guardrail. The tests_prompt.txt file provides the system prompt instructing the FM to create 30 test cases: 20 INPUT tests and 10 OUTPUT tests. To practice TDD, use these test cases and iteratively modify the existing guardrail n times, based on the test results of each iteration, to meet the user's requirements.
The process of iteratively modifying an existing guardrail consists of four steps:
- Use the GetGuardrail API to get the latest configuration of your guardrail.
- Create a new version of the guardrail at each iteration using the CreateGuardrailVersion API. This allows you to track how the guardrail changed throughout the iterations. Because this API works asynchronously, your code could continue to run before the guardrail is fully versioned. Use the guardrail_ready_check function to verify that the guardrail is in the READY state before code execution continues. The guardrail_ready_check function uses the GetGuardrail API to retrieve the current status of the guardrail; if the guardrail is not in the READY state, the function waits until that state is reached or a timeout error occurs.
- Evaluate the guardrail against the auto_generated_tests.csv file using the process_tests function created in the previous step. The input_file will be your auto_generated_tests.csv file; the output_file, however, is dynamically named based on the iteration. For example, for iteration 3, the resulting file name would be test_results_3.csv.
- Based on the test results of each iteration, use the InvokeModel API to generate modified denied topics. The get_denied_topics function calls the API with the guardrail_prompt.txt system prompt, directing the model to consider the test results and the guardrail description when modifying the denied topics:
updated_topics = get_denied_topics(guardrail_description, current_denied_topics, test_results)
- Call the UpdateGuardrail API with the newly generated denied topics, using the update_guardrail function. This provides the updated configuration to your existing guardrail and updates it accordingly:
update_guardrail(current_id, current_name, current_description, current_version, updated_topics)
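The guardrail_ready_check polling described in step 2 can be sketched as follows. This is a minimal illustration assuming a boto3 bedrock client and the status field of the GetGuardrail response, not the post's exact implementation:

```python
import time


def guardrail_ready_check(guardrail_id, timeout_seconds=120, poll_interval=5):
    """Poll GetGuardrail until the guardrail reaches the READY state, or raise on timeout."""
    import boto3  # imported here so the sketch is readable without AWS configured
    client = boto3.client("bedrock")

    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        status = client.get_guardrail(guardrailIdentifier=guardrail_id)["status"]
        if status == "READY":
            return
        time.sleep(poll_interval)  # wait before polling again
    raise TimeoutError(f"Guardrail {guardrail_id} was not READY within {timeout_seconds}s")
```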
After completing n iterations, you will have n guardrail versions and n test result files, as shown in the following screenshot. This allows you to review each iteration and update the guardrail configuration to meet your application's requirements. When using TDD, it's important to validate your test results and make sure you're improving over time to get the best results.
Clean up
In this solution, we created guardrails, built a dataset, evaluated the guardrails against the dataset, and iteratively modified the guardrails based on test results. To clean up, use the DeleteGuardrail API, which deletes a guardrail using the guardrail ID and guardrail version.
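As an illustration, a clean-up call might look like the following sketch (assuming a boto3 bedrock client; the helper name is ours, not from the post):

```python
def delete_guardrail_version(guardrail_id, guardrail_version=None):
    """Delete a specific guardrail version if given, otherwise the entire guardrail."""
    import boto3  # imported here so the sketch is readable without AWS configured
    client = boto3.client("bedrock")

    kwargs = {"guardrailIdentifier": guardrail_id}
    if guardrail_version is not None:
        kwargs["guardrailVersion"] = guardrail_version
    client.delete_guardrail(**kwargs)
```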
Pricing
This solution uses Amazon Bedrock, and you are charged based on FM invocations and guardrail usage:
- FM invocation – You are charged based on the number of input and output tokens. Depending on the model, a token corresponds to a word or subword. This solution uses Anthropic's Claude 3 Sonnet and Claude 3 Haiku models; the input and output token counts depend on the size of the test prompts and responses.
- Guardrail – You are charged based on the guardrail policy configuration. Each policy is billed per 1,000 text units, and each text unit can contain up to 1,000 characters.
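As a simple illustration of the text-unit rule, you can estimate the billable units for a given prompt. This is a sketch of the arithmetic only, not an official pricing calculator:

```python
import math


def text_units(text, unit_size=1000):
    """Number of text units a string spans, at up to 1,000 characters per unit."""
    return math.ceil(len(text) / unit_size)


# A 2,500-character prompt spans 3 text units, each billed per guardrail policy.
prompt = "x" * 2500
print(text_units(prompt))  # 3
```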
For more information, see Amazon Bedrock Pricing.
Conclusion
When developing generative AI applications, it is important to implement robust safeguards and governance measures to maintain responsible use of AI. Amazon Bedrock Guardrails provides a framework to accomplish this. However, guardrails are not static entities. Continuous improvement and adaptation are required to keep pace with evolving use cases, malicious threats, and responsible AI policies. TDD is a software development methodology that promotes software improvement through iterative development cycles.
As this post shows, you can employ TDD when building safeguards for your generative AI applications. By systematically testing and refining guardrails, companies can reduce potential risks and operational inefficiencies while fostering a culture of knowledge sharing among technical teams, encouraging continuous improvement in AI development, and facilitating strategic decision-making.
As new edge cases emerge and use cases evolve, we recommend integrating a TDD approach into your software development practices to ensure that your safety measures improve over time. If you have any questions, please leave a comment on this post or open an issue on GitHub.
About the author
Harsh Patel is an AWS Solutions Architect who supports over 200 SMB customers across the U.S. in driving digital transformation through cloud-native solutions. As an AI and ML specialist, he focuses on generative AI, computer vision, reinforcement learning, and anomaly detection. Outside of the world of technology, he recharges by going to the golf course and hiking through beautiful scenery with his dog.
Aditi Rajnish is a second-year software engineering student at the University of Waterloo. Her interests include computer vision, natural language processing, and edge computing. She is also passionate about community-based STEM outreach and advocacy. In her spare time, she enjoys rock climbing, playing the piano, and learning how to bake the perfect scone.
Raj Pathak is a Principal Solutions Architect and Technical Advisor to Fortune 50 and midsize FSI (banking, insurance, and capital markets) clients in Canada and the United States. Raj specializes in machine learning with applications of generative AI, natural language processing, intelligent document processing, and MLOps.