Prompt engineering is the practice of crafting instructions to obtain a desired response from a foundation model (FM). Achieving the output you want can take months of experimenting and iterating on prompts, following best practices for each model. Moreover, prompts are model and task specific, and their performance is not guaranteed when they are used with a different FM. This manual effort can limit your ability to test different models.
Today, we are excited to announce that Prompt Optimization is now available on Amazon Bedrock. With this feature, you can now optimize your prompts for several use cases with a single API call or the click of a button in the Amazon Bedrock console.
In this post, we walk you through performance benchmarks as well as a use case to get started with this new feature.
Solution overview
At the time of writing, prompt optimization on Amazon Bedrock supports Anthropic’s Claude 3 Haiku, Claude 3 Sonnet, Claude 3 Opus, and Claude 3.5 Sonnet models; Meta’s Llama 3 70B and Llama 3.1 70B models; Mistral Large; and Amazon’s Titan Text Premier model. Prompt optimization can significantly improve performance on generative AI tasks; we discuss benchmark results for several tasks later in this post.
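Beyond the console, the feature is exposed as a single API operation. The following is a minimal sketch using boto3; the operation lives on the `bedrock-agent-runtime` client and streams its result back as events. The exact field names are taken from the Bedrock Agent Runtime API at the time of writing, so verify them against the current boto3 documentation before relying on this.

```python
def optimize_prompt_text(prompt_text: str, target_model_id: str) -> str:
    """Optimize a prompt for a target model via the Bedrock API.

    Calling this requires AWS credentials; boto3 is imported lazily so
    the sketch can be read without an AWS environment.
    """
    import boto3  # AWS SDK for Python

    client = boto3.client("bedrock-agent-runtime")
    response = client.optimize_prompt(
        input={"textPrompt": {"text": prompt_text}},
        targetModelId=target_model_id,
    )
    # The result arrives as an event stream; collect the optimized text.
    parts = []
    for event in response["optimizedPrompt"]:
        if "optimizedPromptEvent" in event:
            payload = event["optimizedPromptEvent"]["optimizedPrompt"]
            parts.append(payload["textPrompt"]["text"])
    return "".join(parts)
```

For example, `optimize_prompt_text("Summarize: {{transcript}}", "anthropic.claude-3-5-sonnet-20240620-v1:0")` would return an optimized version of the prompt tailored to that model (the model ID here is illustrative; use the ID of any supported model in your Region).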
The following section explains how to use the prompt optimization feature. In this use case, we want to optimize prompts that examine call or chat transcripts and classify the next best action.
Use automatic prompt optimization
To start using this feature, follow these steps:
- In the Amazon Bedrock console, choose Prompt management in the navigation pane.
- Choose Create prompt.
- Enter a name and optional description for your prompt, then choose Create.
- For User message, enter the prompt template you want to optimize.
For example, let’s say you want to optimize a prompt that looks at call or chat transcripts and categorizes the next best action as one of the following:
- Wait for customer input
- Assign agent
- Escalate
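A baseline template along these lines might look like the following sketch. The wording is illustrative rather than the exact prompt from the screenshots, and the `{{transcript}}` placeholder uses Bedrock’s prompt-variable syntax; the `render` helper is a hypothetical stand-in for how Bedrock fills variables at run time.

```python
# A baseline next-best-action prompt, before optimization. The
# {{transcript}} placeholder matches Bedrock's prompt-variable syntax.
BASELINE_PROMPT = """\
Read the contact-center transcript below and classify the next best
action as exactly one of: Wait for customer input, Assign agent, Escalate.

Transcript:
{{transcript}}

Respond with the action name only."""


def render(template: str, **variables: str) -> str:
    """Substitute {{name}} placeholders the way Bedrock fills prompt variables."""
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template
```

When you test the prompt in the console, Bedrock performs this substitution for you; the helper simply makes the mechanics explicit.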
The following screenshot shows how the prompt looks in the Prompt Builder.
- In the Configurations pane, for Generative AI resource, choose Models, then choose your preferred model. This example uses Anthropic’s Claude 3.5 Sonnet.
- Choose Optimize.
A pop-up window appears, indicating that your prompt is being optimized.
Once the optimization is complete, you’ll see the original prompt and the prompt optimized for your use case side by side.
- Enter a value for the test variable (in this case, transcript), then choose Run.
You can then see the output from your model in your desired format.
As this example shows, the optimized prompt is more explicit, with clear instructions on how to process the original transcript provided as a variable. This leads to correct classification in the desired output format. After your prompt is optimized, you can deploy it to your application by creating a version, which takes a snapshot of the prompt’s configuration. You can save multiple versions and switch between prompt configurations for different use cases. For more information about versioning and deploying prompts, see Prompt management.
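Creating a version is also available programmatically through the `bedrock-agent` client’s `create_prompt_version` operation. The following is a minimal sketch under that assumption; check the current boto3 documentation for the full request and response shape.

```python
def snapshot_prompt(prompt_id: str, description: str = "") -> str:
    """Create an immutable version of a managed prompt and return its ARN.

    boto3 is imported lazily; calling this requires AWS credentials and
    a prompt previously created in Bedrock prompt management.
    """
    import boto3  # AWS SDK for Python

    client = boto3.client("bedrock-agent")
    response = client.create_prompt_version(
        promptIdentifier=prompt_id,
        description=description or "Optimized prompt snapshot",
    )
    # The versioned ARN pins your application to this exact configuration.
    return response["arn"]
```

Referencing the versioned ARN from your application, rather than the draft prompt, keeps production behavior stable while you continue iterating on the draft.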
Performance benchmark
We ran prompt optimization on several open source datasets. We’re excited to share the improvements observed in some important and common use cases that our customers are working on:
- Summarization (XSUM)
- RAG-based dialog continuation (DSTC)
- Function calling (GLAIVE)
To measure performance improvement over the baseline prompts, we use ROUGE-2 F1 for the summarization use case, HELM-F1 for the dialog continuation use case, and HELM-F1 and JSON match for function calling. We saw performance improvements of 18% for summarization, 8% for dialog continuation, and 22% for function calling. The detailed results are shown in the following table.
| Use case | Original prompt | Optimized prompt | Performance improvement |
| --- | --- | --- | --- |
| Summarization | First, please read the article below. {context} Now, can you write me an extremely short abstract for it? | <task> Your task is to provide a concise 1-2 sentence summary of the given text that captures the main points or key information. </task> <context> {context} </context> <instructions> Please read the provided text carefully and thoroughly to understand its content. Then, generate a brief summary in your own words that is much shorter than the original text while still preserving the core ideas and essential details. The summary should be concise yet informative, capturing the essence of the text in just 1-2 sentences. </instructions> <result_format> Summary: (WRITE YOUR 1-2 SENTENCE SUMMARY HERE) </result_format> | 18.04% |
| Dialog continuation | | <task_description> You are an advanced question-answering system that utilizes information from a retrieval augmented generation (RAG) system to provide accurate and relevant responses to user queries. </task_description> <instructions> 1. Carefully review the provided context information: <context> Domain: Restaurant Entity: THE COPPER KETTLE Review: My friend Mark took me to the copper kettle to celebrate my promotion. I decided to treat myself to Shepherds Pie. It was not as flavorful as I'd have liked and the consistency was just runny, but the servers were awesome and I enjoyed the view from the patio. I may come back to try the strawberries and cream come time for Wimbledon.. Highlight: It was not as flavorful as I'd have liked and the consistency was just runny, but the servers were awesome and I enjoyed the view from the patio. Domain: Restaurant Entity: THE COPPER KETTLE Review: Last week, my colleagues and I visited THE COPPER KETTLE that serves British cuisine. We enjoyed a nice view from inside of the restaurant. The atmosphere was enjoyable and the restaurant was located in a nice area. However, the food was mediocre and was served in small portions. Highlight: We enjoyed a nice view from inside of the restaurant. </context> 2. Analyze the user's question: <question> user: Howdy, I'm looking for a British restaurant for breakfast. agent: There are several British restaurants available. Would you prefer a moderate or expensive price range? user: Moderate price range please. agent: Five restaurants match your criteria. Four are in Centre area and one is in the West. Which area would you prefer? user: I would like the Center of town please. agent: How about The Copper Kettle? user: Do they offer a good view? | 8.23% |
| Function calling | Functions available: {available_functions} Examples of calling functions: Input: Functions: ({"name": "calculate_area", "description": "Calculate the area of a shape", "parameters": {"type": "object", "properties": {"shape": {"type": "string", "description": "The type of shape (e.g. rectangle, triangle, circle)"}, "dimensions": {"type": "object", "properties": {"length": {"type": "number", "description": "The length of the shape"}, "width": {"type": "number", "description": "The width of the shape"}, "base": {"type": "number", "description": "The base of the shape"}, "height": {"type": "number", "description": "The height of the shape"}, "radius": {"type": "number", "description": "The radius of the shape"}}}}, "required": ("shape", "dimensions")}}) Conversation history: USER: Can you calculate the area of a rectangle with a length of 5 and width of 3? Output: {"name": "calculate_area", "arguments": {"shape": "rectangle", "dimensions": {"length": 5, "width": 3}}} Input: Functions: ({"name": "search_books", "description": "Search for books based on title or author", "parameters": {"type": "object", "properties": {"search_query": {"type": "string", "description": "The title or author to search for"}}, "required": ("search_query")}}) Conversation history: USER: I am looking for books by J.K. Rowling. Can you help me find them? Output: {"name": "search_books", "arguments": {"search_query": "J.K. Rowling"}} Input: Functions: ({"name": "calculate_age", "description": "Calculate the age based on the birthdate", "parameters": {"type": "object", "properties": {"birthdate": {"type": "string", "format": "date", "description": "The birthdate"}}, "required": ("birthdate")}}) Conversation history: USER: Hi, I was born on 1990-05-15. Can you tell me how old I am today? Output: {"name": "calculate_age", "arguments": {"birthdate": "1990-05-15"}} Current chat history: {conversation_history} Respond to the last message. Call a function if necessary. | | 22.03% |
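The summarization metric in the table can be sketched in a few lines. The following is a simplified ROUGE-2 F1 (bigram overlap with whitespace tokenization, no stemming), plus the relative-improvement calculation used for the percentages above; production evaluations typically use a full ROUGE implementation.

```python
from collections import Counter


def rouge2_f1(candidate: str, reference: str) -> float:
    """ROUGE-2 F1: harmonic mean of bigram precision and recall."""
    def bigrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(zip(tokens, tokens[1:]))

    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # clipped bigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def improvement_pct(baseline: float, optimized: float) -> float:
    """Relative improvement of an optimized prompt's score over baseline."""
    return 100.0 * (optimized - baseline) / baseline
```

Scoring both the baseline and optimized prompts’ outputs against the same references and passing the averages to `improvement_pct` yields figures comparable to the table’s percentages.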
Consistent improvements across these different tasks highlight the robustness and effectiveness of prompt optimization at improving prompt performance for a variety of natural language processing (NLP) tasks. By applying best practices for each model, prompt optimization lets you save significant time and effort while achieving better results as you test your models with optimized prompts.
Conclusion
Prompt optimization on Amazon Bedrock makes it easy to improve prompt performance for a wide range of use cases with a single API call or a few clicks in the Amazon Bedrock console. The significant improvements demonstrated on open source benchmarks for tasks such as summarization, dialog continuation, and function calling highlight this new feature’s ability to streamline the prompt engineering process. Prompt optimization lets you easily test different models for your generative AI applications by applying the best prompt engineering practices for each model, and the reduction in manual effort significantly accelerates the development of generative AI applications within your organization.
We encourage you to try Prompt Optimization for your own use case and contact us for feedback and collaboration.
About the authors
Shreyas Subramanian is a Principal Data Scientist who uses AWS services to help customers solve business challenges using generative AI and deep learning. Shreyas has a background in large-scale optimization and ML, and the use of ML and reinforcement learning to speed up optimization tasks.
Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions, while also focusing on customer-obsessed science. When he’s not experimenting or keeping up with the latest developments in generative AI, he loves spending time with his kids.
Zhenyuan Shen is an Applied Scientist at Amazon Bedrock, specializing in foundation models and ML modeling for complex tasks, including natural language and structured-data understanding. He is passionate about leveraging innovative ML solutions to enhance products and services and to simplify customers’ lives through a seamless blend of science and engineering. Outside of work, he enjoys sports and cooking.
Shipra Canoria is a Principal Product Manager at AWS. She is passionate about helping customers solve their most complex problems with the power of machine learning and artificial intelligence. Prior to joining AWS, Shipra spent more than four years at Amazon Alexa, where she launched many productivity-related features on the Alexa voice assistant.