Optimize Deepseek-like inference models with rapid optimization on Amazon Bedrock

Now available in the Amazon Bedrock Marketplace, Amazon Sagemaker Jumpstart and Amazon Bedrock serverless models, the Deepseek-R1 model has recently become popular due to its long, elaborate thinking style that leads to impressive performances in highly challenging mathematical benchmarks, like the Aime-2024 and Math-500, according to the published results of Deepseek. Claude Sonnet 3.5, GPT 4o, and Openai O1 (details on this paper).

During training, researchers show how DeepSeek-R1-Zero naturally learns to solve tasks with more thought times, leading to improved performance. What is often ignored, however, is the number of thought tokens needed for inference time and the time and cost of generating these tokens before answering the original question.

This post shows how to use rapid optimization in Amazon Bedrock to optimize inference models such as DeepSeek-R1.

Long reasoning chains and challenges with the largest token limit

Try some simple questions with DeepSeek-R1.

For specified math problems, Nate’s dog can dig six holes per day. He digs for 14 days while Nate is on vacation. When Nate gets home, he starts filling nine holes a day, but the dog continues to dig six new holes each night. How many weeks will it take him to fill all the holes? Write down the steps you will take to resolve it.

In Amazon Bedrock Chat/Text Playground, you can select and follow the new DeepSeek-R1 model, as shown in the following screenshot:

Based on the question, it may be that the inference model is not finished thinking within the overall maximum token budget.

Increase your output token budget so that your model can be thought of for a longer period of time. The maximum token increased from 2,048 to 4,096, so you will see the model inference for a while before printing out the final answer.

The appendix at the end of this post provides a complete response. You can also disrupt the inference procedure to show only the final answer.

The thought process didn’t end, as seen in a 2,048 token budget. This not only cost 2,048 tokens of time and money, but I also didn’t get the final answer! This observation of high token counts for thought usually leads to several follow-up questions such as:

Is it possible to reduce the number of thought tokens and still get the correct answer?
Can thoughts be limited to the maximum number of thought tokens or to the thinking budget?
At a high level, should a thought-intensive model like DeepSeek be used at all in real-time applications?

This post shows how to optimize thinking models such as DeepSeek-R1 using rapid optimization in Amazon Bedrock, bringing more concise thought traces without sacrificing more concise thought traces.

Optimize DeepSeek-R1 prompts

To start optimizing your prompts, select DeepSeek-R1 in the Amazon Bedrock Model Playground, enter the prompt, select the Magic Wand icon, or use Amazon Bedrock optimize_prompt() API. Also, use rapid optimization in the console, add variables as needed, set the model to DeepSeek-R1, set the model parameters, and click “Optimize”.

A challenging dataset is first needed to demonstrate how rapid optimization in Amazon Bedrock can help with inference models. The Last Examination of Humanity (HLE) is a benchmark for highly challenging questions from a number of subject areas and is designed to be the “final” closed benchmark of widespread academic ability. HLE is multimodal, with questions that are either text-only or with image references, and includes both multiple selection and accurate match questions for automatic answer validation. Questions require knowledge of various vertical, deep domains. They are clear and resistant to simple internet searches or database retrieval. In the context, some cutting-edge models (including thinking models) work poorly in benchmarks (see the results table in this complete paper).

Let’s take a look at the question in this example dataset.

In an alternate universe where the mass of the electron was 1% heavier and the charges of the
electron and proton were both 1% smaller, but all other fundamental constants stayed the same, 
approximately how would the speed of sound in diamond change? 

Answer Choices:
A. Decrease by 2%
B. Decrease by 1.5%
C. Decrease by 1%
D. Decrease by 0.5%
E. Stay approximately the same 
F. Increase by 0.5%
G. Increase by 1%
H. Increase by 1.5%
I. Increase by 2%

The problem requires a deep understanding of physics, and most large-scale language models (LLMs) today fail. Our goal with rapid optimization in Amazon Bedrock for inference models is to reduce the number of thought tokens, but not sacrifice accuracy. After using prompt optimization, the optimized prompts are:

## Question
<extracted_question_1>In an alternate universe where the mass of the electron was 1% heavier 
and the charges of the electron and proton were both 1% smaller, but all other fundamental constants
stayed the same, approximately how would the speed of sound in diamond change?

Answer Choices:
A. Decrease by 2%
B. Decrease by 1.5%
C. Decrease by 1%
D. Decrease by 0.5%
E. Stay approximately the same
F. Increase by 0.5%
G. Increase by 1%
H. Increase by 1.5%
I. Increase by 2%</extracted_question_1>

## Instruction
Read the question above carefully and provide the most accurate answer possible.
If multiple choice options are provided within the question, respond with the entire text of the
correct answer option, not just the letter or number. Do not include any additional explanations or
preamble in your response.

Remember, your goal is to answer as precisely and accurately as possible!

The following diagram shows how to correctly get the final answer in this particular case with a 35% reduction in the number of thought tokens (B. 1.5% reduction). Here, the number of thought tokens has decreased from 5,000 to 3,300. You will also notice that in this and other examples using the original prompt, some of the inference is summed or repeated before the final answer. As you can see in this example, the optimized prompts provide clear instructions, separate different prompt sections, and provide additional guidance based on the type of question and how to answer. This leads to both shorter, clearer inference traces and a directly extractable final answer.

Optimized prompts can lead to the correct answer as opposed to the wrong answer after long-term thinking. Because thinking does not guarantee the correct final answer. In this case, the number of thought tokens decreases from 5,000 to 1,555, and we see that the answer is obtained directly, not after a long, long thought explanation. The following diagram shows an example:

The previous two examples show how rapid optimization improves results and shortens the output tokens for models such as Deepseek R1. The rapid optimization was also applied to 400 questions from HLE. The following table summarizes the results.

experiment	Overall accuracy	Average number of prompt tokens	Average number of token completions (Thought + Response)	Average number of tokens (response) only)	Average number of tokens (thinking only)	The thought percentage is complete (6,000 maximum output token)
Baseline Deep Seek	8.75	288	3334	271	3063	80.0%
Prompt optimized DeepSeek	11	326	1925	27	1898	90.3%

As you can see, the overall accuracy jumps to 11% in this subset of the HLE dataset, reducing the number of thoughts and output tokens (and thus reducing the time to the last token and cost). The experiment shows that there is no explicit mention of reducing thought tokens, but clearer and detailed instructions on the tasks after rapid optimization may reduce additional efforts by models such as DeepSeek-R1 to self-cooperate or deeper understanding of the problem. Rapid optimization of the inference model improves the final answer by making sure that the quality of thought and overall flow is self-adaptive and dependent on questions.

Conclusion

In this post, we demonstrated how rapid optimization on Amazon Bedrock can effectively enhance the performance of thought-intensive models like DeepSeek-R1. Through experiments using the HLE dataset, the optimized prompt not only significantly reduces the number of thought tokens, but also improved overall accuracy from 8.75% to 11%. Optimization has resulted in more efficient inference paths, shorter response times and reduced costs, without sacrificing the quality of the responses. This improvement in both efficiency and effectiveness suggests that rapid optimization can be a valuable tool for deploying inference-rich models in production environments where both accuracy and computational resources need to be carefully balanced. As the field of AI continues to evolve with more sophisticated thinking models, technologies such as rapid optimization become increasingly important for practical applications.

To get started with quick optimizations on Amazon Bedrock, optimise your prompts and improve the performance of your generated AI applications with prompt optimizations on Amazon Bedrock.

appendix

Below is the complete answer to Nate’s dog questions.


Okay, let's see. So the problem is about Nate's dog digging holes and Nate trying to fill them.
I need to figure out how many weeks it takes Nate to fill all the holes once he starts. Let me break
it down step by step.\n\nFirst, the dog digs 6 holes a day. He does this for 14 days while Nate is on
vacation. So when Nate comes back, there are a certain number of holes already dug. Let me calculate
that first. If the dog digs 6 holes each day for 14 days, that's 6 multiplied by 14. Let me do that
math: 6*14=84. So there are 84 holes when Nate gets home.\n\nNow, once Nate starts filling the holes,
he fills 9 holes per day. But here's the catch: the dog keeps digging 6 new holes every night. So 
each day, Nate fills 9 holes, but the dog adds 6 more overnight. I need to figure out the net change
per day. That would be the number of holes filled minus the new holes dug. So 9 filled - 6 dug = 3 
holes net reduction per day. Wait, but does this happen every day? Let me make sure.\n\nSo Nate works
during the day, filling 9 holes. Then, during the night, the dog digs 6 new holes. So each full day
(day and night) results in a net decrease of 3 holes. So starting from 84 holes, each day the total
number decreases by 3. But I need to check if the process is that Nate fills 9, then the dog digs 6,
so the next day starts with 84 -9 +6 = 84 -3 =81, and so on. So yes, each day the net is -3.\n\nBut
wait, maybe the timing is important. Let's think: when Nate starts filling, does he fill 9 holes on
the first day, and then the dog digs 6 that night? So each day after Nate fills, the dog adds 6. So
the sequence is: Day 1: Nate fills 9, holes become 84-9=75. Then night: dog digs 6, holes become
75+6=81. Day 2: Nate fills 9, holes 81-9=72. Night: +6=78. So each full day cycle (day and night)
reduces the total by 3. So yes, net 3 per day.\n\nBut wait, maybe on the last day, if Nate fills the
remaining holes and the dog doesn't get to dig that night because all holes are filled. Hmm, that's a
crucial point. For example, if on a certain day, the number of holes left is less than or equal to 9,
Nate can fill them all in that day, and the dog doesn't add any more because there are no holes left
to dig. So we need to check if on the day when Nate fills the remaining holes, the total would reach
zero, and the dog doesn't dig that night. So we can't just do total holes divided by 3, because the
last day might not have the dog digging.\n\nSo let's model this properly. Let's denote the number of
holes each day.\n\nStarting with 84 holes.\n\nEach day:\n\n1. Nate fills 9 holes. Subtract 9.\n\n2.
If the result is zero or negative, then done.\n\n3. Otherwise, the dog digs 6 holes. Add 6.\n\nSo we
need to iterate this until the number of holes is zero or less after Nate fills them.\n\nLet me try
to calculate day by day.\n\nStart: 84\n\nDay 1:\n\nAfter Nate fills: 84 -9 =75\n\nSince 75 >0, dog
digs 6: 75 +6=81\n\nDay 2:\n\nAfter Nate fills:81-9=72\n\nDog digs 6:72+6=78\n\nDay3:\n\n78-
9=69\n\n+6=75\n\nDay4:75-9=66+6=72\n\nDay5:72-9=63+6=69\n\nDay6:69-9=60+6=66\n\nDay7:66-
9=57+6=63\n\nDay8:63-9=54+6=60\n\nDay9:60-9=51+6=57\n\nDay10:57-9=48+6=54\n\nDay11:54-
9=45+6=51\n\nDay12:51-9=42+6=48\n\nDay13:48-9=39+6=45\n\nDay14:45-9=36+6=42\n\nDay15:42-
9=33+6=39\n\nDay16:39-9=30+6=36\n\nDay17:36-9=27+6=33\n\nDay18:33-9=24+6=30\n\nDay19:30-
9=21+6=27\n\nDay20:27-9=18+6=24\n\nDay21:24-9=15+6=21\n\nDay22:21-9=12+6=18\n\nDay23:18-
9=9+6=15\n\nDay24:15-9=6+6=12\n\nDay25:12-9=3+6=9\n\nDay26:9-9=0\n\nAt this point, after Nate fills
on day26, the holes are 0. Since there are no holes left, the dog can't dig any more that night. So
the process stops here.\n\nSo total days:26 days.\n\nBut the question asks for how many weeks. Since
a week is 7 days, we divide 26 by7.\n\n26 divided by7 is 3 with a remainder of5. So 3 weeks and5
days. But the question says \"how many weeks does it take him to fill in all the holes\". Depending
on whether they want it rounded up to full weeks or expressed as a fraction. Let me check the problem
statement again.\n\n\"How many weeks does it take him to fill in all the holes?\"\n\nTypically, such
problems expect the answer in whole weeks, rounded up if there's a remainder. Because you can't have
a fraction of a week in practical terms. So 26 days is 3 weeks and5 days, which is more than3 weeks
but less than4. So they might expect 4 weeks. But let me verify.\n\nAlternatively, maybe they want
the exact number in weeks as a fraction. 26 days is 26/7 weeks \u22483.714 weeks. But the problem
might expect an integer number of weeks, rounded up. Let's see.\n\nBut let's check the problem again.
It says \"how many weeks does it take him to fill in all the holes?\" So if it takes 26 days, which
is 3 weeks and5 days, then the answer would be 4 weeks if we're counting full weeks needed. Because
after 3 weeks (21 days), he still has 5 days left. So total weeks is4.\n\nBut maybe the answer is
26/7 weeks, which is approximately3.714, but since the problem might expect an exact answer in weeks,
possibly as a fraction. However, the problem might be designed to have a whole number. Wait, let's
check the math again.\n\nWait, when I did the day-by-day count, I got 26 days. Let me recount to make
sure I didn't make a mistake.\n\nStarting from day1 to day26. Let's check day26:\n\nStart with
day1:84\n\nAfter day1:81\n\n...\n\nLet me check day25:\n\nAfter day24:12\n\nDay25:12-9=3, then dog
digs6:3+6=9\n\nDay26:9-9=0. So day26 is the day when Nate fills the last 9 holes. So yes, 26
days.\n\n26 days divided by7 days/week=3.714... weeks. So the answer is 26/7 weeks, which is3 and5/7
weeks. But the problem might want it expressed as a mixed number or an exact number. However, the
question says \"how many weeks does it take him\", so likely they expect the answer in weeks,
possibly as a fraction. But let's see if there's a better way to model this without counting each
day.\n\nAlternative approach:\n\nTotal holes after vacation:84\n\nEach day, Nate fills9, dog digs6,
net -3 per day, except possibly the last day.\n\nBut if on the last day, Nate fills the remaining
holes without the dog digging, then we need to adjust.\n\nLet\u2019s model it as:\n\nLet H be the
number of holes. Start with H=84.\n\nEach day, H decreases by9, then if H>0, increases by6.\n\nWe
need to find the number of days until H<=0.\n\nThis can be modeled as:\n\nEach full cycle (day and
night) reduces H by3, except the last day where if H<=9, then H becomes0 and no addition.\n\nSo
the number of full cycles (days where H>9 after filling) plus the final day.\n\nLet\u2019s
denote:\n\nLet\u2019s say after n days, the holes would be:\n\nEach day, H = H -9 +6 = H -3, but only
if after subtracting9, H >=1 (since if H-9 <=0, then the dog doesn\u2019t dig). Wait, no. Wait,
the dog digs6 only if after Nate fills9, there are still holes left. Wait, no. Wait, the problem says
the dog keeps digging6 new holes every night. So regardless of whether there are holes left, the dog
digs6 every night. Wait, but if all holes are filled, can the dog dig new ones? The problem says
\"fill in all the holes\", so once all holes are filled, the dog can't dig new ones because there are
no existing holes? Wait, the problem might be implying that the dog digs new holes regardless, but
Nate is filling the existing ones. Wait, the problem says \"the dog keeps digging6 new holes every
night\". So perhaps every night, regardless of the current state, the dog digs6 new holes. But that
would mean that even if Nate fills all holes on a day, the dog digs6 new ones that night, making it
impossible to ever finish. But that can't be, because the problem asks how long to fill all holes,
which implies that eventually Nate can catch up.\n\nWait, this is a critical point. Let me re-examine
the problem statement:\n\n\"When Nate gets home, he starts filling in 9 holes a day, but the dog
keeps digging 6 new holes every night. How many weeks does it take him to fill in all the holes?
\"\n\nSo the key is: every night, the dog digs6 new holes. So even if Nate fills all the holes during
the day, the dog will dig6 new ones that night. But that would mean the process never ends. But the
problem must assume that once all holes are filled, the dog stops. But the wording is ambiguous.
However, logically, if all holes are filled, there's nothing for the dog to dig. So perhaps the dog
only digs6 holes if there's ground to dig, but the problem states \"keeps digging6 new holes every
night\", implying that regardless of the current state, the dog digs6 each night. This is a problem
because if Nate ever fills all holes, the next night the dog digs6, so Nate would have to fill those6
the next day, but then the dog digs6 again, leading to an infinite loop. But the problem must have a
solution, so likely the dog stops digging once all holes are filled. Therefore, the correct
interpretation is that each night, the dog digs6 new holes, but if there are no holes (i.e., all
filled), the dog can't dig. Therefore, the last day when Nate fills the remaining holes, the dog
doesn't dig that night.\n\nTherefore, the correct model is:\n\nEach day:\n\n1. Nate fills9
holes.\n\n2. If holes remain after filling, the dog digs6 that night.\n\nSo the process is:\n\nStart
with H=84.\n\nWhile H>0:\n\n H = H -9\n\n if H >0:\n\n H = H +6\n\n days +=1\n\nSo let's model
this with equations.\n\nLet\u2019s find the number of days required.\n\nThis is similar to the
classic problem where each day you have a net change, but with a different condition on the last
day.\n\nLet\u2019s denote that each \"cycle\" (day and night) reduces H by3, except possibly the last
day where H <=9, so after filling, H becomes0, and no addition.\n\nSo the total number of days is
the number of full cycles (each reducing H by3) plus the final day.\n\nBut how to calculate this?
\n\nLet\u2019s think of it as:\n\nLet\u2019s suppose that after n days, the number of holes is H_n
=84 -3n.\n\nBut this is only true if every day the net change is -3. However, this would be the case
until H_n -9 <=0. Wait, no. Wait, the H after filling is H -9. If H -9 <=0, then the dog
doesn't dig, so the total reduction is H (to 0). So we need to find the day when H -9 <=0.\n\nSo
let\u2019s solve for when H -9 <=0:\n\nH =84 -3(n-1) -9 <=0\n\nWait, this is getting
complicated. Let me use another approach.\n\nLet\u2019s consider that each day before the final day,
the net change is -3. On the final day, the net change is -9 (since the dog doesn\u2019t add
6).\n\nLet\u2019s let x be the number of full days (with net -3), and 1 final day (net -9).\n\nTotal
reduction: 3x +9 =84\n\nSo 3x =84 -9=75\n\nx=75/3=25\n\nSo x=25 days of net -3, and 1 final day of 
-9.\n\nTotal days=25+1=26 days.\n\nYes, this matches the day-by-day count. Therefore, 26 
days.\n\nConvert to weeks:26/7=3.714... weeks. But the problem asks \"how many weeks does it take
him\". If they want a whole number of weeks, rounded up, it's4 weeks. If they accept fractions,
it's26/7\u22483.714. But the problem might expect the answer in weeks as a mixed number, like3 and5/7
weeks. However, typically, such problems expect the answer in whole weeks, rounded up. Let me check
the problem statement again.\n\nThe problem says \"how many weeks does it take him to fill in all the
holes\". The answer should be in weeks, and since it's more than3 weeks but less than4, but you can't
have a fraction of a week in practical terms when scheduling, so they might expect4 weeks. However,
mathematically, it's26/7 weeks. Let me see if the problem expects an exact answer or a whole 
number.\n\nLooking back at the problem statement, it's a math problem, so likely expects the exact
value, which is26/7 weeks, which can be written as3 5/7 weeks. However, sometimes problems like this
expect the answer in whole weeks, rounding up. Let's check both possibilities.\n\nIf the answer is26
days, which is3 weeks and5 days, then depending on the context:\n\n- If partial weeks count as a full
week, then4 weeks.\n\n- If exact, then3 5/7 weeks.\n\nBut the problem doesn't specify, so as a math
problem, the precise answer is26/7 weeks, which is3 and5/7 weeks. However, the question might want it
in weeks as a whole number, rounded up. Let's see.\n\nBut let's check the problem again. The problem
says \"how many weeks does it take him to fill in all the holes?\" The answer choices (if any) aren't
given, but since the user hasn't provided options, we need to infer. In math problems like this,
unless specified otherwise, the answer is expected in the most precise form, which would be26/7 weeks
or3 5/7 weeks. However, sometimes problems expect whole numbers, so maybe the answer is4 weeks. But
let's verify with the day count.\n\n26 days is exactly3 weeks and5 days. If the question asks for the
number of weeks, and if part of a week counts as a full week, then it's4. But if they want the exact
number, it's3 5/7. However, the problem might have an integer answer. Wait, let's check the
calculations again.\n\nWait, when I did the day-by-day count, I got26 days. Let me verify that again
quickly.\n\nStart with84.\n\nEach day:\n\n1. Subtract9.\n\n2. If result >0, add6.\n\nRepeat until
H<=0.\n\nLet\u2019s model this with a formula.\n\nLet\u2019s denote that each day, except the
last, the net change is -3. The last day is -9.\n\nSo total holes =84 =3x +9\n\nSo3x=75 \u2192x=25
days of net -3, plus1 day of -9. Total26 days.\n\nYes, that's correct. So26 days.\n\n26 days divided
by7 days/week=3.714... weeks.\n\nBut the problem might want the answer as a mixed number, so3 and5/7
weeks. However, in some contexts, especially in school problems, they might expect the answer in
whole weeks, rounded up, so4 weeks. But since the problem is from Nate's perspective, he would count
the exact days, so converting26 days to weeks gives3 weeks and5 days. But the question asks for
weeks, so unless specified, it's better to present it as26/7 weeks, which is3 5/7 weeks. However,
let's check if there's a different approach that gives a whole number.\n\nAlternatively, maybe I made
a mistake in the day count. Let me recount the days more carefully.\n\nStarting with84
holes.\n\nDay1:\n\nFill9:84-9=75. Since75>0, dog digs6:75+6=81.\n\nDay2:\n\n81-9=72>0
\u2192+6=78.\n\nDay3:78-9=69>0 \u2192+6=75.\n\nDay4:75-9=66>0 \u2192+6=72.\n\nDay5:72-9=63>0
\u2192+6=69.\n\nDay6:69-9=60>0 \u2192+6=66.\n\nDay7:66-9=57>0 \u2192+6=63.\n\nDay8:63-9=54>0
\u2192+6=60.\n\nDay9:60-9=51>0 \u2192+6=57.\n\nDay10:57-9=48>0 \u2192+6=54.\n\nDay11:54-
9=45>0 \u2192+6=51.\n\nDay12:51-9=42>0 \u2192+6=48.\n\nDay13:48-9=39>0 
\u2192+6=45.\n\nDay14:45-9=36>0 \u2192+6=42.\n\nDay15:42-9=33>0 \u2192+6=39.\n\nDay16:39-
9=30>0 \u2192+6=36.\n\nDay17:36-9="

About the Author

Shreyas Subramanian A leading data scientist, helping customers by using generative AI and solving business challenges using AWS services. Shrayas has a background in large-scale optimization and ML, and augmentation learning to accelerate ML use and optimization tasks.

Zhengyuan Shen He is an applied scientist at Amazon Bedrock and specializes in basic models and ML modeling for complex tasks such as natural language and structured data understanding. He is passionate about leveraging innovative ML solutions to enhance products and services, simplifying the lives of his customers through a seamless blend of science and engineering. Outside of work, he enjoys sports and cooking.

Xuan Qi An applied scientist at Amazon Bedrock, applying a physics background to tackle the complex challenges of machine learning and artificial intelligence. Xuan is passionate about translating scientific concepts into practical applications that promote concrete improvements to technology. Her work focuses on creating more intuitive and efficient AI systems that allow you to better understand and interact with the world. Outside of professional pursuits, Xuan finds balance and creativity through his love of dancing and playing the violin, bringing the accuracy and harmony of these arts to scientific efforts.

Shuai Wang He is a senior applied scientist and manager at Amazon Bedrock, specializing in natural language procedures, machine learning, large-scale language modeling, and other related AI fields.

What's Hot

House of the Dragon, Season 2, Episode 7: Did Rhaenyra know Vermithor would go on a murderous spree?

Yelp is suing Google, and here’s why.

UK Cast Review flops badly for trans kids