This post was co-authored with Lee Rehwinkel of Planview.
Businesses today face numerous challenges in managing complex projects and programs, extracting valuable insights from large amounts of data, and making timely decisions. These hurdles often become productivity bottlenecks for program managers and executives, hindering their ability to effectively drive organizational success.
Planview, a leading provider of connected work management solutions, embarked in 2023 on an ambitious plan to transform the way its 3 million users around the world interact with project management applications. To realize this vision, Planview developed an AI assistant called Planview Copilot, built as a multi-agent system powered by Amazon Bedrock.
The development of this multi-agent system presented several challenges:
- Routing tasks to the appropriate AI agent
- Accessing data from diverse sources and formats
- Interacting with multiple application APIs
- Enabling self-service creation of new AI skills by different product teams
To overcome these challenges, Planview developed a multi-agent architecture built on Amazon Bedrock. Amazon Bedrock is a fully managed service that provides API access to foundation models (FMs) from Amazon and leading AI companies, allowing developers to choose the best FM for their use case. This approach is architecturally and organizationally scalable, enabling Planview to quickly develop and deploy new AI skills to meet evolving customer needs.
In this post, we focus primarily on the first challenge: task routing and managing multiple agents in a generative AI architecture. We explore Planview’s approach to this challenge during the development of Planview Copilot and share insights into the design decisions that provide efficient and reliable task routing.
This project was implemented before Amazon Bedrock Agents was generally available, so this post describes a custom-built agent framework. However, Amazon Bedrock Agents is now the recommended solution for organizations looking to use AI-powered agents in their operations. Amazon Bedrock Agents can retain memory across interactions, enabling a more personalized and seamless user experience, with more consistent and efficient interactions, improved recommendations, and the ability to recall previous context when needed. We share our learnings to help you understand how to use AWS technologies to build solutions that achieve your goals.
Solution overview
Planview’s multi-agent architecture consists of multiple generative AI components working together as a single system. At its core, an orchestrator is responsible for routing questions to the appropriate agents, gathering the information they return, and delivering synthesized responses to users. The orchestrator is managed by a central development team, and the agents are managed by each application team.
The orchestrator consists of two main components, a router and a responder, both powered by large language models (LLMs). The router intelligently routes user questions to different application agents with specialized capabilities. The agents fall into three main types:
- Help agent – Provides application help using Retrieval Augmented Generation (RAG)
- Data agent – Dynamically accesses and analyzes customer data
- Action agent – Performs actions within the application on behalf of the user
After an agent processes the question and returns a response, the responder uses an LLM to synthesize the gathered information into a coherent response for the user. This architecture enables seamless collaboration between the centralized orchestrator and the specialized agents to provide accurate and comprehensive answers to user questions. The following diagram shows the end-to-end workflow.
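The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Planview’s implementation: the agent registry, the `route` heuristic, and all names are hypothetical, and the LLM calls are stubbed out (in practice they would go through Amazon Bedrock).

```python
# Hypothetical sketch of the orchestrator flow: a router chooses an agent,
# the chosen agent produces a result, and a responder synthesizes the reply.

AGENTS = {
    "help": lambda q: f"Documentation excerpt relevant to: {q}",   # RAG help agent
    "data": lambda q: f"Query results for: {q}",                   # data agent
    "action": lambda q: f"Performed requested action for: {q}",    # action agent
}

def route(question: str) -> str:
    """Stand-in for the LLM-based router: pick an agent type for a question."""
    lowered = question.lower()
    if any(phrase in lowered for phrase in ("how do i", "what is")):
        return "help"
    if any(phrase in lowered for phrase in ("show", "list", "my tasks")):
        return "data"
    return "action"

def answer(question: str) -> str:
    """Orchestrator: route the question, call the agent, synthesize a response."""
    agent_type = route(question)
    agent_result = AGENTS[agent_type](question)
    # A responder LLM would normally rewrite agent_result into a polished,
    # user-facing reply; here we simply label the agent that handled it.
    return f"[{agent_type} agent] {agent_result}"

print(answer("Show my tasks for this week"))
```

In the real system, both `route` and the final synthesis step are LLM calls, and each entry in the agent registry is a separately deployed service owned by an application team.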
Technology overview
Planview built this multi-agent architecture using key AWS services. A central Copilot service, running on Amazon Elastic Kubernetes Service (Amazon EKS), coordinates activity between the different services. Its responsibilities include:
- Managing chat history for user sessions using Amazon Relational Database Service (Amazon RDS)
- Coordinating traffic between routers, application agents, and responders
- Processing, monitoring, and collecting feedback submitted by users
The router and responder are AWS Lambda functions that interact with Amazon Bedrock. The router considers the user’s question and the chat history from the central Copilot service; the responder considers the user’s question, the chat history, and the responses from each agent.
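Concretely, a router Lambda function might call Amazon Bedrock through the Converse API. The sketch below is an illustration under stated assumptions, not Planview’s code: the handler shape, the tool names, and the JSON tool-selection format are hypothetical, and the `bedrock` client is injectable so a fake can be used for local testing (in a deployed function it would be created with `boto3.client("bedrock-runtime")`).

```python
import json

def router_handler(event, context, bedrock=None):
    """Hypothetical router Lambda sketch. `bedrock` is an injectable Bedrock
    Runtime client; in production, use boto3.client("bedrock-runtime")."""
    history = event.get("chat_history", [])  # prior turns from the Copilot service
    messages = [
        {"role": turn["role"], "content": [{"text": turn["text"]}]} for turn in history
    ] + [{"role": "user", "content": [{"text": event["question"]}]}]

    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        system=[{"text": 'Route the question to one tool: appHelp, dataQuery, '
                         'or takeAction. Reply only with JSON: {"tool": "..."}'}],
        messages=messages,
    )
    # The model is instructed to reply with JSON naming the chosen tool.
    return json.loads(response["output"]["message"]["content"][0]["text"])
```

Dependency injection of the Bedrock client keeps the routing logic unit-testable without AWS credentials; a production version would also need error handling for malformed model output.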
Application teams manage agents using Lambda functions that interact with Amazon Bedrock. To improve visibility, evaluation, and monitoring, Planview has adopted a centralized prompt repository service to store LLM prompts.
Agents can interact with applications in different ways, depending on the use case and data availability:
- Existing application API – Agents can communicate with applications via existing API endpoints
- Amazon Athena or traditional SQL datastore – Agents can retrieve data from Amazon Athena or other SQL-based data stores to provide relevant information
- Amazon Neptune for graph data – Agents can access graph data stored in Amazon Neptune to support complex dependency analysis
- Amazon OpenSearch Service for Document RAG – Agents can perform RAG on documents using Amazon OpenSearch Service
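To make the SQL-backed pattern concrete, the sketch below shows a data agent retrieving a user’s tasks from a SQL store. This is a hypothetical illustration: in production such an agent might query Amazon Athena or another SQL engine, but here Python’s built-in SQLite stands in for the data store, and the schema and data are invented.

```python
import sqlite3

# In-memory SQLite database standing in for an Athena/SQL data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER, assignee TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        (1, "pat", "Draft project plan"),
        (2, "pat", "Review budget"),
        (3, "sam", "Update roadmap"),
    ],
)

def data_agent(assignee: str) -> list:
    """Retrieve the tasks assigned to a user; the rows would be handed back
    to the responder, which synthesizes them into a natural-language answer."""
    cur = conn.execute(
        "SELECT id, title FROM tasks WHERE assignee = ?", (assignee,)
    )
    return cur.fetchall()

print(data_agent("pat"))  # two rows for user "pat"
```

The parameterized query (`?` placeholder) matters here: because agent inputs ultimately originate from user conversations, string-built SQL would be an injection risk.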
The following diagram shows the generative AI assistant architecture on AWS.
Sample prompts for routers and responders
The router and responder components work together to process user queries and generate appropriate responses. The following prompts show examples of router and responder prompt templates. Additional prompt engineering would be required to improve reliability in a production implementation.
The router prompt first describes the available tools, including each tool’s purpose and sample questions it can answer. The sample questions help guide the natural language interaction between the orchestrator and the agents represented as tools.
The prompt then provides guidelines for deciding whether the router should respond directly to the user’s query or request information through a specific tool before crafting a response.
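A template following that structure might look like the sketch below. The tool names, wording, and JSON output convention are illustrative assumptions, not Planview’s actual prompts.

```python
# Hypothetical router prompt template: tool descriptions with sample
# questions, followed by response guidelines. Double braces escape the
# literal JSON braces inside str.format().
ROUTER_TEMPLATE = """You are a router for a work management assistant.

Available tools:
- appHelp: answers how-to questions. Example: "How do I create a project?"
- dataQuery: retrieves customer data. Example: "What tasks are assigned to me?"
- takeAction: performs actions. Example: "Mark task 42 as complete."

If you can answer directly, reply with {{"tool": "none", "answer": "..."}}.
Otherwise reply with {{"tool": "<toolName>", "input": "<rephrased question>"}}.

Chat history:
{chat_history}

User question: {question}
"""

prompt = ROUTER_TEMPLATE.format(
    chat_history="(empty)",
    question="What tasks are assigned to me?",
)
print(prompt)
```

Keeping templates like this in a centralized prompt repository, as Planview does, makes it possible to version, evaluate, and monitor them independently of the Lambda code that renders them.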
The following is a sample response from the router component, which invokes the dataQuery tool to retrieve and analyze a user’s task assignments.
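A hypothetical illustration of such a tool-invocation response is shown below; the exact field names and schema are assumptions for illustration, not Planview’s actual response format.

```python
import json

# Illustrative router output selecting the dataQuery tool.
router_response = json.loads("""
{
  "tool": "dataQuery",
  "input": "List the tasks currently assigned to the requesting user"
}
""")
print(router_response["tool"])
```

The orchestrator would parse this JSON and forward the `input` text to the data agent behind the named tool.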
The following is a sample response from the responder component, which uses the information retrieved by the dataQuery tool to report that the user has five tasks assigned.
Model evaluation and selection
Evaluating and monitoring the performance of generative AI models is important in any AI system. Planview’s multi-agent architecture allows evaluation at various component levels, providing comprehensive quality control despite system complexity. Planview evaluates components at three levels:
- Prompt – Evaluates the effectiveness and accuracy of individual LLM prompts
- AI agent – Evaluates complete prompt chains to maintain optimal task processing and response relevance
- AI system – Tests user-facing interactions to verify seamless integration of all components
The following diagram shows the prompting and scoring evaluation framework.
To conduct these assessments, Planview uses a carefully crafted set of test questions that cover typical user queries and special cases. These evaluations are performed during the development phase and continue in production to track the quality of responses over time. Currently, human raters play a key role in scoring answers. To aid in assessment, Planview has developed an internal assessment tool to store a library of questions and track responses over time.
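A minimal evaluation harness along these lines might look like the sketch below. Everything here is an assumption for illustration: the test questions are invented, `assistant` is a placeholder for the real system under test, and the automatic keyword check merely approximates the human scoring that Planview actually relies on.

```python
# Hypothetical evaluation harness: run a fixed set of test questions
# through the assistant and track the acceptance rate over time.

TEST_SET = [
    {"question": "How do I create a project?", "must_contain": "project"},
    {"question": "What tasks are assigned to me?", "must_contain": "task"},
]

def assistant(question: str) -> str:
    """Placeholder for the end-to-end AI system under test."""
    topic = "project" if "project" in question else "task"
    return f"Here is an answer about your {topic} request."

def evaluate(test_set) -> float:
    """Return the fraction of responses that pass their acceptance check."""
    passed = sum(
        case["must_contain"] in assistant(case["question"]) for case in test_set
    )
    return passed / len(test_set)

print(f"Acceptance rate: {evaluate(TEST_SET):.0%}")
```

Running the same test set against each candidate model or prompt revision yields comparable acceptance rates, which is what lets quality be tracked from development into production.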
To evaluate each component and determine the best Amazon Bedrock model for a given task, Planview established the following prioritized evaluation criteria:
- Response quality – Ensure accuracy, relevance, and usefulness of system responses
- Response time – Minimize latency between user queries and system responses
- Scale – Ensure the system can scale to thousands of concurrent users
- Cost – Optimize operational costs, including AWS service and generative AI model costs, to maintain economic viability
Based on these criteria and the current use case, Planview selected Anthropic’s Claude 3 Sonnet on Amazon Bedrock for the router and responder components.
Results and impact
Over the past year, Planview Copilot’s performance has improved significantly through the implementation of a multi-agent architecture, the development of a robust evaluation framework, and the adoption of the latest FMs available through Amazon Bedrock. Planview saw the following results between the first generation of Planview Copilot and the latest version, developed in mid-2023:
- Accuracy – Human-evaluated response acceptance improved from a 50% acceptance rate to over 95%
- Response time – Average response time decreased from 1 minute to 20 seconds
- Load testing – The AI assistant passed load tests, with no noticeable impact on response time or quality even with 1,000 questions submitted simultaneously
- Cost-effectiveness – Cost per customer interaction dropped to one-tenth of the initial cost
- Time to market – Time to develop and deploy new agents decreased from months to weeks
Conclusion
In this post, we explored how Planview was able to develop a generative AI assistant to handle complex work management processes by employing the following strategies:
- Modular development – Planview built a multi-agent architecture with a centralized orchestrator. The solution enables efficient task processing and system scalability, while allowing different product teams to rapidly develop and deploy new AI skills through specialized agents.
- Evaluation framework – Planview implemented a robust evaluation process at multiple levels, which is essential for maintaining and improving performance.
- Amazon Bedrock integration – Planview uses Amazon Bedrock to innovate faster with access to a broad selection of FMs, allowing flexible model choice based on the requirements of each task.
Planview is migrating to Amazon Bedrock Agents, which enables the integration of intelligent autonomous agents within its application ecosystem. Amazon Bedrock Agents automates processes by orchestrating interactions between foundation models, data sources, applications, and user conversations.
As a next step, you can explore Planview’s AI assistant capabilities built on Amazon Bedrock, and stay up to date with new Amazon Bedrock features and releases to advance your AI efforts on AWS.
About the authors
Sunil Ramachandra is a Senior Solutions Architect who enables fast-growing independent software vendors (ISVs) to innovate and accelerate on AWS. He works with customers to build scalable and resilient cloud architectures. When not working with customers, he enjoys running, meditating, watching movies on Prime Video, and spending time with his family.
Benedict Augustine is a thought leader in generative AI and machine learning and a Senior Specialist at AWS. He advises customer CEOs on their AI strategy, helping them build a long-term vision while delivering immediate ROI. As a Vice President of Machine Learning, he spent the past decade building seven AI-first SaaS products, which are now used by Fortune 100 companies and have made a significant impact on their businesses. His research has resulted in five patents.
Lee Rehwinkel is a Principal Data Scientist at Planview with 20 years of experience incorporating AI and ML into enterprise software. He holds advanced degrees from both Carnegie Mellon University and Columbia University. Lee spearheads Planview’s research and development of AI capabilities within Planview Copilot. Outside of work, he enjoys rowing on Lady Bird Lake in Austin.