This post was co-authored with Lee Rehwinkel of Planview.
Businesses today face numerous challenges in managing complex projects and programs, extracting valuable insights from large amounts of data, and making timely decisions. These hurdles often become productivity bottlenecks for program managers and executives, hindering their ability to effectively drive organizational success.
Planview, a leading provider of connected work management solutions, embarked in 2023 on an ambitious plan to transform the way its 3 million users around the world interact with project management applications. To realize this vision, Planview developed an AI assistant called Planview Copilot, built as a multi-agent system powered by Amazon Bedrock.
The development of this multi-agent system presented several challenges:
- Routing tasks to the appropriate AI agent
- Accessing data from diverse sources and formats
- Interacting with multiple application APIs
- Enabling self-service creation of new AI skills by different product teams
To overcome these challenges, Planview developed a multi-agent architecture built on Amazon Bedrock. Amazon Bedrock is a fully managed service that provides API access to foundation models (FMs) from Amazon and leading AI companies, allowing developers to choose the best FM for their use case. This approach is architecturally and organizationally scalable, enabling Planview to quickly develop and deploy new AI skills to meet evolving customer needs.
In this post, we focus primarily on the first challenge: task routing and managing multiple agents in a generative AI architecture. We explore Planview’s approach to this challenge during the development of Planview Copilot and share insights into the design decisions that provide efficient and reliable task routing.
This project was implemented before Amazon Bedrock Agents was generally available, so this post describes a custom-built agent framework. However, Amazon Bedrock Agents is now the recommended solution for organizations looking to use AI-powered agents in their operations. Amazon Bedrock Agents can retain memory across interactions, enabling a more personalized and seamless user experience, with more consistent and efficient interactions, improved recommendations, and the ability to recall previous context when needed. We share our learnings to help you understand how to use AWS technologies to build solutions that achieve your goals.
Solution overview
Planview’s multi-agent architecture consists of multiple generative AI components working together as a single system. At its core, an orchestrator is responsible for routing questions to the appropriate agents, gathering the information they return, and delivering synthesized responses to users. The orchestrator is managed by a central development team, and the agents are managed by each application team.
The orchestrator consists of two main components, a router and a responder, both powered by large language models (LLMs). The router intelligently routes user questions to different application agents with specialized capabilities. The agents fall into three main types:
- Help agent – Provides application help using Retrieval Augmented Generation (RAG)
- Data agent – Dynamically accesses and analyzes customer data
- Action agent – Performs actions within the application on behalf of the user
After an agent processes the question and returns a response, the responder uses an LLM to synthesize the gathered information into a coherent response for the user. This architecture enables seamless collaboration between the centralized orchestrator and the specialized agents to provide accurate and comprehensive answers to user questions. The following diagram shows the end-to-end workflow.
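The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Planview’s implementation: the agent registry, the `route` heuristic, and all names are hypothetical, and the LLM calls are stubbed out (in practice they would go through Amazon Bedrock).

```python
# Hypothetical sketch of the orchestrator flow: a router chooses an agent,
# the chosen agent produces a result, and a responder synthesizes the reply.

AGENTS = {
    "help": lambda q: f"Documentation excerpt relevant to: {q}",   # RAG help agent
    "data": lambda q: f"Query results for: {q}",                   # data agent
    "action": lambda q: f"Performed requested action for: {q}",    # action agent
}

def route(question: str) -> str:
    """Stand-in for the LLM-based router: pick an agent type for a question."""
    lowered = question.lower()
    if any(phrase in lowered for phrase in ("how do i", "what is")):
        return "help"
    if any(phrase in lowered for phrase in ("show", "list", "my tasks")):
        return "data"
    return "action"

def answer(question: str) -> str:
    """Orchestrator: route the question, call the agent, synthesize a response."""
    agent_type = route(question)
    agent_result = AGENTS[agent_type](question)
    # A responder LLM would normally rewrite agent_result into a polished,
    # user-facing reply; here we simply label the agent that handled it.
    return f"[{agent_type} agent] {agent_result}"

print(answer("Show my tasks for this week"))
```

In the real system, both `route` and the final synthesis step are LLM calls, and each entry in the agent registry is a separately deployed service owned by an application team.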
Technology overview
Planview built this multi-agent architecture using key AWS services. A central Copilot service, running on Amazon Elastic Kubernetes Service (Amazon EKS), coordinates activity between the different services. Its responsibilities include:
- Managing chat history for user sessions using Amazon Relational Database Service (Amazon RDS)
- Coordinating traffic between routers, application agents, and responders
- Processing, monitoring, and collecting feedback submitted by users
The router and responder are AWS Lambda functions that interact with Amazon Bedrock. The router considers the user’s question and the chat history from the central Copilot service; the responder considers the user’s question, the chat history, and the responses from each agent.
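Concretely, a router Lambda function might call Amazon Bedrock through the Converse API. The sketch below is an illustration under stated assumptions, not Planview’s code: the handler shape, the tool names, and the JSON tool-selection format are hypothetical, and the `bedrock` client is injectable so a fake can be used for local testing (in a deployed function it would be created with `boto3.client("bedrock-runtime")`).

```python
import json

def router_handler(event, context, bedrock=None):
    """Hypothetical router Lambda sketch. `bedrock` is an injectable Bedrock
    Runtime client; in production, use boto3.client("bedrock-runtime")."""
    history = event.get("chat_history", [])  # prior turns from the Copilot service
    messages = [
        {"role": turn["role"], "content": [{"text": turn["text"]}]} for turn in history
    ] + [{"role": "user", "content": [{"text": event["question"]}]}]

    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        system=[{"text": 'Route the question to one tool: appHelp, dataQuery, '
                         'or takeAction. Reply only with JSON: {"tool": "..."}'}],
        messages=messages,
    )
    # The model is instructed to reply with JSON naming the chosen tool.
    return json.loads(response["output"]["message"]["content"][0]["text"])
```

Dependency injection of the Bedrock client keeps the routing logic unit-testable without AWS credentials; a production version would also need error handling for malformed model output.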
Application teams manage agents using Lambda functions that interact with Amazon Bedrock. To improve visibility, evaluation, and monitoring, Planview has adopted a centralized prompt repository service to store LLM prompts.
Agents can interact with applications in different ways, depending on the use case and data availability:
- Existing application API – Agents can communicate with applications via existing API endpoints
- Amazon Athena or traditional SQL datastore – Agents can retrieve data from Amazon Athena or other SQL-based data stores to provide relevant information
- Amazon Neptune for graph data – Agents can access graph data stored in Amazon Neptune to support complex dependency analysis
- Amazon OpenSearch Service for Document RAG – Agents can perform RAG on documents using Amazon OpenSearch Service
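To make the SQL-backed pattern concrete, the sketch below shows a data agent retrieving a user’s tasks from a SQL store. This is a hypothetical illustration: in production such an agent might query Amazon Athena or another SQL engine, but here Python’s built-in SQLite stands in for the data store, and the schema and data are invented.

```python
import sqlite3

# In-memory SQLite database standing in for an Athena/SQL data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER, assignee TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        (1, "pat", "Draft project plan"),
        (2, "pat", "Review budget"),
        (3, "sam", "Update roadmap"),
    ],
)

def data_agent(assignee: str) -> list:
    """Retrieve the tasks assigned to a user; the rows would be handed back
    to the responder, which synthesizes them into a natural-language answer."""
    cur = conn.execute(
        "SELECT id, title FROM tasks WHERE assignee = ?", (assignee,)
    )
    return cur.fetchall()

print(data_agent("pat"))  # two rows for user "pat"
```

The parameterized query (`?` placeholder) matters here: because agent inputs ultimately originate from user conversations, string-built SQL would be an injection risk.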
The following diagram shows the generative AI assistant architecture on AWS.
Sample prompts for routers and responders
The router and responder components work together to process user queries and generate appropriate responses. The following prompts show examples of router and responder prompt templates. Additional prompt engineering would be required to improve reliability in a production implementation.
The router prompt first describes the available tools, including each tool’s purpose and sample questions it can answer. The sample questions help guide the natural language interaction between the orchestrator and the agents represented as tools.
The prompt then provides guidelines for deciding whether the router should respond directly to the user’s query or request information through a specific tool before crafting a response.
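A template following that structure might look like the sketch below. The tool names, wording, and JSON output convention are illustrative assumptions, not Planview’s actual prompts.

```python
# Hypothetical router prompt template: tool descriptions with sample
# questions, followed by response guidelines. Double braces escape the
# literal JSON braces inside str.format().
ROUTER_TEMPLATE = """You are a router for a work management assistant.

Available tools:
- appHelp: answers how-to questions. Example: "How do I create a project?"
- dataQuery: retrieves customer data. Example: "What tasks are assigned to me?"
- takeAction: performs actions. Example: "Mark task 42 as complete."

If you can answer directly, reply with {{"tool": "none", "answer": "..."}}.
Otherwise reply with {{"tool": "<toolName>", "input": "<rephrased question>"}}.

Chat history:
{chat_history}

User question: {question}
"""

prompt = ROUTER_TEMPLATE.format(
    chat_history="(empty)",
    question="What tasks are assigned to me?",
)
print(prompt)
```

Keeping templates like this in a centralized prompt repository, as Planview does, makes it possible to version, evaluate, and monitor them independently of the Lambda code that renders them.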
The following is a sample response from the router component, which invokes the dataQuery tool to retrieve and analyze a user’s task assignments.
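A hypothetical illustration of such a tool-invocation response is shown below; the exact field names and schema are assumptions for illustration, not Planview’s actual response format.

```python
import json

# Illustrative router output selecting the dataQuery tool.
router_response = json.loads("""
{
  "tool": "dataQuery",
  "input": "List the tasks currently assigned to the requesting user"
}
""")
print(router_response["tool"])
```

The orchestrator would parse this JSON and forward the `input` text to the data agent behind the named tool.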
The following is a sample response from the responder component, which uses the information retrieved by the dataQuery tool to report that the user has five tasks assigned.
Model evaluation and selection
Evaluating and monitoring the performance of generative AI models is important in any AI system. Planview’s multi-agent architecture allows evaluation at various component levels, providing comprehensive quality control despite system complexity. Planview evaluates components at three levels:
- Prompt – Evaluates the effectiveness and accuracy of individual LLM prompts
- AI agent – Evaluates complete prompt chains to maintain optimal task processing and response relevance
- AI system – Tests user-facing interactions to verify seamless integration of all components
The following diagram shows the prompting and scoring evaluation framework.
To conduct these assessments, Planview uses a carefully crafted set of test questions that cover typical user queries and special cases. These evaluations are performed during the development phase and continue in production to track the quality of responses over time. Currently, human raters play a key role in scoring answers. To aid in assessment, Planview has developed an internal assessment tool to store a library of questions and track responses over time.
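A minimal evaluation harness along these lines might look like the sketch below. Everything here is an assumption for illustration: the test questions are invented, `assistant` is a placeholder for the real system under test, and the automatic keyword check merely approximates the human scoring that Planview actually relies on.

```python
# Hypothetical evaluation harness: run a fixed set of test questions
# through the assistant and track the acceptance rate over time.

TEST_SET = [
    {"question": "How do I create a project?", "must_contain": "project"},
    {"question": "What tasks are assigned to me?", "must_contain": "task"},
]

def assistant(question: str) -> str:
    """Placeholder for the end-to-end AI system under test."""
    topic = "project" if "project" in question else "task"
    return f"Here is an answer about your {topic} request."

def evaluate(test_set) -> float:
    """Return the fraction of responses that pass their acceptance check."""
    passed = sum(
        case["must_contain"] in assistant(case["question"]) for case in test_set
    )
    return passed / len(test_set)

print(f"Acceptance rate: {evaluate(TEST_SET):.0%}")
```

Running the same test set against each candidate model or prompt revision yields comparable acceptance rates, which is what lets quality be tracked from development into production.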
To evaluate each component and determine the best Amazon Bedrock model for a given task, Planview established the following prioritized evaluation criteria:
- Response quality – Ensure accuracy, relevance, and usefulness of system responses
- Response time – Minimize latency between user queries and system responses
- Scale – Ensure the system can scale to thousands of concurrent users
- Cost – Optimize operational costs, including AWS service and generative AI model costs, to maintain economic viability
Based on these criteria and the current use case, Planview selected Anthropic’s Claude 3 Sonnet on Amazon Bedrock for the router and responder components.
Results and impact
Over the past year, Planview Copilot’s performance has improved significantly through the implementation of a multi-agent architecture, the development of a robust evaluation framework, and the adoption of the latest FMs available through Amazon Bedrock. Planview saw the following results between the first generation of Planview Copilot and the latest version, developed in mid-2023:
- Accuracy – Human-evaluated response acceptance improved from a 50% acceptance rate to over 95%
- Response time – Average response time decreased from 1 minute to 20 seconds
- Load testing – The AI assistant passed load tests, with no noticeable impact on response time or quality even with 1,000 questions submitted simultaneously
- Cost-effectiveness – Cost per customer interaction dropped to one-tenth of the initial cost
- Time to market – Time to develop and deploy new agents decreased from months to weeks
Conclusion
In this post, we explored how Planview was able to develop a generative AI assistant to handle complex work management processes by employing the following strategies:
- Modular development – Planview built a multi-agent architecture with a centralized orchestrator. The solution enables efficient task processing and system scalability, while allowing different product teams to rapidly develop and deploy new AI skills through specialized agents.
- Evaluation framework – Planview implemented a robust evaluation process at multiple levels, which is essential for maintaining and improving performance.
- Amazon Bedrock integration – Planview uses Amazon Bedrock to innovate faster with access to a broad selection of FMs, allowing flexible model choice based on the requirements of each task.
Planview is migrating to Amazon Bedrock Agents, which enables the integration of intelligent autonomous agents within its application ecosystem. Amazon Bedrock Agents automates processes by orchestrating interactions between foundation models, data sources, applications, and user conversations.
As a next step, you can explore Planview’s AI assistant capabilities built on Amazon Bedrock, and stay up to date with new Amazon Bedrock features and releases to advance your AI efforts on AWS.
About the authors
Sunil Ramachandra is a Senior Solutions Architect who enables fast-growing independent software vendors (ISVs) to innovate and accelerate on AWS. He works with customers to build scalable and resilient cloud architectures. When not working with customers, he enjoys running, meditating, watching movies on Prime Video, and spending time with his family.
Benedict Augustine is a thought leader in generative AI and machine learning and a Senior Specialist at AWS. He advises customer CEOs on their AI strategy, helping them build a long-term vision while delivering immediate ROI. As a Vice President of Machine Learning, he spent the past decade building seven AI-first SaaS products, which are now used by Fortune 100 companies and have made a significant impact on their businesses. His research has resulted in five patents.
Lee Rehwinkel is a Principal Data Scientist at Planview with 20 years of experience incorporating AI and ML into enterprise software. He holds advanced degrees from both Carnegie Mellon University and Columbia University. Lee spearheads Planview’s research and development of AI capabilities within Planview Copilot. Outside of work, he enjoys rowing on Lady Bird Lake in Austin.