This is a guest post written in collaboration with George Lee, Jordan Knight, and Sarah Reynolds of Travelers.
Foundation models (FMs) are used in many ways and perform well on tasks such as text generation, text summarization, and question answering. FMs can also tackle tasks that were previously solved with supervised learning, a subset of machine learning (ML) in which algorithms are trained on labeled datasets. In some cases, smaller supervised models have shown the ability to perform in production environments while meeting latency requirements. However, building FM-based classifiers with an API service such as Amazon Bedrock brings benefits such as faster development, the ability to switch between models, rapid experimentation for prompt engineering, and the scalability that comes with a managed service. FM-driven solutions can also provide rationale for their outputs, a feature that traditional classifiers lack. On top of these capabilities, the latest FMs are powerful enough to meet the accuracy and precision requirements needed to replace supervised learning models.
This post describes how the Generative AI Innovation Center (GenAIIC) collaborated with Travelers, a leading property and casualty insurance company, to develop an FM-based classifier through prompt engineering. Travelers receives millions of emails a year containing agent or customer service policy requests. The system built by GenAIIC and Travelers uses the predictive capabilities of FMs to classify complex, and sometimes ambiguous, service request emails into several categories. This FM classifier saves tens of thousands of hours of manual processing and powers an automation system that redirects that time toward more complex tasks. Using Anthropic's Claude models on Amazon Bedrock, we formulated the problem as a classification task and achieved 91% classification accuracy through prompt engineering and partnership with business subject matter experts.
Problem formulation
The main task was to classify emails received by Travelers into service request categories. Requests covered areas such as address changes, coverage adjustments, payroll updates, and exposure changes. Using a pretrained FM, we formulated the problem as a text classification task. However, instead of using supervised learning, which would require labeling and training resources, we used prompt engineering with few-shot prompting to predict the class of an email. This allowed us to use a pretrained FM without incurring any training cost. The workflow starts with an email: the email body text and any PDF attachments are provided to the model, and the model classifies the email.
It's important to note that fine-tuning an FM is another approach that could improve classification performance, at an additional cost. By curating a longer list of examples with their expected outputs, an FM can be trained to perform better on a specific task. In this case, given that accuracy was already high using prompt engineering alone, the accuracy after fine-tuning would have to justify the cost. At the time of the engagement, Anthropic's Claude models were not available for fine-tuning on Amazon Bedrock, but fine-tuning of Anthropic's Claude Haiku is now in beta testing through Amazon Bedrock.
Solution overview
The following figure shows the solution pipeline for classifying emails.
The workflow consists of the following steps:
- Raw emails are ingested into the pipeline, and the body text is extracted from the email text files.
- If an email has a PDF attachment, the PDF is parsed.
- The PDF is split into individual pages, and each page is saved as an image.
- The PDF page images are processed with Amazon Textract, which extracts text, specific entities, and table data using optical character recognition (OCR).
- The text from the email body is parsed.
- The text is then cleaned of HTML tags, if necessary.
- The text from the email body and the PDF attachments is combined into a single prompt for a large language model (LLM).
- Anthropic's Claude classifies the content into one of the 13 defined categories and returns the class. Each email prediction is further used for performance analysis.
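The final steps of the pipeline can be sketched as follows. This is a minimal illustration, not Travelers' actual implementation: the category names, the prompt wording, and the helper functions are assumptions, and only the Amazon Bedrock model ID (`anthropic.claude-v2`) and legacy text-completions request shape reflect the real API.

```python
import json
import re

# Hypothetical category names -- the real 13 categories are not public.
CATEGORIES = ["Address Change", "Coverage Adjustment", "Payroll Update", "Exposure Change"]

def build_prompt(email_body: str, attachment_text: str) -> str:
    """Combine email body text and extracted PDF text into a single prompt."""
    category_list = "\n".join(f"- {c}" for c in CATEGORIES)
    return (
        "You are an insurance service-request triage assistant.\n"
        f"Classify the email below into exactly one of these categories:\n{category_list}\n\n"
        f"<email>\n{email_body}\n</email>\n"
        f"<attachment>\n{attachment_text}\n</attachment>\n"
        "Answer with the category name inside <class></class> tags."
    )

def parse_predicted_class(model_output: str) -> str:
    """Pull the predicted class out of the model's tagged response."""
    match = re.search(r"<class>(.*?)</class>", model_output, re.DOTALL)
    return match.group(1).strip() if match else "UNKNOWN"

def classify_email(bedrock_client, email_body: str, attachment_text: str) -> str:
    """Invoke Anthropic's Claude on Amazon Bedrock (requires AWS credentials)."""
    response = bedrock_client.invoke_model(
        modelId="anthropic.claude-v2",  # the model used at the time of the engagement
        body=json.dumps({
            "prompt": f"\n\nHuman: {build_prompt(email_body, attachment_text)}\n\nAssistant:",
            "max_tokens_to_sample": 100,
        }),
    )
    return parse_predicted_class(json.loads(response["body"].read())["completion"])
```

Asking the model to wrap its answer in tags, as done here, makes the class machine-parseable and reduces variance in the output format.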
Amazon Textract served several purposes, such as extracting the raw text of the forms included as email attachments. Its additional entity extraction and table data detection identified names, policy numbers, dates, and so on. The Amazon Textract output was then combined with the email body text and passed to the model to determine the appropriate class.
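A minimal sketch of this Textract step is shown below. The `analyze_document` call with the `FORMS` and `TABLES` feature types is the real Amazon Textract API for key-value and table extraction; the helper functions around it are illustrative assumptions.

```python
def analyze_page(textract_client, page_image_bytes: bytes) -> dict:
    """Run Amazon Textract on one PDF page image (requires AWS credentials)."""
    return textract_client.analyze_document(
        Document={"Bytes": page_image_bytes},
        FeatureTypes=["FORMS", "TABLES"],  # key-value pairs and tables, in addition to raw text
    )

def extract_lines(textract_response: dict) -> str:
    """Concatenate the text of all LINE blocks from a Textract response."""
    return "\n".join(
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block["BlockType"] == "LINE"
    )
```

The `LINE` blocks give the raw text of the form; `KEY_VALUE_SET` and `TABLE` blocks in the same response carry the entity and table structure.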
This solution is serverless, which offers many advantages for the organization. A serverless solution is a managed service, which helps lower the total cost of ownership and reduces the complexity of maintenance.
Data
The ground truth dataset contained more than 4,000 labeled email examples. Raw emails were in Outlook .msg format and raw .eml format. Approximately 25% of the emails had PDF attachments, most of which were ACORD insurance forms. The PDF forms included additional details that provided signal to the classifier. Only PDF attachments were processed, to limit the scope; other attachment types were ignored. In most cases, the body text contained most of the predictive signal aligning with one of the 13 classes.
Prompt engineering
To build a robust prompt, we needed to fully understand the differences between categories in order to give the FM sufficient explanations. Through manual analysis of email texts and consultation with business experts, the prompt came to include explicit instructions on how to classify emails. Additional instructions showed Anthropic's Claude how to identify key phrases that distinguish one email class from the others. The prompt also included few-shot examples demonstrating how to perform the classification, and output examples showing how the FM should format its response. By providing examples and other prompting techniques to the FM, we significantly reduced variance in the structure and content of the FM output, yielding explainable, predictable, and reproducible results.
The prompt structure was as follows:
- Persona definition
- Overall instructions
- Few-shot examples
- Detailed definition of each class
- Email data input
- Final output instructions
For more information about prompt engineering for Anthropic's Claude, see the prompt engineering section of the Anthropic documentation.
"Claude is particularly skilled at tasks like email classification because of its ability to understand nuanced policy language and complex insurance terminology, and its capacity to interpret context and intent even in ambiguous communications. This aligns perfectly with the challenges faced in this use case, and shows how Anthropic and AWS can together create efficient solutions that transform insurance processes."

– Jonathan Pelosi, Anthropic
Results
To use an FM-based classifier in production, it must demonstrate a high level of accuracy. Initial testing without prompt engineering yielded 68% accuracy. Accuracy rose to 91% after applying a variety of techniques with Anthropic's Claude v2: prompt engineering, condensing the categories, adjusting the document processing flow, and improving the instructions. Anthropic's Claude Instant on Amazon Bedrock also performed well, reaching 90% accuracy, with room for further improvement.
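An evaluation loop of this kind can be sketched with a couple of small helpers; these are generic illustrations, not the metrics code used in the engagement.

```python
from collections import Counter

def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of emails whose predicted class matches the ground truth label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def errors_by_class(predictions: list[str], labels: list[str]) -> Counter:
    """Count misclassifications per true class, which helps identify
    which categories to condense or whose definitions to improve."""
    return Counter(y for p, y in zip(predictions, labels) if p != y)
```

Breaking errors down by true class is what motivates steps like condensing overlapping categories: classes that absorb most of the mistakes are candidates for merging or for sharper definitions in the prompt.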
Conclusion
In this post, we described how FMs can reliably automate the classification of insurance service emails through prompt engineering. When the problem is formulated as a classification task, an FM can perform well enough for production environments while remaining extensible to other tasks and getting up and running quickly. All experiments were conducted using Anthropic's Claude models on Amazon Bedrock.
About the authors
Jordan Knight is a Senior Data Scientist working for Travelers in the Business Insurance Analytics and Research department. His passion is solving challenging real-world computer vision problems and exploring new state-of-the-art methods to do so. He has a particular interest in the social impact of ML models and how to continually improve modeling processes to develop ML solutions that are equitable for all. In his free time you can find him rock climbing, hiking, or continuing to develop his somewhat rudimentary cooking skills.
Sarah Reynolds is a Product Owner at Travelers. As a member of the Enterprise AI team, she uses AI and cloud-based technology to transform processing in operations. She recently earned an MBA and a PhD in Learning Technologies, and has been an adjunct professor at the University of North Texas.
George Lee is AVP, Data Science and Generative AI Lead for International at Travelers Insurance. He specializes in developing enterprise AI solutions, with expertise in generative AI and large language models. George has led several successful AI initiatives and holds two patents in AI-powered risk assessment. He received his master's degree in Computer Science from the University of Illinois at Urbana-Champaign.
Francisco Calderon is a Data Scientist at the Generative AI Innovation Center (GenAIIC). As a member of the GenAIIC, he helps AWS customers discover the art of the possible with generative AI technologies. In his spare time, Francisco likes playing music and guitar, playing soccer with his daughter, and enjoying time with his family.
Isaac Privitera is a Principal Data Scientist with the AWS Generative AI Innovation Center, where he develops bespoke generative AI-based solutions to address customers' business problems. His primary focus is building responsible AI systems, using techniques such as RAG, multi-agent systems, and model fine-tuning. When not immersed in the world of AI, Isaac can be found on the golf course, enjoying a football game, or hiking trails with his loyal canine companion.