SK Telecom (SKT) is a leading Korean telecommunications company serving 30 million customers and is at the forefront of AI innovation. In line with its AI Pyramid strategy, which aims to unlock the potential of AI for anyone, anywhere, anytime, SKT collaborated with the AWS Generative AI Innovation Center (GenAIIC) Custom Model Program to explore domain-trained models using Amazon Bedrock for telco-specific use cases.
This partnership is in line with SKT’s vision to leverage AI expertise and strategic partnerships to develop innovative AI-based products and services. One such effort focused on developing a custom solution for reference-based, evidence-based question answering (Q&A).
Retrieval Augmented Generation (RAG) is a common technique used in Q&A tasks to improve factual accuracy and grounding in a knowledge base. However, RAG faces challenges: generated responses may not match the tone, style, and manner preferred for telecom use cases, and irrelevant retrieved documents can lead to inaccurate answers. To address this, SKT and AWS GenAIIC aimed to use model customization to improve Anthropic's Claude models on Amazon Bedrock in three key areas:
- Provide concise and informative answers
- Correctly cite reference links from the retrieved documents
- Respond in a tone and style consistent with SKT's and aligned with the ground truth answers
Additionally, the team explored using synthetic data generated by larger language models (LLMs) to improve the performance of smaller models, both for knowledge distillation and for scenarios with limited labeled training data.
Amazon Bedrock is a fully managed service that offers a variety of LLMs and foundation models (FMs), along with capabilities such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, and Amazon Bedrock Guardrails, to power a wide range of generative AI use cases. Amazon Bedrock is the only fully managed service that offers the ability to fine-tune Claude models, providing an intuitive and secure way to customize Anthropic's Claude and other models. Fine-tuned Claude models can be deployed with Amazon Bedrock and seamlessly use its features, for example Amazon Bedrock Knowledge Bases for telecom domain-specific RAG and Amazon Bedrock Agents for agentic use cases.
In this post, we share how SKT uses Amazon Bedrock to customize the Anthropic Claude model for carrier-specific Q&A on SKT’s telecom technical documentation.
Solution overview
The team considered a combination of prompt optimization, model customization (fine-tuning), and data augmentation with synthetic data. This multifaceted approach aimed to maximize the benefits of each technique for the grounded Q&A generation task.
The following sections describe these methods in detail.
Anthropic's Claude customization and prompt optimization
Fine-tuning, available through Amazon Bedrock for various FMs including Anthropic's Claude, allows you to adapt a pre-trained language model to a specific use case. It is especially effective for adjusting response style and format adherence.
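As a rough illustration of what submitting such a fine-tuning job looks like, the following minimal sketch uses the boto3 Bedrock control-plane client; the base model identifier, IAM role, S3 locations, and hyperparameter values are placeholders, not the configuration SKT used.

```python
import boto3

# Control-plane client for Amazon Bedrock (model customization jobs),
# as opposed to the bedrock-runtime client used for inference.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_customization_job(
    jobName="claude-telco-qa-finetune",                      # placeholder name
    customModelName="claude-telco-qa",                       # placeholder name
    roleArn="arn:aws:iam::111122223333:role/BedrockFtRole",  # placeholder role
    baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0:200k",  # example model ID
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/train/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={  # illustrative values, not SKT's settings
        "epochCount": "2",
        "batchSize": "8",
        "learningRateMultiplier": "1.0",
    },
)
print("Started customization job:", response["jobArn"])
```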
The team began by optimizing the system prompt, implementing standardized guidelines for answer formatting and document citation based on Anthropic's prompting best practices (see the sketch after this list). The main focus areas were:
- Clear articulation of system instructions
- Consistent use of code block formatting
- Context-aware response customization
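To make these guidelines concrete, here is a minimal sketch of passing a system prompt with answer-format and citation rules to Claude through the Amazon Bedrock Converse API; the prompt text, placeholder document, and model ID are illustrative assumptions, not SKT's production prompt.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative system prompt encoding answer-format and citation guidelines.
SYSTEM_PROMPT = (
    "You are a telecom support assistant. Answer concisely, using only the "
    "retrieved documents provided in the user message. Cite the reference link "
    "of every document you rely on, and say so if the documents do not contain "
    "the answer."
)

retrieved_context = "<doc url='https://example.com/5g-roaming'>...</doc>"  # placeholder document
question = "How do I enable 5G roaming on my plan?"

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    system=[{"text": SYSTEM_PROMPT}],
    messages=[{
        "role": "user",
        "content": [{"text": f"{retrieved_context}\n\nQuestion: {question}"}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```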
This combination of prompt engineering and fine-tuning delivered significant improvements:
- ROUGE-3 score increased by more than 50%
- ROUGE-L score improved by more than 25%
- Embedding similarity score increased by more than 4%
- Substantial gains in accurate citation of reference documents
The iterative enhancement process demonstrated cumulative benefits: prompt updates alone delivered 35-40% improvements in key metrics, and the final customized model achieved 50-60% improvements on some metrics.
This progression clearly demonstrates the cumulative benefits of model customization through RAG, prompt engineering, and fine-tuning, yielding a model that significantly exceeds both the baseline and the prompt-updated version in ROUGE scores and citation accuracy. The ROUGE score measures the similarity between the ground truth and the generated result by calculating the overlap of N-gram words. The following table summarizes these improvements, expressed as relative improvement over the baseline.
| LLM | Prompt update | Fine-tuning | ROUGE-3 | ROUGE-L | Citation accuracy |
| --- | --- | --- | --- | --- | --- |
| Anthropic's Claude 3 Sonnet | – | – | Baseline | Baseline | Baseline |
| Anthropic's Claude 3 Sonnet | ✅ | – | +38.30% | +13.4% | +52.94% |
| Anthropic's Claude 3 Sonnet | ✅ | ✅ | +58.1% | +26.8% | +70.59% |
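For reference, ROUGE scores like those in the table can be computed with the open source rouge-score package; the snippet below is a minimal sketch with invented example strings, not SKT's evaluation harness.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

# ROUGE-3 measures trigram overlap; ROUGE-L measures the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge3", "rougeL"], use_stemmer=True)

ground_truth = "Enable 5G roaming in the plan settings menu, then restart the device."
generated = "You can enable 5G roaming from the plan settings menu and restart your phone."

scores = scorer.score(ground_truth, generated)
for name, result in scores.items():
    print(f"{name}: precision={result.precision:.3f}, recall={result.recall:.3f}, f1={result.fmeasure:.3f}")
```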
Synthetic data for fine-tuning
To address the challenge of limited high-quality labeled training data, the team considered synthetic data generation techniques. This approach also facilitates the distillation of knowledge from large-scale LLMs to smaller, more targeted models, offering benefits such as reduced latency and costs.
The team conducted a controlled experiment using:
- Baseline set of 500 ground truth samples
- An augmented set of the 500 original samples plus 1,500 synthetic samples
- Large original set of 2,000 samples
Synthetic data was generated using Anthropic's Claude 3 Sonnet, creating new question and answer pairs for the same retrieved documents used in the ground truth examples.
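The sketch below shows one hedged way such synthetic question and answer pairs could be generated with the Bedrock Converse API; the prompt, placeholder document, and model ID are assumptions for illustration rather than SKT's actual pipeline.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_synthetic_qa(document: str) -> dict:
    """Ask Claude to produce a new question-answer pair grounded in a retrieved document."""
    prompt = (
        "Based only on the document below, write one new customer question and a "
        "concise answer that cites the document's reference link. Return JSON with "
        'the keys "question" and "answer".\n\n' + document
    )
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.7},
    )
    # Assumes the model returns valid JSON; production code would validate and retry.
    return json.loads(response["output"]["message"]["content"][0]["text"])

# Example usage with a placeholder retrieved document.
pair = generate_synthetic_qa("<doc url='https://example.com/esim'>eSIM activation steps...</doc>")
print(pair["question"], "->", pair["answer"])
```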
Results were evaluated using both LLM-based comparison and human preference assessment. Human raters blindly ranked the model outputs, assigning scores based on their preferences (best: 4, second: 3, third: 2, worst: 1). The human preference evaluation scores are shown in the following table.
| Rank | Model | Cumulative score (highest possible: 160) |
| --- | --- | --- |
| 1 | Fine-tuned with 2,000 original samples | 114 |
| 2 | Fine-tuned with 500 original and 1,500 synthetic samples | 112 |
| 3 | Fine-tuned with 500 original samples | 85 |
| 4 | No fine-tuning (baseline) | 84 |
Key findings include:
- The small training set (500 samples) showed minimal improvement over the baseline
- The large training set (2,000 samples) scored significantly higher
- Synthetically augmented data performed on par with original data of the same size
While large amounts of domain-specific training data are always ideal, many companies only have limited datasets available. In such scenarios, synthetic data can stand in for original data, demonstrating its potential for model customization.
Conclusion
SK Telecom's collaboration with AWS GenAIIC demonstrates the company's commitment to developing innovative AI solutions for telecom challenges. By customizing Anthropic's Claude model using Amazon Bedrock, SKT achieved significant performance improvements for a telco-specific Korean-language use case without having to build a model from scratch. The proof of concept demonstrated significant improvements:
- ROUGE-3 score increases of up to 58%
- ROUGE-L score increases of up to 27%
- Significant improvements in returning the correct reference links
This approach, combined with synthetic data generation techniques, aligns with SKT's AI Pyramid strategy and enables faster testing and development of new approaches. As SKT continues to focus on key areas such as personal AI assistants, AI healthcare, and AI data centers, this partnership with AWS represents an important step in its AI evolution and long-term competitiveness in the global AI landscape.
If you are interested in collaborating with AWS on a similar project, please visit the Generative AI Innovation Center.
About the authors
Hong Sung Min is a Senior Applied Scientist at the AWS Generative AI Innovation Center, where he helps accelerate a variety of use cases for AWS customers. Before joining Amazon, Sungmin was a postdoctoral fellow at Harvard Medical School. He holds a Ph.D. in computer science from New York University. Outside of work, Sungmin enjoys hiking, reading, and cooking.
Sujeong Cha is a Deep Learning Architect at the AWS Generative AI Innovation Center, specializing in model customization and optimization. She has extensive hands-on experience leveraging generative AI as well as traditional AI/ML solutions to solve customers' business use cases. Sujeong holds a master's degree in data science from New York University.
Arijit Ghosh Chowdhury is a Scientist at the AWS Generative AI Innovation Center, working on model customization and optimization. In his role, he focuses on applied research in fine-tuning and model evaluation to enable generative AI across various industries. He holds a master's degree in computer science from the University of Illinois at Urbana-Champaign, where his research focused on question answering, search, and domain adaptation.
Yiyue Chen is an Applied Scientist II at the AWS Generative AI Innovation Center, where she helps deliver generative AI solutions to AWS customers. In this role, she works with a team of experts to develop innovative AI-driven models for AWS customers across a variety of industries. Yiyue holds a Ph.D. in Computer Science from the University of Notre Dame, where her research focused on advanced machine learning and deep learning techniques.
Chen Weiqi is a Machine Learning Engineer at the AWS Generative AI Innovation Center, working on model customization and optimization for LLMs. He also builds tools that help teams tackle various aspects of the LLM development lifecycle, such as fine-tuning, benchmarking, and load testing, driving the adoption of diverse use cases for AWS customers. He holds a master's degree in computer science from the University of California, Davis.
Hannah Marlowe is a Senior Manager of Model Customization at the AWS Generative AI Innovation Center. Her team specializes in helping customers develop differentiated generative AI solutions using their own proprietary data to achieve key business outcomes. She holds a Ph.D. in Physics from the University of Iowa, with a focus on astronomical X-ray analysis and instrument development. Outside of work, she enjoys hiking, mountain biking, and skiing in the mountains of Colorado.
Jung Seung Hyun (Steve) is a team leader on SKT's Platform Applications team. He is responsible for commercializing the Global Intelligence Platform (GIP), which provides AI models and tools. His team is contributing to SKT's AI transformation by expanding the offering of models and capabilities to make it easier for internal teams to apply AI. Before entering the AI field, he was a product manager developing and operating various mobile services, such as mobile wallets, fashion streaming, and SK's unified login service, for the US and South Korea.
Lee Sun Woo (Lois) is a team leader of the data construction and evaluation team within SK Telecom's Global AI Tech division. She oversees the design and construction of language model training data, the model performance evaluation process, and their application to services. Her career has focused on NLP in IT, which fits well with her background in linguistics and Korean language education. Along with her world-class team, she continues to explore and solve fascinating problems such as how to optimize data design for training language models, which tasks and methods to use to validate language model performance, and how to best design conversations between AI and humans.
Eric Davis is Vice President of SKT's AI Tech Collaboration Group. Eric oversees technology collaborations with partners around the world to customize large language models (LLMs) for the telecommunications domain. His team is responsible for designing and building datasets for tuning LLMs, as well as benchmarking LLMs both in general and for the telecommunications domain. Eric holds a Master of Science degree in Computer Science from Carnegie Mellon University's Language Technologies Institute and a bachelor's degree in Linguistics and Psychology from the University of California, Los Angeles.