Providing effective multilingual customer support in a global business presents significant operational challenges. Through a collaboration between AWS and DXC Technology, we developed a scalable voice-to-voice (V2V) translation prototype that transforms how contact centers handle multilingual customer interactions.
This post describes how AWS and DXC used Amazon Connect and other AWS AI services to deliver near real-time V2V translation capabilities.
Challenge: Serve customers in multiple languages
In the third quarter of 2024, DXC Technology approached AWS with a critical business challenge. Their global contact centers needed to serve customers in multiple languages without the exponential cost of hiring language-specific agents for every language they support. DXC had previously evaluated several existing alternatives but found limitations in each approach, ranging from latency constraints to infrastructure requirements that affected reliability, scalability, and operational costs. DXC and AWS decided to organize focused hackathons in which DXC and AWS Solutions Architects collaborated to:
- Define key requirements for real-time translation
- Establish benchmarks for latency and accuracy
- Create a seamless integration path with existing systems
- Develop a phased implementation strategy
- Build and test an initial proof-of-concept setup
Business impact
For DXC, this prototype served as an enabler, unlocking better use of technical talent, greater operational flexibility, and cost improvements:
- Best technical expertise delivery – Hiring and matching agents based on technical knowledge rather than spoken language makes sure customers get the best technical support regardless of language barriers
- Global operational flexibility – Removing geographic and linguistic constraints on hiring, staffing, and support delivery while maintaining consistent quality of service across languages
- Cost reduction – Eliminating the multilingual expertise premium, specialized language training, and associated infrastructure costs through a pay-per-conversation model
- Native speaker-like experience – Maintaining a natural conversation flow with near real-time translation and audio feedback while providing premium technical support in the customer's preferred language
Solution overview
The Amazon Connect V2V translation prototype uses advanced AWS speech recognition and machine translation technologies to enable near real-time translation of conversations between agents and customers, allowing each party to speak in their preferred language while having a natural conversation. It consists of the following key components:
- Speech recognition – The customer's speech is captured and converted to text using Amazon Transcribe, which acts as the speech recognition engine. The transcript is then fed to the machine translation engine.
- Machine translation – Amazon Translate, the machine translation engine, translates the customer's transcript into the agent's preferred language in near real time. The translated transcript is converted to speech using Amazon Polly, which acts as the text-to-speech engine.
- Two-way translation – The process reverses for the agent's response, translating the agent's speech into the customer's language and delivering the translated audio to the customer.
- Seamless integration – The V2V translation sample project integrates with Amazon Connect using the Amazon Connect Streams JS and Amazon Connect RTC JS libraries, so agents can handle customer interactions in multiple languages without any additional effort or training.
The prototype can be extended with other AWS AI services to further customize translation capabilities. It is open source and ready for customization to meet your specific needs.
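To make the pipeline concrete, the following TypeScript sketch shows the translate-and-synthesize step using the AWS SDK for JavaScript v3. It is a minimal illustration under stated assumptions, not the sample project's actual code: it assumes a transcript segment has already been produced by Amazon Transcribe, and the function name, language codes, and voice are placeholders.

```typescript
import { TranslateClient, TranslateTextCommand } from "@aws-sdk/client-translate";
import { PollyClient, SynthesizeSpeechCommand, VoiceId } from "@aws-sdk/client-polly";

const translateClient = new TranslateClient({});
const pollyClient = new PollyClient({});

// Translate a transcript segment (already produced by Amazon Transcribe)
// and synthesize it in the listener's preferred language.
async function translateAndSynthesize(
  transcript: string,
  sourceLanguage: string, // e.g. "es" for a Spanish-speaking customer
  targetLanguage: string, // e.g. "en" for an English-speaking agent
  voiceId: VoiceId        // e.g. "Joanna"; choose a voice that matches targetLanguage
) {
  // Machine translation with Amazon Translate
  const { TranslatedText } = await translateClient.send(
    new TranslateTextCommand({
      Text: transcript,
      SourceLanguageCode: sourceLanguage,
      TargetLanguageCode: targetLanguage,
    })
  );

  // Text-to-speech with Amazon Polly; the returned AudioStream is what the
  // application plays back to the listening party (agent or customer).
  const { AudioStream } = await pollyClient.send(
    new SynthesizeSpeechCommand({
      Text: TranslatedText ?? "",
      VoiceId: voiceId,
      OutputFormat: "mp3",
    })
  );

  return AudioStream;
}
```

In the prototype, this kind of processing runs in near real time for each finished utterance, in both directions of the conversation.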
The following diagram illustrates the solution architecture.
The following screenshot shows the sample agent web application:
The user interface consists of three sections:
- Contact Control Panel – The Amazon Connect softphone client
- Customer controls – Controls for the customer-to-agent interaction, including transcribing, translating, and synthesizing the customer's speech
- Agent controls – Controls for the agent-to-customer interaction, including transcribing, translating, and synthesizing the agent's speech
Challenges when implementing near real-time voice translations
The Amazon Connect V2V sample project was designed to minimize the audio processing time from the moment a customer or agent finishes speaking until the translated audio stream begins. However, even with minimal audio processing time, the user experience still doesn't match a conversation in which both parties speak the same language. This is because of a specific pattern in which the customer only hears the agent's translated speech, and the agent only hears the customer's translated speech. The following diagram shows that pattern:
The example workflow consists of the following steps:
- The customer starts speaking in their own language and talks for 10 seconds.
- Because the agent only hears the customer's translated speech, the agent first hears 10 seconds of silence.
- When the customer finishes speaking, audio processing takes 1-2 seconds, during which both the customer and the agent hear silence.
- The customer's translated speech is streamed to the agent. Meanwhile, the customer hears silence.
- After the customer's translated audio playback is complete, the agent starts speaking and talks for 10 seconds.
- Because the customer only hears the agent's translated speech, the customer hears 10 seconds of silence.
- When the agent finishes speaking, audio processing takes 1-2 seconds, during which both the customer and the agent hear silence.
- The agent's translated speech is streamed to the customer. Meanwhile, the agent hears silence.
In this scenario, the customer hears 22-24 seconds of complete silence from the moment they finish speaking until they hear the agent's translated voice (1-2 seconds of processing, about 10 seconds of translated playback to the agent, 10 seconds of the agent's reply, and another 1-2 seconds of processing). This creates a suboptimal experience, because during those 22-24 seconds the customer can't tell what is happening, for example whether the agent heard them or whether there is a technical issue.
Audio streaming add-ons
In a face-to-face conversation between two people who don't speak the same language, a third person often acts as a translator or interpreter. That conversation typically follows these steps:
- Person A speaks in their own language, and both person B and the translator hear it.
- The translator translates what person A said into person B's language, and both person B and person A hear the translation.
Essentially, person A and person B each hear the other person speaking and also hear the translation from the translator; nobody waits in silence. This is even more important in conversations that aren't face to face, such as contact center interactions.
To optimize the customer and agent experience, the Amazon Connect V2V sample project implements audio streaming add-ons that simulate this more natural conversation experience. The following diagram shows an example workflow:
The workflow consists of the following steps:
- The customer starts speaking in their own language and talks for 10 seconds.
- The agent hears the customer's original voice at a lower volume (the "Stream customer mic to agent" add-on is enabled).
- When the customer finishes speaking, audio processing takes 1-2 seconds. Meanwhile, both the customer and the agent hear subtle audio feedback (contact center background noise) at a very low volume (the "Audio feedback" add-on is enabled).
- The customer's translated speech is streamed to the agent. Meanwhile, the customer hears their own translated speech at a lower volume (the "Stream customer translation to customer" add-on is enabled).
- After the customer's translated audio playback is complete, the agent starts speaking and talks for 10 seconds.
- The customer hears the agent's original voice at a lower volume (the "Stream agent mic to customer" add-on is enabled).
- When the agent finishes speaking, audio processing takes 1-2 seconds. Meanwhile, both the customer and the agent hear subtle audio feedback (contact center background noise) at a very low volume (the "Audio feedback" add-on is enabled).
- The agent's translated speech is streamed to the customer. Meanwhile, the agent hears their own translated speech at a lower volume (the "Stream agent translation to agent" add-on is enabled).
In this scenario, instead of a single 22-24 second block of total silence, the customer hears only two short blocks (1-2 seconds each) of subtle audio feedback. This pattern is much closer to a face-to-face conversation that involves a translator.
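Conceptually, the add-ons amount to a client-side audio mix in which each source passes through its own volume control. The following browser-side TypeScript sketch shows the general idea using the Web Audio API; it is only an illustration of the technique, not the sample project's implementation, and the gain values and names are arbitrary.

```typescript
// Illustrative only: mix the remote party's original voice at a low volume
// with the translated speech at full volume, using the Web Audio API.
function mixOriginalAndTranslated(
  originalStream: MediaStream,       // the other party's untranslated voice
  translatedAudio: HTMLAudioElement  // playback of the synthesized translation
) {
  const audioContext = new AudioContext();

  // Original voice, attenuated so it stays in the background.
  const originalSource = audioContext.createMediaStreamSource(originalStream);
  const originalGain = audioContext.createGain();
  originalGain.gain.value = 0.2; // arbitrary "lower volume" level

  // Translated, synthesized voice at normal volume.
  const translatedSource = audioContext.createMediaElementSource(translatedAudio);
  const translatedGain = audioContext.createGain();
  translatedGain.gain.value = 1.0;

  // Both sources feed the same output, so the listener hears the original
  // voice quietly underneath the translated speech, similar to hearing a
  // speaker and an interpreter at the same time.
  originalSource.connect(originalGain).connect(audioContext.destination);
  translatedSource.connect(translatedGain).connect(audioContext.destination);
}
```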
The audio streaming add-ons provide additional benefits, including:
- Voice characteristics – If the agent and customer only hear the translated, synthesized speech, the original voice characteristics are lost. For example, the agent can't hear whether the customer is speaking slowly or quickly, or whether the customer is upset or calm, because the translated, synthesized speech doesn't carry that information.
- Quality assurance – When call recording is enabled, because translation and synthesis are performed on the agent (client) side, only the customer's original voice and the agent's synthesized voice are recorded, and the recording contains many silent blocks. This makes it difficult for QA teams to properly evaluate and audit conversations. With the audio streaming add-ons enabled, there are no silent blocks, and the QA team can hear the agent's original voice, the customer's original voice, and their respective translated speech, all in a single audio file.
- Transcription and translation accuracy – Having both the original and translated speech available in call recordings makes it easier to detect specific words that would improve transcription accuracy (using Amazon Transcribe custom vocabularies) or translation accuracy (using Amazon Translate custom terminologies), so that brand names, character names, model names, and other unique content are transcribed and translated to the desired result (see the sketch after this list).
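As a sketch of how those accuracy improvements can be applied, the following example passes a pre-created Amazon Translate custom terminology to a translation request with the AWS SDK for JavaScript v3. The terminology and vocabulary names are placeholders you would create in your own account, not resources from the sample project.

```typescript
import { TranslateClient, TranslateTextCommand } from "@aws-sdk/client-translate";

const translateClient = new TranslateClient({});

// Translate a transcript while enforcing brand and product names through a
// pre-created Amazon Translate custom terminology (the name is a placeholder).
async function translateWithTerminology(transcript: string) {
  const { TranslatedText } = await translateClient.send(
    new TranslateTextCommand({
      Text: transcript,
      SourceLanguageCode: "en",
      TargetLanguageCode: "es",
      TerminologyNames: ["anycompany-product-names"], // placeholder name
    })
  );
  return TranslatedText;
}

// On the transcription side, an Amazon Transcribe streaming request can
// similarly reference a custom vocabulary (for example,
// VocabularyName: "anycompany-product-names") so that unique terms are
// transcribed correctly before they reach the translation step.
```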
Get started with Amazon Connect V2V
Ready to transform your contact center's communication? The Amazon Connect V2V sample project is now available on GitHub. We encourage you to explore, deploy, and experiment with this prototype and use it as the foundation for developing innovative multilingual communication solutions in your own contact center, following these key steps:
- Clone the GitHub repository.
- Test different configurations for audio streaming add-ons.
- Review the sample project's limitations in the README.
- Develop an implementation strategy:
  - Implement robust security and compliance controls that meet your organization's standards.
  - Work with your customer experience team to define requirements for your specific use cases.
  - Balance automation and manual agent control (for example, use an Amazon Connect contact flow to automatically set contact attributes for preferred languages and audio streaming add-ons, as sketched after this list).
  - Use your preferred transcription, translation, and text-to-speech engines based on your specific language support requirements and business, legal, and regional preferences.
  - Plan a phased rollout, starting with a pilot group, and iteratively optimize your custom vocabularies and translation terminologies.
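As a sketch of the automation mentioned in the implementation strategy, the following snippet shows how a web application built on the Amazon Connect Streams JS library could read contact attributes set by a contact flow and use them to preconfigure languages and add-ons. The attribute names and the configureTranslation helper are hypothetical, not part of the sample project, and the snippet assumes the CCP has already been initialized with connect.core.initCCP.

```typescript
import "amazon-connect-streams";

// Hypothetical sketch: read contact attributes that a contact flow has set
// (attribute names are placeholders, not the sample project's actual keys).
connect.contact((contact) => {
  contact.onConnecting(() => {
    const attributes = contact.getAttributes();

    const customerLanguage = attributes["CustomerLanguage"]?.value ?? "es";
    const agentLanguage = attributes["AgentLanguage"]?.value ?? "en";
    const streamMicToOtherParty = attributes["StreamMicToOtherParty"]?.value === "true";

    // Apply the configuration to the V2V translation controls instead of
    // asking the agent to select languages and add-ons manually.
    configureTranslation(customerLanguage, agentLanguage, streamMicToOtherParty);
  });
});

// Placeholder for the application's own configuration logic.
function configureTranslation(
  customerLanguage: string,
  agentLanguage: string,
  streamMicToOtherParty: boolean
): void {
  console.log({ customerLanguage, agentLanguage, streamMicToOtherParty });
}
```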
Conclusion
The Amazon Connect V2V sample project demonstrates how Amazon Connect and advanced AWS AI services can break down language barriers, increase operational flexibility, and reduce support costs. Get started today and revolutionize how your contact center communicates across languages!
About the authors
Milos Kozic is a Solutions Architect at AWS.
eJFerror is a Senior Solutions Architect at AWS.
Adam El Tambouri is a Technical Program Manager for prototyping and support services at DXC Modern Workplace.