This post was co-authored with Lucas Desard, Tom Lauwers, and Sam Landuydt of DPG Media.
DPG Media is a leading Benelux media company that operates multiple online platforms and TV channels. DPG Media’s VTM GO platform alone offers over 500 days of non-stop content.
With a growing library of long-form video content, DPG Media recognizes the importance of efficiently managing and enriching video metadata such as actor information, genre, episode summaries, and video mood. Having descriptive metadata is key to providing accurate TV Guide descriptions, improving content recommendations, and increasing a consumer’s ability to explore content that fits their interests and current mood.
In this post, we show how DPG Media introduced an AI-powered process using Amazon Bedrock and Amazon Transcribe to its video publishing pipeline in just four weeks, evolving to a more automated annotation system.
Challenge: Extracting and generating metadata at scale
DPG Media accepts video productions with a wide range of accompanying marketing materials, including visual media and brief descriptions. These materials are often not standardized and vary in quality. As a result, DPG Media producers must screen and fully understand the content in order to generate missing metadata, such as a short synopsis. Some content undergoes additional review to generate subtitles and captions.
As DPG Media grows, it needs a more scalable way to capture metadata that improves the consumer experience with online video services and helps understand key content characteristics.
The initial challenges in automating this process included:
- Language diversity – The service hosts both Dutch and English programming. Some local programs feature Flemish dialects, which can be difficult for some large language models (LLMs) to understand.
- Variability in content volume – The catalog ranges from single-episode movies to multi-season series.
- Release frequency – New shows, episodes, and movies are released every day.
- Data aggregation – Metadata must be available at the top-level asset (show or movie) and must be reliably aggregated across different seasons.
Solution overview
To address the automation challenges, DPG Media decided to combine AI technology with its existing metadata to generate new, accurate content and category descriptions, moods, and context.
The project focused solely on audio processing, for cost efficiency and reduced processing time; AI analysis of the video itself was not required to generate detailed, accurate, and high-quality metadata.
The following diagram shows the metadata generation pipeline from audio transcription to detailed metadata.
The general architecture of the metadata pipeline consists of two major steps:
- Generate a transcription of the audio track – Use speech recognition models to produce accurate transcripts of the audio content.
- Generate metadata – Use an LLM to extract and generate detailed metadata from the transcripts.
The next section details the pipeline components.
Step 1. Generate a transcription of the audio track
To generate the audio transcripts needed for metadata extraction, the DPG Media team evaluated two transcription strategies: Whisper large-v3, which requires at least 10 GB of vRAM and significant operational effort to run, and Amazon Transcribe, a managed service that brings additional benefits such as automatic model updates over time and built-in speaker diarization. The evaluation focused on two key factors: price-performance and transcription quality.
To assess the quality of the transcriptions, the team compared the results of both approaches against ground truth subtitles from a large test set, using the following metrics (a sketch for computing them follows the list):
- Word error rate (WER) – This metric measures the percentage of incorrectly transcribed words compared to the ground truth. A lower WER indicates a more accurate transcription.
- Match error rate (MER) – MER evaluates the proportion of words that are incorrectly predicted, inserted, or omitted. A lower MER means higher accuracy.
- Word information lost (WIL) – This metric quantifies the amount of information lost due to transcription errors. A lower WIL indicates fewer errors and better preservation of the original content.
- Word information preserved (WIP) – WIP is the complement of WIL and indicates the amount of information correctly captured. A higher WIP score indicates a more accurate transcription.
- Hits – This metric counts the number of correctly transcribed words, giving a simple measure of accuracy.
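As an illustration (the post doesn’t name the tooling the team used for this), the open source jiwer Python package computes all five measures from a reference transcript and a hypothesis:

```python
# pip install jiwer -- an open source package for transcription metrics
import jiwer

reference = "wat een mooie dag om naar buiten te gaan"  # ground truth subtitle
hypothesis = "wat een mooie dag om buiten te gaan"      # model output

out = jiwer.process_words(reference, hypothesis)
print(f"WER: {out.wer:.3f}  MER: {out.mer:.3f}")
print(f"WIL: {out.wil:.3f}  WIP: {out.wip:.3f}")
print(f"Hits: {out.hits}")
```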
Both transcription approaches produced high-quality results without the need to incorporate the video signal or apply speaker diarization. For more information about speaker diarization in other use cases, see Streamlining diarization using AI as an assistive technology: The ZOO Digital story.
Given the different development and maintenance efforts required by each alternative, DPG Media selected Amazon Transcribe as the transcription component of the system. This managed service offers convenience and lets the team focus its resources on obtaining comprehensive, highly accurate data from its assets, with the goal of achieving 100% accuracy.
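A minimal sketch of this component, assuming the audio already sits in Amazon S3 (the bucket, key, and job names below are hypothetical placeholders):

```python
import time

import boto3

transcribe = boto3.client("transcribe")

# Start an asynchronous transcription job for one episode's audio track.
transcribe.start_transcription_job(
    TranscriptionJobName="vtmgo-episode-0001",
    Media={"MediaFileUri": "s3://example-bucket/audio/episode-0001.mp3"},
    MediaFormat="mp3",
    LanguageCode="nl-NL",  # Dutch; English programming would use en-US
    OutputBucketName="example-bucket",
)

# Poll until the job finishes (simplified; production code would use
# EventBridge notifications or exponential backoff instead).
while True:
    job = transcribe.get_transcription_job(
        TranscriptionJobName="vtmgo-episode-0001"
    )
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if status in ("COMPLETED", "FAILED"):
        break
    time.sleep(10)
```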
Step 2. Generate metadata
With the audio files transcribed, DPG Media uses LLMs through Amazon Bedrock to generate metadata for various categories, such as synopsis, genre, mood, and key events. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon, through a single API, along with a broad set of capabilities for building generative AI applications with security, privacy, and responsible AI.
Through Amazon Bedrock, DPG Media selected the Anthropic Claude 3 Sonnet model, based on internal testing and the Hugging Face LMSYS Chatbot Arena leaderboard, for its reasoning ability and performance in Dutch. The DPG Media team worked closely with the end consumers of the metadata to tune the prompts so that the generated metadata matches the expected format and style.
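A minimal sketch of an episode-level call, using the Anthropic Messages request format that Amazon Bedrock expects for Claude 3 models (the system prompt is illustrative; DPG Media’s actual prompts aren’t public):

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# Illustrative system prompt; the real prompts were tuned with end users.
SYSTEM_PROMPT = (
    "You are a metadata assistant for a streaming service. Given an episode "
    "transcript, return JSON with the keys 'synopsis', 'genre', and 'mood'."
)

transcript = "..."  # transcript text produced in step 1

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": transcript}],
    }),
)
metadata = json.loads(response["body"].read())["content"][0]["text"]
print(metadata)
```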
Once the team generated metadata at the individual video level, the next step was to aggregate this metadata across a series of episodes. This was an important requirement because content recommendations on streaming services are typically done at the series or movie level rather than the episode level.
To generate summaries and metadata at the series level, the DPG Media team reused the previously generated video-level metadata. They sent the synopses back to Anthropic Claude 3 Sonnet through Amazon Bedrock, in episode order and in a structured format, with specially tailored system prompts.
Many of DPG Media’s series are long-running, so using summaries instead of full transcripts of episodes is sufficient to obtain high-quality aggregated data and is also cost-effective.
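The aggregation step can be sketched the same way, with the episode synopses presented in order under a series-level system prompt (again, the prompt and sample data below are stand-ins, not DPG Media’s):

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# Hypothetical episode synopses produced at the video level.
episode_synopses = [
    "S01E01: A detective returns to her hometown ...",
    "S01E02: The investigation stalls when ...",
]

# Present the synopses in episode order, as described above.
user_prompt = "Episode synopses, in order:\n" + "\n".join(episode_synopses)

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "system": "Write a one-paragraph series-level synopsis "
                  "from the episode synopses provided.",  # illustrative
        "messages": [{"role": "user", "content": user_prompt}],
    }),
)
series_synopsis = json.loads(response["body"].read())["content"][0]["text"]
```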
The solution also preserves a direct association between each type of metadata and its corresponding system prompt, making it easy to adjust, add, or remove prompts as needed, much like the adjustments made during development. This flexibility allows metadata generation to keep pace with evolving business requirements.
To assess the quality of the generated metadata, the team used reference-free LLM metrics inspired by LangSmith. In this approach, a secondary LLM evaluates the output against customized criteria, such as whether the summary is easy to understand, whether it contains all the important events from the transcription, and whether the generated summary contains any hallucinations.
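A sketch of this reference-free, LLM-as-judge pattern, with hypothetical grading questions modeled on the criteria above:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# Hypothetical judge prompt; the team's actual criteria may differ.
JUDGE_PROMPT = (
    "You are grading a generated summary against its source transcript. "
    "Answer each question with YES or NO, one per line:\n"
    "1. Is the summary easy to understand?\n"
    "2. Does it cover all important events from the transcript?\n"
    "3. Does it contain statements unsupported by the transcript?"
)

def judge_summary(transcript: str, summary: str) -> str:
    """Ask a secondary LLM to grade a summary without a reference answer."""
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "system": JUDGE_PROMPT,
            "messages": [{
                "role": "user",
                "content": f"Transcript:\n{transcript}\n\nSummary:\n{summary}",
            }],
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```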
Results and lessons learned
Implementing an AI-powered metadata pipeline has been a transformative journey for DPG Media. Their approach saves days of work in generating metadata for TV series.
DPG Media chose Amazon Transcribe for its ease of transcription, low maintenance burden, and the incremental improvements AWS has delivered over the years. For metadata generation, DPG Media chose Anthropic Claude 3 Sonnet on Amazon Bedrock rather than building direct integrations with various model providers. The team appreciates the flexibility of experimenting with multiple models and plans to try Anthropic Claude Opus when it becomes available in their preferred AWS Region.
DPG Media chose to balance AI and human expertise by having humans verify the results produced by the pipeline. This approach was chosen because the results would be exposed to the end customer and the AI system could make mistakes from time to time. The goal was not to replace talent, but to augment it through a combination of human curation and automation.
Transforming the video viewing experience is about creating a richer, more engaging user experience, not just adding more descriptions. By implementing an AI-driven process, DPG Media aims to provide users with better content recommendations, improve its understanding of its content library, and move toward a more automated and efficient annotation system. This evolution promises not only to streamline operations, but also to align content delivery with modern consumption habits and technological advances.
Conclusion
In this post, we shared how DPG Media used Amazon Bedrock to bring an AI-powered process to its video publishing pipeline. The solution helped DPG Media accelerate audio metadata extraction, create a more engaging user experience, and save time.
We invite you to learn more about how you can gain a competitive advantage with powerful generative AI applications by visiting Amazon Bedrock and trying out this solution on the datasets relevant to your business.
About the authors
Lucas Desard is a GenAI Engineer at DPG Media. He helps DPG Media integrate generative AI efficiently and meaningfully into various company processes.
Tom Lauwers is a Machine Learning Engineer on the Video Personalization team at DPG Media. He builds and designs recommendation systems for DPG Media’s long-form video platforms, supporting brands such as VTM GO, Streamz, and RTL play.
Sam Landuydt is the Area Manager of Recommendation and Search at DPG Media. As a team manager, he coaches ML and software engineers in building internal recommendation systems and generative AI solutions.
Irina Radu is a Prototyping Engagement Manager, part of AWS EMEA Prototyping and Cloud Engineering. She helps clients make the most of emerging technologies, innovate faster and think bigger.
Fernanda Machado is an AWS Prototyping Architect, helping customers bring their ideas to life and apply the latest best practices for modern applications.
Andrew Shved is a Senior AWS Prototyping Architect who helps customers build business solutions that use modern applications, big data, and AI innovations.