Today, we are excited to announce that binary embedding support for Amazon Titan Text Embeddings V2 is now available in Amazon Bedrock Knowledge Bases and Amazon OpenSearch Serverless. With binary embedding support in Amazon Bedrock and a binary vector store in OpenSearch Serverless, you can build Retrieval Augmented Generation (RAG) applications in Amazon Bedrock Knowledge Bases while reducing memory usage and overall cost.
Amazon Bedrock is a fully managed service that provides a single API to access and use a variety of high-performing foundation models (FMs) from leading AI companies. Amazon Bedrock also provides a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Knowledge Bases allows FMs and agents to retrieve contextual information for RAG from your company’s private data sources. RAG helps FMs deliver more relevant, accurate, and customized responses.
Amazon Titan Text Embeddings models generate meaningful semantic representations of documents, paragraphs, and sentences. Amazon Titan Text Embeddings takes a body of text as input and generates a vector with 1,024 (default), 512, or 256 dimensions. Amazon Titan Text Embeddings is offered through latency-optimized endpoint invocations (recommended during the retrieval step) for faster search, and through throughput-optimized batch jobs for faster indexing. With binary embeddings, Amazon Titan Text Embeddings V2 represents data as binary vectors, with each dimension encoded as a single binary digit (0 or 1). This binary representation converts high-dimensional data into a more efficient format for storage and computation.
Amazon OpenSearch Serverless is a serverless deployment option for Amazon OpenSearch Service, a fully managed service that makes it easy to perform interactive log analytics, real-time application monitoring, website search, and vector search with its k-Nearest Neighbor (kNN) plugin. The plugin supports exact and approximate nearest neighbor algorithms and multiple storage and matching engines. This makes it easy to build modern machine learning (ML) augmented search experiences, generative AI applications, and analytics workloads without managing the underlying infrastructure.
The OpenSearch Serverless kNN plugin now supports 16-bit floating point (FP16) and binary vectors, in addition to 32-bit floating point (FP32) vectors. By setting the kNN vector field type to binary, you can store the binary embeddings generated by Amazon Titan Text Embeddings V2 at lower cost. Vectors can be stored in and searched from OpenSearch Serverless using the PUT and GET APIs.
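For illustration, the following is a minimal sketch of creating an index with a binary kNN vector field using the opensearch-py client. The collection endpoint, index name, and field names are placeholders, and mapping details (such as where space_type is declared) can vary by OpenSearch version, so verify against the OpenSearch documentation for your version:

```python
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

# Placeholder collection endpoint; replace with your OpenSearch Serverless endpoint
host = "your-collection-id.us-east-1.aoss.amazonaws.com"
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, "us-east-1", "aoss")

client = OpenSearch(
    hosts=[{"host": host, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)

# Index with a binary kNN vector field. For binary vectors, the dimension is
# given in bits, and distances are computed with Hamming distance.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,  # bits; documents supply 1024/8 = 128 packed int8 values
                "data_type": "binary",
                "method": {"name": "hnsw", "engine": "faiss", "space_type": "hamming"},
            },
            "text_chunk": {"type": "text"},
        }
    },
}

client.indices.create(index="binary-rag-index", body=index_body)
```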
This post summarizes the benefits of this new binary vector support across Amazon Titan Text Embeddings, Amazon Bedrock Knowledge Bases, and OpenSearch Serverless, and explains how to get started. The following diagram shows a high-level architecture using Amazon Bedrock Knowledge Bases and Amazon OpenSearch Serverless.
With binary vectors, OpenSearch Serverless and Amazon Bedrock Knowledge Bases can reduce latency, storage costs, and memory requirements with minimal degradation in retrieval quality.
We ran the Massive Text Embedding Benchmark (MTEB) retrieval dataset with binary embeddings. On this dataset, we reduced storage while achieving a 25x improvement in latency. Binary embeddings maintained 98.5% retrieval accuracy with reranking and 97% without reranking, compared with the results obtained using full precision (FP32) embeddings. In an end-to-end RAG benchmark comparison against full precision embeddings, binary embeddings with Amazon Titan Text Embeddings V2 retained 99.1% of the full precision answer correctness (98.6% without reranking). We recommend that you run your own benchmarks using Amazon OpenSearch Serverless and Amazon Titan Text Embeddings V2 binary embeddings.
In an OpenSearch Serverless benchmark using the Hierarchical Navigable Small World (HNSW) algorithm with binary vectors, search OpenSearch Compute Units (OCUs) were reduced by 50%, translating into cost savings for users. Using a binary index also significantly reduced retrieval time. Traditional search methods often rely on computationally intensive calculations such as L2 and cosine distance, which can be resource intensive. In contrast, the binary indexes in Amazon OpenSearch Serverless operate on Hamming distance, a more efficient approach that speeds up search queries.
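To see why Hamming distance is cheaper, consider this standalone Python illustration (not OpenSearch code): comparing two 1,024-bit binary vectors takes a byte-wise XOR and a bit count, versus roughly a thousand floating point multiply-adds for cosine or L2 distance over FP32 vectors.

```python
import numpy as np

# Two 1,024-bit binary embeddings, packed 8 bits per byte (128 bytes each)
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=128, dtype=np.uint8)
b = rng.integers(0, 256, size=128, dtype=np.uint8)

# Hamming distance: XOR the packed bytes, then count the set bits
hamming_distance = int(np.unpackbits(a ^ b).sum())
print(hamming_distance)
```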
The following sections show how to use binary embeddings with Amazon Titan Text Embeddings V2, binary vectors (and FP16) with the OpenSearch Serverless vector engine, and the binary embedding option for Amazon Bedrock Knowledge Bases. For more information about Amazon Bedrock Knowledge Bases, see Knowledge Bases now delivers fully managed RAG experience in Amazon Bedrock.
Generate binary embeddings using Amazon Titan Text Embeddings V2
Amazon Titan Text Embeddings V2 now supports binary embeddings. It handles text in over 100 languages and is optimized for retrieval performance and accuracy across different dimension sizes (1024, 512, and 256). By default, the Amazon Titan Text Embeddings model generates embeddings with 32-bit floating point (FP32) precision. Although using a 1,024-dimension FP32 vector improves accuracy, it also incurs large storage requirements and associated costs in retrieval use cases.
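As a back-of-the-envelope illustration of the raw vector payload (ignoring index overhead), a 1,024-dimension FP32 vector occupies 32 times the bytes of its binary counterpart:

```python
dims = 1024
fp32_bytes = dims * 4     # 4 bytes per FP32 dimension -> 4,096 bytes per vector
binary_bytes = dims // 8  # 1 bit per dimension -> 128 bytes per vector
print(fp32_bytes // binary_bytes)  # 32
```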
To generate binary embeddings in your code, include the embeddingTypes parameter in your invoke_model API request to Amazon Titan Text Embeddings V2:
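Below is a minimal sketch using the AWS SDK for Python (Boto3). The model ID amazon.titan-embed-text-v2:0 and the embeddingsByType response field reflect the Bedrock API at the time of writing; the Region and input text are placeholders:

```python
import json
import boto3

# Bedrock Runtime client (Region is a placeholder)
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Request binary embeddings; use ["float", "binary"] to get both types back
body = json.dumps({
    "inputText": "Amazon Bedrock is a fully managed service.",
    "dimensions": 1024,
    "embeddingTypes": ["binary"],
})

response = client.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=body,
    accept="application/json",
    contentType="application/json",
)

response_body = json.loads(response["body"].read())
binary_embedding = response_body["embeddingsByType"]["binary"]
print(len(binary_embedding))  # 1024 dimensions, each 0 or 1
```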
As in the preceding request, you can ask for binary embeddings only, or for both binary and floating point embeddings. The embedding returned above is a binary vector of length 1024, such as:
array([0, 1, 1, ..., 0, 0, 0], dtype=int8)
For more information and sample code, see Amazon Titan Embeddings Text.
Configuring an Amazon Bedrock Knowledge Base with Binary Vector Embedding
With Amazon Bedrock Knowledge Bases, you can use Amazon Titan Text Embeddings V2 binary embeddings, along with binary and 16-bit floating point (FP16) vectors in the Amazon OpenSearch Serverless vector engine, without writing a single line of code. Follow these steps:
- Create a knowledge base in the Amazon Bedrock console. Enter the knowledge base details, such as name and description, and either create a new service role or use an existing service role with the relevant AWS Identity and Access Management (IAM) permissions. For information about creating service roles, see Service roles. Under Select data source, choose Amazon S3, as shown in the following screenshot. Choose Next.
- Configure the data source. Enter a name and description, and define the Source S3 URI. Under Chunking and parsing configurations, choose Default. Choose Next to continue.
- Select an embeddings model to complete the knowledge base setup. For this walkthrough, choose Titan Text Embeddings v2. Under Embeddings type, choose Binary vector embeddings. Under Vector dimensions, choose 1024. Choose Quick create a new vector store. This option configures a new Amazon OpenSearch Serverless vector store that supports the binary data type. A programmatic sketch of the same setup follows these steps.
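If you prefer to script this setup, the following is a hedged sketch using the bedrock-agent API in Boto3. The ARNs, names, and IDs are placeholders, and the embeddingDataType field of bedrockEmbeddingModelConfiguration reflects the API at the time of writing; verify against the current Amazon Bedrock API reference:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

response = bedrock_agent.create_knowledge_base(
    name="binary-embeddings-kb",
    roleArn="arn:aws:iam::111122223333:role/BedrockKBServiceRole",  # placeholder
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
            "embeddingModelConfiguration": {
                "bedrockEmbeddingModelConfiguration": {
                    "dimensions": 1024,
                    "embeddingDataType": "BINARY",  # instead of the default FLOAT32
                }
            },
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-east-1:111122223333:collection/abc123",  # placeholder
            "vectorIndexName": "binary-rag-index",
            "fieldMapping": {
                "vectorField": "embedding",
                "textField": "text_chunk",
                "metadataField": "metadata",
            },
        },
    },
)
print(response["knowledgeBase"]["knowledgeBaseId"])
```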
After creation, you can review the knowledge base details and monitor the synchronization status of the data source. When synchronization is complete, you can test the knowledge base and review the FM responses.
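You can also test the knowledge base programmatically with the bedrock-agent-runtime retrieve_and_generate API; the knowledge base ID and model ARN below are placeholders:

```python
import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = runtime.retrieve_and_generate(
    input={"text": "What are binary embeddings used for?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])
```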
Conclusion
As explained throughout this post, binary embeddings are an option in the Amazon Titan Text Embeddings V2 model, available in Amazon Bedrock and in the OpenSearch Serverless binary vector store. These capabilities significantly reduce memory and disk needs in Amazon Bedrock and OpenSearch Serverless, resulting in fewer OCUs for your RAG solution. You will also see better performance and improved latency, with a slight impact on result accuracy compared to using the full floating point data type (FP32). Although the drop in accuracy is minimal, you should determine whether it suits your application. The specific benefits will vary based on factors such as the volume of data, search traffic, and storage requirements, but the examples described in this post illustrate the potential value.
Support for binary embeddings in Amazon OpenSearch Serverless, Amazon Bedrock Knowledge Bases, and Amazon Titan Text Embeddings V2 is available in all AWS Regions where these services are already available. See the Region list for details and future updates. For more information about Amazon Bedrock Knowledge Bases, visit the Amazon Bedrock Knowledge Bases product page. For more information about Amazon Titan Text Embeddings, see Amazon Titan in Amazon Bedrock. For more information about Amazon OpenSearch Serverless, visit the Amazon OpenSearch Serverless product page. For pricing details, see the Amazon Bedrock pricing page.
Try the new features in the Amazon Bedrock console today. Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.
About the authors
Shreyas Subramanian is a Principal Data Scientist who helps customers solve business challenges with generative AI and deep learning using AWS services. Shreyas has a background in large-scale optimization and ML, and in the use of ML and reinforcement learning to speed up optimization tasks.
Ron Widha is a Senior Software Development Manager with Amazon Bedrock Knowledge Bases, helping customers easily build scalable RAG applications.
Satish Nandi is a Senior Product Manager with Amazon OpenSearch Service. He focuses on OpenSearch Serverless and has years of experience in networking, security, and AI/ML. He holds a Bachelor’s degree in Computer Science and an MBA in Entrepreneurship. In his free time, he likes to fly airplanes and hang gliders and ride motorcycles.
Vamshi Vijay Nakkirtha is a Senior Software Development Manager working on the OpenSearch Project and Amazon OpenSearch Service. His primary interests include distributed systems.