Cohere Rerank 3 Nimble Now Generally Available on Amazon SageMaker JumpStart

The Cohere Rerank 3 Nimble foundation model (FM) is now generally available on Amazon SageMaker JumpStart. This model is the latest FM in Cohere’s Rerank model series, built to power enterprise search and Retrieval Augmented Generation (RAG) systems.

This article describes the benefits and capabilities of this new model with some examples.

Overview of the Cohere Rerank model

Cohere’s Rerank family of models is designed to enhance existing enterprise search and RAG systems. Rerank models improve search accuracy over both keyword-based and embedding-based search systems. Cohere Rerank 3 sorts the documents retrieved by an initial search algorithm by their relevance to a given query. Reranking models, also known as cross-encoders, take a query-document pair and output a similarity score. For FMs, words, sentences, or entire documents are often encoded as dense vectors in a semantic space. By calculating the cosine of the angle between these vectors, you can quantify their semantic similarity as a single score. This score can then be used to sort documents by their relevance to the query.
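As a minimal sketch of the scoring idea described above, the following plain-Python example computes the cosine similarity between a query vector and a few document vectors, then sorts the documents by score. The three-dimensional vectors and the helper names are made up for illustration. Real embeddings come from a model and have many more dimensions, and this is not a description of how Cohere Rerank computes its scores internally.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors (hypothetical values, not real embeddings).
query = [1.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 1.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(query, docs)
print(ranking)
```

The document pointing in the same direction as the query ranks first, and the orthogonal one ranks last.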

Cohere Rerank 3 Nimble is the latest model in Cohere’s family of Rerank models and is designed to improve speed and efficiency over its predecessor, Cohere Rerank 3. According to Cohere’s benchmark tests, including BEIR (Benchmarking IR) for accuracy and internal benchmark datasets, Cohere Rerank 3 Nimble is approximately 3-5x faster than Cohere Rerank 3 while maintaining high accuracy. The speed improvements are designed for businesses that want to enhance their search capabilities without sacrificing performance.

The following diagram illustrates the two-stage search of the RAG pipeline and shows where Cohere Rerank 3 Nimble fits into the search pipeline.

The first stage of search in the RAG architecture returns a set of candidate documents based on the knowledge base relevant to the query. In the second stage, Cohere Rerank 3 Nimble analyzes the semantic relevance between the query and each retrieved document and sorts them from most relevant to least relevant. The top-ranked documents augment the original query with additional context. This process identifies the most relevant documents and improves the quality of search results. Integrating Cohere Rerank 3 Nimble into the RAG system allows users to send fewer high-quality documents to the language model for grounded generation, which improves the accuracy and relevance of search results without increasing latency.
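The two-stage flow described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline from the diagram: the first stage is a simple word-overlap retriever, and `toy_score` is a hypothetical stand-in for the call to a reranking model. All function names here are invented for the example.

```python
def first_stage_retrieve(query, corpus, k):
    """Cheap candidate retrieval: rank documents by number of shared words."""
    query_terms = set(query.lower().split())
    scored = []
    for idx, text in enumerate(corpus):
        overlap = len(query_terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((idx, overlap))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in scored[:k]]

def second_stage_rerank(query, corpus, candidate_ids, score_fn, top_n):
    """Re-score each candidate with score_fn and keep the top_n best."""
    scored = [(idx, score_fn(query, corpus[idx])) for idx in candidate_ids]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def toy_score(query, text):
    """Stand-in for a reranker: fraction of query words found in the text."""
    query_terms = set(query.lower().split())
    return len(query_terms & set(text.lower().split())) / len(query_terms)

corpus = [
    "how to reset a password",
    "return policy for a defective product",
    "how to return the wrong item",
    "shipping times for international orders",
]
query = "how to return an item"

candidates = first_stage_retrieve(query, corpus, k=3)
top = second_stage_rerank(query, corpus, candidates, toy_score, top_n=2)
print(candidates, top)
```

In a real system, `score_fn` would wrap a request to the reranking endpoint, and only the documents in `top` would be passed to the language model as context.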

SageMaker JumpStart overview

SageMaker JumpStart gives you access to a wide range of publicly available FMs. These pre-trained models serve as a powerful starting point that you can deeply customize to address your specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.

Amazon SageMaker is a comprehensive, fully managed machine learning (ML) platform that revolutionizes the entire ML workflow. It provides an unparalleled suite of tools for every stage of the ML lifecycle, from data preparation to model deployment and monitoring. Data scientists and developers can use the SageMaker integrated development environment (IDE) to access a wide range of pre-built algorithms, customize their own models, and seamlessly scale their solutions. The strength of the platform is that it abstracts the complexity of infrastructure management, allowing you to focus on innovation instead of operational overhead. SageMaker’s automated ML capabilities, including the automatic machine learning (AutoML) feature, democratize ML by empowering non-experts to build sophisticated models. In addition, robust governance features enable organizations to maintain control and transparency over ML projects and address key concerns around regulatory compliance.

Prerequisites

Make sure that your SageMaker AWS Identity and Access Management (IAM) service role has the AmazonSageMakerFullAccess permission policy attached.

To deploy Cohere Rerank 3 Nimble successfully, confirm one of the following:

Make sure that your IAM role has the following permissions and that you have the authority to make AWS Marketplace subscriptions in the AWS account that you use:
- aws-marketplace:ViewSubscriptions
- aws-marketplace:Unsubscribe
- aws-marketplace:Subscribe
Alternatively, confirm that your AWS account already has a subscription to the model. If it does, you can skip the following deployment instructions and start with subscribing to the model package.
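The three AWS Marketplace actions listed above could be expressed as an IAM policy document along the following lines. This is a sketch, not a policy taken from the article: the single-statement layout and the wildcard `Resource` are assumptions, and your organization may require tighter scoping.

```python
import json

# Hypothetical policy document granting the three Marketplace actions.
marketplace_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "aws-marketplace:ViewSubscriptions",
                "aws-marketplace:Unsubscribe",
                "aws-marketplace:Subscribe",
            ],
            "Resource": "*",  # assumption: not scoped to specific resources
        }
    ],
}

# Serialize to JSON so it can be pasted into the IAM console or a template.
policy_json = json.dumps(marketplace_policy, indent=2)
print(policy_json)
```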

Deploying Cohere Rerank 3 Nimble on SageMaker JumpStart

You can access the Cohere Rerank 3 model family using SageMaker JumpStart in Amazon SageMaker Studio, as shown in the following screenshot.

Deployment starts when you choose Deploy. You might be prompted to subscribe to this model through AWS Marketplace. If you’re already subscribed, choose Deploy again to deploy the model. After deployment finishes, an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK.

Subscribe to a model package

To subscribe to a model package, follow these steps:

1. Depending on which model you want to deploy, open the model package listing page for cohere-rerank-nimble-english or cohere-rerank-nimble-multilingual.
2. On the AWS Marketplace listing, choose Continue to subscribe.
3. On the Subscribe to this software page, review the terms and choose Accept Offer if you and your organization agree with the EULA, pricing, and support terms.
4. Choose Continue to configuration, and then choose an AWS Region.

The product ARN is displayed, which is the model package ARN that you need to specify when creating a deployable model using Boto3.
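Before passing the ARN to the SDK, it can help to check that the copied string has the expected shape. The following sketch assumes the standard SageMaker model package ARN layout (`arn:<partition>:sagemaker:<region>:<account-id>:model-package/<name>`). The helper name and the example ARN are hypothetical, and the account ID and package name are placeholders rather than real values.

```python
def parse_model_package_arn(arn):
    """Split a SageMaker model package ARN into its parts, or raise ValueError."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, resource_name = parts[5].partition("/")
    if resource_type != "model-package" or not resource_name:
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "name": resource_name,
    }

# Placeholder ARN for illustration only.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-model-package"
info = parse_model_package_arn(example_arn)
print(info)
```

A check like this catches a placeholder string that was never replaced, and it exposes the Region embedded in the ARN so you can compare it with the Region you plan to deploy in.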

Deploying Cohere Rerank 3 Nimble using the SDK

To deploy the model using the SDK, copy the product ARN from the previous step and specify it as model_package_arn in the following code:

from cohere_aws import Client
import boto3
region = boto3.Session().region_name

model_package_arn = "Specify the model package ARN here"

After you have the model package ARN, you can create an endpoint as shown in the following code. Specify a name for the endpoint, the instance type, and the number of instances to use. Make sure that your account-level service quota allows one or more ml.g5.xlarge instances for endpoint usage. To request a service quota increase, see AWS Service Quotas.

co = Client(region_name=region)
co.create_endpoint(arn=model_package_arn, endpoint_name="cohere-rerank-nimble-multilingual", instance_type="ml.g5.xlarge", n_instances=1)

If the endpoint is already created, you can connect to it by name using the following code:

co.connect_to_endpoint(endpoint_name="cohere-rerank-nimble-multilingual")
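If you want to confirm that an endpoint is ready before connecting to it, one option is to poll its status. The sketch below assumes you pass in a Boto3 SageMaker client (for example, `boto3.client("sagemaker")`) and relies on its `describe_endpoint` call. The helper name, polling interval, and retry limit are hypothetical choices for this example, and Boto3 also provides built-in waiters that may suit you better.

```python
import time

def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll describe_endpoint until the endpoint is InService or has failed."""
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} was not ready after {max_polls} polls")

# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# wait_until_in_service(boto3.client("sagemaker"), "my-endpoint-name")
```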

Follow a similar process to deploy Cohere Rerank 3 on SageMaker JumpStart.

Inference example using Cohere Rerank 3 Nimble

Cohere Rerank 3 Nimble offers strong multilingual support. The model is available in both English and multilingual versions supporting over 100 languages.

The following code example shows how to perform real-time inference using Cohere Rerank 3 Nimble-English.

documents = [
    {"Title":"Incorrect Password","Content":"Hello, I have been trying to access my account for the past hour and it keeps saying my password is incorrect. Can you please help me?"},
    {"Title":"Confirmation Email Missed","Content":"Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?"},
    {"Title":"Questions about Return Policy","Content":"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title":"Customer Support is Busy","Content":"Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Received Wrong Item","Content":"Hi, I have a question about my recent order. I received the wrong item and I need to return it."},
    {"Title":"Customer Service is Unavailable","Content":"Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Return Policy for Defective Product","Content":"Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title":"Wrong Item Received","Content":"Good morning, I have a question about my recent order. I received the wrong item and I need to return it."},
    {"Title":"Return Defective Product","Content":"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."}
]

In the following code, the top_n inference parameter for Cohere Rerank 3 and Rerank 3 Nimble specifies how many top-ranked results are returned after reranking the input documents, which lets you control how many of the most relevant documents are included in the final output. To choose a value for top_n, consider factors such as the diversity of your document set, the complexity of your queries, and the desired balance between accuracy and latency for enterprise search or RAG.

response = co.rerank(documents=documents, query='What emails have been about returning items?', rank_fields=["Title","Content"], top_n=2)
print(f'Documents: {response}')

Below is the output from Cohere Rerank 3 Nimble-English.

Documents: [RerankResult<document: {'Title': 'Received Wrong Item', 'Content': 'Hi, I have a question about my recent order. I received the wrong item and I need to return it.'}, index: 4, relevance_score: 0.0068771075>, RerankResult<document: {'Title': 'Wrong Item Received', 'Content': 'Good morning, I have a question about my recent order. I received the wrong item and I need to return it.'}, index: 7, relevance_score: 0.0064131636>]
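Each result in the output above carries the position of the document in the input list (index) and its relevance_score. The following sketch shows one way to turn such results into a compact summary. It assumes the result objects expose `index` and `relevance_score` as attributes, as the printed output suggests. To stay runnable offline, it uses a hypothetical `Result` stand-in instead of a live response.

```python
from collections import namedtuple

# Offline stand-in for the SDK's result objects (hypothetical type; the real
# objects come back from co.rerank and are assumed to expose the same fields).
Result = namedtuple("Result", ["index", "relevance_score"])

# Titles of the nine input emails, in their original order.
titles = [
    "Incorrect Password",
    "Confirmation Email Missed",
    "Questions about Return Policy",
    "Customer Support is Busy",
    "Received Wrong Item",
    "Customer Service is Unavailable",
    "Return Policy for Defective Product",
    "Wrong Item Received",
    "Return Defective Product",
]

def summarize(results, titles):
    """Pair each result with the title of the input document it points to."""
    return [(r.index, titles[r.index], r.relevance_score) for r in results]

# Values copied from the printed output above.
results = [Result(4, 0.0068771075), Result(7, 0.0064131636)]
summary = summarize(results, titles)
for index, title, score in summary:
    print(f"{index}\t{score:.6f}\t{title}")
```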

Cohere Rerank 3 Nimble Multilingual Support

Cohere Rerank 3 Nimble-Multilingual’s multilingual capabilities enable global organizations to provide a consistent and improved search experience to users across different regions and language settings.

The following example creates an input payload for a list of emails in multiple languages. We take the same set of emails from the previous example and translate some of them into different languages. These examples are available on the SageMaker JumpStart model card and were randomly generated for this example.

documents = [
    {"Title":"Contraseña incorrecta","Content":"Hola, llevo una hora intentando acceder a mi cuenta y sigue diciendo que mi contraseña es incorrecta. ¿Puede ayudarme, por favor?"},
    {"Title":"Confirmation Email Missed","Content":"Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?"},
    {"Title":"أسئلة حول سياسة الإرجاع","Content":"مرحبًا، لدي سؤال حول سياسة إرجاع هذا المنتج. لقد اشتريته قبل بضعة أسابيع وهو معيب"},
    {"Title":"Customer Support is Busy","Content":"Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Falschen Artikel erhalten","Content":"Hallo, ich habe eine Frage zu meiner letzten Bestellung. Ich habe den falschen Artikel erhalten und muss ihn zurückschicken."},
    {"Title":"Customer Service is Unavailable","Content":"Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Return Policy for Defective Product","Content":"Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title":"收到错误物品","Content":"早上好，关于我最近的订单，我有一个问题。我收到了错误的商品，需要退货。"},
    {"Title":"Return Defective Product","Content":"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."}
]

To perform real-time inference using Cohere Rerank 3 Nimble-Multilingual, use the following code:

response = co.rerank(documents=documents, query='What emails have been about returning items?', rank_fields=['Title','Content'], top_n=2)
print(f'Documents: {response}')

Below is the output from Cohere Rerank 3 Nimble-Multilingual.

Documents: [RerankResult<document: {'Title': '收到错误物品', 'Content': '早上好，关于我最近的订单，我有一个问题。我收到了错误的商品，需要退货。'}, index: 7, relevance_score: 0.034553625>, RerankResult<document: {'Title': 'أسئلة حول سياسة الإرجاع', 'Content': 'مرحبًا، لدي سؤال حول سياسة إرجاع هذا المنتج. لقد اشتريته قبل بضعة أسابيع وهو معيب'}, index: 2, relevance_score: 0.00037263767>]

Here’s the output translated into English:

Documents: [RerankResult<document: {'Title': 'Received Wrong Item', 'Content': 'Good morning, I have a question about my recent order. I received the wrong item and need to return it.'}, index: 7, relevance_score: 0.034553625>, RerankResult<document: {'Title': 'Questions about Return Policy', 'Content': 'Hello, I have a question about the return policy for this product. I bought it a few weeks ago and it is defective'}, index: 2, relevance_score: 0.00037263767>]

In both examples, the relevance scores are normalized to the range [0, 1], where a score closer to 1 indicates higher relevance to the query, and a score closer to 0 indicates lower relevance.
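Because the scores fall on a fixed scale, one possible follow-up step is to drop results below a cutoff before passing them on, in addition to limiting the count with top_n. The sketch below applies a cutoff to the two scores from the multilingual output. The helper name and the 0.01 threshold are arbitrary choices for illustration, not values recommended by Cohere or AWS, and a suitable threshold would need to be tuned for each use case.

```python
def filter_by_score(scored, min_score):
    """Keep only (index, score) pairs whose score meets the threshold."""
    return [(idx, score) for idx, score in scored if score >= min_score]

# (index, relevance_score) pairs copied from the multilingual output.
scored = [(7, 0.034553625), (2, 0.00037263767)]

# Hypothetical cutoff chosen only to illustrate the idea.
kept = filter_by_score(scored, min_score=0.01)
print(kept)
```

Here the second result scores roughly two orders of magnitude lower than the first, so the cutoff removes it even though top_n=2 returned it.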

Cohere Rerank 3 Nimble Suitable Use Cases

The Cohere Rerank 3 Nimble model offers an option that prioritizes efficiency. This model is well suited to businesses that want to enable their customers to accurately search complex documents, build applications that understand over 100 languages, and retrieve the most relevant information from disparate data stores. In industries like retail, where website drop-off increases with every 100 milliseconds added to search response time, a faster model such as Cohere Rerank 3 Nimble in the enterprise search system can translate into a better conversion rate.

Conclusion

Cohere Rerank 3 and Rerank 3 Nimble are now available on SageMaker JumpStart. To get started, see Train, Deploy, and Evaluate Pre-Trained Models with SageMaker JumpStart.

To learn more, see the Cohere on AWS GitHub repository.

About the Authors

Breanne Warner is an Enterprise Solutions Architect at Amazon Web Services supporting Healthcare and Life Sciences (HCLS) customers. She is passionate about supporting customers using Generative AI on AWS and driving adoption of their models. She also serves as Co-Director of Allyship on the board of Women@Amazon with the goal of fostering an inclusive and diverse culture at Amazon. Breanne holds a BS in Computer Engineering from the University of Illinois at Urbana-Champaign (UIUC).

Niithiyn Vijeaswaran is a Solutions Architect at AWS. His areas of expertise are Generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to support AWS customers across the board and accelerate their adoption of Generative AI. He is an avid Dallas Mavericks fan and enjoys collecting sneakers.

Karan Singh is a Generative AI Specialist for Third Party Models at AWS. He works with top 3rd party generative model providers to define and execute integrated GTM actions that help customers train, deploy, and scale their generative models. Karan holds a BS in Electrical and Instrumentation Engineering from Manipal University, an MS in Electrical Engineering from Northwestern University, and is currently an MBA candidate at Haas School of Business, University of California, Berkeley.
