Today, we are excited to announce that Mistral-NeMo-Base-2407 and Mistral-NeMo-Instruct-2407, Mistral AI’s 12-billion-parameter large language models for text generation, are available for customers through Amazon SageMaker JumpStart. You can try out these models with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference. In this post, we walk through how to discover, deploy, and use the Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407 models for a variety of real-world use cases.
Overview of Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407
Mistral NeMo, a powerful 12B parameter model developed through a collaboration between Mistral AI and NVIDIA and released under the Apache 2.0 license, is now available in SageMaker JumpStart. This model represents a significant advance in multilingual AI capabilities and accessibility.
Key features and capabilities
Mistral NeMo features a 128k-token context window, enabling processing of extensive long-form content. The model demonstrates strong performance in reasoning, world knowledge, and coding accuracy. Both pre-trained base and instruction-tuned checkpoints are available under the Apache 2.0 license, making them accessible to researchers and enterprises. The model’s quantization-aware training enables optimal FP8 inference performance without compromising quality.
Multilingual support
Mistral NeMo is designed for global applications, with strong performance across multiple languages including English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This multilingual capability, combined with built-in function calling and an extensive context window, helps make advanced AI more accessible across diverse linguistic and cultural landscapes.
Tekken: Advanced Tokenization
This model uses Tekken, an innovative tokenizer based on tiktoken. Trained on over 100 languages, Tekken improves compression efficiency for natural language text and source code.
SageMaker JumpStart overview
SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is the model hub, which offers a vast catalog of pre-trained models, such as DBRX, for a variety of tasks.
You can now discover and deploy both Mistral NeMo models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and machine learning operations (MLOps) controls with Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping to support data security.
Prerequisites
To try both NeMo models in SageMaker JumpStart, you need the following prerequisites:

- An AWS account that will contain all your AWS resources.
- An AWS Identity and Access Management (IAM) role with access to SageMaker.
- Access to SageMaker Studio or the SageMaker Python SDK, as described in the following sections.
- An account-level service quota for the ml.g6.12xlarge instance type, as discussed later in this post.
Discover Mistral NeMo models with SageMaker JumpStart
NeMo models can be accessed through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. This section describes how to discover models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For details on how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.
In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.
Then choose HuggingFace.
From the SageMaker JumpStart landing page, you can search for NeMo in the search box. The search results list Mistral NeMo Instruct and Mistral NeMo Base.
You can choose a model card to view details about the model, such as the license, the data used to train it, and how to use it. You can also find the Deploy button to deploy the model and create an endpoint.
Deploy the model with SageMaker JumpStart
Deployment starts when you choose the Deploy button. After deployment finishes, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.
Deploy a model using the SageMaker Python SDK
To deploy using the SDK, start by choosing the Mistral NeMo Base model, specified by the model_id with the value huggingface-llm-mistral-nemo-base-2407. You can deploy your chosen model to SageMaker with the following code. Similarly, you can deploy NeMo Instruct using its own model ID.
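The following is a minimal sketch of that deployment with the SageMaker Python SDK; it assumes your AWS credentials and SageMaker execution role are already configured:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Select the Mistral NeMo Base model by its JumpStart model ID
model = JumpStartModel(model_id="huggingface-llm-mistral-nemo-base-2407")

# Deploy with default settings; accept_eula=True explicitly accepts the EULA
predictor = model.deploy(accept_eula=True)
```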
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To accept the End User License Agreement (EULA), the EULA value must be explicitly set to True. Also make sure that you have the account-level service quota for using ml.g6.12xlarge as one or more instances for endpoint usage. You can request a service quota increase by following the instructions in AWS service quotas. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor.
An important thing to note here is that we’re using the djl-lmi v12 inference container, so we’re following the Large Model Inference chat completions API schema when sending a payload to both Mistral-NeMo-Base-2407 and Mistral-NeMo-Instruct-2407.
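As an illustration, a chat-completions-style payload for these endpoints looks roughly like the following sketch; the exact set of supported generation parameters is an assumption based on the standard chat completions format:

```python
# Sketch of a chat completions payload for the djl-lmi container
payload = {
    "messages": [
        {"role": "user", "content": "Hello, who are you?"}
    ],
    "max_tokens": 256,   # maximum number of tokens to generate
    "temperature": 0.7,  # sampling temperature
}
```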
Mistral-NeMo-Base-2407
You can work with the Mistral-NeMo-Base-2407 model like any other standard text generation model, where the model processes an input sequence and outputs predicted next words in the sequence. This section provides some example prompts and sample output. Keep in mind that the base model is not instruction fine-tuned.
Text completion
Tasks involving predicting the next token or filling in missing tokens in a sequence:
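A minimal example request is sketched below; the prompt text is hypothetical, and the response is parsed assuming the standard chat completions response shape:

```python
# Send a next-word-prediction style prompt to the base model endpoint
payload = {
    "messages": [
        {"role": "user", "content": "The capital of France is"}
    ],
    "max_tokens": 50,
    "temperature": 0.2,
}
response = predictor.predict(payload)

# Print the generated continuation
print(response["choices"][0]["message"]["content"])
```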
The output is:
Mistral NeMo Instruct
The Mistral-NeMo-Instruct-2407 model is a quick demonstration of how the base model can be fine-tuned to achieve compelling performance. You can follow the steps provided earlier to deploy the model, using the model_id value of huggingface-llm-mistral-nemo-instruct-2407 instead.
The instruction-tuned NeMo model can be tested with the following tasks:
Code generation
Mistral NeMo Instruct demonstrates benchmarked strengths for coding tasks. Mistral states that the Tekken tokenizer for NeMo is approximately 30% more efficient at compressing source code. For example, see the following code:
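The following sketch shows one way to send a code generation request; the prompt is a hypothetical example, and the response fields assume the standard chat completions shape:

```python
# Ask the instruct model to generate a small Python function
payload = {
    "messages": [
        {
            "role": "user",
            "content": "Write a Python function that checks whether a string is a palindrome.",
        }
    ],
    "max_tokens": 512,
    "temperature": 0.2,
}
response = predictor.predict(payload)
print(response["choices"][0]["message"]["content"])

# The usage field reports token counts, including completion_tokens
print(response["usage"])
```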
The output is:
The model demonstrates strong performance on code generation tasks, with the completion_tokens count offering insight into how the tokenizer’s code compression effectively optimizes the representation of programming languages using fewer tokens.
Advanced mathematics and reasoning
The model also reports strengths in mathematical and reasoning accuracy. For example, see the following code:
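A sketch of a math and reasoning request follows, with a hypothetical word problem as the prompt:

```python
# Pose a step-by-step reasoning problem to the instruct model
payload = {
    "messages": [
        {
            "role": "user",
            "content": (
                "A train travels 60 km in 45 minutes. "
                "What is its average speed in km/h? Reason step by step."
            ),
        }
    ],
    "max_tokens": 512,
    "temperature": 0.1,
}
response = predictor.predict(payload)
print(response["choices"][0]["message"]["content"])
```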
The output is:
Language translation

In this task, let’s test Mistral’s new Tekken tokenizer. Mistral states that the tokenizer is two times and three times more efficient at compressing Korean and Arabic, respectively.
Here we will use some text for translation.
Set prompts to instruct the model to translate into Korean and Arabic.
Next, set the payload.
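The following sketch ties these steps together; the source text is a hypothetical example, and the response fields assume the standard chat completions shape:

```python
# Hypothetical source text to translate
text = "Hi, I received the wrong item in my order. Can you help me arrange a replacement?"

# Prompts instructing the model to translate into Korean and Arabic
prompts = {
    "Korean": f"Translate the following text into Korean:\n{text}",
    "Arabic": f"Translate the following text into Arabic:\n{text}",
}

# Send one chat-completions payload per target language
for language, prompt in prompts.items():
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.2,
    }
    response = predictor.predict(payload)
    print(language, "translation:", response["choices"][0]["message"]["content"])
    print(language, "completion_tokens:", response["usage"]["completion_tokens"])
```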
The output is:
The translation results show a significant reduction in completion_tokens usage, even for tasks that are typically token-intensive, such as translations involving languages like Korean and Arabic. This improvement is made possible by the optimizations provided by the Tekken tokenizer. Such a reduction is particularly valuable for token-heavy applications, including summarization, language generation, and multi-turn conversations. By enhancing token efficiency, the Tekken tokenizer allows more tasks to be handled within the same resource constraints, making it a valuable tool for optimizing workflows where token usage directly impacts performance and cost.
Clean up
After you’re done running the notebook, make sure to delete all resources that you created in the process to avoid additional billing. Use the following code:
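A minimal cleanup sketch using the predictor returned at deployment:

```python
# Delete the model and endpoint so you stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()
```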
Conclusion
In this post, you learned how to get started with Mistral NeMo Base and Instruct in SageMaker Studio and deploy the models for inference. Because the base model is pre-trained, it can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio to get started today.
For more Mistral resources on AWS, check out the Mistral-on-AWS GitHub repository.
About the authors
Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His areas of focus are generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics.
Preston Tuggle is a Senior Specialist Solutions Architect working on generative AI.
Shane Rai is a Principal Generative AI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AWS AI/ML services, including model offerings from top-tier foundation model providers.