Today, we are excited to announce that Mistral-NeMo-Base-2407 and Mistral-NeMo-Instruct-2407, Mistral AI’s 12-billion-parameter large language models for text generation, are available for customers through Amazon SageMaker JumpStart. You can try out these models with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference. In this post, we walk through how to discover, deploy, and use the Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407 models for a variety of real-world use cases.
Overview of Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407
Mistral NeMo, a powerful 12B parameter model developed through a collaboration between Mistral AI and NVIDIA and released under the Apache 2.0 license, is now available in SageMaker JumpStart. This model represents a significant advance in multilingual AI capabilities and accessibility.
Key features and capabilities
Mistral NeMo features a 128k-token context window, enabling processing of extensive long-form content. The model demonstrates strong performance in reasoning, world knowledge, and coding accuracy. Both pre-trained base and instruction-tuned checkpoints are available under the Apache 2.0 license, making them accessible to researchers and enterprises. The model’s quantization-aware training enables optimal FP8 inference performance without compromising quality.
Multilingual support
Mistral NeMo is designed for global applications, with strong performance across multiple languages including English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This multilingual capability, combined with built-in function calling and an extensive context window, helps make advanced AI more accessible across diverse linguistic and cultural landscapes.
Tekken: Advanced Tokenization
This model uses Tekken, an innovative tokenizer based on tiktoken. Trained on over 100 languages, Tekken improves compression efficiency for natural language text and source code.
SageMaker JumpStart overview
SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is the model hub, which offers a vast catalog of pre-trained models, such as DBRX, for a variety of tasks.
You can now discover and deploy both Mistral NeMo models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and machine learning operations (MLOps) controls with Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping to support data security.
Prerequisites
To try both NeMo models in SageMaker JumpStart, you need the following prerequisites:

- An AWS account that will contain all your AWS resources.
- An AWS Identity and Access Management (IAM) role with access to SageMaker.
- Access to SageMaker Studio or the SageMaker Python SDK, as described in the following sections.
- An account-level service quota for the ml.g6.12xlarge instance type, as discussed later in this post.
Discover Mistral NeMo models with SageMaker JumpStart
NeMo models can be accessed through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. This section describes how to discover models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For details on how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.
In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.
Then choose HuggingFace.
From the SageMaker JumpStart landing page, you can search for NeMo in the search box. The search results list Mistral NeMo Instruct and Mistral NeMo Base.
You can choose a model card to view details about the model, such as the license, the data used to train it, and how to use it. You can also find the Deploy button to deploy the model and create an endpoint.
Deploy the model with SageMaker JumpStart
Deployment starts when you choose the Deploy button. After deployment finishes, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.
Deploy a model using the SageMaker Python SDK
To deploy using the SDK, start by choosing the Mistral NeMo Base model, specified by the model_id with the value huggingface-llm-mistral-nemo-base-2407. You can deploy your chosen model to SageMaker with the following code. Similarly, you can deploy NeMo Instruct using its own model ID.
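The following is a minimal sketch of that deployment with the SageMaker Python SDK; it assumes your AWS credentials and SageMaker execution role are already configured:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Select the Mistral NeMo Base model by its JumpStart model ID
model = JumpStartModel(model_id="huggingface-llm-mistral-nemo-base-2407")

# Deploy with default settings; accept_eula=True explicitly accepts the EULA
predictor = model.deploy(accept_eula=True)
```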
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To accept the End User License Agreement (EULA), the EULA value must be explicitly set to True. Also make sure that you have the account-level service quota for using ml.g6.12xlarge as one or more instances for endpoint usage. You can request a service quota increase by following the instructions in AWS service quotas. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor.
An important thing to note here is that we’re using the djl-lmi v12 inference container, so we’re following the Large Model Inference chat completions API schema when sending a payload to both Mistral-NeMo-Base-2407 and Mistral-NeMo-Instruct-2407.
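As an illustration, a chat-completions-style payload for these endpoints looks roughly like the following sketch; the exact set of supported generation parameters is an assumption based on the standard chat completions format:

```python
# Sketch of a chat completions payload for the djl-lmi container
payload = {
    "messages": [
        {"role": "user", "content": "Hello, who are you?"}
    ],
    "max_tokens": 256,   # maximum number of tokens to generate
    "temperature": 0.7,  # sampling temperature
}
```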
Mistral-NeMo-Base-2407
You can work with the Mistral-NeMo-Base-2407 model like any other standard text generation model, where the model processes an input sequence and outputs predicted next words in the sequence. This section provides some example prompts and sample output. Keep in mind that the base model is not instruction fine-tuned.
Text completion
Tasks involving predicting the next token or filling in missing tokens in a sequence:
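A minimal example request is sketched below; the prompt text is hypothetical, and the response is parsed assuming the standard chat completions response shape:

```python
# Send a next-word-prediction style prompt to the base model endpoint
payload = {
    "messages": [
        {"role": "user", "content": "The capital of France is"}
    ],
    "max_tokens": 50,
    "temperature": 0.2,
}
response = predictor.predict(payload)

# Print the generated continuation
print(response["choices"][0]["message"]["content"])
```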
The output is:
Mistral NeMo Instruct
The Mistral-NeMo-Instruct-2407 model is a quick demonstration of how the base model can be fine-tuned to achieve compelling performance. You can follow the steps provided earlier to deploy the model, using the model_id value of huggingface-llm-mistral-nemo-instruct-2407 instead.
The instruction-tuned NeMo model can be tested with the following tasks:
Code generation
Mistral NeMo Instruct demonstrates benchmarked strengths for coding tasks. Mistral states that the Tekken tokenizer for NeMo is approximately 30% more efficient at compressing source code. For example, see the following code:
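The following sketch shows one way to send a code generation request; the prompt is a hypothetical example, and the response fields assume the standard chat completions shape:

```python
# Ask the instruct model to generate a small Python function
payload = {
    "messages": [
        {
            "role": "user",
            "content": "Write a Python function that checks whether a string is a palindrome.",
        }
    ],
    "max_tokens": 512,
    "temperature": 0.2,
}
response = predictor.predict(payload)
print(response["choices"][0]["message"]["content"])

# The usage field reports token counts, including completion_tokens
print(response["usage"])
```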
The output is:
The model demonstrates strong performance on code generation tasks, with the completion_tokens count offering insight into how the tokenizer’s code compression effectively optimizes the representation of programming languages using fewer tokens.
Advanced mathematics and reasoning
The model also reports strengths in mathematical and reasoning accuracy. For example, see the following code:
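A sketch of a math and reasoning request follows, with a hypothetical word problem as the prompt:

```python
# Pose a step-by-step reasoning problem to the instruct model
payload = {
    "messages": [
        {
            "role": "user",
            "content": (
                "A train travels 60 km in 45 minutes. "
                "What is its average speed in km/h? Reason step by step."
            ),
        }
    ],
    "max_tokens": 512,
    "temperature": 0.1,
}
response = predictor.predict(payload)
print(response["choices"][0]["message"]["content"])
```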
The output is:
Language translation

In this task, let’s test Mistral’s new Tekken tokenizer. Mistral states that the tokenizer is two times and three times more efficient at compressing Korean and Arabic, respectively.
Here we will use some text for translation.
Set prompts to instruct the model to translate into Korean and Arabic.
Next, set the payload.
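The following sketch ties these steps together; the source text is a hypothetical example, and the response fields assume the standard chat completions shape:

```python
# Hypothetical source text to translate
text = "Hi, I received the wrong item in my order. Can you help me arrange a replacement?"

# Prompts instructing the model to translate into Korean and Arabic
prompts = {
    "Korean": f"Translate the following text into Korean:\n{text}",
    "Arabic": f"Translate the following text into Arabic:\n{text}",
}

# Send one chat-completions payload per target language
for language, prompt in prompts.items():
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.2,
    }
    response = predictor.predict(payload)
    print(language, "translation:", response["choices"][0]["message"]["content"])
    print(language, "completion_tokens:", response["usage"]["completion_tokens"])
```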
The output is:
The translation results show a significant reduction in completion_tokens usage, even for tasks that are typically token-intensive, such as translations involving languages like Korean and Arabic. This improvement is made possible by the optimizations provided by the Tekken tokenizer. Such a reduction is particularly valuable for token-heavy applications, including summarization, language generation, and multi-turn conversations. By enhancing token efficiency, the Tekken tokenizer allows more tasks to be handled within the same resource constraints, making it a valuable tool for optimizing workflows where token usage directly impacts performance and cost.
Clean up
After you’re done running the notebook, make sure to delete all resources that you created in the process to avoid additional billing. Use the following code:
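A minimal cleanup sketch using the predictor returned at deployment:

```python
# Delete the model and endpoint so you stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()
```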
Conclusion
In this post, you learned how to get started with Mistral NeMo Base and Instruct in SageMaker Studio and deploy the models for inference. Because the base model is pre-trained, it can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio to get started today.
For more Mistral resources on AWS, check out the Mistral-on-AWS GitHub repository.
About the authors
Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His areas of focus are generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics.
Preston Tuggle is a Senior Specialist Solutions Architect working on generative AI.
Shane Rai is a Principal Generative AI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AWS AI/ML services, including model offerings from top-tier foundation model providers.