Today, Pixtral 12B (pixtral-12b-2409), Mistral AI's state-of-the-art vision language model (VLM) that excels at both text-only and multimodal tasks, is available to customers through Amazon SageMaker JumpStart. You can try this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference.
This post explains how to discover, deploy, and use the Pixtral 12B model for a variety of real-world vision use cases.
Pixtral 12B overview
Pixtral 12B is Mistral's first VLM and, according to Mistral, delivers strong performance across a variety of benchmarks, outperforming other open models and rivaling much larger ones. Pixtral is trained to understand both images and documents, and excels at vision tasks such as chart and figure understanding, document question answering, multimodal reasoning, and instruction following, some of which we illustrate with examples later in this post. Pixtral 12B ingests images at their natural resolution and aspect ratio. Unlike some open source models, Pixtral doesn't compromise performance on text benchmarks, such as instruction following, coding, and math, to deliver strong performance on multimodal tasks.
Mistral designed a novel architecture for Pixtral 12B to optimize for both speed and performance. The model has two components: a 400-million-parameter vision encoder that tokenizes images, and a 12-billion-parameter multimodal transformer decoder that predicts the next text token given a sequence of text and images. The vision encoder was newly trained to natively support variable image sizes, which lets Pixtral accurately understand complex diagrams, charts, and documents at high resolution, while providing fast inference speeds for small images such as icons, clipart, and equations. This architecture allows Pixtral to process any number of images of arbitrary sizes in a context window of up to 128,000 tokens.
The license terms are an important deciding factor when using an open weights model. Like other Mistral models such as Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, and Mistral Nemo 12B, Pixtral 12B is released under the commercially permissive Apache 2.0 license, giving enterprise and startup customers a high-performing VLM option for building complex multimodal applications.
SageMaker JumpStart overview
SageMaker JumpStart provides access to a broad selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can use state-of-the-art model architectures, including language models and computer vision models, without having to build them from scratch.
SageMaker JumpStart allows you to deploy models in a secure environment. Models can be provisioned on dedicated SageMaker inference instances, including instances powered by AWS Trainium and AWS Inferentia, and are isolated within your virtual private cloud (VPC). This strengthens data security and compliance, because the models operate under the controls of your own VPC rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune the model using the capabilities of SageMaker, including SageMaker Inference for deploying models and container logs for improved observability. With SageMaker, you can streamline the entire model deployment process. Note that fine-tuning of Pixtral 12B is not yet available (at the time of writing) in SageMaker JumpStart.
Prerequisites
To try Pixtral 12B with SageMaker JumpStart, you need the following prerequisites:
Discover Pixtral 12B with SageMaker JumpStart
Pixtral 12B can be accessed through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. This section describes how to discover models in SageMaker Studio.
SageMaker Studio is an IDE that provides a single web-based visual interface with access to dedicated tools for performing ML development steps, from preparing data to building, training, and deploying ML models. For more information about how to get started and set up SageMaker Studio, see Amazon SageMaker Studio Classic.
- In SageMaker Studio, choose JumpStart in the navigation pane to access SageMaker JumpStart.
- Choose Hugging Face to access the Pixtral 12B model.
- Search for the Pixtral 12B model.
- Choose the model card to view details about the model, including its license, the data used to train it, and how to use the model.
- Choose Deploy to deploy the model and create an endpoint.
Deploy the model with SageMaker JumpStart
To start the deployment, choose Deploy. After deployment is complete, an endpoint is created. You can test the endpoint by passing a sample inference request payload, or by using the SDK and selecting the test option. The SDK displays sample code that you can use in the notebook editor of your choice in SageMaker Studio.
To deploy using the SDK, start by specifying the model_id with the value huggingface-vlm-mistral-pixtral-12b-2409. You can deploy the model on SageMaker using the following code.
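The deployment step can be sketched as follows. JumpStartModel is part of the SageMaker Python SDK; the helper function name and the default instance type below are assumptions for illustration, based on the quota guidance later in this post, and the call requires valid AWS credentials.

```python
# Sketch of deploying Pixtral 12B through SageMaker JumpStart.
# Assumes AWS credentials and an account-level quota for the instance type.
MODEL_ID = "huggingface-vlm-mistral-pixtral-12b-2409"

def deploy_pixtral(instance_type="ml.p4d.24xlarge"):
    # Imported inside the function so the sketch is readable
    # without the sagemaker SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=MODEL_ID)
    # accept_eula=True explicitly accepts the model's end user
    # license agreement; deployment fails without it.
    return model.deploy(accept_eula=True, instance_type=instance_type)
```

Calling `deploy_pixtral()` provisions the endpoint and returns a predictor you can use for inference.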
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC settings. You can change these configurations by specifying non-default values in JumpStartModel. To accept the End User License Agreement (EULA), you must explicitly set the accept_eula value to True. Also make sure you have an account-level service quota for using ml.p4d.24xlarge or ml.p4de.24xlarge as one or more instances for endpoint usage. To request a service quota increase, refer to AWS service quotas. After you deploy the model, you can run inference against the deployed endpoint through the SageMaker predictor.
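As a sketch, invoking the endpoint through the predictor might look like the following. The chat-style request fields (messages, max_tokens, temperature) are assumptions based on common Messages API formats and may differ for your container version.

```python
def build_text_payload(prompt, max_tokens=512, temperature=0.1):
    """Build a chat-style request body for the Pixtral endpoint.

    The field names here are assumptions; check the sample payload shown
    in SageMaker Studio for your deployed endpoint.
    """
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

# Example usage against a deployed JumpStart predictor:
# response = predictor.predict(build_text_payload("Summarize Apache 2.0 in one sentence."))
# print(response)
```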
Examples of using Pixtral 12B
This section provides examples of inference and prompts with Pixtral 12B.
OCR
Use the following image as input for OCR.
Use the following prompt:
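An OCR request can be sketched by embedding the image as a base64 data URL inside a multimodal message. The content-part structure ("type": "text" / "image_url") is an assumption based on common OpenAI-compatible message formats, and the default prompt string is illustrative.

```python
import base64

def image_to_data_url(path, mime="image/jpeg"):
    """Read a local image file and encode it as a base64 data URL."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

def build_ocr_payload(image_data_url, prompt="Extract all of the text from this image."):
    """Build a multimodal request pairing an OCR prompt with an image."""
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_data_url}},
            ],
        }],
        "max_tokens": 2048,
    }

# response = predictor.predict(build_ocr_payload(image_to_data_url("document.jpg")))
```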
Chart understanding and analysis
For chart understanding and analysis, use the following image as input.
Use the following prompt:
We get the following output:
Image to code
The image-to-code example uses the following image as input:
Use the following prompt:
Clean up
After you're done, delete the SageMaker endpoint using the following code to avoid incurring unnecessary costs.
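A minimal cleanup sketch, assuming predictor is the object returned by the deployment step (delete_model and delete_endpoint are standard SageMaker predictor methods):

```python
def delete_jumpstart_resources(predictor):
    """Delete the SageMaker model and endpoint behind a JumpStart predictor."""
    predictor.delete_model()     # remove the registered model artifact
    predictor.delete_endpoint()  # tear down the endpoint to stop billing
```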
Conclusion
In this post, we showed you how to get started with Mistral's latest multimodal model, Pixtral 12B, in SageMaker JumpStart, and how to deploy the model for inference. We also explored how SageMaker JumpStart lets data scientists and ML engineers discover, access, and deploy a wide range of pre-trained FMs for inference, including other Mistral AI models such as Mistral 7B and Mixtral 8x22B.
To get started, see Train, Deploy, and Evaluate Pretrained Models with SageMaker JumpStart and Getting Started with Amazon SageMaker JumpStart.
For more Mistral assets, check out the Mistral-on-AWS repository.
About the authors
Preston Tuggle is a Senior Specialist Solutions Architect working on generative AI.
Niithiyn Vijeaswaran is a GenAI Specialist Solutions Architect at AWS. His areas of focus are generative AI and AWS AI accelerators. He holds a bachelor's degree in computer science and bioinformatics. Niithiyn works closely with the Generative AI GTM team to support AWS customers on multiple fronts and accelerate their adoption of generative AI. He is an avid Dallas Mavericks fan and enjoys collecting sneakers.
Shane Rai is a Principal GenAI Specialist in the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AWS AI/ML services, including model offerings from top-tier foundation model providers.