Today, we are announcing the availability of Llama 3.2 models in Amazon SageMaker JumpStart. Llama 3.2 offers multimodal vision and lightweight models that represent Meta’s latest advancements in large language models (LLMs), providing enhanced capabilities and broader applicability across a range of use cases. With a focus on responsible innovation and system-level safety, these new models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce capabilities that help you build a new generation of AI experiences. SageMaker JumpStart is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions to help you jump-start your ML journey.
In this post, we show you how to discover and deploy the Llama 3.2 11B Vision model using SageMaker JumpStart. We also share the supported instance types and context lengths for all Llama 3.2 models available in SageMaker JumpStart. Although not covered in this post, you can also use the lightweight models along with fine-tuning through SageMaker JumpStart.
Llama 3.2 models are initially available in SageMaker JumpStart in the US East (Ohio) AWS Region. Note that Meta has restrictions on the use of the multimodal models if you are located in the European Union. For more information, see the Meta Community License Agreement.
Overview of Llama 3.2
Llama 3.2 represents Meta’s latest advancements in LLMs. Llama 3.2 models are available in a range of sizes, from small to medium multimodal models. The larger Llama 3.2 models come in two parameter sizes, 11B and 90B, with a 128K context length, and can perform advanced reasoning tasks, including multimodal support for high-resolution images. The lightweight text-only models come in two parameter sizes, 1B and 3B, with a 128K context length, and are suitable for edge devices. Additionally, there is a new safeguard model, Llama Guard 3 11B Vision, designed to support responsible innovation and system-level safety.
Llama 3.2 is the first Llama model to support vision tasks, with a new model architecture that integrates image encoder representations into the language model. With a focus on responsible innovation and system-level safety, Llama 3.2 models help you build and deploy state-of-the-art generative AI applications, spurring new innovations like image reasoning, while also making them more accessible for on-edge applications. The new models are designed to drive efficiency for AI workloads, with reduced latency and improved performance, making them suitable for a wide range of applications.
SageMaker JumpStart overview
SageMaker JumpStart gives you access to a wide range of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that you can deeply customize to address your specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.
SageMaker JumpStart allows you to deploy your model in a secure environment. Models can be provisioned on dedicated SageMaker Inference instances, such as AWS Trainium or AWS Inferentia-powered instances, and are isolated within a Virtual Private Cloud (VPC). This enhances data security and compliance, as your model runs under your own VPC control, not in a shared public environment. After you deploy your FM, you can further customize and fine-tune it using the extensive capabilities of Amazon SageMaker, including SageMaker Inference for model deployment and container logs for improved observability. SageMaker streamlines the entire model deployment process.
Prerequisites
To try out the Llama 3.2 model with SageMaker JumpStart, you need the following prerequisites:
Discover the Llama 3.2 model on SageMaker JumpStart
SageMaker JumpStart provides access to FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. These give you multiple options to discover and use hundreds of models for your specific use case.
SageMaker Studio is a comprehensive IDE that provides a unified web-based interface for performing all aspects of the ML development lifecycle. From data preparation to model building, training, and deployment, SageMaker Studio provides dedicated tools to streamline the entire process. In SageMaker Studio, you can access SageMaker JumpStart to discover and explore an extensive catalog of FMs that can be deployed for inference using SageMaker Inference.
You can access SageMaker JumpStart in SageMaker Studio by choosing JumpStart in the navigation pane or by choosing JumpStart from the Home page.
Alternatively, you can use the SageMaker Python SDK to access and use SageMaker JumpStart models programmatically. This approach provides greater flexibility and integration with your existing AI/ML workflows and pipelines. By providing multiple access points, SageMaker JumpStart helps you seamlessly incorporate pre-trained models into your AI/ML development efforts, regardless of your preferred interface or workflow.
Deploy a Llama 3.2 multimodal model for inference using SageMaker JumpStart
On the SageMaker JumpStart landing page, you can browse all the publicly available pre-trained models offered by SageMaker. Choose the Meta model provider tab to see all the Meta models available in SageMaker.
If you’re using SageMaker Studio Classic and don’t see the Llama 3.2 models, update your SageMaker Studio version by shutting it down and restarting it. For more information about updating the version, see Shutting Down and Updating the Studio Classic App.
Choosing a model card displays details about the model, including the license, the data used to train it, and how to use it. It also includes two buttons, Deploy and Open notebook, which help you use the model.
Once you select either button, a pop-up window will appear with the End User License Agreement (EULA) and Terms of Use that you must accept.
After you accept the terms, you can proceed to the next step and use the model.
Deploy the Llama 3.2 11B Vision model for inference using the Python SDK
When you choose Deploy and accept the terms, model deployment starts. Alternatively, you can deploy through the example notebook by choosing Open notebook. The notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using the notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker.
You can deploy the Llama 3.2 11B Vision model with SageMaker JumpStart using the SageMaker Python SDK.
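A minimal sketch of the deployment code follows; it assumes the meta-vlm-llama-3-2-11b-vision-instruct model ID from the table in the next section (the later prompt examples use chat-style messages, which the Instruct variant is tuned for):

```python
from sagemaker.jumpstart.model import JumpStartModel

# Model ID assumed from the table in the next section; adjust for the variant you want
model_id = "meta-vlm-llama-3-2-11b-vision-instruct"

model = JumpStartModel(model_id=model_id)

# accept_eula=True acknowledges the Llama 3.2 end-user license agreement
predictor = model.deploy(accept_eula=True)
```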
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC settings. You can change these configurations by specifying non-default values in JumpStartModel. To successfully deploy the model, you must manually set accept_eula=True as an argument to the deploy method. After deployment, you can run inference against the deployed endpoint through a SageMaker predictor.
Recommended instances and benchmarks
The following table lists all the Llama 3.2 models available in SageMaker JumpStart, along with the model_id, default instance type, and maximum number of total tokens (number of input tokens plus number of generated tokens) supported by each model. To increase the context length, you can change the default instance type in the SageMaker JumpStart UI.
| Model name | Model ID | Default instance type | Supported instance types |
| --- | --- | --- | --- |
| Llama-3.2-1B | meta-textgeneration-llama-3-2-1b, meta-textgenerationneuron-llama-3-2-1b | ml.g6.xlarge (125K context length), ml.trn1.2xlarge (125K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-1B-Instruct | meta-textgeneration-llama-3-2-1b-instruct, meta-textgenerationneuron-llama-3-2-1b-instruct | ml.g6.xlarge (125K context length), ml.trn1.2xlarge (125K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-3B | meta-textgeneration-llama-3-2-3b, meta-textgenerationneuron-llama-3-2-3b | ml.g6.xlarge (125K context length), ml.trn1.2xlarge (125K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-3B-Instruct | meta-textgeneration-llama-3-2-3b-instruct, meta-textgenerationneuron-llama-3-2-3b-instruct | ml.g6.xlarge (125K context length), ml.trn1.2xlarge (125K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-11B-Vision | meta-vlm-llama-3-2-11b-vision | ml.p4d.24xlarge (125K context length) | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
| Llama-3.2-11B-Vision-Instruct | meta-vlm-llama-3-2-11b-vision-instruct | ml.p4d.24xlarge (125K context length) | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
| Llama-3.2-90B-Vision | meta-vlm-llama-3-2-90b-vision | ml.p5.48xlarge (125K context length) | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
| Llama-3.2-90B-Vision-Instruct | meta-vlm-llama-3-2-90b-vision-instruct | ml.p5.48xlarge (125K context length) | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
| Llama-Guard-3-11B-Vision | meta-vlm-llama-guard-3-11b-vision | ml.p4d.24xlarge | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
Llama 3.2 models have been evaluated on over 150 benchmark datasets, demonstrating performance competitive with leading FMs.
Llama 3.2 11B Vision inference and example prompts
The Llama 3.2 11B and 90B models can be used for text and image, or vision, reasoning use cases. They can perform a variety of tasks, including image captioning, image-text retrieval, visual question answering and reasoning, and document visual question answering. The input payload to the endpoint resembles the following code example.
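The exact request schema depends on the model version and serving container; the following is a minimal sketch of a messages-style payload, where field names such as image_url, max_tokens, temperature, and top_p are assumptions based on common OpenAI-compatible chat schemas:

```python
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    # Image supplied inline as a base64-encoded data URL
                    "type": "image_url",
                    "image_url": {"url": "data:image/png;base64,<base64-encoded image>"},
                },
            ],
        }
    ],
    "max_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.9,
}
```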
Text-only input
Below is an example of a text-only input:
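A sketch of such a request against the predictor created earlier; the prompt and generation parameters are illustrative only:

```python
payload = {
    "messages": [
        # A simple text-only user turn (no image content)
        {"role": "user", "content": "What are three key benefits of running ML inference on AWS?"}
    ],
    "max_tokens": 256,
}

response = predictor.predict(payload)
print(response)
```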
This produces the following response:
Single image input
You can set up a vision-based inference task with a Llama 3.2 model using SageMaker JumpStart as follows:
Let’s load an image from the open source MATH-Vision dataset.
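A minimal sketch of loading and encoding the image, assuming it has already been downloaded locally (the file name is hypothetical):

```python
import base64

# Hypothetical local copy of an image from the MATH-Vision dataset
image_path = "math_vision_example.png"

with open(image_path, "rb") as f:
    image_bytes = f.read()

# Base64-encode the image so it can be embedded in the request payload
image_b64 = base64.b64encode(image_bytes).decode("utf-8")
```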
You can use base64 image data to structure your message object.
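For example, reusing the assumed messages schema from earlier with the base64 string produced above (the prompt text is illustrative):

```python
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Solve the problem shown in the image. Explain your reasoning."},
                {
                    # Embed the base64-encoded image as a data URL
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 512,
}

response = predictor.predict(payload)
print(response)
```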
This produces the following response:
Multiple image input
The following code is an example of multiple image input.
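A sketch under the same assumed schema, with two images already base64-encoded as image_b64_1 and image_b64_2 (variable names and prompt are hypothetical):

```python
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images and describe the differences."},
                # Both images are passed as separate image_url content items
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64_1}"}},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64_2}"}},
            ],
        }
    ],
    "max_tokens": 512,
}

response = predictor.predict(payload)
print(response)
```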
This produces the following response:
Clean up
To avoid incurring unnecessary costs, delete the SageMaker endpoint when you’re done, using the following code snippet:
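A minimal snippet, using the predictor created during deployment:

```python
# Delete the deployed model and the SageMaker endpoint to stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()
```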
Alternatively, to use the SageMaker console, follow these steps:
- On the SageMaker console, under Inference in the navigation pane, choose Endpoints.
- Find the embedding and text generation endpoints.
- On the endpoint details page, choose Delete.
- Choose Delete again to confirm.
Conclusion
In this post, we discussed how SageMaker JumpStart enables data scientists and ML engineers to discover, access, and deploy a wide range of pre-trained FMs for inference, including Meta’s most advanced and high-performing models to date. Get started with SageMaker JumpStart and the Llama 3.2 model today. For more information about SageMaker JumpStart, see Train, Deploy, and Evaluate Pre-Trained Models with SageMaker JumpStart and Getting Started with Amazon SageMaker JumpStart.
About the Authors
Supriya Pragandra is a Senior Solutions Architect at AWS.
Armando Diaz is a Solutions Architect at AWS.
Sharon Yu is a Software Development Engineer at AWS.
Siddharth Venkatesan is a Software Development Engineer at AWS.
Tony Liang is a Software Engineer at AWS.
Evan Kravitz is a Software Development Engineer at AWS.
Jonathan Guineganye is a Senior Software Engineer at AWS.
Tyler Osterberg is a Software Engineer at AWS.
Sindhu Vahini Somasundaram is a Software Development Engineer at AWS.
Hemant Singh is an Applied Scientist at AWS.
Xin Fan is a Senior Applied Scientist at AWS.
Adrianna Simmons is a Senior Product Marketing Manager at AWS.
June Won is a Senior Product Manager at AWS.
Carl Albertsen is responsible for ML algorithms and JumpStart at AWS.