Today, we are announcing the availability of Llama 3.2 models in Amazon SageMaker JumpStart. Llama 3.2 offers multimodal vision and lightweight models that represent Meta’s latest advancements in large language models (LLMs), providing enhanced capabilities and broader applicability across a range of use cases. With a focus on responsible innovation and system-level safety, these new models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce capabilities that help you build a new generation of AI experiences. SageMaker JumpStart is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions to help you jump-start your ML journey.
In this post, we show you how to discover and deploy the Llama 3.2 11B Vision model using SageMaker JumpStart. We also share the supported instance types and context lengths for all Llama 3.2 models available in SageMaker JumpStart. Although not covered in this post, you can also use the lightweight models along with fine-tuning through SageMaker JumpStart.
Llama 3.2 models are initially available in SageMaker JumpStart in the US East (Ohio) AWS Region. Note that Meta has restrictions on the use of the multimodal models if you are located in the European Union. For more information, see the Meta Community License Agreement.
Overview of Llama 3.2
Llama 3.2 represents Meta’s latest advancements in LLMs. Llama 3.2 models are available in a range of sizes, from small to medium multimodal models. The larger Llama 3.2 models come in two parameter sizes, 11B and 90B, with a 128K context length, and can perform advanced reasoning tasks, including multimodal support for high-resolution images. The lightweight text-only models come in two parameter sizes, 1B and 3B, with a 128K context length, and are suitable for edge devices. Additionally, there is a new safeguard model, Llama Guard 3 11B Vision, designed to support responsible innovation and system-level safety.
Llama 3.2 is the first Llama model to support vision tasks, with a new model architecture that integrates image encoder representations into the language model. With a focus on responsible innovation and system-level safety, Llama 3.2 models help you build and deploy state-of-the-art generative AI applications, spurring new innovations like image reasoning, while also making them more accessible for on-edge applications. The new models are designed to drive efficiency for AI workloads, with reduced latency and improved performance, making them suitable for a wide range of applications.
SageMaker JumpStart overview
SageMaker JumpStart gives you access to a wide range of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that you can deeply customize to address your specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.
SageMaker JumpStart allows you to deploy your model in a secure environment. Models can be provisioned on dedicated SageMaker Inference instances, such as AWS Trainium or AWS Inferentia-powered instances, and are isolated within a Virtual Private Cloud (VPC). This enhances data security and compliance, as your model runs under your own VPC control, not in a shared public environment. After you deploy your FM, you can further customize and fine-tune it using the extensive capabilities of Amazon SageMaker, including SageMaker Inference for model deployment and container logs for improved observability. SageMaker streamlines the entire model deployment process.
Prerequisites
To try out the Llama 3.2 model with SageMaker JumpStart, you need the following prerequisites:
Discover the Llama 3.2 model on SageMaker JumpStart
SageMaker JumpStart provides access to FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. These give you multiple options to discover and use hundreds of models for your specific use case.
SageMaker Studio is a comprehensive IDE that provides a unified web-based interface for performing all aspects of the ML development lifecycle. From data preparation to model building, training, and deployment, SageMaker Studio provides dedicated tools to streamline the entire process. In SageMaker Studio, you can access SageMaker JumpStart to discover and explore an extensive catalog of FMs that can be deployed for inference using SageMaker Inference.
You can access SageMaker JumpStart in SageMaker Studio by choosing JumpStart in the navigation pane or by choosing JumpStart from the Home page.
Alternatively, you can use the SageMaker Python SDK to access and use SageMaker JumpStart models programmatically. This approach provides greater flexibility and integration with your existing AI/ML workflows and pipelines. By providing multiple access points, SageMaker JumpStart helps you seamlessly incorporate pre-trained models into your AI/ML development efforts, regardless of your preferred interface or workflow.
Deploy a Llama 3.2 multimodal model for inference using SageMaker JumpStart
On the SageMaker JumpStart landing page, you can browse all the publicly available pre-trained models offered by SageMaker. Choose the Meta model provider tab to see all the Meta models available in SageMaker.
If you’re using SageMaker Studio Classic and don’t see the Llama 3.2 models, update your SageMaker Studio version by shutting it down and restarting it. For more information about updating the version, see Shutting Down and Updating the Studio Classic App.
Choosing a model card displays details about the model, including the license, the data used to train it, and how to use it. It also includes two buttons, Deploy and Open notebook, which help you use the model.
Once you select either button, a pop-up window will appear with the End User License Agreement (EULA) and Terms of Use that you must accept.
After you accept the terms, you can proceed to the next step and use the model.
Deploy the Llama 3.2 11B Vision model for inference using the Python SDK
When you choose Deploy and accept the terms, model deployment starts. Alternatively, you can deploy through the example notebook by choosing Open notebook. The notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using the notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker.
You can deploy the Llama 3.2 11B Vision model with SageMaker JumpStart using the SageMaker Python SDK.
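A minimal sketch of the deployment code follows; it assumes the meta-vlm-llama-3-2-11b-vision-instruct model ID from the table in the next section (the later prompt examples use chat-style messages, which the Instruct variant is tuned for):

```python
from sagemaker.jumpstart.model import JumpStartModel

# Model ID assumed from the table in the next section; adjust for the variant you want
model_id = "meta-vlm-llama-3-2-11b-vision-instruct"

model = JumpStartModel(model_id=model_id)

# accept_eula=True acknowledges the Llama 3.2 end-user license agreement
predictor = model.deploy(accept_eula=True)
```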
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC settings. You can change these configurations by specifying non-default values in JumpStartModel. To successfully deploy the model, you must manually set accept_eula=True as an argument to the deploy method. After deployment, you can run inference against the deployed endpoint through a SageMaker predictor.
Recommended instances and benchmarks
The following table lists all the Llama 3.2 models available in SageMaker JumpStart, along with the model_id, default instance type, and maximum number of total tokens (number of input tokens plus number of generated tokens) supported by each model. To increase the context length, you can change the default instance type in the SageMaker JumpStart UI.
| Model name | Model ID | Default instance type | Supported instance types |
| --- | --- | --- | --- |
| Llama-3.2-1B | meta-textgeneration-llama-3-2-1b, meta-textgenerationneuron-llama-3-2-1b | ml.g6.xlarge (125K context length), ml.trn1.2xlarge (125K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-1B-Instruct | meta-textgeneration-llama-3-2-1b-instruct, meta-textgenerationneuron-llama-3-2-1b-instruct | ml.g6.xlarge (125K context length), ml.trn1.2xlarge (125K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-3B | meta-textgeneration-llama-3-2-3b, meta-textgenerationneuron-llama-3-2-3b | ml.g6.xlarge (125K context length), ml.trn1.2xlarge (125K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-3B-Instruct | meta-textgeneration-llama-3-2-3b-instruct, meta-textgenerationneuron-llama-3-2-3b-instruct | ml.g6.xlarge (125K context length), ml.trn1.2xlarge (125K context length) | All g6/g5/p4/p5 instances; ml.inf2.xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Llama-3.2-11B-Vision | meta-vlm-llama-3-2-11b-vision | ml.p4d.24xlarge (125K context length) | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
| Llama-3.2-11B-Vision-Instruct | meta-vlm-llama-3-2-11b-vision-instruct | ml.p4d.24xlarge (125K context length) | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
| Llama-3.2-90B-Vision | meta-vlm-llama-3-2-90b-vision | ml.p5.48xlarge (125K context length) | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
| Llama-3.2-90B-Vision-Instruct | meta-vlm-llama-3-2-90b-vision-instruct | ml.p5.48xlarge (125K context length) | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
| Llama-Guard-3-11B-Vision | meta-vlm-llama-guard-3-11b-vision | ml.p4d.24xlarge | p4d.24xlarge, p4de.24xlarge, p5.48xlarge |
Llama 3.2 models have been evaluated on over 150 benchmark datasets, demonstrating performance competitive with leading FMs.
Llama 3.2 11B Vision inference and example prompts
The Llama 3.2 11B and 90B models can be used for text and image, or vision, reasoning use cases. They can perform a variety of tasks, including image captioning, image-text retrieval, visual question answering and reasoning, and document visual question answering. The input payload to the endpoint resembles the following code example.
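The exact request schema depends on the model version and serving container; the following is a minimal sketch of a messages-style payload, where field names such as image_url, max_tokens, temperature, and top_p are assumptions based on common OpenAI-compatible chat schemas:

```python
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    # Image supplied inline as a base64-encoded data URL
                    "type": "image_url",
                    "image_url": {"url": "data:image/png;base64,<base64-encoded image>"},
                },
            ],
        }
    ],
    "max_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.9,
}
```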
Text-only input
Below is an example of a text-only input:
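A sketch of such a request against the predictor created earlier; the prompt and generation parameters are illustrative only:

```python
payload = {
    "messages": [
        # A simple text-only user turn (no image content)
        {"role": "user", "content": "What are three key benefits of running ML inference on AWS?"}
    ],
    "max_tokens": 256,
}

response = predictor.predict(payload)
print(response)
```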
This produces the following response:
Single image input
You can set up a vision-based inference task with a Llama 3.2 model using SageMaker JumpStart as follows:
Let’s load an image from the open source MATH-Vision dataset.
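A minimal sketch of loading and encoding the image, assuming it has already been downloaded locally (the file name is hypothetical):

```python
import base64

# Hypothetical local copy of an image from the MATH-Vision dataset
image_path = "math_vision_example.png"

with open(image_path, "rb") as f:
    image_bytes = f.read()

# Base64-encode the image so it can be embedded in the request payload
image_b64 = base64.b64encode(image_bytes).decode("utf-8")
```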
You can use base64 image data to structure your message object.
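For example, reusing the assumed messages schema from earlier with the base64 string produced above (the prompt text is illustrative):

```python
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Solve the problem shown in the image. Explain your reasoning."},
                {
                    # Embed the base64-encoded image as a data URL
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 512,
}

response = predictor.predict(payload)
print(response)
```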
This produces the following response:
Multiple image input
The following code is an example of multiple image input.
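A sketch under the same assumed schema, with two images already base64-encoded as image_b64_1 and image_b64_2 (variable names and prompt are hypothetical):

```python
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images and describe the differences."},
                # Both images are passed as separate image_url content items
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64_1}"}},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64_2}"}},
            ],
        }
    ],
    "max_tokens": 512,
}

response = predictor.predict(payload)
print(response)
```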
This produces the following response:
Clean up
To avoid incurring unnecessary costs, delete the SageMaker endpoint when you’re done, using the following code snippet:
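A minimal snippet, using the predictor created during deployment:

```python
# Delete the deployed model and the SageMaker endpoint to stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()
```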
Alternatively, to use the SageMaker console, follow these steps:
- On the SageMaker console, under Inference in the navigation pane, choose Endpoints.
- Find the embedding and text generation endpoints.
- On the endpoint details page, choose Delete.
- Choose Delete again to confirm.
Conclusion
In this post, we discussed how SageMaker JumpStart enables data scientists and ML engineers to discover, access, and deploy a wide range of pre-trained FMs for inference, including Meta’s most advanced and high-performing models to date. Get started with SageMaker JumpStart and the Llama 3.2 model today. For more information about SageMaker JumpStart, see Train, Deploy, and Evaluate Pre-Trained Models with SageMaker JumpStart and Getting Started with Amazon SageMaker JumpStart.
About the Authors
Supriya Pragandra is a Senior Solutions Architect at AWS.
Armando Diaz is a Solutions Architect at AWS.
Sharon Yu is a Software Development Engineer at AWS.
Siddharth Venkatesan is a Software Development Engineer at AWS.
Tony Liang is a Software Engineer at AWS.
Evan Kravitz is a Software Development Engineer at AWS.
Jonathan Guineganye is a Senior Software Engineer at AWS.
Tyler Osterberg is a Software Engineer at AWS.
Sindhu Vahini Somasundaram is a Software Development Engineer at AWS.
Hemant Singh is an Applied Scientist at AWS.
Xin Fan is a Senior Applied Scientist at AWS.
Adrianna Simmons is a Senior Product Marketing Manager at AWS.
June Won is a Senior Product Manager at AWS.
Carl Albertsen is responsible for ML algorithms and JumpStart at AWS.