Snowflake Arctic Model Now Available on Amazon SageMaker JumpStart

This post was co-authored by Matt Marzillo of Snowflake.

Today, we are announcing that Snowflake Arctic Instruct models are available through Amazon SageMaker JumpStart for deployment and inference. Snowflake Arctic is a family of enterprise-grade large language models (LLMs) built by Snowflake to serve the needs of enterprise users, and it excels at SQL querying, coding, and accurately following instructions (as shown in the following benchmarks). SageMaker JumpStart is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML.

This post explains how to use SageMaker JumpStart to discover and deploy the Snowflake Arctic Instruct model, and provides an example use case with concrete prompts.

What is Snowflake Arctic?

Snowflake Arctic is an enterprise-focused LLM that delivers top-tier enterprise intelligence among open LLMs at highly competitive cost-efficiency. Snowflake achieves this with a Dense Mixture of Experts (MoE) hybrid transformer architecture and efficient training techniques. Arctic's hybrid transformer architecture combines a 10 billion parameter dense transformer model with a residual 128×3.66 billion parameter MoE MLP, for a total of roughly 480 billion parameters spread across 128 fine-grained experts, of which 17 billion are active for any given token, chosen using top-2 gating. This allows Snowflake Arctic to expand its capacity for enterprise intelligence through a large total parameter count, while keeping training and inference resource-efficient through a moderate number of active parameters.
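The parameter counts quoted above can be checked with a little arithmetic. The following is a minimal sketch that assumes only the figures given in this section (a 10 billion parameter dense component, 128 experts of 3.66 billion parameters each, and top-2 gating); the variable names are illustrative.

```python
# Figures taken from the architecture description, in billions of parameters.
DENSE_B = 10.0
EXPERT_B = 3.66
NUM_EXPERTS = 128
TOP_K = 2  # experts selected per token by top-2 gating

# Every expert is stored, so all of them count toward the total.
total_b = DENSE_B + NUM_EXPERTS * EXPERT_B

# Only the dense component and the selected experts are used for one token.
active_b = DENSE_B + TOP_K * EXPERT_B

print(f"total parameters:  about {total_b:.0f}B")
print(f"active parameters: about {active_b:.0f}B")
```

The total comes to about 478 billion, which the post rounds to 480 billion, and the active count comes to about 17 billion.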

Snowflake Arctic is trained on a three-phase data curriculum. The first phase focuses on general skills (1 trillion tokens, mostly web data), and the next two phases focus on enterprise skills (1.5 trillion and 1 trillion tokens respectively, with more code, SQL, and STEM data). This curriculum enables the Snowflake Arctic model to set a new baseline for cost-effective enterprise intelligence.

In addition to cost-efficient training, Snowflake Arctic also features numerous innovations and optimizations to run inference efficiently. At small batch sizes, inference is memory bandwidth constrained and Snowflake Arctic requires up to 4x fewer memory reads compared to other public models, resulting in better inference performance. At very large batch sizes, inference is compute constrained and Snowflake Arctic requires up to 4x less compute compared to other public models. Snowflake Arctic models are available under the Apache 2.0 license, which provides ungated access to weights and code. All data recipes and research insights are also provided to customers.

What is SageMaker JumpStart?

SageMaker JumpStart lets you choose from a broad selection of publicly available foundation models (FMs). ML practitioners can deploy FMs to dedicated Amazon SageMaker instances from a network-isolated environment and use SageMaker to customize model training and deployment. You can now discover and deploy Arctic Instruct models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. This gives you control over model performance and machine learning operations (MLOps) through SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, and container logs. The model is deployed in a secure AWS environment under your virtual private cloud (VPC) controls, which helps provide data security. Snowflake Arctic Instruct models are available today for deployment and inference in SageMaker Studio in the us-east-2 AWS Region, with availability in additional Regions planned.

Find a model

You can access FMs through SageMaker JumpStart in the SageMaker Studio UI and through the SageMaker Python SDK. This section shows how to discover models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface to access specialized tools for performing all ML development steps, from data preparation to building, training, and deploying ML models. For more information on how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.

SageMaker Studio gives you access to SageMaker JumpStart, which contains pre-trained models, notebooks, and pre-built solutions.

From the SageMaker JumpStart landing page, you can find different models by browsing different hubs named after the model providers. The Snowflake Arctic Instruct model is in the Hugging Face hub. If you don’t see the Arctic Instruct model, try shutting down and restarting SageMaker Studio to update your version. For more information, see Shutting Down and Updating the Studio Classic App.

You can also find the Snowflake Arctic Instruct model by searching for “Snowflake” in the search field.

When you select a model card, you can view details about the model, such as its license, the data used to train it, and how to use it. You will also find two options for deploying the model, a deployment notebook and a preview notebook, each of which deploys the model and creates an endpoint.

Deploying the model in SageMaker Studio

When you choose Deploy in SageMaker Studio, the deployment starts.

You are redirected to the endpoint details page, where you can monitor the progress of the deployment.

Deploying models through notebooks

Alternatively, you can choose Open Notebook to deploy the model through a sample notebook, which provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using the notebook, first select the appropriate model specified by model_id. You can then deploy one of the selected models to SageMaker using the following code:

from sagemaker.jumpstart.model import JumpStartModel
model = JumpStartModel(model_id = "huggingface-llm-snowflake-arctic-instruct-vllm")

predictor = model.deploy()

This will deploy the model in SageMaker with default settings, including default instance type and default VPC settings. You can change these settings by specifying non-default values in JumpStartModel. For more information, see the API documentation.
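As a rough illustration of overriding the defaults, the sketch below gathers non-default settings in a small helper and passes them to deploy, which accepts instance_type and initial_instance_count in the SageMaker Python SDK. The helper names are hypothetical, and any instance type you pass is an assumption to verify against the model card and your account quotas.

```python
from typing import Any, Dict

MODEL_ID = "huggingface-llm-snowflake-arctic-instruct-vllm"  # model ID used in this post


def build_deploy_kwargs(instance_type: str, instance_count: int = 1) -> Dict[str, Any]:
    """Collect non-default deployment settings in one place (hypothetical helper)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {"instance_type": instance_type, "initial_instance_count": instance_count}


def deploy_arctic(instance_type: str, instance_count: int = 1):
    """Deploy the model. Requires the sagemaker package and AWS credentials."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=MODEL_ID)
    return model.deploy(**build_deploy_kwargs(instance_type, instance_count))
```

Calling deploy_arctic with an instance type supported for this model returns a predictor that can be used in the same way as the one created above.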

Run inference

After you deploy the model, you can run inference against the deployed endpoint through the SageMaker predictor. Snowflake Arctic Instruct accepts the history of the chat between the user and the assistant and generates the next turn of the conversation.

predictor.predict(payload)

Inference parameters control the text generation process at the endpoint. The max_new_tokens parameter controls the size of the output generated by the model. This may not be the same as the number of words, because the model's vocabulary is not the same as the English language vocabulary. The temperature parameter controls the randomness of the output. A higher temperature results in more creative outputs that are also more prone to hallucination. All inference parameters are optional.

The model accepts formatted instructions where the conversation roles start with a prompt from the user and alternate between user instructions and the assistant. The format of the instructions must be strictly adhered to or the model will produce suboptimal output. The template that creates the prompts for the model is defined as follows:

<|im_start|>system
{system_message} <|im_end|>
<|im_start|>user
{human_message} <|im_end|>
<|im_start|>assistant\n

<|im_start|> and <|im_end|> are special tokens that mark the beginning of string (BOS) and end of string (EOS). The prompt can include multiple conversation turns between the system, the user, and the assistant, which lets you incorporate few-shot examples to improve the model’s responses.

The following code shows how to format the prompt programmatically:

from typing import Dict, List

def format_instructions(instructions: List[Dict[str, str]]) -> str:
    """Format system and user messages into a chat prompt that ends with an open assistant turn."""
    prompt: List[str] = []
    for instruction in instructions:
        if instruction["role"] == "system":
            prompt.extend(["<|im_start|>system\n", instruction["content"].strip(), "<|im_end|>\n"])
        elif instruction["role"] == "user":
            prompt.extend(["<|im_start|>user\n", instruction["content"].strip(), "<|im_end|>\n"])
        else:
            raise ValueError(f"Invalid role: {instruction['role']}. Role must be either 'user' or 'system'.")
    # Leave the assistant turn open so the model generates the reply.
    prompt.append("<|im_start|>assistant\n")
    return "".join(prompt)

def print_instructions(prompt: str, response: List[Dict[str, str]]) -> None:
    # ANSI escape codes that switch bold text on and off in the terminal.
    bold, unbold = '\033[1m', '\033[0m'
    print(f"{bold}> Input{unbold}\n{prompt}\n\n{bold}> Output{unbold}\n{response[0]['generated_text'].strip()}\n")
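The format_instructions helper above accepts only system and user messages, so it cannot express the multi-turn, few-shot prompts described earlier. The following sketch is one possible variant that also renders assistant turns; the function name format_chat and the example messages are illustrative, not part of the original notebook.

```python
from typing import Dict, List

ALLOWED_ROLES = ("system", "user", "assistant")


def format_chat(messages: List[Dict[str, str]]) -> str:
    """Render system/user/assistant messages and leave an open assistant turn."""
    parts: List[str] = []
    for message in messages:
        role = message["role"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"Invalid role: {role}")
        parts.append(f"<|im_start|>{role}\n{message['content'].strip()}<|im_end|>\n")
    # The trailing open turn asks the model to produce the next assistant reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# One worked example (user question plus assistant answer) before the real question.
few_shot = [
    {"role": "system", "content": "Classify the sentiment as positive or negative."},
    {"role": "user", "content": "The helmets arrived on time."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Several buckles were broken."},
]
print(format_chat(few_shot))
```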

The following sections provide sample prompts for various enterprise use cases.

Summarizing long texts

Snowflake Arctic Instruct allows you to perform custom tasks such as summarizing long text into a JSON formatted output. Through text generation, you can perform a variety of tasks such as text summarization, language translation, code generation, sentiment analysis, etc. The input payload to the endpoint looks like the following code:

payload = {
    "inputs": str,
    (optional) "parameters": {"max_new_tokens": int, "top_p": float, "temperature": float}
}
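Because every inference parameter is optional, it can be convenient to build the payload with a small helper that includes only the parameters you set. This is a minimal sketch of that idea; build_payload is a hypothetical name, and only the three parameters listed in the schema above are covered.

```python
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the request body, including only the parameters that were set."""
    candidates = {
        "max_new_tokens": max_new_tokens,
        "top_p": top_p,
        "temperature": temperature,
    }
    parameters = {name: value for name, value in candidates.items() if value is not None}

    payload: Dict[str, Any] = {"inputs": prompt}
    if parameters:
        # Omit the "parameters" key entirely when nothing was overridden.
        payload["parameters"] = parameters
    return payload


print(build_payload("Hello", max_new_tokens=64, temperature=0.7))
```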

The following are example prompts and the text generated by the model. Outputs are generated with the inference parameters {"max_new_tokens":512, "top_p":0.95, "temperature":0.7, "top_k":50}, except where an example payload sets different values.

The input is as follows:

instructions = [
    {
        "role": "user",
        "content": """Summarize this transcript in less than 200 words.
Put the product name, defect and summary in JSON format.

Transcript:

Customer: Hello

Agent: Hi there, I hope you're having a great day! To better assist you, could you please provide your first and last name and the company you are calling from?

Customer: Sure, my name is Jessica Turner and I'm calling from Mountain Ski Adventures.

Agent: Thanks, Jessica. What can I help you with today?

Customer: Well, we recently ordered a batch of XtremeX helmets, and upon inspection, we noticed that the buckles on several helmets are broken and won't secure the helmet properly.

Agent: I apologize for the inconvenience this has caused you. To confirm, is your order number 68910?

Customer: Yes, that's correct.

Agent: Thank you for confirming. I'm going to look into this issue and see what we can do to correct it. Would you prefer a refund or a replacement for the damaged helmets?

Customer: A replacement would be ideal, as we still need the helmets for our customers.

Agent: I understand. I will start the process to send out replacements for the damaged helmets as soon as possible. Can you please specify the quantity of helmets with broken buckles?

Customer: There are ten helmets with broken buckles in total.

Agent: Thank you for providing me with the quantity. We will expedite a new shipment of ten XtremeX helmets with functioning buckles to your location. You should expect them to arrive within 3-5 business days.

Customer: Thank you for your assistance, I appreciate it.

Agent: You're welcome, Jessica! If you have any other questions or concerns, please don't hesitate to contact us. Have a great day!
"""
    }
]

prompt = format_instructions(instructions)
inputs = {
    "inputs": prompt,
    "parameters": {
        "temperature": 0.1,
        "top_p": 0.95,
        "max_new_tokens": 512,
        "do_sample": False
    }
}
response = predictor.predict(inputs)
print_instructions(prompt, response)

You will get output similar to the following:

> Output
{
"product_name": "XtremeX helmets",
"defect": "broken buckles",
"summary": "Customer reports that several XtremeX helmets have broken buckles that won't secure the helmet properly. They prefer a replacement as they still need the helmets for their customers. Agent confirms the order number and will send out replacements for the damaged helmets within 3-5 business days."
}
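If a downstream system consumes this summary, the generated text needs to be parsed and checked before use, because the model may wrap the JSON in extra prose. The following is a minimal sketch under that assumption; parse_json_summary is a hypothetical helper, and the required keys mirror the fields shown in the output above.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = ("product_name", "defect", "summary")


def parse_json_summary(generated_text: str) -> Dict[str, Any]:
    """Parse the text between the first '{' and the last '}' as a JSON object."""
    start = generated_text.find("{")
    end = generated_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    summary = json.loads(generated_text[start : end + 1])

    # Fail early if the model left out a field that the prompt asked for.
    missing = [key for key in REQUIRED_KEYS if key not in summary]
    if missing:
        raise ValueError(f"missing keys in model output: {missing}")
    return summary


sample_output = (
    'Here is the summary:\n'
    '{"product_name": "XtremeX helmets", "defect": "broken buckles", '
    '"summary": "Ten helmets will be replaced."}'
)
print(parse_json_summary(sample_output)["defect"])
```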

Code Generation

Following the same pattern as the previous example, you can use a code generation prompt as follows:

instructions = [
    {
        "role": "user",
        "content": "Write a function in Python to write a json file:"
    }
]
prompt = format_instructions(instructions)
inputs = {
    "inputs": prompt,
    "parameters": {
        "temperature": 0.1,
        "top_p": 0.95,
        "max_new_tokens": 400,
        "do_sample": False
    }
}
response = predictor.predict(inputs)
print_instructions(prompt, response)

The preceding code uses Snowflake Arctic Instruct to generate a Python function that writes a JSON file. It defines the input prompt “Write a function in Python to write a json file:” and a payload dictionary with parameters that control the generation process, such as the maximum number of tokens to generate and whether to enable sampling. It sends this payload to the SageMaker predictor, receives the generated text response, and prints it to the console.

The output is:

> Input
<|im_start|>user
Write a function in Python to write a json file:<|im_end|>
<|im_start|>assistant

> Output
Here is a simple function in Python that writes a Python dictionary to a JSON file:

```python
import json

def write_json(data, filename):
    with open(filename, 'w') as f:
        json.dump(data, f)
```

You can use this function to write a Python dictionary to a file like this:

```python
data = {
"name": "John",
"age": 30,
"city": "New York"
}

write_json(data, 'output.json')
```

This creates a file named `output.json` in the same directory as your Python script and writes the `data` dictionary to it in JSON format.

The generated code defines write_json, which takes a Python object and a file name and writes the object to the file as JSON data. The output also includes a usage example that writes a sample dictionary to output.json, demonstrating the model’s natural language processing and code generation capabilities.
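Generated code is worth testing before you rely on it. As one possible check, the sketch below copies the write_json function from the model output and confirms that data written to a temporary file reads back unchanged; the round_trip helper is illustrative.

```python
import json
import os
import tempfile


def write_json(data, filename):
    # Function body as generated by the model in the output above.
    with open(filename, "w") as f:
        json.dump(data, f)


def round_trip(data):
    """Write data with write_json, then read it back from a temporary file."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "output.json")
        write_json(data, path)
        with open(path) as f:
            return json.load(f)


sample = {"name": "John", "age": 30, "city": "New York"}
assert round_trip(sample) == sample
print("write_json round trip succeeded")
```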

Mathematics and Reasoning

Snowflake Arctic Instruct also shows strength in mathematical reasoning. You can test it with the following prompt:

instructions = [
    {
        "role": "user",
        "content": "I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering."
    }
]
prompt = format_instructions(instructions)
inputs = {
    "inputs": prompt,
    "parameters": {
        "temperature": 0.6,
        "top_p": 0.9,
        "max_new_tokens": 200,
        "do_sample": False
    }
}
response = predictor.predict(inputs)
print_instructions(prompt, response)

The output is:

> Input
<|im_start|>user
I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering.<|im_end|>
<|im_start|>assistant

> Output
To solve this problem, we first need to calculate the total cost of the ice cream cones and then subtract that amount from the $10 bill to find out how much change was received.

Step 1: Calculate the total cost of the ice cream cones
There are 6 kids, and each cone costs $1.25. So, the total cost is:
6 kids * $1.25/cone = $7.50

Step 2: Calculate the change received
You paid with a $10 bill, so to find out how much change you got back, subtract the total cost from the $10 bill:
$10 - $7.50 = $2.50

So, you received $2.50 in change.

The preceding output demonstrates Snowflake Arctic’s ability to understand natural language prompts that involve mathematical reasoning, break them down into logical steps, and generate human-like explanations and solutions.
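The model's answer is easy to verify independently. The short check below repeats the two steps with Python's decimal module, which avoids binary floating point rounding when working with currency; the variable names are illustrative.

```python
from decimal import Decimal

cones = 6
price_per_cone = Decimal("1.25")
paid = Decimal("10")

total_cost = cones * price_per_cone  # step 1: total cost of the cones
change = paid - total_cost           # step 2: change from the $10 bill

print(f"total cost: ${total_cost}, change: ${change}")
# prints: total cost: $7.50, change: $2.50
```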

SQL Generation

Thanks to its enterprise-focused training, the Snowflake Arctic Instruct model is also adept at generating SQL queries from natural language prompts. Test this capability with the following prompt:

question = "Show the average price by cut and sort the results by average price in descending order"
context = """
Here is the table name <tableName> ML_HOL_DB.ML_HOL_SCHEMA.DIAMONDS </tableName>

<tableDescription> This table has data on diamond sales from our favorite diamond dealer. </tableDescription>

Here are the columns of the ML_HOL_DB.ML_HOL_SCHEMA.DIAMONDS

<columns>\n\n CARAT, CUT, COLOR, CLARITY, DEPTH, TABLE_PCT, PRICE, X, Y, Z \n\n</columns>
"""
instructions = [
    {
        "role": "user",
        "content": """You will be acting as an AI Snowflake SQL Expert named Snowflake Cortex Assistant.
Your goal is to give correct, executable sql query to users.
You are given one table, the table name is in <tableName> tag, the columns are in <columns> tag.
The user will ask questions, for each question you should respond and include a sql query based on the question and the table.

{context}

Here are 7 critical rules for the interaction you must abide:
<rules>
1. You MUST MUST wrap the generated sql code within ``` sql code markdown in this format e.g
```sql
(select 1) union (select 2)
```
2. If I don't tell you to find a limited set of results in the sql query or question, you MUST limit the number of responses to 10.
3. Text / string where clauses must be fuzzy match e.g ilike %keyword%
4. Make sure to generate a single snowflake sql code, not multiple.
5. YOU SHOULD USE ONLY THE COLUMN NAMES IN <COLUMNS>, AND THE TABLE GIVEN IN <TABLENAME>.
6. DO NOT put numerical at the very front of sql variable.
7. BE CONCISE. DO NOT SHOW ANY TEXT AFTER THE SQL QUERY! ONLY SHOW THE SQL QUERY AND NOTHING ELSE!
</rules>

Don't forget to use "ilike %keyword%" for fuzzy match queries (especially for variable_name column)
and wrap the generated sql code with ``` sql code markdown in this format e.g:
```sql
(select 1) union (select 2)
```

For each question from the user, make sure to include a SQL QUERY in your response.

Question: {question}

Answer: the most important piece of information is the SQL QUERY. BE CONCISE AND JUST SHOW THE SQL QUERY. DO NOT SHOW ANY TEXT AFTER THE SQL QUERY!')) as response
""".format(context=context, question=question)
    }
]

prompt = format_instructions(instructions)
inputs = {
    "inputs": prompt,
    "parameters": {
        "temperature": 0.1,
        "top_p": 0.95,
        "max_new_tokens": 512,
        "do_sample": False
    }
}
response = predictor.predict(inputs)
print_instructions(prompt, response)

The output is:

> Output
SELECT CUT, AVG(PRICE) as AVG_PRICE FROM ML_HOL_DB.ML_HOL_SCHEMA.DIAMONDS 
GROUP BY CUT ORDER BY AVG_PRICE DESC LIMIT 10;

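The output shows that Snowflake Arctic Instruct inferred the relevant columns (CUT and PRICE) from the table description and produced a single-table query that groups by cut, averages the price, sorts by the average in descending order, and applies the 10-row limit required by the rules in the prompt.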
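The prompt asks the model to wrap the query in a sql code fence, but the response may or may not include one. Before passing the query to a database client, you can normalize both cases with a small helper. The following is a sketch only; extract_sql is a hypothetical name, and it handles just the first fenced block in the text.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep this example easy to embed

# Captures the body of a fenced block, with or without a "sql" language tag.
FENCE_RE = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_sql(generated_text: str) -> str:
    """Return the SQL inside the first fenced block, or the whole text if unfenced."""
    match = FENCE_RE.search(generated_text)
    sql = match.group(1) if match else generated_text
    return sql.strip()


fenced = FENCE + "sql\nSELECT CUT, AVG(PRICE) AS AVG_PRICE FROM DIAMONDS GROUP BY CUT;\n" + FENCE
print(extract_sql(fenced))
```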

Clean up

Once you’ve finished running the notebook, delete all the resources created in the process and stop incurring charges. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

If you deployed the endpoint from the SageMaker Studio console, you can delete it by choosing Delete on the endpoint details page.
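To reduce the chance of leaving an endpoint running after an error, the two cleanup calls can be wrapped in a context manager. This is a minimal sketch, assuming only that the object passed in exposes the delete_model and delete_endpoint methods used above; cleaned_up is a hypothetical name.

```python
from contextlib import contextmanager


@contextmanager
def cleaned_up(predictor):
    """Yield the predictor, then delete its model and endpoint even if an error occurs."""
    try:
        yield predictor
    finally:
        try:
            # Same order as the cleanup code above: model first, then endpoint.
            predictor.delete_model()
        finally:
            predictor.delete_endpoint()
```

With this in place, inference calls made inside a with cleaned_up(predictor) block are followed by cleanup whether or not they raise an exception.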

Conclusion

This post showed you how to get started with the Snowflake Arctic Instruct model in SageMaker Studio and provided sample prompts for multiple enterprise use cases. Because FMs are pre-trained, they can help lower training and infrastructure costs, and they can also be customized for your use case. Check out SageMaker JumpStart in SageMaker Studio to get started today.

About the Authors

Natarajan Chennimalai Kumar – Principal Solutions Architect, 3P Model Provider, AWS
Pavan Kumar Rao Navule – AWS Solutions Architect
Nidhi Gupta – Senior Partner Solutions Architect, AWS
Bosco Albuquerque – Senior Partner Solutions Architect, AWS
Matt Marzillo – Senior Partner Engineer, Snowflake
Nithin Vijeaswaran – AWS Solutions Architect
Armando Diaz – AWS Solutions Architect
Supriya Puragundla – Senior Solutions Architect, AWS
Jin Tan Ruan – AWS Prototyping Developer
