This post was co-authored by Matt Marzillo of Snowflake.
Today, we are announcing that the Snowflake Arctic Instruct model is available through Amazon SageMaker JumpStart for deployment and inference. Snowflake Arctic is a family of enterprise-grade large language models (LLMs) built by Snowflake to serve the needs of enterprise users. It excels at SQL querying, coding, and accurately following instructions (as shown in the following benchmarks). SageMaker JumpStart is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML.
This post explains how to use SageMaker JumpStart to discover and deploy the Snowflake Arctic Instruct model, and provides an example use case with concrete prompts.
What is Snowflake Arctic?
Snowflake Arctic is an enterprise-focused LLM that delivers top-tier enterprise intelligence among open LLMs at a highly competitive cost. Snowflake achieves this with a Dense Mixture of Experts (MoE) hybrid transformer architecture and efficient training techniques. Arctic combines a 10 billion parameter dense transformer model with a residual 128×3.66 billion parameter MoE MLP, for a total of 480 billion parameters spread across 128 fine-grained experts, of which 17 billion are active, selected using top-2 gating. This allows Snowflake Arctic to expand its capacity for enterprise intelligence with a high total parameter count, while remaining resource-efficient for training and inference thanks to a moderate number of active parameters.
Snowflake Arctic is trained on a three-phase data curriculum. The first phase focuses on general skills (1 trillion tokens, mostly web data), and the next two phases focus on enterprise skills (1.5 trillion tokens and 1 trillion tokens respectively, with more code, SQL, and STEM data). This curriculum enables the Snowflake Arctic model to set a new baseline for cost-effective enterprise intelligence.
In addition to cost-efficient training, Snowflake Arctic also features numerous innovations and optimizations to run inference efficiently. At small batch sizes, inference is memory bandwidth constrained and Snowflake Arctic requires up to 4x fewer memory reads compared to other public models, resulting in better inference performance. At very large batch sizes, inference is compute constrained and Snowflake Arctic requires up to 4x less compute compared to other public models. Snowflake Arctic models are available under the Apache 2.0 license, which provides ungated access to weights and code. All data recipes and research insights are also provided to customers.
What is SageMaker JumpStart?
SageMaker JumpStart lets you choose from a broad selection of publicly available foundation models (FMs). ML practitioners can deploy FMs to dedicated Amazon SageMaker instances from a network-isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Arctic Instruct models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. This gives you model performance and machine learning operations (MLOps) controls through SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, and container logs. The model is deployed in a secure AWS environment and under your virtual private cloud (VPC) controls, helping ensure data security. Snowflake Arctic Instruct models are available today for deployment and inference in SageMaker Studio in the us-east-2 AWS Region, with availability in additional Regions planned for the future.
Find a model
You can access the FMs through SageMaker JumpStart in the SageMaker Studio UI and through the SageMaker Python SDK. In this section, we show you how to discover models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface to access specialized tools for performing all ML development steps, from data preparation to building, training, and deploying ML models. For more information on how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.
In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.
From the SageMaker JumpStart landing page, you can find different models by browsing different hubs named after the model providers. The Snowflake Arctic Instruct model is in the Hugging Face hub. If you don’t see the Arctic Instruct model, try shutting down and restarting SageMaker Studio to update your version. For more information, see Shutting Down and Updating the Studio Classic App.
You can also find the Snowflake Arctic Instruct model by searching for “Snowflake” in the search field.
When you choose a model card, you can view details about the model, such as the license, the data used to train it, and how to use the model. You will also find two options to deploy the model, Deploy and Preview notebooks, which deploy the model and create an endpoint for you.
Deploying the model in SageMaker Studio
Select “Deploy” in SageMaker Studio and the deployment will begin.
You can monitor the progress of the deployment on the endpoint details page that you are redirected to.
Deploying models through notebooks
Alternatively, you can choose Open notebook to deploy the model through the example notebook, which provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using the notebook, first select the appropriate model specified by model_id. You can then deploy one of the selected models to SageMaker using the following code:
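The original listing is not reproduced here. As a minimal sketch of this step, assuming the SageMaker Python SDK is installed and AWS credentials are configured: the helper name `deploy_arctic` and the `model_cls` argument are hypothetical (the latter only lets the helper be exercised without calling AWS), and the model ID must be copied from the model card rather than guessed.

```python
def deploy_arctic(model_id, instance_type=None, model_cls=None):
    """Deploy a JumpStart model and return the predictor for its endpoint.

    model_id      -- the JumpStart model ID shown on the model card
    instance_type -- optional override; None keeps the JumpStart default
    model_cls     -- hypothetical hook for substituting a stand-in class
    """
    if model_cls is None:
        # Imported lazily so the helper can be loaded without the SDK.
        from sagemaker.jumpstart.model import JumpStartModel
        model_cls = JumpStartModel

    kwargs = {"model_id": model_id}
    if instance_type is not None:
        kwargs["instance_type"] = instance_type

    model = model_cls(**kwargs)
    # deploy() creates the endpoint and returns a predictor bound to it.
    return model.deploy()


# Usage (creates billable resources, so it is left commented out):
# predictor = deploy_arctic("<model-id-from-the-model-card>")
```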
This will deploy the model in SageMaker with default settings, including default instance type and default VPC settings. You can change these settings by specifying non-default values in JumpStartModel. For more information, see the API documentation.
Run inference
After deploying the model, you can run inference against the deployed endpoint through the SageMaker predictor. Snowflake Arctic Instruct accepts the chat history between the user and the assistant and generates the next turn of the conversation.
predictor.predict(payload)
Inference parameters control the text generation process at the endpoint. The max_new_tokens parameter controls the size of the output generated by the model. This may not be the same as the number of words, because the model's vocabulary is not the same as the English language vocabulary. The temperature parameter controls the randomness of the output. Higher temperatures result in more creative outputs that are also more prone to hallucination. All inference parameters are optional.
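As a minimal sketch of how these parameters could be attached to a request: the parameter names below are the ones quoted later in this post, while the `inputs`/`parameters` envelope and the helper name `build_payload` are assumptions, so check the example notebook for the exact schema the endpoint expects. Because every parameter is optional, unset values are left out of the payload.

```python
import json


def build_payload(prompt, **params):
    """Build a request body, omitting any inference parameter set to None.

    The "inputs"/"parameters" keys are an assumed request shape, not one
    confirmed by this post.
    """
    parameters = {name: value for name, value in params.items() if value is not None}
    payload = {"inputs": prompt}
    if parameters:
        payload["parameters"] = parameters
    return payload


payload = build_payload(
    "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n",
    max_new_tokens=512,   # upper bound on generated tokens, not words
    temperature=0.7,      # higher values give more random output
    top_p=0.95,
    top_k=50,
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary is what would be passed to predictor.predict(payload).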
The model accepts formatted instructions where the conversation roles start with a prompt from the user and alternate between user instructions and the assistant. The format of the instructions must be strictly adhered to or the model will produce suboptimal output. The template that creates the prompts for the model is defined as follows:
<|im_start|>system
{system_message} <|im_end|>
<|im_start|>user
{human_message} <|im_end|>
<|im_start|>assistant\n
<|im_start|> and <|im_end|> are special tokens that mark the beginning of string (BOS) and end of string (EOS). The prompt can contain multiple conversation turns between the system, the user, and the assistant, which lets you include few-shot examples to improve the model's responses.
The following code shows how a prompt looks in this instruction format:
<|im_start|>user\n5x + 35 = 7x -60 + 10. Solve for x<|im_end|>\n<|im_start|>assistant\n
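The template above can also be applied programmatically. The following is a minimal sketch; the helper name `format_chat_prompt` is hypothetical, and the spacing follows the single-turn example string above (no space before `<|im_end|>`). It joins each turn with the special tokens and then opens a final assistant turn for the model to complete.

```python
VALID_ROLES = ("system", "user", "assistant")


def format_chat_prompt(messages):
    """Render (role, content) pairs into the chat template.

    messages -- list of (role, content) tuples in conversation order
    """
    parts = []
    for role, content in messages:
        if role not in VALID_ROLES:
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chat_prompt([("user", "5x + 35 = 7x -60 + 10. Solve for x")])
print(repr(prompt))
```

Few-shot examples can be supplied by adding earlier user and assistant pairs to the list before the final user message.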
The following sections provide sample prompts for various enterprise use cases.
Summarizing long texts
Snowflake Arctic Instruct allows you to perform custom tasks such as summarizing long text into a JSON formatted output. Through text generation, you can perform a variety of tasks such as text summarization, language translation, code generation, sentiment analysis, etc. The input payload to the endpoint looks like the following code:
Below are example prompts and the text generated by the model. All outputs are generated with the inference parameters {"max_new_tokens":512, "top_p":0.95, "temperature":0.7, "top_k":50}.
The input is as follows:
You will get output similar to the following:
Code Generation
Using the previous example, you can use the code generation prompt as follows:
The code above uses Snowflake Arctic Instruct to generate a Python function that writes a JSON file. It defines the input prompt “Write a function in Python to write a json file:” and a payload dictionary containing parameters that control the generation process, such as the maximum number of tokens to generate and whether sampling is enabled. It then sends this payload to the predictor (the deployed SageMaker endpoint), receives the generated text response, and prints it to the console. The output is a Python function that writes the JSON file requested in the prompt.
The output is:
This creates a file named `output.json` in the same directory as your Python script and writes the `data` dictionary to it in JSON format.
The output from code generation defines write_json, which takes a filename and a Python object and writes the object as JSON data. The output displays the expected JSON file contents, demonstrating the natural language processing and code generation capabilities of the model.
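The generated text itself is not reproduced here. For orientation only, a hand-written function matching the description above (not the model's verbatim output) might look like the following sketch; the indentation width and UTF-8 encoding are assumptions.

```python
import json


def write_json(filename, data):
    """Serialize a Python object as JSON and write it to filename."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4)


# Usage, mirroring the description above (writes to the current directory):
# data = {"name": "example", "value": 1}
# write_json("output.json", data)
```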
Mathematics and Reasoning
Snowflake Arctic Instruct also shows strength in mathematical reasoning. Test it using the following prompts:
The output is:
The code above demonstrates Snowflake Arctic’s ability to understand natural language prompts, including mathematical reasoning, break them down into logical steps, and generate human-like explanations and solutions.
SQL Generation
The Snowflake Arctic Instruct model is also adept at generating SQL queries from natural language prompts, thanks to its enterprise intelligence training. Test its capabilities with the following prompts:
The output is:
The output shows that Snowflake Arctic Instruct has inferred certain fields of interest in the tables and provided a somewhat complex query that joins two tables to get the desired results.
Clean up
Once you've finished running the notebook, delete all the resources you created in the process to stop incurring charges. Use the following code:
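The original listing is not reproduced here. A minimal sketch of the cleanup step, assuming `predictor` is the object returned when the model was deployed with the SageMaker Python SDK; the wrapper name `cleanup` is hypothetical.

```python
def cleanup(predictor):
    """Delete the model and endpoint behind a SageMaker predictor."""
    # Model first, then endpoint: the order used in JumpStart example notebooks.
    predictor.delete_model()
    predictor.delete_endpoint()


# Usage (left commented out because it deletes live resources):
# cleanup(predictor)
```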
If you deploy an endpoint from the SageMaker Studio console, you can delete it by selecting Delete on the endpoint details page.
Conclusion
This post showed you how to get started with the Snowflake Arctic Instruct model in SageMaker Studio and provided sample prompts for multiple enterprise use cases. Because FMs are pre-trained, they can help reduce training and infrastructure costs, and they can be customized for your use case. Check out SageMaker JumpStart in SageMaker Studio to get started today. For more information, see the following resources:
About the Authors
Natarajan Chennimalai Kumar – Principal Solutions Architect, 3P Model Provider, AWS
Pavan Kumar Rao Navule – AWS Solutions Architect
Nidhi Gupta – Senior Partner Solutions Architect, AWS
Bosco Albuquerque – Senior Partner Solutions Architect, AWS
Matt Marzillo – Senior Partner Engineer, Snowflake
Nithin Vijeaswaran – AWS Solutions Architect
Armando Diaz – AWS Solutions Architect
Supriya Puragundla – Senior Solutions Architect, AWS
Jin Tan Ruan – AWS Prototyping Developer