Qwen 2.5 is a family of multilingual large language models (LLMs): a collection of pre-trained and instruction-tuned generative models in 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B sizes (text in/text out). The Qwen 2.5 instruction-tuned, text-only models are optimized for multilingual dialogue use cases and outperform both the previous generation of Qwen models and many publicly available chat models on common industry benchmarks.
Qwen 2.5 is an auto-regressive language model that uses an optimized transformer architecture at its core. The Qwen 2.5 collection supports more than 29 languages and offers improved role-play capabilities and condition-setting for chatbots.
This post outlines how to deploy the Qwen 2.5 family of models on Amazon Elastic Compute Cloud (Amazon EC2) and Amazon SageMaker using Hugging Face Text Generation Inference (TGI) containers. The Qwen2.5 Coder and Qwen2.5 Math variants are also supported.
Preparation
Hugging Face offers two tools that are frequently used with AWS Inferentia and AWS Trainium: Text Generation Inference (TGI) containers, which support the deployment and serving of LLMs, and the Optimum Neuron library, which serves as an interface between the Transformers library and the Inferentia and Trainium chips.
The first time a model is run on Inferentia or Trainium, it must be compiled to produce a version that performs best on Inferentia and Trainium chips. The Hugging Face Optimum Neuron library, together with the Optimum Neuron cache, transparently supplies a compiled model when one is available. If you are using a different model with the Qwen 2.5 architecture, you may need to compile the model before deploying. For more information, see Compiling a model for Inferentia or Trainium.
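If you do need to compile, the following is a minimal sketch of the Optimum Neuron export step; the batch size, sequence length, core count, and data type shown here are representative values you would tune for your model and instance:

# Compile the model for Neuron (run this on an Inferentia or Trainium instance)
optimum-cli export neuron \
    --model Qwen/Qwen2.5-7B-Instruct \
    --batch_size 4 \
    --sequence_length 4096 \
    --num_cores 2 \
    --auto_cast_type bf16 \
    qwen2.5-7b-instruct-neuron/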
You can deploy TGI as a Docker container on an Inferentia or Trainium EC2 instance, or to an Amazon SageMaker endpoint.
Option 1: Deploy TGI on Amazon EC2 Inf2
In this example, we deploy Qwen2.5-7B-Instruct to an inf2.xlarge instance. (See this article for detailed instructions on how to deploy an instance using the Hugging Face DLAMI.)
This option uses SSH to connect to the instance and create a .env file (which defines constants and specifies where the model is cached) and a file named docker-compose.yaml (which defines all the environment parameters the model needs to be deployed for inference). For this use case, you can copy the following files:
- Create a .env file with the following content:
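The following is a minimal sketch of the .env file, assuming the Qwen2.5-7B-Instruct model from this example; the cast type, batch size, and token limits are representative values to tune for your workload:

# Model to deploy and its serving limits (tune for your workload)
MODEL_ID='Qwen/Qwen2.5-7B-Instruct'
HF_AUTO_CAST_TYPE='bf16'
MAX_BATCH_SIZE=4
MAX_INPUT_TOKENS=4000
MAX_TOTAL_TOKENS=4096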
- Create a file named docker-compose.yaml with the following content:
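A minimal docker-compose.yaml sketch follows, assuming the public Hugging Face neuronx-tgi image and the single Neuron device found on an inf2.xlarge (larger instances expose additional /dev/neuronN devices):

services:
  tgi:
    # Hugging Face TGI image built for AWS Neuron (Inferentia/Trainium)
    image: ghcr.io/huggingface/neuronx-tgi:latest
    ports:
      - "8080:80"
    environment:
      # Values are read from the .env file created above
      - PORT=80
      - MODEL_ID=${MODEL_ID}
      - HF_AUTO_CAST_TYPE=${HF_AUTO_CAST_TYPE}
      - MAX_BATCH_SIZE=${MAX_BATCH_SIZE}
      - MAX_INPUT_TOKENS=${MAX_INPUT_TOKENS}
      - MAX_TOTAL_TOKENS=${MAX_TOTAL_TOKENS}
    devices:
      # inf2.xlarge exposes a single Neuron device
      - "/dev/neuron0"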
- Deploy the model using Docker Compose:
docker compose -f docker-compose.yaml --env-file .env up
- To verify that the model deployed correctly, send a test prompt to the model:
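For example, assuming TGI is listening on port 8080 as mapped in the Compose file above, a test request against TGI's generate endpoint looks like this:

# Send a simple English test prompt to the local TGI server
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"What is the capital of France?","parameters":{"max_new_tokens":64}}'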
- To verify that the model can respond in multiple languages, try sending a prompt in Chinese:
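The prompt below, which asks "What is the capital of France?" in Chinese, is one illustrative option:

# Send the same question in Chinese to confirm multilingual output
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"法国的首都是哪个城市？","parameters":{"max_new_tokens":64}}'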
Option 2: Deploy TGI on Amazon SageMaker
You can also use the Hugging Face Optimum Neuron library to quickly deploy models from Amazon SageMaker, following the instructions on the Hugging Face Model Hub.
- From the Qwen 2.5 model card on the Hugging Face Model Hub, choose Deploy, then SageMaker, and finally AWS Inferentia & Trainium.
- Copy the sample code into your SageMaker notebook, then choose Run.
- The copied notebook will look similar to the following:
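The sketch below follows the pattern of the sample code Hugging Face generates for Inferentia-backed SageMaker endpoints; the container version, core count, token limits, and execution role name are representative assumptions, so copy the exact values from the model card:

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Resolve the SageMaker execution role (the role name here is an
# assumption; use the role configured in your own account)
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

# Model configuration passed to the TGI Neuron container
# (representative values; copy the exact ones from the model card)
hub = {
    "HF_MODEL_ID": "Qwen/Qwen2.5-7B-Instruct",
    "HF_NUM_CORES": "2",
    "HF_AUTO_CAST_TYPE": "bf16",
    "MAX_BATCH_SIZE": "4",
    "MAX_INPUT_TOKENS": "3686",
    "MAX_TOTAL_TOKENS": "4096",
}

# TGI container image built for AWS Neuron (version is an assumption)
image_uri = get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.25")

huggingface_model = HuggingFaceModel(image_uri=image_uri, env=hub, role=role)

# Deploy to a real-time endpoint on an Inferentia2 instance; the long
# health check timeout allows for model compilation or download
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
    container_startup_health_check_timeout=1800,
    volume_size=512,
)

# Send a test request to the endpoint
print(predictor.predict({"inputs": "What is the capital of France?"}))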
Clean up
Terminate your EC2 instance and delete your SageMaker endpoint to avoid ongoing costs.
Terminate the EC2 instance through the AWS Management Console.
Delete the SageMaker endpoint using the console or the following commands:
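Assuming the predictor object from the deployment notebook above is still in scope, the SageMaker Python SDK can remove both the model and the endpoint:

# Delete the model and the endpoint to stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()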
Conclusion
AWS Trainium and AWS Inferentia offer high performance and low cost for deploying Qwen 2.5 models. We look forward to seeing how you build differentiated AI applications using these powerful models and purpose-built AI infrastructure. For more information on how to get started with AWS AI chips, see the AWS Neuron documentation.
About the Authors
Jim Burtoft is a Senior Startup Solutions Architect at AWS and works directly with startups and with the Hugging Face team. Jim is a CISSP, part of the AWS AI/ML Technical Field Community and the Neuron Data Science community, and works with the open source community to enable the use of Inferentia and Trainium. Jim holds a bachelor’s degree in mathematics from Carnegie Mellon University and a master’s degree in economics from the University of Virginia.
Miriam Lebowitz is a Solutions Architect focused on empowering early-stage startups at AWS. She leverages her experience with AI/ML to guide companies in selecting and implementing the technologies that fit their business goals, setting them up for scalable growth and innovation in a competitive startup world.
Rhia Soni is a Startup Solutions Architect at AWS. Rhia specializes in working with early-stage startups and helps customers adopt Inferentia and Trainium. Rhia is also part of the AWS Analytics Technical Field Community and is a subject matter expert in generative BI. Rhia holds a Bachelor of Arts in Information Science from the University of Maryland.
Paul Eat is a Senior Solutions Architect Manager focused on startups at AWS. Paul built a team of AWS Startup Solutions Architects focused on the adoption of Inferentia and Trainium. Paul holds a bachelor’s degree in computer science from Siena College and has multiple cybersecurity certifications.