Generative artificial intelligence (AI) foundation models (FMs) are gaining popularity with enterprises because of their versatility and their potential to address a variety of use cases. The true value of FMs is realized when they are adapted to domain-specific data. Managing these models across the business and model lifecycle can become complex, and as FMs are adapted to different domains and data, operationalizing these pipelines becomes important.
Amazon SageMaker is a fully managed service for building, training, and deploying machine learning (ML) models and is seeing increasing adoption for customizing and deploying FMs that power generative AI applications. SageMaker provides rich capabilities for building automated workflows for deploying models at scale. One of the key features that enables operational excellence around model management is the model registry. The model registry helps in cataloging and managing model versions, facilitating collaboration and governance. Once a model is trained and its performance is evaluated, it can be stored in the model registry for model management.
Amazon SageMaker has released a new capability in the Model Registry that makes it easier to version and catalog FMs. Customers can use SageMaker to train or tune FMs, including Amazon SageMaker JumpStart and Amazon Bedrock models, and manage these models within the Model Registry. As customers extend their generative AI applications across different use cases, such as fine-tuning for domain-specific tasks, the number of models can grow rapidly. To track models, versions, and associated metadata, you can use the SageMaker Model Registry as an inventory of your models.
In this post, we discuss new features in the Model Registry that streamline FM management by allowing you to register unpacked model artifacts and pass the End User License Agreement (EULA) acceptance flag without user intervention.
Overview
The Model Registry worked well for traditional models, which were small in size. For FMs, the challenges were their large size and the need for user interaction to accept the EULA. With the new features in the Model Registry, fine-tuned FMs can now be easily registered in the Model Registry and deployed for real-world use.
A typical model development lifecycle is an iterative process, with many cycles of experimentation to achieve the expected performance from a model. Once trained, these models are entered into a model registry and cataloged as versions. Models can be organized into groups, versions can be compared based on quality metrics, and models have an associated approval status that indicates whether they can be deployed.
Once models are manually approved, you can trigger a continuous integration and continuous deployment (CI/CD) pipeline to deploy these models into production. Optionally, you can also use the Model Registry as a repository of models that have been approved for use in your enterprise. Various teams can then deploy these approved models from the Model Registry and build their applications around them.
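For example, flipping a model version's approval status can serve as the trigger for such a pipeline. The following is a minimal sketch with the AWS SDK for Python (Boto3), using a hypothetical model package ARN:

```python
import boto3

sm_client = boto3.client("sagemaker")

# Approve a registered model version; an EventBridge rule watching for this
# status change could then start the CI/CD deployment pipeline.
sm_client.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:us-east-1:111122223333:model-package/fine-tuned-llm/1",  # hypothetical ARN
    ModelApprovalStatus="Approved",
    ApprovalDescription="Passed evaluation; approved for production deployment",
)
```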
An example workflow, shown in the following diagram, can follow these steps:
- Select the SageMaker JumpStart model and register it in the Model Registry.
- Alternatively, you can fine-tune the SageMaker JumpStart model.
- Evaluate your model using SageMaker model evaluation, which allows for optional human evaluation.
- Create a model group in the model registry. For each run, create a model version. Add the model group to one or more model registry collections, which you can use to group related registered models. For example, you can create one collection of large language models (LLMs) and another of diffusion models. A minimal code sketch for creating a model group follows the figure below.
- Deploy the model as a SageMaker inference endpoint that can be used in generative AI applications.
Figure 1: Model Registry workflow for foundational models
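To illustrate the model group step above, here is a minimal Boto3 sketch that creates a group to hold the model versions; the group name and description are hypothetical placeholders:

```python
import boto3

sm_client = boto3.client("sagemaker")

# Create a model group that will hold every version of a fine-tuned FM.
sm_client.create_model_package_group(
    ModelPackageGroupName="fine-tuned-llm",  # hypothetical group name
    ModelPackageGroupDescription="Versions of our domain-adapted LLM",
)
```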
To better support generative AI applications, the Model Registry has released two new features: ModelDataSource and Source Model URIs. The following sections describe these features and how to use them.
ModelDataSource speeds up deployment and provides access to EULA-dependent models
Previously, when models were registered in the Model Registry, the model artifacts had to be stored in a compressed format alongside the inference code. This was a challenge for generative AI applications built on very large FMs with billions of parameters: unpacking these compressed models at runtime took a long time, which increased the startup latency of SageMaker endpoints. The model_data_source parameter can now accept the location of unpacked model artifacts in Amazon Simple Storage Service (Amazon S3), which simplifies the registration process, eliminates the need to unpack model weights on the endpoint, and reduces endpoint startup latency.
Additionally, certain public JumpStart models, such as Meta's Llama 2, require you to accept a EULA before you can use them. Previously, if you fine-tuned such a public model from SageMaker JumpStart, you could not save it to the Model Registry because deployment required a user to interactively accept the license agreement. The new model_data_source parameter enables the registration of these models, allowing customers to catalog a wider variety of FMs in the Model Registry, version them, and associate metadata such as training metrics.
You can use the AWS SDK to register an unpacked model stored in Amazon S3.
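The following is a minimal sketch with Boto3; the model group name, S3 prefix, and container image URI are hypothetical placeholders:

```python
import boto3

sm_client = boto3.client("sagemaker")

# Register a model version whose artifacts live uncompressed under an S3 prefix.
# CompressionType "None" tells SageMaker the artifacts are already unpacked,
# so the endpoint does not have to decompress them at startup.
sm_client.create_model_package(
    ModelPackageGroupName="fine-tuned-llm",  # hypothetical model group
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<inference-container-image-uri>",  # placeholder ECR image
                "ModelDataSource": {
                    "S3DataSource": {
                        "S3Uri": "s3://my-bucket/models/fine-tuned-llm/",  # hypothetical prefix
                        "S3DataType": "S3Prefix",
                        "CompressionType": "None",
                    }
                },
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)
```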
Similarly, you can register a model that requires a EULA.
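Here is a sketch of the same call for a EULA-gated model, such as a fine-tuned Llama 2 from SageMaker JumpStart; setting AcceptEula in ModelAccessConfig records the license acceptance so no interactive step is needed at deployment time (names are hypothetical):

```python
import boto3

sm_client = boto3.client("sagemaker")

# Register a EULA-gated model; AcceptEula=True passes the license acceptance
# flag so the model can later be deployed without user intervention.
sm_client.create_model_package(
    ModelPackageGroupName="fine-tuned-llama2",  # hypothetical model group
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<inference-container-image-uri>",  # placeholder ECR image
                "ModelDataSource": {
                    "S3DataSource": {
                        "S3Uri": "s3://my-bucket/models/fine-tuned-llama2/",  # hypothetical prefix
                        "S3DataType": "S3Prefix",
                        "CompressionType": "None",
                        "ModelAccessConfig": {"AcceptEula": True},
                    }
                },
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)
```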
Source model URIs provide simplified registration and support for proprietary models
The Model Registry now supports auto-population of the inference specification for recognized model IDs, including certain AWS Marketplace models, hosted models, and versioned model packages in the Model Registry. Support for SourceModelURI auto-population enables organizations to use a broader set of FMs in the Model Registry by allowing you to register proprietary JumpStart models from providers such as AI21 Labs, Cohere, and LightOn without requiring an inference specification file.
Previously, to register a trained model in the SageMaker Model Registry, you had to provide the complete inference specification required for deployment, including an Amazon Elastic Container Registry (Amazon ECR) image and the trained model files. With the new source_uri parameter, SageMaker now lets you register a model by providing only a source model URI, a free-form field that stores a model ID or location, such as your own JumpStart model ID, an Amazon Bedrock model ID, an S3 location, or an MLflow model ID. You do not need to provide the details required for deployment to SageMaker hosting at registration time; you can add the artifacts later. After registration, to deploy a model, package it into an inference specification and update the Model Registry accordingly.
For example, you can register a model in the Model Registry using the model's Amazon Resource Name (ARN) as the source URI.
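The following is a minimal sketch with Boto3; the group name and ARN are hypothetical placeholders:

```python
import boto3

sm_client = boto3.client("sagemaker")

# Register a model version with only a source URI; no inference
# specification is required at registration time.
sm_client.create_model_package(
    ModelPackageGroupName="fine-tuned-llm",  # hypothetical model group
    SourceUri="arn:aws:sagemaker:us-east-1:111122223333:model/my-trained-model",  # hypothetical ARN
)
```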
Later, you can update the registered model with an inference specification so that it can be deployed to SageMaker.
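Here is a sketch of that update, attaching a deployable inference specification to the registered version (all values are hypothetical placeholders):

```python
import boto3

sm_client = boto3.client("sagemaker")

# Attach an inference specification to the previously registered model
# version so that it can be deployed to a SageMaker endpoint.
sm_client.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:us-east-1:111122223333:model-package/fine-tuned-llm/1",
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<inference-container-image-uri>",  # placeholder ECR image
                "ModelDataSource": {
                    "S3DataSource": {
                        "S3Uri": "s3://my-bucket/models/fine-tuned-llm/",  # hypothetical prefix
                        "S3DataType": "S3Prefix",
                        "CompressionType": "None",
                    }
                },
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)
```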
You can also register your own FM from Amazon SageMaker JumpStart.
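Because the source URI is free-form, a JumpStart model ID can be stored directly; here is a sketch with a hypothetical model ID and group name:

```python
import boto3

sm_client = boto3.client("sagemaker")

# Register a JumpStart FM by storing its model ID as the source URI.
sm_client.create_model_package(
    ModelPackageGroupName="jumpstart-models",  # hypothetical model group
    SourceUri="meta-textgeneration-llama-2-7b",  # hypothetical JumpStart model ID
)
```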
Conclusion
As organizations continue to adopt generative AI in various parts of their business, robust model management and versioning becomes paramount. Model Registry enables versioning, tracking, collaboration, lifecycle management, and governance of FMs.
In this post, we discussed how model registries can more effectively support the management of generative AI models throughout the model lifecycle, enabling you to better govern and adopt generative AI to achieve transformative outcomes.
To learn more about the Model Registry, see Registering and Deploying Models with the Model Registry. To get started, visit the SageMaker console.
About the Authors
Chaitra Mathur is a Principal Solutions Architect at AWS, where she advises customers on building robust, scalable, and secure solutions on AWS. With a keen interest in data and ML, she helps customers use AWS AI/ML and generative AI services to effectively address their ML requirements. Throughout her career, she has shared her expertise at numerous conferences and authored several blog posts in the ML space.
Kate Healy is a Solutions Architect II at AWS who specializes in working with startups and enterprise automotive customers. She has experience building large-scale AI/ML solutions to drive key business outcomes.
Saumitra Vikaram is a Senior Software Engineer at AWS. He focuses on AI/ML technologies, ML model management, ML governance, and MLOps to improve efficiency and productivity across organizations.
Siamak Nariman is a Senior Product Manager at AWS. He focuses on AI/ML technologies, ML model management, and ML governance to improve efficiency and productivity across organizations. He has extensive experience automating processes and implementing various technologies.