In previous machine learning blog posts, we showed how to create personalized avatars by hosting Stable Diffusion 2.1 models at scale using Amazon SageMaker. In this post, we take it a step further. As technology continues to evolve, newer models emerge offering higher quality, greater flexibility, and faster image generation. One such groundbreaking model is Stable Diffusion XL (SDXL), released by Stability AI, which takes text-to-image generative AI to new heights. This post shows you how to efficiently fine-tune the SDXL model using SageMaker Studio. We then show you how to prepare the fine-tuned model to run on AWS Inferentia2-powered Amazon EC2 Inf2 instances, unlocking superior price performance for your inference workloads.
Solution overview
SDXL 1.0 is a text-to-image generation model developed by Stability AI that consists of over 3 billion parameters. It comprises several key components, including a text encoder that converts input prompts into latent representations, and a U-Net model that generates images based on these latent representations through a diffusion process. Despite the impressive capabilities it learned from public datasets, application builders sometimes need to generate images of a specific subject or style that is difficult or inefficient to describe in words. In that situation, fine-tuning is a great option to improve relevance using your own data.
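To make these components concrete, the following minimal sketch (our own illustration, not code from the original walkthrough) loads the SDXL base pipeline with the Hugging Face diffusers library and inspects the text encoders and U-Net described above. Note that it downloads several gigabytes of weights on first run.

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)

# SDXL uses two text encoders to turn the prompt into latent conditioning
print(type(pipe.text_encoder).__name__, type(pipe.text_encoder_2).__name__)

# the U-Net iteratively denoises latents conditioned on the text embeddings
unet_params = sum(p.numel() for p in pipe.unet.parameters())
print(f"U-Net parameters: {unet_params / 1e9:.2f}B")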
One popular approach to fine-tuning SDXL is to use DreamBooth and Low-Rank Adaptation (LoRA) techniques. You can use DreamBooth to personalize the model by embedding a subject into its output domain using a unique identifier, effectively expanding its language-vision dictionary. This process uses a technique called prior preservation, which retains the model's existing knowledge of the subject class (such as humans) while incorporating new information from the provided subject images. LoRA is an efficient fine-tuning method that attaches small adapter networks to specific layers of the pre-trained model and freezes most of its weights. Combining these techniques allows you to generate a personalized model while tuning only a small fraction of the parameters, resulting in faster fine-tuning times and optimized storage requirements.
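For intuition, here is a minimal, self-contained PyTorch sketch of the LoRA idea (a simplified illustration, not the exact implementation AutoTrain uses): a frozen pre-trained linear layer is augmented with a small trainable low-rank adapter, so only a tiny fraction of the parameters is updated during fine-tuning.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a small trainable low-rank adapter."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # freeze the pre-trained weights; only the adapter is trained
        for p in self.base.parameters():
            p.requires_grad_(False)
        # low-rank factors: the learned update is scaling * (B @ A)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable} of {total} ({100 * trainable / total:.1f}%)")

Running this prints roughly 2% trainable parameters for the wrapped layer, which is why LoRA adapters are so fast to train and cheap to store.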
After the model is fine-tuned, you can use the AWS Neuron SDK to compile and host the fine-tuned SDXL model on Inf2 instances. By doing this, you can benefit from the higher performance and cost-efficiency offered by these purpose-built AI chips while taking advantage of seamless integration with popular deep learning frameworks such as TensorFlow and PyTorch. To learn more, refer to the AWS Neuron documentation.
Prerequisites
Before you get started, review the list of services and instance types required to run the sample notebooks provided at this GitHub location.
By completing these prerequisites, you will have the necessary knowledge and AWS resources to run the sample notebooks and work with Stable Diffusion models and FMs on Amazon SageMaker.
Fine-tune SDXL using SageMaker
To fine-tune SDXL with SageMaker, complete the steps in the following sections.
Prepare the images
The first step in fine-tuning the SDXL model is to prepare your training images. Using the DreamBooth technique, you need as few as 10–12 images for fine-tuning. It's recommended to provide a variety of images to help the model better understand and generalize your facial features.
The training images should cover different perspectives of your face, including selfies taken from different angles. Include images with a variety of facial expressions, such as smiling, frowning, and neutral. Preferably, use images with different backgrounds to help the model identify the subject more effectively. By providing a variety of images, DreamBooth can better identify the subject from the photos and generalize your facial features. The following set of images demonstrates this.
Additionally, fine-tuning uses 1024 x 1024 pixel square images. To simplify the process of preparing the images, there is a utility function that automatically crops and adjusts your images to the correct dimensions.
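If you want to roll your own, a simple version of such a helper might look like the following (a sketch with assumed directory names, not the exact utility from the sample notebook). It center-crops each image to a square and resizes it to 1024 x 1024.

import os
from PIL import Image

def prepare_images(src_dir: str, dst_dir: str, size: int = 1024) -> None:
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        # center-crop to the largest square, then resize to the target resolution
        side = min(img.size)
        left = (img.width - side) // 2
        top = (img.height - side) // 2
        img = img.crop((left, top, left + side, top + side)).resize((size, size), Image.LANCZOS)
        img.save(os.path.join(dst_dir, name))

# "raw_images" and "training_images" are placeholder folder names
prepare_images("raw_images", "training_images")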
Train the personalized model
Once the images are ready, you can start the fine-tuning process. To do so, we use the AutoTrain library from Hugging Face, an automatic and easy-to-use approach to training and deploying state-of-the-art machine learning (ML) models. Seamlessly integrated with the Hugging Face ecosystem, AutoTrain is designed to be accessible, so individuals can train custom models without extensive technical expertise or coding proficiency. To use AutoTrain, use the following example code:
!autotrain dreambooth \
--prompt "${INSTANCE_PROMPT}" \
--class-prompt "${CLASS_PROMPT}" \
--model ${MODEL_NAME} \
--project-name ${PROJECT_NAME} \
--image-path "${IMAGE_PATH}" \
--resolution ${RESOLUTION} \
--batch-size ${BATCH_SIZE} \
--num-steps ${NUM_STEPS} \
--gradient-accumulation ${GRADIENT_ACCUMULATION} \
--lr ${LEARNING_RATE} \
--fp16 \
--gradient-checkpointing
First, you need to set the prompt and the class prompt. The prompt should include a unique identifier or token that the model can reference for the subject. The class prompt, on the other hand, is used to regularize the model training with other subjects of the same class. This is a requirement of the DreamBooth technique to better associate the new token with the subject of interest, and it is why the DreamBooth technique can produce exceptional fine-tuned results with fewer input images. Additionally, you'll notice that even though you didn't provide examples of the top or back of your head, the model still knows how to generate them because of the class prompt. In this example, <<TOK>> is used as the unique identifier to avoid a name that the model might already be familiar with.
instance_prompt = "photo of <<TOK>>"
class_prompt = "photo of a person"
Next, you need to provide the model, the image path, and the project name. The model name loads the base model from the Hugging Face Hub or locally. The image path is the location of the training images. By default, AutoTrain uses LoRA, a parameter-efficient way to fine-tune. Unlike traditional fine-tuning, LoRA fine-tunes by attaching a small transformer adapter model to the base model. Only the adapter weights are updated during training to achieve fine-tuning behavior. Additionally, these adapters can be attached and detached at any time, making them highly efficient for storage as well. These supplementary LoRA adapters are 98% smaller in size compared to the original model, allowing us to store and share the LoRA adapters without having to duplicate the base model repeatedly. The following diagram illustrates these concepts.
The rest of the configuration parameters are as follows. It's recommended to start with these values first and adjust them only if the fine-tuning results don't meet your expectations.
resolution = 1024 # resolution or size of the generated images
batch_size = 1 # number of samples in one forward and backward pass
num_steps = 500 # number of training steps
gradient_accumulation = 4 # accumulating gradients over number of batches
learning_rate = 1e-4 # step size
fp16 # half-precision
gradient-checkpointing # technique to reduce memory consumption during training
The whole training process takes about 30 minutes with the preceding configuration. After the training is complete, you can load the LoRA adapter, as in the following code, and generate fine-tuned images.
import random

import torch
from diffusers import DiffusionPipeline

seed = random.randint(0, 100000)
device = "cuda" if torch.cuda.is_available() else "cpu"

# load the base SDXL model (model_name_base is defined earlier in the notebook)
pipeline = DiffusionPipeline.from_pretrained(
    model_name_base,
    torch_dtype=torch.float16,
).to(device)

# attach the LoRA adapter produced by AutoTrain
pipeline.load_lora_weights(
    project_name,
    weight_name="pytorch_lora_weights.safetensors",
)

# generate fine-tuned images
generator = torch.Generator(device).manual_seed(seed)
base_image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    generator=generator,
    height=1024,
    width=1024,
    output_type="pil",
).images[0]
base_image
Deploy on Amazon EC2 Inf2 instances
In this section, you learn how to compile and host the fine-tuned SDXL model on Inf2 instances. To get started, you need to clone the repository and upload the LoRA adapter onto the Inf2 instance created in the prerequisites section. Then, run the compilation notebook to compile the fine-tuned SDXL model using the Optimum Neuron library. For more details, refer to the Optimum Neuron page.
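One convenient way to get the adapter onto the instance (our own suggestion; any file transfer method works) is to stage the AutoTrain output in Amazon S3 and download it on the Inf2 instance. The bucket, prefix, and folder names below are placeholders.

import os
import boto3

s3 = boto3.client("s3")
bucket = "my-sdxl-artifacts"     # replace with your bucket
prefix = "lora-adapter"

# on the SageMaker Studio side: upload the AutoTrain project output
for name in os.listdir("my-dreambooth-project"):
    path = os.path.join("my-dreambooth-project", name)
    if os.path.isfile(path):
        s3.upload_file(path, bucket, f"{prefix}/{name}")

# on the Inf2 instance: download the adapter into a local "lora" folder before compilation
os.makedirs("lora", exist_ok=True)
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
    s3.download_file(bucket, obj["Key"], os.path.join("lora", os.path.basename(obj["Key"])))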
The NeuronStableDiffusionXLPipeline class in Optimum Neuron now has direct LoRA support. All you need to do is supply the base model, the LoRA adapter, and the model input shapes to start the compilation process. The following code snippet shows how to compile and then export the compiled model to a local directory:
from optimum.neuron import NeuronStableDiffusionXLPipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
adapter_id = "lora"
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024, "num_images_per_prompt": 1}
# Compile
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
model_id,
export=True,
lora_model_ids=adapter_id,
lora_weight_names="pytorch_lora_weights.safetensors",
lora_adapter_names="sttirum",
**input_shapes,
)
# Save locally or upload to the HuggingFace Hub
save_directory = "sd_neuron_xl/"
pipe.save_pretrained(save_directory)
The compilation process takes about 35 minutes. After the process is complete, you can use the NeuronStableDiffusionXLPipeline class again to load the compiled model:
from optimum.neuron import NeuronStableDiffusionXLPipeline
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl")
You can then test the model on Inf2 and confirm that you can still generate the fine-tuned results:
import torch
# Run pipeline
prompt = """
photo of <<TOK>> , 3d portrait, ultra detailed, gorgeous, 3d zbrush, trending on dribbble, 8k render
"""
negative_prompt = """
ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred,
watermark, grainy, signature, cut off, draft, amateur, multiple, gross, weird, uneven, furnishing, decorating, decoration, furniture, text, poor, low, basic, worst, juvenile,
unprofessional, failure, crayon, oil, label, thousand hands
"""
seed = 491057365
generator = (torch.Generator().manual_seed(seed))
image = stable_diffusion_xl(prompt,
                            num_inference_steps=50,
                            guidance_scale=7,
                            negative_prompt=negative_prompt,
                            generator=generator).images[0]
The following are a few avatar images generated using the fine-tuned model on Inf2, along with the corresponding prompts:
- <<TOK>> emoji, astronaut, space ship background
- <<TOK>>, business woman, oil painting of a suit
- <<TOK>>, 3d portrait, ultra detailed, 8k render
- <<TOK>>, ninja style, black hair, anime
Clean up
To avoid incurring AWS charges after you finish testing this example, delete the following resources, either from the AWS Management Console or programmatically as shown in the sketch after this list:
- Amazon SageMaker Studio domain
- Amazon EC2 Inf2 instance
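If you prefer to clean up with code, the following boto3 sketch illustrates the idea (the resource IDs are placeholders you need to replace, and any apps and user profiles in the Studio domain must be deleted before the domain itself can be deleted):

import boto3

# terminate the EC2 Inf2 instance used for compilation and inference
ec2 = boto3.client("ec2")
ec2.terminate_instances(InstanceIds=["i-0123456789abcdef0"])  # replace with your instance ID

# delete the SageMaker Studio domain (delete its apps and user profiles first)
sm = boto3.client("sagemaker")
sm.delete_domain(
    DomainId="d-xxxxxxxxxxxx",  # replace with your Studio domain ID
    RetentionPolicy={"HomeEfsFileSystem": "Delete"},
)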
Conclusion
This post demonstrated how to fine-tune the Stable Diffusion XL (SDXL) model using DreamBooth and LoRA techniques on Amazon SageMaker, enabling enterprises to generate highly personalized and domain-specific images. By using these techniques, businesses can rapidly adapt SDXL models to their unique needs, unlocking new opportunities to enhance and deliver customer experiences. Additionally, we walked through the process of compiling and deploying the fine-tuned SDXL model on Inf2 instances, which lets you serve the model at scale in a cost-effective way. We encourage you to try this example and share your creations with us using the hashtags #SageMaker #MME #GENAI on social platforms. We would love to see what you make.
For more examples of AWS Neuron, refer to AWS-Neuron-Samples.
About the authors
Deepti Tirumala is a Senior Solutions Architect at Amazon Web Services, specializing in machine learning and generative AI technologies. With a passion for helping customers advance their AWS journey, she works closely with organizations to architect scalable, secure, and cost-effective solutions that leverage the latest innovations in these areas.
James Woo is a Senior AI/ML Specialist Solutions Architect at AWS, helping customers design and build AI/ML solutions. James's work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.
Diwakar Bangsal is a lead GenAI Specialist focused on business development and go-to-market for GenAI and machine learning accelerated computing services. Diwakar has led product definition, global business development, and marketing of technology products in the fields of IoT, edge computing, and autonomous driving, with a focus on bringing AI and machine learning to these domains. Diwakar is passionate about public speaking and thought leadership in the cloud and GenAI space.