Identifying idle endpoints in Amazon SageMaker

July 1, 2024

Amazon SageMaker is a machine learning (ML) platform designed to simplify the process of building, training, deploying, and managing ML models at scale. With a comprehensive suite of tools and services, SageMaker provides developers and data scientists with the resources they need to accelerate the development and deployment of ML solutions.

In today’s rapidly changing technology environment, efficiency and agility are essential for businesses and developers looking to innovate. AWS plays a key role in enabling this innovation by providing a range of services that abstract the complexities of infrastructure management. By handling tasks such as resource provisioning, scaling, and management, AWS allows developers to focus on core business logic and quickly iterate on new ideas.

As developers deploy and scale their applications, unused resources such as idle SageMaker endpoints can accumulate unnoticed and drive up operational costs. In this post, we address the problem of identifying and managing idle endpoints in SageMaker. We explore how to monitor SageMaker endpoints and distinguish active endpoints from idle ones, and we walk through a Python script that uses Amazon CloudWatch metrics to automate the identification of idle endpoints.

Identifying idle endpoints with a Python script

To manage SageMaker endpoints and optimize resource utilization, we use a Python script built on the AWS SDK for Python (Boto3) to interact with SageMaker and CloudWatch. The script queries CloudWatch metrics to determine endpoint activity and flags endpoints as idle based on the number of invocations in a specified time period.

We’ll break down the main components of the Python script and explain how each part contributes to identifying idle endpoints.

Initializing global variables and AWS clients – The script begins by importing the required modules and initializing the following global variables: NAMESPACE, METRIC, LOOKBACK, and PERIOD. These variables define the parameters for querying the CloudWatch metrics for SageMaker endpoints. Additionally, the AWS clients for interacting with the SageMaker and CloudWatch services are initialized using Boto3.

from datetime import datetime, timedelta
import boto3
import logging

# AWS clients initialization
cloudwatch = boto3.client("cloudwatch")
sagemaker = boto3.client("sagemaker")

# Global variables
NAMESPACE = "AWS/SageMaker"
METRIC = "Invocations"
LOOKBACK = 1  # Number of days to look back for activity
PERIOD = 86400  # We opt for a granularity of 1 Day to reduce the volume of metrics retrieved while maintaining accuracy.

# Calculate time range for querying CloudWatch metrics
ago = datetime.utcnow() - timedelta(days=LOOKBACK)
now = datetime.utcnow()
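
The lookback window in the script above can also be expressed with timezone-aware timestamps. The following is a minimal sketch, not part of the original script; query_window is a hypothetical helper name. It derives the start and end from a single reference instant and uses datetime.now(timezone.utc), because datetime.utcnow() is deprecated as of Python 3.12.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = 1  # days, same meaning as in the script above


def query_window(lookback_days, now=None):
    """Return (start, end) as timezone-aware UTC datetimes.

    `now` can be injected for testing; it defaults to the current UTC time.
    """
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=lookback_days)
    return start, end


start, end = query_window(LOOKBACK)
print(f"Querying from {start.isoformat()} to {end.isoformat()}")
```

Computing both bounds from one instant keeps the window exactly LOOKBACK days wide, whereas two separate clock reads can differ by a few microseconds.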

Identifying idle endpoints – Based on CloudWatch metrics data, the script determines whether an endpoint is idle or active. If an endpoint has received no invocations during the defined period, it is flagged as idle. We choose a conservative default threshold of zero invocations in the analyzed period; depending on your use case, you can adjust this threshold to suit your requirements.

# Helper function to extract endpoint name from CloudWatch metric

def get_endpoint_name_from_metric(metric):
    for d in metric["Dimensions"]:
        if d["Name"] in ("EndpointName", "InferenceComponentName"):
            yield d["Value"]

# Helper function to list every Invocations metric in the AWS/SageMaker namespace.
# The paginator follows all result pages and yields one metric at a time.

def list_metrics():
    paginator = cloudwatch.get_paginator("list_metrics")
    response_iterator = paginator.paginate(Namespace=NAMESPACE, MetricName=METRIC)
    return (m for r in response_iterator for m in r["Metrics"])


# Helper function to check if endpoint is in use based on CloudWatch metrics.
# It sums the Invocations datapoints in the lookback window; any value above zero means busy.

def is_endpoint_busy(metric):
    metric_values = cloudwatch.get_metric_data(
        MetricDataQueries=[{
            "Id": "metricname",
            "MetricStat": {
                "Metric": {
                    "Namespace": metric["Namespace"],
                    "MetricName": metric["MetricName"],
                    "Dimensions": metric["Dimensions"],
                },
                "Period": PERIOD,
                "Stat": "Sum",
                "Unit": "None",
            },
        }],
        StartTime=ago,
        EndTime=now,
        ScanBy="TimestampAscending",
        MaxDatapoints=24 * (LOOKBACK + 1),
    )
    return sum(metric_values.get("MetricDataResults", [{}])[0].get("Values", [])) > 0

# Helper function to log endpoint activity

def log_endpoint_activity(endpoint_name, is_busy):
    status = "BUSY" if is_busy else "IDLE"
    log_message = f"{datetime.utcnow()} - Endpoint {endpoint_name} {status}"
    print(log_message)
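
The zero-invocation default can be generalized to an adjustable threshold. The sketch below is an illustration rather than part of the original script: IDLE_THRESHOLD and is_idle are hypothetical names, and the function expects the list of per-period Sum values that get_metric_data returns under the Values key.

```python
IDLE_THRESHOLD = 0  # hypothetical setting: maximum invocations still counted as idle


def is_idle(values, threshold=IDLE_THRESHOLD):
    """Return True when total invocations in the window do not exceed threshold.

    `values` is the list of per-period Sum datapoints for one metric;
    an empty list (no datapoints at all) counts as zero invocations.
    """
    return sum(values) <= threshold


print(is_idle([]))                       # no datapoints in the window
print(is_idle([0.0, 3.0]))               # three invocations, default threshold
print(is_idle([0.0, 3.0], threshold=5))  # same traffic, more tolerant threshold
```

Raising the threshold is one way to treat endpoints that only receive occasional health checks or test calls as idle.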

Main function – The main() function serves as the entry point for running the script. It orchestrates the process of retrieving the SageMaker endpoints, querying CloudWatch metrics, and logging the activity of each endpoint.

# Main function to identify idle endpoints and log their activity status
def main():
    endpoints = sagemaker.list_endpoints()["Endpoints"]

    if not endpoints:
        print("No endpoints found")
        return

    # Names of the endpoints that currently exist in the account
    existing_endpoints_name = []
    for endpoint in endpoints:
        existing_endpoints_name.append(endpoint["EndpointName"])

    # CloudWatch can still list metrics for endpoints that were deleted,
    # so each metric is matched against the existing endpoints first.
    for metric in list_metrics():
        for endpoint_name in get_endpoint_name_from_metric(metric):
            if endpoint_name in existing_endpoints_name:
                is_busy = is_endpoint_busy(metric)
                log_endpoint_activity(endpoint_name, is_busy)
            else:
                print(f"Endpoint {endpoint_name} not active")

if __name__ == "__main__":
    main()
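
One detail to keep in mind is that list_endpoints returns results in pages, so a single call may not include every endpoint in an account with many endpoints. A minimal sketch of collecting all names with the Boto3 paginator follows; list_all_endpoint_names is a hypothetical helper, not part of the original script, and it takes the client as a parameter so it can be exercised without AWS credentials.

```python
def list_all_endpoint_names(sagemaker_client):
    """Collect every endpoint name by following ListEndpoints pagination."""
    paginator = sagemaker_client.get_paginator("list_endpoints")
    names = set()  # a set keeps the later membership test cheap
    for page in paginator.paginate():
        for endpoint in page.get("Endpoints", []):
            names.add(endpoint["EndpointName"])
    return names


# Usage (assumes Boto3 is installed and credentials are configured):
#   import boto3
#   names = list_all_endpoint_names(boto3.client("sagemaker"))
```

The returned set could stand in for the list of existing endpoint names that main() builds.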

Working through the script shows how to automate the identification of idle endpoints in SageMaker, which paves the way for more efficient resource management and cost optimization.

Permissions required to run the script

Before you run the provided Python script to identify idle endpoints in SageMaker, make sure that your AWS Identity and Access Management (IAM) user or role has the required permissions. The permissions required by the script are:

CloudWatch permissions – The IAM entity that runs the script must have permissions for the cloudwatch:GetMetricData and cloudwatch:ListMetrics actions.
SageMaker permissions – The IAM entity must have permission to list SageMaker endpoints through the sagemaker:ListEndpoints action.
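
As an illustration, the permissions listed above could be expressed as an identity-based policy like the one generated below. This is a sketch, not a policy taken from the original post: the Sid value is an arbitrary label, and Resource is set to * on the assumption that these list and read actions are not scoped to individual resources. Review it against your organization's policies before use.

```python
import json

# Minimal policy covering only the three actions the script calls.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "IdleEndpointScan",  # arbitrary label chosen for this example
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData",
                "cloudwatch:ListMetrics",
                "sagemaker:ListEndpoints",
            ],
            "Resource": "*",  # assumption: these actions are not resource-scoped
        }
    ],
}

print(json.dumps(policy, indent=2))
```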

Run a Python script

You can run the Python script in several ways:

AWS CLI – Verify that the AWS Command Line Interface (AWS CLI) is installed and configured with the appropriate credentials.
AWS Cloud9 – If you prefer a cloud-based integrated development environment (IDE), AWS Cloud9 provides an IDE pre-configured for AWS development. Create a new environment, clone your script repository, and run the script within your Cloud9 environment.

In this post, we run the Python script through the AWS CLI.

Actions to take after identifying idle endpoints

After you have identified idle endpoints in your SageMaker environment using the Python script, you can take proactive steps to optimize resource utilization and reduce operational costs. Here are some measures you can implement:

Delete or scale down endpoints – For endpoints that have no consistent activity over a long period of time, consider deleting or scaling down to minimize resource waste. SageMaker allows you to delete idle endpoints using the AWS Management Console or programmatically using the AWS SDK.
Review and improve your model deployment strategy – Evaluate your ML model deployment strategy to determine whether all deployed endpoints are necessary. Changing business requirements or model updates can result in endpoints sitting idle. Reviewing your deployment strategy can help you identify opportunities to consolidate or optimize endpoints to gain efficiency.
Implement an autoscaling policy – Configure an autoscaling policy for your active endpoints to dynamically adjust compute capacity based on your workload demands. SageMaker supports autoscaling, allowing you to automatically increase or decrease the number of instances serving predictions based on defined metrics such as CPU utilization or inference latency.
Explore serverless inference options – As an alternative to traditional endpoint provisioning, consider using SageMaker serverless inference. Serverless inference eliminates the need for manual endpoint management by automatically scaling compute resources based on incoming prediction requests. This significantly reduces idle capacity and optimizes costs for intermittent or unpredictable workloads.
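
For the first option, deletion can be scripted. The following is a hedged sketch rather than part of the original script: delete_idle_endpoints and its dry_run flag are hypothetical, the only SageMaker call used is delete_endpoint, and running it with dry_run=False additionally requires the sagemaker:DeleteEndpoint permission. Deleting an endpoint does not remove its endpoint configuration or model, so the endpoint can be recreated later.

```python
def delete_idle_endpoints(sagemaker_client, idle_endpoint_names, dry_run=True):
    """Delete the given endpoints, or only report them when dry_run is True.

    Returns the list of endpoint names that were (or would be) deleted.
    """
    processed = []
    for name in idle_endpoint_names:
        if dry_run:
            print(f"[dry run] would delete endpoint {name}")
        else:
            sagemaker_client.delete_endpoint(EndpointName=name)
            print(f"Deleted endpoint {name}")
        processed.append(name)
    return processed


# Usage (assumes Boto3 is installed and credentials are configured;
# "my-idle-endpoint" is a placeholder name):
#   import boto3
#   delete_idle_endpoints(boto3.client("sagemaker"), ["my-idle-endpoint"])
```

Defaulting to a dry run gives you a chance to review the list before anything is removed.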

Conclusion

In this post, we discussed the importance of identifying idle endpoints in SageMaker and provided a Python script to help automate this process. By implementing a proactive monitoring solution and optimizing resource utilization, SageMaker users can effectively manage their endpoints, reduce operational costs, and maximize the efficiency of their machine learning workflows.

Try using the techniques presented in this post to automate your SageMaker inference cost monitoring. Check out the AWS re:Post for valuable resources on optimizing your cloud infrastructure and getting the most out of AWS services.

About the Authors

Pablo Colazurdo is a Principal Solutions Architect at AWS, helping customers launch successful projects in the cloud. He has years of experience working with various technologies and is eager to learn new things. Pablo grew up in Argentina, but now enjoys Irish rain, listening to music, reading, and playing Dungeons & Dragons with his kids.

Ozgur Kanibeyaz is a Senior Technical Account Manager at AWS with 8 years of experience. Ozgur helps customers optimize their AWS usage by solving technical challenges, exploring cost savings opportunities, improving operational efficiencies, and building innovative services using AWS products.
