As generative AI moves from proof of concept (POC) to production, we are seeing a major shift in the way businesses and consumers interact with data, information, and each other. In what can be considered “Act 1” of the generative AI story, previously unimaginable amounts of data and computing have been used to create models that demonstrate the power of generative AI. In the last year alone, the sheer number of POCs has been staggering, with many businesses and even more individuals focused on learning and experimenting. Thousands of customers across industries have conducted dozens or even hundreds of experiments, exploring the potential and impact of generative AI applications.
By early 2024, we are seeing the start of “Act 2,” in which many POCs will evolve into production and generate significant business value. For more information on Acts 1 and 2, see Are You Ready for Second Generation AI? The shift to a production mindset brings new focus to key challenges as companies build and evaluate models for specific tasks, looking for the leanest, fastest, and most cost-effective options. Reducing the investment required for production workloads means bringing new efficiencies to the sometimes complex process of building, testing, and fine-tuning foundation models (FMs).
Providing features that increase efficiency and reduce costs
Providing multiple entry points into your generative AI journey is essential to deliver value to enterprises moving generative AI applications into production. Our generative AI technology stack provides the services and capabilities you need to build and scale your generative AI applications, from the top layer of Amazon Q (the most capable generative AI-powered assistant for accelerating software development), to the middle layer of Amazon Bedrock (the easiest way to build and scale generative AI applications using FMs), to the bottom layer of Amazon SageMaker (built specifically to help you build, train, and deploy FMs). While each of these layers provides a different entry point, the fundamental truth is that every generative AI journey starts at the bottom layer of foundations.
Organizations that want to build their own model or want fine-grained control choose Amazon Web Services (AWS) because we help customers use the cloud more efficiently and take advantage of more powerful and cost-effective AWS capabilities, such as petabyte-scale networking capabilities, hyperscale clustering, and the right tools to help you build. Our significant investments at this layer increase the capabilities and efficiency of the services we provide at the higher layers.
To make generative AI use cases economical, training and inference must be performed on highly performant, cost-effective infrastructure purpose-built for AI. Amazon SageMaker makes it easy to optimize at each step of the model lifecycle, whether building, training, or deploying. However, training and inference for FMs present challenges, including operational burden, overall cost, and performance lag that leads to a poor user experience. State-of-the-art generative AI models have average latencies on the order of seconds, and many of today’s large models are too big to fit on a single instance.
Furthermore, innovation in model optimization is occurring at a rapid pace, requiring modelers to put in months of research to learn and implement these techniques even before finalizing their deployment configurations.
Introducing Amazon Elastic Kubernetes Service (Amazon EKS) support in Amazon SageMaker HyperPod
Recognizing these challenges, AWS launched Amazon SageMaker HyperPod last year. Earlier this week, we took efficiency a step further and announced Amazon EKS support in Amazon SageMaker HyperPod. Provisioning and managing the large GPU clusters required for AI can be a significant operational burden, and training runs that take weeks to complete are risky because a single failure can derail the entire process. Ensuring infrastructure stability and optimizing performance for distributed training workloads can also be a challenge.
Amazon SageMaker HyperPod provides a fully managed service that removes the operational burden and enables enterprises to accelerate FM development at unprecedented scale. With Amazon EKS support for Amazon SageMaker HyperPod, builders can now use Amazon EKS to manage their SageMaker HyperPod clusters. Builders can use the familiar Kubernetes interface while eliminating the undifferentiated heavy lifting of configuring and optimizing these clusters for generative AI model development at scale. SageMaker HyperPod provides a highly resilient environment that automatically detects, diagnoses, and recovers from failures in the underlying infrastructure, enabling builders to train FMs for weeks or months at a time with minimal interruption.
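To illustrate what this looks like in practice, here is a minimal sketch of attaching a SageMaker HyperPod cluster to an existing EKS cluster using the CreateCluster API via boto3. The cluster names, ARNs, subnet and security group IDs, instance counts, S3 paths, and the lifecycle script name are all placeholder assumptions; consult the SageMaker HyperPod documentation for the exact configuration your workload needs.

```python
import boto3

sagemaker = boto3.client("sagemaker", region_name="us-west-2")

# Create a HyperPod cluster whose workloads are scheduled by an existing EKS
# cluster (all ARNs, IDs, and names below are placeholders).
response = sagemaker.create_cluster(
    ClusterName="my-hyperpod-cluster",
    Orchestrator={
        "Eks": {
            "ClusterArn": "arn:aws:eks:us-west-2:111122223333:cluster/my-eks-cluster"
        }
    },
    VpcConfig={
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "Subnets": ["subnet-0123456789abcdef0"],
    },
    InstanceGroups=[
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.p5.48xlarge",
            "InstanceCount": 8,
            "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodExecutionRole",
            # Lifecycle scripts run on each node when it is provisioned
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/hyperpod-lifecycle/",
                "OnCreate": "on_create.sh",  # hypothetical script name
            },
        }
    ],
)
print(response["ClusterArn"])
```

Once the cluster is running, builders can submit and monitor training workloads with standard Kubernetes tooling such as kubectl, while HyperPod’s resiliency features handle node-level failures behind the scenes.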
Testimonials: Articul8 AI
“Amazon SageMaker HyperPod has significantly helped us manage and operate our compute resources more efficiently while minimizing downtime. We were an early adopter of the Slurm-based SageMaker HyperPod service and have benefited from its ease of use and resiliency, enabling us to improve productivity by up to 35% and rapidly scale our generative AI operations.
“As a Kubernetes company, we are excited to launch Amazon EKS support for SageMaker HyperPod. This is a game changer for us, as it integrates seamlessly with our existing training pipelines and makes it even easier to manage and operate Kubernetes clusters at scale. It will also benefit our end customers, because we can now package and productize this capability in our generative AI platform, enabling them to run their own training and fine-tuning workloads in a more streamlined way.”
– Arun Subramaniyan, Founder and CEO, Articul8 AI
Bringing new efficiencies to inference
Despite the latest advances in generative AI modeling, the inference stage remains a major bottleneck. We believe that companies building customer- and consumer-facing generative AI applications shouldn’t have to sacrifice performance for cost-efficiency; they should be able to have both. That’s why two months ago we announced the inference optimization toolkit for Amazon SageMaker, a fully managed solution that provides modern model optimization techniques such as speculative decoding, compilation, and quantization. Available across SageMaker, this toolkit offers a simple menu of optimization techniques that can be used individually or in combination to create “optimization recipes.” With easy access to these techniques, customers can achieve up to 2x higher throughput while reducing the cost of generative AI inference by up to 50%.
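As a rough illustration, the sketch below uses the SageMaker CreateOptimizationJob API via boto3 to quantize a model before deployment. The job name, role ARN, S3 locations, and the AWQ quantization setting are illustrative assumptions; the techniques and environment options available depend on the model and the target instance.

```python
import boto3

sm = boto3.client("sagemaker")

# Launch an optimization job that quantizes a model ahead of deployment.
# All names, ARNs, and S3 paths below are placeholders.
sm.create_optimization_job(
    OptimizationJobName="llama3-8b-awq",
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    ModelSource={"S3": {"S3Uri": "s3://my-bucket/models/llama3-8b/"}},
    # The instance type you plan to serve the optimized model on
    DeploymentInstanceType="ml.g5.12xlarge",
    OptimizationConfigs=[
        # Techniques can be combined into "recipes"; here we apply quantization only
        {
            "ModelQuantizationConfig": {
                "OverrideEnvironment": {"OPTION_QUANTIZE": "awq"}
            }
        }
    ],
    OutputConfig={"S3OutputLocation": "s3://my-bucket/optimized/llama3-8b/"},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```

The optimized artifacts land in the output S3 location and can then be deployed to a SageMaker endpoint like any other model.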
Deploying safe, reliable, and responsible models
While cost and performance are important issues, it’s important not to lose sight of other concerns that come to the forefront when moving from POC to production. Whatever model you choose, it needs to be deployed in a safe, reliable, and responsible manner. We all need to be able to realize the full potential of generative AI while mitigating risks. We need to be able to easily implement safety measures for generative AI applications that are customized to our requirements and responsible AI policies.
That’s why we built Amazon Bedrock Guardrails, which provides customizable safeguards for filtering prompts and model responses. Guardrails can block specific words or topics, and customers can use them to identify and prevent restricted content from reaching end users.
Guardrails also include filters for harmful content and personally identifiable information (PII), as well as safety checks against adversarial inputs such as prompt injection. More recently, we added contextual grounding checks that mitigate hallucinations by verifying that a response is grounded in the source material and relevant to the query.
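A minimal sketch of configuring such a guardrail with the CreateGuardrail API via boto3 follows. The guardrail name, filter strengths, and grounding thresholds are illustrative assumptions to adapt to your own responsible AI policies.

```python
import boto3

bedrock = boto3.client("bedrock")

# Create a guardrail that combines harmful-content filters, PII masking,
# prompt-attack detection, and contextual grounding checks.
guardrail = bedrock.create_guardrail(
    name="production-app-guardrail",  # hypothetical name
    description="Blocks harmful content, masks PII, and checks grounding",
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            # Prompt-attack detection applies to user inputs only
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [{"type": "EMAIL", "action": "ANONYMIZE"}]
    },
    # Flag responses not grounded in the source or not relevant to the query
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)
print(guardrail["guardrailId"], guardrail["version"])
```

The returned guardrail ID and version can then be attached to model invocations so that every prompt and response passes through these checks.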
Providing value through groundbreaking innovation
Our partnership with the NFL and our collaborative Next Gen Stats program are impressive evidence of how a production mindset can bring real value to organizations and to people around the world. Working with AWS AI tools and engineers, the NFL is taking tackle analytics to the next level, providing teams, broadcasters, and fans with deeper insights into one of football’s most important skills: tackling. As fans know, tackling is a complex, evolving process that unfolds throughout each play, but traditional statistics tell only part of the story. That’s why the NFL and AWS created Tackle Probability, a groundbreaking AI-powered metric that can pinpoint missed tackles, and when and where they happened, all in real time. To learn more, visit NFL on AWS.
Building this statistic required five years of historical data to train an AI model on Amazon SageMaker that can process millions of data points per game, tracking 20 different characteristics for each of the 11 defenders every tenth of a second. The result is a literally game-changing statistic that provides unprecedented insight. The NFL can now quantify tackling efficiency in ways that were never possible before: crediting a defender with attempting 15 tackles in a game without missing any, or measuring the number of missed tackles a running back forced. In total, the model yields at least 10 new statistics.
In the NFL, coaches can now quantify tackling efficiency and identify players who are consistently in the right position to make successful plays, while broadcasters can highlight tackles broken and made in real time for fans.
Achieving breakthroughs with AWS
The NFL isn’t the only one using AWS to shift its focus from POC to production. Inspiring startups like EvolutionaryScale are making it easier to produce new proteins and antibodies. Airtable is making it easy for customers to build applications with their data. And organizations like Slack are incorporating generative AI into their operations. The fastest-growing startups are choosing AWS to build and accelerate their businesses: in fact, 96% of all AI/ML unicorns run on AWS, as do 90% of the 2024 Forbes AI 50.
Why? Because we’re addressing the cost, performance, and security issues to enable production-grade generative AI applications. We’re empowering data scientists, ML engineers, and other developers with new capabilities that make generative AI development faster, easier, safer, and less expensive. As part of our ongoing efforts to democratize generative AI, we’re giving more organizations the ability to build and tune FMs, and the portfolio of intuitive tools to make it happen.
Driving the next wave of innovation
Optimizing costs, improving production efficiency, and ensuring security are among the biggest challenges as generative AI evolves from POC to production. Amazon is helping address these issues by adding innovative new capabilities to Amazon SageMaker, Amazon Bedrock, and more. We are also lowering the barrier to entry by making these tools available to everyone, from large enterprises with ML teams to small businesses and individual developers just getting started. Enabling more people and organizations to experiment with generative AI will lead to an explosion of creative new use cases and applications. This is exactly what we are seeing as generative AI continues its rapid evolution from a fascinating technology to an everyday reality: improving experiences, driving innovation, enhancing competitiveness, and creating significant new value.
About the Author
Baskar Sridharan is Vice President of AI/ML and Data Services & Infrastructure, where he oversees the strategic direction and development of key services including Amazon Bedrock and Amazon SageMaker, as well as critical data platforms such as EMR, Athena, and Glue.
Prior to his current role, Baskar spent nearly six years at Google where he contributed to advancements in cloud computing infrastructure, and prior to that he spent 16 years at Microsoft where he played key roles in the development of Azure Data Lake and Cosmos, disruptive products in the cloud storage and data management space.
Baskar earned his PhD in Computer Science from Purdue University and has been at the forefront of the technology industry for over 20 years.
He has lived in Seattle for over 20 years and enjoys the beauty of the Pacific Northwest and various outdoor activities with his wife and two children. In his free time, he enjoys practicing music and playing cricket and baseball with his children.