This post was co-authored with Isaac Cameron and Alex Gnibus from Tecton.
Companies are under pressure to demonstrate return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases do.
ROI doesn’t just mean reaching production; it also means model accuracy and performance. Real-time use cases, where every millisecond directly impacts revenue, require scalable and reliable systems that deliver high accuracy at low latency.
For example, fraud detection requires very low latency because decisions must be made in the time it takes to swipe a credit card. And with fraud on the rise, more organizations need effective fraud detection systems: national fraud losses in the United States exceeded $10 billion in 2023, a 14% increase over 2022, and global e-commerce fraud is predicted to exceed $343 billion by 2027.
But building and managing accurate and reliable AI applications that can solve $343 billion problems is extremely complex.
ML teams often start by manually piecing together various infrastructure components. Batch data seems easy at first, but the engineering becomes more complex when you need to incorporate real-time and streaming data sources and move from batch inference to real-time serving.
Engineers must build and orchestrate data pipelines, reconcile the different processing needs of different data sources, manage the compute infrastructure, and build reliable serving infrastructure for inference. Without Tecton, the architecture looks like the following diagram.
Accelerate AI development and deployment with Amazon SageMaker and Tecton
Tecton and Amazon SageMaker remove this manual complexity. Together, they abstract away the engineering required for production real-time AI applications. This reduces time to value and lets engineering teams focus on building new features and use cases instead of struggling to manage existing infrastructure.
SageMaker lets you build, train, and deploy ML models. Tecton, in turn, makes it easy to compute, manage, and retrieve features to power those models in SageMaker, both for offline training and online serving. This streamlines the end-to-end feature lifecycle for production-scale use cases and enables a simpler architecture, as shown in the following diagram.
How does it work? With Tecton’s easy-to-use declarative framework, you define your feature transformations in a few lines of code, and Tecton builds the pipelines needed to compute, manage, and serve those features. Tecton then handles the complete production deployment and online serving.
It doesn’t matter whether the data is batch, streaming, or real time, or whether features are served offline or online: it’s one common framework for the end-to-end data processing needs of feature production.
The framework also creates a central hub for feature management and governance, with enterprise feature store capabilities that make it easy to observe data lineage for each feature pipeline, monitor data quality, and reuse features across multiple models and teams.
The following diagram shows the Tecton declarative framework.
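To make this concrete, the following is a minimal sketch of what a streaming feature definition could look like in Tecton’s declarative framework. The data source (`transactions_stream`), entity (`user`), and column names are hypothetical placeholders, and the exact decorator arguments vary across Tecton SDK versions.

```python
from datetime import datetime, timedelta

from tecton import stream_feature_view, Aggregation, FilteredSource

# Hypothetical objects assumed to be defined elsewhere in the feature repository:
# `transactions_stream` (a Tecton StreamSource) and `user` (a Tecton Entity).
from fraud.data_sources import transactions_stream
from fraud.entities import user


@stream_feature_view(
    source=FilteredSource(transactions_stream),
    entities=[user],
    mode="spark_sql",
    online=True,   # materialize to the online store for real-time serving
    offline=True,  # materialize to the offline store for training data
    feature_start_time=datetime(2024, 1, 1),
    aggregation_interval=timedelta(minutes=10),
    aggregations=[
        # Sliding-window aggregates that Tecton computes and keeps fresh
        Aggregation(column="amount", function="mean", time_window=timedelta(hours=1)),
        Aggregation(column="amount", function="count", time_window=timedelta(hours=24)),
    ],
)
def user_transaction_aggregates(transactions):
    return f"""
        SELECT user_id, amount, timestamp
        FROM {transactions}
    """
```

Applying a definition like this from the Tecton CLI (`tecton apply`) is what triggers Tecton to provision and orchestrate the backing pipelines.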
In the next section, we walk through a fraud detection example to demonstrate how Tecton and SageMaker accelerate both training and real-time serving for production AI systems.
Streamline feature development and model training
First, we need to develop the features and train the model. Tecton’s declarative framework makes it easy to define features and generate accurate training data for SageMaker models.
- Experiment and iterate on features in SageMaker notebooks – Tecton’s software development kit (SDK) lets you interact with Tecton directly from a SageMaker notebook instance, allowing flexible experimentation and iteration without leaving the SageMaker environment.
- Orchestrate with Tecton-managed EMR clusters – After a feature is deployed, Tecton automatically handles the scheduling, provisioning, and orchestration of pipelines that run on the Amazon EMR compute engine. You can view and create EMR clusters directly through SageMaker notebooks.
- Generate accurate training data for SageMaker models – For model training, data scientists can use Tecton’s SDK within a SageMaker notebook to retrieve historical features, as shown in the sketch after this list. Because the same code backfills the offline store and continuously updates the online store, training/serving skew is reduced.
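Here is a minimal sketch of what retrieving training data could look like from a SageMaker notebook. The workspace name, feature service name, and training events are hypothetical, and the exact SDK methods vary across Tecton versions.

```python
import pandas as pd
import tecton

# Connect to a Tecton workspace (the names here are hypothetical)
ws = tecton.get_workspace("prod")
feature_service = ws.get_feature_service("fraud_detection_feature_service")

# A "spine" of labeled training events: entity keys, timestamps, and labels
training_events = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": pd.to_datetime(["2024-06-01 12:00:00", "2024-06-02 08:30:00"]),
        "is_fraud": [0, 1],
    }
)

# Join point-in-time correct historical feature values onto the training events
training_data = feature_service.get_features_for_events(training_events).to_pandas()
```

The resulting DataFrame can be written to Amazon S3 and used directly as input to a SageMaker training job.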
Next, the features must be served online so that the final model can be used in production.
Serve features with robust real-time online inference
Tecton’s declarative framework also extends to online serving. Tecton’s real-time infrastructure is designed to meet the demands of a wide range of applications and can reliably handle 100,000 requests per second.
For mission-critical ML applications, meeting demanding service level agreements (SLAs) in a scalable and cost-effective way can be difficult. For real-time use cases such as fraud detection, p99 latency budgets typically range from 100 to 200 ms, meaning 99% of requests must complete in under 200 ms for the end-to-end process of feature retrieval, model scoring, and post-processing.
Only a portion of that end-to-end latency budget is allocated to feature retrieval, so the feature-serving solution needs to be especially fast. Tecton meets these latency requirements by serving features for inference through a low-latency REST API that works with both disk-based and in-memory data stores, supports in-memory caching, and integrates with SageMaker endpoints.
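As an illustration, a real-time feature lookup against Tecton’s HTTP API could look like the following sketch. The cluster URL, API key, workspace, feature service, and join key values are hypothetical placeholders.

```python
import requests

# Hypothetical cluster URL; the workspace and feature service names below
# would match what was deployed with the declarative framework.
TECTON_URL = "https://example.tecton.ai/api/v1/feature-service/get-features"

response = requests.post(
    TECTON_URL,
    headers={"Authorization": "Tecton-key <YOUR_API_KEY>"},
    json={
        "params": {
            "workspace_name": "prod",
            "feature_service_name": "fraud_detection_feature_service",
            "join_key_map": {"user_id": "user_1"},
        }
    },
    timeout=0.2,  # keep the lookup well inside the p99 latency budget
)
feature_vector = response.json()["result"]["features"]
```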
Now let’s complete the fraud detection use case. In a fraud detection system, when someone makes a transaction (such as purchasing something online), the application might take the following steps:
- Call third-party APIs to get more information (for example, “Is this seller known to be risky?”).
- Request ML features from Tecton to get important historical data about the user and their behavior (for example, “How much does this person usually spend?” or “Has this person shopped at this location before?”).
- Possibly use streaming features to compare the current transaction against spending activity from the past few minutes or hours.
- Send all this information to a model hosted on Amazon SageMaker, which predicts whether the transaction is fraudulent.
This process is shown in the following diagram.
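In code, the scoring step of that flow could look like the following sketch. It assumes features have already been retrieved from Tecton (as in the earlier REST example), and the endpoint name and response schema are hypothetical.

```python
import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")


def score_transaction(features: dict, amount: float) -> float:
    """Score a transaction for fraud risk, given features retrieved from Tecton."""
    # Combine real-time transaction data with the precomputed feature values
    payload = {"amount": amount, **features}

    # Invoke the SageMaker endpoint hosting the fraud model
    # (the endpoint name is a hypothetical placeholder)
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName="fraud-detection-endpoint",
        ContentType="application/json",
        Body=json.dumps(payload),
    )

    # Assumes the model container returns JSON like {"fraud_probability": 0.97}
    result = json.loads(response["Body"].read())
    return float(result["fraud_probability"])
```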
Extend to generative AI use cases with your existing AWS and Tecton architecture
After you develop ML features using Tecton and the AWS architecture, you can extend that work to generative AI use cases.
For example, in the fraud detection scenario, you could add an LLM-powered customer support chat to help users answer questions about their accounts. To generate useful responses, the chat needs to reference various data sources, including unstructured documents in your knowledge base (such as policy documents covering the causes of account suspension) as well as structured data such as transaction history and real-time account activity.
If you’re using a Retrieval Augmented Generation (RAG) architecture to provide context to the LLM, you can use your existing ML feature pipelines to supply that context. With Tecton, you can use the same declarative framework to enrich your prompts with contextual data and to provide features as tools to your LLM.
To help you choose and customize the best model for your use case, Amazon Bedrock offers a variety of pretrained foundation models (FMs) for inference. Alternatively, you can use SageMaker for more extensive model building and training.
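As a sketch of that pattern, the following shows how feature values retrieved from Tecton could enrich a prompt sent to a model on Amazon Bedrock via the Converse API. The feature names and values are illustrative, and the model shown is just one example.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Feature values retrieved from Tecton, as in the earlier sketches (illustrative)
features = {"transaction_count_24h": 3, "avg_spend_30d": 42.50}

# Enrich the prompt with structured, real-time context about the account
prompt = (
    "You are a customer support assistant for a payments application.\n"
    f"Recent account activity for this customer: {features}\n"
    "Customer question: Why was my card declined this morning?"
)

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
answer = response["output"]["message"]["content"][0]["text"]
print(answer)
```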
The following diagram shows how Amazon Bedrock is incorporated to support generative AI capabilities in a fraud detection system architecture.
Build valuable AI apps faster with AWS and Tecton
In this post, you learned how SageMaker and Tecton help AI teams train and deploy high-performance, real-time AI applications without complex data engineering effort. Tecton combines production ML capabilities with the convenience of working entirely within SageMaker, whether you’re in the development phase training a model or running real-time inference in production.
To get started, see Getting Started with Amazon SageMaker and Tecton’s Feature Platform, a detailed guide on using Tecton with Amazon SageMaker. If you can’t wait to try it yourself, check out Tecton’s interactive demo to see the fraud detection use case in action.
You can also find Tecton at AWS re:Invent. Contact us to set up a meeting with an on-site expert about your AI engineering needs.
About the authors
Isaac Cameron is Lead Solutions Architect at Tecton, guiding customers in designing and deploying real-time machine learning applications. Having previously built a custom ML platform from scratch for a major US airline, he has first-hand experience with the challenges and complexities involved and is a strong advocate for leveraging modern, managed ML/AI infrastructure.
Alex Gnibus is a technical evangelist at Tecton, making technical concepts accessible and actionable for engineering teams. Through this work educating practitioners, Alex has developed deep expertise in identifying and addressing the practical challenges teams face when operationalizing AI systems.
Arnab Sinha is a Senior Solutions Architect at AWS, specializing in designing scalable solutions that drive business outcomes in AI, machine learning, big data, digital transformation, and application modernization. With expertise across industries such as energy, healthcare, retail, and manufacturing, Arnab holds all AWS certifications, including the ML specialty, and led technology and engineering teams before joining AWS.