As generative AI continues to drive innovation across industries and daily life, the need for responsible AI becomes increasingly important. At AWS, we believe that the long-term success of AI depends on inspiring trust among users, customers, and society. This belief is at the heart of our long-standing commitment to building and using AI responsibly. Responsible AI does more than reduce risk and align with relevant standards and regulations; it's about proactively building trust and unlocking AI's potential to drive business value. A holistic approach to responsible AI empowers organizations to innovate boldly and achieve transformative business outcomes. A new joint study from Accenture and AWS confirms this, highlighting responsible AI as a key driver of business value that increases product quality, operational efficiency, customer loyalty, brand awareness, and more. Almost half of the companies surveyed recognize responsible AI as critical to driving AI-related revenue growth. Why? Because responsible AI builds trust, and trust accelerates adoption and innovation.
With trust as the cornerstone of AI adoption, we are excited to announce at AWS re:Invent 2024 new responsible AI tools, capabilities, and resources that enhance the safety, security, and transparency of our AI services and models, and that help support customers' own responsible AI journeys.
Take proactive steps to manage AI risk and promote trust and interoperability
AWS is the first major cloud service provider to announce ISO/IEC 42001 certification for AI services, covering Amazon Bedrock, Amazon Q Business, Amazon Textract, and Amazon Transcribe. ISO/IEC 42001 is an international management system standard that outlines requirements for organizations to responsibly manage AI systems throughout their lifecycle. Technical standards such as ISO/IEC 42001 matter because they provide a common framework for responsible AI development and deployment, fostering trust and interoperability in an increasingly global, AI-driven technology landscape. Achieving ISO/IEC 42001 certification means that an independent third party has validated that AWS is taking proactive steps to manage the risks and opportunities associated with developing, deploying, and operating AI. With this certification, we reinforce our commitment to providing AI services that help customers innovate responsibly with AI.
Expanding Amazon Bedrock Guardrails Safeguards to Increase Transparency and Safety
In April 2024, we announced the general availability of Amazon Bedrock Guardrails, which makes it easier to apply safety and responsible AI checks to generative AI applications. On top of the native protections provided by foundation models (FMs), Amazon Bedrock Guardrails delivers industry-leading safety protection, blocking up to 85% more harmful content and filtering over 75% of hallucinated responses in retrieval-augmented generation (RAG) and summarization use cases through contextual grounding checks. The ability to implement these safeguards was a major step forward in building trust in AI systems. Despite advances in FMs, models can still produce hallucinations, a challenge many of our customers face. For use cases where accuracy is critical, customers need mathematically sound techniques and explainable reasoning to help generate accurate FM responses.
To address this need, we are adding new safeguards to Amazon Bedrock Guardrails that help prevent factual errors caused by FM hallucinations and provide verifiable proof. With the launch of Automated Reasoning checks in Amazon Bedrock Guardrails (preview), AWS becomes the first and only major cloud provider to integrate automated reasoning into its generative AI offerings. Automated Reasoning checks help prevent factual errors from hallucinations by using sound, mathematical, logic-based algorithmic verification and reasoning processes to verify the information generated by a model, so that outputs are consistent with the facts provided and are not based on hallucinated or inconsistent data. Used alongside other techniques such as prompt engineering, RAG, and contextual grounding checks, Automated Reasoning checks add a more rigorous and verifiable approach to improving the accuracy of LLM-generated output. By encoding domain knowledge into structured policies, conversational AI applications can provide reliable information to their users.
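To make the inference-time flow concrete, here is a minimal sketch of attaching an existing guardrail (including whatever policies, such as an Automated Reasoning policy, have been configured on it) to a Bedrock Converse call with boto3. The guardrail ARN, version, model ID, and Region are placeholders, not values from this announcement.

```python
import boto3

# Bedrock runtime client; the Region here is an assumption for this sketch
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[
        {"role": "user", "content": [{"text": "What is our refund window for digital goods?"}]}
    ],
    # Attach a guardrail created in the console or via create_guardrail;
    # the identifier and version below are placeholders
    guardrailConfig={
        "guardrailIdentifier": "arn:aws:bedrock:us-east-1:111122223333:guardrail/EXAMPLE",
        "guardrailVersion": "1",
    },
)

print(response["output"]["message"]["content"][0]["text"])
# If the guardrail blocks or rewrites content, stopReason is "guardrail_intervened"
print(response["stopReason"])
```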
Click on the image below to watch a demo of Automated Reasoning checks in Amazon Bedrock Guardrails.
The need for content filters extends beyond text, as organizations increasingly use applications with multimodal data to drive business value, improve decision-making, and enhance customer experiences. With multimodal toxicity detection with image support (in preview), Amazon Bedrock Guardrails can now detect and filter undesirable and potentially harmful image content while retaining safe and relevant visuals. Multimodal toxicity detection reduces the heavy lifting of building your own safeguards for image data and the time otherwise spent on error-prone, tedious manual evaluation. Amazon Bedrock Guardrails helps you build AI applications responsibly and build trust with your users.
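Guardrails can also be called standalone, independent of any model invocation, which is one way to screen user-supplied images. The sketch below uses the ApplyGuardrail API; the guardrail ARN and version are placeholders, and the image content shape reflects our reading of the preview documentation, so treat it as an assumption.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("user_upload.jpg", "rb") as f:
    image_bytes = f.read()

# Standalone check of user-provided content before it ever reaches the model;
# the image block format below is based on the preview docs as we understand them
response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="arn:aws:bedrock:us-east-1:111122223333:guardrail/EXAMPLE",
    guardrailVersion="1",
    source="INPUT",  # evaluate user input; use "OUTPUT" for model responses
    content=[
        {"text": {"text": "Please describe this image."}},
        {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
    ],
)

if response["action"] == "GUARDRAIL_INTERVENED":
    print("Blocked:", response["outputs"])
else:
    print("Content passed the configured filters")
```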
Improve the responses and quality of generative AI applications with new Amazon Bedrock evaluation capabilities
With more general-purpose FMs to choose from, organizations now have a wider range of options to power their generative AI applications. However, selecting the best model for a specific use case requires efficiently comparing models against the quality and responsible AI metrics that matter to your organization. Evaluation is an important part of building trust and transparency, but every new use case demands significant time, expertise, and resources, making it hard to choose the model that delivers the most accurate and safe customer experience. Amazon Bedrock Evaluations addresses this by letting you evaluate, compare, and select the best FM for your use case. You can now use LLM-as-a-judge (in preview) for model evaluations to run tests with human-like quality on your dataset, with one model acting as judge over others. You can choose the judge from LLMs hosted on Amazon Bedrock and evaluate against a variety of quality and responsible AI metrics, including correctness, completeness, and harmfulness. You can also bring your own prompt datasets to customize the evaluation with your data, and compare results across evaluation jobs to speed up decision-making. Previously, the choice was between human-based model evaluation and automated evaluation using exact string matching and other traditional natural language processing (NLP) metrics; those automated methods are fast but correlate weakly with human raters. With LLM-as-a-judge, you can get human-like evaluation quality at a much lower cost, saving up to weeks of time compared with a full human evaluation. Many organizations still want a final assessment by expert human annotators, which is why Amazon Bedrock continues to offer fully human-based evaluations, with the option to bring your own workforce or have AWS manage your custom evaluation.
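As a hedged sketch of how such a job might be started with boto3, the snippet below uses the Bedrock control-plane create_evaluation_job API. The evaluatorModelConfig field and the Builtin.* metric names follow the preview documentation as we understand it and should be treated as assumptions, as should the role ARN, S3 URIs, and model IDs.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Start an automated model evaluation that uses another LLM as the judge.
# ARNs, S3 URIs, and model IDs are placeholders; the evaluatorModelConfig
# and Builtin.* metric names reflect the preview docs as we understand them.
response = bedrock.create_evaluation_job(
    jobName="llm-judge-eval-example",
    roleArn="arn:aws:iam::111122223333:role/BedrockEvalRole",
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "General",
                    "dataset": {
                        "name": "CustomPrompts",
                        "datasetLocation": {"s3Uri": "s3://amzn-s3-demo-bucket/prompts.jsonl"},
                    },
                    "metricNames": [
                        "Builtin.Correctness",
                        "Builtin.Completeness",
                        "Builtin.Harmfulness",
                    ],
                }
            ],
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"}
                ]
            },
        }
    },
    inferenceConfig={
        "models": [{"bedrockModel": {"modelIdentifier": "amazon.nova-lite-v1:0"}}]
    },
    outputDataConfig={"s3Uri": "s3://amzn-s3-demo-bucket/eval-results/"},
)

print("Started evaluation job:", response["jobArn"])
```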
Organizations use RAG to supply FMs with up-to-date, proprietary information. RAG is a technique that retrieves data from enterprise data sources and enriches the prompt to provide more relevant and accurate responses. However, evaluating and optimizing RAG applications can be challenging because of the complexity of tuning both the retrieval and the generation components. To address this, we introduced RAG evaluation support in Amazon Bedrock Knowledge Bases (in preview). With this new evaluation capability, you can evaluate and optimize RAG applications quickly and easily, right where your data and LLMs already live. Powered by LLM-as-a-judge technology, RAG evaluation offers a choice of several judge models and metrics, including context relevance, context coverage, correctness, and faithfulness (hallucination detection). This seamless integration encourages regular evaluation, fostering a culture of continuous improvement and transparency in AI application development. By saving both cost and time compared with human evaluations, these tools empower organizations to improve their AI applications and build trust through consistent improvement.
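To make the faithfulness metric concrete, here is a hedged, framework-agnostic sketch of how an LLM judge can grade whether a RAG answer is grounded in its retrieved context. The prompt wording, 0-to-1 scale, and judge model ID are illustrative assumptions, not the managed rubric Amazon Bedrock uses internally.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

JUDGE_PROMPT = """You are grading a RAG system for faithfulness.
Context: {context}
Answer: {answer}
On a scale from 0.0 to 1.0, how fully is the answer supported by the
context alone? Reply with only the number."""

def judge_faithfulness(context: str, answer: str, judge_model: str) -> float:
    """Ask a judge LLM whether the answer is grounded in the retrieved context.
    This mirrors the idea behind the faithfulness metric; the managed Bedrock
    rubric and judge prompts are service-side and may differ."""
    response = bedrock_runtime.converse(
        modelId=judge_model,  # placeholder judge model ID
        messages=[{
            "role": "user",
            "content": [{"text": JUDGE_PROMPT.format(context=context, answer=answer)}],
        }],
    )
    return float(response["output"]["message"]["content"][0]["text"].strip())

score = judge_faithfulness(
    context="Returns are accepted within 30 days of purchase.",
    answer="You can return items within 30 days.",
    judge_model="anthropic.claude-3-5-sonnet-20240620-v1:0",
)
print(f"Faithfulness: {score:.2f}")
```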
Both the model and RAG evaluation capabilities provide natural language explanations of each score in the output file and on the AWS Management Console. Scores are normalized from 0 to 1 for ease of interpretation. Rubrics are fully documented along with the judge prompts, so you don't need to be a scientist to understand how the scores were derived. To learn more about the model and RAG evaluation capabilities, see the News Blog.
Introducing Amazon Nova, built with responsible AI at its core
Amazon Nova is a new generation of state-of-the-art FMs that deliver frontier intelligence and industry-leading price performance. Amazon Nova FMs come with built-in safeguards to detect and remove harmful content from data, reject inappropriate user inputs, and filter model outputs. We operationalize our responsible AI dimensions as a series of design objectives that guide decision-making throughout the model development lifecycle, from initial data collection and pretraining to model alignment and the implementation of post-deployment runtime mitigations. Amazon Nova Canvas and Amazon Nova Reel come with controls to support safety, security, and IP needs with responsible AI, including watermarking, content moderation, and C2PA support (available in Amazon Nova Canvas) that adds metadata by default to generated images. Amazon's safety measures to combat the spread of misinformation, child sexual abuse material (CSAM), and chemical, biological, radiological, and nuclear (CBRN) risks also apply to the Amazon Nova models. To learn more about how Amazon Nova was built responsibly, visit the Amazon Science blog.
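For readers who want to try a Nova understanding model, here is a minimal sketch of a text call through the Bedrock Converse API. The model ID follows the naming announced at re:Invent; the Region and inference parameters are assumptions for illustration.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Call Amazon Nova Lite through the Converse API; the model's built-in
# safeguards (input rejection, output filtering) apply on the service side
response = bedrock_runtime.converse(
    modelId="amazon.nova-lite-v1:0",  # Nova model ID as announced at re:Invent
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize responsible AI in two sentences."}],
    }],
    inferenceConfig={"maxTokens": 300, "temperature": 0.3},
)

print(response["output"]["message"]["content"][0]["text"])
```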
Greater transparency with new resources to advance responsible generative AI
At re:Invent 2024, we announced the availability of new AWS AI Service Cards for Amazon Nova Reel; Amazon Nova Canvas; Amazon Nova Micro, Lite, and Pro; Amazon Titan Image Generator; and Amazon Titan Text Embeddings to increase transparency for Amazon FMs. These cards provide comprehensive information on intended use cases, limitations, responsible AI design choices, and best practices for deployment and performance optimization. A key component of Amazon's responsible AI documentation, AI Service Cards give customers and the broader AI community a central resource for understanding our development process and how we build services responsibly, addressing fairness, explainability, privacy and security, safety, controllability, veracity and robustness, governance, and transparency. As generative AI continues to grow and evolve, transparency about how the technology is developed, tested, and used will be a vital element in earning the trust of organizations and their customers. Explore all 16 AI Service Cards in Responsible AI Tools and Resources.
We also updated the AWS Responsible Use of AI Guide. Drawing on our extensive learning and experience in AI, this document provides considerations for designing, developing, deploying, and operating AI systems responsibly. It was written with a diverse set of AI stakeholders and perspectives in mind, including, but not limited to, builders, decision-makers, and end users. AWS is committed to continuing to provide transparency resources like these to the broader community and to iterating on best practices as we gather feedback.
Delivering breakthrough innovation with trust at the forefront
At AWS, we are dedicated to fostering trust in AI and helping organizations of all sizes build and use AI effectively and responsibly. We are excited about the responsible AI innovations announced at re:Invent this week. From new safeguards and evaluation capabilities in Amazon Bedrock to the state-of-the-art Amazon Nova FMs, to greater trust and transparency through ISO/IEC 42001 certification and the new AWS AI Service Cards, you have more tools, resources, and built-in protections to help you innovate responsibly and unlock value with generative AI.
We encourage you to explore these new tools and resources.
About the author
Dr. Bhaskar Sridharan is Vice President of AI/ML and Data Services & Infrastructure, where he oversees the strategic direction and development of key services including Bedrock, SageMaker, and critical data platforms such as EMR, Athena, and Glue.