Bo Li, an associate professor at the University of Chicago who specializes in stress testing and detecting misconduct in AI models, has become a go-to resource for some consulting firms, which are often less concerned with how smart an AI model is than with how many legal, ethical or regulatory compliance issues it might pose.
Li and several other university colleagues, along with Virtue AI and Lapis Labs, two companies Li co-founded, recently developed a taxonomy of AI risk and a benchmark that reveals how different large-scale language models break the rules. “We need some principles for AI safety, both in terms of regulatory compliance and normal use,” Li told WIRED.
The researchers analyzed government AI regulations and guidelines, including from the US, China and the EU, and studied the usage policies of 16 leading AI companies around the world.
The researchers also built AIR-Bench 2024, a benchmark that uses thousands of prompts to determine how well general AI models perform on specific risks. For example, Anthropic’s Claude 3 Opus scores highly for refusing to generate cybersecurity threats, and Google’s Gemini 1.5 Pro scores highly for avoiding generating non-consensual sexual nudity.
The DBRX Instruct model developed by Databricks received the worst scores across the board, and the company said when it unveiled the model in March that it would continue to improve the model’s safety features.
Anthropik, Google and Databricks did not immediately respond to requests for comment.
Understanding the risk landscape and the strengths and weaknesses of specific models may become increasingly important for companies looking to deploy AI in specific markets or for specific use cases. For example, a company looking to use LLMs for customer service may be more concerned about a model’s tendency to produce offensive language when provoked than its ability to design nuclear weapons.
Bo said the analysis also uncovered some interesting questions about how AI should be developed and regulated. For example, the researchers found that government rules are not as comprehensive as companies’ overall policies, suggesting there is room for stronger regulation.
The analysis also suggests that some companies need to do more to ensure their models are safe: “When you test models against a company’s own policies, they’re not always compliant,” Bo said. “That means there’s a lot of room for improvement.”
Other researchers are trying to sort out the confusing landscape of AI risks. This week, two MIT researchers published their own AI Hazards Database compiled from 43 different AI risk frameworks. “Many organizations are very early in their AI adoption process,” meaning they need guidance on possible dangers, says Neil Thompson, an MIT research scientist working on the project.
Peter Slattery, the project leader and a researcher at MIT’s FutureTech group, which studies advances in computing, said the database highlights the fact that some AI risks are getting more attention than others. For example, more than 70% of the frameworks mention privacy and security issues, but only about 40% mention misinformation.
Efforts to categorize and measure AI risk need to evolve as AI evolves. Li says it’s important to investigate emerging issues like the emotional stickiness of AI models. Her company recently analyzed the largest and most powerful version of Meta’s Llama 3.1 model. They found that while the models have become more capable, their safety hasn’t improved much, reflecting a broader disconnect. “Safety hasn’t really improved significantly,” Li said.