
OpenAI wants AI to help humans train AI

One of the key ingredients behind ChatGPT's huge success was its army of human trainers, who gave the artificial intelligence model behind the bot guidance on what constitutes good and bad output. OpenAI now says that adding more AI to assist those human trainers will make its AI helpers smarter and more reliable.

In developing ChatGPT, OpenAI pioneered reinforcement learning from human feedback (RLHF). This technique uses input from human testers to fine-tune an AI model so that its output is judged more coherent, less objectionable, and more accurate. The ratings the trainers give are fed back into the algorithm that steers the model's behavior. The technique has proven invaluable in making chatbots more reliable and useful, and in preventing them from misbehaving.
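For concreteness, here is a minimal sketch of the reward-modeling step that underpins RLHF, written in PyTorch. Everything here, including the `reward_model` interface, is an illustrative assumption rather than OpenAI's actual pipeline: a reward model is fit to pairs of outputs where a human preferred one over the other, using a Bradley-Terry-style loss, and the resulting reward signal is then used to steer the chat model during reinforcement learning.

```python
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    # reward_model is assumed to map a batch of token-id sequences
    # to one scalar reward per sequence (illustrative interface).
    r_chosen = reward_model(chosen_ids)
    r_rejected = reward_model(rejected_ids)
    # Bradley-Terry objective: maximize the probability that the
    # human-preferred response scores higher than the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

Inconsistent human ratings show up here directly: if trainers disagree on which response is better, the pairs pull the reward model in opposite directions, which is one of the limitations McAleese describes below.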

“Although RLHF works very well, it has some important limitations,” said Nat McAleese, an OpenAI researcher working on the new study. First, human feedback can be inconsistent. Also, evaluating highly complex outputs, such as advanced software code, can be difficult even for experienced humans. The process can also optimize the model to produce outputs that seem convincing without actually being accurate.

OpenAI has developed a new model by fine-tuning its most powerful offering, GPT-4, to help human trainers evaluate code. The company found that the new model, dubbed CriticGPT, can catch bugs that humans miss, and that human reviewers preferred its critiques of code 63 percent of the time. OpenAI is considering extending the approach to areas beyond code in the future.
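OpenAI has not published CriticGPT's interface, but the workflow the company describes, in which a model drafts a critique and a human vets it, might look conceptually like the hypothetical sketch below. The `critic` object, its `generate` method, and both function signatures are invented here for illustration.

```python
def assisted_review(code_sample: str, critic, human_edit) -> str:
    # 1. The critic model drafts a critique, flagging suspected bugs.
    draft = critic.generate(
        "Review this code and point out any bugs:\n" + code_sample
    )
    # 2. A human trainer vets the draft: keeping real findings,
    #    cutting hallucinated ones, and adding anything missed.
    # 3. The vetted critique then serves as higher-quality RLHF
    #    feedback than an unassisted human rating would be.
    return human_edit(code_sample, draft)
```

The design bet is that editing a machine-drafted critique is easier and more reliable for a human than spotting subtle bugs from scratch, which is consistent with the 63 percent preference figure reported above.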

“We’re starting to work on integrating this technology into the RLHF chat stack,” McAleese said. He cautioned that the approach is imperfect, since CriticGPT can also make mistakes by hallucinating, but added that the technique could help make OpenAI’s models, and tools like ChatGPT, more accurate by reducing errors in human training. It could also prove crucial in making AI models smarter over time, because it may let humans help train AI that exceeds their own abilities. “As models get better and better, we think people will need more help,” McAleese said.

This new technique is one of many currently being developed to improve large language models and unlock more of their capabilities, and it is also part of an effort to ensure that AI operates within acceptable limits as it grows more capable.

Earlier this month, OpenAI rival Anthropic, founded by former OpenAI employees, announced a more capable version of its chatbot, called Claude, thanks to improvements to the model’s training plan and input data. Anthropic and OpenAI also recently touted new ways to inspect AI models to understand how they arrive at their outputs in order to more effectively prevent undesirable behaviors such as deception.

The new technology could help OpenAI train increasingly powerful AI models while ensuring that their output is more trustworthy and better aligned with human values, especially if the company succeeds in deploying it beyond code. OpenAI has said it is training its next major AI model, and the company is clearly keen to show it is serious about checking how such models behave. That effort follows the dissolution of a prominent team dedicated to assessing the long-term risks posed by AI. The team was co-led by company co-founder and former board member Ilya Sutskever, who took part in briefly ousting CEO Sam Altman before recanting and helping Altman regain control. Several members of the team have since criticized the company for taking risks in its rush to develop and commercialize powerful AI algorithms.

The idea of using AI models to help train more powerful models has been discussed for a while, says Dylan Hadfield-Menell, a professor at MIT who studies ways to align AI. “It’s a pretty natural evolution,” he says.

Hadfield-Menell notes that the researchers who first developed the techniques used in RLHF discussed a related idea several years ago. He says it remains to be seen how generally applicable and powerful the approach is. “It could lead to significant improvements in individual performance, and it could be a stepping stone to more effective feedback in the longer term,” he says.
