OpenAI has sprung out of its quiet spell with a new research paper on ‘Prover-Verifier Games’ (PVG) for LLMs. PVGs aim to improve the ‘legibility’ of LLM outputs, that is, to ensure that LLMs produce understandable and logically sound text even for complex tasks such as solving maths problems or coding.
In this method, OpenAI trained advanced language models to generate text that can be easily verified by weaker models. This training was observed to make the text more comprehensible to human evaluators as well, which points to improved legibility.
The ‘Prover’ and ‘Verifier’
“Techniques like this seem promising for training superhuman models to explain their actions in a way that humans can understand better (and get less fooled by). I’d be excited to see this method tried on harder tasks and with stronger models,” said Jan Leike, co-author of the recent PVG paper and a former OpenAI researcher.
The paper builds on the original concept of PVG introduced in 2021, a game-theoretic framework designed to incentivise learning agents to solve decision problems in a verifiable manner.
Akin to a checks-and-balances system, a ‘prover’ generates a solution which a ‘verifier’ checks for correctness. OpenAI’s method trains small verifiers to judge whether solutions are correct, ‘helpful’ provers to produce correct solutions that the verifier approves of, and ‘sneaky’ provers to produce incorrect solutions that try to fool the verifier.
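As a rough illustration of how the three roles interact, here is a minimal toy sketch of one training round in Python. The model stand-ins, reward rules and toy arithmetic problems below are assumptions made purely for illustration; they are not drawn from OpenAI’s actual implementation.

```python
import random

# Illustrative sketch of one round of a prover-verifier game.
# All names and reward rules here are assumptions for illustration only,
# not OpenAI's actual code.

problems = [{"question": f"{a} + {b}", "answer": a + b}
            for a, b in [(2, 3), (7, 5), (10, 4)]]

def is_correct(problem, solution):
    # Ground-truth check; the method assumes such labels are available.
    return solution == problem["answer"]

def verifier_accepts(problem, solution):
    # Stand-in verifier: a real one would be a small trained model
    # scoring how convincing the written solution looks.
    return random.random() > 0.3

def helpful_prover(problem):
    # Stand-in helpful prover: tries to answer correctly.
    return problem["answer"]

def sneaky_prover(problem):
    # Stand-in sneaky prover: deliberately produces a wrong answer.
    return problem["answer"] + 1

verifier_training_data = []
for problem in problems:
    helpful = helpful_prover(problem)
    sneaky = sneaky_prover(problem)

    # Helpful prover is rewarded when its solution is correct AND accepted.
    helpful_reward = int(is_correct(problem, helpful)
                         and verifier_accepts(problem, helpful))

    # Sneaky prover is rewarded when a wrong solution still gets accepted.
    sneaky_reward = int(not is_correct(problem, sneaky)
                        and verifier_accepts(problem, sneaky))

    # The verifier is then retrained to separate the two kinds of solutions.
    verifier_training_data.append((problem["question"], helpful, True))
    verifier_training_data.append((problem["question"], sneaky, False))

    print(problem["question"], "helpful reward:", helpful_reward,
          "sneaky reward:", sneaky_reward)
```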
It was noticed that over the course of training, the prover’s accuracy and the verifier’s robustness to adversarial attacks increased. Interestingly, the PVG system relies on a form of reinforcement learning, something OpenAI’s co-founder and former chief scientist Ilya Sutskever was a strong advocate of.
Prover-Verifier Games for LLMs (Source: X)
Looking back at the history of OpenAI’s models well before ChatGPT, the company had been working extensively on reinforcement learning systems. In 2018, OpenAI Five, which was built on five neural networks, defeated human teams at Dota 2. The system played 180 years’ worth of games against itself, with a reward mechanism in the loop to train itself.
“The neural network is going to take the observations and produce actions and then for a given setting of the parameters, you could figure out how to calculate how good they are. Then you could calculate how to compute the way to change these parameters to improve the model,” said Sutskever at an old Berkeley EECS seminar.
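In code terms, the loop Sutskever describes is essentially a policy-gradient update: sample actions from a parameterised policy, score them, then nudge the parameters towards actions that scored well. The sketch below shows this on a two-armed bandit; it is a toy assumption for illustration and has nothing to do with OpenAI Five’s actual training code.

```python
import numpy as np

# Toy REINFORCE-style update on a 2-armed bandit, illustrating the loop:
# observations -> actions -> reward -> parameter update.
# Purely illustrative; not OpenAI Five's training code.

rng = np.random.default_rng(0)
theta = np.zeros(2)              # policy parameters (one logit per action)
rewards_per_action = [0.2, 0.8]  # action 1 pays off more often

for step in range(2000):
    probs = np.exp(theta) / np.exp(theta).sum()         # softmax policy
    action = rng.choice(2, p=probs)                      # "produce actions"
    reward = float(rng.random() < rewards_per_action[action])  # "how good they are"

    # Policy-gradient estimate: raise the log-probability of the chosen
    # action in proportion to the reward it received.
    grad_log_prob = -probs
    grad_log_prob[action] += 1.0
    theta += 0.1 * reward * grad_log_prob                # "change these parameters"

print("learned action probabilities:", np.exp(theta) / np.exp(theta).sum())
```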
Interestingly, PVG works along similar lines. However, it comes with limitations. The experiments were done on maths problems, which have answers that can be objectively checked as right or wrong. With topics that involve broad subjectivity, the PVG approach for an LLM may struggle.
“It’s hard and expensive to codify the rules of life. How do we objectively determine whether one poem is more beautiful than another?
“I think a very interesting metric would be to measure the accuracy of the fine-tuned models on unrelated tasks to see if the lessons learned to be better at explaining maths problems would help the model perform better on explaining other problems (such as logic or reasoning),” said a user on HackerNews.
PVG for Superintelligence
The prover-verifier gaming system looks to improve the accuracy of LLM-generated results. Not just that, it also charts a path towards achieving superintelligence.
The methodology has a significant advantage in reducing the dependence on human demonstrations or judgments of legibility. This independence is particularly relevant for future superintelligence alignment.
While the study focused on a single dataset and currently requires ground-truth labels, these methodologies are expected to prove pivotal in developing AI systems whose outputs are not only correct but also transparently verifiable, thereby enhancing trust and safety in real-world applications. However, whether the new method will become the next standard for LLM accuracy remains to be seen.