His group decided to find out. They created a new, diversified version of AlphaZero, which includes multiple AI systems that are trained independently and under different conditions. The algorithm governing the overall system acts as a kind of virtual matchmaker, Zahavy said: one designed to identify which agent has the best chance of succeeding when it’s time to make a move. He and his colleagues also coded in a “diversity bonus,” a reward for the system whenever it drew its strategies from a large selection of choices.
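As a rough illustration of the two ideas described above, the matchmaker and the diversity bonus can be sketched in a few lines of Python. This is not the team's implementation. The `Agent` class, the `estimate_win_chance` callback, the count-based bonus formula and the `bonus_weight` value are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Position = str  # hypothetical stand-in for a real board representation


@dataclass
class Agent:
    name: str
    # Hypothetical: this agent's own estimate, between 0 and 1,
    # of its chance of winning from a given position.
    estimate_win_chance: Callable[[Position], float]


def pick_agent(agents: List[Agent], position: Position) -> Agent:
    """Matchmaker: choose the agent that rates its own chances highest."""
    return max(agents, key=lambda agent: agent.estimate_win_chance(position))


def reward_with_diversity_bonus(
    game_reward: float,
    strategy: str,
    usage_counts: Dict[str, int],
    bonus_weight: float = 0.1,  # assumed value, for illustration only
) -> float:
    """Add a bonus that shrinks each time the same strategy is reused.

    The count-based formula is an assumption; the article only says the
    system was rewarded for drawing on a wide selection of strategies.
    """
    count = usage_counts.get(strategy, 0)
    usage_counts[strategy] = count + 1
    return game_reward + bonus_weight / (1 + count)
```

In this sketch, a strategy that has never been used earns the full bonus, and the bonus shrinks with every repeat, which nudges the system toward trying something else.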

Chess piece

When the new system was let loose to play its own games, the team observed a lot of variety. The diversified AI player experimented with new, effective openings and novel (but sound) decisions about specific strategies, such as when and where to castle. In most matches, it defeated the original AlphaZero. The team also found that the diversified version could solve twice as many challenge puzzles as the original and could solve more than half of the total catalog of Penrose puzzles.

“The idea is, instead of finding one solution, or one single policy, that would beat any player, here [it uses] the idea of creative diversity,” Cully said.

With access to more and different games played, the diversified AlphaZero had more options when sticky situations arose, Zahavy said. “If you can control the kind of games that it sees, you basically control how it will generalize,” he said. Those unusual intrinsic rewards (and the moves associated with them) can become the driving forces of diverse behaviors. The system can then learn to assess and value the different approaches and see when they are most successful. “We found that this group of agents can actually come to an agreement on these positions.”

And, importantly, the implications go beyond chess.

Real life creativity

Cully said the diversified approach can help any AI system, not just one based on reinforcement learning. He has long used diversity to train physical systems, including a six-legged robot that was allowed to explore a variety of movements before he deliberately “injured” it, allowing it to keep moving by using some of the techniques it had developed earlier. “We were just trying to find solutions that were different from all the previous solutions we had found so far.” Recently, he has also been collaborating with researchers to use diversity to identify promising new drug candidates and develop effective stock-trading strategies.

“The goal is to generate a large collection of potentially thousands of different solutions, where each solution is very different from the next,” Cully said. So, just as the diversified chess player learned to do, for each type of problem the overall system can choose the best possible solution. Zahavy’s AI system clearly demonstrates “how exploring diverse strategies helps to think outside the box and find solutions,” he said.
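A toy version of that idea, under simplifying assumptions, might look like the Python sketch below: keep a candidate only if it is sufficiently different from everything already kept, then pick whichever stored solution scores best on the problem at hand. This is a plain novelty filter written for illustration, not Cully's actual algorithm. The `Solution` type, the Euclidean `distance` measure and the `min_distance` threshold are all hypothetical.

```python
from typing import Callable, List, Sequence

Solution = Sequence[float]  # hypothetical: a solution described by a few numbers


def distance(a: Solution, b: Solution) -> float:
    """Euclidean distance, used here as an assumed measure of difference."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_diverse_archive(
    candidates: List[Solution], min_distance: float
) -> List[Solution]:
    """Keep only candidates that differ enough from every solution kept so far."""
    archive: List[Solution] = []
    for candidate in candidates:
        if all(distance(candidate, kept) >= min_distance for kept in archive):
            archive.append(candidate)
    return archive


def best_for_problem(
    archive: List[Solution], score: Callable[[Solution], float]
) -> Solution:
    """Choose the stored solution that scores highest on one particular problem."""
    return max(archive, key=score)
```

Because the archive is built without reference to any single problem, the same collection can be reused: a different `score` function simply selects a different member.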

Zahavy suspects that for AI systems to think creatively, researchers simply need to get them to consider more options. This hypothesis suggests an interesting connection between humans and machines: intelligence may simply be a matter of computational power. For an AI system, perhaps creativity boils down to the ability to consider and choose from a large buffet of options. As the system reaps rewards for selecting a variety of optimal strategies, this type of creative problem solving is reinforced. Ultimately, in theory, it could emulate any type of problem-solving strategy recognized as creative in humans. Creativity becomes a computational problem.

Liemhetcharat notes that a diverse AI system is unlikely to fully solve the broader generalization problem in machine learning. But this is a step in the right direction. “It’s narrowing down the gaps,” he said.

More practically, Zahavy’s findings echo recent efforts showing how cooperation can lead to better performance on difficult tasks among humans. Most of the hits on the Billboard 100 list were written by teams of songwriters, for example, not individuals. And there is still room for improvement. The diversified approach is currently computationally expensive, as it must consider many more possibilities than a typical system. Zahavy is also not convinced that even the diversified AlphaZero captures the full spectrum of possibilities.

“I still [think] there’s room to find different solutions,” he said. “It’s not clear to me that, given all the data in the world, there is [only] one answer to every question.”

Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance the public’s understanding of science by covering research developments and trends in mathematics and the physical and life sciences.