Artificial Intelligence Learns to Program Itself: The Darwin-Gödel Algorithm

New research demonstrates AI agents that can recursively improve their own code. The Darwin-Gödel algorithm lets coding agents rewrite themselves, producing sizable gains on automated programming benchmarks.

G. Ostrov

June 29, 2025

Researchers have closed a long-sought loop: AI agents that recursively improve themselves. New research presents a working example of such a system, based on the Darwin-Gödel algorithm.

History of Self-Improving Systems Development

In 2003, computer scientist Jürgen Schmidhuber described problem solvers that rewrite their own code only when they can formally prove that an update is useful. He called these systems "Gödel machines" in honor of Kurt Gödel, the mathematician known for his work on self-referential systems. For complex agents, however, such proofs of usefulness are difficult to obtain.

Darwin-Gödel Algorithm: A New Approach

The new systems described in recent research rely on empirical evidence instead of formal proofs. In tribute to Schmidhuber, they are called Darwin-Gödel Machines (DGMs). A DGM starts with a coding agent that can read, write, and execute code, using large language models (LLMs) for the reading and writing.

The system then applies an evolutionary algorithm to create many new agents. At each iteration, the DGM selects one agent from the population and asks the LLM to make one change that should improve the agent's coding ability. Each new agent is then scored on a coding benchmark. Because LLMs are trained on large amounts of human-written code, they have some intuition about which changes might help.
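The iteration described above can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' implementation: the names `Agent`, `dgm_step`, `toy_propose_change`, and `toy_evaluate` are hypothetical, the parent is chosen uniformly at random (an assumption), and the toy functions stand in for the LLM call and the benchmark run.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Agent:
    code: str                     # the agent's own source code
    score: float                  # benchmark score in [0, 1]
    parent: Optional[int] = None  # population index of the parent, None for the seed


def dgm_step(
    population: List[Agent],
    propose_change: Callable[[str], str],
    evaluate: Callable[[str], float],
    rng: random.Random,
) -> Agent:
    """One iteration: pick a parent, apply one change, score and store the child."""
    parent_idx = rng.randrange(len(population))   # uniform choice (assumption)
    child_code = propose_change(population[parent_idx].code)
    child = Agent(child_code, evaluate(child_code), parent_idx)
    population.append(child)
    return child


# Toy stand-ins so the sketch runs without an LLM or a real benchmark.
def toy_propose_change(code: str) -> str:
    return code + "\n# tweak"


def toy_evaluate(code: str) -> float:
    return min(1.0, 0.1 * code.count("# tweak"))


population = [Agent(code="# seed agent", score=0.0)]
rng = random.Random(0)
for _ in range(5):
    dgm_step(population, toy_propose_change, toy_evaluate, rng)
```

In a real system, `propose_change` would prompt an LLM with the parent's code, and `evaluate` would run the child agent on benchmark tasks inside a sandbox.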

Unique System Features

Unlike traditional evolutionary algorithms, which keep only the best performers, DGMs preserve every agent. An innovation that fails at first may become the key to a breakthrough after further development, so nothing is discarded. This is a form of "open-ended search" that does not close off any path to progress.
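The difference between the two selection styles can be illustrated with a short Python sketch. It is an assumption-laden toy and not the method from the research: the function names and the `floor` parameter are hypothetical, and the weighting rule (score plus a small constant) is just one simple way to keep every agent selectable.

```python
import random
from typing import List


def top_k(scores: List[float], k: int) -> List[int]:
    """Traditional truncation: only the k best agents may become parents."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def selection_weights(scores: List[float], floor: float = 0.05) -> List[float]:
    """Open-ended variant: every agent keeps a nonzero chance of being picked."""
    return [max(s, 0.0) + floor for s in scores]


def pick_parent(scores: List[float], rng: random.Random, floor: float = 0.05) -> int:
    """Sample one parent index, favoring high scores but excluding no one."""
    weights = selection_weights(scores, floor)
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


scores = [0.5, 0.0, 0.2]   # agent 1 currently fails every task
rng = random.Random(0)
picks = [pick_parent(scores, rng) for _ in range(2000)]
```

With truncation, `top_k(scores, 1)` would only ever allow agent 0 to reproduce. With the weighted scheme, agent 1 is still chosen occasionally, so a currently failing lineage can continue to be developed.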

Impressive Testing Results

The researchers ran DGMs for 80 iterations on two coding benchmarks, SWE-bench and Polyglot. The gains were substantial:

On SWE-bench, agent scores improved from 20% to 50%
On Polyglot, from 14% to 31%

"We were really very surprised that the agent could write such complex code by itself," said Jenny Zhang, lead author of the study from the University of British Columbia. "It could edit multiple files, create new files, and create really complex systems."

Safety and Limitations

Aware of the potential risks, the researchers added safeguards. DGMs were kept in sandboxed environments without internet or operating system access, and all code changes were logged and verified. Future research may reward agents for making themselves more interpretable and aligned.

The best SWE-bench agent has not yet matched the best agents designed by human experts, which score around 70%, but it was created entirely automatically. With sufficient time and computational resources, such agents may surpass human expertise in programming.

