Discovering more efficient algorithms using Artificial Intelligence

Google DeepMind · Reinforcement Learning · Jul 19, 2023




A team at Google DeepMind is using artificial intelligence to search for algorithms that are more efficient than the best human-designed ones. Their latest breakthrough improves sorting algorithms: basic procedures that order the elements of an array by size. Sorting is ubiquitous in software, from web apps to airplane autopilots. It's estimated that sorting routines run billions of times per day, so even small improvements could add up to significant resource savings worldwide.

The optimization of these algorithms had hit a ceiling with traditional analytical approaches: decades of work had yielded only marginal improvements. DeepMind, however, has spent years applying techniques such as deep learning and reinforcement learning to search for solutions to problems like this automatically.

In this article we'll look at AlphaDev, the AI system with which they managed to enhance sorting algorithms, potentially marking a turning point in computer science research.

Remember that at Escape Velocity Labs we teach the fundamentals and advanced algorithms of Reinforcement Learning in our course series on this branch of AI.


Problem formulation


In short, discovering an algorithm means finding a sequence of instructions that, when executed in order, produces a desired result: in this case, a sorted array of input values.

The researchers chose to carry out this search over assembly instructions. To do so, they framed the problem as a single-player game called AssemblyGame, which the AI learns to play progressively better. The game works as outlined in the following diagram:


The player (the AI) has a limited number of attempts to find an algorithm to solve the task (sorting values). On each turn, it receives game state information called s_t (the instructions chosen so far and the result of executing them, represented by computer memory registers).

Based on this state observation, the AI chooses the next instruction to add to its algorithm, called a_t. The game then gives a reward r_t depending on how good the choice is: whether it gets us closer to completing a sorting algorithm, and how efficient it is (fewer instructions means greater reward).

The goal is for the AI to discover the best instructions to choose based on the rewards received.
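
The loop described above can be sketched as a toy environment. This is a minimal Python illustration under stated assumptions, not DeepMind's implementation: the class name ToyAssemblyGame, the three-register machine, the tiny instruction set (mov, cmp, cmovg), the two test inputs and the reward values are all invented for this example.

```python
# Hypothetical mini instruction set: (opcode, destination, source).
REGISTERS = ("P", "Q", "S")
ACTIONS = [(op, dst, src)
           for op in ("mov", "cmp", "cmovg")
           for dst in REGISTERS for src in REGISTERS if dst != src]


def run(program, p, q):
    """Execute a program on a toy three-register machine; return (P, Q)."""
    reg = {"P": p, "Q": q, "S": 0}
    greater = False  # flag written by the most recent cmp
    for op, dst, src in program:
        if op == "mov":
            reg[dst] = reg[src]
        elif op == "cmp":
            greater = reg[dst] > reg[src]
        elif op == "cmovg" and greater:
            reg[dst] = reg[src]
    return reg["P"], reg["Q"]


class ToyAssemblyGame:
    TESTS = [(1, 2), (2, 1)]  # inputs the finished program must sort

    def __init__(self, max_len=6):
        self.max_len = max_len
        self.program = []

    def state(self):
        # s_t: the instructions so far plus the registers they produce.
        outputs = tuple(run(self.program, p, q) for p, q in self.TESTS)
        return tuple(self.program), outputs

    def solved(self):
        # Number of test inputs that end up sorted.
        return sum(run(self.program, p, q) == tuple(sorted((p, q)))
                   for p, q in self.TESTS)

    def step(self, action):
        before = self.solved()
        self.program.append(action)      # a_t: one more instruction
        after = self.solved()
        reward = (after - before) - 0.1  # r_t: progress minus a length penalty
        done = after == len(self.TESTS) or len(self.program) >= self.max_len
        return self.state(), reward, done


# Play one game by hand with a known four-instruction solution.
game = ToyAssemblyGame()
total = 0.0
for a_t in [("mov", "S", "P"), ("cmp", "P", "Q"),
            ("cmovg", "P", "Q"), ("cmovg", "Q", "S")]:
    s_t, r_t, done = game.step(a_t)
    total += r_t
print(game.program, round(total, 2), done)
```

In this sketch the per-instruction penalty is what makes shorter programs score higher. A learning agent would pick each a_t from ACTIONS itself rather than follow a hand-written solution.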





Representation Network 

The AI the researchers designed to solve this task is called AlphaDev. In AssemblyGame, AlphaDev starts with a blank program and on each turn chooses an assembly instruction to progressively build it. It receives a positive reward if the added instruction improves the program, and a negative reward if it makes it worse. In this way, through trial and error, the AI learns which sequences of actions maximize the reward. Over millions of games, AlphaDev learns to create increasingly better programs.

The AI has two parts. The first, called the Representation Network, is in charge of efficiently representing the state of the game. The second uses that representation as a basis for choosing which instructions to add to the algorithm we are constructing. 

The instructions of the algorithm we have built up to a certain point in the game are passed to a transformer network, which converts them into a numeric vector representing that partial algorithm. Transformers are very effective at processing sequential data like text or code.

Separately, a multi-layer neural network encodes the CPU state as another numeric vector. This representation lets our AI see the effect that its chosen instructions have on the CPU, so that it can evaluate how well they perform.

The two numeric vectors extracted through these neural networks are concatenated to generate a global representation of the game state (State Representation in the diagram), which AlphaDev will use as a basis for deciding the suitable instruction to add to our algorithm.
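
The concatenation step can be illustrated with a small sketch in plain Python. Everything here is an assumption made for illustration: the weights are random and untrained, the vector sizes are arbitrary, and the program encoder simply averages opcode embeddings as a stand-in for the transformer (a real transformer would also take the order of the instructions into account).

```python
import random

random.seed(0)
EMBED_DIM = 4                      # arbitrary size chosen for the example
OPCODES = ["mov", "cmp", "cmovg"]  # hypothetical instruction vocabulary

# One random embedding vector per opcode (untrained, for illustration only).
opcode_embedding = {
    op: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for op in OPCODES
}

# Random weights for a single dense layer over three register values.
W_cpu = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(EMBED_DIM)]


def encode_program(program):
    """Stand-in for the transformer: average the opcode embeddings."""
    if not program:
        return [0.0] * EMBED_DIM
    vectors = [opcode_embedding[op] for op in program]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def encode_cpu(registers):
    """One dense layer with a ReLU over the register contents."""
    return [max(0.0, sum(w * r for w, r in zip(row, registers)))
            for row in W_cpu]


def represent(program, registers):
    """State representation: the two encodings joined end to end."""
    return encode_program(program) + encode_cpu(registers)


state_vector = represent(["mov", "cmp"], [2.0, 1.0, 0.0])
print(len(state_vector))  # program part (4 values) followed by CPU part (4 values)
```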



Policy and value network

The second part of AlphaDev is another multi-layer neural network that takes the state representation as input. From that input, it produces two outputs.

The first is a value: an estimate of how good the partial algorithm is, that is, the accumulated reward we expect to obtain if we build our final algorithm starting from it. The second is a policy: the network's assessment of which instruction is most suitable to add next.
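
A network with these two outputs might look like the following sketch. It is an untrained toy with random weights, and the layer sizes, the instruction list and the function name predict are assumptions made for this example. The second output is expressed here as a probability for each candidate instruction, which is a common way to build such a head.

```python
import math
import random

random.seed(0)
STATE_DIM = 8    # size of the incoming state representation (assumed)
HIDDEN_DIM = 16  # arbitrary hidden-layer width
INSTRUCTIONS = ["mov P Q", "cmp P Q", "cmovg P Q"]  # hypothetical action set


def dense(n_out, n_in):
    """Random weight matrix for one fully connected layer."""
    return [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
            for _ in range(n_out)]


def apply(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


W_hidden = dense(HIDDEN_DIM, STATE_DIM)
W_value = dense(1, HIDDEN_DIM)
W_policy = dense(len(INSTRUCTIONS), HIDDEN_DIM)


def predict(state_vector):
    """Return (value estimate, probability per instruction)."""
    hidden = [max(0.0, h) for h in apply(W_hidden, state_vector)]
    value = math.tanh(apply(W_value, hidden)[0])   # squashed into [-1, 1]
    probabilities = softmax(apply(W_policy, hidden))
    return value, dict(zip(INSTRUCTIONS, probabilities))


value, policy = predict([0.1] * STATE_DIM)
print(value, policy)
```

Both heads share the same hidden layer, so the features learned for judging a partial program are reused when ranking the next instruction.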

On each turn, to select the instruction to add to our algorithm, AlphaDev combines this neural network with a search algorithm called Monte Carlo Tree Search (MCTS). This algorithm runs many simulations to see what would happen if different instructions were chosen. As in AlphaZero, the system AlphaDev builds on, each simulation is guided by the network's suggestions and scored with its value estimate, collecting relevant learning information along the way.

Once these simulations finish, AlphaDev picks the most promising instruction among those explored, and uses the data obtained from the simulations for its learning.
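
To make the search step concrete, here is a compact MCTS sketch on a made-up game. It is not AlphaDev's search: the game (build a three-step sequence that matches a hidden TARGET), the evaluate stub that stands in for the neural network, and the exploration constant are all assumptions. New positions are scored by that stub rather than by playing each simulation to the end, which is how AlphaZero-style searches usually work.

```python
import math

ACTIONS = (0, 1, 2)
TARGET = (2, 0, 1)  # hypothetical "best program" in this toy game


def evaluate(state):
    """Stand-in for the network: uniform priors and a neutral value."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}, 0.0


class Node:
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """Pick the child with the best mix of average value and exploration bonus."""
    sqrt_total = math.sqrt(node.visits)

    def score(item):
        _, child = item
        bonus = c_puct * child.prior * sqrt_total / (1 + child.visits)
        return child.q() + bonus

    return max(node.children.items(), key=score)


def run_mcts(root_state, simulations=200):
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, state, path = root, root_state, [root]
        # 1. Selection: walk down the tree until reaching a leaf.
        while node.children:
            action, node = select_child(node)
            state = state + (action,)
            path.append(node)
        # 2. Expansion and evaluation of the leaf.
        if len(state) == len(TARGET):
            value = 1.0 if state == TARGET else 0.0  # game over: true reward
        else:
            priors, value = evaluate(state)
            node.children = {a: Node(p) for a, p in priors.items()}
        # 3. Backup: update statistics along the visited path.
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    return {a: child.visits for a, child in root.children.items()}


visits = run_mcts(root_state=())
best_action = max(visits, key=visits.get)  # play the most-visited action
print(visits, best_action)
```

The visit counts returned by run_mcts are the kind of search data that can also serve as a training signal for the network.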




After completing the learning process, AlphaDev found two techniques that streamline sorting networks, a family of sorting algorithms built from a fixed sequence of comparisons. These techniques, called the AlphaDev swap move and the AlphaDev copy move, simplify the sorting process by substituting operations with simpler ones or reusing already calculated values.
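
The idea behind the swap move can be illustrated on a three-element sorting network. The sketch below follows our reading of the published description: once an earlier comparison guarantees B <= C, the smallest output no longer needs min(A, B, C) and can be computed as min(A, B). Python's min and max stand in for the assembly compare and conditional-move instructions, so the saved instruction itself is not visible here, and the function names are our own.

```python
from itertools import permutations


def original_tail(a, b, c):
    """Last two comparators of the network: (A, C), then (A, B)."""
    a, c = min(a, c), max(a, c)
    a, b = min(a, b), max(a, b)
    return a, b, c  # (min(A, B, C), max(min(A, C), B), max(A, C))


def swap_move_tail(a, b, c):
    """Simplified version, valid only when b <= c already holds."""
    smallest = min(a, b)  # c cannot be smaller than b, so it can be ignored
    middle = max(min(a, c), b)
    largest = max(a, c)
    return smallest, middle, largest


def sort3(values, tail):
    a, b, c = values
    b, c = min(b, c), max(b, c)  # first comparator guarantees b <= c
    return tail(a, b, c)


# Both variants agree on every ordering of three distinct values.
for p in permutations((1, 2, 3)):
    assert sort3(p, original_tail) == sort3(p, swap_move_tail) == (1, 2, 3)
print("both variants sort all 6 permutations")
```

The simplification depends on the precondition: calling swap_move_tail directly on inputs where b > c can return a wrong result, which is why it only works as part of the full network.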

The improvements discovered by this research team have been incorporated into libc++, the C++ standard library of the LLVM project, which is used by millions of developers around the world. Given the ubiquity of this library, a wide audience is already benefiting from these enhancements in the systems we use every day.

In addition, this milestone marks the opening of new research avenues for improving algorithms using artificial intelligence.


