AlphaCode, the artificial intelligence (AI) code-generation system developed by DeepMind, achieves average human-level performance in programming competitions, according to a study published in Science. This could shift the work of programmers towards formulating problems for AI to solve.
Gonzalo Génova
Professor of Computer Languages and Systems.
The automation of software production started at the same time as computer programming itself. A very significant step was Grace Hopper's invention of high-level languages and compilers. A compiler translates a program written in a high-level language, which is more understandable to a person, into code that is directly executable by the computer (machine language). However, compilation is only a translation process: it spares the programmer neither the understanding of the problem nor the formulation of its solution in a programming language; at most, it spares them some specific details of the machine on which the program will run (for example, how numbers and strings are represented in the computer's memory), thus allowing a more abstract formulation of the problem and its solution. Progress has continued towards increasingly abstract programming languages and development environments that ease the programmer's task in multiple ways and help avoid the most common programming errors.
In a parallel line, machine translation between natural (non-formal) languages, such as English and Spanish, has made prodigious progress in recent years. Conversational systems go much further, since answering a question obviously requires much more than translating it from one language to another. However, thanks to the vast amount of information available online today and to very clever artificial intelligence techniques, it is possible to automate the answering of natural-language questions, and thus to simulate conversation so convincingly that it can be very difficult to tell whether we are talking to a person or to a machine.
The present work, carried out by an international team of 26 DeepMind researchers, led by Spaniard Oriol Vinyals, demonstrates how much progress has been made recently in these two directions. The system developed, AlphaCode, is capable of generating a computer program in a high-level language from the formulation of a problem in natural language. In other words, the same task that is usually performed by a human programmer.
AlphaCode certainly doesn't program the way a person does (a process that, in truth, we don't understand very well either), but that doesn't make the results any less impressive. We can perhaps describe the human problem-solving process in three main steps: understanding the problem, reasoning out a solution, and writing and testing the resulting program. This process is not easily automated, so Vinyals' team worked along very different lines: the system generates millions of different programs and keeps only the 10 best. But brute force alone is not enough: only a tiny fraction of all possible character sequences are correct programs (a single changed character can ruin everything!), and of these only a tiny fraction provide a valid solution.
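The generate-and-filter strategy can be sketched as follows. This is a toy illustration, not DeepMind's actual code: here "programs" are just arithmetic expressions, `sample_program` stands in for the language model's sampler, and the example test is a single known input/output pair.

```python
import random

random.seed(0)  # reproducible demo

def generate_and_filter(sample_program, example_tests, n_samples=1000, keep=10):
    """Brute-force search: sample many candidate programs, discard any that
    fail the problem's example tests, and keep at most `keep` survivors."""
    survivors = []
    for _ in range(n_samples):
        candidate = sample_program()  # one sampled source string
        if all(passes(candidate) for passes in example_tests):
            survivors.append(candidate)
            if len(survivors) == keep:
                break
    return survivors

# Toy stand-in for the language model's sampler: some samples are wrong.
def sample_program():
    return random.choice(["x + x", "x * 2", "x - 1", "x // 2"])

# An example test runs a candidate on a known input/output pair (x=3 -> 6).
example_tests = [lambda src: eval(src, {"x": 3}) == 6]

solutions = generate_and_filter(sample_program, example_tests, n_samples=100)
```

In the real system the candidates are full programs compiled and run against the problem's example tests, but the shape of the search is the same: almost all samples are rejected, and only the few that behave correctly survive.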
It is true that AlphaCode's starting point is billions of lines of code written by human programmers, from which a language model is built, in a way vaguely similar to how automatic natural-language translation systems work. The result is a huge model, one version of which has up to 41 billion parameters. But AlphaCode also includes many other techniques, such as filtering, clustering, and selection among similar (even only partially similar) solutions. And, of course, it checks that each result satisfies the proposed problem statement; even correct but computationally inefficient solutions are discarded.
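The clustering-and-selection step can be sketched under one simplifying assumption: two candidate programs belong to the same cluster if they produce identical outputs on a set of generated inputs. The names and the behavioral-signature heuristic below are illustrative, not DeepMind's actual implementation.

```python
from collections import defaultdict

def cluster_by_behavior(candidates, run, inputs, submissions=10):
    """Group candidate programs by the outputs they produce on generated
    inputs, then pick one representative from each of the largest clusters.
    `run(program, x)` executes a candidate program on input x."""
    clusters = defaultdict(list)
    for prog in candidates:
        signature = tuple(run(prog, x) for x in inputs)  # behavioral fingerprint
        clusters[signature].append(prog)
    # Largest clusters first: agreement among many samples suggests correctness.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:submissions]]

# Toy candidates: arithmetic expressions over x. Three behave identically.
candidates = ["x + x", "2 * x", "x * 2", "x - 1"]
run = lambda prog, x: eval(prog, {"x": x})
picks = cluster_by_behavior(candidates, run, inputs=[0, 1, 5], submissions=2)
```

Picking at most one submission per cluster avoids wasting the limited submission budget on near-duplicate solutions that would all fail (or all succeed) together.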
The system was evaluated on Codeforces, a competition platform for human programmers, across a series of 10 contests with more than 5,000 participants each. Its average ranking places it in the middle of the field (within the top 54.3% of contestants), who are themselves a relatively select group of programmers. According to the paper's authors, this is roughly equivalent to the level of a beginner programmer with about a year's experience. An impressive result indeed.
But that's not all: the researchers claim that AlphaCode does not behave like a parrot repeating what it has heard, but rather like a crow capable of solving problems intelligently. The proof is that the generated code resembles the code that inspired it (those billions of lines) no more than any human programmer's code does. That is, the solutions it finds were not present verbatim in its training data: AlphaCode can solve problems it has never seen before, even problems that would demand a good deal of intellectual reasoning from a human.
Finally, the authors point out that direct applications of AlphaCode outside the context of competitions would be rather limited, but that indirectly it can improve the productivity of human programmers and contribute to their education. They also weigh the system's risks and benefits. Among the benefits, the generated code is more easily interpreted and modified than a network of artificial neurons, and it is also easier to generalise and to rid of certain biases. On the other hand, the system is not easily reproducible at other research centres, since building the language model requires enormous computing power and a not inconsiderable energy consumption (175 MWh, equivalent to the annual consumption of 16 average American households). Finally, the authors are very pragmatic in their interpretation of the results, without venturing into misleading philosophical claims about what it means to be intelligent, or about the extent to which AlphaCode is truly intelligent.
- Research article (peer reviewed): Yujia Li et al.