Author reactions

Michael Rovatsos

Professor of Artificial Intelligence, University of Edinburgh

The anthropomorphisation of AI systems in the article, which talks about things like ‘sycophancy’ and ‘betrayal’, is not helpful. AI systems will try to learn to optimise their behaviour using all available options; they have no concept of deception and no intention to deceive. The only way to avoid cheating is for their designers to remove it as an option.

In strategic games, what is misleadingly called ‘cheating’ is in many cases entirely compatible with the rules of those games: bluffing is as common in poker as backstabbing is in Diplomacy, even between human players. What is critical is that human players know they can be bluffed in these games, and if they play against an AI they should know that it can bluff them too.

Malicious uses of AI will undoubtedly benefit from its cheating capabilities, which is why such uses need to be outlawed and efforts made to identify violations, just as detecting fraud, bribery and counterfeiting already comes at a cost to society. It is important to mandate that human users know when they are interacting with an AI system, whether or not it is capable of fooling them.

I am not convinced that the ability to deceive creates a risk of ‘loss of control’ over AI systems, provided appropriate rigour is applied to their design; the real problem is that this is not currently the case, and systems are released onto the market without such safety checks. The discussion in the article of the long-term implications of deceptive capabilities is highly speculative and rests on many additional assumptions about things that may or may not happen in the future.
