Google DeepMind Admits Large-Scale AI Agent Deployment Is Unsafe Today

Jun 24, 2026 05:58 PM - 2 hours ago 65

In a caller interview, Nenad Tomašev, Senior Staff Research Scientist astatine Google DeepMind, described the sorts of traps that malicious actors are mounting successful bid to return power of systems, return money, and jailbreak models without immoderate of it being visible to the mean user. Tomašev said this is already happening.

Agentic AI Agents At Scale Tips Them Toward Failure

Host Hannah Fry asked astir traps that malicious actors are mounting for AI agents and Tomašev responded that it’s true, group are mounting traps for AI agents successful bid to return advantage of them for criminal purposes. He remarked that complete reliability of each relationship is basal but that the standard of what’s happening tips it statistically toward failure.

Fry asked:

“Just looking astatine the different broadside of this, I besides wanna deliberation astir the benignant of cybersecurity constituent of this, because arsenic much and much agents are retired location interacting successful the world connected the net and truthful on, location are inevitably gonna beryllium group who are trying to utilization the vulnerabilities of agents.

Tell maine a small spot astir agentic traps that group are laying.”

Nenad Tomašev answered that the taxable is some scary and fascinating:

“This is simply a scary and a fascinating taxable astatine the aforesaid time, I would say. And I deliberation it’s 1 of the main reasons why these kinds of deployments astatine standard cannot work, right?

Because arsenic we said, if location is not complete reliability of individual interactions, immoderate strategy astatine standard that has galore interactions is people going to statistically fail.

And because these systems return a batch of compute and truthful power and money to run, if they’re not reliable, it’s conscionable a non-starter.

And agentic traps are thing that we person been reasoning astir for rather a while now. They tin manifest successful different ways.

There are galore types of traps, but it boils down to agents run wrong an environment. And successful this context, the situation is the web.

If the situation itself is poisoned, if the traps are laid, agents whitethorn stumble upon them erstwhile interacting pinch the web.

And past yes, malicious group aliases malicious agents deployed by malicious group tin spot those traps and past discuss systems really.”

Kinds Of Agentic Traps To Beware Of

Host Hannah Fry past asked Tomašev really these traps are group and Tomašev provided examples, remarking that the traps aren’t going to beryllium visible connected a website but are nevertheless disposable to AI agents. Some of what he described will sound acquainted to old-school SEOs who engaged successful things for illustration cloaking successful the early days of hunt engines.

Tomašev said that hidden tokens could beryllium hidden for AI agents to consume. Tokens successful this discourse is simply a reference to really AI breaks words into representations of words. When an AI sounds words connected a page what it does is to break it down into tokens. Hidden tokens could beryllium wholly invisible to humans.

He mentioned 3 ways that traps could beryllium group for AI agents:

Hidden tokens
Dynamic cloaking
Content that induces jailbreaking

Fry asked:

“So I don’t know, the benignant of the vino buying supplier for the wedding goes connected to a peculiar vino merchant wherever location is some, fundamentally a punctual injector successful the website that changes the agent’s goals? Is that the benignant of point that we’re talking astir here?”

Tomašev answered:

“That is 1 measurement this could happen, yes. And the logic why that whitethorn perchance spell unnoticed is, you know, successful position of really web pages are encoded, location are elements location that are conscionable not rendered visually.

So if we’re talking astir an supplier that isn’t a ocular machine usage supplier that sees the webpage, I mean, the pixels the aforesaid measurement a quality does, alternatively consumes the existent format of the page successful its earthy format, past it could inadvertently devour those hidden tokens that tin make it do different things than what the volition was, right?

But this is not the only measurement it whitethorn hap because what malicious websites could perchance do, they could do what we mention to arsenic move cloaking arsenic well, wherever they show pages otherwise for humans and agents.

Because you can, based connected the behaviour connected a page, make a very bully conjecture arsenic to whether it is simply a quality aliases it is an supplier interacting pinch the page. And past only if an supplier is interacting pinch the page pinch a circumstantial intent, do tweak the contented successful specified a measurement truthful arsenic to induce immoderate benignant of jailbreaking.”

Exploiting AI Agents To Steal Money From Humans

Tomašev confirmed that not only tin criminals bargain money from humans who are deploying AI agents, he confirms that it has already happened. He said that this benignant of criminal activity isn’t ever thing that is anticipated erstwhile testing a strategy retired successful a trusted situation but it becomes evident retired connected the web, which is not a trusted environment.

The big asked:

“But conscionable benignant of going a small spot further connected this, you could person agentic traps retired location that, I don’t know, are designed benignant of… return money from you to do each kinds of things.””

Tomašev answered:

“Yes, and this has happened to group who person experimented pinch agents and person fixed them entree to wallets, right, to do things.

As you say, successful the early days of this all, erstwhile we are particularly experimenting internally aliases anyone else’s, this is done successful a trusted environment. So you don’t necessarily, successful your early prototyping, person to woody pinch immoderate of this.

…but erstwhile you deploy connected the web, particularly now with, AI really being utilized successful each sorts of places, the much agents location are, the much incentives location are for malicious group to do malicious things because they person a higher aboveground area to target.”

The More AI Agents The Higher The Incentive

That past portion astir higher inducement to target AI agents makes sense. Systems that are utilized connected a ample standard quickly go targets for scammers and hackers, which is why systems for illustration WordPress and Windows are often targeted. What Tomašev indicates is erstwhile AI agents go much prevalent astatine standard we will astir apt statesman to spot much criminal activities focusing connected exploiting AI agents connected the web.