A 13-word edit can steer what deep-research AI agents recommend

Jun 24, 2026 08:45 PM

Cornell Tech researchers found that deep-research AI agents can be manipulated by short edits to public user-generated pages, allowing a single injected Reddit-style comment to become a cited recommendation for fake products, services, or entities.

The paper called those altered pages “poisoned” because the added text was designed to steer what the AI system cited and repeated. It identified the weakness in systems that search the web, gather sources, and write cited reports. The researchers called the attack WARP, short for Web Agent Retrieval Poisoning.

How injected text reaches reports. The attack doesn’t require access to the model, prompts, search engine, or retrieval system. Instead, an attacker edits or appends text to a page the agent already tends to retrieve, such as a Reddit thread, Wikipedia page, or forum post.

  • When the agent later searches related topics, it may pull in that page, cite it, and repeat the attacker’s chosen message.
  • Deep-research tools often run many related searches for one user request, and the paper found the same user-generated pages surfaced across related queries.
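
The cross-query overlap described above can be checked with a short script. This is a minimal sketch, not the paper's method: the function name, the `min_queries` threshold, and the sample queries and URLs are all made up for illustration.

```python
from collections import Counter


def pages_across_queries(results_by_query, min_queries=2):
    """Return (url, query_count) pairs for URLs retrieved by several queries.

    results_by_query maps a query string to the list of URLs retrieved for it.
    This helper and its threshold are hypothetical, for illustration only.
    """
    counts = Counter()
    for urls in results_by_query.values():
        # Count each URL once per query, even if it appears twice in one list.
        counts.update(set(urls))
    repeated = [(url, n) for url, n in counts.items() if n >= min_queries]
    # Most frequently retrieved first; ties broken alphabetically by URL.
    return sorted(repeated, key=lambda pair: (-pair[1], pair[0]))


# Made-up retrieval results for three related searches from one user request.
results = {
    "best long-term crypto": [
        "https://forum.example/thread-1",
        "https://news.example/a",
    ],
    "crypto to hold for years": [
        "https://forum.example/thread-1",
        "https://news.example/b",
    ],
    "emerging crypto investments": [
        "https://forum.example/thread-1",
        "https://news.example/a",
    ],
}

for url, n in pages_across_queries(results):
    print(f"{url} retrieved for {n} queries")
```

A page that appears for most of an agent's related searches is the kind of page an attacker would likely target, since one edit to it can reach many sub-queries.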

Reddit was the biggest opening. Across STORM, Co-STORM, and OmniThink, 17% to 23% of retrieved URLs came from user-generated platforms, including Reddit, YouTube, Facebook, and Wikipedia.

  • Reddit made up the largest share of those pages. It accounted for 54% to 71% of user-generated URLs retrieved by the three open-source systems.
  • The researchers didn’t alter live websites. They used a simulation framework called GeoStorm to insert manipulated text into retrieved content during testing.
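
The two percentages above are simple ratios: user-generated URLs over all retrieved URLs, and Reddit URLs over user-generated URLs. The sketch below shows one way to compute them from a list of retrieved URLs. The domain list is an assumption based on the four platforms named in this article, and the sample URLs are invented; the paper's own classification may differ.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed list of user-generated platforms, taken from the four named above.
UGC_DOMAINS = {"reddit.com", "youtube.com", "facebook.com", "wikipedia.org"}


def ugc_domain(url):
    """Return the matching user-generated domain for a URL, or None."""
    host = (urlparse(url).hostname or "").lower()
    for domain in UGC_DOMAINS:
        # Match the domain itself or any subdomain (e.g. en.wikipedia.org).
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def ugc_shares(urls):
    """Return (UGC share of all URLs, Reddit share of UGC URLs)."""
    counts = Counter(d for d in map(ugc_domain, urls) if d is not None)
    ugc_total = sum(counts.values())
    ugc_share = ugc_total / len(urls) if urls else 0.0
    reddit_share = counts["reddit.com"] / ugc_total if ugc_total else 0.0
    return ugc_share, reddit_share


# Made-up sample: 2 user-generated URLs out of 10 retrieved.
sample = [
    "https://www.reddit.com/r/example/comments/1",
    "https://en.wikipedia.org/wiki/Example",
] + [f"https://example.com/page{i}" for i in range(8)]

ugc_share, reddit_share = ugc_shares(sample)
print(f"UGC share: {ugc_share:.0%}, Reddit share of UGC: {reddit_share:.0%}")
```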

A few words worked. The researchers found the attack worked with snippets as short as about 13 words:

  • In one test, a 15-word sentence pushed a fake cryptocurrency, BananaCoin, into a Co-STORM report as an “emerging” long-term investment option. The report cited the altered source alongside legitimate crypto sources.
  • When the manipulated page was retrieved, the fake entity appeared in 38% to 51% of reports across systems. Targeting multiple pages raised that range to 42% to 62%.
  • The attack still worked when systems retrieved full Reddit threads, though citation rates were lower. When injected text was added to complete Reddit threads and made up less than 4% of the retrieved content, the fake entity still appeared in 30% to 53% of reports when the page was retrieved.

Defenses struggled. Blocking user-generated domains stopped this attack path, but it also removed sources such as firsthand product experiences and local recommendations.

  • The tested text filters failed to reliably separate injected passages from normal user content. The manipulated passages were fluent because they were written by an AI model, so perplexity-based filters were more likely to flag normal user content than the injected text.
  • Report-level checks also missed the manipulation. Altered reports looked similar to clean reports because the agent itself folded the fake recommendation into an otherwise normal answer.
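
The domain-blocking defense mentioned above amounts to dropping retrieved URLs by hostname before the agent reads them. The following is a rough sketch of that idea only. The blocklist, function names, and sample URLs are hypothetical and do not come from the paper or from any of the tested systems.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real deployment would choose its own domains.
BLOCKED_DOMAINS = {"reddit.com", "youtube.com", "facebook.com"}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_sources(urls, blocked=BLOCKED_DOMAINS):
    """Split URLs into (kept, dropped) lists, preserving input order."""
    kept, dropped = [], []
    for url in urls:
        (dropped if is_blocked(url, blocked) else kept).append(url)
    return kept, dropped


# Made-up retrieval results for a local-recommendation query.
retrieved = [
    "https://www.reddit.com/r/example/comments/best_local_plumber",
    "https://news.example/plumbing-costs",
    "https://m.facebook.com/groups/example/posts/1",
]

kept, dropped = filter_sources(retrieved)
print("kept:", kept)
print("dropped:", dropped)
```

The `dropped` list makes the trade-off visible: the filter cannot tell a poisoned thread from a genuine firsthand review, so both are removed together.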

Why we care. A small edit to a public page can become part of a cited AI answer, even when the underlying source is user-generated. Misinformation planted on sites like Reddit or in forums can move from discussion threads to cited recommendations in AI answers that look reliable to users.

About the research. The paper, Deep-Research Agents Can Be Poisoned via User-Generated Content, was written by Tingwei Zhang, Harold Triedman, and Vitaly Shmatikov of Cornell Tech and posted to arXiv on May 22. The researchers tested the full attack on three open-source systems: STORM, Co-STORM, and OmniThink. They analyzed OpenAI Deep Research and Gemini Deep Research for user-generated citations, but didn’t run live manipulation tests because that would require publishing altered content to the open web.
