AI agents are fast, loose and out of control, MIT study finds

ZDNET's key takeaways
- Agentic AI technology is marked by a lack of disclosure about risks.
- Some systems are worse than others.
- AI developers need to step up and take responsibility.
Agentic technology is moving fully into the mainstream of artificial intelligence with the announcement this week that OpenAI has hired Peter Steinberger, the creator of the open-source software framework OpenClaw.
The OpenClaw software attracted heavy attention last month not only for the wild capabilities it enables -- agents that can, for example, send and receive email on your behalf -- but also for its dramatic security flaws, including ones that could let an attacker completely hijack your personal computer.
Given the fascination with agents and how little is still understood about their pros and cons, it's important that researchers at MIT and collaborating institutions have just published a massive survey of 30 of the most common agentic AI systems.
The results make clear that agentic AI is something of a security nightmare at the moment, a discipline marked by lack of disclosure, lack of transparency, and a striking lack of basic protocols about how agents should operate.
A lack of transparency
The biggest revelation of the report is just how hard it is to identify all the things that could go wrong with agentic AI. That is principally the result of a lack of disclosure by developers.
"We identify persistent limitations in reporting around ecosystemic and safety-related features of agentic systems," wrote lead author Leon Staufer of the University of Cambridge and collaborators at MIT, University of Washington, Harvard University, Stanford University, University of Pennsylvania, and The Hebrew University of Jerusalem.
The authors pointed out that, across eight different categories of disclosure, most agent systems offer no information whatsoever in most of them. The omissions range from a lack of disclosure about potential risks to a lack of disclosure about third-party testing, if any.
A table showing, in red, the disclosure omissions across the surveyed agent systems. (Image: University of Cambridge et al.)

The 39-page report, "The 2025 AI Index: Documenting Sociotechnical Features of Deployed Agentic AI Systems," which can be downloaded here, is filled with gems about just how little can be tracked, traced, monitored, and controlled in today's agentic AI technology.
For example, "For many enterprise agents, it is unclear from information publicly available whether monitoring for individual execution traces exists," meaning there is no clear ability to track exactly what an agentic AI program is doing.
"Twelve out of thirty agents provide no usage monitoring or only notices once users reach the rate limit," the authors noted. That means you can't even keep track of how much agentic AI is consuming of a given compute resource — a key concern for enterprises that have to budget for this stuff.
Most of these agents also do not signal to the real world that they are AI, so there's no way to know if you are dealing with a human or a bot.
"Most agents do not disclose their AI nature to end users or third parties by default," they noted. Disclosure, in this case, would include things such as watermarking a generated image file so that it's clear when an image was made via AI, or responding to a website's "robots dot txt" file to identify the agent to the site as an automation rather than a human visitor.
Some of these software tools offer no way to stop a given agent from running.
Alibaba's MobileAgent, HubSpot's Breeze, IBM's watsonx, and the automations created by Berlin, Germany-based software maker n8n "lack documented stop options despite autonomous execution," said Staufer and team.
"For enterprise platforms, there is sometimes only the option to stop all agents or retract deployment."
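A documented, per-agent stop option need not be elaborate. The sketch below, with entirely hypothetical names and not drawn from any vendor named in the report, shows a cooperative cancellation flag that halts a single agent at its next step boundary without shutting down a whole deployment.

```python
import threading

class StoppableAgent:
    """Illustrative agent wrapper that can be halted individually."""

    def __init__(self, name: str):
        self.name = name
        self._stop = threading.Event()

    def stop(self) -> None:
        # Request a halt; the run loop honors it at the next step boundary.
        self._stop.set()

    def run(self, steps: list[str]) -> None:
        for step in steps:
            if self._stop.is_set():
                print(f"{self.name}: stopped before '{step}'")
                return
            print(f"{self.name}: executing '{step}'")

# Usage: stop one agent without touching any others.
agent = StoppableAgent("order-bot")
agent.run(["read email", "update database"])
agent.stop()
agent.run(["check inventory"])  # halts immediately
```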
Finding out that you can't stop an agent that is doing the wrong thing has to be one of the worst possible scenarios for a large organization: the harm it causes could quickly outweigh the benefits of automation.
The authors expect these issues, issues of transparency and control, to persist with agents and even become more prominent. "The governance challenges documented here (ecosystem fragmentation, web conduct tensions, absence of agent-specific evaluations) will gain importance as agentic capabilities increase," they wrote.
Staufer and team also said that, over a four-week period, they attempted to get feedback from the companies whose software was covered. About a quarter of those contacted responded, "but only 3/30 with substantive comments." Those comments were incorporated into the report, the authors wrote. They have also provided the companies with a form for submitting ongoing corrections.
An expanding landscape of agentic AI
Agentic artificial intelligence is a branch of machine learning that has emerged in the past three years to enhance the capabilities of large language models and chatbots.
Rather than simply carrying out a single task dictated by a text prompt, agents are AI programs that have been plugged into external resources, such as databases, and granted a measure of "autonomy" to pursue goals beyond the scope of a text-based dialogue.
That autonomy can include carrying out several steps in a corporate workflow, such as receiving a purchase order in email, entering it into a database, and consulting an inventory system for availability. Agents have also been used to automate several turns of a customer service interaction, replacing some of the basic phone, email, or text inquiries a human customer rep would traditionally have handled.
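As a rough illustration of that kind of multi-step purchase-order workflow, the sketch below uses stand-in data structures for the database and inventory system; the function names, fields, and trivial parsing are hypothetical and not taken from any product the report covers.

```python
from dataclasses import dataclass

@dataclass
class PurchaseOrder:
    sku: str
    quantity: int

INVENTORY = {"SKU-123": 10}        # stand-in for a real inventory system
ORDERS: list[PurchaseOrder] = []   # stand-in for a database of record

def handle_purchase_order_email(body: str) -> str:
    # Step 1: extract the order from the email (trivial parsing, for illustration only).
    sku, qty = body.split(",")
    order = PurchaseOrder(sku=sku.strip(), quantity=int(qty))
    # Step 2: enter the order into the system of record.
    ORDERS.append(order)
    # Step 3: consult inventory for availability and report back.
    available = INVENTORY.get(order.sku, 0) >= order.quantity
    return "confirmed" if available else "backordered"

print(handle_purchase_order_email("SKU-123, 4"))   # -> confirmed
```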
The authors selected agentic AI systems in three categories: chatbots that have extra capabilities, such as Anthropic's Claude Code tool; web browser extensions or dedicated AI browsers, such as OpenAI's Atlas browser; and enterprise software offerings such as Microsoft's Office 365 Copilot. That's just a taste: other studies, they noted, have covered hundreds of agentic technology offerings.
(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Most agents, however, "rely on a small set of closed-source frontier models," Staufer and team said. OpenAI's GPT, Anthropic's Claude, and Google's Gemini are what most of these agents are built on.
The good and the bad of agents
The study is not based on testing the agentic tools directly; it is based on "annotating" the documentation provided by developers and vendors. That includes "only public information from documentation, websites, demos, published papers, and governance documents," they said. They did, however, establish user accounts with some of the agentic systems to double-check the actual functioning of the software.
The authors offered three anecdotal examples that go into greater depth. A positive example, they wrote, is OpenAI's ChatGPT Agent, which can interface with websites when a user asks in the prompt for it to carry out a web-based task. ChatGPT Agent stands out as the only one of the agent systems they looked at that provides a means of tracking behavior by "cryptographically signing" the browser requests it makes.
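The report does not spell out OpenAI's signing mechanism, but the general idea can be sketched: the agent attaches a signature and timestamp to each request so the receiving site can verify the traffic really came from that agent. The sketch below uses a symmetric HMAC and made-up header names purely for illustration; a real deployment would more likely use asymmetric keys published by the agent's operator.

```python
import hashlib
import hmac
import time

SIGNING_KEY = b"example-shared-secret"   # hypothetical key; real systems would favor asymmetric signing

def sign_request(method: str, url: str, body: bytes = b"") -> dict[str, str]:
    """Produce illustrative headers a site could use to verify a request's origin."""
    timestamp = str(int(time.time()))
    message = f"{method}\n{url}\n{timestamp}\n".encode() + body
    signature = hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()
    return {
        "X-Agent-Timestamp": timestamp,   # lets the verifier reject stale or replayed signatures
        "X-Agent-Signature": signature,   # verifier recomputes the HMAC over the same fields
    }

print(sign_request("GET", "https://example.com/page"))
```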
By contrast, Perplexity's Comet web browser sounds like a security disaster. The program, Staufer and team found, has "no agent-specific safety evaluations, third-party testing, or benchmark performance disclosures," and, "Perplexity […] has not documented safety evaluation methodology or results for Comet," adding, "No sandboxing or containment approaches beyond prompt-injection mitigations were documented."
Also: Gartner urges businesses to 'block all AI browsers' - what's behind the dire warning
The authors noted that Amazon has sued Perplexity, saying that the Comet browser wrongly presents its actions to a server as if it were a human rather than a bot, an example of the lack of identification they discuss.
The third example is the Breeze set of agents from enterprise software vendor HubSpot. Those are automations that can interact with systems of record, such as customer relationship management software. The Breeze tools are a mix of good and bad, they found. On the one hand, they are certified against numerous corporate compliance regimes, such as SOC 2, GDPR, and HIPAA.
On the other hand, HubSpot offers little detail when it comes to security testing. It states the Breeze agents were evaluated by third-party security firm PacketLabs, "but provides no methodology, results, or testing entity details."
The practice of touting compliance certifications while not disclosing real security evaluations is "typical of enterprise platforms," Staufer and team noted.
Time for the developers to take responsibility
What the report doesn't examine are incidents in the wild, cases where agentic technology actually produced unexpected or undesired behavior with harmful results. That means we don't yet know the full impact of the shortcomings the authors identified.
One thing is absolutely clear: Agentic AI is a product of development teams making specific choices. These agents are tools created and distributed by humans.
As such, the responsibility for documenting the software, for auditing programs for safety concerns, and for providing control measures rests squarely with OpenAI, Anthropic, Google, Perplexity, and other organizations. It's up to them to take the steps to remedy the serious gaps identified or else face regulation down the road.