×

These 4 critical AI vulnerabilities are being exploited faster than defenders can respond

WhataWin/iStock/Getty Images Plus via Getty Images

Follow ZDNET: Add us as a preferred source on Google.


ZDNET's key takeaways 

  • As AI adoption speeds ahead, major security flaws remain unsolved.
  • Users and businesses should stay up to date on vulnerabilities. 
  • These four major issues still plague AI integration. 

AI systems are under attack on multiple fronts at once, and security researchers say most of the vulnerabilities have no known fixes.

Threat actors hijack autonomous AI agents to conduct cyberattacks and can poison training data for as little as 250 documents and $60. Prompt injection attacks succeed against 56% of large language models. Model repositories harbor hundreds of thousands of malicious files. Deepfake video calls have stolen tens of millions of dollars. 

The same capabilities that make AI useful also make it exploitable. The rate at which these systems are advancing intensifies that reality by the minute. Security teams now face a calculation with no good answer: fall behind competitors by avoiding AI, or deploy systems with fundamental flaws that attackers are already exploiting. 

Also: 10 ways AI can inflict unprecedented damage in 2026

For a deeper dive on what this has meant thus far (and will in the future), I break down four major AI vulnerabilities, the exploits and hacks targeting AI systems, and expert assessments of the problems. Here's an overview of what the landscape looks like now, and what experts can -- and can't -- advise on.

Autonomous systems, autonomous attacks

In September, Anthropic disclosed that Chinese state-sponsored hackers had weaponized its Claude Code tool to conduct what the company called "the first documented case of a large-scale cyberattack executed without substantial human intervention." 

Attackers jailbroke Claude Code by fragmenting malicious tasks into seemingly innocuous requests, convincing the AI it was performing defensive security testing. According to Anthropic's technical report, the system autonomously conducted reconnaissance, wrote exploit code, and exfiltrated data from approximately 30 targets.

Also: Microsoft and ServiceNow's exploitable agents reveal a growing - and preventable - AI security crisis

"We have zero agentic AI systems that are secure against these attacks," wrote Bruce Schneier, a fellow at Harvard Kennedy School, in an August 2025 blog post

The incident confirmed what security researchers had warned for months: the autonomous capabilities that make AI agents useful also make them dangerous. But agent adoption is only continuing to grow.

A recent report from Deloitte found that 23% of companies are using AI agents moderately, but projects that percentage will increase to 74% by 2028. As for the 25% of companies that said they don't use agents, Deloitte predicts that number will drop to 5%. 

Even before that report was published, agents were a documented risk for businesses. McKinsey research shows 80% of organizations have already experienced issues with them, including improper data exposure and unauthorized system access. Last year, Zenity Labs researchers identified zero-click exploits affecting Microsoft Copilot, Google Gemini, and Salesforce Einstein.

Matti Pearce, VP of information security at Absolute Security, warned me in a previous interview that the threat is accelerating: "The rise in the use of AI is outpacing securing AI. You will see AI attacking AI to create a perfect threat storm for enterprise users." 

Also: AI is quietly poisoning itself and pushing models toward collapse - but there's a cure

In terms of solutions or potential guardrails for these risks, regulatory guidance remains sparse. The EU AI Act requires human oversight for high-risk AI systems, but it was not designed with autonomous agents in mind. In the US, federal regulation is uncertain, with state-level regulations currently the most far-reaching. However, those laws are primarily concerned with the aftermath of safety incidents rather than agent-specific protections before the fact. 

Otherwise, the National Institute of Science and Technology (NIST), which released the voluntary AI Risk Management Framework in 2023, is accepting feedback for the development of an agent-specific (but also voluntary) security framework. The industry also self-organizes through groups like the Coalition for Secure AI.

Prompt injection: The unsolved problem

Three years after security researchers identified prompt injection as a critical AI vulnerability, the problem remains fundamentally unsolved. A systematic study testing 36 large language models against 144 attack variations found 56% of attacks succeeded across all architectures. Larger, more capable models performed no better.

The vulnerability stems from how language models process text. Simon Willison, the security researcher who coined the term "prompt injection" in 2022, explained the architectural flaw to The Register: "There is no mechanism to say 'some of these words are more important than others.' It's just a sequence of tokens."

Also: How OpenAI is defending ChatGPT Atlas from attacks now - and why safety's not guaranteed

Unlike SQL injection, which developers have addressed with parameterized queries, prompt injection has no equivalent fix. When an AI assistant reads a document containing hidden instructions, it processes those instructions identically to legitimate user commands. Most recently exemplified by the viral OpenClaw debacle, AI assistants are all fairly susceptible to this.

As collaborative research from OpenAI, Anthropic, and Google DeepMind has confirmed, adaptive attackers using gradient descent and reinforcement learning bypassed more than 90% of published defenses. Human red-teaming defeated 100% of tested protections.

"Prompt injection cannot be fixed," security researcher Johann Rehberger told The Register. "As soon as a system is designed to take untrusted data and include it in an LLM query, the untrusted data influences the output."

OWASP ranked prompt injection as the number one vulnerability in its Top 10 for LLM Applications, saying "there is no fool-proof prevention within the LLM." 

Also: How these state AI safety laws change the face of regulation in the US

Google DeepMind's CaMeL framework, published in March 2025, offers a promising architectural approach. Willison called it "the first credible prompt injection mitigation I've seen that doesn't just throw more AI at the problem." 

But CaMeL addresses only specific attack classes. The fundamental vulnerability persists. On vendor solutions claiming to solve the problem, Willison offered a blunt assessment: "Plenty of vendors will sell you 'guardrail' products that claim to be able to detect and prevent these attacks. I am deeply suspicious of these."

The bottom line: don't believe services selling you a solution for prompt injection attacks, at least not yet.

Data poisoning: Corrupting AI at its source

Attackers can corrupt major AI training datasets for approximately $60, according to research from Google DeepMind, making data poisoning one of the cheapest and most effective methods for compromising enterprise AI systems. A separate October 2025 study by Anthropic and the UK AI Security Institute found that just 250 poisoned documents can backdoor any large language model regardless of parameter count, requiring just 0.00016% of training tokens. 

Also: Is your AI model secretly poisoned? 3 warning signs

Real-world discoveries validate the research. As early as February 2024, JFrog Security Research uncovered approximately 100 malicious models on Hugging Face, including one containing a reverse shell connecting to infrastructure in South Korea.

"LLMs become their data, and if the data are poisoned, they happily eat the poison," wrote Gary McGraw, co-founder of the Berryville Institute of Machine Learning, in Dark Reading.

Unlike prompt injection attacks that exploit inference, data poisoning corrupts the model itself. The vulnerability may already be embedded in production systems, lying dormant until triggered. Anthropic's "Sleeper Agents" paper delivered the most troubling finding: backdoored behavior persists through supervised fine-tuning, reinforcement learning, and adversarial training. Larger models proved more effective at hiding malicious behavior after safety interventions.

While recent research from Microsoft identifies some signals researchers can track that may indicate a model has been poisoned, detection remains nearly impossible. 

Deepfake fraud: Targeting the human layer

A finance worker at British engineering giant Arup made 15 wire transfers totaling $25.6 million after a video conference with his CFO and several colleagues. Every person on the call was an AI-generated fake; Attackers had trained deepfake models on publicly available videos of Arup executives from conferences and corporate materials.

Also: How to prove you're not a deepfake on Zoom: LinkedIn's 'verified' badge is free for all platforms

Executives' public visibility creates a structural vulnerability. Conference appearances and media interviews provide training data for voice and video cloning, while C-suite authority enables single-point transaction approval. Gartner predicts that by 2028, 40% of social engineering attacks will target executives using deepfake audio and video.

The technical barrier to creating convincing deepfakes has collapsed. McAfee Labs found that three seconds of audio produces voice clones with 85% accuracy. Tools like DeepFaceLive enable real-time face-swapping during video calls, requiring only an RTX 2070 GPU. Deep-Live-Cam reached No. 1 on GitHub's trending list in August 2024, enabling single-photo face swaps in live webcam feeds.

Kaspersky research documented dark web deepfake services starting at $50 for video and $30 for voice messages, with premium packages reaching $20,000 per minute for high-profile targets.

Also: Stop accidentally sharing AI videos - 6 ways to tell real from fake before it's too late

Detection technology is losing the arms race. The Deepfake-Eval-2024 benchmark found that state-of-the-art detectors achieve 75% accuracy for video and 69% for images. Performance drops by roughly 50% against attacks not present in the training data. UC San Diego researchers demonstrated adversarial perturbations that bypass detectors with 86% success rates.

Human detection fares worse. Research from the Idiap Research Institute found that people correctly identify high-quality video deepfakes only 24.5% of the time. An iProov study revealed that of 2,000 participants, only two correctly identified all deepfakes.

Deloitte projects AI-enabled fraud losses will reach $40 billion by 2027. FinCEN issued guidance in November 2024 requiring financial institutions to flag deepfake fraud in suspicious activity reports. 

With technological detection unreliable, organizations are implementing process-based countermeasures. Effective measures include pre-established code words, callback verification to pre-registered numbers, and multi-party authorization for large transfers. 

Artificial Intelligence

Post Comment