LLM security protects the model as a text processor. AI agent security protects what happens when that model can call tools. Here is how the threat models differ and why you need both.

Security teams evaluating AI systems often use "LLM security" and "AI agent security" interchangeably. The two are related, but they answer different questions, and conflating them leaves a gap exactly where production agents fail.
LLM security protects the model as a text processor. The threat surface is narrow: input goes in, output comes out, and a human decides what to do with the result.
AI agent security protects what happens when the model can act. The threat surface expands the moment a tool call can cause a side effect on a host, API, or datastore.
| | LLM security | AI agent security |
|---|---|---|
| Primary question | Is the model behaving safely? | Should this specific action run? |
| Typical controls | Input/output filters, model gateways, RBAC on API keys | Runtime authorization, governed tools, audit trails |
| Injection impact | Bad or misleading text | Real commands, writes, transfers |
| OWASP framing | LLM Top 10 | Agentic Top 10 (extends LLM risks with tool-specific categories) |
LLM security is necessary but not sufficient once agents hold tools.
An AI agent is mostly a normal program — imports, HTTP clients, state machines, logging. That deterministic code is reviewable and testable with traditional AppSec tooling.
What makes it an agent is the slice where an LLM chooses what to do next. That slice is non-deterministic: you cannot code-review a decision that has not been made yet.
┌──────────────────────────────────────────────────────────────┐
│ Agent program                                                │
│                                                              │
│ ~95% deterministic code (developer-written, predictable)     │
│ ~5%  non-deterministic (LLM-chosen actions at runtime)       │
│                                                              │
│ Runtime security for agents gates the 5%, not the whole app  │
└──────────────────────────────────────────────────────────────┘

AI agent security is about that 5%: the moment "I should run this command" or "I should write this file" becomes a real-world effect.
OWASP ranks prompt injection as the most critical LLM risk. For a chatbot, the worst case is often embarrassing or misleading output.
For an agent with tools, the same attack class enables a different outcome:
Poisoned email / RAG doc / web page
│
▼
Agent reads untrusted content in the same context as instructions
│
▼
Model treats attacker text as higher-priority guidance
│
▼
Tool call: shell, write_file, transfer, cronjob, outbound HTTP
│
▼
Side effect, not just bad text

Indirect injection through documents, emails, or retrieval pipelines is especially dangerous for agents because the poisoned source may look like normal business data.
Input filters and output classifiers help at the language layer. They do not reliably answer: should this specific tool call execute?
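To make that question concrete, here is a minimal sketch of a per-call check at the action layer. It is not IntentFrame's implementation: the `ToolCall` record, the `should_execute` function, the `http_request` tool name, and the allowlisted directory and host are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical policy values, chosen only for illustration.
ALLOWED_WRITE_ROOT = PurePosixPath("/srv/agent/workspace")
ALLOWED_HOSTS = {"api.internal.example"}


@dataclass(frozen=True)
class ToolCall:
    """A proposed action, captured before anything executes."""
    tool: str
    args: dict = field(default_factory=dict)


def should_execute(call: ToolCall) -> bool:
    """Deterministic, per-call decision. Unknown tools are denied by default."""
    if call.tool == "write_file":
        path = PurePosixPath(call.args.get("path", ""))
        # Reject relative paths and any '..' component, then require the workspace root.
        if not path.is_absolute() or ".." in path.parts:
            return False
        return path.is_relative_to(ALLOWED_WRITE_ROOT)
    if call.tool == "http_request":
        # Outbound requests are allowed only to explicitly listed hosts.
        host = urlparse(call.args.get("url", "")).hostname
        return host in ALLOWED_HOSTS
    return False
```

The check looks only at the proposed action, not at the text that led to it. A write to `/srv/agent/workspace/notes.txt` passes, while `/srv/agent/workspace/../../etc/passwd` is refused no matter how convincing the injected instructions were.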
IntentFrame is a runtime security control plane for AI-decided actions. It does not make the agent model trustworthy. It routes consequential actions through policy, deterministic gates, and semantic review before execution.
Honest framing:
| Claim | Accurate? |
|---|---|
| Assumes the agent may already be prompt-injected or compromised | Yes |
| Blocks policy-violating actions at the supported boundary before side effects | Yes (falsifiable claim) |
| Makes prompt injection impossible in the agent | No |
| Replaces OS sandboxing, EDR, MDM, or human approval | No |
| Catches every novel attack | No — covered test categories show strong resistance, not exhaustiveness |
The defensible claim is narrower: a successful bypass requires chained failure across agent, Analysis Engine, and Guardian stages, while deterministic floors bound the worst case wherever a mechanical rule applies. Semantic-only policies still rely on the AI layer — and gaps in deterministic coverage can fall through to AI review, as the root demo's 2026-04-27 incident showed.
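The layering described above can be sketched as follows. The stage names Analysis Engine and Guardian come from the text, but their interfaces are not described here, so the reviewers below are plain stand-in callables. The `decide` function, the `Verdict` enum, and the 1000-unit transfer limit are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"


# A floor returns BLOCK when its mechanical rule fires, or None when it has no opinion.
Floor = Callable[[dict], Optional[Verdict]]
# A semantic reviewer (model-backed in practice, stubbed here) always returns a verdict.
Reviewer = Callable[[dict], Verdict]


def spending_floor(action: dict) -> Optional[Verdict]:
    """Hypothetical hard limit: no single transfer above 1000 units."""
    if action.get("tool") == "transfer" and action.get("amount", 0) > 1000:
        return Verdict.BLOCK
    return None


def decide(action: dict, floors: list[Floor], reviewers: list[Reviewer]) -> Verdict:
    # Deterministic floors run first; a BLOCK here cannot be overridden by any reviewer.
    for floor in floors:
        if floor(action) is Verdict.BLOCK:
            return Verdict.BLOCK
    # No floor fired, so the outcome now depends entirely on the AI review stages.
    # Every reviewer must allow; with no reviewers configured, fail closed.
    if reviewers and all(review(action) is Verdict.ALLOW for review in reviewers):
        return Verdict.ALLOW
    return Verdict.BLOCK
```

With this shape, a 5000-unit transfer is blocked even if every reviewer has been fooled into returning ALLOW, which is the sense in which a floor bounds the worst case. An action that no floor covers is only as safe as the reviewers, which matches the fall-through risk described above.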
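You need controls at both layers:
- LLM layer (reduce language-boundary risk): input/output filters, model gateways, and RBAC on API keys.
- Agent layer (bound action risk): runtime authorization, governed tools, and audit trails.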
In the Hermes integration, IntentFrame operates in validate-only mode: the intentframe-gate plugin validates proposed tool calls; Hermes executes locally only after ALLOW. Five tools are in the default governed catalog: terminal, execute_code, write_file, patch, and cronjob.
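As a rough illustration of the validate-only pattern, the sketch below separates the component that judges a call from the component that runs it. Only the five tool names and the ALLOW verdict are taken from the text. The `run_tool` function, the `validate` and `execute` callables, and the `ActionBlocked` exception are hypothetical and are not the intentframe-gate plugin's API.

```python
from typing import Any, Callable

# The five tool names are taken from the text above; everything else is assumed.
GOVERNED_TOOLS = frozenset({"terminal", "execute_code", "write_file", "patch", "cronjob"})


class ActionBlocked(Exception):
    """Raised when the validator does not return ALLOW for a governed tool."""


def run_tool(
    name: str,
    args: dict,
    validate: Callable[[str, dict], str],
    execute: Callable[[str, dict], Any],
) -> Any:
    if name in GOVERNED_TOOLS:
        verdict = validate(name, args)  # the validator only judges; it never executes
        if verdict != "ALLOW":
            # Fail closed: any verdict other than ALLOW means no side effect.
            raise ActionBlocked(f"{name}: {verdict}")
    # Assumption: tools outside the governed catalog run without validation.
    return execute(name, args)
```

Keeping execution in the host process means the validator never needs credentials or shell access of its own. Treating every non-ALLOW verdict as a block, including unexpected or malformed ones, keeps the failure mode closed.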
IntentFrame's test suites stress different angles:
| Suite | Question | Result (point-in-time) |
|---|---|---|
| Invoice/payment attacks (24 vectors) | Can an attacker slip structurally valid but malicious actions past policy? | 23/24 defended, 0 bypassed, 1 known gap (cumulative spending — planned feature) |
| Root demo (100 attacks + 100 benign) | If the agent is already compromised with root-capable execution, does the boundary hold? | 100/100 BLOCK post-remediation (2026-04-28) |
Those answer different buyer questions. The invoice suite uses pre-written fixture intents (directed attack resistance, not live LLM jailbreak). The root demo is post-compromise containment under maximum local privilege — also a deterministic stub harness. Evidence numbers are point-in-time measurements (through 2026-06), not re-run on every commit.
Related: AI agent security requires structural separation · Intent security for AI agents · Root demo crash test · Deployment options
See how IntentFrame checks every action against hard limits and plain-English policy before it runs. Related GitHub repos you might want to check: IntentFrame core · Hermes agent integration