← Back to blog

LLM Security vs AI Agent Security: What Changes When Models Can Act

LLM security protects the model as a text processor. AI agent security protects what happens when that model can call tools. Here is how the threat models differ and why you need both.

Code on a screen — LLM security and AI agent security layers

Security teams evaluating AI systems often use LLM security and AI agent security interchangeably. They are related, but they answer different questions — and conflating them leaves a gap exactly where production agents fail.

Two different failure modes

LLM security protects the model as a text processor:

  • Prompt injection and jailbreaks (OWASP LLM01)
  • Unsafe or leaking outputs
  • Training and inference data privacy
  • Model access abuse

The threat surface is narrow: input goes in, output comes out, a human decides what to do with the result.

AI agent security protects what happens when the model can act:

  • Tool misuse and excessive agency (OWASP LLM06)
  • Privilege escalation via shell or file tools
  • Credential access and exfiltration
  • Autonomous scheduled work without a human watching

The threat surface expands the moment a tool call can cause a side effect on a host, API, or datastore.

LLM securityAI agent security
Primary questionIs the model behaving safely?Should this specific action run?
Typical controlsInput/output filters, model gateways, RBAC on API keysRuntime authorization, governed tools, audit trails
Injection impactBad or misleading textReal commands, writes, transfers
OWASP framingLLM Top 10Agentic Top 10 (extends LLM risks with tool-specific categories)

LLM security is necessary but not sufficient once agents hold tools.

What an agent program actually is

An AI agent is mostly a normal program — imports, HTTP clients, state machines, logging. That deterministic code is reviewable and testable with traditional AppSec tooling.

What makes it an agent is the slice where an LLM chooses what to do next. That slice is non-deterministic: you cannot code-review a decision that has not been made yet.

┌──────────────────────────────────────────────────────────────┐
│  Agent program                                                │
│                                                               │
│  ~95% deterministic code (developer-written, predictable)     │
│  ~5%  non-deterministic (LLM-chosen actions at runtime)       │
│                                                               │
│  Runtime security for agents gates the 5% — not the whole app │
└──────────────────────────────────────────────────────────────┘

AI agent security is about that 5%: the moment "I should run this command" or "I should write this file" becomes a real-world effect.

When prompt injection stops being a chatbot problem

OWASP ranks prompt injection as the most critical LLM risk. For a chatbot, the worst case is often embarrassing or misleading output.

For an agent with tools, the same attack class enables a different outcome:

Poisoned email / RAG doc / web page


Agent reads untrusted content in the same context as instructions


Model treats attacker text as higher-priority guidance


Tool call: shell, write_file, transfer, cronjob, outbound HTTP


Side effect — not just bad text

Indirect injection through documents, emails, or retrieval pipelines is especially dangerous for agents because the poisoned source may look like normal business data.

Input filters and output classifiers help at the language layer. They do not reliably answer: should this specific tool call execute?

What IntentFrame does (and does not) claim

IntentFrame is a runtime security control plane for AI-decided actions. It does not make the agent model trustworthy. It routes consequential actions through policy, deterministic gates, and semantic review before execution.

Honest framing:

ClaimAccurate?
Assumes the agent may already be prompt-injected or compromisedYes
Blocks policy-violating actions at the supported boundary before side effectsYes (falsifiable claim)
Makes prompt injection impossible in the agentNo
Replaces OS sandboxing, EDR, MDM, or human approvalNo
Catches every novel attackNo — covered test categories show strong resistance, not exhaustiveness

The defensible claim is narrower: a successful bypass requires chained failure across agent, Analysis Engine, and Guardian stages, while deterministic floors bound the worst case wherever a mechanical rule applies. Semantic-only policies still rely on the AI layer — and gaps in deterministic coverage can fall through to AI review, as the root demo's 2026-04-27 incident showed.

Defense stack for production agents

You need controls at both layers:

LLM layer (reduce language-boundary risk):

  • Least-privilege model and data access
  • Input filtering at the gateway
  • Output handling and redaction
  • Separation of trusted instructions from untrusted retrieved content where possible

Agent layer (bound action risk):

  • Govern only the privileged path — shell, writes, code execution, cron, outbound channels
  • Runtime authorization before side effects, outside the agent process
  • Declarative policy you can reload without redeploying the agent
  • Audit logs an operator can trust more than the model's explanation

In the Hermes integration, IntentFrame operates in validate-only mode: the intentframe-gate plugin validates proposed tool calls; Hermes executes locally only after ALLOW. Five tools are in the default governed catalog: terminal, execute_code, write_file, patch, and cronjob.

Evidence from two different test questions

IntentFrame's test suites stress different angles:

SuiteQuestionResult (point-in-time)
Invoice/payment attacks (24 vectors)Can an attacker slip structurally valid but malicious actions past policy?23/24 defended, 0 bypassed, 1 known gap (cumulative spending — planned feature)
Root demo (100 attacks + 100 benign)If the agent is already compromised with root-capable execution, does the boundary hold?100/100 BLOCK post-remediation (2026-04-28)

Those answer different buyer questions. The invoice suite uses pre-written fixture intents (directed attack resistance, not live LLM jailbreak). The root demo is post-compromise containment under maximum local privilege — also a deterministic stub harness. Evidence numbers are point-in-time measurements (through 2026-06), not re-run on every commit.

Takeaways

  1. LLM security and AI agent security are stacked layers, not synonyms.
  2. Prompt injection at the model layer becomes tool abuse at the agent layer.
  3. Input/output filters do not replace runtime authorization at the tool boundary.
  4. Honest agent security assumes compromise and gates actions, not just language.
  5. Start with the privileged path; expand governance incrementally.

Related: AI agent security requires structural separation · Intent security for AI agents · Root demo crash test · Deployment options


Image credits

Photo from Unsplash (license):

Frequently asked questions

What is the difference between LLM security and AI agent security?
LLM security focuses on the model layer — prompt injection, output safety, training data, and inference access. AI agent security focuses on what happens when the model can propose real actions — tool calls, file writes, shell commands, and API requests — before those side effects execute.
Is LLM security enough for production agents?
No. Prompt injection remains the top OWASP LLM risk, and when the victim is an agent rather than a chatbot, a successful injection can trigger unauthorized tool use — not just bad text. LLM-layer controls reduce risk at the language boundary; agent security adds enforcement at the action boundary.
Does IntentFrame prevent prompt injection in the agent?
No — and it does not claim to. IntentFrame assumes the agent may already be compromised and gates AI-decided actions at a runtime boundary before side effects occur. That is post-compromise containment, not model-layer injection prevention.

Ready to put a boundary around your agent's actions?

See how IntentFrame checks every action against hard limits and plain-English policy before it runs. Related GitHub repos you might want to check: IntentFrame core · Hermes agent integration