Abstract
Classical OWASP-inspired enumerations adequately describe immediate single-turn abuses but under-specify systemic hazards peculiar to chained LLM cognition: stochastic planning, nondeterministic tool selection, delegated authority without explicit bearer transfer, latent memory poisoning across sessions, cross-tenant side channels via retrieval indices, emergent covert channels through argument ordering entropy, adversarial MCP responses, and Byzantine failures where benign tools return maliciously truthful-but-misleading data. We propose a taxonomy with five orthogonal axes (attacker goals, coupling surfaces, temporal scope, observability primitives, and verifiability burdens) to help defenders prioritize a scarce runtime instrumentation budget.
1. Introduction
Agent frameworks collapse multiple trust domains into ephemeral execution envelopes. Existing categories (prompt injection, data poisoning) denote symptoms; practitioners need orthogonal axes that prescribe telemetry: which signals uniquely discriminate an attack hypothesis, and how to minimize false correlations when legitimate model output resembles adversarial verbosity.
1.1 Motivation
Security tooling investment should align with irreducible ambiguity: attackers exploit regions where deterministic rules fail and probabilistic defenses disagree. Naming these ambiguity classes focuses evaluation datasets and anomaly detectors on the regions where their marginal utility is highest.
2. Methodology
We synthesized public incident disclosures (where available), tabletop exercises across financial and healthcare assistants, MCP integration patterns, and gateway trace corpora distilled from lab reproductions rather than customer production traffic. Labels were iterated until inter-rater agreement reached Cohen's κ > 0.75 on a held-out adjudication set of 620 synthetic multi-step transcripts.
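For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how Cohen's κ can be computed for two raters. The function name `cohen_kappa` and the sample labels are illustrative assumptions; they are not the adjudication tooling or data used in this study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical coupling labels from two raters on four transcripts.
rater_a = ["memory", "tool_return", "input_only", "memory"]
rater_b = ["memory", "tool_return", "memory", "memory"]
kappa = cohen_kappa(rater_a, rater_b)
```

Because κ discounts agreement expected by chance, it is lower than raw percent agreement whenever one label dominates, which is why a threshold on κ is a stricter stopping rule than a threshold on raw agreement.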
2.1 Threat axes
- Goal plane: Confidentiality breach, Integrity / decision skew, Availability / economic burn, Sovereignty erosion (persistent policy bypass).
- Coupling: Input-only vs tool-return vs memory vs cross-agent orchestration.
- Temporal locality: Single hop, multi-session latent, scheduled triggers (time bombs), model-version drift exploiting alignment regression windows.
- Observability: Events visible at gateway egress vs suppressed until downstream consumer side effects manifest.
- Verifiability: Deterministic replay feasibility vs stochastic dependence on retrieval freshness.
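The five axes above can be encoded as a small label type so that incidents or transcripts are tagged consistently. This is a sketch under stated assumptions: the class names, enum member names, and the `hard_to_detect` triage rule are hypothetical and introduced only for illustration; the axis values themselves are taken from the list above.

```python
from dataclasses import dataclass
from enum import Enum

class Goal(Enum):
    CONFIDENTIALITY = "confidentiality breach"
    INTEGRITY = "integrity / decision skew"
    AVAILABILITY = "availability / economic burn"
    SOVEREIGNTY = "sovereignty erosion"

class Coupling(Enum):
    INPUT_ONLY = "input-only"
    TOOL_RETURN = "tool-return"
    MEMORY = "memory"
    CROSS_AGENT = "cross-agent orchestration"

class Temporal(Enum):
    SINGLE_HOP = "single hop"
    MULTI_SESSION = "multi-session latent"
    SCHEDULED = "scheduled trigger"
    VERSION_DRIFT = "model-version drift"

class Observability(Enum):
    GATEWAY_EGRESS = "visible at gateway egress"
    DOWNSTREAM_ONLY = "suppressed until downstream side effects"

class Verifiability(Enum):
    DETERMINISTIC_REPLAY = "deterministic replay feasible"
    STOCHASTIC = "depends on retrieval freshness"

@dataclass(frozen=True)
class ThreatLabel:
    """One point in the five-axis taxonomy."""
    goal: Goal
    coupling: Coupling
    temporal: Temporal
    observability: Observability
    verifiability: Verifiability

    def hard_to_detect(self) -> bool:
        # Hypothetical triage rule (an assumption, not a result of this paper):
        # flag labels that are invisible at the gateway and cannot be replayed.
        return (self.observability is Observability.DOWNSTREAM_ONLY
                and self.verifiability is Verifiability.STOCHASTIC)
```

As an illustrative usage, latent memory poisoning might be tagged `ThreatLabel(Goal.INTEGRITY, Coupling.MEMORY, Temporal.MULTI_SESSION, Observability.DOWNSTREAM_ONLY, Verifiability.STOCHASTIC)`; that particular assignment is an example, not an adjudicated label from the study.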
3. Comparative notes
Unlike static malware taxonomies that emphasize file features, ours encodes stochastic cognitive affordances: the attacker optimizes over conversational state rather than exploiting a patchable buffer overflow. Accordingly, defenses emphasize runtime distribution shift monitoring and policy divergence detection over signature velocity alone.
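One way to make distribution shift monitoring concrete is to compare the tool-selection distribution in a recent window against a baseline. The sketch below uses Jensen-Shannon divergence for that comparison; the choice of this metric, the function names, and the tool names are assumptions made for illustration and are not prescribed by the taxonomy.

```python
import math
from collections import Counter

def _distribution(calls, vocab):
    """Relative frequency of each tool name in vocab."""
    counts = Counter(calls)
    total = sum(counts.values())
    return [counts[t] / total for t in vocab]

def _kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(baseline_calls, window_calls):
    """Jensen-Shannon divergence (0 to 1 bits) between two tool-call samples."""
    if not baseline_calls or not window_calls:
        raise ValueError("both samples must be non-empty")
    vocab = sorted(set(baseline_calls) | set(window_calls))
    p = _distribution(baseline_calls, vocab)
    q = _distribution(window_calls, vocab)
    # The mixture m is non-zero wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

# Hypothetical tool-call traces.
baseline = ["search", "search", "read_file", "summarize"]
window = ["search", "send_email", "send_email", "send_email"]
shift = js_divergence(baseline, window)
```

Jensen-Shannon divergence is symmetric and bounded, which makes a fixed alert threshold easier to reason about than with raw KL divergence; any threshold would still need calibration against benign traffic.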
4. Practitioner implications
- Prioritize coupling surfaces that emit weak structured logs; silent failures there dominate undetected dwell time.
- Instrument argument-level entropy deltas before global embedding drift alarms; localized spikes often precede macro anomalies.
- Treat cross-tenant retrieval indices as shared attacker-controlled broadcast media requiring isolation proofs.
- Require explicit downgrade contracts when stochastic sampling widens the branching factor mid-run.
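The argument-level entropy recommendation above can be sketched as follows. This assumes one particular reading of "entropy delta": character-level Shannon entropy of each tool-call argument, compared against a baseline value for the same argument name. The function names and sample arguments are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_deltas(baseline_args: dict, observed_args: dict) -> dict:
    """Per-argument entropy change of an observed call relative to a baseline.

    Arguments missing from the baseline are compared against an empty string,
    so a newly introduced argument shows up with its full entropy as the delta.
    """
    return {
        name: shannon_entropy(str(value))
        - shannon_entropy(str(baseline_args.get(name, "")))
        for name, value in observed_args.items()
    }

# Hypothetical tool-call arguments: a typical call and a suspicious one.
typical = {"path": "reports/q1.txt", "mode": "read"}
observed = {"path": "reports/q1.txt", "mode": "read", "note": "aGVsbG8gd29ybGQ="}
deltas = entropy_deltas(typical, observed)
```

A per-argument delta localizes the signal to a single field, whereas an embedding-level drift score averages over the whole call; that is the sense in which localized spikes can surface before macro anomalies.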
5. Limitations
Taxonomies risk false precision: real attackers blend classes. Labels inform prioritization; they do not replace human incident analysis. Emerging modalities (for example, spatial reasoning in CAD copilots) may expose new coupling dimensions beyond the current axes.
6. Conclusion
Runtime AI security hinges on structuring observables and ambiguous regions before vendor model updates shuffle empirical boundaries. Formalizing orthogonal threat dimensions lets organizations route investment toward the monitors with the highest discriminatory leverage while avoiding checklist theater.
References (selected)
- OWASP Top 10 for Large Language Model Applications — ongoing community releases.
- NIST AI Risk Management Framework (AI RMF 1.0) — lifecycle governance scaffolding.
- Industry multi-agent orchestration drafts (ACP / MCP interoperability notes) — tool coupling patterns.
- Public LLM incident postmortems (aggregated sanitized summaries) — temporal escalation motifs.