Security & Compliance for Agent Workspaces: Identity, Injection, and Audit (Reality, Not Theater)

Most “agent security” decks hand-wave identity and provenance. Attackers don’t. They come through indirect prompt injection, standing privileges, and trusted connectors.

Identity-first design (non-negotiable)

Per-agent service accounts: no shared human creds.
Short-lived tokens with rotation; deny by default.
Scopes per tool: read vs write, resource-bound, time-bound.
Just-in-time (JIT) elevation for high-risk writes, granted only after human approval.
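
A minimal sketch of what a short-lived, resource-bound scope could look like. `ScopedToken`, `issue_token`, `is_allowed`, and the 5-minute default TTL are all hypothetical names and values chosen for illustration; a real deployment would lean on its identity provider rather than hand-rolled checks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ScopedToken:
    """Hypothetical credential: one agent, one tool, one action, one resource."""
    agent_id: str
    tool: str
    action: str          # "read" or "write"
    resource: str        # the exact resource this token is bound to
    expires_at: datetime


def issue_token(agent_id, tool, action, resource, now, ttl_seconds=300):
    """Mint a token that expires ttl_seconds after `now` (assumed default: 5 min)."""
    return ScopedToken(agent_id, tool, action, resource,
                       now + timedelta(seconds=ttl_seconds))


def is_allowed(token, agent_id, tool, action, resource, now):
    """Deny by default: every field must match and the token must be unexpired."""
    return (
        now < token.expires_at
        and token.agent_id == agent_id
        and token.tool == tool
        and token.action == action
        and token.resource == resource
    )


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
token = issue_token("agent-7", "crm", "read", "accounts/42", now)

# Same agent, tool, action, and resource, inside the TTL: allowed.
print(is_allowed(token, "agent-7", "crm", "read", "accounts/42", now))
# A read token does not authorize a write.
print(is_allowed(token, "agent-7", "crm", "write", "accounts/42", now))
# Ten minutes later the token has expired.
print(is_allowed(token, "agent-7", "crm", "read", "accounts/42",
                 now + timedelta(minutes=10)))
```

The check has no "admin" branch on purpose: anything that is not an exact match falls through to a denial.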

Policy before power

Evaluate who/what/why/where before every side-effecting tool call.
Maintain a risk matrix (R1–R4): auto-approve R1–R2; require one human approver for R3; require dual control (two independent approvers) for R4.
Emit policy verdicts into the audit log (allow/deny/reason).
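
One way the risk matrix and verdict logging could fit together, as a sketch. `Risk`, `Verdict`, `evaluate`, `decide`, and the in-memory `audit_log` are hypothetical; this version evaluates only the risk tier and approver count, and records the who/what/why/where context alongside the verdict rather than evaluating it.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    R1 = 1
    R2 = 2
    R3 = 3
    R4 = 4


# Distinct human approvals required per tier, following the matrix above.
REQUIRED_APPROVALS = {Risk.R1: 0, Risk.R2: 0, Risk.R3: 1, Risk.R4: 2}


@dataclass(frozen=True)
class Verdict:
    decision: str   # "allow" or "deny"
    reason: str


def evaluate(risk, approvers=()):
    """Return a verdict for one side-effecting call at the given risk tier."""
    needed = REQUIRED_APPROVALS[risk]
    got = len(set(approvers))   # the same person approving twice counts once
    if needed == 0:
        return Verdict("allow", f"{risk.name}: auto-approved")
    decision = "allow" if got >= needed else "deny"
    return Verdict(decision, f"{risk.name}: {got} of {needed} approvals present")


audit_log = []  # stand-in for a real audit sink


def decide(agent, tool, purpose, target, risk, approvers=()):
    """Evaluate the call and log the verdict with its who/what/why/where context."""
    verdict = evaluate(risk, approvers)
    audit_log.append({
        "agent": agent, "tool": tool, "purpose": purpose, "target": target,
        "decision": verdict.decision, "reason": verdict.reason,
    })
    return verdict


print(decide("agent-7", "crm.read", "lookup", "accounts/42", Risk.R1))
print(decide("agent-7", "crm.delete", "cleanup", "accounts/42", Risk.R4, ["alice"]))
print(decide("agent-7", "crm.delete", "cleanup", "accounts/42", Risk.R4, ["alice", "bob"]))
```

Denials are logged the same way as approvals, so the audit trail shows what the agent attempted as well as what it did.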

Defense-in-depth for indirect injection

Content isolation: sandbox retrieval, strip active content, and canonicalize the remaining text before it reaches the model.
Provenance: attach source + hash to every retrieved chunk.
Allow-list tools: only explicitly approved tools are callable.
Output binding: sign artifacts; link to decision records and inputs.
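
A sketch of the provenance and allow-list controls above. The names (`Chunk`, `with_provenance`, `call_tool`, `ALLOWED_TOOLS`) are hypothetical, and the canonicalization shown (Unicode NFKC plus whitespace collapsing) is one assumed choice among several; sandboxing, active-content stripping, and artifact signing are left out.

```python
import hashlib
import unicodedata
from dataclasses import dataclass


def canonicalize(text):
    """Assumed canonical form: NFKC-normalize, then collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


@dataclass(frozen=True)
class Chunk:
    source: str   # where the text came from
    text: str     # canonicalized content
    sha256: str   # hash of the canonicalized content


def with_provenance(source, raw_text):
    """Canonicalize retrieved text and attach its source and hash."""
    text = canonicalize(raw_text)
    return Chunk(source, text, hashlib.sha256(text.encode("utf-8")).hexdigest())


def verify(chunk):
    """True if the chunk's text still matches the hash recorded at retrieval."""
    return hashlib.sha256(chunk.text.encode("utf-8")).hexdigest() == chunk.sha256


# Only explicitly approved tools are callable; these names are illustrative.
ALLOWED_TOOLS = frozenset({"search_docs", "read_ticket"})


def call_tool(name, registry, **kwargs):
    """Refuse any tool that is not on the allow-list, then dispatch."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allow-listed: {name}")
    return registry[name](**kwargs)


chunk = with_provenance("https://example.com/doc", "  Reset   the\npassword ")
print(chunk.text, verify(chunk))

registry = {"search_docs": lambda query: [f"result for {query}"]}
print(call_tool("search_docs", registry, query="rotation policy"))
try:
    call_tool("delete_database", registry)
except PermissionError as err:
    print(err)
```

The allow-list is checked before the registry lookup, so an injected instruction naming an unapproved tool fails the same way whether or not that tool exists.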

Audit that survives daylight

Evidence bundles per action: tool manifest/version, inputs/outputs, policy verdict, artifacts, hashes.
Export to SIEM/APM in near real time; store long-term snapshots.
Reproducibility: re-run with the same tool versions and inputs (or show why you can’t).
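
A sketch of an evidence bundle and a reproducibility check, assuming tool inputs and outputs are JSON-serializable. `sha256_json`, `build_bundle`, and `reproduces` are hypothetical helpers; artifacts and SIEM export are omitted for brevity.

```python
import hashlib
import json


def sha256_json(obj):
    """Hash a JSON-serializable object using a canonical encoding (sorted keys)."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_bundle(tool, version, inputs, outputs, verdict):
    """Collect what one action needs to be audited later, plus hashes of each part."""
    bundle = {
        "tool": tool,
        "tool_version": version,
        "inputs": inputs,
        "outputs": outputs,
        "policy_verdict": verdict,
        "input_hash": sha256_json(inputs),
        "output_hash": sha256_json(outputs),
    }
    # Hash of everything above, so later edits to the bundle are detectable.
    bundle["bundle_hash"] = sha256_json(bundle)
    return bundle


def reproduces(bundle, tool_fn):
    """Re-run the tool on the recorded inputs and compare output hashes."""
    return sha256_json(tool_fn(**bundle["inputs"])) == bundle["output_hash"]


def add(a, b):
    """Toy deterministic tool used only for this example."""
    return {"sum": a + b}


inputs = {"a": 1, "b": 2}
bundle = build_bundle("add", "1.0.0", inputs, add(**inputs),
                      {"decision": "allow", "reason": "R1: auto-approved"})

print(reproduces(bundle, add))                                   # same tool, same inputs
print(reproduces(bundle, lambda a, b: {"sum": a + b + 1}))       # tool behavior changed
```

A mismatch does not say which side changed, only that the recorded output can no longer be regenerated; that is the "show why you can't" case.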

Incident playbook (condensed)

Freeze credentials for implicated agents/tools.
Replay the run from logs (state + tools + artifacts).
Trace all side-effecting writes; roll back or supersede.
Report with timeline, impact, and preventive changes.
Harden: add tests/policies targeting the exact failure class.
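
The freeze and trace steps can be sketched against an in-memory audit log. The record fields (`seq`, `agent`, `tool`, `side_effect`) and both helpers are hypothetical; reviewing newest-first is an assumed convention, chosen so later writes are undone before the earlier ones they may depend on.

```python
def freeze(revoked, agent_ids):
    """Return a new revocation set that also covers the implicated agents."""
    return set(revoked) | set(agent_ids)


def writes_to_review(audit_log, agent_id):
    """Side-effecting calls by one agent, newest first, for rollback or supersede."""
    hits = [r for r in audit_log if r["agent"] == agent_id and r["side_effect"]]
    return sorted(hits, key=lambda r: r["seq"], reverse=True)


audit_log = [
    {"seq": 1, "agent": "agent-7", "tool": "crm.read",   "side_effect": False},
    {"seq": 2, "agent": "agent-7", "tool": "crm.update", "side_effect": True},
    {"seq": 3, "agent": "agent-9", "tool": "mail.send",  "side_effect": True},
    {"seq": 4, "agent": "agent-7", "tool": "mail.send",  "side_effect": True},
]

revoked = freeze(set(), ["agent-7"])
print(sorted(revoked))
for record in writes_to_review(audit_log, "agent-7"):
    print(record["seq"], record["tool"])
```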

Minimal reference architecture

flowchart TD
  U[User / Trigger] --> A[Agent Runtime]
  A --> P[Policy Engine]
  A --> T[Tool Bus (MCP/Actions)]
  P -->|verdict| A
  T --> S[Sandboxed Executors]
  S --> L[Audit Sink]
  A --> L
  T --> L

Subtext: Stacks that unify identity, policy, and audit at the tool boundary turn scary demos into trustworthy systems.