The Context Guard blog
Field notes from defending production LLM applications: prompt injection, context poisoning, OWASP LLM Top 10 coverage, and the engineering behind the proxy.
Featured · Threat research
LLM Tool Abuse Attacks: Shell Injection, SSRF, Credential Theft, and 252 Other Ways Your Agent Can Be Turned Against You
AI agents call tools on your behalf. When an attacker controls the arguments, the agent becomes a weapon aimed at your infrastructure. Tool abuse is the largest attack category in production LLM deployments, with 252 detection rules covering shell injection, SQL injection, path traversal, SSRF, credential harvesting, sandbox escapes, MCP exploitation, deserialization RCE, and mass assignment. Here are the nine attack families, the real payloads, and the four-layer defense architecture that stops tool-call attacks before they execute.
All posts

LLM Authentication Attacks: OAuth Token Theft, Session Hijacking, and Identity Bypass in AI Platforms
OAuth token replay, CSRF bypass, scope escalation, IDOR in agent workspaces, and cross-user identity hijacking are the authentication attack classes that compromise AI platforms at the identity layer. The model is the entry point; the identity system is the prize. Backed by disclosed vulnerabilities in MCP OAuth flows, Langflow IDOR, and Open WebUI authorization bypasses, here is the full threat map and the defense architecture that closes the gaps.

Email and Communication Channel Injection: How Attackers Hijack AI Assistants Through Slack, Email, and Shared Docs
CVE-2026-33654, CVE-2025-32711 (EchoLeak), CVE-2025-46059, and GitHub comment hijacking demonstrate that email bodies, hidden HTML content, AGENTS.md files, and Slack messages are all live injection channels. The attacker never touches the user prompt. Here are the six attack vectors, the CVEs behind them, and the defense architecture that secures every channel.

LLM Sandbox Escapes: How AI Agents Break Out of Containment
From unsandboxed Python execution disguised as isolation, to Docker socket privilege escalation, to managed identity token theft from cloud MCP servers, sandbox escapes in LLM agents are well-documented and growing. Here are the six attack families, the CVEs that prove them real, and the defense architecture that stops them.

LLM Platform Vulnerabilities: IDOR, BOLA, GPU Leaks, and the Seven Attack Classes That Bypass Prompt Security
IDOR, BOLA, GPU memory leaks, OAuth bypass, postMessage confirmation bypass, decompression bombs, and metadata manipulation are seven platform-level vulnerability classes that no prompt injection filter will catch. Backed by 30+ real security advisories from Open WebUI, vLLM, and Langflow, here is the full threat map and the defense architecture that closes the gaps.

Conditional Trigger Attacks: How Delayed-Action Injections Bypass Every Filter
Conditional trigger attacks plant dormant instructions in an LLM's context that only activate when a future condition is met. The attack evades single-request inspection: the request that finally triggers the breach is itself clean. Here are the five attack patterns, the two detection rules that catch them, and the defense architecture that stops time-bomb injections before they fire.

MCP Supply Chain Attacks: 30 CVEs, Rug Pulls, and the Trust Model That Broke
Thirty CVEs in 60 days, the first malicious MCP server hitting 300 organizations, and a design-level RCE baked into Anthropic's SDK. The MCP supply chain is under active attack. Here is the full incident map and the defense architecture that stops it.

Agentic Web Attacks: How Attackers Exploit AI Browsers That Browse the Internet
AI agents that browse the web are under active attack. Hidden instructions in web pages, browser manipulation, UI deception, credential harvesting, data exfiltration through forms, and MCP tool hijacking are six attack classes that exploit the trust agents place in web content. Backed by the WAAA research and production attack patterns, here is the full threat map and the five-layer defense architecture.

LLM Template Injection: How Template Engines Become Prompt Injection Vectors
Jinja2, Django templates, and Python format strings are the plumbing of every LLM pipeline. When attackers inject template syntax into that plumbing, they bypass every prompt filter and achieve data exfiltration from the application server. CVE-2025-65106 proved it in LangChain. Here are the five attack vectors and the defense architecture that stops them.

LLM Code Execution Attacks: How Sandbox Escapes Turn AI Assistants Into Attack Platforms
Sandbox escapes, pickle deserialization RCE, trust_remote_code execution, MCP server command injection, and self-propagating agent worms are the five code execution attack classes we see in production. Backed by CVEs, GitHub advisories, and published research, here is the full threat map and the defense architecture that stops your AI assistant from becoming an attack platform.

Agent Memory Poisoning: How Attackers Plant Persistent Backdoors in LLM Memory
When an attacker poisons an agent's persistent memory, the compromise survives restarts, persists across sessions, and spreads to child agents through inheritance. Here are the five memory poisoning attack classes we detect in production and the defense architecture that stops poisoned memories from becoming persistent backdoors.

LLM Supply Chain Attacks: How Compromised Models, Plugins, and Dependencies Subvert Your AI Stack
Compromised model weights, malicious MCP servers, template injection, sandbox escapes, SSRF, and framework vulnerabilities give attackers a path into your LLM stack that no prompt filter can close. Here are the six supply chain attack classes we see in production, the CVEs and advisories behind them, and the defense architecture that stops them.

LLM Denial of Service: How Resource Exhaustion Attacks Drain Your AI Budget
LoopTrap termination poisoning, ThinkTrap infinite reasoning, RECUR recursive reflection abuse, and tool-chain cost amplification are four distinct attack classes that exploit the fact that LLMs keep working if nobody tells them to stop. Here is how each one works, why token limits do not help, and the five-layer defense that caps costs before they spiral.

Invisible Prompt Injection: How Hidden Unicode Characters Bypass LLM Security
Zero-width characters, Unicode tag sequences, bidirectional overrides, and homoglyphs let attackers smuggle malicious instructions past every keyword filter and human reviewer. The text you see is not the text the model sees. Here is how each invisible injection technique works and the normalize-decode-detect pipeline that stops them.
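
As a rough illustration of what a normalize-decode-detect pass can look like, here is a minimal Python sketch. It is not the Context Guard implementation: the function name `scan_hidden_unicode`, the character sets, and the finding labels are all assumptions made for this example, and a real pipeline would likely cover more code points.

```python
import unicodedata

# Assumed character sets for illustration; not an exhaustive list.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
BIDI_CONTROLS = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
                 "\u2066", "\u2067", "\u2068", "\u2069"}
TAG_START, TAG_END = 0xE0000, 0xE007F  # Unicode Tags block


def scan_hidden_unicode(text: str) -> dict:
    """Hypothetical helper: strip invisible characters, decode any
    tag-encoded payload, and report which techniques were seen."""
    visible, smuggled, findings = [], [], set()
    for ch in text:
        cp = ord(ch)
        if ch in ZERO_WIDTH:
            findings.add("zero_width")
        elif ch in BIDI_CONTROLS:
            findings.add("bidi_control")
        elif TAG_START <= cp <= TAG_END:
            findings.add("unicode_tag")
            # Tag characters U+E0020..U+E007E mirror printable ASCII.
            if 0xE0020 <= cp <= 0xE007E:
                smuggled.append(chr(cp - 0xE0000))
        else:
            visible.append(ch)
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ones.
    # It does NOT map cross-script homoglyphs such as Cyrillic letters,
    # which would need a separate confusables check.
    normalized = unicodedata.normalize("NFKC", "".join(visible))
    return {
        "normalized": normalized,
        "decoded_tags": "".join(smuggled),
        "findings": sorted(findings),
    }
```

The decoded tag payload and the normalized visible text can then both be handed to whatever injection detector runs downstream, so the detector sees the same text the model would.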

System Prompt Leakage: Why Your AI's Hidden Instructions Are Not Hidden
Every LLM application has a system prompt. Most teams treat it as a secret. It is not. System prompt leakage (OWASP LLM07) is one of the most exploited vulnerability classes in production LLM applications, and the extraction techniques range from trivially simple to sophisticated multi-turn probing campaigns. Here is the full threat map, the seven detection rules that catch every extraction method, and why treating your system prompt as a security boundary is a losing strategy.

Multilingual Prompt Injection: How Non-English Attacks Bypass Your Defenses
Most LLM security filters are built for English. But models speak dozens of languages, and attackers use German, Spanish, Korean, and Russian to walk right past English-only defenses. Here is how multilingual injection works and how to build a defense that does not stop at the language border.

LLM Output Exfiltration: How Attackers Steal Data Through Your Model's Response
Markdown images, base64 coercion, cipher output, emoji substitution, and tool-call exfiltration are among the seven output-side attack techniques that bypass traditional DLP. Here is how each one works and the multi-layer defense that stops them.

AI Governance Crisis: Why Most Companies Are Deploying AI Without Authority
A deep dive into the governance vacuum behind enterprise AI adoption, from shadow AI and procurement failures to PII leakage, regulatory exposure, and the technical controls companies need before they scale.

CI/CD Pipeline Injection: When Your Build Bot Has an LLM Inside
LLM-powered CI/CD workflows are a new attack surface that traditional pipeline security cannot defend. The Heimdallr research, CVE-2025-65106, and real-world attack patterns show how PR descriptions, commit messages, and template injection can compromise your build pipeline from the inside.

RAG Data Exfiltration: How Attackers Steal Your Knowledge Base
RAG systems give LLMs access to proprietary data. Attackers have figured out how to pull it all out through the model itself. Here is how the LeakDojo attack works, how enumeration probes map your knowledge base, and how to lock it down.

Securing Autonomous AI Agents: Attack Surfaces, Threats, and Defense Patterns
Autonomous AI agents can browse the web, call APIs, and send emails on your behalf. Here are the seven attack classes we see in production and the six-layer defense architecture that stops them.

Why We Built a Hybrid Detection Engine
Per-dataset benchmark results for the Context Guard hybrid pipeline (rules plus ML judge), where each layer wins, the AdvBench ceiling, and why we run both.

MCP Security Attacks: How Attackers Hijack AI Tool Calls in 2026
Three CVEs, multiple GitHub advisories, and growing academic research expose MCP tool hijacking, SSE injection, LoopTrap, and agentic browser attacks. Here is the full threat map and how to defend against it.

AI Security Best Practices for Production LLM Applications
An end-to-end practical guide to shipping production LLM applications safely: input validation, output filtering, agent controls, monitoring, and compliance.

OWASP LLM Top 10 2025: Every Risk Explained with Mitigations
Walk through every item in the OWASP LLM Top 10 with practical mitigations and a coverage map for runtime defense layers.

10 Real Prompt Injection Attacks & How to Stop Them
A practical tour of ten prompt injection techniques observed in production traffic, with payloads and the detection logic that stops each one.

What Is Context Poisoning? The Complete Guide for 2026
Context poisoning is the next-generation cousin of prompt injection. Learn what it is, how it differs, real-world attack scenarios, and how to defend against it.