The average developer's machine is now home to at least one agentic AI tool — something that reads their codebase, writes files, executes shell commands, and makes network calls on their behalf. Claude Code, GitHub Copilot (in agent mode), and similar tools have fundamentally changed what it means to run software on a developer's workstation. And where there's code execution, there's an attack surface.
As a detection engineer, I'm less interested in whether AI agents hallucinate and more interested in what happens when someone discovers a vulnerability in the agent itself. What does malicious exploitation of an AI coding assistant look like in telemetry? How do you hunt for it? And what detection logic actually works at scale?
Scope: This piece focuses on host-level and process-level detection for AI agent exploitation. It assumes you have access to endpoint telemetry (e.g., via osquery, Elastic Agent, or similar) and some form of SIEM. Cloud-side detections (e.g., for API key abuse) are a separate topic.
The New Attack Surface
AI coding agents are architecturally interesting from a security perspective because they combine several high-risk primitives: arbitrary code execution, filesystem read/write, subprocess spawning, and outbound network access. These aren't new threat vectors — developers have always had these capabilities. What's new is that an LLM is now mediating these actions, and that LLM can be influenced.
The threat model breaks down into two broad categories. First, there are vulnerabilities in the agent itself — memory corruption bugs, authentication bypasses, path traversal in file-handling code, or prompt injection via crafted inputs. Second, there is misuse of the agent's legitimate capabilities — convincing it (or its user) to execute malicious code via social engineering, malicious repositories, or poisoned context.
"When a CVE lands in a tool that has a shell open and your SSH keys in scope, it's not just a software bug — it's a potential initial access vector."
The CVEs we've seen in this space so far are mostly in the dependency chains and API handling layers. But as these agents become more capable and more embedded in developer workflows, the vulnerability surface will grow. The good news: the behavioral signatures of exploitation tend to be consistent, regardless of the specific CVE.
What Exploitation Looks Like in Telemetry
When an AI agent is exploited, the attacker's goal is usually one of three things: credential theft, persistent access, or lateral movement to cloud infrastructure. Let's look at what each of those looks like at the telemetry layer.
Process Lineage Anomalies
The most reliable signal is unexpected process spawning. Under normal operation, an AI agent like Claude Code spawns a predictable set of child processes — compilers, test runners, git operations — with predictable arguments. What stands out is when the agent spawns processes that have no relationship to normal development activity.
```yara
// Detect suspicious process spawning from known AI agent parent processes
rule AIAgent_SuspiciousChildProcess
{
    meta:
        description = "Flags unexpected child processes spawned by AI coding agents"
        author = "Evan Baltman @ GitLab Signal Engineering"
        date = "2025-04"
        mitre = "T1059.004, T1106"

    strings:
        // Parent process names for common AI coding agents
        $parent_claude = "claude" ascii wide
        $parent_copilot = "copilot" ascii wide
        $parent_cursor = "cursor" ascii wide

        // Suspicious child process indicators
        $child_curl = "curl" ascii
        $child_wget = "wget" ascii
        $child_nc = " nc " ascii
        $child_bash = "/bin/bash" ascii
        $child_sh_c = "sh -c" ascii

    condition:
        any of ($parent_*) and any of ($child_*)
}
```
Tuning note: Claude Code legitimately spawns shells to execute code. The signal here isn't the child process alone — it's the combination of parent context, network access, and the specific arguments passed. Enrich with process.args data before alerting.
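As a starting point for that enrichment, here's a minimal osquery sketch that pulls the full command lines of suspicious children alongside their agent parents. The parent and child name lists are assumptions, not tested values: inventory the binary names your fleet actually runs (an agent distributed as a Node.js CLI may show up as node rather than its product name, for example) before deploying anything like this.

```sql
-- Hunt sketch: children of AI agent processes, with full command lines
-- for triage. Parent/child name lists are assumptions; tune to your fleet.
SELECT
  child.pid,
  child.name,
  child.cmdline,
  parent.name AS parent_name,
  parent.cmdline AS parent_cmdline
FROM processes AS child
JOIN processes AS parent
  ON child.parent = parent.pid
WHERE parent.name IN ('claude', 'copilot', 'cursor')
  AND child.name IN ('curl', 'wget', 'nc', 'ncat', 'bash', 'sh');
```

Run it ad hoc during a hunt, or schedule it and route results to your SIEM so the cmdline context is already attached when an analyst picks up the alert.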
Credential Access Patterns
A compromised agent operating in a developer's home directory has access to ~/.ssh/, ~/.aws/credentials, ~/.config/gcloud/, and similar paths. File access to these locations — especially reads followed immediately by outbound network connections — is a high-fidelity indicator.
In osquery, you can hunt for this with a simple join across file_events and socket_events:
```sql
SELECT
  fe.target_path,
  fe.action,
  fe.pid,
  p.name AS process_name,
  p.cmdline,
  se.remote_address,
  se.remote_port,
  se.time AS socket_time
FROM file_events fe
JOIN processes p
  ON fe.pid = p.pid
JOIN socket_events se
  ON se.pid = fe.pid
 AND se.time BETWEEN fe.time AND fe.time + 30
WHERE fe.target_path LIKE '%/.ssh/%'
   OR fe.target_path LIKE '%/.aws/credentials'
   OR fe.target_path LIKE '%/.config/gcloud/%'
ORDER BY fe.time DESC;
```
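Expect noise here: backup agents, IDE indexers, and dotfile sync tools also touch these paths. Scoping the same join to the agent processes themselves tightens it considerably. A sketch of that variant, with the process-name list again an assumption to replace with your fleet's real binaries:

```sql
-- Same read-then-connect pattern, scoped to AI agent processes only.
-- The name list is an assumption; replace with your fleet's actual binaries.
SELECT
  fe.target_path,
  p.name AS process_name,
  p.cmdline,
  se.remote_address,
  se.remote_port
FROM file_events fe
JOIN processes p
  ON fe.pid = p.pid
JOIN socket_events se
  ON se.pid = fe.pid
 AND se.time BETWEEN fe.time AND fe.time + 30
WHERE p.name IN ('claude', 'copilot', 'cursor')
  AND (fe.target_path LIKE '%/.ssh/%'
       OR fe.target_path LIKE '%/.aws/credentials'
       OR fe.target_path LIKE '%/.config/gcloud/%');
```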
MITRE ATT&CK Mapping
Mapping AI agent exploitation to the ATT&CK framework helps communicate risk to stakeholders and align detections with existing playbooks. Here's how the most common exploitation patterns map:
| Technique ID | Technique Name | AI Agent Context |
|---|---|---|
| T1059.004 | Unix Shell | Agent spawns shell to execute attacker-controlled commands |
| T1552.001 | Credentials in Files | Agent reads ~/.aws/credentials, ~/.ssh/id_rsa, etc. |
| T1106 | Native API | Agent uses OS APIs to spawn processes or read memory |
| T1071.001 | Web Protocols | Exfiltration over HTTP/HTTPS from agent process |
| T1204.002 | User Execution: Malicious File | Malicious repo or file causes agent to execute payload |
Thinking in Layers, Not Rules
The mistake I see most often in detection engineering is building single-signal detections. One YARA rule. One Sigma rule. One alert. The problem is that any sufficiently motivated attacker will evade a single signal. What they struggle to evade is a layered detection model where each layer asks a different question.
For AI agent exploitation specifically, I think about three detection layers: behavioral (what is the process doing?), structural (where does this process live in the system?), and temporal (does the timing of events make sense?). A good detection combines signals from at least two of these layers before firing.
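To make that concrete, here's a minimal osquery sketch that only fires when all three layers agree: structural (the process descends from an agent), behavioral (it opens a connection on a non-standard port), and temporal (it does so within seconds of spawning). The agent name list, port allowlist, and five-second threshold are all assumptions to tune against your own baseline, not tested values.

```sql
-- Layered hunt sketch: structural + behavioral + temporal, all required.
-- Agent names, allowed ports, and the 5-second window are assumptions.
-- Note: joining processes twice only covers still-running processes.
SELECT
  p.pid,
  p.name,
  p.cmdline,
  pa.name AS parent_name,
  se.remote_address,
  se.remote_port
FROM processes p
JOIN processes pa
  ON p.parent = pa.pid
JOIN socket_events se
  ON se.pid = p.pid
WHERE pa.name IN ('claude', 'copilot', 'cursor')  -- structural: agent lineage
  AND se.remote_port NOT IN (80, 443)             -- behavioral: unusual destination port
  AND se.time - p.start_time <= 5;                -- temporal: connects right after spawn
```

The design intent is that each condition is cheap for an attacker to evade individually, but evading all three at once pushes their activity toward patterns the other detections in this post already cover.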
This is, in a way, the same mental model I use when learning a language. You don't understand a sentence by knowing what each word means in isolation — you understand it by seeing how the words relate to each other, in context, over time. Detection is the same. The signal is in the relationships, not the atoms.
The next sections — covering behavioral baselining, cloud-side detections, and a full case study of a real AI agent CVE — are still being written. Check back in May 2025. Feel free to reach out on LinkedIn if you want to discuss in the meantime.