Securing Shell Execution Agents: From Validation to Custom DSLs

Agents with shell execution capabilities are everywhere now—from general-purpose CLI assistants that help with development tasks, to IT support bots that diagnose system issues, to DevOps agents that automate deployments, to security compliance agents that enforce policies across fleets. These agents execute PowerShell on Windows, bash on Linux, or zsh on macOS to accomplish their tasks. But this power comes with serious risks. When an agent can run shell commands on your behalf, you’re giving an AI system direct access to your machine. Whether it’s helping you debug code, troubleshooting a printer, deploying infrastructure, or checking compliance, the attack surface is the same. ...
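The validation approach the post title alludes to can be sketched very simply. This is a minimal illustrative example, not the post's actual implementation: a hypothetical `is_allowed` check that combines a command allowlist with a scan for shell metacharacters that could chain in extra commands.

```python
import shlex

# Hypothetical allowlist of read-only commands an agent may run.
SAFE_COMMANDS = {"ls", "cat", "grep", "ps", "df"}

# Substrings that let one command smuggle in another.
DANGEROUS = [";", "&", "|", "`", "$(", ">", "<"]

def is_allowed(command: str) -> bool:
    """Allow a command only if its program is on the allowlist and
    no shell metacharacters appear anywhere in the string."""
    if any(tok in command for tok in DANGEROUS):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    return bool(parts) and parts[0] in SAFE_COMMANDS

print(is_allowed("ls -la /tmp"))                 # True
print(is_allowed("cat notes.txt; rm -rf /"))     # False (command chaining)
```

A real agent would need far more than this sketch (argument validation, path restrictions, and ultimately the custom DSLs the title mentions), but it shows the starting point: deny by default, allow narrowly.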

March 1, 2026 · 17 min · Evren

Simulation for Agentic Evaluation

Evaluating AI agents presents fundamentally different challenges compared to traditional software testing. Traditional software follows deterministic paths—given the same input, you get the same output. You can write unit tests, integration tests, and measure code coverage with confidence. But agents are non-deterministic by nature. They make decisions based on LLM outputs that vary between runs, they interact with external systems in unpredictable ways, and they can take multiple valid paths to solve the same problem. You can’t simply assert that function X returns value Y. Instead, you need to evaluate whether the agent achieved the intended outcome, regardless of how it got there. This shift from testing execution paths to evaluating goal achievement requires entirely new evaluation frameworks—and that’s where simulation comes in. ...
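The shift from asserting on execution paths to evaluating outcomes can be illustrated with a toy sketch. The names below (`evaluate_outcome`, the simulated run states) are illustrative assumptions, not the post's framework: two agent runs take different step sequences, and both pass because the evaluation is a predicate over the final state.

```python
# Outcome-based evaluation: check a goal predicate over the final simulated
# state, ignoring which sequence of steps the agent took to get there.
from typing import Callable

def evaluate_outcome(final_state: dict, goal_check: Callable[[dict], bool]) -> bool:
    """Pass if the goal predicate holds, regardless of the path taken."""
    return goal_check(final_state)

# Two non-deterministic runs that reach the same goal by different paths:
run_a = {"files": {"report.txt": "done"}, "steps": ["write"]}
run_b = {"files": {"report.txt": "done"}, "steps": ["draft", "revise", "write"]}

goal = lambda state: state["files"].get("report.txt") == "done"

print(evaluate_outcome(run_a, goal))  # True
print(evaluate_outcome(run_b, goal))  # True, despite a different step sequence
```

A path-based unit test comparing `run_a["steps"]` to `run_b["steps"]` would fail on one of these runs; the outcome check accepts both, which is exactly the property simulation-based evaluation needs.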

February 27, 2026 · 6 min · Evren