Simulation for Agentic Evaluation

Evaluating AI agents presents fundamentally different challenges from traditional software testing. Traditional software follows deterministic paths—given the same input, you get the same output. You can write unit tests, integration tests, and measure code coverage with confidence. But agents are non-deterministic by nature. They make decisions based on LLM outputs that vary between runs, they interact with external systems in unpredictable ways, and they can take multiple valid paths to solve the same problem. You can’t simply assert that function X returns value Y. Instead, you need to evaluate whether the agent achieved the intended outcome, regardless of how it got there. This shift from testing execution paths to evaluating goal achievement requires entirely new evaluation frameworks—and that’s where simulation comes in. ...
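The idea above can be sketched in a few lines. This is a hypothetical illustration, not a real framework: `run_agent_stub` stands in for a non-deterministic agent, and `evaluate_outcome` is an invented checker that judges the final state rather than the action sequence.

```python
def run_agent_stub(task, seed):
    """Stand-in for a non-deterministic agent: different seeds take
    different action paths but can reach the same end state."""
    if seed % 2 == 0:
        actions = ["search_flights", "book_flight"]
    else:
        actions = ["open_airline_site", "compare_prices", "book_flight"]
    final_state = {"booking_confirmed": True, "destination": task["destination"]}
    return actions, final_state

def evaluate_outcome(task, final_state):
    """Outcome-based evaluation: check goal achievement, not the path."""
    return (final_state.get("booking_confirmed") is True
            and final_state.get("destination") == task["destination"])

task = {"destination": "SFO"}
results = [evaluate_outcome(task, run_agent_stub(task, seed)[1])
           for seed in range(4)]

# Different seeds take different action paths, yet every run passes,
# because the evaluation targets the outcome.
print(all(results))  # True
```

A path-based unit test asserting the exact action sequence would flake across runs; the outcome check stays stable even as the agent varies its approach.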

February 27, 2026 · 6 min · Evren