Toward Securing AI Agents Like Operating Systems

This paper argues that securing LLM-based AI agents requires applying operating system security principles. Through a unified architecture analysis and a case study, it shows that while some risks are inherent, many vulnerabilities can be mitigated with established OS techniques such as resource isolation and privilege separation.

Original authors: Lukas Pirch, Micha Horlboge, Patrick Großmann, Syeda Mahnur Asif, Klim Kireev, Thorsten Holz, Konrad Rieck

Published 2026-05-15 (author reviewed)

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine you've hired a super-smart, incredibly eager personal assistant named "Agent." This assistant can read your emails, manage your calendar, book flights, and even write code for you. It's like having a magical employee who never sleeps.

But here's the catch: You gave this employee the keys to your entire house, your bank account, and your diary. If a clever thief tricks the assistant into thinking they are you, or convinces it to open the back door, the thief gets everything.

This is the core problem the paper tackles. The authors argue that we are building these AI agents as if they were brand-new, magical creatures, when we should be treating them like operating systems (the software that runs your computer, such as Windows or macOS).

Here is the breakdown of their findings, using simple analogies:

1. The Big Idea: The Agent is the Operating System

The authors' message, in short: stop thinking of the AI agent as just a chatbot, and think of it as the OS of your digital life.

  • The AI (LLM) is the User: In a computer, the user types commands. In an AI agent, the Large Language Model (the "brain") is the one typing the commands. But just like a human user can be tricked by a phishing email, an AI can be tricked by a "jailbreak" prompt.
  • The Tools are System Calls: When you click "Print" on your computer, the OS checks whether you have permission. When an AI agent wants to send an email, it invokes a tool. The paper argues these tools should be treated like strict system calls, not free-for-all commands.
  • The Runtime is the Kernel: In a secure computer, the kernel is the boss. It decides who gets to touch what. In an AI agent, the runtime that actually executes the tool calls plays the role of the kernel. In current AI agents, this "kernel" is often too nice and lets the "user" (the AI) do whatever it wants, even if it's dangerous.
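
The mapping above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the paper or from any agent it studies: the names Runtime, Task, dispatch, and the two example tools are hypothetical. It shows a runtime in the "kernel" role that checks every tool call against a per-task allowlist before running it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet


class PermissionDenied(Exception):
    """Raised when the runtime refuses a tool call."""


@dataclass(frozen=True)
class Task:
    """A unit of work with a fixed set of allowed tools (hypothetical type)."""
    name: str
    allowed_tools: FrozenSet[str]


@dataclass
class Runtime:
    """Plays the 'kernel' role: every tool call goes through dispatch()."""
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def dispatch(self, task: Task, tool: str, **kwargs) -> str:
        # The check lives in code, not in the prompt, so a tricked model
        # cannot talk its way past it.
        if tool not in self.tools:
            raise PermissionDenied(f"unknown tool: {tool}")
        if tool not in task.allowed_tools:
            raise PermissionDenied(f"task {task.name!r} may not call {tool!r}")
        return self.tools[tool](**kwargs)


# Two stand-in tools; real ones would talk to a calendar or mail service.
def read_calendar(day: str) -> str:
    return f"events for {day}"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


runtime = Runtime()
runtime.register("read_calendar", read_calendar)
runtime.register("send_email", send_email)

# This task only needs the calendar, so that is the only key it gets.
summarize = Task(name="summarize_week", allowed_tools=frozenset({"read_calendar"}))
```

With this setup, runtime.dispatch(summarize, "read_calendar", day="Monday") succeeds, while asking the same task to call "send_email" raises PermissionDenied no matter what text the model was fed.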

2. The Problem: The "Open House" Party

The paper looks at popular AI agents (like OpenClaw and its cousins) and finds they are built like an open house where anyone can walk in and touch anything.

  • No Walls: In a secure computer, different programs are isolated. If a virus infects your calculator app, it shouldn't be able to read your bank files. But in these AI agents, the "calculator" (a tool) and the "bank files" (memory) are all in the same room. If the AI gets confused, it can accidentally (or maliciously) mix them up.
  • The "Trust Me" Fallacy: These agents rely on the AI to "remember" to be safe. They have rules like "Don't delete files," but they are just written in plain English. If a hacker whispers a trick to the AI, the AI forgets the rule. It's like asking a guard to stand watch but telling him, "Just use your best judgment."
  • The "Third-Party" Risk: These agents let you install "skills" (like apps). Imagine if you could download a "Weather App" that secretly had a backdoor to your bank account. The paper found that many of these agents let you install these skills without checking if they are safe.
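
To make the "No Walls" point concrete, here is a small sketch of what a wall around memory could look like. It is an assumption-laden illustration, not the design of any agent named in the paper: MemoryStore and MemorySession are hypothetical names. The idea is that the runtime binds a session to one user, and the handle the agent receives has no parameter for asking about anyone else.

```python
from collections import defaultdict
from typing import Dict, List


class MemoryStore:
    """Keeps each user's notes in a separate namespace (hypothetical design)."""

    def __init__(self) -> None:
        self._notes: Dict[str, List[str]] = defaultdict(list)

    def session(self, user_id: str) -> "MemorySession":
        # Assumed to be called by the runtime after it has authenticated the user.
        return MemorySession(self, user_id)


class MemorySession:
    """Handle given to the agent; it offers no way to name another user."""

    def __init__(self, store: MemoryStore, user_id: str) -> None:
        self._store = store
        self._user_id = user_id

    def add_note(self, text: str) -> None:
        self._store._notes[self._user_id].append(text)

    def read_notes(self) -> List[str]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._store._notes[self._user_id])


store = MemoryStore()
alice = store.session("alice")
bob = store.session("bob")
alice.add_note("bank PIN hint")
```

Here bob.read_notes() returns an empty list even though Alice has stored a note. Python does not enforce private attributes, so this only shows the shape of the interface; a real wall would need a process or OS-level boundary between the agent and the store.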

3. The Experiment: Breaking the Agents

The researchers took four popular AI agents and tried to break them, acting like a hacker with a modest skill level. They didn't need to be geniuses; they just needed to know how the "house" was built.

What they found:

  • OpenClaw (The "Vanilla" Agent): This was the most popular one. It was vulnerable to every single attack the researchers tried. It was like leaving the front door, back door, and windows wide open.
  • IronClaw (The "Security" Agent): This one tried to be safer. It put some tools in a "sandbox" (a glass box where they can't touch the rest of the house). It did better, but the researchers still found ways to trick it or break the glass.
  • Nanobot (The "Minimal" Agent): This one had very little code, hoping that less code means fewer bugs. But even with a small codebase, it still lacked the basic "walls" needed to keep data separate.
  • NemoClaw (The "Wrapper" Agent): This one put the whole agent inside a secure container (like a shipping container). It was the hardest to break, but the researchers still found a way to peek inside or trick it.

The Shocking Result: Even the "secure" versions failed at basic things, like stopping one user from reading another user's private notes, or stopping the agent from sending messages to strangers.

4. The Solution: Borrowing from the Past

The paper's main conclusion is simple: We don't need to invent new magic to fix this. We just need to use the security rules we've known for 50 years.

Operating systems have solved these exact problems before. The authors suggest we apply these old-school rules to AI:

  • Isolation: Put every tool in its own glass box (sandbox) so it can't touch other tools or your private files unless explicitly allowed.
  • Least Privilege: Just because the agent can read your email doesn't mean it should. Give it only the keys it needs for the specific task at hand.
  • Hardened Logging: Keep a record of everything the agent does, but make sure the agent can't delete or change those records (like a tamper-proof security camera).
  • Strict Boundaries: Don't let the AI decide what is safe. The "Kernel" (the system) must enforce the rules, not the AI's "brain."
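
As one illustration of the "Hardened Logging" rule, the sketch below uses a hash chain, a common technique for making logs tamper-evident. It is not taken from the paper, and AuditLog, append, and verify are hypothetical names. Each entry's hash covers the previous entry's hash, so editing or deleting an earlier record breaks verification.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict[str, str]) -> str:
    # Serialize with sorted keys so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self) -> None:
        self._entries: List[Dict] = []

    def append(self, event: Dict[str, str]) -> None:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        self._entries.append({"event": dict(event), "hash": _digest(prev_hash, event)})

    def verify(self) -> bool:
        # Recompute the chain from the start; any edited entry breaks it.
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["hash"] != _digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append({"tool": "send_email", "to": "boss@example.com"})
log.append({"tool": "read_calendar", "day": "Monday"})
```

Calling log.verify() returns True for the untouched log and False once any recorded event is altered. A hash chain only detects tampering; to keep the agent from rewriting the whole chain, the log would also have to live somewhere the agent cannot write, such as a separate process or append-only storage.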

Summary

The paper argues that AI agents are currently built like wild, unregulated frontiers. They are powerful but dangerous because they mix sensitive data with untrusted instructions.

The authors' message, in short: stop trying to make the AI "smarter" to be safe, and instead build the system around it like a secure operating system. If we treat the AI like a user who needs to be watched and restricted by a strict security guard (the OS), we can make these powerful tools safe to use in our homes and businesses.

The Bottom Line: We are building digital employees with master keys to our lives, but we haven't built the locks, the fences, or the security guards yet. It's time to borrow the blueprints from the computer security experts who have been building those locks for decades.
