Here is an explanation of the paper "Pitfalls in VM Implementation on CHERI: Lessons from Porting CRuby," translated into simple language with creative analogies.
The Big Picture: Building a Fortified House
Imagine you have a very old, complex house (a Virtual Machine, or VM) that runs your favorite apps. This house was built decades ago using standard blueprints. It works great, but it has some weak spots where thieves (hackers) can sneak in and steal things or break the walls.
Now, imagine a new, super-secure building code called CHERI.
- Old Way: In the old house, a "key" (a pointer) was just a number telling you which room to go to. If you knew the number, you could go anywhere.
- CHERI Way: In the new house, a "key" is a smart card (a capability). This card doesn't just say "Go to Room 10." It also says, "You can only go to Room 10" and "You can only open the door, not break the window." The card is also tamper-proof: if anyone alters it, it stops working. If you try to use the card to open a door it wasn't meant for, the lock jams instantly.
The Problem: The authors tried to move their old house (CRuby, the engine behind the Ruby programming language) into this new, super-secure building. They expected it to be a simple renovation. Instead, they found that the old house's construction habits clashed with the new security rules, causing the house to collapse in weird ways.
This paper is their "Lesson Learned" guide for anyone else trying to move their software into this new, secure world.
The Six Traps (Pitfalls)
The authors found six specific ways the old software broke the new rules. Here is how they happened, using analogies:
1. The "Too-Small Key" Trap (Invalid Derived Pointer)
- The Old Habit: Imagine a construction worker holding a master key to the whole building. To open a specific drawer, they just make a copy of the key and say, "This copy is for the drawer." In the old world, this copy still worked for the whole building because keys were just numbers.
- The CHERI Rule: The new smart card system is strict. When you make a copy of the key for the drawer, the card is shrunk so it only opens that drawer, and a shrunken card can never be enlarged again.
- The Crash: The worker tries to use that "drawer-only" card to open the front door. The system slams the door shut.
- The Fix: Keep the "Master Key" (a super-capability) handy. When you need to open a new area, derive the new key from the Master Key, not from a tiny, restricted copy.
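To make the master-key rule concrete, here is a small C sketch. It does not run on CHERI hardware or use the real CHERI C API; it is a toy model, and the `toy_cap` type and the `toy_derive` and `toy_can_access` helpers are hypothetical names made up for illustration. It captures one real CHERI property: a derived key's bounds must fit inside its parent's, so a front-door key cannot be derived from the drawer-only card, only from the master key.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a CHERI capability, NOT the real CHERI C API:
   an address, the byte range [base, top) it may reach, and a validity tag. */
typedef struct {
    uint64_t addr, base, top;
    bool tag;
} toy_cap;

/* Hypothetical helper: derive a capability for [base, base + len).
   As on CHERI, derivation is monotonic: the new bounds must fit inside
   the parent's bounds, otherwise the result is untagged (invalid). */
toy_cap toy_derive(toy_cap parent, uint64_t base, uint64_t len) {
    toy_cap c = { base, base, base + len, false };
    c.tag = parent.tag && base >= parent.base && base + len <= parent.top;
    return c;
}

/* Hypothetical check: may capability c access len bytes starting at addr? */
bool toy_can_access(toy_cap c, uint64_t addr, uint64_t len) {
    return c.tag && addr >= c.base && addr + len <= c.top;
}
```

For example, with a master key covering `0x1000` to `0x2000`, a drawer card for `0x1100` to `0x1110` cannot be widened to reach `0x1800`, but a fresh card derived from the master key can.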
2. The "Fake Key" Trap (Dereferencing Ambiguous Pointers)
- The Old Habit: The security guard (Garbage Collector) looks at a pile of random numbers on a table. If a number looks like a room number (e.g., 1004), the guard assumes it's a real key and tries to open that door to check if it's empty.
- The CHERI Rule: In the new world, a number is just a number. It doesn't have a "validity stamp" (the hidden tag bit that the hardware keeps alongside every real capability). If you try to use a random number as a key, the system screams, "That's not a real card!" and stops you.
- The Crash: The guard tries to open a door with a random number, gets rejected, and the whole system panics.
- The Fix: Before trying to open a door, check the "validity stamp" on the card. If it doesn't have a stamp, it's just a number, not a key. Ignore it.
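The tag check can be sketched in C as well. This is again a toy model rather than real CHERI code: `toy_word` stands in for one capability-sized slot of memory, and `looks_like_heap_ptr`, `is_heap_ptr`, and `count_followable` are hypothetical helpers, not CRuby's actual garbage collector. The old scan trusts any value in the heap's address range; the CHERI-aware scan checks the tag first. On a real CHERI toolchain the tag would be read with an intrinsic such as `cheri_tag_get` from `<cheriintrin.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one capability-sized memory word (NOT the real CHERI API):
   its address/integer bits plus the out-of-band validity tag. */
typedef struct {
    uint64_t addr;
    bool tag;
} toy_word;

/* Pre-CHERI conservative scan: any value inside the heap range
   is treated as a pointer and may be dereferenced. */
static bool looks_like_heap_ptr(toy_word w, uint64_t heap_lo, uint64_t heap_hi) {
    return w.addr >= heap_lo && w.addr < heap_hi;
}

/* CHERI-aware scan: only tagged words can be dereferenced, so check the
   tag first and skip plain integers that merely look like addresses. */
static bool is_heap_ptr(toy_word w, uint64_t heap_lo, uint64_t heap_hi) {
    return w.tag && looks_like_heap_ptr(w, heap_lo, heap_hi);
}

/* Hypothetical GC helper: count the words that are safe to follow. */
size_t count_followable(const toy_word *words, size_t n,
                        uint64_t heap_lo, uint64_t heap_hi) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (is_heap_ptr(words[i], heap_lo, heap_hi))
            count++;
    return count;
}
```

With this check, an untagged integer such as `0x5008` inside a heap spanning `0x4000` to `0x8000` still looks like a pointer to the old scan, but the CHERI-aware scan skips it.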
3. The "Moving Wall" Trap (In-Place Reallocation)
- The Old Habit: You have a storage box. You need more space, so you ask the landlord to expand the box right where it is. The landlord does, and the box keeps its old address, so you assume your old key still opens the new, bigger box.
- The CHERI Rule: When the landlord expands the box, they give you a brand new key that fits the new size. The old key is now too small (it only covers the original size).
- The Crash: You try to put a large item into the expanded part of the box using your old, small key. The system blocks you because the key says, "You can't go past this point."
- The Fix: Never assume the key stays the same after an expansion. Always grab the new key the landlord gives you and throw away the old one.
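The same kind of toy model (a hypothetical `toy_cap` type, not the real CHERI API) shows why the old key fails. `toy_realloc_in_place` is a made-up allocator call that always grows the block where it stands: the address stays the same, yet only the returned capability carries the larger bounds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy capability (NOT the real CHERI API): address, bounds [base, top), tag. */
typedef struct {
    uint64_t addr, base, top;
    bool tag;
} toy_cap;

/* Hypothetical allocator call that always grows the block in place:
   the address is unchanged, but only the returned capability carries
   the new, larger bounds. The caller's old capability keeps the old ones. */
toy_cap toy_realloc_in_place(toy_cap old, uint64_t new_size) {
    toy_cap grown = { old.base, old.base, old.base + new_size, old.tag };
    return grown;
}

/* May capability c write one byte at addr? */
bool toy_can_write(toy_cap c, uint64_t addr) {
    return c.tag && addr >= c.base && addr < c.top;
}
```

In real C the habit to adopt is to keep the pointer that `realloc` returns (after checking it for `NULL`) and stop using the old one, even when the two compare equal. Standard C already treats the old pointer as invalid after a successful `realloc`; CHERI simply makes the mistake visible.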
4. The "Hidden Bits" Trap (Using Padding Bits)
- The Old Habit: Imagine a pointer-sized integer (uintptr_t) is a parking lot. On an ordinary 64-bit machine it has 64 slots, and you use all of them to park cars (store data).
- The CHERI Rule: On CHERI the same lot grows to 128 slots, because a capability is twice as wide as an address. But only the bottom 64 slots hold cars (the value); the top 64 are reserved for security guards (metadata such as bounds and permissions). You can't park cars there; from your point of view they are "padding."
- The Crash: The software measures the lot's total size, assumes all 128 slots can hold cars, and tries to park data in the guard slots or shift cars into them. Things go wrong because the system finds data where it expects security guards.
- The Fix: For pure data, use an exact-sized lot (a fixed-width integer type such as uint64_t) in which every slot is a parking slot, so you never touch the security slots.
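The parking-lot arithmetic can be checked in portable C. This sketch assumes the pitfall shows up when code derives a bit width from `sizeof` (the storage size) instead of from the type's maximum value; the helper names `value_bits`, `usable_uintptr_bits`, and `storage_uintptr_bits` are hypothetical. On an ordinary 64-bit machine both widths are 64. On a 64-bit CHERI target, `uintptr_t` should occupy 128 bits while its maximum value still fits in 64, so the two functions disagree.

```c
#include <limits.h>
#include <stdint.h>

/* Count the value bits of an unsigned type from its maximum value. */
static int value_bits(uintmax_t max) {
    int bits = 0;
    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Width that is safe for bit tricks on a uintptr_t-sized value:
   derived from UINTPTR_MAX, so metadata bits are never counted. */
int usable_uintptr_bits(void) {
    return value_bits(UINTPTR_MAX);
}

/* Storage width; on a CHERI target this also counts the metadata
   ("padding") bits, so it must not be used for shifts or masks. */
int storage_uintptr_bits(void) {
    return (int)(sizeof(uintptr_t) * CHAR_BIT);
}
```

Tricks such as tagging small integers or packing flags into a word should be sized from `usable_uintptr_bits()` (or done on a `uint64_t`), never from the storage width.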
5. The "Sealed Envelope" Trap (Modifying Temporary Capabilities)
- The Old Habit: You have a sealed envelope (a sealed capability) containing a return address. You want to do some math on the address inside to find a symbol. You rip the envelope open, do the math, and put it back.
- The CHERI Rule: The envelope is sealed with a special wax stamp. If you try to rip it open or change the address inside, the wax breaks, and the system declares the envelope void.
- The Crash: The software tries to do math on a sealed address, breaks the seal, and the system throws an error.
- The Fix: Don't try to do math on the sealed envelope. Copy the address out onto a piece of paper (convert it to a normal integer), do your math on the paper, and leave the envelope alone.
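Here is the sealed-envelope rule in a toy model. `toy_cap` with its `sealed` flag, `toy_add`, `toy_address`, and `offset_in_function` are hypothetical names for illustration only. In this model, arithmetic on a sealed capability clears its tag, while reading its address as a plain integer is harmless. On a real CHERI toolchain the address would be read with an intrinsic such as `cheri_address_get` from `<cheriintrin.h>`, which yields a plain `ptraddr_t` integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy capability (NOT the real CHERI API): address, validity tag, and
   a "sealed" flag standing in for a sealed return-address capability. */
typedef struct {
    uint64_t addr;
    bool tag;
    bool sealed;
} toy_cap;

/* Arithmetic on the capability itself. In this model, changing a
   sealed capability invalidates it (clears its tag). */
toy_cap toy_add(toy_cap c, int64_t delta) {
    c.addr += (uint64_t)delta;  /* unsigned wrap-around handles negatives */
    if (c.sealed)
        c.tag = false;
    return c;
}

/* The fix: copy the address out as a plain integer (reading it is
   allowed even when sealed) and do the math on that copy. */
uint64_t toy_address(toy_cap c) {
    return c.addr;
}

/* Hypothetical symbol-lookup step: offset of a return address from the
   start of the function that contains it. The capability is untouched. */
uint64_t offset_in_function(toy_cap ret, uint64_t func_start) {
    return toy_address(ret) - func_start;
}
```

A return address of `0x401234` inside a function starting at `0x401200` gives an offset of `0x34`, computed without ever modifying the sealed capability.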
6. The "Wrong Map" Trap (Pointer Arithmetic on Non-Capability Types)
- The Old Habit: You have a map (a pointer). You want to find a location 10 steps away. You convert the map to a coordinate number, add 10, and convert it back to a map.
- The CHERI Rule: If you convert the map to a generic number (like size_t), you lose the "security rules" attached to the map. When you convert it back, you get a map with no validity stamp, which the system rejects.
- The Crash: You try to walk 10 steps using a map that has lost its stamp, and the system stops you.
- The Fix: Always use the "Smart Map" type (uintptr_t) for your calculations. It keeps the security rules attached to the number so they survive the math.
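The two routes in the Wrong Map trap can be compared in the toy model. As before, `toy_cap` and the `toy_*` helpers are hypothetical stand-ins, not the real CHERI C API. Going through a `size_t`-like integer keeps only the address, so the rebuilt capability has no tag; arithmetic on a `uintptr_t`-like value moves the address and carries the metadata along.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy capability (NOT the real CHERI API): address, bounds [base, top), tag. */
typedef struct {
    uint64_t addr, base, top;
    bool tag;
} toy_cap;

/* Like casting a pointer to size_t: only the address survives. */
uint64_t toy_to_size_t(toy_cap c) {
    return c.addr;
}

/* Like casting a plain integer back to a pointer: there is no capability
   to inherit from, so the result has no bounds and no tag. */
toy_cap toy_from_size_t(uint64_t addr) {
    toy_cap c = { addr, 0, 0, false };
    return c;
}

/* Like arithmetic on uintptr_t: the address moves, the metadata rides along. */
toy_cap toy_uintptr_add(toy_cap c, uint64_t n) {
    c.addr += n;
    return c;
}

/* May capability c read the byte at its current address? */
bool toy_can_read(toy_cap c) {
    return c.tag && c.addr >= c.base && c.addr < c.top;
}
```

In real CHERI C, the simplest option is often to skip integers entirely and do the arithmetic on the pointer itself, for example `(char *)p + 10`, which keeps the capability intact.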
The Result: Is it Worth It?
The authors managed to fix these issues and get CRuby running on the new CHERI system.
- Success Rate: They passed about 87% of the tests (29,609 out of 34,046). The failures were mostly due to missing libraries or a few remaining bugs, not the core security issues.
- Speed: The new, secure version runs at 98.2% of the speed of the old, insecure version. That's a tiny slowdown for a massive security upgrade.
The Takeaway
Moving software to a secure, capability-based system like CHERI is like moving a family from a regular house to a high-tech fortress. You can't just carry the furniture over; you have to re-pack everything because the locks, keys, and rules are different.
The main lesson? Don't trust the old habits.
- Don't assume a number is a key.
- Don't assume a key works for the whole building.
- Don't assume you can shrink or expand things without getting a new key.
By following these new rules, we can build software that is much harder for hackers to break into, without slowing it down too much.