Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour

This paper uses evolutionary game theory and simulations to show that sustainable, safe AI adoption requires a governance framework in which penalties for non-compliance exceed the cost of developing safely and monitoring stays cheap enough for users to check occasionally, rather than relying solely on regulation or blind trust.

Adeela Bashir, Zhao Song, Ndidi Bianca Ogbo, Nataliya Balabanova, Martin Smit, Chin-wing Leung, Paolo Bova, Manuel Chica Serrano, Dhanushka Dissanayake, Manh Hong Duong, Elias Fernandez Domingos, Nikita Huber-Kralj, Marcus Krellner, Andrew Powell, Stefan Sarkadi, Fernando P. Santos, Zia Ush Shamszaman, Chaimaa Tarzi, Paolo Turrini, Grace Ibukunoluwa Ufeoshi, Victor A. Vargas-Perez, Alessandro Di Stefano, Simon T. Powers, The Anh Han

Published 2026-03-27

The Big Picture: The "AI Restaurant" Game

Imagine a bustling city filled with AI Restaurants (the Developers) and hungry Customers (the Users).

  • The Goal: Customers want delicious, safe food. Developers want to make money.
  • The Problem: Chefs can choose to cook with fresh, safe ingredients (Safe AI) or cut corners with cheap, potentially poisonous ingredients (Unsafe AI).
  • The Catch: Cooking safely costs more money. Chefs are tempted to cut corners to save cash, unless they get caught and fined.

The paper asks a simple but crucial question: How do we get the chefs to keep cooking safely, and the customers to keep eating there, without everyone going broke?

The Core Idea: Trust is Just "Not Checking the Receipt"

Most people think of "Trust" as a feeling. This paper defines it behaviourally: to trust is simply to stop checking.

  • No Trust: You check every single ingredient, taste every bite, and read every label. This is safe, but it takes a lot of time and energy (this is the Cost of Monitoring).
  • High Trust: You walk in, order a burger, and eat it without looking at the kitchen. This is easy, but risky if the chef is cheating.

The paper argues that for a healthy AI ecosystem, we need a balance. If checking is too hard, people stop checking and get hurt. If checking is too easy, people check too much and waste time.
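
To put the "Cost of Monitoring" in numbers, here is a back-of-the-envelope check in Python. The variable names and figures are invented for illustration rather than taken from the paper; the point is simply that a customer keeps checking only while a check costs less than the harm it is expected to prevent.

```python
# When is checking worth it for a customer? Illustrative numbers only.
harm_if_unsafe = 10.0   # loss from unknowingly consuming an unsafe product
p_unsafe = 0.2          # customer's belief that this developer cuts corners
check_cost = 1.0        # effort of one check (reading an audit, a safety report, ...)

expected_harm_avoided = p_unsafe * harm_if_unsafe   # 0.2 * 10.0 = 2.0

if check_cost < expected_harm_avoided:
    print("Checking pays: stay vigilant.")          # runs here, since 1.0 < 2.0
else:
    print("Checking costs more than it saves: trust blindly or walk away.")
```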

The Three Possible Worlds (The Outcomes)

The researchers ran computer simulations to see how this game plays out over time. They found the system naturally drifts toward one of three "destinations" (a toy simulation in the same spirit is sketched after the three outcomes below):

1. The Ghost Town (No Adoption, Unsafe Development)

  • What happens: The fines for bad cooking are too low, and checking the kitchen is too expensive.
  • The Result: Chefs stop cooking safely because it's not worth the cost. Customers stop eating because they are scared of getting sick.
  • The Vibe: Everyone goes home hungry. The AI industry dies.

2. The Wild West (Unsafe but Widely Adopted)

  • What happens: The fines are weak, but the food is so cheap or tempting that people eat it anyway.
  • The Result: Chefs keep cutting corners. Customers keep eating, but they get sick often. They don't trust the food, but they have no choice or are too lazy to check.
  • The Vibe: A chaotic market where the chefs make money but the customers keep getting poisoned. This is the "blind trust" trap.

3. The Golden Age (Safe and Widely Adopted)

  • What happens: The fines for bad cooking are high, and checking the kitchen is cheap and easy.
  • The Result: Chefs realize, "It's cheaper to cook safely than to pay the fine." Customers realize, "I can quickly peek in the kitchen to make sure, so I'll eat here."
  • The Vibe: A thriving, safe market. This is the only outcome the researchers want.
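
For the curious, below is a minimal two-population replicator-dynamics sketch in Python in the spirit of those simulations. It is not the paper's model: the payoff structure and every number (benefit, cost of safety, harm, fine, monitoring cost) are illustrative assumptions, and the three parameter settings are there only so you can compare how the size of the fine and the cost of checking shift the long-run mix of safe chefs and trusting, checking, or absent customers.

```python
import numpy as np

def simulate(fine, monitoring_cost, steps=100_000, dt=0.01, burn_in=50_000):
    """Two-population replicator dynamics: developers vs. users.

    Developers play Safe or Unsafe; users play Trust (never check),
    Monitor (always check), or Away (don't adopt). The payoff structure
    and every number below are illustrative assumptions, not the paper's
    exact model.
    """
    benefit_user = 4.0    # value a user gets from a safe AI product
    benefit_dev = 5.0     # revenue a developer gets per adopting user
    cost_of_safety = 2.0  # extra cost of developing safely
    harm = 6.0            # loss to a user who blindly trusts an unsafe product

    x = 0.5                          # share of Safe developers
    y = np.array([0.4, 0.3, 0.3])    # shares of [Trust, Monitor, Away] users
    avg_x, avg_y = 0.0, np.zeros(3)

    for t in range(steps):
        y_trust, y_mon, _ = y

        # Developer payoffs against the current user mix.
        pi_safe = (y_trust + y_mon) * (benefit_dev - cost_of_safety)
        pi_unsafe = y_trust * benefit_dev - y_mon * fine   # monitors catch and fine

        # User payoffs against the current developer mix.
        pi_trust = x * benefit_user - (1 - x) * harm
        pi_mon = x * benefit_user - monitoring_cost        # checking averts the harm
        pi_user = np.array([pi_trust, pi_mon, 0.0])        # staying away earns nothing

        # Replicator updates (explicit Euler step), then keep shares valid.
        x += dt * x * (1 - x) * (pi_safe - pi_unsafe)
        y = y + dt * y * (pi_user - y @ pi_user)
        x = float(np.clip(x, 1e-9, 1 - 1e-9))
        y = np.clip(y, 1e-9, None)
        y /= y.sum()

        if t >= burn_in:                                   # accumulate long-run averages
            avg_x += x / (steps - burn_in)
            avg_y += y / (steps - burn_in)

    return avg_x, avg_y

for fine, c_m in [(0.5, 3.0), (0.5, 0.5), (8.0, 0.5)]:
    x, y = simulate(fine, c_m)
    print(f"fine={fine:>4}  check cost={c_m:>4}  "
          f"safe devs: {x:.2f}  users [trust, monitor, away]: {np.round(y, 2)}")
```

Running it prints the long-run average shares of safe chefs and of trusting, checking, and absent customers for each setting, so the three "destinations" can be compared directly.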

The Secret Sauce: How to Get to the "Golden Age"

The paper suggests two main levers that policymakers (the City Council) need to pull:

1. Make the Fines Hurt (Institutional Punishment)
If a chef gets caught serving poison, the fine must be bigger than the money they saved by cutting corners. If the fine is a slap on the wrist, they will just keep cheating (see the quick arithmetic after these two levers).

  • Analogy: If the speeding ticket is $5, you'll keep speeding. If it's $5,000, you'll slow down.

2. Make Checking Easy (Low Monitoring Costs)
This is the most surprising part. The paper says we shouldn't just tell people to "trust" AI. We should make it easy to verify that AI is safe.

  • Analogy: If a restaurant has a glass wall so you can see the kitchen from your table, you don't need to hire a private investigator to check their hygiene. You just glance.
  • Real-world application: Governments should require AI companies to publish clear, easy-to-read safety reports (transparency). If it's hard to read the report, the "cost" of checking is too high, and people will stop checking.
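
To put the first lever in numbers (again, the figures are invented for illustration, not taken from the paper): a developer keeps cutting corners as long as the expected fine is smaller than the money saved by skipping safety.

```python
# When does cheating stop paying for a developer? Illustrative numbers only.
cost_of_safety = 2.0    # money saved by cutting corners
p_caught = 0.5          # chance an unsafe product is detected and punished

for fine in (3.0, 8.0):
    expected_fine = p_caught * fine
    decision = "keep cheating" if expected_fine < cost_of_safety else "cook safely"
    print(f"fine={fine}: expected penalty {expected_fine} vs savings {cost_of_safety} -> {decision}")
```

With these numbers, a $3 fine is shrugged off while an $8 fine flips the decision, which is the whole point of making fines hurt.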

The Role of "Smart" Customers

The paper also looked at different types of customers:

  • The Blind Believers: They never check. (Dangerous if the chef cheats).
  • The Paranoid: They check every single time. (Safe, but exhausting).
  • The "Trust-but-Verify"ers: They check at first. If the chef cooks safely for a while, they relax and stop checking. If they see a mistake, they start checking again.

The researchers found that having "Smart" customers helps, but only if checking is cheap. If checking is too expensive, even the smart customers give up and either stop eating or eat blindly.
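
Here is a toy sketch of that third, conditional type in Python. The rule of thumb used (relax after three clean checks, then only spot-check occasionally, and snap back to full checking after any detected violation) is an invented simplification; the paper defines its conditional strategies within its game model.

```python
import random

class TrustButVerify:
    """Toy "trust-but-verify" customer: check every visit until a clean streak
    builds up, then only spot-check occasionally; any detected violation resets
    the streak and full checking resumes. Thresholds are arbitrary illustrations."""

    def __init__(self, streak_to_relax=3, spot_check_rate=0.1):
        self.streak_to_relax = streak_to_relax
        self.spot_check_rate = spot_check_rate
        self.clean_streak = 0

    def should_check(self):
        if self.clean_streak < self.streak_to_relax:
            return True                                   # still building trust
        return random.random() < self.spot_check_rate     # trusted: rare spot-checks

    def observe(self, was_safe):
        # Only called when a check actually happened.
        self.clean_streak = self.clean_streak + 1 if was_safe else 0

random.seed(0)
customer = TrustButVerify()
for visit, chef_was_safe in enumerate([True] * 6 + [False] * 4):
    if customer.should_check():
        customer.observe(chef_was_safe)
        print(visit, "checked:", "clean" if chef_was_safe else "violation -> trust reset")
    else:
        print(visit, "trusted without checking")
```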

The Bottom Line

You cannot fix AI safety just by writing rules (Regulation) or by telling people to "have faith" (Blind Trust).

To build a safe AI future, we need a system where:

  1. Cheating is expensive (High fines).
  2. Checking is easy (Transparency and simple audits).
  3. People stay slightly vigilant (trust enough to stop checking constantly, but not so much that checking stops entirely).

If we make it easy to spot a bad AI and hard for bad AI to hide, the market will naturally evolve to reward the safe developers and punish the unsafe ones.