Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour

This paper uses evolutionary game theory and simulations to show that sustainable, safe AI adoption requires a governance framework in which penalties for non-compliance exceed the cost of developing safely and monitoring stays cheap enough for users to check occasionally, rather than relying solely on regulation or blind trust.

Adeela Bashir, Zhao Song, Ndidi Bianca Ogbo, Nataliya Balabanova, Martin Smit, Chin-wing Leung, Paolo Bova, Manuel Chica Serrano, Dhanushka Dissanayake, Manh Hong Duong, Elias Fernandez Domingos, Nikita Huber-Kralj, Marcus Krellner, Andrew Powell, Stefan Sarkadi, Fernando P. Santos, Zia Ush Shamszaman, Chaimaa Tarzi, Paolo Turrini, Grace Ibukunoluwa Ufeoshi, Victor A. Vargas-Perez, Alessandro Di Stefano, Simon T. Powers, The Anh Han

Published 2026-03-27

The Big Picture: The "AI Restaurant" Game

Imagine a bustling city filled with AI Restaurants (the Developers) and hungry Customers (the Users).

  • The Goal: Customers want delicious, safe food. Developers want to make money.
  • The Problem: Chefs can choose to cook with fresh, safe ingredients (Safe AI) or cut corners with cheap, potentially poisonous ingredients (Unsafe AI).
  • The Catch: Cooking safely costs more money. Chefs are tempted to cut corners to save cash, unless they get caught and fined.

The paper asks a simple but crucial question: How do we get the chefs to keep cooking safely, and the customers to keep eating there, without everyone going broke?

The Core Idea: Trust is Just "Not Checking the Receipt"

Most people think of "Trust" as a feeling. This paper defines it behaviourally: to trust is simply to stop checking.

  • No Trust: You check every single ingredient, taste every bite, and read every label. This is safe, but it takes a lot of time and energy (this is the Cost of Monitoring).
  • High Trust: You walk in, order a burger, and eat it without looking at the kitchen. This is easy, but risky if the chef is cheating.

The paper argues that for a healthy AI ecosystem, we need a balance. If checking is too hard, people stop checking and get hurt. If checking is too easy, people check too much and waste time.
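
To put the "Cost of Monitoring" in numbers, here is a back-of-the-envelope check in Python. The variable names and figures are invented for illustration rather than taken from the paper; the point is simply that a customer keeps checking only while a check costs less than the harm it is expected to prevent.

```python
# When is checking worth it for a customer? Illustrative numbers only.
harm_if_unsafe = 10.0   # loss from unknowingly consuming an unsafe product
p_unsafe = 0.2          # customer's belief that this developer cuts corners
check_cost = 1.0        # effort of one check (reading an audit, a safety report, ...)

expected_harm_avoided = p_unsafe * harm_if_unsafe   # 0.2 * 10.0 = 2.0

if check_cost < expected_harm_avoided:
    print("Checking pays: stay vigilant.")          # runs here, since 1.0 < 2.0
else:
    print("Checking costs more than it saves: trust blindly or walk away.")
```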

The Three Possible Worlds (The Outcomes)

The researchers ran computer simulations to see how this game plays out over time. They found the system naturally drifts toward one of three "destinations" (a toy simulation in the same spirit is sketched after the three outcomes below):

1. The Ghost Town (No Adoption, Unsafe Development)

  • What happens: The fines for bad cooking are too low, and checking the kitchen is too expensive.
  • The Result: Chefs stop cooking safely because it's not worth the cost. Customers stop eating because they are scared of getting sick.
  • The Vibe: Everyone goes home hungry. The AI industry dies.

2. The Wild West (Unsafe but Widely Adopted)

  • What happens: The fines are weak, but the food is so cheap or tempting that people eat it anyway.
  • The Result: Chefs keep cutting corners. Customers keep eating, but they get sick often. They don't trust the food, but they have no choice or are too lazy to check.
  • The Vibe: A chaotic market where the chefs make money but the customers keep getting poisoned. This is the "blind trust" trap.

3. The Golden Age (Safe and Widely Adopted)

  • What happens: The fines for bad cooking are high, and checking the kitchen is cheap and easy.
  • The Result: Chefs realize, "It's cheaper to cook safely than to pay the fine." Customers realize, "I can quickly peek in the kitchen to make sure, so I'll eat here."
  • The Vibe: A thriving, safe market. This is the only outcome the researchers want.
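
For the curious, below is a minimal two-population replicator-dynamics sketch in Python in the spirit of those simulations. It is not the paper's model: the payoff structure and every number (benefit, cost of safety, harm, fine, monitoring cost) are illustrative assumptions, and the three parameter settings are there only so you can compare how the size of the fine and the cost of checking shift the long-run mix of safe chefs and trusting, checking, or absent customers.

```python
import numpy as np

def simulate(fine, monitoring_cost, steps=100_000, dt=0.01, burn_in=50_000):
    """Two-population replicator dynamics: developers vs. users.

    Developers play Safe or Unsafe; users play Trust (never check),
    Monitor (always check), or Away (don't adopt). The payoff structure
    and every number below are illustrative assumptions, not the paper's
    exact model.
    """
    benefit_user = 4.0    # value a user gets from a safe AI product
    benefit_dev = 5.0     # revenue a developer gets per adopting user
    cost_of_safety = 2.0  # extra cost of developing safely
    harm = 6.0            # loss to a user who blindly trusts an unsafe product

    x = 0.5                          # share of Safe developers
    y = np.array([0.4, 0.3, 0.3])    # shares of [Trust, Monitor, Away] users
    avg_x, avg_y = 0.0, np.zeros(3)

    for t in range(steps):
        y_trust, y_mon, _ = y

        # Developer payoffs against the current user mix.
        pi_safe = (y_trust + y_mon) * (benefit_dev - cost_of_safety)
        pi_unsafe = y_trust * benefit_dev - y_mon * fine   # monitors catch and fine

        # User payoffs against the current developer mix.
        pi_trust = x * benefit_user - (1 - x) * harm
        pi_mon = x * benefit_user - monitoring_cost        # checking averts the harm
        pi_user = np.array([pi_trust, pi_mon, 0.0])        # staying away earns nothing

        # Replicator updates (explicit Euler step), then keep shares valid.
        x += dt * x * (1 - x) * (pi_safe - pi_unsafe)
        y = y + dt * y * (pi_user - y @ pi_user)
        x = float(np.clip(x, 1e-9, 1 - 1e-9))
        y = np.clip(y, 1e-9, None)
        y /= y.sum()

        if t >= burn_in:                                   # accumulate long-run averages
            avg_x += x / (steps - burn_in)
            avg_y += y / (steps - burn_in)

    return avg_x, avg_y

for fine, c_m in [(0.5, 3.0), (0.5, 0.5), (8.0, 0.5)]:
    x, y = simulate(fine, c_m)
    print(f"fine={fine:>4}  check cost={c_m:>4}  "
          f"safe devs: {x:.2f}  users [trust, monitor, away]: {np.round(y, 2)}")
```

Running it prints the long-run average shares of safe chefs and of trusting, checking, and absent customers for each setting, so the three "destinations" can be compared directly.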

The Secret Sauce: How to Get to the "Golden Age"

The paper suggests two main levers that policymakers (the City Council) need to pull:

1. Make the Fines Hurt (Institutional Punishment)
If a chef gets caught serving poison, the fine must be bigger than the money they saved by cutting corners. If the fine is a slap on the wrist, they will just keep cheating (see the quick arithmetic after these two levers).

  • Analogy: If the speeding ticket is $5, you'll keep speeding. If it's $5,000, you'll slow down.

2. Make Checking Easy (Low Monitoring Costs)
This is the most surprising part. The paper says we shouldn't just tell people to "trust" AI. We should make it easy to verify that AI is safe.

  • Analogy: If a restaurant has a glass wall so you can see the kitchen from your table, you don't need to hire a private investigator to check their hygiene. You just glance.
  • Real-world application: Governments should require AI companies to publish clear, easy-to-read safety reports (transparency). If it's hard to read the report, the "cost" of checking is too high, and people will stop checking.
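
To put the first lever in numbers (again, the figures are invented for illustration, not taken from the paper): a developer keeps cutting corners as long as the expected fine is smaller than the money saved by skipping safety.

```python
# When does cheating stop paying for a developer? Illustrative numbers only.
cost_of_safety = 2.0    # money saved by cutting corners
p_caught = 0.5          # chance an unsafe product is detected and punished

for fine in (3.0, 8.0):
    expected_fine = p_caught * fine
    decision = "keep cheating" if expected_fine < cost_of_safety else "cook safely"
    print(f"fine={fine}: expected penalty {expected_fine} vs savings {cost_of_safety} -> {decision}")
```

With these numbers, a $3 fine is shrugged off while an $8 fine flips the decision, which is the whole point of making fines hurt.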

The Role of "Smart" Customers

The paper also looked at different types of customers:

  • The Blind Believers: They never check. (Dangerous if the chef cheats).
  • The Paranoid: They check every single time. (Safe, but exhausting).
  • The "Trust-but-Verify"ers: They check at first. If the chef cooks safely for a while, they relax and stop checking. If they see a mistake, they start checking again.

The researchers found that having "Smart" customers helps, but only if checking is cheap. If checking is too expensive, even the smart customers give up and either stop eating or eat blindly.
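
Here is a toy sketch of that third, conditional type in Python. The rule of thumb used (relax after three clean checks, then only spot-check occasionally, and snap back to full checking after any detected violation) is an invented simplification; the paper defines its conditional strategies within its game model.

```python
import random

class TrustButVerify:
    """Toy "trust-but-verify" customer: check every visit until a clean streak
    builds up, then only spot-check occasionally; any detected violation resets
    the streak and full checking resumes. Thresholds are arbitrary illustrations."""

    def __init__(self, streak_to_relax=3, spot_check_rate=0.1):
        self.streak_to_relax = streak_to_relax
        self.spot_check_rate = spot_check_rate
        self.clean_streak = 0

    def should_check(self):
        if self.clean_streak < self.streak_to_relax:
            return True                                   # still building trust
        return random.random() < self.spot_check_rate     # trusted: rare spot-checks

    def observe(self, was_safe):
        # Only called when a check actually happened.
        self.clean_streak = self.clean_streak + 1 if was_safe else 0

random.seed(0)
customer = TrustButVerify()
for visit, chef_was_safe in enumerate([True] * 6 + [False] * 4):
    if customer.should_check():
        customer.observe(chef_was_safe)
        print(visit, "checked:", "clean" if chef_was_safe else "violation -> trust reset")
    else:
        print(visit, "trusted without checking")
```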

The Bottom Line

You cannot fix AI safety just by writing rules (Regulation) or by telling people to "have faith" (Blind Trust).

To build a safe AI future, we need a system where:

  1. Cheating is expensive (High fines).
  2. Checking is easy (Transparency and simple audits).
  3. People stay slightly vigilant (trust enough to stop checking constantly, but not so much that checking stops entirely).

If we make it easy to spot a bad AI and hard for bad AI to hide, the market will naturally evolve to reward the safe developers and punish the unsafe ones.