MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

This paper introduces MalURLBench, the first benchmark designed to evaluate and expose the vulnerabilities of LLM-based web agents in detecting malicious URLs across diverse real-world scenarios. It also proposes a lightweight defense module to mitigate these security risks.

Dezhang Kong, Zhuxi Wu, Shiqi Liu, Zhicheng Tan, Kuichen Lu, Minghao Li, Qichen Liu, Shengyu Chu, Zhenhua Xu, Xuan Liu, Meng Han

Published 2026-03-16

Imagine you have a super-smart digital assistant (an AI Agent) that can browse the internet for you. You tell it, "Find me a concert ticket," and it goes out, checks websites, and buys the ticket. It's like having a personal shopper who never sleeps.

But here's the problem: This assistant is incredibly gullible when it comes to addresses.

This paper, MalURLBench, is like a giant "trap test" designed to see how easily these AI assistants can be tricked into visiting dangerous websites just by looking at a link.

Here is the breakdown in simple terms:

1. The Core Problem: The "Fake Address" Trick

Think of a website URL (like www.google.com) as a house address.

  • Normal Address: 123 Main St. (Safe)
  • Malicious Address: 123 Main St. (looks identical, but the doorbell is actually a trap)

The researchers found that AI agents are terrible at spotting disguised addresses. Attackers can tweak the URL slightly to make it look like a trusted site, even though it's actually a trap.

The Analogy:
Imagine you ask your assistant to visit "The Official Bank."

  • Real Bank: bank.com
  • The Trap: bank.com-secure-login-please-click-here.com

A human might spot the weird extra words. But the AI, in its rush to be helpful, often thinks, "Oh, it has 'bank' in the name, so it must be safe!" and clicks the link, leading to a phishing site or a virus.
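The trick above comes down to which part of the URL actually decides where you land. A naive sketch (the helper below is illustrative, not from the paper; production code should use the Public Suffix List, e.g. via a library like tldextract) shows that the lookalike's real registered domain has nothing to do with the bank:

```python
from urllib.parse import urlparse

def registered_domain(url: str) -> str:
    """Naively treat the last two dot-separated labels of the hostname
    as the registered domain. (Real code should consult the Public
    Suffix List; this is only a demo of the idea.)"""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])

print(registered_domain("https://bank.com/login"))
# -> bank.com
print(registered_domain("https://bank.com-secure-login-please-click-here.com/login"))
# -> com-secure-login-please-click-here.com
```

Everything before that final registered domain is attacker-controlled decoration, which is exactly what fools the agent.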

2. The Experiment: The "MalURLBench" Test

The authors built a massive testing ground called MalURLBench.

  • The Size: They created 61,845 different "fake" links.
  • The Scenarios: They tested these links in 10 real-life situations, like ordering food, checking the weather, or tracking a package.
  • The Test: They asked 12 different popular AI models (like GPT-4, Llama, and Mistral) to decide: "Is this link safe to visit?"
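The core of such a test can be sketched as a loop that asks each model about each link and counts how often it gets fooled. Everything here is illustrative: `ask_model` stands in for a real API call to one of the 12 models, and the keyword check inside it is just a placeholder so the sketch runs.

```python
def ask_model(model_name: str, url: str) -> bool:
    """Placeholder for a real LLM call: True means the model says "safe".
    A real harness would embed the URL in one of the 10 task scenarios
    (ordering food, tracking a package, ...) and parse the model's answer."""
    return "com-" not in url  # stand-in heuristic so the demo is runnable

def attack_success_rate(model_name: str, malicious_urls: list[str]) -> float:
    """Fraction of malicious URLs the model wrongly accepts as safe."""
    fooled = sum(ask_model(model_name, u) for u in malicious_urls)
    return fooled / len(malicious_urls)

urls = [
    "https://bank.com-secure-login-please-click-here.com",  # caught by the stand-in
    "https://paypa1-login.example",                         # slips through
]
print(attack_success_rate("demo-model", urls))
```

The paper's headline numbers (99% success against some models) are this ratio, computed over 61,845 links.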

The Result:
The AI models failed miserably.

  • Some models fell for the trick 99% of the time.
  • Even the "smartest" models fell for it 30% to 40% of the time.
  • The Takeaway: Current AI agents are walking around with blinders on, trusting any URL that looks slightly familiar.

3. Why Do They Fail? (The "Why" Behind the Mistakes)

The researchers dug deep to find out why the AI is so easily fooled. They found some surprising reasons:

  • The "Short Name" Bias: AI trusts short, simple subdomains (like news.) more than long, weird ones. Attackers learned to keep the fake parts short to trick the AI.
  • The "Fancy Domain" Bias: AI trusts old, boring domain endings like .com or .net more than new, fancy ones like .xyz or .art. Attackers use the boring ones to look safe.
  • The "MoE" Glitch: Some AI models use a "Mixture of Experts" (like a team of specialists). The researchers found that if the specific "expert" who handles URLs isn't activated, the whole model gets confused and clicks the bad link.
  • Lack of Training: Basically, the AI hasn't seen enough "bad" URLs in its training data. It's like teaching a child to drive only on empty roads; when they hit a pothole, they don't know how to react.

4. The Solution: "URLGuard"

Since the AI is bad at spotting these traps, the authors built a security guard called URLGuard.

  • What it is: A tiny, lightweight AI model trained specifically to spot these fake URLs.
  • How it works: Before the main AI assistant visits a link, it asks URLGuard: "Hey, is this safe?"
  • The Result: URLGuard is a superhero. It reduced the attack success rate from nearly 100% down to almost 0% in many cases. It's like having a bouncer at the club who checks IDs before letting anyone in.
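The "ask the bouncer first" pattern can be sketched in a few lines. Note the hedge: `url_guard_is_safe` below is a toy keyword check standing in for URLGuard, which in the paper is a small trained classifier, not a rule.

```python
from urllib.parse import urlparse

def url_guard_is_safe(url: str) -> bool:
    """Stand-in for the URLGuard classifier: flag lookalike hosts such as
    "bank.com-secure-...". The real module is a learned model."""
    host = urlparse(url).hostname or ""
    return "com-" not in host  # toy rule for demonstration only

def agent_visit(url: str) -> str:
    """The agent checks every link with the guard before fetching it."""
    if not url_guard_is_safe(url):
        return f"BLOCKED: {url}"
    return f"VISITING: {url}"

print(agent_visit("https://bank.com/login"))
# -> VISITING: https://bank.com/login
print(agent_visit("https://bank.com-secure-login-please-click-here.com/login"))
# -> BLOCKED: https://bank.com-secure-login-please-click-here.com/login
```

The design point is that the guard sits in front of the agent's browsing tool, so even a fully fooled main model never reaches the malicious page.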

5. Why This Matters

This isn't just about AI clicking a bad link.

  • Stage 1 vs. Stage 2: Most security tests check if the AI can handle a bad webpage (Stage 2). This paper checks if the AI can even be tricked into going to the webpage in the first place (Stage 1).
  • The Future: As AI agents become our daily assistants (booking flights, buying groceries), if they can be tricked into visiting a fake site, hackers could steal your credit card info or install malware on your computer without you ever knowing.

Summary

MalURLBench is a wake-up call. It shows that our current AI assistants are too trusting of internet addresses. They need a "bouncer" (like URLGuard) to check the IDs before they let the AI visit any website. Without this, the future of AI agents is wide open to digital pickpockets.
