Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin

This paper introduces TS-Bench, a standardized benchmark for evaluating safety in Taiwanese Mandarin, and Breeze Guard, an 8B safety model fine-tuned on culturally grounded data. Breeze Guard significantly outperforms general-purpose safety models on region-specific risks, and the results suggest that effective safety detection requires pre-existing cultural grounding in the base model.

Po-Chun Hsu, Meng-Hsi Chen, Tsu Ling Chao, Chia Tien Han, Da-shan Shiu

Published Tue, 10 Ma

Imagine you have a very smart, well-read security guard for a building. This guard has read millions of books and knows how to spot danger in English perfectly. They can tell you if someone is trying to steal a wallet or spread a virus in a story written in English.

But now, imagine this guard is hired to protect a specific neighborhood in Taiwan. The people there speak a local dialect of Mandarin, use unique slang, and face specific types of tricks that don't exist in the English-speaking world.

The guard tries their best, but they keep missing the dangers. Why? Because they don't understand the local culture. They might see a phrase like "Taiwanese women" and think it's just a description, not realizing that in this specific neighborhood, it's actually a nasty insult. They might see a message about "ATM installments" and think it's a normal bank notice, not realizing it's a classic scam script used only in Taiwan.

This paper is about building a new, specialized security guard (called Breeze Guard) and a new test (called TS-Bench) to make sure AI is safe for Taiwanese people.

Here is the breakdown in simple terms:

1. The Problem: The "Global Guard" Has Blind Spots

The authors explain that big, famous AI safety models are like global security guards. They are great at catching generic bad stuff (like violence or hate speech in English), but they are blind to local cultural nuances.

  • The Analogy: Imagine a guard who knows that "apple" is a fruit. But in Taiwan, people might use a specific word for a type of scam that sounds like "apple" but means something totally different. The global guard misses it because they've never heard that specific local joke or trick.
  • The Result: Scammers, fake doctors, and political manipulators in Taiwan can easily trick these global AI models because the models don't "get" the local culture.

2. The Solution: A New Test (TS-Bench)

Before they could fix the guard, they needed a way to test if the guard was actually doing a good job in Taiwan. So, they created TS-Bench.

  • What it is: A test bank with 400 questions. Half are dangerous traps, and half are safe questions that look like traps (to check whether the guard is too paranoid and flags harmless messages).
  • The Content: These aren't just random questions. They are filled with local flavor:
    • Scams: Fake messages pretending to be from Shopee (a popular shopping app) or the government.
    • Finance: "Investment teachers" promising free stocks in LINE groups.
    • Health: Myths like "eating shrimp with lemon creates poison."
    • Politics & Hate: Specific insults used in Taiwan's political debates or against certain ethnic groups (like the Hakka community).
  • The Goal: To see if an AI can spot these local dangers that a standard English-trained AI would miss.
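Because the benchmark is balanced between traps and safe look-alikes, a guard has to be judged on two things at once: how many dangers it catches and how often it cries wolf. Here is a minimal sketch of that scoring idea; the item texts, function names, and the keyword "guard" are all invented for illustration, not the paper's actual evaluation code.

```python
# Hypothetical sketch of scoring a safety model on a balanced benchmark like
# TS-Bench: half the items are unsafe "traps", half are safe look-alikes,
# so we track both missed dangers and false alarms.

def score_guard(items, predict):
    """items: list of (text, is_unsafe); predict: text -> bool (True = flag)."""
    caught = missed = false_alarm = correct_pass = 0
    for text, is_unsafe in items:
        flagged = predict(text)
        if is_unsafe and flagged:
            caught += 1
        elif is_unsafe:
            missed += 1
        elif flagged:
            false_alarm += 1
        else:
            correct_pass += 1
    return {
        # How many real traps the guard spots.
        "unsafe_recall": caught / (caught + missed),
        # How often the guard panics on harmless messages.
        "false_positive_rate": false_alarm / (false_alarm + correct_pass),
    }

# Toy balanced set: one Taiwan-flavored trap, one harmless look-alike.
items = [
    ("Your Shopee order failed; go to an ATM to cancel the installment.", True),
    ("How do I check my Shopee order status?", False),
]

# A naive keyword guard: flags anything mentioning "ATM". It happens to pass
# this toy set, but a keyword rule has no grasp of cultural context.
naive_guard = lambda text: "ATM" in text
print(score_guard(items, naive_guard))
```

A model that flags everything would get perfect recall but a terrible false-positive rate, which is exactly why TS-Bench includes the safe look-alikes.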

3. The New Guard: Breeze Guard

They didn't just train a new guard from scratch. They took a smart base model called Breeze 2 (which already knows Taiwanese culture because it was trained on lots of local text) and gave it special safety training.

  • The Core Idea: You can't just teach a guard safety rules; they need to understand the culture first. If the guard doesn't know what "Hakka" means in a local context, no amount of safety training will help them spot the insult.
  • The Training: They fed the model thousands of examples of local scams and hate speech, teaching it to say "Stop!" when it sees them.
  • The Result: When tested on TS-Bench, Breeze Guard crushed the competition. It was much better at spotting local scams and cultural hate speech than the best global safety models.
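To make "fed the model thousands of local examples" concrete, a culturally grounded training record might look something like the sketch below. The field names, category labels, and example text are assumptions for illustration; the paper's actual data schema may differ.

```python
import json

# Illustrative fine-tuning record for a locally grounded safety model.
# Schema (prompt/label/category/rationale) is a guess, not the paper's format.
record = {
    # "The teacher says if I join the LINE group I get free hot stocks, is it real?"
    "prompt": "老師說加入LINE群就送飆股，是真的嗎？",
    "label": "unsafe",
    "category": "investment_scam",
    "rationale": "Matches a common Taiwanese 'investment teacher' scam script.",
}

# ensure_ascii=False keeps the Mandarin text readable in the JSONL file.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

The point of the rationale field in a sketch like this is the paper's core idea: the model is not just memorizing bad strings, it is connecting a message to a known local scam pattern.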

4. The Trade-off: Specialized vs. General

The paper admits something important: Breeze Guard is a specialist, not a generalist.

  • The Analogy: Think of Breeze Guard as a Taiwanese detective. They are amazing at solving crimes in Taipei. But if you ask them to solve a crime in New York using English slang, they might not be as good as a New York detective.
  • The Data: On English safety tests, Breeze Guard did okay, but not as well as the global models. This is expected because it was trained specifically for Taiwan. The authors argue this is a fair trade-off: you want a guard who knows your neighborhood best, even if they aren't the best at guarding every neighborhood in the world.

5. How It Works: "Thinking" vs. "Instinct"

The paper also tested two ways the guard can think:

  1. Instinct Mode (No-Think): The guard instantly says "Safe" or "Danger." This is fast.
  2. Thinking Mode (Chain-of-Thought): The guard takes a moment to say, "Wait, this looks like a scam because it mentions an ATM and uses urgent language..." This is slower but often more accurate for tricky, complex scams.
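The two modes above boil down to how the model is prompted. A hedged sketch of the difference, where the prompt wording and the SAFE/UNSAFE label set are invented for illustration rather than taken from Breeze Guard's real format:

```python
# Sketch of the two inference styles: an instant verdict vs. asking the model
# to reason before answering. Prompt text and labels are assumptions.

def build_prompt(message: str, think: bool) -> str:
    if think:
        # Thinking mode: request step-by-step reasoning before the verdict.
        return (
            "Analyze this message for Taiwan-specific risks (scams, hate "
            "speech, health myths). Explain your reasoning step by step, "
            f"then answer SAFE or UNSAFE.\n\nMessage: {message}"
        )
    # Instinct mode: demand an immediate one-word verdict.
    return f"Answer only SAFE or UNSAFE.\n\nMessage: {message}"

msg = "Join our LINE group for free stock tips from the teacher!"
fast = build_prompt(msg, think=False)
slow = build_prompt(msg, think=True)
print(fast.splitlines()[0])
print(slow.splitlines()[0])
```

The trade-off is the one the paper describes: instinct mode is cheaper and faster, while the reasoning prompt spends extra tokens but gives the model room to connect a tricky message to a known scam pattern.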

The Big Takeaway

This paper proves that one size does not fit all when it comes to AI safety. To keep people safe in Taiwan, you can't just use a model trained on American or British data. You need a model that understands the local language, the local jokes, the local scams, and the local culture.

Breeze Guard and TS-Bench are the first steps toward making AI trustworthy for everyone, not just the English-speaking world. It's about giving the security guard the right local map so they don't miss the real dangers hiding in plain sight.