Reproduction and Replication of an Adversarial Stylometry Experiment

This paper reproduces and replicates a seminal study on adversarial stylometry, confirming the original conclusion that anonymity is difficult to maintain but revealing that the effectiveness of certain defenses may be overstated due to a lack of control groups, while also highlighting round-trip translation as a promising automatic method for reducing authorship attribution accuracy.

Haining Wang, Patrick Juola, Allen Riddell

Published 2026-03-04

Imagine you are trying to send a secret letter to a friend without anyone knowing who you are. You might think that if you don't sign your name, you are safe. But there's a catch: your handwriting is unique.

Even if you don't sign your name, a detective (or a computer) can look at how you write—your favorite words, your sentence length, your punctuation habits—and say, "Ah, this was written by Sarah, not John." This is called Authorship Attribution. It's like a digital fingerprint made of words.
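To make the "word fingerprint" idea concrete, here is a minimal Python sketch that turns a text into a handful of numbers: average sentence length, comma and semicolon rates, and how often a few common function words appear. It is not the feature set used by Brennan and colleagues or by Wang, Juola, and Riddell. The function name `style_features` and the short word list are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of English function words.
# Real stylometry systems typically track hundreds of features.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "is", "but"]


def style_features(text: str) -> dict[str, float]:
    """Return a few simple 'word fingerprint' measurements for a text."""
    # Split into sentences on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words) or 1  # avoid dividing by zero on empty input

    counts = Counter(words)
    features = {
        # Sentence-length habit: average number of words per sentence.
        "avg_sentence_len": len(words) / (len(sentences) or 1),
        # Punctuation habits: commas and semicolons per word.
        "commas_per_word": text.count(",") / n_words,
        "semicolons_per_word": text.count(";") / n_words,
    }
    # "Favorite words": relative frequency of each small function word.
    for w in FUNCTION_WORDS:
        features[f"freq_{w}"] = counts[w] / n_words
    return features


print(style_features("I went to the store. It was closed, but the cafe was open.")["avg_sentence_len"])  # 6.5
```

An attribution system compares vectors like this across candidate authors and picks the closest match. A disguise works by pushing these numbers away from a writer's usual values.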

In recent years, big companies and governments have gotten very good at finding these "word fingerprints," which puts people like whistleblowers or journalists at risk.

The Problem: How to Hide Your "Word Fingerprint"?

A famous study from 2012 (by Brennan and colleagues) tried to find a way to hide these fingerprints. They tested three tricks:

  1. The "Fake It" Strategy (Obfuscation): You try to write differently on purpose. Maybe you use shorter sentences or different words than usual.
  2. The "Copycat" Strategy (Imitation): You try to write exactly like a famous author (in the original study, they tried to sound like the novelist Cormac McCarthy).
  3. The "Round-Trip" Strategy (Machine Translation): You write your text, translate it into another language (like German), and then translate it back to English. The idea is that the translation software messes up your style enough to hide you.

The 2012 study reported that these tricks, especially the ones done by hand, made it much harder to guess who wrote the text.

What This New Paper Did: The "Re-Do"

The authors of this new paper (Wang, Juola, and Riddell) decided to check if the 2012 results were still true. They did two things:

  1. The Reproduction (The "Re-run"): They took the exact same data and methods from 2012 and ran the analysis again. Think of this like a chef cooking a dish exactly as a famous recipe book describes, to see if it really tastes as good as the book says.

    • Result: They confirmed the 2012 results. The tricks did work.
  2. The Replication (The "New Experiment"): This is the most important part. They ran the experiment again, but this time with new people and a better design.

    • The Flaw in the Old Study: The 2012 study didn't have a "Control Group," meaning a group of people who wrote normally without trying to hide their style. It's like testing a new medicine without giving a placebo to a comparison group: you can't tell whether the medicine actually works or whether people just feel better because they think they took something.
    • The Fix: In this new study, they had a group of people just write normally (the Control Group). This gave them a true baseline to compare against.
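Here is a short sketch of why the control group matters, assuming accuracy is simply the fraction of texts credited to the right author. The hypothetical helpers below compare two measurements. The first is a naive drop: accuracy on people's ordinary writing minus accuracy on disguised writing. The second is a controlled drop: accuracy on a control group doing the same new task normally, minus accuracy on disguised writing. The author names and predictions are invented for illustration and are not results from either study.

```python
def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of texts whose predicted author matches the true author."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def accuracy_drops(baseline_acc: float, control_acc: float,
                   strategy_acc: float) -> tuple[float, float]:
    """Return (naive drop, controlled drop) in attribution accuracy.

    baseline_acc: accuracy on people's ordinary, pre-existing writing.
    control_acc:  accuracy on a control group doing the same new writing
                  task as the strategy group, but writing normally.
    strategy_acc: accuracy on texts written with a hiding strategy.

    The naive drop mixes two effects (the new task or topic, and the
    strategy). The controlled drop isolates the strategy.
    """
    naive = baseline_acc - strategy_acc
    controlled = control_acc - strategy_acc
    return naive, controlled


# Toy predictions for four made-up authors (not data from either study).
truth = ["ann", "bob", "cy", "dee"]
baseline = accuracy(["ann", "bob", "cy", "cy"], truth)    # 3 of 4 correct
control = accuracy(["ann", "bob", "dee", "cy"], truth)    # 2 of 4 correct
strategy = accuracy(["bob", "ann", "cy", "ann"], truth)   # 1 of 4 correct
print(accuracy_drops(baseline, control, strategy))  # (0.5, 0.25)
```

In this toy case, half of the apparent 0.5 drop comes from the new task itself rather than from the disguise. Without a control group, that share would be wrongly credited to the strategy.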

What They Found (The Surprises)

Here is what the new study discovered, using simple analogies:

  • The "Fake It" and "Copycat" strategies still work: If you try to write differently or copy someone else, it really does confuse the computer detectives. It drops their success rate from about 40% (respectable for this task) down to about 20%, close to what blind guessing would score when the pool of candidate authors is small.

    • Analogy: It's like putting on a disguise. The detective can still name a suspect, but now it is only a guess rather than a confident identification.
  • The "Round-Trip" Translation is tricky: The old study said translating your text back and forth was a decent trick. The new study found it works, but not as well as the human tricks.

    • The Catch: Translation software is good at rewording, but it has a blind spot: spelling mistakes.
    • Analogy: Imagine you have a unique habit of always misspelling the word "color" as "colr." If you translate your text to German and back, "colr" may come out untouched, because translation software often copies a word it doesn't recognize straight into its output. The detective sees "colr" and says, "Aha! That's definitely Sarah!"
    • The new study found that if your original text has typos, the "Round-Trip" trick might actually fail to hide you.
  • The "Control Group" changed the story: Because the new study had a control group, the authors could see that the "Copycat" strategy (imitating an author) was doing less of the work than it appeared, and less than the "Fake It" strategy (writing differently). The old study couldn't make this comparison because it had no "normal writing" group to compare against.
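The typo problem can be illustrated with a deliberately crude stand-in for machine translation: a word-for-word lookup in which any word missing from the dictionary is copied through unchanged. This is an assumption made for illustration only. Real translation services are far more sophisticated and may or may not preserve a given misspelling. The dictionaries and the functions `translate` and `round_trip` are hypothetical.

```python
# Hypothetical toy dictionaries. Real translation systems are statistical or
# neural models, not word-for-word lookups. This only mimics two behaviours
# described in the text: rewording, and copying unknown words through.
EN_TO_DE = {"i": "ich", "like": "mag", "the": "die", "big": "große",
            "red": "rote", "color": "farbe", "house": "haus"}
DE_TO_EN = {"ich": "i", "mag": "like", "die": "the", "große": "large",
            "rote": "red", "farbe": "colour", "haus": "house"}


def translate(text: str, table: dict[str, str]) -> str:
    """Translate word by word; words missing from the table pass through."""
    # Lowercasing and whitespace splitting are simplifications for the toy.
    return " ".join(table.get(word, word) for word in text.lower().split())


def round_trip(text: str) -> str:
    """English -> toy German -> English."""
    return translate(translate(text, EN_TO_DE), DE_TO_EN)


print(round_trip("I like the big red house"))  # i like the large red house
print(round_trip("I like the big red colr"))   # i like the large red colr
```

The round trip changes "big" into "large," which is the kind of rewording that blurs style. The unknown token "colr" survives both passes, leaving the identifying habit in place.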

Why Does This Matter?

This paper is a reality check for people who want to stay anonymous online.

  1. Manual effort is best: If you want to hide your identity, the best way is to consciously try to write differently. Don't rely on a computer to do it for you.
  2. Be careful with Translation: Using Google Translate to hide your identity is risky. If you have a typo, the translation might keep it, and you could get caught. Plus, using an online service means the service provider (like Google) sees your text, which defeats the purpose of anonymity.
  3. The "Control" is key: In science, you always need a baseline. Without comparing the "hiding" group to a "normal" group, you might think a trick is working when it's actually just the topic of the writing that changed the results.
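Random guessing is another baseline worth keeping in mind, and it bears on the "about 20% is close to guessing" remark above. With N candidate authors, chance level is 1/N, so whether 20% counts as guessing depends on N. The sketch below computes chance level and an exact one-sided binomial test of whether an observed number of correct attributions beats random guessing. The function names are hypothetical. This is a standard statistical check, not necessarily the analysis used in the paper.

```python
from math import comb


def chance_level(n_authors: int) -> float:
    """Accuracy of uniform random guessing among n candidate authors."""
    return 1 / n_authors


def p_value_vs_chance(correct: int, total: int, n_authors: int) -> float:
    """One-sided exact binomial test against random guessing.

    Returns P(X >= correct) where X ~ Binomial(total, 1 / n_authors): the
    probability that a pure guesser gets at least `correct` of `total`
    texts right. Small values mean the result is unlikely to be luck.
    """
    p = chance_level(n_authors)
    return sum(comb(total, k) * p**k * (1 - p) ** (total - k)
               for k in range(correct, total + 1))


print(chance_level(5))   # 0.2  (20% is chance with five candidates)
print(chance_level(10))  # 0.1  (the same 20% is double chance with ten)
```

Reporting accuracy alongside chance level and a test like this makes it clear whether a "drop to 20%" really means the detective is guessing.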

The Bottom Line

If you are a whistleblower or a journalist trying to stay safe:

  • Don't just trust a machine. Automatic tools such as translators can leave your unique mistakes (typos) untouched, and those alone can give you away.
  • Do the work yourself. Try to consciously change your writing style.
  • Don't use online tools for secret work. If you use an online translator, the company running it knows what you wrote. You need tools that work offline.

The paper confirms that while it is possible to hide your "word fingerprint," it takes human effort and careful planning, not just a quick click of a "translate" button.