AILS-NTUA at SemEval-2026 Task 10: Agentic LLMs for Psycholinguistic Marker Extraction and Conspiracy Endorsement Detection

This paper introduces AILS-NTUA's agentic LLM pipeline for SemEval-2026 Task 10. The pipeline uses a decoupled design: Dynamic Discriminative Chain-of-Thought for psycholinguistic marker extraction, and an "Anti-Echo Chamber" architecture for conspiracy endorsement detection. Together, these deliver significant improvements over baselines and establish a paradigm for interpretable, psycholinguistically grounded NLP.

Panagiotis Alexios Spanakis, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorgos Stamou

Published 2026-03-06

Imagine you are a detective trying to solve a mystery in a crowded, noisy town square (the internet). People are shouting theories: some are telling the truth, some are joking, and some genuinely believe in secret plots to control the world.

Your job is two-fold:

  1. Find the clues: Identify the specific phrases that prove someone is talking about a conspiracy (like "secret group," "hidden plan," or "they are poisoning us").
  2. Judge the intent: Decide whether the person shouting believes the conspiracy or is just reporting on it (like a news anchor saying, "Some people claim the earth is flat").

This paper describes how a team of researchers from the National Technical University of Athens built a system of AI detectives to do exactly this. They call it an "Agentic LLM Pipeline."

Here is how they did it, explained with simple analogies:

1. The Problem: The "Reporter Trap"

Imagine a news reporter says, "The article claims that aliens built the pyramids."
A simple AI might get confused. It sees the words "aliens" and "pyramids" and thinks, "Aha! Conspiracy!" and marks it as a believer.
But the reporter isn't a believer; they are just repeating what someone else said. This is called the Reporter Trap. Most AI models fall into this trap because they focus on what words are used, not how they are used.

2. The Solution: A Team of Specialists

Instead of using one giant brain to do everything, the authors built a workflow with different AI agents, each playing a specific role. Think of it like a high-end law firm or a courtroom.

Part A: The Clue Hunters (Subtask 1)

Goal: Find the specific conspiracy phrases.

  • The Detective (DD-CoT): This AI doesn't just guess. It uses a technique called Dynamic Discriminative Chain-of-Thought. Imagine a detective who, before writing a report, has to argue against their own conclusion.
    • Example: "I think 'The Media' is the villain here. But wait, could 'The Media' be the victim? No, because the sentence says they 'manipulated' people. Okay, I'm sure they are the villain."
    • By forcing the AI to argue both sides, it stops making mistakes about who is doing what.
  • The Notary (Deterministic Verifier): Large Language Models are great at thinking but terrible at counting. They might say, "The clue starts at word 5," but actually, it starts at word 6. This Notary is a strict, boring robot that checks the text character-by-character to ensure the AI didn't lie about where the clue starts and ends.
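The Notary's job can be sketched in a few lines. This is a minimal illustration (not the authors' actual code) of a deterministic verifier: the LLM proposes a marker span with character offsets, and the verifier accepts them only if they reproduce the span exactly, repairing them against the raw text otherwise.

```python
# Hypothetical sketch of a deterministic span verifier. The function names
# and repair strategy are illustrative assumptions, not the paper's code.

def verify_span(text: str, span_text: str, start: int, end: int):
    """Return verified (start, end) for span_text, or None if unrecoverable."""
    # Accept the LLM's offsets only if they reproduce the span exactly.
    if text[start:end] == span_text:
        return start, end
    # Otherwise, search the raw text for the claimed span and repair offsets.
    fixed = text.find(span_text)
    if fixed != -1:
        return fixed, fixed + len(span_text)
    return None  # span not present verbatim: reject the prediction

text = "They say the media manipulated the public."
# The LLM claimed the wrong offsets; the Notary repairs them.
print(verify_span(text, "manipulated", 13, 24))  # → (19, 30)
```

Because string matching is exact and deterministic, this step can never hallucinate: a span either exists in the text at a verifiable position, or it is rejected.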

Part B: The Courtroom (Subtask 2)

Goal: Decide if the person believes the conspiracy.

To avoid the "Reporter Trap," they built an "Anti-Echo Chamber." Instead of one AI giving an opinion, they set up a Parallel Council of four distinct personalities, each of whom judges the case independently, so no juror can be swayed by another's answer:

  1. The Prosecutor: Always looks for evidence of a conspiracy. "They used the word 'cabal'! That's a conspiracy!"
  2. The Defense Attorney: Always looks for reasons not to convict. "Wait, they used the word 'claims' and 'reportedly.' They are just reporting news, not believing it."
  3. The Literalist: Only looks at the exact words. "If the text doesn't explicitly say 'I believe,' then we can't convict."
  4. The Profiler: Looks at the "vibe." "They are using all-caps and shouting. That sounds like a true believer."
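The council can be pictured as four separate, isolated calls to the same model, each with a different persona prompt and no shared context. This is a hypothetical sketch: `call_llm` is a placeholder for whatever chat API the system actually uses, and the persona wordings are invented for illustration.

```python
# Illustrative "Anti-Echo Chamber" council: four independent persona calls.
# PERSONAS text and call_llm are assumptions, not the paper's prompts or API.

PERSONAS = {
    "prosecutor": "Argue that the author ENDORSES the conspiracy.",
    "defense":    "Argue that the author is merely REPORTING or mocking it.",
    "literalist": "Judge only explicit, literal statements of belief.",
    "profiler":   "Judge tone and style: all-caps, urgency, us-vs-them.",
}

def call_llm(system_prompt: str, post: str) -> dict:
    # Placeholder: a real system would query an LLM here. We return a
    # canned verdict so the sketch runs end to end.
    return {"vote": "endorse", "reason": "stub"}

def run_council(post: str) -> dict:
    # Each juror sees only its own persona prompt and the post -- never
    # another juror's output. That isolation is the anti-echo-chamber idea.
    return {name: call_llm(prompt, post) for name, prompt in PERSONAS.items()}

verdicts = run_council("THEY are hiding the truth about the cabal!!")
print(sorted(verdicts))  # four independent verdicts, one per persona
```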

The Calibrated Judge:
After the four jurors vote, a Judge steps in. The Judge doesn't just count the votes (2 vs. 2). The Judge looks at the reasons the jurors gave.

  • If the Prosecutor says "They said 'cabal'" but the Defense says "But they said 'according to the article'," the Judge knows the Defense is right.
  • The Judge is programmed to be conservative. If there is any doubt, they rule "Not Guilty" (Not a conspiracy) to avoid falsely accusing news reporters.
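The Judge's two rules above (reported-speech evidence beats keyword matches; doubt defaults to "not guilty") can be sketched as simple aggregation logic. The cue list and thresholds below are illustrative assumptions, not the paper's actual calibration.

```python
# Simplified sketch of the calibrated judge. It weighs the jurors' *reasons*
# (here approximated by reported-speech cues in the post), not just votes,
# and breaks ties conservatively. Cue words are invented for illustration.

REPORTING_CUES = ("claims", "reportedly", "according to", "the article says")

def judge(verdicts: dict, post: str) -> str:
    endorse = sum(v["vote"] == "endorse" for v in verdicts.values())
    # Rule 1: explicit reported-speech evidence overrides keyword accusations.
    if any(cue in post.lower() for cue in REPORTING_CUES):
        return "not_conspiracy"
    # Rule 2: conservative tie-break -- convict only on a clear majority.
    return "conspiracy" if endorse > len(verdicts) / 2 else "not_conspiracy"

votes = {
    "prosecutor": {"vote": "endorse"}, "profiler":   {"vote": "endorse"},
    "defense":    {"vote": "reject"},  "literalist": {"vote": "reject"},
}
print(judge(votes, "According to the article, a cabal runs the banks."))
```

Here a 2-vs-2 split plus a reporting cue ("according to") yields "not_conspiracy", exactly the Reporter-Trap case the design targets.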

3. The "Hard Negative" Training

To teach the AI the difference between a believer and a reporter, the team used a special training trick called Contrastive Retrieval.

  • They didn't just show the AI examples of conspiracies.
  • They showed it Hard Negatives: Examples that looked exactly like conspiracies (using the same scary words) but were actually just news reports or jokes.
  • It's like training a dog to find a specific scent, but then giving it a bottle of perfume that smells exactly the same but isn't the target. The dog learns to ignore the smell and look for the context.
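The retrieval trick above can be sketched as: for each incoming post, pull the most similar training example from both labels, so the prompt always contains a look-alike "hard negative." The paper presumably uses embedding similarity; plain word overlap is used here only to keep the sketch dependency-free, and all data is invented.

```python
# Illustrative contrastive retrieval for few-shot prompting. Jaccard word
# overlap stands in for whatever similarity the real system uses.

def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))  # Jaccard overlap

def contrastive_examples(query: str, train: list[tuple[str, str]]) -> dict:
    # Best match per label: one look-alike believer, one look-alike reporter.
    best = {}
    for text, label in train:
        score = similarity(query, text)
        if label not in best or score > best[label][0]:
            best[label] = (score, text)
    return {label: text for label, (_, text) in best.items()}

train = [
    ("A secret cabal is poisoning us all", "conspiracy"),
    ("The article claims a secret cabal is poisoning us", "reporting"),
    ("Nice weather today", "reporting"),
]
print(contrastive_examples(
    "They say a secret cabal is poisoning the water", train))
```

Because both retrieved examples share the query's scary vocabulary but carry opposite labels, the model is forced to attend to framing ("the article claims...") rather than keywords.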

The Results

This system was a huge success:

  • For finding clues: It doubled the accuracy compared to a standard AI.
  • For judging intent: It improved accuracy by nearly 50%.
  • Ranking: It came in 3rd place in the world for finding clues and 10th for judging intent, beating many systems that used much larger, more expensive computers.

The Takeaway

The paper proves that you don't need a "super-brain" AI to solve complex problems. Instead, you need good organization. By breaking the job down into small, specialized roles (Detective, Notary, Prosecutor, Defense, Judge) and forcing them to argue with each other, you get a much smarter result than a single AI working alone.

It's the difference between asking one person to write a legal brief and having a whole law firm debate the case before handing it to a judge. The result is fairer, more accurate, and much harder to trick.