Here is an explanation of the paper using simple language and creative analogies.
The Big Picture: The "Community Notes" Experiment
Imagine X (formerly Twitter) as a massive, chaotic town square where everyone is shouting. Sometimes, people shout lies or misleading things. To fix this, X introduced Community Notes, a system where regular citizens act like a "jury" to fact-check posts.
The idea was brilliant: instead of having a few bosses decide what's true, let the crowd decide. But there was a catch. If the crowd is deeply divided (like a political argument), they can never agree. So, X built a special algorithm to find the "middle ground."
The Paper's Main Finding:
The researchers (Paul Bouchaud and Pedro Ramaciotti) looked at 1.9 million of these notes across 13 countries. They discovered that while the system works great for boring stuff (like "This video is fake"), it is designed to fail when it comes to heated political arguments, especially during elections.
In fact, the system is so focused on finding "agreement" that it often leaves the most dangerous, polarizing lies alone because the two sides of the political spectrum simply refuse to agree on them.
Analogy 1: The "Diplomatic Dinner Party"
Imagine a dinner party where the guests are split into two groups: Team Red and Team Blue.
- The Goal: The host wants to serve a dish that everyone agrees is delicious.
- The Rule: The dish is only served if at least one person from Team Red and at least one person from Team Blue both say, "I love this." But if Team Red says, "This is poison!" and Team Blue says, "This is the best thing ever!", the dish is not served. It gets hidden in the kitchen.
What the Paper Found:
The researchers found that X's algorithm is obsessed with finding that "perfect dish" that both teams like.
- Scenario A (Scams): Someone posts, "Click here to win a free iPhone!" Team Red says, "Fake!" Team Blue says, "Fake!" Result: The note gets displayed under the post. Everyone agrees it's a scam.
- Scenario B (Elections): Someone posts a lie about the election results. Team Red says, "This is a lie!" Team Blue says, "No, it's true!" Result: The algorithm sees the disagreement. It thinks, "Oh, these people can't agree, so I can't be sure what's true." So, it hides the note.
The Problem: The algorithm is so good at finding agreement that it accidentally protects the most divisive lies. It's like a bouncer at a club who only lets in people who are friends with everyone. The result? The most toxic arguments never get moderated because the bouncer is too confused by the fighting to let anyone in.
Analogy 2: The "Bridge Builder" vs. The "Wall"
The researchers explain that X's algorithm tries to build a bridge between the two sides. It looks for "notes" that people from both sides rate as "Helpful."
- How it works: The algorithm assigns a "political score" to every note and every person rating it, then splits each note's ratings into two parts: how much people like it because of shared politics, and how much they like it regardless of politics. Only the second part counts. If a note gets high ratings from both the "Left" and the "Right," it earns a high "Helpfulness" score and is shown to everyone (a toy version of this scoring model is sketched at the end of this section).
- The Flaw: In highly polarized times (like an election), the "Left" and "Right" are standing on opposite sides of a canyon. They are looking at the same fact and seeing two different realities.
- The Left sees a lie.
- The Right sees the truth.
- Because they can't agree, the algorithm assumes the note isn't "helpful" enough to show.
The Result: The algorithm effectively says, "Since you two can't agree, I'm going to do nothing." This leaves the polarizing lies floating in the air, unchallenged, during the most critical moments for democracy.
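To make the "bridge builder" concrete, here is a minimal toy sketch (in Python) of a bridging-style scoring model like the one the paper analyzes. It is not the production algorithm: the rater names, regularization strengths, and the 0.40 display threshold are illustrative assumptions. The idea it demonstrates is real, though: each rating is explained as a shared baseline, plus a rater bias, plus the note's own "helpfulness" intercept, plus a political-alignment term, and only a note whose intercept (the part both sides agree on) clears the bar gets shown.

```python
# Toy sketch of a "bridging" scoring model: each rating is modeled as
#   rating ≈ mu + rater_bias + note_intercept + rater_factor * note_factor
# The note_intercept is the "helpfulness everyone agrees on"; the factor
# product soaks up purely partisan agreement. Hyperparameters, names, and
# the 0.40 threshold are illustrative, not the production values.
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic ratings: 10 raters (5 "blue", 5 "red"), 3 notes --------------
# 1.0 = rated the note "helpful", 0.0 = rated it "not helpful"
raters = [f"blue{i}" for i in range(5)] + [f"red{i}" for i in range(5)]
notes = ["scam_note", "blue_partisan_note", "red_partisan_note"]

def rate(rater, note):
    if note == "scam_note":                       # everyone agrees it is helpful
        return 1.0
    if note == "blue_partisan_note":              # only one side finds it helpful
        return 1.0 if rater.startswith("blue") else 0.0
    return 1.0 if rater.startswith("red") else 0.0

triples = [(u, n, rate(raters[u], notes[n]))
           for u in range(len(raters)) for n in range(len(notes))]

# --- Fit the factorization with plain stochastic gradient descent -----------
n_raters, n_notes = len(raters), len(notes)
mu = 0.0
b_u, b_n = np.zeros(n_raters), np.zeros(n_notes)
f_u, f_n = rng.normal(0, 0.1, n_raters), rng.normal(0, 0.1, n_notes)
lam_b, lam_f, lr = 0.15, 0.03, 0.05   # intercepts penalized harder (assumed values)

for _ in range(2000):
    for u, n, r in triples:
        err = (mu + b_u[u] + b_n[n] + f_u[u] * f_n[n]) - r
        mu      -= lr * (err + lam_b * mu)
        b_u[u]  -= lr * (err + lam_b * b_u[u])
        b_n[n]  -= lr * (err + lam_b * b_n[n])
        f_u_old  = f_u[u]
        f_u[u]  -= lr * (err * f_n[n] + lam_f * f_u[u])
        f_n[n]  -= lr * (err * f_u_old + lam_f * f_n[n])

# --- A note is displayed only if its intercept clears the bar ---------------
THRESHOLD = 0.40                       # illustrative display threshold
for j, name in enumerate(notes):
    mean_rating = np.mean([r for _, k, r in triples if k == j])
    status = "SHOWN" if b_n[j] >= THRESHOLD else "NOT SHOWN"
    print(f"{name:22s} mean rating={mean_rating:.2f}  intercept={b_n[j]:+.2f}  -> {status}")
```

Running this sketch, the scam note (rated helpful by everyone) ends up with a high intercept and gets shown, while the two partisan notes, each rated helpful by exactly half the raters, end up with intercepts near zero and stay hidden: Scenario A and Scenario B in miniature.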
The Election Danger Zone
The researchers zoomed in on four major elections (in the USA, the UK, France, and Germany) and found a scary pattern:
- General Topics: Notes about scams, fraud, or celebrity gossip get moderated quickly. The "jury" agrees easily.
- Election Topics: Notes about election fraud or political candidates get stuck in "limbo."
- Only about 6% to 12% of election-related notes actually get displayed.
- Compare that to 39% of scam-related notes that get displayed.
Why this matters: During an election, the most important information is often the most polarizing. If the system is designed to hide content that causes disagreement, it is essentially hiding the most critical fact-checks right when we need them most.
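For readers curious where numbers like these come from, here is a rough sketch (Python with pandas) of how one might estimate display rates from X's public Community Notes data export. The file names, column names, status label, and the crude keyword filter are assumptions for illustration; the paper's own topic classification and country handling are more careful, so treat this as a starting point rather than a replication.

```python
# Rough sketch: estimate what share of notes reach "displayed" status,
# split by whether the note text looks election-related.
# File/column names and the keyword list are assumptions; check them
# against the current Community Notes public-data documentation.
import pandas as pd

notes = pd.read_csv("notes-00000.tsv", sep="\t")               # note text and metadata (assumed file name)
status = pd.read_csv("noteStatusHistory-00000.tsv", sep="\t")  # per-note current status (assumed file name)

df = notes.merge(status[["noteId", "currentStatus"]], on="noteId")
df["displayed"] = df["currentStatus"].eq("CURRENTLY_RATED_HELPFUL")

# Hypothetical keyword filter standing in for the paper's election-topic classifier
election_terms = r"election|ballot|voter|candidate"
df["election_related"] = df["summary"].str.contains(election_terms, case=False, na=False)

print(df.groupby("election_related")["displayed"].mean())
```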
The "Invisible Bias"
You might think, "Well, maybe the system is just fair and neutral." The researchers say: No, it's biased by design.
The system isn't trying to find the truth; it's trying to find consensus.
- If the truth is controversial, the note pointing it out splits the raters, gets treated as "unhelpful," and stays hidden.
- If a lie is boring and everyone dislikes it, the note debunking it gets treated as "helpful" and is shown.
It's like a judge who only gives a verdict if both the prosecutor and the defense lawyer shake hands. If they are screaming at each other, the judge declares a mistrial and lets the accused walk free. In the case of elections, the "accused" is often misinformation that could change the outcome of a vote.
The Takeaway
The paper concludes that Community Notes is a great tool for small, non-political problems, but a dangerous one to rely on for big, political problems.
By trying to be a "bridge" between enemies, the system accidentally builds a shield around the most toxic content. The authors warn that as other platforms (like Meta and TikTok) copy this system, we need to be careful. We cannot rely on a system that only speaks when everyone agrees, because in a polarized world, the most important truths are often the ones people disagree on the most.
In short: The system is designed to be polite, but in a shouting match, being polite means letting the loudest, most divisive lies go unchecked.