Imagine you have a digital ghost.
This ghost lives inside the "brain" of a massive Artificial Intelligence (like the one powering chatbots). It's made up of everything the AI has ever read about you on the internet, combined with guesses it makes based on your name, your writing style, and your location.
The problem? You can't see this ghost. You don't know what the AI thinks you are, what it thinks you do, or what secrets it might be "remembering" about you.
This paper is about shining a flashlight on that ghost and asking: "What does the AI actually know about me, and is that okay?"
Here is the story of the researchers' work, broken down into simple parts.
1. The Tool: The "Privacy Mirror" (LMP2)
The researchers built a special tool called LMP2 (Language Model Privacy Probe). Think of it as a magic mirror for your digital identity.
- How it works: You type in your name and pick a few things you want to check (like "What is my eye color?" or "Where do I live?").
- The Trick: The tool doesn't just ask the AI, "What do you know about me?" (because the AI might just say "I don't know"). Instead, it plays a game of fill-in-the-blank. It gives the AI a sentence like "The person named [Your Name] lives in..." and asks the AI to finish the sentence.
- The Result: The tool shows you a list of guesses the AI made, how confident it is in those guesses, and whether those guesses are actually true.
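The fill-in-the-blank trick can be sketched in a few lines of Python. Note that `model_complete` below is a hypothetical stand-in for a real LLM call (the actual LMP2 tool queries real models), and the sentence templates and confidence score are illustrative, not the paper's real code:

```python
from collections import Counter

def model_complete(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    canned = {"lives in": "London", "has the eye color": "blue"}
    for fragment, answer in canned.items():
        if fragment in prompt:
            return answer
    return "unknown"

def probe(name: str, attribute: str, samples: int = 5) -> tuple[str, float]:
    """Cloze-style probe: ask the model to finish the same sentence several
    times, take the most frequent completion as its 'belief', and use the
    fraction of samples that agree as a crude confidence score."""
    prompt = f"The person named {name} {attribute} ..."
    guesses = Counter(model_complete(prompt) for _ in range(samples))
    guess, count = guesses.most_common(1)[0]
    return guess, count / samples

guess, confidence = probe("Jane Example", "lives in")
print(guess, confidence)  # the stub is deterministic, so confidence is 1.0
```

With a real, non-deterministic model, repeated sampling is what separates a stable "belief" from random noise.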
2. What They Found: The AI is a "Super-Observer"
The researchers tested this on 458 real people and 8 different AI models. Here is what they discovered:
- For Famous People: The AI is like a super-observer. If you are a celebrity with a Wikipedia page, the AI knows almost everything about you (your birthday, your religion, your political party) with scary accuracy.
- For Regular People: Even for normal folks, the AI is surprisingly good at guessing. For example, using just a name, GPT-4o guessed a person's gender, native language, and eye color correctly more than 60% of the time.
- The "Fake Name" Test: When they tested the tool on names that don't exist (like "John Doe Smith"), the AI didn't say "I don't know." Instead, it confidently guessed the most common answers (like guessing everyone is right-handed or lives in a specific country). It's like a fortune teller who always guesses the most popular answer, even if it's wrong.
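The fake-name finding, confident majority-class guesses for people who don't exist, is the kind of thing you can check with a simple baseline comparison. The data below is entirely made up for illustration; the idea is just to measure how often the model's guesses for fabricated names match the single most common answer in the population:

```python
# Made-up example data: a model's guesses for fabricated (non-existent) names.
model_guesses = {
    "handedness": ["right", "right", "right", "right", "right"],
    "country":    ["USA", "USA", "USA", "UK", "USA"],
}

# Illustrative population priors: the most common answer for each attribute.
population_majority = {"handedness": "right", "country": "USA"}

def majority_match_rate(guesses: list[str], majority: str) -> float:
    """Fraction of guesses that are simply the most common answer.
    A high rate suggests the model is defaulting to priors, not recalling
    anything it actually 'knows' about the person."""
    return sum(g == majority for g in guesses) / len(guesses)

for attr, guesses in model_guesses.items():
    rate = majority_match_rate(guesses, population_majority[attr])
    print(f"{attr}: {rate:.0%} of guesses are just the majority answer")
```

A fortune teller who always says "right-handed" will score near 100% on this metric, which is exactly the pattern the fake-name test exposed.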
3. The Big Surprise: People Want Control, But Don't Panic
The researchers asked regular people to use the tool. The results were a mix of relief and worry:
- The "So What?" Factor: Even when the AI guessed something accurate (like "This person has blue eyes"), most people didn't think it was a privacy violation. They thought, "Well, everyone knows I have blue eyes."
- The Desire for Control: However, 72% of people said they wanted the power to delete or correct what the AI thinks about them. They wanted a "Delete" button for their digital ghost.
- The Hesitation: Interestingly, people were afraid to check the most sensitive things. Even though they were worried about their phone number or medical history being leaked, they rarely asked the tool to check those specific things. They preferred to check "safe" things like hair color.
4. The "Friction": Why This is Hard to Fix
The paper argues that fixing this isn't just a technical problem; it's a messy human problem. They identified nine "frictions" (bumps in the road) that make privacy auditing difficult:
- The "Is it Real or a Guess?" Problem: If the AI says "You live in London," is that because it read a blog post you wrote in 2015? Or is it just guessing because your name sounds British? The AI doesn't tell you the difference. It's like a gossip who might have heard a rumor or might just be making things up, but they sound equally confident.
- The "Moving Target" Problem: The internet changes every day. If you move to a new city today, the AI might still think you live in your old one. The "truth" about you is constantly shifting, making it hard to pin down.
- The "Black Box" Problem: The companies that own these AIs won't let us look inside the machine. We can only see the output (the answer), not the memory (the data). It's like trying to figure out what's in a sealed box just by shaking it.
5. The Takeaway: We Need a New Rulebook
The authors conclude that we are in the middle of an "Evaluation Crisis."
We are trying to audit (check) these AI systems using old rules designed for databases, where data is static and clear. But AI is probabilistic (it deals in chances) and context-dependent (it changes based on how you ask).
The Solution?
We need to stop treating AI privacy like a simple "Yes/No" checklist. Instead, we need:
- Better Tools: Interfaces that show users how confident the AI is, not just what it thinks.
- Clearer Rules: We need to define what counts as "personal data" when it's just a guess.
- Human-Centered Design: The tools must be easy for regular people to use, not just for computer scientists.
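The first suggestion, surfacing confidence rather than just the guess, could look something like the sketch below. The thresholds and wording are invented for illustration; the point is that a low-confidence guess should never be phrased like a fact:

```python
def hedge(guess: str, confidence: float) -> str:
    """Turn a raw (guess, confidence) pair into user-facing language,
    so a 35%-confidence guess doesn't read like a fact.
    The thresholds here are arbitrary illustration, not a standard."""
    if confidence >= 0.9:
        return f"The model consistently answers '{guess}' ({confidence:.0%} of samples)."
    if confidence >= 0.5:
        return f"The model often guesses '{guess}' ({confidence:.0%}), but is not consistent."
    return f"The model has no stable answer; '{guess}' appeared in only {confidence:.0%} of samples."

print(hedge("London", 0.95))
print(hedge("blue", 0.35))
```

Presenting the guess and its stability together is what lets a non-expert tell a remembered fact from a fortune-teller default.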
In short: The AI has built a digital profile of you that you can't see. This paper built a tool to peek at that profile and found that while the AI is surprisingly good at guessing, the real challenge is figuring out how to let you control that ghost.