Code Fingerprints: Disentangled Attribution of LLM-Generated Code

Imagine you walk into a room and see four different people (let's call them Chef A, Chef B, Chef C, and Chef D) all trying to cook the exact same dish: a Spaghetti Carbonara.

Even though they are all following the same recipe (the "task"), you can tell who cooked which plate just by looking at it:

Chef A always cuts the onions very finely and uses a specific brand of cheese.
Chef B is messy, leaving sauce splatters on the rim of the bowl.
Chef C writes a tiny note on the side of the plate explaining the steps.
Chef D always uses a slightly different type of fork.

These tiny, unconscious habits are their "Culinary Fingerprints."

This paper is about building a super-smart detective that can look at a plate of spaghetti (or a piece of computer code) and say, "Ah, this was definitely made by Chef B!"

Here is the breakdown of the paper using that analogy:

1. The Problem: The "Who Cooked This?" Mystery

In the world of computers, Large Language Models (LLMs) like ChatGPT, Claude, and DeepSeek are these "Chefs." They are amazing at writing code (the dish). But now, if a piece of code has a bug, a security hole, or a copyright issue, we need to know which AI wrote it.

Old Detective Work: Previous methods could only tell the difference between "Human Cook" and "Robot Cook." They couldn't tell the difference between Robot A and Robot B.
The Challenge: If two robots are asked to write a sorting algorithm, they will both write code that works the same way. It's hard to tell them apart because the "recipe" (the logic) is identical.

2. The Big Idea: Separating the "Recipe" from the "Chef's Style"

The authors realized that every piece of code has two mixed-up ingredients:

The Recipe (Source-Agnostic): The actual logic needed to solve the problem. This is the same no matter who writes it.
The Chef's Style (Source-Specific): The tiny habits, the way they name variables, how they indent their code, or how they write comments. This is unique to the specific AI.

The Analogy: Imagine trying to identify a singer by their voice while they are singing a song everyone knows. If you focus too much on the lyrics (the recipe), you can't tell who is singing. You have to ignore the lyrics and focus on the timbre and vibrato (the style).

3. The Solution: The "Disentanglement Network" (DCAN)

The authors built a new AI detective called DCAN. Think of it as a magical kitchen sieve.

Step 1: The Mix. The AI takes the code (the mixed-up recipe and style).
Step 2: The Sieve. It uses a special trick to separate the ingredients. It forces the "Recipe" part to look the same for all chefs cooking the same dish (because the underlying logic is the same).
Step 3: The Fingerprint. Once the "Recipe" is filtered out, what's left in the sieve is purely the "Chef's Style."
Step 4: The ID. The detective looks at this leftover style and says, "This is definitely Chef B's handwriting!"

4. The Evidence: The "Taste Test"

To prove their detective works, the authors created a massive Taste Test (a dataset):

They asked 4 famous AI Chefs (DeepSeek, Claude, Qwen, ChatGPT) to cook 2,869 different dishes (coding problems) in 4 different languages (Python, Java, C, Go).
They did this twice: once with "clean" plates (no comments) and once with "notes" on the side (comments).

The Results:

The Detective is Sharp: DCAN scored about 98% (average F1-score) at naming the correct AI chef when comments were included, and about 93% even without them.
The "Style" is Real: They found that:
- ChatGPT tends to be wordy and uses short variable names.
- Claude likes to write long, descriptive variable names.
- DeepSeek loves using specific "stack" tools.
- Qwen has a unique way of organizing its math.
It Works on Hard Dishes: Interestingly, the detective often got better at identifying the chef when the recipe was harder (more complex coding problems). Why? A likely explanation is that when the recipe is simple, everyone does it the same way. When it's hard, the chefs have to make more unique choices, revealing their fingerprints more clearly.

5. Why This Matters

This isn't just a party trick. It's crucial for:

Safety: If a piece of code contains a security vulnerability, we need to know which AI made it so the problem can be traced back to that specific model.
Copyright: If a company claims they wrote code, but an AI actually wrote it, this tool can provide evidence of where the code came from.
Accountability: If an AI makes a mistake that causes a crash, we need to know which "Chef" is responsible.

Summary

The paper introduces a tool that acts like a forensic stylist. Instead of looking at what the code does (which is the same for everyone), it looks at how the code is written (the unique quirks of each AI). By mathematically separating the "logic" from the "personality," it can accurately identify which AI generated a piece of code, even if the code is perfect.

Here is a detailed technical summary of the paper "Code Fingerprints: Disentangled Attribution of LLM-Generated Code".

1. Problem Definition: LLM Code Source Attribution (LLMCSA)

The paper addresses the emerging challenge of LLM Code Source Attribution (LLMCSA). While existing research focuses on distinguishing machine-generated code from human-written code, practical scenarios (e.g., vulnerability triage, incident investigation, licensing audits) require identifying the specific Large Language Model (LLM) that generated a code snippet.

The Challenge: Different LLMs often solve the same algorithmic task using similar logic and adhering to strict syntactic rules, resulting in superficially similar outputs.
The Hypothesis: Despite functional similarity, LLMs possess distinct "generative fingerprints" arising from differences in training data, architecture, alignment strategies, and decoding mechanisms. These manifest as subtle stylistic and structural variations.
The Core Difficulty: Standard detection methods often conflate Source-Agnostic Information (task-dependent functional semantics shared across all models) with Source-Specific Information (model-dependent stylistic fingerprints). The paper argues that effective attribution requires disentangling these two latent factors.

2. Methodology: Disentangled Code Attribution Network (DCAN)

The authors propose DCAN, a framework designed to separate source-agnostic semantics from source-specific stylistic features to improve attribution accuracy.

A. Framework Architecture

Feature Extraction: Uses a pre-trained encoder (UniXcoder) to generate a unified latent representation ($h_{base}$) for a given code snippet. This representation initially entangles both task semantics and model style.
Disentanglement Module:
- Source-Agnostic Projection ($h_{com}$): A non-linear projection network (MLP) extracts the shared task semantics. This component is trained to be invariant across different models solving the same task.
- Source-Specific Extraction ($h_{spec}$): The model-specific fingerprint is isolated via subtractive decomposition: $h_{spec} = h_{base} - h_{com}$. This residual representation is intended to capture only the stylistic nuances unique to the generating LLM.
Classification: A linear classifier is applied exclusively to $h_{spec}$ to predict the source model.
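
The data flow above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the toy dimensions, the random weights, the two-layer shape of the MLP, the ReLU non-linearity, and the names `linear`, `init_layer`, and `dcan_forward` are all assumptions made for illustration, and a random vector stands in for the UniXcoder output $h_{base}$.

```python
import random

def linear(x, weights, bias):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Small random weights that stand in for trained parameters (assumption)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dcan_forward(h_base, params):
    """Split h_base into (h_com, h_spec) and score each candidate source model."""
    (w1, b1), (w2, b2), (wc, bc) = params
    hidden = [max(0.0, v) for v in linear(h_base, w1, b1)]  # ReLU is an assumed choice
    h_com = linear(hidden, w2, b2)                           # source-agnostic projection
    h_spec = [b - c for b, c in zip(h_base, h_com)]          # h_spec = h_base - h_com
    logits = linear(h_spec, wc, bc)                          # classifier sees h_spec only
    return h_com, h_spec, logits

rng = random.Random(0)
dim, hidden_dim, n_models = 8, 16, 4  # toy sizes; a real encoder output is much wider
params = (
    init_layer(dim, hidden_dim, rng),
    init_layer(hidden_dim, dim, rng),
    init_layer(dim, n_models, rng),
)
h_base = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # placeholder for the encoder output
h_com, h_spec, logits = dcan_forward(h_base, params)
predicted_model = max(range(n_models), key=lambda k: logits[k])
```

Because $h_{spec}$ is defined as a residual, $h_{com}$ must have the same dimensionality as $h_{base}$, which is why the second layer of the sketch projects back to `dim`.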

B. Optimization Objectives

DCAN employs a joint loss function to enforce disentanglement:

Source Classification Loss ($\mathcal{L}_{cls}$): Standard cross-entropy loss applied to $h_{spec}$ to ensure it retains discriminative power for identifying the source model.
Representation Consistency Loss ($\mathcal{L}_{rc}$): A contrastive loss applied to $h_{com}$. It minimizes the cosine distance between the representations of different models solving the same task. This forces $h_{com}$ to capture only the shared functional logic, thereby pushing model-specific variation into $h_{spec}$.
Total Loss: $\mathcal{L}_{total} = \mathcal{L}_{cls} + \lambda \mathcal{L}_{rc}$, where $\lambda$ weights the consistency term against the classification term.
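
As a rough illustration of how the two terms combine, the sketch below computes the joint objective for one task in plain Python. It is a simplification under stated assumptions: the consistency term here is only the mean pairwise cosine distance between the $h_{com}$ vectors of the same task (the attraction part of a contrastive loss; any negative-pair term is omitted), the value `lam=0.5` is an arbitrary placeholder for $\lambda$, and the function names are hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def consistency_loss(h_com_by_model):
    """Mean pairwise cosine distance between h_com vectors of one task."""
    pairs = [
        (u, v)
        for i, u in enumerate(h_com_by_model)
        for v in h_com_by_model[i + 1:]
    ]
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)

def total_loss(logits, label, h_com_by_model, lam=0.5):
    """L_total = L_cls + lambda * L_rc (lam is an assumed value)."""
    return cross_entropy(logits, label) + lam * consistency_loss(h_com_by_model)

# Toy values: logits for one snippet and h_com vectors from three models on the same task.
loss = total_loss(
    logits=[2.0, 0.1, -0.3, 0.4],
    label=0,
    h_com_by_model=[[1.0, 0.2], [0.9, 0.3], [1.1, 0.1]],
)
```

The consistency term reaches zero only when every model's $h_{com}$ for a task points in the same direction, which is the invariance the section describes.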

3. Key Contributions

A. The LLMCSA Task & Benchmark Dataset

The paper introduces the first large-scale benchmark for LLMCSA, addressing the lack of public datasets for this specific task.

Scale: 91,804 code samples.
Models: Four mainstream LLMs (DeepSeek, Claude, Qwen, ChatGPT).
Languages: Four programming languages (C, Go, Java, Python).
Settings: Two generation modes: Plain (no comments) and Comment (with inline/block comments).
Diversity: Based on 2,869 LeetCode tasks covering various algorithmic domains (Data Structures, Math, String Processing, etc.) and difficulty levels.
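
A quick consistency check on these figures, assuming exactly one sample per combination of task, model, language, and setting (the summary does not state this explicitly):

```python
n_tasks, n_models, n_languages, n_settings = 2869, 4, 4, 2

# Size of the full grid if every combination produced exactly one sample.
full_grid = n_tasks * n_models * n_languages * n_settings
reported = 91_804
shortfall = full_grid - reported

print(full_grid, shortfall)  # 91808 4
```

Under that assumption the reported total is four samples short of the full grid. The summary does not say why; a handful of missing or filtered generations would be one possible explanation.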

B. The DCAN Framework

A novel representation learning approach that explicitly disentangles task semantics from model style, enabling robust multi-class attribution without requiring access to the generation process (passive attribution).

4. Experimental Results

A. Generative Distinctiveness (RQ1)

Analysis confirms that different LLMs exhibit consistent, measurable differences in:

Code Verbosity: ChatGPT tends to be more verbose; Qwen is more concise.
Lexical Density: Claude prefers longer identifiers; ChatGPT uses shorter names.
Naming Conventions: DeepSeek and Qwen show a higher bias toward snake_case in Java (deviating from standard camelCase).
Structural Depth & Comments: Variations in indentation stability and comment density/placement (e.g., ChatGPT uses more inline comments; DeepSeek/Claude prefer block comments).
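
To make these dimensions concrete, here is a small sketch that computes comparable statistics for a single snippet. The feature definitions are assumptions chosen for illustration, not the paper's metrics: comments are detected by a naive prefix search (which would misfire on a `#` inside a string literal), full-line comments serve as a rough proxy for block comments, and language keywords are counted as identifiers.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)+")

def style_features(code, comment_prefix="#"):
    """Crude style statistics for one snippet (keywords count as identifiers)."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    code_parts, full_comments, inline_comments = [], 0, 0
    for ln in lines:
        head, sep, _ = ln.partition(comment_prefix)
        if sep and not head.strip():
            full_comments += 1      # the whole line is a comment
        elif sep:
            inline_comments += 1    # code followed by a trailing comment
        code_parts.append(head)     # keep only the code portion for identifier stats
    identifiers = IDENTIFIER.findall("\n".join(code_parts))
    n_ident = max(len(identifiers), 1)
    return {
        "lines": len(lines),
        "mean_identifier_length": sum(map(len, identifiers)) / n_ident,
        "snake_case_ratio": sum(bool(SNAKE_CASE.fullmatch(i)) for i in identifiers) / n_ident,
        "full_line_comments": full_comments,
        "inline_comments": inline_comments,
    }

sample = """# running total
def add_all(values):
    total_sum = 0  # accumulator
    for v in values:
        total_sum += v
    return total_sum
"""
features = style_features(sample)
```

For languages such as Java, C, or Go, `comment_prefix="//"` covers line comments, while `/* ... */` blocks would need separate handling.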

B. Attribution Performance (RQ2)

DCAN significantly outperforms baselines (adapted GPTSniffer and CodeGPTSensor) across all settings:

Plain Setting: DCAN achieves 92.94% average F1-score (vs. 89.15% for the best baseline).
Comment Setting: Performance improves to 98.38%, indicating that natural language comments provide strong additional attribution signals.
Complexity: Contrary to typical classification tasks, performance often increases with task difficulty (Hard > Medium > Easy), suggesting complex tasks amplify model-specific stylistic choices.
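
For readers unfamiliar with the metric, the sketch below shows one common way to compute an average F1-score for a four-class attribution problem. It assumes macro-averaging (the unweighted mean of per-class F1); the summary does not say which averaging scheme the paper uses, and the labels and predictions are made-up toy data.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)

labels = ["DeepSeek", "Claude", "Qwen", "ChatGPT"]
y_true = ["DeepSeek", "Claude", "Qwen", "ChatGPT", "Claude", "Qwen"]
y_pred = ["DeepSeek", "Claude", "Qwen", "ChatGPT", "Qwen", "Qwen"]  # one Claude sample misattributed
score = macro_f1(y_true, y_pred, labels)
```

Here the single misattribution lowers the F1 of both Claude (a missed sample) and Qwen (a false positive), so one error affects two of the four per-class scores.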

C. Mechanism Validity (RQ3)

Ablation Study: Using only the source-agnostic component ($h_{com}$) results in near-random accuracy (~25%, which is chance level for four models), indicating that it carries essentially no source-specific information. Using only the source-specific component ($h_{spec}$) yields the highest accuracy.
Visualization: t-SNE plots show that $h_{spec}$ forms distinct, compact clusters per model, while $h_{com}$ shows significant overlap, which is consistent with successful disentanglement.

D. Robustness and Generalization (RQ4)

Data Efficiency: DCAN maintains high performance even with only 10% of training data, outperforming baselines significantly in low-data regimes.
Cross-Language Generalization: A unified multilingual model performs comparably to single-language specialists.
Zero-Shot Capability: In a Leave-One-Language-Out (LOLO) setting, the model generalizes well to unseen languages, particularly when comments are present (e.g., jumping from ~70% to ~93% accuracy on Python in the Comment setting). The suggested reason is that each model's natural language style in comments stays consistent across programming languages.
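
The Leave-One-Language-Out protocol can be sketched as a simple split generator. The record layout (dictionaries with `language`, `model`, and `code` keys) and the function name are hypothetical; the summary does not describe the benchmark's actual data format.

```python
def leave_one_language_out(samples):
    """Yield (held_out_language, train, test), holding out one language per fold."""
    languages = sorted({s["language"] for s in samples})
    for held_out in languages:
        train = [s for s in samples if s["language"] != held_out]
        test = [s for s in samples if s["language"] == held_out]
        yield held_out, train, test

# Toy data: one placeholder snippet per (language, model) pair.
samples = [
    {"language": lang, "model": model, "code": "..."}
    for lang in ("C", "Go", "Java", "Python")
    for model in ("DeepSeek", "Claude", "Qwen", "ChatGPT")
]
folds = list(leave_one_language_out(samples))
```

Each fold trains on three languages and evaluates on the fourth, so the classifier never sees the syntax of the test language during training.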

5. Significance and Implications

Software Governance: Provides a technical foundation for accountability, allowing stakeholders to trace vulnerabilities or licensing violations back to specific AI providers.
Forensic Analysis: Demonstrates that "code fingerprints" are robust enough to survive variations in task complexity and programming language, even without access to the generation pipeline.
Methodological Advance: Shifts the paradigm from simple binary detection (Human vs. AI) to fine-grained multi-source attribution by explicitly modeling the separation of what the code does (semantics) from how the model writes it (style).

In conclusion, the paper establishes that LLM-generated code contains stable, model-specific fingerprints. By disentangling these from functional semantics, DCAN achieves state-of-the-art attribution accuracy, offering a viable solution for software provenance analysis in the era of generative AI.