Imagine you are trying to figure out how two different people are thinking about the same thing. Maybe you want to know if a human brain and a computer program are "thinking" in the same way when they look at a picture of a cat.
For a long time, scientists have tried to answer this by using a single ruler to measure the distance between their thoughts. But this paper argues that one ruler isn't enough. Depending on how you measure, you might get completely different answers.
Here is the story of the paper, broken down into simple concepts and analogies.
1. The Problem: The "One-Ruler" Trap
Imagine you are trying to sort a pile of fruits.
- Ruler A measures weight.
- Ruler B measures color.
- Ruler C measures sweetness.
If you only use the Weight Ruler, a heavy watermelon and a heavy cantaloupe look very similar. But if you use the Color Ruler, they look totally different (green vs. orange). If you use the Sweetness Ruler, they might be similar again.
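The fruit analogy can be made concrete in a few lines of code. In this sketch, each fruit is a small feature vector and each "ruler" is a distance measured along a single feature; all the numbers are invented purely for illustration.

```python
import numpy as np

# Hypothetical feature vectors: [weight in kg, hue in degrees, sweetness 0-10].
fruits = {
    "watermelon": np.array([5.0, 120.0, 6.5]),  # heavy, green, sweet
    "cantaloupe": np.array([4.5, 30.0, 7.0]),   # heavy, orange, sweet
}

def ruler(a, b, dim):
    """Distance between two fruits measured along one feature only."""
    return abs(fruits[a][dim] - fruits[b][dim])

print(ruler("watermelon", "cantaloupe", 0))  # weight:    0.5  -> very similar
print(ruler("watermelon", "cantaloupe", 1))  # color hue: 90.0 -> very different
print(ruler("watermelon", "cantaloupe", 2))  # sweetness: 0.5  -> similar again
```

Same two fruits, three rulers, three different verdicts. That is exactly the trap.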
In the world of AI and brains, scientists have been using different "rulers" (mathematical formulas) to see if two systems are similar. Some rulers check if the systems organize data geometrically (like arranging shapes in a specific pattern). Others check if they can predict the same answers (like a teacher grading a test).
The problem? These rulers often disagree. One might say two AI models are twins, while another says they are strangers. This makes it hard to know what's actually going on.
2. The Experiment: Testing the Rulers
The authors decided to test all these different rulers on two groups:
- Artificial Brains: 35 different computer vision models (some trained to recognize cats, some trained to predict the next word, some built like human neurons, some built like modern Transformers).
- Real Brains: Brain scans from humans viewing 1,000 natural images.
They asked two simple questions:
- For AI: Can the ruler tell the difference between a model trained with "Supervised Learning" (learning with a teacher) and "Self-Supervised Learning" (learning by guessing)?
- For Humans: Can the ruler tell the difference between the part of the brain that sees edges (V1) and the part that sees complex shapes (V4)?
The Result:
- The "Geometry" Rulers (like RSA and SoftMatch) were the best detectives. They looked at how the information was arranged and could easily tell the different families of models and brain regions apart.
- The "Prediction" Rulers (like Linear Predictivity) were weaker. They were too flexible. They could twist and turn the data to make anything look similar, so they missed the unique "fingerprints" that made each system special.
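You can see the difference between the two kinds of rulers in a toy sketch (this is an illustration of the general idea, not the paper's actual analysis; the data, matrix sizes, and simplified R-squared are all made up). System B below carries exactly the same information as system A, just linearly scrambled. A prediction ruler happily unscrambles it and declares them near-identical, while a geometry ruler, which compares pairwise-distance patterns the way RSA does, notices that the arrangement has been distorted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "system A": responses of 5 units to 40 stimuli (hypothetical data).
A = rng.standard_normal((40, 5))
# "System B": the same information, passed through a random linear mixing.
B = A @ rng.standard_normal((5, 5))

def rdm(X):
    """Representational dissimilarity matrix: pairwise Euclidean distances."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2)

# Geometry ruler (RSA-style): correlate the upper triangles of the two RDMs.
iu = np.triu_indices(40, k=1)
rsa_score = np.corrcoef(rdm(A)[iu], rdm(B)[iu])[0, 1]

# Prediction ruler (linear predictivity): fit a linear map B -> A, score R^2.
W, *_ = np.linalg.lstsq(B, A, rcond=None)
r2 = 1 - (A - B @ W).var() / A.var()

print(f"linear predictivity R^2 = {r2:.3f}")  # essentially perfect
print(f"RSA correlation         = {rsa_score:.3f}")
```

The flexible linear map recovers A from B almost exactly, so linear predictivity scores near 1.0 even though the two systems arrange their data quite differently; the RSA score stays noticeably lower.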
3. The Solution: The "Super-Blender" (Similarity Network Fusion)
If one ruler isn't enough, why not use all of them?
The authors tried a simple average (mixing all the rulers together), but that was like making a smoothie where the strong flavors drown out the subtle ones. It didn't work well.
Instead, they used a clever technique called Similarity Network Fusion (SNF). Think of this not as a blender, but as a group of detectives meeting in a room.
- Detective A (Geometry) says: "These two models look alike because they arrange their data the same way."
- Detective B (Tuning) says: "I agree, and I also see they react to specific details similarly."
- Detective C (Prediction) says: "They both get the right answer, but their internal logic is different."
The SNF algorithm listens to all of them. It draws a strong line between two models only when most detectives agree they are similar. If one detective sees a resemblance but the others disagree, the line stays weak; if they all agree, it becomes a thick, solid highway.
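The detective meeting above can be sketched as code. This is a compact, simplified version of the SNF cross-diffusion idea, not the authors' implementation; the toy "views", noise level, neighborhood size `k`, and iteration count are all invented for illustration. Each view repeatedly diffuses the other views' networks through its own local neighborhoods, so only edges the views agree on survive.

```python
import numpy as np

def normalize(W):
    """Full kernel P: off-diagonal mass sums to 1/2 per row, diagonal is 1/2."""
    W = W.astype(float).copy()
    np.fill_diagonal(W, 0)
    P = W / (2 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k):
    """Sparse kernel S: keep only each row's k strongest ties, renormalized."""
    S = np.zeros_like(W, dtype=float)
    for i in range(len(W)):
        nn = np.argsort(W[i])[-k:]
        S[i, nn] = W[i, nn]
    return S / S.sum(axis=1, keepdims=True)

def snf(views, k=3, iters=10):
    """Fuse several similarity matrices by cross-diffusion."""
    P = [normalize(W) for W in views]
    S = [local_kernel(W, k) for W in views]
    for _ in range(iters):
        # Each view diffuses the average of the OTHER views' networks
        # through its own local neighborhood structure.
        P = [S[v] @ (sum(P[u] for u in range(len(P)) if u != v) / (len(P) - 1)) @ S[v].T
             for v in range(len(P))]
        P = [normalize((Q + Q.T) / 2) for Q in P]
    return sum(P) / len(P)

# Toy demo: six "models" in two hidden families (0-2 and 3-5), seen by two
# noisy rulers that each only partly reveal the family structure.
rng = np.random.default_rng(1)
family = np.kron(np.eye(2), np.ones((3, 3)))  # ideal block structure
view_a = family + 0.3 * rng.random((6, 6))
view_b = family + 0.3 * rng.random((6, 6))
view_a, view_b = (view_a + view_a.T) / 2, (view_b + view_b.T) / 2

fused = snf([view_a, view_b], k=3, iters=10)

within = fused[family.astype(bool) & ~np.eye(6, dtype=bool)].mean()
between = fused[~family.astype(bool)].mean()
print(within > between)  # the fused ties are strongest inside each family
```

After fusion, the within-family connections come out much stronger than the cross-family ones: the edges every "detective" agreed on have hardened into highways.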
The Magic:
When they used this "Group Detective" approach, the results were amazing.
- In AI: It perfectly grouped the models. It showed that all "Self-Supervised" models (regardless of whether they were old-school or new-school) formed one big family, while "Supervised" models formed another. It even showed that hybrid models (mixing old and new tech) were actually cousins to the self-supervised family.
- In Brains: It revealed the brain's structure perfectly. It showed the hierarchy of the visual cortex, from simple edge detectors to complex object recognizers, far more clearly than any single ruler could.
4. The Big Takeaway
This paper teaches us two main lessons:
- There is no single "Truth" in similarity. Whether two brains or computers are "similar" depends entirely on how you look at them. Some similarities are about the shape of the thought, others are about the result of the thought.
- Combination is key. By fusing these different perspectives, we get a much clearer, more accurate map of how intelligence (both artificial and biological) is organized.
In short: Don't just look at a painting through one colored lens. Look through many, and then combine what you see to understand the masterpiece. That's what this paper does for the science of AI and the brain.