Imagine you have a massive library of different recipes (neural networks) that chefs have created to cook delicious meals (solve problems like recognizing cats in photos or predicting stock prices).
Usually, if you want to know if a recipe will work well, you have to actually cook the dish and taste it. That takes time and ingredients. But what if you could just look at the list of ingredients and the instructions (the "weights" or parameters) and instantly know:
- Will this recipe taste good?
- Which ingredients are actually necessary, and which ones are just clutter?
- Can I tweak this recipe to make it even better?
This is the goal of Weight-Space Models. Instead of cooking the meal, the AI looks at the recipe book itself to predict the outcome.
The Problem: The "Shuffled Recipe" Confusion
For a long time, these "recipe readers" (AI models) were bad at this. Why? Because of a quirk in how recipes are written.
Imagine a recipe that says: "Mix eggs, flour, and sugar."
Now imagine a second recipe that says: "Mix sugar, eggs, and flour."
It's the exact same recipe, just written in a different order.
Old AI models were like a picky chef who thought these were two completely different dishes. They got confused by the order of the ingredients. To fix this, researchers built special "symmetry-aware" readers for standard recipes (called MLPs) that understand that the order doesn't matter.
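The "shuffling doesn't change the dish" property is easy to see in code. Below is a toy NumPy sketch (not from the paper): if you permute the hidden neurons of a small MLP, reordering the rows of the first weight matrix and the columns of the second together, the output is identical.

```python
import numpy as np

# A tiny 2-layer MLP: y = W2 @ relu(W1 @ x). Toy illustration only.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # 3 inputs -> 4 hidden neurons
W2 = rng.normal(size=(2, 4))   # 4 hidden -> 2 outputs
x = rng.normal(size=3)

def mlp(W1, W2, x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# "Shuffle the recipe": permute the hidden neurons, applying the same
# permutation to the rows of W1 and the columns of W2.
perm = [2, 0, 3, 1]
W1_shuf = W1[perm, :]
W2_shuf = W2[:, perm]

# The shuffled network cooks the exact same dish.
assert np.allclose(mlp(W1, W2, x), mlp(W1_shuf, W2_shuf, x))
```

A symmetry-aware reader must treat `(W1, W2)` and `(W1_shuf, W2_shuf)` as the same network, which is exactly what the shuffled-recipe confusion is about.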
The New Challenge: Kolmogorov-Arnold Networks (KANs)
Recently, a new type of recipe called KANs became popular.
- Standard Recipes (MLPs): Use fixed amounts of ingredients (like "2 cups of flour").
- KAN Recipes: Use smart, adjustable ingredients. Instead of just "flour," the recipe might say "flour that changes texture depending on how much you stir."
These KANs are amazing. They are more efficient, faster to scale up, and much easier to understand (you can actually see how the "flour" behaves). But here's the catch: No one had built a "recipe reader" for KANs yet. The old readers didn't know how to handle these smart, changing ingredients.
The Solution: The "KAN-Graph Metanetwork"
The authors of this paper (Guy Bar-Shalom and team) decided to build the first-ever reader specifically for KANs. Here is how they did it, using some fun analogies:
1. The "Kan-Graph" (Turning a Recipe into a Map)
Instead of just reading the list of ingredients in a line, they turned the KAN into a map (a graph).
- Nodes (The Stops): These are the neurons (the steps in the recipe).
- Edges (The Roads): These are the connections between steps.
- The Special Sauce: In a normal recipe, the road is just a number (e.g., "multiply by 2"). In a KAN, the road is a function (a smart rule that changes).
The authors figured out that even though KANs are fancy, they still have the same "shuffling" problem as old recipes. If you swap two steps in the middle of the process, the final dish tastes the same. They proved mathematically that KANs have this symmetry, too.
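Here is a minimal sketch of that idea, under simplifying assumptions: each edge carries its own small 1-D function (a cubic polynomial here, where real KANs use splines), each node sums its incoming edges, and permuting the hidden "stops" consistently leaves the final output unchanged. This illustrates the symmetry the paper proves; it is not the authors' Kan-Graph construction itself.

```python
import numpy as np

# Toy KAN: every edge (i -> j) carries its own 1-D function phi_ji,
# parameterized here by 4 polynomial coefficients per edge.
rng = np.random.default_rng(1)
C1 = rng.normal(size=(4, 3, 4))  # layer 1: 3 inputs -> 4 hidden nodes
C2 = rng.normal(size=(2, 4, 4))  # layer 2: 4 hidden -> 2 outputs

def edge_fn(coeffs, x):
    # phi(x) = c0 + c1*x + c2*x^2 + c3*x^3  (a "smart road")
    return sum(c * x**k for k, c in enumerate(coeffs))

def kan_layer(C, x):
    # Node j sums its incoming edge functions: out_j = sum_i phi_ji(x_i)
    return np.array([sum(edge_fn(C[j, i], x[i]) for i in range(len(x)))
                     for j in range(C.shape[0])])

x = rng.normal(size=3)
y = kan_layer(C2, kan_layer(C1, x))

# Swap hidden stops: permute layer 1's outgoing edges and layer 2's
# incoming edges the same way -> the final dish is unchanged.
perm = [3, 1, 0, 2]
y_perm = kan_layer(C2[:, perm], kan_layer(C1[perm], x))
assert np.allclose(y, y_perm)
```

The same trick as with MLPs, except now entire edge functions (not single numbers) get shuffled along with the nodes.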
2. The "WS-KAN" (The Smart Reader)
They built a new AI called WS-KAN (Weight-Space KAN). Think of it as a super-intelligent tour guide who walks through the KAN map.
- It doesn't just look at the ingredients; it walks the roads, understanding that the "smart roads" (functions) are the most important part.
- Because it's built as a Graph Neural Network (GNN), it naturally understands that swapping two stops in the middle of the map doesn't change the destination. It respects the symmetry automatically.
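Why does a GNN get this for free? Because summing messages over neighbors and pooling over nodes are both order-blind operations. The sketch below (a generic one-step message-passing layer, not the paper's WS-KAN architecture) shows that relabeling the graph's nodes leaves the pooled output untouched.

```python
import numpy as np

# Minimal message-passing readout: nodes aggregate summed neighbor
# messages, then all node features are pooled into one vector.
def gnn_readout(node_feats, adj, W_msg, W_upd):
    msgs = adj @ node_feats @ W_msg             # sum messages from neighbors
    node_feats = np.tanh(node_feats @ W_upd + msgs)
    return node_feats.sum(axis=0)               # order-invariant pooling

rng = np.random.default_rng(2)
n, d = 5, 3
feats = rng.normal(size=(n, d))
adj = (rng.random((n, n)) < 0.5).astype(float)
adj = np.maximum(adj, adj.T)                    # undirected toy graph
W_msg, W_upd = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# Shuffle the node labels: permute features and both axes of the adjacency.
perm = rng.permutation(n)
out = gnn_readout(feats, adj, W_msg, W_upd)
out_shuf = gnn_readout(feats[perm], adj[np.ix_(perm, perm)], W_msg, W_upd)
assert np.allclose(out, out_shuf)
```

No special training is needed to respect the symmetry; it is baked into the architecture.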
What Did They Test? (The "Zoo")
To prove their reader works, they didn't just test it on one recipe. They built a "Model Zoo"—a massive collection of thousands of trained KANs that had already learned to do various tasks (like recognizing handwritten digits or reconstructing images).
They asked their new AI to:
- Guess the Class: "Looking only at this KAN's parameters, what kind of data was it trained on?" (e.g., images of cats vs. dogs)
- Predict Accuracy: "Without running the model, how well will this KAN perform?"
- Pruning (Trimming the Fat): "Which parts of this KAN can we cut out to make it smaller without ruining the taste?"
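To make the pruning task concrete, here is a deliberately simple baseline sketch: score each edge of a toy KAN layer by the norm of its function's coefficients and zero out the weakest half. The paper's point is that a learned reader like WS-KAN can pick which edges to cut far better than a crude score like this one.

```python
import numpy as np

# Naive magnitude pruning of a toy KAN layer (baseline illustration only,
# NOT the paper's learned pruning): 4 output nodes, 3 input nodes,
# 4 coefficients per edge function.
rng = np.random.default_rng(3)
C = rng.normal(size=(4, 3, 4))

scores = np.linalg.norm(C, axis=-1)         # one importance score per edge
keep = scores >= np.quantile(scores, 0.5)   # keep the strongest 50% of edges
C_pruned = C * keep[..., None]              # zero out the pruned edges
```

A metanetwork replaces the hand-made `scores` with predictions informed by the whole graph, so it can spot edges that look large but contribute little.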
The Results
The results were like a magic trick.
- Old Readers (MLPs): Got confused, especially when the KANs were shuffled or complex.
- The New Reader (WS-KAN): Crushed it. It was significantly more accurate at guessing the class, predicting performance, and figuring out which parts of the network to cut.
Why Does This Matter?
Think of KANs as the next generation of AI engines. They are more powerful and transparent. But to use them effectively, we need tools to understand, compare, and optimize them without running them a million times.
This paper gives us the first toolkit for that. It's like giving mechanics a special diagnostic scanner that works specifically on these new, high-tech engines, allowing them to tune them up, predict breakdowns, and understand how they work, all just by looking at the engine's blueprint.
In short: They figured out how to read the "smart recipes" (KANs) by turning them into maps, proving that the order of steps doesn't matter, and building a super-reader that understands the whole picture instantly.