NNiT: Width-Agnostic Neural Network Generation with Structurally Aligned Weight Spaces

This paper introduces NNiT, a width-agnostic generative model that uses Graph HyperNetworks to structurally align weight spaces and tokenizes them into patches, enabling a single sequence model to generate functional neural networks with unseen architectures and widths for robotics tasks.

Jiwoo Kim, Swarajh Mehta, Hao-Lun Hsu, Hyunwoo Ryu, Yudong Liu, Miroslav Pajic

Published 2026-03-03

Imagine you are trying to teach a robot to pick up a cube. To do this, the robot needs a "brain" (a neural network) with specific instructions (weights) on how to move its arm.

Usually, if you want a robot brain that is slightly bigger or smaller than the one you trained, you have to start from scratch and train it all over again. It's like trying to fit a suit made for a giant onto a child; you can't just stretch it, you have to sew a whole new one.

NNiT is a new invention that solves this problem. It's like a "universal tailor" that can instantly generate a perfect, working brain for a robot of any size, even sizes it has never seen before.

Here is how it works, broken down into simple concepts:

1. The Problem: The "Shuffled Deck" Mess

Imagine a neural network's weights as a deck of cards. The cards themselves matter for the math, but it doesn't matter which specific card sits in which spot: you can swap two neurons (along with their connections) and the brain behaves exactly the same, as long as the overall pattern of the deck is preserved.

  • The Old Way: When computers train these brains, they shuffle the deck randomly every time. One time, the "Ace" is at the top; the next time, it's at the bottom. Because the order is random and messy, a computer trying to learn from these decks gets confused. It can't tell if a new, wider deck is just a bigger version of an old one or a completely different game.
  • The Result: If you try to make a brain wider (add more neurons), the old computer models break because they were trained on a specific, rigid size.

2. The Secret Sauce: The "Graph HyperNetwork" (GHN)

The authors realized that if they used a special tool called a Graph HyperNetwork (GHN) to create the training data, they could fix the mess.

  • The Analogy: Think of the GHN as a strict architect. Instead of letting the robot brain be built randomly, the architect forces every single brain to be built in the exact same logical order.
  • The Result: Suddenly, the "cards" in the deck are no longer shuffled. They are neatly stacked. If you look at the "Ace" in a small brain, it's in the same spot as the "Ace" in a huge brain. This creates a structured map where the computer can see the patterns clearly, regardless of the brain's size.
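In weight-space terms: because the GHN produces every network's weights as the same deterministic function of each neuron's position in the computation graph, a narrow layer lines up as a sub-block of a wider one. A toy sketch of that alignment idea (the deterministic generator here is invented for illustration and is not the paper's GHN):

```python
import numpy as np

def toy_generator(width_out, width_in):
    """Deterministic stand-in for a GHN: weight (i, j) depends only
    on the neuron positions i and j, never on a random seed."""
    i = np.arange(width_out)[:, None]
    j = np.arange(width_in)[None, :]
    return np.sin(0.7 * i + 1.3 * j)

small = toy_generator(4, 3)  # a layer of a narrow network
large = toy_generator(8, 3)  # the same layer in a wider network

# The narrow layer is exactly the top block of the wide one:
# neuron k plays the same role ("the Ace stays in the same spot")
# at every width, so one model can learn a single shared pattern.
assert np.allclose(large[:4], small)
```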

3. The Magic Trick: "Patch Tokenization"

Now that the data is organized, the authors introduced NNiT (Neural Network Diffusion Transformers).

  • The Old Way: Imagine trying to describe a picture by listing every single pixel in a long line. If you want a bigger picture, you have to rewrite the whole list.
  • The NNiT Way: Instead of listing pixels, NNiT cuts the picture into small square patches (like a mosaic).
    • If you want a wider brain, you don't change the rules. You just add more patches to the mosaic.
    • Because the GHN made sure the patches are always organized the same way, the computer knows exactly how to stitch them together. It's like playing with LEGO bricks: whether you build a small house or a skyscraper, you use the same types of bricks; you just use more of them.
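The mosaic idea can be sketched as chopping a flattened weight matrix into fixed-size patches: a wider layer simply yields more patches of the same shape, so the sequence model's "vocabulary" never changes. Toy code, with the patch size chosen arbitrarily:

```python
import numpy as np

PATCH = 4  # fixed token size; the model only ever sees this shape

def tokenize(W, patch=PATCH):
    """Flatten a weight matrix and cut it into fixed-size patches,
    zero-padding the tail so every token has the same length."""
    flat = W.ravel()
    pad = (-len(flat)) % patch
    flat = np.pad(flat, (0, pad))
    return flat.reshape(-1, patch)

narrow = tokenize(np.ones((4, 3)))  # 12 weights -> 3 tokens
wide   = tokenize(np.ones((8, 3)))  # 24 weights -> 6 tokens

# Same token shape at every width; only the sequence length grows.
assert narrow.shape == (3, 4) and wide.shape == (6, 4)
```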

4. The Result: Zero-Shot Magic

The paper tested this on a robot arm in a simulation (ManiSkill3).

  • The Test: They trained the AI on robots with specific brain sizes. Then, they asked it to build a brain for a robot with a completely new size (one it had never seen).
  • The Outcome:
    • Old AI Models: Failed miserably. They tried to stretch their old knowledge and broke.
  • NNiT: Succeeded, with an over-85% success rate. It looked at the new size, grabbed the right "patches" from its memory, and built a working brain instantly.

Summary

NNiT is like a master chef who doesn't just cook one specific meal.

  1. They organize their ingredients perfectly (using the GHN so everything is in the right place).
  2. They chop everything into standard-sized cubes (Patch Tokenization).
  3. When a customer orders a meal for 2 people or 200 people, the chef just adds more cubes to the pot. The recipe remains the same, but the size changes effortlessly.

This allows robots to adapt instantly to new hardware or tasks without needing hours of retraining.
