FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data

FEAT is a linear-complexity foundation model designed for extremely large structured data. It overcomes the scalability and representation limits of existing approaches with a novel multi-layer dual-axis architecture that combines adaptive-fusion bi-Mamba-2 and convolutional gated linear attention, achieving superior zero-shot performance and up to 40x faster inference across diverse real-world datasets.

Zhenghang Song, Tang Qian, Lu Chen, Yushuai Li, Zhengke Hu, Bingbing Fang, Yumeng Song, Junbo Zhao, Sheng Zhang, Tianyi Li

Published 2026-03-18

Imagine you are a detective trying to solve a massive mystery. You have a room full of millions of suspects (data points), and each suspect has a file with hundreds of clues (features) about them. Your goal is to predict who committed the crime or what they will do next, just by looking at the files of a few known suspects and comparing them to the unknown ones.

This is the world of Structured Data (like spreadsheets, medical records, or financial logs). For a long time, the best detectives (AI models) had a major problem: they were too slow and memory-hungry to look at all the suspects at once.

Here is the story of FEAT, the new detective that solves this problem.

The Problem: The "All-Hands Meeting" Bottleneck

Imagine the old way of doing this. To understand a new suspect, the detective had to call every single other suspect into a room and ask them to compare notes with the new person.

  • The Math: If you have 100 suspects, that's 10,000 comparisons. If you have 100,000 suspects, that's 10 billion comparisons.
  • The Result: The computer runs out of memory (the room gets too crowded) or takes days to finish the meeting. This is called Quadratic Complexity, O(N²). It's like trying to shake hands with everyone in a stadium; it gets impossible as the crowd grows.
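The arithmetic in the bullets above can be sanity-checked in a few lines. This is just counting, not the model itself; the function names are ours:

```python
# Why pairwise comparison blows up quadratically while a
# single-pass scan grows in step with the data.

def pairwise_comparisons(n: int) -> int:
    # Every suspect compares notes with every other suspect: n * n.
    return n * n

def linear_passes(n: int) -> int:
    # One pass down the line: each suspect is visited exactly once.
    return n

for n in (100, 100_000):
    print(f"{n:>7} suspects: {pairwise_comparisons(n):>14,} pairwise vs {linear_passes(n):>7,} linear")
```

At 100,000 rows, the quadratic detective does 10 billion comparisons while the linear one does 100,000 — that gap is the whole motivation for FEAT.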

The Solution: FEAT (The Efficient Detective)

The authors created FEAT, a new kind of foundation model. Think of FEAT as a detective who doesn't need a giant meeting room. Instead, FEAT uses a smart, two-step filing system that works linearly (one step at a time, no matter how big the crowd is).

Here is how FEAT works, using simple analogies:

1. The "Dual-Axis" Strategy (The Two-Step Dance)

Most AI models try to do everything at once. FEAT splits the job into two distinct dances:

  • Step A: The Feature Dance (Looking at the Clues)
    First, FEAT looks at a single suspect's file. It checks how the clues relate to each other (e.g., "If this person is old, they are likely retired"). It does this for every file independently. This is fast and doesn't require comparing files yet.

    • Analogy: Reading a single book to understand its internal plot.
  • Step B: The Sample Dance (Looking at the Crowd)
    Next, FEAT looks at how the files relate to each other. But here is the magic: instead of making everyone talk to everyone, FEAT uses two special tools:

    • Tool 1: The "Bi-Mamba" (The Local Gossip): This tool walks down the line of suspects, listening to the immediate neighbors. It remembers the local trends (e.g., "The last 5 people all had blue eyes"). It's fast and remembers the recent past.
    • Tool 2: The "Conv-GLA" (The Global Librarian): The "Gossip" tool has a short memory. If the line is too long, it forgets the beginning. So, FEAT adds a "Librarian" who keeps a summary book of the entire crowd. This librarian doesn't read every page; they just update a running summary (a "covariance memory") of what the whole group looks like.
    • Analogy: The Gossip tells you what's happening right now, and the Librarian tells you the big picture of the whole room.

Why this is a game-changer: By combining a fast local listener with a summary-keeping librarian, FEAT can process millions of rows without the computer crashing. It scales linearly, O(N), meaning if you double the data, it only takes double the time, not quadruple.
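The "Librarian" idea — a fixed-size running summary instead of an all-pairs meeting — is the core trick behind linear attention. Here is a minimal sketch of that idea (illustrative names, not the paper's code): the "covariance memory" S stays the same size no matter how many rows stream past.

```python
import numpy as np

def librarian_summary(keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Maintain a fixed-size running summary ("covariance memory").

    The summary S is d_k x d_v regardless of how many rows we see,
    which is why memory does not grow with the crowd.
    """
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_k, d_v))
    for k, v in zip(keys, values):
        S += np.outer(k, v)  # the librarian updates the summary book
    return S

rng = np.random.default_rng(0)
N, d = 1000, 8                       # 1000 rows, 8-dim features
K = rng.normal(size=(N, d))
V = rng.normal(size=(N, d))
S = librarian_summary(K, V)
print(S.shape)  # (8, 8) — the summary never grows with N
```

A query can then read the whole crowd's statistics from S in one matrix product, instead of comparing itself against all N rows individually.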

2. Solving the "Permutation" Puzzle

In a spreadsheet, the order of rows doesn't matter. Suspect #100 is the same as Suspect #1 if you swap their places.

  • The Old Problem: Many fast AI models (like Mamba) were designed for text, where order matters (Sentence 1 comes before Sentence 2). If you feed them a spreadsheet, they get confused and think the order matters, leading to bad guesses.
  • FEAT's Fix: FEAT uses a special "Identity Card" for every column. It treats every feature (like "Age" or "Income") as a unique character, regardless of where it sits in the file. This ensures the model understands that the data is a bag of clues, not a story with a beginning and end.
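The "Identity Card" point can be demonstrated with a toy encoder. This is a sketch of the general principle (per-column identity vectors plus order-agnostic pooling), not FEAT's actual embedding code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, d = 5, 3, 4
X = rng.normal(size=(n_rows, n_cols))      # a tiny spreadsheet
col_id = rng.normal(size=(n_cols, d))      # one "identity card" per column

def encode(table: np.ndarray) -> np.ndarray:
    # Each cell value is tagged with its column's identity vector,
    # then rows are summed — a set operation, not a sequence read.
    return (table[..., None] * col_id).sum(axis=0)

perm = rng.permutation(n_rows)
same = np.allclose(encode(X), encode(X[perm]))
print(same)  # True — shuffling the rows cannot change the encoding
```

Because the aggregation is a sum over rows, Suspect #100 and Suspect #1 can trade places freely: the model sees a bag of clues, not a story.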

3. The "Heavy-Tail" Training (Handling the Weirdos)

Real-world data is messy. Most people have average salaries, but a few billionaires skew the average. This is called a heavy-tailed distribution.

  • The Old Problem: If an AI tries to learn from these "billionaire" outliers, it gets confused, panics, and its math breaks (gradient explosion).
  • FEAT's Fix: FEAT was trained on a special mix of synthetic data (fake data made by a smart generator) and real data. The generator was taught to create "billionaires" and "outliers" on purpose. FEAT also uses a "tough love" math rule (Huber Loss) that ignores extreme outliers instead of freaking out about them. This makes FEAT robust enough to handle the messy reality of the real world.
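The "tough love" rule is the standard Huber loss: quadratic for small errors, linear for large ones, so a billionaire-sized outlier contributes a bounded penalty instead of a squared explosion. A minimal version:

```python
import numpy as np

def huber(residual: np.ndarray, delta: float = 1.0) -> np.ndarray:
    # Quadratic near zero, linear beyond delta -> extreme outliers
    # get a bounded gradient instead of blowing up the training math.
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))

errors = np.array([0.5, 1.0, 100.0])  # the last one is a "billionaire" outlier
print(huber(errors))       # outlier costs 99.5
print(0.5 * errors**2)     # under squared loss it would cost 5000.0
```

With squared loss, the single outlier dominates everything; with Huber, it is noticed but not allowed to panic the model.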

The Results: Speed and Smarts

The paper tested FEAT on 11 different real-world datasets (healthcare, finance, etc.).

  • Speed: When the dataset grew to 500,000 rows, old models crashed or took 22 seconds. FEAT handled it in 0.5 seconds. That's a 40x speedup.
  • Accuracy: Despite being faster and simpler, FEAT was just as smart as the slow, heavy models. It could predict outcomes with high accuracy without needing to be retrained for each new task (Zero-Shot Learning).

Summary

FEAT is like upgrading from a detective who needs a massive conference room to hold a meeting with everyone, to a detective who carries a smart notebook.

  1. The notebook has a local gossip section for immediate context.
  2. It has a global librarian section for the big picture.
  3. It knows that order doesn't matter in a spreadsheet.
  4. It isn't scared by weird outliers.

This allows us to analyze massive amounts of structured data (like millions of patient records or financial transactions) instantly, opening the door for AI to make better decisions in healthcare, finance, and science without waiting days for the computer to finish its calculations.
