Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs

This paper demonstrates that while standard decoder-only models underperform encoder-only architectures in cross-modal adaptation to partial differential equations, two novel bidirectionality-mimicking techniques, Parallel Flipping and Sequence Doubling, effectively close this performance gap.

Paloma García-de-Herreros, Philipp Slusallek, Dietrich Klakow, Vagrant Gautam

Published 2026-03-09

Imagine you have a brilliant, world-class translator (a Decoder-Only AI model like GPT-2). This translator has spent its entire life reading and writing books, understanding the flow of sentences, and predicting what word comes next. It's a master of language.

Now, imagine you want to use this translator to solve a physics problem: predicting how heat spreads through a metal rod or how a wave moves through water. These problems are described by complex math equations called Partial Differential Equations (PDEs).

The researchers in this paper tried to take this language expert and ask it to solve physics problems. They tried to "teach" it by showing it examples of physics data, hoping its brain would adapt.

The Problem: The "One-Way Street" Traffic Jam

Here's the catch: The translator was built to read one way only (left to right). It's like a driver who can only look through the windshield but never in the rearview mirror.

  • The Encoder-Only Models (The Old Guard): These are like drivers who can look forward and backward simultaneously. They see the whole picture at once. When the researchers used these models for physics, they worked great.
  • The Decoder-Only Models (The New Stars): These are the popular, massive models everyone uses today. But because they only look forward, they struggle with physics.
    • The Analogy: Imagine trying to describe a wave. If you only see the beginning of the wave, you can't guess how it will crash at the end. If you only see the end, you don't know where it started. The "one-way" translator gets confused, spitting out jagged, messy predictions that look like static on an old TV.

The researchers found that simply making the translator bigger (adding more "brain power" or parameters) didn't help. It was like giving a one-way driver a bigger car; they still couldn't see behind them, so they still crashed.
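The "one-way" restriction is literally a mask on the model's attention: a causal (decoder-style) mask lets each position look only backward, while an encoder's bidirectional mask looks both ways at once. A toy NumPy sketch of the two mask shapes (an illustration only, not code from the paper):

```python
import numpy as np

seq_len = 4

# Causal mask: position i may attend only to positions 0..i (lower triangle).
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Bidirectional mask: every position may attend to every other position.
bidirectional = np.ones((seq_len, seq_len), dtype=bool)

print(causal[0])         # the first position sees only itself
print(causal[-1])        # the last position finally sees everything
print(bidirectional[0])  # an encoder position sees the whole sequence at once
```

The asymmetry in the causal mask is why early positions in a decoder-only model get almost no context, no matter how large the model is.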

The Solution: Two New Tricks to "Fake" Two-Way Vision

Since they couldn't rebuild the translator's brain to look backward (which would take too much time and money), they invented two clever tricks to simulate two-way vision.

Trick 1: The "Mirror Walk" (Parallel Flipping)

Imagine you have a long, winding path you need to walk.

  1. First Run: You walk the path from Start to Finish. You get a good view of the end, but the start is a bit blurry because you haven't seen the whole path yet.
  2. Second Run: You take the exact same path, but flip it around and walk it from Finish to Start. Now the "Start" of your walk (which was the original Finish) is clear, and the "End" is blurry.
  3. The Magic: You take the stretch of your backward walk that covers the first half of the real path (you walk it last, so the whole rest of the journey is already behind you) and combine it with the second half of your first walk (the end of the path, seen with the full beginning as context).
    • Result: You now have a perfect map where every part of the path was seen with the full context of the whole journey. The jagged edges smooth out.
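The "Mirror Walk" can be sketched in a few lines of NumPy. Nothing below is the paper's code: `causal_model` is a fake stand-in (a running mean, so each prediction depends only on what came before it), and the stitching logic is my reading of the trick described above.

```python
import numpy as np

def causal_model(x):
    # Toy "decoder-only model": the prediction at position i is the mean of
    # x[0..i], so it depends only on the past -- like a causal transformer.
    return np.cumsum(x) / np.arange(1, len(x) + 1)

def parallel_flipping(x):
    n = len(x)
    forward = causal_model(x)                # rich context for LATE positions
    backward = causal_model(x[::-1])[::-1]   # un-flipped: rich context for EARLY positions
    # Stitch: early half from the flipped pass, late half from the forward pass,
    # so every position was predicted with a long stretch of context behind it.
    return np.concatenate([backward[: n // 2], forward[n // 2 :]])

print(parallel_flipping(np.arange(8.0)))
```

The point of the stitch is that `backward[i]` for small `i` was computed after the reversed walk had already covered the whole rest of the path, while `forward[i]` for large `i` had the whole beginning behind it.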

Trick 2: The "Double-Book" (Sequence Doubling)

Imagine you are reading a story to understand a character's motivation.

  1. The Problem: If you only read the story once, you might miss the connection between the beginning and the end.
  2. The Trick: You tape two copies of the story together to make one giant, double-length book.
  3. The Reading: You read the whole double-book. When you get to the second copy of the story, you have already read the first copy. Your brain now has the full context of the entire story before you even start analyzing the second half.
  4. The Result: You only use the predictions from that second half. Because your brain had "seen" the whole story twice, the predictions are much smarter and smoother.
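The "Double-Book" trick is even simpler to sketch. Again, this is a toy illustration rather than the paper's implementation: the same running-mean `causal_model` stands in for a decoder-only network, and only the predictions over the second copy are kept.

```python
import numpy as np

def causal_model(x):
    # Toy "decoder-only model": each prediction sees only the past.
    return np.cumsum(x) / np.arange(1, len(x) + 1)

def sequence_doubling(x):
    n = len(x)
    doubled = np.concatenate([x, x])  # tape two copies of the "story" together
    out = causal_model(doubled)
    # Keep only the second copy's predictions: by the time the model reaches
    # them, it has already "read" the full sequence once.
    return out[n:]

print(sequence_doubling(np.arange(4.0)))
```

Notice that even the very first prediction returned has a whole copy of the sequence behind it, which is exactly the context the plain one-pass model lacked.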

The Outcome

By using these two tricks, the researchers turned the "one-way" language models into "two-way" physics solvers.

  • Before: The language models were terrible at physics, making huge errors.
  • After: With the "Mirror Walk" and "Double-Book" tricks, they performed almost as well as the specialized "two-way" models.

Why Does This Matter?

This is a big deal because Decoder-Only models (like the ones powering chatbots today) are the most powerful, widely used, and easiest to scale up. If we can make them work for science without changing their fundamental architecture, scientists can use these massive, pre-trained brains to solve complex problems like earthquake prediction, weather forecasting, and fluid dynamics much faster and cheaper than building new, specialized models from scratch.

In short: They took a one-way driver, gave them a mirror and a double-length map, and suddenly, they could drive a race car just as well as the pros.