Do GPUs Really Need New Tabular File Formats?

This paper demonstrates that GPU scan performance bottlenecks in Parquet files stem from suboptimal, CPU-centric configurations rather than the format itself, and shows that applying GPU-aware settings can boost effective read bandwidth to 125 GB/s without altering the Parquet specification.

Original authors: Jigao Luo, Qi Chen, Carsten Binnig

Published 2026-05-27. Author reviewed.

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine you have a massive library of books (your data) stored in a warehouse (your hard drive). You also have a super-fast robot librarian (your GPU) whose job is to read these books and answer questions.

For years, the library has been organized using a specific filing system called Parquet. This system was set up with a human librarian (the CPU) in mind: it groups books into piles sized for someone who picks them up one at a time.

However, the robot librarian is different. It doesn't just pick up one pile at a time; it has thousands of hands and can grab dozens of piles simultaneously. But because the library is still organized for humans, the robot spends most of its time waiting for the next pile to be handed to it, or it's only using a tiny fraction of its hands. The robot is incredibly fast, but the library organization is holding it back.

The paper asks a simple question: Do we need to invent a brand-new filing system just for robots?

The authors say: No. Instead, we just need to rearrange the existing books using a few simple rules.

Here is how they fixed the problem, using four main "rules of the road":

1. The "More Piles" Rule (Increase Page Count)

  • The Problem: The old system put all the data for a section into one giant, heavy book. The robot tried to read it, but it could only use one hand at a time because the book was too big to split up.
  • The Fix: They chopped those giant books into many smaller, thinner pages. Now, the robot can grab 100 pages at once with its 100 hands.
  • The Result: The robot is no longer waiting around; it's busy using all its hands at once.
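To put rough numbers on the "more piles" idea, here is a minimal sketch. It assumes a simplified model in which each page is handed to exactly one worker; the chunk size, page sizes, and worker count are made-up illustration values, not figures from the paper.

```python
import math


def pages_per_chunk(chunk_bytes: int, page_bytes: int) -> int:
    """Number of pages a column chunk splits into at a target page size."""
    return math.ceil(chunk_bytes / page_bytes)


def busy_fraction(num_pages: int, num_workers: int) -> float:
    """Share of workers that get a page, assuming one page per worker."""
    return min(num_pages, num_workers) / num_workers


# Hypothetical numbers: a 128 MiB column chunk and 1,024 parallel decoders.
CHUNK = 128 * 1024 * 1024
WORKERS = 1024

for page_size in (CHUNK, 1024 * 1024, 64 * 1024):
    pages = pages_per_chunk(CHUNK, page_size)
    share = busy_fraction(pages, WORKERS)
    print(f"page size {page_size:>11,} B -> {pages:>5} pages, {share:.1%} of workers busy")
```

In this toy model, one giant page keeps a single worker busy, while 64 KiB pages give every worker something to do. Real GPU decoders are more complicated, but the direction of the effect is the same.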

2. The "Big Boxes" Rule (Increase Row Group Size)

  • The Problem: The old system sent the robot tiny, postage-stamp-sized packages. Even though the robot is fast, the delivery truck (the connection between the drive and the robot) gets clogged with too many tiny packages.
  • The Fix: They started sending huge, full-sized moving boxes instead of postage stamps.
  • The Result: The delivery truck can now drive at full speed, keeping the robot constantly fed with data.
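One way to see why tiny packages clog the truck is a back-of-the-envelope model in which every read request pays a fixed overhead before any data moves. The sketch below assumes such a model; the 10 GB/s peak and the 100 microsecond overhead are hypothetical values chosen for illustration, not measurements from the paper.

```python
def effective_bandwidth(request_bytes: float, peak_bytes_per_s: float, overhead_s: float) -> float:
    """Bytes per second actually achieved when each request pays a fixed overhead."""
    transfer_s = request_bytes / peak_bytes_per_s
    return request_bytes / (overhead_s + transfer_s)


# Hypothetical link: 10 GB/s peak, 100 microseconds of overhead per request.
PEAK = 10e9
OVERHEAD = 100e-6

for size in (64e3, 1e6, 16e6, 1e9):
    achieved = effective_bandwidth(size, PEAK, OVERHEAD)
    print(f"{size / 1e6:>8.2f} MB per request -> {achieved / PEAK:.1%} of peak")
```

Under these assumptions a 1 MB request reaches only half of the peak rate, because the fixed overhead costs as much time as the transfer itself. Larger requests spread that overhead over more bytes, which is the intuition behind bigger row groups.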

3. The "Smart Packing" Rule (Encoding Flexibility)

  • The Problem: The old system packed the books using a generic, one-size-fits-all method. Sometimes this made the books smaller, but often it didn't help much.
  • The Fix: They looked at each book individually and chose the best way to shrink it. If a book had lots of repeated words, they used a special code to make it tiny. If a book was already short, they left it alone.
  • The Result: The books take up less space on the shelf, so the delivery truck has less weight to carry, making the whole process faster.
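The "look at each book individually" step can be sketched as a size comparison per column. The heuristic below is a hypothetical illustration, not the selection rule from the paper or the logic of any Parquet writer: it estimates the size of a plain layout and of a dictionary layout (a table of distinct values plus small integer indexes) and keeps whichever is smaller.

```python
import math


def plain_size(values, width: int) -> int:
    """Bytes needed to store every value at a fixed width."""
    return len(values) * width


def dictionary_size(values, width: int) -> int:
    """Bytes for a table of distinct values plus one bit-packed index per value."""
    distinct = len(set(values))
    bits = math.ceil(math.log2(distinct)) if distinct > 1 else 1
    index_bytes = math.ceil(len(values) * bits / 8)
    return distinct * width + index_bytes


def choose_encoding(values, width: int = 8) -> str:
    """Pick the layout with the smaller estimated size (ties go to plain)."""
    if dictionary_size(values, width) < plain_size(values, width):
        return "dictionary"
    return "plain"


repeated = [1, 2, 3] * 1000   # many repeats: 3 distinct values
unique = list(range(3000))    # no repeats: every value is distinct

print(choose_encoding(repeated), dictionary_size(repeated, 8), plain_size(repeated, 8))
print(choose_encoding(unique), dictionary_size(unique, 8), plain_size(unique, 8))
```

The column full of repeated values shrinks dramatically with a dictionary, while the column of unique values would actually grow, so it is left in plain form.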

4. The "Don't Wrap It" Rule (No Unnecessary Compression)

  • The Problem: Sometimes, the old system wrapped books in heavy bubble wrap (compression) even when the books were already small. The robot then had to spend time unwrapping them, which wasted energy.
  • The Fix: They decided: "If the bubble wrap doesn't make the package significantly smaller, don't use it."
  • The Result: The robot saves time by skipping the unwrapping step for books that didn't need it.
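The "don't wrap it" decision is easy to sketch as a threshold check. The example below uses Python's built-in zlib as a stand-in codec and a hypothetical 10% minimum saving; both are assumptions for illustration, and the paper's actual codec and cutoff may differ.

```python
import os
import zlib


def maybe_compress(data: bytes, min_saving: float = 0.10):
    """Keep the compressed form only if it saves at least `min_saving` of the size."""
    compressed = zlib.compress(data)
    if len(compressed) <= len(data) * (1 - min_saving):
        return "zlib", compressed
    return "none", data


repetitive = b"abc" * 10_000       # highly compressible
random_like = os.urandom(30_000)   # effectively incompressible

for name, payload in (("repetitive", repetitive), ("random-like", random_like)):
    codec, stored = maybe_compress(payload)
    print(f"{name}: {len(payload)} B -> {len(stored)} B stored with codec={codec}")
```

Data that barely shrinks is stored as-is, so the reader never pays the cost of decompressing it.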

The Grand Finale: The Robot vs. The Human

The authors tested this new arrangement.

  • The Old Way: The robot was slow, barely using its superpowers.
  • The New Way: By just reorganizing the existing Parquet files (without inventing a new format), they raised the robot's effective reading speed to 125 GB/s.

They also showed that when the robot works in sync with the delivery truck (overlapping reading and processing), it becomes even more efficient. In fact, this reorganized robot was so fast that it nearly reached the theoretical speed limit of the delivery truck itself.
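The benefit of overlapping can be illustrated with a simple two-stage timing model: the truck delivers chunk after chunk, and the robot processes each chunk as soon as it has arrived and the previous one is finished. The per-chunk times below are hypothetical, and the model is a sketch rather than the authors' pipeline.

```python
def sequential_time(io_times, compute_times) -> float:
    """Total time when every read finishes before its processing starts, one chunk at a time."""
    return sum(io_times) + sum(compute_times)


def overlapped_time(io_times, compute_times) -> float:
    """Total time when reading chunk i+1 overlaps with processing chunk i."""
    io_done = 0.0
    compute_done = 0.0
    for io, compute in zip(io_times, compute_times):
        io_done += io  # reads run back to back
        # Processing starts once the chunk has arrived and the robot is free.
        compute_done = max(compute_done, io_done) + compute
    return compute_done


# Hypothetical workload: 4 chunks, 1.0 s to read and 0.5 s to process each.
io = [1.0] * 4
compute = [0.5] * 4

print("sequential:", sequential_time(io, compute))
print("overlapped:", overlapped_time(io, compute))
```

When reading is the slower stage, as in this example, the overlapped total is close to the pure reading time, which matches the article's point that the robot ends up limited mainly by the delivery truck.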

The Bottom Line

The paper concludes that we don't need to burn down the library and build a new one from scratch. We just need to re-shelve the books with a few smart adjustments.

By tweaking how the data is packed and grouped, the existing Parquet format can already run at lightning speed on modern GPUs. This saves everyone the trouble of learning a new system and keeps all the old software compatible, while still getting the massive speed boost we wanted.
