Pavement Missing Condition Data Imputation through Collective Learning-Based Graph Neural Networks

This paper proposes a collective learning-based Graph Convolutional Network model that effectively imputes missing pavement condition data by integrating features from adjacent road sections and capturing dependencies between observed conditions, demonstrating promising results in a Texas Department of Transportation case study.

Ke Yu, Lu Gao

Published 2026-03-10
📖 5 min read🧠 Deep dive

Imagine a massive, sprawling city of roads. Every few years, inspectors drive over these roads to give them a "health score" (like a report card from 1 to 100). This score tells the city if the road is "Very Good" or "Very Poor" and helps them decide where to spend money on repairs.

But here's the problem: The data is messy.

Sometimes the sensors break, sometimes the inspectors miss a section, and sometimes the schedule gets messed up. This leaves big holes in the data. It's like trying to read a book where someone has torn out random pages. If you try to guess the story based only on the pages you have, you might get it wrong.

The authors of this paper, Ke Yu and Lu Gao, wanted to fix these missing pages using a clever new trick called Collective Learning Graph Neural Networks (CLGNN).

Here is how they did it, explained simply:

1. The Old Way: Ignoring the Neighbors

Traditionally, when data is missing, engineers used two main methods:

  • The "Delete" Method: If a road section is missing data, they just threw that whole section away. This is like deleting a chapter from a book because a few words are missing. You lose a lot of information.
  • The "Simple Guess" Method: They looked at the road's history (how it was doing last year) or its traffic volume to guess the current score. This is like guessing a character's mood in a story just by looking at what they wore yesterday, ignoring who they are talking to right now.

The Flaw: These methods treat every road section as an island. They forget that roads are connected. A pothole on one street often means the street next to it is also suffering.

2. The New Way: The "Gossip Network"

The authors realized that roads are connected like a giant web. If you want to know the health of a specific road, you shouldn't just look at its own history; you should ask its neighbors.

They built a model that acts like a super-smart gossip network:

  • The Graph: They turned the road map into a digital "friendship graph." Every road section is a person, and the lines connecting them are friendships.
  • The Collective Learning: The model doesn't just look at one person; it looks at the whole group. If "Road A" is missing its health score, the model asks: "What are Road A's neighbors doing? Are they all 'Very Poor'?" If the neighbors are all struggling, it's highly likely Road A is struggling too, even if we don't have the data yet.

3. How the "Magic" Works (The Analogy)

Imagine you are at a party, and you want to guess how a specific guest (let's call him "Road X") is feeling, but you can't see his face.

  • Old Method: You look at a photo of Road X from last year. You guess, "He looked happy then, so he's probably happy now."
  • CLGNN Method: You look at the three people standing right next to him. They are all crying and holding tissues. You immediately realize, "Oh, Road X is probably crying too, even though I can't see him."

The model uses this "neighborhood vibe" to fill in the missing data. It learns that if a group of connected roads are deteriorating, the missing one in the middle is likely deteriorating too.

4. The Test Drive

To see if this worked, they tested it on real roads in Austin, Texas.

  • They took a bunch of real data and secretly "hid" 30% of it (pretending it was missing).
  • They asked their new AI model to guess the hidden scores.
  • They compared the AI's guesses to the actual answers.

The Result: The new model was the clear winner. It was about 5% more accurate than the best traditional methods and much better than standard deep learning models. It successfully used the "neighborhood gossip" to fill in the blanks.

Why This Matters

In the real world, missing data leads to bad decisions. If a city thinks a road is "Good" because of a bad guess, they might delay repairs, and the road could collapse. If they think it's "Bad" when it's actually "Good," they waste money fixing a road that didn't need it.

By using this Collective Learning approach, cities can:

  1. Fill in the gaps without throwing away data.
  2. Make smarter decisions about where to spend tax dollars.
  3. Keep roads safer by getting a more accurate picture of the whole network, not just the parts they can see.

In short: Instead of treating every road as a lonely island, this paper teaches computers to look at the whole neighborhood to figure out what's really going on. It's a smarter, more connected way to keep our roads in shape.