The Value of Graph-based Encoding in NBA Salary Prediction

This paper demonstrates that integrating graph-based embeddings of on-court and off-court player data into tabular datasets significantly improves the accuracy of supervised machine learning models for predicting NBA player salaries, particularly for veterans and high-earning outliers where traditional methods fail.

Junhao Su, David Grimsman, Christopher Archibald

Published 2026-03-09

Imagine you are trying to guess how much a professional basketball player will be paid next year.

Traditionally, teams and analysts have used a "stat sheet" approach. They look at a player's stats from last year (points scored, rebounds, etc.), their age, and where they were drafted. It's like grading a student based solely on their last test score. This works great for new kids in school (rookies), because their future is mostly determined by how well they did on the entrance exam (the draft).

But for older, experienced players (veterans), this "stat sheet" method often fails. Why? Because a player's value isn't just about what they did on the court last week. It's about who they know, their reputation, their agent's connections, and how long they've been a trusted part of the league's "family."

This paper asks a simple question: Can we teach a computer to understand "who a player knows" to predict their salary better?

Here is the breakdown of their experiment, using some everyday analogies:

1. The Problem: The "Lonely Stat Sheet"

The authors say that standard computer models treat every player like an island. They see "LeBron James" and just look at his stats. They don't see that LeBron is connected to a powerful agent, a specific team culture, and a network of other stars.

Their solution was to build a Knowledge Graph. Think of this not as a spreadsheet, but as a giant social network map.

  • The Nodes (Dots): Players, Teams, Agents, Awards, Injuries.
  • The Edges (Lines): "Played for," "Signed by," "Won Award with."

They wanted to see if feeding this "social map" into the computer helps it guess salaries better than just looking at stats.
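The graph idea above can be sketched in a few lines using networkx. The node and edge types mirror the lists above, but the specific names and the schema details are illustrative, not the paper's actual data.

```python
# A minimal sketch of the knowledge-graph idea using networkx.
# All node and edge names here are illustrative, not the paper's schema.
import networkx as nx

G = nx.Graph()

# Nodes (dots): players, teams, agents, awards, typed via a "kind" attribute
G.add_node("LeBron James", kind="player")
G.add_node("Lakers", kind="team")
G.add_node("Rich Paul", kind="agent")
G.add_node("MVP", kind="award")

# Edges (lines): the relationships a flat stat sheet never sees
G.add_edge("LeBron James", "Lakers", relation="played_for")
G.add_edge("LeBron James", "Rich Paul", relation="signed_by")
G.add_edge("LeBron James", "MVP", relation="won_award")

# A spreadsheet sees one row per player; the graph sees the neighborhood.
print(sorted(G.neighbors("LeBron James")))  # → ['Lakers', 'MVP', 'Rich Paul']
```

In the paper's pipeline, a graph like this would be turned into embeddings and appended to the tabular stat-sheet features before training.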

2. The Experiment: Two Types of "Students"

They tested their new "Social Map" method on two very different groups of players, and the results diverged sharply:

Group A: The Rookies (The "Structural Vacuum")

  • The Situation: A rookie just got drafted. They have no history, no agent network, and no teammates yet. They are in a "social vacuum."
  • The Result: The "Social Map" method failed miserably.
  • The Analogy: Imagine trying to guess a new student's future grades by looking at their "friend group." If they have no friends yet, the computer gets confused and starts guessing randomly.
  • The Lesson: For rookies, stick to the basics. Their salary is a simple math problem based on their draft pick and age. Adding the "social map" just adds noise.
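The "simple math problem" for rookies can be sketched as an ordinary least-squares fit on just draft pick and age. The data and coefficients below are synthetic, purely for illustration of the baseline, not the paper's fitted model.

```python
# A minimal sketch of the rookie baseline: salary from draft pick and age only.
# The data below is synthetic and purely illustrative.
import numpy as np

# Synthetic rookies: (draft_pick, age) -> first-contract salary in $M,
# where earlier picks earn more under a rookie pay scale.
X = np.array([[1, 19], [5, 20], [15, 21], [30, 22]], dtype=float)
y = np.array([12.0, 8.0, 4.0, 2.0])

# Ordinary least squares with an intercept column
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(pick, age):
    return coef[0] * pick + coef[1] * age + coef[2]

# A top-3 pick is predicted well above a late first-rounder
print(predict(3, 19) > predict(25, 22))  # → True
```

No graph features appear here at all, which is the point: for rookies, adding them only injects noise.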

Group B: The Veterans (The "Social Capital")

  • The Situation: An older player whose stats might be slipping (maybe they got injured or are slowing down), but they are still getting paid a fortune.
  • The Result: The "Social Map" method saved the day.
  • The Analogy: Imagine a veteran player is like a retired CEO. Their recent performance might be shaky, but their value is high because of their reputation and network. The standard "stat sheet" model says, "You played poorly, so you should get paid less." The "Social Map" model says, "Wait, this guy is a franchise legend with a great agent and a loyal fanbase; he's worth more than his last game suggests."
  • The Lesson: For veterans, the graph model acts as a safety net. It catches the players who are being undervalued by simple stats because it understands their "social capital."

3. The "Oracle" Test: Did they cheat?

A major worry in these studies is "cheating" (data leakage). Did the computer just memorize the team names or agent names?

  • The Test: They ran a version where the computer was blind to the specific names of teams and agents. It only saw the structure of the connections.
  • The Result: Even without knowing the names, the graph model could guess the salary almost as well as a model that did know the names.
  • The Takeaway: The "shape" of the network itself holds the secret. The computer learned that "being connected to this type of cluster of players" means "high salary," without needing to know the specific names.
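One way to picture this name-blind ablation: strip every label and keep only structural descriptors of each node, such as degree, clustering, and PageRank. The toy graph and the choice of features below are illustrative guesses at the setup, not the paper's exact configuration.

```python
# A minimal sketch of the "blind" ablation: keep only the *shape* of the graph.
# Node labels below exist only for readability; the model would see just
# name-free structural statistics.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("veteran", "team_a"), ("veteran", "agent_x"),
    ("veteran", "award_1"), ("veteran", "star_teammate"),
    ("rookie", "team_b"),  # a rookie's neighborhood is nearly empty
])

def structural_features(g, node):
    # Identity-free descriptors: how connected and how central a node is.
    return {
        "degree": g.degree(node),
        "clustering": nx.clustering(g, node),
        "pagerank": nx.pagerank(g)[node],
    }

vet = structural_features(G, "veteran")
rook = structural_features(G, "rookie")
print(vet["degree"], rook["degree"])  # → 4 1
```

Even with names erased, the veteran's richer neighborhood leaves a structural fingerprint a model can learn from.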

4. The "Too Much Information" Trap

The authors tried adding everything to the graph: every injury, every award, every game log.

  • The Result: It got worse.
  • The Analogy: It's like trying to find a needle in a haystack by adding more hay. The most important signal was the specific connections (who you play for, who your agent is), not the sheer volume of every single event in history. Quality over quantity.

Summary: The "Maturity" Rule

The paper concludes with a simple rule for predicting athlete salaries:

  1. For New Kids (Rookies): Use a simple calculator. Look at the draft pick and age. Don't overcomplicate it.
  2. For Veterans: Use the "Social Map." Look at their network, their reputation, and their history. The simple stats will lie to you; the social connections tell the truth.
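The two-part rule above can be written as a simple router that picks a model by career stage. The experience threshold and the stand-in models here are hypothetical placeholders, not values from the paper.

```python
# A toy sketch of the "maturity rule": route each player to a different model
# by experience. The threshold and stand-in models are illustrative only.

def predict_salary(player, rookie_model, veteran_model, rookie_years=3):
    """Simple tabular model for rookies, graph-based model for veterans."""
    if player["years_in_league"] <= rookie_years:
        return rookie_model(player)      # draft pick + age only
    return veteran_model(player)         # graph embeddings + stats

# Hypothetical stand-in models for demonstration (salaries in $M)
rookie_model = lambda p: 15.0 - 0.4 * p["draft_pick"]
veteran_model = lambda p: 5.0 + 2.0 * p["network_score"]

rookie = {"years_in_league": 1, "draft_pick": 2, "network_score": 0.1}
veteran = {"years_in_league": 12, "draft_pick": 25, "network_score": 8.0}

print(predict_salary(rookie, rookie_model, veteran_model))   # → 14.2
print(predict_salary(veteran, rookie_model, veteran_model))  # → 21.0
```

The veteran's draft pick never enters their prediction, just as the rookie's network never enters theirs.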

In short: You can't predict a veteran's salary just by looking at their last game. You have to look at their whole life in the league. The computer finally learned how to do that.