Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows

This paper introduces Agentic Predictor, a lightweight model that uses multi-view encoding and cross-domain unsupervised pretraining to predict the performance of LLM-based agentic workflows, reducing the computational cost of finding good agent configurations by trial and error.

Patara Trirat, Wonyong Jeong, Sung Ju Hwang

Published 2026-03-02

Imagine you are trying to build the perfect team of robots to solve a complex puzzle, like writing a computer program, solving a math problem, or answering a tricky question.

In the world of Artificial Intelligence, these robots are called LLM-based Agents. They don't work alone; they talk to each other, use tools, and follow specific instructions (prompts) to get the job done.

The problem is: There are too many ways to build these teams.

You could have a team of 3 robots or 10. They could talk in a circle, a line, or a web. They could use different "languages" (prompts) or different tools. Trying to find the best team by building each one, running it, and seeing whether it fails is like trying to find a needle in a haystack by building a million haystacks first. It's slow, expensive, and frustrating.

This paper introduces a new tool called Agentic Predictor. Think of it as a "Crystal Ball" or a "Talent Scout" for robot teams.

The Problem: The "Trial-and-Error" Trap

Currently, to see if a robot team will work, you have to actually let it run the task.

  • Analogy: Imagine you are a chef trying to find the perfect recipe for a new soup. Instead of tasting a spoonful, you have to cook the entire pot of soup, serve it to a panel of judges, and wait for their score. If it tastes bad, you throw it away and start over.
  • The Cost: Doing this thousands of times costs a fortune in time and money (computing power).

The Solution: The "Crystal Ball" (Agentic Predictor)

The authors built a lightweight AI that can look at a robot team's blueprint and predict if it will succeed before you even run it.

  • Analogy: Instead of cooking the whole soup, the Crystal Ball looks at the recipe card, the list of ingredients, and the chef's notes. It says, "I'm 90% sure this recipe will be delicious," or "This one is going to be a disaster." You only cook the ones the Crystal Ball says are promising.
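The "only cook the promising ones" idea can be sketched in a few lines. This is a toy illustration, not the authors' code: screen_candidates, true_score, and noisy_predict are hypothetical names, and the "team" is reduced to a single number (how many robots it has). The cheap predictor ranks every candidate, and the expensive run happens only for a small shortlist.

```python
import random

def screen_candidates(candidates, predict, evaluate, budget):
    """Rank all candidates with the cheap predictor, then run the
    expensive evaluation only on the `budget` best-ranked ones."""
    ranked = sorted(candidates, key=predict, reverse=True)
    shortlist = ranked[:budget]
    results = [(evaluate(c), c) for c in shortlist]
    return max(results, key=lambda r: r[0])  # (best score, best candidate)

# Hypothetical toy setup: a candidate is just a team size,
# and the true score peaks at a team of 5.
rng = random.Random(0)

def true_score(n):
    return 1.0 - abs(n - 5) / 10

def noisy_predict(n):
    # Stand-in for the predictor: the true score plus a small error.
    return true_score(n) + rng.uniform(-0.04, 0.04)

calls = []  # records every expensive run

def evaluate(n):
    calls.append(n)
    return true_score(n)

best_score, best_team = screen_candidates(range(1, 11), noisy_predict, evaluate, budget=3)
print(best_team, best_score, len(calls))
```

Ten candidate teams are scored, but only three are actually run. How well this works in practice depends on how closely the predictor's ranking matches reality.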

How Does the Crystal Ball Work? (The "Multi-View" Magic)

The secret sauce is that the Crystal Ball doesn't just look at one thing. It looks at the robot team from three different angles (views) at the same time:

  1. The Map (Graph View): It looks at the structure. Who talks to whom? Is the team organized like a pyramid or a chaotic circle?
  2. The Manual (Code View): It reads the code that defines the workflow, including its logic and the tools the robots call. Are they using the right tools for the job?
  3. The Personality (Prompt View): It reads the prompts that give each robot its role and tone. Are they being told to be creative, strict, or helpful?
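To make the three views concrete, here is a minimal sketch that turns one workflow into a single list of numbers. It rests on loud assumptions: the workflow dictionary layout, the function names, and the hand-counted features are all invented for illustration. This summary does not describe the actual encoders, which would be learned models rather than simple counts. What the sketch does show is the shape of the idea: each view is encoded separately, and the results are combined.

```python
def graph_view(edges):
    """Structure: how many robots, how many links, how densely connected."""
    nodes = {name for edge in edges for name in edge}
    n = len(nodes)
    density = len(edges) / (n * (n - 1)) if n > 1 else 0.0
    return [float(n), float(len(edges)), density]

def code_view(code):
    """Code: non-empty lines and a count of (hypothetical) tool( calls."""
    lines = [line for line in code.splitlines() if line.strip()]
    return [float(len(lines)), float(code.count("tool("))]

def prompt_view(prompts):
    """Prompts: how many there are and how many words they contain."""
    words = [w for p in prompts for w in p.split()]
    return [float(len(prompts)), float(len(words))]

def encode(workflow):
    """Combine the three views by concatenating their feature lists."""
    return (graph_view(workflow["edges"])
            + code_view(workflow["code"])
            + prompt_view(workflow["prompts"]))

# Hypothetical three-robot workflow.
workflow = {
    "edges": [("planner", "coder"), ("coder", "tester")],
    "code": "plan = ask(planner)\nresult = tool('python', plan)\n",
    "prompts": ["You are a careful planner.", "Write clean code."],
}
vector = encode(workflow)
print(vector)
```

A predictor would take a vector like this as input. Concatenation is the simplest way to combine views; it is used here only because it is easy to read.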

Analogy: Imagine hiring a new employee.

  • A bad manager only looks at their Resume (Code).
  • A better manager looks at their Resume and their Job History (Graph).
  • The best manager (our Crystal Ball) looks at the Resume, the Job History, AND listens to how they talk in an interview (Prompt). By combining all three, they can predict if the person will be a star employee much better than anyone else.

The "Training" Trick: Learning Without a Teacher

Usually, to teach a Crystal Ball, you need thousands of examples of "Good Team" vs. "Bad Team." But getting those examples is expensive (because you have to run the tests!).

The authors used a clever trick called Cross-Domain Unsupervised Pretraining.

  • Analogy: Imagine you want to teach a student how to be a great detective. You don't have enough solved crime cases (labeled data) to teach them. So, you first let them read millions of mystery novels and police reports (unlabeled data) from different genres. They learn the patterns of how detectives think, how clues connect, and how stories unfold.
  • Once they understand the patterns, you only need to show them a few actual crime cases to teach them how to solve your specific mystery.
  • This allows the Crystal Ball to learn the "language" of robot teams without needing to run expensive tests first.
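The two-stage recipe above (learn from unlabeled data first, then use a few labeled examples) can be sketched as follows. Everything here is a deliberately simple stand-in, not the method from the paper: stage 1 only learns how each feature is typically scaled, and stage 2 is a nearest-neighbour lookup. The function names and the numbers are hypothetical.

```python
import statistics

def pretrain(unlabeled):
    """Stage 1 (no labels needed): learn each feature's mean and spread."""
    columns = list(zip(*unlabeled))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid divide-by-zero
    return means, stds

def embed(x, stats):
    """Rescale a raw feature list using what stage 1 learned."""
    means, stds = stats
    return [(v - m) / s for v, m, s in zip(x, means, stds)]

def fine_tune(labeled, stats):
    """Stage 2 (few labels): remember the embedded examples and
    predict with the score of the nearest one."""
    memory = [(embed(x, stats), y) for x, y in labeled]

    def predict(x):
        z = embed(x, stats)
        def squared_distance(item):
            return sum((a - b) ** 2 for a, b in zip(z, item[0]))
        return min(memory, key=squared_distance)[1]

    return predict

# Hypothetical features: [number of robots, total prompt words].
unlabeled = [[2, 100], [4, 300], [6, 500], [8, 700]]  # never run, so no scores
labeled = [([2, 100], 0.4), ([8, 700], 0.9)]          # only two were actually run

stats = pretrain(unlabeled)
predict = fine_tune(labeled, stats)
print(predict([7, 650]), predict([3, 150]))
```

The split is the point: the first stage needs no test runs at all, so the expensive labels are spent only on the second stage.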

Why Does This Matter?

  1. Speed: It's nearly instant. The Crystal Ball predicts the result in milliseconds, while running the actual robot team takes seconds or minutes.
  2. Money: It saves a huge amount of money. You stop wasting resources on bad ideas.
  3. Quality: Because you can test more ideas quickly, you end up finding better robot teams than before.

The Bottom Line

This paper is about stopping the waste. Instead of blindly guessing which robot team configuration will work and paying a high price to find out, we now have a smart, fast, and cheap "Talent Scout" that can tell us which teams are winners before we even hire them. It turns a slow, expensive guessing game into a fast, efficient science.
