SEval-NAS: A Search-Agnostic Evaluation for Neural Architecture Search

Imagine you are a chef trying to invent the perfect new recipe. In the world of Artificial Intelligence (AI), this "recipe" is called a Neural Network Architecture. It's the blueprint for how a computer brain is built.

For a long time, finding the best blueprint was like trying to bake a cake by baking thousands of different versions, tasting each one, and throwing away the bad ones. This took forever and used up a massive amount of electricity (computing power).

The Problem: The "Hardcoded" Menu
The paper explains that existing methods for finding these AI blueprints are rigid. They are like a restaurant kitchen where the chef can only check if a dish tastes good (Accuracy). If the owner suddenly says, "Hey, we also need to know how long it takes to cook (Latency) and how much fridge space it needs (Memory)," the whole kitchen has to be torn down and rebuilt. The tools are "hardcoded" to only check taste, making it hard to adapt to new needs, especially for small devices like smartphones or smartwatches.

The Solution: SEval-NAS (The "Translator" and "Oracle")
The authors propose a new tool called SEval-NAS. Think of it as a super-smart translator and a fortune teller combined.

Here is how it works, using a simple analogy:

1. Turning Blueprints into Stories (Network-to-String)

Imagine you have a complex LEGO castle. Instead of looking at the physical bricks, you take a photo of the instructions and turn them into a long sentence of words.

The Paper's Method: SEval-NAS looks at the AI's internal "wiring diagram" (the autograd graph) and translates it into a text string.
The Analogy: It's like taking a complex machine and writing a story about how its gears fit together. "First, a red gear turns, then a blue spring pushes, then a yellow lever lifts..."

2. The Translator (The Encoder)

Now, you have a story (the string), but a computer needs to understand the meaning of that story to guess how the machine will perform.

The Paper's Method: It uses a sophisticated AI model (based on T5, a type of language model) to read that "story" and turn it into a mathematical vector (a list of numbers).
The Analogy: This is like a translator who reads your story about the LEGO castle and gives you a "vibe score" or a "complexity rating" based on the words used. It understands that a story with "many heavy gears" implies a heavy machine.

3. The Oracle (The Predictor)

Finally, the system predicts the outcome without ever building the machine.

The Paper's Method: It takes those numbers and predicts specific metrics: How accurate will it be? How fast will it run? How much memory will it eat?
The Analogy: This is the Oracle. You hand it the "story" of the LEGO castle, and it says, "Based on this story, this castle will take 3 seconds to build and will fit in a shoebox." You don't need to actually build the castle to know this!

Why is this a Big Deal?

1. It's "Search-Agnostic" (Plug-and-Play)
Most previous tools were like a custom-built car engine; you couldn't put them in a different car. SEval-NAS is like a universal remote control. You can plug it into any existing AI search system (like the one called "FreeREA" mentioned in the paper) without having to rebuild the whole system. It just adds a new button to the remote.

2. It's Great at Guessing Hardware Costs
The researchers tested this on two huge databases of AI blueprints.

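The Result: The Oracle was surprisingly good at guessing Latency (speed) and Memory (size). Its predicted rankings closely matched the measured ones; on one of the two databases, the latency rank correlation ranged from 0.60 to 0.97 across six small devices.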
The Catch: It was okay at guessing Accuracy (how smart the AI is), but not amazing.
The Metaphor: It's easy to guess how heavy a car is and how much fuel it burns just by looking at the engine size (Hardware costs), but it's harder to guess whether it will win on a race track (Accuracy) just by looking at the engine. The paper admits this limitation but highlights that for "Hardware-Aware NAS" (building AI for phones, drones, etc.), reliable estimates of latency and memory use are exactly what the search needs.

3. Real-World Application
The team took a standard AI search tool and added SEval-NAS to it. They told the tool: "Find me an AI that is smart, but also fits on a Raspberry Pi (a tiny computer)."

Without SEval-NAS: The tool would have to build and test thousands of models to see which ones fit.
With SEval-NAS: The tool generates a blueprint, SEval-NAS instantly "reads" the story and says, "Nope, that's too big," or "Yes, that fits!" The search became much smarter and faster at finding hardware-friendly designs.

The Bottom Line

SEval-NAS is a flexible, plug-and-play tool that turns complex AI blueprints into simple text stories. It then uses a smart "Oracle" to predict how fast and how big those AI models will be, without needing to actually build them. This allows developers to easily design AI that fits perfectly onto small, real-world devices like smartphones and smartwatches, saving time, money, and energy.

1. Problem Statement

Neural Architecture Search (NAS) automates the design of neural networks but faces two critical limitations in its evaluation phase:

High Computational Cost: Traditional NAS trains each candidate architecture to convergence and then tests it to estimate performance, leading to massive search costs (e.g., thousands of GPU hours).
Rigidity and Hardcoding: Existing evaluation procedures are often hardcoded into specific search algorithms. This makes it difficult to introduce new evaluation metrics, particularly hardware-aware metrics (such as latency and memory usage) required for edge devices. Most hardware-aware NAS methods are designed for specific, single-objective constraints, lacking the flexibility to adapt to diverse or multiple hardware constraints without redesigning the entire search algorithm.

2. Methodology: SEval-NAS

The authors propose SEval-NAS, a search-agnostic evaluation mechanism. It decouples the evaluation process from the search algorithm, allowing it to be plugged into existing NAS pipelines with minimal modification. The framework consists of three main stages:

A. Network-to-String Conversion

Mechanism: The system traverses the autograd graph of any neural network (NN) in a breadth-first manner.
Output: It extracts structural details (operation types, parameters, connectivity) and converts them into a standardized textual string representation.
Tokenization: The string is tokenized into a sequence of tokens (e.g., |1_Convolution|2_Relu|3...), creating a universal representation compatible with Natural Language Processing (NLP) models.
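
The conversion step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `TOY_GRAPH` is a hypothetical dictionary standing in for a real autograd graph, the traversal direction (from the first operation forward) is an assumption, and only operation types are encoded, whereas the paper's strings also carry parameters and connectivity.

```python
from collections import deque

# Hypothetical toy graph standing in for an autograd graph:
# node id -> (operation name, ids of the nodes that consume its output).
TOY_GRAPH = {
    "conv1": ("Convolution", ["relu1"]),
    "relu1": ("Relu", ["conv2", "add"]),
    "conv2": ("Convolution", ["add"]),
    "add": ("Add", []),
}


def network_to_string(graph, root):
    """Visit nodes breadth-first from `root` and emit |<index>_<Op> tokens."""
    seen = {root}
    queue = deque([root])
    tokens = []
    while queue:
        node = queue.popleft()
        op_name, successors = graph[node]
        # Number tokens in visiting order, starting at 1.
        tokens.append(f"{len(tokens) + 1}_{op_name}")
        for nxt in successors:
            if nxt not in seen:  # a node reachable by two paths is emitted once
                seen.add(nxt)
                queue.append(nxt)
    return "|" + "|".join(tokens)


arch_string = network_to_string(TOY_GRAPH, "conv1")
print(arch_string)
```

The `seen` set matters for networks with skip connections: `add` is reachable from both `relu1` and `conv2`, but it should appear in the string only once.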

B. Evaluator Network (Encoder-Predictor)

The evaluator is a deep learning model designed to map the tokenized architecture string to performance metrics.

Encoder: Utilizes a Transformer-based architecture (specifically T5 models) to process the token sequence. It generates high-dimensional vector embeddings that capture the structural and contextual dependencies of the network.
Predictor: A fully connected neural network (regression head) that maps the embeddings to the target metrics.
Flexibility: The predictor can be configured for single-objective (e.g., latency only) or multi-objective (e.g., accuracy + latency + memory) prediction by adjusting the number of output neurons ( $k$ ).
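
To make the role of $k$ concrete, here is a small sketch of an encoder-predictor pair. Everything in it is an illustrative assumption: `toy_encode` is a hypothetical stand-in for the T5 encoder (it averages deterministic pseudo-random token vectors), and `LinearHead` is a single untrained fully connected layer, so the numbers it returns are meaningless. The point is only that switching between single-objective and multi-objective prediction changes the number of outputs and nothing else.

```python
import random


def toy_encode(arch_string, dim=8):
    """Hypothetical stand-in for the T5 encoder: mean of per-token vectors."""
    tokens = [tok for tok in arch_string.split("|") if tok]
    vectors = []
    for tok in tokens:
        rng = random.Random(tok)  # seeding with the token keeps this deterministic
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    # Average each dimension across tokens to get one fixed-size embedding.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


class LinearHead:
    """One fully connected layer mapping a dim-vector to k metric predictions."""

    def __init__(self, dim, k, seed=0):
        rng = random.Random(seed)
        # Untrained weights: a real predictor would be fitted to benchmark data.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(k)]
        self.bias = [0.0] * k

    def __call__(self, embedding):
        return [
            sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(self.weights, self.bias)
        ]


embedding = toy_encode("|1_Convolution|2_Relu|3_Convolution|4_Add")
single = LinearHead(dim=8, k=1)(embedding)  # e.g. latency only
multi = LinearHead(dim=8, k=3)(embedding)   # e.g. accuracy, latency, memory
print(len(single), len(multi))
```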

C. Integration into NAS Pipeline

The evaluator acts as a "plug-and-play" module. Instead of training candidates, the NAS controller generates an architecture, converts it to a string, and feeds it to the SEval-NAS evaluator.
The evaluator returns predicted metrics, which the controller uses to rank candidates and optimize the search objective (e.g., maximizing accuracy while minimizing latency).
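
The loop described above might look roughly like the following. This is a sketch under loud assumptions: it uses plain random search rather than FreeREA or any other published controller, `toy_evaluator` is a hypothetical rule-based stand-in for the trained evaluator, and the metric values and the 6.0 ms budget are invented for illustration.

```python
import random


def toy_evaluator(arch_string):
    """Hypothetical stand-in for the trained evaluator (made-up rules)."""
    ops = [tok.split("_", 1)[1] for tok in arch_string.split("|") if tok]
    n_conv = ops.count("Convolution")
    return {
        "accuracy": 50.0 + 5.0 * n_conv,
        "latency_ms": 1.0 + 2.0 * n_conv + 0.1 * len(ops),
    }


def random_architecture(rng, depth=4):
    """Sample a candidate directly in string form (illustrative search space)."""
    ops = [rng.choice(["Convolution", "Relu", "Pooling"]) for _ in range(depth)]
    return "|" + "|".join(f"{i}_{op}" for i, op in enumerate(ops, start=1))


def search(n_candidates, max_latency_ms, seed=0):
    """Score candidates with the evaluator only; no candidate is ever trained."""
    rng = random.Random(seed)
    feasible = []
    for _ in range(n_candidates):
        arch = random_architecture(rng)
        metrics = toy_evaluator(arch)
        if metrics["latency_ms"] <= max_latency_ms:  # hardware constraint
            feasible.append((metrics["accuracy"], arch, metrics))
    # Rank the feasible candidates by predicted accuracy, best first.
    feasible.sort(key=lambda item: item[0], reverse=True)
    return feasible


results = search(n_candidates=50, max_latency_ms=6.0)
best_accuracy, best_arch, best_metrics = results[0]
print(best_arch, best_metrics)
```

Because the controller only ever calls the evaluator through one function, adding a new constraint (for example a memory budget) would mean adding one key to the returned metrics and one comparison in the loop, which is the decoupling the section describes.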

3. Key Contributions

Universal Network-to-String Mechanism: A novel method to traverse autograd graphs and generate textual representations for any neural network, making the approach adaptable to diverse NN types.
Search-Agnostic Evaluator: An encoder-predictor framework that can be integrated into existing NAS algorithms (like FreeREA) without significant algorithmic changes, supporting arbitrary evaluation objectives.
Multi-Objective Hardware Prediction: The ability to predict not just accuracy, but also hardware costs (latency and memory) simultaneously, addressing a gap in current training-free NAS methods.
Comprehensive Ablation Studies: Evaluation of different model sizes (T5-small, T5-base, T5-large) to understand the trade-offs between model complexity and prediction accuracy.

4. Experimental Results

The authors evaluated SEval-NAS on NATS-Bench and HW-NAS-Bench across datasets like CIFAR-10, CIFAR-100, and ImageNet16-120.

Prediction Accuracy (Kendall's $\tau$ Correlation):
- Hardware Metrics (Strong): The model demonstrated strong positive correlations for latency and memory predictions.
  - On HW-NAS-Bench (6 edge devices), latency correlations ranged from 0.60 to 0.97.
  - Memory predictions were highly consistent across all datasets.
- Accuracy (Moderate/Weak): Correlation for accuracy was significantly lower than for hardware metrics. The authors note that accuracy depends on factors beyond simple structural features, making it harder to predict from architecture strings alone.
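
For readers unfamiliar with the metric, Kendall's $\tau$ measures how well the predicted ordering of architectures agrees with the measured ordering. The sketch below computes the simple tau-a variant; whether the paper uses this variant or a tie-corrected one is not stated here, and the latency values are invented for illustration.

```python
from itertools import combinations


def kendall_tau(predicted, measured):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        # Positive product: both sequences order the pair the same way.
        product = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Tied pairs (product == 0) count as neither.
    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs


# Invented example: the predictor swaps the order of the 2nd and 3rd models.
predicted_latency = [1.2, 2.9, 2.5, 4.1, 5.0]
measured_latency = [1.0, 2.0, 3.0, 4.0, 5.0]
tau = kendall_tau(predicted_latency, measured_latency)
print(tau)
```

With five models there are ten pairs; nine are ordered consistently and one is swapped, so $\tau = (9 - 1)/10 = 0.8$. A value of 1 means the predicted ranking matches the measured ranking exactly, which is what matters when the search only needs to know which candidate is faster.
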
Model Size Impact:
- T5-small performed robustly and efficiently.
- T5-large showed slightly weaker correlations on the Size Search Space (SSS) of NATS-Bench, suggesting that larger models do not always yield better results for this specific task and may introduce noise.
- On HW-NAS-Bench, larger models (T5-base/large) showed slightly better latency correlation on Edge GPUs due to reduced relative impact of kernel launch overhead, but at the cost of higher inference latency for the evaluator itself.
Integration with FreeREA:
- When integrated into the FreeREA algorithm, SEval-NAS successfully added latency and memory constraints.
- Search Time: The evaluator added a modest overhead in absolute terms (e.g., search time rose from 45s to 77s for latency-constrained search, about 32s or roughly 70% longer), which is small compared to full training-based NAS.
- Effectiveness: The system successfully ranked architectures based on hardware constraints, supporting its utility for edge device deployment.

5. Significance and Future Work

Significance: SEval-NAS addresses the "hardcoding" problem in NAS by providing a flexible evaluator that predicts hardware costs without training candidate architectures. This enables Hardware-Aware NAS to be adapted to new devices or multiple constraints without redesigning the search algorithm. It bridges the gap between structural representation and hardware performance.
Limitations: The study relies on benchmark-reported metrics, which may differ from real-world device measurements. Accuracy prediction remains a challenge.
Future Work: The authors suggest deploying lightweight versions of SEval-NAS directly on edge devices for on-device NAS and further exploring threshold parameters for search dynamics.

Conclusion: SEval-NAS represents a significant step toward efficient, flexible, and hardware-aware neural architecture search, offering a practical solution for deploying optimized models on resource-constrained edge devices.