Formal Reasoning About Confidence and Automated Verification of Neural Networks

This paper proposes a generalized framework for formally reasoning about both confidence and robustness in neural networks. It introduces an expressive specification grammar and a unified verification technique that transforms specifications into additional network layers, enabling efficient and scalable automated verification across a large suite of benchmarks.

Mohammad Afzal, S. Akshay, Blaise Genest, Ashutosh Gupta

Published 2026-02-17

Imagine you have a very smart robot dog that can look at a picture and tell you, "That's a horse!" or "That's a dog!" This robot is a Neural Network, and it's being used for important jobs like driving self-driving cars or diagnosing diseases.

For a long time, scientists have been worried about Adversarial Attacks. These are like tiny, almost invisible smudges on a photo that trick the robot. If you put a tiny smudge on a picture of a horse, the robot might suddenly scream, "That's a toaster!" and be 100% sure it's right. This is dangerous.

The Problem: The Robot is Too Confident (or Not Confident Enough)

The old way of testing these robots was simple: "If the robot gets the answer wrong, even a little bit, the robot is broken."

But the authors of this paper say, "Wait a minute! That's too harsh."

Imagine the robot sees a horse.

  1. Scenario A: You smudge the picture, and the robot says, "That's a toaster!" but it only has 5% confidence. It's basically guessing.
  2. Scenario B: You smudge the picture, and the robot says, "That's a horse!" but its confidence drops from 99% to 20%. It's still right, but it's suddenly very unsure.

The old tests would fail the robot in both cases. But the authors argue:

  • In Scenario A, maybe the robot is fine! It was just a wild guess. If the robot is usually right, a low-confidence mistake shouldn't count as a total failure.
  • In Scenario B, the robot is actually more dangerous. It's still saying "Horse," but it's losing its confidence. That's a sign the robot is fragile and might break soon.

The paper introduces a new way to test robots that cares about how sure they are, not just if they are right or wrong.
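In code, the two scenarios might be separated like this. This is a toy sketch with made-up thresholds (`low_conf`, `drop_floor`) and a hypothetical helper name; the paper states its properties formally over network outputs rather than as a Python check:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating.
    z = np.asarray(logits, dtype=float) - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def confidence_aware_verdict(logits, true_label, low_conf=0.5, drop_floor=0.5):
    """Toy confidence-aware check (illustrative thresholds, not the paper's).

    'pass'      -> correct and confident
    'fragile'   -> correct but unconfident (Scenario B)
    'tolerated' -> wrong, but only a low-confidence guess (Scenario A)
    'fail'      -> confidently wrong
    """
    probs = softmax(logits)
    pred = int(np.argmax(probs))
    conf = probs[pred]
    if pred == true_label:
        return "pass" if conf >= drop_floor else "fragile"
    return "tolerated" if conf < low_conf else "fail"
```

A classical robustness check would collapse "tolerated" and "fail" into a single failure, and "fragile" into a pass; the point of the paper's specifications is that these four situations deserve different verdicts.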

The Solution: The "Translator" Layer

Here is the tricky part. The tools scientists use to test these robots (called Verifiers) are like very strict math teachers. They only understand simple questions like: "Is the answer greater than zero?" or "Is the answer less than zero?"

They don't understand complex sentences like: "If the confidence is low, ignore the mistake, OR if the confidence is high, make sure the answer is still correct."

If you try to ask the math teacher this complex question, they get confused and stop working.

The Authors' Magic Trick:
Instead of trying to teach the math teacher a new language, the authors built a translator (a few extra layers of the neural network) that sits between the robot and the teacher.

Think of it like this:

  1. The Robot looks at the picture.
  2. The Translator (the new layers) takes the robot's complex thoughts and confidence levels and turns them into a simple "Yes/No" signal.
    • If the robot is doing well (even if it makes a low-confidence mistake), the Translator sends a Green Light.
    • If the robot is failing (high confidence mistake or huge confidence drop), the Translator sends a Red Light.
  3. The Math Teacher (the Verifier) just looks at the Green or Red Light. They don't need to understand the complex logic; they just check if the light is Green.
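The green-light/red-light idea boils down to one scalar. Here is a minimal sketch for the simplest possible property ("the true class beats every other class"); the `margin` parameter and the function itself are our illustration, not the paper's construction, but the interface is the key point: the verifier only ever has to ask "is the output greater than zero?":

```python
import numpy as np

def translator(logits, true_label, margin=0.0):
    """Toy 'translator' head: collapse a property into one number.

    Positive output  -> green light (property holds)
    Non-positive     -> red light (property violated)
    Property here: the true class's score exceeds every
    competitor's score by at least `margin`.
    """
    logits = np.asarray(logits, dtype=float)
    others = np.delete(logits, true_label)
    # Worst-case gap between the true logit and its strongest competitor.
    gap = logits[true_label] - np.max(others)
    return gap - margin  # verifier checks: output > 0 ?
```

In the paper, this head is built from actual network layers appended to the model, so off-the-shelf verifiers see one slightly larger network with a one-sided output check, rather than a new logic they would need to be taught.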

How They Built the Translator

The authors created a special "grammar" (a set of rules) to describe all these different types of safety checks (Relaxed, Strong, Top-K, etc.).
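Such a grammar can be pictured as a small expression tree. The sketch below is hypothetical Python (the paper's grammar is richer, covering Top-K and confidence-drop properties): atoms compare one class's confidence to a threshold, and `And`/`Or` combine them:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Atom:
    class_index: int
    threshold: float  # e.g. "confidence of class 0 exceeds 0.5"

@dataclass
class And:
    left: "Spec"
    right: "Spec"

@dataclass
class Or:
    left: "Spec"
    right: "Spec"

Spec = Union[Atom, And, Or]

def evaluate(spec: Spec, probs) -> bool:
    """Check a spec against concrete class probabilities."""
    if isinstance(spec, Atom):
        return probs[spec.class_index] > spec.threshold
    if isinstance(spec, And):
        return evaluate(spec.left, probs) and evaluate(spec.right, probs)
    return evaluate(spec.left, probs) or evaluate(spec.right, probs)
```

Evaluating a spec on one concrete input is easy; the hard part, which the paper tackles, is proving it holds for *every* input in a whole region of smudged pictures.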

Then, they figured out how to build the Translator using ReLU (a standard math function used in AI). They treated the logic like building with LEGO blocks:

  • AND logic (both things must be true) is built one way.
  • OR logic (either thing can be true) is built another way.
  • They even invented a "flip" switch to make sure the AND and OR blocks could talk to each other without getting confused.
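Under the convention that a score above zero means "true," AND and OR really can be built from ReLU alone, because min and max have exact ReLU formulas. This is a sketch of those standard identities, not the paper's exact gadgets:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# min/max written purely with ReLU, so they can become network layers.
def relu_min(a, b):
    # min(a, b) = a - ReLU(a - b)
    return a - relu(a - b)

def relu_max(a, b):
    # max(a, b) = a + ReLU(b - a)
    return a + relu(b - a)

# Convention (an assumption for this sketch): a condition holds iff
# its score is > 0. Then:
#   AND -> both scores must be positive  -> take the minimum
#   OR  -> either score may be positive  -> take the maximum
def AND(a, b):
    return relu_min(a, b)

def OR(a, b):
    return relu_max(a, b)
```

Because each block is just linear operations plus ReLU, nesting AND/OR formulas yields nothing more exotic than a few extra layers of an ordinary feed-forward network.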

This translator is so good that it can handle the most complex safety rules without breaking the math teacher's brain.

The Results: A Big Win

The team tested this on 8,870 different benchmarks (thousands of different robot brains and test cases).

  • They used the biggest networks in the world (some with 138 million parameters!).
  • They compared their "Translator" method against old, custom-made ways of testing.

The verdict? Their method was much faster and more successful.

  • It allowed them to use the world's best testing tools (like α,β-CROWN) on these complex new rules.
  • It found that many robots that were previously thought to be "broken" were actually safe if you considered their confidence levels.
  • It also found that some robots were "fragile" because their confidence dropped too much, even if they got the answer right.

The Takeaway

This paper is like giving safety inspectors a new, smarter checklist. Instead of just asking, "Did the robot get the answer right?", they can now ask, "Did the robot get the answer right and feel confident about it?"

By building a clever "translator" layer, they made it possible to ask these complex questions using existing tools, making our AI systems safer, more reliable, and easier to trust.
