OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving

The paper proposes OD-RASE, an ontology-driven framework that leverages large-scale visual language models and diffusion models to proactively identify accident-prone road structures and generate reliable infrastructure improvement proposals, thereby enhancing the safety of autonomous driving systems.

Kota Shimomura, Masaki Nambata, Atsuya Ishikawa, Ryota Mimura, Takayuki Kawabuchi, Takayoshi Yamashita, Koki Inoue

Published 2026-03-09

Imagine you are driving a car, but instead of a human behind the wheel, it's a very smart robot. This robot has super-powered eyes and can see everything around it better than any human. However, even with these superpowers, the robot sometimes gets confused or scared by weird road designs—like a sharp curve hidden behind a building or a confusing intersection.

Currently, when a robot car crashes, we fix the road after the accident happens. It's like waiting for a house to catch fire before you install smoke detectors. This paper introduces a new system called OD-RASE that acts like a "Proactive Road Doctor." Instead of waiting for a crash, it looks at a road, predicts where a robot might get confused, and suggests how to fix the road before anyone gets hurt.

Here is how it works, broken down into simple steps:

1. The "Expert Rulebook" (The Ontology)

Imagine you have a massive library of books written by the world's top traffic engineers. These books explain exactly why certain roads are dangerous and how to fix them.

  • The Problem: These books are huge, messy, and written in complex language. A computer can't just read them and understand them easily.
  • The Solution: The researchers took all that expert knowledge and organized it into a strict "Rulebook" (called an Ontology). Think of this like a flowchart or a decision tree. It says: "If the road looks like X, it's dangerous because of Y, and the fix is Z." This turns human wisdom into a format a computer can strictly follow.
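The "If X, then Y, fix Z" structure of the Rulebook can be sketched as a small lookup table. This is a toy illustration only; the entries and names below are invented for the example, not taken from the paper's actual ontology.

```python
# A toy "Rulebook": road features mapped to why they are risky and which
# fixes the experts approve. Entries here are illustrative assumptions.
ONTOLOGY = {
    "blind_curve": {
        "risk": "oncoming traffic is hidden behind a building",
        "fixes": ["install a convex mirror", "add a warning sign"],
    },
    "unsignalized_intersection": {
        "risk": "right-of-way is ambiguous for the driving system",
        "fixes": ["add a traffic signal", "paint stop lines"],
    },
}

def lookup(road_feature):
    """Return (risk, approved fixes) if the feature is in the Rulebook, else None."""
    entry = ONTOLOGY.get(road_feature)
    if entry is None:
        return None
    return entry["risk"], entry["fixes"]
```

The point of the strict structure is that a computer can answer "is this in the expert manual?" with a plain dictionary lookup instead of reading prose.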

2. The "AI Intern" (The LVLM)

Next, they hired a super-smart AI (a Large Visual Language Model) to act as an intern.


  • The Job: The AI looks at photos of roads and tries to guess what's wrong and how to fix it.
  • The Risk: AI can sometimes "hallucinate" or make things up. It might suggest building a bridge over a sidewalk, which is a terrible idea.
  • The Fix (The Filter): This is where the "Rulebook" comes in. The AI's suggestions are run through the Rulebook. If the AI suggests something that isn't in the Rulebook (like the bridge over the sidewalk), the system says, "Nope, that's not in the expert manual. Discard it."
  • The Result: They created a high-quality dataset of "Road Problems" and "Expert-Approved Fixes" that the computer can trust.
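The filtering step above can be sketched as a simple set-membership check: keep an AI suggestion only if its (problem, fix) pair appears in the Rulebook, and discard everything else as a possible hallucination. The pairs below are hypothetical examples, not the paper's data.

```python
# Toy hallucination filter: only (feature, fix) pairs found in the expert
# Rulebook survive. The approved pairs here are invented for illustration.
APPROVED = {
    ("blind_curve", "install a convex mirror"),
    ("narrow_lane", "widen the lane"),
}

def filter_proposals(proposals):
    """Split AI proposals into Rulebook-approved ones and discarded ones."""
    kept, discarded = [], []
    for feature, fix in proposals:
        if (feature, fix) in APPROVED:
            kept.append((feature, fix))
        else:
            discarded.append((feature, fix))
    return kept, discarded

suggestions = [
    ("blind_curve", "install a convex mirror"),  # in the Rulebook: kept
    ("sidewalk", "build a bridge over it"),      # not in the Rulebook: discarded
]
kept, discarded = filter_proposals(suggestions)
```

Whatever survives the filter becomes the trusted "Road Problems and Expert-Approved Fixes" dataset.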

3. The "Magic Paintbrush" (The Diffusion Model)

Once the system identifies a problem and suggests a fix, it needs to show people what the fix looks like.

  • The Problem: Telling a city planner, "We need to widen the lane," is boring and hard to visualize.
  • The Solution: The system uses a "Magic Paintbrush" (a Diffusion Model, the same tech behind AI art generators). It takes the original photo of the dangerous road and paints the improvement right onto it.
  • The Analogy: It's like using Photoshop to instantly show a city council what a street would look like if they added a new bike lane or a better sign. You can see the "Before" and "After" instantly.
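In code, the hand-off to the "Magic Paintbrush" amounts to turning an approved fix into an edit instruction that an image model can follow. The prompt wording and the function name below are assumptions for illustration; the paper does not specify this exact interface.

```python
# Sketch: compose an edit instruction for a diffusion model from an
# approved fix. The phrasing is a hypothetical example, not the paper's.
def build_edit_prompt(location, fix):
    """Turn a location and an expert-approved fix into an image-edit prompt."""
    return (f"Photo of {location}, modified so that the city has "
            f"applied this improvement: {fix}.")

prompt = build_edit_prompt("a four-way intersection", "add a protected bike lane")
# In practice, this prompt plus the original road photo would be fed to an
# image-editing diffusion pipeline to render the "After" picture
# alongside the original "Before" photo.
```

The model's job is only the rendering; the content of the edit comes from the Rulebook-approved fix, which keeps the pretty pictures anchored to expert knowledge.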

4. Why This Matters

  • For Robots: It helps self-driving cars understand the world better by teaching them to spot "traps" in the road design before they drive into them.
  • For Humans: It helps city planners fix roads before accidents happen. Instead of reacting to a tragedy, they can proactively make the streets safer for everyone—pedestrians, cyclists, and drivers alike.

The Big Takeaway

Think of OD-RASE as a team of three:

  1. The Librarian: Who organizes all the expert rules.
  2. The Intern: Who looks at the roads and makes suggestions.
  3. The Artist: Who draws the picture of the perfect road.

Together, they don't just wait for accidents; they look at the road, say, "Hey, this curve looks tricky for a robot," and then draw a picture of how to make it safe. It's a shift from reacting to disasters to preventing them before they happen.