FLUID: A Fine-Grained Lightweight Urban Signalized-Intersection Dataset of Dense Conflict Trajectories

This paper introduces FLUID, a fine-grained, lightweight dataset and processing framework derived from drone footage that captures dense traffic conflicts and rich behavioral data at urban signalized intersections to support traffic modeling, human preference mining, and autonomous driving research.

Yiyang Chen, Zhigang Wu, Guohong Zheng, Xuesong Wu, Liwen Xu, Haoyuan Tang, Zhaocheng He, Haipeng Zeng

Published 2026-02-24

Imagine trying to understand how a bustling city intersection works. You could stand on a corner and watch cars go by, but you'd miss what's happening on the other side of the street. You could sit in a car and see what's right in front of you, but you'd be blind to the chaos happening behind you.

This paper introduces FLUID, a new "super-eye" for traffic researchers. It's a massive dataset created by flying drones over three busy intersections in China, capturing every car, bike, and pedestrian in high definition.

Here is the breakdown of what makes FLUID special, explained with some everyday analogies:

1. The Problem: The "Blind Spot" of Traffic Data

Before FLUID, traffic datasets were like trying to solve a puzzle with missing pieces.

  • Ground cameras are like security guards standing at one corner; they can't see the whole picture and sometimes scare drivers into driving differently.
  • Car sensors (like those in self-driving cars) are like looking through a narrow keyhole; you see what's right in front of you, but you miss the big picture.
  • Existing drone data was often too blurry, missed small details (like pedestrians), or didn't have enough "traffic jams" to study how people react when things get crowded.

2. The Solution: The "Drone Chef"

The researchers didn't just fly a drone and hope for the best. They built a lightweight, step-by-step pipeline, a digital kitchen that turns the raw video into a well-prepared meal of data.

  • Stabilization: Drones wobble in the wind. The team used a digital "steady-cam" to smooth out the shaky footage, like stabilizing a shaky hand holding a camera.
  • The "All-Star" Detection Team: Instead of using one robot to find cars and bikes, they trained three different AI models (like three different detectives) and let them work together. One is great at spotting tiny mopeds, another at spotting big trucks, and the third at catching fast-moving cars. They combine their findings to ensure nothing is missed.
  • The "Traffic Cop" Filter: Sometimes the AI gets confused and thinks one car is two cars, or sees a shadow as a vehicle. The team built a smart filter (using math called "Time-to-Collision") that acts like a strict traffic cop, removing the fake cars and keeping only the real ones.
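The "traffic cop" filter above hinges on Time-to-Collision (TTC), a standard surrogate safety measure: the seconds until two road users would collide if both kept their current speed. Here is a minimal sketch of the classic 1-D version; the function name, the assumed vehicle length, and the straight-line setup are simplifications for illustration, not the paper's exact formulation:

```python
def time_to_collision(pos_lead, vel_lead, pos_follow, vel_follow,
                      length_lead=4.5):
    """Classic 1-D Time-to-Collision surrogate safety measure.

    TTC = gap / closing speed, defined only while the follower is
    actually closing in on the leader. Returns infinity otherwise.
    Positions in meters along the lane, speeds in m/s.
    """
    # Bumper-to-bumper gap (assumes a nominal 4.5 m leader length).
    gap = pos_lead - pos_follow - length_lead
    # Positive closing speed means the follower is catching up.
    closing_speed = vel_follow - vel_lead
    if closing_speed <= 0 or gap <= 0:
        return float("inf")
    return gap / closing_speed

# A follower closing at 5 m/s with a 30 m bumper-to-bumper gap:
ttc = time_to_collision(pos_lead=100.0, vel_lead=10.0,
                        pos_follow=65.5, vel_follow=15.0)
print(round(ttc, 1))  # gap = 100 - 65.5 - 4.5 = 30 m -> 30 / 5 = 6.0 s
```

In practice, pairs of trajectories whose minimum TTC drops below a threshold (typically a few seconds) get flagged as conflicts, while detections that imply physically impossible TTC profiles can be discarded as tracking noise.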

3. What's in the Box? (The Dataset)

FLUID isn't just a video file; it's a complete "traffic simulation kit" containing:

  • The Scenes: Three different types of intersections (a standard 4-way, a 4-way with a special right-turn lane, and a T-junction).
  • The Crowd: Over 20,000 traffic participants (cars, trucks, buses, tricycles, mopeds, and pedestrians).
  • The Drama: This is the best part. The dataset is full of conflicts. While other datasets might have 1 or 2 near-misses per minute, FLUID has 2.8 conflicts per minute. It's like capturing a movie scene where the action never stops. About 15% of all the cars in the video were involved in a near-miss or a rule-breaking moment.
  • The Rules: They recorded the traffic lights, the road maps, and even the specific "intentions" of drivers (e.g., "I'm turning left but didn't yield").
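The headline density figures above are simple ratios over the recording: total conflicts divided by minutes of footage, and unique participants involved in at least one conflict divided by all participants. A toy sketch (the event format and the numbers are illustrative, not FLUID's actual schema):

```python
def conflict_stats(conflict_events, total_minutes, n_participants):
    """Summarize conflict density as conflicts per minute and the
    share of participants involved in at least one conflict.

    conflict_events: list of (participant_id_a, participant_id_b)
    pairs -- a hypothetical format for illustration.
    """
    per_minute = len(conflict_events) / total_minutes
    # Count each participant once, no matter how many conflicts.
    involved = {pid for pair in conflict_events for pid in pair}
    share = len(involved) / n_participants
    return per_minute, share

# Toy numbers: 14 conflicts over 5 minutes among 100 participants.
events = [(i, i + 1) for i in range(0, 28, 2)]
per_min, share = conflict_stats(events, total_minutes=5.0,
                                n_participants=100)
print(per_min)  # 14 / 5 = 2.8 conflicts per minute
```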

4. Why Does This Matter? (The "Why")

Think of FLUID as a gym for self-driving cars and traffic planners.

  • For Self-Driving Cars: To teach a robot to drive, you need to show it dangerous situations, not just empty roads. FLUID provides thousands of "near-accidents" so the AI can learn how to react when a pedestrian darts out or a truck cuts it off.
  • For City Planners: It helps them see exactly where people break the rules. Are people running red lights at the left turn? Are pedestrians jaywalking at the crosswalk? The data pinpoints these "hotspots" so cities can fix them.
  • For Researchers: It's transparent. Unlike some commercial tools that are "black boxes" (you pay, you get data, you don't know how it was made), FLUID gives you the raw video, the code, and the math. It's like giving a chef the recipe, not just the finished cake.

The Bottom Line

FLUID is a high-definition, drone-shot traffic conflict library that fills the gaps in our understanding of how humans and machines interact at intersections. It's designed to be the "gold standard" for making our roads safer and our self-driving cars smarter, by showing them exactly what happens when traffic gets messy.
