The trajectoRIR Database: Room Acoustic Recordings Along a Trajectory of Moving Microphones

Imagine you are trying to teach a robot how to "hear" a room the way a human does. You want the robot to understand not just what a room sounds like when it's standing still, but how the sound changes as it walks, runs, or turns its head.

For a long time, scientists had two separate toolkits for this:

The "Still Photo" Toolkit: A massive library of static sound snapshots (called Room Impulse Responses, or RIRs). These tell you exactly how sound bounces off walls when you are standing in one spot.
The "Video" Toolkit: Recordings of people or robots moving around while making noise. These show how sound changes over time as you move.

The Problem: Until now, these two toolkits didn't match. If you wanted to teach a robot to hear while walking, you had to guess how the "still photos" would change as it moved. It was like trying to predict how a movie looks by only looking at a few random frames from a photo album. The guesswork often led to simulated sound that came across as artificial and unnatural.

The Solution: The trajectoRIR Database
The authors of this paper built a new, super-powered database called trajectoRIR. Think of this as a "perfectly synchronized movie and photo album" of a room.

Here is how they did it, using some simple analogies:

1. The "Train Track" Setup

Imagine a room with a giant, L-shaped train track running through the middle. Instead of a train, they put a robotic cart on the track.

The Cart: This cart carries different types of "ears" (microphones).
The Journey: The cart moves along the track at three different speeds: a very slow shuffle (0.2 m/s), a slow stroll (0.4 m/s), and a relaxed walk (0.8 m/s).
The "Ears": They didn't just use one type of ear. They used:
- A Dummy Head (a fake human head with ears) to hear exactly like a person.
- Circular Arrays (ears arranged in a circle) to hear sound from all directions.
- Linear Arrays (ears in a straight line) to hear sound from specific angles.

2. The "Double Recording" Magic

This is the secret sauce. The same stretch of track was recorded in two separate ways:

The Movie: They played sounds (piano, drums, speech, noise) from two speakers in the room and recorded what the cart heard while it was moving. This is the "dynamic" part.
The Photo Album: They also parked the cart at points spaced every 5 or 10 centimeters along the track and took a "snapshot" of the room's acoustics at each one, measuring exactly how sound travels from each speaker to that specific spot.

Why is this special?
Usually, you either have the movie or the photo album. With trajectoRIR, you have both for the exact same path. You know exactly what the room sounded like at the start, the middle, and the end, and you have the continuous recording of the cart moving between those points.

3. What Can We Do With This?

The researchers tested this database by trying to predict how sound changes as you move. They tried three methods:

Method A (The Guess): Just looking at the "photo album" (static spots) and guessing what happens in between. Result: The guess was okay, but the "movie" sounded fake.
Method B (The Improv): Just listening to the "movie" (moving recording) and trying to figure out the room's rules. Result: The "movie" sounded great, but the understanding of the room was shaky.
Method C (The Perfect Blend): Using the "photo album" to learn the rules of the room, and the "movie" to fill in the gaps. Result: This was the winner. It produced the most realistic sound and the most accurate understanding of the room.

The Takeaway

The trajectoRIR database is like a "training gym" for audio engineers and AI developers. It provides the perfect, matched data needed to teach computers how to hear in a moving world.

Whether you are building:

Virtual Reality where you can walk around a digital room and hear it change naturally.
Hearing Aids that adjust instantly as you turn your head.
Robots that need to find their way by listening.

...this database gives you the real-world data needed to stop guessing and start modeling how a moving listener actually hears. It bridges the gap between "standing still" and "moving around," making digital sound much more lifelike.

Here is a detailed technical summary of the paper "The trajectoRIR Database: Room Acoustic Recordings Along a Trajectory of Moving Microphones."

1. Problem Statement

The development of advanced acoustic signal processing algorithms (e.g., sound source localization, spatial audio reconstruction, and auralization) relies heavily on large, diverse datasets. While existing databases offer either static Room Impulse Responses (RIRs) or dynamic audio recordings (moving microphones), a critical gap exists: there is no comprehensive dataset that provides matched recordings of both stationary RIRs and moving-microphone audio along the same controlled trajectory.

This gap limits the ability to:

Accurately estimate time-variant RIRs for dynamic scenes.
Validate algorithms for spatially dynamic sound field reconstruction.
Train data-driven models (Deep Learning) that require both physical ground truth (RIRs) and dynamic signal evolution.
Synthesize realistic dynamic acoustic scenes, as current methods often rely on imperfect simulations or lack the necessary ground truth for interpolation.

2. Methodology

The authors constructed the trajectoRIR database by recording acoustic data in a controlled laboratory environment using a robotic cart system.

Experimental Setup

Environment: The Alamire Interactive Laboratory (AIL) in Heverlee, Belgium. The room is approximately 208 m³ with a reverberation time ($T_{20}$) of 0.5 s.
Trajectory: A smooth L-shaped trajectory (two straight segments connected by a 90° curve) was built using a modular rail system. The total length is ~4.62 m.
Motion Control: A robotic cart moved along the rail at three constant speeds: 0.2, 0.4, and 0.8 m/s (walking speed range).
Sound Sources: Two stationary loudspeakers (Genelec 8030 CP) were placed on opposite sides of the trajectory to simulate different source-receiver geometries.
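
As a quick sanity check on the numbers above, the following sketch computes how long one pass over the rail takes at each speed. It assumes the cart holds its nominal speed over the full ~4.62 m, ignoring any acceleration or braking phase, which the summary does not describe.

```python
# Traversal time of the rail at each nominal cart speed.
# Values are taken from the setup description; constant speed over the
# whole rail is a simplifying assumption.
TRAJECTORY_LENGTH_M = 4.62          # approximate total rail length
SPEEDS_M_PER_S = (0.2, 0.4, 0.8)    # nominal constant cart speeds


def traversal_time_s(length_m: float, speed_m_per_s: float) -> float:
    """Time needed to cover length_m at a constant speed."""
    return length_m / speed_m_per_s


durations = {v: traversal_time_s(TRAJECTORY_LENGTH_M, v) for v in SPEEDS_M_PER_S}

for speed, seconds in durations.items():
    print(f"{speed:.1f} m/s -> {seconds:.2f} s per pass")
```

At the fastest speed a pass lasts under six seconds, so each moving recording covers the whole trajectory in a short excerpt of the source signal.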

Microphone Configurations (3 Types)

The database captures data using three distinct array configurations to ensure versatility:

MC1 (Dummy Head): Includes a Neumann KU-100 dummy head with in-ear microphones, two reference microphones near the ears, a 16-channel Uniform Circular Array (UCA) at ear height, and a 4-channel "crown" array above the head.
MC2 (No Dummy Head): Identical to MC1 but with the dummy head removed, using only the reference mics and arrays.
MC3 (Ambisonics & Linear): Three First-Order Ambisonics (FOA) microphones and a 12-channel Uniform Linear Array (ULA).

Signal Acquisition

Stationary Recordings (STAT): RIRs were measured at 46 positions (MC1/MC2) and 92 positions (MC3) along the trajectory using exponential sine sweeps. Total: 8,648 RIRs.
Moving Recordings (MOV): The loudspeakers played six distinct source signals while the cart moved along the rail: Piano, Drums, Female Speech, White Noise, and two Perfect Sweeps (1 kHz and 8 kHz). Total: 108 multi-channel recordings.
Ego-Noise: Mechanical noise from the cart was recorded separately to facilitate noise reduction research.
Metadata: Extensive CSV files provide geometric coordinates, timestamps, speed data, and ambient temperature for every recording.
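
The two totals can be cross-checked against the setup description. The breakdown below is an inferred reading, not one stated in the summary: it assumes two channels for the dummy-head in-ear pair, four channels per FOA microphone, one RIR per channel, position, and loudspeaker, and one moving recording per configuration, loudspeaker, speed, and source signal.

```python
# Channels per configuration, inferred from the array descriptions
# (assumption: 2 in-ear channels, 4 channels per FOA microphone).
CHANNELS = {
    "MC1": 2 + 2 + 16 + 4,   # in-ear pair, reference pair, UCA, crown
    "MC2": 2 + 16 + 4,       # MC1 without the dummy-head in-ear pair
    "MC3": 3 * 4 + 12,       # three FOA microphones plus the ULA
}
POSITIONS = {"MC1": 46, "MC2": 46, "MC3": 92}   # stationary measurement points
N_LOUDSPEAKERS = 2
N_SPEEDS = 3
N_SOURCE_SIGNALS = 6

# One RIR per (channel, position, loudspeaker).
total_rirs = sum(CHANNELS[c] * POSITIONS[c] * N_LOUDSPEAKERS for c in CHANNELS)

# One multi-channel recording per (configuration, loudspeaker, speed, signal).
total_moving = len(CHANNELS) * N_LOUDSPEAKERS * N_SPEEDS * N_SOURCE_SIGNALS

print(total_rirs, total_moving)
```

Under these assumptions the products reproduce the reported 8,648 RIRs and 108 moving recordings, which suggests the channel counts are read correctly.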

Data Processing

Latency Compensation: A state-space model using a Kalman filter was employed to estimate and compensate for system latency differences between the RIR and moving recordings, ensuring precise temporal alignment.
Synchronization: High-speed video (240 FPS) was used to manually timestamp the cart's passage over specific rail markers; the resulting timestamps were then mapped to the audio signals via cross-correlation.
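
Cross-correlation alignment of this kind can be illustrated with a generic delay estimator. This is a minimal sketch on synthetic data, not the authors' processing code; the function name `estimate_delay_samples` and the test signals are hypothetical.

```python
import numpy as np


def estimate_delay_samples(reference: np.ndarray, recorded: np.ndarray) -> int:
    """Return the lag (in samples) at which `recorded` best matches `reference`.

    A positive result means `recorded` is delayed relative to `reference`.
    """
    xcorr = np.correlate(recorded, reference, mode="full")
    # Lags covered by the "full" output, from most negative to most positive.
    lags = np.arange(-(len(reference) - 1), len(recorded))
    return int(lags[np.argmax(np.abs(xcorr))])


# Synthetic check: delay a noise burst by a known number of samples.
rng = np.random.default_rng(0)
reference = rng.standard_normal(2000)
true_delay = 37
recorded = np.concatenate([np.zeros(true_delay), reference])
recorded = recorded + 0.05 * rng.standard_normal(len(recorded))

estimated_delay = estimate_delay_samples(reference, recorded)
print(estimated_delay)
```

For recordings that are minutes long, an FFT-based correlation such as `scipy.signal.correlate` with `method="fft"` would be a more practical choice than the direct sum used here.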

3. Key Contributions

First Matched Dynamic/Static Dataset: The primary contribution is the provision of a database containing both stationary RIRs and moving-microphone audio recorded along the exact same trajectory, enabling direct comparison and hybrid algorithm development.
Multi-Array Diversity: The inclusion of three distinct microphone configurations (including a dummy head, Ambisonics, and linear/circular arrays) allows the database to be used for a wide range of research, from binaural audio to high-order spatial reconstruction.
Comprehensive Metadata & Tools: The release includes Python scripts to access audio, retrieve precise geometric coordinates, and load metadata (temperature, timestamps), lowering the barrier to entry for researchers.
Ego-Noise Characterization: The inclusion of cart mechanical noise recordings supports the development of self-noise reduction algorithms for mobile robotics.

4. Results and Evaluation

The authors evaluated the database's utility through a Time-Variant RIR Estimation use case, comparing three estimation methods:

Linear Interpolation (LI): Using only sparse stationary RIRs.
Purely Data-Driven Kalman Filter (KF-$\alpha$): Using only moving-microphone audio.
Hybrid Kalman Filter (KF-A(l)): Combining sparse RIRs with moving audio and a physical propagation model.
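
To make the linear-interpolation baseline concrete, here is a minimal sketch of one way it could work: blend the two nearest measured RIRs according to the cart position, then apply the resulting time-variant filter sample by sample. The function names, the one-dimensional position along the rail, and the synthetic RIRs are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def interpolate_rir(rir_a, rir_b, pos_a, pos_b, pos):
    """Linearly blend two RIRs measured at pos_a and pos_b for a point between them."""
    weight = (pos - pos_a) / (pos_b - pos_a)
    return (1.0 - weight) * rir_a + weight * rir_b


def synthesize_moving(source, rirs, grid_m, speed_m_per_s, fs):
    """Time-variant convolution using one interpolated RIR per output sample."""
    n_taps = rirs.shape[1]
    padded = np.concatenate([np.zeros(n_taps - 1), source])
    out = np.zeros(len(source))
    for n in range(len(source)):
        # Cart position at sample n, clamped to the end of the measured grid.
        pos = min(speed_m_per_s * n / fs, grid_m[-1])
        i = int(np.clip(np.searchsorted(grid_m, pos, side="right") - 1,
                        0, len(grid_m) - 2))
        h = interpolate_rir(rirs[i], rirs[i + 1], grid_m[i], grid_m[i + 1], pos)
        # padded[n : n + n_taps] holds x[n - n_taps + 1 .. n]; reverse it so
        # that element k lines up with h[k] in y[n] = sum_k h[k] x[n - k].
        out[n] = h @ padded[n:n + n_taps][::-1]
    return out


# Sanity check on synthetic data: identical RIRs at every grid point must
# reduce the time-variant filter to an ordinary convolution.
rng = np.random.default_rng(1)
fs = 8000
grid_m = np.array([0.0, 0.1, 0.2])
base_rir = rng.standard_normal(64) * np.exp(-np.arange(64) / 16.0)
rirs = np.tile(base_rir, (3, 1))
source = rng.standard_normal(1000)

moving = synthesize_moving(source, rirs, grid_m, speed_m_per_s=0.4, fs=fs)
static = np.convolve(source, base_rir)[:len(source)]
print(np.allclose(moving, static))
```

Blending RIRs sample by sample in the time domain is the simplest possible scheme. It does not shift the direct-path delay smoothly between grid points, which is one plausible reason such a baseline struggles on real moving recordings.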

Key Findings:

Interpolation Limitations: Linear interpolation of sparse RIRs alone yielded poor correlation coefficients (~0.3–0.6) for synthesized moving signals, failing to capture dynamic effects and mechanical noise.
Data-Driven vs. Hybrid: The purely data-driven Kalman filter produced the most accurate synthesized audio signals but generated RIR estimates that deviated significantly from the ground-truth stationary measurements.
Optimal Approach: The Hybrid Kalman Filter achieved the best balance. It produced synthesized signals with high correlation (comparable to the data-driven method) while maintaining high accuracy against the stationary RIR ground truth.
Conclusion: The evaluation confirms that combining sparse stationary RIRs with moving-microphone recordings is essential for robust time-variant RIR estimation.
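
The data-driven idea can be sketched as a textbook Kalman filter that treats the RIR taps as the hidden state and each microphone sample as a noisy linear observation of them. The state transition used here, a scalar factor `alpha` times the previous state, is an assumption suggested by the method's label; the noise variances `q` and `r` and the synthetic test are likewise illustrative and not taken from the paper.

```python
import numpy as np


def kalman_rir_tracker(source, observed, n_taps, alpha=1.0, q=1e-6, r=1e-3):
    """Estimate an n_taps-long impulse response from a source and a recording.

    Assumed model (not confirmed by the summary):
        h[n+1] = alpha * h[n] + process noise (variance q per tap)
        y[n]   = x[n]^T h[n]  + measurement noise (variance r)
    """
    h = np.zeros(n_taps)                 # state estimate (RIR taps)
    P = np.eye(n_taps)                   # state error covariance
    padded = np.concatenate([np.zeros(n_taps - 1), source])
    for n in range(len(observed)):
        # Prediction step.
        h = alpha * h
        P = alpha ** 2 * P + q * np.eye(n_taps)
        # Regressor: x[n], x[n-1], ..., x[n - n_taps + 1].
        x = padded[n:n + n_taps][::-1]
        # Update step with the scalar observation y[n].
        innovation = observed[n] - x @ h
        s = x @ P @ x + r
        gain = P @ x / s
        h = h + gain * innovation
        P = P - np.outer(gain, x @ P)
    return h


# Synthetic check with a fixed (non-moving) response.
rng = np.random.default_rng(2)
h_true = rng.standard_normal(16) * np.exp(-np.arange(16) / 4.0)
source = rng.standard_normal(2000)
observed = np.convolve(source, h_true)[:len(source)]
observed = observed + 0.01 * rng.standard_normal(len(source))

h_est = kalman_rir_tracker(source, observed, n_taps=16)
rel_error = np.linalg.norm(h_est - h_true) / np.linalg.norm(h_true)
print(f"relative error: {rel_error:.4f}")
```

A hybrid variant in the spirit of the paper would presumably replace the scalar `alpha` with a position-dependent transition informed by the stationary RIRs and a propagation model; that part is not attempted here.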

5. Significance

The trajectoRIR database addresses a fundamental bottleneck in room acoustics research by providing the "ground truth" necessary to bridge the gap between static acoustic modeling and dynamic scene simulation. Its significance lies in:

Algorithm Validation: It allows researchers to rigorously test algorithms for dynamic sound field reconstruction, auralization, and source tracking in realistic moving scenarios.
Machine Learning: It provides measured, labeled data for training and evaluating deep learning models of dynamic acoustic environments, a task that previously relied heavily on synthetic (and often inaccurate) simulations.
Reproducibility: The modular rail system and open-source CAD files allow the community to replicate the setup or extend it to new geometries, fostering a standardized benchmark for moving microphone research.

The database is publicly available (7.47 GB, 3.4 hours of audio) and includes all source signals and processing tools, making it a foundational resource for the next generation of spatial audio and acoustic signal processing technologies.