From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents

This paper introduces MM-Mem, a cognition-inspired pyramidal multimodal memory architecture that leverages Fuzzy-Trace Theory and a Semantic Information Bottleneck to progressively distill verbatim visual details into abstract semantic schemas, thereby enabling efficient long-horizon video understanding through hierarchical storage and entropy-driven retrieval.

Niu Lian, Yuting Wang, Hanshu Yao + 5 more 2026-03-03 💬 cs.CL

UltraStar: Semantic-Aware Star Graph Modeling for Echocardiography Navigation

To address the limitations of existing sequential models in handling noisy echocardiography probe trajectories, this paper proposes UltraStar, a semantic-aware star graph framework that reformulates navigation as anchor-based global localization by connecting the current view directly to representative historical keyframes, thereby achieving robust performance and better scalability on large-scale datasets.

Teng Wang, Haojun Jiang, Chenxi Li + 6 more 2026-03-03 💻 cs

SCATR: Mitigating New Instance Suppression in LiDAR-based Tracking-by-Attention via Second Chance Assignment and Track Query Dropout

This paper presents SCATR, a novel LiDAR-based tracking-by-attention framework that mitigates new instance suppression and bridges the performance gap with detection-based methods through two architecture-agnostic training strategies: Second Chance Assignment and Track Query Dropout, achieving state-of-the-art results on the nuScenes benchmark.

Brian Cheong, Letian Wang, Sandro Papais + 1 more 2026-03-03 💻 cs

ATA: Bridging Implicit Reasoning with Attention-Guided and Action-Guided Inference for Vision-Language Action Models

The paper proposes ATA, a novel training-free, plug-and-play framework that enhances Vision-Language-Action models by introducing implicit reasoning through complementary attention-guided and action-guided strategies, thereby improving task success and robustness without the need for additional annotations or retraining.

Cheng Yang, Jianhao Jiao, Lingyi Huang + 8 more 2026-03-03 🤖 cs.AI

Rate-Distortion Signatures of Generalization and Information Trade-offs

This paper introduces a rate-distortion-theoretic framework that characterizes the generalization trade-offs of human and machine vision systems using geometric signatures of slope and curvature, revealing that while both follow a common lossy-compression principle, humans exhibit smoother and more flexible trade-offs compared to the steeper, more brittle regimes of modern deep networks.

Leyla Roksan Caglar, Pedro A. M. Mediano, Baihan Lin 2026-03-03 🧬 q-bio

Downstream Task Inspired Underwater Image Enhancement: A Perception-Aware Study from Dataset Construction to Network Design

This paper proposes a Downstream Task-Inspired Underwater Image Enhancement (DTI-UIE) framework that integrates a human visual perception model, a task-driven perceptual loss, and an automatically constructed dataset to generate enhanced images specifically optimized for improving downstream vision tasks like object detection and semantic segmentation.

Bosen Lin, Feng Gao, Yanwei Yu + 2 more 2026-03-03 ⚡ eess

Neural Operator-Grounded Continuous Tensor Function Representation and Its Applications

This paper introduces Neural Operator-Grounded Continuous Tensor Function Representation (NO-CTR), a novel framework that replaces discrete, linear mode-n products with continuous, nonlinear neural operators to more faithfully represent complex real-world data across various grid structures and point clouds, while theoretically guaranteeing universal approximation and demonstrating superior performance in multi-dimensional data completion tasks.

Ruoyang Su, Xi-Le Zhao, Sheng Liu + 3 more 2026-03-03 🔢 math
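
For readers unfamiliar with the term, the discrete, linear mode-n product mentioned in the summary above can be sketched as follows. This is only the classical operation that NO-CTR is said to replace, not the paper's method; the function name `mode_n_product`, the restriction to 3-way tensors, and the nested-list representation are assumptions made for illustration.

```python
def mode_n_product(X, U, n):
    """Classical mode-n product of a 3-way tensor X with a matrix U.

    X is a nested list of shape (I0, I1, I2); U has shape (J, I_n).
    The result has the same shape as X except that axis n has length J:
        Y[..., j, ...] = sum_k X[..., k, ...] * U[j][k]
    (Hypothetical helper for illustration; 0-based axis index n.)
    """
    shape = [len(X), len(X[0]), len(X[0][0])]
    out_shape = list(shape)
    out_shape[n] = len(U)
    Y = [[[0.0] * out_shape[2] for _ in range(out_shape[1])]
         for _ in range(out_shape[0])]
    for a in range(out_shape[0]):
        for b in range(out_shape[1]):
            for c in range(out_shape[2]):
                idx = [a, b, c]
                j = idx[n]  # output index along the contracted axis
                total = 0.0
                for k in range(shape[n]):
                    idx[n] = k  # sweep the input index along axis n
                    total += X[idx[0]][idx[1]][idx[2]] * U[j][k]
                Y[a][b][c] = total
    return Y


# Summing the two frontal slabs of a 2x2x2 tensor along axis 0.
X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
U = [[1, 1]]  # shape (1, 2): maps axis 0 from length 2 to length 1
Y = mode_n_product(X, U, 0)
```

The operation is linear in X and tied to a fixed grid of indices, which are the two properties the summary says the neural-operator formulation relaxes.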

Event-Only Drone Trajectory Forecasting with RPM-Modulated Kalman Filtering

This paper proposes a novel event-only drone trajectory forecasting method that extracts propeller rotational speed directly from raw event data and integrates it into an RPM-aware Kalman filter, achieving superior short-to-medium horizon prediction accuracy compared to learning-based approaches without relying on RGB imagery or training data.

Hari Prasanth S. M., Pejman Habibiroudkenar, Eerik Alamikkotervo + 2 more 2026-03-03 ⚡ eess

3D Field of Junctions: A Noise-Robust, Training-Free Structural Prior for Volumetric Inverse Problems

This paper introduces a training-free, noise-robust 3D Field of Junctions (3D FoJ) representation that optimizes volumetric wedge junctions to serve as a structural prior, outperforming both classical and neural methods in low-SNR 3D imaging tasks such as CT, cryo-ET, and point cloud denoising without the hallucination risk of learned priors.

Namhoon Kim, Narges Moeini, Justin Romberg + 1 more 2026-03-03 ⚡ eess

Data Augmentation via Mixed Class Interpolation using Cycle-Consistent Generative Adversarial Networks Applied to Cross-Domain Imagery

This paper proposes a novel data augmentation method called Conditional CycleGAN Mixup Augmentation (C2GMA) that leverages visible-band imagery to synthesize mixed-class non-visible domain examples via CycleGANs, significantly improving classification accuracy in data-scarce Synthetic Aperture Radar (SAR) applications.

Hiroshi Sasaki, Chris G. Willcocks, Toby P. Breckon 2026-03-02 🤖 cs.LG
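
The mixed-class interpolation named in the summary above builds on the general idea of mixup augmentation, which can be sketched as below. This is plain mixup on feature vectors and one-hot labels, not C2GMA itself: the CycleGAN translation step is omitted, and the names `mixup` and `sample_lambda` and the Beta(alpha, alpha) sampling choice are assumptions for illustration.

```python
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two examples and their one-hot labels.

    lam in [0, 1] weights example a; (1 - lam) weights example b.
    (Hypothetical helper; inputs are equal-length lists of floats.)
    """
    x = [lam * p + (1 - lam) * q for p, q in zip(x_a, x_b)]
    y = [lam * p + (1 - lam) * q for p, q in zip(y_a, y_b)]
    return x, y


def sample_lambda(alpha, rng):
    """Draw a mixing weight from Beta(alpha, alpha), a common mixup choice."""
    return rng.betavariate(alpha, alpha)


# Mix a class-0 example with a class-1 example at a fixed weight.
x_mix, y_mix = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], 0.25)

# In training, the weight would typically be sampled per pair.
lam = sample_lambda(0.4, random.Random(0))
```

The mixed label is soft, so a classifier trained on such pairs sees examples that lie between classes, which is the property the summary credits for the gains in data-scarce settings.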