Interpretable Pre-Release Baseball Pitch Type Anticipation from Broadcast 3D Kinematics

This paper presents a scalable, interpretable framework that achieves 80.4% accuracy in classifying eight professional baseball pitch types using only monocular 3D body kinematics, revealing that upper-body mechanics—particularly wrist position and trunk tilt—are the primary predictors while establishing an empirical ceiling for grip-based distinctions.

Jerrin Bright, Michelle Lu, John Zelek | 2026-03-06 | cs.AI
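The summary names trunk tilt as one of the main predictors but does not say how it is measured. One common way to get such a feature from 3D keypoints is the angle between the hip-to-shoulder midline and the vertical axis. The sketch below assumes a y-up coordinate frame and left/right shoulder and hip joints. The function name `trunk_tilt_deg` is hypothetical and is not the authors' feature definition.

```python
import math
from typing import Sequence, Tuple

Point3 = Sequence[float]


def trunk_tilt_deg(
    l_shoulder: Point3,
    r_shoulder: Point3,
    l_hip: Point3,
    r_hip: Point3,
    up: Tuple[float, float, float] = (0.0, 1.0, 0.0),  # assumed y-up frame
) -> float:
    """Hypothetical feature: angle (degrees) between the hip-to-shoulder midline and vertical."""
    mid_sh = [(a + b) / 2 for a, b in zip(l_shoulder, r_shoulder)]
    mid_hip = [(a + b) / 2 for a, b in zip(l_hip, r_hip)]
    trunk = [s - h for s, h in zip(mid_sh, mid_hip)]
    norm_t = math.sqrt(sum(c * c for c in trunk))
    norm_u = math.sqrt(sum(c * c for c in up))
    cos_a = sum(t * u for t, u in zip(trunk, up)) / (norm_t * norm_u)
    # Clamp to guard acos against tiny floating-point overshoot.
    cos_a = max(-1.0, min(1.0, cos_a))
    return math.degrees(math.acos(cos_a))
```

An upright torso gives 0 degrees. Shifting the shoulder midpoint sideways by the same distance it sits above the hips gives 45 degrees.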

Federated Modality-specific Encoders and Partially Personalized Fusion Decoder for Multimodal Brain Tumor Segmentation

This paper proposes FedMEPD, a novel federated learning framework that addresses intermodal heterogeneity and the need for personalization in multimodal brain tumor segmentation by employing federated modality-specific encoders, a server-side fusion decoder for global optimization, and partially personalized decoders enhanced by cross-attention mechanisms to handle clients with incomplete imaging modalities.

Hong Liu, Dong Wei, Qian Dai + 3 more | 2026-03-06 | cs
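According to the summary, each imaging modality has its own federated encoder, and some clients lack certain modalities. A natural reading is that each modality's encoder is aggregated only over the clients that actually hold that modality. The sketch below assumes FedAvg-style weighting by client sample count and uses flat lists of floats in place of parameter tensors. The function `aggregate_modality_encoders` is hypothetical, and the paper's actual aggregation rule may differ.

```python
from typing import Dict, List

Weights = List[float]  # stand-in for an encoder's flattened parameters


def aggregate_modality_encoders(
    client_updates: List[Dict[str, Weights]],
    client_sizes: List[int],
) -> Dict[str, Weights]:
    """Hypothetical per-modality FedAvg: average each encoder over only the clients that have it."""
    sums: Dict[str, Weights] = {}
    totals: Dict[str, int] = {}
    for update, n in zip(client_updates, client_sizes):
        for modality, weights in update.items():
            if modality not in sums:
                sums[modality] = [0.0] * len(weights)
                totals[modality] = 0
            # Weight each client's parameters by its (assumed) sample count.
            sums[modality] = [s + n * w for s, w in zip(sums[modality], weights)]
            totals[modality] += n
    return {m: [s / totals[m] for s in sums[m]] for m in sums}
```

For example, a client with only T1 images contributes to the T1 encoder but leaves the FLAIR encoder untouched.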

FC-VFI: Faithful and Consistent Video Frame Interpolation for High-FPS Slow Motion Video Generation

The paper proposes FC-VFI, a novel video frame interpolation method that leverages latent temporal modeling, semantic matching lines, and a temporal difference loss to achieve high-fidelity, motion-consistent 4x and 8x frame rate upscaling from 30 FPS to 120/240 FPS at 2560×1440 resolution, overcoming the fidelity and consistency limitations of existing diffusion-based approaches.

Ganggui Ding, Hao Chen, Xiaogang Xu | 2026-03-06 | cs
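The frame-rate arithmetic is straightforward. A 4x upscale inserts 3 new frames between each consecutive pair (30 FPS to 120 FPS), and an 8x upscale inserts 7 (30 FPS to 240 FPS). The summary does not define its "temporal difference loss," so the second function below shows one common form as an assumption. It computes the mean L1 gap between the consecutive-frame differences of the prediction and the ground truth. Frames are flattened lists of floats, and both function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # flattened frame; stand-in for an image tensor


def upscaled_frame_count(n_input: int, factor: int) -> int:
    """Frames after inserting (factor - 1) new frames between each consecutive pair."""
    return (n_input - 1) * factor + 1


def temporal_difference_loss(pred: List[Frame], target: List[Frame]) -> float:
    """Assumed form: mean |(pred[t] - pred[t-1]) - (target[t] - target[t-1])| over pixels."""
    total, count = 0.0, 0
    for t in range(1, len(pred)):
        for p1, p0, g1, g0 in zip(pred[t], pred[t - 1], target[t], target[t - 1]):
            total += abs((p1 - p0) - (g1 - g0))
            count += 1
    return total / count if count else 0.0
```

Because this loss compares motion rather than absolute intensity, a prediction that is off by a constant brightness offset still scores zero. A loss like this is usually paired with a per-frame reconstruction term.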

Beyond the Patch: Exploring Vulnerabilities of Visuomotor Policies via Viewpoint-Consistent 3D Adversarial Object

This paper proposes a viewpoint-consistent 3D adversarial texture optimization method using differentiable rendering, Expectation over Transformation with a Coarse-to-Fine curriculum, and saliency-guided perturbations to effectively expose and exploit vulnerabilities in robot visuomotor policies under dynamic camera viewpoints.

Chanmi Lee, Minsung Yoon, Woojae Kim + 2 more | 2026-03-06 | cs
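Expectation over Transformation (EOT) optimizes an adversarial texture against the expected loss under a distribution of transformations, rather than against a single fixed view. The sketch below shows only the Monte Carlo estimate of that objective. It assumes a single viewpoint angle sampled uniformly from a symmetric range and a caller-supplied `render_and_score` function standing in for the differentiable renderer plus policy loss. Both names are hypothetical. The paper's coarse-to-fine curriculum and saliency guidance are not modeled here.

```python
import random
from typing import Callable, List


def eot_loss(
    texture: List[float],
    render_and_score: Callable[[List[float], float], float],  # hypothetical renderer + loss
    max_angle: float,
    n_samples: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of E_t[loss(render(texture, t))] over random viewpoint angles t."""
    total = 0.0
    for _ in range(n_samples):
        # Assumed transformation distribution: uniform angle in [-max_angle, max_angle].
        angle = rng.uniform(-max_angle, max_angle)
        total += render_and_score(texture, angle)
    return total / n_samples
```

In a real attack, the gradient of this average with respect to the texture would come from the differentiable renderer, and the texture would be updated to increase it.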

Think, Then Verify: A Hypothesis-Verification Multi-Agent Framework for Long Video Understanding

The paper introduces VideoHV-Agent, a multi-agent framework that improves long video understanding by replacing reactive retrieval with a structured "think-then-verify" process where hypotheses are formulated, clues are derived, and evidence is grounded before generating a final answer, achieving state-of-the-art accuracy with enhanced interpretability and lower computational cost.

Zheng Wang, Haoran Chen, Haoxuan Qin + 3 more | 2026-03-06 | cs
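The think-then-verify flow in the summary can be sketched as a small control loop. First, propose candidate hypotheses, each with the clues it implies. Next, try to ground every clue in the video. Finally, answer with the best-supported hypothesis. The `propose` and `ground` callables stand in for the LLM agents and the video retriever and are hypothetical. Choosing the hypothesis with the most grounded clues is an assumed selection rule, not necessarily VideoHV-Agent's.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    answer: str
    clues: List[str]
    evidence: List[str] = field(default_factory=list)


def think_then_verify(
    question: str,
    propose: Callable[[str], List[Hypothesis]],  # hypothetical "think" agent
    ground: Callable[[str], Optional[str]],      # hypothetical retriever: clue -> segment or None
) -> Optional[str]:
    """Return the answer whose clues are best supported by grounded video evidence."""
    best: Optional[Hypothesis] = None
    best_support = -1
    for hyp in propose(question):
        # Verify step: keep only clues that the retriever can ground in the video.
        hyp.evidence = [e for e in (ground(c) for c in hyp.clues) if e is not None]
        if len(hyp.evidence) > best_support:
            best, best_support = hyp, len(hyp.evidence)
    return best.answer if best is not None else None
```

Retrieval is driven by clues from the hypotheses rather than by the raw question, so the grounding step searches only for the evidence each candidate answer needs.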