Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

This study demonstrates that "Semantic Anchoring," a text-alignment mechanism, resolves intrinsic embedding collapse and domain-locking in cross-species pathology models. It uses language as a stable coordinate system to re-align visual features, significantly improving cancer detection in same-cancer, cross-cancer, and cross-species scenarios.

Ekansh Arora | 2026-03-06 | cs

Recognition of Daily Activities through Multi-Modal Deep Learning: A Video, Pose, and Object-Aware Approach for Ambient Assisted Living

This paper proposes a multi-modal deep learning framework for recognizing daily activities in Ambient Assisted Living. It uses cross-attention to fuse 3D CNN-based video features, pose data processed by a Graph Convolutional Network, and object detection context, and achieves competitive accuracy on the Toyota SmartHome dataset.

Kooshan Hashemifard, Pau Climent-Pérez, Francisco Florez-Revuelta | 2026-03-06 | cs

Fusion and Grouping Strategies in Deep Learning for Local Climate Zone Classification of Multimodal Remote Sensing Data

This study evaluates various deep learning fusion and grouping strategies for classifying Local Climate Zones using multimodal SAR and MSI data, demonstrating that a baseline hybrid fusion model combined with band grouping and label merging achieves the highest accuracy (76.6%) while significantly improving predictions for underrepresented classes.

Ancymol Thomas, Jaya Sreevalsan-Nair | 2026-03-06 | cs

PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing

The paper introduces PinPoint, a comprehensive real-world benchmark for Composed Image Retrieval featuring multi-answer ground truths, explicit hard negatives, and multi-image queries. The benchmark reveals significant limitations in current methods, and the paper proposes a training-free MLLM-based reranking solution to address these gaps.

Rohan Mahadev, Joyce Yuan, Patrick Poirson + 3 more | 2026-03-06 | cs

Spinverse: Differentiable Physics for Permeability-Aware Microstructure Reconstruction from Diffusion MRI

Spinverse is a differentiable physics framework that reconstructs explicit microstructural interfaces from diffusion MRI by optimizing learnable face permeabilities on a fixed tetrahedral grid. It uses geometric priors and multi-sequence optimization to overcome ill-posedness and recover complex tissue geometries without altering mesh connectivity.

Prathamesh Pradeep Khole, Mario M. Brenes, Zahra Kais Petiwala + 5 more | 2026-03-06 | cs

A Benchmark Study of Neural Network Compression Methods for Hyperspectral Image Classification

This paper presents a systematic benchmark study evaluating the effectiveness of pruning, quantization, and knowledge distillation in compressing neural networks for hyperspectral image classification, demonstrating that these methods can significantly reduce model size and computational costs while maintaining competitive accuracy for resource-constrained remote sensing applications.

Sai Shi | 2026-03-06 | cs

Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild

This paper evaluates the viability of zero-shot Multimodal LLMs for real-world video anomaly detection. It finds that while prompt engineering can significantly improve F1-scores, a persistent conservative bias toward the "normal" class severely limits recall, leaving a critical gap between current MLLM capabilities and the demands of practical surveillance.

Shanle Yao, Armin Danesh Pazho, Narges Rashvand + 1 more | 2026-03-06 | cs