From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models

This paper proposes a method that leverages pretrained vision-language models to learn compact, abstract symbolic world models from limited visual demonstrations, enabling zero-shot generalization and long-horizon planning for complex robotic tasks across novel objects, environments, and goals.

Ashay Athalye, Nishanth Kumar, Tom Silver, Yichao Liang, Jiuguang Wang, Tomás Lozano-Pérez, Leslie Pack Kaelbling · Tue, 10 Ma · cs.LG

Enhancing Alzheimer's Diagnosis: Leveraging Anatomical Landmarks in Graph Convolutional Neural Networks on Tetrahedral Meshes

This paper proposes a novel transformer-based geometric deep learning model that tokenizes tetrahedral meshes with anatomical landmarks to accurately classify Alzheimer's disease and predict brain amyloid positivity in medium-risk individuals, offering a robust alternative to costly and invasive PET scans.

Yanxi Chen, Mohammad Farazi, Zhangsihao Yang, Yonghui Fan, Nicholas Ashton, Eric M Reiman, Yi Su, Yalin Wang · Tue, 10 Ma · cs

From 2D Alignment to 3D Plausibility: Unifying Heterogeneous 2D Priors and Penetration-Free Diffusion for Occlusion-Robust Two-Hand Reconstruction

This paper proposes a unified framework for occlusion-robust two-hand reconstruction that combines a fusion-alignment encoder to implicitly integrate heterogeneous 2D structural priors from vision foundation models with a penetration-free diffusion model that guides 3D pose generation toward collision-free, kinematically coherent interactions.

Gaoge Han, Yongkang Cheng, Zhe Chen, Shaoli Huang, Tongliang Liu · Tue, 10 Ma · cs

EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video

To address the data scarcity in dexterous manipulation imitation learning, this paper introduces EgoDex, the largest and most diverse dataset of its kind featuring 829 hours of Apple Vision Pro-captured egocentric videos with precise, native 3D hand and finger tracking, alongside established benchmarks for training and evaluating manipulation policies.

Ryan Hoque, Peide Huang, David J. Yoon, Mouli Sivapurapu, Jian Zhang · Tue, 10 Ma · cs.LG

Generative Prior-Guided Neural Interface Reconstruction for 3D Electrical Impedance Tomography

This paper introduces a "solver-in-the-loop" framework for 3D Electrical Impedance Tomography that combines a pre-trained 3D generative prior with a rigorous boundary integral equation solver to enforce physical constraints as hard conditions, thereby achieving superior geometric accuracy and data efficiency in reconstructing complex interfaces compared to traditional optimization and deep learning methods.

Haibo Liu, Junqing Chen, Guang Lin · Tue, 10 Ma · math

ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers

The paper introduces ViTaPEs, a transformer-based architecture that employs a novel two-stage positional encoding strategy to effectively fuse visual and tactile modalities, achieving state-of-the-art performance and zero-shot generalization across diverse recognition and robotic grasping tasks without relying on pre-trained vision-language models.

Fotios Lygerakis, Ozan Özdenizci, Elmar Rückert · Tue, 10 Ma · cs.LG
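As background for ViTaPEs' positional-encoding strategy: the summary does not detail the paper's two-stage scheme, but transformer position codes of this kind typically build on the standard sinusoidal encoding of Vaswani et al. The sketch below is that generic baseline only, not ViTaPEs' actual method; `n_pos` and `dim` are illustrative parameter names.

```python
import numpy as np

def sincos_pos_encoding(n_pos: int, dim: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (generic transformer
    baseline, NOT the ViTaPEs scheme). Returns shape (n_pos, dim)."""
    pos = np.arange(n_pos)[:, None]          # token index, (n_pos, 1)
    i = np.arange(dim // 2)[None, :]         # frequency index, (1, dim/2)
    freqs = pos / (10000 ** (2 * i / dim))   # geometric frequency ladder
    pe = np.zeros((n_pos, dim))
    pe[:, 0::2] = np.sin(freqs)              # even dims: sine
    pe[:, 1::2] = np.cos(freqs)              # odd dims: cosine
    return pe

pe = sincos_pos_encoding(16, 8)
print(pe.shape)  # (16, 8)
```

Because each position maps to a unique phase pattern across frequencies, nearby tokens get similar codes while distant ones stay distinguishable, which is what makes such encodings useful when fusing token sequences from different modalities.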

Transforming H&E images into IHC: A Variance-Penalized GAN for Precision Oncology

This study introduces a variance-penalized GAN based on pyramid pix2pix that generates high-fidelity HER2-specific immunohistochemistry (IHC) images from routine hematoxylin and eosin (H&E) slides, effectively mitigating mode collapse and outperforming baseline models to enable cost-effective, scalable precision oncology diagnostics.

Sara Rehmat, Hafeez Ur Rehman, Byeong-Gwon Kang, Sarra Ayouni, Yunyoung Nam · Tue, 10 Ma · cs

TransUNet-GradCAM: A Hybrid Transformer-U-Net with Self-Attention and Explainable Visualizations for Foot Ulcer Segmentation

This paper presents TransUNet-GradCAM, a hybrid Vision Transformer-U-Net model that effectively segments diabetic foot ulcers by combining global attention with local feature extraction, achieving high accuracy on internal and external datasets while providing explainable visualizations for clinical utility.

Akwasi Asare, Mary Sagoe, Justice Williams Asare, Stephen Edward Moore · Tue, 10 Ma · cs
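The explainable visualizations in the TransUNet-GradCAM entry refer to Grad-CAM, a widely used technique independent of this paper. As a generic illustration (not the paper's implementation), Grad-CAM weights each channel's activation map by its spatially averaged gradient, sums, and applies a ReLU; the synthetic `acts`/`grads` arrays below stand in for a real network's feature maps and backpropagated gradients.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Generic Grad-CAM on arrays of shape (K, H, W): K channels of
    activation maps and their gradients w.r.t. a target score."""
    alphas = gradients.mean(axis=(1, 2))             # per-channel weight alpha_k
    cam = np.tensordot(alphas, activations, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                         # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                        # normalize to [0, 1]
    return cam

# Synthetic stand-ins for a real model's feature maps and gradients.
rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))
grads = rng.random((8, 7, 7))
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (7, 7)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the image, which is how segmentation models of this kind expose which regions drove a prediction.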