VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis

The paper introduces VISTA, a novel training-free framework that leverages Vision-Language Models to predict stock prices by jointly analyzing textual data and line charts through zero-shot prompting, achieving significant performance improvements over traditional statistical and text-only baselines.

Tina Khezresmaeilzadeh, Parsa Razmara, Seyedarmin Azizi, Mohammad Erfan Sadeghi, Erfan Baghaei Potraghloo · Tue, 10 Ma… · cs.LG

ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers

The paper introduces ViTaPEs, a transformer-based architecture that employs a novel two-stage positional encoding strategy to effectively fuse visual and tactile modalities, achieving state-of-the-art performance and zero-shot generalization across diverse recognition and robotic grasping tasks without relying on pre-trained vision-language models.

Fotios Lygerakis, Ozan Özdenizci, Elmar Rückert · Tue, 10 Ma… · cs.LG

MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark

This paper introduces MMTU, a large-scale benchmark comprising over 28,000 questions across 25 real-world expert-level table tasks, designed to comprehensively evaluate and reveal the significant limitations of current frontier models in understanding, reasoning, and manipulating structured tabular data.

Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Lingjiao Chen, Dongmei Zhang, Surajit Chaudhuri, H. V. Jagadish · Tue, 10 Ma… · cs.LG

EROICA: Online Performance Troubleshooting for Large-scale Model Training

This paper presents EROICA, the first online troubleshooting system deployed on production-scale GPU clusters (~100,000 GPUs). It diagnoses complex hardware and software performance issues in large-scale model training through fine-grained profiling and differential observability, with minimal impact on the training jobs themselves.

Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Pengcheng Zhang, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai · Tue, 10 Ma… · cs.LG

BemaGANv2: Discriminator Combination Strategies for GAN-based Vocoders in Long-Term Audio Generation

BemaGANv2 is a GAN-based vocoder that enhances long-term audio generation for Text-to-Music and Text-to-Audio applications. It integrates Anti-aliased Multi-Periodicity composition modules into the generator and systematically evaluates novel discriminator combination strategies, including the Multi-Envelope Discriminator, to achieve high-fidelity and temporally coherent results.

Taesoo Park, Mungwi Jeong, Mingyu Park, Narae Kim, Junyoung Kim, Mujung Kim, Jisang Yoo, Hoyun Lee, Sanghoon Kim, Soonchul Kwon · Tue, 10 Ma… · cs.LG

Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback

This paper introduces two efficient algorithms, Slate-GLM-OFU and Slate-GLM-TS, for the Logistic Contextual Slate Bandit problem. By combining local planning with global learning, they achieve Õ(√T) regret with N^O(1) per-round computational complexity, and outperform prior methods on both synthetic benchmarks and practical language-model applications.

Tanmay Goyal, Gaurav Sinha · Tue, 10 Ma… · cs.LG
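As a rough illustration of the generalized-linear Thompson Sampling primitive that Slate-GLM-TS builds on (not the paper's slate algorithm — the local-planning/global-learning decomposition is omitted), here is a minimal single-action logistic bandit with a Laplace-approximate Gaussian posterior. All names, dimensions, and the online Newton-style update are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_glm_ts(T=500, seed=0):
    """Thompson Sampling for a toy logistic (GLM) bandit.

    Maintains a Gaussian (Laplace-approximate) posterior over the unknown
    parameter, samples from it each round, and plays the arm that is best
    under the sample. Illustrative sketch, not the paper's Slate-GLM-TS.
    """
    rng = np.random.default_rng(seed)
    d = 3
    theta_star = np.array([1.0, -0.5, 0.25])   # unknown true parameter
    arms = rng.normal(size=(10, d))            # fixed arm feature vectors
    mean = np.zeros(d)                         # posterior mean
    prec = np.eye(d)                           # posterior precision
    for _ in range(T):
        theta = rng.multivariate_normal(mean, np.linalg.inv(prec))
        a = arms[np.argmax(arms @ theta)]      # greedy w.r.t. the sampled parameter
        r = float(rng.random() < sigmoid(a @ theta_star))  # Bernoulli reward
        p = sigmoid(a @ mean)
        prec += p * (1 - p) * np.outer(a, a)   # online Newton-style posterior update
        mean = mean + np.linalg.solve(prec, (r - p) * a)
    return mean, prec

mean, prec = run_glm_ts()
```

The slate setting additionally requires choosing N slots jointly, which is where the paper's planning/learning split keeps the per-round cost polynomial in N.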

Sharpness-Aware Machine Unlearning

This paper characterizes how Sharpness-Aware Minimization (SAM) alters generalization during machine unlearning, showing that it abandons its denoising properties when fitting forget signals. Building on this, the authors propose "Sharp MinMax", a method that splits the model to simultaneously learn retain signals via SAM and unlearn forget signals via sharpness maximization, achieving superior unlearning performance, reduced feature entanglement, and enhanced privacy.

Haoran Tang, Rajiv Khanna · Tue, 10 Ma… · cs.LG
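For context, the SAM primitive the summary refers to first ascends to the sharpest nearby point w + eps and then descends using the gradient taken there; sharpness maximization on forget data would flip the sign of that descent objective. Below is a minimal sketch of one SAM step on a toy quadratic loss L(w) = 0.5 · wᵀAw; rho and lr are illustrative choices, and this is not the paper's Sharp MinMax implementation.

```python
import numpy as np

def grad(w, A):
    # Gradient of the toy quadratic loss L(w) = 0.5 * w @ A @ w
    return A @ w

def sam_step(w, A, rho=0.05, lr=0.1):
    """One Sharpness-Aware Minimization step (illustrative sketch)."""
    g = grad(w, A)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascend to the sharp neighbor
    return w - lr * grad(w + eps, A)             # descend with the perturbed gradient

A = np.diag([1.0, 2.0])
w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w, A)
```

Because the descent direction is evaluated at the worst-case perturbed point, the iterates are biased toward flat minima, which is the denoising behavior the paper shows SAM gives up when it is forced to fit forget signals.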

DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy

DemoDiffusion is a one-shot imitation learning method that enables robots to perform diverse manipulation tasks from a single human demonstration. It uses kinematic retargeting to derive a rough trajectory from the demonstration, then refines that trajectory with a pre-trained diffusion policy to keep it aligned with plausible robot actions, achieving significantly higher success rates than baseline approaches without requiring task-specific training or paired data.

Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani · Tue, 10 Ma… · cs.LG