Flash-KMeans: Fast and Memory-Efficient Exact K-Means

This paper introduces Flash-KMeans, an IO-aware and contention-free GPU implementation of exact k-means that eliminates memory bottlenecks in the assignment stage and resolves atomic write contention in the update stage through kernel-level optimizations, achieving up to 17.9× speedup over existing baselines and enabling k-means as a high-performance online primitive.

Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Xiaoze Fan, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Kurt Keutzer, Song Han, Chenfeng Xu, Ion Stoica | Wed, 11 Ma | cs
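
For readers unfamiliar with the two stages named in the summary above, the following is a minimal CPU sketch of one standard formulation of exact k-means (Lloyd's algorithm). It is not the Flash-KMeans GPU kernel; the function names `assign`, `update`, and `kmeans` are hypothetical and chosen only for illustration.

```python
import random

def assign(points, centroids):
    """Assignment stage: index of the nearest centroid for each point."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def update(points, labels, centroids):
    """Update stage: each centroid becomes the mean of its assigned points."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, k in zip(points, labels):
        counts[k] += 1
        for d in range(dim):
            # Many points accumulate into the same centroid slot here; the
            # summary above locates atomic write contention in this stage.
            sums[k][d] += p[d]
    # An empty cluster keeps its previous centroid (an assumption of this sketch).
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]

def kmeans(points, k, iters=20, seed=0):
    """Alternate assignment and update until the centroids stop changing."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)
```

Calling `kmeans(points, 2)` on a list of coordinate tuples returns the final centroids and one cluster label per point. The assignment stage reads every point against every centroid, which is why it tends to be memory-bound at scale.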

MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics

This paper introduces MORLAX, a GPU-native multi-objective reinforcement learning algorithm, and MO-Playground, a suite of GPU-accelerated environments, which together enable massively parallelized training that achieves 25–270× speedups and superior Pareto fronts for complex robotics tasks compared to legacy CPU-based approaches.

Neil Janwani, Ellen Novoseller, Vernon J. Lawhern, Maegan Tucker | Wed, 11 Ma | cs
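
The summary above compares methods by their Pareto fronts. As a reminder of what that term means, here is a small sketch that filters a set of objective vectors down to the non-dominated ones, assuming every objective is to be maximized. It is a generic definition, not part of MORLAX; `dominates` and `pareto_front` are hypothetical names.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Each tuple is (objective 1, objective 2) for one hypothetical policy.
returns = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
front = pareto_front(returns)
```

A front is usually called better when its points dominate, or extend beyond, those of a competing front. This quadratic-time filter is adequate for small sets only.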

When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection

This paper introduces Geometric Semantic Decoupling (GSD), a parameter-free module that enhances the generalizability of AI-generated image detectors by explicitly removing dominant semantic priors from learned representations, thereby forcing models to rely on robust forensic artifacts rather than failing via "semantic fallback" when encountering unseen generation pipelines.

Chao Shuai, Zhenguang Liu, Shaojing Fan, Bin Gong, Weichen Lian, Xiuli Bi, Zhongjie Ba, Kui Ren | Wed, 11 Ma | cs

Towards Instance Segmentation with Polygon Detection Transformers

This paper introduces Poly-DETR, a lightweight instance segmentation framework that reformulates the task as sparse vertex regression using polar representation and specialized attention mechanisms, achieving superior performance and reduced memory consumption compared to traditional mask-based methods, particularly in high-resolution and domain-specific scenarios.

Jiacheng Sun, Jiaqi Lin, Wenlong Hu, Haoyang Li, Xinghong Zhou, Chenghai Mao, Yan Peng, Xiaomao Li | Wed, 11 Ma | cs
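
To make the idea of a polar polygon representation concrete, the sketch below decodes a center point and a list of radii into polygon vertices and computes the enclosed area. It assumes the radii are sampled at uniformly spaced angles, which is one common convention and is not confirmed by the summary above; `polar_to_polygon` and `polygon_area` are hypothetical helpers, not Poly-DETR code.

```python
import math

def polar_to_polygon(cx, cy, radii):
    """Place one vertex per radius at evenly spaced angles around (cx, cy)."""
    n = len(radii)
    return [
        (cx + r * math.cos(2 * math.pi * i / n),
         cy + r * math.sin(2 * math.pi * i / n))
        for i, r in enumerate(radii)
    ]

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Four equal radii describe a square rotated by 45 degrees.
square = polar_to_polygon(0.0, 0.0, [1.0, 1.0, 1.0, 1.0])
area = polygon_area(square)
```

Under this assumption an instance is described by a handful of numbers (a center plus one radius per angle) rather than a dense pixel mask, which is consistent with the reduced memory consumption the summary mentions.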

Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models

This paper introduces "Reasoning-Oriented Programming," an automated attack framework that bypasses Large Vision-Language Model safety alignments by chaining semantically orthogonal benign visual inputs to force the emergence of harmful logic only during late-stage reasoning, thereby outperforming existing jailbreak methods on state-of-the-art models.

Quanchen Zou, Moyang Chen, Zonghao Ying, Wenzhuo Xu, Yisong Xiao, Deyue Zhang, Dongdong Yang, Zhao Liu, Xiangzheng Zhang | Wed, 11 Ma | cs

Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval

This paper introduces RF-Mem, a novel memory retrieval framework that mimics human dual-process cognition by adaptively switching between fast familiarity-based recognition and iterative recollection-based reconstruction to achieve scalable and effective personalization in large language models.

Yingyi Zhang, Junyi Li, Wenlin Zhang, Penyue Jia, Xianneng Li, Yichao Wang, Derong Xu, Yi Wen, Huifeng Guo, Yong Liu, Xiangyu Zhao | Wed, 11 Ma | cs

Platooning as a Service (PlaaS): A Sustainable Transportation Framework for Connected and Autonomous Vehicles

This paper introduces Platooning as a Service (PlaaS), a Stackelberg game-based decision-support framework that optimizes pricing and travel distance between service providers and users to enhance sustainable transportation, while analyzing how factors like government subsidies and vehicle velocity impact profitability and carbon emissions.

Bhosale Akshay Tanaji, Sayak Roychowdhury, Anand Abraham | Wed, 11 Ma | cs

Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos

This paper introduces a large-scale framework for Vision-and-Language Navigation that leverages web-based room tour videos and implicit geometry representations to overcome simulator limitations, enabling robust zero-shot navigation agents with state-of-the-art performance across multiple benchmarks.

Mingfei Han, Haihong Hao, Liang Ma, Kamila Zhumakhanova, Ekaterina Radionova, Jingyi Zhang, Xiaojun Chang, Xiaodan Liang, Ivan Laptev | Wed, 11 Ma | cs

ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph

ForgeDreamer is a text-to-3D generation framework for industrial applications that addresses limitations in domain adaptation and geometric reasoning. It combines a Multi-Expert LoRA Ensemble for interference-free cross-category generalization with a Cross-View Hypergraph approach that captures high-order structural dependencies to ensure manufacturing-level precision.

Junhao Cai, Deyu Zeng, Junhao Pang, Lini Li, Zongze Wu, Xiaopin Zhong | Wed, 11 Ma | cs

Entangling Like Mycorrhizae: Mixing Realities Through Touch in "FungiSync"

The paper presents FungiSync, a multi-person mixed reality experience that translates the symbiotic interdependence of mycorrhizal networks into an embodied ritual where participants' individual digital perceptual worlds entangle through physical touch, fostering a "fungal epistemic" perspective that critiques accelerated individualism.

Botao Amber Hu, Danlin Huang, Yilan Elan Tao, Xiaobo Aaron Hu, Rem RunGu Lin | Wed, 11 Ma | cs

From Ideal to Real: Stable Video Object Removal under Imperfect Conditions

The paper introduces Stable Video Object Removal (SVOR), a robust framework that achieves state-of-the-art, flicker-free video object removal under real-world imperfections by employing a Mask Union strategy for stable erasure, a Denoising-Aware Segmentation head for precise localization, and a Curriculum Two-Stage training approach to handle shadows, abrupt motion, and defective masks.

Jiagao Hu, Yuxuan Chen, Fuhao Li, Zepeng Wang, Fei Wang, Daiguo Zhou, Jian Luan | Wed, 11 Ma | cs