Towards Instance Segmentation with Polygon Detection Transformers

This paper introduces Poly-DETR, a lightweight instance segmentation framework that reformulates the task as sparse vertex regression using polar representation and specialized attention mechanisms, achieving superior performance and reduced memory consumption compared to traditional mask-based methods, particularly in high-resolution and domain-specific scenarios.

Jiacheng Sun, Jiaqi Lin, Wenlong Hu, Haoyang Li, Xinghong Zhou, Chenghai Mao, Yan Peng, Xiaomao Li | 2026-03-11 | 💻 cs
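
Illustrative sketch (not taken from the paper): the Poly-DETR summary above describes instance masks as sparse polygon vertices in a polar representation. One common way to read such a representation is a center point plus one radius per evenly spaced angle. The function names, the even angular spacing, and the `start_angle` parameter below are assumptions made for illustration; the paper's actual parameterization may differ.

```python
import math


def polar_to_polygon(cx, cy, radii, start_angle=0.0):
    """Decode K radii, sampled at evenly spaced angles around (cx, cy),
    into a list of (x, y) polygon vertices.

    Assumption: angles are evenly spaced over a full turn; this is a
    hypothetical layout, not necessarily the one used by Poly-DETR.
    """
    k = len(radii)
    step = 2.0 * math.pi / k
    return [
        (cx + r * math.cos(start_angle + i * step),
         cy + r * math.sin(start_angle + i * step))
        for i, r in enumerate(radii)
    ]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Four unit radii around the origin describe a diamond with area 2.
diamond = polar_to_polygon(0.0, 0.0, [1.0, 1.0, 1.0, 1.0])
diamond_area = polygon_area(diamond)
```

Under this reading, an instance costs K numbers plus a center rather than a dense H x W mask, which is consistent with the reduced memory use the summary reports at high resolution.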

Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models

This paper introduces "Reasoning-Oriented Programming," an automated attack framework that bypasses Large Vision-Language Model safety alignments by chaining semantically orthogonal benign visual inputs to force the emergence of harmful logic only during late-stage reasoning, thereby outperforming existing jailbreak methods on state-of-the-art models.

Quanchen Zou, Moyang Chen, Zonghao Ying, Wenzhuo Xu, Yisong Xiao, Deyue Zhang, Dongdong Yang, Zhao Liu, Xiangzheng Zhang | 2026-03-11 | 💻 cs

Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval

This paper introduces RF-Mem, a novel memory retrieval framework that mimics human dual-process cognition by adaptively switching between fast familiarity-based recognition and iterative recollection-based reconstruction to achieve scalable and effective personalization in large language models.

Yingyi Zhang, Junyi Li, Wenlin Zhang, Penyue Jia, Xianneng Li, Yichao Wang, Derong Xu, Yi Wen, Huifeng Guo, Yong Liu, Xiangyu Zhao | 2026-03-11 | 💻 cs
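
Illustrative sketch (not taken from the paper): the RF-Mem summary above describes switching between a fast familiarity path and an iterative recollection path. A minimal way to picture such a switch is a confidence threshold on the best similarity score. Everything below, including the `threshold`, the query-mixing rule, and the function names, is a hypothetical stand-in rather than the authors' method.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, memory, k):
    """Return the k best (score, key) pairs from a {key: vector} memory."""
    scored = sorted(((cosine(query, vec), key) for key, vec in memory.items()),
                    reverse=True)
    return scored[:k]


def retrieve(query, memory, k=2, threshold=0.9, rounds=2):
    """Hypothetical dual-path retrieval.

    Familiarity: one similarity lookup, used when the best hit is confident.
    Recollection: otherwise, repeatedly mix the query with the mean of what
    was retrieved so far and look up again, collecting any new keys.
    """
    hits = top_k(query, memory, k)
    if not hits:
        return "recollection", []
    if hits[0][0] >= threshold:
        return "familiarity", [key for _, key in hits]

    found = [key for _, key in hits]
    current = list(query)
    for _ in range(rounds):
        vecs = [memory[key] for key in found]
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        current = [0.5 * q + 0.5 * m for q, m in zip(current, mean)]
        for _, key in top_k(current, memory, k):
            if key not in found:
                found.append(key)
    return "recollection", found


memory = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
fast_mode, fast_keys = retrieve([1.0, 0.0], memory)
slow_mode, slow_keys = retrieve([1.0, 0.6], memory, threshold=0.99)
```

The point of the sketch is only the control flow: the cheap path handles confident matches, and the costlier loop runs when the first lookup is ambiguous.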

Platooning as a Service (PlaaS): A Sustainable Transportation Framework for Connected and Autonomous Vehicles

This paper introduces Platooning as a Service (PlaaS), a Stackelberg game-based decision-support framework that optimizes pricing and travel distance between service providers and users to enhance sustainable transportation, while analyzing how factors like government subsidies and vehicle velocity impact profitability and carbon emissions.

Bhosale Akshay Tanaji, Sayak Roychowdhury, Anand Abraham | 2026-03-11 | 💻 cs
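
Illustrative sketch (not taken from the paper): the PlaaS summary above describes a Stackelberg game over pricing and travel distance. In a Stackelberg game the leader commits first and the follower best-responds, so the leader optimizes while anticipating that response. The toy model below assumes a quadratic user utility and a linear provider cost; the symbols `saving`, `cost`, and `discomfort` are hypothetical and do not come from the paper.

```python
def follower_distance(price, saving, discomfort):
    """User's best-response platooning distance.

    Assumed utility: (saving - price) * d - 0.5 * discomfort * d**2,
    which is maximized at d = (saving - price) / discomfort (never negative).
    """
    return max(0.0, (saving - price) / discomfort)


def leader_profit(price, saving, cost, discomfort):
    """Provider's profit when the user best-responds to the posted price."""
    return (price - cost) * follower_distance(price, saving, discomfort)


def solve_stackelberg(saving, cost, discomfort, steps=100):
    """Grid-search the leader's price over [0, saving], anticipating the follower."""
    candidates = [saving * i / steps for i in range(steps + 1)]
    best_price = max(candidates,
                     key=lambda p: leader_profit(p, saving, cost, discomfort))
    return best_price, follower_distance(best_price, saving, discomfort)


# With saving=1.0, cost=0.2, discomfort=0.5 the closed form gives
# price = (saving + cost) / 2 = 0.6 and distance = 0.8.
price, distance = solve_stackelberg(saving=1.0, cost=0.2, discomfort=0.5)
```

In this toy setting a per-kilometre subsidy can be modeled as a reduction in `cost`, which lowers the equilibrium price and lengthens the chosen distance; whether the paper models subsidies this way is not stated in the summary.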

Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos

This paper introduces a large-scale framework for Vision-and-Language Navigation that leverages web-based room tour videos and implicit geometry representations to overcome simulator limitations, enabling robust zero-shot navigation agents with state-of-the-art performance across multiple benchmarks.

Mingfei Han, Haihong Hao, Liang Ma, Kamila Zhumakhanova, Ekaterina Radionova, Jingyi Zhang, Xiaojun Chang, Xiaodan Liang, Ivan Laptev | 2026-03-11 | 💻 cs

ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph

ForgeDreamer is a text-to-3D generation framework for industrial applications. It addresses domain adaptation and geometric reasoning limitations by combining a Multi-Expert LoRA Ensemble, for interference-free cross-category generalization, with a Cross-View Hypergraph approach that captures high-order structural dependencies to ensure manufacturing-level precision.

Junhao Cai, Deyu Zeng, Junhao Pang, Lini Li, Zongze Wu, Xiaopin Zhong | 2026-03-11 | 💻 cs

Entangling Like Mycorrhizae: Mixing Realities Through Touch in "FungiSync"

The paper presents *FungiSync*, a multi-person mixed reality experience that translates the symbiotic interdependence of mycorrhizal networks into an embodied ritual where participants' individual digital perceptual worlds entangle through physical touch, fostering a "fungal epistemic" perspective that critiques accelerated individualism.

Botao Amber Hu, Danlin Huang, Yilan Elan Tao, Xiaobo Aaron Hu, Rem RunGu Lin | 2026-03-11 | 💻 cs

From Ideal to Real: Stable Video Object Removal under Imperfect Conditions

The paper introduces Stable Video Object Removal (SVOR), a framework for state-of-the-art, flicker-free video object removal under real-world imperfections such as shadows, abrupt motion, and defective masks. It combines a Mask Union strategy for stable erasure, a Denoising-Aware Segmentation head for precise localization, and a Curriculum Two-Stage training approach.

Jiagao Hu, Yuxuan Chen, Fuhao Li, Zepeng Wang, Fei Wang, Daiguo Zhou, Jian Luan | 2026-03-11 | 💻 cs
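
Illustrative sketch (not taken from the paper): the SVOR summary above names a Mask Union strategy but does not say what is being united. One plausible reading, assumed here, is that each frame's removal mask is replaced by the union of the masks in a small temporal window, so a mask that is missing or too small in one frame is covered by its neighbors. The `radius` parameter and the function name are hypothetical.

```python
def mask_union(masks, radius=1):
    """Temporal union of binary masks.

    masks:  list of frames, each a list of rows of 0/1 (or bool) values,
            all with the same height and width.
    radius: number of neighboring frames on each side to include
            (an assumed parameter, not taken from the paper).
    Returns a new list of boolean masks of the same shape.
    """
    n = len(masks)
    result = []
    for t in range(n):
        window = masks[max(0, t - radius):min(n, t + radius + 1)]
        height, width = len(masks[t]), len(masks[t][0])
        result.append([
            [any(m[y][x] for m in window) for x in range(width)]
            for y in range(height)
        ])
    return result


# The middle frame has an empty (defective) mask; the union fills it
# from the frames on either side.
frames = [
    [[1, 0, 0]],
    [[0, 0, 0]],
    [[0, 0, 1]],
]
stable = mask_union(frames, radius=1)
```

A larger window trades precision for stability: the erased region grows, but frame-to-frame flicker from inconsistent masks is less likely.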

ToolRosetta: Bridging Open-Source Repositories and Large Language Model Agents through Automated Tool Standardization

ToolRosetta is a unified framework that automatically transforms heterogeneous open-source code repositories into standardized, secure, and executable Model Context Protocol (MCP) tools, enabling LLM agents to autonomously plan and invoke specialized software for complex tasks with minimal human intervention.

Shimin Di, Xujie Yuan, Hanghui Guo, Chaoqian Ouyang, Zhangze Chen, Ling Yue, Libin Zheng, Jia Zhu, Shaowu Pan, Jian Yin, Min-Ling Zhang, Yong Rui | 2026-03-11 | 💻 cs

See, Plan, Rewind: Progress-Aware Vision-Language-Action Models for Robust Robotic Manipulation

The paper introduces See, Plan, Rewind (SPR), a progress-aware vision-language-action framework that enhances robotic manipulation robustness by dynamically grounding instructions into spatial subgoals and enabling closed-loop error recovery through state rewinding, achieving state-of-the-art performance on challenging benchmarks without additional training.

Tingjun Dai, Mingfei Han, Tingwen Du, Zhiheng Liu, Zhihui Li, Salman Khan, Jun Yu, Xiaojun Chang | 2026-03-11 | 💻 cs