TrajPred: Trajectory-Conditioned Joint Embedding Prediction for Surgical Instrument-Tissue Interaction Recognition in Vision-Language Models

TrajPred is a novel framework that enhances surgical instrument-tissue interaction recognition in vision-language models by encoding instrument trajectories to capture temporal motion cues and generating fine-grained visual semantic embeddings, thereby significantly improving performance and vision-text alignment on the CholecT50 benchmark.

Jiajun Cheng, Xiaofan Yu, Subarna, Sainan Liu, Shan Lin2026-03-10💻 cs

Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures

The paper introduces Mozart, an algorithm-hardware co-design framework that leverages 3.5D wafer-scale chiplet architectures with specialized expert allocation and scheduling strategies to overcome communication and memory bottlenecks in the efficient training of large-scale Mixture-of-Experts (MoE) language models.

Shuqing Luo (Katie), Ye Han (Katie), Pingzhi Li (Katie), Jiayin Qin (Katie), Jie Peng (Katie), Yang (Katie), Zhao (Kevin), Yu (Kevin), Cao, Tianlong Chen2026-03-10💻 cs

SuperSkillsStack: Agency, Domain Knowledge, Imagination, and Taste in Human-AI Design Education

This study analyzes how 80 student design teams integrated generative AI into their creative process, revealing that while AI serves as a cognitive accelerator for early-stage tasks like brainstorming, human competencies in agency, domain knowledge, imagination, and taste remain essential for interpreting context, validating outputs, and refining design solutions.

Qian Huang, King Wang Poon2026-03-10💻 cs

OV-DEIM: Real-time DETR-Style Open-Vocabulary Object Detection with GridSynthetic Augmentation

This paper presents OV-DEIM, a real-time end-to-end DETR-style open-vocabulary object detector that combines the DEIMv2 framework with a query supplement strategy and a novel GridSynthetic data augmentation technique to achieve state-of-the-art performance and efficiency, particularly for rare categories.

Leilei Wang, Longfei Liu, Xi Shen, Xuanlong Yu, Ying Tiffany He, Fei Richard Yu, Yingyi Chen2026-03-10💻 cs

Two Frames Matter: A Temporal Attack for Text-to-Video Model Jailbreaking

This paper introduces TFM, a temporal attack framework that exploits the vulnerability of text-to-video models to generate harmful content by providing only sparse boundary conditions (start and end frames) and implicitly substituting sensitive cues, thereby bypassing existing safety filters and significantly increasing jailbreak success rates.

Moyang Chen, Zonghao Ying, Wenzhuo Xu, Quancheng Zou, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang2026-03-10💻 cs

SSP: Safety-guaranteed Surgical Policy via Joint Optimization of Behavioral and Spatial Constraints

This paper introduces the Safety-guaranteed Surgical Policy (SSP) framework, which integrates Neural ODE-based uncertainty modeling with robust Control Barrier Functions to enforce behavioral and spatial constraints, thereby ensuring near-zero safety violations while maintaining high task success rates in data-driven robot-assisted surgery.

Jianshu Hu, ZhiYuan Guan, Lei Song, Kantaphat Leelakunwet, Hesheng Wang, Wei Xiao, Qi Dou, Yutong Ban2026-03-10💻 cs

Self-Supervised Multi-Modal World Model with 4D Space-Time Embedding

The paper introduces DeepEarth, a self-supervised multi-modal world model featuring Earth4D, a novel 4D space-time positional encoder that achieves state-of-the-art ecological forecasting performance and outperforms larger foundation models through efficient planetary-scale learning.

Lance Legel, Qin Huang, Brandon Voelker, Daniel Neamati, Patrick Alan Johnson, Favyen Bastani, Jeff Rose, James Ryan Hennessy, Robert Guralnick, Douglas Soltis, Pamela Soltis, Shaowen Wang2026-03-10💻 cs

Looking Back and Forth: Cross-Image Attention Calibration and Attentive Preference Learning for Multi-Image Hallucination Mitigation

This paper proposes CAPL, a framework that mitigates multi-image hallucinations in large vision-language models by introducing a selectable image token interaction mechanism for fine-grained cross-image alignment and a preference learning strategy that trains the model to rely on genuine visual evidence rather than textual priors.

Xiaochen Yang, Hao Fang, Jiawei Kong, Yaoxin Mao, Bin Chen, Shu-Tao Xia2026-03-10💻 cs

Communication Network-Aware Missing Data Recovery for Enhanced Distribution Grid Visibility

This paper proposes a communication network-aware framework that integrates routing constraints with low-rank matrix completion to mitigate spatially correlated data losses and significantly improve missing data recovery accuracy in power distribution grids compared to traditional measurement-only approaches.

Biswas Rudra Jyoti Arka, Md Zahidul Islam, Yuzhang Lin, Vinod M. Vokkarane, Junbo Zhao2026-03-10💻 cs