cs papers | Gist.Science

TrajPred: Trajectory-Conditioned Joint Embedding Prediction for Surgical Instrument-Tissue Interaction Recognition in Vision-Language Models

TrajPred is a novel framework that enhances surgical instrument-tissue interaction recognition in vision-language models by encoding instrument trajectories to capture temporal motion cues and generating fine-grained visual semantic embeddings, thereby significantly improving performance and vision-text alignment on the CholecT50 benchmark.

Jiajun Cheng, Xiaofan Yu, Subarna, Sainan Liu, Shan Lin | 2026-03-10 | cs

Privacy-Preserving Patient Identity Management Framework for Secure Healthcare Access

This paper proposes and formally evaluates a privacy-preserving, patient-centric identity management framework that ensures secure healthcare access by balancing operational reliability with strong protections against linkability and traceability through anonymous pseudonyms and conditional traceability.

Nasif Muslim, Jean-Charles Grégoire | 2026-03-10 | cs

Two-Stage Path Following for Mobile Manipulators via Dimensionality-Reduced Graph Search and Numerical Optimization

This paper proposes a robust two-stage framework for mobile manipulator path planning that combines dimensionality-reduced graph search with numerical optimization to efficiently generate smooth, kinematically feasible trajectories with sub-millimeter accuracy.

Fuyu Guo, Yuting Mei, Yuyao Zhang, Qian Tang | 2026-03-10 | cs

An Extended Consent-Based Access Control Framework: Pre-Commit Validation and Emergency Access

This paper proposes an extended Consent-Based Access Control framework that enhances patient autonomy and system performance by shifting conflict resolution to a pre-commit validation phase, formalizing immutable access invariants, and implementing a context-aware emergency mechanism that balances clinical continuity with strict data privacy.

Nasif Muslim, Jean-Charles Grégoire | 2026-03-10 | cs

Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures

The paper introduces Mozart, an algorithm-hardware co-design framework that leverages 3.5D wafer-scale chiplet architectures with specialized expert allocation and scheduling strategies to overcome communication and memory bottlenecks in the efficient training of large-scale Mixture-of-Experts (MoE) language models.

Shuqing Luo, Ye Han, Pingzhi Li, Jiayin Qin, Jie Peng, Yang (Katie) Zhao, Yu (Kevin) Cao, Tianlong Chen | 2026-03-10 | cs

SuperSkillsStack: Agency, Domain Knowledge, Imagination, and Taste in Human-AI Design Education

This study analyzes how 80 student design teams integrated generative AI into their creative process, revealing that while AI serves as a cognitive accelerator for early-stage tasks like brainstorming, human competencies in agency, domain knowledge, imagination, and taste remain essential for interpreting context, validating outputs, and refining design solutions.

Qian Huang, King Wang Poon | 2026-03-10 | cs

OV-DEIM: Real-time DETR-Style Open-Vocabulary Object Detection with GridSynthetic Augmentation

This paper presents OV-DEIM, a real-time end-to-end DETR-style open-vocabulary object detector that combines the DEIMv2 framework with a query supplement strategy and a novel GridSynthetic data augmentation technique to achieve state-of-the-art performance and efficiency, particularly for rare categories.

Leilei Wang, Longfei Liu, Xi Shen, Xuanlong Yu, Ying Tiffany He, Fei Richard Yu, Yingyi Chen | 2026-03-10 | cs

Enhancing Web Agents with a Hierarchical Memory Tree

This paper proposes the Hierarchical Memory Tree (HMT), a structured framework that decouples high-level task logic from site-specific action details through a three-level abstraction hierarchy, thereby significantly enhancing the generalization and robustness of large language model-based web agents in unseen environments.

Yunteng Tan, Zhi Gao, Xinxiao Wu | 2026-03-10 | cs

Two Frames Matter: A Temporal Attack for Text-to-Video Model Jailbreaking

This paper introduces TFM, a temporal attack framework that exploits the vulnerability of text-to-video models to generate harmful content by providing only sparse boundary conditions (start and end frames) and implicitly substituting sensitive cues, thereby bypassing existing safety filters and significantly increasing jailbreak success rates.

Moyang Chen, Zonghao Ying, Wenzhuo Xu, Quancheng Zou, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang | 2026-03-10 | cs

Improved Leakage Abuse Attacks in Searchable Symmetric Encryption with eBPF Monitoring

This paper demonstrates that leveraging eBPF-based system-level monitoring reveals new leakage patterns in Searchable Symmetric Encryption (SSE) that extend beyond traditional threat models, thereby enabling more powerful leakage abuse attacks and highlighting the critical need to address system-level exposures in SSE defenses.

Chinecherem Dimobi | 2026-03-10 | cs

SSP: Safety-guaranteed Surgical Policy via Joint Optimization of Behavioral and Spatial Constraints

This paper introduces the Safety-guaranteed Surgical Policy (SSP) framework, which integrates Neural ODE-based uncertainty modeling with robust Control Barrier Functions to enforce behavioral and spatial constraints, thereby ensuring near-zero safety violations while maintaining high task success rates in data-driven robot-assisted surgery.

Jianshu Hu, ZhiYuan Guan, Lei Song, Kantaphat Leelakunwet, Hesheng Wang, Wei Xiao, Qi Dou, Yutong Ban | 2026-03-10 | cs

Monetizing Generative AI: YouTubers' Collective Knowledge on Earning from Generative AI Content

This paper analyzes 377 YouTube videos to map the collective knowledge creators share about monetizing Generative AI, identifying ten common use cases and revenue strategies while highlighting structural tensions such as unverifiable income claims and shifting authorship norms in AI-mediated creative labor.

Shuo Niu, Yao Lyu, He Zhang, Na Li, Bumjin Kim, Jie Cai | 2026-03-10 | cs

Self-Supervised Multi-Modal World Model with 4D Space-Time Embedding

The paper introduces DeepEarth, a self-supervised multi-modal world model featuring Earth4D, a novel 4D space-time positional encoder that achieves state-of-the-art ecological forecasting performance and outperforms larger foundation models through efficient planetary-scale learning.

Lance Legel, Qin Huang, Brandon Voelker, Daniel Neamati, Patrick Alan Johnson, Favyen Bastani, Jeff Rose, James Ryan Hennessy, Robert Guralnick, Douglas Soltis, Pamela Soltis, Shaowen Wang | 2026-03-10 | cs

TacDexGrasp: Compliant and Robust Dexterous Grasping with Tactile Feedback

TacDexGrasp is a robust dexterous grasping framework that utilizes tactile feedback and a Second-Order Cone Programming controller to actively constrain tangential-to-normal force ratios, thereby preventing both translational and rotational slip without requiring explicit torque modeling or slip detection.

Yubin Ke, Jiayi Chen, Hang Lv, Xiao Zhou, He Wang | 2026-03-10 | cs

AIReSim: A Discrete Event Simulator for Large-scale AI Cluster Reliability Modeling

The paper introduces AIReSim, a discrete event simulator designed to help system designers evaluate and tune reliability mechanisms, prioritize improvements, and plan capacity for large-scale AI clusters by modeling the complex tradeoffs involved in failure, recovery, scheduling, and repair processes.

Karthik Pattabiraman, Mihir Patel, Fred Lin | 2026-03-10 | cs

Fine-Grained 3D Facial Reconstruction for Micro-Expressions

This paper proposes a novel fine-grained 3D facial reconstruction method for micro-expressions that integrates global dynamic features with locally-enriched cues from 2D motions, facial priors, and 3D geometry to overcome data scarcity and achieve superior geometric accuracy and perceptual detail compared to state-of-the-art approaches.

Che Sun, Xinjie Zhang, Rui Gao, Xu Chen, Yuwei Wu, Yunde Jia | 2026-03-10 | cs

Understanding User Requirements for Creating Sensor-Powered Smart Car Cabins Through Retrofitting

This paper investigates the potential of retrofitting to enhance sensor-powered smart car cabins by identifying limitations in built-in systems through interviews and defining user requirements via participatory design, ultimately offering design recommendations for future retrofit solutions.

Bofan Yu, Borui Li, Tingyu Zhang, Xing-Dong Yang | 2026-03-10 | cs

Looking Back and Forth: Cross-Image Attention Calibration and Attentive Preference Learning for Multi-Image Hallucination Mitigation

This paper proposes CAPL, a framework that mitigates multi-image hallucinations in large vision-language models by introducing a selectable image token interaction mechanism for fine-grained cross-image alignment and a preference learning strategy that trains the model to rely on genuine visual evidence rather than textual priors.

Xiaochen Yang, Hao Fang, Jiawei Kong, Yaoxin Mao, Bin Chen, Shu-Tao Xia | 2026-03-10 | cs

Communication Network-Aware Missing Data Recovery for Enhanced Distribution Grid Visibility

This paper proposes a communication network-aware framework that integrates routing constraints with low-rank matrix completion to mitigate spatially correlated data losses and significantly improve missing data recovery accuracy in power distribution grids compared to traditional measurement-only approaches.

Biswas Rudra Jyoti Arka, Md Zahidul Islam, Yuzhang Lin, Vinod M. Vokkarane, Junbo Zhao | 2026-03-10 | cs

Leveraging Large Language Models for Automated Scalable Development of Open Scientific Databases

This paper introduces a scalable, domain-agnostic web-based framework that leverages Large Language Models to automate the collection, filtering, and construction of open scientific databases, achieving 90% overlap with expert-curated datasets while significantly reducing manual workload.

Nikita Gautam, Doina Caragea, Ignacio Ciampitti, Federico Gomez | 2026-03-10 | cs
