cs papers | Gist.Science

Differentiable Variable Fonts

This paper introduces "Differentiable Variable Fonts," a framework that mathematically connects variable font parameters to vector graphics to enable gradient-based optimization for automated, intuitive text design and animation workflows while preserving legibility and aesthetics.

Kinjal Parikh, Danny M. Kaufman, David I. W. Levin, Alec Jacobson | 2026-03-10 | 💻 cs

EB-MBD: Emerging-Barrier Model-Based Diffusion for Safe Trajectory Optimization in Highly Constrained Environments

This paper introduces Emerging-Barrier Model-Based Diffusion (EB-MBD), a novel approach that integrates progressively introduced barrier functions inspired by interior point methods to overcome the sample inefficiency and catastrophic performance degradation of standard Model-Based Diffusion in highly constrained environments, achieving superior solution quality and computational efficiency without expensive projection operations.

Raghav Mishra, Ian R. Manchester | 2026-03-10 | 💻 cs

Real-Time Motion-Controllable Autoregressive Video Diffusion

The paper introduces AR-Drag, a reinforcement learning-enhanced few-step autoregressive video diffusion model that achieves real-time, high-fidelity image-to-video generation with diverse motion control while significantly reducing latency compared to existing bidirectional approaches.

Kesen Zhao, Jiaxin Shi, Beier Zhu, Junbao Zhou, Xiaolong Shen, Yuan Zhou, Qianru Sun, Hanwang Zhang | 2026-03-10 | 💻 cs

CDE: Concept-Driven Exploration for Reinforcement Learning

This paper proposes Concept-Driven Exploration (CDE), a reinforcement learning framework that leverages a pre-trained vision-language model to generate object-centric concepts as noisy supervisory signals, using concept reconstruction accuracy as an intrinsic reward to guide efficient, targeted exploration in visual control tasks and achieve robust real-world transfer.

Le Mao, Andrew H. Liu, Renos Zabounidis, Yanan Niu, Zachary Kingston, Joseph Campbell | 2026-03-10 | 💻 cs

Deliberative Dynamics and Value Alignment in LLM Debates

This paper investigates how different deliberation protocols (synchronous vs. round-robin) and model architectures influence value alignment and verdict revision in multi-turn LLM debates, revealing significant behavioral disparities where GPT-4.1 exhibits strong inertia and autonomy-focused reasoning while Claude 3.7 Sonnet and Gemini 2.0 Flash demonstrate greater flexibility, empathy, and susceptibility to order effects.

Pratik S. Sachdeva, Tom van Nuenen | 2026-03-10 | 💻 cs

Reallocating Attention Across Layers to Reduce Multimodal Hallucination

This paper proposes a lightweight, training-free plugin called Functional Head Identification and Class-Conditioned Rescaling that mitigates multimodal hallucinations in large reasoning models by adaptively rebalancing perception and reasoning contributions across layers, achieving significant performance gains with minimal computational overhead.

Haolang Lu, Bolun Chu, WeiYe Fu, Guoshun Nan, Junning Liu, Minghui Pan, Qiankun Li, Yi Yu, Hua Wang, Kun Wang | 2026-03-10 | 💻 cs

Preference-Conditioned Multi-Objective RL for Integrated Command Tracking and Force Compliance in Humanoid Locomotion

This paper proposes a preference-conditioned multi-objective reinforcement learning framework that enables a single humanoid locomotion policy to dynamically balance accurate command tracking with compliant responses to external forces, validated through stable training and successful deployment in both simulation and real-world experiments.

Tingxuan Leng, Yushi Wang, Tinglong Zheng, Changsheng Luo, Mingguo Zhao | 2026-03-10 | 💻 cs

DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models

This paper introduces DropVLA, an action-level backdoor attack that covertly manipulates Vision-Language-Action models to execute specific safety-critical actions at attacker-chosen decision points using minimal vision-based data poisoning while maintaining high nominal task performance.

Zonghuan Xu, Jiayu Li, Yunhan Zhao, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang | 2026-03-10 | 💻 cs

Ego-Vision World Model for Humanoid Contact Planning

This paper presents a demonstration-free framework that combines a learned ego-vision world model with sampling-based Model Predictive Control and a surrogate value function to enable humanoid robots to perform robust, real-time physical contact planning in unstructured environments.

Hang Liu, Yuman Gao, Sangli Teng, Yufeng Chi, Yakun Sophia Shao, Zhongyu Li, Maani Ghaffari, Koushil Sreenath | 2026-03-10 | 💻 cs

Protege Effect for Behaviour Change: Does Teaching Digital Stress Solutions to Others Reduce One's Own?

This study found that a protégé-based approach, where individuals teach others about managing digital stress, did not significantly reduce their own problematic digital behaviors compared to control groups, highlighting the challenges of translating cognitive engagement into actual behavioral change.

Sameha Alshakhsi, Ala Yankouskaya, Dena Al-Thani, Raian Ali | 2026-03-10 | 💻 cs

Unsupervised Deep Generative Models for Anomaly Detection in Neuroimaging: A Systematic Scoping Review

This systematic scoping review synthesizes thirty-three studies on unsupervised deep generative models for neuroimaging anomaly detection, highlighting their potential for pathology-agnostic localization in data-scarce settings while identifying key challenges such as methodological heterogeneity and limited external validation.

Youwan Mahé, Elise Bannier, Stéphanie Leplaideur, Elisa Fromont, Francesca Galassi | 2026-03-10 | 💻 cs

A Robust Placeability Metric for Model-Free Unified Pick-and-Place Reasoning

This paper introduces a robust, model-free probabilistic metric that evaluates 6D placement poses from partial point clouds by jointly scoring stability, graspability, and clearance, thereby enabling reliable and unified pick-and-place reasoning for unseen objects on diverse support geometries.

Benno Wingender, Nils Dengler, Rohit Menon, Sicong Pan, Maren Bennewitz | 2026-03-10 | 💻 cs

Taming Modality Entanglement in Continual Audio-Visual Segmentation

This paper introduces the Continual Audio-Visual Segmentation (CAVS) task and proposes a Collision-based Multi-modal Rehearsal (CMR) framework that effectively addresses multi-modal semantic drift and co-occurrence confusion through novel sample selection and frequency adjustment strategies, significantly outperforming existing single-modal continual learning methods.

Yuyang Hong, Qi Yang, Tao Zhang, Zili Wang, Zhaojin Fu, Kun Ding, Bin Fan, Shiming Xiang | 2026-03-10 | 💻 cs

PolyJailbreak: Cross-Modal Jailbreaking Attacks on Black-Box Multimodal LLMs

This paper introduces PolyJailbreak, a novel black-box framework that exploits multimodal safety asymmetries through a structured library of atomic strategies and reinforcement learning-based multi-agent optimization to achieve significantly higher jailbreak success rates on state-of-the-art multimodal large language models compared to existing methods.

Xinkai Wang, Beibei Li, Zerui Shao, Ao Liu, Guangquan Xu, Shouling Ji | 2026-03-10 | 💻 cs

HumanHalo - Safe and Efficient 3D Navigation Among Humans via Minimally Conservative MPC

This paper presents HumanMPC, a Model Predictive Control framework that ensures safe and efficient 3D navigation for Micro Air Vehicles among humans by combining data-driven motion forecasting with a novel reachability-based safety formulation that minimizes conservatism while guaranteeing collision avoidance.

Simon Schaefer, Helen Oleynikova, Sandra Hirche, Stefan Leutenegger | 2026-03-10 | 💻 cs

Khelte Khelte Shikhi: A Proposed HCI Framework for Gamified Interactive Learning with Minecraft in Bangladeshi Education Systems

This paper proposes a practical, three-tiered HCI framework for deploying localized, gamified Minecraft learning in Bangladesh's resource-constrained schools, addressing critical infrastructure gaps through adaptive offline and low-power solutions while outlining specific curriculum-aligned content and evaluation benchmarks for future pilot testing.

Mohd Ruhul Ameen, Akif Islam, Momen Khandokar Ope | 2026-03-10 | 💻 cs

Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

This paper introduces Dream4Drive, a novel synthetic data generation framework that leverages 3D-aware guidance and a fine-tuned driving world model to create diverse, multi-view corner cases, enhancing downstream perception tasks in autonomous driving with gains that are not negated by increased training epochs.

Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang | 2026-03-10 | 💻 cs

MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting

MoE-GS introduces a novel Mixture-of-Experts framework for dynamic Gaussian Splatting that utilizes a Volume-aware Pixel Router to adaptively blend heterogeneous deformation priors for superior novel view synthesis, while addressing efficiency concerns through multi-expert rendering optimizations and knowledge distillation.

In-Hwan Jin, Hyeongju Mun, Joonsoo Kim, Kugjin Yun, Kyeongbo Kong | 2026-03-10 | 💻 cs

Next Generation Cloud-native In-Memory Stores: From Redis to Valkey and Beyond

This study provides a comprehensive experimental benchmark of emerging cloud-native in-memory key-value stores—including Valkey, KeyDB, and Garnet—within Kubernetes environments to evaluate their performance trade-offs, resource efficiency, and long-term viability as alternatives to Redis.

Carl-Johan Fauvelle Munck af Rosenschöld, Feras M. Awaysheh, Ahmad Awad | 2026-03-10 | 💻 cs

Human-Centered LLM-Agent System for Detecting Anomalous Digital Asset Transactions

This paper presents HCLA, a human-centered multi-agent system that enhances transparency and accountability in digital asset anomaly detection by reconstructing traceable, expert-style reasoning processes through a conversational workflow that separates evidence scoring from justification, rather than merely explaining black-box models.

Gyuyeon Na, Minjung Park, Hyeonjeong Cha, Sangmi Chai | 2026-03-10 | 💻 cs
