cs papers | Gist.Science

Tokenizing Semantic Segmentation with RLE

This paper introduces a unified language modeling approach for semantic and panoptic segmentation in images and videos that discretizes masks into run-length encoded tokens, employing novel compression strategies to enable autoregressive generation despite computational constraints.

Abhineet Singh, Justin Rozeboom, Nilanjan Ray | 2026-03-10 | 💻 cs
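For readers unfamiliar with the idea in the summary above, here is a minimal sketch of how a segmentation mask can be run-length encoded and flattened into a token sequence. It is a generic illustration, not the paper's tokenizer: the `(start, length, label)` run layout, the row-major flattening, the choice to skip background label 0, and the function names `rle_encode`, `rle_decode`, and `to_tokens` are all assumptions made for this example.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (start index, run length, class label) -- assumed layout


def rle_encode(mask: List[List[int]]) -> List[Run]:
    """Encode a 2D label mask as runs over its row-major flattening.

    Background pixels (label 0) are skipped, so only foreground runs are stored.
    Runs may cross row boundaries because the mask is flattened first.
    """
    flat = [value for row in mask for value in row]
    runs: List[Run] = []
    i = 0
    while i < len(flat):
        label = flat[i]
        j = i
        # Advance j to the end of the current run of identical labels.
        while j < len(flat) and flat[j] == label:
            j += 1
        if label != 0:
            runs.append((i, j - i, label))
        i = j
    return runs


def rle_decode(runs: List[Run], height: int, width: int) -> List[List[int]]:
    """Rebuild the 2D mask from its runs; unlisted pixels become background (0)."""
    flat = [0] * (height * width)
    for start, length, label in runs:
        flat[start:start + length] = [label] * length
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def to_tokens(runs: List[Run]) -> List[int]:
    """Flatten runs into a single integer sequence a language model could emit."""
    return [value for run in runs for value in run]


if __name__ == "__main__":
    mask = [
        [0, 1, 1, 0],
        [0, 0, 2, 2],
    ]
    runs = rle_encode(mask)
    print(runs)
    print(to_tokens(runs))
    print(rle_decode(runs, 2, 4) == mask)
```

The token count grows with the number of runs rather than the number of pixels, which is why the summary's mention of compression matters for autoregressive generation: long or fragmented masks still produce long sequences.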

EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs

This paper introduces EmoOmni, a unified framework that leverages an emotional Chain-of-Thought (E-CoT) to bridge the gap between fine-grained multimodal perception and accurate emotional expression in Omni-LLMs, accompanied by a new dataset and benchmark for systematic evaluation.

Wenjie Tian, Zhixian Zhao, Jingbin Hu, Huakang Chen, Haohe Liu, Binshen Mu, Lei Xie | 2026-03-10 | 💻 cs

CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints

CryoNet.Refine is a novel one-step diffusion model that automates and accelerates the refinement of atomic structures against cryo-EM density maps, outperforming traditional tools like Phenix in both model-map correlation and geometric quality while supporting diverse protein and nucleic acid complexes.

Fuyao Huang, Xiaozhu Yu, Kui Xu, Qiangfeng Cliff Zhang | 2026-03-10 | 💻 cs

Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists?

This paper argues that AI agents equipped with specialized skills can augment, but not fully replace, social scientists by executing codifiable research tasks autonomously through "vibe researching," while highlighting the enduring necessity of human theoretical originality and tacit knowledge alongside the profession's emerging risks of stratification and pedagogical crisis.

Yongjun Zhang | 2026-03-10 | 💻 cs

Decomposing Physician Disagreement in HealthBench

This paper analyzes physician disagreement in the HealthBench dataset, revealing that while the majority of variance is structural and irreducible, a small but actionable portion stems from reducible uncertainties like missing context, suggesting that improving evaluation design to close information gaps could meaningfully reduce disagreement on borderline medical AI cases.

Satya Borgohain, Roy Mariathas | 2026-03-10 | 💻 cs

WISER: Wider Search, Deeper Thinking, and Adaptive Fusion for Training-Free Zero-Shot Composed Image Retrieval

WISER is a training-free framework for Zero-Shot Composed Image Retrieval that unifies Text-to-Image and Image-to-Image paradigms through a "retrieve-verify-refine" pipeline, leveraging wider search, adaptive fusion, and self-reflection to significantly outperform existing methods across diverse benchmarks.

Tianyue Wang, Leigang Qu, Tianyu Yang, Xiangzhao Hao, Yifan Xu, Haiyun Guo, Jinqiao Wang | 2026-03-10 | 💻 cs

PackUV: Packed Gaussian UV Maps for 4D Volumetric Video

The paper introduces PackUV, a novel 4D Gaussian representation and fitting method that maps volumetric video attributes into structured UV atlases for efficient, codec-compatible storage and streaming, while demonstrating superior temporal consistency and rendering fidelity on the newly proposed large-scale PackUV-2B dataset.

Aashish Rai, Angela Xing, Anushka Agarwal, Xiaoyan Cong, Zekun Li, Tao Lu, Aayush Prakash, Srinath Sridhar | 2026-03-10 | 💻 cs

On Sample-Efficient Generalized Planning via Learned Transition Models

This paper proposes a sample-efficient approach to generalized planning that learns explicit neural transition models to predict intermediate world states, demonstrating superior out-of-distribution performance and data efficiency compared to direct action-sequence prediction methods.

Nitin Gupta, Vishal Pallagani, John A. Aydin, Biplav Srivastava | 2026-03-10 | 💻 cs

Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning

This paper proposes HART, an annotation-free framework that leverages a novel Advantage Preference Group Relative Policy Optimization (AP-GRPO) algorithm to enable Large Multimodal Models to autonomously identify and verify key high-resolution image regions, thereby improving reasoning performance without requiring costly human grounding labels.

Jiacheng Yang, Anqi Chen, Yunkai Dang, Qi Fan, Cong Wang, Wenbin Li, Feng Miao, Yang Gao | 2026-03-10 | 💻 cs

PEPA: a Persistently Autonomous Embodied Agent with Personalities

This paper introduces PEPA, a three-layer cognitive architecture that leverages personality traits to enable embodied agents to autonomously generate goals and sustain long-term operation in dynamic environments without relying on external task specifications.

Kaige Liu, Yang Li, Lijun Zhu, Weinan Zhang | 2026-03-10 | 💻 cs

Self-Attention And Beyond the Infinite: Towards Linear Transformers with Infinite Self-Attention

This paper introduces Infinite Self-Attention (InfSA) and its linear-time variant, Linear-InfSA, a spectral reformulation of self-attention as a diffusion process on token graphs that achieves state-of-the-art ImageNet accuracy and enables efficient, memory-free inference at ultra-high resolutions (up to 9216×9216) by replacing the quadratic softmax cost with a Neumann series approximation.

Giorgio Roffo, Luke Palmer | 2026-03-10 | 💻 cs
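As background for the Neumann series mentioned in the summary above, the standard identity is reproduced here. This is the textbook result, not the paper's specific formulation: the matrix $M$, the truncation order $K$, and the spectral radius notation $\rho(M)$ are symbols assumed for this note.

```latex
% Neumann series: valid when the spectral radius satisfies \rho(M) < 1.
\[
(I - M)^{-1} \;=\; \sum_{k=0}^{\infty} M^{k},
\qquad \rho(M) < 1 .
\]
% Truncating after K terms gives an approximation with an explicit remainder.
\[
(I - M)^{-1} \;=\; \sum_{k=0}^{K} M^{k} \;+\; M^{K+1}\,(I - M)^{-1} .
\]
```

The truncated sum needs only repeated multiplications by $M$, so no explicit matrix inverse is formed, and the remainder shrinks as $K$ grows whenever $\rho(M) < 1$.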

WildActor: Unconstrained Identity-Preserving Video Generation

This paper introduces WildActor, a framework for unconstrained identity-preserving human video generation that leverages the large-scale Actor-18M dataset and novel attention mechanisms to overcome existing limitations in maintaining consistent full-body identities across dynamic shots, viewpoints, and motions.

Qin Guo, Tianyu Yang, Xuanhua He, Fei Shen, Yong Zhang, Zhuoliang Kang, Xiaoming Wei, Dan Xu | 2026-03-10 | 💻 cs

Position: Evaluation of Visual Processing Should Be Human-Centered, Not Metric-Centered

This position paper argues that the evaluation of modern visual processing systems must shift from a reliance on single-metric benchmarks toward a human-centered, context-aware paradigm to better align with human perception and foster genuine innovation.

Jinfan Hu, Fanghua Yu, Zhiyuan You, Xiang Yin, Hongyu An, Xinqi Lin, Chao Dong, Jinjin Gu | 2026-03-10 | 💻 cs

Sustainable Care: Designing Technologies That Support Children's Long-Term Engagement with Social Issues

This workshop paper proposes "sustainable care" as a design framework to help researchers and practitioners create digital technologies that foster children's long-term, meaningful engagement with social issues while preventing empathic distress and burnout.

JaeWon Kim, Aayushi Dangol, Rotem Landesman, Alexis Hiniker, McKenna F. Parnes | 2026-03-10 | 💻 cs

DeAR: Fine-Grained VLM Adaptation by Decomposing Attention Head Roles

The paper proposes DeAR, a fine-grained adaptation framework for Vision-Language Models that decomposes attention heads into functional roles (Attribute, Generalization, and Mixed) using a Concept Entropy metric to selectively isolate task-specific learning from generalization capabilities, thereby achieving superior performance across diverse tasks while preserving zero-shot robustness.

Yiming Ma, Hongkun Yang, Lionel Z. Wang, Bin Chen, Weizhi Xian, Jianzhi Teng | 2026-03-10 | 💻 cs

Digital Twin-Based Cooling System Optimization for Data Center

This paper presents a validated digital twin of the Frontier supercomputer's liquid cooling system to demonstrate that a ramp-constrained, joint optimization of flow rate and supply temperature can achieve 27.8% energy savings, significantly outperforming flow-only strategies by addressing the gap between theoretical optima and operational deployability.

Shrenik Jadhav, Zheng Liu | 2026-03-10 | 💻 cs

Extended Empirical Validation of the Explainability Solution Space

This technical report extends the empirical validation of the Explainability Solution Space (ESS) framework by demonstrating its domain-independent applicability and systematic adaptability to diverse governance roles and stakeholder configurations through a cross-domain evaluation involving both employee attrition and urban resource allocation systems.

Antoni Mestre, Manoli Albert, Miriam Gil, Vicente Pelechano | 2026-03-10 | 💻 cs

Energy Efficient Traffic Scheduling For Optical LEO Satellite Downlinks

This paper proposes and evaluates static and adaptive traffic scheduling schemes—including threshold, heuristic, and reinforcement learning-based approaches—to optimize energy efficiency and delivery ratios for energy-constrained optical LEO satellite downlinks facing weather-related disruptions.

Ethan Fettes, Pablo G. Madoery, Halim Yanikomeroglu, Gunes Karabulut Kurt, Abhishek Naik, Stéphane Martel | 2026-03-10 | 💻 cs

HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

HarmonyCell is an end-to-end agent framework that automates single-cell perturbation modeling by combining an LLM-driven semantic unifier to resolve metadata incompatibilities and an adaptive Monte Carlo Tree Search engine to synthesize architectures that handle distribution shifts, thereby achieving high execution success and outperforming expert baselines without manual engineering.

Wenxuan Huang, Mingyu Tsoi, Yanhao Huang, Xinjie Mao, Xue Xia, Hao Wu, Jiaqi Wei, Yuejin Yang, Lang Yu, Cheng Tan, Xiang Zhang, Zhangyang Gao, Siqi Sun | 2026-03-10 | 💻 cs

LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning

This paper proposes a novel LLM-driven closed-loop framework that maps natural language instructions to executable rules and semantically annotates options to enhance the data efficiency, interpretability, and cross-environment transferability of Deep Reinforcement Learning, with experimental validation showing superior performance in constraint compliance and skill reuse.

Chang Yao, Jinghui Qin, Kebing Jin, Hankz Hankui Zhuo | 2026-03-10 | 💻 cs
