cs papers | Gist.Science

Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos

This paper introduces Synthetic Visual Genome 2 (SVG2), a large-scale automated panoptic video scene graph dataset with over 636K videos, and presents TRaSER, a novel model that leverages trajectory-aligned token mechanisms to significantly outperform existing baselines in scene graph generation and downstream video question answering tasks.

Ziqi Gao, Jieyu Zhang, Wisdom Oluchi Ikezogwo, Jae Sung Park, Tario G. You, Daniel Ogbu, Chenhao Zheng, Weikai Huang, Yinuo Yang, Winson Han, Quan Kong, Rajat Saini, Ranjay Krishna | 2026-03-09 | 💻 cs

Learning Robust Control Policies for Inverted Pose on Miniature Blimp Robots

This paper presents a novel framework that combines a calibrated 3D simulation environment, a robust TD3-based control policy with domain randomization, and a mapping layer to successfully enable miniature blimp robots to achieve and maintain inverted poses in real-world settings.

Yuanlin Yang, Lin Hong, Fumin Zhang | 2026-03-09 | 💻 cs

Adaptive Dynamic Dehazing via Instruction-Driven and Task-Feedback Closed-Loop Optimization for Diverse Downstream Task Adaptation

This paper proposes a novel adaptive dynamic dehazing framework that utilizes a closed-loop optimization mechanism combining task performance feedback and text-based instruction guidance to enable real-time, training-free adaptation of dehazing outputs for diverse downstream vision tasks.

Yafei Zhang, Shuaitian Song, Huafeng Li, Shujuan Wang, Yu Liu | 2026-03-09 | 💻 cs

Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark

This paper introduces PanScale, a large-scale cross-scale pansharpening dataset and benchmark, alongside ScaleFormer, a novel transformer-based architecture that achieves superior generalization across varying image resolutions by reframing scale adaptation as sequence length generalization through tokenization and rotary positional encoding.

Ke Cao, Xuanhua He, Xueheng Li, Lingting Zhu, Yingying Wang, Ao Ma, Zhanjie Zhang, Man Zhou, Chengjun Xie, Jie Zhang | 2026-03-09 | 💻 cs

From OCR to Analysis: Tracking Correction Provenance in Digital Humanities Pipelines

This paper proposes a provenance-aware framework for tracking OCR correction lineage in digital humanities pipelines, demonstrating that recording edit details at the span level significantly improves the reproducibility and interpretability of downstream NLP tasks by revealing how textual transformations impact scholarly analysis.

Haoze Guo, Ziqi Wei | 2026-03-09 | 💻 cs

Mobile-VTON: High-Fidelity On-Device Virtual Try-On

Mobile-VTON is a high-fidelity, privacy-preserving framework that enables fully offline virtual try-on on commodity mobile devices by utilizing a modular TGT architecture with feature-guided adversarial distillation and trajectory-consistency training to match server-based performance without requiring cloud computing.

Zhenchen Wan, Ce Chen, Runqi Lin, Jiaxin Huang, Tianxi Chen, Yanwu Xu, Tongliang Liu, Mingming Gong | 2026-03-09 | 💻 cs

ROSER: Few-Shot Robotic Sequence Retrieval for Scalable Robot Learning

The paper introduces ROSER, a lightweight few-shot retrieval framework that extracts reusable, task-centric segments from unlabeled robotic logs using only 3-5 reference examples, thereby overcoming data scarcity by enabling scalable, high-accuracy utilization of large-scale continuous interaction datasets without task-specific training.

Zillur Rahman, Eddison Pham, Alejandro Daniel Noel, Cristian Meo | 2026-03-09 | 💻 cs

FastLightGen: Fast and Light Video Generation with Fewer Steps and Parameters

FastLightGen is a novel algorithm that simultaneously compresses model parameters and reduces inference steps through an optimized teacher-student distillation framework, achieving state-of-the-art efficiency and visual quality in video generation with significantly fewer resources.

Shitong Shao, Yufei Gu, Zeke Xie | 2026-03-09 | 💻 cs

VSearcher: Long-Horizon Multimodal Search Agent via Reinforcement Learning

This paper introduces VSearcher, a reinforcement learning-based multimodal search agent that transforms static models into capable long-horizon web browsers through an iterative data synthesis pipeline and an SFT-then-RL training strategy, achieving superior performance on the proposed MM-SearchExam benchmark.

Ruiyang Zhang, Qianguo Sun, Chao Song, Yiyan Qi, Zhedong Zheng | 2026-03-09 | 💻 cs

Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models

This paper introduces Think-as-You-See (TaYS), a unified framework that enables concurrent, streaming Chain-of-Thought reasoning for Large Vision-Language Models by decoupling visual encoding from textual reasoning, thereby outperforming traditional batch and interleaved approaches in both accuracy and latency for real-time video understanding.

Jialiang Zhang, Junlong Tong, Junyan Lin, Hao Wu, Yirong Sun, Yunpu Ma, Xiaoyu Shen | 2026-03-09 | 💻 cs

AI Researchers' Views on Automating AI R&D and Intelligence Explosions

A 2025 survey of 25 leading AI researchers reveals a consensus that automating AI research poses a severe and urgent risk due to the potential for recursive self-improvement, while highlighting significant disagreements on timelines, the likelihood of explosive growth, and the most effective governance strategies.

Severin Field, Raymond Douglas, David Krueger | 2026-03-09 | 💻 cs

Scrambler: Mixed Boolean Arithmetic Obfuscation Tool Using E-graph and Equality Expansion

The paper introduces Scrambler, an e-graph-based tool that utilizes Equality Expansion to efficiently generate complex and diverse Mixed Boolean Arithmetic obfuscation expressions with guaranteed equivalence, demonstrating superior expressiveness and complexity compared to existing solutions.

Seoksu Lee, Sangjun An, Eun-Sun Cho | 2026-03-09 | 💻 cs

Efficient Query Rewrite Rule Discovery via Standardized Enumeration and Learning-to-Rank(extend)

This paper presents SLER, a scalable system that combines standardized template enumeration with a learning-to-rank model to overcome the exponential search space and redundancy challenges of existing methods, successfully discovering over one million high-quality query rewrite rules for complex query plans.

Yuan Zhang, Yuxing Chen, Yuekun Yu, Jinbin Huang, Rui Mao, Anqun Pan, Lixiong Zheng, Jianbin Qin | 2026-03-09 | 💻 cs

Publication and Maintenance of Relational Data in Enterprise Knowledge Graphs (Revised Version)

This paper proposes a formal framework, architecture, and algorithms for constructing and incrementally maintaining materialized RDB2RDF views to enable efficient, semantically integrated access to legacy relational data within Enterprise Knowledge Graphs.

Vânia Maria Ponte Vidal (Departamento de Computação, UFC, Fortaleza, Brazil), Valéria Magalhães Pequeno (TechLab, Departamento de Ciências e Tecnologias, UAL, Lisboa, Portugal), Marco Antonio Casanova (Instituto Tecgraf, PUC-Rio, Rio de Janeiro, Brazil), Narciso Arruda (Departamento de Computação, UFC, Fortaleza, Brazil), Carlos Brito (Departamento de Computação, UFC, Fortaleza, Brazil) | 2026-03-09 | 💻 cs

XR and Hybrid Data Visualization Spaces for Enhanced Data Analytics

This paper advocates for the use of Extended Reality (XR) to create hybrid visualization spaces that seamlessly integrate 2D and 3D representations, offering a solution to the challenges of high-dimensional data and AI interpretability through three supporting case studies.

Santiago Lombeyda, S. G. Djorgovski, Ciro Donalek | 2026-03-09 | 💻 cs

Biometric-enabled Personalized Augmentative and Alternative Communications

This study proposes a roadmap for integrating biometric technologies into personalized Augmentative and Alternative Communication (AAC) systems by introducing concepts like the AAC biometric register, while highlighting through case studies that current AI accuracy in gesture and sign language recognition remains insufficient for practical applications and offering recommendations to bridge this gap.

S. Yanushkevich, E. Berepiki, P. Ciunkiewicz, V. Shmerko, G. Wolbring, R. Guest | 2026-03-09 | 💻 cs

The People's Gaze: Co-Designing and Refining Gaze Gestures with General Users and Gaze Interaction Experts

This paper presents a two-phase methodology that combines co-design workshops with non-expert users and expert refinement to develop a grounded, intuitive set of 32 gaze gestures and design principles for hands-free interaction on gaze-enabled devices.

Yaxiong Lei, Xinya Gong, Shijing He, Yafei Wang, Mohamed Khamis, Juan Ye | 2026-03-09 | 💻 cs

Enhancing Tool Calling in LLMs with the International Tool Calling Dataset

This paper introduces International Tool Calling (ITC), a large-scale, multilingual benchmark comprising over 3,500 real APIs and 17,000 tasks across 40 countries, designed to address the limitations of existing datasets by improving LLM robustness, cross-lingual generalization, and performance in realistic global tool-calling scenarios.

Zuoyu Zhang, Yancheng Zhu | 2026-03-09 | 💻 cs

Human-Centered Ambient and Wearable Sensing for Automated Monitoring in Dementia Care: A Scoping Review

This scoping review maps the landscape of wearable and ambient sensing technologies for dementia care from 2015 to 2025, proposing five key human-centered implementation principles to guide the development of ethical, scalable, and autonomy-enhancing monitoring systems.

Mason Kadem, Sarah Masri, Anthea Innes, Rong Zheng | 2026-03-09 | 💻 cs

CoEditor++: Instruction-based Visual Editing via Cognitive Reasoning

CoEditor++ is a training-free, cognitively structured framework that leverages a two-stage "what-to-edit" and "how-to-edit" reasoning process with self-reflection to achieve state-of-the-art, visually consistent, and interpretable instruction-based image editing using only open-source components.

Minheng Ni, Yutao Fan, Zhengyuan Yang, Yeli Shen, Yuxiang Wei, Yaowen Zhang, Lijuan Wang, Lei Zhang, Wangmeng Zuo | 2026-03-09 | 💻 cs
