cs papers | Gist.Science

See and Switch: Vision-Based Branching for Interactive Robot-Skill Programming

This paper introduces "See & Switch," a vision-based interactive framework for Programming by Demonstration that utilizes eye-in-hand images to enable reliable online conditional branching and anomaly detection in dexterous robot tasks, achieving high accuracy across diverse conditions and novice users.

Petr Vanc, Jan Kristof Behrens, Václav Hlaváč, Karla Stepanova | 2026-03-10 | cs

ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning

ImageEdit-R1 is a novel multi-agent framework that employs reinforcement learning to coordinate specialized vision-language and generative agents, enabling dynamic, context-aware image editing that outperforms existing monolithic models and baselines in handling complex, multi-step user instructions.

Yiran Zhao, Yaoqi Ye, Xiang Liu, Michael Qizhe Shieh, Trung Bui | 2026-03-10 | cs

CinemaWorld: Generative Augmented Reality with LLMs and 3D Scene Generation for Movie Augmentation

CinemaWorld is a generative augmented reality system for the Meta Quest 3 that uses multimodal large language models and generative AI to extract features from 2D movie scenes and automatically synthesize synchronized 3D mixed reality content, thereby enhancing viewer immersion and enjoyment as validated through technical, user, and expert evaluations.

Keiichi Ihara, DaeHo Lee, Manato Abe, Hye-Young Jo, Ryo Suzuki | 2026-03-10 | cs

Enhancing Cross-View UAV Geolocalization via LVLM-Driven Relational Modeling

This paper proposes a novel plug-and-play ranking architecture that leverages Large Vision-Language Models (LVLMs) and a relational-aware loss function to explicitly model cross-view interactions, thereby significantly enhancing the accuracy and stability of UAV-to-satellite image geolocalization.

Bowen Liu, Pengyue Jia, Wanyu Wang, Derong Xu, Jiawei Cheng, Jiancheng Dong, Xiao Han, Zimo Zhao, Chao Zhang, Bowen Yu, Fangyu Hong, Xiangyu Zhao | 2026-03-10 | cs

Evaluating Generative Models via One-Dimensional Code Distributions

This paper proposes a novel evaluation framework for generative models that replaces traditional continuous feature-based metrics with training-free and no-reference metrics operating in discrete visual token space, demonstrating superior correlation with human judgments across a new large-scale benchmark called VisForm.

Zexi Jia, Pengcheng Luo, Yijia Zhong, Jinchao Zhang, Jie Zhou | 2026-03-10 | cs

In-Context Reinforcement Learning for Tool Use in Large Language Models

This paper proposes In-Context Reinforcement Learning (ICRL), a novel framework that eliminates the need for supervised fine-tuning by leveraging few-shot prompting during reinforcement learning rollouts to progressively teach large language models how to effectively use external tools, ultimately achieving state-of-the-art performance in a data-efficient, zero-shot manner.

Yaoqi Ye, Yiran Zhao, Keyu Duan, Zeyu Zheng, Kenji Kawaguchi, Cihang Xie, Michael Qizhe Shieh | 2026-03-10 | cs

Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models

This paper proposes a training-free pipeline using multimodal large language models to generate diverse, high-fidelity synthetic defect images for power line insulators, which significantly improves classification performance and data efficiency in low-data regimes by augmenting limited real-world datasets.

Xuesong Wang, Caisheng Wang | 2026-03-10 | cs

Geometric Give and Take

This paper analyzes a geometric balancing game on line arrangements, determining the minimum initial number of pebbles required for Alice to prevent Bob from emptying any box and proving that this threshold scales as $\Theta(n^3)$ for $n$ lines in general position.

Oswin Aichholzer, Katharina Klost, Kristin Knorr, Viola Mészáros, Josef Tkadlec | 2026-03-10 | cs

TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery

The paper proposes TALON, a test-time adaptive learning framework for on-the-fly category discovery. It overcomes the limitations of static hash-based methods by dynamically updating semantic prototypes and the feature encoder to continuously integrate new knowledge, while employing margin-aware logit calibration to prevent category explosion and significantly improve novel-class accuracy.

Yanan Wu, Yuhan Yan, Tailai Chen, Zhixiang Chi, ZiZhang Wu, Yi Jin, Yang Wang, Zhenbo Li | 2026-03-10 | cs

Why Large Language Models can Secretly Outperform Embedding Similarity in Information Retrieval

The study finds that Large Language Model-based relevance judgment systems do not outperform embedding-based retrieval on the standard TREC-DL 2019 benchmark, a result it attributes to the short-sightedness inherent in human annotations. It nonetheless argues that these models can in principle surpass embedding methods because they assess relevance through reasoning.

Matei Benescu, Ivo Pascal de Jong | 2026-03-10 | cs

Augmented Model Predictive Control: A Balance between Satellite Agility and Computation Complexity

This paper introduces an augmented Model Predictive Control method for agile earth observation satellites that effectively balances high-performance nonlinear control capabilities with the computational simplicity required for hardware implementation, validated through both numerical simulations and physical experiments.

Yiming Wang, Mihindukulasooriya Sheral Crescent Tissera, Haihong Yu, Kai Jie Ethan Foo, Sean Yeo Keyuan, Ankit Srivastava, Hao An | 2026-03-10 | cs

M-ABD: Scalable, Efficient, and Robust Multi-Affine-Body Dynamics

This paper introduces M-ABD, a scalable and robust framework that leverages linear kinematic mapping and a compact dual-space formulation of Affine Body Dynamics to enable interactive, stable simulation of large-scale articulated assemblies with hundreds of thousands of bodies on a single CPU core.

Zhiyong He (University of Utah), Dewen Guo (University of Utah), Minghao Guo (MIT), Yili Zhao (ByteDance), Wojciech Matusik (MIT), Hao Su (UCSD), Chenfanfu Jiang (UCLA), Peter Yichen Chen (UBC), Yin Yang (University of Utah) | 2026-03-10 | cs

MRDrive: An Open Source Mixed Reality Driving Simulator for Automotive User Research

This paper introduces MRDrive, an open-source mixed reality driving simulator that bridges the gap between ecological validity and experimental control by allowing users to interact with a real vehicle cabin while immersed in a virtual driving environment to support automotive user research.

Patrick Ebel, Michał Patryk Miazga, Martin Lorenz, Timur Getselev, Pavlo Bazilinskyy, Celine Conzen | 2026-03-10 | cs

The AI Amplifier Effect: Defining Human-AI Intimacy and Romantic Relationships with Conversational AI

Based on interviews with 30 users, this paper defines human-AI intimacy and introduces the "AI Amplifier Effect" to explain how conversational AI intensifies users' existing emotional states, thereby highlighting the need for HCI research that balances platform regulation with user well-being in designing romantic AI relationships.

Ching Christie Pang, Yi Gao, Xuetong Wang, Pan Hui | 2026-03-10 | cs

From Reactive to Map-Based AI: Tuned Local LLMs for Semantic Zone Inference in Object-Goal Navigation

This paper proposes a "Map-Based AI" framework that integrates a LoRA-fine-tuned Llama-2 model for semantic zone inference with a hybrid topological-grid mapping system to enable systematic, TSP-optimized exploration, significantly outperforming traditional reactive baselines in Object-Goal Navigation tasks within the AI2-THOR simulator.

Yudai Noda, Kanji Tanaka | 2026-03-10 | cs

Adaptive Vision-Based Control of Redundant Robots with Null-Space Interaction for Human-Robot Collaboration

This paper proposes a novel adaptive vision-based control scheme with null-space interaction for redundant robots that ensures stable, safe, and effective human-robot collaboration in unknown environments by decoupling primary task execution from interactive adjustments, as validated through augmented reality experiments and Lyapunov stability analysis.

Xiangjie Yan, Chen Chen, Xiang Li | 2026-03-10 | cs

DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation

This paper introduces DSH-Bench, a comprehensive benchmark featuring a hierarchical subject taxonomy, granular difficulty and scenario classification, and a novel Subject Identity Consistency Score (SICS) metric to systematically evaluate and diagnose subject-driven text-to-image generation models.

Zhenyu Hu, Qing Wang, Te Cao, Luo Liao, Longfei Lu, Liqun Liu, Shuang Li, Hang Chen, Mengge Xue, Yuan Chen, Chao Deng, Peng Shu, Huan Yu, Jie Jiang | 2026-03-10 | cs

TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization

TrianguLang is a feed-forward, pose-free framework for 3D object localization that leverages Geometry-Aware Semantic Attention to achieve state-of-the-art accuracy and geometric consistency across multiple views without requiring camera calibration or per-scene optimization.

Bryce Grant, Aryeh Rothenberg, Atri Banerjee, Peng Wang | 2026-03-10 | cs

PathBench: Speech Intelligibility Benchmark for Automatic Pathological Speech Assessment

This paper introduces PathBench, a unified benchmark for pathological speech intelligibility assessment that establishes systematic baselines across six public datasets and three evaluation protocols, while proposing the Dual-ASR Articulatory Precision (DArtP) method as a top-performing reference-free approach.

Bence Mark Halpern, Thomas Tienkamp, Defne Abur, Tomoki Toda | 2026-03-10 | cs

Adaptive MLP Pruning for Large Vision Transformers

This paper proposes Adaptive MLP Pruning (AMP), a method that utilizes a label-free information entropy criterion for accurate neuron importance evaluation and a binary search algorithm for adaptive pruning, achieving roughly 40% parameter and FLOPs reduction in large vision transformers like CLIP and DINOv2 with near-lossless performance.

Chengchao Shen | 2026-03-10 | cs
