Escaping The Big Data Paradigm in Self-Supervised Representation Learning
This paper introduces SCOTT, a sparse convolutional tokenizer paired with a MIM-JEPA training framework. Together they enable Vision Transformers to learn robust self-supervised representations from scratch on small-scale, fine-grained datasets, challenging the assumption that effective vision representation learning requires big data and massive computational resources.