Welcome to Gist.Science
Research papers,
explained for humans.
We read the latest papers from arXiv, bioRxiv, and medRxiv and produce easy-to-understand explanations, key takeaways, and technical summaries — in ten languages.
Latest papers
View all from today →
A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
This paper introduces MetaCompress, a metamorphic testing framework for evaluating the behavioral fidelity of code language models compressed via knowledge distillation; it reveals significant behavioral discrepancies between teacher and student models under adversarial conditions that traditional accuracy metrics miss.
FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning
FM-Agent is a novel framework that leverages large language models to automate compositional reasoning for large-scale systems by generating function-level specifications from natural-language caller expectations, enabling the discovery of hundreds of critical bugs in complex codebases within days.
Multimodal Diffusion Forcing for Forceful Manipulation
This paper introduces Multimodal Diffusion Forcing (MDF), a unified framework that leverages random partial masking and diffusion models to learn rich temporal and cross-modal dependencies from expert trajectories, thereby achieving robust and versatile performance in contact-rich, forceful manipulation tasks.
Locally Trivial Deformations of Toric Varieties
This paper introduces a combinatorial deformation functor based on Čech cochains that is isomorphic to the locally trivial deformation functor of toric varieties under specific conditions, enabling new criteria for unobstructedness, explicit obstruction formulas, and a classification of unobstructed iterated -bundle threefolds.
CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space
The paper proposes CLAY, a training-free method that leverages pretrained Vision-Language Models to enable adaptive, multi-conditioned visual similarity retrieval by reframing the embedding space as text-conditional, achieving high accuracy and efficiency without requiring additional model training.
Enumeration of dihypergraphs with specified degrees and edge types
This paper provides asymptotic formulae for enumerating directed hypergraphs with specified in-degree and out-degree sequences, as well as fixed head and tail sizes for all hyperarcs, under the condition that the maximum values of these parameters remain sufficiently small.
SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models
SVD-Prune is a training-free, plug-and-play token pruning method that uses Singular Value Decomposition and statistical leverage scores to select the most informative vision tokens. By moving beyond existing heuristic-based approaches, it maintains high performance in Vision-Language Models even under extreme token budget constraints.
Symplectic structures on the space of space curves
This paper introduces new symplectic structures on the shape space of unparameterized space curves by combining the classical Marsden-Weinstein Liouville 1-form with Riemannian structures from shape analysis, and subsequently derives the corresponding Hamiltonian vector fields for several classical Hamiltonian functions.
RL makes MLLMs see better than SFT
This paper demonstrates that Reinforcement Learning (RL) significantly outperforms Supervised Fine-Tuning (SFT) at enhancing Multimodal Large Language Models: RL fundamentally reshapes their vision encoders to produce stronger, more localized visual representations. Building on this finding, the authors propose a computationally efficient training framework called Preference-Instructed Vision OpTimization (PIVOT).
From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python
This paper presents a benchmark-driven methodology for migrating a 648K-line production Rust AI agent to Python, demonstrating that LLM-assisted translation guided by public benchmarks not only achieves functional parity but also enables the creation of a more expressive, feature-rich superset with significantly reduced code volume.