cs.CV Papers | Gist.Science

Cora: Correspondence-aware image editing using few step diffusion

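This paper proposes Cora, an image editing framework that introduces correspondence-aware noise correction and interpolated attention maps, using semantic correspondence to balance structure preservation and texture transfer within a few-step diffusion process. It thereby addresses the artifacts and loss of key attributes that existing methods tend to produce when handling non-rigid deformations, object modifications, and content generation.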

Amirhossein Alimohammadi, Aryan Mikaeili, Sauradip Nag + 3 more · 2026-03-02 · 💻 cs

ECAM: A Contrastive Learning Approach to Avoid Environmental Collision in Trajectory Forecasting

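This paper proposes ECAM, a contrastive-learning-based module designed to strengthen the ability of existing pedestrian trajectory prediction models to perceive and avoid environmental obstacles, thereby significantly reducing the collision rate of predicted trajectories.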

Giacomo Rosin, Muhammad Rameez Ur Rahman, Sebastiano Vascon · 2026-03-02 · 💻 cs

LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation

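This paper proposes the LLM-EMF model, which uses large language models to enrich textual information, fuses visual and textual data with a frozen CLIP model, and applies a multi-attention mechanism to capture cross-domain user preferences, significantly improving cross-domain sequential recommendation performance on multiple e-commerce datasets.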

Wangyu Wu, Zhenhong Chen, Wenqiao Zhang + 5 more · 2026-03-02 · 💻 cs

Distilling Balanced Knowledge from a Biased Teacher

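This paper proposes the Long-Tailed Knowledge Distillation (LTKD) framework, which decomposes the distillation objective into inter-group and intra-group losses and introduces rebalancing and reweighting mechanisms. This addresses the weak tail-class performance that conventional knowledge distillation suffers under long-tailed distributions because of teacher model bias.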

Seonghak Kim · 2026-03-02 · 💻 cs

Empowering Small VLMs to Think with Dynamic Memorization and Exploration

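This paper proposes the DyME framework, which dynamically balances supervised fine-tuning (SFT) and reinforcement learning (RLVR) and introduces a visual supervision mechanism. It addresses the memorization artifacts and unstable exploration that small-scale vision-language models (SVLMs) face when being trained to think, significantly improving their performance and reliability on specialized tasks.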

Jiazhen Liu, Yuchuan Deng, Long Chen · 2026-03-02 · 💻 cs

SelvaBox: A high-resolution dataset for tropical tree crown detection

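This paper introduces SelvaBox, an open-source dataset of high-resolution drone imagery spanning three countries and containing more than 83,000 manually annotated tree crowns. It targets the scarcity of data for tree crown detection in tropical forests and demonstrates strong results in improving detection accuracy and in zero-shot generalization across datasets.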

Hugo Baudchon, Arthur Ouaknine, Martin Weiss + 5 more · 2026-03-02 · 💻 cs

Concept-based Adversarial Attack: a Probabilistic Perspective

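This paper proposes a concept-based adversarial attack framework built on a probabilistic perspective. By sampling from a distribution over the concept, it generates diverse adversarial examples that preserve the original concept (such as identity or category) while attacking classifiers effectively.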

Andi Zhang, Xuan Ding, Steven McDonagh + 1 more · 2026-03-02 · 🤖 cs.AI

Knowledge-Guided Machine Learning: Illustrating the use of Explainable Boosting Machines to Identify Overshooting Tops in Satellite Imagery

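This paper shows how knowledge-guided machine learning can identify overshooting tops in satellite imagery: scalar features are extracted from the images and used to train an Explainable Boosting Machine (EBM) model that incorporates the strategies of human experts, yielding interpretable and reliable machine learning for high-stakes meteorological applications.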

Nathan Mitchell, Lander Ver Hoef, Imme Ebert-Uphoff + 4 more · 2026-03-02 · 🤖 cs.LG

pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models

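This paper proposes pFedMMA, the first framework to use multi-modal adapters for personalized federated fine-tuning. Clients adapt locally to their personalized data distributions while collaboratively training a globally shared projection, which keeps communication efficient and achieves the best balance between personalization and generalization for vision-language models.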

Sajjad Ghiasvand, Mahnoosh Alizadeh, Ramtin Pedarsani · 2026-03-02 · 🤖 cs.LG

Conformal Prediction for Long-Tailed Classification

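Existing conformal prediction methods for long-tailed classification struggle to balance prediction set size against class-conditional coverage. This paper proposes a score function based on prevalence-adjusted softmax and a new procedure that linearly interpolates between marginal and class-conditional prediction, enabling a smooth trade-off between the two.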

Tiffany Ding, Jean-Baptiste Fermanian, Joseph Salmon · 2026-03-02 · 📊 stat

Animal behavioral analysis and neural encoding with transformer-based self-supervised pretraining

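This paper proposes the BEAST framework, which pretrains a Transformer in a self-supervised manner by combining masked autoencoding with temporal contrastive learning. Making effective use of unlabeled video data, it significantly improves performance on tasks such as neurobehavioral analysis, pose estimation, and action segmentation across multiple species and in both single-animal and multi-animal settings.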

Yanchen Wang, Han Yu, Ari Blau + 5 more · 2026-03-02 · 🧬 q-bio

Fast Learning of Non-Cooperative Spacecraft 3D Models through Primitive Initialization

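This paper proposes a framework in which a convolutional neural network generates geometric primitives from monocular images to initialize 3D Gaussian Splatting (3DGS). The framework significantly reduces the number of training iterations and images required, and it enables fast learning of high-fidelity 3D models of non-cooperative spacecraft even when pose estimates are noisy or implicit.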

Pol Francesch Huc, Emily Bates, Simone D'Amico · 2026-03-02 · 🤖 cs.LG

DA-Occ: Direction-Aware 2D Convolution for Efficient and Geometry-Preserving 3D Occupancy Prediction in Autonomous Driving

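This paper proposes DA-Occ, a purely 2D framework based on direction-aware convolution and height-score projection that targets the trade-off between accuracy and efficiency in existing 3D occupancy prediction methods. By preserving vertical geometric information, it achieves 39.3% mIoU and real-time inference at 27.7 FPS on the Occ3D-nuScenes dataset.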

Yuchen Zhou, Yan Luo, Xiaogang Wang + 3 more · 2026-03-02 · 💻 cs

cs.CV

AutoDebias: Automated Framework for Debiasing Text-to-Image Models

Less is More: AMBER-AFNO -- a New Benchmark for Lightweight 3D Medical Image Segmentation

AnimateScene: Camera-controllable Animation in Any Scene

BeeNet: Reconstructing Flower Shapes from Electric Fields using Deep Learning

Structure-aware Contrastive Learning for Diagram Understanding of Multimodal Models

Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing

MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning
