cs papers | Gist.Science

Hybrid Diffusion Policies with Projective Geometric Algebra for Efficient Robot Manipulation Learning

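This paper proposes hPGA-DP, a novel hybrid diffusion policy that embeds the geometric inductive bias of Projective Geometric Algebra (PGA) into the network architecture, using P-GATr as both the state encoder and the action decoder, and significantly improves training efficiency and task performance in robot manipulation learning.

Xiatao Sun, Yuxuan Wang, Shuo Yang, Yinxing Chen, Daniel Rakita | 2026-03-10 | 💻 cs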

A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition

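This paper proposes MCULoRA, a robust low-rank adaptation framework for incomplete multimodal learning. Its modality-combination-aware low-rank adaptation (MCLA) module decouples shared information from modality-specific characteristics, and its dynamic parameter fine-tuning (DPFT) module adjusts the training ratio based on the separability of the representation space. Together they address the gradient conflicts caused by missing modalities in multimodal emotion recognition and significantly improve prediction performance.

Xinkui Zhao, Jinsong Shu, Yangyang Wu, Guanjie Cheng, Zihe Liu, Naibo Wang, Shuiguang Deng, Zhongle Xie, Jianwei Yin | 2026-03-10 | 💻 cs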

Unified Medical Image Segmentation with State Space Modeling Snake

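This paper proposes Mamba Snake, a novel deep snake framework that introduces state space modeling, a Mamba evolution module, and a dual-classification synergy mechanism to handle multi-scale structural heterogeneity and inter-organ relationship modeling in unified medical image segmentation. On five clinical datasets it achieves an average Dice improvement of 3% over existing state-of-the-art methods.

Ruicheng Zhang, Haowei Guo, Kanghui Tian, Jun Zhou, Mingliang Yan, Zeyu Zhang, Shen Zhao | 2026-03-10 | 💻 cs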

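$\pi^3$: Permutation-Equivariant Visual Geometry Learning

This paper proposes $\pi^3$, a self-supervised feed-forward neural network with a fully permutation-equivariant architecture that needs no fixed reference view. By directly predicting affine-invariant camera poses and scale-invariant local point maps, it achieves state-of-the-art performance on camera pose estimation, monocular and video depth estimation, and dense point cloud reconstruction.

Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, Tong He | 2026-03-10 | 💻 cs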

Post-Disaster Affected Area Segmentation with a Vision Transformer (ViT)-based EVAP Model using Sentinel-2 and Formosat-5 Imagery

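This paper proposes a Vision Transformer-based deep learning framework that uses principal component analysis to expand a small set of labeled data and fuses multi-source remote sensing imagery. It improves the smoothness and reliability of post-disaster affected area segmentation when precise ground truth is unavailable, thereby strengthening the Emergency Value-Added Product (EVAP) of the Taiwan Space Agency.

Yi-Shan Chu, Hsuan-Cheng Wei | 2026-03-10 | 💻 cs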

Auto-scaling Approaches for Microservice Applications: A Survey and Taxonomy

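This paper systematically surveys recent auto-scaling approaches for microservice applications, addressing the challenges posed since 2018 by complex interactions and dynamic workloads. It builds a taxonomy along five dimensions (infrastructure, architecture, scaling strategy, optimization objective, and behavior modeling), with the aim of balancing resource efficiency, cost, and SLA guarantees.

Minxian Xu, Junhan Liao, Linfeng Wen, Huaming Wu, Kejiang Ye, Rajkumar Buyya, Chengzhong Xu | 2026-03-10 | 💻 cs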

BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs

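This paper proposes BrownoutServe, a framework that introduces a "united experts" mechanism and a dynamic brownout strategy to address the inefficiency of static deployment and the difficulty of guaranteeing SLOs for Mixture-of-Experts (MoE) large language models under bursty workloads. It significantly increases throughput and substantially reduces the SLO violation rate.

Jianmin Hu, Minxian Xu, Kejiang Ye + 1 more | 2026-03-10 | 💻 cs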

C-Koordinator: Interference-aware Management for Large-scale and Co-located Microservice Clusters

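This paper tackles resource contention and interference in large-scale co-located microservice clusters by proposing and implementing C-Koordinator, an open-source platform built on high-accuracy CPI prediction. It effectively improves resource utilization and reduces application latency by 16.7% to 36.1%.

Shengye Song, Minxian Xu, Zuowei Zhang + 5 more | 2026-03-10 | 💻 cs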

They See Me Rolling: High-Speed Event Vision-Based Tactile Roller Sensor for Large Surface Inspection

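This paper presents a novel tactile sensor that combines a neuromorphic camera with a rolling mechanism. Using event-driven multi-view stereo and a Bayesian fusion strategy, it performs continuous 3D scanning of large industrial surfaces with sub-millimeter accuracy at a high speed of 0.5 m/s, 11 times faster than existing continuous tactile sensing methods.

Akram Khairi, Hussain Sajwani, Abdallah Mohammad Alkilany, Laith AbuAssi, Mohamad Halwani, Islam Mohamed Zaid, Ahmed Awadalla, Dewald Swart, Abdulla Ayyad, Yahya Zweiri | 2026-03-10 | 💻 cs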

Dynamic Symbolic Execution for Semantic Difference Analysis of Component and Connector Architectures

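This paper studies the application of dynamic symbolic execution to semantic difference analysis of MontiArc component-and-connector architectures. It enhances the model generator to collect runtime data for identifying key execution paths, evaluates several execution strategies, and concludes that the approach is promising but limited by scalability.

Johanna Grahl, Bernhard Rumpe, Max Stachon, Sebastian Stüber | 2026-03-10 | 💻 cs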

Empowering Microscopic Traffic Simulators with Realistic Perception using Surrogate Sensor Models

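This paper proposes MIDAR, a surrogate LiDAR detection model based on a geometry-aware graph Transformer. It uses high-level features from microscopic traffic simulators to efficiently emulate realistic perception effects, including occlusion and false detections. At low computational cost, it significantly improves the accuracy and practicality of autonomous vehicle perception modeling in large-scale intelligent transportation system simulations.

Tianheng Zhu, Yiheng Feng | 2026-03-10 | 💻 cs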

TransUNet-GradCAM: A Hybrid Transformer-U-Net with Self-Attention and Explainable Visualizations for Foot Ulcer Segmentation

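This paper proposes a hybrid TransUNet-GradCAM model that combines self-attention with explainable visualizations. By fusing the global context modeling of the Transformer with the fine-grained spatial localization of U-Net, it delivers automatic diabetic foot ulcer segmentation with strong generalization and high clinical relevance across multiple datasets.

Akwasi Asare, Mary Sagoe, Justice Williams Asare, Stephen Edward Moore | 2026-03-10 | 💻 cs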

S$^2$Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation

SPEX: A Vision-Language Model for Land Cover Extraction on Spectral Remote Sensing Images

3D Gaussian Splatting with Fisheye Images: Field of View Analysis and Depth-Based Initialization

Experimental Validation of Provably Covert Communication Using Software-Defined Radio

Unified and Semantically Grounded Domain Adaptation for Medical Image Segmentation

Video-EM: Event-Centric Episodic Memory for Long-Form Video Understanding

UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding

UniCast: A Unified Framework for Instance-Conditioned Multimodal Time-Series Forecasting
