cs.CV papers | Gist.Science

Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing

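This paper addresses the asymmetric division of roles between the understanding and generation modules in image editing. It proposes Draw-In-Mind, a large-scale dataset that explicitly supplies complex instruction understanding and an editing blueprint, and shows that this lets even small models reach state-of-the-art image editing performance.

Ziyun Zeng, David Junhao Zhang, Wei Li + 1 more | 2026-03-02 | 🤖 cs.AI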

MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning

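This paper proposes MEGS$^{2}$, which replaces high-order spherical harmonics with lightweight spherical Gaussian lobes and jointly optimizes the pruning of both primitive count and lobe count. The result is a memory-efficient 3D Gaussian splatting method that substantially reduces VRAM usage at render time while preserving image quality.

Jiarui Chen, Yikeng Chen, Yingshuang Zou + 5 more | 2026-03-02 | 🤖 cs.AI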

Activation Function Design Sustains Plasticity in Continual Learning

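To mitigate loss of plasticity in continual learning, this paper analyzes the shape of activation functions (the shape of the negative branch and the saturation behavior) and proposes new activation functions that sustain plasticity in a general-purpose way, without extra capacity or task-specific tuning.

Lute Lillo, Nick Cheney | 2026-03-02 | 🤖 cs.AI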

Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives

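This paper proposes an unsupervised learning framework that does not rely on manual UV mapping and that accounts for semantic consistency and visibility (inconspicuous seams). It automates 3D mesh parameterization, improving texture generation quality and reducing seam artifacts.

AmirHossein Zamani, Bruno Roy, Arianna Rampini | 2026-03-02 | 💻 cs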

Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving

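This paper recasts autonomous driving as a language generation problem of next-waypoint prediction. It proposes Max-V1, a lightweight yet powerful end-to-end vision-language model that achieves state-of-the-art performance on the nuScenes dataset through imitation learning from large-scale expert data.

Sheng Yang, Tong Zhan, Guancheng Chen + 2 more | 2026-03-02 | 🤖 cs.AI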

Universal Beta Splatting

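This paper proposes Universal Beta Splatting, a unified framework that generalizes 3D Gaussian splatting to N-dimensional anisotropic Beta kernels. By modeling spatial, angular, and temporal dependencies in a unified way without auxiliary networks, it achieves real-time rendering and image quality that surpasses existing methods.

Rong Liu, Zhongpai Gao, Benjamin Planche + 8 more | 2026-03-02 | ⚡ eess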

CLEAR-IR: Clarity-Enhanced Active Reconstruction of Infrared Imagery

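To strengthen robot vision in dark environments, this paper proposes CLEAR-IR, a new method that removes noise from infrared images and reconstructs high-quality imagery. It shows that the method outperforms existing techniques and allows tasks trained on RGB images to run even in extremely low-light environments.

Nathan Shankar, Pawel Ladosz, Hujun Yin | 2026-03-02 | 🤖 cs.LG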

The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators

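This paper shows that machine-learned operators (MLOs) suffer from aliasing and fail when inference is run at a resolution different from the training resolution (zero-shot super-resolution), and it proposes an efficient multi-resolution training protocol to overcome the problem.

Mansi Sakarvadia, Kareem Hegazy, Amin Totounferoush + 4 more | 2026-03-02 | 🤖 cs.AI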

Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry

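Through a sparse autoencoder (SAE) analysis of DINOv2, this paper puts forward the Minkowski Representation Hypothesis, which complements the conventional linear representation hypothesis by positing that tokens are formed as convex combinations of archetypes. It thereby clarifies the geometric and functional structure of the concept space in vision transformers.

Thomas Fel, Binxu Wang, Michael A. Lepori + 8 more | 2026-03-02 | 🤖 cs.AI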

Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction

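This paper proposes USplat4D, an uncertainty-aware framework for reconstructing dynamic 3D scenes from monocular input. It optimizes the motion of Gaussian primitives according to the reliability of the observations, improving stability and synthesis quality under occlusion and extreme viewpoint changes.

Fengzhi Guo, Chih-Chuan Hsu, Sihao Ding + 1 more | 2026-03-02 | 🤖 cs.AI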

Leveraging Multimodal LLM Descriptions of Activity for Explainable Semi-Supervised Video Anomaly Detection

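This paper proposes a new semi-supervised video anomaly detection framework. It uses a multimodal large language model (MLLM) to generate high-level textual descriptions of object activities and interactions from normal videos, then compares them with descriptions generated at test time. This detects anomalies that arise from complex interactions and also makes the detections explainable.

Furkan Mumcu, Michael J. Jones, Anoop Cherian + 1 more | 2026-03-02 | 💻 cs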

From Volume Rendering to 3D Gaussian Splatting: Theory and Applications

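This tutorial outlines the theory and pipeline of 3D Gaussian splatting (3DGS), discusses ways of addressing its limitations, and surveys its potential across a range of applications such as surface reconstruction and avatar modeling.

Vitor Pereira Matias, Daniel Perazzo, Vinicius Silva + 4 more | 2026-03-02 | 💻 cs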

Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation

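To tackle visual reasoning over information-dense images, this paper proposes Speculative Verdict, a training-free framework in which a large model integrates and verifies the diverse reasoning paths generated by multiple lightweight draft models. It shows that this yields accurate reasoning at low computational cost.

Yuhan Liu, Lianhui Qin, Shengjie Wang | 2026-03-02 | 💬 cs.CL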

TokenCLIP: Token-wise Prompt Learning for Zero-shot Anomaly Detection

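This paper proposes TokenCLIP, a token-wise adaptation framework for zero-shot anomaly detection. It optimizes a transport plan that dynamically assigns each visual token to orthogonal textual subspaces according to semantic affinity, overcoming the limitation of the single textual space used in prior work.

Qihang Zhou, Binbin Gao, Guansong Pang + 3 more | 2026-03-02 | 💻 cs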

MMSD3.0: A Multi-Image Benchmark for Real-World Multimodal Sarcasm Detection

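To overcome the limitation of existing methods that are restricted to a single image, this paper proposes MMSD3.0, a new benchmark built on real-world data containing multiple images, together with the Cross-Image Reasoning Model (CIRM), which captures relations between images. It reports state-of-the-art performance in both single-image and multi-image scenarios.

Haochen Zhao, Yuyao Kong, Yongxiu Xu + 4 more | 2026-03-02 | 💻 cs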

Enhancing CLIP Robustness via Cross-Modality Alignment

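To improve the robustness of CLIP under adversarial attack, this paper proposes COLA, a training-free framework based on optimal transport. It shows that restoring the alignment between the image and text latent spaces substantially improves classification accuracy under adversarial perturbations.

Xingyu Zhu, Beier Zhu, Shuo Wang + 2 more | 2026-03-02 | 💻 cs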

Attentive Feature Aggregation or: How Policies Learn to Stop Worrying about Robustness and Attend to Task-Relevant Visual Cues

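To address the vulnerability of pretrained visual representations to task-irrelevant information, this paper proposes Attentive Feature Aggregation (AFA), a lightweight mechanism that automatically attends to task-relevant visual cues and ignores noise. It shows that this yields visuomotor control policies that are robust to visual perturbations without data augmentation or fine-tuning.

Nikolaos Tsagkas, Andreas Sochopoulos, Duolikun Danier + 4 more | 2026-03-02 | 💻 cs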
