cs.CV Papers | Gist.Science

The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning

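This paper uses a 'Dual Tuning' framework to quantify how much reasoning helps across diverse multimodal tasks and to establish a 'Thinking Boundary', challenging the practice of applying reasoning unconditionally to every task and offering practical guidance for optimizing data and training strategies.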

Ruobing Zheng, Tianqi Li, Jianing Li + 3 more | 2026-03-06 | cs

SkillNet: Create, Evaluate, and Connect AI Skills

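This paper proposes 'SkillNet', an open infrastructure that systematically creates, evaluates, and connects more than 200,000 skills to support agents' long-term development and skill transfer, and shows that it substantially improves agent performance while reducing the number of execution steps.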

Yuan Liang, Ruobin Zhong, Haoming Xu + 46 more | 2026-03-06 | Author reviewed | cs

Recognition of Daily Activities through Multi-Modal Deep Learning: A Video, Pose, and Object-Aware Approach for Ambient Assisted Living

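This paper proposes a multimodal deep learning approach that fuses a 3D CNN, a graph convolutional network, and object detection information through a cross-attention mechanism, aiming to improve the accuracy of daily activity recognition for older adults and to strengthen the safety and autonomy of Ambient Assisted Living systems.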

Kooshan Hashemifard, Pau Climent-Pérez, Francisco Florez-Revuelta | 2026-03-06 | cs

cs.CV

InverseNet: Benchmarking Operator Mismatch and Calibration Across Compressive Imaging Modalities

Fusion and Grouping Strategies in Deep Learning for Local Climate Zone Classification of Multimodal Remote Sensing Data

Structure-Guided Histopathology Synthesis via Dual-LoRA Diffusion

Mask-aware inference with State-Space Models

PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing

SGR3 Model: Scene Graph Retrieval-Reasoning Model in 3D

Spinverse: Differentiable Physics for Permeability-Aware Microstructure Reconstruction from Diffusion MRI

Using Vision + Language Models to Predict Item Difficulty

sFRC for assessing hallucinations in medical image restoration

Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks

A Benchmark Study of Neural Network Compression Methods for Hyperspectral Image Classification

Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild

FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation

Toward Real-world Infrared Image Super-Resolution: A Unified Autoregressive Framework and Benchmark Dataset

Evaluating GPT-5 as a Multimodal Clinical Reasoner: A Landscape Commentary

Evaluating and Correcting Human Annotation Bias in Dynamic Micro-Expression Recognition

DSA-SRGS: Super-Resolution Gaussian Splatting for Dynamic Sparse-View DSA Reconstruction