cs papers | Gist.Science

EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations

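This paper proposes EgoMI, a framework that captures the coordination between active human gaze and hand movement to close the simulation gap for semi-humanoid robots and enable robust imitation learning.

Justin Yu, Yide Shentu, Di Wu, Pieter Abbeel, Ken Goldberg, Philipp Wu | 2026-03-11 | 💻 cs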

SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection

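This paper proposes the Spatial-Projection Alignment (SPAN) framework, together with a hierarchical task learning strategy, to improve monocular 3D object detection by resolving geometric inconsistencies among decoupled attribute predictions and strengthening 2D-3D alignment.

Yifan Wang, Yian Zhao, Fanqi Pu, Xiaochen Yang, Yang Tang, Xi Chen, Wenming Yang | 2026-03-11 | 💻 cs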

V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs

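To address the semantic entanglement inherent in the patch-token representations of LVLMs, this paper proposes V-Attack, a precise and controllable adversarial attack that targets value features, which suppress global context and retain disentangled local semantics. It reports an attack success rate 36% higher on average than existing state-of-the-art methods.

Sen Nie, Jie Zhang, Jianxin Yan, Shiguang Shan, Xilin Chen | 2026-03-11 | 💻 cs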

Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning

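This paper proposes AFRO, a new framework that learns self-supervised 3D representations for robot manipulation by modeling state-action-state dynamics without explicit geometric reconstruction. It reports higher manipulation success rates than existing methods across a range of simulated and real-world tasks.

Qiwei Liang, Boyang Cai, Minghao Lai, Sitong Zhuang, Tao Lin, Yan Qin, Yixuan Ye, Jiaming Liang, Renjing Xu | 2026-03-11 | 💻 cs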

Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound

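This paper proposes the first formal Audio-Visual World Model (AVWM) framework, which integrates visual and auditory information. To support it, the authors develop the AV-CDiT model with a new dataset, AVW-4k, and a three-stage training strategy, and report substantially improved multimodal future-state prediction and navigation performance.

Jiahua Wang, Leqi Zheng, Jialong Wu, Yaoxin Mao | 2026-03-11 | 💻 cs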

Beware of the Classical Benchmark Instances for the Traveling Salesman Problem with Time Windows

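This paper exploits structural weaknesses in the classical TSPTW benchmark instances to present an exact algorithm that solves every instance with 50 or more customers within seconds. It warns that these instances are no longer suitable for assessing problem difficulty or for use as machine learning training datasets.

Francisco J. Soulignac | 2026-03-11 | 💻 cs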

AVGGT: Rethinking Global Attention for Accelerating VGGT

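This paper analyzes the role of global attention in VGGT and $\pi^3$ and proposes a training-free, two-step acceleration scheme that converts early layers to frame attention and subsamples K/V. It speeds up inference by up to 10x while preserving accuracy, even in dense multi-view settings where existing sparse-attention methods fail.

Xianbing Sun, Zhikai Zhu, Zhengyu Lou, Bo Yang, Jinyang Tang, Liqing Zhang, He Wang, Jianfu Zhang | 2026-03-11 | 💻 cs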

UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations

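This paper proposes UniBYD, a unified reinforcement learning framework that enables adaptive exploration beyond imitation of human demonstrations across diverse robot embodiments, along with UniManip, a new benchmark for evaluating it. It reports a 44.08% improvement in success rate over existing state-of-the-art methods.

Tingyu Yuan, Biaoliang Guan, Wen Ye, Ziyan Tian, Yi Yang, Weijie Zhou, Zhaowen Li, Yan Huang, Peng Wang, Chaoyang Zhao, Jinqiao Wang | 2026-03-11 | 💻 cs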

Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition

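To exploit the complementarity among modalities while keeping the model efficient, this paper proposes a new self-supervised learning framework that decomposes fused multimodal features into unimodal features and then recomposes them, balancing computational cost and performance.

Hongsong Wang, Heng Fei, Bingxuan Dai + 1 more | 2026-03-11 | 💻 cs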

Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

This paper identifies Preference Mode Collapse in text-to-image generation models trained with human-preference reinforcement learning. To address it, the authors propose Directional Decoupling Alignment (D²-Align), a new framework that preserves generation diversity by directionally correcting the reward signal.

Chubin Chen, Sujie Hu, Jiashu Zhu, Meiqi Wu, Jintao Chen, Yanxun Li, Nisha Huang, Chengyu Fang, Jiahong Wu, Xiangxiang Chu, Xiu Li | 2026-03-11 | 💻 cs

A Tale of 1001 LoC: Potential Runtime Error-Guided Specification Synthesis for Verifying Large-Scale Programs

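This paper presents Preguss, a modular framework that combines static analysis with LLM-based reasoning to automatically generate and refine interprocedural specifications, guided by potential runtime errors in large-scale programs. It scales better than existing LLM-based approaches and is reported to reduce verification effort on programs of thousands of lines by 80.6% to 88.9%.

Zhongyi Wang, Tengjie Lin, Mingshuai Chen, Haokun Li, Mingqi Yang, Xiao Yi, Shengchao Qin, Yixing Luo, Xiaofeng Li, Bin Gu, Liqiang Lu, Jianwei Yin | 2026-03-11 | 💻 cs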

Secure Digital Semantic Communications: Fundamentals, Challenges, and Opportunities

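This paper systematically analyzes the security vulnerabilities and threats of practical digital semantic communications, clarifies how they differ from analog approaches, and presents defense strategies and future research directions.

Weixuan Chen, Qianqian Yang, Yuanyuan Jia + 5 more | 2026-03-11 | 💻 cs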

Low-rank Orthogonal Subspace Intervention for Generalizable Face Forgery Detection

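To address the generalization problem in deepfake detection, this paper proposes SeLop, a new method based on causal representation learning that removes forgery-irrelevant low-rank biases and focuses on genuine forgery traces.

Chi Wang, Xinjue Hu, Boyu Wang, Ziwen He, Zhangjie Fu | 2026-03-11 | 💻 cs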

Towards a Goal-Centric Assessment of Requirements Engineering Methods for Privacy by Design

This paper proposes a goal-centric approach for assessing whether requirements engineering methods for GDPR Privacy by Design (PbD) align with an organization's goals.

Oleksandr Kosenkov, Ehsan Zabardast, Jannik Fischbach, Tony Gorschek, Daniel Mendez | 2026-03-11 | 💻 cs

CovertComBench: A First Domain-Specific Testbed for LLMs in Wireless Covert Communication

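This paper proposes CovertComBench to evaluate LLMs under the strict security constraints of wireless covert communication. It finds that current LLMs perform well at conceptual understanding and code implementation but struggle with the advanced mathematical derivations needed for security guarantees, so external tool augmentation is required to build trustworthy wireless AI systems.

Zhaozhi Liu, Jiaxin Chen, Yuanai Xie, Yuna Jiang, Minrui Xu, Xiao Zhang, Pan Lai, Zan Zhou | 2026-03-11 | 💻 cs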

Weakly supervised framework for wildlife detection and counting in challenging Arctic environments: a case study on caribou (Rangifer tarandus)

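Targeting caribou herds in Alaska, this paper proposes and validates a wildlife detection and counting framework (HerdNet) that applies weakly supervised patch-based pretraining to overcome extreme-environment challenges such as background heterogeneity and class imbalance, achieving higher accuracy than conventional ImageNet initialization.

Ghazaleh Serati, Samuel Foucher, Jerome Theau | 2026-03-11 | 💻 cs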

Exploiting the Final Component of Generator Architectures for AI-Generated Image Detection

RegionReasoner: Region-Grounded Multi-Round Visual Reasoning

Optimal conversion from Rényi Differential Privacy to $f$-Differential Privacy

Pathwise Test-Time Correction for Autoregressive Long Video Generation
