Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models
Dit paper introduceert 'Think While Watching', een innovatief framework dat multimodale grote taalmodellen in staat stelt om tijdens het streamen van video's continue segmentgeheugen te behouden en gelijktijdig te waarnemen en redeneren, waardoor de prestaties bij meervoudige interacties aanzienlijk worden verbeterd en de output efficiënter wordt.
Lu Wang (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Zhuoran Jin (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Yupu Hao (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Yubo Chen (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Kang Liu (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Yulong Ao (Beijing Academy of Artificial Intelligence), Jun Zhao (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China)2026-03-13💬 cs.CL