EgoGraph: Temporal Knowledge Graph for Egocentric Video Understanding

EgoGraph is a training-free, dynamic knowledge graph framework for ultra-long egocentric video understanding. To address the limitations of existing methods, it builds a graph under a unified schema and models temporal relations between entities to capture long-term, cross-entity dependencies. With this design, it achieves state-of-the-art performance on long-term video question answering benchmarks.

Shitong Sun, Ke Han, Yukai Huang, Weitong Cai, Jifei Song

Published 2026-03-02

Imagine you are wearing a camera on your glasses for an entire week, recording every single moment of your life: what you eat, who you talk to, where you go, and what you do. Now, imagine trying to answer a question about that week, like, "Who did I have coffee with on Tuesday, and did they bring a blue mug?"

If you tried to remember this by just scrolling through hours of video, you'd get lost. If you tried to summarize it by writing a long diary entry for every hour, you'd end up with a messy, disorganized stack of papers that's hard to search.

EgoGraph is a new computer system designed to solve this exact problem. It turns an ultra-long, chaotic video record of your life into a smart, living memory map.

Here is how it works, using some simple analogies:

1. The Problem: The "Messy Diary" vs. The "Smart Map"

Current AI systems try to understand long videos by breaking them into small chunks (like 1-minute clips) and summarizing each one separately.

  • The Analogy: Imagine trying to understand a whole movie by reading a one-sentence summary of every scene, but the summaries are just piled in a random box. If you ask, "How did the hero's relationship with the villain change from the beginning to the end?" the AI has to dig through thousands of disconnected notes to find the connection. It often misses the big picture because it doesn't see how Scene A connects to Scene Z.

2. The Solution: EgoGraph (The "Living Memory Map")

EgoGraph doesn't just summarize; it builds a Knowledge Graph. Think of this as a giant, interactive mind-map or a subway system for your memories.

  • The Nodes (Stations): Instead of just text, the system identifies key "stations" in your life: People (John), Objects (the yellow mug), Places (the kitchen), and Events (the meeting).
  • The Edges (Tracks): It draws lines between these stations to show how they connect. "John" is connected to "the yellow mug" because he held it. "The yellow mug" is connected to "the kitchen" because it was there.
  • The Time Travel (The Secret Sauce): This is the most important part. Every connection has a timestamp. It's not just a map; it's a time-traveling map. It knows that John held the mug on Day 1 at 9:00 AM, but on Day 3, he held a red cup.
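To make the "time-traveling map" concrete, here is a minimal Python sketch of a graph whose edges carry timestamps. The class names, the four node types, the relation labels (`held`, `located_in`), and the calendar dates standing in for "Day 1" and "Day 3" are all illustrative assumptions, not EgoGraph's actual data model.

```python
from dataclasses import dataclass, field
from datetime import date, datetime

# Illustrative node types mirroring the four kinds of "stations" (an assumption).
NODE_TYPES = {"person", "object", "location", "event"}


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # one of NODE_TYPES


@dataclass(frozen=True)
class Edge:
    source: str     # name of the source node
    relation: str   # hypothetical relation label, e.g. "held"
    target: str     # name of the target node
    time: datetime  # when the relation was observed


@dataclass
class TemporalGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, name: str, kind: str) -> None:
        if kind not in NODE_TYPES:
            raise ValueError(f"unknown node type: {kind}")
        self.nodes[name] = Node(name, kind)

    def add_edge(self, source: str, relation: str, target: str, time: datetime) -> None:
        # Every track must connect two existing stations.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("add both endpoints as nodes before linking them")
        self.edges.append(Edge(source, relation, target, time))

    def targets_on(self, source: str, relation: str, day: date) -> list[str]:
        # Same source and relation, but the answer depends on the date asked about.
        return [e.target for e in self.edges
                if e.source == source and e.relation == relation and e.time.date() == day]


g = TemporalGraph()
g.add_node("John", "person")
g.add_node("yellow mug", "object")
g.add_node("red cup", "object")
g.add_node("kitchen", "location")
g.add_edge("John", "held", "yellow mug", datetime(2024, 1, 1, 9, 0))
g.add_edge("yellow mug", "located_in", "kitchen", datetime(2024, 1, 1, 9, 0))
g.add_edge("John", "held", "red cup", datetime(2024, 1, 3, 10, 30))

print(g.targets_on("John", "held", date(2024, 1, 1)))  # ['yellow mug']
print(g.targets_on("John", "held", date(2024, 1, 3)))  # ['red cup']
```

Putting the timestamp on the edge rather than on the node is what lets the same question ("what did John hold?") return different answers for different days.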

3. How It Learns: The "Human Brain" Approach

The paper notes that human memory organizes experience around who was involved, where it happened, and what happened. EgoGraph copies this:

  • The Schema (The Filing System): It has a strict rulebook (a schema) for what matters. It knows that "John" is a person, "coffee" is an object, and "kitchen" is a location. It doesn't get confused by random words.
  • The Update (The Growing Tree): As you record more days, the graph doesn't just get bigger and messier. It merges things. If you see "John" on Day 1 and Day 5, the system realizes, "Oh, that's the same John!" and updates his profile with new habits or preferences, rather than creating a duplicate "John #2." It keeps the memory compact but rich.
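The schema check and the "same John" merge can be sketched as an upsert keyed on entity identity. Everything here is a hypothetical illustration: the `SCHEMA` contents, the `EntityStore` class, and especially the identity key (a lowercased name plus type), which is a naive stand-in; this summary does not say how EgoGraph decides that two sightings are the same entity.

```python
from dataclasses import dataclass, field

# Hypothetical schema: allowed entity types and the attributes each may carry.
SCHEMA: dict[str, set[str]] = {
    "person": {"habits", "preferences"},
    "object": {"color"},
    "location": set(),
}


@dataclass
class Entity:
    name: str
    kind: str
    attributes: dict[str, set[str]] = field(default_factory=dict)
    days_seen: set[int] = field(default_factory=set)


class EntityStore:
    def __init__(self) -> None:
        self.entities: dict[tuple[str, str], Entity] = {}

    def upsert(self, name: str, kind: str, day: int, **attrs: str) -> Entity:
        # The schema acts as the filing system: unknown types or attributes are rejected.
        if kind not in SCHEMA:
            raise ValueError(f"type {kind!r} is not in the schema")
        unknown = set(attrs) - SCHEMA[kind]
        if unknown:
            raise ValueError(f"attributes {sorted(unknown)} not allowed for {kind!r}")
        # Naive identity key (assumption): normalized name plus type.
        key = (name.strip().lower(), kind)
        entity = self.entities.get(key)
        if entity is None:
            entity = Entity(name=name.strip(), kind=kind)
            self.entities[key] = entity
        # Merge: record the new sighting and add attributes instead of duplicating.
        entity.days_seen.add(day)
        for attr, value in attrs.items():
            entity.attributes.setdefault(attr, set()).add(value)
        return entity


store = EntityStore()
store.upsert("John", "person", day=1, habits="drinks coffee")
store.upsert("john ", "person", day=5, preferences="blue mug")

john = store.entities[("john", "person")]
print(len(store.entities))     # 1
print(sorted(john.days_seen))  # [1, 5]
print(john.attributes)         # {'habits': {'drinks coffee'}, 'preferences': {'blue mug'}}
```

Keying on identity rather than on each sighting is what prevents a "John #2"; in practice, the hard part is the identity test itself.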

4. Answering Questions: The "Detective"

When you ask a question like, "Who did I see in the kitchen yesterday?", EgoGraph acts like a detective:

  1. Time Filter: It immediately discards everything outside the time window the question asks about, so it only looks at the "yesterday" part of the map.
  2. Search: It zooms in on the "Kitchen" station and follows the tracks to see who was connected to it at that specific time.
  3. Reasoning: If the question is tricky, like "Did I start drinking tea after I moved to the new house?", the system can trace the timeline of "Tea" and "New House" to see if the tea-drinking habit started after the move.
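The three detective steps can be sketched over a flat list of timestamped edges. The `Edge` tuple, the relation names (`was_in`, `moved_to`, `drank`), the helper functions, and the dates are hypothetical. The sketch only illustrates the general idea: filter by time first, then follow edges around one node, then compare first-occurrence times. It is not EgoGraph's actual retrieval procedure.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    source: str
    relation: str  # hypothetical relation labels
    target: str
    time: datetime


def in_window(edges: list[Edge], start: datetime, end: datetime) -> list[Edge]:
    # Step 1, time filter: keep only edges observed in [start, end).
    return [e for e in edges if start <= e.time < end]


def who_was_at(edges: list[Edge], place: str) -> list[str]:
    # Step 2, search: follow "was_in" edges that point at the place.
    return sorted({e.source for e in edges if e.relation == "was_in" and e.target == place})


def first_time(edges: list[Edge], source: str, relation: str, target: str) -> Optional[datetime]:
    # Earliest observation of this exact fact, or None if it never appears.
    times = [e.time for e in edges if (e.source, e.relation, e.target) == (source, relation, target)]
    return min(times, default=None)


def started_after(edges: list[Edge], habit: tuple[str, str, str],
                  event: tuple[str, str, str]) -> Optional[bool]:
    # Step 3, reasoning: did the habit first show up after the event first happened?
    habit_start = first_time(edges, *habit)
    event_time = first_time(edges, *event)
    if habit_start is None or event_time is None:
        return None  # the graph holds no evidence for one side
    return habit_start > event_time


edges = [
    Edge("Ben", "was_in", "kitchen", datetime(2024, 1, 1, 8, 0)),
    Edge("Me", "moved_to", "new house", datetime(2024, 1, 1, 12, 0)),
    Edge("Me", "drank", "tea", datetime(2024, 1, 2, 7, 0)),
    Edge("Anna", "was_in", "kitchen", datetime(2024, 1, 2, 8, 30)),
    Edge("John", "was_in", "kitchen", datetime(2024, 1, 2, 19, 0)),
    Edge("John", "was_in", "garage", datetime(2024, 1, 2, 20, 0)),
]

# Treat 2024-01-02 as "yesterday" (an arbitrary date for the example).
yesterday = in_window(edges, datetime(2024, 1, 2), datetime(2024, 1, 3))
print(who_was_at(yesterday, "kitchen"))  # ['Anna', 'John']
print(started_after(edges, ("Me", "drank", "tea"), ("Me", "moved_to", "new house")))  # True
```

Ben was in the kitchen too, but on the previous day, so the time filter removes him before the search even starts. The ordering check needs only the earliest timestamp of each fact, which is why timestamped edges matter.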

Why This Matters

Previous AI models were like students who studied for a test by memorizing isolated facts. If the question required connecting two facts from different days, they failed.

EgoGraph is like a student who keeps a structured, time-stamped journal. It can look back at a week of data, find the specific connections between people and objects, and answer complex questions about your daily life with high accuracy.

In short: EgoGraph turns a chaotic, endless video stream into a clean, organized, time-traveling map of your life, allowing computers to finally understand the "story" of your day, not just the individual frames.
