CompanionCast: Toward Social Collaboration with Multi-Agent Systems in Shared Experiences

The paper introduces CompanionCast, a multi-agent framework that orchestrates specialized AI agents with multimodal detection, context caching, and spatial audio to enhance social presence and emotional sharing during shared media experiences, as validated by improved outcomes in pilot studies with soccer fans.

Yiyang Wang, Chen Chen, Tica Lin, Vishnu Raj, Josh Kimball, Alex Cabral, Josiah Hester

Published Tue, 10 Ma

Imagine you're sitting on your couch watching a thrilling soccer match. You're alone, but you wish you had a group of friends right there with you to scream "GOAL!" when the ball hits the net, to argue about a bad referee call, or to crack a joke when a player slips.

That's exactly the problem CompanionCast is trying to solve. It's a new system that uses a team of AI "friends" to watch videos with you, making the experience feel less lonely and more like a real social gathering.

Here is how it works, broken down into simple concepts:

1. The Cast of Characters (The Multi-Agent Team)

Instead of having just one robotic voice narrating the game, CompanionCast creates a team of three distinct AI personalities, each with their own job and voice:

  • The Die-Hard Fan: This is your hype man. They are super emotional, cheering wildly for your team, and screaming when a goal is scored.
  • The Analyst: This is the smart friend who knows the rules. They calmly explain why a play worked, pointing out tactical moves and statistics.
  • The Comedian: This is the friend who likes to tease the other team. They make sarcastic jokes and keep the mood light, even when things get tense.

The Analogy: Think of it like a sports bar. You don't just have one guy talking; you have the loud fan in the corner, the guy with the stats in his head, and the joker making everyone laugh. CompanionCast brings all three of these people to your living room.
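To make the idea of a "cast" concrete, here is a minimal sketch of how three personas might be described in code. The Persona class, its fields, and the build_prompt helper are hypothetical names chosen for illustration; the summary above does not say how CompanionCast actually represents its agents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical description of one AI companion."""
    name: str
    role: str
    style: str


# The three personalities described above, written as plain data.
CAST = [
    Persona("Die-Hard Fan", "hype", "emotional, cheers loudly for the viewer's team"),
    Persona("Analyst", "analysis", "calm, explains tactics and statistics"),
    Persona("Comedian", "humor", "sarcastic, teases the other team"),
]


def build_prompt(persona: Persona, event: str) -> str:
    """Build an illustrative instruction for a language model (assumed format)."""
    return (
        f"You are the {persona.name}. Style: {persona.style}. "
        f"React briefly to this event: {event}"
    )


if __name__ == "__main__":
    for persona in CAST:
        print(build_prompt(persona, "goal"))
```

Keeping each persona as data rather than as separate code paths means a new character could be added by appending one entry to the list.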

2. The "Eyes and Ears" (Event Detection)

The system doesn't just chat randomly. It has "eyes" watching the video and "ears" listening to the audio. It uses special technology to spot Key Moments.

  • The Trigger: When the AI sees a goal, a foul, or a replay, it knows, "Okay, this is the moment! Time to speak up!"
  • The Memory: It keeps a "rolling memory" of the last minute of the game. This ensures that if the Analyst mentions a player's name, the Comedian can immediately make a joke about that specific player, just like real friends do.
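The trigger and the rolling memory can be sketched as follows. This is only an illustration under stated assumptions: the event labels, the 60-second window, and the names RollingMemory and should_speak are hypothetical, and the real system's detectors and cache are not described in this summary.

```python
from collections import deque

# Assumed event labels, taken from the examples in the text.
KEY_EVENTS = {"goal", "foul", "replay"}
# "The last minute of the game", expressed in seconds (assumed value).
WINDOW_SECONDS = 60.0


def should_speak(event_label: str) -> bool:
    """Return True when a detected event is one of the key moments."""
    return event_label in KEY_EVENTS


class RollingMemory:
    """Keep only the context from the most recent time window."""

    def __init__(self, window: float = WINDOW_SECONDS) -> None:
        self.window = window
        self.items = deque()  # holds (timestamp, text) pairs, oldest first

    def _evict(self, now: float) -> None:
        # Drop entries that are older than the window.
        while self.items and now - self.items[0][0] > self.window:
            self.items.popleft()

    def add(self, timestamp: float, text: str) -> None:
        self.items.append((timestamp, text))
        self._evict(timestamp)

    def recent(self, now: float) -> list:
        self._evict(now)
        return [text for _, text in self.items]


if __name__ == "__main__":
    memory = RollingMemory()
    memory.add(0.0, "Kickoff")
    memory.add(30.0, "Analyst mentions the striker by name")
    memory.add(70.0, "Goal scored")
    if should_speak("goal"):
        print(memory.recent(70.0))
```

Because every agent reads from the same recent context, the Comedian can pick up on a name the Analyst just mentioned.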

3. The "Sound Stage" (Spatial Audio)

When an AI character speaks, the voice isn't played as flat sound from your speakers. CompanionCast uses spatial audio to place each character at a specific spot in your room.

  • The Analogy: Imagine sitting in a circle with friends. The Die-Hard Fan is shouting from your left, the Analyst is whispering from your right, and the Comedian is talking from behind you. Your brain naturally separates them, making it feel like you are physically in a room with a group, not just listening to a podcast.
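As a rough illustration of placing voices around the listener, the sketch below maps an assumed azimuth angle for each character to left and right speaker gains using constant-power stereo panning. This is a deliberately simplified stand-in, not the system's method: plain stereo panning cannot place a voice behind the listener, so the Comedian collapses to the center here, and true spatial audio would need something richer. The angles and function names are assumptions.

```python
import math

# Assumed azimuths in degrees: 0 is straight ahead, negative is to the left.
POSITIONS = {
    "Die-Hard Fan": -90.0,  # left
    "Analyst": 90.0,        # right
    "Comedian": 180.0,      # behind (collapses to center in plain stereo)
}


def stereo_gains(azimuth_deg: float) -> tuple:
    """Constant-power panning: return (left_gain, right_gain)."""
    pan = math.sin(math.radians(azimuth_deg))  # -1 is full left, +1 is full right
    theta = (pan + 1.0) * math.pi / 4.0        # map pan to the range 0..pi/2
    return math.cos(theta), math.sin(theta)


if __name__ == "__main__":
    for name, azimuth in POSITIONS.items():
        left, right = stereo_gains(azimuth)
        print(f"{name}: left={left:.2f} right={right:.2f}")
```

Constant-power panning keeps the squared gains summing to one, so a voice sounds equally loud wherever it is placed.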

4. The "Editor" (The Evaluator Agent)

Before the AI friends speak to you, their lines pass through a behind-the-scenes "boss": the Editor Agent, which acts as the system's evaluator.

  • How it works: The three friends generate a conversation, but the Editor checks it first. It asks: "Is this funny? Does the Analyst sound too robotic? Is the Comedian being too mean?"
  • The Refinement: If the conversation isn't good enough, the Editor sends it back for a quick rewrite. This ensures the final chat feels natural, authentic, and high-quality before you hear it.
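The check-and-rewrite cycle can be sketched as a small loop. Everything here is a hypothetical illustration: the scoring scale, the threshold, the round limit, and the stub functions stand in for model calls that this summary does not describe.

```python
def refine_dialogue(generate, evaluate, threshold=0.7, max_rounds=2):
    """Generate a draft, then rewrite it while the evaluator rejects it.

    generate(feedback) returns a list of dialogue lines.
    evaluate(draft) returns a (score, feedback) pair, with score in 0..1.
    If the rounds run out, the latest draft is returned without a final check.
    """
    draft = generate(None)
    for _ in range(max_rounds):
        score, feedback = evaluate(draft)
        if score >= threshold:
            break
        draft = generate(feedback)
    return draft


def stub_generate(feedback):
    """Stand-in for the three agents; the second attempt uses all personas."""
    if feedback is None:
        return ["Analyst: Goal scored."]
    return [
        "Fan: GOAL!",
        "Analyst: That run split the defense.",
        "Comedian: Their keeper just waved it in.",
    ]


def stub_evaluate(draft):
    """Stand-in for the Editor: score by how many personas take part."""
    speakers = {line.split(":")[0] for line in draft}
    score = len(speakers) / 3
    feedback = "" if score >= 1 else "Include all three personas."
    return score, feedback


if __name__ == "__main__":
    for line in refine_dialogue(stub_generate, stub_evaluate):
        print(line)
```

Capping the number of rewrite rounds matters for a live broadcast, since a reaction that arrives long after the goal is worse than a slightly imperfect one.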

5. Did It Work? (The Pilot Study)

The researchers tested this with two soccer fans. They watched a clip alone, then watched the same clip with CompanionCast.

  • The Result: The fans felt much less alone. They rated the experience as having a higher "social presence" (feeling like they were with others).
  • The Feeling: They said the different voices and personalities made them feel like they were hanging out with a group of friends, rather than just watching a video with a narrator.

The Big Picture

CompanionCast is trying to bring the magic of "hanging out with friends" back to our solitary screen time. It realizes that watching a movie or a game isn't just about the content; it's about the reactions, the arguments, and the shared excitement. By using a team of specialized AI agents that talk to each other and to you, it turns a lonely viewing session into a lively, shared party.

In short: It's like having a virtual sports bar in your pocket, ready to cheer, analyze, and joke with you the second the ball hits the net.