Mindstorms in Natural Language-Based Societies of Mind

This paper proposes Natural Language-Based Societies of Mind (NLSOMs): a modular framework in which large multimodal neural networks communicate via natural language to solve complex AI tasks more effectively than single models can. It also explores the social, economic, and structural challenges that emerge when such heterogeneous societies scale to billions of agents.

Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanic, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber

Published Thu, 12 Ma

Imagine you're trying to solve a really tough puzzle, like building a complex Lego castle or figuring out why a car won't start. If you try to do it all by yourself, you might get stuck or miss a crucial piece. But what if you had a team of experts? You could ask a mechanic, a painter, an architect, and a historian, and let them talk to each other until they figure it out together.

This paper is about building that exact kind of team, but instead of real people, the team is made of AI computers.

Here's the breakdown using a few simple analogies:

1. The "Brain" vs. The "Brain Trust"

For a long time, AI researchers tried to build one giant, super-smart computer brain (a Large Language Model) that could do everything. It's like hiring one person who claims to be a genius at math, art, coding, and cooking. However talented that person is, they still make mistakes and get confused.

This paper suggests a different approach: The Society of Mind. Instead of one giant brain, imagine a bustling town square filled with hundreds of smaller, specialized AI "agents."

  • One agent is great at looking at pictures.
  • Another is a master of writing code.
  • A third is an expert at 3D modeling.
  • A fourth is just really good at asking the right questions.

2. The "Mindstorm" Meeting

The magic happens when these agents start talking to each other. The authors call this a "Mindstorm."

Think of it like a round-table discussion at a restaurant.

  • The Problem: "I need to describe this photo of a dog running in the rain."
  • Agent A (The Visual Expert): "I see a golden retriever, but the image is blurry."
  • Agent B (The Language Expert): "Let's ask Agent C to clarify the weather conditions."
  • Agent C (The Context Expert): "It's definitely raining, and the dog looks happy, not scared."
  • The Result: They bounce ideas back and forth, correcting each other's mistakes, until they come up with a perfect description: "A happy golden retriever splashing through a puddle in the pouring rain."

By having them chat in natural language (just like we do), they can easily swap in new experts. If you need a new agent who knows about 3D printing, you just add them to the chat room, and they start contributing immediately.
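The round-table idea above can be sketched in a few lines of code. This is not the authors' implementation; it is a minimal toy illustration, and every class, function, and agent name here is an assumption made for the example. The key point it shows is the shared natural-language transcript: each agent only needs to read it and append a message, so swapping in a new expert is just adding one more member to the list.

```python
# Toy sketch of a "mindstorm" (illustrative only, not the paper's code):
# specialized agents take turns appending natural-language messages to
# one shared transcript, so any agent that "speaks" text can join.

class Agent:
    """A society member; respond_fn maps the transcript to a reply."""

    def __init__(self, name, respond_fn):
        self.name = name
        self.respond_fn = respond_fn

    def respond(self, transcript):
        return f"{self.name}: {self.respond_fn(transcript)}"


def mindstorm(task, agents, rounds=1):
    """Run a round-table discussion: every agent reads the shared
    transcript and appends one message per round."""
    transcript = [f"Task: {task}"]
    for _ in range(rounds):
        for agent in agents:
            transcript.append(agent.respond(transcript))
    return transcript


# Stand-ins for large models; real agents would call a model here.
vision = Agent("Vision", lambda t: "I see a golden retriever; the image is blurry.")
context = Agent("Context", lambda t: "It is raining, and the dog looks happy.")
writer = Agent("Writer", lambda t: "Draft: a happy golden retriever splashing in the rain.")

log = mindstorm("Describe this photo of a dog in the rain.",
                [vision, context, writer])
for line in log:
    print(line)
```

Adding a hypothetical 3D-printing expert would just mean appending one more `Agent` to the list passed to `mindstorm`; nothing else in the loop has to change, which is the modularity the natural-language interface buys.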

3. What They Actually Did

The researchers didn't just talk about this; they built a few of these "societies." They created groups with up to 129 different AI agents working together. They tested them on hard tasks like:

  • Answering questions about images.
  • Turning text into pictures.
  • Creating 3D models.
  • Helping robots understand their surroundings.

The result? These teams of AIs solved problems far better than any single model working alone.

4. The Big Future Question

The most exciting part of the paper is looking ahead. The authors imagine a future where these societies grow to have billions of agents. Some might be AI, and some might even be humans joining the conversation.

This raises some fascinating "social" questions, similar to how we organize human societies:

  • Who is the boss? Should one AI act as a "King" (Monarchy) making all the decisions, or should everyone vote (Democracy)?
  • How do we pay them? If these agents are working together, how do we reward the ones who do the best work to keep the whole group happy and efficient?

The Bottom Line

This paper is a blueprint for the future of AI. Instead of trying to build one "God-like" robot, we should build a collaborative ecosystem where many different smart tools talk to each other, argue, agree, and solve problems together. It's about moving from a solo act to a full orchestra, where the music is much better than any single instrument could play on its own.