CoMind: Towards Community-Driven Agents for Machine Learning Engineering

The paper introduces CoMind, a community-driven multi-agent system that leverages collective knowledge through an iterative parallel exploration mechanism, achieving state-of-the-art performance with a 36% medal rate on historical Kaggle competitions and outperforming 92.6% of human competitors in live challenges.

Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang

Published 2026-03-02

The Big Idea: From Lone Wolves to a Super-Team

Imagine you are trying to solve an incredibly difficult puzzle, like a massive jigsaw with no picture on the box.

The Old Way (Current AI Agents):
Most AI agents today are like lone wolves. They are given the puzzle pieces and told, "Go figure this out." They stare at the pieces, try a few combinations, get stuck, try again, and eventually give up. They work in a vacuum, ignoring everyone else who is also trying to solve the puzzle. They don't know that someone else just figured out that the blue sky pieces go together, or that a specific corner piece is actually part of a tree, not the sky.

The New Way (CoMind):
The researchers behind this paper realized that human experts don't work like lone wolves. When humans compete in data science (like on the website Kaggle), they form a community. They read forums, share code snippets, say, "Hey, I tried this, it failed," or "Look at this cool trick I found!" They build on each other's work.

CoMind is an AI system designed to act like that super-connected human community. It doesn't just solve the puzzle alone; it joins a simulated "town square," reads everyone else's notes, learns from their mistakes, and then uses that collective wisdom to build a better solution than any single person could.


How CoMind Works: The "Dream Team" of AI

Instead of one giant brain trying to do everything, CoMind is a multi-agent system. Think of it as a high-tech research lab with five specialized employees, each with a specific job, working together 24/7.

  1. The Coordinator (The Project Manager):
    • Role: This agent runs the show. It looks at the "town square" (the community data), picks the most promising ideas and code snippets, and assigns tasks to the team. It's the glue holding the operation together.
  2. The Analyzer (The Detective):
    • Role: It reads the thousands of posts and code files from the community. It doesn't just read them; it analyzes them. "This code is clever but slow," or "This idea is new but risky." It summarizes the best parts and warns the team about pitfalls.
  3. The Idea Proposer (The Inventor):
    • Role: This is the creative genius. It takes the detective's report and says, "Okay, if we mix this trick from User A with that strategy from User B, and add a little bit of our own magic, we might get a gold medal!" It brainstorms wild, new solutions.
  4. The Coding Agents (The Builders):
    • Role: These are the workers. They take the Inventor's blueprints and actually write the code to build the solution. They try it out, see if it breaks, fix the bugs, and try again. They work in parallel, so they are building multiple versions of the solution at the same time.
  5. The Evaluator (The Judge):
    • Role: This agent acts like a strict referee. It tests the code against the rules to see if it actually works and how well it scores. It keeps a leaderboard of who is winning so the team knows what to keep and what to throw away.
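The five-role loop above can be sketched in code. This is a minimal illustrative sketch, not the paper's actual implementation: all class names, method signatures, and the stubbed logic (keyword filtering, toy scoring) are assumptions made only to show how the roles hand work to each other.

```python
from dataclasses import dataclass

@dataclass
class Idea:
    """A candidate solution idea with its evaluation score."""
    description: str
    score: float = 0.0

class Analyzer:
    def digest(self, posts):
        # Stub: summarize community posts into usable insights.
        return [p for p in posts if "trick" in p.lower()]

class IdeaProposer:
    def propose(self, insights):
        # Stub: combine community insights into new candidate ideas.
        return [Idea(f"build on: {insight}") for insight in insights]

class CodingAgent:
    def implement(self, idea):
        # Stub: in the real system this would write and debug code.
        return idea

class Evaluator:
    def score(self, solution):
        # Stub heuristic standing in for running validation.
        solution.score = len(solution.description) % 10 / 10
        return solution.score

class Coordinator:
    """Runs one round: analyze -> propose -> build (in parallel) -> judge."""
    def __init__(self, n_coders=3):
        self.analyzer, self.proposer = Analyzer(), IdeaProposer()
        self.coders = [CodingAgent() for _ in range(n_coders)]
        self.evaluator = Evaluator()
        self.leaderboard = []

    def run_round(self, community_posts):
        insights = self.analyzer.digest(community_posts)
        ideas = self.proposer.propose(insights)
        # Each coding agent builds one candidate; conceptually in parallel.
        for coder, idea in zip(self.coders, ideas):
            solution = coder.implement(idea)
            self.evaluator.score(solution)
            self.leaderboard.append(solution)
        self.leaderboard.sort(key=lambda s: s.score, reverse=True)
        return self.leaderboard[0] if self.leaderboard else None
```

Each round, the best candidates stay on the leaderboard and feed the next round of proposals, which is how the "iterative parallel exploration" compounds.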

The "Live" Experiment: MLE-Live

To test if this system actually works, the researchers created a special playground called MLE-Live.

  • The Analogy: Imagine a video game where you have to solve a problem, but you are allowed to read a live chat room where other players are posting their strategies while the game is happening.
  • The Challenge: Most AI tests are "closed book" (no outside material allowed). MLE-Live is an "open book" test. It simulates a real Kaggle competition where the AI has access to the same discussions and code that human competitors have, but it has to figure out how to use them without cheating (like peeking at post-deadline winning solutions).
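The anti-cheating constraint can be sketched as a time-based filter over the community feed. This is a hypothetical illustration, assuming posts carry a timestamp and a flag for post-deadline content; the field names and the cutoff rule are my assumptions, not the benchmark's actual mechanism.

```python
from datetime import datetime

def visible_context(posts, agent_clock):
    """Return only the community posts an agent could legitimately
    have seen: posted before the agent's simulated 'now', and not
    flagged as revealing post-deadline information (e.g. winning
    solution writeups)."""
    return [
        p for p in posts
        if p["posted_at"] <= agent_clock and not p.get("post_deadline", False)
    ]

posts = [
    {"text": "EDA notebook", "posted_at": datetime(2024, 1, 5)},
    {"text": "winning solution writeup", "posted_at": datetime(2024, 3, 1),
     "post_deadline": True},
    {"text": "feature engineering idea", "posted_at": datetime(2024, 2, 1)},
]
allowed = visible_context(posts, agent_clock=datetime(2024, 2, 15))
# allowed keeps the EDA notebook and the feature idea, not the writeup
```

The point of a filter like this is that the agent benefits from the live chat room without ever reading answers it could not have seen mid-competition.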

The Results: Beating the Humans

The results were impressive. The researchers tested CoMind in two ways:

  1. The Historical Test: They ran CoMind on 75 past competitions. It won medals (Gold, Silver, or Bronze) in 36% of them, a new state of the art for AI agents on this benchmark.
  2. The Live Test: They sent CoMind into 8 real, ongoing competitions happening right now.
    • The Result: CoMind performed better than 92.6% of all the human teams.
    • The Highlight: On one specific competition, CoMind finished in the top 1% of all humans. On three others, it was in the top 5%.

Why This Matters

Think of AI development like building a house.

  • Old AI: One architect trying to design the whole house alone, making mistakes because they don't know about the latest building materials.
  • CoMind: A construction crew that reads the latest architectural magazines, talks to other builders, learns from the best designs in the neighborhood, and then builds a house that is stronger, faster, and smarter than anything built before.

In short: CoMind proves that for AI to truly master complex tasks like Machine Learning Engineering, it needs to stop working in isolation and start acting like a collaborative, community-driven human team. It's not just about being smart; it's about knowing how to listen, learn, and build together.
