ExSampling: a system for the real-time ensemble performance of field-recorded environmental sounds

The paper proposes ExSampling, an integrated system that combines a recording application with a deep-learning environment to enable real-time ensemble performance of field-recorded environmental sounds, automatically mapping incoming sounds to Ableton Live tracks.

Atsuya Kobayashi, Reo Anzai, Nao Tokui

Published 2026-03-10

Imagine you are at a concert, but instead of the musician just playing a guitar or a synthesizer, they are conducting a symphony made entirely of the sounds happening right now all over the world.

That is the core idea behind ExSampling, a system created by researchers at Keio University. Here is a simple breakdown of how it works, using some everyday analogies.

The Problem: The "Library" Bottleneck

Traditionally, if a musician wants to use the sound of a rainstorm or a busy street in their song, they have to go out, record it, come back, listen to hours of tape, find the perfect 5-second clip, and then manually teach their computer how to play it.

It's like trying to cook a gourmet meal, but you have to spend three hours just chopping vegetables before you can even turn on the stove. By the time you are ready to cook, the inspiration is gone, and the "freshness" of the moment is lost.

The Solution: The "Magic Translator"

ExSampling is like a real-time magic translator that instantly turns the chaos of the outside world into a musical instrument. It connects three players:

  1. The Field Recorders: Regular people with smartphones or laptops anywhere in the world.
  2. The Performer: A musician on stage (or online) playing a show.
  3. The AI Brain: A computer program that does the heavy lifting.

How It Works (The Three Steps)

1. The "Ears" (The Web Recorder)
Imagine a website that acts like a universal microphone. Anyone can visit this site, press a button, and record whatever sound is around them—a barking dog, a train whistle, or wind in the trees.

  • The Magic: As soon as they hit record, that sound is instantly sent over the internet to the musician's computer. No waiting, no downloading.
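The key property here is streaming rather than upload-and-wait: each audio chunk is forwarded the moment it is captured. A minimal sketch of that idea (my own illustration using an in-memory queue, not the paper's actual network protocol):

```python
import asyncio

async def field_recorder(relay, chunks):
    # Simulates a browser recorder streaming audio chunks as they are captured
    for chunk in chunks:
        await relay.put(chunk)   # forwarded immediately; no "save and upload" step
    await relay.put(None)        # end-of-stream marker

async def performer(relay):
    # The musician's machine consumes each chunk the moment it arrives
    received = []
    while (chunk := await relay.get()) is not None:
        received.append(chunk)
    return received

async def jam():
    relay = asyncio.Queue()
    chunks = [b"\x00\x01", b"\x02\x03", b"\x04\x05"]
    _, received = await asyncio.gather(field_recorder(relay, chunks),
                                       performer(relay))
    return received

print(asyncio.run(jam()))
```

In the real system the "relay" would be a network connection (for example a WebSocket) between the web recorder and the musician's computer, but the producer/consumer shape is the same.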

2. The "Brain" (The Deep Learning AI)
This is the coolest part. The musician's computer has a "brain" (a Neural Network) that listens to the incoming sound and instantly guesses what it is.

  • The Analogy: Think of it like a very fast, very smart librarian. You hand them a book (the sound), and they immediately shout, "This is a bird!" or "This is a car!"
  • Under the Hood: The system converts the incoming sound into a visual map (a spectrogram) and asks the AI, "What is this?" The AI answers in milliseconds.
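That "visual map" is built by slicing the waveform into short overlapping frames and measuring how much energy sits at each frequency in each frame. Here is a bare-bones version (the paper's pipeline likely uses a mel-scaled spectrogram feeding a neural network; this sketch only shows the waveform-to-spectrogram step):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Slice the waveform into overlapping frames and take FFT magnitudes,
    producing a time-by-frequency grid the classifier can 'look at'."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    window = np.hanning(frame_len)          # taper each frame to reduce leakage
    return np.abs(np.fft.rfft(np.array(frames) * window, axis=1))

# Sanity check: a pure 1 kHz tone at a 16 kHz sample rate should put its
# energy near frequency bin 16 (bin spacing = 16000 / 256 = 62.5 Hz)
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))
peak_bin = int(spec.mean(axis=0).argmax())
print(peak_bin)
```

A real classifier would take this 2-D grid as input, exactly like an image-recognition network takes a photo.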

3. The "Conductor" (The Music Software)
Once the AI knows what the sound is, it automatically assigns it to a musical instrument.

  • If the AI hears a bird chirping, it might assign that sound to a flute track.
  • If it hears a car honking, it might assign it to a drum track.
  • If it hears wind, it might assign it to a synthesizer.

The musician doesn't have to do any of this sorting. They just play their MIDI keyboard, and the system instantly swaps in the fresh, real-world sound that matches the note they are playing.
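The routing described above boils down to two small lookup tables: one from the AI's predicted label to an instrument track, and one from the performer's MIDI note to a track. A sketch under that assumption (all labels, notes, and file names here are hypothetical; in the actual system the mapping lives inside Ableton Live):

```python
# Hypothetical label-to-track table; the real assignments are configured
# by the performer inside Ableton Live.
SOUND_TO_TRACK = {
    "bird": "flute",
    "car_horn": "drums",
    "wind": "synthesizer",
}

NOTE_TO_TRACK = {60: "flute", 62: "drums"}  # MIDI note -> track

def assign_track(predicted_label, default="percussion"):
    """Route a classified environmental sound to an instrument track."""
    return SOUND_TO_TRACK.get(predicted_label, default)

def on_midi_note(note, latest_samples):
    """When the performer plays a note, trigger the freshest field recording
    currently sitting on that note's track."""
    track = NOTE_TO_TRACK.get(note)
    return latest_samples.get(track)

latest = {"flute": "bird_from_london.wav", "drums": "train_from_tokyo.wav"}
print(assign_track("bird"), on_midi_note(60, latest))
```

Because `latest_samples` is overwritten whenever a new recording arrives, the same keyboard note can sound different from one minute to the next.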

The Experience: A Global Jam Session

Imagine a performer in Tokyo playing a drum solo.

  • A student in London records the sound of a subway train. The system hears "train," maps it to a "bass drum," and the London train sound plays in the Tokyo concert.
  • A hiker in the Alps records a rushing river. The system hears "water," maps it to a "hi-hat," and the river sound joins the beat.

The performer gets a notification on their screen showing a map of where the sounds came from. They can see, "Oh, the bass drum is coming from London right now!"

Why This Matters

  • Serendipity (Happy Accidents): The musician doesn't know exactly what sounds will arrive. Maybe the AI mistakes a cat meow for a violin. That "mistake" might create a weird, beautiful sound that the musician loves. It adds an element of surprise and tension to the show.
  • Democratizing Music: You don't need to be a professional sound engineer to contribute. If you have a phone and a quiet moment, you can be part of a live concert.
  • Preserving the Moment: It captures the "vibe" of a specific place and time and turns it into music instantly, keeping the performance feeling alive and connected to the real world.

The Future

The researchers admit the system isn't perfect yet. Sometimes the AI guesses wrong, and the musician can't always choose to reject a sound they don't like. But they plan to add features like sending photos or videos along with the sound, so the audience can see exactly where the "drum beat" was recorded.

In short: ExSampling turns the whole world into a musical instrument, allowing a performer to play a live concert using the sounds of the world as they happen, with a smart computer acting as the bridge between the noise of reality and the harmony of music.