
Recursion, Stems, and the Architecture Nobody Explains


GIRL BARS — Music Tech | Endodeca — Part 3

The article has actual working FFmpeg commands per stem type, a real functional Web Audio API implementation you can drop into an HTML inscription today, the demucs workflow for people working from finished masters without session files, and an honest section on where the tooling is still rough and why some developers aren't sharing it.

I keep hearing the word "recursion" in music and blockchain conversations. I hear it at showcases, in Discord servers, on panels. And here is what I've noticed: the people saying it most confidently are never the ones who built anything with it. They're amplifying something they were told by someone who was told by someone else, and somewhere back in that chain of telephone there was a developer who actually knew, and that developer has long since stopped talking to panels.

That ends here.

This is what recursion actually is, what it does for audio specifically, and a real workflow for doing it yourself — starting today.

Strip the Mystique First

Recursion in Bitcoin Ordinals is not magic. It's not a philosophical concept. It's a technical feature introduced to the Ordinals protocol that does exactly one thing: it lets one inscription reference the content of another inscription.

That's it.

When you inscribe something on Bitcoin, it gets a unique ID — something like a3f7bc...i0. That content is permanently retrievable at a path that looks like /content/a3f7bc...i0. A recursive inscription is simply one that calls that path, the same way a webpage calls an image from a server. Except the "server" is the Bitcoin blockchain, it never goes down, and nobody can delete it.
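For concreteness, here is a small hypothetical JavaScript helper (not part of ord or any wallet library) showing the anatomy of an inscription ID and the content path derived from it; the ID used is a padded placeholder:

```javascript
// Hypothetical helper: split an inscription ID into its transaction ID and
// index, and build the /content path a recursive inscription would fetch.
function parseInscriptionId(id) {
  // An inscription ID is a 64-hex-character txid, the letter "i", and an index
  const m = /^([0-9a-f]{64})i(\d+)$/.exec(id);
  if (!m) throw new Error(`not a valid inscription ID: ${id}`);
  return { txid: m[1], index: Number(m[2]), contentPath: `/content/${id}` };
}

// Placeholder ID for illustration, padded out to 64 hex characters
const demo = parseInscriptionId('a3f7bc'.padEnd(64, '0') + 'i0');
console.log(demo.index, demo.contentPath.slice(0, 9)); // 0 '/content/'
```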

An HTML file on Bitcoin can load audio from another inscription. A JavaScript file on Bitcoin can combine multiple audio inscriptions and play them in sync. The entire composition lives on-chain — not hosted, not dependent on a company staying solvent — and the individual stems it references also live on-chain independently, permanently, reusably.

That is what recursion means. Now let's talk about why it matters for sound quality.

Why Stems Sound Better Than the Full Mix at Low Bitrate

Parts 1 and 2 of this series established that compressing a full song to extreme bitrates sounds like music being described by someone who's never heard music. The reason is that a full mix contains everything at once — bass frequencies down at 30Hz, snare transients spiking at 5kHz, vocals sitting in the midrange, cymbals reaching toward 16kHz — and a codec trying to represent all of that at 16 kbps has to make brutal, destructive decisions about what to throw away.

Stems are different. A bass stem only contains content in roughly the 20–200Hz range. A vocal stem lives mostly in the 80Hz–8kHz window. A drum stem is percussive and transient but spectrally manageable in isolation. When you compress a single stem, you're asking the codec to represent a much simpler, narrower frequency picture — and codecs, particularly Opus, are dramatically better at this than they are at representing full mixes at the same bitrate.

Here's what the numbers look like when you allocate compression intelligently by stem:

Stem            | Frequency Range  | Optimal Bitrate | Sample Rate
Bass (mono)     | 20–200 Hz        | 32 kbps         | 24 kHz
Drums           | 40 Hz–16 kHz     | 48 kbps         | 32 kHz
Vocals (mono)   | 80 Hz–8 kHz      | 48 kbps         | 24 kHz
Melody / Synth  | 100 Hz–16 kHz    | 64 kbps         | 44.1 kHz

Combined, that's 192 kbps of total information across four files — more than eleven times the bitrate of cramming everything into a single 16.5 kbps file. Each stem, at its individually appropriate bitrate, will sound respectable. The bass stem at 32 kbps mono sounds far better than bass inside a full mix at the same bitrate, because the codec isn't competing with seventeen other frequency ranges for the same budget.
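The arithmetic behind that claim, as a quick sanity check (numbers from the table above and from Parts 1 and 2):

```javascript
// Bitrate budget per stem, from the table above (kbps)
const stemBudget = { bass: 32, drums: 48, vocals: 48, melody: 64 };
const totalKbps = Object.values(stemBudget).reduce((a, b) => a + b, 0);
// What 11 full songs sharing a 4MB budget leaves per song (Parts 1 and 2)
const singleFileKbps = 16.5;
console.log(totalKbps, (totalKbps / singleFileKbps).toFixed(1)); // 192 "11.6"
```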

This is the actual argument for recursive stems. Not mysticism. Frequency-specific compression at appropriate bitrates, assembled by a tiny HTML file that weighs almost nothing, pulling the elements together in real time on playback.

The Real Talk on Storage

I'm going to be straight with you because this matters: the stem approach uses more total storage than stuffing everything into one file. Four stems for a 3-minute song at those bitrates add up to roughly 4.3MB — more than the entire 4MB budget from Parts 1 and 2. Eleven songs with four stems each comes out around 47MB total.
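Spelled out, as a back-of-envelope sketch (assumes 3:00 songs and ignores Opus container overhead, so real files land slightly higher):

```javascript
// Raw stream size for one 3:00 song at the combined per-stem bitrates
const kbps = 32 + 48 + 48 + 64; // 192 kbps across four stems
const seconds = 180;
const mbPerSong = (kbps * 1000 * seconds) / 8 / 1e6;
console.log(mbPerSong.toFixed(2), (mbPerSong * 11).toFixed(1)); // "4.32" "47.5"
```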

So the recursion approach is not a compression hack. It is a quality architecture. You're spending more on-chain space to get dramatically better sound. That's a different trade-off than trying to minimize file size, and it's worth being clear about the distinction.

What recursion does give you that a single file cannot is this: shared stems cost nothing to reuse.

If you produce music with a consistent drum palette — and most producers do — that drum pattern, once inscribed, can be referenced by every composition that uses it without paying to inscribe it again. The same bass loop that runs through tracks 3, 6, and 9? Inscribed once. Referenced three times. You paid for it once and it lives in every song that needs it. The more prolific your catalog, the more efficiently it compounds. You're not storing music anymore. You're building a library that composes itself.
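A hypothetical accounting sketch, with illustrative (not measured) stem sizes, showing how reuse compounds across a catalog:

```javascript
// Illustrative per-stem sizes in MB for a 3:00 song (assumed, not measured)
const stemMB = { bass: 0.7, drums: 1.1, vocals: 1.1, melody: 1.4 };
const perSong = Object.values(stemMB).reduce((a, b) => a + b, 0); // ~4.3 MB

// Naive catalog: 11 songs, every stem inscribed fresh each time
const naiveMB = perSong * 11;

// Recursive catalog: suppose one drum stem and one bass loop each appear in
// 4 songs; the 3 repeat appearances of each are references, not re-inscriptions
const savedMB = 3 * stemMB.drums + 3 * stemMB.bass;
const recursiveMB = naiveMB - savedMB;
console.log(naiveMB.toFixed(1), recursiveMB.toFixed(1)); // "47.3" "41.9"
```

The more stems you share, the wider that gap gets; the references themselves are just a few bytes of HTML.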

The Actual Workflow

Here is how to do this. Not conceptually. Actually.

Step 1: Export Your Stems

From your DAW, export individual stems for each element. Standard practice is at minimum: bass, drums, vocals, everything else. Depending on your arrangement, you might go further — pads separate from leads, for instance. Export at full fidelity first (WAV, 44.1kHz or 48kHz, 24-bit). You'll compress them in the next step.

Step 2: Compress Each Stem Appropriately

Use FFmpeg, which is free, open-source, and runs from the command line. These commands give you the right compression profile per stem type:

bash

# Bass stem — mono, 24kHz, 32kbps Opus
ffmpeg -i bass.wav -c:a libopus -b:a 32k -ac 1 -ar 24000 bass.opus

# Drums — mono or stereo depending on room feel, 48kbps
# (libopus only accepts 8/12/16/24/48kHz input, so 48000 stands in for the 32kHz target)
ffmpeg -i drums.wav -c:a libopus -b:a 48k -ac 1 -ar 48000 drums.opus

# Vocals — mono, 24kHz, 48kbps
ffmpeg -i vocals.wav -c:a libopus -b:a 48k -ac 1 -ar 24000 vocals.opus

# Melody/Synth — can stay stereo if it matters, 64kbps
# (44.1kHz is not a valid libopus input rate either; 48000 is the nearest)
ffmpeg -i melody.wav -c:a libopus -b:a 64k -ar 48000 melody.opus

Listen to each output file before you proceed. If the bass sounds wrong at 32kbps, bump it to 40kbps. These are starting points, not gospel.

Step 3: Inscribe Each Stem Separately

Using the ord client or a service like Gamma.io or Hiro.so, inscribe each stem file as a separate ordinal. You'll receive an inscription ID for each one — something like a3f7bc8d2e1f...i0. Write these down. They are the addresses your composition will call.

Step 4: Write the Composition Inscription

This is the piece that pulls everything together. It's an HTML file with embedded JavaScript, and it uses the Web Audio API to fetch each stem, decode the audio, and start playback in precise synchronization. Here is a working structural example:

html

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Track Title</title>
  <style>
    body { background: #000; display: flex; align-items: center; 
           justify-content: center; height: 100vh; }
    button { color: #fff; background: none; border: 1px solid #fff; 
             padding: 12px 24px; cursor: pointer; font-size: 16px; }
  </style>
</head>
<body>
  <button onclick="play()">PLAY</button>
  <script>
    // Replace these with your actual inscription IDs
    const stems = [
      '/content/BASS_INSCRIPTION_ID_HERE',
      '/content/DRUMS_INSCRIPTION_ID_HERE',
      '/content/VOCALS_INSCRIPTION_ID_HERE',
      '/content/MELODY_INSCRIPTION_ID_HERE'
    ];

    async function play() {
      const ctx = new AudioContext();
      const buffers = await Promise.all(
        stems.map(url =>
          fetch(url)
            .then(r => r.arrayBuffer())
            .then(ab => ctx.decodeAudioData(ab))
        )
      );
      // Schedule all stems to start at the same moment
      const startTime = ctx.currentTime + 0.1;
      buffers.forEach(buffer => {
        const source = ctx.createBufferSource();
        source.buffer = buffer;
        source.connect(ctx.destination);
        source.start(startTime);
      });
    }
  </script>
</body>
</html>

This HTML file is about 2–3KB. Inscribe it last. When someone opens this inscription in an Ordinals-compatible viewer, it fetches each stem from the chain, decodes them in the browser's audio engine, and starts them simultaneously. The listener hears the full composition — bass, drums, vocals, melody — assembled in real time from four separate on-chain files.
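If you're doing this for a whole catalog, the player above is boilerplate worth templating. A sketch of a build script that stamps out the composition HTML from a list of inscription IDs (the structure mirrors the hand-written example; the IDs passed in are placeholders you replace with real ones from Step 3):

```javascript
// Generate a composition inscription from a list of stem inscription IDs.
// The output mirrors the hand-written player: fetch, decode, start in sync.
function compositionHtml(title, inscriptionIds) {
  const stems = inscriptionIds.map(id => `'/content/${id}'`).join(', ');
  return `<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>${title}</title></head>
<body><button onclick="play()">PLAY</button>
<script>
const stems = [${stems}];
async function play() {
  const ctx = new AudioContext();
  const buffers = await Promise.all(stems.map(u =>
    fetch(u).then(r => r.arrayBuffer()).then(ab => ctx.decodeAudioData(ab))));
  const startTime = ctx.currentTime + 0.1;
  for (const buffer of buffers) {
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.start(startTime);
  }
}
<\/script></body></html>`;
}

// Placeholder IDs for illustration
const page = compositionHtml('Track Title', ['BASS_ID_HEREi0', 'DRUMS_ID_HEREi0']);
console.log(page.includes("'/content/BASS_ID_HEREi0'")); // true
```

Write the result to a file, listen to it locally in a browser, then inscribe it.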

Step 5: AI Stem Separation (If You Don't Have Session Files)

If you're working with finished masters and don't have the original sessions, Meta's open-source tool demucs can separate a stereo mix into stems automatically. It runs locally, it's free, and it's genuinely good — not perfect, but good enough for this workflow as a starting point.

bash

# Install once: pip install demucs
# Separate a finished stereo mix into four stems
demucs yoursong.wav

It outputs bass, drums, vocals, and other as separate files under a separated/ directory. Use them as your source material for Step 2.

What This Actually Achieves

The storage math expands. The creative math gets interesting.

Because every stem is a discrete on-chain asset, you can treat your stems as building blocks across your entire catalog. A drum stem from one song can be pulled into a remix without the original artist having to send you anything — the address is public. A bass line you inscribed in 2024 can appear in a 2027 composition. Collaborative arrangements can be assembled from stems inscribed by multiple artists, each credited at the inscription level.

Recursion reframes the picture: you are not compressing a full painting into a smaller frame. You are storing each brushstroke separately and assembling the painting at the moment someone looks at it. The brushstrokes are high fidelity. The assembly is instant. The result sounds dramatically better than anything you could achieve in a single compressed file.

It also means the conversation about "fitting music onto Bitcoin" has been slightly wrong this whole time. The frame of "how small can I make a full song" is the old frame. The new frame is "what is the smallest meaningful unit of my music, and what can I build from units that already exist on chain?"

That's recursion. Not a buzzword. A different way of thinking about what a music file is.

The Honest Limitations

Because I'm not here to hype, I'll say this too: recursive ordinal audio is not yet standardized. Viewer support varies — some ordinal explorers render the HTML cleanly, others strip JavaScript, others don't load cross-inscription fetches. This is an early, actively developing space. The code above will work in compliant environments and may not render in others.

The tooling is also still rough. There's no drag-and-drop interface for this workflow. You will need to be comfortable with the command line, with FFmpeg, with the ord client or a third-party inscription service, and with enough JavaScript to debug a Web Audio API implementation when something doesn't start in sync. That learning curve is real.

What is also real: some developers know exactly how to do all of this, have done it, and are not publishing tutorials because first-mover advantage is a thing. I understand the strategic logic. I also think music moves faster when more people know how to build with it.

So there it is.


GIRL BARS | Endodeca — Part 3 of 3 in the compression series. Part 1: An entire album. 4MB. I Did the Math So You Don't Have To. Part 2: Devil's Advocate: What If the Songs Are Only 1:20?

Devil's Advocate: What If the Songs Are Only 1:20?

Yesterday I did the math on 11 songs in 4MB and the math told me: impossible, without sounding like someone left music in the rain. I stood by that. I still stand by that.

But then I kept thinking about it. Because the calculation I ran assumed something I didn't even question — that a song is 3 minutes long.

What if it isn't?

Songs Are Getting Shorter. The Data Isn't Subtle.

This is not a hot take. This is a documented, multi-year trend that the music industry has been watching in real-time. Since streaming became the dominant delivery format, average song length has been in a slow, steady decline. Spotify pays per stream, not per minute. Algorithms reward replay count. TikTok trained an entire generation's ears to expect the hook in the first eight seconds or they're already swiping.

By 2026, sub-two-minute tracks are not a novelty. They're a strategy. Some of the most-streamed records of the last several years clock in under 2:30. Hyperpop, drill, phonk, bedroom pop — entire genres are built around tight, punchy, get-in-get-out structures. A minute twenty isn't a demo. It might just be the song.

So let's run it back. Let's give the person at the party the benefit of the doubt they probably don't deserve, and see what happens to our math when we change one variable.

The Table That Changes Everything

Here's what the numbers actually look like when you stop assuming 3-minute songs:

Song Length | Bitrate (11 songs / 4MB) | Where That Lands
3:00        | 16.5 kbps                | Unlistenable — tin can in a storm
2:00        | 24.8 kbps                | Still rough — speech might survive, music won't
1:20        | 37.2 kbps                | The party claim — survivable, genre-dependent
1:00        | 49.6 kbps                | Getting warmer
0:47        | ~64 kbps                 | AM Radio. Actually listenable.
0:31        | ~96 kbps                 | Decent. You could stream this.
0:23        | ~128 kbps                | Standard quality. Good. We're here.
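The whole table is one formula: hold the 4MB budget and the 11-song count fixed, and per-song bitrate scales inversely with length. A sketch, scaling from the 16.5 kbps / 3:00 baseline (rounding in the table differs by a tenth in places):

```javascript
// Per-song bitrate when 11 songs share a fixed budget: scale the
// 3:00 / 16.5 kbps baseline inversely with song length
const kbpsFor = seconds => (16.5 * 180) / seconds;

for (const len of [180, 120, 80, 60, 47, 31, 23]) {
  console.log(`${len}s -> ${kbpsFor(len).toFixed(1)} kbps`);
}
```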

The question isn't how many songs can we fit in 4MB. The question is how short does a song have to be to sound good at that size — and the answer is a number the music industry is already quietly moving toward on its own.

What 1:20 Actually Gets Us

At a minute twenty per song, you're sitting at 37.2 kbps. That is — let me be precise here — 58% of AM Radio quality. With Opus, in mono, at a 32 kHz sample rate.

That's not nothing. That's not the "metallic shimmer in a blender" scenario I described last week. That's survivable. Whether it's good depends entirely on what the music is doing.

Drone music? Ambient textures? Lo-fi beats that already have vinyl crackle baked in as an aesthetic choice? 37 kbps might not even be audibly offensive to the average listener in those genres. The compression artifacts can almost pass as intentional warmth if the source material is sparse enough.

A dense orchestral arrangement with dynamic range, a gospel choir, a trap record with 808s that go down to 30 Hz and a hi-hat pattern that lives at 16kHz simultaneously? 37 kbps will end you. The codec will make decisions you did not authorize about what frequencies matter, and it will be wrong every single time.

The breakeven where things get genuinely listenable — as in, you wouldn't be embarrassed to play it for someone — is 47 seconds at 64 kbps. That's AM Radio quality. Not premium. Not hi-fi. But the notes are there, the rhythm is there, the song is recognizable and present.

Thirty-one seconds per song gets you to 96 kbps. That's respectable. That's the quality level a lot of podcast audio lives at. Twenty-three seconds per song gets you to 128 kbps — standard MP3, actually good, no asterisks needed.

Twenty-three seconds. Eleven of them. In 4MB. With quality.

This Is Where It Gets Philosophical

Here's the thing I can't shake: a 23-second song is not new. Hardcore punk has been releasing sub-30-second tracks since the early '80s. Grindcore made brevity an ideology. Chopped interludes, skits, transitional tracks on concept albums — these aren't songs that failed to be longer. They're exactly as long as they need to be.

The streaming era is creating a new version of this by economic pressure rather than artistic intention, which is a different and more complicated conversation. But the format itself — short, tight, punchy — is not inherently less musical. The constraint is not the problem. The constraint is the canvas.

Which brings us back around to Bitcoin.

Onchain Audio and the Constraint Aesthetic

When you inscribe audio onto Bitcoin as an ordinal, you're working within a block size limit. That limit is not going away. The technology is not designed to eventually accommodate your 48kHz stereo master at full fidelity. The compression tradeoff is structural.

But here's what the math just told us: if you're releasing music that runs 23 seconds at 128 kbps, you have a standard-quality audio file that fits comfortably in that constraint. If you're creating ambient pieces, generative loops, sonic textures — the kinds of things that exist as experiences rather than traditional songs — the compression ceiling might not even be a ceiling for you. It might just be the room you're working in.

The artists who are going to do interesting things with on-chain audio are not the ones trying to force a 4-minute record into a 200KB container. They're the ones who understand the container first and build the work to live inside it. Same as every constrained format that preceded it — vinyl's 20-minute side limit, the 74-minute CD, the 10MB MP3 file cap on early digital distribution platforms. The constraint shapes the art. Always has.

The Actual Answer to the Original Question

So: can we fit 11 good-sounding songs onto an ordinal?

Yes. If each song is 47 seconds or shorter, you clear 64 kbps and it's listenable. If each song is 23 seconds or shorter, you hit 128 kbps and it sounds genuinely good. If your songs are 1:20 you're at 37 kbps, which is rough but not indefensible depending on genre and intent.

As the creator and marketer of your music, it's your prerogative to operate inside a completely different definition of what a song is — and in 2026, that definition is legitimately in motion.

I ran the numbers to extract a point of reference. The math handed back new ideas instead.

That's sometimes how it goes.

GIRL BARS | Endodeca — Part 2 in the music technology series.
Part 1: [An Entire Album on 4MB. I Did the Math So You Don't Have To.]


At Supreme Racket Records, we have reached the inalienable conclusion that tokenized music collectors deserve more, not less.

A note on why the MP3 is finished: The format shift that’s elevating the ownership experience for heirloom music collectors—and where music has become about discovery again

The MP3 was once a miracle: music, liberated from shelves and shipping crates. But miracles are not meant to last forever. They exist to solve a problem, then get politely out of the way. The MP3 solved distribution. It did not solve meaning.

And meaning, inconveniently, is what collectors actually collect.

This new work marks a return to a very old idea: that music is not a file, but a thing—something you encounter, revisit, and slowly come to know.


Acquiring the work via Transient Labs means stewarding the release object itself—the canonical version, published onchain, with provenance and permanence built in. It’s not just access; it’s participation in the life of the work. You can experience it directly on the artist’s website, where the work lives as intended: immersive, dynamic, and alive. Visit maxximillian.com/endodeca to learn how to use the interactive controls and experience the piece live. For those who wish to add "They Were Wrong" by Endodeca to your collections, a convenient portal also lives there.

Why Interactivity Changes Everything

A song flattened into a file behaves like a postcard. It tells you something happened elsewhere.

An interactive record behaves like a room.

It listens when you enter. It responds when you stay. It reveals itself differently depending on how you move through it. The listener is no longer a consumer of sound, but a participant in form.

This is deeper and cultier than novelty. It is awe-restoration.

For most of human history, music was inseparable from context—space, ritual, repetition, memory. The MP3 removed all of that in exchange for convenience. Interactive works give it back, without asking permission from shelves, labels, or servers that forget.

For collectors, this is decisive.

You are no longer keeping a copy.
You are stewarding an experience.

SUPREME RACKET RECORDS


Every racket needs believers, and every believer deserves a cut. Albums are no longer dropped into the void — they’re delivered directly into the wallets of those who matter.
