Native Instruments Shouldn’t Be the Only Company Creating the Stems Format

The following post comes from Matt Aimonetti, Co-Founder and CTO of Splice.  Reposted with permission from Splice’s blog.

The launch of Stems from Native Instruments at the Winter Music Conference in Miami is a major step forward in how artists will create and share music together. As reported by DJ TechTools, NI will support stems through an open multitrack audio format. Using a container file format, a song can be distributed with five tracks: a mixdown compatible with most audio players, plus four discrete tracks (stems). This means we will begin to see music producers split their tracks into four musical sub-elements, such as a drum stem, a bassline stem, a synth stem, and a vocal stem.

At Splice we’ve been looking at stems and various formats for a while now. We are strong open source advocates, so we’re thrilled to see that NI has used an existing, well-documented, open container format: MP4, which grew out of Apple’s QuickTime format. Our engineering team deals with file formats frequently, and what we do know is that the audio and music industry has a hard time with standards, even though we’ve seen on numerous occasions that standards are beneficial. In 1996, Steinberg changed the industry by releasing its VST plugin specification and SDK. By opening up the format, Steinberg made VST a standard used by dozens of DAWs; most plugins are now written for VST and wrapped to support the AudioUnit and RTAS/AAX formats. Modern web features are likewise the result of long discussions between the main browser makers (Chrome, Safari, IE, Opera), and the end result is that a website looks largely the same in every browser and developers don’t have to write different code for Chrome or Safari.

While we are delighted to see NI use open technology and formats, we also think that if we want a long-lived, unified format supported by today’s and tomorrow’s main players, we should have an open discussion about the format’s specification.

How is NI planning to support a multitrack file format? The details won’t be available until its full release in June, but based on the information NI has provided and our knowledge of the MP4 container, it’s pretty easy to deduce.

What we know:

  • NI Stems use the standard MP4 container for backward compatibility.
  • NI Stems files will play in any player that supports the MP4 spec.
  • NI will require Stems to use a .stem.mp4 file extension.
  • NI Stems can only contain a maximum of 4 stem tracks per container.
  • NI Stems will use ID3 tags, and possibly another kind of tagging, for stem naming.
  • There is another similar format out there called .MOGG. It’s used by the Rock Band games, stores its tracks as Ogg Vorbis, can be opened in Audacity, and is popular with users looking for illegal stem tracks.
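The reported NI constraints are simple enough to check mechanically. Below is a minimal Python sketch of a hypothetical validator: `StemLayout`, `check_ni_layout`, and the idea of describing a file by its name plus a list of track titles are illustrative names of our own, not part of any NI tool. The rules encode only the bullets above (the `.stem.mp4` suffix and at most 4 stems after the mixdown), which are NI's announced limits rather than limits of MP4 itself.

```python
from dataclasses import dataclass

# Hypothetical checker for the NI constraints reported above.
# These limits come from NI's announcement, not from the MP4 spec.
REQUIRED_SUFFIX = ".stem.mp4"
MAX_STEMS = 4  # NI's announced cap; the container itself allows more


@dataclass
class StemLayout:
    filename: str
    track_titles: list[str]  # first entry is the mixdown, the rest are stems


def check_ni_layout(layout: StemLayout) -> list[str]:
    """Return a list of problems; an empty list means the layout fits NI's rules."""
    problems = []
    if not layout.filename.lower().endswith(REQUIRED_SUFFIX):
        problems.append(f"filename must end with {REQUIRED_SUFFIX}")
    if not layout.track_titles:
        problems.append("missing mixdown track")
        return problems
    stems = layout.track_titles[1:]
    if len(stems) > MAX_STEMS:
        problems.append(f"{len(stems)} stems found, at most {MAX_STEMS} allowed")
    return problems


print(check_ni_layout(StemLayout("bounce.stem.mp4", ["mix", "drums", "synth"])))  # []
```

Returning a list of problems rather than a single boolean makes it easy to report every rule a file breaks at once.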

The MP4 container format:

MP4 (MPEG-4 Part 14) is a container format, meaning it is a well-defined way to package information. It’s usually used to store video, audio, subtitles and images. The format is directly based on Apple’s QuickTime format and was standardized as an ISO format in 2001, then updated in 2003. An MP4 file is basically some metadata plus one or more data streams. MP4 is used for both audio and video; the underlying media streams can be encoded with a wide range of codecs, H.264 and AAC being the most common. Its metadata support is also much richer than MP3’s.
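To make “some metadata plus one or more data streams” concrete: an MP4 file is a sequence of boxes (called atoms in QuickTime), each starting with a 32-bit big-endian size and a four-character type. Typically an `ftyp` box identifies the file brand, a `moov` box holds the metadata and track descriptions, and an `mdat` box holds the encoded media. The Python sketch below walks only the top-level boxes of an in-memory file and does not descend into `moov`; the helper names are ours, and a real parser would read from disk incrementally rather than load the whole file.

```python
import struct


def list_top_level_boxes(data: bytes) -> list[tuple[str, int]]:
    """Return (box_type, box_size) for each top-level box in an MP4 byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-character type.
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_len = 8
        if size == 1:
            # Size 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_len = 16
        elif size == 0:
            # Size 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header_len:
            raise ValueError(f"corrupt box at offset {offset}")
        boxes.append((box_type.decode("latin-1"), size))
        offset += size
    return boxes


def box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a minimal box with a 32-bit size header (handy for testing)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample = box(b"ftyp", b"M4A \x00\x00\x00\x00") + box(b"moov") + box(b"mdat", b"\x00" * 16)
print(list_top_level_boxes(sample))  # [('ftyp', 16), ('moov', 8), ('mdat', 24)]
```

On a real file, `list_top_level_boxes(open(path, "rb").read())` should show the same three-part shape, with every audio stream described inside `moov` and stored in `mdat`.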

Putting stems into an MP4 container has been fully supported since at least 2003, and I will show you how you can create your own stem files today. Note that NI hasn’t said which stream formats their tools will support, but based on the demos shown online, a stem file generated by their editor is somewhere between 30 and 80 MB, which implies compressed streams. Technically speaking, if we look only at the container format, there is no reason they couldn’t also allow raw PCM stems. Nor does the format impose a limit of 5 streams (1 mixdown + 4 stems); that’s an artificial limitation added by NI, probably because they don’t expect to use more tracks for now.
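A quick back-of-the-envelope calculation supports the compression inference. Assuming CD-quality PCM (44.1 kHz, 16-bit, stereo) and a five-minute track, neither of which NI has specified, five uncompressed streams would weigh roughly:

```latex
\begin{aligned}
\text{PCM bitrate} &= 44\,100\ \tfrac{\text{samples}}{\text{s}} \times 16\ \tfrac{\text{bits}}{\text{sample}} \times 2\ \text{channels} = 1\,411\,200\ \tfrac{\text{bits}}{\text{s}} = 176\,400\ \tfrac{\text{bytes}}{\text{s}} \\
\text{one stream} &= 176\,400\ \tfrac{\text{bytes}}{\text{s}} \times 300\ \text{s} = 52\,920\,000\ \text{bytes} \approx 52.9\ \text{MB} \\
\text{five streams} &= 5 \times 52\,920\,000\ \text{bytes} = 264\,600\,000\ \text{bytes} \approx 264.6\ \text{MB}
\end{aligned}
```

That is several times the 30 to 80 MB seen in the demos, which is consistent with compressed streams.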

So how does it work, or rather, how should it work?

Everything starts with a normal MP4 container and an audio “stream”. To keep things simple and avoid making you read code, I’ll use FFmpeg, a very popular, free, cross-platform, open source collection of libraries and tools to inspect, record, convert and stream audio and video. I don’t have an NI Stem file, but because we know NI followed the MP4 spec, we can safely assume that the mixdown/bounce track is a normal MP4 audio stream, most likely AAC-encoded, since iTunes doesn’t support MP3 streams in an MP4 container.

Here is how I convert a mixdown I exported from Ableton into an MP4 file using FFmpeg:

$ ffmpeg -i ~/Desktop/bounce.wav -c:a aac ~/Desktop/bounce.mp4

FFmpeg nicely converted my input WAV file into an MP4 container file with an AAC-encoded stream. The file plays perfectly in iTunes and in all players supporting MP4 audio files.

As previously mentioned, MP4 is a container format: we can put a lot of things into the file, and there is room for extra storage such as metadata and data (MIDI, samples, etc.). MP4 video files often contain a video stream plus a few audio streams: the main audio stream carries the original language, and the others are dubbed versions of it. All streams are synchronized, allowing the audience to switch audio streams while the video keeps playing.

Let’s add some stems to our MP4 container. I have extracted two stems from my Ableton session: drums.wav and synth.wav. I’d like to add them to my container, but I still want the mixdown to keep playing properly in iTunes.

$ ffmpeg -i ~/Desktop/bounce.wav -i ~/Desktop/drums.wav -i ~/Desktop/synth.wav -map 0 -map 1 -map 2 -c:a aac ~/Desktop/bounce.stem.mp4

This long command tells FFmpeg that we have three audio input files and that we want them transcoded to AAC and multiplexed into one file. Each -map N selects input N for the output, in order, so the first file (the mixdown) becomes the first stream.

FFmpeg obliges and gives us a bigger file than our original bounce.mp4. The first output was 3.6 MB and the second one 11.1 MB, which is expected since my stem.mp4 file contains 3 streams instead of one.

The file plays perfectly in iTunes, so we know we didn’t mess things up. How about checking that the stems were properly added to the container? Good news: since we are using the MP4 container format, we can open the file in VLC and check the various streams.

It looks good. How about listening to the streams to make sure they work? Not a problem: open the Audio menu, choose Audio Track, and pick another track.
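If you’d rather check the streams from a script than from VLC’s menus, FFmpeg’s companion tool ffprobe can describe every stream as JSON (`ffprobe -v error -show_streams -of json FILE`). The Python sketch below, with helper names of our own, runs that command and reduces its output to (index, codec, title) tuples for the audio streams; the title stays empty until the streams are given titles, which is the next step of this walkthrough.

```python
import json
import subprocess


def probe_streams(path: str) -> str:
    """Run ffprobe on `path` and return its JSON description of every stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ffprobe exits with an error
    )
    return result.stdout


def summarize_audio_streams(probe_json: str) -> list[tuple[int, str, str]]:
    """Reduce ffprobe JSON to (index, codec, title) for each audio stream."""
    streams = json.loads(probe_json).get("streams", [])
    return [
        (s["index"], s.get("codec_name", "?"), s.get("tags", {}).get("title", ""))
        for s in streams
        if s.get("codec_type") == "audio"
    ]
```

For example, `summarize_audio_streams(probe_streams(os.path.expanduser("~/Desktop/bounce.stem.mp4")))` should list three AAC streams for the file built above, assuming ffprobe is on your PATH.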

Awesome, everything works as expected! The only remaining issue is how a DJ or producer would know what our stems contain. That’s easy: the MP4 container has good metadata support, and we can use FFmpeg to set the title of each track:

$ ffmpeg -i ~/Desktop/bounce.wav -i ~/Desktop/drums.wav -i ~/Desktop/synth.wav -map 0 -map 1 -map 2 -c:a aac -metadata:s:0 title=mix -metadata:s:1 title=drums -metadata:s:2 title=synth ~/Desktop/bounce.stem.mp4
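The ffmpeg command above can be generated for any number of stems. Here is a minimal Python sketch, assuming the input paths and titles shown; the helper name `build_stem_command` is ours. The sketch uses FFmpeg’s built-in `aac` encoder, since current FFmpeg releases no longer include libfaac.

```python
import shlex


def build_stem_command(mixdown: str, stems: dict[str, str], output: str) -> list[str]:
    """Build an ffmpeg argument list muxing a mixdown and titled stems into one MP4.

    `stems` maps a title (e.g. "drums") to a WAV path; dicts preserve insertion
    order, so output streams follow the order given, after the mixdown.
    """
    inputs = [("mix", mixdown), *stems.items()]
    cmd = ["ffmpeg"]
    for _, path in inputs:
        cmd += ["-i", path]
    for i in range(len(inputs)):
        # "-map i" selects every stream of input i for the output, in order;
        # each WAV input holds a single audio stream.
        cmd += ["-map", str(i)]
    cmd += ["-c:a", "aac"]  # FFmpeg's built-in AAC encoder
    for i, (title, _) in enumerate(inputs):
        # "-metadata:s:i" sets a tag on output stream i.
        cmd += [f"-metadata:s:{i}", f"title={title}"]
    cmd.append(output)
    return cmd


cmd = build_stem_command(
    "bounce.wav", {"drums": "drums.wav", "synth": "synth.wav"}, "bounce.stem.mp4"
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run it directly without any shell quoting concerns. Because the mixdown is always input 0, it stays the first stream, which is what the walkthrough relies on to keep the mixdown playing in iTunes.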

Based on this quick demo, I think it’s fair to say that Native Instruments products will read multiple streams at once to let you mix them live. There is technically no reason to limit the stems to 4 the way NI is doing. Our guess is that they have their own reasons (memory limitations, or a user experience built around their 4-track hardware). Our hope is that Traktor Pro and the existing and upcoming controllers will accept MP4 files with more stems while only exposing the first 4 to NI users. This will allow other products (software/hardware) to go beyond NI’s current limit.
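The behavior we hope for, reading everything but only using the first 4 stems, is easy to express. This is a hypothetical Python sketch (the name `select_playable` is ours) of how a player with a 4-stem limit could accept a larger file instead of rejecting it, keeping the mixdown and setting the extra stems aside:

```python
def select_playable(
    track_titles: list[str], max_stems: int = 4
) -> tuple[str, list[str], list[str]]:
    """Split a stem file's tracks into (mixdown, stems used, stems ignored).

    Hypothetical policy: accept any number of stems, but only expose the
    first `max_stems` of them, instead of rejecting the whole file.
    """
    if not track_titles:
        raise ValueError("a stem file needs at least a mixdown track")
    mixdown, stems = track_titles[0], track_titles[1:]
    return mixdown, stems[:max_stems], stems[max_stems:]


print(select_playable(["mix", "drums", "bass", "synth", "vocals", "fx"]))
# ('mix', ['drums', 'bass', 'synth', 'vocals'], ['fx'])
```

A product without NI’s limit would simply pass a larger `max_stems`, while the same file stays usable on 4-track hardware.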

There is also a lot more we can do. At Splice we work at the source-code level of music: we can inject all kinds of data and metadata into a container format, from cues, markers and loops to automation, visualization, samples, MIDI, and presets. While we praise NI for choosing a great existing ISO format, the devil is in the details. If we truly want a unified stem format that factors in everyone’s interests, its details can’t be crafted by one company alone. Apple realized that quickly and knew that if they wanted their format to be adopted by all, they had to let the community manage it. That step is what allowed Native Instruments to support multitrack files using MP4, the ISO version of the QuickTime format.

We would like to lead an initiative to develop an open format for Stems, and also for cross-DAW exchange. In the near term we will begin reaching out to our software partners to drive this forward. If you’re interested in joining the conversation, email us at [email protected]
