MIT Develops AI That Can Isolate and Edit the Individual Instruments in a Song


MIT’s latest AI feat has the power to pick apart music in an unprecedented way. When a song is released, it’s in its final form: a single audio file that is nearly impossible to separate back into individual instruments and voices. The new AI capability could seriously alter audio editing and enable impeccable restoration of old recordings. A band teacher could also play a video of an orchestra and isolate individual instruments for the students to hear. The possibilities go on and on.

MIT’s PixelPlayer is a deep-learning AI. What does this mean? Deep learning means the system can learn patterns, however complex, from large amounts of data using neural networks, in this case networks trained on previous videos. PixelPlayer uses three of them: one neural network analyzes the visuals, another analyzes the audio, and a third associates specific pixels with specific sound waves in order to pull the various sounds apart.
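To make that three-network idea concrete, here is a minimal, hypothetical sketch of such an architecture in PyTorch. The module names, layer sizes, and feature dimensions are illustrative assumptions rather than MIT’s actual implementation; the point is simply how a per-pixel visual feature can be combined with audio features to predict a mask over the mixture’s spectrogram.

```python
# Hypothetical PixelPlayer-style architecture sketch (not MIT's code).
import torch
import torch.nn as nn

class VideoNet(nn.Module):
    """Extracts a feature vector for every pixel location of a video frame."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, frame):          # frame: (B, 3, H, W)
        return self.conv(frame)        # -> (B, feat_dim, H, W)

class AudioNet(nn.Module):
    """Splits the mixture spectrogram into feat_dim feature channels."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, spec):           # spec: (B, 1, F, T)
        return self.conv(spec)         # -> (B, feat_dim, F, T)

class Synthesizer(nn.Module):
    """Combines one pixel's visual features with the audio features to
    predict a spectrogram mask for the sound coming from that pixel."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.to_mask = nn.Linear(feat_dim, 1)

    def forward(self, pixel_feat, audio_feat):
        # pixel_feat: (B, feat_dim), audio_feat: (B, feat_dim, F, T)
        weighted = audio_feat * pixel_feat[:, :, None, None]
        logits = self.to_mask(weighted.permute(0, 2, 3, 1))   # (B, F, T, 1)
        return torch.sigmoid(logits.squeeze(-1))              # (B, F, T) mask
```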

Furthermore, PixelPlayer is self-supervised: it learns from raw videos without human-labeled examples, which is also why MIT’s engineers aren’t always able to pinpoint exactly how it learns which instruments make which sounds. “Trained on over 60 hours of videos, the ‘PixelPlayer’ system can view a never-before-seen musical performance, identify specific instruments at pixel level, and extract the sounds that are associated with those instruments,” states MIT.
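One common self-supervised recipe for this kind of problem is to mix the audio from two different videos and ask the networks to recover each original track, since the originals serve as free ground truth. The sketch below assumes that approach, along with the hypothetical VideoNet, AudioNet, and Synthesizer modules from the previous snippet; it is not MIT’s actual training code.

```python
# Hypothetical "mix-and-separate" self-supervised training step (an assumption,
# not MIT's published training loop).
import torch.nn.functional as F

def mix_and_separate_step(video_net, audio_net, synth,
                          frame_a, spec_a, frame_b, spec_b):
    """Mix two videos' spectrograms, then try to recover each original
    spectrogram using that video's frame as the visual cue."""
    mix = spec_a + spec_b                           # artificial "duet": (B, 1, F, T)
    audio_feat = audio_net(mix)                     # (B, C, F, T)

    loss = 0.0
    for frame, target in ((frame_a, spec_a), (frame_b, spec_b)):
        pixel_feats = video_net(frame)              # (B, C, H, W)
        # Pool over the whole image here for brevity; the per-pixel version
        # keeps a separate feature vector for every pixel.
        pooled = pixel_feats.mean(dim=(2, 3))       # (B, C)
        mask = synth(pooled, audio_feat)            # (B, F, T)
        predicted = mask.unsqueeze(1) * mix         # masked mixture
        loss = loss + F.l1_loss(predicted, target)  # original track is the target
    return loss
```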

The lead author for the project, Hang Zhao, envisioned a best-case scenario in which the researchers could recognize which instruments make which sounds. “We were surprised that we could actually spatially locate the instruments at the pixel level,” states Zhao. “Being able to do that opens up a lot of possibilities, like being able to edit the audio of individual instruments by a single click on the video.”

PixelPlayer’s advanced capabilities could also have a significant impact on the music industry, since the system can isolate individual instruments for licensing purposes. Think of Shazam with far deeper identification powers, and you get the idea. Companies are already using artificial intelligence to make processes more efficient, including writing music.

The PixelPlayer system was developed by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and presented at the Conference on Neural Information Processing Systems in Montreal. The researchers used a machine-learning algorithm to train the system to identify different instruments in a video.

The system works by analysing the pixels in a video to identify the individual instruments being played. It then matches each instrument to the corresponding sound wave in the audio track, allowing individual sounds to be isolated and manipulated.
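In code, that pixel-to-sound matching might look something like the hypothetical inference sketch below, which reuses the modules from the earlier snippets: take the visual features at a clicked pixel, predict a mask, and apply it to the mixture’s spectrogram. Converting the result back into an audible waveform is omitted.

```python
# Hypothetical inference sketch: isolate the sound "coming from" one pixel.
import torch

def isolate_pixel_sound(video_net, audio_net, synth, frame, mixture_spec, x, y):
    """Return the masked spectrogram of the sound associated with pixel (x, y)."""
    with torch.no_grad():
        pixel_feats = video_net(frame)              # (1, C, H, W)
        pixel_feat = pixel_feats[:, :, y, x]        # features at the clicked pixel: (1, C)
        audio_feat = audio_net(mixture_spec)        # (1, C, F, T)
        mask = synth(pixel_feat, audio_feat)        # (1, F, T)
    # An inverse short-time Fourier transform (not shown) would turn this
    # masked spectrogram back into an audible waveform.
    return mask.unsqueeze(1) * mixture_spec
```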

The researchers tested the system on a range of videos, including performances by the Boston Symphony Orchestra and a rock band, and found that it was able to identify individual instruments with a high degree of accuracy.

The PixelPlayer system is still in the early stages of development, and it’s unclear when it might be commercially available. However, the researchers behind it believe that it could have a significant impact on the music industry and other fields where audio editing is important.

In conclusion, the new AI-powered PixelPlayer system from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) can isolate the individual instruments within a piece of music, making it possible to adjust, remove, or remix each element at will. By tying what it sees in a video to what it hears in the audio track, it points toward a future in which editing a single instrument out of a finished recording is as simple as clicking on it. The system is still in the early stages of development, but it could have a significant impact on the music industry and other fields where audio editing matters.