Meta JASCO GenAI Model Can Create From Inputs Including Chords and Beats

Photo Credit: BandLab

Meta’s Fundamental AI Research (FAIR) team has revealed several new generative AI models focused on audio generation, text-to-vision, and watermarking. The audio generation model JASCO is capable of accepting not just text inputs, but also chords and beats for additional customization of the generated audio.

“By publicly sharing our early research work, we hope to inspire iterations and ultimately help advance AI in a responsible way,” Meta said in a press release detailing its new AI models. The most relevant tool to the music industry appears to be JASCO, which stands for Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation.

JASCO can accept inputs such as a chord progression or beat and condition the generated audio on them. The model lets users adjust features of the output, including melody, drums, and chords, while the text-to-music prompt further shapes the overall sound. The FAIR team says it will release the JASCO inference code as part of its AudioCraft AI audio model library under an MIT license, and the pre-trained model under a non-commercial Creative Commons license.
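One way to picture JASCO's symbolic conditioning is as time-stamped chord annotations aligned to the audio timeline, which the model consumes alongside the text prompt. The sketch below is purely illustrative plain Python, not the AudioCraft API: the function name, frame rate, and input format are assumptions. It expands a progression given as (chord, start_time) pairs into one chord label per model frame.

```python
# Illustrative sketch only: expands a chord progression, given as
# (chord_symbol, start_time_in_seconds) pairs, into a per-frame label
# sequence. Names and the 50 Hz frame rate are assumptions, not JASCO's
# actual interface.

def chords_to_frames(progression, duration, frame_rate=50):
    """Map time-stamped chords to a per-frame chord-label sequence."""
    n_frames = int(duration * frame_rate)
    frames = ["N"] * n_frames  # "N" marks frames before the first chord
    # Sort by onset so each later chord overwrites frames from its start.
    for chord, start in sorted(progression, key=lambda item: item[1]):
        first = int(start * frame_rate)
        for i in range(first, n_frames):
            frames[i] = chord
    return frames

# A simple C-F-G progression, two seconds per chord:
progression = [("C", 0.0), ("F", 2.0), ("G", 4.0)]
frames = chords_to_frames(progression, duration=6.0)
print(frames[0], frames[150], frames[250])  # C F G
```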

The FAIR team will also launch AudioSeal, which adds watermarks to AI-generated speech. It’s a tool Meta has developed specifically to identify content made with AI. “We believe [AudioSeal] is the first audio watermarking technique designed specifically for the localized detection of AI-generated speech, making it possible to pinpoint AI-generated segments within a longer audio snippet,” the team says about the tool.
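"Localized detection" means the detector can flag which portions of a clip are watermarked rather than giving a single yes/no verdict for the whole file. The toy function below (hypothetical, not AudioSeal's API) illustrates that idea: given per-frame watermark scores, it groups consecutive above-threshold frames into flagged time segments.

```python
# Illustrative sketch only: turns per-frame watermark-detection scores
# into flagged (start_seconds, end_seconds) segments. The function name,
# frame rate, and threshold are assumptions, not AudioSeal's interface.

def flagged_segments(scores, frame_rate=50, threshold=0.5):
    """Group consecutive above-threshold frames into time spans."""
    segments = []
    start = None  # index of the first frame in the current flagged run
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start / frame_rate, i / frame_rate))
            start = None
    if start is not None:  # run extends to the end of the clip
        segments.append((start / frame_rate, len(scores) / frame_rate))
    return segments

# A 3-second clip where only the middle second is watermarked:
scores = [0.1] * 50 + [0.9] * 50 + [0.1] * 50
print(flagged_segments(scores))  # [(1.0, 2.0)]
```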

The team also details its Chameleon image generation models but says it is not releasing the image generation capability to the public yet. Chameleon 7B and 34B can be applied to tasks requiring combined visual and textual understanding, such as image captioning. For safety reasons, only Chameleon's text generation models will be made available to researchers, the FAIR team says.