Meta announces new AI for realistic music and sound generation from text

poory

2 years ago

Meta has released AudioCraft, a new open-source artificial intelligence system that allows users to generate original music, sound effects, and other audio content through text prompts.

The system consists of three different AI models trained on thousands of hours of audio data. The components include MusicGen for music generation, AudioGen for sound effect generation, and EnCodec, which helps train the models.

MusicGen can create instrumental music of various genres based on text prompts describing the mood, instruments, tempo, and other qualities. AudioGen generates sound effects like animal noises, weather, mechanical sounds, and more from text descriptions.

The key component is EnCodec, a neural audio codec that compresses audio into discrete tokens, giving the generative models a ‘fixed vocabulary’ to work with. This simplifies the design of audio-generative AI.
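To make the idea of a ‘fixed vocabulary’ concrete, here is a minimal toy sketch of quantization: each short audio frame is replaced by the index of its nearest entry in a small codebook. This is an illustration only, not Meta's implementation. EnCodec learns its codebooks with a neural network, whereas the codebook values, the two-number frames, and the function names below are hypothetical.

```python
import math

# Hypothetical toy codebook: four prototype "frames", each with two values.
# A real codec learns many such entries from data instead of hard-coding them.
CODEBOOK = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
]


def quantize(frames, codebook=CODEBOOK):
    """Replace each frame with the index (token) of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        distances = [math.dist(frame, entry) for entry in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


def dequantize(tokens, codebook=CODEBOOK):
    """Turn tokens back into approximate frames by looking them up."""
    return [codebook[token] for token in tokens]


# Four made-up frames standing in for short slices of audio.
frames = [(0.1, -0.2), (0.9, 0.1), (0.2, 0.8), (0.95, 1.1)]
tokens = quantize(frames)
print(tokens)              # [0, 1, 2, 3]
print(dequantize(tokens))  # the nearest prototype for each frame
```

Because every frame becomes one of a fixed set of integers, a generative model only has to predict the next token from that set, much as a language model predicts the next word.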

“The AudioCraft family of models are capable of producing high-quality audio with long-term consistency, and they’re easy to use. With AudioCraft, we simplify the overall design of generative models for audio compared to prior work in the field,” Meta said in its announcement.

The models could be helpful for game developers to create sound effects and for marketing teams to make commercial soundtracks or effects.

Nevertheless, questions remain around copyright and compensation as AI-generated content built on other people’s work proliferates.

Until now, most such models, like Google’s MusicLM, have been restricted to research. Meta is betting that easy access to creative audio AI will spawn new art forms and use cases. The framework and models are available for non-commercial research and educational purposes.

The project also extends Meta’s growing AI portfolio. In June, the company announced Voicebox, a speech-generation model designed to help creators with tasks such as audio editing, sampling and stylising, including tasks it was not specifically trained for, which it handles through in-context learning.

We have also reported that the tech giant is working on various humanlike chatbots that will soon be able to converse with users. These chatbots can take on personas to simulate conversations with different individuals.