[Appendix 11] Further Exploration, AI Music Tools: Meta AudioCraft

2023 iThome Ironman Contest

AI & Data

30 Days of Exploring Free Generative AI Tools: Practicing Diverse AIGC Applications, series part 41

15th Ironman Contest

Jason Hung

2023-10-27 23:46:36


AudioCraft

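Meta has released AudioCraft, a new generative AI tool: a PyTorch library for deep learning research on audio generation. AudioCraft contains the inference and training code for two state-of-the-art generative models that produce high-quality audio, AudioGen and MusicGen. Given a text description, MusicGen generates a new piece of music and AudioGen generates sound effects; the MusicGen UI used in this article covers the music side.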

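You can try it in the facebook/MusicGen Space on Hugging Face, although that demo only generates 12 seconds of music. Alternatively, follow this YouTube tutorial and run it yourself: four lines of code in Colab are enough to bring up the MusicGen UI, which can generate up to 120 seconds of music and offers four models.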

Let's try it out.

Using the online service: Hugging Face facebook/MusicGen

https://huggingface.co/spaces/facebook/MusicGen

Installing and running it yourself

Create a Colab notebook (my Colab example notebook).
Set up the runtime and select the T4 GPU.
Enter the code below and run it.

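# Clone the v1.0 branch of the camenduru/audiocraft repository
! git clone -b v1.0 https://github.com/camenduru/audiocraft.git
%cd audiocraft/
# Install the Python dependencies
! pip install -r requirements.txt
# Launch the app; --share prints a public URL
! python app.py --share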

Click the link shown after "Running on public URL" to open the MusicGen UI.
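Since AudioCraft is a PyTorch library, the same models can also be driven from a notebook cell without the UI. The following is a minimal sketch, assuming the `audiocraft` package is importable in the runtime and a GPU is available; the helper names `model_id` and `generate_clip`, the 8-second duration, and the output file stem are illustrative choices, not part of the original walkthrough.

```python
# Sketch only: drives MusicGen from Python instead of the web UI.
# Assumes the audiocraft package is installed and a GPU runtime is active.

VALID_MODELS = ("small", "medium", "melody", "large")


def model_id(name: str) -> str:
    """Map a short model name to its Hugging Face checkpoint id."""
    if name not in VALID_MODELS:
        raise ValueError(f"unknown model {name!r}; expected one of {VALID_MODELS}")
    return f"facebook/musicgen-{name}"


def generate_clip(description: str, name: str = "small",
                  duration: float = 8.0, out_stem: str = "clip") -> str:
    """Generate one clip from a text prompt and return the saved file path."""
    # Imported lazily so model_id() works even without audiocraft installed.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained(model_id(name))
    model.set_generation_params(duration=duration)  # clip length in seconds
    wav = model.generate([description])  # tensor shaped [batch, channels, samples]
    # audio_write adds the ".wav" suffix to the stem and returns the path.
    path = audio_write(out_stem, wav[0].cpu(), model.sample_rate, strategy="loudness")
    return str(path)
```

A call such as `generate_clip("An 80s driving pop song with heavy drums and synth pads in the background")` would download the small checkpoint on first use and write `clip.wav` to the working directory.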

MusicGen

A look at the MusicGen UI

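Four models are available:
Melody: a 1.5B decoder conditioned on text, which can also take a melody as an extra condition.
Medium: a 1.5B decoder conditioned on text.
Small: a 300M decoder conditioned on text.
Large: a 3.3B decoder conditioned on text.
When using Melody, you can optionally provide a reference melody file, and the model will try to follow both the description and the melody.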
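To get a feel for which of these models fits on a Colab T4 (16 GB of GPU memory), the parameter counts can be turned into a rough weight size. This is a back-of-the-envelope sketch: it counts the decoder weights only, ignores activations, the text encoder, and the audio codec, and takes Melody as 1.5B following the facebook/musicgen-melody model card.

```python
# Parameter counts in billions for the four models.
# Melody is taken as 1.5B, per the facebook/musicgen-melody model card.
PARAMS_BILLION = {"small": 0.3, "medium": 1.5, "melody": 1.5, "large": 3.3}


def weight_gb(name: str, bytes_per_param: int = 2) -> float:
    """Approximate size of the decoder weights alone, in GB (10^9 bytes)."""
    # billions of parameters * bytes per parameter = gigabytes
    return PARAMS_BILLION[name] * bytes_per_param


for name in PARAMS_BILLION:
    fp16 = weight_gb(name, 2)  # 2 bytes per parameter
    fp32 = weight_gb(name, 4)  # 4 bytes per parameter
    print(f"{name:>6}: fp16 ~{fp16:.1f} GB, fp32 ~{fp32:.1f} GB")
```

By this estimate the Large model needs roughly 13 GB for its weights at 32-bit precision, which leaves little headroom on a T4, while Small stays close to 1 GB.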
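Click the first example under Examples at the bottom; its contents are filled into the fields above automatically.
Text description: An 80s driving pop song with heavy drums and synth pads in the background
Melody condition: a Bach piece, bach.mp3
Check Input Text and Melody Condition (optional), then click Submit.
The first run has to download the melody model; go back to the ipynb notebook to watch the execution messages.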

Loading model melody
Downloading (…)ssion_state_dict.bin: 236M
...
Downloading state_dict.bin: 2.77G
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th 80.2M
Downloading (…)ve/main/spiece.model: 792k
Downloading (…)/main/tokenizer.json: 1.39M
Downloading (…)lve/main/config.json: 1.21k
Downloading model.safetensors: 892M
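The sizes in this log can be added up to estimate how much the first run downloads. The sketch below assumes the suffixes are decimal units (k = 10^3 bytes, M = 10^6, G = 10^9) and only counts the files that appear in the log; the elided lines are not included.

```python
import re

# Sizes copied from the download log above (elided lines are not counted).
LOG_SIZES = ["236M", "2.77G", "80.2M", "792k", "1.39M", "1.21k", "892M"]

# Assumption: decimal units.
UNIT = {"k": 1e3, "M": 1e6, "G": 1e9}


def to_bytes(size: str) -> float:
    """Convert a string such as '2.77G' into a number of bytes."""
    match = re.fullmatch(r"([\d.]+)([kMG])", size)
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return float(match.group(1)) * UNIT[match.group(2)]


total_gb = sum(to_bytes(s) for s in LOG_SIZES) / 1e9
print(f"first-run download: about {total_gb:.2f} GB")  # about 3.98 GB
```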

Once the melody model has finished downloading, generation starts.

new batch 1 ['An 80s driving pop song with heavy drums and synth pads in the background'] [(44100, (442368, 2))]
...
[libx264 @ 0x5b4664c22600] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - ...
Output #0, mp4, to 'UKDiC7rAcXz3.mp4':
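The `new batch` line in the log also reports the melody input as `(44100, (442368, 2))`. Assuming this follows the usual (sample rate, array shape) convention with the array shaped (samples, channels), the length of the reference melody can be worked out as follows; `clip_seconds` is an illustrative helper name.

```python
def clip_seconds(sample_rate: int, shape: tuple) -> float:
    """Duration in seconds of an audio array shaped (samples, channels)."""
    n_samples = shape[0]
    return n_samples / sample_rate


# Values taken from the "new batch" log line.
sample_rate, shape = 44100, (442368, 2)
print(f"{clip_seconds(sample_rate, shape):.2f} s, {shape[1]} channel(s)")
# prints: 10.03 s, 2 channel(s)
```

Under that reading, bach.mp3 is a stereo clip of about 10 seconds sampled at 44.1 kHz.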