[Day 19] An Introduction to Generating Music with 4 Different GAN Models

13th Ironman Contest

Rorschach

2021-09-19 14:34:14

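Introduction

Over the next few days we will try generating music with 4 different GAN models.
Only the Generator and Discriminator change; the training procedure and the loss are the same as in Dcgan.
In the end, we hope that feeding the model a chunk of meaningless noise will make it generate piano music with a recognizable melody.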
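
As a reminder of what "the same loss as Dcgan" refers to, here is a minimal sketch of the standard GAN binary cross-entropy losses in plain Python. The function names and the toy probabilities are illustrative assumptions, not code from this series; real training code would compute the same quantities with a deep learning framework.

```python
import math

def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def discriminator_loss(real_probs, fake_probs):
    """D wants real samples scored as 1 and generated samples scored as 0."""
    real = sum(bce(p, 1) for p in real_probs) / len(real_probs)
    fake = sum(bce(p, 0) for p in fake_probs) / len(fake_probs)
    return real + fake

def generator_loss(fake_probs):
    """G wants the discriminator to score its samples as 1."""
    return sum(bce(p, 1) for p in fake_probs) / len(fake_probs)

# hypothetical discriminator outputs for a tiny batch
real_probs = [0.9, 0.8]
fake_probs = [0.2, 0.1]
d_loss = discriminator_loss(real_probs, fake_probs)
g_loss = generator_loss(fake_probs)
```

In this toy batch the discriminator is doing well, so its loss is small while the generator's loss is large. Swapping in a different Generator or Discriminator leaves these two functions untouched, which is what makes the four models comparable.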

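Contestant 1: WavenetGan

This simply uses the wavenet we wrote on Day 7 as the Generator; the Discriminator stays the same as in Dcgan.

Contestant 2: BidirectionalLSTMGAN

Inspired by this article on kaggle, it uses a large number of BidirectionalLSTM layers for both the Generator and the Discriminator. The downside is that training takes a very long time, but the results are undeniably good.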

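Contestant 3: WaveGan

WaveGan is a model based on DCGAN, with a few changes:

Replace the 2D transposed convolution with a 1D one
Change the kernel size from 5x5 to 25 and the strides from 2x2 to 4 to widen the receptive field
Remove BatchNormalization from DCGAN; this was found by accident, and the reason it works is unclear XD
The original is trained with WGAN-GP, and there is also a WGAN version, but to make comparison easy we fix everything to the Dcgan training procedure here

The original WaveGan authors used this network to generate drum and piano sounds directly from waveforms (that is, the training input is the raw waveform). The generated clips are only 1 to 2 seconds long, but perhaps it can perform better on MIDI?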
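
To make the kernel and stride change concrete, here is a minimal sketch of a 1D transposed convolution in plain Python, run with kernel size 25 and stride 4. The function name, the single-channel simplification, and the all-ones kernel are assumptions for illustration only; this is not the WaveGan implementation, and real layers learn their weights and handle many channels.

```python
def transposed_conv1d(x, kernel, stride):
    """Naive 1D transposed convolution (no padding, single channel)."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        # each input sample stamps a scaled copy of the kernel, `stride` steps apart
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out

# settings taken from the list above: kernel size 25, stride 4
kernel = [1.0] * 25          # placeholder weights; real ones are learned
x = [1.0, 2.0, 3.0, 4.0]     # toy input of length 4
y = transposed_conv1d(x, kernel, stride=4)

# with "same" padding, frameworks typically crop this to len(x) * stride samples
same_len = len(x) * 4
```

Each layer therefore stretches the sequence by roughly a factor of 4, and because the kernel spans 25 samples, neighbouring input samples overlap heavily in the output.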

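Contestant 4: TransformerGAN

BidirectionalLSTMGAN gave decent results, but its training time was staggering, so I came across the Transformer while looking for ways to cut the training cost. Without spending some time studying it first, though, writing it is far harder than the previous 3 models (I am a slow learner and still only half understand it, but I will do my best to explain). The bottom line is that it did greatly shorten the training time, and the results are good too.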

Data Preparation

For this experiment we use the Mozart music collection V2.0.0.

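It contains the data they collected from 2004 to 2018, all as MIDI files. You can pick whichever pieces you like; I chose 1281 of them myself. During processing we ignore control change messages and only care about note_on. (Strictly speaking, control change should be included in training, but that makes the data processing much more tedious, and I was under time pressure at the time, so I left it out XD; instead, it is added randomly at generation time.)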

import os
from mido import MidiFile

paths = []
songs = []
# collect the path of every MIDI file under your_music_folder
for r, d, f in os.walk("./your_music_folder"):
    for file in f:
        # match both .mid and .midi extensions
        if file.lower().endswith(('.mid', '.midi')):
            paths.append(os.path.join(r, file))

# for each path, create a mido MidiFile object and append it to songs
for path in paths:
    mid = MidiFile(path, type = 1)
    songs.append(mid)
del paths


from math import sqrt

import numpy as np

seq_len = 256
gen_len = int(sqrt(seq_len))

notes = []
dataset = []
chunk = []

# for each MIDI object in the list of songs
for i in range(len(songs)):
    for msg in songs[i]:
        # skip meta messages
        if not msg.is_meta:
            # keep only note_on messages (this also drops control changes)
            if (msg.type == 'note_on'):
                notes.append(msg.note)
    for j in range(1, len(notes)):
        chunk.append(notes[j])
        # save a chunk every seq_len notes
        if (j % seq_len == 0):
            dataset.append(chunk)
            chunk = []
    print(f"Processing {i} Song")
    # drop the leftover partial chunk and start fresh for the next song
    chunk = []
    notes = []

del chunk
del notes
train_data = np.array(dataset)
# note: np.save appends ".npy" when the name lacks it, so this writes preprocess_data.npz.npy
np.save("preprocess_data.npz",train_data)
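
To see what the chunking step produces, here is a small standalone sketch on toy data. The helper name chunk_notes and the toy pitches are hypothetical; unlike the loop above, which starts from index 1 and so skips the first note of each song, this version starts from index 0, but the idea of cutting fixed-length chunks and discarding the remainder is the same.

```python
def chunk_notes(notes, seq_len):
    """Split a note list into consecutive chunks of seq_len, dropping the remainder."""
    n_full = len(notes) // seq_len
    return [notes[k * seq_len:(k + 1) * seq_len] for k in range(n_full)]

toy_notes = list(range(60, 70))      # 10 hypothetical MIDI pitches
chunks = chunk_notes(toy_notes, seq_len=4)
```

With 10 notes and a chunk length of 4, two full chunks survive and the last 2 notes are dropped. In the real data every row of train_data has seq_len = 256 notes, so the saved array has shape (number of chunks, 256).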