Woke up this morning thinking that my aimless, headless-fly style of debugging has no system to it and wastes a lot of time. The best fix is to go back and systematically learn how to do transfer learning on pretrained models, and it already paid off with new takeaways!
1. Gain a deeper understanding: Transfer Learning for Pretrained Models (AlexNet), i.e. how to take a pretrained model and train it to make predictions on a new task with new data.
2. Debugging log for the model trained on the competition data
(Method 1) Finetuning the convnet: the network's weights are not frozen. While the classifier layer (the last layer) is being retrained, backpropagation still runs through the whole network and updates all of its weights.
Instead of training from randomly initialized parameters, we fine-tune parameters that were already trained on a large dataset; apart from that, the training procedure is the same as training a brand-new convolutional neural network.
Finetuning the convnet: Instead of random initialization, we initialize the network with a pretrained network, like the one that is trained on imagenet 1000 dataset. Rest of the training looks as usual.
(Method 2) ConvNet as fixed feature extractor: freeze the pretrained model's weights (everything except the final classifier layer) and use it purely as a feature extractor, training only the classifier (the last layer).
Every layer except the last fully connected layer is frozen (freeze the weights). The last fully connected layer is replaced with one given random weights, and it is the only layer of the whole network that gets trained.
ConvNet as fixed feature extractor: Here, we will freeze the weights for all of the network except that of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights and only this layer is trained.
Load a pretrained model and reset final fully connected layer.
### Setting
from torchvision import models
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

model_ft = models.resnet18(pretrained=True)  # ft = finetune model
num_ftrs = model_ft.fc.in_features
# Set the output size according to the task
model_ft.fc = nn.Linear(num_ftrs, len(class_names))  # output size = 33 here, because len(class_names) = 33 (the competition data has 33 crop classes)
model_ft = model_ft.to(device)
criterion = nn.CrossEntropyLoss()
# Optimize all parameters of the model
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay the learning rate by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
### Train & evaluate
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
num_epochs=25)
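train_model() here is the training loop from the official PyTorch transfer-learning tutorial and is not reproduced in this note. For reference, a minimal sketch (assuming the tutorial's dataloaders, dataset_sizes, and device are already defined) could look like this:
import copy
import torch

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    # Keep a copy of the best weights seen on the validation split
    best_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()
            running_loss, running_corrects = 0.0, 0
            for inputs, labels in dataloaders[phase]:
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()
                # Gradients are only needed in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                running_loss += loss.item() * inputs.size(0)
                running_corrects += (outputs.argmax(1) == labels).sum().item()
            if phase == 'train':
                scheduler.step()  # step the StepLR schedule once per epoch
            epoch_acc = running_corrects / dataset_sizes[phase]
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_wts = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_wts)  # hand back the best-performing weights
    return model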
import torch
import torch.nn as nn
from mobilenetv3 import mobilenetv3_large, h_swish  # h_swish is defined in the same mobilenetv3 module

model = mobilenetv3_large()
model.load_state_dict(torch.load('mobilenetv3-large-1cd25616.pth'))
# Replace model.classifier so that its output size (out_ch) matches the new training task
model.classifier = nn.Sequential(
    nn.Linear(960, 1280),
    h_swish(),
    nn.Dropout(0.2),
    nn.Linear(1280, out_ch),  # set out_ch to the number of classes of the new task
)
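A quick, purely illustrative sanity check that the replaced classifier head produces the expected output size; the dummy batch below is an assumption of mine, and the printed shape assumes out_ch has been set to 33 for the 33 crop classes:
import torch

dummy = torch.randn(2, 3, 224, 224)  # a fake batch of two RGB images
with torch.no_grad():
    out = model(dummy)
print(out.shape)  # expected: torch.Size([2, 33]) when out_ch = 33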
To freeze every layer of the network except the last one, we set requires_grad = False on those parameters. Their gradients are then not computed in backward(), so the frozen layers' parameters are never updated; only the last layer is trained and has its parameters updated.
Here, we need to freeze all the network except the final layer. We need to set requires_grad = False to freeze the parameters so that the gradients are not computed in backward().
Note:
For the autograd mechanics behind PyTorch's backward(), see the [official documentation](https://pytorch.org/docs/master/notes/autograd.html).
Recommended reading: [Ironman series, Day 2, Dynamic computation graphs: PyTorch's autograd](https://ithelp.ithome.com.tw/articles/10216440)
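A tiny standalone example (not part of the competition code) showing the effect described above: once requires_grad is set to False, backward() computes no gradient for those parameters, so an optimizer can never update them.
import torch
import torch.nn as nn

layer_frozen = nn.Linear(4, 4)
layer_trainable = nn.Linear(4, 1)
for p in layer_frozen.parameters():
    p.requires_grad = False  # freeze this layer

x = torch.randn(3, 4)
loss = layer_trainable(layer_frozen(x)).sum()
loss.backward()

print(layer_frozen.weight.grad)     # None: no gradient was computed for the frozen layer
print(layer_trainable.weight.grad)  # a tensor: only this layer would be updated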
### Setting
import torchvision
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False  # freeze the weights: none of these layers will be updated
# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 33)  # 33: number of output classes
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
# Only parameters of the final layer are being optimized
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)  # only the last layer's parameters (model_conv.fc) are updated
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
### Train and evaluate
model_conv = train_model(model_conv, criterion, optimizer_conv,
exp_lr_scheduler, num_epochs=25)
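Once training finishes, the accuracy on the held-out split can be re-checked with a plain evaluation pass. This is only a rough sketch, assuming the same dataloaders['val'] and device as above:
import torch

model_conv.eval()  # switch off dropout and use running batch-norm statistics
correct, total = 0, 0
with torch.no_grad():
    for inputs, labels in dataloaders['val']:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = model_conv(inputs).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f'validation accuracy: {correct / total:.4f}')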
I now have a much better idea of why the model's prediction accuracy is so low; things are getting clearer.
Next in the experiments and debugging: switch to (Method 2) ConvNet as fixed feature extractor, run another round of training and prediction, and check how the accuracy turns out this time.
Closing thoughts:
Feeling unwell today; took two painkillers and still have period pain (ಥ﹏ಥ). The price of irregular sleep and too many hand-shaken drinks lately QQ
Ordered two bottles of supplements and will drink more warm soup. Dear body, I promise to treat you better after the competition (hang in there just a little longer, pep talk!)
Hours worked today: 4 × 50 min
One hour's sleep before midnight is worth three after. (Go to bed early!)