
2023 iThome 鐵人賽

DAY 23

Introduction

  • Over the past few days we spent some time discussing unsupervised learning and the Autoencoder architecture. Today, let's actually try it out in practice!

Prerequisites

  1. Python (at least being comfortable with Python syntax)
  2. How to load pretrained models and datasets in PyTorch (https://ithelp.ithome.com.tw/articles/10323073 and https://ithelp.ithome.com.tw/articles/10322732)
  3. Key features of the ResNet architecture (https://ithelp.ithome.com.tw/articles/10334832 and https://ithelp.ithome.com.tw/articles/10335396)
  4. Convolution operations (for a refresher: https://ithelp.ithome.com.tw/articles/10323076 )
  5. Convolutional neural networks (for a refresher: https://ithelp.ithome.com.tw/articles/10323077 )

After reading today's content, you will probably know...

  1. How to build an Autoencoder with fully-connected (linear) layers
  2. How to implement an Autoencoder with a CNN architecture

1. Hands-On with Autoencoders (AE) in PyTorch

  • As mentioned when we introduced the AE, an Autoencoder is just an architecture and can be realized in different ways, so today we will cover two of them: an MLP-style model built mainly from linear layers, and a CNN-style model built mainly from convolutional layers.
  • The dataset for this hands-on session is again the MNIST handwritten-digit dataset introduced earlier, which can be downloaded directly through the API provided by PyTorch.

    1. Using an MLP architecture

    • Here we build an Autoencoder model using plain nn.Linear layers:
    import torch.nn as nn

    # Define the Autoencoder model
    class Autoencoder(nn.Module):
        def __init__(self):
            super(Autoencoder, self).__init__()
            self.encoder = nn.Sequential(
                nn.Linear(1 * 28 * 28, 128),
                nn.ReLU(),
                nn.Linear(128, 64),
                nn.ReLU(),
                nn.Linear(64, 32),
                nn.ReLU()
            )
            self.decoder = nn.Sequential(
                nn.Linear(32, 64),
                nn.ReLU(),
                nn.Linear(64, 128),
                nn.ReLU(),
                nn.Linear(128, 1 * 28 * 28),
                nn.Tanh()  # Output values between -1 and 1 (suitable for images)
            )
    
        def forward(self, x):
            x = self.encoder(x)
            x = self.decoder(x)
            return x
    
    # Initialize the Autoencoder model
    model = Autoencoder()
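    
    • To see the change in feature dimensionality concretely, here is a minimal sketch (reusing the Autoencoder class defined above) that pushes a dummy flattened MNIST image through the encoder and decoder and prints the shapes:
    import torch
    
    model = Autoencoder()
    x = torch.randn(1, 1 * 28 * 28)        # a dummy flattened 28x28 image
    z = model.encoder(x)                   # compressed 32-dimensional code
    x_hat = model.decoder(z)               # reconstruction back to 784 values
    print(x.shape, z.shape, x_hat.shape)   # [1, 784] -> [1, 32] -> [1, 784]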
    
    • The overall model architecture is shown in the figure below:
      https://ithelp.ithome.com.tw/upload/images/20231008/20163299it1YC4jTwy.png
    • You can see that the encoder gradually compresses the high-dimensional input into low-dimensional features, while the decoder does the opposite, reconstructing the high-dimensional output from those low-dimensional features, which matches the design concept of an Autoencoder.
    • Next, with the model built, we can start training:
    import torch
    import torch.nn as nn
    import torchvision
    import torchvision.transforms as transforms
    import matplotlib.pyplot as plt
    import torch.optim as optim
    
    # Hyperparameters
    batch_size = 64
    learning_rate = 0.001
    num_epochs = 10
    
    # Data preprocessing and loading
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    
    train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
    
    # Define the Autoencoder model
    class Autoencoder(nn.Module):
        def __init__(self):
            super(Autoencoder, self).__init__()
            self.encoder = nn.Sequential(
                nn.Linear(1 * 28 * 28, 128),
                nn.ReLU(),
                nn.Linear(128, 64),
                nn.ReLU(),
                nn.Linear(64, 32),
                nn.ReLU()
            )
            self.decoder = nn.Sequential(
                nn.Linear(32, 64),
                nn.ReLU(),
                nn.Linear(64, 128),
                nn.ReLU(),
                nn.Linear(128, 1 * 28 * 28),
                nn.Tanh()  # Output values between -1 and 1 (suitable for images)
            )
    
        def forward(self, x):
            x = self.encoder(x)
            x = self.decoder(x)
            return x
    
    # Initialize the Autoencoder model
    model = Autoencoder()
    
    # Loss and optimizer
    criterion = nn.MSELoss()  # Mean Squared Error (MSE) loss
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    # Training loop
    total_step = len(train_loader)
    for epoch in range(num_epochs):
        for i, (images, _) in enumerate(train_loader):
            # Flatten the images
            images = images.view(-1, 1 * 28 * 28)
    
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, images)
    
            # Backward pass and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    
            if (i + 1) % 100 == 0:
                print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')
    
    print('Training finished.')
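    
    • Once training is done, the encoder on its own can act as a feature extractor, compressing each 784-pixel image into a 32-dimensional code; a minimal sketch (reusing the trained model and train_loader from above):
    # Extract 32-dim codes with the trained encoder only
    model.eval()
    with torch.no_grad():
        batch, _ = next(iter(train_loader))
        codes = model.encoder(batch.view(-1, 1 * 28 * 28))
    print(codes.shape)  # torch.Size([64, 32]) -- one 32-dim code per image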
    
    • Finally, we can use the code below to display the original images together with the images reconstructed by the model:
    # Test the autoencoder
    model.eval()
    with torch.no_grad():
        for images, _ in train_loader:
            outputs = model(images.view(-1, 1 * 28 * 28)).view(-1, 1, 28, 28)
            break
    
    # Display original and reconstructed images
    fig, axes = plt.subplots(2, 5, figsize=(12, 6))
    for i in range(5):
        axes[0, i].imshow(images[i].cpu().numpy().transpose(1, 2, 0))
        axes[0, i].set_title('Original')
        axes[0, i].axis('off')
        axes[1, i].imshow(outputs[i].cpu().numpy().transpose(1, 2, 0))
        axes[1, i].set_title('Reconstructed')
        axes[1, i].axis('off')
    plt.show()
    
    • The result should look like the figure below; the trained model can already reconstruct images that closely resemble the inputs:
      https://ithelp.ithome.com.tw/upload/images/20231008/20163299yF0LtKpVjz.png
    • Note: because the images were normalized and are displayed with matplotlib's default colormap, they do not appear as the black-and-white images of the original dataset.
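    
    • If you would rather see them in the dataset's original black-and-white look, a minimal sketch (reusing images and outputs from the cell above) is to undo the normalization and pass cmap='gray' to imshow:
    # Undo Normalize((0.5,), (0.5,)) and display in grayscale
    img = images[0] * 0.5 + 0.5            # map [-1, 1] back to [0, 1]
    rec = outputs[0] * 0.5 + 0.5
    fig, (ax0, ax1) = plt.subplots(1, 2)
    ax0.imshow(img.squeeze().cpu().numpy(), cmap='gray'); ax0.set_title('Original'); ax0.axis('off')
    ax1.imshow(rec.squeeze().cpu().numpy(), cmap='gray'); ax1.set_title('Reconstructed'); ax1.axis('off')
    plt.show()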

    2. Using a CNN architecture

    • Besides using only linear layers, we can of course also implement the Autoencoder architecture with convolutional layers. The code below builds such a model:
    import torch.nn as nn

    # Define the Autoencoder model with Conv2d layers and a bottleneck (nn.Linear)
    class Autoencoder(nn.Module):
        def __init__(self):
            super(Autoencoder, self).__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
                nn.ReLU()
            )
            self.bottleneck = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 4 * 4, 256),  # Bottleneck layer with reduced dimensionality
                nn.ReLU(),
                nn.Linear(256, 64 * 4 * 4),  # Expand back from the bottleneck to 64*4*4 features
                nn.ReLU()
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
                nn.ReLU(),
                nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=0),
                nn.ReLU(),
                nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=2, output_padding=1),
                nn.Tanh()  # Output values between -1 and 1 (suitable for images)
            )
    
        def forward(self, x):
            x = self.encoder(x)
            x = self.bottleneck(x)
            x = self.decoder(x.view(x.size(0), 64, 4, 4))  # Reshape for decoding
            return x
    
    # Initialize the Autoencoder model
    model = Autoencoder()
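    
    • Before training, it is worth tracing how the feature-map size changes; here is a minimal sketch (reusing the Autoencoder class above) that prints the shape after each encoder convolution and after the bottleneck:
    import torch
    
    model = Autoencoder()
    x = torch.randn(1, 1, 28, 28)      # a dummy MNIST image
    for layer in model.encoder:
        x = layer(x)
        if isinstance(layer, nn.Conv2d):
            print(tuple(x.shape))      # (1, 16, 14, 14) -> (1, 32, 7, 7) -> (1, 64, 4, 4)
    z = model.bottleneck(x)
    print(z.shape)                     # torch.Size([1, 1024]), i.e. 64 * 4 * 4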
    
    • The resulting model architecture is shown in the figure below:
      https://ithelp.ithome.com.tw/upload/images/20231008/20163299bESwoKmOvx.png
    • The architecture is similar to the fully-linear model (both are Autoencoders, after all). The main difference comes from the nature of CNNs: the spatial size of the feature maps shrinks layer by layer, so in the decoder we have to gradually "restore/upsample" the image back to its original size. This process mirrors what the encoder does, and concretely we achieve it with the ConvTranspose2d operation. Although this is not strictly correct mathematically, you can think of it as the "inverse" of convolution: whatever the convolution did to shrink the feature map, ConvTranspose2d does in reverse to grow it back (a small sketch at the end of this subsection traces the sizes step by step).
    • For more on this, see https://www.geeksforgeeks.org/apply-a-2d-transposed-convolution-operation-in-pytorch/ and the PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html
    • After building the model, we train it in the same way:
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision
    import torchvision.transforms as transforms
    import matplotlib.pyplot as plt
    
    # Hyperparameters
    batch_size = 64
    learning_rate = 0.001
    num_epochs = 10
    
    # Data preprocessing and loading
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    
    train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
    
    # Define the Autoencoder model with Conv2d layers and a bottleneck (nn.Linear)
    class Autoencoder(nn.Module):
        def __init__(self):
            super(Autoencoder, self).__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
                nn.ReLU()
            )
            self.bottleneck = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 4 * 4, 256),  # Bottleneck layer with reduced dimensionality
                nn.ReLU(),
                nn.Linear(256, 64 * 4 * 4),  # Expand back from the bottleneck to 64*4*4 features
                nn.ReLU()
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
                nn.ReLU(),
                nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=0),
                nn.ReLU(),
                nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=2, output_padding=1),
                nn.Tanh()  # Output values between -1 and 1 (suitable for images)
            )
    
        def forward(self, x):
            x = self.encoder(x)
            x = self.bottleneck(x)
            x = self.decoder(x.view(x.size(0), 64, 4, 4))  # Reshape for decoding
            return x
    
    # Initialize the Autoencoder model
    model = Autoencoder()
    
    # Loss and optimizer
    criterion = nn.MSELoss()  # Mean Squared Error (MSE) loss
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    # Training loop
    total_step = len(train_loader)
    for epoch in range(num_epochs):
        for i, (images, _) in enumerate(train_loader):
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, images)
    
            # Backward pass and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    
            if (i + 1) % 100 == 0:
                print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')
    
    print('Training finished.')
    
    • We can also visualize the outputs to get a sense of how well the model was trained:
    # Test the autoencoder
    model.eval()
    with torch.no_grad():
        for images, _ in train_loader:
            print(images.shape)
            outputs = model(images).view(-1, 1, 28, 28)
            break
    
    # Display original and reconstructed images
    fig, axes = plt.subplots(2, 5, figsize=(12, 6))
    for i in range(5):
        axes[0, i].imshow(images[i].cpu().numpy().transpose(1, 2, 0))
        axes[0, i].set_title('Original')
        axes[0, i].axis('off')
        axes[1, i].imshow(outputs[i].cpu().numpy().transpose(1, 2, 0))
        axes[1, i].set_title('Reconstructed')
        axes[1, i].axis('off')
    plt.show()
    
    • The output is shown in the figure below:
      https://ithelp.ithome.com.tw/upload/images/20231008/20163299YNA2z9gc8Y.png
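    
    • As promised above, here is a minimal sketch showing how each ConvTranspose2d layer in the decoder "undoes" the encoder's downsampling. The output size follows (in - 1) * stride - 2 * padding + kernel_size + output_padding, which is why the three layers map 4x4 -> 8x8 -> 15x15 -> 28x28:
    import torch
    import torch.nn as nn
    
    x = torch.randn(1, 64, 4, 4)   # bottleneck feature map
    up1 = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1)
    up2 = nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=0)
    up3 = nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=2, output_padding=1)
    x = up1(x); print(tuple(x.shape))   # (1, 32, 8, 8)
    x = up2(x); print(tuple(x.shape))   # (1, 16, 15, 15)
    x = up3(x); print(tuple(x.shape))   # (1, 1, 28, 28)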

2. Summary

  • Today we introduced two ways of implementing an Autoencoder, so that you can follow how the feature dimensions change in actual code and try out the model's image-reconstruction quality for yourself. If there is time later, we may also write an extra post on how to generate images with a VAE, so you can look forward to that (?)
