In recent years, DeepFake technology has flooded the internet with fake videos, from the early clips of US President Obama speaking to the recent wave of celebrity faces grafted onto all kinds of footage, making real and fake hard to tell apart. The technology at its root is the Generative Adversarial Network (GAN). For an introduction, see 【Day 28:小學生談『生成對抗網路』(Generative Adversarial Network,GAN)】; rather than repeat it, this article focuses on how to implement a GAN with Keras.
According to the 【GAN Zoo】, 502 named GAN variants appeared between 2014 and 2018. Some of the flashier ones are surveyed in 【Some cool applications of GAN】; a few highlights:
Anime character creation (Create Anime characters).
Pose-guided person image generation (Pose Guided Person Image Generation): given keypoints for various poses, a GAN can render a model in each corresponding stance.
Cross-domain transfer: similar to style transfer, but only the target object is changed. For example, in the figure below, an ordinary horse is given zebra stripes, and the reverse transformation works as well.
Super-resolution (Super resolution).
Face generation (Progressive growing of GANs).
Generating photorealistic images from image segmentation maps (in the referenced figure, the right half is transformed into the left half).
Applications that generate images in this vein are too numerous to list; see 【Some cool applications of GAN】 for the full survey.
A GAN is made up of two networks: a counterfeiter, called the generative model (generator), and its adversary, called the discriminative model (discriminator). A simple architecture is shown below:
Figure: GAN architecture. Image source: generative-adversarial-networks
Each of the two networks defines its own loss function, and the two are optimized against each other as a minimax game rather than by straightforward maximum likelihood. What makes GANs distinctive is that images can be sampled from the generative model throughout training, and watching those samples improve is GAN's biggest highlight. Different algorithms define different loss functions and therefore produce different effects, as summarized in the table below.
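To make the adversarial objective concrete, below is a minimal sketch of the two loss functions for a vanilla GAN in Keras. This is my own illustration rather than code from the notebook: the discriminator is trained to score real images as 1 and fakes as 0, while the generator is trained to make its fakes score 1.

import tensorflow as tf
from tensorflow import keras

# Binary cross-entropy on raw logits: the classic (vanilla) GAN objective.
bce = keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_logits, fake_logits):
    # Real images should be classified as 1, generated images as 0.
    real_loss = bce(tf.ones_like(real_logits), real_logits)
    fake_loss = bce(tf.zeros_like(fake_logits), fake_logits)
    return real_loss + fake_loss

def generator_loss(fake_logits):
    # The generator wins when the discriminator labels its fakes as 1.
    return bce(tf.ones_like(fake_logits), fake_logits)

CycleGAN, implemented below, swaps this binary cross-entropy for a least-squares adversarial loss and adds cycle-consistency and identity terms.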
Next, let's implement the CycleGAN algorithm and add zebra stripes to ordinary horses to see how well it works. The program is adapted from the official 【Keras】 example. Since the code is fairly long, only the key blocks are explained; for the complete program, see 27_01_CycleGAN.ipynb:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_addons as tfa
from tensorflow import keras
from tensorflow.keras import layers

autotune = tf.data.AUTOTUNE

# Load the horse2zebra dataset from TensorFlow Datasets.
dataset, metadata = tfds.load('cycle_gan/horse2zebra',
                              with_info=True, as_supervised=True)
train_horses, train_zebras = dataset['trainA'], dataset['trainB']
test_horses, test_zebras = dataset['testA'], dataset['testB']

buffer_size = 256
batch_size = 1

# Define the standard image size.
orig_img_size = (286, 286)
# Size of the random crops to be used during training.
input_img_size = (256, 256, 3)

# Weights initializer for the layers.
kernel_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.02)
# Gamma initializer for instance normalization.
gamma_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.02)
def normalize_img(img):
    img = tf.cast(img, dtype=tf.float32)
    # Map values in the range [-1, 1]
    return (img / 127.5) - 1.0

def preprocess_train_image(img, label):
    # Random flip
    img = tf.image.random_flip_left_right(img)
    # Resize to the original size first
    img = tf.image.resize(img, [*orig_img_size])
    # Random crop to 256x256
    img = tf.image.random_crop(img, size=[*input_img_size])
    # Normalize the pixel values in the range [-1, 1]
    img = normalize_img(img)
    return img

def preprocess_test_image(img, label):
    # Only resizing and normalization for the test images.
    img = tf.image.resize(img, [input_img_size[0], input_img_size[1]])
    img = normalize_img(img)
    return img
# Apply the preprocessing operations to the training data
train_horses = (
    train_horses.map(preprocess_train_image, num_parallel_calls=autotune)
    .cache()
    .shuffle(buffer_size)
    .batch(batch_size)
)
train_zebras = (
    train_zebras.map(preprocess_train_image, num_parallel_calls=autotune)
    .cache()
    .shuffle(buffer_size)
    .batch(batch_size)
)

# Apply the preprocessing operations to the test data
test_horses = (
    test_horses.map(preprocess_test_image, num_parallel_calls=autotune)
    .cache()
    .shuffle(buffer_size)
    .batch(batch_size)
)
test_zebras = (
    test_zebras.map(preprocess_test_image, num_parallel_calls=autotune)
    .cache()
    .shuffle(buffer_size)
    .batch(batch_size)
)
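Before building the networks, it is worth eyeballing a sample from each domain. A quick check along these lines (assuming matplotlib is available; this snippet is mine, not from the notebook) undoes the [-1, 1] normalization for display:

import matplotlib.pyplot as plt

# Grab one preprocessed batch (of size 1) from each domain.
horse = next(iter(train_horses))
zebra = next(iter(train_zebras))

_, ax = plt.subplots(1, 2, figsize=(8, 4))
# Undo the [-1, 1] normalization before displaying.
ax[0].imshow(((horse[0] * 127.5) + 127.5).numpy().astype("uint8"))
ax[0].set_title("Horse")
ax[1].imshow(((zebra[0] * 127.5) + 127.5).numpy().astype("uint8"))
ax[1].set_title("Zebra")
plt.show()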
class ReflectionPadding2D(layers.Layer):
    """Implements Reflection Padding as a layer.

    Args:
        padding(tuple): Amount of padding for the
        spatial dimensions.

    Returns:
        A padded tensor with the same type as the input tensor.
    """

    def __init__(self, padding=(1, 1), **kwargs):
        self.padding = tuple(padding)
        super(ReflectionPadding2D, self).__init__(**kwargs)

    def call(self, input_tensor, mask=None):
        padding_width, padding_height = self.padding
        padding_tensor = [
            [0, 0],
            [padding_height, padding_height],
            [padding_width, padding_width],
            [0, 0],
        ]
        return tf.pad(input_tensor, padding_tensor, mode="REFLECT")
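A quick sanity check of the layer (my own test, not part of the notebook): with the default padding of (1, 1), a 256x256 feature map grows to 258x258, which a subsequent 3x3 "valid" convolution shrinks back to 256x256; the residual block below relies on exactly this.

x = tf.zeros((1, 256, 256, 3))
print(ReflectionPadding2D(padding=(1, 1))(x).shape)  # (1, 258, 258, 3)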
# Residual block
def residual_block(
    x,
    activation,
    kernel_initializer=kernel_init,
    kernel_size=(3, 3),
    strides=(1, 1),
    padding="valid",
    gamma_initializer=gamma_init,
    use_bias=False,
):
    dim = x.shape[-1]
    input_tensor = x

    x = ReflectionPadding2D()(input_tensor)
    x = layers.Conv2D(
        dim,
        kernel_size,
        strides=strides,
        kernel_initializer=kernel_initializer,
        padding=padding,
        use_bias=use_bias,
    )(x)
    x = tfa.layers.InstanceNormalization(gamma_initializer=gamma_initializer)(x)
    x = activation(x)

    x = ReflectionPadding2D()(x)
    x = layers.Conv2D(
        dim,
        kernel_size,
        strides=strides,
        kernel_initializer=kernel_initializer,
        padding=padding,
        use_bias=use_bias,
    )(x)
    x = tfa.layers.InstanceNormalization(gamma_initializer=gamma_initializer)(x)
    # Skip connection: add the block input back to its output.
    x = layers.add([input_tensor, x])
    return x
# Downsampling block
def downsample(
    x,
    filters,
    activation,
    kernel_initializer=kernel_init,
    kernel_size=(3, 3),
    strides=(2, 2),
    padding="same",
    gamma_initializer=gamma_init,
    use_bias=False,
):
    x = layers.Conv2D(
        filters,
        kernel_size,
        strides=strides,
        kernel_initializer=kernel_initializer,
        padding=padding,
        use_bias=use_bias,
    )(x)
    x = tfa.layers.InstanceNormalization(gamma_initializer=gamma_initializer)(x)
    if activation:
        x = activation(x)
    return x
# Upsampling block
def upsample(
    x,
    filters,
    activation,
    kernel_size=(3, 3),
    strides=(2, 2),
    padding="same",
    kernel_initializer=kernel_init,
    gamma_initializer=gamma_init,
    use_bias=False,
):
    x = layers.Conv2DTranspose(
        filters,
        kernel_size,
        strides=strides,
        padding=padding,
        kernel_initializer=kernel_initializer,
        use_bias=use_bias,
    )(x)
    x = tfa.layers.InstanceNormalization(gamma_initializer=gamma_initializer)(x)
    if activation:
        x = activation(x)
    return x
Build the generators; the structure is sketched below. For the full code, see 27_01_CycleGAN.ipynb:
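For reference, the generator in the Keras example this program is adapted from is a ResNet-style network: a 7x7 convolution stem, two downsampling blocks, nine residual blocks, two upsampling blocks, and a 7x7 convolution with tanh output. A condensed sketch, abridged from that example (the notebook may differ in minor details):

def get_resnet_generator(
    filters=64,
    num_downsampling_blocks=2,
    num_residual_blocks=9,
    num_upsample_blocks=2,
    name=None,
):
    img_input = layers.Input(shape=input_img_size, name=name + "_img_input")

    # 7x7 convolution stem.
    x = ReflectionPadding2D(padding=(3, 3))(img_input)
    x = layers.Conv2D(filters, (7, 7), kernel_initializer=kernel_init,
                      use_bias=False)(x)
    x = tfa.layers.InstanceNormalization(gamma_initializer=gamma_init)(x)
    x = layers.Activation("relu")(x)

    # Downsampling: 256x256 -> 128x128 -> 64x64.
    for _ in range(num_downsampling_blocks):
        filters *= 2
        x = downsample(x, filters=filters, activation=layers.Activation("relu"))

    # Residual blocks at the bottleneck resolution.
    for _ in range(num_residual_blocks):
        x = residual_block(x, activation=layers.Activation("relu"))

    # Upsampling back to 256x256.
    for _ in range(num_upsample_blocks):
        filters //= 2
        x = upsample(x, filters, activation=layers.Activation("relu"))

    # Final 7x7 convolution maps to 3 channels in [-1, 1].
    x = ReflectionPadding2D(padding=(3, 3))(x)
    x = layers.Conv2D(3, (7, 7), padding="valid")(x)
    x = layers.Activation("tanh")(x)

    return keras.models.Model(img_input, x, name=name)

# Two generators: G maps horses to zebras, F maps zebras to horses.
gen_G = get_resnet_generator(name="generator_G")
gen_F = get_resnet_generator(name="generator_F")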
Build the discriminators; for the code, see 27_01_CycleGAN.ipynb.
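The discriminator is a PatchGAN: a stack of strided 4x4 convolutions with LeakyReLU activations that ends in a one-channel map of real/fake scores, one score per image patch. A condensed sketch, again following the Keras example:

def get_discriminator(filters=64, name=None):
    img_input = layers.Input(shape=input_img_size, name=name + "_img_input")
    x = layers.Conv2D(filters, (4, 4), strides=(2, 2), padding="same",
                      kernel_initializer=kernel_init)(img_input)
    x = layers.LeakyReLU(0.2)(x)

    num_filters = filters
    for block in range(3):
        num_filters *= 2
        strides = (2, 2) if block < 2 else (1, 1)
        x = downsample(x, filters=num_filters,
                       activation=layers.LeakyReLU(0.2),
                       kernel_size=(4, 4), strides=strides)

    # PatchGAN head: one real/fake score per patch.
    x = layers.Conv2D(1, (4, 4), strides=(1, 1), padding="same",
                      kernel_initializer=kernel_init)(x)
    return keras.models.Model(inputs=img_input, outputs=x, name=name)

disc_X = get_discriminator(name="discriminator_X")  # judges horse images
disc_Y = get_discriminator(name="discriminator_Y")  # judges zebra images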
Connect the generators and discriminators to build the CycleGAN model, as sketched below.
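The wrapper is a keras.Model subclass (CycleGan in the notebook and the Keras example) that holds all four networks; its train_step combines three terms: an adversarial loss for each generator, a cycle-consistency loss (a horse translated to a zebra and back should reproduce the original horse), and an identity loss. Assembly and compilation then look roughly like this, assuming that CycleGan class:

# LSGAN-style adversarial loss, as in the Keras example.
adv_loss_fn = keras.losses.MeanSquaredError()

def generator_loss_fn(fake):
    # The generator wants the discriminator to output 1 for its fakes.
    return adv_loss_fn(tf.ones_like(fake), fake)

def discriminator_loss_fn(real, fake):
    real_loss = adv_loss_fn(tf.ones_like(real), real)
    fake_loss = adv_loss_fn(tf.zeros_like(fake), fake)
    return (real_loss + fake_loss) * 0.5

# CycleGan is the keras.Model subclass defined in the notebook;
# its train_step adds the cycle-consistency and identity losses.
cycle_gan_model = CycleGan(
    generator_G=gen_G, generator_F=gen_F,
    discriminator_X=disc_X, discriminator_Y=disc_Y,
)
cycle_gan_model.compile(
    gen_G_optimizer=keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5),
    gen_F_optimizer=keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5),
    disc_X_optimizer=keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5),
    disc_Y_optimizer=keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5),
    gen_loss_fn=generator_loss_fn,
    disc_loss_fn=discriminator_loss_fn,
)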
Create a callback that saves the generated images to disk at the end of every epoch.
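A sketch of such a callback, modeled on the GANMonitor class in the Keras example (it assumes the model exposes its horse-to-zebra generator as gen_G): at the end of each epoch it translates a few test horses and writes the results to disk, so you can watch the stripes emerge epoch by epoch.

class GANMonitor(keras.callbacks.Callback):
    """Save a few translated test images at the end of every epoch."""

    def __init__(self, num_img=4):
        self.num_img = num_img

    def on_epoch_end(self, epoch, logs=None):
        for i, img in enumerate(test_horses.take(self.num_img)):
            prediction = self.model.gen_G(img, training=False)[0].numpy()
            # Map [-1, 1] back to [0, 255] before saving.
            prediction = ((prediction * 127.5) + 127.5).astype(np.uint8)
            keras.preprocessing.image.array_to_img(prediction).save(
                f"generated_img_{i}_{epoch + 1}.png"
            )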
Train the model; images are generated and saved as training proceeds.
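Training then reduces to a single fit call over the zipped horse/zebra datasets. A sketch (epochs=1 for a smoke test; expect to need on the order of dozens of epochs for convincing stripes):

cycle_gan_model.fit(
    tf.data.Dataset.zip((train_horses, train_zebras)),
    epochs=1,  # increase substantially for good results
    callbacks=[GANMonitor()],
)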
The results are shown below; the horses really do acquire zebra stripes.
Besides GANs, there are many other algorithms that can generate images, for example:
Beyond the entertainment value, GANs also have more serious uses, such as generating training data (a form of data augmentation) and synthesizing medical images. It is a fascinating field, but studying it in depth probably requires investing in a decent discrete graphics card; otherwise you could turn into a fossil just waiting for training to finish.
The sample program for this article, 27_01_CycleGAN.ipynb, can be downloaded from 【here】.