iT邦幫忙

[Deep Learning] Multi-input model using .flow_from_directory() reads only one of the folders

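First of all, thanks to everyone who clicked in. This is my first time asking a question here, so the formatting is rough; please bear with me.

Problem:
I am learning how to use a multi-input model, but I ran into a problem that I could not find a solution for after a long search online.
This is the data that should be fed in: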

Training data 1: origin: 5 classes, 150 images
Training data 2: entropy: 5 classes, 150 images
Validation data 1: origin: 5 classes, 150 images
Validation data 2: entropy: 5 classes, 150 images

In other words, I expected to get the following output:

Found 150 images belonging to 5 classes.
Found 150 images belonging to 5 classes.
Found 150 images belonging to 5 classes.
Found 150 images belonging to 5 classes.

But only the following appeared:

Found 150 images belonging to 5 classes.
Found 150 images belonging to 5 classes.

After checking, it appears that only training data 1 (origin) and validation data 1 (origin) were picked up. How can I solve this problem?

The code is as follows:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator()
test_gen = ImageDataGenerator()

def generate_generator_multiple(generator, dir1, dir2, batch_size, img_rows, img_cols):
    
    genX1 = generator.flow_from_directory(
                                dir1,
                                target_size = (img_rows, img_cols),
                                #target_size = (64, 64),
                                # Following the paper, batch_size is set to 8
                                batch_size = batch_size,
                                #shuffle=False,
                                class_mode = 'categorical',
                                seed=666
                                )
    
    genX2 = generator.flow_from_directory(
                                dir2,
                                target_size = (img_rows, img_cols),
                                #target_size = (64, 64),
                                # Following the paper, batch_size is set to 8
                                batch_size = batch_size,
                                #shuffle=False,
                                class_mode = 'categorical',
                                seed=666
                                )
    
    while True:
        X1i = next(genX1)
        X2i = next(genX2)
        yield [X1i[0], X2i[0]], X2i[1]  # Yield both images and their mutual label
                       
inputgenerator = generate_generator_multiple(
                                            generator=train_gen,
                                            dir1=train_origin_dir,
                                            dir2=train_entropy_dir,
                                            batch_size=batch_size,
                                            img_rows=img_rows,
                                            img_cols=img_cols
                                            )       
     
testgenerator = generate_generator_multiple(
                                           generator=test_gen,
                                           dir1=test_origin_dir,
                                           dir2=test_entropy_dir,
                                           batch_size=batch_size,
                                           img_rows=img_rows,
                                           img_cols=img_cols
                                           )
                                           
---- (the neural network architecture goes here) ----

history = model.fit(
          inputgenerator,
          # integer number of steps; the generators loop forever
          steps_per_epoch = trainsetsize // batch_size,
          validation_data = testgenerator,
          validation_steps = testsetsize // batch_size,
          epochs = epochs,
          callbacks = callbacks_list,
          verbose = 2
          )
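The step counts passed to model.fit can be checked by hand. A small sketch, assuming the 150 images per set stated above and the batch size of 8 mentioned in the code comments; `steps_for` is a hypothetical helper for illustration, not part of Keras:

```python
import math

def steps_for(num_samples, batch_size, keep_partial_batch=True):
    """Number of generator steps needed to pass over the dataset once."""
    if keep_partial_batch:
        # Round up so the last, smaller batch is also counted.
        return math.ceil(num_samples / batch_size)
    # Round down and skip the leftover samples.
    return num_samples // batch_size

print(150 / 8)                   # 18.75, not a whole number of steps
print(steps_for(150, 8))         # 19
print(steps_for(150, 8, False))  # 18
```

Either rounding gives an integer; which one to use depends on whether the final partial batch should count toward an epoch.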

Only this appears:

Found 150 images belonging to 5 classes.
Found 150 images belonging to 5 classes.
Epoch 1/1000
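
One thing worth checking here, offered as a possible explanation rather than something confirmed in this thread: the body of a Python generator function does not run until the first item is requested. If flow_from_directory is called inside such a function, its "Found ..." message would be printed only when Keras first pulls a batch, so the two messages for the validation generator could show up later, once validation first runs. The sketch below uses a stand-in `fake_flow` (hypothetical, not a Keras function) and made-up directory names to show the timing:

```python
import itertools

log = []

def fake_flow(directory):
    """Stand-in for flow_from_directory: logs when called, returns an iterator."""
    log.append(f"Found images in {directory}")
    return itertools.repeat(directory)

def paired(dir1, dir2):
    # Nothing in this body runs until next() is first called on the generator.
    g1 = fake_flow(dir1)
    g2 = fake_flow(dir2)
    while True:
        yield next(g1), next(g2)

train = paired("train_origin", "train_entropy")
val = paired("val_origin", "val_entropy")
print(len(log))  # 0: creating the generators scanned nothing

next(train)      # the first training batch is requested
print(log)       # only the two training directories are logged

next(val)        # validation is requested later
print(len(log))  # 4
```

If this is what is happening, the two messages seen before "Epoch 1/1000" would belong to the two training directories, and the validation pair would be reported when the first validation pass begins.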
1 answer

I code so I am
iT邦高手 Level 1 ‧ 2022-06-09 07:42:06
Best answer

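With flow_from_directory, each class can only live in one subdirectory, as in the accompanying figure, because the subdirectory name is treated as the label; see the official documentation.
The simple fix is to copy the data of the same class into the same subdirectory.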
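
To make the point about labels concrete, here is a small sketch that imitates how class labels follow from subdirectory names. It is not Keras's own code: `class_indices_from` is a hypothetical helper and the class folder names are made up. It builds a throwaway directory tree and maps each sorted subdirectory name to an integer index, in the spirit of the `class_indices` mapping exposed by the iterator that flow_from_directory returns.

```python
import tempfile
from pathlib import Path

def class_indices_from(root):
    """Map each immediate subdirectory name (sorted) to an integer label."""
    names = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(names)}

with tempfile.TemporaryDirectory() as tmp:
    # Create the class folders in a deliberately unsorted order.
    for name in ["class_b", "class_a", "class_c"]:
        (Path(tmp) / name).mkdir()
    mapping = class_indices_from(tmp)

print(mapping)  # {'class_a': 0, 'class_b': 1, 'class_c': 2}
```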

nykd iT邦新手 Level 5 ‧ 2022-06-09 08:48:04

Thank you for the answer, but that produces the situation below. Doesn't it mean a single input now has 300 images?

Found 300 images belonging to 5 classes.
Found 300 images belonging to 5 classes.

Below are my original paths. Each path ends in 5 label folders, and each of those contains 30 images.

train_origin_dir = f"D:/img_data/{dataset_origin}/{dataset_origin}_train"

test_origin_dir = f"D:/img_data/{dataset_origin}/{dataset_origin}_test"

train_entropy_dir = f"D:/img_data/{dataset_entropy}/{dataset_entropy}_train"

test_entropy_dir = f"D:/img_data/{dataset_entropy}/{dataset_entropy}_test"
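
Because the two inputs are paired batch by batch, it may be worth confirming that the origin and entropy trees really contain the same class folders with the same number of files in each. A minimal sketch, assuming one folder per label as described above; `count_per_class` is a hypothetical helper, and the demo tree is synthetic rather than the real D:/img_data layout:

```python
import tempfile
from pathlib import Path

def count_per_class(root):
    """Return {class folder name: number of files directly inside it}."""
    return {
        p.name: sum(1 for f in p.iterdir() if f.is_file())
        for p in sorted(Path(root).iterdir())
        if p.is_dir()
    }

with tempfile.TemporaryDirectory() as tmp:
    # Build two parallel trees: 5 label folders with 30 empty files each.
    for tree in ["origin_train", "entropy_train"]:
        for label in range(5):
            folder = Path(tmp) / tree / f"label_{label}"
            folder.mkdir(parents=True)
            for i in range(30):
                (folder / f"img_{i}.png").touch()
    origin_counts = count_per_class(Path(tmp) / "origin_train")
    entropy_counts = count_per_class(Path(tmp) / "entropy_train")

print(sum(origin_counts.values()))      # 150
print(origin_counts == entropy_counts)  # True
```

Pointing `count_per_class` at the real training and test directories would show whether each of them holds 5 classes and 150 files.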

I originally expected to get the result below.

Found 150 images belonging to 5 classes.
Found 150 images belonging to 5 classes.
Found 150 images belonging to 5 classes.
Found 150 images belonging to 5 classes.

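This expectation was mainly inspired by this post, but in its code gen1 and gen2 use the same directory. Wouldn't that lead to the "Found 300 images belonging to 5 classes." situation above? Yet the result it produces is the one I want.

One input with 300 images ==> is that a problem?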

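nykd iT邦新手 Level 5 ‧ 2022-06-09 12:21:58

What I originally wanted was two inputs:
input1: 5 labels, 150 images
input2: 5 labels, 150 images

If I do what you describe, would it turn into the situation below?
input1: 5 labels, 300 images
input2: nothing at all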

The message above:
Found 300 images belonging to 5 classes.
Found 300 images belonging to 5 classes.

means that the training and test data each have 300 samples in 5 classes.

nykd iT邦新手 Level 5 ‧ 2022-06-09 17:40:27

But mine is a multiple inputs model. With two inputs, the result should be as follows, or is my thinking wrong?

Training data 1: origin: 5 classes, 150 images ---> input1
Training data 2: entropy: 5 classes, 150 images ---> input2

Validation data 1: origin: 5 classes, 150 images
Validation data 2: entropy: 5 classes, 150 images

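nykd iT邦新手 Level 5 ‧ 2022-06-09 18:51:46

Taking this example, the result it gets is 2 inputs, each with 2 classes and 5000 images, but I do not understand why its directory is the same for both. How did the author store the images for different inputs in the same directory and still have .flow_from_directory read them with that result?
That question was posted on Stack Overflow 2 years ago and my account does not have enough reputation to ask there, so I am looking for an answer here.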

#add data_augmentation
train_aug_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range = 20,
    shear_range = 0.1,
    zoom_range = 0.2,
    width_shift_range = 0.1,
    height_shift_range = 0.1,
    horizontal_flip = True
)

validation_datagen = ImageDataGenerator(rescale = 1./255)

def two_image_generator(generator, 
                        directory, 
                        batch_size,
                        shuffle = False,
                        img_size1 = (224,224), 
                        img_size2 = (299,299)):

    gen1 = generator.flow_from_directory(
        # This is the target directory
        directory,
        # All images will be resized to target height and width.
        target_size=img_size1,
        batch_size=batch_size,
        # Since we use categorical_crossentropy loss, we need categorical labels
        class_mode='categorical',
        shuffle = shuffle,
        seed = 7)

    gen2 = generator.flow_from_directory(
        # This is the target directory
        directory,
        # All images will be resized to target height and width.
        target_size=img_size2,
        batch_size=batch_size,
        # Since we use categorical_crossentropy loss, we need categorical labels
        class_mode='categorical',
        shuffle = shuffle,
        seed = 7)    

    while True:
        X1i = gen1.next()
        X2i = gen2.next()
        yield [X1i[0], X2i[0]], X2i[1]  #Yield both images and their mutual label    

train_generator = two_image_generator(train_aug_datagen, 
                                      train_dir,
                                      batch_size = batch_size,  
                                      shuffle = True)

validation_generator = two_image_generator(validation_datagen, 
                                      validation_dir,
                                      batch_size = batch_size,  
                                      shuffle = True)


history = model.fit_generator(train_generator,
                              steps_per_epoch= NUM_TRAIN //batch_size,
                              epochs=100,
                              validation_data=validation_generator,
                              validation_steps= NUM_TEST //batch_size,
                              verbose=1,
                              use_multiprocessing=True,
#                               workers=14,
                              callbacks=callbacks ) 
Epoch 1/100
Found 5000 images belonging to 2 classes.
Found 5000 images belonging to 2 classes.
Found 52700 images belonging to 2 classes.
Found 52700 images belonging to 2 classes.
340/625 [===============>..............] - ETA: 4:37 - loss: 7.7634 - acc: 0.4926
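
On how two generators over the same directory can stay paired: in the quoted code both generators read the same files, only at two different target sizes, and both use the same seed. As an analogy in plain Python (not Keras internals), two shuffles driven by the same seed over lists of equal length produce the same order, so item i of one list still corresponds to item i of the other. The file names below are made up.

```python
import random

files = [f"img_{i}.png" for i in range(6)]

order_224 = list(files)  # stands in for the 224x224 view of each image
order_299 = list(files)  # stands in for the 299x299 view of each image

# Two independent random generators started from the same seed.
random.Random(7).shuffle(order_224)
random.Random(7).shuffle(order_299)

print(order_224 == order_299)  # True: same seed, same permutation
```

The same reasoning would apply to two different directories only if they hold the same number of files in the same class layout, which is an assumption to verify rather than a given.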

multiple inputs model ==> the model should be defined with the Functional API, and a concatenate layer can be used to merge the inputs. See:
https://keras.io/guides/functional_api/

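nykd iT邦新手 Level 5 ‧ 2022-06-10 01:18:23

According to the official Keras documentation, .fit() can take its data from a generator, and I have checked repeatedly that the two_image_generator() above and the final .fit() should be free of errors. So is the mistake in my multiple inputs CNN?
Below is the multiple inputs CNN I built: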

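import tensorflow as tf
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                                     Flatten, Input, MaxPooling2D, concatenate)
from tensorflow.keras.models import Model

# the first branch operates on the first input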
input_origin = Input(shape = input_shape)
## 1
x = Conv2D(64, (3, 3), strides = 1, padding='same')(input_origin)
x = BatchNormalization(axis = -1)(x)
x = Activation(activation)(x)
x = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(x)
## 2
x = Conv2D(128, (3, 3), strides = 1, padding='same')(x)
x = BatchNormalization(axis = -1)(x)
x = Activation(activation)(x)
x = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(x)
## 3
x = Conv2D(256, (3, 3), strides = 1, padding='same')(x)
x = BatchNormalization(axis = -1)(x)
x = Activation(activation)(x)
x = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(x)
## 4
x = Conv2D(512, (3, 3), strides = 1, padding='same')(x)
x = BatchNormalization(axis = -1)(x)
x = Activation(activation)(x)
x = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(x)
## 5
x = Conv2D(512, (3, 3), strides = 1, padding='same')(x)
x = BatchNormalization(axis = -1)(x)
x = Activation(activation)(x)
x = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(x)

x = Model(inputs = input_origin, outputs = x)
# the second branch operates on the second input
input_entropy = Input(shape = input_shape)
## 1
y = Conv2D(64, (3, 3), strides = 1, padding='same')(input_entropy)
y = BatchNormalization(axis = -1)(y)
y = Activation(activation)(y)
y = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(y)
## 2
y = Conv2D(128, (3, 3), strides = 1, padding='same')(y)
y = BatchNormalization(axis = -1)(y)
y = Activation(activation)(y)
y = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(y)
## 3
y = Conv2D(256, (3, 3), strides = 1, padding='same')(y)
y = BatchNormalization(axis = -1)(y)
y = Activation(activation)(y)
y = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(y)
## 4
y = Conv2D(512, (3, 3), strides = 1, padding='same')(y)
y = BatchNormalization(axis = -1)(y)
y = Activation(activation)(y)
y = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(y)
## 5
y = Conv2D(512, (3, 3), strides = 1, padding='same')(y)
y = BatchNormalization(axis = -1)(y)
y = Activation(activation)(y)
y = MaxPooling2D(pool_size=(3, 3), strides = 2, padding='same')(y)

y = Model(inputs = input_entropy, outputs = y)
# combine the output of the two branches
combined = concatenate([x.output, y.output])
# apply fully connected layers and then a softmax classifier to the
# combined outputs
z = Flatten()(combined)
z = Dense(4096)(z)
z = Dense(4096)(z)
z = Dense(class_num, activation='softmax')(z)
# our model will accept the inputs of the two branches and
# then output class probabilities
model = Model(inputs = [input_origin, input_entropy], outputs=z)

adagrad = tf.keras.optimizers.Adagrad(learning_rate = learning_rate)
model.compile(optimizer = adagrad, loss = 'categorical_crossentropy', metrics = ['accuracy'])

Try running it. If there are no error messages and the accuracy meets expectations, it is OK.

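nykd iT邦新手 Level 5 ‧ 2022-06-10 13:05:19

It runs without error messages, but this turns it into a single input with 5 classes and 300 images. With only one input, it is not the 2 inputs I want (that is, not a multiple inputs setup with 2 inputs of 5 classes and 150 images each).
Is my understanding wrong?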

The model you defined does have 2 inputs.
model = Model(inputs = [input_origin, input_entropy], outputs=z)

nykd iT邦新手 Level 5 ‧ 2022-06-13 09:43:52

Sorry for the late reply.
My program now runs successfully. Thank you for your patient replies and guidance over these past days.

Great!
