iT邦幫忙


CNN classification: the results are wrong, but consistently wrong

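This code uses pretrained models such as InceptionV3 and VGG16 to classify dog breeds, and I have run into a strange problem. Accuracy on the training and validation sets is good, above 95%. But when I test with my own images and export the results to Excel, some breeds are answered entirely correctly, while others are answered entirely wrong (I feed in images of one breed and get a different breed back, or one particular wrong breed shows up at a very high rate). I suspect the label order got mixed up during encoding. I get similar results with both InceptionV3 and VGG16, but I cannot find where the problem is. When I print the data from the earlier steps, the order looks correct, so I do not know what went wrong.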

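Every Bloodhound image I upload is misclassified. That alone would not be surprising, but the strange part is that many different Bloodhound images all produce the same result, French Bulldog. In other words, the model is able to recognize the breed.
So far I have tested 15 breeds. Two are correct, and the rest behave the same way: different photos of the same breed are recognized consistently, but with the wrong label.
In other words: if image A should be answered with label B, and I feed in 10 images of A, every one of them comes back as C, not a mix of B, C, and D. The errors are very consistent.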
https://ithelp.ithome.com.tw/upload/images/20231222/20119136sFYJjyDKxO.png

Every Airedale image I upload is classified correctly.
https://ithelp.ithome.com.tw/upload/images/20231222/20119136BVxfR8KgRk.png
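
One possible explanation for this pattern, offered as a hypothesis rather than a confirmed diagnosis: the code below converts the integer codes from `LabelEncoder` to strings with `.astype(str)` before passing them to `flow_from_dataframe`. If the generator assigns output indices by sorting those label strings (an assumption that can be checked by printing `train_generator.class_indices`), then `"10"` sorts before `"2"`, and the model's output index no longer matches the encoder's integer code. The pure-Python sketch below simulates only that ordering; it does not use Keras, and the variable names are illustrative.

```python
# Breed list taken from the post (already in alphabetical order).
breeds = ["Airedale", "Beagle", "Bloodhound", "Bluetick", "Chihuahua", "Collie", "Dingo",
          "French Bulldog", "German Sheperd", "Malinois", "Newfoundland", "Pekinese",
          "Pomeranian", "Pug", "Vizsla"]

# LabelEncoder assigns integer codes following the sorted class names.
encoder_classes = sorted(breeds)

# The post stores those codes as strings: "0", "1", ..., "14".
label_strings = [str(code) for code in range(len(encoder_classes))]

# ASSUMPTION: the generator orders its classes by sorting the label strings,
# so the order is lexicographic ("0", "1", "10", "11", ..., "2", "3", ...).
generator_order = sorted(label_strings)

# For each true breed, work out which breed name would be reported when the
# model's output index is decoded as if it were the encoder's integer code.
reported = {}
for output_index, label_string in enumerate(generator_order):
    true_breed = encoder_classes[int(label_string)]
    reported[true_breed] = encoder_classes[output_index]

for true_breed, shown in reported.items():
    print(f"{true_breed:15s} -> {shown}")

correct = [breed for breed, shown in reported.items() if breed == shown]
print("Breeds decoded correctly:", correct)
```

Under that assumption, only the breeds with codes 0 and 1 are decoded correctly, and Bloodhound images are reported as French Bulldog every time, which matches the behaviour described above (2 of 15 breeds correct, each wrong breed mapped to one fixed wrong answer).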

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Read the label file
labels_df = pd.read_csv("./archive/dogs.csv")
# Selected classes
selected_breeds = ["Airedale", "Beagle", "Bloodhound", "Bluetick", "Chihuahua", "Collie", "Dingo", 
                    "French Bulldog", "German Sheperd", "Malinois", "Newfoundland", "Pekinese", 
                    "Pomeranian", "Pug", "Vizsla"]

# Keep only the selected classes
filtered_labels = labels_df[labels_df['labels'].isin(selected_breeds)]
# Split into training, validation, and test sets
train_df = filtered_labels[filtered_labels['data set']=='train'].copy()
test_df = filtered_labels[filtered_labels['data set']=='test'].copy()
valid_df = filtered_labels[filtered_labels['data set']=='valid'].copy()
# Dataset path
data_dir = "./archive/"

# Data preprocessing
label_encoder = LabelEncoder()
train_df['encoded_labels'] = label_encoder.fit_transform(train_df['labels']).astype(str)
valid_df['encoded_labels'] = label_encoder.transform(valid_df['labels']).astype(str)
# Build full file paths
train_df['filepaths'] = data_dir + train_df['filepaths']
valid_df['filepaths'] = data_dir + valid_df['filepaths']
test_df['filepaths'] = data_dir + test_df['filepaths']
# Load images with ImageDataGenerator (only rescaling is configured here)
datagen = ImageDataGenerator(
    rescale=1./255,
)

train_generator = datagen.flow_from_dataframe(
    train_df,
    x_col='filepaths',
    y_col='encoded_labels',
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'
)

val_generator = datagen.flow_from_dataframe(
    valid_df,
    x_col='filepaths',
    y_col='encoded_labels',
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'
)
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.optimizers import RMSprop,Adam
from sklearn.model_selection import train_test_split
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.applications import InceptionV3



# Use the pretrained VGG16 model
# base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Use the pretrained InceptionV3 model
base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the pretrained model's weights
for layer in base_model.layers:
    layer.trainable = False
# # Decrease the learning rate step by step
# def lr_scheduler(epoch):
#     return 0.0005 * pow(0.9, epoch // 5)
# lr_schedule = LearningRateScheduler(lr_scheduler)
# Build a new model and add fully connected layers
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(15, activation='softmax'))
model.summary()

# Compile the model
model.compile(optimizer = RMSprop(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(train_generator, epochs=25, validation_data=val_generator)
# Predict on the test set
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_dataframe(
    test_df,
    x_col='filepaths',
    target_size=(224, 224),
    batch_size=32,
    class_mode=None,
    shuffle=False
)

predictions = model.predict(test_generator)

# Evaluate performance on the validation set
val_metrics = model.evaluate(val_generator)

# Get the accuracy
val_accuracy = val_metrics[1]

# Print the accuracy
print("Validation Set Accuracy: {:.2f}%".format(val_accuracy * 100))
import os
import pandas as pd
import numpy as np
from tensorflow.keras.preprocessing import image
from PIL import Image

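# Folder path
testing_data_folder = "./test_data3/Bloodhound"

# List of file names
file_names = [f for f in os.listdir(testing_data_folder) if f.endswith('.jpg')]

# Build the DataFrame (the column name '檔名' means "file name")
result_df = pd.DataFrame({'檔名': file_names})

# Build the test data set
test_data = []

# List for collecting the predicted probabilities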
probabilities = []

for file_name in file_names:
    img_path = os.path.join(testing_data_folder, file_name)
    img = Image.open(img_path).convert('RGB')  # force 3 channels so grayscale or CMYK files match the model input
    img = img.resize((224, 224))
    img_array = image.img_to_array(img)
    img_array = img_array / 255.0  # normalize to the [0, 1] range
    test_data.append(img_array)

test_data = np.array(test_data)

# Run the model to get predictions
predictions = model.predict(test_data)

# Class names
# class_names = ["Airedale", "Beagle", "Bloodhound", "Bluetick", "Chihuahua", "Collie", "Dingo",
#                "Bloodhound", "German Sheperd", "Malinois", "Newfoundland", "Pekinese",
#                "Pomeranian", "Pug", "Vizsla"]
# class_names = ["Airedale", "Beagle", "Bloodhound", "Bluetick", "Chihuahua", "Collie", "Dingo",
#                "French Bulldog", "German Sheperd", "Malinois", "Newfoundland", "Pekinese",
#                "Pomeranian", "Pug", "Vizsla"]
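# Print the raw predicted probabilities for each image
print(predictions)

# Map the predictions to class labels
predicted_labels = label_encoder.inverse_transform(predictions.argmax(axis=1))

# Add the predictions to the DataFrame (the column name '結果' means "result")
result_df['結果'] = predicted_labels

# Save the results to an Excel file
result_df.to_excel("test_data.xlsx", index=False)

print("Results were successfully written to test_data.xlsx.")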
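
If the mismatch does come from the generator ordering its classes differently from `label_encoder`, one way to decode predictions is to go through the generator's own `class_indices` mapping instead of calling `label_encoder.inverse_transform` on the argmax directly. The sketch below is a minimal illustration under that assumption. `build_index_to_breed` is a hypothetical helper, not part of Keras or scikit-learn, and the `class_indices` dictionary here is simulated so that the example runs without TensorFlow.

```python
def build_index_to_breed(class_indices, encoder_classes):
    """Map each model output index to a breed name.

    class_indices: dict from label string to output index, in the shape of
        train_generator.class_indices (for example {"0": 0, "1": 1, "10": 2, ...}).
    encoder_classes: breed names in LabelEncoder order, in the shape of
        list(label_encoder.classes_).
    """
    index_to_breed = {}
    for label_string, output_index in class_indices.items():
        # The label string is the encoder's integer code stored as text.
        index_to_breed[output_index] = encoder_classes[int(label_string)]
    return index_to_breed


# Simulated inputs (ASSUMPTION: the generator sorts the label strings).
encoder_classes = ["Airedale", "Beagle", "Bloodhound", "Bluetick", "Chihuahua", "Collie",
                   "Dingo", "French Bulldog", "German Sheperd", "Malinois", "Newfoundland",
                   "Pekinese", "Pomeranian", "Pug", "Vizsla"]
simulated_class_indices = {
    label: index
    for index, label in enumerate(sorted(str(code) for code in range(len(encoder_classes))))
}

index_to_breed = build_index_to_breed(simulated_class_indices, encoder_classes)

# Pretend the model's argmax was 7 for three Bloodhound photos.
predicted_indices = [7, 7, 7]
decoded = [index_to_breed[i] for i in predicted_indices]
print(decoded)
```

With the real objects from the post, the call would look like `build_index_to_breed(train_generator.class_indices, list(label_encoder.classes_))`, followed by a lookup on `predictions.argmax(axis=1)`. An alternative worth trying is to pass the breed names themselves as `y_col='labels'` and invert `train_generator.class_indices`, which avoids the string-encoded integers altogether.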

1 answer

增廣建文
iT邦研究生 Level 5 ‧ 2023-12-24 11:18:59

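My own line of thinking would be:

  1. What proportion of the training and validation sets do Airedale, Bloodhound, and French Bulldog each make up?
  2. Next, what is the validation accuracy for each of those three classes?

If Bloodhound was never accurate to begin with while Airedale was learned well, the situation you are seeing seems quite reasonable.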
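
The two checks suggested above could be sketched roughly as follows. This is a minimal pure-Python illustration on made-up toy labels, not results from the asker's model; `class_proportions` and `per_class_accuracy` are hypothetical helper names introduced only for this example.

```python
from collections import Counter


def class_proportions(labels):
    """Return the fraction of samples that belongs to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def per_class_accuracy(true_labels, predicted_labels):
    """Return, for each true class, the fraction of its samples predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {name: hits[name] / totals[name] for name in totals}


# Toy data for illustration only.
true_labels = ["Airedale", "Airedale", "Bloodhound", "Bloodhound", "French Bulldog"]
predicted_labels = ["Airedale", "Airedale", "French Bulldog", "French Bulldog", "French Bulldog"]

proportions = class_proportions(true_labels)
accuracy = per_class_accuracy(true_labels, predicted_labels)

for name in proportions:
    print(f"{name:15s} share={proportions[name]:.2f} accuracy={accuracy[name]:.2f}")
```

On the real data, the true labels would come from `valid_df['labels']` and the predicted labels from the model's output on the same images in the same order. Note that `val_generator` in the post is created without `shuffle=False`, so a separate non-shuffled generator would probably be needed for the two lists to line up.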
