[AI Image Processing in 30 Days] [Day 06] Foreground/Background Separation (Part 2): Extracting the Foreground with Segment Anything Model and a Binarized Image

2024 iThome Ironman Contest

DAY 6

AI/ ML & Data

Part 6 of the AI Image Processing in 30 Days series

16th Ironman Contest

twm_pt_dat

2024-09-18 10:31:34

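Yesterday we binarized the scene's depth map to get a rough foreground/background separation. Today we will use Segment Anything Model (SAM for short), an open-source model released by Meta, to separate the foreground from the background precisely.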

[Image: binarized]

Introduction to Segment Anything Model (SAM)

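Segment Anything Model (SAM) is a powerful image segmentation tool developed by the Meta AI Research team. Given an input prompt such as points or a box, it produces high-quality object masks, and it can also generate masks for every object in an image. SAM was trained on a dataset of 11 million images and 1.1 billion masks, and it shows strong zero-shot performance on a variety of segmentation tasks. It works both for interactive segmentation, where you pick the region of interest by hand, and for fully automatic mask generation over the whole image. The model's lightweight mask decoder can also be exported to ONNX format, which makes it convenient to run in environments such as the browser.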

How to use SAM

Installation

pip install git+https://github.com/facebookresearch/segment-anything.git

import

import os
import cv2
import torch
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from IPython.display import display
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

Variable setup & model download

USE_DEPTHMAP = True  # True: use depthmap, False: automatically detect

# Define paths to SAM checkpoints,
# "sam_vit_b" is the base model,
# "sam_vit_l" is the large model,
# "sam_vit_h" is the huge model,
# read README.md in folder /SAM for more information
SAM_CKPT_B = "sam/sam_vit_b_01ec64.pth"
SAM_CKPT_L = "sam/sam_vit_l_0b3195.pth"
SAM_CKPT_H = "sam/sam_vit_h_4b8939.pth"
download_link = {
    SAM_CKPT_B: "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth",
    SAM_CKPT_L: "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth",
    SAM_CKPT_H: "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
}

# Select SAM model type
MODEL_TYPE = "vit_h"
MODEL_WEIGHTS = SAM_CKPT_H
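if not os.path.exists(MODEL_WEIGHTS):
  os.makedirs(os.path.dirname(MODEL_WEIGHTS), exist_ok=True)  # wget -O fails if the "sam/" folder is missing
  !wget {download_link[MODEL_WEIGHTS]} -O {MODEL_WEIGHTS}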
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load SAM model
SAM = sam_model_registry[MODEL_TYPE](checkpoint=MODEL_WEIGHTS)
SAM.to(device=DEVICE)

# Select SAM predictor
PREDICTOR = SamPredictor(SAM) if USE_DEPTHMAP else SamAutomaticMaskGenerator(SAM)

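This is the variable setup and model download step. Each piece works as follows:

USE_DEPTHMAP: a boolean that decides whether the depth map is used for segmentation. When set to True, a SamPredictor is created and the binarized depth map is used to prompt it with a bounding box; when set to False, a SamAutomaticMaskGenerator is created and SAM detects and generates the masks on its own.
SAM_CKPT_B, SAM_CKPT_L, SAM_CKPT_H: these three variables define the checkpoint paths of the three SAM versions. The checkpoint files contain the pre-trained model weights.
download_link: a dictionary that maps each local checkpoint path to its download link, so the right model can be downloaded when needed.
MODEL_TYPE: selects which SAM variant to use (vit_b, vit_l, vit_h).
MODEL_WEIGHTS: the checkpoint path that corresponds to the selected model type. It must match MODEL_TYPE.
if not os.path.exists(MODEL_WEIGHTS): checks whether the selected checkpoint file already exists locally. If it does not, the weights are downloaded with the wget command (the !wget syntax only works in Colab and Jupyter Notebook).
DEVICE: chooses GPU or CPU depending on whether a CUDA device is available. If the system has a usable GPU it is set to cuda, otherwise to cpu.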
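
The `!wget` line only works inside Colab or a Jupyter notebook. As a rough sketch of the same step for a plain Python script, the snippet below keys the checkpoint paths and URLs by model type, so that the model type and the weights file cannot drift apart, and downloads with the standard library. The `CHECKPOINTS` table and the `ensure_checkpoint` helper are names made up for this illustration; they are not part of `segment_anything`.

```python
import os
import urllib.request

# Hypothetical lookup table: model type -> (local path, download URL).
# The paths and URLs are the same ones used in the setup code above.
CHECKPOINTS = {
    "vit_b": ("sam/sam_vit_b_01ec64.pth",
              "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth"),
    "vit_l": ("sam/sam_vit_l_0b3195.pth",
              "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth"),
    "vit_h": ("sam/sam_vit_h_4b8939.pth",
              "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"),
}


def ensure_checkpoint(path: str, url: str) -> str:
    """Download `url` to `path` unless the file already exists; return `path`."""
    if not os.path.exists(path):
        folder = os.path.dirname(path)
        if folder:
            # Create the target folder first, otherwise the download cannot be saved.
            os.makedirs(folder, exist_ok=True)
        urllib.request.urlretrieve(url, path)
    return path
```

With this in place, the weights path could be obtained with `MODEL_WEIGHTS = ensure_checkpoint(*CHECKPOINTS[MODEL_TYPE])`, which downloads only on the first run.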

Getting the background mask with SAM

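def get_result_from_SAM(original_img: Image.Image, input_box: np.ndarray = None) -> np.ndarray:
    # SAM expects an RGB uint8 array of shape (H, W, 3)
    image = np.array(original_img.convert('RGB'))

    if isinstance(PREDICTOR, SamPredictor):
        if input_box is None:
            raise ValueError("input_box is required when PREDICTOR is a SamPredictor!")
        PREDICTOR.set_image(image)
        # box prompt in XYXY format; with multimask_output=False the mask has shape (1, H, W)
        mask, _, _ = PREDICTOR.predict(box=np.asarray(input_box)[None, :], multimask_output=False)

    elif isinstance(PREDICTOR, SamAutomaticMaskGenerator):
        masks = PREDICTOR.generate(image)
        # each entry is a dict; keep the boolean (H, W) 'segmentation' of the largest-area entry
        mask = max(masks, key=lambda x: x['area'])['segmentation']

    else:
        raise TypeError("PREDICTOR should be SamPredictor or SamAutomaticMaskGenerator!")

    return mask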

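This is the key function for getting the segmentation result from SAM. Step by step:

Convert the incoming original_img to RGB, then to a NumPy array for further processing.

Note that original_img is the original photo, not the binarized image.
Return a different mask depending on the type of PREDICTOR (SamPredictor or SamAutomaticMaskGenerator). With the latter, SAM splits the whole image into multiple masks on its own and the function returns the one with the largest area (in most cases that mask is the background). With the former, SAM returns a single mask for the given input_box (a NumPy array describing the box).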
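
To make the difference between the two branches concrete, here is a small sketch that uses hand-made stand-ins instead of a real SAM call. It assumes the output layout of `SamAutomaticMaskGenerator.generate` (a list of dicts holding, among other keys, a boolean `segmentation` array and an `area` count) and a `(1, H, W)` array from `SamPredictor.predict` with `multimask_output=False`. The helpers `largest_mask` and `cut_out_foreground` are hypothetical names for this illustration, and treating the largest mask as the background is the same heuristic as above, not a guarantee.

```python
import numpy as np


def largest_mask(masks: list) -> np.ndarray:
    """Return the boolean (H, W) 'segmentation' of the entry with the largest 'area'."""
    return max(masks, key=lambda m: m["area"])["segmentation"]


def cut_out_foreground(image_rgb: np.ndarray, background_mask: np.ndarray) -> np.ndarray:
    """Return an RGBA copy of image_rgb whose background pixels are fully transparent."""
    # Accept both (H, W) and (1, H, W) masks by reshaping to the image's height and width.
    mask2d = np.asarray(background_mask, dtype=bool).reshape(image_rgb.shape[:2])
    alpha = np.where(mask2d, 0, 255).astype(np.uint8)
    return np.dstack([image_rgb, alpha])


# Toy 4x4 "photo" and two fake masks shaped like the automatic generator's output.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
object_seg = np.zeros((4, 4), dtype=bool)
object_seg[1:3, 1:3] = True           # a 2x2 object in the middle
background_seg = ~object_seg          # everything else (12 pixels)
fake_masks = [
    {"segmentation": object_seg, "area": int(object_seg.sum())},
    {"segmentation": background_seg, "area": int(background_seg.sum())},
]

background = largest_mask(fake_masks)            # the 12-pixel mask, shape (4, 4)
rgba = cut_out_foreground(image, background)     # shape (4, 4, 4)
# A predictor-style (1, H, W) mask gives the same result.
rgba_from_predictor = cut_out_foreground(image, background[None, ...])
```

An array like `rgba` could then be turned back into a picture with `Image.fromarray(rgba)`. If the mask you get covers the foreground instead of the background, invert it with `~mask` before cutting out.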

Where does the `input_box` in the function above come from?

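As mentioned, SAM can take a box as a prompt and find a single mask for it, like this:
[Image: input_box]

To get that box, we now need the binarized black-and-white image from yesterday!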

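def find_min_bounding_box_from_img(image: Image.Image) -> np.ndarray:
    image_array = np.array(image.convert('L'))  # convert Image into a grayscale np.ndarray
    image_array = 255 - image_array  # invert black & white since cv2.findContours finds the white part only
    contours, _ = cv2.findContours(image_array, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:  # an all-white input has nothing to enclose
        raise ValueError("No black pixels found in the binarized image!")
    x, y, w, h = cv2.boundingRect(np.concatenate(contours))  # one box around every contour
    return np.array([x, y, x + w, y + h])  # XYXY format, as expected by SamPredictor.predict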

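With this function we can get, from the black-and-white image, the smallest box that encloses all the black pixels (a NumPy array of four coordinates: x_min, y_min, x_max, y_max). Pass that box to get_result_from_SAM and you get a precise background mask!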
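
As a cross-check on what the box looks like, the sketch below computes a bounding box with NumPy alone on a tiny hand-made image. It assumes a strictly black-and-white input (only 0 and 255) and is meant to mirror the `[x_min, y_min, x_max, y_max]` layout of the function above rather than replace it; `bounding_box_of_black_pixels` is a hypothetical name. For such an input the OpenCV version should give the same box.

```python
import numpy as np


def bounding_box_of_black_pixels(binary: np.ndarray) -> np.ndarray:
    """Smallest box [x0, y0, x1, y1] around all black (0) pixels; x1 and y1 are exclusive."""
    ys, xs = np.nonzero(binary == 0)
    if xs.size == 0:
        raise ValueError("the image contains no black pixels")
    return np.array([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1])


# 6x8 white image with two separate black blobs.
binary = np.full((6, 8), 255, dtype=np.uint8)
binary[1:3, 2:4] = 0   # rows 1-2, columns 2-3
binary[4, 6] = 0       # a single pixel at row 4, column 6

box = bounding_box_of_black_pixels(binary)
print(box)                   # [2 1 7 5]
print(box[None, :].shape)    # (1, 4), the shape passed as `box=` to SamPredictor.predict
```

Note that a single box is drawn around both blobs, which is also what `np.concatenate(contours)` achieves in the OpenCV version.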