[AI Image Processing in 30 Days] [Day 04] A Closer Look at What the Depth Map Generation Package stable-diffusion-webui-depthmap-script Is For

2024 iThome Ironman Contest

DAY 4

AI/ML & Data

AI Image Processing in 30 Days, Part 4

16th Ironman Contest

twm_pt_dat

2024-09-16 10:28:23

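High Resolution Depth Maps for Stable Diffusion WebUI is a Python package that generates high-resolution depth maps. It draws on several depth-estimation models, such as Marigold, MiDaS, ZoeDepth, and LeReS, to produce detailed, realistic depth maps. With multi-resolution merging, it can produce high-resolution depth maps, which further improves the precision of the result.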

Depth map example

screenshot

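Depth maps generated with this package can be used in the following scenarios:

3D reconstruction and virtual reality (VR)

A depth map can be used to infer the three-dimensional structure of a scene from a single image, which can then feed 3D modeling and VR scene generation. This is very useful in game development and film production, where it helps turn flat images into immersive 3D experiences.

Autonomous driving and robot navigation

Depth maps play an important role in autonomous driving and robot navigation. A system can use depth information to estimate the distance to objects ahead, so that it can avoid collisions and plan a suitable path.

AR applications

In augmented reality (AR), a depth map helps an application understand the structure of the real-world scene, so that virtual objects can be projected at suitable positions.

Medical image processing

In medical imaging, depth maps can be used to display the shape or structure of a lesion in 3D, for example when estimating the volume of a tumor, which helps physicians with diagnosis.

Photo post-processing and background replacement

With a depth map, background separation and replacement become more precise, for example blurring the background in post-processing (a depth-of-field effect) or separating a foreground subject from the background.

To keep things concrete, this article uses photo post-processing and background replacement, the most widely applicable of these scenarios, as its running example.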
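
To make the background-replacement idea concrete, here is a minimal NumPy sketch. It is not taken from the package. It assumes the depth map is a single-channel array in which larger values mean closer to the camera (check this for your model and invert the map if it is the other way round), and that the image, the new background, and the depth map all share the same height and width. The names foreground_mask and replace_background and the 0.5 threshold are illustrative choices.

```python
import numpy as np


def foreground_mask(depth: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return a boolean mask that is True where a pixel counts as foreground.

    Assumption: larger depth values mean "closer to the camera".
    """
    d = depth.astype(np.float64)
    span = d.max() - d.min()
    if span == 0:
        # A flat depth map carries no near/far information.
        return np.zeros(d.shape, dtype=bool)
    normalized = (d - d.min()) / span  # rescale to [0, 1]
    return normalized >= threshold


def replace_background(image: np.ndarray, background: np.ndarray,
                       depth: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Keep foreground pixels from `image`; take the rest from `background`."""
    mask = foreground_mask(depth, threshold)[..., np.newaxis]  # shape (H, W, 1)
    return np.where(mask, image, background)
```

A hard threshold like this gives a crisp cut-out. For a depth-of-field effect, one option would be to blend a blurred copy of the image with the original, using the normalized depth as the blending weight.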

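Getting a depth map with the package's built-in GUI

Suppose this is the scene whose background we want to replace.
kitchen

We can generate its depth map with the GUI that ships with the package.
gui

For batch processing or automation, though, the key question is how to do the same thing in code!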

Getting a depth map with the package's own code

First, clone the stable-diffusion-webui-depthmap-script repo with git clone, then cd into it.

git clone https://github.com/thygate/stable-diffusion-webui-depthmap-script.git
cd stable-diffusion-webui-depthmap-script

Then, inside stable-diffusion-webui-depthmap-script, install the required packages with pip.

pip install -q -r requirements.txt

requirements.txt is missing a required package named einops, so remember to install it as well.

pip install -q einops
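
Before moving on, it can help to confirm that everything is actually importable. The following is a small optional check using only the standard library. The helper name missing_packages is made up for this sketch, and the list of import names is an assumption based on the imports used in this article.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used in this article; extend the list as needed.
    missing = missing_packages(["einops", "torch", "numpy", "cv2", "PIL", "matplotlib"])
    print("Missing:", ", ".join(missing) if missing else "none")
```

If einops shows up as missing, the pip command above was probably run in a different Python environment.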

Once installation finishes, import the relevant packages.

# Run this from the repository root so that the `src` package can be imported.
import gc
import cv2
import torch
import numpy as np
from PIL import Image
from src import backbone
import matplotlib.pyplot as plt
from src.depthmap_generation import ModelHolder

Next come the key functions.

def convert_to_i16(arr: np.ndarray) -> np.ndarray:
    # Single channel, 16 bit image. This loses some precision!
    # uint16 conversion uses round-down, therefore values should be [0; 2**16)
    max_val = 2 ** 16
    out = np.clip(arr * max_val + 0.0001, 0, max_val - 0.1)  # -0.1 from above is needed to avoid overflowing
    return out.astype("uint16")


def get_depthmap(original_img: Image.Image, model_holder: ModelHolder) -> Image.Image:
    try:
        # Convert single channel input (PIL) images to rgb
        if original_img.mode == 'I':
            # point() returns a new image instead of modifying in place,
            # so the scaled result has to be assigned back.
            original_img = original_img.point(lambda p: p * 0.0039063096, mode='RGB')
            original_img = original_img.convert('RGB')

        # Round up to a multiple of 32 to avoid potential issues
        net_width = (original_img.width + 31) // 32 * 32
        net_height = (original_img.height + 31) // 32 * 32

        # predict is being done here!
        raw_prediction, raw_prediction_invert = model_holder.get_raw_prediction(original_img, net_width, net_height)

        # output
        if abs(raw_prediction.max() - raw_prediction.min()) > np.finfo("float").eps:
            out = np.copy(raw_prediction)
            if raw_prediction_invert:
                out *= -1
            out = (out - out.min()) / (out.max() - out.min())  # normalize to [0; 1]
        else:
            out = np.zeros(raw_prediction.shape)  # Regretfully, the depthmap is broken and will be replaced with a black image

        img_output = convert_to_i16(out)

    finally:
        # Free memory whether or not the prediction succeeded; errors still propagate
        gc.collect()
        backbone.torch_gc()

    return Image.fromarray(img_output)
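
The rounding step inside get_depthmap is easy to check in isolation. The sketch below pulls the same integer arithmetic into a stand-alone helper; the name round_up_to_multiple is made up for this example and is not part of the package.

```python
def round_up_to_multiple(value: int, base: int = 32) -> int:
    """Round a positive integer up to the nearest multiple of `base`."""
    return (value + base - 1) // base * base


for side in (512, 1080, 1920, 1):
    print(side, "->", round_up_to_multiple(side))
# 512 -> 512
# 1080 -> 1088
# 1920 -> 1920
# 1 -> 32
```

A side that is already a multiple of 32 is left alone, while anything else is bumped up to the next multiple, so a 1920x1080 image is predicted at a network size of 1920x1088.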

convert_to_i16 handles the image format conversion; get_depthmap is the function that really matters.
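
To see what the 16-bit conversion does to normalized depth values, here is a small worked example. The function body repeats the logic of convert_to_i16 so that the snippet runs on its own; the sample values are arbitrary.

```python
import numpy as np


def convert_to_i16(arr: np.ndarray) -> np.ndarray:
    # Same logic as the function above, repeated so this snippet is self-contained.
    max_val = 2 ** 16
    out = np.clip(arr * max_val + 0.0001, 0, max_val - 0.1)
    return out.astype("uint16")


samples = np.array([0.0, 0.25, 0.5, 1.0])
print(convert_to_i16(samples).tolist())  # [0, 16384, 32768, 65535]
```

Without the clip, an input of 1.0 would scale to 65536, one past the largest uint16 value. Clipping to 65535.9 and then truncating keeps it at 65535.
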
Sharp-eyed readers may have noticed that get_depthmap needs an instance of a class named ModelHolder passed in. So how is that class initialized?

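# Use the GPU when CUDA is available; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_holder = ModelHolder()
# Collect the package's option values and pass them to the holder
ops = backbone.gather_ops()
model_holder.update_settings(**ops)
# Load model type 0 onto the chosen device; the third argument is assumed to be
# the boost flag in the repo's ensure_models signature (disabled here)
model_holder.ensure_models(0, device, False)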