2025 iThome Ironman Contest

Cloud Native

Robot Sandbox on K8s series, Part 16

Day 16 | Putting Isaac + Selkies in the Same Pod

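Continuing from Day 15 (containerizing Isaac Lab). Today we package Isaac (Sim/Lab) and the Selkies sidecar into a single Pod, managed as a Helm Chart. This post provides a ready-to-use chart skeleton and a sample values.yaml, and covers probes, resources, Ingress, TURN integration, and common parameters.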

A. Chart structure (copy-paste ready)

charts/isaac-selkies/
├─ Chart.yaml
├─ values.yaml
└─ templates/
   ├─ deployment.yaml
   ├─ service.yaml
   ├─ ingress.yaml
   ├─ configmap-env.yaml        # (optional) centralized environment variables
   └─ NOTES.txt

Chart.yaml

apiVersion: v2
name: isaac-selkies
version: 0.1.0
appVersion: "2023.1.1"
description: Isaac (Sim/Lab) + Selkies sidecar in one Pod
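
The templates in the following sections call include "isaac-selkies.name" and include "isaac-selkies.fullname", but the tree above does not list a file that defines them. In most charts these named templates live in templates/_helpers.tpl. The following is a minimal sketch that assumes the conventions generated by helm create; it is not taken from the original post.

```yaml
{{/* templates/_helpers.tpl (sketch, modeled on the helm create defaults) */}}

{{/* Chart name, overridable via .Values.nameOverride */}}
{{- define "isaac-selkies.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{/* Fully qualified name, overridable via .Values.fullnameOverride */}}
{{- define "isaac-selkies.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}
```

With a release named isaac-selkies, this renders the full name as isaac-selkies, which matches the deploy/isaac-selkies target used in the install commands. Names are truncated to 63 characters because some Kubernetes name fields are limited by DNS label length.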

B. Default values.yaml (minimal but usable)

nameOverride: ""
fullnameOverride: ""

namespace: robotics

image:
  isaac:
    repository: nvcr.io/nvidia/isaac-sim
    tag: 2023.1.1
    pullPolicy: IfNotPresent
  selkies:
    repository: gcr.io/selkies-public/selkies-sidecar
    tag: latest
    pullPolicy: IfNotPresent

replicaCount: 1

resources:
  isaac:
    limits:
      nvidia.com/gpu: 1
      cpu: 4
      memory: 16Gi
    requests:
      cpu: 2
      memory: 8Gi
  selkies:
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 512Mi

persistence:
  enabled: true
  assets:
    claimName: pvc-isaac-assets
    mountPath: /isaac-sim/assets
    readOnly: true
  datasets:
    claimName: pvc-isaac-datasets
    mountPath: /isaac-sim/datasets
  cache:
    claimName: pvc-isaac-cache
    mountPath: /isaac-sim/cache
    readOnly: false

# WebRTC / TURN settings (from Day 9)
turn:
  restUrl: https://turn.example.com/turn-cred
  secretName: turn-secret

webrtc:
  codec: h264
  width: 1280
  height: 720
  framerate: 30
  bitrateKbps: 4000

service:
  type: ClusterIP
  port: 8080

ingress:
  enabled: true
  className: nginx
  host: isaac.example.com
  tls:
    enabled: true
    secretName: isaac-tls
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-body-size: 100m

securityContext:
  isaac:
    runAsUser: 2000
    runAsGroup: 2000
    fsGroup: 2000
  selkies:
    runAsUser: 2000
    runAsGroup: 2000

nodeSelector: {}

# Tolerations for GPU nodes (optional)
tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"

affinity: {}

env:
  ACCEPT_EULA: "Y"
  PRIVACY_CONSENT: "Y"
  ISAACSIM_HEADLESS: "true"   # Streaming goes through Selkies; set to false for the built-in visualization
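
The chart tree lists an optional configmap-env.yaml for centralizing environment variables, but its contents are not shown. One way it could look is sketched below, rendering every key of the env map above into a ConfigMap. The "-env" name suffix and the envFrom wiring are assumptions made for illustration.

```yaml
# templates/configmap-env.yaml (sketch)
apiVersion: v1
kind: ConfigMap
metadata:
  # The "-env" suffix is a hypothetical naming choice
  name: {{ include "isaac-selkies.fullname" . }}-env
  namespace: {{ .Values.namespace }}
data:
  {{- range $key, $value := .Values.env }}
  {{ $key }}: {{ $value | quote }}
  {{- end }}

# Fragment for the isaac container in templates/deployment.yaml,
# replacing the per-variable env entries:
#
#         envFrom:
#         - configMapRef:
#             name: {{ include "isaac-selkies.fullname" . }}-env
```

With this layout, adding a new variable only requires a new key under env in values.yaml. Note that a Pod does not restart automatically when a ConfigMap consumed through envFrom changes.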

C. Deployment (templates/deployment.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "isaac-selkies.fullname" . }}
  namespace: {{ .Values.namespace }}
  labels:
    app.kubernetes.io/name: {{ include "isaac-selkies.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "isaac-selkies.name" . }}
      app.kubernetes.io/instance: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "isaac-selkies.name" . }}
        app.kubernetes.io/instance: {{ .Release.Name }}
    spec:
      tolerations: {{- toYaml .Values.tolerations | nindent 8 }}
      nodeSelector: {{- toYaml .Values.nodeSelector | nindent 8 }}
      containers:
      - name: isaac
        image: "{{ .Values.image.isaac.repository }}:{{ .Values.image.isaac.tag }}"
        imagePullPolicy: {{ .Values.image.isaac.pullPolicy }}
        env:
        - name: ACCEPT_EULA
          value: {{ .Values.env.ACCEPT_EULA | quote }}
        - name: PRIVACY_CONSENT
          value: {{ .Values.env.PRIVACY_CONSENT | quote }}
        - name: ISAACSIM_HEADLESS
          value: {{ .Values.env.ISAACSIM_HEADLESS | quote }}
        resources: {{- toYaml .Values.resources.isaac | nindent 10 }}
        volumeMounts:
        {{- if .Values.persistence.enabled }}
        - name: assets
          mountPath: {{ .Values.persistence.assets.mountPath }}
          readOnly: {{ .Values.persistence.assets.readOnly }}
        - name: datasets
          mountPath: {{ .Values.persistence.datasets.mountPath }}
        - name: cache
          mountPath: {{ .Values.persistence.cache.mountPath }}
        {{- end }}
        # Your startup command; swap in runheadless.sh / runapp.sh or whichever script your image ships
        command: ["bash","-lc","./runheadless.sh"]

      - name: selkies
        image: "{{ .Values.image.selkies.repository }}:{{ .Values.image.selkies.tag }}"
        imagePullPolicy: {{ .Values.image.selkies.pullPolicy }}
        ports:
        - containerPort: {{ .Values.service.port }}
        env:
        - name: TURN_REST_URL
          value: {{ .Values.turn.restUrl | quote }}
        - name: DISPLAY
          value: ":0"
        - name: SELKIES_CODEC
          value: {{ .Values.webrtc.codec | quote }}
        - name: SELKIES_WIDTH
          value: {{ .Values.webrtc.width | quote }}
        - name: SELKIES_HEIGHT
          value: {{ .Values.webrtc.height | quote }}
        - name: SELKIES_FPS
          value: {{ .Values.webrtc.framerate | quote }}
        - name: SELKIES_BITRATE_KBPS
          value: {{ .Values.webrtc.bitrateKbps | quote }}
        resources: {{- toYaml .Values.resources.selkies | nindent 10 }}
        readinessProbe:
          httpGet:
            path: /healthz
            port: {{ .Values.service.port }}
          initialDelaySeconds: 10
          periodSeconds: 15
        livenessProbe:
          httpGet:
            path: /healthz
            port: {{ .Values.service.port }}
          initialDelaySeconds: 30
          periodSeconds: 30
        volumeMounts:
        {{- if .Values.persistence.enabled }}
        - name: assets
          mountPath: /assets
          readOnly: true
        - name: datasets
          mountPath: /datasets
        - name: cache
          mountPath: /cache
        {{- end }}

      volumes:
      {{- if .Values.persistence.enabled }}
      - name: assets
        persistentVolumeClaim:
          claimName: {{ .Values.persistence.assets.claimName }}
      - name: datasets
        persistentVolumeClaim:
          claimName: {{ .Values.persistence.datasets.claimName }}
      - name: cache
        persistentVolumeClaim:
          claimName: {{ .Values.persistence.cache.claimName }}
      {{- end }}

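Notes:

  • Two containers in one Pod: Isaac (holds the GPU) + Selkies (handles WebRTC/streaming).
  • Probes: /healthz serves as both the readiness and liveness endpoint; swap in a custom endpoint if needed.
  • Volumes: both containers can read assets/datasets/cache; assets is read-only by default.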
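
values.yaml defines turn.secretName, but the Deployment above only passes TURN_REST_URL to the sidecar. If the sidecar also needs the TURN shared secret, one option is to inject it from the Secret, as sketched below. The variable name TURN_SHARED_SECRET and the Secret key shared-secret are hypothetical; use whatever your sidecar image and the Day 9 Secret actually define.

```yaml
        # Fragment: append to the selkies container's env list in
        # templates/deployment.yaml.
        # TURN_SHARED_SECRET and the key "shared-secret" are assumed names.
        - name: TURN_SHARED_SECRET
          valueFrom:
            secretKeyRef:
              name: {{ .Values.turn.secretName }}
              key: shared-secret
```

Reading the value through secretKeyRef keeps the credential out of values.yaml and out of the rendered manifest.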

D. Service (templates/service.yaml)

apiVersion: v1
kind: Service
metadata:
  name: {{ include "isaac-selkies.fullname" . }}
  namespace: {{ .Values.namespace }}
spec:
  type: {{ .Values.service.type }}
  selector:
    app.kubernetes.io/name: {{ include "isaac-selkies.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
  ports:
  - name: http
    port: {{ .Values.service.port }}
    targetPort: {{ .Values.service.port }}

E. Ingress (templates/ingress.yaml)

{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "isaac-selkies.fullname" . }}
  namespace: {{ .Values.namespace }}
  annotations:
    {{- toYaml .Values.ingress.annotations | nindent 4 }}
spec:
  ingressClassName: {{ .Values.ingress.className }}
  {{- if .Values.ingress.tls.enabled }}
  tls:
  - hosts: [{{ .Values.ingress.host | quote }}]
    secretName: {{ .Values.ingress.tls.secretName }}
  {{- end }}
  rules:
  - host: {{ .Values.ingress.host }}
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: {{ include "isaac-selkies.fullname" . }}
            port:
              number: {{ .Values.service.port }}
{{- end }}

F. Install and test

# Create the namespace
kubectl create ns robotics || true

# Install the chart (from the local folder)
helm upgrade --install isaac-selkies ./charts/isaac-selkies \
  -n robotics \
  --set ingress.host=isaac.example.com \
  --set turn.restUrl=https://turn.example.com/turn-cred

# Verify
kubectl get pods -n robotics -w
kubectl logs -n robotics deploy/isaac-selkies -c selkies | tail -n 50
kubectl exec -n robotics deploy/isaac-selkies -c isaac -- nvidia-smi

Open https://isaac.example.com in a browser.

  • The stream renders, and chrome://webrtc-internals shows a successful CandidatePair.
  • nvidia-smi shows load (NVENC/Compute usage).

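G. Common issues

  • Black screen

    • Check that the Ingress annotations allow WebSocket/long-lived connections;
    • Check that the browser supports the configured codec;
    • Check that TURN is reachable (Day 9).
  • GPU not allocated to Isaac

    • Confirm that resources.limits contains nvidia.com/gpu: 1.
    • Confirm that all GPU Operator components are Ready.
  • High latency

    • Traffic may be going through TURN (relay); deploy TURN servers in multiple regions, close to users (Day 13).
    • Lower the resolution/FPS, or adjust the bitrate to fit the available bandwidth (Day 12).
  • File permission problems (NFS root_squash)

    • Use the same UID/GID for runAsUser/runAsGroup/fsGroup (Day 5).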
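
values.yaml carries a securityContext block, but the Deployment shown earlier does not render it, so the UID/GID alignment described in the last item would not take effect as written. A possible wiring is sketched below. It assumes the value paths from values.yaml and places fsGroup at the Pod level, because Kubernetes accepts fsGroup only in the Pod securityContext, not on individual containers.

```yaml
# Fragment of templates/deployment.yaml (sketch); other fields omitted
    spec:
      # Pod-level context: fsGroup is valid only here
      securityContext:
        fsGroup: {{ .Values.securityContext.isaac.fsGroup }}
      containers:
      - name: isaac
        securityContext:
          runAsUser: {{ .Values.securityContext.isaac.runAsUser }}
          runAsGroup: {{ .Values.securityContext.isaac.runAsGroup }}
      - name: selkies
        securityContext:
          runAsUser: {{ .Values.securityContext.selkies.runAsUser }}
          runAsGroup: {{ .Values.securityContext.selkies.runAsGroup }}
```

fsGroup adds the GID to each container's supplemental groups; whether volume ownership is also changed depends on the volume type, so on NFS exports the directory ownership on the server side may still need to match UID/GID 2000.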

Useful parameters

  • .Values.webrtc.codec: h264 | vp9 | av1
  • .Values.webrtc.framerate: 30 | 60
  • .Values.webrtc.bitrateKbps: adjust to the network (2,000–8,000).
  • .Values.env.ISAACSIM_HEADLESS: true (with Selkies) or false (built-in visualization).
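
These parameters can be grouped into a per-environment override file instead of a long chain of --set flags. The sketch below is for a bandwidth-constrained network; the file name values-lowbandwidth.yaml and the chosen numbers are illustrative assumptions, kept within the ranges listed above.

```yaml
# values-lowbandwidth.yaml (hypothetical file name)
# Only the keys that differ from the chart defaults need to appear here.
webrtc:
  codec: h264        # widely supported by browsers
  width: 1280
  height: 720
  framerate: 30
  bitrateKbps: 2000  # lower end of the 2,000-8,000 range
```

Pass the file with -f values-lowbandwidth.yaml on the helm upgrade --install command; values from the file are merged over the defaults in values.yaml.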

Reflections

Looks like I won't be getting a medal after all QAQ

