iT邦幫忙

2025 iThome Ironman Contest

DAY 12

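Xiao Ming's Automated Field Manager

Yesterday we got to know GitHub, our digital warehouse. Today we look at the GitHub Runner, an automated field manager. Dad used to walk the fields himself to check the crops, water them, and spread fertilizer; with a GitHub Runner, the work of deploying to production can run automatically!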

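What Is a GitHub Runner?

A GitHub Runner is the compute environment that executes GitHub Actions workflows:

  • GitHub-hosted runners: cloud execution environments provided by GitHub
  • Self-hosted runners: execution environments we manage ourselves
  • Tasks they run: automated jobs such as build, test, and deploy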

Runner Type Comparison

Our Production Deployment Architecture

Deployment Flow Design

Production Deployment Workflow

The Complete CD Pipeline

# .github/workflows/deploy-prod.yml
name: Deploy to Production

on:
  push:
    branches: [ main ]
  workflow_dispatch:  # allow manual triggering

env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: trading-bot
  ECS_SERVICE: trading-bot-service
  ECS_CLUSTER: trading-cluster
  CONTAINER_NAME: trading-bot

jobs:
  # Pre-check stage
  pre-checks:
    runs-on: ubuntu-latest
    outputs:
      should-deploy: ${{ steps.check.outputs.should-deploy }}
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
      with:
        fetch-depth: 2
    
    - name: Check if deployment needed
      id: check
      run: |
        # Check whether any deploy-relevant files changed
        CHANGED_FILES=$(git diff --name-only HEAD^ HEAD)
        if echo "$CHANGED_FILES" | grep -qE "(src/|Dockerfile|requirements\.txt)"; then
          echo "should-deploy=true" >> $GITHUB_OUTPUT
        else
          echo "should-deploy=false" >> $GITHUB_OUTPUT
        fi
    
    - name: Check deployment conditions
      run: |
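        # Warn when deploying during trading hours.
        # GitHub-hosted runners report time in UTC, so shift the window to match the exchange's time zone.
        CURRENT_HOUR=$(date +%H)
        if [ "$CURRENT_HOUR" -ge 9 ] && [ "$CURRENT_HOUR" -le 16 ]; then
          echo "⚠️ Trading hours in progress, deploy with caution"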
        fi

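  # Security scan
  security-scan:
    runs-on: ubuntu-latest
    needs: pre-checks
    if: needs.pre-checks.outputs.should-deploy == 'true'
    permissions:
      contents: read          # needed by actions/checkout
      security-events: write  # needed by upload-sarif to publish results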
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
    
    - name: Run Trivy vulnerability scanner
      uses: aquasecurity/trivy-action@master
      with:
        scan-type: 'fs'
        scan-ref: '.'
        format: 'sarif'
        output: 'trivy-results.sarif'
    
    - name: Upload Trivy scan results
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: 'trivy-results.sarif'

  # Build and deploy
  deploy:
    runs-on: ubuntu-latest
    needs: [pre-checks, security-scan]
    if: needs.pre-checks.outputs.should-deploy == 'true'
    environment: production
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
    
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v2
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: ${{ env.AWS_REGION }}
        role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
        role-session-name: GitHubActions-Deploy
    
    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v1
    
    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v4
      with:
        images: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}
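        # The full commit SHA is tagged too, because the task definition step
        # below references the image by its full commit SHA.
        tags: |
          type=ref,event=branch
          type=sha,prefix={{branch}}-
          type=raw,value=latest,enable={{is_default_branch}}
          type=raw,value=${{ github.sha }}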
    
    - name: Build and push Docker image
      uses: docker/build-push-action@v4
      with:
        context: .
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max
    
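    - name: Download task definition
      run: |
        # Look up by task definition family; "trading-bot-task" is the family
        # name the other workflows in this article use.
        aws ecs describe-task-definition \
          --task-definition trading-bot-task \
          --query taskDefinition > task-definition.json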
    
    - name: Update task definition
      id: task-def
      uses: aws-actions/amazon-ecs-render-task-definition@v1
      with:
        task-definition: task-definition.json
        container-name: ${{ env.CONTAINER_NAME }}
        image: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${{ github.sha }}
    
    - name: Deploy to ECS
      uses: aws-actions/amazon-ecs-deploy-task-definition@v1
      with:
        task-definition: ${{ steps.task-def.outputs.task-definition }}
        service: ${{ env.ECS_SERVICE }}
        cluster: ${{ env.ECS_CLUSTER }}
        wait-for-service-stability: true
        wait-for-minutes: 10
    
    - name: Verify deployment
      run: |
        # Give the service a moment to settle
        sleep 30
        
        # Look up the ALB DNS name
        ALB_DNS=$(aws elbv2 describe-load-balancers \
          --names trading-bot-alb \
          --query 'LoadBalancers[0].DNSName' \
          --output text)
        
        # Health check
        for i in {1..10}; do
          if curl -f "http://$ALB_DNS/health"; then
            echo "✅ Deployment verified"
            exit 0
          fi
          echo "Waiting for the service to start... ($i/10)"
          sleep 30
        done
        
        echo "❌ Deployment verification failed"
        exit 1

  # Post-deployment tests
  post-deployment-tests:
    runs-on: ubuntu-latest
    needs: deploy
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
    
    - name: Run integration tests
      run: |
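        # Install project and test dependencies (assumes requirements.txt sits at the repo root)
        pip install -r requirements.txt pytest
        
        # Set test environment variables
        export API_BASE_URL="https://api.trading.example.com"
        export TEST_API_KEY="${{ secrets.TEST_API_KEY }}"
        
        # Run the integration tests
        python -m pytest tests/integration/ -v
    
    - name: Performance test
      run: |
        # A simple performance check
        echo "Running performance test..."
        curl -w "Response time: %{time_total}s\n" \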
             -o /dev/null -s \
             "https://api.trading.example.com/health"

  # Notifications
  notify:
    runs-on: ubuntu-latest
    needs: [deploy, post-deployment-tests]
    if: always()
    
    steps:
    - name: Notify Telegram on success
      if: ${{ needs.deploy.result == 'success' && needs.post-deployment-tests.result == 'success' }}
      run: |
        # $(date) does not expand inside single quotes, so capture it first
        # and splice it into the JSON body.
        NOW=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
        curl -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_BOT_TOKEN }}/sendMessage" \
             -H "Content-Type: application/json" \
             -d '{
               "chat_id": "${{ secrets.TELEGRAM_CHAT_ID }}",
               "text": "🎉 Production deployment succeeded!\n\n📦 Version: ${{ github.sha }}\n🕒 Time: '"$NOW"'\n🔗 Details: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
             }'
    
    - name: Notify Telegram on failure
      if: ${{ needs.deploy.result == 'failure' || needs.post-deployment-tests.result == 'failure' }}
      run: |
        NOW=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
        curl -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_BOT_TOKEN }}/sendMessage" \
             -H "Content-Type: application/json" \
             -d '{
               "chat_id": "${{ secrets.TELEGRAM_CHAT_ID }}",
               "text": "❌ Production deployment failed!\n\n📦 Version: ${{ github.sha }}\n🕒 Time: '"$NOW"'\n🔗 Details: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
             }'
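
The pre-check job at the top of this workflow boils down to two small decisions: did a deploy-relevant file change, and are we inside the trading window? Here is a minimal Python sketch of that logic, handy for trying the pattern locally before trusting it in CI. The function names and the sample file paths are made up for illustration; only the pattern and the 9 to 16 hour window come from the workflow.

```python
import re
from datetime import datetime

# Mirrors the workflow's grep -E pattern, with the dot escaped so it only
# matches a literal "requirements.txt".
DEPLOY_PATTERN = re.compile(r"(src/|Dockerfile|requirements\.txt)")


def should_deploy(changed_files):
    """Return True if any changed path matches the deploy-relevant pattern."""
    return any(DEPLOY_PATTERN.search(path) for path in changed_files)


def in_trading_hours(now, start_hour=9, end_hour=16):
    """Return True if now.hour lies in the inclusive window, like the shell test."""
    return start_hour <= now.hour <= end_hour


if __name__ == "__main__":
    # Hypothetical file lists, for illustration only.
    print(should_deploy(["README.md", "docs/setup.md"]))
    print(should_deploy(["src/strategy.py", "README.md"]))
    print(in_trading_hours(datetime(2025, 1, 6, 12, 30)))
```

Note that the window is inclusive on both ends, so 16:59 still counts as trading time, exactly as `-le 16` does in the shell version.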

Environment Management and Protection

Production Environment Settings

Environment Protection Rules

# Illustrative only: these rules are configured in the GitHub UI under
# Repository Settings > Environments, not read from a YAML file.
environment:
  name: production
  protection_rules:
    required_reviewers:
      - "senior-developer-team"
    wait_timer: 5  # wait 5 minutes before the job starts
    deployment_branch_policy:
      protected_branches: true
      custom_branch_policies: false

Blue-Green Deployment Strategy

Implementing Blue-Green Deployment

# .github/workflows/blue-green-deploy.yml
name: Blue-Green Deployment

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment'
        required: true
        default: 'production'
        type: choice
        options:
        - production
        - staging

jobs:
  deploy:
    runs-on: ubuntu-latest
    
    steps:
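    - name: Checkout code
      uses: actions/checkout@v3
    
    # Without credentials the aws CLI calls below would fail;
    # this mirrors the step used in the other workflows.
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v2
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1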
    
    - name: Determine current environment
      id: current-env
      run: |
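        # Find which color currently receives traffic by reading the listener's
        # default forward action (assumes it forwards to a single target group).
        # Checking whether a "blue" target group merely exists would always
        # answer "blue" once both groups have been created.
        CURRENT_TG_ARN=$(aws elbv2 describe-listeners \
          --listener-arns ${{ secrets.ALB_LISTENER_ARN }} \
          --query 'Listeners[0].DefaultActions[0].TargetGroupArn' \
          --output text)
        
        if echo "$CURRENT_TG_ARN" | grep -q "blue"; then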
          echo "current=blue" >> $GITHUB_OUTPUT
          echo "target=green" >> $GITHUB_OUTPUT
        else
          echo "current=green" >> $GITHUB_OUTPUT
          echo "target=blue" >> $GITHUB_OUTPUT
        fi
    
    - name: Deploy to target environment
      run: |
        TARGET_COLOR=${{ steps.current-env.outputs.target }}
        
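        # Update the ECS service in the target environment.
        # Passing only the family name selects its latest ACTIVE revision;
        # the workflow run number is unrelated to task definition revisions.
        aws ecs update-service \
          --cluster trading-cluster \
          --service trading-bot-$TARGET_COLOR \
          --task-definition trading-bot-task
        
        # Wait for the service to stabilize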
        aws ecs wait services-stable \
          --cluster trading-cluster \
          --services trading-bot-$TARGET_COLOR
    
    - name: Run smoke tests
      run: |
        TARGET_COLOR=${{ steps.current-env.outputs.target }}
        
        # Run smoke tests against the new environment
        python scripts/smoke_tests.py \
          --target-group trading-bot-$TARGET_COLOR \
          --timeout 300
    
    - name: Switch traffic
      run: |
        TARGET_COLOR=${{ steps.current-env.outputs.target }}
        
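        # Target group ARNs end in a generated ID, so resolve the ARN by name
        # (assumes the groups are named trading-bot-blue and trading-bot-green).
        TARGET_GROUP_ARN=$(aws elbv2 describe-target-groups \
          --names trading-bot-$TARGET_COLOR \
          --query 'TargetGroups[0].TargetGroupArn' \
          --output text)
        
        # Switch ALB traffic to the new environment
        aws elbv2 modify-listener \
          --listener-arn ${{ secrets.ALB_LISTENER_ARN }} \
          --default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN
        
        echo "✅ Traffic switched to the $TARGET_COLOR environment"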
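
The heart of the blue-green workflow is deciding which color is live and which one should get the new release. Here is a minimal Python sketch of that decision, assuming the live color can be read from the name segment of a target group ARN and that the groups are named with a `-blue` or `-green` suffix. The helper names and the sample ARN are hypothetical.

```python
def color_of(target_group_arn):
    """Return 'blue' or 'green' based on the target group name inside the ARN."""
    # Expected shape: arn:aws:elasticloadbalancing:<region>:<account>:targetgroup/<name>/<id>
    parts = target_group_arn.split("/")
    if len(parts) != 3 or not parts[0].endswith(":targetgroup"):
        raise ValueError(f"not a target group ARN: {target_group_arn!r}")
    name = parts[1]
    for color in ("blue", "green"):
        if name.endswith("-" + color):
            return color
    raise ValueError(f"target group {name!r} is neither blue nor green")


def next_color(current):
    """Return the idle color that should receive the new release."""
    return {"blue": "green", "green": "blue"}[current]


if __name__ == "__main__":
    # Hypothetical ARN; the account number and ID are placeholders.
    arn = ("arn:aws:elasticloadbalancing:us-east-1:123456789012:"
           "targetgroup/trading-bot-blue/0123456789abcdef")
    live = color_of(arn)
    print(f"live={live} target={next_color(live)}")
```

Raising on an unrecognized name is a deliberate choice here: guessing a color when the input looks wrong could send traffic to the wrong environment.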

Automatic Rollback

Failure Detection and Rollback

# .github/workflows/auto-rollback.yml
name: Auto Rollback

on:
  workflow_run:
    workflows: ["Deploy to Production"]
    types:
      - completed

jobs:
  rollback:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
    
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v2
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1
    
    - name: Get previous stable version
      id: previous-version
      run: |
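        # Read the task definition the service currently uses
        CURRENT_TASK_DEF=$(aws ecs describe-services \
          --cluster trading-cluster \
          --services trading-bot-service \
          --query 'services[0].taskDefinition' \
          --output text)
        
        # Derive the previous revision number.
        # Caveat: this assumes the failed deploy already moved the service to a
        # new revision and that revision N-1 is the last stable one.
        CURRENT_REVISION=$(echo "$CURRENT_TASK_DEF" | grep -o '[0-9]*$')
        PREVIOUS_REVISION=$((CURRENT_REVISION - 1))
        if [ "$PREVIOUS_REVISION" -lt 1 ]; then
          echo "❌ No earlier revision to roll back to"
          exit 1
        fi
        
        echo "previous-revision=$PREVIOUS_REVISION" >> $GITHUB_OUTPUT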
    
    - name: Rollback to previous version
      run: |
        echo "🔄 Rolling back automatically to revision ${{ steps.previous-version.outputs.previous-revision }}"
        
        aws ecs update-service \
          --cluster trading-cluster \
          --service trading-bot-service \
          --task-definition trading-bot-task:${{ steps.previous-version.outputs.previous-revision }}
        
        # Wait for the rollback to finish
        aws ecs wait services-stable \
          --cluster trading-cluster \
          --services trading-bot-service
    
    - name: Verify rollback
      run: |
        # Verify that the rollback succeeded
        sleep 60
        
        ALB_DNS=$(aws elbv2 describe-load-balancers \
          --names trading-bot-alb \
          --query 'LoadBalancers[0].DNSName' \
          --output text)
        
        if curl -f "http://$ALB_DNS/health"; then
          echo "✅ Rollback verified"
        else
          echo "❌ Rollback verification failed"
          exit 1
        fi
    
    - name: Notify rollback
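      run: |
        # $(date) does not expand inside single quotes, so capture it first
        NOW=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
        curl -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_BOT_TOKEN }}/sendMessage" \
             -H "Content-Type: application/json" \
             -d '{
               "chat_id": "${{ secrets.TELEGRAM_CHAT_ID }}",
               "text": "🔄 Automatic rollback executed\n\n📦 Rolled back to revision: ${{ steps.previous-version.outputs.previous-revision }}\n🕒 Time: '"$NOW"'\n❗ Reason: deployment failed"
             }'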
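
The "previous revision" step is just string handling on a task definition identifier, which makes it easy to get subtly wrong in shell. As a rough sketch, the same calculation in Python might look like the following. The function name and the sample ARN are hypothetical, and it carries the same assumption as the workflow: revision N-1 is the one to return to.

```python
def previous_task_definition(task_def):
    """Return '<family>:<revision - 1>' for a task definition ARN or 'family:revision'."""
    # Keep only the part after the last "/", e.g. "trading-bot-task:42".
    family_and_revision = task_def.rsplit("/", 1)[-1]
    family, sep, revision = family_and_revision.rpartition(":")
    if not sep or not family or not revision.isdigit():
        raise ValueError(f"no revision found in {task_def!r}")
    previous = int(revision) - 1
    if previous < 1:
        raise ValueError(f"{task_def!r} has no earlier revision to roll back to")
    return f"{family}:{previous}"


if __name__ == "__main__":
    # Hypothetical ARN; the account number is a placeholder.
    arn = "arn:aws:ecs:us-east-1:123456789012:task-definition/trading-bot-task:42"
    print(previous_task_definition(arn))
```

Refusing to roll back from revision 1 avoids asking ECS for a revision 0 that cannot exist.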

Monitoring and Alerting Integration

Deployment Monitoring

# scripts/deployment_monitor.py
import boto3
import time
import requests
from datetime import datetime, timedelta

class DeploymentMonitor:
    def __init__(self, cluster_name, service_name, alb_dns):
        self.ecs = boto3.client('ecs')
        self.cloudwatch = boto3.client('cloudwatch')
        self.cluster_name = cluster_name
        self.service_name = service_name
        self.alb_dns = alb_dns
    
    def check_service_health(self):
        """Check the health of the ECS service."""
        try:
            response = self.ecs.describe_services(
                cluster=self.cluster_name,
                services=[self.service_name]
            )
            
            service = response['services'][0]
            running_count = service['runningCount']
            desired_count = service['desiredCount']
            
            return running_count == desired_count
        except Exception as e:
            print(f"Service health check failed: {e}")
            return False
    
    def check_endpoint_health(self):
        """Check the health of the application endpoint."""
        try:
            response = requests.get(f"http://{self.alb_dns}/health", timeout=10)
            return response.status_code == 200
        except Exception as e:
            print(f"Endpoint health check failed: {e}")
            return False
    
    def monitor_deployment(self, timeout=600):
        """Monitor the deployment until it is healthy or the timeout elapses."""
        start_time = datetime.now()
        
        while datetime.now() - start_time < timedelta(seconds=timeout):
            service_healthy = self.check_service_health()
            endpoint_healthy = self.check_endpoint_health()
            
            if service_healthy and endpoint_healthy:
                print("✅ Deployment monitoring passed")
                return True
            
            print(f"⏳ Waiting for deployment to finish... service healthy: {service_healthy}, endpoint healthy: {endpoint_healthy}")
            time.sleep(30)
        
        print("❌ Deployment monitoring timed out")
        return False

if __name__ == "__main__":
    import sys
    
    monitor = DeploymentMonitor(
        cluster_name=sys.argv[1],
        service_name=sys.argv[2],
        alb_dns=sys.argv[3]
    )
    
    success = monitor.monitor_deployment()
    sys.exit(0 if success else 1)
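
A polling loop like `monitor_deployment` is awkward to test because it really sleeps. One possible approach, sketched below, is to pull the loop out into a helper that takes its clock and sleep functions as parameters. `wait_until` and `FakeClock` are names invented for this example, not part of the script above.

```python
import time


def wait_until(check, timeout=600.0, interval=30.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while clock() < deadline:
        if check():
            return True
        sleep(interval)
    return False


class FakeClock:
    """Test double: sleep() advances time instantly instead of blocking."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    fake = FakeClock()
    answers = iter([False, False, True])  # healthy on the third poll
    ok = wait_until(lambda: next(answers), timeout=600, interval=30,
                    clock=fake.time, sleep=fake.sleep)
    print(ok, fake.now)
```

With a real monitor, the check could be something like `lambda: monitor.check_service_health() and monitor.check_endpoint_health()`, leaving the default clock and sleep in place.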

Performance Optimization

Runner Performance Tuning

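# Use caching to speed up builds
- name: Cache Docker layers
  uses: actions/cache@v3
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-

# Run multiple jobs in parallel
jobs:
  test:
    strategy:
      matrix:
        # Quote the versions: unquoted 3.10 is parsed as the number 3.1
        python-version: ['3.8', '3.9', '3.10']
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
    - name: Test with Python ${{ matrix.python-version }}
      run: |
        pip install pytest
        python -m pytest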
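
One gotcha with version matrices: a YAML parser reads an unquoted `3.10` as a floating-point number, so the job ends up asking for Python 3.1. The snippet below illustrates the effect, using Python's own `float` as a stand-in for the parser's number handling. That is a simplification, and the function name is made up for this example.

```python
def read_like_yaml_number(plain_scalar):
    """Return how an unquoted scalar ends up when it is resolved as a float."""
    try:
        return str(float(plain_scalar))
    except ValueError:
        # Not numeric, so it stays a string.
        return plain_scalar


if __name__ == "__main__":
    for version in ["3.8", "3.9", "3.10"]:
        print(f"{version} -> {read_like_yaml_number(version)}")
```

Quoting every entry in the matrix, as in `'3.10'`, keeps the versions as strings and sidesteps the problem.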

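Summary

Today we learned how to set up GitHub Runners to deploy to production automatically, which is like fitting the farm with a fully automated management system. The key concepts were:

  1. Automated deployment: full automation from code push to production release
  2. Security checks: scanning and verification before deployment
  3. Environment protection: access control and review gates for production
  4. Blue-green deployment: a deployment strategy without downtime
  5. Automatic rollback: automated recovery when a deployment fails
  6. Monitoring and alerting: real-time monitoring of the deployment process

Tomorrow we will look at GitHub Actions CI/CD configuration in more detail, including more complex workflow designs!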


Next: Day 13 - Github Actions CI/CD

