iT邦幫忙

2025 iThome Ironman Contest

DAY 23
Build on AWS

Deploying a Studio SaaS Product in 30 Days, Part 23

Day 23: Deploying a SaaS Product to AWS in 30 Days - Production Deployment Strategies and Canary Releases


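Recap

With the CI/CD pipeline and monitoring from Days 21-22 in place, we have the basis for automated deployment. Today we implement production deployment strategies, namely blue-green deployment, canary release, and rolling update, with the goal of zero-downtime deploys and fast rollback.

Deployment Strategies Compared

Let's first look at the characteristics of each strategy and where it fits: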

/**
 * Deployment strategy comparison matrix
 *
 * ┌─────────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
 * │ Strategy        │ Downtime     │ Risk         │ Cost         │ Rollback     │
 * ├─────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
 * │ Recreate        │ 🔴 minutes   │ 🔴 high      │ 🟢 low       │ 🟡 medium    │
 * ├─────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
 * │ Rolling Update  │ 🟢 none      │ 🟡 medium    │ 🟢 low       │ 🔴 slow      │
 * ├─────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
 * │ Blue-Green      │ 🟢 none      │ 🟢 low       │ 🔴 high (2x) │ 🟢 seconds   │
 * ├─────────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
 * │ Canary          │ 🟢 none      │ 🟢 very low  │ 🟡 medium    │ 🟢 seconds   │
 * └─────────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
 *
 * Recommendations:
 * - Development: Recreate (fast and simple)
 * - Staging: Rolling Update (saves cost)
 * - Production: Canary or Blue-Green (lowest risk)
 */

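Canary Deployment

Canary deployment is the safest of the strategies compared here: the new version first receives a small share of traffic and is promoted only if it stays healthy. The name comes from miners who carried canaries to detect toxic gas.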
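
To make the idea concrete, here is a minimal conceptual sketch of splitting users between a stable and a canary version. It is not how ALB or CodeDeploy route traffic (they shift by weight at the load balancer); the names `fnv1a` and `pickVersion` and the choice of hashing a user ID are assumptions for illustration only.

```typescript
// Hypothetical sketch: deterministic canary bucketing by user ID.
type Version = 'stable' | 'canary';

// FNV-1a 32-bit hash: cheap and stable across processes.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash >>> 0;
}

// Map each user to a bucket in 0..99; buckets below canaryPercent see the canary.
function pickVersion(userId: string, canaryPercent: number): Version {
  const bucket = fnv1a(userId) % 100;
  return bucket < canaryPercent ? 'canary' : 'stable';
}

console.log(pickVersion('user-42', 10));
```

Because the bucket depends only on the user ID, a user who lands on the canary at 10% stays on it when the share is raised to 50%, which keeps the experience consistent while the rollout widens.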

1. AWS ECS canary deployment architecture

// infrastructure/lib/canary-deployment-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as sns from 'aws-cdk-lib/aws-sns';
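import * as codedeploy from 'aws-cdk-lib/aws-codedeploy';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';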
import { Construct } from 'constructs';

export interface CanaryDeploymentConfig {
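  // Canary traffic stages
  trafficStages: Array<{
    percentage: number;  // share of traffic (%)
    duration: number;    // duration (minutes)
  }>;

  // Automatic rollback conditions
  rollbackConditions: {
    errorRateThreshold: number;      // error-rate threshold (%)
    latencyThreshold: number;        // latency threshold (ms)
    http5xxThreshold: number;        // 5xx threshold (% of requests)
  };

  // Health check
  healthCheck: {
    interval: number;                // check interval (seconds)
    healthyThreshold: number;        // consecutive successes to become healthy
    unhealthyThreshold: number;      // consecutive failures to become unhealthy
  };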
}

export class CanaryDeploymentStack extends cdk.Stack {
  constructor(
    scope: Construct,
    id: string,
    config: CanaryDeploymentConfig,
    props?: cdk.StackProps
  ) {
    super(scope, id, props);

    // Look up the existing ALB
    const alb = elbv2.ApplicationLoadBalancer.fromLookup(this, 'ALB', {
      loadBalancerArn: process.env.ALB_ARN!,
    });

    // Target group 1 (green): in CodeDeploy terms, the group that receives
    // the replacement (canary) task set
    const targetGroupGreen = new elbv2.ApplicationTargetGroup(this, 'TargetGroupGreen', {
      vpc: alb.vpc,
      port: 3000,
      protocol: elbv2.ApplicationProtocol.HTTP,
      targetType: elbv2.TargetType.IP,
      healthCheck: {
        path: '/health',
        interval: cdk.Duration.seconds(config.healthCheck.interval),
        healthyThresholdCount: config.healthCheck.healthyThreshold,
        unhealthyThresholdCount: config.healthCheck.unhealthyThreshold,
        timeout: cdk.Duration.seconds(5),
      },
      deregistrationDelay: cdk.Duration.seconds(30),
    });

    // Target group 2 (blue): in CodeDeploy terms, the group serving
    // the current stable task set
    const targetGroupBlue = new elbv2.ApplicationTargetGroup(this, 'TargetGroupBlue', {
      vpc: alb.vpc,
      port: 3000,
      protocol: elbv2.ApplicationProtocol.HTTP,
      targetType: elbv2.TargetType.IP,
      healthCheck: {
        path: '/health',
        interval: cdk.Duration.seconds(config.healthCheck.interval),
        healthyThresholdCount: config.healthCheck.healthyThreshold,
        unhealthyThresholdCount: config.healthCheck.unhealthyThreshold,
        timeout: cdk.Duration.seconds(5),
      },
      deregistrationDelay: cdk.Duration.seconds(30),
    });

    // ECS cluster, task definition, and service
    const cluster = ecs.Cluster.fromClusterAttributes(this, 'Cluster', {
      clusterName: 'kyo-production-cluster',
      vpc: alb.vpc,
    });

    const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef', {
      memoryLimitMiB: 512,
      cpu: 256,
    });

    taskDefinition.addContainer('api', {
      image: ecs.ContainerImage.fromRegistry('kyo-api:latest'),
      logging: ecs.LogDrivers.awsLogs({
        streamPrefix: 'kyo-api',
        logRetention: logs.RetentionDays.ONE_WEEK,
      }),
      portMappings: [{ containerPort: 3000 }],
      environment: {
        NODE_ENV: 'production',
      },
      secrets: {
        DATABASE_URL: ecs.Secret.fromSecretsManager(
          secretsmanager.Secret.fromSecretNameV2(this, 'DBSecret', 'kyo/database')
        ),
      },
    });

    const service = new ecs.FargateService(this, 'Service', {
      cluster,
      taskDefinition,
      desiredCount: 3,
      deploymentController: {
        // Let CodeDeploy control deployments
        type: ecs.DeploymentControllerType.CODE_DEPLOY,
      },
      healthCheckGracePeriod: cdk.Duration.seconds(60),
    });

    // CodeDeploy application
    const application = new codedeploy.EcsApplication(this, 'CodeDeployApp', {
      applicationName: 'kyo-api-canary',
    });

    // CodeDeploy deployment group
    const deploymentGroup = new codedeploy.EcsDeploymentGroup(this, 'DeploymentGroup', {
      application,
      service,
      blueGreenDeploymentConfig: {
        blueTargetGroup: targetGroupBlue,
        greenTargetGroup: targetGroupGreen,
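        // NOTE: `listeners` is only populated for load balancers defined in this app;
        // for a looked-up ALB, import the production listener instead
        // (for example with elbv2.ApplicationListener.fromLookup).
        listener: alb.listeners[0],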
        testListener: alb.addListener('TestListener', {
          port: 8080,
          protocol: elbv2.ApplicationProtocol.HTTP,
          defaultTargetGroups: [targetGroupBlue],
        }),
      },

      // Canary deployment configuration
      deploymentConfig: this.createCanaryDeploymentConfig(config),

      // Automatic rollback configuration
      autoRollback: {
        failedDeployment: true,
        stoppedDeployment: true,
        deploymentInAlarm: true,
      },

      // CloudWatch alarms that trigger a rollback
      alarms: this.createDeploymentAlarms(config),
    });
  }

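  /**
   * Build the canary deployment configuration.
   */
  private createCanaryDeploymentConfig(
    config: CanaryDeploymentConfig
  ): codedeploy.IEcsDeploymentConfig {
    // CodeDeploy's time-based canary shifts traffic in two steps:
    // 10% for 5 minutes, then the remaining 90%.
    // Multi-stage schedules such as 10% → 50% → 100% (config.trafficStages)
    // cannot be expressed here and have to be approximated or orchestrated separately.
    return codedeploy.EcsDeploymentConfig.fromEcsDeploymentConfigName(
      this,
      'CanaryConfig',
      // Predefined configuration; swap in a custom configuration name if needed
      'CodeDeployDefault.ECSCanary10Percent5Minutes'
    );

    // Example of a custom configuration (created with the AWS CLI or SDK):
    /*
    {
      "deploymentConfigName": "KyoCanaryConfig",
      "trafficRoutingConfig": {
        "type": "TimeBasedCanary",
        "timeBasedCanary": {
          "canaryPercentage": 10,
          "canaryInterval": 5
        }
      }
    }
    */
  }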

  /**
   * Build the alarms that guard the deployment.
   */
  private createDeploymentAlarms(
    config: CanaryDeploymentConfig
  ): cloudwatch.IAlarm[] {
    const alarms: cloudwatch.IAlarm[] = [];

    // 1. Error-rate alarm
    alarms.push(
      new cloudwatch.Alarm(this, 'CanaryErrorRateAlarm', {
        alarmName: 'Kyo-Canary-High-Error-Rate',
        metric: new cloudwatch.Metric({
          namespace: 'Kyo/API',
          metricName: 'ErrorRate',
          statistic: 'Average',
          period: cdk.Duration.minutes(1),
        }),
        threshold: config.rollbackConditions.errorRateThreshold,
        evaluationPeriods: 2,
        datapointsToAlarm: 2,
        treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
      })
    );

    // 2. Latency alarm
    alarms.push(
      new cloudwatch.Alarm(this, 'CanaryLatencyAlarm', {
        alarmName: 'Kyo-Canary-High-Latency',
        metric: new cloudwatch.Metric({
          namespace: 'Kyo/API',
          metricName: 'APILatency',
          statistic: 'p99',
          period: cdk.Duration.minutes(1),
        }),
        threshold: config.rollbackConditions.latencyThreshold,
        evaluationPeriods: 2,
      })
    );

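    // 3. HTTP 5xx rate alarm (percentage of requests).
    // NOTE: AWS/ApplicationELB metrics are published per load balancer, so real usage
    // needs a dimensionsMap (e.g. LoadBalancer) on m1 and m2; without it no data matches.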
    alarms.push(
      new cloudwatch.Alarm(this, 'Canary5xxAlarm', {
        alarmName: 'Kyo-Canary-HTTP-5xx',
        metric: new cloudwatch.MathExpression({
          expression: '(m1 / m2) * 100',
          usingMetrics: {
            m1: new cloudwatch.Metric({
              namespace: 'AWS/ApplicationELB',
              metricName: 'HTTPCode_Target_5XX_Count',
              statistic: 'Sum',
            }),
            m2: new cloudwatch.Metric({
              namespace: 'AWS/ApplicationELB',
              metricName: 'RequestCount',
              statistic: 'Sum',
            }),
          },
        }),
        threshold: config.rollbackConditions.http5xxThreshold,
        evaluationPeriods: 2,
      })
    );

    return alarms;
  }
}
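
The difference between CodeDeploy's canary and linear traffic-routing types is easier to see as a function of elapsed time. The following is a minimal sketch, not CodeDeploy's implementation: the `TrafficRouting` type and `trafficPercentAt` function are hypothetical names, and it assumes the first increment is applied at the moment the shift starts.

```typescript
// Hypothetical model of the two time-based traffic-routing types.
type TrafficRouting =
  | { type: 'TimeBasedCanary'; canaryPercentage: number; canaryIntervalMinutes: number }
  | { type: 'TimeBasedLinear'; linearPercentage: number; linearIntervalMinutes: number };

// Share of traffic (0..100) on the new task set after elapsedMinutes.
function trafficPercentAt(routing: TrafficRouting, elapsedMinutes: number): number {
  if (routing.type === 'TimeBasedCanary') {
    // Two steps only: the canary share, then everything.
    return elapsedMinutes < routing.canaryIntervalMinutes ? routing.canaryPercentage : 100;
  }
  // Linear: one equal increment per interval (assumed to start immediately), capped at 100.
  const steps = Math.floor(elapsedMinutes / routing.linearIntervalMinutes) + 1;
  return Math.min(100, steps * routing.linearPercentage);
}

const canary: TrafficRouting = {
  type: 'TimeBasedCanary',
  canaryPercentage: 10,
  canaryIntervalMinutes: 5,
};
console.log(trafficPercentAt(canary, 2), trafficPercentAt(canary, 5));
```

Under this model, a 10% / 5 minute canary never passes through an intermediate 50% stage, which is why a multi-stage schedule needs a linear configuration or separate orchestration.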

2. Canary deployment script

// scripts/canary-deploy.ts
import {
  CodeDeployClient,
  CreateDeploymentCommand,
  GetDeploymentCommand,
  StopDeploymentCommand,
} from '@aws-sdk/client-codedeploy';
import {
  CloudWatchClient,
  GetMetricStatisticsCommand,
} from '@aws-sdk/client-cloudwatch';

interface CanaryDeploymentOptions {
  applicationName: string;
  deploymentGroupName: string;
  revision: {
    taskDefinitionArn: string;
    containerName: string;
    containerPort: number;
    imageUri: string;
  };
  canaryConfig: {
    stages: Array<{
      percentage: number;
      durationMinutes: number;
    }>;
  };
  autoRollback: {
    enabled: boolean;
    errorRateThreshold: number;
    latencyThreshold: number;
  };
}

class CanaryDeploymentManager {
  private readonly codeDeploy: CodeDeployClient;
  private readonly cloudWatch: CloudWatchClient;

  constructor() {
    this.codeDeploy = new CodeDeployClient({ region: process.env.AWS_REGION });
    this.cloudWatch = new CloudWatchClient({ region: process.env.AWS_REGION });
  }

  /**
   * Run a canary deployment.
   */
  async deploy(options: CanaryDeploymentOptions): Promise<string> {
    console.log('🚀 Starting canary deployment...\n');

    // Create the deployment
    const deploymentId = await this.createDeployment(options);

    console.log(`📝 Deployment ID: ${deploymentId}\n`);

    // Monitor deployment progress
    await this.monitorDeployment(deploymentId, options);

    return deploymentId;
  }

  /**
   * Create the CodeDeploy deployment.
   */
  private async createDeployment(
    options: CanaryDeploymentOptions
  ): Promise<string> {
    const command = new CreateDeploymentCommand({
      applicationName: options.applicationName,
      deploymentGroupName: options.deploymentGroupName,
      revision: {
        revisionType: 'AppSpecContent',
        appSpecContent: {
          content: JSON.stringify({
            version: '0.0',
            Resources: [
              {
                TargetService: {
                  Type: 'AWS::ECS::Service',
                  Properties: {
                    TaskDefinition: options.revision.taskDefinitionArn,
                    LoadBalancerInfo: {
                      ContainerName: options.revision.containerName,
                      ContainerPort: options.revision.containerPort,
                    },
                  },
                },
              },
            ],
          }),
        },
      },
      autoRollbackConfiguration: {
        enabled: options.autoRollback.enabled,
        events: ['DEPLOYMENT_FAILURE', 'DEPLOYMENT_STOP_ON_ALARM'],
      },
    });

    const response = await this.codeDeploy.send(command);

    if (!response.deploymentId) {
      throw new Error('Failed to create deployment');
    }

    return response.deploymentId;
  }

  /**
   * Monitor deployment progress.
   */
  private async monitorDeployment(
    deploymentId: string,
    options: CanaryDeploymentOptions
  ): Promise<void> {
    const stages = options.canaryConfig.stages;
    let currentStageIndex = 0;
    let hasIssues = false;

    while (currentStageIndex < stages.length) {
      const stage = stages[currentStageIndex];

      console.log(`\n🎯 Stage ${currentStageIndex + 1}/${stages.length}: ${stage.percentage}% traffic`);
      console.log(`   Duration: ${stage.durationMinutes} minutes\n`);

      // Wait for the stage to finish
      const startTime = Date.now();
      const stageEndTime = startTime + stage.durationMinutes * 60 * 1000;

      while (Date.now() < stageEndTime) {
        // Check the deployment status
        const deployment = await this.getDeploymentStatus(deploymentId);

        if (deployment.status === 'Failed' || deployment.status === 'Stopped') {
          console.error(`\n❌ Deployment ${deployment.status.toLowerCase()}!`);
          throw new Error(`Deployment ${deployment.status}`);
        }

        // Collect metrics
        const metrics = await this.collectMetrics();

        // Display the current state
        this.displayMetrics(metrics, stage.percentage);

        // Decide whether to roll back
        if (this.shouldRollback(metrics, options.autoRollback)) {
          console.error('\n🚨 Metrics exceeded threshold, triggering rollback...');
          await this.rollback(deploymentId);
          throw new Error('Automatic rollback triggered');
        }

        // Wait 30 seconds before checking again
        await this.delay(30000);
      }

      console.log(`\n✅ Stage ${currentStageIndex + 1} completed successfully`);
      currentStageIndex++;
    }

    console.log('\n🎉 Canary deployment completed successfully!\n');
  }

  /**
   * Get the deployment status.
   */
  private async getDeploymentStatus(deploymentId: string): Promise<{
    status: string;
    errorInfo?: string;
  }> {
    const command = new GetDeploymentCommand({ deploymentId });
    const response = await this.codeDeploy.send(command);

    return {
      status: response.deploymentInfo?.status || 'Unknown',
      errorInfo: response.deploymentInfo?.errorInformation?.message,
    };
  }

  /**
   * Collect monitoring metrics.
   */
  private async collectMetrics(): Promise<{
    errorRate: number;
    latencyP99: number;
    requestCount: number;
    http5xxCount: number;
  }> {
    const endTime = new Date();
    const startTime = new Date(endTime.getTime() - 5 * 60 * 1000); // last 5 minutes

    // Error rate
    const errorRateMetric = await this.cloudWatch.send(
      new GetMetricStatisticsCommand({
        Namespace: 'Kyo/API',
        MetricName: 'ErrorRate',
        StartTime: startTime,
        EndTime: endTime,
        Period: 60,
        Statistics: ['Average'],
      })
    );

    // Latency (percentiles must be requested via ExtendedStatistics, not Statistics,
    // and the two parameters cannot be combined in one call)
    const latencyMetric = await this.cloudWatch.send(
      new GetMetricStatisticsCommand({
        Namespace: 'Kyo/API',
        MetricName: 'APILatency',
        StartTime: startTime,
        EndTime: endTime,
        Period: 60,
        ExtendedStatistics: ['p99'],
      })
    );

    // Request count
    const requestMetric = await this.cloudWatch.send(
      new GetMetricStatisticsCommand({
        Namespace: 'AWS/ApplicationELB',
        MetricName: 'RequestCount',
        StartTime: startTime,
        EndTime: endTime,
        Period: 60,
        Statistics: ['Sum'],
      })
    );

    // 5xx error count
    const http5xxMetric = await this.cloudWatch.send(
      new GetMetricStatisticsCommand({
        Namespace: 'AWS/ApplicationELB',
        MetricName: 'HTTPCode_Target_5XX_Count',
        StartTime: startTime,
        EndTime: endTime,
        Period: 60,
        Statistics: ['Sum'],
      })
    );

    return {
      errorRate: this.getLatestDatapoint(errorRateMetric.Datapoints, 'Average') || 0,
      latencyP99: this.getLatestDatapoint(latencyMetric.Datapoints, 'p99') || 0,
      requestCount: this.sumDatapoints(requestMetric.Datapoints, 'Sum'),
      http5xxCount: this.sumDatapoints(http5xxMetric.Datapoints, 'Sum'),
    };
  }

  /**
   * Display metrics.
   */
  private displayMetrics(
    metrics: {
      errorRate: number;
      latencyP99: number;
      requestCount: number;
      http5xxCount: number;
    },
    trafficPercentage: number
  ): void {
    const errorRateColor = metrics.errorRate > 5 ? '\x1b[31m' : '\x1b[32m';
    const latencyColor = metrics.latencyP99 > 2000 ? '\x1b[31m' : '\x1b[32m';
    const reset = '\x1b[0m';

    console.log(`   📊 Current Metrics (${trafficPercentage}% traffic):`);
    console.log(`      Error Rate:  ${errorRateColor}${metrics.errorRate.toFixed(2)}%${reset}`);
    console.log(`      Latency P99: ${latencyColor}${metrics.latencyP99.toFixed(0)}ms${reset}`);
    console.log(`      Requests:    ${metrics.requestCount}`);
    console.log(`      5xx Errors:  ${metrics.http5xxCount}`);
  }

  /**
   * Decide whether a rollback is needed.
   */
  private shouldRollback(
    metrics: {
      errorRate: number;
      latencyP99: number;
      http5xxCount: number;
    },
    config: {
      errorRateThreshold: number;
      latencyThreshold: number;
    }
  ): boolean {
    return (
      metrics.errorRate > config.errorRateThreshold ||
      metrics.latencyP99 > config.latencyThreshold ||
      metrics.http5xxCount > 10 // hard-coded threshold; should be configurable
    );
  }

  /**
   * Roll back the deployment.
   */
  private async rollback(deploymentId: string): Promise<void> {
    console.log('\n⏪ Rolling back deployment...');

    const command = new StopDeploymentCommand({
      deploymentId,
      autoRollbackEnabled: true,
    });

    await this.codeDeploy.send(command);

    console.log('✅ Rollback initiated');
  }

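  private getLatestDatapoint(
    datapoints: any[] | undefined,
    stat: string
  ): number | undefined {
    if (!datapoints || datapoints.length === 0) return undefined;

    // Sort a copy so the caller's array is not mutated
    const sorted = [...datapoints].sort(
      (a, b) => b.Timestamp.getTime() - a.Timestamp.getTime()
    );

    // Percentiles such as p99 are returned under ExtendedStatistics
    return sorted[0][stat] ?? sorted[0].ExtendedStatistics?.[stat];
  }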

  private sumDatapoints(
    datapoints: any[] | undefined,
    stat: string
  ): number {
    if (!datapoints || datapoints.length === 0) return 0;

    return datapoints.reduce((sum, dp) => sum + (dp[stat] || 0), 0);
  }

  private delay(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Run the deployment
const manager = new CanaryDeploymentManager();

manager
  .deploy({
    applicationName: 'kyo-api-canary',
    deploymentGroupName: 'production',
    revision: {
      taskDefinitionArn: process.env.TASK_DEFINITION_ARN!,
      containerName: 'api',
      containerPort: 3000,
      imageUri: process.env.IMAGE_URI!,
    },
    canaryConfig: {
      stages: [
        { percentage: 10, durationMinutes: 5 },
        { percentage: 50, durationMinutes: 10 },
        { percentage: 100, durationMinutes: 0 },
      ],
    },
    autoRollback: {
      enabled: true,
      errorRateThreshold: 5, // 5% error rate
      latencyThreshold: 2000, // 2000ms
    },
  })
  .then(deploymentId => {
    console.log(`\n✅ Deployment completed: ${deploymentId}`);
    process.exit(0);
  })
  .catch(error => {
    console.error(`\n❌ Deployment failed: ${error.message}`);
    process.exit(1);
  });
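
The script above rolls back on a single bad sample, whereas the CloudWatch alarms in the stack require two breaching datapoints out of two evaluation periods. A minimal sketch of applying the same M-of-N rule inside the script could look like this; `shouldAlarm` and its parameters are hypothetical names, and treating an incomplete window as not breaching is an assumption.

```typescript
// Hypothetical M-of-N evaluation over the most recent datapoints (oldest first).
function shouldAlarm(
  datapoints: number[],
  threshold: number,
  evaluationPeriods: number,
  datapointsToAlarm: number
): boolean {
  // Look only at the last N periods.
  const window = datapoints.slice(-evaluationPeriods);

  // Assumption: an incomplete window counts as "not breaching".
  if (window.length < evaluationPeriods) return false;

  // Alarm when at least M of those N datapoints exceed the threshold.
  const breaches = window.filter(value => value > threshold).length;
  return breaches >= datapointsToAlarm;
}

// Error-rate samples in percent, one per minute.
console.log(shouldAlarm([1, 7, 8], 5, 2, 2)); // two consecutive breaches
console.log(shouldAlarm([1, 9, 2], 5, 2, 2)); // a single spike that recovered
```

Requiring consecutive breaches trades a slightly slower rollback for fewer false alarms when traffic to the canary is low and a few failed requests can swing the rate.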

Blue-Green Deployment

Blue-green deployment maintains two identical production environments; switching traffic only requires updating the routing rule.

// scripts/blue-green-deploy.ts
import {
  ECSClient,
  UpdateServiceCommand,
  DescribeServicesCommand,
} from '@aws-sdk/client-ecs';
import {
  ElasticLoadBalancingV2Client,
  DescribeListenersCommand,
  ModifyListenerCommand,
  DescribeTargetHealthCommand,
} from '@aws-sdk/client-elastic-load-balancing-v2';

class BlueGreenDeployment {
  private readonly ecs: ECSClient;
  private readonly elbv2: ElasticLoadBalancingV2Client;

  constructor() {
    this.ecs = new ECSClient({ region: process.env.AWS_REGION });
    this.elbv2 = new ElasticLoadBalancingV2Client({ region: process.env.AWS_REGION });
  }

  async deploy(config: {
    cluster: string;
    blueService: string;
    greenService: string;
    listenerArn: string;
    blueTargetGroupArn: string;
    greenTargetGroupArn: string;
    newTaskDefinition: string;
  }): Promise<void> {
    console.log('🔵 Starting Blue-Green Deployment...\n');

    // 1. Determine the currently active environment (blue or green)
    const activeEnv = await this.getActiveEnvironment(
      config.listenerArn,
      config.blueTargetGroupArn,
      config.greenTargetGroupArn
    );

    console.log(`Current active environment: ${activeEnv === 'blue' ? '🔵 Blue' : '🟢 Green'}\n`);

    const inactiveEnv = activeEnv === 'blue' ? 'green' : 'blue';
    const inactiveService = inactiveEnv === 'blue' ? config.blueService : config.greenService;
    const inactiveTargetGroup =
      inactiveEnv === 'blue' ? config.blueTargetGroupArn : config.greenTargetGroupArn;

    // 2. Deploy the new version to the inactive environment
    console.log(`Deploying new version to ${inactiveEnv === 'blue' ? '🔵' : '🟢'} ${inactiveEnv} environment...`);

    await this.ecs.send(
      new UpdateServiceCommand({
        cluster: config.cluster,
        service: inactiveService,
        taskDefinition: config.newTaskDefinition,
        forceNewDeployment: true,
      })
    );

    // 3. Wait for the deployment to stabilize
    console.log('Waiting for deployment to stabilize...');
    await this.waitForServiceStable(config.cluster, inactiveService);

    // 4. Health check
    console.log('Performing health checks...');
    const isHealthy = await this.checkTargetHealth(inactiveTargetGroup);

    if (!isHealthy) {
      throw new Error('Health check failed on inactive environment');
    }

    // 5. Switch traffic
    console.log(`\n🔄 Switching traffic to ${inactiveEnv} environment...`);

    await this.elbv2.send(
      new ModifyListenerCommand({
        ListenerArn: config.listenerArn,
        DefaultActions: [
          {
            Type: 'forward',
            TargetGroupArn: inactiveTargetGroup,
          },
        ],
      })
    );

    console.log('✅ Traffic switched successfully!');

    // 6. Verify the new environment
    console.log('\nVerifying new environment...');
    await this.delay(30000); // wait 30 seconds

    const metricsOk = await this.verifyMetrics();

    if (!metricsOk) {
      console.error('❌ Metrics check failed, rolling back...');
      await this.rollback(config.listenerArn, activeEnv === 'blue' ? config.blueTargetGroupArn : config.greenTargetGroupArn);
      throw new Error('Rollback triggered due to metrics failure');
    }

    console.log('\n🎉 Blue-Green deployment completed successfully!');
    console.log(`\nNew active environment: ${inactiveEnv === 'blue' ? '🔵 Blue' : '🟢 Green'}`);
    console.log(`Old environment (${activeEnv}) is now standby for quick rollback if needed\n`);
  }

  /**
   * Get the currently active environment.
   */
  private async getActiveEnvironment(
    listenerArn: string,
    blueTargetGroupArn: string,
    greenTargetGroupArn: string
  ): Promise<'blue' | 'green'> {
    const command = new DescribeListenersCommand({
      ListenerArns: [listenerArn],
    });

    const response = await this.elbv2.send(command);
    const currentTargetGroup = response.Listeners?.[0]?.DefaultActions?.[0]?.TargetGroupArn;

    return currentTargetGroup === blueTargetGroupArn ? 'blue' : 'green';
  }

  /**
   * Wait for the ECS service to stabilize.
   */
  private async waitForServiceStable(
    cluster: string,
    service: string,
    maxAttempts: number = 60
  ): Promise<void> {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      const command = new DescribeServicesCommand({
        cluster,
        services: [service],
      });

      const response = await this.ecs.send(command);
      const serviceData = response.services?.[0];

      if (!serviceData) {
        throw new Error('Service not found');
      }

      const runningCount = serviceData.runningCount || 0;
      const desiredCount = serviceData.desiredCount || 0;
      const deployments = serviceData.deployments || [];

      // Stable once only the PRIMARY deployment remains
      const hasSingleDeployment = deployments.length === 1 && deployments[0].status === 'PRIMARY';

      if (runningCount === desiredCount && hasSingleDeployment) {
        console.log(`✅ Service stable (${runningCount}/${desiredCount} tasks running)`);
        return;
      }

      console.log(`   ${runningCount}/${desiredCount} tasks running (attempt ${attempt + 1}/${maxAttempts})`);
      await this.delay(10000); // wait 10 seconds
    }

    throw new Error('Timeout waiting for service to stabilize');
  }

  /**
   * Check the health of a target group.
   */
  private async checkTargetHealth(targetGroupArn: string): Promise<boolean> {
    const command = new DescribeTargetHealthCommand({
      TargetGroupArn: targetGroupArn,
    });

    const response = await this.elbv2.send(command);
    const targets = response.TargetHealthDescriptions || [];

    const healthyCount = targets.filter(t => t.TargetHealth?.State === 'healthy').length;
    const totalCount = targets.length;

    console.log(`   Healthy targets: ${healthyCount}/${totalCount}`);

    return healthyCount === totalCount && totalCount > 0;
  }

  /**
   * Verify metrics.
   */
  private async verifyMetrics(): Promise<boolean> {
    // Placeholder: always returns true. A real check should query CloudWatch,
    // as in CanaryDeploymentManager.collectMetrics() above.

    console.log('   Checking error rates...');
    await this.delay(5000);

    console.log('   Checking latency...');
    await this.delay(5000);

    console.log('   Checking throughput...');
    await this.delay(5000);

    console.log('✅ All metrics within acceptable range');
    return true;
  }

  /**
   * Roll back to the previous environment.
   */
  private async rollback(listenerArn: string, oldTargetGroupArn: string): Promise<void> {
    console.log('⏪ Rolling back to previous environment...');

    await this.elbv2.send(
      new ModifyListenerCommand({
        ListenerArn: listenerArn,
        DefaultActions: [
          {
            Type: 'forward',
            TargetGroupArn: oldTargetGroupArn,
          },
        ],
      })
    );

    console.log('✅ Rollback completed');
  }

  private delay(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
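
The core of the class above is a small state machine: find the active environment, point the listener at the other one, and remember where to go back. The sketch below isolates that logic with an in-memory stand-in for the listener so it can be exercised without AWS; `TrafficSwitch`, `FakeListener`, and the ARN strings are hypothetical.

```typescript
type Env = 'blue' | 'green';

// In-memory stand-in for an ALB listener's default forward action.
interface FakeListener {
  targetGroupArn: string;
}

class TrafficSwitch {
  private listener: FakeListener;
  private arns: Record<Env, string>;

  constructor(listener: FakeListener, arns: Record<Env, string>) {
    this.listener = listener;
    this.arns = arns;
  }

  // The active environment is whichever target group the listener forwards to.
  active(): Env {
    return this.listener.targetGroupArn === this.arns.blue ? 'blue' : 'green';
  }

  inactive(): Env {
    return this.active() === 'blue' ? 'green' : 'blue';
  }

  // Point the listener at the inactive environment; return the old one for rollback.
  cutOver(): Env {
    const previous = this.active();
    this.listener.targetGroupArn = this.arns[this.inactive()];
    return previous;
  }

  rollbackTo(env: Env): void {
    this.listener.targetGroupArn = this.arns[env];
  }
}

const listener: FakeListener = { targetGroupArn: 'tg-blue' };
const sw = new TrafficSwitch(listener, { blue: 'tg-blue', green: 'tg-green' });
const previous = sw.cutOver();
console.log(sw.active(), 'is live; rollback target is', previous);
```

Keeping the previous environment as the return value of the cut-over makes the rollback a single assignment, which is what gives blue-green its fast rollback.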

Rolling Update

Rolling update is the most common and lowest-cost deployment strategy:

# .github/workflows/rolling-update.yml
name: Rolling Update Deployment

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      # ... build steps omitted ...

      - name: Deploy with Rolling Update
        run: |
          aws ecs update-service \
            --cluster kyo-production \
            --service kyo-api \
            --task-definition kyo-api:${{ github.sha }} \
            --deployment-configuration \
              "minimumHealthyPercent=75,maximumPercent=200,deploymentCircuitBreaker={enable=true,rollback=true}" \
            --force-new-deployment

      - name: Wait for stable deployment
        timeout-minutes: 15
        run: |
          # `aws ecs wait` has no --timeout flag; it polls every 15 seconds for up to 40 attempts
          aws ecs wait services-stable \
            --cluster kyo-production \
            --services kyo-api

      - name: Verify deployment
        run: |
          # Check the health endpoint
          HEALTH_CHECK_URL="https://api.kyong.com/health"

          for i in {1..5}; do
            RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" $HEALTH_CHECK_URL)

            if [ "$RESPONSE" = "200" ]; then
              echo "✅ Health check passed ($i/5)"
            else
              echo "❌ Health check failed: HTTP $RESPONSE"
              exit 1
            fi

            sleep 10
          done

          echo "🎉 Deployment verified successfully!"
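
The `minimumHealthyPercent` and `maximumPercent` values in the workflow bound how many tasks may run during a rolling update. As a rough sketch, and assuming ECS rounds the lower bound up and the upper bound down as its documentation describes, the limits can be computed like this (`rollingBounds` is a hypothetical helper, and the desired counts are illustrative):

```typescript
// Hypothetical helper: task-count limits during an ECS rolling update.
function rollingBounds(
  desiredCount: number,
  minimumHealthyPercent: number,
  maximumPercent: number
): { minRunning: number; maxRunning: number } {
  return {
    // Lower bound on healthy tasks, rounded up.
    minRunning: Math.ceil((desiredCount * minimumHealthyPercent) / 100),
    // Upper bound on running tasks (old + new), rounded down.
    maxRunning: Math.floor((desiredCount * maximumPercent) / 100),
  };
}

console.log(rollingBounds(3, 75, 200)); // { minRunning: 3, maxRunning: 6 }
console.log(rollingBounds(4, 75, 200)); // { minRunning: 3, maxRunning: 8 }
```

With three desired tasks, 75% rounds up to three, so no old task can be stopped until a replacement is healthy; with four tasks, one old task may be stopped early.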

Deployment strategy decision tree

/**
 * Pick a suitable deployment strategy.
 */
function selectDeploymentStrategy(criteria: {
  budget: 'low' | 'medium' | 'high';
  riskTolerance: 'low' | 'medium' | 'high';
  downtimeAcceptable: boolean;
  trafficVolume: 'low' | 'medium' | 'high';
  rollbackSpeed: 'fast' | 'medium' | 'slow';
}): string {
  // Downtime is acceptable
  if (criteria.downtimeAcceptable) {
    return 'Recreate Deployment';
  }

  // Low budget
  if (criteria.budget === 'low') {
    return 'Rolling Update';
  }

  // High risk tolerance
  if (criteria.riskTolerance === 'high') {
    return 'Rolling Update';
  }

  // Fast rollback required + high budget
  if (criteria.rollbackSpeed === 'fast' && criteria.budget === 'high') {
    return 'Blue-Green Deployment';
  }

  // High traffic + low risk tolerance
  // (same result as the default below; kept to make the rule explicit)
  if (criteria.trafficVolume === 'high' && criteria.riskTolerance === 'low') {
    return 'Canary Deployment';
  }

  // Default recommendation
  return 'Canary Deployment';
}

// Example usage
const strategy = selectDeploymentStrategy({
  budget: 'medium',
  riskTolerance: 'low',
  downtimeAcceptable: false,
  trafficVolume: 'high',
  rollbackSpeed: 'fast',
});

console.log(`Recommended strategy: ${strategy}`);

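Summary

Today we implemented production deployment strategies:

Core features

  1. Canary release: the safest, progressive rollout
  2. Blue-green deployment: fast switchover and immediate rollback
  3. Rolling update: low-cost zero-downtime deployment
  4. Automatic rollback: metric-driven decisions

Technical comparison

Canary vs blue-green:

  • Canary: lower risk (progressive), medium cost, suits high traffic
  • Blue-green: faster switchover (seconds), high cost (double resources), suits critical services
  • 💡 Recommendation: use canary in most cases and blue-green for business-critical services

Automatic rollback thresholds:

  • Error rate: 5-10% (depending on the business)
  • Latency: P99 up by 50% or more
  • 5xx errors: more than 10 per minute (absolute count)
  • 💡 Recommendation: start loose and tighten gradually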
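
The thresholds listed above can be expressed as one check against a pre-deployment baseline. This is a minimal sketch under assumed names (`Metrics`, `Thresholds`, `rollbackReasons`); the sample numbers are illustrative, not measurements.

```typescript
interface Metrics {
  errorRatePercent: number;
  latencyP99Ms: number;
  http5xxPerMinute: number;
}

interface Thresholds {
  maxErrorRatePercent: number;     // e.g. 5
  maxLatencyIncreaseRatio: number; // e.g. 0.5 for "P99 up by 50% or more"
  max5xxPerMinute: number;         // e.g. 10
}

// Return the list of breached rules; an empty list means "keep going".
function rollbackReasons(baseline: Metrics, current: Metrics, t: Thresholds): string[] {
  const reasons: string[] = [];
  if (current.errorRatePercent > t.maxErrorRatePercent) {
    reasons.push('error-rate');
  }
  // Latency is judged relative to the baseline, not as an absolute number.
  const latencyLimit = baseline.latencyP99Ms * (1 + t.maxLatencyIncreaseRatio);
  if (baseline.latencyP99Ms > 0 && current.latencyP99Ms >= latencyLimit) {
    reasons.push('latency');
  }
  if (current.http5xxPerMinute > t.max5xxPerMinute) {
    reasons.push('http-5xx');
  }
  return reasons;
}

const thresholds: Thresholds = { maxErrorRatePercent: 5, maxLatencyIncreaseRatio: 0.5, max5xxPerMinute: 10 };
const baseline: Metrics = { errorRatePercent: 1, latencyP99Ms: 400, http5xxPerMinute: 1 };
const current: Metrics = { errorRatePercent: 2, latencyP99Ms: 650, http5xxPerMinute: 3 };
console.log(rollbackReasons(baseline, current, thresholds)); // [ 'latency' ]
```

Returning the reasons instead of a boolean makes the rollback log explain itself, which helps when tightening the thresholds later.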

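Choosing an ECS deployment controller:

  • ECS (native rolling update): simple, built in, suits small teams
  • CODE_DEPLOY: advanced, supports canary, suits enterprise use
  • EXTERNAL: full control, you implement it yourself, suits special needs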

Cost analysis

Estimated monthly cost (ap-northeast-1):

Rolling update: $225 (baseline)
├─ ECS Fargate: $200
└─ ALB: $25

Canary release: $250 (+11%)
├─ ECS Fargate: $220 (10% overhead during deployment)
├─ ALB: $25
└─ CodeDeploy: $5

Blue-green deployment: $450 (+100%)
├─ ECS Fargate: $400 (double capacity)
└─ ALB: $50 (two target groups)

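Best-practice checklist

  • ✅ Choose a deployment strategy that fits the business needs
  • ✅ Define automatic rollback conditions and thresholds
  • ✅ Implement thorough health checks
  • ✅ Monitor key metrics throughout the deployment
  • ✅ Keep a sufficient rollback window
  • ✅ Document the deployment process and contingency plan
  • ✅ Rehearse deployments and rollbacks regularly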
