2025 iThome 鐵人賽

DAY 29
Build on AWS

Deploying a Studio SaaS Product in 30 Days series, part 29

Day 29: Deploying a SaaS Product to AWS in 30 Days - Complete CI/CD Pipeline and Automated Deployment

Recap

Over Days 24-28, we built out the complete AWS infrastructure for Kyo System:

  • Day 24: Infrastructure as Code (IaC) with AWS CDK
  • Day 25: High-availability architecture and Multi-AZ deployment
  • Day 26: RDS, ElastiCache, and data persistence
  • Day 27: Monitoring, logging, and alerting
  • Day 28: Security, IAM, and secrets management

Today is the second-to-last day, and we add the final piece of the puzzle: a complete CI/CD pipeline that brings Kyo System to a production-ready state with automated deployments and zero-downtime updates.

Complete AWS Architecture Recap

┌─────────────────────────────────────────────────────────────────┐
│                          GitHub Actions                          │
│              CI/CD Pipeline (Build, Test, Deploy)               │
└────────┬─────────────────────────────────────────────────┬──────┘
         │                                                  │
         │ Push Images                                      │ Deploy
         ▼                                                  ▼
┌─────────────────┐                              ┌─────────────────┐
│      ECR        │                              │   AWS CDK       │
│  - Docker       │                              │  - Synth        │
│    Images       │                              │  - Deploy       │
└────────┬────────┘                              └────────┬────────┘
         │                                                 │
         │ Pull Images                                     │ Create/Update
         ▼                                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         AWS Services                             │
├─────────────────┬──────────────────┬──────────────────┬─────────┤
│   ECS Fargate   │   Application    │      RDS         │  Redis  │
│   - Services    │   Load Balancer  │   - PostgreSQL   │  Cache  │
│   - Tasks       │   - Target Grps  │   - Multi-AZ     │         │
└─────────────────┴──────────────────┴──────────────────┴─────────┘
         │                    │                  │           │
         └────────────────────┴──────────────────┴───────────┘
                              │
                              ▼
                  ┌───────────────────────┐
                  │   CloudWatch          │
                  │   - Logs              │
                  │   - Metrics           │
                  │   - Alarms            │
                  └───────────────────────┘

GitHub Actions CI/CD Pipeline

1. Complete Workflow Configuration

Create a GitHub Actions workflow for multi-environment deployment:

# .github/workflows/deploy.yml
name: CI/CD Pipeline

on:
  push:
    branches:
      - main          # Production
      - develop       # Staging
  pull_request:
    branches:
      - main
      - develop

env:
  AWS_REGION: ap-northeast-1
  ECR_REPOSITORY: kyo-otp-service
  NODE_VERSION: '20.x'

jobs:
  # Job 1: Code Quality & Testing
  test:
    name: Test & Lint
    runs-on: ubuntu-latest
    timeout-minutes: 15

    # Service containers backing the integration tests below
    # (they provide the localhost Postgres/Redis the test env vars point at)
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}

      - name: Setup pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Get pnpm store directory
        id: pnpm-cache
        shell: bash
        run: |
          echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT

      - name: Setup pnpm cache
        uses: actions/cache@v4
        with:
          path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
          restore-keys: |
            ${{ runner.os }}-pnpm-store-

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Lint
        run: pnpm run lint

      - name: Type check
        run: pnpm run type-check

      - name: Unit tests
        run: pnpm run test:unit

      - name: Integration tests
        run: pnpm run test:integration
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/test
          REDIS_URL: redis://localhost:6379

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json
          flags: unittests
          name: codecov-umbrella

  # Job 2: Security Scan
  security:
    name: Security Scan
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      # The repo uses pnpm, so audit the pnpm lockfile rather than npm's
      - name: Run dependency audit
        run: pnpm audit --prod --audit-level high
        continue-on-error: true

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload Trivy results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

  # Job 3: Build Docker Images
  build:
    name: Build & Push Docker Images
    runs-on: ubuntu-latest
    needs: [test, security]
    if: github.event_name == 'push'
    timeout-minutes: 20

    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}
          tags: |
            type=ref,event=branch
            type=sha,prefix={{branch}}-
            type=semver,pattern={{version}}
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./apps/kyo-otp-service/Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            NODE_ENV=production
            BUILD_DATE=${{ github.event.head_commit.timestamp }}
            VCS_REF=${{ github.sha }}

      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          # Scan by digest: the metadata step tags images as {branch}-{sha},
          # so a bare :${{ github.sha }} tag was never pushed
          image-ref: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}@${{ steps.build.outputs.digest }}
          format: 'sarif'
          output: 'trivy-image-results.sarif'

      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-image-results.sarif'

  # Job 4: Database Migration
  migrate:
    name: Run Database Migrations
    runs-on: ubuntu-latest
    needs: [build]
    if: github.event_name == 'push'
    timeout-minutes: 10

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}

      - name: Setup pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Get database credentials from Secrets Manager
        id: db-secret
        run: |
          SECRET=$(aws secretsmanager get-secret-value \
            --secret-id kyo-system/database/credentials \
            --query SecretString --output text)

          echo "::add-mask::$(echo $SECRET | jq -r .password)"
          echo "DATABASE_URL=postgresql://$(echo $SECRET | jq -r .username):$(echo $SECRET | jq -r .password)@$(echo $SECRET | jq -r .host):$(echo $SECRET | jq -r .port)/$(echo $SECRET | jq -r .dbname)" >> $GITHUB_ENV

      - name: Run migrations
        run: pnpm --filter kyo-otp-service run migrate:up
        env:
          DATABASE_URL: ${{ env.DATABASE_URL }}

      - name: Verify migration
        run: pnpm --filter kyo-otp-service run migrate:status
        env:
          DATABASE_URL: ${{ env.DATABASE_URL }}

  # Job 5: Deploy to Staging
  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: [build, migrate]
    if: github.ref == 'refs/heads/develop' && github.event_name == 'push'
    environment:
      name: staging
      url: https://staging.kyo-system.com
    timeout-minutes: 15

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}

      - name: Install CDK
        run: npm install -g aws-cdk

      - name: Setup pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Build CDK
        run: pnpm --filter kyo-infrastructure run build

      - name: CDK Synth
        run: pnpm --filter kyo-infrastructure run cdk synth
        env:
          ENVIRONMENT: staging
          IMAGE_TAG: ${{ github.sha }}

      - name: CDK Deploy
        run: pnpm --filter kyo-infrastructure run cdk deploy --all --require-approval never
        env:
          ENVIRONMENT: staging
          IMAGE_TAG: ${{ github.sha }}

      - name: Wait for deployment to stabilize
        run: |
          aws ecs wait services-stable \
            --cluster kyo-staging-cluster \
            --services kyo-otp-service

      - name: Run smoke tests
        run: pnpm run test:smoke
        env:
          API_URL: https://api.staging.kyo-system.com

  # Job 6: Deploy to Production
  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: [build, migrate]
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment:
      name: production
      url: https://kyo-system.com
    timeout-minutes: 20

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}

      - name: Install CDK
        run: npm install -g aws-cdk

      - name: Setup pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Build CDK
        run: pnpm --filter kyo-infrastructure run build

      - name: CDK Synth
        run: pnpm --filter kyo-infrastructure run cdk synth
        env:
          ENVIRONMENT: production
          IMAGE_TAG: ${{ github.sha }}

      - name: CDK Deploy (Blue-Green)
        run: pnpm --filter kyo-infrastructure run cdk deploy --all --require-approval never
        env:
          ENVIRONMENT: production
          IMAGE_TAG: ${{ github.sha }}
          DEPLOYMENT_TYPE: blue-green

      - name: Wait for deployment to stabilize
        run: |
          aws ecs wait services-stable \
            --cluster kyo-production-cluster \
            --services kyo-otp-service

      - name: Run smoke tests
        id: smoke-tests
        run: pnpm run test:smoke
        env:
          API_URL: https://api.kyo-system.com
        continue-on-error: true

      - name: Rollback on failure
        if: steps.smoke-tests.outcome == 'failure'
        run: |
          echo "Smoke tests failed, initiating rollback..."
          pnpm --filter kyo-infrastructure run cdk deploy --all --require-approval never
        env:
          ENVIRONMENT: production
          # NOTE: this requires the build job to also export the previously
          # deployed tag (e.g. captured from the running task definition before
          # pushing); it is not among the build outputs defined above.
          IMAGE_TAG: ${{ needs.build.outputs.previous-image-tag }}

      - name: Notify deployment
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: 'Production deployment ${{ job.status }}'
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
        if: always()

  # Job 7: Performance Test
  performance:
    name: Performance Tests
    runs-on: ubuntu-latest
    needs: [deploy-staging]
    if: github.ref == 'refs/heads/develop' && github.event_name == 'push'
    timeout-minutes: 10

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup k6
        run: |
          sudo gpg -k
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6

      - name: Run load tests
        run: k6 run --summary-export=k6-results.json test/load/otp-load-test.js
        env:
          BASE_URL: https://api.staging.kyo-system.com
          JWT_TOKEN: ${{ secrets.STAGING_JWT_TOKEN }}

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: k6-results
          path: k6-results.json
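
The workflow above runs `test/load/otp-load-test.js`, which is not shown here. A minimal k6 script for the OTP endpoint might look like the following sketch; the endpoint path, payload, stages, and thresholds are assumptions, not the actual test file:

```javascript
// test/load/otp-load-test.js (hypothetical sketch)
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 20 }, // ramp up to 20 virtual users
    { duration: '3m', target: 20 }, // hold
    { duration: '1m', target: 0 },  // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests under 500 ms
    http_req_failed: ['rate<0.01'],   // error rate under 1%
  },
};

export default function () {
  // BASE_URL and JWT_TOKEN come from the workflow's env block
  const res = http.post(
    `${__ENV.BASE_URL}/api/otp/send`,
    JSON.stringify({ phoneNumber: '+886912345678' }),
    {
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${__ENV.JWT_TOKEN}`,
      },
    }
  );

  check(res, { 'status is 2xx': (r) => r.status >= 200 && r.status < 300 });
  sleep(1);
}
```

The thresholds double as pass/fail criteria: if the p95 latency or error rate is exceeded, `k6 run` exits non-zero and fails the job.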

Multi-Stage Dockerfile Optimization

Set up an optimized multi-stage Docker image build:

# apps/kyo-otp-service/Dockerfile
# Stage 1: Base
FROM node:20-alpine AS base

# Install pnpm
RUN corepack enable && corepack prepare pnpm@8.15.0 --activate

# Set working directory
WORKDIR /app

# Copy workspace configuration
COPY pnpm-workspace.yaml package.json pnpm-lock.yaml ./

# Stage 2: Dependencies
FROM base AS dependencies

# Copy package.json files for all workspaces
COPY packages/kyo-core/package.json ./packages/kyo-core/
COPY packages/kyo-types/package.json ./packages/kyo-types/
COPY packages/kyo-config/package.json ./packages/kyo-config/
COPY apps/kyo-otp-service/package.json ./apps/kyo-otp-service/

# Install dependencies (including devDependencies for build)
RUN pnpm install --frozen-lockfile

# Stage 3: Build
FROM dependencies AS build

# Copy source code
COPY packages/kyo-core ./packages/kyo-core
COPY packages/kyo-types ./packages/kyo-types
COPY packages/kyo-config ./packages/kyo-config
COPY apps/kyo-otp-service ./apps/kyo-otp-service

# Copy shared config files
COPY tsconfig.json ./
COPY turbo.json ./

# Build packages
RUN pnpm --filter @kyong/kyo-types run build
RUN pnpm --filter @kyong/kyo-config run build
RUN pnpm --filter @kyong/kyo-core run build
RUN pnpm --filter kyo-otp-service run build

# Stage 4: Production Dependencies
FROM base AS prod-dependencies

# Copy package files
COPY packages/kyo-core/package.json ./packages/kyo-core/
COPY packages/kyo-types/package.json ./packages/kyo-types/
COPY packages/kyo-config/package.json ./packages/kyo-config/
COPY apps/kyo-otp-service/package.json ./apps/kyo-otp-service/

# Install production dependencies only
RUN pnpm install --prod --frozen-lockfile

# Stage 5: Runtime
FROM node:20-alpine AS runtime

# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

# Set working directory
WORKDIR /app

# Copy production dependencies
COPY --from=prod-dependencies --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=prod-dependencies --chown=nodejs:nodejs /app/packages ./packages
COPY --from=prod-dependencies --chown=nodejs:nodejs /app/apps/kyo-otp-service/node_modules ./apps/kyo-otp-service/node_modules

# Copy built application
COPY --from=build --chown=nodejs:nodejs /app/packages/kyo-core/dist ./packages/kyo-core/dist
COPY --from=build --chown=nodejs:nodejs /app/packages/kyo-types/dist ./packages/kyo-types/dist
COPY --from=build --chown=nodejs:nodejs /app/packages/kyo-config/dist ./packages/kyo-config/dist
COPY --from=build --chown=nodejs:nodejs /app/apps/kyo-otp-service/dist ./apps/kyo-otp-service/dist

# Copy package.json files
COPY --chown=nodejs:nodejs packages/kyo-core/package.json ./packages/kyo-core/
COPY --chown=nodejs:nodejs packages/kyo-types/package.json ./packages/kyo-types/
COPY --chown=nodejs:nodejs packages/kyo-config/package.json ./packages/kyo-config/
COPY --chown=nodejs:nodejs apps/kyo-otp-service/package.json ./apps/kyo-otp-service/

# Set environment
ENV NODE_ENV=production \
    PORT=3000

# Switch to non-root user
USER nodejs

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node -e "require('http').get('http://localhost:3000/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"

# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]

# Start application
CMD ["node", "apps/kyo-otp-service/dist/index.js"]
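
A multi-stage build like this still ships the entire build context to the Docker daemon, so a `.dockerignore` next to the Dockerfile keeps builds fast and cache-friendly. A sketch, to be adjusted to the actual repo layout:

```
# .dockerignore (sketch)
**/node_modules
**/dist
.git
.github
coverage
test
*.md
```

Excluding `node_modules` and `dist` is safe here because dependencies are installed and the app is built inside the image, never copied from the host.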

ECS Blue-Green Deployment with CDK

Implement zero-downtime blue-green deployment:

// infrastructure/lib/ecs-service-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as codedeploy from 'aws-cdk-lib/aws-codedeploy';
import { Construct } from 'constructs';

export interface EcsServiceStackProps extends cdk.StackProps {
  vpc: ec2.IVpc;
  cluster: ecs.ICluster;
  imageTag: string;
  environment: 'staging' | 'production';
  dbSecretArn: string;
  redisEndpoint: string;
}

export class EcsServiceStack extends cdk.Stack {
  public readonly service: ecs.FargateService;

  constructor(scope: Construct, id: string, props: EcsServiceStackProps) {
    super(scope, id, props);

    const { vpc, cluster, imageTag, environment, dbSecretArn, redisEndpoint } = props;

    // Task Definition
    const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef', {
      memoryLimitMiB: 1024,
      cpu: 512,
      runtimePlatform: {
        cpuArchitecture: ecs.CpuArchitecture.ARM64,
        operatingSystemFamily: ecs.OperatingSystemFamily.LINUX,
      },
    });

    // Grant access to Secrets Manager
    const dbSecret = cdk.aws_secretsmanager.Secret.fromSecretCompleteArn(
      this,
      'DBSecret',
      dbSecretArn
    );
    dbSecret.grantRead(taskDefinition.taskRole);

    // CloudWatch Log Group
    const logGroup = new logs.LogGroup(this, 'ServiceLogGroup', {
      logGroupName: `/ecs/kyo-otp-service-${environment}`,
      retention: logs.RetentionDays.ONE_MONTH,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // Container
    const container = taskDefinition.addContainer('app', {
      image: ecs.ContainerImage.fromRegistry(
        `${cdk.Aws.ACCOUNT_ID}.dkr.ecr.${cdk.Aws.REGION}.amazonaws.com/kyo-otp-service:${imageTag}`
      ),
      logging: ecs.LogDrivers.awsLogs({
        streamPrefix: 'ecs',
        logGroup,
      }),
      environment: {
        NODE_ENV: 'production',
        ENVIRONMENT: environment,
        REDIS_URL: `redis://${redisEndpoint}:6379`,
        PORT: '3000',
      },
      secrets: {
        DATABASE_URL: ecs.Secret.fromSecretsManager(dbSecret, 'connectionString'),
        JWT_SECRET: ecs.Secret.fromSecretsManager(dbSecret, 'jwtSecret'),
      },
      healthCheck: {
        command: [
          'CMD-SHELL',
          'node -e "require(\'http\').get(\'http://localhost:3000/health\', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"',
        ],
        interval: cdk.Duration.seconds(30),
        timeout: cdk.Duration.seconds(5),
        retries: 3,
        startPeriod: cdk.Duration.seconds(60),
      },
    });

    container.addPortMappings({
      containerPort: 3000,
      protocol: ecs.Protocol.TCP,
    });

    // Application Load Balancer
    const alb = new elbv2.ApplicationLoadBalancer(this, 'ALB', {
      vpc,
      internetFacing: true,
      deletionProtection: environment === 'production',
    });

    // Target Group - Blue
    const blueTargetGroup = new elbv2.ApplicationTargetGroup(this, 'BlueTargetGroup', {
      vpc,
      port: 3000,
      protocol: elbv2.ApplicationProtocol.HTTP,
      targetType: elbv2.TargetType.IP,
      healthCheck: {
        path: '/health',
        interval: cdk.Duration.seconds(30),
        timeout: cdk.Duration.seconds(5),
        healthyThresholdCount: 2,
        unhealthyThresholdCount: 3,
      },
      deregistrationDelay: cdk.Duration.seconds(30),
    });

    // Target Group - Green
    const greenTargetGroup = new elbv2.ApplicationTargetGroup(this, 'GreenTargetGroup', {
      vpc,
      port: 3000,
      protocol: elbv2.ApplicationProtocol.HTTP,
      targetType: elbv2.TargetType.IP,
      healthCheck: {
        path: '/health',
        interval: cdk.Duration.seconds(30),
        timeout: cdk.Duration.seconds(5),
        healthyThresholdCount: 2,
        unhealthyThresholdCount: 3,
      },
      deregistrationDelay: cdk.Duration.seconds(30),
    });

    // Listener - Production Traffic
    const prodListener = alb.addListener('ProdListener', {
      port: 80,
      protocol: elbv2.ApplicationProtocol.HTTP,
      defaultTargetGroups: [blueTargetGroup],
    });

    // Listener - Test Traffic (for validation)
    const testListener = alb.addListener('TestListener', {
      port: 8080,
      protocol: elbv2.ApplicationProtocol.HTTP,
      defaultTargetGroups: [greenTargetGroup],
    });

    // ECS Service
    this.service = new ecs.FargateService(this, 'Service', {
      cluster,
      taskDefinition,
      desiredCount: environment === 'production' ? 2 : 1,
      minHealthyPercent: 100,
      maxHealthyPercent: 200,
      deploymentController: {
        type: ecs.DeploymentControllerType.CODE_DEPLOY,
      },
      // Note: the deployment circuit breaker is only supported with the ECS
      // (rolling) deployment controller; with CODE_DEPLOY, rollback is handled
      // by the deployment group's autoRollback settings below.
      enableExecuteCommand: true, // For debugging
    });

    // Attach to blue target group initially
    this.service.attachToApplicationTargetGroup(blueTargetGroup);

    // Auto Scaling
    const scaling = this.service.autoScaleTaskCount({
      minCapacity: environment === 'production' ? 2 : 1,
      maxCapacity: environment === 'production' ? 10 : 3,
    });

    scaling.scaleOnCpuUtilization('CpuScaling', {
      targetUtilizationPercent: 70,
      scaleInCooldown: cdk.Duration.seconds(60),
      scaleOutCooldown: cdk.Duration.seconds(60),
    });

    scaling.scaleOnMemoryUtilization('MemoryScaling', {
      targetUtilizationPercent: 80,
      scaleInCooldown: cdk.Duration.seconds(60),
      scaleOutCooldown: cdk.Duration.seconds(60),
    });

    // CodeDeploy Application
    const application = new codedeploy.EcsApplication(this, 'Application');

    // CodeDeploy Deployment Group
    const deploymentGroup = new codedeploy.EcsDeploymentGroup(this, 'DeploymentGroup', {
      application,
      service: this.service,
      blueGreenDeploymentConfig: {
        blueTargetGroup,
        greenTargetGroup,
        listener: prodListener,
        testListener,
        terminationWaitTime: cdk.Duration.minutes(5),
      },
      deploymentConfig: environment === 'production'
        ? codedeploy.EcsDeploymentConfig.CANARY_10PERCENT_5MINUTES
        : codedeploy.EcsDeploymentConfig.ALL_AT_ONCE,
      autoRollback: {
        failedDeployment: true,
        stoppedDeployment: true,
        deploymentInAlarm: true,
      },
    });

    // CloudWatch Alarm for Auto Rollback (high response latency)
    const errorAlarm = new cdk.aws_cloudwatch.Alarm(this, 'ErrorAlarm', {
      metric: alb.metricTargetResponseTime(),
      threshold: 1, // 1 second (TargetResponseTime is reported in seconds)
      evaluationPeriods: 2,
      comparisonOperator: cdk.aws_cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
    });

    deploymentGroup.addAlarm(errorAlarm);

    // Outputs
    new cdk.CfnOutput(this, 'LoadBalancerDNS', {
      value: alb.loadBalancerDnsName,
      description: 'Load Balancer DNS Name',
    });

    new cdk.CfnOutput(this, 'ServiceName', {
      value: this.service.serviceName,
      description: 'ECS Service Name',
    });

    new cdk.CfnOutput(this, 'DeploymentGroupName', {
      value: deploymentGroup.deploymentGroupName,
      description: 'CodeDeploy Deployment Group Name',
    });
  }
}

Database Migration Automation

Implement automated database migrations:

// apps/kyo-otp-service/src/db/migrate.ts
import { readdir, readFile } from 'fs/promises';
import { join } from 'path';
import pg from 'pg';

const { Pool } = pg;

interface Migration {
  version: number;
  name: string;
  sql: string;
}

class MigrationRunner {
  // Destructuring `Pool` yields a value, not a type, so derive the instance type
  private pool: InstanceType<typeof Pool>;
  private migrationsDir: string;

  constructor(databaseUrl: string, migrationsDir: string) {
    this.pool = new Pool({
      connectionString: databaseUrl,
      ssl: process.env.NODE_ENV === 'production' ? { rejectUnauthorized: false } : undefined,
    });
    this.migrationsDir = migrationsDir;
  }

  /**
   * Initialize migrations table
   */
  async init(): Promise<void> {
    await this.pool.query(`
      CREATE TABLE IF NOT EXISTS migrations (
        id SERIAL PRIMARY KEY,
        version INTEGER UNIQUE NOT NULL,
        name VARCHAR(255) NOT NULL,
        executed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      )
    `);
    console.log('[Migration] Initialized migrations table');
  }

  /**
   * Get applied migrations
   */
  async getAppliedMigrations(): Promise<number[]> {
    const result = await this.pool.query<{ version: number }>(
      'SELECT version FROM migrations ORDER BY version ASC'
    );
    return result.rows.map((row) => row.version);
  }

  /**
   * Get pending migrations
   */
  async getPendingMigrations(): Promise<Migration[]> {
    const appliedVersions = await this.getAppliedMigrations();
    const allMigrations = await this.loadMigrations();

    return allMigrations.filter((m) => !appliedVersions.includes(m.version));
  }

  /**
   * Load all migration files
   */
  private async loadMigrations(): Promise<Migration[]> {
    const files = await readdir(this.migrationsDir);
    const migrations: Migration[] = [];

    for (const file of files) {
      if (!file.endsWith('.sql')) continue;

      // Parse filename: 001_create_users_table.sql
      const match = file.match(/^(\d+)_(.+)\.sql$/);
      if (!match) continue;

      const version = parseInt(match[1], 10);
      const name = match[2];
      const sql = await readFile(join(this.migrationsDir, file), 'utf-8');

      migrations.push({ version, name, sql });
    }

    return migrations.sort((a, b) => a.version - b.version);
  }

  /**
   * Run migration
   */
  private async runMigration(migration: Migration): Promise<void> {
    const client = await this.pool.connect();

    try {
      await client.query('BEGIN');

      console.log(`[Migration] Running: ${migration.version}_${migration.name}`);

      // Execute migration SQL
      await client.query(migration.sql);

      // Record migration
      await client.query(
        'INSERT INTO migrations (version, name) VALUES ($1, $2)',
        [migration.version, migration.name]
      );

      await client.query('COMMIT');

      console.log(`[Migration] Completed: ${migration.version}_${migration.name}`);
    } catch (error) {
      await client.query('ROLLBACK');
      console.error(`[Migration] Failed: ${migration.version}_${migration.name}`, error);
      throw error;
    } finally {
      client.release();
    }
  }

  /**
   * Run all pending migrations
   */
  async up(): Promise<void> {
    await this.init();

    const pending = await this.getPendingMigrations();

    if (pending.length === 0) {
      console.log('[Migration] No pending migrations');
      return;
    }

    console.log(`[Migration] Found ${pending.length} pending migrations`);

    for (const migration of pending) {
      await this.runMigration(migration);
    }

    console.log('[Migration] All migrations completed');
  }

  /**
   * Show migration status
   */
  async status(): Promise<void> {
    await this.init();

    const applied = await this.getAppliedMigrations();
    const all = await this.loadMigrations();
    const pending = all.filter((m) => !applied.includes(m.version));

    console.log('\n=== Migration Status ===\n');
    console.log(`Applied: ${applied.length}`);
    console.log(`Pending: ${pending.length}`);
    console.log(`Total: ${all.length}\n`);

    if (applied.length > 0) {
      console.log('Applied Migrations:');
      for (const version of applied) {
        const migration = all.find((m) => m.version === version);
        console.log(`  ✓ ${version}_${migration?.name || 'unknown'}`);
      }
    }

    if (pending.length > 0) {
      console.log('\nPending Migrations:');
      for (const migration of pending) {
        console.log(`  ○ ${migration.version}_${migration.name}`);
      }
    }

    console.log('');
  }

  /**
   * Close database connection
   */
  async close(): Promise<void> {
    await this.pool.end();
  }
}

// CLI
const command = process.argv[2];
const databaseUrl = process.env.DATABASE_URL;
const migrationsDir = join(__dirname, 'migrations');

if (!databaseUrl) {
  console.error('DATABASE_URL environment variable is required');
  process.exit(1);
}

const runner = new MigrationRunner(databaseUrl, migrationsDir);

(async () => {
  try {
    switch (command) {
      case 'up':
        await runner.up();
        break;
      case 'status':
        await runner.status();
        break;
      default:
        console.log('Usage: node migrate.js [up|status]');
        process.exit(1);
    }
  } catch (error) {
    console.error('[Migration] Error:', error);
    process.exit(1);
  } finally {
    await runner.close();
  }
})();
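
The workflow invokes `migrate:up` and `migrate:status` through pnpm, so `apps/kyo-otp-service/package.json` needs matching scripts. A sketch, assuming the TypeScript above compiles to `dist/db/migrate.js`:

```json
{
  "scripts": {
    "migrate:up": "node dist/db/migrate.js up",
    "migrate:status": "node dist/db/migrate.js status"
  }
}
```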

Example migration scripts:

-- apps/kyo-otp-service/src/db/migrations/001_create_users_table.sql
CREATE TABLE IF NOT EXISTS users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email VARCHAR(255) UNIQUE NOT NULL,
  password_hash VARCHAR(255) NOT NULL,
  phone_number VARCHAR(20),
  phone_verified BOOLEAN DEFAULT FALSE,
  status VARCHAR(20) DEFAULT 'pending',
  tenant_id UUID NOT NULL,
  role VARCHAR(50) DEFAULT 'user',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_tenant_id ON users(tenant_id);
CREATE INDEX idx_users_status ON users(status);

-- apps/kyo-otp-service/src/db/migrations/002_create_otp_logs_table.sql
CREATE TABLE IF NOT EXISTS otp_logs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  request_id VARCHAR(255) UNIQUE NOT NULL,
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  phone_number VARCHAR(20) NOT NULL,
  otp_code VARCHAR(10) NOT NULL,
  status VARCHAR(20) DEFAULT 'pending',
  attempts INTEGER DEFAULT 0,
  expires_at TIMESTAMP NOT NULL,
  verified_at TIMESTAMP,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_otp_logs_request_id ON otp_logs(request_id);
CREATE INDEX idx_otp_logs_user_id ON otp_logs(user_id);
CREATE INDEX idx_otp_logs_phone_number ON otp_logs(phone_number);
CREATE INDEX idx_otp_logs_status ON otp_logs(status);
CREATE INDEX idx_otp_logs_expires_at ON otp_logs(expires_at);
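
The zero-padded numeric prefixes matter: assuming the runner applies migrations in filename order, a malformed or duplicated prefix can silently change execution order. A minimal sketch of how that convention can be validated before running `up` (the `sortMigrations` helper and `MIGRATION_PATTERN` are illustrative assumptions, not part of the MigrationRunner shown above):

```typescript
// Hypothetical helper (not part of the MigrationRunner above): validates the
// NNN_description.sql naming convention and rejects duplicate version numbers,
// assuming migrations are applied in filename order as the zero-padded
// prefixes suggest.
const MIGRATION_PATTERN = /^(\d{3})_[a-z0-9_]+\.sql$/;

function sortMigrations(filenames: string[]): string[] {
  const seen = new Set<string>();
  for (const name of filenames) {
    const match = MIGRATION_PATTERN.exec(name);
    if (!match) {
      throw new Error(`Invalid migration filename: ${name}`);
    }
    if (seen.has(match[1])) {
      throw new Error(`Duplicate migration version: ${match[1]}`);
    }
    seen.add(match[1]);
  }
  // Plain lexicographic sort is safe because versions are zero-padded
  return [...filenames].sort();
}
```

For example, `sortMigrations(['002_create_otp_logs_table.sql', '001_create_users_table.sql'])` returns the files in `001`, `002` order.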

Smoke Tests

Set up post-deployment smoke tests:

// test/smoke/smoke.test.ts
import axios from 'axios';
import { test } from 'node:test';
import assert from 'node:assert';

const API_URL = process.env.API_URL || 'http://localhost:3000';
const TIMEOUT = 10000;

test('Smoke Tests', { timeout: 30000 }, async (t) => {
  await t.test('Health check should return 200', async () => {
    const response = await axios.get(`${API_URL}/health`, { timeout: TIMEOUT });

    assert.strictEqual(response.status, 200);
    assert.ok(response.data.status);
  });

  await t.test('API should be accessible', async () => {
    const response = await axios.get(`${API_URL}/api`, { timeout: TIMEOUT });

    assert.strictEqual(response.status, 200);
  });

  await t.test('Database connection should be healthy', async () => {
    const response = await axios.get(`${API_URL}/health/db`, { timeout: TIMEOUT });

    assert.strictEqual(response.status, 200);
    assert.strictEqual(response.data.database, 'connected');
  });

  await t.test('Redis connection should be healthy', async () => {
    const response = await axios.get(`${API_URL}/health/redis`, { timeout: TIMEOUT });

    assert.strictEqual(response.status, 200);
    assert.strictEqual(response.data.redis, 'connected');
  });

  await t.test('Can login with valid credentials', async () => {
    const response = await axios.post(
      `${API_URL}/api/auth/login`,
      {
        email: process.env.TEST_USER_EMAIL || 'test@example.com',
        password: process.env.TEST_USER_PASSWORD || 'test123',
      },
      { timeout: TIMEOUT }
    );

    assert.strictEqual(response.status, 200);
    assert.ok(response.data.token);
  });

  await t.test('Rate limiting is working', async () => {
    const token = 'test-token';

    // Make multiple requests rapidly
    const requests = Array(10)
      .fill(null)
      .map(() =>
        axios.post(
          `${API_URL}/api/otp/send`,
          { phoneNumber: '0912345678' },
          {
            headers: { Authorization: `Bearer ${token}` },
            timeout: TIMEOUT,
            validateStatus: () => true, // Don't throw on any status
          }
        )
      );

    const responses = await Promise.all(requests);
    const rateLimited = responses.some((r) => r.status === 429);

    assert.ok(rateLimited, 'Rate limiting should trigger');
  });

  await t.test('Metrics endpoint is accessible', async () => {
    const response = await axios.get(`${API_URL}/metrics`, { timeout: TIMEOUT });

    assert.strictEqual(response.status, 200);
    assert.ok(response.data.includes('http_requests_total'));
  });
});
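
Right after a blue/green cutover the new tasks may still be warming up, so a single transient failure does not necessarily mean the deployment is bad. A small retry helper with exponential backoff can wrap the axios calls above (`withRetry` is an illustrative sketch, not part of the test file shown):

```typescript
// Hypothetical helper for the smoke tests above: retries a transiently
// failing async operation with exponential backoff before giving up,
// so a still-warming service does not immediately fail the deployment.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

Usage inside a test would look like `await withRetry(() => axios.get(`${API_URL}/health`, { timeout: TIMEOUT }))`.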

Production Deployment Checklist

## AWS Production Deployment Checklist

### Infrastructure (CDK)
- [ ] CDK code passes synth
- [ ] All stacks deployed successfully
- [ ] VPC and subnet configuration correct
- [ ] Security Group rules minimized
- [ ] NAT Gateways configured (Multi-AZ)
- [ ] Load Balancer health checks passing

### ECS/Fargate
- [ ] Task Definition resource allocation appropriate
- [ ] Container health checks configured
- [ ] Auto Scaling policies tested
- [ ] Service Discovery working
- [ ] Blue/green deployment configured correctly
- [ ] Deployment Circuit Breaker enabled

### Database (RDS)
- [ ] Multi-AZ deployment enabled
- [ ] Automated backups configured (7-35 days)
- [ ] Encryption enabled (at rest and in transit)
- [ ] Parameter Group tuned
- [ ] Performance monitoring configured
- [ ] Migration scripts tested
- [ ] Rollback strategy prepared

### Cache (ElastiCache Redis)
- [ ] Cluster mode enabled (production)
- [ ] Multi-AZ replication configured
- [ ] Automated backups configured
- [ ] Memory threshold alarms configured
- [ ] Connection pool settings correct

### Secrets Management
- [ ] All sensitive data stored in Secrets Manager
- [ ] IAM permissions configured correctly
- [ ] Key rotation policy configured
- [ ] No hard-coded secrets

### CI/CD Pipeline
- [ ] GitHub Actions workflow tests passing
- [ ] Docker images build successfully
- [ ] ECR image pushes working
- [ ] All automated tests passing
- [ ] Deployment scripts verified
- [ ] Rollback mechanism tested

### Monitoring & Logging
- [ ] CloudWatch Logs collection working
- [ ] Metrics dashboard created
- [ ] Alarm rules configured
- [ ] SNS notifications tested
- [ ] X-Ray tracing enabled
- [ ] Log retention configured

### Security
- [ ] WAF rules configured
- [ ] SSL/TLS certificates installed
- [ ] Security Groups minimized
- [ ] IAM roles follow the principle of least privilege
- [ ] VPC Flow Logs enabled
- [ ] GuardDuty enabled

### Cost Optimization
- [ ] Resource tagging complete
- [ ] Cost Explorer configured
- [ ] Budget alerts configured
- [ ] Spot instances considered (non-prod)
- [ ] S3 lifecycle policies configured

### Performance
- [ ] Load testing complete (k6)
- [ ] API response time < 500ms (p95)
- [ ] Database queries optimized
- [ ] CDN configured (CloudFront)
- [ ] Gzip/Brotli compression enabled

### Documentation
- [ ] Architecture diagram updated
- [ ] Runbook complete
- [ ] API documentation updated
- [ ] Deployment guide finished
- [ ] Troubleshooting guide prepared

### Post-Deployment
- [ ] Smoke tests passing
- [ ] Health checks healthy
- [ ] Metrics being collected
- [ ] Logs queryable
- [ ] No false-positive alarms
- [ ] Team notification sent

Today's Summary

Today we completed the full CI/CD pipeline for Kyo System on AWS:

What We Completed

  1. Complete GitHub Actions workflow

    • Multi-stage testing (lint, unit, integration)
    • Security scanning (npm audit, Trivy)
    • Docker image build and push
    • Automated database migrations
    • Multi-environment deployment (staging, production)
    • Performance test integration
  2. Optimized Docker builds

    • Multi-stage Dockerfile
    • Layer cache optimization
    • Non-root user
    • Health check configuration
    • Image size optimization
  3. ECS blue/green deployment

    • CodeDeploy integration
    • Zero-downtime deployment
    • Automatic rollback
    • Canary deployment strategy
    • CloudWatch Alarms integration
  4. Automated database migrations

    • Version-controlled SQL scripts
    • Transactional execution
    • Rollback support
    • CI/CD integration
  5. Production-readiness checks

    • Smoke tests
    • Deployment checklist
    • Monitoring and alerting
    • Cost optimization

Technical Highlights

  • Fully automated: everything from git push to production deployment runs without manual steps
  • Zero-downtime deployment: blue/green deployment keeps the service uninterrupted
  • Security first: multi-layered security scanning and auditing
  • Observability: complete monitoring, logging, and tracing

Previous post
Day 28: Deploying a SaaS Product to AWS in 30 Days - AWS QuickSight and Data Lake Architecture
Next post
Day 30: Deploying a SaaS Product to AWS in 30 Days - Final Thoughts and Cloud Architecture Review