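We've finally reached the last day of the Ironman contest! The 30-day challenge is over.
I've worked in software development for eight years, moving from backend engineer to full-stack engineer to blockchain developer, and I've also started my own studio. Even so, my biggest takeaway from this Ironman challenge was learning AWS in depth and applying it hands-on.
In past jobs I did use AWS, but mostly just to deploy a few simple applications, such as:
Building Kyo System from scratch this time, though, showed me how many AWS services I didn't know. Over 30 days, I dug deep into:
Going from unfamiliar to fluent was the biggest shift of this challenge.
More importantly, I built a complete AWS architecture for the studio's SaaS product. This architecture provides:
That takes my work a step further: the studio's other products can reuse this architecture to deploy quickly in the future.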
Over these 30 days, Kyo System used the following AWS services:
Total: 25+ AWS services
One of the biggest takeaways this time was learning AWS CDK (Cloud Development Kit).
Why choose CDK over CloudFormation or Terraform?
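Tool | Pros | Cons | Best for |
---|---|---|---|
CloudFormation | AWS-native, complete feature coverage | Verbose YAML, hard to maintain | Simple architectures |
Terraform | Multi-cloud support, HCL syntax | Learning curve, state management | Multi-cloud environments |
CDK | Real programming languages (e.g. TypeScript), type safety | AWS only | Complex AWS architectures |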
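CDK advantages:
// 1. Use a familiar programming language (TypeScript)
// 2. IDE autocompletion and type checking
// 3. Use loops, conditionals, and other program logic
// 4. Easier to test and refactor
// Example: create a VPC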
const vpc = new ec2.Vpc(this, 'KyoVPC', {
maxAzs: 2, // use 2 Availability Zones
natGateways: 1, // a single NAT Gateway (saves cost)
subnetConfiguration: [
{
name: 'Public',
subnetType: ec2.SubnetType.PUBLIC,
cidrMask: 24,
},
{
name: 'Private',
subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
cidrMask: 24,
},
{
name: 'Database',
subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
cidrMask: 28,
},
],
})
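// CDK automatically creates:
// - the VPC
// - an Internet Gateway
// - a NAT Gateway
// - route tables and routes
// - subnets (Public, Private, Database) in each AZ
// (AWS itself also attaches a default security group to every new VPC)
// All from about 20 lines of code!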
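To see what that subnet layout means in actual addresses, here is a small stand-alone sketch that carves a /16 into the three subnet groups across two AZs. It assumes a simple sequential allocator: for each group in order, one block per AZ, aligned to its own size. CDK's real allocation logic may differ, so treat the addresses as illustrative. `allocateSubnets` and its types are hypothetical helpers, not CDK APIs, and the 10.0.0.0/16 VPC CIDR is an assumption.

```typescript
interface SubnetSpec {
  name: string
  cidrMask: number
}

interface AllocatedSubnet {
  name: string
  az: string
  cidr: string
}

// Convert dotted IPv4 to an unsigned number (plain arithmetic avoids 32-bit bitwise sign issues).
function ipToInt(ip: string): number {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0)
}

function intToIp(n: number): string {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join('.')
}

// Hypothetical sequential allocator: each subnet group gets one block per AZ.
function allocateSubnets(vpcCidr: string, azs: string[], specs: SubnetSpec[]): AllocatedSubnet[] {
  const [base, prefixText] = vpcCidr.split('/')
  const vpcEnd = ipToInt(base) + 2 ** (32 - Number(prefixText)) // exclusive upper bound
  let cursor = ipToInt(base)
  const result: AllocatedSubnet[] = []
  for (const spec of specs) {
    const size = 2 ** (32 - spec.cidrMask)
    for (const az of azs) {
      cursor = Math.ceil(cursor / size) * size // align the block to its own size
      if (cursor + size > vpcEnd) {
        throw new Error(`VPC ${vpcCidr} has no room for ${spec.name} in ${az}`)
      }
      result.push({ name: spec.name, az, cidr: `${intToIp(cursor)}/${spec.cidrMask}` })
      cursor += size
    }
  }
  return result
}

const layout = allocateSubnets('10.0.0.0/16', ['az-a', 'az-b'], [
  { name: 'Public', cidrMask: 24 },
  { name: 'Private', cidrMask: 24 },
  { name: 'Database', cidrMask: 28 },
])
for (const s of layout) {
  console.log(`${s.name.padEnd(8)} ${s.az} ${s.cidr}`)
}
```

With /28 database subnets, each AZ gets only 16 addresses, and AWS reserves 5 of them in every subnet. That is enough for a few RDS and ElastiCache nodes, but it is worth checking before you pick a mask that small.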
Full CDK stack structure:
// lib/kyo-stack.ts
import * as cdk from 'aws-cdk-lib'
import * as ec2 from 'aws-cdk-lib/aws-ec2'
import * as ecs from 'aws-cdk-lib/aws-ecs'
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2'
import * as rds from 'aws-cdk-lib/aws-rds'
import * as elasticache from 'aws-cdk-lib/aws-elasticache'
import { Construct } from 'constructs'
export class KyoStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props)
// 1. Network layer
const vpc = this.createVPC()
// 2. Data layer
const database = this.createDatabase(vpc)
const cache = this.createCache(vpc)
// 3. Application layer
const cluster = this.createECSCluster(vpc)
const service = this.createFargateService(cluster, database, cache)
// 4. Load balancer
const alb = this.createLoadBalancer(vpc, service)
// 5. Monitoring and alerting
this.createMonitoring(service, database, cache)
// 6. Security
this.createWAF(alb)
// Output key values
new cdk.CfnOutput(this, 'LoadBalancerDNS', {
value: alb.loadBalancerDnsName,
description: 'Load Balancer DNS Name',
})
}
private createVPC(): ec2.Vpc {
return new ec2.Vpc(this, 'KyoVPC', {
maxAzs: 2,
natGateways: 1,
subnetConfiguration: [/* ... */],
})
}
private createDatabase(vpc: ec2.Vpc): rds.DatabaseInstance {
return new rds.DatabaseInstance(this, 'KyoDB', {
engine: rds.DatabaseInstanceEngine.postgres({
version: rds.PostgresEngineVersion.VER_15_3,
}),
instanceType: ec2.InstanceType.of(
ec2.InstanceClass.T3,
ec2.InstanceSize.SMALL
),
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
multiAz: true, // high availability
allocatedStorage: 20,
maxAllocatedStorage: 100, // storage autoscaling
backupRetention: cdk.Duration.days(7),
deleteAutomatedBackups: false,
removalPolicy: cdk.RemovalPolicy.SNAPSHOT,
})
}
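// ... other methods (createCache, createECSCluster, createFargateService, createLoadBalancer, createMonitoring, createWAF)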
}
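CDK best practices:
// 1. One stack instance per environment
// (stage is a custom prop, not part of cdk.StackProps; see the KyoStackProps interface later in this post)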
const app = new cdk.App()
new KyoStack(app, 'KyoDev', {
env: {
account: process.env.CDK_DEFAULT_ACCOUNT,
region: 'us-west-2',
},
stage: 'dev',
})
new KyoStack(app, 'KyoProd', {
env: {
account: process.env.CDK_DEFAULT_ACCOUNT,
region: 'us-west-2',
},
stage: 'prod',
})
// 2. Manage settings with context values
// (read them in code with this.node.tryGetContext('dev'), for example)
// cdk.json
{
"context": {
"dev": {
"dbInstanceType": "t3.micro",
"fargateTaskCount": 1
},
"prod": {
"dbInstanceType": "t3.medium",
"fargateTaskCount": 3
}
}
}
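Here is a minimal sketch of how per-stage values like the ones in `cdk.json` above could be merged over shared defaults and validated before the stack uses them. In a real app the raw object would come from `app.node.tryGetContext(...)`. Here it is passed in directly so the example runs without CDK. `resolveStageConfig`, `StageConfig` and the default values are hypothetical.

```typescript
interface StageConfig {
  dbInstanceType: string
  fargateTaskCount: number
}

// Hypothetical defaults shared by every stage.
const DEFAULTS: StageConfig = { dbInstanceType: 't3.micro', fargateTaskCount: 1 }

// Merge one stage's context entry over the defaults, then validate the result.
function resolveStageConfig(context: Record<string, Partial<StageConfig>>, stage: string): StageConfig {
  const overrides = context[stage]
  if (overrides === undefined) {
    throw new Error(`No context entry for stage "${stage}"`)
  }
  const merged: StageConfig = { ...DEFAULTS, ...overrides }
  if (!Number.isInteger(merged.fargateTaskCount) || merged.fargateTaskCount < 1) {
    throw new Error(`fargateTaskCount must be a positive integer, got ${merged.fargateTaskCount}`)
  }
  return merged
}

// Mirrors the "context" object in cdk.json above.
const stageContext: Record<string, Partial<StageConfig>> = {
  dev: { dbInstanceType: 't3.micro', fargateTaskCount: 1 },
  prod: { dbInstanceType: 't3.medium', fargateTaskCount: 3 },
}

console.log(resolveStageConfig(stageContext, 'prod'))
```

Failing fast on a missing or malformed stage entry keeps a typo in `cdk.json` from silently deploying dev-sized resources to production.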
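// 3. Tag every resource with Tags (Tags.of() registers a Tag aspect under the hood)
import { Tags } from 'aws-cdk-lib'
Tags.of(app).add('Project', 'Kyo-System')
Tags.of(app).add('ManagedBy', 'CDK')
// Tag each stack separately so KyoDev and KyoProd get different Environment values
Tags.of(app.node.findChild('KyoDev')).add('Environment', 'dev')
Tags.of(app.node.findChild('KyoProd')).add('Environment', 'prod')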
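Cost management on AWS matters a lot. These are the cost-optimization strategies I learned:
// Use Fargate Spot (up to 70% cheaper than regular Fargate, but tasks can be interrupted)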
const taskDefinition = new ecs.FargateTaskDefinition(this, 'Task', {
memoryLimitMiB: 512,
cpu: 256,
})
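// The cluster must be created with enableFargateCapacityProviders: true for these providers to exist
const service = new ecs.FargateService(this, 'Service', {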
cluster,
taskDefinition,
capacityProviderStrategies: [
{
capacityProvider: 'FARGATE_SPOT',
weight: 2, // Spot gets 2 of every 3 tasks beyond the base
base: 0,
},
{
capacityProvider: 'FARGATE',
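weight: 1, // On-Demand gets 1 of every 3 tasks beyond the base (no automatic fallback when Spot capacity is unavailable)
base: 1, // always run at least 1 On-Demand task for availability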
},
],
})
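How `base` and `weight` interact is easier to see with numbers. ECS documents that the `base` counts are satisfied first and the remaining tasks are split in proportion to `weight`. The sketch below approximates that behavior. ECS's exact rounding is not reproduced; the largest-remainder split here is an assumption. `distributeTasks` is a hypothetical helper.

```typescript
interface ProviderStrategy {
  capacityProvider: string
  weight: number
  base?: number
}

// Approximate how many tasks land on each capacity provider.
function distributeTasks(total: number, strategy: ProviderStrategy[]): Record<string, number> {
  const counts: Record<string, number> = {}
  let remaining = total
  // 1. Satisfy each provider's base first.
  for (const s of strategy) {
    const base = Math.min(s.base ?? 0, remaining)
    counts[s.capacityProvider] = base
    remaining -= base
  }
  // 2. Split what is left in proportion to weight, using largest-remainder rounding.
  const totalWeight = strategy.reduce((sum, s) => sum + s.weight, 0)
  if (remaining > 0 && totalWeight > 0) {
    const shares = strategy.map((s) => {
      const exact = (remaining * s.weight) / totalWeight
      return { name: s.capacityProvider, whole: Math.floor(exact), fraction: exact - Math.floor(exact) }
    })
    let leftover = remaining - shares.reduce((sum, sh) => sum + sh.whole, 0)
    const byFraction = shares.slice().sort((a, b) => b.fraction - a.fraction)
    for (const sh of byFraction) {
      const extra = leftover > 0 ? 1 : 0
      leftover -= extra
      counts[sh.name] += sh.whole + extra
    }
  }
  return counts
}

const spotStrategy: ProviderStrategy[] = [
  { capacityProvider: 'FARGATE_SPOT', weight: 2, base: 0 },
  { capacityProvider: 'FARGATE', weight: 1, base: 1 },
]
for (const desired of [1, 4, 10]) {
  const c = distributeTasks(desired, spotStrategy)
  console.log(`${desired} tasks -> FARGATE_SPOT: ${c['FARGATE_SPOT']}, FARGATE: ${c['FARGATE']}`)
}
```

With `base: 1`, the first task always runs On-Demand, so a wave of Spot interruptions never leaves the service with zero running tasks.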
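// Use Reserved Instances / Savings Plans (typically around 40-60% savings, depending on term and payment option)
// RDS and ElastiCache offer reserved instances/nodes; Fargate is covered by Compute Savings Plans
// Buy 1- or 3-year commitments in the AWS Console
// Use Aurora Serverless (pay for the capacity you use)
// Note: rds.ServerlessCluster is Aurora Serverless v1, for which AWS has announced end of support;
// new projects should prefer Serverless v2 (rds.DatabaseCluster with serverlessV2MinCapacity/serverlessV2MaxCapacity)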
const auroraCluster = new rds.ServerlessCluster(this, 'AuroraDB', {
engine: rds.DatabaseClusterEngine.auroraPostgres({
version: rds.AuroraPostgresEngineVersion.VER_13_7,
}),
vpc,
scaling: {
minCapacity: rds.AuroraCapacityUnit.ACU_2,
maxCapacity: rds.AuroraCapacityUnit.ACU_4,
autoPause: cdk.Duration.minutes(10), // pause automatically after 10 idle minutes
},
})
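// S3 lifecycle management
const logBucket = new s3.Bucket(this, 'LogBucket', {
lifecycleRules: [
{
id: 'DeleteOldLogs',
enabled: true,
// S3 rejects transitions to STANDARD_IA for objects younger than 30 days, and Glacier bills
// a 90-day minimum storage duration, so expiration comes well after both transitions.
// (If logs only need 30 days, drop the transitions and keep just the expiration.)
expiration: cdk.Duration.days(365), // delete after 1 year
transitions: [
{
storageClass: s3.StorageClass.INFREQUENT_ACCESS,
transitionAfter: cdk.Duration.days(30), // move to IA after 30 days
},
{
storageClass: s3.StorageClass.GLACIER,
transitionAfter: cdk.Duration.days(90), // move to Glacier after 90 days
},
],
},
],
})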
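A small sketch for sanity-checking lifecycle rules before deploying them. It reports which storage class an object would be in at a given age, and it flags transitions S3 would refuse (STANDARD_IA before 30 days) or transitions that happen after expiration. The rule shape is a simplified, hypothetical mirror of the CDK props, not the CDK type itself. The two sample rules (short 7/14-day transitions versus 30/90-day ones) are assumptions for comparison.

```typescript
type StorageClass = 'STANDARD' | 'STANDARD_IA' | 'GLACIER'

interface LifecycleRule {
  transitions: { storageClass: StorageClass; afterDays: number }[]
  expirationDays?: number
}

// Minimum object age S3 requires before a transition to STANDARD_IA.
const MIN_DAYS_BEFORE_IA = 30

function validateRule(rule: LifecycleRule): string[] {
  const problems: string[] = []
  for (const t of rule.transitions) {
    if (t.storageClass === 'STANDARD_IA' && t.afterDays < MIN_DAYS_BEFORE_IA) {
      problems.push(`STANDARD_IA transition at ${t.afterDays} days is below the ${MIN_DAYS_BEFORE_IA}-day minimum`)
    }
    if (rule.expirationDays !== undefined && t.afterDays >= rule.expirationDays) {
      problems.push(`${t.storageClass} transition at ${t.afterDays} days never happens before expiration`)
    }
  }
  return problems
}

// Storage class at a given age, or 'EXPIRED' once the expiration age is reached.
function storageClassAt(rule: LifecycleRule, ageDays: number): StorageClass | 'EXPIRED' {
  if (rule.expirationDays !== undefined && ageDays >= rule.expirationDays) return 'EXPIRED'
  let current: StorageClass = 'STANDARD'
  const ordered = rule.transitions.slice().sort((a, b) => a.afterDays - b.afterDays)
  for (const t of ordered) {
    if (ageDays >= t.afterDays) current = t.storageClass
  }
  return current
}

const shortRule: LifecycleRule = {
  transitions: [
    { storageClass: 'STANDARD_IA', afterDays: 7 },
    { storageClass: 'GLACIER', afterDays: 14 },
  ],
  expirationDays: 30,
}
const validRule: LifecycleRule = {
  transitions: [
    { storageClass: 'STANDARD_IA', afterDays: 30 },
    { storageClass: 'GLACIER', afterDays: 90 },
  ],
  expirationDays: 365,
}
console.log(validateRule(shortRule))
console.log(validateRule(validRule))
console.log([0, 45, 120, 400].map((age) => `${age}d: ${storageClassAt(validRule, age)}`).join(', '))
```

The checker does not model Glacier's 90-day minimum storage charge, because S3 accepts that configuration and simply bills for the remaining days.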
// Dev: a single NAT Gateway
// Prod: one NAT Gateway per AZ for higher availability
const vpc = new ec2.Vpc(this, 'VPC', {
natGateways: stage === 'prod' ? 2 : 1,
})
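// Use VPC endpoints to avoid NAT Gateway data-processing charges
// Gateway endpoints (S3, DynamoDB) are free; interface endpoints are billed per hour per AZ plus per GB
vpc.addGatewayEndpoint('S3Endpoint', {
service: ec2.GatewayVpcEndpointAwsService.S3,
})
// Pulling images from ECR without NAT needs both the ECR API and the Docker registry endpoints
// (image layers are served from S3, which the gateway endpoint above covers)
vpc.addInterfaceEndpoint('ECREndpoint', {
service: ec2.InterfaceVpcEndpointAwsService.ECR,
})
vpc.addInterfaceEndpoint('ECRDockerEndpoint', {
service: ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
})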
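Interface endpoints only save money above a certain traffic volume, because each one also has an hourly charge in every AZ. The sketch compares the monthly cost of AWS-service traffic that would otherwise pass through an existing NAT Gateway. The prices are assumptions, roughly us-east-1 list prices at the time of writing:

- NAT Gateway: $0.045/hour and $0.045/GB processed
- Interface endpoint: $0.01/hour per AZ and $0.01/GB

Check the current pricing page for your region. The function names and the price table are hypothetical.

```typescript
interface Prices {
  natHourly: number
  natPerGb: number
  endpointHourlyPerAz: number
  endpointPerGb: number
}

// Assumed USD prices (roughly us-east-1 at the time of writing); verify for your region.
const ASSUMED_PRICES: Prices = {
  natHourly: 0.045,
  natPerGb: 0.045,
  endpointHourlyPerAz: 0.01,
  endpointPerGb: 0.01,
}

const HOURS_PER_MONTH = 730

// Fixed monthly cost of one NAT Gateway before any traffic (about $32.85 with the prices above).
function natFixedMonthly(p: Prices): number {
  return p.natHourly * HOURS_PER_MONTH
}

// Extra cost of sending `gb` per month of AWS-service traffic through an existing NAT Gateway.
function viaNat(gb: number, p: Prices): number {
  return gb * p.natPerGb
}

// Cost of the same traffic through one interface endpoint deployed in `azCount` AZs.
function viaInterfaceEndpoint(gb: number, azCount: number, p: Prices): number {
  return azCount * HOURS_PER_MONTH * p.endpointHourlyPerAz + gb * p.endpointPerGb
}

// Monthly traffic above which the endpoint becomes cheaper than NAT data processing.
function breakEvenGb(azCount: number, p: Prices): number {
  return (azCount * HOURS_PER_MONTH * p.endpointHourlyPerAz) / (p.natPerGb - p.endpointPerGb)
}

for (const gb of [100, 1000]) {
  const nat = viaNat(gb, ASSUMED_PRICES).toFixed(2)
  const endpoint = viaInterfaceEndpoint(gb, 2, ASSUMED_PRICES).toFixed(2)
  console.log(`${gb} GB/month: NAT $${nat}, endpoint $${endpoint}`)
}
console.log(`break-even: about ${breakEvenGb(2, ASSUMED_PRICES).toFixed(0)} GB/month per endpoint`)
```

Each interface endpoint is billed separately, and ECR alone needs two (API and Docker registry). For low-traffic services, the endpoints' fixed hourly cost can exceed what they save on NAT processing.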
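// Set up a cost alarm
// AWS/Billing metrics exist only in us-east-1 and require billing alerts to be enabled,
// so deploy this alarm in a us-east-1 stack (alertTopic is the SNS topic from the monitoring section)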
const costAlarm = new cloudwatch.Alarm(this, 'CostAlarm', {
metric: new cloudwatch.Metric({
namespace: 'AWS/Billing',
metricName: 'EstimatedCharges',
statistic: 'Maximum',
period: cdk.Duration.hours(6),
dimensionsMap: {
Currency: 'USD',
},
}),
threshold: 100, // alarm when estimated charges reach $100
evaluationPeriods: 1,
alarmDescription: 'Alert when estimated charges exceed $100',
})
costAlarm.addAlarmAction(new actions.SnsAction(alertTopic))
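Why Fargate instead of EC2?
EC2 pros:
✅ Cheaper (for long-running workloads)
✅ Full control
EC2 cons:
❌ Servers to manage
❌ Instance capacity scaling to handle yourself
❌ OS security patching to handle yourself
Fargate pros:
✅ No servers to manage
✅ Scales per task, with no cluster capacity planning
✅ Pay only for the vCPU and memory your tasks request
✅ AWS patches the underlying infrastructure
Fargate cons:
❌ Somewhat more expensive
❌ Less control
Why Kyo System chose Fargate:
- Little compute is needed early on
- A small team that doesn't want to spend time managing servers
- Needs the ability to scale quickly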
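Full ECS Fargate deployment:
// 1. Create the ECS cluster
const cluster = new ecs.Cluster(this, 'KyoCluster', {
vpc,
clusterName: 'kyo-cluster',
containerInsights: true, // enable Container Insights
enableFargateCapacityProviders: true, // registers FARGATE and FARGATE_SPOT for the strategy below
})
// 2. Create the task definition
const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef', {
memoryLimitMiB: 1024, // 1 GB
cpu: 512, // 0.5 vCPU
runtimePlatform: {
// ARM64 (Graviton) is cheaper per vCPU, but Fargate Spot only runs X86_64 tasks,
// so this service uses X86_64 to keep the FARGATE_SPOT capacity provider below
cpuArchitecture: ecs.CpuArchitecture.X86_64,
operatingSystemFamily: ecs.OperatingSystemFamily.LINUX,
},
})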
// 3. Add the container
const container = taskDefinition.addContainer('app', {
image: ecs.ContainerImage.fromEcrRepository(ecrRepo, 'latest'),
logging: ecs.LogDrivers.awsLogs({
streamPrefix: 'kyo-api',
logRetention: logs.RetentionDays.ONE_WEEK,
}),
environment: {
NODE_ENV: 'production',
PORT: '8080',
},
secrets: {
DATABASE_URL: ecs.Secret.fromSecretsManager(dbSecret),
JWT_SECRET: ecs.Secret.fromSecretsManager(jwtSecret),
REDIS_URL: ecs.Secret.fromSecretsManager(redisSecret),
},
healthCheck: {
// curl must be installed in the image for this check to pass
command: ['CMD-SHELL', 'curl -f http://localhost:8080/health || exit 1'],
interval: cdk.Duration.seconds(30),
timeout: cdk.Duration.seconds(5),
retries: 3,
startPeriod: cdk.Duration.seconds(60),
},
})
container.addPortMappings({
containerPort: 8080,
protocol: ecs.Protocol.TCP,
})
// 4. Create the Fargate service
const service = new ecs.FargateService(this, 'Service', {
cluster,
taskDefinition,
desiredCount: 2, // 2 tasks
minHealthyPercent: 50, // keep at least 50% of tasks healthy during rolling updates
maxHealthyPercent: 200, // allow up to 200% of tasks during rolling updates
assignPublicIp: false, // tasks run in private subnets
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
securityGroups: [appSecurityGroup],
enableExecuteCommand: true, // allow ECS Exec (for debugging)
capacityProviderStrategies: [
{
capacityProvider: 'FARGATE_SPOT',
weight: 2,
},
{
capacityProvider: 'FARGATE',
weight: 1,
base: 1,
},
],
})
// 5. Connect to the ALB (created before auto scaling because request-count scaling needs its target group)
const alb = new elbv2.ApplicationLoadBalancer(this, 'ALB', {
vpc,
internetFacing: true,
vpcSubnets: { subnetType: ec2.SubnetType.PUBLIC },
})
const listener = alb.addListener('Listener', {
port: 443,
protocol: elbv2.ApplicationProtocol.HTTPS,
certificates: [certificate], // an ACM certificate defined elsewhere
})
// addTargets returns the target group that the request-count scaling policy references
const targetGroup = listener.addTargets('ECS', {
port: 8080,
protocol: elbv2.ApplicationProtocol.HTTP,
targets: [service],
healthCheck: {
path: '/health',
interval: cdk.Duration.seconds(30),
timeout: cdk.Duration.seconds(5),
healthyThresholdCount: 2,
unhealthyThresholdCount: 3,
},
deregistrationDelay: cdk.Duration.seconds(30),
})
// 6. Configure auto scaling
const scaling = service.autoScaleTaskCount({
minCapacity: 2,
maxCapacity: 10,
})
// Target tracking: keep average CPU utilization around 70%
scaling.scaleOnCpuUtilization('CpuScaling', {
targetUtilizationPercent: 70,
scaleInCooldown: cdk.Duration.seconds(60),
scaleOutCooldown: cdk.Duration.seconds(60),
})
// Target tracking: keep average memory utilization around 80%
scaling.scaleOnMemoryUtilization('MemoryScaling', {
targetUtilizationPercent: 80,
})
// Target tracking: about 1000 ALB requests per task per minute
scaling.scaleOnRequestCount('RequestScaling', {
requestsPerTarget: 1000,
targetGroup,
})
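Target tracking adjusts the task count so the metric moves back toward its target. The sketch assumes a common simplified model rather than AWS's exact algorithm, which also applies cooldowns and metric smoothing: desired = ceil(current × metric / target), clamped to the min and max capacity. With several policies, Application Auto Scaling scales out if any policy asks for more and scales in only when all of them agree. Taking the maximum of the per-policy results models that. `desiredTaskCount` and `combinedDesired` are hypothetical helpers.

```typescript
// Simplified target-tracking model: scale proportionally, round up, clamp to [min, max].
function desiredTaskCount(
  currentTasks: number,
  metricValue: number,
  targetValue: number,
  minCapacity: number,
  maxCapacity: number,
): number {
  if (targetValue <= 0) throw new Error('targetValue must be positive')
  const raw = Math.ceil((currentTasks * metricValue) / targetValue)
  return Math.min(maxCapacity, Math.max(minCapacity, raw))
}

// Scale-out follows the most demanding policy; scale-in needs every policy to agree.
function combinedDesired(perPolicy: number[]): number {
  return Math.max(...perPolicy)
}

// CPU target 70%, capacity 2..10, as in the scaling config above
console.log(desiredTaskCount(2, 140, 70, 2, 10)) // 4
console.log(desiredTaskCount(4, 35, 70, 2, 10)) // 2
console.log(desiredTaskCount(6, 210, 70, 2, 10)) // 10 (18 clamped to max)
```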
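RDS high-availability architecture:
// Tuned PostgreSQL parameters, created first so the instance below can reference them.
// Memory settings use PostgreSQL's native units: shared_buffers and effective_cache_size
// are in 8 kB pages, work_mem and maintenance_work_mem are in kB
const parameterGroup = new rds.ParameterGroup(this, 'ParameterGroup', {
engine: rds.DatabaseInstanceEngine.postgres({
version: rds.PostgresEngineVersion.VER_15_3,
}),
parameters: {
'shared_buffers': '32768', // 256 MB (static: takes effect after a reboot)
'max_connections': '200',
'work_mem': '4096', // 4 MB
'maintenance_work_mem': '65536', // 64 MB
'effective_cache_size': '131072', // 1 GB
'random_page_cost': '1.1',
'log_statement': 'ddl', // log schema changes; use 'all' only in development
'log_min_duration_statement': '1000', // log slow queries (> 1 s)
},
})
// Multi-AZ deployment
const database = new rds.DatabaseInstance(this, 'Database', {
engine: rds.DatabaseInstanceEngine.postgres({
version: rds.PostgresEngineVersion.VER_15_3,
}),
instanceType: ec2.InstanceType.of(
ec2.InstanceClass.T3,
ec2.InstanceSize.SMALL
),
vpc,
vpcSubnets: {
subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
},
parameterGroup, // attach the parameter group defined above
// High availability
multiAz: true, // automatic failover to the standby instance
// Storage
allocatedStorage: 20, // start at 20 GB
maxAllocatedStorage: 100, // autoscale up to 100 GB
storageType: rds.StorageType.GP3, // GP3: lower price per GB than GP2, with better baseline performance
// Backups (RDS windows are in UTC: 03:00 UTC is 11:00 in Taiwan)
backupRetention: cdk.Duration.days(7), // keep 7 days of backups
preferredBackupWindow: '03:00-04:00',
deleteAutomatedBackups: false, // keep backups when the instance is deleted
// Maintenance (UTC)
preferredMaintenanceWindow: 'sun:04:00-sun:05:00',
autoMinorVersionUpgrade: true, // apply minor version upgrades automatically
// Performance Insights
enablePerformanceInsights: true,
performanceInsightRetention: rds.PerformanceInsightRetention.DEFAULT,
// Monitoring
monitoringInterval: cdk.Duration.seconds(60), // Enhanced Monitoring every 60 seconds
cloudwatchLogsExports: ['postgresql'], // export logs to CloudWatch Logs
// Deletion protection
deletionProtection: true, // prevent accidental deletion
removalPolicy: cdk.RemovalPolicy.SNAPSHOT, // take a snapshot when removed from the stack
})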
Read/write splitting:
// Create a read replica
const readReplica = new rds.DatabaseInstanceReadReplica(this, 'ReadReplica', {
sourceDatabaseInstance: database,
instanceType: ec2.InstanceType.of(
ec2.InstanceClass.T3,
ec2.InstanceSize.MICRO // a replica can be smaller, but replication lag grows if it cannot keep up with writes
),
vpc,
vpcSubnets: {
subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
},
multiAz: false, // read replicas usually don't need Multi-AZ
deleteAutomatedBackups: true,
})
// In the application
// Writes go to the primary
const writeEndpoint = database.dbInstanceEndpointAddress
// Reads go to the read replica (data may lag slightly behind the primary)
const readEndpoint = readReplica.dbInstanceEndpointAddress
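At the application layer, read/write splitting usually comes down to picking an endpoint for each query. Here is a minimal sketch with these assumptions:

- A hypothetical `routeQuery` helper and placeholder hostnames.
- Plain `SELECT` statements go to the replica.
- `SELECT ... FOR UPDATE/SHARE`, every other statement, and anything inside a transaction go to the primary.

Replica reads can lag behind writes, so read-your-own-writes paths should also use the writer.

```typescript
interface Endpoints {
  writer: string
  reader: string
}

// Hypothetical router: decide which endpoint a SQL statement should use.
function routeQuery(sql: string, endpoints: Endpoints, inTransaction = false): string {
  const normalized = sql.trim().toUpperCase()
  const isRead = normalized.startsWith('SELECT') && !/\bFOR\s+(UPDATE|SHARE)\b/.test(normalized)
  // Transactions stay on the primary so they see their own writes and hold locks there.
  return isRead && !inTransaction ? endpoints.reader : endpoints.writer
}

// Placeholder hostnames; in the stack these would be database.dbInstanceEndpointAddress
// and readReplica.dbInstanceEndpointAddress, passed to the app as environment variables.
const dbEndpoints: Endpoints = {
  writer: 'kyo-db.example.com',
  reader: 'kyo-db-replica.example.com',
}

console.log(routeQuery('SELECT * FROM users WHERE id = $1', dbEndpoints))
console.log(routeQuery('UPDATE users SET name = $1 WHERE id = $2', dbEndpoints))
console.log(routeQuery('SELECT * FROM orders WHERE id = $1 FOR UPDATE', dbEndpoints))
```

Anything the router is unsure about (such as `WITH ...` queries) falls through to the writer. That is the safe default.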
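Backup and restore strategy:
// 1. Automated snapshots (daily)
backupRetention: cdk.Duration.days(7)
// 2. Manual snapshots (before major changes)
// aws rds create-db-snapshot \
// --db-instance-identifier kyo-db \
// --db-snapshot-identifier kyo-db-before-migration-2025-01-15
// 3. Export to S3 (long-term retention); snapshots are exported with
// aws rds start-export-task, which needs an IAM role and a KMS key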
const backupBucket = new s3.Bucket(this, 'BackupBucket', {
lifecycleRules: [
{
id: 'ArchiveOldBackups',
transitions: [
{
storageClass: s3.StorageClass.GLACIER,
transitionAfter: cdk.Duration.days(30),
},
],
},
],
})
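// 4. Scheduled backup Lambda
// Node.js 18+ Lambda runtimes ship only AWS SDK v3, so the inline code uses @aws-sdk/client-rds
const backupFunction = new lambda.Function(this, 'BackupFunction', {
runtime: lambda.Runtime.NODEJS_20_X,
handler: 'index.handler',
code: lambda.Code.fromInline(`
const { RDSClient, CreateDBSnapshotCommand } = require('@aws-sdk/client-rds')
const rds = new RDSClient({})
exports.handler = async () => {
const timestamp = new Date().toISOString().split('T')[0]
await rds.send(new CreateDBSnapshotCommand({
DBInstanceIdentifier: process.env.DB_IDENTIFIER,
DBSnapshotIdentifier: \`manual-backup-\${timestamp}\`,
}))
}
`),
environment: {
DB_IDENTIFIER: database.instanceIdentifier,
},
})
// Grant only the permission the function needs
backupFunction.addToRolePolicy(new iam.PolicyStatement({
actions: ['rds:CreateDBSnapshot'],
resources: [
database.instanceArn,
cdk.Stack.of(this).formatArn({
service: 'rds',
resource: 'snapshot',
resourceName: 'manual-backup-*',
arnFormat: cdk.ArnFormat.COLON_RESOURCE_NAME,
}),
],
}))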
// Weekly backup (EventBridge cron expressions are in UTC)
const rule = new events.Rule(this, 'BackupRule', {
schedule: events.Schedule.cron({
weekDay: 'SUN',
hour: '2',
minute: '0',
}),
})
rule.addTarget(new targets.LambdaFunction(backupFunction))
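Unlike automated backups, manual snapshots are never deleted automatically, so a weekly job like this keeps adding storage cost. Below is a sketch of the pruning decision a companion cleanup job could make. It assumes snapshots follow the `manual-backup-YYYY-MM-DD` naming used above, keeps the newest N, and returns the rest for deletion. `snapshotsToDelete` is hypothetical; the actual deletion would go through the RDS DeleteDBSnapshot API.

```typescript
const SNAPSHOT_PATTERN = /^manual-backup-(\d{4}-\d{2}-\d{2})$/

// Return identifiers of manual backups that fall outside the newest `keep` snapshots.
function snapshotsToDelete(identifiers: string[], keep: number): string[] {
  const dated = identifiers
    .map((id) => ({ id, match: SNAPSHOT_PATTERN.exec(id) }))
    .filter((s) => s.match !== null) // never touch snapshots this job did not create
    .map((s) => ({ id: s.id, date: s.match![1] }))
  // ISO dates sort correctly as strings; newest first.
  dated.sort((a, b) => (a.date < b.date ? 1 : a.date > b.date ? -1 : 0))
  return dated.slice(keep).map((s) => s.id)
}

const existingSnapshots = [
  'manual-backup-2025-01-05',
  'manual-backup-2025-01-12',
  'kyo-db-before-migration-2025-01-15',
  'manual-backup-2025-01-19',
  'manual-backup-2024-12-29',
]
console.log(snapshotsToDelete(existingSnapshots, 2))
```

Matching on the naming pattern means hand-made snapshots, such as the pre-migration one, are never pruned by accident.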
// Redis cluster mode (high availability + sharding)
const redisSubnetGroup = new elasticache.CfnSubnetGroup(this, 'RedisSubnet', {
description: 'Subnet group for Redis',
subnetIds: vpc.selectSubnets({
subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
}).subnetIds,
})
const redisCluster = new elasticache.CfnReplicationGroup(this, 'Redis', {
replicationGroupDescription: 'Kyo System Redis Cluster',
engine: 'redis',
engineVersion: '7.0',
cacheNodeType: 'cache.t3.micro',
// Cluster mode settings
numNodeGroups: 2, // 2 shards
replicasPerNodeGroup: 1, // 1 replica per shard
// High availability
automaticFailoverEnabled: true, // automatic failover
multiAzEnabled: true, // Multi-AZ deployment
// Networking
cacheSubnetGroupName: redisSubnetGroup.ref,
securityGroupIds: [redisSecurityGroup.securityGroupId],
// Backups
snapshotRetentionLimit: 7, // keep 7 days of snapshots
snapshotWindow: '03:00-05:00', // snapshot window (UTC)
// Maintenance (UTC)
preferredMaintenanceWindow: 'sun:05:00-sun:06:00',
// Notifications
notificationTopicArn: alertTopic.topicArn,
// Parameters (cluster mode on)
cacheParameterGroupName: 'default.redis7.cluster.on',
// Encryption; with transit encryption on, clients must connect over TLS and be cluster-aware
atRestEncryptionEnabled: true,
transitEncryptionEnabled: true,
})
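With `numNodeGroups: 2`, keys are spread across shards by hash slot. Redis Cluster maps every key to one of 16384 slots using CRC16 (the XMODEM variant) of the key. If the key contains a non-empty `{...}` hash tag, only the tag is hashed. The sketch below implements that mapping. The even 0-8191 / 8192-16383 split between the two shards is an assumption about how slots are assigned. `keySlot` and `shardFor` are hypothetical helpers that treat keys as ASCII.

```typescript
const SLOT_COUNT = 16384

// CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, as used by Redis Cluster.
// Keys are treated as ASCII; multi-byte UTF-8 keys would need to be encoded to bytes first.
function crc16(input: string): number {
  let crc = 0
  for (let i = 0; i < input.length; i++) {
    crc ^= (input.charCodeAt(i) & 0xff) << 8
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff
    }
  }
  return crc
}

// If the key has a non-empty {hash tag}, only the tag is hashed.
function hashTagOf(key: string): string {
  const open = key.indexOf('{')
  if (open === -1) return key
  const close = key.indexOf('}', open + 1)
  if (close === -1 || close === open + 1) return key
  return key.slice(open + 1, close)
}

function keySlot(key: string): number {
  return crc16(hashTagOf(key)) % SLOT_COUNT
}

// Assumed even split of the slot range across `shardCount` shards.
function shardFor(key: string, shardCount = 2): number {
  return Math.floor(keySlot(key) / (SLOT_COUNT / shardCount))
}

console.log(keySlot('foo'), shardFor('foo')) // 12182 1
console.log(keySlot('{user1000}.following') === keySlot('{user1000}.followers')) // true
```

Hash tags matter in cluster mode. Multi-key commands and transactions only work when all keys map to the same slot, so related keys like a user's session and cart should share a tag.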
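// Monitor Redis
// ElastiCache publishes node-level metrics per CacheClusterId; check that the ReplicationGroupId
// dimension below actually returns data, otherwise create one alarm per node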
const redisMemoryAlarm = new cloudwatch.Alarm(this, 'RedisMemoryAlarm', {
metric: new cloudwatch.Metric({
namespace: 'AWS/ElastiCache',
metricName: 'DatabaseMemoryUsagePercentage',
statistic: 'Average',
period: cdk.Duration.minutes(5),
dimensionsMap: {
ReplicationGroupId: redisCluster.ref,
},
}),
threshold: 80, // alarm when memory usage reaches 80%
evaluationPeriods: 2,
alarmDescription: 'Redis memory usage is too high',
})
redisMemoryAlarm.addAlarmAction(new actions.SnsAction(alertTopic))
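SES setup and verification:
// 1. Verify the domain
const domain = 'kyo-system.com'
const sesIdentity = new ses.EmailIdentity(this, 'SESIdentity', {
identity: ses.Identity.domain(domain),
mailFromDomain: `mail.${domain}`, // custom MAIL FROM domain
})
// 2. DKIM: CDK generates Easy DKIM tokens; with Identity.domain you must add the three CNAME records to DNS yourself
// (ses.Identity.publicHostedZone(hostedZone) would create them in Route 53 automatically)
// 3. SPF (Sender Policy Framework)
// With a custom MAIL FROM domain, SES checks SPF on mail.<domain>, which also needs its own
// MX record (feedback-smtp.<region>.amazonses.com) and the same SPF TXT record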
new route53.TxtRecord(this, 'SPF', {
zone: hostedZone,
recordName: domain,
values: ['v=spf1 include:amazonses.com ~all'],
})
// 4. Set up DMARC
new route53.TxtRecord(this, 'DMARC', {
zone: hostedZone,
recordName: `_dmarc.${domain}`,
values: ['v=DMARC1; p=quarantine; rua=mailto:dmarc@kyo-system.com'],
})
// 5. Create an email template
const template = new ses.CfnTemplate(this, 'WelcomeTemplate', {
template: {
templateName: 'welcome-email',
subjectPart: 'Welcome to Kyo System!',
htmlPart: `
<html>
<body>
<h1>Welcome, {{name}}!</h1>
<p>Thanks for signing up for Kyo System.</p>
<a href="{{verifyUrl}}">Click here to verify your account</a>
</body>
</html>
`,
textPart: 'Welcome, {{name}}! Visit {{verifyUrl}} to verify your account.',
},
})
// 6. Configuration set (tracks sends, deliveries, bounces, complaints, opens and clicks)
const configSet = new ses.ConfigurationSet(this, 'ConfigSet', {
configurationSetName: 'kyo-email-tracking',
})
// Publish events to CloudWatch, using the "template" message tag as a dimension
configSet.addEventDestination('CloudWatch', {
destination: ses.EventDestination.cloudWatchDimensions([
{
name: 'template',
source: ses.CloudWatchDimensionSource.MESSAGE_TAG,
defaultValue: 'unknown',
},
]),
events: [
ses.EmailSendingEvent.SEND,
ses.EmailSendingEvent.DELIVERY,
ses.EmailSendingEvent.BOUNCE,
ses.EmailSendingEvent.COMPLAINT,
ses.EmailSendingEvent.OPEN, // needed for open-rate tracking
ses.EmailSendingEvent.CLICK, // needed for click-rate tracking
],
})
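SES sending limits and management:
// New accounts (sandbox):
// - can only send to verified addresses
// - at most 200 emails per 24 hours
// - at most 1 email per second
// After production access is granted (quotas vary by account and region; these are common starting values):
// - can send to any address
// - 50,000 emails per day (an increase can be requested)
// - 14 emails per second (an increase can be requested)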
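To stay under the per-second sending rate on the application side, a token bucket is a common approach. Here is a minimal sketch, assuming the 14 emails/second rate mentioned above and a hypothetical `SendRateLimiter` class. Time is passed in explicitly so the behavior is deterministic and testable. It does not track the daily quota; the CloudWatch alarm below covers that.

```typescript
// Token bucket: holds up to `capacity` tokens and refills continuously at `ratePerSecond`.
class SendRateLimiter {
  private readonly ratePerSecond: number
  private readonly capacity: number
  private tokens: number
  private lastRefillMs: number

  constructor(ratePerSecond: number, capacity: number, nowMs: number) {
    this.ratePerSecond = ratePerSecond
    this.capacity = capacity
    this.tokens = capacity // start full so a small burst can go out immediately
    this.lastRefillMs = nowMs
  }

  private refill(nowMs: number): void {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.ratePerSecond)
    this.lastRefillMs = nowMs
  }

  // Consume one token if available; returns whether an email may be sent at `nowMs`.
  tryAcquire(nowMs: number): boolean {
    this.refill(nowMs)
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }

  // Milliseconds to wait until the next token is available.
  msUntilNextToken(nowMs: number): number {
    this.refill(nowMs)
    return this.tokens >= 1 ? 0 : Math.ceil(((1 - this.tokens) / this.ratePerSecond) * 1000)
  }
}

// Assumed quota: 14 emails/second, with bursts of at most 14.
const limiter = new SendRateLimiter(14, 14, 0)
let sentNow = 0
for (let i = 0; i < 20; i++) {
  if (limiter.tryAcquire(0)) sentNow++
}
console.log(`sent ${sentNow} of 20 at t=0; next slot in ${limiter.msUntilNextToken(0)} ms`)
```

If the app runs several tasks, each one needs a share of the rate (e.g. 14 divided by the task count), or the limiter has to live in a shared store such as Redis.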
// Monitor the sending quota
const sendQuotaAlarm = new cloudwatch.Alarm(this, 'SESQuotaAlarm', {
metric: new cloudwatch.Metric({
namespace: 'AWS/SES',
metricName: 'Send',
statistic: 'Sum',
period: cdk.Duration.days(1),
}),
threshold: 45000, // alarm at 90% of a 50,000/day quota
evaluationPeriods: 1,
alarmDescription: 'SES daily send quota is almost reached',
})
// Monitor bounce rate (SES may put the account under review above 5%)
const bounceRateAlarm = new cloudwatch.Alarm(this, 'BounceRateAlarm', {
metric: new cloudwatch.Metric({
namespace: 'AWS/SES',
metricName: 'Reputation.BounceRate',
statistic: 'Average',
period: cdk.Duration.hours(1),
}),
threshold: 0.05, // 5%
evaluationPeriods: 1,
alarmDescription: 'SES bounce rate is too high',
})
// Monitor complaint rate (SES may put the account under review above 0.1%)
const complaintRateAlarm = new cloudwatch.Alarm(this, 'ComplaintRateAlarm', {
metric: new cloudwatch.Metric({
namespace: 'AWS/SES',
metricName: 'Reputation.ComplaintRate',
statistic: 'Average',
period: cdk.Duration.hours(1),
}),
threshold: 0.001, // 0.1%
evaluationPeriods: 1,
alarmDescription: 'SES complaint rate is too high',
})
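// Office IP allow list referenced by IPWhitelistRule (203.0.113.10/32 is a placeholder documentation address)
const ipSet = new wafv2.CfnIPSet(this, 'OfficeIPSet', {
scope: 'REGIONAL',
ipAddressVersion: 'IPV4',
addresses: ['203.0.113.10/32'],
})
// Create the WAF web ACL
const webAcl = new wafv2.CfnWebACL(this, 'WebACL', {
name: 'kyo-web-acl', // explicit name, used as the WebACL dimension in the alarm below
scope: 'REGIONAL', // ALB uses REGIONAL, CloudFront uses CLOUDFRONT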
defaultAction: { allow: {} },
rules: [
// 1. AWS managed rules: core rule set
{
name: 'AWS-AWSManagedRulesCommonRuleSet',
priority: 1,
statement: {
managedRuleGroupStatement: {
vendorName: 'AWS',
name: 'AWSManagedRulesCommonRuleSet',
},
},
overrideAction: { none: {} },
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'AWSManagedRulesCommonRuleSetMetric',
},
},
// 2. AWS managed rules: known bad inputs
{
name: 'AWS-AWSManagedRulesKnownBadInputsRuleSet',
priority: 2,
statement: {
managedRuleGroupStatement: {
vendorName: 'AWS',
name: 'AWSManagedRulesKnownBadInputsRuleSet',
},
},
overrideAction: { none: {} },
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'AWSManagedRulesKnownBadInputsRuleSetMetric',
},
},
// 3. Rate limit (block an IP that sends more than 2000 requests in a 5-minute window)
{
name: 'RateLimitRule',
priority: 3,
statement: {
rateBasedStatement: {
limit: 2000,
aggregateKeyType: 'IP',
},
},
action: { block: {} },
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'RateLimitRuleMetric',
},
},
// 4. Geo restriction (allow only Taiwan and the US)
{
name: 'GeoBlockRule',
priority: 4,
statement: {
notStatement: {
statement: {
geoMatchStatement: {
countryCodes: ['TW', 'US'],
},
},
},
},
action: { block: {} },
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'GeoBlockRuleMetric',
},
},
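// 5. IP allow list (office IPs)
// Priority 0 so it is evaluated first: WAF stops at the first matching Allow/Block rule,
// so a higher priority number would never rescue office IPs from the geo or rate-limit rules
{
name: 'IPWhitelistRule',
priority: 0,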
statement: {
ipSetReferenceStatement: {
arn: ipSet.attrArn,
},
},
action: { allow: {} },
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'IPWhitelistRuleMetric',
},
},
// 6. SQL injection protection
{
name: 'SQLiProtectionRule',
priority: 6,
statement: {
sqliMatchStatement: {
fieldToMatch: {
allQueryArguments: {},
},
textTransformations: [
{
priority: 0,
type: 'URL_DECODE',
},
{
priority: 1,
type: 'HTML_ENTITY_DECODE',
},
],
},
},
action: { block: {} },
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'SQLiProtectionRuleMetric',
},
},
// 7. XSS protection
{
name: 'XSSProtectionRule',
priority: 7,
statement: {
xssMatchStatement: {
fieldToMatch: {
body: {},
},
textTransformations: [
{
priority: 0,
type: 'URL_DECODE',
},
],
},
},
action: { block: {} },
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'XSSProtectionRuleMetric',
},
},
],
visibilityConfig: {
sampledRequestsEnabled: true,
cloudWatchMetricsEnabled: true,
metricName: 'WebACLMetric',
},
})
// Associate the web ACL with the ALB
new wafv2.CfnWebACLAssociation(this, 'WebACLAssociation', {
webAclArn: webAcl.attrArn,
resourceArn: alb.loadBalancerArn,
})
// Monitor WAF
const wafBlockedRequestsAlarm = new cloudwatch.Alarm(this, 'WAFBlockedAlarm', {
metric: new cloudwatch.Metric({
namespace: 'AWS/WAFV2',
metricName: 'BlockedRequests',
statistic: 'Sum',
period: cdk.Duration.minutes(5),
dimensionsMap: {
Rule: 'ALL',
WebACL: 'kyo-web-acl', // must be the web ACL's name; webAcl.ref returns "name|id|scope"
Region: cdk.Stack.of(this).region,
},
}),
threshold: 1000, // 1000 or more blocked requests within 5 minutes
evaluationPeriods: 1,
alarmDescription: 'High number of blocked requests by WAF',
})
wafBlockedRequestsAlarm.addAlarmAction(new actions.SnsAction(alertTopic))
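WAF evaluates rules in ascending priority order and stops at the first rule whose Allow or Block action matches. So in the web ACL above, an IP allow list only helps if its priority number is lower than the rules it should bypass. The sketch is a tiny, hypothetical model of that evaluation order, not the WAF API. It uses a request from the office IP whose geolocation resolves outside TW/US (say, a VPN exit in Japan).

```typescript
interface RequestInfo {
  ip: string
  country: string
}

interface WafRule {
  name: string
  priority: number
  action: 'allow' | 'block'
  matches: (req: RequestInfo) => boolean
}

// Evaluate rules by ascending priority; the first match decides, otherwise the default action applies.
function evaluate(
  rules: WafRule[],
  req: RequestInfo,
  defaultAction: 'allow' | 'block' = 'allow',
): { action: string; rule: string } {
  const ordered = rules.slice().sort((a, b) => a.priority - b.priority)
  for (const rule of ordered) {
    if (rule.matches(req)) return { action: rule.action, rule: rule.name }
  }
  return { action: defaultAction, rule: 'default' }
}

// 203.0.113.10 is a documentation address standing in for an office IP.
const officeIps = ['203.0.113.10']
const geoBlock = (priority: number): WafRule => ({
  name: 'GeoBlockRule',
  priority,
  action: 'block',
  matches: (req) => !['TW', 'US'].includes(req.country),
})
const ipAllow = (priority: number): WafRule => ({
  name: 'IPWhitelistRule',
  priority,
  action: 'allow',
  matches: (req) => officeIps.includes(req.ip),
})

const officeViaJapan: RequestInfo = { ip: '203.0.113.10', country: 'JP' }
console.log(evaluate([geoBlock(4), ipAllow(5)], officeViaJapan)) // blocked by GeoBlockRule
console.log(evaluate([geoBlock(4), ipAllow(0)], officeViaJapan)) // allowed by IPWhitelistRule
```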
A complete monitoring setup:
// 1. Create an SNS topic for alarms
const alertTopic = new sns.Topic(this, 'AlertTopic', {
displayName: 'Kyo System Alerts',
})
alertTopic.addSubscription(
new subscriptions.EmailSubscription('admin@kyo-system.com')
)
// 2. Create a CloudWatch dashboard
const dashboard = new cloudwatch.Dashboard(this, 'Dashboard', {
dashboardName: 'kyo-system-dashboard',
})
// 3. ECS service metrics
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'ECS Service Metrics',
left: [
service.metricCpuUtilization(),
service.metricMemoryUtilization(),
],
right: [
new cloudwatch.Metric({
namespace: 'AWS/ECS',
metricName: 'RunningTaskCount',
statistic: 'Average',
period: cdk.Duration.minutes(1),
dimensionsMap: {
ServiceName: service.serviceName,
ClusterName: cluster.clusterName,
},
}),
],
})
)
// 4. ALB metrics
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'ALB Metrics',
left: [
alb.metricRequestCount(),
alb.metricTargetResponseTime(),
],
right: [
alb.metricHttpCodeTarget(elbv2.HttpCodeTarget.TARGET_2XX_COUNT),
alb.metricHttpCodeTarget(elbv2.HttpCodeTarget.TARGET_5XX_COUNT),
],
})
)
// 5. RDS metrics
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'RDS Metrics',
left: [
database.metricCPUUtilization(),
database.metricDatabaseConnections(),
],
right: [
database.metricFreeStorageSpace(),
database.metricFreeableMemory(),
],
})
)
// 6. Redis metrics
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'Redis Metrics',
left: [
new cloudwatch.Metric({
namespace: 'AWS/ElastiCache',
metricName: 'CPUUtilization',
statistic: 'Average',
dimensionsMap: {
ReplicationGroupId: redisCluster.ref,
},
}),
],
right: [
new cloudwatch.Metric({
namespace: 'AWS/ElastiCache',
metricName: 'DatabaseMemoryUsagePercentage',
statistic: 'Average',
dimensionsMap: {
ReplicationGroupId: redisCluster.ref,
},
}),
],
})
)
// 7. Application-level metrics (custom metrics the app publishes to the KyoSystem/API namespace)
const apiLatencyMetric = new cloudwatch.Metric({
namespace: 'KyoSystem/API',
metricName: 'APILatency',
statistic: 'Average',
period: cdk.Duration.minutes(1),
})
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'API Metrics',
left: [apiLatencyMetric],
right: [
new cloudwatch.Metric({
namespace: 'KyoSystem/API',
metricName: 'RequestCount',
statistic: 'Sum',
}),
],
})
)
// 8. Alarm rules
// ECS CPU utilization too high
const ecsCpuAlarm = new cloudwatch.Alarm(this, 'ECSCpuAlarm', {
metric: service.metricCpuUtilization(),
threshold: 80,
evaluationPeriods: 2,
alarmDescription: 'ECS service CPU utilization is too high',
})
ecsCpuAlarm.addAlarmAction(new actions.SnsAction(alertTopic))
// Too many 5XX errors from ALB targets
const alb5xxAlarm = new cloudwatch.Alarm(this, 'ALB5xxAlarm', {
metric: alb.metricHttpCodeTarget(elbv2.HttpCodeTarget.TARGET_5XX_COUNT),
threshold: 10,
evaluationPeriods: 1,
alarmDescription: 'Too many 5XX errors from ALB',
})
alb5xxAlarm.addAlarmAction(new actions.SnsAction(alertTopic))
// Too many RDS connections
const rdsConnectionAlarm = new cloudwatch.Alarm(this, 'RDSConnectionAlarm', {
metric: database.metricDatabaseConnections(),
threshold: 150, // 75% of max_connections (200)
evaluationPeriods: 2,
alarmDescription: 'RDS connection count is too high',
})
rdsConnectionAlarm.addAlarmAction(new actions.SnsAction(alertTopic))
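// 9. CloudWatch Logs Insights queries
// Pass this log group to ecs.LogDrivers.awsLogs({ logGroup, streamPrefix }) so the app actually logs here
// (the container definition earlier lets the log driver create its own log group instead)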
const logGroup = new logs.LogGroup(this, 'LogGroup', {
logGroupName: '/ecs/kyo-system',
retention: logs.RetentionDays.ONE_WEEK,
})
// Saved queries for common investigations
new logs.QueryDefinition(this, 'ErrorQuery', {
queryDefinitionName: 'kyo-api-errors',
queryString: new logs.QueryString({
fields: ['@timestamp', '@message'],
filter: '@message like /ERROR/',
sort: '@timestamp desc',
limit: 100,
}),
logGroups: [logGroup],
})
new logs.QueryDefinition(this, 'SlowQueryLog', {
queryDefinitionName: 'kyo-slow-queries',
queryString: new logs.QueryString({
fields: ['@timestamp', 'duration', 'path'],
filter: 'duration > 1000', // requests slower than 1 s (assumes JSON logs with a numeric duration field in ms)
sort: 'duration desc',
limit: 50,
}),
logGroups: [logGroup],
})
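// Complete CI/CD pipeline
// Here `actions` means aws-codepipeline-actions (import * as actions from 'aws-cdk-lib/aws-codepipeline-actions'),
// not the aws-cloudwatch-actions module used for SnsAction above; in one file, import them under different names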
const pipeline = new codepipeline.Pipeline(this, 'Pipeline', {
pipelineName: 'kyo-system-pipeline',
crossAccountKeys: false, // not needed within a single account
})
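// 1. Source stage (GitHub)
// GitHubSourceAction uses an OAuth token (GitHub version 1); AWS recommends CodeStarConnectionsSourceAction for new pipelines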
const sourceOutput = new codepipeline.Artifact()
const sourceAction = new actions.GitHubSourceAction({
actionName: 'GitHub_Source',
owner: 'your-org',
repo: 'kyo-system',
branch: 'main',
oauthToken: cdk.SecretValue.secretsManager('github-token'),
output: sourceOutput,
})
pipeline.addStage({
stageName: 'Source',
actions: [sourceAction],
})
// 2. Build Stage (CodeBuild)
const buildProject = new codebuild.PipelineProject(this, 'BuildProject', {
buildSpec: codebuild.BuildSpec.fromObject({
version: '0.2',
phases: {
pre_build: {
commands: [
'echo Logging in to Amazon ECR...',
'aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $ECR_REPO',
'COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)',
'IMAGE_TAG=${COMMIT_HASH:=latest}',
],
},
build: {
commands: [
'echo Build started on `date`',
'echo Building the Docker image...',
'docker build -t $ECR_REPO:latest .',
'docker tag $ECR_REPO:latest $ECR_REPO:$IMAGE_TAG',
],
},
post_build: {
commands: [
'echo Build completed on `date`',
'echo Pushing the Docker images...',
'docker push $ECR_REPO:latest',
'docker push $ECR_REPO:$IMAGE_TAG',
'echo Writing image definitions file...',
'printf \'[{"name":"app","imageUri":"%s"}]\' $ECR_REPO:$IMAGE_TAG > imagedefinitions.json',
],
},
},
artifacts: {
files: ['imagedefinitions.json'],
},
}),
environment: {
buildImage: codebuild.LinuxBuildImage.STANDARD_7_0, // a current standard image
privileged: true, // required to run Docker
},
environmentVariables: {
ECR_REPO: {
value: ecrRepo.repositoryUri,
},
},
})
// Allow the build to push to ECR (grantPullPush also grants ecr:GetAuthorizationToken for docker login)
ecrRepo.grantPullPush(buildProject)
const buildOutput = new codepipeline.Artifact()
const buildAction = new actions.CodeBuildAction({
actionName: 'CodeBuild',
project: buildProject,
input: sourceOutput,
outputs: [buildOutput],
})
pipeline.addStage({
stageName: 'Build',
actions: [buildAction],
})
// 3. Test stage (optional; running it before Build would avoid pushing images that fail tests)
const testProject = new codebuild.PipelineProject(this, 'TestProject', {
buildSpec: codebuild.BuildSpec.fromObject({
version: '0.2',
phases: {
install: {
commands: [
'npm ci',
],
},
build: {
commands: [
'npm run test',
'npm run lint',
],
},
},
}),
})
const testAction = new actions.CodeBuildAction({
actionName: 'Test',
project: testProject,
input: sourceOutput,
runOrder: 1,
})
pipeline.addStage({
stageName: 'Test',
actions: [testAction],
})
// 4. Deploy stage (ECS rolling update; "app" in imagedefinitions.json must match the container name)
const deployAction = new actions.EcsDeployAction({
actionName: 'Deploy',
service: service,
input: buildOutput,
deploymentTimeout: cdk.Duration.minutes(10),
})
pipeline.addStage({
stageName: 'Deploy',
actions: [deployAction],
})
// 5. Deployment notifications
pipeline.onStateChange('PipelineStateChange', {
target: new targets.SnsTopic(alertTopic),
eventPattern: {
detail: {
state: ['FAILED', 'SUCCEEDED'],
},
},
})
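Blue/green deployment strategy:
// Blue/green deployments with CodeDeploy
// The service must be created with deploymentController: { type: ecs.DeploymentControllerType.CODE_DEPLOY },
// and the pipeline then needs CodeDeployEcsDeployAction instead of EcsDeployAction
const deploymentConfig = codedeploy.EcsDeploymentConfig.ALL_AT_ONCE
// Other options:
// - codedeploy.EcsDeploymentConfig.LINEAR_10PERCENT_EVERY_1MINUTES
// - codedeploy.EcsDeploymentConfig.CANARY_10PERCENT_5MINUTES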
const ecsApplication = new codedeploy.EcsApplication(this, 'EcsApp', {
applicationName: 'kyo-system',
})
const ecsDeploymentGroup = new codedeploy.EcsDeploymentGroup(this, 'EcsDeployment', {
application: ecsApplication,
service,
blueGreenDeploymentConfig: {
blueTargetGroup: blueTargetGroup,
greenTargetGroup: greenTargetGroup,
listener: listener,
testListener: testListener, // listener for testing the green version before traffic shifts
terminationWaitTime: cdk.Duration.minutes(5), // wait 5 minutes before terminating the old (blue) tasks
},
deploymentConfig,
autoRollback: {
failedDeployment: true, // roll back when the deployment fails
stoppedDeployment: true, // roll back when the deployment is stopped
deploymentInAlarm: true, // roll back when a monitored alarm fires
},
alarms: [ecsCpuAlarm, alb5xxAlarm], // alarms monitored during the deployment
})
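The deployment configs listed above differ only in how production traffic moves from the blue target group to the green one. The sketch shows the percentage on green over time, based on what the config names describe:

- Linear: +10% every minute.
- Canary: 10% first, the rest after 5 minutes.
- All-at-once: 100% immediately.

The exact step timing is an assumption. `greenTrafficAt` is a hypothetical helper, not part of CodeDeploy.

```typescript
type ShiftConfig =
  | { kind: 'allAtOnce' }
  | { kind: 'linear'; stepPercent: number; intervalMinutes: number }
  | { kind: 'canary'; firstPercent: number; waitMinutes: number }

// Percentage of production traffic on the green target group `minutes` after the shift starts.
function greenTrafficAt(config: ShiftConfig, minutes: number): number {
  if (config.kind === 'linear') {
    // Assumed timing: the first step happens immediately, then one more step per full interval.
    return Math.min(100, (Math.floor(minutes / config.intervalMinutes) + 1) * config.stepPercent)
  }
  if (config.kind === 'canary') {
    // Canary share first, everything after the wait period.
    return minutes < config.waitMinutes ? config.firstPercent : 100
  }
  return 100 // allAtOnce
}

const shiftConfigs: Record<string, ShiftConfig> = {
  ALL_AT_ONCE: { kind: 'allAtOnce' },
  LINEAR_10PERCENT_EVERY_1MINUTES: { kind: 'linear', stepPercent: 10, intervalMinutes: 1 },
  CANARY_10PERCENT_5MINUTES: { kind: 'canary', firstPercent: 10, waitMinutes: 5 },
}

for (const [configName, config] of Object.entries(shiftConfigs)) {
  const timeline = [0, 2, 4, 6, 9].map((m) => `${m}m:${greenTrafficAt(config, m)}%`)
  console.log(configName.padEnd(32), timeline.join(' '))
}
```

Canary and linear configs give the rollback alarms time to fire while only part of the traffic is on the new version. All-at-once relies entirely on the test listener and the termination wait time.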
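CDK makes managing infrastructure as easy as writing code. My biggest takeaways include:
// Traditional approach: create resources by clicking through the AWS Console
// - slow
// - error-prone
// - hard to reproduce
// - no version control
// CDK approach: create resources with code
// - fast
// - type-safe
// - easy to replicate to other environments
// - versioned in Git
// - reviewable in code review
# Deploy to the dev environment
cdk deploy KyoDev
# Deploy to production (the same architecture)
cdk deploy KyoProd
# Tear down the whole dev architecture
cdk destroy KyoDev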
// Per-environment configuration through typed stack props
interface KyoStackProps extends cdk.StackProps {
stage: 'dev' | 'staging' | 'prod'
}
export class KyoStack extends cdk.Stack {
constructor(scope: Construct, id: string, props: KyoStackProps) {
super(scope, id, props)
const config = this.getConfig(props.stage)
// Apply stage-specific settings
const database = new rds.DatabaseInstance(this, 'Database', {
instanceType: config.dbInstanceType,
multiAz: config.multiAz,
// ...
})
const service = new ecs.FargateService(this, 'Service', {
desiredCount: config.desiredTaskCount,
// ...
})
}
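// Typing stage as the union (not string) lets configs[stage] type-check under strict mode
private getConfig(stage: KyoStackProps['stage']) {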
const configs = {
dev: {
dbInstanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MICRO),
desiredTaskCount: 1,
enableBackup: false,
multiAz: false,
},
staging: {
dbInstanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.SMALL),
desiredTaskCount: 2,
enableBackup: true,
multiAz: false,
},
prod: {
dbInstanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM),
desiredTaskCount: 3,
enableBackup: true,
multiAz: true,
},
}
return configs[stage]
}
}
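Over these 30 days I picked up many cost-optimization techniques:
Problem: AWS has so many services that it's hard to know which one to use
Need container deployment:
- ECS on Fargate? ECS on EC2? EKS? Lambda?
Need a database:
- RDS? Aurora? DynamoDB?
Need caching:
- ElastiCache? DAX? CloudFront?
Solution: start with the simplest option
Principles:
1. Prefer managed services (less operational burden)
2. Prefer Serverless/Fargate (no servers to manage)
3. Start small and scale when needed
4. Follow the AWS Well-Architected Framework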
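Problem: accidentally leaving expensive services running
Common cost traps:
❌ NAT Gateways (~$32/month each in us-east-1, plus per-GB data processing)
❌ Unused EBS volumes
❌ Snapshots that are never deleted
❌ CloudWatch Logs without a retention period
❌ Load balancers left running but unused
Solution: set up cost alarms and budgets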
// AWS Budgets
const budget = new budgets.CfnBudget(this, 'Budget', {
budget: {
budgetName: 'monthly-budget',
budgetType: 'COST',
timeUnit: 'MONTHLY',
budgetLimit: {
amount: 200,
unit: 'USD',
},
},
notificationsWithSubscribers: [
{
notification: {
notificationType: 'ACTUAL',
comparisonOperator: 'GREATER_THAN',
threshold: 80, // notify when actual spend exceeds 80% of the budget
},
subscribers: [
{
subscriptionType: 'EMAIL',
address: 'admin@kyo-system.com',
},
],
},
],
})
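The budget above uses `notificationType: 'ACTUAL'`, which only fires once 80% has really been spent. AWS Budgets also supports `FORECASTED` notifications. To illustrate why a forecast alert warns earlier, here is a naive linear month-end projection. AWS's own forecast uses a different model, so `projectMonthEnd` is only an approximation and a hypothetical helper. The $90-by-day-10 spend is made-up input.

```typescript
// Naive linear projection: assume the average daily spend so far continues all month.
function projectMonthEnd(spendSoFar: number, daysElapsed: number, daysInMonth: number): number {
  if (daysElapsed <= 0) throw new Error('daysElapsed must be positive')
  return (spendSoFar / daysElapsed) * daysInMonth
}

// Same comparison as the budget notification: amount GREATER_THAN thresholdPercent of the limit.
function shouldAlert(amount: number, budgetLimit: number, thresholdPercent: number): boolean {
  return amount > (budgetLimit * thresholdPercent) / 100
}

const monthlyLimit = 200
const spentByDay10 = 90 // hypothetical spend after 10 days of a 30-day month
const projected = projectMonthEnd(spentByDay10, 10, 30)
console.log('projected month-end spend:', projected) // 270
console.log('ACTUAL alert:', shouldAlert(spentByDay10, monthlyLimit, 80)) // false (90 <= 160)
console.log('FORECASTED alert:', shouldAlert(projected, monthlyLimit, 80)) // true (270 > 160)
```

In this example, the actual-spend alert stays silent until about day 18. The forecast-style check flags the overrun on day 10.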
// Cost Explorer API
// Check costs regularly
Problem: IAM permissions are complex and easy to get wrong
// ❌ Bad: granting far too many permissions
{
"Effect": "Allow",
"Action": "*",
"Resource": "*"
}
// ✅ Good: least privilege
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::kyo-bucket/*"
}
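Solution: let CDK generate IAM permissions
// CDK creates least-privilege IAM policies for you
// Example: a Lambda function that needs to read from S3
// (named fn: `const lambda = new lambda.Function(...)` would shadow the lambda module and fail)
const fn = new lambda.Function(this, 'Function', {
// ...
})
const bucket = s3.Bucket.fromBucketName(this, 'Bucket', 'kyo-bucket')
// CDK creates the IAM policy automatically
bucket.grantRead(fn)
// Generated policy: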
// {
// "Effect": "Allow",
// "Action": [
// "s3:GetObject*",
// "s3:GetBucket*",
// "s3:List*"
// ],
// "Resource": [
// "arn:aws:s3:::kyo-bucket",
// "arn:aws:s3:::kyo-bucket/*"
// ]
// }
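To see what the wildcard actions in the generated policy above actually grant, here is a much-simplified policy evaluator. It supports only `Effect`, `Action` and `Resource` with `*`/`?` wildcards, matches actions case-insensitively, and lets an explicit Deny win. Real IAM evaluation also involves conditions, principals, permission boundaries and more. The `isAllowed` helper is a hypothetical teaching model, not an IAM implementation.

```typescript
interface Statement {
  Effect: 'Allow' | 'Deny'
  Action: string | string[]
  Resource: string | string[]
}

const asList = (v: string | string[]): string[] => (Array.isArray(v) ? v : [v])

// Turn an IAM-style pattern (* and ? wildcards) into an anchored regular expression.
function wildcardToRegExp(pattern: string, caseInsensitive: boolean): RegExp {
  const escaped = pattern
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters except * and ?
    .replace(/\*/g, '.*')
    .replace(/\?/g, '.')
  return new RegExp(`^${escaped}$`, caseInsensitive ? 'i' : '')
}

function matchesAny(patterns: string[], value: string, caseInsensitive: boolean): boolean {
  return patterns.some((p) => wildcardToRegExp(p, caseInsensitive).test(value))
}

// Explicit Deny beats Allow; anything not explicitly allowed is implicitly denied.
function isAllowed(statements: Statement[], action: string, resource: string): boolean {
  const applicable = statements.filter(
    (s) => matchesAny(asList(s.Action), action, true) && matchesAny(asList(s.Resource), resource, false),
  )
  if (applicable.some((s) => s.Effect === 'Deny')) return false
  return applicable.some((s) => s.Effect === 'Allow')
}

// The policy shown above as the output of bucket.grantRead().
const grantReadPolicy: Statement[] = [
  {
    Effect: 'Allow',
    Action: ['s3:GetObject*', 's3:GetBucket*', 's3:List*'],
    Resource: ['arn:aws:s3:::kyo-bucket', 'arn:aws:s3:::kyo-bucket/*'],
  },
]

console.log(isAllowed(grantReadPolicy, 's3:GetObject', 'arn:aws:s3:::kyo-bucket/report.csv')) // true
console.log(isAllowed(grantReadPolicy, 's3:ListBucket', 'arn:aws:s3:::kyo-bucket')) // true
console.log(isAllowed(grantReadPolicy, 's3:PutObject', 'arn:aws:s3:::kyo-bucket/report.csv')) // false
console.log(isAllowed(grantReadPolicy, 's3:GetObject', 'arn:aws:s3:::other-bucket/x')) // false
```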
Problem: VPCs, subnets, security groups, NACLs... too many concepts
VPC architecture layers:
VPC
├─ Availability Zone A
│  ├─ Public Subnet (ALB)
│  ├─ Private Subnet (ECS)
│  └─ Database Subnet (RDS)
└─ Availability Zone B
   ├─ Public Subnet (ALB)
   ├─ Private Subnet (ECS)
   └─ Database Subnet (RDS)
Solution: use CDK's high-level constructs
// CDK creates a VPC that follows best practices
const vpc = new ec2.Vpc(this, 'VPC', {
maxAzs: 2, // CDK picks 2 AZs automatically
natGateways: 1,
subnetConfiguration: [
{
name: 'Public',
subnetType: ec2.SubnetType.PUBLIC, // routed to an Internet Gateway (created automatically)
},
{
name: 'Private',
subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS, // outbound traffic goes through the NAT Gateway (placed in a public subnet)
},
{
name: 'Database',
subnetType: ec2.SubnetType.PRIVATE_ISOLATED, // fully isolated: no route to the internet
},
],
})
// Created automatically:
// - the VPC
// - subnets in 2 AZs
// - an Internet Gateway
// - a NAT Gateway
// - route tables
// - all required routes
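Recommended learning resources:
Use CDK when:
✅ You only use AWS
✅ You prefer programming languages such as TypeScript or Python
✅ You want type safety and IDE support
✅ You need complex logic (loops, conditionals)
Use Terraform when:
✅ You run multi-cloud (AWS + GCP + Azure)
✅ You prefer HCL syntax
✅ Your team already uses Terraform
✅ You need a larger ecosystem of community modules
❌ Bad: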
const database = new rds.DatabaseInstance(this, 'DB', {
// no deletion protection
})
✅ Good:
const database = new rds.DatabaseInstance(this, 'DB', {
deletionProtection: true,
removalPolicy: cdk.RemovalPolicy.SNAPSHOT,
})
❌ Bad:
// deploy and forget
✅ Good:
// set up thorough monitoring and alarms
const cpuAlarm = new cloudwatch.Alarm(/* ... */)
const memoryAlarm = new cloudwatch.Alarm(/* ... */)
const errorAlarm = new cloudwatch.Alarm(/* ... */)
❌ Bad:
// using the default security group (it allows all outbound traffic and all traffic between its members)
✅ Good:
const securityGroup = new ec2.SecurityGroup(this, 'SG', {
vpc,
description: 'Allow HTTPS traffic',
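allowAllOutbound: false, // define outbound rules explicitly
})
securityGroup.addIngressRule(
ec2.Peer.anyIpv4(),
ec2.Port.tcp(443),
'Allow HTTPS'
)
// With allowAllOutbound: false, add only the outbound traffic the service needs (example: PostgreSQL inside the VPC)
securityGroup.addEgressRule(
ec2.Peer.ipv4(vpc.vpcCidrBlock),
ec2.Port.tcp(5432),
'Allow PostgreSQL within the VPC'
)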
❌ Bad:
backupRetention: cdk.Duration.days(0) // 0 days disables automated backups
✅ Good:
backupRetention: cdk.Duration.days(7),
preferredBackupWindow: '03:00-04:00',
deleteAutomatedBackups: false,
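The ❌/✅ database pairs (deletion protection, removal policy, backups) can also be checked automatically before a deploy. This is a minimal sketch that assumes a hypothetical plain `DbSettings` object mirroring the props above, not the real CDK types. In a CDK app, the same idea is usually implemented as an Aspect or an assertion test. RDS expects the backup window as `hh24:mi-hh24:mi` in UTC, and the window must be at least 30 minutes long.

```typescript
// Hypothetical, simplified mirror of the database props discussed above.
interface DbSettings {
  deletionProtection?: boolean;
  removalPolicy?: "DESTROY" | "RETAIN" | "SNAPSHOT";
  backupRetentionDays: number;
  preferredBackupWindow?: string; // "hh24:mi-hh24:mi" in UTC
}

function minutesOfDay(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// Returns a list of problems; an empty list means the settings pass.
function checkDbSettings(s: DbSettings): string[] {
  const problems: string[] = [];
  if (!s.deletionProtection) problems.push("deletionProtection is not enabled");
  if (s.removalPolicy === "DESTROY") problems.push("removalPolicy DESTROY deletes data with the stack");
  // 7 days mirrors the ✅ example above; it is a team policy, not an AWS minimum
  if (s.backupRetentionDays < 7) problems.push("backup retention is shorter than 7 days");
  const win = s.preferredBackupWindow;
  if (win !== undefined) {
    if (!/^([01]\d|2[0-3]):[0-5]\d-([01]\d|2[0-3]):[0-5]\d$/.test(win)) {
      problems.push("backup window must look like hh24:mi-hh24:mi");
    } else {
      const [from, to] = win.split("-").map(minutesOfDay);
      const length = (to - from + 24 * 60) % (24 * 60); // the window may wrap past midnight
      if (length < 30) problems.push("backup window must be at least 30 minutes");
    }
  }
  return problems;
}

console.log(checkDbSettings({ backupRetentionDays: 0 }));
```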
      ┌───────────────────────────────────────┐
      │            Route 53 (DNS)             │
      │         kyo-system.com → ALB          │
      └───────────────────┬───────────────────┘
                          │
      ┌───────────────────┴───────────────────┐
      │           CloudFront (CDN)            │
      │         Global Edge Locations         │
      └───────────────────┬───────────────────┘
                          │
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ WAF/Shield  │    │     ALB     │    │ CloudWatch  │
│ DDoS defense│───▶│  (Multi-AZ) │───▶│Monitor/Alarm│
└─────────────┘    └──────┬──────┘    └─────────────┘
                          │
                          ▼
      ┌───────────────────────────────────────┐
      │ VPC (10.0.0.0/16)                     │
      │                                       │
      │ ┌───────────────────────────────────┐ │
      │ │ Public Subnet (AZ-A)              │ │
      │ │ - ALB (10.0.0.0/24)               │ │
      │ └───────────────────────────────────┘ │
      │                                       │
      │ ┌───────────────────────────────────┐ │
      │ │ Private Subnet (AZ-A)             │ │
      │ │ - ECS Fargate Tasks (10.0.10.0/24)│ │
      │ └───────────────────────────────────┘ │
      │                                       │
      │ ┌───────────────────────────────────┐ │
      │ │ Database Subnet (AZ-A)            │ │
      │ │ - RDS Primary                     │ │
      │ │ - ElastiCache                     │ │
      │ │ - (10.0.20.0/24)                  │ │
      │ └───────────────────────────────────┘ │
      │                                       │
      │ ┌───────────────────────────────────┐ │
      │ │ Public Subnet (AZ-B)              │ │
      │ │ - ALB (10.0.1.0/24)               │ │
      │ └───────────────────────────────────┘ │
      │                                       │
      │ ┌───────────────────────────────────┐ │
      │ │ Private Subnet (AZ-B)             │ │
      │ │ - ECS Fargate Tasks (10.0.11.0/24)│ │
      │ └───────────────────────────────────┘ │
      │                                       │
      │ ┌───────────────────────────────────┐ │
      │ │ Database Subnet (AZ-B)            │ │
      │ │ - RDS Replica                     │ │
      │ │ - ElastiCache Replica             │ │
      │ │ - (10.0.21.0/24)                  │ │
      │ └───────────────────────────────────┘ │
      └───────────────────┬───────────────────┘
                          │
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│     S3      │    │     SES     │    │     SNS     │
│(logs/backup)│    │   (email)   │    │  (notify)   │
└─────────────┘    └──────┬──────┘    └─────────────┘
                          │
       ┌──────────────────┼──────────────────┐
       ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│     ECR     │    │ CodePipeline│    │ QuickSight  │
│  (images)   │    │   (CI/CD)   │    │ (analytics) │
└─────────────┘    └─────────────┘    └─────────────┘
Multi-AZ deployment:
Auto Scaling:
Health checks:
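For the Auto Scaling item, the ECS service would typically use target tracking, for example via CDK's `scaleOnCpuUtilization` with `targetUtilizationPercent`. The sketch below uses a simplified proportional model to show how target tracking behaves: scale the task count by the ratio of actual to target metric, round up, and clamp to the min/max. AWS's real algorithm adds cooldowns and scales in more conservatively, so treat this as an approximation. The function name and the 60% target are assumptions.

```typescript
// Simplified target-tracking model (ignores cooldowns and scale-in protection).
function desiredTaskCount(
  currentTasks: number,
  metricValue: number, // e.g. average CPU utilization in percent
  targetValue: number, // e.g. targetUtilizationPercent
  minCapacity: number,
  maxCapacity: number,
): number {
  const raw = Math.ceil((currentTasks * metricValue) / targetValue);
  return Math.min(maxCapacity, Math.max(minCapacity, raw));
}

// 2 tasks at 90% CPU with a 60% target → 3 tasks
console.log(desiredTaskCount(2, 90, 60, 2, 10));
```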
RTO (Recovery Time Objective): < 1 hour
RPO (Recovery Point Objective): < 15 minutes
Backup strategy:
Daily automated backups:
- RDS Automated Backups (7 days)
- EFS Automated Backups
- S3 Versioning
Weekly manual snapshots:
- RDS Manual Snapshots (30 days)
- EBS Volume Snapshots
Continuous (near real-time) replication:
- RDS Read Replica (in another AZ)
- S3 Cross-Region Replication (to another Region)
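Whether this backup plan meets the < 15 minute RPO depends on the most frequent recovery point, not on the daily or weekly copies. The minimal sketch below checks that: if every mechanism is available, worst-case data loss is bounded by the shortest gap between recovery points. The intervals are assumptions to verify for your own setup. RDS point-in-time recovery usually restores to within about 5 minutes of the present, because transaction logs are uploaded to S3 roughly every 5 minutes.

```typescript
interface RecoveryMechanism {
  name: string;
  intervalMinutes: number; // longest possible gap between two recovery points
}

// If every mechanism is available, data loss is bounded by the most frequent one.
function worstCaseRpoMinutes(mechanisms: RecoveryMechanism[]): number {
  return Math.min(...mechanisms.map((m) => m.intervalMinutes));
}

const plan: RecoveryMechanism[] = [
  { name: "RDS automated backup (daily snapshot)", intervalMinutes: 24 * 60 },
  { name: "RDS manual snapshot (weekly)", intervalMinutes: 7 * 24 * 60 },
  { name: "RDS point-in-time recovery (assumed ~5 min log upload)", intervalMinutes: 5 },
];

const rpoTargetMinutes = 15;
const rpo = worstCaseRpoMinutes(plan);
console.log(`worst-case RPO: ${rpo} min, meets target: ${rpo <= rpoTargetMinutes}`);
```

Snapshots alone would give a worst-case RPO of a full day. The 15-minute target is only plausible because point-in-time recovery is enabled, which it is whenever backup retention is greater than 0.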
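Recovery procedures:
1. Database failure:
- RDS Multi-AZ automatic failover (< 2 minutes)
- Or restore from the latest snapshot (< 30 minutes)
2. Application failure:
- ECS automatically launches replacement tasks (< 5 minutes)
- Or roll back to the previous version (< 10 minutes)
3. AZ failure:
- Traffic is routed to the remaining AZ automatically (< 1 minute)
- Resources are relaunched in the other AZ automatically (< 10 minutes)
4. Region failure:
- Fail over manually to the backup Region (must be set up in advance)
- Restore data from S3 Cross-Region Replication
- RTO: < 1 hour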
Horizontal scaling:
Current (MVP):
- 2 ECS tasks
- t3.small RDS
- t3.micro Redis
Expected scale (in 1 year):
- 10+ ECS tasks (Auto Scaling)
- t3.medium RDS + 2 Read Replicas
- t3.small Redis Cluster (2 shards)
- Aurora Serverless (future)
Vertical scaling:
Fargate:
- 512 CPU units (0.5 vCPU) / 1024 MiB → 1024 CPU units (1 vCPU) / 2048 MiB
RDS:
- t3.small → t3.medium → t3.large
- Or move to Aurora (better scalability)
Redis:
- t3.micro → t3.small → t3.medium
- Or switch to Cluster Mode
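Vertical scaling on Fargate has constraints: each CPU size accepts only certain memory sizes, and an invalid pair fails at deploy time. Below is a minimal validator for the common sizes, with CPU in units (1024 = 1 vCPU) and memory in MiB. It follows the Fargate task-size table as I understand it. The larger 8 and 16 vCPU sizes are omitted, so check the current ECS documentation before relying on it. The function names are hypothetical.

```typescript
// Allowed memory (MiB) per Fargate CPU size (units); common sizes only.
function allowedMemory(cpu: number): number[] {
  const range = (min: number, max: number): number[] => {
    const out: number[] = [];
    for (let m = min; m <= max; m += 1024) out.push(m); // 1 GiB steps
    return out;
  };
  switch (cpu) {
    case 256: return [512, 1024, 2048];
    case 512: return range(1024, 4096);
    case 1024: return range(2048, 8192);
    case 2048: return range(4096, 16384);
    case 4096: return range(8192, 30720);
    default: return []; // unknown or omitted size
  }
}

function isValidFargateSize(cpu: number, memoryMiB: number): boolean {
  return allowedMemory(cpu).indexOf(memoryMiB) !== -1;
}

// The upgrade path from the text: 512/1024 → 1024/2048
console.log(isValidFargateSize(512, 1024), isValidFargateSize(1024, 2048));
```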
Development environment (~$100/month):
Compute:
- ECS Fargate (1 task, 0.5 vCPU, 1GB): $15
Database:
- RDS t3.micro (Single-AZ): $15
Cache:
- ElastiCache t3.micro (single node): $12
Networking:
- ALB: $16
- NAT Gateway: $32
Storage & monitoring:
- S3: $2
- CloudWatch: $3
Total: ~$95/month
Production environment (~$250/month):
Compute:
- ECS Fargate (3 tasks, 60% Spot): $30
- Lambda (event processing): $5
Database:
- RDS t3.small Multi-AZ (1-year Reserved): $30
- RDS Read Replica t3.micro: $15
Cache:
- ElastiCache t3.small Multi-AZ (1-year Reserved): $25
Networking:
- ALB: $16
- NAT Gateway (2): $64
- CloudFront: $10
Security:
- WAF: $5 + request charges: $5
Storage:
- S3 (logs, backups): $10
- EFS (shared storage): $5
Monitoring & tooling:
- CloudWatch: $10
- CodePipeline: $1
Email & notifications:
- SES: $1
- SNS: $1
Subtotal: ~$233/month
Estimated data transfer: $20/month
Total: ~$250/month
High-traffic scenario (10x traffic, ~$800/month):
Compute:
- ECS Fargate (10 tasks): $100
Database:
- RDS t3.medium Multi-AZ: $60
- Read Replicas (2): $60
Cache:
- ElastiCache t3.medium Cluster: $80
Networking:
- ALB: $16
- NAT Gateway (2): $64
- CloudFront: $50
Other services: $70
Data transfer: $300
Total: ~$800/month
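Keeping the three estimates as data keeps the totals consistent when a line item changes. The minimal sketch below re-adds the monthly figures listed above. They are the article's own USD estimates, not fetched prices. The production items sum to $233 before the $20 data-transfer estimate.

```typescript
type LineItems = Record<string, number>; // USD per month

const dev: LineItems = {
  "ECS Fargate": 15, "RDS t3.micro": 15, "ElastiCache t3.micro": 12,
  "ALB": 16, "NAT Gateway": 32, "S3": 2, "CloudWatch": 3,
};

const prod: LineItems = {
  "ECS Fargate (60% Spot)": 30, "Lambda": 5,
  "RDS t3.small Multi-AZ (Reserved)": 30, "RDS Read Replica": 15,
  "ElastiCache t3.small Multi-AZ (Reserved)": 25,
  "ALB": 16, "NAT Gateway x2": 64, "CloudFront": 10,
  "WAF": 5, "WAF requests": 5, "S3": 10, "EFS": 5,
  "CloudWatch": 10, "CodePipeline": 1, "SES": 1, "SNS": 1,
  "Data transfer (estimate)": 20,
};

const highTraffic: LineItems = {
  "ECS Fargate x10": 100, "RDS t3.medium Multi-AZ": 60, "Read Replicas x2": 60,
  "ElastiCache t3.medium Cluster": 80, "ALB": 16, "NAT Gateway x2": 64,
  "CloudFront": 50, "Other services": 70, "Data transfer": 300,
};

function monthlyTotal(items: LineItems): number {
  return Object.keys(items).reduce((sum, key) => sum + items[key], 0);
}

console.log(monthlyTotal(dev), monthlyTotal(prod), monthlyTotal(highTraffic));
```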
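Short-term optimizations (can be done now):
1. Use Fargate Spot (up to 70% cheaper than on-demand)
2. Cut unnecessary log retention (saves ~30%)
3. Tune S3 lifecycle rules (saves ~50%)
4. Use a single NAT Gateway (development environment, saves $32/month)
5. Delete unused snapshots (saves ~20%)
Expected savings: ~$50/month (development environment)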
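The Fargate Spot discount (up to about 70%) applies only to the share of tasks that run on Spot, so the saving on the whole service is smaller. Spot tasks can also be interrupted, which is why production mixes them with on-demand tasks (for example, the 60% Spot split above). The sketch below computes the blended estimate. The 70% figure is an upper bound, and actual Spot pricing varies.

```typescript
// Blended monthly cost when a fraction of tasks runs on Fargate Spot.
function blendedFargateCost(onDemandMonthly: number, spotShare: number, spotDiscount: number): number {
  if (spotShare < 0 || spotShare > 1 || spotDiscount < 0 || spotDiscount > 1) {
    throw new Error("spotShare and spotDiscount must be between 0 and 1");
  }
  return onDemandMonthly * (1 - spotShare * spotDiscount);
}

// $100 of on-demand capacity, 60% on Spot at a 70% discount → about $58
console.log(blendedFargateCost(100, 0.6, 0.7).toFixed(2));
```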
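Mid-term optimizations (3-6 months):
1. Buy RDS Reserved Instances (saves ~40%)
2. Buy ElastiCache Reserved Nodes (saves ~40%)
3. Use a Compute Savings Plan (covers Fargate, saves ~17%)
4. Improve the API caching strategy (fewer RDS queries)
5. Add CloudFront caching (fewer origin requests)
Expected savings: ~$80/month (production environment)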
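Reserved capacity is easiest to compare as an effective monthly rate: spread any upfront payment over the term, add the recurring monthly charge, and compare the result with the on-demand monthly price. Keep in mind that the commitment is paid for the whole term even if the instance is no longer needed. The minimal sketch below uses hypothetical prices, so look up real RDS and ElastiCache reserved pricing for your Region and instance class.

```typescript
interface ReservedOffer {
  upfront: number; // USD paid once
  monthly: number; // USD recurring per month
  termMonths: number; // 12 for a 1-year term
}

function effectiveMonthly(offer: ReservedOffer): number {
  return offer.upfront / offer.termMonths + offer.monthly;
}

// Fraction saved compared with paying on-demand for the whole term.
function savingsRate(onDemandMonthly: number, offer: ReservedOffer): number {
  return 1 - effectiveMonthly(offer) / onDemandMonthly;
}

// Hypothetical: $50/month on-demand vs $240 upfront + $10/month for 1 year
const offer: ReservedOffer = { upfront: 240, monthly: 10, termMonths: 12 };
console.log(effectiveMonthly(offer), savingsRate(50, offer).toFixed(2));
```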
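Long-term optimizations (1 year+):
1. Consider Aurora Serverless (pay for capacity as you use it)
2. Move some features to Lambda (cheaper for intermittent workloads)
3. Use S3 Intelligent-Tiering (automatic storage-class optimization)
4. Use PrivateLink for multi-Region deployments (lower data transfer costs)
5. Run your own CDN nodes (at very high traffic)
Expected savings: ~$150/month (at scale)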
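// Expand to multiple Regions
// (assumes KyoStack accepts a custom `stage` prop and exposes its ALB as a public `alb` property)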
const usWestStack = new KyoStack(app, 'KyoUSWest', {
env: { region: 'us-west-2' },
stage: 'prod',
})
const apNortheastStack = new KyoStack(app, 'KyoAPNortheast', {
env: { region: 'ap-northeast-1' },
stage: 'prod',
})
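// Route 53 geolocation routing (in a separate DNS stack; referencing ALBs
// from other Regions requires crossRegionReferences: true on the stacks involved)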
const hostedZone = new route53.HostedZone(this, 'Zone', {
zoneName: 'kyo-system.com',
})
// Route US users to us-west-2 (Oregon)
new route53.ARecord(this, 'USRecord', {
zone: hostedZone,
recordName: 'api',
target: route53.RecordTarget.fromAlias(
new targets.LoadBalancerTarget(usWestStack.alb)
),
geoLocation: route53.GeoLocation.country('US'),
})
// Route users in Asia to ap-northeast-1 (Tokyo)
new route53.ARecord(this, 'APRecord', {
zone: hostedZone,
recordName: 'api',
target: route53.RecordTarget.fromAlias(
new targets.LoadBalancerTarget(apNortheastStack.alb)
),
geoLocation: route53.GeoLocation.continent(route53.Continent.ASIA),
})
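Route 53 answers a geolocation query with the most specific matching record: country (or US state) first, then continent, then a default record if one exists. With only the US and Asia records above, a user in Europe gets no answer at all, so a default record is usually added as a fallback. The sketch below models that lookup order in plain TypeScript. `GeoRecord` and `resolveGeo` are hypothetical names, and the targets are placeholder strings.

```typescript
interface GeoRecord {
  country?: string; // ISO country code, e.g. "US"
  continent?: string; // continent code, e.g. "AS" for Asia
  isDefault?: boolean;
  target: string;
}

interface ClientLocation {
  country: string;
  continent: string;
}

// Most specific match wins: country → continent → default. undefined = no answer.
function resolveGeo(records: GeoRecord[], loc: ClientLocation): string | undefined {
  const byCountry = records.find((r) => r.country === loc.country);
  if (byCountry) return byCountry.target;
  const byContinent = records.find((r) => r.continent === loc.continent);
  if (byContinent) return byContinent.target;
  return records.find((r) => r.isDefault)?.target;
}

const geoRecords: GeoRecord[] = [
  { country: "US", target: "alb-us-west-2" },
  { continent: "AS", target: "alb-ap-northeast-1" },
];
console.log(resolveGeo(geoRecords, { country: "TW", continent: "AS" })); // alb-ap-northeast-1
console.log(resolveGeo(geoRecords, { country: "DE", continent: "EU" })); // undefined: no default record
```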
// Move some features to Lambda
const reportGenerator = new lambda.Function(this, 'ReportGenerator', {
runtime: lambda.Runtime.NODEJS_18_X,
handler: 'index.handler',
code: lambda.Code.fromAsset('lambda/report-generator'),
timeout: cdk.Duration.minutes(5),
memorySize: 1024,
})
// Generate reports on a schedule: daily at 00:00 UTC (EventBridge cron uses UTC; 08:00 in Taiwan)
const rule = new events.Rule(this, 'ReportSchedule', {
schedule: events.Schedule.cron({
hour: '0',
minute: '0',
}),
})
rule.addTarget(new targets.LambdaFunction(reportGenerator))
// Event-driven Lambda
const eventProcessor = new lambda.Function(this, 'EventProcessor', {
runtime: lambda.Runtime.NODEJS_18_X,
handler: 'index.handler',
code: lambda.Code.fromAsset('lambda/event-processor'),
})
// SQS trigger
const queue = new sqs.Queue(this, 'EventQueue')
eventProcessor.addEventSource(new sources.SqsEventSource(queue))
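With the SQS-triggered `EventProcessor`, one failing message should not force the whole batch to be retried. Lambda supports partial batch responses for this case. Set `reportBatchItemFailures: true` on the `SqsEventSource`, and the handler returns the `messageId`s that failed, so only those messages become visible again. The sketch below shows such a handler. The event types are declared locally instead of importing `@types/aws-lambda`, and `processMessage` is a hypothetical placeholder for the real work.

```typescript
// Minimal local versions of the SQS event types (subset of the real fields).
interface SqsRecord {
  messageId: string;
  body: string;
}
interface SqsEvent {
  Records: SqsRecord[];
}
interface SqsBatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

// Hypothetical business logic: here it only requires a JSON body with a `type` field.
async function processMessage(record: SqsRecord): Promise<void> {
  const payload = JSON.parse(record.body) as { type?: string };
  if (!payload.type) throw new Error(`message ${record.messageId} has no type`);
}

// Reports only the failed messages so SQS redelivers just those.
export async function handler(event: SqsEvent): Promise<SqsBatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      await processMessage(record);
    } catch (err) {
      console.error(err);
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

On the CDK side, this corresponds to `new sources.SqsEventSource(queue, { reportBatchItemFailures: true })`. Without that flag, Lambda ignores the returned list and treats the batch as a whole.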
EventBridge:
// Event-driven architecture: a custom event bus
const eventBus = new events.EventBus(this, 'EventBus', {
eventBusName: 'kyo-events',
})
const rule = new events.Rule(this, 'OTPSentRule', {
eventBus,
eventPattern: {
source: ['kyo.otp'],
detailType: ['OTP Sent'],
},
})
rule.addTarget(new targets.LambdaFunction(notificationFunction))
rule.addTarget(new targets.SqsQueue(analyticsQueue))
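EventBridge delivers an event to a rule's targets when every field in the pattern matches. For a field listed as an array, the event's value must equal one of the entries. CDK's `detailType` corresponds to the event's `detail-type` field. The sketch below models only this exact-match subset for top-level string fields, without prefix, numeric, or nested `detail` matching. All names are hypothetical.

```typescript
// Top-level event fields used in this sketch (real events carry more).
interface KyoEvent {
  source: string;
  "detail-type": string;
  detail: Record<string, unknown>;
}

// Pattern keys use the event's JSON field names; values list the allowed strings.
type SimplePattern = Partial<Record<"source" | "detail-type", string[]>>;

function matches(pattern: SimplePattern, event: KyoEvent): boolean {
  return (Object.keys(pattern) as (keyof SimplePattern)[]).every((key) => {
    const allowed = pattern[key];
    return allowed === undefined || allowed.indexOf(event[key]) !== -1;
  });
}

// Same pattern as OTPSentRule: source "kyo.otp", detail-type "OTP Sent"
const otpSentPattern: SimplePattern = { source: ["kyo.otp"], "detail-type": ["OTP Sent"] };
console.log(matches(otpSentPattern, { source: "kyo.otp", "detail-type": "OTP Sent", detail: {} }));
```

Because both targets hang off the same rule, one matching event fans out to the notification function and the analytics queue independently.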
App Runner:
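// Simpler container deployment (a possible migration path from ECS)
// App Runner L2 constructs come from the experimental @aws-cdk/aws-apprunner-alpha package
const autoScaling = new apprunner.AutoScalingConfiguration(this, 'AutoScaling', {
  minSize: 1,
  maxSize: 10,
})
const appRunner = new apprunner.Service(this, 'Service', {
  source: apprunner.Source.fromEcr({
    imageConfiguration: { port: 8080 },
    repository: ecrRepo,
    tagOrDigest: 'latest',
  }),
  autoScalingConfiguration: autoScaling,
})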
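App Runner scales on concurrent requests rather than CPU. Each instance handles up to `maxConcurrency` requests (100 by default), and the service adds instances when traffic exceeds that, staying within `minSize` and `maxSize`. The sketch below computes the resulting instance count in a simplified way. App Runner's actual scaling timing is not modeled, and the function name is hypothetical.

```typescript
// Simplified: instances needed = ceil(concurrent requests / maxConcurrency), clamped to [min, max].
function appRunnerInstances(
  concurrentRequests: number,
  maxConcurrency: number,
  minSize: number,
  maxSize: number,
): number {
  const needed = Math.ceil(concurrentRequests / maxConcurrency);
  return Math.min(maxSize, Math.max(minSize, needed));
}

// minSize 1 / maxSize 10 as in the config above, default concurrency of 100
console.log(appRunnerInstances(350, 100, 1, 10)); // 4
```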
Aurora Serverless v2:
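// More flexible database scaling
// Serverless v2 uses DatabaseCluster with serverlessV2 instances;
// rds.ServerlessCluster is the older Serverless v1 construct (its autoPause option does not apply here)
const cluster = new rds.DatabaseCluster(this, 'Cluster', {
  engine: rds.DatabaseClusterEngine.auroraPostgres({
    version: rds.AuroraPostgresEngineVersion.VER_14_6,
  }),
  vpc,
  writer: rds.ClusterInstance.serverlessV2('Writer'),
  serverlessV2MinCapacity: 2, // ACUs
  serverlessV2MaxCapacity: 16, // ACUs
})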
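Aurora Serverless v2 bills per ACU-hour, and capacity follows load between the configured minimum and maximum. Each ACU is roughly 2 GiB of memory with matching CPU. The sketch below estimates cost from an hourly load profile. The price per ACU-hour is deliberately a parameter, so plug in the current price for your Region. The load profile is hypothetical.

```typescript
// Capacity billed for one hour: demand clamped to [minAcu, maxAcu].
function billedAcu(demandAcu: number, minAcu: number, maxAcu: number): number {
  return Math.min(maxAcu, Math.max(minAcu, demandAcu));
}

// Sum ACU-hours over an hourly demand profile and multiply by the ACU-hour price.
function estimateCost(hourlyDemandAcu: number[], minAcu: number, maxAcu: number, pricePerAcuHour: number): number {
  const acuHours = hourlyDemandAcu.reduce((sum, d) => sum + billedAcu(d, minAcu, maxAcu), 0);
  return acuHours * pricePerAcuHour;
}

// Hypothetical day: 16 quiet hours (0.5 ACU of demand) and 8 busy hours (6 ACU)
const day: number[] = [];
for (let h = 0; h < 24; h++) day.push(h < 16 ? 0.5 : 6);

// min 2 / max 16 ACUs as in the cluster config; price 1 → result equals ACU-hours
console.log(estimateCost(day, 2, 16, 1)); // 80
```

In this profile, the minimum capacity drives the quiet-hour cost: 32 of the 80 ACU-hours are billed at the 2-ACU floor while demand is only 0.5 ACU.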
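The 30-day Ironman challenge is finally complete!
This journey means a great deal to me. I already had eight years of development experience, moving from backend to full-stack to blockchain, but going deep into AWS was the biggest takeaway of this challenge.
I had used AWS at work before, but mostly to deploy simple applications. Building Kyo System from scratch this time meant researching AWS best practices for every layer: network architecture design, database high availability, containerized deployment, and CI/CD automation.
Kyo System is now deployed to a production AWS environment and running stably:
This is the biggest result of the challenge and an important foundation for the studio's future.
For me, this challenge was more than a contest. It was a structured way to learn AWS. Building a real SaaS product gave me a much deeper understanding of AWS services and left me with my own AWS best-practice templates. I plan to keep learning and work toward AWS certifications.
This is the most substantial content I have produced in all my years of Ironman challenges. Thanks to how far AI has come, I could learn and build much faster. I hope to take on the challenge again next year.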