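Today we're learning about Auto Scaling Groups (ASG), which work like a farm's smart labor scheduling system. Remember how, at harvest time, Dad always decided how many temporary workers to hire based on the day's workload? An ASG follows the same idea: it automatically adds or removes EC2 instances according to system load.
An Auto Scaling Group acts as a smart manager that watches your metrics and reacts as follows: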
graph TD
A[CloudWatch monitoring] --> B{Check metrics}
B --> |CPU > 70%| C[Trigger scale-out]
B --> |CPU < 30%| D[Trigger scale-in]
B --> |30% ≤ CPU ≤ 70%| E[Hold steady]
C --> F[Launch new instances]
D --> G[Terminate instances]
F --> H[Wait for cooldown]
G --> I[Wait for cooldown]
H --> A
I --> A
E --> A
style C fill:#ffcccc
style D fill:#ccffcc
style E fill:#fff3e0
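The decision box in the flowchart above can be sketched in a few lines of Python. This is only an illustration of the threshold logic with the 70% and 30% values from the diagram; the function name `decide` is made up for this example, and the real service evaluates CloudWatch alarms rather than single readings.

```python
def decide(cpu_percent: float, high: float = 70.0, low: float = 30.0) -> str:
    """Return the branch the flowchart takes for one average CPU reading."""
    if cpu_percent > high:
        return "scale_out"   # CPU > 70%: launch new instances
    if cpu_percent < low:
        return "scale_in"    # CPU < 30%: terminate instances
    return "hold"            # 30% <= CPU <= 70%: keep current capacity


if __name__ == "__main__":
    for reading in (85.0, 50.0, 12.5):
        print(f"CPU {reading:5.1f}% -> {decide(reading)}")
```

Note that the boundaries themselves (exactly 30% or 70%) fall into the "hold" branch, matching the `≤` labels in the diagram.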
Characteristics of our quantitative trading system:
graph TB
subgraph "Auto Scaling Group"
A[Launch Template<br/>Instance template]
B[Scaling Policies<br/>Scaling rules]
C[Health Checks<br/>Instance health]
D[Termination Policies<br/>Which instance to remove]
A --> E[EC2 Instance 1]
A --> F[EC2 Instance 2]
A --> G[EC2 Instance 3]
B --> H[Scale Out<br/>Add capacity]
B --> I[Scale In<br/>Remove capacity]
C --> J[ELB Health Check<br/>Load balancer check]
C --> K[EC2 Health Check<br/>Instance status check]
end
L[CloudWatch Alarms] --> B
M[Target Group] --> J
style E fill:#e3f2fd
style F fill:#e3f2fd
style G fill:#e3f2fd
{
"AutoScalingGroupName": "trading-ecs-asg",
"LaunchTemplate": {
"LaunchTemplateName": "trading-ecs-template",
"Version": "$Latest"
},
"MinSize": 1,
"MaxSize": 5,
"DesiredCapacity": 2,
"DefaultCooldown": 300,
"HealthCheckType": "ELB",
"HealthCheckGracePeriod": 300,
"VPCZoneIdentifier": [
"subnet-12345678",
"subnet-87654321"
],
"TargetGroupARNs": [
"arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/trading-tg/1234567890123456"
],
"TerminationPolicies": [
"OldestLaunchTemplate",
"OldestInstance"
],
"Tags": [
{
"Key": "Name",
"Value": "trading-ecs-instance",
"PropagateAtLaunch": true
},
{
"Key": "Environment",
"Value": "production",
"PropagateAtLaunch": true
}
]
}
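Before sending a configuration like the one above to AWS, it can help to sanity-check the capacity numbers locally. The sketch below is an illustrative helper (the name `capacity_problems` is an assumption, not part of any AWS SDK) that checks the basic rule that the desired capacity must sit between the minimum and maximum size.

```python
def capacity_problems(config: dict) -> list[str]:
    """Return human-readable problems with the ASG capacity settings."""
    problems = []
    if config["MinSize"] > config["MaxSize"]:
        problems.append("MinSize exceeds MaxSize")
    desired = config.get("DesiredCapacity")
    if desired is not None and not (config["MinSize"] <= desired <= config["MaxSize"]):
        problems.append("DesiredCapacity is outside [MinSize, MaxSize]")
    return problems


if __name__ == "__main__":
    # Values taken from the trading-ecs-asg configuration above.
    asg = {"MinSize": 1, "MaxSize": 5, "DesiredCapacity": 2}
    print(capacity_problems(asg) or "capacity settings look consistent")
```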
{
"PolicyName": "trading-cpu-scale-out",
"PolicyType": "TargetTrackingScaling",
"TargetTrackingConfiguration": {
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
},
"TargetValue": 70.0,
"DisableScaleIn": false
}
}
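To build intuition for what a target tracking policy does with a 70% CPU target, here is a simplified model. It assumes capacity is adjusted in proportion to how far the metric is from the target and then clamped to the group's size limits; AWS does not publish the exact algorithm (scale-in in particular is more conservative), so treat the function `target_tracking_capacity` as a rough sketch rather than the service's behavior.

```python
import math


def target_tracking_capacity(
    current_capacity: int,
    metric_value: float,
    target_value: float = 70.0,
    min_size: int = 1,
    max_size: int = 5,
) -> int:
    """Estimate the capacity needed to bring the average metric back to target."""
    # If 2 instances run at 90% and the target is 70%, we need 2 * 90 / 70 instances.
    raw = math.ceil(current_capacity * metric_value / target_value)
    # The group never goes below MinSize or above MaxSize.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    for capacity, cpu in ((2, 90.0), (2, 35.0), (4, 95.0)):
        print(capacity, cpu, "->", target_tracking_capacity(capacity, cpu))
```

The last case shows the clamp at work: the proportional estimate asks for 6 instances, but `MaxSize` caps the group at 5.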
{
"PolicyName": "trading-custom-scale-out",
"PolicyType": "TargetTrackingScaling",
"TargetTrackingConfiguration": {
"CustomizedMetricSpecification": {
"MetricName": "TradingRequestsPerSecond",
"Namespace": "TradingSystem",
"Statistic": "Average",
"Dimensions": [
{
"Name": "Environment",
"Value": "production"
}
]
},
"TargetValue": 100.0
}
}
{
"PolicyName": "trading-step-scale-out",
"PolicyType": "StepScaling",
"StepAdjustments": [
{
"MetricIntervalLowerBound": 0,
"MetricIntervalUpperBound": 50,
"ScalingAdjustment": 1
},
{
"MetricIntervalLowerBound": 50,
"ScalingAdjustment": 2
}
],
"AdjustmentType": "ChangeInCapacity",
"EstimatedInstanceWarmup": 300
}
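The step adjustments above are expressed relative to the alarm threshold, which is easy to misread. The sketch below shows how a breach maps to an adjustment, assuming the documented convention that the lower bound is inclusive and the upper bound exclusive. The threshold of 100 in the demo is hypothetical, since the policy above does not say which alarm or metric drives it.

```python
STEP_ADJUSTMENTS = [
    {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 50, "ScalingAdjustment": 1},
    {"MetricIntervalLowerBound": 50, "ScalingAdjustment": 2},
]


def step_adjustment(metric_value: float, alarm_threshold: float, steps=STEP_ADJUSTMENTS) -> int:
    """Return how many instances to add for a given metric value."""
    breach = metric_value - alarm_threshold  # bounds are offsets from the threshold
    for step in steps:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= breach < upper:
            return step["ScalingAdjustment"]
    return 0  # metric is below the threshold: no step applies


if __name__ == "__main__":
    threshold = 100.0  # hypothetical alarm threshold, e.g. requests per second
    for value in (90.0, 120.0, 150.0, 400.0):
        print(value, "->", step_adjustment(value, threshold))
```

One consequence worth noticing: if the alarm were on CPU percentage with a threshold of 70, the breach could never reach 50, so the second step would never fire. Step bounds need to be chosen with the metric's range in mind.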
# Create the Auto Scaling Group
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name trading-ecs-asg \
--launch-template LaunchTemplateName=trading-ecs-template,Version='$Latest' \
--min-size 1 \
--max-size 5 \
--desired-capacity 2 \
--default-cooldown 300 \
--health-check-type ELB \
--health-check-grace-period 300 \
--vpc-zone-identifier "subnet-12345678,subnet-87654321" \
--target-group-arns "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/trading-tg/1234567890123456" \
--tags "Key=Name,Value=trading-ecs-instance,PropagateAtLaunch=true"
# Create the target tracking policy
aws autoscaling put-scaling-policy \
--auto-scaling-group-name trading-ecs-asg \
--policy-name trading-cpu-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
},
"TargetValue": 70.0,
"DisableScaleIn": false
}' \
--estimated-instance-warmup 300
# Create a high-CPU alarm scoped to the Auto Scaling Group
aws cloudwatch put-metric-alarm \
--alarm-name "trading-high-cpu" \
--alarm-description "High CPU utilization" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--dimensions Name=AutoScalingGroupName,Value=trading-ecs-asg \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--alarm-actions "arn:aws:sns:us-east-1:123456789012:trading-alerts"
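The alarm above fires only after two consecutive 5-minute periods with average CPU above 80%. The following sketch models that evaluation rule in a simplified way; it ignores CloudWatch's missing-data handling and "M out of N" datapoint options, and the function name `alarm_state` is made up for this example.

```python
def alarm_state(datapoints: list[float], threshold: float = 80.0, evaluation_periods: int = 2) -> str:
    """Evaluate a GreaterThanThreshold alarm over the most recent periods."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # Every one of the last N periods must breach for the alarm to fire.
    return "ALARM" if all(value > threshold for value in recent) else "OK"


if __name__ == "__main__":
    print(alarm_state([60.0, 85.0, 90.0]))  # two breaching periods in a row
    print(alarm_state([85.0, 90.0, 75.0]))  # latest period recovered
    print(alarm_state([90.0]))              # not enough data yet
```

Requiring two periods means a single short CPU spike will not page anyone, at the cost of roughly ten minutes of delay before a sustained problem is reported.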
graph TD
A[Time of day] --> B[Asian trading session<br/>00:00-08:00 UTC]
A --> C[European trading session<br/>08:00-16:00 UTC]
A --> D[American trading session<br/>16:00-24:00 UTC]
B --> E[Desired: 3<br/>Min: 2, Max: 5]
C --> F[Desired: 4<br/>Min: 3, Max: 8]
D --> G[Desired: 2<br/>Min: 1, Max: 4]
style E fill:#e3f2fd
style F fill:#ffcccc
style G fill:#e8f5e8
# Create a predictive scaling policy
aws autoscaling put-scaling-policy \
--auto-scaling-group-name trading-ecs-asg \
--policy-name trading-predictive-scaling \
--policy-type PredictiveScaling \
--predictive-scaling-configuration '{
"MetricSpecifications": [
{
"TargetValue": 70.0,
"PredefinedMetricPairSpecification": {
"PredefinedMetricType": "ASGCPUUtilization"
}
}
],
"Mode": "ForecastOnly",
"SchedulingBufferTime": 300
}'
# Scale out before the trading day starts (cron times are UTC by default)
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name trading-ecs-asg \
--scheduled-action-name trading-market-open \
--recurrence "0 8 * * MON-FRI" \
--desired-capacity 4 \
--min-size 2 \
--max-size 8
# Scale in after the trading day ends
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name trading-ecs-asg \
--scheduled-action-name trading-market-close \
--recurrence "0 22 * * MON-FRI" \
--desired-capacity 2 \
--min-size 1 \
--max-size 5
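To double-check which settings the two scheduled actions above leave in effect at a given moment, here is a small sketch. It assumes both cron expressions are evaluated in UTC (the default when no time zone is set) and that at least one action has already fired; the function name `scheduled_capacity` is illustrative.

```python
from datetime import datetime, timezone

MARKET_OPEN = {"DesiredCapacity": 4, "MinSize": 2, "MaxSize": 8}   # "0 8 * * MON-FRI"
MARKET_CLOSE = {"DesiredCapacity": 2, "MinSize": 1, "MaxSize": 5}  # "0 22 * * MON-FRI"


def scheduled_capacity(moment: datetime) -> dict:
    """Return the settings applied by the most recent scheduled action.

    `moment` must be timezone-aware; it is converted to UTC before comparing.
    """
    utc = moment.astimezone(timezone.utc)
    is_weekday = utc.weekday() < 5  # Monday=0 ... Friday=4
    if is_weekday and 8 <= utc.hour < 22:
        return MARKET_OPEN
    # Nights and weekends keep the values set by the last market-close action.
    return MARKET_CLOSE


if __name__ == "__main__":
    wednesday_morning = datetime(2024, 1, 3, 9, 0, tzinfo=timezone.utc)
    saturday_noon = datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)
    print(scheduled_capacity(wednesday_morning))
    print(scheduled_capacity(saturday_noon))
```

Scheduled actions only set the bounds and the starting desired capacity; dynamic policies can still move the desired capacity within `MinSize` and `MaxSize` afterwards.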
graph TD
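A[Instance health checks] --> B[EC2 Health Check<br/>Instance status]
A --> C[ELB Health Check<br/>Application response]
A --> D[Custom Health Check<br/>Custom business logic]
B --> E{Instance OK?}
C --> F{Application responding?}
D --> G{Trading functions OK?}
E --> |No| H[Mark as unhealthy]
F --> |No| H
G --> |No| H
H --> I[Terminate and replace instance]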
style H fill:#ffcccc
style I fill:#fff3e0
#!/bin/bash
# /opt/trading/health-check.sh
# Check that the ECS agent container is running
if ! docker ps | grep -q ecs-agent; then
    echo "ECS Agent not running"
    exit 1
fi
# Check the trading service health endpoint (time out after 5 seconds)
if ! curl -f --max-time 5 http://localhost:8080/health > /dev/null 2>&1; then
    echo "Trading service unhealthy"
    exit 1
fi
# Check connectivity to the Bybit API (time out after 5 seconds)
if ! curl -f --max-time 5 https://api.bybit.com/v2/public/time > /dev/null 2>&1; then
    echo "Cannot connect to Bybit API"
    exit 1
fi
echo "All checks passed"
exit 0
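A script like the one above only returns an exit code; something still has to tell the Auto Scaling Group about a failure. One possible approach is sketched below: it assumes a boto3 Auto Scaling client is passed in and uses its `set_instance_health` call. The function name `report_health`, the check names, and the instance ID are hypothetical, and a stand-in client is used so the sketch runs without AWS credentials.

```python
def report_health(autoscaling_client, instance_id: str, check_results: dict) -> str:
    """Mark the instance Unhealthy in its ASG if any custom check failed."""
    failed = [name for name, passed in check_results.items() if not passed]
    if not failed:
        return "Healthy"  # nothing to report; leave the ASG's own view untouched
    autoscaling_client.set_instance_health(
        InstanceId=instance_id,
        HealthStatus="Unhealthy",
        ShouldRespectGracePeriod=True,  # do not replace instances that are still booting
    )
    return "Unhealthy"


class PrintingClient:
    """Stand-in for boto3.client("autoscaling") so the example runs offline."""

    def set_instance_health(self, **kwargs):
        print("set_instance_health called with", kwargs)


if __name__ == "__main__":
    results = {"ecs_agent": True, "trading_service": False, "bybit_api": True}
    print(report_health(PrintingClient(), "i-0123456789abcdef0", results))
```

Be careful about which checks feed this call: if an outage of an external API marks every instance unhealthy at once, the group would replace healthy machines without fixing anything.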
graph TB
A[ASG monitoring metrics] --> B[Capacity metrics]
A --> C[Activity metrics]
A --> D[Health metrics]
B --> E[GroupDesiredCapacity<br/>GroupInServiceInstances<br/>GroupPendingInstances]
C --> F[Scaling activity history<br/>Launch successes and failures<br/>Terminate successes and failures]
D --> G[HealthyHostCount<br/>UnHealthyHostCount]
style E fill:#e3f2fd
style F fill:#f3e5f5
style G fill:#fff3e0
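# Monitor unhealthy instances. AWS/AutoScaling does not publish an UnhealthyInstances
# metric, so this uses the load balancer's UnHealthyHostCount (assuming an Application
# Load Balancer); the LoadBalancer dimension value below is a placeholder.
aws cloudwatch put-metric-alarm \
--alarm-name "trading-unhealthy-instances" \
--alarm-description "Unhealthy instances in ASG" \
--metric-name UnHealthyHostCount \
--namespace AWS/ApplicationELB \
--statistic Maximum \
--period 60 \
--threshold 1 \
--comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 1 \
--dimensions Name=TargetGroup,Value=targetgroup/trading-tg/1234567890123456 "Name=LoadBalancer,Value=app/LOAD_BALANCER_NAME/LOAD_BALANCER_ID"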
# Get notified when scaling activities fail. AWS/AutoScaling has no ScalingActivities
# metric to alarm on, so subscribe the SNS topic to launch and terminate errors instead.
aws autoscaling put-notification-configuration \
--auto-scaling-group-name trading-ecs-asg \
--topic-arn "arn:aws:sns:us-east-1:123456789012:trading-alerts" \
--notification-types "autoscaling:EC2_INSTANCE_LAUNCH_ERROR" "autoscaling:EC2_INSTANCE_TERMINATE_ERROR"
graph LR
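A[Scaling triggered] --> B[Launch new instance]
B --> C[Instance boots<br/>2-3 minutes]
C --> D[Application initializes<br/>1-2 minutes]
D --> E[Health check passes<br/>1 minute]
E --> F[Starts handling traffic]
F --> G[Cooldown begins<br/>5 minutes]
G --> H[Can scale again]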
style G fill:#ffffcc
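Adding up the phases in the timeline above gives a feel for how long one scale-out cycle takes, and whether the 300-second health check grace period used earlier is long enough. The sketch below is plain arithmetic on the diagram's estimates; the durations are the post's rough figures, not measurements.

```python
# (min, max) duration of each phase in minutes, taken from the timeline diagram.
PHASES_MINUTES = {
    "instance boot": (2, 3),
    "application initialization": (1, 2),
    "health check": (1, 1),
    "cooldown": (5, 5),
}


def total_range(phases: dict) -> tuple[int, int]:
    """Sum the best-case and worst-case durations of the given phases."""
    return (
        sum(low for low, _ in phases.values()),
        sum(high for _, high in phases.values()),
    )


if __name__ == "__main__":
    ready = {name: span for name, span in PHASES_MINUTES.items() if name != "cooldown"}
    ready_min, ready_max = total_range(ready)
    cycle_min, cycle_max = total_range(PHASES_MINUTES)
    print(f"Serving traffic after {ready_min}-{ready_max} minutes")
    print(f"Next scaling possible after {cycle_min}-{cycle_max} minutes")
    grace_period_seconds = 300  # HealthCheckGracePeriod from the ASG configuration
    if ready_max * 60 > grace_period_seconds:
        print("Worst-case startup exceeds the grace period; consider raising it")
```

With these estimates an instance needs 4 to 6 minutes before it serves traffic, so a 300-second grace period leaves no margin in the slow case.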
Recommended cooldown period:
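Termination policies are evaluated in order: instances on the oldest launch template version are terminated first, then the oldest instance, and finally the default policy applies.
{
"TerminationPolicies": [
"OldestLaunchTemplate",
"OldestInstance",
"Default"
]
}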
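The ordering of those policies can be pictured as a sort: oldest version first, then oldest launch time as the tie-breaker. The sketch below is a simplified model with made-up instance records (`id`, `template_version`, `launch_time` are hypothetical field names); the real service also balances instances across Availability Zones before applying these policies.

```python
from datetime import datetime


def pick_instance_to_terminate(instances: list[dict]) -> str:
    """Choose the instance with the oldest version, breaking ties by launch time."""
    victim = min(instances, key=lambda inst: (inst["template_version"], inst["launch_time"]))
    return victim["id"]


if __name__ == "__main__":
    fleet = [
        {"id": "i-aaa", "template_version": 3, "launch_time": datetime(2024, 1, 1, 8, 0)},
        {"id": "i-bbb", "template_version": 2, "launch_time": datetime(2024, 1, 2, 8, 0)},
        {"id": "i-ccc", "template_version": 2, "launch_time": datetime(2024, 1, 1, 9, 0)},
    ]
    print(pick_instance_to_terminate(fleet))
```

Here `i-ccc` is chosen: it shares the oldest version with `i-bbb` but was launched earlier, even though `i-aaa` is the oldest instance overall.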
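# Deploy across multiple Availability Zones (one subnet in each of us-east-1a, us-east-1b, us-east-1c)
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name trading-ecs-asg \
--launch-template LaunchTemplateName=trading-ecs-template,Version='$Latest' \
--min-size 1 \
--max-size 5 \
--vpc-zone-identifier "subnet-1a,subnet-1b,subnet-1c" \
--availability-zones "us-east-1a" "us-east-1b" "us-east-1c"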
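When a group spans several Availability Zones, the ASG tries to keep the instance count balanced between them. As a rough illustration (not the service's actual placement algorithm), the sketch below spreads a desired capacity as evenly as possible over the three zones used in this example; `spread_across_zones` is a made-up helper name.

```python
def spread_across_zones(desired_capacity: int, zones: list[str]) -> dict[str, int]:
    """Distribute instances so that zone counts differ by at most one."""
    base, extra = divmod(desired_capacity, len(zones))
    # The first `extra` zones receive one additional instance each.
    return {zone: base + (1 if index < extra else 0) for index, zone in enumerate(zones)}


if __name__ == "__main__":
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
    for capacity in (2, 4, 5):
        print(capacity, spread_across_zones(capacity, zones))
```

With a desired capacity of 2, one zone stays empty, which is worth keeping in mind when choosing `MinSize` for a system that should survive a zone outage.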
Instances fail to launch
# Check the ASG activity history
aws autoscaling describe-scaling-activities \
--auto-scaling-group-name trading-ecs-asg
Health checks fail
# Check the Target Group health status
aws elbv2 describe-target-health \
--target-group-arn arn:aws:elasticloadbalancing:...
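Scaling policy does not trigger
# Check the CloudWatch metric for the group
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=AutoScalingGroupName,Value=trading-ecs-asg \
--start-time 2023-01-01T00:00:00Z \
--end-time 2023-01-01T01:00:00Z \
--period 300 \
--statistics Average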
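Today we learned about Auto Scaling Groups, a smart labor scheduling system that automatically adjusts the number of instances to match demand. Just as a farm adjusts its workforce to the season and the workload, an ASG helps us optimize cost while maintaining service quality.
Key concepts recap:
Tomorrow we will learn about Elastic IP, which gives our system a fixed public IP address!
Next: Day 10 - AWS Elastic IP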