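Error-control mechanisms such as ignore_errors, failed_when, and changed_when
Structured error handling with block/rescue/always

When we actually work against a production environment, exceptions and errors are bound to happen sooner or later.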
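As the saying goes, a system that nobody uses never blows up. If you claim that none of your systems ever blow up, well... all I can say is that you are a god among gods 😆
If we do no error handling at all, a small problem can easily abort the whole deployment and trigger other unexpected behavior.
For example: if the disk on one machine fills up, a play without error handling can bring the update of all 50 machines to a halt. With error handling in place, the other 49 machines are updated as expected, and we only need to fix the one machine that had the problem.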
Here is the simplest way to handle an error:
---
- name: Deploy application with error handling
  hosts: web
  become: yes
  tasks:
    # ignore_errors lets the play continue even if this step fails,
    # for example when the service was not running in the first place
    - name: Stop application (might not be running)
      service:
        name: myapp
        state: stopped
      ignore_errors: yes

    - name: Update application files
      copy:
        src: /tmp/app/
        dest: /opt/myapp/

    - name: Start application
      service:
        name: myapp
        state: started
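When to use it: when a step may fail for a harmless reason, such as stopping a service that was not running in the first place.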
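An ignored error is still recorded, so later steps can react to it. The following is a minimal sketch, assuming the same `myapp` service as above; the variable name `stop_result` and the debug message are hypothetical and only serve as illustration.

```yaml
- name: Stop application (might not be running)
  service:
    name: myapp
    state: stopped
  register: stop_result   # hypothetical variable name
  ignore_errors: yes

# Runs only when the previous step failed and the failure was ignored
- name: Report the ignored failure
  debug:
    msg: "Stopping myapp failed on {{ inventory_hostname }}, continuing anyway"
  when: stop_result is failed
```

This keeps the failure visible in the run output instead of hiding it silently.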
So can we decide for ourselves under which conditions a step counts as failed? That is what failed_when is for:
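- name: Check disk space
  shell: df -h / | tail -1 | awk '{print $5}' | sed 's/%//'
  register: disk_usage
  failed_when: disk_usage.stdout | int > 90  # fail only when disk usage is above 90%

- name: Check if service is responding
  uri:
    url: "http://{{ inventory_hostname }}/health"
    method: GET
    return_content: yes  # required so that health_check.content is populated
  register: health_check
  # Fail if the HTTP status is not 200, or if the body does not contain 'OK'.
  # A YAML list under failed_when is joined with AND, so the OR is spelled out.
  failed_when: "health_check.status != 200 or 'OK' not in health_check.content"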
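Another common use of failed_when is a command whose non-zero exit code is not really an error. As a small sketch: `grep` exits with 1 when it simply finds no match and with 2 or higher on a real error. The log path `/var/log/myapp.log` and the variable name `grep_result` are assumptions made up for this example.

```yaml
- name: Count ERROR lines in the application log
  command: grep -c ERROR /var/log/myapp.log   # hypothetical log path
  register: grep_result
  changed_when: false            # reading a file changes nothing
  failed_when: grep_result.rc > 1   # rc 1 only means "no match"

- name: Show how many ERROR lines were found
  debug:
    msg: "Found {{ grep_result.stdout }} ERROR lines"
```

Without the failed_when line, a clean log with zero matches would be reported as a failed task.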
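For modules such as command and shell, Ansible reports changed every time the step runs successfully, even when the step did not actually change anything.
In that case we can use changed_when to define our own condition for what counts as a change.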
- name: Run database migration
  command: php /opt/app/migrate.php
  register: migration_result
  changed_when: "'Applied' in migration_result.stdout"  # changed only if a migration was actually applied

- name: Check configuration syntax
  command: nginx -t
  register: nginx_syntax
  changed_when: false  # a syntax check never counts as a change
  failed_when: nginx_syntax.rc != 0
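One reason an accurate changed status matters is that handlers run only when a notifying task reports changed. Below is a rough sketch of that interaction, reusing the migration task from above. Pairing it with the `myapp` service and the handler name `Restart myapp` are assumptions for illustration, not part of the original example.

```yaml
---
- name: changed_when decides whether the handler fires
  hosts: web
  become: yes
  tasks:
    - name: Run database migration
      command: php /opt/app/migrate.php
      register: migration_result
      # Reported as changed only when the script printed 'Applied'
      changed_when: "'Applied' in migration_result.stdout"
      notify: Restart myapp

  handlers:
    # Runs at the end of the play, and only if the task above reported changed
    - name: Restart myapp
      service:
        name: myapp
        state: restarted
```

Without changed_when, the command would report changed on every run and the service would be restarted each time.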
block/rescue/always works much like try/except/finally in Python:
---
- name: Database backup with error handling
  hosts: db
  tasks:
    - name: Database operations with recovery
      block:
        # The normal tasks
        - name: Stop database service
          service:
            name: mysql
            state: stopped

        - name: Backup database files
          archive:
            path: /var/lib/mysql
            dest: "/backup/mysql-{{ ansible_date_time.epoch }}.tar.gz"

        - name: Update database config
          template:
            src: my.cnf.j2
            dest: /etc/mysql/my.cnf
            backup: yes
          register: config_update  # backup_file holds the path of the saved copy

      rescue:
        # What if something goes wrong? Put the recovery steps here
        - name: Restore original config
          copy:
            src: "{{ config_update.backup_file }}"
            dest: /etc/mysql/my.cnf
            remote_src: yes
          when: config_update.backup_file is defined  # only if a backup was written
          ignore_errors: yes

        - name: Send alert
          mail:
            to: admin@company.com
            subject: "Database config update failed on {{ inventory_hostname }}"
            body: "Please check the database server immediately."

        - name: Fail the play
          fail:
            msg: "Database update failed, rolled back to original config"

      always:
        # Runs whether the block succeeded or failed
        - name: Start database service
          service:
            name: mysql
            state: started

        - name: Verify database is running
          wait_for:
            port: 3306
            timeout: 30
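To watch the control flow in isolation, here is a minimal sketch that fails on purpose and can be run against localhost. The play and task names are made up for the demo; `ansible_failed_task` and `ansible_failed_result` are the variables Ansible sets inside a rescue section.

```yaml
---
- name: Minimal block/rescue/always demo
  hosts: localhost
  gather_facts: no
  tasks:
    - name: Try, recover, clean up
      block:
        - name: A step that fails on purpose
          fail:
            msg: "simulated failure"

        - name: Never reached, because the previous task failed
          debug:
            msg: "skipped"

      rescue:
        # ansible_failed_task / ansible_failed_result describe the task that failed
        - name: Show what went wrong
          debug:
            msg: "Task '{{ ansible_failed_task.name }}' failed: {{ ansible_failed_result.msg }}"

      always:
        - name: Runs whether or not the block failed
          debug:
            msg: "cleanup"
```

Note that when the rescue section finishes without error, the host is treated as recovered and the play continues. That is why the database example ends its rescue section with an explicit fail task.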
- name: Web application deployment with rollback
  block:
    - name: Download new version
      get_url:
        url: "{{ app_download_url }}"
        dest: /tmp/app-new.tar.gz

    - name: Create extraction directory (unarchive needs dest to exist)
      file:
        path: /tmp/app-new
        state: directory

    - name: Extract new version
      unarchive:
        src: /tmp/app-new.tar.gz
        dest: /tmp/app-new/
        remote_src: yes

    - name: Stop current application
      service:
        name: webapp
        state: stopped

    - name: Backup current version
      command: mv /opt/webapp /opt/webapp.backup

    - name: Deploy new version
      command: mv /tmp/app-new /opt/webapp

    - name: Start application
      service:
        name: webapp
        state: started

    - name: Health check
      uri:
        url: "http://localhost/health"
        status_code: 200
      register: health
      until: health is succeeded  # on older Ansible releases retries has no effect without until
      retries: 3
      delay: 10

  rescue:
    - name: Rollback to previous version
      block:
        - name: Stop failed version
          service:
            name: webapp
            state: stopped
          ignore_errors: yes

        # Remove the failed version first, otherwise mv would place the
        # backup inside /opt/webapp. Skipped when no backup was taken.
        - name: Restore backup
          shell: rm -rf /opt/webapp && mv /opt/webapp.backup /opt/webapp
          args:
            removes: /opt/webapp.backup

        - name: Start restored version
          service:
            name: webapp
            state: started

        - name: Verify rollback success
          uri:
            url: "http://localhost/health"
            status_code: 200

      rescue:
        - name: Emergency notification
          debug:
            msg: "CRITICAL: Both deployment and rollback failed!"

  always:
    - name: Clean up temporary files
      file:
        path: "{{ item }}"
        state: absent
      loop:
        - /tmp/app-new.tar.gz
        - /tmp/app-new
      ignore_errors: yes
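The deployment above depends on a variable, `app_download_url`, that has to come from somewhere (inventory, vars, or `-e` on the command line). As a small sketch, a pre-flight check placed before the block can fail early with a readable message instead of failing halfway through the deployment. The wording of the message is only an example.

```yaml
- name: Make sure the download URL was provided
  assert:
    that:
      - app_download_url is defined
      - app_download_url | length > 0
    fail_msg: "app_download_url is not set, pass it with -e or define it in vars"
```

Because this check runs outside the block, a missing variable stops the host before anything is touched and does not trigger the rollback.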
Build a service deployment Playbook that includes the following error handling:
# Hint skeleton
tasks:
  - name: Stop service (ignore if not running)
    # ... ignore_errors: yes

  - name: Deploy application
    # ...

  - name: Start service
    # ... register: service_start

  - name: Health check
    # ... failed_when: custom condition
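Simulate a database maintenance job:
Once we know how to handle errors, we can run our Playbooks with much more confidence. Tomorrow, let's look at which handy Roles the Ansible community has to offer!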