Day-23 Spinlock 在 UART 使用與實作

2022 iThome 鐵人賽

DAY 24

Software Development

與作業系統的第一類接觸 : 探索 xv6系列第 24 篇

14th鐵人賽 xv6 risc-v 作業系統

吳他，惟手熟爾

團隊那團名要叫什麼？

2022-10-09 19:32:15

1307 瀏覽

分享至

前言

前面我們介紹了多 Process 會遇到的 race condition 的問題，以及如何使用 lock 的機制進行處理，後面提到了 xv6 中 Spinlock 的相關實作，在本文中，我們將看到 Spinlock 在 UART 中使用的實際案例，並且在最後比較避免 Deadlock 使用 Disable Interrupt 和 Spinlock 兩種策略之間的比較。

UART 中使用 lock

struct spinlock uart_tx_lock;
#define UART_TX_BUF_SIZE 32
char uart_tx_buf[UART_TX_BUF_SIZE];
uint64 uart_tx_w; // write next to uart_tx_buf[uart_tx_w % UART_TX_BUF_SIZE]
uint64 uart_tx_r; // read next from uart_tx_buf[uart_tx_r % UART_TX_BUF_SIZE]

可以看到 UART 中有一個稱為 uart_tx_lock 使用 spinlock 實現，而這個 lock 保護了 buffer 以及讀和寫的指標。

而下面我們看到uartputc()中 lock 的使用

void
uartputc(int c)
{
  acquire(&uart_tx_lock);

  if(panicked){
    for(;;)
      ;
  }
  while(uart_tx_w == uart_tx_r + UART_TX_BUF_SIZE){
    // buffer is full.
    // wait for uartstart() to open up space in the buffer.
    sleep(&uart_tx_r, &uart_tx_lock);
  }
  uart_tx_buf[uart_tx_w % UART_TX_BUF_SIZE] = c;
  uart_tx_w += 1;
  uartstart();
  release(&uart_tx_lock);
}

首先會先獲得 uart_tx_lock 這個 lock，接著會檢查 buffer 是否還有空間，如果有的話，write 指標+1，接著呼叫uartstart()，接著將 lock 釋放。

而假設現在有一個並行的情況，像是兩個 thread 都呼叫到了 uartputc()，那麼這裡 lock 會確保 thread A 會寫入到 buffer 空閒空間的第一個欄位，thread B 會寫入到 buffer 空閒空間的第二個欄位。而在輸出端我們就可以看到先輸出 thread A 傳送的內容，接著是 thread B 傳送的內容，這樣就避免了 race condition 的問題發生。

如果沒有 lock 去避免掉 race condition 的問題，可能會發生 thread B 複寫掉 thread A 寫入到 buffer 的內容，接著在輸出時我們就看不見 thread A 所寫入的內容了。

接著我們看到uartstart()

void
uartstart()
{
  while(1){
    if(uart_tx_w == uart_tx_r){
      // transmit buffer is empty.
      return;
    }
    
    if((ReadReg(LSR) & LSR_TX_IDLE) == 0){
      // the UART transmit holding register is full,
      // so we cannot give it another byte.
      // it will interrupt when it's ready for a new byte.
      return;
    }
    
    int c = uart_tx_buf[uart_tx_r % UART_TX_BUF_SIZE];
    uart_tx_r += 1;
    
    // maybe uartputc() is waiting for space in the buffer.
    wakeup(&uart_tx_r);
    
    WriteReg(THR, c);
  }
}

先通過 uart_tx_w 和 uart_tx_r 判斷 buffer 不是空的 (在 UART 中有詳細介紹此部分)，接著，上面uartputc()的 lock 確保了只有一個 process 或是 thread 可以向 THR 暫存器一個一個字元的寫入資料，而當 UART 完成寫入或是傳輸操作時，這時候會產生 Interrupt。

而在 Interrupt 中我們也需要 lock，原因為假設 CPU 0 呼叫了 printf，和其他的 CPU 並行執行，而 CPU 1 去處理 UART 的 Interrupt，而 UART 的 Interrupt Handler 為 uartintr()，如果我們展開uartintr()的程式碼可以發現到以下

void
uartintr(void)
{
  // read and process incoming characters.
  while(1){
    int c = uartgetc();
    if(c == -1)
      break;
    consoleintr(c);
  }

  // send buffered characters.
  acquire(&uart_tx_lock);
  uartstart();
  release(&uart_tx_lock);
}

可以發現 uartintr() 也會呼叫 uartstart()，而這時候 CPU 0 和 CPU 1 都會呼叫 uartstart()，而為了確保 THR 暫存器在同一時間只能被一個 thread 或是 process 存取，我們需要在 Interrupt 獲得 lock，確保 THR 只有一個 thread 或是 process 在進行存取，同時也保證 write 和 read 指標的特性。

綜合上面來看，基本上uartstart() 的 caller 都需要獲得 lock，以確保 THR 同一時間只能被一個 thread 或 process 存取。

而也因為 lock，UART 的 Top 和 Bottom 實際上是可以並行執行的。

為什麼關閉中斷可以避免 Dead lock ?

我們在acquire()中可以看到在最一開始的使用，使用push_off()停用中斷。

void
acquire(struct spinlock *lk)
{
  push_off(); // disable interrupts to avoid deadlock.
  if(holding(lk))
    panic("acquire");

  // On RISC-V, sync_lock_test_and_set turns into an atomic swap:
  //   a5 = 1
  //   s1 = &lk->locked
  //   amoswap.w.aq a5, a5, (s1)
  while(__sync_lock_test_and_set(&lk->locked, 1) != 0)
    ;

  // Tell the C compiler and the processor to not move loads or stores
  // past this point, to ensure that the critical section's memory
  // references happen strictly after the lock is acquired.
  // On RISC-V, this emits a fence instruction.
  __sync_synchronize();

  // Record info about lock acquisition for holding() and debugging.
  lk->cpu = mycpu();
}

而我們可以看看如果沒有停用中斷為什麼會造成 Dead lock。

我們可以看到uart.c中的uartputc()

void
uartputc(int c)
{
  acquire(&uart_tx_lock);

  if(panicked){
    for(;;)
      ;
  }

  while(1){
    if(uart_tx_w == uart_tx_r + UART_TX_BUF_SIZE){
      // buffer is full.
      // wait for uartstart() to open up space in the buffer.
      sleep(&uart_tx_r, &uart_tx_lock);
    } else {
      uart_tx_buf[uart_tx_w % UART_TX_BUF_SIZE] = c;
      uart_tx_w += 1;
      uartstart();
      release(&uart_tx_lock);
      return;
    }
  }
}

在uartputc()最一開始的時候會獲得 lock，UART 的功用是一個一個的傳送字元 (serial communication)，當 UART 完成了字元傳輸之後，會產生出 Interrupt，接著執行 UART 的 Interrupt Handler ，也就是uartintr()，而在上面我們可以看到uartintr()也會去嘗試獲得同一把 lock，而這時候便發生了uartputc()持有 uart_tx_lock，而uartintr()要求 uart_tx_lock。

假設我們位於 CPU 0，那麼上面這個情況就會形成 dead lock，CPU 0 嘗試取得同一把 lock，uart_tx_lock，因此上面這個情況會被 xv6 偵測到，並且觸發 panic。

所以我們在 spinlock 中可以看到我們需要處理幾種情況

不同 CPU 之間的並行執行 (如 printf 的例子)
CPU 上 Interrupt Handler 和一般程式之間的並行執行 (如上面的例子)

memory ordering 與 out-of-order-execution

out-of-order-execution 中文為亂序執行，CPU 會根據資料來決定程式執行的順序，而不是按照程式指令的順序來執行，而這樣帶來的好處是程式執行的速度會提升，看到下面來自於 wiki 的例子

b = a * 5
v = *b
c = a + 3

在上面這個例子中，有可能第1行和第3行並行執行，接著在執行第2行，這樣的方式節省了 v 在等待 b 計算得到結果時的等待時間。這樣的優化在 CPU 以及編譯器中可以看到類似的行為，在不改變執行結果的條件之下改變其指令的執行順序。

但是對於並形程式來說，亂序執行可能會造成一些非常嚴重的後果，考慮以下情況

acquire(lk)
...some code
release(lk)

中間的程式碼部分被 lock 所保護，為 critical section，如果這一段程式碼經過了亂序排列，可能就會導致指令的原子性遭到破壞，並得到完全錯誤的結果，而為了避免這樣的情況發生，我們在上面 lock 提供的 API，無論是 acquire() 又或者是 reelease() 都會看到一段程式碼，__sync_synchronize()，其背後代表的意義就是任何在__sync_synchronize()之前的程式碼片段，無論是 load/store，都不能夠移動到__sync_synchronize()之後。

可以把__sync_synchronize()看作是一個界線的概念，以下面release()為例子

void
release(struct spinlock *lk)
{
  if(!holding(lk))
    panic("release");

  lk->cpu = 0;

  // Tell the C compiler and the CPU to not move loads or stores
  // past this point, to ensure that all the stores in the critical
  // section are visible to other CPUs before the lock is released,
  // and that loads in the critical section occur strictly before
  // the lock is released.
  // On RISC-V, this emits a fence instruction.
  __sync_synchronize();

  // Release the lock, equivalent to lk->locked = 0.
  // This code doesn't use a C assignment, since the C standard
  // implies that an assignment might be implemented with
  // multiple store instructions.
  // On RISC-V, sync_lock_release turns into an atomic swap:
  //   s1 = &lk->locked
  //   amoswap.w zero, zero, (s1)
  __sync_lock_release(&lk->locked);

  pop_off();
}

lk->cpu = 0這一段程式碼就不會因為 memory ordering 的關係，跑到 __sync_synchronize()之後了。

避免 Deadlock, disable interrupt vs Spinlock

上面我們看到了通過關閉中斷，我們可以避免 Deadlock 的產生，而在昨天，我們看到了 Spinlock 也同樣可以避免 Deadlock 的產生，而下面我們針對這兩種方式進行比較。

對於單一 CPU 的情況

disable Interrupt:
- 只有在 supervisor mode 中才能夠 disable interrupt
- 會增加 interrupt 的延遲時間，使 interrupt 進入 pending 狀態
Spinlock
- 浪費 CPU 資源，在沒有獲取 lock 時，會進入到無限迴圈，在 xv6 中 Spinclock 的實作中可以看到

對於多個 CPU 的情況

disable Interrupt:
- 因為有多個 CPU 的存在，disable Interrupt 會在如果一個 Process 需要多個 hart 或是 Thread 時失去作用，因為可能由其他的 Thread 或是 hart 發起 Interrupt
Spinlock
- 浪費 CPU 資源，使不使用 Spinlock 是一個取捨問題，我們如果要求效率，使得 Process 能夠並行處理，則我們使用越少 lock 的機制越好，但如果我們使用了 Spinlock，會讓並行的情況變成串行。但我們希望能夠加速 Process 的處理，同時又要確保資料的正確性，我們又需要使用到 Spinlock。