[6.1810][code] xv6's File System (3): Logging Layer

xv6-riscv

wtommy_fdgkhdkgh 2026-05-07 23:58:39 ‧ 398 views


Series: [6.1810] Learning Basic Operating System Concepts with MIT 6.1810

Outline

kernel/log.c/struct logheader
kernel/log.c/struct log
kernel/log.c/initlog
kernel/log.c/install_trans
kernel/log.c/read_head
kernel/log.c/write_head
kernel/log.c/recover_from_log
kernel/log.c/begin_op
kernel/log.c/end_op
kernel/log.c/write_log
kernel/log.c/commit
kernel/log.c/log_write
kernel/log.c/Summary
Crash-recovery analysis: commit, recover_from_log

kernel/log.c/struct logheader

// Contents of the header block, used for both the on-disk header block
// and to keep track in memory of logged block# before commit.
struct logheader {
  int n;
  int block[LOGBLOCKS];
};

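n: the number of blocks currently logged.
block
- If block[x] = b, log block x holds the data that is to be written to disk block b (the letter b is used here so it is not confused with the field n). These notes call the destination a data block; it is the block's home location on disk.
- At commit time, the contents of log block x are written to data block b.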
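To make the mapping concrete, here is a small sketch of how a log slot relates to disk blocks. The helper names log_slot_block and home_block are made up for illustration and are not in kernel/log.c, and the LOGBLOCKS value is an assumption; the "+1" matches the log.start+tail+1 expression that appears later in install_trans.

```c
#include <assert.h>

#define LOGBLOCKS 30  /* assumed value; the real one comes from kernel/param.h */

struct logheader {
  int n;
  int block[LOGBLOCKS];
};

/* Disk block that stores log slot x, given the block number of the header.
   (hypothetical helper, not part of xv6) */
int log_slot_block(int logstart, int x) {
  return logstart + 1 + x;   /* +1 skips the header block itself */
}

/* Home block that log slot x will be installed to, or -1 if the slot is unused.
   (hypothetical helper, not part of xv6) */
int home_block(const struct logheader *lh, int x) {
  if (x < 0 || x >= lh->n)
    return -1;
  return lh->block[x];
}
```

For example, with the header at block 2 and block[] = {45, 7}, slot 1 is stored in disk block 4 and will eventually be copied to block 7.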

kernel/log.c/struct log

struct log {
  struct spinlock lock;
  int start;
  int outstanding; // how many FS sys calls are executing.
  int committing;  // in commit(), please wait.
  int dev;
  struct logheader lh;
};

struct spinlock lock: protects this structure so that multiple CPUs cannot access it at the same time.
int start: the block number on disk where the log begins.

int outstanding: the number of file-system system calls that have called begin_op and not yet called end_op.
int committing: whether a commit is in progress (the commit function is writing data to the log blocks and then to the data blocks).
int dev: the device number, identifying which physical disk or partition is in use.
struct logheader lh: the in-memory copy of the log header. As mentioned in an earlier note, the log itself lives on disk, right after the superblock. When needed, the on-disk logheader is read from disk and stored here.

kernel/log.c/initlog

void
initlog(int dev, struct superblock *sb)
{
  if (sizeof(struct logheader) >= BSIZE)
    panic("initlog: too big logheader");

  initlock(&log.lock, "log");
  log.start = sb->logstart;
  log.dev = dev;
  recover_from_log();
}

log.start: the disk block that holds the log header (logheader). It is also the first block of the logging layer's area on disk; the log blocks are placed right after the logheader block.
log.dev: the device number, identifying which physical disk or partition is in use. In xv6-riscv the root disk's device number is ROOTDEV (1).
recover_from_log(): replays any disk operations that were recorded (committed) in the log but not yet fully installed, by copying the log blocks to their corresponding data blocks.

kernel/log.c/install_trans

// Copy committed blocks from log to their home location
static void
install_trans(int recovering)
{

This function assumes that the logheader has already been loaded from disk into the in-memory copy (log.lh).

  int tail;

  for (tail = 0; tail < log.lh.n; tail++) {
    if(recovering) {
      printf("recovering tail %d dst %d\n", tail, log.lh.block[tail]);
    }
    struct buf *lbuf = bread(log.dev, log.start+tail+1); // read log block
    struct buf *dbuf = bread(log.dev, log.lh.block[tail]); // read dst
    memmove(dbuf->data, lbuf->data, BSIZE);  // copy block to dst
    bwrite(dbuf);  // write dst to disk
    if(recovering == 0)
      bunpin(dbuf);
    brelse(lbuf);
    brelse(dbuf);
  }
}

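This function installs the log blocks into their data blocks. When it is called from commit (recovering == 0), it also calls bunpin on each destination buffer, undoing the bpin done earlier in log_write.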
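The copy loop can be tried outside the kernel with a toy in-memory disk. This is only a sketch: the disk array, the tiny BSIZE, and the name toy_install are assumptions for illustration, and the real function goes through bread/bwrite/brelse rather than touching an array.

```c
#include <assert.h>
#include <string.h>

#define BSIZE     8   /* toy block size; xv6 uses 1024 */
#define NBLOCKS   16  /* toy disk size */
#define LOGBLOCKS 3   /* toy log capacity */

struct logheader { int n; int block[LOGBLOCKS]; };

char disk[NBLOCKS][BSIZE];   /* stands in for the real disk */

/* Copy each logged block to its home location, mirroring the loop in
   install_trans. logstart is the block number of the log header. */
void toy_install(int logstart, const struct logheader *lh) {
  for (int tail = 0; tail < lh->n; tail++)
    memmove(disk[lh->block[tail]], disk[logstart + tail + 1], BSIZE);
}
```

With the header at block 2 and block[] = {9, 12}, the contents of blocks 3 and 4 end up in blocks 9 and 12, and no other block changes.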

kernel/log.c/read_head

// Read the log header from disk into the in-memory log header
static void
read_head(void)
{
  struct buf *buf = bread(log.dev, log.start);
  struct logheader *lh = (struct logheader *) (buf->data);
  int i;
  log.lh.n = lh->n;
  for (i = 0; i < log.lh.n; i++) {
    log.lh.block[i] = lh->block[i];
  }
  brelse(buf);
}

Reads the first block of the on-disk log (the logheader: n and block[]) and stores it in the in-memory copy of the logheader, log.lh.

kernel/log.c/write_head

// Write in-memory log header to disk.
// This is the true point at which the
// current transaction commits.
static void
write_head(void)
{
  struct buf *buf = bread(log.dev, log.start);
  struct logheader *hb = (struct logheader *) (buf->data);

Read the on-disk logheader block into a buffer (hb points at its data).

  int i;
  hb->n = log.lh.n;
  for (i = 0; i < log.lh.n; i++) {
    hb->block[i] = log.lh.block[i];
  }

Copy the values of the in-memory logheader into hb.

  bwrite(buf);
  brelse(buf);
}

Write hb back to disk.
In short, write_head writes the in-memory copy of the logheader back to disk.

kernel/log.c/recover_from_log

static void
recover_from_log(void)
{
  read_head();
  install_trans(1); // if committed, copy from log to disk
  log.lh.n = 0;
  write_head(); // clear the log
}

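This function is called on every boot. It reads the on-disk logheader (read_head) and uses its n field to see whether the log holds any blocks. If it does, a transaction had been committed but not fully installed when the machine went down, so the log blocks are written to their data blocks (install_trans). If n is 0, the loop in install_trans simply does nothing.

After that, n in the in-memory logheader is set to 0 (there are no log blocks any more), and the in-memory logheader is written to disk (write_head).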
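A toy version of the boot-time sequence shows why it is safe to run on every boot: when the header says n == 0, nothing is copied. The names below (toy_recover, disk, disk_header, LOGSTART) are assumptions for illustration, not xv6 identifiers.

```c
#include <assert.h>
#include <string.h>

#define BSIZE     8   /* toy block size; xv6 uses 1024 */
#define NBLOCKS   16  /* toy disk size */
#define LOGBLOCKS 3   /* toy log capacity */
#define LOGSTART  2   /* toy header block; log blocks follow it */

struct logheader { int n; int block[LOGBLOCKS]; };

char disk[NBLOCKS][BSIZE];      /* stands in for the disk */
struct logheader disk_header;   /* stands in for the on-disk header block */

/* Replay whatever the on-disk header says is committed, then clear it.
   Returns how many blocks were replayed. */
int toy_recover(void) {
  struct logheader lh = disk_header;            /* read_head */
  for (int i = 0; i < lh.n; i++)                /* install_trans */
    memmove(disk[lh.block[i]], disk[LOGSTART + 1 + i], BSIZE);
  disk_header.n = 0;                            /* log.lh.n = 0; write_head */
  return lh.n;
}
```

Calling toy_recover a second time replays nothing and leaves the data blocks as they were, which corresponds to a normal boot after a clean shutdown.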

kernel/log.c/begin_op

// called at the start of each FS system call.
void
begin_op(void)
{

A marker that says we are about to start operating on the disk.

  acquire(&log.lock);
  while(1){
    if(log.committing){
      sleep(&log, &log.lock);

If a commit is in progress (the commit function is writing data to the data blocks), go to sleep.

    } else if(log.lh.n + (log.outstanding+1)*MAXOPBLOCKS > LOGBLOCKS){
      // this op might exhaust log space; wait for commit.
      sleep(&log, &log.lock);

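This prevents too many blocks from being written at once and exhausting the log blocks. If the log might not have enough blocks left, the caller goes to sleep.
TODO: work out exactly how this check decides whether too many blocks would be written at once.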

    } else {
      log.outstanding += 1;
      release(&log.lock);
      break;
    }
  }
}

The caller can now start operating on the disk (in practice, on the struct buf objects of the buffer cache layer).
log.outstanding += 1: one more process is now between begin_op and end_op.
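One way to read the space check in begin_op, offered as a sketch toward the TODO above: every operation in progress is assumed to write at most MAXOPBLOCKS blocks, so the check reserves MAXOPBLOCKS slots for each outstanding operation plus the one asking to start, on top of the log.lh.n slots already used. The function name can_start_op is made up, and the two constants are assumed to match a recent kernel/param.h; check the values in your own tree.

```c
#include <assert.h>

#define MAXOPBLOCKS 10                 /* assumed, as in a recent kernel/param.h */
#define LOGBLOCKS   (MAXOPBLOCKS * 3)  /* assumed, as in a recent kernel/param.h */

/* Returns 1 if one more operation may start, 0 if begin_op would sleep.
   logged      = log.lh.n (slots already recorded in the header)
   outstanding = log.outstanding (operations already in progress)
   (hypothetical helper, not part of xv6) */
int can_start_op(int logged, int outstanding) {
  return logged + (outstanding + 1) * MAXOPBLOCKS <= LOGBLOCKS;
}
```

With these values an empty log admits three concurrent operations (0 + 3*10 = 30) and makes the fourth wait. With one block already logged and two operations outstanding, 1 + 3*10 = 31 exceeds 30, so the third caller sleeps until a commit frees the log.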

kernel/log.c/end_op

// called at the end of each FS system call.
// commits if this was the last outstanding operation.
void
end_op(void)
{
  int do_commit = 0;

  acquire(&log.lock);
  log.outstanding -= 1;
  if(log.committing)
    panic("log.committing");
  if(log.outstanding == 0){
    do_commit = 1;
    log.committing = 1;
  } else {
    // begin_op() may be waiting for log space,
    // and decrementing log.outstanding has decreased
    // the amount of reserved space.
    wakeup(&log);
  }
  release(&log.lock);

  if(do_commit){
    // call commit w/o holding locks, since not allowed
    // to sleep with locks.
    commit();
    acquire(&log.lock);
    log.committing = 0;
    wakeup(&log);
    release(&log.lock);
  }
}

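end_op means the caller in an upper layer has finished one complete operation, so log.outstanding is decremented by 1 to leave the begin_op ~ end_op region.

If no other process is inside the begin_op ~ end_op region (log.outstanding == 0), a commit can be performed: the data held in the struct buf cache is actually written to the log blocks and then to the data blocks.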
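The "last one out commits" rule can be sketched with a plain counter. This toy is single-threaded and leaves out the lock, the sleep/wakeup calls, and the committing flag; toy_begin_op, toy_end_op, and commits are hypothetical names used only for illustration.

```c
#include <assert.h>

int outstanding;   /* operations between begin and end */
int commits;       /* how many times the toy commit ran */

void toy_begin_op(void) {
  outstanding++;
}

/* Returns 1 if this call was the last outstanding operation and
   therefore triggered the commit, 0 otherwise. */
int toy_end_op(void) {
  outstanding--;
  if (outstanding == 0) {
    commits++;      /* the real end_op calls commit() here, without the lock held */
    return 1;
  }
  return 0;
}
```

If three operations overlap, only the third toy_end_op triggers a commit, so the writes of all three operations go to disk together as one transaction.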

kernel/log.c/write_log

// Copy modified blocks from cache to log.
static void
write_log(void)
{
  int tail;

  for (tail = 0; tail < log.lh.n; tail++) {
    struct buf *to = bread(log.dev, log.start+tail+1); // log block
    struct buf *from = bread(log.dev, log.lh.block[tail]); // cache block
    memmove(to->data, from->data, BSIZE);
    bwrite(to);  // write the log
    brelse(from);
    brelse(to);
  }
}

Writes the data held in the struct buf cache to the log blocks.
The data must always be written to the log blocks before it is written to the data blocks!

kernel/log.c/commit

static void
commit()
{
  if (log.lh.n > 0) {
    write_log();     // Write modified blocks from cache to log
    write_head();    // Write header to disk -- the real commit
    install_trans(0); // Now install writes to home locations
    log.lh.n = 0;
    write_head();    // Erase the transaction from the log
  }
}

commit is the sequence of steps that writes the data held in the struct buf cache to disk.
write_log: writes the data held in the struct buf cache to the log blocks.
write_head: writes the in-memory logheader to disk, so that the disk records that the log now holds these blocks. This is the commit point.
install_trans: writes the log blocks to their corresponding data blocks.
log.lh.n = 0: resets the number of log blocks to 0.
write_head: writes the in-memory logheader to disk once more, which completes the transaction.

kernel/log.c/log_write

// Caller has modified b->data and is done with the buffer.
// Record the block number and pin in the cache by increasing refcnt.
// commit()/write_log() will do the disk write.
//
// log_write() replaces bwrite(); a typical use is:
//   bp = bread(...)
//   modify bp->data[]
//   log_write(bp)
//   brelse(bp)
void
log_write(struct buf *b)
{
  int i;

  acquire(&log.lock);
  if (log.lh.n >= LOGBLOCKS)
    panic("too big a transaction");
  if (log.outstanding < 1)
    panic("log_write outside of trans");

  for (i = 0; i < log.lh.n; i++) {
    if (log.lh.block[i] == b->blockno)   // log absorption
      break;
  }
  log.lh.block[i] = b->blockno;
  if (i == log.lh.n) {  // Add new block to log?
    bpin(b);
    log.lh.n++;
  }
  release(&log.lock);
}

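Among the seven layers of xv6-riscv, no layer above the logging layer may call bwrite directly; they must call log_write instead. bwrite operates directly on the disk (the VirtIO block device), and layers above the logging layer are not allowed to do that, because every disk write has to go through the logging layer.

This function does not write data to disk. It pins the corresponding struct buf (bpin) so that the buffer is not recycled, and the data waits there until the commit function actually writes it to disk.

It also updates the logheader: the number of the data block to be written is stored in the logheader.block array, and logheader.n is incremented by 1 to record one more log block. If the block is already in the array (log absorption), no new slot is used and n stays the same.

The comment on this function also describes how it is typically used:

bp = bread(...): load data from disk into a struct buf (the buffer cache layer's cache), if it is not cached already.
modify bp->data[]: update the data in the struct buf.
log_write(bp): pin the buffer (bpin) and update logheader.n and the logheader.block array.
brelse(bp): release the struct buf's sleeplock and decrement its reference count by 1.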
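The bookkeeping part of log_write, including log absorption, can be tried on its own. This sketch drops the lock, the outstanding check, and bpin, and returns -1 where the kernel would panic; toy_log_write is a hypothetical name and the LOGBLOCKS value is an assumption.

```c
#include <assert.h>

#define LOGBLOCKS 30  /* assumed value; the real one comes from kernel/param.h */

struct logheader { int n; int block[LOGBLOCKS]; };

/* Record blockno in the header.
   Returns 1 if a new slot was used, 0 if the block was already logged
   (absorption), -1 if the log is full (the kernel panics instead). */
int toy_log_write(struct logheader *lh, int blockno) {
  int i;

  if (lh->n >= LOGBLOCKS)
    return -1;
  for (i = 0; i < lh->n; i++) {
    if (lh->block[i] == blockno)   /* log absorption */
      break;
  }
  lh->block[i] = blockno;
  if (i == lh->n) {                /* new block: the kernel also calls bpin(b) here */
    lh->n++;
    return 1;
  }
  return 0;
}
```

Writing blocks 33, 46, and then 33 again uses only two slots, which is why a block that is modified many times within one transaction costs a single log block.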

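kernel/log.c/Summary

Among the seven layers of the xv6-riscv file system, no layer above the logging layer may call bwrite directly to write to disk; they call log_write instead.
Before data is actually written to a data block, it must first be written to a log block.
Whether that write counts depends on whether write_head has successfully written the logheader from the in-memory copy to disk.
So whether a transaction succeeds (was it logged successfully? can it be written from the log blocks to the data blocks?) comes down to whether the logheader has been written from the in-memory copy to disk. That is why the comment on write_head says "This is the true point at which the current transaction commits."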

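Crash-recovery analysis: commit, recover_from_log

From the code we can see that the only function that actually writes to the data blocks on disk is install_trans, and the functions that call install_trans are commit and recover_from_log.

Let's check: if a crash suddenly happens at some line of the code, does anything go wrong?
Only the commit function is examined here, because recover_from_log works in much the same way.

The code that really matters is write_head (or rather the bwrite inside write_head), because only once the logheader has actually been written to disk (the VirtIO block device) does the logging layer know that the number of log blocks has changed.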

{ kernel/log.c/commit }

static void
commit()
{
  if (log.lh.n > 0) {
    write_log();     // Write modified blocks from cache to log
    write_head();    // Write header to disk -- the real commit

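If a crash happens before this first write_head (the logheader has not been written to disk):
- Any data already in the log blocks is discarded.
- The transaction is treated as if it never happened.
If a crash happens after this first write_head (the logheader has been written to disk):
- The record of the transaction is preserved, and the system can use the data in the log blocks to write the data blocks.
- The transaction can still complete.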

    install_trans(0); // Now install writes to home locations
    log.lh.n = 0;
    write_head();    // Erase the transaction from the log
  }
}

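If a crash happens before this second write_head (the cleared logheader has not been written to disk):
- Even if install_trans has already written some of the log blocks to their data blocks, that install_trans is treated as if it did not happen. On reboot, initlog -> recover_from_log tries again to write the log blocks to the data blocks.
- The transaction can still complete.
If a crash happens after this second write_head (the cleared logheader has been written to disk):
- The transaction has already succeeded, so even if a crash happens, its data is already stored in the data blocks on disk.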

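So the two write_head calls divide the commit into three states:

The logheader has not been updated yet, and the logging layer does not know that the log blocks have changed. If a crash happens here, the transaction is treated as if it never happened and fails.

write_head

The logheader has been updated, and the logging layer knows that the log blocks hold new data. If a crash happens here, the transaction can be recovered.

write_head

The logheader has been updated again, and the logging layer knows that the log blocks have been written to the data blocks. The transaction has succeeded.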

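An interesting point: whether or not write_head succeeds (a crash could happen in the middle of write_head, so the logheader never reaches disk), the system does not end up in an inconsistent state. This presumably relies on the header fitting in a single block, which initlog checks, and on a single-block write either happening completely or not at all.

The transaction can only have two outcomes:

All of the writes recorded in the log blocks reach the disk.
None of the writes recorded in the log blocks reach the disk.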
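The all-or-nothing claim can be checked with a toy model that "crashes" after each step of commit and then runs the boot-time recovery. Everything here is a sketch under assumptions: the disk is an array, each single-block write is taken to be atomic, and the names (crash_then_recover, toy_install, disk, disk_header, LOGSTART) are invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define BSIZE     8   /* toy block size; xv6 uses 1024 */
#define NBLOCKS   16  /* toy disk size */
#define LOGBLOCKS 2   /* toy log capacity */
#define LOGSTART  2   /* toy header block; log blocks follow it */

struct logheader { int n; int block[LOGBLOCKS]; };

char disk[NBLOCKS][BSIZE];      /* stands in for the disk */
struct logheader disk_header;   /* stands in for the on-disk header block */

/* Copy the first `count` logged blocks to their home blocks. */
void toy_install(const struct logheader *lh, int count) {
  for (int i = 0; i < lh->n && i < count; i++)
    memmove(disk[lh->block[i]], disk[LOGSTART + 1 + i], BSIZE);
}

/* Run commit for a two-block transaction, stop after steps_done steps,
   then reboot and recover. Returns 1 if both home blocks hold the new
   data, 0 if both hold the old data, -1 if they disagree. */
int crash_then_recover(int steps_done) {
  struct logheader mem = { .n = 2, .block = {9, 12} };

  memset(disk, 0, sizeof disk);
  disk_header.n = 0;
  memcpy(disk[9],  "old-9..", BSIZE);
  memcpy(disk[12], "old-12.", BSIZE);

  if (steps_done >= 1) {                        /* write_log */
    memcpy(disk[LOGSTART + 1], "new-9..", BSIZE);
    memcpy(disk[LOGSTART + 2], "new-12.", BSIZE);
  }
  if (steps_done >= 2) disk_header = mem;       /* first write_head */
  if (steps_done >= 3) toy_install(&mem, 1);    /* install_trans, partway */
  if (steps_done >= 4) toy_install(&mem, 2);    /* install_trans, complete */
  if (steps_done >= 5) disk_header.n = 0;       /* second write_head */

  /* Reboot: read_head, install_trans, clear the header. */
  struct logheader lh = disk_header;
  toy_install(&lh, lh.n);
  disk_header.n = 0;

  int a = memcmp(disk[9],  "new-9..", BSIZE) == 0;
  int b = memcmp(disk[12], "new-12.", BSIZE) == 0;
  return a == b ? a : -1;
}
```

Stopping after step 0 or 1 leaves both blocks with the old data, and stopping after any of steps 2 to 5 leaves both with the new data. The mixed result (-1) never shows up after recovery, even when the crash lands halfway through install_trans.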

TODO

[--] The log blocks are limited in number, so how do we stop someone from writing too many blocks at once and overflowing the log? The key should be the check inside begin_op.
