[6.1810][code] xv6 File System (5): Inode Layer (2)

xv6-riscv

wtommy_fdgkhdkgh 2026-05-17 21:36:17

Series: [6.1810] Learning Basic Operating System Concepts with MIT 6.1810

Outline

kernel/fs.c/ialloc
kernel/fs.c/iupdate
kernel/fs.c/ireclaim
kernel/fs.c/bmap
kernel/fs.c/itrunc
kernel/fs.c/stati
kernel/fs.c/readi
kernel/fs.c/writei

kernel/fs.c/ialloc

// Allocate an inode on device dev.
// Mark it as allocated by  giving it type type.
// Returns an unlocked but allocated and referenced inode,
// or NULL if there is no free inode.
struct inode*
ialloc(uint dev, short type)

Allocates a new struct-dinode and gives it the requested type.
The updated struct-dinode is then written back to disk through log_write.

{
  int inum;
  struct buf *bp;
  struct dinode *dip;

  for(inum = 1; inum < sb.ninodes; inum++){
    bp = bread(dev, IBLOCK(inum, sb));
    dip = (struct dinode*)bp->data + inum%IPB;
    if(dip->type == 0){  // a free inode
      memset(dip, 0, sizeof(*dip));
      dip->type = type;
      log_write(bp);   // mark it allocated on the disk
      brelse(bp);
      return iget(dev, inum);
    }
    brelse(bp);
  }
  printf("ialloc: no inodes\n");
  return 0;
}

IBLOCK(i, sb)
- The block on disk that holds the struct-dinode whose inode number is i.
IPB (inodes per block)
- The number of struct-dinode entries that fit in one block.
iget vs. ialloc
- iget: looks up (dev, inum) and returns an in-memory struct-inode. Getting the struct-inode does not mean its contents have been read from disk yet; that happens in ilock.
- ialloc: allocates a new struct-dinode and records the allocation on disk.
ialloc scans every struct-dinode on disk looking for a free one (type == 0). When it finds one, it zeroes it, sets its type, marks the buffer with log_write so the change reaches the disk, and then calls iget to obtain the in-memory struct-inode for that inode number. iget itself does not copy the struct-dinode into memory.
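To make the IBLOCK / IPB arithmetic concrete, here is a minimal sketch. The struct mirrors the field layout of xv6's on-disk inode; the helper names `inode_block` and `inode_slot`, and the `inodestart` parameter standing in for `sb.inodestart`, are made up for illustration.

```c
#include <stdint.h>

#define BSIZE 1024              /* block size in bytes (xv6-riscv default) */
#define NDIRECT 12

/* Same field layout as xv6's on-disk inode: 64 bytes in total. */
struct dinode {
  short type;
  short major;
  short minor;
  short nlink;
  uint32_t size;
  uint32_t addrs[NDIRECT + 1];
};

#define IPB (BSIZE / sizeof(struct dinode))   /* inodes per block */

/* Block that holds inode inum. inodestart is a hypothetical stand-in
   for sb.inodestart, the first block of the inode area. */
static uint32_t inode_block(uint32_t inum, uint32_t inodestart)
{
  return (uint32_t)(inum / IPB + inodestart);
}

/* Index of inode inum inside that block. */
static uint32_t inode_slot(uint32_t inum)
{
  return (uint32_t)(inum % IPB);
}
```

With these numbers, 16 inodes fit in one block, so inode 17 is the second entry (slot 1) of the second inode block.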

kernel/fs.c/iupdate

// Copy a modified in-memory inode to disk.
// Must be called after every change to an ip->xxx field
// that lives on disk.
// Caller must hold ip->lock.
void
iupdate(struct inode *ip)
{

This function copies the in-memory struct-inode into the on-disk struct-dinode.
The caller must hold the struct-inode's sleep-lock before calling it.

  struct buf *bp;
  struct dinode *dip;

  bp = bread(ip->dev, IBLOCK(ip->inum, sb));
  dip = (struct dinode*)bp->data + ip->inum%IPB;
  dip->type = ip->type;
  dip->major = ip->major;
  dip->minor = ip->minor;
  dip->nlink = ip->nlink;
  dip->size = ip->size;
  memmove(dip->addrs, ip->addrs, sizeof(ip->addrs));
  log_write(bp);
  brelse(bp);
}

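bp: the buffer for the block that contains the struct-dinode, read from disk.
- The fields of this struct-dinode are about to be overwritten, so the point of the read is not to look at them. The block also holds the other struct-dinodes that share it, which must be preserved, and bp is the handle that log_write later uses to overwrite the struct-dinode on disk.
The struct-dinode is updated to match the in-memory struct-inode.
log_write marks bp to be written to disk.
brelse: bp is no longer needed, so release it.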
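The copy step can be tried outside the kernel. The sketch below is only an approximation: `mem_inode` is a hypothetical, trimmed-down in-memory inode, `block` stands in for `bp->data`, and there is no buffer cache or log. It shows that only one slot of the block changes.

```c
#include <stdint.h>
#include <string.h>

#define BSIZE 1024
#define NDIRECT 12

struct dinode {
  short type, major, minor, nlink;
  uint32_t size;
  uint32_t addrs[NDIRECT + 1];
};

/* Hypothetical in-memory inode: inum plus the fields that live on disk. */
struct mem_inode {
  uint32_t inum;
  short type, major, minor, nlink;
  uint32_t size;
  uint32_t addrs[NDIRECT + 1];
};

#define IPB (BSIZE / sizeof(struct dinode))

/* Copy ip into its slot of block (the stand-in for bp->data). */
static void store_inode(unsigned char *block, const struct mem_inode *ip)
{
  struct dinode d;
  d.type = ip->type;
  d.major = ip->major;
  d.minor = ip->minor;
  d.nlink = ip->nlink;
  d.size = ip->size;
  memcpy(d.addrs, ip->addrs, sizeof(d.addrs));
  memcpy(block + (ip->inum % IPB) * sizeof(struct dinode), &d, sizeof(d));
}

/* Read one slot back out of the block. */
static struct dinode load_slot(const unsigned char *block, uint32_t slot)
{
  struct dinode d;
  memcpy(&d, block + slot * sizeof(struct dinode), sizeof(d));
  return d;
}
```

The neighbouring slots keep whatever bytes they had before, which is why the real code reads the block first instead of starting from an empty buffer.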

kernel/fs.c/ireclaim

void
ireclaim(int dev)
{

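This function scans every struct-dinode on disk looking for orphaned inodes.
An orphaned inode is one that nothing links to any more (nlink == 0) but whose file was never completely freed (type != 0).
This can happen when the system crashes while it is in the middle of handling the file.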

  for (int inum = 1; inum < sb.ninodes; inum++) {
    struct inode *ip = 0;
    struct buf *bp = bread(dev, IBLOCK(inum, sb));
    struct dinode *dip = (struct dinode *)bp->data + inum % IPB;
    if (dip->type != 0 && dip->nlink == 0) {  // is an orphaned inode
      printf("ireclaim: orphaned inode %d\n", inum);
      ip = iget(dev, inum);
    }
    brelse(bp);
    if (ip) {
      begin_op();
      ilock(ip);
      iunlock(ip);
      iput(ip);
      end_op();
    }
  }
}

dip->nlink == 0: nothing points to this file, so it should have been deleted.
dip->type != 0: a deleted file should have dip->type == 0. A nonzero type means the deletion never completed, and the inode has become an orphan (orphaned inode).
When an orphaned struct-dinode is found:
- iget(dev, inum)
  - Looks in itable.inode[N] for a matching in-memory struct-inode. If one exists, it is returned; otherwise an empty entry of itable.inode[N] is claimed to represent the struct-inode for (dev, inum).
  - Note that the struct-inode returned by iget does not necessarily hold valid data from the on-disk struct-dinode yet.
  - struct-inode->ref++
- begin_op()
  - Marks the start of a disk operation.
- ilock
  - Acquires the struct-inode's sleep-lock.
  - If the struct-inode has not loaded its struct-dinode from disk yet (struct-inode->valid == 0), reads it from disk.
- iunlock
  - Releases the struct-inode's sleep-lock.
- iput
  - struct-inode->ref--
  - If this was the last reference and struct-inode->nlink == 0, iput uses itrunc to free the file's content and sets struct-inode->type = 0.
- end_op()
  - Marks the end of the disk operation.
  - May trigger a commit, which actually carries out the pending series of disk writes.
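The orphan test itself is tiny. A sketch, with hypothetical helper names and plain arrays standing in for the on-disk inode area:

```c
#include <stddef.h>

/* The condition ireclaim checks on each on-disk inode. */
static int is_orphan(short type, short nlink)
{
  return type != 0 && nlink == 0;
}

/* Count orphans given parallel arrays of type and nlink values.
   Index 0 is skipped because the scan starts at inode number 1. */
static int count_orphans(const short *types, const short *nlinks, size_t ninodes)
{
  int n = 0;
  for (size_t inum = 1; inum < ninodes; inum++) {
    if (is_orphan(types[inum], nlinks[inum]))
      n++;
  }
  return n;
}
```

A free inode (type == 0) and a live file (nlink > 0) both fail the test; only an allocated inode with no links counts.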

kernel/fs.c/bmap

// Inode content
//
// The content (data) associated with each inode is stored
// in blocks on the disk. The first NDIRECT block numbers
// are listed in ip->addrs[].  The next NINDIRECT blocks are
// listed in block ip->addrs[NDIRECT].

// Return the disk block address of the nth block in inode ip.
// If there is no such block, bmap allocates one.
// returns 0 if out of disk space.
static uint
bmap(struct inode *ip, uint bn)
{

This function returns the data-block number of the bn-th content block of a struct-inode.

If that block has no data-block number yet, balloc is used to allocate one.
return != 0: the data-block number of the bn-th content block
return == 0: the disk is out of space

The figure in the xv6-riscv-book shows how a struct-dinode and its content are laid out on disk.
{ xv6-riscv-book/FileSystem }

The on-disk struct-dinode->addrs[] array has NDIRECT+1 elements.

Elements 0 through NDIRECT - 1 (NDIRECT in total) each hold the disk block number of one content block.
Element NDIRECT (0-indexed) holds the disk block number of the indirect block.
The indirect block holds BSIZE / sizeof(uint) block numbers. With the xv6-riscv defaults, BSIZE is 1024 bytes, so it holds 1024 / 4 = 256 block numbers.
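The layout above decides where a logical block number lives. A small sketch of that decision, assuming the default constants (NDIRECT = 12, BSIZE = 1024); `locate_block` and the `slot` types are names invented for this example:

```c
#include <stdint.h>

#define BSIZE 1024
#define NDIRECT 12
#define NINDIRECT (BSIZE / sizeof(uint32_t))   /* 256 entries */
#define MAXFILE (NDIRECT + NINDIRECT)          /* 268 blocks */

enum slot_kind { SLOT_DIRECT, SLOT_INDIRECT, SLOT_OUT_OF_RANGE };

struct slot {
  enum slot_kind kind;
  uint32_t index;   /* index into addrs[] or into the indirect block */
};

/* Say where the block number for logical block bn is stored. */
static struct slot locate_block(uint32_t bn)
{
  struct slot s;
  if (bn < NDIRECT) {
    s.kind = SLOT_DIRECT;
    s.index = bn;
  } else if (bn - NDIRECT < NINDIRECT) {
    s.kind = SLOT_INDIRECT;
    s.index = bn - NDIRECT;
  } else {
    s.kind = SLOT_OUT_OF_RANGE;
    s.index = 0;
  }
  return s;
}
```

So the largest file is 268 blocks, or 268 * 1024 = 274432 bytes.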

  uint addr, *a;
  struct buf *bp;

  if(bn < NDIRECT){
    if((addr = ip->addrs[bn]) == 0){
      addr = balloc(ip->dev);
      if(addr == 0)
        return 0;
      ip->addrs[bn] = addr;
    }
    return addr;
  }
  bn -= NDIRECT;

If bn is in the range 0 to NDIRECT - 1, the block number is taken from ip->addrs[bn].
If that entry is empty (0), balloc tries to allocate a new block.
If balloc fails too, nothing more can be done: return 0 to signal that the disk is full.

  if(bn < NINDIRECT){
    // Load indirect block, allocating if necessary.
    if((addr = ip->addrs[NDIRECT]) == 0){
      addr = balloc(ip->dev);
      if(addr == 0)
        return 0;
      ip->addrs[NDIRECT] = addr;
    }

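After bn -= NDIRECT, a bn in the range 0 to NINDIRECT - 1 (originally NDIRECT to NDIRECT + NINDIRECT - 1) refers to an entry of the indirect block, so the code first fetches the indirect block's disk block number.
If ip->addrs[NDIRECT] is empty, balloc is again used to allocate one.
If balloc fails too, the disk is out of space: return 0.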

    bp = bread(ip->dev, addr);
    a = (uint*)bp->data;
    if((addr = a[bn]) == 0){
      addr = balloc(ip->dev);
      if(addr){
        a[bn] = addr;
        log_write(bp);
      }
    }
    brelse(bp);
    return addr;
  }

  panic("bmap: out of range");
}

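Look up the block number for bn in the indirect block. If the entry is empty, allocate a data block with balloc, store its number in the indirect block, and mark the indirect block with log_write.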
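To experiment with this control flow outside the kernel, here is a toy version. It is a sketch under heavy assumptions: the "disk" is a small in-memory array, `toy_balloc` is a bump allocator rather than xv6's bitmap allocator, and there is no buffer cache or log. All `toy_` names are invented.

```c
#include <stdint.h>

#define BSIZE 1024
#define NDIRECT 12
#define NINDIRECT (BSIZE / sizeof(uint32_t))
#define NBLOCKS 64   /* hypothetical toy disk size */

/* Each block viewed as an array of block numbers; starts out all zero. */
static uint32_t toy_disk[NBLOCKS][NINDIRECT];
static uint32_t next_free = 1;   /* block 0 is reserved so 0 can mean "none" */

/* Stand-in for balloc: returns 0 when the toy disk is full. */
static uint32_t toy_balloc(void)
{
  return next_free < NBLOCKS ? next_free++ : 0;
}

struct toy_inode {
  uint32_t addrs[NDIRECT + 1];
};

/* Same branching as bmap, minus bread/log_write/brelse. */
static uint32_t toy_bmap(struct toy_inode *ip, uint32_t bn)
{
  uint32_t addr;

  if (bn < NDIRECT) {
    if ((addr = ip->addrs[bn]) == 0) {
      addr = toy_balloc();
      if (addr == 0)
        return 0;
      ip->addrs[bn] = addr;
    }
    return addr;
  }
  bn -= NDIRECT;

  if (bn < NINDIRECT) {
    if ((addr = ip->addrs[NDIRECT]) == 0) {
      addr = toy_balloc();
      if (addr == 0)
        return 0;
      ip->addrs[NDIRECT] = addr;
    }
    uint32_t *a = toy_disk[addr];   /* the indirect block */
    if ((addr = a[bn]) == 0) {
      addr = toy_balloc();
      if (addr)
        a[bn] = addr;
    }
    return addr;
  }
  return 0;   /* the real bmap panics here instead */
}
```

Asking for the first indirect block (bn == NDIRECT) on an empty inode costs two allocations: one for the indirect block and one for the data block.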

kernel/fs.c/itrunc

// Truncate inode (discard contents).
// Caller must hold ip->lock.
void
itrunc(struct inode *ip)
{

This function discards the content of a struct-inode.

  int i, j;
  struct buf *bp;
  uint *a;

  for(i = 0; i < NDIRECT; i++){
    if(ip->addrs[i]){
      bfree(ip->dev, ip->addrs[i]);
      ip->addrs[i] = 0;
    }
  }

Walk entries 0 to NDIRECT - 1 of struct-inode->addrs[] (the direct part) and release every block number found there with bfree.

  if(ip->addrs[NDIRECT]){
    bp = bread(ip->dev, ip->addrs[NDIRECT]);
    a = (uint*)bp->data;
    for(j = 0; j < NINDIRECT; j++){
      if(a[j])
        bfree(ip->dev, a[j]);
    }
    brelse(bp);
    bfree(ip->dev, ip->addrs[NDIRECT]);
    ip->addrs[NDIRECT] = 0;
  }

  ip->size = 0;
  iupdate(ip);
}

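Walk the indirect block and release every block number found there with bfree, then free the indirect block itself. Finally, set the size to 0 and write the inode back with iupdate.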
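As a quick sanity check on how much work itrunc does, the sketch below counts the bfree calls for a file of a given size. It assumes every block up to the end of the file has been allocated and that the size does not exceed the maximum file size; `blocks_freed` is a made-up helper.

```c
#include <stdint.h>

#define BSIZE 1024
#define NDIRECT 12

/* Number of bfree calls itrunc would make for a file of `size` bytes:
   one per data block, plus one for the indirect block if it is in use. */
static uint32_t blocks_freed(uint32_t size)
{
  uint32_t data = (size + BSIZE - 1) / BSIZE;   /* round up to whole blocks */
  uint32_t indirect = data > NDIRECT ? 1 : 0;
  return data + indirect;
}
```

A file that is one byte larger than 12 blocks needs 13 data blocks and the indirect block, so 14 blocks are freed.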

kernel/fs.c/stati

// Copy stat information from inode.
// Caller must hold ip->lock.
void
stati(struct inode *ip, struct stat *st)
{
  st->dev = ip->dev;
  st->ino = ip->inum;
  st->type = ip->type;
  st->nlink = ip->nlink;
  st->size = ip->size;
}

This function copies the inode's metadata into a struct stat.
The caller must hold the struct-inode's sleep-lock before calling it.
System calls such as fstat use it to retrieve this information.

kernel/fs.c/readi

// Read data from inode.
// Caller must hold ip->lock.
// If user_dst==1, then dst is a user virtual address;
// otherwise, dst is a kernel address.
int
readi(struct inode *ip, int user_dst, uint64 dst, uint off, uint n)
{

This function reads n bytes from the file content of ip, starting at offset off, into dst. If user_dst == 1, dst is a user-space address; if user_dst == 0, dst is a kernel-space address.
The caller must hold ip's sleep-lock before calling it.
struct inode *ip: the struct-inode to read from.
int user_dst
- 0: dst is a kernel-space address
- 1: dst is a user-space address
uint64 dst: where the data that was read is stored.
uint off: the offset into the struct-inode's file content.
uint n: the number of bytes to read.
return value: the number of bytes actually read, or -1 if copying to dst failed.

  uint tot, m;
  struct buf *bp;

  if(off > ip->size || off + n < off)
    return 0;

off > ip->size: the offset is past the end of the file
off + n < off: the unsigned addition overflowed

  if(off + n > ip->size)
    n = ip->size - off;

If off + n is past the end of the file, clamp n so that the read stops at the end of the file.
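The two checks can be folded into one small function to play with. This is a sketch, not xv6 code; `clamp_read` is a hypothetical name, and uint32_t stands in for xv6's uint.

```c
#include <stdint.h>

/* How many bytes readi will try to read for a file of `size` bytes. */
static uint32_t clamp_read(uint32_t off, uint32_t n, uint32_t size)
{
  if (off > size || off + n < off)   /* past EOF, or unsigned wraparound */
    return 0;
  if (off + n > size)                /* request runs past EOF: shorten it */
    n = size - off;
  return n;
}
```

The wraparound check works because unsigned arithmetic is modular: if off + n wraps past the maximum value, the sum ends up smaller than off.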

  for(tot=0; tot<n; tot+=m, off+=m, dst+=m){
    uint addr = bmap(ip, off/BSIZE);
    if(addr == 0)
      break;
    bp = bread(ip->dev, addr);
    m = min(n - tot, BSIZE - off%BSIZE);
    if(either_copyout(user_dst, dst, bp->data + (off % BSIZE), m) == -1) {
      brelse(bp);
      tot = -1;
      break;
    }
    brelse(bp);
  }
  return tot;
}

bmap: finds which disk block holds this offset.
bread(ip->dev, addr): now that we have the block number, bread reads that block from disk into RAM.
either_copyout(int user_dst, uint64 dst, void *src, uint64 len)
- Copies data out of a kernel buffer, to user space or to kernel space depending on user_dst
- int user_dst
  - 0: dst is a kernel-space address
  - 1: dst is a user-space address
- uint64 dst
  - where the data is stored
- void *src
  - the address to copy from
- uint64 len
  - the number of bytes to copy
- return value
  - 0: success
  - -1: failure
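The loop in readi never copies across a block boundary: each pass handles at most the rest of the current block. The sketch below reproduces only that chunking arithmetic; `split_into_chunks` and its parameters are invented for the example.

```c
#include <stdint.h>

#define BSIZE 1024

/* Record the size of each per-block piece of a request of n bytes
   starting at byte offset off. Returns the number of pieces written
   to chunks[] (at most max). */
static int split_into_chunks(uint32_t off, uint32_t n, uint32_t *chunks, int max)
{
  uint32_t tot, m;
  int count = 0;

  for (tot = 0; tot < n && count < max; tot += m, off += m) {
    uint32_t left_in_request = n - tot;
    uint32_t left_in_block = BSIZE - off % BSIZE;
    /* same as m = min(n - tot, BSIZE - off%BSIZE) in readi */
    m = left_in_request < left_in_block ? left_in_request : left_in_block;
    chunks[count++] = m;
  }
  return count;
}
```

For example, reading 2000 bytes from offset 1000 is done in three pieces: 24 bytes to finish the first block, one full block of 1024 bytes, and the remaining 952 bytes.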

kernel/fs.c/writei

// Write data to inode.
// Caller must hold ip->lock.
// If user_src==1, then src is a user virtual address;
// otherwise, src is a kernel address.
// Returns the number of bytes successfully written.
// If the return value is less than the requested n,
// there was an error of some kind.
int
writei(struct inode *ip, int user_src, uint64 src, uint off, uint n)
{

writei is very similar to readi
- readi: reads the struct-inode's content into dst
- writei: writes the data at src into the struct-inode's content
struct inode *ip
- the struct-inode that represents the file
int user_src
- 0: src is a kernel-space address
- 1: src is a user-space address
uint64 src
- the address of the source data
uint off
- the offset into the struct-inode's content, that is, where the write starts
uint n
- the number of bytes to write

  uint tot, m;
  struct buf *bp;

  if(off > ip->size || off + n < off)
    return -1;
  if(off + n > MAXFILE*BSIZE)
    return -1;

  for(tot=0; tot<n; tot+=m, off+=m, src+=m){
    uint addr = bmap(ip, off/BSIZE);
    if(addr == 0)
      break;
    bp = bread(ip->dev, addr);
    m = min(n - tot, BSIZE - off%BSIZE);
    if(either_copyin(bp->data + (off % BSIZE), user_src, src, m) == -1) {
      brelse(bp);
      break;
    }
    log_write(bp);
    brelse(bp);
  }

  if(off > ip->size)
    ip->size = off;

  // write the i-node back to disk even if the size didn't change
  // because the loop above might have called bmap() and added a new
  // block to ip->addrs[].
  iupdate(ip);

  return tot;
}

The rest works much like readi. The differences are listed below.

either_copyin
- Copies data into a kernel buffer, from user space or from kernel space depending on user_src
- void *dst
  - where the data is stored
- int user_src
  - 0: src is a kernel-space address
  - 1: src is a user-space address
- uint64 src
  - the address to copy from
- uint64 len
  - the number of bytes to copy
- return value
  - 0: success
  - -1: failure
log_write(bp)
- After the data has been written into the buffer cache layer's struct-buf, remember to call log_write to mark that struct-buf to be written to disk.
if(off > ip->size)
- If this write made the file larger, update the size recorded in the struct-inode.
iupdate(ip)
- writei may have changed the file's metadata (the size, or ip->addrs[] when bmap allocated a new block), so iupdate(ip) writes the in-memory struct-inode back to the on-disk struct-dinode.
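The checks at the top of writei and the size update at the end can be summarized in two small helpers. This is a sketch with invented names (`write_allowed`, `size_after_write`), assuming the default constants and uint32_t in place of xv6's uint.

```c
#include <stdint.h>

#define BSIZE 1024
#define NDIRECT 12
#define NINDIRECT (BSIZE / sizeof(uint32_t))
#define MAXFILE (NDIRECT + NINDIRECT)

/* Returns 0 if writei would accept the request, -1 otherwise. */
static int write_allowed(uint32_t off, uint32_t n, uint32_t size)
{
  if (off > size || off + n < off)   /* would leave a gap, or wraps around */
    return -1;
  if (off + n > MAXFILE * BSIZE)     /* would exceed the largest file */
    return -1;
  return 0;
}

/* File size after a write whose last byte landed just before end_off. */
static uint32_t size_after_write(uint32_t size, uint32_t end_off)
{
  return end_off > size ? end_off : size;
}
```

Note the difference from readi: a write may start exactly at the end of the file (off == size), which is how a file grows, but it may not start beyond it.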