Semaphore

第 11 屆 iThome 鐵人賽

DAY 7

自我挑戰組

Linux Inside - Synchronization & Interrupt系列第 7 篇

11th鐵人賽

ian910297

2019-09-08 18:34:15

7086 瀏覽

分享至

參考 Synchronization primitives in the Linux kernel. Part 3.

Semaphore

先前介紹過 Spinlock ，他是專門應用在 CS(Critical Section) 極短的程式，且會進行 Busy Waiting，不會釋出 CPU 資源，屬於 Blocking 類型的 Lock

而 Semaphore 則是擁有與其不同的特性，如果獲得不到 Lock ，則會釋出 CPU 資源，進行所謂的 Context Switch，這樣的設計比較適合 CS 較長的程式

Semaphore 會使用一個數字來表示「允須多少程式同時執行 CS」，Anyway 我覺得這兩個概念是差不多的，而 Semaphore 可以分為以下兩種情況

Binary Semaphore
總共只有一個資源可以使用，所以只存在 0 1 兩個狀態，0 表示沒有程式允許執行 CS ，1 表示有一個程式允許進入

之後要講的 Mutex 也可以理解成一個特殊的 Semaphore，當然他的實作遠比 Semaphore 複雜
Normal Semaphore
任何大於 1 的數字都可以，有程式成功進入 CS 的話，Semaphore 數字就會下降一，最多只能下降到零，當有程式完成任務退出 CS 時，Semaphore 數字則會上升

資料結構

對於 Semaphore 有些概念後，直接從他的資料結構開始閱讀吧

include/linux/semaphore.h

struct semaphore {
    raw_spinlock_t        lock;
    unsigned int        count;
    struct list_head    wait_list;
};

raw_spinlock_t lock 之前有提過，這些實作都是基於 Spinlock 上進行實作，而這邊使用的 Lock 物件是 raw_spinlock_t
count 允許多少資源進入 CS
struct list_head wait_list 一個雙向的 Linked List，這邊用作排隊使用，規則是 FIFO(First-In-First-Out)，記錄所有正在睡眠的

struct semaphore_waiter {
	struct list_head list;
	struct task_struct *task;
	bool up;
};

程式碼閱讀

就 ... 直接來閱讀程式碼吧，沒看到有啥要介紹的，觀看 kernel/locking/semaphore.c

`void down(struct semaphore *sem)`

void down(struct semaphore *sem)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&sem->lock, flags);
	if (likely(sem->count > 0))
		sem->count--;
	else
		__down(sem);
	raw_spin_unlock_irqrestore(&sem->lock, flags);
}

可以看到透過呼叫 raw_spin_lock_irqsave 來儲存中斷狀態

如果 count 數值還夠的話，透過減少 count 數值，來作資源分配的標記

但如果資源不夠呢？？這邊是如何實現 Context Switch 呢？？

static noinline void __sched __down(struct semaphore *sem)
{
	__down_common(sem, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}

static inline int __sched __down_common(struct semaphore *sem, long state,
								long timeout)
{
	struct semaphore_waiter waiter;

	list_add_tail(&waiter.list, &sem->wait_list);
	waiter.task = current;
	waiter.up = false;

	for (;;) {
		if (signal_pending_state(state, current))
			goto interrupted;
		if (unlikely(timeout <= 0))
			goto timed_out;
		__set_current_state(state);
		raw_spin_unlock_irq(&sem->lock);
		timeout = schedule_timeout(timeout);
		raw_spin_lock_irq(&sem->lock);
		if (waiter.up)
			return 0;
	}

 timed_out:
	list_del(&waiter.list);
	return -ETIME;

 interrupted:
	list_del(&waiter.list);
	return -EINTR;
}

可以發現，__down 涵式真的有點複雜，以下列出幾個可能會有疑惑的點

sem->wait_list 已經有記錄順序了，為何還需要宣告一個 waiter->list 來作記錄？？

list_add_tail(&waiter.list, &sem->wait_list);

這個地方我看超久才看懂，我一開始一直在想為什麼要把 sem->wait_list 丟到 waiter.list 後面？？？

include/linux/list.h

static inline void __list_add(struct list_head *new,
	      struct list_head *prev,
	      struct list_head *next)
{
    if (!__list_add_valid(new, prev, next))
        return;

    next->prev = new;
    new->next = next;
    new->prev = prev;
    WRITE_ONCE(prev->next, new);
}

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
    __list_add(new, head->prev, head);
}

結果只是因為我沒去看他定義的參數意義是什麼＝＝

waiter.task = current 是什麼？？
這是一個 macro ，表示第一個想要獲得 Lock 的處理器

arch/x86/include/asm/current.h

DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)
{
    return this_cpu_read_stable(current_task);
}

#define current get_current()

signal_pending_state 是什麼？？
- include/linux/sched/signal.h
```
static inline int signal_pending_state(long state, struct task_struct *p)
{
         if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
                 return 0;
         if (!signal_pending(p))
                 return 0;

         return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
}
```
1. 先透過位元運算檢查是否含有我們所需的位元，沒有的話就直接離開
2. 若含有我們所要求的位元，則檢查是否有待處理的 Signal，沒有的直接離開
3. 最終再次檢查是否含有 TASK_INTERRUPTIBLE 位元或者 SIGKILL Signal
對照到原先的涵式中，如果有需要被中斷，就會執行 goto interrupted;
最外面使用 raw_spin_lock_irqsave ，為何這邊只使用 raw_spin_unlock_irq？？只解除硬體中斷的意義在於？？
```
raw_spin_unlock_irq(&sem->lock);
timeout = schedule_timeout(timeout);
raw_spin_lock_irq(&sem->lock);
```
可以發現，是為了要使用 schedule_timeout 這個涵式，所以才把硬體中斷打開，那麼什麼是 schedule_timeout，有興趣可以自己查閱 kernel/time/timer.c，這已經是 Timer 跟 Scheduler 的範疇了，這邊只解釋其應用的意義

參考 Small Delay

Linux 中有數種 Delay 方法
- udelay mdelay
  進行 Busy Waiting ，這兩種方法是應用於 Driver 開發中，因為使用 Timer 來作計時可能會不夠精準
- schedule_timeout
  搭配 Timer 計算時間，直接使程式進入 Sleep State 沉睡一段時間，而不是 Busy Waiting
何時把 waiter.up 設為 true ？？
等到呼叫 __UP 涵式時，才會將其改為 true
return -ETIME return -EINTR？？？
參考 Operating System Error Codes, man 2 syscall
ETIME EINTR 是一種錯誤碼

- 只是指負號，在 system call 中通常會回傳一個負數的錯誤碼，表示錯誤，不會特別再使用暫存器來記錄錯誤

`void up(struct semaphore *sem)`

void up(struct semaphore *sem)
{
        unsigned long flags;

        raw_spin_lock_irqsave(&sem->lock, flags);
        if (likely(list_empty(&sem->wait_list)))
                sem->count++;
        else
                __up(sem);
        raw_spin_unlock_irqrestore(&sem->lock, flags);
}

static noinline void __sched __up(struct semaphore *sem)
{
        struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
                                                struct semaphore_waiter, list);
        list_del(&waiter->list);
        waiter->up = true;
        wake_up_process(waiter->task);
}

up 涵式相對簡單，只要把 waiter->up 設為 true ，即可成功釋出資源

Other API

int down_interruptible(struct semaphore *sem)
int down_killable(struct semaphore *sem)
int down_timeout(struct semaphore *sem, long jiffies)
以上三個函數都是使用 __down 只是傳速的參數不同，其中 down_timeout 又再度提到了 Timer，使用他來計時，決定多久要 Timeout
int down_trylock(struct semaphore *sem)
與 spin_trylock 類似，失敗後直接結束