Lock Performance & Priority Inversion

第 11 屆 iThome 鐵人賽

DAY 9

自我挑戰組

Linux Inside - Synchronization & Interrupt系列第 9 篇

11th鐵人賽

ian910297

2019-09-10 14:49:23

6492 瀏覽

分享至

Lock Performance

主要針對 Spinlock 與 Mutex 兩者去作比較

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>
#include <sys/time.h>

int THREAD_NUM;
int WORKLOAD;

#ifdef USE_SPINLOCK
pthread_spinlock_t spinlock;
#else
pthread_mutex_t mutex;
#endif

void *worker(void *arg) {
#ifdef USE_SPINLOCK
    pthread_spin_lock(&spinlock);
#else
    pthread_mutex_lock(&mutex);
#endif
    for(int i=0; i<WORKLOAD; ++i) {
        asm("nop");
    }
#ifdef USE_SPINLOCK
    pthread_spin_unlock(&spinlock);
#else
    pthread_mutex_unlock(&mutex);
#endif
}

int main(int argc, char *argv[]) {
    int THREAD_NUM = atoi(argv[1]);
    int WORKLOAD = atoi(argv[2]);

    double duration;
    struct timeval start, end;
#ifdef USE_SPINLOCK
    pthread_spin_init(&spinlock, 0);
#else
    pthread_mutex_init(&mutex, NULL);
#endif

    gettimeofday(&start, NULL);
    pthread_t tpool[THREAD_NUM];
    for(int i=0; i<THREAD_NUM; ++i) {
        pthread_create(&tpool[i], NULL, worker, NULL);
    }
    for(int i=0; i<THREAD_NUM; ++i) {
        pthread_join(tpool[i], NULL);
    }
    gettimeofday(&end, NULL);
    duration = (end.tv_sec - start.tv_sec) * 1000 +
        (double)(end.tv_usec - start.tv_usec) / 1000.0f;
    //printf("#Elapsed_time: %.3lf ms\n", duration);
    printf("%.3lf", duration);

#ifdef USE_SPINLOCK
    pthread_spin_destroy(&spinlock);
#else
    pthread_mutex_destroy(&mutex);
#endif

    return 0;
}

假設現在第一隻程式長這樣，主要有只有兩個重點

THREAD_NUM Thread 的數量
WORKLOAD Workload 要跑多少個迴圈

然後我們寫一個 Shell Script 來作觀測

#!/bin/bash
THREAD_NUM_BEGIN=1
THREAD_NUM_END=128

WORKLOAD_BEGIN=100
WORKLOAD_END=10000000

TEST_TIMES=30

gcc -DUSE_SPINLOCK -pthread thread.c -o spin.out
gcc -pthread thread.c -o mutex.out

> mutex.data
> spin.data

printf "THREAD_NUM\tWORKLOAD\t\tMUTEX\tSPIN\n"
for ((i=$THREAD_NUM_BEGIN; i<=$THREAD_NUM_END; i*=2));
do
    for ((j=$WORKLOAD_BEGIN; j<=$WORKLOAD_END; j*=10));
    do
        mutex_cuml_time=0
        spin_cuml_time=0
        for k in $(seq 1 $TEST_TIMES);
        do
            # mutex
            tmp=$(./mutex.out $i $j)
            cmd="python3 -c 'print($mutex_cuml_time+$tmp)'"
            mutex_cuml_time=$(eval $cmd)
            #echo $mutex_cuml_time
            echo $i, $j, $tmp >> mutex.data

            # spin
            tmp=$(./spin.out $i $j)
            cmd="python3 -c 'print($spin_cuml_time+$tmp)'"
            spin_cuml_time=$(eval $cmd)
            #echo $spin_cuml_time
            echo $i, $j, $tmp >> spin.data
        done

        # calculate total time
        cmd="python3 -c 'print($mutex_cuml_time+$TEST_TIMES)'"
        mutex_cuml_time=$(eval $cmd)
        cmd="python3 -c 'print($spin_cuml_time/$TEST_TIMES)'"
        spin_cuml_time=$(eval $cmd)
        printf "%10.d\t%10.d\t\t%.3f\t%.3f\n" $i $j $mutex_cuml_time $spin_cuml_time
    done
done

這會一直印出跑執行的時間，我讓每個 Case 都跑 30 次，至於為什麼是跑這個數字，就當作是根據中央極限定理吧，懶得跑太久～～

從執行結果來看，可以發現 Mutex 對於這種 CS 太小的程式，真的是沒什麼抵抗力

THREAD_NUM      WORKLOAD                MUTEX   SPIN
         1             100              33.456  0.113
         1            1000              33.390  0.111
         1           10000              33.445  0.114
         1          100000              33.247  0.109
         1         1000000              33.364  0.113
         1        10000000              33.418  0.109
         2             100              32.985  0.098
         2            1000              32.841  0.101
         2           10000              32.948  0.097
         2          100000              32.875  0.100
         2         1000000              32.958  0.098
         2        10000000              32.971  0.100
         4             100              34.896  0.216
         4            1000              34.154  0.181
         4           10000              34.177  0.193
         4          100000              36.102  0.156
         4         1000000              35.112  0.209
         4        10000000              39.425  0.273
         8             100              48.403  1.094
         8            1000              38.701  0.251
         8           10000              37.491  0.233
         8          100000              37.128  0.244
         8         1000000              37.217  0.243
         8        10000000              37.669  0.288
        16             100              40.906  0.469
        16            1000              41.131  0.357
        16           10000              47.964  0.433
        16          100000              54.994  0.619
        16         1000000              102.298 2.740
        16        10000000              142.992 3.916
        32             100              71.756  0.892
        32            1000              67.426  1.090
        32           10000              74.947  1.808
        32          100000              65.064  1.458
        32         1000000              58.783  0.696
        32        10000000              48.504  0.813
        64             100              62.291  1.029
        64            1000              63.166  1.238
        64           10000              63.631  1.131
        64          100000              63.434  1.148
        64         1000000              72.804  1.330        
        64        10000000              65.856  1.136      
       128             100              95.063  2.309
       128            1000              91.576  2.097
       128           10000              98.044  2.240
       128          100000              97.825  2.250
       128         1000000              94.199  2.389
       128        10000000              96.054  2.096

之後可以把 CS 區間拉長，修改 worker 的內容再做一次實驗，不過不知道是什麼原因，可能是因為我 CPU 只有四核心，我實在看不太出差異，目前用起來都是 Spinlock 比較快＝＝有什麼進展再貼上來更新好了

Priority Inversion

Priority Inversion 是在使用 Mutex 的時候，可能會發生的問題，因為 Mutex 一次只能允許一個程式進入 CS(Critical Section)

參考 Wikipedia 優先權倒置

假設條件如下：

有三個 Thread T1 T2 T3
編號也同時代表他們的優先權大小，T3 > T2 > T1
有一個 CS ，T1 跟 T3 都需要進入

問題描述：

在最開始時，沒有其他 Thread 正在運行，T1 先執行，所以他就順利進入 CS
T2 在 T1 運行一段時間後進入系統，因為 T2 優先權比 T1 高，打斷 T1 且執行 T2
T3 在 T2 運行一段時間後進入系統，因為 T3 優先權比 T2 高，打斷 T2 且執行 T3

可是 T3 所需的 CS 正在被 T1 持有，所以他只好讓出 CPU 使用權，讓 T2 繼續執行。且 T2 執行完畢後，也要等 T1 執行完畢，成功退出 CS 之後，才能輪到 T3 執行

這就是 Priority Inversion 的定義，雖然優先權比較高，但是實際執行順序卻是 T2 -> T1 -> T3

Solution

Priority Ceiling

參考

顧名思義「優先天花板」＝＝聽起來超級詭異，是指現在所有要進入那個 CS 的程式中，擁有最大優先權的那個數值，就是 Priority Ceiling

只要進入那個 CS 之後，該程式的 Priority 就會被短暫提高到那個數值，前述所提到的最大值

Priority Inheritance

在我們描述的情境中，T1 T3 要進入同一個 CS ，那麼 Priority Inheritance 指的是讓 T1 繼承 T3 的 Priority，這樣作的話 T1 就不會被 T2 打斷，等到退出 CS 之後，再把自身的 Priority 調回來

Others

實際上 Priority Inversion 還有一些解法，不過我這邊就不介紹了，最近還要刷 Leetcode 跟薩爾達傳說，所以有點忙 ......

Mutex

[LAB 背景知識] Blocking Linked List

系列文

Linux Inside - Synchronization & Interrupt 共 18 篇

RSS系列文訂閱系列文

15 人訂閱

完整目錄

直播研討會

{{ item.channelVendor }} {{ item.webinarstarted }} |

直播中

尚未有邦友留言

立即登入留言

參賽組數

1064 組

團體組數

40 組

累計文章數

22201 篇

完賽人數

602 人

15th鐵人賽 16th鐵人賽 13th鐵人賽 14th鐵人賽 12th鐵人賽 11th鐵人賽鐵人賽 2019鐵人賽 javascript 2018鐵人賽 python 2017鐵人賽 windows php c# windows server linux css react vue.js

IT邦幫忙

Linux Inside - Synchronization & Interrupt系列 第 9 篇