2023 iThome 鐵人賽

DAY 11

自我挑戰組

CS補完計畫—演算法與資料結構的第三次衝擊系列第 11 篇

Day 11 吾心吾行澄如明鏡，所作所為皆為 SPEED 其一

15th鐵人賽資料結構與演算法廢文貓咪效能優化

暗魂駭客

團隊每件事都壓死線的大家真的可以完賽嗎？

2023-09-26 21:30:08

703 瀏覽

分享至

在前十天中，我們重點主要放在複雜度的理論和分析的手法。可是理論都知道了，但在實際實作中到底要怎麼寫才能使程式的效率一如預期或甚至比想像更好呢？這就是我們接下來兩天要探討的主題！

時間複雜度的陷阱

給我看清楚了，別把甚麼東西都當 $\mathcal{O}{(1)}$ 啊（迫真）!

小心有些操作根本就不是 $\mathcal{O}{(1)}$ ，如 C 的 strlen()、std::vector::erase()、std::vector::operator<=

在使用一些內建函式或是容器前請先確定好它的複雜度符合需求。

以下會舉兩個因為錯判操作的時間複雜度而超時的案例：

案例 1

Codeforces 1066C

你有個書架，你想在上面放一些書。

因此你給出三種類型的詢問：

L id ：將編號為 id 的書放到書架上的所有書之中的最左的位置
R id ：將編號為 id 的書放到書架上的所有書之中的最右的位置
? id ：計算最少需要從書架上拿下多少書，才能使編號為 id 的書能成為書架上最左、或最右的書

詢問一共有 q 個。書籍的絕對位置不重要、保證第三種請求出現時，書架上必有編號為 id 的書、保證不會出現兩本編號一樣的書。

$1\le q\le2\times 10^5$
$1\le id\le2\times 10^5$

Time limit exceeded 的寫法

TLE 的 submission

void solve() {
    int q;
    cin >> q;
    vector<int> vec;
    while (q--) {
        char op;
        int id;
        cin >> op >> id;
        if (op == 'L') {
            vec.insert(vec.begin(), id);
        } else if (op == 'R') {
            vec.push_back(id);
        } else {
            auto it = find(vec.begin(), vec.end(), id);
            cout << min(it - vec.begin(), vec.end() - it - 1) << "\n";
        }
    }
}

為什麼會 TLE？

可以注意到我是使用 vector 來直接模擬題目中書架的行為，while 迴圈只會跑 q 次，q 頂多到 $2\times 10^5$ ，這種步驟數沒理由 TLE 呀？

看來只能是中間呼叫的函式的複雜度出問題了，這種時候去 cppreference 對應條目的 Complexity 看看：

std::vector::insert
(1)： iterator insert( const_iterator pos, const T& value );
(2)： iterator insert( const_iterator pos, T&& value );
```
1,2) Constant plus linear in the distance between pos and end of the container.
```
因為每次呼叫都是 insert 在開頭的位置，所以每次至多會花費 vec.end() - vec.begin() 的時間。
std::vector::push_back
```
Amortized constant.
```
均攤常數，代表它每次呼叫平均是（昨天均攤分析的對象就是它）
std::find
(1)： InputIt find( InputIt first, InputIt last, const T& value );
(2)： ForwardIt find( ExecutionPolicy&& policy, ForwardIt first, ForwardIt last, const T& value );
```
Given N as std::distance(first, last):

1,2) At most N comparisons with value using operator==.
```
最多 N 次比較，而 N 至多為 vec.end() - vec.begin()
std::min
```
1,2) Exactly one comparison.
```
只有一次比較，因此是。

這樣全部擺出來看後，很明顯問題最大的就是 std::find、vector::insert，因為我們程式的迴圈會跑 q 次，而且按我們的寫法至多要在 vector 中保存 q 本書籍，因此他們兩個函式本身按照上面的資料，至多都要跑 q 次，因此我們程式就這樣硬生生被拖到 $\mathcal{O}{(q^2)}$ ， $2\times 10^5 = 4\times 10^{10}$ ，這樣當然會 TLE。

至於正確寫法不是本節要討論的重點，不過有興趣的可以參考看看這個 submission

案例 2

CSES 1091 Concert Tickets

m 個人依序排隊買 n 張演唱會的票。

每個人會事先說出自己所能接受的最貴價格，當他們排到隊時，他們會買到價格小於等於他們提出的最貴價格的票，且同一張票賣完就沒了，求每位顧客買到票的價格？若買不到就輸出 "-1"。

$1\le n, m\le 2\times 10^5$
$1\le h_i, t_i\le 10^9$

Time limit exceeded 的寫法

TLE 的 submission

void solve() {
    int n, m;
    cin >> n >> m;
    vector<int> vec(n);
    for (int &i : vec) cin >> i;
    sort(vec.begin(), vec.end());
    while (m--) {
        int t;
        cin >> t;
        auto it = upper_bound(vec.begin(), vec.end(), t);
        if (it == vec.begin()) {
            cout << "-1\n";
        } else {
            cout << *prev(it) << "\n";
            vec.erase(prev(it));
        }
    }
}

為什麼會 TLE？

乍看之下開銷最大的是程式內總共會跑 m 次的迴圈，不過即使如此它的次數也僅僅只有到 $2\times 10^5$ ，完全沒有理由 TLE！

這種時候去 cppreference 對應條目的 Complexity 看看：

std::sort

O(N·log(N)) comparisons, where N is std::distance(first, last).

在我們的狀況 n 就是 N

std::upper_bound

The number of comparisons performed is logarithmic in the distance between first and last (at most log_2(last - first) + O(1) comparisons).
However, for non-LegacyRandomAccessIterators, the number of iterator increments is linear. Notably, std::map, std::multimap, std::set, and std::multiset iterators are not random access, and so their member upper_bound functions should be preferred.

因為我們使用的 std::vector 的疊代器屬於隨機存取疊代器，因此它在我們程式中的時間複雜度屬於 $\mathcal{O}{(\log{n})}$

std::vector::erase

Linear: the number of calls to the destructor of T is the same as the number of elements erased, the assignment operator of T is called the number of times equal to the number of elements in the vector after the erased elements.

線性時間意味著 $\mathcal{O}{(n)}$

因此我們可以發現 std::vector::erase 是這段程式 TLE 的罪魁禍首，配上外層跑 m 次的迴圈，這樣總時間複雜度是 $\mathcal{O}{(nm)}$ ，最大可是會到 $(2\times 10^5)\times (2\times 10^5)=4\times 10^{10}$ ，絕對 TLE。

至於正確寫法不是本節要討論的重點，不過有興趣的可以參考看看這個 submission

你什麼時候產生了沒有物件複製的錯覺

善用參考、小心右值產生臨時物件

當你在操作較大的資料物件如：String 時如果沒有仔細注意寫法，可能會不經意複製出臨時物件而造成額外的記憶體、時間開銷。

本節將介紹幾種會造成物件複製的寫法、以及該如何避免

傳值？傳址？

當函數呼叫時，參數若是用傳值的，就等於把參數的物件「複製」一遍，此時若傳的是像 string、vector 等比較佔記憶體的物件，不斷的複製會造成很高的時間成本。

就像以下的實驗：

#include <bits/stdc++.h>

using namespace std;

int f(string s, int x) {
    return s[x];
}

int main() {
    string s(500000, 'a');
    int ans = 0;
    
    auto start = chrono::high_resolution_clock::now();
    for (int i = 0; i < 200000; ++i) {
        ans += f(s, i);
    }
    auto end = chrono::high_resolution_clock::now();
    chrono::microseconds t = chrono::duration_cast<std::chrono::microseconds>(end - start);
    cout << "duration: " << t << "\n";
    
    cout << ans << "\n";
    return 0;
}

上面這段程式最後跑出的結果是：

duration: 2166996µs
19400000

差不多是 2.166 秒，但我們迴圈只有跑區區 $2\times 10^5$ 次，明顯不合理！

把函式呼叫改成傳參考後：

...
int f(string &s, int x) {
    return s[x];
}
...

時間上就明顯看到改善了

duration: 72µs
19400000

運算臨時物件

拿一些較佔記憶體的物件進行基本運算時，它的時間複雜度不一定是 $\mathcal{O}{(1)}$

例如以下的實驗：

#include <bits/stdc++.h>

using namespace std;

int main() {
    string s;
  
    auto start = chrono::high_resolution_clock::now();
    for (int i = 0; i < 200000; ++i) {
        s = s + 'a';
    }
    auto end = chrono::high_resolution_clock::now();
    chrono::microseconds t = chrono::duration_cast<std::chrono::microseconds>(end - start);
    cout << "duration: " << t << "\n";
    
    cout << s << "\n";
    return 0;
}

執行的結果是

duration: 1823021µs
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa // 以下省略

約 1.823 秒，但我們的迴圈只有跑 $2\times 10^5$ 次，這不合理！
會這樣是因為程式在計算 s = s + 'a' 時，一律會先把右邊的東西算完再賦值給左邊的物件，而以現在的狀況，右邊的物件計算時會產生一個 s + 'a' 的臨時物件來存計算結果，再複製給左邊的 s，這樣的過程很耗時。

這種時候可以用 += 運算！因為它通常會直接讓右邊的物件直接作用到左邊的物件上，因而省去了複製、臨時物件的時間成本。

把 + 改成 += 後：

...
    s += 'a';
...

時間明顯大有改善

duration: 520µs
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa // 以下省略

可以讓人家這樣初始化了又初始化了又初始化的嗎!

和記憶體分配接近 $\mathcal{O}{(1)}$ 不一樣！初始化一般來說都是 $\mathcal{O}{(n)}$

記憶體分配在 runtime 的變因很多，不過它運行的時間與其說跟我們要求的記憶體大小相關，倒不如說和我們當下記憶體碎片化的程度關係比較大；而且它的運行時間一般來說也不會太大，所以可以將它視為 $\mathcal{O}{(1)}$ 的操作

不過「初始化」就不一樣了，因為你在初始化某個資料結構如：陣列時，必須完整遍歷所有元素並賦值，因此不管是用什麼實作方式，它一般都是 $\mathcal{O}{(SIZE)}$ ，假設 SIZE 代表你要初始化的長度。

以下先介紹常見的初始化方法：

利用 STL 容器本身的建構式

ector<int> vec(n, 48763);
et<int> st(vec.begin(), vec.end());
ector<pair<int, int>> vec2(n, {1, 2});
ap<int, int> mp(vec2.begin(), vec2.end());
``

memset、std::fill

nt arr[10000];
emset(arr, 0, sizeof(arr));
ill(arr, arr + 10000, 0);
``
較值得一提的是，memset 是逐 byte 進行初始化的，使用上要多加注意，不過它的速度比較快一點，雖然還是 ![https://chart.googleapis.com/chart?cht=tx&chl=%5Cmathcal%7BO%7D%7B%28n%29%7D](https://chart.googleapis.com/chart?cht=tx&chl=%5Cmathcal%7BO%7D%7B%28n%29%7D) 啦；std::fill 就沒有這個限制，雖然會慢一點點，但也沒差那麼多。

Aggregate Initialization、List Initialization

nt arr1[1000] = {}; // 這個和下面的意思一樣！
int arr2[1000] = {0}; // 和上面意思一樣，你打一個數字
int arr3[1000] = {1, 2}; // 這代表前兩個數分別是 1, 2 剩下全部是 0

vector<int> vec {1, 2, 3}; // 初始化三個元素
set<int> st {0}; // 初始化 1 個 0
map<int, int> mp {{1, 2}, {3, 4}}; // 初始化兩個元素：一個鍵 1 值 2、一個鍵 3 值 4
``
知道上面那樣確實是有初始化的喔！也因為是初始化，所以必定會耗費線性的時間，不過這種初始化只能用於宣告的時候就是了。

案例 1

Codeforces 1775B

有一個長度為 n 的陣列 a，a 的第 i 個數字在二進位表示下有 $k_i$ 個位元是 1，這些位元的位置會分別給你。

求是否存在兩個子序列，它們各自 bitwise OR 的結果一樣？

共有 t 組測試資料
子序列定義：對於一個序列，刪除零個或一個以上的元素後的序列。

$1\le n\le 10^5$
$1\le k_i\le 10^5$
$1\le t\le 10^5$
保證題目中所有的 k 加總小於 $10^5$

Time limit exceeded 的寫法

TLE 的 submission

#include <bits/stdc++.h>

using namespace std;

void solve() {
    int check[200050] = {};
    int n;
    cin >> n;
    vector<vector<int>> vec(n);
    for (int i = 0; i < n; ++i) {
        int k;
        cin >> k;
        vec[i].resize(k);
        for (int &p : vec[i]) cin >> p, ++check[p];
    }
    bool ok = false;
    for (int i = 0; i < n; ++i) {
        bool flag = true;
        for (int j = 0; j < (int)vec[i].size(); ++j) {
            if (check[vec[i][j]] - 1 <= 0) {
                flag = false;
                break;
            }
        }
        if (flag) ok = true;
    }
    if (ok) cout << "Yes\n";
    else cout << "No\n";
}

int main() {
    ios::sync_with_stdio(0), cin.tie(0);
    int t;
    cin >> t;
    while (t--) solve();
}

為什麼會 TLE？

乍看之下，最內層迴圈最多跑 k 次呀？而 k 加總最多為 $10^5$ ，怎麼看多綽綽有餘，這不合理！

其實問題都出在第 2 行的 int check[200050] = {}; ，前面說過這種寫法也是在初始化，所以它的時間複雜度也必定是線性的，而在此情況下，我們每跑一次 solve() 都會初始化一次。

因此其實我們的步驟數應該是 $t\times 200050=10^5\times 200050=20005000000$ 難怪會 TLE。