Day6 演算法(4) Hash(雜湊)

2024 iThome 鐵人賽

DAY 6

佛心分享-IT 人自學之術

什麼都摸一點！拒絕當不沾鍋！系列第 6 篇

16th鐵人賽

kkeea1020

2024-08-29 13:16:15

166 瀏覽

分享至

什麼是Hash?

通常來說，我們要紀錄一個東西出現的次數或有沒有出現過，我們會用到Hash的概念。

Hash是一種映射關係(Key-Value對)，陣列也是個Hash，只是陣列的key綁定為數字(索引)。

舉例來說：想要紀錄”aabccd”，這些英文字母各自的出現次數，我們可以用一個陣列空間為26的陣列array去紀錄，例如array[0]代表的是a出現的個數，array[1]代表的是b出現的個數：

int array[26] = {0}; // 全部初始化為0
str::string word = "aabccd";
for (int i = 0; i < word.size(); ++i)
{
	 // C++中字元相減是用ASCII所對應到的號碼做數字的減法
	 // 舉例來說，word[0] = 'a'，'a'(65) - 'a'(65) 就會是0，'b' - 'a' 就會是1
	array[word[i] - 'a']++;
}
for (int i = 0; i < 26; ++i)
{
	std::cout << char('a' + i) << " : " << array[i] << '\n';
}

1207. Unique Number of Occureences

Given an array of integers arr, return true if the number of occurrences of each value in the array is unique or false otherwise.

Example 1:

Input: arr = [1,2,2,1,1,3]
Output: true
Explanation: The value 1 has 3 occurrences, 2 has 2 and 3 has 1. No two values have the same number of occurrences.

C++:

// Sol1:
class Solution {
public:
    bool uniqueOccurrences(vector<int>& arr) {
        std::array<int, 2001> occurrences({0});
        for (int i = 0; i < arr.size(); ++i)
            occurrences[arr[i] + 1000]++; 
            // 為什麼+1000? 注意題目的Constraint! -1000 <= arr[i] <= 1000 
            
        std::sort(occurrences.begin(), occurrences.end());
        
        for (int i = 1; i < occurrences.size(); ++i)
		        // 已經做完排序，所以occurrences[i] == occurrences[i - 1]可以判斷是否唯一的。
            if (occurrences[i] != 0 && occurrences[i] == occurrences[i - 1])
                return false; 
        return true;
    }
};

// Sol2 空間換時間:
class Solution {
public:
    bool uniqueOccurrences(vector<int>& arr) {
        int occurrences[2001] = {0};
        for (int i = 0; i < arr.size(); ++i)
            occurrences[arr[i] + 1000]++;
            
        // 將出現頻率再映射到另一個陣列，如果這個出現頻率已經出現過(true)，則代表不唯一。
        bool isTaken[1001] = {false}; //至多出現1000次，所以只需要1001格(0~1000)
        for (int i = 0; i < 2001; ++i)
        {
            if (occurrences[i] == 0) // 沒出現過的不用管
                continue;
            if (isTaken[occurrences[i]] == true)
                return false;
            else
                isTaken[occurrences[i]] = true;
        }
        return true;
    }
};

Python:

class Solution(object):
    def uniqueOccurrences(self, arr):
        """
        :type arr: List[int]
        :rtype: bool
        """
        occurrences = [0] * 2001
        for num in arr:
            occurrences[num + 1000] += 1
        isTaken = [0] * 1001
        for occurrence in occurrences:
            if occurrence != 0 and isTaken[occurrence] == True:
                return False
            else:
                isTaken[occurrence] = True
        return True

202. Happy Number

Write an algorithm to determine if a number n is happy.

A happy number is a number defined by the following process:

Starting with any positive integer, replace the number by the sum of the squares of its digits.
Repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1.
Those numbers for which this process ends in 1 are happy.

Return true if n is a happy number, and false if not.

Example 1:

Input: n = 19
Output: true
Explanation:
12 + 92 = 82
82 + 22 = 68
62 + 82 = 100
12 + 02 + 02 = 1

根據題目描述，我們要思考怎樣會無限循環下去?，A→B→C→A，形成一個環的就會無限循環，也就是說在整個過程中，絕對不能有數字重複出現(跟頻率有關)，不然就會進入死循環。

所以這題我們要用Hash：

Note:在用陣列的hash法時，我們要先確定範圍，n最大為2^31 - 1是10位數，那以平方總和最高的n就會是1999999999(1個1和9個9)，總和是730，所以我們要開731大小的陣列

C++:

// Sol1:
class Solution {
public:
    int happy(int n)
    {
        int sum = 0;
        while (n)
        {
            sum += (n % 10) * (n % 10); // % 的優先級低於 * 所以要括號！
            n /= 10;
        }
        return sum;
    }
    bool isHappy(int n) {
        std::array<bool, 731> hash({false});
        while (n > 1) // 不用擔心跑不出去，因為我們會去判斷死循環的唯一可能(重複出現)
        {
            int tmp = happy(n); // 計算每一位數的平方總和
            if (hash[tmp] == true) // 看是否出現過
                return false;
            hash[tmp] = true;
            n = tmp; // 將目前的n更新為tmp(上面算的平方總和)
        }
        return true; // 跑出上面的while循環就代表到1了！
    }
};

// Sol2(set):
class Solution {
public:
    int happy(int n)
    {
        int sum = 0;
        while (n)
        {
            sum += (n % 10) * (n % 10); 
            n /= 10;
        }
        return sum;
    }
    bool isHappy(int n) {
        std::set<int> st; // 有key沒value
        while (n > 1) 
        {
            int tmp = happy(n); 
            if (st.find(tmp) != st.end()) // 曾經出現過的意思(請參考C++ iterator)
                return false;
            st.insert(tmp); // 將該數放入set中
            n = tmp; 
        }
        return true; 
    }
};

Python:

class Solution(object):
    def happy(self, n):
        total_sum = 0
        while n > 0:
            digit = n % 10
            total_sum += digit * digit
            n //= 10
        return total_sum

    def isHappy(self, n):
        """
        :type n: int
        :rtype: bool
        """
        seen = set()
        while n != 1 and n not in seen:
            seen.add(n)
            n = self.happy(n)
        return n == 1

1002. Find Common Characters

Given a string array words, return an array of all characters that show up in all strings within the words (including duplicates). You may return the answer in any order.

Example 1:

Input: words = ["bella","label","roller"]
Output: ["e","l","l"]

這題給大家練習看看，先試著思考一遍，實作一遍，如果看不懂解答的話，將整個程式的流程都畫圖畫出來：

class Solution {
public:
    vector<string> commonChars(vector<string>& words) {
        std::array<int, 26> hash1({0});
        for (char c : words[0])
            hash1[c - 'a']++;
        for (int i = 1; i < words.size(); ++i)
        {
            std::array<int, 26> hash2({0});
            for (char c : words[i])
                hash2[c - 'a']++;
            for (int i = 0; i < 26; ++i)
                hash1[i] = std::min(hash1[i], hash2[i]);
        }
        std::vector<std::string> result;
        for (int i = 0; i < 26; ++i)
            while(hash1[i]--)
                result.push_back({char('a' + i)});
        return result;
    }
};

C++中的map(對應到Python的Dict)、set，相較於普通陣列的好處在於：

陣列的key為索引，但map(key-value)、set(key)的key都能是任意數據型態(非基本類型需自行實現hash function)
map、set是離散存取(基於Tree的結構)，所以比較省空間(如果你想記錄50000這個數的出現次數，你就得開這麼50001大的陣列，如果單純使用Linked List儲存，查找速度也會是個問題)，有興趣可以去看C++ map和set的底層實現！

(依照底層實現的不同，C++中的map、unordered_map、set、unordered_set在查找、插入、刪除的時間複雜度也有所不同)