The big O of finding an element with sequential search method in a traditional array is O(n).
As the array size grows, it 'll require more time to find.
But if create an array that store the elements with their index, the big O of finding specific index of element in array is O(1), however it requires more memory space to storing elements.
So, it comes to the idea of hash table.
為了加快我們從 dataset 查找特定資料的速度,在一般 array 中做搜尋通常使用線性搜尋(sequancial search)其 Big O 為 O(n),當 array size 越來越大,耗費的時間也越來越長
但從 array index 取得對應的值其 Big O 為 O(1),所以 hash table 的概念由此而生
Hash table is a data structure that implements an associative array, which maps keys to values, you can see hash table as an associative array storing the key value pairs whom's key generate by hash function.
A hash table use hash function to compuate the indices that map to the values.
Each position in the hash table, often called a slot.
A map implemented by a hash table is called a hash map.
雜湊表(hash table)是一種資料結構,基本上可以將雜湊表看作是儲存由雜湊函式產生的鍵值對(key value pairs)的關聯陣列,每個鍵值/索引值映射(map)到每個元素的值
其元素的鍵值(key value)由 hash function 所產生,
雜湊表使用雜湊函式來產生映射到值的索引值/鍵值
在雜湊表中的每個位置稱為 slot
通常 slot 用來指稱 hash table 中的單一一個位置或最小單位
使用鏈結處理發生碰撞的情況,在發生碰撞的位置上會有多個元素構成的集合,這個集合便稱為 bucket
可以將 bucket 視為將所有發生衝突的元素儲存在一起的陣列 / 鏈結串列
(沒有碰撞的話自然也就不會有 bucket)
A hash function transforms an input value (often the key of an element) into another value (known as a hash code) that represents the position of the element in the hash table. Hashing is commonly used to store passwords in database, where the original password is converted into a hashed value to enhance security.
There are types of hash function,such as:
運算出 index of element 有很多方法,這裡介紹幾個簡單的方法
實際上應用的 hash function 有很多種,並不只這幾種
m = size of hash table
n = numbers of elements to store in the hash table
Index = key mod n
索引值 = 鍵值 % 欲存取在 hash table 內的元素數量
There is a hash table which size = 50
and there's a element which key value is 1359.
What is the hash index of the element with multiplication method?
假設有一個長度為 50 的 hash table
有ㄧ元素 key 值為 1359 ,那該元素使用 division method 產生於 hash table 內的索引值為何?
index = 1359 % 50
Ans: 9
prime number
--
質數
時,其效果很好--
m = size of hash table
n = numbers of elements to store in the hash table
keyA = irrational number(無理數), usually (√5-1) / 2
approximately to 0.618
Index = m*(keyA mod 1)
There is a hash table which size = 50
and there's a element which key value is 1359.
What is the hash index of the element with multiplication method?
假設有一個長度為 50 的 hash table
有ㄧ元素 key 值為 1359 ,那該元素使用 multiplication method 產生於 hash table 內的索引值為何?
index = ⌊50 × ( 1359 × 0.6180339887 mod1)⌋
0.9081
50 * 0.9081 = 45.405
45
Ans: 45
--
(√5 - 1) / 2
)--
(√5 - 1) / 2
無理數構成的常數運算)If the key of an element is a string rather than a number, we need to convert it to a numberic form for hash table storage. One simple (but ineffective) method is using the length of string. A better approach is to calculate the sum of the ASCII values of each character in the string, which can be done using the charAt
method.
To improve hash distributions, we can apply specific rules like multiplying ASCII values by certain constants or taking remainders to generate a more evenly spread hash index.
--
但元素的鍵值不一定會是數字,有時候也會遇到字串型態的鍵值,這時候可以取字串的長度(效率並不好)、用charAt
取字符的 ASCII 值並加總、對加總的 ASCII 值乘除某特定常數或取餘數等等,以讓雜湊值更廣泛分散於雜湊表中
舉例來說
parse(string){
let parseKey =0;
// get the sum of all letters in ASCII type
for(let i=0;i<string.length;i++){
parseKey += string[i];
}
return parseKey;
}
一樣用 class 的寫法來處理,完整程式碼在這裡
為了處理 collision 問題,這裡使用 seperate chaining 的方法(下篇會提到)
每個 slot 會是另一個 array,所以 hash table 會是一個 array of array
當 set 遇到 collision ,則 push({key,value}) 到 this.table[index]
當 get 遇到 collision ,則 loop through this.table[index] 來找到完全符合的元素
另外 multiplication 的 index 於 javascript 中如下,要特別小心括號的位置
return Math.floor(this.size * ((key * ((Math.sqrt(5) - 1) / 2)) % 1);
雜湊函式
https://zh.wikipedia.org/zh-tw/%E6%95%A3%E5%88%97%E5%87%BD%E6%95%B8
hash table
https://en.wikipedia.org/wiki/Hash_table#Collision_resolution
hash functions and list types of hash functions
https://www.geeksforgeeks.org/hash-functions-and-list-types-of-hash-functions/
types of hash functions
https://cs.kenyon.edu/index.php/scmp-218-00-data-structures/types-of-hash-functions/
Hashing
https://runestone.academy/ns/books/published/pythonds3/SortSearch/Hashing.html