iT邦幫忙

2024 iThome 鐵人賽

DAY 29
0

The big O of finding an element with sequential search method in a traditional array is O(n).
As the array size grows, it 'll require more time to find.

But if create an array that store the elements with their index, the big O of finding specific index of element in array is O(1), however it requires more memory space to storing elements.

So, it comes to the idea of hash table.

為了加快我們從 dataset 查找特定資料的速度,在一般 array 中做搜尋通常使用線性搜尋(sequancial search)其 Big O 為 O(n),當 array size 越來越大,耗費的時間也越來越長

但從 array index 取得對應的值其 Big O 為 O(1),所以 hash table 的概念由此而生


Hash table

Hash table is a data structure that implements an associative array, which maps keys to values, you can see hash table as an associative array storing the key value pairs whom's key generate by hash function.
A hash table use hash function to compuate the indices that map to the values.

Each position in the hash table, often called a slot.
A map implemented by a hash table is called a hash map.

雜湊表(hash table)是一種資料結構,基本上可以將雜湊表看作是儲存由雜湊函式產生的鍵值對(key value pairs)的關聯陣列,每個鍵值/索引值映射(map)到每個元素的值
其元素的鍵值(key value)由 hash function 所產生,

雜湊表使用雜湊函式來產生映射到值的索引值/鍵值

在雜湊表中的每個位置稱為 slot


Slot & Bucket

  • Slot
    Individual cell or position in hash table.
  • Bucket
    A sequence of slots that are grouped.
    In chaining, a bucket can be thought as a linked list or array at a particular index where all colliding elements stored together.(if there's no collision , there're none of bucket.)

通常 slot 用來指稱 hash table 中的單一一個位置或最小單位
使用鏈結處理發生碰撞的情況,在發生碰撞的位置上會有多個元素構成的集合,這個集合便稱為 bucket
可以將 bucket 視為將所有發生衝突的元素儲存在一起的陣列 / 鏈結串列
(沒有碰撞的話自然也就不會有 bucket)

Hash function 雜湊函式

A hash function transforms an input value (often the key of an element) into another value (known as a hash code) that represents the position of the element in the hash table. Hashing is commonly used to store passwords in database, where the original password is converted into a hashed value to enhance security.

There are types of hash function,such as:

  • Division Method
  • Multiplication Method
  • Mid Square Method
  • Folding Method

運算出 index of element 有很多方法,這裡介紹幾個簡單的方法

  • Division method
  • Multiplication Method
  • Mid Square Method
  • Folding Method

實際上應用的 hash function 有很多種,並不只這幾種


Types of Hash function:Division Method

m = size of hash table
n = numbers of elements to store in the hash table
Index = key mod n

索引值 = 鍵值 % 欲存取在 hash table 內的元素數量

Small quiz about Division Method

There is a hash table which size = 50
and there's a element which key value is 1359.
What is the hash index of the element with multiplication method?

假設有一個長度為 50 的 hash table
有ㄧ元素 key 值為 1359 ,那該元素使用 division method 產生於 hash table 內的索引值為何?

index = 1359 % 50

Ans: 9

Advantage of division method

  • Very fast
  • Easy to implement
  • Works well when m is a prime number

--

  • 執行速度快
  • 邏輯簡單,容易寫
  • 當 hash table size 為質數時,其效果很好

Disadvantage of division method

  • the value of m has to be far enough to 2^P, where P is an positive integer.
  • if the naming conventions are similar, it might has bigger chance to get more collisions.

--

  • 選擇 size if hash table 時要盡可能遠離 2 的 p 次方
  • 當 key 不為數字,發生 collisions 的情況會大幅增加

Types of Hash function:Multiplication Method

m = size of hash table
n = numbers of elements to store in the hash table
keyA = irrational number(無理數), usually (√5-1) / 2 approximately to 0.618

Index = m*(keyA mod 1)

  • mod 1 的意思就是留下小數點後的數值
    E.g. 97.4567 mod 1 = 0.4567

Small quiz about Multiplication Method

There is a hash table which size = 50
and there's a element which key value is 1359.
What is the hash index of the element with multiplication method?

假設有一個長度為 50 的 hash table
有ㄧ元素 key 值為 1359 ,那該元素使用 multiplication method 產生於 hash table 內的索引值為何?

index = ⌊50 × ( 1359 × 0.6180339887 mod1)⌋

  1. 1359 * 0.618033 = 839.9081
  2. 839.9081 mod 1 = 0.9081
  3. Multiply by the table size 50 * 0.9081 = 45.405
  4. Take the floor 45

Ans: 45

Advantage of multiplication method

  • Reduce the chances of collisions.
  • Works well with table size of power of 2.

--

  • 大大減少 collisions 的機會(與 division method 相比)
  • 即使 table size 為 2 的 n 次方也可以使用

Disadvantage of multiplication method

  • More complex to implement
  • Less intutive(choosing a good constant like (√5 - 1) / 2)

--

  • 比較難實做出來(與 division method 相比)
  • 寫法較不直覺(計算時牽涉 (√5 - 1) / 2 無理數構成的常數運算)

What if the key of element is not number..

If the key of an element is a string rather than a number, we need to convert it to a numberic form for hash table storage. One simple (but ineffective) method is using the length of string. A better approach is to calculate the sum of the ASCII values of each character in the string, which can be done using the charAt method.

To improve hash distributions, we can apply specific rules like multiplying ASCII values by certain constants or taking remainders to generate a more evenly spread hash index.

--

但元素的鍵值不一定會是數字,有時候也會遇到字串型態的鍵值,這時候可以取字串的長度(效率並不好)、用charAt 取字符的 ASCII 值並加總、對加總的 ASCII 值乘除某特定常數或取餘數等等,以讓雜湊值更廣泛分散於雜湊表中

舉例來說

parse(string){
    let parseKey =0;
    // get the sum of all letters in ASCII type
    for(let i=0;i<string.length;i++){
        parseKey += string[i];
    }
    return parseKey;
}

Implementation hash table in javascript

一樣用 class 的寫法來處理,完整程式碼在這裡
為了處理 collision 問題,這裡使用 seperate chaining 的方法(下篇會提到)
每個 slot 會是另一個 array,所以 hash table 會是一個 array of array

當 set 遇到 collision ,則 push({key,value}) 到 this.table[index]
當 get 遇到 collision ,則 loop through this.table[index] 來找到完全符合的元素

另外 multiplication 的 index 於 javascript 中如下,要特別小心括號的位置

return Math.floor(this.size * ((key * ((Math.sqrt(5) - 1) / 2)) % 1);

其他資源

雜湊函式
https://zh.wikipedia.org/zh-tw/%E6%95%A3%E5%88%97%E5%87%BD%E6%95%B8
hash table
https://en.wikipedia.org/wiki/Hash_table#Collision_resolution
hash functions and list types of hash functions
https://www.geeksforgeeks.org/hash-functions-and-list-types-of-hash-functions/
types of hash functions
https://cs.kenyon.edu/index.php/scmp-218-00-data-structures/types-of-hash-functions/
Hashing
https://runestone.academy/ns/books/published/pythonds3/SortSearch/Hashing.html


上一篇
Stack & Queue 堆疊與佇列-day 28
下一篇
Collision & Handle Collisions-day 30
系列文
演算法與資料結構入門:30天基礎學習之旅30
圖片
  直播研討會
圖片
{{ item.channelVendor }} {{ item.webinarstarted }} |
{{ formatDate(item.duration) }}
直播中

尚未有邦友留言

立即登入留言