Day 25：group anagrams

2022 iThome 鐵人賽

DAY 25

自我挑戰組

30天演算法解題系列第 25 篇

14th鐵人賽演算法

ramenkun

2022-10-09 22:11:21

646 瀏覽

分享至

problem

輸入為一陣列，其中元素為字串，將字串以 anagram (易位構詞) 進行分組，最終以雙層陣列的結構輸出。易位構詞指的是同樣字母以不同順序排列出來的詞，例如 'cinema' 和 'iceman'。

sample input:
words = ['yo', 'act', 'flop', 'tac', 'foo', 'cat', 'oy', 'olfp']

sample output:
[
['foo'],
['flop', 'olfp'],
['oy', 'yo'],
['act', 'cat', 'tac']
]

solution 01

第一個想法是，如果將每個字串中的字元按照字母排序，排序後一樣的字串則為同組的易位構詞。例如下方將字串排序後，可以看出可以分成三組。

['yo', 'act', 'flop', 'tac', 'cat', 'oy', 'olfp'] (陣列一)
↓
['oy', 'act', 'flop', 'act', 'act', 'oy', 'flop'] (陣列二)

這時如果直接排序陣列 (如下陣列三)，雖然可以將同一組集中處理，卻不容易回溯到原本的字串。因此仍然要排序，只是改成以索引值來表示排序後的結果。有索引值後，就可以利用 陣列二[索引] 知道當前在處理哪一組易位構詞，並用 陣列一[索引] 知道原始的字串是什麼。

['act', 'act', 'act', 'flop', 'flop', 'oy', 'oy'] (陣列三)
↓
[1, 3, 4, 2, 6, 0, 5]

如果 s 為字串的數量，n 是最長字串的長度，
time: O(snlog(n) + nslog(s))
其中 snlog(n) 是來自於陣列二中排序每個字串，總共有 s 個字串。而 nslog(s) 是來自於排序索引陣列，而排序的方法是比較長度 n 的字串。
space: O(sn) 因為陣列二空間為 sn，索引值陣列空間為 s，最後回傳的陣列也為 sn。

這種解法概念上可能不難懂，但程式碼實作上會有比較多容易出錯的小細節。

function groupAnagrams(words) {
  if (words.length === 0) return [];
  
  const sortedWords = words.map(word => word.split('').sort().join(''));
  const indices = [...new Array(words.length).keys()]
  
  // 將索引值以字串順序排序
  indices.sort((a, b) => {
    if (sortedWords[a] < sortedWords[b]) return -1;
    if (sortedWords[a] > sortedWords[b]) return 1;
    return 0;
  })

  const result = [];
  let currentAnagram = [];
  let currentGroup = sortedWords[indices[0]]; // 當前組別
  for (const index of indices) {
    const word = words[index];
    const sortedWord = sortedWords[index];
    // 如果仍是同一組 則將原始字串加入
    // 否則更新組別
    if (sortedWord === currentGroup) {
      currentAnagram.push(word);
    } else {
      result.push(currentAnagram);
      currentAnagram = [word];
      currentGroup = sortedWord;
    }
  }
  result.push(currentAnagram);
  return result;
}

solution 02

第二種方法算是延續第一種解，但利用 hash table 作記錄讓操作更優化、簡單。
步驟是：

一樣將每個字串按字母排序：
['yo', 'act', 'flop', 'tac', 'cat', 'oy', 'olfp'] (陣列一)
↓
['oy', 'act', 'flop', 'act', 'act', 'oy', 'flop'] (陣列二)
以陣列二檢查易位構詞的組別是否已經在 hash table 中，如果沒有，則以新的組別為 key，並將原始字串存進去為值。如果有，則將原始字串加入既有的組別中。
例如檢查陣列二[0]，此時還沒有 'oy' 這一組，所以在 hash table 中存進 'oy': ['yo']，而下一次再碰到同一組的字，就可以直接原始字串推入陣列中 'oy': ['yo', 'oy']

同解一，s 為字串的數量，n 是最長字串的長度，
time: O(snlog(n)) 來自於陣列二中排序每個字串，總共有 s 個字串。
space: O(sn) 來自 hash table 及回傳陣列。

function groupAnagrams(words) {
  const anagrams = {};
  for (const word of words) {
    const sortedWord = word.split('').sort().join('');
    if (sortedWord in anagrams) {
      anagrams[sortedWord].push(word);
    } else {
      anagrams[sortedWord] = [word];
    }
  }
  return Object.values(anagrams);
}