3

## 【小馬的LeetCode練功坊】(相當難!) 1044題- 最長的重複字串，用RabinKarp算法做字串匹配

``````Input: "banana"
Output: "ana" (在index1,3各出現一次)
``````

1. 對子字串的長度做二分搜索，因為如果存在長度k的子字串滿足條件，那一定存在長度k-1,k-2,...,1的字串滿足條件
2. 用Rabin-Karp演算法判斷長度k的子字串是否出現2次

Rabin-Karp演算法?

# 嘗試- 搜索子字串是否再後面出現，超時

``````class Solution:
def mySearch(self, text, M):
for i in range(len(text) - M):
if(text.find(text[i:i+M], i+1)!=-1):
return (True, text[i:i+M])
return (False, "")

def longestDupSubstring(self, S: str) -> str:
beg, end = 0, len(S)
Found = ""
while beg + 1 < end:
mid = (beg + end)//2
isFound, candidate = self.mySearch(S, mid)
if isFound:
beg, Found = mid, candidate
else:
end = mid
return Found
``````

# AC- Rabin-Karp演算法 + 二分搜索

Rabin-Karp演算法是一個字串匹配算法，

`bac` 的 hash value為 `(ord('b')*d^2 + ord('a')*d^1 + ord('c')*d^0) % q`
d可以想成是「d進位」的意思(選d=256即可，因為`ord('z')`比256小)，
q選擇一個大質數，
`%q`是避免hash value過大

1. 有可能原始字串不同達到相同的hash value，因此要檢查子字串是否相同 (保證正確性)
2. 選擇一個超大的質數`q`，使得不同字串但hash value相同的機率低，只要hash value相同就直接當做字串相同(不一定永遠正確但快速)

## 解法一、保證正確但較慢(ac, 2448ms)

``````class Solution:

def RabinKarp(self,text, M, q):
if M == 0:
return True
hashVal, base = 0, 256
shift = (base ** (M-1)) % q

dic = defaultdict(list)

for i in range(M):
hashVal= (base * hashVal + ord(text[i]))% q

dic[hashVal].append(i-M+1)

for i in range(1,len(text) - M +1):
hashVal = (base*(hashVal-ord(text[i-1])*shift) + ord(text[i + M -1]))% q
for j in dic[hashVal]:
if text[i:i+M] == text[j:j+M]:
return (True, text[j:j+M])
dic[hashVal].append(i)
return (False, "")

def longestDupSubstring(self, S:str) ->str:
beg, end = 0, len(S)
q = (1<<31) - 1
Found = ""
while beg + 1 < end:
mid = (beg + end)//2
isFound, candidate = self.RabinKarp(S, mid, q)
if isFound:
beg, Found = mid, candidate
else:
end = mid
return Found
``````

## 解法二、不一定永遠正確但快速(ac, 1804ms)

``````class Solution:

def RabinKarp(self,text, M, q):
if M == 0:
return True
hashVal, base = 0, 256
shift = (base ** (M-1)) % q

dic = defaultdict(lambda x:0)

for i in range(M):
hashVal= (base * hashVal + ord(text[i]))% q

dic[hashVal] = i-M+1

for i in range(1,len(text) - M +1):
hashVal = (base*(hashVal-ord(text[i-1])*shift) + ord(text[i + M -1]))% q
if hashVal in dic:
return (True, text[i:i+M])
dic[hashVal] = i
return (False, "")

def longestDupSubstring(self, S:str) ->str:
beg, end = 0, len(S)
q = 35184372088891 #隨意找的大質數
Found = ""
while beg + 1 < end:
mid = (beg + end)//2
isFound, candidate = self.RabinKarp(S, mid, q)
if isFound:
beg, Found = mid, candidate
else:
end = mid
return Found

``````