Trying to compress a file myself: how much does it actually help? (5) It's compressed, but can it be restored?

csv, json, data compression, text encoding

rex1206 2023-10-21 06:20:07 ‧ 987 views

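import csv

# Compressed payload
with open('data.bin', 'rb') as file:
    en = file.read()

# Dictionary of substituted byte strings; entries are separated by b'\xff\xff'
with open('dictionary.bin', 'rb') as file:
    lst = file.read().split(b'\xff\xff')

# Byte values stored as integers separated by ", "
with open("not_use_bytes.txt", "r", encoding="UTF-8") as f:
    not_use_bytes = [int(i) for i in f.read().split(", ")]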

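# Walk the dictionary from the last entry down to the first
for i in range(10*256+160-1, -1, -1):
    if i > 159:
        # i from 10*256+160-1 down to 160 maps to the two-byte code
        # bytes([(i-160) // 256, (i-160) % 256]), i.e. b'\x09\xff' down to b'\x00\x00'
        en = en.replace(bytes([(i-160) // 256, (i-160) % 256]), lst[i])
    else:
        # i from 159 down to 0 maps to one of the 160 single bytes in
        # not_use_bytes[10:170], each standing for the string stored in lst[i]
        en = en.replace(bytes([not_use_bytes[i+10]]), lst[i])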
    
en = en.decode('utf-8')
lst = en.split(',')  # rebuild the flat list of fields
lst2 = []
# Every 5 fields (columns) make up 1 row
for i in range(len(lst)//5):
    lst2.append(lst[i*5:i*5+5])
    for j in range(5):
        # '=' means "same as above": copy the value from the same column of the previous row
        if lst2[-1][j] == '=':
            lst2[-1][j] = lst2[-2][j]

with open('台灣郵遞區號.csv', 'w', newline='', encoding="UTF-8") as csvfile:  # 'w' overwrites, 'a' appends
    csv.writer(csvfile).writerows(lst2)
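
One consequence of running the replacement loop from the last dictionary entry down to the first is that the two-byte codes get expanded before the one-byte codes. The sketch below reproduces that layout on toy data to suggest why the order matters. The function name `decode_tokens`, the four-entry dictionary, and the one-byte codes 1 and 2 are all made up for illustration; the real script uses 160 one-byte codes and 2,560 two-byte codes.

```python
def decode_tokens(data, dictionary, single_codes):
    """Expand substitution codes back into the byte strings they stand for.

    Entries 0 .. len(single_codes)-1 use a one-byte code; every later entry
    uses a two-byte code (high byte, low byte), as in the script above.
    """
    n_single = len(single_codes)
    # Highest index first, so two-byte codes are expanded before one-byte codes.
    for i in range(len(dictionary) - 1, -1, -1):
        if i >= n_single:
            k = i - n_single
            code = bytes([k // 256, k % 256])
        else:
            code = bytes([single_codes[i]])
        data = data.replace(code, dictionary[i])
    return data


# Toy data (hypothetical): entries 0-1 use the one-byte codes 0x01 and 0x02,
# entries 2-3 use the two-byte codes b'\x00\x00' and b'\x00\x01'.
dictionary = [b"Taipei", b"Road", b"Section", b"District"]
single_codes = [1, 2]
encoded = b"\x01 City, Daan \x00\x01, Heping East \x02, \x00\x00 2"

print(decode_tokens(encoded, dictionary, single_codes))
# b'Taipei City, Daan District, Heping East Road, Section 2'
```

Here the two-byte code `b'\x00\x01'` ends in a byte that is also a one-byte code. Running the same loop in ascending order would replace that lone `\x01` first and corrupt the output.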

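The '=' back-fill step can also be tried in isolation. Below is a minimal sketch on made-up fields; the function name `expand_rows` and the sample values are hypothetical, not taken from the real postal-code file. Unlike the script, it leaves a '=' in the very first row untouched instead of indexing a row that does not exist.

```python
def expand_rows(fields, width=5, marker="="):
    """Group a flat list of fields into rows and resolve "same as above" markers."""
    rows = []
    for start in range(0, len(fields) - width + 1, width):
        row = list(fields[start:start + width])
        for j, value in enumerate(row):
            # Copy from the previous row, which has already been filled in,
            # so a run of markers down one column resolves correctly.
            if value == marker and rows:
                row[j] = rows[-1][j]
        rows.append(row)
    return rows


# Toy data (hypothetical): three rows of five columns each.
fields = [
    "100", "A City", "B District", "C Street", "all",
    "=", "=", "=", "D Street", "=",
    "101", "=", "E District", "=", "odd",
]
for row in expand_rows(fields):
    print(row)
# ['100', 'A City', 'B District', 'C Street', 'all']
# ['100', 'A City', 'B District', 'D Street', 'all']
# ['101', 'A City', 'E District', 'D Street', 'odd']
```
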
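Final test: the restored file is identical to the original. Hooray!
This step took me several hours, and the reason is that long block of comments back in part 3. All I can say is that without guidance from someone who has done this before, fumbling along with home-grown methods leads to all sorts of baffling problems.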
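
A check like this can also be automated instead of done by eye. The sketch below shows one possible way; the helper names `same_content` and `same_rows` are made up here, and `original.csv` in the usage note stands for wherever the source file actually lives. Since `csv.writer` ends lines with `\r\n` by default, a byte-for-byte comparison can fail purely because of line endings, so a row-level comparison is included too.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_content(path_a, path_b):
    """True when both files contain exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)


def same_rows(path_a, path_b, encoding="utf-8"):
    """True when both CSV files parse to the same rows, ignoring line endings."""
    with open(path_a, newline="", encoding=encoding) as fa, \
         open(path_b, newline="", encoding=encoding) as fb:
        return list(csv.reader(fa)) == list(csv.reader(fb))
```

For example, `same_rows('original.csv', '台灣郵遞區號.csv')` compares the parsed rows, while `same_content` with the same arguments is the stricter byte-level test.
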