[Day 22] File Types

2022 iThome Ironman Challenge (14th), Self-Challenge category

Learning Python in 30 Days from a Front-End Perspective series, part 22

allieschen

2022-10-06 23:29:08

JSON
- Converting JSON to a dictionary
- Converting a dictionary to JSON
- Saving a JSON file
CSV
xlsx
XML

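This post contains my study notes and takeaways from reading Asabeneh's 30 Days Of Python: Day 19 - File Handling.

Day 21 was all about working with txt files. Today's examples cover files with the JSON, CSV, xlsx, and XML extensions.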

JSON

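Front-end developers should be very familiar with this one. In JavaScript (JS from here on) it is close to the object type, and in Python it is close to the dictionary type. It can also be represented as a string (JSON is text to begin with):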

person_dct = {
    "name": "John",
    "city": "Taipei",
    "skills": ["JavaScript", "Python"]
}
# It can also be written as a multi-line string, which is easier to read
person_json = """{
    "name": "John",
    "city": "Taipei",
    "skills": ["JavaScript", "Python"]
}"""

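Converting JSON to a dictionary

Import the json module and use its loads method to do the conversion. This module will also come in handy later, when we run a server and handle AJAX requests: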

import json
person_json = """{
    "name":"John",
	"city":"Taipei",
	"skills":["JavaScript", "Python"]
}"""

person_dct = json.loads(person_json)
print(type(person_dct)) # <class 'dict'>
print(person_dct) # {'name': 'John', 'city': 'Taipei', 'skills': ['JavaScript', 'Python']}
print(person_dct["name"]) # John
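
To see how JSON values map to Python types, here is a minimal sketch. The sample string and the names sample_json and data are made up for illustration and are not from the original article.

```python
import json

# Hypothetical sample data, only for illustration
sample_json = '{"name": "John", "age": 30, "active": true, "nickname": null, "skills": ["JavaScript", "Python"]}'

data = json.loads(sample_json)

# JSON object -> dict, array -> list, true/false -> True/False, null -> None
for key, value in data.items():
    print(key, type(value).__name__)

# Invalid JSON (for example, single-quoted keys) raises json.JSONDecodeError
try:
    json.loads("{'name': 'John'}")
except json.JSONDecodeError as err:
    print("Invalid JSON:", err.msg)
```

Coming from JS, the main thing to remember is that true, false, and null become True, False, and None on the Python side.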

Converting a dictionary to JSON

Use the dumps method for this conversion. A server can do this when preparing the response to an AJAX request:

import json
person_dct = {
	"name":"John",
	"city":"Taipei",
	"skills":["JavaScript", "Python"]
}
person_json = json.dumps(person_dct, indent=4)
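# indent can be any non-negative integer (2, 4, 8, ...); it only affects readability
# The default is None, which produces a compact single-line string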
print(type(person_json)) # <class 'str'>
print(person_json)
"""
{
    "name": "John",
    "city": "Taipei",
    "skills": [
        "JavaScript",
        "Python"
    ]
}
"""

Saving a JSON file

Combining the file handling approach from Day 21 with the json module's dump method (no s) lets us write a JSON file:

import json
person_dct = {
	"name":"John",
	"city":"Taipei",
	"skills":["JavaScript", "Python"]
}
with open("./json_example.json", "w", encoding="utf-8") as f:
    json.dump(person_dct, f, ensure_ascii=False, indent=4)

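If dump's ensure_ascii parameter is True (the default), every non-ASCII character in the output is escaped as a \uXXXX sequence. Set it to False to write those characters as-is.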
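
To see what ensure_ascii does in practice, here is a small sketch. The data (a city name written in Chinese) and the temporary file location are my own assumptions for the demo, not part of the original article.

```python
import json
import os
import tempfile

# Hypothetical data containing non-ASCII characters
person_dct = {"name": "John", "city": "台北"}

escaped = json.dumps(person_dct)                  # ensure_ascii=True (default)
raw = json.dumps(person_dct, ensure_ascii=False)  # keep characters as-is

print(escaped)  # non-ASCII characters appear as \uXXXX escapes
print(raw)      # the city is written as 台北

# Both forms load back to the same dictionary
assert json.loads(escaped) == json.loads(raw) == person_dct

# Round trip through a file in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "json_example.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(person_dct, f, ensure_ascii=False, indent=4)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == person_dct)  # True
```

The value has to be the boolean False. A non-empty string such as "False" is truthy in Python, so passing it behaves like ensure_ascii=True.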

CSV

CSV stands for comma separated values. As the name suggests, it is a file whose values are separated by commas:

"name","city","skills"
"John","Taipei","JavaScript, Python"

The csv module can be used to read the file:

import csv
with open("./csv_example.csv", newline="", encoding="utf-8") as f:
    csv_reader = csv.reader(f, delimiter=",")
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f"| {' | '.join(row)} |")
            print("|"+"---|"*len(row))
            line_count += 1
        else:
            print(f"| {' | '.join(row)} |")
            line_count += 1
    print(f"Number of lines: {line_count}")

Output:

| name | city | skills |
|---|---|---|
| John | Taipei | JavaScript, Python |
Number of lines: 2
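
If you would rather access columns by name, which feels closer to working with JS objects, csv.DictReader is one option. This is only a sketch: it keeps the same CSV content in an in-memory string (csv_text is a name I made up) so it runs without a file on disk.

```python
import csv
import io

# Same content as the CSV example above, kept in memory for the demo
csv_text = '"name","city","skills"\n"John","Taipei","JavaScript, Python"\n'

with io.StringIO(csv_text) as f:
    # The first row is used as the keys for every following row
    rows = list(csv.DictReader(f))

for row in rows:
    # The quoted field keeps its comma; split it into a list ourselves
    skills = [s.strip() for s in row["skills"].split(",")]
    print(row["name"], row["city"], skills)
    # John Taipei ['JavaScript', 'Python']
```

With a real file, the io.StringIO(...) call would be replaced by open(...) as in the example above.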

xlsx

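Reading MS Office Excel files (xlsx, xls) requires installing an extra package. The original article says this will be covered the next day, in the part about using pip (which looks like it plays the same role as npm).

The original article uses the xlrd package, but since its current versions (2.0 and later) can only read xls and not the newer xlsx format, I use openpyxl here instead.

I first created an xlsx file and placed it under ./, with the following content: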

name	city	skills
John	Taipei	JavaScript, Python

Then install the openpyxl package:

pip install openpyxl

And here is a simple demonstration of reading the data:

from openpyxl import load_workbook

wb = load_workbook(r"./xlsx_example.xlsx")
sheet_ranges = wb["工作表1"]

# Iterating over a worksheet yields one tuple of cells per row
for row in sheet_ranges:
    cell_value_lst = [cell.value for cell in row]
    print(cell_value_lst)

Output:

['name', 'city', 'skills']
['John', 'Taipei', 'JavaScript, Python']

XML

XML (Extensible Markup Language) resembles HTML, but its tags are not predefined. The example below needs a file first; I named mine xml_example.xml, with the following content:

<?xml version="1.0"?>
<person gender="male">
  <name>John</name>
  <city>Taipei</city>
  <skills>
    <skill>JavaScript</skill>
    <skill>Python</skill>
  </skills>
</person>

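Next, import the xml.etree.ElementTree module (aliased as ET). Calling ET.parse() returns an ElementTree object, and its getroot() method returns the root Element object, which we can then iterate over:

Note: the official documentation specifically warns that this module is not secure against maliciously constructed data.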

import xml.etree.ElementTree as ET

tree = ET.parse("xml_example.xml")
root = tree.getroot()

print("root tag is: ",root.tag)
print("root attribute is: ", root.attrib)

for child in root:
    if child.tag == "skills":
        skills = [skill.text for skill in child.iter("skill")]
        print("child tag: ", child.tag, "\nchild text: ", skills)
    else:
        print("child tag: ", child.tag, "\nchild text: ", child.text)

tag, attrib, and text are attributes of the Element object, and iter() is one of its methods. See the official documentation for details.

Output:

root tag is:  person
root attribute is:  {'gender': 'male'}
child tag:  name
child text:  John
child tag:  city
child text:  Taipei
child tag:  skills
child text:  ['JavaScript', 'Python']
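
Instead of looping over every child, you can also look elements up directly. The sketch below is my own addition: it parses the same XML from a string with ET.fromstring() (the variable xml_text is made up for the demo) and uses find-style lookups on the root element.

```python
import xml.etree.ElementTree as ET

# Same structure as xml_example.xml, kept in a string for the demo
xml_text = """<person gender="male">
  <name>John</name>
  <city>Taipei</city>
  <skills>
    <skill>JavaScript</skill>
    <skill>Python</skill>
  </skills>
</person>"""

root = ET.fromstring(xml_text)  # returns the root Element directly

name = root.findtext("name")    # text of the first <name> child
gender = root.get("gender")     # attribute lookup on the root
skills = [skill.text for skill in root.findall("./skills/skill")]

print(name, gender, skills)  # John male ['JavaScript', 'Python']
```

This feels a bit like querySelector and querySelectorAll in the browser, although ElementTree only supports a limited subset of XPath.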