今天來refactor DB schema以及ETL有關這兩個的review可以看昨天的文章[Day 8] Food Delivery Platform Code review (二)
昨天有提到Schema的設計上缺少了foreign key還有正規化與命名一致的問題,在不修改大架構的情況下refactor的方向為:
Table restaurant {
id SERIAL [pk, increment]
name TEXT [not null]
cash_balance NUMERIC [not null]
}
Table menu {
id SERIAL [pk, increment]
dish_name TEXT [not null]
price NUMERIC [not null]
restaurant_id INT [ref: > restaurant.id, not null]
}
Table open_hours {
id SERIAL [pk, increment]
week TEXT [not null]
open TIME [not null]
close TIME [not null]
restaurant_id INT [ref: > restaurant.id, not null]
}
Table users {
id INT [pk]
name TEXT [not null]
cash_balance NUMERIC [not null]
}
Table purchase_history {
id SERIAL [pk, increment]
menu_id INT [ref: > menu.id, not null]
transaction_amount NUMERIC [not null]
transaction_date TIMESTAMP [not null]
user_id INT [ref: > users.id, not null]
}
ETL 優化的重點為
# ... [其他不變的部分]
def til_days(first_day, end_day, open_time_open, open_time_close, restaurant_id):
insert_data = []
# 這部分邏輯保持不變
# ...
return insert_data
def init_db_restaurant(conn, log):
with open("../data/restaurant_with_menu.json", 'r') as load_f:
load_dict = json.load(load_f)
cur = conn.cursor()
for restaurant in load_dict:
# ... [其他不變的部分]
# insert openhours table
open_days = restaurant.get('openingHours').split('/')
statement = 'INSERT INTO openhours (week, open, close, restaurant_id) VALUES (%s, %s, %s, %s)'
insert_data = []
for day in open_days:
# 使用函數重構的部分邏輯來獲取insert_data
day_data = parse_day(day, restaurant_id)
insert_data.extend(day_data)
cur.executemany(statement, insert_data)
conn.commit()
cur.close()
# ... [其他不變的部分]
def init_db_users(conn, log):
with open("../data/users_with_purchase_history.json", 'r') as load_f:
load_dict = json.load(load_f)
cur = conn.cursor()
for user in load_dict:
# ... [其他不變的部分]
cur.executemany(statement, insert_data)
conn.commit()
cur.close()
# ... [其他不變的部分]