Python: taking a substring based on a string condition

python if-else

KL 2021-08-02 22:48:12 ‧ 1115 views

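I'm trying to concatenate multiple CSV files. One string column may or may not contain "RE", and I want to extract the date and time from it. I used an if-else statement, but both branches return the substring from the same position. Could someone point out what is wrong with my if-else statement? Thanks! The current output is shown in the figure below.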

import pandas as pd
import numpy as np
import os

path_dataset = r'C:\Users\test'

def get_file(path_dataset):
    files = os.listdir(path_dataset) #check file list
    files.sort() #sort file
    file_list = []
    for file in files:
        path = path_dataset + "\\" + file
        if (file.startswith("test")) and (file.endswith(".csv")):
            file_list.append(path)
    return (file_list)

read_columns = ['NAME']

read_files = get_file(path_dataset)

all_df = []

for file in read_files:
    df = pd.read_csv(file, usecols = read_columns)

    if (str(df['NAME'].astype(str).str[23:25]) == 'RE-'):
        df['DATE_TIME'] = df['NAME'].astype(str).str[26:40]
    else:
        df['DATE_TIME'] = df['NAME'].astype(str).str[22:37]
    
    all_df.append(df)

Concat_table = pd.concat(all_df, axis=0)
Concat_table = Concat_table.sort_values(['DATE_TIME'])

Concat_table.head()
Concat_table.to_csv(os.path.join(path_dataset, 'Concate_all.csv'), index=False)

Result of running the code above:

Expected result:


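一級屠豬士 iT邦大師 Level 1 ‧ 2021-08-03 06:26:11

For over a decade, newcomers asking questions have assumed that pasting the code is enough. In the past, many would paste a single SQL statement with no table structure
and no data, which makes the situation very hard to understand. You can see your data, but nobody else can. Developing a program is about more than the code.
You should tidy up and post the data, together with the result you want to achieve.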

froce iT邦大師 Level 1 ‧ 2021-08-03 08:09:31

> You should tidy up and post the data, together with the result you want to achieve.

If an error is raised, include the complete debug output as well.

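一級屠豬士 iT邦大師 Level 1 ‧ 2021-08-03 10:48:05

> I'm trying to concatenate multiple CSV files. One string column may or may not contain "RE", and I want to extract the date and time from it.

Another direction worth considering: read each CSV on its own, clean each one up separately, and only then combine the resulting DataFrames into one.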

KL iT邦新手 Level 5 ‧ 2021-08-03 22:07:54

Thanks for the advice. I have added the expected result at the bottom of the question.


2 answers

I code so I am

iT邦高手 Level 1 ‧ 2021-08-03 09:32:40

Python ranges include the start and exclude the end: [23:25] takes only positions 23 and 24, not 25.

    if (str(df['NAME'].astype(str).str[23:25]) == 'RE-'):
        df['DATE_TIME'] = df['NAME'].astype(str).str[26:40]
    else:
        df['DATE_TIME'] = df['NAME'].astype(str).str[22:37]
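The half-open slice is one part of the picture. A second thing worth checking is what `str()` does to a whole column. The sketch below uses made-up NAME values and small made-up offsets (3, 5, 6), since the real data layout is not shown in the thread, so treat it as an illustration rather than the poster's actual case.

```python
import pandas as pd

# Hypothetical sample values; the real NAME layout is not shown in the thread.
names = pd.Series(["AB-RE-20210802", "AB-20210802"], name="NAME")

# Slices are half-open: [3:5] covers positions 3 and 4 only.
two_chars = names.str[3:5]
three_chars = names.str[3:6]

# str() on a Series renders the whole column as one block of text,
# so comparing that text with a short literal is False for every file.
as_text = str(three_chars)
always_false = as_text == "RE-"

# Comparing the Series itself gives one boolean per row instead.
mask = three_chars == "RE-"

print(two_chars.tolist(), three_chars.tolist(), mask.tolist())
```

Because `str(Series) == 'RE-'` is a single `False`, a plain `if` around it always falls through to `else`, whatever the slice bounds are. A per-row decision needs the boolean Series (`mask` here) rather than one `if` for the whole DataFrame.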


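KL iT邦新手 Level 5 ‧ 2021-08-03 22:03:26

Thanks for the reply, but after changing it to str[23:26] == 'RE-' I still don't get the date and time I want. The result is wrong in the same way: it always ends up in the else branch.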

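I code so I am iT邦高手 Level 1 ‧ 2021-08-04 09:54:43

Use the following:

    # Compare three characters ([23:26]) with 'RE-' row by row, then pick the matching slice per row.
    df['DATE_TIME'] = np.where(
        df['NAME'].astype(str).str[23:26] == 'RE-',
        df['NAME'].astype(str).str[26:40],
        df['NAME'].astype(str).str[22:37],
    )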
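To see the row-wise selection on something runnable, here is a small sketch with invented NAME values and invented offsets (3, 6, 18, 21 instead of the 22 to 40 range used in the thread). The column layout is an assumption made only for the demo.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one name with an "RE-" marker, one without.
df = pd.DataFrame({"NAME": ["AB-RE-20210802-224812", "AB-20210802-224812"]})

name = df["NAME"].astype(str)

# One boolean per row: does the three-character slice equal "RE-"?
has_re = name.str[3:6] == "RE-"

# np.where picks from the first slice where has_re is True,
# and from the second slice everywhere else.
df["DATE_TIME"] = np.where(has_re, name.str[6:21], name.str[3:18])

print(df)
```

The condition passed to `np.where` is a boolean Series, so rows with and without the marker can sit in the same file and still each get their own slice.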


japhenchen

iT邦超人 Level 1 ‧ 2021-08-03 10:09:20

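Use a regular expression (re). There is no need for a pile of if checks on whether -RE- appears in the middle (what's with all the REs...).
Change your program like this: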

import re

# ............. (about 1001 characters of your program omitted here)

    matches = re.search(r'(\d+)-\d+\s*$',df['NAME']) 
    if matches is not None :
        df['DATE_TIME'] = matches.group(1)

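The rule above assumes that the last two hyphen-separated segments consist only of digits, hence \d+. If the length is completely fixed, you can use \d{8} for the date segment; the segment after it has unknown length, so it stays \d+. If that segment can contain letters in either case, change it to [a-zA-Z0-9]+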

matches = re.search(r'(\d{8})-[a-zA-Z0-9]+\s*$',df['NAME'])

The trailing \s* means there may be whitespace at the end, which often happens with data imported from a database.

As for the dollar sign $, it marks the end of the string.
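The two patterns can be tried on individual strings as a quick check. The sample values and the helper `first_group` below are invented for illustration; the real NAME strings are not shown in the thread.

```python
import re

# Made-up sample values; the real NAME strings are not shown in the thread.
samples = [
    "AB-RE-20210802-224812",  # digits only after the last hyphen
    "AB-20210802-22a8B2  ",   # letters in the last segment, trailing spaces
    "no date here",           # nothing that looks like a date
]

digits_only = re.compile(r"(\d+)-\d+\s*$")
fixed_date = re.compile(r"(\d{8})-[a-zA-Z0-9]+\s*$")

def first_group(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m is not None else None

digits_result = [first_group(digits_only, s) for s in samples]
fixed_result = [first_group(fixed_date, s) for s in samples]

print(digits_result)
print(fixed_result)
```

Note that both patterns capture only the segment before the last hyphen (the date part in these samples). If the time part is wanted as well, the parentheses would need to cover both segments.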


KL iT邦新手 Level 5 ‧ 2021-08-03 22:01:06

Thanks for showing how to use regular expressions, but both versions of matches give me an error.

Is this how it should be written?

for file in read_files:
    df = pd.read_csv(file, usecols = read_columns)
    
    matches = re.search(r'(\d+)-\d+\s*$',df['NAME']) 
    if matches is not None :
        df['DATE_TIME'] = matches.group(1)
    
    all_df.append(df)

The following error appears:
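The error output itself is not reproduced in the text here. One thing that can be said about the snippet is that `re.search` expects a single string, while `df['NAME']` is a whole pandas Series, and that combination raises a `TypeError`. A possible column-wise alternative is `Series.str.extract`, sketched below with invented NAME values. The second pattern, which captures date and time together, is my own variation and not something proposed in the thread.

```python
import re
import pandas as pd

# Hypothetical data; the real NAME values are not shown in the thread.
df = pd.DataFrame({"NAME": ["AB-RE-20210802-224812", "AB-20210802-224812 "]})

# re.search works on one string at a time, so a Series is rejected.
try:
    re.search(r"(\d+)-\d+\s*$", df["NAME"])
    raised = False
except TypeError:
    raised = True

name = df["NAME"].astype(str)

# str.extract applies the pattern to every row; with expand=False and
# a single capture group it returns a Series.
df["DATE"] = name.str.extract(r"(\d+)-\d+\s*$", expand=False)

# Variation (an assumption): widen the group to keep the time segment too.
df["DATE_TIME"] = name.str.extract(r"(\d+-\d+)\s*$", expand=False)

print(raised)
print(df)
```

Rows where the pattern does not match come back as missing values, so no explicit `if matches is not None` check is needed in the column-wise form.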