請教IT邦的熱心高手,我有兩個Excell,
第1個是:comapre.xlsx https://www.dropbox.com/s/xem8b4ce9xd6rx0/compare.xlsx?dl=0
第2個是:main.xlsx https://www.dropbox.com/s/3i909pu76dbspdo/main.xlsx?dl=0
我想做的是以main.xlsx 這張表A欄位中的值跟Comapare.xlsx 裡面的A欄位值相比,
如果吻合就把comapre.xlsx 的其他欄位和A欄位一起寫進第3個Excell:link.xlsx ,不吻合就把不符合的兩個值用"->"連起來,像這樣 https://www.dropbox.com/s/535t2c4pnhbeskj/link.xlsx?dl=0
我Google找到的作法是用Pandas像是這樣:
import pandas as pd
import numpy as np
path = r"D:\x\Python_Study\Excell\\"
df1=pd.read_excel(path+'compare.xlsx')
df2=pd.read_excel(path+'main.xlsx')
df1.equals(df2)
comparison_values = df1.values == df2.values
print (comparison_values)
rows,cols=np.where(comparison_values==False)
for item in zip(rows,cols):
df1.iloc[item[0], item[1]] = '{} --> {}'.format(df1.iloc[item[0], item[1]],df2.iloc[item[0], item[1]])
df1.to_excel('link.xlsx',index=False,header=True)
REF:https://kanoki.org/2019/02/26/compare-two-excel-files-for-difference-using-python/
但是會報錯:
d:\x\Python_Study\Excell\test0413.py:7: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
comparison_values = df1.values == df2.values
False
Traceback (most recent call last):
File "d:\x\Python_Study\Excell\test0413.py", line 9, in <module>
rows,cols=np.where(comparison_values==False)
ValueError: not enough values to unpack (expected 2, got 1)
請問我該往哪個方向修改? 或是有更好的做法。
需要的話我可以再描述清楚一點
謝謝IT邦的熱心網友願意花時間幫忙 好人一生平安
樓主的資料有點簡化,以下的回答不曉得是不是能實際解決。
樓主的main.xlsx如下:
import pandas as pd
main = pd.DataFrame(data={'網址': ['a.x.x', 'b.x.x', 'd.x.x']})
main
compare.xlsx如下:
compare = pd.DataFrame(data={'網址': ['a.x.x', 'b.x.x', 'c.x.x'],
'作業系統': ['win2012', 'win2019', 'linux'],
'WebServer版本': ['IIS 8', 'IIS10', 'Apache 2.2.4']
})
compare
最後想要得到的link結果如下:
link = main.copy()
condition = link['網址']==compare['網址']
for i in compare.columns[1:]:
link.loc[condition, i] = compare[i]
link.loc[~condition, '網址'] = link['網址'] + '->' + compare['網址']
link
main和compare如上一個解法中的資料。
先複製link後,將link和compare串接起來:
link = main.copy()
link = pd.concat([link, compare], axis=1)
link.columns = ['網址', 'compare網址', '作業系統', 'WebServer版本']
link
import numpy as np
condition = link['網址']==link['compare網址']
for i in link.columns[2:]:
link[i] = np.where(condition, link[i], np.nan)
link.loc[~condition, '網址'] = link['網址'] + '->' + link['compare網址']
link = link.drop(columns='compare網址')
link