How can I filter out the repeated header rows from a goodinfo table with pandas?

python pandas goodinfo

zzhsu20 2022-11-21 15:26:29 ‧ 805 views

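Python beginner asking for help.
After scraping a table from goodinfo, I found that the header row is repeated inside the table data. How can I remove the repeats?

The code is as follows: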

import requests
import bs4
import pandas as pd

# Target page.
url = "https://goodinfo.tw/tw/StockAssetsStatus.asp?STOCK_ID=8069"
# Set the request headers.
headers = {
 'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Mobile Safari/537.36'
}

res = requests.post(url,headers = headers)
res.encoding = 'utf-8'
temp = res.text
soup = bs4.BeautifulSoup(res.text,'lxml')

bus_table = pd.read_html(temp)
print(len(bus_table))
print(bus_table[14])

The rows I want to remove are shown in the image below:

1 answer

mackuo

iT邦研究生, Level 1 ‧ 2022-11-21 16:17:18

Best answer

import requests
import bs4
import pandas as pd

# Target page.
url = "https://goodinfo.tw/tw/StockAssetsStatus.asp?STOCK_ID=8069"
# Set the request headers.
headers = {
 'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Mobile Safari/537.36'
}

res = requests.post(url,headers = headers)
res.encoding = 'utf-8'
temp = res.text
#soup = bs4.BeautifulSoup(res.text,'lxml')

bus_table = pd.read_html(temp)
print(len(bus_table))
df = bus_table[14]
print(len(df))
print(df.columns)
df

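The column names of the table scraped above form a MultiIndex, so each column is addressed by a tuple of labels, one per header level.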
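As a rough illustration of what a MultiIndex header means for column selection, here is a small sketch. The table is built by hand rather than scraped, and every column label other than `('年度', '年度')` is an assumption made up for the example, not taken from the goodinfo page.

```python
import pandas as pd

# Hypothetical two-level header, similar in shape to what pd.read_html
# returns for an HTML table with two header rows. Labels other than
# ("年度", "年度") are invented for illustration.
columns = pd.MultiIndex.from_tuples([
    ("年度", "年度"),
    ("流動資產", "現金"),
    ("流動資產", "存貨"),
])
df = pd.DataFrame(
    [["2021", "10.5", "3.2"], ["2020", "9.8", "2.9"]],
    columns=columns,
)

print(df.columns.nlevels)     # number of header levels
year = df[("年度", "年度")]    # a full tuple selects one column as a Series
current = df["流動資產"]       # a top-level label selects a sub-DataFrame
print(type(year).__name__, list(current.columns))
```

Printing `df.columns` on the real table, as the answer's code does, shows the exact tuples to use.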

# Keep only the rows whose value in the ('年度', '年度') column is not '年度',
# which drops every repeated header row.
df = df[df[('年度', '年度')] != '年度']
df
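The filter above can be tried without hitting the website. The sketch below uses a small hand-built table in which the header row is repeated in the body; the `('資產', '現金')` column and all values are assumptions for illustration. It also resets the index and converts the numeric column back from text, two follow-up steps that are not part of the original answer but are often wanted after this kind of cleanup.

```python
import pandas as pd

# Hand-built stand-in for the scraped table (values are made up).
columns = pd.MultiIndex.from_tuples([("年度", "年度"), ("資產", "現金")])
raw = pd.DataFrame(
    [
        ["2021", "10.5"],
        ["2020", "9.8"],
        ["年度", "現金"],  # header row repeated inside the table body
        ["2019", "8.1"],
    ],
    columns=columns,
)

year_col = ("年度", "年度")
# Boolean mask: True for real data rows, False for repeated header rows.
mask = raw[year_col] != "年度"
clean = raw[mask].reset_index(drop=True)

# The repeated header text forces the column to hold strings,
# so convert it back to numbers once those rows are gone.
cash_col = ("資產", "現金")
clean[cash_col] = pd.to_numeric(clean[cash_col])

print(clean)
```

Comparing against the header text works here because a repeated header row always carries the column's own label in that column, while real data rows carry a year.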

2 replies

zzhsu20 (iT邦新手, Level 5) ‧ 2022-11-21 21:46:51

Thanks for the help, mackuo. I didn't know pandas could be used this way. I'll keep exploring.

mackuo (iT邦研究生, Level 1) ‧ 2022-11-22 08:31:29

You're welcome. I'm glad it helped.
