Python script fails to automatically write its output to a txt file

python beautifulsoup

tj81951992 2019-06-03 15:52:17 ‧ 5328 views


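Hello everyone. I've recently been learning web scraping with Python and ran into the following problem: I can fetch the HTML source of the site I'm scraping, but I also want to write that source to a txt file automatically.

Running the script produces this error:
Traceback (most recent call last):
  File "tomy-request.py", line 32, in <module>
    file.write(root.html)
TypeError: write() argument must be str, not Tag

Any guidance would be appreciated, thank you!!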

# Fetch the page source
import urllib.request as req
url = "https://www.cac.edu.tw/apply107/system/107ColQry_forapply_4hgd9/html/107_002012.htm"
# Build a Request object with Request Headers attached
request = req.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Mobile Safari/537.36"
})

with req.urlopen(request) as response:
    data = response.read().decode("utf-8")  # read the response body as text

# Parse the source
import bs4
root = bs4.BeautifulSoup(data, "html.parser")
print(root.html)  # print the source inside the html element

with open("html1.txt", "w", encoding="utf-8") as file:
    file.write(root.html)  # this is the line that raises the TypeError

ccutmis (iT邦 Expert, Level 2) ‧ 2019-06-03 16:20:38

You don't actually need BeautifulSoup for this...
The example below does the same thing, for your reference:
from requests import get as webGet
url = 'https://www.cac.edu.tw/apply107/system/107ColQry_forapply_4hgd9/html/107_002012.htm'
r = webGet(url)
r.encoding = 'utf-8'
with open("html1.txt", "w", encoding="utf-8") as file:
    file.write(r.text)

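tj81951992 (iT邦 Novice, Level 5) ‧ 2019-06-04 09:25:50

I'm teaching myself from videos, so I have been following the instructor's approach. Thank you for offering another method.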


3 answers

Eeeeh

iT邦 Novice, Level 5 ‧ 2019-06-03 16:08:03

Best answer

TypeError: write() argument must be str, not Tag

Just convert it with str():

with open("html1.txt","w",encoding="utf-8") as file:
    file.write(str(root.html))
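
The fix can be tried offline, without fetching the page. The following is a minimal sketch, assuming bs4 is installed; `SAMPLE_HTML` is a made-up stand-in for the downloaded source and `save_html` is an illustrative helper name, neither of which comes from the thread.

```python
import os
import tempfile

import bs4

# Hypothetical stand-in for the downloaded page source.
SAMPLE_HTML = "<html><body><p>hello</p></body></html>"


def save_html(tag, path):
    """Write a BeautifulSoup Tag to a UTF-8 text file and return the text written."""
    text = str(tag)  # convert the Tag to its HTML markup as a str
    with open(path, "w", encoding="utf-8") as file:
        file.write(text)
    return text


root = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
# Write into a temporary directory so the sketch leaves the working directory alone.
out_path = os.path.join(tempfile.mkdtemp(), "html1.txt")
written = save_html(root.html, out_path)
```

Doing the conversion in one place means the file content and the returned string are guaranteed to be the same text.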


㊣浩瀚星空㊣

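iT邦 Guru, Level 1 ‧ 2019-06-03 16:02:09

I just took a look. The error message is telling you that the argument is not a str (that is, not text).

So I wonder whether your root.html might be an object?
Perhaps it needs to be converted to a string type first??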


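tj81951992 (iT邦 Novice, Level 5) ‧ 2019-06-04 09:32:58

print(root.html) shows the HTML source of the site, so I thought it probably wasn't an object. Because the page is parsed by BeautifulSoup, the value is in bs4's own format rather than a string, and that is why it cannot be written to the txt file directly.
Thank you, the problem is solved!!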
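
Why print() succeeds where write() fails can be shown without bs4 at all: print() converts its arguments with str(), while write() performs no conversion. The sketch below uses a hypothetical `FakeTag` class as a stand-in for bs4's Tag; it only imitates the one property that matters here, namely being a non-str object that has a string form.

```python
import io


class FakeTag:
    """Hypothetical stand-in for bs4's Tag: a non-str object with a string form."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


tag = FakeTag("<html></html>")

# print() calls str() on its arguments, so a non-str object prints fine.
printed = io.StringIO()
print(tag, file=printed)

# write() does no conversion and rejects anything that is not a str.
buffer = io.StringIO()
try:
    buffer.write(tag)
    write_error = None
except TypeError as exc:
    write_error = exc

# Converting explicitly makes write() accept the value.
buffer.write(str(tag))
```

So a value that prints as HTML source can still be an object of another type; printing only shows its string form.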


froce

iT邦 Master, Level 1 ‧ 2019-06-03 16:04:19

It's been a long time since I used BS4

file.write(root.prettify())

Please learn to read the debug output
TypeError: write() argument must be str, not Tag
That line already tells you everything
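
For comparison with the str() answer, here is a small sketch of what prettify() gives back, assuming bs4 is installed and using a made-up `SAMPLE_HTML` string instead of the real page. Both calls return a str, so either can be passed to write(); they differ in layout, and `root.prettify()` covers the whole parsed document rather than only `root.html`.

```python
import bs4

# Hypothetical stand-in for the downloaded page source.
SAMPLE_HTML = "<html><body><p>hello</p></body></html>"
root = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

compact = str(root.html)   # markup on one line, as parsed
pretty = root.prettify()   # same markup, one tag per line with indentation

print(compact)
print(pretty)
```

Which one to write to the file is a matter of taste: str() stays close to the original markup, while prettify() is easier to read in a text editor.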


tj81951992 (iT邦 Novice, Level 5) ‧ 2019-06-04 09:39:35

I did search for the problem beforehand. It turns out that simply adding str solves it. Thank you!!


IT邦幫忙