I downloaded the messages in my mailbox as HTML and am parsing them with bs4, but I keep getting this error:
d = bs4.BeautifulSoup(c.read(), 'features="lxml"')
File "C:\Users\CathyMe\AppData\Local\Programs\Python\Python38\lib\site-packages\bs4\__init__.py", line 248, in __init__
raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: features="lxml". Do you need to install a parser library?
#=========
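Changing features="lxml" to html does not help either. (Note: lxml is already installed.)
Could someone help me figure out where the problem is?
Goal of the program:
Download the mail, parse it with bs4, and extract the link URLs.
↑ I am currently on step 2.
Earlier I passed the pyzmail object directly to bs4,
but bs4 did not seem to recognize it, so I added an intermediate step that saves the message to a text file.
But then I got an encoding error???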
import imapclient
i = imapclient.IMAPClient('imap.gmail.com', ssl=True)
i.login('xxx@gmail.com', 'xxx')  # <- placeholder credentials
i.select_folder('INBOX', readonly=True)
a = i.list_folders()
a = i.search(['ALL'])  # everything in the inbox
del a[11:]  # keep only the first ten messages
print(a)  # message IDs
import pyzmail, bs4
for x in range(len(a)):
    y = i.fetch(a[x], ['BODY[]', 'FLAGS'])  # fetch the message
    z = pyzmail.PyzMessage.factory(y[a[x]][b'BODY[]'])  # pyzmail object for the message
    if z != {}:
        b = z.html_part.get_payload().decode(z.html_part.charset)
        c = open(str(x)+'.html', 'w', encoding='UTF-8')  # the mail is in Chinese
        c.write(b)
        c.close()
        c = open(str(x)+'.html', encoding='UTF-8')
        print("c=", c.read())  # looks like HTML
        d = bs4.BeautifulSoup(c.read(), 'features="html"')  # it dies here
        e = d.select('p')  # meant to select the <p> elements in the HTML
        for n in e:
            print(str(n[0]))
        #print(b)
    break  # only handle the first message for now
Your error message already tells you what is wrong.
Couldn't find a tree builder with the features you requested: features="lxml". Do you need to install a parser library?
Take a look at the bs4 documentation; I found the Chinese version for you:
https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
The section on installing a parser,
https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#id13
covers installing the lxml parser.
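Since the question says lxml is already installed, it may be worth confirming that it is installed for the same interpreter that runs the script. The following is only a rough sketch using the standard library; the helper name available_parsers is made up for illustration, and it only checks whether the module behind each parser name can be found, not whether bs4 can actually use it.

```python
import importlib.util


def available_parsers():
    """Map each bs4 parser name to whether its backing module can be found."""
    backing_modules = {
        "html.parser": "html.parser",  # standard library, always present
        "lxml": "lxml",                # third-party, needs `pip install lxml`
        "html5lib": "html5lib",        # third-party, needs `pip install html5lib`
    }
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in backing_modules.items()
    }


if __name__ == "__main__":
    # Run this with the same python.exe that runs the mail script.
    print(available_parsers())
```

If "lxml" shows up as False here, the package was probably installed into a different Python installation.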
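All you need to do is delete those extra single quotes, so that features is passed as a keyword argument instead of being part of the string: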
d = bs4.BeautifulSoup(c.read(), features="lxml")
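
To make the difference concrete, here is a small sketch that can be run without the mailbox. The HTML string and the example.com URLs are made up for illustration, and it uses the built-in "html.parser" so that it does not depend on lxml being installed; swapping in features="lxml" should behave the same way once lxml is available.

```python
import bs4

# Made-up sample markup standing in for the body of a downloaded message.
html = (
    '<p>Hello <a href="https://example.com/a">first</a></p>'
    '<p><a href="https://example.com/b">second</a></p>'
)

# Quoted form: the whole string 'features="html.parser"' is taken as the
# name of the parser, and no parser has that name.
try:
    bs4.BeautifulSoup(html, 'features="html.parser"')
    quoted_form_failed = False
except bs4.FeatureNotFound:
    quoted_form_failed = True

# Keyword form: the parser name is just "html.parser".
soup = bs4.BeautifulSoup(html, features="html.parser")

# Collect the link URLs and the text of each <p>.
links = [a["href"] for a in soup.select("a[href]")]
paragraphs = [p.get_text() for p in soup.select("p")]

print(quoted_form_failed)
print(links)
print(paragraphs)
```

Two other things in the original script are likely to get in the way after this fix. The print("c=", c.read()) call already reads the file to the end, so the second c.read() passed to BeautifulSoup returns an empty string; reading once into a variable (or passing the decoded string b directly, without the file round trip) avoids that. Also, n[0] on a tag looks up an attribute named 0 and raises KeyError, so n.get_text() or str(n) is probably what was intended.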