Python: bs4 doesn't recognize pyzmail? Also, lxml and html encoding problems?

python

Catherine Bloom 2023-02-04 15:44:54 ‧ 799 views


I downloaded the messages in my mailbox as HTML and am parsing them with bs4, but I keep getting:

d = bs4.BeautifulSoup(c.read(), 'features="lxml"')

File "C:\Users\CathyMe\AppData\Local\Programs\Python\Python38\lib\site-packages\bs4\__init__.py", line 248, in __init__
raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: features="lxml". Do you need to install a parser library?
#=========
Changing features="lxml" to html doesn't work either. (Note: lxml is already installed.)
Could anyone help me see where the problem is?

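Purpose of the program:
Download the emails, parse them with bs4, and find the link URLs.
↑ I'm currently at step 2.
Earlier I passed the pyzmail object directly to bs4,
but bs4 didn't seem to recognize it, so I added an intermediate step that saves it to a text file.
But now the encoding raises an error???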


i = imapclient.IMAPClient('imap.gmail.com', ssl=True)
i.login('xxx@gmail.com', 'xxx') # ← fake credentials

i.select_folder('INBOX', readonly=True)

a = i.list_folders()

a = i.search(['ALL']) # everything in the inbox
del a[11:] # keep only the first ten messages
print(a)   # message IDs

import pyzmail, bs4

for x in range(len(a)):
    y = i.fetch(a[x], ['BODY[]', 'FLAGS']) # fetch the body of the first message
    z = pyzmail.PyzMessage.factory(y[a[x]][b'BODY[]']) # get the pyzmail object for the message
    if z != {}:
        b = z.html_part.get_payload().decode(z.html_part.charset)
        c = open(str(x)+'.html', 'w', encoding='UTF-8') # Chinese-language mail
        c.write(b)
        c.close()
        c = open(str(x)+'.html', encoding='UTF-8')
        print("c=", c.read()) # looks very much like HTML
        d = bs4.BeautifulSoup(c.read(), 'features="html"') # it dies here
        e = d.select('p') # the plan was to select the <p> elements in the HTML
        for n in e:
            print(str(n[0]))
            #print(b)
            break # only handle the first message for now


2 answers

一級屠豬士

iT邦 Master Level 1 ‧ 2023-02-04 15:54:14

Best answer

Your error message has already told you:
Couldn't find a tree builder with the features you requested: features="lxml". Do you need to install a parser library?

Take a look at the bs4 documentation; I found the Chinese version for you:
https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
See the section on installing a parser:
https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#id13

It covers installing the lxml parser.
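
To illustrate the point about installed parsers, here is a minimal sketch that picks whichever parser library can be imported and otherwise falls back to Python's built-in html.parser. The helper names pick_parser and make_soup are hypothetical and not part of bs4; only the parser names "lxml", "html5lib" and "html.parser" come from the bs4 documentation.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first parser whose library is importable, else 'html.parser'."""
    for name in preferred:
        # find_spec returns None when the top-level module is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"  # ships with Python, so it is always available


def make_soup(html):
    """Build a BeautifulSoup object with whichever parser is available."""
    import bs4  # imported lazily; requires the beautifulsoup4 package

    # Pass the bare parser name, e.g. "lxml", not the string 'features="lxml"'
    return bs4.BeautifulSoup(html, features=pick_parser())
```

If pick_parser() returns "html.parser" even though lxml was installed, the install may have gone into a different Python environment than the one running the script.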


Catherine Bloom iT邦 Novice Level 2 ‧ 2023-02-04 16:46:29

Thanks, it worked once I changed it to html5lib.


akitrash

iT邦 Novice Level 5 ‧ 2023-02-04 16:48:22

You just need to delete those extra single quotes...

d = bs4.BeautifulSoup(c.read(), features="lxml")
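
One more detail in the question's loop may be worth checking: c.read() is called twice on the same open file, and the second call returns an empty string because the file position is already at the end. The sketch below shows that behaviour with an in-memory file, plus one possible way to pull the link URLs out once the text is in a variable. The name extract_links, the sample HTML and the choice of "html.parser" are assumptions for illustration, not from the original post.

```python
import io


def extract_links(html, parser="html.parser"):
    """Return the href value of every <a> tag that has one."""
    import bs4  # imported lazily; requires the beautifulsoup4 package

    soup = bs4.BeautifulSoup(html, features=parser)
    return [a["href"] for a in soup.select("a[href]")]


# Hypothetical stand-in for the saved 0.html file
sample = '<p>Hello <a href="https://example.com/">link</a></p>'
f = io.StringIO(sample)

first = f.read()   # the whole document
second = f.read()  # nothing left to read, so this is an empty string

# Read once, keep the text, and hand that same string to the parser:
# links = extract_links(first)
```

In the original loop the decoded string b already holds the HTML, so it could be passed to the parser directly without writing and re-reading a file.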


Catherine Bloom iT邦 Novice Level 2 ‧ 2023-02-06 14:42:53

Thank you.
