Python practice exercise: scraping images from a web page fails

python python 3

a810911366 2020-06-04 12:29:02 ‧ 2364 views


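Hello everyone,

My system is Windows 10 x64.
I'm running the Python program below to practice scraping images from a web page,
but the output is always
"重複" (duplicate)
"重複"
"重複".......

Where did I go wrong, or which path am I missing?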

import os
import urllib.request


def imgs(url):
    try:
        res = urllib.request.urlopen(rep)
        rep = urllib.request.Request(url)
        html = res.read().decode("utf-8", 'ignore')
        # print(html)
        import re

        from bs4 import BeautifulSoup

        web = BeautifulSoup(html, features="html.parser")

        img = web.select("img[src*=/uploads/allimg/]")

        for s in img:
            img_url = "http://pic.netbi.com" + s.get("src")
            file_path = 'D:/DOWNLOAD'
            file_name = "img" + str(int(random.uniform(20, 10) * 10 ** 14))
            if not os.path.exists(file_path):
                # Create the directory
                os.makedirs(file_path)
            # Get the image file extension
            file_suffix = os.path.splitext(img_url)[1]
            print(file_suffix)
            # Build the image file name (including the path)
            filename = '{}{}{}{}'.format(file_path, os.sep, file_name, file_suffix)
            print(filename)
            # Download the image and save it to the folder
            urllib.request.urlretrieve(img_url, filename=filename)
    except:
        print("重複")



i = 1
try:
    while i < 1092:
        url = "http://pic.netbi.com/index"
        if i == 1:
            url += ".html"
        else:
            url += "_" + str(i) + ".html"
        imgs(url)
        i += 1
except:
    print("錯誤")
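For comparison, here is a minimal corrected sketch of the script above. It assumes the site still serves the 2020 page layout (it may no longer exist), and the helper names `page_url`, `extract_image_urls`, `download_page` and `main` are hypothetical. It creates the `Request` before opening it (the original uses `rep` one line before assigning it), quotes the selector value, keeps all imports at the top, numbers files with a counter instead of `random` (which the original never imports), and prints the real exception instead of a fixed message.

```python
import os
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://pic.netbi.com/"
SAVE_DIR = "D:/DOWNLOAD"


def page_url(i):
    """Page 1 is index.html; later pages are index_<i>.html (pattern from the question)."""
    return BASE_URL + ("index.html" if i == 1 else f"index_{i}.html")


def extract_image_urls(html, base=BASE_URL):
    """Absolute URLs of <img> tags whose src contains /uploads/allimg/."""
    soup = BeautifulSoup(html, "html.parser")
    # The selector value is quoted: '/' cannot appear in an unquoted CSS identifier
    tags = soup.select("img[src*='/uploads/allimg/']")
    return [urljoin(base, tag["src"]) for tag in tags]


def download_page(url, counter):
    """Save every matching image on one page; return the next free file number."""
    req = urllib.request.Request(url)  # create the Request first...
    with urllib.request.urlopen(req) as res:  # ...then open it
        html = res.read().decode("utf-8", "ignore")
    os.makedirs(SAVE_DIR, exist_ok=True)
    for img_url in extract_image_urls(html):
        suffix = os.path.splitext(img_url)[1]  # e.g. ".jpg"
        filename = os.path.join(SAVE_DIR, f"img{counter:06d}{suffix}")
        urllib.request.urlretrieve(img_url, filename)
        counter += 1
    return counter


def main(first=1, last=1091):
    counter = 1
    for i in range(first, last + 1):
        url = page_url(i)
        try:
            counter = download_page(url, counter)
        except Exception as e:  # show the real error instead of a fixed message
            print(url, type(e).__name__, e)
```

Calling `main()` walks the same 1091 index pages as the original loop; `main(1, 3)` fetches only the first three, which is handy for testing. Note that sequential file names will overwrite files from a previous run.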

Any guidance would be greatly appreciated.


1 answer

listennn08

iT邦 Expert, level 5 ‧ 2020-06-04 12:47:45

Why do you all love writing your own custom error messages and then have no idea how to debug?

# Your problem
img = web.select("img[src*=/uploads/allimg/]")

# Fix: quote the attribute value
img = web.select("img[src*='/uploads/allimg/']")

If you can't debug, then just honestly print the error message:

try:
    # do something...
except Exception as e:
    print(e)
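As posted, the question's code fails even before the selector is reached: `rep` is used on the line before it is assigned, so every call to `imgs()` raises `UnboundLocalError`, which the bare `except` turned into "重複". A minimal reproduction with the error printed as suggested above (the names `broken_fetch` and `report` are hypothetical; no network request is made because the error happens first):

```python
import urllib.request


def broken_fetch(url):
    # Same order as in the question: `rep` is read before it is assigned
    res = urllib.request.urlopen(rep)
    rep = urllib.request.Request(url)
    return res


def report(url):
    try:
        broken_fetch(url)
    except Exception as e:
        # Print the exception type and message instead of a fixed string
        print(type(e).__name__, e)
        return e
    return None


err = report("http://pic.netbi.com/index.html")
```

The exact message wording differs between Python versions, but the exception type is `UnboundLocalError` in all of them.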

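Basically, even if every image you download is identical, that would never send you into the error branch, so what is your except (printing "重複", i.e. "duplicate") even for?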
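Downloading the same image twice raises nothing, so an `except` block can never report duplicates. If duplicate detection is actually wanted, one option (an assumption, not part of the original code) is to hash each downloaded image's bytes and skip any hash already seen. The helper `is_new_image` is hypothetical:

```python
import hashlib


def is_new_image(data, seen_hashes):
    """Return True the first time a given byte content is seen, False for repeats."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True


seen = set()
print(is_new_image(b"same bytes", seen))   # True
print(is_new_image(b"same bytes", seen))   # False
print(is_new_image(b"other bytes", seen))  # True
```

In a scraper you would read the image with `urlopen(...).read()`, check `is_new_image`, and write the file only when it returns True.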


㊣浩瀚星空㊣ iT邦 Guru, level 1 ‧ 2020-06-04 13:12:11

You've already done all the nagging, so I'll skip it.

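a810911366 iT邦 Newbie, level 4 ‧ 2020-06-04 13:46:16

Actually, I've been learning by taking apart someone else's web page code line by line,
so my level is really low. Please bear with me~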

listennn08 iT邦 Expert, level 5 ‧ 2020-06-04 13:49:41

Then you need to choose your learning materials more carefully.

a810911366 iT邦 Newbie, level 4 ‧ 2020-06-04 13:56:58

I just tried it.
Thank you~ It works now.

Could you tell me why '/uploads/allimg/' needs to be wrapped in ''?

listennn08 iT邦 Expert, level 5 ‧ 2020-06-04 14:01:15

Because in

select("img[src*='/uploads/allimg/']")

the part you want is the middle,

img[src*='/uploads/allimg/']

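which finds img elements whose src attribute contains /uploads/allimg/ (*= matches a substring, not the whole value).
You can also see in the HTML that the src attribute of an img tag is wrapped in quotes.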
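More precisely, in a CSS attribute selector the value must be either a valid identifier or a quoted string. `/uploads/allimg/` contains `/` characters, which are not allowed in an unquoted identifier, so the selector is invalid without quotes (recent BeautifulSoup versions, which delegate `select()` to the soupsieve library, raise soupsieve's `SelectorSyntaxError`). As a rough stdlib-only illustration of what `img[src*='/uploads/allimg/']` selects (not how BeautifulSoup implements it), the hypothetical class `ImgSrcContains` below collects the src of every `<img>` whose src contains the given text:

```python
from html.parser import HTMLParser


class ImgSrcContains(HTMLParser):
    """Collect src values of <img> tags whose src contains `needle` (like CSS [src*=...])."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # *= means "the attribute value contains this substring", not "equals"
        if src is not None and self.needle in src:
            self.matches.append(src)


parser = ImgSrcContains("/uploads/allimg/")
parser.feed('<img src="/uploads/allimg/200604/a.jpg"><img src="/images/logo.png">')
print(parser.matches)  # ['/uploads/allimg/200604/a.jpg']
```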

ch_lute iT邦 Newbie, level 5 ‧ 2020-06-04 14:21:43

Reading the code above, I kept wondering why the except branch was supposed to mean "duplicate".

褲底農民 iT邦 Newbie, level 4 ‧ 2020-06-04 17:16:37

A separate question: why put the imports inside the function??
Or am I just behind the times???

listennn08 iT邦 Expert, level 5 ‧ 2020-06-04 17:29:16

褲底農民
Some people think it's better that way,
but the PEP 8 style guide prefers putting them at the top:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

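That's why I said he should switch to different learning materials.
Putting imports inside a function also adds a little time to every call, because the import statement runs each time the function is called (after the first call it is only a lookup in sys.modules).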
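A quick way to check the overhead claim is `timeit`. In this sketch the function names are hypothetical and the `json` module stands in for any import; both functions do the same work and differ only in where the import statement lives. Timings depend on the machine, so no numbers are shown.

```python
import json
import timeit


def top_level_import(obj):
    # Uses the module imported once at the top of the file
    return json.dumps(obj)


def import_inside(obj):
    # This import statement executes on every call; after the first call it
    # only finds the module in sys.modules and binds the local name again
    import json
    return json.dumps(obj)


data = {"page": 1}
t_top = timeit.timeit(lambda: top_level_import(data), number=20_000)
t_inside = timeit.timeit(lambda: import_inside(data), number=20_000)
print(f"top-level: {t_top:.3f}s  inside function: {t_inside:.3f}s")
```

The per-call difference is small, so in practice the stronger argument is readability: imports at the top show a module's dependencies at a glance.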

ositonegro iT邦 Newbie, level 5 ‧ 2021-07-21 19:37:04

listennn08, do you have any Python image-scraping tutorials you'd recommend?

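listennn08 iT邦 Expert, level 5 ‧ 2021-07-22 09:37:26

ositonegro
For scraping images, the first few articles of this series should help:
https://ithelp.ithome.com.tw/articles/10202121
unless the page requires selenium, in which case you'll need to look for other material.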


IT邦幫忙