Urgent: Python crawler suddenly blocked

python crawler blocking

BB 2021-06-22 15:48:22 ‧ 6434 views

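I need to crawl the foodpanda website for an assignment.
It worked without any problems until just now, when it suddenly stopped returning data.
So I modified my code to track down the problem: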




import bs4
import requests
def foodpanda( city_name):
    # Map each city/county name to its URL slug
    city = {"台北市":"taipei-city", "新北市":"new-taipei-city", "台中市":"taichung-city","高雄市":"kaohsiung-city",
            "新竹市":"hsinchu-city", "桃園市":"taoyuan-city", "基隆市":"keelung","台南市":"tainan-city",
            "苗栗市":"miaoli-county", "嘉義市":"chiayi-city", "彰化市":"changhua", "宜蘭縣":"yilan-city",
            "屏東縣":"pingtung-city", "雲林縣":"yunlin-county", "花蓮市":"hualien", "南投市":"nantou-county",
            "台東市":"taitung-county","澎湖縣":"penghu-city", "金門縣":"kinmen-city"}
    
    if city_name in city:
        city_url = city[city_name]
        url = "https://www.foodpanda.com.tw/city/"+city_url
        # The input name is in the dictionary, so build the corresponding URL
        
        header = {"User-Agent":"Moziilla/5.0 (Windows NT 6.1; WOW64)\AppleWebKit/537.6 (KHTML, like Gecko) Chrome/45.0.2454.101\
            Safari/537.36"}
        url = requests.get(url, headers = header)
        # Download the page
        search = bs4.BeautifulSoup(url.text, "lxml")
        # Parse the downloaded page

        print(search.text)
       
foodpanda("台北市")
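A side note on the User-Agent string in the code above: a backslash at the end of a line inside a string literal continues the literal, so the indentation of the next line ends up inside the header value. Separately, `Moziilla` is misspelled, and `\AppleWebKit` keeps a literal backslash because `\A` is not an escape sequence. The sketch below uses a shortened, illustrative string (not the full header) to show the continuation effect and one way to avoid it:

```python
# Backslash-newline inside a string literal continues the literal, and the
# next line's leading spaces become part of the string.
broken_ua = "Chrome/45.0.2454.101\
    Safari/537.36"

# Adjacent string literals inside parentheses are concatenated by the parser,
# so no stray whitespace sneaks in.
fixed_ua = (
    "Chrome/45.0.2454.101 "
    "Safari/537.36"
)

print(repr(broken_ua))
print(repr(fixed_ua))
```

Whether this malformed value contributed to the block is not established in the thread; it is simply something a bot filter could notice.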

The result is:
Please verify you are a human
Access to this page has been denied because we believe you are using automation tools to browse the website.

This may happen as a result of the following:
Javascript is disabled or blocked by an extension (ad blockers for example)
Your browser does not support cookies

Please make sure that Javascript and cookies are enabled on your browser and that you are not blocking them from loading.

Reference ID: #b9b6dc10-d32d-11eb-b71e-e1e76f2daa6e
Powered by PerimeterX, Inc.

How can I solve this?
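Since the block page comes back as ordinary HTML, a script can at least recognise it instead of parsing it as if it were real data. A minimal sketch, assuming the marker phrases from the output above stay the same (the names `BLOCK_MARKERS` and `looks_blocked` are made up for illustration):

```python
# Phrases taken from the block page shown above; the site may change them.
BLOCK_MARKERS = (
    "Please verify you are a human",
    "Access to this page has been denied",
    "Powered by PerimeterX",
)


def looks_blocked(page_text):
    """Return True if the page text contains any known block-page phrase."""
    return any(marker in page_text for marker in BLOCK_MARKERS)


sample = "Please verify you are a human\nAccess to this page has been denied"
print(looks_blocked(sample))
print(looks_blocked("<html><body>Restaurants in Taipei</body></html>"))
```

Calling `looks_blocked(search.text)` before processing the page would let the crawler stop and wait rather than keep hammering the site.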

咖咖拉 (iT邦好手, level 1) ‧ 2021-06-22 16:39:01

This may happen as a result of the following:
Javascript is disabled or blocked by an extension (ad blockers for example)
Your browser does not support cookies

BB (iT邦新手, level 4) ‧ 2021-06-22 16:52:59

But I haven't changed any settings. How do I solve this?

ccutmis (iT邦高手, level 2) ‧ 2021-06-22 17:19:25

You could try crawling the site with Python + Selenium instead.
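A rough sketch of what the Selenium route might look like, assuming Selenium 4 and a local Chrome installation. The function name `fetch_rendered_html` is made up for illustration, and there is no guarantee that a driven browser gets past the bot check either:

```python
def fetch_rendered_html(url):
    """Open url in a real Chrome window and return the rendered HTML."""
    # Imported here so the rest of a script still loads without Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()


# Example call (opens a browser window and hits the network):
# html = fetch_rendered_html("https://www.foodpanda.com.tw/city/taipei-city")
```

The returned HTML can be passed to `bs4.BeautifulSoup` in the same way as `url.text` in the original code.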

小魚 (iT邦大師, level 1) ‧ 2021-06-22 17:35:05

You've been flagged as a bot.

BB (iT邦新手, level 4) ‧ 2021-06-22 19:58:55

I added "Host": "https://www.foodpanda.com.tw" to the headers,
and now I get:
400 Bad Request
400 Bad Request
cloudflare
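A likely explanation for the 400 (an assumption, not confirmed in the thread): the Host header is meant to carry only the host name, optionally with a port, not a full URL with its scheme. requests already fills it in from the request URL, so setting it by hand is usually unnecessary. If it is set anyway, the value can be derived like this:

```python
from urllib.parse import urlsplit

url = "https://www.foodpanda.com.tw/city/taipei-city"

# netloc is the host (plus port, if any) without the scheme or path.
host = urlsplit(url).netloc

# Shown only for illustration; requests sets Host automatically.
headers = {"Host": host}
print(headers)
```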

haward79 (iT邦好手, level 3) ‧ 2021-06-22 21:05:13

(1) Try a different public IP
(2) Clear the browser's cache and other stored data
(3) Study the headers; check the referer, tokens, and so on
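For point (3), one way to go about it is to copy the request headers from the browser's network tab and compare them with what the script sends. A small sketch of that comparison; the header values below are placeholders for illustration, not captured from the real site, and `missing_headers` is a made-up helper name:

```python
def missing_headers(browser_headers, script_headers):
    """Return names of headers the browser sent but the script did not.

    Header names are compared case-insensitively.
    """
    sent = {name.lower() for name in script_headers}
    return sorted(name for name in browser_headers if name.lower() not in sent)


# Placeholder values standing in for headers copied from the browser.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-TW,zh;q=0.8,en-US;q=0.5,en;q=0.3",
    "Referer": "https://www.foodpanda.com.tw/",
}
script_headers = {"user-agent": "Mozilla/5.0"}

print(missing_headers(browser_headers, script_headers))
```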

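BB (iT邦新手, level 4) ‧ 2021-06-22 23:45:56

(1) failed
(2) failed
(3) I've tried many different headers, and since it's just a simple static site, it shouldn't be that complicated.
It just suddenly stopped working today.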

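㊣浩瀚星空㊣ (iT邦大神, level 1) ‧ 2021-06-23 08:40:33

Your assumption that "it's just a simple static site, so it shouldn't be that complicated" is wrong.
Anti-crawler measures can be applied not only in the application code but also at the server level.

Besides, that site cannot be purely static. It must be pseudo-static: dynamically generated pages served under static-looking URLs.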

7 answers

froce

iT邦大師, level 1 ‧ 2021-06-23 10:31:07

Best answer

import requests
def foodpanda( city_name):
    # Map each city/county name to its URL slug
    city = {"台北市":"taipei-city", "新北市":"new-taipei-city", "台中市":"taichung-city","高雄市":"kaohsiung-city",
            "新竹市":"hsinchu-city", "桃園市":"taoyuan-city", "基隆市":"keelung","台南市":"tainan-city",
            "苗栗市":"miaoli-county", "嘉義市":"chiayi-city", "彰化市":"changhua", "宜蘭縣":"yilan-city",
            "屏東縣":"pingtung-city", "雲林縣":"yunlin-county", "花蓮市":"hualien", "南投市":"nantou-county",
            "台東市":"taitung-county","澎湖縣":"penghu-city", "金門縣":"kinmen-city"}
    
    if city_name in city:
        city_url = city[city_name]
        url = "https://www.foodpanda.com.tw/city/"+city_url
        # The input name is in the dictionary, so build the corresponding URL
        
        header = {
            "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            # Note: a "br" response is only decoded if the brotli package is installed
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "zh-TW,zh;q=0.8,en-US;q=0.5,en;q=0.3",
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            }
        
        
        session = requests.Session()
        url = session.get(url, headers = header)
        print(url.text)

foodpanda("台北市")

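If you're going to pretend to be a browser, at least make it look convincing...
Also, don't send the same request too many times, and leave longer intervals between requests; otherwise they'll start playing hide-and-seek with you again.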
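The advice about spacing out requests could be sketched roughly as follows. The delay bounds are arbitrary assumptions, and `fetch_politely` is a made-up helper name; `fetch` is whatever function actually downloads a page:

```python
import random
import time


def fetch_politely(urls, fetch, min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls.

    The sleep function is injectable so the pacing can be checked without waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Randomised pause so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


# Dry run with a stand-in fetch function and a recorded (not real) sleep.
pauses = []
pages = fetch_politely(
    ["taipei-city", "new-taipei-city", "taichung-city"],
    fetch=lambda slug: "page for " + slug,
    sleep=pauses.append,
)
print(pages)
print(len(pauses))
```

With the code above, the real call might look like `fetch_politely(url_list, lambda u: session.get(u, headers=header).text)`, where `url_list` is a list of city URLs.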