iT邦幫忙


Web scraping: how do I fix a 419 error?


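As the title says, I'm new to web scraping and want to practice logging in as a user.
Site I'm logging into: iT邦幫忙
I already grab the token and the cookies from the login page, but the request still returns 419.
My code is below. Any help is appreciated, thanks!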

```python
import requests
from bs4 import BeautifulSoup
from urllib import request, parse
from http.cookiejar import CookieJar
import urllib
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
response = urllib.request.urlopen('https://www.python.org')
response.read().decode('utf-8')

headers = {
    'User-Agent': 'my user-agent',
}
# Grab the token from the login page
session = requests.Session()
url = 'https://member.ithome.com.tw/login'
response = session.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html5lib')
token = soup.find('input', {'name': "_token"})['value']
data = {
    'account': 'myaccount',
    'password': 'mypassword',
    '_token': str(token),
}
# Grab the cookies
cookiejar = CookieJar()
handler = request.HTTPCookieProcessor(cookiejar)
opener = request.build_opener(handler)
cookies = {}
resp = opener.open('https://member.ithome.com.tw/login')
for c in (list(cookiejar)):
    cookies[c.name] = c.value
headers['Cookie']= f'_ga=GA1.3.249018218.1653631398; _gid=GA1.3.1048506314.1681834763; XSRF-TOKEN={cookies["XSRF-TOKEN"]}; ithomemembercenter_session={cookies["ithomemembercenter_session"]}'
resp = session.post(url, data=data, headers=headers)
print(resp.status_code)
```
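For context, the `_token` form field, the `XSRF-TOKEN` cookie and the 419 status look like a Laravel-style CSRF check, where 419 means the submitted `_token` does not match the token stored in the server-side session that the session cookie points to. That is an assumption; the thread doesn't say what the site runs. The offline simulation below (the `FakeCsrfServer` class and its methods are hypothetical, not the real site) sketches why a token taken from one session fails when it is posted with another session's cookie, which is what happens when the token comes from `requests.Session` but the cookies come from a separate `urllib` opener:

```python
import secrets


class FakeCsrfServer:
    """Hypothetical, offline stand-in for a Laravel-style CSRF check."""

    def __init__(self):
        self.sessions = {}  # session id (cookie value) -> CSRF token stored server-side

    def get_login_page(self):
        # A GET without a session cookie starts a new session with its own token
        session_id = secrets.token_hex(8)
        token = secrets.token_hex(16)
        self.sessions[session_id] = token
        # The client gets the session id as a cookie and the token inside the form
        return session_id, token

    def post_login(self, session_id, form_token):
        # Accept only if the form's _token matches the token of *this* session
        expected = self.sessions.get(session_id)
        if expected is None or not secrets.compare_digest(expected, form_token):
            return 419  # the status Laravel uses for a CSRF token mismatch
        return 200


server = FakeCsrfServer()

# Like the question: token from one session (requests.Session) ...
_, token_a = server.get_login_page()
# ... but the cookie from a second, independent session (the urllib opener)
session_b, _ = server.get_login_page()
print(server.post_login(session_b, token_a))  # 419

# Like the fix: token and cookie from the same session
session_c, token_c = server.get_login_page()
print(server.post_login(session_c, token_c))  # 200
```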


1 comment

frank_huang
iT邦 Newbie, Level 5 · 2023-04-21 17:39:49

If you comment out these lines, it looks like you get a 200:

```python
for c in (list(cookiejar)):
    cookies[c.name] = c.value
headers['Cookie']= f'_ga=GA1.3.249018218.1653631398; _gid=GA1.3.1048506314.1681834763; XSRF-TOKEN={cookies["XSRF-TOKEN"]}; ithomemembercenter_session={cookies["ithomemembercenter_session"]}'
```
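Why removing those lines helps: as far as I can tell, `requests` builds the `Cookie` header through the standard library's `CookieJar.add_cookie_header`, which only fills the header in when the request doesn't already carry one. A hand-written `headers['Cookie']` therefore replaces the session's own cookies with the opener's cookies, which belong to a different server-side session than the scraped `_token`. The sketch below shows that rule with the standard library alone; the cookie values are made up and the `session_cookie` helper is hypothetical:

```python
import http.cookiejar
import urllib.request

LOGIN_URL = "https://member.ithome.com.tw/login"


def session_cookie(name, value):
    """Hypothetical helper: a cookie like one the login page might set."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="member.ithome.com.tw", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = http.cookiejar.CookieJar()
jar.set_cookie(session_cookie("ithomemembercenter_session", "from-this-session"))

# 1) No manual Cookie header: the jar fills it in from its own cookies
auto_req = urllib.request.Request(LOGIN_URL, method="POST")
jar.add_cookie_header(auto_req)
print(auto_req.get_header("Cookie"))  # ithomemembercenter_session=from-this-session

# 2) Manual Cookie header (like headers['Cookie'] = ... in the question):
#    the jar sees a Cookie header is already set and leaves it untouched
manual_req = urllib.request.Request(
    LOGIN_URL,
    method="POST",
    headers={"Cookie": "ithomemembercenter_session=from-another-session"},
)
jar.add_cookie_header(manual_req)
print(manual_req.get_header("Cookie"))  # ithomemembercenter_session=from-another-session
```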

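Look up a few more examples and make sure you understand what each line of your code does; that makes problems like this much easier to solve.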
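Putting it together, here is a minimal sketch of the login that uses a single cookie jar for both the GET and the POST. To keep it self-contained it uses only the standard library (`urllib.request` with `http.cookiejar`, which the question already imports) in place of `requests` and BeautifulSoup; the `TokenParser`, `extract_token` and `login` names are hypothetical. The form field names come from the question's `data` dict, and whether the site accepts exactly these fields is an assumption I haven't verified:

```python
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request
from html.parser import HTMLParser

LOGIN_URL = "https://member.ithome.com.tw/login"


class TokenParser(HTMLParser):
    """Finds the first <input name="_token" value="...">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == "_token" and self.token is None:
            self.token = attrs.get("value")


def extract_token(html):
    parser = TokenParser()
    parser.feed(html)
    parser.close()
    return parser.token


def login(account, password, user_agent="my user-agent"):
    # One cookie jar shared by the GET and the POST, so the session cookie
    # sent with the POST is the one the _token was issued for
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]

    with opener.open(LOGIN_URL) as resp:
        token = extract_token(resp.read().decode("utf-8"))
    if token is None:
        raise RuntimeError("no _token input found on the login page")

    body = urllib.parse.urlencode(
        {"account": account, "password": password, "_token": token}
    ).encode("ascii")
    try:
        # No manual Cookie header: HTTPCookieProcessor adds the jar's cookies
        with opener.open(LOGIN_URL, data=body) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 419 if token and session still don't match
```

Call it as `login('myaccount', 'mypassword')`. With `requests` the fix is the one the reply describes: keep `session.get` and `session.post`, and drop the `CookieJar` opener and the `headers['Cookie']` line, because `requests.Session` already carries the cookies from the GET over to the POST.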
