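When I scrape data from the web with Python, an error occurs, but if I simply try again it succeeds. Could anyone explain why this happens? Here is my code: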
import requests
import bs4
from bs4 import BeautifulSoup
import pandas as pd
import time  # import time
import sys
import os
import numpy as np

# Convert the current time into the desired string (day and hour, used in the output file name)
timeString = time.strftime("%d%H")
yeardf = int(timeString[0:4])
yeardf1 = int(yeardf-1911)
frames = []
frames1 = []
datble = []

for i in range(99,1767):
    # Read the company code for row i from the spreadsheet
    ndf = pd.read_excel("D:/python/board/whole.xlsx", usecols=["code"])
    fndf = int(ndf.at[i, "code"])
    print(fndf)
    # Fetch months 1 to 5 for this company
    for month in range(1,6):
        headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"}
        payload = {'encodeURIComponent': 1,
                   'step': 1,
                   'firstin': 1,
                   'off': 1,
                   'queryName': 'co_id',
                   'inpuType': 'co_id',
                   'TYPEK': 'all',
                   'isnew': 'false',
                   'co_id': fndf,
                   'year': 112,
                   'month': month, }
        res = requests.post("https://mops.twse.com.tw/mops/web/stapap1", params=payload, headers={'Connection':'close'})
        soup = bs4.BeautifulSoup(res.text, "html.parser")
        data = soup.find_all('table')
        data1 = soup.find('h3')
        # Pick the target table by counting back from the last table on the page
        if data1 is None:
            df = len(data)
            dtable = soup.select('table')[int(df-2)]
        else:
            df = len(data)
            dtable = soup.select('table')[int(df-3)]
        df = pd.read_html(dtable.prettify())
        dfs = pd.DataFrame(np.concatenate(df))
        newdata = dfs.iat[6, 1]
        print(newdata)
        frames.append(newdata)
    # Compare the five monthly values for this company
    result = pd.DataFrame(frames, columns=["code"])
    frames.clear()
    data1 = result.at[0, "code"]
    data2 = result.at[1, "code"]
    data3 = result.at[2, "code"]
    data4 = result.at[3, "code"]
    data5 = result.at[4, "code"]
    if data2>=data1 and data3>=data2 and data4>data3 and data5>data4:
        frames1.append(fndf)
        print("pocketstock="+str(frames1))
    result1 = pd.DataFrame(frames1)
    result1.to_excel("D:/python/board/"+timeString+".xlsx", index = False)
    time.sleep(20)

IndexError: index 1 is out of bounds for axis 0 with size 1
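The error above is only one of several that I get; if I run it again, it succeeds.
Thank you all in advance for your guidance. Much appreciated.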
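A few suggestions:
The IndexError is most likely raised when you take a subset of the data. If it works at some times and fails at others, it most likely comes down to the two possibilities above.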
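One way to act on this is to check the size of the parsed table before indexing into it, so that an unexpectedly small result is detected instead of raising. The following is only a minimal sketch, not the asker's code: `pick_cell` is a hypothetical helper, and it assumes the table has first been converted to a plain list of rows (for the DataFrame in the question, something like `dfs.values.tolist()`).

```python
def pick_cell(rows, row, col):
    """Return rows[row][col], or None when the table is smaller than expected."""
    if row < 0 or col < 0:
        return None
    if row >= len(rows):
        return None  # fewer rows than expected
    if col >= len(rows[row]):
        return None  # this row has fewer columns than expected
    return rows[row][col]


full = [["a", 1], ["b", 2]]
short = [["a"]]  # e.g. a page that came back without the expected data

print(pick_cell(full, 1, 1))   # 2
print(pick_cell(short, 1, 1))  # None
```

A `None` result can then be logged together with the company code and month, which makes it easier to see what the server actually returned on the failing request.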
If you don't want to check for these errors, the only alternative is to wrap things in a lot of try/except blocks.
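Since rerunning the script by hand already works, the try/except approach can be reduced to a single retry wrapper. This is a rough sketch under assumptions: `retry` is a hypothetical helper (not part of requests or pandas), the default of three attempts and the delay are arbitrary, and `flaky` merely stands in for the fetch-and-parse step from the question.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(IndexError,)):
    """Call func() up to `attempts` times, pausing `delay` seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay)  # wait before trying again
    # Every attempt failed: re-raise the last error so it is not silently lost
    raise last_error


# Stand-in for a step that fails twice and then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IndexError("index 1 is out of bounds for axis 0 with size 1")
    return 42

print(retry(flaky, attempts=3, delay=0))  # 42
```

Re-raising after the final attempt is deliberate: a company code that fails every time still surfaces as an error rather than being skipped unnoticed.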