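I'm trying to scrape Taipower's load data, but the part shown as ... inside the span never gets read (Figure 3), and the data I need is in the expanded rows. Here is my code: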
import requests
from bs4 import BeautifulSoup

tai = "https://www.taipower.com.tw/d006/loadGraph/loadGraph/genshx_.html"
response = requests.get(tai)
html = response.content.decode('utf-8')

# Save a copy of the downloaded page (the file is opened once, with an explicit encoding)
with open('douban.html', 'w', encoding="utf-8") as file_obj:
    file_obj.write(html)
print(html)

# Read the saved page back and look for the span
with open('douban.html', 'r', encoding="utf-8") as file_obj:
    html = file_obj.read()
soup = BeautifulSoup(html, 'lxml')
all_ed = soup.find_all('span', id="nestSumId")
print(all_ed)
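This looks like dynamic data loaded via AJAX. BeautifulSoup only parses the static page you downloaded and does not execute any JavaScript, so the data never shows up. Try PhantomJS, or fetch the data the AJAX call itself requests (JSON) directly: https://www.taipower.com.tw/d006/loadGraph/loadGraph/data/genary.txt?_=1599006031855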
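from selenium import webdriver
# PhantomJS support was removed in Selenium 4, so this needs Selenium 3.x
driver = webdriver.PhantomJS()
my_url = "https://www.taipower.com.tw/d006/loadGraph/loadGraph/genshx_.html"  # the page from the question
driver.get(my_url)
# 'intro-text' is a placeholder; use the id of the element you actually need
p_element = driver.find_element_by_id(id_='intro-text')
print(p_element.text)
driver.quit()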
1599006031855 looks like a timestamp (13 digits, so probably milliseconds); Python's time module can generate one.
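As a rough sketch of that idea: the 13-digit value has the shape of a Unix time in milliseconds, which is presumably appended by the page's JavaScript as a cache-buster (an assumption; the server may well ignore it). The helper name `cache_buster` is made up for illustration.

```python
import time

def cache_buster(now=None):
    """Return a Unix time in milliseconds as an int (defaults to the current time)."""
    if now is None:
        now = time.time()
    return int(now * 1000)

# Base URL taken from the reply above; the "_" parameter is assumed to be a cache-buster.
base = "https://www.taipower.com.tw/d006/loadGraph/loadGraph/data/genary.txt"
url = "{}?_={}".format(base, cache_buster())
print(url)
```

A plain seconds value would probably work just as well if the parameter is only there to defeat caching.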
This JSON is probably Big5-encoded, so you'll need to handle the encoding conversion yourself.
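Since the encoding here is only a guess (and the snippet further down decodes the same file as UTF-8), one cautious option is to try several encodings in order. This is a minimal sketch; `decode_body` and its default encoding list are assumptions for illustration, not something the site documents.

```python
def decode_body(raw, encodings=("utf-8", "big5")):
    """Try each encoding in order and return the first one that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing matched: fall back to the first encoding with replacement characters.
    return raw.decode(encodings[0], errors="replace")

# Stand-in bytes for a Big5 response body; with requests you would pass response.content.
sample = "明潭".encode("big5")
print(decode_body(sample))
```

Trying UTF-8 first is deliberate: Big5 byte sequences for Chinese text are very unlikely to be valid UTF-8, so a wrong guess fails loudly instead of silently producing garbage.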
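In the browser's developer tools, under Network → XHR, you can see the data files that AJAX pulls in. They are usually JSON, but sometimes the server returns a pre-rendered SVG (vector image) instead, which is common for charts.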
import json, time, requests

# Millisecond timestamp, used as a cache-busting query parameter
now = int(time.time() * 1000)
url = 'https://www.taipower.com.tw/d006/loadGraph/loadGraph/data/genary.txt?_={}'.format(now)
data = requests.get(url)
ustr = data.content.decode('utf-8')
# The file is plain JSON, so the standard json module is enough
jdata = json.loads(ustr)
# Print every row
for j in jdata['aaData']:
    print(j)
print('============')
# List only the rows for "明潭"
f = filter(lambda x: x[1].find('明潭') >= 0, jdata['aaData'])
for i in f:
    print(i)
Output
......... (99999 characters omitted above)
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#1', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#2', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#3', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#4', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#5', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#6', '-', '0.0', '-', ' ']
["<A NAME='geothermal'></A><b>地熱(Geothermal)</b>", '小型地熱能', '0.3', '0.2', '66.667%', ' ']
["<A NAME='geothermal'></A><b>地熱(Geothermal)</b>", '小計', '0.3(0.001%)', '0.2(0.001%)', '', '']
============
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#1', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#2', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#3', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#4', '267.0', '171.2', '64.120%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#5', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#6', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#1', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#2', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#3', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#4', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#5', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#6', '-', '0.0', '-', ' ']
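The first column of each row still carries HTML markup. If you want clean records, something along these lines could strip it. This is only a sketch: `clean_row` is a made-up helper, and the field names `capacity`, `output`, and `ratio` are my guesses at what the columns mean, not names taken from the Taipower data.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")

def clean_row(row):
    """Turn one aaData row into a dict, stripping HTML tags from the category cell."""
    category = html.unescape(TAG_RE.sub("", row[0])).strip()
    return {
        "category": category,
        "unit": row[1],
        "capacity": row[2],  # assumed meaning of column 3
        "output": row[3],    # assumed meaning of column 4
        "ratio": row[4],     # assumed meaning of column 5
    }

# One row copied from the output above
sample = ["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>",
          '明潭#4', '267.0', '171.2', '64.120%', ' ']
print(clean_row(sample))
```

The numeric fields are left as strings because some rows contain '-' or values like '0.3(0.001%)', which would need their own handling before converting to float.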