iT邦幫忙


Python web scraper cannot read the text inside a span

https://ithelp.ithome.com.tw/upload/images/20200901/20130131yBTrskYhIV.png
https://ithelp.ithome.com.tw/upload/images/20200901/20130131c7bJynz0IV.png
https://ithelp.ithome.com.tw/upload/images/20200901/20130131Lm5Lid5k46.png
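I need to scrape Taipower's load data, but the content of the span that shows up as "..." cannot be read (Figure 3), and the data I need is inside the expanded rows. Here is my code: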
import requests

tai = "https://www.taipower.com.tw/d006/loadGraph/loadGraph/genshx_.html"
response = requests.get(tai)
page = response.content.decode('utf-8')

# Save the downloaded HTML to disk (opened once, closed automatically).
with open('douban.html', 'w', encoding="utf-8") as file_obj:
    file_obj.write(page)
print(page)

from bs4 import BeautifulSoup

# Read the saved HTML back in.
with open('douban.html', 'r', encoding="utf-8") as file_obj:
    html = file_obj.read()

soup = BeautifulSoup(html, 'lxml')

# Look for the span that should hold the data.
all_ed = soup.find_all('span', id="nestSumId")
print(all_ed)

froce ‧ iT邦大師 Level 1 ‧ 2020-09-02 08:40:46
https://www.taipower.com.tw/d006/loadGraph/loadGraph/data/genary.txt

The bar chart is aggregated directly from the data at the URL above. Unless you use selenium, you cannot get it straight from the page.
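As a rough illustration of that aggregation, the sketch below sums one column per category. It uses a few rows copied from the output shown later in this thread. The column meanings (index 1 = unit name, index 3 = current output) are an assumption inferred from that output, and `totals_by_category` is a hypothetical helper name, not something the site provides.

```python
import re
from collections import defaultdict

# Strips HTML tags such as <A NAME='...'></A><b>...</b> from the category column.
TAG_RE = re.compile(r"<[^>]+>")

# Sample rows copied from the thread's output (not fetched live).
rows = [
    ["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#3', '267.0', '0.0', '0.000%', ' '],
    ["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#4', '267.0', '171.2', '64.120%', ' '],
    ["<A NAME='geothermal'></A><b>地熱(Geothermal)</b>", '小型地熱能', '0.3', '0.2', '66.667%', ' '],
    ["<A NAME='geothermal'></A><b>地熱(Geothermal)</b>", '小計', '0.3(0.001%)', '0.2(0.001%)', '', ''],
]

def totals_by_category(rows):
    """Sum column 3 per category, skipping the site's own subtotal ('小計') rows."""
    totals = defaultdict(float)
    for row in rows:
        if row[1] == '小計':
            continue
        try:
            value = float(row[3])  # assumed to be the current output
        except ValueError:
            continue  # skip non-numeric cells such as '-'
        category = TAG_RE.sub("", row[0]).strip()
        totals[category] += value
    return dict(totals)

totals = totals_by_category(rows)
print(totals)
```

The subtotal rows are skipped so that values are not counted twice; if you only need the site's own subtotals, you could read those rows instead.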

1 answer

japhenchen
iT邦超人 Level 1 ‧ 2020-09-02 08:22:52

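This looks like dynamic data loaded via AJAX. BeautifulSoup only parses the static page you downloaded and does not execute JavaScript, so the data is not there to find. Try PhantomJS, or fetch the data (JSON) that the AJAX call requests directly: https://www.taipower.com.tw/d006/loadGraph/loadGraph/data/genary.txt?_=1599006031855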

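from selenium import webdriver

# my_url is the page to render; 'intro-text' is a placeholder element id.
# Note: webdriver.PhantomJS and find_element_by_id exist only in Selenium 3.x.
driver = webdriver.PhantomJS()
driver.get(my_url)
p_element = driver.find_element_by_id('intro-text')
print(p_element.text)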

1599006031855 is probably a timestamp; Python's time module can generate one.
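A minimal sketch of building such a URL follows. The 13-digit value looks like a Unix timestamp in milliseconds, so the sketch multiplies by 1000. That the server only uses `_` as a cache-busting parameter is an assumption, and `cache_buster_ms` is a hypothetical helper name.

```python
import time

def cache_buster_ms():
    """Return the current Unix time in milliseconds as an integer."""
    return int(time.time() * 1000)

base = "https://www.taipower.com.tw/d006/loadGraph/loadGraph/data/genary.txt"
url = f"{base}?_={cache_buster_ms()}"
print(url)
```

A timestamp in seconds would most likely work as well, since the value only needs to differ between requests.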

This JSON is probably Big5-encoded, so handle the encoding conversion yourself.
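Since the encoding is only a guess, one cautious option is to try UTF-8 first and fall back to Big5. The sketch below shows that idea on raw bytes; `decode_payload` is a hypothetical helper, and the candidate list is an assumption rather than something the server documents.

```python
def decode_payload(raw, encodings=("utf-8", "big5")):
    """Decode bytes with the first encoding that succeeds.

    UTF-8 is tried first because it is strict: bytes in another
    encoding usually fail to decode instead of producing garbage.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going with replacement characters.
    return raw.decode(encodings[0], errors="replace")

print(decode_payload("明潭".encode("big5")))
print(decode_payload("明潭".encode("utf-8")))
```

With requests, you would pass `response.content` (the raw bytes) to this function rather than `response.text`.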

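In the browser's developer tools, under Network → XHR, you can see the data files loaded via AJAX. They are usually JSON, but sometimes the server returns an SVG (vector image) it has already rendered, which is common for charts.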

import math
import time

import jsonpickle
import requests

# Append the current Unix time (in seconds) as the "_" query parameter.
now = math.floor(time.time())
url = 'https://www.taipower.com.tw/d006/loadGraph/loadGraph/data/genary.txt?_={}'.format(now)
data = requests.get(url)
ustr = data.content.decode('utf-8')
jdata = jsonpickle.decode(ustr)

# Print every row.
for j in jdata['aaData']:
    print(j)
print('============')
# List only the rows whose unit name contains "明潭"
f = filter(lambda x: '明潭' in x[1], jdata['aaData'])
for i in f:
    print(i)

Output

......... (preceding 99999 characters omitted)
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#1', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#2', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#3', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#4', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#5', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#6', '-', '0.0', '-', ' ']
["<A NAME='geothermal'></A><b>地熱(Geothermal)</b>", '小型地熱能', '0.3', '0.2', '66.667%', ' ']
["<A NAME='geothermal'></A><b>地熱(Geothermal)</b>", '小計', '0.3(0.001%)', '0.2(0.001%)', '', '']
============
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#1', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#2', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#3', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#4', '267.0', '171.2', '64.120%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#5', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#6', '267.0', '0.0', '0.000%', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#1', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#2', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#3', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#4', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#5', '-', '0.0', '-', ' ']
["<A NAME='pumpingload'></A><b>抽蓄負載(Pumping Load)</b>", '明潭#6', '-', '0.0', '-', ' ']
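The first column of each row still carries HTML markup, and the numeric columns are strings. The sketch below shows one way to turn a row into a cleaner record. The field names (`capacity`, `output`) are assumptions inferred from the values above, where 171.2 / 267.0 matches the 64.120% in the fifth column, and `clean_row` is a hypothetical helper.

```python
import re

# Matches any HTML tag, e.g. <A NAME='pumpinggen'>, </A>, <b>, </b>.
TAG_RE = re.compile(r"<[^>]+>")

def to_float(text):
    """Convert a cell to float, returning None for placeholders like '-'."""
    try:
        return float(text)
    except ValueError:
        return None

def clean_row(row):
    """Strip markup from the category and parse the numeric columns."""
    return {
        "category": TAG_RE.sub("", row[0]).strip(),
        "unit": row[1],
        "capacity": to_float(row[2]),  # assumed meaning of column 2
        "output": to_float(row[3]),    # assumed meaning of column 3
    }

sample = ["<A NAME='pumpinggen'></A><b>抽蓄發電(Pumping Gen)</b>", '明潭#4', '267.0', '171.2', '64.120%', ' ']
record = clean_row(sample)
print(record)
```

Returning None for cells such as '-' keeps missing values distinct from a real reading of 0.0.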
