python web scraping

Python 3 web scraping

lingwu 2019-12-01 01:00:40 ‧ 5572 views

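I am trying to collect some air quality data, and I am using the data from
https://pm25.lass-net.org/data/history.php?device_id=74DA38EBF902
When I inspect the page's HTML structure, the only element I see is pre.

I started with BeautifulSoup. I can fetch the whole response, but I don't know how to pull out specific values.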

import requests
from bs4 import BeautifulSoup

res = requests.get("https://pm25.lass-net.org/data/history.php?device_id=74DA38EBF902")
soup = BeautifulSoup(res.text, 'html.parser')
ele = soup.find('pre')  # find() returns None when no matching tag exists
print(ele)

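This prints None. Changing 'pre' to 'a' or 'head' also gives None, although the page's HTML structure doesn't contain anything else to begin with.

Is there a way to scrape specific values from this page,
such as s_h0 and s_t0?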

froce (iT邦 Master, Level 1) ‧ 2019-12-01 01:32:28

The endpoint is already handing you JSON directly, so why use soup at all?
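
A likely explanation for the None results: a browser wraps a plain JSON response in a pre element when displaying it, but the raw text that requests receives contains no HTML tags, so soup.find has nothing to match. The sketch below illustrates this with a made-up body; `sample_body` and `parse_if_json` are hypothetical names for illustration, not real data or an API of the service.

```python
import json

# Hypothetical stand-in for the raw response body (not real data from the service).
sample_body = '{"device_id": "74DA38EBF902", "feeds": []}'

def parse_if_json(text):
    """Return the parsed object if text is valid JSON, otherwise None."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

parsed = parse_if_json(sample_body)
has_pre_tag = "<pre" in sample_body.lower()

print(type(parsed).__name__)  # dict
print(has_pre_tag)            # False
```

If the body parses as JSON, an HTML parser is unnecessary and the values can be read with ordinary dict and list access.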

1 answer

froce

iT邦 Master, Level 1 ‧ 2019-12-01 01:59:06

https://pyfiddle.io/fiddle/06b98adb-9a22-4aa3-a352-c343d7e686d2/?i=true

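import requests
import json

res = requests.get("https://pm25.lass-net.org/data/history.php?device_id=74DA38EBF902")
data = json.loads(res.text)  # the response body is JSON, so no HTML parsing is needed
# Each item in data['feeds'][0]['AirBox'] is a dict; take the record stored under its first key
AirBox = map(lambda x: x.get(list(x.keys())[0]), data['feeds'][0]['AirBox'])
# Keep only the s_t0 reading and its timestamp for every record
s_t0 = map(lambda x: {"s_t0": x.get("s_t0", ""), "timestamp": x.get("timestamp")}, AirBox)

print(list(s_t0))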
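
For readers who want to try the extraction without hitting the network, here is a minimal offline sketch of the same idea. The `sample` payload is hypothetical: it only mirrors the nesting that the code above relies on (`feeds[0]['AirBox']` holding single-key dicts), and its values are made up. `extract_fields` is an illustrative helper, not part of any library.

```python
# Hypothetical payload mirroring the nesting used above; all values are made up.
sample = {
    "feeds": [{
        "AirBox": [
            {"2019-12-01T00:00:00Z": {"timestamp": "2019-12-01T00:00:00Z",
                                      "s_t0": 21.5, "s_h0": 70, "s_d0": 12}},
            {"2019-12-01T00:05:00Z": {"timestamp": "2019-12-01T00:05:00Z",
                                      "s_t0": 21.3, "s_h0": 71}},
        ]
    }]
}

def extract_fields(data, fields):
    """Return one dict per record, holding only the requested fields."""
    rows = []
    for wrapper in data["feeds"][0]["AirBox"]:
        record = next(iter(wrapper.values()))  # unwrap the single-key dict
        # Missing fields fall back to "" as in the answer's code
        rows.append({name: record.get(name, "") for name in fields})
    return rows

rows = extract_fields(sample, ["timestamp", "s_h0", "s_t0"])
for row in rows:
    print(row)
```

Passing a different list of field names, for example `["timestamp", "s_d0"]`, selects other readings without changing the function.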

3 replies

lingwu (iT邦 Novice, Level 5) ‧ 2019-12-02 00:00:18

I have a follow-up question. In

data = map(lambda x: {"timestamp": x.get("timestamp"), "s_d0":x.get("s_d0",""),"s_h0":x.get("s_h0",""),"s_t0":x.get("s_t0", "")}, AirBox)

if I delete the keys "timestamp", "s_d0", "s_h0", and "s_t0" (the parts before each colon), the values are printed in no particular order.
How can I fix this?

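froce (iT邦 Master, Level 1) ‧ 2019-12-02 00:04:57

Once you delete the keys, the order is lost, because the braces now create a set rather than a dict, and a set is unordered.
You can use a tuple instead.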

data = map(lambda x: (x.get("timestamp"), x.get("s_d0",""), x.get("s_h0",""), x.get("s_t0", "")), AirBox)
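
The difference between the three literals can be seen in a small self-contained sketch. The `record` dict below is hypothetical, with made-up values, and stands in for one unwrapped AirBox entry.

```python
# Hypothetical record with made-up values.
record = {"timestamp": "2019-12-01T00:00:00Z", "s_d0": 12, "s_h0": 70, "s_t0": 21.5}

# Braces with "key: value" pairs build a dict, which keeps insertion order (Python 3.7+).
as_dict = {"timestamp": record.get("timestamp"), "s_t0": record.get("s_t0", "")}

# Braces with bare values build a set: unordered, and duplicate values collapse.
as_set = {record.get("timestamp"), record.get("s_d0", ""),
          record.get("s_h0", ""), record.get("s_t0", "")}

# Parentheses build a tuple, which keeps the values in the order they were written.
as_tuple = (record.get("timestamp"), record.get("s_d0", ""),
            record.get("s_h0", ""), record.get("s_t0", ""))

print(type(as_set).__name__)  # set
print(as_tuple)               # ('2019-12-01T00:00:00Z', 12, 70, 21.5)
```

A list would preserve order as well; a tuple simply signals that each row has a fixed layout of timestamp, s_d0, s_h0, s_t0.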

pwd2126 (iT邦 Novice, Level 5) ‧ 2019-12-12 10:16:52

Is there a fixed offset between the timestamp values and the actual local time? It looks like adding 8 hours corrects them.
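
A constant 8-hour gap would be consistent with the timestamps being recorded in UTC while local time in Taiwan is UTC+8, though nothing in this thread confirms that. The sketch below assumes that reading and also assumes an ISO 8601 string of the form `YYYY-MM-DDTHH:MM:SSZ`; the input value and the helper `utc_to_taipei` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset (assumption: the target is Taiwan local time, which has no DST).
TAIPEI = timezone(timedelta(hours=8))

def utc_to_taipei(ts):
    """Parse an assumed-UTC 'YYYY-MM-DDTHH:MM:SSZ' string and convert it to UTC+8."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return parsed.astimezone(TAIPEI)

local = utc_to_taipei("2019-12-01T20:30:00Z")  # made-up example timestamp
print(local.isoformat())  # 2019-12-02T04:30:00+08:00
```

Using timezone-aware datetimes instead of adding 8 hours by hand keeps the offset explicit and handles the date rollover shown in the example.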
