iT邦幫忙

Beginner question about Python web scraping

Hi everyone,
I'm a beginner who has just started with web scraping in Python,
and I'd like to ask a question:

    import requests
    from bs4 import BeautifulSoup

    res = requests.get('https://news.sina.com.cn/china/')
    res.encoding = 'utf-8'  # decode the response body as UTF-8
    soup = BeautifulSoup(res.text, 'html.parser')

    # Each news card is expected to carry the class "feed-card-item"
    for feed in soup.select('.feed-card-item'):
        if len(feed.select('h2')) > 0:
            h2 = feed.select('h2')[0].text      # headline text
            a = feed.select('a')[0]['href']     # link to the article
            print(h2, a)

Why does running it only print the following, without scraping any data?

    ==================== RESTART: D:\534098\python\sina網爬蟲.py ====================
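Because `soup.select('.feed-card-item')` returns an empty list when nothing matches, the `for` loop body never runs, so nothing is printed after IDLE's RESTART banner. A quick way to check this is to count how many elements in the downloaded HTML actually carry that class. The sketch below does so with the standard library's `html.parser`, which treats the contents of `<script>` and `<style>` tags as plain text, so markup that only exists inside JavaScript templates is not counted. `ClassCounter` and `count_class` are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name` (like a `.class_name` selector)."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Usage (needs network access):
#   import requests
#   html = requests.get('https://news.sina.com.cn/china/').text
#   print(count_class(html, 'feed-card-item'))
```

If this prints 0 for the downloaded page, the news cards are being inserted by JavaScript in the browser, which `requests` never runs.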

Ask Sina.

1 answer

性格妞
iT邦 Newbie Level 2 ‧ 2018-11-30 17:17:36

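For that particular URL, the data you want to scrape is actually already sitting in JSON data inside the page.
Rather than parsing the result page after JavaScript has rendered it (requests does not run JavaScript, so the HTML it downloads does not contain those rendered news cards),
you should read and parse that JSON data directly.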
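A minimal sketch of reading the JSON directly, assuming you have already located the request that returns it (for example in the Network tab of the browser's developer tools; the answer does not give the endpoint, so `url` is left as a parameter). Feed endpoints of this kind may wrap the JSON in a JSONP callback such as `callback({...});`, so `load_json` strips such a wrapper if one is present; that wrapper format is an assumption. Because the outer structure of the response is not shown, `find_items` simply walks the decoded data and collects every object that has both a `"title"` and a `"url"` key. All function names here are hypothetical.

```python
import json
import re
from urllib.request import Request, urlopen

# Matches an optional JSONP wrapper like  someCallback({...});  (assumed format)
JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.S)


def fetch_text(url):
    """Download `url` and decode the body as UTF-8."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def load_json(text):
    """Parse JSON text, first stripping a JSONP callback wrapper if present."""
    match = JSONP_RE.match(text)
    if match:
        text = match.group(1)
    return json.loads(text)


def find_items(obj):
    """Recursively collect dicts that contain both 'title' and 'url' keys."""
    found = []
    if isinstance(obj, dict):
        if "title" in obj and "url" in obj:
            found.append(obj)
        else:
            for value in obj.values():
                found.extend(find_items(value))
    elif isinstance(obj, list):
        for value in obj:
            found.extend(find_items(value))
    return found


def headlines(text):
    """Return (title, url) pairs for every news item found in the JSON text."""
    return [(item["title"], item["url"]) for item in find_items(load_json(text))]
```

Usage would look like `for title, link in headlines(fetch_text(url)): print(title, link)`. If the JSON turns out to be embedded in a `<script>` tag of the HTML page instead, that JSON text has to be cut out of the page before it is passed to `load_json`.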

The parsed data looks something like this (one news entry):

      {
        "intime": "1543566305",
        "channelid": "1",
        "ctime": "1543560435",
        "mtime": "1543566286",
        "authoruid": "0",
        "level": "1",
        "vid": "0",
        "ipad_vid": "0",
        "video_time_length": "0",
        "categoryid": "1",
        "mediaid": "0",
        "columnid": "915",
        "subjectid": "76866",
        "templateid": "0",
        "productid": "0",
        "ext_0": "0",
        "ext_1": "0",
        "ext_2": "0",
        "ext_3": "0",
        "ext_4": "0",
        "docid": "comos:hpevhcm4850489",
        "url": "https://news.sina.com.cn/c/2018-11-30/doc-ihpevhcm4850489.shtml",
        "urls": "[\"https:\\/\\/news.sina.com.cn\\/c\\/2018-11-30\\/doc-ihpevhcm4850489.shtml\"]",
        "wapurl": "http://news.sina.cn/gn/2018-11-30/detail-ihpevhcm4850489.d.html",
        "wapurls": "[\"http:\\/\\/news.sina.cn\\/gn\\/2018-11-30\\/detail-ihpevhcm4850489.d.html\"]",
        "wapsummary": "",
        "title": "蔡英文哭完后 台当局还要在这条路上越走越远",
        "stitle": "",
        "summary": "",
        "intro": "原标题:蔡英文哭完后,台当局还要在这条路上越走越远—— 蔡英文哭了!29日,这条新闻几乎刷爆了网络。在此前一天举行的民进党中常会上...",
        "author": "",
        "commentid": "gn:comos-hpevhcm4850489:0",
        "video_id": "",
        "keywords": "选举,蔡英文,两岸",
        "media_name": "参考消息",
        "img": [],
        "images": [
          {
            "u": "http://n.sinaimg.cn/news/transform/116/w550h366/20181130/r5aB-hpfyces8815829.jpg",
            "w": 550,
            "h": 366,
            "t": "▲“九合一”选举大败之后,蔡英文宣布辞去党主席职务。"
          },
          {
            "u": "http://n.sinaimg.cn/translate/56/w1060h596/20181130/wPCJ-hpevhcm4850378.jpg",
            "w": 1060,
            "h": 596,
            "t": "▲陈明祺"
          },
          {
            "u": "http://n.sinaimg.cn/translate/661/w859h602/20181130/qaVk-hpevhcm4850410.jpg",
            "w": 859,
            "h": 602,
            "t": "▲马晓光"
          }
        ],
        "lids": "1356,1655,1741,1908,2509,2510,2670,2968,2970,2974",
        "oid": "166962642",
        "mlids": "",
        "ext": "0",
        "comment_reply": 10,
        "comment_show": 2,
        "comment_total": 29,
        "important": "{\"container_id\":\"66284\",\"pos\":\"\\u8981\\u95fb\",\"widget_name\":\"www\",\"action\":\"up\",\"wxb\":true,\"time\":\"2018-11-30 16:23:03\",\"operator\":\"zhangshen5@staff.sina.com.cn\",\"yw_rank\":\"1\"}"
      }
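Two details of this record are easy to miss. First, `urls`, `wapurls` and `important` are themselves JSON documents stored as strings, so they need a second `json.loads`. Second, `ctime`, `mtime` and `intime` appear to be Unix timestamps in seconds stored as strings; this is an assumption based on the values (for example, `1543560435` falls on 2018-11-30 UTC, the date in the article URL). A minimal sketch of turning one such record into plain Python values; `summarize` and its output keys are hypothetical names.

```python
import json
from datetime import datetime, timezone


def summarize(record):
    """Extract commonly needed fields from one decoded news record."""
    return {
        "title": record["title"],
        "url": record["url"],
        "media": record.get("media_name", ""),
        # keywords is one comma-separated string
        "keywords": [k for k in record.get("keywords", "").split(",") if k],
        # urls is a JSON array encoded as a string, so it is decoded a second time
        "all_urls": json.loads(record.get("urls") or "[]"),
        # ctime is assumed to be a Unix timestamp in seconds, stored as a string
        "created": datetime.fromtimestamp(int(record["ctime"]), tz=timezone.utc),
        # each entry in images has the picture URL under "u"
        "image_urls": [img["u"] for img in record.get("images", [])],
    }
```

Each dict collected from the feed can be passed to `summarize` after the first `json.loads`; the same second-decoding step applies to `wapurls` and `important` if you need them.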
