Python: Building a Simple Little Web Crawler (Part 1) - iT 邦幫忙

2018 iT 邦幫忙 Ironman Contest (鐵人賽)

DAY 26

Self-Challenge Group

Part 26 of the series "30 Days of Python Learning" (30天Python學習分享路程)


CHI-CHENG HSIAO

2018-01-14 16:27:59


Today we're scraping not lolis but photos of a blonde college student.

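Day one: first, try to extract the href of each <a> tag one by one.
Next, we get the length of each href string; later I'll use it to check whether the string ends with an image file extension.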

The code is as follows:

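# coding=utf-8
# Python 3 version: urllib2 exists only in Python 2 and was never used, so it is dropped.
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox()  # Firefox is driven through geckodriver
driver.get("https://www.ptt.cc/bbs/Beauty/M.1515902682.A.579.html")
#print(driver.page_source)
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup)  # dump the parsed page for inspection
image = soup.find_all("a")  # every <a> tag on the page
for element in image:
    href = element.get('href')
    if href is None:  # an <a> without href would make len(None) raise TypeError
        continue
    print(href)
    print(len(href))  # string length, to be used later for checking the file extension
#print(image)
driver.quit()  # closes the browser and ends the WebDriver session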
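The plan above uses each string's length to look at the end of the href and decide whether it is an image. A minimal sketch of that next step, swapping in `str.endswith` instead of length-based slicing so that four-letter extensions such as `.jpeg` also work, might look like this. The extension list and the sample URLs are hypothetical and not taken from the original post.

```python
from urllib.parse import urlparse

# Extensions treated as images; this list is an assumption, extend as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def is_image_url(href):
    """Return True if the URL path ends with a known image extension."""
    if not href:  # None or empty string: an <a> tag without a usable href
        return False
    # Check only the path, so a query string like "?v=1" doesn't hide the extension.
    path = urlparse(href).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)  # endswith accepts a tuple of suffixes


def filter_image_links(hrefs):
    """Keep only the hrefs that point to image files."""
    return [h for h in hrefs if is_image_url(h)]


if __name__ == "__main__":
    # Hypothetical hrefs, standing in for the values printed by the crawler above.
    sample = [
        "https://i.imgur.com/abc123.jpg",
        "/bbs/Beauty/index.html",
        None,
        "https://i.imgur.com/XYZ.PNG?v=1",
        "https://imgur.com/abc123",
    ]
    for href in filter_image_links(sample):
        print(href)
```

Run as a script, this prints only the two `i.imgur.com` links; the board page link, the missing href, and the extension-less page URL are filtered out.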


1 comment

pigs0231

iT邦 Newbie, Level 5 · 2018-08-14 16:35:38

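When running this example, whether with webdriver.Firefox() or after switching to Chrome or another browser, if you get an error that geckodriver cannot be found, download a suitable version from
https://github.com/mozilla/geckodriver/releases
If the browser still fails to open, try an older version.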
