DAY 09 : Data processing with split, replace, and strip

11th iThome Ironman Contest

DAY 9

AI & Data

蟲王養成 - scrapy series, part 9

11th Ironman Contest, python3, split, replace, strip

kevin8701111

Team: NUTC_IMAC_GREEN

2019-09-25 15:50:58


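Previous posts
DAY 01 : Goals and plan for the contest
DAY 02 : Setting up a python3 virtualenv
DAY 03 : python3 request
DAY 04 : Using beautifulsoup4 and lxml
DAY 05 : Grabbing tags with select and find
DAY 06 : Getting values from the list after soup parsing
DAY 07 : request_header_cookie, getting past a page's 18+ restriction
DAY 08 : Scraping ppt post content
DAY 09 : Data processing with split, replace, and strip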

First, assign a string to a variable, then apply split, replace, and strip to it.

After calling split, the string is cut at every separator and the pieces come back as a list. Usage: .split(separator)

content = '%drop % get Useful text%@$^&$%'
con_sp = content.split('%')
print(con_sp)
['', 'drop ', ' get Useful text', '@$^&$', '']
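
The empty strings at both ends of that list come from the leading and trailing %. As a minimal sketch of two common follow-ups (the variable names `pieces` and `first_cut` are illustrative and not part of the original post), the empty pieces can be filtered out, and the optional second argument of split can limit how many cuts are made:

```python
content = '%drop % get Useful text%@$^&$%'

# Keep only the non-empty pieces produced by split
pieces = [part for part in content.split('%') if part]
print(pieces)     # ['drop ', ' get Useful text', '@$^&$']

# The optional second argument (maxsplit) caps the number of cuts
first_cut = content.split('%', 1)
print(first_cut)  # ['', 'drop % get Useful text%@$^&$%']
```

Filtering with `if part` is just one option; whether empty strings matter depends on the data being scraped.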

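The result of replace is just as easy to see. The format is .replace(old, new), where old is the substring to be replaced and new is the text that takes its place.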

content = '%drop % get Useful text%@$^&$%'
con_re = content.replace('%','')
print(con_re)
drop  get Useful text@$^&$
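
Two details are worth trying out here: replace returns a new string and leaves the original untouched, and it accepts an optional third argument that limits how many occurrences are replaced. The sketch below assumes the same `content` string; the names `first_only` and `cleaned` are made up for illustration.

```python
content = '%drop % get Useful text%@$^&$%'

# replace returns a new string; content itself is not modified
no_percent = content.replace('%', '')
print(content)     # %drop % get Useful text%@$^&$%

# The optional third argument (count) limits the number of replacements
first_only = content.replace('%', '', 1)
print(first_only)  # drop % get Useful text%@$^&$%

# Calls can be chained, here to also collapse the double space left behind
cleaned = content.replace('%', '').replace('  ', ' ')
print(cleaned)     # drop get Useful text@$^&$
```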

After strip, the % characters at the start and end of the string are deleted, while everything that is not at either end stays in the string.

content = '%drop % get Useful text%@$^&$%'
con_strip = content.strip('%')
print(con_strip)
drop % get Useful text%@$^&$
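
One thing that is easy to miss: the argument to strip is treated as a set of characters, not as a fixed prefix or suffix, and lstrip / rstrip work on one side only. A small sketch under that reading, using the same `content` string (the result names are illustrative):

```python
content = '%drop % get Useful text%@$^&$%'

# Every character listed is removed from both ends until a
# character outside the set is reached
trimmed = content.strip('%@$^&')
print(trimmed)     # drop % get Useful text

# lstrip and rstrip only touch the left or the right end
left_only = content.lstrip('%')
right_only = content.rstrip('%')
print(left_only)   # drop % get Useful text%@$^&$%
print(right_only)  # %drop % get Useful text%@$^&$

# With no argument, strip removes surrounding whitespace
print('  hello \n'.strip())  # hello
```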

Here is a practical example:

url = 'https://ithelp.ithome.com.tw/articles/10220333'
url_split = url.split('/')
print(url_split)

Output:

['https:', '', 'ithelp.ithome.com.tw', 'articles', '10220333']

url_domain = url_split[2]
url_category = url_split[3]
url_category_number = url_split[4]
print(url_domain)
print(url_category)
print(url_category_number)

Output:

ithelp.ithome.com.tw
articles
10220333
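
Fixed indexes like [4] depend on the exact shape of the URL. As a hedged sketch, suppose the same address arrives with a trailing slash (an assumption made here for illustration, not a case from the original post): split would then produce an extra empty string at the end, so removing the slash first and indexing from the end is one way to stay robust.

```python
# Same article URL, with a trailing slash added as an assumed variation
url = 'https://ithelp.ithome.com.tw/articles/10220333/'

# Without rstrip, the last element of the split would be ''
parts = url.rstrip('/').split('/')
article_id = parts[-1]
print(article_id)  # 10220333

# rsplit with maxsplit=1 cuts only once, starting from the right
base, last = url.rstrip('/').rsplit('/', 1)
print(base)        # https://ithelp.ithome.com.tw/articles
print(last)        # 10220333
```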

Tomorrow's post covers how to write data to a CSV file.

Today's song:
DSPS - 冬天再去見你
