2023 iThome Ironman Contest (鐵人賽)

DAY 6

SideProject30

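Python Basics and Data Science Applications series, part 6

Python Basics and Data Science Applications, Day 6 [Basic Python applications: fetching web content and images with the Requests module]

15th Ironman Contest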

carsonleung

Team: 沙培小子

2023-09-21 21:55:13

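Hi everyone. Today is day 6, so we are already one fifth of the way through. Keep it up, teammates! Today I will introduce Python's open() function and the requests module.

Opening files with with and open()

Python provides a range of functions and statements for working with files. The with statement and the open() function are two of them.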
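
As a minimal sketch of why the two are usually used together: a file opened in a with block is closed automatically when the block ends, so there is no need to call close() yourself. The file name demo.txt and the temporary directory are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as file:
    file.write("Hello, world!")
    closed_inside = file.closed  # the file is still open inside the block

closed_after = file.closed       # the with statement has closed it by now

print(closed_inside, closed_after)
```

The file object still exists after the block, but any further read or write on it would fail because it is already closed.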

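Mode	Function
w (write mode)	Opens the file for writing. If the file does not exist, a new one is created. If it already exists, its existing content is overwritten.
r (read mode)	Opens the file for reading.
a (append mode)	Opens the file for appending. New data is added after the existing content.
wb (binary write mode)	Opens the file for writing in binary mode. It handles binary data such as images, audio files, or other non-text files.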
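
To illustrate binary mode, here is a small sketch that writes a few bytes and reads them back. It also uses rb (binary read mode), which is a standard open() mode even though the table does not list it. The file name demo.bin and the temporary directory are assumptions for this illustration.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

data = bytes([72, 101, 108, 108, 111])  # the ASCII codes for "Hello"

# "wb" writes bytes exactly as given, with no text encoding applied.
with open(path, "wb") as file:
    file.write(data)

# "rb" reads the file back as a bytes object rather than a str.
with open(path, "rb") as file:
    raw = file.read()

print(raw)
print(raw.decode("ascii"))
```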

(w mode) example:

with open("write.txt", 'w') as file:
    file.write("Hello, world!")
print("done")

Output:

>>done

write.txt:

Hello, world!

(a mode) example:

with open("write.txt", 'a') as file:
    file.write("Hello, world!22")
print("done")

Output:

>>done

write.txt:

Hello, world!Hello, world!22

(r mode) example:

with open("write.txt", 'r') as file:
    file_contents = file.read()
    print(file_contents)

Output:

>>Hello, world!Hello, world!22

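(wb mode) example:

file_path = 'write.txt'

# Binary (base-2) code for "Hello, world!", one 8-bit group per character
binary_string = "01001000 01100101 01101100 01101100 01101111 00101100 00100000 01110111 01101111 01110010 01101100 01100100 00100001"

# Remove the spaces, then convert the bit string to bytes (8 bits per byte)
binary_string = binary_string.replace(" ", "")
binary_data = int(binary_string, 2).to_bytes(len(binary_string) // 8, 'big')

# Write the raw bytes to the file in binary mode
with open(file_path, 'wb') as file:
    file.write(binary_data)

Output: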

>>

write.txt

Hello, world!

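What is Python Requests

Python Requests is a popular and powerful library for making HTTP requests in Python. It provides an easy-to-use, intuitive interface for sending HTTP/1.1 requests, handling the various HTTP methods (such as GET, POST, PUT, and DELETE), and managing request headers, parameters, cookies, and authentication.
With the Requests library you can interact with APIs and scrape web page content.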
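
To make the idea of methods, parameters, and headers more concrete, the sketch below builds a GET request without sending it, so it runs without a network connection. The URL https://example.com/search, the parameter q, and the User-Agent value are placeholders chosen for illustration, not part of the original article.

```python
import requests

# Placeholder URL, query parameter, and header, used only for illustration.
request = requests.Request(
    method="GET",
    url="https://example.com/search",
    params={"q": "python"},
    headers={"User-Agent": "my-script/0.1"},
)

# prepare() assembles the final URL and headers but does not send anything.
prepared = request.prepare()

print(prepared.method)
print(prepared.url)
print(prepared.headers["User-Agent"])
```

In everyday use you would normally call requests.get(url, params=..., headers=...) directly, which builds and sends the request in one step.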

How to install Python Requests

Open a terminal or Command Prompt (CMD).
Enter the following command:

pip install requests

How to check where the module is installed on your computer

Enter the following command in the terminal:

pip show requests

Output:

>>Name: requests
Version: 2.28.2
Summary: Python HTTP for Humans.
Home-page: https://requests.readthedocs.io
Author: Kenneth Reitz
Author-email: me@kennethreitz.org
License: Apache 2.0
Location: C:\Users\leung\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: certifi, charset-normalizer, idna, urllib3
Required-by: chatbotAI, clarifai, clarifai-grpc, instapy, MeaningCloud-python, openai, pwntools, requests-oauthlib, tensorboard, torchvision, webdriverdownloader
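
The location can also be looked up from inside Python. The following is a minimal sketch using the standard library's importlib; the helper name module_location is hypothetical and introduced only for this example.

```python
import importlib.util


def module_location(name):
    """Return the file a module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# json ships with Python, so it is always found.
print(module_location("json"))
# requests is only found if it has been installed, otherwise this prints None.
print(module_location("requests"))
```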

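Parsing HTML with Beautiful Soup: an example that parses and tidies up HTML code. Beautiful Soup is a separate package, installed with pip install beautifulsoup4.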

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>Hi</title></head>
<body>
<p>This is a test.</p>
<a id="link2" href="/my_link2">Link 2</a>
<p>Hello, <b class="boldtext">Bold Text</b></p>
</body></html>
"""


# Parse the HTML code with Beautiful Soup
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup)

Output:

>><html><head><title>Hi</title></head>
<body>
<p>This is a test.</p>
<a href="/my_link2" id="link2">Link 2</a>
<p>Hello, <b class="boldtext">Bold Text</b></p>
</body></html>
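
Once the HTML is parsed, individual elements can be picked out of it. The sketch below reuses the same html_doc and looks up the title, the link by its id, and the bold text by its class. The variable names are chosen only for this illustration.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>Hi</title></head>
<body>
<p>This is a test.</p>
<a id="link2" href="/my_link2">Link 2</a>
<p>Hello, <b class="boldtext">Bold Text</b></p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

title_text = soup.title.string            # text inside <title>
link = soup.find("a", id="link2")         # first <a> whose id is "link2"
bold = soup.find("b", class_="boldtext")  # class_ avoids the reserved word "class"

print(title_text)
print(link["href"])
print(bold.string)
```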

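Requesting a website with the Python requests module

Website: https://www2.pyc.edu.hk/index.php

from bs4 import BeautifulSoup
import requests as req

url = "https://www2.pyc.edu.hk/index.php"

# Send a GET request and parse the returned HTML
response = req.get(url)
htmlfile = BeautifulSoup(response.text, 'html.parser')
# print(htmlfile.prettify())
print(htmlfile.title)         # the whole <title> tag
print(htmlfile.title.string)  # only the text inside the tag

# Status code 200 means the request succeeded
if response.status_code == 200:
    print("ok")

Output: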

>><title>Shatin Pui Ying College 沙田培英中學</title>
Shatin Pui Ying College 沙田培英中學
ok
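
The example above only reports success. As a hedged sketch of one way to handle failures as well, the helper below adds a timeout, so the script does not wait forever on an unresponsive server, and raises an exception on error responses. The function name fetch_html and the 10-second default are assumptions made for illustration.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the page body as text, raising an exception if the request fails."""
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError when the status code is 4xx or 5xx.
    response.raise_for_status()
    return response.text
```

A caller could wrap fetch_html("https://www2.pyc.edu.hk/index.php") in try/except requests.exceptions.RequestException to print a friendly message instead of a traceback.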

Requesting an image from a website with the Python requests module

import requests as req

url = "https://www2.pyc.edu.hk/img/indeximg/45th_SPYC-webbuttons_202223_english45.png"

response = req.get(url)
photo = response.content  # the raw bytes of the image

# Write the bytes to new.png in binary mode
with open(file="new.png", mode="wb") as f:
    f.write(photo)
print("end")

Output:

>>end

The program then creates new.png and writes the downloaded image into it.
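
Instead of hard-coding new.png, the file name could be taken from the URL itself. This is only a sketch using the standard library; the helper name filename_from_url and the fallback name download.bin are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of a URL, or a default if there is none."""
    path = urlparse(url).path          # drops the scheme, host, and query string
    name = posixpath.basename(path)    # keeps only the part after the last "/"
    return name or default


url = "https://www2.pyc.edu.hk/img/indeximg/45th_SPYC-webbuttons_202223_english45.png"
print(filename_from_url(url))
```

The result could then be passed to open(..., mode="wb") in place of "new.png".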

Exercise:

Try using Python requests to fetch the image below.

https://lh3.googleusercontent.com/pw/AJFCJaW3AwdaRgbKzdE-YzYENHJf-c4OelnFQ-dgwYT9_AOezfq4FjS1yorvseQCV1EzO4eqYG5_TpZfdB2juj48nqMWpFUtl72SXYWaYvhwT5nkJWT6dyetluooU5TJDkLf6c_h7qqJomShJFeQKqx0sDy7=w500-w400

Answer

import requests as req

url = "https://lh3.googleusercontent.com/pw/AJFCJaUd_QKMd5y0Kg0oK-jCtHmQdwjMrSsRCDOc_5-Y_CPsGpwvx2L58rHe62MP1q6y_FODTqXiwX4mZCAnY8nRKRQdQuj2Lo_u9oWFQwPecjo9RkMqh6s3E8WDt7NnU__M8Z2ALcNkyJIdXCdfwXQt4an4=w1500-h3000"

response = req.get(url)
photo = response.content  # the raw bytes of the image

# Write the bytes to new.png in binary mode
with open(file="new.png", mode="wb") as f:
    f.write(photo)
print("end")

Resulting image:

Searching content by HTML tag


import requests as req
from bs4 import BeautifulSoup

url = "https://hk.yahoo.com/"
response = req.get(url)
htmlfile = BeautifulSoup(response.text, 'html.parser')
print("ok")
print(htmlfile.title.string)
paragraphs = htmlfile.find_all("p")  # find every <p> tag

# Print the text of each <p> tag
for tag in paragraphs:
    print(tag.string)
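
One thing to watch for when printing .string: it is None whenever a tag contains more than one child, for example text mixed with another tag. The sketch below shows this on a small offline snippet (taken from the earlier html_doc example) and uses get_text() as an alternative that joins all the text inside the tag.

```python
from bs4 import BeautifulSoup

html_doc = """
<p>This is a test.</p>
<p>Hello, <b class="boldtext">Bold Text</b></p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# .string only works when the tag has a single child.
strings = [p.string for p in soup.find_all("p")]
# get_text() collects all text inside the tag, including nested tags.
texts = [p.get_text() for p in soup.find_all("p")]

print(strings)
print(texts)
```

On a real page such as the Yahoo home page, many p tags contain nested links or spans, so get_text() may be the more useful choice there.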