Python: extracting the text inside a comment

python3

sleepeye ‧ 2022-07-29 14:46:23 ‧ 1442 views


hsfncu (iT邦 Newbie, Level 5) ‧ 2022-07-30 10:55:15

Use re.



1 answer

熊熊工程師

iT邦 Graduate Student, Level 1 ‧ 2022-08-01 09:20:43

Best answer

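import re

# "." does not match a newline, so this finds only a comment that sits on one line,
# and it returns the whole "<!-- ... -->" tag, delimiters included.
pattern = r"<!.+>"
test_string = """
                <td class="t4t1" nowrap id="oAddCheckbox">
                <SCRIPT LANGUAGE=javascript>
                <!--GenLink2stk('AS2834','臺企銀');1/ -->
                </SCRIPT>
                </td>
                    <td class="t3n1" nowrap>215</td>
                    <td class="t3n1" nowrap>9</td>
                    <td class="t3n1" nowrap>206</td>
                </tr>
            """

match = re.search(pattern, test_string)
if match:  # re.search returns None when nothing matches
    print(match.group())  # <!--GenLink2stk('AS2834','臺企銀');1/ -->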
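The pattern above returns the whole <!-- ... --> tag, and because "." does not match a newline it misses comments that span several lines. A minimal sketch of a variant, assuming you want only the text between the delimiters: a non-greedy capture group plus re.DOTALL. The helper name comment_texts is hypothetical, not part of the original answer, and the multi-line sample is an assumed input.

```python
import re

# Non-greedy group: captures only what lies between "<!--" and the nearest "-->".
# re.DOTALL lets "." match newlines too, so multi-line comments are captured.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def comment_texts(html: str) -> list[str]:
    """Return the body of every <!-- ... --> block, with surrounding whitespace stripped."""
    return [body.strip() for body in COMMENT_RE.findall(html)]


# The single-line comment from the question, and a hypothetical multi-line one.
print(comment_texts("<!--GenLink2stk('AS2834','臺企銀');1/ -->"))
print(comment_texts("<p><!--\nline one\nline two\n--></p>"))
```

Because findall returns only the group contents, the delimiters are dropped automatically. A regex cannot distinguish a real comment from the same characters inside, say, a JavaScript string, which is acceptable for a page like this one but worth keeping in mind.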

6 comments

sleepeye (iT邦 Newbie, Level 5) ‧ 2022-08-02 14:10:57

When I switch to the actual web page, I get an error.

熊熊工程師 (iT邦 Graduate Student, Level 1) ‧ 2022-08-02 15:34:15

Please post code in a code block.
Could you explain how the code you posted relates to the answer I gave?

sleepeye (iT邦 Newbie, Level 5) ‧ 2022-08-04 14:20:17

With your example I can extract the text inside the comment,
but when I switch to the actual URL
https://fubon-ebrokerdj.fbs.com.tw/z/zg/zgb/zgb0.djhtm?a=7000&b=0037003000300056&c=E&d=1

it doesn't extract anything.

P.S. I'm new to iT邦幫忙 and to Python, so sorry for the trouble, and thank you!

熊熊工程師 (iT邦 Graduate Student, Level 1) ‧ 2022-08-04 17:36:26

My guess at the problem you're running into:

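Once the page is parsed with soup, the content you want sits inside a comment, and it seems you can't get it directly through .text. You have to convert the soup object to a string first; only then can you parse it with re.
After that conversion a second problem appears: the extracted data is split across several lines, as shown below, so the re pattern has to change, because "." does not match a newline.
Below is a working example; adapt the string parsing to your own needs.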
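One likely reason .text comes up empty is that this "comment" lives inside a <script> element. HTML parsers, including Python's standard-library html.parser, treat script contents as raw text, so the <!-- ... --> there is handed over as script data rather than as a comment node. A stdlib-only sketch built on that idea, as an alternative to the BeautifulSoup approach rather than the answerer's method: collect the raw text of every <script> and run a regex over it. ScriptTextCollector and genlink_calls are hypothetical names.

```python
import re
from html.parser import HTMLParser


class ScriptTextCollector(HTMLParser):
    """Collect the raw text of every <script> element.

    html.parser switches to raw-text (CDATA) mode inside <script>, so a
    "<!-- ... -->" there arrives via handle_data, not handle_comment.
    """

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.scripts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":  # tag names arrive lowercased, so <SCRIPT> matches
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data  # data may arrive in several chunks


GENLINK_RE = re.compile(r"GenLink2stk\([^)]*\)")


def genlink_calls(html: str) -> list[str]:
    """Return every GenLink2stk(...) call found inside <script> elements."""
    parser = ScriptTextCollector()
    parser.feed(html)
    parser.close()
    return [call for text in parser.scripts for call in GENLINK_RE.findall(text)]


sample = """<td class="t4t1" nowrap id="oAddCheckbox">
<SCRIPT LANGUAGE=javascript>
<!--GenLink2stk('AS2834','臺企銀');1/ -->
</SCRIPT>
</td>"""
print(genlink_calls(sample))
```

Since the regex runs on each script's full text, it does not matter how many lines the comment is split across, as long as a single call does not contain a ")" inside its arguments.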

Data after converting to a string:
[screenshot: the comment text split across several lines]

Working code:

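import re
import requests
from bs4 import BeautifulSoup

url = "https://fubon-ebrokerdj.fbs.com.tw/z/zg/zgb/zgb0.djhtm?a=7000&b=0037003000300056&c=E&d=1"
res = requests.get(url, timeout=10)
res.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(res.text, "lxml")  # requires the lxml package

# Once converted to a string the comment spans several lines, so match from the
# function name to the end of its line instead of using "<!.+>".
pattern = re.compile(r"GenLink2stk.+")
for item in soup.find_all("td", class_="t4t1"):
    # str(item) gives the cell's raw HTML, including the <script> contents
    result = pattern.search(str(item))
    if result:
        print(result.group())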

Output of the working code:
[screenshot]
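To take the answer's "parse the string according to your needs" one step further, a sketch that pulls the two quoted arguments out of each GenLink2stk(...) call. It assumes every call has exactly two single-quoted arguments, as in the sample. parse_genlink is a hypothetical helper, and what the arguments mean (they look like an ID with an "AS" prefix plus a short name) is inferred from the sample, not documented anywhere in the thread.

```python
import re

# Two capture groups: the first and second single-quoted arguments of the call.
# Whitespace around the parentheses and comma is tolerated.
CALL_RE = re.compile(r"GenLink2stk\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


def parse_genlink(text: str) -> list[tuple[str, str]]:
    """Return (first_arg, second_arg) for every GenLink2stk call found in text."""
    # With two groups in the pattern, findall returns a list of 2-tuples.
    return CALL_RE.findall(text)


print(parse_genlink("GenLink2stk('AS2834','臺企銀');1/ -->"))
```

Each line printed by the working code above can be fed to parse_genlink directly; if the "AS" prefix turns out to be consistent, stripping it is a one-line follow-up.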