iT邦幫忙 (iT Help)

2021 iThome Ironman Contest

Day 22
Self-Challenge Group

Lab Assistant's Self-Organized Technical Articles series, part 22

Python - Scrabble Word Finder - Python Web Scraping Practice Notes

References

My GitHub: Eterna-E

Project source: Eterna-E/ScrabbleWordFinder

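Description

As the title says, I wrote these notes to record the process of writing a practice web scraper in Python. The Scrabble Word Finder project looks for the meaningful words that can be formed by rearranging a set of letters: given a handful of arbitrary letters (roughly 5 to 12; with many more than that the search takes a long time), how many real words can be found? The script then uses a Python scraper to look up each word's meaning in the Cambridge Dictionary and prints it to the Command Prompt (cmd). It may not have much practical use, but it was fun to build XD.

I am writing this article as a record so that I can quickly look things up again when needed. I probably won't have many chances to reuse it, but it was still a worthwhile experience.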
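The project itself hands the "which words can these letters form" question to the Scrabble Word Finder website. To make the idea concrete, here is a minimal local sketch of that check, assuming you already have a list of candidate words. The names `can_form` and `playable_words` are hypothetical and do not appear in the project.

```python
from collections import Counter


def can_form(word: str, letters: str) -> bool:
    """Return True if `word` can be spelled using each letter in `letters` at most once."""
    need = Counter(word.lower())
    have = Counter(letters.lower())
    # Every letter the word needs must be available at least that many times.
    return all(have[ch] >= count for ch, count in need.items())


def playable_words(candidates, letters):
    """Keep only the candidate words that the given letters can spell (hypothetical helper)."""
    return [w for w in candidates if can_form(w, letters)]


if __name__ == "__main__":
    print(playable_words(["abed", "feed", "dace"], "abcde"))
```

Counting letters this way avoids generating every permutation of the input, which is the part that grows quickly as the number of letters increases.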

Finding 4-Letter Words

The code is as follows:

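import requests
from bs4 import BeautifulSoup

getword = {
    'words': 'abcde'
}
res = requests.post("https://wordfind.com/", data=getword)

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(res.text, "html.parser")

def getWord(words, num):  # collect the words that have exactly `num` letters
    for word in soup.select('.defLink'):
        links = word.select('a')
        if links and len(links[0].text) == num:
            words.append(links[0].text)

def printWord(words):
    if not words:  # nothing found: words[0] would raise IndexError
        return
    print(str(len(words[0])) + ' Letter Words')
    # enumerate gives the running number without calling words.index() each time
    for i, word in enumerate(words, start=1):
        print("    " + str(i) + '. ' + word)

word4 = []

getWord(word4, 4)

printWord(word4)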

The output is shown in the figure below:
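The code above scans the result list once for a single length. If several lengths are needed, one alternative is to bucket every word by its length in a single pass. The following is only a sketch of that idea on a plain list of strings; `group_by_length` is a hypothetical helper, not part of the project.

```python
from collections import defaultdict


def group_by_length(words):
    """Bucket words by letter count in one pass (hypothetical helper)."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    return dict(groups)


if __name__ == "__main__":
    sample = ["abed", "bad", "aced", "cab"]
    # Print the longest words first, numbered within each group.
    for length, group in sorted(group_by_length(sample).items(), reverse=True):
        print(f"{length} Letter Words")
        for i, word in enumerate(group, start=1):
            print(f"    {i}. {word}")
```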

Cambridge Dictionary: English Definition Scraper

The code is as follows:

from urllib import request
from bs4 import BeautifulSoup

def getHTML(url):
    # The header value must not repeat the "User-Agent:" field name
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
    req = request.Request(url, headers=headers)
    return request.urlopen(req).read()

word4 = ['abed','aced','bade','bead','cade','dace']

# English definition scraper
for word in word4:
    soup = BeautifulSoup(getHTML("https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E/"+word), "html.parser")
    defs = soup.select('.def.ddef_d.db')
    if defs:  # skip words that have no dictionary entry
        print(word)
        print(defs[0].text.replace('\n',' '))

The output is shown in the figure below:
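The long `%E8%A9%9E...` parts of the URL are just the percent-encoded UTF-8 bytes of the Chinese path segments 詞典 and 英語. As a small sketch, the same URLs can be assembled with `urllib.parse.quote`, which keeps the source readable. The helper name `cambridge_url` and the `BASE` constant are hypothetical.

```python
from urllib.parse import quote

BASE = "https://dictionary.cambridge.org/zht"


def cambridge_url(word: str, dictionary: str = "英語") -> str:
    """Build a lookup URL by percent-encoding each path segment (hypothetical helper)."""
    path = "/".join(quote(part) for part in ("詞典", dictionary, word))
    return f"{BASE}/{path}"


if __name__ == "__main__":
    print(cambridge_url("abed"))                      # English definitions
    print(cambridge_url("abed", "英語-漢語-繁體"))      # English to Traditional Chinese
```

Quoting the word as well means that an input containing spaces or other reserved characters would still produce a valid path.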

Cambridge Dictionary: Chinese Definition Scraper

The code is as follows:

from urllib import request
from bs4 import BeautifulSoup

def getHTML(url):
    # The header value must not repeat the "User-Agent:" field name
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
    req = request.Request(url, headers=headers)
    return request.urlopen(req).read()

word4 = ['abed','aced','bade','bead','cade','dace']

# Chinese definition scraper
for word in word4:
    soup = BeautifulSoup(getHTML("https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E-%E6%BC%A2%E8%AA%9E-%E7%B9%81%E9%AB%94/"+word), "html.parser")
    bodies = soup.select('.def-body.ddef_b')
    # print only when the entry has both a definition and a translation body
    if soup.select('.def.ddef_d.db') and bodies:
        print(word+bodies[0].text)

The output is shown in the figure below:
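Text pulled out of HTML elements usually carries the page's line breaks and indentation, which is why the English scraper replaces `'\n'` before printing. A slightly more general option, shown here only as a sketch, is to collapse every run of whitespace into one space. `clean_text` is a hypothetical helper name.

```python
def clean_text(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_text("  lying\n in   bed \n"))
```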

Final Version

The final version has a few small optimizations: it adds fake-useragent and measures the total search time.

The code is as follows:

import requests
from bs4 import BeautifulSoup
from urllib import request
from fake_useragent import UserAgent
import time

ua = UserAgent()

inputword='terofs'#lyeirwa alocilg nsiore eseatt outgfh kidnr
wordfind = {
    'words':inputword
}
startTime = time.time()
headers = {'User-Agent': ua.random }
res=requests.post("https://wordfind.com/", data = wordfind, headers = headers)
soup = BeautifulSoup(res.text, "html.parser")  # name the parser explicitly
#----------------------------------------------------------------------------------------------------
def getWord(words,num):  # collect words with the given letter count from the Scrabble Word Finder page source
    for word in soup.select('.defLink'):
        if(len(word.select('a')[0].text) == num):
            words.append(word.select('a')[0].text)

def getHTML(url):
    headers = {'User-Agent': ua.random }
    req = request.Request(url, headers=headers)
    return request.urlopen(req).read()
def wordCheck(words,word_checked,wordMeaning):
    if(words):  # skip the lookups when no word of this length was found
        for word in words:
            # try the English to Traditional Chinese dictionary first
            soup = BeautifulSoup(getHTML("https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E-%E6%BC%A2%E8%AA%9E-%E7%B9%81%E9%AB%94/"+word), "html.parser")
            if(soup.select('.def.ddef_d.db')):
                word_checked.append(word)
                wordMeaning.append(soup.select('.def-body.ddef_b')[0].text)
            else:
                # no Chinese entry: fall back to the English dictionary
                soup = BeautifulSoup(getHTML("https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E/"+word), "html.parser")
                if(soup.select('.def.ddef_d.db')):
                    word_checked.append(word)
                    wordMeaning.append(soup.select('.def.ddef_d.db')[0].text.replace(':','.'))
def printWord(words):
    if(words):
        print(str(len(words[0]))+' Letter Words'+'( '+str(len(words))+' words Found )')
        for i, word in enumerate(words, start=1):
            print("    "+str(i)+'. '+word)
def printWordMeaning(words,WordMeaning):
    if(words):
        print('Word Meaning :')
        # zip keeps each word paired with its own meaning; list.index() would
        # return the first match and mislabel words that share identical text
        for i, (word, meaning) in enumerate(zip(words, WordMeaning), start=1):
            print(str(i)+'. '+word)
            print(meaning)
#----------------------------------------------------------------------------------------------------
def wordGetter(num,wordstmp,words,wordsMeaning):
    getWord(wordstmp,num)
    wordCheck(wordstmp,words,wordsMeaning)
    printWord(words)
    printWordMeaning(words,wordsMeaning)
#----------------------------------------------------------------------------------------------------

wordstmp = [[],[],[],[],[],[],[],[],[],[]]

words = [[],[],[],[],[],[],[],[],[],[]]

wordsMeaning = [[],[],[],[],[],[],[],[],[],[]]

print("From the letters you entered: '"+inputword+"' we can form the following words")

# index i holds the words with i+3 letters; go from the longest (12) down to the shortest (3)
for i in reversed(range(10)):
    wordGetter((i+3),wordstmp[i],words[i],wordsMeaning[i])

totalNum = 0
for i in range(0,10):
    totalNum+=len(words[i])
print(str(totalNum)+' words Found')
endTime = time.time()
elapsed = endTime - startTime
print("Total search time:", format(elapsed, '.2f'), 'seconds, about ' + format(elapsed / 60, '.2f') + ' minutes')
print(words)

An excerpt of the output is shown in the figure below:
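The lookup order in `wordCheck` (Chinese dictionary first, English dictionary as a fallback) can also be written as a loop over a list of lookup functions. The sketch below shows that pattern with stand-in lookups so it runs without network access. `first_definition` and `check_words` are hypothetical names, and the sample meanings are placeholders rather than real dictionary text.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A lookup takes a word and returns its meaning, or None when there is no entry.
Lookup = Callable[[str], Optional[str]]


def first_definition(word: str, lookups: Iterable[Lookup]) -> Optional[str]:
    """Try each lookup in order and return the first non-empty meaning."""
    for lookup in lookups:
        meaning = lookup(word)
        if meaning:
            return meaning
    return None


def check_words(words: Iterable[str], lookups: List[Lookup]) -> Dict[str, str]:
    """Map each word that has an entry to its meaning; words without one are dropped."""
    found = {}
    for word in words:
        meaning = first_definition(word, lookups)
        if meaning is not None:
            found[word] = meaning
    return found


if __name__ == "__main__":
    # Stand-in lookups (placeholder data); a real version would fetch and parse a page.
    zh = {"fort": "placeholder Chinese meaning"}
    en = {"fort": "placeholder English meaning", "rest": "placeholder English meaning"}
    print(check_words(["fort", "rest", "zzzz"], [zh.get, en.get]))
```

Storing the result as a word-to-meaning mapping also removes the need to keep two parallel lists in step.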

