2023 iThome Ironman Contest

Series: Happily Writing PHPUnit, part 12

Day 12. Refactoring the Crawler - Removing an Unnecessary Interface

In the previous post we added an HttpClient interface and then wrote a GuzzleHttpClient that implements it. A look at Guzzle's own Client, however, shows that it already implements two interfaces: \GuzzleHttp\ClientInterface and \Psr\Http\Client\ClientInterface.

https://ithelp.ithome.com.tw/upload/images/20230927/20065818SKj4F4ETBY.png
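
The same fact can be checked from code rather than by reading the source. The snippet below is a minimal sketch, assuming guzzlehttp/guzzle 7 is installed through Composer and that vendor/autoload.php sits next to the script (that path is an assumption, adjust it to your project).

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor next to this script.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client();

// Guzzle 7's Client declares both interfaces, so both checks should be true.
var_dump($client instanceof \GuzzleHttp\ClientInterface);
var_dump($client instanceof \Psr\Http\Client\ClientInterface);
```

If both lines print bool(true), anything type-hinted against either interface can receive a plain Guzzle Client.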

PttCrawler, meanwhile, is injected with the HttpClient interface we added ourselves. So this time we refactor in the opposite direction and remove the HttpClient interface.

First, we know that Guzzle's Client implements both \GuzzleHttp\ClientInterface and \Psr\Http\Client\ClientInterface. That means PttCrawler keeps working whether the injected HttpClient is replaced with \GuzzleHttp\ClientInterface or with \Psr\Http\Client\ClientInterface. So let's open both interfaces and see which public methods each one requires an implementation to provide.

\GuzzleHttp\ClientInterface

/**
 * Client interface for sending HTTP requests.
 */
interface ClientInterface
{
    /**
     * The Guzzle major version.
     */
    public const MAJOR_VERSION = 7;

    /**
     * Send an HTTP request.
     *
     * @param RequestInterface $request Request to send
     * @param array            $options Request options to apply to the given
     *                                  request and to the transfer.
     *
     * @throws GuzzleException
     */
    public function send(RequestInterface $request, array $options = []): ResponseInterface;

    /**
     * Asynchronously send an HTTP request.
     *
     * @param RequestInterface $request Request to send
     * @param array            $options Request options to apply to the given
     *                                  request and to the transfer.
     */
    public function sendAsync(RequestInterface $request, array $options = []): PromiseInterface;

    /**
     * Create and send an HTTP request.
     *
     * Use an absolute path to override the base path of the client, or a
     * relative path to append to the base path of the client. The URL can
     * contain the query string as well.
     *
     * @param string              $method  HTTP method.
     * @param string|UriInterface $uri     URI object or string.
     * @param array               $options Request options to apply.
     *
     * @throws GuzzleException
     */
    public function request(string $method, $uri, array $options = []): ResponseInterface;

    /**
     * Create and send an asynchronous HTTP request.
     *
     * Use an absolute path to override the base path of the client, or a
     * relative path to append to the base path of the client. The URL can
     * contain the query string as well. Use an array to provide a URL
     * template and additional variables to use in the URL template expansion.
     *
     * @param string              $method  HTTP method
     * @param string|UriInterface $uri     URI object or string.
     * @param array               $options Request options to apply.
     */
    public function requestAsync(string $method, $uri, array $options = []): PromiseInterface;

    /**
     * Get a client configuration option.
     *
     * These options include default request options of the client, a "handler"
     * (if utilized by the concrete client), and a "base_uri" if utilized by
     * the concrete client.
     *
     * @param string|null $option The config option to retrieve.
     *
     * @return mixed
     *
     * @deprecated ClientInterface::getConfig will be removed in guzzlehttp/guzzle:8.0.
     */
    public function getConfig(string $option = null);
}

\Psr\Http\Client\ClientInterface

<?php

namespace Psr\Http\Client;

use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

interface ClientInterface
{
    /**
     * Sends a PSR-7 request and returns a PSR-7 response.
     *
     * @param RequestInterface $request
     *
     * @return ResponseInterface
     *
     * @throws \Psr\Http\Client\ClientExceptionInterface If an error happens while processing the request.
     */
    public function sendRequest(RequestInterface $request): ResponseInterface;
}

Having looked at both interfaces, we go with \Psr\Http\Client\ClientInterface: it requires only one public method, so it is the easier one to implement (pick the soft persimmon, as the saying goes).
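
To make the "only one method" point concrete, here is a sketch of a hand-written client that satisfies \Psr\Http\Client\ClientInterface. The class name FixedResponseClient and the example.com URL are hypothetical and exist only for this illustration; the sketch assumes guzzlehttp/guzzle (which brings in psr/http-client and guzzlehttp/psr7) is installed through Composer.

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor next to this script.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Client\ClientInterface;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

// Hypothetical stub: always answers with the body it was constructed with.
final class FixedResponseClient implements ClientInterface
{
    public function __construct(private string $body)
    {
    }

    // The single method the PSR-18 interface requires.
    public function sendRequest(RequestInterface $request): ResponseInterface
    {
        return new Response(200, [], $this->body);
    }
}

$client = new FixedResponseClient('<html>stub</html>');
$response = $client->sendRequest(new Request('GET', 'https://example.com/'));

echo $response->getStatusCode(), ' ', (string) $response->getBody(), PHP_EOL;
```

A stub against \GuzzleHttp\ClientInterface would need five methods instead of one, which is the trade-off that motivates the choice here.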

Refactoring

We can now start by adjusting our test so that it reads:

<?php

namespace Recca0120\Ithome30\Tests;

use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Response;
use Mockery;
use PHPUnit\Framework\TestCase;
use Psr\Http\Client\ClientInterface;
use Recca0120\Ithome30\PttCrawler;

class PttCrawlerTest extends TestCase
{
    public function test_fetch_board_page()
    {
        /** @var Mockery\Mock|ClientInterface $httpClient */
        $httpClient = Mockery::mock(ClientInterface::class);
        $httpClient
            ->allows('sendRequest')
            ->andReturn(new Response(200, [], file_get_contents(__DIR__ . '/fixtures/ptt_home.html')));

        $crawler = new PttCrawler($httpClient);
        $records = $crawler->all();

        self::assertEquals([
            'name' => 'Gossiping',
            "nuser" => '12185',
            'class' => '綜合',
            'title' => '[八卦]不停重複今日公祭明日忘記',
        ], $records[0]);
    }

    public function test_guzzle_client()
    {
        $crawler = new PttCrawler(new Client());
        $records = $crawler->all();

        self::assertEquals([
            'name' => 'Gossiping',
            "nuser" => '12185',
            'class' => '綜合',
            'title' => '[八卦]不停重複今日公祭明日忘記',
        ], $records[0]);
    }
}

At this point VSCode should look like this:
https://ithelp.ithome.com.tw/upload/images/20230927/200658186UIrzr5WJu.png

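That is because we have not changed PttCrawler yet. Run the tests once now (test_fetch_board_page alone is enough) to get a red light first, then start changing PttCrawler. VSCode will then look like this: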
https://ithelp.ithome.com.tw/upload/images/20230927/20065818WspgzOnJd8.png

Following VSCode's hints, we go on to change PttCrawler to:

<?php
// src/PttCrawler.php

namespace Recca0120\Ithome30;

use GuzzleHttp\Psr7\Request;
use Psr\Http\Client\ClientInterface;

class PttCrawler
{
    public function __construct(private ClientInterface $httpClient)
    {
    }

    public function all()
    {
        $request = new Request('GET', 'https://www.ptt.cc/bbs/hotboards.html');
        $response = $this->httpClient->sendRequest($request);
        $html = (string) $response->getBody();

        return array_map(
            fn (string $row) => $this->parseCols($row),
            $this->parseRows($html)
        );
    }

    private function parseCols($row)
    {
        preg_match_all('/"board-(?<name>\w+)">(?<value>.+?)<\/div>/', $row, $matches);
        $cols = [];
        foreach (array_keys($matches[0]) as $index) {
            $name = $matches['name'][$index];
            $value = $matches['value'][$index];
            $cols[$name] = str_replace('◎', '', strip_tags($value));
        }

        return $cols;
    }

    private function parseRows($html)
    {
        preg_match_all('/<a\sclass="board"[^>]*>.+?<\/a>/s', $html, $matches);

        return $matches[0];
    }
}
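
The two regular expressions in parseRows and parseCols can also be tried in isolation, without any HTTP client. The sketch below copies them into standalone functions and runs them on a hand-written HTML fragment. That fragment is an assumption modelled on what the regexes expect and on the record asserted in the test; it is not copied from the real PTT page.

```php
<?php

// Hypothetical fragment: one board entry shaped the way the regexes expect.
$html = <<<'HTML'
<div class="b-ent">
    <a class="board" href="/bbs/Gossiping/index.html">
        <div class="board-name">Gossiping</div>
        <div class="board-nuser"><span class="hl f6">12185</span></div>
        <div class="board-class">綜合</div>
        <div class="board-title">◎[八卦]不停重複今日公祭明日忘記</div>
    </a>
</div>
HTML;

// Each <a class="board">...</a> element is one row; the s flag lets "." span newlines.
function parseRows(string $html): array
{
    preg_match_all('/<a\sclass="board"[^>]*>.+?<\/a>/s', $html, $matches);

    return $matches[0];
}

// Each <div class="board-xxx"> becomes a column keyed by "xxx".
function parseCols(string $row): array
{
    preg_match_all('/"board-(?<name>\w+)">(?<value>.+?)<\/div>/', $row, $matches);
    $cols = [];
    foreach (array_keys($matches[0]) as $index) {
        // Drop nested tags such as <span>, then the leading "◎" marker.
        $cols[$matches['name'][$index]] = str_replace('◎', '', strip_tags($matches['value'][$index]));
    }

    return $cols;
}

$records = array_map('parseCols', parseRows($html));
print_r($records[0]);
```

For this fragment the first record has the keys name, nuser, class and title, the same shape the test asserts against.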

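After this change, running the test again gives us a green light. To be safe, though, we should crawl the real page at least once, so next we run the second test (test_guzzle_client), which gives the following result: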

https://ithelp.ithome.com.tw/upload/images/20230928/200658188p9HYNQpeT.png

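This test comes out red, but that is fine: it exists only to check how the code behaves against the real site. Once it has run, add $this->markTestSkipped(); to it so that the test case is skipped.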
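
As a sketch of where that call goes, the live test might end up looking like this. The skip message is my own wording and only an assumption; the rest is the test from above, reduced to the one method and assuming the project's Composer autoloader and PHPUnit are in place.

```php
<?php
// tests/PttCrawlerTest.php, reduced to the live test for illustration

namespace Recca0120\Ithome30\Tests;

use GuzzleHttp\Client;
use PHPUnit\Framework\TestCase;
use Recca0120\Ithome30\PttCrawler;

class PttCrawlerTest extends TestCase
{
    public function test_guzzle_client()
    {
        // markTestSkipped() aborts the test here, so the lines below do not run
        // until this call is removed for a manual check against the live site.
        $this->markTestSkipped('Hits the live PTT site; remove this line to run it by hand.');

        $crawler = new PttCrawler(new Client());
        $records = $crawler->all();

        self::assertEquals([
            'name' => 'Gossiping',
            'nuser' => '12185',
            'class' => '綜合',
            'title' => '[八卦]不停重複今日公祭明日忘記',
        ], $records[0]);
    }
}
```

PHPUnit then reports the case as skipped rather than failed, so the suite stays green while the live check remains available.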

Finally, we can delete these files:

  • src/Contracts/HttpClient.php
  • src/GuzzleHttpClient.php
  • src/HttpClient.php

Doesn't that feel much better?

