2023 iThome Ironman Challenge

Happily Writing PHPUnit series, part 11

Day 11. Refactoring the Crawler with Guzzle - Extracting an Interface

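Our PttCrawler currently receives a hand-written HttpClient class through injection, but PHP has long had the well-established Guzzle library, so we can refactor the crawler to inject Guzzle instead. Today we start that refactoring by extracting an interface.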

Extract the interface and reimplement

<?php
// tests/PttCrawlerTest.php

namespace Recca0120\Ithome30\Tests;

use Mockery;
use PHPUnit\Framework\TestCase;
use Recca0120\Ithome30\PttCrawler;
use Recca0120\Ithome30\HttpClient;

class PttCrawlerTest extends TestCase
{
    public function test_fetch_board_page()
    {
        /** @var Mockery\Mock|HttpClient $httpClient */
        $httpClient = Mockery::mock(HttpClient::class);
        $httpClient
            ->allows('get')
            ->andReturn(file_get_contents(__DIR__ . '/fixtures/ptt_home.html'));

        $crawler = new PttCrawler($httpClient);
        $records = $crawler->all();

        self::assertEquals([
            'name' => 'Gossiping',
            "nuser" => '12185',
            'class' => '綜合',
            'title' => '[八卦]不停重複今日公祭明日忘記',
        ], $records[0]);
    }
}

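This is the code from last time. After opening the project, run the tests once first. Once we have a green bar, we can extract the HttpClient interface, and the main code becomes: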

<?php
// src/Contracts/HttpClient.php

namespace Recca0120\Ithome30\Contracts;

interface HttpClient
{
    public function get(): string;
}

<?php
// src/HttpClient.php

namespace Recca0120\Ithome30;

use Recca0120\Ithome30\Contracts\HttpClient as HttpClientContracts;

class HttpClient implements HttpClientContracts
{
    public function get(): string
    {
        return file_get_contents('https://www.ptt.cc/bbs/hotboards.html');
    }
}

<?php
// src/PttCrawler.php

namespace Recca0120\Ithome30;

// Add this line
use Recca0120\Ithome30\Contracts\HttpClient;

class PttCrawler
{
    public function __construct(private HttpClient $httpClient)
    {
    }

    public function all()
    {
        return array_map(
            fn (string $row) => $this->parseCols($row),
            $this->parseRows($this->httpClient->get())
        );
    }

    private function parseCols($row)
    {
        preg_match_all('/"board-(?<name>\w+)">(?<value>.+?)<\/div>/', $row, $matches);
        $cols = [];
        foreach (array_keys($matches[0]) as $index) {
            $name = $matches['name'][$index];
            $value = $matches['value'][$index];
            $cols[$name] = str_replace('◎', '', strip_tags($value));
        }

        return $cols;
    }

    private function parseRows($html)
    {
        preg_match_all('/<a\sclass="board"[^>]*>.+?<\/a>/s', $html, $matches);

        return $matches[0];
    }
}
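
To illustrate what the extracted interface buys us, here is a minimal, self-contained sketch. FixtureHttpClient and countBoards() are hypothetical names that do not appear in the project, and namespaces are left out so the file runs on its own. The point is only that code typed against the interface accepts any implementation, whether it is the file_get_contents version, a Guzzle version, or a canned stand-in like this one.

```php
<?php
// Self-contained sketch; FixtureHttpClient and countBoards() are hypothetical names.

interface HttpClient
{
    public function get(): string;
}

// A hand-written stand-in that returns canned HTML instead of calling PTT.
final class FixtureHttpClient implements HttpClient
{
    public function __construct(private string $html)
    {
    }

    public function get(): string
    {
        return $this->html;
    }
}

// Depends only on the interface, just like PttCrawler's constructor.
function countBoards(HttpClient $httpClient): int
{
    // Same row pattern as PttCrawler::parseRows(); returns the number of matches.
    return (int) preg_match_all('/<a\sclass="board"[^>]*>.+?<\/a>/s', $httpClient->get());
}

$html = '<a class="board" href="/bbs/Gossiping/index.html"><div class="board-name">Gossiping</div></a>'
    . '<a class="board" href="/bbs/C_Chat/index.html"><div class="board-name">C_Chat</div></a>';

echo countBoards(new FixtureHttpClient($html)), PHP_EOL; // prints 2
```

Mockery does the same job in the test with less code; the hand-written class just makes the substitution explicit.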

After the change, run the tests again. Once the bar is green again, adjust the test; here we only need to change a single line:

<?php

namespace Recca0120\Ithome30\Tests;

use Mockery;
use PHPUnit\Framework\TestCase;
use Recca0120\Ithome30\PttCrawler;
// Previously: use Recca0120\Ithome30\HttpClient;
use Recca0120\Ithome30\Contracts\HttpClient;

class PttCrawlerTest extends TestCase
{
    public function test_fetch_board_page()
    {
        /** @var Mockery\Mock|HttpClient $httpClient */
        $httpClient = Mockery::mock(HttpClient::class);
        $httpClient
            ->allows('get')
            ->andReturn(file_get_contents(__DIR__ . '/fixtures/ptt_home.html'));

        $crawler = new PttCrawler($httpClient);
        $records = $crawler->all();

        self::assertEquals([
            'name' => 'Gossiping',
            "nuser" => '12185',
            'class' => '綜合',
            'title' => '[八卦]不停重複今日公祭明日忘記',
        ], $records[0]);
    }
}

Once the change is made, run the tests again. Another green bar means every adjustment so far is correct, and all that remains is to implement a Guzzle-based HttpClient.

First, install Guzzle:

composer require guzzlehttp/guzzle

Then we can write the following code:

<?php
// src/GuzzleHttpClient.php

namespace Recca0120\Ithome30;

use GuzzleHttp\Client;
use Recca0120\Ithome30\Contracts\HttpClient;

class GuzzleHttpClient implements HttpClient
{
    public function get(): string
    {
        $client = new Client();
        $response = $client->get('https://www.ptt.cc/bbs/hotboards.html');

        return (string) $response->getBody();
    }
}
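
As an aside, Guzzle ships a MockHandler that replays queued responses without touching the network. The sketch below is not the article's class: InjectedGuzzleHttpClient is a hypothetical variant that takes the Guzzle Client through its constructor so a mocked client can be passed in. It also assumes the file sits in the project root next to Composer's vendor directory.

```php
<?php
// Sketch only: InjectedGuzzleHttpClient is a hypothetical variant, not the article's class.
// Assumes guzzlehttp/guzzle is installed and this file sits next to vendor/.

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

// In the project this would also implement Recca0120\Ithome30\Contracts\HttpClient.
final class InjectedGuzzleHttpClient
{
    public function __construct(private Client $client)
    {
    }

    public function get(): string
    {
        $response = $this->client->get('https://www.ptt.cc/bbs/hotboards.html');

        return (string) $response->getBody();
    }
}

// Queue one canned response; the handler returns it instead of sending a request.
$mock = new MockHandler([
    new Response(200, [], '<a class="board">canned page</a>'),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$httpClient = new InjectedGuzzleHttpClient($client);
$body = $httpClient->get();

echo $body, PHP_EOL;
```

Each queued response is consumed once, so a second call to get() would need a second Response in the queue.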

That completes our refactoring, but how do we prove the code works? We only need to add one more test case that injects the real GuzzleHttpClient and run the tests once:

<?php
// tests/PttCrawlerTest.php

namespace Recca0120\Ithome30\Tests;

use Mockery;
use PHPUnit\Framework\TestCase;
use Recca0120\Ithome30\PttCrawler;
use Recca0120\Ithome30\Contracts\HttpClient;
use Recca0120\Ithome30\GuzzleHttpClient;

class PttCrawlerTest extends TestCase
{
    public function test_fetch_board_page()
    {
        /** @var Mockery\Mock|HttpClient $httpClient */
        $httpClient = Mockery::mock(HttpClient::class);
        $httpClient
            ->allows('get')
            ->andReturn(file_get_contents(__DIR__ . '/fixtures/ptt_home.html'));

        $crawler = new PttCrawler($httpClient);
        $records = $crawler->all();

        self::assertEquals([
            'name' => 'Gossiping',
            "nuser" => '12185',
            'class' => '綜合',
            'title' => '[八卦]不停重複今日公祭明日忘記',
        ], $records[0]);
    }

    // Add this test case and run the tests
    public function test_guzzle_client()
    {
        $crawler = new PttCrawler(new GuzzleHttpClient());
        $records = $crawler->all();

        self::assertEquals([
            'name' => 'Gossiping',
            "nuser" => '12185',
            'class' => '綜合',
            'title' => '[八卦]不停重複今日公祭明日忘記',
        ], $records[0]);
    }
}

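Of course, this run is bound to end in a red bar, because live values such as the online user count no longer match the snapshot stored in the fixture.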

https://ithelp.ithome.com.tw/upload/images/20230926/20065818fmrb0xjfpM.png

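A red bar here is fine, though: we are only confirming that Guzzle runs correctly. Once that is confirmed, we mark this test as skipped. When Guzzle is upgraded and we need to check that the result is still consistent, we comment out the skip and can run the test again right away.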
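
One way to mark the test as skipped is PHPUnit's markTestSkipped(). The excerpt below is a sketch of how that could look in the test class from this article; the skip message is an assumption, and the mock-based test method is omitted for brevity. It needs the project's PHPUnit and Composer autoloading to run.

```php
<?php
// tests/PttCrawlerTest.php (excerpt; test_fetch_board_page() is omitted here)

namespace Recca0120\Ithome30\Tests;

use PHPUnit\Framework\TestCase;
use Recca0120\Ithome30\GuzzleHttpClient;
use Recca0120\Ithome30\PttCrawler;

class PttCrawlerTest extends TestCase
{
    public function test_guzzle_client()
    {
        // Comment out this line to run against the live site again.
        $this->markTestSkipped('Hits the live PTT site; enable manually after a Guzzle upgrade.');

        $crawler = new PttCrawler(new GuzzleHttpClient());
        $records = $crawler->all();

        self::assertEquals([
            'name' => 'Gossiping',
            'nuser' => '12185',
            'class' => '綜合',
            'title' => '[八卦]不停重複今日公祭明日忘記',
        ], $records[0]);
    }
}
```

PHPUnit reports the test as skipped rather than passed or failed, so it stays visible in the run summary as a reminder that it exists.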

