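PoolManager creates and manages one ConnectionPool per host. By default it keeps up to 10 pools (num_pools=10), and when the limit is reached the least recently used pool is discarded. If you need to talk to more hosts, raise num_pools. The trade-off is higher memory use and more open sockets.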
http = urllib3.PoolManager(num_pools=50)
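A minimal sketch of the LRU behaviour, assuming urllib3 is installed. It uses `connection_from_host()` to create pools without sending any request, and it reads the manager's internal `pools` container, which is an implementation detail rather than a documented API. The host names are placeholders.

```python
import urllib3

# Keep at most 2 per-host pools; the least recently used one is evicted.
http = urllib3.PoolManager(num_pools=2)
for host in ["example.com", "example.org", "example.net"]:
    # Creates (or reuses) a ConnectionPool for the host; no socket is opened yet.
    http.connection_from_host(host, port=80, scheme="http")

# http.pools is urllib3's internal RecentlyUsedContainer keyed by PoolKey.
hosts = sorted(key.key_host for key in http.pools.keys())
print(len(http.pools), hosts)  # 2 ['example.net', 'example.org']
```

example.com was used first, so it is the pool that got dropped when the third host arrived.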
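A ConnectionPool manages a set of HTTPConnection objects. When a request finishes, its HTTPConnection is returned to the pool. By default a pool keeps only 1 connection (maxsize=1). If you need to send several requests to the same host at the same time, raise maxsize: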
http = urllib3.PoolManager(maxsize=10)
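To check that PoolManager forwards `maxsize` to each per-host pool, you can inspect a pool directly. This is a sketch under one assumption: `pool.pool` is urllib3's internal `LifoQueue` of idle connections, not a public API. No network traffic is involved.

```python
import urllib3

# maxsize (and block) are passed on to every ConnectionPool the manager creates.
http = urllib3.PoolManager(maxsize=10)
pool = http.connection_from_host("example.com", port=80, scheme="http")

# A manager with default settings, for comparison.
default_pool = urllib3.PoolManager().connection_from_host("example.com", port=80, scheme="http")

# pool.pool is the internal queue holding reusable connections.
print(type(pool).__name__, pool.pool.maxsize, pool.block)  # HTTPConnectionPool 10 False
print(default_pool.pool.maxsize)                           # 1
```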
maxsize – Number of connections to save that can be reused.
More than 1 is useful in multithreaded situations. If block is set to False, more connections will be created but they will not be saved once they’ve been used.
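Setting only maxsize caps the number of connections kept for reuse. If more concurrent requests are needed beyond that number, new connections are still created, but they are discarded after use instead of being returned to the pool.
To enforce the limit, set block=True. The pool then never opens more than maxsize connections, and any extra request waits until a connection is returned.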
import concurrent.futures
import urllib3
URLS = [
    'https://www.youtube.com/results?search_query=taiwan',
    'https://www.youtube.com/results?search_query=news',
    'https://www.youtube.com/results?search_query=weather',
    'https://www.youtube.com/results?search_query=mayday',
    'https://www.youtube.com/results?search_query=serious',
    'https://www.youtube.com/results?search_query=serious+music',
    'https://www.youtube.com/results?search_query=taiwan+weather',
    'https://www.youtube.com/results?search_query=power',
    'https://www.youtube.com/results?search_query=giant+human',
    'https://www.youtube.com/results?search_query=joker'
]
def youtube_it(http, url):
    r = http.request('GET', url)
    return r.data.decode('utf-8')
def query(http):
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        future_to_url = {executor.submit(youtube_it, http, url): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page length is %d' % (url, len(data)))
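# 20 worker threads, but block=True caps the pool at 5 connections to youtube.com,
# so at most 5 requests are in flight at once and the other threads wait
http = urllib3.PoolManager(maxsize=5, block=True)
query(http)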
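To see the difference `block` makes without depending on YouTube, here is a sketch against a local test server built on Python's `http.server`. The slow handler and the numbers (6 threads, maxsize=2, a 0.2 s delay) are illustrative assumptions. `num_connections` is the pool's counter of connection objects it has created. The exact count for `block=False` depends on thread scheduling, so only the `block=True` bound is guaranteed.

```python
import concurrent.futures
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3


class SlowHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so connections can be reused

    def do_GET(self):
        time.sleep(0.2)  # hold the connection so concurrent requests overlap
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def connections_opened(port, block):
    """Send 6 concurrent requests through a pool with maxsize=2; count connections created."""
    pool = urllib3.HTTPConnectionPool("127.0.0.1", port=port, maxsize=2, block=block)
    with concurrent.futures.ThreadPoolExecutor(max_workers=6) as executor:
        statuses = list(executor.map(lambda _: pool.request("GET", "/").status, range(6)))
    assert statuses == [200] * 6
    opened = pool.num_connections
    pool.close()
    return opened


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

results = {block: connections_opened(port, block) for block in (False, True)}
server.shutdown()
server.server_close()
# block=True never exceeds maxsize; block=False usually opens more
# (and logs "Connection pool is full, discarding connection").
print(results)
```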
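When dealing with large responses, it's often better to stream the response content instead of loading it into memory all at once.
To do that, pass preload_content=False and read the body in chunks with stream().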
import urllib3
http = urllib3.PoolManager()
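def stream_download(http, url):
    # preload_content=False: only the headers are read up front; the body stays on the socket
    r = http.request('GET', url, preload_content=False)
    with open('unsplash.jpg', 'wb') as f:
        print('headers: {}'.format(r.headers))
        # read the body 32 bytes at a time (tiny on purpose, to make the chunks visible)
        for chunk in r.stream(32):
            print('length: {}'.format(len(chunk)))
            print('data: {}'.format(chunk))
            f.write(chunk)
    # hand the connection back to the pool once the body has been consumed
    r.release_conn()
def non_stream_download(http, url):
    # default preload_content=True: the whole body is read into r.data at once
    r = http.request('GET', url)
    with open('unsplash.jpg', 'wb') as f:
        print('headers: {}'.format(r.headers))
        print('length: {}'.format(len(r.data)))
        f.write(r.data)
url = 'https://images.unsplash.com/photo-1570475754561-4effe71c5084?ixlib=rb-1.2.1&q=85&fm=jpg&crop=entropy&cs=srgb&dl=pawel-czerwinski-IXgSpDrxsgM-unsplash.jpg'
non_stream_download(http, url)
# call stream_download(http, url) instead to compare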
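Here is a self-contained sketch of the same streaming pattern that can run offline. It serves a hypothetical 100 kB payload from a local `http.server` instance instead of Unsplash. The helper name `stream_to_file` and the 8192-byte chunk size are illustrative choices, not part of urllib3. `release_conn()` sits in a `finally` block so the connection goes back to the pool even if writing fails.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3

PAYLOAD = os.urandom(100_000)  # hypothetical "large" response body


class BlobHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # silence per-request logging


def stream_to_file(http, url, path, chunk_size=8192):
    """Write the response body to path chunk by chunk; return the number of chunks."""
    r = http.request("GET", url, preload_content=False)
    chunks = 0
    try:
        with open(path, "wb") as f:
            for chunk in r.stream(chunk_size):
                f.write(chunk)
                chunks += 1
    finally:
        r.release_conn()  # return the connection to the pool
    return chunks


server = ThreadingHTTPServer(("127.0.0.1", 0), BlobHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/blob" % server.server_address[1]

path = os.path.join(tempfile.mkdtemp(), "blob.bin")
chunks = stream_to_file(urllib3.PoolManager(), url, path)
server.shutdown()
server.server_close()

with open(path, "rb") as f:
    written = f.read()
print(chunks, written == PAYLOAD)
```

With this pattern, memory use is bounded by `chunk_size` rather than by the size of the response.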