The PoolManager creates and manages one ConnectionPool per host. By default it keeps up to 10 pools; if you need to reach more hosts than that, raise num_pools. The trade-off to keep in mind is memory and socket consumption: every extra pool holds its own connections.
import urllib3

http = urllib3.PoolManager(num_pools=50)
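Pools are keyed by scheme, host, and port, and you can observe the per-host behaviour directly. A minimal sketch continuing with the http manager above; connection_from_url is the PoolManager method that looks up (or creates) the pool for a URL, and example.com / example.org are placeholder hosts:

# Same host -> the same ConnectionPool object comes back.
pool_a = http.connection_from_url('https://example.com/a')
pool_b = http.connection_from_url('https://example.com/b')
print(pool_a is pool_b)   # True

# A different host gets its own pool, counted against num_pools.
pool_c = http.connection_from_url('https://example.org/')
print(pool_a is pool_c)   # False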
A ConnectionPool manages a group of HTTPConnection objects; when a request completes, its HTTPConnection goes back into the pool. The default pool size is 1, so if you need to issue multiple concurrent requests to the same host, adjust it via maxsize:
http = urllib3.PoolManager(maxsize=10)
From the urllib3 documentation:

maxsize – Number of connections to save that can be reused. More than 1 is useful in multithreaded situations. If block is set to False, more connections will be created but they will not be saved once they’ve been used.
Setting maxsize alone only caps how many connections are saved for reuse; if more requests need to go out than that number, new connections are still created (they just aren't kept afterwards). To restrict that behaviour, use block=True, which limits the connection count per host to maxsize:
import concurrent.futures
import urllib3

URLS = [
    'https://www.youtube.com/results?search_query=taiwan',
    'https://www.youtube.com/results?search_query=news',
    'https://www.youtube.com/results?search_query=weather',
    'https://www.youtube.com/results?search_query=mayday',
    'https://www.youtube.com/results?search_query=serious',
    'https://www.youtube.com/results?search_query=serious+music',
    'https://www.youtube.com/results?search_query=taiwan+weather',
    'https://www.youtube.com/results?search_query=power',
    'https://www.youtube.com/results?search_query=giant+human',
    'https://www.youtube.com/results?search_query=joker'
]

def youtube_it(http, url):
    # Fetch one page through the shared PoolManager.
    r = http.request('GET', url)
    return r.data.decode('utf-8')

def query(http):
    # All 20 worker threads share the same PoolManager instance.
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        future_to_url = {executor.submit(youtube_it, http, url): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page length is %d' % (url, len(data)))

# block=True bounds the connections to each host at maxsize=5,
# even though up to 20 threads issue requests concurrently.
http = urllib3.PoolManager(maxsize=5, block=True)
query(http)
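If you want to confirm that block=True really caps connection creation, you can inspect the pool after the run. A hedged sketch: num_connections is an internal counter on HTTPConnectionPool in urllib3 1.x, not a stable public API, so treat it as a debugging aid only:

# After query(http) has finished:
pool = http.connection_from_url('https://www.youtube.com')
# With block=True and maxsize=5, this should typically print at most 5.
print(pool.num_connections)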
When dealing with large responses, it is often better to stream the response content by passing preload_content=False:
import urllib3

http = urllib3.PoolManager()

def stream_download(http, url):
    # preload_content=False defers reading the body until we ask for it.
    r = http.request('GET', url, preload_content=False)
    with open('unsplash.jpg', 'wb') as f:
        print('headers: {}'.format(r.headers))
        # The 32-byte chunk size is deliberately tiny so each chunk is visible.
        for chunk in r.stream(32):
            print('length: {}'.format(len(chunk)))
            print('data: {}'.format(chunk))
            f.write(chunk)
    # Return the connection to the pool once the body is fully consumed.
    r.release_conn()

def non_stream_download(http, url):
    # Without preload_content=False, r.data holds the entire body in memory.
    r = http.request('GET', url)
    with open('unsplash.jpg', 'wb') as f:
        print('headers: {}'.format(r.headers))
        print('length: {}'.format(len(r.data)))
        f.write(r.data)

url = 'https://images.unsplash.com/photo-1570475754561-4effe71c5084?ixlib=rb-1.2.1&q=85&fm=jpg&crop=entropy&cs=srgb&dl=pawel-czerwinski-IXgSpDrxsgM-unsplash.jpg'
stream_download(http, url)
non_stream_download(http, url)
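Because an un-preloaded HTTPResponse behaves like a file object (it implements read()), a more compact variant is to hand it straight to shutil.copyfileobj. A minimal sketch of that alternative; copy_download is a helper name made up for this example:

import shutil
import urllib3

http = urllib3.PoolManager()

def copy_download(http, url, path):
    # preload_content=False defers the body; copyfileobj pulls it in chunks.
    r = http.request('GET', url, preload_content=False)
    try:
        with open(path, 'wb') as f:
            shutil.copyfileobj(r, f)
    finally:
        r.release_conn()   # always return the connection to the pool

copy_download(http, url, 'unsplash.jpg')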