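Today I had planned to start fetching per-stock kbar data and processing it. While cleaning the Contract data, though, I noticed that the TSE+OTC stock Contract data I extracted was inconsistent: the same program sometimes returned 592 rows and sometimes 294. The gap was too large to ignore, so I went back to the official documentation: https://sinotrade.github.io/tutor/contract/
Contract data currently falls into four categories: Index, Stock, Future, and Option. When login() is called and succeeds, shioaji starts downloading and initializing the Contract data. The contracts_cb parameter takes a function, and that function is called for each of the four categories as its Contract data finishes downloading and initializing.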
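To make the callback flow concrete, here is a minimal sketch that does not use shioaji at all. The names fake_login and CATEGORIES are hypothetical stand-ins I made up for illustration; the only point is that the "login" returns right away while a background thread calls the callback once per finished category.

```python
import threading

# Hypothetical stand-ins for the four contract categories (not shioaji objects).
CATEGORIES = ["Index", "Future", "Stock", "Option"]


def fake_login(contracts_cb):
    """Simulated login: returns immediately and 'downloads' in a background thread."""
    def download():
        for category in CATEGORIES:
            contracts_cb(category)  # called once for each finished category

    worker = threading.Thread(target=download)
    worker.start()
    return worker


received = []
worker = fake_login(contracts_cb=received.append)
worker.join()  # only for this demo, so the list is complete before printing
print(received)  # ['Index', 'Future', 'Stock', 'Option']
```

In the real library the order and timing of the callbacks depend on the download, so code that runs right after login cannot assume any category is ready yet.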
Here I modify the program from the earlier post, Day 18 - 取得所有Contract (getting all Contracts), to see what is actually going wrong. The modified program is as follows:
import os
from dotenv import load_dotenv
import shioaji as sj
import pandas as pd
load_dotenv('D:\\python\\shioaji\\.env')  # load the environment variables from .env
api = sj.Shioaji()
api.login(
    person_id=os.getenv('YOUR_PERSON_ID'),
    passwd=os.getenv('YOUR_PASSWORD'),
    contracts_cb=print  # use print as the callback, so each finished category is written to the console
)
print('api.login is done...')  # write the current step to the console
stock_list = []
for exchange in api.Contracts.Stocks:
    for stock in exchange:
        stock_list.append({**stock})
print('for loop is done...')  # write the current step to the console
print(f'len(stock_list) is :{len(stock_list)}')
df = pd.DataFrame(stock_list)
df.to_csv('stock_list.csv', index=False, encoding="utf_8_sig")
print('df.to_csv is done...')  # write the current step to the console
api.logout()
Output:
Response Code: 0 | Event Code: 0 | Info: host '203.66.91.161:80', hostname '203.66.91.161:80' IP 203.66.91.161:80 (host 1 of 1) (host connection attempt 1 of 1) (total connection attempt 1 of 1) | Event: Session up
SecurityType.Index
SecurityType.Future
api.login is done...
for loop is done...
len(stock_list) is :2773
df.to_csv is done...
SecurityType.Stock
SecurityType.Option
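The length of stock_list is 2773. However, the "SecurityType.Stock" message, which is printed when the Contracts.Stocks data finishes downloading, appears only after the for loop has finished and the CSV file has been written. In other words, the Contracts.Stocks data we read was incomplete.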
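The same kind of race can be reproduced without shioaji. The sketch below is only a simulation under my own assumptions: store is a plain list standing in for api.Contracts.Stocks, and two Event objects force the timing so the result is repeatable. It shows that reading a container while a background thread is still filling it gives a partial result.

```python
import threading

store = []                          # stands in for api.Contracts.Stocks
first_part_done = threading.Event()
release = threading.Event()


def download():
    store.extend(range(3))          # part of the data has arrived
    first_part_done.set()
    release.wait()                  # simulates the remaining download time
    store.extend(range(3, 6))       # the rest arrives later


worker = threading.Thread(target=download)
worker.start()

first_part_done.wait()
early_count = len(list(store))      # reading too early sees only part of the data
release.set()
worker.join()
final_count = len(store)            # reading after completion sees everything
print(early_count, final_count)     # 3 6
```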
Next, the shioaji program above is modified slightly. The revised version is as follows:
import os
from dotenv import load_dotenv
import shioaji as sj
import pandas as pd
from shioaji.constant import SecurityType  # import the SecurityType constants
import threading  # import the threading module
load_dotenv('D:\\python\\shioaji\\.env')  # load the environment variables from .env
event = threading.Event()  # create the event
api = sj.Shioaji()
def my_cb(security_type):
    print(repr(security_type))
    # when Contracts.Stocks has finished downloading, print a message and call event.set()
    if security_type == SecurityType.Stock:
        print('Contracts.Stock is fetch..')
        event.set()  # let the waiting main program continue
api.login(
    person_id=os.getenv('YOUR_PERSON_ID'),
    passwd=os.getenv('YOUR_PASSWORD'),
    contracts_cb=my_cb  # use my_cb as the callback
)
print('api.login is done...')
event.wait()  # after api.login returns, wait here until the event is set
stock_list = []
print('start for loop...')
for exchange in api.Contracts.Stocks:
    for stock in exchange:
        stock_list.append({**stock})
print('for loop is done...')
print(f'len(stock_list) is :{len(stock_list)}')
df = pd.DataFrame(stock_list)
print(len(df))
print(len(stock_list))
df.to_csv('stock_list.csv', index=False, encoding="utf_8_sig")
print('df.to_csv is done...')
api.logout()
Output of the revised program:
Response Code: 0 | Event Code: 0 | Info: host '203.66.91.161:80', hostname '203.66.91.161:80' IP 203.66.91.161:80 (host 1 of 1) (host connection attempt 1 of 1) (total connection attempt 1 of 1) | Event: Session up
<SecurityType.Index: 'IND'>
<SecurityType.Future: 'FUT'>
api.login is done...
<SecurityType.Stock: 'STK'>
Contracts.Stock is fetch..
start for loop...
<SecurityType.Option: 'OPT'>
for loop is done...
len(stock_list) is :32773
32773
32773
df.to_csv is done...
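After api.login returns, event.wait() makes the main program wait. When Contracts.Stocks finishes downloading, the callback calls event.set() and the waiting program resumes. The length of stock_list rises to 32773, and the output shows that the for loop now starts only after Contracts.Stock has finished downloading, so the extracted data is complete.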
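One thing to watch with a bare event.wait() is that it waits forever if the callback never fires. As a small sketch (again without shioaji, using the plain string "STK" as a stand-in for SecurityType.Stock and a hypothetical helper wait_for_stock), Event.wait accepts a timeout and returns False when the time runs out, so the program can decide what to do instead of hanging.

```python
import threading

stock_ready = threading.Event()


def my_cb(security_type):
    # "STK" is a plain-string stand-in for SecurityType.Stock in this sketch
    if security_type == "STK":
        stock_ready.set()


def wait_for_stock(timeout_seconds):
    """Return True if the stock callback fired within the timeout, else False."""
    return stock_ready.wait(timeout=timeout_seconds)


# Nothing has called my_cb yet, so this wait times out.
timed_out_result = wait_for_stock(0.05)

# Simulate the callback arriving shortly afterwards from another thread.
threading.Timer(0.05, my_cb, args=("STK",)).start()
ok_result = wait_for_stock(5)

print(timed_out_result, ok_result)  # False True
```

In a real script, a False result would be the place to log a warning or stop, rather than continuing with incomplete Contracts.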
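Besides using event.wait() and event.set() to confirm that the Contracts we want have been fully downloaded, I found in testing that print can serve the same purpose. The following program uses print for the check: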
import os
from dotenv import load_dotenv
import shioaji as sj
import pandas as pd
from shioaji.constant import SecurityType
load_dotenv('D:\\python\\shioaji\\.env')  # load the environment variables from .env
api = sj.Shioaji()
def my_cb(security_type):
    print(repr(security_type))
    if security_type == SecurityType.Stock:
        print('Contracts.Stock is fetch..')
api.login(
    person_id=os.getenv('YOUR_PERSON_ID'),
    passwd=os.getenv('YOUR_PASSWORD'),
    contracts_cb=my_cb
)
print('api.login is done...')
print(api.Contracts.Stocks)  # write api.Contracts.Stocks to the console
stock_list = []
print('start for loop...')
for exchange in api.Contracts.Stocks:
    for stock in exchange:
        stock_list.append({**stock})
print('for loop is done...')
print(f'len(stock_list) is :{len(stock_list)}')
df = pd.DataFrame(stock_list)
df.to_csv('stock_list.csv', index=False, encoding="utf_8_sig")
print('df.to_csv is done...')
api.logout()
Output:
Response Code: 0 | Event Code: 0 | Info: host '203.66.91.161:80', hostname '203.66.91.161:80' IP 203.66.91.161:80 (host 1 of 1) (host connection attempt 1 of 1) (total connection attempt 1 of 1) | Event: Session up
<SecurityType.Index: 'IND'>
<SecurityType.Future: 'FUT'>
api.login is done...
<SecurityType.Stock: 'STK'>
Contracts.Stock is fetch..
start for loop...
<SecurityType.Option: 'OPT'>
for loop is done...
len(stock_list) is :32773
32773
32773
df.to_csv is done...
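Even though this program does not call event.wait(), the call to print(api.Contracts.Stocks) blocks until the Contracts.Stocks data has finished downloading, and only then does the rest of the program run. The resulting stock_list length is the same as in the previous program.
Because this program only needs the data in Contracts.Stocks, it only calls print(api.Contracts.Stocks) as the safeguard. If you need to make sure all four categories have finished downloading, use print(api.Contracts) instead; the program then continues only after the Option data has finished downloading.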
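If you would rather not rely on print for this, the Event approach from earlier can also be extended to all four categories. The following is a sketch under assumptions, not shioaji code: the strings "IND", "STK", "FUT" and "OPT" stand in for the SecurityType members shown in the output above, and simulate() plays the role of the background download. The callback records each finished category and sets the event only when all four have arrived.

```python
import threading

# Plain-string stand-ins for the four SecurityType members (assumption for this sketch).
REQUIRED = {"IND", "STK", "FUT", "OPT"}

seen = set()
lock = threading.Lock()
all_ready = threading.Event()


def my_cb(security_type):
    """Record a finished category; signal once every required one has arrived."""
    with lock:
        seen.add(security_type)
        if REQUIRED <= seen:
            all_ready.set()


def simulate():
    # Simulated download order; the real order is decided by the library.
    for code in ["IND", "FUT", "STK", "OPT"]:
        my_cb(code)


worker = threading.Thread(target=simulate)
worker.start()
finished = all_ready.wait(timeout=5)  # False would mean something never arrived
worker.join()
print(finished, sorted(seen))  # True ['FUT', 'IND', 'OPT', 'STK']
```

With shioaji, the same my_cb could be passed as contracts_cb, with REQUIRED built from the SecurityType constants instead of strings.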