Why do I keep writing crawlers? Because I haven't touched program trading in a long time and rarely trade the market anymore (aside from still watching 57金錢爆 every day, though the show hasn't been very accurate lately). Right now the plan is simply to stay short, resist the urge to cover, and add to the short on every bounce. Over the past few months I was disappointed enough to turn to program trading in search of another approach, but the recent heavily weighted short position has restored my confidence. If the motivation still hadn't come back, I probably wouldn't be able to finish these 30 days...
"證券代號",
"證券名稱",
"外資買進股數",
"外資賣出股數",
"外資買賣超股數",
"投信買進股數",
"投信賣出股數",
"投信買賣超股數",
"自營商買賣超股數",
"自營商買進股數(自行買賣)",
"自營商賣出股數(自行買賣)",
"自營商買賣超股數(自行買賣)",
"自營商買進股數(避險)",
"自營商賣出股數(避險)",
"自營商買賣超股數(避險)",
"三大法人買賣超股數"
"證券代號",
"證券名稱",
"外陸資買進股數(不含外資自營商)",
"外陸資賣出股數(不含外資自營商)",
"外陸資買賣超股數(不含外資自營商)",
"外資自營商買進股數",
"外資自營商賣出股數",
"外資自營商買賣超股數",
"投信買進股數",
"投信賣出股數",
"投信買賣超股數",
"自營商買賣超股數",(比上櫃多的)
"自營商買進股數(自行買賣)",
"自營商賣出股數(自行買賣)",
"自營商買賣超股數(自行買賣)",
"自營商買進股數(避險)",
"自營商賣出股數(避險)",
"自營商買賣超股數(避險)",
"三大法人買賣超股數"
"證券代號",
"證券名稱",
"外陸資買進股數(不含外資自營商)",
"外陸資賣出股數(不含外資自營商)",
"外陸資買賣超股數(不含外資自營商)",
"外資自營商買進股數",
"外資自營商賣出股數",
"外資自營商買賣超股數",
"外資及陸資買進股數",
"外資及陸資賣出股數",
"外資及陸資買賣超股數",
"投信買進股數",
"投信賣出股數",
"投信買賣超股數",
"自營商買進股數(自行買賣)",
"自營商賣出股數(自行買賣)",
"自營商買賣超股數(自行買賣)",
"自營商買進股數(避險)",
"自營商賣出股數(避險)",
"自營商買賣超股數(避險)",
"三大法人買賣超股數"
There isn't much else to introduce: once the URLs are worked out, what's left is the grunt work of cleaning the data, which is then stored alongside the daily price/volume records.
TWSE_URL = 'http://www.twse.com.tw/fund/T86?response=json&date={y}{m:02d}{d:02d}&selectType=ALL'
TPEX_URL = 'http://www.tpex.org.tw/web/stock/3insti/daily_trade/3itrade_hedge_result.php?l=zh-tw&se=AL&t=D&d={y}/{m:02d}/{d:02d}'
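Note that the two templates expect different date formats: TWSE takes a Gregorian date packed as YYYYMMDD, while TPEX wants the ROC year (Gregorian year minus 1911) with slashes. A minimal sketch for sanity-checking both endpoints by hand, assuming requests is installed (fetch_raw is a hypothetical helper, not part of the spider):

import requests

TWSE_URL = 'http://www.twse.com.tw/fund/T86?response=json&date={y}{m:02d}{d:02d}&selectType=ALL'
TPEX_URL = 'http://www.tpex.org.tw/web/stock/3insti/daily_trade/3itrade_hedge_result.php?l=zh-tw&se=AL&t=D&d={y}/{m:02d}/{d:02d}'

def fetch_raw(year, month, day):
    """Fetch one day's raw rows from both exchanges (hypothetical helper)."""
    twse = requests.get(TWSE_URL.format(y=year, m=month, d=day)).json()
    # TPEX uses the ROC calendar: Gregorian year minus 1911.
    tpex = requests.get(TPEX_URL.format(y=year - 1911, m=month, d=day)).json()
    # TWSE rows live under 'data', TPEX rows under 'aaData' (see parse() below).
    return twse.get('data', []), tpex.get('aaData', [])

if __name__ == '__main__':
    twse_rows, tpex_rows = fetch_raw(2018, 10, 3)
    print(len(twse_rows), len(tpex_rows))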
The crawled result looks like the record below; margin trading (融資券) and securities lending (借券) data still have to be merged into the same table later:
{'_id': '2018/10/03_0050',
'市場別': '上市',
'產業別': '',
'name': '元大台灣50',
'code': '0050',
'date': '2018/10/03',
'成交股數': 4369,
'成交金額': 375778,
'開盤價': 86.05,
'最高價': 86.3,
'最低價': 85.8,
'收盤價': 85.95,
'成交筆數': 1554,
'三大法人買賣超': 1120.0,
'外資自營商買賣超': 0.0,
'外資自營商買進': 0.0,
'外資自營商賣出': 0.0,
'外陸資買賣超': 1609.0,
'外陸資買進': 1611.0,
'外陸資賣出': 2.0,
'投信買賣超': -725.0,
'投信買進': 0.0,
'投信賣出': 725.0,
'自營商買賣超': -747.0,
'自營商買賣超避險': 983.0,
'自營商買進': 154.0,
'自營商買進避險': 1215.0,
'自營商賣出': 901.0,
'自營商賣出避險': 232.0}
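Since every record is one stock on one day keyed by _id, quick sanity checks are easy with pandas before any merging happens. A small sketch: the 0050 numbers are copied from the record above (a subset of its fields), while the 0056 row is invented purely for illustration:

import pandas as pd

records = [
    # copied from the sample record above (institutional columns only)
    {'_id': '2018/10/03_0050', 'code': '0050', 'date': '2018/10/03',
     '外陸資買賣超': 1609.0, '投信買賣超': -725.0,
     '自營商買賣超': -747.0, '三大法人買賣超': 1120.0},
    # invented row, purely for illustration
    {'_id': '2018/10/03_0056', 'code': '0056', 'date': '2018/10/03',
     '外陸資買賣超': -120.0, '投信買賣超': 30.0,
     '自營商買賣超': 15.0, '三大法人買賣超': -75.0},
]

# Rank the day's stocks by total institutional net buy/sell (in lots/張).
df = pd.DataFrame(records).set_index('_id')
print(df.sort_values('三大法人買賣超', ascending=False))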
# -*- coding: utf-8 -*-
import json
import time
from datetime import datetime
import pandas as pd
import scrapy
TWSE_URL = 'http://www.twse.com.tw/fund/T86?response=json&date={y}{m:02d}{d:02d}&selectType=ALL'
TPEX_URL = 'http://www.tpex.org.tw/web/stock/3insti/daily_trade/3itrade_hedge_result.php?l=zh-tw&se=AL&t=D&d={y}/{m:02d}/{d:02d}'
columns = ["_id",
"外陸資買進",
"外陸資賣出",
"外陸資買賣超",
"外資自營商買進",
"外資自營商賣出",
"外資自營商買賣超",
"投信買進",
"投信賣出",
"投信買賣超",
"自營商買進",
"自營商賣出",
"自營商買賣超",
"自營商買進避險",
"自營商賣出避險",
"自營商買賣超避險",
"三大法人買賣超"]
def parse_info(d, m):
    """Normalize one raw row into a dict keyed by the unified columns."""
    # Build the document _id from the date and stock code, e.g. '2018/10/03_0050'.
    _id = m['date'] + '_' + d[0]
    if m['市場別'] == '上市':
        d.pop(11)    # drop the aggregate 自營商買賣超股數 (the TWSE-only column)
        d = d[2:]    # drop 證券代號 and 證券名稱
    else:
        del d[8:11]  # drop the three aggregate 外資及陸資 columns (TPEX-only)
        d = d[2:-1]  # drop 證券代號 / 證券名稱 and the trailing field
    # Strip thousands separators and convert shares to lots (張).
    d = [int(x.replace(',', '')) / 1000 for x in d]
    return dict(zip(columns, [_id, *d]))
class StockDaySpider(scrapy.Spider):
    name = 'stock_investor'
    custom_settings = {
        'DOWNLOAD_DELAY': 1,
        'CONCURRENT_REQUESTS': 1,
        'MONGODB_COLLECTION': 'stock_day',  # same collection as the daily price/volume data
        'MONGODB_ITEM_CACHE': 1,
        'MONGODB_HAS_ID_FIELD': True,       # documents provide their own _id
        'COOKIES_ENABLED': False
    }

    def __init__(self, beginDate=None, endDate=None, *args, **kwargs):
        super(StockDaySpider, self).__init__(beginDate=beginDate, endDate=endDate, *args, **kwargs)

    def start_requests(self):
        # Crawl the range passed via -a beginDate / -a endDate, or just today.
        if self.beginDate and self.endDate:
            start = self.beginDate
            end = self.endDate
        else:
            start = end = datetime.today().strftime("%Y-%m-%d")
        for date in pd.date_range(start, end)[::-1]:  # newest day first
            today = '{}/{:02d}/{:02d}'.format(date.year, date.month, date.day)
            y, m, d = date.year, date.month, date.day
            url = TWSE_URL.format(y=y, m=m, d=d)
            time.sleep(8)  # extra throttle on top of DOWNLOAD_DELAY
            yield scrapy.Request(url, meta={'date': today, '市場別': '上市'})
            y = y - 1911   # TPEX dates use the ROC calendar
            url = TPEX_URL.format(y=y, m=m, d=d)
            yield scrapy.Request(url, meta={'date': today, '市場別': '上櫃'})

    def parse(self, response):
        m = response.meta
        json_data = json.loads(response.text)
        # TWSE puts its rows under 'data', TPEX under 'aaData'; days with no
        # data (holidays) lack the key and are simply skipped.
        key = 'data' if m['市場別'] == '上市' else 'aaData'
        try:
            for d in json_data[key]:
                yield parse_info(d, m)
        except KeyError:
            pass
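With everything in place, the spider runs over a date range via Scrapy's -a spider arguments:

scrapy crawl stock_investor -a beginDate=2018-10-01 -a endDate=2018-10-03

parse_info can also be exercised offline with a hand-built TWSE row. The share counts below are reverse-engineered from the 0050 record above, so the printed dict should reproduce its institutional fields (the price/volume fields in that record come from the existing stock_day documents, merged by _id). This snippet assumes it runs in the same module as parse_info (or imports it):

# Offline check of parse_info with an illustrative, hand-built 上市 row.
sample_row = ['0050', '元大台灣50',
              '1,611,000', '2,000', '1,609,000',  # 外陸資 buy / sell / net
              '0', '0', '0',                      # 外資自營商
              '0', '725,000', '-725,000',         # 投信
              '236,000',                          # aggregate 自營商買賣超 (removed by pop(11))
              '154,000', '901,000', '-747,000',   # 自營商(自行買賣)
              '1,215,000', '232,000', '983,000',  # 自營商(避險)
              '1,120,000']                        # 三大法人買賣超
print(parse_info(sample_row, {'date': '2018/10/03', '市場別': '上市'}))
# -> {'_id': '2018/10/03_0050', '外陸資買進': 1611.0, ..., '三大法人買賣超': 1120.0}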