Monthly revenue figures matter a lot to me.
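Companies must publish the previous month's revenue by the 10th of the following month, and the deadline is postponed if it falls on a holiday. The crawler is therefore scheduled with the cron expression
0 1 2-15 * *
which runs it at 01:00 every day from the 2nd through the 15th of each month.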
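To double-check which days that expression fires on, here is a minimal Python sketch of a matcher for the five cron fields (minute, hour, day of month, month, day of week). It is an illustration only: the helper names are hypothetical, and it supports just `*`, single numbers, `a-b` ranges and comma lists.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', 'n', 'a-b' and comma lists only."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, dt: datetime) -> bool:
    """Return True if the five-field cron expression fires at dt."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (dt.weekday() + 1) % 7  # cron counts Sunday=0 ... Saturday=6
    # Simplification: all fields are ANDed (real cron ORs day-of-month and
    # day-of-week when both are restricted; here day-of-week is '*').
    return all([
        _field_matches(minute, dt.minute),
        _field_matches(hour, dt.hour),
        _field_matches(dom, dt.day),
        _field_matches(month, dt.month),
        _field_matches(dow, cron_dow),
    ])


SCHEDULE = "0 1 2-15 * *"
# Days in March 2024 on which the crawler would start at 01:00
run_days = [d for d in range(1, 32) if cron_matches(SCHEDULE, datetime(2024, 3, d, 1, 0))]
print(run_days)  # days 2 through 15
```

Step values such as `*/5` and month or weekday names are deliberately left out to keep the sketch short.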
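The htmltable_df package (its Extractor class is imported in the code below) makes it easier to scrape tables with multi-level headers.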
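The column names in the scraped output (for example `營業收入_當月營收`) look like a parent header cell and its child header cell joined with an underscore. The sketch below shows that kind of flattening in plain Python. It is an assumption about the naming scheme, not htmltable_df's actual implementation, and `flatten_header` is a hypothetical helper.

```python
def flatten_header(levels: tuple[str, ...], sep: str = "_") -> str:
    """Join a multi-level header path into a single column name.

    Empty cells and labels repeated by a rowspan (e.g. ('公司代號', '公司代號'))
    are collapsed, so single-level columns keep their plain name.
    """
    parts: list[str] = []
    for label in levels:
        label = label.strip()
        if label and (not parts or parts[-1] != label):
            parts.append(label)
    return sep.join(parts)


# Header paths as they might come out of a two-row <thead>
header_rows = [
    ("公司代號", "公司代號"),
    ("營業收入", "當月營收"),
    ("營業收入", "上月比較增減(%)"),
    ("累計營業收入", "當月累計營收"),
    ("備註", ""),
]
print([flatten_header(h) for h in header_rows])
# ['公司代號', '營業收入_當月營收', '營業收入_上月比較增減(%)', '累計營業收入_當月累計營收', '備註']
```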
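Revenue is important information. Often someone learns the numbers before they are published and the stock price reacts in advance,
but sometimes the price plunges only after publication; HTC, for example, set new lows every month.
A company whose revenue sets new records month after month stands a good chance of becoming a high-flying stock.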
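Once the monthly figures are stored, spotting "consecutive record-high revenue" is a simple scan. A minimal sketch, assuming that a record means beating every earlier month in the series you pass in (so the very first month counts as a record trivially); `record_high_streak` is a hypothetical helper and the numbers in the usage line are made up.

```python
def record_high_streak(revenues: list[float]) -> int:
    """Count how many of the most recent months each set a new record.

    A month is a record if its revenue exceeds every earlier month in the
    series (assumed definition). The streak ends at the first recent month
    that was not a record.
    """
    is_record = []
    best = float("-inf")
    for value in revenues:
        is_record.append(value > best)
        best = max(best, value)

    streak = 0
    for flag in reversed(is_record):
        if not flag:
            break
        streak += 1
    return streak


# Hypothetical monthly revenues, oldest first
print(record_high_streak([100, 120, 110, 130, 150]))  # 2
```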
# -*- coding: utf-8 -*-
from datetime import datetime, timedelta
import re

import pandas as pd
import scrapy
from htmltable_df.extractor import Extractor

# MOPS monthly-revenue pages per market, formatted with (ROC year, month).
# Suffix _0 = domestic companies, _1 = foreign companies.
stock_type = {
    '國內上市': 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_{}_{}_0.html',
    '國外上市': 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_{}_{}_1.html',
    '國內上櫃': 'http://mops.twse.com.tw/nas/t21/otc/t21sc03_{}_{}_0.html',
    '國外上櫃': 'http://mops.twse.com.tw/nas/t21/otc/t21sc03_{}_{}_1.html',
    '國內興櫃': 'http://mops.twse.com.tw/nas/t21/rotc/t21sc03_{}_{}_0.html',
    '國外興櫃': 'http://mops.twse.com.tw/nas/t21/rotc/t21sc03_{}_{}_1.html',
    '國內公發公司': 'http://mops.twse.com.tw/nas/t21/pub/t21sc03_{}_{}_0.html',
    '國外公發公司': 'http://mops.twse.com.tw/nas/t21/pub/t21sc03_{}_{}_1.html',
}


class TemplateSpider(scrapy.Spider):
    name = "stock_rev_mm"
    custom_settings = {
        'DOWNLOAD_DELAY': 1,  # one request per second, one at a time
        'CONCURRENT_REQUESTS': 1,
        # MONGODB_* keys are read by the project's own MongoDB pipeline, not by Scrapy
        'MONGODB_COLLECTION': name,
        'MONGODB_ITEM_CACHE': 1000,
        'MONGODB_UNIQ_KEY': [("YY", -1), ("MM", 1), ("公司代號", 1)],
    }

    def __init__(self, beginDate=None, endDate=None, *args, **kwargs):
        # scrapy.Spider.__init__ stores extra kwargs as attributes (self.beginDate, self.endDate)
        super().__init__(*args, beginDate=beginDate, endDate=endDate, **kwargs)

    def start_requests(self):
        # Default window: the last 20 days, which contains the previous month's end
        if not self.beginDate and not self.endDate:
            today = datetime.today()
            self.beginDate = (today - timedelta(days=20)).strftime("%Y-%m-%d")
            self.endDate = today.strftime("%Y-%m-%d")
        # freq='M' yields month-end dates (newer pandas prefers the alias 'ME')
        for date in pd.date_range(self.beginDate, self.endDate, freq='M'):
            YY = date.year - 1911  # MOPS URLs use the ROC (Minguo) year
            MM = date.month
            for key, val in stock_type.items():
                url = val.format(YY, MM)
                yield scrapy.Request(url, meta={'STOCK_TYPE': key, 'YY': YY, 'MM': MM})

    def parse(self, response):
        meta = response.meta
        # response.dom is not part of Scrapy; it is assumed to be a PyQuery
        # document attached by a project-specific downloader middleware.
        for table in response.dom('table[bgcolor="#FFFFFF"]').items():
            # The industry name sits in the first row of the enclosing outer table
            treq0 = table.parent().parent().parent()('tr:eq(0)')
            INDUSTRY_TYPE = re.search(r'產業別:(\w+)', treq0("th:contains('產業別')").text()).group(1)
            extractor = Extractor(table)
            data = extractor.df()[:-1]  # drop the last row (presumably the 合計 total row)
            # Strip whitespace and thousands separators (DataFrame.map in pandas >= 2.1)
            data = data.applymap(lambda x: x.strip().replace(',', ''))
            data.insert(0, 'MM', meta.get('MM'))
            data.insert(0, 'YY', meta.get('YY'))
            data.insert(0, 'INDUSTRY_TYPE', INDUSTRY_TYPE)
            data.insert(0, 'STOCK_TYPE', meta.get('STOCK_TYPE'))
            # One item per company row
            for r in data.to_dict('records'):
                yield r
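The request loop in `start_requests` boils down to "take every month end in the date window, convert the year to the ROC calendar, and fill in the URL template". Here is a standard-library sketch of that logic, swapping a hand-written month-end walk in for `pd.date_range(..., freq='M')`; `month_ends`, `revenue_urls` and the run date are hypothetical. With the spider's 20-day default window and a run on any day from the 2nd to the 15th, the window always contains exactly one month end, the previous month's.

```python
import calendar
from datetime import date, timedelta

URL_TEMPLATE = "http://mops.twse.com.tw/nas/t21/sii/t21sc03_{}_{}_0.html"  # 國內上市


def month_ends(begin: date, end: date) -> list[date]:
    """Month-end dates within [begin, end], mimicking pd.date_range(freq='M')."""
    result = []
    year, month = begin.year, begin.month
    while True:
        last = date(year, month, calendar.monthrange(year, month)[1])
        if last > end:
            break
        if last >= begin:
            result.append(last)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


def revenue_urls(begin: date, end: date, template: str = URL_TEMPLATE) -> list[str]:
    # MOPS paths use the ROC (Minguo) year: Gregorian year - 1911
    return [template.format(d.year - 1911, d.month) for d in month_ends(begin, end)]


run_date = date(2015, 10, 5)  # hypothetical run date
print(revenue_urls(run_date - timedelta(days=20), run_date))
# ['http://mops.twse.com.tw/nas/t21/sii/t21sc03_104_9_0.html']
```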
The scraped data looks like this:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>STOCK_TYPE</th>
<th>INDUSTRY_TYPE</th>
<th>YY</th>
<th>MM</th>
<th>公司代號</th>
<th>公司名稱</th>
<th>營業收入_當月營收</th>
<th>營業收入_上月營收</th>
<th>營業收入_去年當月營收</th>
<th>營業收入_上月比較增減(%)</th>
<th>營業收入_去年同月增減(%)</th>
<th>累計營業收入_當月累計營收</th>
<th>累計營業收入_去年累計營收</th>
<th>累計營業收入_前期比較增減(%)</th>
<th>備註</th>
</tr>
</thead>
<tbody>
<tr>
<th>582</th>
<td>國內上市</td>
<td>半導體業</td>
<td>104</td>
<td>9</td>
<td>2481</td>
<td>強茂</td>
<td>1211766</td>
<td>976828</td>
<td>1515909</td>
<td>24.05</td>
<td>-20.06</td>
<td>11557846</td>
<td>12999596</td>
<td>-11.09</td>
<td>-</td>
</tr>
<tr>
<th>241</th>
<td>國內上櫃</td>
<td>觀光事業</td>
<td>104</td>
<td>5</td>
<td>5701</td>
<td>劍湖山</td>
<td>119818</td>
<td>128964</td>
<td>110552</td>
<td>-7.09</td>
<td>8.38</td>
<td>680584</td>
<td>624156</td>
<td>9.04</td>
<td>-</td>
</tr>
<tr>
<th>564</th>
<td>國外上市</td>
<td>電腦及週邊設備業</td>
<td>104</td>
<td>6</td>
<td>5215</td>
<td>科嘉-KY</td>
<td>125428</td>
<td>141482</td>
<td>157816</td>
<td>-11.34</td>
<td>-20.52</td>
<td>881774</td>
<td>920059</td>
<td>-4.16</td>
<td>-</td>
</tr>
<tr>
<th>124</th>
<td>國內上市</td>
<td>電子通路業</td>
<td>104</td>
<td>3</td>
<td>3048</td>
<td>益登</td>
<td>5663488</td>
<td>4576063</td>
<td>3809295</td>
<td>23.76</td>
<td>48.67</td>
<td>17004491</td>
<td>10256641</td>
<td>65.79</td>
<td>上述已依IFRS申報合併營收</td>
</tr>
<tr>
<th>867</th>
<td>國內上市</td>
<td>鋼鐵工業</td>
<td>104</td>
<td>8</td>
<td>2013</td>
<td>中鋼構</td>
<td>1081370</td>
<td>1201025</td>
<td>1370149</td>
<td>-9.96</td>
<td>-21.07</td>
<td>10063349</td>
<td>11688039</td>
<td>-13.90</td>
<td>-</td>
</tr>
<tr>
<th>783</th>
<td>國內上市</td>
<td>貿易百貨</td>
<td>104</td>
<td>6</td>
<td>1432</td>
<td>大魯閣實業</td>
<td>17099</td>
<td>18064</td>
<td>11066</td>
<td>-5.34</td>
<td>54.51</td>
<td>115048</td>
<td>64586</td>
<td>78.13</td>
<td>1.103/7起增加休閒娛樂事業至104營收增加2. 103年度營收與原公告數不同係因配合海...</td>
</tr>
<tr>
<th>134</th>
<td>國內上市</td>
<td>鋼鐵工業</td>
<td>104</td>
<td>4</td>
<td>2017</td>
<td>官田鋼</td>
<td>461520</td>
<td>464415</td>
<td>490552</td>
<td>-0.62</td>
<td>-5.91</td>
<td>1839699</td>
<td>1959197</td>
<td>-6.09</td>
<td>-</td>
</tr>
<tr>
<th>337</th>
<td>國內上市</td>
<td>建材營造</td>
<td>106</td>
<td>11</td>
<td>2501</td>
<td>國建</td>
<td>812550</td>
<td>981009</td>
<td>1727465</td>
<td>-17.17</td>
<td>-52.96</td>
<td>10469621</td>
<td>15831898</td>
<td>-33.87</td>
<td>係去年當月完工工地入帳較多</td>
</tr>
<tr>
<th>704</th>
<td>國內上櫃</td>
<td>貿易百貨</td>
<td>104</td>
<td>4</td>
<td>9960</td>
<td>邁達康</td>
<td>55879</td>
<td>56154</td>
<td>65075</td>
<td>-0.48</td>
<td>-14.13</td>
<td>187164</td>
<td>201852</td>
<td>-7.27</td>
<td>-</td>
</tr>
<tr>
<th>42</th>
<td>國內上櫃</td>
<td>通信網路業</td>
<td>105</td>
<td>9</td>
<td>5353</td>
<td>台林電通</td>
<td>94425</td>
<td>105303</td>
<td>75463</td>
<td>-10.33</td>
<td>25.12</td>
<td>897553</td>
<td>819214</td>
<td>9.56</td>
<td>-</td>
</tr>
</tbody>
</table>
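Every value is stored as a string after the comma stripping, so convert to numbers before doing any analysis. As a sanity check, the three 增減(%) columns can be recomputed from the raw revenue columns. A minimal sketch using the 強茂 (2481) row from the table above; `pct_change` is a hypothetical helper, and the two-decimal rounding is an assumption that happens to match the published figures.

```python
def pct_change(current: float, previous: float) -> float:
    """Percentage change, rounded to two decimals like the 增減(%) columns."""
    return round((current - previous) / previous * 100, 2)


# Values copied from the 強茂 (2481) row; the spider yields them as strings.
row = {
    "營業收入_當月營收": "1211766",
    "營業收入_上月營收": "976828",
    "營業收入_去年當月營收": "1515909",
    "累計營業收入_當月累計營收": "11557846",
    "累計營業收入_去年累計營收": "12999596",
}
nums = {k: float(v) for k, v in row.items()}

mom = pct_change(nums["營業收入_當月營收"], nums["營業收入_上月營收"])      # vs. last month
yoy = pct_change(nums["營業收入_當月營收"], nums["營業收入_去年當月營收"])  # vs. same month last year
ytd = pct_change(nums["累計營業收入_當月累計營收"], nums["累計營業收入_去年累計營收"])
print(mom, yoy, ytd)  # 24.05 -20.06 -11.09
```

The results match the table's 24.05, -20.06 and -11.09, which suggests the percentage columns are plain period-over-period changes.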