[Day11] Technical Indicator Calculation - Searching with the Python Client

12th iThome Ironman Contest

DAY 11

Elastic Stack on Cloud

Elastic vs. Taiwan Stocks series, part 11



華叔

Team: 搭著 ESTC 飛上天 (Riding ESTC to the Sky)

2020-09-26 07:37:17


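Yesterday we played with search in Kibana. Following the same routine as the past few days, today we move over to Python. Besides the search API of elasticsearch-py, I will also introduce another Python library: Elasticsearch DSL. Let's go!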

Search with Python Client

We use the same search goal as yesterday: "retrieve the closing data for stock ID 2030 over the past 10 days, with the newest data on top."

from elasticsearch import Elasticsearch

es = Elasticsearch("https://5d6275b5333141c98d185e2e4a981ec7.ap-northeast-1.aws.found.io:9243", http_auth=('elastic', 'FGUb7opj27wQUrbN6iCac2hs'))

searchBody = {
    "sort" : [
        { "date" : {"order" : "desc"}}
    ],
    "query" : {
        "bool": {
          "must": {
              "match": { 
                  "stock_id": "2030" 
              }
          },
          "filter":{
              "range": {
                  "date": {
                      "gte": "now-10d/d",
                      "lt": "now/d"
                  }
              }
          }
        }        
    }
}

response = es.search(
    index="history-prices-python",body=searchBody)

print(response["hits"]["total"]["value"])
# 5
print(response["hits"]["hits"][0]["_source"])
# {'stock_id': '2030', 'date': '2020-09-18', 'volume': '369700', 'open': '10.70', 'high': '10.80', 'low': '10.60', 'close': '10.80'}
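Since this series is ultimately about computing technical indicators, it may help to see how the hits could be turned into numbers. The following is a minimal sketch, assuming the response has the shape printed above, with prices stored as strings. `extract_closes` and `sample_response` are hypothetical names introduced here for illustration, and the sample holds only the single record shown above.

```python
def extract_closes(response):
    """Return (date, close) pairs in the order the hits were returned."""
    pairs = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        # Prices come back as strings, so convert before doing any math.
        pairs.append((source["date"], float(source["close"])))
    return pairs


# Hand-built stand-in for a real response (assumption: same shape as above).
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"stock_id": "2030", "date": "2020-09-18",
                         "volume": "369700", "open": "10.70",
                         "high": "10.80", "low": "10.60", "close": "10.80"}}
        ]
    }
}

print(extract_closes(sample_response))
# [('2020-09-18', 10.8)]
```

Because the query sorts by `date` in descending order, the first pair would be the most recent trading day.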

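As simple as ever. The first step in engineering is to find a solution that solves the problem; the next step is to find a better one. The program above is simple, but it has a few problems: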

Not concise: laying out the JSON query takes up a lot of space
Nested query clauses are easy to get wrong
The query is hard to extend or modify
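To make the last two points concrete, here is a minimal sketch of what adding one more condition to the raw query body could look like. `add_filter` is a hypothetical helper, and the extra `exists` clause on `close` is purely illustrative. Note that the helper has to know that `filter` may hold either a single clause or a list of clauses.

```python
import copy


def add_filter(body, clause):
    """Return a copy of a raw search body with one more bool filter clause."""
    new_body = copy.deepcopy(body)  # avoid mutating the caller's dict
    bool_query = new_body["query"]["bool"]
    existing = bool_query.get("filter", [])
    # "filter" may be one clause (dict) or several (list), so normalize first.
    if isinstance(existing, dict):
        existing = [existing]
    bool_query["filter"] = existing + [clause]
    return new_body


search_body = {
    "sort": [{"date": {"order": "desc"}}],
    "query": {
        "bool": {
            "must": {"match": {"stock_id": "2030"}},
            "filter": {"range": {"date": {"gte": "now-10d/d", "lt": "now/d"}}},
        }
    },
}

# Illustrative extra condition: keep only documents that have a "close" field.
extended = add_filter(search_body, {"exists": {"field": "close"}})
print(extended["query"]["bool"]["filter"])
```

Even this small change requires walking three levels of nesting by hand, which is where typos tend to creep in.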

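Indeed, anything that makes coders unhappy is something that someone in the open source world is already working on, and that is how Elasticsearch DSL came about.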

Elasticsearch DSL

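According to the official documentation, Elasticsearch DSL is a high-level library built on top of elasticsearch-py, designed to let you write and run queries in a more "programmatic" way. Besides queries, it also offers features such as creating indices and mappings.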

The query above can be rewritten as follows:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

es = Elasticsearch("https://5d6275b5333141c98d185e2e4a981ec7.ap-northeast-1.aws.found.io:9243", http_auth=('elastic', 'FGUb7opj27wQUrbN6iCac2hs'))

s = Search(using=es, index="history-prices-python") \
    .filter("range", date={"gte": "now-10d/d","lt": "now/d"}) \
    .query("match", stock_id="2030") \
    .sort({"date": {"order": "desc"}})    

response = s.execute()

print(response["hits"]["total"]["value"])
# 5
print(response["hits"]["hits"][0]["_source"])
# {'stock_id': '2030', 'date': '2020-09-18', 'volume': '369700', 'open': '10.70', 'high': '10.80', 'low': '10.60', 'close': '10.80'}

We get exactly the same result. We can peek at the magic of ES DSL like this:

print(s.to_dict())

{
    'query': {
        'bool': {
            'filter': [
                {'range': {'date': {'gte': 'now-10d/d', 'lt': 'now/d'}}}
            ],
            'must': [
                {'match': {'stock_id': '2030'}}
            ]
        }
    },
    'sort': [
        {'date': {'order': 'desc'}}
    ]
}

As you can see, it is exactly the query we originally designed. Doesn't the whole program look much more concise?
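To get a feel for how a chained builder can end up producing that dictionary, here is a toy sketch written with the standard library only. It is not how Elasticsearch DSL is actually implemented; `TinySearch` and its methods are hypothetical names that merely imitate the call style used above.

```python
import copy


class TinySearch:
    """Toy chainable query builder (illustration only, not Elasticsearch DSL)."""

    def __init__(self):
        self._must = []
        self._filter = []
        self._sort = []

    def _clone(self):
        # Each chained call returns a new object, leaving the original untouched.
        return copy.deepcopy(self)

    def query(self, kind, **params):
        s = self._clone()
        s._must.append({kind: params})
        return s

    def filter(self, kind, **params):
        s = self._clone()
        s._filter.append({kind: params})
        return s

    def sort(self, *keys):
        s = self._clone()
        s._sort.extend(keys)
        return s

    def to_dict(self):
        bool_query = {}
        if self._filter:
            bool_query["filter"] = self._filter
        if self._must:
            bool_query["must"] = self._must
        body = {"query": {"bool": bool_query}}
        if self._sort:
            body["sort"] = self._sort
        return body


s = (
    TinySearch()
    .filter("range", date={"gte": "now-10d/d", "lt": "now/d"})
    .query("match", stock_id="2030")
    .sort({"date": {"order": "desc"}})
)
print(s.to_dict())
```

Each method only appends one small clause, and the nesting is assembled once in `to_dict`, so the caller never has to write the nested structure by hand.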

With basic search done, tomorrow we will look at how to turn ES search results into a Pandas DataFrame! (Seems like we are drifting a bit off topic…)