On Day 7 we gave a basic introduction to analyzers:
https://ithelp.ithome.com.tw/articles/10316153
Today we take a more detailed look.
First, here are some commonly used built-in analyzers.
Standard analyzer: splits text on Unicode word boundaries, drops most punctuation, and lowercases each token.
// Input
"Is that Eason's cool-looking watch?"
// Output
["is", "that", "eason's", "cool", "looking", "watch"]
Simple analyzer: splits text at every non-letter character and lowercases each token.
// Input
"Is that Eason's cool-looking watch?"
// Output
["is", "that", "eason", "s", "cool", "looking", "watch"]
Whitespace analyzer: splits text on whitespace only and does not lowercase.
// Input
"Is that Eason's cool-looking watch?"
// Output
["Is", "that", "Eason's", "cool-looking", "watch?"]
Keyword analyzer: does no splitting and returns the whole input as a single token.
// Input
"Is that Eason's cool-looking watch?"
// Output
["Is that Eason's cool-looking watch?"]
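Pattern analyzer: splits text with a regular expression (by default \W+, any run of non-word characters) and lowercases each token.
// Input
"Is that Eason's cool-looking watch?"
// Output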
["is", "that", "eason", "s", "cool", "looking", "watch"]
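To make the differences easier to compare, the splitting rules above can be approximated in a few lines of Python. This is only a rough sketch of the behaviour, not how Elasticsearch implements these analyzers, and the function names are made up for illustration. The standard analyzer is left out because its Unicode segmentation rules do not reduce to a one-line regex.

```python
import re

TEXT = "Is that Eason's cool-looking watch?"


def simple_like(text):
    # Keep runs of letters only (word characters minus digits and "_"),
    # then lowercase, similar in spirit to the simple analyzer.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def whitespace_like(text):
    # Split on whitespace; case and punctuation are left untouched.
    return text.split()


def keyword_like(text):
    # No splitting at all: the whole input is one token.
    return [text]


def pattern_like(text, pattern=r"\W+"):
    # Split on the given regex (default: runs of non-word characters),
    # drop empty pieces, then lowercase.
    return [t.lower() for t in re.split(pattern, text) if t]


if __name__ == "__main__":
    for fn in (simple_like, whitespace_like, keyword_like, pattern_like):
        print(fn.__name__, fn(TEXT))
```

On a real cluster the same comparison can be made with the _analyze API by passing the analyzer name and the text.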
We can also define a custom analyzer of our own.
For example:
PUT /test_analyzer
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "stop",
            "asciifolding"
          ]
        }
      }
    }
  }
}
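char_filter: optional. In this example the html_strip character filter removes HTML tags.
tokenizer: required, and exactly one must be set. It selects which tokenizer splits the string into tokens; different tokenizers split in different ways. See https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html
filter: optional. In this example the token filters lowercase each token, remove stop words, and convert non-ASCII characters to their ASCII equivalents. Token filters are applied in the order listed.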
// Input
"I'm in a <em>good</em> à"
// Output
["i'm", "good", "a"]
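The output above may look surprising: the stop filter removes "a", yet an "a" still appears. A small Python sketch of the pipeline shows why. It only imitates the stages with simple stand-ins (a regex for html_strip and for the tokenizer, Unicode decomposition for asciifolding), so it is an illustration under those assumptions rather than Elasticsearch's actual implementation. The stop list is the default English list as I understand it, and fold_first is a hypothetical switch added only to show the effect of reordering the filters.

```python
import re
import unicodedata

# Assumed default English stop words used by the "stop" token filter.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def strip_html(text):
    # Stand-in for the html_strip character filter: drop anything in <...>.
    return re.sub(r"<[^>]+>", "", text)


def tokenize(text):
    # Stand-in for the standard tokenizer: words, keeping inner apostrophes.
    return re.findall(r"\w+(?:'\w+)*", text)


def fold_ascii(token):
    # Stand-in for asciifolding: decompose, then drop combining accents.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def analyze(text, fold_first=False):
    tokens = [t.lower() for t in tokenize(strip_html(text))]
    if fold_first:
        # Hypothetical reordering: asciifolding before stop.
        tokens = [fold_ascii(t) for t in tokens]
        return [t for t in tokens if t not in STOP_WORDS]
    # Order from the example: stop first, asciifolding last.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [fold_ascii(t) for t in tokens]


if __name__ == "__main__":
    sample = "I'm in a <em>good</em> \u00e0"
    print(analyze(sample))
    print(analyze(sample, fold_first=True))
```

With the order used in the example, "à" is not a stop word when the stop filter runs, so it survives and is only folded to "a" afterwards. Putting asciifolding before stop would fold it first and then remove it.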
You can also define your own char_filter, tokenizer, and filter entries in the same analysis block,
and then reference them by name in the analyzer.
"char_filter": {
  "my_html_strip": {
    "type": "html_strip",
    "escaped_tags": ["b"]
  }
}
// The parameter names depend on the type
// For example, when type is mapping, the parameter is mappings rather than escaped_tags
// See: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-charfilters.html
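As a sketch of how the pieces fit together, the following Python builds a complete request body in which two named character filters are defined and then referenced from the analyzer. The names my_emoticons and undefined_char_filters, and the ":) => _happy_" rule, are made up for illustration; the helper is just a local sanity check on the dictionary and is not part of any Elasticsearch client.

```python
import json

# Character filters that can be referenced by name without any definition.
# Assumption: html_strip is the only one that needs no parameters.
USABLE_WITHOUT_CONFIG = {"html_strip"}

body = {
    "settings": {
        "analysis": {
            "char_filter": {
                # html_strip that leaves <b> tags in place
                "my_html_strip": {"type": "html_strip", "escaped_tags": ["b"]},
                # mapping type uses "mappings" instead of "escaped_tags"
                "my_emoticons": {"type": "mapping", "mappings": [":) => _happy_"]},
            },
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "char_filter": ["my_html_strip", "my_emoticons"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}


def undefined_char_filters(request_body):
    """Return (analyzer, char_filter) pairs that reference an unknown name."""
    analysis = request_body["settings"]["analysis"]
    defined = set(analysis.get("char_filter", {})) | USABLE_WITHOUT_CONFIG
    missing = []
    for analyzer_name, analyzer in analysis.get("analyzer", {}).items():
        for ref in analyzer.get("char_filter", []):
            if ref not in defined:
                missing.append((analyzer_name, ref))
    return missing


if __name__ == "__main__":
    print(json.dumps(body, indent=2))
    print(undefined_char_filters(body))
```

The printed JSON is what would go in the body of the PUT request that creates the index.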
We can also add an analyzer to an index that already exists.
PUT /index_name/_settings
{
  "analysis": {
    "analyzer": {
      ...
    }
  }
}
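However, sending this request to an open index returns an error, because static settings cannot be changed on an index that has not been closed.
This brings up two pairs of concepts:
dynamic settings vs. static settings, and closed index vs. live (open) index. Dynamic settings can be changed on a live index through the update settings API; static settings can only be set when the index is created or while it is closed.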
// close index
POST /index_name/_close
// open index
POST /index_name/_open
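Putting the steps in order: close the index, update the analysis settings, then open it again. The Python below only builds that sequence as (method, path, body) tuples so the order is explicit; it does not send anything, and the function name and the sample analyzer definition are assumptions for illustration.

```python
def add_analysis_requests(index, analysis):
    """Build the close / update settings / open sequence for one index."""
    return [
        ("POST", f"/{index}/_close", None),                      # stop reads and writes
        ("PUT", f"/{index}/_settings", {"analysis": analysis}),  # add the analyzer
        ("POST", f"/{index}/_open", None),                       # bring the index back
    ]


if __name__ == "__main__":
    # Hypothetical analyzer definition, reusing the earlier example's parts.
    analysis = {
        "analyzer": {
            "my_custom_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase"],
            }
        }
    }
    for method, path, body in add_analysis_requests("index_name", analysis):
        print(method, path, body if body is not None else "")
```

Keep in mind that a closed index cannot serve searches or accept writes, so on a production index this sequence means a short window of unavailability.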
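Changing an analyzer this way does not re-analyze documents that are already indexed.
Old and new documents would then be analyzed under different rules, which makes search results inconsistent.
To fix this, the existing documents have to be reprocessed, for example with a script or the Reindex API.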
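As a rough sketch of the two usual options, the Python below builds the requests without sending them. The first copies every document into a new index through the Reindex API; this assumes the destination index was already created with the new analyzer. The second uses the Update By Query API with no body, which re-indexes each document in place so that it is analyzed again. The function names and index names are hypothetical.

```python
def reindex_request(source_index, dest_index):
    # Copy all documents from source_index into dest_index.
    # Assumes dest_index already exists with the new analyzer settings.
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return ("POST", "/_reindex", body)


def update_in_place_request(index):
    # Re-index every document in place; conflicts=proceed keeps going
    # when a document was changed while the request was running.
    return ("POST", f"/{index}/_update_by_query?conflicts=proceed", None)


if __name__ == "__main__":
    print(reindex_request("index_name", "index_name_v2"))
    print(update_in_place_request("index_name"))
```

Reindexing into a new index is the safer route when the mapping changes as well; updating in place is only enough when the analyzer definition alone changed.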
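At this point we have covered most of the basics of mapping.
These are the basics, but using them fluently in real applications
and avoiding the pitfalls still takes care.
Tomorrow we move on to search queries,
where we can start looking in detail at Elasticsearch's powerful query features.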
References
character filter:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html
tokenizer:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html
token filter:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html
index settings:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-modules-settings
alias:
https://www.elastic.co/guide/en/elasticsearch/reference/current/aliases.html