Things to watch out for when uploading Chinese text through the API

12th iThome Ironman Contest

DAY 7

Elastic Stack on Cloud

Elastic 30-day self-study series, part 7

bear999

2020-09-07 19:14:28

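If you upload from Windows 10, keep in mind that Elastic Cloud works in UTF-8 by default, while the Windows system encoding is still not UTF-8, so you may run into the same situation I did.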
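Before uploading, it can help to confirm what encoding the R session is actually using. The snippet below is a minimal sketch using base R only; the helper name `session_encoding` is made up for this illustration, and the locale string in the comment is just an example of what a Traditional Chinese Windows machine might report.

```r
# Hypothetical helper: summarize the encoding-related settings of this R session.
session_encoding <- function() {
  info <- l10n_info()
  list(
    locale   = Sys.getlocale("LC_CTYPE"), # e.g. "Chinese (Traditional)_Taiwan.950" on Windows
    is_utf8  = isTRUE(info[["UTF-8"]]),   # TRUE only when the session locale is UTF-8
    codepage = info[["codepage"]]         # reported on Windows only; NULL elsewhere
  )
}

enc <- session_encoding()
str(enc)

# The test string from this article, written with Unicode escapes so the
# script does not depend on the encoding of the source file.
s <- "\u6d4b\u8bd5"

# Show the declared encoding of the string, then force a UTF-8 copy.
print(Encoding(s))
s_utf8 <- enc2utf8(s)
print(charToRaw(s_utf8))
```

If `is_utf8` comes back `FALSE`, any string typed directly into a script or read from a local file may be in the native code page rather than UTF-8, which is the situation this article is about.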

The normal case:

library("elastic")
 
# Connect OK!

x <- connect(
  host = "ABC123.asia-east1.gcp.elastic-cloud.com",
  path = "",
  user = "elastic",
  pwd = "abc123",
  port = 9243,
  transport_schema = "https"
)

zh1 <- data.frame( my_value = c("测试"), stringsAsFactors = FALSE)
docs_bulk(x, zh1, "zh_test")

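At this point the upload goes through without any problem.

Certain characters are a different story, though, such as the notorious "許功蓋".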
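For background on why these three characters are notorious: in Big5, the second byte of each of them is 0x5C, the same byte as the ASCII backslash, which many parsers treat as an escape character. The sketch below prints their Big5 bytes with base R's `iconv`. It assumes the local iconv recognizes the encoding name "BIG5" (on some Windows setups "CP950" may be needed instead). Whether this is exactly what breaks the upload in this article is an assumption, not something verified here.

```r
# The three characters as Unicode escapes, so the script itself does not
# depend on the encoding of the source file.
chars <- c("\u8a31", "\u529f", "\u84cb") # 許, 功, 蓋

# Convert each character from UTF-8 to Big5 and keep the raw bytes.
# Assumption: "BIG5" is a name the local iconv understands.
big5 <- iconv(chars, from = "UTF-8", to = "BIG5", toRaw = TRUE)

# Print the code point and the Big5 bytes of each character.
for (i in seq_along(chars)) {
  cat(
    sprintf("U+%04X:", utf8ToInt(chars[i])),
    paste(as.character(big5[[i]]), collapse = " "),
    "\n"
  )
}

# Compare the last Big5 byte of each character with the backslash byte (0x5C).
last_bytes <- vapply(big5, function(b) as.integer(b[length(b)]), integer(1))
ends_in_backslash <- last_bytes == 0x5C
print(ends_in_backslash)
```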

The failing case:

zh2 <- data.frame(  my_value = c("許", "功", "蓋"), stringsAsFactors = FALSE)
docs_bulk(x, zh2, "zh_test")

This time the response reports that the data is malformed and cannot be indexed:

[[1]]$items[[3]]$index$error
[[1]]$items[[3]]$index$error$type
[1] "mapper_parsing_exception"

[[1]]$items[[3]]$index$error$reason
[1] "failed to parse field [my_value] of type [text] in document with id 'g-I4aHQBHwsQBN75v_wg'. Preview of field's value: ''"

[[1]]$items[[3]]$index$error$caused_by
[[1]]$items[[3]]$index$error$caused_by$type
[1] "i_o_exception"

[[1]]$items[[3]]$index$error$caused_by$reason
[1] "Unexpected end-of-input in VALUE_STRING\n at [Source: (org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper); line: 1, column: 20]"
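With more than a handful of documents, reading this nested output by eye gets tedious. The following is a minimal sketch that collects the failed items into a data frame, assuming the response has the shape printed above (a list of chunks, each with an `items` list whose entries carry `index$error`). The helper name `bulk_errors` and the `mock` object are made up for illustration; `mock` is a hand-built stand-in, not real server output.

```r
# Hypothetical helper: list the failed items of a bulk response.
bulk_errors <- function(res) {
  rows <- list()
  for (chunk in seq_along(res)) {
    items <- res[[chunk]]$items
    for (i in seq_along(items)) {
      err <- items[[i]]$index$error
      if (is.null(err)) next # this item was indexed without an error
      cause <- err$caused_by$type
      rows[[length(rows) + 1]] <- data.frame(
        chunk = chunk,
        item  = i,
        type  = err$type,
        cause = if (is.null(cause)) NA_character_ else cause,
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(rows) == 0) {
    # No failures: return an empty data frame with the same columns.
    return(data.frame(
      chunk = integer(), item = integer(),
      type = character(), cause = character(),
      stringsAsFactors = FALSE
    ))
  }
  do.call(rbind, rows)
}

# Hand-built stand-in that mimics the structure shown above.
mock <- list(list(items = list(
  list(index = list(status = 201L)),
  list(index = list(status = 201L)),
  list(index = list(
    status = 400L,
    error = list(
      type = "mapper_parsing_exception",
      caused_by = list(type = "i_o_exception")
    )
  ))
)))

failed <- bulk_errors(mock)
print(failed)
```

In a real session the return value of `docs_bulk()` would be passed in place of `mock`.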

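Once you hit this, there is basically no fix on Windows. Googling led me to the elastic package site (https://docs.ropensci.org/elastic/), where a user from China had reported a Chinese encoding problem, and the replies say it was resolved as of some release about two years earlier. Yet here in 2020, even after installing the development version from GitHub, I still could not get it to work.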

The only remaining workaround at this point is to go through: