iT邦幫忙


UnicodeDecodeError when using tika in Python


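I hadn't opened Visual Studio for a while.
The tika parser used to work without any problems,
but now it doesn't work at all. I'd be grateful if anyone could help me figure this out. Thanks!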

from tika import parser

# Parse the PDF file (tika-python starts a local Tika server on first use)
parsed_pdf = parser.from_file("888.pdf")

# parsed_pdf is a dict: 'content' holds the extracted text
# and 'metadata' holds the document metadata
data = parsed_pdf['content']

# Print the extracted content
print(data)

# <class 'str'>
print(type(data))
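`parser.from_file` returns a plain dict rather than a string, and in tika-python the `'content'` value can be `None` (for example, when the Tika server sends back no text). A small guard such as the hypothetical `extract_text` helper below, a sketch and not part of tika's API, avoids passing `None` along to later string handling:

```python
from typing import Any, Dict


def extract_text(parsed: Dict[str, Any]) -> str:
    """Hypothetical helper: return the text from a tika-python result dict.

    `parsed` is assumed to be the dict returned by parser.from_file(),
    which stores the extracted text under 'content' (possibly None).
    Returns '' when there is no content.
    """
    content = parsed.get("content")
    if content is None:
        return ""
    # Extracted PDF text often starts/ends with blank lines; trim them.
    return content.strip()
```

Usage would be `print(extract_text(parser.from_file("888.pdf")))`; the helper takes the dict as input so it does not depend on a running Tika server.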

Error output:
[MainThread ] [WARNI] Failed to see startup log message; retrying...
  File "c:\Users\abc83\OneDrive\Documents\VScode\edgedriver.py", line 14, in
    parsed_pdf = parser.from_file("888.pdf")
  File "C:\Users\abc83\AppData\Local\Programs\Python\Python39\lib\site-packages\tika\parser.py", line 40, in from_file
    output = parse1(service, filename, serverEndpoint, headers=headers, config_path=config_path, requestOptions=requestOptions)
  File "C:\Users\abc83\AppData\Local\Programs\Python\Python39\lib\site-packages\tika\tika.py", line 336, in parse1
    status, response = callServer('put', serverEndpoint, service, f,
  File "C:\Users\abc83\AppData\Local\Programs\Python\Python39\lib\site-packages\tika\tika.py", line 531, in callServer
    serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath, config_path)
  File "C:\Users\abc83\AppData\Local\Programs\Python\Python39\lib\site-packages\tika\tika.py", line 598, in checkTikaServer
    status = startServer(jarPath, TikaJava, TikaJavaArgs, serverHost, port, classpath, config_path)
  File "C:\Users\abc83\AppData\Local\Programs\Python\Python39\lib\site-packages\tika\tika.py", line 686, in startServer
    if "Started Apache Tika server at" in tika_log_file_tmp.read():
  File "C:\Users\abc83\AppData\Local\Programs\Python\Python39\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa4 in position 0: invalid start byte
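The last frames show tika-python reading the Tika server's startup log (`tika_log_file_tmp.read()`) as UTF-8. Byte `0xa4` has the bit pattern `10xxxxxx`, which UTF-8 only allows as a continuation byte, so it can never start a character. That is consistent with the log having been written in a non-UTF-8 legacy encoding. On a Traditional Chinese Windows install the default code page is cp950 (Big5), where `0xa4` is a common lead byte, but that this machine uses cp950 is an assumption, not something the traceback proves. The sketch below reproduces the same error with only the standard library; `utf8_error` and `decode_log` are hypothetical names used for illustration:

```python
import locale

# Two bytes shaped like the start of the failing log; 0xa4 cannot begin a UTF-8 character.
raw = b"\xa4\xa4"


def utf8_error(data: bytes):
    """Return the UnicodeDecodeError from a strict UTF-8 decode, or None if it succeeds."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


def decode_log(data: bytes) -> str:
    """Hypothetical helper: decode log bytes without raising.

    Tries UTF-8 first, then the locale's preferred encoding (assumed to be
    cp950 on a Traditional Chinese Windows system), and finally UTF-8 with
    undecodable bytes replaced by U+FFFD.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    fallback = locale.getpreferredencoding(False)
    try:
        return data.decode(fallback)
    except (UnicodeDecodeError, LookupError):
        return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    err = utf8_error(raw)
    print(err.start, err.reason)   # → 0 invalid start byte (matches the traceback)
    print(ascii(decode_log(raw)))  # escaped with ascii() so it prints on any console
```

The same two bytes decode cleanly with `raw.decode("cp950")`, which is why the locale encoding is the first fallback. Since the failing `read()` happens inside tika-python's own `startServer`, this sketch only illustrates the failure mode; it does not patch the library.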
