iT邦幫忙

pandas concat raises KeyError when reading the data

To parse the text of a PDF file, I used tabula together with concat from the pandas module.
However, reading values from the concatenated result raises a KeyError.

# Convert the PDF to text

try:
    df = tabula.read_pdf(input_path=pdffile, output_format='dataframe', lattice=True, multiple_tables=True, pages="all", guess=False,
                         encoding='utf-8')
    df = pd.concat(df)
except Exception as e:
    print(e)
df.to_csv("outing.csv", index=None, header=True)
get_sh_content(df)

# Read the data

def get_sh_content(df):
    ans = []
    print(len(df.index))
    df.fillna(0)
    print(df[df.columns[0]][len(df.index)])

This produces the following error:

File "test.py", line 44, in get_sh_content
    print(df[df.columns[0]][len(df.index)])
  File "C:\Users\user\Anaconda3\lib\site-packages\pandas\core\series.py", line 1068, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\user\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4730, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 126, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 152, in pandas._libs.index.IndexEngine._get_loc_duplicates
  File "pandas\_libs\index_class_helper.pxi", line 122, in pandas._libs.index.Int64Engine._maybe_get_bool_indexer
KeyError: 5237

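Is this caused by NA values in the data, or by something else?
I exported the data to a CSV file to inspect it and found many empty values. I tried replacing them with fillna, but that did not solve the problem.
Any pointers from someone who knows would be much appreciated!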

1 Answer

listennn08
iT邦研究生 Level 3 ‧ 2019-11-29 17:23:39

The error I get is:
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

This version runs successfully for me:

df = tabula.read_pdf(input_path=pdffile, output_format='dataframe', pages="all", guess=False,encoding='utf-8')
res = df.values.tolist()

If you just want to turn the DataFrame into a list, df.values.tolist() is all you need.
https://ithelp.ithome.com.tw/upload/images/20191129/20117165RpytPmExEp.png
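
A possible explanation for the TypeError above, offered as an assumption rather than something confirmed in this thread: depending on the tabula-py version and the multiple_tables argument, read_pdf may hand back either a single DataFrame or a list of DataFrames, and pd.concat only accepts the latter. The sketch below normalizes both cases. The helper name to_single_frame is hypothetical, and the small frames are made-up stand-ins for tabula output so the example runs without a PDF.

```python
import pandas as pd


def to_single_frame(result):
    """Return one DataFrame whether `result` is a DataFrame or a list of them.

    Hypothetical helper for illustration; not part of tabula or pandas.
    """
    if isinstance(result, pd.DataFrame):
        # Already a single table: pd.concat(result) would raise TypeError here.
        return result.reset_index(drop=True)
    tables = list(result)
    if not tables:
        # pd.concat([]) raises ValueError, so return an empty frame instead.
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all tables.
    return pd.concat(tables, ignore_index=True)


# Stand-ins for what tabula.read_pdf might return (made-up data).
single = pd.DataFrame({"a": [1, 2]})
many = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3]})]

print(len(to_single_frame(single)))
print(len(to_single_frame(many)))
```

With something like this in place, the same downstream code can be used whether or not multiple_tables=True is passed.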


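Your problem is probably not limited to the above. I also can't tell what the reading code in your second block is supposed to do.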
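
As for the original KeyError, a likely cause (an editorial guess based on the traceback, which passes through _get_loc_duplicates) is that pd.concat keeps each table's own row labels, so the index repeats 0, 1, 2, ... and never contains a label equal to len(df.index). Indexing a Series with an integer on an integer index is a label lookup, so it fails regardless of NA values. Separately, df.fillna(0) returns a new DataFrame and leaves df unchanged unless the result is assigned. The sketch below reproduces this with made-up tables in place of tabula output, then shows one possible fix using ignore_index=True and positional access with iloc.

```python
import pandas as pd

# Two small made-up tables standing in for pages parsed from a PDF.
tables = [
    pd.DataFrame({"value": [1.0, None, 3.0]}),
    pd.DataFrame({"value": [4.0, 5.0]}),
]

# Plain concat keeps each table's own labels, so they repeat.
stacked = pd.concat(tables)
print(list(stacked.index))  # [0, 1, 2, 0, 1]

try:
    # Label lookup: there is no row labelled 5, so this raises KeyError.
    stacked[stacked.columns[0]][len(stacked.index)]
except KeyError as err:
    print("KeyError:", err)

# Possible fix: renumber the rows, keep the result of fillna,
# and read the last row by position instead of by label.
df = pd.concat(tables, ignore_index=True)
df = df.fillna(0)
last_value = df[df.columns[0]].iloc[-1]
print(last_value)
```

Even with a clean 0..n-1 index, the label len(df.index) is one past the last row, so iloc[-1] (or the label len(df.index) - 1) is the safer way to reach the final value.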
