pandas concat raises a KeyError when reading the data

pandas python concat

Huiicat 2019-11-29 16:35:52 ‧ 5629 views


While parsing the body text of a PDF file, I used tabula together with pandas' concat,
but I get a KeyError when reading the values that concat produces.

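```python
# Convert the PDF to text
import pandas as pd
import tabula

try:
    # pdffile: path to the PDF, defined elsewhere in the script
    # multiple_tables=True makes read_pdf return a list of DataFrames, one per table
    df = tabula.read_pdf(input_path=pdffile, output_format='dataframe', lattice=True, multiple_tables=True,
                         pages="all", guess=False, encoding='utf-8')
    df = pd.concat(df)  # stack all tables into one DataFrame
except Exception as e:
    print(e)
df.to_csv("outing.csv", index=False, header=True)
get_sh_content(df)
```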

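```python
# Reading
def get_sh_content(df):
    ans = []
    print(len(df.index))  # number of rows
    df.fillna(0)  # returns a new DataFrame; the result is not assigned back
    print(df[df.columns[0]][len(df.index)])  # first column, row labelled len(df.index)
```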

This produces the following error:

```
File "test.py", line 44, in get_sh_content
    print(df[df.columns[0]][len(df.index)])
  File "C:\Users\user\Anaconda3\lib\site-packages\pandas\core\series.py", line 1068, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\user\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4730, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 126, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 152, in pandas._libs.index.IndexEngine._get_loc_duplicates
  File "pandas\_libs\index_class_helper.pxi", line 122, in pandas._libs.index.Int64Engine._maybe_get_bool_indexer
KeyError: 5237
```
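The `_get_loc_duplicates` frame in the traceback hints that the concatenated index contains repeated labels. A likely cause (an assumption, not confirmed in the thread): each table tabula returns keeps its own 0-based index, and `pd.concat` keeps those indexes as-is unless `ignore_index=True` is passed, so the largest label ends up far smaller than the total row count. A minimal sketch with two small hypothetical frames standing in for real tabula output:

```python
import pandas as pd

# Hypothetical stand-ins for two tables tabula extracted from different pages;
# each one carries its own 0-based RangeIndex
tables = [
    pd.DataFrame({"col": ["a", "b", "c"]}),
    pd.DataFrame({"col": ["d", "e"]}),
]

stacked = pd.concat(tables)
print(stacked.index.tolist())     # [0, 1, 2, 0, 1]
print(stacked.index.is_unique)    # False
print(len(stacked.loc[0]))        # 2: label 0 now matches one row from each table

# ignore_index=True drops the per-table indexes and builds a fresh 0..n-1 index
renumbered = pd.concat(tables, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```

Renumbering makes labels line up with row positions, but it does not by itself make a lookup of label `len(df.index)` succeed.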

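Is this caused by the NA values in the data, or by something else?
I exported the data to a CSV file to inspect it and found many empty values. I tried replacing them with fillna, but that didn't solve it either.
I'd really appreciate a pointer from anyone who knows!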
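NaN values do not cause this KeyError. The failing line looks up the row *labelled* `len(df.index)`, and even with a clean 0..n-1 index the last valid label is `len(df.index) - 1`, so that label never exists. Separately, `df.fillna(0)` returns a new DataFrame and leaves `df` unchanged unless the result is assigned back. A minimal sketch with a hypothetical three-row frame, assuming the intent is to read the last value of the first column:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row frame with a missing value; its labels are 0, 1, 2
df = pd.DataFrame({"col": ["a", np.nan, "c"]})

n = len(df.index)  # 3: one past the last valid label
caught_key_error = False
try:
    df[df.columns[0]][n]  # label-based lookup for 3 -> KeyError, NaN or not
except KeyError as exc:
    caught_key_error = True
    print("KeyError:", exc)

# Position-based access to the last row does not depend on the index labels
last_value = df.iloc[-1, 0]
print(last_value)  # c

# fillna returns a new DataFrame; without reassignment df keeps its NaN
df.fillna(0)
print(df["col"].isna().sum())  # 1
df = df.fillna(0)
print(df["col"].isna().sum())  # 0
```

`df.iloc[len(df.index) - 1, 0]` is an equivalent, more explicit spelling of `df.iloc[-1, 0]`.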


1 answer

listennn08

iT邦 expert, level 5 ‧ 2019-11-29 17:23:39

The error I get is:

```
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
```
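This TypeError is what `pd.concat` raises when it is handed a single DataFrame instead of a list of them. Whether `tabula.read_pdf` returns one DataFrame or a list depends on the tabula-py version and the `multiple_tables` option (worth checking against the installed version; the thread does not confirm which). A small sketch that normalises the input first; `as_frame_list` is a hypothetical helper and `single` stands in for whatever `read_pdf` returned:

```python
import pandas as pd

def as_frame_list(result):
    """Hypothetical helper: wrap a lone DataFrame in a list so pd.concat accepts it."""
    if isinstance(result, pd.DataFrame):
        return [result]
    return list(result)

single = pd.DataFrame({"col": [1, 2]})  # stand-in for a read_pdf result

caught = None
try:
    pd.concat(single)  # a bare DataFrame is rejected as the first argument
except TypeError as exc:
    caught = exc
    print("TypeError:", exc)

combined = pd.concat(as_frame_list(single), ignore_index=True)
print(combined["col"].tolist())  # [1, 2]
```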

This version runs successfully for me:

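```python
import tabula

# Without multiple_tables=True, read_pdf returned a single DataFrame here, so .values works directly
df = tabula.read_pdf(input_path=pdffile, output_format='dataframe', pages="all", guess=False, encoding='utf-8')
res = df.values.tolist()  # one inner list per row
```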

If you want to convert the DataFrame into a list, df.values.tolist() is all you need.
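For reference, a quick illustration of `df.values.tolist()` on a small hypothetical frame of strings. `df.to_numpy()` is the newer spelling of `.values` and gives the same list here; note that the column headers are not part of the result:

```python
import pandas as pd

# Hypothetical two-row table of strings, like text cells extracted from a PDF
df = pd.DataFrame({"name": ["a", "b"], "code": ["x", "y"]})

rows = df.values.tolist()            # one inner list per row, in column order
print(rows)                          # [['a', 'x'], ['b', 'y']]

same_rows = df.to_numpy().tolist()   # recommended equivalent of .values

# Column headers are not included; prepend them if needed
with_header = [df.columns.tolist()] + rows
print(with_header)                   # [['name', 'code'], ['a', 'x'], ['b', 'y']]
```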

I suspect your problem goes beyond the above, though; I also can't tell what the reading code further down is trying to do.


IT邦幫忙