iT邦幫忙

Data analysis package: pandas-profiling


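Background

Whenever I get a new dataset, I end up doing the same repetitive exploratory work in pandas.
Today I came across a great package: pandas-profiling.
Its author felt that describe is far too bare-bones, so this package runs the following preliminary analysis for you in one step.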

  • Essentials: type, unique values, missing values
  • Quantile statistics: minimum, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics: mean, mode, sd, sum, MAD, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histogram
  • Correlations heatmap (Spearman and Pearson)
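
To get a feel for how much the report bundles together, here is a rough sketch of computing a few of the items above by hand with plain pandas. This is not how pandas-profiling itself does it. The toy DataFrame and all variable names are made up for illustration, and MAD is assumed here to mean median absolute deviation.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, None],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": ["x", "y", "x", "x", "z"],
})

# Essentials: type, unique values, missing values
essentials = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_unique": df.nunique(),
    "n_missing": df.isna().sum(),
})

num = df.select_dtypes("number")

# Quantile statistics: Q1, median, Q3, range, interquartile range
quantiles = num.quantile([0.25, 0.5, 0.75])
iqr = quantiles.loc[0.75] - quantiles.loc[0.25]
value_range = num.max() - num.min()

# Descriptive statistics
descriptive = pd.DataFrame({
    "mean": num.mean(),
    "sd": num.std(),
    "sum": num.sum(),
    # Assumption: MAD read as median absolute deviation
    "mad": (num - num.median()).abs().median(),
    "cv": num.std() / num.mean(),
    "kurtosis": num.kurt(),
    "skewness": num.skew(),
})

# Most frequent values of one column
top_values = df["c"].value_counts().head(3)

# Correlation matrices (the report draws these as heatmaps)
pearson = num.corr(method="pearson")
spearman = num.corr(method="spearman")

print(essentials)
print(descriptive)
```

That is already a fair amount of code for three columns, which is exactly the repetitive work the package takes over.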

Walkthrough

Installation (choose one)

pip install pandas-profiling
conda install pandas-profiling

Requirements
The current version is online-only: it needs a network connection to download Bootstrap and jQuery.

Prepare the data

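import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version

# Load once and reuse the result for both the values and the column names
boston = load_boston()
df = pd.DataFrame(data=boston["data"], columns=boston["feature_names"])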

Run the analysis

import pandas_profiling

profile = pandas_profiling.ProfileReport(df)
profile.to_file(outputfile="output.html")  # writes the report as an HTML file

ProfileReport Attributes

df : DataFrame
  Data to be analyzed
bins : int
  Number of bins in histogram.
  The default is 10.
check_correlation : boolean
  Whether or not to check correlation.
  It's True by default.
correlation_threshold: float
  Threshold to determine if the variable pair is correlated.
  The default is 0.9.
correlation_overrides : list
  Variable names not to be rejected because they are correlated.
  There is no variable in the list (None) by default.
check_recoded : boolean
  Whether or not to check recoded correlation (memory heavy feature).
  Since it's an expensive computation it can be activated for small datasets.
  check_correlation must be true to enable this check.
  It's False by default.
pool_size : int
  Number of workers in thread pool
  The default is equal to the number of CPU.
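
To make bins and correlation_threshold a bit more concrete, here is a minimal sketch in plain pandas and NumPy. It is not pandas-profiling's implementation: highly_correlated is a hypothetical helper, the toy data is made up, and the rule "flag a column when its absolute Pearson correlation with an earlier, unflagged column exceeds the threshold" is my assumption about what this kind of check amounts to.

```python
import numpy as np
import pandas as pd


def highly_correlated(df, threshold=0.9):
    """Map each flagged column to the earlier column it correlates with.

    Hypothetical helper for illustration only; not part of pandas-profiling.
    """
    corr = df.select_dtypes("number").corr(method="pearson")
    flagged = {}
    cols = list(corr.columns)
    for i, col in enumerate(cols):
        for earlier in cols[:i]:
            if earlier in flagged:
                continue  # compare only against columns that were kept
            if abs(corr.loc[col, earlier]) > threshold:
                flagged[col] = earlier
                break
    return flagged


# Made-up data: x_scaled is a linear function of x, other is unrelated
x = np.arange(10.0)
toy = pd.DataFrame({
    "x": x,
    "x_scaled": 2 * x + 1,
    "other": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
})

flagged = highly_correlated(toy, threshold=0.9)
print(flagged)

# bins: the number of equal-width intervals used for a histogram
counts, edges = np.histogram(toy["x"], bins=10)
print(counts, edges)
```

Raising the threshold flags fewer columns, and a column listed in correlation_overrides would simply be exempt from this kind of check.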

Methods

get_description
   Return the description (a raw statistical summary) of the dataset.
get_rejected_variables
   Return the list of rejected variables, or an empty list if there are none.
to_file
   Write the report to a file.
to_html
   Return the report as an HTML string.

https://ithelp.ithome.com.tw/upload/images/20190507/20117325Zx29AZWvXA.jpg

Click into a variable to see its details

https://ithelp.ithome.com.tw/upload/images/20190507/201173251cKxP5Q2Su.jpg

Good things are worth sharing. Really convenient, isn't it? Thanks and praise to the author!!


Reference:

Official site
