# Import packages
import pandas as pd
import numpy as np
Pie plots show how parts relate to a whole. Use DataFrame.plot.pie() or Series.plot.pie() to draw them. For a DataFrame, either pick a target column with the y argument or pass subplots=True to draw one pie per column.
# Create some random data
df = pd.DataFrame(3 * np.random.rand(4, 2),
                  index=['a', 'b', 'c', 'd'], columns=['x', 'y'])
# Plot one pie per column
df.plot.pie(subplots=True, figsize=(8, 4))
series = pd.Series(3 * np.random.rand(4),
                   index=['a', 'b', 'c', 'd'], name='series')
series.plot.pie(figsize=(6, 6))
series.plot.pie(labels=['AA', 'BB', 'CC', 'DD'], colors=['r', 'g', 'b', 'c'],
                autopct='%.2f', fontsize=20, figsize=(6, 6))
# If the values sum to less than 1.0, older matplotlib versions draw a partial pie (a sector) instead of a full circle.
# Recent matplotlib versions normalize the values by default and draw a full pie.
series = pd.Series([0.1] * 4, index=['a', 'b', 'c', 'd'], name='series2')
series.plot.pie(figsize=(6, 6))
Area plots show cumulative totals. Use Series.plot.area() or DataFrame.plot.area() to draw them. Area plots are stacked by default; for a stacked plot, each column must contain either all positive or all negative values.
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area()
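To make the stacked default more concrete, here is a minimal sketch that draws the same data stacked and unstacked side by side. The names df_area, ax_stacked, ax_overlaid and totals are illustrative and not from the original code, and the fixed seed is only there for reproducibility.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data: 10 rows of non-negative random values in 4 columns
rng = np.random.default_rng(0)
df_area = pd.DataFrame(rng.random((10, 4)), columns=['a', 'b', 'c', 'd'])

fig, (ax_stacked, ax_overlaid) = plt.subplots(1, 2, figsize=(10, 4))

# Default: each column is stacked on top of the previous ones
df_area.plot.area(ax=ax_stacked, title='stacked (default)')

# stacked=False overlays the columns as semi-transparent areas
df_area.plot.area(ax=ax_overlaid, stacked=False, title='stacked=False')

# The upper edge of the stacked plot is the row-wise sum of all columns
totals = df_area.sum(axis=1)
```

In the stacked panel the top boundary follows totals, which is why mixing positive and negative values within a column would make the stacking ambiguous.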
A hexagonal bin plot combines the traits of a scatter plot and a heatmap. The keyword gridsize controls the number of hexagons in the x-direction and defaults to 100.
df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
df['b'] = df['b'] + np.arange(1000)
df.plot.hexbin(x='a', y='b', gridsize=25)
df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
df['b'] = df['b'] + np.arange(1000)
df['z'] = np.random.uniform(0, 3, 1000)
df.plot.hexbin(x='a', y='b', C='z', reduce_C_function=np.max, gridsize=25)
Parallel coordinates is a technique for plotting multivariate data: each variable gets its own vertical axis and each observation is drawn as a line across them, which makes clusters in the data visible. The example plots the well-known iris dataset.
from pandas.plotting import parallel_coordinates  # Import the parallel coordinates function
import matplotlib.pyplot as plt
data = pd.read_csv('data/iris.data')  # Read in the iris dataset
parallel_coordinates(data, 'Name')
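If the iris file is not at hand, the same call can be tried on synthetic data. The sketch below is only an illustration: the two-class data set, the feature names f1 to f3 and the chosen colors are made up for the example, not taken from the original article.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(0)
features = ['f1', 'f2', 'f3']

# Hypothetical two-class data: class 'A' is centred at 0, class 'B' at 3
frame = pd.concat([
    pd.DataFrame(rng.normal(0.0, 0.5, size=(20, 3)), columns=features).assign(Name='A'),
    pd.DataFrame(rng.normal(3.0, 0.5, size=(20, 3)), columns=features).assign(Name='B'),
], ignore_index=True)

# One line per row, colored by the value in the 'Name' column
ax = parallel_coordinates(frame, 'Name', color=['tab:blue', 'tab:orange'])
```

Because the two classes are well separated, their lines should form two distinct bundles, which is the kind of cluster structure the plot is meant to reveal.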
Autocorrelation plots are often used for checking randomness in time series. This is done by computing autocorrelations for data values at varying time lags.
from pandas.plotting import autocorrelation_plot  # Import the autocorrelation plot function
spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)
data = pd.Series(0.7 * np.random.rand(1000) + 0.3 * np.sin(spacing))
autocorrelation_plot(data)
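To see what "autocorrelations at varying time lags" means numerically, the following sketch computes a few of them with Series.autocorr() for the sine-plus-noise series and for pure noise. The names signal, noise, lags and acf, the chosen lags and the fixed seed are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)

# Same construction as above: uniform noise plus a sine wave (about 111 samples per period)
signal = pd.Series(0.7 * rng.random(1000) + 0.3 * np.sin(spacing))
# Pure uniform noise for comparison
noise = pd.Series(rng.random(1000))

# Series.autocorr(lag=k) is the Pearson correlation between the series and itself shifted by k
lags = [1, 10, 55, 111]
acf = pd.DataFrame({
    'signal': [signal.autocorr(lag=k) for k in lags],
    'noise': [noise.autocorr(lag=k) for k in lags],
}, index=pd.Index(lags, name='lag'))

print(acf.round(3))
```

For random data the autocorrelations should stay close to zero at every lag, while the periodic series shows clearly non-zero values that change sign around half a period.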
The code for this article is available on GitHub.
Please let me know if there are any mistakes in this article. Thanks for reading.
References:
[1] Course content from the 2nd 100-Day Machine Learning Marathon
[2] Visualization
[3] Statistics and Data Analysis for Engineers