{DAY 26} Matplotlib Basics

2021 iThome Ironman Contest

AI & Data

Part 26 of the series "From Databases to Data Analysis and Visualization"

13th Ironman Contest

yywoli

Team: Obit0 Studio

2021-10-08 00:26:57

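Introduction

Today's post takes a closer look at the core concepts of matplotlib.

It is split into two main parts:

Data types passed to plotting functions
General plotting concepts

How a chart is drawn always depends on what you need it for.

Examples found online or taught in class usually build separate x and y arrays with numpy and pass them to a function.

In real-world work, though, the data usually lives in a DataFrame.

So in this post I will first lay out the concepts and the commonly used parameters.

It is like learning how to use a tool first.

After that, you can adjust the parameters to get the kind of plot you want.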

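Data types passed to plotting functions

Pay special attention to the type of the data you pass to a plotting function.

The official matplotlib documentation says: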

All of plotting functions expect numpy.array or numpy.ma.masked_array as input. Classes that are 'array-like' such as pandas data objects and numpy.matrix may or may not work as intended. It is best to convert these to numpy.array objects prior to plotting.

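So we need to check that the data we pass in is a NumPy array.

pandas objects and numpy.matrix sometimes work,

but converting to numpy.array is the safest choice.

pandas.DataFrame

Use .values to convert a DataFrame into a NumPy array.

Always remember to import the packages you need before using them.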

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

a = pd.DataFrame(np.random.rand(3, 5), columns = list('abcde'))
a_asarray = a.values
print(a)
print(type(a))

The output shows what a originally looks like

and that its type is DataFrame.

Next, look at the result of the .values conversion and its type.

print(a_asarray)
print(type(a_asarray))

The data has been converted to numpy.ndarray, the type that plotting functions expect.
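
For reference, pandas also provides `DataFrame.to_numpy()`, which the pandas documentation recommends over `.values`. A minimal sketch, assuming a small made-up DataFrame (the name `scores`, its column names, and its values are purely illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative data only; any numeric DataFrame behaves the same way.
scores = pd.DataFrame({"math": [70, 85, 90], "reading": [65, 80, 95]})

as_values = scores.values      # the attribute used in this article
as_numpy = scores.to_numpy()   # the method the pandas docs recommend

print(type(as_numpy))
print(np.array_equal(as_values, as_numpy))
```

Both calls should give the same array here, so either one can be passed to a plotting function.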

numpy.matrix

Use np.asarray( ) to convert it.

Create a matrix named b:
```
b = np.matrix([[1, 2], [3, 4]])
print(b)
print(type(b))
```
Then convert it with np.asarray(b):
```
b_asarray = np.asarray(b)
print(b_asarray)
print(type(b_asarray))
```
Again, the data has been converted to numpy.ndarray.
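
One reason the conversion matters: a `numpy.matrix` always stays two-dimensional, so selecting a row does not give the 1-D sequence that most plotting calls expect. A small sketch of the difference, using the same values as `b` (the names `row_from_matrix` and `row_from_array` are only for illustration):

```python
import numpy as np

b = np.matrix([[1, 2], [3, 4]])
b_asarray = np.asarray(b)

row_from_matrix = b[0]         # still a 2-D matrix
row_from_array = b_asarray[0]  # a plain 1-D array

print(row_from_matrix.shape)   # (1, 2)
print(row_from_array.shape)    # (2,)
```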

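General plotting concepts

This part continues with the dataset used in the earlier pandas exercises,

a record of student exam scores found on Kaggle:

Students Performance in Exams

First import all the packages we need: pandas, numpy, and matplotlib.

Then print part of the data

to see how it is structured.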

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv("StudentsPerformance.csv")
df.head()

Suppose we want to analyze the mean score in each subject after grouping by race/ethnicity.

First look at the grouped data and its type.

race_ethnicity = df.groupby("race/ethnicity").mean(numeric_only=True)
print(race_ethnicity)
print(type(race_ethnicity))

After grouping, the output shows the mean score of each group, from group A to group E, in every subject.
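
Since that result depends on the CSV file, here is the same grouping step on a tiny hand-made DataFrame, so the output can be checked by eye. The rows are invented for illustration and are not taken from the Kaggle dataset; `numeric_only=True` is passed so that text columns such as `gender` are skipped when averaging.

```python
import pandas as pd

# Invented rows that only mimic the column names of the dataset.
toy = pd.DataFrame({
    "race/ethnicity": ["group A", "group A", "group B", "group B"],
    "gender": ["female", "male", "female", "male"],
    "math score": [60, 80, 70, 90],
    "reading score": [65, 75, 85, 95],
})

# Group by the category column and average the numeric columns per group.
toy_means = toy.groupby("race/ethnicity").mean(numeric_only=True)

print(toy_means)
print(type(toy_means))  # the result is still a DataFrame
```

The group labels become the index of the result, which is why they can later be used directly as bar labels.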

Bar Chart

Suppose we want a bar chart of each group's math score.

First extract the math score of each group.

math = race_ethnicity["math score"]
math

There are two ways to draw a bar chart.

plt.bar(x,y)

The official documentation explains how to use the parameters inside the parentheses.

First draw a vertical bar chart:
```
plt.bar(math.index, math)
```
To draw it horizontally, just change bar to barh:
```
plt.barh(math.index, math)
```
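
Put together as a standalone script, the two calls might look like the sketch below. The averages in `math_demo` are placeholder numbers rather than results from the dataset, and the `Agg` backend is an assumption made so that the script runs without a display (leave that line out in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder averages, indexed by group name like the Series `math`.
math_demo = pd.Series(
    [61.0, 63.5, 64.5],
    index=["group A", "group B", "group C"],
    name="math score",
)

# Hand matplotlib plain containers instead of pandas objects.
labels = math_demo.index.tolist()
heights = math_demo.to_numpy()

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

ax_v.bar(labels, heights)    # vertical bars
ax_v.set_ylabel("math score")

ax_h.barh(labels, heights)   # horizontal bars
ax_h.set_xlabel("math score")

fig.tight_layout()
```

To view the result, call `plt.show()` in an interactive session or `fig.savefig(...)` with a file name of your choice.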
df.plot()

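Remember to specify the chart style inside the parentheses with kind="bar".

The same approach works for every chart type you might want.

The parameters are introduced later in this post.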
```
math.plot(kind="bar")
```
```
math.plot(kind="barh")
```
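To switch to a horizontal chart, you likewise only need to change the parameter to "barh".

Because plotting basically only requires changing the kind parameter,

now that the bar chart is drawn

we can use df.plot( ) to introduce the parameters that go inside the parentheses.

After that, you can choose the chart type and adjust how the chart looks by changing parameters.

First, here is how the df.plot( ) function is called; the parameters shown are only a subset.

For the full list, check the official documentation.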
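
Unlike calling `plt.bar` directly, the pandas `plot` method returns the matplotlib `Axes` it drew on, which can then be adjusted further. A brief sketch with placeholder values (not dataset results), again assuming the non-interactive `Agg` backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in a notebook
import pandas as pd

# Placeholder averages standing in for the Series `math`.
math_demo = pd.Series(
    [61.0, 63.5, 64.5],
    index=["group A", "group B", "group C"],
    name="math score",
)

ax = math_demo.plot(kind="bar")  # returns a matplotlib Axes
ax.set_ylabel("math score")      # further tweaks go through the Axes

# The index values are used as the tick labels on the x axis.
print([tick.get_text() for tick in ax.get_xticklabels()])
```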
```
DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False, 
                sharex=None, sharey=False, layout=None, figsize=None, 
                use_index=True, title=None, grid=None, legend=True, 
                style=None,xticks=None, yticks=None, xlim=None, ylim=None, rot=None, 
                fontsize=None, **kwds)
```
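
Several of these keyword arguments can be combined in a single call. The sketch below uses an invented two-column DataFrame named `means_demo` (not the real results) and assumes the `Agg` backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in a notebook
import pandas as pd

# Placeholder group averages; the numbers are made up.
means_demo = pd.DataFrame(
    {"math score": [61.0, 63.5, 64.5], "reading score": [64.0, 67.0, 69.0]},
    index=["group A", "group B", "group C"],
)

ax = means_demo.plot(
    kind="bar",        # chart type
    figsize=(8, 4),    # width and height in inches
    title="Average score by group (placeholder data)",
    grid=True,         # show grid lines
    legend=True,       # show the column names as a legend
    rot=0,             # keep the x tick labels horizontal
    ylim=(0, 100),     # fix the y-axis range
)
```

Each column becomes one set of bars, so this call draws six bars in total.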
The descriptions below come from the official pandas documentation.

I will introduce a few commonly used parameters.

pandas.DataFrame.plot - pandas 1.3.2 documentation
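1. data: Series or DataFrame
  
  The data to plot; it can be a Series or a DataFrame.
2. x: label or position, default None
  
  The label or position of the column to use for the x axis; only used when the data is a DataFrame.
3. y: label, position or list of label, positions, default None
  
  The label(s) or position(s) of the column(s) to plot against x; like x, only used when the data is a DataFrame.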
4. kind: str
  
  The type of chart
  
  • ‘line’ : line chart
  • ‘bar’ : vertical bar chart
  • ‘barh’ : horizontal bar chart
  • ‘hist’ : histogram
  • ‘box’ : box plot
  • ‘density’ : density plot
  • ‘pie’ : pie chart
  • ‘scatter’ : scatter plot (DataFrame only)
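5. ax: matplotlib axes object, default None
  
  The axes (subplot) to draw on; if none is given, one is created automatically.
6. subplots: boolean, default False
  
  Whether to draw a separate subplot for each column; the default is False.
7. sharex: bool, default True if ax is None else False
  
  Whether subplots share the x axis (True by default when no ax is passed).
8. sharey: bool, default False
  
  Whether subplots share the y axis.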
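9. layout: tuple, optional
  
  A tuple (rows, columns) that sets how the subplots are arranged.
10. figsize: a tuple (width, height) in inches
  
  A tuple (width, height) that sets the size of the figure.
11. use_index: bool, default True
  
  Whether to use the index as the ticks on the x axis; by default the DataFrame's index is used as the labels.
12. title: str or list
  
  The title of the chart.
13. grid: bool, default None (matlab style default)
  
  Whether to show grid lines on the chart.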
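14. legend: bool or {‘reverse’}
  
  Whether to show the legend.
15. style: list or dict
  
  The line style to use for each column.
16. xlabel: label, optional
  
  The name of the x axis; by default the index name or the column name is used.
17. ylabel: label, optional
  
  The name of the y axis; by default none is shown.
18. color: str or list
  
  The color or colors to use.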
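
To see how the subplot-related parameters in this list work together, here is a hedged sketch that draws one subplot per column. The DataFrame `means_demo` holds invented numbers, the axis label text is my own choice, and the `Agg` backend is assumed so the code runs without a display. The `xlabel` and `ylabel` arguments need a reasonably recent pandas version.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in a notebook
import pandas as pd

# Placeholder group averages; the numbers are made up.
means_demo = pd.DataFrame(
    {"math score": [61.0, 63.5, 64.5], "reading score": [64.0, 67.0, 69.0]},
    index=["group A", "group B", "group C"],
)

axes = means_demo.plot(
    kind="bar",
    subplots=True,                     # one subplot per column
    layout=(1, 2),                     # 1 row, 2 columns of subplots
    sharey=True,                       # both subplots share the y axis
    figsize=(10, 4),
    legend=False,
    xlabel="race/ethnicity",
    ylabel="average score",
    color=["tab:blue", "tab:orange"],  # one color per column
)

print(axes.shape)  # an array of Axes arranged like the layout
```

With `subplots=True` the return value is an array of `Axes` rather than a single one, so individual subplots can be reached by indexing, for example `axes[0, 1]`.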