## [Day 19] Data Visualization (2): Seaborn

Seaborn is a library for making attractive and informative statistical graphics in Python. It is built on top of matplotlib and tightly integrated with the PyData stack, including support for numpy and pandas data structures and statistical routines from scipy and statsmodels.
Seaborn: statistical data visualization

• Histogram
• Scatter plot
• Line plot
• Bar plot
• Box plot

Seaborn is not installed in our development environment, but we can install it from the terminal with the `conda` command.

```
$ conda install -c anaconda seaborn=0.7.1
```

In a Jupyter Notebook, add the following line so that figures are rendered inline:

```python
%matplotlib inline
```
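The `%matplotlib inline` line is an IPython/Jupyter magic command, so it only works inside a notebook. As a minimal sketch, assuming the same kind of code is run as a plain Python script (for example from an editor such as Visual Studio Code), the magic line is dropped and the figure is either shown with `plt.show()` or written to disk with `savefig()`. The `Agg` backend, the sample size, and the output file name are assumptions chosen so the sketch also runs on a machine without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; delete this line to open a window instead

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Draw 1,000 samples from the standard normal distribution
samples = np.random.normal(size=1000)

# Seaborn plotting functions return a matplotlib Axes object
ax = sns.boxplot(x=samples)
ax.set_title("Standard normal samples")

# Hypothetical output location, used only for this illustration
output_path = os.path.join(tempfile.gettempdir(), "seaborn_boxplot.png")
ax.figure.savefig(output_path)

# With an interactive backend, this call opens the figure window:
# plt.show()
```

Saving to a file is the more portable choice; `plt.show()` is the usual way to view the figure when a desktop session is available.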

## Histogram

### Python

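```python
%matplotlib inline

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Draw 100,000 samples from the standard normal distribution (mean 0, standard deviation 1)
normal_samples = np.random.normal(size=100000)

# distplot draws a histogram with a kernel density curve on top
# (deprecated since seaborn 0.11 in favor of histplot/displot)
sns.distplot(normal_samples)
```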

### R

```r
library(ggplot2)

# Draw 100,000 samples from the standard normal distribution (mean 0, standard deviation 1)
normal_samples <- rnorm(100000)
normal_samples_df <- data.frame(normal_samples)
ggplot(normal_samples_df, aes(x = normal_samples)) + geom_histogram(aes(y = ..density..)) + geom_density()
```

## Scatter plot

### Python

```python
%matplotlib inline

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Speed and stopping distance from R's built-in cars dataset, entered by hand
speed = [4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 22, 23, 24, 24, 24, 24, 25]
dist = [2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80, 20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32, 48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85]

cars_df = pd.DataFrame({
    "speed": speed,
    "dist": dist
})

# Scatter plot with a histogram of each variable in the margins
sns.jointplot(x="speed", y="dist", data=cars_df)
```

### R

```r
library(ggplot2)
library(ggExtra)

scatter_plot <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()
ggMarginal(scatter_plot, type = "histogram")
```

## Line plot

### Python

```python
%matplotlib inline

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Speed and stopping distance from R's built-in cars dataset, entered by hand
speed = [4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 22, 23, 24, 24, 24, 24, 25]
dist = [2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80, 20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32, 48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85]

cars_df = pd.DataFrame({
    "speed": speed,
    "dist": dist
})

# ci=None turns off the confidence-interval error bars
# (factorplot was renamed catplot in seaborn 0.9)
sns.factorplot(data=cars_df, x="speed", y="dist", ci=None)
```
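With its default settings, `factorplot` draws a point plot: it treats each distinct `speed` as a category, averages the `dist` values that share that speed, and joins the means with line segments. The sketch below makes that aggregation explicit with pandas and then plots the means with `sns.pointplot`. The name `mean_dist_df` is illustrative and not part of the original code.

```python
import pandas as pd
import seaborn as sns

speed = [4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 22, 23, 24, 24, 24, 24, 25]
dist = [2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80, 20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32, 48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85]
cars_df = pd.DataFrame({"speed": speed, "dist": dist})

# One row per distinct speed, holding the mean stopping distance at that speed
mean_dist_df = cars_df.groupby("speed", as_index=False)["dist"].mean()
print(mean_dist_df.head())

# Plot the per-speed means as points joined by lines
ax = sns.pointplot(x="speed", y="dist", data=mean_dist_df)
```

Because of this averaging, the Python figure is not identical to the ggplot2 version in the R section, where `geom_line()` connects every observation.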

### R

```r
library(ggplot2)

ggplot(cars, aes(x = speed, y = dist)) + geom_line()
```

## Bar plot

### Python

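```python
%matplotlib inline

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Number of cylinders for each car in R's built-in mtcars dataset, entered by hand
cyl = [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4]
cyl_df = pd.DataFrame({"cyl": cyl})

# One bar per distinct value, with height equal to the number of rows
sns.countplot(x="cyl", data=cyl_df)
```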
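`countplot` tallies how often each value of `cyl` occurs and draws one bar per value. As a quick check on the bar heights, the same tally can be computed directly with pandas. The variable name `cyl_counts` is illustrative.

```python
import pandas as pd

cyl = [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4]
cyl_df = pd.DataFrame({"cyl": cyl})

# Count the rows for each cylinder value, ordered by cylinder count
cyl_counts = cyl_df["cyl"].value_counts().sort_index()
print(cyl_counts)
```

For this data the tally is 11 cars with 4 cylinders, 7 with 6, and 14 with 8.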

### R

```r
library(ggplot2)

ggplot(mtcars, aes(x = cyl)) + geom_bar()
```

## Box plot

### Python

```python
%matplotlib inline

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Draw 100,000 samples from the standard normal distribution (mean 0, standard deviation 1)
normal_samples = np.random.normal(size=100000)

# Passing the samples as x draws a horizontal box plot
sns.boxplot(x=normal_samples)
```
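A box plot summarizes the sample through its quartiles: the box spans the first to the third quartile, the line inside marks the median, and, under the default matplotlib/seaborn convention, each whisker reaches the most extreme point within 1.5 times the interquartile range of the box. A minimal sketch computing those numbers with NumPy; the seed and the variable names are illustrative.

```python
import numpy as np

# Fixed seed so the numbers are reproducible (illustrative choice)
rng = np.random.RandomState(42)
normal_samples = rng.normal(size=100000)

# Box edges and median
q1, median, q3 = np.percentile(normal_samples, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme samples that are still inside the fences
lower_whisker = normal_samples[normal_samples >= lower_fence].min()
upper_whisker = normal_samples[normal_samples <= upper_fence].max()

# Share of samples drawn as outliers
outlier_share = np.mean((normal_samples < lower_fence) | (normal_samples > upper_fence))

print("Q1, median, Q3:", q1, median, q3)
print("Whiskers:", lower_whisker, upper_whisker)
print("Outlier share:", outlier_share)
```

For a standard normal sample the quartiles sit near -0.67 and 0.67, and roughly 0.7% of the points fall outside the fences, which is why the plot of 100,000 samples shows many outlier markers.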

### R

```r
library(ggplot2)

# Draw 100,000 samples from the standard normal distribution (mean 0, standard deviation 1)
normal_samples <- rnorm(100000)
normal_samples_df <- data.frame(normal_samples)
ggplot(normal_samples_df, aes(y = normal_samples, x = 1)) + geom_boxplot() + coord_flip()
```

### 1 Comment

pac2004
iT邦 Newbie, Level 5 ‧ 2018-01-22 16:44:47

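In Jupyter you have to add `%matplotlib inline`. If I run the Seaborn histogram from a development tool such as Visual Studio Code instead, how do I get the figure to display at the end? For example:

```python
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Draw 100,000 samples from the standard normal distribution (mean 0, standard deviation 1)
normal_samples = np.random.normal(size = 100000)
sns.distplot(normal_samples)
```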