iT邦幫忙

2025 iThome 鐵人賽

DAY 5

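In ggplot2, facets are one of the core elements: they split the data by a categorical variable so that the subsets can be compared side by side. This is especially helpful when the data carry several categorical attributes.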

Today's post continues with the diamonds dataset built into ggplot2. With cut fixed at Ideal and carat restricted to the 1–2 range, we look at how Carat, Color (color grade), and Clarity relate to Price.

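library(tidyverse)

# Load the diamonds dataset that ships with ggplot2
data(diamonds)

# Keep Ideal-cut diamonds of 1 to 2 carats (between() includes both endpoints)
# and rename the color column to color_grade
diam_Ideal_color_grade <- diamonds %>%
  filter(cut == "Ideal", between(carat, 1, 2)) %>%
  rename(color_grade = color)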
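
Before plotting, it can help to check how many diamonds fall into each clarity level, since sparse groups turn into nearly empty panels once facets are applied. The following is a minimal sketch; `panel_counts` and `n_diamonds` are names introduced here for illustration and do not appear in the original post.

```r
library(tidyverse)

# Rebuild the filtered data from the post so this block runs on its own
diam_Ideal_color_grade <- diamonds %>%
  filter(cut == "Ideal", between(carat, 1, 2)) %>%
  rename(color_grade = color)

# Count diamonds per clarity level (hypothetical helper table)
panel_counts <- diam_Ideal_color_grade %>%
  count(clarity, name = "n_diamonds")

print(panel_counts)
```

Levels with very few rows are worth keeping in mind when reading the faceted plots later on.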


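Example 1: Scatter plot without facets

Without facets, the plot shows the relationship between price and carat, with points colored by color grade.

  • The larger the carat, the higher the price.
  • At the same carat, color grade also affects price: grades D–F are generally priced higher than H–J.

ggplot(diam_Ideal_color_grade, aes(x = carat, y = price, color = color_grade)) +
  geom_point() +
  labs(color = "Color Grade")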

https://ithelp.ithome.com.tw/upload/images/20250905/20177964ZxFrFoa6dS.png
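
The two observations above are read off the plot by eye. One way to check them numerically is to bin carat and compare median prices per color grade within each bin. This is only a sketch: the bin width of 0.25 carat and the names `price_by_grade` and `carat_bin` are assumptions made for this example, not part of the original post.

```r
library(tidyverse)

# Rebuild the filtered data from the post so this block runs on its own
diam_Ideal_color_grade <- diamonds %>%
  filter(cut == "Ideal", between(carat, 1, 2)) %>%
  rename(color_grade = color)

# Median price per carat bin and color grade
# (bin width 0.25 is an arbitrary choice for illustration)
price_by_grade <- diam_Ideal_color_grade %>%
  mutate(carat_bin = cut_width(carat, width = 0.25, boundary = 1)) %>%
  group_by(carat_bin, color_grade) %>%
  summarise(median_price = median(price), n = n(), .groups = "drop")

print(price_by_grade, n = Inf)
```

Comparing rows within the same `carat_bin` shows whether the better color grades really carry higher median prices at similar sizes.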


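Example 2: facet_wrap

Using the facet_wrap() function, the scatter plot is split into one panel per Clarity level, so the relationship among carat, color grade, and price can be examined within each clarity level.

  • Within each clarity level, both trends remain visible: price rises with carat, and color grade affects price.
  • As clarity improves, prices overall also tend to rise.

ggplot(diam_Ideal_color_grade, aes(x = carat, y = price, color = color_grade)) +
  geom_point() +
  facet_wrap(~ clarity) +
  labs(color = "Color Grade")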

https://ithelp.ithome.com.tw/upload/images/20250905/20177964Zad56GWKPp.png
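
facet_wrap() also accepts layout and scale arguments. The sketch below uses `ncol` to fix the number of columns and `scales = "free_y"` to give each panel its own price axis. Whether free scales are appropriate here is a judgment call: they show more detail within a panel but make prices harder to compare across panels. The object name `p_wrap` is introduced only for this example.

```r
library(tidyverse)

# Rebuild the filtered data from the post so this block runs on its own
diam_Ideal_color_grade <- diamonds %>%
  filter(cut == "Ideal", between(carat, 1, 2)) %>%
  rename(color_grade = color)

# Four columns of panels, each with its own y-axis range
p_wrap <- ggplot(diam_Ideal_color_grade,
                 aes(x = carat, y = price, color = color_grade)) +
  geom_point() +
  facet_wrap(~ clarity, ncol = 4, scales = "free_y") +
  labs(color = "Color Grade")

p_wrap
```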


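Example 3: facet_grid

With facet_grid(~ clarity), the panels are placed side by side in a single row, which makes it easier to compare clarity levels horizontally.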

ggplot(diam_Ideal_color_grade, aes(x = carat, y = price, color = color_grade)) +
  geom_point() +
  facet_grid(~ clarity) +
  labs(color = "Color Grade")

https://ithelp.ithome.com.tw/upload/images/20250905/20177964r9HifR9usz.png
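
In the formula passed to facet_grid(), the left-hand side defines rows and the right-hand side defines columns, so `~ clarity` yields one column per clarity level. As a sketch of the opposite layout, the block below stacks the panels vertically using the `rows = vars(...)` interface, which is equivalent to `facet_grid(clarity ~ .)`. The object name `p_rows` is introduced only for this example.

```r
library(tidyverse)

# Rebuild the filtered data from the post so this block runs on its own
diam_Ideal_color_grade <- diamonds %>%
  filter(cut == "Ideal", between(carat, 1, 2)) %>%
  rename(color_grade = color)

# One row per clarity level; all panels share the same x-axis
p_rows <- ggplot(diam_Ideal_color_grade,
                 aes(x = carat, y = price, color = color_grade)) +
  geom_point() +
  facet_grid(rows = vars(clarity)) +
  labs(color = "Color Grade")

p_rows
```

A single row aligns the price axis across panels, while a single column aligns the carat axis, so the choice depends on which variable is being compared.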


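Example 4: Adding a regression line

To show the trend of the observations in each panel, a linear regression line is added, which makes the relationship between price and carat more apparent.

  • The regression line shows the positive relationship between price and carat more clearly.
  • The trend is nearly the same at every clarity level, which suggests that Carat is the main driver of price.

ggplot(diam_Ideal_color_grade, aes(x = carat, y = price, color = color_grade)) +
  geom_point() +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, color = "grey") +
  facet_grid(~ clarity) +
  labs(color = "Color Grade")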

https://ithelp.ithome.com.tw/upload/images/20250905/20177964YHCCcJvhot.png
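
The claim that the trend is nearly the same in every panel can be checked by fitting the same simple linear model within each clarity level and comparing the slopes. This is a minimal sketch under the assumption that a straight line of price on carat is an adequate summary; `slopes_by_clarity` is a name introduced here for illustration.

```r
library(tidyverse)

# Rebuild the filtered data from the post so this block runs on its own
diam_Ideal_color_grade <- diamonds %>%
  filter(cut == "Ideal", between(carat, 1, 2)) %>%
  rename(color_grade = color)

# Fit price ~ carat separately for each clarity level and keep the slope
slopes_by_clarity <- diam_Ideal_color_grade %>%
  group_by(clarity) %>%
  summarise(
    n     = n(),
    slope = coef(lm(price ~ carat))[["carat"]],  # price change per carat
    .groups = "drop"
  )

print(slopes_by_clarity)
```

Similar slopes across rows would support the reading of the plot, while large differences would indicate that clarity changes how strongly price responds to carat.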


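Example 5: Two-way facets (Clarity × Color)

Here facet_grid(clarity ~ color_grade) places clarity in rows and color grade in columns. The median of each panel is added as a red diamond, which makes the central tendency clearer and eases comparison across the different combinations.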

median_data <- diam_Ideal_color_grade %>%
  group_by(clarity, color_grade) %>%
  summarise(med_carat = median(carat),
            med_price = median(price), .groups = "drop")

ggplot(diam_Ideal_color_grade, aes(x = carat, y = price, color = color_grade)) +
  geom_point() +
  geom_point(data = median_data, aes(x = med_carat, y = med_price),
             inherit.aes = FALSE, size = 3, shape = 18, color = "red") +
  facet_grid(clarity ~ color_grade) +
  labs(color = "Color Grade") +
  theme_light()

https://ithelp.ithome.com.tw/upload/images/20250905/20177964AnWY1tjCxc.png
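
With two facet variables, the strip labels show only the level names, which can be ambiguous. As a small sketch, the `labeller = label_both` argument prints the variable name together with its value on each strip. The object name `p_both` is introduced only for this example, and the median layer from the post is omitted here for brevity.

```r
library(tidyverse)

# Rebuild the filtered data from the post so this block runs on its own
diam_Ideal_color_grade <- diamonds %>%
  filter(cut == "Ideal", between(carat, 1, 2)) %>%
  rename(color_grade = color)

# Strips read "clarity: <level>" and "color_grade: <level>"
p_both <- ggplot(diam_Ideal_color_grade,
                 aes(x = carat, y = price, color = color_grade)) +
  geom_point() +
  facet_grid(clarity ~ color_grade, labeller = label_both) +
  labs(color = "Color Grade") +
  theme_light()

p_both
```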


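Summary

Facets show the structure of data with several categorical variables effectively and make it easy to compare groups. However, when there are too many categories, the chart can become cluttered and hard to read. Facet variables should therefore be chosen carefully, according to the purpose of the analysis.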
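
One way to keep the number of panels manageable is to merge neighboring levels before faceting. The sketch below collapses the eight clarity levels into three groups with forcats::fct_collapse(). The grouping itself and the names `diam_grouped`, `clarity_group`, and `p_grouped` are assumptions made purely for illustration; they are not a gemological standard and not part of the original post.

```r
library(tidyverse)

# Rebuild the filtered data from the post so this block runs on its own
diam_Ideal_color_grade <- diamonds %>%
  filter(cut == "Ideal", between(carat, 1, 2)) %>%
  rename(color_grade = color)

# Merge the eight clarity levels into three coarser groups
# (hypothetical grouping chosen only to reduce the panel count)
diam_grouped <- diam_Ideal_color_grade %>%
  mutate(clarity_group = fct_collapse(
    clarity,
    SI_or_lower = c("I1", "SI2", "SI1"),
    VS          = c("VS2", "VS1"),
    VVS_or_IF   = c("VVS2", "VVS1", "IF")
  ))

p_grouped <- ggplot(diam_grouped,
                    aes(x = carat, y = price, color = color_grade)) +
  geom_point() +
  facet_grid(~ clarity_group) +
  labs(color = "Color Grade")

p_grouped
```

Collapsing levels trades detail for readability, so it fits best when the analysis does not depend on distinguishing adjacent grades.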


🔎 English Abstract

In this post, I explore the use of faceting in ggplot2 to enhance the visualization of categorical information within the diamonds dataset. By fixing the cut to Ideal and focusing on diamonds between 1 and 2 carats, I examine the relationships among carat, color grade, clarity, and price.

The first example demonstrates a scatter plot without faceting, highlighting the strong positive correlation between carat and price, while also showing that color grade significantly affects diamond value. In the second example, I apply facet_wrap(~ clarity), which separates the plot by clarity levels, allowing us to observe that higher clarity tends to increase prices, in addition to the previously identified patterns.

Next, I switch to facet_grid(~ clarity) for a cleaner horizontal comparison across clarity groups. Adding regression lines in the fourth example further emphasizes the strong and consistent relationship between carat and price across all clarity levels.

Finally, I use facet_grid(clarity ~ color_grade) to display both clarity and color simultaneously. Median points are added to each panel, enabling clearer insights into the central tendencies of each subgroup. Overall, facets provide a powerful tool for dissecting and interpreting complex data, though caution is needed to maintain readability when dealing with many categories.

