[Day 30] Hands-On Pandas Data Analysis (Part 2) + Closing Thoughts

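2025 iThome Ironman Contest, Software Development track, Day 30

Part 30 of the series "The Python Beginner's Comeback: 30 Days of Essential Notes from Zero to Being Able to Teach, Written for Anyone Feeling Lost and for My Past Self"

Tags: 17th Ironman Contest, seaborn, pandas, matplotlib.pyplot, python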

Sharon

2025-10-14 22:17:29


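Introduction:

Yesterday we went from importing a CSV all the way through data cleaning, filtering, and statistical analysis,
and now we've finally reached the last day!!

Honestly, this 30-day challenge was not easy.
We went from variables, loops, and functions through NumPy and all the way to data analysis! Thank you for sticking with it alongside me!

Today I'll take you into the most captivating part of data analysis: data visualization.

After all, no matter how many numbers you have or how complete the table is, a single chart is easier to take in at a glance.

We'll start with the most basic df.plot()
and work up to Seaborn charts, building each one step by step!

The goal is for you not only to read the data, but to let the data speak for itself.
This post is also the final entry of the series.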

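1. Starting with `df.plot()`

Yesterday we finished organizing and cleaning the data, so now it's time to draw it!

Pandas has plotting built in: a single call to .plot() turns data into a chart.
It is built on top of Matplotlib
and covers nearly all the common chart types: line, bar, histogram, box, pie, scatter, and more.

The kind parameter switches between chart types: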

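Chart type	kind value	Description
Line chart	`'line'`	Default; suited to continuous data
Bar chart	`'bar'`	Compares categories
Horizontal bar chart	`'barh'`	Horizontal version of the bar chart
Histogram	`'hist'`	Shows how the data is distributed
Box plot	`'box'`	Shows quartiles and outliers
Kernel density plot	`'kde'` or `'density'`	Like a smoothed histogram
Area chart	`'area'`	Shows cumulative quantities as they change over time
Pie chart	`'pie'`	Shows proportions
Scatter plot	`'scatter'`	Shows the relationship between two variables (DataFrame only; needs x and y)
Hexbin plot	`'hexbin'`	Like a scatter plot, but better for a large number of points (DataFrame only; needs x and y)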
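To make the kind parameter concrete, here is a minimal sketch on a tiny made-up table. The `sales` DataFrame and its column names are hypothetical and are not part of the medicine dataset used in this series; the point is only that the same .plot() call produces different charts as kind changes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not the medicine dataset from this series
sales = pd.DataFrame(
    {"month": [1, 2, 3, 4], "units": [10, 14, 9, 20], "returns": [1, 2, 1, 3]}
).set_index("month")

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Same Series, two different kinds
sales["units"].plot(kind="line", ax=axes[0], title="line")  # 'line' is the default
sales["units"].plot(kind="bar", ax=axes[1], title="bar")    # one bar per index value

# 'scatter' is called on the DataFrame and needs explicit x and y columns
sales.plot(kind="scatter", x="units", y="returns", ax=axes[2], title="scatter")

fig.tight_layout()
# In a script or notebook with a display, call plt.show() here to see the figure.
```

Passing ax= places each chart on its own subplot, which keeps the three kinds side by side for comparison.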

Continuing with yesterday's CSV file,
suppose we want a bar chart of the manufacturer column:

data['Manufacturer'].value_counts().plot(kind='bar')

Output:

Or a pie chart:

data['Manufacturer'].value_counts().plot(kind='pie')

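Output:

You'll notice the chart is completely crammed!
Nearly every manufacturer is squeezed together, and the labels overlap until they are unreadable.

This is one of the common beginner traps in data analysis!
When the dataset has many rows and too many categories, plotting everything at once makes the chart lose focus:
there is so much information that you can't see anything!

The key is to focus on what matters!

We usually visualize only the top few entries or the rows that meet a specific condition,
like this: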

data['Manufacturer'].value_counts().head(10).plot(kind='pie')

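Output:

Now the chart is clean and readable,
and it's much easier to see which manufacturers dominate. Note that the slices now show each manufacturer's share among the top 10 only, not its share of the whole dataset.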
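One thing worth keeping in mind with a top-N pie chart: a pie always normalizes to whatever values it is given, so cutting the data with head() changes the percentages. The sketch below uses a hypothetical toy Series (`makers`) and n=2 just to show the difference between "share of the plotted slice" and "share of all rows".

```python
import pandas as pd

# Hypothetical toy column standing in for data['Manufacturer']
makers = pd.Series(["A", "A", "A", "B", "B", "C", "D"])

counts = makers.value_counts()   # counts for every category, largest first
top = counts.head(2)             # keep only the two largest categories

# What a pie chart of `top` would display: shares within the top 2 only
share_of_slice = top / top.sum()

# Each category's share of the full column
share_of_all = top / counts.sum()

print(share_of_slice.round(3))
print(share_of_all.round(3))
```

Here "A" takes 60% of the top-2 pie even though it makes up only 3 of the 7 rows, which is why a truncated pie should not be read as overall market share.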

Or take a different approach:

keep only the top 10 manufacturers and group the rest as "Others":

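import matplotlib.pyplot as plt

# Count medicines per manufacturer once (value_counts sorts largest first)
manufacturer_counts = data['Manufacturer'].value_counts()

# Keep the 10 largest manufacturers
top10_manufacturers = manufacturer_counts.head(10).copy()

# Merge all remaining manufacturers into a single "Others" slice
top10_manufacturers["Others"] = manufacturer_counts.iloc[10:].sum()

# Draw the pie chart
plt.figure(figsize=(8,8))
top10_manufacturers.plot.pie(
    autopct='%1.1f%%',      # print each slice's percentage with one decimal
    startangle=90,          # start the first slice at 12 o'clock
    counterclock=False,     # lay slices out clockwise
    colormap='Set3'
)
plt.ylabel("")  # hide the default y-axis label
plt.title("Manufacturer Distribution (Top 10 + Others)")
plt.show()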

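Output:

Only the 10 largest manufacturers get their own slice, each in a clearly distinct color.
The remaining smaller manufacturers, around a thousand of them, are merged into Others, which is much easier to read at a glance.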
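If you find yourself repeating the "top N plus Others" step for different columns, it can be wrapped in a small helper. This is only a sketch: the function name `top_n_with_others` and the toy Series are hypothetical, not something from pandas or from the original dataset.

```python
import pandas as pd

def top_n_with_others(series, n=10, other_label="Others"):
    """Count values, keep the n most frequent, and sum the rest into one bucket."""
    counts = series.value_counts()      # sorted from most to least frequent
    top = counts.head(n).copy()
    rest = counts.iloc[n:].sum()
    if rest > 0:                        # skip the bucket when nothing is left over
        top[other_label] = rest
    return top

# Hypothetical toy column standing in for data['Manufacturer']
makers = pd.Series(["A", "A", "A", "B", "B", "C", "D"])
summary = top_n_with_others(makers, n=2)
print(summary)
```

Because the leftover rows are summed rather than dropped, the total stays equal to the number of rows, so the percentages on a pie chart of `summary` refer to the whole column. With the real data, the call would look like `top_n_with_others(data['Manufacturer']).plot.pie(...)`.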


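2. Pandas + Seaborn for polished visualizations

All right, we've now drawn the most basic charts with df.plot().

But if those charts feel a little plain,
and you want nicer colors, a more refined layout, and finer control over details,

then you need to meet Seaborn!

This package is like a premium add-on for Matplotlib:
it draws professional-grade statistical charts with more concise syntax.

Install and import

If you haven't installed seaborn yet, run this in a terminal first:

pip install seaborn

Then import the modules in your program: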

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

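Next, I'll pick a few questions that suit this dataset and practice on them:

Example 1: Top 10 most common side effects

Let's first see which 10 side effects appear most often.

We'll use value_counts(), which we learned yesterday,

together with a Seaborn bar chart!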

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Count the most common side effects
# (dropna() skips rows with no side-effect text, which would otherwise break the loop below)
side_effects = data['Side_effects'].dropna().str.split(', ')
all_side_effects = [effect for effects in side_effects for effect in effects]
side_effect_counts = pd.Series(all_side_effects).value_counts().head(10)

# Draw the bar chart
plt.figure(figsize=(10, 5))
sns.barplot(
    x=side_effect_counts.values, 
    y=side_effect_counts.index, 
    palette='viridis'
)
plt.title('Top 10 Most Common Side Effects', fontsize=14)
plt.xlabel('Count')
plt.ylabel('Side Effect')
plt.xticks(rotation=45)
plt.show()

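Output:

Explanation:

str.split(', '): splits the text of each row into a list of individual side effects.
value_counts(): counts how many times each side effect appears.
sns.barplot(): draws the bar chart; the bars are horizontal because the categories are on the y-axis.
palette='viridis': sets the color scheme. viridis is a Matplotlib colormap that Seaborn accepts as a palette name; the Seaborn documentation lists other palettes you can try.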

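Example 2: Three charts at once! Comparing the three review categories

Next,

we want to compare the top 10 medicines for excellent, average, and poor reviews side by side.

For this we lay out several axes with Matplotlib's plt.subplots() and point each Seaborn plot at one of them through the ax parameter.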

# Take the top 10 rows for each review column
top_excellent_reviews = data.nlargest(10, 'Excellent Review %')[['Medicine Name', 'Excellent Review %']]
top_average_reviews   = data.nlargest(10, 'Average Review %')[['Medicine Name', 'Average Review %']]
top_poor_reviews      = data.nlargest(10, 'Poor Review %')[['Medicine Name', 'Poor Review %']]

# Create three side-by-side subplots
fig, axes = plt.subplots(1, 3, figsize=(24, 10))

# Excellent Reviews
sns.barplot(
    x='Excellent Review %', 
    y='Medicine Name', 
    data=top_excellent_reviews, 
    palette='crest', 
    ax=axes[0]
)
axes[0].set_title('Top 10 Medicines with Highest Excellent Reviews', fontsize=14)
axes[0].set_xlabel('Excellent Review %')
axes[0].set_ylabel('Medicine Name')

# Average Reviews
sns.barplot(
    x='Average Review %', 
    y='Medicine Name', 
    data=top_average_reviews, 
    palette='mako', 
    ax=axes[1]
)
axes[1].set_title('Top 10 Medicines with Highest Average Reviews', fontsize=14)
axes[1].set_xlabel('Average Review %')
axes[1].set_ylabel('')

# Poor Reviews
sns.barplot(
    x='Poor Review %', 
    y='Medicine Name', 
    data=top_poor_reviews, 
    palette='rocket', 
    ax=axes[2]
)
axes[2].set_title('Top 10 Medicines with Highest Poor Reviews', fontsize=14)
axes[2].set_xlabel('Poor Review %')
axes[2].set_ylabel('')

plt.tight_layout()
plt.show()

Output:
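The three barplot calls above differ only in the column and the color, so the same figure can also be built with a loop. Here is a minimal sketch of that idea; the `toy` DataFrame, the `panels` list, and the colors are hypothetical choices for illustration, and the column names simply mirror the ones used above.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy rows standing in for the medicine data
toy = pd.DataFrame({
    "Medicine Name": ["A", "B", "C", "D"],
    "Excellent Review %": [90, 40, 60, 10],
    "Average Review %": [5, 40, 30, 30],
    "Poor Review %": [5, 20, 10, 60],
})

# One (column, color) pair per subplot
panels = [
    ("Excellent Review %", "tab:green"),
    ("Average Review %", "tab:blue"),
    ("Poor Review %", "tab:red"),
]
top_n = 3

fig, axes = plt.subplots(1, len(panels), figsize=(15, 4))
tops = {}
for ax, (column, color) in zip(axes, panels):
    # Largest top_n rows for this review column
    top = toy.nlargest(top_n, column)[["Medicine Name", column]]
    tops[column] = top
    sns.barplot(data=top, x=column, y="Medicine Name", color=color, ax=ax)
    ax.set_title(f"Top {top_n} by {column}")
    ax.set_ylabel("")

fig.tight_layout()
# In a script or notebook with a display, call plt.show() here to see the figure.
```

Adding a fourth panel then only means adding one more entry to `panels`, at the cost of giving every panel the same title pattern.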

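Example 3: Top 10 manufacturers with the most highly rated medicines

# Keep medicines with Excellent Review % > 95
excellent_pills = data[data['Excellent Review %'] > 95]

# Count how often each manufacturer appears (that is, which companies most often produce highly rated medicines)
excellent_manufacturer_counts = excellent_pills['Manufacturer'].value_counts().head(10)

# Draw the bar chart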
plt.figure(figsize=(12,6))
sns.barplot(
    x=excellent_manufacturer_counts.values, 
    y=excellent_manufacturer_counts.index, 
    palette='crest'
)
plt.title('Top 10 Manufacturers with Most Highly Rated Medicines (>95% Excellent Review)', fontsize=14)
plt.xlabel('Number of Highly Rated Medicines')
plt.ylabel('Manufacturer')
plt.show()

Output:
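A raw count like this naturally favors manufacturers that have many medicines in the dataset. If you also want to know what fraction of each manufacturer's medicines clear the 95% threshold, one possible approach is to group by manufacturer and aggregate a True/False flag. The sketch below uses a hypothetical toy table and made-up column names for the result (`n_high`, `share`, `total`).

```python
import pandas as pd

# Hypothetical toy rows standing in for the medicine data
toy = pd.DataFrame({
    "Manufacturer": ["Big", "Big", "Big", "Big", "Small", "Small"],
    "Excellent Review %": [96, 97, 50, 40, 98, 99],
})

# True where a medicine counts as highly rated
is_high = toy["Excellent Review %"] > 95

summary = (
    toy.assign(highly_rated=is_high)
    .groupby("Manufacturer")["highly_rated"]
    .agg(
        n_high="sum",    # number of highly rated medicines
        share="mean",    # fraction of this manufacturer's medicines that are highly rated
        total="size",    # all medicines from this manufacturer
    )
)

print(summary)
```

In this toy data both manufacturers have two highly rated medicines, but the share is 0.5 for "Big" and 1.0 for "Small", which the count alone would hide.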

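Example 4: Top 10 most common medicine compositions

This shows which 10 compositions
appear most often among the more than 11,000 medicines in the dataset,
which helps in understanding popular compositions and market trends.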

# Top 10 most common compositions
top_compositions = data['Composition'].value_counts().head(10)

plt.figure(figsize=(12, 7))
sns.barplot(y=top_compositions.index, x=top_compositions.values, palette="magma")
plt.title('Top 10 Most Common Compositions')
plt.xlabel('Number of Medicines')
plt.ylabel('Composition')
plt.show()

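Output:

Example 5: Top 10 manufacturers by number of medicines

This chart helps answer:

Which companies are most active in this dataset?
Do one or two companies account for a dominant share of the medicines?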

# Top 10 manufacturers by number of medicines
top_manufacturers = data['Manufacturer'].value_counts().head(10)

plt.figure(figsize=(12, 7))
sns.barplot(y=top_manufacturers.index, x=top_manufacturers.values, palette="viridis")
plt.title('Top 10 Manufacturers by Number of Medicines')
plt.xlabel('Number of Medicines')
plt.ylabel('Manufacturer')
plt.show()

Output:
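Examples 4 and 5 follow the same pattern: count a column, keep the top 10, draw horizontal bars. That pattern can be sketched as one small function. Note that this sketch swaps in pandas' own .plot(kind='barh') instead of sns.barplot() to keep it short; the function name `plot_top_counts` and the toy data are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_top_counts(frame, column, n=10, ax=None):
    """Count the values in `column` and draw the n most frequent as horizontal bars."""
    counts = frame[column].value_counts().head(n)
    if ax is None:
        _, ax = plt.subplots(figsize=(12, 7))
    # barh draws from bottom to top, so sort ascending to put the largest bar on top
    counts.sort_values().plot(kind="barh", ax=ax)
    ax.set_title(f"Top {len(counts)} {column} values")
    ax.set_xlabel("Number of rows")
    ax.set_ylabel(column)
    return counts, ax

# Hypothetical toy rows standing in for the medicine data
toy = pd.DataFrame({"Manufacturer": ["X", "X", "X", "Y", "Y", "Z"]})
counts, ax = plot_top_counts(toy, "Manufacturer", n=2)
print(counts)
# In a script or notebook with a display, call plt.show() here to see the figure.
```

With the real data, `plot_top_counts(data, 'Composition')` and `plot_top_counts(data, 'Manufacturer')` would cover both examples, though you would lose the Seaborn palettes used above.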