Wrapper methods evaluate each candidate feature subset by training a machine learning model on it, guided by a search strategy. These methods are also known as greedy algorithms, because they aim to find the feature combination that achieves the best result under model training. This requires substantial computing resources, and an exhaustive search is usually not feasible.
Essentially, any combination of a search strategy and a machine learning algorithm can be used as a wrapper.
Advantages
Compared with filter methods, wrapper methods have the following two advantages:
They can detect interactions between variables.
They can find the best feature subset for the specific machine learning algorithm we intend to use.
As a result, wrapper methods usually produce more accurate predictions than filter methods.
Steps
Search for a feature subset: use a search method to pick a subset of features from the dataset.
Build a machine learning model: train a chosen machine learning algorithm on the subset selected in the previous step.
Evaluate the model's performance.
Repeat the first three steps until a stopping criterion is met (a minimal sketch of this loop follows below).
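The four steps above amount to a search loop wrapped around model training and evaluation. Below is a minimal sketch of that loop as a greedy forward search written with plain scikit-learn; the names selected, remaining, and best_score are illustrative and not part of any library API:

import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
model = DecisionTreeClassifier(min_samples_leaf=20)

selected, remaining = [], list(range(X.shape[1]))
best_score = -np.inf
while remaining:
    # Step 1 (search): score every subset formed by adding one more feature.
    # Steps 2-3 (train and evaluate) happen inside cross_val_score.
    scores = {f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    best_f = max(scores, key=scores.get)
    # Step 4 (stopping criterion): stop when no candidate improves the score.
    if scores[best_f] <= best_score:
        break
    best_score = scores[best_f]
    selected.append(best_f)
    remaining.remove(best_f)

print(selected, best_score)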
Stopping criteria
At some point we need to stop searching for feature subsets, so we define stopping conditions in advance, for example: model performance decreases, model performance increases, or a predefined number of features is reached.
The evaluation metric can be, for example, ROC-AUC for classification or RMSE for regression.
Search methods
Forward Feature Selection: also called step forward feature selection or sequential forward selection (SFS). This method starts with an empty feature subset and then adds one feature at a time.
Backward Feature Elimination: also called step backward feature selection or sequential backward selection (SBS). This method starts with a subset containing all the features in the dataset and then removes one feature at a time.
Exhaustive Feature Selection: this method tests every possible combination of features.
Bidirectional Search: to converge on a single solution, this method runs forward selection and backward elimination at the same time.
Running wrapper methods with Mlxtend, using a RandomForestClassifier to evaluate the feature subsets:
Forward feature selection:
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.ensemble import RandomForestClassifier

sfs = SequentialFeatureSelector(RandomForestClassifier(),
                                k_features=10,   # stop once 10 features are selected
                                forward=True,    # add features one at a time
                                floating=False,
                                scoring='accuracy',
                                cv=2)
Backward feature elimination:
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.ensemble import RandomForestClassifier

sbs = SequentialFeatureSelector(RandomForestClassifier(),
                                k_features=10,
                                forward=False,   # set to False for backward elimination
                                floating=False,
                                scoring='accuracy',
                                cv=2)
Exhaustive feature selection:
from mlxtend.feature_selection import ExhaustiveFeatureSelector
from sklearn.ensemble import RandomForestClassifier

efs = ExhaustiveFeatureSelector(RandomForestClassifier(),
                                min_features=4,   # smallest subset size to try
                                max_features=10,  # largest subset size to try
                                scoring='roc_auc',
                                cv=2)
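Whichever configuration is used, the selector is then fitted like any ordinary scikit-learn estimator. A short sketch, assuming X and y are a feature matrix and target vector that have already been loaded:

sfs = sfs.fit(X, y)
print(sfs.k_feature_idx_)      # indices of the selected features
print(sfs.k_score_)            # cross-validated score of that subset
X_selected = sfs.transform(X)  # keep only the selected columns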
Finding the most important features of the built-in wine dataset.
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EXS
wine = load_wine()
data = wine['data']
target = wine['target']
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3, random_state=0)
Machine learning model: a DecisionTreeClassifier.
m = DecisionTreeClassifier(min_samples_leaf=20)
Feature selection:
Use Mlxtend's SequentialFeatureSelector to perform backward feature elimination.
Mlxtend uses cross-validation (cv); here we set cv=10.
sfs = SFS(m, forward=False, cv=10, k_features = (2, 6), scoring='accuracy', verbose=False, n_jobs=-1)
sfs.fit(X_train, y_train, custom_feature_names=wine['feature_names'])
SequentialFeatureSelector(cv=10,
estimator=DecisionTreeClassifier(min_samples_leaf=20),
forward=False, k_features=(2, 6), n_jobs=-1,
scoring='accuracy', verbose=False)
print(f"Best score achieved: {sfs.k_score_}, Feature's names: {sfs.k_feature_names_}")
Best score achieved: 0.8871794871794872, Feature's names: ('alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'od280/od315_of_diluted_wines')
display(pd.DataFrame(sfs.get_metric_dict()))
| k | feature_idx | cv_scores | avg_score | feature_names | ci_bound | std_dev | std_err |
|---|---|---|---|---|---|---|---|
| 13 | (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) | [0.8461538461538461, 0.9230769230769231, 0.923...] | 0.877564 | (alcohol, malic_acid, ash, alcalinity_of_ash, ...) | 0.0499994 | 0.0673199 | 0.02244 |
| 12 | (0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12) | [0.8461538461538461, 0.9230769230769231, 0.923...] | 0.885897 | (alcohol, malic_acid, ash, alcalinity_of_ash, ...) | 0.0493767 | 0.0664815 | 0.0221605 |
| 11 | (0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11) | [0.8461538461538461, 0.9230769230769231, 0.923...] | 0.885897 | (alcohol, malic_acid, ash, alcalinity_of_ash, ...) | 0.0493767 | 0.0664815 | 0.0221605 |
| 10 | (0, 1, 2, 3, 4, 5, 7, 8, 10, 11) | [0.7692307692307693, 0.8461538461538461, 0.923...] | 0.887179 | (alcohol, malic_acid, ash, alcalinity_of_ash, ...) | 0.0539568 | 0.0726483 | 0.0242161 |
| 9 | (0, 1, 2, 3, 4, 5, 7, 8, 11) | [0.7692307692307693, 0.8461538461538461, 0.923...] | 0.887179 | (alcohol, malic_acid, ash, alcalinity_of_ash, ...) | 0.0539568 | 0.0726483 | 0.0242161 |
| 8 | (0, 1, 2, 3, 4, 5, 7, 11) | [0.7692307692307693, 0.8461538461538461, 0.923...] | 0.887179 | (alcohol, malic_acid, ash, alcalinity_of_ash, ...) | 0.0539568 | 0.0726483 | 0.0242161 |
| 7 | (0, 1, 2, 3, 4, 5, 11) | [0.7692307692307693, 0.8461538461538461, 0.923...] | 0.887179 | (alcohol, malic_acid, ash, alcalinity_of_ash, ...) | 0.0539568 | 0.0726483 | 0.0242161 |
| 6 | (0, 1, 2, 3, 4, 11) | [0.7692307692307693, 0.8461538461538461, 0.923...] | 0.887179 | (alcohol, malic_acid, ash, alcalinity_of_ash, ...) | 0.0539568 | 0.0726483 | 0.0242161 |
| 5 | (0, 1, 2, 3, 11) | [0.7692307692307693, 0.8461538461538461, 0.923...] | 0.887179 | (alcohol, malic_acid, ash, alcalinity_of_ash, ...) | 0.0539568 | 0.0726483 | 0.0242161 |
| 4 | (0, 1, 2, 11) | [0.7692307692307693, 0.8461538461538461, 0.923...] | 0.887179 | (alcohol, malic_acid, ash, od280/od315_of_dilu...) | 0.0539568 | 0.0726483 | 0.0242161 |
| 3 | (0, 1, 11) | [0.7692307692307693, 0.8461538461538461, 0.923...] | 0.887179 | (alcohol, malic_acid, od280/od315_of_diluted_w...) | 0.0539568 | 0.0726483 | 0.0242161 |
| 2 | (0, 11) | [0.7692307692307693, 0.8461538461538461, 0.923...] | 0.887179 | (alcohol, od280/od315_of_diluted_wines) | 0.0539568 | 0.0726483 | 0.0242161 |
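The metric dict above can also be visualized with Mlxtend's built-in plotting helper, which makes it easy to see that the average score barely changes as features are removed. A brief sketch, assuming matplotlib is installed:

from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs
import matplotlib.pyplot as plt

# Cross-validated performance (with a standard-deviation band)
# as a function of the number of selected features.
plot_sfs(sfs.get_metric_dict(), kind='std_dev')
plt.show()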
Use Mlxtend's ExhaustiveFeatureSelector to run exhaustive feature selection.
efs = EXS(m, min_features = 2, max_features=6, cv=10, scoring='accuracy')
efs.fit(X_train, y_train, custom_feature_names=wine['feature_names'])
Features: 4082/4082
ExhaustiveFeatureSelector(cv=10,
estimator=DecisionTreeClassifier(min_samples_leaf=20),
max_features=6, min_features=2)
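The 4082 subsets reported in the progress output are exactly the number of ways to choose between 2 and 6 of the wine dataset's 13 features; a quick check:

from math import comb

# C(13,2) + C(13,3) + C(13,4) + C(13,5) + C(13,6) = 4082
print(sum(comb(13, k) for k in range(2, 7)))  # 4082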
print(f"Best score achieved: {efs.best_score_}, Feature's names: {efs.best_feature_names_}")
Best score achieved: 0.8871794871794872, Feature's names: ('alcohol', 'od280/od315_of_diluted_wines')
display(pd.DataFrame(efs.get_metric_dict()))
| index | feature_idx | cv_scores | avg_score | feature_names | ci_bound | std_dev | std_err |
|---|---|---|---|---|---|---|---|
| 0 | (0, 1) | [0.6153846153846154, 0.6923076923076923, 0.769...] | 0.783974 | (alcohol, malic_acid) | 0.0727371 | 0.0979344 | 0.0326448 |
| 1 | (0, 2) | [0.46153846153846156, 0.6923076923076923, 0.53...] | 0.670513 | (alcohol, ash) | 0.0941471 | 0.126761 | 0.0422537 |
| 2 | (0, 3) | [0.7692307692307693, 0.6923076923076923, 0.769...] | 0.807692 | (alcohol, alcalinity_of_ash) | 0.0591206 | 0.0796008 | 0.0265336 |
| 3 | (0, 4) | [0.38461538461538464, 0.6923076923076923, 0.61...] | 0.670513 | (alcohol, magnesium) | 0.100843 | 0.135776 | 0.0452588 |
| 4 | (0, 5) | [0.7692307692307693, 0.6923076923076923, 0.769...] | 0.832051 | (alcohol, total_phenols) | 0.0711221 | 0.0957599 | 0.03192 |
| 5 | (0, 6) | [0.7692307692307693, 0.7692307692307693, 0.769...] | 0.823077 | (alcohol, flavanoids) | 0.0509759 | 0.0686347 | 0.0228782 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4075 | (6, 7, 8, 9, 10, 11) | [0.8461538461538461, 0.9230769230769231, 0.923...] | 0.877564 | (flavanoids, nonflavanoid_phenols, proanthocya...) | 0.0499994 | 0.0673199 | 0.02244 |
| 4076 | (6, 7, 8, 9, 10, 12) | [0.8461538461538461, 0.9230769230769231, 0.923...] | 0.877564 | (flavanoids, nonflavanoid_phenols, proanthocya...) | 0.0499994 | 0.0673199 | 0.02244 |
| 4077 | (6, 7, 8, 9, 11, 12) | [0.8461538461538461, 0.9230769230769231, 0.923...] | 0.877564 | (flavanoids, nonflavanoid_phenols, proanthocya...) | 0.0499994 | 0.0673199 | 0.02244 |
| 4078 | (6, 7, 8, 10, 11, 12) | [0.6153846153846154, 0.7692307692307693, 0.846...] | 0.807051 | (flavanoids, nonflavanoid_phenols, proanthocya...) | 0.0748383 | 0.100763 | 0.0335878 |
| 4079 | (6, 7, 9, 10, 11, 12) | [0.8461538461538461, 0.9230769230769231, 0.923...] | 0.877564 | (flavanoids, nonflavanoid_phenols, color_inten...) | 0.0499994 | 0.0673199 | 0.02244 |
| 4080 | (6, 8, 9, 10, 11, 12) | [0.8461538461538461, 0.9230769230769231, 0.923...] | 0.877564 | (flavanoids, proanthocyanins, color_intensity,...) | 0.0499994 | 0.0673199 | 0.02244 |
| 4081 | (7, 8, 9, 10, 11, 12) | [0.8461538461538461, 0.9230769230769231, 0.923...] | 0.885897 | (nonflavanoid_phenols, proanthocyanins, color_...) | 0.0493767 | 0.0664815 | 0.0221605 |

7 rows × 4082 columns
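As with the sequential selector, the fitted exhaustive selector exposes the winning subset directly; a short sketch:

print(efs.best_idx_)                   # indices of the best subset, here (0, 11)
X_train_best = efs.transform(X_train)  # keep only the winning columns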
Although wrapper methods are more complex, they are a very good approach to feature selection, and they are best applied after a filter method has already removed some of the features.
Reader suggestion:
The main difference between filter methods and wrapper methods is that the former use an evaluation function (a class-separability measure) while the latter use a classifier's performance (accuracy, g-mean, ...). In other words, both can use the same search strategies: either individual feature selection (each feature evaluated on its own) or feature-vector selection (suboptimal, floating, or optimal search for the best combination of features).
That is:
search strategy (individual feature selection or feature-vector selection) + machine learning algorithm = wrapper method
search strategy (individual feature selection or feature-vector selection) + evaluation function (class-separability measure) = filter method
Offered for the author's reference; see also sections 5.5–5.6 of Pattern Recognition, 2nd edition (Sergios Theodoridis).
Thanks