[Day-21] R Language - Clustering Applications (II): Outlier Detection - Part 2 (detect outliers by clustering in RStudio)

2021 iThome Ironman Contest

DAY 21

Video tutorial

Part 21 of the series "Practical Applications of Clustering in R"

Denny Chang

2021-09-21 08:46:06

Code from the video

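# Either GMM or k-means++ can be used
library(naniar)
data(iris)
any_na(iris)          # prep 1: check for NA values
iris <- iris[, -5]    # drop Species so only the 4 numeric columns are clustered
iris <- scale(iris)   # prep 2: standardize each column (z-score)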

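library(ClusterR)
gmm <- GMM(iris, 10, dist_mode = "eucl_dist", em_iter = 10)  # 10 Gaussian components; em_iter = EM iterations to tune
gmm_out <- as.data.frame(gmm$Log_likelihood)  # log-likelihood of each point under each component (V1..V10); larger is better
final <- cbind(iris, gmm_out)                 # standardized features + the 10 log-likelihood columns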

library(dplyr)
# 最大 (max): the highest of the 10 log-likelihoods for each point
# 分群 (cluster): the component that achieves it, labelled c1..c10 (the first maximum wins on ties)
final <- final %>% 
         mutate(最大 = pmax(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10),
                分群 = paste0("c", max.col(cbind(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10),
                                           ties.method = "first")))
out <- final %>% 
       group_by(分群) %>% 
       summarise(筆數 = n())   # 筆數 (count): number of points in each cluster

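# Keep the 4 features plus the cluster label (column 16 = 分群) and drop c8, the cluster
# treated as outliers in this run; check `out` first, because the labels depend on the GMM fit
done <- final[, c(1:4, 16)] %>% 
        subset(分群 != "c8")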
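The first comment in the code says k-means++ works as well as GMM. Below is a minimal base-R sketch of that alternative, under stated assumptions: `kmeanspp_centers()` is a hypothetical helper written for illustration (it is not a ClusterR function) that performs k-means++ D² seeding and then hands the centers to `stats::kmeans()`; k = 10 simply mirrors the GMM example; and, instead of hard-coding a label such as "c8", it assumes the smallest cluster is the outlier group.

```r
# Hypothetical helper (illustration only): k-means++ seeding in base R.
# The first center is picked uniformly; each further center is sampled with
# probability proportional to its squared distance to the nearest chosen center.
kmeanspp_centers <- function(x, k) {
  n <- nrow(x)
  centers <- x[sample.int(n, 1), , drop = FALSE]
  while (nrow(centers) < k) {
    # squared Euclidean distance from every point to its closest current center
    d2 <- apply(x, 1, function(p) min(colSums((t(centers) - p)^2)))
    # points already chosen have d2 = 0, so they cannot be picked again
    centers <- rbind(centers, x[sample.int(n, 1, prob = d2), , drop = FALSE])
  }
  centers
}

set.seed(1)                      # seeding is random; fix the seed for repeatability
x <- scale(iris[, -5])           # same prep as above: drop Species, z-score each column
k <- 10                          # assumption: same number of clusters as the GMM example
km <- kmeans(x, centers = kmeanspp_centers(x, k), iter.max = 50)

sizes <- table(km$cluster)       # points per cluster, the counterpart of `out`
outlier_cluster <- as.integer(names(sizes)[which.min(sizes)])  # assumption: smallest = outliers
is_outlier <- km$cluster == outlier_cluster
done_km <- as.data.frame(x)[!is_outlier, ]  # keep every point outside the outlier cluster
```

Whether the smallest cluster really holds the outliers depends on the data, so it is still worth looking at `sizes` before discarding anything.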

If anything here is wrong, please leave a comment and let me know. Thank you for your feedback.