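Hi everyone,
I would like to feed the predictors into a random forest model step by step: the first model uses 1 predictor, the second uses 2, and so on. Using the iris data as an example, that should fit 4 models and produce 4 sets of results. However, my loop only runs the first and last iterations, i.e. the models with 1 and 4 predictors, and skips the models with 2 and 3 predictors. How should I change it?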
data(iris)
library(dplyr)
library(MLmetrics)
library(randomForest)
#iris <- subset(iris,Species != "setosa")
out_c = data.frame()
out_p = data.frame()
out_r = data.frame()
out_f = data.frame()
n <- nrow(iris)
random <- sample(seq_len(n), size = round(0.7 * n))
train <- iris[random,]
test <- iris[-random,]
for(i in range(1,4)){
  traindata <- train[,c(i,5)]
  testdata <- test[,c(i,5)]
  features <- setdiff(x = names(traindata), y = "Species")
  set.seed(123)
  c <- as.data.frame(tuneRF(x = traindata[features], y = traindata$Species,ntreeTry = 500)) # 500 because that is the default ntree when fitting the model
  out_c = rbind(out_c, c$mtry[which.min(c$OOBError)])
  rf_model <- randomForest(Species~., data = traindata,
                           ntree = 500, mtry = c$mtry[which.min(c$OOBError)],
                           do.trace = 100,na.action = na.roughfix)
  rf_future <- predict(rf_model,testdata)
  rf_future <- as.data.frame(rf_future)
  rf_final <- cbind(rf_future,testdata)
  p <- Precision(rf_final$Species, rf_final$rf_future, positive = 'versicolor')
  out_p = rbind(out_p, p)
  r <- Recall(rf_final$Species, rf_final$rf_future, positive = 'versicolor')
  out_r = rbind(out_r, r)
  f <- F1_Score(rf_final$Species, rf_final$rf_future, positive = 'versicolor')
  out_f = rbind(out_f, f)
  print(i)
}
Descriptive statistics function: range()
range() returns the minimum and maximum of the numeric vector it is given.
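A quick way to check this is to run range(1, 4) on its own and loop over the result. This is only a minimal base R sketch; the variable names idx and visited are made up for illustration.

```r
# range() returns c(min, max) of its arguments, not the sequence between them
idx <- range(1, 4)
print(idx)

# Looping over that result therefore visits only two values
visited <- c()
for (i in idx) {
  visited <- c(visited, i)
}
print(visited)

# By contrast, 1:4 produces all four values
print(1:4)
```

So a loop written as for(i in range(1,4)) runs exactly twice, with i = 1 and then i = 4.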
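Is this the cause?
If so, it is expected that the loop only gets 1 and 4.
Try for (i in c(1:4)) instead; 1:4 generates the full sequence 1, 2, 3, 4.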
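One more thing may be worth checking once the loop visits all four values. The question describes models with 1, 2, 3 and then 4 predictors, but train[,c(i,5)] selects only the i-th predictor each time. If the cumulative version is what is wanted (this is an assumption about the intent), the column selection could look like the sketch below. It uses iris directly and only records which predictors each iteration would use; n_pred and selected are illustrative names, and the model fitting is left out.

```r
data(iris)

n_pred <- 4                          # iris has four predictor columns; Species is column 5
selected <- vector("list", n_pred)   # predictor names used in each iteration

for (i in seq_len(n_pred)) {
  cols <- c(seq_len(i), 5)           # predictors 1..i plus the Species column
  traindata <- iris[, cols]
  selected[[i]] <- setdiff(names(traindata), "Species")
  # tuneRF(), randomForest() and the metrics from the question would go here
}

print(lengths(selected))             # number of predictors per iteration
```

seq_len(n_pred) behaves like 1:4 here, and it also yields an empty loop rather than counting backwards if n_pred were ever 0.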