(An Unprofessional Introduction to AI) Machine Learning -> ML for Iris

11th iThome Ironman Contest

DAY 22

AI & Data

AI & Machine Learning series, part 22

ken36789

Team: Turing World

2019-10-08 19:24:32

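This post introduces Iris, the practice dataset that ships with scikit-learn (sklearn); here it is loaded from the Kaggle CSV instead. Thanks up front to the Kaggle author whose notebook this post follows.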
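If the Kaggle CSV is not at hand, roughly the same table can be built from the copy bundled with scikit-learn. This is only a minimal sketch: the helper name `load_iris_frame` is made up for illustration, and the column names and the `Iris-` prefix are assumptions chosen to match the Kaggle file used in the rest of the post (the Kaggle CSV also has an `Id` column, which is left out here).

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame():
    """Build a DataFrame shaped like the Kaggle Iris.csv (hypothetical helper)."""
    data = load_iris()
    # load_iris orders the features as sepal length, sepal width,
    # petal length, petal width; these names mirror the Kaggle CSV.
    columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
    frame = pd.DataFrame(data.data, columns=columns)
    # Map the integer targets back to species names, with the "Iris-" prefix
    # that the Kaggle file uses (an assumption made for compatibility).
    frame["Species"] = ["Iris-" + str(data.target_names[i]) for i in data.target]
    return frame


iris = load_iris_frame()
print(iris.shape)
print(iris["Species"].value_counts())
```

With a frame like this, the plotting code in the rest of the post should run without reading `../input/Iris.csv`.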

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

The code above imports, up front, the modules the program needs.

iris = pd.read_csv("../input/Iris.csv")
iris.info()
fig = iris[iris.Species=='Iris-setosa'].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='orange', label='Setosa')
iris[iris.Species=='Iris-versicolor'].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='blue', label='versicolor',ax=fig)
iris[iris.Species=='Iris-virginica'].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='green', label='virginica', ax=fig)
fig.set_xlabel("Sepal Length")
fig.set_ylabel("Sepal Width")
fig.set_title("Sepal Length VS Width")
fig=plt.gcf()
fig.set_size_inches(10,6)
plt.show()

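The code above uses pandas to load the data and filter it by species, then draws a scatter plot of sepal length against sepal width, one color per species.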

fig = iris[iris.Species=='Iris-setosa'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='orange', label='Setosa')
iris[iris.Species=='Iris-versicolor'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='blue', label='versicolor',ax=fig)
iris[iris.Species=='Iris-virginica'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='green', label='virginica', ax=fig)
fig.set_xlabel("Petal Length")
fig.set_ylabel("Petal Width")
fig.set_title("Petal Length VS Width")
fig=plt.gcf()
fig.set_size_inches(10,6)
plt.show()

The first plot did not present the data very well, so a second plot was drawn, this time of petal length against petal width.
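One way to check that visual impression with numbers is to compare the per-species value ranges of a sepal feature and a petal feature. The sketch below is an illustration, not part of the original notebook: it loads the data from scikit-learn's bundled copy, so the species labels lack the `Iris-` prefix, and `ranges_overlap` is a hypothetical helper.

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
iris = pd.DataFrame(data.data, columns=columns)
iris["Species"] = data.target_names[data.target]

# Min and max of one sepal feature and one petal feature, per species
ranges = iris.groupby("Species")[["SepalLengthCm", "PetalLengthCm"]].agg(["min", "max"])
print(ranges)


def ranges_overlap(feature, a, b):
    """Return True if the [min, max] intervals of two species intersect."""
    lo_a, hi_a = ranges.loc[a, (feature, "min")], ranges.loc[a, (feature, "max")]
    lo_b, hi_b = ranges.loc[b, (feature, "min")], ranges.loc[b, (feature, "max")]
    return bool(lo_a <= hi_b and lo_b <= hi_a)


sepal_overlap = ranges_overlap("SepalLengthCm", "setosa", "versicolor")
petal_overlap = ranges_overlap("PetalLengthCm", "setosa", "versicolor")
print("Sepal length ranges overlap:", sepal_overlap)
print("Petal length ranges overlap:", petal_overlap)
```

If a feature's ranges do not overlap between two species, a single threshold on that feature is enough to tell those two species apart.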

iris.hist(edgecolor='black', linewidth=1.2)
fig=plt.gcf()
fig.set_size_inches(12,6)
plt.show()

Once it was clear that the plots had improved noticeably, the code above uses histograms to show how each feature is distributed.

plt.figure(figsize=(15,10))
plt.subplot(2,2,1)
sns.violinplot(x='Species',y='PetalLengthCm',data=iris)
plt.subplot(2,2,2)
sns.violinplot(x='Species',y='PetalWidthCm',data=iris)
plt.subplot(2,2,3)
sns.violinplot(x='Species',y='SepalLengthCm',data=iris)
plt.subplot(2,2,4)
sns.violinplot(x='Species',y='SepalWidthCm',data=iris)

The violin plots above split the data by species, so you can see how each measurement is distributed within each species.
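The same per-species distributions can also be read as a table. As a rough sketch (again loading from scikit-learn's bundled copy rather than the Kaggle CSV, with the column names assumed to match), `groupby(...).describe()` lists the count, mean, standard deviation, and quartiles for each species:

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
iris = pd.DataFrame(data.data, columns=columns)
iris["Species"] = data.target_names[data.target]

# One row per species: count, mean, std, min, 25%, 50%, 75%, max
petal_summary = iris.groupby("Species")["PetalLengthCm"].describe()
print(petal_summary)
```

Swapping `"PetalLengthCm"` for any of the other three columns gives the table for that measurement.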

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

# Hold out 30% of the rows for testing
train, test = train_test_split(iris, test_size = 0.3)
train_X = train[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
train_y=train.Species
test_X= test[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
test_y =test.Species
model = svm.SVC()
model.fit(train_X,train_y)
prediction = model.predict(test_X)  # predict the species of the held-out rows
print('The accuracy of the SVM is:',metrics.accuracy_score(test_y,prediction))

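The code above imports the sklearn modules and then trains and tests a model on the data. The rest of the notebook is much the same, so I won't paste more of it here. To sum up: as most of you probably know, the Iris dataset is about the simplest way into machine learning. When we start working on machine learning, we often don't know what problem to practice on, and Iris is very handy for that because it touches on many machine learning concepts. Once you are comfortable with them, you will have a much better idea of what machine learning is and how to apply it.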
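To give a sense of what "much the same" looks like, here is a minimal sketch that runs the same fit, predict, and score steps over the four classifiers imported earlier. It is not the original notebook's code: it loads the data from scikit-learn's bundled copy, and `random_state=0`, `max_iter=1000`, and `n_neighbors=3` are arbitrary choices made only so the run is repeatable and free of convergence warnings.

```python
from sklearn import metrics, svm
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Same 70/30 split as in the post; random_state is fixed here for repeatability
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "SVM": svm.SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

scores = {}
for name, model in models.items():
    model.fit(train_X, train_y)            # train on the 70% split
    prediction = model.predict(test_X)     # predict the held-out 30%
    scores[name] = metrics.accuracy_score(test_y, prediction)
    print(f"The accuracy of the {name} is: {scores[name]:.3f}")
```

The exact accuracies depend on the split, so treat the printed numbers as indicative rather than as a ranking of the models.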

Thanks for reading. That wraps up this unprofessional introduction to AI; see you in the next post~

Reference: https://www.kaggle.com/ash316/ml-from-scratch-with-iris