iT邦幫忙

11th iThome Ironman Contest

DAY 17
AI & Data

Python from Zero to Kaggle series, part 17

Python from Zero to Kaggle: Day 16


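Today's goals

Write the code for Parts 1, 2, and 3 of feature engineering, modeling, training, and prediction.
This post pulls together the material from the previous three days, so it runs a little long; read it alongside those three posts for the explanations.

What you will learn from this post

How to carry out the most essential steps of machine learning in Python

Code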

https://ithelp.ithome.com.tw/upload/images/20190918/20114906L9SsMiGC0I.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906ZRvRHIxaXG.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906dT4ZDBxz0A.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906KHDQOo0Wnp.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906SBJMHtGY8z.png
https://ithelp.ithome.com.tw/upload/images/20190918/201149065nOllJ9Zd7.png
https://ithelp.ithome.com.tw/upload/images/20190918/201149064cJIpI1CBk.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906vTs9Xfj0MX.png
https://ithelp.ithome.com.tw/upload/images/20190918/201149061WHMMA719e.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906QyTEpMWE8r.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906otXOgFs7E1.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906lFcJT3PHHu.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906aMfVMVZl8R.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906Q7vDg5gUdl.png
https://ithelp.ithome.com.tw/upload/images/20190918/201149062g5n2oT63h.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906Kna8Bwh4iT.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906TvfhfOFa7c.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906sWZCqsZl9U.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906rNn8DXJ2GU.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906BqmtoZpO2V.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906bLYruFv2ak.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906mXplv2WS9b.png
https://ithelp.ithome.com.tw/upload/images/20190918/20114906WAUzW6o7Cf.png

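Review

You can call .info() at any point to check what the dataset currently looks like.
You can also call .describe() to see summary statistics for the numeric columns.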
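
As a minimal sketch of these two checks, the snippet below runs them on a tiny made-up DataFrame. The rows and values are invented for illustration and are not taken from the Titanic data; only the column names follow the Kaggle dataset.

```python
import io

import pandas as pd

# Tiny invented sample with Titanic-style column names (not the real data).
df = pd.DataFrame({
    "Pclass": [1, 3, 2, 3],
    "Age": [38.0, None, 27.0, 4.0],
    "Fare": [71.28, 7.25, 13.0, None],
    "Ticket": ["PC 17599", "A/5 21171", "244373", "349909"],
})

# .info() reports dtypes and non-null counts per column.
# It prints by default; here it is captured into a string buffer first.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)

# .describe() summarizes numeric columns only (count, mean, std, quartiles).
summary = df.describe()
print(summary)

# Missing values per column, a quick complement to .info().
missing_counts = df.isna().sum()
print(missing_counts)
```

Re-running these after each feature engineering step makes it easy to spot a column that still has missing values or an unexpected dtype.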

4.1

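Fare has a single missing value to handle. When binning Fare, too few bins lose information and too many risk overfitting, so we use Pandas to split the fares into 4, 5, and 6 bins and compare the results with recursive feature elimination (RFE). The OOB score then indicates that 5 bins is the most suitable choice.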
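
The comparison described in 4.1 can be sketched roughly as follows. This is an illustration on synthetic data, not the notebook's code: the generated rows, the median fill for the missing fare, the `FareBin_k` column names, and the random forest settings are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in for the training data (NOT the real Titanic dataset).
df = pd.DataFrame({
    "Sex": rng.integers(0, 2, n),
    "Pclass": rng.integers(1, 4, n),
    "Fare": rng.gamma(2.0, 15.0, n),
})
df["Survived"] = ((df["Sex"] == 1) | (df["Fare"] > 50)).astype(int)
df.loc[0, "Fare"] = np.nan  # mimic the single missing fare

# Assumed strategy: fill the one missing fare with the median.
df["Fare"] = df["Fare"].fillna(df["Fare"].median())

# Quantile-based bins; labels=False yields integer codes 0..k-1.
bin_cols = []
for k in (4, 5, 6):
    col = f"FareBin_{k}"  # hypothetical column name
    df[col] = pd.qcut(df["Fare"], k, labels=False)
    bin_cols.append(col)

y = df["Survived"]

# RFE drops the weakest feature one at a time; ranking 1 = kept longest.
X_all = df[["Sex", "Pclass"] + bin_cols]
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=1,
)
selector.fit(X_all, y)
ranking = dict(zip(X_all.columns, selector.ranking_))

# Out-of-bag accuracy for each candidate binning, one model per bin count.
oob_scores = {}
for col in bin_cols:
    model = RandomForestClassifier(
        n_estimators=200, oob_score=True, random_state=0
    )
    model.fit(df[["Sex", "Pclass", col]], y)
    oob_scores[col] = model.oob_score_

print(ranking)
print(oob_scores)
```

The OOB score is convenient here because it estimates generalization accuracy from the samples each tree did not see, so no separate validation split is needed to compare the bin counts.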

4.2

Use Ticket to find relationships between passengers, and confirm that the OOB score increases.
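
One way to turn Ticket into a relationship feature is sketched below, assuming that passengers who share a ticket number travelled together. The sample rows, the feature names `TicketGroupSize` and `TicketMatesSurvival`, and the neutral value 0.5 for solo travellers are illustrative assumptions, not the notebook's exact implementation.

```python
import pandas as pd

# Invented rows for illustration (not the real Titanic data).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Ticket": ["113803", "113803", "A/5 21171", "347742", "347742"],
    "Survived": [1, 1, 0, 1, 0],
})

# Number of passengers sharing each ticket number.
df["TicketGroupSize"] = df.groupby("Ticket")["Ticket"].transform("count")

# Survival rate of the OTHER passengers on the same ticket.
# The passenger's own label is excluded to avoid leaking the target.
group_sum = df.groupby("Ticket")["Survived"].transform("sum")
others = df["TicketGroupSize"] - 1
mates_rate = (group_sum - df["Survived"]) / others

# Passengers travelling alone have no ticket mates; use a neutral 0.5.
df["TicketMatesSurvival"] = mates_rate.where(others > 0, 0.5)

print(df)
```

To confirm the feature helps, fit a `RandomForestClassifier(oob_score=True)` with and without the new column and compare the two `oob_score_` values.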

4.3

Examine how the missing Age values are distributed, then group passengers by age.
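
A rough sketch of that two-step idea follows: first check where the missing ages are concentrated, then fill and bin. The sample rows, the median fill, and the cut point of 16 years are assumptions chosen for illustration; the notebook may use a different fill strategy and different groups.

```python
import pandas as pd

# Invented rows for illustration (not the real Titanic data).
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Age": [38.0, 54.0, None, 4.0, None, None],
})

# Step 1: flag missing ages and see how they are spread across classes.
df["AgeMissing"] = df["Age"].isna().astype(int)
missing_rate_by_class = df.groupby("Pclass")["AgeMissing"].mean()
print(missing_rate_by_class)

# Step 2: fill the gaps (median is an assumed choice), then group by age.
age_filled = df["Age"].fillna(df["Age"].median())
df["AgeGroup"] = pd.cut(
    age_filled,
    bins=[0, 16, 100],          # hypothetical cut point at 16 years
    labels=["child", "adult"],
)
print(df)
```

If the missing ages cluster in one class, as they do in this toy sample, a plain overall median fill can blur real differences, which is why looking at the distribution first is worthwhile.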

Full code

https://github.com/eric999j/Kaggle-Titanic/blob/master/kaggle_titanic/kaggle-Titanic(Feature%20Selection0.82296).ipynb

References

https://medium.com/@yulongtsai/https-medium-com-yulongtsai-titanic-top3-8e64741cc11f

