[Day 4] Machine Learning Terminology

2019 iT 邦幫忙鐵人賽

Reference: Framing

Google's material here focuses specifically on supervised machine learning, because every dataset it uses comes with correct answers for the learning algorithm to learn from.

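Beyond that, there are also: unsupervised learning (no correct answers are given, yet the data still has to be grouped) and reinforcement learning (learning by interacting with an environment and gradually adjusting behavior based on the feedback received).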

A few terms will keep coming up from here on, so let me list them one by one:

Labels
The thing we want to predict, y. It could be an amount of money, true or false, or anything else we want to predict.
Features
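The input fields the prediction is based on, usually written as x. x is a vector, x1, x2, x3, ..., xn, where each component is one feature. You can feed the model a single feature to predict y, or give it many features at once.
If the task is to predict whether an email is spam, the features might include the sender's address, the message body, the time it was sent, and so on.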
Examples
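Examples come in two kinds: labeled examples and unlabeled examples. Labeled examples have a known label and are fed into the machine learning algorithm to train a model; unlabeled examples have no label and are fed into the trained model to get a prediction.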
- labeled examples: {features, label}: (x, y)
  
  housingMedianAge (feature)|totalRooms (feature)|medianHouseValue (label)
  
  15|5610|66900
  19|7650|80100
  17|720|85700
  14|1501|73400
  20|1454|65500
- unlabeled examples: {features, ?}: (x, ?) << no y (no label)
  
  housingMedianAge (feature)|totalRooms (feature)
  
  42|1686
  34|1226
  33|1077
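
To make the difference between the two kinds of examples concrete, here is a minimal Python sketch that stores the rows above as (x, y) pairs. The `Example` type and the `is_labeled` helper are names made up for this illustration, not part of any library; a missing label is represented as `None`, which is just one possible convention.

```python
from typing import NamedTuple, Optional, Tuple


class Example(NamedTuple):
    """One row of data: a feature vector x and an optional label y (hypothetical type)."""
    features: Tuple[float, ...]    # x = (housingMedianAge, totalRooms)
    label: Optional[float] = None  # y = medianHouseValue; None means "unknown"


# Labeled examples: both x and y are known, so they can be used for training.
labeled_examples = [
    Example((15, 5610), 66900),
    Example((19, 7650), 80100),
    Example((17, 720), 85700),
    Example((14, 1501), 73400),
    Example((20, 1454), 65500),
]

# Unlabeled examples: only x is known; y is what a trained model should predict.
unlabeled_examples = [
    Example((42, 1686)),
    Example((34, 1226)),
    Example((33, 1077)),
]


def is_labeled(example: Example) -> bool:
    """Return True when the example carries a known label."""
    return example.label is not None
```

Calling `is_labeled` on any row tells you whether it can go into training or has to wait for a model to predict its y.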
Model
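A model is trained to capture the relationship between features and labels, and is then used to infer the predicted value y' for unlabeled examples.
- regression model: predicts continuous values, i.e. the prediction is a number
- classification model: predicts discrete values, such as a category or true/false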
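
As a rough sketch of what "training" and "predicting y'" mean, the snippet below fits a straight line to the five labeled rows from the Examples entry using only one feature (housingMedianAge), then predicts y' for the three unlabeled rows. The ordinary least-squares fit is my choice for illustration, not something the reference prescribes, and five rows are far too few for a meaningful model. The function names and the 75000 threshold are hypothetical; the threshold rule is hand-written rather than trained and is there only to show that a classification output is discrete while a regression output is continuous.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training: labeled examples, feature = housingMedianAge, label = medianHouseValue.
ages = [15, 19, 17, 14, 20]
values = [66900, 80100, 85700, 73400, 65500]
slope, intercept = fit_line(ages, values)


def predict_value(age: float) -> float:
    """Regression-style output: a continuous number y'."""
    return slope * age + intercept


def is_above_threshold(age: float, threshold: float = 75000.0) -> bool:
    """Classification-style output: a discrete True/False (threshold is an assumption)."""
    return predict_value(age) > threshold


# Inference: unlabeled examples only have the feature, so the model supplies y'.
unlabeled_ages = [42, 34, 33]
predicted_values = [predict_value(age) for age in unlabeled_ages]
predicted_classes = [is_above_threshold(age) for age in unlabeled_ages]
```

`predicted_values` holds one float per unlabeled row, while `predicted_classes` holds one boolean per row, which is the whole difference between the two model types in terms of output.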