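I've finished all five courses and officially earned the certificate for the Machine Learning with TensorFlow on Google Cloud Platform specialization. The later courses were... pretty tough. Looking back at the last one, Art and Science of Machine Learning, I took almost no notes. I'm also still a bit confused about actually using TensorFlow. So... I'll keep organizing my notes.
As before, I'm writing down the lines I think are worth remembering. Today is mainly about the Feature Engineering course.
First, the point they kept repeating: "You have to understand your data before you can do machine learning."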
As we said before, if you can't do basic analysis in your data, you can't do machine learning.
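I think this is one of the reasons I couldn't absorb the course well: I never looked at the example data properly. Back in my data mining course I did analyze the data first, but this time... I got lazy. Later I'll pull the examples from GitHub and play with them to get a better feel for the data.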
A good feature should:
- Have enough examples in the data
- Related to the objective
- Knowable at prediction time
- Be numeric with meaningful magnitude
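To make the "enough examples" item a bit more concrete, here is a minimal sketch that counts how many rows each value of a categorical feature has and flags the rare ones. The feature name `store_id`, the sample rows, and the threshold of 5 are all assumptions I picked for illustration, not something taken from the course labs.

```python
from collections import Counter

def rare_values(rows, feature, min_examples=5):
    """Return the values of `feature` that appear in fewer than `min_examples` rows."""
    counts = Counter(row[feature] for row in rows)
    return sorted(v for v, n in counts.items() if n < min_examples)

# Hypothetical training rows: 6 for store A, 5 for store B, 2 for store C.
rows = [{"store_id": "A"}] * 6 + [{"store_id": "B"}] * 5 + [{"store_id": "C"}] * 2

print(rare_values(rows, "store_id"))  # ['C']
```

Values flagged this way are candidates for grouping into an "other" bucket or for collecting more data before they are used as features.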
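Also, you can't train on values that only exist "right now" and then expect to predict the outcome for that same moment (this relates to one of the quiz questions):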
think of the timing nature for a lot of these things and what other systems could be involved.
So, if you go into your data warehouse for training, you can't use all the values for a customer's credit card history, because not all those values are going to be available at the same time.
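One way I picture the timing issue is as a point-in-time filter: only values that had already arrived by prediction time can be used as features. The sketch below assumes each record carries a hypothetical `available_at` timestamp; the field names and the sample history are made up for illustration.

```python
from datetime import datetime

def features_known_at(records, prediction_time):
    """Keep only the records whose values had already arrived by `prediction_time`."""
    return [r for r in records if r["available_at"] <= prediction_time]

# Hypothetical credit card history: each value notes when it reached the warehouse.
history = [
    {"name": "balance_jan", "value": 1200.0, "available_at": datetime(2018, 2, 1)},
    {"name": "balance_feb", "value": 900.0, "available_at": datetime(2018, 3, 1)},
    {"name": "balance_mar", "value": 1500.0, "available_at": datetime(2018, 4, 1)},
]

prediction_time = datetime(2018, 3, 15)
usable = features_known_at(history, prediction_time)
print([r["name"] for r in usable])  # ['balance_jan', 'balance_feb']
```

Applying the same cutoff when building the training set keeps training consistent with what will actually be available when the model serves predictions.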