Last time we covered few-shot learning; today we turn to another problem: data bias.
Modern society has come to value human rights, including gender and racial equality, so we also expect AI algorithms to be free of racial discrimination, gender bias, and the like. However, the data we collect shapes what an algorithm learns: if certain races or genders are under-represented in the data, the algorithm may learn distorted patterns. For example, a loan-approval algorithm might require Black applicants to hold more savings than white applicants. The related research topic is how to build a fair (Fairness) algorithm. One line of work treats the collected data specially, for instance weighting samples with respect to sensitive attributes such as gender, age, wealth, or race, so that the algorithm does not learn a biased view; see [1-5] for details.
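As a concrete illustration of the "weight the data with respect to sensitive attributes" idea, here is a minimal Python sketch of the classic reweighing preprocessing scheme. This is one common approach among many, not the specific method of [1-5], and the function name `reweighing_weights` and the toy loan data are hypothetical. Each sample receives the weight P(S=s)P(Y=y)/P(S=s,Y=y), so under-represented (group, label) combinations are up-weighted until the sensitive attribute and the label look statistically independent to the learner.

```python
import numpy as np

def reweighing_weights(sensitive, labels):
    """Per-sample weights that make the sensitive attribute S and the
    label Y look independent: weight(s, y) = P(S=s) * P(Y=y) / P(S=s, Y=y).
    (A sketch of the reweighing preprocessing idea; names are illustrative.)
    """
    sensitive = np.asarray(sensitive)
    labels = np.asarray(labels)
    n = len(labels)
    weights = np.empty(n, dtype=float)
    for s in np.unique(sensitive):
        for y in np.unique(labels):
            mask = (sensitive == s) & (labels == y)
            p_joint = mask.sum() / n
            if p_joint == 0:
                continue  # this (group, label) combination is absent
            p_s = (sensitive == s).sum() / n
            p_y = (labels == y).sum() / n
            # Under-represented combinations get weight > 1, over-represented < 1.
            weights[mask] = (p_s * p_y) / p_joint
    return weights

# Hypothetical toy loan data: sensitive = group membership,
# labels = loan approved (1) / denied (0).
sensitive = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
labels    = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
print(reweighing_weights(sensitive, labels))
```

The resulting weights can then be passed as `sample_weight` to the `fit()` method of a standard classifier, so the model trains as if approvals were distributed evenly across groups.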
[1]: Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems (pp. 3315-3323).
[2]: Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (pp. 214-226).
[3]: Zafar, M. B., Valera, I., Gomez Rodriguez, M., & Gummadi, K. P. (2017). Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web (pp. 1171-1180).
[4]: Skirpan, M., & Gorelick, M. (2017). The authority of "fair" in machine learning. arXiv preprint arXiv:1706.09976.
[5]: Woodworth, B., Gunasekar, S., Ohannessian, M. I., & Srebro, N. (2017). Learning non-discriminatory predictors. arXiv preprint arXiv:1702.06081.