Last time we covered few-shot learning; today we turn to another problem: data bias.
Modern society has come to value human rights, including gender and racial equality, so we also expect AI algorithms to be free of racial discrimination, gender bias, and the like. However, the data we collect shapes what an algorithm learns: if certain races or genders are under-represented in the data, the algorithm may learn distorted patterns. For example, a loan-approval algorithm might require Black applicants to hold more savings than white applicants. The related research topic is how to build a fair (Fairness) algorithm. One line of work treats the collected data specially, for instance weighting samples with respect to sensitive attributes such as gender, age, wealth, or race, so that the algorithm does not learn a biased view; see [1-5] for details.
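As a concrete illustration of the "weight the data with respect to sensitive attributes" idea, here is a minimal Python sketch of the classic reweighing preprocessing scheme. This is one common approach among many, not the specific method of [1-5], and the function name `reweighing_weights` and the toy loan data are hypothetical. Each sample receives the weight P(S=s)P(Y=y)/P(S=s,Y=y), so under-represented (group, label) combinations are up-weighted until the sensitive attribute and the label look statistically independent to the learner.

```python
import numpy as np

def reweighing_weights(sensitive, labels):
    """Per-sample weights that make the sensitive attribute S and the
    label Y look independent: weight(s, y) = P(S=s) * P(Y=y) / P(S=s, Y=y).
    (A sketch of the reweighing preprocessing idea; names are illustrative.)
    """
    sensitive = np.asarray(sensitive)
    labels = np.asarray(labels)
    n = len(labels)
    weights = np.empty(n, dtype=float)
    for s in np.unique(sensitive):
        for y in np.unique(labels):
            mask = (sensitive == s) & (labels == y)
            p_joint = mask.sum() / n
            if p_joint == 0:
                continue  # this (group, label) combination is absent
            p_s = (sensitive == s).sum() / n
            p_y = (labels == y).sum() / n
            # Under-represented combinations get weight > 1, over-represented < 1.
            weights[mask] = (p_s * p_y) / p_joint
    return weights

# Hypothetical toy loan data: sensitive = group membership,
# labels = loan approved (1) / denied (0).
sensitive = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
labels    = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
print(reweighing_weights(sensitive, labels))
```

The resulting weights can then be passed as `sample_weight` to the `fit()` method of a standard classifier, so the model trains as if approvals were distributed evenly across groups.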
[1]: Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems (pp. 3315-3323).
[2]: Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (pp. 214-226).
[3]: Zafar, M. B., Valera, I., Gomez Rodriguez, M., & Gummadi, K. P. (2017). Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web (pp. 1171-1180).
[4]: Skirpan, M., & Gorelick, M. (2017). The authority of "fair" in machine learning. arXiv preprint arXiv:1706.09976.
[5]: Woodworth, B., Gunasekar, S., Ohannessian, M. I., & Srebro, N. (2017). Learning non-discriminatory predictors. arXiv preprint arXiv:1702.06081.