Boston Housing data set
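https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html
Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2 because of ethical concerns about the data set, so this page may no longer be available under /stable/.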
Census data set
The original Adult/Census Income data set:
https://www.kaggle.com/datasets/uciml/adult-census-income
A rather substantial update of this data set is proposed in this paper:
Ding, F., Hardt, M., Miller, J. and Schmidt, L., 2021. Retiring adult: New datasets for fair machine learning. Advances in Neural Information Processing Systems, 34.
https://github.com/zykls/folktables
Chakrabarty, N. and Biswas, S., 2018, October. A statistical approach to adult census income level prediction. In 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN) (pp. 207-212). IEEE.
COMPAS data set
Bias in recidivism prediction algorithm
Dataset: https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv
More info: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
https://github.com/propublica/compas-analysis
GitHub repo with useful Jupyter notebooks: https://github.com/violetyao/Re-Exploring-COMPAS-Bias
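A first look at the COMPAS file could be sketched as follows. This is only a minimal sketch, not ProPublica's analysis: it assumes the CSV has the columns race, decile_score and two_year_recid, it treats a decile score of 5 or more as "high" (an assumption made here for illustration), and it applies none of the row filtering described in the ProPublica write-up. The helper names read_rows, download_compas and rates_by_group are made up for this example.

```python
import csv
import io
import urllib.request
from collections import defaultdict

# URL copied from the dataset link above.
COMPAS_URL = (
    "https://raw.githubusercontent.com/propublica/compas-analysis/"
    "master/compas-scores-two-years.csv"
)


def read_rows(source):
    """Parse CSV text (or an open text file) into a list of dicts."""
    if isinstance(source, str):
        source = io.StringIO(source)
    return list(csv.DictReader(source))


def download_compas(url=COMPAS_URL):
    """Fetch and parse the CSV over HTTP (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return read_rows(response.read().decode("utf-8"))


def rates_by_group(rows, group_col="race", score_col="decile_score",
                   outcome_col="two_year_recid", high_score=5):
    """Per group: row count, share scored >= high_score, share who reoffended.

    The column names and the threshold are assumptions, not confirmed settings.
    """
    counts = defaultdict(lambda: {"n": 0, "high": 0, "recid": 0})
    for row in rows:
        c = counts[row[group_col]]
        c["n"] += 1
        c["high"] += int(int(row[score_col]) >= high_score)
        c["recid"] += int(row[outcome_col])
    return {
        group: {
            "n": c["n"],
            "high_rate": c["high"] / c["n"],
            "recid_rate": c["recid"] / c["n"],
        }
        for group, c in counts.items()
    }


if __name__ == "__main__":
    # Tiny made-up sample so the sketch runs without network access.
    sample = (
        "race,decile_score,two_year_recid\n"
        "A,7,1\nA,2,0\nB,3,0\nB,9,1\nB,1,0\n"
    )
    print(rates_by_group(read_rows(sample)))
```

To run it on the real file, replace the sample with rates_by_group(download_compas()). Comparing high_rate with recid_rate across groups gives only a rough starting point; the notebooks linked above go much further.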
Literature
The Competition and Consumer Protection Issues of Algorithms, Artificial Intelligence, and Predictive Analytics
https://www.ftc.gov/news-events/events/2018/11/ftc-hearing-7-competition-consumer-protection-issues-algorithms-artificial-intelligence-predictive
Using Artificial Intelligence and Algorithms
https://www.ftc.gov/business-guidance/blog/2020/04/using-artificial-intelligence-algorithms
IV Considerations for Companies in Using Big Data
https://www.ftc.gov/system/files/documents/reports/big-data-tool-inclusion-or-exclusion-understanding-issues/160106big-data-rpt.pdf
Here is what seems to me a nice non-mathematical explanation of re-identification, which we could use as the basis for an example of how data can be problematic:
https://www2.census.gov/about/training-workshops/2021/2021-05-07-das-presentation.pdf
Ping, H., Stoyanovich, J. and Howe, B., 2017, June. Datasynthesizer: Privacy-preserving synthetic datasets. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management (pp. 1-5).
Cardoso, R.L., Meira Jr, W., Almeida, V. and Zaki, M.J., 2019, January. A framework for benchmarking discrimination-aware models in machine learning. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 437-444).
Liu, D., Shafi, Z., Fleisher, W., Eliassi-Rad, T. and Alfeld, S., 2021, July. RAWLSNET: Altering Bayesian Networks to Encode Rawlsian Fair Equality of Opportunity. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 745-755).
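As a possible starting point for the re-identification example mentioned above, here is a small sketch of a linkage attack: a "de-identified" release is joined with a public register on shared quasi-identifiers. This is not the method from the Census presentation, only a simple illustration of the general idea. All records, names and the choice of quasi-identifiers (zip, birth_year, sex) are made up.

```python
from collections import Counter

# Attributes that appear in both tables (an assumed choice for this toy example).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

# "De-identified" release: names removed, sensitive attribute kept (made-up data).
released = [
    {"zip": "02138", "birth_year": 1945, "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02138", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
]

# Public register with names but no sensitive attribute (made-up data).
public = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "zip": "02138", "birth_year": 1945, "sex": "M"},
    {"name": "Carol Example", "zip": "02139", "birth_year": 1980, "sex": "F"},
]


def key(record):
    """Quasi-identifier combination of a record."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def reidentify(released, public):
    """Link people to released records when the combination is unique in both tables."""
    released_freq = Counter(key(r) for r in released)
    public_freq = Counter(key(p) for p in public)
    unique_released = {
        key(r): r for r in released if released_freq[key(r)] == 1
    }
    matches = {}
    for person in public:
        k = key(person)
        if public_freq[k] == 1 and k in unique_released:
            matches[person["name"]] = unique_released[k]["diagnosis"]
    return matches


if __name__ == "__main__":
    for name, diagnosis in reidentify(released, public).items():
        print(f"{name}: {diagnosis}")
```

In this toy data two of the three named people are linked to a diagnosis, while the third is not, because two released records share her combination of attributes. That is the intuition behind requiring several records per combination before releasing data.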