Data sets for teaching ethical AI


Administrative information

Boston Housing data set

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html
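The scikit-learn loader linked above was deprecated in version 1.0 and removed in 1.2, in part because of an ethical problem with the column B. The data set's own description defines it as B = 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town. The minimal sketch below (function names are hypothetical, chosen for illustration) shows that this engineered feature is a parabola centred on 0.63, so towns on either side of 0.63 can receive the same B value and B alone does not identify Bk.

```python
import math


def b_feature(bk):
    """Boston Housing column B: 1000 * (Bk - 0.63)**2, where Bk is the
    proportion of Black residents in the town (0 <= Bk <= 1)."""
    return 1000 * (bk - 0.63) ** 2


def candidate_proportions(b):
    """Return every proportion Bk in [0, 1] that maps to the given B value.

    Because B is a parabola around 0.63, there can be two candidates,
    which is why B cannot be inverted to a unique Bk.
    """
    offset = math.sqrt(b / 1000)
    # Round to avoid tiny floating-point errors, then keep valid proportions.
    candidates = {round(0.63 - offset, 6), round(0.63 + offset, 6)}
    return sorted(c for c in candidates if 0 <= c <= 1)
```

For example, `candidate_proportions(10.0)` returns `[0.53, 0.73]`: two towns with quite different demographics receive the same B value.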

Census data set

The Adult/Census Income data set:

https://www.kaggle.com/datasets/uciml/adult-census-income

A substantial update of this data set is presented in this paper:

Ding, F., Hardt, M., Miller, J. and Schmidt, L., 2021. Retiring adult: New datasets for fair machine learning. Advances in Neural Information Processing Systems, 34.

https://github.com/zykls/folktables

Chakrabarty, N. and Biswas, S., 2018, October. A statistical approach to adult census income level prediction. In 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN) (pp. 207-212). IEEE.
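A common first fairness check on the Adult data is to compare how often each group receives the positive label. The sketch below uses only the Python standard library and is not taken from the papers above; it assumes rows are dictionaries (for example from `csv.DictReader`) and that the Kaggle file uses a `sex` column and an `income` column with the values `>50K` and `<=50K`. The function names are hypothetical.

```python
from collections import defaultdict


def positive_rates(rows, group_col, label_col, positive):
    """Share of rows whose label equals `positive`, per value of `group_col`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        group = row[group_col]
        totals[group] += 1
        if row[label_col] == positive:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def demographic_parity_difference(rates):
    """Largest minus smallest positive rate across groups (0 means parity)."""
    return max(rates.values()) - min(rates.values())


def disparate_impact_ratio(rates):
    """Smallest over largest positive rate. Values below 0.8 fail the
    'four-fifths rule', a rough screen for adverse impact."""
    highest = max(rates.values())
    # If no group ever gets the positive label, all rates are equal (0).
    return min(rates.values()) / highest if highest > 0 else 1.0
```

With the Kaggle file saved locally (the name `adult.csv` is an assumption), `positive_rates(list(csv.DictReader(open("adult.csv", newline=""))), "sex", "income", ">50K")` gives the share of high earners per sex, which the other two functions summarise in a single number.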

COMPAS data set

Bias in a recidivism prediction algorithm

Dataset: https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv

More info: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

Analysis code and data: https://github.com/propublica/compas-analysis

GitHub repository with useful Jupyter notebooks: https://github.com/violetyao/Re-Exploring-COMPAS-Bias
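The two-year COMPAS file can be explored with a short script. The minimal sketch below uses only the Python standard library and assumes the CSV columns `race`, `decile_score` and `two_year_recid`. It computes a false positive rate per group: among defendants who did not reoffend within two years, the share who received a decile score of 5 or higher (ProPublica's "medium" or "high" risk). ProPublica's own analysis also filtered rows first (for example on `days_b_screening_arrest`); that step is omitted here, so the numbers will not exactly match the published ones. The function names are hypothetical.

```python
import csv
import io
import urllib.request
from collections import defaultdict

COMPAS_URL = ("https://raw.githubusercontent.com/propublica/"
              "compas-analysis/master/compas-scores-two-years.csv")


def load_compas(url=COMPAS_URL):
    """Download the two-year COMPAS CSV and return its rows as dictionaries."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8", errors="replace")
    return list(csv.DictReader(io.StringIO(text)))


def false_positive_rates(rows, threshold=5):
    """Per race: share of people who did NOT reoffend within two years
    but were scored at or above `threshold` (i.e. flagged as riskier)."""
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for row in rows:
        if int(row["two_year_recid"]) == 0:  # actual negatives only
            race = row["race"]
            negatives[race] += 1
            if int(row["decile_score"]) >= threshold:
                flagged[race] += 1
    return {race: flagged[race] / n for race, n in negatives.items()}
```

For example, `false_positive_rates(load_compas())` downloads the file and returns a dictionary keyed by the values of the `race` column, such as `African-American` and `Caucasian`; comparing those two rates reproduces the kind of disparity ProPublica reported.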

Literature

The Competition and Consumer Protection Issues of Algorithms, Artificial Intelligence, and Predictive Analytics

https://www.ftc.gov/news-events/events/2018/11/ftc-hearing-7-competition-consumer-protection-issues-algorithms-artificial-intelligence-predictive

Using Artificial Intelligence and Algorithms

https://www.ftc.gov/business-guidance/blog/2020/04/using-artificial-intelligence-algorithms

Big Data: A Tool for Inclusion or Exclusion? Understanding the Issues, Section IV: Considerations for Companies in Using Big Data

https://www.ftc.gov/system/files/documents/reports/big-data-tool-inclusion-or-exclusion-understanding-issues/160106big-data-rpt.pdf

The following presentation seems to me a good non-mathematical explanation of reidentification, which we could use as the basis for one example of how data can be problematic:

https://www2.census.gov/about/training-workshops/2021/2021-05-07-das-presentation.pdf

Ping, H., Stoyanovich, J. and Howe, B., 2017, June. DataSynthesizer: Privacy-preserving synthetic datasets. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management (pp. 1-5).

Cardoso, R.L., Meira Jr, W., Almeida, V. and Zaki, M.J., 2019, January. A framework for benchmarking discrimination-aware models in machine learning. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 437-444).

Liu, D., Shafi, Z., Fleisher, W., Eliassi-Rad, T. and Alfeld, S., 2021, July. RAWLSNET: Altering Bayesian Networks to Encode Rawlsian Fair Equality of Opportunity. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 745-755).
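The reidentification risk described in the census presentation can be made concrete for students. Reidentification typically works by linking a released table to another source through quasi-identifiers such as postcode, birth year and sex. The minimal sketch below is not taken from the presentation; it counts how many records are unique on a chosen set of columns and reports the smallest group size (the k in k-anonymity). The function and column names are hypothetical.

```python
from collections import Counter


def _keys(records, quasi_identifiers):
    """Tuple of quasi-identifier values for every record."""
    return [tuple(record[q] for q in quasi_identifiers) for record in records]


def uniqueness_rate(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs only once.
    Such records can be singled out by anyone who knows those attributes."""
    keys = _keys(records, quasi_identifiers)
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group sharing a quasi-identifier combination;
    the table is k-anonymous for this k (0 for an empty table)."""
    counts = Counter(_keys(records, quasi_identifiers))
    return min(counts.values()) if counts else 0
```

Adding or removing columns from `quasi_identifiers` shows how quickly records become unique as more attributes are combined, even when names have been removed.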

More information

See the overview of all lesson plans of the Human-Centred AI Masters programme.

Please visit the home page of the HCAIM consortium.

Acknowledgements

The Human-Centered AI Masters programme was co-financed by the Connecting Europe Facility of the European Union under Grant №CEF-TC-2020-1 Digital Skills 2020-EU-IA-0068.

The materials of this learning event are available under CC BY-NC-SA 4.0.

The HCAIM consortium consists of three excellence centres, three SMEs and four universities.


  • The arrangement Data sets for teaching ethical AI was created with Wikiwijs by Kennisnet. Wikiwijs is the education platform where you find, create and share learning materials.

    Last modified
    2024-02-14 19:41:06
    License

    This learning material is published under the Creative Commons Attribution-ShareAlike 4.0 International license. This means that, provided you give attribution and publish under the same license, you are free to:

    • share the work - copy, distribute and transmit it in any medium or file format
    • adapt the work - remix it, change it and make derivative works
    • for any purpose, including commercial purposes.

    More information about the CC Attribution-ShareAlike 4.0 International license.

    Additional information about this learning material

    The following additional information is available for this learning material:

    Explanation
    .
    End user
    pupil/student
    Difficulty level
    intermediate
    Study load
    4 hours and 0 minutes

    Wikiwijs arrangements used

    HCAIM Consortium. (n.d.).

    Acknowledgement

    https://maken.wikiwijs.nl/198386/Acknowledgement

    HCAIM Consortium. (n.d.).

    Interactive session: Data architecture

    https://maken.wikiwijs.nl/200289/Interactive_session__Data_architecture