Administrative information

Title	Evasion and Poisoning of Machine Learning Models
Duration	90 min
Module	B
Lesson Type	Practical
Focus	Ethical - Trustworthy AI
Topic	Evasion and Poisoning of Machine Learning

Keywords

Adversarial example, Backdoor, Robustness, ML security audit

Learning Goals

Gain practical skills in auditing the robustness of machine learning models
Learn how to implement evasion (adversarial examples) and poisoning/backdoor attacks
Evaluate the model degradation caused by these attacks

Expected Preparation

Learning Events to be Completed Before

Obligatory for Students

Python
Scikit
Pandas
ART
virtual-env
Backdoors
Poisoning
Adversarial examples
Model evaluation

Optional for Students

None.

References and background for students:

Recommended for Teachers

Trustworthy Machine Learning

Lesson Materials

The materials of this learning event are available under CC BY-NC-SA 4.0.

Instructions for Teachers

Machine learning (ML) models are increasingly trusted to make decisions in a wide range of areas, so the safety of systems that rely on them has become a growing concern. In particular, ML models are often trained on data from potentially untrustworthy sources, which gives adversaries the opportunity to manipulate them by inserting carefully crafted samples into the training set. Recent work has shown that this type of attack, called a poisoning attack, allows adversaries to insert backdoors (also called trojans) into the model. At inference time, the adversary can then trigger malicious behavior with a simple external backdoor trigger, without needing direct access to the model itself (a black-box attack).

As an illustration, suppose that the adversary wants to create a backdoor on images so that all images carrying the backdoor are misclassified into a certain target class. For example, the adversary adds a special symbol (called a trigger) to each image of a “stop sign”, relabels these images as “yield sign”, and adds them to the training data. As a result, the model trained on this modified dataset learns that any image containing the trigger should be classified as “yield sign”, no matter what the image actually shows. If such a backdoored model is deployed, the adversary can easily fool the classifier and cause accidents by putting the trigger on any real road sign.
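
The poisoning step described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the lab's reference solution: the helper names `add_trigger` and `poison_dataset`, the 3x3 corner patch, the `fraction` parameter, and the toy labels (0 for "stop", 1 for "yield") are all assumptions made for this example. In the lab itself you would more likely operate on MNIST arrays with the tools listed under Expected Preparation.

```python
import random


def add_trigger(image, size=3, value=1.0):
    """Return a copy of a 2-D image (list of rows) with a square patch
    of `value` stamped into the bottom-right corner (the hypothetical trigger)."""
    stamped = [row[:] for row in image]  # copy rows so the original is untouched
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, source_label, target_label,
                   fraction=0.1, seed=0):
    """Append triggered copies of some source-class images, relabelled
    as the target class, to the training set."""
    rng = random.Random(seed)
    source_idx = [i for i, y in enumerate(labels) if y == source_label]
    n_poison = max(1, int(fraction * len(source_idx))) if source_idx else 0
    chosen = rng.sample(source_idx, n_poison)
    poisoned_images = list(images) + [add_trigger(images[i]) for i in chosen]
    poisoned_labels = list(labels) + [target_label] * len(chosen)
    return poisoned_images, poisoned_labels


# Toy data: ten blank 28x28 images, five labelled 0 ("stop"), five labelled 1 ("yield").
images = [[[0.0] * 28 for _ in range(28)] for _ in range(10)]
labels = [0] * 5 + [1] * 5

p_images, p_labels = poison_dataset(
    images, labels, source_label=0, target_label=1, fraction=0.4
)
print(len(images), "clean samples ->", len(p_images), "samples after poisoning")
```

A model trained on `p_images` and `p_labels` sees the trigger only together with the target label, which is what lets it learn the association between the patch and the target class.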

Adversarial examples are specialised inputs created with the purpose of confusing a neural network so that it misclassifies them. To the human eye, these inputs are nearly indistinguishable from the originals, yet they cause the network to fail to identify the contents of the image. There are several types of such attacks; here the focus is on the fast gradient sign method (FGSM), an untargeted attack whose goal is to cause misclassification into any class other than the true one. It is also a white-box attack, which means that the attacker has complete access to the parameters of the model being attacked in order to construct an adversarial example.
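
To make the fast gradient sign method concrete, the sketch below applies it to a tiny logistic-regression classifier instead of a neural network, so that the input gradient can be written by hand. The weights, the input, and the value of `epsilon` are made up for illustration, and the perturbation is far larger than one would use on real images. The update itself is the standard one: move every input feature by `epsilon` in the direction of the sign of the loss gradient with respect to the input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, y, epsilon):
    """Untargeted FGSM: x_adv = clip(x + epsilon * sign(dL/dx), 0, 1).

    For logistic regression with cross-entropy loss L, the input gradient
    is dL/dx = (p - y) * w, where p is the predicted probability of class 1.
    White-box access is needed because the attack reads the weights directly.
    """
    p = predict_proba(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    x_adv = [xi + epsilon * sign(g) for xi, g in zip(x, grad)]
    return [min(1.0, max(0.0, v)) for v in x_adv]  # stay in the valid pixel range


# Hypothetical model parameters and input (features scaled to [0, 1]).
weights = [2.0, -3.0, 1.0]
bias = -0.5
x = [0.9, 0.1, 0.6]
y = 1  # true label

x_adv = fgsm(weights, bias, x, y, epsilon=0.4)
print("clean prediction:      ", round(predict_proba(weights, bias, x), 3))
print("adversarial prediction:", round(predict_proba(weights, bias, x_adv), 3))
```

With these toy numbers the clean input is assigned to class 1 and the perturbed input falls on the other side of the 0.5 decision threshold, while no feature changes by more than `epsilon`.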

The goal of this laboratory exercise is to show how the robustness of ML models can be audited against evasion and data poisoning attacks and how these attacks influence model quality. A follow-up learning event covers mitigating these threats: Practical: Enhancing ML security and robustness.

Outline

In this lab session, you will recreate security risks for AI vision models and also mitigate the attack. Specifically, students will:

Train two machine learning models on the popular MNIST dataset.
Craft adversarial examples against both models and evaluate them on both the targeted model and the other model, in order to measure the transferability of adversarial samples.
Poison a classification model during its training phase with backdoored inputs.
Study how the poisoning influences model accuracy.
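
The evaluation steps in this outline come down to a few counts. The sketch below shows one possible way to compute them; the helper names `accuracy` and `transfer_rate`, the two threshold "models", and the hard-coded adversarial inputs are hypothetical stand-ins for the MNIST classifiers and crafted examples used in the lab. Transferability is measured here as the share of adversarial inputs fooling the source model that also fool the other model, which is one common definition among several.

```python
def accuracy(predict, inputs, labels):
    """Fraction of inputs for which predict(x) equals the true label."""
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels)


def transfer_rate(source_predict, target_predict, adv_inputs, labels):
    """Among adversarial inputs that fool the source model, the fraction
    that also fool the target model."""
    fooled_source = [(x, y) for x, y in zip(adv_inputs, labels)
                     if source_predict(x) != y]
    if not fooled_source:
        return 0.0
    fooled_both = sum(1 for x, y in fooled_source if target_predict(x) != y)
    return fooled_both / len(fooled_source)


# Two toy one-feature classifiers with slightly different decision thresholds.
def model_a(x):
    return int(x[0] > 0.5)


def model_b(x):
    return int(x[0] > 0.4)


clean_inputs = [[0.1], [0.3], [0.7], [0.9]]
labels = [0, 0, 1, 1]
# Adversarial inputs assumed to have been crafted against model_a.
adv_inputs = [[0.35], [0.55], [0.45], [0.65]]

print("model A, clean accuracy:      ", accuracy(model_a, clean_inputs, labels))
print("model A, adversarial accuracy:", accuracy(model_a, adv_inputs, labels))
print("model B, adversarial accuracy:", accuracy(model_b, adv_inputs, labels))
print("transfer rate A -> B:         ",
      transfer_rate(model_a, model_b, adv_inputs, labels))
```

The same `accuracy` helper can be reused for the poisoning part of the lab: compare accuracy on clean test data before and after poisoning, and separately measure how often test inputs carrying the trigger are assigned to the attacker's target class.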

Students will form groups of two and work as a team. One group has to hand in only one documentation/solution.

More information

Click here for an overview of all lesson plans of the master human centred AI

Please visit the home page of the consortium HCAIM

Acknowledgements

The Human-Centered AI Masters programme was co-financed by the Connecting Europe Facility of the European Union under Grant №CEF-TC-2020-1 Digital Skills 2020-EU-IA-0068.


The HCAIM consortium consists of three excellence centres, three SMEs and four Universities

Colophon
The arrangement Practical: Apply auditing frameworks was created with Wikiwijs by Kennisnet. Wikiwijs is the educational platform where you find, create, and share learning materials.

Author

HCAIM Consortium

Last modified

15-05-2024 11:17:20

License

This learning material is published under the Creative Commons Attribution-ShareAlike 4.0 International license. This means that, provided you give attribution and publish under the same license, you are free to:

share the work - copy, distribute, and transmit it in any medium or file format

adapt the work - remix, transform, and create derivative works

for any purpose, including commercial purposes.

More information about the CC Attribution-ShareAlike 4.0 International license.

Additional information about this learning material

The following additional information is available for this learning material:

End user

student

Difficulty level

intermediate

Study load

4 hours 0 minutes

Wikiwijs arrangements used

HCAIM Consortium. (n.d.). Acknowledgement. https://maken.wikiwijs.nl/198386/Acknowledgement

HCAIM Consortium. (n.d.). Lecture: Risk & Risk mitigation. https://maken.wikiwijs.nl/200139/Lecture__Risk___Risk_mitigation
