CWE-1039

Inadequate Detection or Handling of Adversarial Input Perturbations in Automated Recognition Mechanism

The product uses an automated mechanism such as machine learning to recognize complex data inputs (e.g. image or audio) as a particular concept or category, but it does not properly detect or handle inputs that have been modified or constructed in a way that causes the mechanism to detect a different, incorrect concept.

Mitigation

Phase: Architecture and Design

Description:

  • Algorithmic modifications such as model pruning or compression can help mitigate this weakness. Model pruning ensures that only the weights most relevant to the task are used during inference on incoming data, and it has shown resilience to adversarially perturbed data (see the sketch below).
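
A minimal sketch of magnitude-based pruning, assuming a PyTorch model; the layer sizes, the 30% pruning ratio, and the use of torch.nn.utils.prune are illustrative assumptions, not part of the CWE guidance.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Illustrative stand-in for the production recognition model.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Zero out the 30% of weights with the smallest L1 magnitude in each
# Linear layer, so only the weights most relevant to the task remain
# active during inference.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weights
```
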
Mitigation

Phase: Architecture and Design

Description:

  • Consider implementing adversarial training, a method that introduces adversarial examples into the training data to promote the algorithm's robustness at inference time.
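
One common form of adversarial training mixes FGSM-perturbed examples into each training step. A hedged sketch, assuming a PyTorch classifier with inputs scaled to [0, 1]; the epsilon value is an illustrative assumption.

```python
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    # Perturb x in the direction that most increases the loss (FGSM).
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def train_step(model, optimizer, x, y, epsilon=0.03):
    # Train on both the clean batch and its adversarial counterpart so
    # the model stays accurate on perturbed inputs at inference time.
    x_adv = fgsm_example(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```
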
Mitigation

Phase: Architecture and Design

Description:

  • Consider implementing model hardening to fortify the internal structure of the algorithm, including techniques such as regularization and optimization that desensitize the algorithm to minor input perturbations.
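
A brief sketch of two standard hardening knobs, assuming a PyTorch training setup with label-smoothing support in cross_entropy (PyTorch 1.10+); the model and coefficients are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(784, 10)  # stand-in for the real recognition network

# L2 regularization (weight decay) discourages large weights, which
# tends to make the model less sensitive to small input changes.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

def hardened_loss(logits, targets):
    # Label smoothing softens the targets, desensitizing the model to
    # minor perturbations near decision boundaries.
    return F.cross_entropy(logits, targets, label_smoothing=0.1)
```
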
Mitigation

Phase: Implementation

Description:

  • Consider implementing multiple models or using model-ensembling techniques to compensate for the weaknesses of individual models against adversarial input perturbations.
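
A minimal ensembling sketch, assuming several independently trained PyTorch classifiers that share an output space; averaging their softmax outputs means a perturbation must fool most members at once rather than any single model.

```python
import torch

def ensemble_predict(models, x):
    # Average the class probabilities of all ensemble members; the
    # final label reflects their consensus rather than any one model.
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0).argmax(dim=-1)
```
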
Mitigation

Phase: Implementation

Description:

  • Incorporate uncertainty estimates into the algorithm that trigger human intervention or secondary/fallback software when a threshold is crossed, for example when inference predictions and confidence scores are abnormally high or low relative to expected model performance.
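
A hedged sketch of a confidence-based escalation gate, assuming a PyTorch classifier scored one input at a time; the threshold values and the fallback callable are illustrative assumptions to be tuned against the model's expected performance.

```python
import torch

CONF_LOW, CONF_HIGH = 0.60, 0.999  # illustrative thresholds

def predict_or_escalate(model, x, fallback):
    # Assumes a single input (batch size 1).
    with torch.no_grad():
        probs = model(x).softmax(dim=-1)
    confidence, label = probs.max(dim=-1)
    # Abnormally low or suspiciously high confidence relative to
    # expected model performance hands the input to a human reviewer
    # or to secondary/fallback software.
    if not CONF_LOW <= confidence.item() <= CONF_HIGH:
        return fallback(x)
    return int(label)
```
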
Mitigation

Phase: Integration

Description:

  • Reactive defenses such as input sanitization, defensive distillation, and input transformations can all be applied before input data reaches the algorithm for inference.
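
A small sketch of input transformations applied before inference, assuming image tensors scaled to [0, 1] and the torchvision library; the blur kernel and bit depth are illustrative feature-squeezing choices.

```python
import torch
import torchvision.transforms.functional as TF

def sanitize(x):
    # Gaussian blur and bit-depth reduction ("feature squeezing") both
    # dampen the small, high-frequency perturbations that adversarial
    # examples typically rely on.
    x = TF.gaussian_blur(x, kernel_size=3)
    x = torch.round(x * 31) / 31  # squeeze to 5-bit color depth
    return x.clamp(0.0, 1.0)
```
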
Mitigation

Phase: Integration

Description:

  • Consider reducing the output granularity of the inference/prediction so that attackers cannot exploit leaked information to craft adversarially perturbed data.
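
A short sketch of a coarsened inference response, assuming a PyTorch classifier behind an inference API scored one input at a time; the rounding precision is an illustrative assumption.

```python
import torch

def coarse_output(model, x):
    # Assumes a single input (batch size 1).
    with torch.no_grad():
        probs = model(x).softmax(dim=-1)
    confidence, label = probs.max(dim=-1)
    # Return only the top label and a coarsely rounded score instead of
    # the full probability vector, denying attackers the fine-grained
    # signal needed to craft perturbations.
    return {"label": int(label), "confidence": round(float(confidence), 1)}
```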

No CAPEC attack patterns related to this CWE.
