Draft:Dos and Don'ts of Machine Learning in Computer Security

Pitfalls of machine learning in computer security are a set of common errors and methodological deficiencies identified in the application of machine learning (ML) to computer security problems. According to academic sources, these pitfalls can lead to invalid conclusions, over-optimistic performance estimates, and systems that are ineffective or insecure in practice.[1]

The topic has been the subject of significant academic study, as the complex and adversarial nature of computer security creates unique challenges for standard ML workflows.[1][2] Researchers have categorized these pitfalls across the typical stages of an ML pipeline, from data collection to real-world deployment.[1]

Categorization of Pitfalls

A 2022 study by Daniel Arp et al., published at the USENIX Security Symposium, identified and analyzed ten distinct pitfalls by reviewing 30 papers from top-tier security conferences. The study reported that these issues were widespread, with the most common being sampling bias, data snooping, and lab-only evaluations.[1]

Ten Pitfalls in Machine Learning for Computer Security[1]
ML Workflow Stage | Pitfall (P) | Description | Prevalence in Study[1]
Data Collection and Labeling | P1: Sampling Bias | The collected data does not sufficiently represent the true data distribution. | 60%
Data Collection and Labeling | P2: Label Inaccuracy | Ground-truth labels are inaccurate, unstable, or erroneous. | 10%
System Design and Learning | P3: Data Snooping | The learning model is trained with information typically unavailable in practice. | 57%
System Design and Learning | P4: Spurious Correlations | Artifacts unrelated to the security problem create shortcut patterns for separating classes. | 20%
System Design and Learning | P5: Biased Parameter Selection | Final parameters indirectly depend on the test set, as they were not entirely fixed at training time. | 10%
Performance Evaluation | P6: Inappropriate Baseline | Evaluation is conducted without, or with limited, baseline methods. | 20%
Performance Evaluation | P7: Inappropriate Performance Measures | Chosen measures do not account for application constraints, such as imbalanced data. | 33%
Performance Evaluation | P8: Base Rate Fallacy | Large class imbalance is ignored when interpreting performance measures. | 10%
Deployment and Operation | P9: Lab-Only Evaluation | System is solely evaluated in a laboratory setting, without discussing practical limitations. | 47%
Deployment and Operation | P10: Inappropriate Threat Model | The security of machine learning itself is not considered, exposing the system to attacks. | 17%

Data Collection and Labeling

This stage involves acquiring and preparing data, which sources identify as a potential origin of subtle bias in security applications.[2]

Sampling bias (P1) occurs when the collected data does not reflect the real-world distribution of data. In security, this can arise when researchers rely on a limited number of public malware sources or mix data from incompatible sources.[1]

Label inaccuracy (P2) arises when ground-truth labels are incorrect or unstable. For example, malware labels from sources like VirusTotal can be inconsistent, and adversary behavior can shift over time, causing "label shift."[1]

System Design and Learning

This stage covers feature engineering and model training, where design choices can introduce information or artifacts that would not be available in a real deployment.

Data snooping (P3) is a common pitfall where a model is trained using information that would not be available in a real-world scenario.[1] This can happen by ignoring time dependencies (temporal snooping) or by cleansing the test set based on global knowledge (selective snooping).[2]
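
A minimal sketch of a chronological split that avoids temporal snooping is shown below; the DataFrame, its column names, and the classifier usage are illustrative assumptions rather than part of the cited study.

```python
# Minimal sketch of avoiding temporal snooping: samples are ordered by a
# timestamp and the model only ever sees data older than the test window.
# The DataFrame and column names ("first_seen", "label") are hypothetical.
import pandas as pd

def temporal_split(df: pd.DataFrame, timestamp_col: str = "first_seen",
                   train_fraction: float = 0.8):
    """Split a labeled dataset chronologically instead of at random."""
    df = df.sort_values(timestamp_col)
    cutoff = int(len(df) * train_fraction)
    return df.iloc[:cutoff], df.iloc[cutoff:]

# Usage with a hypothetical classifier and feature columns:
# train_df, test_df = temporal_split(df)
# clf.fit(train_df[feature_cols], train_df["label"])
# clf.score(test_df[feature_cols], test_df["label"])
```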

Spurious correlations (P4) result when a model learns to associate artifacts with a label, rather than the underlying security-relevant pattern. For example, a malware classifier might learn to identify a specific compiler artifact instead of malicious behavior itself.[1][2]
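
The effect can be illustrated with synthetic data: when a single artifact feature is almost perfectly correlated with the label, a classifier relies on it rather than on the security-relevant signal. The following sketch is a constructed example and is not taken from the cited papers.

```python
# Constructed example of a spurious correlation: an "artifact" column that
# leaks the label dominates the learned model's feature importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
behaviour = rng.normal(size=(n, 5))                   # security-relevant features
labels = (behaviour.sum(axis=1) > 0).astype(int)
artifact = labels + rng.normal(scale=0.01, size=n)    # dataset artifact, not behaviour
X = np.column_stack([behaviour, artifact])

clf = RandomForestClassifier(random_state=0).fit(X, labels)
print(clf.feature_importances_)   # the last (artifact) column dominates
```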

Biased parameter selection (P5) is a form of data snooping where model hyperparameters (e.g., decision thresholds) are tuned using the test set, which can lead to over-optimistic results.[1]
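
A minimal sketch of the recommended separation, using synthetic data, is to fix the decision threshold on a validation split carved out of the training data and to touch the held-out test set only once; the figures below (a roughly 1% target false-positive rate) are assumptions for illustration.

```python
# Sketch of fixing hyperparameters without the test set: the decision
# threshold is chosen on a validation split only. Data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out a test set that plays no role in any tuning decision.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Carve a validation split out of the remaining training data.
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

clf = LogisticRegression().fit(X_fit, y_fit)

# Pick a threshold targeting roughly a 1% false-positive rate on validation data.
val_scores = clf.predict_proba(X_val)[:, 1]
threshold = np.quantile(val_scores[y_val == 0], 0.99)

# The test set is used exactly once, after all parameters are fixed.
test_preds = (clf.predict_proba(X_test)[:, 1] >= threshold).astype(int)
```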

Performance Evaluation

This stage measures a model's performance; the choice of baselines and metrics determines how meaningful the reported results are.

Inappropriate baseline (P6) involves failing to compare a new model against simpler, well-established baselines. Researchers note that a complex deep learning model may not justify its overhead if it does not significantly outperform a simple logistic regression or non-ML heuristic.[1]
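
A hedged sketch of such a comparison on synthetic data is given below; the models, features, and scoring choice are illustrative assumptions.

```python
# Sketch of a baseline comparison: a more complex model is only worth its
# overhead if it clearly beats a simple baseline on the same data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

baseline = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           cv=5, scoring="f1").mean()
complex_model = cross_val_score(MLPClassifier(max_iter=500, random_state=0), X, y,
                                cv=5, scoring="f1").mean()

print(f"logistic regression F1: {baseline:.3f}")
print(f"neural network F1:      {complex_model:.3f}")
```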

Inappropriate performance measures (P7) means using metrics that do not align with the practical goals of the system. For instance, reporting only "accuracy" is often described as insufficient for an intrusion detection system, where false-positive rates are considered critically important.[1]
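
A small synthetic example shows why: a detector that never raises an alert reaches 99% accuracy on data with a 1% attack rate while detecting nothing. The numbers below are constructed for illustration.

```python
# Accuracy alone is misleading on imbalanced data: a detector that flags
# nothing still scores 99% accuracy while recall and precision are zero.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([1] * 10 + [0] * 990)   # 1% of samples are malicious
y_pred = np.zeros_like(y_true)            # detector that never alerts

print(accuracy_score(y_true, y_pred))                      # 0.99
print(recall_score(y_true, y_pred, zero_division=0))       # 0.0
print(precision_score(y_true, y_pred, zero_division=0))    # 0.0
```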

Base rate fallacy (P8) is a failure to correctly interpret performance in the context of large class imbalances. In tasks like intrusion detection, a 0.1% false-positive rate, while appearing low, could result in an unmanageably high number of false alerts in practice.[1]
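
The arithmetic can be made explicit with assumed figures for traffic volume, base rate, and detection rate; only the 0.1% false-positive rate comes from the example above.

```python
# Worked base-rate example: even a 0.1% false-positive rate overwhelms the
# true alerts when attacks are rare. Volume, base rate, and detection rate
# are assumptions for illustration.
events_per_day = 1_000_000   # assumed number of inspected events per day
base_rate = 0.0001           # assumed fraction of events that are attacks
tpr = 0.99                   # assumed detection (true-positive) rate
fpr = 0.001                  # the 0.1% false-positive rate from the text

attacks = events_per_day * base_rate          # 100 attacks
benign = events_per_day - attacks             # 999,900 benign events

true_alerts = tpr * attacks                   # ~99 correct alerts
false_alerts = fpr * benign                   # ~1,000 false alerts
precision = true_alerts / (true_alerts + false_alerts)
print(f"{false_alerts:.0f} false alerts per day, precision {precision:.1%}")  # ~9%
```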

Deployment and Operation

This final stage concerns the model's performance and security in a live environment.

Lab-only evaluation (P9) is the practice of evaluating a system only in a controlled, static laboratory setting, which does not account for real-world challenges like concept drift (where data distributions change over time) and performance overhead.[1]
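
A minimal way to surface concept drift in an offline setting is to train on an early time period and track performance on successive later windows; the sketch below assumes a labeled DataFrame with a time-period column, which is not part of the cited study.

```python
# Sketch of a drift check: train on the earliest period, then score each
# later period separately so a gradual performance drop becomes visible.
# The DataFrame, feature columns, and "month"/"label" names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def drift_curve(df: pd.DataFrame, feature_cols, label_col="label", period_col="month"):
    """Return F1 per time period for a model trained on the first period."""
    periods = sorted(df[period_col].unique())
    train = df[df[period_col] == periods[0]]
    clf = RandomForestClassifier(random_state=0).fit(train[feature_cols], train[label_col])
    return {p: f1_score(df.loc[df[period_col] == p, label_col],
                        clf.predict(df.loc[df[period_col] == p, feature_cols]))
            for p in periods[1:]}
```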

Inappropriate threat model (P10) refers to failing to consider the ML system itself as an attack surface. This includes vulnerability to adversarial attacks (e.g., evasion attacks) that are specifically designed to fool the model.[1]
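
A constructed sketch of an evasion attack against a linear detector illustrates the concern: features of a flagged sample are shifted against the model's weight vector until it is scored as benign. The model, data, and attack steps below are assumptions, and real feature spaces are typically harder to manipulate.

```python
# Toy evasion attack on a linear model: perturb a detected sample against the
# weight vector until the classifier labels it benign. Entirely synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X.sum(axis=1) > 0).astype(int)           # synthetic "malicious" label
clf = LogisticRegression().fit(X, y)

sample = X[np.argmax(clf.decision_function(X))].copy()   # confidently flagged sample
original = sample.copy()

w = clf.coef_[0]
step = 0.1 * w / np.linalg.norm(w)
while clf.predict(sample.reshape(1, -1))[0] == 1:
    sample -= step                            # move against the weight vector

print("evaded with a perturbation of size", np.linalg.norm(sample - original))
```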

References

  1. Arp, Daniel; Quiring, Erwin; Pendlebury, Feargus; Warnecke, Alexander; Pierazzi, Fabio; Wressnegger, Christian; Cavallaro, Lorenzo; Rieck, Konrad (2022). "Dos and Don'ts of Machine Learning in Computer Security" (PDF). 31st USENIX Security Symposium (USENIX Security 22). USENIX Association. pp. 207–224. ISBN 978-1-939133-31-1. Retrieved 10 November 2025.
  2. Arp, Daniel; Quiring, Erwin; Pendlebury, Feargus; Warnecke, Alexander; Pierazzi, Fabio; Wressnegger, Christian; Cavallaro, Lorenzo; Rieck, Konrad (2023). "Taking the Red Pill: Lessons Learned on Machine Learning for Computer Security" (PDF). IEEE Security & Privacy. 21 (5). IEEE: 72–77. doi:10.1109/MSEC.2023.3287207. Retrieved 10 November 2025.