Abstract
Background: Phishing is a common form of cybercrime, also regarded as a
social crime, that has persisted for more than two
decades. Phishing aims to trick users into revealing their private information,
including banking information, passwords, and account credentials. Phishing
remains a real threat and usually occurs via instant messages, email, or phone
calls. Objective: Many recent phishing detection methods do not provide a
complete understanding of how different features affect predictions. This study
uses SHAP (SHapley Additive exPlanations) analysis to uncover the features that
contribute most to phishing detection and identifies a set of key features.
Methods: Several machine learning strategies based on SHAP were applied to
enhance the classification model. The paper proposes a fast model built on a
set of contemporary machine learning techniques. Results: Experiments showed
that the proposed
model achieved a maximum accuracy of 99.1% for K-NN and 98.5% for XGBoost
on the Phishing_Legitimate_full dataset. K-NN demonstrated superior
performance and interpretability, both of which matter in security-critical
applications. Conclusions: The results highlight the balance between predictive
performance and interpretability. The SHAP-based analysis provides valuable
transparency into the decision-making process, which makes the proposed model a
more practical choice for real-world phishing detection systems, where
reliability and interpretability are critical.
Keywords
Phishing attack; Machine learning; SHAP; Feature selection; Cyber-attack detection