Abstract
Structured Query Language Injection (SQLi) remains one of the most
serious threats to web applications and can bypass traditional
signature-based detection through obfuscation and zero-day payloads.
This has driven the wider application of Machine Learning (ML) and Deep
Learning (DL) techniques. This paper analyzes 50 peer-reviewed
studies published between 2015 and 2025, in which the
reported detection accuracy ranged from 93% to 99.9%. Traditional
ML methods include Support Vector Machine (SVM), Random Forest
(RF), Logistic Regression (LR), and Decision Tree (DT). DL approaches
encompass Convolutional Neural Networks (CNN), Long Short-Term
Memory (LSTM), Bidirectional LSTM (BiLSTM), and transformer-based
models such as Bidirectional Encoder Representations from Transformers
(BERT). Feature extraction methods include Term Frequency-Inverse
Document Frequency (TF-IDF), Word2Vec, and contextual embeddings.
Evaluation of the proposed models uncovers research opportunities
related to limited data availability, class imbalance, real-time
application, and excessive use of hardware resources.
Keywords
Deep Learning (DL)
Machine Learning (ML)
Natural Language Processing (NLP)
SQL Injection (SQLi)
Web Application Security